COVID-19 Forecasting using a Bidirectional CNN-Stacked LSTM
Coronavirus disease 2019 (COVID-19) is an infectious disease that emerged in China in December 2019 and has affected the whole world. On 30 January 2020, the WHO declared it a Public Health Emergency of International Concern. Coronavirus cases are still increasing in India as well. We need to prepare forecasting models to assess the future situation, which can help in making the right decisions, forming concrete plans of action, and restraining similar epidemics in the future.
I compared statistical and deep learning methods for forecasting COVID-19. Statistical methods are known to perform decently for epidemic forecasting, so we compare the performance of LSTM models against them on the same data.
Dataset
The dataset is extracted from the GitHub repository of Johns Hopkins University, USA. It is restricted to data for Indian regions. The training data covers the period from 30/01/2020 to 10/05/2020, and the test data covers the period from 11/05/2020 to 31/05/2020. The dataset consists of two columns: the date and the cumulative number of cases on that day.
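The report does not include its preprocessing code. A minimal sketch of the date-based train/test split described above, using synthetic placeholder values rather than the real case counts (the actual JHU CSSE file path and column names are not shown here):

```python
import pandas as pd

# Illustrative split only: the dates mirror the description above,
# but the series values here are synthetic, not the real case counts.
dates = pd.date_range("2020-01-30", "2020-05-31", freq="D")
df = pd.DataFrame({"Date": dates,
                   "Cumulative_Cases": range(len(dates))})  # placeholder values

train = df[df["Date"] <= "2020-05-10"]   # 30/01/2020 .. 10/05/2020
test = df[df["Date"] >= "2020-05-11"]    # 11/05/2020 .. 31/05/2020

print(len(train), len(test))  # 102 21
```

This split gives 102 days of training data and 21 days of test data.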
Methods
LSTM
A commonly used prediction strategy for time-series data (e.g. the epidemic data of daily cases that we consider here) is recurrent neural networks (RNNs). Specific types of RNNs that can provide robust prediction are LSTMs (Long Short-Term Memory units). LSTMs are able to recognize temporal patterns in time-series data that are then used in the prediction. A typical RNN uses information from the previous step to predict the output. But when the previous step alone is not enough, we face a long-term dependency. If an RNN uses all previous steps, the exploding/vanishing gradient problem is encountered. An LSTM can solve this problem because it uses gates to control the memorizing process.
It has three gates: (See Fig 2)
● Input
● Forget, and
● Output
These gates decide which information to keep and which to discard, according to its importance, using sigmoid and tanh activations.
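Before any of the architectures below can be trained, the cumulative-case series must be framed as a supervised-learning problem. The report does not show this step; a common sliding-window approach (window length of 3 chosen purely for illustration) looks like:

```python
import numpy as np

# Sliding-window framing: turn the series into (input, target) pairs so an
# LSTM can learn to predict the next value from the previous n_steps values.
def make_windows(series, n_steps):
    X, y = [], []
    for i in range(len(series) - n_steps):
        X.append(series[i:i + n_steps])   # window of n_steps past values
        y.append(series[i + n_steps])     # the value to predict
    return np.array(X), np.array(y)

cases = np.array([10, 12, 15, 20, 27, 35, 48], dtype=float)  # toy series
X, y = make_windows(cases, n_steps=3)
print(X.shape, y.shape)  # (4, 3) (4,)
print(X[0], y[0])        # [10. 12. 15.] 20.0
```

For Keras LSTM layers, `X` would additionally be reshaped to `(samples, n_steps, 1)`.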
Different LSTM Architectures
1. Standard Vanilla LSTM
This LSTM architecture was designed with 4 layers: an input layer, an LSTM (hidden) layer, a fully-connected layer, and an output layer. Each LSTM layer has n neurons, the activation function is ReLU, the loss function is MSE, and the optimizer is Adam.
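A minimal Keras sketch of this architecture, assuming illustrative values for the window length and for n (the exact hyperparameters are not given in the report):

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, LSTM, Dense

n_steps, n_features, n_units = 3, 1, 50  # assumed values, not from the report

# Vanilla LSTM as described: input -> LSTM (ReLU) -> fully-connected -> output,
# trained with MSE loss and the Adam optimizer.
model = Sequential([
    Input(shape=(n_steps, n_features)),
    LSTM(n_units, activation="relu"),
    Dense(n_units, activation="relu"),   # fully-connected layer
    Dense(1),                            # output: next day's cumulative cases
])
model.compile(optimizer="adam", loss="mse")

dummy_batch = np.random.rand(8, n_steps, n_features)
print(model.predict(dummy_batch, verbose=0).shape)  # (8, 1)
```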
2. Bidirectional LSTM
A Bidirectional LSTM, or biLSTM, is a sequence processing model that consists of two LSTMs: one taking the input in a forward direction, and the other in a backwards direction. BiLSTMs effectively increase the amount of information available to the network, improving the context available to the algorithm (e.g. knowing what words immediately follow and precede a word in a sentence).
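In Keras this is a one-line change from the vanilla model: the LSTM layer is wrapped in `Bidirectional`, which runs one copy of the layer forward and one backward over the window and concatenates their outputs (doubling the feature dimension). A sketch with assumed hyperparameters:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Bidirectional, LSTM, Dense

n_steps, n_features = 3, 1  # assumed values, not from the report

# Two LSTMs process the window in opposite directions; their outputs are
# concatenated, so Bidirectional(LSTM(50)) yields 100 features.
model = Sequential([
    Input(shape=(n_steps, n_features)),
    Bidirectional(LSTM(50, activation="relu")),
    Dense(1),
])
model.compile(optimizer="adam", loss="mse")

dummy_batch = np.random.rand(4, n_steps, n_features)
print(model.predict(dummy_batch, verbose=0).shape)  # (4, 1)
```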
3. CNN LSTM
Convolutional Neural Networks (CNNs) are a particular kind of Deep Neural Network (DNN) based on the concept of weight sharing, so that the number of weights does not have to be as large as in a fully connected structure.
A CNN generally contains four levels in its structure: an input layer, convolutional layers, pooling layers, and a fully connected (output) layer. The convolutional layer is the most important part of a CNN: the input is convolved with several filters, each filter representing a smaller matrix, and the corresponding feature maps are obtained after the convolution operation.
The pooling layer gives a summary statistic of the nearby outputs, for example:
a. Max pooling - the maximum of a rectangular neighborhood
b. Average pooling - the average of a rectangular neighborhood
The convolutional and pooling layers are generally used to extract features, and one or more fully connected layers are usually adopted after one or more groups of convolutional and pooling layers. The fully connected layer puts the information from the feature maps together and outputs it to later layers. CNNs are good at reducing frequency variations, while LSTMs are good at temporal modeling, so the two are complementary in their modeling capabilities. This is the idea behind the CNN LSTM.
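A sketch of a CNN LSTM in Keras, with assumed hyperparameters (the report does not specify them): a Conv1D layer extracts local features from the input window, a pooling layer summarizes them, and the LSTM models their temporal order.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Conv1D, MaxPooling1D, LSTM, Dense

n_steps, n_features = 8, 1  # window must be long enough for conv + pooling

model = Sequential([
    Input(shape=(n_steps, n_features)),
    Conv1D(filters=64, kernel_size=2, activation="relu"),  # local features
    MaxPooling1D(pool_size=2),                             # summary statistic
    LSTM(50, activation="relu"),                           # temporal modeling
    Dense(1),                                              # forecast
])
model.compile(optimizer="adam", loss="mse")

dummy_batch = np.random.rand(4, n_steps, n_features)
print(model.predict(dummy_batch, verbose=0).shape)  # (4, 1)
```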
4. Stacked CNN-Bidirectional LSTM (best)
This is a stacked model consisting of CNN, Dropout, Bidirectional LSTM, and Dense layers. The model takes advantage of the feature-detection ability of the CNN and the contextual boost of the bidirectional LSTM to discover complex patterns more easily. The Dropout layer is used to avoid overfitting, and the Dense layer finally summarizes all the information learned by the model.
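Combining the pieces described above gives a model of roughly the following shape; the layer sizes, dropout rate, and window length here are assumptions for illustration, not the report's actual hyperparameters:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Input, Conv1D, MaxPooling1D, Dropout,
                                     Bidirectional, LSTM, Dense)

n_steps, n_features = 8, 1  # assumed window length

model = Sequential([
    Input(shape=(n_steps, n_features)),
    Conv1D(64, kernel_size=2, activation="relu"),  # feature detection (CNN)
    MaxPooling1D(pool_size=2),
    Dropout(0.2),                                  # guard against overfitting
    Bidirectional(LSTM(50, activation="relu")),    # contextual boost
    Dense(1),                                      # summarize into the forecast
])
model.compile(optimizer="adam", loss="mse")

dummy_batch = np.random.rand(4, n_steps, n_features)
print(model.predict(dummy_batch, verbose=0).shape)  # (4, 1)
```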
Evaluation Criteria
The Root Mean Squared Log Error (RMSLE) can be defined with a slight modification of sklearn's mean_squared_log_error function, which is itself a modification of the familiar Mean Squared Error (MSE) metric.
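The modification is simply taking the square root:

```python
import numpy as np
from sklearn.metrics import mean_squared_log_error

# RMSLE: the square root of sklearn's mean squared log error.
def rmsle(y_true, y_pred):
    return np.sqrt(mean_squared_log_error(y_true, y_pred))

y_true = np.array([100.0, 200.0, 300.0])
print(rmsle(y_true, y_true))  # 0.0 for a perfect forecast
```

Because the error is taken on log(1 + y), RMSLE penalizes the relative (not absolute) error, which suits cumulative case counts that grow over orders of magnitude.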
Results
The stacked CNN-Bidirectional LSTM performed considerably better than SEIR, Polynomial Regression, Vanilla LSTM, CNN LSTM, and Bidirectional LSTM.
The Vanilla LSTM's performance can be improved by adding dropout layers, but the stacked model still outperforms it.
RMSLE of Polynomial Regression: 1.75
RMSLE of SEIR: 1.52
RMSLE of Vanilla LSTM: 0.542
RMSLE of Bidirectional LSTM: 0.2
RMSLE of CNN LSTM: 0.12
RMSLE of Stacked CNN Bidirectional LSTM: 0.00925
Conclusion
Our stacked LSTM model predicted the dataset very well. Deep learning frameworks like LSTM can perform very well when optimized hyperparameters are used. Incorporating features such as lockdown status, recovery rate, and vaccination rate could further boost the accuracy of the model.
References
[1] Rajan Gupta, Gaurav Pandey, Poonam Chaudhary, Saibal K. Pal — Machine Learning Models for Government to Predict COVID-19 Outbreak
[2] Fenglin Liu, Jie Wang, Jiawen Liu, Yue Li, Dagong Liu — Predicting and analyzing the COVID-19 epidemic in China: Based on SEIRD, LSTM and GWR models
[3] Hafiz Tayyab Rauf, M. Ikram Ullah Lali, Muhammad Attique Khan, Seifedine Kadry, Hanan Alolaiyan, Abdul Razaq & Rizwana Irfan
[4] CSSEGISandData/COVID-19: Novel Coronavirus (COVID-19) Cases, provided by JHU CSSE (github.com)