# Estimation of the reproduction number and early prediction of the COVID-19 outbreak in India using a statistical computing approach

## Abstract

Coronavirus disease 2019 (COVID-19), which causes severe respiratory illness, has become a pandemic. The World Health Organization has declared it a public health crisis of international concern. We developed a susceptible-exposed-infected-recovered (SEIR) model for COVID-19 to show the importance of estimating the reproduction number (R_{0}). This work is focused on predicting the COVID-19 outbreak in its early stage in India based on an estimation of R_{0}. The developed model will help policymakers to take active measures prior to the further spread of COVID-19. Data on daily newly infected cases in India from March 2, 2020 to April 2, 2020 were used to estimate R_{0} using the earlyR package. The maximum-likelihood approach was used to analyze the distribution of R_{0} values, and the bootstrap strategy was applied for resampling to identify the most likely R_{0} value. We estimated the median value of R_{0} to be 1.471 (95% confidence interval [CI], 1.351 to 1.592) and predicted that the new case count may reach 39,382 (95% CI, 34,300 to 47,351) in 30 days.

## INTRODUCTION

Coronavirus disease 2019 (COVID-19) has rapidly spread worldwide, with 896,450 confirmed cases and 45,526 deaths globally as of April 2, 2020 [1]. The disease emerged as 27 cases of pneumonia of unknown cause in Wuhan, China. The first COVID-19 case in India was identified on January 30, 2020, and the total number of reported cases reached 2,322 as of April 3, 2020 [2]. On March 3, 2020, the Indian government suspended all new visas and visas issued to nationals of Iran, Italy, Japan, and Korea, and on the next day implemented compulsory screening of all international passengers. The Indian government declared a countrywide lockdown for 21 days on March 24, 2020 as a measure to control the spread of COVID-19, which has developed into a pandemic. The transmission rate of COVID-19 has been relatively low in most countries, although major outbreaks have occurred in a few countries, such as Iran, Italy, Japan, and Korea. Most countries experience at least an early stage of COVID-19 spread before any mitigation measures have an impact [3]. Myers et al. [4] stated that accurate epidemic forecasting models would noticeably improve epidemic prevention and control capabilities. No vaccine is available for COVID-19, and vaccination is typically not a good option for stopping the spread of a new epidemic, as considerable time (approximately 10 years) is required to develop a safe and effective vaccine [5]. Li et al. [6] found that the COVID-19 incubation period was 5.2 days (95% confidence interval [CI], 4.1 to 7.0) and found indications that human-to-human transmission occurred among close contacts. Because India is the second most populous country in the world, it is important to estimate the transmissibility of COVID-19 and to predict the total number of new cases, which will help direct focus towards this public health crisis.

Mathematically based epidemic models, such as susceptible-infected-recovered (SIR) models [7], susceptible-infected-susceptible (SIS) models [8], susceptible-exposed-infected-recovered (SEIR) models [9], and susceptible-exposed-infected-recovered-susceptible (SEIRS) models [10], are used to predict the trajectory of epidemics. The reproduction number (R_{0}) can be estimated statistically or empirically. In this work, we used the earlyR package (https://cran.r-project.org/) to estimate R_{0} and predict the trajectory of the outbreak.

## METHODS

### Susceptible-exposed-infected-recovered mathematical model

SEIR models can be used to predict the number of people infected based on R_{0}. We present an SEIR model in this study to demonstrate the importance of estimating R_{0} [11]. COVID-19 has an incubation period, also known as a latent period or latent delay (τ), of 2-14 days. The following assumptions were made in developing the mathematical model for COVID-19:

- The population growth of the region/country is exponential, and the COVID-19 epidemic is occurring in a sufficiently short period

- Infected individuals are assumed not to give birth

- Recovered individuals acquire permanent immunity with a probability *f* (0 ≤ *f* ≤ 1) or die from the disease with a probability of (1 - *f*)

With S referring to susceptible individuals, E to susceptible individuals that become exposed at time t-τ, I to individuals who are infected, and R to those who have recovered from COVID-19, the resulting differential equations are:

Here, μ is the per capita death rate due to causes other than the disease, γ is the contact rate (also called the transmission rate or infection rate), α is the recovery rate, and *b* is the per capita birth rate (with *b* > μ).

At any instant,

R_{0} is defined as,

This constant is extremely important in characterizing the spread of COVID-19. It reflects how many people, on average, contract the disease from one infectious individual. In general, if R_{0} > 1, the number of infections grows and the disease spreads through the population. According to WHO information as of January 23, 2020, the R_{0} of COVID-19 lies between 1.4 and 2.5. R_{0} may vary considerably not only between infectious diseases but also for the same disease in different populations [12].
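The role of R_{0} as a threshold can be illustrated with a small simulation. The following is a minimal sketch under simplifying assumptions that are not taken from this article: a closed population without births or deaths, an exponentially distributed latent period with rate `sigma` in place of the fixed delay τ, and R_{0} = γ/α. The function name `simulate_seir`, the 5-day infectious period, and the population size are hypothetical choices made for illustration only.

```python
def simulate_seir(r0, days=1000, population=1_000_000, initial_infected=10,
                  latent_days=5.2, infectious_days=5.0, dt=0.1):
    """Integrate a basic SEIR model with forward Euler steps.

    Assumptions (not taken from the article): a closed population with no
    births or deaths, an exponentially distributed latent period with rate
    sigma instead of a fixed delay tau, and R0 = gamma / alpha.
    """
    sigma = 1.0 / latent_days      # rate of leaving the exposed class
    alpha = 1.0 / infectious_days  # recovery rate
    gamma = r0 * alpha             # transmission rate implied by R0

    s = float(population - initial_infected)
    e = 0.0
    i = float(initial_infected)
    r = 0.0
    peak_infected = i

    for _ in range(int(days / dt)):
        new_exposed = gamma * s * i / population * dt  # S -> E
        new_infectious = sigma * e * dt                # E -> I
        new_recovered = alpha * i * dt                 # I -> R
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        r += new_recovered
        peak_infected = max(peak_infected, i)

    return {"S": s, "E": e, "I": i, "R": r, "peak_I": peak_infected}


if __name__ == "__main__":
    # Compare a sub-threshold value with the two ends of the WHO range.
    for r0 in (0.8, 1.4, 2.5):
        out = simulate_seir(r0)
        print(f"R0={r0}: recovered share = {out['R'] / 1_000_000:.1%}, "
              f"peak infectious = {out['peak_I']:.0f}")
```

With R_{0} below 1 the infectious compartment never exceeds its initial size in this sketch, whereas values above 1 produce an outbreak whose final size grows with R_{0}.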

### Data

All the data shown in Table 1 were collected from an official Indian website [2]. The epidemiological data from March 2, 2020 to April 2, 2020, as shown in Table 1, were used to estimate R_{0}. A higher R_{0} indicates a higher likelihood of new infections.

### Model development

The transmissibility of COVID-19 in India was evaluated using the earlyR package. It was assumed that interventions so far have had a minimal impact on COVID-19 transmission in India. The model used herein is a simplified version of the model introduced by Cori et al. [13]. The parameters of the serial interval distribution (i.e., its mean and standard deviation [SD]) are required to estimate R_{0}. We assumed that the mean and SD were 4.7 days and 2.9 days, respectively, based on existing research [14]. The maximum-likelihood (ML) approach was applied to obtain the distribution of R_{0}. The bootstrap strategy was applied for resampling 1,000 times to obtain likely R_{0} values. The R package projections was used to predict the cumulative daily incidence [15]. We forecast the cumulative number of new cases over the following 30 days. The daily incidence obeys a Poisson distribution determined by daily infectiousness, which is denoted as,

where V(t-k) is the vector of the probability mass function and X_{k} is the real-time incidence at time k. The forecasting model depended on the present incidence and serial interval distributions. The projections were based on resampling and probability computations. The statistical analysis and model development were done using R version 3.6.3 (https://cran.r-project.org/bin/windows/base/old/3.6.3/).
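The likelihood step can be sketched in Python. This is a minimal re-implementation of the idea under stated assumptions, not the earlyR API. It assumes that the incidence X_{t} is Poisson distributed with mean R_{0} multiplied by the infectiousness, taken here as the sum over earlier days k of X_{k} V(t-k). It also assumes that the serial interval follows a gamma distribution with the mean and SD given above, discretized simply by evaluating the density at whole days. All function names and the example counts are hypothetical, and the counts are not the data in Table 1.

```python
import math


def discretized_gamma_pmf(mean, sd, max_days=30):
    """Approximate serial interval PMF for delays of 1..max_days days.

    Simplification: the gamma density is evaluated at whole days and
    renormalized, which is not necessarily how earlyR discretizes it.
    """
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    weights = []
    for k in range(1, max_days + 1):
        log_pdf = ((shape - 1) * math.log(k) - k / scale
                   - math.lgamma(shape) - shape * math.log(scale))
        weights.append(math.exp(log_pdf))
    total = sum(weights)
    return [w / total for w in weights]  # index 0 is a delay of 1 day


def infectiousness(incidence, si_pmf):
    """lambda_t = sum over k < t of X_k * V(t - k)."""
    lam = []
    for t in range(len(incidence)):
        total = 0.0
        for k in range(t):
            delay = t - k
            if delay <= len(si_pmf):
                total += incidence[k] * si_pmf[delay - 1]
        lam.append(total)
    return lam


def log_likelihood(r, incidence, lam):
    """Poisson log-likelihood of the incidence given mean r * lambda_t."""
    ll = 0.0
    for x, l in zip(incidence, lam):
        if l > 0:  # days with no past cases carry no information
            mu = r * l
            ll += x * math.log(mu) - mu - math.lgamma(x + 1)
    return ll


def estimate_r(incidence, si_pmf, grid_max=10.0, grid_size=1000):
    """Return the grid value of R with the highest likelihood."""
    lam = infectiousness(incidence, si_pmf)
    grid = [grid_max * (j + 1) / grid_size for j in range(grid_size)]
    return max(grid, key=lambda r: log_likelihood(r, incidence, lam))


if __name__ == "__main__":
    si = discretized_gamma_pmf(mean=4.7, sd=2.9)
    hypothetical_incidence = [2, 1, 3, 2, 4, 5, 4, 7, 8, 10, 12, 11, 15, 19]
    print("ML estimate of R:", estimate_r(hypothetical_incidence, si))
```

Under this Poisson model the maximum also has a closed form (total incidence divided by total infectiousness), but evaluating the likelihood on a grid gives the whole likelihood profile, from which likely R_{0} values can then be resampled.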

### Ethics statement

The analysis in this article is based on publicly available data. The article therefore does not require ethics committee approval.

## RESULTS AND DISCUSSION

Figure 1 shows the daily incidence of COVID-19 in India from March 2, 2020 to April 2, 2020. Figure 2 shows the distribution of likely values of the R_{0} of COVID-19 in India. We estimated the ML value of R_{0} as 1.471 (95% CI, 1.351 to 1.592) for COVID-19 in the early stage in India. Figure 3 shows a histogram of the R_{0} values obtained using the bootstrap strategy with 1,000 samples.

Figure 4 shows the global spread of COVID-19 during the same period. The vertical gray bars indicate the presence of cases and black dots denote the dates of symptom onset. The dashed vertical blue line indicates the current date (April 3, 2020). The vertical scale in Figure 4 shows the relative scale of infections. Figure 5 shows the predicted cumulative cases in the next 30 days.

We computed that the cumulative number of new cases may reach 39,382 (95% CI, 34,300 to 47,351) in the next 30 days. R_{0} was estimated from the existing COVID-19 data from March 2, 2020 to April 2, 2020. The Indian government has already announced a nationwide lockdown. According to WHO information as of January 23, 2020, the R_{0} of COVID-19 lies between 1.4 and 2.5. Our estimate indicates that for India, the median R_{0} value of 1.471 (95% CI, 1.351 to 1.592) is at the lower end of this range. However, various studies have indicated that precisely estimating R_{0} is challenging, because R_{0} depends on environmental conditions, demography, and the modeling method. In our method, the accuracy of R_{0} depended on the premise that all cases of COVID-19 in India were identified in the study period. The 30-day prediction holds only if the same scenario continues. We believe that our forecasts may help in various aspects, such as developing the required medical infrastructure and focusing efforts on mitigating the economic impact of the pandemic. Our findings were derived from a limited time frame, and the results may change after a considerable number of additional cases have occurred. R_{0} can be reduced by strictly practicing social distancing in daily life, wearing masks, frequent hand-washing with soap or use of sanitizers, quarantining infected people, and identifying cases using rapid diagnostic methods.
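How strongly such a forecast depends on R_{0} can be illustrated with a small stochastic projection. The sketch below is written in Python under stated assumptions and does not reproduce the projections package: each simulation draws one R_{0} value from a supplied list, then generates future daily incidence as Poisson counts with mean R_{0} multiplied by the infectiousness. The function names, the short serial interval vector, the incidence history, and the normal approximation used for large Poisson means are all hypothetical choices made for brevity.

```python
import math
import random


def poisson_draw(rng, mean):
    """Draw from Poisson(mean): Knuth's method for small means, a rounded
    normal approximation for large ones (an assumption made for brevity)."""
    if mean <= 0:
        return 0
    if mean < 30:
        threshold = math.exp(-mean)
        count, product = 0, rng.random()
        while product > threshold:
            count += 1
            product *= rng.random()
        return count
    return max(0, round(rng.gauss(mean, math.sqrt(mean))))


def project_cumulative(incidence, si_pmf, r_values, n_days=30, n_sims=200,
                       seed=1):
    """Simulate future incidence and return sorted cumulative new cases."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_sims):
        r = rng.choice(r_values)       # one plausible R per simulation
        series = list(incidence)
        for _ in range(n_days):
            t = len(series)
            # Infectiousness: past cases weighted by the serial interval.
            lam = sum(series[t - d] * si_pmf[d - 1]
                      for d in range(1, min(t, len(si_pmf)) + 1))
            series.append(poisson_draw(rng, r * lam))
        totals.append(sum(series[len(incidence):]))
    totals.sort()
    return totals


def quantile(sorted_values, q):
    """Nearest-rank quantile of an already sorted list."""
    index = int(round(q * (len(sorted_values) - 1)))
    return sorted_values[index]


if __name__ == "__main__":
    hypothetical_si = [0.1, 0.2, 0.3, 0.2, 0.1, 0.1]   # delays of 1..6 days
    hypothetical_history = [5, 6, 8, 9, 12, 14, 17, 21, 25, 30]
    for r in (1.2, 1.471, 2.0):
        sims = project_cumulative(hypothetical_history, hypothetical_si, [r])
        print(f"R={r}: median={quantile(sims, 0.5)}, "
              f"95% interval=({quantile(sims, 0.025)}, "
              f"{quantile(sims, 0.975)})")
```

Because every projected day feeds back into the infectiousness of later days, small differences in R_{0} compound over the projection window.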

## CONCLUSION

We estimated the median value of R_{0} to be 1.471 (95% CI, 1.351 to 1.592) and predicted that the cumulative number of new cases may reach 39,382 (95% CI, 34,300 to 47,351) in the next 30 days. The predicted size largely depends on changes in R_{0}. Effective measures against COVID-19 will help to reduce R_{0}. The presence of numerous unidentified cases in the study period may result in uncertainties in the estimated value of R_{0} used in the developed forecasting model.

## Notes

**CONFLICT OF INTEREST**

The authors have no conflicts of interest to declare for this study.

**FUNDING**

None.

**AUTHOR CONTRIBUTIONS**

Conceptualization: KK. Data curation: KS. Formal analysis: KS. Funding acquisition: None. Methodology: KK. Writing – original draft: KK. Writing – review & editing: KK, KS.

## Acknowledgements

None.