The mrSIR mathematical forecasting model

The mrSIR project arose spontaneously, as an entirely voluntary effort, from the encounter between the world of mathematics and that of data analysis.

How did the project start?

Using the data published daily by the Italian Civil Protection, Felice and Luigi first tried to apply the classical SIR epidemiological model to forecast how the epidemic might spread in the near future. However, they soon realized that this simple model could not properly capture some relevant aspects that turned out to influence the dynamics of the epidemic over the long term. They therefore introduced a number of extensions to cope with issues such as the lack of homogeneity and the migration of people from severely affected areas to regions not yet affected. The resulting model was named mrSIR.

In the meantime, Paolo started publishing some informative web videos to provide simple explanations of the contagion data and make them accessible to everyone. During his research activity he came across the mrSIR model.

From the beginning of April, a fruitful exchange of emails allowed the three researchers to share their expertise and ideas. A number of further refinements were then introduced into the equations, which allowed the resulting model to attain a forecast accuracy greater than 90% over a time span of 8 weeks.

Short description of the model

We employ an extension of the classical SIR epidemiological model that splits the population into five compartments.

S = Susceptible individuals

Healthy individuals who might catch the disease when exposed to infected people.

I2 = infectious (quarantined) individuals of class 2

Individuals who have contracted the virus and have been tested (e.g. by means of a throat swab) and diagnosed as active cases by the Istituto Superiore di Sanità. Depending on the severity of the disease, these people are quarantined or hospitalized, and are therefore unable to transmit the virus to other people.

R2 = removed individuals of class 2

Recovered or deceased people coming from class I2.

I1 = infectious (circulating) individuals of class 1

Individuals who have contracted the virus but have not been tested (e.g. by means of a throat swab) and are therefore responsible for virus transmission. These include, for example, asymptomatic, pauci-symptomatic and pre-symptomatic individuals, the latter being individuals whose symptoms are not yet severe enough to require the attention of the public health service.

R1 = removed individuals of class 1

Recovered people coming from class I1.
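To make the compartment structure concrete, the following is a minimal sketch of a single-region version of such a five-compartment model. It is not the actual mrSIR system, whose equations are not given here. The flows between compartments, the parameter names (beta for transmission, delta for detection, gamma1 and gamma2 for removal) and the forward Euler integration are all assumptions made for illustration. The only property taken from the description above is that circulating I1 individuals transmit the virus while quarantined I2 individuals do not.

```python
def simulate(beta, delta, gamma1, gamma2, s0, i1_0, days, dt=0.1):
    """Illustrative five-compartment model (assumed equations, not the mrSIR ones).

    S  -> I1 : infection, driven only by circulating cases I1
    I1 -> I2 : detection (testing) at an assumed rate delta
    I1 -> R1 : removal of undetected cases at an assumed rate gamma1
    I2 -> R2 : removal of detected cases at an assumed rate gamma2
    Returns one (S, I1, I2, R1, R2) tuple per day, including day 0.
    """
    s, i1, i2, r1, r2 = float(s0), float(i1_0), 0.0, 0.0, 0.0
    n = s + i1  # total population, constant in this sketch
    steps_per_day = round(1 / dt)
    history = [(s, i1, i2, r1, r2)]
    for _ in range(days):
        for _ in range(steps_per_day):
            # Forward Euler step of length dt (in days).
            new_infections = beta * s * i1 / n * dt
            detected = delta * i1 * dt
            removed_1 = gamma1 * i1 * dt
            removed_2 = gamma2 * i2 * dt
            s -= new_infections
            i1 += new_infections - detected - removed_1
            i2 += detected - removed_2
            r1 += removed_1
            r2 += removed_2
        history.append((s, i1, i2, r1, r2))
    return history


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to produce a visible outbreak.
    result = simulate(beta=0.4, delta=0.1, gamma1=0.1, gamma2=0.05,
                      s0=999_900, i1_0=100, days=60)
    s, i1, i2, r1, r2 = result[-1]
    print(f"day 60: undetected I1 = {i1:.0f}, detected I2 = {i2:.0f}")
```

In this sketch only I2 and R2 would be visible in official data, while I1 and R1 are quantities the model has to infer.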

Progress of the research activity

The evolutionary model splits Italy into four macro-regions and considers the effect of migration flows from one area to another. Furthermore, a term containing a delay time has been introduced to account for the period between infection, symptom onset and detection of the disease.
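The two ingredients mentioned here, migration between regions and a delay before detection, can be sketched as follows. This is a toy illustration under stated assumptions, not the mrSIR equations: the migration matrix, the rule that detections on a given day are proportional to the circulating cases tau days earlier, the choice to move only the S, I1 and R1 compartments, and all parameter names are hypothetical. The sketch uses a one-day time step and requires tau to be at least 1.

```python
from collections import deque


def migrate(values, migration):
    """Move a fraction migration[a][b] of values[a] from region a to region b."""
    k = len(values)
    change = [0.0] * k
    for a in range(k):
        for b in range(k):
            if a != b:
                flow = migration[a][b] * values[a]
                change[a] -= flow
                change[b] += flow
    return [values[a] + change[a] for a in range(k)]


def simulate_regions(pop, seed, migration, beta, delta, gamma1, gamma2, tau, days):
    """Daily-step multi-region sketch with delayed detection (assumed form)."""
    k = len(pop)
    state = {
        "S": [float(pop[r] - seed[r]) for r in range(k)],
        "I1": [float(seed[r]) for r in range(k)],
        "I2": [0.0] * k,
        "R1": [0.0] * k,
        "R2": [0.0] * k,
    }
    # past[r] stores I1 in region r for the previous tau days, oldest first.
    past = [deque([0.0] * tau, maxlen=tau) for _ in range(k)]
    snapshots = [{name: list(v) for name, v in state.items()}]
    for _ in range(days):
        for r in range(k):
            s, i1, i2 = state["S"][r], state["I1"][r], state["I2"][r]
            n = s + i1 + i2 + state["R1"][r] + state["R2"][r]
            delayed_i1 = past[r][0]   # I1 as it was tau days ago
            past[r].append(i1)        # the oldest entry is dropped automatically
            new_infections = min(beta * s * i1 / n, s)
            removed_1 = gamma1 * i1
            # Detections depend on the delayed I1, capped so I1 stays non-negative.
            detected = min(delta * delayed_i1, i1 - removed_1)
            removed_2 = gamma2 * i2
            state["S"][r] = s - new_infections
            state["I1"][r] = i1 + new_infections - detected - removed_1
            state["I2"][r] = i2 + detected - removed_2
            state["R1"][r] += removed_1
            state["R2"][r] += removed_2
        # Assumption: quarantined (I2) and R2 individuals do not travel.
        for name in ("S", "I1", "R1"):
            state[name] = migrate(state[name], migration)
        snapshots.append({name: list(v) for name, v in state.items()})
    return snapshots


if __name__ == "__main__":
    # Two hypothetical regions; only region 0 starts with infections.
    snaps = simulate_regions(pop=[1_000_000, 500_000], seed=[100, 0],
                             migration=[[0.0, 0.001], [0.002, 0.0]],
                             beta=0.4, delta=0.1, gamma1=0.1, gamma2=0.05,
                             tau=5, days=60)
    print("detected active cases on day 60:", [round(x) for x in snaps[-1]["I2"]])
```

Even in this simplified form, the region that starts with no infections acquires cases purely through migration, and detected cases only appear after the delay has elapsed.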


By construction, the model is able to provide estimates of the number of undetected, circulating infected people. Since these individuals are responsible for a large share of virus transmission, knowledge of their dynamics is of primary importance for understanding the actual status of the epidemic. As the forecast plots on the dedicated webpage show, this SIR model extension was able to accurately predict the peak day of active cases more than two weeks in advance.

Future work

The model can easily be scaled in terms of the number of macro-regions it handles. Consequently, it may be adapted to study the epidemic spread in each Italian region, as well as to cover European or non-EU countries. However, the computational requirements increase sharply with the number of simulated regions.

Computational resources

Tuning all the model parameters to fit the observed data as well as possible requires a considerable computational effort. While the execution times needed to produce a forecast are acceptable in the present configuration, scaling to more regions and/or adding new equations to refine the model would require more powerful computational resources than those currently employed.
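As an illustration of why parameter tuning is costly, the sketch below fits two parameters of a toy single-region model by exhaustive grid search against a series of detected active cases. Everything in it is an assumption made for the example: the toy equations, the parameter names, the least-squares error, and the grid search itself, which is not necessarily the procedure used for mrSIR. The "observed" series is synthetic, generated by the same toy model, and is not real data.

```python
from itertools import product


def active_cases(beta, delta, gamma1=0.1, gamma2=0.05,
                 n=1_000_000.0, seed=100.0, days=40):
    """Toy daily-step model; returns the series of detected active cases (I2)."""
    s, i1, i2 = n - seed, seed, 0.0
    series = []
    for _ in range(days):
        new_infections = beta * s * i1 / n
        detected = delta * i1
        s -= new_infections
        i1 += new_infections - detected - gamma1 * i1
        i2 += detected - gamma2 * i2
        series.append(i2)
    return series


def fit(observed, betas, deltas):
    """Return (error, beta, delta) minimizing the sum of squared differences."""
    best = None
    for beta, delta in product(betas, deltas):
        model = active_cases(beta, delta, days=len(observed))
        error = sum((m - o) ** 2 for m, o in zip(model, observed))
        if best is None or error < best[0]:
            best = (error, beta, delta)
    return best


if __name__ == "__main__":
    # Synthetic target series produced with known parameters (not real data).
    observed = active_cases(beta=0.35, delta=0.10)
    error, beta, delta = fit(observed,
                             betas=[0.25, 0.30, 0.35, 0.40],
                             deltas=[0.05, 0.10, 0.15])
    print(f"best fit: beta={beta}, delta={delta}, squared error={error:.3g}")
```

The grid here requires one simulation per parameter combination, so 4 x 3 = 12 runs. With this approach the number of runs grows multiplicatively with each additional parameter, which gives a sense of how quickly the cost can rise when regions or equations are added.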