Given all the interest in modelling pandemics at the moment, I decided I’d write my own pandemic simulation. After all, it can’t be any harder than the botched job done by the Imperial College model.
Before we start, I need to point out that I know absolutely nothing about epidemiology. This will quickly become apparent once you see the results I get. The output of my model bears only a superficial resemblance to reality.
In my world, there are 100,000 people at the moment. Everybody goes out and meets twenty random other people every day. With respect to the virus, everybody is in one of three categories:
- susceptible – a person who has not caught the virus
- infected – a person who currently has the virus and is infectious
- recovered – a person who has had the virus and is no longer infectious. Recovered people are considered to be immune.
A virus in my model has three properties:
- the number of days it takes for a person to recover from the virus
- an asymptomatic period which should be shorter than the number of days to recover. During this period an infected person is infectious but not showing symptoms. This is intended for use when modelling social distancing.
- the probability of transmission from an infected person to a susceptible person they meet.
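As a rough illustration, the three properties could be held in a small Python value type. This is only a sketch: the class and field names are illustrative and are not taken from the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Virus:
    """The three virus properties listed above (names are illustrative)."""
    days_to_recover: int             # days a person stays infected and infectious
    asymptomatic_days: int           # infectious but not yet showing symptoms
    transmission_probability: float  # chance per meeting with a susceptible person

    def __post_init__(self):
        # The asymptomatic period should be shorter than the recovery time.
        if not 0 <= self.asymptomatic_days < self.days_to_recover:
            raise ValueError("asymptomatic period must be shorter than the recovery time")
        if not 0.0 <= self.transmission_probability <= 1.0:
            raise ValueError("transmission probability must be between 0 and 1")
```

For example, `Virus(days_to_recover=15, asymptomatic_days=5, transmission_probability=0.01)` describes a fifteen-day illness; the five-day asymptomatic period here is a made-up value purely for the example.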
I picked parameters for my first virus that approximate the numbers for the COVID-19 virus. People are ill for fifteen days – exactly fifteen days (this is obviously a bit unrealistic). I chose the transmission probability to get an R of approximately 3. An infected person meets twenty people per day for fifteen days, which is 300 meetings in total, so the probability per meeting has to be quite low.
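The arithmetic behind that choice can be sketched in a couple of lines. It assumes everybody an infected person meets is still susceptible, which only holds at the very start of the outbreak, and the function name is hypothetical.

```python
def transmission_probability(target_r, contacts_per_day, infectious_days):
    """Per-meeting probability that yields target_r infections per person
    when every person met is still susceptible."""
    total_meetings = contacts_per_day * infectious_days
    return target_r / total_meetings


p = transmission_probability(target_r=3, contacts_per_day=20, infectious_days=15)
print(p)  # 3 infections spread over 300 meetings
```

With twenty meetings a day for fifteen days, an R of 3 corresponds to roughly a one in a hundred chance per meeting.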
The algorithm is as follows:
For each infected person, we pick twenty people at random from the entire population. This is done by generating a random integer in [0, p), where p is the size of the population. If the random number is less than the number of susceptible people, we consider the randomly selected person to be susceptible, and we determine whether they get infected by selecting a random floating point number in [0, 1] and comparing it to the transmission probability.
For each infected person, we maintain a tally of the number of other people they have infected so that we can calculate R.
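The daily step described above might look something like this in Python. It is a sketch under a few assumptions rather than the code from the repository: the function and parameter names are made up, newly infected people are removed from the susceptible count within the same day, and Python's `random()` draws from [0, 1) rather than [0, 1].

```python
import random


def simulate_day(susceptible, infected_tallies, population, contacts_per_day, p, rng):
    """Run one day of meetings and return the number of new infections.

    susceptible      -- number of susceptible people at the start of the day
    infected_tallies -- one entry per infected person: how many people they
                        have infected so far (updated in place)
    """
    new_infections = 0
    for i in range(len(infected_tallies)):
        for _ in range(contacts_per_day):
            # Assumption: people infected earlier today are no longer susceptible.
            remaining = susceptible - new_infections
            # A random index below the susceptible count stands for meeting
            # a susceptible person; only then do we roll for transmission.
            if rng.randrange(population) < remaining and rng.random() < p:
                new_infections += 1
                infected_tallies[i] += 1  # tally kept so R can be calculated
    return new_infections
```

Passing in a `random.Random(seed)` instance as `rng` makes a run repeatable, which is handy when comparing parameter changes.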
Every day, the stats for the day are dumped into a SQLite3 database, so they can be analysed. Here are the results for a run of the program on a population of 100,000 people.
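For anyone wanting to do something similar, here is a minimal sketch of dumping daily figures into SQLite using Python's standard `sqlite3` module. The table name and columns are assumptions for illustration; the schema used by the real program may well differ.

```python
import sqlite3


def open_stats_db(path=":memory:"):
    """Open (or create) the database and make sure the stats table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS daily_stats (
               day         INTEGER PRIMARY KEY,
               susceptible INTEGER NOT NULL,
               infected    INTEGER NOT NULL,
               recovered   INTEGER NOT NULL
           )"""
    )
    return conn


def record_day(conn, day, susceptible, infected, recovered):
    """Insert one row of daily figures."""
    with conn:  # commits on success, rolls back if the insert fails
        conn.execute(
            "INSERT INTO daily_stats (day, susceptible, infected, recovered) "
            "VALUES (?, ?, ?, ?)",
            (day, susceptible, infected, recovered),
        )
```

Once the run has finished, the table can be queried directly or exported to a spreadsheet for charting.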

The R value is calculated as the mean of the number of people infected by the people who recovered over the previous seven days. The early erratic nature of the line is probably due to a very small sample size. In this case, the first person to recover happened to infect five other people.
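A rolling R of this kind can be sketched as follows. The class name is hypothetical, and the sketch assumes the window is the seven most recently recorded days, including the current one.

```python
from collections import deque


class RollingR:
    """Mean number of people infected by those who recovered in the window."""

    def __init__(self, window_days=7):
        # One list of tallies per day; the oldest day drops off automatically.
        self.window = deque(maxlen=window_days)

    def add_day(self, tallies_of_recovered_today):
        """Record the infection tallies of everyone who recovered today."""
        self.window.append(list(tallies_of_recovered_today))

    def value(self):
        """Return R, or None if nobody recovered during the window."""
        tallies = [t for day in self.window for t in day]
        return sum(tallies) / len(tallies) if tallies else None
```

With only a handful of recoveries in the window, a single person who infected five others dominates the mean, which is consistent with the erratic start of the line.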
Interestingly, when the virus dies out, 5.9% of the population have still not had it. The infection rate starts to decline significantly at about day 60, when 74% of the population is still susceptible, and infections themselves peak at day 74, when 23% of the population are still susceptible. R goes below 1 eight days later. This is probably an artefact of the seven-day window used to calculate it.
The next step is to introduce an asymptomatic period to see what that does to the curve. The idea would be that the number of people met while asymptomatic remains the same but is severely reduced after the asymptomatic period ends. I also need to change the way recovery is determined. Some people recover sooner than the mean and some later, but the distribution is quite skewed, which might explain why real pandemics seem to have a long tail after the peak.
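The social distancing rule could be as simple as the sketch below. This is not yet in the model, the function name is hypothetical, and the size of the reduction is left as a parameter because no figure has been chosen.

```python
def contacts_today(days_infected, asymptomatic_days, normal_contacts, reduced_contacts):
    """Number of people an infected person meets today.

    days_infected counts from 0 on the day of infection.
    """
    if days_infected < asymptomatic_days:
        return normal_contacts  # no symptoms yet, so behaviour is unchanged
    return reduced_contacts     # symptomatic: meetings are severely reduced
```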
Links
The repository: https://bitbucket.org/jeremy-pereira/epidemickit/commits/tag/blog-939
The spreadsheet: https://www.icloud.com/numbers/0j7qiGJzxywVb-ouvp7qWeO5A#simple%5F20200523%5F1525