Epidemics and pandemics in maps – the case of COVID-19

Abstract
Epidemics and pandemics are geographical in nature and constitute spatial, temporal, and thematic phenomena across large ranges of scales: local infections with a global spread; short-term decisions by governments and institutions with long-term effects; and diverse effects of the disease on many aspects of our lives. Pandemics pose particular challenges to their visual representation by cartographic means. This article briefly summarizes some of these challenges and outlines ways to approach these. We discuss how to use the information usually available for telling the story of an epidemic, illustrated by the example of the 2019–2020 COVID-19 pandemic. The maps attached to this article demonstrate the discussed cartographic means.


Introduction
The cartographic representation of how diseases spread has a long tradition. Among the first to study the spatial aspects of such a spread was Valentine Seaman, who published two maps about the 1795 Yellow Fever Outbreak in New York City (Seaman, 1798; Figure 1a). Seaman mapped all deaths caused by yellow fever in this area and was able to relate these to the waste areas, which 'became the common receptacles of rubbish and filth of every description' (Seaman, 1798). Similar maps depicting outbreaks of yellow fever in some of the largest cities in the United States have been created by others (Shannon, 1981; Stevenson, 1965). Another well-known example is the map of the 1854 Cholera Outbreak, which was created by John Snow and shows the number of deaths near Broad Street in Soho, London (Snow, 1854a, 1854b; Figure 1b). While his study might suffer from some systematic shortcomings (Koch & Denike, 2009), it again demonstrates how important the visualization of the spread of a disease can be.
Recent history shows that epidemics and pandemics are by no means only historical artefacts. The currently ongoing 2019-2020 COVID-19 pandemic had its origin in Wuhan, China, in November 2019. The SARS-CoV-2 virus, which causes the COVID-19 disease, is of zoonotic origin, which is likely a reason why it did not spread very effectively at first. Later, in February 2020, the virus spread at a larger scale, which led to many tens of thousands of confirmed infections (see Note 1). Due to the measures taken, the spread of the disease could be slowed down at this time. In March and April, however, a large number of infections were observed in Europe, and South and North America. By 26 April, more than 3 million infections had been confirmed worldwide, and there had been more than 0.2 million confirmed deaths (Dong et al., 2020). It is likely that the real numbers of infections and deaths are much higher, and history will show what share of the world's population will be infected by the end of the pandemic.
The effects of pandemics can be extensive. For instance, during the 1918-1919 Spanish flu pandemic about one third of the world's population became infected, and about 50 million people died worldwide (these numbers are subject to a large uncertainty, and there are still many open questions about this pandemic; Taubenberger & Morens, 2006). This pandemic had major social and economic effects, some of which were still evident many decades later (Almond, 2006). The extent of the impacts of the current pandemic on both social life and the economy remains to be seen. Many measures have been taken throughout the world in order to contain the disease, and timely information is needed to understand the effect these measures have. Such information is important for both experts and political leaders who take decisions, as well as for the general public.
According to their definitions, place is an important factor for both an epidemic and a pandemic. Porta (2014, p. 93), for instance, defines an epidemic as 'the occurrence in a community or region of cases of an illness, specific health-related behaviour, or other health-related events clearly in excess of normal expectancy', to then further emphasize that the 'number of cases indicating the presence of an epidemic varies according to the […] time and place of occurrence'. A pandemic, in turn, is defined as an 'epidemic occurring over a very wide area, crossing international boundaries, and usually affecting a large number of people' (Porta, 2014, p. 209). The main difference between an epidemic and a pandemic is thus the spatial extent. The emphasis put on spatial aspects suggests that maps should prove particularly effective in providing information about an epidemic. In this article, we ask which challenges the cartographic visualization of an epidemic or pandemic poses, and which cartographic techniques can be used to approach these challenges. After a brief review of related work (Section 2), we discuss the challenges and related cartographic techniques (Section 3). The techniques outlined are used to create maps for China, Europe, and the United States, which are enclosed (Section 4). Finally, the results are discussed along with directions of future research (Section 5).

Related work
There exist a number of systems to analyse and visualize epidemics and pandemics. Most of these systems are meant to visually convey the number of infections and deaths that can be observed in an epidemic, or information computed from these. For instance, these systems depict the spatial distribution of several aspects related to an epidemic using choropleth maps (e.g., Fonseca Nobre et al., 1997; Ford et al., 2006; Maciejewski et al., 2011; Robinson, 2007; Robinson et al., 2005, 2011) or proportional symbol maps (e.g., Alonso & McCormick, 2012).
The distribution of such numbers has been visualized using histograms (e.g., Castronovo et al., 2009), and the visualization of several combinations of numerical aspects has been explored (e.g., Chui et al., 2011). Also, ways to depict several figures and aspects related to an epidemic over time have been explored, for instance, with respect to line charts (e.g., Alonso & McCormick, 2012; Cheng et al., 2011; Maciejewski et al., 2011), parallel coordinates using an axis for each temporal unit (e.g., Robinson, 2007; Robinson et al., 2005, 2011), and 'temporal strips' consisting of a vertical line for each temporal unit (e.g., Alonso & McCormick, 2012; Chui et al., 2011). Other systems put more emphasis on the modelling of epidemics. These make it possible to understand the effect of certain actions and measures by simulating the spread of epidemics. Corresponding results are, in many cases, visually conveyed through maps and diagrams (e.g., Ford et al., 2006; Maciejewski et al., 2011). Systems to analyse the data differ considerably in the way they offer insights. Some of these systems have a major focus on the statistical analysis of the data in order to determine the characteristics of the spread of the epidemic. Others have a strong focus on the visual communication of the data, which facilitates drawing conclusions through mental reasoning. Reviews of such systems to analyse and visualize epidemics have been written by Carroll et al. (2014) and Chen et al. (2010). Numerous examples of visualizations can be found in the literature. For instance, Christakos et al. (2005) use isoline, vector, choropleth, and density maps in their book to discuss reasoning and modelling in the context of epidemics. Other scientific publications use well-established techniques (such as bar charts, point diagrams, and line charts) and link these (e.g., Lessler et al., 2011).
Further publications focus on spatial interaction in the context of network theory, which is commonly depicted by connectivity matrices and graphs (Guo, 2007; König et al., 2016). Spatial interaction often comprises spatial aspects, but topological characteristics of the interaction network are in most cases more important. When focussing on spatial aspects at a larger scale, maps and map-related techniques are often used. Dominkovics et al. (2011) discuss dot distribution and density maps as cartographic means in the context of the spread of a disease, and Carr et al. (2000) discuss box plot diagrams depicting the mortality rate for every spatial unit in combination with maps to convey the position of the spatial units. Others have paid more attention to non-traditional ways beyond classical cartographic means in order to visualize an epidemic independently of the analysis of the data. For instance, Karlsson et al. (2013) have explored donut charts as a means to depict how a particular aspect is distributed across different age categories, and heat map diagrams as a variant of the above-mentioned temporal strips. Battersby et al. (2011) have explored the use of ensembles of squares, the colours of which represent several aspects of an epidemic, similar to heat map diagrams of one (nominal) dimension. These ensembles, in turn, were positioned in a circle around and linked to a choropleth map.
As has been highlighted by Olsen et al. (1996), maps and visualizations tend to be misinterpreted in the context of diseases. This applies, in particular, to the identification of spatial clusters, the estimation of absolute and relative numbers, and avoiding cognitive overload due to the complexity inherent to the data. Accordingly, there is a need to adapt techniques and software to visualize, in appropriate ways, spatial aspects of how diseases spread (Robertson & Nelson, 2010).

Methods
When making sense of epidemics, we usually have to rely on a limited amount of data. Among the data usually available are how many people are infected at which places and at which points in time. This is no coincidence, because an epidemic is mainly characterized by a rapid spread of a disease that affects a large number of people when the epidemic is fatal (Porta, 2014), thus stressing the importance of space, time, and the numbers of people who are infected, have died, or have recovered. Especially in the case of a pandemic, the spatial aspect concerns both local and global phenomena, because infections are passed on from one person to another, leading to both local and global centres of disease. The numbers concerned often extend over several orders of magnitude, which makes non-linear ways of representation necessary. For instance, in the case of the COVID-19 pandemic, the number of infections in Hubei, China, is almost two orders of magnitude larger than in its neighbouring provinces. The increase or decrease over time of the numbers concerned, in turn, varies greatly from region to region. Visualizing all three aspects (space, time, numbers) simultaneously is therefore key to understanding an epidemic.
The joint representation of place, time, and the number of infections is hard to achieve by traditional cartographic means only. This does not mean, however, that the relevant information cannot be communicated well, because certain combinations of these three aspects are particularly meaningful in the light of the geographical phenomena involved. The spatial spread of the disease, for instance, relates to space and time; the temporal alignment (or lag) between the spread relates, in particular, to time and the numbers; and where new centres of an epidemic emerge, move in space, or disappear (with potentially only few people physically moving around) relates, in particular, to space and the numbers. Displaying such combinations can thus create a good picture of an epidemic or pandemic. Figure 2 shows some of the cartographic and other visual techniques to depict combinations of the aspects related to an epidemic. Choropleth maps, e.g., can be used to convey the percentage of the infected people (Figure 2a). If additional numbers, such as the relative number of deaths, are to be conveyed, two or more choropleth maps can be combined. This can be achieved either by colouring the boundaries of the spatial units used (Raposo & Robinson, 2016), thus making it possible to convey one additional number, or through the use of a regular pattern (such as stripes) to alternate between these maps, thus making it possible to convey two or even more numbers. Choropleth maps are easy to read, but combining several choropleth maps can be expected to decrease readability. Further, choropleth maps are necessarily limited to relative numbers and percentages. To better understand the potential impact of people moving around and potentially infecting others as well as the demands on the healthcare system, it is advantageous to communicate absolute numbers. Diagram maps are one solution for conveying absolute numbers, and they also make it possible to add further thematic information to the map.
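The difference between the relative numbers shown in a choropleth map and the absolute numbers shown in a diagram map can be illustrated with a minimal Python sketch. The region names, the population and infection figures, and the class limits are all hypothetical and chosen for illustration only; they are not taken from the maps enclosed.

```python
from bisect import bisect_right

# Hypothetical figures for three made-up regions (not real data).
regions = {
    "A": {"population": 5_000_000, "infections": 50_000},
    "B": {"population": 500_000, "infections": 5_000},
    "C": {"population": 2_000_000, "infections": 400},
}


def infection_rate_percent(infections, population):
    """Relative number suitable for a choropleth map: infected share in percent."""
    return 100.0 * infections / population


def choropleth_class(value, breaks):
    """Index of the colour class for `value`, given ascending class limits."""
    return bisect_right(breaks, value)


# Assumed class limits in percent, yielding three colour classes (0, 1, 2).
breaks = [0.05, 0.5]

for name, figures in regions.items():
    rate = infection_rate_percent(figures["infections"], figures["population"])
    print(name, figures["infections"], round(rate, 3), choropleth_class(rate, breaks))
```

In this sketch, regions A and B fall into the same colour class although A has ten times as many infections as B, which is the kind of information that only the absolute numbers of a diagram map would reveal.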
The diagrams consisting of disks, or semi-disks, as shown in Figure 2a, are particularly useful in the light of the fact that the numbers involved often extend over several orders of magnitude. Since the area of a disk and a semi-disk, respectively, is quadratic in the diameter of the disk, the numbers depicted are subject to a natural square root scaling. Despite this advantage, it should be noted that the differences in the areas of disks tend to be underestimated (Flannery, 1971). Such square root scaling is in contrast to the scaling used in choropleth maps. The colour schemes of the latter are often perceived as near-linear and are thus more suitable to more or less linear scalings (Bujack et al., 2018). For very large numbers, however, the semi-disks would hide much of the map content. One solution is to only show the border of the semi-disks, i.e., semi-rings, in case the numbers exceed a predefined range, possibly with the inside area displayed partly transparently. Since in the case of some epidemics only a small percentage of infected people die, there is usually a need to choose different scales for the two semi-disks, as is the case in the example of the maps enclosed to this article. It might be argued that a pie chart should be used instead of two semi-disks, because the number of those who die from the disease represents a proportion of the total number of people infected. In particular, this would even provide a better visual understanding of the mortality rate of an epidemic, which is defined as the ratio of deaths to persons at risk (Porta, 2014). The temporal lag between an infection and a possible death makes it, however, impossible to interpret the two numbers displayed by the semi-disks in their mutual context without further interpretation while the epidemic is still active and if the numbers do not reflect this temporal lag properly.
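The square root scaling and the switch from semi-disks to semi-rings can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the maps enclosed: the function names, the calibration by a reference value and radius, the threshold, and the counts are all hypothetical.

```python
import math


def semi_disk_radius(value, reference_value, reference_radius, exponent=0.5):
    """Radius of a semi-disk whose area is proportional to `value`.

    With exponent=0.5 (square root scaling), the area grows linearly with
    `value`. The parameter names and the calibration by a reference value
    are assumptions made for this sketch.
    """
    if value < 0 or reference_value <= 0:
        raise ValueError("value must be >= 0 and reference_value > 0")
    return reference_radius * (value / reference_value) ** exponent


def symbol_kind(radius, max_filled_radius):
    """Draw only the outline (semi-ring) once the symbol would get too large."""
    return "semi-ring" if radius > max_filled_radius else "semi-disk"


# Hypothetical counts spanning two orders of magnitude (not real data).
infections = {"small region": 100, "large region": 10_000}

# Infections and deaths would be calibrated with different reference values,
# mirroring the different scales chosen for the two semi-disks.
radii = {
    name: semi_disk_radius(count, reference_value=100, reference_radius=2.0)
    for name, count in infections.items()
}

for name, radius in radii.items():
    print(name, round(radius, 2), symbol_kind(radius, max_filled_radius=15.0))
```

A factor of 100 between the counts becomes a factor of 10 between the radii, which is what keeps both symbols drawable on the same map. The `exponent` parameter is left adjustable because an exponent slightly above 0.5 is sometimes chosen to compensate for the underestimation of areas mentioned above; which value is appropriate is not specified here.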
The temporal evolution of the numbers of infections, deaths, and recoveries over time is typically depicted in line charts. Another way of depicting the number of infections and deaths in a mutual context is to use these two numbers as axes of a diagram, as is shown in Figure 2b. The combination of the numbers of infections and deaths can then be indicated for each point in time and each spatial unit, with lines connecting the several points in time for one spatial unit. Accordingly, one curve is shown for each spatial unit. An advantage of this type of diagram is that the shape of the curves is able to uncover several aspects of the temporal evolution of the epidemic. First, the slope of the curve relates to the current mortality rate, which usually changes over time due to the temporal lag between infections and deaths. The change of the slope, secondly, shows whether the speed of the epidemic is accelerating, remaining constant, or even slowing down. If the slope is constant over time, this indicates that the current mortality rate does not change, which is usually the case if the number of new infections and deaths is constant. A curve that turns right over time indicates an acceleration of the epidemic, while a curve turning left indicates that the epidemic is coming to a halt. Thirdly, when breaking the curve into smaller equitemporal segments (e.g., for each day), an acceleration of the epidemic is indicated by an increasing length of the segments. In case of an epidemic slowing down, the segments become shorter again, which can make it impossible to display these properly. Fourthly, the curve can usually be expected to be smooth. Abrupt changes are often an expression of changed counting methods and similar phenomena, which can easily be detected this way. 
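The slope and segment length of such a curve can be computed directly from the cumulative numbers. The following Python sketch assumes that infections are plotted on the horizontal axis and deaths on the vertical axis, with linear scaling on both; the function name and the series are hypothetical and do not reproduce real data.

```python
import math


def curve_segments(infections, deaths):
    """Per-interval (slope, length) of the curve of deaths against infections.

    Both inputs are cumulative series with one value per temporal unit.
    The slope is new deaths per new infection within the interval; it is
    None if no new infections were recorded. The length is measured in
    data units, assuming linear axes.
    """
    segments = []
    for i in range(1, len(infections)):
        new_infections = infections[i] - infections[i - 1]
        new_deaths = deaths[i] - deaths[i - 1]
        slope = new_deaths / new_infections if new_infections else None
        length = math.hypot(new_infections, new_deaths)
        segments.append((slope, length))
    return segments


# Hypothetical cumulative numbers for one spatial unit (not real data).
cumulative_infections = [0, 100, 300, 700, 900, 1000]
cumulative_deaths = [0, 1, 5, 17, 27, 33]

segments = curve_segments(cumulative_infections, cumulative_deaths)
for slope, length in segments:
    print(slope, round(length, 1))
```

In this made-up series, the segments first become longer and then shorter again, while the slope keeps increasing because deaths lag behind infections. Note that the drawn length of a segment also depends on how the two axes are scaled relative to each other in the actual diagram.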
It should be noted that a linear scaling of the axes is advantageous here since a logarithmic scaling would make changes of the slope less visible and distort the end of the epidemic when compared to its start. The temporal evolution of these numbers can also be examined for several spatial units at the same time, as is shown in Figure 2c. The 'temporal strips' make it possible to depict large differences over time much more easily than if line charts were used (cf., Alonso & McCormick, 2012; Chui et al., 2011). While this technique allows one to easily compare the temporal lag of these numbers for several spatial units, it has two major drawbacks. First, the spatial relations between the (discrete) spatial units are not conveyed. Secondly, the numbers need to be normalized per spatial unit because, in many cases, the differences in the numbers between the spatial units exceed what can be conveyed visually in such a limited space.
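The normalization per spatial unit that the 'temporal strips' require can be sketched as follows. The Python code is a minimal illustration with hypothetical region names and numbers; normalizing by each unit's own maximum and the character-based rendering are assumptions made here and not necessarily the choices made for the maps enclosed.

```python
def normalize_per_unit(series_by_unit):
    """Scale the series of each spatial unit to [0, 1] by its own maximum."""
    normalized = {}
    for unit, values in series_by_unit.items():
        peak = max(values, default=0)
        normalized[unit] = [v / peak if peak > 0 else 0.0 for v in values]
    return normalized


def to_strip(values, shades=" .:*#"):
    """Render normalized values as one character per temporal unit."""
    top = len(shades) - 1
    return "".join(shades[round(v * top)] for v in values)


# Hypothetical numbers of new infections per temporal unit (not real data).
new_infections = {
    "region X": [0, 500, 2000, 1000],
    "region Y": [0, 5, 10, 20],
}

normalized = normalize_per_unit(new_infections)
for unit, values in normalized.items():
    print(unit, "|" + to_strip(values) + "|")
```

After normalization, both strips use the full range of shades, so the later peak of region Y is as visible as the earlier peak of region X, although the absolute numbers differ by two orders of magnitude. The absolute numbers themselves are no longer readable from the strips, which is the drawback noted above.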

Results
Three thematically related maps about the 2019-2020 COVID-19 pandemic are enclosed to this article (Figure 3, Main Map). In the context of this article, these maps are meant to illustrate by means of a practical example the discussed cartographic techniques for the visualization of an epidemic or pandemic. Beyond this, the maps are of interest for making sense of the particular pandemic depicted independently of the techniques outlined in this article. They provide a general-purpose and easy-to-understand overview of the COVID-19 pandemic in several regions of particular relevance, which can be used to visually convey the spread of SARS-CoV-2 and the COVID-19 pandemic in newspapers and online media, as well as in textbooks and scientific publications. Accordingly, the maps have been designed with a broad range of readers in mind.
The three maps enclosed focus on several regions. The first map focusses on China, where the COVID-19 pandemic started. The measures taken to contain the virus have been successful in the sense that by far most of the infections were reported in Hubei, meaning that the virus did not spread widely beyond this province. After several months, the virus appeared in Europe at a larger scale, which led to several new epicentres, among them Italy, Spain, France, Germany, and the UK. This spread of the virus in Europe is displayed in the second map. The third map, in turn, depicts the situation in the United States, which, at the time of writing, accounts for most infections and deaths worldwide.
All three maps enclosed make use of the techniques discussed in Section 3. For providing an overview of the spread of the pandemic, each of these maps contains a diagram map that depicts the number of infections and deaths in several regions (provinces, countries, and states, respectively), complemented by inset maps for Alaska and Hawaii in the case of the United States. Diagram maps have been preferred over the other means depicted in Figure 2a, because they seem to provide most clarity in this case. To include information about the temporal development of the spread, line charts (with absolute numbers) have been included, as well as diagrams depicting the number of deaths vs the number of infections, because this makes it possible to better understand at which stage the pandemic currently is in the corresponding area (Figure 2b). Where appropriate, comments have been added to provide further context in terms of story telling (cf., Mocnik & Fairbairn, 2018). In addition, 'temporal strips' are included to provide a temporal context for each region, thus making it possible to better compare the temporal lag between the different regions (Figure 2c).
In the case of China, the pandemic has progressed more rapidly and the effects are thus partly better understood. This is why inset maps of NO2 concentrations at two different points in time have been added to the map of China, showing a significant decrease in atmospheric pollution under the quarantine measures taken. The same effect might be observed in the other areas displayed in the maps, but because the pandemic has been present in these areas for a shorter period, less data is available as yet.

Discussion and conclusions
The spatial and temporal development of an epidemic or pandemic are among the key aspects that need to be explored for understanding its effect on society, trade and economy, social life, and further areas. Corresponding visualizations pose challenges, in particular, in terms of the numbers involved, which extend over several orders of magnitude and make linear scaling approaches ineffective. In this article, we have outlined some of the aspects of an epidemic that are of particular interest for understanding its spatial and temporal development, together with corresponding cartographic techniques to visually convey these aspects. The maps enclosed make use of the techniques and provide good examples of how to visualize many of the aspects of a pandemic, in this case the 2019-2020 COVID-19 pandemic. In particular, the square root scaling in combination with the semi-rings makes it possible in practice to convey numbers that extend across more than two orders of magnitude.
Future research could focus on understanding better the currently available cartographic techniques, including the ones outlined in this article; improving these techniques; and extending these to better portray even more aspects of epidemics. In more detail, this could mean, first, conducting empirical experiments to better understand how efficient the outlined cartographic techniques are and which prior experience or expertise is necessary for reading the map. The comparison of two related semi-disks included in the maps can create the illusion that two such numbers could be visually compared in absolute terms (this is impossible because these numbers are scaled differently), but such comparison can also open up the possibility to make relative comparisons in terms of the difference between the ratio of these numbers across the diagrams. Such opportunities and limitations need to be examined in more detail. As there are several techniques available for communicating the same aspects, there is a need to choose among them. One might, accordingly, research which of these choices are most appropriate and in which context, as well as the parameters on which this decision depends. The results might be able to answer the question of whether there is, during the course of an epidemic, a change in which of the respective techniques are particularly suitable.
Secondly, the outlined techniques are able to handle numbers that extend over more than two orders of magnitude, which is important in the context of the strong local variation in the number of infections and deaths. Future research might push these boundaries even further to allow using the same scaling for the number of deaths and infections (even though these often differ by about two orders of magnitude). In the case of the enclosed maps, this issue has been mitigated by using different scalings for each map sheet. This can, however, confuse the reader. In addition, one might explore how to include absolute numbers into the 'temporal strips', which have been discussed in this article.
Thirdly, techniques can be explored to represent and convey aspects of an epidemic that go beyond what has been discussed here. Traditional cartographic means are restricted in the way they convey information. Compared to these, interactive means from the field of Information Visualization often make it possible to display further details on demand, and they often link several maps and diagrams. If such means were combined with the techniques discussed here, it would be possible to display not only combinations of two of the three factors (space, time, and numbers), but also all three of these together. This could lead to an even more integrated understanding of epidemics. Even aspects beyond the spread of the disease can be included to a larger degree, for instance, information about the effect it has on the various geographical aspects of our lives and society.

Software
The geographical data used, including data from naturalearthdata.com, were prepared using QGIS 3.10 and exported as PDF files. The visualizations and diagrams were, for the most part, created using Vega Lite 4.10.2. In case Vega Lite did not offer sufficient possibilities to implement a visualization as intended, the visualization was created by the library svgwrite 1.3.2 under Python 3.7.7. The latter applies, in particular, to the diagrams shown in the map. The symbols used for airports and ports are part of Font Awesome 5.13.0. The maps and visualizations created this way, as well as all other materials, were formatted and arranged using Affinity Publisher 1.8.3.

Note
1. As is usual for epidemics and pandemics, the number of confirmed cases is known and communicated, while a large number of infections go unnoticed.