Connective data: Markov chain models and the datafication of cervical cancer and HPV vaccination in Colombia

ABSTRACT This article explores the role of statistical modeling in the production of sound epidemiological objects in a context in which data are perceived as fragile and precarious. It analyzes the use of Markov chain modeling in the evaluation of the cost-effectiveness of HPV vaccines in Colombia. Markov chain modeling plays an important role in producing a "national" cohort of women on which it is possible to test the "virtual" effectiveness of HPV vaccines, transforming information that comes from the international literature into national and local data. This modeling device plays a key role in the datafication of cervical cancer and HPV vaccination. The datafication of disease and populations makes it possible to connect the calculative domains of epidemiology and economic valuation. Drawing on studies of data production, statistics and modeling, I examine the role of statistics and data dynamics in the making of HPV vaccination as a public health matter. This paper offers a juxtaposition of two analytical domains entangled by the Markov chain. On the one hand, it presents an analysis of a set of technoscientific practices, showing where and how a statistical enumerated entity that purports to represent cervical cancer becomes a socio-politically agential entity in the institutional life of the State. On the other, the paper discusses how social difference is left out of the model. The exclusion of social difference from data production is coherent with the socio-political workings of Colombian healthcare and vaccination policy.

Connective data: Markov chain models and the datafication of cervical cancer and HPV vaccination in Colombia. SUMMARY This article explores the role of statistical modeling in the production of coherent epidemiological objects in a context in which data are perceived as fragile and precarious. It analyzes the use of Markov chain models in cost-effectiveness studies of HPV vaccines in Colombia. The Markov chain model plays a very important role in producing a "national cohort" of women on which it is possible to run a "virtual" test of the vaccines' effectiveness, transforming information from the international literature into national and local data. The modeling exercise is key to the datafication of cervical cancer and HPV vaccination. In dialogue with studies of data production, statistics and modeling, I examine the role of statistics and other data dynamics in the making of HPV vaccination as a public health matter. This article offers a juxtaposition of two analytical fields entangled in the Markov chain. On the one hand, it presents an analysis of a set of technoscientific practices, showing where and how a statistical entity that purports to represent cervical cancer becomes a socio-political agential entity in the institutional life of the State. On the other, it discusses how social difference is left out of the model. The exclusion of this difference from data production is coherent with the socio-political workings of healthcare in Colombia and its vaccination policy.

Introduction
HPV vaccination in Colombia is a controversial issue. After an outbreak of adverse effects in the town of Carmen de Bolivar in 2014, HPV vaccination became a matter of contestation between public health authorities, who support the vaccine as safe and effective, and thousands of parents who have lost trust in medical authorities and in the vaccine. The drop in vaccination rates has affected both the political legitimacy of this intervention and its cost-effectiveness. This paper discusses an invisible element that played an important role in connecting these realms, now in dispute, in policymaking: the Markov chain model of the natural history of HPV in a cohort of Colombian women.
Algorithms built on Markov models have been extensively used in health economics to estimate the monetary and health costs of a disease as it develops, allowing health economists to compare the impact of different health technologies at different stages of the disease. The algorithm is often portrayed as a "virtual" clinical trial that produces a different scenario for each technology and procedure being compared. This kind of simulation has been introduced into the frameworks of evidence as an alternative to randomized controlled trials (RCTs) in cases in which ethical, political and legal restrictions limit their design and application. A Markov chain can be understood as an interface between the epidemiological and economic valuation of healthcare: through this calculative device, diseases, costs and human lives are entangled and expressed in health currencies.
In cost-effectiveness analysis, the algorithm is used to simulate the health outcomes of diseases whose evolution involves several stages. It has been extensively used to compare HPV vaccines and cervical screening programmes in order to calculate the cost-effectiveness of these technologies (Viscondi et al. 2018). Such simulation relies on a set of assumptions about population composition, HPV infection prevalence, cervical screening coverage and vaccine effectiveness, which are made explicit in the model design.
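Cost-effectiveness comparisons of this kind are commonly summarized as an incremental cost-effectiveness ratio (ICER): the additional cost of a strategy divided by the additional health effect it yields relative to a comparator. The sketch below is illustrative only. The strategy names, costs and effects are hypothetical placeholders, not figures from the Colombian studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StrategyResult:
    """Aggregate output of one simulated strategy (hypothetical numbers only)."""
    name: str
    cost: float    # total cost, in an unspecified currency unit
    effect: float  # total health effect, e.g. life-years gained


def icer(new: StrategyResult, comparator: StrategyResult) -> float:
    """Incremental cost per additional unit of health effect."""
    delta_effect = new.effect - comparator.effect
    if delta_effect == 0:
        raise ValueError("strategies have identical effects; ICER is undefined")
    return (new.cost - comparator.cost) / delta_effect


screening_only = StrategyResult("screening only", cost=1_000_000.0, effect=5_000.0)
vaccine_plus_screening = StrategyResult("vaccine + screening", cost=1_600_000.0, effect=5_200.0)
print(icer(vaccine_plus_screening, screening_only))  # prints 3000.0
```

In an actual evaluation, each strategy's cost and effect would come from running the Markov model once per arm; the ratio is then compared with a willingness-to-pay threshold.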
This analysis approaches Markov chains as a key device in calculating the cost-effectiveness of HPV vaccines and in estimating the burden of disease of cervical cancer and genital warts in Colombia. Cervical cancer is strongly associated with persistent, untreated infection by specific types of human papillomavirus (HPV). There are currently two vaccines that protect against the HPV types associated with 70% of cervical cancers. The technical studies developed for the introduction of HPV vaccines justified vaccination as the right intervention on the basis of the population's national epidemiological profile and the burden of the disease in the country. After three years of debate about their cost-effectiveness, in 2012 the Colombian government introduced Gardasil ® (Merck's HPV vaccine) into the Colombian Expanded Programme of Immunization.
Since that year, three million girls have received the vaccine in Colombia. The National Committee of Immunization Practices (NCIP), on behalf of the Ministry of Health, approved the introduction of Gardasil into the Expanded Programme of Immunization based on cost-effectiveness analyses developed by the National University. Markov chains are the key statistical device for producing the numbers required for policy discussion.
Markov chain simulation had an important role in assembling a specific "national" portrayal of cervical cancer dynamics. This assemblage involves data from different locations that can hardly be considered as representative of the "Colombian female population." Colombia is a country of highly diverse regions in terms of ethnicity and race. Such variability has consequences in the definition of a "Colombian" epidemiological profile.
The cervical cancer "suffered in a population" is ontologically distinct from the cervical cancer suffered by the individuals who make up a people. Modeling assumptions differ from the representations of population, vaccine and disease that circulate in political and public discourses about the development of cancer, the vaccines' effectiveness and the characteristics of prevention. The Markov algorithm makes explicit the uncertainty and complexity of the relation between HPV and cervical cancer, rendering HPV visible as a necessary but not sufficient cause of cervical cancer. In contrast, media, public and campaign narratives about cervical cancer and HPV vaccines present a causal, unidirectional process between disease and prevention.
If one lays out the assumptions on which the algorithm is based, a story emerges that names the particular contingencies, uncertainties and complexities of HPV vaccination. Conceptual relations are worked out in a Markov chain process whereby particular sets of categories familiar in epidemiology and health economics become embedded in chains of relations with further categories. Setting these down as a processual sequence draws a story out of the internal conceptual design of "a particular Markov chain population": a statistical enumerated entity.
Drawing on a textual analysis of technical reports and medical literature; interviews with epidemiologists, medical statisticians and members of the National Committee of Immunization Practices; and participant observation in medical statistics training, I reconstruct the use of Markov algorithms in defining HPV vaccines as a cost-effective and epidemiologically pertinent public health strategy in Colombia. Cervical cancer modeling, I argue, has meaningful consequences in the production of new entanglements between individual and social histories of the disease. In many ways this exercise can be understood as reverse engineering. I have traced papers and documents that are cited as references to support particular claims and data, and I have re-enacted some calculations in order to understand the origin and use of some of these results. In classic STS terms, I have been opening the black box of technical decision making. Within this set of documents, institutions and collective agents, one technical committee, with its minutes and memoranda, has demanded special attention: the National Committee of Immunization Practices (NCIP), which is responsible for assessing the findings of the technical studies produced by the Universidad Nacional. Its deliberations are transcribed in minutes, which become the official voice of the committee, enacting it as an independent entity. These minutes are the basis for the official memoranda read by the Ministry of Health and Congress.
Regarding the analysis of algorithms and models, I have understood them as material-semiotic entities developed in what could be described as a three-step process:
First the relevant entities are sorted out, detached, and displayed within a single space. Note that the space may come in a wide variety of forms or shapes: a sheet of paper, a spreadsheet, a supermarket shelf, or a court of law, all of these and many more are possibilities. Second, those entities are manipulated and transformed. Relations are created between them, again in a range of forms and shapes: movements up and down lines; from one place to another; scrolling; pushing a trolley, summing up the evidence. And third a result is extracted. A new entity is produced. A ranking, a sum, a decision. A judgment. A calculation. And this new entity corresponds precisely to, is nothing other than, the relations and manipulations that have been performed along the way. (Callon and Law 2005, 719)
In principle, the natural history of cervical cancer describes and predicts the development of this malady through different stages in a "typical" individual; these are changes that happen in an individual body. However, through epidemiological modeling this dynamic is extended to populations using cohorts. A Markov chain simulation based on cohorts allows epidemiologists to produce material and semiotic connections between individual bodies and the social body, the population.
Markov chain algorithms, and the calculation practices of epidemiology in general, tame data, making their context of production invisible. Living bodies are translated into classification categories and numbers (Adams 2016; Moreira 2012). These data are disentangled from the differences (intersections of race, gender and class) that shape them. This paper reflects on two analytical domains entangled by the Markov chain. First, it analyzes the technoscientific practices of modeling a statistical enumerated entity that purports to represent cervical cancer, showing how it becomes a socio-politically agential entity in policy. I describe the ways in which Markov chain simulation, as a calculative device, produces "ideal" cohorts of populations: cohorts that are diverse in epidemiological and statistical senses but socially undifferentiated. Second, the paper shows how the Markov modeling of cervical cancer operates as an embodied and experiential scientific entity, intervening as an element in the socio-political workings of HPV vaccination in Colombia. I show how medical science and its known objects work as an element in the political culture of the nation.

Algorithms, models, statistical practices and governance
As different authors have noted, calculation and numerical operations play a key role in contemporary governance, promising objective, unbiased and reliable information for rational and trustworthy policymaking (Hacking 1975; Porter 1995; Ashmore, Mulkay, and Pinch 1989; Espeland and Stevens 2009; Mackenzie 1981). These works argue that numbers and quantification have had a deep impact on modern politics. Figures and quantified expressions constitute a powerful rhetorical resource in political debates and in the enactment of policies (Jasanoff 2003; Keating and Cambrosio 2012). One of the most important transformations of contemporary policymaking has been the integration of cost-benefit and cost-effectiveness analysis into public decision-making as a substitute for purely ad hoc decision-making (Porter 1995, 222). Porter notes that such practices have had a great impact not only because of their rationality but also because of their impersonality (Porter 1995, 227). The increasing use of calculation and numerical methods in social and policy arenas is related to their capacity to reduce potential controversies in political processes by translating these issues onto a numerical scale.
STS has a well-established tradition of analyzing statistics and scientific visualization. Some works have highlighted the role of visual objects such as graphs, charts, models, maps and other representations, as end products, in making data meaningful and in enabling particular modes of understanding and seeing data (Latour 1987; Carusi 2012; Dumit 2012; Lynch and Woolgar 1990). These works have shown how scientific visual objects privilege specific forms of seeing and interacting with the world (Haraway 1997).
Recent works have critically approached the practices and performativity of data, information, and the statistics behind such visualizations. Some scholars have explored the relations and challenges that shape data production (Edwards et al. 2011; Fujimura and Rajagopalan 2011; Gitelman 2013; McNally et al. 2012; Levin 2013). Others have focused on the organization, movement, use and reuse of data within databases in the production of trials (Bauer 2008; Hine 2006). Scholars such as Levin (2013) and Coopmans (2011), for instance, have focused on the social practices of producing and using statistics, particularly the ways in which "data and statistics are practiced, performed, and negotiated in everyday settings" (Levin 2013).
Other work has approached the ways in which statistics and models are practiced, contested and re-enacted outside the laboratory, particularly in policy and decision-making scenarios (Mansnerus 2013; Mackenzie 2013; Verran 2012a; 2012b). Some works have studied the role of modeling and simulation in epidemiological alerts and in healthcare governance (Bauer 2008; Mansnerus 2013; Mackenzie 2013; 2014). Statistical models have allowed healthcare managers to calculate risks and to devise detailed descriptions of future scenarios. These scenarios interfere with current healthcare management practices and with the distribution of economic and symbolic resources. Decisions about procedures and actual interventions are made on the basis of calculations about their sustainability and their impact on future budgets.
On the other hand, models have provided a way of synthesizing, extrapolating and harmonizing data from different sources, producing coherent and sound versions of medical and policy objects (Bauer 2008). Simulations have rendered computers "as substitutes for events in the world, and they render that world more manipulable by knowing subjects" (Mackenzie 2013, 14). Adrian Mackenzie (2013, 2014) has noted that Bayesian statistics and simulation techniques such as Markov chain Monte Carlo are transforming the understanding of probability and therefore our representation of uncertainty, chance and change. "Certain shifts in the role played by probability change the meaning and value of data as such, and hence, everything that depends on data" (Mackenzie 2013, 20).
In the same fashion, Verran provides an account of the transformation of numbers in her analysis of how evidence-based policy repertoires have transformed the calculations used to define environmental policies in Australia (Verran 2012a, 112). Consultants working within an evidence framework generate calculations and indexical numbers that are used in policy intervention; however, the liveliness (indexicality) of those numerical entities is not kept "alive." Measurements and values become iconic in favor of calculation based on efficiency and customer-tailored solutions. "Those indexes that had ephemeral life as the consultants hurried about doing this and that, inventing categories and assembling values, come to be visible as icons, but in this glare of visibility, the enumerated entities are rendered lifeless" (Verran 2012b, 68).
This article aims to show the "liveliness" of statistical modeling and the ways in which these entities interact with the practices and decisions of the National Committee of Immunization Practices. This study approaches Markov chain modeling of cervical cancer as a calculative device and understands numbers and quantified expressions as material-semiotic objects (Verran 2012a, 2012b). This approach pays attention to the material and discursive practices in which data emerge (Coopmans 2011; Levin 2013) and to the role of statistical objects in the production of qualities, capacities and values in policy. In this case the Markov chain algorithm produces connections between realms that are in conflict from the perspective of public decision making, such as cost-effectiveness and healthcare provision, or international evidence and national specificity. A Markov chain is a device, a number generator, that makes it possible to establish new connections between biomedical objects and economics in health policy. A focus on devices provides a methodological strategy to reconstruct data and statistical practices. This analysis aims to contribute to the STS literature on datafication and quantification by exploring the use of these calculative devices in global health issues in the Global South, and by drawing attention to the devices that make possible the connections between healthcare and economics.

Studying calculation in practices, looking at devices in health policy
The Markov chain model is a probabilistic method often used to describe the ways in which objects and systems change over time. It represents random processes that evolve over time, known in statistics as stochastic processes (Bonacich and Lu 2012, 149). This kind of algorithm has had an important impact on medical decision analysis. Markov chains have been extensively used to simulate the development and progression of chronic diseases. Medical statisticians consider that Markov chains suit the dynamics of diseases that progress through states. "The disease in question is divided into distinct states and transition probabilities are assigned for movement between these states over a discrete time period known as the Markov cycle" (Briggs and Sculpher 1998, 399).
Although Markov chains have a long history as a probabilistic technique, it was in the 1980s that they became a tool integrated into medical decision making. The increasing use of the technique is linked to the rise of information technologies in healthcare institutions, the digitalization of clinical and administrative data, and the development of software that has facilitated statistical and mathematical operations (Sonnenberg and Beck 1993, 222). The Markov chain model allows medical and epidemiological entities to be recreated as processes in which an "agent" (patient, population) is always in one of a finite number of possible states, called Markov states. A utility is assigned to each state; the contribution of this utility to the complete process depends on the length of time spent in that state (Sonnenberg and Beck 1993, 224). Movement between states is governed by transition rates or probabilities, which are not the same thing: a rate is "an instantaneous likelihood of transition at any point in time, whereas a probability is the proportion of a population at risk that makes a transition over a specific period of time" (Briggs and Sculpher 1998, 402). Markov chains can be simulated with at least two strategies: cohort and Monte Carlo. Both strategies run the model through many cycles to define a "profile" of how many patients are in each state of the model over time (Briggs and Sculpher 1998, 402).
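The distinction between rates and probabilities matters in practice because transition inputs taken from the literature often have to be converted to the model's cycle length. Under the common assumption of a constant rate r, the probability of a transition within a cycle of length t is p = 1 - exp(-r t). The helper names below are hypothetical; the sketch shows this standard conversion, not the procedure used in any specific report.

```python
import math


def rate_to_probability(rate: float, cycle_length: float = 1.0) -> float:
    """Probability of a transition within one cycle, assuming a constant rate."""
    if rate < 0 or cycle_length <= 0:
        raise ValueError("rate must be >= 0 and cycle_length > 0")
    return 1.0 - math.exp(-rate * cycle_length)


def probability_to_rate(probability: float, period: float = 1.0) -> float:
    """Constant rate implied by a probability observed over a given period."""
    if not 0.0 <= probability < 1.0 or period <= 0:
        raise ValueError("probability must be in [0, 1) and period > 0")
    return -math.log(1.0 - probability) / period


# Hypothetical example: a 10% annual transition probability re-expressed
# for a monthly cycle (cycle length measured in years).
annual_rate = probability_to_rate(0.10)
monthly_p = rate_to_probability(annual_rate, cycle_length=1 / 12)
print(round(monthly_p, 4))  # about 0.0087, slightly above the naive 0.10 / 12
```

Simply dividing an annual probability by twelve would understate the monthly probability, which is why models convert through the rate.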
In a cohort simulation, populations are the units of analysis. A cohort of individuals moves through the different states of the model following a set of transition probabilities. The objective of this algorithm is to determine exactly what proportion of the cohort is in which state at a given time (or model cycle). In a Monte Carlo simulation, by contrast, rather than simulating a whole cohort of patients moving through the model together, a large number of individual patients are followed through the model one at a time. Both cohort and Monte Carlo simulations use the same transition probabilities. However, as Briggs and Sculpher note, the main difference between these methods is that in the Monte Carlo simulation "an individual patient can only be in 1 stage at a given time, they may or not transit between stages in any given cycle" (Briggs and Sculpher 1998, 407).
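A toy comparison of the two strategies may help. It assumes a hypothetical three-state model (well, ill, dead) with made-up per-cycle transition probabilities. The cohort simulation pushes proportions through the transition matrix, while the Monte Carlo simulation follows individual patients, each occupying exactly one state per cycle. With enough simulated individuals, the Monte Carlo proportions approach the cohort trace.

```python
import random

STATES = ["well", "ill", "dead"]
# Hypothetical per-cycle transition probabilities; each row sums to 1.
P = [
    [0.90, 0.08, 0.02],  # from well
    [0.10, 0.80, 0.10],  # from ill
    [0.00, 0.00, 1.00],  # dead is absorbing
]


def cohort_step(dist):
    """Move the whole cohort one cycle: new[j] = sum_i dist[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def cohort_trace(n_cycles, start=(1.0, 0.0, 0.0)):
    """Proportion of the cohort in each state at every cycle (the Markov trace)."""
    trace = [list(start)]
    for _ in range(n_cycles):
        trace.append(cohort_step(trace[-1]))
    return trace


def monte_carlo(n_patients, n_cycles, seed=0):
    """Follow patients one by one; each is in exactly one state per cycle."""
    rng = random.Random(seed)
    counts = [0] * len(STATES)
    for _ in range(n_patients):
        state = 0  # every patient starts 'well'
        for _ in range(n_cycles):
            state = rng.choices(range(len(STATES)), weights=P[state])[0]
        counts[state] += 1
    return [c / n_patients for c in counts]


cohort_final = cohort_trace(10)[-1]
mc_final = monte_carlo(20_000, 10)
print("cohort:     ", [round(x, 3) for x in cohort_final])
print("monte carlo:", [round(x, 3) for x in mc_final])
```

The cohort version gives exact expected proportions in one pass. The Monte Carlo version produces individual trajectories, at the cost of sampling noise that shrinks as the number of simulated patients grows.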
As Mackenzie has noted, cohort and Monte Carlo simulations enact different epidemiological objects, and these objects are embedded in political narratives about disease, risk and population control. Cohort simulations are used to produce demographic units that generally correspond to political classifications such as cities, departments and countries (Bauer 2008). In contrast, Markov chain Monte Carlo (MCMC) is related to the post-demographic power attributed to data by information experts, politicians and policymakers, in which individuals rather than populations or subpopulations are the main objects of interest (Mackenzie 2013, 2).
In this analysis the concept "device" occupies a central place in the description of orderings, practices and transformation of entities that algorithms make possible. Singleton and Law (2013) have pointed out that the use of "device" in STS has been widely metaphorical, describing machine-like entities that perform specific transformations in other entities. They argue that although many devices are in fact artifacts, not all the devices are necessarily machine-like. For them, a device, "can be understood as a set of implicit and explicit strategies that work more or less repetitively to order, sort, define and arrange a heterogeneous but relatively discrete social and material field" (Singleton and Law 2013, 260). Muniesa, Millo, and Callon (2007) define devices as objects with agency that can help (minimalist version) or force (maximalist) the production of particular arrangements. Devices act or make others act. The distributed agency of these arrangements is the result of the interactions and relations that emerge in the encounter of these entities (Muniesa, Millo, and Callon 2007, 3).
The study of this calculative device is an exercise in reverse engineering. This analysis required following the production and circulation of information between documents, tracking the transformation of data and numbers, their disentanglement from the calculation spaces in which they were produced, and their re-entanglement in new texts by different actors. The Markov chain facilitates connections between natural history, epidemiology and health economics to produce relatively sound objects from data perceived as precarious. I have traced papers and documents that are cited as references to support the model's parameters, and I have re-enacted some of these calculations in order to understand the origins and uses of these results. Additionally, I conducted interviews with the epidemiologist who designed the model and with the members of the National Committee of Immunization Practices who used the modeled scenarios to make decisions about the introduction of the vaccine.
In what follows I describe two processes. First, I show how the Markov chain algorithm entangles the natural history of cervical cancer with the production of nationally situated populations, transforming information from the international literature into national and local data. Such data make HPV vaccines the right intervention for the local and national epidemiological problems that policy purports to tackle. Second, the final section shifts from storying the making of population through a Markov chain process to storying the political discourse by which publics are effected. I describe the political process in which the vaccination programme was developed and the position of modeled entities in policymaking scenarios.

Markov chain modeling of cervical cancer: virtual trials and the taming of clinical complexity
The National University, using a Markov model, simulated the natural history of cervical cancer and genital warts in a hypothetical cohort of 430,859 Colombian women. These 430,859 virtual women were generated by the algorithm. The parameters were defined from a literature review of national and international studies (UNAL 2011). The Markov chain fits the ways in which medical science has traditionally represented the development of cervical cancer. This malady has been understood as a disease whose stages are clearly identified. Precancerous changes in cervical tissue are graded as cervical intraepithelial neoplasia (CIN) according to their severity, from CIN I, related to minor damage, to CIN III, beyond which lie invasive cancer and, ultimately, metastasis (Benedet and Pettersson 2003, 1). These different stages define the Markov states of the model.
In the technical reports (UNAL 2009; UNAL 2011) the model is represented through a flow diagram that describes the transitions between states and the different paths that the cohort may follow. The model recreates in silico the dynamics that a cohort would follow in vivo during a clinical trial. The National University team developed two cost-effectiveness studies, both based on Markov simulations. In those studies, Markov chain algorithms can be understood as a systematic review in movement. Their parameters are selected from an evaluation of the technical and scientific literature. Such a review does not necessarily meet the standards of evidence-based medicine for systematic reviews, but in principle it is assumed to follow the "best" available evidence. Papers contribute data about the demographic composition of the simulated cohort; they also provide information about the incidence and prevalence of HPV infection, cervical lesions, genital warts and cancer. Finally, the literature contributes information about the probabilities of transition between states as the disease develops or heals. These probabilities connect the data and rates defined as parameters and make the model dynamic. Figure 1 is a graphical representation of the Markov model used by the Universidad Nacional in its cost-effectiveness studies. This diagram is organized in three levels. First, at the top is the sequence of possible states in which the individuals of the cohort can be: normality, contagion without lesion, reversible lesion, pre-cancer, cancer, and healing/death. Below this level, the different stages of development of the diseases are rendered. For instance, infection with low-risk HPV types can lead to genital warts, whilst infection with high-risk HPV types leads down a path in which cancer is a possible end point. Death is a possibility at every stage: in low-risk stages the probability of death is the same as that of the general population, whereas in higher-risk stages it increases. The third level, at the bottom of the diagram, encompasses the different stages of detection and treatment and their effect on the transits of the simulated cohort along the possible paths.
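To make the flow-diagram logic concrete, here is a deliberately simplified sketch of a cohort "virtual trial": two arms of the same hypothetical model that differ only in the probability of HPV infection. Apart from the cohort size of 430,859 reported for the UNAL model, every parameter is a placeholder assumption, not a UNAL input. The genital warts, healing, screening and treatment branches of the actual diagram are omitted.

```python
STATES = ["normal", "infection", "lesion", "precancer", "cancer", "dead"]
BACKGROUND_DEATH = 0.005  # hypothetical general-population annual mortality
CANCER_DEATH = 0.10       # hypothetical annual mortality once in 'cancer'
VACCINE_EFFICACY = 0.6    # hypothetical relative reduction in infection
BASE_INFECTION = 0.10     # hypothetical annual infection probability, unvaccinated
COHORT_SIZE = 430_859     # cohort size reported for the UNAL model
N_CYCLES = 30             # one-year cycles (assumption)


def transition_matrix(infection_prob):
    """Return {from_state: {to_state: prob}}; leftover probability stays put."""
    d = BACKGROUND_DEATH
    exits = {
        "normal": {"infection": infection_prob, "dead": d},
        "infection": {"normal": 0.50, "lesion": 0.10, "dead": d},
        "lesion": {"normal": 0.30, "precancer": 0.05, "dead": d},  # reversible
        "precancer": {"lesion": 0.10, "cancer": 0.08, "dead": d},
        "cancer": {"dead": CANCER_DEATH},
        "dead": {},  # absorbing state
    }
    matrix = {}
    for state, outs in exits.items():
        stay = 1.0 - sum(outs.values())
        if stay < 0:
            raise ValueError(f"exit probabilities from {state!r} exceed 1")
        matrix[state] = {**outs, state: stay}
    return matrix


def run_arm(matrix):
    """Cohort simulation: move expected counts between states each cycle."""
    counts = {s: 0.0 for s in STATES}
    counts["normal"] = float(COHORT_SIZE)
    incident_cancers = 0.0
    for _ in range(N_CYCLES):
        incident_cancers += counts["precancer"] * matrix["precancer"]["cancer"]
        nxt = {s: 0.0 for s in STATES}
        for src, n in counts.items():
            for dst, p in matrix[src].items():
                nxt[dst] += n * p
        counts = nxt
    return counts, incident_cancers


unvaccinated, cancers_unvax = run_arm(transition_matrix(BASE_INFECTION))
vaccinated, cancers_vax = run_arm(
    transition_matrix(BASE_INFECTION * (1 - VACCINE_EFFICACY)))
print(f"cancers averted over {N_CYCLES} cycles: {cancers_unvax - cancers_vax:,.0f}")
```

Every quantity the sketch produces, including the "cancers averted," is a consequence of the assumed parameters. This is the sense in which the virtual cohort is assembled from, and bounded by, its inputs.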
Epidemiological data from biobanks, registries and cohort studies are constantly reused and reassembled to construct new objects. Despite the diversity of sources that support the model, these papers are the result of a small number of studies and share the same socio-material origins. For instance, the papers about HPV and cervical cancer in Colombia are the result of the Bogotá Cohort Study carried out between 1997 and 2007; the articles on the bivalent vaccine come from the PATRICIA clinical trial (GSK), and those on the tetravalent vaccine from the FUTURE RCT (Merck).
Many of these studies share the same funding sources. Merck and GlaxoSmithKline have funded many of the local studies used in the model as part of their strategy to recruit local populations and experts for the large international clinical trials required for vaccine licensing. According to the epidemiologists involved in the model, the same women who participated in the Bogotá Cohort Study were enrolled in phases III and IV of FUTURE. The populations recruited have developed a close relationship with the researchers, who are perceived as healthcare providers. At the same time, the researchers have used their connection with these groups as an asset in their negotiations with pharmaceutical companies over funding and participation in larger studies and trials.
This heterogeneous assemblage of papers, figures and institutions has an important role in the representation of populations and in their governance. A detailed reading of the assumptions of the model shows the ways in which it makes different accounts and distributions of multiplicity/homogeneity, visible/invisible risk and inclusion/exclusion. The model is intended to recreate the development of the disease according to the specificities of the Colombian epidemiological profile. Such specificity relies on the origin of the input data, the location of data and other technical sources. As noted in the 2011 study: "The parameters of incidence, prevalence and mortality by cervical cancer were defined through a review of Colombian literature, governmental and clinical databases, such as DANE, Cali Cancer Demographic Register, Colombian National Cancer Institute and IARC" (UNAL 2011, 25).
Nevertheless, when the model is assembled, the limitations of gathering and using "Colombian" data as the main source of evidence become clear. Those who design the model identify problems and "bias" in the recording of information (Adams 2016). These issues are solved through the use of other calculative devices such as correction methods. For instance, mortality databases were assessed and corrected using the Bennett-Horiuchi method: "In order to correct bias by wrong classification of deaths by cervical cancer this method was used: Deaths by cervical cancer = deaths registered by cervical cancer + α × deaths by cervical cancer not specified + β × deaths by corpus uteri cancer; α = 0.9 and β = 0.3" (UNAL 2011, 25).
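The reallocation quoted above can be written as a one-line function. The coefficients α = 0.9 and β = 0.3 come from the quoted report. The death counts in the usage line are hypothetical, and the parameter names are illustrative labels for the quote's three categories.

```python
ALPHA = 0.9  # coefficient quoted from UNAL (2011, 25)
BETA = 0.3   # coefficient quoted from UNAL (2011, 25)


def corrected_cervical_deaths(registered: float, unspecified: float,
                              corpus_uteri: float,
                              alpha: float = ALPHA, beta: float = BETA) -> float:
    """Registered cervical cancer deaths plus reallocated shares of other categories.

    registered:   deaths registered as cervical cancer
    unspecified:  deaths the report describes as 'cervical cancer not specified'
    corpus_uteri: deaths registered as corpus uteri cancer
    """
    return registered + alpha * unspecified + beta * corpus_uteri


# Hypothetical counts for one year, for illustration only.
print(corrected_cervical_deaths(registered=1000, unspecified=200, corpus_uteri=100))
```

Written this way, the correction is visibly a weighted sum: the corrected count depends directly on the two coefficients chosen by the modelers.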
However, the main limitation is the lack of data. These reports (UNAL 2009; 2011) note the lack of national information about the prevalence of cervical cancer and HPV infection. The "national epidemiological profile" is rendered as an extension of the data provided by the Cancer Register of Cali and the Bogotá Cohort Study. Even more importantly, in some cases there are no national data at all for the disease in the model. As one of the experts in charge of the algorithm notes:
In all these studies because of the lack of epidemiological data there is a long and careful process to construct scenarios of morbidity, mortality and loss of life years by disability (DALY) using local data but also international data of close countries. This is a delicate process because for some things there are no data. For example, in the HPV case, mortality and incidence data of cervical cancer are more or less robust but for other cancers are not. Even less regarding the role of HPV in other cancers and warts, there is no data about incidence, nor data about national prevalence. Therefore, one has to use data from other sources and make simulations to see what sounds reasonable. (Epidemiologist Sr. 1 NCIP)
The case of genital warts illustrates the tension, which the algorithms aim to resolve, between political interest in particular medical objects and the lack of data needed to assemble them numerically. The political and "clinical" interest in genital warts contrasts with the lack of studies and epidemiological data about their incidence, costs and treatment in Colombia. This problem is not exclusive to Colombia; information about the incidence, prevalence and treatment costs of genital warts is relatively scarce compared to cancers and other maladies associated with HPV infection.
The Markov chain becomes a calculative space in which statisticians and epidemiologists can render precarious data robust by stating their inherent uncertainty. The statement of such uncertainty becomes a strategy to enact data objectivity, sometimes in contradiction with its statistical meaning. Strategies such as sensitivity analysis and the choice of the most "conservative" rates and data about the interventions serve this function. Conservative in this case means the lowest estimate. As one of the experts who designed the epidemiological model notes: We found a German study that had very conservative estimations, even the reported incidence was very low compared with other countries, that was the source for the analysis of costs and with that study we did the simulation, and the sensitivity analysis (…) it was putting something, taking something else … Anyway genital warts have an effect and we had to find the ways of estimating it. We thought that such study (Hillemanns et al. 2008) really underestimates the effect of the economic burden produced by genital warts. In spite of such underestimation, those data have an important role in the cost-effectiveness. (Epidemiologist Sr. 1 NCIP) These limitations make explicit the precarious nature of modeling. The assemblage of a coherent and complete history of genital warts and cervical cancer in Colombia is only possible because heterogeneous elements are gathered and connected by uncertain relations expressed in terms of probability.
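The logic of picking the "conservative" (lowest) estimate can be illustrated with a one-way sensitivity analysis: a single input is varied across a plausible range while the others are held fixed, and the lowest value yields a lower bound on the benefit attributed to the vaccine. In the sketch below, the outcome function, cohort size, efficacy, cost per case and incidence values are all hypothetical placeholders; they are not the figures from Hillemanns et al. (2008) or from the UNAL model.

```python
def averted_wart_costs(incidence, cohort_size=100_000, efficacy=0.9,
                       cost_per_case=200.0):
    """Hypothetical genital-wart treatment costs averted in one year.

    incidence -- annual incidence (cases per woman per year)
    All default values are illustrative assumptions, not model inputs.
    """
    return cohort_size * incidence * efficacy * cost_per_case


def one_way_sensitivity(incidences):
    """Recompute the outcome for each candidate incidence, others fixed."""
    return {inc: averted_wart_costs(inc) for inc in incidences}


# Hypothetical incidence estimates, as if drawn from different studies.
results = one_way_sensitivity([0.001, 0.002, 0.004])
conservative = min(results)  # lowest incidence = "conservative" choice
print(conservative, results[conservative])
```

Because the outcome rises monotonically with incidence in this toy model, choosing the lowest estimate understates the averted burden, which is the sense in which the quoted expert calls the German data "conservative".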
The next sections of this article discuss, first, the objects produced by this model and, second, the problems and people that are excluded and rendered invisible by these practices. In what follows, I focus on the ways in which population, nation and risk are enacted in the Markov algorithm. At the same time, I explore the elements that are left out of the algorithm. I have presented the Markov chain as a machine for producing actual and future realities of population and disease. Accordingly, the exclusion of objects, people and entities from the analysis has important consequences, rendering other objects invisible or marginal.

Enacting the women of the nation: statistical reflexivity and homogenization
Whilst the model's parameters show an effort to differentiate viruses, stages and paths of development of the disease, women and female populations are homogenized. In contrast to the unified portrayal of HPV presented in vaccination campaigns and public arenas, the model makes explicit the diversity of HPV types and their consequences for the development of different disease pathways. The model produces a modest account of the incidence, the probability of contagion and the probability of developing cervical cancer from HPV infection. These parameters affect the ways in which the efficacy and protection of HPV vaccines are constructed in the model.
The incidence of HPV infections varies by type of virus and geographical location. For instance, HPV 16 and 18, the targets of the vaccine, account for just 12% of HPV infections in Sub-Saharan Africa, 19% in Asia, 20% in Latin America and 26% in Europe (Clifford et al. 2005, 4). This narrative contrasts strongly with the public discourses about HPV vaccines, in which the contagion of HPV is presented as undifferentiated and universal. Moreover, as medical anthropology has noted in the case of other pharmaceuticals (Petryna, Lakoff, and Kleinman 2006; Lakoff 2005), the diversity of HPV types and of infection incidence around the world suggests that, in the design and development of pharmaceutical interventions such as HPV vaccines, the priority is to meet the needs of "markets" in Europe and the U.S.; these interventions are later promoted as universal and pertinent for other populations and "markets" around the world.
Although the Markov model's account of the diversity of HPV is simplified compared to the version presented in journal articles, it still has the limitation of attributing a direct association between HPV infection and the potential development of cervical cancer. The algorithm distinguishes high-risk and low-risk HPV types (see Table 1).
The range of high-risk viruses is wide and contrasts with the focus of vaccine advertisements on HPV 16 and 18. Forty percent of the cervical cancer cases simulated in the model can be attributed to other high-risk types (HPV 31, 33, 52, 56, 58, 59). This account of the contingency and diversity involved in the design of the model and in the gathering of data can be called statistical reflexivity. Numbers and quantified expressions can show contingency; they are not exclusively elements for enacting objectivity. Another aspect that is not emphasized in public discourses about cervical cancer and HPV vaccines, but is relatively clear in the narrative of the Markov model, is that cervical lesions are a very rare consequence of HPV infection. Using a study of the Bogotá Cohort as its reference (Muñoz et al. 2004, 2078), the model reports an incidence rate of 5.0 per 100 for HPV 16, 1.0 for HPV 18 and 1.0 for HPV 31. Even if the infection in these hypothetical cases progresses to further stages, the individual risk of developing cervical cancer is low, because the probability of infection with a high-risk type must be compounded with the successive probabilities of transition to cervical intraepithelial neoplasia (CIN) and eventually to cervical cancer. Nevertheless, this is a model of big numbers: the algorithm is not based on individuals (as a Monte Carlo simulation would be) but on a cohort simulation. Accordingly, risk is enacted and distributed in terms of populations, not in terms of individual perceptions and gambles. The different calculations done through the Markov chain model constitute a detailed portrayal of a collective entity, the population, which is the enactment that best fits state governance practices (Mackenzie 2014, 189).
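The difference between a cohort simulation and an individual-based (Monte Carlo) model can be made concrete with a small Markov cohort sketch. Instead of drawing random life histories for individual women, the whole cohort is represented as a distribution of shares across health states, and each annual cycle redistributes those shares according to fixed transition probabilities. The states and all probabilities below are hypothetical and greatly simplified (a single HPV state, no age dependence); they are not the parameters of the UNAL model or of Table 1.

```python
STATES = ["healthy", "hpv", "cin", "cancer", "dead"]

# Hypothetical annual transition probabilities; each row sums to 1.
# Illustrative values only, NOT the parameters of the UNAL (2011) model.
TRANSITIONS = {
    "healthy": {"healthy": 0.94, "hpv": 0.05, "dead": 0.01},
    "hpv": {"healthy": 0.50, "hpv": 0.44, "cin": 0.05, "dead": 0.01},
    "cin": {"hpv": 0.20, "cin": 0.74, "cancer": 0.05, "dead": 0.01},
    "cancer": {"cancer": 0.90, "dead": 0.10},
    "dead": {"dead": 1.0},
}


def step(distribution):
    """Advance the cohort one cycle by moving each state's share along its transitions."""
    new = {state: 0.0 for state in STATES}
    for source, share in distribution.items():
        for target, prob in TRANSITIONS[source].items():
            new[target] += share * prob
    return new


def simulate_cohort(cohort_size, cycles):
    """Deterministic cohort simulation: no random draws, unlike Monte Carlo.

    Returns the final distribution of shares and the expected number of new
    cancer cases accumulated over all cycles in a cohort of this size.
    """
    distribution = {state: 0.0 for state in STATES}
    distribution["healthy"] = 1.0  # everyone starts uninfected
    new_cancer_share = 0.0
    for _ in range(cycles):
        # Share of the cohort flowing from CIN into cancer during this cycle.
        new_cancer_share += distribution["cin"] * TRANSITIONS["cin"]["cancer"]
        distribution = step(distribution)
    return distribution, new_cancer_share * cohort_size


final, expected_cancers = simulate_cohort(cohort_size=100_000, cycles=30)
print(f"per-woman cumulative risk: {expected_cancers / 100_000:.4f}")
print(f"expected cancers in the cohort: {expected_cancers:.0f}")
```

Under these made-up numbers the cumulative risk for any one woman stays a small fraction, yet once it is applied to a cohort of 100,000 it becomes a non-trivial count of expected cases: risk is computed for the population, not for an individual.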
Although the probabilities of transition from HPV infection to cervical lesions and cervical cancer are very small, they become important in terms of population and from an economic perspective. The results of the modeling, expressed in deaths avoided and Disability-Adjusted Life Years (DALYs), have important consequences for decision-making, and they are the kind of information that travels from the model to public arenas as evidence. In this case, because of the aggregation of small probabilities and effects, small matters have important consequences in the constitution of the population as an object of governance.
In this framework, individual risk is reshaped and redistributed from the perspective of the population. Risk is understood not as a matter of individual chances but as an issue that affects society as a whole, and it therefore represents a threat to individual safety because the individual is part of the group. The population becomes the framework for understanding the individual perception of risk. Despite this, recent work on epidemiological modeling has noted that one of the main limitations of Markov chain models of HPV infection and vaccination is that they do not evaluate the impact of herd immunity, that is, the indirect protection of susceptible individuals conferred by a significant proportion of immune individuals in the population (Viscondi et al. 2018).
On the other hand, not all risks are rendered visible. Other risks remain absent from the algorithm and from the experts' and public narratives. Possibly the most dramatic case is that of adverse effects from HPV vaccination. Although this is a contested issue and a matter of concern addressed by different publics, it is completely absent from the algorithm and from the discussion about the cost-effectiveness of HPV vaccines in Colombia. Whilst low probabilities and small numbers of disease progression are rendered visible, the probabilities of adverse effects are not taken into consideration in the simulated cohort. The intended invisibility of risks is a recurrent problem in the design of trials in the South (Sunder Rajan 2007; 2017), an issue that became particularly visible in the controversy about deaths associated with HPV vaccine trials in India (Mattheij, Pollock, and Brhlikova 2012; Sunder Rajan 2017).
In that regard, analysts and health authorities have argued that adverse effects have such small probabilities that everyday risks represent a bigger threat. For them, any numerical reference to adverse effects could undermine trust in the vaccine. Paradoxically, legal and ethical requirements such as informed consent assume that adolescents and parents understand the risks and benefits of vaccination and can weigh one against the other. Discussion of adverse effects seems proscribed in technical and policymaking arenas, and it is even statistically and numerically absent.

Exclusion: calculation and difference in the policy arena
In the final section of this paper I would like to introduce some of the wider political processes in which HPV vaccination and its models are entangled. This section is concerned with those elements that have materially contributed to the production of data but are left out of the models and their political discussion. This discussion aims to address the social diversity of the women involved in the main studies about HPV infection and cervical cancer developed in the country, noting how such difference and diversity are erased in the production of data.
As I have shown, the objective of the algorithm was to produce a consistent and trustworthy image of the population of Colombian women. Modeling implies simplification. The messiness, heterogeneity and diversity of the individuals constituting the entity known as the population are organized into categories and parameters. In this process of homogenization, some features are rendered visible and constitute the elements that define the "identity" of such an entity (Epstein 2007, 277).
Despite the capacity of the algorithm to capture some of the complexity and diversity of HPV, it fails to represent the social diversity of Colombian women. This contrasts with the public discourses about vaccination in the country. Vaccination campaigns and public speeches about healthcare in Colombia are deeply embedded in the rhetoric of multiculturalism and racial diversity. In that regard, the algorithm reproduces some of the contradictions of politics and policymaking in Colombia. The recognition of social diversity in the country has not been translated into the development of infrastructures or policies to address the inequities and wellbeing gaps between regions and between ethnic groups.
The cohort simulated by the Markov model is socially homogeneous; it is differentiated only by age. Age classification is very important in defining the model's parameters: based on research data, probabilities of infection, transition and regression are assigned to specific age groups. Other aspects such as geographic differences, socio-economic background and ethnicity/race are not considered. The omission of other classification parameters has important consequences for the representativeness of the model. The problem of depending on one regional source to produce estimates for the whole country has been noted before by the Colombian National Cancer Institute (INC) in relation to the calculation of national cancer incidence reported by Globocan and the Cali Register.
Because of the enormous geographical and socio-cultural heterogeneity of Colombia, the data from Cali's Cancer Population Register are not representative of the country. This has led to the creation of new registers in other cities, which do not yet provide reliable and continuous information. In the absence of data about incidence, estimation from mortality figures has been recommended. This is the methodology used by the IARC and published in Globocan. In spite of the quality and utility of the information in Globocan, this aggregation is very limited: it presents information only at the national level and, moreover, lacks specificity (INC 1994, 7).
In the case of cost-effectiveness analysis and Markov algorithms, data and parameters depend heavily on the Bogotá Cohort study. This research followed a cohort of 2,000 women for 10 years in order to detect the incidence and prevalence of HPV infection, the types of virus involved and the transitions toward cervical lesions and cervical cancer. This epidemiological research has contributed detailed information about the development of HPV infection into cervical cancer. However, information and details about the recruited population are omitted in the journal articles that have communicated the results of the study. The cohort is presented as an average aggregation of women representative of a general population, even though the recruited individuals come from very particular backgrounds: most of them are poor or marginalized women.
Cervical cancer and social class have long been associated. Social difference has been considered a risk factor, or an element that clusters other risks associated with the development of the disease, such as sexual behavior and nutritional habits. However, these risk factors gradually disappear from narratives and calculations of cervical cancer. HPV becomes the key factor for understanding the development of the disease, erasing other factors and other relations. Insofar as risk factors ceased to be a matter of concern, the cohort became more homogeneous in the algorithm.
Paradoxically, the production of clinical trials and epidemiological studies in the Global South has depended on the recruitment of populations marked by poverty and exclusion (Sunder Rajan 2017). Clinical trials and cohort studies have constituted promises of healthcare for the populations whose bodies are the input in the production of data. Although studies and clinical trials such as the Bogotá Cohort and FUTURE (Females United to Unilaterally Reduce Endo/Ectocervical Disease) claim that recruitment is based on raised awareness and altruism, in practice data production is heavily supported by promises of healthcare. In that regard, one of the researchers from the Bogotá Cohort points out: May I be very clear: I think they never really understood the benefits of the vaccination, many of the participants really needed and wanted a doctor. The health system did not provide one, so we became the doctors for them. We were their physician for everything … if I have an allergy, [family] planning … (Sr. researcher 3 INC) Although risk factors are strongly entangled with difference, with many kinds of embodied and material difference, once the cohort is enacted these issues are assumed to be secondary. The social entanglements that make the production of these data possible are perceived as mere accidents. For instance, the connections between the Bogotá Cohort and the FUTURE trial were the result of affective and family ties between researchers and patients. The cohort study lasted 10 years, during which recruitment passed from mothers to daughters.
In an attempt to reach a more heterogeneous population, even researchers' and doctors' daughters participated in these studies: In the institute we do not have access to high strata, except the daughters of the doctors, so we have in the study from stratum 1 to 6, we have daughters from colleagues that study in "Los Andes²" and girls stratum 1, daughters or relatives of women from the original cohort. We know absolutely the socio-economical background of our patients because we need to know which kind of healthcare coverage they have. But it was not important for the recruitment. (Sr. researcher 2 INC) This effort to disentangle social difference and render it invisible in the calculation and modeling of cervical cancer and HPV infection in Colombia contrasts sharply with the public discourses about cervical cancer and vaccines. HPV vaccines have been promoted as an intervention for equity; the government argues that marginalized women, as a risk group, will benefit from a future reduction in cervical cancer. Campaign materials such as videos and posters present a multiracial and diverse country of girls who are democratically protected by the vaccine. Moreover, the pilot vaccination program was carried out in public schools in poor neighborhoods in Bogotá.
However, beyond this contrast, public narratives, technical modeling and algorithms are manifestations of a new regime of public health and healthcare governance in which pharmaceuticals are promoted as the privileged strategy of intervention. Despite experts' and policymakers' perception of the connections between the material conditions of living and the development of the disease, pharmaceutical technologies promise an immediate, simple and cost-effective solution for controlling maladies. This policy disjuncture is aptly described by one of the advisors to the Ministry of Health involved in the epidemiological design of the Bogotá Cohort: Personally, I have strongly argued that mortality for cervical cancer is an indicator of inequity in social and economic development. Countries with bigger social and economic inequity have a bigger mortality of cervical cancer. So when one introduces a technology such as vaccination, you will vaccinate the indigenous people, poor women … one will reduce the problem of cervical cancer. In 30 years we will see such reduction. I would prefer that the reduction of this indicator, I mean cervical cancer, were effectively done by improving the quality of life of women, the vaccine does not solve inequity. Probably many women will have just primary education, many may never finish … (…) but if we vaccinate almost 100% of the population, as it has happened with other vaccines, we will eradicate the disease. (Sr. researcher 1 INC)

Conclusion
This paper explored the ways in which the Markov chain algorithm "datafies" HPV vaccination and cervical cancer. This device produces an interface between a molecularised narrative of the development of cancer and an epidemiological understanding based on population dynamics. The model provides the basis for simulations of the development of the disease in the context of cost-effectiveness analysis.
On the one hand, the model reshapes the causal relation between HPV and cervical cancer by introducing uncertainty expressed in terms of probability. HPV and cervical cancer are not connected by a mechanical link; the movement from infection to cancer depends on chance. On the other hand, by making it possible to measure and quantify the model's uncertainty, Markov chains contribute significantly to rendering decision-making an objective process. The Markov chain becomes a calculative space in which statisticians and epidemiologists can render precarious data robust by stating their inherent uncertainty. The statement of such uncertainty becomes a strategy to enact data objectivity, sometimes in contradiction with its statistical meaning.
Markov chain simulations produce algorithmic populations, statistical assemblages made from diverse and fragmented data. These elements are harmonized through statistical methods to enact a coherent and sound representation of the women of the nation. The presentation of HPV vaccines as the "right tool" (Clarke and Fujimura 1992; Casper and Clarke 1998) for preventing cervical cancer depends on the enactment of the cohort as representative of the epidemiological specificity of Colombian women.
STS has noted how standards produce realities and, through such enactments, generate difference and exclusion (Bowker and Star 1999). The algorithm enacts a homogeneous and socially undifferentiated representation of the epidemiological profile of Colombian women. Such an enactment contrasts sharply with the socio-material conditions in which these data were produced. Social difference has been a key element in the generation of data about cervical cancer. Risk factors and criteria for selecting populations have identified marginalized groups as special targets of research and policy intervention. The connections between research and healthcare services have encouraged the recruitment of populations with unmet healthcare needs.
Policymakers and analysts recognize the material and discursive connection between the risk of developing cervical cancer and social exclusion. However, these considerations are excluded from calculations because they are seen as too complex to control and tame through quantification. Moreover, disentangling social difference from disease produces the right scenario for introducing pharmaceuticals (in this case, vaccines) as the best healthcare alternative. Such disentanglement is supported by "practical" considerations about the Colombian healthcare system and its limitations. As one of the INC experts noted: "it is easier to prevent cancer than to change inequity." Technicians justify the technical flaws of the algorithm as an inevitable consequence of the (limited) availability of data. Despite the dependence on Bogotá's data, the analysts perceive the model as still a good proxy for understanding the Colombian population. As I discussed in another article (see Maldonado 2018), in the case of genital warts all the key information on burden of disease and treatment costs came from a German study, because it was the only study available on the subject at that time. The consequences of such limitations are a matter of discussion. Although there is a consensus among epidemiologists about the importance of HPV vaccination, particularly for women in rural areas, the vaccination program has experienced serious problems linked to a poor understanding of the communities' perceptions of risk and a lack of trust in medical and healthcare authorities. Technical and social flaws are deeply connected.
After this analysis, the question of whether other calculations are possible, more humble and reflexive but still operative and useful for decision-making, remains open. To some extent, the Markov algorithm developed by the Universidad Nacional provides an illuminating account of contingencies and complexities in the relations between HPV, cervical cancer and vaccines that are taken for granted in policy and the public arena. On the one hand, social difference can be reintroduced into this calculative space, even with the current limitations of data. For instance, in the model developed by the Universidad Nacional (2011), a more open account of social difference could be generated without changing the current parameters and transition probabilities for HPV infection. This calculation could be re-enacted by including differentiated rates of access to treatment by socio-economic background and region. This could substantially change the estimates and make algorithms more sensitive to difference.
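The re-enactment proposed here can be sketched without touching the infection parameters: only the rate at which women with cervical lesions reach treatment is made to differ by group. In the hypothetical sketch below, women with CIN are, in each annual cycle, either treated (and leave the risk pool), progress to cancer, or remain untreated. The group labels, population shares, access rates, progression probability and time horizon are all illustrative assumptions, not data from the Universidad Nacional model. Because cumulative risk is not a linear function of access, a single pooled access rate and a stratified calculation give different aggregate results.

```python
def cumulative_cancer_risk(access, progression=0.05, cycles=10):
    """Cumulative probability that a woman with CIN develops cancer.

    Each cycle, a share `access` of the still-untreated group is treated and
    a share `progression` progresses to cancer; the rest remain at risk.
    All default values are hypothetical.
    """
    at_risk = 1.0
    cancers = 0.0
    for _ in range(cycles):
        cancers += at_risk * progression
        at_risk *= 1.0 - access - progression
    return cancers


# Hypothetical groups: (share of the CIN cohort, annual access to treatment).
STRATA = {
    "low_income_rural": (0.2, 0.1),
    "low_income_urban": (0.5, 0.3),
    "high_income_urban": (0.3, 0.6),
}

# Homogeneous model: one average access rate applied to everyone.
pooled_access = sum(share * access for share, access in STRATA.values())
pooled_risk = cumulative_cancer_risk(pooled_access)

# Stratified model: same progression parameter, group-specific access.
stratified_risk = sum(share * cumulative_cancer_risk(access)
                      for share, access in STRATA.values())

print(f"pooled (access {pooled_access:.2f}): {pooled_risk:.3f}")
print(f"stratified: {stratified_risk:.3f}")
for name, (share, access) in STRATA.items():
    print(f"  {name}: {cumulative_cancer_risk(access):.3f}")
```

With these illustrative values, the stratified estimate is higher than the pooled one, and the per-group figures make visible a gap between groups that the averaged cohort hides entirely.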
On the other hand, relying on a single method to make important decisions on health might be counterproductive, even dangerous. In practice, the Colombian health authorities do not rely exclusively on evidence-based tools to make decisions on HPV vaccination. Anecdotal accounts and personal experience were key in the choice of Gardasil as the vaccine for the HPV vaccination program. However, these accounts are not presented to the public. The official reports present decisions as made on the basis of evidence, and of a very narrow understanding of evidence. As Adams (2016) has noted in her book Metrics, it is key to include qualitative and ethnographic research as evidence, as well as to engage with numbers in a different, more open-ended manner.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Notes on contributor
Oscar Javier Maldonado Castañeda is an Assistant Professor in the Sociology programme at the Escuela de Ciencias Humanas, Universidad del Rosario (Colombia). He completed his Ph.D. in Sociology at Lancaster University (UK) in 2015 under the supervision of Professor Celia Roberts and Professor Maggie Mort. Oscar held a postdoctoral position in the Department of Thematic Studies at Linköping University (Sweden), where he worked on two research projects: "Healthy and Pharmaceutical Subjectivities" funded by the European Research Council and "A constant torment: Tracing the Discursive Contours of the Aging Prostate" funded by the Swedish Research Council (Vetenskapsrådet). Oscar has been an associate researcher of the Group of Social Studies of Science, Technology and Medicine at Universidad Nacional de Colombia (GESTM) and an elected member of the Council of the Society for Social Studies of Science (4S) for the period 2015-2018.
Oscar works primarily in science and technology studies (STS) and medical sociology. He is interested in understanding the intersections between governance and quantification in healthcare using the analytical resources provided by different theoretical traditions within STS, such as material semiotics, anthropology of markets and political sociology of science. He has research experience in social science perspectives on medicine, clinical trials, global health, expertise and contemporary healthcare governance.