Visualizing quantitative microscopy data: History and challenges

Abstract
Data visualization is a fundamental aspect of science. In the context of microscopy-based studies, visualization typically involves presentation of the images themselves. However, data visualization is challenging when microscopy experiments entail imaging of millions of cells, and complex cellular phenotypes are quantified in a high-content manner. Most well-established visualization tools are inappropriate for displaying high-content data, which has driven the development of new visualization methodology. In this review, we discuss how data has been visualized in both classical and high-content microscopy studies, as well as the advantages and disadvantages of different visualization methods.


The importance of data visualization in science
Science occurs through the collection of data (observation), analysis of these data (interpretation), and communication of this analysis to audiences that may consist of one to millions. Ideally, data visualizations should be concise, intuitive and unambiguous in their representation of the data. Although it is not an explicit aim of data visualization, many of the best representations of data are also visually appealing, command attention, and draw an audience. Notably, the process of turning observations into communicable results can often involve considerable abstraction, and may require pre-existing knowledge. For example, visualizing protein structures based on nuclear magnetic resonance spectroscopy or X-ray crystallography data involves making numerous assumptions that may not always be correct. Thus, scientists are still often the prisoners looking at shadows on the wall in Plato's cave.

Classical microscopy
Of course, scientific observation is not always indirect. The fields of cell biology and pathology are founded on the principle that "seeing is believing", and make extensive use of microscopy. The microscope is a powerful means by which to gain insight into living systems in a way that requires few conceptual leaps to perform data interpretation or communication.
Due to its direct nature, visualization of microscopy-based data has historically involved presentation of the images themselves. The first example of visualizing data generated by a microscope is largely credited to Francesco Stelluti, who in 1624, as part of a pamphlet celebrating the election of Pope Barberini, and then later in 1630 as part of a book dedicated to the Pope, presented hand-drawn images of a magnified bee observed through a microscope (Crane, 1999). Hand-drawn representations of human observations continued to be the main form of data visualization for microscopy data until the late nineteenth century. Although drawings cannot be free of human bias, and contain little quantitative information, their impact on science was tremendous. Even today, the drawings of Robert Hooke and Ramón y Cajal are visually stunning, provoke a visceral response, and provide tremendous scientific insight. Like any good scientific visualization, they can be understood by experts and non-experts alike.
Photography was adopted as a means to capture and display microscopy images very quickly after the invention of the camera (Overney & Overney, 2011), and its use continues in cell biology even today. An image itself is a powerful data visualization method in microscopy-based studies as it directly presents the data, i.e. the phenotype of the cell. While cell microscopy now involves much more advanced technologies, such as confocal microscopy or super-resolution microscopy, the principle of presenting an image itself as data visualization is still the same. Bias can still easily be introduced when communicating data in this fashion. Specifically, a portion of the data, such as the image of a single cell, is typically selected by the human experimentalist from a complex population, and portrayed as "representative".
Presenting the representative cell from a population is often driven by practical concerns, as it would be infeasible to show images of all the cells in a dataset. However, an obvious issue is that it is unlikely that any single cell can accurately represent a population. Moreover, presenting kinetic or 3D data as raw images is difficult in conventional formats. Another critical issue with using raw images for visualization is that the interpretation of the content is dependent on the prior knowledge of the observer. An expert in cell shape will see cell images differently from a mathematician.

High-content analysis
Advances in microscope technology and workflows now allow researchers to gather millions of images in very rapid fashion at sub-cellular, cellular and population levels (Boutros et al., 2015). As the sheer volume of images that can be generated in a single experiment continues to increase, the ability of human beings to directly examine cellular phenotypes decreases. To facilitate the analysis of cellular phenotypes in large image-based datasets, methods pioneered decades ago (Olson et al., 1980) are now used routinely in high-throughput studies to "segment cells" (identify cellular boundaries), and quantify basic aspects of cell morphology (e.g. size, width-length ratio, protrusiveness) (Bakal et al., 2007; Graml et al., 2014). Alternatively, when cells are labeled with antibodies, or dyes, that detect specific proteins, and/or organelles, the levels and localization of these proteins can be quantified at a single cell level (Boland & Murphy, 2001; Collinet et al., 2010; Glory & Murphy, 2007; Liberali et al., 2014; Perlman et al., 2004; Sero et al., 2015). Cellular phenotypes can also be quantified over time (Cooper et al., 2015). Often different transformations are applied to the data, which can be used to project the data into useful spaces, or generate additional features of interest (Shariff et al., 2010). How raw features should be analyzed appropriately is well beyond the scope of this review, and is still a somewhat open question, but we and others have used several statistical and computational methods to make use of these features, especially in the context of high-throughput screens (Bakal et al., 2007; Cooper et al., 2015; Graml et al., 2014; Liberali et al., 2014, 2015; Pincus & Theriot, 2007; Sailem et al., 2014; Yin et al., 2013). Newer "deep-learning" methods process images without segmentation to quantify cellular phenotypes (Ciresan et al., 2013).
Imaging where many cellular features are quantified is often termed "high-content" analysis (HCA) (Giuliano et al., 1997) [1]. The advantages of performing HCA in cell biology are manifold, but foremost among these is that cellular phenotypes are described in an unbiased, systematic and quantitative fashion, thereby allowing rigorous analysis to be performed. HCA has typically been associated with phenotypic genetic or chemical screens (Taylor, 2007), and is also now being used in the context of pathology (Rizzardi et al., 2012). However, if hundreds of different features can be quantified for every cell, and a dataset can comprise millions of cells, how does one best visualize this data?

Bar charts and box plots
Bar charts, box plots and parallel coordinate graphs (or "line graphs") are frequently used to display data generated by high-throughput imaging (Figure 1A). Bar charts are generally used in microscopy-based studies to compare a small number of features between populations of cells (e.g. the average size of different cell types; Figure 1B). Despite their common use, bar charts in particular are not appropriate for microscopy-based data. Bar charts were originally designed to display categorical, and not continuous, variables, and thus may hide the underlying distributions of data (Weissgerber et al., 2015). Many cellular phenotypes not only exhibit non-Gaussian distributions at the single-cell level, but are also often heterogeneous (i.e. there exist distinct sub-populations; Altschuler & Wu, 2010; Pelkmans, 2012; Yin et al., 2013); thus bar charts are effectively "hiding" data. Box plots are slightly more appropriate for high-content data, as they provide a broad overview of the spread, and skew, of the data (Figure 1C). However, presenting multiple features generated during HCA as boxes is not intuitive, and neither provides a good sense of how multiple features may be related, nor is useful for conveying the magnitude of differences between features. Finally, using bar charts or box plots to display dozens to hundreds of cells, let alone millions, is not feasible. Pie charts face similar issues.
Parallel coordinate graphs (Gehlenborg & Wong, 2012c) were developed over 150 years ago, and remain well used today. Classic examples include those used by Gannett to visualize census data (Hewes & Gannett, 1883), and Fisher's visualization of Iris phenotypes (Fisher, 1936). Parallel coordinate graphs are highly amenable to visualizing high-content data in 2D (Figure 1D), as they allow the observer to quickly interpret relationships between individual dimensions (Collinet et al., 2010; Graml et al., 2014). However, as with bar charts or box plots, the observer cannot immediately translate the data presented as a parallel coordinate graph into a meaningful representation of a cellular phenotype, or what the cell actually looks like.
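The way a bar chart can hide a heterogeneous population can be illustrated with a small numerical sketch. The data here are simulated, not taken from any study: we assume a hypothetical population made of two sub-populations with mean cell areas of 100 and 300 (arbitrary units), and the function names are our own. The bar height (the mean) falls in a region of phenotypic space that contains almost no cells, whereas the quartiles used in a box plot at least reveal the spread.

```python
import random
import statistics


def simulate_bimodal_areas(n_cells=1000, seed=0):
    """Simulate cell areas for two hypothetical sub-populations (assumed values)."""
    rng = random.Random(seed)
    small = [rng.gauss(100.0, 10.0) for _ in range(n_cells // 2)]
    large = [rng.gauss(300.0, 10.0) for _ in range(n_cells - n_cells // 2)]
    return small + large


def summarize(values):
    """Return the mean (bar chart height) and the box plot quartiles."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.fmean(values),
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


def fraction_near(values, center, half_width):
    """Fraction of cells whose value lies within half_width of center."""
    return sum(abs(v - center) <= half_width for v in values) / len(values)


areas = simulate_bimodal_areas()
summary = summarize(areas)
near_mean = fraction_near(areas, summary["mean"], 50.0)

print(f"bar height (mean area): {summary['mean']:.1f}")
print(f"box plot quartiles: {summary['q1']:.1f} / {summary['median']:.1f} / {summary['q3']:.1f}")
print(f"fraction of cells within 50 units of the mean: {near_mean:.3f}")
```

In this simulated case neither summary shows that there are two sub-populations; a histogram or a per-cell scatter of the same values would.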

Heat maps
The use of heat maps, or color maps, to display high-dimensional imaging data was inspired by the use of heat maps to visualize transcriptomic data, which can involve displaying the expression levels of hundreds to thousands of mRNAs, for dozens to hundreds of samples, in a single graph. However, heat maps have a long history that far predates expression profiling, and were first used to display economic data (Gehlenborg & Wong, 2012a; Loua, 1873). In heat maps, each value is represented as a colored box, and the value is directly represented by color type (e.g. green or red for positive and negative values), and color intensity (high intensity for high values; Figure 1E). Heat maps are excellent for datasets composed of large numbers of high-dimensional vectors, though as mean values are often displayed, heat maps, like bar charts, can misrepresent the underlying population distribution. Positive correlations between phenotypes are well captured by heat maps. For example, when the phenotypes following systematic gene depletion (Bakal et al., 2007; Graml et al., 2014), or treatment with small molecules (Perlman et al., 2004), are clustered in heat maps based on the similarity of their phenotypic signatures, the eye is immediately drawn to highly similar phenotypes that cluster together.
However, heat maps poorly represent correlations that are weak, or even negative, as two very distinct phenotypes can appear very close to one another on the heat map (i.e. following clustering), but in fact can be quite different. A related issue is that it is difficult to visualize the relationship between more than two phenotypes or features using heat maps, because there is only a single degree of freedom regarding the placement of one feature with regard to another, meaning one row can only be above or below another, and one column can only be to the left or right of another. For example, phenotypes A and B may be more similar to each other than to phenotype C, which is well observed on a heat map. However, visualizing the differences between A and B themselves can be challenging if the magnitude of these differences is less than that of the differences to C. This problem can sometimes be resolved by adequate transformation of the data, for example coloring based on log-transformed data. An additional weakness of heat maps is that different people see color differently, which can lead to inconsistent interpretations of heat maps (Gehlenborg & Wong, 2012b; Wong, 2010). Finally, it is very difficult to grasp what a cell, or population of cells, actually looks like based on colored boxes.
[1] High-content data does not necessarily mean "high-throughput", as even a single image may be quantified in a very high-dimensional manner. Conversely, high-throughput is not by definition high-content, as a genome-wide RNAi screen can be performed by measuring a single feature (e.g. viability) following gene depletion.
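The effect of a log-transformation on the A, B and C example can be made concrete with a few lines of arithmetic. The values below are invented for illustration, and the helper `to_color_scale` is a hypothetical stand-in for whatever mapping a plotting tool uses to turn values into color intensities between 0 and 1.

```python
import math


def to_color_scale(values):
    """Rescale values linearly to [0, 1], as a color map typically would."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical feature values for three phenotypes; C dominates the range.
raw = {"A": 1.0, "B": 2.0, "C": 1000.0}

linear = dict(zip(raw, to_color_scale(list(raw.values()))))
logged = dict(zip(raw, to_color_scale([math.log10(v) for v in raw.values()])))

linear_gap = linear["B"] - linear["A"]
logged_gap = logged["B"] - logged["A"]

print(f"A-B color difference on a linear scale: {linear_gap:.4f}")
print(f"A-B color difference on a log10 scale:  {logged_gap:.4f}")
```

On the linear scale A and B occupy about 0.1% of the color range and are indistinguishable by eye; after the log-transformation they are separated by about 10% of the range, at the cost of compressing the difference to C.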

Network graphs
As the outcome of many high-throughput phenotypic studies is the inference of functional interactions between genes, based on phenotypic similarity of genetic depletion, visualizing these interactions on a genome-wide scale is often best done through images displaying networks of interactions. There are numerous striking examples of using networks to display functional interactions (Costanzo et al., 2010; Snijder et al., 2013). Despite the frequent use of this type of graph in high-content studies, it is important to differentiate between visualizing interactions between genes and visualizing interactions between cellular features, the latter being the subject of this review.
Network graphs can be used to describe images, where each node is a feature, and each edge is a correlation (Figure 1F) (Snijder et al., 2009). Edges can also be scaled (either in length or thickness) to represent the extent of that correlation. In feature networks, edges can also be colored to describe whether a correlation is positive or negative (Keren et al., 2008). Furthermore, when Bayesian methods, for example, are used to analyze datasets, causal relationships between features, or phenotypes, can be inferred, and such relationships can be visualized by assigning directed arrows between features. Compared to heat maps, network graphs lead to a much more intuitive visualization of relationships between individual values in a dataset.
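A feature network of this kind can be sketched as an edge list computed from a correlation matrix. The following is a minimal illustration on simulated data, not the procedure of any cited study: the four feature names, the Pearson correlation, and the 0.5 threshold are all assumptions made for the example. Each edge carries a signed correlation, so its absolute value could set edge thickness and its sign could set edge color.

```python
import numpy as np


def correlation_network(data, names, threshold=0.5):
    """Return edges (feature_a, feature_b, r) for feature pairs with |r| >= threshold.

    data: array of shape (n_cells, n_features); names: one label per column.
    """
    corr = np.corrcoef(data, rowvar=False)
    edges = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            r = float(corr[i, j])
            if abs(r) >= threshold:
                edges.append((names[i], names[j], r))
    return edges


# Simulated single-cell features (hypothetical relationships between them).
rng = np.random.default_rng(0)
n_cells = 500
area = rng.normal(size=n_cells)
perimeter = 2.0 * area + 0.3 * rng.normal(size=n_cells)   # increases with area
roundness = -area + 0.3 * rng.normal(size=n_cells)        # decreases with area
intensity = rng.normal(size=n_cells)                      # unrelated feature

names = ["area", "perimeter", "roundness", "intensity"]
data = np.column_stack([area, perimeter, roundness, intensity])
edges = correlation_network(data, names)

for a, b, r in edges:
    sign = "positive" if r > 0 else "negative"
    print(f"{a} -- {b}: r = {r:+.2f} ({sign}, edge width ~ {abs(r):.2f})")
```

The resulting edge list can be handed to any graph-drawing tool; the unrelated feature ends up as an isolated node, which is itself informative.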

Scatter plots
Scatter plots have been used since the earliest days of HCA, and are the basis for displaying data analyzed by Fluorescence-Activated Cell Sorting (FACS). Because each point in a scatter plot can represent a single cell, they are excellent for displaying inter- and intra-population phenotypic heterogeneity, and identifying small sub-populations (Loo et al., 2009; Singh et al., 2010; Slack et al., 2008). Scatter plots are often used to display three dimensions, and data reduction methods such as Principal Component Analysis (PCA) can be used to identify useful three-dimensional projections from datasets that may contain hundreds of dimensions (Figure 1G). Additional dimensions can be presented by coloring and/or sizing different points plotted in 3D. There is an intuitive nature to scatter plots, and the structure of even complex datasets can be easily conveyed to experts and non-experts alike. Scatter plots are also very effective for assessing distances between data points; however, care must be taken to ensure distances remain true to the original datasets.
However, what the phenotypic space of the scatter plot itself may represent or translate to in terms of real images is not always clear, especially when the phenotypic space being displayed represents a transformed subspace of a much larger original space (Figure 1G). Moreover, data visualization by scatter plots is unsuitable when populations exhibit high degrees of overlap in phenotypic space. Thus, in the context of genetic or chemical screens when experiments number into the thousands or millions, scatter plots of the data often appear as a large, poorly interpretable "cloud". Conversely, scatter plots are often not appropriate for visualizing very sparse datasets. Given these advantages and disadvantages, we suggest scatter plots are best used to display both intra- and inter-population phenotypic heterogeneity when the data is well distributed in phenotypic space, and there are 1000-10 000 datapoints. With these constraints in mind, scatter plots are excellent for initial data exploration as well as for the presentation of processed data.
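The PCA-based projection discussed in this section can be sketched as follows. This is a minimal illustration using a singular value decomposition on simulated data; the dataset (200 cells, 50 features generated from three hidden factors) and the function name `pca_project` are assumptions for the example. The fraction of variance retained by the three plotted components gives one rough indication of how faithfully distances in the scatter plot reflect distances in the original feature space.

```python
import numpy as np


def pca_project(features, n_components=3):
    """Project (n_cells, n_features) data onto its first principal components.

    Returns the projected coordinates and the fraction of total variance
    explained by each retained component.
    """
    centered = features - features.mean(axis=0)
    _, singular_values, components = np.linalg.svd(centered, full_matrices=False)
    coords = centered @ components[:n_components].T
    explained = singular_values**2 / np.sum(singular_values**2)
    return coords, explained[:n_components]


# Simulated high-content dataset: 50 measured features driven by 3 hidden factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3)) * np.array([5.0, 3.0, 2.0])
mixing = rng.normal(size=(3, 50))
features = latent @ mixing + 0.1 * rng.normal(size=(200, 50))

coords, explained = pca_project(features)
print("projected shape:", coords.shape)
print("variance explained per axis:", np.round(explained, 3))
print(f"total variance retained in 3D: {explained.sum():.3f}")
```

The three columns of `coords` are what would be plotted as x, y and z. When the retained variance is low, apparent distances between points in the plot should be interpreted with caution.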

From frequencies to landscapes
A commonly used method in data visualization is the use of histograms, as estimates of underlying probability distributions (Pearson, 1895). Such methods have been used to visualize single cell high-content data, such as the frequency of a 1D phenotype (Keren et al., 2008; Perlman et al., 2004). Histograms can be generated that describe 2D phenotypes, where the x- and y-axes now describe two different features, and contour lines and/or shading are used to describe frequency; such graphs resemble topographic maps (Leha et al., 2015).
The intuition behind a 1D or 2D histogram can be extended to create 3D surfaces, or landscapes. By borrowing concepts from dynamical systems theory, such landscapes, whether they are generated using estimates or actual distributions, can be interpreted as either landscapes of fitness peaks or attractors. Landscapes derived from real data have no doubt been inspired by landscapes that have been used to describe theoretical concepts such as Waddington's visualization of phenotypic canalization during fate determination (Waddington, 1957). In a Waddington-type landscape, regions of phenotypic space that cells are more likely to explore are visualized as "basins" in the landscape, or attractor regions. A region between two attractors is one that cells can explore, but are unlikely to exist in for the long term. Peaks in the landscape are regions where phenotypes are particularly unstable, and from which cells "fall" towards attractors. In contrast, in fitness landscapes peaks are regions of phenotypic space that biological systems are attempting to "climb" (Kauffman, 1993). At the peak, the system has achieved the maximal possible fitness in a given environment. Once the peak is found, a system (cell) can exist stably near, or at, the peak. A classic example is the Fujiyama landscape, where one peak dominates the landscape (Kauffman, 1993). The advantage of using landscapes for high-content data is that landscapes are compact and intuitive representations of often very complex phenotypic spaces. Moreover, landscapes can provide insight into how cells in the population are dynamically exploring this space, even when the data is static in nature. Depending on how landscapes are generated using real datasets, stable phenotypes present in the dataset may appear as basins, peaks or even both. Snijder et al. (2012) have used landscapes to describe the relationships between cell size, cell density and viral infection following RNAi screening. Here peaks represent phenotypes that are most susceptible to viral infection, and thus are most analogous to fitness peaks. We have used such plots to show the frequency of particular shapes in a dataset, where the most predominant shapes can be considered fitness peaks (an example is shown in Figure 1H), and how systematic gene depletion affects the topology of fitness landscapes (Cooper et al., 2015; Yin et al., 2013).
We have also generated landscapes based on a dataset describing phenotypes following depletion of hundreds of genes. Such landscapes describe the potential space that can be explored by cells over a wide range of genetic backgrounds. In these cases, we have combined ideas from both attractor and fitness landscapes. Wild-type cells exist at a peak in the landscape that cells are striving to "climb towards", whereas alternate stable forms exist as attractors that cells can "fall into", and become "trapped in" depending on their genetic background, for example in cases of very deleterious mutations (Yin et al., 2014).
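The step from a 2D histogram to a landscape can be sketched in a few lines. The example below uses simulated data with two hypothetical shape states, and the function names are our own. The frequency surface shows common phenotypes as peaks; taking the negative logarithm of the frequencies is one simple convention, assumed here for illustration and not necessarily the one used in the cited studies, that shows the same phenotypes as basins in an attractor-style view.

```python
import numpy as np


def frequency_landscape(x, y, bins=20):
    """2D histogram of two features, normalized so that all bins sum to 1."""
    counts, x_edges, y_edges = np.histogram2d(x, y, bins=bins)
    return counts / counts.sum(), x_edges, y_edges


def quasi_potential(freq, pseudo=1e-6):
    """Turn frequencies into heights where common phenotypes are low (basins).

    The pseudo-count avoids log(0) in empty bins; its value is an arbitrary choice.
    """
    return -np.log(freq + pseudo)


# Two hypothetical shape states: a dominant one and a rarer one.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2.0, 0.5, 3000), rng.normal(2.0, 0.5, 1000)])
y = np.concatenate([rng.normal(0.0, 0.5, 3000), rng.normal(1.0, 0.5, 1000)])

freq, x_edges, y_edges = frequency_landscape(x, y)
potential = quasi_potential(freq)

# Highest point of the frequency surface and lowest point of the potential.
peak = np.unravel_index(np.argmax(freq), freq.shape)
basin = np.unravel_index(np.argmin(potential), potential.shape)
peak_x = 0.5 * (x_edges[peak[0]] + x_edges[peak[0] + 1])
peak_y = 0.5 * (y_edges[peak[1]] + y_edges[peak[1] + 1])

print(f"most frequent phenotype near x = {peak_x:.2f}, y = {peak_y:.2f}")
print("same bin is the deepest basin:", peak == basin)
```

Either array can be drawn as a contour map or a 3D surface; which orientation is appropriate depends on whether the landscape is meant to be read as a fitness landscape or an attractor landscape.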

Glyph-based methods
None of the visualization methods we have described thus far for presenting high-content data provides a sense of what the specimen under investigation, the cells themselves, actually looks like under a microscope. Thus, while HCA provides a means by which to quantify microscopy data, it comes at the price of weakening one of the great strengths of microscopy: the power to present the data in as direct a manner as possible.
Quantitative morphological data can be presented as scaled contours of cells, which provides a simple, but very powerful and intuitive means by which to convey complex phenotypes (Keren et al., 2008; Pincus & Theriot, 2007). Furthermore, we have recently developed a method termed PhenoPlot that presents phenotypic data as graphical, and accurately scaled, representations in graphs which resemble actual cells (Figure 1I). Each PhenoPlot can be used to display multiple features of single cells, or the average cell of a population, simultaneously as an intuitive glyph. PhenoPlots are based on facial glyphs devised by Herman Chernoff where k-dimensional data is represented as cartoon faces (Chernoff, 1973), and are also similar to striking graphs made by W. Duane Brown to convey the average and standard deviation of 11 dimensions by scaling different box-shaped body parts of a cartoon body (Williams, 1967). PhenoPlots have two key aspects that make them appropriate for high-content data. First, multiple features can be shown in one single glyph, which provides a compact representation that is not offered by bar charts, heat maps or scatter plots. Second, PhenoPlots are intuitive representations of cellular phenotypes that are interpretable by non-experts. However, PhenoPlots are not ideal for displaying datasets describing the phenotypes of single cells in large populations, and are poorly suited to describing more than 12 features at a time.
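The general idea of a scaled glyph can be illustrated with a deliberately simple sketch. This is not PhenoPlot or its interface: it is a generic example in which two measured features, cell length and width (hypothetical values in arbitrary units), directly set the axes of an ellipse, so that the drawn outline is accurately scaled to the measurements.

```python
import math


def ellipse_glyph(length, width, n_points=64):
    """Return (x, y) vertices of an ellipse whose axes equal length and width."""
    vertices = []
    for k in range(n_points):
        angle = 2.0 * math.pi * k / n_points
        vertices.append((0.5 * length * math.cos(angle),
                         0.5 * width * math.sin(angle)))
    return vertices


def glyph_extent(vertices):
    """Width and height of the bounding box of a glyph outline."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return max(xs) - min(xs), max(ys) - min(ys)


# Hypothetical average measurements for two cell populations (arbitrary units).
populations = {"control": (40.0, 20.0), "treated": (70.0, 12.0)}

glyphs = {name: ellipse_glyph(length, width)
          for name, (length, width) in populations.items()}

for name, vertices in glyphs.items():
    extent_x, extent_y = glyph_extent(vertices)
    print(f"{name}: glyph spans {extent_x:.1f} x {extent_y:.1f}")
```

The vertex lists can be passed to any polygon-drawing routine. Further features could be mapped to additional glyph elements such as fill color or a nested nuclear outline, which is where the practical limit on the number of features per glyph arises.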

Cell simulations
One means by which to display complex numerical data derived from image-based analysis is to generate simulated cells based on actual data (Johnson et al., 2015a,b; Murphy, 2012). Simulations are particularly powerful because they are perhaps the best visual representation of complex quantitative phenotypes, even when such populations might not actually be present in the data (i.e. the average cell). Unlike almost any other type of data visualization method, the number of features they can display scales well with the number of features that can be measured. Moreover, such simulations are ideal for predictive studies and hypothesis generation, as the effects of perturbing one or more features on all other features can be determined. Although cell simulations can be used to display the phenotypes of single cells in complex populations (Johnson et al., 2015b; Rajaram et al., 2012), they are unlikely to be useful in displaying large datasets, as the level of visual complexity would exceed that which makes visualization useful.

The future
The complexity of all imaging data, but especially that which can be acquired in high-throughput, is already increasing at a rapid rate. Recent advances in technologies mean that single cell phenotypes can be quantified across millions of single cells in 3D, and over time. However, data visualization tools already lag considerably behind imaging tools; thus there is an immediate challenge to develop new ways to present microscopy-based data. Given the remarkable interactive multi-media environments we are able to explore on computers, televisions and phones, the future of scientific visualization must surely be headed in this direction. However, for such visualization tools to become widespread, scientists, publishers and their audiences alike must accept and embrace data presentations that break the mold of the 2D static figures we have become so accustomed to.