Modeling spatial interaction networks of the gut microbiota

ABSTRACT How the gut microbiota is organized across space is postulated to influence microbial succession and its mutualistic relationships with the host. The lack of dynamic or perturbed abundance data poses considerable challenges for characterizing the spatial pattern of microbial interactions. We integrate allometric scaling theory, evolutionary game theory, and prey-predator theory into a unified framework under which quasi-dynamic microbial networks can be inferred from static abundance data. We illustrate that such networks can capture the full properties of microbial interactions, including causality, the sign of the causality, strength, and feedback loop, and are dynamically adaptive along spatial gradients, and context-specific, characterizing variability between individuals and within the same individual across time and space. We design and conduct a gut microbiota study to validate the model, characterizing key spatial determinants of the microbial differences between ulcerative colitis and healthy controls. Our model provides a sophisticated means of unraveling a complete atlas of how microbial interactions vary across space and quantifying causal relationships between such spatial variability and change in health state.


Introduction
Microorganisms residing in the human gut have direct and indirect impacts on health and disease. 1 Disruptions to the community of gut microbes can contribute to the risk and severity of a number of medical conditions, such as obesity, cancer, and autism among others. [2][3][4][5] While most studies focus on the determination of microbe-health relationships using homogenized samples, such as stools, [6][7][8][9][10] there is growing appreciation for how the influence of microbes on host physiology is driven by the spatial heterogeneity of microbiota along the length of the digestive tract. [11][12][13][14][15][16][17][18] Microbiota data collected at a series of gut locations are the key to chart the organizational and functional atlas of the microbial community across space, 18 but this will critically rely on how these spatial data are modeled and analyzed. Trillions of microbes populate a narrow gut, forming the most complex and densest ecosystem on Earth. It is unlikely that microbes exert their influences independently of each other on host traits, rather they do so interactively through complex but wellorchestrated networks. [19][20][21][22][23] More importantly, such interactions may occur among microorganisms not only at the same taxon level but also across taxonomic groups, given the co-occurrence of phylogenetically diverse microbes. As such, reconstructing large-scale microbial interaction networks across spatial gradients is a crucial choice to interrogate the elaborate crosstalk among gut microbes and between microbes and their hosts.
Here, we implement a computational model for inferring space-specific microbial interaction networks. Existing approaches reconstruct an overall microbial network from a large number of samples, 24 failing to characterize sample-specific topological differences. Correlation-based approaches can estimate the strength of microbemicrobe interaction, but cannot infer the direction of the interaction. 25 Bayesian networks can identify causality, but lack a capacity to find feedback cycles. 26 High-density longitudinal data are powerful for reconstructing informative networks filled with bidirectional, signed, and weighted interactions, but such data are hardly available for the gut microbiota, 27 especially at multiple locations of the gut. Our model that integrates elements of different disciplines can overcome all the above limitations to reconstruct informative, dynamic, omnidirectional, and personalized networks (idopNetwork) from static abundance data. [28][29][30] The idopNetwork can reveal the positiondependent change of microbial networks and infer its spatially causal relationships with health state. As a proof of concept, we validate the usefulness of idopNetwork by designing and conducting a spatial mapping study of the gut microbiota. We identify key biogeographically-varying microbial interactions that distinguish diseased guts from healthy counterparts.

Inferring ecologically functional microbial networks across gut space
We sample seven spatially different positions along and inside the guts from five patients infected with ulcerative colitis (UC) and one healthy control (HC) ( Figure 1) and measure microbial abundance levels at different taxa from a total of 23 positions, i.e., habitats. The diagram in Figure 2 illustrates the workflow of how we reconstruct microbial networks from this spatial data. We identify the 16 most abundant phyla and the remaining phyla (attributed to the other) and plot the abundance level (i.e., niche index) of each phylum against habitat index (HI), defined as the summed abundance level of all microbes at a gut position. We find that niche index-HI change can be well fit by the power equation described in equation (1) (Figure 3). A majority of phyla increase their abundance with habitat index in an exponential manner (R 2 = 0.80-0.90), although the slope of increase varies from phylum to phyla. Only two rich phyla, Firmicutes and Bacteroidetes, follow a reverse pattern of change, i.e., decreasing exponentially with habitat index. We plot the niche index of each species against habitat index and find that their relationship can be similarly fitted by the power equation, but with the slope depending on species (Fig. S1). Taken together, the power equation can not only explain the part-whole relationship of individual microbes at various taxonomic levels with the total microbes, but also provides a means of modeling microbial diversity and interactions across spaces.
In the Materials and methods section, we describe the derivation of the idopNetwork model by building quasi-dynamic ordinary differential equations (qdODE) based on niche index-HI relationship ( Figure 2). The qdODE decompose the observed abundance level of a microbe into its independent component due to its own capacity and dependent component resulting from the influence of other microbes, as shown by equations (2) and (7). We use the independent and dependent abundance components estimated by an optimization technique to reconstruct spatially-varying idopNetworks at the phylum level for UC and HC groups ( Figure 4). The topological organization of microbial networks displays both similarities and differences among gut positions. For example, Deferribacteres acts as a leader (i.e., those with more and stronger outgoing links than other microbes) that exerts influences on Cyanobacteria, Fusobacteria, Chloroflexi, Planctomycetes, and Acidobacteria in all networks, regardless of their gut positions and gut health states. There is also Figure 2. Workflow of idopNetwork reconstruction from spatial mapping data of the gut microbiota as collected from the study of Figure 1. Each gut can be viewed as an ecosystem in which different positions represent natural habitats of microbes. As a toy example, we assume four microbes residing at each gut position, whose abundance is monitored to form a data structure. By taking the sum of abundance of all microbes at each position, we calculate habitat index for this position. The power equation is used to model the allometric relationship between the abundance of individual microbes and habitat index, which establishes a foundation for converting static data into its quasi-dynamic representation crucial for the integration of evolutionary game theory and predatorprey theory into graph theory. gut position-dependent variability not only expressed in total microbial abundance and microbial composition ( Figure 3), but also in the topological structure and organization of microbemicrobe interactions ( Figure 4). Positiondependent networks of the UC group can be clearly classified into two types; one involving lumen, rectum, transverse colon, ileum and cecum networks and the second including sigmoid colon and descending colon networks. In the second type, Firmicutes and Deinococcus-Thermus bring strong amensalism to bear on Proteobacteria and Fusobacteria, respectively, whereas in the first type, the former is commensalistic toward the latter. Detailed position-dependent differences can be observed in network organization from the same type. In the first type, the strength of this commensalism decreases from the cecum and ileum networks to the transverse colon network to the rectum and lumen networks. Deferribacteres is commensalistic for Fusobacteria in the lumen, rectum, transverse colon, and cecum networks, but this relationship changes as amensalism in the ileum network. In the second type, compared to the sigmoid colon network, the descending colon network is encoded by amensalism at a higher frequency.
Only three positions, transverse colon, ileum, and cecum, were sampled for the HC group, and these positions display distinct network topologies ( Figure 4). Each of these three HC networks is tremendously different from the UC network at the same position. In general, the HC networks are well mixed by commensalism and amensalism, whereas the UC networks tend to be dominated by commensalism, suggesting that a healthy network can better balance different types of microbes than a diseased network. At the cecum and ileum positions, Firmicutes establishes a commensalistic relationship toward Proteobacteria in the UC networks, but an amensalistic relationship between these two phyla is detected in the HC networks. In the transverse colon network, although commensalism occurs from Firmicutes to Proteobacteria in both UC and HC groups, the strength of this interaction is larger in the former than in the latter. Also, Deinococcus-Thermus is commensalistic toward Fusobacteria in the UC networks, but this relationship is amensalistic in the HC networks. Taken together, the strength, pattern and architecture of microbial interactions vary spatially across the biogeographic positions of the gut, with the degree of variation depending on healthy state. Network analysis on the three commonly measured positions suggests that healthy networks display a greater position-dependent variability than diseased networks. A series of spatially reconstructed networks can more precisely characterize key microbial interactions that interrogate why and how a healthy state becomes unhealthy.
Our qdODE model decomposes the observed abundance level of a microbe into its independent component (expressed when this microbe is socially in isolation) and dependent component (due to the accumulated (positive and negative) influence of other microbes on this microbe) at each gut position. This decomposition is illustrated in Fig. S2, where the independent and dependent abundance of each phylum are characterized across samples. Figure 5 reveals the position-dependent total amounts of independent abundance, positive dependent (promotion) abundance and negative dependent (inhibition) abundance over all microbes at the phylum level. For UC guts, lumen, cecum, ileum, sigmoid colon, and rectum has a similar amount of independent abundance, which is much lower than that at transverse colon and descending colon. Among three commonly measured positions, cecum and ileum are similar in independent abundance between UC and HC guts, whereas the independent abundance of transverse colon is strikingly larger in the UC than HC guts. This suggests that the intrinsic capacity of microbes to express themselves in transverse colon is a determinant of the health state of a gut. The amounts of positive dependent abundance (via promotion) and negative dependent abundance (via inhibition) substantially vary among gut positions. Some positions, such as ileum, transverse colon, and rectum, have a large amount of positive dependent abundance, whereas some positions, like descending colon, are dominated by negative dependent abundance ( Figure 5). At transverse colon, both positive and negative dependent abundance are much richer for UC guts than HC guts, suggesting that the strength and pattern of microbial interaction at transverse colon are a determinant of gut health state. Taken together, when healthy guts get infected by ulcerative colitis, the capacity of microbes to be independently expressed increases at certain positions, i.e., transverse colon and descending colon. Also, the total strength of microbial interaction and the relative strength of cooperation and competition are associated with the shift of health state from healthy to diseased and vice versa.

Reconstructing multilayer and multiplex microbial networks
Microbial networks may occur and function at different taxonomic levels, with larger sizes at lower than higher levels. Modularity theory suggests that a robust and stable large-scale network would be divided into distinct network communities within which entities are more functionally correlated with each other than with those from other modules. 31,31 Thus, when we attempt to reconstruct a microbial network from a large number of microbes, we can break it down into sparsely interconnected network communities. Wu and Jiang 30 proposed a bottomup approach for identifying network communities by classifying all entities into different modules according to their similarity of sample-dependent variability. As shown by Figure 2, the abundance of a microbe changes with habitat index following the power law. We implement the power equation into Kim et al. 31 functional clustering to detect different microbial modules (see the Method).
The mean values of abundance for all microbes within modules are used to reconstruct the coarsegrained module-module idopNetwork or the toplayer microbial network. Microbes within each module form multiple fine-grained microbial networks or bottom-layer microbial networks. The dimension reduction of data by clustering increases statistical and computational efficiency in inferring coarse-and fine-grained networks. However, if the number of microbes within a module is still too large to be used for network reconstruction, we can implement functional clustering to classify this module into distinct submodules and reconstruct cross-submodule networks. If needed, a submodule can be further classified into different sub-submodules and this process repeats until a trackable number of microbes is obtained. In the end, we reconstruct multilayer microbial networks at different levels of classification. If microbes at a higher classification level interact with those at a lower level, multiplex microbial networks can be reconstructed.
In our spatial mapping study, we identify 65 species, which are classified into seven distinct modules M1-M7 by functional clustering (Fig. S3). Each module contains a different number of species, forming a network community. We reconstruct a seven-node coarse-grained microbial network from modules and seven fine-grained microbial networks from species of the same modules, forming a 2-layer microbial network between and within modules ( Figure 6). Such a 2-layer network is inferred for UC and HC groups, respectively. The first-layer network is structurally similar but organizationally different between Figure 6. Two-layer microbial networks at the species level for HC and UC groups. At the top level of the network is the 7-node intermodule (coarse-grained) network and at the bottom level are seven interspecific (fine-grained) subnetworks, visualized by Voronio treemaps. Arrow lines with warm and cold colors represent commensalism and amensalism, respectively, with the thickness of lines proportional to the strength of microbe-microbe interactions. the two groups. Module M6, as a leader, exerts a strong commensalism to modules M1 and M5 in HC guts, but this relationship becomes amensalistic in UC guts.
M1 and M5 are highly abundant, but each is only composed of two species. In M1, Escherichia coli and those unidentified species establish a commensalistic relationship in both HC and UC groups, but in M5 Bacteroides plebelus prompts Suttereslla wadsw in the HC group, but the latter turns to inhibit the former in the UC group ( Figure 6). Considerable differences in both structure and organization are observed for the secondlayer networks of the other five modules. Compared to the UC group, the second-layer networks of M2, M3, and M7 contains more amensalistic links than those in the HC group, further suggesting that microbial competition may be a cause of ulcerative colitis. For example, in the second-layer network of M2, Bacteroides ovatus is a positive strong leader that promotes many other species in HC guts, but its leadership becomes negative at a reduced strength in UC guts. Also, Bacteroides caccae is activated in UC guts, exerting strong commensalism for Flavonifractor plautii and strong amensalism for Bifidobacterium bifidum and Bacteroides sterco.
More remarkably, the second-layer network of M3 produces structural and organizational changes from HC to UC groups ( Figure 6). Lactobacillus coryniformis is a dominant leader that positively impact many other species in the HC group, but it turns to exert strong negative impacts on the same species in the UC group. Lactococcus piscium that coexists peacefully with its partners in HC guts becomes aggressive toward Firmicutes oral and phylum sp. Oral in UC guts. M4 and M6 are filled of commensalism in both HC and UC networks, but the strength of commensalism notably differs between two types of networks. Stronger commensalism exerted by particular species is observed in UC than HC groups. Taken together, multilayer microbial interaction networks provide a detailed roadmap of how each microbial species interacts with every other species through cooperation or competition to determine the change of guts from healthy to unhealthy states.

Spatial tracing of multilayer networks
We reconstruct multilayer networks at each position from which we can trace how coarse-and fine-grained networks vary along biogeographical gradients (Figures 7,8). For the UC group, coarse-grained networks are structurally very similar over seven sampled positions, except for the link from modules M6 to M1, which is commensalistic at lumen, cecum, and ileum but amensalistic at transverse colon, descending colon, sigmoid colon, and rectum ( Figure 7). Finegrained networks also vary among gut positions, but with the degree of this difference depending on modules. Network communities for M4 and M6 exhibit a slight increase of commensalism strength from lumen to rectum, but those for M1, M2, M3, M5, and M7 differ dramatically among positions. In those network communities, some species-species links change from commensalism to amensalism. For example, in M5, Bacteroides plebeius and Sutterella wadsworthesis are mutualistic to each other at lumen, cecum, and ileum, but become increasingly parasitic from transverse colon to descending colon to sigmoid colon to rectum. Similarly, in M3, Lactobacillus coryniformis is commensalistic to Lactococcus garvieae at lumen and cecum, but this relationship becomes amensalistic with an increasing strength from ileum to transverse colon to descending colon to sigmoid colon to rectum.
The three gut positions sampled for the HC group also exhibit differences in both coarse-and finegrained networks, although the extent of such differences is module-dependent ( Figure 8). In the coarsegrained network, leader M6ʹs commensalism toward M1 and M5 at cecum and ileum becomes amensalism at transverse colon, but the internal workings within M6, M1, and M5 little vary among three positions. Pronounced differences in the sign and strength of certain interspecific interaction are observed in fine-grained networks of M2, M3, and M7. For example, in M2, Bacteroides caccae exerts commensalism at cecum and ileum but amensalism at transverse colon for Bacteroides stercoris and Bifidobacterium bifidum. In M7, Bacteroides uniformis is commensalistic toward Flavisolibacter ginsengisoli at cecum, but this commensalism changes as amensalism at ileum and transverse colon.
The two-layer networks provide a detailed atlas of interspecific interactions that determine differences between UC and HC groups (Figure 7 vs. Figure 8). Although most pairwise interactions are consistent in sign between the two groups, there are a few certain interactions that change their sign from HC to UC guts. In general, UC networks contain more amensalistic interactions than the HC networks, suggesting that microbes tend to turn to be competitive when the guts are infected   by ulcerative colitis. For example, in M7, Bacteroides uniformis at cecum is commensalistic toward Flavisolibacter ginsengisoli in the HC group, but becomes amensalistic toward the same species in the UC group. In addition, for some pairs of species, the strength of their interactions may change from HC to UC at the same position. All suggest that certain interspecific interactions may be used as a biomarker to assess health risk.

Discussion
The network model described in this article adds a previously neglected dimension to fully capture the topology of the vast complexity of the gut microbiota. This additional dimension is the spatial distribution of microbes among physically, chemically and immunologically distinct niches in the gut. 12,32, The biogeography of bacteria in the gut is regulated by nutrient selection and immune activation and, meanwhile, it determines the microbial compositions and heterogeneity along the longitudinal axis of the intestines. However, a precise characterization of microbial interaction networks along gut positions is challenged by unavailability of high-density dynamic or perturbed abundance data collected at each position. In this article, we present a spatial model to disentangle this challenge by extracting dynamic snapshots from static data.
The capacity of the new model to dynamize networks in space with no need of dynamic data results from the seamless integration of multiple disciplines through the implementation of highdimensional statistical theory. From a dynamic perspective, evolutionary game theory explains the overall abundance of a microbe to be due to its independent component (reflecting how much a microbe can grow in isolation due to its intrinsic capacity) and dependent component (describing the amount of microbial growth due to influence by other co-existing members). The dynamic formulation of evolutionary game theory using static data is achieved by converting them into their quasi-dynamic representation via allometric scaling theory. By introducing predator-prey theory, we construct a system of qdODE to quantify the independent and dependent components for each microbe. 28,29,30 The estimated independent and dependent components by an optimization technique are coded into graphs, producing one of the most advanced networks -idopNetwork. As compared to the most commonly used Dynamic Bayesian Networks (DBNs) which can identify the causality of interactions from evenly-spaced dynamic data, idopNetwork can extract and capture full properties of microbial interaction networks (including bidirectionality, sign, weight, and feedback loop) from any data domain with any measurement schedule. Through extensive computer simulation studies, idopNetworks were found to perform much better than DBNs in terms of the power and false positive rate of interaction detection. 32 In particular, idopNetworks provide a visualized platform to trace how microbial interaction architecture changes from one location to the next and how these changes impact the reciprocal shift between healthy state to diseased state.
As a proof of concept, we apply idopNetwork to reveal how microbial network structure and organization vary along biogeographical gradients of the gut using the gut microbiota data from a spatial mapping study including ulcerative colitis-infected patients and healthy controls. idopNetwork characterize a dramatic change in network organization (the sign and strength of interactions) from one position to next along and inside the gut (Figures 3, 5, 7, 8). Such an organizational change is much more pronounced in guts inflected by ulcerative colitis than healthy guts, suggesting that the position-dependent change of interaction strength and sign may be a driver of gut shift from a healthy state to an unhealthy state.
An increasing number of studies have showed a great potential to control many diseases by changing the ecological equilibrium of microbes in the gut, 33,34,35 but existing approaches can only treat the whole gut as a mixture, without taking into account position-dependent variability in the gut microbiota. The idopNetwork model can discern spatial changes of microbial interactions and identify the precise gut positions at which microbial functions are most tightly associated with the disease. Spatial data analysis shows that difference between ulcerative colitis guts and healthy guts lies in the shift of ecological interactions between specific microbes from commensalism-dominated cooperation to amensalism-dominant competition at certain gut positions. Phylum Planctomycetes exerts strong commensalism for phylum Verrucomicrobia at transverse colon and cecum in healthy guts, but this commensalism becomes strong amensalism when the guts are inflected by ulcerative colitis. In clinical practice, by shifting Planctomycetes-Verrucomicrobia amensalism back to Planctomycetes-Verrucomicrobia commensalism at these gut positions via medical interventions, ulcerative colitis may be controlled and eliminated.
The idopNetworks can be reconstructed at any phylogenetic level, including a web of crosstalk among bacteria, fungi, and viruses and networks at any taxonomic level for the same type of microbes. While microbial networks at a higher level of taxon, such as phylum, can generalize a general rule of thumb behind microbial community assembly, those at a lower level, such as species, strain, or even genotype, can precisely characterize the detailed roadmap of how individual microbes interact with each other spatially to determine host health. However, a low taxonomic level may include an enormous number of operational taxonomic units (OTU), which makes it computationally infeasible to reconstruct a large-scale network. To circumvent this issue, we implement power equation-functional clustering to classify all OTUs into distinct size-reduced modules according to developmental modularity theory. In network science, this procedure is a bottom-up approach that breaks down any large-scale network into distinct network communities or subnetworks. 30 We use this approach to classify 65 species into seven modules and reconstruct a two-layer network involving all species, at a higher layer of which is a 7-node module-module network and a lower layer of which are seven species-species networks within modules. Two-layer networks display gut position-dependent variability, healthy statedependent variability, and position-state interaction variability in topological organization. The shift of commensalism to amensalism between Bacteroides uniformis and Flavisolibacter ginsengisoli at cecum (within the subnetworks of module M7) could be a driver of healthy guts that turn to ulcerative colitis-infected (Figure 7 vs. Figure 8). Yet, at ileum and transverse colon, these two species consistently maintain an amensalistic relationship, regardless of health state. Thus, Bacteroides uniformis-Flavisolibacter ginsengisoli interaction at these two positions cannot be used to distinguish between healthy guts and ulcerative colitis-infected guts.
Although idopNetworks have many advantages, the above results from our spatial mapping study of the gut microbiota should be interpreted with caution. First, the sample size (23 positions) used may be insufficient to make a rigorous inference. Computer simulation shows that at least 20 samples are required to obtain reasonable power for interaction detection, but increasing sample sizes, such as 50 to 100, are necessary if the measurements of microbial abundance contain a certain level of noises. 28,30 Second, as a proof of concept, our data only contain one healthy control. At least three biological replicates for the control are required to reasonably infer about microbial differences between diseased and health groups. Third, the impact of the gut microbiota on host health is determined by genetic, demographic, and environmental factors. 36 The utility of idopNetwork can be best justified only after it is implemented into a genome-wide association studies aimed at characterizing the combined effects of various genetic and environmental factors on the microbiotahealth interplay. Despite these limitations, the application of idopNetwork in this study provides a starting point for deciphering how ecological interactions transit along biogeographical locations to shape the resilience, stability and diversity of microbial ecosystems and human health from a large-scale data domain. Furthermore, idopNetwork can be brought into play to unleash a broader microbial community assembly including oral cavity, esophagus, and vaginal microhabitats.

Conclusions
The spatial variation of microbes determines how the gut microbiota impacts human health. This impact can be better revealed by inferring spatially dynamic interaction networks. In this article, we develop a model for tracing network dynamics along space with no need of dynamic data. Statistical analysis of microbial data according to this model includes four steps: (i) fitting of the power equation to the e2106103-12 abundance of individual microbes over habitat index, (ii) LASSO-based variable selection implemented to choose a small set of the most significant microbes that are linked with a focal microbe, (iii) network reconstruction based on qdODE, and (iv) functional clustering of all microbes into distinct modules based on their pattern of microbial abundance. The last step allows large-scale multilayer sparse microbial interactome networks across gut space to be reconstructed. Our model can discern interaction changes of any microbes from healthy to unhealthy guts to gain new insight into microbial impact on human health and disease.

Allometric scaling law
Consider a spatial mapping study of the gut microbiota involving multiple subjects. For each subject, the abundance of microbes at different taxa is monitored at seven distinct positions, i.e., cecum, ileum, transverse colon, descending colon, sigmoid colon, and rectum along the gut, and lumen inside the gut (Figure 1). If a gut is viewed as an ecosystem, different spatial positions inside and along the gut form multiple ecological habitats. Let y jkv denote the abundance of microbe j at position k from gut v. Note that a microbe considered here may represent an OTU at any given level, such as individual, strain, species, genus, family, order, class, or phylum. For habitat k from gut v, the total amount of abundance of all colonizing microbes (say m), i.e., H kv ¼ P m j¼1 y jkv , reflects its overall capacity to carry and feed the microbes with essential resources for their survival and propagation, which is defined as the habitat index. From an ecological perspective, the habitat index is determined by a mixture of factors including host genes, diet types and life style, 36,37 a concept similar to the environmental index coined to describe the overall quality of site in terms of the accumulative growth of all plants. 38,39 How much a given microbe within a habitat is expressed is the confounding consequence of its intrinsic capacity and its interactions with all possible biotic and abiotic factors. We thus define the abundance of a microbe in a habitat as the niche index. 37, A habitat forms a microbial community within the gut, with habitat index (H kv ) describing the whole community behavior and niche indices (y jkv ) describing components of the community. From a physical perspective, habitat index and niche index establish a part-whole relationship, which can be described by the power function according to allometric scaling theory. 40 Let n denote the total number of habitats from all subjects (guts). Thus, for a given habitat i (i = 1, . . ., n), this part-whole relationship can be mathematically expressed as where β j is the scaling exponent and α j is an intercept constant of microbe j, which together determine the scaling shape of individual microbes with habitat index. Previous studies have shown that this power equation fits microbial abundance at any taxonomic levels across habitats. 30

Quasi-dynamic evolutionary game theory
Game theory states that a rational player strives to maximize its payoff by choosing an optimal strategy in response to the strategies of other players until the Nash equilibrium is reached. 41,42 As can be seen, such a strategy choice is not arbitrary, but rather includes a rational judgment based on a player's accrued knowledge of the environment affected by other players. However, it seems unreasonable to assume the rationality of microbes in making their decision. We introduce evolutionary game theory, the combined theory of game theory and evolutionary biology, 43, which uses the concept of an evolutionarily stable strategy (ESS) to refine the Nash equilibrium. In an evolving population, any strategy used by a player to maximize its payoff would be constrained by strategies of other players that also strive to maximize their own payoffs and, ultimately, this process through natural selection would optimize the structure and organization of the population, making it reach the maxima of its overall payoff. 44, The dynamic modeling of evolutionary game theory does not specify any individual ESS, but it bears all of the ESS that change in the population, 45,46 in which case the rationality assumption is relaxed.
Although there is no time dimension in our microbiota mapping study, allometric scaling law, described by equation (1), provides a bridge to dynamically model evolutionary game theory by deriving a system of generalized nonlinear predator-prey qdODE with the time derivative replaced by the habitat index derivative. 28,29,30 Each qdODE specifies the overall abundance of a microbe that is determined by its own "strategy" and the "strategies" of its interacting counterparts. The qdODE that quantify the nLV system are expressed as where the abundance of any microbe j expressed at habitat i is decomposed into the independent component (Q j � ð Þ) and dependent components (Q jj 0 � ð Þ). The independent component of microbe j is determined by its intrinsic capacity, which can be fully expressed when it is in isolation, whereas the dependent component of microbe j is the aggregated effect of influences on it by all other microbe j′ (j′ = 1, . . ., j -1, j + 1, . . ., m). Functions, Q j � ð Þ and Q jj 0 � ð Þ, may not have explicit forms, but they can be smoothened by a nonparametric approach, such as B-spline or Legendre orthogonal polynomials (LOP), 47,48 where Θ j and Θ jj 0 are the unknown parameters that specify the nonparametric curves. We code the estimates of Q j � ð Þ and Q jj 0 � ð Þ as nodes and edges, respectively, into quantitative networks as a tool to characterize the structure and organization of microbial community assembly along the gut.

Sparse microbial networks
In practice, the number of microbes for network reconstruction may be very large, thus if the abundance of each microbe involves the effects of all other microbes, ODEs (2) will quickly become intractable. In biology, it is unlikely that each microbe performs an interaction with every other microbe in the community because a fully connected network is not helpful for the organism to maintain its robustness and stability in response to random perturbations. 49,50,51 Therefore, even if there are a number of microbes within a habitat, interaction networks they constitute are likely to be sparse. Sparsely connected networks are consistent with Dunbar's laws. In modeling social networks of non-human primates, Dunbar 52, found that there is a limit to the number of relationships within a network an individual can stably maintain because of a limited size of its neocortex. Dunbar's finding on the limit of social relationships, now known as Dunbar's law, has been considered as a general argument for network reconstruction in a wide range of physical and life sciences. 53, To choose a subset of the most significant microbes that interact with a given microbe, we formulate a multiple regression model that regresses the abundance of the microbe (response) on the abundance of all other microbes (predictors) across habitats. A regularization-based variable selection approach, such as LASSO 54, and its variants, 55,56,57 is implemented to shrink the dimension of links possibly owned by a microbe.
Regression model. We consider each gut position from each individual (healthy or unhealthy) as a sample. A sample represents a microbial community assembly that contain m microbes. Let y j = (y j (W 1 ), . . ., y j (W n )) denote a vector of observed abundance values for microbe j (j = 1, . . ., m) in n samples. Based on the structure of qdODE in equation (2), this microbe's habitat index-varying abundance level can be described by a multiple regression model, expressed as In equation (3), and G jj 0 ðÞ are the habitat indexvarying independent and dependent abundance of microbe j, whose derivatives are Q j ðÞ and Q jj 0 ðÞ of equation (1), respectively, and e j W i ð Þ in equation (4) is the residual error of microbe j at sample i, obeying a multivariate normal distribution with mean vector 0 and sample-dependent covariance matrix for microbe j. We assume that the residual errors of microbial abundance are independent among samples so that P j is structured as the residual variance of microbe j at the same sample and I n is the identity matrix. In equation (4), we have a j W i ð Þ ¼ GðÞ and where X T j is the vector containing m -1 ones and b j (W i ) = (b j1 (W i ), . . ., b jm (W i )) is a vector of the dependent value of microbe j determined by all microbes, except for j.
Group LASSO. LASSO is particularly powerful for the penalty regression analysis of a response on an extremely large number of predictors across a much smaller size of samples. Considering a focal microbe j as a response, we use nonparametric independent parameters a j to fit its G j ðÞ. As predictors, (m -1) microbes contribute to microbe j's dependent component through unknown nonparametric dependent parameters β j = (β j1 , . . ., β j (j-1) , β i(j+1), . . . , β jm ). Thus, we have m -1 groups of dependent parameters that reflects the influence of other microbes on the focal microbe. We implemented group LASSO 55 to select those nonzero groups. The group LASSO estimators of dependent parameters, denoted as = (β j1 , . . ., β j(j-1) , β j(j+1), . . . , β jd i ), where d j m is the number of the most significant microbes that interact with microbe j. It can be obtained by minimizing the following penalized weighted least-square criterion, where y j = (y j (W 1 ), . . ., y j (W n )), μ j = (μ j (W 1 ), . . ., μ j (W n )), and b j = (b j (W 1 ), . . ., b j (W n )) determined by β j through a nonparametric link; λ 1i is a penalty parameter determined by BIC or extended BIC; and Z j = diag{z j (W 1 ), . . ., z j (W n )} where z j (W i ) is a prescribed nonnegative weight function on [W 1 , W n ] with boundary conditions z j (W 1 ) = z j (W N ) = 0. This weight function is used to speed up the rate of convergence. Adaptive group LASSO: In group LASSO, penalty parameters of each group are treated equally, without considering the relative importance of different groups. It has been recognized from traditional linear regression analysis that the over-penalization of parameters for some predictors may reduce the efficiency of parameter estimation and the continuity of variable selection. 55 To overcome this limit of group LASSO, Wang and Leng 56 integrated it with adaptive LASSO to create adaptive group LASSO. This integrative approach selects significant groups by weighted penalty parameters. Weight w j|j′ is obtained as β jjj 0 À 1 2 if β jjj 0 2 > 0 and ∞, otherwise. The adaptive group LASSO estimators of dependent parameters are obtained by minimizing the penalized weighted least-square criterion as follow: where λ 2j is a penalty parameter determined by BIC or extended BIC.
After the most significant links (say d j ≪ m) for each microbe j are detected, we substitute them into nLV qdODE in equation (2) to formulate a sparse representation of the full model. The sparse qdODE are written as where the notation of the independent and dependent component terms is the same as described in equation (2). By solving equation (7) in which the number of incoming links for a microbe j is changed from m to d j through variable selection, we can reconstruct an m-node sparse microbial network.

Reconstructing microbial networks via maximum likelihood
A number of approaches, including non-linear least-squares and maximum likelihood, can be implemented to solve the shrunk qdODE from equation (2). Since we argue that microbial community assembly tends to reach its maximum overall payoff guided by evolutionary game theory, a maximum likelihood approach that is founded on the most probable existence of all microbes is chosen for our qdODE solving. Let ϕ ¼ μ; P ð Þ denote the parameters that explain the regression model. The likelihood function of ϕ given the abundance data is written as L μ; where f(⋅) is the n-dimensional m-variate normal distribution for m microbes across n samples with mean vector μ and covariance matrix P . Specifically, we model the mean vector by equation (7) subject to variable selection, i.e., Now, we implement a power equation-based LOP nonparametric approach to fit G j ðÞ by parameters Θ j and fit G jj 0 ðÞ by parameters Θ jj 0 . We use a statistically robust approach for modeling the covariance matrix, where P j is the sample-dependent covariance matrix of microbe j, and P j 1 j 2 is the sampledependent covariance matrix between microbes j 1 and j 2 . P j 1 j 2 is structured as P j 1 j 2 σ j 1 j 2 I n , where σ j 1 j 2 is the residual covariance of microbes j 1 and j 2 at the same sample. If there is a mix of static and temporal data involved, we may implement the first-order autoregressive (AR(1)) model to model autocorrelative structure of the covariance.
Under mean-covariance structure modeling by equations (9) and (10), model parameters ϕ ¼ μ; P ð Þ become model parameters ϕ 0 ¼ Θ j ; Θ jj 0 j ¼ 1; . . . ; m; j 0 ¼ 1; . . . ; j À 1; j þ 1; . . . ; d j Þ; À � σ 2 j ; σ j 1 j 2 j 1 �j 2 ¼ 1; . . . ; m ð Þ�, whose optimal solution can be obtained, by maximizing the likelihood (8), as Intuitively, this maximization implies an optimal topological structure and organization by which microbes interact with each other to maximize the overall abundance of microbial community assemble as a whole. The convex optimization formulation under equation (11) ensures the stability and sparsity of the network reconstructed from qdODE of equation (7). Since no constraints are given on the number of outgoing links, the resulting network can be high-dimensional and of a large-scale size.

Topological dissection of microbial networks
Inferred microbial networks via maximizing the likelihood function (8) meet three essential properties of networks, i.e., causality (derived from directed qdODE), sparsity (due to variable selection), and stability (assured under optimization). Apart from the methodological advantages, these networks have many biological merits. First, they are informative because they can capture the full properties of microbial interactions, including bidirectionality, sign, weight, and feedback loop. 28 Second, they are dynamically visualizable by reconstructing a series of quasi-dynamic networks along HI gradient from the instantaneous estimates of independent and dependent components for each sample. Third, the networks reconstructed by qdODE can omnidirectionally capture ecological interactions that occur among microbes. It can cover all possible types of microbial interactions, including mutualism (two microbes promote each other by producing factors that are beneficial for both interacting parties), antagonism (two microbes inhibit each other), commensalism (one microbe promotes its partner whereas the latter does not affect the former), amensalism (one microbe inhibits the other and the other is neutral), and parasitism (one microbe inhibits the other but the latter promotes the former). The opposite to parasitism is altruism (one microbe promotes the other but the latter inhibits the former). 25,49,52 A microbe may actively manipulate other microbes (by promoting or inhibiting the latter) and, meanwhile, it may be passively manipulated by other microbes. In an idopNetwork, one can identify the numbers of such active links and passive links for each microbe. If a microbe has more active links than passive links, it is regarded as a leader microbe. If a microbe's active links are more than the average of all microbes (i.e., connectivity), then this microbe is a mighty hub or keystone microbe that is believed to play a pivotal role in maintaining microbial communities. If a microbe has less links, including active and passive, than the average, it is a solitary microbe. The ecological interpretation of these strategies will stimulate researchers to explore the mass, energetic, or signal basis of microbial interactions. 53 Fourth, taking the means of the estimates of independent and dependent components for a subject from qdODE of equation (7), we can identify a network specifically for this subject. Such personalized microbial networks may provide unique information for designing personalized medicine based on the gut microbiota. Taken together, the qdODE-based networks are informative, dynamic, omnidirectional, and personalized, thus call ideopNetwporks. 28

Inferring context-specific microbial networks
In general, gut-microbial studies are designed in a case-control cohort fashion, which allows microbial interactions to be compared between a diseased group and healthy group. This can be achieved by reconstructing context-specific networks. Consider a panel of subjects who are classified into C groups or contexts (such as different races, sexes, use vs. no use of a drug, etc.). Here, we formulate a likelihood function which is the same as equation (8), but the mean vector is now modeled as μ ¼ μ 1 ; . . . ; μ m À � ¼ μ 1 W 1 ð Þ; . . . ; μ 1 W n ð Þ; . . . ; μ m W 1 ð Þ; . . . ; μ m W n ð Þ À � #! (12) and the covariance matrix modeled similarly as above. By plugging in the MLEs of mean vectors (9) and (12) into likelihood (8), we obtain the likelihood values L 1 (assuming that there is no difference among C contexts) and L 2 (assuming that there are differences among C contexts), respectively. We further estimate the log-likelihood ratio (LR), as a statistic used to test if n samples should be sorted into C contexts. By reshuffling n samples randomly into C groups, we calculate the LR value. If this permutation procedure is repeated 1,000 times, we obtain the 95th percentile from 1,000 LR values and use it as a critical threshold.

Patient recruitment and ethics
Six volunteers were recruited for the study at the Department of Gastroenterology and Hepatology of Tianjin Medical University General Hospital. Disease severity in five patients infected with UC was assessed by the modified Mayo endoscopic score. One patient with colonic polyp was marked as a non-UC control (HC). Ethics approval was received from the Tianjin Medical University General Hospital Clinical Research Ethics Committee. All patients signed the informed consent form prior to their operation, and they received polyethylene glycol-based bowel preparation for colonoscopy. Demographic data and clinical characteristics of the UC and non-UC patients are shown in Table 1.

Sample collection and brush sampling
Intestinal sampling involves the lumen and six positions long the gut, ileum, cecum, transverse colon, descending colon, sigmoid colon, and rectum ( Table 1). The specimens were sampled from three to four positions in each patient. A total of 23 specimens were collected from four UC patients and one HC.

DNA extraction and amplicon sequencing
The QIAamp DNA Mini Kit (Qiagen, Inc, Hilden, Germany) was used to extract DNA from biopsies and brush specimens, according to the manufacturer's instructions. Primers 515 F and 806 R, each with a barcode and adaptor sequences for highthroughput sequencing, were utilized to amplify the bacterial 16S rRNA gene V4 region; 16S rRNA sequencing was then performed on the Illumina HiSeq2500 platform by Novogene (Beijing, China).

Bioinformatics analysis
After removing the barcodes and primers, we merged the reads of each sample using FLASH (version 1.2.7) software to obtain the raw reads. 58 According to the raw read quality control process of Quantitative Insights Into Microbial Ecology (QIIME, version 1.7.0), 59 merged raw reads were qualified to generate clean reads: raw reads were cut off at the first base of three or more continuous low-quality bases (the default quality threshold was 19); reads with continuous high-quality bases less than 75% were further filtered out. Chimeras were identified and removed through the UCHIME Algorithm against the Gold database to generate effective reads. 60 Uparse was used to cluster all effective reads into OTUs with 97% identity; 61 the highest frequency sequence in each OTU was selected as the representative sequence. The taxonomies were assigned using the Ribosomal Database Project (RDP) classifier 62 against the Greengenes database (the confidence threshold was 80%), then taxonomic information was obtained and the community composition of each sample was summarized at each level. 63 The sequences of all samples were normalized based on the sample with the lowest number of sequences (47,986 sequences). The absolute abundance levels of microbes at different taxonomic levels were inferred by taking the product of relative abundance (assessed by 16S rRNA gene amplicon sequencing) and bacterial BF: biopsy forceps; DSB: disposable specimen brush; PSB: protected specimen brush DSB used clinically has no plugging structure and is easily disturbed by body fluids or tissues. To address the defects of the existing technology, we developed a PSB as a contrast, which set up a closed plugging device at the end of the DSB seeker using polyethylene glycol. The plugging portion can be tightly integrated with the inner wall of the sheath. The brush was enclosed in the sheath, introduced through the colonoscope channel, and exposed to the external environment when the brush was sent out to the front of the colonoscope. Specific sampling parts were rubbed and rotated around bristles to obtain superficial microbes. After sampling, the brush was retracted into the sheath, then slowly extracted from the colonoscope channel. Samples were immediately placed on liquid nitrogen and stored at -80°C until DNA extraction.