4DNvestigator: Time Series Hi-C and RNA-seq Data Analysis Toolbox

Motivation: Comprehensive investigation of the genome structure and function dynamics underlying cell phenotype produces vast amounts of high-dimensional, multilayered data. New methods are required to organize these data into an informative framework. Here we provide a comprehensive data analysis toolbox guided by network theory.
Results: We present the “4DNvestigator”, a user-friendly toolbox for the analysis of time-series genome structure, measured by genome-wide chromosome conformation capture (Hi-C), and genome function, measured by RNA sequencing (RNA-seq). Our toolbox provides network-based data analysis tools such as single-layer/multilayer network centrality and network entropy to characterize the 4D Nucleome. The toolbox also contains statistical methods for comparing genomic structure at multiple genomic scales.
Availability: https://github.com/lindsly/4DNvestigator
Contact: indikar@umich.edu


Introduction
The genome is a dynamical system where changes in genome structure and function over time affect cell phenotype. The relationship between genome structure, gene transcription, and cellular phenotypes is referred to as the 4D Nucleome (4DN) (1,2). To analyze the 4DN, genome-wide chromosome conformation capture (Hi-C) and RNA sequencing (RNA-seq) are used to observe genome structure and function, respectively. The availability and volume of Hi-C and RNA-seq data are expected to increase as high-throughput sequencing costs decline, so the development of methods to analyze these data is imperative.
The relationship between genome structure and function has been studied previously (3-5), yet comprehensive and accessible tools for 4DN analysis are underdeveloped. Unlike these earlier studies, the 4DNvestigator provides a unified toolbox that loads time series Hi-C and RNA-seq data, extracts important structural and functional features, and applies both established and novel 4DN data analysis methods. The 4DNvestigator also includes network-based approaches, such as von Neumann network entropy, which provides an efficient way to track the uncertainty of the dynamic genome, and multilayer network centrality, which elucidates structural changes.

Methods
The 4DNvestigator takes processed Hi-C and RNA-seq data as input, along with a metadata file which describes the sample and time point for each input Hi-C and RNA-seq file. The 4DNvestigator workflow is depicted in Fig. 1A, and a Getting Started document is provided to guide the user through the main functionalities of the 4DNvestigator.

4DN Feature Analyzer
The "4DN Feature Analyzer" quantifies and visualizes how much a genomic region changes in structure and function over time. To achieve this, we adopt a network point of view, where genomic regions are nodes and interactions between genomic regions are edges. We can then quantify the importance of each node using network centrality (6), which has been shown to reflect the dynamics of genome structure over time (5). Centrality measures and gene expression (RNA-seq) are then integrated to define the structure-function state of each genomic region at each time point (Fig. 1B).
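The idea of a per-region structure-function state can be illustrated with a small sketch. It is not the toolbox's MATLAB implementation: it assumes weighted degree as the only centrality measure, log2(expression + 1) as the functional feature, and the summed Euclidean step length between consecutive time points as the measure of change. All function names are hypothetical, and a real analysis would normalize the features before comparing them.

```python
import math

def degree_centrality(contacts):
    """Weighted degree of each region: its summed contacts with all other regions."""
    n = len(contacts)
    return [sum(contacts[i][j] for j in range(n) if j != i) for i in range(n)]

def structure_function_states(contacts, expression):
    """Pair each region's centrality with its log2-transformed expression."""
    degrees = degree_centrality(contacts)
    return [(d, math.log2(e + 1.0)) for d, e in zip(degrees, expression)]

def region_change(hic_series, rna_series):
    """Total distance each region travels in (centrality, expression) space over time."""
    states = [structure_function_states(h, r) for h, r in zip(hic_series, rna_series)]
    n_regions = len(states[0])
    change = []
    for i in range(n_regions):
        total = 0.0
        for before, after in zip(states[:-1], states[1:]):
            total += math.hypot(after[i][0] - before[i][0], after[i][1] - before[i][1])
        change.append(total)
    return change

# Toy data (illustrative only): 3 regions observed at 2 time points.
hic_t0 = [[0, 2, 1], [2, 0, 3], [1, 3, 0]]
hic_t1 = [[0, 4, 2], [4, 0, 3], [2, 3, 0]]
rna_t0 = [0, 3, 7]
rna_t1 = [15, 3, 7]
changes = region_change([hic_t0, hic_t1], [rna_t0, rna_t1])
```

In this toy example, region 0 gains both contacts and expression, so it moves farthest in the structure-function plane.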

Multilayer Networks
In addition to combining genome features over time, the 4DNvestigator enables modeling of dynamic genome structure as a multilayer network, where each layer corresponds to a snapshot of the network at one time point. Nodal degree is an important network centrality measure, but in a multilayer network, one needs to know both the degree within each layer and how the degree is distributed across layers. In the 4DNvestigator, we use the multiplex participation coefficient to evaluate the heterogeneity of nodal degrees in dynamic biological networks.
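As a hedged illustration, the sketch below computes the multiplex participation coefficient in its commonly used form, P_i = M/(M-1) * (1 - sum over layers of (k_i^[a] / o_i)^2), where k_i^[a] is the degree of node i in layer a, o_i is its degree summed over all M layers, and M is the number of layers. The function name and the convention of returning 0 for nodes without any edges are assumptions of this example, not the toolbox's API.

```python
def multiplex_participation(layers):
    """Participation coefficient of every node in a multiplex network.

    layers: list of M adjacency matrices (one per time point), all defined
    on the same set of nodes. Values near 1 mean a node's degree is spread
    evenly across layers; 0 means all of its edges sit in a single layer.
    """
    m = len(layers)
    if m < 2:
        raise ValueError("at least two layers are required")
    n = len(layers[0])
    coefficients = []
    for i in range(n):
        # Degree of node i within each layer (self-contacts ignored).
        degrees = [sum(layer[i][j] for j in range(n) if j != i) for layer in layers]
        overlap = sum(degrees)  # degree summed over all layers
        if overlap == 0:
            coefficients.append(0.0)  # assumed convention for isolated nodes
            continue
        spread = 1.0 - sum((k / overlap) ** 2 for k in degrees)
        coefficients.append(m / (m - 1) * spread)
    return coefficients

# Toy example (illustrative only): 3 nodes, 2 time points.
layer_t0 = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
layer_t1 = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
participation = multiplex_participation([layer_t0, layer_t1])
```

Here node 1 has the same degree in both layers and therefore reaches the maximum value, while node 2 has edges in only one layer and scores 0.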

Network Entropy
Entropy measures the order within a system, where higher entropy corresponds to more disorder (7). The 4DNvestigator applies this measure to Hi-C data to quantify the order in chromatin structure. Biologically, genomic regions with high entropy likely correlate with high proportions of euchromatin (8,9), and entropy can be used to quantify stemness, since cells with high pluripotency are less defined in their chromatin structure (10). Because Hi-C is a multivariate measurement (each contact involves two variables, the two loci), we use a multivariate entropy, the von Neumann entropy (VNE).
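A minimal sketch of the entropy computation follows. It assumes the input is a symmetric positive semidefinite matrix derived from Hi-C contacts (for example, a correlation matrix), which is scaled to unit trace so that it can act as a density matrix; the VNE is then the negative sum of lambda * ln(lambda) over its eigenvalues. The preprocessing choice, the function names, and the small Jacobi eigenvalue solver (used only to keep the example dependency-free) are assumptions of this example rather than the toolbox's MATLAB code.

```python
import math

def symmetric_eigenvalues(matrix, tol=1e-24, max_sweeps=100):
    """Eigenvalues of a real symmetric matrix via cyclic Jacobi rotations."""
    a = [list(row) for row in matrix]  # work on a copy
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:  # off-diagonal part is numerically zero
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Smallest rotation angle that zeroes the (p, q) entry.
                diff = a[q][q] - a[p][p]
                b = a[p][q] if diff >= 0 else -a[p][q]
                theta = 0.5 * math.atan2(2.0 * b, abs(diff))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return [a[i][i] for i in range(n)]

def von_neumann_entropy(matrix):
    """VNE of a symmetric PSD matrix after scaling it to unit trace."""
    trace = sum(matrix[i][i] for i in range(len(matrix)))
    density = [[x / trace for x in row] for row in matrix]
    eigenvalues = symmetric_eigenvalues(density)
    # Convention: 0 * ln(0) = 0, so (numerically) zero eigenvalues are skipped.
    return -sum(lam * math.log(lam) for lam in eigenvalues if lam > 1e-15)

# Illustrative inputs: fully uncorrelated vs. perfectly correlated regions.
disordered = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
ordered = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
entropy_disordered = von_neumann_entropy(disordered)
entropy_ordered = von_neumann_entropy(ordered)
```

The uncorrelated matrix attains the maximum entropy for three regions, ln(3), whereas the perfectly correlated matrix has a single nonzero eigenvalue and an entropy of zero.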

Additional 4DNvestigator Tools
The 4DNvestigator includes a suite of previously developed Hi-C and RNA-seq analysis methods. Hi-C A/B compartments can be extracted using previously defined methods (3,11). Regions that change compartments between samples are automatically identified. The 4DNvestigator also includes MATLAB scripts for differential gene expression analysis based on established methods (12). In addition, we include a method for testing the equality of correlation matrices proposed by Larntz and Perlman (13). For more details on all methods, data, and analysis validation, see Supplementary Materials.
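The correlation matrix comparison can be sketched as follows for the two-sample case. The example assumes the usual form of the Larntz-Perlman procedure: each off-diagonal correlation is Fisher z-transformed, the largest standardized difference between the two matrices is taken as the test statistic, and a Sidak correction accounts for the number of entries compared. It also assumes both matrices were estimated from the same number of observations. The function name and return values are hypothetical and do not mirror the toolbox's interface.

```python
import math

def larntz_perlman_two_sample(corr_a, corr_b, n_samples, alpha=0.05):
    """Test H0: two p x p correlation matrices are equal.

    Off-diagonal correlations must lie strictly between -1 and 1.
    Returns (statistic, Sidak-adjusted p-value, reject_h0).
    """
    p = len(corr_a)
    n_pairs = p * (p - 1) // 2
    scale = math.sqrt((n_samples - 3) / 2.0)
    statistic = 0.0
    for i in range(p):
        for j in range(i + 1, p):
            # Fisher z-transform makes each correlation approximately normal.
            diff = math.atanh(corr_a[i][j]) - math.atanh(corr_b[i][j])
            statistic = max(statistic, scale * abs(diff))
    # P(|Z| <= t) for a standard normal Z equals erf(t / sqrt(2)); the Sidak
    # correction raises it to the number of off-diagonal pairs tested.
    p_value = 1.0 - math.erf(statistic / math.sqrt(2.0)) ** n_pairs
    return statistic, p_value, p_value < alpha

# Toy example (illustrative only): weakly vs. strongly correlated pair of loci.
weak = [[1.0, 0.1], [0.1, 1.0]]
strong = [[1.0, 0.9], [0.9, 1.0]]
stat_diff, p_diff, reject_diff = larntz_perlman_two_sample(weak, strong, n_samples=53)
stat_same, p_same, reject_same = larntz_perlman_two_sample(weak, weak, n_samples=53)
```

Taking the maximum over entries means a single strongly differing pair of loci is enough to reject equality, which suits Hi-C comparisons where changes may be local.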

Conclusion
The 4DNvestigator provides methods to analyze time series Hi-C and RNA-seq data in a rigorous yet automated manner. The combined analysis of network centrality and RNA-seq over time can be easily performed using the 4DN Feature Analyzer. Multilayer network centrality and network entropy can also be applied to characterize the structural disorder in a region over multiple samples or time points.