EVpedia: an integrated database of high-throughput data for systemic analyses of extracellular vesicles

Secretion of extracellular vesicles is a general cellular activity that spans the range from simple unicellular organisms (e.g. archaea; Gram-positive and Gram-negative bacteria) to complex multicellular ones, suggesting that this extracellular vesicle-mediated communication is evolutionarily conserved. Extracellular vesicles are spherical bilayered proteolipids with a mean diameter of 20–1,000 nm, which are known to contain various bioactive molecules including proteins, lipids, and nucleic acids. Here, we present EVpedia, which is an integrated database of high-throughput datasets from prokaryotic and eukaryotic extracellular vesicles. EVpedia provides high-throughput datasets of vesicular components (proteins, mRNAs, miRNAs, and lipids) present on prokaryotic, non-mammalian eukaryotic, and mammalian extracellular vesicles. In addition, EVpedia also provides an array of tools, such as the search and browse of vesicular components, Gene Ontology enrichment analysis, network analysis of vesicular proteins and mRNAs, and a comparison of vesicular datasets by ortholog identification. Moreover, publications on extracellular vesicle studies are listed in the database. This free web-based database of EVpedia (http://evpedia.info) might serve as a fundamental repository to stimulate the advancement of extracellular vesicle studies and to elucidate the novel functions of these complex extracellular organelles.

Communication between cells and the environment is essential for both unicellular and multicellular organisms. Organisms ranging from simple unicellular ones (e.g. archaea; Gram-positive and Gram-negative bacteria) to complex multicellular ones secrete nano-sized extracellular vesicles (EVs) for intercellular communication (1–4). These membrane vesicles are spherical bilayered proteolipids with a mean diameter of 20–1,000 nm (2,3,5,6), which are known to contain various bioactive molecules, including proteins, nucleic acids (mRNA, microRNA, rRNA, and tRNA), and lipids (3,6–11). EVs derived from archaea and Gram-positive bacteria are called membrane vesicles (1,2), whereas Gram-negative bacterial EVs are called outer membrane vesicles (3). Mammalian cells secrete exosomes and ectosomes (also known as microvesicles) either constitutively or in a regulated manner (4).
Although EVs play several physiological and pathological roles (12,13) and occupy an emerging position in the field of biomarker discovery (14,15), they are difficult to study because they contain various bioactive molecules. To address this problem, high-throughput analyses have recently been performed on prokaryotic and eukaryotic EVs. For example, many mass spectrometry-based proteomic studies, microarray or next-generation sequencing-based transcriptomic studies, [...] (16,17), which was recently updated into Vesiclepedia (http://www.microvesicles.org; 18). However, there has been no resource on vesicular components (proteins, nucleic acids, and lipids) derived from diverse types of prokaryotic and eukaryotic cells. In addition, analytical tools for Gene Ontology enrichment analysis, network analysis of vesicular proteins and mRNAs, and comparison of vesicular datasets by ortholog identification have not been developed. These types of systematic analyses of vesicular components provide global insights into the mechanisms involved in vesicular cargo-sorting and EV biogenesis as well as the pathophysiological roles of EVs. For example, we recently showed that mammalian vesicular proteins are physically and functionally interconnected to form functional modules involved in EV biogenesis and functions; those data suggest that an EV is a nano-sized extracellular organelle (i.e. nanocosmos) rather than cellular dust (19).
Here, we present EVpedia (http://evpedia.info), which is an integrated database of high-throughput datasets from EVs launched in January of 2012 with the latest update in November of 2012 (Fig. 1). EVpedia provides information on proteins, mRNAs, miRNAs, and lipids enclosed in prokaryotic, non-mammalian eukaryotic, and mammalian EVs. Moreover, EVpedia also provides an array of tools, such as the search and browse of vesicular components, Gene Ontology enrichment analysis, network analysis of vesicular components, and a comparison of vesicular datasets by ortholog identification. In addition, publications on EV studies are listed in the database. This free web-based database of EVpedia might serve as a fundamental repository to stimulate the advancement of EV studies and to elucidate the novel functions of these complex extracellular organelles.

Overall structure of EVpedia
For a systematic exploration of high-throughput datasets from prokaryotic and eukaryotic EVs, EVpedia has four functional modules: (a) database of high-throughput datasets, (b) search and browse of database, (c) identification of orthologous vesicular proteins and (d) bioinformatic analyses of vesicular components.
A total of 230,937 vesicular components from 190 high-throughput datasets from 130 high-throughput studies were collected in the current EVpedia database (Table I).
Among the 190 high-throughput datasets, 166 were derived from eukaryotes and 24 from prokaryotes. Vesicular high-throughput datasets, detailed methods for EV isolation, and high-throughput analysis for each dataset are arranged as tables under the "Experiment" menu. User-requested lists of vesicular components are provided by the search and browse function. A new vesicular high-throughput dataset can be submitted to the database through the "Upload" menu.
To analyze protein or mRNA lists, EVpedia provides Gene Ontology enrichment analysis and network analysis. In addition, to compare vesicular proteomes and transcriptomes (mRNA) from different species, EVpedia provides ortholog identification among those species. Based on the ortholog information, one can compare lists of vesicular proteins or mRNAs among different species by set analysis. All of these analyses can be applied both to molecule lists from EVpedia and to new molecule lists supplied by users.
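The ortholog-based comparison described above can be illustrated with a minimal Python sketch. It assumes a precomputed table that maps each species-specific identifier to an ortholog group; the identifiers, the group names, and the function name to_ortholog_groups are hypothetical and are not taken from EVpedia.

```python
from typing import Dict, Iterable, Set

# Hypothetical ortholog table: species-specific identifier -> ortholog group.
# These entries are invented for illustration; a real table would come from
# an ortholog resource.
ORTHOLOG_GROUP: Dict[str, str] = {
    "human:CD9": "OG_tetraspanin_CD9",
    "mouse:Cd9": "OG_tetraspanin_CD9",
    "human:PDCD6IP": "OG_alix",
    "mouse:Pdcd6ip": "OG_alix",
    "human:ALB": "OG_albumin",
}


def to_ortholog_groups(ids: Iterable[str], table: Dict[str, str]) -> Set[str]:
    """Map identifiers to ortholog groups, skipping ids with no known group."""
    return {table[i] for i in ids if i in table}


# Two made-up vesicular protein lists from different species.
human_ev = ["human:CD9", "human:PDCD6IP", "human:ALB"]
mouse_ev = ["mouse:Cd9", "mouse:Pdcd6ip", "mouse:Unmapped1"]

human_groups = to_ortholog_groups(human_ev, ORTHOLOG_GROUP)
mouse_groups = to_ortholog_groups(mouse_ev, ORTHOLOG_GROUP)

# Once both lists live in the same ortholog-group space,
# ordinary set operations give the cross-species comparison.
shared = human_groups & mouse_groups
human_only = human_groups - mouse_groups

print(sorted(shared))
print(sorted(human_only))
```

Mapping to ortholog groups first is what makes lists from different species comparable; identifiers without a known ortholog are simply dropped in this sketch, which a real analysis might prefer to report separately.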
An overall comparison of EVpedia with Exocarta and Vesiclepedia is shown in Tables I and II. These three web-based repositories contain proteomic, transcriptomic and lipidomic studies of non-mammalian eukaryotic and mammalian EVs. However, only EVpedia provides additional proteomic and lipidomic studies on prokaryotic EVs (Table I). Moreover, EVpedia also provides information on EV-related publications (Table I) as well as an array of analytic tools (Table II), including (a) Gene Ontology enrichment analysis of vesicular components, (b) network analyses of vesicular components and (c) set analysis, i.e. a comparison of vesicular proteome and transcriptome datasets with ortholog identification. Currently, EVpedia collects vesicular proteomes identified only by high-throughput studies, not by immunoblotting or immunoelectron microscopy. We will expand the EVpedia database by adding these low-throughput protein datasets to address hypothesis-driven biological questions on EVs.

Gene ontology enrichment, network, and set analyses of EVpedia
The "Analysis" menu in EVpedia provides an array of bioinformatic analysis tools: (a) Gene Ontology enrichment and network analyses of the vesicular components and (b) set analysis among two or more different sets of vesicular proteins or mRNAs through ortholog identification. Through Gene Ontology enrichment analysis in the "Analysis – Gene Ontology enrichment analysis" menu, the enriched terms (i.e. Gene Ontology biological process, molecular function, and cellular component) of vesicular components can be obtained (e.g. mRNAs in Fig. 3a). Via network analysis in the "Analysis – Network analysis" menu, functional relationships among vesicular components can be drawn as biological networks (e.g. mRNAs in Fig. 3b).
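As an illustration of what a Gene Ontology enrichment analysis computes, the following Python sketch scores each term by a one-sided hypergeometric test, a commonly used approach. The text does not state which statistical test EVpedia applies, so the test itself is an assumption, and the gene identifiers, term names, and function names are all hypothetical.

```python
from math import comb
from typing import Dict, Set


def hypergeom_pvalue(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    upper = min(n, K)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def enrich(query: Set[str],
           annotations: Dict[str, Set[str]],
           background: Set[str]) -> Dict[str, float]:
    """Return an over-representation p-value for each term hit by the query."""
    query = query & background
    result: Dict[str, float] = {}
    for term, genes in annotations.items():
        genes = genes & background
        k = len(query & genes)
        if k == 0:
            continue  # term not represented in the query list
        result[term] = hypergeom_pvalue(k, len(query), len(genes), len(background))
    return result


# Invented toy data: 20 background genes and two annotated terms.
background = {f"g{i}" for i in range(20)}
annotations = {
    "term_A": {f"g{i}" for i in range(0, 5)},    # 5 annotated genes
    "term_B": {f"g{i}" for i in range(10, 20)},  # 10 annotated genes
}
query = {"g0", "g1", "g2", "g10"}

pvalues = enrich(query, annotations, background)
for term, p in sorted(pvalues.items(), key=lambda item: item[1]):
    print(term, round(p, 4))
```

This sketch omits multiple-testing correction, which would normally be applied when many terms are tested at once.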
Moreover, EVpedia provides comparative analyses among two or more different sets of vesicular components through the "Analysis – Set analysis" menu. For example, we selected two sets of vesicular proteins (22): those from the Homo sapiens colorectal cancer cell lines SW480 and SW620 (Fig. 4). The Venn diagram in Fig. 4a shows the number of members in the set intersection and the set differences between SW480 and SW620. The list for each subset in the Venn diagram can be obtained for further analyses, such as network analysis (Fig. 4b). Note that all of these analyses can be applied to a new list of proteins or mRNAs, including a newly uploaded vesicular proteome or mRNA transcriptome.
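The set analysis and the follow-up network analysis can be sketched in Python as below. The sketch splits two lists into the three Venn regions and then keeps only the interactions whose endpoints both fall in the shared region. The protein identifiers and the interaction list are placeholders, not actual SW480 or SW620 data, and the function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def venn_partition(a: Set[str], b: Set[str]) -> Dict[str, Set[str]]:
    """Split two sets into the three regions of a two-set Venn diagram."""
    return {"a_only": a - b, "both": a & b, "b_only": b - a}


def induced_edges(nodes: Set[str],
                  edges: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep only the edges whose two endpoints are both in `nodes`."""
    return [(u, v) for u, v in edges if u in nodes and v in nodes]


# Placeholder identifiers standing in for two vesicular proteomes.
set_a = {"P1", "P2", "P3", "P4"}
set_b = {"P3", "P4", "P5"}

# Placeholder interaction list (assumed to come from an external resource).
interactions = [("P1", "P2"), ("P3", "P4"), ("P4", "P5")]

parts = venn_partition(set_a, set_b)
shared_network = induced_edges(parts["both"], interactions)

print({name: len(members) for name, members in parts.items()})
print(shared_network)
```

Any of the three regions can be passed to induced_edges in the same way, which mirrors obtaining the list for one Venn subset and sending it on to network analysis.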
As shown in Fig. 5, the number of studies on prokaryotic and eukaryotic EVs is growing rapidly, indicating that the field is expanding. In addition, the principal investigators who have published papers on EVs are listed with their EV-study publications in tables under the "Principal investigators" menu. Users can survey the major researchers and their research fields to gain further insight for their own EV studies.

System requirements of EVpedia
When building EVpedia, we followed the international web standards that are compatible with most existing systems for web browsing. However, we recommend the following system requirements for best performance: operating system, MS Windows 7; internet browser, Google Chrome; resolution, 1,280 × 1,024.
We have tested the performance of EVpedia on the following systems: operating systems, MS Windows XP/7 and Apple OS X for personal computers, and Google Android and Apple iOS for cell phones and tablets; internet browsers, Google Chrome, Microsoft Internet Explorer, Apple Safari, and Mozilla Firefox. For network analysis, EVpedia requires Java Web Start (http://www.oracle.com/technetwork/java/javase/downloads/index.html).

Conclusion and future directions
EVpedia is an integrated database of high-throughput datasets from EVs derived from prokaryotes and eukaryotes. This database is scheduled to be updated every six months. This free web-based database should be a useful resource to elucidate fundamental roles of EVs derived from prokaryotes and eukaryotes.
Furthermore, unified criteria for high-throughput datasets are needed to ensure high-quality EV datasets. First, coherent standards for EV preparation should be defined. Although the detailed procedures vary among studies, most studies have used combinations of filtration, differential centrifugation, and density gradient centrifugation to purify EVs. In addition, coordinated standards for high-throughput data production are required because there are many systems and programs for producing and analyzing high-throughput data, such as mass spectrometry-based proteomics, microarray-based transcriptomics, next-generation sequencing-based transcriptomics, and chromatography-based lipidomics (24). Therefore, in order to bring together the dispersed data on EVs, it is crucial to establish clear and detailed guidelines for the preparation of EVs and the analysis of their high-throughput data.