Vitellogenin gene family in vertebrates: evolution and functions

Abstract The vitellogenin gene family comprises a variable number of genes encoding polypeptides that are precursors of yolk proteins and their derivatives in oviparous and ovoviviparous vertebrates. Understanding which mechanisms have shaped the evolution of the vtg gene family is an attractive field of research. The primary aim of this review is to summarize the evolutionary hypotheses that have been proposed over recent decades, highlighting the differences between the proposed models. Overall, in vertebrates the evolutionary history of this gene family is the result of complex modifications deeply influenced by events such as whole genome duplications (WGDs) and lineage-specific gene losses and duplications. Interestingly, the most recent hypothesis dates the origin of the vitellogenin gene cluster to the common ancestor of gnathostomes. In addition, in recent decades several works have provided evidence of non-nutritional functions, such as antibacterial, immunological and antioxidant activities, going beyond the classical view of vitellogenin as a simple source of nourishment for the developing embryo.


Introduction
This review focuses on the vitellogenin (vtg) gene family in vertebrates. The members of this family encode polypeptides that are precursors of yolk proteins, the main energy source for the developing embryos in oviparous and ovoviviparous species (Wahli et al. 1981). Several works have shown that a variable number of vtg members is present in different lineages. These observations have also been corroborated by the considerable increase in data from sequenced genomes. Understanding which mechanisms have shaped the evolution of the vtg gene family is an attractive field of research. The primary aim of this review is to summarize the evolutionary hypotheses that have been proposed over recent decades, highlighting the differences between the proposed models. Indeed, the increasing amount of information about vtg genes in several species has made it possible to perform comparative analyses that have superseded some hypotheses and suggested new ones. In particular, a strong contribution in this regard has come from the advent of next-generation sequencing technologies, which have provided new information about the number of genes and their chromosomal arrangement. The most recent hypothesis on the evolution of the vtg gene family, by Biscotti et al. (2018), takes advantage of this knowledge and, together with the results of an exhaustive phylogenetic analysis, offers an intriguing view of vtg gene family evolution starting from the vertebrate ancestor, tracing the mechanisms underlying this process.
Although most vertebrates are oviparous or ovoviviparous, a transition from yolk-dependent nourishment toward lactation and placentation has been observed during mammal evolution (Brawand et al. 2008). This was followed by a progressive loss of vtg genes and the onset of new genes involved in the development of embryonic annexes.
Furthermore, another interesting aspect of the vtg gene family is the presence of multiple forms, which raises a series of questions about the role of individual Vtgs. Thus, in parallel with the increase in knowledge about vtg gene family evolution, several works have investigated the roles of Vtgs and their derived yolk proteins, providing evidence of additional non-nutritional functions.

Vitellogenin structure and synthesis
Vitellogenin is a large multidomain apolipoprotein typically produced in females but also present at lower levels in males (Canapa et al. 2007, 2012; Barucca et al. 2010; Verderame & Scudiero 2017). This protein is mainly synthesized in the liver as a result of coordinated endocrine stimulation that involves the brain, ovary, and liver. The production of vitellogenin is seasonal or cyclic depending on gonadotropins. Several factors, such as nutritional status and seasonal changes (for example in water temperature), induce the production of gonadotropin-releasing hormone (GnRH) by the brain (hypothalamus), which stimulates the production of follicle-stimulating hormone (FSH) by the pituitary (Bhandari et al. 2003). This hormone in turn induces the ovarian follicle to secrete estradiol-17β (E2), which binds to specific estrogen receptors on hepatocytes. This leads to the induction and transcription of vtg genes in the liver. The vitellogenin produced is post-translationally phosphorylated, glycosylated, and lipidated before being released into the bloodstream as homomeric complexes. Through the blood flow, Vtgs reach growing oocytes, where specific receptors anchored in the plasma membrane bind these proteins, which are then incorporated by clathrin-mediated endocytosis (Anderson et al. 1996; Patiño & Sullivan 2002; Yamaguchi et al. 2005).
A complete Vtg is made up of a signal peptide, a heavy chain lipovitellin (LvH) including four subdomains (N sheet, α helix, C sheet, and A sheet), a phosvitin (Pv), a light chain lipovitellin (LvL), and a von Willebrand factor type D domain (vWFD) containing a β'-component (β'-c) and a C-terminal coding region (Ct) (Figure 1).
In LvH, amphipathic secondary and tertiary structures form a basket lined with hydrophobic residues that holds lipids. This structure is similar to that present in apolipoprotein B of vertebrates. Moreover, the N sheet subdomain contains a receptor binding site responsible for the interaction with the oocyte (Reading et al. 2017). The α helix subdomain contains a zinc-binding site, while an alanine-rich sequence is present mainly in the A sheet subdomain of teleosts and is involved in embryo gluconeogenesis (Mikawa et al. 2006; Reading et al. 2009). The Pv domain is a serine-rich polypeptide able to bind phosphates, whose negative charge attracts multivalent cations such as calcium, magnesium, zinc, and iron. This function is crucial for freshwater fish, which live in environments poor in these metal ions. Another feature of the Pv domain is the presence of glycosylation sites that bind carbohydrates which, together with ions, promote the aqueous solubility of Vtgs. The LvL domain also contains glycosylation sites and, like LvH, is able to carry lipids. The vWFD is involved in Vtg folding and dimerization through disulfide linkages that depend on highly conserved cysteine residues (Finn 2007; Reading et al. 2009).
During the vitellogenic stage, in the ovarian follicle, the endosomes containing Vtgs are acidified by the action of proton pumps; cathepsin D is consequently activated and cleaves vitellogenins into their constituents: lipovitellins, Pv, β'-c, and Ct (Carnevali et al. 2006; Finn & Kristoffersen 2007; Sun & Zhang 2015; Hara et al. 2016). The cleavage sites that are present in Vtg proteins and are responsible for the formation of these constituents show different levels of conservation. In most vertebrates, the cleavage site between LvH and Pv consists of the sequence KLKKIL, while the site between Pv and LvL consists of the K(Y/F)LG consensus sequence (Finn 2007). Unlike these two cleavage sites, the site between LvL and β'-c shows higher variability. Moreover, besides these principal sites, other cleavage sites are present in the vitellogenin peptides and are involved in the secondary degradation that these proteins undergo through the action of different cathepsins (Reading et al. 2017).
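As an illustration of the two conserved cleavage sites quoted above, the following minimal Python sketch locates the KLKKIL motif and the K(Y/F)LG consensus in a protein sequence with regular expressions. The function name `find_sites` and the toy sequence are hypothetical and introduced here only for illustration; the sequence is not a real vitellogenin, and a match to a motif does not by itself show that a site is cleaved in vivo.

```python
import re

# Cleavage-site motifs as quoted in the text (Finn 2007).
LVH_PV_SITE = re.compile(r"KLKKIL")   # site between LvH and Pv
PV_LVL_SITE = re.compile(r"K[YF]LG")  # site between Pv and LvL, K(Y/F)LG

def find_sites(sequence):
    """Return the 0-based start positions of each motif in a protein sequence."""
    return {
        "LvH/Pv": [m.start() for m in LVH_PV_SITE.finditer(sequence)],
        "Pv/LvL": [m.start() for m in PV_LVL_SITE.finditer(sequence)],
    }

# Hypothetical toy sequence for illustration only (NOT a real vitellogenin).
toy_sequence = "MAAAKLKKILSSSSSSKYLGAAAKFLGAA"
print(find_sites(toy_sequence))
```

The character class `[YF]` encodes the Y/F alternative of the consensus, so both KYLG and KFLG are reported as candidate Pv/LvL sites.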

Vitellogenin gene family and evolution in vertebrates
Vtgs are members of the Large Lipid Transfer Protein (LLTP) superfamily and are considered to be paralogous to apolipoproteins (APO) and microsomal triglyceride transfer proteins (MTP). This suggests that the LLTP superfamily arose from an ancestral gene encoding a protein involved in the transport of hydrophobic molecules (Babin et al. 1999; Wu et al. 2013).
In different oviparous and ovoviviparous vertebrate lineages, the vtg gene family includes a variable number of paralogous genes. For example, a single gene was found in the jawless lampreys Ichthyomyzon unicuspis and Petromyzon marinus, while three vitellogenin sequences have been reported in the cartilaginous fish Callorhinchus milii and in the non-teleost fish spotted gar Lepisosteus oculatus and sturgeon Acipenser schrenckii. The number of vtg genes is more variable in teleosts, ranging from three up to eight in the zebrafish Danio rerio (Yilmaz et al. 2018a). Regarding sarcopterygians, three genes are present in coelacanths and in oviparous and ovoviviparous tetrapods, while four genes have been identified in the lungfish Protopterus annectens (Biscotti et al. 2018).
The genomic arrangement of vtg genes is also of interest. Indeed, microsyntenic analysis has shown that these genes are organized in a cluster (Babin 2008; Biscotti et al. 2018) and are found in two chromosomal regions, named the M and S regions in a recent paper by Biscotti et al. (2018). In the analyzed organisms belonging to the main vertebrate lineages, the M region harbors a variable number of genes, while the S region contains a single gene, with the exception of Xenopus laevis, in which this gene is absent (Figure 2). In teleosts, the vtg gene located in the S region is named vtgC; it lacks the Pv domain and has a truncated C-terminal end (Finn & Kristoffersen 2007).
Given the high variability in the number of vtg paralogous genes in vertebrates, several studies have investigated the evolutionary history of the vtg gene family (Figure 3). Initially, Finn and Kristoffersen (2007) hypothesized that the presence of multiple vtg copies in the genome was due to whole genome duplication (WGD) events (Figure 3a). Four WGD events are known in vertebrates: 1R and 2R predated the evolutionary split of vertebrates (Smith et al. 2013), the teleost-specific 3R (Ts3R) occurred at the origin of teleosts (Jaillon et al. 2004; Kasahara et al. 2007; Nakatani et al. 2007), and a further duplication, the salmonid-specific 4R (Ss4R), took place in salmonids (Near et al. 2012; Macqueen & Johnston 2014). According to this hypothesis, four vtg genes were expected in tetrapods, eight in teleosts, and 16 in salmonids. However, three vtg genes named vtgI, vtgII, and vtgIII have been identified in tetrapods (van Het Schip et al. 1987; Silva et al. 1989), while actinopterygians show multiple genes coding for different Vtg forms (Buisine et al. 2002; Wang et al. 2005; Babin et al. 2007): in Acanthomorpha three vtg genes, named vtgAa, vtgAb and vtgC, are present (Matsubara et al. 2003; Hiramatsu et al. 2006; Finn & Kristoffersen 2007); in cyprinids and in eels a variable number of genes, named vtgAe and vtgAo, respectively, are reported (Finn & Kristoffersen 2007). The incongruence between the number of vtg genes identified and the number expected has been explained by gene loss events that accompanied the WGDs and/or by specific duplication events that occurred in certain taxa (Finn & Kristoffersen 2007) (Figure 3a).
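The expected gene numbers under the WGD hypothesis follow from simple doubling, which can be sketched in a few lines of Python. This is a minimal illustration that assumes a single ancestral vtg gene and no gene losses or tandem duplications; the names `WGD_ROUNDS` and `expected_copies` are hypothetical and introduced here only for the example.

```python
# Rounds of whole genome duplication experienced by each lineage,
# as listed in the text.
WGD_ROUNDS = {
    "tetrapods": 2,  # 1R, 2R
    "teleosts": 3,   # 1R, 2R, Ts3R
    "salmonids": 4,  # 1R, 2R, Ts3R, Ss4R
}

def expected_copies(rounds, ancestral_copies=1):
    """Copies expected if every WGD doubles the gene and none is lost.

    Assumption: one ancestral gene, no losses, no tandem duplications.
    """
    return ancestral_copies * 2 ** rounds

for lineage, rounds in WGD_ROUNDS.items():
    print(lineage, expected_copies(rounds))
```

Under these assumptions the sketch gives four, eight, and 16 copies, matching the expectations stated above; the three genes actually identified in tetrapods fall short of the expected four, which is the kind of incongruence that gene losses were invoked to explain.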
In 2008, Babin showed through a comparative microsyntenic analysis that the vtg genes are located in two regions on the same chromosome: one harboring the vtgI of tetrapods, orthologous to the vtgC of teleosts, and the other harboring vtgII and vtgIII of tetrapods, orthologous to vtgAa and vtgAb of teleosts. This observation led to the hypothesis that an ancestral vtg gene cluster composed of three genes was already present in the common ancestor of tetrapods and teleosts. Moreover, the proximity between the two chromosomal regions suggested that these genes originated from a duplication of a single ancestral gene (Figure 3b). Subsequently, Finn et al. (2009) and Kasahara et al. (2007) reformulated their hypothesis on the basis of the results of Babin (2008).
Figure 2. For sarcopterygians and teleosts, the arrangement has been deduced from the gene arrangement of several species, while for cartilaginous fish it has been deduced from the elephant shark. White-filled dots represent genes flanking the vtg genes; black-filled dots represent vtg genes. The schematic representation of the flanking genes does not reflect the real chromosomal arrangement. * indicates vtg genes absent in mammals, with the exception of the platypus, in which a single vtg gene has been reported (Brawand et al. 2008). In the M region multiple genes are present (n ≥ 2). In the S region a single gene is present, corresponding to vtgI of tetrapods and vtgC of teleosts. Xenopus laevis lacks the single gene in the S region. In teleosts, the distribution of vtg and related flanking genes on two chromosomes is the result of the teleost-specific whole genome duplication event (Ts3R).
Further studies performed by Brawand et al. (2008) on the acquisition of new nutritional reserves for early offspring in mammals have strengthened the hypothesis that the vitellogenin gene family evolved from an ancestral gene cluster made up of two genes, vitI (called vtgI in Babin 2008) and vitanc (vtg ancestral). These genes were present at the time of separation between the amphibian and reptile lineages. Moreover, vitanc duplicated in tandem in the common ancestor of reptiles, birds and mammals, leading to vtgII and vtgIII, and underwent lineage-specific duplications in amphibians (Figure 3c). In mammals, a progressive loss of vtg genes followed the acquisition of new reproductive strategies, from yolk-dependent nourishment toward lactation and placentation (Brawand et al. 2008).
To increase knowledge of the evolutionary history of the vtg gene family, Canapa et al. (2012) investigated the vtg genes in the basal sarcopterygian Latimeria menadoensis. One of the three identified sequences proved to be phylogenetically separate and orthologous to the vtgI of tetrapods.
Recently, a report published by Biscotti et al. (2018), based on extended microsyntenic and phylogenetic analyses, proposed a new and intriguing scenario to elucidate the evolutionary history of the vtg gene family in vertebrates (Figure 3d). The presence of a single vtg gene in agnathans suggested that the first vtg gene duplication can be dated to 500 Mya, at the time of gnathostome origin. Moreover, for the gene located in the S region, the orthology between the different evolutionary lineages analyzed was confirmed. On the contrary, for genes located in the M region, phylogenetic analysis did not show an orthology relationship. This finding suggests that these vtg genes resulted from independent tandem duplication events, in agreement with the hypothesis proposed for tetrapods by Brawand et al. (2008).
The studies performed in the last 10 years on this issue clearly demonstrate how the increase in knowledge regarding vtg genes in different species has contributed significantly to the understanding of vtg gene family evolution. At the same time, the works reviewed here highlight the paucity of data in some taxa, such as lungfish and salamanders. The uncertainty of information in these organisms is also due to the lack of sequenced genomes, which are difficult to obtain given their huge size (Biscotti et al. 2016; Nowoshilow et al. 2018). Thus, the findings summarized here are a starting point for further experimental work that will provide insights into the evolution of this interesting gene family. Furthermore, the intriguing mode of evolution shown for the vtg gene family represents a case study for reviewing the evolution of other gene families.

Vitellogenin functions
During vertebrate evolution, the vtg gene family was subject to events that led to a wide range of gene numbers in various species. The main function of vitellogenin proteins is to provide a source of yolk nutrients for early developmental stages. However, the presence of multiple vitellogenin genes raises new questions about the different functions that individual Vtgs and their yolk protein derivatives could have. Moreover, an increasing number of works have reported several non-nutritional roles for Vtgs.
The synthesis of vtg in the liver is triggered by estrogens secreted from ovarian follicles. Through the bloodstream, vitellogenins reach the female gonads and are incorporated into oocytes. During vitellogenesis, Vtgs are cleaved into the major yolk components, lipovitellin, phosvitin, and β'-c, which are stored in the cell. Lipovitellin is a dimer consisting of a heavy (LvH) and a light (LvL) chain. This component is rich in amino acids and lipids essential for embryonic development. Phosvitin is characterized by a high content of phosphorus and serine residues and in turn binds calcium, which is useful for osteogenesis. Moreover, after this initial processing, Vtgs undergo a second proteolysis that in fishes can vary depending on whether they produce pelagic or demersal eggs or have rapid or slow embryonic development (Finn & Kristoffersen 2007). In acanthomorph fish spawning pelagic eggs, the heavy chain of VtgAa lipovitellin is highly degraded during oocyte maturation, producing a pool of free amino acids that generates an osmotic gradient able to draw in water. The consequent increase in oocyte hydration affects egg buoyancy. This is also linked to water salinity, which influences the proportional ratio between VtgAa, VtgAb, and VtgC (Reading & Sullivan 2011). In contrast, the LvH derived from VtgAb undergoes less proteolysis during oocyte growth and maturation and is used in late larval stages, as is VtgC (Reading & Sullivan 2011). In salmonids, this second proteolysis has not been observed, probably because their eggs are spawned in freshwater (Hiramatsu et al. 2002). Finally, a third proteolysis occurs during embryogenesis, but little information is reported in the literature. Recently, a new function has been reported: a vitellogenin subdomain acts as a binding protein able to transfer tetrodotoxin (TTX) from liver to ovary in Takifugu pardalis. This toxin, which is accumulated in eggs, has a dual function as a repellent against predators and as a pheromone able to attract males (Yin et al. 2017).
Furthermore, the evidence that vtg expression is not sex-specific (Shyu et al. 1986) goes beyond the classical view of vitellogenin as a simple source of nourishment for the developing embryos and has directed research toward the identification of non-nutritional functions of vtg. Indeed, several papers have described the active role of vtg in antibacterial activity (Zhang et al. 2005; Shi et al. 2006; Liu et al. 2009) and in enhanced phagocytosis of microbes (Li et al. 2008; Liu et al. 2009). It has been demonstrated that Vtg is a multivalent pattern recognition receptor (PRR) able to selectively bind conserved components of bacteria and viruses. After this association, Vtg may act either as an effector that destabilizes or disrupts cell walls or as a bridging molecule that enhances phagocytosis via opsonization (Li et al. 2008; Zhang et al. 2011).
In addition to immune functions, Vtg and yolk proteins have also been found to have antioxidant activity (Sun & Zhang 2015), which is fundamental for protection against oxidative damage (Li & Zhang 2017). In particular, Pv, due to its high serine and phosphorus content, chelates iron, thereby preventing DNA damage (Ishikawa et al. 2004).
Recently, Yilmaz et al. (2018b) reported the first experimental evidence from selective knockout of multiple vtg forms in zebrafish. Their findings revealed not only a role of Vtg in the development of embryos and larvae but also new regulatory effects on fecundity and fertility. Using CRISPR/Cas9 genome editing of multiple genes, they showed that fecundity was doubled in vtg1-knockout females and fertility was 50% lower in vtg3-knockout females. Moreover, mortality increased in vtg3-knockout eggs/embryos and in vtg1-knockout embryos. These findings established for the first time that vitellogenins are essential and exert their action at different stages during reproduction and embryonic development.
Overall, the synthesis of vtg can be induced by exposure to estrogens but also to endocrine disrupting chemicals (EDCs) frequently found in polluted environments. Several chemical compounds that show estrogen-like activity are strictly associated with anthropic activities and are mainly present in aquatic environments (Hara et al. 2016). The injurious effects of environmental estrogens (Thorpe et al. 2009; Tetreault et al. 2011; Zoeller et al. 2012) have given vitellogenin a key role as a biomarker for assessing EDC effects in teleosts. In the last two decades, a huge number of studies have reported the vtg response to endocrine disruptor exposure in various fish species (Petersen et al. 2000; Tilton et al. 2005; Orn et al. 2006; Andersson et al. 2007; Canapa et al. 2007; Mortensen & Arukwe 2007; Peters et al. 2007; Ekman et al. 2009; Salierno & Kane 2009; Wang et al. 2017). Moreover, the use of Vtg and yolk proteins in the detection of EDC contamination has at the same time made it possible to develop new Vtg-based bioassays for easily detecting environmental pollution (Hiramatsu et al. 2006; Wang et al. 2017).

Conclusions
The data reviewed here show that the mode of vtg gene family evolution represents an extremely intriguing case study, made complex by the action of whole genome duplication events together with lineage-specific gene losses and duplications. Although papers published in the last decade clearly demonstrate how the increase in knowledge has significantly improved the understanding of the mechanisms of vtg gene family evolution, some questions remain open. Indeed, the scarcity of genomic data from lungfish and salamanders does not make it possible to confirm the presence of the vtg gene cluster in these taxa, which represent the missing pieces of the unsolved puzzle in tetrapods.
In addition, very little is known about the specific contributions of the different types of Vtg to vertebrate development; thus, future efforts should be concentrated on this research field.

Disclosure statement
No potential conflict of interest was reported by the author(s).