The complex barnacle perfume: identification of waterborne pheromone homologues in Balanus improvisus and their differential expression during settlement

Abstract A key question in barnacle biology is the nature of cues that induce gregarious settlement. One of the characterised cues is the waterborne settlement pheromone (WSP). This study aimed to identify WSP homologues in Balanus improvisus and to investigate their expression during settlement. Six WSP homologues were identified, all containing an N-terminal signal peptide, a conserved core region, and a variable C-terminus comprising several -GR- and -HDDH- motifs. The B. improvisus WSP homologues were expressed in all settlement stages but showed different expression patterns. The homologue most similar to the B. amphitrite WSP was the most abundant and was constantly expressed during settlement. In contrast, several of the other WSP homologues showed the greatest expression in the juvenile stage. The presence of several WSP homologues suggests the existence of a pheromone mix, where con-specificity might be determined by a combination of sequence characteristics and the concentration of the individual components.


Introduction
Barnacles are among the most important fouling organisms in the marine environment. During settlement, the barnacle cyprid larvae respond to chemical and physical cues in the surrounding environment while swimming in the water column as well as when exploring various surfaces. The surface exploration comprises wide search, close search, and inspection behaviours that precede the settling decision leading to permanent adhesion through the release of cement (Crisp 1976;Aldred et al. 2018). Because adult barnacles are permanently attached, cyprid substratum selection is the main factor determining the local abundance and distribution of barnacles in natural populations as well as on marine constructions (eg ships' hulls), and is therefore a key question in barnacle biology and ecology (Thiyagarajan 2010). Understanding the mechanisms underlying cyprid attraction and substratum selection will facilitate the tailoring of antifouling technologies resulting in the development of surfaces that are actively rejected by cyprids during the initial stages of the surface exploration, thus preventing attachment.
Gregariousness in barnacles, ie the attraction of conspecific individuals and the subsequent settling and aggregation in dense communities, is crucial for reproduction and is achieved through cyprid settlement in response to chemical cues produced by individuals of the same species (Clare and Matsumura 2000). Since Knight-Jones and Stevenson (1950) described gregarious barnacle settlement behaviour for the first time almost 70 years ago, considerable advances have been made in the understanding of the mechanisms of this behaviour. Early research in gregarious settlement led to the discovery that extracts of whole adults contain a 'settling factor' that induces conspecific settlement of cyprids (Crisp and Meadows 1962). Later studies resulted in the identification of the contact pheromone in Balanus amphitrite (= Amphibalanus amphitrite), called settlement-inducing protein complex (SIPC) (Matsumura et al. 1998), the cDNA sequence of which was cloned and sequenced (Dreanno et al. 2006). It was shown that SIPC is a 171 kDa cuticular glycoprotein with similarity to the α2-macroglobulin protein family. SIPC is active when bound to a surface (Matsumura et al. 1998), and cyprids can potentially detect the SIPC with their antennules while walking on the surface of, or close to, adult barnacles. Cyprids are able to differentiate between conspecific and allospecific SIPC (Dreanno et al. 2007). While the exact mechanism remains unknown, it has been suggested that either variable regions of the SIPC or glycosylation patterns might provide species specificity (Yorisue et al. 2012). Further research showed that SIPC has a dual role; it works as a settlement-inducing cue by attracting cyprids, but also as a settlement avoidance cue at higher concentrations informing of overcrowding and increased reproductive competition (Kotsiri et al. 2018).
A SIPC homologue has also been identified and characterised in Balanus glandula (called MULTIFUNCin), where it induces both gregariousness and predation by sea snails (Zimmer et al. 2016).
Apart from the SIPC that induces settlement when bound to the surface, there is also evidence for a waterborne settling factor (Clare and Matsumura 2000). According to the current model, cyprids detect the waterborne cue while swimming and respond to it by transitioning from water to a substratum where the surface-bound SIPC pheromone in turn induces permanent attachment (Elbourne and Clare 2010). The first evidence of a waterborne cue came from an assay that used seawater conditioned with adults of Semibalanus balanoides showing that it contains a factor that induces temporary attachment of cyprids (Rittschof 1985). However, there is still ambiguity regarding the nature of the waterborne cue with several studies reporting varying estimates of the size of the active component of the conditioned water, ranging from peptides of 3-5 kDa to peptides of < 500 Da (Clare and Matsumura 2000). In addition, several synthetic di- and tripeptides were tested for settlement-inducing activity, showing that peptides containing basic carboxy-terminal amino acids, in particular the glycyl-glycyl-arginine (GGR) peptide, had the tendency to enhance settlement of cyprids (Tegtmeyer and Rittschof 1988). However, this effect could not be reproduced by another study (Clare and Yamazaki 2000). Later, a protein corresponding to 32 kDa was purified from homogenized adult extracts of B. amphitrite and was shown to induce cyprid settlement. The protein diffuses into seawater when embedded in an agarose gel and induces settlement, suggesting a waterborne pheromone function (Endo et al. 2009). Interestingly, the N-terminal sequence obtained did not show any resemblance to SIPC or to any other proteins in the available databases. The full sequence of this B. amphitrite waterborne pheromone was published in sequence databases in 2012 and was called waterborne settlement pheromone (WSP): BAM34601.
Since the original publication of the discovery of WSP in 2009 (Endo et al. 2009), there have been no follow-up studies to further elucidate the structure and function of the WSP. However, it was recently indicated that more than one WSP gene is present in Tetraclita japonica formosana (Lin et al. 2014) and B. amphitrite (So et al. 2017;Wang et al. 2018), but the available information is extremely scarce: only one of the sequences has been published (Lin et al. 2014) and none are deposited in public databases. Moreover, the study by Endo et al. (2009) mentioned preliminary unpublished data showing that several proteins with a molecular mass of around 32 kDa detected in barnacle-conditioned seawater have settlement-inducing activity. Altogether, this indicates that there appears to be more than one WSP homologue in barnacles. From the ecological point of view, this opens the possibility that a combination of WSP homologues might work as a pheromone blend. Pheromone mixtures are commonly used for chemical communication in various animals, including marine invertebrates (Kelly 1996;Cummins et al. 2004). In particular, the sea slug Aplysia releases a pheromone blend comprising more than three different types of waterborne pheromones to attract mates (Cummins et al. 2005).
The indications of the presence of several WSP homologues in some barnacle species led the present authors to explore the presence of WSP homologues in B. improvisus using transcriptome data from cyprids and adults. Several WSP candidates in B. improvisus were identified, cloned, and compared to homologues present in B. amphitrite. In addition, gene expression analysis of the WSP homologues in B. improvisus during the settling process was performed, showing that the genes are clearly differentially expressed. This study adds yet another dimension to the complexity of barnacle pheromones, forming a basis for a better understanding of the gregarious behaviour as well as providing essential information for developing highly barnacle-selective antifouling strategies.

Materials and methods
Identification and cloning of WSP homologues in B. improvisus
Blast searches of several different RNA datasets (unpublished data) from cyprids and adults of B. improvisus were performed using B. amphitrite WSP (BAM34601) as a query. Six sequences that constituted full or partial fragments of WSP-like sequences were found, five of which were cloned and confirmed by Sanger sequencing. The sixth sequence was identified from a single molecule PacBio RNA dataset obtained from an adult of B. improvisus (Lind et al. 2017). The sequences identified were named according to the level of identity to the published B. amphitrite WSP, eg the Bi_WSP clone was 74% identical and 91% similar at the protein level to the B. amphitrite WSP (Table 1), while the WSP-like 1-5 sequences had a gradually lower (in the range of 39%-70% identity and 71%-88% similarity) level of identity/similarity that followed the gene numbering.
Cloning of the cDNA containing the complete open reading frame (ORF) for Bi_WSP and Bi_WSP-like 2-5 was done by first performing RACE using PCR primers based on sequences found in a Sanger-sequenced cDNA library prepared from B. improvisus cyprids (unpublished data). RNA preparations of cyprids and cDNA synthesis were carried out as described in Lind et al. (2010). 3′-RACE was performed for all five pheromones (GeneRacer kit; Invitrogen), and 5′-RACE was performed for Bi_WSP-like 3. As a first step, a touch-down PCR was performed using the following PCR program: an initial denaturing step of 98 °C for 2 min followed by 5 cycles of 98 °C for 30 s and 72 °C for 1 min, 5 cycles of 98 °C for 30 s and 70 °C for 1 min, and 25 cycles of 98 °C for 30 s, 65 °C for 30 s, and 72 °C for 1 min. A final elongation step of 72 °C for 7 min was added. A nested PCR was thereafter performed using 1 µl of the touch-down reaction applying the following PCR program: an initial denaturing step of 98 °C for 2 min, followed by 35 cycles of 98 °C for 30 s, 65 °C for 30 s, and 72 °C for 1 min and a final elongation step of 72 °C for 7 min. The polymerase PfuUltra (Stratagene) was used in all PCR reactions. Sequences obtained from RACE were used to design primers to clone the complete ORF for the genes using the same PCR program as for the nested PCR, but with an annealing temperature of 60 °C. For the primers used for the cloning, see Supplemental material Table S1. Bi_WSP-like 1 was not cloned. Instead, a full-length sequence corresponding to Bi_WSP-like 1 was identified from an adult single-molecule PacBio RNA dataset (Lind et al. 2017), where the corresponding consensus contig covering the ORF was supported by more than 30 PacBio reads. The sequences have been deposited in GenBank with accession numbers MK275628-MK275633.
To find the sequences that are most divergent from the B. improvisus WSP homologues but still have the B. amphitrite WSP as the best match in NCBI nr, the identified B. improvisus WSP and WSP-like 1-5 protein sequences were used as queries for searching in the authors' different datasets. The search resulted in the identification of one additional sequence showing 30% identity or less to any of the queries, and this was named Bi_pb29993 according to the PacBio contig.

Searching for WSP homologues in other species
To determine whether barnacles other than B. improvisus also have several WSP homologues, the available larval and adult B. amphitrite NCBI/SRA datasets (SRX120025/SRR426836, SRX1035030/31, SRX1035107/27) were downloaded and transcriptome assemblies were created using Trinity 2.0.6 (Haas et al. 2013). These B. amphitrite assemblies were searched using the six identified B. improvisus WSP homologues. To avoid using wrongly assembled sequences, only B. amphitrite WSP-like candidates that were found in at least two assemblies were used for further sequence analysis. All candidate B. amphitrite ORFs were checked vs NCBI nr to make sure they were most similar to the published B. amphitrite WSP and not more similar to other types of proteins. B. amphitrite sequences that had the best match to the B. amphitrite WSP in NCBI nr searches were regarded as WSP-like candidates and named according to their contig numbers.
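The rule of keeping only candidates found in at least two assemblies can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: it assumes that hits from different assemblies have already been matched to a shared candidate label (for example by sequence clustering), and the assembly and candidate names are hypothetical.

```python
from collections import Counter

def supported_candidates(hits_per_assembly, min_assemblies=2):
    """Return candidate labels seen in at least `min_assemblies` assemblies."""
    counts = Counter()
    for hits in hits_per_assembly.values():
        # A candidate is counted once per assembly, however many contigs hit it.
        counts.update(set(hits))
    return sorted(c for c, n in counts.items() if n >= min_assemblies)

# Hypothetical hit lists: assembly label -> candidate labels found in it.
hits = {
    "assembly_A": ["cand1", "cand2", "cand2"],
    "assembly_B": ["cand2", "cand3"],
    "assembly_C": ["cand3", "cand4"],
}
print(supported_candidates(hits))
```

Candidates seen in a single assembly (here `cand1` and `cand4`) are dropped, which is the intended safeguard against assembly-specific chimeric contigs.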
All identified protein sequences from B. improvisus and B. amphitrite were analysed using the InterPro (https://www.ebi.ac.uk/interpro/) and Pfam (https://pfam.xfam.org) domain databases, as well as with SignalP and TargetP available at http://www.cbs.dtu.dk/services/. All six B. improvisus WSP homologues as well as Bi_pb29993 had weak matches (see Supplemental material Table S2 for the details) to the Pfam cupin_5 domain (PF06172), referred to as the RmlC-like cupin domain in InterPro. Sequences with a better match to the cupin_5 protein domain were also identified in both B. improvisus and B. amphitrite and were named Bi_cupin and Ba_cupin, respectively.

Table 1. Pairwise identity and similarity (in parentheses) of the B. amphitrite WSP (BAM34601) and the identified B. improvisus WSP homologues. Sequences compared: Ba_WSP, Bi_WSP, Bi_WSP-like 1, Bi_WSP-like 2, Bi_WSP-like 3, Bi_WSP-like 4, Bi_WSP-like 5.
To further investigate whether crustaceans other than barnacles have proteins similar to the WSP homologues, an alignment of the six B. improvisus WSP homologues with the signal peptide and C-terminus removed was used to create an HMM profile with the HMMER 3.0 package (Potter et al. 2018). A set of all available 867,743 crustacean proteins was downloaded from the NCBI protein database (2018-05-02) and searched using the obtained WSP-specific HMM profile. The HMM search identified 27 sequences including the B. amphitrite WSP and 26 sequences from either Daphnia or Eurytemora. Among these 26 sequences, none had the B. amphitrite WSP as the best Blastp match in NCBI nr nor were they more similar to the B. improvisus WSP homologues than to the barnacle cupin_5 representatives in the present authors' dataset. To identify any other domains than cupin_5 that were present in the identified sequences, the protein sequences were searched against the collection of profiles in Pfam with hmmscan HMMER 3.0 (Potter et al. 2018). The search showed that there were no other domains than cupin_5 in the identified sequences, with scores well above the trusted Pfam cut-off (37.2-98.3).

Protein alignments
The B. improvisus and B. amphitrite WSP homologue protein sequences were aligned using ClustalW 2.0.10 and visualised with Jalview 2.10.1. The alignment revealed the presence of a well-aligned central region of 200 aa with a few gaps (Figure 1b) that is referred to as the 'core region' throughout the text. The fasta36 package (Pearson 2016) was used to create the identity table for the core region of the identified B. improvisus WSP homologues and the B. amphitrite WSP (Table 1). The Zappo sequence colour scheme was used to show the physico-chemical properties of amino acids in the C-terminal region (Figure 2 and Figure S1B in the Supplemental material).
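To make the notion of pairwise identity concrete, the following sketch computes percent identity from two rows of an existing alignment. It is an illustration only: the study used the fasta36 package, whose alignments and denominator conventions may differ, and the convention used here (identical residues divided by columns where neither sequence has a gap) is an assumption. The two short sequences are invented.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over alignment columns where neither row has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip gapped columns (assumed convention)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared

# Toy aligned pair: 6 ungapped columns, 4 of them identical.
print(round(percent_identity("MKT-LLA", "MRTALLG"), 1))
```

Similarity values such as those in parentheses in Table 1 would additionally require a substitution matrix to decide which non-identical residue pairs count as similar.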

Phylogenetic trees
The B. improvisus and B. amphitrite WSP homologues identified were used for phylogenetic analysis.
The Daphnia cupin_5, the closest homologue to the B. improvisus WSPs found among crustaceans, was used as an outgroup together with B. improvisus and B. amphitrite cupin_5 representatives.
The two methods used for the phylogenetic analysis were MrBayes 3.2.6 and PhyML 3.0 at phylogeny.fr. The ClustalW alignment of the identified B. improvisus and B. amphitrite WSP homologues and cupin_5 proteins (Figure S1A) was used as input into MrBayes 3.2.6 with one cold and three heated chains running for 100,000 generations, at which point the average standard deviation of split frequencies was < 0.01. Alignments of full-length sequences or core regions only were used as input to PhyML.

Expression analysis of the B. improvisus waterborne pheromones identified
Transcriptomic data on settlement stages were used to investigate the expression of the B. improvisus WSP homologues identified, and the details of the experimental set-up and the sampling of the different settling stages are described in Abramova et al. (2018, manuscript). Briefly, cyprid larvae of B. improvisus were reared in a laboratory culture as described in Jonsson et al. (2018). For the expression analysis, 1-2-day-old cyprids were settled on Petri dishes (Nunc No 150340, Ø 48 mm) for 4-5 days. Four different settling stages were collected: free-swimming, close-search, attached cyprid, and juvenile. Each stage was represented by three biological replicates to ensure the generality of the findings (three independent batches of cyprids each coming from different sets of B. improvisus parents) with 20 individuals in each. Extracted RNA was used to make sequencing libraries with the Illumina TruSeq Stranded mRNA protocol following poly-A selection, and these were multiplexed and sequenced on three Illumina lanes. Library preparation and Illumina sequencing were performed at the national genomics facility at the Science for Life Laboratory in Stockholm (https://www.scilifelab.se/platforms/ngi/). Obtained RNA-seq data were assembled with the Trinity 2.0.6 software (Haas et al. 2013), and the previously identified B. improvisus WSP homologues were used to identify the corresponding contigs in the transcriptome assembly. The sequences of Bi_WSP and Bi_WSP-like 2 were complete in the assembly, whereas the other WSP homologues were either found as partial contigs or were split between different Trinity contigs. To make a statistical analysis possible, all the fragmented WSP-like contigs were removed and substituted with the corresponding WSP-like sequences with complete ORFs that were previously identified.
Figure 1. [...] (Table S2). A solid line is used if the score is above the threshold of 25.0, and a dashed line is used if it is below the threshold. (b) Sequence alignment of the B. improvisus (in bold) and B. amphitrite WSP homologues showing conservation in the core region (marked) and high variability of the C-terminal region. The default Clustal X colour scheme in the Jalview program was used.
This edited assembly was used as the template for read mapping and to calculate the transcript abundances using the pseudoalignment-based quantification method kallisto (Bray et al. 2016) through the use of a Trinity 2.0.6 wrapper. The resulting abundances were used to generate a matrix of counts and a TMM (trimmed mean of M-values)-normalised matrix. Lowly-expressed genes were filtered out by keeping only genes with at least 2 CPM (corresponding to 60 counts on average) in at least two samples across the entire experiment.
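The low-expression filter can be illustrated with a minimal sketch in plain Python. It assumes that CPM is computed from raw column sums (the actual pipeline may have used TMM-adjusted library sizes instead), and the count matrix is toy data invented for the example. As a consistency check on the text, 2 CPM corresponding to about 60 counts implies libraries of roughly 30 million mapped reads (60 / 2 × 10^6).

```python
def cpm_matrix(counts):
    """Convert a genes x samples count matrix to counts per million (CPM)."""
    n_samples = len(counts[0])
    # Library size = total counts in each sample (column); raw sums assumed.
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    return [[row[j] / lib_sizes[j] * 1e6 for j in range(n_samples)]
            for row in counts]

def keep_expressed(counts, min_cpm=2.0, min_samples=2):
    """True for each gene with CPM >= min_cpm in at least min_samples samples."""
    return [sum(v >= min_cpm for v in row) >= min_samples
            for row in cpm_matrix(counts)]

# Toy matrix: every column sums to 1,000,000, so CPM equals the raw count.
toy_counts = [
    [999_991, 999_990, 999_990],  # highly expressed gene
    [8, 9, 0],                    # passes: two samples at or above 2 CPM
    [1, 1, 10],                   # fails: only one sample at or above 2 CPM
]
print(keep_expressed(toy_counts))
```

Requiring the threshold in at least two samples, rather than on average, keeps genes that are expressed in only one stage as long as more than one replicate supports them.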
The preliminary data quality check indicated a pronounced batch effect reflecting the different parental backgrounds of the cyprid batches. To minimise the batch effect, the script removeBatchEffect from edgeR 3.4.2 was applied to the count matrix (Robinson et al. 2010). The differential gene expression analysis was performed using the edgeR package. Genes with a false discovery rate (FDR) < 0.05 and a logFC > 2 were considered to be significantly differentially expressed.
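The significance criteria can be expressed as a simple filter over a table of test results. The sketch below is not the edgeR analysis itself; it only applies the two cut-offs to precomputed values. Two readings are assumptions: the FDR comparison is taken as strictly less than 0.05, and the logFC criterion is applied to the absolute value so that both up- and downregulated genes are retained. Gene names and values are hypothetical.

```python
def significant_genes(results, fdr_cutoff=0.05, logfc_cutoff=2.0):
    """Return genes passing both the FDR and the |logFC| thresholds.

    `results` maps gene name -> (logFC, FDR).
    """
    return [gene for gene, (logfc, fdr) in results.items()
            if fdr < fdr_cutoff and abs(logfc) > logfc_cutoff]

# Hypothetical differential-expression results.
toy_results = {
    "geneA": (2.8, 0.001),   # passes both thresholds
    "geneB": (-3.1, 0.02),   # passes (downregulated, |logFC| > 2)
    "geneC": (1.2, 0.001),   # fails the logFC threshold
    "geneD": (4.0, 0.20),    # fails the FDR threshold
}
print(significant_genes(toy_results))
```

If only upregulation were of interest, `abs(logfc)` would be replaced with `logfc`.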

Identification of WSP-like sequences in B. improvisus
To identify sequences in B. improvisus that are similar to the published WSP in B. amphitrite, tBlastn searches of RNA-seq datasets obtained from both cyprids and adults of B. improvisus were performed using the published B. amphitrite WSP as a query. Six unique B. improvisus RNA-contigs were found, of which five encoded partial WSP proteins. These sequences were used to obtain full-length variants of all WSP homologues in B. improvisus. The protein sequence identity between the six obtained B. improvisus homologues and the B. amphitrite WSP was 39%-74% (Table 1), with Bi_WSP being the most similar. The numbering of the WSP-like homologues in B. improvisus is related to their sequence identity to the Bi_WSP, with Bi_WSP-like 1 being the highest (80%). All B. improvisus protein sequences contained a predicted N-terminal signal peptide, a rather well-conserved central core region, and a highly variable C-terminal region (Figure 1). Four of the six identified B. improvisus WSP homologues were of almost the same length (255-266 aa) while Bi_WSP-like 3 and Bi_WSP-like 5 (217 and 222 aa, respectively) had a roughly 40 aa shorter C-terminus (Figure 1 and Table 2).
Except for the relatively high sequence identity to WSP from B. amphitrite, the WSP homologues in B. improvisus did not have any reliable matches to other proteins in the databases. However, all B. improvisus WSP homologues contained a region with weak resemblance to the cupin_5 domain (PF06172) (Table S2). Cupin_5 is a domain of unknown function, and proteins containing this domain belong to the cupin superfamily (http://www.ebi.ac.uk/interpro/entry/IPR009327). In the Pfam database, for metazoans there are currently only 17 sequences from 10 species indicated to have a cupin_5 domain. Among these sequences, the scores for the predicted cupin_5 domain varied from 41.3 to 139.2 with the only two crustacean representatives having scores of 70.5 or 78.7, both coming from Daphnia (see Table S3 in Supplemental material). The Pfam cupin_5 domain scores for the B. improvisus WSP homologues varied between 19.8 and 36.3, with some of them thus being below the Pfam cupin_5 profile threshold score of 25.0 (Table S2). However, proteins with higher cupin_5 Pfam scores in both B. improvisus and B. amphitrite (with scores 45.4 and 46.0) were found, and these were named Bi_cupin and Ba_cupin, respectively. The Bi_cupin sequence was only 15% identical to Bi_WSP. Even if some of the WSP homologues in B. improvisus have a domain with weak similarity (slightly above the threshold score) to cupin_5, for the ease of the following discussion only proteins with high domain scores, ie Bi_cupin, Ba_cupin and Daphnia cupin_5, will be referred to as 'cupin_5 proteins' further in the text. A search for more WSP-like sequences in B. improvisus, using the six initially identified B. improvisus WSP homologues as queries, resulted in the identification of one additional sequence showing a roughly 30% identity to the other WSPs and 15% identity to the Bi_cupin. This sequence was named Bi_pb29993 according to the name of the assembly contig. An alignment with the WSP-like sequences (Figure S1A) showed that the Bi_pb29993 protein was rather different from the six B. improvisus WSP homologues, but it still had the B. amphitrite WSP as the best match in the NCBI protein database.
Figure 2. Characterisation of the C-terminus of the WSP homologues identified. Alignment of the C-terminus from the identified WSP homologues with the Zappo colouring scheme showing charged amino acids in red and blue. Examples of the -HDDH- and -GR- motifs are marked with black boxes (see Figure S1B in the Supplemental material for the full alignment).
Analysing the sequences of the B. improvisus WSP homologues showed that the long C-terminus found in most homologues was clearly distinct from the conserved core region by containing many charged residues (D/E, R/H/K) as well as glycines, with several -GR- and -HDDH- motifs or close variants of these (Figure 2). It should be noted that the frequent -HDDH- and -GR- motifs in the C-terminal regions of the identified WSP homologues vary greatly in their number, position, and composition, making it difficult to align this particular region of the sequences.
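Because the C-terminal motifs vary in number and position and resist alignment, a simple alignment-free tally is one way to compare them between homologues. The sketch below counts -GR- and -HDDH- occurrences in a sequence with regular expressions, using a lookahead so that overlapping copies (for example HDDHDDH) are each counted. The approach and the toy sequence are illustrative assumptions, not part of the published analysis.

```python
import re

# Zero-width lookaheads let overlapping motif copies be counted separately.
MOTIF_PATTERNS = {
    "GR": re.compile(r"(?=GR)"),
    "HDDH": re.compile(r"(?=HDDH)"),
}

def count_motifs(protein_seq):
    """Count (possibly overlapping) motif occurrences in a protein sequence."""
    seq = protein_seq.upper()
    return {name: len(pattern.findall(seq))
            for name, pattern in MOTIF_PATTERNS.items()}

# Invented C-terminal fragment: GR at two positions, HDDH at two
# overlapping positions (HDDHDDH).
toy_c_terminus = "GRHDDHDDHGGRK"
print(count_motifs(toy_c_terminus))
```

Close variants mentioned in the text could be tallied the same way by adding further patterns to `MOTIF_PATTERNS`.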
Additional WSP-like sequences in B. amphitrite
By using the identified WSP homologues from B. improvisus as queries for searching the published B. amphitrite RNA datasets, three more B. amphitrite WSP homologues in addition to the published Ba_WSP (Endo et al. 2009) were identified, of which two were encoding complete proteins (Figure 1b). Furthermore, two B. amphitrite sequences with the best match either to Bi_cupin or Bi_pb29993 were identified.
The alignment of the identified WSP homologues (Figure 1b) clearly shows that the B. amphitrite WSP-like sequences are structurally similar to the B. improvisus sequences, with an N-terminal signal peptide, a conserved core region, and the greatest difference found in the C-terminal region. The C-terminus of three of the B. amphitrite sequences, including the published Ba_WSP, had a prevalence of charged amino acids similar to what was found for the B. improvisus homologues Bi_WSP and Bi_WSP-like 2 and 4 (Figure 2 and Figure S1B in Supplemental material), while the fourth B. amphitrite sequence had a shorter C-terminus resembling B. improvisus WSP-like 3 and 5.

Phylogenetic analysis
Phylogenetic analyses showed that all B. improvisus WSP homologues were found in a well-supported core clade together with the B. amphitrite WSP, except for the Bi_WSP-like 5, which formed a sister group (Figure 3). The Bi_WSP-like 3 seems to be more closely related to Bi_WSP than to Bi_WSP-like 5, even though both Bi_WSP-like 3 and 5 lack the long, charged C-terminus (proteins with a long C-terminus are marked with an asterisk in Figure 3). The WSP homologues from B. improvisus and B. amphitrite were not divided into separate clades, thus suggesting that several WSP homologues were already present before these two species diverged. While the phylogenetic method used (either Bayesian or Maximum Likelihood) did not influence the overall topology of the phylogenetic tree, it did change the exact branching within the core clade, indicating the difficulties in resolving the true topology in this part of the tree (Figure S2).
Overall, the phylogenetic analysis provided strong evidence that the B. improvisus WSP homologues identified are clearly more closely related to the B. amphitrite WSP than to the cupin_5 proteins (Daphnia cupin_5, Bi_cupin, and Ba_cupin).

Expression of B. improvisus WSP candidates during settlement
The WSP homologues identified from B. improvisus showed different expression patterns during settlement (Figure 4). The expression of Bi_WSP did not change significantly during settlement progression and was by far the highest, with roughly a hundredfold higher expression compared to the other WSPs in the cyprid stages (Figure 4g). In contrast, Bi_WSP-like 1 was overall the least-expressed candidate in any of the settling stages (note the different scales on the y-axes). Bi_WSP-like 2, Bi_WSP-like 3, and Bi_WSP-like 5 showed similar expression patterns, being relatively lowly expressed in the cyprids and in the attachment stage and substantially upregulated when cyprids metamorphosed into juveniles; for example, Bi_WSP-like 5 underwent about a 7-fold upregulation in the juvenile stage compared to the attached stage (Figure 4f). In summary, the metamorphosed juvenile stage of B. improvisus displayed substantial expression of all WSP homologues, with Bi_WSP having the highest expression.

Discussion
Since the initial publication of the discovery of WSP in 2009 (Endo et al. 2009), there have been several studies indicating that more than one WSP gene is present in some barnacle species (Lin et al. 2014;So et al. 2017;Wang et al. 2018). Originally, Endo et al. (2009) reported the results of a preliminary study that detected several proteins with weights around 32 kDa in B. amphitrite-conditioned seawater that exhibited settlement-inducing activity. In addition, several WSP sequences were found in the water-soluble fraction of proteins rinsed from the cement (So et al. 2017) and among the proteins from longitudinal canal and submantle tissues of B. amphitrite (Wang et al. 2018). In adults of T. j. formosana, Lin et al. (2014) reported the presence of two different WSP sequences displaying different expression in prosoma and basis, suggesting the existence of more than one WSP also in this species. However, only one of the sequences from these studies was published. The indications of the presence of several WSP homologues in some barnacle species led the present authors to explore the presence of WSP homologues in B. improvisus using transcriptome data from cyprids and adults. Six sequences that are homologous to the published B. amphitrite WSP were identified from B. improvisus. The similar size and isoelectric point observed for several of the WSP homologues identified here (data not shown) suggest that in the previous studies based on protein purifications (Endo et al. 2009;So et al. 2017) a single band on protein gels corresponding to 32 kDa could theoretically represent co-purification of several different proteins.
Figure 3. [...] Figure S1A. Cupin_5 domain proteins were added as an outgroup. All identified B. improvisus WSP homologues are marked in blue. Asterisks indicate WSP-like proteins with a long and charged C-terminus.
Phylogenetic analysis clearly showed that five of the six WSP sequences identified from B. improvisus belong to the same clade as WSP homologues from B. amphitrite, suggesting that several WSPs were already present before the two species diverged. Despite some of the sequences appearing to form orthologous groups in the phylogenetic analysis (Figure 3), it is not presently possible to infer the true orthologous relationships between the WSP-like candidates from B. improvisus and B. amphitrite due to the lack of complete genome-based proteomes for any of these barnacle species. Furthermore, it should be pointed out that it is not yet possible to establish the true number of WSP family homologues in B. improvisus and B. amphitrite using only transcriptome data because some of the WSP genes might be expressed at extremely low levels at the life stages and tissues examined and thus not be detected. The phylogenetic analysis presented here should therefore be seen as a first step to aid in the classification of WSP-like sequences. Although an evolutionary relationship is not evidence of similar function, the identification of several WSP-like sequences in these species suggests that WSP is a substantially larger protein family than previously thought and that multiple WSP paralogues may also exist in other barnacle species.
There is currently no evidence for the presence of WSP homologues in any crustaceans other than barnacles, suggesting that the WSPs are a relatively late evolutionary invention and thus a protein family that might be unique to barnacles. However, part of the core region of the WSP homologues has some similarity to the cupin_5 domain found in both prokaryotes and eukaryotes. Nevertheless, the sequence and phylogenetic analyses indicated that the WSPs are clearly distinct from other cupin_5 domain proteins found in B. improvisus, B. amphitrite, and Daphnia. Moreover, identification of the cupin_5 domain does not provide any information about function because the cupin_5 family is poorly characterised (Zhou et al. 2005;Gaowa and Zhang 2009;Du et al. 2010).
Analysis of the sequence features of the identified WSP homologues revealed that all have quite conserved N-terminal signal peptides and core regions, with the greatest variability residing in the C-terminal regions. The importance of the C-terminus for water-soluble pheromones was noticed many years ago because digestion with carboxypeptidases that cleave carboxyl-terminal lysine and arginine residues eliminated pheromone activity (Tegtmeyer and Rittschof 1988). Furthermore, the C-terminus has a striking number of charged amino acids compared to the core region (Figure 2 and Figure S1B). A charged, basic C-terminus has previously been suggested to be present in a small barnacle waterborne cue of 3-5 kDa (~40 aa) (Tegtmeyer and Rittschof 1988). Interestingly, the sizes of the peptides that showed potent induction of settlement in the study by Tegtmeyer and Rittschof (1988) correspond to the average size of the C-termini in the WSP homologues reported here. This opens up the possibility that the short peptides reported (Rittschof 1985;Tegtmeyer and Rittschof 1988;Clare and Matsumura 2000) with settlement-inducing activity are derived from cleavage of the WSPs, releasing biologically active peptides from the C-termini as earlier proposed (Rittschof 1993). A similar mechanism was found in lobsters and hermit crabs, where olfactory serine proteases have been suggested to cleave after arginine and lysine residues in inactive odorant peptide molecules to generate active chemical cues (Rittschof 1990;Levine et al. 2001). Notably, the charged amino acids in the C-terminus of some of the identified WSP homologues are organised in repeated motifs. The most commonly observed were -HDDH- and -GR- motifs (Figure 2), but some of the WSP homologues also contained variations of these motifs, eg -HDEH-, -HDH-, and -GGR-.
Interestingly, the latter motif was previously reported to be important for settlement-inducing activity in both barnacle and oyster larvae (Zimme-Faust and Tamburri 1994;Browne and Zimmer 2001).
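As an illustration of how the repeated C-terminal motifs discussed above could be tallied, the following minimal Python sketch counts occurrences of each motif in a protein sequence. It is not the procedure used in this study; the function name `count_motifs` and the example sequence are hypothetical, and the sequence is a toy string rather than any real WSP.

```python
import re

# Motifs named in the text. Treating them as plain substrings is an
# illustrative assumption, not the method used in the study.
MOTIFS = ["HDDH", "HDEH", "HDH", "GGR", "GR"]


def count_motifs(seq):
    """Return a dict mapping each motif to its number of occurrences.

    A zero-width lookahead is used so that overlapping matches are counted.
    Note that every -GGR- also contains a -GR-, so the GR count includes
    the GGR occurrences.
    """
    seq = seq.upper()
    return {m: len(re.findall(f"(?={m})", seq)) for m in MOTIFS}


if __name__ == "__main__":
    # Toy C-terminal sequence (hypothetical, for demonstration only).
    toy_c_terminus = "AAHDDHGHDDHGHDDHKGRGGRHDH"
    for motif, n in count_motifs(toy_c_terminus).items():
        print(f"-{motif}-: {n}")
```

Applied to the C-terminal regions of real homologues, such counts would only summarise motif content; they would not by themselves indicate biological activity.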
In comparison to SIPC, WSP is largely under-investigated. An important piece of missing evidence is whether WSP is actually released into the water by adult organisms in nature and functions as a waterborne cue, as earlier suggested (Endo et al. 2009). Further experimental studies are required to establish whether the multiple WSP homologues identified here in both B. amphitrite and B. improvisus act as pheromones. However, assuming that the WSP homologues exhibit pheromone activity, the current study brings a new level of complexity to the mechanisms of barnacle chemical communication and barnacle settlement ecology. The presence of several WSP-like candidates in barnacles is consistent with previously reported data on pheromones in some insects, nematodes, and marine invertebrates that consist of a mix of several components. It has been suggested that the specificity of these pheromones is determined by the relative concentrations of the different components in the mixture (Linn et al. 1984; Roelofs 1995; Cummins et al. 2004). Moreover, evidence exists that the concentration of the different pheromone components depends on age, sex, and habitat, thus providing additional information to the individuals receiving the pheromone signal (Coppée et al. 2011; Choe et al. 2012). One particularly well-studied example of pheromone mixtures among marine invertebrates is the pheromone blend in the sea slug Aplysia (Cummins et al. 2005). Mate attraction in these animals involves several waterborne pheromones, such as attractin, seductin, and temptin, that act in different combinations to maintain mating aggregations (Cummins et al. 2005). In barnacles, according to the current hypothesis, cyprids detect the waterborne cue released by adults while swimming and respond to it by transitioning from the water column to a substratum, where the surface-bound SIPC pheromone in turn induces permanent attachment (Elbourne and Clare 2010).
Details on how the waterborne cue is released, distributed, and sensed by cyprids remain unknown. However, studies on host location by cyprids of the parasitic barnacle Heterosaccus dollfusi (Pasternak et al. 2004) showed that larvae use chemoreception to initiate movement and rheoreception (sensing the direction of the flow) to follow the host's odour plume upstream until they locate the organism releasing the odour. This study also suggested that larvae do not use the concentration gradient of the attractant to find the direction but rather its presence or absence in the water flow. In the context of a pheromone mixture, it is therefore probable that cyprids first encounter the most abundant WSP, which travels the longest distance from where it was released. Following the odour plume, cyprids would then sense the other, less abundantly expressed WSP homologues, which would serve as confirmation of the direction towards the odour source and possibly trigger downward movement, thus bringing cyprids closer to the surface where the cue is released by already established adults.
The relative abundance of the WSPs in combination with the sequence differences between species might be the basis for the species specificity of the cue. However, no obvious and consistent amino acid differences between the WSPs from B. amphitrite and B. improvisus were observed in the core region. Instead, the C-terminus with its intriguing repetitions of acidic and basic residues might contain species-unique combinations of motifs. Although there is some evidence that glycosylation patterns might be involved in the species specificity of the SIPC protein (Yorisue et al. 2012), nothing currently indicates that this type of post-translational modification plays a role in the activity of WSP. On the contrary, Endo et al. (2009) found that treatment with lentil lectin (which binds to and blocks sugars on proteins) did not influence the pheromone activity of their WSP fraction. On the other hand, the waterborne cue might not necessarily be species specific, as has been shown for Aplysia attractins, which are the only peptide pheromone family in invertebrates known not to be species specific (Painter et al. 2003). Behavioural studies on Aplysia also showed that fewer individuals were attracted by a single component than by a pheromone blend, suggesting that pheromones act in concert (Painter et al. 2003). The present data show that different barnacle life stages express different combinations of WSPs. In particular, the cyprids expressed mostly Bi_WSP, whereas juveniles expressed all six identified WSP homologues. It should be pointed out that newly metamorphosed juvenile barnacles were found to be particularly attractive to cyprids (Crisp and Meadows 1962). Therefore, the blend of several WSPs produced by juveniles might signal to cyprids that the site is suitable for successful settlement and metamorphosis.
Except for the reported pheromone function of the published B. amphitrite WSP (Endo et al. 2009), neither pheromone activity nor any other function of the WSP homologues identified here from B. amphitrite or B. improvisus has been experimentally examined. In addition to inducing cyprid settlement, the WSP homologues might have completely different or dual functions. For instance, the identification of WSP among the proteins rinsed from cement collected from adult B. amphitrite (So et al. 2017) suggests that WSPs might be a part of the permanent adhesive, from which they leak into the water and induce cyprid settlement. A patent has been issued for an adhesive 35 kDa protein in B. improvisus, B. crenatus, and S. balanoides extracted from the hemolymph or base plate (Kaplan et al. 2003) that contains an N-terminal fragment of about 20 amino acids that exactly matches the Bi_WSP downstream of the signal peptide region (starting at aa number 16). Furthermore, the Semibalanus sequence (File S1), which is a protein of 245 aa found in both the hemolymph and the base plate, is clearly a WSP homologue with 67%, 65%, and 62% identity to Bi_WSP-like 2, Ba_WSP, and Bi_WSP, respectively. Additionally, the triple -HDDH- motif in the C-terminus of this 35 kDa Semibalanus sequence (mentioned in the same patent) is found in Bi_WSP-like 4 with a G separating the three -HDDH- motifs, whereas Bi_WSP-like 2 contains three repetitions of the alternative -HDH- motif. It should be noted that a repeated -HDH- motif in the C-terminus is also found in two of the B. amphitrite WSP sequences (Figure 2). It is interesting that these motifs consistently occur in all the barnacle species mentioned, albeit sometimes with small differences, especially because the C-terminus is the least conserved part of these WSP homologues. The presence and particular combinations of motifs might indicate differences in function for the different WSP homologues, whether acting as pheromones or as a part of the cement.
In the current study, several WSP homologues in B. improvisus were identified and characterised. Further experimental studies are needed to determine the function of all of the WSP homologues, especially which features of WSPs are behind the settlement-inducing effect and whether there is a functional connection to the properties of the adhesives. Although several WSP-like sequences exist in both B. improvisus and B. amphitrite, a complete set of genes encoding WSP-like proteins in any barnacle species can only be obtained using a high-quality genome as a reference. Nevertheless, the identification of a set of WSP homologues is an important discovery that will, first of all, promote further research into the great complexity of barnacle chemical communication. Furthermore, the present study indicates that the WSP homologues are probably barnacle-specific and thus might be used as targets for non-toxic antifouling agents, analogous to pheromone-based applications in insect management strategies (Witzgall et al. 2010). Finally, the protein sequences of the identified WSP homologues are now available in the NCBI database, and this will aid in the annotation of WSPs in future studies.