How tRNAs dictate nuclear codon reassignments: Only a few can capture non-cognate codons

ABSTRACT
mRNA decoding by tRNAs and tRNA charging by aminoacyl-tRNA synthetases are biochemically separate processes that nevertheless, in general, involve the same nucleotides. The combination of charging and decoding determines the genetic code. Codon reassignment happens when a differently charged tRNA replaces a former cognate tRNA. The recent discovery of the polyphyly of the yeast CUG sense codon reassignment challenged previous mechanistic considerations and led to the proposal of the so-called tRNA loss driven codon reassignment hypothesis. According to this hypothesis, codon capture proceeds through the loss of a tRNA or mutations in the translation termination factor, a subsequent reduction in the frequency of the affected codon due to reduced translation fidelity, and finally the appearance of a new cognate tRNA. Critical for codon capture are the sequence and structure of the new tRNA, which must be compatible with the recognition regions of aminoacyl-tRNA synthetases. The proposed hypothesis applies to all reported nuclear and organellar codon reassignments.

Introduction
Messenger RNA (mRNA) decoding is determined by the specific base pairing between the tRNA and the mRNA triplet (codon). While the anticodon of the tRNA defines the coupling between tRNA and mRNA, it is the charging of the tRNA that determines which amino acid is added to the growing polypeptide chain. Both functions are, in general, coupled. Nevertheless, in a growing number of viral, bacterial, archaeal, eukaryotic and organellar genomes, tRNA charging differs from what would be expected from the anticodon identity. 1,2 Assigning tRNAs to the right codons is obviously the basis for correctly translating gene predictions. It is also a prerequisite for developing and evaluating codon reassignment hypotheses. Undiscovered reassignments and mis-annotated tRNAs are scattered throughout the literature and databases, and they severely distort our understanding of tRNA identity change and codon capture. For example, the report that Euplotes species, which encode UGA as cysteine, contain only the canonical tRNA Cys GCA but no dedicated cognate tRNA Cys UCA remained puzzling for a long time, because no nucleotide modifications or base-pairing rules are known that would allow a tRNA Cys GCA to read UGA codons. 3 This discrepancy was resolved only recently, when a cognate tRNA Cys UCA was identified by genome sequencing. 4 In mitochondria of some chlorophycean green algae the canonical stop codon UAG is translated as alanine, in others as leucine. 5 Nevertheless, the tRNA CUA genes of related hydrodictyacean green algae have very recently been annotated as leucine decoders without further inspection. 6 However, the tRNA CUA of at least the 2 sequenced Pediastrum duplex mitochondrial genomes are unambiguously tRNA Ala. They contain all Ala-tRNA identity determinants and lack Leu-tRNA elements such as the elongated variable loop. Sequence alignments show that UAG codons align to highly conserved alanine positions.
Similarly, AGR codons were thought to be used as stop codons in human mitochondria for the past 30 years 7 until elegant in vivo and molecular dynamics experiments showed that these codons are unused and instead cause ribosomal pausing. 8,9 The polypeptides are then released either by subsequent −1 frameshifting and termination at TGA codons further downstream 8 or by the mitochondrial ICT1 release factor. 9 These examples illustrate the difficulties in determining correct codon assignments. In today's high-throughput era, codon reassignment research should combine nuclear and/or organellar genome and/or transcriptome sequencing, alignments of multiple protein sequences to determine highly conserved positions, identification and phylogenetic analysis of the entire tRNA complement, and protein sequencing, for example by high-resolution mass spectrometry. To trace the history of a codon reassignment event, it is important not only to correctly identify the capturing amino acid but also to determine the origin of the respective capturing tRNA. A radically different approach put forward recently is to define a tRNA exclusively by its anticodon sequence and to ignore the rest of its sequence completely. 10 Accordingly, the tRNA Ser CAG in Candida species and the tRNA Thr YAG in yeast mitochondria were regarded as "mischarged," while the stop codon decoding tRNAs were termed "novel," independently of the tRNA's origin. However, defining a tRNA's identity by its charging, as outlined above, is far more common. Here, we follow the latter definition and do not discuss the contrasting study further.
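The alignment step of this workflow can be illustrated with a deliberately simple sketch: for every position where a query gene uses a given codon, look up the amino acids that homologous proteins carry at the aligned position, keep only highly conserved columns, and report the most frequent residue per codon. The function name, the input format and the 0.8 conservation cutoff are hypothetical choices, and the toy data are invented; dedicated tools rely on many alignments and probabilistic scoring rather than a plain majority vote.

```python
from collections import Counter, defaultdict


def infer_codon_meanings(observations, min_conservation=0.8):
    """Majority-vote sketch (hypothetical helper, not a published tool).

    observations: iterable of (codon, column) pairs, where `column` is the
    string of amino acids found in homologous proteins at the alignment
    position occupied by that codon in the query gene.
    Only columns whose most common residue reaches `min_conservation`
    are counted.
    """
    tallies = defaultdict(Counter)
    for codon, column in observations:
        if not column:
            continue  # no homologs aligned at this position
        residue, count = Counter(column).most_common(1)[0]
        if count / len(column) >= min_conservation:
            tallies[codon][residue] += 1
    # Most frequent conserved residue observed for each codon
    return {codon: counts.most_common(1)[0][0] for codon, counts in tallies.items()}


# Invented toy data: UAG sits at conserved alanine positions, CUG at leucine ones
observations = [
    ("UAG", "AAAAA"),
    ("UAG", "AAAAAG"),
    ("UAG", "AAAAA"),
    ("CUG", "LLLLLI"),
    ("CUG", "LSAV"),  # poorly conserved column, ignored
]
print(infer_codon_meanings(observations))  # {'UAG': 'A', 'CUG': 'L'}
```

Restricting the vote to conserved columns is what makes a UAG codon at invariant alanine positions informative, as in the Pediastrum example above.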

Hypotheses explaining codon reassignments
The first models to describe codon reassignment events were guided by the sparse data available at the time. The codon capture hypothesis assumes that certain codons disappear through GC pressure and that the unused cognate tRNAs are subsequently lost. Free codons can then be captured by other tRNAs and reappear when the mutation bias is relaxed. 11,12 However, given the number of codons and tRNAs with similar GC content involved, this hypothesis is unlikely to explain nuclear codon reassignments. It does seem plausible, though, for single codon reassignments in very small genomes such as mitochondrial and plastid genomes.
In contrast, the ambiguous intermediate hypothesis assumes the presence of 2 differently charged tRNAs competing for the same codon until the new cognate tRNA replaces the old one. 13,14 Ambiguous decoding by 2 competing tRNAs has not yet been observed in nature but has been induced experimentally in Candida albicans by expressing a tRNA Leu UAG -derived tRNA Leu CAG from Saccharomyces cerevisiae 15 and in S. cerevisiae by expressing the isoacceptor tRNA Ser CAG from C. albicans. 16 Under experimental conditions, these competing tRNAs resulted in decoding ambiguities ranging from 1.5% to 67%. However, the introduced tRNAs were not derived from tRNAs with newly induced single anticodon mutations but from evolutionarily optimized tRNAs. Like the codon capture hypothesis, the ambiguous intermediate theory might explain reassignments that can be traced back to single, isolated events. However, studies of mitochondrial genomes over the last 10 years and recent discoveries of codon reassignments in nuclear genomes showed that all reassignments, including the yeast CUG codon alteration, are highly polyphyletic and involve multiple amino acids. [17][18][19][20][21][22][23] Reassignments in Blastocrithidia species 24 and Amoeboaphelidium protococcarum 25 currently appear to be isolated events, but sequencing of further, related species might change this assessment.
To account for the polyphyly of the yeast CUG codon reassignments, we recently proposed a new model termed tRNA loss driven codon reassignment. 19 Central to this model is the loss, or loss of function, of a tRNA or a release factor, which is at least partially compensated by wobble decoding of the free codon through isoacceptor or near-cognate tRNAs. The resulting reduction in translation fidelity causes a subsequent decrease in the frequency of the free codon. The free codon cannot be captured by just any tRNA, but only by those whose anticodons are not part of the aminoacyl-tRNA synthetase recognition sites. With this hypothesis, we could reasonably explain all reported codon reassignment events in mitochondrial and nuclear genomes. 19 Here, we summarize the general rules by which tRNAs base-pair and occasionally capture codons. We will not, however, resolve all controversial, contradictory and missing tRNA and codon assignments.
A ubiquitous set of tRNAs translates the canonical nuclear genetic code
The canonical genetic code consists of 64 codons divided into family boxes (sets of 4 codons that differ only in the third position and code for the same amino acid) and 2-codon sets. Exceptions are the AUN box (3 codons decoding isoleucine and one decoding methionine) and the UGR set (UGA being a stop codon and UGG decoding tryptophan; Fig. 1A). All amino acids except methionine and tryptophan are decoded by multiple codons, making the code extremely redundant. A dedicated tRNA for each of the 61 sense codons would be costly, both for minimizing translation errors and for maintaining each tRNA gene. Instead, recognition accuracy at the third codon position is relaxed to some extent, allowing a single tRNA to pair with 2 or more codons through wobble base pairing. This allows a G at the first anticodon position (nucleotide 34) to base-pair with both C and U at the third codon position, and a U to base-pair with A and G (wobble rules; Fig. 1B). Uridine is chemically the most flexible nucleotide, with a central imino group flanked by 2 carbonyl groups. It can form hydrogen bonds, and thus base-pair, with all 4 nucleotides (extended wobble rules). Mitochondria use this extended base pairing to decode each family box with a single tRNA. 26,27 Similarly, Saccharomyces cerevisiae encodes a tRNA Leu UAG that has the unique capability to decode all 6 leucine codons. 28 To prevent misreading in split codon boxes, U34 is modified in all cytoplasmic tRNAs (except the aforementioned S. cerevisiae Leu-tRNA) and in mitochondrial 2-codon set tRNAs. In addition to U34 modifications, A-to-I editing (deamination of A34 to inosine) occurs in all eukaryotes and some bacteria. The resulting tRNAs with INN anticodons can base-pair with C, U and A at the third codon position. 29 All possible combinations of modified and unmodified tRNAs that decode the 61 sense codons are used in nature (Fig. 1A).
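The partition of the canonical code into family boxes and split boxes, and the count of 61 sense codons, can be checked directly from the standard codon table. The sketch below assumes the standard genetic code written as the usual 64-letter string in UCAG order (first codon position outermost, third innermost); the variable and function names are illustrative only.

```python
from collections import Counter

BASES = "UCAG"
# Standard genetic code in UCAG order; '*' marks stop codons
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_boxes(code):
    """Split the 16 boxes (codons sharing positions 1 and 2) into family
    boxes (all 4 codons encode one amino acid) and split boxes."""
    family, split = [], []
    for b1 in BASES:
        for b2 in BASES:
            meanings = {code[b1 + b2 + b3] for b3 in BASES}
            (family if len(meanings) == 1 else split).append(b1 + b2 + "N")
    return family, split


sense_codons = [codon for codon, aa in CODE.items() if aa != "*"]
family, split = classify_boxes(CODE)
# Amino acids encoded by exactly one codon
codons_per_aa = Counter(aa for aa in CODE.values() if aa != "*")
single = sorted(aa for aa, n in codons_per_aa.items() if n == 1)

print(len(sense_codons))  # 61
print(family)  # ['UCN', 'CUN', 'CCN', 'CGN', 'ACN', 'GUN', 'GCN', 'GGN']
print(single)  # ['M', 'W']
```

The 8 split boxes include AUN (Ile/Met) and UGN (Cys/stop/Trp), the exceptions named in the text.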
For example, tRNA GNN , tRNA UNN and tRNA CNN are needed to decode all 2-codon sets. In the highly compacted organellar genomes and in many prokaryotes, however, the tRNA CNN is absent. Decoding redundancy is generated through modified tRNA UNN that either specifically decode their cognate codon(s) or, more often, additionally decode NNC codons through wobble base pairing. 30 Some eukaryotes translate some 2-codon sets with tRNA ANN , which are prone to A-to-I editing and, subsequently, to misreading codons of the other 2-codon set in the same box. However, such tRNA ANN are extremely rare and, if present at all, outcompeted by the number of corresponding tRNA GNN (e.g., Nadsonia fulvescens contains a single tRNA AAA but 21 tRNA GAA ; Fig. 1). Moreover, such misreading would insert chemically similar amino acids or, in the case of tyrosine and cysteine tRNAs, cause stop codon readthrough instead of termination. Family boxes are usually decoded by 2 or 3 tRNAs. Bacteria strongly prefer a combination of tRNA GNN and tRNA UNN to decode all family boxes, occasionally supplemented by a tRNA CNN . Eukaryotes use different combinations for different family boxes, which are surprisingly conserved in all extant species, indicating their presence in the last eukaryotic common ancestor. In eukaryotes, glycine codons are never decoded by tRNA ACC but instead by combinations of the other 3 tRNAs of this family box (Fig. 1A). All other family boxes are usually decoded by an A-to-I edited tRNA INN together with both a tRNA UNN and a tRNA CNN . This leads to extensive decoding redundancy in S. cerevisiae, 30 and likely in other eukaryotes. Yeasts uniquely decode the proline family box with a U34-modified tRNA UGG that is able to decode all 4 codons (Fig. 1A). 30 Some yeasts, including S. cerevisiae, add further redundancy by also expressing a non-essential tRNA AGG . The only non-redundantly decoded family box in yeasts is the arginine CGN family box, which is decoded by an A-to-I edited tRNA ICG and a tRNA CCG .
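The wobble rules described above determine how redundantly a box is decoded by a given tRNA set. The sketch below encodes them as a lookup from nucleotide 34 to the third codon positions it can read and reports which tRNAs read each codon. It assumes strict Watson-Crick pairing at anticodon positions 35 and 36, reduces U34 modification chemistry to a single flag (standard U34 reading A and G versus extended U34 reading all 4), and writes I for an A-to-I edited position 34; these simplifications and all names are illustrative.

```python
from collections import defaultdict

# Watson-Crick partners used for anticodon positions 35 and 36
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Third codon positions read by anticodon position 34 (simplified wobble rules)
WOBBLE = {
    "G": {"C", "U"},
    "U": {"A", "G"},       # U34 restricted to A and G (standard wobble)
    "C": {"G"},
    "A": {"U"},
    "I": {"C", "U", "A"},  # inosine from A-to-I editing
}


def decoded_codons(anticodon, extended_u=False):
    """Codons (5'->3') read by an anticodon written 5'->3' (positions 34-36)."""
    n34, n35, n36 = anticodon
    prefix = COMPLEMENT[n36] + COMPLEMENT[n35]  # codon positions 1 and 2
    if n34 == "U" and extended_u:
        thirds = {"A", "C", "G", "U"}  # extended wobble: U34 reads all 4
    else:
        thirds = WOBBLE[n34]
    return {prefix + n for n in thirds}


def box_coverage(trnas):
    """Map each codon to the list of anticodons that can read it."""
    readers = defaultdict(list)
    for anticodon, extended_u in trnas:
        for codon in decoded_codons(anticodon, extended_u):
            readers[codon].append(anticodon)
    return dict(readers)


def redundant(coverage):
    """Codons read by more than one tRNA."""
    return sorted(codon for codon, r in coverage.items() if len(r) > 1)


# Eukaryotic CUN (Leu) box: tRNA IAG, tRNA UAG and tRNA CAG
leu = box_coverage([("IAG", False), ("UAG", False), ("CAG", False)])
print(redundant(leu))  # ['CUA', 'CUG']

# Yeast CGN (Arg) box: tRNA ICG and tRNA CCG, no codon is read twice
arg = box_coverage([("ICG", False), ("CCG", False)])
print(sorted(arg), redundant(arg))  # ['CGA', 'CGC', 'CGG', 'CGU'] []

# Yeast CCN (Pro) box: a single U34-modified tRNA UGG reading all 4 codons
pro = box_coverage([("UGG", True)])
print(sorted(pro))  # ['CCA', 'CCC', 'CCG', 'CCU']
```

The three calls reproduce, under these simplified rules, the redundant Leu decoding, the non-redundant Arg CGN decoding and the single-tRNA Pro decoding described for yeasts.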
The decoding of the leucine CUN family box in yeasts is unique among all nuclear genetic codes. All eukaryotes, including the early diverging yeasts (e.g., Yarrowia lipolytica and N. fulvescens), decode this family box with the usual A-to-I edited tRNA IAG , a tRNA UAG and a tRNA CAG . This decoding was disrupted and subsequently extensively reorganised in the ancestor of Saccharomycetaceae, Debaryomycetaceae and Pichiaceae. The reorganisation of CUN decoding most likely led to the CUG codon reassignment in Debaryomycetaceae and Pachysolen tannophilus. 19 In summary, the decoding of the nuclear genetic code in prokaryotes, and even more so in eukaryotes, is highly conserved and vastly redundant.

Only charged tRNAs can capture near- and non-cognate codons
Transfer RNA charging is determined by the respective cognate aminoacyl-tRNA synthetases (aaRS) [31][32][33] and their discriminator regions. These recognize characteristic nucleotides (often including the anticodon) and surface regions of the tRNA. The combination of these aaRS recognition elements specifies the identity of the tRNA. There are 4 scenarios for changing the decoding of the mRNA, which involve mutations either in the tRNA only or in both the tRNA and an aaRS (Fig. 2). 19 In the first scenario, one or multiple mutations in the anticodon lead to base pairing with a different codon while the identity, and thus the charging, of the tRNA remains unchanged (Fig. 2, scenario A). Such tRNAs with identical amino acid-accepting activity but different mRNA decoding selectivity are termed isoacceptor tRNAs. Because the anticodon nucleotides are part of the recognition elements of most aaRS, cytoplasmic tRNAs with mutated anticodons are usually retained only when they are isoacceptor tRNAs of the same family box or 2-codon set. We have shown recently that such switches within family box tRNAs happened frequently in yeast evolution. 19 It is likely that similar switches happened in other lineages, too. However, a compilation of the tRNAs in extant yeasts also shows that mutations to isoacceptors that are generally absent from the ubiquitous set of decoders rarely happened (Fig. 1A). The reassignment of a codon to a different amino acid can only happen when the anticodon nucleotides are not part of the aaRS recognition elements. The only tRNAs whose anticodons are not part of the identity elements in all species analyzed are Ser- and Ala-tRNAs. [31][32][33] Accordingly, CUG codon reassignments to serine in "CTG clade" Candida species and to alanine in the Pichiaceae species Pachysolen tannophilus 19 are the only nuclear sense codon alterations reported so far. Sense codon reassignments yet to be detected are most likely reassignments to serine or alanine.
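The constraint that only tRNAs whose anticodons are not identity elements can take over a codon for a different amino acid can be written as a simple filter. The sketch assumes, following the text, that only Ser- and Ala-tRNA anticodons are ignored by their aaRSs, and it treats every other anticodon-mutated alloacceptor tRNA as no longer charged; species-specific exceptions in identity determinants are not modeled, and the function names are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Simplification taken from the text: only Ser- and Ala-tRNA anticodons are
# not identity elements for their aaRS in all species analyzed.
ANTICODON_INDEPENDENT = {"Ser", "Ala"}


def cognate_anticodon(codon):
    """Fully complementary anticodon (5'->3') for a codon (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(codon))


def capture_candidates(codon, original_aa, trna_pool):
    """List tRNA species whose anticodon-mutated copy reading `codon` could
    still be charged by its own aaRS (hypothetical helper)."""
    anticodon = cognate_anticodon(codon)
    candidates = []
    for aa in sorted(set(trna_pool)):
        if aa == original_aa:
            candidates.append((aa, f"isoacceptor tRNA {aa} {anticodon}, no reassignment"))
        elif aa in ANTICODON_INDEPENDENT:
            candidates.append((aa, f"tRNA {aa} {anticodon}, {codon} reassigned to {aa}"))
        # Any other mutant tRNA is assumed to be no longer charged and is lost
    return candidates


for aa, outcome in capture_candidates("CUG", "Leu", ["Leu", "Ser", "Ala", "Gln", "Tyr"]):
    print(aa, "->", outcome)
```

For CUG with an original leucine assignment, only Ser and Ala remain as alloacceptor candidates, consistent with the two nuclear sense codon reassignments reported so far.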
Anticodon identity determinants are usually conserved across species. However, some lineages might have deviations that allow for other reassignments. For example, the middle anticodon nucleotide in Leu-tRNAs (A35) is an identity determinant in yeasts but not in E. coli and human. 32,34,35 In the second scenario (Fig. 2, scenario B), a mutation in the anticodon or in one of the other identity elements leads to a tRNA that is no longer recognized, and thus no longer charged, by its former cognate aaRS. Such unchargeable tRNAs are usually strongly selected against. However, in very rare cases, such mutations in the identity elements lead to a tRNA that is recognized by a non-cognate aaRS. This process is also called tRNA gene recruitment. Because of the multitude of recognition elements in cytoplasmic tRNAs, this scenario has so far only been observed in mitochondria. For example, in mitochondria of some Platyhelminthes the lysine AAA codon has been reassigned to asparagine, in mitochondria of some arthropods the AGR codons have been reassigned to serine and lysine, 17 and at the origin of the Saccharomycetaceae the mitochondrial CUN codons were reassigned to threonine and subsequently, in Eremothecium species, to alanine. 36 The origins of the tRNA AAA and tRNA YCU are not known, but the tRNA Ala UAG in Eremothecium originated from a tRNA Thr UAG by mutation of the acceptor stem identity determinant to the alanine-RS-specific "G3:U70". 36 The successful experimental replacement of the endogenous essential tRNA Thr UGU in E. coli by a tRNA derived from an anticodon-mutated tRNA Arg UCU that is charged with threonine 37 demonstrates that reassignments following this scenario should, in principle, also be possible in nuclear genetic codes.

Figure 2. How tRNAs can capture other codons. The scheme outlines the 4 scenarios under which a tRNA can capture a non-cognate codon. All scenarios observed so far require a mutation in the anticodon. Some scenarios additionally necessitate changes in the aaRSs. Required mutations in the anticodon and in discriminator elements of the aaRSs are highlighted in gray. Scenarios A and C generate isoacceptor tRNAs, scenarios B and D alloacceptor tRNAs.
The third scenario (Fig. 2, scenario C) comprises the appearance of new isoacceptor tRNAs and slight adjustments in the corresponding cognate aaRSs that relax the recognition requirements for the respective anticodon position. This scenario best describes all stop codon reassignments in nuclear eukaryotic genomes. The glutamine tRNAs decoding the UAA and UAG stop codons in diplomonads and ciliates are highly similar to the canonical tRNA Gln YUG , 38 the glutamate tRNAs decoding UAA in Vorticella and Opisthonecta species (ciliate subfamilies) are almost identical to the canonical tRNA Glu YUC , 39 and the Euplotes tRNA Cys UCA clearly originated from the canonical tRNA Cys GCA genes. 4 In contrast to bacteria and archaea, where GluRSs glutamylate both tRNA Glu and tRNA Gln , eukaryotes have distinct GluRSs and GlnRSs for tRNA Glu and tRNA Gln . The GluRS discriminates between the 2 tRNAs by a single arginine residue in the anticodon-binding pocket. 40 Mutation of this arginine to glutamine abolishes the discrimination between Glu and Gln anticodons. 40 In non-discriminating GluRSs, a glycine is found at the respective position. 41 Relaxing the discrimination for C36 in GluRSs to allow charging of tRNA Glu decoding UAR stop codons, while retaining specificity against tRNA Gln , is seemingly more difficult to achieve and is rarely observed. Codon capture of the UAR stop codons by tyrosine has recently been discovered in Mesodinium species, 20,21 but the respective tRNA Tyr YUA are not known. The tyrosylation efficiency of the TyrRS is tenfold lower for U34-mutated tRNA Tyr and 100-fold lower for C34- and A34-mutated tRNA Tyr compared with the cognate tRNA Tyr GUA . 42 This suggests that the mutant tRNA Tyr YUA could be tyrosylated during the initial phase of codon capture until the TyrRS is optimized through further mutations to acylate all isoacceptor tRNAs equally well.
The fourth scenario of codon capture is the most complex, as it involves the establishment of a novel orthogonal pair of tRNA and aaRS in which the tRNA cannot be charged by any existing aaRS and the aaRS cannot charge any other existing tRNA (Fig. 2, scenario D). To our knowledge, this scenario has been observed in nature only in mitochondria of Saccharomycetaceae, where the tRNA Thr UAG and a dedicated ThrRS emerged. 43,44 The tRNA Thr UAG originated from a tRNA His GUG , and the cognate ThrRS originated from the canonical mitochondrial ThrRS by gene duplication and subsequent modification.
In principle, codon capture could also be achieved by mutations in the aaRS alone, which would then charge non-cognate tRNAs. However, this would affect all isoacceptor tRNAs and might erase one of the amino acids from the code completely. This is extremely unlikely to happen and has not been observed in nature so far.
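The 4 scenarios of Fig. 2 differ mainly in which aaRS ends up charging the tRNA that reads the captured codon. The lookup sketch below encodes that distinction together with the resulting tRNA type and labels 3 of the examples discussed above; the enum and variable names are hypothetical, and the mapping only summarizes the scheme rather than predicting anything.

```python
from enum import Enum


class Charger(Enum):
    """Which aaRS charges the tRNA that reads the captured codon."""
    ORIGINAL_UNCHANGED = "original aaRS, unchanged"
    OTHER_EXISTING = "a different, already existing aaRS"
    ORIGINAL_ADJUSTED = "original aaRS with relaxed anticodon recognition"
    NEW_ORTHOGONAL = "a new, dedicated aaRS"


# Scenario letter and resulting tRNA type, as summarized in Fig. 2
SCENARIOS = {
    Charger.ORIGINAL_UNCHANGED: ("A", "isoacceptor"),
    Charger.OTHER_EXISTING: ("B", "alloacceptor"),
    Charger.ORIGINAL_ADJUSTED: ("C", "isoacceptor"),
    Charger.NEW_ORTHOGONAL: ("D", "alloacceptor"),
}

# Examples from the text, labeled by how the capturing tRNA is charged
examples = {
    "Eremothecium mitochondrial tRNA Ala UAG (from tRNA Thr UAG)": Charger.OTHER_EXISTING,
    "Euplotes tRNA Cys UCA (from tRNA Cys GCA)": Charger.ORIGINAL_ADJUSTED,
    "Saccharomycetaceae mitochondrial tRNA Thr UAG (from tRNA His GUG)": Charger.NEW_ORTHOGONAL,
}

for trna, charger in examples.items():
    letter, kind = SCENARIOS[charger]
    print(f"{trna}: scenario {letter}, {kind}, charged by {charger.value}")
```

An aaRS-only change, as discussed above, has no entry because it does not create a new decoder for the captured codon.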
Why the CTG codon is captured in yeasts and stop codons are captured elsewhere
Codons can be captured if tRNAs with fully complementary anticodons are missing or are displaced by novel isoacceptor tRNAs. Because of redundant decoding through wobble base pairing, new cognate tRNAs compete with isoacceptor tRNAs and, potentially, also with the original cognate tRNAs (if an ambiguous intermediate is considered). Some tRNAs are consistently missing in each family box and 2-codon set. This implies that wobble decoding is highly efficient and competitive. Correspondingly, codons without cognate tRNAs are often considerably more frequent than codons with cognate tRNAs. While there are only one or 2 copies of each tRNA in bacteria, there are often 10 or more tRNA copies in eukaryotes (Fig. 1C). Thus, decoding redundancy and tRNA copy numbers strongly restrict the possibilities for codon capture. The only decoder present as a single copy in all eukaryotic nuclear genomes is the termination factor eRF1. In bacteria, 2 translation termination factors provide redundancy in decoding the TAA stop codon. This is most likely the reason why stop codon reassignments happened less often in bacteria than in eukaryotes. Because of high gene numbers, stop codons are not particularly rare in most eukaryotes. But because eRF1 is encoded by a single gene, a few mutations may decrease its decoding capability for certain stop codons. Subsequently, the affected stop codon(s) become less frequent or are, in rare cases, even eliminated. With low stop codon numbers, a novel cognate tRNA derived by anticodon mutation of an alloacceptor tRNA might become competitive and might completely capture the stop codon(s) when complemented by adjustments within the cognate aaRS. Judging by the number of observed cases, the GlnRS can be adjusted most easily to also accept tRNA YUA as cognate tRNAs.
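The copy-number argument can be made concrete with a toy calculation. Purely for illustration, assume that the share of a codon's decoding events taken by each competing decoder is proportional to its gene copy number times a relative reading efficiency. This proportional-competition model, the efficiency values and the copy numbers below are hypothetical and are not taken from the cited studies.

```python
def decoding_shares(readers):
    """Share of one codon's decoding events taken by each competing decoder.

    readers: dict mapping a decoder name to (copy_number, relative_efficiency).
    Hypothetical toy model: share is proportional to copy_number * efficiency.
    """
    weights = {name: copies * eff for name, (copies, eff) in readers.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Sense codon: a new single-copy cognate tRNA against a 10-copy wobble reader
sense = decoding_shares({"new cognate tRNA": (1, 1.0), "wobble isoacceptor": (10, 0.5)})
print({k: round(v, 3) for k, v in sense.items()})
# {'new cognate tRNA': 0.167, 'wobble isoacceptor': 0.833}

# Stop codon: the same new tRNA against a single-copy eRF1 weakened by mutation
stop = decoding_shares({"new cognate tRNA": (1, 1.0), "weakened eRF1": (1, 0.2)})
print({k: round(v, 3) for k, v in stop.items()})
# {'new cognate tRNA': 0.833, 'weakened eRF1': 0.167}
```

With these invented numbers, the new tRNA is a minor reader of the sense codon but the dominant reader of the weakened stop codon, mirroring the argument above; the absolute values carry no biological meaning.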
Yeasts decode the family box codons and the 2-codon sets with the general eukaryotic repertoire of tRNAs, and each tRNA type is present in up to 39 copies (Fig. 1C). The only tRNAs with unusually low copy numbers are the CUN and CGN tRNAs. In addition, their numbers decreased dramatically from the ancient Yarrowia clade species to the Saccharomycetaceae/Debaryomycetaceae/Pichiaceae species (Fig. 1C). The ancestor of the latter species might have had only single copies of the tRNA Leu AAG , tRNA Leu UAG and tRNA Leu CAG , although CUN codons were still frequently used. The loss of the tRNA Leu CAG might have generated an extremely chaotic situation, with only a few remaining isoacceptor tRNAs for some of the most frequently used codons, so that translation would have slowed down considerably at the respective codon positions. The loss of the tRNA Leu AAG would have been lethal. Subsequently, many CUN codons mutated to UUR codons, and the CUG codon could be captured by isoacceptor or alloacceptor tRNAs. 19 Extant yeasts contain all possible combinations of tRNAs with modified and unmodified U34 to decode the CUN family box. As outlined above, alanine and serine tRNAs are the only ones with anticodons that are not part of their respective aaRS recognition sites, and they are therefore the only alloacceptor tRNAs found to have captured the CUG codon. Translation apparently depends on a complex balance between isoacceptor tRNA combinations, tRNA copy numbers and codon frequencies. The event in the yeast ancestor might have been a situation unique among eukaryotes, in which this balance was dramatically disturbed.

Disclosure of potential conflicts of interest
No potential conflicts of interest were disclosed.