Identification of Plasmodiophora brassicae effectors — A challenging goal

ABSTRACT Clubroot is an economically important disease affecting Brassica plants worldwide. Plasmodiophora brassicae is the protist pathogen associated with the disease, and its soil-borne obligate parasitic nature has impeded studies of its biology and of the mechanisms involved in its infection of the plant host. The identification of effector proteins is key to understanding how the pathogen manipulates the plant's immune response and which genes are involved in resistance. After more than 140 years of studying clubroot and P. brassicae, very little is known about the effectors that play key roles in the infection process and subsequent disease progression. Here we analyze the information available for identified effectors and suggest several features of effector genes that can be used in the search for others. Based on the information presented in this review, we propose a comprehensive bioinformatics pipeline for effector identification and provide a list of the bioinformatics tools available for this purpose.


Introduction
Clubroot disease is without doubt the most devastating disease affecting Brassicas, including the oilseed plant canola (Brassica napus) [1]. Brassica crops are widely cultivated and economically important for many countries around the world, with economic losses exceeding billions of dollars per year [1,2]. Although clubroot appears to have been first identified in Western Europe, it has now been reported in countries as widely distributed as Brazil, South Africa, Australia, New Zealand, China, and Russia, across six of the seven continents [2,3]. Clubroot is caused by Plasmodiophora brassicae, a soil-borne pathogen and member of the order Plasmodiophorales, which are obligate intracellular parasites of fungi, algae, or higher plants [4]. In 2010, phylogenetic analysis also placed P. brassicae in the protist subgroup Rhizaria [5], one of the more poorly understood subgroups of the eukaryotes [5,6]. Plants affected by P. brassicae develop galls (abnormal outgrowths similar to tumors) on their roots to support the development of secondary plasmodia during the pathogen life cycle (Figure 1A) [7]. Gall formation leads to wilting associated with difficulties in water and nutrient uptake by the plant, and subsequent death [7]. Mature secondary plasmodia, the last stage of the pathogen life cycle, develop into resting spores that are released into the soil, where they can resist severe environmental conditions for up to 20 years [7], making it almost impossible to prevent the disease through crop rotation and/or chemical treatments [1].
Breeding of clubroot-resistant cultivars is an important management strategy for controlling the disease, but in many countries, such as Canada, there is a narrow genetic background with which to work [8][9][10]. Although some commercial clubroot-resistant (CR) canola and cabbage cultivars are available, this resistance is associated with single dominant CR genes [11], as reported in Chinese cabbage [12] and oilseed rape [13], and leads to rapid breakdown. Two resistance genes, CRa and Crr1, have been isolated from Chinese cabbage (Brassica rapa) [14,15], while genetic mapping and identification of clubroot resistance have also been achieved in Brassica oleracea [16][17][18][19].
Most studies into the interaction between P. brassicae and its hosts have focussed on the plant, mainly because the pathogen is a soil-borne obligate biotroph that is impossible to study outside of the plant host. This pathogen lifestyle is the reason why key knowledge pertaining to the identification of effector proteins mediating infection and subsequent disease progression is still unavailable. The recent draft genomes for European and North American P. brassicae pathotypes have provided the opportunity to identify putative effector proteins through comparative genomics [20,21]. However, low levels of similarity to known sequences at the nucleotide and/or amino acid level have made annotating P. brassicae genes extremely difficult.
Effectors from well-studied biotrophic plant pathogens such as the Basidiomycete rusts and the Ascomycete powdery mildews have been extensively studied and characterized as the basis of disease resistance breeding strategies [22]. Among these well-characterized biotrophic pathogens are: (i) Puccinia monoica [23] which, similar to aster yellows phytoplasma [24], can induce floral mimicry in order to promote its own sexual reproduction; (ii) Melampsora lini [25], which produces effector proteins in haustoria that are recognized inside the plant cell [26]; and (iii) Cladosporium fulvum, the tomato pathogen which, during infection, secretes the chitin-binding virulence factor Avr4 that is thought to protect the fungal chitin cell wall from hydrolysis by plant-produced chitinases [27]. Extensive studies with Puccinia graminis f. sp. tritici have identified race-specific avirulence factors (Avr) such as AvrSr35, which mediates resistance against the highly virulent wheat stem rust race Ug99 [28], and AvrSr50, recognized by the Sr50 resistance protein, which provides resistance against all race groups of P. graminis f. sp. tritici worldwide, including Ug99 [29,30].
This review provides an overview of our current knowledge of putative effector proteins and suggests strategies for better annotation of the P. brassicae draft genomes with respect to effector proteins.

What is known so far?
In order to manipulate plant defenses and enable parasitic colonization, many eukaryotic biotrophic plant pathogens have evolved advanced strategies to deliver effector proteins into the host cell during infection [31]. Successful infection relies primarily on the successful release of the effectors, which in many cases are responsible for the suppression of plant immunity [32]. The initial recognition of conserved microbial features, known as pathogen-associated molecular patterns (PAMPs), leads to PAMP-triggered immunity (PTI) in the host [31]. PAMP-triggered immunity is a first level of immune response that can be overcome by effector proteins produced by adapted pathogens. Resistance (R) to adapted pathogens is achieved through specific recognition of effectors, also known as avirulence proteins, by corresponding R proteins produced by the plant host. This effector-R protein recognition constitutes the second level of immune response, effector-triggered immunity (ETI) [32]. P. brassicae is a well-adapted pathogen of Brassica hosts; indirect evidence suggests a lack of either response [33], but neither PTI nor ETI has been well characterized in the host-P. brassicae pathosystem.
[Figure 1 caption, fragment: ... [7], representing spindle-shaped resting spores, biflagellate primary zoospores, zoospores, and primary and secondary plasmodia (oval black figures in root hairs and cortical cells, respectively). Further steps in the life cycle of P. brassicae, such as the formation of resting spores in cortical cells and their ejection into the soil, are not shown in this scheme.]
Understanding and identifying the proteins that comprise the secretome of P. brassicae is an important step towards identifying the complementary R proteins in the plant host. Putative effectors from biotrophic plant pathogens such as oomycetes and fungi are emerging from the sequencing and assembly of their genomes or transcriptomes, followed by comparative analysis with candidate effector genes [34,35]. This was precisely the strategy followed by Schwelm et al. [20] and Rolfe et al. [21] to identify putative effectors released by P. brassicae during the infection process. However, the obligate parasitic nature of P. brassicae has made it impossible to obtain a complete genome, with six barely annotated draft genomes available for five Canadian pathotypes and one European pathotype [20,21]. Transcriptomes of P. brassicae infecting B. napus and Arabidopsis thaliana in the Canadian study, and infecting B. rapa, B. napus, and B. oleracea var. capitata in the European study, have identified some candidate effectors [20,21]. However, while the information obtained in both studies is a good start, the effectors specifically responsible for P. brassicae infection and subsequent disease progression have still to be identified.
A common finding in both studies was the overexpression of a benzoic acid/salicylic acid methyltransferase-encoding gene [PBRA_T000444 in P. brassicae European pathotype 3 (Pbe3), PbPT3Sc00026_A_1.308_1 in P. brassicae pathotype 3 (Pb3) draft genomes; GenBank accession number AFK13134] during the second week of infection, with expression peaking three and four weeks after infection [20,21,36]. Salicylic acid (SA) is essential for the activation of plant defence [37]. In the plant cell, regulation of active SA is managed through the maintenance of different inactive forms of SA, such as methyl salicylate (MeSA) [38]. The methyltransferase identified in P. brassicae has been characterized in detail by Ludwig-Müller et al. [36], who identified its role in the methylation of salicylic, benzoic and anthranilic acids, thereby contributing to the suppression of salicylic acid-induced defense in the plant host. This is the first and only well-characterized putative effector for P. brassicae and, for this reason, we suggest that a less conservative approach is needed in this endeavour.

Where and how to look for effectors?
The concept of an effector is constantly evolving with the understanding of plant-pathogen interactions. The basic criteria used to identify candidate secreted effector proteins are: the presence of a signal peptide (within the initial 60 amino acids at the N-terminus), no trans-membrane domains, a small size of between 300 and 450 amino acids, and mostly species-specific sequences [32,39,40]. These were the parameters used in the identification of putative P. brassicae effectors in the previously mentioned studies (Table 1). In addition, several other characteristics, motifs, and domains have been associated with effector proteins and have been used to improve identification and functional characterization of P. brassicae effectors.
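The basic criteria listed above can be expressed as a simple filter. The following Python sketch is illustrative only: the `Candidate` record and its field names are hypothetical, and it assumes that signal peptide and trans-membrane predictions have already been obtained from external tools and parsed into these fields. Species specificity is not tested here, as it would require a separate similarity search.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Candidate:
    """Hypothetical record holding pre-computed predictions for one protein."""
    protein_id: str
    sequence: str
    signal_peptide_end: int  # last residue of the predicted signal peptide; 0 if none
    tm_domains: int          # number of predicted trans-membrane domains


def passes_basic_criteria(c: Candidate, min_len: int = 300,
                          max_len: int = 450, sp_window: int = 60) -> bool:
    """Apply the criteria from the text: signal peptide within the first
    60 residues, no trans-membrane domains, and 300 to 450 amino acids."""
    has_signal_peptide = 0 < c.signal_peptide_end <= sp_window
    right_size = min_len <= len(c.sequence) <= max_len
    return has_signal_peptide and c.tm_domains == 0 and right_size


def filter_candidates(candidates: Iterable[Candidate]) -> List[Candidate]:
    """Keep only the proteins that satisfy every basic criterion."""
    return [c for c in candidates if passes_basic_criteria(c)]
```

The thresholds are exposed as parameters so that a less conservative search, as suggested above, can be run by relaxing the size limits.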

Cysteine-rich proteins
Cysteine-rich small proteins have been identified as effectors in several plant pathogens, especially fungi, such as Cladosporium fulvum (syn. Passalora fulva), an asexual extracellular fungal pathogen of tomato [41]. In C. fulvum, cys-rich effectors can inhibit and protect against plant hydrolytic enzymes, such as proteases, glucanases, and chitinases [32]. Cys-rich small secreted proteins have also been identified as major effectors in the obligate biotrophic pathogens Melampsora larici-populina [35] and the Asian soybean rust fungus Phakopsora pachyrhizi [42], where one of the cys-rich small proteins identified as an effector has been shown to suppress plant immunity [43].
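Cysteine content is straightforward to compute from a predicted protein sequence. The sketch below is a minimal illustration; the thresholds (`min_fraction` and `min_count`) are arbitrary assumptions chosen for the example and are not values taken from the cited studies.

```python
from typing import Tuple


def cysteine_stats(sequence: str) -> Tuple[int, float]:
    """Return the number of cysteines and their fraction of the sequence."""
    seq = sequence.upper()
    count = seq.count("C")
    fraction = count / len(seq) if seq else 0.0
    return count, fraction


def is_cys_rich(sequence: str, min_fraction: float = 0.03,
                min_count: int = 4) -> bool:
    """Flag a protein as cysteine-rich.

    Both thresholds are illustrative assumptions and should be tuned
    against known effectors before being used in a real screen.
    """
    count, fraction = cysteine_stats(sequence)
    return count >= min_count and fraction >= min_fraction
```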

RxLR motifs
The RxLR motif (arginine-any amino acid-leucine-arginine) has been identified in the N-terminal region of some oomycete and fungal effectors [43,44]. Although the function of the RxLR motif in effector proteins remains unclear, it has been shown to be necessary for translocation into the host cell [45] and to elicit immune responses in plant cells [46]. Curiously, despite the divergence between P. brassicae and oomycetes, RxLR motifs have been reported in effectors in both the P. brassicae Canadian strain Pb3 and the European strain Pbe3 [20,21], but the IDs of these RxLR protein-encoding genes have yet to be determined. Many putative effectors containing the RxLR motif also contain a second conserved motif, DEER (aspartate-glutamate-glutamate-arginine), located toward the C-terminus [47].
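A motif of this kind can be located with a regular expression. The following sketch searches for RxLR near the N-terminus and then for DEER downstream of it. The 100-residue search window is an assumption made for illustration, as is the function name; neither comes from the cited studies.

```python
import re
from typing import Dict, Optional

RXLR = re.compile(r"R[A-Z]LR")  # arginine, any residue, leucine, arginine
DEER = re.compile(r"DEER")      # aspartate, glutamate, glutamate, arginine


def find_rxlr_deer(sequence: str,
                   rxlr_window: int = 100) -> Optional[Dict[str, Optional[int]]]:
    """Report 1-based start positions of RxLR and any downstream DEER.

    RxLR must lie entirely within the first `rxlr_window` residues
    (an assumed cut-off). Returns None when no RxLR motif is found.
    """
    seq = sequence.upper()
    rxlr = RXLR.search(seq, 0, rxlr_window)
    if rxlr is None:
        return None
    deer = DEER.search(seq, rxlr.end())
    return {
        "rxlr_start": rxlr.start() + 1,
        "deer_start": deer.start() + 1 if deer else None,
    }
```

A hit from a pattern this short is only weak evidence on its own, so it would normally be combined with the signal peptide and size criteria described earlier.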

Chitin-binding domains
Chitin, a recognized microbial PAMP, is a major structural component of fungal cell walls. Some fungal effectors have been shown to contain chitin-binding domains that are able to protect the pathogen against plant chitinases [48,49]. These effectors can also act as scavengers of chitin fragments released by the pathogen during infection [50], thereby avoiding a PAMP-triggered immunity response by the host plant. The resting spores of P. brassicae, formed at the end of the pathogen life cycle, contain chitin in their cell walls [20]. The presence of the carbohydrate/chitin-binding (CBM18) domain, enriched in the plasmodiophorid secretome, suggests that these putative effectors might be involved in the formation and possibly the germination of resting spores. A blastp (https://blast.ncbi.nlm.nih.gov) search with a chitin-binding domain protein, identified in the genome of Canadian strain Pb3 (PbPT3Sc00048_S_5.266_1), showed identity with a Fusarium fujikuroi chitinase (GenBank accession number CCT72994) [15]. In the Pbe3 genome, chitin recognition proteins, like the protein encoded by the PBRA_002543 gene, have also been identified [20].

Protease/protease inhibitors
Sequence identity between plant pathogen effectors and other protein sequences is often so low that the assessment of function based on putative orthology alone has been limited [31,50]. In many cases, the three-dimensional structure of the protein, the disulfide bond pattern, and the cysteine spacing have been used to identify proteases and/or protease inhibitors as putative effectors in obligate biotrophic soil-borne plant pathogens [34,51,52]. These effectors target host proteins/proteases during infection, thereby manipulating the host response to infection. During resting spore formation, Pbe3 overexpresses Kazal-like (e.g. PBRA_001430) and papain protease inhibitors [20]. The Kazal family of serine protease inhibitors, characterized by the presence of ten cysteine residues including the characteristic CX7CX6YX3CX2-3C signature, has been reported as effectors in fungi and oomycetes [53,54].
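Cysteine spacing signatures such as the Kazal pattern translate directly into regular expressions. The sketch below encodes CX7CX6YX3CX2-3C exactly as written in the text, where X stands for any amino acid. The function name is hypothetical, and a match indicates only a Kazal-like spacing, not a confirmed protease inhibitor.

```python
import re
from typing import List, Tuple

# C, 7 residues, C, 6 residues, Y, 3 residues, C, 2 or 3 residues, C
KAZAL_SIGNATURE = re.compile(
    r"C[A-Z]{7}C[A-Z]{6}Y[A-Z]{3}C[A-Z]{2,3}C"
)


def kazal_like_hits(sequence: str) -> List[Tuple[int, str]]:
    """Return (1-based start, matched text) for each non-overlapping match
    of the Kazal-like cysteine spacing signature."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group()) for m in KAZAL_SIGNATURE.finditer(seq)]
```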
A putative serine protease (GenBank accession number AM411657) was identified among the P. brassicae genes that were expressed during infection [55]. This protease carried a predicted signal peptide sequence but lacked homologs in other plant pathogens. Further studies identified the serine protease as Pro1 [56], a member of the S28 family of proteases that, due to its proteolytic activity, may play a role during clubroot pathogenesis by stimulating resting spore germination [56]. Curiously, none of these studies suggested that this protein was an effector, probably because its suggested role occurs outside of the plant cell.

Nuclear localization domains
Unlike many fungi and oomycetes, P. brassicae is an intracellular pathogen [57]. Another criterion used to identify putative effectors from obligate intracellular pathogens has been the presence of nuclear localization domains, which allow effectors to directly modulate plant gene expression [58,59]. For many years, effectors capable of migrating to the plant cell nucleus had only been described in bacteria [58][59][60], but more recently these motifs, together with nuclear localization of effectors, have been described in fungi [61,62] and nematodes [63]. In bioinformatics pipelines designed to identify putative effectors, the inclusion of steps to remove proteins containing subcellular localization signals will remove these effectors, although researchers could analyze these amino acid sequences directly using the online tool TargetP 1.1 Server (http://www.cbs.dtu.dk/services/TargetP). To date, none of the studies on the secretome and putative effectors of P. brassicae have identified secreted proteins containing nuclear localization domains [20,21].

Pexel motif
P. brassicae is evolutionarily closer to the malaria parasite, Plasmodium spp., than to oomycetes or fungal pathogens [5]. While there are many differences between the immune systems of animals and plants, they share the common characteristic of being targeted by pathogen effectors [64]. The discovery of the Pexel motif (Plasmodium export element) was a groundbreaking finding that contributed to the understanding of the infection process of Plasmodium [65]. Pexel is a pentameric motif present in the N-terminal portion of all the proteins translocated through the parasitophorous vacuole membrane. It comprises a positively charged, hydrophilic amino acid in position one (Arg or Lys), a hydrophobic amino acid in position three (Leu or Ile), and another less conserved amino acid in position five (predominantly Asp, Glu, or Gln), with non-charged amino acids in positions two and four (Ser, Thr, Cys, Met, Asn, or Gln) [65]. The N-terminal domain of an effector protein from the soybean cyst nematode Heterodera glycines shows unique sequence similarity to domains of an effector of Plasmodium spp. [66], indicating that highly diverse parasites of plants and animals use analogous effectors and that this possibility is worth exploring.
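The positional description of the Pexel motif given above maps onto a five-character regular expression. The sketch below encodes it literally from the residue sets listed in the text. The N-terminal search window of 100 residues and the function name are assumptions made for illustration.

```python
import re
from typing import List, Tuple

# Position 1: Arg/Lys; 2 and 4: Ser/Thr/Cys/Met/Asn/Gln;
# 3: Leu/Ile; 5: Asp/Glu/Gln (as described in the text).
PEXEL = re.compile(r"[RK][STCMNQ][LI][STCMNQ][DEQ]")


def find_pexel(sequence: str,
               n_terminal_window: int = 100) -> List[Tuple[int, str]]:
    """Return (1-based start, motif) for Pexel-like matches that lie
    entirely within the first `n_terminal_window` residues (assumed cut-off)."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group())
            for m in PEXEL.finditer(seq, 0, n_terminal_window)]
```

Because position five is described as only predominantly Asp, Glu, or Gln, a real screen might relax the last character class; the strict form is kept here for clarity.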

Plant pathogenic plasmodiophorids
P. brassicae is not the only plant pathogenic protist [67], nor is it the only plasmodiophorid affecting economically important crops: the group includes Spongospora subterranea, the causal agent of powdery scab on potato tubers, and Polymyxa spp., Polymyxa graminis and Polymyxa betae, which affect graminaceous plants and Chenopodiaceae plants, respectively [68,69].
Genomic sequences, although limited, are available from S. subterranea [70][71][72], and comprehensive S. subterranea transcriptomic datasets are available from root galls [5,20]. These data suggest intron-rich genes and an enrichment of chitin-related enzymes in the S. subterranea transcriptome. Transposable elements are more highly expressed in S. subterranea than in P. brassicae [20,70,71], but evidence for recombination in S. subterranea is limited and there is little understanding of sexual recombination in phytomyxids [73]. A study of secreted proteins in S. subterranea has been carried out, identifying the previously described over-expression of the benzoic acid/salicylic acid methyltransferase-encoding gene [20]. To date, there are no genomic data available for Polymyxa spp. [67].

Life cycle and effectors
P. brassicae has three main stages in its life cycle: (i) survival in the soil and germination of resting spores, (ii) root hair or epidermal cell infection, and (iii) cortical infection (Figure 1B), although the pathogen has also been observed in phloem (reviewed in Kageyama and Asano [7]). The serine protease Pro1 is thought to be involved in step 1 (Figure 1B), the germination of resting spores [56]. From step 2 to step 5 of the life cycle (Figure 1B), it is expected that other proteases and protease inhibitors, such as cys-rich proteases and methyltransferases, will be produced to suppress plant immunity, thereby preventing the host plant from mounting responses such as programmed cell death and increasing the probability of success of the infection process [33]. During the formation of primary and secondary plasmodia, infected root tissues develop into swollen galls; it is expected that P. brassicae will secrete an array of effector proteins triggering growth, expansion and differentiation of infected host cells. During zoospore maturation (Figure 1B), P. brassicae effectors with chitin-binding domains are expected to be expressed to remove chitin fragments that could otherwise trigger PAMP-associated immunity during late stages of the P. brassicae life cycle (Figure 1). Based on previous studies, overexpression of effectors involved in manipulating plant defense, such as PbBSMT [36], and of effectors disturbing plant meristematic activity might also be expected during the formation of secondary zoospores and secondary plasmodia [74]. While there are still steps in the life cycle of P. brassicae that remain to be clarified, the study of effectors and their roles in life cycle completion and clubroot disease progression is an arena waiting to be explored.

A coherent pipeline to identify effectors
Computational prediction is an excellent starting point to screen for putative effectors, to identify functional domains and to help us understand the evolution, distribution and characteristics of effectors [75,76]. Utilizing the information presented in this review, we have designed a coherent pipeline aimed at identifying putative effectors involved in the infection process by P. brassicae and subsequent clubroot disease progression in its plant host (Figure 2). The pipeline makes use of tools to identify the origin of reads and to identify motifs and functions of the predicted secreted proteins. The bioinformatics tools referred to in the pipeline, together with their descriptions and websites, are listed in Table 2. This pipeline is only a suggestion based on our previous experience [77]; key parameters must still be set for it to run properly, but it is a good starting point.

Concluding remarks
Bioinformatics analysis is the route to the identification of candidate effector proteins, but laboratory confirmation of function will always be required. Validation of the RNA-seq data through qPCR [77] or ddPCR [78], identification of the subcellular localization of the candidate effector proteins [20] through transient expression or stable transformation [21,79] and identification of the plant proteins interacting with the pathogen effectors are some of the logical next steps towards the identification of P. brassicae effector proteins. The work with pathogen effectors in general is challenging, but the work with intracellular, biotrophic pathogens appears to be even more so, requiring creativity and novel solutions.

Disclosure statement
No potential conflict of interest was reported by the authors.

Funding
This work was supported by the Saskatchewan Ministry of Agriculture, Canada.

[Table 2, fragment: bioinformatics tools referred to in the pipeline]
Tool: (name not recovered). Website: http://code.google.com/p/rna-star [81]
Tool: SignalP. Description: Software that predicts signal peptide cleavage sites in proteins from eukaryotes and prokaryotes. Website: http://www.cbs.dtu.dk/services/SignalP [82]
Tool: WoLF PSORT. Description: An algorithm that predicts the subcellular localization sites of proteins based on their amino acid sequences. It makes predictions based on both known sorting signal motifs and some correlative sequence features such as amino acid content.