rpoB and efp are stable candidate reference genes for quantitative real-time PCR analysis in Saccharopolyspora spinosa

Abstract Spinosad (spinosyn A and spinosyn D), the secondary metabolite produced by Saccharopolyspora spinosa, is a potent insecticide with low effects on the environment and mammals. Strategies such as metabolic engineering, mutagenesis and fermentation process optimization have been employed for its production enhancement. Quantitative real-time polymerase chain reaction(qRT-PCR) is one of the preferred methods for the evaluation of gene transcript levels, whose accuracy and sensitivity depend on the normalization using optimal reference genes. However, no single reference gene is universally appropriate for all strains under various conditions. In this study, the transcriptional stability of 35 candidate reference genes, including 23 traditionally used reference genes and 12 novel reference genes identified in S. spinosa homologous species, was analysed in S. spinosa ATCC 49460 and spinosad high-yield strain S. spinosa S3-3 under three fermentation phases. The transcriptional stability of these genes was assessed by three statistical algorithms, geNorm, NormFinder and BestKeeper. The overall rankings suggested that rpoB and efp were the most stable candidate reference genes, and they may be the most promising reference genes for further study on the measurement of expression levels of target genes involved in the biosynthetic process of spinosad.


Introduction
Saccharopolyspora spinosa, a kind of soil actinomycete, has attracted great attention because of its secondary metabolites, spinosad. Spinosad (spinosyn A and spinosyn D) is a member of polyketide-derived macrolide insecticide with high effectiveness and broad-spectrum insecticidal activity [1]. Taking the advantages of low mammalian and avian toxicity and low environmental impact, spinosad was awarded 'The Presidential Green Chemistry Challenge (U.S.A.)' three times and was permitted to be used on organic food by the European Union, leading to a high market demand [2][3][4].
In order to increase the fermentation yield of spinosad, researchers had made great efforts through chemical mutagenesis or fermentation optimization of S. spinosa [3,5]. Recently, the whole genome sequence of S. spinosa has been published, which enables further studies on the construction of highyield spinosad strains by metabolic engineering [6]. The whole genome sequence of S. spinosa laid the foundation to search for new target genes in S. spinosa to increase spinosad yield through molecular biology approach. Thereafter, transcriptional level research played a key role in revealing the regulation network and catalytic actions during cell processes.
Several methods have been used to investigate gene transcription levels, such as Northern blotting, semi-quantitative reverse transcription polymerase chain reaction (RT-PCR), in situ hybridization and quantitative real-time PCR (qRT-PCR). qRT-PCR is the most widely used technique to investigate gene expression due to its high sensitivity, specificity, accuracy, reproducibility, convenience and high throughout [7][8][9]. However, multiple factors can affect the accuracy of the qRT-PCR data, including RNA quality and quantity, cDNA synthesis methods and its concentration, the efficiency of amplification and so on [10]. For normalization and precise quantification of the expression of the target genes, the selection and validation of stable references genes are crucial steps before proceeding to gene expression analysis.
A number of housekeeping genes such as 16S ribosomal RNA (16S rRNA), ribosome protein L13 (rbL13), RNA polymerase factor, DNA polymerase factor and elongation factor have been used as reference genes in different expression profiling studies in bacteria [11][12][13][14]. However, it is reported that the expression of housekeeping genes may vary among various species and different experimental conditions. To identify the key steps in spinosad biosynthesis of S. spinosa, the most commonly used reference gene was DNA polymerase sigma factor subunit A (sigA) [15][16][17]. Also, 16S rRNA was selected as the reference gene for revealing the metabolism difference between Saccharopolyspora pogona and S. spinosa, as well as seeking the link between primary metabolism and spinosad biosynthesis [18,19]. And ribosome protein L13 (rbL13) was also selected to determine the transcript levels of rex, ctyA and ctyb, which were related to the redox control in S. spinosa [20]. In the study of heterologous expression of spinosyn biosynthetic gene cluster, DNA gyrase subunit A (gyrA) was chosen as the reference gene [21]. Additionally, the polyketide synthase (spnA), which was involved in spinosad biosynthesis, was also considered to be a reference gene. It was reported that 16S rRNA and rbL13 were considered to be the most suitable reference genes in different S. spinosa strains and under different culture conditions [22]. Overall, it is obvious that the normalization of reference genes in S. spinosa has not been standardized.
In this study, the most stable reference genes for gene transcriptional analysis in S. spinosa were screened with two different strains and three fermentation phases. The study will contribute to more complex studies, for instance, searching for the limiting enzyme during spinosad biosynthesis and new target genes for S. spinosa yield improvement, as well as exploring other macrolide secondary metabolites.

Strains and cultivation
The wild-type strain, S. spinosa ATCC 49460 was purchased from China General Microbiological Culture Collection Center (CGMCC). The high-yield mutant, S. spinosa S3-3 derived from strain ATCC 49460, was obtained by nitrosoguanidine mutagenesis, UV irradiation combined with chloramphenicol screening successively, and shows stable inheritance.
For strain activation, 0.5 mL of spore suspensions (approx. 10 7 -10 8 spores/mL) was inoculated to seed medium and incubated for 54 h at 29 C, 240 rpm on a rotary shaker. Each litre of seed medium contained the following: 4 g yeast extract, 4 g tryptone, 4 g casein hydrolysate (acid), 10 g glucose, 1.36 g K 2 HPO 4 , 0.5 g MgSO 4 , with a pH of 7.2 ± 0.1 before autoclaving. Next, 10% (v/v) of the seed culture was inoculated into 30 mL of seed medium to prepare fermented seed liquid. Then, 0.3 mL of seed culture (10%, v/v) was inoculated into 30 mL of fermentation medium in 250-mL Erlenmeyer flasks and cultured for 7 d at 29 C, 240 rpm. The composition of the fermentation medium (per litre) was 10 g peptonized milk, 20 g cottonseed protein, 20 g dextrin, 1 g NaCl, 1.0 g MgSO 4 , 0.5 g KH 2 PO 4 , 3 g CaCO 3 , 10 g yeast extract, 60 g glucose, 20 g soybean oil. The pH value was adjusted to 7.2 with 1 mol/L of NaOH before autoclaving. All experiments had three biological replicates.

Analysis of spinosad production
The yield of spinosad was assessed using high-performance liquid chromatography (HPLC) as described by Huang et al. [15]. For determination of spinosad productivity, fermentation broth was extracted with 3Â volume of methanol overnight at 4 C, protected from light. Then, the leaching liquor was centrifuged at 14,000 rpm for 15 min before being analysed by HPLC. Samples (10 lL) were injected into an Agilent C18 reverse-phase column (5 lm, 150 Â 4.6 mm) operated at 40 C using an isocratic mobile phase composed of 45% (v/v) methanol, 45% (v/v) acetonitrile, and 10% 6.5 mmol/L ammonium acetate with a flow rate of 1.0 mL/min using a UV detector at 244 nm.

RNA isolation and cDNA synthesis
Total RNA was extracted using TRNzol reagent (Tiangen Biotech Co., LTD. cat. no. DP424) from the frozen powder of S. spinosa mycelia of different fermentation time [23]. The possibility of contamination of genomic DNA was eliminated with RNase-free DNaseI (Takara Bio, Dalian, China; cat. no. 2270 A), and then, the digested RNA was used as a template for the synthesis of cDNA with Reverse Transcriptase (Promega Co., cat. no. M170A) with random hexamer primer (Takara Bio, Dalian, China; cat. no. 3801) according to the manufacturer's protocol (https:// www.takarabiomed.com.cn/DownLoad/3801.pdf).

Candidate gene selection and primer designing
It was reported that the reliable reference genes of the model organism Streptomyces coelicolor were identified by transcript abundance analysis [24]. Based on the research, we compared the amino acid sequences of selected reference genes of S. coelicolor through BLAST in the NCBI database, and then, 12 candidate genes were found. Besides, according to previous studies, another 23 commonly used housekeeping genes such as 16S rRNA, rbL13, sigA, GAPDH, tufA, hrdB and so on were included in the study [12-14, 22, 25]. All primers for candidate genes were designed by software online (https://sg. idtdna.com). The definition and primer sequences of genes are listed in Table 1.
For qRT-PCR, the primer specificity was determined by the PCR products and melting curve analysis. The amplification efficiency (E) for each primer and the R 2 value were evaluated based on the standard curve generated from a 10-fold serial dilution of pooled cDNA (10 , 10 À1 , 10 À2 and 10 À3 ). The amplification efficiency was calculated using the given formula [E¼(10 (-1/slope) -1 Â 100] [26,27].

SYBR green qRT-PCR
For analysis of the transcriptional levels of target genes, 0.07 lg of cDNA was used for qRT-PCR on the CFX96 touch Real-Time PCR System (Bio-Rad). Each 15 lL reaction system consisted of 1.0 lL cDNA (70 ng/ lL), 0.75 lL each primer (10 lmol/L), 7.5 lL 2 Â SYBRV R Select Master Mix (Applied Biosystems; cat. no. 44729) and 5.0 lL RNase-free ddH 2 O. The PCR procedures were as follows: 2 min at 50 C for activating uracil-DNA glycosylase, 2 min at 95 C for pre-denaturation, followed by 40 cycles of 15 s at 95 C for denaturation and 30 s at 60 C for annealing and extension. To reduce the experimental error, each sample had four parallels. The CFX Manager Software V3.1 was used for visualizing and analysing the raw data.

Data analysis
To analyse the expression stability of candidate reference genes, the cycle threshold (Ct) values of the candidate genes were analysed with three statistical tools: geNorm, NormFinder and BestKeeper [28][29][30].

Fermentation and sampling
The whole fermentation period of S. spinosa was 7 days. According to the fermentation process, the S. spinosa strain showed different physiological statuses, and the samples of two different strains were collected at three fermentation stages: initial stage of spinosad production (fermentation for 3 days), logarithmic period of spinosad production (fermentation for 5 days) and stationary phase of spinosad production (fermentation for 7 days). As shown in Figure 1, at these three time points, the spinosad production of wild-type strain and high-yield mutant was comparable, while the yield of spinosad production of S. spinosa S3-3 was 6-16 times higher than that of S. spinosa ATCC 49460.

Primer amplification specificity and efficiency evaluation
To assess the primer specificity of the 35 candidate genes in our study, the amplified products were checked by 2% agarose gel electrophoresis. Figure 2 shows that the amplicon sizes were consistent with the expected sizes. The amplification of a single PCR fragment with no dimer formation and a single sharp peak in the melting curves (data not shown) ensured the specificity of the primers for the selected reference genes. In addition, the primer efficiency of all selected genes was higher than 90% with R 2 values of the standard curve ranging from 0.961 to 1.000 (the detailed description is shown in Table 1). It was reported that the acceptable amplification efficiency varied from 90% to 110% [26,31]. Based on the principle, some of the genes, including RS0111025 (134.56%), RS0122145 (135.79%), mshA (26034.28%), RS0138655 (131.54%), RS0117990 (332.97%), RS0140840 (154.91%), were excluded.

Candidate reference gene transcriptionlevel profiling
The expression rates of all the selected reference genes were measured based on Ct values from the qRT-PCR for different samples. The mean Ct values gave insight into the approximate gene expression  Figure 3A, the mean Ct values for the remaining 29 candidate reference genes ranged from 9.82 for 16S rRNA to 36.65 for ruvA for S. spinosa ATCC 49460. Similarly, for S. spinosa S3-3, the mean Ct values ranged from 9.21 for 16S rRNA to 38.05 for ruvA ( Figure 3B), which revealed a relatively wide range of expression differences between genes. Among these selected reference genes, 16S rRNA showed a maximum expression variation, whereas RS0140840, ruvA and RS0117990 were the least abundantly expressed genes with higher Ct values (>35). According to the MIQE guideline that the Ct value of the reliable reference gene should be less than 31 [32], the candidate genes ubQ2, ruvA, RS0101290, RS0103685, phoX and Ts were not suitable to be selected as the internal reference genes.

geNorm analysis
The software geNorm was used to calculate the gene expression stability value (M value) to rank the candidate genes. Based on the geNorm algorithm, the gene with the least M value was considered as the most stable reference gene, whereas with the highest M value was regarded as the least stable gene [30]. For S. spinosa ATCC 49460, the M value of  Table 2).   Figure 5. geNorm analysis to determine optimal number of reference genes required for gene normalization. Pairwise variation (V) analysis for S. spinosa ATCC 49460 (A) and S. spinosa S3-3 (B).
Note: V1 to V23 represent the variation in candidate reference genes ranked based on their stability. V1 is the variation for the most stable, whereas V23 is the variation for the least stable gene.
As for S. spinosa S3-3, except for hrdB2, the other 22 candidate reference genes showed M value less than 1.5. The M value of genes varied from 0.668 for rpoB to 2.33 for hrdB2. The gene rpoB (M ¼ 0.668) was the most stable reference gene followed by efp (M ¼ 0.698). Similarly with S. spinosa ATCC 49460, hrdB2 was the least stable candidate reference gene with the highest M value (2.33) (Figure 4(B); Table 2).
Moreover, geNorm also determines an optimum number of reference genes based on the pairwise variation (V n /V nþ1 ) between sequential normalization factor. A value of V n /V nþ1 less than 0.15 indicates that the value of n being the optimal number of candidate reference genes is enough to be used as an internal reference gene [30]. In this study, the values of V 2 /V 3 in two strains were lower than 0.15, suggesting that two reference genes were sufficient for normalization of genes ( Figure 5).

NormFinder analysis
The stability of the selected reference genes was further estimated by the NormFinder based on intergroup and intra-group variations, and the genes with lower values indicated higher stability [28]. Based on the stability value (S value) of all genes, hrdB2 was the Figure 6. Expression stability analysis of each candidate reference gene in S. spinosa ATCC 49460 (A) and S. spinosa S3-3 (B) using NormFinder. The lower stability value (S value) indicates higher stability of the candidate reference genes. The direction of the arrow indicates the most and least stable reference genes, respectively. least stable reference gene with S value higher than 1.5 either in S. spinosa wild-type strain or in mutant. hprT2, hrdA, efp, fusA and rpoB showed higher stability with lower values. For S. spinosa ATCC 49460, hprT2 was identified as the most stable reference gene with an S value of 0.101 followed by hrdA (S ¼ 0.257) (Figure 6(A); Table 3). While for S. spinosa S3-3, geNorm analysis and NormFinder algorithm revealed almost similar results for selecting candidate reference gene. rpoB was ranked as the most stable reference gene with an S value of 0.094. efp was the third stable candidate reference gene with an S value of 0.15 ( Figure 6(B); Table 3).

BestKeeper analysis
Another program, the BestKeeper could identify potential reference genes based on two significant criteria: standard deviation (SD) of the average Ct values and coefficient of correlation (r) value [29]. Higher r value means more stable candidate reference genes, and lower SD value represents higher stability. Here, the r value was considered as the prime element for evaluation. In the wild-type strain, greAB (r ¼ 0.91) was ranked as the most stable reference gene, followed by tufA and rpoB with r values of 0.89 and 0.88, respectively. And hrdB2 was the least stable candidate reference gene with the minimum r value (Table 4), which was identical with the results from the geNorm and NormFinder analyses. For the mutant strain, rbL13 (r ¼ 0.95) was the most stable gene, followed by rpoB (r ¼ 0.94), thyA (r ¼ 0.94) and greAB (r ¼ 0.94). RS0103485 (r ¼ 0.35), sigA (r ¼ 0.48) and hrdB2 (r ¼ 0.56) were less stably expressed genes with lower r values. Considering that an SD value less than 1.0 suggests consistence and stability of gene expression levels, the gene greAB was not eligible for a reference gene with SD value higher than 1.0.

Comprehensive analysis of expression stability
For a comprehensive assessment of the most stable reference genes, the above results from geNorm, NormFinder and BestKeeper were taken into consideration. As shown in Table 5, rpoB was the most stable reference gene followed by fusA and hprT2 in S. spinosa ATCC 49460. For S. spinosa S3-3, rpoB was the most stable reference gene followed by efp, greAB and rpoC. While fusA ranked the seventh in S. spinosa S3-3, comparing to efp ranked the sixth, we suggested efp as a preferable one. Overall, the gene rpoB ranked as the most reliable reference gene, and efp was the second most stable candidate reference gene in both S. spinosa wild-type strain and its mutant S3-3. hrdB2 and gyrB performed as the least stable candidate reference genes.

Discussion
qRT-PCR is an essential technique for analysing transcriptomes and investigating the regulatory molecular mechanisms [33]. But an inappropriate reference gene could result in invalid results. In general, a reference gene is not universal under different conditions. Therefore, it is extremely necessary to select suitable references to guarantee the accuracy of the transcriptional analysis of the target genes. In previous studies, stable reference genes have been identified in multiple species of cyanobacteria [34], bacteria [22,35,36], yeast [37], mycetes [38] and plants [39,40] under different conditions. This study is a systematic assessment of candidate reference genes in S. spinosa wild-type strain and high-yield mutant during different fermentation periods. Twenty-three traditionally used reference genes (encoding ribosomal RNA, ribosomal protein, elongation factor, RNA polymerase subunit, ubiquitin, RNA polymerase sigma factor, glyceraldehyde-3phosphate dehydrogenase, etc.) and 12 novel genes from its homologous species Streptomyces coelicolor were selected as the candidate genes. The specificity of synthesized primers was determined by agarose gel electrophoresis and melting curve analysis, and amplification efficiency of primers was calculated using a series of 10-fold dilutions of pooled cDNA.
Three different statistical algorithms named geNorm, NormFinder and BestKeeper were regularly used to analyse the datasets to identify reference genes [28][29][30]. In this study, the geNorm and NormFinder programs displayed the same results of the most stable and least suitable reference genes, but differed from the order analysed by BestKeeper in two different strains. In the S. spinosa ATCC 49460, hprT2 and hrdB2 were identified as the reference gene with the highest and the lowest stabilities by geNorm and NormFinder softwares, respectively, while greAB and hrdB2 were the most and the least stable genes obtained by BestKeeper. As for S. spinosa S3-3, rpoB and hrdB2 were identified as the most and the least stable genes by geNorm and NormFinder, whereas in BestKeeper, rbL13 was ranked as the highest stable gene and RS0103485 as the lowest stable one. Similar results that the ranking orders obtained from geNorm and NormFinder varied from those obtained by BestKeeper have been described in previous reports [33,41]. Based on the comprehensive evaluation of the results from three methods of analysis, the genes rpoB and efp were the most reliable reference genes in S. spinosa strains at different fermentation stages. And hrdB2 and gyrB exhibited relatively lower stability in S. spinosa strains.
rpoB, encoding DNA-directed RNA polymerase subunit beta, was identified as stable reference in Table 5. Overall ranking of the candidate reference genes by the results from geNorm, NormFinder and BestKeeper. Bifidobacterium animalis under different culture conditions [42]. Also, rpoB had demonstrated its transcriptional stability in Actinobacillus suis, Staphylococcus aureus, Xanthomonas, Listeria monocytogenes and other bacteria [43][44][45][46]. In addition, elongation factor evaluated among multiple species showed a quite stable expression level under different conditions [11,22,27]. Although hrdB was widely used as a reference gene for detection of transcription levels of target genes in Streptomyces [47], it was more stable in S. spinosa S3-3 than in S. spinosa ATCC 49460. Zhang et al. [22] found that 16S rRNA and rbL13 were the most suitable reference genes in different S. spinosa strains and under different culture conditions. On the contrary, our study showed rpoB and efp were ranked as top candidate reference genes in different strains, under the condition of different fermentation periods, yet 16S rRNA and rbL13 were less stable.

Conclusions
In summary, we have successfully evaluated and validated stable reference genes in different S. spinosa strains at different periods of fermentation. The analysis revealed the reliable reference genes in S. spinosa wild-type strain and high-producing spinosad mutant. The results showed that rpoB and efp were the best candidate reference genes for the normalization of gene expression in the process of spinosad synthesis. These two genes may be the most promising reference genes for future studies on gene function, spinosad biosynthesis process, as well as molecular mechanisms in S. spinosa.