A specific class of infectious agents isolated from bovine serum and dairy products and peritumoral colon cancer tissue

ABSTRACT The in silico analyses of 109 replication-competent genomic DNA sequences isolated from cow milk and its products (97 in the bovine meat and milk factors 2 group – BMMF2, and additional 4 in BMMF1) seems to place these in a specific class of infectious agents spanning between bacterial plasmid and circular ssDNA viruses. Satellite-type small plasmids with partial homology to larger genomes, were also isolated in both groups. A member of the BMMF1 group H1MBS.1 was recovered in a distinctly modified form from colon tissue by laser microdissection. Although the evolutionary origin is unknown, it draws the attention to the existence of a hitherto unrecognized, broad spectrum of potential pathogens. Indirect hints to the origin and structure of our isolates, as well as to their replicative behaviour, result from parallels drawn to the Hepatitis deltavirus genome structure and replication.


Introduction
A large set of novel viral and phage genomes has been identified during the past several years. These were identified through analyses of the human microbiome [1,2] or of samples from other sources. Metagenomic analyses are commonly applied [3][4][5], although the application of more conservative techniques also led to the identification of a number of new infectious agents [6][7][8]. Analyses mostly relied on comparisons to known agents or the identification of a replication gene in the assembled genome. Unfortunately, several studies concentrated only on the presence of known infectious entities, while paying less attention to identifying yet unknown infectious agents [9,10]. One difficulty originates from the discovery of chimeric genomes. This weakened the strict demarcation between certain types or groups of infectious agents resulting in studies investigating their phylogenetic origin [11][12][13].
We approached the question of as yet unknown agents from a different perspective: published data on the epidemiology of colon and breast cancer suggested milk and meat factors (BMMFs) derived from dairy cattle as presumably species-specific risk for these cancers after consumption of products from these animals [8,14,15]. Therefore, we initiated studies to isolate infectious agents from bovine sera and commercially available cow milk and its products. Based on our previous experience with single stranded DNA virus genomes [6,16] we initiated experiments by gradient density fractionation of sera followed by rolling circle amplification of DNA obtained. Putative Rep genes were identified as part of the DNA sequences obtained by in silico comparisons to available sequences. Amplification using inverse PCR with back-to-back primers in the rep gene led to the isolation of full and partial circular DNA genomes from bovine sera [17].
This was extended to samples from commercially available cow milk for the presence of specific circular single-stranded DNA genomes. Four additional isolates were obtained from human brain and serum (from patients with multiple sclerosis). A total of 18 full-length circular single-stranded DNA molecules (∼1100-3000 nucleotides) were cloned and sequenced [17][18][19][20]. We divided the initial 18 isolates into four different groups BMMF1 through BMMF4, according to their molecular characteristics [14]. Three of these groups revealed a remarkable degree of similarity to Acinetobacter baumannii and Psychrobacter plasmids. The fourth group comprised 3 isolates being representatives of Genomoviridae (Gemycircularvirus).
The isolation of these circular DNA molecules, in part closely related to bacterial plasmids, raised the question whether they represented bacterial contaminants or whether these were bacteria-related sequences adapted to infect and replicate in vertebrate and human cell. Presently representatives of all four groups have been analyzed for expression of genetic activity and replication in human cells [14,21].
We further extended our search for additional DNA plasmids in the BMMF1 and BMMF2 groups in the present study. We also describe the detection by mass spec of peptides specific for 3 new BMMF2 isolates, as well as the recovery of a modified BMMF1 isolate H1MSB.1, from colon tissue. The intention of this communication is to draw the attention to the existence of a hitherto unrecognized, broad spectrum of potential pathogens.

Materials and methods
Samples: The isolates were obtained from milk (10) and dairy products (8) bought from local supermarkets. To avoid a biased selection, we chose milk in different types of containers (glass, carton, etc.), and from many different providers. In addition, we tested crème fraiche, cream, curd, butter and yoghurt.
DNA extraction: DNA was extracted with phenol/ chloroform as previously described [19]. Samples were diluted 1:1 with proteinase K buffer (0.2M Tris-HCl, pH7.5, 25 mM EDTA, 0.3M NaCl, 2%SDS) and digested overnight with proteinase K (final concentration 1 mg/ml, Sigma-Aldrich) at 37°C. Ethanol precipitation followed after the phenol/chloroform extraction. DNA was extracted this way from the initial 4 milk samples, whereas all other samples were subsequently subjected to an additional purification using a Nucleospin Gel and PCR Clean up Kit (Machery and Nagel) according to the manufactureŕs protocol. DNA was finally eluted in 5 mM Tris-HCl, pH 8,0.
Rolling circle amplification was performed as described [17]. Template DNA (50 ng) was incubated in a total volume of 10 μl (1x phi29 DNA polymerase buffer) with 25 μM Exo-resistant Random primer (Thermo Fisher Scientific) at 95°C for 3 min followed by cooling on ice. This sample was diluted to 20μl by adding BSA (0,4 mg/ml), 0,75 mM dNTPs (Takara) each and 10U phi29 DNA Polymerase (New England Biolabs) in 1x phi29 DNA polymerase buffer. Incubation followed at 30°C for 18 h and 10 min at 65°C before stopping the reaction at 15°C.
Polymerase chain reaction: A 3 μl template from the RCA amplified DNA was used in the polymerase chain reaction (PCR) using TAKARA Taq enzyme and the accompanying solutions according to the manufactureŕs protocol (LA TAKARA). The respective backto-back primers were added at a final concentration of 0,2 mM each. Primers were designed based on the conserved regions of HCBI1, HCBI2 and HCBI7 for the BMMF2 group and for the BMMF1 group based on the conserved region of the replication gene. PCR was performed by using touchdown protocols specific for the respective primer pairs: MBB2 primers for the BMMF1 group: Mbb1 (MSBI1.176bbF296 forward) -5´-TGCAGAAATTG-CCCCTCGACT-3´and Mbb2 (MSBI1.176bbR295 reverse) -5´-AACAATGGGGAAGAAGTCAAAGG-3´. The first round of 5 cycles amplification was performed at 30 s melting at 94°C, 1 min annealing at 64°C and 2 min elongation at 72°C, followed by a second round of 5 cycles using 62°C annealing and a third round of 30 cycles using 60°C for annealing. The final round of 10 min at 72°C followed. Amplification for the BMMF2 group was performed as described for the BMMF1 group using the MBB2 primers, except that the touchdown temperatures were 58°C, 56°C and 54°C for the LSconA primer and 56°C, 54°C and 52°C for the LSconB primer and the elongation time was prolonged to 3 min at 72°C per cycle. The back-to-back primer pairs used for the BMMF2 group were: . PCR products were separated by gel electrophoresis, visible fragments excised and DNA extracted using the Nucleospin Gel and PCR Clean up Kit (Machery and Nagel) according to the manufactureŕs protocol. DNA was finally eluted in 5 mM Tris-HCl, pH 8,0. Eluted DNA was ligated and cloned into pCR2.1 using the TA-Cloning Kit and transformed into E.coliall according to the manufactureŕs protocol (Invitrogen™ TA Cloning™ Kit, with pCR™2.1 Vector and One Shot™ TOP10F' Chemically Competent E. coli (Invitrogen)). Sequencing of all clones was performed by GATC. This company applies cycle sequencing technology on ABI3730XL sequencing machines. Quality Phred20 is achieved for reads up to 1100 nucleotides in length.
Laser Dissection Microscopy (LDM): Formalin-fixedparaffin-embedded sections with a thickness of 8μm were mounted onto LDM-compatible membrane slides (MembraneSlide 1.0 PEN NF, Carl Zeiss) and stained with anti-Rep antibodies [15]. Antibody-stained tissue regions were specifically marked and dissected using a high intensity laser beam (PALM microbeam microscope, Carl Zeiss). The selected tissue pieces together with the underlying membrane were laser-catapulted into the caps of special opaque adhesive cap microfuge tubes (500 microliter, Carl Zeiss). The tissue pieces were subsequently dissolved in ATL lysis buffer (QIAamp DMA Mini Kit, Qiagen) and homogenized by pipetting before transferring into a fresh 1.5 ml microfuge tube. An equal volume of Chelex resin (Che-lix® 100 Resin, mesh 200-400, Bio-Rad, 5% suspension in water) was added in addition to 10 μg/ml Proteinase K (Sigma), followed by incubation overnight in a thermomixer at 56°C at 750 rpm. The reaction was stopped by vortexing for 10 s and incubation for 8 min at 99°C in a thermomixer. Chelex beads were removed by centrifugation and the DNA containing supernatant collected for further processing. RCA was performed as described above, followed by PCR amplification using NnXn or NoXo back-to-back primers as previously described [19]. Products were cloned and sequenced as described above.
In silico analyses: All sequences obtained were compared to nucleotide sequences available in Genbank by using BLASTN and the putative proteins to BLASTP (swissprot and swissprot_splicevar). In addition, Domainsweep [22] was used to identify the domain architecture within a protein sequence by using different database searches including blockssearcher, cddsearch, cathscan, pfamascan, printscan, prodomblast, prosite, smartscan, superfamily and tigrfamscan.
Predicted genes were obtained with an in-house ORFmap program, as well as with the gene identification program GENSCAN which was developed by Chris Burge from Stanford University [23]. It analyzes genomic DNA sequences from human, other vertebrates, invertebrates and plants. Other more sensitive prediction tools (Augustus [24] or Genemark [25]), were considered not suitable for our purposes, as they rely on calibration with species-specific training sets. Genscan was chosen because information is provided on the presence of polyA signals and our isolates [21] had been shown to replicate in human cells.
Phylogenetic trees were developed using maximum likelihood based on aligned (MUSCLE algorithm) sequencesboth full-length nucleotide genomes, as well as the core region of Rep. Phylogenetic and molecular evolutionary analyses were conducted using MEGA version X [26].
Data availability: Genome sequencing files are available at the European nucleotide archive (https://www. ebi.ac.uk/ena) under Project PRJEB30101. Accession numbers LR215494-LR215600 were allocated for the respective sequences (re Table 1).

Results
In silico characterization of additional BMMF1 (bovine meat and milk factor) isolates The BMMF1 group of isolates consists of 13 circular DNA genomes [14,19,27] (Suppl. Table 1). Mutual characteristics included the presence of an initiator replication protein RepB, an iteron-like tandem repeat region (3 × 22 nt plus a partial repeat of 17 /18nt, in one isolate 4 complete tandem repeats) and a conserved palindromic structure located upstream of the repeats which may serve as a putative origin of replication. The RepB protein and the iterons were similar to those found in bacterial plasmids, whereas the putative origin of replication structure was similar to the nonanucleotide stem-loop origin of replication in single-stranded plant and animal viruses [3,14]. The genomes of all the isolates shared nucleotide identity to various plasmids of Acinetobacter baumanii and one isolate HCBI5.173 nucleotide identity to plasmids of Psychrobacter spp.
The putative Rep proteins of C1MI.15M and pRGRH0677 are identical (100%), whereas C1MI.9M shares 94% similarity to the rat gut plasmid ( Figure 1). The putative Rep protein of C1MI.3M.1 is 95% similar to the Rep protein of pHD4bpcirc, 85% to H1MSB.2Rep and 81% to pRGRH0677Rep. In silico analyses of this putative Rep protein point to RepA, as well as to RepC function, differing here from other BMMF1 Rep proteins. In contrast, the putative Rep proteins of the previous group of BMMF1 isolates are only between 55 and 56% similar to these above-mentioned isolates.
The putative 225aa protein from the second large ORF in the C1HB.4 genome (BMMF1) shared 66% identity in 227aa to a mobilization protein (DOUH-C7_ACIBA 381aa) identified in Acinetobacter. This protein could be involved in plasmid recombination [30]. It similarly shared 91% identity in 225aa to an   . The evolutionary history was inferred by using the Maximum Likelihood method and JTT matrix-based model [28]. The bootstrap consensus tree inferred from 500 replicates [29] is taken to represent the evolutionary history of the taxa analysed [29]. Branches corresponding to partitions reproduced in less than 50% bootstrap replicates are collapsed. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (500 replicates) are shown next to the branches [29]. Initial tree(s) for the heuristic search were obtained by applying the Neighbour-Joining method to a matrix of pairwise distances estimated using a JTT model. This analysis involved 104 amino acid sequences. There were a total of 319 positions in the final dataset. Evolutionary analyses were conducted in MEGA X [26]. uncharacterized protein (N9GYE3_AIHA uncharacterized, 377aa).
The single-stranded Geminiviridae (Begomovirus and Mastrevirus genera) and Nanoviridae of plants are associated with small circular ssDNA components which have a nonanucleotide stem-loop origin of replication and encode a Rep protein. They range from 970nt (nanoviruses) to ca 1400 nucleotides (geminiviruses) in size and are dependent on a helper virus for their spread. These alphasatellites have recently been classified into a new family of viruses Alphasatellitidae [31]. The functions of these alphasatellites are not clear, although it has been suggested that they contribute to symptoms by influencing transcription rates. The genomes of TT viruses (family Anelloviridae) undergo rearrangement during propagation to lead to the formation of small replicationcompetent circular DNA genomes (micro-TTV, ca 700-900nt) [6]. These micro-TTVs encode in part new putative proteins with similarity to cellular proteins suggesting a role in molecular mimicry [6]. The identification of the novel BMMF1 isolates reported in this study also led to the identification of a small circular DNA molecule C1MI.3Ms of 461nt in size. These small circular genomes resulted after PCR amplification using back-to-back primers as described for the full-length genomes. This genome shares 89% nucleotide identity with the plasmid pRGRH0677 isolated from rat gut and 78% and 77% nucleotide identity to H1MSB2 and pHD4bpcirc, respectively. The iteron repeats (4 × 22nt, ATAAGAC-GAGACTTACCTACCA) present are identical to that in the genomes of C1MI.15M and C1MI.9M. In silico analyses revealed a putative Rep protein of 75 amino acids sharing >95% amino acid similarity to putative Rep proteins of C2MI.15M and C1MI.9M. BLASTP analyses indicate a 97% similarity to the first 50 amino acids of a protein in the Tunicate Oikopleura dioica [32] and lower similarity to uncharacterized RepB proteins.

In silico characterization of BMMF2 genomes and putative proteins
The circular DNA genomes (HCBI1, HCBI2 and HCBI7, Suppl. Table 1) were isolated from serum of healthy cattle [17]. These isolates were grouped as BMMF2 based on their nucleotide identity to the previously described Sphinx2.36 [14,33]. In this study, we used two sets of back-to-back primers to generate the full-length circular genomes of 97 isolates (Table 1 including accession numbers) ranging from 2102 to 3090 nucleotides. BLASTN comparisons to available DNA databanks, revealed similarity mainly to bacterial plasmids (A. baumannii str). AYE plasmid p4ABAYE [34], A. baumannii strain A85 plasmid pA85-1 [35], and A. baumannii strain DS002 plasmid pTS236 [36], but also to ssDNA isolates from Rat gut (pRGRH0103 and pRGRH0636) [37] (Suppl. Table  2). We performed phylogenetic analyses in order to determine relationships between isolates and to facilitate our selection of isolates with which to continue in-depth biological studies.
We included only one representative of a group of isolates sharing 90-99% nucleotide identity in the phylogenetic tree shown in Figure 2, as well as isolates originating from different dairy samples. Two main clades were seenone clade sharing 50% to 57% nucleotide identity to the A. baumanii plasmid p4ABAYE (Figure 2 clade A) and between 40% and 50% to the second clade (Figure 2 clade B). The nucleotide identity between pRGRH0636 and pRGRH0103, two circular ssDNA genomes identified by metagenomic analyses from Rat gut, is 39%. The latter shares between 80 and 92% nucleotide identity to C2HB1, C2MI.15B.2, C2MI.16B.3, C2MI.7.16 and C2MI.9B.5.
HCBI2.170 was isolated from bovine serum by density gradient fractionation followed by RCA amplification of restricted DNA fragments. Despite using a series of back-to-back primers on the known sequence [17], we failed to rescue a larger fragment. An isolate obtained in the present study resulted in the full-length genome, i.e. HCBI2.170 and C2MI.5A.1 (2257nt) shared identical nucleotides in the overlapping part of the genomes. C2MI.5A.1 now replaces HCBI2.170.
In silico analyses of putative open reading frames (ORF, >50aa) were performed on each genome by using the following alternative start codons: ATG, CTG, ATA, ATT, ACG, GTG, TTG and ATC [39]. This resulted in the identification of mainly 2 additional major ORFs on the positive DNA strand (Table 1), as well as putative proteins ranging from 101 to 138 amino acids and 96-148 amino acids in size, respectively. Additional smaller ORFs were present. Multiple ORFs (average number 2-10, up to 125 amino acids) were identified on the reverse strand. BLASTP analyses of the latter putative proteins revealed no or very weak similarity to known proteins. Further experimental investigation is needed to determine whether splicing events may result in combined, functional proteins. Transcription of the negative DNA strand was previously noted for the BMMF1 group of isolates [21].
An interesting result was the isolation of smaller circular DNA molecules 697nt in size and sharing 93-99% nucleotide identity. These were isolated from 3 different milk samples and one yoghurt sample. In addition, single amino acid differences were noted between the 2 groups of small circular genomes.
As the aim of our study is to identify, isolate and associate unknown infectious agents to the aetiology Figure 2. Phylogenetic tree of the nucleotide sequences of representatives of each BMMF2 isolate in comparison to the plasmids of A. baumanii p4ABAYE, pA85-1 and pTS236, genomic isolates from rat gut pRGRH0103 (RH103) and pRGRH0636 (RH0636) in clade A, as well as Sphinx2.36 and our previous BMMF1 isolates C2MI1 and C2HB7 in clade B. The recently isolated C2MI.5A.1 represents the full-length genome of C2HB2. The star indicate BMMF2 isolates of which the nucleotide sequence is almost identical in the overlapping nucleotide sequence of the respective "small" BMMF2 isolate. The evolutionary history was inferred by using the Maximum Likelihood method and Tamura-Nei model [38]. The bootstrap consensus tree inferred from 500 replicates [29] is taken to represent the evolutionary history of the taxa analysed [29]. Branches corresponding to partitions reproduced in less than 50% bootstrap replicates are collapsed. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (500 replicates) are shown next to the branches [29]. Initial tree(s) for the heuristic search were obtained by applying the Neighbour-Joining method to a matrix of pairwise distances estimated using the Maximum Composite Likelihood (MCL) approach. This analysis involved 48 nucleotide sequences. There were a total of 3170 positions in the final dataset. Evolutionary analyses were conducted in MEGA X [26].
of colon cancer, we performed mass spec analyses of protein samples from colon cancer and normal tissue, as well as laser microdissection of these samples. Mass spec analyses of protein samples from colon cancer and normal colon tissue (Bund et al., in preparation), resulted in identification of multiple specific peptide motifs originating from 3 BMMF2 isolates. Specific peptide motifs could be allocated, 5 each to C2MI.15B.16 and C2MI.9B.10 (Table 2). Interestingly, several of these amino acid motifs indicating a possible functional involvement, were located on the negative DNA strand. Isolates C2MI.10A.1 and C2MI.10A.2 share 99.7% nucleotide identity, but 3 specific peptide motifs were identified in mass spec to C2MI.10A.1, whereas C2MI.10A.2 harboured only 2 of these 3 motifs. BMMF isolates sharing >90% identity in their nucleotide sequences, were initially regarded as variants of a single genome. Based on the possible biological significance of such small differences, we decided to include 97 BMMF2 isolates for more detailed analyses. In addition, we recovered a member of the BMMF1 group from colon tissue by laser microdissection. Cloning and sequencing confirmed a H1MSB.1 genome sequence modified from the original isolate MSB1.176 (now H1MSB.1) from an autopsy brain sample of a multiple sclerosis patient (Figure 4(a)). The nucleotide modifications included mainly C/T or T/C or less often, A/G or G/A transitions. These led to alterations in the putative gene organization with an additional 62aa putative protein in LD10.154 ( Figure 4). Domainsweep analyses indicated a weak similarity to a domain (diarrhea inducing) of a nonstructural glycoprotein of Rotaviruses.

Characteristics specific for putative BMMF replication proteins
Bacterial plasmids are classified based on nucleotide homology of their replicase genes [40]. We identified 2 overlapping putative Rep genes in the BMMF2 isolates differing between 30aa to 50aa in length. Putative Rep proteins of all BMMF2, as well as BMMF1 isolates were compared. A core conserved region (329aa) was used in the phylogenetic analyses. BMMF2 rep proteins were grouped into 2 clades (A and B), whereas BMMF1 rep proteins formed a third clade (C) (Figure 1). Clades A and B shared 60-65% similarity. Rep proteins in clade A shared 69-90% similarity within the clade and 70-82% to the p4ABAYE rep. Similarly, clade B rep proteins shared 80-90% within the clade and 59-62% to p4ABAYE rep. The rep proteins of BMMF1 (clade C) are more diverse and shared almost no similarity (10-12%) to clades A and B. The rep proteins of H1MSB.1, C1MI5.1, and C1MI.5.2 are identical and share 97% identity to that of Sphinx1.76 [33]. These all share 75-88% similarity to C1MI.1, C1MI.2, C1MI.3, C1MI.4 and C1HB.6.2 reps. This group of rep proteins share only 45% identity to C1HB.4rep, 37-39% to C1HB.5rep and 58% to H1MSB.2rep. The Rep proteins of the newly identified BMMF1 isolates described in this study, share 50-62% to the other BMMF1 isolates.
In silico structure prediction [41] of putative BMMF Reps revealed a partial structural homology of Rep with known replication initiator proteins of the Rep superfamilies 1, 2 and 3. Catalytically active motifs previously identified in the Reps of Geminivirus and other known ssDNA viruses were identified in the BMMF2 Reps [3,42,43]. Motif I (consensus aa FLTYP) is generally less conserved and also partially represented in BMMF2 Reps (T(F/L)(T/S)V(K/R)N). Motif II (consensus HxH) is important for metal ion binding and iswith one exception C2MI.5B.10reppresent as H(V/L/I)H in all BMMF2 reps. Motif III containing the active site tyrosine is present as consensus YALK in all BMMF2 reps. Motif positions in the C2HBI7Rep, taken as representative, are as follows: motif I at aa 110-115, motif II at 172-174 and motif III at aa 227-230. The lengths of amino acid spacers between the motifs in C2HBI7Rep are roughly 60aa between motifs I and II and 50aa between motif II and III. Putative Walker A, B and C (SF3 helicase) motifs are not conserved amongst the BMMF2 Reps. Structure predictions based on amino acid sequences of BMMF1 Reps show similarity to RepB within the Rep superfamily 3 which also harbours RepE, RepC and RepA. Crystallography based on the N-terminal domain of the H1MSB.1 Rep (aa1-136) resulted in a structure with a resolution of 1.53 [44]. The overall structure showed a remarkable similarity with the structure of the RepA protein of Pseudomonas syringae despite sharing only 28% amino acid sequence identity.
The putative rep proteins of 14 BMMF2 isolates did not result only from one continuous ORF, but were compiled of more than one truncated ORF. BLASTP analyses of the smaller putative proteins indicated similarity to the Rep gene. A similar phenomenon was observed in a previous study on TT viruses (Anelloviridae). Here the premature stop codons had been verified and resulted from confirmed single nucleotide differences between independent isolates of the same TTV type [16]. In order to ascertain whether potential introns were present in BMMF2 genomes (with and without continuous Rep genes), we performed GENSCAN analyses [23]. This prediction tool was developed to identify genes in prokaryotes, eukaryotes and plants and was most suitable for our present purpose. We aligned the resulting genes to the respective gene maps obtained on the full-length nucleotide genomes. The GENSCAN result of A. baumanii plasmid p4ABAYE is included (Figure 3, Suppl. Table 3). The closest related isolate with an intact Rep ORF is presented with each isolate with truncated Rep ORFs. The presence in almost every isolate of a polyadenylation (poly(A)) signal (restricted to 6nt) is very notable in the GENSCAN products, including in the plasmid p4ABAYE. The polyA signal (AATAAA) of the latter was identical to the majority of putative poly(A) signals in the BMMF2 group. Variations (ATAAAA, AAATAA) were detected in 9 other isolates. Notable differences were AATCAA in C2MI.5A.2 and CTTAAA in C2MI.16B.2. No significant hits were obtained in the majority of BLASTP analyses performed on most of the additional putative proteins identified in various BMMF2 isolates. DomainSweep analyses provide a compilation of protein characteristics resulting from identification of protein structures by employing different protein database search methods and scanning a number of protein domain/family databases [22,45]. The minority of additional putative proteins identified in individual isolates shared very weak similarities (i.e. not significant similarities, but nevertheless included in the first 10 listed for the respective search method used in Domainsweep) to proteins with known functions. In view of the aim of the present study, we regard these data as indicative of possible functions, but needs to be verified by further investigation. Interestingly a few putative Rep proteins were assigned very weak similarities to RNA helicase, to the RNA-binding S4 domain and to RNA polymerase II Rpb4 core domain, as well as a putative protein from region 5 Figure 3. Schematic presentation of the ORF maps of BMMF2 isolates displaying truncated ORFs which encode for putative Rep proteins (arrows in red) in comparison to closely related isolates with intact Rep ORFs (blue arrows). The ORF map of p4ABAYE is shown in the middle. Arrows in same colour (green or yellow) represent ORFs which encode putative proteins showing high similarity between isolates. The GENESCAN analyses indicating the resulting splicing transcriptional events are indicated in purple. The large purple dot represents the poly A signal. In some cases no poly A signal is present for the respective transcript.
(116aa) to a specific subgroup of CDD/CDA-like deaminases [46] ( Table 1). The H1MSB.1 isolate recovered from colon tissue displayed mainly C/T modifications. These transitions led to alterations in the genome structure ( Figure 4) resulting in putative genes which could be involved in pathogenesis. Small ssDNA viruses experience higher mutation rates when in a changing environment such as chronic infections or host immune defence [47]. These modifications could also lead to more virulent viral strains inducing severe disease [48]. RNA editing of hepatitis D virus occurs via C/T or on the negative strand as a functional A-to-G substitution [49]. The unexpected similarities to RNA helicase and deaminase, the perfect matching peptides (identified in mass spec) which are allocated to the negative strand of the respective genome, as well as the two overlapping putative Rep genes, posed the question to the origin of these BMMF2 isolates which were initially all obtained by DNA amplification. These characteristics are very conspicuous and stimulated speculation as to whether parallels can be drawn to the structure and replication of Hepatitis delta virus [50], but this requires further investigation.
In addition, the following two results deserve discussion: C2HB7 (BMMF2) codes for a putative 228aa protein sharing a weak similarity to the bacterial ribbon-helix-helix CopG protein which acts as a transcriptional repressor on the Rep gene [51]. The putative 291aa protein of C1MI.1 (BMMF1) also shares weak similarity to the CopG protein. Surprisingly this 228aa putative protein of C2HB7 displays similarity to viral proteins as well, i.e. it shares weak homology to the capsid protein of narcissus mosaic RNA virus reflecting recombination between DNA and RNA viruses during evolution [52]. Bacilladnaviruses (with a partial double-stranded DNA genome) probably acquired their capsid gene from ssRNA nodaviruses by horizontal transfer [11].
Possible recombination events within the BMMF2 group were also noted. The 196aa putative protein of C2HB7 shares 90% identity with the 183aa putative protein of C2MI.3A.2, whereas their rep proteins share only 45-55% identity (conserved core 64%). The nucleotide identity between these full-length genomes is 55%. Possible recombination events were noted in aligning the full-length nucleotide genomes of all BMMF2 isolates. This observation needs to be confirmed by additional phylogenetic analyses which were not covered in this study.

Discussion
Infectious agents are involved in the pathogenesis of a number of malignant diseases [8]. Based on available epidemiological data, we intensified our search for yet unknown infectious agents in additional cancer types [8,14]. We described the isolation of BMMF factors from serum obtained from healthy cattle, as well as brain and serum from patients suffering multiple sclerosis [14]. Closer characterization of these isolates not only revealed similarities to plasmids from A. baumanii, but included features of circular ssDNA viruses in plants and animals. The recovery of a BMMF1 isolate H1MSB1 from the peritumoral lamina propria cells of colon cancer patients with CD68 macrophage invasion [15] also demonstrates the association of these agents to infections of the colon. These results potentially place such infectious agents into a new class of pathogens [15].
Acinetobacter baumanii is broadly distributed in the environment and commonly found in water, soil and animals [53]. Classification of the Acinetobacater baumanii plasmids is based on typing of the replicons and replicase genes [40]. These plasmids typically have 4 conserved direct repeats (iterons) and a Rep gene belonging to the Rep_3 superfamily or no iterons when comprising a Rep gene belonging to the Rep1/2 superfamily. Isolates of the BMMF1 group were related to a broader range of bacterial plasmids. They harbour iterons and have genes encoding initiator of replication proteins (RepB or Rep superfamily 3). RepB does not have an intrinsic helicase activity, but retains this activity through formation of a hexameric ring structure [54]. The latter is also a necessity for the helicase activity in eukaryotic viruses. One putative protein of isolate C1HB4 has weak similarity to a mobilization protein. In contrast, the closest related sequence to the BMMF2 isolates is plasmid p4ABAYE, the smallest of 4 plasmids of A. baumanii strain ABAYE. This plasmid has been described as cryptic and is classified together with plasmids pTS236 and pA85-1 in the GR14 clade [35,40,55]. Plasmid p4ABAYE is unique and seems to have followed a different evolutionary pathway than other bacterial plasmids. Distinct differences are that p4ABAYE has no iterons, has lost its mobility, with only the replication (Rep_1 gene) and transfer functions remaining, and did not share any link with the 6 chromosomes of A. baumanii, i.e. does not participate in exchange of genes [53]. The BMMF2 isolates share characteristics with plasmid p4ABAYE, i.e. size of genome (2726nt), nucleotide similarity, do not have iterons and have a gene encoding a putative Rep protein belonging to the Rep superfamily 1/2. BMMF1 and BMMF2 also display viral features. The Rep genes of recently identified CRESS (circular Repencoding ssDNA)-DNA viruses are highly conserved and form a hexameric ring structure [3]. These ssDNA viruses (which include the plantviruses Geminiviridae and Nanoviridae) are all characterized as having a stem-loop structure as the origin of replication. Such a putative stem-loop structure is present in BMMF1 isolates (in addition to the bacterial iterons). The second large ORF of several CRESS-DNA viruses shows some similarity to capsid proteins [3]a similar observation was made for single BMMF2 isolates. The in silico observation of "truncated" ORFs which putatively code for a Rep protein may represent a similar mechanism of transcription as described in plant RNA viruses. Here the non-canonical translation includes leaky scanning, stop-codon read-through and ribosomal frameshifting [56]. The presence of polyA signal in almost all putative transcripts including that of p4ABAYE, also indicates transcription in eukaryotes.
The most striking feature of both BMMF1 and BMMF2 groups is the existence of larger full-length genomes in conjunction with smaller circular satellite-like genomes encoding for only Rep-associated genes. The bacterial iterons are also present in smaller genomes in BMMF1. Members of the recently established family Alphasatellitidae all have a stemloop structure as origin of replication. Although their specific function remains unclear, their co-infection is associated with exacerbation or reduction of disease symptoms in plants [31]. Similarly, the noncoding deltasatellites (682nt) associated with genus Begomovirus of the Geminiviridae seem to influence the severity of disease or virus accumulation in plants indirectly [57]. Co-infection of a "large" (C1MI.1, 2523nt) and "small" (H1MB1, 1776nt) BMMF1 isolates resulted in increased transcription of the smaller genome [21]. The identification of even smaller circular genomes in both BMMF1 and BMMF2 -C1MI.3Ms (461nt with a 75aa putative Rep protein), C2MIs.5A.1, C2MIs.7A.1, C2MIs.9A.1, and C2MIs.10A.1 (697nt encoding for a putative fulllength Rep protein), is intriguing. Future in vivo investigations are urgently needed exploring their functions, as this may indicate whether these molecules act like viral satellites influencing or even causing disease symptoms.
This group of infectious agents has been implicated in the pathogenesis of colon and breast cancer [15]. Although the evolutionary origin is unknown, it seems to fall between a unique bacterial plasmid and circular ssDNA viruses. We suspect that it originated from a bacterial plasmid which has evolved to adapt to infect and replicate in mammalian cells. The acquisition of smaller satellite-like genomes may have evolved subsequently in order to strengthen and/or extend survival or to convey pathogenic symptoms. Indirect hints as to the origin and structure of our isolates, as well as its replicative behaviour, result from parallels drawn from Hepatitis deltavirus genome structure and replication.
Thus far, the present available analyses of human whole genome sequencing precluded the identification of originally bacterially derived plasmids as human pathogens by dismissing and discarding sequences suspected to originate from bacterial contaminants. A different perspective for investigating an unexplored spectrum of potential human pathogens seems to be highly desirable and recommended.