Domestic animals as models for biomedical research

Domestic animals are unique models for biomedical research due to their long history (thousands of years) of strong phenotypic selection. This process has enriched for novel mutations that have contributed to phenotype evolution in domestic animals. The characterization of such mutations provides insights into gene function and biological mechanisms. This review summarizes genetic dissection of about 50 genetic variants affecting pigmentation, behaviour, metabolic regulation, and the pattern of locomotion. The variants are controlled by mutations in about 30 different genes, and for 10 of these our group was the first to report an association between the gene and a phenotype. Almost half of the reported mutations occur in non-coding sequences. Because this is a biased list in which the proportion of coding mutations is inflated (they are easier to find), this suggests that non-coding mutations are the most common type of polymorphism underlying phenotypic variation. The review documents that structural changes (duplications, deletions, and inversions) have contributed significantly to the evolution of phenotypic diversity in domestic animals. Finally, we describe five examples of evolution of alleles, which means that alleles have evolved by the accumulation of several consecutive mutations affecting the function of the same gene.


Introduction
During the last 30 years I have used domestic animals as models for biomedical research. Why domestic animals, when there are well-established animal models such as mouse, zebrafish, and Drosophila? The unique feature of domestic animals is their long history of selective breeding. Ever since the first animals were domesticated (1) humans have changed their gene pools by favouring animals that could survive and reproduce in captivity, and that provided useful commodities to humans such as food, skin and fur, transportation, and company. Domestication and breeding together form an evolutionary process in which gene variants with favourable phenotypic effects are enriched. A major aim in animal genomics, as well as in most genome projects, is to reveal genotype-phenotype relationships and to study their underlying molecular mechanisms.
In my PhD thesis defended in December 1983 I expressed the vision that the emerging methods of molecular genetics will unleash the full potential of domestic animals as models for genetic studies of phenotypic traits. I started this research programme the following year as a guest post-doctoral fellow at the Department of Cell Research in Uppsala led by Per A. Peterson and Lars Rask, two of the pioneers applying molecular genetics for biomedical research in Sweden. Furthermore, they led a strong research programme on major histocompatibility complex (MHC) genes and their role in the human immune system. This area was and still is of considerable interest in domestic animals because immune response and disease resistance are of major importance in animal breeding. Thus, during my first years as an independent researcher I characterized the MHC class II region in cattle and demonstrated the presence of extensive genetic diversity at this locus (2), as well as providing evidence that genetic polymorphism at the cattle DRB3 locus is maintained by balancing selection (3). However, by the end of the 1980s, after PCR was invented, it became possible to do genome research, and since then my research team has been working in the field of genetics and genomics.
My vision to use domestic animals as models for biomedical research by taking advantage of the advances in molecular genetics and genomics has been shown to be a fruitful approach that has resulted both in new basic knowledge and in practical applications. However, during the first 20 years of my career it was not easy to convince funding bodies that this was an approach worth supporting. My research applications were sometimes rejected by funding bodies supporting agricultural research because the approach was considered of limited practical interest and was rather to be classified as basic research, and the applications were consistently rejected by the Swedish Research Council (supporting basic research) because they were considered too applied.
Research on the genetic basis for diseases and disorders in domestic animals is of utmost importance for veterinary medicine and animal breeding in order to keep disease incidence at a minimum level. However, domestic animals are less important as models for human disease for several reasons. Firstly, genetic variants causing disease usually occur at a low frequency because there is strong selection to eliminate unfit animals, in particular in the animals we use for food production. Secondly, the clinical characterization of disease is rarely as advanced in animals as in human medicine. Thirdly, thanks to the development of powerful methods for genetic studies in humans, the need for using animals to identify candidate genes for human disease has become less important. However, when one finds mutations in domestic animals in genes associated with human disease, the domestic animal can be used as a model to develop disease prevention and new treatments.
The major merit of domestic animals as models for biomedical research relates to the rapid phenotypic evolution that has occurred during the course of animal domestication. Here domestic animals give a unique possibility to gain insight into molecular mechanisms underlying phenotypic change. The aim of this review is to give examples of important discoveries that we have made by studying some of the traits under selection in domestic animals. Table I lists 50 phenotypes in various domestic animals for which my group has been involved in a study that has resulted in the detection of the underlying gene and in most cases the underlying mutation(s). This list involves about 30 different genes. The number of genes is lower than the number of traits because some genes are associated with multiple phenotypes. For instance, we have identified mutations in the melanocortin 1 receptor (MC1R) gene underlying six different variant alleles affecting pigmentation in horse, pig, and chicken. For about 10 of the genes listed in Table I our study was the first in any organism that has documented an association between that gene and a phenotype. These studies have therefore contributed to the functional annotation of the vertebrate genome. For instance, the paper on our finding that a 7.4 Mb inversion disrupts the coiled-coil domain containing 108 gene (CCDC108), which leads to poor sperm motility in homozygous Rose-comb roosters (4), is still the only publication on the CCDC108 gene in any organism. It is likely that loss-of-function mutations affecting this sperm protein gene are also causing reduced sperm motility and reduced fertility in some human males.

General lessons
Only a handful of the traits listed in Table I are obvious disorders. One example is a missense mutation in the low-density lipoprotein receptor gene (LDLR) that we identified in a strain of pigs used as models for hypercholesterolemia in humans (5). Other examples include leukocyte adhesion deficiency in dogs caused by a missense mutation in the integrin beta 2 (ITGB2) gene (6), sensory ataxic neuropathy in dogs caused by a one base pair deletion in the mitochondrial genome (7), and fishy off-flavour in cattle, a condition where homozygous cows produce milk that smells of rotten fish, caused by a missense mutation in the flavin containing monooxygenase 3 (FMO3) gene (8). Mutations in the corresponding genes cause very similar diseases or disorders in humans. In the three last-mentioned cases a diagnostic DNA test has been developed and used to reduce or eliminate disease. Furthermore, a colony of Irish setters carrying the mutation causing leukocyte adhesion deficiency was subsequently used as a model for human gene therapy (9). Sick dogs were cured by introducing a functional copy of the ITGB2 gene ex vivo into hematopoietic stem cells, which were then reintroduced into the affected animal. This is an example of how a large animal model can be very valuable for the development of new therapeutic strategies in human medicine.
A long-standing question in biology is the relative importance of coding and non-coding mutations for explaining phenotypic variation and disease (10). Almost 50% of the mutations listed in Table I are non-coding, and this strongly suggests that non-coding mutations by far dominate among the mutations underlying the phenotypic diversity present in domestic animals. This conclusion rests on the fact that the estimate is biased: the proportion of coding changes is inflated because they are much easier to find. Firstly, changes in coding sequences often lead to alleles with more striking phenotypic effects. Secondly, it is much more straightforward to interpret the functional consequence of changes in coding sequence than in non-coding sequence. These conclusions are consistent with data from human genetics showing that whereas a majority of mutations causing severe inherited disorders affect coding sequences, a majority of sequence variants associated with increased risk for a multifactorial disorder occur in non-coding sequences (11).
Our characterization of the genetic basis for phenotypic traits has revealed the importance of structural changes (deletions, duplications, and inversions). Table I lists about 10 examples of structural changes being the causal mutation for a phenotypic trait. A common theme here is that the structural change leads to altered regulation of one or several of the genes directly affected by the structural change or located in its close vicinity. This may occur because a regulatory element is deleted (e.g. Dark brown colour in chicken) (12), or duplicated (e.g. Greying with age in horses) (13), or because a gene has been translocated to another position and is there influenced by another constellation of regulatory elements (e.g. Rose-comb in chicken) (4).
Another interesting finding is that we have documented evolution of alleles, which means the accumulation of multiple consecutive causal mutations in the same gene. The genetic literature on the identification of causal mutations is largely based on studies of monogenic disorders in humans or mutations causing monogenic phenotypes in experimental organisms. These are almost always due to a single hit. However, our data from domestic animals give a different picture because animal domestication has a sufficiently long history (about 10,000 years) to allow evolution of alleles. We have so far documented five examples where multiple causal mutations contribute to a phenotype: Dominant white colour in pigs (14), Black spotting in pigs (15), Smoky colour in chicken (16), the Rose-comb2 allele in chicken (4), and White-spotting in dogs (7). It appears plausible that a considerable portion of the phenotypic diversity that occurs in natural populations, including the one underlying the risk of developing multifactorial disorders in humans, is due to allelic variants that differ by multiple causal differences rather than single mutations with large effects.

Pigment cell biology
Many of the phenotypes listed in Table I are related to pigmentation. Pigmentation has been used throughout the history of genetics as a model to study how genes act and interact because pigmentation traits often show a simple monogenic inheritance that also facilitates the identification of causal genes and mutations. Furthermore, the phenotypic readout is precise, allowing the scoring of subtle phenotypic differences.
A striking difference between domestic animals and their wild ancestors is the amazing coat colour diversity, as species in the wild in most cases show very modest variation. In fact, coat colour was one of the first traits that changed after domestication, and colour variants in domestic animals are mentioned in some of our earliest written records from the Ur III dynasty in Mesopotamia dated to about 5,000 years before present (17). So why did the coat colour change in our domestic animals? Most importantly, humans have actively selected for colour variation among domestic animals because 1) this allowed us to distinguish our prized domesticated animals from their wild ancestors at a time when gene flow often occurred; 2) selection against camouflage facilitated animal husbandry; and 3) we and apparently our ancestors appreciate diversity of colour and therefore have kept animals carrying novel phenotypes as long as the variant is not associated with deleterious effects that reduce their utility. Relaxed purifying selection has most likely also contributed to the rich coat colour diversity in domestic animals.
Our study comparing genetic variation in the melanocortin 1 receptor (MC1R) gene among wild and domestic pigs illustrates the striking difference in selection pressure in the wild and at the farm (18). MC1R is expressed at the cell surface of melanocytes and has a critical role in determining pigmentation in vertebrates because it controls pigment switching, with the absence or presence of MC1R signalling being associated with the synthesis of red and black pigment, respectively (19). Mutations in MC1R are clearly the most common reason for different colour morphs both in domestic animals and in wild species, most likely because the function of MC1R is largely restricted to the pigment cell, which means that mutations in this gene are not associated with strong negative effects on other traits. MC1R mutations are causing the dominant black, recessive red, and black spotting coat colour variants in pigs (Table I). A striking difference between wild boars and domestic pigs is that the wild boar piglets are striped whereas piglets carrying MC1R mutations are not (Figure 1). The striping pattern is a camouflage colour requiring MC1R pigment switching, and this mechanism is disrupted by these mutations (18). In our study we sequenced the entire MC1R coding sequence from European and Chinese domestic pigs as well as European and Asian wild boars. We identified seven different sequence variants among the wild boars, and all were synonymous, and thus all tested wild boars expressed an identical MC1R protein sequence despite the fact that European and Asian wild boars are classified as different subspecies that diverged about 1 million years ago (20). This indicates strong purifying selection to maintain camouflage colour in the wild. In contrast, 9 out of 10 sequence variants detected among domestic pigs changed the protein sequence, consistent with strong selection to change colour.
In this study we analysed 51 different pig breeds from Europe and China, and almost all breeds carried MC1R mutations. The only exception was the Mangalica pig from Hungary that carried the MC1R wild-type allele, and their piglets are in fact striped like the wild boar!
Winner of the Rudbeck Award 2013, at the Medical Faculty of Uppsala University for his pioneering studies of the pathogenesis of many non-communicable diseases by means of molecular and animal genetics.
The black spotting phenotype in pigs (Figure 1B) is a particularly interesting variant because this allele is the result of two consecutive mutations (15). Firstly, it carries the D121N missense mutation leading to a constitutively active receptor causing the dominant black colour variant. In addition, the black spotting allele is associated with a two base pair insertion at codon 23 (nt67insCC) causing a frameshift and thus a complete loss-of-function. As explained above, lack of MC1R signalling is expected to lead to only red/yellow pigmentation, so how can these pigs show black spots? By sequencing MC1R mRNA isolated from black spots we were able to demonstrate that this is caused by somatic mutations that restore the reading frame. The most likely explanation why this happens at a high frequency is that the insertion of two cytosine nucleotides occurs in a stretch of six cytosines and this results in a mononucleotide repeat (CCCCCCCC) that is somatically unstable. This illustrates why studies on pigmentation have been so rewarding: it would have been extremely challenging to reveal such a somatically unstable mutation if the gene had, for instance, affected insulin secretion, unless one characterized the release from individual cells. The MC1R black spotting allele is one of our examples of 'evolution of alleles' by the accumulation of several consecutive mutations affecting the same gene.
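The reading-frame argument can be made concrete with a minimal sketch. It assumes, purely for illustration, that only the length of the cytosine run changes; the run lengths of six (wild type) and eight (nt67insCC) come from the text, whereas the function name and the nine-cytosine example are hypothetical and do not describe which somatic events were actually observed.

```python
# Illustrative sketch only: names are assumptions, not taken from the study.
WILD_TYPE_RUN = 6  # cytosines in the wild-type run (from the text)


def in_frame(run_length: int) -> bool:
    """Return True when a cytosine run of this length preserves the reading frame."""
    # Only a net length change that is a multiple of three keeps the frame.
    return (run_length - WILD_TYPE_RUN) % 3 == 0


if __name__ == "__main__":
    # 6 = wild type, 8 = nt67insCC allele, 9 = hypothetical somatic gain of one C
    for run_length in (6, 8, 9):
        status = "in frame" if in_frame(run_length) else "frameshift"
        print(f"{run_length} cytosines: {status}")
```

Under this simplification, a run of eight cytosines is out of frame, while a somatic gain of one cytosine or a loss of two would bring the sequence back in frame.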

Behaviour
Behaviour is another trait that has changed dramatically after animal domestication. Changes in behaviour were required for the animals to survive and reproduce in captivity. There is a huge potential to study the genetic basis for variation in behaviour in dogs due to the complex interaction between humans and dogs that has evolved since domestication (21). There is also a very fascinating diversity in behaviour among breeds where dogs have been bred for various tasks such as herding, hunting, retrieving various objects, guarding, or just for pleasure as a companion to humans. However, so far, little progress has been made in identifying specific genes underlying variation in dog behaviour. A possible reason is that behaviour has a very complex genetic background with many genes involved.
We have used the rabbit as a model to study the genetic basis for domestication (22). There are three main reasons why the rabbit is a good model for studies of domestication. Firstly, domestication is relatively recent, only about 1,400 years ago. Secondly, we know where rabbit domestication took place (Southern France), and at that time wild rabbits (Oryctolagus cuniculus) were restricted to Southern France and the Iberian Peninsula. Thirdly, the area where domestication happened is still densely populated with wild rabbits that can be sampled for genetic studies. Thus, we can make very precise comparisons of allele frequency differences between domestic rabbits and relevant populations of wild rabbits. This is more difficult for other domestic animals. For instance, domestication of the wolf happened a long time ago (15,000 years before present or earlier) (1), and at that time wolves were spread across the entire Northern hemisphere. It is also possible that the population(s) of wolves that contributed mostly to dog domestication has become extinct due to human expansion. Furthermore, since domestication there has probably been a considerable amount of gene flow between wolves and dogs. This complex demographic history blurs the picture because it is difficult to deduce whether an observed difference in allele frequency between dogs and contemporary wolves is caused by selection, genetic drift, or because there is a genetic difference between contemporary wolves and the wolf population(s) that contributed to dog domestication.
We carried out whole-genome sequencing of 14 population samples of wild rabbits and 6 breeds of domestic rabbits (22). We sequenced pools of animals from each population, and each pool comprised 10 to 20 animals. The rabbit is one of the most polymorphic mammals sequenced so far, and the nucleotide diversity measured as the average number of nucleotide substitutions per 1,000 base pairs between two random chromosomes is nine times higher in rabbits (about 9) than in humans (about 1). We identified a total of 50 million SNPs in rabbits and compared the allele frequency of these in wild and domestic rabbits. This analysis revealed strong signatures of selection and the major conclusions were: 1) rabbit domestication has a highly polygenic basis involving many hundreds of genes; 2) non-coding changes dominate largely over changes in coding sequence; 3) we observed very few complete fixations but rather shifts in allele frequencies consistent with a polygenic basis where the majority of loci have small phenotypic effects; and 4) sequence variants in the vicinity of genes with an established role in brain and neuronal development were highly enriched among those showing the strongest differentiation between wild and domestic rabbits. This implies that changes in genes affecting behaviour have played a prominent role during rabbit domestication. This result makes perfect sense because the wild rabbit has a very strong flight response making it extremely difficult to keep them in captivity, whereas domestic rabbits are well adapted to a life in captivity. In fact, Charles Darwin wrote in On the Origin of Species that '. . . hardly any animal is more difficult to tame than the young of the wild rabbit; scarcely any animal is tamer than the young of the tame rabbit . . .' (23). 
We postulated that tame behaviour in rabbits and other domestic animals has a truly complex genetic background and evolved by shifts in allele frequencies at many loci rather than by critical changes at a few domestication loci.
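The comparison of allele frequencies between pooled samples can be sketched as follows. This is a minimal illustration and not the analysis pipeline used in the study (22): the function names and the read counts are made up, and a pooled allele frequency is simply estimated as the fraction of reads carrying the variant allele.

```python
# Illustrative sketch only: function names and read counts are hypothetical.

def allele_frequency(alt_reads: int, total_reads: int) -> float:
    """Estimate the variant allele frequency in a pool from read counts."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must lie between 0 and total_reads")
    return alt_reads / total_reads


def delta_allele_frequency(wild: tuple, domestic: tuple) -> float:
    """Absolute frequency difference; each argument is (alt_reads, total_reads)."""
    return abs(allele_frequency(*domestic) - allele_frequency(*wild))


if __name__ == "__main__":
    # Made-up counts for a single SNP in one wild and one domestic pool.
    wild_pool = (12, 120)      # 10% variant reads
    domestic_pool = (54, 60)   # 90% variant reads
    print(round(delta_allele_frequency(wild_pool, domestic_pool), 3))
```

A large difference that falls short of 1.0, as in this made-up case, corresponds to a shift in allele frequency rather than a complete fixation.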

Metabolic traits
The metabolism of our domestic animals has often been drastically altered by our need to use them for food production. A good example is the layer chicken, which has been selected to produce more than 300 eggs per year without being mated to a rooster, whereas its wild ancestor, the red junglefowl female, normally produces one clutch of eggs after mating. We have in particular studied the altered metabolism and body composition in pigs. Since the 1940s there has been a drastic increase in pig muscle growth and a corresponding decrease in fat deposition due to the consumer's demand for lean meat. This was achieved after procedures to measure body composition (the relative proportion of protein and fat in the carcass) were introduced and powerful statistical methods for calculating breeding values were developed. We have identified two major loci that have responded to the strong selection for lean meat: the RN locus affecting glycogen content in skeletal muscle, and the IGF2 locus affecting muscle growth.
The RN story started in France where researchers noted that there was a meat quality problem in Hampshire pigs since a large proportion of the individuals produced meat with an unusually low pH (measured 24 h after slaughter), reduced water-holding capacity, and reduced yield of cured cooked ham. This had a large effect on pig production worldwide since it is a common practice to use Hampshire pigs as a sire line mated to a dam line, which means that a large proportion of the pigs used for meat production would have a Hampshire male as the father. Further research revealed that these effects on meat quality were due to a 70% increase in glycogen content in skeletal muscle and that this phenotype showed a simple monogenic inheritance with two alleles, RN− (high glycogen) and rn+ (normal glycogen) (24). The fact that liver glycogen was normal in mutant pigs suggested that the mutation underlying this phenotype affected the function of a muscle-specific isoform not expressed in the liver.
In the mid-1990s we and others decided to attempt to identify the gene underlying the RN phenotype by positional cloning, but this was very challenging at the time because there was no genome assembly available from any vertebrate. The first step was to map the locus to a region on pig chromosome 15 by classical linkage analysis using pedigree data (25). We then investigated whether the corresponding region in the more well-studied human genome harboured any candidate genes with a known role in glycogen metabolism, but that was not the case. We then carried out a very laborious procedure where we isolated the entire region harbouring the RN locus in overlapping bacterial artificial chromosomes (BACs) (26). The BACs were used to isolate new genetic markers and for further fine mapping that eventually resulted in the assignment of the locus to a region of about 100 kb present in a single BAC. We sequenced this entire BAC and identified four genes. One of these was particularly interesting because it encoded a homolog to the regulatory γ-chain of the SNF4 kinase in yeast that has a key role in carbohydrate metabolism including glycogen metabolism. This kinase, named AMP-activated protein kinase (AMPK) in vertebrates, is a heterotrimeric protein composed of a catalytic α-chain and non-catalytic β- and γ-chains. At the time two α-chain genes (PRKAA1 and PRKAA2), two β-chain genes (PRKAB1 and PRKAB2), and two γ-chain genes (PRKAG1 and PRKAG2) had been identified in humans. We therefore named our newly discovered isoform PRKAG3 (27). The fact that AMPK is an important energy-sensing enzyme, activated by high AMP and low ATP, made it an excellent positional candidate gene for the RN phenotype. Northern blot analysis provided very strong support for this notion because it revealed that PRKAG3 is a muscle-specific isoform not expressed in the liver, in perfect agreement with the RN phenotype.
Furthermore, sequence analysis revealed an R225Q missense mutation at a site that is extremely well conserved among AMPK γ-chains in eukaryotes. Genetic analyses in many thousands of pigs have provided conclusive evidence that R225Q is the causal mutation for the RN phenotype.
The year after our discovery, another group reported the identification of a quantitative trait locus (QTL) for glycogen content that mapped to the PRKAG3 region, and they could exclude the presence of the R225Q mutation (28). The data strongly suggested that the causal mutation for this phenotype (lower glycogen than the wild-type) was a missense mutation V224I at the neighbouring residue. Transfection experiments in COS cells demonstrated that the wild-type allele V224R225 showed low kinase activity at low AMP and was activated by high AMP, whereas the V224Q225 (RN−) allele was constitutively active, and the third allele I224R225 showed low activity and could not be induced by high AMP (29). Thus, the result in this assay is a perfect match in relation to the glycogen content in skeletal muscle: V224Q225 > V224R225 > I224R225. The explanation for the strong effects of these missense mutations became apparent a few years later when it was demonstrated that these two residues are located in the binding pocket for AMP and ATP that regulates AMPK activity (30).
To explore further the functional significance of the PRKAG3 isoform we created transgenic mice overexpressing the wild-type (R225) or mutant (Q225) forms in skeletal muscle as well as a PRKAG3 knock-out mouse (29). The transgenic mutant but not the transgenic wild-type mouse showed excess glycogen content in skeletal muscle and thus replicated the pig phenotype. Surprisingly, the knock-out mice showed normal glycogen levels and normal glycogen utilization during exercise. However, re-synthesis of glycogen after exercise was impaired, and in vitro tests on skeletal muscle showed defective AMPK-mediated glucose uptake (29). Taken together, the combined pig and mouse data show that the AMPK isoform containing the muscle-specific γ3-chain has a key role in monitoring glycogen content in white skeletal muscle and promotes glycogen re-synthesis after exercise by activating glucose uptake and fatty acid oxidation (29). In humans, a rare naturally occurring R225W mutation affecting the same residue as the pig R225Q mutation has been reported (31,32). Individuals carrying this mutation showed about 90% higher glycogen content in skeletal muscle, about 30% lower intramuscular triacylglycerol content, and no obvious impairment in glucose metabolism, in agreement with the pig and transgenic mouse phenotypes associated with the R225Q mutation.
The PRKAG3 protein is a validated drug target for treatment of type II diabetes. It is plausible that a drug activating the AMPK isoform involving PRKAG3 would have a positive effect on blood glucose levels by activating insulin-independent glucose uptake and promoting fatty acid oxidation mimicking the effects of exercise. However, the project is challenging because it is more difficult to activate a kinase than to inactivate it; in the latter case one 'only' needs to find a small molecule that interferes with protein function.
The discovery of the mutation in IGF2 encoding insulin-like growth factor II was made using our intercross between Large White domestic pigs and the European wild boar (33). This intercross was initiated as early as 1989 with the ambition to develop a comprehensive linkage map for pigs and to map loci underlying phenotypic traits. The pedigree comprised 200 F2 progeny that were carefully phenotyped for a range of traits including body composition and weight of internal organs. A genetic analysis revealed a paternally expressed QTL with major effects on muscle growth, subcutaneous fat depth, and the size of the heart that mapped to the IGF2 region (34). F2 progeny that had inherited the domestic pig allele showed 3%-4% more muscle, lower subcutaneous fat depth, and had a bigger heart. The fact that both this QTL and IGF2 showed paternal expression and that IGF2 is an important growth factor strongly suggested that the causal mutation(s) for this QTL affected IGF2 function.
Sequence analysis revealed that the IGF2 coding sequence was identical in wild boar and domestic pigs, suggesting a regulatory mutation. A problem with genetic analysis of quantitative traits or multifactorial disorders is that there is no simple one-to-one relationship between genotype and phenotype because each QTL controls only a fraction of the variance. However, in domestic animals it is possible to transform a quantitative trait to a Mendelian trait by progeny testing. In collaboration with Michel Georges (University of Liège, Belgium), who had independently identified the IGF2 QTL in an intercross between Large White and Piétrain pigs (35), we used pedigree analysis to identify individual sires that segregated for the QTL and then sorted their chromosomes as Q (for high muscle growth) and q (wild-type). Sequence analysis of the entire IGF2 region as well as the flanking regions including the insulin gene showed that the mutation(s) underlying the QTL must be located within the IGF2 region since all Q chromosomes were identical-by-descent (IBD) for a 29 kb region within IGF2. The problem was that the sequence divergence between the Q and q chromosomes was as high as 1%, meaning that there were about 300 sequence differences between the two types of chromosomes. A 1% sequence difference is exceptionally high between two alleles from the same species (cf. orthologous sequences from human and chimpanzee show on average a 1.2% sequence divergence), and we therefore hypothesized that the Q allele may originate from Asian pigs since we had shown a few years before that European and Asian pigs were domesticated from different subspecies of wild boars and that there has been a considerable import of Asian pigs into Europe during the eighteenth and nineteenth centuries (36). 
We therefore started a collaboration with Alan Archibald and Chris Haley (The Roslin Institute, Edinburgh, UK) who had developed an intercross between Chinese Meishan pigs and European Large White pigs. Statistical and sequence analysis showed that the Meishan pigs carried chromosomes classified as q, and it turned out that these differed from the Q allele by a single base change (G to A) in intron 3 of IGF2, providing conclusive genetic evidence that this must be the causal mutation (37). This was the first study in any organism that revealed the causal mutation for a multifactorial trait or disorder as a single base change in a non-coding region. This major achievement, accomplished long before there was a pig genome assembly, is a beautiful illustration of how powerful genetic studies in domestic animals can be by combining extensive pedigree analysis and the rapid evolution that makes it sometimes possible to identify the wild-type ancestral chromosome for a mutant chromosome. It is still an open question whether the IGF2 mutation arose in Asian wild boars before domestication, in Asian domestic pigs prior to introgression to European pigs, or in Europe after introgression of a wild-type Asian chromosome. This mutation has gone through a selective sweep in modern meat-producing pigs, and a very high proportion of all pigs used for meat production worldwide carry this mutation.
The IGF2 mutation occurs at an evolutionarily conserved CpG island in intron 3; 16 base pairs involving the mutated site show 100% sequence identity among 18 out of 18 mammalian species (38). Gel shift experiments revealed that the mutation disrupts the interaction with a nuclear factor present in mouse C2C12 myoblast cells, and a luciferase reporter assay in the same cell type showed that the wild-type sequence but not the mutant sequence supports repression of transcription from the endogenous IGF2 promoter (37). This result was in perfect agreement with expression analysis showing upregulated IGF2 expression in postnatal skeletal and cardiac muscle from mutant chromosomes but not in prenatal muscle or in liver. Thus, the mutation knocks out the interaction with a repressor, and the effect of the mutation is both tissue- and stage-specific.
The remaining big question after the identification of the IGF2 mutation was which transcription factor binds the wild-type sequence. The mutation did not disrupt the recognition sequence for a known factor, suggesting either that it is an unusual site for a known factor or that an uncharacterized factor binds to the site. The latter turned out to be the correct answer when, 6 years later and in collaboration with researchers at the Broad Institute (Cambridge, USA), we were able to fish out the factor from preparations of SILAC-labelled C2C12 nuclear proteins using biotin-labelled wild-type and mutant oligonucleotides (39). The protein that we named ZBED6 was not just an uncharacterized transcription factor but a previously unknown protein. The reason why it was not annotated as a protein in the human and mouse genomes was that it is encoded by a domesticated DNA transposon located in an intron of another gene, ZC3H11A. Bioinformatic analysis indicated that this DNA transposon must have integrated in the genome more than 200 million years ago, before the split between monotremes and other mammals, but the open reading frame is only maintained in placental mammals. The data imply that ZBED6 is an innovation in placental mammals that evolved after the divergence of the ancestors of marsupials and placental mammals. The extremely high sequence conservation of the two DNA-binding BED domains among placental mammals (39) suggests that sequence changes in these regions are not tolerated and that ZBED6 has evolved an essential function. ChIP-seq analysis revealed about 2,500 putative ZBED6 binding sites in mouse C2C12 cells, and the consensus binding motif GCTCG was in perfect agreement with the wild-type sequence in pig IGF2 intron 3 that has changed to GCTCA in mutant pigs (39). ZBED6 sites are found in GC-rich sequences close to promoters with a peak downstream of the start of transcription.
An analysis of histone marks in mouse C2C12 cells showed that ZBED6 primarily binds active promoters (40). Further characterization of ZBED6 in C2C12 mouse myoblasts, mouse and human islet cells, as well as in human colorectal cancer cell lines has demonstrated the following about ZBED6: 1) it regulates IGF2 expression in a variety of cell types; 2) it has profound effects on transcriptional regulation; and 3) it acts as a transcriptional modulator that fine-tunes the expression of many genes including IGF2 (40)(41)(42). The working hypothesis is that the essential function of ZBED6 is to provide metabolic flexibility as illustrated by the pig IGF2 mutation. Wild boars have a better ability to store fat when muscle growth is not needed, whereas domestic pigs carrying the IGF2 mutation tend to shunt available energy to muscle growth.
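The effect of the single base change on the ZBED6 consensus motif (GCTCG in wild-type, GCTCA in mutant pigs) can be illustrated with a minimal motif scan. This is only a sketch of exact-match searching on both strands, not the ChIP-seq analysis used in the cited work. The function names and the two short sequences are hypothetical and are not the real pig IGF2 intron 3 sequence.

```python
MOTIF = "GCTCG"  # ZBED6 consensus motif as given in the text

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_sites(seq: str, motif: str = MOTIF) -> list[tuple[int, str]]:
    """Return (0-based start, strand) for every exact motif match in seq."""
    seq = seq.upper()
    rc_motif = reverse_complement(motif)
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        if window == motif:
            hits.append((i, "+"))
        if window == rc_motif and rc_motif != motif:
            hits.append((i, "-"))
    return hits


# Hypothetical toy sequences (NOT real IGF2 sequence); they differ at one base.
wild_type = "AAGGCTCGCC"
mutant = "AAGGCTCACC"

print(find_sites(wild_type))  # the motif is present in the wild-type toy sequence
print(find_sites(mutant))     # the single base change removes the match
```

An exact-match scan is the simplest possible choice here; a real binding-site analysis would more likely score sequences against a position weight matrix.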

Patterns of locomotion
The horse is particularly well suited for genetic studies of patterns of locomotion because the gait of horses is critical for the different ways we use horses, e.g. as draught horses for transporting heavy loads, as riding horses for fast transport, or as trotting horses in front of light carriages. All horses can perform the three gaits walk, trot, and gallop (Figure 2A-C). However, some horses can perform alternative gaits. Icelandic horses are a good example since they can perform two alternative gaits, tölt and pace (Figure 2D and E). Tölt is a four-beat ambling gait that is almost as fast as trot but gives a much smoother ride as the horse always has one foot on the ground. The ability to perform the tölt is an important reason for the popularity of this breed. Pace is a two-beat lateral gait whereas trot is a two-beat diagonal gait (Figure 2B and E). Pace is faster than tölt, as fast as trot, but slower than the gallop.
We decided to explore the genetic basis for gait variations in Icelandic horses because half of the population is classified as four-gaited (walk, trot, gallop, and tölt) and the other half is five-gaited (walk, trot, gallop, tölt, and pace). It was also known that this variation was not solely a consequence of training because the ability to pace has a high heritability (43). However, a high heritability does not necessarily imply a simple, monogenic inheritance. For instance, human height has a similarly high heritability but is affected by hundreds of genes, each with a tiny effect. We carried out a genome-wide association analysis using 40 five-gaited and 30 four-gaited Icelandic horses and a SNP-chip comprising about 50,000 SNPs evenly spread across the genome. Remarkably, the statistical analysis revealed a single marker on chromosome 23 that showed a highly significant association (44). Further analysis quickly revealed the causative mutation: a single base change causing a premature stop codon in DMRT3, which encodes doublesex and mab-3 related transcription factor 3. All five-gaited Icelandic horses were homozygous mutant (AA) whereas a majority of four-gaited horses were heterozygous (CA), about 40% were homozygous mutant (AA), and a few were homozygous wild-type (CC) (44). It is still an open question whether lack of training or genetic modifiers explain why some homozygous mutants do not perform pace. Further analysis showed that horses used for harness racing (trot or pace) have a very high frequency of this mutation, referred to as Gait keeper, suggesting that, in addition to promoting pace, it also inhibits the transition from trot/pace to gallop and thereby allows horses to trot and pace at very high speed without galloping, which is the natural gait at high speed but not allowed in harness racing.
The Gait keeper mutation has a very strong positive effect on racing performance (44), and a diagnostic test for this mutation is now used for horse breeding.
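To give a sense of what a highly significant association has to mean in a scan of about 50,000 SNPs, the sketch below computes a Bonferroni-corrected per-marker threshold. The use of a Bonferroni correction and of a 5% family-wise error rate are assumptions made here for illustration; the text does not state which significance criterion was applied in the original analysis (44).

```python
import math


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test p-value threshold keeping the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# "About 50,000 SNPs" in the text; treated as exact here (an assumption).
N_SNPS = 50_000

threshold = bonferroni_threshold(N_SNPS)
print(f"per-SNP p-value threshold: {threshold:.1e}")
print(f"equivalent -log10(p): {-math.log10(threshold):.1f}")
```

Under these assumptions a marker must reach a p-value of about one in a million, which is a demanding bar for a sample of only 70 horses and is consistent with a single locus of large effect.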
We performed a worldwide screen of 4,396 horses representing 141 breeds for the presence of the DMRT3 mutation (45). It is present in breeds spread across Eurasia and is also widely spread in breeds originating from North and South America. It is absent from breeds used as draught horses and in breeds where the gallop is the most important gait, such as Thoroughbred horses. The worldwide distribution indicates that horses carrying the Gait keeper mutation have been highly appreciated, most likely because of their ability to offer a smooth ride when horses were the only means of long-distance transport and for trotting at high speed in front of small carriages. The worldwide distribution in combination with the presence of ancient sculptures showing horses with alternate gaits (Figure 2F) suggests that the Gait keeper mutation arose more than 2,000 years before present.
The DMRT3 mutation is another example where our genetic analysis allowed us to identify the causal mutation without any prior knowledge about gene function. When we discovered this locus the function of the DMRT3 protein was unknown. It belongs to a small family of transcription factors, and the most well-studied member (DMRT1) has a critical role in sexual development in vertebrates, so the assumption was that the closely related DMRT3 protein had a similar role. We explored the function of DMRT3 in collaboration with Klas Kullander and his group at Uppsala University, which has strong expertise in spinal cord neurobiology. DMRT3 is expressed in a specific subset of neurons in the spinal cord (now named DMRT3 neurons) in mice and horses (44). These neurons were classified as interneurons as their axons crossed the mid-line of the spinal cord, and they are inhibitory neurons that make direct contacts with motor neurons. Thus, these characteristics are in perfect agreement with the genetic data suggesting that these neurons play a critical role in co-ordinating muscle contractions during locomotion. Interestingly, another group had previously developed a Dmrt3 knock-out mouse but did not notice any striking phenotype (44). In the light of the horse data and the characterization of the DMRT3 neurons we decided to characterize the pattern of locomotion in these mice that appeared to move normally in the cage. A detailed characterization of the pattern of locomotion on a TreadScan, limb co-ordination of newborn mice, and fictive locomotion of isolated spinal cords revealed severe defects in limb co-ordination in the Dmrt3 knock-out mice (44). Despite severe disturbances in limb co-ordination at birth, the Dmrt3 null mice are able to move normally under unstressed conditions in the cage, implying that the locomotor network in the spinal cord can to a large extent compensate for the loss of DMRT3.
The horse and mouse data together with the functional characterization demonstrate that DMRT3 neurons have a critical role for co-ordinating limb movements in vertebrates. It is likely that the horse mutation has a more severe effect than the null mutation present in the knock-out mouse. The horse mutation causes a premature stop codon, and the mutant protein contains only 300 out of the 474 residues in the full-length protein. The mutant protein may constitute a dominant negative form containing the DNA-binding domain but with defective protein-protein interaction. The horse mutation is partially dominant with a clear phenotypic effect in heterozygotes, albeit milder than the one in homozygotes, whereas no significant phenotypic effect was found in the mice heterozygous for the null allele.
No DMRT3 mutation in humans has as yet been described, but since homozygotes for the horse mutation and the mouse knock-out are fully viable it is very likely that humans lacking functional DMRT3 expression exist. The prediction is that they have a mild defect in the co-ordination of limb movements. In fact, there are humans who tend to pace rather than perform the diagonal movement of legs and arms, but whether this has a genetic basis has not yet been investigated. However, there is a previously described 225 kb deletion upstream of DMRT3 that causes a dominant form of a congenital neurodegenerative disease resembling cerebral palsy (46). It is possible that this is due to a defect in DMRT3 function. This large deletion disrupts the KANK1 gene, and one of the deletion breakpoints occurs just upstream of the DMRT1 gene, which in turn is located just upstream of DMRT3. Thus, it is possible that the deleted region contains negative regulatory elements and that the elimination of these leads to overexpression or ectopic expression of DMRT3 and/or DMRT1; altered expression of DMRT1 may disturb DMRT3 function since they are expected to interact with identical or very similar DNA sequences (47). It is likely that a disturbed development of DMRT3 neurons, which are classified as inhibitory neurons controlling the firing of motor neurons, may have severe deleterious effects on limb co-ordination. This hypothesis can be tested using animal models carrying targeted mutations mimicking the human mutation.

Concluding remarks
Genetic and genomic studies of domestic animals are well justified due to their agricultural importance. This review illustrates how research on domestic animals can also contribute new basic knowledge concerning gene function and biological mechanisms. Genetic studies on domestic animals give a complementary view on genotype-phenotype relationships compared with what we have learned from studies on humans and experimental organisms. The reason is that domestication and breeding are evolutionary processes with strong phenotypic selection over thousands of years. The fact that this is the most extensive genetic screen that has been accomplished ensures that studies of domestic animals will continue to enrich the field of biomedicine.