East Anglian early Neolithic monument burial linked to contemporary Megaliths

Abstract In the fourth millennium BCE a cultural phenomenon of monumental burial structures spread along the Atlantic façade. Megalithic burials have been targeted for aDNA analyses, but a gap remains in East Anglia, where Neolithic structures were generally earthen or timber. An early Neolithic (3762–3648 cal. BCE) burial monument at the site of Trumpington Meadows, Cambridgeshire, UK, contained the partially articulated remains of at least three individuals. To determine whether this monument fits a pattern present in megalithic burials regarding sex bias, kinship, diet and relationship to modern populations, teeth and ribs were analysed for DNA and carbon and nitrogen isotopic values, respectively. Whole ancient genomes were sequenced from two individuals to a mean genomic coverage of 1.6 and 1.2X and genotypes imputed. Results show that they were brothers from a small population genetically and isotopically similar to previously published British Neolithic individuals, with a level of genome-wide homozygosity consistent with a small island population sourced from continental Europe, but bearing no signs of recent inbreeding. The first Neolithic whole genomes from a monumental burial in East Anglia confirm that this region was connected with the larger pattern of Neolithic megaliths in the British Isles and the Atlantic façade.

Analysis, which does not take into account either reservoir effects or bone 23 turnover/remodelling but is still broadly robust, using Bcal (https://bcal.sheffield.ac.uk; (Buck 24 et al. 1999)) indicates that the period of burial at Monument probably began between 3763-25 3662 cal BC) and ended between 3626-3484 cal BC (both 68% highest posterior density 26 (HPD). It has been suggested that the burial chamber is unlikely to have been operational for 27 more than 50 to 75 years ( (Evans et al. 2018)p. 80), while analysis suggests that it was in use 28 for at least 28 (95% HPD) or 46 (68% HPD) years. Concerning the two individuals with a 29 full-sibling relationship there is a 96.5% probability that Skeleton 1 died before Skeleton 4 30 and at a 68% HPD Skeleton 1 probably died 28 to 105 years before Skeleton 4, while at the 31 95% HPD Skeleton 1 probably died between 23 and 293 years before Skeleton 4 (Bcal 32 analysis) 33 34

Preparation of samples for Isotopic analysis 37
Samples for carbon and nitrogen isotope analysis were prepared using a modified version of 38 the Longin method (Longin 1971). Approximately 500mg of bone was cut using a hand-held 39 dremel drill with a diamond tipped cutting wheel. The bone surface was then abraded using a 40 precision sandblaster to remove surface dirt and contaminants. Samples were placed into pre-41 weighed glass test tubes and approximately 8ml of 0.5M aq. HCl was added. The samples 42 were agitated twice a day and 0.5M aq. HCl replaced as appropriate until full 43 demineralisation. When all samples were demineralised, samples were rinsed 3 times with 44 distilled water, then the distilled water was removed and approximately 8ml of pH 3.0 water 45 was added. The samples were then heated in an oven for 48hrs at 75°C until gelatinised. Once 46 partially cooled, the supernatant liquor was filtered using Ezee filters, frozen, then lyophilised. 47 Analysis was carried out using a Costech automated elemental analyser coupled with a 48 Thermo Finnigan Delta V mass spectrometer in the Godwin Laboratory, Department of Earth 49 Sciences, Cambridge. All samples were run in triplicate (0.8mg ±0.1mg). Data Carbon and 50 nitrogen stable isotope values are expressed as delta values (e.g. δ 13 C) on the VPDB and AIR 51 scales, for carbon and nitrogen respectively (Coplen 2011). 52 The 'quality' of the collagen was deemed acceptable if the C:N fell within the accepted range 53 of 2.9 -3.6 (DeNiro 1985). The preservation and purity of the collagen was deemed 54 acceptable if the samples contained more than 1% collagen by mass, with a composition of 55 more than 13%C and 4.8%N (Ambrose 1990 The samples were shotgun-sequenced on the Illumina NextSeq500 using single-end 75 base 103 pair kit. Before mapping, the sequences of adaptors and indexes and poly-G tails occurring 104 due to the specifics of the NextSeq 500 technology were removed from the ends of DNA 105 sequences using cutadapt 1.9 (Martin 2011). Sequences shorter than 30 bp were also removed 106 with the same program to avoid random mapping of sequences from other species. 107 The sequence reads were mapped to reference sequence GRCh37 (hg19) using Burrows-108 Wheeler Aligner (BWA 0.7.12) (Li & Durbin 2009) command aln with seeding disabled. 109 After mapping, the sequences were converted to BAM format and only sequences that 110 mapped to the human genome were kept with samtools 1.9 (Li et al. 2009). Next, multiple 111 bams from the same individual, but different runs were merged using samtools merge, reads 112 with mapping quality under 30 were filtered out and duplicates were removed with picard 113 2.12 (http://broadinstitute.github.io/picard/index.html). sites as well as adjacent sites and calculates a ratio that takes into account ancient DNA 123 damage by excluding positions where a major allele is C or G and the minor is T or A 124 respectively. 125 On average, the samples showed 48% -52% C > T substitutions in the first five base pairs of 126 the 5' ends (Table S7). The mtDNA contamination estimate for samples ranged from 0.4% to 127 1.22% with an average of 0.92% (Table S2). The average of the two X chromosome 128 contamination methods was between 0.19 and 2.27% with an average of 0.71% (Table S2). 129 reported for relationships between the TRM10 samples and the comparative individuals in 202 Table S6. formatting the genotype information using PLINK 1.9 and R (R Core Team 2018). The 233 missing genotypes were coded as "NA". 234

Calculating general statistics and determining genetic sex
The same pipeline was used to extract the allele information for other five variants: one SNP 235 responsible for the lactase persistence in the European populations (rs4988235 in MCM6), 236 three SNPs involved in the protection against leprosy (rs5743618 and rs4833095 in TLR1 and rs3135388 in HLA) and one 32bp deletion responsible of the HIV resistance (rs333 in CCR5). 238 Since indels are not present in the HRC reference panel, we used the 1000 Genomes samples 239 in the Beagle -gt step to impute the rs333 variant. 240 Concordance between global and local imputation is listed in Table S12. 241 242

Modern Descendants Analysis 243
Anthony Wilder Wohns 244 The program tsinfer seeks to infer sequences of trees representing the complete gene 245 genealogy of a sample using a two-step process: first, the ancestral haplotypes which 246 contributed material to the samples are estimated using heuristic methods. Second, a Li and 247 Stephens copying process is employed to show how sampled genomes copy genetic material It is important to note that this copying process does not necessarily mean that the 263 Trumpington individuals were direct genetic ancestors of the modern samples. Instead, it 264 indicates that modern samples descend from individuals genetically indistinguishable from the 265 Trumpington individuals over some parts of their genome. 266 The resulting tree sequences contain considerable detail about the ancestry of the ancient and 267 modern samples at every site included in the analysis. To gain a high-level understanding of 268 the genomic legacy of the Trumpington individuals we computed the genomic descent 269 statistic, described in the next section. 270 The code implementing this analysis can be found at https://github.com/awohns/neolithic-271 tsinfer. 272

Genomic Descent Statistic: 273
We define the genomic descent statistic quantifying the relative proportion of genetic material 274 in each population which copies from an ancestral node u.
Let R be a list of k sets of nodes. In our application, R is a list of the populations in the 1KGP 276 or SGDP, with set Rk containing the samples in population k. For node i in reference set Rk, 277 define , as the total span of node i which copies from node u on chromosome c. The 278 statistic is averaged over the portion of genome where the node is ancestral to any sample 279 from the sets in R, . Finally, it is normalized by the size of reference set, |Rk|. Thus, the 280 genomic descent statistic of node u in reference set k on chromosome c is: The height of the stacked bars in Figure 2B reflects the total genomic descent statistic for each 291 population (the value of Equation 2). The relative size of each coloured section of the stacked 292 bars in Figure 2B reflects the value of Equation 1. Thus, while the total height represents the 293 genomic inheritance from the Trumpington haplotypes, the relative size of coloured regions 294 within the bar shows the contribution from each chromosome relative to the length of that 295 chromosome. In other words, long chromosomes such as chromosome 1 will not necessarily 296 show a greater relative contribution than smaller chromosomes. This has been done to 297 highlight any differences between chromosomes in terms of their proportion of genomic 298 descent from Trumpington haplotypes. The evenness of the distribution of colours in the 299 different bars indicates that most chromosomes support the same pattern of genomic descent. 300 Large deviations between chromosomes, for example in chromosome 22 in the SGDP Itelman 301 population, may be due to phasing or imputation errors in ancient or SGDP samples (see 302 Kelleher et al. 2018 for evidence of phasing errors in some 1KGP and SGDP samples). 303 Variance due to small reference set size (for instance, only two Itelman individuals are 304 included in the SGDP) or small chromosome size may also be potential explanatory factors. 305 parameter settings described previously (Gamba et al. 2014). X axis is the number of 419 total lengths of ROH over 1.6 megabase pairs, Y axis is the number of total lengths 420 over 500 kilobase pairs but less than 1.6 megabase pairs. Ancient samples are plotted 421 against comparative samples from the 1000