Purkait’s triangle revisited: role in sex and ancestry estimation

Abstract
Identification of unknown remains recovered from marine and terrestrial locations is a significant humanitarian problem. This investigation proposes a simple method, applicable to fragmentary femora, for a more refined level of ancestry and/or sex estimation. To that end, we re-examined Purkait’s triangle, which comprises three inter-landmark distances between the traction epiphyses and the articular rim of the femoral head. A large sample (n = 584) from geographically diverse populations (Egyptian, Indian and Greek) was compiled. Additionally, shape variables (n = 3) and trigonometrically derived variables and ratios (n = 9) were employed to detect any geographically clustered morphological differences between these populations. Random forest modelling (RFM) and linear discriminant function analysis (LDA) were employed to create classification models for instances where sex was known or unknown. The sample was divided into training and test sets in a 70/30 ratio, and classification accuracies were evaluated by k-fold cross-validation. In sex estimation, RFM performed similarly to LDA; in ancestry estimation, however, RFM outperformed LDA. Ancestry estimation was satisfactory for the Indian and Egyptian samples, although the Greek sample was problematic: it showed greater morphological overlap with the Indian sample owing to high within-group variation. Test samples were more accurately assigned to their ancestral category when sex was known. Generally, the sex-specific model achieved higher classification accuracies in the validation sample for females than for males. Using RFM and the linear variables, the overall accuracy in females reached 83% (95%, 71% and 86% for Egyptian, Indian and Greek females, respectively), whereas in males it was 72% (58%, 87% and 50% for Egyptian, Indian and Greek males, respectively).
Classification accuracies were also calculated per group in the test data using the 12 derived variables. For females, the accuracies of the medians model were comparable to those of the linear model, whereas for males the angles model outperformed the linear model in each group but with similar overall accuracy. The male-specific ancestry classification rates were 82%, 78% and 56% for Egyptian, Indian and Greek males, respectively. In conclusion, Purkait’s triangle has potential utility in ancestry and sex estimation, although not all groups can be separated with the same efficiency. Intrapopulation variation may affect the accuracy of assigned group membership in forensic contexts.
Key points
• Purkait’s method is a possible ancestry indicator applicable to fragmentary femora.
• Random forest modelling surpassed linear discriminant function analysis in multi-group ancestry classification.
• Ancestry is more accurately assessed in females than in males.
• The intertrochanteric distance is the most important feature in the discrimination of sex, whereas for ancestry it is the head to lesser trochanter distance.
• Sex differences override ancestry: individuals tend to be misclassified into the same sex but a different ancestry group rather than into the opposite sex of the same ancestry.


Introduction
Sex and geographic ancestry are fundamental components of the biological profile of decedents and should be established in order to expedite the administration of justice [1][2][3]. Ancestry is defined as the geographic region of origin of an individual and is not interchangeable with race [3,4]. Ancestry estimation in forensic anthropology generally aims to classify an unknown individual to the most likely geographic origin using statistical classification models [1]. Demonstrably valid methods may aid in the positive identification of unknown remains encountered in challenging forensic circumstances, such as cases of mutilation, commingling or dispersal, where only a single skeletal element or fragments can be analyzed [2,3].
Research and practice in ancestry assessment are hampered by variation in the global practice of forensic anthropology arising from political issues, the demographic composition of each country and regional medico-legal demands [5,6]. Nevertheless, ancestry assessment is a typical requirement of international forensic casework in natural or man-made disasters and humanitarian settings, as a consequence of complex systems of interrelations [5,7,8]. Domestic and international law and crime prevention are the building blocks for establishing security, human rights and peace in the world [9]. Previous studies have sought to distinguish between broad, socially constructed ancestral groups in North America [2], South Africa [10] and the Balkan region [11], but with limited capacity to differentiate more than two to four groups within the same geographical area, which poses a significant problem for other populations of the world [12].
In the Middle East and North Africa (MENA) region, Egyptian labour migration is the principal migration flow to the Arab region and Europe. India has the largest diaspora in the world; Indian migrants are hosted by the US, the UAE and Saudi Arabia, as well as by other Gulf states since the "oil boom" of the 1970s [13,14]. Over the last few years, the number of migrants residing in the Gulf Cooperation Council (GCC) countries has increased considerably, and non-nationals outnumber national citizens in some of these countries [15]. In the Gulf labour market, Egyptians and Indians are major constituents of the demographic, economic and cultural fabric of the GCC countries [16]. Consequently, routine forensic casework may involve unknown remains or victims from both populations.
Greece, on the other hand, is the fourth most common destination country in the European Union for refugees and asylum seekers from the Arab region, according to the Missing Migrants Project [17]. The humanitarian tragedy of loss of life at sea is particularly linked to the surge in irregular migration through maritime routes such as those crossing the Mediterranean. Between 1988 and 2013, a reported 14 309 people died while attempting to migrate in low-quality vessels [18]. While the majority of maritime fatalities are related to migration, other prominent recent incidents have also caused large numbers of deaths at sea; for example, the crash of EgyptAir Flight 804 in the Mediterranean in 2016 killed all 66 people on board [19].
To identify missing persons from a closed list, investigators must have complete information on age at death, sex, ancestry, stature and time since death in order to meet the demands of the legal process [20]. Ancestry estimation is a difficult undertaking in forensic anthropology owing to the complexity of the concept and confusion in its interpretation [21,22]. The cranium provides unique information for positive scientific identification in anthropological analysis [23], and post-cranial elements also present abundant anatomical features useful for identification [24]. Nevertheless, ancestry research has traditionally focused on the skull, particularly the midfacial region, rather than on post-cranial bones [1,3,6,23–25]. By contrast, sex can be estimated in a straightforward manner from almost any skeletal element [25]. Thus, the pursuit of alternative methods using appendicular skeletal elements is beneficial, not only to provide evidence corroborating cranial findings but also to permit assessment when cranial elements are not recovered [26,27].
Post-cranial metric methods of estimating ancestry have found the most success with the lower limb bones and pelvic girdle, especially the femur [28,29]. The structure of the proximal femur, that is, the head, the neck and their internal architecture, withstands the mechanical stress and strain that influence bone remodelling and density (Wolff's law) [30]. A number of studies have shown marked ancestral differences in the shape of the proximal femur and in at least one trait of the distal femur, the intercondylar notch height [31]. Other femoral features, such as femoral neck length [32], platymeria [11,12,33] and anterior femoral curvature [34], have also shown differences among the major geographic populations.
Early models used in sex and ancestry estimation involved classical multivariate techniques, such as linear discriminant function analysis in FORDISC 3.0 [35,36] and logistic regression [36]. More recently, researchers have used random forest modelling (RFM), a machine learning algorithm, in both craniometric and morphoscopic approaches to ancestry estimation [27,37]. Over the last decade, machine learning algorithms have yielded many new insights into human variation, and they have outperformed traditional classification methods in anthropological research, even when all the statistical assumptions of those traditional methods are met [37,38].
In 2005, Purkait [39] developed a method of sex estimation based on three inter-landmark distances connecting the traction epiphyses, where muscles insert, and the most lateral point of the femoral head at the articular rim. The performance of the triangle's lengths varies among populations with regard to allocation accuracy and the degree of sexual dimorphism [40][41][42]. Inspired by the illustration created by Anastopoulou et al. [40], and linking it with the different classification rates obtained in previous validation studies of the method, we hypothesized that such variation might, to some extent, reflect size and shape differences of the proximal femur among the three populations studied: Egyptian, Indian and Greek.
To date, no research has determined whether the anatomical variation of the proximal femur captured by Purkait's triangle can be used to estimate the ancestry of unknown individuals. To that end, a large identified sample from geographically distant populations, combined with a machine learning technique (RFM), would quantify the morphometric characteristics of the femur in each group and indicate the likely accuracy of ancestry estimates. We also compared the classification ability of RFM with that of models generated through linear discriminant function analysis (LDA). If the overall classification accuracy is shown to be higher than chance, Purkait's triangle could serve as a criterion for estimating the ancestry of remains, thereby contributing to forensic identity reconstruction.

Reference samples
In order to evaluate the applicability of Purkait's triangle for sex and/or ancestry estimation, we used matched femoral metric data of identified individuals from three geographically distant populations, namely the Egyptian, Indian and Greek samples. Overall, 584 individuals were analyzed, with a total of 352 males and 232 females representing a sex ratio of 1.5:1. Table 1 depicts the sample size, characteristics and demographic data.

Metric data collection
The femoral Digital Imaging and Communications in Medicine (DICOM) datasets were obtained from a 64-slice multislice computed tomography (CT) scanner (Siemens Somatom Perspective; Siemens Medical Solutions USA, Malvern, PA, USA) with a slice thickness of 1.5 mm. Three-dimensional (3D) reconstruction of the proximal femur was performed using RadiAnt DICOM Viewer (v.5.0.2 64-bit; Medixant, Poznań, Poland) for Windows [44]. The datasets were explored interactively in the 3D volume rendering (VR) tool window and rotated 180° to visualize the posterior aspect of the femur, in order to obtain the three inter-landmark distances of Purkait's triangle following the description in Purkait's original paper [39]. All measurements were taken by a single observer (the third author) to eliminate inter-observer error. The measurements of the Indian [39] and Greek [40] samples had previously been collected manually for the original published studies, using dial and Mitutoyo digital callipers of 0.01 mm accuracy, respectively.
Measurements were taken exclusively on the left proximal femur, except in the Indian sample, for which measurements were taken from both sides but only one side per individual was included [39]. We excluded cases with a history of bilateral pelvic/femoral fractures, deformity, previous hip operation, age-related diseases or bone tumours. Figure 1 illustrates the three measurements.

Evaluation of the technical error of measurements
Because the current study is retrospective, a unified error-quantification strategy was not possible. Intra-observer error was therefore evaluated for the measurements obtained from the virtually reconstructed Egyptian femur models, using the technical error of measurement (TEM), relative TEM (rTEM) and the coefficient of reliability (R) [45]. Prior to data collection, a subsample of 10 femora (approximately 10% of the sample measured for this study; the recommended range is 10%–20% [46]) was randomly selected. Each variable was measured twice, with the two sessions 1 month apart, using the same measurement method.
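The three error statistics can be sketched in a few lines of Python. This is a minimal illustration assuming the formulas commonly used for two repeated measurement sessions, TEM = √(Σd²/2N), rTEM = TEM / mean × 100 and R = 1 − TEM²/SD²; reference [45] may differ in details (for example, which SD enters R), and the function name and example values are hypothetical.

```python
import math
import statistics


def technical_error(first, second):
    """Intra-observer TEM, relative TEM (%) and coefficient of reliability R
    for two repeated measurement sessions on the same specimens.

    Assumed formulas: TEM = sqrt(sum(d^2) / 2N), rTEM = TEM / mean * 100,
    R = 1 - TEM^2 / SD^2 (SD of all measurements from both sessions).
    """
    if len(first) != len(second) or not first:
        raise ValueError("need two equal-length, non-empty series")
    n = len(first)
    # Absolute TEM from the squared differences between the two sessions
    tem = math.sqrt(sum((a - b) ** 2 for a, b in zip(first, second)) / (2 * n))
    all_values = list(first) + list(second)
    # Relative TEM expresses the error as a percentage of the mean size
    rtem = tem / statistics.fmean(all_values) * 100
    # R: share of the total variance that is not measurement error
    sd = statistics.stdev(all_values)
    r = 1 - tem ** 2 / sd ** 2
    return tem, rtem, r


# Hypothetical repeated measurements (mm) for five femora, for illustration only
session_1 = [71.2, 68.4, 75.0, 70.3, 66.9]
session_2 = [70.6, 69.1, 74.2, 71.0, 67.5]
tem, rtem, r = technical_error(session_1, session_2)
print(f"TEM = {tem:.2f} mm, rTEM = {rtem:.2f}%, R = {r:.3f}")
```

Because rTEM divides by the mean, a short distance with the same absolute TEM as a long one will show a larger relative error.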

Evaluation of the degree of sexual dimorphism and inter-population comparisons
Descriptive statistics of the three measurements were computed to characterize the reference samples analysed in the current study. Variables were tested for normality (Kolmogorov-Smirnov test) separately for males and females in each population sample. One-way ANOVA was performed to evaluate the relationship of each variable of Purkait's triangle with sex and ancestry; because of sex differences in size, separate ANOVAs were run for males and females. Tukey honestly significant difference (HSD) post hoc tests were used to determine which ancestral groups differed from one another for each variable in terms of size and patterns of sexual dimorphism. The coefficient of variation, CV = (SD / x̄) × 100, where x̄ is the mean, was computed to compare the variability of the normally distributed data [39,47].
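The CV formula and the F statistic of a one-way ANOVA can be illustrated with the Python standard library. This is a sketch for clarity only: the function names are hypothetical, and it returns the F ratio and its degrees of freedom but no p-value. In practice, `scipy.stats.f_oneway` returns both F and p, and Tukey HSD is available as `scipy.stats.tukey_hsd` (recent SciPy) or `statsmodels.stats.multicomp.pairwise_tukeyhsd`.

```python
import statistics


def coefficient_of_variation(values):
    """CV (%) = SD / mean x 100, using the sample standard deviation."""
    return statistics.stdev(values) / statistics.fmean(values) * 100


def one_way_anova_f(*groups):
    """One-way ANOVA F statistic with its degrees of freedom.

    Compares the spread of the group means with the spread of the
    observations within groups.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = statistics.fmean([x for g in groups for x in g])
    means = [statistics.fmean(g) for g in groups]
    # Between-group sum of squares: group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: observations around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

For example, `one_way_anova_f(egyptian_ac, indian_ac, greek_ac)` would compare one distance across the three samples of one sex, where the three lists are hypothetical inputs.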

The variables
In addition to the three inter-landmark distances, nine derived variables were considered: the angles, the angle-to-length ratios and the medians. The angles were defined by the linear measurements, with angles A, B and C opposite sides BC, AC and AB, respectively, and were calculated using the Law of Cosines [48]. Ratios between the angles and the linear measurements were used because a larger individual may have a larger triangle with no measurable change in its angles; angles expressed relative to the triangle's lengths may therefore be more efficient predictors of sex and/or ancestry. Following Albanese et al. [48], the ratios were defined as:
• Angle A divided by length AB, multiplied by 100.
• Angle B divided by length BC, multiplied by 100.
• Angle C divided by length AC, multiplied by 100.
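The angle and ratio variables can be sketched as follows. This assumes angles are expressed in degrees (the unit is not stated above) and that A is opposite BC, B opposite AC and C opposite AB; the function names are hypothetical.

```python
import math


def triangle_angles(ab, bc, ac):
    """Angles (degrees) of Purkait's triangle from its three sides via the
    Law of Cosines; angle A is opposite BC, B opposite AC, C opposite AB."""
    if not (ab + bc > ac and ab + ac > bc and bc + ac > ab):
        raise ValueError("side lengths do not form a triangle")
    # cos A = (AB^2 + AC^2 - BC^2) / (2 * AB * AC)
    angle_a = math.degrees(math.acos((ab**2 + ac**2 - bc**2) / (2 * ab * ac)))
    # cos B = (AB^2 + BC^2 - AC^2) / (2 * AB * BC)
    angle_b = math.degrees(math.acos((ab**2 + bc**2 - ac**2) / (2 * ab * bc)))
    # The three angles of a triangle sum to 180 degrees
    angle_c = 180.0 - angle_a - angle_b
    return angle_a, angle_b, angle_c


def angle_length_ratios(ab, bc, ac):
    """Ratios following the pairing listed above: A/AB, B/BC, C/AC, each x 100."""
    a, b, c = triangle_angles(ab, bc, ac)
    return a / ab * 100, b / bc * 100, c / ac * 100
```

Scaling all three sides by the same factor leaves the angles unchanged but divides the ratios by that factor, which is why the ratios retain some size information that the angles alone lack.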
The medians of the triangle were calculated using Apollonius' theorem [49]; they capture the distances between the femoral head and the greater and lesser trochanters in relation to each other. All of these variables were also pooled and employed in a single model to evaluate their efficiency in improving the accuracy of sex and ancestry classification.
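Apollonius' theorem gives each median from the three sides, for example the median from A to the midpoint of BC is m_A = ½√(2AB² + 2AC² − BC²). A minimal sketch, with a hypothetical function name:

```python
import math


def triangle_medians(ab, bc, ac):
    """Medians of Purkait's triangle via Apollonius' theorem.

    m_a runs from vertex A to the midpoint of BC, m_b from B to the midpoint
    of AC, and m_c from C to the midpoint of AB.
    """
    m_a = 0.5 * math.sqrt(2 * ab**2 + 2 * ac**2 - bc**2)
    m_b = 0.5 * math.sqrt(2 * ab**2 + 2 * bc**2 - ac**2)
    m_c = 0.5 * math.sqrt(2 * ac**2 + 2 * bc**2 - ab**2)
    return m_a, m_b, m_c
```

As a check, in a 3-4-5 right triangle the median to the hypotenuse equals half the hypotenuse: `triangle_medians(3, 5, 4)[0]` is 2.5.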
Three additional shape variables, ABsh, ACsh and BCsh, were computed using the method of Darroch and Mosimann (1985) [50]: each of the three original variables is divided by the geometric mean of the three for that case. Because shape variables remove overall size, they limit the influence of sex-related size differences, so the sexes can be combined for ancestry estimation.
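A short sketch of the shape variables as described above (each distance divided by the case's geometric mean); the function name is hypothetical. The geometric mean is computed through logarithms, which is numerically equivalent to the cube root of the product.

```python
import math


def shape_variables(ab, bc, ac):
    """Shape variables in the style of Darroch and Mosimann: each distance
    divided by the geometric mean of the three distances for the same case."""
    geometric_mean = math.exp((math.log(ab) + math.log(bc) + math.log(ac)) / 3)
    return {"ABsh": ab / geometric_mean,
            "ACsh": ac / geometric_mean,
            "BCsh": bc / geometric_mean}
```

Two triangles that differ only by a uniform scale factor receive identical shape variables, and the three values always multiply to 1. Only isometric size is removed this way; any allometric shape differences remain in the data.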

Modelling techniques
Each set of variables (the three linear variables and the newly derived trigonometric variables of Purkait's triangle) was modelled for sex and/or ancestry estimation using the LDA and RFM implementations in scikit-learn (version 0.22) [51] for Python [52].
LDA calculates the multivariate distance between groups using the group-specific means and a pooled within-group variance-covariance matrix (VCM), and then assigns each observation to one of the groups according to its discriminant score, a linear combination of the three Purkait's triangle variables (or of the pooled variables), relative to the classification threshold. LDA requires three statistical assumptions: (i) the explanatory variables within each class follow a multivariate normal distribution; (ii) the groups have equal VCMs; and (iii) the explanatory variables show low correlation [38]. RFM is trained using bootstrap aggregation (bagging): random samples are repeatedly drawn with replacement from the original training data (bootstraps), a decision tree is grown on each, and the trees are aggregated into a single model. At each node, a random subset of the variables is considered, and whichever variable gives the best split is used to split the node. After all the decision trees in the forest are built, the out-of-bag (OOB) error is the average prediction error for each training sample, calculated using only the trees whose bootstrap samples did not contain that sample. This allows the model to be fitted and validated during training, and the OOB error serves as an estimate of the generalization (prediction) error on unseen data. Bagging-based methods also offer ways to balance errors in training datasets where classes are imbalanced. Furthermore, training multiple decision trees on different bootstrap samples and subspaces of the feature space reduces variance (overfitting), at the cost of slightly increased bias (i.e. underfitting due to systematic under- or over-prediction of the target groups) [37].
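The contrast between the two classifiers can be sketched with scikit-learn. This is an illustration on synthetic placeholder data, not the study's data or tuned models. The 70/30 stratified split mirrors the design described in this paper, while `n_estimators=500`, `max_features="sqrt"` and the random seeds are assumptions chosen for the example.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic placeholder data: three groups x three variables (AB-, BC-, AC-like)
centres = np.array([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [0.0, 3.0, 0.0]])
X = np.vstack([rng.normal(c, 1.0, size=(100, 3)) for c in centres])
y = np.repeat(["group_1", "group_2", "group_3"], 100)

# 70/30 train/test split, stratified so each group keeps its proportion
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# LDA: group means plus a pooled within-group covariance matrix
lda = LinearDiscriminantAnalysis().fit(X_train, y_train)

# Random forest: bagged trees, each split drawn from a random subset of variables;
# oob_score=True scores each training case using only trees that did not see it
rf = RandomForestClassifier(n_estimators=500, max_features="sqrt",
                            oob_score=True, random_state=0).fit(X_train, y_train)

lda_test_acc = lda.score(X_test, y_test)
rf_test_acc = rf.score(X_test, y_test)
rf_oob_acc = rf.oob_score_  # OOB accuracy; OOB error = 1 - oob_score_
print(f"LDA test: {lda_test_acc:.2f}  RF test: {rf_test_acc:.2f}  RF OOB: {rf_oob_acc:.2f}")
```

Because the OOB estimate comes from the training data alone, it can be compared with the held-out test accuracy as a rough check that the forest is not overfitting.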
Two variable importance measures (VIMs) are calculated during the analysis: mean decrease in accuracy (MDA) and mean decrease in impurity (MDI) [53]. For both, the higher a variable appears on the visualization chart, the more important it is. MDA is the mean decrease in prediction accuracy on the OOB samples when the values of a given variable are randomly permuted in those samples before they are passed down the trees; it is a direct measure of each feature's impact on model accuracy. MDI is the total decrease in node impurity attributable to a variable, averaged over all trees of the ensemble, and the features are ranked by this measure. MDA is considered the more advanced measure of importance, whereas MDI is more of a proxy: it depends on the data and is influenced by correlation between variables, because any one of a set of correlated variables can be used as the splitting predictor, reducing the apparent importance of the others [54].
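Both kinds of ranking can be sketched in scikit-learn. The forest exposes MDI as `feature_importances_`. For a permutation (MDA-style) measure, this sketch uses `sklearn.inspection.permutation_importance` on a held-out test set rather than on OOB samples, so it approximates rather than reproduces the OOB-based MDA described above. The data are synthetic: one informative variable and two pure-noise variables, all hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 400
y = rng.integers(0, 2, size=n)
informative = y * 3.0 + rng.normal(0.0, 1.0, size=n)  # shifts with the class
noise = rng.normal(0.0, 1.0, size=(n, 2))             # carries no class information
X = np.column_stack([informative, noise])
names = ["informative", "noise_1", "noise_2"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_train, y_train)

# MDI: impurity decrease accumulated during training, normalized to sum to 1
mdi = rf.feature_importances_
# Permutation importance: mean drop in accuracy when one column is shuffled
perm = permutation_importance(rf, X_test, y_test, n_repeats=20, random_state=0)
mda = perm.importances_mean

for name, i_mdi, i_mda in sorted(zip(names, mdi, mda), key=lambda t: -t[2]):
    print(f"{name:12s} MDI={i_mdi:.3f}  permutation={i_mda:.3f}")
```

Noise variables typically still receive non-zero MDI because fully grown trees split on them near the leaves, while their permutation importance stays close to zero. This pattern illustrates why MDI is treated as the weaker proxy.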
The sklearn.random_projection module was used as a computationally simple and efficient way to reduce the dimensionality of the data for visualization in a 2D plot. The dimensions and distribution of the random projection matrices are controlled so as to approximately preserve the pairwise distances between data points in the original space. Random projection is thus a conventional approximation technique for biological distance-based studies [55].
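A minimal sketch of a 2D random projection with scikit-learn follows. The Gaussian variant, the synthetic measurements and the seeds are assumptions; the module also offers `SparseRandomProjection`, and the text does not say which variant was used.

```python
import numpy as np
from sklearn.random_projection import GaussianRandomProjection

rng = np.random.default_rng(2)
# Synthetic placeholder data: 150 cases x 3 variables (e.g. AB, BC, AC in mm)
X = rng.normal(loc=[75.0, 60.0, 65.0], scale=4.0, size=(150, 3))

# Project the three variables onto two random Gaussian directions
projector = GaussianRandomProjection(n_components=2, random_state=0)
X_2d = projector.fit_transform(X)
print(X_2d.shape)  # (150, 2)
```

The two columns of `X_2d` can then be drawn as a scatter plot coloured by ancestry, or by sex and ancestry, to inspect group overlap. With only three input variables, distance preservation is approximate at best.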

Experimental design, tuning and assessment of performance
The present study builds methodologically on Liebenberg et al. [10], who ran separate analyses to classify remains according to (i) sex only, with the three populations pooled; (ii) ancestry only, at the geographic level with the sexes pooled; (iii) sex and ancestry simultaneously; and (iv) sex-specific ancestry, in which the recorded sex was used to partition the pooled dataset into male and female groups and train separate models, to gauge the effect of prior knowledge of sex on the correct identification of ancestry [10]. A training/test design was employed in which each model was developed on 70% of the sample and tested on the remaining 30%. The number of individuals in each experiment varied according to the intended classification task. In the multi-class setting, the one-vs-all decomposition scheme was implemented to distinguish a single class (positives) from the remaining ones (negatives) [56].
Table 2 shows that the AB measurement had the highest measurement error relative to its size (rTEM = 3.38%), with an intermediate absolute error (TEM = 0.97 mm). It was followed by the AC measurement, which had an rTEM of 2.77% and a relatively higher TEM of 1.10 mm. Both measurements had a coefficient of reliability (R) of 0.95. The lowest error was achieved for the BC measurement. These results are within the recommended range of human error [57,58].
Table 2. Results for the absolute technical error of measurement (TEM), the relative technical error of measurement (rTEM) and the coefficient of reliability (R), showing intra-observer reproducibility for the subsample (n = 10, two repeats). Refer to Figure 1 for the definitions of AB, BC and AC.

Evaluation of the sexual dimorphism and inter-population comparisons of the reference samples
The descriptive statistics of each variable by geographic origin and for the pooled sample are summarized in Table 3. All measurements differ significantly between males and females, with the exception of AB in the Greek sample. Differences between samples in the dimensions of Purkait's triangle within each sex and in the pooled-sex groups were examined using one-way ANOVA (Table 4). Statistically significant differences among the three groups were found in all three dimensions, in each sex and in the pooled sample. Post hoc comparisons showed that AC differs significantly among all three population groups in both males and females. Moreover, Indian males have a significantly longer AB and a shorter AC distance than Greek males, suggesting that the superior point of the greater trochanter lies high relative to the head. Greek males have the highest within-sex variation, especially in AB, whereas Indian males have the lowest CV for all measurements (Table 3). Indian females have a significantly shorter AC distance than Greek females. Although the BC and AB measurements do not differ significantly, the CVs of BC and AB in Indian females were the highest and second highest values, respectively. Given that the AB and AC dimensions are shortest in Egyptian females and males, the neck length of Egyptians of both sexes, as measured from the most lateral point of the head border, is on average shorter than that of Indians and Greeks (Tables 3 and 4). These differences indicate morphological differences of the proximal femur among the three samples.

Performance of linear discriminant function vs. RFM in sex and ancestry estimation
Supplementary Tables S1–S6 provide the classification accuracies of the individual models for the training and test samples. Supplementary Table S1 gives the classification accuracies of the sex-only models using the three variables of Purkait's triangle. In the test sample, RFM performed comparably to LDA (overall accuracies of 80% and 81%, respectively), and both models misclassified females more often than males. Some of the newly generated variables performed differently from the linear ones (Figure 2). For both LDA and RFM, the medians model performed much like the linear variables model, with a slight improvement in male classification accuracy. The angles models, in both LDA and RFM, showed a marked decline in female classification accuracy in favour of more accurate male classification. As might be expected, the angle-to-length ratio models performed intermediately between the linear and angles models. For the pooled variables LDA model, sex allocation accuracies improved by 3% and 2% in males and females, respectively. The RFM pooled variables model performed similarly to the linear model in overall accuracy and female sex allocation, with a trivial 1% decrease in male sex accuracy. LDA and RFM did not, however, perform equally in ancestry estimation. In Supplementary Table S2, the random forest ancestry-only model outperformed LDA in the training sample, correctly classifying 95% versus 65% of individuals into the appropriate ancestral group, whereas in the test sample the overall accuracies were comparable (67% versus 66%).
Table 3. Summary statistics (in mm), coefficient of variation (CV), sexual dimorphism index (SDI) (reported in %) and ANOVA (df = 1) for the three measurements by sex. Refer to Figure 1 for the definitions of AB, BC and AC.
Looking at each group separately in the test sample, the Egyptians were correctly classified 87% (LDA) and 77% (RFM) of the time, while the Indians were correctly classified 79% of the time by both classifiers. Only 39% of the Greeks were correctly classified by LDA, whereas RFM improved the accuracy to 46%. The models including the newly computed variables correctly classified 81%–87% (LDA) versus 77%–87% (RFM) of the Egyptians, 75%–80% (LDA) versus 82%–85% (RFM) of the Indians and 38%–39% (LDA) versus 49%–54% (RFM) of the Greeks. Surprisingly, for the Greek sample, the RFM pooled variables model achieved 15% higher accuracy, outperforming the LDA pooled variables model. Generally, the pooled variables models improved classification accuracy over the linear models for both LDA and RFM (Figure 3). In Supplementary Table S3, the shape variables models yielded overall test accuracies of 65% (LDA) and 62% (RFM), although the Greek sample had the lowest correct classification with both classifiers (31% for LDA versus 35% for RFM).
In Supplementary Table S4, the six-way models classifying sex and ancestry simultaneously correctly classified only 53% (LDA) and 56% (RFM) of the test sample. Although an accuracy of 53%–56% is low, chance in a six-way classification is 16.7% (six groups with equal prior probabilities), so the accuracy is approximately 36–39 percentage points above chance. The best performance by both LDA and RFM was for Egyptian females and Indian males, followed by Indian females and Egyptian males. Greek males had the lowest accuracies, at 23% (LDA) and 33% (RFM). Generally, RFM achieved higher overall and subgroup accuracies, falling slightly below LDA only for Egyptian females and Indian males, which led to more balanced classification accuracies without bias towards particular groups. Classification accuracies were also calculated per group for both classifiers in the test data using the new variables. Both classifiers showed a drop in female allocation accuracy, regardless of ancestral origin, with the angles models and, consequently, the ratio models. The medians models and, to a lesser extent, the pooled variables models performed comparably to or slightly better than the linear variables models with both LDA and RFM (Figure 4).
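The comparison with chance is simple arithmetic: with k groups and equal priors, random assignment is correct 1/k of the time. A small sketch with hypothetical helper names, applied to the six-way accuracies reported above:

```python
def chance_accuracy(n_groups):
    """Expected accuracy of random assignment with equal prior probabilities."""
    return 1.0 / n_groups


def margin_over_chance(accuracy, n_groups):
    """Percentage points by which an accuracy exceeds equal-prior chance."""
    return (accuracy - chance_accuracy(n_groups)) * 100


for label, acc in [("LDA", 0.53), ("RFM", 0.56)]:
    print(label, round(margin_over_chance(acc, 6), 1))
# LDA 36.3
# RFM 39.3
```

Unequal group sizes would change the relevant baseline (for example, always predicting the largest group), so the equal-prior figure is a lower reference point rather than the only possible one.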
Considering ancestry estimation with the sexes analysed separately, Supplementary Tables S5 and S6 show that RFM outperformed LDA in overall classification accuracy as well as in classification rates for Greek males and Indian females, resulting in more consistent classification accuracies among the groups in each sex. In general, higher classification accuracies were achieved in the known-female model than in the known-male model. Although these two models require sex estimation before ancestry estimation, the known-sex model could be useful where remains of one sex are more likely to be found, in closed crime-scene contexts, or where other traits indicate the sex but further assessment of ancestry is needed. Comparing the new variables with the linear variables in the female experiments, the LDA pooled variables model achieved higher classification accuracies than the linear variables model, whereas the RFM medians model performed much like the linear model (Figure 5). In male ancestry estimation, the angles models outperformed the linear variables model for both LDA and RFM. RFM achieved the highest Greek male classification accuracy, 56%, versus 26% with the LDA model (Figure 6). Figure 7 is labelled according to (i) ancestry and (ii) sex and ancestry to illustrate the extent of overlap among the geographic ancestry groups. Figure 7A reveals a substantial overlap between the Indian and Greek groups, explaining why Greek individuals were more prone to misclassification into the Indian group. The Egyptian sample is clearly separated from the two other samples; accordingly, the highest classification rates by both LDA and RFM were obtained for the Egyptian group in the ancestry-only and female ancestry experiments.
Figure 3. Comparison between the different variable models using linear discriminant function analysis (LDA) and random forest modelling (RFM) on test data in the ancestry-only experiment. RFM outperformed LDA in all models. The plot shows an improvement in overall as well as per-group classification accuracy with the RFM pooled variables model. The LDA pooled variables model performs nearly the same as the RFM linear variables model.
In the sex-ancestry cohorts, Figure 7B reveals considerable overlap between Indian and Greek males, explaining why Greek individuals are more likely to be misclassified into the Indian group. This indicates greater overlap among the populations by ancestry than by sex. Therefore, individuals tend to be misclassified into a different ancestry group of the same sex rather than into the opposite sex of the same ancestry group. Egyptian males and Greek females, however, are still more likely to be misclassified into the opposite sex than into a different ancestry group of the same sex. The confusion matrices in Supplementary Figures S1–S5 reveal these patterns. Visualizations of the variable importance measures in RFM for each experiment are included as Supplementary Figures S6–S11.
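The same-sex versus opposite-sex pattern can be read off a sex-by-ancestry confusion matrix by classifying each off-diagonal cell. The sketch below assumes rows are true classes and columns are predicted classes in the same label order. The function names are hypothetical, and the small matrix in the usage example is made-up toy data, not results from this study.

```python
from collections import Counter


def misclassification_types(labels, matrix):
    """Tally off-diagonal cells of a sex-by-ancestry confusion matrix.

    labels: list of (ancestry, sex) tuples, same order for rows (true) and
    columns (predicted); matrix[i][j] counts class i predicted as class j.
    """
    tally = Counter()
    for i, (true_anc, true_sex) in enumerate(labels):
        for j, (pred_anc, pred_sex) in enumerate(labels):
            if i == j:
                continue  # correct classifications sit on the diagonal
            if true_sex == pred_sex:
                kind = "same sex, different ancestry"
            elif true_anc == pred_anc:
                kind = "opposite sex, same ancestry"
            else:
                kind = "opposite sex, different ancestry"
            tally[kind] += matrix[i][j]
    return tally


def per_group_accuracy(labels, matrix):
    """Correct classifications divided by the true-class total for each row."""
    return {lab: matrix[i][i] / sum(matrix[i]) for i, lab in enumerate(labels)}


# Toy 2-ancestry x 2-sex example with made-up counts (illustration only)
labels = [("Egyptian", "F"), ("Egyptian", "M"), ("Greek", "F"), ("Greek", "M")]
matrix = [[8, 1, 2, 0],
          [1, 7, 0, 3],
          [0, 0, 9, 1],
          [0, 2, 1, 5]]
print(misclassification_types(labels, matrix))
print(per_group_accuracy(labels, matrix))
```

Applied to the six-class matrices, a larger "same sex, different ancestry" tally than "opposite sex, same ancestry" tally corresponds to the pattern in which sex differences dominate ancestry differences.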

Discussion
In the present study, contemporary Egyptian population metric data were compiled to construct a virtual skeletal database using 3D volume-rendered computed tomography (3D-CT), thereby overcoming cultural and legal constraints and avoiding both the time-consuming, tedious process of skeletal maceration and the need for physical storage space [59][60][61]. One notable contribution of this work is its emphasis on collaborative research and data sharing, which are invaluable for developing new databases. Such virtual databases are paramount for capturing modern human skeletal morphological variation, particularly for identifying human remains in modern forensic cases. The goal of the current project was therefore to distil any patterns of morphological dissimilarity among these broadly sampled populations.

Sex estimation
In the sex classification task, RFM achieved better sex-specific and overall accuracy than applying the population-specific functions to each population (Table 5). Generally, the overall accuracy of the generalized models is within the range of 81%–87% reported in published population-specific studies [39–42,48]. A relatively strong bias towards male sex allocation was observed in the sex-only model. Because the mean values of the female measurements for BC in the Egyptian, AC in the Greek and AB in the Indian populations were slightly larger than the corresponding means in the pooled data, decreased accuracy in females would be expected: on average, more females are larger in size, increasing the overlap with smaller males. Together with the higher coefficient of variation of the BC measurement (the most important feature) in the pooled female sample (9.56%) than in the pooled male sample (8.41%), this may lead to overestimation of the measurements in a considerable number of females in each population [62].
Brown and colleagues [41] found BC to be the most important variable in the ancestry-pooled sample from the Terry Collection, a finding consistent with our results using RFM. Similarly, Anastopoulou et al. [40], Purkait [39] and Djorojevic et al. [42] achieved higher accuracy using the BC variable than the other variables. All these studies [39][40][41][42] also reported higher classification accuracy when the three variables were employed simultaneously. While the BC dimension reflects upper body weight transmission and muscular development at muscle insertion sites, the AC dimension reflects femoral neck length and the position of the lesser trochanter relative to the femoral head [40]. Indeed, several clinical orthopaedic and bioanthropological studies [30,32] have suggested both sex and population differences in femoral neck axis length, and these differences are thought to relate to the correlation between femoral neck axis length and general femoral strength [32]. Thus, it is not surprising that the AC dimension was selected as the most important variable in ancestry estimation and the second most important variable in sex estimation. Furthermore, this indicates that different size variables operate in the discrimination of sex and ancestry [26].
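Statements such as "BC was the most important variable" come from a feature-importance measure. RFM has its own built-in importance measures, which are not reproduced here. As a swapped-in, self-contained illustration of the same idea, the sketch below uses permutation importance with a simple nearest-centroid classifier (not RFM): shuffle one variable, and the resulting drop in accuracy measures how much the classifier relies on it. The data are synthetic, and all function names are hypothetical. In practice, importance would be computed on held-out data rather than on the training data used here.

```python
import random
import statistics


def fit_centroids(X, y):
    """Nearest-centroid classifier: mean feature vector for each class label."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [statistics.mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean distance)."""
    return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))


def accuracy(centroids, X, y):
    return sum(predict(centroids, x) == lab for x, lab in zip(X, y)) / len(y)


def permutation_importance(centroids, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when a single feature column is randomly shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(centroids, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [x[j] for x in X]
            rng.shuffle(column)
            X_perm = [x[:j] + [v] + x[j + 1:] for x, v in zip(X, column)]
            drops.append(baseline - accuracy(centroids, X_perm, y))
        importances.append(statistics.mean(drops))
    return importances


# Synthetic two-class data: feature 0 differs between classes, feature 1 is pure noise.
rng = random.Random(42)
X, y = [], []
for label, shift in (("F", 0.0), ("M", 6.0)):
    for _ in range(100):
        X.append([rng.gauss(50.0 + shift, 2.0), rng.gauss(50.0, 2.0)])
        y.append(label)

model = fit_centroids(X, y)
importances = permutation_importance(model, X, y)
print(importances)  # feature 0 should clearly dominate
```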

Ancestry estimation
In the ancestry classification tasks, several trends can be observed. First, the Indian group was better separated from the Egyptian and Greek groups in the Sex Pooled and Sex Unknown analyses. Surprisingly, 54% of femora from the Greek sample were classified into the Indian group in the Sex Unknown analysis, as were 50% of Greek males and less than 14% of Greek females in the Sex Known analyses (see the confusion matrices in the electronic supplementary material, Tables S4 and S5). The Egyptian sample showed better overall classification accuracies than the Greek sample for both males and females. On average, Egyptian females and males have a shorter femoral neck than individuals of both sexes in all the other populations studied, a feature that may facilitate their separation.
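The group-wise rates quoted above (for example, the share of Greek femora assigned to the Indian group) are read from the rows of a confusion matrix. The sketch below shows that arithmetic on a hypothetical three-group matrix (the counts are invented, not the study's data). It also shows that the overall accuracy pools all groups and is therefore weighted by group size.

```python
def per_group_accuracy(matrix, labels):
    """Correct-classification rate per true group: diagonal count / row total."""
    return {label: matrix[i][i] / sum(matrix[i]) for i, label in enumerate(labels)}


def overall_accuracy(matrix):
    """Pooled accuracy: all correct assignments / all cases (weighted by group size)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)


def share_misclassified(matrix, labels, true_group, predicted_group):
    """Fraction of one true group that was assigned to another specific group."""
    i, j = labels.index(true_group), labels.index(predicted_group)
    return matrix[i][j] / sum(matrix[i])


labels = ["Egyptian", "Indian", "Greek"]
# Hypothetical counts: rows = true group, columns = predicted group (not the study's data).
cm = [
    [18, 1, 1],
    [2, 15, 3],
    [1, 7, 12],
]

print(per_group_accuracy(cm, labels))  # {'Egyptian': 0.9, 'Indian': 0.75, 'Greek': 0.6}
print(overall_accuracy(cm))  # 0.75
print(share_misclassified(cm, labels, "Greek", "Indian"))  # 0.35
```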
On the other hand, the lower classification rate for Greek males is likely due to the substantial overlap between Indian and Greek males. This overlap is not seen in their female counterparts, where the pattern is reversed, and it reflects the high intrapopulation variation in femoral morphology among Greek males and Indian females (Figure 7B and Tables 3 and 4). Nevertheless, Ousley and associates [63] and Witherspoon et al. [64] noted that, although large within-group variation leads to modest differences between populations, accurate group allocation is still possible.
Anastopoulou et al. [40] presented a schematic comparison of Purkait's triangles of the right femora for each sex in four populations: Greek, Indian, European-American and African-American. They observed that, while the triangle for Greek males does not differ significantly in size and shape from those of the other populations, the triangle for Greek females is considerably different in both respects. This may explain why ancestry estimation was more successful for Greek females than for Greek males.
In our study, the shape variables adequately separated the Egyptians and Indians from the other ancestral groups; however, for the Greeks, classification accuracies fell below those obtained with the linear variables. The shape of the proximal femur was quantified as a vector of ratios: each of the three measurements was divided by a standard size variable (the geometric mean). A standard size variable quantifies the overall size of the bone and scales linearly with the raw measurements [50]. Thus, it preserves the aspect ratio without removing the allometric effects of size variation. Because size is factored out, this method also treats cases with similar shape but different sizes as alike [65][66][67], as reflected in the high misclassification rate of Greeks into the Indian group (69% and 66% for the LDA and RFM models, respectively). Since genetic admixture can be excluded owing to geographical separation, the results may instead reflect the composition of the Athens Collection (Greek), whose individuals came from middle to low socioeconomic classes with variable nutritional status [43,68]. Given that points B and C lie in areas of muscle attachment and upper body weight transmission, differences in the size and shape of this region might reflect social conditions. The difference in the distance between the traction epiphyses (BC) between Indian and Greek individuals (males and females) was non-significant, suggesting that both had similar exposure to strenuous forms of labour [69]. Moreover, the shape variable BCsh was the most important feature in the ancestry only experiment, which may explain the drop in classification accuracy for the Greek sample. Nevertheless, a marked improvement in overall classification rates, with 15% higher accuracy for the Greeks, was observed in the ancestry only experiment using the pooled variables model (which did not include the shape variables).
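The shape variables described above can be sketched as follows: each Purkait distance is divided by the geometric mean of the three distances. We assume that BCsh (and, by analogy, ABsh and ACsh) is computed this way; the function name and the measurements are hypothetical. The example shows why such ratios cannot separate individuals that differ only in size: a triangle scaled up by 20% yields the same shape vector.

```python
import math


def shape_ratios(ab, bc, ac):
    """Divide each side of the triangle by the geometric mean of all three sides."""
    size = math.prod((ab, bc, ac)) ** (1 / 3)  # standard size variable (geometric mean)
    return ab / size, bc / size, ac / size


# Two hypothetical femora (mm): the second is the first scaled up by 20%.
small = shape_ratios(60.0, 50.0, 70.0)
large = shape_ratios(72.0, 60.0, 84.0)
print(small)
print(large)  # same shape vector as `small`, up to rounding
```

By construction, the three ratios multiply to 1, so only two of them carry independent information.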
Second, calculating separate estimates of the ancestral groups for males and females in this study had a positive impact on classification accuracies. This observation is consistent with the recommendation by Komar [70, p. 163] to separate ancestral categories by sex as a means of increasing reliability. Sex accounts for a major proportion of the variation among groups, and ancestry accounts for most of the remainder [71], as shown in Figure 7B. Taking sexual dimorphism into account through Sex Known models for ancestry analyses achieves better allocation accuracies for Greek males. However, these models require sex to be determined before ancestry is estimated, which may not be possible in all cases. The most surprising result is that the similarity between Indian and Greek males diminished with the angles model, owing to the elimination of the size effect and to the differences it revealed in the position of the femoral head relative to the greater and lesser trochanters (see Section "Evaluation of the sexual dimorphism and inter-population comparisons of the reference samples"). Each angle is computed as a ratio using only two sides of the triangle, and this procedure introduces one additional degree of freedom that allows for more variability. Thus, angles enhanced the classification accuracies. On the other hand, in the sex and ancestry experiment with the angles model, Egyptian, Indian and Greek females were misclassified as the opposite sex of the same population rather than as another ancestry group. This might be related to the observation that sex differences in the femoral neck-shaft angle (NSA) are minor and inconsistent [72]. The same pattern was evident in the sex only experiment, where the classification accuracies of females dropped with this model.
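The section describes the angles only in terms of ratios of the triangle's sides and does not give the exact formula. As an illustration, the sketch below assumes the common law-of-cosines route from the three Purkait distances (AB, BC, AC) to the interior angles at A, B and C; this may differ from the authors' exact computation, and the function name is hypothetical. Because angles depend only on the proportions of the sides, scaling a triangle leaves them unchanged, which is consistent with the removal of the size effect noted above.

```python
import math


def triangle_angles(ab, bc, ac):
    """Interior angles in degrees at vertices A, B and C, via the law of cosines."""
    angle_a = math.degrees(math.acos((ab**2 + ac**2 - bc**2) / (2 * ab * ac)))  # between AB and AC
    angle_b = math.degrees(math.acos((ab**2 + bc**2 - ac**2) / (2 * ab * bc)))  # between AB and BC
    angle_c = 180.0 - angle_a - angle_b  # angles of a triangle sum to 180 degrees
    return angle_a, angle_b, angle_c


print(triangle_angles(3.0, 4.0, 5.0))  # angle at B is 90 degrees (AC is the longest side)
print(triangle_angles(6.0, 8.0, 10.0))  # same angles: scaling the triangle leaves them unchanged
```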
In the Ancestry in females experiment, the medians model performed comparably to the linear measurements model, and both achieved the best accuracies in all three populations, possibly because size is a significant discriminatory factor in females.
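The section does not define the "medians model". For illustration only, we assume that it uses the lengths of the triangle's three medians (each running from a vertex to the midpoint of the opposite side). These lengths follow from the side lengths by Apollonius's theorem. Unlike angles, medians scale with the triangle and so retain size information, which would fit the suggestion that size discriminates among females. Both the interpretation and the function name are assumptions, not the authors' code.

```python
import math


def triangle_medians(ab, bc, ac):
    """Median lengths from each vertex to the midpoint of the opposite side (Apollonius's theorem)."""
    m_a = 0.5 * math.sqrt(2 * ab**2 + 2 * ac**2 - bc**2)  # from A to the midpoint of BC
    m_b = 0.5 * math.sqrt(2 * ab**2 + 2 * bc**2 - ac**2)  # from B to the midpoint of AC
    m_c = 0.5 * math.sqrt(2 * ac**2 + 2 * bc**2 - ab**2)  # from C to the midpoint of AB
    return m_a, m_b, m_c


print(triangle_medians(3.0, 4.0, 5.0))  # m_b = 2.5: the median to the hypotenuse is half of it
print(triangle_medians(6.0, 8.0, 10.0))  # every median doubles when the triangle is doubled
```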
Third, our method assesses the ancestry of females more accurately than that of males. In females, group classification accuracy was more balanced across populations and higher than in their male counterparts, with the exception of the Indian males (the source of imbalance in the male group). This observation is consistent with previous ancestry estimation studies using postcranial bones in several settings [10,29]. Using LDA, Holliday and Falsetti [29] achieved 100% accurate ancestry classification in female training samples versus 87% in male training samples. Liebenberg et al. [10] reported higher allocation accuracy for black females (70%) than males (67%) and for coloured females (80%) than males (73%), whereas white South African males and females were classified equally well (93%).

Performance of LDA versus RFM
LDA accuracy was comparable to that of RFM in the binary classification of sex, as male and female groups are nearly linearly separable using a combination of the three explanatory variables [73]. Notwithstanding, RFM performed better than LDA in ancestry estimation (multi-group classification), for which LDA has long been considered the standard classification statistic [3]. In the present work, RFM showed lower classification bias than LDA for the difficult groups, namely Greek males and Indian females, across the different variable sets (Figures 2-6). This reliable predictive performance may be attributed to RFM's efficient estimation of feature importance and test error. Accuracies for the test sample were lower than for the training sample; however, as evidenced by Figures 2-5, this reflects the coarse resolution of accuracy estimates from small validation samples, which suffer from larger variance [38], rather than an error in the training procedure (i.e. overfitting). Because the three populations were not equally represented, this imbalance may have affected the classification accuracies. A larger validation sample would improve the assessment of classification performance [38]. Moreover, in multi-group classification, some groups often show higher classification rates than others, and some groups tend to be misclassified into the specific groups they most resemble, so maximizing correct classification rates may be challenging [74].
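The point that small validation samples yield coarse accuracy estimates can be made concrete with the binomial standard error of a proportion, SE = sqrt(p(1 - p)/n). The sketch below uses a hypothetical 80% accuracy and hypothetical test-set sizes. It applies the simple Wald interval, which is only a rough approximation for small n or for accuracies near 0 or 1 (a Wilson interval is often preferred). It is not a procedure reported by the authors.

```python
import math


def accuracy_standard_error(p, n):
    """Binomial standard error of an accuracy p estimated from n independent test cases."""
    return math.sqrt(p * (1 - p) / n)


def wald_interval(p, n, z=1.96):
    """Approximate 95% interval (Wald); crude for small n or p near 0 or 1."""
    se = accuracy_standard_error(p, n)
    return max(0.0, p - z * se), min(1.0, p + z * se)


# The same hypothetical 80% accuracy observed on test sets of different sizes.
for n in (25, 100, 400):
    low, high = wald_interval(0.80, n)
    print(f"n = {n:3d}: SE = {accuracy_standard_error(0.80, n):.3f}, 95% CI = ({low:.2f}, {high:.2f})")
```

Quadrupling the test set halves the standard error, so an 80% estimate from 25 cases is compatible with a much wider range of true accuracies than the same estimate from 400 cases.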
The results also showed that none of the new variables was significant for sex assessment. This agrees with a previous Spanish study in which the accuracy of sex estimation did not increase using the angles, areas and ratios between these variables [42]. The pooled variables model did not improve accuracy over the traditional linear inter-landmark distances. Interestingly, with the pooled variables, LDA approached the accuracy of RFM. This indicates that RFM learns from the complexity of the underlying pattern in the training data, and that the influence of a predictor variable corresponds directly to its discriminatory power, its interactions with other variables, and its position in the trees of the ensemble [37,53,75].
While no previous studies have explored the utility of population differences in Purkait's triangle of the proximal femur for ancestry estimation, a handful have examined shape and size differences in raw femoral measurements and ratios for this purpose [31][32][33][34][35][76]. In 2015, Shirley et al. [76] explored shape differences between African and European Americans using LDA and principal component analysis (PCA). They found that the stepwise procedure selected the lesser trochanter-head centre distance among the top 10 significant variables for discriminating between the two groups using the femur and/or tibia. This variable is readily comparable to the AC measurement in Purkait's triangle, but the AC measurement does not require complicated procedures to determine the virtual head centre. Moreover, the PCA demonstrated that the greater trochanter region, one of the components of Purkait's triangle, is the most significant site of difference between the two groups.
The results of ancestry estimation in previous studies vary with the type and number of variables employed, the skeletal elements available and the population studied. For example, femoral neck axis length (FNAL) alone achieved 46.2% accuracy in a six-way analysis of sex and ancestry in European, African and Native American individuals, with complete failure to allocate African American females. Male-specific three- and two-way analyses provided overall classification accuracies of 57.1% and 53.2%, respectively [32]. These classification accuracies are lower than those achieved by our models.
Furthermore, the findings herein are comparable to, or better than, the overall accuracy of 60% reported by Liebenberg et al. [10] in a three-way analysis of ancestry regardless of sex in the South African population, and the 63% reported by Spradley [24] in a male-specific ancestry analysis of North American European and African versus Hispanic individuals. The latter analysis used seven femoral variables: maximum length, epicondylar breadth, maximum head diameter, anterior-posterior subtrochanteric diameter, transverse subtrochanteric diameter, anterior-posterior diameter at midshaft and circumference at midshaft.
Using multiple postcranial bones in LDA, better overall results were obtained, with accuracies of 79% and 85% in the North American [24] and South African [10] populations, respectively. Dibennardo and Taylor [26] used a combination of 15 pelvic and femoral measurements from White and Black Americans in the Terry Collection and achieved a high predictive accuracy of 95% for sex and ancestry simultaneously. These measurements capture the femoral morphological differences between African and European Americans as well as the differences in the proportion of lower limb length to torso length between the two groups, which relate to ecogeographical patterns in body form [29]. Notably, the inclusion of only two variables (i.e. maximum femoral length and iliac height) yielded an overall accuracy of 87% for ancestry prediction regardless of sex. Similarly, Holliday and Falsetti [29] collected the maximum length of the femur in addition to other long bone lengths, trunk height and bi-iliac breadth, and then devised a sex-specific LDA. The training sample consisted of African Americans and European Americans from the Terry Collection and was tested on a modern sample of forensic anthropology cases. They achieved an overall cross-validation classification rate of 93.5% for American Black and White males and females; however, the test sample did not perform well, which was attributed to secular change. FORDISC 3.0 provided an overall classification rate of 92.2% for modern American Black and White individuals using all available postcranial bone lengths or heights [24,36]. However, a complete skeleton in a condition good enough for these measurements is rarely encountered in real casework due to taphonomic factors [32].
Whilst the findings of the aforementioned studies cannot be directly compared with the results of this study, it is clear that sex and ancestry differences in the morphology of the posterior aspect of the femur can be discerned. Given the previously reported population differences in proximal femoral morphology, it should come as no surprise that the rates of classification into the appropriate population groups exceed chance and are solidly supported by the machine learning classifier (i.e. RFM) employed in the current investigation. This patterning is thought to result, in part, from genetic drift as well as natural selection and adaptation to highly varied microenvironments in worldwide populations [24,63,77].
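One way to check that a classification rate "exceeds chance" is to compare it with a chance benchmark and apply a one-sided binomial test. The sketch below uses the proportional chance criterion (the sum of squared group proportions), a benchmark often used in discriminant analysis, because the groups here are unequal in size. The group sizes and the number of correct assignments are hypothetical. The test also assumes that the test cases are independent.

```python
import math


def proportional_chance(group_sizes):
    """Proportional chance criterion: sum of squared group proportions."""
    total = sum(group_sizes)
    return sum((g / total) ** 2 for g in group_sizes)


def binomial_upper_tail(k, n, p0):
    """P(X >= k) for X ~ Binomial(n, p0): one-sided test that accuracy exceeds chance p0."""
    return sum(math.comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))


# Hypothetical three-group test set (not the study's sizes): 20, 25 and 15 femora.
sizes = [20, 25, 15]
p_chance = proportional_chance(sizes)
# Suppose 45 of the 60 test femora were assigned to the correct group.
p_value = binomial_upper_tail(45, sum(sizes), p_chance)
print(f"chance = {p_chance:.3f}, observed = {45 / 60:.2f}, one-sided p = {p_value:.2e}")
```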
The Purkait's triangle method addresses several issues encountered in ancestry estimation: the measurements are typically reproducible and relatively simple, and only a few variables need to be collected. Purkait's triangle simultaneously captures an approximation of femoral neck length and neck angle. There is a paucity of research on the femoral NSA because of problems and uncertainties regarding the best method for determining it. Moreover, the complex anatomy of the proximal femur makes measurement difficult, including the lack of reliable landmarks and the irregular neck contours, which make the definition of the neck axis ambiguous [48,78]. These findings support the value of the triangle as a simple, flexible and rapid technique for ancestry estimation, as opposed to more complicated geometric morphometric methods, because the measurements can be collected directly from dry skeletal elements using callipers and do not require radiological imaging facilities [76].
The method is of particular utility in forensic scenarios in which partial remains are recovered with an intact proximal femur. On land, postcranial bones may be less affected by animal scavenging and other post-mortem factors [48,79], and the proximal end of the femur is more likely to be recovered in various terrestrial scenarios [32,48,57]. On the other hand, the recovery and identification of human remains from the sea are complicated by poorly understood marine taphonomic effects [18]. In the sea, floating weakens the soft tissues connecting the major joints, and current and successive wave action lead to disarticulation of the appendages from distal to proximal, first in the upper limbs and then in the lower limbs. In the lower limbs, loss of the feet at the ankle joint is followed by disarticulation at the knee joint, while the hip joint usually remains connected to the trunk. The disarticulation pattern of the cranium and mandible parallels that of the upper limb: the mandible is lost with the hands, and the cranium disarticulates with the forearms. Sunken remains show more severe scavenging and skeletonization than floating remains [18]. Therefore, proper recovery and analysis of all available human remains are paramount to positive identification [80].
The comprehensive morphological information presented in this paper may be useful when the cranium is unavailable for examination or when supplementary information is needed. Moreover, a multiple-element approach offers the potential to combine probabilities and likelihoods, which should strengthen the identification effort and help achieve the levels of reliability required for forensic applications. Thus, the proposed method might be considered for use in forensic cases involving these three populations, while exercising caution with Greek males. The difficulty in allocating Greek (European) males supports the need for a more refined level of ancestry estimation to fully account for human variation at the population level [77]. It must be emphasized, though, that the samples may not be representative of the whole of Greece and India, which creates the need for larger validation studies.
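The idea of combining evidence from multiple skeletal elements can be sketched with Bayes' rule. Each element contributes a likelihood for each group, these are multiplied with the prior probabilities, and the result is normalized. The sketch below rests on a strong, loudly stated assumption: the elements are conditionally independent given group membership (a naive-Bayes simplification). Bones from the same skeleton are usually correlated, so in practice this would overstate certainty. The likelihood values and the second element are hypothetical, and the function name is ours, not the authors'.

```python
def combine_elements(priors, element_likelihoods):
    """Posterior P(group | all elements), ASSUMING elements are conditionally independent
    given group membership: prior x product of per-element likelihoods, then normalized."""
    unnormalized = {}
    for group, prior in priors.items():
        value = prior
        for likelihoods in element_likelihoods:
            value *= likelihoods[group]
        unnormalized[group] = value
    total = sum(unnormalized.values())
    return {group: value / total for group, value in unnormalized.items()}


priors = {"Egyptian": 1 / 3, "Indian": 1 / 3, "Greek": 1 / 3}
# Hypothetical likelihoods from two skeletal elements of the same individual.
femur = {"Egyptian": 0.2, "Indian": 0.5, "Greek": 0.3}
other_element = {"Egyptian": 0.1, "Indian": 0.3, "Greek": 0.6}

print(combine_elements(priors, [femur]))  # femur alone favours "Indian"
print(combine_elements(priors, [femur, other_element]))  # combined evidence favours "Greek"
```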
Notably, Greek individuals were accurately separated from Egyptians, which suggests that the method may be applicable to tracking migrant deaths along Mediterranean routes. Further, migration in the Arab region is multifaceted and varies in scope between countries. Conflicts in the region have led to over 16 million refugees, mainly in Iraq, Libya, Somalia, the Sudan, the Syrian Arab Republic and Yemen, and since 2011 Syrian refugees have represented one-third of them [14]. Large numbers of migrants transit through the Arab region and across the Mediterranean [81]. The diverse and complex dynamics of migration, refugees and displacement in the Middle East and North Africa (MENA) region pose legal and social challenges that must be addressed to establish peace and justice in the region.
Finally, the extent to which overall body mass or size underpins intrapopulation variation in the shape of the proximal femur, as opposed to inter-population differences, remains to be established. Future studies should therefore give more attention to populations from different backgrounds, such as Arab migrants and other nationalities, to assess the general usability of the method for more accurate and precise ancestry estimates.

Conclusion
The use of standardized procedures by forensic scientists and investigative agencies involved in investigating and managing the deaths of migrants is paramount to ensure the dignified management of these tragic incidents. The results presented here are the first comprehensive analysis of variation in the morphology of the proximal femur as captured by Purkait's measurements. The core elements of the proposed methodology include the analysis of geographically distant groups for an insightful study of the diversification of modern humans, the application of robust modelling techniques to reveal subtle differences within the pooled sample, and the demonstration of differences between the most important variables selected for each classification task. However, the distinctions among geographically distant populations may be blurred by significant intra-population variation, which may adversely affect correct allocation into the appropriate ancestry groups. Undetected and/or undocumented intra- and inter-group variation beyond the historical three-group model has a significant bearing on forensic anthropological casework.

Authors' contributions
MennattAllah Hassan Attia conceived the study, participated in its design and coordination, and drafted the manuscript. Yasmin Tarek Farghaly carried out the Egyptian data collection and DICOM file processing, participated in the study design and helped to draft the manuscript. Mohamed Hassan Attia and Bassam Ahmed El-Sayed Abulnoor participated in the study design and performed the statistical analysis in the Python and R programming languages, respectively. Sotiris K. Manolis and Ruma Purkait carried out the Greek and Indian data collection, respectively. They provided resources, supervised the writing of the first draft of the manuscript, and participated in reviewing and editing the manuscript. Douglas H. Ubelaker provided resources, supervised the writing of the first draft of the manuscript, and participated in reviewing and editing the manuscript. All authors contributed to the final text and approved it.