Testing Todd and Matching Murdock: Global Data on Historical Family Characteristics

ABSTRACT This paper investigates the possibilities for creating a global dataset on family and household characteristics. This is done by scrutinizing and comparing two prominent data sources on family system classifications. We first focus on historical data, comparing Emmanuel Todd's classification of countries by family system with ethnographic data compiled in George Murdock's Ethnographic Atlas. Qualitative and quantitative tests show that the two datasets frequently agree about family traits. Nonetheless, substantial differences exist that are mostly attributable to the datasets' focus on different regions and to the difficulty of translating local, descriptive studies into hard data. We therefore stress the importance of knowing the strengths and weaknesses of the two datasets and emphasize that robustness checks are necessary in empirical research into family characteristics. We also compare these historical data with present-day data. This comparison suggests that family characteristics and the values associated with them can persist over long periods.


Auke Rijpma, Utrecht University, Drift 6, 3512BK Utrecht, The Netherlands. Email: a.rijpma@uu.nl Sarah G. Carmichael, Utrecht University, Drift 6, 3512BK Utrecht, The Netherlands. Email: s.g.carmichael@uu.nl We would like to thank Tine De Moor, Jan Kok and Jan Luiten van Zanden for their comments on previous versions of this paper. Additionally we are grateful to Jaco Zuijderduijn, Bastian Mönkediek, Pim de Zwart, Paul Rotering, Benjamin Guilbert, Christiaan van Bochove, Oscar Gelderblom, Maarten Prak, Selin Dilli, Lotte van der Vleuten, and Kati Buzási for their comments during workshops in April and June 2012. We also thank participants at the WEHC 2012 session 'Marriage patterns, agency in households, and economic growth' and three anonymous referees. Last but not least we thank Jutta Bolt for the underlying data, her patience, and for always responding to our numerous email questions.

INTRODUCTION
The institution of the family is a fundamental building block of society. Families provide the setting in which children learn about power relations and equality, which are in turn important for the formation of adult beliefs (Dolan 1995; Mitterauer & Sieder 1982). As such, they play an important role in socialization, education, and the instilling of values which are key to the way societies function. The way families are organized differs around the world and has important consequences for the education of children, the rights of women, the level of freedom or agency of an individual, and also for economic development.
A number of authors have already explored these themes. Theoretical and empirical research into intra-household bargaining highlights the importance of the division of power and resources within households (Agarwal 1997; Schultz 2001). In work on the link between family organization and social and economic outcomes, Tim Dyson and Mick Moore (1983) found differences between the Southern and Northern states of India in terms of female autonomy and demographic behaviour. They ascribed the superior performance of the Southern states in both aspects to kinship structure: spousal choice preferences, control over female sexuality, kinship reckoning, and inheritance practices.
Likewise, Branisa, Klasen and Ziegler (2013) use data on social institutions in non-OECD countries to measure gender inequality, with a prominent role for family codes. In cross-country analysis they find that gender inequality is associated with lower female school enrolment and with higher fertility and child mortality. Looking at an Indonesian family life survey from 2000, Rammohan and Johar (2009) find that kinship norms matter for female autonomy. Specifically, they find post-marital residence near the parents of the bride (uxorilocality) to be associated with greater autonomy for women. In a similar vein, Olmsted (2005) argues that the strong family obligations in the Arab world create care regimes that constrain women's options.
As for more general economic and social outcomes in developed regions, Duranton, Rodriguez-Pose and Sandall (2009) find for Europe that family systems purported to date back to the Middle Ages still affect a wide range of social and economic outcomes. Similarly, using a cross-national world-system approach, Kick et al. (2000) find that family characteristics are a vital, if somewhat unpredictable, contributor to economic development. David Reher (1998) shows, in a paper that ends with a plea to policymakers to take the family system context into consideration, that there is a persistent contrast between Southern and Northern Europe when it comes to social organization and elderly care. He attributes this to long-term differences between the two regions in the importance they give to family ties, with the North stressing the importance of the individual, while the South gives priority to the family group.
It seems, therefore, that the way families organize themselves is important both for general development outcomes and more specifically for the position of women at home and within the wider society. However, in order to test global level hypotheses about how family types affect any number of different outcomes (female empowerment, human capital formation, political systems to name but a few), global data on family systems is needed. Moreover, identifying which variables are important in distinguishing family systems from one another and how they interrelate remains a challenge.
This leads to a fundamental question: what is a family system? Mason (2001, 160-161) defines family systems as "a set of beliefs and norms, common practices, and associated sanctions through which kinship and the rights and obligations of particular kin relationships are defined. Family systems typically define what it means to be related by blood, or descent, and by marriage; who should live with whom at which stages of the life course; the social, sexual, and economic rights and obligations of individuals occupying different kin positions in relation to each other; and the division of labour among kin-related individuals."
Besides identifying relevant aspects of a family system, this definition also highlights the fact that we are talking about systems, implying that what is being analysed is a series of variables working together in some combination to form a whole. It is important to note that her definition refers to beliefs and norms. Norms and beliefs are typically measured by surveys (e.g., the World Values Surveys). They are not the sort of information one can extract directly from historical data. Therefore we use proxies from the historical record which provide insight into the rights and obligations of individuals within a given family setting.
There are two scholars who have attempted to create world-scale historical classifications of family systems: Emmanuel Todd and Göran Therborn. Therborn's (2004) work, although based on an impressive number of case studies and regional analyses, does not provide a systematic framework for family systems. Rather, he uses relatively loose categories which are basically the major geographic regions of the world. The Therborn classification therefore does not lend itself to being transformed into a country-level dataset. Todd (1985, 1987), on the other hand, provides strict categories into which he divides all countries of the world on the basis of a number of indicators, combinations of which make up a family system. At the time, his work attracted criticism from historians, anthropologists, and sociologists alike for its far-reaching generalizations and claims. Todd also makes some sweeping simplifications, for example lumping much of Africa together into one system classification, and at times he gives only scant attention to the evidence underlying his classification. On the other hand, many of the reviewers also suggest that his ideas deserve to be further tested (Kiernan 1990; Kertzer 1988; Greenhalgh 1987; Roseberry 1990). Todd's use of strict categories, classification of macro-regions, and claims of deep historical roots are also at odds with some of the findings in historical demography (see Szołtysek 2012 for an overview). Overall, Todd's work is attractive for its global scope and for its marriage of historical sources with categories that translate easily into cross-national variables.
The purpose of this paper is to examine where we can improve upon existing global family system models and where data issues remain. We do this by taking up the gauntlet laid down by Todd's critics. We test his classification of family systems, the only one of global scope, against ethnographic data to see if we observe the same patterns of indicator variables, both in terms of combinations of family system indicators and in the geographical patterns of the underlying family characteristics which Todd puts forward.
The motivation for this comes partly from the surge of interest in incorporating culture into economic models that has accompanied the development of New Institutional Economics (Guiso et al. 2006). Providing cross-country datasets on family practices allows for the further development and refinement of country-level comparative analysis. Country-level data also makes it possible to test theories on the impact of family practices by linking them to other historically available data. Moreover, regressions that use family system classifications require high-quality data and wide country coverage.
The central research question is whether Murdock's Ethnographic Atlas corroborates Todd's classifications. We also ask whether similar family systems appear in these two sources and in more recent data, such as the OECD's Gender, Institutions and Development Database (GID-DB; Jütting et al. 2008; OECD 2009), the censuses available through IPUMS (Minnesota Population Center 2013) and the data of the World Values Survey.
In order to do this we make use of Jutta Bolt's (2010, 2012) work with George Murdock's (1969) Ethnographic Atlas, which she updated and turned into country-level variables using ethnic population estimates based on the Atlas Narodov Mira (Bruk & Apenčenko 1964). Murdock's global ethnographic data has become increasingly popular amongst economists and economic historians (e.g., Fenske 2013; Michalopoulos & Papaioannou 2013, 2014; Osafo-Kwaako & Robinson 2013). It is especially, though not exclusively, used in African economic history as a source for pre-colonial data. For example, Wantchekon (2011), Bolt (2010), Henderson and Whatley (2014), Alsan (2015), and Besley and Reynal-Querol (2014) rely on it for their research on Africa, while others have used it to investigate fertility and female labour force participation, linking these to traditions stemming from historical plough use (Alesina, Giuliano and Nunn 2013). 2 Moreover, Todd himself used Murdock's atlas to analyse the origins of domestic organization (nuclear versus community households), though not as a test of the soundness of his observations (Sagart and Todd 1992). Although Murdock's data has become popular, its reliability is rarely questioned. By providing an in-depth analysis of his observations on family organization (characteristics which should be relatively straightforward to observe), we put this important dataset to the test as well. 3 After a discussion of Todd's family systems and how comparable variables can be constructed from Murdock's data, we move to a variety of tests. These show a decent, if imperfect, correspondence between the two datasets. We finish with suggestions on using the two datasets. In light of their imperfect match, we emphasize the importance of playing to their relative strengths and present a hybrid dataset that can do just that. This is then used to check the persistence of family values by comparing it with present-day data on family practices from the OECD's Gender, Institutions and Development database (used to construct the Social Institutions and Gender Index or SIGI), census data from IPUMS (c. 2000), data on consanguineous marriage (Bittles 1994), and the World Values Survey (2014).
2 Similar data by Murdock (1959) on Africa has also seen frequent use (e.g., Whatley & Gillezeau 2014; Besley & Reynal-Querol 2014).

HISTORICAL DATA
Emmanuel Todd has written extensively on family systems. Here we choose to focus on the two books in which he provides a family system classification scheme on a world scale. Both works, Explanation of Ideology (1985) and Causes of Progress (1987), use family systems to explain larger societal phenomena. Explanation of Ideology is intended to explain the global development of political systems based on the underlying values ingrained in individuals from an early age through family systems. In Causes of Progress he claims that the more power women have in a family, the more educated the next generation will be (cf. Schultz 2002). In short, Todd describes family traits that are hypothesized to be linked to key developments in the economic and social history of the nineteenth and twentieth centuries. For the purposes of this paper we will focus on Explanation of Ideology, which contains essentially the same classification as Causes of Progress. Todd defines family systems as a combination of three elements: the practice of living in complex or nuclear households, whether cousin marriage is practised, and whether there is partible inheritance between brothers. 4 He started with 1960s and 1970s censuses and went to the historical record from there to arrive at data that was meant to capture preindustrial yet persistent family characteristics. Although Todd generally uses a macro-level approach, he provides considerable detail for a number of countries in Europe. Moreover, Todd is one of the few authors providing global coverage.
As mentioned in the introduction, Todd's work has attracted criticism. Research in historical demography tends to emphasize local diversity and heterogeneity, which runs counter to Todd's classification in macro-regions. For example, Szołtysek et al. (2014) show considerable variation in household complexity (measured by the average number of married couples) in 1884 Germany. Barbagli and Kertzer (1990, 374) also discuss great local family diversity in Italy (see also Viazzo 2003). Household structure in nineteenth-century Russia, though overwhelmingly multigenerational, varied between regions depending on the economic activity in each region (Dennison 2011; Polla 2006). While Todd acknowledges a North/Centre/South difference in Italy and sees some diversity in the west of Germany, his classification cannot capture this level of detail. Such local diversity also casts doubt on the great swathes of land categorized as a single family type outside Europe (see Todd 2011 for a more detailed account of family systems outside Europe).
3 Family organization was one facet of a society that ethnographers were trained to observe and describe.
4 Todd also makes a distinction between cross-cousin and parallel-cousin marriage. Because this difference by and large only applies within India, we leave this out of our country-level exercise.
Another finding from historical demography that disagrees with Todd's scheme concerns his claims of century-long stability in family systems. For example, Collomp (1988, 72-75) documents a shift from stem to nuclear households in Provence in the eighteenth and nineteenth centuries. Ruggles (2010) argues that the prevalence of stem households can be explained by taking into account the demographic structure and the share of agriculture in employment. This suggests that family systems change as economies develop. However, Ruggles also found evidence of an aversion to joint households in Northwest Europe and North America.
All in all, however, we should expect a degree of path-dependency in family systems. People learn about family behaviour from their parents and institutions tend to be path dependent, with existing rules being preferred over innovations (Kok 2014). At the same time, change in family systems is expected as societies go through momentous changes such as industrialization and the demographic transition (Harrell 1997). Below we present evidence that while the observable characteristics of families might change, the values and expectations surrounding the family show more persistence. Nonetheless, it is important to note that family systems should not be assumed to be immutable. Todd's global scope comes at a cost and it is important to check his categorization against more detailed data.
To check the classification of Todd, an independent source of information on household (family) systems is needed. The ethnographic information on many societies for the period 1820-1960 contained in Murdock's (1969) Ethnographic Atlas can provide comparable data. The atlas initially appeared as a regular feature of the journal Ethnology from 1962 to 1980. In 1967 the existing data was compiled into a book. 5 One of the most important reasons for producing data this way and on this scale was to facilitate comparative research, particularly of a cross-cultural nature. Nunn and others claim that Murdock's data is historical, even pre-colonial (e.g., Nunn 2008, 165; Nunn & Wantchekon 2011, 3222, 3236, 3237). As half the observations pre-date 1920 and a quarter pre-date 1890 (see Figure 4), there is some truth to this, though it is important to note that many of the observations in Murdock are relatively recent. Our approach of comparing Todd and Murdock relies on both sources trying to capture pre-industrial, rural family characteristics. Mismatches may therefore be attributable to differences in the period each source focuses on. We will at first assume that Murdock's data captures pre-industrial conditions (as Todd claims his data does) and will later consider the effect of loosening this assumption.

CONSTRUCTION OF VARIABLES
Todd's data is largely defined at the country level, though he reports regional differences for a number of European countries (e.g., France, Italy, Spain, and the Netherlands). Murdock's observations, on the other hand, are all at the level of ethnic groups. In some countries, especially in Africa, this means that there are multiple observations per country, one for each ethnic group. To make the data comparable, both datasets need to be at the same level of observation.
The practices for the 1267 societies tabulated by Murdock were assigned to present-day populations using the ethnic population figures in the 1964 Atlas Narodov Mira (Bruk & Apenčenko 1964; cf. Weidmann, Rød & Cederman 2010). Bolt (2012) did this by adding up, for each variable, the population shares of the ethnic groups within a country that are characterized by the same trait. This yields the share of the population characterized by a given trait, in our case practising some form of family organization. 6 If a family trait was practised by more than 50% of the population covered in Murdock for a given country, and if the total coverage of ethnic groups for that country was more than 10%, we coded that family trait as present. If coverage was lower than 10%, the observation was set to missing. The dominance of one ethnic group in most countries meant that a coverage threshold of 10% included mostly countries with extensive coverage (Figure 1).
For most Eurasian and American countries, the populations were ethnically fairly homogeneous. 7 However, in some countries, especially in sub-Saharan Africa, the high number of different ethnic groups within the borders of modern-day nation states (Easterly & Levine 1997) meant that this procedure did not always give clear-cut results. For example, in Cameroon, asymmetrical and symmetrical inheritance were each practised by groups making up around 40% of the population. Such countries were coded as having neither trait present. There are only four countries in the dataset with this problem (Kenya, Cameroon, Niger and South Africa), which together represent 1.4% of the global population covered. In countries representing a further 2.2% of the world population (Senegal, Angola, Ghana, New Zealand, Mexico, Guinea and Qatar), a large ethnic group is coded differently from the majority. Because the majority group still accounted for more than half of the covered population, its practices were nonetheless coded as unambiguously present. As discussed above, Todd's data is a generalization from local diversity in family practices. Since our purpose is to create a country-level dataset and the comparison can only be performed at the highest level of aggregation provided by Todd and Murdock, we must unavoidably abstract from such heterogeneity. It is, however, important to note that any discrepancies found between the two datasets might be attributable to differences within countries that our comparison cannot capture.
6 A similar approach was employed by Jütting et al. (2008, 68) in the construction of their Gender, Institutions and Development Database (GID-DB); they take into account the share of the population adhering to certain social institutions when coding their ordinal variables.
7 This is not to say that differences do not exist within countries, but rather that most countries in Eurasia and the Americas tend to have been coded either as one ethnic group that makes up a majority of the population, or as several groups classified in the same way.
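The country-level coding rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Bolt's (2012) actual procedure or code: the `EthnicGroup` record, the function name `code_country`, and the toy populations are all hypothetical, and every ethnic group in a country is assumed to be listed, with `trait=None` marking groups not covered by Murdock.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

MAJORITY = 0.5  # a trait must be practised by > 50% of the covered population
COVERAGE = 0.1  # covered groups must make up > 10% of the country's population


@dataclass
class EthnicGroup:
    """Hypothetical record for one ethnic group within a country."""
    name: str
    population: float
    trait: Optional[str]  # Murdock-based trait, or None if the group is not covered


def code_country(groups: List[EthnicGroup]) -> Optional[Dict[str, bool]]:
    """Code each observed trait as present (True) or absent (False) for one country.

    Returns None (missing) when the covered groups make up too small a share
    of the country's population.
    """
    total = sum(g.population for g in groups)
    covered = [g for g in groups if g.trait is not None]
    covered_pop = sum(g.population for g in covered)
    if total <= 0 or covered_pop / total <= COVERAGE:
        return None  # insufficient coverage: observation set to missing

    # Add up the population shares of covered groups that share the same trait.
    shares: Dict[str, float] = {}
    for g in covered:
        shares[g.trait] = shares.get(g.trait, 0.0) + g.population / covered_pop

    # A trait is present only if it covers a majority of the covered population,
    # so a split such as 40% / 40% / 20% leaves every trait absent.
    return {trait: share > MAJORITY for trait, share in shares.items()}
```

For a hypothetical Cameroon-like split, `code_country([EthnicGroup("A", 40, "symmetric"), EthnicGroup("B", 40, "asymmetric"), EthnicGroup("C", 20, "other")])` marks all three traits as absent, whereas a country in which 95% of the population belongs to uncovered groups returns `None`.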
The coding of variables in Murdock's Atlas is far more detailed than Todd's classification of family systems. 8 Therefore, the first step before comparing Murdock's and Todd's data was to reclassify Murdock's variables so that they matched those of Todd. This section presents the reconstruction for each of Todd's variables and the underlying arguments.
8 For example, variable 23 in Murdock is 'cousin marriage allowed', which is broken down into 13 different types of cousin marriage. In addition to variable 23, variable 25 details the presence of preferred cousin marriage, which in turn is split into 15 different categories. Todd, on the other hand, mentions only four types of cousin marriage: obligatory exogamy, endogamy, asymmetric endogamy and indifference.
In Explanation of Ideology, Todd makes his breakdown of family systems based on three variables that he thinks determine values on liberty and equality: endogamy, co-residence, and inheritance. We have tried to define these directly in terms of Murdock variables in the following manner.
Liberty is measured through a combination of choice of marriage partner (whether marriage partners are pre-determined by custom such as consanguinity, chosen by parents, or chosen by the couple-to-be) and where married couples live after marriage (co-residence of all married sons with their parents, neolocal residence, or the stem family, in which one child remains at home after marriage as successor). For the first aspect, cousin marriage (or endogamy), the Ethnographic Atlas includes a number of variables. These are variable 23: Cousin marriage (Allowed); variable 24: Subtypes of Cousin Marriage; variable 25: Preferred rather than just Permitted Cousin Marriages; and variable 26: Subtypes of Cousin Marriages (Preferred rather than just permitted).
To capture endogamy and exogamy we only used variable 25 (and therefore, indirectly, variable 23) to construct categories that match those of Todd (Table 1). Even though Todd speaks of permitted cousin marriage in his tables, his text argues more for the interpretation of preferred cousin marriage. This also solves another problem. Many societies in the Ethnographic Atlas are said to nominally allow cousin marriage (e.g., New Englanders, Dutch). Though cousin marriage is indeed not legally forbidden in these societies, they rarely practise it and even condemn it (Goody 1983). It makes sense to classify these as exogamous societies. Conversely, societies likely to practise endogamy (such as Islamic societies) appear to have been characterized in Murdock as having a preference for cousin marriage rather than merely permitting it. This choice does, however, make it difficult to identify societies that were indifferent to the issue of cousin marriage, which Todd ascribes to the anomic family system. 9 Arguably, societies where exogamy was not obligatory but cousin marriage was not preferred either can be viewed as indifferent. However, since Murdock used the strict legality of cousin marriage as his measure of whether it was permitted, this would place countries like the Netherlands, the USA, Portugal, and Britain in the indifferent category.
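The cousin-marriage decision described above can be expressed as a small recoding function. This is a simplified sketch, not the mapping in Table 1: the three status labels below are hypothetical stand-ins for Murdock's much more detailed codes in variables 23 and 25, and the function name is illustrative.

```python
from typing import Optional

# Hypothetical, simplified labels; these are NOT the Ethnographic Atlas codes.
FORBIDDEN = "forbidden"  # cousin marriage not allowed
PERMITTED = "permitted"  # allowed (possibly only nominally), but not preferred
PREFERRED = "preferred"  # cousin marriage preferred, not merely permitted


def recode_cousin_marriage(status: Optional[str]) -> Optional[str]:
    """Map a simplified cousin-marriage status onto a Todd-style category.

    Only a preference for cousin marriage counts as endogamy; societies that
    merely permit it are treated as exogamous. Todd's 'indifferent' category
    cannot be identified this way, so it is never returned.
    """
    if status is None:
        return None  # no observation for this society
    if status == PREFERRED:
        return "endogamous"
    if status in (FORBIDDEN, PERMITTED):
        return "exogamous"
    raise ValueError(f"unknown cousin-marriage status: {status!r}")
```

Under this rule a society that nominally allows but rarely practises cousin marriage, such as the New Englanders or the Dutch mentioned above, ends up coded as exogamous rather than indifferent.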
Todd considers intergenerational co-residence to be an important family characteristic determining liberty, arguing that permanent residence with older generations diminishes the freedom of younger generations within the household. For this we turned to variable 8 in Murdock: Domestic organization. The translation of this to Todd's categories is described in Table 2.
The last variable in Table 2, polygamy, is not strictly speaking part of Todd's liberty classification scheme, nor does it fall neatly into a category of co-residence. Polygamy may, however, be useful for introducing greater nuance to the category of the African family system. It is one of the attributes Todd ascribes to Africa, but he does not go into great detail on its prevalence or on how it combines with other family traits. He merely notes that polygamy was frequently practised in sub-Saharan Africa and that this means the other family traits were not as readily defined as elsewhere in the world (Todd 1985). We have therefore included polygamy below to strengthen our analysis of African family systems.
The final variable in the Explanation of Ideology scheme is inheritance. In Todd's structure, symmetric (partible) and asymmetric (impartible) inheritance determine whether individuals are seen as equal or not. He divides inheritance practices into three categories: symmetry and asymmetry between brothers, as well as an indifferent category. For this variable we used Murdock's variable 75: Inheritance distribution for real property (land). 10 As in the case of cousin marriage, it was not possible to find variables in Murdock that captured indifference to inheritance practices. The only societies that did not have a rule for the inheritance of real property were those without individual property rights.
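The inheritance variable can be recoded in the same spirit. Again this is a hypothetical sketch: the rule labels are illustrative simplifications of Murdock's variable 75, not its actual codes. Societies without individual property rights in land are returned as missing, because they cannot be mapped onto Todd's indifferent category.

```python
from typing import Optional

# Hypothetical, simplified labels for the distribution of real property (land);
# these are NOT the actual codes of Murdock's variable 75.
EQUAL_DIVISION = "equal_division"  # land divided among heirs (partible)
SINGLE_HEIR = "single_heir"        # e.g. primogeniture or ultimogeniture (impartible)
NO_INDIVIDUAL_PROPERTY = "no_individual_property"


def recode_inheritance(rule: Optional[str]) -> Optional[str]:
    """Map a simplified land-inheritance rule onto Todd's symmetry categories."""
    if rule is None or rule == NO_INDIVIDUAL_PROPERTY:
        return None  # missing: Murdock offers no equivalent of Todd's 'indifferent'
    if rule == EQUAL_DIVISION:
        return "symmetric"
    if rule == SINGLE_HEIR:
        return "asymmetric"
    raise ValueError(f"unknown inheritance rule: {rule!r}")
```

Treating the absence of individual property rights as missing, rather than forcing it into the symmetric category, is one possible choice; the Dunn and Dunn case discussed below shows why either reading can be defended.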

UNDERLYING SOURCES
Before quantitatively comparing the two datasets, it is worthwhile to briefly examine some of the sources Murdock and Todd used. This may help determine where the two datasets fall short and help explain any discrepancies that might surface. Both Todd and Murdock rely on case studies. Murdock mostly used ethnographic studies, though he also included historical and sociological works. 11 Some of the case studies concern whole countries or regions, but some cover only one or a few villages. Besides anthropological works, Todd also relied extensively on historical work, using it to work back from family patterns derived from censuses from the 1970s. There is little overlap between the sources of the two authors. Of the sources used for Europe, North Africa, the Middle East, and Northern and Eastern Asia, only eight appear in both Murdock's (209) and Todd's (136) source lists for these regions. 12 In two more cases, they relied on the same author, but not on the same work. The sources are not without problems. For one, it is often difficult to reconstruct how the information from the case studies was coded into a dataset. Furthermore, Todd had to reconcile observations spanning a 400-year period for some countries. A closer look at some of the sources shared by Todd and Murdock can be illuminating.
10 Using variable 77, inheritance distribution for movable property, was an option, but the rules on inheritance of land are closer to Todd's ideas on the subject, since he distinguishes real estate from 'money, a secondary asset' (Todd 1985, 78).
Stephen and Ethel Dunn's (1967) The Peasants of Central Russia, a book both Todd and Murdock rely on, is a case in point. Most land was communally owned, and rights to it were vested in households that continued to exist after the head died (Dunn & Dunn 1967, 31, 41, 47). Should this be interpreted as Murdock's 'absence of individual property rights in land' or as Todd's 'symmetrical inheritance' since, arguably, all household members inherited rights to the land? Extended households could also be difficult to establish. Dunn and Dunn claim the nuclear household was the norm, but also describe the extended household as the ideal and present census data showing that 20-25% of households contained three or more generations. 13 At the same time, the decline of extended households between the 1920s and the 1960s poses difficulties for Todd's classification (Dunn & Dunn 1967, 11-12).
As another example, the existence of nuclear or extended households was also difficult to establish in Greece. Although the ethnographic study used by Murdock as well as Todd explicitly calls the families nuclear, newlyweds unable to afford setting up their own household at first moved in with their parents and could stay there until one of their parents died. Consequently 20% of the households in the 1950s contained three generations (Friedl 1962, 12-13, 18, 53-61). 14 Generally, establishing the preference for extended households is difficult (Berkner 1975), and this can cause discrepancies between the two datasets to arise.
Another problem lies in the thin empirical base for some countries and regions. Though most societies (Murdock) and countries (Todd) are based on multiple case studies, some are based on only one or a few villages. Murdock's data on Dutch society, for example, relies entirely on a study of a single village in the north-east of the country (Keur & Keur 1955). In turn, Todd has been criticized for using observations for one locality to describe entire regions, for instance by conflating South China and Taiwan (Rawski 1988). Finally, the two datasets focus on different regions of the globe. Although Todd includes many countries, his data is at its most detailed for Europe. Africa gets scant attention, according to Todd because the prevalence of polygyny made detailed analysis of households impossible (Todd 1985, 25, 191). Murdock (1969, 7) says his data is worst for Europe and that coverage in Latin America is also problematic. Bolt (2012, 12) confirms this assessment with her figure on data coverage per continent. The data for Africa, on the other hand, is where Murdock excels, as this is the area where most ethnographic studies were conducted.
11 See Appendix A for a selection of source material of both authors.
12 Murdock's references to sources are spread over the issues of Ethnology from 1962-71.
13 Many scholars attest to the prevalence of extended or multiple households in nineteenth-century Russia (Czap 1982; Polla 2006; Dennison 2011).
14 More recent research by Hionidou (1995, 1999) suggests neolocality and nuclear households were the norm in the nineteenth century.
Since there is little overlap in the underlying sources of the datasets, comparing the two with each other can provide an important check. It alleviates problems arising from relying on one or a few cases and can provide a second opinion on the coding practice. Moreover, given the different focus of the datasets, they might be able to complement each other, especially for coverage of Africa and Europe.

RESULTS AND TESTS
How do the family systems originating from Murdock's data compare to Todd's classification of countries by family system? Beginning with an exploratory analysis for the family systems from Explanation of Ideology, we compare maps of Todd's original classification (Figure 2) and the match to the societies in Murdock's Ethnographic Atlas (Figure 3).
Broadly speaking, parts of Africa and the Americas in Murdock-Narodov match Todd's classification, as do South, East, and South-East Asia (including China, Japan, and Vietnam). Europe and the countries of the former USSR fare considerably worse.
The extent of similarity between these two classifications of family systems can also be explored by cross-tabulating the data in a contingency table and computing a measure of association. Table 4 examines the family systems presented in The Explanation of Ideology and shows that 49 of the 102 cases match. Todd and Murdock agree well on a number of systems: the African, stem (authoritarian), egalitarian nuclear, and endogamous community family systems reappear in Murdock's ethnographic data. Absolute nuclear, anomic, and exogamous community families, on the other hand, are rarely matched.
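The comparison summarized in Table 4 can be sketched in a few lines of Python. This is an illustrative sketch, not the procedure used to produce the table: the family-system labels and toy data are hypothetical, and because the text does not name its measure of association, Cramér's V is assumed here. The share of cases on the diagonal corresponds to the '49 of 102' figure.

```python
from collections import Counter
from math import sqrt


def share_matched(todd, murdock):
    """Share of countries to which both sources assign the same family system."""
    return sum(t == m for t, m in zip(todd, murdock)) / len(todd)


def cramers_v(todd, murdock):
    """Cramér's V for the Todd-by-Murdock contingency table (0 = none, 1 = perfect)."""
    n = len(todd)
    table = Counter(zip(todd, murdock))       # observed cell counts
    rows, cols = Counter(todd), Counter(murdock)
    chi2 = 0.0
    for r in rows:
        for c in cols:
            expected = rows[r] * cols[c] / n  # expected count under independence
            chi2 += (table[(r, c)] - expected) ** 2 / expected
    k = min(len(rows), len(cols)) - 1         # needs at least two categories on each side
    return sqrt(chi2 / (n * k))


# Hypothetical codes for six countries
todd = ["stem", "stem", "nuclear", "nuclear", "community", "community"]
murdock = ["stem", "nuclear", "nuclear", "nuclear", "community", "stem"]
print(share_matched(todd, murdock))  # 4 of the 6 countries match
print(cramers_v(todd, murdock))
```

The two quantities answer different questions. Cramér's V measures association rather than agreement, so a systematic relabelling would raise V without raising the share matched. An example would be every 'anomic' country in Todd appearing as 'nuclear' in Murdock.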
Investigating the underlying variables of the family systems can help identify the weaknesses and strengths of the two datasets in further detail. Tables 5-7 present contingency tables for the underlying family characteristics in Todd and Murdock. Generally, the two datasets match somewhat better when considered from the angle of the underlying family traits. This is to be expected. A family system only matches if every one of its constituent traits matches, so combining variables into family systems increases the chance of a mismatch.
In domestic organization, there are two main sources of disagreement between the datasets. The first is that Todd identifies a substantial number of community families (extended households) where Murdock observed nuclear households. The remaining mismatches mostly originate in the Middle East, where we believe Todd more accurately reflects the source material (see section 'Using the two datasets' for a discussion). The disagreement between the two sources may also reflect the fact that both Murdock's and Todd's sources refer to increasing urbanization causing a shift away from traditional village life and domestic organization.
The second source of disagreement in the domestic organization variable concerns a number of African countries that Todd classifies as polygamous but Murdock codes as community families. Because Todd categorizes Africa as polygamous wholesale, we place more trust in Murdock's observations. At the same time, many African countries displayed great ethnic diversity, so countries coded as community families in Murdock-Narodov might nonetheless have had substantial minorities that practised polygamy.
Table 6 shows that the two datasets generally agree on symmetric inheritance practices. There is some disagreement on asymmetric inheritance, but this is a rare feature in both datasets. Table 6 also shows that there are fewer observations on this family trait than on the others. Again, this is due to our inability to match Todd's indifferent inheritance systems with any variable in Murdock.
The two datasets are generally in agreement on the variables on exogamy (banning or not preferring cousin marriage). The mismatches mostly occur in Africa, where Todd suggests exogamy was the norm, whereas Murdock observes numerous ethnic groups in Western Africa practising some preference for cousin marriage. Again, Murdock is probably the more accurate source on Africa.
We have also performed logistic regressions between Murdock's and Todd's constituent variables (Table 8). In all cases except that of asymmetric inheritance, the variables are statistically significant predictors of one another. Having a certain family trait in Murdock is usually associated with about a 40-60% higher probability of the same family trait being found in Todd.
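A logistic regression with a single binary predictor has a convenient property: the fitted model reproduces the two conditional proportions exactly. One way to express the effect is therefore the difference P(trait in Todd | trait in Murdock) minus P(trait in Todd | no trait in Murdock). The sketch below is a minimal illustration that assumes a bivariate specification with hypothetical toy data. It fits the model by Newton-Raphson in plain Python and converts the coefficients into that probability gap.

```python
from math import exp


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def fit_logit(x, y, iterations=25):
    """Fit P(y = 1) = sigmoid(a + b * x) by Newton-Raphson and return (a, b).

    Assumes no perfect separation; otherwise the estimates diverge.
    """
    a = b = 0.0
    for _ in range(iterations):
        # Score (gradient) and information matrix of the log-likelihood
        g_a = g_b = i_aa = i_ab = i_bb = 0.0
        for xi, yi in zip(x, y):
            p = sigmoid(a + b * xi)
            w = p * (1.0 - p)
            g_a += yi - p
            g_b += (yi - p) * xi
            i_aa += w
            i_ab += w * xi
            i_bb += w * xi * xi
        det = i_aa * i_bb - i_ab * i_ab
        # Newton step: (a, b) += inverse(information) @ score
        a += (i_bb * g_a - i_ab * g_b) / det
        b += (i_aa * g_b - i_ab * g_a) / det
    return a, b


def probability_gap(a, b):
    """P(trait in Todd | trait in Murdock) - P(trait in Todd | no trait in Murdock)."""
    return sigmoid(a + b) - sigmoid(a)


# Hypothetical data: 10 countries with the trait in Murdock (8 also have it in Todd)
# and 10 without it (3 nevertheless have it in Todd)
murdock = [1] * 10 + [0] * 10
todd = [1] * 8 + [0] * 2 + [1] * 3 + [0] * 7
a, b = fit_logit(murdock, todd)
print(round(probability_gap(a, b), 3))  # 0.8 - 0.3 = 0.5
```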
Another way to test whether Todd's systems exist in Murdock's ethnographic data is to check whether these combinations also match when we cluster the data on Todd's constituent variables. This can show whether the data naturally divides into groups based on these criteria (Everitt 2011). The results are similar to those of the previous tests and are reported in Appendix B, Table A1.

CHANGES OVER TIME
One of the downsides of consolidating the data in the Ethnographic Atlas into country-level variables is that the process lumps together observations of ethnic groups from the entire 1820-1960 period to arrive at one set of observations per country. Similar worries exist about Todd's unchanging family systems. We wanted to see how much this affected the data and whether we can observe changes over time. To do so, we checked whether using observations from two different periods affects the match with Todd. We split the Murdock dataset in two, one set for before 1920 and one for after 1920, each capturing about half of the observations in Murdock (see Figure 4). This allowed us to compare the results before and after 1920 for each of the underlying variables: domestic organization, inheritance, and exogamy.
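The period split can be sketched as follows. The sketch makes several assumptions:

- each ethnic-group observation carries a country, an observation year, a population weight, and a binary trait;
- a country is coded as having the trait when groups with that trait make up a majority of its population, which is the rule the text states for polygamy;
- observations dated 1920 belong to the later period.

The record layout and function names are hypothetical.

```python
from collections import defaultdict

CUTOFF = 1920  # assumed: observations dated 1920 or later count as "post"


def country_codes(records, period):
    """Population-weighted majority code per country for one period.

    records: (country, year, population, has_trait) tuples, one per ethnic group.
    period: "pre" (year < CUTOFF) or "post" (year >= CUTOFF).
    A country is coded True only if a strict majority of its population has the trait.
    """
    with_trait = defaultdict(float)
    total = defaultdict(float)
    for country, year, population, has_trait in records:
        keep = year < CUTOFF if period == "pre" else year >= CUTOFF
        if not keep:
            continue  # observation belongs to the other period
        total[country] += population
        if has_trait:
            with_trait[country] += population
    return {c: with_trait[c] / total[c] > 0.5 for c in total if total[c] > 0}


def match_by_period(records, todd):
    """{period: {country: True if the Murdock-based code equals Todd's code}}."""
    result = {}
    for period in ("pre", "post"):
        codes = country_codes(records, period)
        result[period] = {c: codes[c] == todd[c] for c in codes if c in todd}
    return result


records = [
    ("A", 1900, 100, True), ("A", 1950, 100, False),
    ("B", 1910, 50, True), ("B", 1930, 60, True), ("B", 1940, 40, False),
]
print(match_by_period(records, {"A": True, "B": True}))
# {'pre': {'A': True, 'B': True}, 'post': {'A': False, 'B': True}}
```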
Note, however, that each ethnic group is observed only once: the Ethnographic Atlas does not provide two observations for the same ethnic group. Any conclusions on consistency and change over time therefore depend on the assumption that ethnic groups in the same country are similar. Murdock himself argues that geographic proximity makes societies similar (Murdock 1967, 112), but the results should be interpreted cautiously.
The maps showing these results are presented in Figures 5-6. In these figures, 'false' indicates that Murdock and Todd do not match on their categorization of a certain country while 'true' indicates a match.
One thing that all these maps highlight is the lack of data coverage for Latin America in the pre-1920 period. The first set of maps (Figures 5a and 5b) covers nuclear families. Here the largest change between the pre- and post-1920 comparisons is driven by the former Soviet republics of Central Asia. For most of the other countries in the dataset the match remains relatively stable. The shift in the Central Asian data is likely caused by the sea change in Soviet politics during the first half of the twentieth century and the problems of interpretation this creates (see above). It suggests that, if we rely on later data and the interpretations of later scholars, this region changes in one key variable of the family system, pointing to some dynamism in family systems.
The maps for polygamy again show shifts in the mismatch between the two datasets over time, although these shifts are small. They are driven entirely by countries in Africa. Countries such as Mali, Nigeria, and Côte d'Ivoire are classified as non-polygamous by Murdock both before and after 1920, so they are mismatches in both periods. In the post-1920 data, Niger, Chad, and Zambia join them. There are 14 mismatches out of 65 comparisons before 1920, against 23 mismatches out of 134 comparisons after 1920. Angola is one of the few countries that goes from being classified as non-polygamous before 1920 to polygamous after 1920. Two further sets of maps (available upon request) show how the Ethnographic Atlas's pre- and post-1920 sources differ on preferences regarding cousin marriage and on asymmetrical inheritance. For cousin marriage, differences arise only in Africa, where some countries change from being endogamous to exogamous and vice versa. For asymmetrical inheritance we again see good overall matching, with little change between the two maps.
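The two periods contain very different numbers of comparisons, so the raw mismatch counts are best read as shares. This small calculation uses only the figures given above. It shows that the post-1920 data contain more mismatches but a somewhat lower mismatch rate:

```python
def mismatch_share(mismatches, comparisons):
    """Fraction of country comparisons on which Murdock and Todd disagree."""
    return mismatches / comparisons


pre = mismatch_share(14, 65)    # polygamy, pre-1920 observations
post = mismatch_share(23, 134)  # polygamy, post-1920 observations
print(f"pre-1920: {pre:.1%}, post-1920: {post:.1%}")
# pre-1920: 21.5%, post-1920: 17.2%
```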

USING THE TWO DATASETS
Since the match between Murdock-Narodov and Todd is far from perfect, the question arises which of the two should be preferred for a historical dataset on family practices. Here we will discuss some of the discrepancies between the two datasets and their relative merit.
We first report our research into the discordant observations using the underlying sources and the wider literature. We have done this for the fifteen most populous mismatched countries. We briefly discuss these cases and their resolution here. Further details and the literature consulted can be found in Appendix A.
The first issue is a number of countries in South-East Asia that Todd classified as anomic: Indonesia, Thailand, the Philippines, Myanmar (Burma), and Malaysia. Since this system implied a lack of strict rules, it was difficult to code the equivalent system with Murdock's data. Although the literature bears out Todd's observations of flexible family systems, we largely follow Murdock's more detailed observations for these societies.
In Turkey and Morocco, Todd's observations of preferences for cousin marriage and extended families respectively were corroborated by the literature. In Iraq, Jordan, and Kuwait, Murdock observes nuclear households whereas Todd observes extended households. This mismatch is difficult to trace back in detail. The few available sources Murdock used for this region suggest that extended households were actually preferred.
We have also investigated two cases for Africa: Madagascar and Ethiopia. Madagascar again is a case of Todd's anomic family and we have followed Murdock's more precise observations on nuclear families and partible inheritance, but the preferences in regard to cousin marriage are probably truly indeterminate there. In Ethiopia, Todd's observations of nuclear families rather than Murdock's extended families most closely resembled the temporary co-residence scheme to be found there.
In Bangladesh and Pakistan the clash is a result of Murdock coding the majority population group as practising exogamous marriage while Todd considers cousin marriage the norm for the region (particularly asymmetrical cousin marriage between the children of brothers and sisters). It is unlikely that Murdock is correct in this respect. To this day approximately 60% of marriages in Pakistan are consanguineous, 80% of which are between first cousins and these levels have remained more or less constant over the last four decades (Bittles 2001). We therefore choose to adopt Todd's coding of this variable.
In Europe a few countries gave mismatches as well. For France, Murdock's observation of impartible inheritance was rejected in favour of Todd's observation of partible inheritance in the more populous Northern France. For the Netherlands, Murdock's extended families were rejected in favour of the far more prevalent nuclear families observed by Todd. However, Murdock's observation of partible inheritance in the Netherlands has been followed instead of Todd's observation of indifference towards inheritance practices. For the English too, including settlers in America, partible inheritance has been followed rather than indifference.
For Russia we observe a mismatch driven by Murdock's classification of the region as practising nuclear domestic organization, whereas Todd categorizes the area as following an extended household ideal. The sources reveal that the difference arises from a focus on different time periods. Murdock's reading of the sources focuses on the period after the Russian Revolution, when collectivization forced a break with past family structures. Todd, on the other hand, is more interested in the historical situation pre-dating such events. Murdock's sources would not dispute a historical predominance, or ideal type, of extended households in this region. We therefore choose to follow Todd.15 A further mismatch in Russia is due to the lack of property rights observed in Murdock's data, whereas Todd assigns the region symmetrical inheritance. Most of the sources mention patrilineal inheritance as the norm, although a dowry for women appears to have been common. There was also a degree of asymmetry, in that the oldest sons may well have inherited more. Murdock's classification of inheritance as lacking property rights is likely due to the changes that followed the Russian Revolution, so for the historical family system classification we follow Todd.
Overall, Todd's data comes out favourably when discrepancies between the two datasets are resolved. Nonetheless, for the remaining, smaller countries that do not match, we think it is best to take into account that the two datasets have different strengths. Murdock's data is at its most detailed for Africa and Asia, while Todd has used very broad characterizations for these regions, though his later work provides more detail (Todd 2011). Murdock, on the other hand, has a very weak empirical basis for European societies and their settler populations, which is where Todd is at his most detailed. Todd's broad observations of indifference with regard to family practices sometimes capture reality well, but they should probably be discarded in favour of Murdock's more detailed observations. Finally, Todd has a stronger historical focus than Murdock. If the focus is on the historical traditions of family formation, the cultural ideal rather than actual practice at a given time, Todd's data has the edge.
15 See also note 11 above.
We have used these observations on the strengths and weaknesses of the two datasets as a guide to creating a hybrid dataset. Provided both datasets are expressed in a dummy variable format, the strengths of the two can be combined. This involves three steps. First, we apply our corrections for the most populous countries. Second, we discard Todd's observations of indifference to the practice of inheritance and consanguineous marriage. Third, we use Murdock for Africa and Asia and Todd for Europe and the Americas (see Supplementary Data). Figure 7 presents a map of the family systems in the hybrid dataset. It shows that most of the differences between the datasets are located in Africa and South-East Asia. Creating a hybrid dataset also leaves us with more observations. We could extract family system observations for 163 countries from Todd's maps and reconstruct family systems for 127 countries from Murdock's Ethnographic Atlas. The new hybrid dataset contains observations for 178 countries.
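The combination rules can be written as a small decision function applied to each country and trait. This is a sketch under stated assumptions, not the authors' code:

- codes are taken to be already harmonized (e.g. in dummy format);
- the label used for Todd's indifference and the region names are hypothetical;
- regions the text does not mention (such as Oceania) are assumed to follow the Todd-first order.

When the preferred source has no observation, the other source fills the gap. This would explain how the hybrid covers more countries than either source alone.

```python
INDIFFERENT = "indifferent"         # hypothetical label for Todd's indifference codes
MURDOCK_FIRST = {"Africa", "Asia"}  # regions where Murdock is preferred


def hybrid_value(region, todd, murdock, correction=None):
    """Choose one code for one country and one trait; None means no observation.

    Order of rules: manual correction, drop Todd's indifference, regional
    preference, then fall back to whichever source has a value.
    """
    if correction is not None:      # resolved by hand for a populous mismatched country
        return correction
    if todd == INDIFFERENT:         # Todd's indifference is discarded
        todd = None
    # Europe, the Americas and (assumption) any other region use Todd first
    preferred = (murdock, todd) if region in MURDOCK_FIRST else (todd, murdock)
    for value in preferred:
        if value is not None:
            return value
    return None


print(hybrid_value("Africa", todd="nuclear", murdock="community"))   # community
print(hybrid_value("Europe", todd=INDIFFERENT, murdock="partible"))  # partible
print(hybrid_value("Asia", todd="community", murdock=None))          # community
```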
To get a sense of how the use of this dataset might change the conclusions of existing work on the long-term persistence of family values, we performed robustness checks for two articles using Todd's data. For Carmichael's (2011) work on the determinants of marriage ages, the change in dataset changed the signs of the coefficients on various family systems, though the overall magnitude of the effects remained unchanged. For Dilli et al.'s (2015) investigation of the factors driving gender equality, the main conclusions remained unchanged, though the effects for the endogamous community family were slightly larger. Within the scope of this paper, any explanation for these differences must remain speculative. One possibility is that family systems matter in different ways for the outcome variables. Another lies in the geographical focus of the two studies. Carmichael (2011) focused on countries outside Western Europe, with marriage ages and spousal age gaps as outcome variables. Dilli et al. (2015), by contrast, explored the determinants of a composite indicator of gender equality at a global scale. Because Carmichael's sample excludes Western Europe, the changes made to Todd's African and Asian classifications could have a far larger impact there than in a global comparison.16

FAMILY PRACTICES PAST AND PRESENT
Because the data from both Todd and Murdock-Narodov should capture historical family organization, it can be used to explore developments over time by comparing the hybrid dataset to present-day data. One source for this is the OECD's Gender, Institutions and Development Database (GID-DB), containing data for non-OECD countries for 2009 (Jütting et al. 2008; OECD 2009). Part of this dataset, and the resulting Social Institutions and Gender Index (SIGI), takes into account a 'family code' consisting of indicators on early marriage, polygamy, parental authority, and inheritance. This makes the data very well suited for comparisons with data on family practices.
The GID-DB family code data looks at women's right to inherit, early marriages, polygamy, and the parental authority of women, that is, whether women have the same right to be a legal guardian of a child during marriage and whether women have custody rights over a child after divorce. We compared data on polygamy and the right of women to inherit with the equivalent variables in Murdock and Todd. In the case of inheritance, this means we used a variable that has not been used so far: variable 74 on real property inheritance rules, to see whether inheritance rules were patrilineal. To compare Murdock's and Todd's data on co-residence, we used census data from IPUMS-International to compute the average number of married couples per household for all available countries in c. 1997 (Minnesota Population Center 2013). Finally, to compare data on preference for cousin marriage, we used data on the percentage of the population practising consanguineous marriage collected by Bittles for the period 1957-1994 (Bittles 1994; Woodley & Bell 2013).
The matchup of these data sources is not always straightforward. Taking the example of polygamy, fewer countries show up in Murdock as practising polygamy than in the GID-DB data. To Murdock, polygamy is by and large restricted to sub-Saharan Africa, whereas the GID-DB also records the practice in Muslim countries, India, and Russia. In part, this is due to their different methods of measurement. The GID-DB claims to look at acceptance of the practice as well as its legality. However, a look at the data for countries such as India or Pakistan suggests that countries are categorized as polygamous mostly on the basis of legality (OECD 2012).
16 Results are available upon request.
As was the case for many of the variables in the Ethnographic Atlas, Murdock coded a society as polygamous only if polygamy was the dominant practice there. We in turn coded a country as polygamous only if such societies made up the majority of its population. Despite these coding differences, some observations can be made. In countries that the GID-DB codes as non-polygamous but Murdock coded as polygamous, the practice must have declined: after all, it used to be the dominant practice but was not even legal in 2009. Likewise, we can observe cases where polygamy was stable and may even have grown. In countries with large Muslim populations, the GID-DB observes that polygamy is still accepted. As Murdock did not code them as polygamous, the practice was not dominant in ca. 1920, though it may still have been accepted. Muslim countries were therefore at least stable in this regard. Polygamy seems to have declined in some of the southernmost countries of Africa: while it was still common practice at the beginning of the twentieth century, it was no longer commonly accepted one hundred years later.
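The inference in this paragraph rests on an asymmetry between the two codings. Murdock records polygamy only where it was dominant, while the GID-DB records it where it is legal or accepted. A minimal sketch of the resulting decision rule, with hypothetical function and label names, is:

```python
def infer_polygamy_change(murdock_dominant, gid_polygamous):
    """Direction of change implied by Murdock (c. 1920) and the GID-DB (2009).

    murdock_dominant: True if polygamy was the dominant practice according to Murdock.
    gid_polygamous: True if the GID-DB codes polygamy as legal or accepted.
    """
    if murdock_dominant and not gid_polygamous:
        return "declined"         # dominant then, not even legal now
    if murdock_dominant:
        return "still accepted"   # dominant then, legal or accepted now
    if gid_polygamous:
        return "at least stable"  # not dominant then, but accepted now
    return "indeterminate"        # neither dominant then nor accepted now


print(infer_polygamy_change(True, False))  # declined
print(infer_polygamy_change(False, True))  # at least stable
```

The 'indeterminate' case covers a practice that was neither dominant c. 1920 nor accepted in 2009. Such a practice may have been absent throughout, or it may have declined from acceptance without ever being dominant.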
Keeping in mind that coding differences may add substantial noise, we now proceed to the regressions for the persistence of the family traits. Because some of the outcome variables can be interpreted as continuous, we start with OLS regressions before moving to logistic models. Table 9 presents the results of regressing the present-day data on the hybrid dataset, Murdock, or Todd. Patrilineal inheritance practices c. 1920 in the hybrid dataset or Murdock-Narodov were statistically significant predictors of present-day inheritance. Having patrilineal inheritance increases the score on inheritance in the GID-DB by 0.15-0.20, towards a more disadvantageous score for women.17 This is not a negligible effect on the GID-DB's 0, 0.5, 1 scale (no, intermediate, and strong discrimination). Polygamy gives higher estimates. Being coded as a country that practises polygamy in the hybrid data or Murdock increases the expected value of the GID-DB sub-index by 0.3-0.4. The Todd data on polygamy is an even stronger predictor of present-day polygamy: it is associated with a full step (0.5 points) on the GID-DB. Since the GID-DB scores are ordinal, ordered logistic models might be more appropriate for these variables (Table 10). Such models generally show that historical family characteristics poorly predict whether a country is coded 0 or 0.5 in the GID-DB's present-day data. They do, however, strongly predict whether it is coded 0.5 or 1.
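With a single dummy regressor and an intercept, the OLS coefficient is simply the difference between the mean outcomes of countries with and without the historical trait. This is why the estimates can be read directly on the GID-DB's 0/0.5/1 scale. The sketch below illustrates the point with hypothetical data. It assumes a bivariate specification and does not reproduce Table 9.

```python
def ols_intercept_slope(x, y):
    """Least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def group_mean(x, y, value):
    """Mean outcome among countries whose dummy equals `value`."""
    ys = [yi for xi, yi in zip(x, y) if xi == value]
    return sum(ys) / len(ys)


# Hypothetical data: historical patrilineal dummy and GID-DB inheritance score (0, 0.5, 1)
patrilineal = [0, 0, 0, 0, 1, 1, 1, 1]
gid_score = [0, 0, 0.5, 0, 0.5, 0, 0.5, 0.5]
a, b = ols_intercept_slope(patrilineal, gid_score)
print(a, b)  # 0.125 0.25
print(group_mean(patrilineal, gid_score, 1) - group_mean(patrilineal, gid_score, 0))  # 0.25
```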
Consanguineous marriage in ca. 1920 is a strong predictor of more recent figures on consanguineous marriage (Table 9). Having a preference for cousin marriage in Murdock's data increases the expected share of the population practising consanguineous marriage in the 1960s and 1970s by 26 percentage points. For Todd's data, the effect is similar: it is associated with a 25 percentage point higher share of the population practising consanguineous marriage, while the hybrid dataset predicts a 20 percentage point increase.
17 Todd's data on daughters' inheritance from The Causes of Progress gives no significant results. This is not unexpected, given that Todd derived the ability of daughters to inherit entirely from whether brothers shared equally.
[Table note: Constant terms included, but not reported. ***, **, * indicate significance at <0.1%, 1%, and 5% respectively.]
The existence of extended families is positively associated with the extent of co-residence in the 2000s, and this association too is statistically significant. A preference for extended families in Murdock's data in c. 1920 is associated with 0.14 more couples per household in c. 2000. Extended families in Todd's data predict 0.18 more couples per household in c. 2000, and the hybrid data lies between these values (0.16). With the number of couples per household in IPUMS in c. 2000 varying between 0.5 and 1.4, this is a moderate effect.
In short, the data on historical family characteristics has some predictive power for today's measures of family characteristics, but it is far from perfect. Consanguineous marriage appears as a very persistent practice. Considering the GID-DB subindices, the variables from Murdock and Todd show some persistence, with better results for strong present-day cases. A preference for extended households is a moderately persistent trait.

FAMILY SYSTEMS AND CURRENT DAY VALUES
Persistence in terms of the characteristics described above is one test of the value of the dataset. Possibly more important, however, are the outcomes in terms of values today. The underlying determinants of family systems can capture a set of norms and values for which we have very little systematic historical data. Present-day data, however, allows us to explore whether the family systems we constructed above explain present-day variation in norms and values. For this we made use of the World Values Survey's longitudinal data for 1981-2014 (World Values Survey 2014). We tested for the effect of the various family systems on variables related to gender attitudes and agency.
We focus here on two values that we believe could be influenced by historical family systems. First, the way families are organized, and the norms and values accompanying this, can influence the amount of control individuals perceive themselves to have over their own lives (agency). For instance, strong expectations on where children should live or whom they should marry could limit the extent to which people can make decisions about their own life course. To measure this, we use question A173, asking people to indicate on a 1-10 scale 'how much freedom of choice and control you feel you have over the way your life turns out?' Second, we look at attitudes towards women. Family practices can be particularly restrictive towards women because women have an important role in transmitting family values and membership of cultural groups (Shachar 2001). As a measure of the attitude towards gender equality, we use question D059. It asks whether respondents strongly agreed, agreed, disagreed, or strongly disagreed with the statement that 'on the whole, men make better political leaders than women do'. Table 11 presents the results of regressing the responses to these two questions on the hybrid Murdock-Todd data. At the individual level we control for survey year, education, income, gender, age, age squared, city size (to capture the difference between urban and rural respondents), marital status, and whether the respondent has children. At the country level we also control for GDP per capita (Bolt et al. 2014). The outcome variables are measured at the individual level while the main predictor of interest (family practices) is measured at the country level, which could bias our inference. To correct for this we use clustered standard errors and, in an alternative specification, a varying-intercept multilevel model (Primo et al. 2007; Bates et al. 2015).
[Table note: ***, **, * indicate significance at <0.1%, 1%, and 5% respectively.]
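Every respondent in a country shares the same country-level family-system code, so treating respondents as independent overstates the information in the data. The sketch below computes a cluster-robust (CR0) standard error for the slope in a bivariate regression, clustering by country. It has several limitations:

- it omits the individual-level controls;
- it omits the finite-sample correction that statistical packages often apply;
- it uses hypothetical data;
- it does not cover the multilevel alternative.

```python
from collections import defaultdict
from math import sqrt


def ols_with_cluster_se(x, y, clusters):
    """Slope of y on x (with intercept) and its cluster-robust (CR0) standard error.

    No finite-sample correction is applied.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    xt = [xi - mx for xi in x]                  # demeaned regressor
    sxx = sum(v * v for v in xt)
    b = sum(v * (yi - my) for v, yi in zip(xt, y)) / sxx
    a = my - b * mx
    score = defaultdict(float)                  # per-cluster sum of demeaned x times residual
    for v, xi, yi, g in zip(xt, x, y, clusters):
        score[g] += v * (yi - a - b * xi)
    var_b = sum(s * s for s in score.values()) / sxx ** 2
    return b, sqrt(var_b)


# Hypothetical survey: two respondents in each of four countries; x is a
# country-level family-system dummy, y an individual response
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [1, 2, 2, 3, 3, 4, 5, 6]
country = ["A", "A", "B", "B", "C", "C", "D", "D"]
print(ols_with_cluster_se(x, y, country))         # clustered by country
print(ols_with_cluster_se(x, y, list(range(8))))  # each respondent its own cluster
```

In this toy example both calls give a slope of 2.5. The standard error clustered by country (about 0.79) exceeds the one obtained when each respondent is treated as its own cluster (about 0.66).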
Relative to countries characterized by extended families (the reference category), respondents in countries with nuclear or stem families report feeling more freedom of choice and control: one point extra on the 10-point scale. Nuclear families in particular lack co-residence under the authority of a father or in-laws, so this fits Todd's (1985) model. Regarding gender equality, we find that people living in countries without a history of extended households were less likely to agree with the statement that men make better political leaders. The 0.7-point lower score on the four-point scale for stem families is a fairly large effect. It fits with Todd's (1987) idea that these family types were especially conducive to the empowerment of women.
[Table note: ... (1) to strongly agree (4) with men being better political leaders. Controls for survey year, education, income, gender, age, age-squared, city size (to capture the difference between urban and rural respondents), marital status, whether the respondents have children, and GDP per capita included, but not reported. ***, **, * indicate significance at <0.1, 1, and 5% respectively.]

CONCLUSIONS
It is one thing to recognize that family characteristics matter for social and economic outcomes, gendered or otherwise; it is another to test this empirically. This paper has tried to provide scholars with cross-country data and tools to approach the role of the family. It does so by investigating whether the family systems to which Todd attributes great explanatory power can be corroborated with other data. This check came from a widely used source of data in economics and economic history: the ethnographic data collected in Murdock's Ethnographic Atlas, translated to country-level data with ethnic population figures from the Narodov Atlas. The underlying characteristics of Todd's family systems (domestic organization, inheritance, preferences for cousin marriage) match in roughly 70% of the cases. The family systems composed of these variables correspond to the ethnographic data from Murdock in about half the cases. Countries in North Africa, the Middle East, and Southern Asia often match Todd's family types. As a result, his endogamous community, African, and egalitarian nuclear family types perform well. There are also important mismatches between the Ethnographic Atlas and The Explanation of Ideology. The exogamous community, absolute nuclear, and anomic family types are not readily matched to the Murdock data. Observing the absolute nuclear family in Murdock's Atlas is further hampered by the lack of an indifferent inheritance classification. The opposite occurs in sub-Saharan Africa: Todd assigns this region a blanket category, whereas Murdock is at his most detailed here. Though the prevalence of polygyny in Africa means that Todd's African type is frequently encountered, the Murdock data allows for more detail. More generally, we should allow for the possibility that Todd's use of macro-regions and the potential for change in family practices are behind some of the mismatches.
This paper has also explored the possibilities of the Murdock data by comparing it to present-day data on family practices. Despite occasional coding differences between the two, doing so allowed us to observe moderate persistence of extended families, polygamy, and patrilineal inheritance. Preferences for consanguineous marriage proved strongly persistent. Likewise, historical family systems seem to have predictive power for people's sense of freedom of choice and their views on gender equality.
Finally, we have made recommendations on the relative strengths of the two datasets. We have made detailed suggestions to resolve some of the more glaring contradictions. For the remaining, smaller contradictions, we suggest considering the relative strengths of the datasets: Todd's strong data on European and historical societies and Murdock's detailed observations for Africa and Asia.
This exercise provides scholars with a set of tools and data to further test and explore the role that different patterns of family organization play in determining present-day development outcomes at the country level. New historical micro-datasets on the North Atlantic, Central and Eastern Europe, East Asia, and elsewhere cover ever more periods, and work is being done on linking and harmonizing these datasets (Ruggles 2012). The logical next step is therefore to use regional and micro-level data to ask similar questions.

SUPPLEMENTARY DATA
Data is available from the authors and will be available from www.clio-infra.eu.

Funding
This work was supported by the Netherlands Organisation for Scientific .