Better Off? Distributional Comparisons for Ordinal Data About Personal Well-Being

How to undertake distributional comparisons when personal well-being is measured using income is well-established. But what if personal well-being is measured using subjective well-being indicators such as life satisfaction or self-assessed health status? Has average well-being increased or well-being inequality decreased? How does the distribution of well-being in New Zealand compare with that in Australia, or between young and old people in New Zealand? This paper addresses questions such as these, stimulated by the increasing weight put on subjective well-being measures by international agencies such as the OECD and national governments including New Zealand's. The paper reviews the methods appropriate for distributional comparisons in the ordinal data context, comparing them with those routinely used for comparisons of income distributions. The methods are illustrated using data from the World Values Survey.


Introduction
How to undertake distributional comparisons when personal well-being is measured using income is well-established. But what if personal well-being is measured using subjective well-being indicators such as life satisfaction or self-assessed health status? Has average well-being increased, and what has happened to the inequality of well-being? How does the distribution of well-being in New Zealand compare with that in Australia, or between young and old people and men and women? This paper reviews and illustrates the methods for addressing questions such as these, stimulated by the increasing weight put on subjective well-being measures by international agencies such as the OECD and national governments including New Zealand's.
A catalyst for the new emphasis on subjective well-being was the Report by the Commission on the Measurement of Economic Performance and Social Progress (Stiglitz, Sen, and Fitoussi, 2009), which set out a comprehensive agenda for going 'Beyond GDP'.
The Report's Quality of Life sections emphasize that 'well-being is multidimensional' (2009: 14), and that 'objective and subjective dimensions of well-being are both important' (2009: 16). The OECD has played an important role in implementing the Report's recommendations in this area, launching its Better Life Initiative (in 2011), regularly reporting on well-being outcomes (How's Life; see e.g. OECD 2017), and developing the Better Life Index and multiple online resources (see https://www.oecd.org/statistics/better-life-initiative.htm). In parallel, the national statistical agencies of OECD member countries have also introduced initiatives to address the Beyond GDP agenda, including a greater emphasis on collection of and reporting on subjective well-being data. For a recent review of initiatives to date, see the symposium on 'New measures of well-being: perspectives from statistical offices' in the March 2015 issue of the Review of Income and Wealth. Subjective well-being measures continue to be highlighted by high-level groups taking forward the Stiglitz-Sen-Fitoussi commission's work: see e.g. Stiglitz, Fitoussi, and Durand (2018, especially section 3.7). New Zealand has embraced the Beyond GDP agenda too. New Zealand Treasury began developing its Living Standards Framework in 2011 and has continued to refine it, drawing heavily on the OECD's Better Life approach (New Zealand Treasury 2018a). Again, the motivation is that there is 'more to wellbeing than just a healthy economy' (New Zealand Treasury 2018b). New Zealand Treasury assesses current well-being across twelve domains of people's lives, ranging from their civic engagement and governance through to the quality and quantity of their leisure and recreation time, and including income and consumption and also subjective well-being.
A Living Standards Dashboard, consistent with the OECD's dashboard, summarises the twelve domains using a large number of indicators, including life satisfaction and self-assessed health status. For a recent overview of New Zealand's situation relative to other OECD countries using this approach, see OECD (2017). More recently, Statistics New Zealand has been developing a new set of indicators for measuring social progress. The 'Indicators Aotearoa New Zealand' are intended to support various cross-government initiatives and international reporting, and to align with those in the Treasury's Living Standards Framework and the UN's Sustainable Development Goals (Statistics New Zealand 2019).
These sets of indicators are not only for ex post monitoring of social progress but also for improving strategic decision-making in the realms of economic and social policy (Statistics New Zealand 2018). The aim is to have an integrated and coordinated approach linking government (setting priorities and making policy decisions) and the Treasury and Statistics New Zealand (providing analysis and advice, information and data). This brings us to New Zealand's first 'Well-Being Budget'. According to the Minister of Finance, '[this] is a new approach to how government works, … placing the wellbeing of New Zealanders at the heart of what we do … This approach represents a significant departure from the status quo. Budgets have traditionally focused on a limited set of economic data. Success has been declared on the basis of a narrow range of indicators, like GDP growth. … Many countries around the world have begun to look at different ways of measuring success to better reflect the wellbeing of their people. This Budget goes further and puts wellbeing at the heart of everything we do. To set the priorities for this Budget, we used evidence and expert advice to tell us where we could make the greatest difference to the wellbeing of New Zealanders' (Robertson 2019: 3).
In a similar spirit but for the UK, Frijters and Krekel (2019) provide a detailed handbook on how to design policies to maximize subjective well-being.
The discussion so far demonstrates the substantial role that non-traditional indicators of personal well-being are now playing in assessments of social progress and the design of socio-economic policy. However, to date, the dominant approach to summarizing the distribution of well-being in a given year, and tracking changes over time or differences across groups, has examined distributions of income, rather than (say) distributions of subjective well-being (e.g. measures of life satisfaction or self-assessed health status).
Leading examples of the income-based approach are the annual reports on Household Incomes from New Zealand's Ministry of Social Development (Perry 2018), the UK's Households Below Average Income (Department for Work and Pensions 2019), and the US's Income and Poverty in the United States (Semega et al. 2019). Since its landmark report Growing Unequal (2008), the OECD has also provided frequent reports on the distributions of income within and between member countries, and an income distribution database (https://www.oecd.org/social/inequality-and-poverty.htm).
Against this background, this paper is based on two observations and a question. First, income is a cardinal variable, and many of the most commonly-used indicators of subjective well-being, including life satisfaction and self-assessed health status, are ordinal variables.
Second, one cannot undertake distributional comparisons for ordinal variables using the same methods as are routinely applied to cardinal variables. This raises the question: are there analogous methods for ordinal indicators? I show that the answer is 'yes', broadly speaking, and illustrate the methods using data from the World Values Survey. Before proceeding to address the main question, I elaborate on the first two observations. Section 2 explains the distinction between cardinal and ordinal variables. Section 3 reviews the standard and well-known methods for undertaking distributional comparisons of income, a cardinal variable, considering both income levels and inequality. It is analogues of these that we require for distributional comparisons of ordinal data.
I provide a non-technical review of methods for distributional comparisons of ordinal variables in Section 4 and, in Section 5, I apply the methods to compare life satisfaction levels and life satisfaction inequality in New Zealand and four other countries (Australia, the United Kingdom, the USA, and South Africa) in the mid-2000s. I also consider how the distribution of life satisfaction in New Zealand has developed over time (I compare 1998, 2004, and 2011), and how life satisfaction distributions in 2004 differ between men and women, and by age group (Section 6). For brevity, I summarize the findings on these aspects in the main text, with the corresponding charts in Appendices A and B.

Cardinal versus ordinal variables (e.g. income versus life satisfaction)
Cardinal variables refer to quantities or amounts. Income is an example of such a variable; so too is financial wealth, or consumption measured using household expenditure. If we have a distribution of a cardinal variable for a set of individuals, we can order those individuals according to that variable from lowest to highest. In addition, differences between values and ratios of values are well-defined (ratios are meaningful because a value of zero is a true zero). It is meaningful to refer to a $100 difference in income or to say that John's income is twice as large as Janet's. Thus, income is measurable on both an interval and a ratio scale.
Ordinal variables encapsulate people's choices from a set of categories with a well-defined ordering. Life satisfaction is a leading example. In the World Values Survey (WVS), respondents are asked 'All things considered, how satisfied are you with your life as a whole these days? Using this card on which 1 means "completely dissatisfied" and 10 means you are "completely satisfied" where would you put your satisfaction with your life as a whole?'.
The card looks like this:

Completely dissatisfied  1  2  3  4  5  6  7  8  9  10  Completely satisfied

With a distribution of life satisfaction responses, we can order the individuals in terms of their response from lowest to highest. However, differences in response values, and their ratios, are not well-defined. We cannot say that the difference in life satisfaction between 1 and 2 is the same as the difference between 8 and 9, or that a response of 8 corresponds to four times as great a life satisfaction as a response of 2. We can say only that 9 is greater than 8 and 2 is greater than 1.
The fundamental measurement issue for ordinal variables is that the labelling of the response categories is arbitrary. We do not know the underlying life satisfaction scale. The categories could be relabelled (-5, -4, -3, -2, -1, 0, +1, +2, +3, +4) or (1, 2, 4, 5, 7, 8, 10, 11, 13, 14) and there would be the same information about how a respondent places him- or herself across the 10 categories. Distributional comparisons also require that different respondents translate their underlying life satisfaction into the response categories in the same way, whether they are New Zealanders or South Africans, or men and women. This is the assumption of a 'common reporting function' and its plausibility may be questioned (Bond and Lang 2019). For example, New Zealand and South African respondents may have different reporting functions because of the substantial differences between their two societies, but the assumption is more plausible when comparing respondents from a single country to surveys fielded not too many years apart, or when comparing groups of respondents to a survey in a given year. In this paper, I assume common reporting functions, following virtually all the literature on subjective well-being.
Readers should also be aware that analogous issues of comparability apply to 'income'. If we are interested in 'personal economic well-being', the appropriate measure of 'income' is not the raw response from the data source. Researchers and statistical agencies make assumptions about how to: (i) aggregate income sources across the individuals within an income unit (typically the household or the family); (ii) adjust observed money incomes for differences in household/family size and composition ($2000 per month for a single person living alone is worth a lot more in living standards terms than $2000 for a family of four); and (iii) adjust observed money incomes for differences in price levels across regions within a given country, across years (for temporal comparisons), or across countries (for cross-national comparisons). In other words, distributional comparisons are based on income variables that are transformed from the raw data using assumptions about aggregation, equivalence scales, within-country price differences, inflation rates, and market or purchasing power exchange rates, and these choices are not simply objective scientific ones but incorporate normative elements. There are extensive literatures about the sensitivity of income distribution comparisons to different choices under these headings. In sum, although cross-individual comparability is often cited as an issue for subjective well-being comparisons, the issue also applies to income comparisons. In the next section, I assume that appropriate adjustments to observed income values have already been made, and simply refer to 'income'.

Assessing better-offness: a toolbox of methods for income distributions
The two main distributional features relevant to assessing whether we are better off are well-being levels and well-being inequality. Better off means having higher well-being levels or less inequality, or potentially some combination of the two if we are willing to trade off improvements in one dimension against deterioration in another. It is helpful to think of assessments being undertaken using a 'social welfare function' that aggregates the information in a distribution of well-being to a single number. That is, we employ a social welfare function $W = W(x_1, x_2, x_3, \ldots, x_n) = W(\mathbf{x})$, and distribution $\mathbf{x}$ is judged better than distribution $\mathbf{y}$ if $W(\mathbf{x}) > W(\mathbf{y})$. The individual-level outcomes in $\mathbf{x}$ and $\mathbf{y}$ may refer either to income or to a subjective well-being measure.
The income distribution literature has developed two main approaches to assessing better-offness. The first consists of 'dominance' checks, according to which we see whether we can rank two distributions A and B. The advantage of this strategy is that we may be able to say that A is better than B (A dominates B) without strong assumptions about the nature of the social welfare function. The weaker the assumptions, the less room there is for disagreement about the overall assessment. The downside of relying on dominance checks is that we may not be able to derive a clear-cut ranking. However, regardless of whether dominance holds, the checks are useful because they involve graphical comparisons, and these pictures are powerful ways of summarising the distributions: they 'show the data'.
The second approach is to summarise distributions using numerical indices. Indices encapsulate stronger and more specific assumptions about the nature of the social welfare function (about which there may be disagreement), but they have the powerful advantage that we can not only rank distributions (regardless of whether dominance holds) but also talk about magnitudes of difference. (For example, we are often interested not only in whether inequality increased or decreased but also in by how much.) Of course, the two approaches are complementary and analysts often employ both. In addition, it is also useful to look at density functions as well as cumulative distribution functions. Although there are no dominance results associated with density functions, they are useful tools for 'showing the data' in a compact form.
For a more detailed background to the results that I review in the rest of this section, as well as references to the original literature and additional results, see inter alia Foster et al. (2013). For an earlier survey of the same material, see Jenkins (1991).

Three dominance results and many indices
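There are three main dominance results employed in income distribution analysis. The first is first-order dominance, which relates to income levels and is checked by comparing cumulative distribution functions (CDFs): distribution A first-order dominates distribution B if the CDF for A lies nowhere above, and somewhere below, the CDF for B. If this first-order dominance holds, then at any specific value of p, quantile $x_p = F^{-1}(p)$ is at least as large in A as in B. It is also the case that the mean of distribution A is greater than the mean of distribution B.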
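As an illustration only (not code from this paper), the first-order dominance check can be sketched in Python for two samples of incomes. The function names and the income values are hypothetical. The sketch compares empirical CDFs at every observed income, which is sufficient because both CDFs are step functions that change only at observed values.

```python
import numpy as np

def empirical_cdf(sample, grid):
    """Share of the sample with income at or below each value in grid."""
    s = np.sort(np.asarray(sample, dtype=float))
    return np.searchsorted(s, grid, side="right") / s.size

def first_order_dominates(a, b):
    """True if A's CDF lies nowhere above, and somewhere below, B's CDF."""
    grid = np.union1d(a, b)  # the step-function CDFs only change at observed values
    fa, fb = empirical_cdf(a, grid), empirical_cdf(b, grid)
    return bool(np.all(fa <= fb) and np.any(fa < fb))

# Hypothetical incomes: every quantile of A is at least as large as the same quantile of B.
a = [10, 20, 30, 40]
b = [5, 20, 25, 40]
print(first_order_dominates(a, b), first_order_dominates(b, a))  # True False
print(np.mean(a) > np.mean(b))                                    # True
```

With real survey data one would also apply sampling weights and allow for sampling variability; the sketch ignores both.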
The second result is Lorenz dominance and relates to inequality comparisons. A Lorenz curve plots cumulative income shares against cumulative population shares (p). The perfect equality reference point is when the Lorenz curve is a 45° ray through the origin: everyone has the same share of total income; each person's income is the mean income. With inequality, the Lorenz curve hangs below the 45° line: the poorest 20% have less than one fifth of total income, and so on. Intuitively, inequality is greater the further the Lorenz curve is from the 45° ray. There is also a fundamental link between mean-preserving progressive transfers of income and inequality reduction. If a small amount of income is transferred from person R to person P, where R's income is larger than P's, and no other incomes are changed, then the Lorenz curve must move inwards towards the perfect equality reference point over the range of p between P and R. Atkinson's (1970) Lorenz dominance result formalizes these ideas: finding the Lorenz curve for distribution B lying everywhere below the Lorenz curve for distribution A is equivalent to finding that A has less inequality than B according to all social welfare functions that reflect 'equality preference' (and 'more is better'). Equality preference is the idea that a mean-preserving progressive income transfer as described above improves social welfare, though the magnitude of the change is not stated, thus allowing a wide range of attitudes to be incorporated, from almost totally inegalitarian to strongly egalitarian views. In fact, we can go from the Lorenz-dominated distribution to the dominating one by a finite sequence of mean-preserving progressive transfers. The assumption that 'more is better' is maintained but differences in mean income are controlled for: Lorenz curves are based on income shares rather than income levels.
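A minimal Python sketch of a Lorenz dominance check, assuming two unweighted samples of positive incomes (the function names and data are hypothetical, not from the paper). Each Lorenz curve is piecewise linear between its vertices at p = i/n, so comparing the two curves at the union of both sets of vertices is enough to establish dominance. The example applies a small progressive transfer that leaves everyone's rank unchanged.

```python
import numpy as np

def lorenz_points(x):
    """Vertices (p, L(p)) of the Lorenz curve: population shares and cumulative income shares."""
    s = np.sort(np.asarray(x, dtype=float))
    p = np.arange(s.size + 1) / s.size
    shares = np.concatenate(([0.0], np.cumsum(s))) / s.sum()
    return p, shares

def lorenz_dominates(a, b, tol=1e-12):
    """True if A's Lorenz curve lies nowhere below, and somewhere above, B's (A less unequal)."""
    pa, la = lorenz_points(a)
    pb, lb = lorenz_points(b)
    grid = np.union1d(pa, pb)  # both curves are linear between their own vertices
    diff = np.interp(grid, pa, la) - np.interp(grid, pb, lb)
    return bool(np.all(diff >= -tol) and np.any(diff > tol))

# A progressive transfer of 5 from the richest to the poorest person; ranks are unchanged.
before = [10, 20, 30, 40]
after = [15, 20, 30, 35]
print(lorenz_dominates(after, before), lorenz_dominates(before, after))  # True False
```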
The third result is Generalized Lorenz dominance and relates to 'social welfare' by taking account of both income levels and income inequality. A Generalized Lorenz curve is derived from a Lorenz curve by scaling each Lorenz ordinate vertically by overall mean income. Equivalently, the Generalized Lorenz curve plots, against cumulative population share p, the mean income among the poorest 100p% multiplied by p. The curvature inherited from the Lorenz curve reflects inequality; the height reflects income levels (note that the Generalized Lorenz curve value at p = 1 equals mean income).
Shorrocks' (1983) Generalized Lorenz dominance result states that finding the Generalized Lorenz curve for distribution B lying everywhere below the Generalized Lorenz curve for distribution A is equivalent to finding that B has less social welfare than A according to all social welfare functions that reflect 'more is better' and 'equality preference': we are concerned about both the overall size of the economic pie and the size of each of the slices going to individuals. The Generalized Lorenz criterion encapsulates the idea that a society may be better off if a rise in inequality is accompanied by a sufficiently large increase in income levels. In cases where two Generalized Lorenz curves do not cross, we can go from the dominated distribution to the dominating one by a sequence of mean-preserving progressive transfers, increments to individuals' incomes, or a combination of both.
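The same idea can be sketched for Generalized Lorenz curves, again assuming unweighted samples and hypothetical function names. In the example, distribution A is more unequal than B (B Lorenz-dominates A), yet A Generalized-Lorenz-dominates B because its richest member is much better off and nobody is worse off. This is the case of a rise in inequality accompanied by a sufficiently large increase in income levels.

```python
import numpy as np

def generalized_lorenz_points(x):
    """Vertices (p, GL(p)); GL(p) = p * (mean income of the poorest 100p%) = mean * L(p)."""
    s = np.sort(np.asarray(x, dtype=float))
    p = np.arange(s.size + 1) / s.size
    gl = np.concatenate(([0.0], np.cumsum(s))) / s.size
    return p, gl

def gl_dominates(a, b, tol=1e-12):
    """True if A's Generalized Lorenz curve lies nowhere below, and somewhere above, B's."""
    pa, ga = generalized_lorenz_points(a)
    pb, gb = generalized_lorenz_points(b)
    grid = np.union1d(pa, pb)  # both curves are linear between their own vertices
    diff = np.interp(grid, pa, ga) - np.interp(grid, pb, gb)
    return bool(np.all(diff >= -tol) and np.any(diff > tol))

# Hypothetical incomes: A is more unequal than B but has a higher top income.
a = [10, 20, 30, 60]
b = [10, 20, 30, 40]
print(gl_dominates(a, b))                   # True
print(generalized_lorenz_points(b)[1][-1])  # 25.0, the mean income of b
```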
If we do not find dominance in a distributional comparison, we cannot order the distributions without additional assumptions about the nature of the social welfare function. A numerical index makes these assumptions specific. Even if two CDFs cross, we can compare mean incomes (the arithmetic mean is one index of well-being levels; the geometric mean is another). If two Lorenz curves cross, we can assess inequality by comparing inequality indices. A leading example is the Gini coefficient, which is equal to twice the area between the Lorenz curve and the line of perfect equality.
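The area definition of the Gini coefficient can be checked numerically. This Python sketch (hypothetical names, unweighted data) computes the Gini coefficient as one minus twice the area under the Lorenz curve, which equals twice the area between the curve and the 45° line. It cross-checks the result against the equivalent mean absolute difference formula.

```python
import numpy as np

def gini_from_lorenz(x):
    """Gini = twice the area between the 45-degree line and the Lorenz curve (= 1 - 2 * area under it)."""
    s = np.sort(np.asarray(x, dtype=float))
    p = np.arange(s.size + 1) / s.size
    lorenz = np.concatenate(([0.0], np.cumsum(s))) / s.sum()
    area_under = np.sum((lorenz[1:] + lorenz[:-1]) / 2 * np.diff(p))  # trapezoids are exact here
    return 1 - 2 * area_under

def gini_mean_difference(x):
    """Same index via the mean absolute difference: sum over i, j of |xi - xj| / (2 n^2 mean)."""
    x = np.asarray(x, dtype=float)
    return np.abs(x[:, None] - x[None, :]).sum() / (2 * x.size ** 2 * x.mean())

incomes = [10, 20, 30, 40]
print(round(gini_from_lorenz(incomes), 6), round(gini_mean_difference(incomes), 6))  # 0.25 0.25
```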
The choice of index is a normative issue since different indices aggregate income differences in different ranges of the distribution in different ways. For this reason, economists are particularly fond of using families of inequality indices in which this sensitivity to differences in income shares in different ranges is varied in a systematic way.
For example, members of the Atkinson family of inequality indices are characterized by an 'inequality aversion' parameter ε > 0. The larger is ε, the greater the inequality aversion (the greater is equality preference), and the greater the weight that is put by the index on income differences at the bottom of the income distribution. Generalized Entropy indices are the other leading family of inequality indices and are characterized by an income sensitivity parameter α. With α = 0, we have the Mean Logarithmic Deviation; with α = 1, we have the Theil index; and setting α = 2 yields half the squared coefficient of variation. The smaller (and more negative) α is, the more 'bottom sensitive' the index (corresponding to larger values of ε for Atkinson indices). The larger α is, the more sensitive is the index to income differences at the top of the income distribution.
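The two index families can be sketched in Python using their standard textbook formulas. This is a minimal illustration, assuming unweighted data and strictly positive incomes (needed for the logarithms and negative powers), with hypothetical function names. Atkinson's index compares the 'equally-distributed-equivalent' income, a power mean of order 1 − ε (the geometric mean when ε = 1), with the arithmetic mean.

```python
import numpy as np

def atkinson(x, eps):
    """Atkinson index A(eps) = 1 - (equally-distributed-equivalent income) / mean; needs x > 0."""
    x = np.asarray(x, dtype=float)
    if eps == 1:
        ede = np.exp(np.mean(np.log(x)))  # geometric mean
    else:
        ede = np.mean(x ** (1 - eps)) ** (1 / (1 - eps))  # power mean of order 1 - eps
    return 1 - ede / x.mean()

def generalized_entropy(x, alpha):
    """GE(alpha): alpha = 0 gives the MLD, alpha = 1 the Theil index, alpha = 2 half the squared CV."""
    r = np.asarray(x, dtype=float) / np.mean(x)
    if alpha == 0:
        return -np.mean(np.log(r))
    if alpha == 1:
        return np.mean(r * np.log(r))
    return (np.mean(r ** alpha) - 1) / (alpha * (alpha - 1))

incomes = [10, 20, 30, 40]
for eps in (0.5, 1, 2):
    print(eps, round(atkinson(incomes, eps), 4))  # rises with inequality aversion eps
print(round(generalized_entropy(incomes, 2), 6))  # 0.1 = half the squared CV
```

One design note: with ε = 1 the Atkinson index and the Mean Logarithmic Deviation are linked by A(1) = 1 − exp(−GE(0)), so they always rank distributions identically.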
In sum, the income distribution toolbox consists of three sorts of dominance check, each with an associated graphical device, and many indices incorporating different social judgements about how to aggregate incomes and income gaps. These basic tools are used routinely by income distribution researchers. Not all of the tools are used by statistical agencies, but virtually all of the agencies (including the New Zealand, UK and US ones cited earlier) report estimates or provide access to raw data in a form that allows other researchers to apply the tools.

Assessing better-offness: a toolbox of methods for distributional comparisons of ordinal variables (e.g. life satisfaction)
The vast majority of users of subjective well-being data treat their well-being variables as cardinal rather than ordinal. For distributional comparisons, this has a great advantage: one can simply apply the toolbox described in the previous section. A leading example of current practice is the World Happiness Report, which considers trends over time within countries and cross-national differences in subjective well-being measured variously in terms of life evaluation (scored using a Cantril ladder) and positive and negative affect. Helliwell et al. (2019) summarize well-being levels using means and well-being inequalities using standard deviations. Similarly, a major study of subjective well-being trends in the UK summarizes trends in four subjective well-being measures in terms of their means (Bangham 2019: Figure 11). The OECD (2017) How's Life report examines cross-national differences in life satisfaction by comparing country means and country inequalities summarized by S80/S20 indices: the average score of the top 20% of scores divided by the average score of the bottom 20% of scores.
The fundamental problems with using the mean to summarize the distribution of an ordinal variable are that the mean is not a stable reference point and the ranking of a pair of distributions can change if one changes the scale. The arguments are set out clearly by Allison and Foster (2004). See also Schröder and Yitzhaki (2018) and Bond and Lang (2019).
For a cardinal variable like income, the mean is a stable reference point but, for ordinal variables, any transformation of the numerical category labels that preserves their ordering is legitimate. With these changes in scale, the mean's location relative to the distribution as a whole or a subset of observations can vary a lot. Allison and Foster (2004: 510) illustrate this problem with reference to a distribution of 17 responses on a five-category variable represented by (1,3,3,6,4), meaning 1 observation reporting the lowest category through to 4 observations reporting the fifth (top) category. If the scale is linear with labels taking integer values from 1 to 5, the distribution has a mean of 3.53, located between the 7th lowest and 8th lowest of the 17 observations. But if the scale is changed so that the label of the top category is 8 rather than 5, the mean is 4.24, lying between the 13th and 14th observations. To illustrate how distributional rankings according to the mean may change if the scale is changed, Allison and Foster consider distributions x = (2,2,2,2,2) and y = (3,2,1,1,3).
With the same integer linear scale as before, running from 1 to 5, the means of x and y are 3 and 2.9 respectively. But if the numerical label attached to the top category is changed to 10, the means of x and y are 4 and 4.4. In other words, the distributional ordering is reversed.
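The arithmetic in the two Allison and Foster illustrations is easy to reproduce. In this small Python sketch, each distribution is a list of counts across the five ordered categories and each scale is a list of labels. The helper name is hypothetical.

```python
def mean_of_ordinal(counts, labels):
    """Mean score when counts[k] respondents choose the category labelled labels[k]."""
    return sum(c * v for c, v in zip(counts, labels)) / sum(counts)

linear = [1, 2, 3, 4, 5]

# First example: where the mean sits depends on the label given to the top category.
z = [1, 3, 3, 6, 4]
print(round(mean_of_ordinal(z, linear), 2), round(mean_of_ordinal(z, [1, 2, 3, 4, 8]), 2))  # 3.53 4.24

# Second example: relabelling the top category as 10 reverses the ranking by the mean.
x = [2, 2, 2, 2, 2]
y = [3, 2, 1, 1, 3]
stretched = [1, 2, 3, 4, 10]
print(mean_of_ordinal(x, linear), mean_of_ordinal(y, linear))        # 3.0 2.9
print(mean_of_ordinal(x, stretched), mean_of_ordinal(y, stretched))  # 4.0 4.4
```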
This illustration is not a theoretical curiosum: Allison and Foster (2004) provide a real-world example using self-assessed health data.
Rankings of distributions of ordinal variables based on comparisons of standard inequality indices (the commonly-used ones cited in the previous section) are also not robust to changes of scale. This is because all the indices are based on ratios of each observation's well-being score to the mean score (this is why they are sometimes called 'relative' indices). The OECD's S80/S20 index is not robust for the same reason. The same problem rules out comparisons based on Lorenz and Generalized Lorenz curves.
The standard deviation of subjective well-being scores is often used as an inequality measure (as by the World Happiness Report, op. cit.). Inequality orderings based on this measure, and other 'absolute' indices such as the absolute Gini coefficient or inter-quartile range, are robust to absolute changes in scale, e.g. if the five-category linear scale cited earlier with labels running from 1 to 5 were changed to a scale running from 3 to 7. However, there is no reason to accept the legitimacy of this sort of scale change while ruling out other order-preserving transformations, an issue not considered by Kalmijn and Veenhoven (2005) in their defence of the standard deviation as an inequality measure for ordinal variables.
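The distinction can be seen in a few lines of Python (a sketch with a hypothetical helper name, reusing the distribution x = (2,2,2,2,2) from the Allison and Foster example). Adding a constant to every label leaves the standard deviation unchanged, but an equally legitimate order-preserving relabelling of the top category does not.

```python
import statistics

def ordinal_sd(counts, labels):
    """Population standard deviation when counts[k] respondents choose the category labelled labels[k]."""
    scores = [label for count, label in zip(counts, labels) for _ in range(count)]
    return statistics.pstdev(scores)

x = [2, 2, 2, 2, 2]                          # counts across five ordered categories
base = ordinal_sd(x, [1, 2, 3, 4, 5])
shifted = ordinal_sd(x, [3, 4, 5, 6, 7])     # absolute change in scale: every label + 2
stretched = ordinal_sd(x, [1, 2, 3, 4, 10])  # order-preserving relabelling of the top category
print(base == shifted, base == stretched)    # True False
```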
The arguments made so far imply that distributional comparisons of ordinal variables need new tools; the toolbox for cardinal variables should not be applied without adaptation or modification.

Dominance results and indices
For dominance results relating to subjective well-being levels, comparisons of CDFs continue to play a role, though they now refer to CDFs for discrete distributions (and hence step functions) rather than continuous distributions. Although comparisons of mean subjective well-being are not robust to scale changes, comparisons of quantiles are. The category that is the median, for instance, remains the median if the scale is changed. One can also compare proportions with responses in particular categories, for instance in the lowest category or the top two categories (this corresponds to reading off the CDF, or its complement, at those categories). For example, the UK's Office for National Statistics (2018) report on subjective well-being summarizes trends in distributions over time with reference to the proportion of persons with low well-being (scores of 0 to 4 on an 11-point integer scale running from 0 to 10), and the proportion with very high well-being (scores of 9 or 10).
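A small Python sketch (hypothetical helper names) makes the scale-invariance of these summaries concrete. The median category and the proportions in given categories depend only on the counts, never on the numerical labels, so any order-preserving relabelling leaves them unchanged. The median is taken here as the lowest category at which the cumulative share reaches one half, one common convention for discrete data.

```python
from itertools import accumulate

def median_category(counts):
    """0-based index of the lowest category at which the cumulative share reaches one half."""
    n = sum(counts)
    return next(k for k, cum in enumerate(accumulate(counts)) if 2 * cum >= n)

def share_in(counts, categories):
    """Proportion of respondents whose response is in the given (0-based) categories."""
    return sum(counts[k] for k in categories) / sum(counts)

counts = [1, 3, 3, 6, 4]  # the five-category Allison and Foster example
for labels in ([1, 2, 3, 4, 5], [1, 2, 3, 4, 8]):
    print(labels[median_category(counts)])  # 4 under both labellings
print(round(share_in(counts, [3, 4]), 3))   # share in the top two categories: 0.588
```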
The robustness of a distributional ordering by the mean holds only in the special case in which two well-being CDFs do not cross (Allison and Foster 2004), that is, when there is first-order dominance (F-dominance). In the conventional Lorenz-based approach to inequality, 'more unequal' is associated with a greater spread around the mean. This concept is inappropriate for ordinal data because the mean is scale-dependent. In contrast, the median is a natural reference point because its location does not change when the scale is changed. Allison and Foster (2004) therefore characterize greater inequality between two distributions with a common median category as a greater spread around that median (S-dominance). In terms of CDFs, if A is more spread out than B, the CDF for A is above (or nowhere below) the CDF for B up to the common median, and then is below (or nowhere above) the CDF for B above the median through to the top category. Allison and Foster (2004) show how one can construct an 'S-curve' from the CDF in order to implement dominance checks but, in practice, it is straightforward to compare CDFs to check S-dominance. If there is S-dominance, the CDFs for A and B must cross each other (once), and so, if there is F-dominance, there cannot be S-dominance as well.
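The CDF comparison just described can be sketched in Python. This is one reading of the condition and not Allison and Foster's own code; the function names are hypothetical. Distributions are counts across ordered categories, and exact fractions avoid rounding problems when CDF values are compared. The check requires a common median category m, CDF values for A at least as large as B's below m, and no larger from m upwards.

```python
from fractions import Fraction
from itertools import accumulate

def cdf(counts):
    """Exact CDF values for a vector of category counts."""
    n = sum(counts)
    return [Fraction(c, n) for c in accumulate(counts)]

def median_index(F):
    """0-based index of the lowest category at which the CDF reaches one half."""
    return next(k for k, f in enumerate(F) if f >= Fraction(1, 2))

def more_spread_around_median(a, b):
    """True if A has (weakly) greater spread than B around a common median category."""
    Fa, Fb = cdf(a), cdf(b)
    m = median_index(Fa)
    if median_index(Fb) != m:
        return False  # the criterion only applies to distributions with a common median
    below = all(fa >= fb for fa, fb in zip(Fa[:m], Fb[:m]))
    above = all(fa <= fb for fa, fb in zip(Fa[m:], Fb[m:]))
    return below and above and Fa != Fb

a = [2, 1, 4, 1, 2]  # hypothetical counts, more dispersed
b = [0, 1, 8, 1, 0]  # hypothetical counts, concentrated on the middle category
print(more_spread_around_median(a, b), more_spread_around_median(b, a))  # True False
```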
The 'greater spread around the median' idea leads naturally to numerical indices summarizing the degree of inequality (polarization). Allison and Foster (2004), and later Dutta and Foster (2013), propose a measure based on areas under an S-curve. Specifically, the AF index is the mean response for above-median categories minus the mean response for below-median categories. Although it has an intuitive interpretation, AF is scale-dependent.
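The scale dependence of an index of this kind can be illustrated with a simplified Python version that follows the verbal description above. This is an assumption-laden sketch, not the published AF formula: the function name is hypothetical, respondents in the median category itself are ignored, and both the above-median and below-median groups are assumed to be non-empty. Relabelling the top category changes the value of the index even though the ordering of categories is untouched.

```python
from itertools import accumulate

def af_gap(counts, labels):
    """Mean label among above-median categories minus mean label among below-median ones.

    Simplified reading of the AF index: the median category itself is ignored (an assumption),
    and both groups must be non-empty."""
    n = sum(counts)
    m = next(k for k, cum in enumerate(accumulate(counts)) if 2 * cum >= n)

    def group_mean(categories):
        total = sum(counts[k] for k in categories)
        return sum(counts[k] * labels[k] for k in categories) / total

    return group_mean(range(m + 1, len(counts))) - group_mean(range(m))

counts = [2, 1, 4, 1, 2]  # hypothetical counts across five categories
print(round(af_gap(counts, [1, 2, 3, 4, 5]), 3), round(af_gap(counts, [1, 2, 3, 4, 8]), 3))  # 3.333 5.333
```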
Scale-independent indices have been developed by Abul Naga and Yalcin (2008). Other families of inequality/polarization indices have been proposed by, for example, Apouey (2007), whose measures summarize 'distances' between the CDF and one half (the CDF value at the median) across the categories of the well-being variable. The smaller the aggregate distance, the greater the spread around the median, with different distance concepts leading to different indices. Apouey's P2(0.5) and P2(2) indices incorporate 'square root' and Euclidean distance, respectively. P2(2) is the $1 - l^2$ index of Blair and Lacy (2000). P2(1) is the Average Jump index. In the empirical analysis that follows, I estimate P2(0.5) and P2(2) in addition to ANY(1,1), ANY(1,2), and ANY(1,4) indices.
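A distance-based polarization index of this type can be sketched in Python. The function name is hypothetical, and the normalization constant 2^e/(K − 1) is an assumption of this sketch, chosen so that the index equals 0 when everyone reports the same category and 1 when half report the bottom category and half the top. Readers should consult Apouey (2007) for the exact definitions. Because the index uses only CDF values, it is unaffected by relabelling the categories. The farther each F(k) is from one half, the lower the index.

```python
from fractions import Fraction
from itertools import accumulate

def p2(counts, e):
    """Polarization index built from the distances |F(k) - 1/2| for categories k = 1..K-1.

    Normalized (an assumption of this sketch) to equal 0 when everyone is in one category
    and 1 when half the respondents are in the bottom category and half in the top."""
    n, K = sum(counts), len(counts)
    F = [Fraction(c, n) for c in accumulate(counts)][:-1]  # F(K) = 1 carries no information
    total_distance = sum(float(abs(f - Fraction(1, 2))) ** e for f in F)
    return 1 - 2 ** e * total_distance / (K - 1)

print(p2([0, 0, 5, 0, 0], 2), p2([5, 0, 0, 0, 5], 2))  # 0.0 1.0
```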
A totally different approach to measuring inequality in ordinal variables is taken by Cowell and Flachaire (2017). A fresh look at dominance results for ordinal variables is taken by Gravel et al. (2015) using the concept of a Hammond transfer. A progressive Hammond transfer is one in which there is a small shift in the frequency density mass downwards from some higher category combined with a small shift in frequency density mass upwards from some lower category, where the 'recipient' category in the former is greater than or the same as the 'recipient' category in the latter. This is analogous to the mean-preserving progressive transfer concept. I therefore report H− dominance checks below in addition to H+ dominance checks.
An advantage of the Gravel et al. (2015) approach to ordinal distributional comparisons is that the dominance checks are not restricted to distributions with a common median, as is the case for Allison and Foster's (2004) S-dominance criterion. This is also true for a CF(γ) dominance check, but the relationship between this and Gravel et al.'s (2015) dual-H dominance is yet to be ascertained. Observe, finally, that H+ dominance is a form of second-order stochastic dominance and so, if there is F-dominance, there must also be H+ dominance.
In sum, I have shown that there is a toolbox for distributional comparisons of ordinal variables that is analogous to the more widely-known and commonly-used toolbox for income distribution comparisons. Both toolboxes consist of dominance checks and associated graphical devices, and also indices incorporating different social judgements about how to weight low values relative to high values of the well-being variable. The next Section illustrates the toolbox in action using data from the World Values Survey (WVS). Unfortunately, and unnecessarily, access to confidential unit record files from other New Zealand data sources on subjective well-being is currently restricted to residents with access to a Statistics New Zealand-approved datalab.

Comparisons of life satisfaction distributions: New Zealand
The WVS consists of nationally representative surveys in almost 100 countries using a common questionnaire that includes questions on a range of subjective well-being variables. Figure 1 shows the proportions of individuals in each of the ten life satisfaction categories, country by country. Australia, Britain, and the USA appear to have similarly shaped distributions, with relative frequencies rising toward the modal value (8) and then decreasing thereafter. The mode is also 8 for New Zealand and South Africa, but these two countries stand out for the relatively high fractions of individuals reporting that they are 'completely satisfied' (score of 10): 22.0% and 17.5% respectively, by contrast with 10.9% for Britain, 8.2% for Australia, and only 6.7% for the USA. South Africa is also distinctive for having relatively high fractions of individuals reporting very low life satisfaction, with almost 6% in the lowest two categories, whereas the fraction is less than half that for the other four countries (highest for Australia at 2.6%). The country means are, given the 1-10 scale, 7.9 (NZ), 7.3 (AU), 7.6 (GB), 7.3 (US), and 7.0 (ZA). All the pairwise differences in means are statistically significant with the exception of the AU-US comparison. New Zealand's CDF lies nowhere above those for Australia, the USA, and South Africa (Figure 2), and therefore we also see that New Zealand H+ dominates each of these countries: New Zealand's H+ curve lies everywhere on or below that for these three. However, New Zealand's H+ curve lies everywhere on or above that for Britain. Although there was no first-order dominance (F-dominance), Britain's distribution second-order dominates New Zealand's. Again, the relatively small differences between the pairs of curves compared warn us that differences may not be statistically significant.
For 'pure' inequality dominance according to Gravel et al.'s (2015) results, we require the dual dominance criterion to hold, i.e. both H and H+ dominance.
ANY(1,4) is more sensitive to differences below the median than ANY(1,1). According to ANY(1,4), life satisfaction inequality is 41% higher in South Africa than in New Zealand and 35% higher than in Britain. These cross-country differentials are slightly smaller than the corresponding ones for ANY(1,1). Going from ANY(1,1) to ANY(1,4) makes more of a difference when summarizing how much greater New Zealand's inequality is than Australia's: the ratio of the country estimates is 1.06 for the former index and 1.17 for the latter. (This can also be seen in Figure 5.) Changing the index may also change the ordering of the countries in terms of life satisfaction inequality. As Figure 5 shows, and as we would expect from the dominance analysis, the differences in country rankings largely concern the ordering of Australia and the USA relative to each other. For seven of the indices, inequality is very slightly greater in the USA; for P2(2) and the standard deviation, US inequality is slightly lower than Australia's.
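P2(2) here, and P2(0.5) later, are members of a parameterized index family whose formula is not restated in this section. The sketch below assumes the form usually attributed to Apouey (2007): P2(ε) = 1 - (2^ε/(K - 1)) Σ_{k=1}^{K-1} |F_k - 1/2|^ε for ε > 0, where F_k is the cumulative share up to category k. Under that assumption the index is 0 when everyone is in one category and 1 when half the population is in the bottom category and half in the top. Treat the exact normalization as an assumption to be checked against the paper's own definition; `p2_index` is a hypothetical helper name.

```python
def p2_index(shares, eps):
    """Polarization index P2(eps) for K ordered categories (assumed Apouey-style form).

    P2(eps) = 1 - (1 / (K - 1)) * sum_{k=1}^{K-1} |2 * F_k - 1| ** eps,
    which equals 1 - (2**eps / (K - 1)) * sum |F_k - 1/2| ** eps.
    F_k is the cumulative share up to and including category k; each term lies in [0, 1].
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    K = len(shares)
    if K < 2:
        raise ValueError("need at least two categories")
    total = 0.0
    cumulative = 0.0
    for p in shares[:-1]:  # F_K = 1 for every distribution, so it is excluded
        cumulative += p
        total += abs(2.0 * cumulative - 1.0) ** eps
    return 1.0 - total / (K - 1)


# Hypothetical five-category distribution (NOT WVS data), evaluated at two parameter values.
example = [0.1, 0.2, 0.4, 0.2, 0.1]
print(p2_index(example, 0.5), p2_index(example, 2))
```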
<Figure 5 near here>
However, the confidence intervals shown in Figure 5 remind us to be cautious about making inferences about cross-national differences. For example, the null hypothesis that inequality is the same in Australia and the USA cannot be rejected for CF(0) (test t-statistic 0.37, far below the conventional critical value of about 2), P2(2) (t-statistic 0.98), or the standard deviation (t-statistic 1.55). On the other hand, the differences between South Africa and each of the other four countries are statistically significant. This is not only because estimated inequality is so high for South Africa, but also because the sample for that country is much larger than for the others, so its estimates are more precise (confidence intervals are narrower). For example, the null hypothesis that life satisfaction inequality is the same in New Zealand and South Africa is decisively rejected: the test t-statistics include 5.1 for CF(0), 9.9 for ANY(1,1), 4.9 for ANY(1,4), and 9.2 for P2(2).
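The t-statistics quoted here compare two estimates of the same index. A minimal sketch, assuming the two samples are independent (reasonable for separate national surveys) and that a standard error is available for each estimate, is the usual difference-in-estimates statistic t = (I_A - I_B)/sqrt(se_A^2 + se_B^2), compared with the rule-of-thumb critical value of 2 used in the text. How the standard errors are obtained (survey-design formulae, a bootstrap, or otherwise) is not specified here, and the numbers below are hypothetical.

```python
from math import sqrt


def diff_t_stat(est_a, se_a, est_b, se_b):
    """t-statistic for H0: the index takes the same value in populations A and B.

    Assumes the two estimates come from independent samples, so the variance of
    the difference is the sum of the two sampling variances.
    """
    return (est_a - est_b) / sqrt(se_a ** 2 + se_b ** 2)


def rejects_equality(t, critical=2.0):
    """Two-sided test using the rule-of-thumb critical value of 2 (about the 5% level)."""
    return abs(t) > critical


# Hypothetical estimates and standard errors (NOT taken from the paper).
t = diff_t_stat(0.5, 0.03, 0.3, 0.04)
print(round(t, 2), rejects_equality(t))  # prints 4.0 True
print(rejects_equality(0.37))            # the AU-US CF(0) statistic from the text: False
```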
In sum, there is one clear similarity between the differences in the distributions of life satisfaction and of income across the five countries: for both measures of well-being, South Africa has the lowest average well-being and the highest inequality. Otherwise, however, rankings of the other four countries are not the same. For example, by contrast with the rankings shown in Table 1, it is not the USA that has the highest average life satisfaction or the second-highest level of life satisfaction inequality. And, compared with the USA, New Zealand has higher levels of average life satisfaction and higher life satisfaction inequality.

Life satisfaction in New Zealand: trends over time and differences across groups
The toolbox of methods for ordinal data illustrated in Section 5 can also be applied to other types of comparisons. One can address questions such as whether the distribution of life satisfaction has improved over time and how life satisfaction distributions differ between groups within a given population. Differences by sex and by age are common indicators of stratification in a society. Differences by ethnic background are also important but they cannot be examined using these WVS data.
For brevity, all the charts summarizing the comparisons undertaken (in the same formats as those discussed in Section 5) are reported in two Appendices (A for trends over time; B for differences across groups). In this section, I simply highlight the main findings without citing the charts on which they are based.
However, the differences between the relevant H curves are very small once again.
Estimates of the nine inequality indices used earlier confirm that inequality is lower in 2011 than in 1998, though the estimated decrease depends on which index is used. For example, according to CF(0) the decrease is less than 1% (from 0.794 to 0.792).
The decrease registered by P2(2) is one of the largest but, at around 5%, is still small.
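For concreteness, the proportionate change in CF(0) between 1998 and 2011 quoted above works out as:

```latex
\frac{0.792 - 0.794}{0.794} \times 100\% \approx -0.25\%
```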
Inequality in 2004 is estimated to be lower than in 1998 according to all nine indices, but whether it is higher or lower than in 2011 depends on which index is employed. For six indices, inequality is lower in 2004 but, for the two bottom-sensitive indices ANY(1,2) and ANY(1,4) and also for P2(0.5), it is higher. Again, the changes over time are small, and none of the pairwise differences between years is statistically significant.
Overall, the data suggest that both average life satisfaction and its inequality fell between 1998 and 2011. These changes contrast with the rise in mean and median equivalized household income, and the relatively unchanged income inequality, over the same period reported by Perry (2018). However, one cannot state with confidence that the changes in the distribution of life satisfaction are statistically significant.

Differences across groups: life satisfaction in 2004, by sex and by age group
Although differences by sex and by age are of great interest, it is difficult to draw robust conclusions about life satisfaction differences using WVS data. As before, the issue is one of statistical rather than substantive significance. This is particularly pertinent when looking at subgroup differences because WVS sample sizes for subgroups are small, especially for the age breakdowns. Indeed, breakdowns by sex and age combined, or by other interesting combinations of characteristics, lead to even smaller sample sizes and are largely worthless.
The frequency distributions for men and women look different. Although the medians are the same (8) and the means very similar (7.92 and 7.87 respectively), the spread across the top four categories differs: in particular, relatively more women than men report 10 rather than 9. Comparisons of CDFs reveal neither F- nor S-dominance. The Generalized Lorenz curve for women's life satisfaction status lies everywhere on or below that for men, so inequality is greater for women according to all CF(γ) indices, but the differences between the curves are negligible. The conclusion that inequality is higher among women is also supported by the comparisons of H curves: there is dual dominance, though the H curves largely coincide. The estimates of all nine inequality indices are larger for women than for men, but the differences are not statistically significant. Among the largest differences are those registered by P2(2) and ANY(1,4), 10% and 8% respectively, but the test t-statistics are only 1.7 and 1.9 respectively.
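The F- and S-dominance checks referred to here can be sketched in a few lines. This assumes that F-dominance means one CDF lies everywhere on or below the other, and that S-dominance is the median-preserving spread criterion associated with Allison and Foster (2004), which applies only when the two distributions share a median, as men's and women's do here. The category shares below are hypothetical, and the function names and direction convention ('less spread than') are illustrative assumptions.

```python
from itertools import accumulate


def cdf(shares):
    """Cumulative shares F(1), ..., F(K); categories ordered from lowest to highest."""
    return list(accumulate(shares))


def median_category(shares):
    """1-based index of the first category at which F(k) reaches 0.5."""
    return next(k for k, F in enumerate(cdf(shares), start=1) if F >= 0.5)


def f_dominates(a, b, tol=1e-12):
    """First-order (F-) dominance: F_a(k) <= F_b(k) at every category k,
    so a has no larger share at or below any category than b does."""
    return all(fa <= fb + tol for fa, fb in zip(cdf(a), cdf(b)))


def less_spread_than(a, b, tol=1e-12):
    """Median-preserving spread (S-) check, assumed Allison-Foster form.

    a is less spread out than b if both have the same median category m,
    F_a <= F_b for categories below m, and F_a >= F_b from m upward.
    """
    m = median_category(a)
    if m != median_category(b):
        return False  # the criterion is only defined for a common median
    Fa, Fb = cdf(a), cdf(b)
    below = all(Fa[i] <= Fb[i] + tol for i in range(m - 1))
    from_median_up = all(Fa[i] >= Fb[i] - tol for i in range(m - 1, len(Fa)))
    return below and from_median_up


# Hypothetical five-category distributions (NOT WVS data).
higher = [0.0, 0.1, 0.2, 0.3, 0.4]
lower = [0.4, 0.3, 0.2, 0.1, 0.0]
concentrated = [0.0, 0.1, 0.8, 0.1, 0.0]
spread = [0.2, 0.1, 0.4, 0.1, 0.2]

print(f_dominates(higher, lower))  # True
print(f_dominates(concentrated, spread), f_dominates(spread, concentrated))  # CDFs cross: False False
print(less_spread_than(concentrated, spread))  # True: same median (3), less mass in both tails
```

The `tol` argument only absorbs floating-point rounding in the cumulative sums; with estimated shares, whether a dominance result is statistically meaningful is a separate question.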
Overall, one cannot state with confidence that the life satisfaction distributions of New Zealand men and women differed in 2004, which is a positive conclusion if the social goal is to eliminate gender divides in well-being. Note also that life satisfaction questions provide an individual-based perspective that conventional income distribution analysis cannot provide, because of the standard assumption that income is equally distributed within families or households.
There appear to be some remarkable differences across age groups in life satisfaction, but drawing robust conclusions about them is bedevilled by small sample sizes. The outstanding feature of the subgroup frequency distributions is the extremely large fraction of individuals aged 60+ reporting that they are completely satisfied (a score of 10): 36%, around twice the fraction for the other two groups (16% for individuals aged 30-59 and 17% for individuals aged 18-29). Relatedly, the median life satisfaction score for the 60+ group
Overall, it appears that the prevalence of extremely high life satisfaction levels in New Zealand (in 2004) is much greater for individuals aged 60+ than for younger people, but robust conclusions about inequality differences across age groups cannot be drawn because of small sample sizes.

Summary and conclusions
I have argued that we should compare distributions of ordinal data using methods appropriate to such data, rather than applying methods developed for cardinal data as is commonly done.
That the application of cardinal methods sometimes yields the same substantive conclusion is not a strong argument against my injunction because there can also be situations in which different methods lead to different findings and, in any case, there is now a well-developed toolbox for undertaking distributional comparisons of subjective well-being data. As I have shown, easily-applicable tools are available to 'show the data' and to compare well-being levels and inequality using dominance checks and indices.
Of course, areas for further research on aspects of the subjective well-being toolbox remain. One example is the relationship between H curve dual dominance and rankings by Cowell-Flachaire (2017) inequality indices. Another is the development of appropriate methods for statistical inference, accompanied by software, enabling practitioners to implement the methods routinely.
Better methods also need to be accompanied by better data. My empirical illustrations based on WVS data have thrown up some intriguing results about how life satisfaction in New Zealand compares with that in other countries, how it has changed over time, and how it differs by sex and age. These perspectives on the nation's well-being differ in several ways from those provided when well-being is measured using income. However, the small sample sizes available in the WVS also substantially constrain our ability to conclude whether a finding is statistically significant in addition to being substantively significant. This is particularly so for subgroup breakdowns. Access to the unit record data from New Zealand's own subjective well-being data sources should be improved in order to maximize the returns to the large investments that they represent. At present, researchers based outside New Zealand cannot access the confidential unit record files, and New Zealand-based researchers can only do so from within Statistics New Zealand-approved datalabs. I support Statistics New Zealand's goals of preserving respondent confidentiality and privacy.
Yet, at the same time, experience elsewhere shows that it is possible to provide better data access while still meeting these goals. Given the priority that New Zealand now places on information about subjective well-being, it is important to increase the amount of data analysis undertaken on the topic. Facilitating an increase in the pool of analysts is one important way to do this.
Notes. Weighted estimates from WVS data. The country means are: 7.9 (NZ), 7.3 (AU), 7.6 (GB), 7.3 (US), and 7.0 (ZA). The median is 8 for all countries except ZA (7).
Notes. The median is 8 and the mean is 7.9 for both men and women.