Graphical Measures Summarizing the Inequality of Income of Two Groups

Abstract. Recently, Gastwirth proposed two transformations, p*(q) and m*(q), of the Lorenz curve, which calculate the proportion of a population, cumulated from the poorest or from the middle, respectively, needed to have the same amount of income as the top 100q%. Economists and policy makers are often interested in the comparative status of two groups, for example, females versus males or minority versus majority. This article adapts and extends the concept underlying the p*(q) and m*(q) curves to provide analogous curves comparing the relative status of two groups. Now one calculates the proportion of the minority group, cumulated from the bottom or the middle, needed to have the same total income as the top 100q% of the majority group (after adjusting for sample size). The areas between these curves and the line of equality are analogous to the Gini index. The methodology is used to illustrate the change in the degree of inequality between males and females, as well as between black and white males, in the United States between 2000 and 2017, and it can also be used to examine disparities between the health expenditures of minorities and white people.


Introduction
Recently, Gastwirth (2016) proposed a transformation p*(q) of the Lorenz curve, which calculates the proportion of a population, cumulated from the poorest, needed to have the same amount of income as the top 100q%. Economists and policy makers are often interested in the comparative status of two groups, for example, females and males or minority and majority. This article adapts and extends the concept underlying the p*(q) curve to provide analogous curves comparing the relative status of two groups. Because the minority group usually has fewer individuals than the majority, the data for the smaller group are adjusted so that the sample sizes are equal. The new curve b*(q), which adapts p*(q), is based on the fraction of the minority group, cumulated from the bottom, needed to have the same income as the top 100q% of the majority. The area between the b*(q) curve and the line of equality is analogous to the Gini index. When the b*(q) curve is higher than the p*(q) curve for the majority group, the area between the two curves can be considered the additional inequality the minority group experiences relative to the majority group, compared with the inequality within the majority group.
Two related curves, based on the fraction of the minority group cumulated from the top or from the middle needed to have the same income (adjusted for sample size) as the top 100q% of the majority, are also explored. The methodology is used to illustrate the change in the degree of income inequality between males and females, as well as between black and white males, in the United States between 2000 and 2017. The curves indicate that there has been a little progress in equalizing female and male incomes, primarily resulting from an increase in income for females in the upper portion of the income distribution. On the other hand, the curves and corresponding areas comparing the incomes of black males to those of white males have hardly changed during the period.
Section 2 presents the formulas defining the original p*(q) curve and the extensions proposed here. Section 3 illustrates the proposed curves and their area-based measures for the Pareto distribution. Section 4 presents the results comparing male and female incomes, and Section 5 presents the comparison of black and white male incomes. Unfortunately, the data reported by the Census Bureau for the incomes of black females were not sufficiently detailed, especially in the upper regions, to accurately calculate the curves for black and white females. Section 6 summarizes the results and makes suggestions for more informative summaries of income and earnings data. The section ends with a brief description of the applicability of the new measures to summarizing data on health disparities.

The Graphical Measures
The most commonly used graphical measure for summarizing an income distribution is the Lorenz curve, defined mathematically as $L(p) = \mu^{-1}\int_0^p F^{-1}(t)\,dt$ (Gastwirth 1971; Cowell 2011). Gastwirth (2016) suggested a related measure of income inequality based on the fraction p*(q) of units, cumulated from the lowest, needed for their share of the total income to equal 1 − L(1 − q), the share of the top 100q%. Thus, p*(q) is the value of p* for which L(p*) = 1 − L(1 − q), or

$$p^*(q) = L^{-1}\bigl(1 - L(1-q)\bigr). \tag{1}$$

The curve 1 − L(1 − q) was introduced by Leimkuhler (1967), and its statistical properties were studied by Sarabia (2008) and Sarabia et al. (2010). Its relationship to Lorenz ordering and related literature is discussed by Arnold (2015, p. 169). Arnold and Sarabia (2018, chap. 6) studied the relationship between the Lorenz and Leimkuhler curves and other curves used to summarize income and earnings data. A related curve, based on the ratio [1 − L(1 − q)]/L(q), is described by Jasso (2018).

When comparing two populations, say minority and majority, or females and males, the analogous measure is based on the fraction of the minority group needed to have the same total income as the top 100q% of the majority. Because the two groups are usually of different sizes, one needs to adjust the population size, and hence the total income, of the smaller group (usually the minority). Thus, one multiplies the number of minorities by the ratio r of the size of the majority to the size of the minority and assumes that the additional minorities have the same income distribution as in the original data. The distribution of income within the minority population therefore remains the same, but their total income is r times the original total. The curves will be described in terms of comparing female incomes to those of males, so μ_f (μ_m) denotes the female (male) mean and L_f(t) (L_m(t)) the Lorenz curve of female (male) incomes.
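As a minimal numerical sketch of p*(q) = L^{-1}(1 − L(1 − q)), the code below builds the empirical Lorenz curve of a sample and inverts it. It assumes strictly positive individual incomes and a piecewise-linear interpolation between the Lorenz-curve vertices; both are illustrative choices, not taken from the article, and the helper names `lorenz_points` and `p_star` are hypothetical.

```python
import numpy as np


def lorenz_points(incomes):
    """Vertices (p_i, L_i) of the empirical Lorenz curve, starting at (0, 0)."""
    x = np.sort(np.asarray(incomes, dtype=float))
    p = np.arange(x.size + 1) / x.size
    L = np.concatenate(([0.0], np.cumsum(x))) / x.sum()
    return p, L


def p_star(q, p, L):
    """p*(q) = L^{-1}(1 - L(1 - q)) for the piecewise-linear Lorenz curve.

    Assumes strictly positive incomes, so L is strictly increasing and
    np.interp can invert it by swapping the roles of p and L.
    """
    q = np.asarray(q, dtype=float)
    top_share = 1.0 - np.interp(1.0 - q, p, L)  # income share of the top 100q%
    return np.interp(top_share, L, p)           # bottom fraction with that share


incomes = [1, 1, 1, 5]          # hypothetical sample
p, L = lorenz_points(incomes)
print(p_star(0.25, p, L))       # ≈ 0.85: the top 25% hold 62.5% of the income
```

With perfectly equal incomes the sketch returns p*(q) = q, the line of equality.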
Letting L_f(p) be the Lorenz curve of the adjusted income data of the females and L_m(p) be the Lorenz curve of the males, the analogue of the p*(q) curve is

$$b^*(q) = L_f^{-1}\Bigl(\frac{\mu_m}{\mu_f}\bigl[1 - L_m(1-q)\bigr]\Bigr). \tag{2}$$

Formula (2) is a consequence of the fact that when both sample sizes equal N, b*(q) is determined by the requirement that N μ_f L_f(b*(q)), the total income of the bottom b*(q) fraction of the adjusted females, equal the total income of the top 100q% of the majority, that is, N μ_m [1 − L_m(1 − q)]. If one cumulates the incomes of the minority group from the top, so that t*(q) denotes the top fraction of the adjusted minority group needed to have the same income as the top 100q% of the majority, then

$$t^*(q) = 1 - L_f^{-1}\Bigl(1 - \frac{\mu_m}{\mu_f}\bigl[1 - L_m(1-q)\bigr]\Bigr). \tag{3}$$

Similarly, if one cumulates the adjusted female income distribution from the middle, outward from the median, then m*_f(q) is defined by

$$L_f\Bigl(\tfrac{1}{2} + \tfrac{m_f^*(q)}{2}\Bigr) - L_f\Bigl(\tfrac{1}{2} - \tfrac{m_f^*(q)}{2}\Bigr) = \frac{\mu_m}{\mu_f}\bigl[1 - L_m(1-q)\bigr]. \tag{4}$$

Equations (2) and (4) correspond to formulas (3) and (4) in Gastwirth (2016). The term 1 − L_m(1 − q), the fraction of the total income of males that the top 100q% of them have, occurs in both Equations (3) and (4). Equation (3) gives the fraction t*(q) of the highest female incomes needed to have the same income as the top 100q% of males. In contrast, Equation (4) gives the middle fraction m*_f(q) of females needed to have the same income as the top 100q% of males.
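The three between-group curves can be evaluated directly once the two Lorenz curves and means are known. The sketch below is written under stated assumptions: the Lorenz curves are passed as Python callables (here the Pareto form L(p) = 1 − (1 − p)^τ used in Section 3, with α_m = 2 and α_f = 3), the middle fraction is taken to be cumulated symmetrically about the female median (an interpretation assumed here), and a curve returns NaN once the top 100q% of males out-earn the entire adjusted female population. All function names are hypothetical.

```python
import math


def required_female_share(q, mu_f, mu_m, L_m):
    """(mu_m / mu_f) * [1 - L_m(1 - q)]: share of adjusted female income that
    equals the income of the top 100q% of males (equal population sizes)."""
    return (mu_m / mu_f) * (1.0 - L_m(1.0 - q))


def b_star(q, mu_f, L_f_inv, mu_m, L_m):
    """Bottom fraction of females; NaN when no fraction suffices."""
    s = required_female_share(q, mu_f, mu_m, L_m)
    return L_f_inv(s) if s <= 1.0 else math.nan


def t_star(q, mu_f, L_f_inv, mu_m, L_m):
    """Top fraction of females; NaN when no fraction suffices."""
    s = required_female_share(q, mu_f, mu_m, L_m)
    return 1.0 - L_f_inv(1.0 - s) if s <= 1.0 else math.nan


def m_star_f(q, mu_f, L_f, mu_m, L_m, tol=1e-12):
    """Middle fraction, ASSUMED symmetric about the median. Solved by bisection,
    since L_f(1/2 + m/2) - L_f(1/2 - m/2) increases from 0 to 1 in m."""
    s = required_female_share(q, mu_f, mu_m, L_m)
    if s > 1.0:
        return math.nan
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if L_f(0.5 + mid / 2) - L_f(0.5 - mid / 2) < s:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Pareto example with scale 1: tau = 1 - 1/alpha, mean = alpha / (alpha - 1)
tau_m, tau_f, mu_m, mu_f = 0.5, 2 / 3, 2.0, 1.5


def L_m(p):
    return 1.0 - (1.0 - p) ** tau_m


def L_f(p):
    return 1.0 - (1.0 - p) ** tau_f


def L_f_inv(y):
    return 1.0 - (1.0 - y) ** (1.0 / tau_f)


q = 0.25
print(b_star(q, mu_f, L_f_inv, mu_m, L_m))  # ≈ 0.8075
print(m_star_f(q, mu_f, L_f, mu_m, L_m))
print(t_star(q, mu_f, L_f_inv, mu_m, L_m))  # ≈ 0.5443
```

Bisection is used for the middle curve because, unlike b*(q) and t*(q), it has no closed-form inverse even for the Pareto case; the ordering t*(q) ≤ m*_f(q) ≤ b*(q) is what one expects, since the top, middle, and bottom slices of equal size hold decreasing shares of income.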
Analogous to the Gini index, twice the area between each of the b*(q), m*_f(q), and t*(q) curves and the line of equality will be referred to as the Income Shortfall Index (ISI) for that measure.
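Twice the area between a curve and the line of equality can be approximated with the trapezoidal rule. This sketch assumes the curve is evaluated on a grid over [0, 1], that the line of equality is the diagonal (the value of b*(q) when all incomes in both groups are equal), and that a curve is held at 1 for values of q beyond the point where it reaches 1; these conventions are assumptions made for illustration. The example curve, b*(q) = 1 − (1 − (4/3)q^{1/2})^{3/2}, follows from the Pareto parameters α_m = 2 and α_f = 3 of Section 3. The name `income_shortfall_index` is hypothetical.

```python
import numpy as np


def trapezoid(y, x):
    """Trapezoidal rule, written out to avoid depending on a NumPy version."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def income_shortfall_index(q, curve):
    """Twice the area between a curve and the diagonal on a grid q in [0, 1].

    NaN values (where the curve has already reached 1) are set to 1, and
    values are capped at 1; both are assumed conventions.
    """
    q = np.asarray(q, dtype=float)
    c = np.nan_to_num(np.asarray(curve, dtype=float), nan=1.0)
    c = np.minimum(c, 1.0)
    return 2.0 * trapezoid(c - q, q)


q = np.linspace(0.0, 1.0, 2001)
# Pareto b*(q); the base turns negative (NaN) for q > 0.5625.
with np.errstate(invalid="ignore"):
    b = 1.0 - (1.0 - (4.0 / 3.0) * np.sqrt(q)) ** 1.5
print(income_shortfall_index(q, b))
```

An ISI of 0 corresponds to the line of equality and an ISI of 1 to a curve that jumps to 1 immediately, so values near 1 signal that a very small top fraction of males matches the whole female population.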
One way of illustrating the gap between female and male incomes is to compare the b*(q) curve, giving the fraction of females needed to have the same total income as the top 100q% of males, with the corresponding fraction p*(q) of males needed to have the same income as the top 100q% of males. If the inequality of male incomes is considered an approximation to the inherent variability in the skills and abilities of people, then the difference b*(q) − p*(q) and the corresponding area between the two curves measure the excess shortfall in the incomes of females relative to those of males, beyond the inherent distribution of abilities within each group.
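The excess shortfall can be summarized numerically by the area between b*(q) and the majority's own p*(q). As a sketch, assuming both curves are evaluated on a common grid and using the Pareto parameters of Section 3 (α_m = 2, α_f = 3), for which p*(q) = 1 − (1 − q^{1/2})^2 follows from L_m(p) = 1 − (1 − p)^{1/2}. The code reports the plain area; doubling it would put it on the same scale as the ISI. The name `area_between` is hypothetical.

```python
import numpy as np


def area_between(q, upper, lower):
    """Trapezoidal area of (upper - lower) over the grid q.

    NaN values (a curve that has already reached 1) are treated as 1,
    an assumed convention.
    """
    q = np.asarray(q, dtype=float)
    gap = np.nan_to_num(upper, nan=1.0) - np.nan_to_num(lower, nan=1.0)
    return float(np.sum(0.5 * (gap[1:] + gap[:-1]) * np.diff(q)))


q = np.linspace(0.0, 1.0, 2001)
p_star_m = 1.0 - (1.0 - np.sqrt(q)) ** 2                    # males vs. top males
with np.errstate(invalid="ignore"):
    b_star = 1.0 - (1.0 - (4.0 / 3.0) * np.sqrt(q)) ** 1.5  # females vs. top males
excess = area_between(q, b_star, p_star_m)
print(excess)
```

Tracking this area over time, as the text suggests, only requires recomputing both curves from each year's Lorenz curves.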
While the above interpretation does not account for other forms of discrimination within each of the male and female distributions, for example, against a racial, ethnic, or religious subgroup, changes over time in the difference b*(q) − p*(q) should indicate whether females are making economic progress.

Measures for the Pareto Distribution
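To illustrate the proposed curves and their area-based measures, they will be applied to the Pareto distribution, defined by $F(x) = 1 - x^{-\alpha}$ for $x \ge 1$ and $\alpha > 1$. This distribution (Arnold 2015) has mean $\mu = \alpha/(\alpha - 1)$ and Lorenz curve $L(p) = 1 - (1 - p)^{\tau}$, where $\tau = 1 - 1/\alpha$. Suppose we set α_m = 2 and α_f = 3; then τ_m = 1/2, τ_f = 2/3, μ_m = 2, and μ_f = 3/2. Since $1 - L_m(1 - q) = q^{1/2}$ and $L_f^{-1}(y) = 1 - (1 - y)^{3/2}$, we have

$$b^*(q) = 1 - \Bigl(1 - \tfrac{4}{3}q^{1/2}\Bigr)^{3/2}, \qquad 0 \le q \le 0.5625.$$

Note that 1 − (4/3)q^{1/2} = 0 when q = (3/4)^2 = 0.5625. This means that the income of the top 56.25% of males equals the total income of all females. Similarly,

$$t^*(q) = \Bigl(\tfrac{4}{3}\Bigr)^{3/2} q^{3/4}, \qquad 0 \le q \le 0.5625.$$

These two curves are illustrated in Figure 1. Figure 2 compares the b*(q) curve, which compares the bottom portion of female incomes to the top males, to the corresponding p*(q) curve, which compares the bottom portion of male incomes to the top males.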
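The Pareto closed forms can be checked numerically from the general definitions. This sketch assumes the scale parameter is 1 (so that μ = α/(α − 1), matching the stated means) and holds the curves at 1 beyond the crossing point q0 defined by μ_m q0^{τ_m} = μ_f, an assumed convention; the helper names are hypothetical.

```python
# Pareto example: F(x) = 1 - x**(-alpha) for x >= 1 (scale fixed at 1).
alpha_m, alpha_f = 2.0, 3.0
tau_m, tau_f = 1 - 1 / alpha_m, 1 - 1 / alpha_f                # Lorenz exponents
mu_m, mu_f = alpha_m / (alpha_m - 1), alpha_f / (alpha_f - 1)  # means

# The curves reach 1 at q0, where mu_m * q0**tau_m = mu_f.
q0 = (mu_f / mu_m) ** (1 / tau_m)


def required_share(q):
    """(mu_m / mu_f) * [1 - L_m(1 - q)], using 1 - L_m(1 - q) = q**tau_m."""
    return (mu_m / mu_f) * q ** tau_m


def b_star(q):
    """Bottom fraction of females: L_f^{-1}(s); held at 1 for q >= q0."""
    return 1 - max(0.0, 1 - required_share(q)) ** (1 / tau_f)


def t_star(q):
    """Top fraction of females: 1 - L_f^{-1}(1 - s) = s**(1/tau_f), capped at 1."""
    return min(1.0, required_share(q) ** (1 / tau_f))


print(q0)                          # 0.5625
print(b_star(0.25), t_star(0.25))  # ≈ 0.8075 and ≈ 0.5443
```

The `max` and `min` guards keep the fractional powers real-valued past q0, where Python would otherwise raise a negative number to a non-integer power and return a complex result.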

Comparison of the Female-Male Income Disparities in 2000 and 2017
This section presents the various curves and associated ISIs comparing the incomes received by females to the incomes of males, using income reported in the Current Population Survey (CPS) conducted by the U.S. Census Bureau. Table 1 gives some summary measures (mean and quartiles) for men and women in 2000 and 2017. The Lorenz curves underlying the three proposed curves were estimated from the publicly available summary of the CPS income data. These data report both the number of individuals in each income interval and their average income, which provides more information than data without the group means, as noted by Krieger (1983) and Lyon et al. (2016); therefore, the split histogram technique of Cowell and Mehta (1982) was used. The split histogram method works as follows: suppose the income interval ($a, $b) contains a fraction γ of the sample, and their mean is m. The method divides the interval (a, b) into two subintervals, (a, m) and (m, b), which contain the fractions γ(b − m)/(b − a) and γ(m − a)/(b − a) of the sample, respectively. The data within each subinterval are assumed to follow a uniform distribution; with these fractions, the mean of the interval remains m.

Figures 3 and 4 show the b*, m*, and t* curves for 2000 and 2017, respectively. All three curves in Figure 3 reach 1 when q = 0.188, because the income received by the top 18.8% of males in 2000 equaled the total income received by all females. While one can distinguish the m* curve from the b* curve, they are quite similar, as reflected by their ISIs of 0.954 and 0.913. This indicates that the middle portion of the female distribution did not fare much better than the lower portion relative to higher-income males in 2000. The t* curve, which cumulates the incomes of females from the top, is somewhat closer to the line of equality than the b* and m* curves, but it still reflects substantial inequality, with an ISI of 0.748.
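The split histogram step described above can be sketched as follows, together with the Lorenz-curve vertices it implies at the subinterval boundaries. The sketch assumes every interval has finite bounds and a reported mean strictly inside it; open-ended top intervals, which real CPS tables contain, would need an additional tail assumption not covered here. The grouped data in the usage line and the helper names are hypothetical.

```python
def split_interval(a, b, gamma, m):
    """Split-histogram step for one income interval (a, b).

    gamma: population fraction in the interval; m: its reported mean.
    Returns [(lo, hi, fraction), ...] for the subintervals (a, m) and (m, b);
    incomes are taken as uniform within each subinterval, which keeps the
    interval mean equal to m.
    """
    if not a < m < b:
        raise ValueError("the interval mean must lie strictly inside (a, b)")
    width = b - a
    return [(a, m, gamma * (b - m) / width), (m, b, gamma * (m - a) / width)]


def lorenz_from_groups(groups):
    """Lorenz-curve vertices at subinterval boundaries from grouped data.

    groups: list of (a, b, gamma, m) with finite bounds, sorted by income.
    """
    pieces = [piece for g in groups for piece in split_interval(*g)]
    # Under the uniform assumption, the mean of a subinterval is its midpoint.
    income = [f * (lo + hi) / 2 for lo, hi, f in pieces]
    total_pop = sum(f for _, _, f in pieces)
    total_inc = sum(income)
    p, L = [0.0], [0.0]
    for (_, _, f), inc in zip(pieces, income):
        p.append(p[-1] + f / total_pop)
        L.append(L[-1] + inc / total_inc)
    return p, L


# Hypothetical grouped data: (lower bound, upper bound, fraction, interval mean)
groups = [(0, 10, 0.5, 4), (10, 20, 0.3, 12), (20, 40, 0.2, 30)]
p, L = lorenz_from_groups(groups)
print(list(zip(p, L)))
```

Between the returned vertices the Lorenz curve implied by a uniform density is quadratic rather than linear, so linear interpolation between them is a further approximation.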
Figure 4 shows that in 2017 the total income received by females equaled that of the top 23% of males. Comparing the ISIs derived from the b*, m*, and t* curves for the two years indicates that the increase in income received by females occurred primarily in the upper region of their distribution, as the ISI of the t* curve declined from 0.748 to 0.679. This may be due to the increase in the proportion of females who continued their education and to the decline of blue-collar jobs during the 2000-2017 period.
It is interesting to compare the changes between 2000 and 2017 in the b*(q) curves, which focus on the status of females (cumulated from the bottom) relative to the top males, with the corresponding p*(q) curves for males. From Figures 5 and 6, one can see that the "excess" inequality (shaded region) experienced by females relative to males decreased slightly (the area of the "excess inequality" declined from 0.109 to 0.093). This result is consistent with the decline in "blue collar" jobs, primarily employing males, that occurred during this time.

Comparison of the Black-White Male Income Disparities in 2000 and 2017
This section presents the various curves and associated ISIs comparing the incomes received by black males to the incomes of white males. Only men are considered here because the census data had too few black women in the upper region for reliable estimation of the Lorenz curve. For example, the means of 11 of the 17 intervals for incomes of at least $65,000 were not reported because of the small sample sizes. Figures 7 and 8 show the b*, m*, and t* curves for 2000 and 2017, respectively. These plots suggest that there has been very little change in the income received by black males relative to white males. In 2000, the total income of all black males equaled that of the top 25.5% of white males. In 2017, however, it equaled that of only the top 24.2% of white males, a decline that is reflected in the ISIs for the middle (m*) and top (t*) curves in Figures 7 and 8.

Summary and Discussion
Several authors (Gastwirth 1975; Divine et al. 2018) have suggested that the Mann-Whitney-Wilcoxon probability that an observation from a random variable X is at least as large as an observation from another random variable Y, that is, P[X ≥ Y], be used to measure the relative status of two groups. The probability that a randomly selected female had at least as much income as a randomly … This indicates that there was virtually no improvement in the incomes of black men relative to white men during the period, which corresponds to the small changes in the ISIs.

The measures developed here would also be useful in assessing earnings inequality. Earnings are the money received from work and are a component of income. Some data on earnings refer only to wages and salaries from an employer, while other data include self-employment income (Bureau of Labor Statistics 2021). The income of a household also includes payments from social security, public assistance, pensions and annuities, alimony, child support, and unemployment insurance, as well as interest and dividends. For families in the lower 50% of the distribution of wealth, wages account for about 80% of their income, compared with about 70% for families in the third quarter and only 45% for families in the top 10%. In contrast, capital gains were 11.2% of the income of families in the top 10% of the wealth distribution but only 0.2% and 0.3% of the income received by families in the second and third quartiles of the wealth distribution (Federal Reserve Board 2020). Unfortunately, the largest interval in the earnings data reported by the Census Bureau is $100,000+, and 7.2% of females and 16.0% of males fall in this category. Thus, any analysis would rely on assumptions about the upper tail of the distribution and its shape within each interval, and it would be difficult to estimate the effect of any such interpolation. Hopefully, the Bureau will report earnings in the same format as income in the future.
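The Mann-Whitney-Wilcoxon probability P[X ≥ Y] mentioned above has a simple sample estimate: the fraction of all cross-group pairs in which the first observation is at least as large as the second. This sketch counts ties fully toward X ≥ Y, matching the "at least as large" wording; some definitions count ties as one half instead. The incomes and the function name `prob_x_at_least_y` are hypothetical.

```python
import numpy as np


def prob_x_at_least_y(x, y):
    """Estimate P[X >= Y] as the fraction of all (x_i, y_j) pairs with x_i >= y_j."""
    x = np.asarray(x, dtype=float)
    y_sorted = np.sort(np.asarray(y, dtype=float))
    # For each x_i, count the y_j that are <= x_i (side="right" keeps ties).
    counts = np.searchsorted(y_sorted, x, side="right")
    return counts.sum() / (x.size * y_sorted.size)


female = [12_000, 25_000, 40_000, 61_000]  # hypothetical incomes
male = [20_000, 35_000, 50_000, 90_000]
print(prob_x_at_least_y(female, male))      # 0.375
```

Sorting one sample and using binary search costs O((n + m) log m) instead of comparing all n·m pairs, which matters for survey-sized samples.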
The focus of this article is on graphical representations of the inequality between groups, along with a summary measure based on the area between the new curves and the line of equality. In order to examine the main factors underlying the changes in the income distribution illustrated in this article, more detailed data incorporating education are needed; see Blau and Kahn (2000, 2017) and Goldin et al. (2017). The curves and measures discussed here can also be used to compare the income or earnings of subgroups of two populations that have similar education and occupation, and an overall summary measure can then be obtained by a suitable weighted average of the subgroup measures, as in the combined Wilcoxon procedure (Oosterhoff 1969).
Combined Wilcoxon tests are used to analyze stratified data, for example, income data stratified by the educational level of the household's primary earner, with the Wilcoxon test applied to the data in each stratum. Van Elteren (1960) weighted the estimates of P(Y > X) obtained from the Mann-Whitney form of the test in the strata by the inverse of their variances (see Gastwirth 1988, p. 331 for an example). It is a powerful test when the values of P(Y > X) in the strata are similar; other procedures (Mehrotra et al. 2010) are appropriate for other situations.
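A stratified combination of this kind can be sketched as an inverse-variance weighted average of the stratum estimates of P(Y > X). As an assumption made here, the variance used is the null-hypothesis variance (m + n + 1)/(12mn) of the Mann-Whitney estimate for continuous data without ties, so the weights are proportional to mn/(m + n + 1); with many ties, or far from the null, other variance estimates would be needed. The strata and function names are hypothetical.

```python
import numpy as np


def prob_y_greater_x(x, y):
    """Mann-Whitney estimate of P(Y > X); tied pairs count as 0 here."""
    x_sorted = np.sort(np.asarray(x, dtype=float))
    y = np.asarray(y, dtype=float)
    # For each y_j, count the x_i strictly below it.
    counts = np.searchsorted(x_sorted, y, side="left")
    return counts.sum() / (x_sorted.size * y.size)


def combined_estimate(strata):
    """Inverse-variance weighted average of stratum estimates of P(Y > X).

    strata: list of (x, y) samples. Weight = 1 / [(m + n + 1) / (12 m n)],
    the null variance for continuous data (an assumption of this sketch).
    """
    estimates, weights = [], []
    for x, y in strata:
        m, n = len(x), len(y)
        estimates.append(prob_y_greater_x(x, y))
        weights.append(12.0 * m * n / (m + n + 1))
    return float(np.average(estimates, weights=weights))


# Hypothetical strata, e.g. two education levels: (group X incomes, group Y incomes)
strata = [([20, 25, 30], [28, 35, 40, 45]), ([50, 60], [55, 70, 80])]
print(combined_estimate(strata))  # ≈ 0.883
```

Larger, more balanced strata estimate P(Y > X) more precisely and therefore receive more weight, which is why this combination is most effective when the stratum values are similar.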
The concepts underlying the extension of the Gastwirth (2016) transformation of the Lorenz curve can also be applied to the measures of inequality recently proposed by Prendergast and Staudte (2018). They consider the ratio of the lower (p/2)th quantile of a distribution to the upper (100 − p/2)th quantile. The analogue of our b* would be the ratio of the (p/2)th quantile of the minority distribution to the upper (100 − p/2)th quantile of the majority distribution. As in Section 3, the area between this curve and the original one for inequality within the majority measures the excess inequality.
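The quantile-ratio analogue can be sketched with sample quantiles. Here p is expressed as a fraction in (0, 1] rather than a percentage, the quantiles use NumPy's default linear interpolation, and the samples are simulated Pareto incomes (α = 2 for the majority, α = 3 for the minority) chosen only for illustration; none of these numbers come from the article, and `quantile_ratio_curve` is a hypothetical name.

```python
import numpy as np


def quantile_ratio_curve(lower_sample, upper_sample, p):
    """Q_lower(p/2) / Q_upper(1 - p/2) for each p in (0, 1].

    With lower_sample == upper_sample this is the within-group quantile
    ratio; with the minority as lower_sample and the majority as
    upper_sample it is the between-group analogue of b*.
    """
    p = np.asarray(p, dtype=float)
    return np.quantile(lower_sample, p / 2) / np.quantile(upper_sample, 1 - p / 2)


rng = np.random.default_rng(0)
majority = rng.pareto(2.0, 5000) + 1.0  # hypothetical Pareto(alpha=2) incomes
minority = rng.pareto(3.0, 2000) + 1.0  # hypothetical Pareto(alpha=3) incomes

p = np.linspace(0.02, 1.0, 50)
within = quantile_ratio_curve(majority, majority, p)
between = quantile_ratio_curve(minority, majority, p)
# Excess inequality: area between the two ratio curves (trapezoidal rule).
gap = within - between
excess = float(np.sum(0.5 * (gap[1:] + gap[:-1]) * np.diff(p)))
print(excess)
```

Because lower ratios mean more inequality on this scale, the excess is computed as the within-majority curve minus the between-group curve.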
While the proposed graphical methods were illustrated by studying income inequality, they can also be used to study health disparities between minority groups and the majority. For example, one could compute the b*, m*, and t* curves of Medicare expenditures for black or Hispanic people versus white people to ascertain whether the money spent on the medical care of minorities is comparable to that spent on white people. There is some concern that minorities need to wait longer than white people to receive treatment, especially in emergency rooms, or to obtain an appointment. In addition to the usual comparison of mean and median times, the data could be further examined by exchanging the roles of black and white people in calculating the t* curve. Then the area on which the ISI is based would represent how much less time white people wait for appointments or treatment than black people. Comparing the b* curve and the corresponding ISI to the p* curve for black people alone, as in Figure 5, would also reflect how much faster white people receive treatment or appointments than black people.
When there are several groups and there is a well-defined majority group, one can use it as the basic comparator and calculate the b*, m*, and t* curves comparing each other group's income or other variable to that of the comparator. Alternatively, one can follow the approach of the Uniform Guidelines (1978) for assessing whether an employment practice has a disparate impact and select the group with the highest average income as the basic comparator.