A new approach to estimate household food demand with panel data

ABSTRACT This article develops a model of food demand in which the quality and quantity of food purchased and the inter-purchase time are determined simultaneously. We use this model to explore the relationship between a household’s cereal purchases and its demographic variables. Households eligible to participate in the U.S. Government Special Supplemental Nutrition Program for Women, Infants, and Children are found to purchase a larger quantity of cereals when making a purchase and also buy more often. The model is estimated using data from Information Resources Inc.’s National Consumer Panel. These data are heavily censored at zero. The traditional approach for working with such data in demand estimation usually involves accounting for the missing unit value for non-purchase occasions and the evaluation of multivariate probabilities. However, our methodology overcomes these problems by modeling the inter-purchase time rather than modeling whether or not a purchase is made in a given time period.


Introduction
Researchers studying household food demand have been increasingly able to take advantage of panel data collected by companies like Information Resources Inc. (IRI) and Nielsen, in which daily grocery purchases of thousands of households are observed over months or years. In addition to the prices paid and quantities purchased which are the focus of traditional cross-sectional demand analyses, these data make it possible to observe how frequently households buy particular types of foods like cereals, meats, dairy products, fruits, and vegetables, among others. The time period between purchases in a given category may be short or long, depending upon perishability and the quantity previously bought. For example, a household may stock up on non-perishable products when they are on sale, thereby extending the amount of time until the household needs to shop for this product again. In addition, higher-than-expected prices may also cause households to postpone a purchase. Such scenarios can lead to many households reporting zero-valued food purchases during a given time period (say a week).
Demand estimation with such data has traditionally involved accounting for the missing unit values on non-purchase occasions and evaluating multivariate probabilities if temporal linkages exist among purchased food quantities over time (Schmit, Gould, Dong, Kaiser, & Chung, 2003; Zeger & Brookmeyer, 1986). Temporal linkages are not uncommon because of phenomena, such as quantity stock-up effects, that cause a household's food purchases in one time period to be correlated with those in a previous period. Estimating this type of model can be very complex.
In this study, we propose a three-equation model of household demand for a single food product using panel data. The model describes the quality, quantity, and frequency of purchases of the food product under consideration. The first equation models the quality of a household's purchases, represented by the unit value paid, as in Deaton (1990). The second and third equations specify, respectively, the quantity of a household's purchases and the number of purchase periods (say, weeks) elapsed since the last purchase (inter-purchase time). There is no need to impute missing unit values for non-purchase occasions, nor to evaluate the multivariate probability that a household made no purchases during each of many different time periods. These complexities are avoided by modeling the inter-purchase time rather than whether or not a purchase is made in a given time period.
Data from IRI's 2012 National Consumer Panel are used to estimate the proposed econometric model, with household cold cereal purchases as the example. In essence, this data set comprises demographic characteristics and weekly purchases of food for at-home consumption by households recruited by IRI to participate for all of 2012. Available information includes the date of purchases, total expenditures, food quantities, and product descriptions. Each household's weekly purchases of cold cereals are considered for the 53 weeks studied in 2012. We focus on how household variables affect the quality of the cereals purchased; in turn, whether differences in purchase quality are responsible for differences in quantities purchased across households; and, in turn again, how differences in quality and quantity purchased affect inter-purchase time. Our independent variables include household size, income, and other demographic characteristics. Of special interest in this study is a binary variable for whether a household may be eligible to participate in the US Department of Agriculture's Special Supplemental Nutrition Program for Women, Infants, and Children (WIC). The WIC Program is the third largest food assistance program in the United States. It provides up to 36 ounces of ready-to-eat breakfast cereal per month (among other food products) at no cost to participating households. Understanding the cereal buying habits of both participants and potential participants may help program managers better serve the target population and maintain high rates of satisfaction. Thus, the WIC-eligible population represents an important component of the cereal market, and focusing on it may yield findings of particular economic and policy significance.

Model of household purchase behavior using household panel data
Panel data, such as IRI's National Consumer Panel, provide information for each surveyed household on the quality (unit value paid) and quantity of foods purchased over time. However, a challenge with using such data for demand analysis is dealing with zero purchase observations. Traditional censored models, originated by Tobin (1958) and later developed by Heckman (1979) for purely cross-sectional data, have been applied and extended by many researchers. These models focus on each possible purchase occasion, say every week, and use the purchase quantity if the household purchased, or calculate the probability of a zero purchase if the household did not buy the food in question. This approach implicitly assumes that households decide whether to buy at each hypothetical purchase occasion (say, every week). Using this approach with panel data involves dealing with missing unit values for zero purchase occasions and evaluating multivariate probabilities for those zero purchases (Dong & Kaiser, 2007; Lee, 1999; Zeger & Brookmeyer, 1986). Evaluating multivariate probabilities is typically difficult because they generally have no closed form, so the likelihood must be evaluated with simulated probability procedures (Geweke, 1991; Hajivassiliou, McFadden, & Ruud, 1996; Keane, 1994).¹ Unlike the traditional approach, this study proposes a model that focuses on the duration between two consecutive purchase occasions rather than on each possible shopping week. Our model therefore uses only positive observations for the unit value and quantity, and leaves the decision of whether to purchase to be implicitly determined by inter-purchase time.
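To illustrate the computational burden of the traditional approach, the sketch below evaluates the probability that a household's latent demand stays below its purchase threshold in each of T weeks, which is a T-variate normal integral. It uses the GHK simulator associated with Geweke (1991), Hajivassiliou, McFadden, and Ruud (1996), and Keane (1994), assuming NumPy and SciPy. The AR(1) correlation, the thresholds, and the function names are illustrative assumptions, not part of the model estimated in this study.

```python
import numpy as np
from scipy.stats import norm


def ghk_probability(upper, cov, n_draws=20_000, seed=0):
    """GHK simulator for P(y_1 < b_1, ..., y_T < b_T) with y ~ N(0, cov)."""
    rng = np.random.default_rng(seed)
    chol = np.linalg.cholesky(cov)      # y = chol @ eta, eta ~ N(0, I)
    T = len(upper)
    eta = np.zeros((n_draws, T))
    weights = np.ones(n_draws)
    for t in range(T):
        # Upper bound on eta_t implied by y_t < b_t, given eta_1..eta_{t-1}.
        bound = (upper[t] - eta[:, :t] @ chol[t, :t]) / chol[t, t]
        p = norm.cdf(bound)
        weights *= p                    # probability of satisfying step t
        # Draw eta_t from a standard normal truncated above at `bound`.
        u = rng.uniform(size=n_draws)
        eta[:, t] = norm.ppf(np.clip(u * p, 1e-300, None))
    return weights.mean()


def ar1_corr(T, rho):
    """AR(1) correlation matrix with (t, s) element rho**|t - s|."""
    idx = np.arange(T)
    return rho ** np.abs(idx[:, None] - idx[None, :])


# Hypothetical example: four consecutive zero-purchase weeks when latent
# demand has AR(1) errors (rho = 0.5) and week-specific thresholds.
cov = ar1_corr(4, 0.5)
upper = np.array([0.2, -0.1, 0.3, 0.0])
prob = ghk_probability(upper, cov)
```

Each additional zero-purchase week adds one dimension to the integral, which is why the simulated likelihood grows costly for long panels; modeling inter-purchase time sidesteps this computation.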
To model consumer behavior in which households simultaneously choose the quality of a product and how much of it to buy, Deaton (1987, 1990) proposed an econometric model for the unit value and quantity of purchases measured in physical terms (e.g., pounds), in which the quality elasticity and demand elasticity were estimated using household cross-sectional (cluster) data. In this study, we extend that model to a panel data structure and, by adding an inter-purchase time equation, estimate purchase frequency.
Suppose household i purchased quantity $Q_{it}$ of a composite commodity at purchase occasion t and had total expenditures of $E_{it}$. The unit value paid by household i at occasion t for the commodity can be calculated as $P_{it} = E_{it}/Q_{it}$. The inter-purchase time between the current purchase occasion t and the previous purchase occasion t − 1 is $D_{it}$. Together, these variables describe what to purchase ($P_{it}$), how much to purchase ($Q_{it}$), and how often to purchase ($D_{it}$). Below, we describe our model, beginning with the quality and quantity equations and finishing with the equation for inter-purchase time.
As pointed out by Deaton (1987Deaton ( , 1990, the derived unit value (P it ) consists of two parts: the exogenous market price and the endogenous commodity quality. The quality part is determined by household i's choices over more and less expensive products within the same commodity category. Previous studies (Cox & Wohlgenant, 1986;Deaton, 1988Deaton, , 1987Deaton, , 1990Dong, Shonkwiler, & Capps, 1998;Nelson, 1991) further hypothesize that a household's economic and demographic characteristics account for its preferences over more and less expensive products. Accordingly, we define the unit value as: (1) where Z it is a vector of variables such as regions and seasons to capture price variations, as well as household economic and demographic variables to capture differences in preferences for food quality. We will discuss the actual variables used in this study in detail in the data and variable section. The remaining model components are as follows: β is a vector of parameters, v i is household i's individual effect on the unit value such as unobserved preferences or tastes for a certain quality of food that is specific to household i and invariant over time, and ε it is an idiosyncratic error term. The purchase amount equation, which depends on the unit value P it as defined above, is expressed as: where X it is a vector of household variables that influences cereals purchases and may contain some of the same variables in Z it . What remains are α 1 and α 2 as parameters, u i as household i's individual time-invariant effect on the purchase quantity, and e it as an error term.
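To estimate (1) and (2), we assume the error terms in the two equations are jointly normally distributed with mean zero and variance-covariance matrix $\Omega_i$:
$$\Psi_i = (\omega_i, \psi_i) \sim N(0, \Omega_i), \tag{3}$$
where $\omega_i$ stacks $v_i + \varepsilon_{it}$ and $\psi_i$ stacks $u_i + e_{it}$ over household i's purchase occasions $t = 1, \dots, T_i$. The variance-covariance matrix $\Omega_i = E(\Psi_i'\Psi_i)$ may be defined as:
$$\Omega_i = \begin{pmatrix} \Omega_{\omega} & \Omega_{\omega\psi} \\ \Omega_{\psi\omega} & \Omega_{\psi} \end{pmatrix}. \tag{4}$$
Equations (1) and (2) are unbalanced panel data models: the number of purchases within the data period ($T_i$) varies across households. Following Zeger and Brookmeyer (1986) and Schmit et al. (2003), we assume that the individual effect ($v_i$) and the error term ($\varepsilon_{it}$) in (1) are independent² and that $\varepsilon_{it}$ follows an AR(1) process,³ so that the $(t, s)$ element of $\Omega_{\omega}$ is:
$$[\Omega_{\omega}]_{ts} = \sigma_v^2 + \sigma_\varepsilon^2\,\rho_\varepsilon^{|t-s|}, \tag{5}$$
where $\sigma_v^2$ is the variance of $v_i$, $\sigma_\varepsilon^2$ is the variance of $\varepsilon_{it}$, and $\rho_\varepsilon$ is the autocorrelation of $\varepsilon_{it}$. Similarly, by assuming that the individual effect ($u_i$) and the error term ($e_{it}$) in (2) are independent and that $e_{it}$ follows an AR(1) process, we have:
$$[\Omega_{\psi}]_{ts} = \sigma_u^2 + \sigma_e^2\,\rho_e^{|t-s|}, \tag{6}$$
where $\sigma_u^2$ is the variance of $u_i$, $\sigma_e^2$ is the variance of $e_{it}$, and $\rho_e$ is the autocorrelation of $e_{it}$. The two error terms in the unit value and quantity equations, i.e., $\varepsilon_{it}$ and $e_{it}$ in (1) and (2), are very likely to be correlated. We assume the correlation of $\varepsilon_{it}$ and $e_{it}$ is $\rho_{\varepsilon e}$ and that it is invariant over time.⁴ Thus, we have:
$$[\Omega_{\omega\psi}]_{ts} = \begin{cases} \rho_{\varepsilon e}\,\sigma_\varepsilon\sigma_e\,\rho_\varepsilon^{\,t-s}, & t \ge s, \\ \rho_{\varepsilon e}\,\sigma_\varepsilon\sigma_e\,\rho_e^{\,s-t}, & t < s, \end{cases} \qquad \Omega_{\psi\omega} = \Omega_{\omega\psi}'. \tag{7}$$
$\Omega_{\psi\omega}$ will be 0 if $\rho_{\varepsilon e} = 0$, triangular if either $\rho_\varepsilon$ or $\rho_e = 0$, and diagonal if both $\rho_\varepsilon$ and $\rho_e = 0$. If all households make a purchase on every purchase occasion (e.g., all households purchased cereal every week), then the purchase frequency is 1 and there are no zero purchases (i.e., $P_{it}$ and $Q_{it}$ are non-zero for all households in all time periods). As the data are not censored in this special case, one can estimate model (3) by maximizing its likelihood, and the resulting parameter estimates will be unbiased.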
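Under the assumptions just described (random effects plus stationary AR(1) errors, with a time-invariant cross-equation correlation $\rho_{\varepsilon e}$), one covariance structure consistent with the text has within-equation elements $\sigma_v^2 + \sigma_\varepsilon^2\rho_\varepsilon^{|t-s|}$ and $\sigma_u^2 + \sigma_e^2\rho_e^{|t-s|}$, and cross-equation elements $\rho_{\varepsilon e}\sigma_\varepsilon\sigma_e$ decaying at rate $\rho_\varepsilon$ for $t \ge s$ and $\rho_e$ for $t < s$. The sketch below assembles $\Omega_i$ that way with NumPy; the function name and the parameter values are hypothetical.

```python
import numpy as np


def omega_i(T, sigma_v, sigma_eps, rho_eps, sigma_u, sigma_e, rho_e, rho_eps_e):
    """Covariance of Psi_i = (omega_i, psi_i) for a household with T purchases."""
    t = np.arange(T)
    lag = t[:, None] - t[None, :]          # t - s
    abs_lag = np.abs(lag)
    # Random effect plus stationary AR(1) error within each equation.
    omega_w = sigma_v**2 + sigma_eps**2 * rho_eps**abs_lag
    omega_p = sigma_u**2 + sigma_e**2 * rho_e**abs_lag
    # Cross-equation block: Cov(eps_t, e_s) decays at rho_eps for t >= s and
    # at rho_e for t < s; the two random effects are assumed uncorrelated.
    decay = np.where(lag >= 0, rho_eps**abs_lag, rho_e**abs_lag)
    omega_wp = rho_eps_e * sigma_eps * sigma_e * decay
    return np.block([[omega_w, omega_wp], [omega_wp.T, omega_p]])


# Hypothetical parameter values for a household with 5 purchase occasions.
Om = omega_i(5, sigma_v=0.3, sigma_eps=0.5, rho_eps=0.2,
             sigma_u=0.4, sigma_e=0.6, rho_e=0.3, rho_eps_e=0.25)
```

Setting `rho_eps_e=0` zeroes the cross block, setting either autocorrelation to zero makes it triangular, and setting both to zero makes it diagonal, matching the special cases noted above.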
However, when studying the demand for specific types of food products, zero purchases (caused by infrequency of purchase or corner solutions) generally exist. Thus, in most cases, estimating the model in (3) by maximum likelihood while ignoring zero censoring will give biased estimates. As mentioned above, the conventional approach focuses on each possible purchase occasion and calculates the probability of a zero purchase if the household did not buy the food. However, applying this approach to panel data involves not only dealing with missing unit values but also evaluating high-dimensional multivariate probability integrals. To avoid this problem, we instead develop a model that uses only the purchase occasions and focuses on the time between two consecutive such occasions.⁵ Our proposed model includes an inter-purchase time equation in addition to the unit value and quantity equations above.⁶ Inter-purchase time is a count of the number of weeks that elapse between purchases and is the reciprocal of purchase frequency. For example, if 4 weeks elapsed between a household's purchases of a food product, then it shopped in 1 out of 4 weeks during that interval. Anything that increases inter-purchase time implies less frequent purchases, and anything that decreases it implies more frequent purchases. An increase in quantity, for example, might be associated with stocking up, so we would expect it to positively affect inter-purchase time.
We model inter-purchase time as a random variable that follows a Poisson probability distribution, which captures the effect of the time elapsed since the last purchase on the timing of the next purchase. This distribution is also assumed to depend on marketing variables (such as the unit value or lagged purchases) and household characteristics. Other distributional choices can be found in Kiefer (1988) or Jain and Vilcassim (1991). The Poisson pdf of inter-purchase time ($D_{it}$) is:
$$f(D_{it}) = \frac{e^{-\tilde{\lambda}_{it}}\,\tilde{\lambda}_{it}^{D_{it}}}{D_{it}!}, \tag{8}$$
where $\tilde{\lambda}_{it}$ is the Poisson parameter, which equals both the mean and the variance of $D_{it}$. We introduce the effects of marketing variables and household characteristics through the parameterization of $\tilde{\lambda}_{it}$:
$$\tilde{\lambda}_{it} = \mu_i\, e^{W_{it}\gamma_1 + P_{it}\gamma_2 + Q_{i,t-1}\gamma_3}, \tag{9}$$
where the $\gamma$'s are parameters to be estimated and $W_{it}$ is a vector of household variables influencing $D_{it}$ that may contain some of the same variables as $X_{it}$ and $Z_{it}$. The variable $P_{it}$ is the unit value, which can directly influence inter-purchase time: if the current price of cereal is higher than expected, households may wait until a later occasion to buy. In one-time, snapshot cross-sectional data, this is usually interpreted as a corner solution. The amount bought at the previous purchase (lagged purchase) is $Q_{i,t-1}$, which also directly affects inter-purchase time: if the previous purchase was large and the household's stock of the product is sufficient, the household may delay its next purchase. In one-time, snapshot cross-sectional data, this is usually interpreted as infrequency of purchase. The exponential form in (9) guarantees that $\tilde{\lambda}_{it}$ is positive. In addition, $\tilde{\lambda}_{it}$ varies across households and over time, thereby capturing the effects of household characteristics ($W_{it}$), unit value ($P_{it}$), lagged quantity ($Q_{i,t-1}$), and a household individual effect $\mu_i$. Without $\mu_i$ (i.e., $\mu_i = 1$, or $\eta_i = 0$), (8) and (9) form the standard Poisson model that has been widely applied by researchers.
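The Poisson specification in (8) and (9) can be sketched as follows, assuming NumPy and `scipy.stats.poisson`. The household data, covariates, and coefficients below are made-up illustrations, not estimates from this study.

```python
import numpy as np
from scipy.stats import poisson


def poisson_rate(W, P, Q_lag, gamma1, gamma2, gamma3, mu=1.0):
    """Rate in (9): mu_i * exp(W_it gamma1 + P_it gamma2 + Q_i,t-1 gamma3).

    mu=1.0 (eta_i = 0) gives the standard Poisson model without a
    household effect; the exponential keeps the rate positive.
    """
    index = W @ gamma1 + P * gamma2 + Q_lag * gamma3
    return mu * np.exp(index)


# Hypothetical household with three purchase occasions.
W = np.array([[1.0, 10.5],            # columns: constant, log income (illustrative)
              [1.0, 10.5],
              [1.0, 10.5]])
P = np.array([0.18, 0.22, 0.20])      # unit value paid, $/oz
Q_lag = np.array([30.0, 45.0, 20.0])  # ounces bought at the previous purchase
gamma1 = np.array([1.2, 0.05])        # made-up coefficients
gamma2, gamma3 = -0.1, 0.004          # gamma3 > 0: stocking up delays the next purchase

rate = poisson_rate(W, P, Q_lag, gamma1, gamma2, gamma3)
D = np.array([5, 8, 4])               # observed inter-purchase times, weeks
prob = poisson.pmf(D, rate)           # Poisson pdf (8) for each occasion
```

A household effect enters multiplicatively through `mu`, so doubling it doubles every rate for that household.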
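However, as in the unit value and quantity equations above, we wish to introduce an unobserved household individual effect $\mu_i$ that can also influence inter-purchase time. Following Hausman, Hall, and Griliches (1984), we assume $\mu_i = e^{\eta_i}$ is a random term following a gamma distribution with mean 1 and variance $1/\theta$:
$$f(\mu_i) = \frac{\theta^{\theta}}{\Gamma(\theta)}\,\mu_i^{\theta-1}e^{-\theta\mu_i}, \qquad \mu_i > 0. \tag{10}$$
The joint probability density function (pdf) of household i's inter-purchase times $D_i = (D_{i1}, \dots, D_{iT_i})$ is derived by integrating out $\mu_i$:
$$f(D_i) = \int_0^{\infty}\prod_{t=1}^{T_i}\frac{e^{-\mu_i\lambda_{it}}(\mu_i\lambda_{it})^{D_{it}}}{D_{it}!}\,f(\mu_i)\,d\mu_i = \left(\prod_{t=1}^{T_i}\frac{\lambda_{it}^{D_{it}}}{D_{it}!}\right)\frac{\Gamma\left(\theta + \sum_t D_{it}\right)}{\Gamma(\theta)}\,\frac{\theta^{\theta}}{\left(\theta + \sum_t \lambda_{it}\right)^{\theta + \sum_t D_{it}}} \tag{11}$$
(see Hausman et al., 1984), where $\lambda_{it} = e^{W_{it}\gamma_1 + P_{it}\gamma_2 + Q_{i,t-1}\gamma_3}$ is the non-stochastic part of $\tilde{\lambda}_{it}$ defined in (9).
Finally, having completed our specifications of the product quality, purchase quantity, and inter-purchase time equations, we estimate the model by maximizing the joint likelihood function. Treating the pdf of $D_i$ as conditional on $\Psi_i = (\omega_i, \psi_i)$, the likelihood function of (3) and (11) for household i is the product of the normal density of $\Psi_i$ and the conditional pdf of $D_i$:
$$L_i = (2\pi)^{-T_i}\,|\Omega_i|^{-1/2}\exp\left(-\tfrac{1}{2}\Psi_i\Omega_i^{-1}\Psi_i'\right) f(D_i \mid \Psi_i). \tag{12}$$
The logarithm of the likelihood for household i is:⁷
$$\ln L_i = -T_i\ln(2\pi) - \tfrac{1}{2}\ln|\Omega_i| - \tfrac{1}{2}\Psi_i\Omega_i^{-1}\Psi_i' + \ln f(D_i \mid \Psi_i). \tag{13}$$
The log-likelihood for a total of N households is then:
$$\ln L = \sum_{i=1}^{N}\ln L_i. \tag{14}$$
Consistent and efficient parameter estimates can be obtained by maximizing (14). As mentioned before, the purchase behavior of a given household is captured by the above equations. For a given food category, say cereals, Equation (1) answers the question of what type of cereal to buy (quality), Equation (2) how much to buy when making a purchase, and Equation (8) when to buy (frequency).
⁷ Since independence is assumed between inter-purchase time and the quality and quantity, one can estimate the quality and quantity equations using only the purchase observations by maximizing the likelihood of Equation (3).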
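A minimal sketch of the household log-likelihood, assuming SciPy: the quality and quantity residuals $\Psi_i$ enter through a multivariate normal log-density, and the inter-purchase times enter through the closed form that results from integrating a gamma-distributed $\mu_i$ (mean 1, variance $1/\theta$) out of the Poisson model (Hausman et al., 1984). The function names are hypothetical, and $\Psi_i$, $\Omega_i$, and $\lambda_{it}$ are taken as already computed inputs.

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import multivariate_normal


def re_poisson_loglik(D, lam, theta):
    """Log joint pmf of counts D_it ~ Poisson(mu_i * lam_it), mu_i ~ Gamma(mean 1, var 1/theta)."""
    D = np.asarray(D, dtype=float)
    lam = np.asarray(lam, dtype=float)
    S_D, S_lam = D.sum(), lam.sum()
    return (np.sum(D * np.log(lam) - gammaln(D + 1))
            + gammaln(theta + S_D) - gammaln(theta)
            + theta * np.log(theta) - (theta + S_D) * np.log(theta + S_lam))


def household_loglik(psi, Omega, D, lam, theta):
    """ln L_i: normal log-density of Psi_i plus the conditional log-pmf of D_i."""
    normal_part = multivariate_normal(mean=np.zeros(len(psi)), cov=Omega).logpdf(psi)
    return normal_part + re_poisson_loglik(D, lam, theta)
```

In an actual estimation, $\Psi_i$ would be the residuals from (1) and (2) and $\lambda_{it}$ would be computed from the exponential index in (9) at the current parameter values; summing `household_loglik` over households gives (14).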
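Finally, the expectations of the unit value and the quantity for positive purchases, and of the inter-purchase time, can be derived as:
$$E(P_{it}) = Z_{it}\beta, \tag{15}$$
$$E(Q_{it}) = X_{it}\alpha_1 + Z_{it}\beta\,\alpha_2, \tag{16}$$
$$E(D_{it} \mid W_{it}, P_{it}, Q_{i,t-1}) = E(\mu_i)\,\lambda_{it} = e^{W_{it}\gamma_1 + P_{it}\gamma_2 + Q_{i,t-1}\gamma_3}. \tag{17}$$
The marginal effects of all the explanatory variables and their associated elasticities for the continuous variables, or semi-elasticities for the dummy variables, can be calculated based on (15)-(17).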
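Because $\mu_i$ has mean 1, mixing the Poisson over the household effect leaves the expected inter-purchase time at $\lambda_{it}$, while the variance rises from $\lambda_{it}$ to $\lambda_{it} + \lambda_{it}^2/\theta$. The short NumPy simulation below illustrates both facts; the values of $\lambda$ and $\theta$ are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
lam, theta, n = 6.0, 3.0, 200_000     # illustrative lambda_it and theta
mu = rng.gamma(shape=theta, scale=1.0 / theta, size=n)   # mean 1, variance 1/theta
D = rng.poisson(mu * lam)             # D | mu ~ Poisson(mu * lam)

sample_mean = D.mean()                # close to lam, since E(D) = E(mu) * lam
sample_var = D.var()
mixed_var = lam + lam**2 / theta      # 6 + 36 / 3 = 18: overdispersion from mu_i
```

The overdispersion is what allows the random-effects specification to absorb persistent differences in how often households shop, which a plain Poisson model (variance equal to the mean) cannot.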

Data and variables
Data from IRI's 2012 National Consumer Panel are used to study households' weekly purchases of cold cereals for at-home consumption. Weekly purchase quantities and expenditures are defined as the sums of quantities of and expenditures on all cold cereals over a week. As shown in the previous section, unit values capture both price and quality. They are derived by dividing reported expenditures by quantities for the purchase weeks, accounting for any coupons used. Table 1 summarizes U.S. households' cereal purchases based on our 2012 data. After deleting households that participated in the panel for fewer than 12 months, our IRI household panel data include information on 52,514 households over one year. Among these 52,514 households, 4058 did not buy any cold cereal in 2012. In this study, we use only data on the remaining 48,456 purchasing households. Among these purchasing households, another 3087 (6% of all purchasing households) bought cold cereal only once during the whole year. We also delete these single-purchase households from our estimation, because at least two purchase occasions are needed to determine a household's inter-purchase times and estimate the model. The mean inter-purchase time between cereal purchases is about 6 weeks, indicating that U.S. households purchased cold cereals in about 9 of the 53 weeks, on average. During weeks when a purchase was made, the average quantity bought was 33 oz and the unit value paid was $0.20/oz. Table 2 lists all the explanatory variables used in estimating the model and provides descriptive statistics for each. We use the inverse of household size to convert this number from a discrete variable into a continuous one and to capture its possibly nonlinear effect. We also take the natural logarithm of household income and of the age of the household head to account for the nonlinear effects of these two variables.
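The construction of weekly purchase records described above can be sketched in plain Python. The transaction layout (household id, week, ounces, dollars net of coupons) is a hypothetical simplification, not IRI's actual file format; the function aggregates to household-weeks, computes unit values as expenditure over quantity, derives inter-purchase times in weeks, and drops households with fewer than two purchase weeks.

```python
from collections import defaultdict


def build_purchase_records(transactions):
    """Aggregate cereal transactions to household-week purchases.

    transactions: iterable of (household_id, week, ounces, net_dollars),
    a hypothetical layout where net_dollars is spending after coupons.
    Returns {household_id: [(week, Q, E, P, D), ...]} for households with
    at least two purchase weeks; D is None for the first purchase.
    """
    weekly = defaultdict(lambda: [0.0, 0.0])       # (hh, week) -> [Q, E]
    for hh, week, ounces, dollars in transactions:
        weekly[(hh, week)][0] += ounces
        weekly[(hh, week)][1] += dollars

    by_household = defaultdict(list)
    for (hh, week), (q, e) in sorted(weekly.items()):   # weeks in order
        by_household[hh].append((week, q, e))

    records = {}
    for hh, weeks in by_household.items():
        if len(weeks) < 2:                         # need >= 2 occasions for D
            continue
        rows, prev_week = [], None
        for week, q, e in weeks:
            d = None if prev_week is None else week - prev_week
            rows.append((week, q, e, e / q, d))    # P = E / Q
            prev_week = week
        records[hh] = rows
    return records
```

The first purchase of each household has no inter-purchase time and no lagged quantity, so it contributes to the unit value and quantity equations but not to the inter-purchase time equation in this sketch.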
In this study, we also generated a dummy variable identifying households potentially eligible for WIC, in order to see whether they exhibit different purchase behavior with respect to cereals. We define a household as potentially WIC eligible if its family size-adjusted income was less than 185% of the federal poverty guideline, as required by the WIC program, and it contained children younger than 5, contained women aged 14-44, bought infant formula authorized for WIC, or any combination of these. The 14-44 age range is used to capture women most likely to be pregnant.
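The eligibility rule above combines an income test with at least one categorical condition. A minimal Python sketch, in which the argument names are hypothetical rather than IRI variable names:

```python
def potentially_wic_eligible(income_pct_of_poverty, has_child_under_5,
                             has_woman_14_to_44, bought_wic_formula):
    """Flag a household as potentially WIC eligible under the rule in the text.

    income_pct_of_poverty: family size-adjusted income as a percent of the
    federal poverty guideline (e.g., 150.0 means 150%).
    """
    income_test = income_pct_of_poverty < 185.0
    categorical_test = has_child_under_5 or has_woman_14_to_44 or bought_wic_formula
    return income_test and categorical_test
```

A low-income household with none of the three categorical conditions is not flagged, and a household meeting all three conditions is not flagged if its income is at or above 185% of the guideline.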

Model estimation results
The three equations for unit value, quantity, and inter-purchase time are jointly estimated using the maximum likelihood estimation procedure described in the second section of this paper. GAUSS software and the BHHH (Berndt, Hall, Hall, & Hausman, 1974) optimization procedure are used.
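The BHHH idea is to replace the Hessian in a Newton-type step with the sum of outer products of per-unit score vectors (here, one score per household, since a household's purchase occasions are correlated). As an illustration only, the sketch below implements BHHH ascent with step halving in Python/NumPy and applies it to a simple Poisson regression on simulated data, not to the three-equation likelihood in (14); the step-halving rule, tolerance, and data are our own choices.

```python
import numpy as np
from scipy.special import gammaln


def bhhh(loglik_obs, score_obs, beta0, max_iter=200, tol=1e-8):
    """Maximize sum(loglik_obs(beta)) with BHHH steps and step halving.

    loglik_obs(beta) -> (n,) per-unit log-likelihoods
    score_obs(beta)  -> (n, k) per-unit gradients; the sum of their outer
                        products stands in for the negative Hessian.
    """
    beta = np.asarray(beta0, dtype=float)
    ll = loglik_obs(beta).sum()
    for _ in range(max_iter):
        G = score_obs(beta)
        direction = np.linalg.solve(G.T @ G, G.sum(axis=0))
        step, improved = 1.0, False
        while step > 1e-10:
            candidate = beta + step * direction
            ll_new = loglik_obs(candidate).sum()
            if ll_new >= ll:               # accept the first non-decreasing step
                improved = True
                break
            step /= 2.0
        if not improved:
            break
        converged = ll_new - ll < tol
        beta, ll = candidate, ll_new
        if converged:
            break
    G = score_obs(beta)
    return beta, np.linalg.inv(G.T @ G)    # outer-product (BHHH) covariance


# Illustration on simulated Poisson regression data (not the paper's model).
rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_beta = np.array([1.0, 0.3])
y = rng.poisson(np.exp(X @ true_beta))


def poisson_loglik(b):
    xb = X @ b
    return y * xb - np.exp(xb) - gammaln(y + 1)


def poisson_score(b):
    return (y - np.exp(X @ b))[:, None] * X


beta_hat, cov_hat = bhhh(poisson_loglik, poisson_score, np.zeros(2))
```

BHHH needs only first derivatives, and its outer-product matrix doubles as an estimate of the parameter covariance at convergence.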
Table 3 reports all estimation results; most of the variables are statistically significant at the 5% level. Estimated variance parameters capturing random effects in the unit value, quantity, and inter-purchase time equations, i.e., $\sigma_v$, $\sigma_u$, and $\theta$, are all highly significant, indicating that household heterogeneity exists in cereal quality choice, purchase quantity, and purchase frequency. For example, the data show that households who purchase a larger quantity of cereal also tend to buy cereal more frequently than those who purchase smaller quantities. However, after accounting for this household heterogeneity in the inter-purchase time equation, we find that having purchased a larger quantity in the past (lagged purchases) reduces purchase frequency by increasing inter-purchase time. This supports the existence of the stocking-up effect hypothesized in our model section. Detailed results are discussed later.
Autocorrelation parameters affecting both unit value ($\rho_\varepsilon$) and quantity ($\rho_e$), and the correlation between the two ($\rho_{\varepsilon e}$), are also highly statistically significant, indicating that temporal linkages and cross-equation correlations exist in the two equations, though their magnitudes are small for our cereal purchases.
To better understand pertinent aspects of a household's demand for cold cereal, we calculated elasticities for all continuous independent variables and semi-elasticities for all binary independent variables (see Table 4). These elasticities and semi-elasticities are derived using Equations (15)-(17) based on the parameter estimates in Table 3. Because our system of equations is recursive, the exogenous variables not only have a direct effect on the unit value, quantity, and inter-purchase time, but also have indirect effects. The exogenous variables have an indirect effect on the quantity and inter-purchase time via the unit value as well as an indirect effect on the inter-purchase time via the quantity. For the unit value, since all the right-hand-side variables are exogenous, the direct and total effects are the same.

Unit value and quality
Household demographic variables co-vary with the quality of purchased cereal in an economically meaningful way. We find income to have a positive and statistically significant effect on the unit value ($/ounce) that a household pays for cold cereals. This result implies that poorer households tend to buy lower quality cereals. Employment and education of the female head are also positively related to cereal quality choices. Specifically, households with a working female head, or a college-educated head, tend to spend more money per unit. Larger households are found to purchase cheaper cereals, which may reflect a tighter per capita budget relative to smaller households. Age of the female head of household has a significant negative effect on cereal quality choices. This may suggest older people are more frugal.
In this study, we find that the amount households spent per ounce of cereal slightly decreased over time. We also find that people in the Midwest and South spend more per unit on cereal than people in other areas. Compared to whites, blacks and Asians spend more per unit on cereal. Households with children younger than 12 buy higher quality cereals. Households who own a house spend less per unit on cereal. This may indicate that home owners tend to buy larger, economy-sized packages, as they have a greater capacity to store foods. Finally, we do not find that potentially WIC-eligible households behave significantly differently from non-eligible households with regard to their choice of cereal quality.

Quantity
Households are modeled to choose simultaneously the quality of cereal to buy and the quantity to purchase. The unit value paid by a household significantly affects the amount of cereal that it buys in weeks when it makes a purchase. In this study, we find that a 1% increase in the unit value would cause a 2.67% decrease in the quantity of cereal bought. An increase in the unit value could reflect an increase in cereal market prices or a household's choice to buy better quality cereals. Given the recursive relationship among unit value, quantity, and inter-purchase time, we calculate direct and total effects for each explanatory variable. For example, we find that large households buy cheaper (lower quality) cereals: a 1% increase in household size would decrease the unit value paid by 0.24%. We also find that household size directly affects purchase quantities. Larger households tend to buy more ounces of cereal on any purchase occasion: a 1% increase in household size would increase the quantity purchased by 0.72%. Since the 0.24% decrease in unit value caused by the 1% increase in household size would enable the household to buy a larger quantity (0.24 × 2.67 ≈ 0.64%), the total effect of a 1% increase in household size on quantity purchased would be 1.37% (0.72% direct plus 0.64% indirect, up to rounding).
Household income is found to have a positive direct effect on purchase quantity (0.04), but it also has a positive effect on the unit value paid (0.04) which, in turn, decreases the quantity by 0.11 (0.04 × 2.67). As a result, the total effect of income becomes negative (0.04-0.11 = −0.07).
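The recursive chain of direct and indirect effects used in these calculations can be written as a small function. The inputs are elasticities as quoted in the text; the path through lagged quantity in the inter-purchase time equation (assuming a change in x shifts lagged quantity by the same total elasticity as current quantity) is our assumption, since the text does not spell out that step.

```python
def total_elasticities(e_p_x, e_q_x_direct, e_q_p,
                       e_d_x_direct=0.0, e_d_p=0.0, e_d_qlag=0.0):
    """Direct-plus-indirect elasticities in the recursive P -> Q -> D system."""
    total_p = e_p_x                               # unit value: direct effect only
    total_q = e_q_x_direct + e_q_p * total_p      # direct + via unit value
    # Assumption: x shifts lagged quantity by the same total elasticity.
    total_d = e_d_x_direct + e_d_p * total_p + e_d_qlag * total_q
    return total_p, total_q, total_d


# Household size: unit value -0.24, direct quantity 0.72, quantity w.r.t. unit value -2.67.
_, size_q, _ = total_elasticities(e_p_x=-0.24, e_q_x_direct=0.72, e_q_p=-2.67)
# Household income: unit value 0.04, direct quantity 0.04.
_, income_q, _ = total_elasticities(e_p_x=0.04, e_q_x_direct=0.04, e_q_p=-2.67)
```

With these rounded inputs, `size_q` is about 1.36 and `income_q` about −0.067; the 1.37 reported for household size presumably reflects unrounded estimates.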
Some variables in the unit value equation are not included in the quantity equation for model identification, for example, college education, region, and some of the race variables. Those variables, however, can influence quantity through their influence on unit value. In this study, we find that households with a college-educated head buy 0.03% less cereal on any purchase occasion.
Finally, we note that being a potentially WIC-eligible household as defined in this study has a significant and positive effect on the quantity of cereal purchased. In contrast to non-eligible households, WIC-eligible households choose similar quality cereals, but in larger quantities. Specifically, WIC-eligible households buy 0.1% more cereal on any purchase occasion than other households.
Inter-purchase time or purchase frequency
Inter-purchase time is the reciprocal of purchase frequency. Among our results, we find that the unit value paid is negatively related to inter-purchase time. This indicates that paying more for cereal shortens the time until a household's next purchase. However, the elasticity of inter-purchase time with respect to the unit value is only −0.0026, too small to be economically meaningful even though it is statistically significant.
The amount purchased on the previous shopping occasion (lagged quantity) has a positive and significant effect on inter-purchase time for cereals. A 1% increase in lagged quantity would increase inter-purchase time by 0.11%. Cereal products are storable: they can be bought when prices are low and stocked up for long periods of time.
Larger households are found to buy cereals more frequently than smaller ones, even holding unit value and quantity constant. The total effect of household size on inter-purchase time is −1.18; that is, a 1% increase in household size would reduce inter-purchase time by 1.18%. The age of the female head and income do not have significant effects on inter-purchase time. Households with small children (up to age 5) and those with a full-time working female head purchase cereals less frequently. Households with older children (over age 5) and homeowners are found to buy cereals more frequently.
We also find that potentially WIC-eligible households buy cereal more frequently. The total effect of being a WIC-eligible household on the inter-purchase time is −0.03, indicating a 0.03% decrease in the time between purchases.

Conclusions and suggestions for future research
Researchers studying US household food demand have increasingly been able to take advantage of panel data collected by companies like IRI and Nielsen, in which thousands of households are observed each day over a number of months, possibly years. To provide these researchers with a method for analyzing those data that is both econometrically tractable and captures pertinent aspects of household behavior, we develop and estimate a three-equation model. These three equations describe how much money a household pays per unit of food, how much food it typically buys on a purchase occasion, and how often it makes a purchase. Our proposed approach overcomes problems commonly encountered when working with censored panel data, which usually involve dealing with missing unit values and evaluating high-dimensional multivariate probability integrals.
For an empirical application, we investigate the demand for cold cereal. Research shows that ready-to-eat cereal and dairy products are the most commonly consumed breakfast foods (Mullan & Singh, 2010). Children and adolescents who consume cold cereal also tend to have higher nutrient intake or are more likely to meet nutrient intake recommendations than non-consumers (Rampersaud, Pereira, Girard, Adams, & Metzl, 2005). Because of the potential health benefits of consuming cold cereal, USDA's WIC program provides participating children and women with up to 36 ounces per month. In this study, we include a variable to identify households with individuals who may be eligible to participate in WIC. We find that potentially WIC-eligible households purchase a larger quantity of cold cereal than other households. They also purchase cold cereals more often than other households. These findings suggest that households participating in WIC are likely to use their benefits to acquire cold cereal, and that providing them with low-calorie, nutrient-dense options may be an effective strategy to improve participants' diet quality.
Future research might use household panel data and the model proposed in this study to investigate the behavior of households who participate in USDA's Supplemental Nutrition Assistance Program (SNAP), such as whether they and other households have equally smooth purchase patterns or whether SNAP households tend to stock up on foods when they have benefits. Still other research might examine the behavior of households living in areas with more limited access to retail food stores. Such households may have higher time and travel costs for visiting large retail food stores, shop less often, and stock up on certain types of foods when they do shop at large stores.