Estimating poverty among refugee populations: a cross-survey imputation exercise for Chad

ABSTRACT Household consumption surveys do not typically offer poverty estimates for refugees. We test the performance of a recently developed cross-survey imputation method to estimate poverty for a sample of refugees in Chad, combining survey and administrative data collected by the United Nations High Commissioner for Refugees (UNHCR). We find that the imputed poverty rates are not statistically different from the poverty rates obtained directly from the survey consumption data. This result is robust to different model specifications, varying poverty lines, and assumptions about the error terms. Targeting results based on the imputed poverty estimates also outperform common targeting methods, such as proxy means tests and the current targeting method used by humanitarian organizations in Chad. Replicating this approach in at least some of the 122 other countries currently using UNHCR administrative data could help address data gaps and provide much-needed estimates to respond effectively to forced displacement crises.


Introduction
The UN General Assembly's Sustainable Development Goal (SDG) 1, to end poverty in all its forms by 2030, explicitly pledges that 'no one will be left behind'. To achieve this goal, accurate poverty measurement is essential, which typically requires the availability of high-quality household consumption surveys. 1 It is equally important for these surveys to be inclusive and cover marginalized populations, such as refugees and Internally Displaced Persons (IDPs). Unfortunately, household consumption surveys rarely include forcibly displaced populations, despite the fact that these populations are among the most vulnerable and deprived. They typically lack fundamental rights such as freedom of movement and the right to work, have eroded human and physical capital, and face more frequent shocks than surrounding host communities. This is a significant and growing challenge, particularly in Sub-Saharan Africa. The global number of forcibly displaced persons almost doubled from 43.3 million in 2009 to 82.4 million in 2020. Among them, there are 26.4 million refugees, 4.1 million asylum seekers, 48 million IDPs (UNHCR, 2020), and other displaced populations under the protection of the United Nations High Commissioner for Refugees (UNHCR). Almost four out of five refugees live in countries neighboring their place of origin, and some 84% of them live in developing countries. Sub-Saharan Africa hosts around one-[...]
[...] figures. The current targeting strategy in Chad, which is used jointly by the National Commission on the Welcoming and Resettlement of Refugees (CNARR), UNHCR, and World Food Programme (WFP), is fairly accurate in predicting household welfare. However, our results suggest that this targeting strategy could be further improved by reducing the inclusion and exclusion errors. If these encouraging results are replicated in other contexts, poverty predictions for refugees can be expanded at scale, with good prospects for the improvement of targeted programs. 4
The paper is organized as follows. The second section outlines the country context. The third section presents the data and analytical framework. The estimation results are presented in the fourth section, and the fifth section evaluates the targeting strategy used in Chad and our targeting method in light of the global experience. The final section discusses data limitations and suggests future directions of research before concluding.

Country context
Chad is one of the poorest countries in the world. According to the latest household consumption survey, administered in 2017-18, 42% of the population falls below the national poverty line (World Bank, 2021). The past decade has seen much instability for Chad, with negative consequences for household well-being. Per capita Gross Domestic Product (GDP) decreased by 15% between 2015 and 2017, from an average of US$963 in 2015 to US$823 in 2017 (in 2010 purchasing power parity [PPP]). In terms of overall development, Chad ranks 187th of 189 countries on the Human Development Index (World Bank, 2021). Due to these challenges, the country struggled to meet many of the Millennium Development Goals (MDGs) in 2015. Barring unforeseen economic growth or large increases in official development assistance, the country appears unlikely to meet many of the Sustainable Development Goals (SDGs) set for 2030.
Despite the current economic downturn, Chad continues to host a large number of refugees. In fact, Chad is among the top refugee-hosting countries in the world, ranking as the 10th largest host country for refugees globally and the 5th largest in Sub-Saharan Africa (after Ethiopia, Kenya, Uganda, and the Democratic Republic of Congo; see Table A1). Chad's refugee population is sizable and represents about 3% of the national population. The number of forcibly displaced persons increased from 474,478 in 2015 to 667,586 as of March 2019, of which about 69% were refugees or asylum seekers. 5 Refugees are much poorer than the host population and face more severe food insecurity. The poverty rates for Sudanese and Central African Republic (CAR) refugees are estimated at 79.8% and 83.7%, respectively, compared with 70% for host communities (World Bank, 2021).
Of the 459,809 current refugees and asylum seekers, the majority (74%) are Sudanese refugees living in the eastern part of Chad, 21% are CAR refugees living in southern Chad, and a smaller number of Nigerian refugees (about 2%) are living in the Lake Chad Basin. The situation is further complicated by the large population of IDPs in the Lake Chad region, which was estimated at 165,313 at the end of 2018 (UNHCR, 2019). Map 1 shows the locations of the refugee camps in Chad.

Analytical framework and data
In this section, we provide an overview of the analytical framework before describing the data.

Analytical framework
The methodology used in this paper relies on the cross-survey imputation framework that was first introduced by Elbers et al. (2003) to generate poverty maps. 6 Most recently, Dang et al. (2017) built on this literature to propose a model that imposes fewer restrictive assumptions and offers an explicit formula for estimating the poverty rate and its variance. Three new contributions introduced by that study are: (i) it offers a simple variance formula, which is in line with the recent statistical literature; (ii) it can accommodate complex sampling designs; and (iii) the framework remains applicable to two surveys with different designs (such as imputing from a household consumption survey into a labor force survey). Finally, the approach allows for different modeling methods, including the standard linear regression model, its variant with a flexible specification of the empirical distribution of error terms, a logit model, and/or a probit model.
Formally, let $x_j$ be a vector of characteristics that are commonly observed in the two surveys, where $j$ indicates the survey type, with 1 and 2 denoting respectively the base survey (which we impute from) and the target survey (which we impute into). The welfare indicator is assumed to be a function of household and individual characteristics ($x_j$):
$$y_j = \beta_j' x_j + \upsilon_{cj} + \varepsilon_j \qquad (1)$$
where $y_j$ is the welfare indicator (consumption per capita per month), $\beta_j$ is a vector of parameters, $\upsilon_{cj}$ is the cluster ($c$) random effect, and $\varepsilon_j$ is the idiosyncratic error term. We suppress the subscripts for households and individuals to keep the notation uncluttered.
This imputation framework is based on two assumptions. The first assumption (Assumption 1), which is critical for poverty imputation, states that the measurement of household characteristics in each sample of data is a consistent measure of the characteristics of the whole population. In other words, it stipulates that the surveys considered are representative of the same target population. In our context, the two surveys represent the same population of refugees, and they were conducted at approximately the same time, so the first assumption is plausibly satisfied. We nonetheless conduct means difference tests on the observed overlapping variables between the target data and base data to verify that this is the case. The second assumption (Assumption 2) states that changes in $x_j$ between the data collection periods of the two data sets can capture the change in welfare over the period. Since the two data sets that we analyze were collected in the same year, Assumption 2 is also satisfied by design.
Under these two assumptions, the imputed welfare is
$$y_2^1 = \beta_1' x_2 + \upsilon_{c1} + \varepsilon_1 \qquad (2)$$
where $y_2^1$ represents the imputed welfare when we apply the estimated parameters ($\beta_1'$) and the estimated distributions of the error terms ($\upsilon_{c1}$ and $\varepsilon_1$) from the base survey to the variables ($x_2$) in the target survey.
Since Equation (1) is typically estimated with the standard cluster-effects linear regression model, Dang et al. (2017) propose different imputation methods for poverty estimation. The first method relies on the assumption of the normal distribution for the two error terms ($\upsilon_{cj}$ and $\varepsilon_j$ are uncorrelated, with $\upsilon_{cj} \mid x_j \sim N(0, \sigma^2_{\upsilon_{cj}})$ and $\varepsilon_j \mid x_j \sim N(0, \sigma^2_{\varepsilon_j})$). Hereafter, this method is referred to as the normal linear regression model. An alternative method proposed is the empirical error method, which assumes no functional form for these error terms and instead uses their estimated empirical distributions.
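Since the estimated parameters are obtained using a different survey from the target survey, we can use simulation to estimate Equation (2) (for a single draw) as follows:
$$\hat{y}^1_{2,s} = \hat{\beta}_1' x_2 + \tilde{\hat{\upsilon}}_{c1,s} + \tilde{\hat{\varepsilon}}_{1,s} \qquad (3)$$
In Equation (3), $\tilde{\hat{\upsilon}}_{c1,s}$ and $\tilde{\hat{\varepsilon}}_{1,s}$ represent the $s$th random draw (simulation) from their estimated distributions, using the base survey, for $s = 1, \ldots, S$.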
The imputed poverty rate ($\hat{P}_2$) and its variance ($V(\hat{P}_2)$) in the target survey are then estimated as:
$$\hat{P}_2 = \frac{1}{S}\sum_{s=1}^{S} P\left(\hat{y}^1_{2,s} \le z_1\right) \qquad (4)$$
$$V(\hat{P}_2) = \frac{1}{S}\sum_{s=1}^{S} V\left(\hat{P}_{2,s} \mid x_2\right) + V\left(\frac{1}{S}\sum_{s=1}^{S} \hat{P}_{2,s} \,\Big|\, x_2\right) \qquad (5)$$
where $P(\cdot)$ is the probability function that estimates the poverty rate in the population for each simulation and $z_1$ is the poverty line in the base survey. In Equation (5), $\hat{P}_{2,s}$ is similarly defined as $\hat{P}_{2,s} = P(\hat{y}^1_{2,s} \le z_1)$. 7 These poverty estimators offer consistent estimates of the parameters of interest. Furthermore, in terms of prediction accuracy, these estimators outperform the traditional proxy means testing technique, which typically omits the error terms $\upsilon_{c1} + \varepsilon_1$ and results in biased estimates of the welfare indicator (Dang et al., 2019). As a further robustness check, we also employ two alternative modelling methods: the probit model and the logit model. These models place more restrictive assumptions on the error term but estimate poverty figures directly (i.e. Equation (4) and Equation (5)) instead of estimating consumption expenditure first and subsequently obtaining poverty estimates from the predicted consumption expenditure.
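The simulation step and the poverty-rate average described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single covariate, a welfare model in logs, a simple residual split (cluster means for the cluster effect, the remainder for the idiosyncratic error) in place of a full cluster-effects estimator, and entirely synthetic data. All function names, parameter values, and the poverty line below are hypothetical, and only the point estimate is computed, not the variance.

```python
import random
from statistics import mean, pstdev

def fit_ols(x, y):
    """Closed-form OLS of y on a constant and one covariate; returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope

def split_errors(x, y, cluster, a, b):
    """Simplified split: cluster-mean residuals (upsilon) and within-cluster remainder (epsilon)."""
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    by_cluster = {}
    for c, r in zip(cluster, resid):
        by_cluster.setdefault(c, []).append(r)
    ups = {c: mean(rs) for c, rs in by_cluster.items()}
    eps = [r - ups[c] for c, r in zip(cluster, resid)]
    return list(ups.values()), eps

def impute_poverty(x2, cluster2, a, b, ups, eps, z1, S=100, method="normal", seed=0):
    """Average, over S simulated draws, the share of target households with imputed welfare <= z1."""
    rng = random.Random(seed)
    sd_u, sd_e = pstdev(ups), pstdev(eps)
    clusters = sorted(set(cluster2))
    rates = []
    for _ in range(S):
        if method == "normal":      # draw both errors from fitted normal distributions
            u = {c: rng.gauss(0.0, sd_u) for c in clusters}
            e = [rng.gauss(0.0, sd_e) for _ in x2]
        else:                       # "empirical": resample the estimated errors themselves
            u = {c: rng.choice(ups) for c in clusters}
            e = [rng.choice(eps) for _ in x2]
        y_hat = [a + b * xi + u[c] + ei for xi, c, ei in zip(x2, cluster2, e)]
        rates.append(sum(v <= z1 for v in y_hat) / len(y_hat))
    return mean(rates)

def simulate_survey(n_clusters, n_per_cluster, rng):
    """Synthetic data (assumption): log welfare falls with household size, plus two error terms."""
    x, y, cluster = [], [], []
    for c in range(n_clusters):
        u = rng.gauss(0.0, 0.1)
        for _ in range(n_per_cluster):
            size = rng.uniform(1, 10)
            x.append(size)
            y.append(4.0 - 0.1 * size + u + rng.gauss(0.0, 0.5))
            cluster.append(c)
    return x, y, cluster

rng = random.Random(42)
x1, y1, c1 = simulate_survey(40, 25, rng)   # base survey: welfare observed
x2, y2, c2 = simulate_survey(40, 50, rng)   # target survey: welfare used only for checking
a, b = fit_ols(x1, y1)
ups, eps = split_errors(x1, y1, c1, a, b)
z1 = 3.3                                    # hypothetical poverty line in log units
p_normal = impute_poverty(x2, c2, a, b, ups, eps, z1, method="normal")
p_empirical = impute_poverty(x2, c2, a, b, ups, eps, z1, method="empirical")
p_true = sum(v <= z1 for v in y2) / len(y2)
print(round(p_normal, 3), round(p_empirical, 3), round(p_true, 3))
```

Dropping both error draws from `y_hat` would reduce the sketch to a plain proxy-means prediction, which is the omission the text identifies as a source of bias.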

Data
As part of its mandate to protect displaced persons in host countries, UNHCR collects data to monitor the welfare of refugees and other populations of interest and to deliver assistance and services. In this study, we use three sets of data collected by the UNHCR and its partners (Table 1). The first one is the ProGres data set, which is the UNHCR's registration system covering all refugees or asylum seekers requiring assistance. The ProGres data set is a live instrument that is continuously updated as new refugees arrive, or as existing refugees contact the UNHCR. The data that we use were extracted at the end of December 2017. This set of data contains socioeconomic variables (such as household size, marital status, gender, age, country of origin, and region of residence) but has no consumption or expenditure data. This data set can therefore be considered the 'census' of refugees.
The second set of data, the Targeting data, is also a census-like data set for refugees living in Chad. The main objectives of this data set are to fill knowledge gaps on refugee livelihoods and the levels and differences of vulnerability in refugee households, and to categorize refugees into different income levels for assistance (including cash and food). Besides these objectives, the Targeting data aim to identify factors that can facilitate refugee self-reliance. Consequently, this data set is based on a mixed methods approach, including qualitative and quantitative methods. The first step involves conducting focus group discussions with refugee leaders, women's organizations, and youth associations, to identify the wealth characteristics and key challenges that are specific to different ages and genders. The second step is to implement a sample survey across camps to confirm the wealth characteristics that were identified by refugees in the first step. Based on the outcomes of the first two steps, a detailed quantitative survey designed to capture wealth characteristics is administered to all the refugee households.
The Targeting data include all the Sudanese, Central African, and Nigerian refugees living in Chad. The data were collected between 17 June and 15 July 2017, and cover 19 refugee sites and refugees living in nine host villages. After the data are collected, a statistical model, which takes into account household welfare, is used to classify households into four socioeconomic groups (very poor, poor, average, and better off). For the variables that are relevant for this study, this data set contains demographic variables (household size, gender, age, country of origin, and region of residence), variables for asset and animal ownership, and variables reflecting shock-coping strategies. Similar to the ProGres data, the Targeting data do not collect information on consumption or expenditure; however, the Targeting data collect information on wealth.
The last data set is the Post-Distribution Monitoring (PDM) data, which come from a sample survey that covers similar themes as the Targeting data set. The PDM data set, which was collected in 2017 by the World Food Programme (WFP), aims to provide a better understanding of how refugees use food assistance and contains data on consumption and expenditure. The PDM has a two-stage stratified random sample design, where the first stage is the selection of camps and the second stage is the selection of households. The camps are stratified in three zones: (i) North East (Ourecassoni, Amnaback, Iridimi, Touloum); (ii) Centre-East (Goz Amir, Djabal, Gaga, Teguine, Bredjing, Farchana); and (iii) South (Amboko, Dossey, Gondjé, Belom, Moyo) (see Map 1). In addition, the sampling takes into account the kind of humanitarian assistance that is provided to refugees (in-kind, food voucher, or cash). Importantly, the PDM includes two consumption aggregates measuring monthly total consumption and monthly food consumption, using retrospective questions with varying recall periods depending on the item considered (from seven days to one year). The consumption aggregate is compiled by aggregating the different food and non-food items, including expenditures on education, health, durable assets, and rent. For this study, we consider two welfare indicators from the PDM data set. The first is the household total consumption expenditure per capita per month, and the second is the household food consumption per capita per month. 8
For poverty imputation purposes, we construct three data sets from the ProGres, Targeting, and PDM data. The first, which we refer to as 'ProGres 2', is obtained by appending the ProGres data to the PDM data (i.e. pooling the two data sets together). As the ProGres and PDM data share only demographic variables, ProGres 2 contains the demographic variables for all observations, although only the observations from the PDM data have consumption expenditure. As such, the ProGres 2 data set allows us to estimate the welfare model using the PDM data and subsequently use this model to impute household consumption using the ProGres data set.
The second constructed data set, 'Targeting 2', is obtained by appending the Targeting data to the PDM data. Therefore, the Targeting 2 data set contains demographic variables, asset and animal ownership, and coping strategies variables as well as consumption data. The last constructed data set, 'ProGres Targeting', is obtained by first merging the ProGres and Targeting data (for which we can match 72% of the observations) and subsequently appending these data to the PDM data. This data set is the most complete in terms of variables.
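The merge-then-append construction of the 'ProGres Targeting' data set can be sketched as follows. This is only an illustration under stated assumptions: the key `household_id`, the variable names, and the toy records are hypothetical, and the sketch assumes that only households matched in both census-type sources are retained, which the text does not state explicitly.

```python
def build_progres_targeting(progres, targeting, pdm, key="household_id"):
    """Merge two census-type sources on a shared key, then append the survey rows."""
    targeting_by_id = {row[key]: row for row in targeting}
    merged = []
    for row in progres:
        match = targeting_by_id.get(row[key])
        if match is not None:  # assumption: unmatched households are dropped
            merged.append({**row, **match, "source": "census"})
    match_rate = len(merged) / len(progres) if progres else 0.0
    # Appending (pooling) keeps consumption only for the survey rows.
    pooled = merged + [{**row, "source": "pdm"} for row in pdm]
    return pooled, match_rate

# Toy records with hypothetical variable names.
progres = [
    {"household_id": 1, "hh_size": 5},
    {"household_id": 2, "hh_size": 3},
    {"household_id": 3, "hh_size": 7},
    {"household_id": 4, "hh_size": 2},
]
targeting = [
    {"household_id": 1, "owns_livestock": True},
    {"household_id": 2, "owns_livestock": False},
    {"household_id": 3, "owns_livestock": True},
]
pdm = [{"household_id": 101, "hh_size": 4, "owns_livestock": False, "consumption_pc": 12.5}]

pooled, match_rate = build_progres_targeting(progres, targeting, pdm)
print(len(pooled), match_rate)
```

In the pooled result only the survey rows carry a consumption value, which is what allows the welfare model to be fitted on the survey rows and applied to the census rows.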
The motivation behind constructing these three sets of data is to check whether the different sources of data as well as the different sets of variables generate different poverty figures, such that we can determine the set of variables that best predicts poverty. To ensure comparability across the three data sets, we restrict the analysis to 16 (of the 19) refugee sites, because the PDM data cover only 16 sites. Consequently, this study covers the refugees in Chad that come from the Central African Republic and Sudan only.

Estimation results
In this section, we test the model assumptions and present the estimation results.

Testing model assumptions
As a first step, we check whether our data sets are representative of the same underlying population (Assumption 1) by performing means difference tests across key predictors. Since the PDM data are a subsample of the Targeting or ProGres data sets, we use a statistical test for partially overlapping samples. The results, shown in Table A2 in the Annex, indicate that none of the variables differ significantly in their means, which provides supporting evidence that the two samples are representative of the same population. 9
To evaluate the performance of the welfare estimation model, we consider three models. Model 1 includes demographic and geographic variables (region of residence and country of origin). This is the most parsimonious model and uses the variables that are readily available in the ProGres data set. Model 2 adds to Model 1 variables related to animal and asset ownership. Model 2 is richer than Model 1, but it is more demanding in terms of the control variables, which may also be less reliable or more likely to be missing in the census data. Model 3 adds to Model 2 variables measuring coping strategies. To test for multicollinearity, Table A3 reports the variance inflation factor (VIF) for the different models. It shows that no variable has a VIF over 5, which is far lower than the rule-of-thumb value of 10 given for harmful collinearity by Kennedy (2008). We conclude that multicollinearity is not an issue for any of the models considered.
Next, we test the out-of-sample performance and possible overfitting of the three models, using the PDM data and the root mean square error (RMSE) and mean absolute error (MAE) as performance functions. To do so, the data set is split into five equal folds. In the first iteration, the first fold is used to test the model, and the rest are used to train the model. In the second iteration, the second fold is used as the testing set, while the rest serve as the training set. This process is repeated until each of the five folds has been used as the testing set. The performance function is obtained as the mean across the five iterations.
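The five-fold procedure just described can be sketched in a few lines. The example below is a hedged illustration rather than the authors' code: it assumes a one-covariate linear model fitted by OLS, folds formed after a random shuffle, and synthetic data with made-up coefficients.

```python
import math
import random
from statistics import mean

def fit_ols(x, y):
    """Closed-form OLS of y on a constant and one covariate; returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope

def k_fold_cv(x, y, k=5, seed=0):
    """Return the mean RMSE and mean MAE across k held-out folds."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]        # k folds of (near-)equal size
    rmses, maes = [], []
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        a, b = fit_ols([x[i] for i in train], [y[i] for i in train])
        errs = [y[i] - (a + b * x[i]) for i in test]
        rmses.append(math.sqrt(mean(e * e for e in errs)))
        maes.append(mean(abs(e) for e in errs))
    return mean(rmses), mean(maes)

# Synthetic data (assumption): log welfare declines with household size.
rng = random.Random(1)
x = [rng.uniform(1, 10) for _ in range(500)]
y = [4.0 - 0.1 * xi + rng.gauss(0.0, 0.5) for xi in x]
cv_rmse, cv_mae = k_fold_cv(x, y)
print(round(cv_rmse, 2), round(cv_mae, 2))
```

Because each fold's errors come from households not used to fit that fold's model, a large gap between in-sample and cross-validated error would signal overfitting.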
For the food consumption aggregate, the three models have similar measures of goodness-of-fit for both indicators (Table 2). Model 1's RMSE is 0.55, while Models 2 and 3 each have an RMSE of 0.54. For the MAE, Models 1 and 3 have a value of 0.42, whereas Model 2 has an MAE of 0.41. When we turn to the overall consumption aggregate, we note the differences between the three models. The RMSE values range from 0.53 to 0.58, with Model 3 and Model 1 having the smallest and highest RMSE, respectively. The MAE is quite similar across the three models, within a range from 0.39 (Model 3) to 0.43 (Model 1). These results suggest that no model consistently outperforms the other models.
Table note: The international total poverty line is $1.88 2011 PPP per person per day while the most recent national total (food) poverty line in Chad is $2.60 ($1.88) per person per day. Robust standard errors in parentheses are clustered at the camp level. We use 1,000 simulations for each model run. Source: Authors' calculations.

Estimation results
Table 3 applies the model to the three constructed data sets described earlier (ProGres 2, Targeting 2, and ProGres Targeting data), using the normal linear regression model and the empirical error model. We also show the results using two poverty lines in this table: (i) a US$1.90-a-day poverty line in 2011 PPP, which represents the international poverty line for extreme poverty (panel A); and (ii) the national poverty line, which corresponds to around US$2.60 (World Bank, 2013) (panel B). 10 To be consistent with the thinking on global poverty at the time the data were collected, we use the global poverty line of $1.90 per day in 2011 PPP, which is equivalent to the current (i.e. in 2023) global poverty line of $2.15 per day in 2017 PPP.
Table 3 shows that the imputed poverty rates are not statistically different from the poverty rates obtained directly from the survey consumption data (henceforth, 'survey estimates'). That is, all the imputed poverty estimates fall inside the 95% confidence intervals of the survey estimates, with many even falling inside one standard error of the survey estimates. The normal linear regression model and the empirical error model offer quite similar estimation results, which is generally consistent with findings in poverty imputation for the general population in other countries (Dang et al., 2019).
As a further robustness check, we also employ two alternative modelling options (the probit model and the logit model) and show the results in Table A5 in the Annex. These models offer quite similar estimation results. The estimation results using the food poverty line, shown in Table A6 in the Annex, are qualitatively similar. Using the ProGres Targeting data, Figure 1 further simulates the estimation results for all the poverty lines between the 66th and 99th percentiles of the consumption distribution. Panels A and B offer estimation results using the normal linear model and the empirical error model, respectively. The results suggest that Models 1 and 2 predict the poverty rates for different poverty lines well. The imputed poverty rates are within the 95% confidence intervals (CIs) for all the arbitrary poverty lines considered, and they are similar across the normal linear regression and empirical error models. However, Model 3 overestimates poverty, and the imputed poverty rates are outside the 95% CIs of the survey-based rates for the set of different poverty lines considered. As Model 3 adds variables related to coping strategies, it may suffer from measurement errors. Households might not accurately report these strategies, for example, by overstating how frequently they use them in order to receive more assistance from humanitarian organizations.
Figure 2 shows the imputed welfare rates for the set of different poverty lines for all three models, but with a focus on food security. Welfare based on food security is defined in humanitarian settings in terms of the inability to afford the minimum expenditure basket required to purchase a food basket (to satisfy basic needs). In particular, the minimum expenditure basket is defined by the WFP 'as what a household requires in order to meet their essential needs, on a regular or seasonal basis, and its average cost' (WFP, 2018). The results are similar to the overall welfare results displayed in Figure 1. Models 1 and 2 predict the actual food-security-based welfare rates well for different poverty lines: the imputed rates are within the 95% CIs for all the arbitrary poverty lines considered, and the two estimation models offer similar results. Again, Model 3 overestimates the poverty rates, as the imputed welfare rates are outside the CIs of the survey-based rates. 11
In summary, our results show that while Models 1 and 2 predict poverty and food security poverty reasonably well for different arbitrary poverty lines, Model 3 always overestimates poverty for lower poverty lines and its predictions are outside the 95% CIs. The results remain similar regardless of the model employed (i.e. the normal linear regression model or the empirical error model). Put differently, the variables currently available in the ProGres UNHCR registration system can be combined with other survey data to predict the poverty rates of refugees using the proposed imputation methods. 12

Targeting performance
The imputed welfare estimates can be useful in evaluating ex post the inclusion/exclusion errors of the food assistance programs administered by government and humanitarian organizations during 2016/17. The targeting strategy for food assistance was agreed to and implemented by the UNHCR, WFP, and the Chadian government agency responsible for refugees, the CNARR. The proposed cross-survey imputation method has also been shown to perform better than the proxy means testing approach in refugee contexts (Dang & Verme, 2023). 13 The current UNHCR/WFP/CNARR targeting approach relies on the Food Consumption Score (FCS) generated by WFP's PDM surveys, which is a composite score based on dietary diversity, food consumption frequency, and the relative nutritional importance of different food items. As is the case for any index, the FCS is contingent on the selection of the food group weights as well as the food item thresholds, which are based on inherently subjective choices.
We show next how accurately the current targeting strategy identifies poor households in terms of inclusion (leakage) and exclusion (undercoverage) errors. The inclusion error is defined as the proportion of households that the targeting method considers as poor despite not being poor. This is expressed as $\frac{FP}{TP + FP}$, where $FP$ (false poor) is the number of non-poor households incorrectly considered poor by the targeting method and $TP$ (true poor) is the number of poor households correctly classified as poor. The exclusion error is defined as the proportion of households in poverty that the targeting method considers as non-poor, $\frac{FN}{TP + FN}$, where $FN$ (false non-poor) is the number of poor households incorrectly considered non-poor by the targeting method.
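As a small worked illustration of these two definitions, the sketch below computes both error rates from a pair of household-level indicators. The function name and the five-household example are hypothetical.

```python
def targeting_errors(is_poor, is_targeted):
    """Return (inclusion error, exclusion error) from household-level boolean flags."""
    tp = sum(p and t for p, t in zip(is_poor, is_targeted))          # poor and targeted
    fp = sum((not p) and t for p, t in zip(is_poor, is_targeted))    # non-poor but targeted
    fn = sum(p and (not t) for p, t in zip(is_poor, is_targeted))    # poor but not targeted
    inclusion = fp / (tp + fp) if (tp + fp) else 0.0   # leakage: FP / (TP + FP)
    exclusion = fn / (tp + fn) if (tp + fn) else 0.0   # undercoverage: FN / (TP + FN)
    return inclusion, exclusion

# Five hypothetical households: three poor, three targeted, two of them correctly.
is_poor = [True, True, True, False, False]
is_targeted = [True, True, False, True, False]
inclusion_error, exclusion_error = targeting_errors(is_poor, is_targeted)
print(inclusion_error, exclusion_error)
```

Here TP = 2, FP = 1, and FN = 1, so both error rates equal one third.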
Both error types are important from different perspectives. The inclusion error matters primarily from a budget perspective, as it represents a waste of resources. The exclusion error summarizes the program's failure to cover households in need. Another common targeting indicator is the Coady-Grosh-Hoddinott (CGH) ratio (Coady et al., 2004), which is obtained by dividing the share of benefits accruing to the target population by the share that the target population would receive under a random allocation. For example, if the bottom 40% of the income distribution receives 60% of the funding, the performance indicator is 1.5 (= 60/40). The higher the indicator, the better the performance of the targeting strategy (see Table A7 in the Annex for a summary of these indicators).
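The 60/40 example can be reproduced with a short sketch. The function names and the ten-household transfer schedule are hypothetical, and the sketch assumes that a random allocation would give the target group a share of benefits equal to its population share.

```python
def cgh_ratio(benefit_share_to_target, target_population_share):
    """Share of benefits reaching the target group divided by its population share."""
    if not 0 < target_population_share <= 1:
        raise ValueError("target_population_share must be in (0, 1]")
    return benefit_share_to_target / target_population_share

def cgh_from_transfers(is_target, transfer):
    """Compute the ratio from household-level target flags and transfer amounts."""
    total = sum(transfer)
    to_target = sum(t for flag, t in zip(is_target, transfer) if flag)
    return cgh_ratio(to_target / total, sum(is_target) / len(is_target))

# Ten hypothetical households: the bottom 40% receive 60 of 100 units of funding.
is_target = [True] * 4 + [False] * 6
transfer = [15, 15, 15, 15, 0, 0, 10, 10, 10, 10]
cgh = cgh_from_transfers(is_target, transfer)
print(round(cgh, 2))
```

A value of 1 corresponds to the neutral (random) allocation, so values above 1 indicate progressive targeting.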
Table 4 shows the undercoverage and leakage rates for the different approaches. The method we propose (panel B) outperforms the targeting method currently used in Chad (panel A) for all the poverty lines except the 25th percentile poverty line. The errors are considerable, with the UNHCR/WFP/CNARR undercoverage rates ranging from 9% to 32% and leakage rates from 12% to 36%, and our model-based undercoverage rates ranging from 6% to 40% and leakage rates from 9% to 41%. However, these methods perform relatively well when compared with international evidence. For example, Skoufias et al. (2001) find that the undercoverage and leakage rates for the PROGRESA program in Mexico were 7% and 70%, respectively, for a poverty rate of 25%. These figures represent slightly better performance on the undercoverage rate but much worse performance on the leakage rate compared with those for Chad.
In fact, the estimated targeting rates for Chad are also better than the median performance of similar scores for programs across the world. Table A8 reports the CGH ratio for the 85 programs considered by Coady et al. (2004) (A), the UNHCR/WFP/CNARR targeting program (B), and our proposed method (C). Notably, our methodology outperforms the UNHCR/WFP/CNARR targeting program and the median value of the programs covered by Coady et al. (2004).
For a more general test, we empirically evaluate how the UNHCR/WFP/CNARR targeting strategy performs relative to the proposed targeting method based on imputed consumption for different poverty lines varying from 38% to 99% of the consumption distribution (Figure 3). The results suggest that our proposed method outperforms the targeting method currently used in Chad for all the poverty lines between 38% and 99%. In other words, our proposed method would more accurately identify the intended beneficiaries than the targeting method currently used in Chad for any welfare programs targeting poor refugees.

Conclusion
Tracking the progress made toward SDG 1 of eradicating poverty for all requires the availability of high-quality household consumption surveys. However, the majority of countries across the world, especially developing countries, face challenges in collecting poverty data. High-quality consumption surveys that are comparable for forcibly displaced persons and their hosts are, and will remain, in limited supply, given the cost and challenges associated with these types of surveys. In the meantime, cross-survey imputation methods can provide a second-best alternative that can potentially save time and resources. We combined survey and census-type data on refugees to estimate welfare for refugees in Chad. We showed how different sets of variables as well as different sources of data perform in the identification of poor households, in particular how well the set of variables available in the ProGres database can predict poverty. In a second step, we estimated the accuracy of the current UNHCR/WFP/CNARR targeting strategy and compared it with the targeting strategy based on imputed consumption.
The results suggest that the set of variables available in ProGres accurately predicts the welfare rates for different poverty lines. Adding variables related to asset and animal ownership provides predictions that are very close to the ones with only the variables available in the ProGres data set. These results are robust to different model specifications, varying poverty lines, and assumptions about the error terms. Since the UNHCR ProGres data are available in most refugee locations where the UNHCR runs the registration system (currently more than 122 countries), these methods may be replicable in many settings of forcibly displaced persons.
The current targeting strategy that is used for food, livelihoods, and cash-based assistance, despite its simplicity, is rather accurate when compared with the existing international evidence. The targeting errors resulting from the current UNHCR/WFP/CNARR targeting strategy for a poverty rate of 25% are in the same error range as other targeting methods around the world, as reported in Coady et al. (2004). Yet, we also showed that the existing targeting method can be improved by using the imputation method proposed in this paper.
Our study has several data limitations.The PDM data measure consumption using relatively fewer variables than those in the standard household consumption survey (i.e. the Chadian Household Consumption and Informal Sector Surveys [ECOSIT4]), and the data used by this paper only cover a subset of refugees in Chad.Our main objective is to test a cross-survey methodology, and, for this purpose, we used a subsample of UNHCR refugee data that are not nationally representative of the refugee population in Chad.Therefore, the poverty estimates presented in this paper do not reflect the official poverty estimates monitored by the government and the international community, and our poverty estimates can be improved once ECOSIT4 data are available.In addition, the data that we analyze did not cover refugees who live outside camps.As these refugees live in different environments, accurately predicting their welfare may require more detailed variables.
A promising direction for future research is to adapt existing survey instruments to collect data that further enhance the accuracy of the imputation model. Another direction is to experiment with different imputation models for different types of refugees. Since forced displacement can lead to a reorganization of a family's structure (Beltramo et al., 2023), refugee household composition can be unique, as only a subset of the original members may remain post-conflict (e.g. due to conscription of male members into the military, and the death, kidnapping, or separation of certain family members during displacement). As such, cross-survey imputation methods offer an opportunity for heterogeneity analysis within the refugee (or forcibly displaced) community to assess the welfare of especially vulnerable groups, which is typically impossible in data-scarce contexts. Last but not least, the proposed imputation method is being extended to study other general poverty outcomes, such as the poverty gap, extreme poverty, or the vulnerability (near-poor) rate, that can better capture the consumption distribution (Dang et al., 2023). It would be useful to further explore similar applications to the refugee context.
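As an illustration of the additional poverty outcomes mentioned above, the sketch below computes the headcount rate and the poverty gap using the standard Foster-Greer-Thorbecke (FGT) family of indices. This is a generic textbook formulation, not necessarily the one adopted in Dang et al. (2023); the function name, the consumption values, and the poverty lines are hypothetical.

```python
def fgt_index(consumption, poverty_line, alpha):
    """FGT index: mean over all households of ((z - y) / z) ** alpha for poor ones.

    alpha = 0 gives the headcount rate; alpha = 1 gives the poverty gap.
    A household counts as poor when its consumption is strictly below the line.
    """
    total = 0.0
    for y in consumption:
        if y < poverty_line:
            total += ((poverty_line - y) / poverty_line) ** alpha
    return total / len(consumption)


# Hypothetical per capita consumption values and poverty lines.
consumption = [50, 75, 100, 150]
poverty_line = 100
extreme_line = 60  # assumed lower line for extreme poverty

headcount = fgt_index(consumption, poverty_line, 0)          # share of poor households
poverty_gap = fgt_index(consumption, poverty_line, 1)        # average normalized shortfall
extreme_headcount = fgt_index(consumption, extreme_line, 0)  # share below the lower line
print(headcount, poverty_gap, extreme_headcount)
```

In an imputation setting, the same function could in principle be applied to each simulated consumption vector and averaged across simulations, although that step is not shown here.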

Notes
1 In this paper, we focus on poverty as measured by household consumption. While some richer countries tend to implement income surveys (e.g. in Latin America), consumption-based poverty measurement is the standard practice with poorer countries, particularly in Africa (see, e.g. Beegle et al., 2016).
2 Missing data issues are not a problem limited to displaced populations but can emerge because of lack of survey data on a particular topic of interest, population group, or time period. These issues can also be caused by sampling errors, incomplete data due to unit or item nonresponse, data input errors, or post-survey data manipulations such as top-coding or censoring.
3 This is an expanded version of an early working paper by Beltramo et al. (2021).

Figure 1. Imputed welfare and survey-based welfare for different poverty lines, ProGres targeting. Note: The (blue) dashed curve presents the actual poverty rates derived from the PDM data in the ProGres Targeting. The (green) solid curve with circle symbol represents the imputed poverty rates from Model 1 with observations from Merged ProGres Targeting (56,830 observations). The (indigo) solid curve with symbol "x" represents the imputed poverty rates from Model 2 with the Merged ProGres Targeting observations (56,829 observations) while the (orange) solid curve with the triangle symbol represents the imputed poverty rates from Model 3 with the Merged ProGres Targeting observations (56,829 observations).

Figure 2. Imputed welfare and survey-based welfare based on food security for different poverty lines, ProGres targeting. Note: The (blue) dashed curve presents the actual poverty rates derived from the PDM data in the ProGres Targeting. The (green) solid curve with circle symbol represents the imputed poverty rates from Model 1 with observations from Merged ProGres Targeting (56,830 observations). The (indigo) solid curve with symbol "x" represents the imputed poverty rates from Model 2 with the Merged ProGres Targeting observations (56,829 observations) while the (orange) solid curve with the triangle symbol represents the imputed poverty rates from Model 3 with the Merged ProGres Targeting observations (56,829 observations).

Figure 3. Comparison of targeting performances of different targeting methods.

Table 1. Summary of data.

Table 2. Out of sample model performance, individual level.

Table 3. Imputed poverty rates using the international and national poverty lines*.

Table A3. Collinearity tests.

Table A5. Imputed poverty rates using the international and national poverty lines, further robustness checks with probit and logit models.