Understanding the Role of Rural Non-Farm Enterprises in Africa’s Economic Transformation: Evidence from Tanzania

Abstract Tanzania’s recent growth boom has been accompanied by a threefold increase in the share of the rural labour force working in nonfarm employment. Although households with nonfarm enterprises are less likely to be poor, a substantial fraction of these households fall below the poverty line. Heterogeneity in the labour productivity of rural nonfarm businesses calls for a two-pronged strategy for rural transformation. Relatively unproductive enterprises may be part of a poverty reduction strategy but should not be expected to contribute to employment and labour productivity growth. Failure to account for this heterogeneity is likely to lead to disappointing outcomes.


Introduction
Since the beginning of the twenty-first century, Tanzania's economy has grown more rapidly than at any other point in recent history. Between 2000 and 2015, the average annual GDP growth rate was 6.8 per cent and the average annual labour productivity growth rate was more than 4 per cent. Between 2002 and 2012, more than three quarters of this labour productivity growth was accounted for by structural change; the remainder of the growth is largely attributable to within sector productivity growth in agriculture. The growth attributable to structural change is almost entirely explained by a rapid decline in the agricultural employment share and an increase in the non-agricultural private sector employment share (Diao, Kweka, McMillan, & Qureshi, 2017).
Despite these changes, Tanzania remains heavily rural; between 2002 and 2012, the share of the population living in rural areas declined by only 6.5 percentage points, from 76.9 per cent to 70.4 per cent (Table 1). Living in rural areas is traditionally associated with farming. However, the census and household survey data also show that between 2002 and 2012, the share of the rural population engaged in agricultural activities decreased by almost 14 percentage points (Table 1). The data also show that growth in rural nonfarm employment has been very rapid, at between 11.1 per cent and 13.5 per cent per annum depending on the data source (Table 1). Thus, while the agricultural sector still employs the majority of the rural population in Tanzania, the rural nonfarm economy is becoming increasingly important.
The purpose of this paper is threefold. The first is to describe the characteristics of the households that make up the rural nonfarm sector in Tanzania. The second is to describe the businesses run by households in the rural nonfarm sector; we also describe characteristics of the owners of these businesses and their self-reported motivations for running a business. The final purpose of this paper is to assess the productivity of rural nonfarm businesses. This last exercise is meant to inform the following question: does the rural nonfarm sector have the potential to contribute to long-run productivity growth in Tanzania? The importance of this question is highlighted in recent work by Diao, McMillan, and Rodrik (2017), who show that: (i) the pattern of growth in Tanzania is common across Africa; and (ii) without labour productivity growth in nonfarm enterprises, labour productivity growth in Africa is likely to stall.
Using Tanzania's 2012 Household Budget Survey (HBS), we classify rural households into three groups: (i) households uniquely engaged in farming; (ii) households uniquely engaged in the nonfarm sector; and (iii) mixed households. Our analysis shows that more than 10 per cent of Tanzanian rural households engage only in the nonfarm economy. The heads of these households tend to be younger and better educated than the heads of households in the two comparison groups. Gender of the household head does not seem to affect the likelihood of being a nonfarm household. Households in ...
Table 1 (third panel: Annualised growth rate in population, total employment and employment in agriculture and non-agriculture, percentage). Notes: For the Census employment, data for current employees aged 10 years old and above is used. Agricultural employment is based on the industry classification. For the Household Budget Survey (HBS), employees are for aged 10 years old and above. The definition of employment differs between the two rounds of HBS. In HBS 2000/2001, 'unpaid family helper' counts for 7.8 per cent of total employment and is all considered as non-agricultural employment. This causes two problems: (1) total employment of HBS 2000/2001 is more than that of Census 2002, which was conducted two years later; and (2) ... (NBS, 2006, 2011a, b, 2014a, b, 2015). The micro data for Census 2002 is downloaded from IPUMS (https://usa.ipums.org/usa/).
... also identifies a group of firms, the 'in-between' firms, with the potential to contribute to rural transformation. Section 6 studies the characteristics of the 'in-between' firms to better understand how policy-makers might target these firms. Section 7 concludes with a summary of the main points and a brief discussion of policy implications.
... (Table 1).
Comparing these large declines with the modest changes in the share of rural population discussed above indicates that growth in rural nonfarm employment outpaced growth in agricultural employment over this period.
Indeed, the annualised employment growth rates over the most recent 15 years, presented in the third panel of Table 1, clearly indicate this pattern. These employment growth rates are computed from the two most recent rounds of each of the three surveys. In general, employment in agriculture has been growing at a slower pace than total employment, both nationwide and in rural areas. According to the census data, the growth rate of agricultural employment in rural areas was almost zero between 2002 and 2012. By contrast, growth in rural non-agricultural employment was in the double digits over this period (bottom of Table 1).
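The annualised rates discussed here can be illustrated with a short sketch. It assumes a compound growth formula between two survey rounds; the text does not state which formula was used, and the function name and the numbers in the example are hypothetical rather than taken from Table 1.

```python
def annualised_growth(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate between two survey rounds.

    Assumption: growth compounds annually; the paper does not state its formula.
    """
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Hypothetical illustration (not figures from Table 1): employment that
# triples over ten years.
rate = annualised_growth(1.0, 3.0, 10)
print(f"Annualised growth: {rate:.2%}")
```

Under this assumption, a tripling of employment over a decade corresponds to growth of roughly 11.6 per cent per annum, which is the order of magnitude reported for rural nonfarm employment.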
To help better understand the changing structure of Tanzania's employment, Table 2 displays the structure of net increases in employment across different economic sectors between 2002 and 2012 for total employment and formal and informal employment. While the agricultural sector still accounts for two-thirds of total employment in Tanzania, as shown in Table 1, it has played a relatively minor role in the net job increase as shown in Table 2. In fact, almost 90 per cent of the net increase in jobs occurred in the non-agricultural sector over the period of 2002-2012. Considering that agricultural employment made up more than 80 per cent of total employment nationwide in 2002 (Table 1), this rapid non-agricultural employment growth is remarkable.
As is evident from Table 2, about 73 per cent of the net increase in total employment has taken place in the informal non-agricultural sector, accounting for 88 per cent of the increase in private non-agricultural employment. We do not have access to detailed employment data disaggregated by rural and urban areas across sectors. However, we know from Table 1 that nonfarm employment increased significantly in rural areas. If one assumes that formal non-agricultural employment is more likely to be located in urban than in rural areas, which is a realistic assumption, then the 88 per cent share of the increase in private non-agricultural employment created by the informal sector nationwide can be taken as a lower bound for the informal sector's share of that growth in rural areas. Section 4 of this paper will use the MSME data to further investigate the nature of this rural nonfarm employment.

Data and methods
The two datasets used for the analyses are briefly discussed in this section, followed by a description of the methodologies employed to analyse two distinct research questions. First, what are the characteristics of households which participate in the rural nonfarm economy compared to other types of households? Second, can the firm level data be used to assess the characteristics and potential of rural enterprises in comparison with those in urban areas? We provide a more detailed description of the HBS data in the Appendix since the data are fairly standard and have been used frequently. We devote more space here to the description of the MSME data, which are unlikely to be familiar to readers both because of their limited use and because nationally representative firm level surveys of mostly informal firms are much rarer.

The HBS data
The 2011/2012 Household Budget Survey (HBS) is a nationally representative survey designed to provide estimates of household income and expenditures for poverty assessments, similar to the Living Standard Measurement Surveys (LSMSs) conducted routinely in many other countries. The 2011/2012 HBS surveyed 4130 rural households and 6056 urban households, and is also representative for three geographic locations: rural, Dar es Salaam and other urban. Similar to a standard LSMS, the HBS has an occupational module that provides information on all household members' primary employment by industry (including farming). This is the module used for the rural employment analysis in this paper. The survey also asked whether households have home enterprises, a question also used in the discussion in Section 3. However, the HBS only covers mainland Tanzania and excludes Zanzibar. Like the MSME survey to be discussed later, the sampling frame used to conduct the HBS is based on the 2002 Census, which could possibly oversample rural households given that, as discussed in Section 1, the 2012 Census has shown a decline of 6.5 percentage points in the share of rural population from the 2002 Census. A set of summary statistics based on the 2011/2012 HBS for the variables used in our analysis is presented in Appendix Table A1.

The MSME data
As mentioned in Section 1, the Micro, Small, and Medium Sized Enterprise (MSME) survey is Tanzania's first nationally representative survey of small businesses. The data were collected during interviews with 6134 small business owners identified in a three-step sampling process. In the first step, a sample of 640 representative enumeration areas was selected. A complete listing of all households was carried out to identify households that currently owned and ran small businesses or had recently closed businesses. In the second step, about nine to 12 households with currently operating businesses (and two to three households with closed businesses) were selected. Finally, if more than one member in a selected household owned and ran a small business, a Kish Grid was applied to select the interviewee. There are three questionnaires for the survey. The main questionnaire, which is used for this study, includes 192 questions on 20 topics that are put to owners of currently operating enterprises. Based on the enterprise's main activities, main products and services, all enterprises in the survey were assigned to an industry according to the International Standard for Industrial Classification (ISIC). There are 80 unique ISIC industries, many of which have few sampled firms. Thus, in this study, the 80 industries are aggregated into 24 subsectors, six of which are in the manufacturing sector and the rest in service sectors. A set of summary statistics of the MSME survey data is reported in Table 3 for the three areas separately: rural, other urban area and Dar es Salaam. Among the 6134 sampled firms, a total of 5609 firms have all the information required for the analysis.
Based on the information that is available, there is no reason to believe that the firms with missing information are significantly different from the rest of the sample; for example, they are dispersed across regions and firm size.
As shown in the first row of Table 3, most MSMEs are extremely small: mean employment is 1.5 in rural areas, 1.65 in urban areas outside Dar es Salaam and 1.7 in Dar es Salaam. The very small size of MSMEs may be due at least in part to sample selection bias. The sampling frame is household based rather than enterprise based, and is drawn from the 2002 Population and Housing Census. This could lead to oversampling of rural households given that, as discussed in Section 1, the 2012 Census has shown a decline of 6.5 percentage points in the share of rural population from the 2002 Census. The household-based sampling means that the survey probably under-sampled businesses outside households, which in practice translates into under-sampling relatively larger firms. The fact that more than 40 per cent of firms with five or more employees in the survey are sampled in rural areas might support this concern, as most larger firms are known to be located in urban rather than rural areas. Among the 6134 enterprises sampled in the survey, only 96 firms have five or more employees. The largest firm in the sample has 80 employees, while the second and third largest have 34 and 33 employees respectively. In fact, only four firms in the survey have more than 20 employees. While larger firms are usually assigned larger weights than smaller ones in the data, this potential under-sampling of larger urban firms is unlikely to be fully corrected by the sample weights, given that the weights are derived from the relationship between listed households and households currently operating businesses in selected representative enumeration areas.
As expected, few small firms are registered with Tanzania's Business Registration and Licensing Agency (BRELA) and there is little difference between rural and urban firms in this regard. In contrast, more urban enterprises (8%) have tax identification numbers than rural enterprises (3%). While the MSME survey is a household based survey, only 49 per cent of rural firms report that their businesses actually operate out of their homes, and the number in urban areas is almost identical at 47 per cent. Therefore, the shares of nonfarm enterprises reported in Table 4, calculated from HBS 2011/2012 data, could significantly underestimate the importance of MSMEs in rural areas, given that the HBS captures only businesses run out of the home. Table 3 also reports average monthly value-added and average monthly sales per firm. Firms in the MSME database report sales on a monthly basis and also provide their own judgement on whether a particular month is a good, bad or normal month. Taking this possible seasonality into account, value added is computed as the firm's average monthly sales minus the firm's average monthly costs of production. The mean value-added of rural firms reported in Table 3 is about 20 per cent lower than the mean value-added of urban firms in both Dar es Salaam and other urban areas. However, there is significant variation among surveyed firms in monthly value-added in both rural and urban areas, indicated by the high value of the standard deviation (s.d.) in Table 3. We will return to this point in detail later in this paper. Most MSME firms are young, with a mean age of 6.9 years for rural firms, 6.1 years for urban firms outside Dar es Salaam, and 5.5 years for firms in Dar es Salaam. This is consistent with the findings in Section 1 that most nonfarm jobs created in Tanzania between 2002 and 2012 were created by the informal sector.
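The seasonally adjusted value-added measure described above can be sketched as follows. This is an illustration only: the text does not specify how good, bad and normal months are weighted, so the sketch assumes each month type is weighted by the number of months of that type in a year. All function names and figures are hypothetical.

```python
def average_monthly_sales(sales_by_type: dict, months_by_type: dict) -> float:
    """Average monthly sales, weighting each month type by its frequency.

    Assumption: weights are the number of good/normal/bad months per year;
    the survey's actual weighting scheme is not described in the text.
    """
    total_months = sum(months_by_type.values())
    if total_months <= 0:
        raise ValueError("at least one month must be reported")
    weighted = sum(sales_by_type[kind] * months_by_type[kind] for kind in months_by_type)
    return weighted / total_months


def monthly_value_added(avg_sales: float, avg_costs: float) -> float:
    """Value added = average monthly sales minus average monthly production costs."""
    return avg_sales - avg_costs


# Hypothetical firm: sales in a typical good, normal and bad month,
# and how many months of each type occur in a year.
sales = {"good": 300.0, "normal": 200.0, "bad": 100.0}
months = {"good": 3, "normal": 6, "bad": 3}
avg_sales = average_monthly_sales(sales, months)
print(monthly_value_added(avg_sales, avg_costs=120.0))
```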
Table 3 also indicates that 76 per cent of rural businesses operate full time, compared to 82 per cent in urban areas outside Dar es Salaam and 87 per cent in Dar es Salaam. More than 40 per cent of rural business owners report that the business is the owner's main source of income, with a significantly lower share of rural business owners (28%) reporting that the business is the owner's only source of income. By contrast, 44 per cent and 51 per cent of business owners in other urban areas and Dar es Salaam, respectively, report that the enterprise is their only source of income. Only about a quarter of rural business owners report that farming is their main source of income, a fact that supports the finding from the 2011/2012 HBS that many households (about 10%) in rural areas only participate in the nonfarm economy as their primary employment.
Like their businesses, the owners of these small businesses are also relatively young. For the full sample, the mean age of business owners is roughly 37 years in rural and 36 years in urban areas. In contrast, according to the HBS data, the average age of household heads is 47 years in rural areas and 42 years in urban areas.
Finally, the last three rows of Table 3 report the distribution of business owners by the three categories of their household's income. The measure of these three income categories (very poor, modestly poor and not poor) was computed using monthly household income reported by survey respondents. Poverty appears to be higher among households with MSME owners than among overall rural and urban households. Based on the poverty assessment profile reported by the government (NBS [National Bureau of Statistics], 2013), 66.7 per cent of the rural population, 78.3 per cent of the urban population outside of Dar es Salaam and 95.8 per cent of Dar es Salaam's population live in nonpoor households. In contrast, in the MSME survey, only 45 per cent of rural MSME owners, 54 per cent of urban MSME owners outside of Dar es Salaam and 62 per cent of MSME owners in Dar es Salaam live in nonpoor households. This seems to confirm that in both urban and rural areas, small businesses are often part of a coping strategy for many poor households, while rich households are less likely to choose such small businesses as their main livelihoods.

Methodologies
The empirical strategy employed in this paper aims to answer two questions: (1) What determines whether households participate in the nonfarm economy? And (2) what determines whether nonfarm enterprises have the potential to contribute to employment and labour productivity growth? To answer these questions, econometric analyses are used. Descriptive statistics for the HBS and MSME data provide a glimpse of the heterogeneity that is observed across households and firms. The means and standard errors presented in all the descriptive tables of this paper were generated using the sampling design of the two surveys (HBS and MSME).
To address the first question, the rural households in the HBS data are classified into three types based on their family members' primary employment in the econometric analysis, while more subgroups of households are further classified in the descriptive analysis. The three types of rural households are: (1) farm, in which all family members' primary employment is agriculture; (2) mixed, indicating that in the same household some family members work in agriculture and others in the rural nonfarm economy; and (3) nonfarm, in which all family members work in the rural nonfarm economy as their primary employment. A multinomial probit model is used in the analysis. The multinomial probit was chosen over a multinomial logit because it is better suited to handling correlations across alternatives and, unlike the multinomial logit, is not bound by the independence of irrelevant alternatives assumption. The farm household group is chosen as the comparison category in the regression.
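The three-way household classification described above can be sketched in a few lines. The function name and the 'farm'/'nonfarm' labels are hypothetical; the sketch assumes each employed household member has already been assigned a primary sector from the HBS occupational module.

```python
from typing import Iterable, Optional


def classify_household(primary_sectors: Iterable[str]) -> Optional[str]:
    """Classify a household as 'farm', 'mixed' or 'nonfarm'.

    primary_sectors holds one label ('farm' or 'nonfarm') per employed member.
    Returns None when no member reports primary employment.
    """
    sectors = set(primary_sectors)
    unknown = sectors - {"farm", "nonfarm"}
    if unknown:
        raise ValueError(f"unexpected sector labels: {sorted(unknown)}")
    if not sectors:
        return None
    if sectors == {"farm"}:
        return "farm"      # all members work primarily in agriculture
    if sectors == {"nonfarm"}:
        return "nonfarm"   # all members work primarily outside agriculture
    return "mixed"         # some members in each sector


print(classify_household(["farm", "nonfarm", "farm"]))
```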
For the MSME dataset, the left-hand side variable is binary, which under normal circumstances could be addressed with a simple probit model. However, we must modify our approach due to possible endogeneity issues in some of the right-hand side variables. Dealing with endogeneity in nonlinear models (particularly in a probit model) is a straightforward exercise if the endogenous variable is continuous. In such cases, endogeneity is dealt with using a control function approach as explained in Wooldridge (2010) and StataCorp (2015). The endogenous variables in our dataset are binary in nature, however. This eliminates the possibility of using a control function approach but raises the possibility of using a bivariate probit model. However, this last approach is also ruled out because it only allows for one endogenous variable while our dataset contains multiple endogenous variables. To address this issue, we have therefore resorted to estimating the probit model using a generalised method of moments (GMM) approach which allows for the endogeneity of multiple variables.
Estimating probit models using GMM is straightforward if only exogenous variables are present on the right-hand side, as is shown in StataCorp (2015). However, in the presence of endogeneity the estimation becomes considerably more complicated, as instruments cannot simply be added to the moment conditions of a GMM instrumental variable approach (as one would do for linear models). Doing so for probit or logit models is not possible since neither the conditional expectations nor the linear projections assumed for the linear model apply in the case of probit models. We have therefore followed Wilde (2008) to estimate a two-stage generalised method of moments (GMM) model which accounts for both the non-linearity of the model and the binary nature of the endogenous variables.
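For the purely exogenous case, one standard way of writing the probit moment conditions is sketched below. This is a textbook form given for illustration, not necessarily the exact conditions used in the paper: it assumes a regressor vector x_i, a binary outcome y_i and the standard normal distribution function Φ. The conditions for the endogenous case are those of Wilde (2008) and are not reproduced here.

```latex
E\left[\, x_i \left( y_i - \Phi(x_i'\beta) \right) \right] = 0
\quad\Longrightarrow\quad
\hat{g}(\beta) = \frac{1}{N} \sum_{i=1}^{N} x_i \left( y_i - \Phi(x_i'\beta) \right)
```

With one moment per regressor, the number of moment conditions equals the number of parameters, so the GMM estimate sets the sample moments \hat{g}(\beta) to zero.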
The GMM estimation as proposed by Wilde takes the following form. In the first stage, a reduced form probit model is estimated for each endogenous variable and the residuals are calculated. Having estimated stage 1, a two-step GMM approach is used to estimate the parameters using the correct moment conditions and the necessary adjustments to the standard errors. We refer the reader to Wilde (2008) for details on the specification of the moment conditions. The method proposed by Wilde and as applied in this paper leads to an exactly identified estimation, meaning that the number of instruments equals the number of parameters. Thus, no tests for overidentification of instruments could be conducted for the final estimation. The left-hand side variable for the structural model was defined as follows. It takes the value of one if the firm's value-added per worker (which is used to measure firm level labour productivity) is greater than the economy-wide labour productivity in the trade sector at the national level and zero otherwise. The small firms with high potential are defined as the 'in-between' firms following Lewis (1979).
The specifications of the models described above are provided in Section 3.2 for the HBS survey and Section 4.4 for the MSME survey. Average marginal effects, which can be interpreted as the change in the predicted probability given a one unit change in the right-hand side variable in the case of continuous variables or a discrete change in the case of categorical variables, are reported. With the exception of two variables (firm age and number of employees), all our right-hand side variables are binary in nature. Thus, the reported average marginal effects should mostly be interpreted as discrete changes. All estimations are done using robust standard errors in accordance with the sampling design. Since the survey was not in any way stratified and subnational units are not representative, we have not clustered the standard errors to any specific subnational location.
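The binary dependent variable defined above can be sketched as follows. The function names are hypothetical, and the sketch assumes that value added and the trade-sector productivity threshold are expressed in the same units and over the same period, and that the worker count is whatever headcount the analysis uses (the text does not say whether the owner is included).

```python
def value_added_per_worker(value_added: float, workers: int) -> float:
    """Firm level labour productivity: value added divided by the number of workers."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    return value_added / workers


def in_between_indicator(value_added: float, workers: int,
                         trade_sector_productivity: float) -> int:
    """Return 1 if the firm's value added per worker exceeds the economy-wide
    trade-sector labour productivity, and 0 otherwise."""
    return int(value_added_per_worker(value_added, workers) > trade_sector_productivity)


# Hypothetical figures, same currency and period for all three arguments.
print(in_between_indicator(value_added=300.0, workers=2, trade_sector_productivity=120.0))
```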
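The discrete-change interpretation of an average marginal effect can be illustrated with a minimal sketch for a simple binary probit. The coefficients and data below are hypothetical, not estimates from this paper, and the sketch ignores survey weights; for the multinomial model the same idea would be applied to each outcome's predicted probability.

```python
import math


def norm_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ame_binary(rows, coefs, intercept, name):
    """Average marginal effect of a 0/1 regressor in a probit model.

    For each observation, the predicted probability is evaluated with the
    regressor set to 1 and to 0, holding the other regressors at their
    observed values; the differences are then averaged over the sample.
    """
    effects = []
    for row in rows:
        base = intercept + sum(coefs[k] * row[k] for k in coefs if k != name)
        effects.append(norm_cdf(base + coefs[name]) - norm_cdf(base))
    return sum(effects) / len(effects)


# Hypothetical coefficients and observations.
coefs = {"young_head": 0.5, "firm_age": 0.1}
rows = [{"young_head": 0, "firm_age": 0.0}, {"young_head": 1, "firm_age": 0.0}]
print(round(ame_binary(rows, coefs, intercept=0.0, name="young_head"), 4))
```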

Characteristics of households with rural nonfarm activities
We begin this section using the 2011/2012 HBS to assess the size of the rural nonfarm economy by the number (or share) of rural households that participate in the nonfarm economy and the differences between households with and without rural nonfarm participation. We then analyse the characteristics of the three categories of households using a multinomial probit model.

How large is the rural nonfarm economy?
As a starting point, we classify rural households in the HBS data into those with and without nonfarm activities. The classification is based on household members' primary employment. Like other low-income countries in Africa, most states in Tanzania are predominantly rural (Davis, Di Giuseppe, & Zezza, 2014). Farm activity dominates rural Tanzania: in 2012, 61.4 per cent of rural households' members engaged only in agricultural activities (Table 4), and less than 40 per cent engaged in nonfarm activities. Households engaging in rural nonfarm activities can further be classified as farm/nonfarm mixed and nonfarm only households. As shown in Table 4, about 27.5 per cent of total rural households are farm/nonfarm mixed households and about 11 per cent are nonfarm only households.
We also further categorise rural households with nonfarm activities according to whether they have their own nonfarm businesses. This helps us to establish a link between the HBS survey and the MSME survey to be analysed later. According to the HBS, fewer mixed households have their own nonfarm businesses, while almost half of the nonfarm only households own nonfarm businesses. These numbers are comparable to the statistics drawn from the firm data of the MSME survey in Table 3, which shows that 43 per cent of rural nonfarm enterprises are the household's main income source, but only 28 per cent with businesses report that the business is the only source of income. Finally, we also show in Table 4 that rural households with nonfarm businesses are less likely to be in poverty.

Characteristics of households in the rural nonfarm economy
Characteristics of the different types of households are explored using a multinomial probit model as discussed in Section 2.3. Equation (1) describes the specification used in the estimation, where y_i is the choice of a given household and takes three values (1 = farm, 2 = mixed, 3 = nonfarm), H_i is a vector of household characteristics, C_j is a vector of infrastructure or other community level factors, and D_r is a set of regional dummies. The variables in the vector H_i include a dummy equal to one if the household is headed by a young person (age 15-34); a dummy equal to one if the household head is female; dummies for the levels of education of the household heads (less than primary as the comparison category); and dummies for farm size defined by cultivated area categorised into four groups: no land, farms with less than two ha, farms with two to five ha, and farms with more than five ha (no land is the comparison group). Vector C_j contains a set of variables related to access to infrastructure at the community level and other community level variables including daily public transportation to the regional capital, electricity, mobile phone signal, internet, banks, informal finance, cooperatives, a large employer (for example, a factory), and a weekly market. ε_i is the iid error term.
Table 5 reports the average marginal effects for the three types of households. We begin with the household variables. Being a young household head has a significant and positive effect on being a nonfarm household, with the predicted probability increasing by 3.1 per cent, and has a negative effect at a similar scale on being a mixed household. The level of the household head's education shows a distinct and contrasting effect on being a farm or a nonfarm household. While both primary and secondary/higher education matter for being a nonfarm household, the more educated the household's head, the larger the effect. Only the higher level of education affects the probability of being a mixed household, as the effect of primary education is insignificant. These results seem to indicate that higher levels of education may be required to obtain nonfarm jobs in rural areas. Next, we look at farm size. As expected, having a larger farm size decreases the probability of being a nonfarm household by 22-24 per cent. Likewise, a larger farm size is associated with a higher predicted probability of being a farm household by 20.6-27.3 per cent.
Table 5. Notes: Standard errors in parentheses. *p < 0.05; **p < 0.01; ***p < 0.001. The average marginal effects (predicted probabilities) based on the multinomial probit regression are reported. In the multinomial probit regression, the farm household group is chosen as the comparison category. See Table 5 in Sosa-Rubi, Galárraga, and Harris (2009) for a similar way to report the result. Source: Authors' calculation from their estimation results of multinomial probit regression using 2012 Tanzania HBS data.
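The multinomial probit specification referred to as Equation (1) in this section is not reproduced in the text. One conventional way to write a model consistent with the definitions given here is the latent-utility form sketched below. The intercepts α_k, the coefficient vectors β_k, γ_k and δ_k, and the latent variable y* are notational assumptions, and the error is indexed by alternative k here although the text writes it as ε_i.

```latex
\begin{aligned}
y^{*}_{ik} &= \alpha_k + H_i'\beta_k + C_j'\gamma_k + D_r'\delta_k + \varepsilon_{ik},
  \qquad k \in \{1, 2, 3\}, \\
y_i &= \arg\max_{k} \; y^{*}_{ik},
  \qquad \alpha_1 = 0,\ \beta_1 = \gamma_1 = \delta_1 = 0 .
\end{aligned}
```

Setting the coefficients of alternative 1 (farm) to zero corresponds to choosing farm households as the comparison category, and the errors are taken to be normally distributed, as in a multinomial probit.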
Differences among the three types of households are less pronounced for community level variables, perhaps due to the decreased variability that is inherent in these variables. Public transportation to the regional capital, a proxy for road access, positively affects only the probability of being a nonfarm household. Having a mobile phone signal has opposite effects on being a farm and a mixed household, positive for the former and negative for the latter, while the effect on being a nonfarm household is insignificant. While the use of electricity for doing business, especially in the manufacturing sector, is important, the variable is not significant for any type of household; this is also true for internet access. This may be because access to electricity or internet at the community level does not necessarily imply access at the household level. Access to informal financial services is associated with a higher probability of being a farm household and a lower probability of being a nonfarm household, suggesting that informal financing is the main channel through which farm households borrow money. The presence of cooperatives is significant only for mixed households, with a 3.3 per cent greater predicted probability. As expected, the presence of a large employer is associated with a higher predicted probability of being a nonfarm household by 4.5 per cent, but does not influence being a mixed household. Finally, having access to weekly markets is the only variable that is significant across all types of households. Access to markets reduces the predicted probability of being a farm household by 7.5 per cent and increases the predicted probability of being mixed and nonfarm households by 4 per cent and 3.4 per cent respectively. The mixed results regarding the role of infrastructure are puzzling. As noted, this may be because the community level variables are too 'rough' a proxy for access at the household level.
However, the lack of significance of these variables may also be associated with the small scale of rural nonfarm enterprises. According to Tybout (2000), low levels of economic density and interaction may lead to small, diffuse pockets of demand, which in turn result in small, localised production and services. We revisit this issue in the next section using the MSME data.

Characteristics of rural nonfarm enterprises and their ownersan analysis at the firm level using MSME survey data
A vibrant rural nonfarm sector can play an important role in rural transformation. To understand the extent to which the rural nonfarm sector can play a role in labour productivity growth and poverty reduction in rural areas, the MSME survey data is used to examine the motivations of business owners in the rural nonfarm sector as well as the characteristics of their businesses. In a previous paper, Diao, Kweka, McMillan and Qureshi (2017) identify a group of MSMEs that can be considered members of what Arthur Lewis (1979) referred to as the in-between sector. According to Lewis (1979), these firms play an important role in the transformation process. Lewis (1979) uses the term in-between to signal that these firms are not just petty traders, rather, they often look more like formal firms and provide important goods and services. Diao, Kweka, et al. (2017) show that rural enterprises are on average slightly less productive than their urban counter-parts (this is confirmed in Table 3 of Section 2 in this paper) but they do not explore in detail the characteristics of rural enterprises or rural entrepreneurs.
This section begins with a description of the location and industrial composition of MSMEs. It follows with an exploration of the extent to which rural entrepreneurs appear to be subsistence or growth-oriented. To analyse this issue, we use the following data from the MSME survey: (i) self-reported motivations for business ownership; (ii) the productive heterogeneity of MSMEs; and (iii) employment growth in MSMEs. Table 6 reports the distribution of employment and the number of MSMEs by rural, other urban and Dar es Salaam, compared with the distribution of population in the three locations. While more than 67 per cent of the population lives in rural areas, rural MSMEs account for 52 per cent of total MSME employment. In urban areas, the relationship between the distribution of MSME employment and firms and the distribution of population is similar in Dar es Salaam and other urban areas. In Dar es Salaam, where 12.2 per cent of the national population resides, we find 15.8 per cent of MSME employment and 17.3 per cent of MSME firms. Likewise, 32.6 per cent of MSME employment and 30.7 per cent of MSME firms are in other urban areas, which contain 20.4 per cent of the population (Table 6). Table 7 reports the industrial distribution of MSMEs by rural, other urban and Dar es Salaam. Although the MSMEs operate in a wide range of activities, the bulk of these activities can be classified as trade services (80%) and manufacturing (15%). However, more rural firms (19.8%) engage in manufacturing than urban firms (10.1% in other urban and 7.2% in Dar es Salaam). Seventy-two per cent of manufacturing MSMEs are in rural areas while 52 per cent of trade service MSMEs are in rural areas. This is an expected pattern, as small manufacturing firms operate mainly in food processing, which has strong links to agriculture. Without further information, however, it is not possible to identify exactly what these linkages are and how they work. This is an important area for future research.
More firms are in trade services in Dar es Salaam (87.6%) than in other urban areas (83.0%), which is clearly driven by demand for tradable goods.

Self-reported motivations of small business owners
The MSME survey includes three questions designed to elicit the reasons for opening a business. Responses to such self-reported motivations for a business could help us assess the extent to which rural entrepreneurs are in business solely for the purposes of survival or aiming to grow. The responses to these questions are tabulated using sample weights in Tables 8-10. The first question is: 'What was your main occupation before you started this business?' As shown in Table 8, the biggest difference between rural and urban entrepreneurs is that 56.5 per cent of rural entrepreneurs report that their main occupation prior to starting the business was farming compared to 19.3 per cent in urban areas outside Dar es Salaam. Very few respondents (4.8%) in rural areas report that they were unemployed prior to starting the business; this is not true in urban areas where 11.3 per cent and 9.7 per cent of MSME owners in other urban and Dar es Salaam report that they were unemployed before starting their business. Unlike in rural areas, urban business owners are much more likely to report that they were previously employed in a private company or running a similar sized business in another line of business. It is also much more common for urban business owners to report that they were previously a housewife or homemaker (26.6% in other urban and 34.1% in Dar es Salaam) than for rural respondents (12.3%).
The second question is: 'For what reasons did you choose your line of business?' In Table 9, the firms responding to this question are grouped into three broad sectors: manufacturing, trade services and other services, by rural, other urban and Dar es Salaam. In rural areas, half of all business owners say that they chose their line of business because they saw a market opportunity. This response is similar for firms in manufacturing and trade services. However, this response is less common in Dar es Salaam and other urban areas. The second most common reason for operating in a line of business in rural areas is that the owners' capital could only finance that line of business; this response is more common in urban than rural areas, possibly indicating that capital constraints are more severe in urban areas. The third most common reason for choosing a line of business in rural areas is prior experience in that line of business, although shares for this reason are much lower than for the two previous reasons.
The third question is: 'If you were offered a full-time salary paying job, would you take it?' Responses to this question are reported in Table 10 and indicate that only 46.6 per cent of all small business owners would leave their current business for a full time salaried position, but the share is higher in rural areas (47.8%) and other urban areas (48.6%) than in Dar es Salaam (37.7%). Approximately 64 per cent of all respondents who would prefer a full time salaried job say they would like to work for the government, with 68.6 per cent and 62.9 per cent in rural areas and other urban areas respectively but only 44.6 per cent in Dar es Salaam, where more government jobs are concentrated. The responses from rural and other urban MSME owners are consistent with results reported in Banerjee and Duflo's analysis of the economic lives of the poor (Banerjee & Duflo, 2007). Large private companies are more attractive to small business owners in Dar es Salaam than in other places. The predominant reason for preferring a full time salaried position is better security of income.

The productive heterogeneity of rural enterprises
The kernel densities of the log of value added per worker, which is defined as firms' labour productivity, are used to examine the productive heterogeneity of MSMEs. Value added is computed as the firm's average monthly sales minus the firm's average monthly costs of production, and seasonality is taken into consideration in the calculation. Only full-time employees (including owners of the firms) are considered in calculating value added per worker or labour productivity for individual firms. The kernel densities of labour productivity reveal two important features of the MSME firms. First, there is a significant degree of productive heterogeneity among rural, other urban and Dar es Salaam enterprises. This can be seen by examining the density of the log of value added per worker in Figure 1. Surprisingly, the distribution of the log of value added per worker or labour productivity for rural firms is almost identical to the distribution for urban firms. In fact, a stochastic dominance test rejects the hypothesis that the rural and urban distributions are not identical. One reason for this may be the fact that medium sized enterprises that are mainly in urban areas appear to be under-sampled in the MSME survey discussed in Section 2.
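The measure described above can be illustrated with a minimal sketch. It assumes a simple Gaussian kernel with a fixed bandwidth, ignores the seasonality adjustment and sample weights, and uses made-up firm records; the function names, the bandwidth of 0.5 and the example figures are all hypothetical and are not taken from the MSME survey.

```python
import math

def log_value_added_per_worker(monthly_sales, monthly_costs, full_time_workers):
    """Log of monthly value added per full-time worker (owner included)."""
    value_added = monthly_sales - monthly_costs
    if value_added <= 0 or full_time_workers <= 0:
        return None  # log undefined; such firms are dropped in this sketch
    return math.log(value_added / full_time_workers)

def gaussian_kde(sample, bandwidth):
    """Return a function that estimates the density of `sample` at a point x."""
    n = len(sample)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample)

    return density

# Hypothetical firms: (average monthly sales, average monthly costs, full-time workers)
firms = [(900.0, 500.0, 1), (2400.0, 1500.0, 2), (300.0, 260.0, 1), (5000.0, 2000.0, 3)]
log_lp = [v for v in (log_value_added_per_worker(*f) for f in firms) if v is not None]

# Assumed bandwidth; in practice it would be chosen by a data-driven rule.
density = gaussian_kde(log_lp, bandwidth=0.5)
```

Evaluating `density` on a grid of log productivity values for each location would give curves of the kind plotted in Figure 1.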
In Figure 1 the vertical lines represent average labour productivity in Tanzania's economy in 2010 in the agricultural sector (the far-left line), the trade services sector (the middle line) and the manufacturing sector (the far-right line). Economy-wide labour productivity is calculated using national accounts data and census data; since 1997 national accounts data make every attempt to include the informal sector (GGDC, Africa Sector Database, 2015). However, in practice it is difficult to accurately measure informal sector activity and so it is likely that economy-wide estimates of labour productivity are biased toward the formal sector. Figure 1 reveals that a little over half of the firms in the MSME sector have labour productivity levels higher than the average labour productivity in the agricultural sector and this is true in all three locations. This is not surprising and is consistent with evidence presented by Diao, Kweka, et al. (2017), who show that labour productivity for many MSMEs is consistently higher than average labour productivity in the agricultural sector. It is also true that around 25 per cent of rural and urban MSMEs have labour productivity higher than average labour productivity in the services sector, the sector to which most MSMEs belong. In fact, as shown in Diao, Kweka, et al. (2017), this group of 25 per cent of small firms accounts for 77 per cent of total value added produced by the whole MSME sector. In other words, the remaining 75 per cent of MSMEs account for less than 25 per cent of the value added generated by the MSME sector. These results underscore the productive heterogeneity of MSMEs in both rural and urban areas. They also raise the possibility of a growth strategy focused on these most productive firms. This is not to say that the remaining firms should not be part of a strategy for alleviating poverty; perhaps they should.
Our point is that the productive heterogeneity most likely calls for different strategies for different types of firms.
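The concentration of value added among the most productive firms can be computed along the following lines. This is a minimal, unweighted sketch with hypothetical firm data and a hypothetical benchmark value; the actual calculations would use the survey's sample weights and the economy-wide sectoral averages.

```python
def share_above_benchmark(firms, benchmark):
    """Share of firms whose labour productivity exceeds `benchmark`,
    and the share of total value added those firms produce.

    firms: list of (monthly value added, full-time workers) pairs.
    """
    above = [(va, w) for va, w in firms if va / w > benchmark]
    firm_share = len(above) / len(firms)
    value_added_share = sum(va for va, _ in above) / sum(va for va, _ in firms)
    return firm_share, value_added_share

# Hypothetical firms and benchmark (e.g. average productivity in trade services)
firms = [(400.0, 1), (900.0, 2), (40.0, 1), (3000.0, 3)]
firm_share, value_added_share = share_above_benchmark(firms, benchmark=500.0)
```

In this toy example one firm in four clears the benchmark but accounts for most of the value added, which is the same qualitative pattern as the one reported for the MSME sector.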

Using the MSME survey to identify 'high potential' rural enterprises
If we accept that some rural MSMEs have more potential to contribute to rural transformation in Tanzania than others, we are left with the question of how to identify those with potential. This is a complicated problem not least because a properly designed mechanism should be immune from manipulation. We do not pretend to solve it. Instead, what follows is meant to be illustrative of the way in which we are thinking about the problem. We use a productivity cutoff to distinguish in-between MSMEs from the rest of the MSMEs and then we look for readily observable characteristics of these highly productive MSMEs that might be used for targeting. In practice, it would be important to use readily observable characteristics that cannot be manipulated or are too costly to be manipulated by firms.
For the purposes of this exercise, we define the in-between firms as those with labour productivity greater than economy-wide labour productivity in trade services. Using this criterion, we identify 1334 rural firms in the MSME sample that can be classified as belonging to the in-between sector. Having selected our in-between firms, we use the GMM probit analysis previously described to identify characteristics of in-between firms. We use a host of business and owner characteristics that have been used and tested in the literature.
Prior to discussing the results of the GMM probit regressions, we must first identify the endogenous variables and present the instruments used for the reduced form probit estimations. Three endogenous variables were identified: the owner views the business as growing, the firm has regional customers, and the number of daily customers is more than 20. The first two variables were instrumented using the following variables: whether the business was located at home, whether the firm advertised, whether the firm had a business plan, and whether the firm regularly sends or receives money. The model was run using a two-stage GMM with robust standard errors. Results of the first-stage estimations are available upon request. Table 11 presents the GMM probit results of the increases in predicted probability (the average marginal effect) of being an in-between sector firm in rural and urban areas. Results for the owners' personal characteristics suggest that a female-headed business is associated with a decrease in the probability of being in-between in both rural and urban areas; probabilities decrease by 6.4 and 9.5 per cent, respectively. Owners who perceive their businesses to be growing have a higher predicted probability of being in-between, by 4.8 per cent nationally and 6.4 per cent in rural areas. These results are intuitive; business owners who are optimistic about their firm's future and potential are more likely to be driven to achieve success and to use resources productively. Being a member of a business association increases the probability of being an in-between firm nationally and in urban areas, but not in rural areas, possibly due to the low participation rate of rural firms in such associations. Education was not found to be significant anywhere, likely due to lack of variation in education among business owners.
The second panel of Table 11 shows varying levels of significance between locations, which is expected. A one-year increase in a firm's age has a small effect on the probability of being an in-between firm in the country as a whole and in rural areas, but not in urban areas. A one-unit increase in the number of employees reduces the probability of being an in-between firm by around 5 per cent consistently in both rural and urban areas. On the other hand, operating full time is associated with an increase in the probability of being in-between by 6.2 per cent nationally and 6.7 per cent in rural areas, but is not significant in urban areas, possibly for two reasons. First, the survey is designed to capture small businesses, of which many in urban areas may be part-time. Second, unless the businesses are large enough, running a full-time small business in an urban area probably carries a higher opportunity cost if it means not finding a job. Keeping written accounts is significant, with an increase in the predicted probability of being in-between of around 6.5 per cent in all locations. There are increases in the predicted probability of being in-between for firms which have licenses, with a larger marginal effect in urban areas (6%) than in rural areas (3.4%). The increase in the predicted probability of being in the in-between sector from having regional customers is significant only in rural areas. However, firms that have more than 20 daily customers are between 6.2 per cent (urban areas) and 7.4 per cent (rural areas) more likely to be in the in-between sector.
The variables associated with the external conditions of doing business, infrastructure and technology, are presented in the third panel of Table 11. Using a mobile phone increases the predicted probability of being in-between by 4.3 per cent in rural areas and 6.2 per cent in urban areas. Using electricity to light the business is important in rural areas, increasing the predicted probability by 5.6 per cent, but not in urban areas, possibly again due to the lack of variability in electricity access in urban areas. We also include three financing variables in the regression (the last panel of Table 11), and all three variables are related to the ways business owners allocated their profits. It turns out that all three variables are insignificant, possibly due to lack of variability in these variables or because they fail to accurately capture firms' investment behaviour.
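To make the notion of an average marginal effect concrete, the following sketch computes it for a binary regressor in a plain probit model. It is an illustration only: it does not reproduce the two-stage GMM estimation with instruments used for Table 11, and the coefficients, intercept, variable names and firm records are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def average_marginal_effect(rows, coefs, intercept, var):
    """Average change in the predicted probability when the binary regressor
    `var` is switched from 0 to 1, holding each firm's other regressors at
    their observed values."""
    total = 0.0
    for row in rows:
        base = intercept + sum(coefs[k] * row[k] for k in coefs if k != var)
        total += normal_cdf(base + coefs[var]) - normal_cdf(base)
    return total / len(rows)

# Hypothetical probit coefficients and firm records (not estimates from the paper)
coefs = {"full_time": 0.30, "written_accounts": 0.25, "firm_age": 0.01}
intercept = -1.0
rows = [
    {"full_time": 1, "written_accounts": 0, "firm_age": 3},
    {"full_time": 0, "written_accounts": 1, "firm_age": 10},
    {"full_time": 1, "written_accounts": 1, "firm_age": 6},
]
ame_full_time = average_marginal_effect(rows, coefs, intercept, "full_time")
```

A value of, say, 0.06 for `ame_full_time` would be read in the same way as the figures in the text: operating full time is associated with a predicted probability of being in-between that is 6 percentage points higher on average.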

Summary and policy implications
Although Tanzania remains heavily rural, the composition of economic activity in rural areas has changed significantly over the past decade and a half. Between 2002 and 2012, the share of the rural labour force working in nonfarm employment tripled, going from 6.8 per cent to 20.5 per cent. Moreover, in 2011/2012, more than one-third of rural households participated in the rural nonfarm economy and 11.2 per cent of rural households reported that working members of the household had primary employment only in the nonfarm economy. The heads of 'nonfarm only' rural households tend to be younger and more educated, while the heads' gender does not appear to influence the likelihood of being a nonfarm household. Education of the household head is also a determinant of the likelihood that a household participates in the nonfarm sector; a primary education increases the probability of engaging in nonfarm activities by 5.8 per cent and a secondary education increases the likelihood of engaging in the nonfarm sector by 16.9 per cent. Among a set of selected community level variables, households in communities with access to daily public transportation or a weekly market are more likely to participate in rural nonfarm activities. Consistent with these results, we find that rural households with nonfarm activities are less likely to be poor. However, it is still true that around 15 per cent of rural households whose primary source of income is the nonfarm economy have incomes that place them below the poverty line. The implication is that some nonfarm activities must be very unproductive.
By extension, although these activities help families to survive, it would be unrealistic to expect them to contribute significantly to rural transformation.
To explore the nature of the nonfarm businesses owned by rural households in Tanzania, we use Tanzania's first nationally representative survey of micro, small and medium sized enterprises. Roughly 20 per cent of these businesses operate in the manufacturing sector, more than double the share in urban areas; the rest of the businesses operate in the services sector. Labour productivity among these businesses is extremely heterogeneous, with roughly half having labour productivity lower than average labour productivity in agriculture. Using a probit specification, we show that operating full time, keeping written accounts and using electricity to run the business are all positively correlated with labour productivity.
We conclude that policies designed to stimulate rural transformation must take into account the heterogeneity of the rural nonfarm sector. Unless this heterogeneity is understood, policies designed to stimulate rural transformation are likely to disappoint. Of course, rural nonfarm activities help to generate income and reduce the risks associated with agricultural production for many rural households. These activities should be supported as part of a poverty reduction strategy. But we should not expect the large majority of these activities to transform rural livelihoods. For this to happen, it will be important to target the firms with the potential for employment and labour productivity growth.