Size distributions of slums across the globe using different data and classification methods

ABSTRACT More than 900 million people worldwide live in slums. These slums are found mainly in cities of the Global South and are characterized by poor living conditions and usually insufficient access to basic infrastructure such as water or energy. To improve the living conditions of slum inhabitants, information about the number, location and size of the slums is required for planning supply infrastructure. We therefore identify morphological slums in eight cities in Africa, South America and Asia using remote sensing data and analyse their size distributions. We show that 84.6% of all observed morphological slums have a size between 0.001 and 0.1 km². These results rely on a consistent approach using a clear ontology and conceptual frame for classification. However, classification methods for these underserved areas differ. We show that slum classifications based on different methods reveal a strong dependency between the particular method and the resulting size distribution. The study shows the relevance of remote sensing for the investigation of slums, and the results can be used for infrastructure planning: infrastructure improvement projects are often limited to the large, well-known slums, whereas the large number of small slums distributed across the city is often neglected.


Introduction
More than half of the world's population lives in cities (United Nations, 2016). According to several studies, the urban population is expected to continue to grow rapidly over the next few years, especially in Asia and Africa (Kraas & Schlacke, 2016; United Nations, 2016). Connected with the strong urban growth in these regions of the world is the emergence of slums or informal settlements. It is estimated that about 25-50% of the world's urban population lives in slums and that the absolute number of slum dwellers will increase to up to 2 billion in the coming years (Kraas & Schlacke, 2016).
The United Nations defines slums as areas with high population density, poor infrastructure and bad living conditions (Habitat, United Nations, 2016). According to several studies, these living conditions have a significant negative impact on both the physical (Ezeh et al., 2017) and mental health (Subbaraman et al., 2014) of their inhabitants. It is therefore important to improve the water supply and sanitation infrastructure in slums (Lilford et al., 2017; Van der Bruggen, Borghgraef, & Vinckier, 2010). In its global sustainable development goals, the United Nations is therefore striving to improve the supply infrastructure in these settlements (United Nations, 2015).
In order to plan or provide holistic supply strategies in terms of water, energy, sanitation and others for these settlements (Friesen, Rausch, & Pelz, 2017; Rausch, Friesen, Altherr, Meck, & Pelz, 2018), it is necessary to obtain information on the number, location, size and size distribution of the respective slums.
In the literature, there are numerous studies on the description and analysis of slums, their morphology, size and structure. Detailed summaries and systematic reviews can be found in Mahabir, Croitoru, Crooks, Agouris, & Stefanidis (2018) and Mahabir, Crooks, Croitoru, & Agouris (2016). In recent years, the analysis of slums using remote sensing data has increased, mainly because high-resolution data have become available. A detailed review of recent studies on slum mapping using remote sensing data is given by Kuffer, Pfeffer, & Sliuzas (2016). Taubenböck and Kraff (2014) show empirically that morphological slums can be distinguished from formal structures in cities. In a further study, Wurm and Taubenböck (2018) show that a physical approach based on morphological slum structures makes it possible to locate the social group of the urban poor to a certain degree. Taubenböck, Kraff, & Wurm (2018) also analyse the diversity in morphology of these settlements all over the world and call them arrival cities, since these settlements are often the place where immigrants arrive.
In addition to these works, however, the question of similarities between the slum systems in different parts of the world arises. A previous study by the authors showed that slums in different metropolitan regions of the world have a similar size (Friesen, Taubenböck, Wurm, & Pelz, 2018). That study investigated morphological slums in four cities around the world using remote-sensing data. Following Taubenböck and Kraff (2014), the morphological slums were distinguished from other settlement areas by their building structure and morphology. The study shows that the geometric mean of slum sizes varies between 0.0085 km² (Manila) and 0.0198 km² (Rio de Janeiro) and that the average size over all considered cities is 0.016 km².
Since the database of the previous study is small (four cities), we extend it by four additional cities. Furthermore, the size distributions are examined in more detail in order to identify similarities and differences between the cities. In addition, we investigate in three cases the influence of different classification methods on the resulting size distributions.
The structure of this paper is as follows: First, we briefly introduce the different datasets used for the investigation and the metrics used for analysis and comparison. In the second step, we present the results. Finally, we discuss and summarize the results and give a short outlook for further research.

Materials and methods
The general workflow of our approach relies on the study by Friesen et al. (2018). The main steps are as follows: we use optical high-resolution remote sensing data (i), locate slum areas within the city (ii), calculate the sizes of the slums (iii) and analyse their size distributions (iv). The workflow is shown in Figure 1.

Datasets
The data used in this paper come from different sources. We investigate slums in the metropolitan regions of Cairo (Egypt), Cape Town (South Africa), Manila (Philippines), Mumbai (India), Dhaka (Bangladesh), Caracas (Venezuela), Rio de Janeiro and Sao Paulo (Brazil). The cities listed are all located in the Global South and have different political, cultural, economic and topographic backgrounds. They also all have very large populations of several million inhabitants; Dhaka, for example, has 14.4 million (2015).
For all cities listed, we used one consistent classification method, based on physical parameters of the observed settlements, using very high-resolution optical satellite data. The classification relies on the ontology introduced by Kohli, Sliuzas, Kerle, and Stein (2012) and the empirical classification of morphological slums shown by Taubenböck et al. (2018). For the mapping, the high-resolution basemap images included in Esri ArcMap were interpreted visually. Settlements were characterized as slums based on physical parameters such as an organic pattern, high densities or small building sizes. As a comparable spatial entity, the mapping relied on the spatial level of the city block and focused on deriving homogeneous patches of similar building structure. Adjacent homogeneous patches form the spatial entity of a slum, which is usually circumscribed by significant street networks or natural boundaries. To separate individual slum patches, a minimum distance of 10 m was defined. This framework is applied to all of the cities mentioned.
However, we also investigate the influence of different classification methods on the resulting size distributions. For Cairo, in addition to our consistent approach of identifying morphological slums, we investigate a different classification based on the information from Sims, Sejourne, & El Shorbagi (2003). They divide the informal settlements in Cairo into slum types with different properties (A, B, C).
The typology of slum type A "is defined as private residential building constructed on agricultural land purchased from farmers in areas where there were no subdivision plans and where building permits were not given". The second typology B "is defined as private residential buildings constructed on vacant state land by citizens' under the process of 'hand claim'". The third typology C represents "neighbourhoods with a high percentage of old, crowded, and deteriorated structures within the medieval urban fabric".
The division in these different types is necessary, since the morphology of the slums in Cairo usually deviates very strongly from the other areas in other cities known to us. While on the one hand, slums of type A actually have the structure of formal development and are only classified as slums, because they were built illegally, slums of type C on the other hand consist of simply dilapidated, historic buildings in the city centre (Kraas & Schlacke, 2016;Sims et al., 2003).
In this paper, our analysis will primarily focus on type B "Informal Areas on Former Desert State Land", as this type corresponds best with the morphologic classification mentioned above.
We also investigate a dataset of Sao Paulo with data collected by the Brazilian Institute of Geography and Statistics (IBGE) in the 2010 census. The Brazilian census classifies slums (favelas) as subnormal agglomerations (Aglomerados subnormais) when they have at least 51 shacks or houses and fulfil other criteria such as "non-standard urbanization reflected by narrow, irregularly aligned roads, unevenly sized plots and shapes, and unregulated constructions by public agencies". Details of the data collection are summarized by Demográfico (2010).
Finally, we compare our dataset of Dhaka with the slum classification of Gruebner et al. (2014), who mapped the slums of Dhaka for the years 2006 and 2010 by visual interpretation of Quickbird satellite imagery. Detailed information about the data collection can be found in Gruebner et al. (2014).
The only pre-processing step performed on all datasets is the merging of slums with touching boundaries. All datasets used are summarized in Table 1.
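The merging step can be illustrated in code. The following is a minimal sketch, assuming that each slum polygon has already been reduced to an identifier and an area and that the pairs of polygons with touching boundaries are known from a GIS query. The function name `merge_touching`, the input layout and the example values are hypothetical and are not taken from the study. Touching polygons are grouped with a union-find structure, and the areas within each group are summed.

```python
def merge_touching(areas, touching_pairs):
    """Merge polygons that touch and return the sorted areas of the merged units.

    areas: dict mapping polygon id -> area in km^2 (hypothetical layout)
    touching_pairs: iterable of (id_a, id_b) pairs whose boundaries touch
    """
    parent = {pid: pid for pid in areas}

    def find(pid):
        # Follow parent links to the group representative, shortening the path.
        while parent[pid] != pid:
            parent[pid] = parent[parent[pid]]
            pid = parent[pid]
        return pid

    for a, b in touching_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Polygons that only touch do not overlap, so merged areas simply add up.
    merged = {}
    for pid, area in areas.items():
        root = find(pid)
        merged[root] = merged.get(root, 0.0) + area
    return sorted(merged.values())


# Made-up example: p1, p2 and p4 form one chain of touching polygons.
example_areas = {"p1": 0.004, "p2": 0.006, "p3": 0.020, "p4": 0.001}
example_pairs = [("p1", "p2"), ("p2", "p4")]
merged_areas = merge_touching(example_areas, example_pairs)
```

In this example the four input polygons collapse into two slum units, one of which is the sum of three touching patches.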

Methods and metrics for comparison of slum sizes
In order to compare the slums in the aforementioned cities, we use several methods and metrics. Since we want to identify the dominant size of the different urban systems, we calculate the geometric mean

$$S_0 = \exp\left(\frac{1}{N}\sum_{i=1}^{N} \ln S_i\right) \qquad (1)$$

with the ground area $S_i$ of slum $i$ and the number of slums $N$ within a city. Connected with this value is the logarithmic standard deviation $\sigma$ (2), which determines how much the different slum sizes within a city deviate from the geometric mean $S_0$ calculated above. Besides the area, we calculate the typical length $l_0 = \sqrt{S_0}$ of slum systems as a metric for the planning of infrastructures.
Finally, we analyse the size distribution of the different cities in particular. To this end, we divide the whole distribution into five size sectors I-V. Size sector I covers all $N_I$ slums $S_{I,k}$ with an area 0.0001 km² < $S_I$ < 0.001 km², size sector II all $N_{II}$ slums with an area 0.001 km² < $S_{II}$ < 0.01 km², and so on. The different sectors are shown in Figure 2.
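The three per-city metrics can be computed from a list of slum areas as sketched below. This is a minimal sketch under stated assumptions: the logarithmic standard deviation is taken as the standard deviation of the natural logarithms of the areas with division by N, and the typical length as the square root of the geometric mean area, which is consistent with the values reported in the results. The function name `size_metrics` and the example areas are hypothetical.

```python
import math


def size_metrics(areas_km2):
    """Return (geometric mean in km^2, logarithmic std, typical length in m)."""
    logs = [math.log(s) for s in areas_km2]
    n = len(logs)
    mean_log = sum(logs) / n
    s0 = math.exp(mean_log)  # geometric mean of the areas
    # Assumption: natural logarithm and population form (division by N).
    sigma = math.sqrt(sum((x - mean_log) ** 2 for x in logs) / n)
    # Assumption: typical length = sqrt(S_0); sqrt(km^2) is km, converted to m.
    l0_m = math.sqrt(s0) * 1000.0
    return s0, sigma, l0_m


# Made-up areas spanning two orders of magnitude.
s0, sigma, l0_m = size_metrics([0.001, 0.01, 0.1])
```

For these three made-up areas the geometric mean is 0.01 km² and the typical length is 100 m, whereas the arithmetic mean (0.037 km²) would be dominated by the largest slum.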
We work on a logarithmic scale because the sizes of the morphological slums span several orders of magnitude. In Caracas, for example, the smallest detected morphological slum has a ground area of just 0.001 km², while the largest has an area of 3 km².
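Because each size sector spans exactly one decade, a slum can be assigned to its sector from the base-10 logarithm of its area. The following is a minimal sketch; the function name `size_sector` is hypothetical, and since the sector definitions use strict inequalities, assigning an area that lies exactly on a boundary to the upper sector is an assumption of this sketch.

```python
import math

SECTOR_LABELS = ["I", "II", "III", "IV", "V"]


def size_sector(area_km2):
    """Return the size sector (I-V) for an area in km^2, or None if out of range."""
    # Sector I starts at 1e-4 km^2, so the decade index is shifted by 4.
    index = math.floor(math.log10(area_km2)) + 4
    if 0 <= index < len(SECTOR_LABELS):
        return SECTOR_LABELS[index]
    return None


# One made-up area per sector.
sectors = [size_sector(a) for a in (0.0005, 0.003, 0.05, 0.5, 3.0)]
```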
For each of the five sectors mentioned above, we calculate the relative area of slums in relation to the total area $A_{\mathrm{Slum,Total}} = \sum_{i=1}^{N} S_i$ of slums within a city,

$$a_I = \frac{1}{A_{\mathrm{Slum,Total}}} \sum_{k=1}^{N_I} S_{I,k}.$$

For explanatory purposes, we show the definitions for sector I only, but they apply to all other sectors as well. We also calculate the relative number of slums, $N_I / N$, with the number $N_I$ of slums in sector I and $N$ the number of slums within a city. We do so to identify the dominant magnitude in slum sizes. A last metric is the size relation $c$ between larger and smaller slums: the total area of all slums larger than 0.1 km² (sectors IV and V) is divided by the total area of all slums smaller than 0.1 km² (sectors I-III). In this context, we define the threshold at 0.1 km². This value can of course also be shifted, which would lead to a change in the results.
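The sector shares and the size relation can be sketched as follows. This is an illustrative sketch only: the function names `sector_shares` and `size_relation`, the half-open treatment of sector boundaries and the example areas are assumptions and do not come from the study.

```python
def sector_shares(areas_km2, lower_km2, upper_km2):
    """Return (relative area, relative number) of slums with lower <= area < upper."""
    total_area = sum(areas_km2)
    in_sector = [s for s in areas_km2 if lower_km2 <= s < upper_km2]
    area_share = sum(in_sector) / total_area
    number_share = len(in_sector) / len(areas_km2)
    return area_share, number_share


def size_relation(areas_km2, threshold_km2=0.1):
    """Return the ratio of total area in larger slums to total area in smaller slums."""
    large = sum(s for s in areas_km2 if s >= threshold_km2)
    small = sum(s for s in areas_km2 if s < threshold_km2)
    # Raises ZeroDivisionError if no slum lies below the threshold.
    return large / small


areas = [0.002, 0.008, 0.04, 0.05, 0.2, 0.3]  # made-up values in km^2
a_iii, n_iii = sector_shares(areas, 0.01, 0.1)  # sector III
c = size_relation(areas)
```

In this made-up example, sector III holds a third of the slums but only 15% of the area, and the size relation of 5 indicates that most of the area lies in the two large slums.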
Finally, we use images from Google Earth and the connected timeline for demonstration purposes to compare the classifications.

Results
After describing the size distributions of morphological slums in the different cities in general, we compare the size distributions of different classification methods for the cities of Dhaka, Sao Paulo and Cairo in particular.

Size distribution of cities
In Figure 3, the geometric mean of the different cities is plotted over the number of classified slums within the respective city. The number of observed morphological slums per city ranges from 41 for Cairo to 2125 for Dhaka. The size of the slum area in a city does not seem to correlate with the number of slums: the slum areas of Sao Paulo and Caracas are similar, while the number of slums detected differs greatly. Likewise, Dhaka and Cape Town both have relatively small slum areas, while the number of slums again differs greatly.
The geometric mean ranges between 0.0572 km² for Cairo and 0.0021 km² for Dhaka. Connected with the geometric means are the typical lengths of the respective slum systems. They range from 239.2 m for Cairo to 45.8 m for Dhaka and thus differ by almost one order of magnitude. Table 2 shows the results for the size distributions of slums in the investigated cities.
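The reported typical lengths are consistent with taking the square root of the geometric mean area. The relation used here is inferred from the reported values rather than quoted from a printed formula, so the following check should be read as a sketch:

```latex
\begin{align*}
l_{0,\mathrm{Cairo}} &= \sqrt{0.0572\ \mathrm{km}^2} \approx 0.2392\ \mathrm{km} = 239.2\ \mathrm{m},\\
l_{0,\mathrm{Dhaka}} &= \sqrt{0.0021\ \mathrm{km}^2} \approx 0.0458\ \mathrm{km} = 45.8\ \mathrm{m},\\
\frac{l_{0,\mathrm{Cairo}}}{l_{0,\mathrm{Dhaka}}} &\approx 5.2 .
\end{align*}
```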
An analysis of the different size sectors shows the following picture: in Sao Paulo, Rio de Janeiro, Mumbai and Manila more than 90%, and in Cape Town more than 80%, of all slums have a size between 0.001 and 0.1 km² (sectors II and III), and the size distributions of these cities are similar (Figure 4). By contrast, the size distributions in Cairo, Caracas and Dhaka are different.
In Dhaka, the share of small slum units is much higher than in the other cities. Nearly 90% of all detected morphological slums are smaller than 0.01 km² (sectors I and II). By contrast, in Cairo a large proportion (41.5%) of the morphological slums are larger than 0.1 km² (sectors IV and V), which corresponds to an area share of more than 95%.
Caracas is a special case. Although the number of slums is relatively evenly distributed over the different orders of magnitude, the majority of the area (62.8%) is concentrated in the large slums with an area larger than 1 km².
Analysing the area share of slums, the majority of the slum area lies in size sectors III and IV. It is also interesting that the relative number of slums in a particular size sector is similar within a continent. In all three South American cities surveyed, size sector III has the largest share of slums. In Asian cities, this trend shifts towards smaller slum sizes. This is different in the African cities included in this study. While the relative number of detected morphological slums decreases towards larger sectors, the share of these slums in the total slum area within a city increases.
Besides these findings, the geometric means of the slum sizes in the different cities are similar. On average, the geometric mean for the $n = 8$ considered cities is $S = 0.0231$ km². The standard deviation for all cities is $\sigma = 0.0181$ km². The high deviation is mainly caused by Cairo and Caracas, whose means are much higher than those of the other cities. This is also shown in Figure 3. Considering that Cairo is a special case, as mentioned above, and leaving it out of the averaging, the geometric mean for $n = 7$ cities (without Cairo) is $S = 0.0182$ km² with a standard deviation of $\sigma = 0.0136$ km².
In addition, the width of the distributions is very similar in most cases. The sizes of the morphological slums in the different cities usually vary by about an order of magnitude around the geometric mean, which becomes very clear in the values of $\sigma$, which are around 1.3 (cf. Table 2). The arithmetic mean of the standard deviations $\sigma_j$ of the different cities is $\sigma_0 = 1.5$. Outliers in this case are again Caracas ($\sigma = 2.29$) and Cairo ($\sigma = 2.07$), because the size distributions of these cities are much broader and extend over several orders of magnitude. These findings are also visible in the size distributions shown in Figure 4.
Looking at all identified morphological slums (Table 3), the following picture emerges: 7382 morphological slums were identified in the eight cities.
The values in Table 4 show the ratio $c$ of larger (>0.1 km², sectors IV-V) to smaller areas (<0.1 km², sectors I-III). Caracas and Cairo show values of $c \gg 1$, meaning that the area share of larger slums is much higher than that of smaller slums in these cities. In Cape Town, Mumbai, Rio de Janeiro and Manila, the size relation is $c \approx 1$, meaning that the slum area is distributed evenly over these two categories. In Dhaka and Sao Paulo, most of the slum area is in smaller slum units. Considering all slums of all cities mentioned, we see that $c \approx 1$ and the slum areas are distributed approximately equally between larger and smaller slum units.

Comparison of data collection methods
Besides comparing the different cities using the same classification method, we also compare different classification methods for three cities. First, we compare our remote-sensing-based classification of Dhaka with the one from Gruebner et al. (2014), which is also based on remote-sensing data. Second, we compare our classification of Sao Paulo with the one from the IBGE, which relies on census data (and not remote-sensing data). Third, we investigate the influence of different classifications of slums on the size distributions in Cairo.

Dhaka, Bangladesh
A morphological approach towards the classification of slum locations seems to be obvious. However, even when similar ontologies are applied, classification results may differ owing to the ambiguity of structural transitions and the knowledge of the interpreter, among other factors. To account for this, we compare the size distributions of different very high-resolution remote-sensing-based classifications of slums in Dhaka, Bangladesh, which were carried out by different groups but are based on a similar ontology. The histograms in Figure 8  Besides this, the size distribution resulting from our classification shows a geometric mean of $S_0 = 0.0021$ km². The resulting standard deviation for the year 2015 is $\sigma_{\mathrm{Dhaka2015}} = 1.31$ and thus a little higher. The results are summarized in Table 5.
The size distribution of morphological slums resulting from our classification (Figure 5, right) is much more regular than that of Gruebner et al. (2014) (Figure 5, left). Both classifications have in common that they observe a high number of small slums in the area range of 0.001-0.01 km² (sector II).
A reason for this can be seen in Figure 6, where an island in the south west of Dhaka is shown at two times with the two classifications drawn in. Although the underlying morphology did not change very much between the two years, there are big differences in the classification of the slums. While in Gruebner's classification (yellow), larger areas in the west of the peninsula were classified as slums, in our case (red) these are only smaller slum units. The question of which areas are classified as morphological slums and which are not is difficult to answer, since the transition between formal and informal settlements is smooth, so that areas are classified differently by different interpreters. This difference in classification can also be observed in other areas of the city of Dhaka. Discussions about these kinds of uncertainties in the classification of slums can be found in detail in the literature (Kohli, Stein, & Sliuzas, 2016; Pratomo, Kuffer, Martinez, & Kohli, 2017).
Table 5. Results for slums in Dhaka. The bold values indicate the largest share of slums or slum area in the respective sector.

Sao Paulo, Brazil
In Figure 7, the size distributions of the two classification methods for the city of Sao Paulo are shown. The number of classified slums is 1286 for the census classification and 1937 for our classification. The slums classified by the census (left) are in general larger than the morphological slums that we classified based on remote-sensing data (right). The values shown in Table 6 confirm this observation.
Furthermore, the geometric mean of the slum sizes in Sao Paulo is higher in the classification of the census than in our classification.
The main reason for the larger values in the census classification is that the census often classifies areas as slums that have no morphological characteristics of slums. An example from the north of Sao Paulo is shown in Figure 8. The displayed section is shown at the two different points in time and both classifications are drawn in. While our classification (red) is clearly oriented towards morphological features, the census classification (yellow) also covers the area east of the slum structure as a slum, which at both times is mainly formally built up with regular patterns and different building types.
This observation can also be made at many other points in Sao Paulo, as can be seen in Figure 9. In the marked areas, the classification by the census identified large areas as slums, although there is no morphological evidence to support this.

Cairo, Egypt
As already indicated above, Cairo is a special case in the study presented here. Our classification of slums leads to 41 morphological slums. This classification relates to type B in the work of Sims et al. (2003). Most of these units are very large, as shown in Table 2, which in turn leads to a high geometric mean of $S_0 = 0.0572$ km². This is particularly evident in the share of the slum area that lies in slums in the size range of 1-10 km². With a value of $a_V = 65.4\%$, more than half of the total area of the morphological slums is in large units.
The situation is different when a different classification is used. When analysing the size distribution of all classified slums (Figure 10, right), combining the different types of slums (A, B and C) suggested by Sims et al. (2003), the size distribution is more regular and shows a geometric mean of 0.0155 km². This is not very different from the mean values of the other investigated cities (Table 2). Considering all slum types (A, B and C), the classification leads to 1031 slum units, and the typical length $l_0$ nearly halves from 239.2 m to 124.6 m. The results for Cairo are summarized in Table 7.
It is also interesting to look at the total area of the slums. A classification of morphological slums (type B) leads to a total area of 13.3 km², while a classification of all slums leads to a total area of 123 km². Thus, morphological slums in Cairo account for only about 11% of all slums we classified according to Sims et al. (2003).

Discussion
The results (Table 2) show that the slum sizes are similar for different metropolitan regions of the Global South. Out of the 7382 morphological slums identified in our study, almost half of the slums (3339) have a size between 0.001 and 0.01 km² (sector II) and another 2909 slums are one order of magnitude larger (0.01-0.1 km², sector III). The average slum size for the examined cities is $S = 0.0182$ km², with a standard deviation of $\sigma = 0.0136$ km², if the large morphological slums of Cairo are not considered. Morphological slums have a similar size globally, confirming the results of Friesen et al. (2018) with a larger database.
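As a quick check, the sector counts quoted above reproduce the 84.6% share reported in the abstract and in the summary. The arithmetic uses only numbers given in the text:

```latex
\begin{align*}
\frac{N_{II}}{N} &= \frac{3339}{7382} \approx 45.2\,\%,\\
\frac{N_{III}}{N} &= \frac{2909}{7382} \approx 39.4\,\%,\\
\frac{N_{II} + N_{III}}{N} &= \frac{6248}{7382} \approx 84.6\,\%.
\end{align*}
```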
Furthermore, it becomes apparent that the largest share of the slum area occurs in units of 0.01-1 km² (sectors III and IV). Assuming a constant population density across the different slums, this result implies that more than 70% of the slum population (1 billion) live in slums with a size between 0.01 and 1 km² (sectors III and IV). When the results found here are linked to the estimates of the current slum population mentioned in the introduction, more than 700 million slum dwellers live in slum units of this size.
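The estimate can be written out explicitly. This is a minimal sketch, assuming a uniform population density $\rho$ across all slums (the assumption stated above) and a total slum population $P$ of about one billion. The symbols $\rho$, $P$, $P_{III+IV}$ and $A_{III+IV}$ are introduced here for illustration only.

```latex
\begin{align*}
P_{III+IV} &= \rho\, A_{III+IV}, \qquad P = \rho\, A_{\mathrm{Slum,Total}},\\
\frac{P_{III+IV}}{P} &= \frac{A_{III+IV}}{A_{\mathrm{Slum,Total}}} > 0.7,\\
P_{III+IV} &> 0.7 \times 10^{9} = 7 \times 10^{8}.
\end{align*}
```

Under this assumption the population share equals the area share, so the result is only as reliable as the assumption of uniform density.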
Analysing the different global regions, no strong dependence of the typical slum size on the investigated region can be determined. Only a tendency towards larger slums in South American cities and towards rather smaller slums in Asian cities can be observed. For Africa, the continent with the strongest population growth, only data for the slums in the cities of Cape Town and Cairo are available. The only city in the sub-Saharan region considered here (Cape Town) is rather a special case due to the strong political influence in the last century. An analysis of slums in the fast-growing cities of sub-Saharan Africa such as Lagos, Luanda, Kinshasa or Nairobi is needed to confirm or refute the observed trends, since previous studies show that the slum areas in these cities differ in their characteristics (Kuffer, Orina, Sliuzas, & Hannes, 2017).
The findings mentioned above can be used as input to systematically plan infrastructures for the supply of slums. From the finding that a large part of the global slum population lives in smaller slum units, we can derive requirements for infrastructure measures. The worldwide similarity of the slum units suggests the development of small, decentralized units that can be used to supply these settlements. These could be, for example, container solutions for small water distribution systems. In larger slums, on the other hand, it could be useful to set up small semi-centralized supply systems from which, for example, pipelines lead to smaller subunits that supply certain areas of the slum (Figure 11). However, the requirements for infrastructure systems need to be clarified, for example whether there is room for the implementation of infrastructure in densely populated settlements. It is also necessary to analyse which approach makes sense from which size onwards. This involves the question of what a "big" and what a "small" slum is.
The different classifications of slums carried out for Dhaka, Cairo and Sao Paulo show that the classification methods have a considerable influence on the resulting size distribution. In the cases of Dhaka and Sao Paulo, our classification leads to smaller identified units.
With regard to classification, the question arises of how two adjacent slums can be distinguished from each other. In the present study, slums were separated if they are more than 10 m apart. It should be noted that the study compares only morphological slums and cannot make any statement about their sociological situation.
The analysis of slums in Cairo shows that although slums globally have similar structures, local differences must be taken into account. In some cities, economic or political boundary conditions lead to other settlement classes of urban poverty being found in addition to morphological slums (compare the analysis of Cape Town by Friesen et al. (2018) or the global categorization of arrival cities by Taubenböck et al. (2018)). If a globally uniform classification is used, only a part of living places of the urban poor is identified (Kuffer, Pfeffer, Sliuzas, Baud, and Maarseveen, 2017).
Although many people in Cairo live in informal settlements, morphological slums are only a small part of them (Kraas & Schlacke, 2016). A large proportion of informal settlements are not characterized by very small buildings. Rather, there are many buildings with several floors, in some areas of Cairo even 8-15 floors. The informal character of the settlements lies rather in the unclear legal situation and the associated poor connection to supply infrastructures, resulting in bad living conditions.
Furthermore, the question arises as to which kind of probability distribution best describes the size distributions found. To investigate this question, studies of the kind carried out by Giesen, Zimmermann, & Suedekum (2010) and González-Val, Ramos, Sanz-Gracia, & Vera-Cabello (2015) are necessary, as they have already been carried out in the analysis of the size distributions of cities within countries. The discussion of probability distribution functions, as well as of the different statistical tests needed to assess the goodness-of-fit, goes beyond the scope of this work and should be dealt with in future publications.

Summary and outlook
In the presented study, we identify morphological slums using remote-sensing data and analyse their size distribution in different cities in different regions of the world. We show that the majority of slums (84.6%) have an area between 0.001 and 0.1 km² and that slums have a similar size globally. With the extension of the database, we generalize previous investigations. Although cities with larger (Caracas) and smaller slum sizes (Dhaka) appear, the average value is $S = 0.0182$ km² with a standard deviation of $\sigma = 0.0136$ km² when seven cities are investigated.
Table 7. Results for slums in Cairo. The bold values indicate the largest share of slums or slum area in the respective sector.
Furthermore, we showed that a classification of morphological slums is useful for infrastructure planning, etc. These classifications, based on morphological aspects, seem to be more suitable for these aims than census-based classifications.
Nevertheless, further investigations are necessary. These should primarily focus on cities in sub-Saharan Africa, as this region is currently expected to experience the greatest slum growth owing to its enormous demographic dynamics.
Since the size distributions were only examined qualitatively, future studies should investigate which kind of distribution fits the data best. Methodological paths for this can be found in several studies of city size distributions within countries (Giesen et al., 2010; González-Val et al., 2015). Knowing the type of distribution that fits the data best can help to identify the processes that lead to this type of distribution and to better understand the similar size of slums.