The use of public spaces in a medium-sized city: from Twitter data to mobility patterns

This research demonstrates the usefulness of open big data for mapping mobility patterns in a medium-sized city. Motivated by the novel analyses that big data allow worldwide and in large metropolitan areas, we developed a methodology that aims to complement origin-destination surveys with à la carte spatial boundaries and updated data at minimum cost. This paper validates the use of Twitter data to map the impact of public spaces on the different parts of the metropolitan area of Concepción (MAC), Chile. Results have been validated by local experts and reveal the main mobility patterns towards spaces of social interaction such as malls, leisure areas and parks. The Main Map represents the mobility patterns from census districts to different categories of public spaces with schematic lines at the metropolitan scale, and it is centred on the city of Concepción (Chile) and its surroundings (∼10 kilometres).


Introduction
The study of social networks has long been a field of interest from different perspectives. Transport-oriented studies are a major research line, as they aim to uncover mobility patterns in order to improve transport and urban planning. There is, however, another rising field of interest related to the use of public spaces, with the focus on shared spaces. The aim is to help reduce social exclusion by promoting the shared use of public spaces as opposed to the existence of segregated places.
In this research, we explore the utility of big open data through the Twitter platform for small to medium-sized cities, which have received much less attention yet still host almost 50 per cent of the urban population (United Nations 2014). We use the city of Concepción (Chile) as a study area to analyse mobility patterns to public spaces. Concepción is a medium-sized city with over 200,000 inhabitants and a strong spatial relationship with Talcahuano. Both cities are the articulators of the Concepción metropolitan area, which has nearly 1 million inhabitants.
Twitter data is publicly available, even at no cost if one uses the Streaming API. For this reason, Twitter has become a popular data source and is present in millions of research documents.
However, most of them do not make use of its geographical dimension (Leetaru et al. 2013), or do so at a worldwide or regional scale (Hawelka et al. 2014; Sobolevsky et al. 2015; Li, Goodchild, and Xu 2013; Liu et al. 2015). The utility of Twitter data to obtain regular mobility data in large urban areas was demonstrated by Huang and Wong (2015) in their case study in Washington DC. They were able to obtain space-time paths from Twitter geolocated data similar to traditional travel diaries. We aim to extract similar material for our case study area in order to obtain information about the use of public spaces as opportunities for spatio-temporal interaction.
Our aim is to provide complementary and easily updatable information on mobility patterns to current origin-destination surveys, which take place every 10 years, especially considering that the last one in Concepción was carried out in 1999. Also, public space is relevant because many social activities outside homes occur in public spaces (Rojas et al. 2015), and even parks and plazas play an important role for people (Villagra-Islas and Alves 2016). The challenge is to extract useful information from a smaller dataset than the ones available for large cities. In this paper we use mobility patterns to analyse the use of public space by people residing in different parts of the city.

Data and methods
In order to fulfil the objective of improving spatio-temporal resolution and reducing data collection costs, we made use of geolocalized Twitter data. These data are freely available on a real-time basis; thus an effort was made to download all the geolocalized tweets through the Streaming API in order to build a proper database. We used the Python language to screen and save the tweets within a certain bounding box, and then we transformed them into a point layer for use in a Geographic Information System.
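As a rough illustration of the screening step, the sketch below keeps only the tweets whose point coordinates fall inside a bounding box and writes them out as a GeoJSON point layer that a GIS can load. The bounding-box values, the simplified tweet records and the function names are assumptions made for illustration; they do not reproduce the script used in this research.

```python
import json

# Hypothetical bounding box (west, south, east, north) in decimal degrees,
# roughly around Concepción; the values used in the study are not reported.
BBOX = (-73.3, -37.1, -72.8, -36.6)


def in_bbox(lon, lat, bbox=BBOX):
    """Return True when the point lies inside the box (edges included)."""
    west, south, east, north = bbox
    return west <= lon <= east and south <= lat <= north


def tweets_to_geojson(tweets, bbox=BBOX):
    """Build a GeoJSON FeatureCollection from geolocated tweets inside bbox.

    Each tweet is assumed to be a dict with 'user', 'time', 'lon' and 'lat'
    keys (a simplified record, not the raw API payload).
    """
    features = []
    for t in tweets:
        lon, lat = t.get("lon"), t.get("lat")
        if lon is None or lat is None:
            continue  # tweet without point coordinates
        if not in_bbox(lon, lat, bbox):
            continue  # tweet outside the study area
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"user": t["user"], "time": t["time"]},
        })
    return {"type": "FeatureCollection", "features": features}


# Invented sample records: one inside the box, one outside, one without coordinates.
sample = [
    {"user": "a", "time": "2016-01-04T23:10:00", "lon": -73.05, "lat": -36.82},
    {"user": "b", "time": "2016-01-05T12:00:00", "lon": -70.65, "lat": -33.45},
    {"user": "c", "time": "2016-01-05T13:00:00", "lon": None, "lat": None},
]
layer = tweets_to_geojson(sample)
geojson_text = json.dumps(layer)  # ready to be saved as a .geojson file
```

GeoJSON stores coordinates in longitude, latitude order, which is why the sketch keeps that order throughout.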
This research was carried out with the geolocalized tweets published in the study area (Concepción metropolitan area) between 1 January and 31 March 2016. Data treatment included the removal of identical tweets referring to the same emergency phenomena and at the same location, since this information is not relevant for the purposes of this research, i.e. identifying the use of public space.
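One possible way to detect such repeated tweets is sketched below: tweets are grouped by their text and rounded coordinates, and every group with more than one member is dropped. The record layout, the rounding precision and the `keep_first` option are assumptions for illustration; the text does not state whether all copies or only the repeats were removed, so the sketch supports both.

```python
from collections import Counter


def _key(tweet, decimals):
    """Identity of a tweet: normalized text plus rounded location."""
    return (
        tweet["text"].strip().lower(),
        round(tweet["lon"], decimals),
        round(tweet["lat"], decimals),
    )


def drop_repeated_tweets(tweets, decimals=4, keep_first=False):
    """Remove tweets whose text and location occur more than once.

    With keep_first=False every copy of a repeated tweet is removed;
    with keep_first=True the first copy is kept and later ones are dropped.
    """
    counts = Counter(_key(t, decimals) for t in tweets)
    kept, seen = [], set()
    for t in tweets:
        k = _key(t, decimals)
        if counts[k] == 1:
            kept.append(t)  # unique tweet
        elif keep_first and k not in seen:
            kept.append(t)  # first copy of a repeated tweet
        seen.add(k)
    return kept


# Invented sample: two identical alert tweets at one location and one unique tweet.
sample = [
    {"text": "Alerta: incendio forestal", "lon": -73.05, "lat": -36.82},
    {"text": "Alerta: incendio forestal", "lon": -73.05, "lat": -36.82},
    {"text": "Tarde en el parque", "lon": -73.04, "lat": -36.83},
]
cleaned = drop_repeated_tweets(sample)
cleaned_keep_first = drop_repeated_tweets(sample, keep_first=True)
```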
In addition, user accounts with more than 250 tweets over the whole period (>2.7 tweets per day on average) were checked in order to remove those not corresponding to individuals (i.e. Twitter accounts devoted to disseminating news or emergency issues). This led to the removal of 17 of the 26 top active users in the entire metropolitan area. Table 1 below shows the main figures of our sample data referring to the regional, metropolitan and central core of Concepción. The generation of mobility maps implies the need to set Twitter users' place of residence, which was done by assuming that they live in the district where they tweeted the most between 22:00 and 07:59 on regular weekdays (Monday to Thursday).
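The two rules described above can be sketched as follows. The code assumes that each tweet has already been assigned to a census district (for example by a point-in-polygon join in the GIS) and carries a `datetime` in local time; the record layout and function names are hypothetical. It also assumes that the weekday is taken from the calendar day of the tweet, so a tweet at 02:00 on a Friday is not counted; the text does not specify this detail.

```python
from collections import Counter, defaultdict
from datetime import datetime

NIGHT_WEEKDAYS = {0, 1, 2, 3}  # Monday to Thursday in datetime.weekday()


def is_night_tweet(ts):
    """True for tweets sent between 22:00 and 07:59 on Monday to Thursday."""
    return ts.weekday() in NIGHT_WEEKDAYS and (ts.hour >= 22 or ts.hour < 8)


def heavy_accounts(tweets, threshold=250):
    """Users with more than `threshold` tweets, to be checked by hand."""
    counts = Counter(t["user"] for t in tweets)
    return {user for user, n in counts.items() if n > threshold}


def estimate_home_district(tweets):
    """Map each user to the district with most of their night-time tweets.

    Users without night-time tweets get no estimate. Ties are resolved in
    favour of the district seen first, which is an arbitrary choice.
    """
    night = defaultdict(Counter)
    for t in tweets:
        if is_night_tweet(t["time"]):
            night[t["user"]][t["district"]] += 1
    return {user: c.most_common(1)[0][0] for user, c in night.items()}


# Invented sample; 4 January 2016 was a Monday.
sample = [
    {"user": "a", "time": datetime(2016, 1, 4, 23, 10), "district": "D1"},
    {"user": "a", "time": datetime(2016, 1, 5, 6, 30), "district": "D1"},
    {"user": "a", "time": datetime(2016, 1, 6, 22, 30), "district": "D2"},
    {"user": "a", "time": datetime(2016, 1, 5, 13, 0), "district": "D3"},
    {"user": "b", "time": datetime(2016, 1, 9, 23, 0), "district": "D2"},
]
home = estimate_home_district(sample)
to_check = heavy_accounts(sample, threshold=3)  # low threshold for the demo
```

The flagged accounts are only candidates: deciding whether an account belongs to an individual remains a manual check, as described in the text.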
Our methodology includes an exploratory analysis of the spatial autocorrelation between official resident-based data and our estimated place of residence, prior to producing the flow maps that represent mobility patterns. In this way we aim to confirm the null hypothesis that the number of users of public spaces from a particular place is not directly proportional to that place's population volume and distance, and that this relationship varies across space.
In particular, we used the global and local bivariate Moran's I spatial autocorrelation index, which indicates whether there is a strong or weak relationship between high (and low) population volume and Twitter users at the district level. This index was developed by Anselin et al. (2010) and was computed using their GeoDa software.
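To make the statistic more concrete, the sketch below computes a bivariate Moran's I in plain Python under a common textbook formulation: both variables are standardized, the spatial weights are row-standardized, and each local value is the product of a district's standardized population and the spatially lagged standardized number of Twitter users in its neighbours; the global index is the mean of the local values. This is an illustrative assumption, not GeoDa's code, and it omits the permutation-based significance tests. The weights matrix and the figures are invented.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Z-scores using the population standard deviation (must be non-zero)."""
    m, s = fmean(values), pstdev(values)
    return [(v - m) / s for v in values]


def row_standardize(w):
    """Scale each row of the weights matrix so that it sums to 1."""
    out = []
    for row in w:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out


def local_bivariate_moran(x, y, w):
    """Local values: z_x[i] times the spatial lag of z_y around district i."""
    zx, zy = standardize(x), standardize(y)
    ws = row_standardize(w)
    n = len(x)
    return [zx[i] * sum(ws[i][j] * zy[j] for j in range(n)) for i in range(n)]


def global_bivariate_moran(x, y, w):
    """Global index: mean of the local values."""
    local = local_bivariate_moran(x, y, w)
    return sum(local) / len(local)


# Invented example: four districts in a row, each adjacent to the next one.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
population = [1000, 2000, 3000, 4000]
twitter_users = [10, 20, 30, 40]
moran_global = global_bivariate_moran(population, twitter_users, adjacency)
moran_local = local_bivariate_moran(population, twitter_users, adjacency)
```

Positive local values mark districts where high (or low) population sits next to neighbours with a high (or low) number of Twitter users; negative values mark mismatches.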
The mobility analysis was based on those users that tweeted at night on working days in any part of the metropolitan area, and during the day and at weekends within the most popular public spaces according to local researchers. Data treatment and map production were done with the commercial software ArcGIS 10.3.1 and Simantel's Flow Map Generator toolbox (Simantel 2012).
We followed Shelton (2016) and normalized our data to a tweet usage baseline. The mobility analysis was therefore performed with 'raw' data (i.e. the number of Twitter users moving from one district to a particular public space) and with normalized data (i.e. the proportion of Twitter users from one district to a particular public space). The comparison between the former (number of people moving) and the latter (the impact of each public space on each district) allows the identification of a potential bias in using Twitter as a proxy for individual mobility.
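The difference between the two measures can be illustrated with a short sketch. It assumes that the baseline for each district is the number of Twitter users whose estimated residence is in that district; the exact baseline is not spelled out in the text, so this choice, like the function names and the sample values, is an assumption.

```python
from collections import Counter, defaultdict


def raw_flows(home, visits):
    """Count distinct users per (home district, public space) pair.

    `home` maps user -> estimated home district; `visits` is an iterable of
    (user, public space) pairs. Users without a home estimate are skipped.
    """
    users = defaultdict(set)
    for user, place in visits:
        district = home.get(user)
        if district is not None:
            users[(district, place)].add(user)
    return {pair: len(members) for pair, members in users.items()}


def normalized_flows(raw, home):
    """Share of each district's resident Twitter users visiting each space."""
    residents = Counter(home.values())
    return {(d, p): n / residents[d] for (d, p), n in raw.items()}


# Invented sample: D1 has four resident users, D2 has one.
home = {"a": "D1", "b": "D1", "c": "D1", "d": "D1", "e": "D2"}
visits = [
    ("a", "CBD"), ("b", "CBD"), ("a", "CBD"),  # repeat visit counts once
    ("e", "CBD"), ("c", "Mall"),
    ("z", "CBD"),  # user without a home estimate
]
raw = raw_flows(home, visits)
share = normalized_flows(raw, home)
```

In this invented sample D1 sends more users to the CBD than D2 in raw terms, while the normalized share is higher for D2, which is the kind of contrast the comparison is meant to expose.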

Mapping mobility patterns
We analysed the mobility patterns between the estimated district of residence and a selection of public spaces. These public spaces were selected and mapped according to local experts and represent the most popular places in the metropolitan area in six categories, i.e. Central Business District (CBD), Malls, Leisure, University Campus, Transport and Parks (Figure 1). We then compared the results of non-normalized vs. normalized data in order to understand the mobility patterns to public spaces (Figure 3, top vs. bottom). These maps show the number of Twitter users in each district that visited each public space (top, non-normalized data) vs. the impact of each public space on residential areas by district (bottom, normalized data). Line thickness grows proportionally with the number of users or the size of the impact. Lines start at the furthest district in a given direction and accumulate the flow from nearby districts, whose lines merge together (like a hydrological basin). The scale growth is constant in all maps, thus making them easily comparable. We produced generalized lines that are topologically and quantitatively accurate, and that maximize map simplicity and readability.
The CBD's impact is much larger than that of any other public space, thus deserving a prominent place. The rest of the public spaces are grouped in maps by category. Colours representing individual public spaces in each map have been carefully chosen from ColorBrewer qualitative ramps (Brewer 2002) in order to ensure equitable visibility for each one.
As expected, the CBD is the main attraction place, with users from all districts and a greater impact on the north-west and south parts of the metropolitan area (flows from Talcahuano, Penco, San Pedro de la Paz and Coronel). Normalized flows indicate that Trebol mall (the largest in size and number of shops), the furthest away from the city centre and the easiest to reach by car, has the largest impact on most districts, including the city centre. In contrast, Centro mall shows a sharp decrease in its impact outside the inner city boundaries (i.e. walkable and with higher transit density), and the Mirador mall extends most of its impact along the northwest-southeast axis, which coincides with the railway and bus network. University and college campuses also concentrate a high proportion of Twitter visitors, with differences across the metropolitan area. Universidad San Sebastián (USS), in the northern part of the city centre, gets the highest impact from its surroundings and, to a lesser extent, from the western area. In second position, the University of Concepción (UdeC) has a large impact that spreads in all directions, whereas the impact of DUOC and UBB decreases more sharply with distance.
Leisure areas not related to retail activities but to entertainment and gastronomy receive fewer visitors. In some cases, the impact is clearly concentrated on some specific parts of the metropolitan area. For example, Diagonal-Perú, an outdoor bar and restaurant area in the city centre, has a higher impact on the northern districts, whereas Casino, a private entertainment area in the northern outskirts, has an impact especially on the districts located in the high-income residential area north of the city centre and in the southwest part of the metropolitan area. Both areas also receive a larger impact from the airport than other districts. Other places, like Lenga, Plaza España or Laguna Grande, have an impact that decreases with distance in a consistent way. In the case of parks, it is clear that Parque Ecuador, the one with a clear urban character, is the only one with a hinterland covering the whole metropolitan area. Similarly, Terminal Collao bus station has an extended impact which is more intense towards the north and west.
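The accumulation of flows along merging lines can be pictured as a small tree computation. The sketch below is only a conceptual illustration and is not the algorithm of the Flow Map Generator toolbox: it assumes that each district is linked to the next district on the way to the public space, adds each district's own flow to every segment downstream of it, and converts the totals to line widths with a single constant factor. District names, flows and the width factor are invented.

```python
def accumulate_flows(downstream, own_flow):
    """Total flow carried by the line segment leaving each district.

    `downstream` maps each district to the next district towards the public
    space (None for the district whose segment ends at the destination);
    `own_flow` maps each district to the flow it contributes itself.
    """
    total = {district: 0 for district in downstream}
    for start, flow in own_flow.items():
        node, visited = start, set()
        while node is not None:
            if node in visited:
                raise ValueError("cycle in downstream links")
            visited.add(node)
            total[node] += flow  # this segment also carries upstream flow
            node = downstream[node]
    return total


def line_width(flow, width_per_unit=0.5):
    """Width proportional to flow; the same factor is used on every map."""
    return flow * width_per_unit


# Invented network: D1 -> D2 -> D4 and D3 -> D4, with D4 next to the destination.
downstream = {"D1": "D2", "D2": "D4", "D3": "D4", "D4": None}
own_flow = {"D1": 3, "D2": 5, "D3": 2, "D4": 10}
totals = accumulate_flows(downstream, own_flow)
widths = {district: line_width(flow) for district, flow in totals.items()}
```

Keeping the width factor constant across maps is what makes line thicknesses comparable between public spaces.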

Figure 1. Categories of public spaces

Figure 2. Twitter users that visited public spaces by district of residence and resident population: scatterplot (left) and local Moran's I (right)

Figure 3. Twitter users' flows from estimated district of residence to public spaces

Table 1. Main figures of downloaded geo-located tweets in Concepción, Chile. * Users that tweeted more than once and from different locations. Source: own elaboration from data obtained from Twitter Public API between 01 January 2016 and 31 March 2016, local time (GMT -3).