New avenues for second home tourism research using big data: prospects and challenges

ABSTRACT The phenomenon of owning and visiting a second home is broadening beyond tourism due to the increase in remote working and multi-local living, in which people live in multiple residences. To better understand the people, places, and mobility linked to second homes and their implications for society, new complementary data sources are needed to provide timely and adequate information on temporal patterns and changes in second home use. Big data sources have been used in tourism research, but less often in studies about second homes. This article offers a perspective on the potential of utility consumption data, transaction data from mobile positioning and smart cards, social media data, and data from smartphone applications for second home tourism research. By focusing on six key questions relevant to second home tourism research, we exemplify how these data sources could provide new knowledge to the field and propose four axioms for future research.


Introduction
Second homes in remote locations and close to natural amenities are considered to be places to escape the stressful and fast-paced urbanized society. The global COVID-19 pandemic has significantly increased the number of people residing in second homes (Willberg et al., 2021), and created a boom in second home purchasing (Financial Times, 2021; Pitkänen et al., 2020). While second homes and their use have traditionally been linked to leisure time and examined as second home tourism, recent studies on behavioural changes show that second homes are gradually being integrated into the daily lives of people (Adamiak et al., 2017; Zoğal et al., 2020). Most importantly, the pandemic has vividly shown that second homes are about more than leisure and tourism-related activities. During the COVID-19 pandemic, people were encouraged or even forced to stay at home and work remotely to reduce their social contacts, and many chose their second homes as the desired option for teleworking and staying safe.
Even before the pandemic, two significant social trends have diversified the second home phenomenon: multi-local living and remote working. More people live in several places and have multiple homes for a range of reasons (Schier et al., 2015). For example, in Europe, the retirement of the large 'baby boomer' generation is leading to increased time residing at second homes (Hiltunen & Rehunen, 2014). Simultaneously, work-life is becoming more flexible and developing telecommunication capabilities are increasing remote working (Hardill & Green, 2003), which is leading to increased use of second homes among younger generations. The ability of white-collar and knowledge workers to choose their physical working premises is contributing to second home living. This is also known as 'south working': remote working from rural low-density areas with better working conditions (Aloisi & Corazza, 2022). Multi-local living and remote working are accompanied by the increasing mobility of people (Sheller & Urry, 2016) that further creates diversity and complexity in understanding and defining second homes (Schier et al., 2015; Zoğal et al., 2020).
Second homes are important both for rural and regional development and for the sustainability of our societies. As rural communities in Europe suffer depopulation and economic stagnation, second home users are perceived as a potential stimulant for local communities (Back, 2020) and consumption (Czarnecki, Sireni, et al., 2021). However, due to the varying spatial and temporal patterns of second home users, anticipating service needs tends to be difficult both for the public sector and local businesses (Larsson & Müller, 2019). Unpredictable events like heatwaves and pandemics may change the patterns rapidly and radically. For example, during the COVID-19 pandemic, people escaping to their second homes raised concerns about the potential spread of the virus in rural areas, where health-care services have not been adjusted to serve the surge in the temporary population (Gallent & Hamiduddin, 2021).
To identify and respond to these changes in second home usage patterns as well as to capitalize on the opportunities and mitigate challenges, more information about the dynamic nature of second home users' behaviour is needed. Due to the changing nature of second homes and the general increase in mobilities, second homes need to be examined from mobility and spatio-temporality perspectives (Müller, 2021). However, these dynamic perspectives are challenging to capture using traditional methods, which are relatively sedentary in nature: people are seen more as static entities linked to specific places, and the focus is on space and places instead of mobility (Burrows & Savage, 2014; Kwan, 2013; Müller, 2021; Sheller & Urry, 2016). Complementary methods are therefore needed. Current knowledge in second home tourism research is commonly acquired from registries, surveys, and interviews (see e.g. Adamiak et al., 2016, 2017; Hiltunen & Rehunen, 2014; Overvåg, 2009). These traditional data and methods provide retrospective information from one point of time but fail to provide longitudinal and dynamic information on second home usage and its users. Registry data may provide information on the number and location of second homes, but not their use. Surveys and interviews may provide in-depth data on the reasons and practices, but they have low temporal resolution and small sample sizes. Therefore, the commonly utilized data collection methods are limited in predicting, capturing, or reacting to broad scale or rapid events.
To incorporate complementary perspectives and to gain more extensive and dynamic information for second home tourism research, we propose inspecting the opportunities offered by big data collected by mobile phones, social media applications and utility companies. These potential data sources are increasingly used in social sciences (Kitchin, 2014), but have seldom been used in second home tourism research (Müller, 2021) and gained prominence only recently during the COVID-19 pandemic (Willberg et al., 2021). Therefore, the aim of this perspective paper is to highlight how a big data approach could be applicable in second home tourism research. We focus on six key questions derived from tourism literature that are relevant to second home research and evaluate how big data sources can contribute to second home tourism research. Based on the quality and applicability evaluation, we propose four axioms to consider when introducing a big data approach to second home tourism research.

Key questions in second home tourism research
We have derived six key questions based on comprehensive reviews written about second home research over the past few decades by Hall (2014), Müller and Hoogendoorn (2013), and Müller (2021). These questions have been prominent from the beginning of second home research until the recent advancements in the field.

Where are second homes located?
This question can be answered using the data from registries and censuses. However, identifying locations of second homes from official (building) registers may be difficult and, where it is possible, the spatial resolution might only be at an aggregated level such as a municipality. Also, the increasing amount of multi-local living makes it challenging to detect which properties are used as de facto second homes. Precise locations are also needed to understand the geographical context of second homes (Müller, 2021): the relative distance to urban areas, services, or natural amenities. Not least, temporality is relevant: how the spatial distribution of second homes develops over time.

When are second homes visited?
Temporal aspects such as when, how often and for how long second homes are used are often obtained from surveys, interviews, and travel diaries, but the data are typically collected retrospectively from a small sample covering a limited period. Thus, more detailed data are needed to understand the daily, weekly, seasonal, and annual usage patterns and volumes, and their geographical variation. This is relevant especially in the case of disruptive events such as the COVID-19 pandemic, during which second homes have been used more often for remote working (Zoğal et al., 2020).

Who owns and uses second homes?
Understanding the socio-economic background of owners and users of second homes is important in local planning and governance, especially considering the ageing society and increasing local and international mobility. Furthermore, disaggregating second home users from primary owners is relevant as user groups have distinct behavioural characteristics in visiting second homes (Hall, 2014). De facto second home users are difficult to detect from ownership registry data, and sample-based surveys are limited in time and space.

How do second homeowners and users engage with local communities?
The engagement and participation in the local community, both on-site and remotely while being away, helps in understanding the aspects of place and community attachment. The everyday activities at second homes, consumption behaviour of local services, participation in local events, social initiatives, and politics reveal engagement of temporary residents in the local community and their attitude and willingness to support local community life.

Why do people own and use second homes?
There are many reasons for owning and visiting second homes and the motives may vary between user groups and among the various types of second home. Surveys, questionnaires, and interviews are important for identifying complexities, embodied feelings, and more place-specific motives of second home users. However, retrospective data collection makes it difficult to grasp suddenly emerging events, activities, and feelings in a timely manner.
have been assessed. Currently this line of research does not make it possible to react effectively in a timely manner to the changing circumstances.
The use and analysis of big data can provide complementary, more dynamic and timely insights into the above questions, as big data analytics is an important toolbox for smart tourism development (Xiang & Fesenmaier, 2017). Furthermore, using big data can open new research opportunities about the mobilities, expenses and experiences of second home users (Müller, 2021) and provide valuable insights for destination management (Raun et al., 2016).

Potential big data sources for second home tourism research
The rapid advancements in information and communication technologies (ICTs) and the global adoption of these ICTs in our daily lives generate a massive amount of constantly growing digital data, also known as big data (Domo, 2021). Big data transforms the way people think about and conduct research (Burrows & Savage, 2014; Kitchin, 2014). Besides tremendous quantities of data (volume), as the most described difference in contrast to traditional datasets, big data is ambiguous regarding the qualities and characteristics of data sources. However, 'big data' has two traits that are distinct from traditional data sources: velocity and exhaustivity (Kitchin & McArdle, 2016). Velocity indicates data being created continuously at a fast pace, up to real-time, and exhaustivity indicates that data can capture potentially entire populations or systems (n = all) instead of small samples.
For second home tourism research, most big data sources are valuable as they can disassemble the dynamic nature of social phenomena, reveal spatiotemporal mobility of people and their social interactions with communities that are linked to second homes. In this paper we focus on utility consumption data, user-generated social media data, transactional data from mobile phone communication (SIM cards) and smart card (e.g. credit card) usage, and data from smartphone applications, as these can provide such information from different perspectives (Figure 1).
Consumption data from utility companies are created for and collected by companies providing electricity, gas, water, and waste management. Utility consumption data characteristics and availability vary by company and country, yet in general the data are centred around the properties, where services are used and indirectly linked to households using these properties. The widespread implementation of smart meters in buildings for sustainability aspects enables precise, near real-time consumption data collection (Yildiz et al., 2017), in contrast to former monthly or quarterly data collected for billing purposes (Hof & Schmitt, 2011). Such data are provided by the service companies and can be used as an indicator of the presence of people in a property.
User-generated data from social media are individual level data created by social media users that are collected and stored by social media platforms. Social media data consist of posts made on social media platforms, for example a photograph uploaded to Instagram or Flickr, or a tweet posted on Twitter. Social media posts typically include the following elements: social content in textual, image or audio format, geographical location information, timestamps, user profile and social network information such as likes, comments, friends, and followers (Toivonen et al., 2019). These data can be accessed for research through an Application Programming Interface (API) provided by the platform companies or by collaborating with the given company. Social media data are inherently individual level data; thus, data over longer periods provide rich understanding about these users, including mobility, activity practices, social networks, personality, perceptions, and attitudes (Steinmetz et al., 2020; Toivonen et al., 2019).
Transactional data that are useful for second home research originate from several mobile positioning and smart card data sources. Mobile positioning data can be both individual and network level data collected by mobile network operators for billing and network maintenance purposes. Individual level data can comprise either phone user-initiated call detail records containing metadata on connected calls, messages, and data service (Järv et al., 2018; Saluveer et al., 2020) or automatic network-initiated signalling data such as antennae handovers and location updates of mobile phones in the network, or a combination of both (Ni et al., 2018; Zaragozí et al., 2021). Network-level data comprise metadata about the use of the network such as the number of mobile phones connected to an antenna (Ratti et al., 2006). Smart card data such as credit cards, customer loyalty cards, and transport cards can provide individual-level metadata similar to those available from mobile phone data (Bojic et al., 2015). The core elements of mobile positioning data are geographical location and timestamp information as indicators for the presence and mobility of people. Individual data over time can further reveal activity spaces, meaningful activity locations and social networks of people (Puura et al., 2018).
Data from smartphone applications are created for and collected by companies providing location-based services through phone applications. The individual level data are collected via applications such as search engine, sport, health, weather, or transit apps that have permission from the user to collect location data and other personal information for improving services. Recently, these data have been shared with external companies (data brokers or location-based services) providing processed data for research and practice. In particular, the COVID-19 pandemic has increased the demand for and supply of such data (Apple, 2020; Google, 2020; Huq Industries, 2021; SafeGraph, 2020; X-Mode, 2020), and its use in research (Trasberg & Cheshire, 2021). The key element of this type of data is the capacity to join timestamp information with precise location information as a result of combining smartphone built-in GPS-receivers with other positioning methods like Wi-Fi, Bluetooth, and mobile phone network receivers. Data from some applications can include additional information on social networks and topics relevant for users.

The applicability of big data sources in second home tourism research
We now describe how each of these big data types can contribute to providing new insights into the six key questions relevant for second home tourism research that are otherwise difficult to capture with the traditional data collection methods. In general, different data sources are suitable for answering different questions on second homes, their owners and users, and the impact on local community regardless of geographical setting as presented in Table 1.
Utility consumption data can contribute to the research from three perspectives. First, these data are about the properties, thus they can help to reveal the geographical distribution patterns of second homes (Pienaar & Visser, 2009). The second home property characteristic can be obtained either directly from a consumption provider's database or indirectly by deducing from consumption data based on consumption regularity patterns. By linking this data source with other sources like land use data, it provides further information about the environmental and social context of second homes. Second, utility consumption data can reveal temporal (daily, weekly, monthly) usage rhythms of second home properties and long-term trends (Razavi et al., 2019). Here, rhythms and trends can be detected from a massive consumption database using big data analytics such as machine and deep learning algorithms (Feng et al., 2020;Razavi et al., 2019). Third, utility consumption data (e.g. water, waste) can provide information about the impact of activities related to second homes on local environments (Hof & Schmitt, 2011). The occupation of second homes from given data can be used as an indicator for the presence of second home users and thus to evaluate their impact on the local community (e.g. demand for local resources and services).
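To make the idea of deducing occupancy from consumption regularity more concrete, the following is a minimal sketch rather than a method taken from the cited studies. It assumes daily electricity totals per property are available, treats the lowest quarter of days as the unoccupied standby load, and flags days that exceed that baseline by an illustrative factor. The function names, the threshold factor, and the baseline rule are all assumptions made for illustration; the heuristic would not suit a property that is occupied almost all year.

```python
from collections import defaultdict
from datetime import date
from statistics import median


def occupied_days(daily_kwh, factor=2.0):
    """Flag days whose consumption clearly exceeds the standby baseline.

    daily_kwh: dict mapping datetime.date -> kWh used on that day.
    The baseline is the median of the lowest quarter of days, assumed
    (hypothetically) to represent an unoccupied property; `factor` is an
    illustrative threshold, not an empirically calibrated value.
    """
    values = sorted(daily_kwh.values())
    lowest_quarter = values[: max(1, len(values) // 4)]
    baseline = median(lowest_quarter)
    return {day for day, kwh in daily_kwh.items() if kwh > factor * baseline}


def monthly_occupancy_share(daily_kwh, factor=2.0):
    """Return {(year, month): share of observed days flagged as occupied}."""
    occupied = occupied_days(daily_kwh, factor)
    observed = defaultdict(int)
    flagged = defaultdict(int)
    for day in daily_kwh:
        key = (day.year, day.month)
        observed[key] += 1
        flagged[key] += day in occupied
    return {key: flagged[key] / observed[key] for key in observed}
```

Monthly shares of this kind could then be compared across years or regions to describe seasonal usage rhythms; with real smart meter data, a learned model would likely replace the fixed threshold.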
Several consumption data sources are used in tourism studies to estimate temporary populations. Sun et al. (2018) used electricity data to show how the population decreases in Chinese cities during the domestic holiday season. Sánchez-Galiano et al. (2017) used drinking water consumption and solid waste generation data to estimate the size of the temporary population in a resort city in Spain. A study by Matsuto and Tanaka (1993) used waste data to infer seasonal patterns of population. However, there have been few studies focusing on second homes. Andersen et al. (2008) used electricity consumption data to reveal the increase of second homes over time in Denmark, Pienaar and Visser (2009) used water consumption data to identify properties being used as second homes, whilst Hof and Schmitt (2011) used water consumption data to analyze the impact of tourists and second home users on local water resources.
In general, detailed household-level electricity consumption data are widely used due to the increased availability from the adoption of smart meters (Yildiz et al., 2017). Other sources such as water usage and waste management data are temporally and spatially less detailed. This makes the use of these data more challenging for second home research. Deducing the occupation of second homes and thereby mobility patterns based on consumption data may be challenging, because the consumption levels depend on many factors (e.g. the number of appliances) and the data may be available only at coarse temporal or spatial aggregation levels. In these cases, the detailed variation between consumption profiles is not likely to be captured (Anderson et al., 2017).
Note to Table 1: *** = high potential, ** = moderate potential, * = low potential, NA = no potential.
The main contribution of social media data comes from the fact that the data are provided at the individual level: each user creates a profile and generates a data flow by posting to the sites. These individual social media posts can be used in analysing the user perspective of second homes. Contextual information deduced from social content, user profiles or social networks (Toivonen et al., 2019) enables extraction of information about demographics of second homeowners and users, and their motivations and preferences. This enables an evaluation of how the users interact with the local community and what services they consume near their second home (Steinmetz et al., 2020). It can also shed some light on the individual reasons for owning and visiting second homes. Social media posts with an accurate timestamp and geographical location can reveal general mobility flows to second homes.
Social media data have been used widely in mobility studies (Huang et al., 2020; Jurdak et al., 2015) and urban studies (Boy & Uitermark, 2016; Heikinheimo et al., 2020). Tourism researchers have analysed visitation hotspots (Hasnat & Hasan, 2018; Shi et al., 2017), the origins of tourists, movement patterns in destinations (Kádár & Gede, 2021), visitor preferences based on the textual and visual content of their posts (Hausmann et al., 2018; Tenkanen et al., 2017), and human-nature interactions. Social media content is also used in capturing community engagement in cities (Steinmetz et al., 2020). However, to the authors' knowledge, social media data have not been used in second home research to date.
Despite strengths in sentiment and content analyses for capturing the activities and perceptions of second homeowners and users, social media data usage has its challenges and applicability limitations. Distinguishing locals, frequent second home users (both within a country and from abroad), and tourists based on the spatiotemporal pattern and content of their posts is feasible, yet needs more accurate data analysis methods. Users and posted content have varying representativeness between social media platforms in terms of socioeconomic status, age, gender, and geography, and data currently available from the social media companies do not necessarily represent the population accurately (Jurdak et al., 2015; Toivonen et al., 2019). This is amplified when inspecting only social media posts with a geotag included since not all users geotag their posts and not all platforms offer this feature (Sloan & Morgan, 2015). The frequency of geo-located posts can also be too low to conduct mobility analyses, and the variation in location precision can make it challenging to pinpoint people to places (Toivonen et al., 2019). Nevertheless, the abovementioned issues are being tackled with conceptual and methodological development, and the applicability of the data improves constantly.
Data from mobile positioning and smart card transactions can contribute to the research mainly by providing the location of second homes and the temporal information of when they are visited. Geographical information can be deduced from the location of the mobile antenna or the transaction terminal. Temporal information is recorded when a call or a payment is being made. Based on the time and location information, it is possible to derive the mobility patterns (Sobolevsky et al., 2014) to and from second home areas. Further, knowing the volume and timing of mobility flows and the expenditure of second home users makes it possible to deduce the potential impact of second homes and their users on the local economy and environment.
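As an illustration of how location and time information might be turned into candidate second home locations, the sketch below counts the distinct nights a phone is observed at each antenna and treats the most frequent antenna as the primary home and the runner-up as a possible second home. This is a deliberately simplified heuristic under stated assumptions, not the anchor-point models used in the cited literature: the record layout, the night-time window (22:00 to 06:00), and the `min_days` threshold are all hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime


def anchor_locations(records, min_days=5):
    """Guess a primary and a second home antenna for each user.

    records: iterable of (user_id, timestamp, antenna_id) tuples, where
    timestamp is a datetime. Only night-time records (22:00-06:00) are
    counted, on the assumption that people spend nights at a home.
    Returns {user_id: (primary_antenna, second_antenna_or_None)}.
    """
    nights = defaultdict(set)  # (user, antenna) -> distinct dates seen at night
    for user, ts, antenna in records:
        if ts.hour >= 22 or ts.hour < 6:
            nights[(user, antenna)].add(ts.date())

    per_user = defaultdict(Counter)
    for (user, antenna), days in nights.items():
        per_user[user][antenna] = len(days)

    result = {}
    for user, counts in per_user.items():
        ranked = counts.most_common(2)
        primary = ranked[0][0]
        second = None
        # Accept the runner-up only if it was used on enough distinct nights.
        if len(ranked) > 1 and ranked[1][1] >= min_days:
            second = ranked[1][0]
        result[user] = (primary, second)
    return result
```

With the visit dates retained per antenna, the same structure could also yield the timing and length of stays, which is the temporal information the paragraph above refers to.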
A broad variety of studies use passive mobile positioning data to study human mobility in urban studies (e.g. Ahas et al., 2015; Järv et al., 2018), tourism (e.g. Raun et al., 2016; Saluveer et al., 2020; Zaragozí et al., 2021) and transport research (e.g. Järv et al., 2012; Ni et al., 2018). Data have been used to analyse second home mobility in Estonia and in Finland (Willberg et al., 2021), where second home use and multi-local living are a common part of lifestyle for many. A study from Estonia showed clear seasonal patterns of temporal presence in coastal areas, surroundings of cities and in specific 'dacha' areas during summer. Another study from Finland captured the population decline in cities and increase in rural areas during the COVID-19 outbreak in spring 2020 (Willberg et al., 2021). In both studies, the areas with increased population were strongly linked to second housing. In general, bank card expenditure data have been used less due to the difficulties in data access. In tourism studies, these data have been used to identify the mobility patterns of tourists (Aparicio et al., 2022; Sobolevsky et al., 2014) and tourist behaviour during the COVID-19 pandemic (Donaire et al., 2021).
The main challenges with individual level transaction and mobile phone data are associated with data availability and privacy issues affected by national and international privacy legislation and data sharing policies of private companies (Poom et al., 2020; Saluveer et al., 2020). Instead, companies are providing processed data products, which are usually black boxed: the aggregation and collection methods are unknown and not customizable, making their use for research purposes somewhat challenging and controversial (Poom et al., 2020). Specifically, some mobile phone datasets (e.g. call detail records) may have varying temporal resolution, thus creating a challenge for temporal analysis. Another challenge is the varying spatial resolution of mobile phone data, especially in sparsely populated rural areas, where second homes are mostly situated. However, advanced interpolation methods incorporating land use, time use, and other relevant datasets can significantly increase the spatial accuracy (Aasa et al., 2021; Järv et al., 2017). Mobile positioning data remain one of the more feasible data sources for second home mobility research.
Similarly, data from smartphone applications can provide spatially and temporally precise information about the location of second homes and when they are being visited. Based on the volume of mobility flows to second homes, it is possible to evaluate the impacts of second home visitation to local municipalities. The exact features of the data depend on the provider (e.g. data broker company). More detailed qualitative information could be collected with custom applications made specially for research purposes (Puura et al., 2022). Recent smartphone-based individual level human mobility studies showcase the importance of custom applications during the COVID-19 pandemic (e.g. Järv et al., 2021;Molloy et al., 2020), despite the small sample sizes.
More accessible aggregated smartphone positioning data have been used to examine how the COVID-19 pandemic and associated regulations affected human mobility, created inequalities (Chang et al., 2020; Dimke et al., 2020), and reduced the daily activity of people (Couture et al., 2021; Trasberg & Cheshire, 2021). The high spatiotemporal accuracy and the opportunity to link the information with other data sets also make smartphone data an attractive source for second home research. However, such data raise concerns about privacy and thus far have seldom been used. Due to these privacy concerns, data from smartphone applications have become hard to acquire in the European Union (EU) since the introduction of the General Data Protection Regulation (GDPR) in 2018. Thus, data broker companies (e.g. X-Mode) may exclude the EU entirely from their location datasets (Green, 2020). Nevertheless, a study by Morrison et al. (2020) shows how smartphone data can capture the de facto population by day and month, including seasonal residents such as people owning or renting a second home, and thus be valuable for second home research.

Four axioms for future research
The changes in society such as increasing remote working and multi-local living are pointing to the need for a more nuanced and detailed understanding of second homes and their use. The growing use of big data sources in tourism studies and social sciences in general is a potential way forward in providing complementary information for second home research. While we have sketched out potential big data sources and prospective avenues relevant for second home tourism research, we recognize that these are not the only options available, nor have we described their operationalisation in detail. Thus, we propose considering four axioms when incorporating big data in studying societal phenomena regarding second homes.
First, we need to adopt big data to include dynamics and mobility perspectives in second home tourism research. Because of the changing and increasing mobility of people, traditional methods tend to fail in grasping the extent and spatiotemporal details of the various types of temporary mobility. As living in multiple places increases, it complicates the understanding of and research on second homes and multi-local living, for example during the COVID-19 pandemic (Müller, 2021) and in cross-border and transnational contexts. Due to the velocity and exhaustivity of big data (Kitchin & McArdle, 2016), it has vast potential for enhancing the state of the art of second home research (cf. Müller & Hall, 2018). This was recently stated by Müller (2021, p. 97): 'Big data covering second-home owners' mobilities, expenses, and experiences further opens up for new, exciting research opportunities'.
Second, to ensure that relevant big data and appropriate analytics are used properly in second home tourism research, interdisciplinary collaboration is vital. Collaboration between experts on second homes and on big data methods is crucial for overcoming the learning curve and the familiarization of big data analytics for second home (tourism) research (Shoval & Ahas, 2018). The knowledge of tourism researchers is essential to have a critical understanding of the data used for second home research. The combination of the most suitable data, big data analytical skills, and extensive knowledge from second home researchers helps to improve the general conceptual framework and methodological development of the field. Sharing datasets and codes openly with clear documentation on empirical analysis would ensure research reproducibility and transparent capacity building within the research community examining second homes.
Third, the ethical and privacy issues in using big data for second home tourism research must be acknowledged. When dealing with person-based big data on individual or aggregate level, one needs to address and apply the current national and international legislation and rules, such as the GDPR in the EU. The individual or household level data provided by private companies (e.g. electricity companies, mobile network operators, data broker companies) are often aggregated in space and time to preserve the privacy of individuals. Yet, reliability concerns arise both for research and practice when the aggregation is done in a 'black box' manner and no information (or only partial information) about the aggregation method is available for big data products (Poom et al., 2020). With openly available individual-level (social media) data via API, one must not only follow the terms and conditions specified by the company owning the data, but also handle the data and present the research findings appropriately (Toivonen et al., 2019). Big data can be used safely in second home research if the research is conducted responsibly (Zook et al., 2017), and can thus provide social good to local communities without harming individuals' privacy.
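One transparent alternative to 'black box' aggregation is to document the aggregation rule alongside the data. The sketch below shows one common pattern, counting distinct users per area and month and suppressing cells with few users. It is an illustrative assumption, not a procedure prescribed by the GDPR or used by any provider named here; the function name, the record layout, and the threshold of five users are hypothetical.

```python
from collections import defaultdict


def aggregate_with_suppression(visits, min_users=5):
    """Count distinct users per (area, month) and hide small cells.

    visits: iterable of (user_id, area_id, month) tuples.
    Cells with fewer than `min_users` distinct users are returned as None
    so that rare visit patterns are harder to trace back to individuals.
    The threshold is an illustrative choice, not a legal requirement.
    """
    users = defaultdict(set)
    for user, area, month in visits:
        users[(area, month)].add(user)
    return {
        cell: (len(members) if len(members) >= min_users else None)
        for cell, members in users.items()
    }
```

Publishing the rule and its threshold with the aggregated counts would let other researchers judge how much detail was lost, which addresses part of the reliability concern raised above.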
Fourth, and most importantly, research combining big data and traditional data sources is needed. Combining different data sources will provide a holistic understanding of second homes and multi-local living. Combining big data with surveys and interviews provides insights into answering what, why, and how questions. Mobile phone data can provide precise information about the volume and spatiotemporal pattern (when and where) of second home visits. Synthesizing this with insights from interviews and surveys puts mobility flows into context by revealing the reasons for visits. Social media data can reveal activities by second home users (what) and how they interact with the local community. Yet, combining social media data with qualitative information from interviews allows us to understand why activities and interactions take place. Another way to obtain a broader view on second home usage is by combining big data analytics with a custom smartphone application study with a small and targeted sample, made for specific research purposes (Molloy et al., 2020; Puura et al., 2022), enabling the collection of both quantitative and qualitative information.
The advantage of big data in providing new insights into second home tourism research is based on the long-term monitoring ability and timeliness for reacting quickly to disruptive events (e.g. the increasing use of second homes during the COVID-19 pandemic). Certainly, a big data approach does not replace the traditional methods involving surveys and interviews, but rather it complements and provides additional perspectives for understanding relevant themes in second home tourism research.