Evaluating events data for cultural analytics

The effects of the Covid-19 pandemic on the Creative and Cultural Industries can be difficult to quantify. Metadata about events (theatre productions, music and comedy gigs, sporting fixtures, days out, and more) are an untapped resource for cultural analytics that can be used as a proxy metric for financial and social impact. This article uses a sample of large-scale cultural events data from UK industry providers Data Thistle to ask: how can events data at scale be used to quantify the financial and social effects of the Covid-19 pandemic on the cultural events sector in a particular region? We analysed the changes in event provision in Edinburgh in August 2018, 2019, 2020 and 2021, revealing an estimated 97.3% fall in ticketing revenue between 2019 and 2020. Additionally, the effects that pandemic restrictions had on different categories of event reveal a disparity in how different audience sectors were affected, with ‘Visual Art’ and ‘Days Out’ showing most resilience and ‘Theatre’, ‘Comedy’ and ‘LGBT’ events being most reduced. Our findings indicate that events data are a rich but heterogenous source of information regarding the cultural and creative economy, which is not yet routinely used by researchers.


Introduction
The UK has world-leading Creative and Cultural Industries (CCI), defined by UNESCO as conducting 'activities whose principal purpose is production or reproduction, promotion, distribution or commercialization of goods, services and activities of a cultural, artistic or heritage-related nature' across the domains of advertising, architecture, books, gaming, music, movies, newspapers and magazines, performing arts, radio, TV, and the visual arts (Lhermitte, Perrin, and Blanc 2015, 11). As well as contributing to a multi-billion-pound economy, access to arts and culture is related to issues of social inequality, health and wellbeing, and public infrastructure (Crossick and Kaszynska 2016, 7-8).
This heterogenous value of culture, both financial and social, has understandably proven difficult to quantify. A major report commissioned by the Arts and Humanities Research Council published in 2016 concludes that '[a]ccounting for human experiences of art and culture calls for multi-criteria analyses and a range of approaches, in order to span the depth and the breadth of research' (Crossick and Kaszynska 2016, 10). A range of approaches has been implemented, ranging from financial analysis of individual sectors to quantify the economic value of CCI, to work that falls under the umbrella of cultural sociology, which attends to 'politics, religion, social movements, race, civic engagement, and a variety of other social processes - from a cultural perspective' (Alexander, Jacobs, and Smith 2017). This research by cultural sociologists works to tease out other forms of societal impact beyond the strictly financial.
The global Covid-19 pandemic has brought the importance of understanding the financial and social values of CCI into focus, for example, in determining the effectiveness of pandemic restrictions on the cultural workforce and how to better plan for future crises (Walmsley et al. 2022). The pandemic also caused changes to the cultural landscape that will have to be accounted for in future research, such as the reduction in prominence of revenue from live music (BBC News 2022) and the increased role of online activities and streamed performances (Bradbury et al. 2021, 4). These changes have only made it more important and more difficult to understand and measure the financial and social impacts of the pandemic on the CCI.
Before the pandemic, Hanquinet et al. espoused the value of 'the information held by commercial organisations about consumer behaviours, particularly around consumption practices' as 'offer[ing] much more detailed, fine-grained, information on the social world compared with surveys and face-to-face interviews' (Hanquinet et al. 2019, 199). Every year there are hundreds of thousands of events, festivals, concerts, plays and gigs, varying in scale from the very small and informal to the large and coordinated. The details of these events constitute a huge amount of data about events that have taken place (historical events) and events that are planned for the future (live events). Metadata about these events are an untapped resource that can be used as a proxy metric for financial and social impact.
This article addresses the usefulness of cultural events data in understanding the cultural and social value of CCI. To do this, we evaluate both an available source of events data (from the commercial providers Data Thistle) and large-scale data analysis and visualisation, using Python and pandas DataFrames, to ask: how can events data at scale be used to quantify the effect of the Covid-19 pandemic on the cultural events sector in a particular region? As we are assessing the suitability of a data source and a method, with the aim of providing guidance to other researchers working in this area, we take care to explain the decisions made when working with these data and their repercussions. After a detailed explanation of these data and how we approached them, there follow two examples of how cultural events data could be used. We calculate the estimated revenue difference between Edinburgh in August before and during the Covid-19 pandemic by creating an algorithm for estimating the revenue generated by ticket sales. We then compare the differing effects that pandemic restrictions had on different categories of event to go beyond financial impact and draw inferences about social consequences that would otherwise be imperceptible. We demonstrate that events data are a rich but heterogenous source of information regarding the cultural and creative economy and, although not yet routinely used by researchers, hold much future potential. Our findings augment the wealth of work that has been done to ascertain how the cultural sector in the UK was impacted by the pandemic, financially and otherwise. The key intervention here is to add another facet to the myriad approaches to quantifying and evaluating cultural value.

Determining cultural value
Determining cultural value, whether financial or otherwise, is not easy. Before the Covid-19 pandemic, estimates of the financial contribution that the Creative and Cultural Industries make to the UK's economy painted a healthy picture, with official figures showing that the Creative Industries contributed £111.7 billion to the UK in 2018 (Department for Digital, Culture, Media & Sport, 2020) and £115.9 billion in 2019 (Department for Digital, Culture, and Media & Sport 2021). Beyond Gross Value Added (GVA), the government has been seeking ways to quantify cultural value more broadly. The Department for Digital, Culture, Media & Sport commissioned a report on how to quantify the value of the culture and heritage sectors with the aim of 'develop[ing] a formal approach to value the benefits of culture and heritage assets to society… to create publicly available statistics and guidance that will allow for more accurate articulation of the value of services provided by culture and heritage' (Sagger, Philips, and Haque 2021, 5). From academia there have also been calls to better assess cultural value. A project commissioned by the Arts and Humanities Research Council, 'The AHRC Cultural Value Project' (Crossick and Kaszynska 2016), sought to 'identify the various components that make up cultural value' and 'consider and develop the methodologies and the evidence that might be used to evaluate these components of cultural value' (6). This call for a better understanding of the multiple values of culture also comes from the public body Arts Council England, which describes the reach of arts and culture into society, wellbeing, education and the economy, and calls for better methods to 'establish causality between arts and culture and the wider societal impacts' (Arts Council England 2014, 8). There is work underway focusing on specific methods for quantifying the social impact of arts and culture, such as using wellbeing data (Oman 2021), and the pandemic has made this work all the more pressing.

Consequences of the Covid-19 pandemic
We know that the pandemic had a huge effect on business revenue, especially on the arts, culture and heritage sectors, after the UK government announced a stay-at-home order and the closing of all non-essential businesses on 24 March 2020 (BBC News 2020a). There were various timetables for arts venues to reopen across the UK in 2020 and 2021, although these often involved Covid-19 mitigation measures like reduced capacities, which meant that venues were not necessarily able to generate enough revenue to be financially viable (BBC News 2020b; News Editor 2020; Telegraph Reporters 2021).
Globally, these measures have had a significant financial impact. The World Economic Forum has found that '[t]en million creative jobs have been lost because of the pandemic' (Bateman 2022). UNESCO found that most countries reported a fall in Gross Value Added over 2020 in the Cultural and Creative Industries, and that this fall was larger than the fall in their national economies overall, with the loss of revenue from CCI amounting to approximately 30-40% across different countries (Naylor et al. 2021, 12). In the UK, official GVA figures from DCMS for 2020 and 2021 are not yet available (Department for Digital, Culture, and Media & Sport 2022), although multiple reports have collated figures for specific aspects of the cultural sector.
A report collating figures from multiple government and industry sources estimates that in the creative, arts and entertainment activities sector, '[b]etween Q4 2019 (before the pandemic in the UK) and Q2 2021 (the most recent data to date), output declined by 37% in real terms in the sector in the UK', and that the music industry suffered a 46% fall in GVA contribution, and theatres a £630 million loss of income (Waitzman 2021). Further work on the economic effects of the Covid-19 pandemic has quantified this revenue loss for specific geographic regions (Chamberlain and Morris 2021), specific industries (Anayi et al. 2021; Clark 2022) and globally (Betzler et al. 2021). Focusing specifically on Edinburgh, the Edinburgh Festival Fringe Society estimate that the Edinburgh Festival Fringe brings between £200m and £1bn to Scotland and the UK each year, and that cancelling in-person shows at the Fringe in 2020 resulted in a revenue gap for the Fringe Society of £1.5m and for Fringe venues of £21m (Edinburgh Festival Fringe Society 2020).
The impact of the Covid-19 pandemic on the CCI extends beyond cancelled performances, refunded tickets and economic loss. As anthropologist Juliet Bedford writes: 'No epidemic is ever just a health issue in isolation, and Covid-19 has emphasised this on the global stage… We need to be looking at it in terms of an economic issue, a livelihood issue, a social issue and a political issue too' (Wellcome 2020). Reflections on the pandemic have revealed the effects it has had on the Cultural and Creative Industries themselves, like inequalities and precarities in the sector's working conditions (Comunian and England 2020; Walmsley et al. 2022) and how audiences' consumption of culture has changed, for example with the move to digital modes of engagement (Bradbury et al. 2021; Denk et al. 2022; Leung and Davies 2021; BBC News 2022).

Inequalities
Covid-related changes have not been experienced equally among all sectors of society. UK Research and Innovation (UKRI) has made efforts to focus research on the effects of the pandemic on sectors of society that may be impacted disproportionately, such as children, older people, people with disabilities, and LGBTQIA+ communities (Arts and Humanities Research Council 2021; Economic and Social Research Council 2021a; Economic and Social Research Council 2021b; Economic and Social Research Council & Innovate UK, 2021). Other research finds that '[w]omen are affected more than men by the social and economic effects of infectious-disease outbreaks' because of greater caring responsibilities, less secure employment contracts and risk of domestic violence (Wenham et al. 2020, 194). Existing work on the effect of the pandemic on children and young people focuses on direct consequences like bereavement (Hillis et al. 2021) but also on the repercussions of mitigation efforts like home schooling (Deeker 2022). Studying the consequences of children's reduced access to arts and culture during this time could help us to better understand the ways in which children and young people have specific vulnerabilities during times of international crises. It has also been recognised that LGBTQIA+ communities, as seen in this early call by the charity Stonewall for renewed attention to data collection, have specific vulnerabilities (Munir 2020). In 2018, a survey of LGBTQIA+ individuals in the UK found that the respondents were 'less satisfied with their life than the general UK population (rating satisfaction 6.5 on average out of 10 compared with 7.7)' and that '[t]rans respondents had particularly low scores (around 5.4 out of 10)' (Government Equalities Office, 2018). Inequalities in participation in arts and culture are not new. Orian Brook et al. discern 'deeply unequal patterns of consumption and production' across multiple axes of oppression, including gender, race and class (Brook et al. 2020, 53). Better data about event provision where it intersects with minoritised sectors of society would enable a better understanding of how inequalities in access to arts and culture are experienced, their effects, and suggestions for policy and action to remedy this disparity.

Cultural analytics with events data
The interdisciplinary approach of cultural analytics, defined by Lev Manovich as 'the use of computational and design methods for exploration and analysis of contemporary global culture at scale' (Manovich 2020, 35), positions the metadata generated from the Creative and Cultural Industries as a valuable data stream. In his endeavour to 'work toward a more inclusive and democratic understanding of the cultural present and also of cultural histories' (9), Manovich homes in on events as 'cultural happenings that have duration in time and involve multiple people' (75). Existing work in UK universities that aligns with Manovich's description of cultural analytics often falls into the category of cultural sociology or the sociology of culture, which attend to 'culture industries and cultural consumption' and 'the entirety of social life… from a cultural perspective' (Alexander, Jacobs, and Smith 2017), respectively (see Brook, O'Brien, and Taylor 2020; Gross and Pitts 2016; Hadley et al. 2019; Hanquinet, O'Brien, and Taylor 2019; McAndrew, O'Brien, and Taylor 2020). In their comparison of the uses of survey data and social transactional data like ticketing data, Hanquinet et al. promote the utility of 'transactional data being particularly useful for shedding light on activities that are hard to gain insight into from surveys' like the Taking Part survey of cultural participation in England or Active Lives (Hanquinet et al. 2019, 201, 214). One such source of transactional data is events listings.
The data for the analyses in this article have been provided by Data Thistle (Data Thistle, n.d.), an Edinburgh-based listings technology business that provides live events data and has been in operation since 1985 (previously known as The List). They collate 'What's On' data about cultural events (theatre shows, music performances, art exhibitions, sporting fixtures, club nights, author events, nature walks and more) from over 10,000 venues across the UK. They make the event name, category, venue, date, time, event description, website links, accompanying images, prices, addresses and geo-location data available to be queried on their public 'What's On' website as well as providing data to users such as destination marketing organisations, transport companies and hotels. Data Thistle currently provide access to live events data (that is, events scheduled in the future) via widgets, spreadsheets, calendars, bespoke JSON feeds and an API.
Data Thistle are also constantly accruing a long tail of historical events data, as they operate on the understanding that they are creating a cultural record of the UK for posterity and so retain data pertaining to past events. This rich picture of cultural provision for the UK remains mostly untapped and under-researched. A few projects have begun to utilise these data. One such project is the Festivals & Communities Map (Currie and Correa 2021; Currie and Correa 2022), which combines data on: Edinburgh Festivals' community outreach efforts; Edinburgh Festival Fringe venues; ticket sales across the Fringe, Edinburgh International Festival and Edinburgh Book Festival; bike routes; the Scottish Index of Multiple Deprivation; and year-round venues. The year-round venues data were supplied by Data Thistle. Plotting all of these geospatially allows questions about event provision and equality to be interrogated. Another project that has found success by combining data from different sources is the Festival Mobility project (Ryan-Saha 2020a; Ryan-Saha 2020b). The project was designed with the aim of investigating how to combine events and transport data to better understand festival-related traffic and congestion in Edinburgh during August. When the Covid-19 pandemic began, the project pivoted to investigating festival infrastructure in Edinburgh from the perspective of public health (Ali-Knight 2021). Each of these pieces of work builds on Data Thistle's data and enriches them by adding data from other sources to extend the questions that can be asked of the data.
Combining events data with data from other sources, such as which Covid-19 restrictions were in place, the provision of public transport, or the number of tickets sold to each of these performances, enables many more questions to be asked.This would require access to further open and commercially-licensed data, which is beyond the scope of this case study, but we flag up here that the richness of the What's On data may be best explored in collaboration with other datasets.

Materials: a cultural events dataset
We sourced live cultural events data under licensed agreement from Data Thistle, a UK-based commercial company (Data Thistle, n.d.). Data Thistle maintain comprehensive listings of cultural events in the UK (what's on, where and when), with each event containing, at minimum, a title and a location including latitude/longitude and postcode. Data Thistle provided us with a sample of events data: the 'Data Thistle Dataset' (DTD), delimited by postcodes within the Edinburgh and South-East Scotland City Region Deal¹ and taking place between November 2017 and May 2022. The DTD is approximately 10% of the whole UK dataset for that time period.
The DTD is comprised of nine JSON files totalling 153MB and contains machine processable text information about 38,700 events, 48,790 schedules and 349,855 performances at 2474 places. The sample does not contain any online events. Each event in the dataset is allocated to 1 of 15 discrete genre categories ('Film', 'Music', etc.) and events can also have tags conveying further information. The dataset contains 2175 unique tags used a total of 97,532 times. The distinctions between events, schedules and performances, and categories and tags are described below. The ontology of the data (how categories, tags and other values are defined and implemented, and whether they are used consistently) affects how we can query the data and the related findings.

Data complexities
Before asking specific questions of the Data Thistle Dataset, our first step was to establish how many events took place where, and what type of event they were. This brought with it complexities relating to two characteristics of the data taxonomy, which needed to be understood before conducting any analysis: 1) the difference between events, schedules and performances, and 2) the difference between categories and tags.

Events, schedules and performances
In the DTD:
• An event refers to the characteristics of a potential event. An event also needs a schedule and performance for it to be realised. For example, a concert by the band Biffy Clyro is an event.
• A schedule is the range of instances of an event at a particular place. For example, Biffy Clyro at Murrayfield Stadium on 3, 4 and 5 March 2023, and at Hampden Park on 6 and 7 March 2023 are two schedules.
• A performance is one instance of an event at a particular time and place. For example, Biffy Clyro at Murrayfield Stadium on 5 March 2023 at 8 pm is one performance.
Some implications of events versus schedules versus performances are:
• Working at the event level allows an overview of events without the specifics of time, place, ticket prices or performance-specific properties, such as whether a performance provides British Sign Language interpreting or if a film is screened in 3D.
• Working at the schedule level allows for time spans and places, but not ticket prices or performance-specific properties.
• Working at the performance level includes details of time, place, ticket prices and performance-specific properties. Information that may be pertinent to a particular research question is located in performance-specific properties, for example, whether an event is cancelled or sold out.
As working with the data at event, schedule or performance level will generate different results, this choice will be determined by the research question. For example, for an analysis of event descriptions using Natural Language Processing, the event level may be sufficient. Due to the complexities of this dataset, researchers working with it must have a clear understanding of the decisions that are required to generate robust findings.

Categories and tags
Another aspect of the data required for analysis is the difference between categories and tags. Each event is assigned to one of 15 categories within the Data Thistle ontology: 'Film', 'Music', 'Days Out', 'Theatre', 'Kids', 'Comedy', 'LGBT' (Lesbian, Gay, Bisexual and Transgender²), 'Clubs', 'Visual Art', 'Dance', 'Books', 'Sport', 'Food & Drink', 'Talks & Lectures' and 'Workshops'. There is no data definition of these categories, but their scope can be inferred from the events in each category as described in Data Thistle's API documentation.³ Events are assigned only one category from a controlled vocabulary of the 15 categories listed above. These are chosen and inputted by event creators (event organisers, venues or promoters) and, when the data are imported, the given categories either match one of Data Thistle's categories or are mapped onto Data Thistle's ontology algorithmically using the closest match.
Events can also be assigned multiple tags and, rather than using a controlled vocabulary, these constitute a crowd-sourced list of descriptors generated by the event creators. There is no real limit on the number of tags an event can have as Data Thistle do not place any restrictions on this free text field. There are 2175 unique tags in the dataset and they function to add extra or more nuanced information about events, so an event categorised as 'Music' may also be tagged as 'Folk' to provide more detail about the genre of music. Tags become important when searching for all events relevant to a specific query. For example, an event that borders two categories, such as a musical theatre performance, may be categorised as 'Music' and tagged as 'Theatre' or vice versa. Tags are especially important for the categories of 'Kids' and 'LGBT' since these are not strictly genres of event but pertain to the audience for an event: an event aimed at children or LGBTQIA+ communities will also fall into another category, such as a kids' nature walk in 'Days Out', or a drag show in 'Theatre'. Formulating search queries to extract a sample of the DTD in order to answer specific research questions therefore requires an understanding of the difference between, and interplay of, categories and tags.
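The interplay of categories and tags can be illustrated with a minimal sketch. The field names `category` and `tags`, the helper `matching_events` and the sample records are assumptions made for illustration only; they are not taken from the Data Thistle schema.

```python
def matching_events(events, term):
    """Return events whose category or any tag equals `term` (case-insensitive).

    `category` and `tags` are hypothetical field names used for illustration.
    """
    wanted = term.casefold()
    return [
        e for e in events
        if e["category"].casefold() == wanted
        or wanted in {t.casefold() for t in e.get("tags", [])}
    ]


# Invented sample records mirroring the examples in the text.
events = [
    {"name": "Musical theatre show", "category": "Music", "tags": ["Theatre"]},
    {"name": "Kids' nature walk", "category": "Days Out", "tags": ["Kids"]},
    {"name": "Drag show", "category": "Theatre", "tags": ["LGBT"]},
    {"name": "Folk night", "category": "Music", "tags": ["Folk"]},
]

# Searching on category alone misses the musical theatre show.
by_category_only = [e for e in events if e["category"] == "Theatre"]
theatre = matching_events(events, "Theatre")
```

In this toy sample, a category-only query for 'Theatre' returns one event, whereas the combined category-or-tag query returns two, which is the kind of difference a researcher would need to account for when sampling the DTD.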

Method: notebooks
With our initial explorations of the DTD structure in mind, we ingested the nine JSON files that comprise the DTD using pandas DataFrames into a Python 3.9 environment, created entity diagrams to understand the structure of the dataset and its main features (see Appendices A & B), and created two Jupyter Notebooks: one to generate DataFrames from the JSON files and one to enable a variety of analyses to be conducted on the DTD or subsets of it delimited by city, month and category of event.

Notebook one: generate_list_dataframes
The Generate_List_Dataframes⁴ notebook automatically ingests JSON files and processes them so that the information can be analysed more easily. It merges events that appear in more than one file, removes duplication, and 'explodes' nested lists of dictionaries to create several DataFrames with the information 'separated' at different levels (events, schedules, performances, tickets, places, etc.), which it saves to disk for use with the Case_Study_Maker notebook. This notebook also includes preparatory work to enable the estimation of revenue at the schedule level, which is discussed in more detail in section 3.4.
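The de-duplication and 'exploding' described here might look something like the following sketch. It is not the notebook's code: the record layout, the field names (`event_id`, `schedules`, `performances`, `ts`) and the helper `explode_dicts` are assumptions, and the sample avoids empty nested lists, which would need extra handling.

```python
import pandas as pd

# Invented records: event e1 appears twice, as if present in two JSON files.
raw_events = [
    {"event_id": "e1", "name": "Example concert", "category": "Music",
     "schedules": [
         {"schedule_id": "s1", "place_id": "p1",
          "performances": [{"ts": "2019-08-03T20:00"}, {"ts": "2019-08-04T20:00"}]},
         {"schedule_id": "s2", "place_id": "p2",
          "performances": [{"ts": "2019-08-06T19:30"}]},
     ]},
    {"event_id": "e1", "name": "Example concert", "category": "Music",
     "schedules": [
         {"schedule_id": "s1", "place_id": "p1",
          "performances": [{"ts": "2019-08-03T20:00"}, {"ts": "2019-08-04T20:00"}]},
         {"schedule_id": "s2", "place_id": "p2",
          "performances": [{"ts": "2019-08-06T19:30"}]},
     ]},
    {"event_id": "e2", "name": "Example exhibition", "category": "Visual Art",
     "schedules": [
         {"schedule_id": "s3", "place_id": "p3",
          "performances": [{"ts": "2019-08-01T10:00"}]},
     ]},
]


def explode_dicts(df, column):
    """Give each dict in a list-valued column its own row, then spread its keys into columns."""
    exploded = df.explode(column).reset_index(drop=True)
    expanded = pd.json_normalize(exploded[column].tolist())
    return pd.concat([exploded.drop(columns=column), expanded], axis=1)


# One DataFrame per level: events, schedules, performances.
events = pd.DataFrame(raw_events).drop_duplicates(subset="event_id")
schedules = explode_dicts(events, "schedules")
performances = explode_dicts(schedules, "performances")
```

Keeping one DataFrame per level means an analysis can choose the event, schedule or performance level explicitly rather than inheriting duplicated rows from a single flattened table.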

Notebook two: case_study_maker
In the Case_Study_Maker⁵ notebook we developed a new method to perform custom analyses at run-time. It automatically loads and ingests the DataFrames generated in the previous notebook and generates different types of analyses (frequency, histograms, Gantt charts, maps, etc.) using the Plotly open-source interactive graphing library for Python at different levels (events, schedules, performances, tickets). This notebook requires three parameters from users: 1) city (e.g. 'Edinburgh'); 2) list of categories to analyse (e.g. 'Music', 'Visual Art', 'Film', 'Books'); 3) month to create a more focused analysis over the years (e.g. 'August'). It enables the generation of consistent findings from the DTD and comparisons between subsets of the DTD.
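A minimal sketch of how the three parameters could delimit a subset is given below. The function `select_case_study`, the column names `city`, `category` and `start`, and the choice to key a schedule on its start date are all assumptions for illustration rather than the notebook's implementation.

```python
import pandas as pd


def select_case_study(schedules, city, categories, month):
    """Return schedules in `city`, in any of `categories`, that start in `month`.

    Assumes hypothetical columns `city`, `category` and `start` (ISO date strings).
    """
    starts = pd.to_datetime(schedules["start"])
    mask = (
        (schedules["city"] == city)
        & schedules["category"].isin(categories)
        & (starts.dt.month_name() == month)
    )
    # Add the year so that the same month can be compared across years.
    return schedules.assign(year=starts.dt.year)[mask]


# Invented sample data.
schedules = pd.DataFrame({
    "city": ["Edinburgh", "Edinburgh", "Glasgow", "Edinburgh"],
    "category": ["Music", "Film", "Music", "Music"],
    "start": ["2019-08-03", "2020-08-10", "2019-08-05", "2019-07-30"],
})

subset = select_case_study(schedules, "Edinburgh", ["Music", "Film"], "August")
per_year = subset.groupby(["year", "category"]).size()
```

Fixing the parameters in one function is one way to keep comparisons between subsets consistent, since every analysis then starts from the same selection rule.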
These notebooks allowed us to generate the findings presented here, but could also be used in future research projects on any datasets supplied by Data Thistle.

Method: revenue calculation
To estimate revenue at the schedule level, we first had to establish a method. After workshopping an approach with Data Thistle, we applied the following steps (these are calculated automatically by running the Generate_List_Dataframes notebook). For all schedules, we retained the performances that we know have not been cancelled (this is an attribute of a performance's properties, and some performances have that information available). For each of these performances, we obtained their ticket types (e.g. Standard, Concession, etc.). Each of those ticket types can have several prices (max_price and min_price). So, first we calculate the ticket price per performance (steps 1a and 1b) to obtain the revenue of each performance ticket. This involves:
• Step 1a: obtaining the 'list of performances ticket type price'. For each ticket type price, we use the max_price value (which is a ticket's attribute, as we can see in the Events E-R Diagram shown in Appendix A) if that value is greater than zero. If max_price is equal to zero, we use the min_price value (which is another attribute of tickets). This list has as many elements as ticket types per performance.
• Step 1b: obtaining the 'ticket price per performance' by calculating the trimmed mean of the list of performances ticket type prices (from step 1a). We trim the 10% extreme scores (lowest and highest).
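Steps 1a and 1b can be sketched as follows. This is an illustration under stated assumptions, not the notebook's code: the function names are hypothetical, max_price is used only when it is greater than zero, and the 10% trim is read as removing 10% of values from each end of the sorted list.

```python
from statistics import mean


def ticket_type_price(max_price, min_price):
    """Step 1a: use max_price when it is greater than zero, otherwise min_price."""
    return max_price if max_price > 0 else min_price


def trimmed_mean(values, proportion=0.1):
    """Mean after dropping `proportion` of the values from each end (assumed reading).

    With fewer than ten values nothing is dropped, so this reduces to a plain mean.
    `values` must be non-empty.
    """
    ordered = sorted(values)
    k = int(len(ordered) * proportion)
    return mean(ordered[k:len(ordered) - k])


def ticket_price_per_performance(tickets):
    """Step 1b: trimmed mean over the per-ticket-type prices from step 1a."""
    prices = [ticket_type_price(t["max_price"], t["min_price"]) for t in tickets]
    return trimmed_mean(prices)


# Invented example: a Standard ticket and a Concession with no max_price recorded.
tickets = [
    {"type": "Standard", "max_price": 12, "min_price": 8},
    {"type": "Concession", "max_price": 0, "min_price": 6},
]
price = ticket_price_per_performance(tickets)
```

Because most performances have only a handful of ticket types, the trim will often remove nothing under this reading; it matters mainly for performances with ten or more ticket types.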
Once we have the ticket price per performance (with the revenue of each performance ticket), we calculate the 'performance revenue' (step 2), which is all the revenue obtained by each performance.
• Step 2: If a performance is sold out, then the performance revenue is equal to place capacity (where the performance is being scheduled) multiplied by the ticket price per performance (calculated in step 1b). The place capacity is also an attribute of a place's properties, as we can see in the Places E-R Diagram (see Appendix B).
• If the performance is not sold out, the performance revenue is equal to place capacity multiplied by the performance ticket price (calculated in step 1b) and multiplied by a capacity factor.
For the purposes of these estimations we have set the capacity factor to 0.7, that is, we have made the assumption that if a performance is not sold out, it will sell 70% of the maximum capacity of the venue. However, we have made adjustments for discrepancies potentially arising from high-capacity venues like sports stadia. If performances are not categorised as 'Music' but they are in places with high capacity (>20,000) then the capacity factor is set to 0.2. Also, for performances categorised as 'Sport' with high capacity (>20,000), their capacity factor is set to 0.4. We highlight below that these assumptions will be variable depending on the dataset, and other applications of this method should make decisions regarding these variables accordingly.
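Step 2 and the capacity-factor rules can be sketched as below. The function names are hypothetical, and one point is an assumption on our reading of the rules: because 'Sport' is also 'not Music', the sketch gives the 'Sport' rule (0.4) precedence over the general non-'Music' rule (0.2) for high-capacity places.

```python
HIGH_CAPACITY = 20_000  # threshold for high-capacity places, as stated in the text


def capacity_factor(category, capacity, sold_out):
    """Share of the place capacity assumed to be sold for one performance."""
    if sold_out:
        return 1.0
    if capacity > HIGH_CAPACITY:
        if category == "Sport":   # assumed to take precedence over the next rule
            return 0.4
        if category != "Music":
            return 0.2
    return 0.7                    # default for performances that are not sold out


def performance_revenue(ticket_price, capacity, category, sold_out=False):
    """Step 2: place capacity x ticket price per performance x capacity factor."""
    return capacity * ticket_price * capacity_factor(category, capacity, sold_out)


# Invented examples: a 100-seat theatre and a 50,000-capacity stadium.
theatre_sold_out = performance_revenue(10, 100, "Theatre", sold_out=True)
stadium_sport = performance_revenue(30, 50_000, "Sport")
```

Keeping the factor in its own function makes it straightforward to substitute different assumptions for other datasets, as the text recommends.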
The final step is to calculate the schedule revenue (step 3). For each schedule we added the performance revenues (calculated in step 2). In this step we removed the performances designated as cancelled. See Figure 1 for an algorithm combining the steps described above for estimating the revenue from each schedule.
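Step 3 amounts to a filtered group-and-sum, which can be sketched with pandas as follows. The column names `schedule_id`, `performance_revenue` and `cancelled` and the sample values are assumptions for illustration.

```python
import pandas as pd

# Invented performance-level rows with revenues already estimated in step 2.
performances = pd.DataFrame({
    "schedule_id": ["s1", "s1", "s1", "s2"],
    "performance_revenue": [700.0, 1000.0, 700.0, 350.0],
    "cancelled": [False, False, True, False],
})

# Drop performances designated as cancelled, then sum revenue per schedule.
schedule_revenue = (
    performances[~performances["cancelled"]]
    .groupby("schedule_id")["performance_revenue"]
    .sum()
)
```

In this toy sample, schedule s1 sums only its two non-cancelled performances, so the cancelled one contributes nothing to the estimate.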
This method was used in the following section to estimate the revenue generated from cultural events in Edinburgh in August 2018, 2019, 2020 and 2021.

Estimating revenue lost due to covid-19 pandemic: findings and discussion
In this section we investigate an approach to quantify the financial impact of the Covid-19 pandemic on cultural revenue using the method described in the previous section, which uses ticket prices, venue capacities and an assumption about the percentage of available tickets that are sold. This method is tested on data for Edinburgh during August 2018, 2019, 2020 and 2021, the city's busiest month for events. We arrive at some estimated figures for the revenue generated from ticket sales in those months. As this is exploratory work, we work through in detail the steps involved in this method, the assumptions it relies upon, any issues arising from the data, and the work that would need to be undertaken on the DTD to achieve more accurate results.

Revenue findings
Using the method described in section 3.4, we estimated and visualised the revenue generated from each category of event for Edinburgh in August 2018, 2019, 2020 and 2021 in the Data Thistle Dataset. By just looking at these schedules, we can estimate the financial impact that the pandemic has had on Edinburgh's event revenue. We chose August as a comparison point as the majority of Edinburgh Festivals happen that month (Edinburgh International Festival, Edinburgh Fringe, Edinburgh Art Festival, Edinburgh International Book Festival, and the Royal Edinburgh Military Tattoo) as well as other events. Figure 2 shows the estimated revenue generated from ticket sales of in-person schedules in Edinburgh in August 2018, 2019, 2020 and 2021 per category, and the combined figures are given in Table 1.
These figures show an expected drop-off from approx. £18.3m in 2019, before the Covid-19 pandemic, to approx. £0.5m in 2020, when pandemic restrictions were in full effect, and a partial resurgence to approx. £6.2m in 2021, when some of the restrictions were lifted. However, there is an unexpected difference between the 2018 and 2019 revenues of approx. £2.9m. This difference does not reflect any real-world circumstance; rather, several key issues with the data are affecting these results. In the following paragraphs, we enumerate these issues and suggest how to mitigate them in order to generate more reliable figures.
Table 1. Estimated revenue from schedules in Edinburgh in August (£).
August 2018: […]200,040
August 2019: 18,263,400
August 2020: 492,512
August 2021: 6,216,536
Change 2019 to 2020: −17,770,888 (−97.3%)
Change 2019 to 2021: −12,046,864 (−66.0%)
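The percentage changes relative to August 2019 follow directly from the combined revenue estimates reported above. As a short worked check (the helper name `percent_change` is ours):

```python
def percent_change(baseline, value):
    """Change from `baseline` to `value`, as a percentage of `baseline`."""
    return (value - baseline) / baseline * 100


# Combined estimated August revenues (in pounds) as reported in the text.
revenue = {2019: 18_263_400, 2020: 492_512, 2021: 6_216_536}

change_2020 = round(percent_change(revenue[2019], revenue[2020]), 1)  # -97.3
change_2021 = round(percent_change(revenue[2019], revenue[2021]), 1)  # -66.0
```

Both values use 2019 as the baseline, so the 2021 figure describes how far revenue remained below the pre-pandemic level rather than how much it recovered from 2020.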

Assumptions
Beyond assuming that ticket availability can act as a proxy for cultural presence and consumption, the largest assumption in our method is the capacity factor. We have chosen a capacity factor of 0.7 because theatre industry figures suggest that ticket sales of around 50-80% of occupancy are required to break even. This is based on claims that '[m]ost major [West End] shows need around 70 to 80 per cent of seats filled merely to break even' (Telegraph Reporters 2021), '50% to 70% occupancy is typically needed to break even' (UK Theatre & Society of London Theatre 2020), and '[t]he ability of venues to absorb the constraints varies widely, but the Society of London Theatre (Solt), an industry body, has said social distancing measures mean auditoriums will have only between 15 and 30 per cent of seats available to sell, far below the 65 per cent or more capacity required by most productions to break even' (Pickford 2020). We acknowledge that not every performance will break even and that there are other factors in play; various events are subsidised with Arts Council funding, for example. In addition, not every artform will resemble the financial model of theatres. The Independent Cinema Office estimates that 'average rates for a 2, 3 or more screen venue are more likely to be in the 15-20% range' (Independent Cinema Office, n.d.). Furthermore, it is likely that smaller venues with very small operating margins will need to sell a much higher percentage of tickets to cover costs, even as much as 100%.
We also acknowledge that the capacity factor will likely be affected by many other circumstances. A more nuanced estimation may be achieved by choosing different capacity factors for, say, a football cup final and a smaller local event, by distinguishing between a nationally recognised performer at the Edinburgh Festival Fringe and a new comedian in their debut year, or by distinguishing between a film screening on a Tuesday morning on a hot summer's day and a screening of the newest film in the James Bond franchise on its opening weekend. In all of these instances, combining the Data Thistle Dataset with further data would enable more accurate revenue modelling. For example, the event categories could be used to infer the distribution of child, family, senior concession or other types of tickets likely to be sold, or data about the types of events that often sell out could be used to refine the expected percentage of total tickets sold.
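To illustrate how much the estimates depend on this choice, the sketch below rescales the reported 2019 and 2020 totals to other capacity factors. It assumes that a single factor is applied uniformly to every schedule, in which case the estimate is linear in the factor; the helper names are hypothetical.

```python
BASELINE_FACTOR = 0.7  # capacity factor used for the reported estimates


def rescale(estimate, new_factor, old_factor=BASELINE_FACTOR):
    """Rescale an estimate made with old_factor to another uniform factor."""
    return estimate * new_factor / old_factor


def percent_change(baseline, later):
    """Change from baseline to later, as a percentage of the baseline."""
    return 100 * (later - baseline) / baseline


# Reported estimates in GBP, computed with a capacity factor of 0.7.
estimates = {2019: 18_263_400, 2020: 492_512}

for factor in (0.5, 0.7, 0.8):
    revenue_2019 = rescale(estimates[2019], factor)
    revenue_2020 = rescale(estimates[2020], factor)
    change = percent_change(revenue_2019, revenue_2020)
    print(f"factor {factor}: 2019 estimate GBP {revenue_2019:,.0f}, "
          f"change to 2020 {change:.1f}%")
```

Under this assumption the absolute figures move in proportion to the factor, while the relative fall between years does not change, since the factor cancels out. Only factors that differ by category, venue or year would alter the percentage change.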

Limitations

Capacity values.
At present, the DTD only has capacity values for 120 performance places (out of 2474). Of the 1346 places in Edinburgh, only 82 have capacity values and therefore only a small subset of events at venues in the DTD appear in the above revenue calculation. This omission is made clear in 'Film' schedules, which we would expect to contribute a significant portion of the total revenue. However, the dataset does not include capacity values for the large cinema chains, either for the entire venue or for each individual screen, and therefore most of the revenue generated by 'Film' schedules does not appear in these figures. Obtaining the venue capacities for more venues would increase the relevant data in the DTD.
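The scale of this gap can be expressed as a coverage percentage. The sketch below computes it from the counts given above and shows one way of separating places with and without a capacity value; the `capacity` and `name` keys are hypothetical and not taken from the DTD schema.

```python
def coverage_percent(with_capacity, total):
    """Share of places that have a capacity value, as a percentage."""
    return 100 * with_capacity / total


print(f"All places: {coverage_percent(120, 2474):.1f}%")
print(f"Edinburgh places: {coverage_percent(82, 1346):.1f}%")


def split_by_capacity(places):
    """Separate place records into those with and without a capacity value."""
    known = [p for p in places if p.get("capacity") is not None]
    missing = [p for p in places if p.get("capacity") is None]
    return known, missing


# Invented records, for illustration only.
places = [
    {"name": "Place A", "capacity": 500},
    {"name": "Place B", "capacity": None},
    {"name": "Place C"},
]
known, missing = split_by_capacity(places)
print(len(known), "with capacity;", len(missing), "without")
```

Roughly 5% of all places and 6% of Edinburgh places have a capacity value, so listing the excluded places explicitly is a useful first step towards deciding which capacities to collect.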

Capacity values and performance spaces.
The capacity values present in the DTD relate to the capacity of the whole venue, not to each performance space, of which there may be several. The DTD does contain some data about which performance space a performance is located in. For example, the performance place Summerhall has a capacity value of 500 in the DTD but the venue comprises multiple smaller spaces, such as 'The Dissection Room', 'Red Lecture Theatre' and 'Basement', some of which are temporary structures erected during August for the Edinburgh festivals. Of the 38,704 events in the DTD, 923 take place at Summerhall, and only 80 of these have an associated performance space. Moreover, the maximum capacities of venues changed along with changing Covid-19 pandemic guidelines and this is not accounted for in the DTD. Obtaining the capacities for each performance space at each of the venues in the dataset, as well as tracking the changes over time due to Covid-19 distancing restrictions, would enable us to make a more accurate calculation.

Online performances.
There are also online events, which are not represented in the DTD because they have no physical location associated with them and this dataset was generated by selecting events in a specific geographical area. The Edinburgh Festival Fringe reports that 300 online events took place in 2020, accruing £250,000 in ticket sales (Edinburgh Festival Fringe 2020); these are not represented in our calculations. A more accurate estimation of revenue would include online events, which raises the question of how to associate online events with a geographical location.

'Free' events.
There are a significant number of 'Free' or 'Pay What You Want' events in the dataset and, as these do not have fixed ticket prices, they are not included in the revenue calculation. The following histogram (Figure 3) shows the number of free tickets available for performances in Edinburgh across 2018-2021, and Table 2 gives the number of free tickets in August 2018, 2019, 2020 and 2021. As some performances have multiple ticket types (for example adult = £0, child = £0, family = £0 and student = £0), all of which are represented in these figures, the number of free ticket types is larger than the number of performances with free tickets. Free tickets have to be taken into consideration when estimating event revenue, especially during August.
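The distinction between free ticket types and performances with free tickets can be made concrete with a small sketch. The records below are invented and the `tickets` mapping is a hypothetical structure, not the DTD's own representation of prices.

```python
# Invented performance records: each maps ticket types to prices in GBP.
performances = [
    {"id": "p1", "tickets": {"adult": 0.0, "child": 0.0, "family": 0.0, "student": 0.0}},
    {"id": "p2", "tickets": {"adult": 12.0, "child": 0.0}},
    {"id": "p3", "tickets": {"adult": 10.0}},
]

# Every zero-priced ticket type is counted separately.
free_ticket_types = sum(
    1
    for performance in performances
    for price in performance["tickets"].values()
    if price == 0
)

# Each performance is counted once if it offers at least one free ticket type.
performances_with_free_tickets = sum(
    1
    for performance in performances
    if any(price == 0 for price in performance["tickets"].values())
)

print(free_ticket_types, performances_with_free_tickets)
```

In this toy example five free ticket types belong to only two performances, which is why counts of free tickets cannot be read directly as counts of free performances.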

Financial effect of pandemic on Edinburgh festivals
Again, it must be stated that these figures should be treated with an abundance of caution, as our calculation rests on multiple assumptions and we are aware that we do not have all of the requisite venue capacities. Our estimation shows a loss of over £17 million for the sector in Edinburgh from 2019 to 2020, a figure that is not too far away from the revenue gap of £21 million reported by the Edinburgh Festival Fringe Society (Edinburgh Festival Fringe Society 2020). We should also stress that many of the venues and employees would have been supported by UK Government Covid-19 support grants and furlough schemes (Waitzman 2021), but that there were also many individuals, small traders, and freelancers who would not have been eligible for these schemes. The cultural sector across the UK did receive financial support in the form of a £1.57 billion recovery fund (Department for Digital, Culture, Media & Sport, & HM Treasury, 2020), which included the Coronavirus Job Retention Scheme and the Self-Employment Income Support Scheme (Waitzman 2021). The furlough scheme in particular has been interrogated elsewhere regarding its successes and failures (Brook, O'Brien, and Taylor 2022).
The pandemic has altered how many industries expect to operate, making planning for events and audience attendance more difficult. Will Page, previously the Chief Economist of Spotify and PRS for Music, compiled statistics from PRS for Music, the Office for National Statistics, HSBC Economics, the Entertainment Retailers Association and the British Phonographic Industry to chart live and recorded music sales between 2019 and 2021. Page's overview of the music industry shows a drastically changed, and still evolving, picture. He emphasises the importance of robust data analysis for 'evidence-based policy making, not policy-based evidence making' as the basis for 'policymakers and industry… to figure out what assistance and actions are required to get us back to where we once belonged' (Page 2022). Data work like Page's, and like that made possible by Data Thistle's data, has far-reaching possibilities for industry and beyond.
The pandemic did not just affect ticket sales:
'Cultural and creative sectors are important in their own right in terms of their economic footprint and employment. They also spur innovation across the economy, as well as contribute to numerous other channels for positive social impact (well-being and health, education, inclusion, urban regeneration, etc.).' (Organisation for Economic Co-operation and Development 2020)
The economic impact of cultural events is not limited to ticket buying: it extends to transport, food and drink, accommodation and spending on other local amenities. Future work could see events data being used as the source for economists to calculate the wider financial benefits that events have to a city such as Edinburgh.

Provision findings
We know that the Covid-19 pandemic drastically affected the landscape of cultural events. After lockdowns were announced on 24 March 2020 (BBC News 2020a), events were cancelled or postponed in their thousands. After the easing of restrictions, we would expect some industries, such as outdoor activities, to be better placed to return to existing levels of event provision. Using the Data Thistle Dataset, we investigated which types of events recovered more quickly and, from this, inferred which audiences have been impacted to a greater or lesser extent by the change in event provision.
Figure 4 shows the total number of performances (rather than events or schedules) in Edinburgh in August for the years 2018, 2019, 2020 and 2021. August was chosen as it is the city's busiest month for events. The bars are coloured according to category. For example, in August 2018 and 2019, before the Covid-19 pandemic, the dominant categories are 'Comedy' and 'Theatre'. In 2020 and 2021 both the number and the categories of performances that took place changed drastically. As expected, the total number of in-person performances falls sharply from approx. 70,000 in 2019 to approx. 1500 in 2020 and increases again to approx. 10,000 in 2021, which is still far below the pre-pandemic level of event provision (see Table 3 for exact figures).
Looking at the change in performance provision in Edinburgh from pre-pandemic (August 2019) to the easing of restrictions (August 2021), not all categories recovered to the same extent (see Table 4). This table shows the number of performances in each of eight selected categories as well as a total for all categories. 'Visual Art' was the most resilient category of event, retaining 39.74% of pre-pandemic performances in 2021. 'Days Out' also managed to retain 29.62% of its pre-pandemic provision. Two categories that can be directly linked to specific demographics, and can therefore be used to indicate the relative impact on those demographics, are 'Kids' and 'LGBT'. In 2021, event provision for children was only 13.37% of that in 2019. But the category with the biggest difference was 'LGBT', with 2021 event provision for LGBTQIA+ communities at only 0.77% of that in 2019.
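The retention percentages quoted above are the 2021 performance count for a category expressed as a share of its 2019 count. A minimal sketch of that calculation follows; the category names and counts are placeholders chosen for illustration and are not the values in Table 4.

```python
def retention_percent(before, after):
    """Performances retained, as a percentage of the earlier count."""
    if before == 0:
        return None  # retention is undefined without a baseline
    return 100 * after / before


# Placeholder counts (2019, 2021); NOT the figures from Table 4.
counts = {
    "Category A": (1000, 250),
    "Category B": (400, 4),
}

retention = {
    category: retention_percent(before, after)
    for category, (before, after) in counts.items()
}

# List the categories from most to least resilient.
for category, percent in sorted(retention.items(), key=lambda item: item[1], reverse=True):
    print(f"{category}: {percent:.2f}% of 2019 provision retained")
```

Categories with very small 2019 counts give volatile percentages, so the raw counts are worth reporting alongside the retention figure.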

Provision discussion
For industry, the disparity in event provision during (and potentially after) the pandemic offers an understanding of which types of events are better able to weather public health crises. There are also sociocultural implications regarding how sectors of society, such as LGBTQIA+ communities and children, have been affected disparately.

Effect of Covid-19 on LGBTQIA+ communities
Research on the impact of the pandemic on LGBTQIA+ communities indicates that they have been negatively affected by the pandemic in specific ways (Haworth 2021), especially when taking into account intersections with other minoritised identities (LGBT Foundation 2020). This disparity has been directly linked to the inability of LGBTQIA+ individuals to access safe spaces and events, as described in the NatCen report 'The experiences of UK LGBT+ communities during the COVID-19 pandemic':
'The evidence identified by this review suggests that the COVID-19 pandemic has had a negative impact on the mental health of LGBT+ people living in the UK. This includes evidence of increased anxiety and depression, attributed to feelings of isolation and loneliness through the loss of safe, supportive, and identity-affirming peer-groups, communities and spaces.' (Hudson et al. 2021, 2)
Billy Haworth of the Humanitarian and Conflict Response Institute notes in particular the importance of cultural events to mental health and wellbeing, where the inaccessibility of cultural events and spaces during the pandemic has led to a situation in which 'disruptions to LGBTIQ+ spaces have been acutely felt, including nightlife, but also community spaces, support groups and activities like Pride festivals. These spaces usually provide many LGBTIQ+ people with essential opportunities to freely and safely express themselves and their identities' (2021).
It should be noted that worries over declining numbers of LGBTQIA+ event and community spaces, and a commensurate effect on the mental health of LGBTQIA+ youths especially, have been prevalent since before the Covid-19 pandemic (Marshall 2021; Walters 2015). The literature on this subject cites a lack of robust empirical evidence (Hudson et al. 2021, 2); for example, McGowan et al.'s overview of research on the impact of the pandemic on the health and wellbeing of LGBTQIA+ communities finds a lack of high-quality research that attends to social and structural factors as well as a lack of sexual orientation and gender identity data (McGowan et al. 2021, 1). Given the importance of cultural events and spaces to the health and wellbeing of LGBTQIA+ communities, and the unique circumstances of the pandemic making those events and spaces inaccessible, the use of cultural events data, like those available through Data Thistle, provides an additional angle of insight into the effects of the pandemic. Cultural events metadata provide one more facet of evidence to combine with existing sociological methods of evidence gathering.

Limitations
However, there are limitations to what the DTD can tell us.

Online events.
The DTD does not contain data about online events, which have in some ways ameliorated the lack of access to in-person events, as 'moving activities online presented opportunities to connect with new and diverse audiences, in some ways improving accessibility' (Haworth 2021). Lo Marshall describes how '[b]y removing geographical barriers…online platforms have shown how LGBTQ+ spaces can reach new audiences and connect with communities in difficult circumstances' (2021). It should be noted that data about online events are available (Data Thistle separately collected a comprehensive set of online events during the Covid-19 pandemic) and future work focused on quantifying event provision for LGBTQIA+ or other specific communities would benefit from including them.

Intersectionality.
Marshall goes on to highlight the ways in which online spaces can in some cases better cater to intersectional identities with 'intimate digital gatherings for people who live under multiple identities' (2021). Attending to intersections of identity is important in all social research, and particular care must be taken not to treat LGBTQIA+ individuals as a homogenous community, as 'there is no single LGBTIQ+ experience' but '[d]iversity of COVID-19 experiences reflects diversity within LGBTIQ+ populations, including diversity between different subcategories, but also intersections with other factors, like age, class, disability or race' (Haworth 2021), with transgender people facing disproportionate challenges (Haworth 2021; LGBT Foundation 2020).

Data taxonomies.
A limitation arising directly from the DTD is the way in which events are categorised and tagged. For example, at the event level, the DTD contains 75 events with category 'LGBT' and an additional 267 events tagged as 'LGBT' and assigned to other categories. This means that a search for events categorised as 'LGBT' will only return 21.93% (75 out of 342) of potentially relevant events. It should be noted that the 'LGBT' category is particularly susceptible to inconsistent use. Event creators may opt to categorise their events as 'LGBT' and use the tags to give supplementary information about the event category. Alternatively, they may categorise their events using one of the other categories and use the tags to indicate that the event may be relevant to audiences from LGBTQIA+ communities, or that it includes work by people from those communities. In addition to the differences between how tags and categories are attached to events, the textual description of each event may also contain terms indicating a specific audience, such as 'family-friendly' or 'queer'. Any future work using the DTD will require an awareness of these taxonomic discrepancies and how they factor into formulating search queries.
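The practical consequence for search queries can be sketched as follows. This is an illustration under assumed field names (`category`, `tags` and `description` are hypothetical keys, and the events are invented); it contrasts a category-only filter with one that also checks tags and description keywords.

```python
def is_relevant(event, label, keywords=()):
    """True if the label is the event's category or one of its tags,
    or if any keyword appears in the event description."""
    in_category = event.get("category") == label
    in_tags = label in event.get("tags", [])
    description = event.get("description", "").lower()
    in_description = any(keyword.lower() in description for keyword in keywords)
    return in_category or in_tags or in_description


# Invented events, for illustration only.
events = [
    {"title": "Event 1", "category": "LGBT", "tags": ["cabaret"], "description": "Late-night cabaret."},
    {"title": "Event 2", "category": "Comedy", "tags": ["LGBT"], "description": "Stand-up night."},
    {"title": "Event 3", "category": "Theatre", "tags": [], "description": "A queer retelling of a classic."},
    {"title": "Event 4", "category": "Kids", "tags": ["family"], "description": "Family-friendly puppet show."},
]

category_only = [e for e in events if e["category"] == "LGBT"]
broad_search = [e for e in events if is_relevant(e, "LGBT", keywords=("queer",))]
print(len(category_only), "event(s) by category;", len(broad_search), "by category, tag or keyword")

# Share of potentially relevant events found by category alone, using the
# counts reported above (75 categorised and 267 tagged only).
share = 100 * 75 / (75 + 267)
print(f"{share:.2f}%")
```

Keyword matching on descriptions is a blunt instrument that will produce false positives and miss synonyms, so any such query would need manual checking of a sample of the results.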
Future work on event provision for LGBTQIA+ communities could profitably examine the changing landscape of event provision, the link between access to arts and culture and mental health and wellbeing, and language use around the ways in which events are targeted towards specific demographics. This work could also be extended to focus on events targeted at children.

Conclusions
The question of how to determine cultural value, financially or otherwise, is a difficult one. There exist multiple approaches drawn from multiple disciplines and using heterogeneous data sources. In this article we aimed to evaluate large-scale metadata about cultural events as an additional data source. To this end, we sourced data from Data Thistle about events in Edinburgh from 2018 to 2022, evaluated the data's potential for analysis, and performed some exploratory analyses regarding changes in cultural revenue and cultural provision due to the Covid-19 pandemic.
We introduced a 153 MB dataset comprising events that took place in the Edinburgh and South-East Scotland City Region between 1 November 2017 and 1 May 2022 and described one method of exploring and working with those data, using pandas DataFrames and Python. The dataset is rich and detailed, and so there is the potential for using it to ask nuanced and valuable questions about event provision and more in the UK. However, this richness comes with complexities. We highlighted some of these (the difference between events, schedules and performances, and between categories and tags) and discussed how they will impact findings.
After working through the characteristics of the data, we conducted two sets of analyses. The first investigated the possibility of estimating the difference in revenue generated during August in Edinburgh, the month when the largest arts festival in the world is held, before and during the Covid-19 pandemic. We proposed a method for estimating the revenue from cultural events by multiplying, for each event, the number of performances by the average ticket price, the capacity of the venue and a factor to reflect the average proportion of tickets sold. For the subset of venues with the relevant data in the DTD, we estimate that there was a 97.3% fall in ticketing revenue between 2019 and 2020. This work also surfaced issues around incomplete data regarding venue capacities and determining the most accurate capacity factor.
In an exploration of how event provision was affected by the Covid-19 pandemic, we looked at whether each category of event was affected equally. From the performances in each category in Edinburgh in August 2018, 2019, 2020 and 2021 we determined that 'Visual Art' and 'Days Out' were the most resilient categories of event, retaining the most performances, while the 'Theatre', 'Comedy' and 'LGBT' categories were the most reduced. We connected disproportionate event provision to inequalities in how sectors of society are affected by the pandemic.
Our key findings from these analyses are (1) that large-scale cultural events data add a further facet to existing approaches for assessing cultural value, both financially and socially, and (2) that while the Data Thistle Dataset is rich enough to answer interesting questions, it requires a substantial undertaking to understand and clean the data, and the research potential is vastly expanded if the dataset is enhanced or used alongside other datasets. A report on behalf of the Arts and Humanities Research Council in 2016 recommended the establishment of an Observatory for Cultural Value (Crossick and Kaszynska 2016, 10). We also suggest that infrastructural support for accessing and working with large-scale cultural data would allow novel research that can expand a pressing area of enquiry across multiple disciplines, encouraging research that intertwines with industry.

Figure 2. Estimated revenue for all schedules in Edinburgh for August 2018, 2019, 2020 and 2021 by category.

Figure 3. Histogram of free tickets frequency per category in Edinburgh in August 2018-2022.

Table 2. Number of free tickets in Edinburgh in August 2018-2021.

Table 3. Total number of performances for all categories in Edinburgh for the years 2018, 2019, 2020 and 2021.

Table 4. Number of performances in Edinburgh in 2019 and 2021, and the change between 2019 and 2021, by category.