Identifying industry clusters: a critical analysis of the most commonly used methods

This paper analyses the most commonly used methods to identify industry clusters by applying them to Brussels' media industry data. The results are compared and benefits as well as limitations are highlighted. The resulting implications for industry cluster research and policy-making are subsequently discussed. It is found that a mixed-methods approach (compared with the application of a single method) can reveal important patterns of industry cluster formation, and that future research should make purpose-driven choices on methods based on known limitations and benefits within the research process.


INTRODUCTION
Industry clusters can be broadly defined as local agglomerations of competitive and complementary economic actors and activities. There is a general consensus among scholars that agglomerations of industry activities bring advantages to the participating firms and positive externalities to the local economy of cities, regions and nations in which industry clusters are located (Eriksson, 2009). This is why, in recent years, industry clusters have become a popular policy tool for economic development strategies and plans. One example is the government of the Brussels Capital Region (BCR), which recognized the importance of industry clustering for the region and subsequently proposed the development of a media cluster in one neighbourhood. The BCR government's plans and strategy were built on a study it commissioned, which identified existing media clusters in the BCR in 2012 (Verheyen & Franck, 2012). It is common for governments, which increasingly aim to create industry clusters, to commission such studies.
Since the 1990s, industry cluster literature has seen a significant increase in research on the application and development of methods to identify industry clusters (Cruz & Teixeira, 2010). More recently, the localization of knowledge-intensive business services, life science, and creative and cultural industries has been thoroughly analysed (e.g., Boix, Hervas-Oliver, & Miguel-Molina, 2015; Cruz & Teixeira, 2015), while other industries and sectors are still overlooked. Additionally, there is still no consensus among scholars about which methods are the most reliable. Researchers have questioned both the viability of certain methods and how governments use such research to inform their policy-making. Strategies appear to rely on flawed insights, with governments unaware of the real extent of existing clusters in their space (Boix, Lazzeretti, Hervàs, & de Miguel, 2011). The most common problems identified in the literature about identifying industry clusters with existing methodologies include issues around viable data, definitions of spatial units and issues of co-location of clusters. Depending on the industry analysed, particular issues of definition and spatial dynamics also arise.
Although these limitations have been discussed, there seems to be no study that focuses exclusively on the limitations of the most commonly applied methods. It is also still not clear what impact the application of different methods can have on the findings when industry clusters are identified; the differences between such methods have not yet been fully explored in a comparative way. At the same time, governments are building their insights on studies that rely on these limited methods, so the differences between methodologies matter greatly when policy-makers use the findings. Understanding the differences and limitations of commonly applied methods can also help to identify the most viable method to apply in future research.
This paper uses Brussels' media industry as a case study. The media industry is an interesting case, as it has distinctive features including, for example, the production of intangible and experience goods, the heavily collaborative nature of media production and the convergence of media products. At the same time, the media industry has been increasingly targeted by local cluster development policies in recent years. This is also the case for the BCR. Additionally, the media industry is still an under-researched sector in industry cluster research.
The main research question of this paper is: What are the benefits and limitations of the most commonly applied methods when identifying industry clusters? By answering this research question, this paper aims to find a more viable way to identify industry clusters that can inform policy-making in the future.

IDENTIFYING INDUSTRY CLUSTERS: THE LITERATURE
The literature on industry cluster identification includes a wide range of methods. Stejskal (2010) categorizes them into two groups: quantitative methods (such as localization coefficient, input-output analysis, shift-share analysis, Gini coefficient, Ellison and Glaeser index, and Maurel-Sédillot index) and qualitative methods (such as interviews with experts and case studies). Cruz and Teixeira (2015) found that studies either focus on industry or occupation data and that they are based mostly on simple measures of industry concentration or specialization. Three methods that are most commonly applied (especially in studies commissioned by governments) can be identified:
• The most commonly applied method to research the geography of industry activities is by means of very simple standard or stochastic methods, using absolute or relative indexes (Lazzeretti, Boix, & Capone, 2008). This includes the identification of agglomerations through a comparative approach of distribution along different spatial units and has been mostly applied in industry study analyses (cf. Barbour & Markusen, 2007). Employment has been described as the most suitable indicator (Cruz & Teixeira, 2015).
• In order to bring into perspective the presence of industry activities compared with the total economic activities in an area, independently of the size of the area, location quotients (LQs) are the most commonly employed method (Boix et al., 2015). LQs are widely used in academic articles (cf. Cruz & Teixeira, 2015) and government-commissioned studies. The LQ compares the share of an industry in the employment of one place with the share of that industry in the employment of the total area and is defined as:
$$LQ_{ij} = \frac{E_{ij} / E_{j}}{E_{i} / E}$$
where $E_{ij}$ is the number of employees in the industry i in a place j; $E_{i}$ is the total number of employees in an industry i; $E_{j}$ is the number of employees in a place j; and $E$ is the total employment in the analysed total area. An LQ > 1 indicates that the concentration of an industry in a place j is larger than the total average.
• In order to represent industry concentration visually, the most commonly applied method is the plotting of dots on a map. The visual representation of dots allows one to zoom in on certain parts of a territory and identify accumulations or agglomerations (Debroux, 2013). In comparison with the above-described methods, this method relies on qualitative interpretation of the result, as no empirical evidence can be presented (cf. He & Gebhardt, 2014).
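As an illustration of the location quotient described in the list above, the following minimal Python sketch computes an LQ per place from two employment tables. The function name, the dictionary layout and the three place names with their employment figures are hypothetical and chosen only for illustration; they are not data from this study.

```python
from typing import Dict


def location_quotients(
    industry_emp: Dict[str, float],
    total_emp: Dict[str, float],
) -> Dict[str, float]:
    """Return the LQ per place j, i.e. (E_ij / E_j) / (E_i / E).

    industry_emp maps each place j to E_ij (industry employees in j).
    total_emp maps each place j to E_j (all employees in j).
    """
    e_i = sum(industry_emp.values())  # E_i: industry employment, whole area
    e = sum(total_emp.values())       # E: total employment, whole area
    if e_i == 0 or e == 0:
        raise ValueError("area-wide employment must be non-zero")
    area_share = e_i / e
    return {
        place: (industry_emp.get(place, 0.0) / e_j) / area_share
        for place, e_j in total_emp.items()
        if e_j > 0  # places without any employment have no defined LQ
    }


# Hypothetical figures, for illustration only.
industry = {"place_a": 200, "place_b": 50, "place_c": 50}
total = {"place_a": 1000, "place_b": 1000, "place_c": 1000}
lq = location_quotients(industry, total)

# Places with LQ > 1 host an above-average share of the industry.
above_average = sorted(p for p, value in lq.items() if value > 1)
```

The sketch makes the data requirement of the method visible: total employment per spatial unit must be available alongside industry employment, otherwise the denominator cannot be formed.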
These methods have been applied in numerous studies and represent quite straightforward and easily adaptable research approaches. This explains why many studies commissioned by governments use one of these methods. Of course, in the literature more 'sophisticated' formulas and methods have been developed and tested. For example, Hill and Brennan (2000) combine cluster analysis and discriminant analysis to identify industrial clusters in the United States; and Boix et al. (2011) apply a geo-statistic algorithm based on nearest-neighbour hierarchical clustering to identify creative and cultural industry clusters in Europe. However, looking at existing studies, the application of more sophisticated methodologies is still quite rare, and when they are applied, studies use a unique approach that is seldom comparable with other studies. It is, therefore, necessary to explore in more detail the limitations and differences of the most commonly applied methods, as they are most often used as a basis for governmental strategies. This premise defines the research question of this paper.

METHODOLOGY AND DATA ANALYSIS APPROACH
The research question is addressed through an exploration of the application of a mixed-methods approach that compares the findings of the three most commonly applied methods for industry cluster identification. Johnson, Onwuegbuzie, and Turner (2007) discuss the application of mixed methods in research and observe that triangulation can be part of a validation process ensuring that the explained variance is the result of the underlying phenomenon and not of the method. This paper contributes to this discussion and explores to what extent triangulation can influence future research on the matter.
In order to compare critically the three identified methods, several research steps were taken. First, data were extracted. Nomenclature of Economic Activities in the European Communities (NACE) codes that describe the media industry, as developed by Komorowski and Ranaivoson (2018), have been used to extract micro-firm data from Bel-first (a database on Belgian companies published by Bureau van Dijk). The 2014 data for economic entities, addresses and number of employees were retrieved for all bodies where the principal activity falls under the media industry NACE codes. Entities within Brussels, as determined by the European Union's Nomenclature of Territorial Units for Statistics (NUTS) levels for data-collection purposes, and the neighbouring NUTS regions were extracted. This enabled consideration of Brussels as a metropolitan area (BMA).
Second, empty data fields on the number of employees were harmonized by applying the median (cf. Britton & Legare, 2005). For analysis purposes, the location data were enriched with information at the neighbourhood level within Brussels, based on information provided by the Brussels Institute for Statistics (around 90% of addresses could be assigned to a neighbourhood). Additionally, the micro-locations of the addresses of entities in the BMA were created with the tool Doogal, which extracts the longitude and latitude of addresses (around 95% of addresses in the BMA could be plotted). Limitations of the data-gathering and harmonization process are discussed below. The data were finally analysed with the software Tableau.
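The harmonization of empty employee fields can be sketched as follows. This is a minimal illustration that assumes the median is taken over all entities with a known employee count; the text does not specify whether the median was computed over the whole sample or per NACE code, and the function name and example values are hypothetical.

```python
from statistics import median
from typing import List, Optional


def impute_median(counts: List[Optional[float]]) -> List[float]:
    """Replace missing employee counts (None) with the median of known counts."""
    known = [c for c in counts if c is not None]
    if not known:
        raise ValueError("at least one known value is needed to form a median")
    fill_value = median(known)
    return [fill_value if c is None else c for c in counts]


# Hypothetical employee counts with two empty fields, for illustration only.
employees = [2, None, 5, 120, None, 3]
filled = impute_median(employees)
```

The median is less sensitive than the mean to a few very large entities, which matters in a sample where a handful of broadcasters employ far more people than the typical firm.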

IDENTIFYING BRUSSELS' MEDIA CLUSTERS: THE FINDINGS
In 2013, the government of the BCR approved the Sustainable Regional Development Plan. The plan identifies so-called territorial competitiveness 'poles', that is, areas or neighbourhoods in Brussels that will be developed to strengthen the position of priority sectors in them. One of them is 'Pole Reyers for communication and imaging' (Brussels-Capital Region, 2013). Based on this plan, substantial financial and planning efforts have been made to support the development of local media clusters, including the creation of the so-called mediapark.brussels in the neighbourhood Reyers, an urban and property development project that is supposed to be finished by 2024. These plans are built on a study commissioned by the government of the BCR in 2012. In this study, media activities were plotted on a map and three 'media poles' were identified in three different neighbourhoods in Brussels: Tour & Taxis, Reyers, and Ixelles/Saint-Gilles (Verheyen & Franck, 2012). In the following section, the findings of the three most commonly applied methods will be given to demonstrate the differences in the findings between the methods and the BCR government-commissioned study.

Comparison of the applied methods
Based on the first method applied, the data show that the biggest media cluster in Brussels can be clearly identified in the neighbourhood of Reyers, which hosts 19.2% of Brussels' media employees. The high share of media employment in Reyers is due to several of the biggest Belgian broadcasters being located there, including VRT, RTBF, RTL and BETV. Additionally, the European Quarter (2.5%), the Ixelles Ponds (2.2%), Matonge (2.0%), Globe (2.0%), and neighbourhoods in and around the City Centre (Port de Hal, Notre-Dame Aux Neiges and North Quarter) have high concentrations of media employment (Figure 1).
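The share figures reported for the first method follow from a simple aggregation of firm-level employment by neighbourhood. The sketch below shows one way to compute such shares; the record layout, the function name and the neighbourhood labels with their employee counts are hypothetical and do not reproduce the Brussels data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def employment_shares(records: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Return each neighbourhood's percentage share of total industry employment.

    Each record is a (neighbourhood, employees) pair for one entity.
    """
    totals: Dict[str, float] = defaultdict(float)
    for neighbourhood, employees in records:
        totals[neighbourhood] += employees
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("total employment must be non-zero")
    return {n: 100.0 * value / grand_total for n, value in totals.items()}


# Hypothetical firm records, for illustration only.
firms = [("north", 150), ("north", 50), ("south", 30), ("east", 20)]
shares = employment_shares(firms)

# Neighbourhoods ranked from the highest to the lowest share.
ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)
```

In this toy input a single entity with 150 employees accounts for most of the leading neighbourhood's share, which mirrors how a few large broadcasters can dominate an absolute-share ranking.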
Using the LQ analysis as the second method to look at the postcode areas within Brussels, different patterns of concentration of the media industry are revealed in relation to the absolute concentration. The postcode areas that feature an above-average representation of media industry activities include Schaerbeek (2.5) (in which Reyers is located), Sint Gillis/Saint Gilles (2.0) and Elsene/Ixelles (2.0) (Figure 2).
The final method, plotting of addresses, indicates that Brussels city centre (within the city ring) is completely 'covered' in media organizations. Additionally, there is a high prevalence of media organizations in the area inside of the inner ring (R21, N 290 and connecting streets), with easily observable concentration. The further a media organization is from the inner ring, the more scattered and less dense the agglomeration (Figure 3).

DISCUSSION OF THE DIFFERENCES AND LIMITATIONS OF THE APPLIED METHODS
Comparing the results of the three applied methods reveals inhomogeneous patterns:
• Relative indexes on employment distribution have been applied in this research. This method allows a very detailed analysis, as data are available at a neighbourhood level and neighbourhoods are defined as statistical units (in contrast to political or other geographical units, such as postcode areas). However, not all cities provide such detailed data, and it was not possible to assign all addresses to a neighbourhood. While different neighbourhoods have clear concentrations of media employment, large enterprises (such as the public broadcasters in this analysis) distort the results.
• The LQ was applied on a postcode area level in Brussels. No data on total employment are available at the neighbourhood level, and postcode areas are often not very representative units of analysis. The results show that certain areas that have been identified with the first method are not identified with this method. For instance, the City Centre is significant in terms of absolute share, but has a below-average representation of media activities, and, therefore, no significant competitive advantages for media agglomeration can be assumed. This also raises the question of whether the presence of many other economic activities in an area rules out the existence of a media cluster there.
• Media organization locations were plotted on a map. The interpretation of the results is solely based on observations, while no empirical evidence can be given. The extraction of longitude and latitude locations of media firms has limitations, as not all addresses can be plotted with the available instruments. However, the visual representation of dots allows one to zoom in on certain parts of a territory and identify accumulations or agglomerations that go beyond predefined borders and even beyond the city borders. The plotting shows that no clear borders between the different media clusters can be found, which may raise the question of whether the previously identified media clusters can actually be differentiated or if they stretch into one another.
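The disagreement between absolute shares and LQs discussed above can be made explicit by labelling each area according to which method flags it. The following sketch is an illustration under strong assumptions: both indicators are taken to be available for the same spatial units (which was not the case for Brussels, where shares were computed per neighbourhood and LQs per postcode area), and the 2% share threshold, the function name and all values are hypothetical.

```python
from typing import Dict


def compare_methods(
    shares: Dict[str, float],
    lqs: Dict[str, float],
    share_threshold: float = 2.0,  # hypothetical cut-off, in percent
    lq_threshold: float = 1.0,     # LQ > 1 means above-average representation
) -> Dict[str, str]:
    """Label each area by which of the two methods flags it as a concentration."""
    labels: Dict[str, str] = {}
    for area in sorted(set(shares) & set(lqs)):
        high_share = shares[area] >= share_threshold
        high_lq = lqs[area] > lq_threshold
        if high_share and high_lq:
            labels[area] = "both"
        elif high_share:
            labels[area] = "share only"
        elif high_lq:
            labels[area] = "LQ only"
        else:
            labels[area] = "neither"
    return labels


# Hypothetical values, for illustration only.
example_shares = {"area_a": 10.0, "area_b": 19.0, "area_c": 0.5}
example_lqs = {"area_a": 0.8, "area_b": 2.5, "area_c": 1.3}
labels = compare_methods(example_shares, example_lqs)
```

An area labelled "share only" corresponds to the pattern described for the City Centre: a large absolute share of media employment combined with a below-average representation relative to total employment.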

CONCLUSIONS AND IMPLICATIONS FOR FUTURE RESEARCH
The research question of this paper was: What are the benefits and limitations of the most commonly applied methods when identifying industry clusters? First, the findings show limitations of the methods in identifying the exact extent and location of a media cluster. While the methods applied in this paper show that different areas have strong concentrations, it is not possible to distinguish whether the concentrations exist only due to the overlapping of different media clusters or whether a single media cluster is formed. Second, each method has limitations in terms of application and access to data. For instance, in order to apply the LQ in the analysis approach, it is necessary to have access to data on total employment in the area. Such data are not always available for smaller, more comparable areas. There are explicit limitations when the NACE system is applied to the media industry. For the majority of activities in the media industry, there is no code (e.g., mobile app development), or entities carry out activities in a number of codes (e.g., a broadcaster who also publishes text on their website). Importantly, this kind of data assumes that the employees assigned to an organization actually perform their activities at the registered address. In practice it is often not possible to distinguish between headquarters and employee location. Additionally, data harmonization was necessary because some figures are only available to a limited extent due to the filing obligations, meaning that smaller entities often provide less data. The findings are dependent on these data restrictions and translations made in the harmonization process.
Finally, if the researcher is aware of these restrictions, including the distortion of results through large enterprises, comparability of geographical units, data restrictions and more, the results show that the most commonly applied methods give the researcher the opportunity to find first tendencies and understand where concentrations of media activities in the city can be found.
Governments are often reliant on insights from one of these most commonly applied methods, without being aware of the limitations found in this study. The findings suggest that future research may benefit from a triangulation of these simple methods. Valuable insights can be given and validated through comparison. When combining the three methods, cluster formations beyond geographical borders can be revealed, while statistically proven cluster formations can be found. At the same time, triangulation of the methods allows researchers to better assess the importance of found clusters for the local economy. This is not possible using only one of the analysed methods.
Even if future research decides not to apply a multi-methods approach, the results given here highlighting the limitations and benefits of the research processes can support the choice of methods for future research. Of course, the context in which researchers work also has an impact, including limitations of funding, time, access to data and more. The identified limitations and benefits are not meant to be used to directly advise or discourage the use of a certain method. The findings of this paper are supposed to show that researchers need to be aware of limitations and find the right arguments to support their choice of method, focusing on the purpose and context of the research process. Future research should further explore the limitations and benefits of methods in the context of the industry study object. As has been shown in this study, the media industry is still a sector that has received less attention in industry cluster research and there are limitations for data access and validity. Different methods could give better insights into different industry clusters, which should be further investigated in future research. More attention should also be paid to existing, more sophisticated, methodologies that have been developed. So far, such methods are often presented in isolation and are not applied in a comparable way.
The findings of this paper can also support policy-makers' decision-making processes. As has been shown, the findings on which the government of the BCR has built its policy plans can be questioned. This study has found different media clusters compared with the government-commissioned study. Only when the limitations are known are policy-makers able to make the right decisions to avoid unnecessary spending or the syphoning of pre-existing clusters in the city.
In conclusion, this paper finds that a mixed-methods approach (compared with the application of a single method) can reveal important patterns of industry cluster formation and that future research should make purpose-driven choices on methods, based on known limitations and benefits within the research process.

DISCLOSURE STATEMENT
No potential conflict of interest was reported by the author.