Agglomeration, diversity, and tradition: an analysis of fractionalized web of science publications in EU regions

ABSTRACT A region’s science output depends on several spatial factors. A generalized least squares function with random effects was estimated to gauge the impact of such factors in European NUTS2 regions. The main findings are that output is positively associated with the number of researchers in higher education institutions and negatively associated with the Herfindahl index of disciplinary specialization. Regions with old universities and good accessibility are also more productive, but these effects are mostly limited to Europe’s core. Most leading science regions are in western or northern Europe. They combine large agglomerations of university scientists, disciplinary diversity, old university traditions, and good interregional accessibility.


Introduction
What is it that makes a region host more scientific creativity than others? In a postindustrial society, an answer to this question would identify a key source of regional competitiveness. Yet multi-causal spatial studies of science output are still uncommon.
From the 1970s onward, regions in Western Europe and North America have been undergoing an economic restructuring. Bell (1973) is among the first contributions to stress the increasing reliance on science as the source of knowledge for innovations, while Florida (2002) focuses on occupational restructuring away from manufacturing jobs toward knowledge-intensive services as the hallmark of post-industrializing regions.
Among creative activities, scientific research, whether measured as research output or jobs, has been growing at an even faster rate than general economic activity (Matthiessen et al., 2010). Florida (2008) claims that the world of science is even 'spikier' in a spatial sense than other spheres of economic activity. But other studies, notably Grossetti et al. (2014), note that the growth of scientific activity has been faster in lagging regions.
Among studies that analyse research output with statistical methods, only a handful use a spatial perspective (e.g. Pumain et al., 2006; Nomaler et al., 2014), and as far as we know, only two include rigorous measures of interregional accessibility (D. E. Andersson et al., 2020; Karlsson et al., 2009). This study applies a combined institutional and spatial theoretical framework to more regions than any other study. An added novelty is separate analyses for three subcontinental regions of the European Union (EU), which differ not only in their level of development but also, as we shall see, in the relative importance of different factors that affect science output. The analysis employs a random-effects generalized least squares (GLS) model with robust standard errors, and comprises annual observations of more than 90 percent of all NUTS2 regions in the EU from 2001 to 2013.
Our research focuses on two dimensions of regional performance. The first concerns institutions. In spite of European integration policies, most funding sources remain national, and levels of per capita spending on research within higher-education institutions ranged from $40 in Slovakia to $384 in Denmark in 2012 in purchasing power parity-adjusted international dollars (OECD, 2012). General per capita research and development (R&D) expenditures in EU member states varied just as widely, from $146 in Poland to $1,415 in Finland (ibid.). Added to this are microinstitutional differences, such as the relative priority that different universities give to research as opposed to teaching or outreach activities.
The second dimension is accessibility. Researchers benefit from access to other researchers, since scientific research is for the most part a collaborative activity, particularly in the natural sciences, medicine, and engineering. This access could be to other researchers in the same region, which we may denote as intraregional accessibility. But it is increasingly common for researchers in different regions to collaborate, and this type of interaction is facilitated by good interregional accessibility.
We contend that to properly understand regional development processes, one needs to employ an approach that combines spatial and institutional perspectives. Almost all relevant empirical studies are either exclusively spatial or institutional. The novelty of this study as compared with prior studies about regional or national research performance is that we combine these two theoretical approaches to reach a fuller understanding of regional development.
This paper is organized as follows. The next section provides the theoretical framework that constitutes the foundation for our empirical model. Section 3 describes the variables and data sources while section 4 presents the estimation method, which is a generalized least squares model with random effects. We report and discuss our estimation results in the penultimate section, before summing up the main takeaways in the final section.

Theoretical background
The most common theoretical approach for analysing regional or university-level output is the production function approach (Griliches, 1979). Typically, production functions for science output employ a Cobb-Douglas function. This is an extension of a conventional microeconomic production function, whereby firms transform inputs into outputs of various kinds. In the case of the regional production of scientific publications, the necessary input during a specific period is human capital, which we measure as the regional numbers of researchers in higher education institutions and high-technology firms. We assume that universities and firms provide the necessary physical capital, such as access to databases, computer hardware and software, and, in some disciplines, laboratory equipment.
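As an illustrative sketch of this setup, a two-input Cobb-Douglas function for regional science output can be written as below. The symbols Q (publications), H_U (researchers in higher education institutions), H_I (researchers in high-technology firms), A, \alpha, \beta, and the region and year indices r and t are notation assumed here for exposition; they are not taken from the estimated specification.

```latex
\[
Q_{r,t} = A \, H_{U,r,t}^{\alpha} \, H_{I,r,t}^{\beta}
\quad\Longleftrightarrow\quad
\ln Q_{r,t} = \ln A + \alpha \ln H_{U,r,t} + \beta \ln H_{I,r,t}
\]
```

In the logarithmic form, the exponents \alpha and \beta are output elasticities, which is why functions of this type are commonly estimated as log-linear regressions.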
It is also important to keep in mind that different production factors (or productivity-enhancing environmental factors) operate on different time scales (D. E. Andersson et al., 2020). By definition, a region's institutions and its spatial accessibility vis-à-vis other regions are factors that remain approximately constant over long periods. But empirical data also show that regional volumes of residents or researchers change at a much slower pace than measurable outputs such as journal articles. This has been particularly evident during the early 21st century, with its high rate of science output growth in most regions (Matthiessen et al., 2010). Our general approach is therefore to view science output as a relatively 'fast variable' that depends on regional factors that change at a slow or moderate pace. Factors that are not time-invariant throughout the analysed period are, additionally, subject to a one-year time lag.

Institutions and path dependence
Institutions, defined as collective and durable rules that constrain individual action, affect many different performance indicators. The starting point for many institutional analyses is Douglass North's (1990) explanation for the divergent development paths of the north-western and south-western parts of Europe over a period of several centuries. But institutional differences remain in operation, in spite of the gradual convergence of regional performance within the EU (Harzing & Giroud, 2014).

URBAN, PLANNING AND TRANSPORT RESEARCH
In science, institutional factors are part of the explanation of persistent cross-national differences in funding and output levels in the EU. Table 1 presents population-adjusted science output, general R&D expenditures, and university R&D expenditures in 15 EU member states in the early 21st century.
As Table 1 shows, countries differ not only in per capita research output, but also in R&D spending and R&D employment. These indicators show that the countries of north-western Europe have higher levels of both scientific activity and R&D expenditures than southern and eastern countries.
Each university also has its own institutions, which constrain faculty in their allocation of time and effort to research, teaching, and service. In a study of European and American universities, Aghion et al. (2010) show that greater university autonomy is associated with larger publication volumes, other things being equal. Bauwens et al. (2011) contend that better university institutions are associated with more highly cited researchers, with the United Kingdom, Sweden and the Netherlands hosting the highest shares, relative to population size, in the EU. Large national differences in per capita funding and research output justify using country effects in our estimated models.
D. E. Andersson and Andersson (2020) contend that universities that subscribe to the old 'Enlightenment model' of academic freedom and university autonomy produce more and better research than universities that focus on the employability of students or business engagement. An analysis of publication volumes at Australian universities supports this contention, with older Group-of-Eight universities producing greater publication volumes, controlling for research funding and numbers of faculty and postgraduate students (Abbott & Doucouliagos, 2004). In a study of a handful of European countries, D. E. Andersson et al. (2020) show that regions with a longer history of university research have higher levels of research output, other things being equal. For institutional reasons, we expect the age of a region's oldest university to exert a positive impact on science output.
In addition to influencing overall research activity, institutions may influence the research profiles of universities, regions and countries along path-dependent diversification or specialization trajectories. While Harzing and Giroud (2014) note that even large countries have distinct specialization profiles, Heimeriks et al. (2019) show that the NUTS2 regions with the greatest research volumes also have the most diversified portfolios of disciplines. In addition, regions in the north and west of Europe tend to have the most complex knowledge specializations. In this context, complexity refers to the presence of uncommon subfields that draw on advanced knowledge in disparate disciplines (ibid.).
Many scientific breakthroughs occur as a result of combining ideas from different disciplines. Hollingsworth (2012) enumerates five institutional factors that increase the likelihood of major scientific discoveries: high scientific diversity within organizations; scientists' internalization of scientific diversity; integration of scientists from different fields; diversity-endorsing research leaders; and high levels of institutional autonomy and flexibility. While there is always a limit to the absorptive capacity (Cohen & Levinthal, 1990) of individual scientists, increasing this capacity is conducive to scientific creativity. Our hypothesis is thus that there is a positive association between scientific diversity and scientific productivity.

Externalities and accessibility
It has become commonplace to note that spatial externalities are a key type of agglomeration economy that benefits densely populated regions disproportionately. But references to knowledge externalities are often vague and amorphous. Johansson (2005) provides a useful typology that distinguishes between six different types of externalities. One of these is most relevant to science production: horizontal innovation externalities, which involve knowledge flows such as joint research, either as collaboration through the creation of formal interpersonal links or as unplanned spillovers. A region's overall spatial accessibility reflects its potential benefits from distance-attenuated externalities of this type.
The relation between spatial accessibility and horizontal innovation externalities hinges on the 'stickiness' of tacit knowledge. Such knowledge is much easier to transmit through face-to-face interaction than as disembodied flows of information (Von Hippel, 1994). Consistent with this, Anselin et al. (1997) show that U.S. high technology firms' innovativeness is an increasing function of university science output in the surrounding metropolitan area, while Kelly and Hageman (1999) show that the location of industry-specific R&D activities depends more on the location of R&D in other industries than on the location of production in the same industry.
Horizontal innovation externalities tend to increase with increasing science intensity. As a case in point, Mariani (2002) shows that the most science-intensive Japanese firms are more likely to locate R&D-only labs in suitable knowledge regions; that is, they choose locations for their R&D activities that have been spatially decoupled from the production plants of the same firms. In science-intensive industries such as biotechnology, it is the norm for scientists to primarily rely on non-market research collaborations with nearby university scientists for the creation of knowledge that may spawn new product innovations (Liebeskind et al., 1996).

Restructuring and regional tiers
Europe has been undergoing a restructuring toward a knowledge-based society since the 1970s (D. E. Andersson & Andersson, 2019). Hallmarks of this process include a shift from manufacturing to knowledge-intensive occupations, greater R&D expenditures, and the extension of mass education to the tertiary level. In the early stages, the new structure was mostly confined to clusters of universities and high-tech firms such as Cambridge and Copenhagen. But the restructuring process gradually involved an increasing number of regions. Johansson and Klaesson (2011) show that 'knowledge-intensive services to firms' and 'ICT products and services' were more over-represented in the Stockholm region in 1993 than in 2008, and that this is a typical instance of the interregional knowledge diffusion process in post-industrializing countries.
In the 2001-2013 period, the majority of regions in north-western Europe had attained typical post-industrial characteristics, such as large shares of knowledge-intensive or 'creative class' jobs, high per capita R&D expenditures (see Table 1), and widespread postmodern values (Inglehart, 1997). Southern and eastern parts of Europe lagged behind. Although Matthiessen et al. (2010) and Grossetti et al. (2014) show that some southern cities, notably Barcelona and Madrid, exhibited Europe's highest rates of science output growth in the first decade of the 21st century, this has been a catch-up process, given the lower per capita output figures in the South than in the Northwest (Harzing & Giroud, 2014). Southern research activities were also more concentrated in a handful of big cities than was the case in the Northwest. The post-communist countries of the East were even further behind, with moderate levels of research activity concentrated in capital-city universities. Other development indicators, such as total factor productivity per hour worked (OECD, 2021), per capita R&D spending (see Table 1), and the prevalence of postmodern values (Akaliyski et al., 2020) show a clear division of the EU into disparate parts: the Northwest, the South, and the East. This is also our rationale for estimating separate regional output functions for each part.

Data
This study makes use of NUTS2 regions as the spatial units of observation. NUTS2 refers to the intermediate level of spatial aggregation in official EU statistics (Eurostat). It is a better approximation of the feasible spatial extent of daily face-to-face interaction opportunities than the smaller NUTS3 and larger NUTS1 levels of aggregation. While not always ideal, NUTS2 regions constitute the closest approximation of metropolitan areas among territorial units with published and mutually consistent statistical indicators. Studies that employ more consistent regional delimitations, such as Matthiessen et al. (2010), do not attempt any statistical analyses of potential explanatory factors.
However, the NUTS2 regions corresponding to some national capitals are too small, as compared with any reasonable approximation of commuting areas (Annoni & Dijkstra, 2013). Consequently, the capital region has been combined with the surrounding NUTS2 region(s) in seven of the 21 countries included in this study (see Table A1 in the Appendix). Our analysis encompasses 249 of the 270 NUTS2 regions according to the NUTS 2010 classification of the 27 member states at the time. We excluded six of these 27 countries (i.e. Cyprus, Estonia, Latvia, Lithuania, Luxembourg, and Malta) due to missing observations for one or more of the explanatory variables.

The dependent variable
The main variable of interest in this study is the quantity of scientific publications in a spatially delimited region. As scientific publications, we include all publications indexed by the Web of Science (WoS). The dependent variable, PUB, thus includes peer-reviewed books, journal articles, and conference papers across all academic disciplines, but excludes so-called 'predatory journals' (Beall, 2012; see Note 1). Our measure of publications is wider than that of studies that only use articles indexed by the Science Citation Index Expanded (SCIE), which provide a narrower focus on regional productivity in the natural sciences, medicine, and engineering (e.g. Grossetti et al., 2014; Matthiessen & Schwarz, 1999; Matthiessen et al., 2002, 2010).
The publication count for each region uses the fractionalization approach, which means that co-authored papers may be allocated to more than one region. For example, if there are three authors from three different NUTS2 regions, then each of the three regions receives one third of one publication. Fractionalization allows us to avoid overestimates of the publication volumes of large regions (Luukkonen et al., 1992).
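The fractional counting rule can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure actually used to build PUB: it assumes that each paper is split equally among its authors and that each author's share goes to that author's region. The function name and the region codes in the usage lines are purely illustrative.

```python
from collections import defaultdict
from fractions import Fraction


def fractional_counts(papers):
    """Return fractional publication counts per region.

    `papers` is a list of papers; each paper is a list holding the
    region code of every author (assumed: one region per author).
    Each paper contributes a total of exactly one publication.
    """
    totals = defaultdict(Fraction)
    for author_regions in papers:
        if not author_regions:
            continue  # skip papers without any regional affiliation
        share = Fraction(1, len(author_regions))  # equal split among authors
        for region in author_regions:
            totals[region] += share
    return dict(totals)


# Illustrative input: one paper with three authors in three regions,
# and one paper with two authors in the same region.
example = [["FR10", "DEA2", "SE11"], ["FR10", "FR10"]]
counts = fractional_counts(example)
```

With this rule the regional counts always sum to the number of papers, so collaboration between regions does not inflate totals the way whole counting would.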
Table 2 presents all regions with more than 2,000 publications in 2000 and 2013. Because the delimitation of NUTS2 regions uses a population size criterion, some regions in sparsely populated countries such as Finland and Sweden tend to contain several small urban areas. In contrast, the most densely populated countries comprise too many NUTS2 regions, notably the Netherlands, where the Randstad conurbation encompasses the regions of North Holland (Amsterdam), South Holland (Rotterdam), and Utrecht. Table 3 shows that the northwest is the core of European science production. With the exception of four large urban regions in southern Europe, the rest of the south and east produce more modest volumes relative to population size. Except for Poland, scientific research in the east is almost exclusively the concern of national capital regions.

Direct input variables
The direct producers of science outputs are research scientists. We assume that researchers have sufficient access to relevant complementary capital goods, which are associated with considerable cost differences across disciplines. We use the number of full-time-equivalent (FTE) researchers in higher education institutions (USCI) as the first direct input variable (see Note 2), and FTE researchers in high-technology manufacturing and knowledge-intensive high-technology services (ISCI) as the second input variable.

Institutional variables
As we have noted, national institutions may shape research outcomes. A country dummy controls for the direct effects of national institutions on national output levels. The year of establishment of a region's oldest university (YEAR) measures the time-dependent effect of institutional learning-by-doing on the university itself and neighboring higher education institutions. The latter effect is in this case conceptualized as a distance-attenuated knowledge externality.
In addition, path-dependent institutional processes may lead a region to diversify or specialize, with attendant general or discipline-specific effects on scientists' probability of obtaining research funding from local sources. We use the Herfindahl-Hirschman Index of disciplinary specialization (SHHI) to measure the relative specialization of a region, with 23 separate disciplinary categories. An index value of 1 implies maximum specialization, with all research publications allocated to the same category, while a value of 1/23 denotes maximum diversification.
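The two bounds follow directly from the definition of the index as the sum of squared category shares. The sketch below is illustrative only; the function name and the publication counts are hypothetical, not data from this study.

```python
def herfindahl_index(publication_counts):
    """Herfindahl-Hirschman Index: the sum of squared category shares.

    `publication_counts` holds one non-negative number per disciplinary
    category (23 categories in the setting described above).
    """
    total = sum(publication_counts)
    if total <= 0:
        raise ValueError("at least one category must have publications")
    return sum((count / total) ** 2 for count in publication_counts)


# Hypothetical extremes with 23 categories.
fully_specialized = [500] + [0] * 22   # all output in one discipline
fully_diversified = [40] * 23          # equal output in every discipline

specialized_value = herfindahl_index(fully_specialized)   # upper bound, 1
diversified_value = herfindahl_index(fully_diversified)   # lower bound, 1/23
```

Because the index rises with specialization, a positive effect of diversity on output shows up as a negative coefficient on SHHI.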

Accessibility
Spatial accessibility encompasses intraregional and interregional accessibility (Karlsson et al., 2009). Intraregional accessibility refers to the benefits that arise from interaction opportunities within the same region. Interregional accessibility refers to the relative accessibility of a region vis-à-vis other regions, which both reflects the centrality of the location relative to other relevant agglomerations and the state of transport infrastructures for outward connectivity. Our accessibility measure, ACC, combines intraregional and interregional accessibility:
'The accessibility [measure] uses centroids of NUTS2 regions as origins and destinations. The accessibility model calculates the minimum paths for [a] network, i.e. minimum travel times between the centroids of the NUTS2 regions. For each region the value of the potential accessibility indicator is calculated by summing up the population in all other regions weighted by the travel time to go there. For access to the region to itself, the time to the centroid of the region is used, while for access to other regions: (i) travel time over the network between the two centroids plus the (ii) access from the destination centroid to the destination region are used.' (Annoni & Dijkstra, 2013, p. 45)
The description implies that the intraregional portion of this multimodal accessibility measure increases both with the size and the density of a region, after controlling for the state of the road, rail, and air transport infrastructures.
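A potential accessibility indicator of this general type can be sketched as follows. The quoted description does not state how travel time is converted into a weight, so the negative exponential decay and the `beta` parameter below are assumptions made only for illustration, as are the function name and the numbers in the usage lines.

```python
import math


def potential_accessibility(populations, travel_minutes, beta=0.005):
    """Population potential of each region.

    populations[j]       : population of region j
    travel_minutes[i][j] : minimum travel time from centroid i to region j;
                           the diagonal holds the internal time to the
                           region's own centroid (the intraregional part)
    beta                 : assumed distance-decay parameter (hypothetical)
    """
    n = len(populations)
    scores = []
    for i in range(n):
        score = sum(
            populations[j] * math.exp(-beta * travel_minutes[i][j])
            for j in range(n)
        )
        scores.append(score)
    return scores


# Hypothetical two-region example: region 1 is larger and benefits more
# from its own population, despite a slower internal connection.
pops = [100, 200]
times = [[10, 60], [60, 20]]
scores = potential_accessibility(pops, times, beta=0.01)
```

Under this form, the diagonal term grows with a region's own population and shrinks with its internal travel time, which matches the observation that the intraregional portion increases with both size and density.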

Control variables
We only use one control variable, since all other tested control variables were consistently insignificant across all specifications and functional forms. Our control variable, INC, is a measure of per capita income that is specific to each NUTS2 region. Table 3 presents means and standard deviations for all analysed variables for each of the three analysed parts of Europe, as well as for the combined sample. On average, northern and western regions of Europe have the most publications and scientists, the best accessibility, and the most diversified science production. Northern, western, and southern European regions tend to host older universities than the east. In addition, northern and western regions tend to have higher levels of education, income, and interpersonal trust.

Estimation method
We employ a generalized least squares (GLS) panel model with random effects. In a random-effects model, the constant term is random and is specific to a unit of observation, i. The constant consists of a fixed term and a random term, in order to control for (observable and unobservable) heterogeneity. In this case, the unit is a NUTS2 region.
The choice of random effects, rather than fixed effects, is due to the presence of three time-invariant variables that reflect the slowly changing nature of their effects on regional performance indicators. The models also make use of country and year dummies, and our estimated standard errors are robust with respect to the potential presence of heteroscedasticity and/or autocorrelation. The accessibility variable, ACC, substantially reduces the severity of problems associated with potential spatial dependence (M. Andersson & Gråsjö, 2009).
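A random-effects specification of this kind can be sketched as follows. The notation is assumed here for illustration: x collects the time-varying regressors (lagged one year), z the time-invariant regressors, \delta and \tau the country and year dummies, u_i the region-specific random term, and \varepsilon the idiosyncratic error. The functional form of each individual regressor is left unspecified.

```latex
\[
\ln PUB_{i,t} = \beta_0
  + \mathbf{x}_{i,t-1}'\boldsymbol{\beta}
  + \mathbf{z}_{i}'\boldsymbol{\gamma}
  + \delta_{c(i)} + \tau_t
  + u_i + \varepsilon_{i,t}
\]
```

A fixed-effects (within) transformation would remove \mathbf{z}_i together with u_i, so \boldsymbol{\gamma} could not be estimated; treating u_i as random keeps the time-invariant regressors in the model.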

Estimation results
The models for the northwest and south of Europe provide high levels of explanatory power, with R-squared values from 0.87 to 0.94 (see Table 4). The model for the east explains a smaller share of the interregional variability (R² = 0.67).
It is clear from the results that the number of researchers in higher education institutions (HEIs) is the most important input, with effects that are in the neighbourhood of constant returns to scale in all regions. On the other hand, the number of researchers in high-technology industries has no significant effect on the regional published output, with the exception of a possible and unexpected negative effect in the east.
A positive effect of hosting older universities is evident in northern and western Europe. In the south and east, however, there is no significant effect. While it is impossible to infer the cause of this difference from the estimation results, one possibility is that there has been greater continuity of university autonomy in the north and west than elsewhere, due to long periods of autocratic rule in the 20th century in all eastern countries, as well as in Greece, Portugal, and Spain. Similar periods but with shorter duration may have affected university institutions in Germany and Italy. Except for Germany from 1933 to 1945, northern and western countries have had uninterrupted traditions of pluralist institutions spanning several centuries.
Diversification of the regional research structure has a strong positive effect on publication volumes in northern, western and southern Europe, although there is no significant effect in the east. Because a higher SHHI value means greater specialization, this diversification effect appears as a negative coefficient. Our estimates are larger in magnitude than in D. E. Andersson et al. (2020), which reported a diversification effect of −1.32 in a subset of 153 NUTS2 regions (see Note 3) in the 2007-2012 period, as compared with our estimated effect of −1.56 in 132 northern and western NUTS2 regions and −2.08 in 60 southern NUTS2 regions from 2001 to 2013.
The effect of the accessibility variable is restricted to Europe's post-industrial core in the northern and western parts of the continent, which as a whole mostly consists of regions with high or moderate accessibility, but with some low-accessibility regions in northern Finland and Sweden. There is no significant effect in southern or eastern Europe, which involve regions with low or moderate accessibility. One possible interpretation is that accessibility benefits only come to the fore at the highest levels of accessibility, such as in Benelux and the German Land of North Rhine-Westphalia. Another interpretation is that north-western Europe consists of dense corridors of science centres, whereas scientists in the south and east are concentrated in national capitals and a handful of large cities such as Barcelona and Milan. If the latter interpretation is correct, it would imply that one should give greater weight to airport accessibility and less weight to roads and railroads than in the present study. But we cannot offer a conclusive answer on the basis of the results reported here, other than that good accessibility enhances the productivity of scientists in the European heartland.
The income variable, which is a common proxy for general economic development, is significant in the northwest and east. It exerts an especially strong estimated effect in the east, which may reflect the low levels of income and development in the east, particularly in rural and peripheral regions. These regions are decades behind the cutting-edge postindustrial structure of Western Europe's leading knowledge conurbations.

Conclusions
The empirical analysis in this paper represents the most comprehensive study of aggregate science production at the regional level in the EU. It is the first study to make use of a multivariate econometric model for almost all NUTS2 regions. The observations encompass every year in the 2001-to-2013 period. We can draw five main conclusions on the basis of our results.
First, the key input for knowledge production in the form of peer-reviewed scientific publications is the number of researchers in higher education institutions. The estimated elasticity is in the vicinity of one in all parts of Europe, which implies that output is proportional to input of HEI researchers at the regional level, other things being equal.
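The proportionality claim follows from the log-linear form of the model. As a brief sketch, with \beta_U denoting the estimated elasticity with respect to HEI researchers (a symbol assumed here) and all other factors held constant:

```latex
\[
\frac{\partial \ln PUB}{\partial \ln USCI} = \beta_U \approx 1
\quad\Longrightarrow\quad
PUB \propto USCI^{\beta_U} \approx USCI
\]
```

For example, with an elasticity close to one, a 10 percent increase in the number of HEI researchers is associated with an increase in publications of roughly 10 percent.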
Second, researchers in high-tech industries do not have a measurable impact on overall publication output. This does not necessarily reflect substandard research quality. A more reasonable interpretation is that research efforts that involve profit-seeking firms are less directed toward wide dissemination of scientific findings, and more toward proprietary research objectives such as new patents and other sources of competitive private advantage.
Third, diversification of science has a substantial and significant impact on research output in Europe. This effect is particularly strong in the south, which on average hosts the most specialized agglomerations of science (see Table 4). Our results support the argument that multidisciplinary research is more likely to lead to knowledge gains than narrow specialization in departmental silos, although the presented evidence is indirect rather than direct.
Fourth, older universities seem to be associated with more output-oriented institutions than newer universities and colleges in the scientific heartland of Europe, consisting of all the main northern and western countries. This effect is however only weakly evident in the east, and not at all evident in the south. We hypothesize that this may be due to the importance of having an unbroken tradition of academic independence and pluralism, but to conclude whether this is indeed the case is beyond the scope of the present study.
Fifth, the positive effect of accessibility in transport networks also seems to be limited to the European core. D. E. Andersson et al. (2020) concluded that accessibility mainly benefits small regions; our results complement that finding by using other criteria for segmenting EU regions. In this study, accessibility advantages mainly accrue to researchers in the north and west. Since this is on the whole the subcontinental area with the best accessibility, the results seem to indicate that accessibility benefits mainly affect researchers in high-accessibility regions, with negligible productivity differences between scientists in moderate and low accessibility regions.
Reformulating these conclusions as four key scientific competitiveness criteria would imply that north-western regions with large numbers of university researchers, low levels of disciplinary specialization, good accessibility within multimodal transport networks, and ancient universities would be especially suitable for scientific research. Cambridge and Oxford are examples of university towns that perform exceptionally well according to these criteria, as are a small number of large conurbations, notably Cologne-Bonn-Aachen (DEA2) and Paris (FR10).

Notes
1. The Web of Science employs rigorous quality criteria for the inclusion of journals, such as the rigour of the peer review process and measures of the organizational and geographical diversity of editorial board members and published authors. It is thus more effective at excluding so-called 'predatory journals' than the broader Scopus index of scientific publications.
2. Most of the missing data problems at the NUTS2 level of observation concerned USCI. Noting the relative stability of this variable in regions with observations for all years in the 2001-2013 time period, we decided to use a fixed quantity that corresponds to the year with the greatest number of regional observations, 2011, or, in isolated cases, 2010 or 2012. Data availability issues relating to this variable also explain the exclusion of six small EU countries and some Belgian regions from the analysis.
3. Most observations in D. E. Andersson et al. (2020) were in three large and two mid-sized countries where they analyzed all rather than a subset of regions: France, Poland, Slovakia, Spain, and Sweden.

Table 2. Regions in the Northwest, South, and East of the European Union with more than 2,000 fractionalized Web of Science publications (PUB) per year, 2000 and 2013.

Table 3. Descriptive statistics for NUTS2 regions in the European Union: North & West, South, East, and Combined, 2001-2013.
a USCI is the number of full-time-equivalent scientists in higher education institutions.
b ISCI is the number of full-time-equivalent scientists in four two-digit high-technology industries.
c INC refers to the income generated directly from market transactions. This includes income from the sale of labour services. It also includes asset income such as interest income, dividends and rents. In addition, there is income from net operating surpluses and self-employment. Interest and rent payments are recorded as negative income. The balance is the primary income.
North & West: Austria, Belgium, Denmark, Finland, France, Germany, Ireland, Netherlands, Sweden, and the United Kingdom. South: Greece, Italy, Portugal, and Spain. East: Bulgaria, Czech Republic, Hungary, Poland, Romania, Slovakia, and Slovenia.

Table 4. Generalized least squares models with random effects and robust standard errors; dependent variable: natural logarithm of fractionalized number of Web of Science publications (PUB); NUTS2 regions in the European Union: North & West, South, and East, 2001-2013.
North & West: Austria, Belgium, Denmark, Finland, France, Germany, Ireland, Netherlands, Sweden, and the United Kingdom. South: Greece, Italy, Portugal and Spain. East: Bulgaria, Czech Republic, Hungary, Poland, Romania, Slovakia and Slovenia.