Global science and national comparisons: beyond bibliometrics and scientometrics

ABSTRACT In the last three decades, a networked global system has emerged in the natural-science-based disciplines, sustained by collegial epistemic relations in universities. Nationally ordered and funded science has expanded alongside the global science system. The common global pool of papers, defined by bibliometric collections, nevertheless excludes large components of knowledge. In the global system, four tendencies are apparent: (1) rapid growth of papers, (2) diversification of scientific capacity to many more countries, (3) expansion of networked international and national collaboration as measured by co-authorship, (4) growing multi-polarity of capacity, outputs and quality, with the rise of China and several middle-sized national systems outside the Euro-American bloc. The paper critiques the interpretation of global science dominant in scientometrics, in which positivist data analyses are applied to performative national comparisons. It argues for a historical-synthetic explanation of the global system that combines data and theorisation, and accounts for relations of power.


Introduction
After the establishment of the Internet in the early 1990s, networked communications in research and scholarship expanded at a rapid rate. A global science system emerged, composed of four elements: (1) a common pool of knowledge in the natural science-based disciplines, defined by two bibliometric collections of papers, Web of Science and Scopus; (2) scientists, who produce and exchange knowledge; (3) the structure of communications between them; and (4) practices and protocols that govern their work. 'Scientists are organised in global epistemic communities that codify their knowledge in peer-reviewed articles published in specialist journals' (Wuestman, Hoekman, and Frenken 2019). By the term 'global' is here meant activities and relations that constitute a planet-level ontology and move towards a more integrated world over time (Conrad 2016). By 'system' is meant elements that form an interactive whole within defined boundaries. 'Global science' predominantly refers to natural science-based fields of knowledge, though the global pool includes a modest volume of social science papers and some work in the humanities.
Two elements are key to understanding contemporary science. One is the flourishing of activity on the global scale, which is exceptionally dynamic and now leads most disciplines, so that the natural sciences are readily imagined as a single field of creation, exchange and comparison, though by no means all knowledge is globally recognised, as will be discussed. The dynamism on the global scale is apparent in four areas. First, since 2000 globally circulated papers have increased by 5% a year. Second, capacity in natural science-based fields is increasingly dispersed on the global scale, with middle-income and some low-income countries forming endogenous self-reproducing science systems. Third, papers co-authored by networked scientists from more than one organisation have grown faster than papers overall, and internationally co-authored papers have grown especially rapidly. Fourth, while the United States (US) is still the world leader in high citation science and networked collaboration, the old Euro-American global order in science is becoming pluralised, with the rise of China and several other large national science systems outside Euro-America.
The other key to science is that it has evolved a dual heterogeneous system structure, global science and national science. Global and national science are distinct in form. Global science is self-managed by scientists in distributed professional networks. National science is normatively centred by nation-states. National systems are composed not only of knowledge, people, networks and protocols, as in the global science system, but also of laws, regulations, policies, agencies, institutions, infrastructures, and especially funding (in that regard European research programmes replicate the role of nation-states, at the regional scale). Wagner, Park, and Leydesdorff (2015) describe global science as 'operating orthogonally to national systems' (12). The two kinds of science also overlap and interact. They share the pool of global papers and also share scientists who are active in both the global and national domains (Marginson 2021a; 2021b). National science includes both the national part of global scientific activity and also scientific infrastructure, activity and outputs that fall outside the global circuit. The two science systems are necessary to each other. Nation-state ordered science draws on cutting-edge global knowledge and networks. At the same time, global activity is sustained by the ordering and resourcing of science in national systems, and in organisations nested in nations, especially research universities. Though less than a fifth of R&D spending is in higher education (OECD 2021), 85% of papers have at least one university author (Powell, Baker, and Fernandez 2017, 2, 8-9). Universities and their scientists have long worn both national and global hats. What has changed is the weight of activity on the global scale.
The purpose of the present paper is to contribute to understanding of the evolving global science system. This system both transcends nation-states, and constitutes a field of comparison between them, though it is constrained by its bibliometric borders. The paper also reflects on the potentials and limits of the (primarily positivist) methods used in studies of science that analyse bibliometric data, including the widespread but flawed application of these methods to performative national comparisons of global science. The paper argues that large data-based inquiry is part of the explanation of global science, not the whole of it. This should be combined with other methods, including historical-synthetic theorisation.
The paper begins with inclusions and exclusions in global bibliometrics, and comments on scientometrics. It then discusses the tendencies that show in the global data: growth of global papers, the spread of scientific capacity, growth of co-author networks, and multipolarity in science power. The section that follows critically reviews national comparison studies in scientometrics. The conclusion outlines a preferred approach to understanding global science.

Bibliometrics and scientometrics
The bibliometric collections Scopus, owned by Elsevier, and Web of Science (WoS), owned by Clarivate Analytics (Waltman 2016), can be accessed both directly and through secondary data sources, such as the US National Science Board's biennial science and engineering indicators that use Scopus (NSB 2020), and the Leiden University (2021) ranking that uses WoS. Scopus and WoS have normative, practical and empirical-analytic functions. They set the boundaries of recognised global knowledge, provide the content of networked epistemic collaboration and exchange, and source the investigation of global science. Bibliometrics enable the categorisation and analysis of papers, authors, scientific groups and citations by discipline, topic, institutional affiliation, demographic characteristics and geographic location. This large data set is worked for many purposes, scientific and performative.

Bibliometric inclusion and exclusion
Bibliometrics give material form to global science as an epistemic network of persons and knowledge. That epistemic network reciprocates, providing the content and purpose of bibliometrics. The global system of science is constituted both by acts of production and collaboration, and acts of recognition such as inclusion and citation, which legitimate and delegitimate knowledge. All in their own way are acts of power, but whereas scientific production is open and without limit, scientific selection enables ordering and hierarchy. The inside/outside boundaries imposed by Scopus and WoS regulate academic norms of valid output and quality, government definitions of performance in science, and industry perceptions of useful knowledge. The regulatory function of bibliometrics is exercised by commercial companies that specialise in information and publishing, working with professional, scientific communities. Between them, they determine the norms governing global science. Changing those norms means changing the networked global system.
Currently, a large volume of scientific and other disciplinary knowledge falls outside Scopus and Web of Science. English is the first language of 5% of global population (Ethnologue 2018), yet in 2018, 95.37% of WoS publications and 92.64% of Scopus publications were in English (Vera-Baceta, Thelwall, and Kayvan 2019). Consider, for example, that Chinese scientists published 528,623 papers in Scopus in 2018, nearly all in English (NSB 2020, Table S5A-2), but 447,800 papers in the Chinese Scientific and Technical Papers and Citations database, nearly all in Chinese, including 21,605 papers in Chinese medicine (CSTPCD 2021), which is of global interest but largely inaccessible to non-native speakers.
In the social sciences and humanities, national contexts, cultural phenomena and issues are often central. Discussion is usually in national languages. However, there are no standard translation protocols that bring non-English papers into global English, and book translations from English to other languages outweigh translation from other languages into English by a factor of 10 (Naravane 1999). Even in relation to work in English, Scopus and WoS exclude many humanities and non-quantitative social science journals and nearly all books, often primary content in these disciplines. 'Global' social science in bibliometrics is mostly focused on Anglo-American countries (Marginson and Xu 2021). Using China again as the example, in 2018, Chinese social science scholars published 5,486 papers in English in Scopus, often using Anglo-American theories, concepts and topics (Xu 2020), and 53,300 papers in Chinese (CSTPCD 2021). The academic world knows US society much better than society in China or India, yet these countries are as important as the US.
This problematic invokes Foucault's (1975) question: what is the regime of truth? Behind Clarivate Analytics and Elsevier, and the scientific communities with which they exchange recognition, lies a long story of epistemic, linguistic, institutional and political-economic power. Beigel (2014) notes that the bibliometric collections use Anglo-American templates for codifying and circulating knowledge, such as the journal form. Editors are mostly Anglo-American-European and expect 'Northern' theories and methodologies. The definition, validation and selection of knowledge are legitimated and reproduced by Anglo-American universities, which dominate global rankings because they validate the knowledge they produce. It is not just about unequal resources, or output volume, it is about recognition. 'Scientific research in the non-OECD world generally suffers from a lack of visibility and prestige' (Vessuri, Guedon, and Cetto 2014, 650). Beyond that, most endogenous (indigenous) knowledge is excluded altogether (Connell 2014, 211-212).
Global knowledge is an outgrowth of Euro-American (primarily Anglo-American) content, reproduced by Euro-American agents and processes. It is surprising that in a multi-polar world there is little demand for publishers and bibliometricians to use multi-lingual translation, which software is bringing within reach. A theorised explanation might lie partly in Gramsci's (1971) idea of hegemony. Consent to authority can have cultural-linguistic roots as well as political-economic roots. The two kinds of power seem to be out of phase.

Limitations of scientometrics
The scholarly field of interrogation of bibliometric data is titled, perhaps unfortunately, 'scientometrics' (for field reviews see Mingers and Leydesdorff 2015; Patelli et al. 2017). Approximately 200 papers in scientometrics were read for the present paper (Marginson 2021a). Scientometricians seek to establish patterned regularities and identify causality in science in positivist fashion by establishing numerical correlations between variables. Some use social network analysis, a set of methodological techniques, primarily quantitative, for establishing relations between units (Scott 2017, 2), for example network diagrams that are used to map linkages between scientists, institutions and countries, or to map flows of knowledge via citations within scientific conversations in defined sub-fields and topics.
Scientometrics can provide partial glimpses of aspects of global science. Some are drawn on in this paper. However, the powerful quantitative tools of scientometrics encourage a fixation on that which can be measured and neglect of that which cannot. Scientometrics is often used to explain more than it can. The scientometric testing of variables cannot enable a comprehensive theorisation, yet few scientometric studies (Wagner and Leydesdorff 2005 is one) supplement data analysis with theory. The shallow ontology is often compounded with academic boosterism. Gaps in the explanation are addressed by presenting correlations in suggestive fashion, inferring causality without claiming it, or by noting scepticism about causality but presenting correlations as default (see Cimini, Zaccaria, and Gabrielli 2016, below). Some authors also develop interpretive narratives, rather than theorisations, in the discussion sections of papers. These narratives usually lack empirical or theoretical foundation but gain a certain referred authority from the rigour of the preceding data work.
Bibliometrics and scientometrics are also taken where they should not go. Many studies slip into datafication (Sadowski 2019), whereby real objects and practices are transformed into digital data, which are then represented as an abstract system of value (e.g. citations are seen as uniformly positive and each citation is given a unit value). This blocks from view the specificity and diversity of context, agents and purposes, while creating tools, grounded in false universals, that are widely used to compare institutional and national performance.

Growth of papers
The first empirical trend in global science is its rapid growth. Between 2000 and 2018, papers in Scopus increased from 1,071,952 to 2,553,959 (Figure 1), an average annual growth of 4.94% (NSB 2020, Table S5A-2), while world GDP grew more slowly, by 3.52% per year (World Bank 2020). If a science country is defined as one with doctoral training in some disciplines, enabling self-reproduction, and citizens who publish 5000 Scopus papers a year, weighted for author share, then between 2000 and 2018 all established science countries except Japan saw marked growth in output, and most newly emerged science countries saw very rapid growth (see below). The Leiden University (2021)
The growth in bibliometric output loosely correlates to expanded national system capacity, as shown in the government funding of research in universities and public institutes and the researcher workforce. For example, in China from 2000 to 2018, funding for higher education research multiplied by 10.24 in constant prices, while Scopus papers by Chinese authors multiplied by 9.96. Correspondingly, Japan's funding was multiplied by 1.03 and papers multiplied by 1.02 (NSB 2020, Table S5A-2; OECD 2021). Between 2000 and 2015, the number of doctoral graduates grew by 2.9% per annum in the US, 5.7% in the UK and 10.9% in China (NSB 2020, Table S2-16). In Germany, equivalent full-time researchers in higher education grew from 67,087 in 2000 to 114,868 in 2018 (OECD 2021).
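The annual growth figure for Scopus papers can be checked with a compound growth calculation. The following is a minimal sketch in Python, assuming that the 4.94% is a compound annual rate over the 18 years from 2000 to 2018. The function name `annual_growth_rate` is illustrative and is not taken from NSB (2020) or any other source. The same function converts the 18-year multipliers for China into annual rates.

```python
def annual_growth_rate(start_value, end_value, years):
    """Compound annual growth rate as a fraction (0.05 means 5% a year)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Scopus paper counts quoted in the text (NSB 2020, Table S5A-2).
papers_2000 = 1_071_952
papers_2018 = 2_553_959

world_rate = annual_growth_rate(papers_2000, papers_2018, 2018 - 2000)
print(f"World papers: {world_rate * 100:.2f}% a year")  # prints 4.94% a year

# A multiplier over the period is the ratio end/start, so a start value
# of 1.0 turns the multipliers quoted for China into annual rates.
china_paper_rate = annual_growth_rate(1.0, 9.96, 18)
china_funding_rate = annual_growth_rate(1.0, 10.24, 18)
print(f"China papers:  {china_paper_rate * 100:.1f}% a year")
print(f"China funding: {china_funding_rate * 100:.1f}% a year")
```

Expressing multipliers and paper counts on the same annual scale makes the loose correlation between funding and output easier to compare across systems of very different size.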
The issue requiring explanation is what drives the accelerated growth of capacity and output. Within science, is it a function of bibliometric inclusion or growth in knowledge itself and/or the collective scholarly activity that nurtures it? Beyond science, is it driven by intensified national competition and investment or by an expansionary dynamic in networked communication that stimulates national investment at the scientific nodes?
There has been growth in the number of journals, and papers per journal (Powell, Baker, and Fernandez 2017, 5), and journal inclusion has expanded, especially in Scopus. However, these factors are not sufficient to determine the growth in measured output. Studies that focus on national performance in science often see science growth as a function of economic growth, or vice versa (e.g. King 2004). The state's role is to secure a virtuous causality between autonomous global science and capital accumulation. However, it is never explained how this causal relation is mediated. After all, little science is produced on a profit-making basis, and nation-states do not directly drive scientific output or govern global science networks.
Figure 1. Total science papers in Scopus (right-hand axis) and proportion (%) that were internationally co-authored and solely nationally co-authored (left-hand axis), world: 1996-2018. Nationally co-authored papers are collaborations between authors from the same country and different institutions. Internationally co-authored papers involve authors from more than one country. Source: Author from Scopus data in NSB 2020, Table S5A-32.
In The Rise of the Network Society (2000), Castells theorises the intrinsic expansionary dynamic of Internet-based networks. This idea has salience, provided there are motivations to connect. After 1990 global science moved from a miscellany of episodic contacts and shared work to a structured relational system, with instantaneous communication, in which commonly-held knowledge was the basis and outcome of cooperation. In networks, each new node is added at negligible cost and enhances the value of all existing connections. Networks expand towards all possible nodes while intensifying their existing links. The drive to inclusion, ease of connections and visibility of outputs all facilitate scientific growth (Wagner, Park, and Leydesdorff 2015).
This helps to explain the observable synergy between national science-building and the growth of global science. National investment resources the globally networked activity. The ever-expanding pool of global knowledge catalyses and quickens national and local infrastructure so they can access global science. Nevertheless, this raises further questions. What holds together global and national science? What are the drivers and the potential tensions? Is there any constraint on the social logic of continuous expansion?

Diversification of science countries
The second empirical trend is the spread of scientific capacity to ever more countries. In 1960, the US accounted for almost 70% of global R&D. World science, organised in separated national systems, developed primarily as a duopoly of the US and Western Europe. The duopoly was linked to Japan, the Anglophone settler states, and Israel, though Soviet science and technology were largely decoupled from it. The situation is now very different. The identifiable global system has been accompanied by a wide diversification of national scientific capacity connected to the global. In 2018, there were 53 systems with doctoral training and 5000 papers by citizen scientists (NSB 2020, Table S5A-2), and the US proportion of global R&D had fallen to 25% (Flagg, Toney, and Harris 2021, 2).
In 2000, countries classified as high-income by the World Bank published 84% of papers. By 2018 this proportion was 56%. Upper-middle income countries such as China, now the largest producer of science in terms of paper volume, Brazil and Malaysia, lifted their share from 12% to 34%. Lower-middle income countries, including India, now the third-largest producer, rose from 4% to 9% (NSB 2020, Figure 5A-1). Figure 2 lists all nations producing more than 5000 papers in 2018. In systems above the horizontal line, the annual growth of papers in 2000-2018 exceeded the world average of 4.94%. Of the 27 fast-growing countries above that line, 13 had a per capita PPP GDP below the world average, again underlining the fact that science is no longer confined to relatively wealthy countries. High citation science has also diversified. In 2006-2009, there were 30 national systems with universities producing more than 100 papers in the top 5% by citation. In the 2016-2019 count, there were 40 such systems (Leiden University 2021).
World-systems theory, which is cited from time to time in studies of science (e.g. Olechnicka, Ploszaj, and Celinska-Janowicz 2019), explains global science in terms of a worldwide division of labour in which the Euro-American core is absolutely dominant and science in 'peripheral' and 'semi-peripheral' countries has limited potential. While this may apply to the cultural content of global science, which, as noted, is Euro-American, it applies less to scientific capacity and output. Wagner, Park, and Leydesdorff (2015) argue that the open network structure of global science encourages new entrants. They identify a decline in 'network betweenness', meaning that a lesser proportion of 'edges' (links between nodes of the network) pass through the leading countries (12). These do not gate-keep within the system. Choi (2012) finds that the fastest-growing relation in global science is 'periphery to periphery' networking (39).
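The idea of declining betweenness can be illustrated with a toy network. The sketch below is illustrative only. It uses the standard graph-theoretic betweenness centrality (the number of shortest paths between other nodes that pass through a given node), computed with Brandes' algorithm, which may differ from the exact measure used by Wagner, Park, and Leydesdorff (2015). The two five-node networks and the node names are hypothetical and are not drawn from any bibliometric data.

```python
from collections import deque


def betweenness(graph):
    """Betweenness centrality for an unweighted, undirected graph.

    `graph` maps each node to a list of its neighbours. The score of a node
    is the number of shortest paths between other pairs of nodes that pass
    through it, with ties shared equally (Brandes' algorithm).
    """
    score = {v: 0.0 for v in graph}
    for source in graph:
        order = []                          # nodes in order of distance
        preds = {v: [] for v in graph}      # predecessors on shortest paths
        paths = {v: 0 for v in graph}       # number of shortest paths
        dist = {v: -1 for v in graph}
        paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:                        # breadth-first search
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        dependency = {v: 0.0 for v in graph}
        while order:                        # accumulate from farthest node
            w = order.pop()
            for v in preds[w]:
                dependency[v] += paths[v] / paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Each pair is counted from both ends in an undirected graph.
    return {v: s / 2 for v, s in score.items()}


# Hypothetical core-periphery network: every link runs through the hub.
star = {
    "hub": ["a", "b", "c", "d"],
    "a": ["hub"], "b": ["hub"], "c": ["hub"], "d": ["hub"],
}

# The same network after 'periphery to periphery' links are added.
meshed = {
    "hub": ["a", "b", "c", "d"],
    "a": ["hub", "b", "d"], "b": ["hub", "a", "c"],
    "c": ["hub", "b", "d"], "d": ["hub", "c", "a"],
}

star_scores = betweenness(star)
meshed_scores = betweenness(meshed)
print(star_scores["hub"], meshed_scores["hub"])
```

In the first network all six pairs of peripheral nodes must connect through the hub. Once the peripheral nodes link to each other directly, the hub's score falls sharply, which is the pattern that a decline in gate-keeping would produce.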

Collaborative knowledge production
The third empirical trend is the growth of international collaboration in science.
Knowledge formation is both individual and collective, as Vygotsky (1978) shows in his work on infant speech development. All intellectual innovations rest partly on the ideas of others. Creativity is often articulated through teams, and knowledge cannot be recognised until it is communicated. Yet bibliometrics settles authorship of scientific works on named individuals and combinations of individuals. Likewise, patent recognition settles absolute ownership on individuals and corporations. When knowledge is a discrete possession, its attribution is determined not only by cognitive merit but by social and institutional hierarchy. Arguably, these individualised categories do not wholly capture the ambiguity and fluidity of the imagination, and the shared flows of knowledge across blurred boundaries. Not all aspects of international collaboration are visible in the data on global science (Katz and Martin 1997). Two factors are visible and are frequently measured. These are the international citation of papers, which signifies recognition and a non-reciprocal cognitive transfer, and the international co-authorship of papers, which directly identifies a social relation.
In relation to citation, between 1996 and 2014, authors in most countries increased the proportion of their citations that were to international papers. In the US this rose from 42% to 56%, and in 2014 it exceeded 75% in most of Europe (NSB 2018, Table A5-42).
In relation to co-authorship, Figure 1 demonstrates a major growth between 1996 and 2018 in the proportion of Scopus papers co-authored by scientists with colleagues outside their own institution. (Such collaborations include international doctoral students and other mobile researchers. A joint paper by a German professor and her Chilean doctoral student in a German university is split 0.5/0.5 between the countries on the basis of authors but is 1.0 German on the basis of institution). Between 1996 and 2018 international co-authorships increased from 12.4% to 22.5% of papers, compared to just 1.9% of papers in Web of Science in 1970 (Olechnicka, Ploszaj, and Celinska-Janowicz 2019, 78). From 1996 to 2018, the proportion of papers with authors at two or more institutions in the same nation rose from 35.1% to 44.4% (NSB 2020, Table S5A-32). The growth in national co-authorship receives less attention than does international co-authorship, but both indicate the expansion of networked science.
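The two ways of attributing the German-Chilean paper can be made explicit in a short sketch. It assumes equal fractional counting, in which each paper carries a total weight of 1.0 divided equally among its authors. The function name and the field names `nationality` and `institution_country` are hypothetical, chosen for illustration, and do not reproduce the Scopus or NSB procedures in detail.

```python
from collections import defaultdict


def fractional_counts(authors, key):
    """Split one paper equally among its authors and sum the shares by `key`."""
    share = 1.0 / len(authors)
    totals = defaultdict(float)
    for author in authors:
        totals[author[key]] += share
    return dict(totals)


# The example from the text: a German professor and her Chilean doctoral
# student, both working in a German university.
paper = [
    {"name": "professor", "nationality": "Germany",
     "institution_country": "Germany"},
    {"name": "doctoral student", "nationality": "Chile",
     "institution_country": "Germany"},
]

by_author = fractional_counts(paper, "nationality")
by_institution = fractional_counts(paper, "institution_country")

print(by_author)       # {'Germany': 0.5, 'Chile': 0.5}
print(by_institution)  # {'Germany': 1.0}
```

The same paper therefore counts as an international collaboration on one basis and as a wholly national paper on the other, which is why the choice of attribution rule matters when national outputs are compared.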
There are significant variations by discipline in international co-authorship (Winkler et al. 2015, 129-130). Research programmes that share large equipment such as telescopes, synchrotrons or particle accelerators in physics (Jang and Ko 2019) or deal with subject matter that is intrinsically global, such as climate change, water management or epidemic disease, encourage co-authorship. The proportion is highest in the natural sciences. In 2016, 54% of all papers in astronomy entailed international collaboration, and 20-30% in geosciences, biological sciences, mathematics, physics and chemistry. The rate was 19% in medical sciences, 18% in engineering, and 15% in the social sciences. It was increasing in all disciplines (NSB 2018, 122).
Likewise, in research-intensive universities in almost all countries, the international proportion of papers has grown. In the fifty universities with the largest number of high citation papers in Scopus in 2016-2019 (see Table 1), the incidence of cross-border papers typically rose by 10 percentage points or more in the previous decade. Two thirds of the papers in the leading universities in non-US Anglophone nations, Europe and Singapore were international. The US universities in this group, led by Harvard with 40,877 internationally co-authored papers (52.0% of papers), were more internationalised than other US universities (Winkler et al. 2015, 129). In China, expanding national collaboration during accelerated system building slowed growth in the international share of papers (Leiden University 2021).

Conditions and drivers of collaboration
International collaboration is encouraged by national science systems, by grant conditions in Europe that foster teams (Kwiek 2020), and by individual universities via career incentives that reward global publishing (e.g. in China, Xu 2020). Nevertheless, bottom-up collegial relations and motivations are also essential to co-authored work. Though the Castellian theorisation explains the ease and excitement of networks, scientists must want to cooperate.
A large literature investigates or reflects on collaborative behaviours (e.g. Georghiou 1998; Birnholtz 2007; Winkler et al. 2015; Chen, Zhang, and Fu 2019). Do authors collaborate for shared knowledge-related reasons, which might be called cognitive accumulation? Are the drivers of science professional friendship and shared values, including the desire to advance the common good and the joy of shared breakthroughs? Are more self-centred goals in play, such as securing career advantages via 'preferential attachment' to senior colleagues (Wagner and Leydesdorff 2005), a notion from sociology that understands science as a vast field of individual competition? Arguably, all these factors are in play. More than one can be at work.
Earlier papers on science, such as the interview-based study by Melin (2000), emphasise shared values and trust. Explanations based on self-interest are dominant in scientometrics, such as desires for higher citation through international publishing and preferential attachment. Focus on these elements may be conditioned by the individualised nature of the bibliometric indicators that define scientific output. The narrative about preferential attachment is so pervasive that some scholars use it as a synonym for collaboration itself (e.g. Jeong, Neda and Barabasi 2003). This obscures the fact that the one common element in all epistemic collaboration, and the element that distinguishes intellectual work from other networked sociability, is cognitive accumulation. Co-authorship without preferential attachment is more plausible than co-authorship without shared cognitive processing. Motivations related to career, status (shared or singular), and intellectual formation are hard to separate. However, it is disturbing that the idea of scientific conduct is so readily dragged back to the old idea of 'possessive individualism' (Macpherson 1962), the Hobbesian order of fear, greed and glory, that pervades Anglo-American culture. There is more to knowledge building than this.
Linguistic, cultural, historical, geographic and political proximities can all encourage scientific collaboration across national borders (Chinchilla-Rodríguez, Sugimoto, and Larivière 2019; Graf and Kalthaus 2018, 1200). The absolute volume of internationally co-authored work is expanding almost everywhere. Growth in the proportion of papers that are international is more variable. Figure 3 compares the proportion of papers entailing international collaboration in 1996 and 2018. Systems above the oblique line saw an increase in the international proportion. This happened almost everywhere, but more so in the longer-established Anglophone and European science systems. In newly emerging systems, characterised by rapid growth in national infrastructure and the number of potential domestic partners, there was pronounced expansion in both national and international collaborations. Accordingly, there was modest growth in the international ratio (e.g. in China, and middle-sized science systems in volume terms such as South Korea, India, Brazil) or a decline in that ratio (e.g. Iran). In some small emerging systems, not shown in Figure 3, the international ratio in 2018 was much higher, because cross-border collaboration had been used as a substitute for, rather than a partner to, endogenous growth. Chinchilla-Rodríguez, Sugimoto, and Larivière (2019) suggest that the epistemic presence of national scientists in the global system, as indicated by levels of first authorship and citation, is correlated to the level of investment in national science.

Multi-polarity
Relations of power in global science are evolving rapidly. Figure 4 indicates the volume shifts between 1996 and 2018, from output dominated by the Euro-American duopoly to a multi-polar system in which China has the most papers. While universalised bibliometric data on outputs, citation and collaborations are insufficient to explain the changes, they can help in building a more nuanced and integrated explanation. This must include four elements.
First, the continued leading position of Anglo-American (especially US) scientists in most domains, as shown in high citation papers, centrality in networks, and, as noted, norms and processes in science. Second, the integration and strengthening of scientific capacity in Europe. Third, the explosive growth of science in China, and more generally in East Asia where paper output now exceeds Europe, despite the steady-state in Japan. China and Singapore have reached US levels of high citation papers in some disciplines. Fourth, the rise of the above-mentioned nation-building middle-sized systems, South Korea, India, Brazil and perhaps Malaysia, all located outside the Euro-American duopoly and less intensively networked into the duopoly than the duopoly countries are with each other. As well as growing basic science, South Korea is an exceptional investor in industry research. Its R&D as a proportion of GDP at 4.64% in 2019 was second highest after Israel (OECD 2021).

Rise of Chinese science
The largest single change is the rise of China (for extended discussion, see Marginson 2021b).
Science is not one network, but a looser combination of disciplinary and cross-disciplinary networks, and China's global weight varies by discipline (Table 2). Competition between nations in science and technology is primarily focused on physical sciences STEM disciplines, where Chinese scientists authored one paper in four in 2018. STEM output also grew rapidly in South Korea, Singapore and Iran. (Singapore is grouped with 'East Asia' because its political and educational cultures are underpinned by a Sinic heritage). In China from 2000 to 2018, engineering papers increased from 13,777 to 134,542, and computing from 3,981 to 69,932. The nation also led in chemistry, materials and physics. Europe led in volume in biological, biomedical and health, with the US not far behind. China's share was rising sharply but average citations were low (NSB 2020, Tables S5A-3 to S5A-16).
Citations do not conclusively indicate quality, but they signify epistemic recognition. China's high citation science improved more quickly after targeted funding in 2011 (Hu 2020). In 2016, 1.12% of its papers were in the top 1% by citation and China exceeded average EU citations in five disciplines (Table 3). China's growth of high citation science in the Leiden data suggests further advance after 2016. In the US, between 2010 and 2016, the ratio of top 1% papers fell from 1.95% to 1.88% but remained well above China and the EU. China led only in mathematics and statistics. There were large concentrations of top 1% work, relative to total papers, in UK, Switzerland, Netherlands, the Nordic countries and Singapore (2.97%) (NSB 2020, Figure 5A-9).
China's successive 211, 985 and Double-World Class projects have focused on selected universities. In high citation global science, in internationally comparative terms, China is stronger in its leading universities than in the system as a whole. In Table 1 (above), showing the top 5% papers in 2016-2019, a total of 24 of the first 50 universities, almost half, were from the US and Harvard towered over the field. Another 11 were Anglophone. Just two were from continental Europe, which has smaller universities, but 13 were from East Asia, including 11 from China. Tsinghua was sixth in the world. Seven years earlier, in 2009-2013, East Asia had only Tokyo at 36, National University of Singapore at 47, and none from China (Leiden University 2021).
There is no precedent, in any national system, for growth and improvement of science at the rate achieved in China and, given China's scale, its development is globally transformative. Figure 5 presents the trajectory of high citation science in universities from East Asia, Anglo-America and Western Europe, selected on the basis of top 5% papers in 2016-2019 (see Table 1 for paper volumes). Between the Leiden counts of 2006-2009 and 2016-2019, a 10-year period, top 5% papers grew by 14.62% a year at Tsinghua, and 24.62% a year at Huazhong. Parallel growth in Europe was modest, and there was little change in the US (Leiden University 2021).
Tsinghua led all universities in the world in the Leiden University (2021) list of top 5% papers in physical sciences and engineering in 2016-2019, and also the parallel list in mathematics and computing. Combining the two lists, Tsinghua was the number one STEM university, well ahead of MIT. The only other universities in the top 10 in both lists were Zhejiang, Harbin IT and Huazhong UST in China and Nanyang in Singapore. However, in biomedicine and health, the top ten universities in top 5% papers were all Anglophone, with seven from the US. The first Chinese university was Shanghai Jiao Tong at 42.
The exceptional growth of basic science in China and Singapore, and the rapid growth in South Korea, Taiwan, and before that in Japan, raises questions for theorisation. Why the pattern of accelerated development? What are the respective roles of national investment and global collaborations? What are the resources and drivers in collegial communities? Is the skew in favour of STEM a passing phase or does it have deep cultural roots? Marginson (2011, 2021b) postulates a distinctive state-driven East Asian approach, focused on catch-up to the West, grounded in a Confucian ethic of continuous reflexive improvement, that has been especially effective in building of universities and published knowledge. On the darker side, it has been suggested that China's science building is sustained by hyper-performative university cultures fostered by the party-state and undermined by corrupt practices (e.g. Yang 2016). Arguably these limitations are not confined to China, and they negate neither the agency of Chinese scientists nor their expanding global contribution. A larger problem in China is the political constraints affecting the social sciences and humanities.
Scholars in scientometrics have been slow to grasp the growth of science in China (aside from scholars with Chinese names). Social network analyses of nation-to-nation patterns of collaboration find that scientists in China, Japan and South Korea, and also in India and Brazil, have lower global 'centrality' in the technical sense than Euro-American science (e.g. Zhang, Rollins, and Lipitakis 2018). Graf and Kalthaus (2018) imply that this lesser international connectedness, combined with the rapid growth of national co-authorship, means that quality is lower than in the duopoly countries: 'Asian countries … do not fully exploit their knowledge sourcing potentials' (12). Olechnicka, Ploszaj, and Celinska-Janowicz (2019) develop a sharper critical narrative of China. They claim, with little discussion, that 'the Confucian culture does not support collaborative behaviour', or 'critical thinking', and the Chinese state inhibits collaboration (155-156). However, it is a mistake to expect science in emerging systems to necessarily follow Euro-American patterns and trajectories, as if there is only one pathway.
China has been especially active in international collaboration with science in one country, the US. The 55,382 joint US-China papers in 2018 constituted much the largest such national coupling in science. Time will tell whether this survives American policy efforts to decouple China-US relations (Lee and Haupt 2020).

Global science data in national comparisons of performance
Bibliometric data and scientometric analyses are widely and problematically used to compare scientific performance. The three most prominent global university rankings are partly composed from bibliometrics, enabling the context-free calibrations used to norm and order institutions. It is an extreme example of the de-historicised use of global science data. Rankings as datafication are critiqued elsewhere (e.g. Moed 2017). However, the main comparisons used in scientometrics are not of institutional performance but national systems. Cimini, Zaccaria, and Gabrielli (2016, 200) make a claim for a 'science of science policy', which is primarily concerned not to map the global space but to compare one nation with others. This work focuses on the quantity and quality of scientific outputs, and outputs in relation to finance. Positivist comparative studies test statistical correlations between separated variables in order to identify 'causal' relations that might be suggestive for policy (Abramo and D'Angelo 2020 review the field). However, national comparisons in scientometrics are a fraught exercise.
The foundational studies in the comparative genre are by May (1997) and King (2004). In specifying limitations, May states that 'the above comparisons are to a degree confounded because a large and growing fraction of scientific work involves international collaboration'; and 'there is an English language bias in the ISI [WoS] database, both in the journals included and in patterns of citation' (795). Both points are often repeated. For example, Bornmann, Adams, and Leydesdorff (2018) comment that once collaboration growth, mobile researchers, multiple institutional affiliations, and multiple international citations are taken into account, 'it becomes increasingly difficult in bibliometric analysis to separate clear country effects' (942). However, this has not disrupted the standard practice of splitting relational data on co-authored papers on an arbitrary basis between nations, whether or not the contributions are equivalent and whether or not the national contexts matter at all.
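How much the allocation rule matters can be seen in a toy calculation. The sketch below contrasts two conventions commonly used to divide co-authored papers between nations, whole counting and equal-share fractional counting. The text above does not specify which rule any given study applies, so both are shown as assumptions, and the four papers and their author countries are entirely hypothetical.

```python
from collections import Counter

# Hypothetical papers: each entry lists the country of every author.
papers = [
    ["US", "CN"],
    ["US", "US", "CN"],
    ["CN"],
    ["DE", "US", "CN", "CN"],
]


def whole_counts(papers):
    """Credit one full paper to every country that appears on it."""
    counts = Counter()
    for authors in papers:
        for country in set(authors):
            counts[country] += 1
    return counts


def fractional_counts(papers):
    """Split each paper equally across its authors, then sum by country."""
    counts = Counter()
    for authors in papers:
        share = 1 / len(authors)
        for country in authors:
            counts[country] += share
    return counts


whole = whole_counts(papers)
fractional = fractional_counts(papers)

for country in sorted(whole):
    print(f"{country}: whole={whole[country]}, fractional={fractional[country]:.2f}")
```

Neither rule reads the papers themselves. Whole counting credits the four papers eight times over, while fractional counting assumes every author contributed equally, which is the arbitrariness that the paragraph above describes.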
In this process, global science becomes re-imagined, from a system in itself to a mosaic of separated national systems. This conceals not only its interconnectedness, but its existence.
Another flaw is generated by datafication. To compare national systems for performative purposes, scientometrics needs indicators with constant value across all cases. Though the meanings of scientific relations between agents are always contextualised, bibliometric data on citation and international collaboration are used as abstract universals for analytical purposes. This is sustained by a normative narrative about virtue in science. First, the value or 'quality' of a paper, or a national science system, is seen as proportional to its citations. Second, as internationally collaborative papers mostly have above-average citation rates (see e.g. Khor and Yu 2016, 1096; Ronda-Pupo 2019, 1049), when the rate of international co-authorship is higher, the system/university is seen as better in quality. The statistical correlation becomes causal.
There is a double leap in logic (citation = quality; correlation = causation). This makes some uneasy, but perhaps not for long. Cimini, Zaccaria, and Gabrielli (2016) begin by stating 'it is necessary to point out that the presence of a possible cause-effect relationship between scientific success and international collaborations is still an open issue' (201). Nevertheless, by the end of the paper, the claim about causality is unqualified: 'internationalisation emerges from our analysis as an additional fundamental parameter for the scientific development of nations' (210).
However, the standard narrative lacks adequate empirical support. First, citations cannot provide a currency for valuing knowledge or its creators. Tahamtan and Bornmann (2019) review 41 studies of citations. They find 'a paper may be cited for very different scientific and non-scientific reasons' (1635; also Patelli et al. 2017, 1230). Are citations expressions of agreement, cognitive debt, legitimation, field identity, status building, mutual support, or national or institutional identity? Single citations can incorporate multiple reasons. Second, the relation between international publication and citation is not statistically constant. Kwiek (2020, 16) notes that in five countries in 2018, including the US and China (important cases), domestic-only co-authored papers received higher rates of global citations than international papers. In those international collaborations where emerging system authors are first named, citation is often lower (Chinchilla-Rodríguez, Sugimoto, and Larivière 2019, 12).
Third, co-authorships, like citations, have many meanings. Partners can be weaker or stronger, less or more determining of the knowledge produced within an international collaboration. The proportion of papers that entail international collaboration is determined by the denominator as well as the numerator. Are systems, or universities, less internationalising, when nation-only papers expand at the same time as international papers? Not necessarily.
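The denominator effect can be shown with simple arithmetic. In the sketch below all counts are hypothetical, chosen only to illustrate the point, and the function name `international_share` is an assumption introduced for this example.

```python
def international_share(international: int, domestic_only: int) -> float:
    """Share of all papers that involve international co-authorship."""
    total = international + domestic_only
    if total == 0:
        raise ValueError("no papers to compute a share from")
    return international / total


# Hypothetical system in two periods: international papers grow by half,
# but nation-only papers double.
before = {"international": 2_000, "domestic_only": 6_000}
after = {"international": 3_000, "domestic_only": 12_000}

share_before = international_share(**before)
share_after = international_share(**after)

print(f"international papers: {before['international']} -> {after['international']}")
print(f"international share:  {share_before:.0%} -> {share_after:.0%}")
```

In this invented case the system produces 50% more international papers, yet its international share falls from 25% to 20% because the denominator grew faster. A falling ratio alone therefore says little about whether international engagement is weakening.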
Despite these problems, the fiction that citations and co-authorships have a constant value, independent of context, has proven highly resilient. This is not surprising. Without indicators of universal value, the process of performative comparison would collapse. Positivist data analyses in scientometrics would be emptied of much of the meaning ascribed to them.

Conclusions
Despite the fecund materiality of global networks and the dynamism of science growth, the global scale in science is not well understood. 'Methodological nationalism' (Wimmer and Glick Schiller 2002) means that it tends to be seen as an outgrowth of national science. Yet, the continuing expansion of global science, led by professional scientists, is partly decoupled from nation-states and beyond their control (Wagner, Park, and Leydesdorff 2015). Science and its organisation are more global in character than are economies, where worldwide convergence has slowed since 2010 (The Economist 2019). A key question for social theory is why the global scale is especially potent in relation to knowledge. Another question is about the factors enabling the autonomy of global science, which may be threatened by US-China tensions (Inkster 2020).
The global system settings are moving. There is a growing disjunction between, on one hand, multi-polar capacity in political economy (Pieterse 2018) and in science output; and on the other hand, the forceful Euro-American dominance of the contents of knowledge and of the mechanisms for calibrating its value. Arguably, China and Singapore have succeeded within the global system by doing Western science. Why is the global in science and knowledge still neo-imperial in cultural terms? Will a more multi-civilisational approach emerge in future? Amid plural paths to modernisation, and the momentum for decolonisation, can bibliometrics and professional scientists continue to lock out the emerging powers? More radically, could bibliometrics be opened up to an 'ecology of knowledges' (Santos 2018)? No one knows the answers to these questions, but the situation is unstable and open to change.

Critical science studies
Scientometric investigations are insufficient to investigate and explain global science. The present paper seeks to contribute to a more plural approach. To understand global science comprehensively (as knowledge, as embedded in society, and including relations of power), it is necessary to move not only beyond positivism but beyond single-lens approaches. What is required is a synthetic-historical explanation that combines several disciplines, as in, say, Braudel (1992) and the Annales school, and integrates multiple data and theorisations.
The critical realist ontology (Bhaskar 1975) provides a starting point. For critical realism, science is a complex domain in which internal and external elements interact, systems are open over time, phenomena vary with context, and evolutions are neither linear nor patterned by statistical regularities. The regularities, which are the goal of positivist methods, the truths they seek to discover, require closed systems. This happens only under limited space/time conditions. But societies are always changing: phenomena are both existing and emergent, and solely data-based trends in science present an unduly static picture. In any case, global science cannot be exhaustively described and analysed using data.
A key difference between scientometrics and critical studies of science is that for the latter, theorisation is central.
For the critical realist, explanation necessarily includes interpretation. Not all social structures or historical causes can be observed directly. Sayer (2000, 12) distinguishes between reality, including social relations, and those aspects of reality apprehended in empirical observation. 'Observability may make us more confident about what we think exists, but existence itself is not dependent on it'. No single body of theory answers every question about global science. This paper has drawn on thinkers as varied as Castells, Foucault, Gramsci and Vygotsky. The point is that it is essential to look below the surface.
The material global practice of science is defined and regulated by the bibliometric collections. They are not illusory. Scientometric analyses of these data can generate insights when shorn of performance regulation and essentialist claims and used as one part of a larger set of theorisations and empirical inputs, including qualitative studies of science networks. The mis-use of global science data, and the effects in shaping behaviour, are so destructive that it can be difficult to remember that the same data can contribute to explanation. However, scientometric investigations ought to focus primarily on what bibliometrics enables them to do best, which is the study of the trans-border global science system itself, rather than focusing primarily on national chunks artificially carved out of that system for analytical purposes.
Some scientometric studies provide illuminating investigations of global science. Hennemann, Rybski, and Liefner (2012), Chen, Zhang, and Fu (2019), Wuestman, Hoekman, and Frenken (2019) and Wagner, Whetsell, and Mukherjee (2019) focus on scale and proximity. They find that locally based interactions are more fertile than electronically mediated interactions, especially in novel work such as cross-disciplinary research. Technology limits the sharing of tacit or implicit knowledge. However, spatial distance as such is irrelevant once communication is in the global scale. Using social network analysis, Citron and Way (2018) explore similarities in the evolution of co-authorship networks in different sub-fields, and the temporality of edges in networks. Larregue, Larivière, and Mongeon (2020) investigate global epigenetics. They trace epistemological paths, connections and breaks, and identify institutional concentrations and evolving patterns of collaboration. They analyse both the conceptual apparatuses (field identity) of researchers and their thematic or topic interests. The authors note that their macro quantitative approach means that 'we might here and there lack some analytical depth and nuance'. They propose to explore finer-grained investigations of content themes, and qualitative interviews (23).
In place of context-free mega-national comparisons based on datafication, national science can be investigated via context-rich studies of single systems. These can then be compared with each other and similarities and differences identified. Nations directly foster national capability in global science by building resources and connectedness at nodal points in the global scientific network, as China has done. However, while the engagement of nation-based scientists in the global system is part of any national case study in science, national science cannot be read solely from global bibliometrics. It is equally important to mobilise data on national resources, activities and outputs that fall outside the global system.

Disclosure statement
No potential conflict of interest was reported by the author(s).