A scientometric study of three decades of machine translation research: Trending issues, hotspot research, and co-citation analysis

Abstract This study examines machine translation research published in journals indexed in the Web of Science to identify trending issues, hotspot areas of research, and patterns of document co-citation. To this end, 541 documents published between 1992 and 2022 were retrieved and analyzed using CiteSpace and Bibexcel. Several metrics were analyzed, including document co-citation, source co-citation, authors' keywords, and the Hirsch index. Data were coded and filtered to include research related to machine translation from the perspectives of language and translation studies. We identified 11 clusters representing the hotspot research topics across almost three decades. We also found that a significant share of machine translation research centered on enhancing the translation process through neural networks integrated with artificial intelligence, and that human post-editing was incorporated as a means to refine and improve machine-translated outputs. Translation studies journals were the most highly co-cited journals, and Google Translate was the most widely used machine translation system. The findings of the current study inform future research and technological advancements in machine translation, guiding efforts to improve translation processes and outcomes.


Introduction
Machine translation (MT), a subfield of computational linguistics, is defined by the European Association of Machine Translation as any type of automated translation from one natural language to another (European Association of Machine Translation, n.d.). It is a form of language processing technology that enables the automatic translation of text and speech from one language to another. The main concept of MT is to build an automated system that can translate a natural source language (SL) into a target language (TL) through bilingual dictionaries, algorithm-based processes, corpora, and neural networks (Al Mahasees, 2020). This technology aims to enable humans to translate large amounts of text or speech across languages quickly, accurately, and with minimal effort. The increasing demand for translating thousands of documents in different spheres of life, such as technology, industry, law, and medicine (to name just a few), makes it hard for human translators to translate these documents manually from SLs into TLs. This calls for automatic translation, despite its shortcomings, to meet customers' pressing need to read products in their native language. Given the complexity of human language, with its morphosyntactic, pragmatic, and cultural diversity, precise output from automatic translation is almost unattainable for some genres (Vanroy, 2021). Although human intervention in modifying MT output is commonly acknowledged to be minimal, it is recommended that a human translator engage in pre-editing the input text and post-editing the resulting text. This practice aims to guarantee the accuracy and readability of the final translation product. This role is categorized as human-assisted machine translation, in which humans play a partial role in modifying the output produced by MT, leaving the job of translation to be performed by the machine as the principal translator (Quah, 2006).
There is no doubt that MT output is not fully accurate; its quality depends on the type of text and the availability of the corpora used by the MT system. The advancement of artificial intelligence (AI) in the modern age enabled neural machine translation (NMT), launched by Google in 2016, which minimized errors and significantly improved the output. Despite the dramatic advancements of NMT in the digital age, human translation remains crucial to ensure the accuracy and readability of the translated output. The potential of machine translation to automate the translation process, alongside its limitations in conveying the meaning of the target language text, has led to extensive research exploring how machine translation can facilitate the translation process and how its limitations can be addressed. The insights gained from these studies will guide future machine translation projects, providing valuable directions to enhance the effectiveness of machine translation.
It is essential to synthesize the literature on machine translation (MT) to understand how MT research has evolved, which MT research topics have been most pressing over time, and which emerging MT technologies are driving these research trends. This information can help us identify the current hotspots in MT research and inform future research directions.
MT has a long history dating back to the early 1950s, when word-to-word translation was adopted in response to the need to break communication barriers among military and intelligence communities (Al Mahasees, 2020; Andrabi, 2021). A three-stage process for translating between languages using computers was proposed: analysis (breaking down the source language into meaningful units), transfer (converting these units into their equivalents in the target language), and synthesis (reassembling them into a coherent sentence). This approach failed to provide precise translation due to the complexity of natural languages. A decade later, the report of the Automatic Language Processing Advisory Committee (ALPAC) warned against relying on MT products due to the poor quality of translated outputs, describing the research done as biased and invalid. The report, however, recommended carrying out interdisciplinary research to address MT inaccuracy and improve text comprehensibility. The 1980s witnessed the inauguration of statistical machine translation (SMT), produced by IBM. The concept of SMT revolves around algorithms that learn from human-translated texts to generate improved translation outputs (Mi et al., 2022). In the 1990s, research papers written by the IBM research team led to a new approach based on the release of bilingual corpora, which provided rich resources for translation (Wang et al., 2021). After the 2000s, many software packages and mobile apps based on SMT technology were developed, such as text-to-speech and speech-to-speech mechanisms. In 2016, Google launched neural machine translation (NMT), utilizing neural networks to predict the likelihood of phrases, segments, and sentences based on AI technology, which has been proven to provide better accuracy for language translation than SMT (Tan et al., 2020).
This historical development reveals three major approaches to MT. First, the rule-based approach, in which the machine applies a set of grammatical rules built on bilingual dictionaries and error-detecting grammar software. Second, statistical machine translation, built on sets of algorithms that train computers to predict appropriate texts based on human translation corpora. The last approach is NMT, which combines the advantages of the previous two approaches and provides more accurate translation outputs (Gupta & Dhawan, 2019). The accuracy of MT depends largely on the quality and quantity of training data available for the algorithm to learn from. The more data it has access to, the better it will understand the context and produce accurate translations (Lee, 2021). Additionally, some applications employ techniques such as post-editing or human evaluation to improve accuracy further by having humans review translations before they are released for use (Rivera-Trigueros, 2022).
Given the potential of machine translation (MT) and its continuous development, alongside the limited availability of language corpora for certain languages, there is a pressing need to investigate research outputs related to MT and its evolution. A scientometric study is therefore crucial to identify the emerging issues and hot topics in MT research. This study examines how research issues in MT have evolved over time and uncovers the prevailing themes within the fields of language and translation studies. By analyzing the scientific literature, it sheds light on the advancements, trends, and research priorities in MT, contributing to a deeper understanding of the current landscape and paving the way for future developments in the field. To this end, the study identifies trending research issues by examining different structural and temporal metrics such as centrality, burstness, silhouette, and sigma values. Network clusters are visualized using CiteSpace and VOSviewer, and multiple types of analysis are run, including document co-citation analysis (DCA), word co-occurrences, and citation counts over time. The significance of this study lies in identifying the most emergent issues of MT research and the most productive venues of publication, and in guiding other researchers toward the most notable state-of-the-art MT research issues.

Scientometric
In the past few years, many scientometric articles have been published in different fields of study, aiming to quantitatively analyze scholarly publications by applying scientific methods to identify trending research issues and to trace how scientific contributions take place in a field (Lim & Aryadoust, 2021). Scientometric analysis differs from bibliometric analysis: the latter is concerned with the statistical analysis of books and written documents, examining the productivity of authors, institutions, and countries, which is not the focus of the present study. Scientometric analysis focuses more on quantitative methods applied to scientific literature to elucidate emerging research trends in a field of study (Mohsen & Alangari, 2023). MT research has evolved not only in the field of translation studies but also in disciplines such as artificial intelligence (Voß & Zhao, 2005) and computational linguistics (Gupta & Dhawan, 2019). However, there is a paucity of research synthesizing MT studies from a scientometric perspective in relation to translation studies and language learning.

Related work
The field of MT has witnessed many systematic reviews, meta-analyses, and bibliometric studies conducted from information technology perspectives (Giménez & Màrquez, 2010; Kahlon & Singh, 2021; Mi et al., 2022; Tan et al., 2020; Wang et al., 2021). Other works were conducted in second language education and language learning (Klimova et al., 2022; Lee, 2021). Many bibliometric studies have been published tackling different areas of applied linguistics and education in general (Aryadoust, 2020; Mohsen, 2021), and translation studies in particular (Dong & Chen, 2015; Huang & Liu, 2019; Rovira-Esteva et al., 2015; Van Doorslaer & Gambier, 2015). However, scientometric studies in language and translation studies in general, and MT in particular, are still scarce. A few bibliometric studies have identified publication metrics (quantity), such as the productivity of authors, sources of publication, and countries, as well as international visibility metrics (quality), such as citation and co-occurrence. Li (2015) analyzed publications of translation studies produced in Mainland China to assess the international visibility of Chinese contributions; results indicated that authors from English-speaking countries had dominant global visibility. Another attempt was made by Dong and Chen (2015), who tracked global translation studies for the period 2000-2015 by extracting data from the Web of Science database. Using a bibliometric approach, the authors investigated the main research streams of translation studies and identified publication metrics across countries, authors, and publication titles. Three main research areas were identified in the literature of translation studies, pertaining to theoretical translation studies and interpretation and translation issues.
Several systematic review studies have addressed the integration of machine translation (MT) in foreign language education. Lee (2021) reviewed 87 studies published between 2000 and 2019, aiming to provide language teachers with an overview of and guidance on the benefits and challenges of using MT in classrooms. The study observed a significant increase in MT research, primarily attributed to the advancements in Neural Machine Translation (NMT), and the findings highlighted positive impacts on language accuracy and fluency. Similarly, Klimova et al. (2022) reviewed 13 studies obtained from the Web of Science and Scopus databases, focusing on the use of NMT to enhance language teaching and learning. The results demonstrated that NMT is an effective tool for developing both productive skills (speaking and writing) and receptive skills (reading and listening) in language acquisition. Additionally, NMT showed potential benefits for mediation skills related to translation. The research suggested that advanced learners of a second language (L2), who possess higher proficiency levels, are better equipped to critically evaluate NMT output, whereas beginners or lower-intermediate learners may have limited ability to assess the translations produced by NMT. Rivera-Trigueros (2022) conducted a comprehensive investigation of the literature on English-Spanish MT directionalities, aiming to determine the most commonly utilized tools in previous studies and to assess the extent of human intervention required for achieving accurate translations. The findings, derived from 19 studies obtained from various databases, indicate that NMT, integrated with artificial intelligence, emerged as the predominant MT approach; among the MT systems investigated, Google Translate was identified as the most widely used. Furthermore, both human and automatic assessments were employed for post-editing the MT outputs.
Concerning scientometrics of translation studies, we found only one article related to the theme of this paper, conducted by Zhu and Aryadoust (2022). They studied translation research during the 21st century by applying a scientometric approach using CiteSpace software. The authors identified 10 clusters as hotspot issues in translation studies, recording many temporal and structural metrics in terms of burstness, document co-citation analyses, authors' productivity, and sources of publication. However, regarding the scientometrics of MT research in relation to translation studies, the authors of this study are not aware of a single study that has tackled this issue. This gap in the literature motivates the present study, which therefore addresses the following research questions: (1) What are the trending issues of machine translation outputs addressed in the literature of translation and interpretation studies?
(2) What are the most influential highly cited articles and the top sources of publication in the field of machine translation (MT) research?

Methodology
In order to conduct a scientometric analysis, several steps were undertaken: (1) identifying and selecting reliable databases that contained relevant machine translation (MT) research compatible with our scientometric software, (2) determining appropriate keywords to retrieve research outputs pertaining to our target issues, (3) filtering the raw data to rectify misspellings, inconsistencies in words, synonymous terms, and errors in publication sources, (4) selecting the appropriate software for the scientometric analysis, and (5) identifying the various metrics to be analyzed.

Data sources
The Web of Science Core Collection (WSCC) was used as the main source to extract the data. The WSCC includes three subindexes: the Social Science Citation Index (SSCI), the Arts and Humanities Citation Index (AHCI), and the Science Citation Index Expanded (SCIE). We chose these databases because they are known for employing scientific rigor in indexing abstracts of scholarly venues of publication in different fields of knowledge (Mohsen & Ho, 2022). While many scientometric studies rely on data from either Clarivate Web of Science or Scopus, we limited our search to the Web of Science Core Collection, as this database adopts stringent scientific rigor in indexing journals (Stahlschmidt & Stephen, 2020). Another reason is that data extracted from these indices are downloadable in formats compatible with software such as CiteSpace and VOSviewer. We limited our search to the article document type, and the timespan was set to cover the period 1992-2022. To narrow the scope of the selected studies, we refined the data to include research categories such as computer science, linguistics, education and educational research, and social science-multidisciplinary.

Database and key terms
We based our study on key terms employed by previous systematic reviews on machine translation, with some modifications (Lee, 2021; Rovira-Esteva et al., 2015). The key terms (machine translation, automated translation, google translate, automatic translation) were run in the search tab of the Web of Science Core Collection, and the search was limited to its three indexes: SSCI, A&HCI, and SCI-EXPANDED. These three indices were selected as they undergo more scientific rigor than other indices. Only original articles published in English were included in our analysis, and the timespan was set to cover articles published between 1992 and 2022. Other document types, such as reviews, editorials, corrections, and letters to editors, were excluded. The search was concluded on 20 March 2023, and the record resulted in 541 documents.

Software
Three software applications, namely CiteSpace (Chen, 2006), VOSviewer (van Eck & Waltman, 2017), and Bibexcel, were utilized to perform the scientometric analysis, visualize network structures, and analyze trending issues over time. The file format extracted from the WSCC is compatible with the software used in the study. We downloaded the data as plain text to suit CiteSpace, which we used to analyze reference co-citations, while we analyzed authors and sources of publication using VOSviewer because node visualization in VOSviewer is clearer than that of CiteSpace. We used Bibexcel to record the Hirsch index (h-index) for the sources of publication and authors.

Data filtering
To ensure the validity of the data extracted from the WSCC, the authors of the present study screened the records to determine whether the selected studies aligned with the research objectives. We excluded studies that did not examine machine translation from the perspectives of translation studies. Full articles were checked when the abstracts did not clearly match the scope of the study. Articles that remained ambiguous were highlighted, and an online meeting was conducted to reach consistency. Cohen's kappa for inter-rater reliability was .82. We also excluded studies that did not have abstracts, that employed a meta-analysis design, or that were editorials, despite the fact that some journals categorized them as original articles. Synonymous keywords were amalgamated, misspelled author names and study titles were corrected, and publishers' names were removed from the studies' abstracts.
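The inter-rater agreement reported above can be computed as Cohen's kappa, which corrects raw agreement for the agreement expected by chance. A minimal sketch, assuming two raters making include/exclude screening decisions; the rating data below are invented for illustration, not the study's actual screening records:

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical judgments on the same items."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    labels = set(rater_a) | set(rater_b)
    # Observed agreement: proportion of items both raters labeled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    p_e = sum((rater_a.count(l) / n) * (rater_b.count(l) / n) for l in labels)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical include/exclude decisions by two screeners on 10 records.
a = ["inc", "inc", "exc", "inc", "exc", "exc", "inc", "inc", "exc", "inc"]
b = ["inc", "inc", "exc", "exc", "exc", "exc", "inc", "inc", "exc", "inc"]
print(round(cohens_kappa(a, b), 2))  # one disagreement out of 10 -> kappa 0.8
```

Values around .8, like the .82 reported here, are conventionally read as strong agreement between screeners.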

Data analysis
CiteSpace 6.1.R6 Advanced (Chen, 2006) was used to perform a DCA on the datasets retrieved from the WSCC. Examining the bibliographic data of these records, a number of temporal and structural metrics were computed to measure the correlations between the references used.
The first is Modularity Q (Q index), which indicates the degree of interrelatedness of the parts that make up the network (Newman, 2006). The burstness score measures the abrupt increase in attention a reference receives in a given area over time (Chen et al., 2012). Next comes the silhouette metric, which shows the homogeneity of the nodes in clusters (Rousseeuw, 1987); the closer the score is to 1, the more cohesive the cluster. In addition, the betweenness centrality metric captures the positions of nodes within clusters and/or the network; nodes with high centrality are typically positioned to connect other nodes and/or clusters (Brandes, 2001).
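Of these metrics, Modularity Q has the most direct formulation: it compares the fraction of links falling within clusters against the fraction expected if links were placed at random with the same node degrees. A minimal pure-Python sketch on a toy network; the edge list and cluster assignment are invented for illustration, not drawn from the study's data:

```python
def modularity_q(edges, community):
    """Newman's modularity Q for an undirected graph given as an edge list.
    `community` maps each node to its cluster label."""
    m = len(edges)  # total number of links
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Fraction of links whose endpoints share a cluster ...
    e_in = sum(community[u] == community[v] for u, v in edges) / m
    # ... minus the fraction expected under random placement with the
    # same degree sequence: sum over clusters of (cluster degree / 2m)^2.
    expected = 0.0
    for c in set(community.values()):
        k_c = sum(d for node, d in degree.items() if community[node] == c)
        expected += (k_c / (2 * m)) ** 2
    return e_in - expected

# Toy co-citation network: two triangles joined by one bridge link.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
community = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 2}
print(modularity_q(edges, community))  # 6/7 within-cluster links -> Q ~ 0.357
```

Values approaching 1 indicate a strongly modular network, which is why the Q of 0.713 reported later for the full citation network signals a clearly clustered structure.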
VOSviewer was employed to perform co-occurrence analysis of author keywords and analyses of sources of publication and reference co-citations. Author keywords were reviewed and refined to eliminate the possibility of synonymous words receiving separate counts. The dataset was also checked for inconsistencies among co-cited journals, as the software treats spelling variants as distinct entities, leading to less valid results. For instance, the journal Machine Translation was cited in two different ways: Machine Translation and mach transl (Figure 1). Consequently, such variants were standardized to ensure more valid results.
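The standardization described above amounts to mapping each cited-journal string onto one canonical title before counting. A minimal sketch; the alias table is illustrative, built only from the variant pair named in the text, whereas a real cleaning pass would extend it from inspection of the full dataset:

```python
from collections import Counter

# Known spelling variants of a cited journal mapped to one canonical title.
# Only the pair noted in the text is included; this table is illustrative.
ALIASES = {
    "mach transl": "Machine Translation",
    "machine translation": "Machine Translation",
}

def canonical(title):
    """Return the canonical journal title for a raw cited-journal string."""
    return ALIASES.get(title.strip().lower(), title.strip())

def cocitation_counts(cited_titles):
    """Count citations per journal after collapsing spelling variants."""
    return Counter(canonical(t) for t in cited_titles)

raw = ["Machine Translation", "mach transl", "MACHINE TRANSLATION", "Target"]
print(cocitation_counts(raw))  # the three variants collapse to one journal
```

Without this step, each variant would appear as a separate node in the co-citation network, splitting one journal's citation weight across several entries.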
Bibexcel was used to calculate author and journal h-indices within the retrieved data. A title change was detected at one publication source in 2017 (Editorial, 2017), resulting in articles from the same journal being published under different source titles (Figure 2), which affected the calculation of the h-index. To ensure accurate results when calculating the h-index for journals, we accounted for instances where sources of publication underwent title changes over time. In such cases, we consolidated the changes in journal titles and considered the most recent title. For example, the source title "Perspectives-Studies in Translatology" was updated to "Perspectives-Studies in Translation Theory and Practice" to achieve more precise results during the h-index calculation.
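The h-index being calculated here follows a simple rule: the largest h such that a journal (or author) has h articles each cited at least h times. A minimal sketch, with invented citation counts:

```python
def h_index(citations):
    """Largest h such that h papers each have at least h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:  # paper at this rank still clears the threshold
            h = rank
        else:
            break
    return h

# Hypothetical citation counts for one journal's MT articles.
print(h_index([25, 17, 9, 9, 8, 8, 5, 3, 2, 1, 0]))  # -> 6
```

This rule also makes clear why a title change distorts the result: splitting one journal's articles across two source titles splits the ranked citation list, so each half can only support a smaller h.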

Clusters
We used CiteSpace to run a DCA, setting the timeframe to 1992-2022 with the link retaining factor and look-back years options set to −1 (unlimited) in order to obtain all possible outcomes for the aforementioned period. Out of 294 records, 713 nodes and 5253 links were detected, forming 68 unique clusters with a Modularity Q of 0.713 and an average silhouette of 0.8937. The visualization was confined to the largest 11 clusters, which represent the leading research in machine translation over the past three decades (Figure 3). Table 1 summarizes the cluster information.

Citing authors and co-cited references
Bibexcel was used to calculate the h-index for the authors of the citing articles. Out of the 487 individual authors who contributed to our pool of studies, Table 2 shows the 12 authors with the highest h-indices.
VOSviewer, in turn, was run to identify the most highly co-cited references (Figure 4). As shown in Table 3, Papineni et al. (2002) from cluster #2 is the most highly co-cited reference, with 36 occurrences. This reference also has the highest centrality score (0.20), which indicates the importance of its ideas and the extent to which it impacted the field of machine translation research. In Figure 5, the node of Papineni et al. (2002) is bordered by a purple ring to highlight its centrality value. Burst detection analysis shows that Lee (2020) received the highest abrupt attention, scoring 6.14 from 2020 to 2022. Within the same duration, Garcia and Pena (2011) is the second strongest reference in burstness, scoring 5.75, although it was published in 2011. Figure 6 summarizes the 11 strongest co-cited references in terms of burstness; such references can be recognized in Figure 5 by the red tree-rings surrounding their nodes.

Citing and co-cited journals
The analysis of the data retrieved from the WSCC shows the supremacy of translation-specialized journals among both citing and co-cited journals. The top five references in the field of translation studies are reported in Table 4. The 294 articles were published in 78 different journals.
Perspectives-Studies in Translation Theory and Practice is the most productive journal, with 34 articles, and it has the highest h-index (9), as shown in Table 5. With the same h-index (9), Journal of Specialised Translation ranks second, with 25 articles that met the inclusion criteria. Although the domain of the Computer Assisted Language Learning journal is not translation, it has the third highest h-index (7) while contributing only nine articles to the pool of studies. Table 5 lists the top 10 journals with the highest h-indices and their production.
As for co-cited journals, the dominance of translation-specialized journals continues (Figure 7). Six of the ten most highly co-cited sources of publication specialize in translation. Among the 1339 different sources cited at least twice, the journal Machine Translation has the highest number of citations (334), as it is dedicated to publishing papers on machine translation. Table 6 contains the 10 most frequently co-cited sources of publication.

Author keywords
A total of 828 unique author keywords were identified by VOSviewer; most of them are represented in Figure 8. Results reveal that the most abundantly used author keyword in our group of studies is machine translation, with 91 occurrences. It was used more than three times as often as post-editing (30 occurrences), which ranks second. The copious use of machine translation is believed to reflect the essence of the field under analysis in this study. Other keywords include subfields of translation and different technologies used in machine translation. Table 7 shows the top 10 most frequent author keywords and their occurrences.

Discussion
The present scientometric study synthesizes MT research in translation and interpretation studies by employing quantitative metrics to uncover trending research issues and map how they have been investigated over time. The research demonstrates the evolution of MT, keeping abreast of its advancement and its potential to improve the quality of translation. Results identified the "human translation" cluster as the major cluster, indicating that MT research investigates how MT can simulate human translation and reach the same quality. The findings also indicate an emphasis on statistical machine translation (SMT) through the examination of bilingual corpora and their potential to enhance SMT. Indeed, in the last few years the focus shifted from human translation to NMT, which produces more accurate outputs than SMT. Such findings align with previous systematic reviews (Klimova et al., 2022; Lee, 2021; Rivera-Trigueros, 2022), which showed that more focus was given to NMT as a trending research issue in MT studies. The advancement of MT has undergone significant transformations, starting from a rule-based approach to SMT, pioneered by IBM, and later to NMT, which marked a substantial leap in translation quality. However, the latest trends in MT research have still not been fully explored and analyzed. Several clusters identify contexts for translation studies, such as the "translation education" cluster, which refers to the pedagogical implications of using MT in education and language learning/teaching. This is evidenced by the publication of many studies in educational technology and language teaching journals that incorporated MT into language learning/teaching. Indeed, many studies investigated the potential of MT to aid foreign language education, and such affordances could boost second language learning and teaching. This shift was demonstrated by Klimova et al. (2022), showing that MT has improved both students' receptive skills (reading and listening) and productive skills (speaking and writing). Such advancements in MT technology have the potential to significantly enhance language acquisition and instruction.
The most frequent author keywords reflect the trending research issues of MT. The words visualized by VOSviewer relate to the use of MT in translation studies: keywords such as "post-editing", "translation quality", "neural machine translation", "evaluation", and "translator training" are all concerned with improving the quality of the final output text produced by MT. This is consistent with previous systematic reviews (Rivera-Trigueros, 2022), showing that NMT was the most widely used MT approach in the previous literature. The two most highly co-cited references were Papineni et al. (2002) and Snover et al. (2006), which investigated the evaluation of final MT outputs to improve their quality. Google Translate was also one of the prominent keywords used by authors in the selected research, indicating that Google Translate was the most dominant MT system owing to its inauguration of neural machine translation in 2016. As stated in our theoretical framework, Google created NMT in 2016 to improve the quality of MT through AI-based algorithms that predict subsequent phrases and segments, thereby avoiding ambiguities caused by synonyms. Furthermore, the analysis revealed that NMT was consistently one of the most commonly used keywords. This finding aligns with the conclusions drawn by Rivera-Trigueros (2022), highlighting that Google Translate, which utilizes NMT, emerged as the prominent form of machine translation employed by researchers in translation studies and language education, indicating the widespread adoption and recognition of NMT as a powerful tool in these domains. Concerning co-cited sources of publication, translation-related journals were the top journals in terms of co-cited references, frequency of published articles, and Hirsch index (h-index). Other venues also recorded high metrics, such as Computer Assisted Language Learning, which publishes research that brings cutting-edge technology to bear on issues in language learning and teaching. Moreover, the Machine Translation journal, although still not included in SSCI, A&HCI, or SCIE, was the most highly co-cited source of publication. A possible justification is that this journal has a wide scope and publishes research on MT in different fields of knowledge, such as translation studies, computational linguistics, and computer science. The findings indicate that two specialized journals, Perspectives-Studies in Translation Theory and Practice and Journal of Specialised Translation, were the most productive in terms of h-index. While these two journals score the highest metrics in terms of the quantity and quality of published MT articles, the record is still not encouraging, as the highest journal h-index is only 9. The same applies to the h-indices of authors in MT studies, where the highest record is only 3, indicating the need for further research on MT in translation studies and linguistics.

Conclusion
This scientometric study synthesizes the research that investigated MT as a cutting-edge technology used to refine translation work and traces how this technology was introduced to research, limiting the studies to journals indexed in SSCI, A&HCI, and SCIE. The findings identified several clusters that represent the hotspot trending research in the field of translation studies. While the research represented in this article needs further elaboration in future projects, the results of the current study shed light on many areas treated by the pool of selected studies dealing with the development of MT. Further, the author keywords used in the selected articles reveal researchers' attempts to improve the quality of the texts generated by MT. The research endeavour is to find solutions to the problems generated by MT so as to minimize the human intervention needed to post-edit MT outputs. This is indicated by the most frequent keywords, which concern improving MT outputs by employing NMT built on AI technology that simulates human translation work.

Figure 1. Inconsistencies in a cited journal title.

Figure 2. A title change of a publication source.

Figure 8. Author keywords visualization.