How trustworthy is ChatGPT? The case of bibliometric analyses

Abstract
The introduction of the AI-powered chatbot ChatGPT by OpenAI has sparked much interest and debate among academic researchers. Commentators from different scientific disciplines have raised many concerns and issues, especially related to the ethics of using these tools in scientific writing and publications. In addition, there have been discussions about whether ChatGPT is trustworthy, effective, and useful in increasing researchers’ productivity. Therefore, in this paper, we evaluate ChatGPT’s performance on tasks related to bibliometric analysis by comparing the output provided by the chatbot with a recently conducted bibliometric study on the same topic. The findings show that there are large discrepancies and that ChatGPT’s trustworthiness is low in this particular area. Therefore, researchers should exercise caution when using ChatGPT as a tool in bibliometric studies.


Introduction
ChatGPT is a highly developed large language model intended to respond to text-based queries and produce human-like natural language responses. It was developed by OpenAI, based in San Francisco, USA, using generative pre-trained transformer 3.5 (GPT-3.5) and introduced to the general public in late 2022. Just two months after its release, ChatGPT had an estimated 100 million active users, making it the fastest-growing consumer application in history. In March 2023, OpenAI launched GPT-4 with additional features and capabilities.
There is already a considerable body of research on the use of ChatGPT in different areas, as illustrated by several recent review articles (e.g., Dwivedi et al., 2023; Sohail et al., 2023; Khosravi et al., 2023; Lo, 2023; Sallam, 2023). Many studies and reports have shown that ChatGPT is frequently used in academic writing, including essays, poems, stories, computer coding, and even technical writing. As ChatGPT produces text that resembles human writing and has lower levels of plagiarism than anticipated, a number of papers have recently been published addressing its impact on scientific writing. For example, Dowling and Lucey (2023) find that ChatGPT can be a highly useful tool and research assistant in finance research.
However, the capacity of ChatGPT to produce original writings has generated questions and difficulties for academic science. While some people are embracing ChatGPT because of the improved learning possibilities, others are raising concerns about ethical issues, trustworthiness, and misleading data. For instance, Sharples (2022) recommends that rather than forbidding the use of these AI tools, educators and students should be encouraged to use them to enhance learning experiences. Similarly, McMurtrie (2023) suggests that advanced tools like ChatGPT will soon become a part of everyday writing.
Other commentators have been more skeptical. For example, some researchers (Graham, 2022; Salvagno et al., 2023) have pointed out the risks involved in relying on ChatGPT's generated text or data. Despite mixed opinions about ChatGPT, its popularity continues to grow as it provides a powerful tool for generating high-quality written content. Scholarly writing is one of the fields most significantly impacted by ChatGPT. Several publications have explored the advantages and challenges of using ChatGPT in scientific writing (Biswas, 2023; Hill-Yardin et al., 2023; Koo, 2023; Omar et al., 2017; Salvagno et al., 2023) and have cautioned that, despite its impressive writing abilities, human judgment is still required (Kitamura, 2023). Some authors have even used ChatGPT to write entire articles and have assessed its reliability, plagiarism, and authentication capabilities in scientific writing (Cotton et al., 2023; King, 2023). The question of whether ChatGPT should be considered an author of scientific articles is currently being debated among scientific experts, and while some have given ChatGPT authorship credit (Salvagno et al., 2023), others have raised concerns (Lee, 2023; Stokel-Walker, 2023; Teixeira da Silva, 2023; Thorp, 2023). As a result, some leading journals, such as Science (Thorp, 2023) and The Lancet (https://www.thelancet.com/pb/assets/raw/Lancet/authors/tl-info-forauthors.pdf), have updated their guidelines regarding the use of ChatGPT and similar AI-powered chatbots.
To this end, we conducted an investigation to evaluate how ChatGPT 3.5 performs when asked to write an abstract for a bibliometric analysis, by comparing the chatbot's output with the results of a recently published bibliometric study on the same topic. This comparison helps evaluate the extent to which ChatGPT can be considered trustworthy and reliable in this type of research task.

Conducting bibliometric analysis using ChatGPT
We tasked ChatGPT 3.5 with creating abstracts for two bibliometric analyses to evaluate the accuracy and quality of its generated content. We selected a previously published study on curcumin in wound healing (Farhat et al., 2023) and requested ChatGPT to write an abstract using the same search string and database utilized in the original study. We created two versions of Query 1, one without any word limits and another limited to 300 words, while using the same search terms and database. The objective of the first query was to evaluate the consistency of ChatGPT in producing data. For the second query, we performed a bibliometric analysis on the Web of Science (WoS) database ourselves. Then, we requested ChatGPT to generate an abstract using the same search terms and database to evaluate its real-time data curation capabilities. In the third query, we asked for references for bibliometric analysis related to Query 1, followed by cross-questioning ChatGPT's responses in Queries 4 and 5.
Query 1 (a): Write a bibliometric review using the search keyword "turmeric" OR "curcum*" AND "wound" in SCOPUS database.
ChatGPT: Introduction: Curcumin, the active ingredient in turmeric, has been shown to have anti-inflammatory and antioxidant properties that may be beneficial for wound healing. This bibliometric review aims to provide an overview of the research landscape related to turmeric and wound healing. Methodology: A search was conducted on the Scopus database using the keywords "turmeric" OR "curcum*" AND "wound". The search was limited to articles published between 2000 and 2022. A total of 246 articles were identified, and bibliometric analysis was performed using VOSviewer software. Results: The analysis of publication output showed that the number of publications related to turmeric and wound healing has steadily increased over the past two decades, with a peak in 2021. The majority of publications were research articles, followed by review articles and conference papers. The top countries contributing to the research were India, the United States, and Iran. The analysis of authorship revealed that the most prolific authors in this field were Kottarathil Abraham Jacob and Madhulika Bhagat. The analysis of keyword co-occurrence showed that the most common keywords were "wound healing," "curcumin," "inflammation," "oxidative stress," and "antioxidant." The analysis of journal co-citations revealed that the most cited journals were "Phytomedicine," "Journal of Ethnopharmacology," and "Biomedicine & Pharmacotherapy." Conclusion: This bibliometric review provides a comprehensive overview of the research landscape related to turmeric and wound healing. The analysis reveals a growing interest in the use of turmeric and curcumin for wound healing, with a focus on their anti-inflammatory and antioxidant properties. The analysis also highlights prominent authors, countries, and journals in this field. These insights can inform future research and policy decisions related to the use of turmeric for wound healing.
Query 1 (b): Write an abstract of 300-words on bibliometric analysis using the search keyword "turmeric" OR "curcum*" AND "wound" in SCOPUS database
ChatGPT: Turmeric, a common spice in many households, has been shown to have potential therapeutic benefits for wound healing due to its anti-inflammatory and antioxidant properties. This bibliometric analysis aimed to explore the research landscape related to turmeric and wound healing by conducting a search using the keywords "turmeric" OR "curcum*" AND "wound" in the SCOPUS database. A total of 246 publications were identified, and bibliometric analysis was conducted using VOSviewer software. The analysis showed a steady increase in the number of publications related to turmeric and wound healing over the past two decades, with a peak in 2021. The majority of publications were research articles, followed by review articles and conference papers. India, the United States, and Iran were the top countries contributing to the research. The most prolific authors in this field were identified as Kottarathil Abraham Jacob and Madhulika Bhagat. The analysis of keyword co-occurrence revealed that the most common keywords were "wound healing," "curcumin," "inflammation," "oxidative stress," and "antioxidant." Journal co-citation analysis showed that the most cited journals in this field were "Phytomedicine," "Journal of Ethnopharmacology," and "Biomedicine & Pharmacotherapy." This bibliometric analysis provides an overview of the research landscape related to turmeric and wound healing. The analysis highlights the growing interest in the potential therapeutic benefits of turmeric and curcumin for wound healing, with a focus on their anti-inflammatory and antioxidant properties. The analysis also identifies prominent authors, countries, and journals in this field, which can inform future research and policy decisions related to the use of turmeric for wound healing. Overall, this study demonstrates the potential of bibliometric analysis to provide insights into research trends and the scientific landscape of a particular topic.
Query 2: Write an abstract of 300-words on bibliometric analysis of machine learning and antimicrobial resistance using web of science database.
ChatGPT: Antimicrobial resistance (AMR) is a serious public health threat that has led to increased interest in the use of machine learning (ML) to develop new treatments and predict resistance patterns. A bibliometric analysis of research related to ML and AMR can provide insights into the current state of research in this field. This study aimed to conduct a bibliometric analysis of publications related to ML and AMR using the Web of Science database. A total of 2,725 publications were identified from the database, with the majority of publications originating from the United States and China. The number of publications has steadily increased over the past decade, with a peak in 2020. The top five institutions with the most publications were all universities, with Harvard University leading the way. The most common keywords were "antimicrobial resistance," "machine learning," "drug discovery," "bioinformatics," and "genomics." The co-citation analysis of journals revealed that the most cited journal was "bioinformatics," followed by "PLOS One" and "Nature Communications." The analysis of author networks showed that there were several prominent researchers in this field, including Jian-Hua Zhao, Yang Zhang, and Gerard Wright. Collaboration networks revealed that researchers from the United States, China, and the United Kingdom collaborated most frequently. The analysis of funding sources revealed that the National Institutes of Health (NIH) in the United States was the primary funding source for research related to ML and AMR, followed by the National Natural Science Foundation of China (NSFC) and the European Union. In conclusion, this bibliometric analysis provides a comprehensive overview of the research landscape of ML and AMR using the Web of Science database. The analysis revealed a growing interest in the use of ML for drug discovery, genomics, and bioinformatics to address AMR. The analysis also highlighted prominent researchers, institutions, and funding sources in this field. These insights can inform future research and policy decisions related to AMR and the use of ML in addressing this critical public health concern.
Query 4: Where did you find these references? Among the five references you have given, three of them does not even exist in literature and two of them are irrelevant.
ChatGPT: I apologize for the errors in my previous response. I must have made a mistake while generating the references. As an AI language model, I do not have a specific source or database of references. Instead, I use my pre-existing knowledge and understanding of the topic to generate responses. Here are five correct references for bibliometric analysis of curcumin and wound healing: (1) Yen-Chen
ChatGPT: I apologize for the inconvenience caused. As an AI language model, I strive to provide accurate and relevant information to the best of my ability. I use my pre-existing knowledge and understanding of the topic to generate responses, but sometimes, there can be errors due to various reasons.
To answer your question, I am programmed to provide a response to every user query, and I cannot reject or ignore a question.However, if I am unable to provide a satisfactory response, I can inform the user that I am unable to do so.
To rectify my previous mistakes, I have conducted a thorough search and found the following five references for bibliometric analysis of curcumin and wound healing:

Discussion
Bibliometric analysis is a valuable tool for identifying prolific authors, top avenues, leading countries and their collaborative patterns, as well as the intellectual structure of a particular domain in existing literature (Donthu et al., 2021; Ellegaard & Wallin, 2015; Zupic & Čater, 2015). This type of analysis involves processing significant amounts of unstructured data, such as the number of publications, keywords, and other relevant metrics. Bibliometric analysis can also be helpful in forecasting future trends in a particular academic topic (Farhat et al., 2023). The usefulness and impact of bibliometric reviews can be greatly enhanced through synthesis. The fundamental promise of bibliometrics rests in the capacity to synthesize knowledge, even though tracking trends and doing statistical analysis are important components (Maggio et al., 2021).
Researchers can go beyond simple analysis by engaging in synthesis, which will help them gain a more complete picture of the research landscape. In order to provide new insights and knowledge, synthesis requires combining the results of various investigations and locating recurring themes and patterns. It tries to produce an integrated and complex understanding of the subject by going beyond the various components of the study (Perrier et al., 2016). Consequently, well-executed bibliometric studies can significantly contribute to the progress of a field and guide future research endeavours.
The accuracy of the data curation capabilities of ChatGPT was evaluated by conducting a bibliometric analysis using search strings in either the SCOPUS or the Web of Science database. Table 1 compares its results with those from actual bibliometric papers. Despite the well-written presentation of quantitative data, ChatGPT provided inaccurate information about leading authors, countries, and avenues. For instance, when asked to write a bibliometric review using the search keywords "turmeric" OR "curcum*" AND "wound" from the SCOPUS database, it reported only 246 articles. In comparison, the original study (Farhat et al., 2023) found 1284 articles, a more than fivefold difference.
Furthermore, ChatGPT listed India, USA, and Iran as the top contributing countries, while the original study listed India, China, USA, and Iran. In terms of the most prolific authors, ChatGPT identified Kottarathil Abraham Jacob and Madhulika Bhagat, whereas the original study found Edy Meiyanto, Amirhossein Sahebkar, and Riris Jenie. Even if we disregard the sequence of the top countries, the three countries mentioned by ChatGPT are still among the top five countries listed in the original study. Although the authors that ChatGPT identified were not included in the original study's list of authors, the top keywords and avenues retrieved by ChatGPT were among the top 20 listed in the original study for their respective metrics, despite their sequence being inaccurate. Reframing the query did not affect the consistency of the data curation.
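The comparison of ranked lists described above can be made explicit with a small overlap check. The following is a minimal Python sketch and not part of the original study's workflow; the function names `overlap` and `positional_matches` are hypothetical, and the lists simply restate the countries and authors quoted in the text for Query 1.

```python
# Illustrative only: the lists restate values quoted in the text for Query 1.
chatgpt_countries = ["India", "United States", "Iran"]
study_countries = ["India", "China", "United States", "Iran"]

chatgpt_authors = ["Kottarathil Abraham Jacob", "Madhulika Bhagat"]
study_authors = ["Edy Meiyanto", "Amirhossein Sahebkar", "Riris Jenie"]


def overlap(reported, reference):
    """Fraction of reported items that also appear in the reference list."""
    reference_set = set(reference)
    return sum(item in reference_set for item in reported) / len(reported)


def positional_matches(reported, reference):
    """Number of items that sit at the same rank in both lists."""
    return sum(a == b for a, b in zip(reported, reference))


print(overlap(chatgpt_countries, study_countries))             # 1.0
print(positional_matches(chatgpt_countries, study_countries))  # 1
print(overlap(chatgpt_authors, study_authors))                 # 0.0
```

Separating membership from rank makes the pattern visible: every country ChatGPT named appears in the study's list, but only one sits at the correct rank, while the author lists do not overlap at all.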

In order to evaluate the real-time data curation capabilities of ChatGPT, a bibliometric analysis was conducted using a new set of search strings on the Web of Science database, and then ChatGPT was asked to conduct a similar analysis. The results indicated that ChatGPT generated inaccurate information regarding data collection. While our search retrieved 681 articles, ChatGPT reported 2725 articles, about four times the actual number. While the top two contributing countries were accurate, the top institutions were not correct. Similarly, the top cited journals and funding agencies provided by ChatGPT did not align with the findings of the study conducted by the researchers themselves. The top cited journals in the field were found to be Scientific Reports, Frontiers in Microbiology, and Microbiome, whereas ChatGPT provided a different ranking. The prominent researchers suggested by ChatGPT were Jian-Hua Zhao, Yang Zhang, and Gerard Wright, but Jian-Hua Zhao and Yang Zhang were not even included in the list of authors retrieved by the researchers themselves. Gerard Wright, however, was identified as one of the top 30 prolific authors. The top three countries with the highest collaborations were found to be the USA, UK, and Germany, while ChatGPT identified the top three most frequent collaborators as the USA, China, and the UK. Moreover, ChatGPT did not include Canada, which was found to be equal to the UK, leading to incomplete information.
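The size of the discrepancy in the number of retrieved records can be expressed as a signed relative error. The snippet below is a minimal Python sketch under that assumption; the helper name `relative_error` and the dictionary labels are hypothetical, and the counts are the ones quoted in the text for the two database searches.

```python
# Counts quoted in the text: (reported by ChatGPT, retrieved from the database).
counts = {
    "Query 1 (Scopus)": (246, 1284),
    "Query 2 (WoS)": (2725, 681),
}


def relative_error(reported, actual):
    """Signed difference between reported and actual, as a fraction of actual."""
    return (reported - actual) / actual


for label, (reported, actual) in counts.items():
    print(f"{label}: {relative_error(reported, actual):+.1%}")
# Query 1 (Scopus): -80.8%
# Query 2 (WoS): +300.1%
```

The errors run in opposite directions (an undercount for Scopus, an overcount for WoS), so no simple correction factor would recover the true counts.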

In a recent study, ten research abstracts were collected from five high-impact medical journals, and ChatGPT was tasked with generating new abstracts based on their titles and journals. While the generated abstracts had similar patient cohort sizes as the original abstracts, the exact numbers were found to be fabricated (Gao et al., 2022). Despite this, reviewers found it surprisingly difficult to distinguish between the two sets of abstracts, though they noted that the AI-generated abstracts were vague and had a formulaic tone. Several recent studies have highlighted the difficulty faced by researchers in distinguishing between AI-generated and original abstracts (Else, 2023; Salvagno et al., 2023).
In Query 3, we asked ChatGPT for references related to bibliometric analysis, but some of the references provided were either non-existent or irrelevant to our study. When we cross-questioned ChatGPT about the sources, it apologized and generated a new set of references, but upon further investigation, those references were also non-existent. In Query 5, when we asked ChatGPT why it does not reject questions if it cannot provide relevant answers, it explained that it is programmed to respond to every query and cannot ignore any question. These findings highlight that ChatGPT is programmed to respond to every query regardless of accuracy and does not take responsibility for any errors. ChatGPT would need to become considerably more trustworthy before researchers could depend solely on the data it generates. Therefore, it is crucial to verify the accuracy of data generated by ChatGPT, and bibliometric analysis may not be the most suitable task for it.
While analyzing the ChatGPT responses, we observed a glaring lack of synthesis in addition to the inaccurate data procurement when performing a bibliometric review. Instead of synthesizing the information, ChatGPT largely concentrates on reporting random data. This constraint prevents it from offering a thorough and perceptive analysis that extends beyond individual data points. Moreover, the lack of data reproducibility raises concerns about the reliability and trustworthiness of the results generated by ChatGPT. By acknowledging and addressing these limitations, we can work towards developing AI models that excel in both data analysis and synthesis, thus advancing the capabilities of bibliometric reviews.

Conclusion
In conclusion, this exploratory study finds that while ChatGPT has the potential to be a useful scientific writing assistant for tasks such as improving readability, enhancing language, rephrasing or paraphrasing, and proofreading, it should not, as of today, be used for retrieving bibliometric data or conducting bibliometric assessments. It is very important for researchers and students to keep this in mind. In recent years, bibliometric methods have become increasingly popular in many different research areas, and some might be tempted to take a shortcut and ask ChatGPT rather than carrying out the analyses by extracting data from databases and analyzing these data using appropriate software packages.
Table 2 summarizes the potential issues related to ChatGPT while conducting bibliometric analysis. It is our view that researchers should exercise caution when interpreting the results generated by ChatGPT and should verify the information using other sources. ChatGPT's real-time data curation capabilities and data analytic techniques, specifically with electronic databases such as SCOPUS and WoS, need further refinement and validation to ensure trustworthiness, accuracy, and consistency.