Utilizing Machine Learning Techniques for Classifying Translated and Non-Translated Corporate Annual Reports

Globalization has led to the widespread adoption of translated corporate annual reports in international markets. Nonetheless, it remains largely unexplored whether these translated documents fulfill the same function and communicate as effectively to international investors as their non-translated counterparts. Considering their significance to stakeholders, differentiating between these two types of reports is essential, yet research in this area is insufficient. This study seeks to bridge this gap by leveraging machine learning algorithms to classify corporate annual reports based on their translation status. By constructing corpora of comparable texts and employing thirteen syntactic complexity indices as features, we analyzed the reports using eight different algorithms: Naïve Bayes, Logistic Regression, Support Vector Machine, k-Nearest Neighbors, Neural Network, Random Forest, Gradient Boosting and Deep Learning. Additionally, ensemble models were created by combining the three most effective algorithms. The best-performing model in our study achieved an Area Under the Curve (AUC) of 99.3%. This innovative approach demonstrates the effectiveness of syntactic complexity indices in machine learning for classifying translational language in corporate reporting, contributing valuable insights to text classification and translational language research. Our findings offer critical implications for stakeholders in multilingual contexts, highlighting the need for further research in this field.


Introduction
Corporate annual reports are comprehensive documents that deliver an in-depth analysis of a corporation's financial performance, activities, plans, and strategies over the preceding year. They function as a pivotal communication medium, sharing information about a company's financial statements, corporate governance, and social responsibility activities with its shareholders, investors, and other stakeholders (Ren and Lu 2021). The importance of narrative disclosures in annual reports has been underscored by the US Securities and Exchange Commission (SEC 1998). These reports play a crucial role in aiding investors in their decision-making process (Bhatia 2010). They provide key insights through which investors and other stakeholders can evaluate a company's financial health, performance, and prospects. They aid investors in making informed decisions about purchasing, holding, or selling a company's shares and also assist regulators in monitoring compliance with financial reporting standards. This research focuses on corporate annual reports because of the significant role they play in the global business ecosystem and the unique challenges they present in the context of translation studies (Wang, Liu, and Moratto 2023). Corporate annual reports are more than just financial documents; they are the principal mode of formal communication between a corporation and the global market (Beattie, McInnes, and Fearnley 2004). They serve as a critical interface for a myriad of stakeholders, including investors, analysts, customers, employees, and regulators, each with diverse linguistic and cultural backgrounds.
The readability and language functions of annual reports have received considerable scrutiny in academia due to their significance (Bhatia 2008; Clarke, Hrasky, and Tan 2009; Courtis and Hassan 2002; Garzone 2004; Li 2008; Liao 2021; Li, Luo, and Deng 2019; Ren and Lu 2021). The majority of readability studies on annual reports have employed classic formula-based readability measures to obtain readability scores and to conduct analyses and comparisons. These formula-based readability measures have been critiqued for their simplicity, focusing primarily on word and sentence difficulty (Benjamin 2012; Crossley et al. 2022; Lu, Gamson, and Eckert 2014). Syntactic complexity, encompassing the diversity and degree of complexity of the syntax used in a document (Housen and Kuiken 2009; Pallotti 2015), represents an important and multi-faceted aspect of text complexity (Frantz, Starr, and Bailey 2015; Lu 2011). Previous studies on annual reports have primarily relied on formula-based readability measures that only use average sentence length to assess syntactic complexity, thereby overlooking other aspects of sentence complexity. Furthermore, most of the prior research has focused on original annual reports, underscoring the need for scrutiny of translated annual reports as a subset of company annual reports. As English serves as the lingua franca in the international business and finance sectors, corporations in non-English-speaking countries often release a second annual report in English to enhance the accessibility of their messages to international investors, thereby yielding economic benefits (Jeanjean, Lesage, and Stolowy 2010; Jeanjean, Stolowy, and Erkens 2010).
This study endeavors to address existing research gaps by employing syntactic complexity metrics as features for classifying translated and non-translated annual reports. Through the application of text classification techniques, the study aims to determine how effectively syntactic complexity features distinguish between these two categories. We utilized four dimensions (comprising 13 indices) of syntactic complexity as features and applied eight machine learning models, namely Naïve Bayes, Logistic Regression, Support Vector Machine (SVM), k-Nearest Neighbors (kNN), Neural Network, Random Forest, Gradient Boosting and Deep Learning, to distinguish translated annual reports from non-translated ones. Translated texts often exhibit certain features known as "translation universals." Measures of syntactic complexity can help identify these features (Liu and Afzaal 2021; Wang, Liu, and Moratto 2023; Xu and Liu 2023). Syntactic complexity measures, such as sentence length, subordination level, depth of syntactic trees, and the variety and frequency of syntactic structures, can serve as valuable features in machine learning models (Lin and Liang 2023). By quantifying these aspects, classifiers can be trained to recognize patterns and discrepancies that are indicative of translated or non-translated texts.
The rest of this paper is organized as follows: Section 2 offers a brief review of the related research on annual reports and text classification. Section 3 introduces the research questions, as well as the two corpora compiled and used in this study for obtaining the values of syntactic complexity features. Section 4 presents the results of the eight machine learning models. The discussion of results and conclusions are presented in Sections 5 and 6, respectively.

Previous Studies on Annual Reports
In the accounting literature, scholars have paid attention to the readability of annual reports. Readability is defined as a text's capacity to be read with speed and ease (Schroeder and Gibson 1990). Regulated parties in finance and business support accountability and integrity, and have accepted these fundamental concepts of readability. For example, the US Securities and Exchange Commission (SEC 1998) published "Plain English Disclosure Guidelines" to encourage issuers to improve the effectiveness of their financial and business disclosures. Similarly, the Financial Reporting Council (2009) argued that enterprises should use plain language. These requirements increase the importance of information in financial disclosures by making them readable in order to increase the accuracy of analytical forecasts and ultimately benefit shareholder choices. Jones (1994) highlighted the importance of readability to the effectiveness of narratives, and Lehavy, Li, and Merkley (2011) found that the readability of corporate texts of communication impacts financial analysts. Schroeder and Gibson (1990) compared the footnotes and chairman's statements, and found the footnotes to be more complex and complicated in terms of words and syntax. They compared 40 samples of the MD&A section with the president's letter section and footnotes. The authors found that the MD&A narratives were significantly lower than the footnote narratives in the use of passive voice. They found no significant differences in word length, sentence length, or readability score. The MD&A had significantly longer word length and lower readability than the president's letter section. Clarke, Hrasky, and Tan (2009) compared corporate annual report narratives with local government annual report narratives in Australia. They found corporate annual reports to be significantly less readable than their local governmental counterparts, as reflected by the Flesch Reading Ease scores.
The relationship between annual report readability and various variables, such as company performance, stock price, company size, business strategies, and earnings management, has received significant attention from scholars. Smith and Taffler (1992) conducted a systematic investigation into the correlation between the measures of readability of the chairman's statement in the corporate annual report and financial performance, using three readability formulas: Lix, Flesch, and Cloze. They found a positive correlation between the readability of the chairman's statements and corporate profitability, financial gearing, and liquidity. Subramanian, Insley, and Blackwell (1993) echoed these findings and further revealed the presence of statistical variations between companies with good and poor performance in terms of annual report readability. However, no statistical difference was found in the use of jargon or modifiers between the good and poor performers. In contrast, both Baker and Kare (1992) and Courtis (1995) reported inconsistent results. Rutherford (2003) examined the relationships between the readability of operating and financial reviews with different indices of corporate performance. The study found no significant correlation between them. Smith et al. (2006) investigated the relationship between the readability of the chairman's statement in annual reports and the financial performance of Malaysian firms, taking into account such factors as corporate size and board membership. Their results found little evidence to support the obfuscation hypothesis. Li (2008) investigated the correlation between the readability of annual reports and companies' performance and earnings using the Gunning Fog index to measure readability. He found that companies with poorer earnings tended to produce less readable and longer annual reports, whereas those companies whose annual reports were more readable had higher persistent earnings. Ajina, Laouiti, and Msolli (2016) investigated the connection between annual report readability and earnings management in the French stock market. They gathered annual reports from 163 listed French companies between 2010 and 2013. They used the Gunning Fog index to measure the readability of annual reports, and discovered a significant positive relationship between annual report readability and the amount of discretionary accounting adjustments. Businesses that manage their revenues tend to produce less readable and comprehensible annual reports. Moreover, significant correlations between the readability of annual reports and other variables, such as profitability and company size, were also identified. Their discovery may be helpful for investors or businesses looking to better understand how profit management may impact annual reports on the French stock market. Lim, Chalmers, and Hanlon (2018) discovered that company strategy significantly influences annual report readability, with prospector strategy-pursuing organizations producing more readable reports than companies with defender strategy. The study also found that the annual reports of companies in technology sectors and those with decentralized organizational structures are more difficult to understand. Habib and Hasan (2020) examined how a company's unique business strategy may be affected by the readability of narrative disclosures in its annual reports. The researchers evaluated the readability of annual reports using the Flesch Reading Ease and Flesch-Kincaid Grade Level indices. They found that businesses with various strategies generate annual reports of varying degrees of readability. Specifically, companies that employ prospector-type business strategies provide less understandable annual reports than those that employ defender-type business strategies. However, the study also claimed that annual reports are often difficult to comprehend.
Some scholars have also explored bilingual annual reports. Courtis and Hassan (2002) examined the bilingual versions of annual reports in Hong Kong and Malaysia, which use English and Chinese, and English and Malaysian, respectively. In the study, the Flesch Reading Ease index was employed to gauge the readability of English annual reports, the Yang index for Chinese texts, and the Yunus index for Malaysian reports. As the scores from those indices cannot be compared directly, Courtis and Hassan (2002) connected the scores to their indices' prearranged benchmarks of text difficulty and investigated "the distributions of reading ease 'degree of difficulty' interpretations per formula" (p. 404). They found that the English annual reports were less readable than the Malaysian and Chinese versions.
Studies on the readability of annual reports have mainly used formula-based readability indices. However, critics of these indices have argued that such formulas only involve simplistic measures of word and sentence difficulty (Benjamin 2012; Lu, Gamson, and Eckert 2014). Syntactic complexity is an essential, and multidimensional, component of text complexity (Frantz, Starr, and Bailey 2015; Lu 2011). Readability formulas only use the average sentence length to measure the syntactic aspect, thus entirely neglecting other measures of sentence complexity. Therefore, there is a need for a systematic investigation of the syntactic complexity of annual reports.
Moreover, previous studies have primarily focused on original annual reports, disregarding the significance of translated annual reports as an important category of company annual reports. As English is a lingua franca in the international business and financial domain, corporations in non-English-speaking nations and regions often disclose a second annual report in English to enhance the accessibility of their messages to international investors, thus yielding economic benefits (Jeanjean, Lesage, and Stolowy 2010; Jeanjean, Stolowy, and Erkens 2010). Translated annual reports have a significant impact on the communication between companies and their external investors (Jeanjean, Lesage, and Stolowy 2010). Indeed, studies have shown that translated annual reports have different linguistic features from the originals, and that these differences may affect the effectiveness of communication (Huang and Wang 2020; Liao 2021). Therefore, a comprehensive and comparative investigation of both translated and non-translated annual reports' syntactic complexity features is crucial for a thorough understanding of their language functions. This study distinguishes itself from previous research on annual reports by undertaking a comprehensive and granular analysis of the nuanced differences in syntactic complexity between translated and non-translated annual reports. The study's innovative approach to syntactic analysis, at a fine-grained level, elucidates the intricate linguistic nuances that differentiate these financial documents, thereby contributing novel insights to the field.

Syntactic Complexity As Features
The concept of syntactic complexity, as defined by the diversity and intricacy of a text's syntax (Ortega 2003), plays a pivotal role in quantifying text readability and evaluating written work (Jagaiah, Olinghouse, and Kearns 2020; Lei and Shi 2023; Ortega 2015). It is a key construct tied to the production units of language and grammatical frameworks. Syntactic complexity finds its main application in the domain of language learning, particularly within reading and writing exercises. It acts as a mirror, reflecting a learner's capacity to construct complex sentences. The research interest lies not only in mapping the evolution of learners but also in discerning the myriad factors that could influence syntactic complexity. These factors encompass the genre of the text, the learner's mother tongue, their academic level, and educational history (Bulté and Housen 2014; Jagaiah, Olinghouse, and Kearns 2020; Lu and Ai 2015). In the study of language reading, syntactic complexity has been scrutinized for its vital role as an indicator of reading difficulty. It is perceived as an essential component in modifying teaching materials and reading resources (Jin, Lu, and Ni 2020; Lei and Shi 2023). Recently, the investigation of syntactic complexity has extended to translation studies, where it frequently involves identifying the unique properties of translated and interpreted languages (Chen, Li, and Liu 2024; Liu et al. 2022; Liu and Afzaal 2021; Liu, Liu, and Lei 2022; Xu and Liu 2023). For example, Liu and Afzaal (2021) found that translated texts exhibit significantly reduced syntactic complexity, supporting the simplification theory. Similarly, Wang, Liu, and Moratto (2023) identified partial simplification in translated chairman's statements. Regarding the explicitation universal, Al-Jabr (2006) noted that translations often become more explicit through syntactical changes. Xu and Li (2021) linked this explicitation to the formality level in translations. These studies collectively highlight how translation affects syntactic complexity and explicitness.

Machine Learning and Classification Between Translated and Non-Translated Texts
Text classification is a pivotal technique in the domain of natural language processing, enabling the systematic categorization of textual data into predefined labels. This can vastly improve information retrieval and content organization (Manning, Raghavan, and Schütze 2008). Text classification encompasses a variety of tasks where the goal is to assign one or more categories to a piece of text. This can be done based on the content, style, sentiment, purpose, or other criteria. Some of the key tasks in text classification include sentiment analysis (e.g., Mouthami, Devi, and Bhaskaran 2013), topic categorization (e.g., Zhou, Li, and Liu 2009), spam detection (e.g., Lau et al. 2012), translationese identification (e.g., Liu et al. 2022), and intent detection (e.g., Balodis and Deksne 2019). By applying machine learning techniques, such as Naïve Bayes classifiers or SVM, textual data can be effectively classified into predefined labels (Ilisei et al. 2010; Liu et al. 2022). In the specialized task of translationese identification, these models offer a robust approach to distinguishing between original and translated texts because they can discern complex and non-linear relationships between features and labels. Machine learning algorithms can be trained on extensive datasets of labeled texts, where each text is identified as translated or original. These classifiers can then leverage these labeled datasets to predict the status of new texts. Furthermore, machine learning allows for the incorporation of various statistical and linguistic features extracted from the text, thereby enhancing the accuracy and resilience of the classifiers. Baroni and Bernardini (2006) utilized a machine learning approach to classify translated and non-translated Italian geopolitical journal articles. They employed word or part-of-speech n-grams as features and used the SVM as their machine learning algorithm. Their findings reveal that an ensemble of SVMs could achieve an accuracy rate of 86.7% in the classification task, reinforcing the notion that machine learning models can identify linguistic variations between translated and non-translated texts. Ilisei et al. (2010) undertook a study on the universals of simplification in translated Spanish, using various machine learning algorithms to classify between translated and non-translated Spanish texts. They found that all eight classifiers showed improvement when the simplification features were included. Notably, the SVM exhibited the most significant improvement, increasing from 73.65% to 81.76% accuracy on the test set. Similarly, Volansky, Ordan, and Wintner (2015) applied a supervised machine learning classifier to test hypotheses of translation universals, finding that the use of mean word rank as a feature yielded the highest accuracy (77%) among the simplification features.
Hu and Kübler (2021) explored the unique characteristics of translated Chinese texts and how they differ from texts originally written in Chinese. They used machine learning techniques to investigate the nuanced relationship between translations and their source languages, as well as variations within the translations. The researchers utilized the SVM classifier and 10-fold cross-validation for their analysis. The segmentation, POS tagging, and parsing of the Chinese texts were performed using the Stanford CoreNLP toolkit and its default models. The study found that translated Chinese texts exhibit distinct features that set them apart from non-translated texts, constituting a unique variety of Chinese. The authors underscored the novelty of their approach, which provided valuable empirical results from a non-Indo-European target language and examined a typologically diverse group of source languages. Similarly, Liu et al. (2022) explored the potential of machine learning techniques to distinguish between original and translated Chinese texts. In their study, the authors employed seven entropy-based metrics and four machine learning models (SVM, Linear Discriminant Analysis, Random Forests, and Multilayer Perceptron) to classify a balanced Chinese comparable corpus. The study's findings suggest that the combination of Shannon's entropy information-theoretic indicators with machine learning techniques offers a unique and effective method for analyzing translation as a distinct communicative activity. The results support the hypothesis that translational language differs qualitatively from original language, with the SVM identified as the most effective model, reaching an AUC of 90.5%. This study provides valuable insights for corpus-based investigations of the translationese phenomenon within the field of translation studies.
Rubino, Lapshinova-Koltunski, and Van Genabith (2016) conducted a study to address the challenge of distinguishing between human translations and original texts. The research specifically aimed to discriminate between the outputs of novice and professional translators. The authors proposed a feature set inspired by Quality Estimation (QE) and information density, and assessed its effectiveness in differentiating between non-translated texts, professional translations, and translations by translator trainees. The authors used SVM and four types of features (surface features, surprisal features, complexity features, and distortion features) for the classification task. The study found that the combination of all four feature types yielded the highest accuracy, with distortion and complexity sets following closely. However, surprisal features were ineffective in differentiating between professional and student translations. Although the research suggests a close proximity between the two types of translations according to the feature types, with accuracy rates slightly above the 50% baseline, the authors argue that the proposed feature set shows promise in distinguishing between translations and non-translated texts. However, they also acknowledge the need for further research to improve the accuracy of the classification task. Building on the theme of language differentiation, Habic, Semenov, and Pasiliao (2020) applied deep learning techniques to ascertain whether texts were authored by native or non-native speakers. They collected a variety of labeled datasets and devised a specialized deep neural network (DNN) tailored to this specific classification challenge. Their methodology resulted in a notable enhancement of accuracy, achieving a peak of 88.75%. This significant achievement not only underscores the efficacy of deep learning in the realm of language processing but also sets a new standard for subsequent research endeavors aiming to precisely detect nuanced differences in language use and translation.
While the majority of studies on classifying between translated and non-translated texts have employed supervised learning, a few studies have utilized unsupervised learning. For instance, Nisioi and Dinu (2013) used a clustering approach to identify translationese in the novels of Vladimir Nabokov, a bilingual author. They employed the hierarchical clustering method and found that function words (common words serving grammatical purposes rather than conveying meaning) were the most informative features for identifying translationese. Similarly, Rabinovich and Wintner (2015) used an unsupervised machine learning approach to identify translationese, and their findings suggest that their proposed method outperforms supervised methods in this task. According to the authors, their two-phase clustering approach is a reliable method for distinguishing original texts from translated counterparts in highly heterogeneous datasets. Natural Language Processing (NLP) models have also been used for text classification tasks. A salient example of this is FinBERT, a large language model specifically designed for extracting information from financial texts (Huang, Wang, and Yang 2023). FinBERT builds upon the BERT (Bidirectional Encoder Representations from Transformers) architecture, which has revolutionized the understanding of context in language by considering the bidirectional nature of text. According to their study, FinBERT has demonstrated superior performance in sentiment classification, outperforming both the Loughran and McDonald dictionary, a widely recognized lexicon tailored for financial context analysis, and other conventional machine learning algorithms. In a similar vein, Dogra et al. (2021) explored advanced machine learning and statistical techniques, including NLP models, to analyze sustainability in banking stocks, a task that entails the classification of events based on their relevance and impact on stock performance. Their study demonstrates how the integration of NLP with event study methodologies can uncover latent patterns within financial news and disclosures that traditional quantitative analyses might overlook.
Although the existing literature predominantly relies on SVMs for classification, Liu et al. (2022) argue that it is essential to consider alternative classifiers to comprehensively evaluate the performance of classification techniques from a machine learning perspective. An ensemble model is also needed to improve accuracy and reduce overfitting. As such, the present study will apply eight machine learning models (Naïve Bayes, Logistic Regression, SVM, kNN, Neural Networks, Random Forest, Gradient Boosting and Deep Learning) for the classification of translated and non-translated annual reports and create ensemble models integrating the most effective models among the eight machine learning models.

Research Questions
The present study aims to use machine learning techniques to examine the syntactic complexity of annual reports of companies from the Chinese Mainland listed on the Hong Kong Stock Exchange (HKEX) and those of American companies listed on the American stock exchanges. For the text analysis, two comparable corpora were compiled, one for each group of companies. The HKEX was chosen due to the region's sheer importance as a business and financial center, as well as the abundant linguistic resources the HKEX provides for Chinese and English translation studies because of its special historical status and geographic location (Huang and Wang 2020; Jeanjean, Lesage, and Stolowy 2010). Many Mainland Chinese enterprises in the international market create their annual reports in Chinese and then translate them into English (Wang 2014). The study aims to address the following research questions:
RQ1: Can machine learning techniques differentiate between translated English annual reports and non-translated English annual reports using syntactic complexity features?
To quantify the syntactic complexity of corporate annual reports, we utilized the L2 Syntactic Complexity Analyzer developed by Lu (2010), which provided us with a comprehensive set of thirteen syntactic complexity measures. For the classification task, we first deployed eight diverse machine learning models, leveraging the syntactic complexity measures as predictive features while using the translation status as the target variable. We then combined the three most effective algorithms to develop a final model. This approach was designed to harness the strengths of different algorithms, providing a robust assessment of the reports' syntactic nuances in relation to their translational status. Figure 1 illustrates the workflow of the methods in our study.
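The final step of this workflow, combining the three strongest models, can be illustrated with a minimal sketch. The combination rule is not specified in the text, so the sketch assumes soft voting (averaging each model's predicted probability that a report is translated). The model names, AUC values, and probabilities are placeholders invented for illustration and are not results from this study.

```python
from statistics import mean


def top_k_models(auc_by_model, k=3):
    """Return the names of the k models with the highest validation AUC."""
    return sorted(auc_by_model, key=auc_by_model.get, reverse=True)[:k]


def soft_vote(prob_by_model, chosen):
    """Average the 'translated' probabilities of the chosen models per report."""
    n_reports = len(next(iter(prob_by_model.values())))
    return [mean(prob_by_model[m][i] for m in chosen) for i in range(n_reports)]


# Placeholder numbers for illustration only; NOT results from the study.
auc_by_model = {"model_a": 0.91, "model_b": 0.88, "model_c": 0.95, "model_d": 0.80}
prob_by_model = {
    "model_a": [0.9, 0.2, 0.6],
    "model_b": [0.8, 0.4, 0.3],
    "model_c": [0.7, 0.1, 0.7],
    "model_d": [0.1, 0.9, 0.5],
}

chosen = top_k_models(auc_by_model)               # three best models by AUC
ensemble_prob = soft_vote(prob_by_model, chosen)  # averaged probability per report
labels = ["translated" if p >= 0.5 else "non-translated" for p in ensemble_prob]
print(chosen, labels)
```

Soft voting is only one option; stacking or majority voting would fit the same description, and the 0.5 decision threshold is likewise an assumption.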

Data
The linguistic data used in this study consists of two corpora, namely the Corpus of American Annual Reports (USAR) and the Corpus of Chinese Mainland Annual Reports (CMAR).
This study extracted data from companies listed on both US stock exchanges and the HKEX. As of April 1, 2022, 5,295 stocks were listed on US stock exchanges and 2,251 on the HKEX. In order to ensure balanced representation in the corpora, the stock information was sorted by respective industry sectors. Given that the research focus was on the translated annual reports of Mainland Chinese companies listed on the HKEX, the distribution of stocks on the HKEX was used as a reference to design a sector-based scheme for company selection. Companies were chosen from ten distinct sectors, with a proportional number of companies selected from each sector (refer to Table 1 for details). The base locations of the companies were screened to ensure that 100 companies for USAR were based in the US, and 100 companies for CMAR were based in Mainland China. The annual reports were converted from PDF format to plain text, and subsequently saved as text files for computational analysis. Non-textual content, such as graphs and images, was excluded. Sections such as the chairman's statements, Management Discussion and Analysis (MD&A), and Notes to Consolidated Financial Statements (NCFS) were isolated into separate text files. The USAR and CMAR were further subdivided into three sub-corpora each, namely US-C, US-M, US-N, CM-C, CM-M, and CM-N. Detailed information regarding the annual reports corpora and their sub-corpora can be found in Table 2.
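The sector-proportional selection described above can be sketched as a simple quota calculation. The exact rounding rule used for Table 1 is not stated, so the sketch assumes the largest-remainder method, which guarantees that the quotas sum to the target of 100 companies. The sector names and sizes are hypothetical and do not reproduce Table 1.

```python
def allocate_by_sector(sector_counts, total=100):
    """Split `total` companies across sectors in proportion to sector size.

    Uses the largest-remainder method so the quotas sum exactly to `total`.
    """
    population = sum(sector_counts.values())
    exact = {s: total * n / population for s, n in sector_counts.items()}
    quota = {s: int(x) for s, x in exact.items()}  # floor of each exact share
    leftover = total - sum(quota.values())
    # Give the remaining slots to the sectors with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda s: exact[s] - quota[s], reverse=True)
    for s in by_remainder[:leftover]:
        quota[s] += 1
    return quota


# Hypothetical sector sizes, for illustration only (not the study's Table 1).
sector_counts = {"sector_a": 337, "sector_b": 403, "sector_c": 260}
quota = allocate_by_sector(sector_counts)
print(quota)
```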

Data Analysis
A machine learning approach was applied in this study to evaluate how well several dimensions of syntactic complexity classify translated and non-translated annual reports. The study used the syntactic complexity measures proposed by Lu (2010). Because the fifth dimension of syntactic complexity consists of a single feature, the research focused on quantifying the classification performance of the remaining four dimensions: 1) length of production unit; 2) degree of subordination; 3) degree of coordination; 4) degree of phrasal sophistication (see Table 3).
Length of production unit refers to the average length of spoken or written text units, which may be sentences, T-units (a main clause plus any subordinate clauses attached to it), or clauses. It is usually measured in words per sentence, T-unit, or clause; a higher average length typically indicates more complex production, on the understanding that longer units may contain more information and more complex syntactic structures. Degree of subordination measures the use of subordinate clauses within larger sentence structures. Subordinate clauses provide additional information to the main clause but cannot stand alone as complete sentences, and they are often introduced by subordinating conjunctions such as "because," "although," and "if." A higher degree of subordination indicates greater syntactic complexity because it shows the writer's ability to embed clauses within clauses, creating more nuanced and detailed sentences. Degree of coordination concerns constructions that link clauses or phrases of equal syntactic status with coordinating conjunctions such as "and," "but," "or," "nor," "for," "so," and "yet"; it measures the frequency and complexity of these structures. A high degree of coordination can reflect the writer's ability to connect ideas and maintain structural parallelism, contributing to the overall complexity of the text. Degree of phrasal sophistication measures the complexity within phrases, including complex nominals and verb phrases. A higher degree of phrasal sophistication indicates that the writer can manipulate phrase structure to convey more precise and complex meanings. Each dimension was analyzed to understand its role in syntactic complexity and, consequently, its influence on the classification of texts as translated or non-translated.
In this study, the L2 Syntactic Complexity Analyzer (Lu 2010) was used to generate scores for thirteen indices of syntactic complexity. This computational tool, designed specifically for assessing the syntactic complexity of written English, was selected for several reasons. According to Lu and Ai (2015), its advantages include its free availability, a comprehensive suite of syntactic complexity measures, the ability to process large batches of files, and its demonstrated high reliability (p. 18). The L2 Syntactic Complexity Analyzer has gained widespread recognition in applied linguistics, as evidenced by its use in numerous studies. Researchers have employed it to investigate the syntactic complexity of texts produced by English learners (Lu and Ai 2015; Wu, Mauranen, and Lei 2020) and to examine the differences between translation and non-translation (Liu and Afzaal 2021). The tool supports detailed analyses of syntactic complexity, and its capacity to handle large datasets makes it well suited to corpus-based linguistic research.
This study applied eight machine learning algorithms (Naïve Bayes, Logistic Regression, SVM, kNN, Neural Network, Random Forest, Gradient Boosting, and Deep Learning) to classify annual reports as either translated or non-translated. For the kNN algorithm, we followed the common heuristic of setting k to the square root of the number of samples in the dataset. Given a dataset size of 200, whose square root is approximately 14.1, we selected k = 14. For the SVM, we compared linear and RBF kernels and found that the linear kernel yielded superior results.
To evaluate the models' performance and generalizability, a 10-fold cross-validation procedure was implemented. This method partitions the dataset into 10 equal-sized subsets. The model is trained on nine subsets and tested on the remaining one, and the process is repeated 10 times so that each subset serves as the test set exactly once. This validation approach promotes efficient data utilization, stable performance estimates, improved parameter tuning, and representative test outcomes, thus offering a more reliable measure of model performance than a single train-test split.
The goal of binary classification is to assign an object of unknown class to one of two categories (Lei and Shi 2023). If the object falls into the specified category, it is labeled "positive"; otherwise, it is labeled "negative." For example, in a task to classify a fruit as an apple, a "positive" outcome indicates that the model identifies it as an apple, while a "negative" outcome indicates the opposite. Model performance was quantified using five metrics: AUC, Accuracy, F1 score, Precision, and Recall. Together, these metrics provide a comprehensive evaluation of the classification performance of the machine learning models employed in this study (Hand 2012).
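The square-root heuristic for k and the 10-fold partitioning described above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the function names `knn_k_heuristic` and `k_fold_indices` are hypothetical, the shuffle seed is arbitrary, and the folds are not stratified by class, which may differ from the software configuration actually used in the study.

```python
import math
import random


def knn_k_heuristic(n_samples):
    """Square-root rule of thumb for k, rounded to the nearest integer."""
    return round(math.sqrt(n_samples))


def k_fold_indices(n_samples, n_folds=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Assumption: plain shuffled folds, not stratified by class label.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    # Spread any remainder so that fold sizes differ by at most one.
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


k = knn_k_heuristic(200)  # sqrt(200) is about 14.14
folds = list(k_fold_indices(200, n_folds=10))
print(k, len(folds), len(folds[0][0]), len(folds[0][1]))
```

With 200 samples, each of the 10 folds holds out 20 texts for testing and trains on the remaining 180, and every text is used for testing exactly once.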

Results
Figures 2, 3 and 4 depict the syntactic complexity measures differentiating between the two sub-corpora within chairman's statements (US-C and CM-C), MD&A (US-M and CM-M), and NCFS (US-N and CM-N), respectively. Figure 2 illustrates that CM-C is generally higher in the mean length of clauses, mean length of T-units, coordinate phrases per clause, coordinate phrases per T-unit, complex [...] The descriptive statistics for chairman's statements, Management Discussion and Analysis (MD&A), and Notes to the Consolidated Financial Statements (NCFS) across the two corpora are presented in Tables 4, 5, and 6, respectively. Table 4 reveals that CM-C exhibits higher standard deviations than US-C in most syntactic complexity features. Table 5 indicates that CM-M is more dispersed than US-M across all 13 features. Table 6 shows that CM-N has lower standard deviations than US-N in seven features. These tables provide an overview of the syntactic complexity observed within each type of corporate documentation and highlight the differences between the two corpora.
Table 7 presents the performance results of the machine learning models applied to chairman's statements. For the length of production units within chairman's statements, Logistic Regression outperforms the other models, achieving an AUC of 96.88%. The SVM, Neural Network, and kNN models follow closely, each achieving an AUC above 95%. Although Deep Learning ranks lowest among the models, it still attains an AUC exceeding 91%. The high AUC values suggest that the length of production units serves as a highly discriminative feature. A comparison of means reveals that the CM-C sub-corpus typically has longer production units than the US-C sub-corpus. In terms of coordination within chairman's statements, SVM ranks highest among the models, achieving an AUC of 92.11%, with the Neural Network model trailing closely at 90.5%. For the dimension of subordination within chairman's statements, Logistic Regression again stands out, achieving the highest AUC of 88.82% among the eight models. Finally, when assessing phrasal sophistication, Deep Learning tops the list with an AUC of 98.1%. The Logistic Regression, SVM, Neural Network, kNN, Random Forest, and Naïve Bayes models also perform strongly, each achieving an AUC above 95%, with respective values of 97.64%, 97.53%, 97.32%, 96.38%, 95.47%, and 95.15%. Gradient Boosting, although ranking last, still maintains a relatively high AUC of 94.52%. We developed an ensemble model combining the Deep Learning, SVM, and Neural Network algorithms to analyze chairman's statements. The model uses the "Degree of Phrasal Sophistication" dimension, which includes three features: CN/C, CN/T, and VP/T. It demonstrated robust performance, achieving an AUC of 97.7%, an accuracy of 92%, an F1-score of 92.22%, a precision of 91.59%, and a recall of 94% (see Table 10).
Table 8 provides the performance results of the machine learning models applied to the Management Discussion and Analysis (MD&A) reports. In terms of the length of production units within MD&A, the Neural Network model achieves the highest AUC among all models, reaching 89.25%. The Logistic Regression and SVM models rank second and third, with AUC values of 88.24% and 88.06%, respectively. Naïve Bayes performs less well, achieving an AUC of 75.66%. For coordination within the MD&A narratives, the Neural Network model again ranks highest, achieving an AUC of 84.99%. The kNN and SVM models follow closely, with AUC values of 83.19% and 81.24%, respectively. For the dimension of subordination within MD&A narratives, the Gradient Boosting model surpasses the others, reaching an AUC of 86.21%. The Random Forest model follows closely with an AUC of 85.97%. In assessing phrasal sophistication, Random Forest tops the list with an AUC of 89.8%. The kNN, Neural Network, Gradient Boosting, and SVM models rank second, third, fourth, and fifth, respectively, each achieving an AUC above 85%. We created an ensemble model that combines Logistic Regression, SVM, and Neural Network to analyze MD&As. This model employs the "Length of production units" dimension, which comprises three features: MLS, MLC, and MLT. The model performed well, with a 91.1% AUC, 83.5% accuracy, 84.1% F1-score, 83.28% precision, and 86% recall (see Table 10).
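Table 9 shows the performance results of the machine learning models applied to the Notes to Consolidated Financial Statements (NCFS). For the length of production units within the NCFS, Deep Learning has the highest AUC at 99.3%; the SVM, Neural Network, and Logistic Regression models follow closely, with AUC values of 99.05%, 99%, and 98.92%, respectively. Notably, the AUC values of the top seven models exceed 98%, indicating high performance. Regarding accuracy, Logistic Regression leads with an accuracy rate of 96.5%. It is closely followed by SVM, Neural Network, and Random Forest, which achieve accuracy rates of 96%, 95.5%, and 95.5%, respectively. For the dimension of subordination within the NCFS, the Gradient Boosting model outperforms the others, achieving an AUC of 98.58%. All eight models demonstrate strong performance, with AUC values exceeding 96%. In terms of coordination within the NCFS, the Neural Network model ranks highest, achieving an AUC of 98.81%. The kNN, SVM, and Logistic Regression models follow, with respective AUC values of 98.71%, 98.66%, and 98.44%. Notably, six of the eight models have AUC values higher than 96%. Finally, when assessing phrasal sophistication, the SVM model ranks highest among the eight models, achieving an AUC of 97.61%. The Neural Network, kNN, and Deep Learning models rank second, third, and fourth, respectively, each achieving an AUC above 97%. All eight models perform strongly in this dimension, with AUC values exceeding 95%. We designed and compared two ensemble models to analyze NCFS. The first ensemble [...] Meanwhile, the second ensemble model also showed strong performance, with an AUC of 99.2%, an enhanced accuracy of 96%, an F1-score of 95.94%, a precision of 97.18%, and a slightly better recall of 95%. Given the comparative analysis, the latter ensemble model, which yielded higher accuracy and recall, was selected as the more suitable model for analyzing NCFS (see Table 10). This model's balance of high precision and recall, coupled with its marginally superior accuracy, suggests that it is the better-performing model for our specific application in analyzing NCFS.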
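For readers less familiar with the metrics reported in this section, the following minimal sketch shows how Accuracy, Precision, Recall, F1, and AUC are computed from predictions. The toy labels and scores are invented for illustration and are not data from the study; treating label 1 as "translated" is likewise an assumption. AUC is computed here in its rank form, as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def auc_score(y_true, scores):
    """Share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented toy data: 1 = translated (assumed positive class), 0 = non-translated.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 decision threshold

metrics = classification_metrics(y_true, y_pred)
auc = auc_score(y_true, scores)
print(metrics, auc)
```

Note that AUC depends only on how the scores rank the texts, whereas the other four metrics depend on the decision threshold; this is why a model can have a high AUC and a somewhat lower accuracy.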

Discussion
Translation is a dual-faceted phenomenon, encompassing both a mental process within the translator's psyche and a cross-cultural social behavior (House 2014). The Hypothesis of Gravitational Pull, as proposed by Halverson (2003), offers a widely accepted explanation for this phenomenon by merging bilingual theory and cognitive linguistics (Liu and Afzaal 2021). This hypothesis proposes that the language of translation is molded by three interconnected forces. The first force, known as the magnetism effect, entices translators to mirror the conspicuous linguistic characteristics of the target language in their translations. The second force, termed the gravitational pull effect of the source language, acts in opposition to the magnetism effect and represents a challenge the translator must navigate. The third and final force, the connection effect, arises from the frequent co-occurrence of translation equivalents in both the source and target languages. These three forces operate collectively to shape the language of translation (Halverson 2017). The magnetism effect has primarily been employed to explain unique features in translated texts at the lexical level. However, Liu and Afzaal (2021) and Wu et al. (2023) suggest that it can also explain unique syntactic features in translated texts. In our study, translated chairman's statements and MD&A narratives were found to contain more coordinate phrases than their non-translated counterparts. This pattern is indicative of the magnetism effect: translators tend to incorporate the prominent linguistic features of the target language into their translations, and may even use these features more abundantly than they appear in non-translated texts of the target language.
On the other hand, the complexity levels of annual reports are also shaped by societal factors. A notable societal influence is regulatory guidance. In 1998, the U.S. Securities and Exchange Commission (SEC) initiated the Plain English Movement, advocating for the use of clear and straightforward language in financial disclosures. This initiative was designed to facilitate efficient and effective communication between companies and their investors. The SEC provided guidelines for avoiding pitfalls in communication such as lengthy sentences, weak verbs, and abstract nominalizations (SEC 1998). The SEC identified the prevalent issue of long sentences in disclosure documents and provided strategies to avoid them. It also emphasized the problem of weak verbs, which often arises when a verb is replaced with a noun derived from it, typically ending in "-tion." These nominalizations, while less active, tend to be more abstract than their verb counterparts (Loughran and McDonald 2014). In line with SEC guidelines, U.S. companies usually aim to decrease the use of nominalizations and shorten the average sentence length in their annual reports. This practice enhances the comprehensibility of these reports for investors by promoting the use of verbs or verb phrases over their equivalent nouns or nominal phrases.
Incorporating measures of syntactic complexity has resulted in a substantial enhancement of classification performance, raising the Area Under the Curve (AUC) to 99.3% and achieving an accuracy of 97%. This represents a marked improvement over previous methodologies, such as that reported by Liu et al. (2022), which attained an AUC of 90.5% and an accuracy rate of 84.3% using entropy values. Similarly, it outperforms the approach of Baroni and Bernardini (2006), which relied on word or part-of-speech n-grams and achieved an accuracy rate of 86.7%. These comparisons underscore the efficacy of syntactic complexity measures in refining the precision and reliability of classification outcomes. The high AUC and accuracy achieved by our machine learning models substantiate the categorical distinction between translated and original annual reports in terms of syntactic complexity. By evaluating eight distinct machine learning models, we further establish that different algorithms perform unequally in differentiating between translated and non-translated annual reports. The findings of this study reveal that the Naïve Bayes algorithm frequently underperforms in comparison to the other models. This suboptimal performance can potentially be attributed to Naïve Bayes' inherent assumption of feature independence (Domingos and Pazzani 1997). This assumption may not hold in real-world datasets, where features are often correlated, thereby leading to issues of underfitting or overfitting. The subpar performance of Naïve Bayes has been observed in some datasets, particularly when the assumption of independence is violated and the features are significantly correlated (e.g., Nguyen and Kim 2021). Given the tendency of syntactic complexity features to be highly correlated (Lei and Shi 2023), this could partially account for the weaker performance of the Naïve Bayes model. These findings underscore the necessity of judiciously selecting the appropriate machine learning algorithm based on the data's characteristics and feature relationships.
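The effect of violating the independence assumption can be illustrated with a small, purely hypothetical calculation. In the sketch below, a Gaussian Naïve Bayes posterior is computed first from a single feature and then from three perfectly correlated copies of that feature. The class means, the unit variance, and the equal priors are invented for illustration and are not estimates from the corpora.

```python
import math


def gaussian_log_pdf(x, mean, var):
    """Log density of a normal distribution with the given mean and variance."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def naive_bayes_posterior(features, means_pos, means_neg, var=1.0, prior_pos=0.5):
    """Posterior probability of the positive class under Gaussian Naive Bayes.

    Each feature adds its own log-likelihood ratio, exactly as if it were
    independent of all the others.
    """
    log_odds = math.log(prior_pos / (1 - prior_pos))
    for x, m_pos, m_neg in zip(features, means_pos, means_neg):
        log_odds += gaussian_log_pdf(x, m_pos, var) - gaussian_log_pdf(x, m_neg, var)
    return 1 / (1 + math.exp(-log_odds))


# One informative feature observed at x = 1.0 (hypothetical class means 1.0 and 0.0).
single = naive_bayes_posterior([1.0], [1.0], [0.0])

# The same evidence entered three times, mimicking perfectly correlated indices.
tripled = naive_bayes_posterior([1.0] * 3, [1.0] * 3, [0.0] * 3)

print(round(single, 3), round(tripled, 3))
```

The same observation yields a posterior of about 0.62 when counted once and about 0.82 when counted three times, although no new information has been added. Correlated indices such as mean length of sentence and mean length of T-unit could inflate the model's confidence in a comparable way, which is consistent with, though not proof of, the explanation offered above.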
In this study, we developed an ensemble model by combining the three most effective machine learning algorithms identified from a pool of eight candidates. Conventional wisdom and existing literature often posit that ensemble models, through their integration of diverse algorithms, are likely to outperform individual models due to their ability to capture a broader range of patterns and reduce overfitting (Opitz and Maclin 1999; Rokach 2010). Contrary to these expectations, our findings indicate that the ensemble approach did not necessarily surpass the predictive performance of the best individual model in our experiment. This outcome echoes the observations made by Kuncheva and Whitaker (2003), who argue that the success of an ensemble is contingent upon the diversity and accuracy of the individual classifiers; if the constituent models are too similar or if one model is significantly more accurate than the others, the ensemble may not offer any substantial benefit. Moreover, the phenomenon whereby ensemble models fail to improve results has been acknowledged by Sagi and Rokach (2018), who suggest that the complexity of combining models does not always translate to higher accuracy, particularly when the individual models are already highly optimized.
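The role of diversity can be made concrete with a toy majority-vote example. The sketch below is illustrative only: the labels and error positions are invented, and simple majority voting is an assumption, since the combination rule used for the ensembles in this study may differ. Three classifiers that each reach 80% accuracy are combined twice, once when they all err on the same texts and once when their errors fall on different texts.

```python
def majority_vote(*prediction_lists):
    """Combine binary predictions by simple majority (assumed voting rule)."""
    return [1 if 2 * sum(votes) > len(votes) else 0
            for votes in zip(*prediction_lists)]


def accuracy(y_true, y_pred):
    """Proportion of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def with_errors(labels, positions):
    """Copy of the true labels with the labels at the given positions flipped."""
    return [1 - y if i in positions else y for i, y in enumerate(labels)]


y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

# Case 1: three similar classifiers that err on the same two texts.
similar = [with_errors(y_true, {0, 5}) for _ in range(3)]

# Case 2: three diverse classifiers that err on different texts.
diverse = [with_errors(y_true, {0, 5}),
           with_errors(y_true, {1, 6}),
           with_errors(y_true, {2, 7})]

acc_similar = accuracy(y_true, majority_vote(*similar))
acc_diverse = accuracy(y_true, majority_vote(*diverse))
print(acc_similar, acc_diverse)
```

When the errors coincide, the vote simply reproduces them and accuracy stays at 0.8; when the errors are spread across different texts, each one is outvoted and accuracy rises to 1.0. Strong classifiers trained on the same three or four syntactic indices are likely to resemble the first case more than the second.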
The present study contributes to research on financial reporting by using computational tools to analyze translated texts, a method that is not typically used in traditional translation studies. The application of machine learning techniques to assess syntactic complexity provides a quantitative way to analyze translation, which could supplement the more traditional methods often used in translation studies. The machine learning models applied in the study improve with better data analysis and a better understanding of the task at hand. This interdisciplinary approach could pave the way for future research that combines these two fields, leading to more comprehensive and nuanced insights. In addition, by comparing the syntactic complexity of translated and non-translated corporate annual reports, this study provides empirical evidence about how translation affects syntactic structures. These insights could contribute to the development of new theories about the ways translation affects text structures. Understanding these dynamics may, in turn, have noteworthy implications for the practice of translating financial reports.
Our study also has practical implications for text classification and translation practices. Text classification models often rely on features such as word frequency and document length. This study introduces syntactic complexity as a further kind of feature that can be used to improve the accuracy of these models. By incorporating syntactic complexity measures into the feature extraction stage, it might be possible to distinguish between different kinds of texts more accurately. Syntactic complexity is also a key factor affecting how easy a text is to read and understand. By providing insights into the syntactic complexity of corporate annual reports, this study could help organizations improve the readability and comprehensibility of both their original and translated annual reports, making them more accessible to a wider audience and thus yielding more positive economic outcomes.
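As an illustration of what such a feature extraction stage might look like, the sketch below derives the thirteen ratio indices from raw structure counts and groups them by the four dimensions examined in this study, following the index definitions in Lu (2010). The function name `complexity_features` and all counts are hypothetical; in practice the counts come from a syntactic parser (as in the L2 Syntactic Complexity Analyzer), a step that this sketch does not cover.

```python
def complexity_features(counts):
    """Turn raw structure counts for one text into thirteen ratio indices.

    Expected keys: W (words), S (sentences), T (T-units), C (clauses),
    DC (dependent clauses), CT (complex T-units), CP (coordinate phrases),
    CN (complex nominals), VP (verb phrases).
    """
    W, S, T, C = counts["W"], counts["S"], counts["T"], counts["C"]
    DC, CT = counts["DC"], counts["CT"]
    CP, CN, VP = counts["CP"], counts["CN"], counts["VP"]
    return {
        "length": {"MLS": W / S, "MLT": W / T, "MLC": W / C},
        "subordination": {"C/T": C / T, "CT/T": CT / T,
                          "DC/C": DC / C, "DC/T": DC / T},
        "coordination": {"CP/C": CP / C, "CP/T": CP / T, "T/S": T / S},
        "phrasal": {"CN/C": CN / C, "CN/T": CN / T, "VP/T": VP / T},
    }


# Hypothetical counts for a single text, invented for illustration only.
example_counts = {"W": 1000, "S": 40, "T": 50, "C": 80,
                  "DC": 30, "CT": 20, "CP": 25, "CN": 120, "VP": 110}

features = complexity_features(example_counts)
n_indices = sum(len(group) for group in features.values())
print(n_indices, features["length"])
```

Each group of indices can then be passed to a classifier on its own, which mirrors the per-dimension comparisons reported in Tables 7 to 9.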

Conclusion
The aim of this study was to examine whether features of syntactic complexity could effectively distinguish between translated English annual reports and their non-translated counterparts. The comparison encompassed four dimensions of syntactic complexity, revealing distinctive characteristics in translational language, which, as a mediated form of communication, sets it apart from non-translated language in the context of annual reports. This study's findings not only successfully demonstrate the applicability of syntactic complexity in discerning between translated and non-translated text but also highlight the efficacy of advanced methodologies, such as machine learning models, in contrast to traditional statistical procedures.
The significance of this research lies in its contribution to new empirical evidence and enhanced understanding, specifically within the realm of translated English. It advances financial reporting by applying computational analysis to translated texts, introducing a quantitative approach to assess translation's impact on syntactic complexity. This novel method enhances traditional translation studies and can improve machine learning model

RQ2: If the answer to the first question is affirmative, which syntactic complexity features perform best in the classification?
RQ3: Do ensemble models outperform individual models in the classification task?

Figure 1. Overview of methodological approach employed in the study.
Figures 2, 3 and 4 depict the syntactic complexity measures differentiating between the two sub-corpora within chairman's statements (US-C and CM-C), MD&A (US-M and CM-M), and NCFS (US-N and CM-N), respectively. Figure 2 illustrates that CM-C is generally higher in the mean length of clauses, mean length of T-units, coordinate phrases per clause, coordinate phrases per T-unit, complex

Figure 2. Syntactic complexity measures between the two sub-corpora of chairman's statements: (a) mean length of sentence; (b) mean length of clause; (c) mean length of T-unit; (d) clauses per T-unit; (e) complex T-units per T-unit; (f) dependent clauses per clause; (g) dependent clauses per T-unit; (h) coordinate phrases per clause; (i) coordinate phrases per T-unit; (j) T-units per sentence; (k) complex nominals per clause; (l) complex nominals per T-unit; (m) verb phrases per T-unit.

Figure 3. Syntactic complexity measures between the two sub-corpora of MD&A: (a) mean length of sentence; (b) mean length of clause; (c) mean length of T-unit; (d) clauses per T-unit; (e) complex T-units per T-unit; (f) dependent clauses per clause; (g) dependent clauses per T-unit; (h) coordinate phrases per clause; (i) coordinate phrases per T-unit; (j) T-units per sentence; (k) complex nominals per clause; (l) complex nominals per T-unit; (m) verb phrases per T-unit.
, achieving an AUC of 96.88%. The SVM, Neural Network, and kNN models follow closely, each achieving an AUC above 95%. Although Deep Learning ranks lowest among the models, it still attains an AUC exceeding 91%. The high AUC values suggest that the length of production units serves as a highly discriminative feature. A comparison of means reveals that the CM-C sub-corpus typically has longer production units than the US-C sub-corpus. In terms of coordination within chairman's statements, SVM ranks highest among the models, achieving an AUC of 92.11%, with the Neural Network model trailing closely at 90.5% AUC. For the dimension of subordination within chairman's statements, Logistic Regression again stands out, achieving the highest AUC of 88.82% among the eight models. Finally, when assessing phrasal sophistication, Deep Learning tops the list with an AUC of 98.1%. Logistic Regression, SVM, Neural Network,

Figure 4. Syntactic complexity measures between the two sub-corpora of NCFS: (a) mean length of sentence; (b) mean length of clause; (c) mean length of T-unit; (d) clauses per T-unit; (e) complex T-units per T-unit; (f) dependent clauses per clause; (g) dependent clauses per T-unit; (h) coordinate phrases per clause; (i) coordinate phrases per T-unit; (j) T-units per sentence; (k) complex nominals per clause; (l) complex nominals per T-unit; (m) verb phrases per T-unit.

Table 1. The distribution of the companies by sectors in the corpora.

Table 2. Description of USAR and CMAR corpora and their sub-corpora.

Table 4. Descriptive statistics for the syntactic complexity measures of chairman's statements.

Table 5. Descriptive statistics for the syntactic complexity measures of MD&A.

Table 6. Descriptive statistics for the syntactic complexity measures of NCFS.
AUC values of the top seven models exceed 98%, indicating high performance. Regarding accuracy, Logistic Regression leads with an impressive 96.5% accuracy rate. It is closely followed by SVM, Neural Networks, and Random Forest, which achieve accuracy rates of 96%, 95.5%, and 95.5%, respectively. For the dimension of subordination within the NCFS, the Gradient Boosting model outperforms the others, achieving an AUC of 98.58%. All eight models demonstrate strong performance, with AUC values exceeding 96%. In terms of coordination within the NCFS, the Neural Network model ranks highest, achieving an AUC of 98.81%. The kNN, SVM, and Logistic Regression models follow, with respective AUC values of 98.71%, 98.66%, and 98.44%. Notably, six out of the eight models have AUC values higher than 96%. Finally, when assessing phrasal sophistication, the SVM model ranks highest among the eight models, achieving an AUC of 97.61%. The Neural Network, kNN and Deep Learning models rank second, third and fourth respectively, each achieving an AUC above 97%. Remarkably, all eight models display strong performance in this dimension with AUC values exceeding 95%. We designed and compared two ensemble models to analyze NCFS. The first ensemble

Table 7. Results of machine learning models on chairman's statements.

Table 8. Results of machine learning models on MD&A.

Table 9. Results of machine learning models on NCFS.

Table 10. Results of ensemble models.