Interpretable Models for the Potentially Harmful Content in Video Games Based on Game Rating Predictions

ABSTRACT Studies have reported that playing video games with harmful content can lead to adverse effects on players. Therefore, understanding the harmful content can help reduce these adverse effects. This study is the first to examine the potential of interpretable machine learning (ML) models for explaining the harmful content in video games that may potentially cause adverse effects on players based on game rating predictions. First, the study presents a performance analysis of the supervised ML models for game rating predictions. Secondly, using an interpretability analysis, this study explains the potentially harmful content. The results show that the ensemble Random Forest model robustly predicted game ratings. Then, the interpretable ML model successfully exposed and explained several types of harmful content, including Blood, Fantasy Violence, Strong Language, and Blood and Gore. This revealed that the depiction of blood, the depiction of the mutilation of body parts, violent actions of human or non-human characters, and the frequent use of profanity might potentially be associated with adverse effects on players. The findings suggest the strength of interpretable ML models in explaining harmful content. The knowledge gained can be used to develop effective regulations for controlling identified video game content and potential adverse effects.


Introduction
Since their introduction around 40 years ago, video games have become one of the most popular leisure technologies globally. With their various platforms, video games have reached many different levels of players. According to market analysis, the widespread popularity of video games produced 159.3 billion dollars in 2020 (Field Level Media 2020). According to a survey in 2020, 75% of Americans had at least one video game player in their household. In total, 214.4 million people played video games, of whom 51.1 million were kids (under 18) and 163.3 million adults (above 18). Further, the average age of players ranged from 35 to 44. Regarding the devices owned by players, 73% played on a game console, 43% on a handheld system, 29% on a virtual reality (VR) device, and 25% on a mobile VR device. Most players said that video games offered mental stimulation (80% of respondents) and relaxation (79% of respondents) for the human body (ESA 2020).
Nowadays, the increasing popularity of video games has attracted numerous researchers. Most researchers focus on the player-video game interactions (Caroux et al. 2015) and user experience (Bernhaupt and Mueller 2016; Boyle et al. 2012). Another community of researchers focuses on game development and design (Duarte, Silveira, and Battaiola 2017; Engström et al. 2018). Despite the popularity of video games, considerable discussion remains about their potential positive and negative effects on individuals and society. Several studies have shown positive outcomes of video games, especially educational games. These studies reported that playing video games was connected with a range of positive outcomes with respect to perceptual, cognitive, behavioral, affective, and motivational factors (Boyle et al. 2016). Other positive effects of video games have also been reported, such as enhancing the academic achievement of students (Karakoç et al. 2020), critical thinking (Mao et al. 2021), and positive feelings (Quwaider, Alabed, and Duwairi 2019).
Adverse effects of gaming have also been reported. Lee et al. (Lee, Kim, and Choi 2021) found that playing violent games correlated with physical and verbal aggression. Further, in their experimental study on the effects of playing violent, sexist video games, Gabbiadini et al. (Gabbiadini et al. 2016) identified that playing such games reduced males' empathy for female victims of violence. This reduction in empathy arose because the video games increased masculine beliefs, such as confidence in being a "real" man, dominant, and aggressive. Furthermore, violent games are associated with significant impacts on blood pressure and appetite perceptions, which can increase the risk of hypertension and weight gain (Siervo et al. 2013). Other adverse effects, such as game addiction (Gros et al. 2020), anger and hostility (Lee, Kim, and Choi 2021), gaming disorders (WHO 2018), and hallucinations (Griffiths 2005), have also been reported. In a specific group of gamers, Nguyen and Landau (T. Nguyen and Landau 2019) reported that excessive gaming was strongly associated with social isolation and depression in children. In another survey, lack of physical movement, eyesight disorders, and anxiety (Sălceanu 2014) were reported as adverse effects. As previously mentioned, most video game players are adolescents. Many experimental studies have shown that playing video games enhances adolescents' aggressive behavior (J. Y. Li, Du, and Gao 2020). Adolescents who play for an excessively long time were also found to develop depressive, musculoskeletal, and psychosomatic symptoms (Hellström et al. 2015). In their experimental study on the effects of emotional arousal on swearing fluency, Stephens and Zile (Stephens and Zile 2017) found that the swearing fluency of adults was strongly associated with raised emotional arousal after playing a shooter game.
In order to avoid the negative effects of video games, various organizations have proposed a game rating system to control their harmful content and to advise consumers about the games they want to play. Parents use these guidelines to control what video games can be played by children and adolescents. Depending on differences in society, culture, and political aspects, several organizations have proposed game rating systems: the Pan European Game Information (PEGI) for Europe (PEGI 2021), the Entertainment Software Rating Board (ESRB) for North America (ESRB 2021), the Australian Classification Board (ACB) in Australia (ACB 2021), the Office of Film and Literature Classification (OFLC) in New Zealand (OFLC 2021), the Computer Entertainment Rating Organization (CERO) in Japan (CERO 2021), the Media Development Authority (MDA) in Singapore (MDA 2021), and the Game Rating Board (GRB) for South Korea (GRB 2021). Based on their classification methods, such rating systems can minimize children's and teenagers' access to possibly harmful content. The findings of various studies support this claim. A rating system helps parents protect their children from the adverse effects of video games (Felini 2015). Furthermore, Laczniak et al. (Laczniak et al. 2017) reported that the kids of parents who used the game rating system tended to play less violent games and were less likely to be engaged in negative actions at school.
Despite the promising results of game rating systems for minimizing the negative consequences of video games, the rating systems do not explain which types of harmful content may potentially cause these adverse effects. To the best of our knowledge, this issue has not been investigated elsewhere. Therefore, understanding the potentially harmful content in video games is essential, as it can provide early warning information with which to evaluate a game's content. Also, the knowledge gained from this study is beneficial because it (a) allows policy-makers to evaluate the policy decisions about the harmful content in video games (Laczniak et al. 2017), (b) allows game developers to create an optimal gaming profile for a specific group of users based on game rating systems (Hamid and Shiratuddin 2016), and (c) allows researchers to confirm existing knowledge regarding harmful content in video games (Langer et al. 2021).
Explainable artificial intelligence (XAI) is a relatively new technique that explains the underlying processes in ML models in a way that humans can understand (Barredo Arrieta et al. 2020). Various studies have started to take advantage of this technique. In experimental studies, Parsa et al. (Parsa et al. 2020) leveraged the XAI technique to explain the occurrence of traffic accidents using several types of real-time data, including traffic, network, demographic, land use, and weather features. Chakraborty et al. (Chakraborty, Başağaoğlu, and Winterle 2021) employed the XAI technique to explain the inflection points in the climate predictors of hydro-climatological data sets. The XAI technique has also been utilized in the medical field. For example, it has delineated the area of tumor tissue in patches extracted from histological images (Palatnik de, Rebuzzi Vellasco, and Da Silva 2019) and explained the occurrence of Parkinson's disease in a public data set of 642 brain images of Parkinson's patients (Magesh, Myloth, and Tom 2020). Although previous studies have demonstrated the promise of the XAI technique regarding interpretability analysis, no study has used it to examine video games. Current research studies attempt to obtain metrics with the highest prediction accuracy (Alomari et al. 2019) but lack a thorough analysis of the harmful content in video games. The fact that no game studies focus on explainability has also been raised in previous review studies (Barredo Arrieta et al. 2020; Tjoa and Guan 2020). Our study addresses these research omissions identified in previous experimental and review papers.

Research questions and hypotheses
This paper aims to examine the potential of an interpretable ML model for explaining the harmful content in video games that may potentially cause adverse effects on players based on a multi-class classification of game ratings. The hypotheses of this study are twofold.
First, this study comprised empirical experiments with the supervised ML models to predict the well-known public ESRB game rating system. Specifically, this study compared the ensemble and non-ensemble ML models to understand their performance in predicting ESRB game rating systems.
Secondly, based on the comparison results, this study utilized the best ML model to explain the potentially harmful content that may cause adverse effects on players using global and local interpretability analysis.
This study notes that the terms content and feature are semantically identical. The former is used to explain a video game's content descriptors or harmful content, and the latter is usually applied to explain the technical term in the machine learning field. These two terms are used interchangeably throughout the paper.
The rest of this paper is organized as follows. Section 2 presents a short literature review on XAI. Section 3 provides the methodology of this study, while Section 4 presents the results and discussion of the experiments. Finally, Section 5 answers the research questions and summarizes the essential findings and implications of this work.

Explainable artificial intelligence
In the literature, the term explainability of artificial intelligence or explainable artificial intelligence is often misused or confused with other terms, such as interpretability (Tjoa and Guan 2020), explainability (Guidotti et al. 2018), comprehensibility (Fernandez et al. 2019), and transparency (Lipton 2018).
Explainability refers to explaining the reason behind the prediction of a specific machine learning model in a way that humans can understand, and such explanations can be used to formulate new assumptions or to validate existing knowledge (Belle and Papantonis 2021; Linardatos, Papastefanopoulos, and Kotsiantis 2021; Lipton 2018).
A published review study classified the explainability of artificial intelligence techniques based on scope, methodology, and model usage (Das and Rad 2020). In scope, explanations can be local or global, and some methods can be applied to both. Locally explainable methods represent the individual feature attributions of a single instance of input data from the all-data population and show a user why a specific choice was made. The study's examples of local explanation are Activation Maximizations, Saliency Map Visualizations, Layer-Wise Relevance Backpropagations (LRP), Local Interpretable Model-Agnostic Explanations (LIME), and SHapley Additive exPlanations (SHAP). In contrast, globally explainable methods attempt to understand each feature that contributes to how the model makes its choice over all of the data. Examples of global explanation are Global Surrogate Models, Class Model Visualizations, LIME Algorithm for Global Explanations, Concept Activation Vectors (CAVs), Spectral Relevance Analyses (SpRAy), Global Attribution Mapping, and Neural Additive Models (NAMs).
Furthermore, the principal algorithmic concept behind the explainable model can be classified based on implementation methodology. Typically, both local and global explainable algorithms can be classified as either backpropagation-based or perturbation-based methods. In backpropagation-based methods, the explainable algorithm performs one or more forward passes through the neural network, then produces attributions using partial derivatives of the activations during the backpropagation stage. Examples of backpropagation-based include Saliency Maps, Gradient Class Activation Mapping (CAM), Salient Relevance (SR) Maps, Attribution Maps, and Desiderata of Gradient-Based Methods. On the other hand, perturbation-based methods aim to change the feature set of a given input instance by utilizing occlusion, partly switching features with filling operations or generative algorithms, masking, and conditional sampling. In most cases, a single forward pass is sufficient to build attribution representations, and back-propagating gradients are not required. Examples of perturbation-based methods are Deconvolution Nets for Convolution Visualizations, Prediction Difference Analysis, Randomized Input Sampling for Explanation (RISE), and Randomization and Feature Testing.
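As a rough illustration of the perturbation-based idea described above, the following minimal sketch occludes one feature at a time and records how much the model output changes. The function name `occlusion_attributions`, the toy linear model, and its weights are assumptions made for illustration only; they are not taken from any of the cited methods.

```python
from typing import Callable, List, Sequence


def occlusion_attributions(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: float = 0.0,
) -> List[float]:
    """Attribute the output to each feature by occluding it.

    The attribution of feature i is f(x) minus f(x with feature i set
    to the baseline). Only forward evaluations of the model are used;
    no gradients are required.
    """
    full_output = model(x)
    attributions = []
    for i in range(len(x)):
        occluded = list(x)
        occluded[i] = baseline  # mask a single feature
        attributions.append(full_output - model(occluded))
    return attributions


def toy_model(x: Sequence[float]) -> float:
    """Hypothetical linear scorer over three binary features."""
    weights = [0.5, 0.3, 0.2]
    return sum(w * v for w, v in zip(weights, x))


attributions = occlusion_attributions(toy_model, [1, 0, 1])
print(attributions)
```

In this toy case the second feature is absent from the input, so occluding it changes nothing and its attribution is zero.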
A well-built explainable method with a defined scope and approach can be integrated inside the neural network model or used as an external algorithm for explanation at the model usage or implementation level. Any explainable algorithm that is dependent on the model architecture falls into the model-intrinsic group. Model-intrinsic algorithms are model-specific, which means explainability is built into the neural network architecture and cannot be transferred to other architectures. Model-intrinsic examples are Trees and Rule-Based Models, Generalized Additive Models (GAM), Sparse LDA, and Discriminant Analysis. Conversely, model-agnostic post-hoc explanations are not dependent on the model architecture and can be implemented with neural networks that have previously been trained. Post-hoc methods are frequently used in various input modalities, including photos, text, and tabular data.

Why SHapley Additive exPlanations (SHAP)?
SHAP is an explainable method based on game theory, and it provides a powerful and insightful measure of a feature's relevance in a model (Parsa et al. 2020). The SHAP technique is a feature attribution method that allocates a value to each feature for each prediction, making it easier to evaluate the prediction result. The method ensures feature consistency and model stability, significantly improving the original Shapley Value estimation method (Meng et al. 2021). The global and local explainability of the interpretable ML model can be analyzed by using the SHAP frameworks. SHAP can reveal the global order of importance of the predictor variables in estimating the target (i.e., game rating categories) and highlight the local dependencies or interactions among the independent and dependent variables. SHAP can also quantify the independent variables' inflection points that trigger the prediction (Chakraborty, Başağaoğlu, and Winterle 2021).
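To make the game-theoretic basis concrete, the sketch below computes exact Shapley values for one instance of a tiny toy model by enumerating every feature coalition. This is the textbook Shapley formula, not the approximation algorithms used inside the SHAP framework, and the toy scoring function, its coefficients, and the all-zero baseline are hypothetical choices made for illustration.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, List


def shapley_values(
    value_fn: Callable[[FrozenSet[int]], float], n_features: int
) -> List[float]:
    """Exact Shapley values by enumerating all coalitions of features."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # Weight of a coalition of this size in the Shapley average.
            weight = (
                factorial(size)
                * factorial(n_features - size - 1)
                / factorial(n_features)
            )
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                marginal = value_fn(coalition | {i}) - value_fn(coalition)
                phi[i] += weight * marginal
    return phi


# Hypothetical instance: Blood = 1, Strong Language = 1, Fantasy Violence = 0.
instance = [1, 1, 0]


def toy_score(blood: float, language: float, fantasy: float) -> float:
    """Hypothetical scorer with an interaction between two features."""
    return 2 * blood + language + blood * language + 0.5 * fantasy


def value_fn(coalition: FrozenSet[int]) -> float:
    """Model output when only features in the coalition take their real
    values and all other features are held at the baseline of 0."""
    masked = [instance[j] if j in coalition else 0 for j in range(3)]
    return toy_score(*masked)


phi = shapley_values(value_fn, 3)
print(phi)
```

The attributions sum to the difference between the score of the full instance and the score of the baseline, which is the additivity property that makes per-prediction values easy to read.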
In the last year, SHAP has successfully been implemented in numerous areas, such as traffic accidents (Parsa et al. 2020), climatology (Chakraborty, Başağaoğlu, and Winterle 2021), medicine (Palatnik de, Rebuzzi Vellasco, and Da Silva 2019), human decision support systems (Knapič et al. 2021), and anomaly detection (Antwarg et al. 2021). Recently, Moscato et al. (Moscato, Picariello, and Giancarlo 2021) compared SHAP and LIME performances in explaining the predictions of ML models. They found that SHAP achieved statistically more reliable results. These findings are in line with previous studies that reported SHAP outperformed the LIME method in terms of robustness (Antwarg et al. 2021).
Furthermore, SHAP's ability was also proved to boost the prognostic performance and confirm its value in AI-based reliability research. Nor et al. (Nor et al. 2021) utilized the SHAP technique to explain gas turbine prognostication. They revealed that the SHAP technique could improve prognostic performance, an aspect that has not been considered in the literature of prognostic and health management-XAI. They found that the gas turbines' prognostic findings improved by up to 9% in root mean square error and 43% in early prognosis due to SHAP. They improved the prognostic performance by using the best set of features according to contribution order from the SHAP summary plot. In short, the previous studies have proved the ability of the SHAP technique to explain the predictions of ML models. For this reason, and taking into account the research omissions identified by the previous studies described in the last section, this study proposes using the SHAP technique for explaining the harmful content in video games that may cause adverse effects on players based on game rating prediction.

Materials and methods
The ESRB data set was sourced from the Kaggle website for data science and machine learning (Kaggle 2021). The data set contained the title of the game, 30 content descriptors, and 4 game rating categories (i.e., everyone [E], everyone 10+ [E10+], teen [T], mature 17+ [M17+]). Detailed descriptions of the game rating categories (or game rating classes) and the game content descriptors are given in Appendixes A and B. PlayStation and Xbox were the most used video game platforms. The training data set comprised 1,895 games with 30 ESRB content descriptors, and the testing data set included 500 games with 30 ESRB content descriptors. The experiments were performed on a Windows 10 platform with a 16 GB graphics processing unit (GPU), 256 GB of SSD storage, a 1.80 GHz Intel Core i7 processor, and 8 GB of RAM. The Python environment (version 3.7.6), Scikit-learn, and the Keras library were used to develop algorithms. Finally, the SHAP framework was used for the global and local interpretability analysis.

Development of machine learning models and interpretable models
The development process of ML models and interpretable models is displayed in Figure 1. The first step was data processing, in which all the features (independent variables) from the ESRB data set and the game rating classes (dependent variables) were combined. Then, the data were converted into binary values (1 or 0). Table 1 shows a matrix of input features, where the columns represent the presence (1) or absence (0) of a particular feature in a specific game. The column of the game rating class represents the output and indicates whether a game is classified as E, E10+, T, or M17+.
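The binary encoding step described above can be sketched as follows. The game titles, the shortened descriptor list, and the function name `encode_games` are hypothetical placeholders; the actual data set uses 30 ESRB content descriptors.

```python
from typing import Dict, List

# Hypothetical subset of content descriptors (the data set has 30).
DESCRIPTORS = ["Blood", "Fantasy Violence", "Strong Language", "Blood and Gore"]

# Hypothetical games, each with its descriptors and ESRB rating class.
games = [
    {"title": "Game A", "descriptors": ["Fantasy Violence"], "rating": "E10+"},
    {"title": "Game B", "descriptors": ["Blood", "Strong Language"], "rating": "T"},
    {"title": "Game C", "descriptors": [], "rating": "E"},
]


def encode_games(
    records: List[Dict], descriptors: List[str]
) -> List[List[int]]:
    """Return one row per game with 1 (present) or 0 (absent) per descriptor."""
    rows = []
    for record in records:
        present = set(record["descriptors"])
        rows.append([1 if d in present else 0 for d in descriptors])
    return rows


features = encode_games(games, DESCRIPTORS)      # input matrix
labels = [record["rating"] for record in games]  # output column
print(features)
print(labels)
```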
The complete data set was divided into a training and a testing data set, which were utilized to evaluate the performance of the ML models in estimating the game rating categories. The next step was model creation. In this step, several ML models were developed, including ensemble models (i.e., random forest [RF], gradient boosting [GBoost and XGBoost]) and non-ensemble models (i.e., logistic regression [LR], naive Bayes [NB], and deep neural multilayer perceptron classifier [DL]). To achieve the best performance of the algorithms, hyper-parameter tuning was performed using the randomized search cross-validation (CV) technique (Bergstra and Bengio 2012). Appendix C presents the randomized search CV technique results. Moreover, each model was executed 10 times using 10-fold CV to obtain the best model for the interpretability analysis step. Four metrics were used, including accuracy, precision, recall, and the F1 score. The accuracy is the percentage of instances correctly classified by the model. The precision is the ratio of the instances correctly classified into a given class to all instances classified into that class, while the recall or sensitivity describes the true positive prediction rate. The F1 score or F1 measure describes the classification accuracy as the harmonic mean of the precision and recall values. F1-score values closer to 1 indicate a better classification accuracy. The evaluation metrics are calculated in (1), (2), (3), and (4):
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$
$$\text{Precision} = \frac{TP}{TP + FP} \tag{2}$$
$$\text{Recall} = \frac{TP}{TP + FN} \tag{3}$$
$$F1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4}$$
where TP is a true positive, TN is a true negative, FN is a false negative, and FP is a false positive. Finally, the global and local interpretability analyses were performed to explain the results.
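The four metrics described above can be computed from predicted and true rating classes as in the following minimal sketch. It uses only the Python standard library rather than the Scikit-learn routines the study relied on, and the label lists are made-up examples, not results from the ESRB data set.

```python
from typing import List, Tuple


def accuracy(y_true: List[str], y_pred: List[str]) -> float:
    """Share of instances whose predicted class equals the true class."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def per_class_metrics(
    y_true: List[str], y_pred: List[str], label: str
) -> Tuple[float, float, float]:
    """Precision, recall, and F1 for one class, treated one-vs-rest."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical true and predicted rating classes for six games.
y_true = ["E", "E", "T", "M17+", "T", "E10+"]
y_pred = ["E", "T", "T", "M17+", "T", "E"]

acc = accuracy(y_true, y_pred)
t_precision, t_recall, t_f1 = per_class_metrics(y_true, y_pred, "T")
print(acc, t_precision, t_recall, t_f1)
```

Averaging the per-class values over E, E10+, T, and M17+ is one common way to obtain a single multi-class figure; whether the study used macro or weighted averaging is not stated in the text.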

Performance analysis of predictive ML models
This section presents the comparative performance of ML ensemble (i.e., RF, GBoost, and XGBoost) and non-ensemble models (i.e., LR, NB, and DL). The model details and the configuration-based randomized search CV are depicted in Figure 2. All models were evaluated using four metrics on the test data, as depicted in Table 2.
The overall performance revealed that the RF model outperformed the other models in predicting game rating categories, with 84.90% overall performance. Experimental findings revealed that the performance of ensemble learning (i.e., RF) is better than that of the other models. This finding is in line with the previous results that reported the supremacy of ensemble models (Y. Li and Chen 2020; Kiziloz 2021; K. A. Nguyen et al. 2021). Thus, our study used the RF model for the next section of global and local interpretability analysis using the SHAP framework.
In terms of accuracy metrics, Table 2 shows that the XGBoost performed best, with a prediction accuracy of 84.60%, and both GBoost and RF achieved a comparable performance of 84.20%. In other studies, XGBoost exhibited higher prediction accuracy than RF in predicting PM2.5 concentrations in the air using satellite and meteorological data (Zamani Joharestani et al. 2019). However, Kabiraj et al. (Kabiraj et al. 2020) found that RF outperformed the XGBoost model in predicting breast cancer risk. Further, Lu et al. (Lu et al. 2021) reported that RF performed better than GBoost in false invoicing feature identification and risk prediction. In contrast, Golden, Rothrock, and Mishra (Golden, Rothrock, and Mishra 2019) reported that the GBoost model outperformed the RF model in predicting the prevalence of Listeria spp. in pastured poultry farm environments.
Regarding the precision, recall (sensitivity), and F1 score metrics, XGBoost outperformed the RF model in recall and F1 when predicting the mortality of patients with acute kidney injury. Also, Kardani et al. (Kardani et al. 2021) reported that XGBoost performed better than RF for predicting slope stability (i.e., the condition of inclined soil slopes to withstand movement) in all metrics. In contrast, the RF model outperformed the XGBoost model in precision. Additionally, the GBoost model outperformed the RF model in landslide susceptibility mapping in all metrics (Liang, Wang, and Jan Khan 2021).
Our results and the findings of previous studies demonstrated that models such as RF, GBoost, XGBoost, and DL achieve a comparable performance. Despite this comparable performance, the authors note that the performance of algorithms is influenced by various factors, such as the model's complexity and configuration and the quality of data.

Evaluation results of the global interpretability analysis
This section presents the results of the global interpretability analysis using the RF model utilizing the SHAP technique. As depicted in Figure 3, the SHAP for the global interpretability analysis uncovered the relative order of the importance of features (Blood > Fantasy Violence > Strong Language > Blood and Gore). For example, the ML model pushed the rating predictions higher (i.e., higher Shapley values for the output) when Blood, Fantasy Violence, Strong Language, and Blood and Gore were high. Such a representation of the underlying physical processes shows that models can reveal meaningful physical interactions between the features (independent variables) and the game rating classes (dependent variables).
The importance values from the SHAP global interpretability analysis for the rating classes are shown in Figure 3. The results indicate that the influence of a feature slightly differs in each prediction class (i.e., E, E10+, T, M17+). In prediction class E, the order of the most important features was Fantasy Violence > Blood > Blood and Gore > Suggestive Themes, while, in class E10+, the order was Fantasy Violence > Blood > Blood and Gore > Strong Language. In class T, the model revealed an order of Blood > Strong Language > Suggestive Themes > Violence. On the other hand, Strong Language > Blood and Gore > Blood > Sexual Themes were the features with the highest importance in prediction class M17+.
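A global ordering of this kind is commonly obtained by averaging the absolute SHAP values of each feature over all instances and sorting the result. The sketch below illustrates that aggregation with a small matrix of made-up SHAP values; the numbers and the function name `global_importance` are hypothetical and do not reproduce Figure 3.

```python
from typing import List, Tuple


def global_importance(
    shap_rows: List[List[float]], feature_names: List[str]
) -> List[Tuple[str, float]]:
    """Rank features by mean absolute SHAP value across instances."""
    n_rows = len(shap_rows)
    scores = {
        name: sum(abs(row[j]) for row in shap_rows) / n_rows
        for j, name in enumerate(feature_names)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


feature_names = ["Blood", "Fantasy Violence", "Strong Language"]

# Hypothetical per-instance SHAP values for one rating class
# (rows are games, columns follow feature_names).
shap_rows = [
    [0.30, -0.10, 0.05],
    [-0.20, 0.15, 0.00],
    [0.25, -0.20, 0.10],
]

ranking = global_importance(shap_rows, feature_names)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Because absolute values are averaged, a feature can rank highly whether it pushes predictions toward a class or away from it; the direction is what the local analysis adds.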

Evaluation results of the local interpretability analysis
To further investigate the prominent feature interactions that drive the game rating prediction, this study performed a local interpretability analysis, as depicted in Figures 4, 5, 6, and 7. Figure 4 (a) shows the global interpretability analysis for prediction class E. For almost all features (y-axis), low feature values (blue dots) lie on the positive side of the SHAP value axis (x-axis), indicating that lower values of these features drive the prediction of class E. Fantasy Violence and Blood have a strong influence on the positive side when the feature value is low. Thus, it is expected that a low percentage of harmful content (i.e., Fantasy Violence and Blood) in a game leads to the game being rated E. The local interpretability analysis also gives an overview of the influence of each feature on the model prediction, as depicted in Figure 4 (b). For the right y-axis, 1 means the presence and 0 the absence of a feature.
For the left y-axis, a positive SHAP value means that the feature pushes the model toward predicting an E rating, and a negative SHAP value means that the feature contributes negatively to predicting an E rating. As can be seen in Figure 4 (b), the negative SHAP values of Blood and Fantasy Violence in a game contribute negatively to the game being rated E. In contrast, the positive SHAP value of Mild Fantasy Violence means that this feature substantially influences the model in deciding an E classification. In other words, the lower the percentage of Fantasy Violence and Blood and the higher the percentage of Mild Fantasy Violence in a game, the more likely the game is rated E.
For prediction class E10+, Blood, Blood and Gore, and Strong Language substantially impact the positive side when the feature value is low. In contrast, Fantasy Violence strongly affects the positive side when the feature value is high. This condition drives the model output toward predicting an E10+ classification. The local interpretability analysis in Figure 5 (b) reveals that Blood, Blood and Gore, and Strong Language contribute negatively to predicting E10+. On the other hand, Fantasy Violence contributes positively toward predicting E10+. Thus, the lower the percentage of Blood, Blood and Gore, and Strong Language and the higher the percentage of Fantasy Violence in a game, the more likely the game is rated E10+.
As for the global interpretability analysis of class T, Figure 6 (a) shows that Strong Language affects the positive side when the feature value is low. In contrast, Blood, Suggestive Themes, and Violence significantly impact the positive side when the feature value is high. Their SHAP values are positive, meaning that these features raise the prediction value and contribute to the T prediction.
The local interpretability analysis also indicates that Strong Language contributes negatively to predicting T. On the other hand, Blood, Suggestive Themes, and Violence contribute positively toward predicting T, as depicted in Figure 6 (b). Thus, it can be inferred that the lower the percentage of Strong Language and the higher the percentage of Blood, Suggestive Themes, and Violence in a game, the more likely the game is rated T.
Finally, Figures 7 (a) and 7 (b) present the global and local interpretability analysis, showing that Strong Language, Blood and Gore, Blood, and Sexual Themes strongly influence the prediction of an M17+ rating. In other words, the higher the percentage of those types of content in a game, the higher the probability of that game being rated M17+.
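The reading of positive and negative SHAP values used throughout this section follows from the additive form of a SHAP explanation: the model output for one game equals a base value plus the sum of that game's per-feature SHAP values. The sketch below illustrates this for a single hypothetical game and the E class; the base value and the SHAP values are invented for illustration and are not taken from Figures 4 to 7.

```python
from typing import Dict, List, Tuple


def explain_instance(
    base_value: float, shap_values: Dict[str, float]
) -> Tuple[float, List[str], List[str]]:
    """Rebuild the prediction and split features by direction of effect."""
    prediction = base_value + sum(shap_values.values())
    ordered = sorted(shap_values.items(), key=lambda item: item[1])
    # Most negative first for features pushing away from the class.
    pushes_away = [name for name, value in ordered if value < 0]
    # Most positive first for features pushing toward the class.
    pushes_toward = [name for name, value in reversed(ordered) if value > 0]
    return prediction, pushes_toward, pushes_away


# Hypothetical explanation of one game for the E rating class.
base_value = 0.25  # assumed average model output for class E
shap_values = {
    "Mild Fantasy Violence": 0.30,
    "Blood": -0.10,
    "Fantasy Violence": -0.05,
}

prediction, toward_e, away_from_e = explain_instance(base_value, shap_values)
print(round(prediction, 2), toward_e, away_from_e)
```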

Discussion
This section discusses the analysis of the potentially harmful content in video games and compares it with findings of previous studies. The results of the global interpretability analysis show that the following features (in order of importance) contribute most to the game rating prediction: Blood > Fantasy Violence > Strong Language > Blood and Gore. When a video game has these types of content, the game is more likely to be classified into one of the game rating categories (i.e., E, E10+, T, M17+).
As for the local interpretability analysis, the results indicate that Fantasy Violence, Blood, and Mild Fantasy Violence are significant features for an E rating. Meanwhile, for the E10+ and T ratings, Fantasy Violence, Blood, Blood and Gore, Strong Language, Suggestive Themes, and Violence are found to be the essential features. On the other hand, Strong Language, Blood and Gore, Blood, and Sexual Themes are the common features for the M17+ rating. An examination of the interpretability analysis revealed several types of harmful content that might potentially relate to adverse effects on players. Such content included the depictions of blood, the mutilation of body parts, violent actions of human or non-human characters, and the frequent use of profanity (see Appendix B).
A limitation of our study is that the interpretability analysis was based on the ESRB game rating system, so other game rating systems were not explored (e.g., PEGI, ACB, OFLC, CERO). Future studies should use another data set to investigate whether our findings can be confirmed. Furthermore, in our study, the presence or absence of a game feature or content descriptor was treated as a binary variable. However, in addition to the mere presence of a content descriptor, the quantity of a content descriptor (e.g., how many violent scenes a game contains) may also influence the game rating prediction (e.g., M17+). This study abstained from exploring the quantity of each content descriptor in a game because of the qualitative nature of measuring a game's content types. Thus, future work should explore metrics that quantify specific content descriptors in a game (i.e., objectively estimating the amount of harmful content in a game) to investigate the impact of the quantity of game content on the model prediction. Another limitation is that this study examined only PlayStation and Xbox game platforms. Thus, similar studies using different game platforms, such as mobile platforms, may be conducted in the future.
This study does not claim that the harmful content identified in our findings will directly harm players; instead, based on data analysis, this study emphasizes that harmful content frequently appears in games and that it may potentially have an adverse effect on players. However, more experimental studies, such as studies using questionnaires and interviews, are needed to investigate the direct effects of harmful content.
Overall, interpretable ML models show promising results in explaining the harmful content in video games. The technique captures the underlying process in the ML model and how it constructs predictions. Interpretable ML models are believed to satisfy specific curiosities, aims, hopes, requirements, and needs regarding artificial systems (Langer et al. 2021). This study demonstrates that interpretable ML models can reveal potentially harmful content in video games. Combining global and local explanations presents an accurate picture of the real-world game rating system and offers a simple explanation for human understanding.

Conclusions
The main goal of this study was to examine the potential of an interpretable ML model for explaining the harmful content in video games that may potentially cause adverse effects on players, based on a multi-class game rating classification. The interpretable ML model was applied to the ESRB game rating system. In total, 1,895 games and 500 games with 30 ESRB content descriptors were used to test the model.
The first hypothesis was examined by comparing the performance of ensemble and non-ensemble ML models after hyper-parameter tuning (i.e., randomized search CV and 10-fold CV). The results showed that ensemble models (i.e., RF) outperformed the other models in predicting game rating categories. Therefore, the RF, which achieved an accuracy of 84.20%, a precision of 85.00%, a recall of 84.00%, and an F1 score of 84.00%, was chosen for the interpretability analysis.
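As an illustration of how the four reported metrics relate to one another in a multi-class setting, the following minimal sketch computes accuracy and class-averaged precision, recall, and F1 score from true and predicted ratings. Macro averaging (an unweighted mean over the rating classes) is an assumption made for this sketch, since the averaging scheme is not stated here, and the eight labelled games are hypothetical toy data, not results from the study.

```python
from collections import Counter

RATINGS = ["E", "E10+", "T", "M17+"]


def macro_scores(y_true, y_pred, labels):
    """Return accuracy and macro-averaged precision, recall, and F1."""
    # Count every (true rating, predicted rating) pair; missing pairs count 0.
    pairs = Counter(zip(y_true, y_pred))
    accuracy = sum(pairs[(c, c)] for c in labels) / len(y_true)

    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = pairs[(c, c)]
        predicted = sum(pairs[(t, c)] for t in labels)  # games predicted as c
        actual = sum(pairs[(c, p)] for p in labels)     # games truly rated c
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    n = len(labels)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical toy predictions (assumption: not data from the study).
y_true = ["E", "E", "E10+", "E10+", "T", "T", "M17+", "M17+"]
y_pred = ["E", "E", "E10+", "T", "T", "T", "M17+", "T"]

accuracy, precision, recall, f1 = macro_scores(y_true, y_pred, RATINGS)
```

In this toy case the model over-predicts the T rating, so precision for T falls while recall for E10+ and M17+ falls, which shows why precision and recall can differ from accuracy even when the class sizes are balanced.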
The second hypothesis was examined using the global and local interpretability analysis of the SHAP framework for the RF model. The global interpretability analysis revealed several types of harmful content in the following order of importance: Blood > Fantasy Violence > Strong Language > Blood and Gore. When a video game contains these elements, they drive the model to predict one of the game rating categories (i.e., E, E10+, T, M17+). However, to assess the importance of an individual feature for a specific rating, a local interpretability analysis should be conducted. The local interpretability analysis quantified the essential inflection points in each predictor that drive the model's prediction, finding high Mild Fantasy Violence and low Fantasy Violence and Blood rates for the E game rating. Further, a lower rate of Blood, Blood and Gore, and Strong Language together with a higher rate of Fantasy Violence leads to predicting an E10+ game rating. In comparison, a lower rate of Strong Language and a higher rate of Blood, Suggestive Themes, and Violence drive the model to predict a T rating. For the M17+ rating, Strong Language, Blood and Gore, Blood, and Sexual Themes significantly influence the prediction. Our analysis confirmed that the feasibility of this interpretable ML model was enhanced when the models were coupled with the global and local interpretability analysis. The interpretability analysis revealed several types of harmful content that might potentially relate to adverse effects on players. Such content included depictions of blood, the mutilation of body parts, violent actions of human or non-human characters, and the frequent use of profanity.
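The relationship between local and global explanations can be sketched with exact Shapley values on a toy model. This is not the SHAP library or the RF model of this study: the scoring function `toy_m_score`, its coefficients, the two example games, and the all-absent baseline are hypothetical, and a single baseline is a simplification of the background-data expectation used by SHAP. Local attributions are computed per game, and global importance is taken as the mean absolute attribution across games.

```python
from itertools import combinations
from math import factorial

FEATURES = ["Blood", "Fantasy Violence", "Strong Language", "Blood and Gore"]


def toy_m_score(x):
    """Hypothetical score for an M17+ prediction (not the study's RF model)."""
    blood, fantasy, language, gore = x
    return (0.05 + 0.10 * blood - 0.15 * fantasy
            + 0.30 * language + 0.40 * gore
            + 0.10 * language * gore)  # interaction term


def shapley_values(model, x, baseline):
    """Exact Shapley values of each feature of x relative to a baseline."""
    n = len(x)

    def value(coalition):
        # Features in the coalition take the game's value, others the baseline.
        mixed = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Two hypothetical games encoded as binary descriptor vectors.
games = [[1, 0, 1, 1], [0, 1, 0, 0]]
baseline = [0, 0, 0, 0]  # assumption: a game with no descriptors

# Local explanation: one attribution vector per game.
local = [shapley_values(toy_m_score, g, baseline) for g in games]

# Global explanation: mean absolute attribution per feature.
global_importance = {
    name: sum(abs(phi[i]) for phi in local) / len(local)
    for i, name in enumerate(FEATURES)
}
ranking = sorted(FEATURES, key=global_importance.get, reverse=True)
```

The attributions for each game sum to the difference between its score and the baseline score, and the sign of each local value indicates whether a descriptor pushes the score up or down. The ranking produced by this toy model reflects only its invented coefficients and should not be compared with the ordering reported above.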
In general, the results showed that the interpretable ML model could successfully identify several types of harmful content that may cause adverse effects on players of video games. These findings demonstrate the strength of this technique in explaining the harmful content in video games. Also, interpretable ML models can provide new insights for stakeholders (e.g., domain experts, parents, teachers, game developers, and policy-makers) and forge a better integration of video game research and applications. The SHAP framework offers valuable insights for explaining the results from an advanced algorithm, such as RF. The technique can evaluate the importance of a feature and track and elucidate the complex and detailed impacts on the model's output. In particular, the different effects of various types of content on the game rating prediction provided essential information that cannot be obtained by the game rating systems themselves. Finally, the knowledge gained from this study can help several stakeholders, such as those evaluating policy decisions, in developing effective regulation to mitigate the adverse effects of video games in real life, understanding the existing knowledge regarding the harmful content in video games, and creating optimal gaming profiles for specific groups of users.
In future work, we would like to apply the same method to the positive side of video games, explaining the positive content that may have constructive effects on players based on game rating predictions. The study will also be extended to other video game platforms, such as PC and mobile games. In this way, we could help build a broader understanding of the positive effects of video games on groups of users.