Combining Clinical Symptoms and Patient Features for Malaria Diagnosis: Machine Learning Approach

ABSTRACT Presumptive treatment and self-medication for malaria are common in limited-resource countries. However, these approaches are considered unreliable because they lead to unnecessary use of malaria medication. This study aims to demonstrate the use of supervised machine learning models to diagnose malaria from patient symptoms and demographic features. A malaria diagnosis dataset of 2556 patients’ records with 36 features was extracted from two regions of Tanzania: Morogoro and Kilimanjaro. Important features were selected to improve model performance and reduce processing time, and machine learning classifiers were trained and validated with the k-fold cross-validation method. The ranking of features differed between the regions and in the combined dataset. The most significant features selected were residence area, fever, age, general body malaise, visit date, and headache. Random Forest was the best classifier, with an accuracy of 95% in Kilimanjaro, 87% in Morogoro and 82% in the combined dataset. Based on clinical symptoms and demographic features, a region-specific malaria predictive model was developed to demonstrate relevant machine learning classifiers. Important features are useful in making the disease prediction.


Introduction
Machine learning (ML) is an emerging approach that has been shown to be effective in making decisions and predictions from the large quantity of data produced by the healthcare industry. It learns from experience and detects valuable patterns in large, unstructured, and complex datasets to predict future incidences. Today, the biggest challenge facing the healthcare industry is diagnosing disease accurately and at affordable cost. Hospitals hold a massive amount of complex data that can be mined to extract useful information for diagnosis and future prediction. The healthcare field generates data about clinical assessment, patient records, disease treatment, clinical follow-ups, and medication (Fatima and Pasha 2017; Iyer, S, and Sumbaly 2015). These data can improve healthcare delivery when combined with machine learning techniques.
Accurate prediction of clinical outcomes is essential to successful decision-making and can lead to better patient care and disease management. For example, in malaria management, accurately predicting which patients should be prescribed malaria medication and which should undergo further checkups may prevent unnecessary use of malaria drugs (Menard and Dondorp 2017; Mwai et al. 2009). In addition, a lack of proper diagnosis might result in mismanagement of other diseases whose symptoms resemble those of malaria. The common practice of self-medication with malaria drugs and the challenges facing health systems in most low-income countries, such as Tanzania, necessitate a machine learning-based diagnosis model. Such a model can also assist in correctly diagnosing malaria in patients who cannot get a laboratory-based diagnosis.
The use of ML for malaria diagnosis is not necessarily the ideal solution. A better solution would be to provide rapid malaria diagnostic tests at pharmacies so that only malaria patients, or those with an anti-malarial prescription, are given anti-malarial drugs. However, rapid tests would be costly for pharmacies and require administration by trained pharmacists or personnel, who may not be available in rural or remote areas. A cheap but effective tool for determining possible malarial status is therefore needed, and an ML-based diagnostic tool could be one such tool. Different studies have shown how machine learning has assisted other areas of the healthcare system (Artificial Intelligence in Healthcare: Past, Present and Future, 2017; Davenport and Kalakota 2019; Khare et al. 2017; Shailaja, Seetharamulu, and Jabbar 2018; Sidey-Gibbons and Sidey-Gibbons 2019; Triantafyllidis and Tsanas 2019). Recently, supervised learning algorithms have been applied in various studies to diagnose malaria (Fuhad et al. 2020a; Madhu 2020; Masud et al. 2020; Muthumbi et al. 2019; Poostchi et al. 2018; Yang et al. 2019). Despite the successful application of machine learning in disease management, most of these applications focus on microscopic image analysis to detect malaria, neglecting the fact that most health facilities do not have a microscope and that patients treat themselves. Noninvasive methods such as machine learning are reliable and efficient for classifying healthy people and people with malaria. While several studies (Bibin, Nair, and Punitha 2017; Das et al. 2013; Femi Aminu, Onyebuchi Ogbonnia, and Shehi Shehu 2016; Fuhad et al. 2020b; Madhu 2020; Masud et al. 2020; Patil, Yaligar, and Meena 2018; Pillay et al. 2019; Rajaraman et al. 2018; Rajaraman, Jaeger, and Antani 2019; Shekalaghe et al. 2013; van Driel 2020) have suggested that using clinical symptoms to predict malaria is not a practical idea, the experiments performed in this study demonstrate the feasibility of using clinical symptoms and patients' demographic information to predict malaria with machine learning classifiers.

Related Work
Malaria shares similar symptoms with other febrile diseases such as dengue fever, typhoid fever, common cold, respiratory tract infection, dyspepsia, and pneumonia (Abba et al. 2011; Crump et al. 2017; Nadjm et al. 2010). Parasitological tests, in the form of microscopic and rapid diagnostic tests (RDT), are the recommended and standard tools for diagnosing malaria (WHO 2019, 2020, 2021). However, in areas where parasitological tests for malaria are not readily available, the complexity of malaria diagnosis may lead to misdiagnosis, overdiagnosis, and inappropriate presumptive treatment (D'Acremont et al. 2009; Gosling et al. 2008; Graz et al. 2011; Isiguzo et al. 2014; UM 2016). As specified by WHO, in situations such as rural areas where no parasitological test is available within 2 hours of presenting for treatment at a medical center, medical doctors can reach a diagnosis using a clinical and physical examination to treat suspected patients (WHO 2015, 2021; WHO-Guidelines 2015). Consequently, suspected patients are presumptively treated. Clinical diagnosis of malaria is traditional among medical doctors; it is the least expensive and most widely practised method. A clinical diagnosis, on which presumptive treatment rests, is based on the patient's signs and symptoms and the physical findings at examination. The earliest symptoms of malaria are very nonspecific and include fever, headache, body weakness, chills, dizziness, abdominal pain, diarrhea, nausea, vomiting, anorexia, and pruritus. With clinical diagnosis, misdiagnosis is possible due to a lack of sufficient knowledge about significant malaria symptoms (other than shivering, fever, and sweating) and about non-malaria-related factors (Bria, Yeh, and Bedingfield 2021). Presumptive treatment could increase the unnecessary use of anti-malarial drugs, which have side effects, and could increase the spread of resistance to the drugs (Attinsounon et al. 2019; Chipwaza et al. 2014; Debora and Moses 2017; Hertz et al. 2019; Kazaura 2017; Mwita et al. 2019).
Apart from that, there is a strong tendency toward self-treatment with over-the-counter medication when malaria-related symptoms are observed. Studies done in Tanzania observed that drug-dispensing shops still frequently sell medications without a prescription, although it is advised that anti-malarial medications be administered only after parasitological confirmation of the disease and that prescription-only treatments be dispensed accordingly (Michael and Mkunde 2017; Ndomondo-Sigonda et al. 2004). This could lead to disease mismanagement, drug resistance, and drug shortage (Grobusch and Schlagenhauf 2019; Mboera, Makundi, and Kitua 2007; Metta et al. 2014; Mwai et al. 2009; Wang et al. 2019). In an effort to eliminate these issues, the government of Tanzania has established a "not every fever is Malaria" campaign, which aims to educate people that not every fever episode is a malaria case (Baltzell et al. 2019), since other diseases such as typhoid, dengue, chikungunya, and urinary tract infections present the same symptoms as malaria (Goodyer 2015). The significance of these issues was a substantial drive to develop a malaria prediction model using patients' symptoms and demographic information. Machine learning techniques have been used as tools for predicting the risk of diseases such as heart disease, diabetes, brain stroke, liver disease, thyroid disease, and brain cancer (DB, P, and N 2018; Habib et al. n.d.; Kim, Choo, and Chang 2021; MS, E, and J 2020; Priyadarshini, Dash, and Mishra 2014; Rao and Renuka 2020). In malaria diagnosis, machine learning applications range from diagnostic tools to the prediction of disease presence using patient symptoms and signs. Over the past decade, malaria research has been done in the areas of diagnostic testing (RDT) and microscopy, specifically the automation of these tools (Brown et al. 2020; Dharap and Raimbault 2020; Ford et al. 2020; Ravalji, Shah, and Nai 2020; Shekalaghe et al. 2013). These studies showed how machine learning can assist in reading microscopic blood smear images to diagnose malaria and in automating the complete blood count, the test that screens for infection in the blood. The performance of machine learning in the automation of these tools has improved, and classifier prediction accuracy has shown potential (Fuhad et al. 2020b; Lee, Choi, and Shin 2021; Masud et al. 2020; van Driel 2020). Despite the promising results of these studies (Bibin, Nair, and Punitha 2017; Das et al. 2013; Liang et al. 2017; Madhu 2020; Masud et al. 2020; Muthumbi et al. 2019; Poostchi et al. 2018; Rajaraman et al. 2018; Rajaraman, Jaeger, and Antani 2019), the unavailability of microscopes and mRDT in some health facilities in constrained areas and the self-medication behavior of some patients remain the major challenges.
On the other hand, several machine learning studies have used malaria symptoms, signs, and patient information to diagnose malaria. For example, the study by Bria, Yeh, and Bedingfield (2021) used malaria symptoms and non-symptom factors to diagnose malaria. It showed potentially good prediction accuracy when the significant features were identified and combined. However, these studies do not specifically identify the significant or important symptoms, notwithstanding their contribution to improving malaria diagnosis. Furthermore, other studies that used malaria symptoms to diagnose malaria relied on data mining techniques such as rule-based classification, which are considered weak classifiers (A., n.d.; Bbosa, Wesonga, and Jehopio 2016; Oguntimilehin n.d.).
In Tanzania, most of the studies have been done in diagnostic testing (RDT and microscopy) (Mpapalika and Matowo 2020; Mwanga et al. 2019a, 2019b). A malaria diagnosis study using symptoms and patients' demographic features has never been done in Tanzania. This study aims to fill this important gap in malaria research in Tanzania, since the country has settings where diagnostic tools are unavailable and self-treatment is very common.
The findings of this study can be used to raise public awareness of the potential of machine learning for classifying malaria patients, by developing a simple tool to be used before anti-malarial drugs are administered. In addition, the study can raise awareness of significant malaria symptoms and patient features for early diagnosis within Tanzanian communities vulnerable to malaria, and can help reduce the rate of self-medication and presumptive treatment in the country.

Theoretical Background
This study uses the most common supervised machine learning classifiers to build a malaria diagnosis model. Logistic Regression (LR) is a robust and well-established method for supervised classification. It can be considered an extension of ordinary regression; in its basic form it models a dichotomous variable that usually represents the occurrence or nonoccurrence of an event. The algorithm estimates the probability that a new instance belongs to a particular class, so its output lies between 0 and 1 (Swaminathan et al. 2017; Ullah et al. 2019).
K-Nearest Neighbor (KNN) is a simple instance-based classifier that assigns a new data point to the class most common among its k closest training points. The algorithm operates on a set of d-dimensional vectors, D = {x_i | i = 1, ..., N}, where x_i ∈ R^d denotes the ith data point, each with a known class label. To classify a new point, it computes the distance (for example, the Euclidean distance) from that point to every training point, picks the k nearest neighbors, and predicts the majority class among them (Krishnani et al. 2019; Patil, Yaligar, and Meena 2018).
Support Vector Machine (SVM) algorithm can classify both linear and nonlinear data. It first maps each data item into an n-dimensional feature space, where n is the number of features. It then identifies the hyperplane that separates the data items into two classes while maximizing the marginal distance for both classes and minimizing the classification errors (Krishnani et al. 2019; Moreno-Ibarra et al. 2021; Ullah et al. 2019).
Decision Tree (DT) is one of the earliest and most prominent machine learning algorithms. A decision tree models the decision logic, i.e., the tests and corresponding outcomes for classifying data items, as a tree-like structure. The nodes of a DT are typically arranged in multiple levels, where the first or top-most node is called the root node. All internal nodes (i.e., nodes having at least one child) represent tests on input variables or attributes. Depending on the test outcome, the classification algorithm branches toward the appropriate child node, where the process of testing and branching repeats until it reaches a leaf node. The leaf or terminal nodes correspond to the decision outcomes. DTs are easy to interpret, quick to learn, and a common component of many medical diagnostic protocols. When traversing the tree to classify a sample, the outcomes of the tests at each node along the path provide sufficient information to infer its class (Krishnani et al. 2019; Saranya and Pravin 2020; Swaminathan et al. 2017).
Random Forest (RF) is an ensemble classifier consisting of many DTs, similar to how a forest is a collection of many trees. DTs grown very deep often overfit the training data, so a slight change in the input data can cause a large variation in the classification outcome. Such trees are highly sensitive to their training data, which makes them error-prone on the test dataset. RF reduces this variance by training each tree on a bootstrap sample of the data with a random subset of features considered at each split, and by combining the predictions of all trees through majority voting (Azar et al. 2014; Chen, Liu, and Peng 2019; Iyer, S, and Sumbaly 2015).
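As an illustration of how the five classifier families described above could be set up in Scikit-learn, the following is a minimal sketch rather than the configuration used in this study. The hyperparameters (for example `n_neighbors=5`, the RBF kernel, 100 trees), the scaling step, and the helper name `build_classifiers` are assumptions made for the example, and the data are synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_classifiers(seed=0):
    """Return one estimator per classifier family, keyed by abbreviation.

    Hyperparameters are illustrative assumptions, not the study's settings.
    """
    return {
        # Distance- and margin-based models get a scaling step first.
        "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
        "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf", random_state=seed)),
        "DT": DecisionTreeClassifier(random_state=seed),
        "RF": RandomForestClassifier(n_estimators=100, random_state=seed),
    }


# Synthetic stand-in for patient records (NOT the study data).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

classifiers = build_classifiers()
for name, clf in classifiers.items():
    clf.fit(X, y)
    # Training accuracy only; a proper evaluation needs held-out data.
    print(name, round(clf.score(X, y), 3))
```

The training accuracy printed here is only a sanity check that each model fits; it says nothing about generalization.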

Materials and Methods
This paper aims to develop a machine learning-based model that classifies patients as having or not having malaria, using their symptoms and non-symptom factors. Development of the model was structured in five stages: (1) dataset description and preprocessing, (2) feature selection, (3) machine learning classifiers, (4) cross-validation methods and (5) classifier performance evaluation.

Study Area
Data were collected from four health facilities in two regions of Tanzania: Morogoro and Kilimanjaro (Figure 3). The four facilities are Mawenzi regional hospital and Majengo health center in the Kilimanjaro region, and Morogoro regional hospital and Mzumbe health center in the Morogoro region. The dataset represents patients who live in areas with low malaria transmission (the Kilimanjaro region) and those who live in areas with high malaria transmission (the Morogoro region). The regions were chosen based on malaria prevalence: Morogoro represents regions with high prevalence (15.0%), and Kilimanjaro represents regions with low prevalence (1.0%).

The Method Used and Participants
A malaria patient records extraction form was designed to summarize the MoH patient file and the information collected when the patient visited the health facility. The records were retrieved from the files of patients who had been treated for malaria from 2015 to 2019. The aim was to identify the past state of clinical malaria diagnosis in the local health facilities and to understand the standard practice in malaria diagnosis and treatment. The critical information collected was: (i) the patient's demographic information, (ii) the symptoms presented by the patient when consulting a doctor, (iii) the tests taken and their results, (iv) the diagnosis based on the laboratory results and (v) the treatment provided. Training nurses administered data collection, and all participants provided written informed consent.

Ethical Clearance
The study was approved by the National Institute for Medical Research Tanzania (NIMR) before the participants were recruited and records were collected. All participants provided written informed consent to participate in the study. For patient records, consent was given by the health facilities with guidance from NIMR.

Dataset Descriptions and Preprocessing
The malaria diagnosis dataset was used in this study to develop a machine learning model for malaria diagnosis. The dataset was obtained by extracting malaria patients' diagnosis records from the Tanzania Ministry of Health's patient files in two regions of Tanzania: Morogoro and Kilimanjaro.
The original malaria diagnosis dataset has a sample size of 2556 patients' records with 36 features. The target output variable has two classes, representing patients with malaria (tested positive) and those with no malaria (tested negative). Instances that could lead to individual patients being located or identified were removed to maintain patient confidentiality and ethical practice. Missing values were also deleted from the dataset. Nominal features were encoded numerically to conform to Scikit-learn, and the target was coded 1 for patients with malaria and 0 for patients with no malaria (healthy people).
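The preprocessing steps described above can be sketched with pandas as follows. This is an illustrative sketch under stated assumptions: the column names (`patient_name`, `sex`, `fever`, `age`, `malaria_test`), the toy records, the one-hot encoding choice, and the helper `preprocess` are all hypothetical and do not come from the study dataset.

```python
import pandas as pd

# Hypothetical miniature of the records; names and values are illustrative only.
records = pd.DataFrame({
    "patient_name": ["A", "B", "C", "D"],
    "sex": ["F", "M", "F", None],          # last record has a missing value
    "fever": ["yes", "no", "yes", "yes"],
    "age": [23, 41, 5, 30],
    "malaria_test": ["positive", "negative", "positive", "negative"],
})


def preprocess(df, id_columns, target):
    """Drop identifying columns and incomplete rows, then encode nominal features."""
    df = df.drop(columns=id_columns).dropna()
    # Target coding follows the text: 1 = malaria, 0 = no malaria.
    y = (df[target] == "positive").astype(int)
    # One-hot encode nominal columns so Scikit-learn receives numeric input
    # (the encoding scheme itself is an assumption of this sketch).
    X = pd.get_dummies(df.drop(columns=[target]), drop_first=True, dtype=int)
    return X, y


X, y = preprocess(records, id_columns=["patient_name"], target="malaria_test")
print(X)
print(y.tolist())
```

Dropping whole rows with a missing value is the simplest reading of the deletion step; imputation would be an alternative when too many records would otherwise be lost.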

Feature Selection
Three sets of features were generated from the malaria diagnosis dataset. The first feature set was derived by applying feature selection to a dataset of only Kilimanjaro (low endemic area) patients, the second to a dataset of only Morogoro (high endemic area) patients, and the last to a dataset of both Morogoro and Kilimanjaro (combined areas) patients. A model-based feature selection method, which uses a supervised machine learning algorithm to judge the importance of each feature in the dataset, was used to select the essential features. Feature selection is a vital step in machine learning because including irrelevant features degrades the classification performance of the model. Model-based feature selection has two approaches for selecting the most significant features: feature importance and selection from the model (Brodersen et al. 2011). The Random Forest algorithm was used to determine important features from the malaria diagnosis dataset. This algorithm uses tree-based strategies that naturally rank features by how much they improve the purity of a node. Nodes with the largest decrease in impurity occur at the start of the trees, while nodes with the smallest decrease in impurity occur at the end of the trees. Thus, a subset of the most important features was created by pruning trees below a particular node.
In all three datasets, the feature selection algorithm identified a large set of important features (up to 20 features). However, to minimize the complexity of the model, only the top 10 significant variables were selected for the regional datasets, and only 15 features were selected for the whole malaria diagnosis dataset. These feature sets were then used for model development. The ranking of the features differed among datasets: some features that were among the most significant in one region were less important in the other, as shown in Table 1. In addition, some features were specific to a particular region; for example, Joint Pain and Dizziness were significant only in the Kilimanjaro region, and Muscle Pain and Confusion were important only in the Morogoro region. In the combined malaria diagnosis dataset, the most important features are residence area of the patient, fever, age of the patient, general body malaise, visit date, headache, abdominal pain, backache, chest pain, sex of the patient, vomiting, confusion, dizziness, coughing and joint pain.
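A ranking of this kind can be sketched with the impurity-based importances that Scikit-learn's `RandomForestClassifier` exposes through `feature_importances_`. The sketch below is a simplified stand-in, not the exact procedure of this study: the data are synthetic, the feature names are placeholders, and the helper `top_k_features` and the choice of 200 trees are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def top_k_features(X, y, feature_names, k=10, seed=0):
    """Rank features by mean decrease in impurity and return the k best."""
    forest = RandomForestClassifier(n_estimators=200, random_state=seed)
    forest.fit(X, y)
    importances = forest.feature_importances_
    order = np.argsort(importances)[::-1]  # most important first
    return [(feature_names[i], float(importances[i])) for i in order[:k]]


# Synthetic stand-in with 35 candidate features (NOT the study data).
X, y = make_classification(
    n_samples=500, n_features=35, n_informative=8, random_state=0
)
names = [f"feature_{i}" for i in range(X.shape[1])]

ranking = top_k_features(X, y, names, k=10)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Running the same function separately on each regional subset and on the combined data would yield one ranking per dataset, which is how region-specific differences in feature order could be compared.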

Prediction Classifiers
After the dataset had been described and preprocessed, features were selected using a machine learning algorithm, and the importance of every feature for predicting the target variable was computed. Then, machine learning classification algorithms were used to classify the patients as having or not having malaria. Popular classifiers for disease diagnosis, namely Logistic Regression (LR), K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Decision Tree (DT) and Random Forest (RF), were used in model development. Finally, the classifiers' performance on malaria diagnosis, with and without feature selection, was computed and compared to obtain the best performing model.

Machine Learning Classifiers Validation
The study used the repeated k-fold cross-validation (CV) method and four performance evaluation metrics. In repeated k-fold cross-validation, the dataset is divided into k parts of equal size. In each step, k - 1 parts are used to train the classifiers, and the remaining part is used to check their performance. This validation step is repeated k times, and the classifier performance is computed from the k results. The whole procedure is then repeated a number of times to obtain more reliable results. Different values of k can be selected for CV. In this experiment, k = 10 was used because of its good performance and because it is recommended in much of the literature. In the 10-fold CV process, 70% of data were used for training, and 30% were used for testing purposes. The process was repeated ten times for each fold. All training and test instances were randomly redistributed over the whole dataset before new sets were selected and tested in the next cycle. At the end of the 10-fold process, averages of all performance metrics were computed.
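The repeated k-fold procedure described above could be expressed with Scikit-learn's `RepeatedKFold` and `cross_validate`, as in the following minimal sketch. The number of repeats (`n_repeats=3`), the Random Forest settings, the metric list, and the synthetic data are assumptions chosen for the example; they are not taken from the study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RepeatedKFold, cross_validate

# Synthetic stand-in for the patient records (NOT the study data).
X, y = make_classification(n_samples=400, n_features=10, random_state=0)

# k = 10 folds, reshuffled and repeated 3 times -> 30 train/test splits.
cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=0)
scoring = ["accuracy", "precision", "recall", "f1", "roc_auc"]

results = cross_validate(
    RandomForestClassifier(n_estimators=50, random_state=0),
    X,
    y,
    cv=cv,
    scoring=scoring,
)

# Average each metric over all splits, as done at the end of the CV process.
mean_scores = {m: float(results[f"test_{m}"].mean()) for m in scoring}
for metric, value in mean_scores.items():
    print(f"{metric}: {value:.3f}")
```

Fixing `random_state` makes the partitions reproducible, so different classifiers can be compared on identical splits.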

Machine Learning Model Performance Evaluation
Various performance evaluation metrics were used in this study to check the performance of the classifiers. First, a confusion matrix was used, in which every observation in the testing set falls in exactly one cell (Table 2). A 2 × 2 matrix was used because there were two classes: malaria positive (1) and malaria negative (0). The matrix gives two types of correct prediction and two types of incorrect prediction. In addition, a classification report was computed to obtain the classification accuracy, precision, recall and F1 score of the classifiers. The four cells of the confusion matrix are interpreted as follows. True positive (TP): a subject with malaria is correctly classified as having malaria. True negative (TN): a subject without malaria is correctly classified as healthy. False positive (FP): a subject without malaria is incorrectly classified as having malaria (a type 1 error). False negative (FN): a subject with malaria is incorrectly classified as healthy (a type 2 error).
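The metrics reported in this study follow directly from the four confusion-matrix counts. The short sketch below applies the standard definitions; the function name `diagnostic_metrics` and the example counts are hypothetical and are not results from this study.

```python
def diagnostic_metrics(tp, tn, fp, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # recall: share of malaria cases detected
    specificity = tn / (tn + fp)   # share of non-malaria cases correctly cleared
    precision = tp / (tp + fp)     # share of positive predictions that are correct
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts for illustration only.
metrics = diagnostic_metrics(tp=80, tn=70, fp=30, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these example counts the sensitivity is 0.800 and the specificity is 0.700, which shows how a classifier can detect most malaria cases while still raising a noticeable number of false alarms.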

Classifiers Performance on Full Features with K-Fold Cross-Validation
In this experiment, the five machine learning classifiers were evaluated with 10-fold cross-validation on the full 35 features of the complete malaria diagnosis dataset, as described in Table 3. Different parameter values were passed to the classifiers, and the mean over the 10 folds was computed.
In this experiment with full features on the full malaria diagnosis dataset, the Random Forest classifier showed the best overall performance among the classifiers, with a classification accuracy of 79%, AUC of 80%, sensitivity of 82%, specificity of 69%, precision of 71% and recall of 76%, as shown in Table 3. The specificity of 69% is the probability that the classifier returns a negative result for a person who does not have malaria. The Decision Tree classifier performed well, with a sensitivity of 85%, precision of 77% and recall of 76%. The K-Nearest Neighbor classifier underperformed on specificity (49%) and AUC (69%) but scored a sensitivity of 78%, precision of 71% and accuracy of 72%. The Support Vector Machine achieved an accuracy of 73%, specificity of 61%, precision of 71%, AUC of 74% and sensitivity of 74%. The Logistic Regression classifier achieved an accuracy of 75%, specificity of 57%, precision of 74%, AUC of 76% and sensitivity of 77%. The performance comparison on AUC, specificity and sensitivity among the classifiers is shown in Figure 1.

Results of Classifiers Performance on Selected Important Features with K-Fold Cross-Validation (n = 10)
The model was developed considering only the 10 important features selected during the feature engineering process. In this experiment, the models generally performed better on all metrics than when the full features were used (Table 4). For accuracy and AUC, the Random Forest classifier had the best performance, with an accuracy of 82% and AUC of 83%, followed by the Logistic Regression classifier with an accuracy of 76% and AUC of 78%. The Random Forest and Decision Tree classifiers had the best precision, at 81% and 76%, respectively. A precision of 81% means that 81% of the patients predicted to have malaria actually had malaria. For sensitivity, which reflects how rarely a sick patient is classified as a healthy person, Decision Tree performed well with 85%, followed by Random Forest with 84%. In this dataset, Random Forest had an F1 score of 81%. Support Vector Machine had the best specificity, at 74%, while the KNN classifier performed worst in all aspects, with 72% accuracy, 70% AUC, 80% sensitivity, 60% specificity and 71% precision. It was also established that the Logistic Regression classifier's accuracy and AUC dropped after selecting the important features. The average accuracy and AUC dropped from 76% to 75% to 75% and 73%, respectively, as shown in Figure 2. This signifies that the dropped features dominated the predictive capacity of this classifier.

Results of Classifiers Performance on Selected 10 Important Features on Regional Datasets
The 10 selected important features from every regional dataset were evaluated on the five machine learning classifiers with 10-fold cross-validation, and the average metrics are presented in Figure 3. The classifiers were trained and tested in phases with different numbers of features to find the feature set that gives the best performance. First, the classifiers were trained and tested on the three most important features. Then the next three important features were added, and finally the last four important features were added. The classifiers performed best with all ten important features. Results for classification accuracy, AUC, specificity, sensitivity, precision and F1 score are shown on separate graphs for clearer presentation. These performance metrics were computed automatically. In both regional experiments, the Random Forest classifier showed outstanding performance, with 95% and 87% classification accuracy, 96% and 85% sensitivity, 92% and 78% specificity, 92% and 80% precision, and 97% and 86% AUC for Kilimanjaro and Morogoro, respectively. This classifier outperformed all the other classifiers on all performance metrics. The Decision Tree classifier performed second best after Random Forest, and its performance on the Kilimanjaro dataset was better than on the Morogoro dataset. While the classifier achieved 92% classification accuracy, 91% sensitivity, 80% specificity and 80% precision on the Kilimanjaro dataset, its specificity and precision were poor on the Morogoro dataset, at 67% and 68%.
For the Logistic Regression classifier, the classification accuracy, AUC and sensitivity were good: 81%, 82% and 85%, respectively, for the Kilimanjaro dataset and 76%, 77% and 74% for the Morogoro dataset. On the other hand, the classifier had unsatisfactory specificity and precision: 65% and 65% on the Kilimanjaro dataset and 68% and 67% on the Morogoro dataset. KNN performed well on the same metrics as Logistic Regression in all the datasets. Unlike the Logistic Regression and KNN classifiers, the Support Vector Machine classifier performed fairly well on all metrics for all the datasets, as shown in Table 5. The main aim of these experiments is to create a machine learning model that can correctly distinguish patients with malaria from healthy patients based on the symptoms presented and some of the patient's demographic information. When classification accuracy was compared between the regional datasets, Random Forest was found to be the best classifier, with 95% accuracy for the Kilimanjaro dataset and 87% accuracy for the Morogoro dataset, as shown in Figure 4.
The sensitivity score of the classifiers in each dataset is shown in Figure 5. The Random Forest and Decision Tree classifiers showed equally high sensitivity of 96% in Kilimanjaro. For Morogoro, the Random Forest classifier showed a good sensitivity of 85%. The experiment also computed the harmonic mean of precision and recall (F1 score), which balances the two in a single measure of how precisely and completely the classifier identifies malaria cases. As shown in Figure 6, the Support Vector Machine classifier achieved an F1 score of 81% on the Morogoro dataset, and the Random Forest classifier achieved an F1 score of 95% on the Kilimanjaro dataset; the ROC plot is shown in Figure 7. The summary of the best performance metrics and the best classifiers is presented in Table 6.
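The phased evaluation described in this section (the three most important features, then six, then all ten) could be sketched as follows. This is an illustration under stated assumptions: the data are synthetic, Random Forest is used both for the ranking and as the evaluated classifier, and plain 10-fold accuracy is the only metric. Ranking the features on the full data before cross-validating is a simplification that can make the scores slightly optimistic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in with 10 candidate features (NOT the study data).
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, random_state=0
)

# Rank features once by impurity-based importance, most important first.
ranker = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
order = np.argsort(ranker.feature_importances_)[::-1]

# Evaluate the classifier on growing feature subsets: top 3, top 6, top 10.
phase_accuracy = {}
for k in (3, 6, 10):
    columns = order[:k]
    scores = cross_val_score(
        RandomForestClassifier(n_estimators=100, random_state=0),
        X[:, columns],
        y,
        cv=10,
        scoring="accuracy",
    )
    phase_accuracy[k] = float(scores.mean())

for k, acc in phase_accuracy.items():
    print(f"top {k} features: mean accuracy = {acc:.3f}")
```

Comparing the three mean accuracies indicates whether the later, lower-ranked features still contribute to performance.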

Discussion
This study demonstrates the success of using supervised learning models in diagnosing malaria using patient symptoms and demographic features. Overall, however, the ranking of the features differed among the regional datasets because of geographical location, which influences the rate of disease transmission. These findings are aligned with studies (Chandramohan et al. 2001; Ngasala et al. 2008; Nkumama, O'Meara, and Osier 2017; UM 2016) indicating that malaria transmission depends on climatic conditions that may affect the number and survival of mosquitoes, such as rainfall patterns, temperature, and humidity. Coughing and joint pain were significant for malaria diagnosis in Morogoro but had zero significance in Kilimanjaro, while dizziness and confusion were important for diagnosis in Kilimanjaro and had no importance in Morogoro. A previous study conducted in Morogoro indicated that the community perceives coughing and joint pain as symptoms of malaria (Mariki, Mduma, and Mkoba 2021).
It was also observed that the month in which a patient visits the health facility with malaria-related symptoms is significant in malaria diagnosis. The significant months fall either during the rainy season or just after it. This aligns with the WHO guideline on malaria transmission behavior.
Six well-known machine learning classifiers, namely Logistic Regression (LR), K-Nearest Neighbor (KNN), Naïve Bayes (NB), Support Vector Machine (SVM), Decision Tree (DT), and Random Forest (RF), were used together with RF as the feature selection classifier. In the regional and combined datasets, Random Forest showed good overall performance compared to the other classifiers, with an accuracy of 79%, AUC of 80%, sensitivity of 82%, specificity of 69%, precision of 71% and recall of 76%. Furthermore, all models performed well with the selected important features, except Logistic Regression, which recorded lower AUC and accuracy. In addition, in both regions, the Random Forest classifier showed strong performance in predicting malaria. Although the random forest algorithm is considered a black box because the information is hidden inside the model structure, this study adopted it as a feature selection algorithm due to its robustness, execution speed, and intensive searching procedure. Similar findings were described in studies conducted in Senegal and Burkina Faso, which indicate that random forest is a promising classifier with high accuracy in predicting malaria using clinical symptoms (Harvey et al. 2021;Yadav et al. 2021).
In a clinical setting, our study demonstrates that clinicians can use the model to detect new malaria cases provided that patients' symptoms and demographic features are available. This aligns with the guidelines described by both WHO (WHO 2015, 2021) and Tanzania Mainland's malaria treatment guideline (Michael and Mkunde 2017), which recommend considering symptoms and demographics such as age, fever, location, headache, joint pains, malaise, vomiting, diarrhea, body ache, body weakness, poor appetite, pallor, and enlarged spleen as diagnostic criteria for malaria.
The results of this study, however, are subject to certain limitations. First, our sample is restricted to records extracted from patients' files in the selected health facilities. More studies need to be conducted for patients in different regions and health facilities. An additional potential limitation is that the developed models were based on data obtained from four health facilities in two regions. Therefore, we cannot generalize our results to the entire population of the country.
While several studies have shown that using clinical symptoms alone to predict malaria is not practical, the strength of this study is that it uses both clinical symptoms and patients' demographic information to predict malaria with machine learning classifiers. Another strength is that the study dataset represents patients who live in both low-endemic and high-endemic areas. In addition, our dataset included medical records from patients' files and surveys of patients visiting the health facility.

Conclusion
This study developed a region-specific malaria predictive model for malaria diagnosis based on clinical symptoms and demographic data. The model can support the creation of a clinically based diagnosis system for malaria. Furthermore, this study demonstrates that using the right machine learning classifiers and the important features for each dataset is useful in predicting the disease. Overall, Random Forest showed outstanding performance in distinguishing malaria patients from healthy ones in both the low-endemic and high-endemic areas. Our results are a necessary first step toward designing a decision support system based on the proposed model, which will be more suitable for people who cannot access laboratory-based diagnostic tools or a health facility before any treatment. Therefore, we recommend that future studies include more regions and enlarge the dataset to improve the model's performance and inclusivity.

Disclosure Statement
No potential conflict of interest was reported by the author(s).