A lexicon-based method for detecting eye diseases on microblogs

ABSTRACT This paper explored the feasibility of detecting eye diseases on microblogs. A lexicon-based approach was developed to provide early recognition of common eye diseases from social media platforms. The data were obtained using Twitter's free streaming Application Programming Interface (API). A cluster analysis was applied to extract instances that share similar characteristics. We extracted three types of emotions (positive, negative, and neutral) from users’ messages (tweets) using SentiStrength. A time-series method was used to determine the applicability of predicting emotional changes over a period of seven months. The relevant disease symptoms were extracted using the Apriori algorithm, and the best-performing classifier reached a prediction accuracy of 98.89%. This study offers a timely and effective method that can be implemented to help healthcare decision makers and researchers reduce the spread of eye diseases in a population-specific manner.


Introduction
Social media mining has been widely applied in many sectors, such as commerce, healthcare, and education, to help decision makers understand how to transform data into knowledge for decision support (Sarsam et al. 2020a, 2021a). The use of social media sites (e.g., Twitter and Facebook) has been found to offer an effective way of studying users' emotional experiences. Such experiences have been found to be important for detecting various events. Health-related topics published on social media sites have been extensively regarded as a rich resource for researchers seeking to understand health changes in a population over time (Ru and Yao 2019). An example would be glaucoma, which is one of the leading causes of blindness globally (Ban, Siegfried, and Apte 2018). Common eye diseases such as cataract and glaucoma are two conditions that affect the neuroretinal rim of the nerve (Maetschke et al. 2019). However, the association between eye disease and users' emotions has not been fully studied. One reason for this is the difficulty of mapping the structural and functional abnormalities when a person experiences a range of eye conditions (Dervisevic et al. 2016). This has motivated the need for developing mechanisms capable of detecting various eye diseases. For example, Mehta et al. (2021) developed a multimodal model to automate glaucoma detection. The authors trained a multimodal model with multiple deep neural nets (trained on macular optical coherence tomography volumes and color fundus photographs) based on demographic and clinical data. The detection results showed an accuracy of 97%. Kim, Cho, and Oh (2017) introduced a machine learning approach capable of diagnosing glaucoma by using several feature selection techniques. The authors found that random forest achieved the highest performance (98%) for the detection of glaucoma.
Another study by Ahn et al. (2018) used a deep learning technique to diagnose glaucoma using 1542 images. These images were used to build a simple logistic classification model and a convolutional neural network, which achieved a performance of 99%. According to Stefan et al. (2020), different machine learning, deep learning, and transfer learning techniques are likely to offer different solutions for retinal image analysis. Despite these studies, textual information shared on social media sites has seen limited use in the diagnosis of glaucoma. This study examined the potential of using time series and association rule mining in the detection of common eye diseases from social media texts. We used Twitter to extract and analyze users' sentiment features in an attempt to categorize disease-related emotions.

Literature Review
Online sentiments in the form of tweets or messages can provide valuable information about users' state. The changes in users' sentiments and emotions over time have been used by many studies to investigate various emergent phenomena (Su et al. 2017). This motivated many researchers to study changes in user sentiments in an attempt to describe various health conditions and needs (Denecke and Deng 2015). The public's opinions about and reactions to personal circumstances can be mapped with the help of sentiment analysis. The Twitter platform is one source of data that contain emotions represented in text form. The texts/tweets are used to explain users' experiences and their opinions toward certain events and topics (Kessler and Schmidt-Weitmann 2021). However, in order to make good use of the online opinions shared between users on social media sites, a number of analytical procedures should be carried out (Chen, Hossain, and Zhang 2020). In a healthcare context, analyzing users' sentiments is a complex process which requires a lot of information to be retrieved and annotated (Ahmet and Abdullah 2020). This has motivated many scholars to examine health-related content available on Twitter in an attempt to support healthcare decision-makers. For example, a study by Sarsam, Al-Samarraie, and Al-Sadi (2020) used a sentiment analysis technique to analyze social media content in the detection of diabetes using a set of machine learning models. The authors developed a heuristic mechanism to efficiently detect diabetes-related incidents and outbreaks on Twitter. The result showed that certain terminologies were highly linked to users' emotions. Along the same line, another study by Sarsam et al. (2020b) examined the possibility of mapping certain emotional types and climatic factors in the detection of migraine disease. The authors demonstrated the potential of using the Twitter platform as a reliable real-time data source to aid in early-stage disease recognition.
The literature (e.g., Sarsam et al. 2021a) also showed the role of sentiments, in the form of tweets, in detecting suicide-related content. The authors used a complex analytical procedure to extract and characterize six types of emotions: anger, fear, sadness, joy, positive, and negative. They used a semi-supervised learning approach via the YATSI (Yet Another Two-Stage Idea) classifier to predict suicide-related tweets through the identification of certain emotions such as fear, sadness, and negative.
Based on these observations, it can be concluded that social media platforms can be used efficiently to characterize users' emotions related to their health conditions. This can help us develop a robust mechanism for diagnosing diseases at a very early stage. We developed a lexicon-based approach to provide an effective means of extracting emotions related to common eye diseases. We used time series to determine the applicability of predicting emotional changes over a specific period of time. Both the NRC Affect Intensity Lexicon and SentiStrength were used to cluster users' emotions. It is believed that this method can help provide robust and more efficient estimates of common eye diseases on Twitter.

Proposed Method
Our method and procedure are summarized in Figure 1. This procedure consists of data collection, data pre-processing, cluster analysis, emotion extraction, association rule mining and time series, and disease detection. These stages are explained in the following sections.

Data Collection
A total of 2,448,250 English tweets were collected over a time span of seven months (October 2020 to April 2021). We collected messages related to the common eye diseases using Twitter's free streaming Application Programming Interface (API). The search keywords consisted of: Refractive Errors; Age-Related Macular Degeneration; Cataract; Diabetic Retinopathy; Glaucoma; Amblyopia; and Strabismus. Then, the data pre-processing stage was conducted to produce reliable data for the analytical stage.

Data Pre-processing
In this study, several pre-processing methods were used to provide the necessary resources for the recognition of various eye diseases. We applied the bag-of-words model to extract words associated with a sentiment level from the collected tweets. After that, all the extracted words were converted to lowercase before the unnecessary words were deleted using a stopwords list. Lastly, the length of the tweets was normalized using the L2 norm.
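The pre-processing steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `STOPWORDS` set, the tokenization pattern, and the `preprocess` function name are hypothetical and are not taken from the study, and the tweet is lowercased before tokenization for simplicity.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny stopword list used for illustration only.
STOPWORDS = {"a", "an", "and", "the", "is", "my", "of", "to", "in", "i", "have"}

def preprocess(tweet, stopwords=STOPWORDS):
    """Return an L2-normalized bag-of-words vector as a dict of word -> weight."""
    # Lowercase the text and split it into word tokens (assumed pattern).
    tokens = re.findall(r"[a-z']+", tweet.lower())
    # Bag of words: count every token that is not a stopword.
    counts = Counter(t for t in tokens if t not in stopwords)
    # L2 norm of the count vector; an empty tweet yields an empty vector.
    norm = math.sqrt(sum(c * c for c in counts.values()))
    if norm == 0:
        return {}
    return {word: c / norm for word, c in counts.items()}

vector = preprocess("My eye pain and blurred vision is the worst, the WORST pain")
```

After normalization, the squared weights of every non-empty vector sum to 1, so long and short tweets become comparable in the later clustering step.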

Cluster Analysis
We performed cluster analysis to obtain semantic labels for the collected queries. We clustered the data with the K-means algorithm wrapped in a density-based method, which fits normal distributions and discrete distributions within each produced cluster. Finally, two ophthalmologists (with 14 years of experience) were involved to assess the terminologies in each cluster together with their relation to various eye diseases. The embedded emotions were extracted from each cluster using the lexicon-based method.
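The clustering idea described above (K-means followed by a normal distribution fitted to each feature within each cluster) can be sketched as follows. This is a minimal illustration, not the configuration used in the study: the two-dimensional toy points, the choice of the first k points as initial centroids, and all function names are assumptions, and only numeric features (normal distributions) are covered.

```python
import math
import statistics

def kmeans(points, k, iterations=20):
    """Plain k-means on tuples of floats; the first k points seed the centroids."""
    centroids = [points[i] for i in range(k)]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        assignment = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment

def fit_normals(points, assignment, k):
    """Fit an independent normal (mean, std) to every feature within each cluster."""
    params = {}
    for c in range(k):
        members = [p for p, a in zip(points, assignment) if a == c]
        # A zero standard deviation is replaced by a small floor to keep densities finite.
        params[c] = [(statistics.mean(col), statistics.pstdev(col) or 1e-6)
                     for col in zip(*members)]
    return params

def log_density(point, cluster_params):
    """Log-density of a point under one cluster's per-feature normal distributions."""
    total = 0.0
    for x, (mu, sigma) in zip(point, cluster_params):
        total += -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)
    return total

# Hypothetical 2-D feature vectors forming three well-separated groups.
points = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0), (0.2, 0.1), (5.1, 4.9), (9.8, 0.2)]
labels = kmeans(points, k=3)
params = fit_normals(points, labels, k=3)
```

A new instance can then be assigned to the cluster under which its log-density is highest, which is what makes the clusterer density-based rather than purely distance-based.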

Emotion Extraction
In order to extract users' emotions from their tweets, a lexicon-based method was implemented via SentiStrength (Khaira et al. 2020; Sarsam et al. 2021b). Scores were assigned to each tweet, ranging from '+1' for 'not positive' to '+5' for 'extremely positive' and from '-1' for 'not negative' to '-5' for 'extremely negative.' Based on these scores, we labeled the tweets with +5 as 'Positive' tweets, -5 as 'Negative' tweets, and -1/+1 as 'Neutral' tweets (Culpeper et al. 2018; Thelwall 2017). After that, time series prediction was implemented in each cluster to examine periodical changes in users' emotions.
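The labeling rule above can be written as a small function. The sketch below assumes that each tweet already carries a positive score (1 to 5) and a negative score (-5 to -1) in the SentiStrength style; it does not call SentiStrength itself. The function name is hypothetical, and returning `None` for intermediate scores or for tweets that hit both extremes is an assumption, since the text does not say how such tweets were handled.

```python
def label_tweet(positive, negative):
    """Map a (positive, negative) score pair to 'Positive', 'Negative', 'Neutral', or None."""
    if not (1 <= positive <= 5 and -5 <= negative <= -1):
        raise ValueError("expected positive in 1..5 and negative in -5..-1")
    if positive == 1 and negative == -1:
        return "Neutral"       # neither positive nor negative sentiment detected
    if positive == 5 and negative > -5:
        return "Positive"      # extremely positive, not extremely negative
    if negative == -5 and positive < 5:
        return "Negative"      # extremely negative, not extremely positive
    return None                # assumption: all other tweets stay unlabeled

# Hypothetical score pairs for four tweets.
labels = [label_tweet(p, n) for p, n in [(5, -1), (1, -5), (1, -1), (3, -2)]]
```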

Association Rule Mining and Time Series
To extract the emotional features in each tweet (i.e., words), we used the Apriori algorithm to define patterns within a set of items and establish a meaningful relationship between the data features. We set the delta value of the Apriori algorithm at 0.05 to reduce the support until a minimum support is reached. The minimum metric score was set at 0.9, while the upper bound and lower bound support were set at 1.0. After that, the emotional changes during the data collection time span were captured and analyzed using the time series technique.
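The role of the delta and minimum metric settings can be illustrated with a compact Apriori sketch. It starts at an upper support bound, lowers the bound by delta on each pass, and stops once enough rules meet the minimum confidence or the lower bound is reached. The toy transactions, the lower bound of 0.1, the number of wanted rules, and all function names are assumptions made for this example and are not the settings or data of the study.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori search; returns a dict of frozenset -> support."""
    n = len(transactions)
    current = [frozenset([i]) for i in sorted({i for t in transactions for i in t})]
    frequent = {}
    while current:
        level = {}
        for candidate in current:
            support = sum(candidate <= t for t in transactions) / n
            if support >= min_support:
                level[candidate] = support
        frequent.update(level)
        # Join step: merge frequent k-itemsets that differ in exactly one item.
        candidates = {a | b for a, b in combinations(list(level), 2) if len(a | b) == len(a) + 1}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, len(c) - 1))]
    return frequent

def rules(frequent, min_confidence):
    """Generate (antecedent, consequent, confidence) rules from frequent itemsets."""
    found = []
    for itemset, support in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(sorted(itemset), size)):
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    found.append((antecedent, itemset - antecedent, confidence))
    return found

def mine(transactions, upper=1.0, lower=0.1, delta=0.05, min_confidence=0.9, wanted=2):
    """Lower the support bound by delta until enough rules are found or lower is reached."""
    support = upper
    while True:
        found = rules(frequent_itemsets(transactions, support), min_confidence)
        if len(found) >= wanted or support - delta < lower - 1e-9:
            return support, found
        support = round(support - delta, 10)  # rounding avoids float drift

# Hypothetical tweets, each reduced to a set of extracted terms.
tweets = [
    {"eye pain", "blurred vision", "headache"},
    {"eye pain", "blurred vision", "nausea"},
    {"eye pain", "blurred vision", "headache"},
    {"eye pain", "blurred vision"},
    {"carrots", "milk"},
]
support_used, found = mine(tweets)
```

On this toy input the search stops at a support of 0.8, where "eye pain" and "blurred vision" imply each other with a confidence of 1.0.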

Disease Detection
To detect glaucoma using the extracted emotional features, three classification algorithms were compared to find the best detection algorithm. These were: a deep learning algorithm (Witten et al. 2016), sequential minimal optimization (SMO) for support vector machines (SVM) (Al-Ajeli, Alubady, and Al-Shamery 2020), and a Bagging classifier (Kabiraj et al. 2020; Sarsam, Al-Samarraie, and Alzahrani 2021). These algorithms were implemented in the Weka software (Waikato Environment for Knowledge Analysis) using settings similar to previous experiments (Al-Samarraie et al. 2018; Al-Samarraie, Sarsam, and Guesgen 2016). The stratified tenfold cross-validation method was applied to assess the learning process of the three algorithms. Finally, the best algorithm was selected using several evaluation metrics, including accuracy, the kappa statistic, and the confusion matrix, as per the recommendations of previous studies (Al-Samarraie, Sarsam, and Guesgen 2016; Sarsam et al. 2021a).
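Two pieces of this evaluation protocol are easy to show in isolation: how a stratified split keeps the class balance in every fold, and how accuracy and the kappa statistic are derived from a confusion matrix. The sketch below is a plain-Python illustration and not Weka's implementation; the function names, the deterministic round-robin fold assignment (no shuffling), and the toy numbers are assumptions.

```python
from collections import defaultdict

def stratified_folds(labels, k=10):
    """Assign each instance index to one of k folds, round-robin within each class."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds

def accuracy_and_kappa(matrix):
    """matrix[i][j] = number of instances of actual class i predicted as class j."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    # Observed agreement: share of instances on the diagonal (this is the accuracy).
    observed = sum(matrix[i][i] for i in range(size)) / total
    # Chance agreement: product of row and column totals, summed over classes.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(size)
    ) / total ** 2
    return observed, (observed - expected) / (1 - expected)

# Hypothetical two-class confusion matrix.
accuracy, kappa = accuracy_and_kappa([[20, 10], [10, 20]])
```

For the toy matrix, the accuracy is 40/60 and the chance agreement is 0.5, so kappa is about 0.33, which shows how kappa discounts agreement that would occur by chance.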

Cluster Analysis and Emotion Extraction Results
The results for the K-means showed three different clusters (groups), with the instances in each group sharing similar characteristics. To label each cluster, two ophthalmologists named the first, second, and third cluster as 'Symptoms,' 'Lifestyle,' and 'Advice,' respectively. Figure 2 exhibits the result of SentiStrength (positive, negative, and neutral) in each of the three clusters. In the first cluster (symptoms), negative emotions were found to be dominant (with the highest score of 91%) over the positive (5%) and neutral (4%) emotions. In the second cluster (lifestyle), positive emotions were found to be the dominant type (score of 94%) as compared to the negative (3%) and neutral (3%) emotions. In the third cluster (advice), neutral emotions were found to be dominant (96%) over the positive (2%) and negative (2%) emotions. From these, it can be said that identifying the polarity of emotions in the tweets can be useful in labeling training samples. Figure 3 shows the highly associated rules in each cluster solution. Figure 3a outlines the rules that are highly associated with the main symptoms of eye diseases in the form of eye pain, nausea, blurred vision, headache, and vomiting. We can also note that some of the rules were highly associated with the patients' lifestyle. An example is the type of food consumed by the affected users such as eggs, carrots, milk, and broccoli. The figure also reveals another set of rules that were highly associated with users seeking information (advice) about their eye disease, such as information about the types of eye screening and eye drops for glaucoma.

Association Rule Mining and Time Series Results
To further understand users' emotional reactions in each cluster, emotional changes were extracted together with their temporal features, which were analyzed using time series over the period of seven months (see Figure 3b). From the figure, it can be observed that negative emotions were the only type that showed a substantial increase in the number of tweets compared with the other types of emotions. The increasing number of negative emotions was located in the cluster that contained symptoms of eye disease.
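The kind of temporal comparison described above can be illustrated by counting labeled tweets per month and fitting a straight-line trend to each emotion. The text does not name the specific time-series model, so the least-squares slope used here is only a stand-in; the records are fabricated toy values and the function names are hypothetical.

```python
from collections import Counter

def monthly_series(records):
    """records: iterable of ('YYYY-MM', emotion) pairs -> (months, {emotion: counts})."""
    records = list(records)
    counts = Counter(records)
    months = sorted({month for month, _ in records})
    emotions = sorted({emotion for _, emotion in records})
    return months, {e: [counts[(m, e)] for m in months] for e in emotions}

def slope(values):
    """Least-squares slope of values against their position (0, 1, 2, ...)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator

# Toy records: negative tweets grow month over month, positive tweets stay flat.
records = (
    [("2020-10", "Negative")] * 1 + [("2020-11", "Negative")] * 2 + [("2020-12", "Negative")] * 3
    + [("2020-10", "Positive")] * 2 + [("2020-11", "Positive")] * 2 + [("2020-12", "Positive")] * 2
)
months, series = monthly_series(records)
trends = {emotion: slope(counts) for emotion, counts in series.items()}
```

A positive slope for one emotion alongside flat slopes for the others corresponds to the pattern reported for negative emotions in the symptoms cluster.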

Disease Detection Results
Our classification results are summarized in Table 1, which shows that the deep learning algorithm achieved the highest classification accuracy (98.89%), followed by SMO (65.13%) and Bagging (57.43%). The classification results also revealed that deep learning had the highest kappa statistic value (97%) compared to SMO (73%) and Bagging (60%). In contrast, the Bagging classifier produced the highest RMSE value (89%), followed by SMO (45%) and deep learning (3%). In addition to the previous evaluation metrics, we also used the confusion matrix to evaluate our algorithms by comparing the predicted and the actual instances, with correctly classified instances lying along the diagonal of the matrix. Our results (see Figure 4) revealed that the deep learning classifier had the highest agreement between actual and predicted classes, i.e., 100%, for the three classes.

Discussions
The proposed lexicon-based approach in this study showed promising results in detecting common eye diseases. We found that certain types of emotions can be linked to certain eye conditions. The results obtained from the NRC Affect Intensity Lexicon and SentiStrength revealed that negative, fear, and sadness sentiments were strongly associated with symptoms of eye diseases. It is believed that users' sharing of personal stories about their eye conditions on social media sites is driven by the need for information and support (Zhang and Zhou 2020). In addition, people in general are interested in similar stories submitted by other users, which can help them make informed decisions about their symptoms (Moorhead et al. 2013). The findings support the work of Jampel et al. (2007), who reported that glaucoma patients are likely to express fear-related emotions in relation to vision loss. Ciuraru (2016) stated that concerns about losing vision from glaucoma or other similar eye diseases can increase people's anxiety and depression, which explains the negative sentiments that were found in our results. Based on this, it can be said that patients who exhibit fear of becoming blind are likely to share negative emotions with others (Dorison et al. 2020). Our findings are also in line with the work of Stamatiou et al. (2021), who confirmed the association between glaucoma patients and negative sentiments, which were characterized in the form of depression. The main topics shared between users on Twitter were mostly related to the symptoms of eye diseases. Several eye conditions were posted by social media users in relation to their eye disease, such as dry eye, temporary loss of vision, eye pain, nausea, blurred vision, and halos. Other topics shared between social media users were related to the type of food consumed by people with certain eye diseases.
The results also showed that another group of people were mostly interested in sharing topics related to eye disease prevention and treatment advice. We believe that these topics are essential in characterizing the types of emotions people express on social media sites. Our review of the literature showed that previous studies have frequently discussed certain eye symptoms (e.g., dry eye, head pain, and nausea) in relation to people experiencing early eye disease signs (Bartlett et al. 2015). The results also showed that people with certain eye conditions were very interested in learning about the types of food to take or to avoid when taking certain medications. This finding is in line with previous studies, which have shown that certain types of food and fruits can be effective in reducing intraocular pressure in patients, as well as possessing an analgesic effect that is relative to the powerful opioid morphine, aspirin, and indomethacin (Rasmussen and Johnson 2013). Also, the consumption of fruits and vegetables rich in vitamin A, vitamin C, and carotenoids has been reported to help reduce the risk of glaucoma (Jabbehdari, Chen, and Vajaranant 2021). In addition, the consumption of blackcurrants was found to be helpful in reducing eye fatigue among patients (Smeriglio et al. 2016). Patients were also found to seek advice about certain symptoms of eye diseases and about treatment or preventive solutions. This information-seeking pattern is probably the most common of all the patterns identified in this study. According to Smailhodzic et al. (2016), patients use social media for six purposes: emotional, information, esteem, network support, social comparison, and emotional expression. The reported effects of this use include improved self-management and control, enhanced psychological well-being, and enhanced subjective well-being, as well as diminished subjective well-being, addiction to social media, loss of privacy, and being targeted for promotion.
As a result, it can be said that users' queries about certain eye treatments or conditions can be effective in the prediction of eye disease. From the findings of this study, one can observe that the topics discussed on Twitter were closely aligned with the literature on various eye diseases.

Implications
This study provides a first step toward the use of sentiment analysis in extracting and characterizing eye diseases from Twitter. The use of emotions in the form of tweets was found to be useful in predicting the common symptoms of eye diseases. The proposed method contributes to the development of clinical decision support systems that aim at examining changes in users' vision based on certain sentiment features embedded in their social media posts. The proposed mechanism is cost-effective and can be used in the detection of various eye diseases. Moreover, the proposed mechanism can be used not only for detecting specific eye diseases, but also for other diseases/conditions. The use of time series and the Apriori algorithm offers an additional way of building associations between tweets/emotions and should be used to complement traditional detection methods.

Limitations and Future Works
This study has several limitations. For example, we only used English tweets in the analysis and detection of eye diseases. The collection period of the data was limited to seven months, in which a three-cluster solution was identified and evaluated. This study was also limited to certain types of emotions found in the collected tweets. The prediction process was limited to the use of the Apriori algorithm to establish associations between the common symptoms of eye diseases and emotions. In addition, this study used and compared three classifiers for the prediction of eye diseases. Although the utilized classifiers were recommended by previous studies, other more complex classifiers can be considered in the future. Future work can also consider the use of different social media platforms to collect and retrieve different types of emotions capable of representing a larger population. Researchers are also encouraged to focus on a specific type of eye disease in order to confirm the efficiency of our proposed approach. Finally, future studies are also invited to adopt our technique to diagnose other types of diseases on social media platforms.

Conclusion
This study proposed a lexicon-based approach for the detection of eye diseases using tweets. We applied a cluster analysis to extract instances that share similar characteristics. Three types of emotions were extracted (positive, negative, and neutral) from users' messages (tweets) using SentiStrength. Changes in users' emotions were analyzed using time series. In addition, the Apriori algorithm was used to extract the common symptoms of eye diseases from the tweets. The results showed that deep learning can be effective in detecting eye disease with an accuracy of 98.89%. The proposed approach can contribute to the development of intelligent health monitoring systems and clinical decision-support systems.

Disclosure Statement
No potential conflict of interest was reported by the author(s).