Current state of artificial intelligence applications in ophthalmology and their potential to influence clinical practice

Abstract Artificial intelligence (AI) has emerged as a major frontier in healthcare and finds a broad range of applications. It has the potential to revolutionize current procedures of disease diagnosis and treatment, thus influencing clinical practice. AI in ophthalmology primarily concentrates on diagnostic and treatment pathways for eye conditions such as cataract, glaucoma, age-related macular degeneration (AMD) and diabetic retinopathy (DR). The purpose of this article is to systematically review the existing literature on the various AI techniques and their applications in the diagnosis and treatment of eye diseases, and to conduct an in-depth enquiry to identify the challenges in accurate detection, pre-processing of data, monitoring and assessment through various AI algorithms. The results suggest that all AI models proposed reduce the detection time considerably. The potential limitations and challenges in development and application play a significant role in clinical practice. There is a need for the development of AI-assisted technologies that consider the clinical implications based on experience and are guided by patient-centred healthcare principles. The diagnostic models should assist ophthalmologists in making quick and accurate decisions in determining the progression of various ocular diseases.


PUBLIC INTEREST STATEMENT
Artificial Intelligence (AI) is a technology that enables machines and equipment to "learn" and adapt from their experience. AI-based platforms have obtained clinically acceptable diagnostic efficiency in the automated diagnosis of many retinal diseases. AI in ophthalmology concentrates primarily on illnesses such as diabetic retinopathy (DR), glaucoma and age-related macular degeneration (AMD), and it can significantly influence the field of ophthalmology in the coming decades. In this review, an in-depth enquiry is conducted to identify the challenges in accurate detection, pre-processing of data, monitoring and assessment through various AI algorithms. This study contributes to the literature on AI models by describing thorough, methodical processes for applying diagnostic AI systems to the diagnosis of retinal diseases using an evidence-based systematic approach.

Introduction
Artificial Intelligence (AI) is the reproduction of the human thinking process by computers. It has a wide array of applications including, but not limited to, law enforcement, finance, education and healthcare. AI platforms are used to replicate human intelligence and are then made to complete tasks such as image and speech recognition, sentiment analysis and problem solving (Heidary & Gharebaghi, 2012; Hogarty et al., 2018). Traditionally, a computer is given an explicit set of instructions to complete a task. To allow machines to make independent decisions, i.e. without explicit programming, they undergo a process called Machine Learning (ML).
Machine learning is a process where a computer trains itself based on sample labelled data with a validation dataset, and a basic learning structure for chosen algorithms. There are three types of machine learning algorithms: 1. Supervised machine learning: Data sets are labelled and fed into the classifier for training. 2. Unsupervised machine learning: Data sets are not labelled initially, but sorted according to their differences or similarities and then trained. 3. Reinforcement learning: Memory networking differentiates the connections between supervised and unsupervised learned information (Lee et al., 2017).
Algorithms that a computer uses to learn and classify data are termed classifiers. Some examples of AI classifiers, including support vector machines (SVM) and neural networks, are shown in Table 1. SVM is a popular supervised learning classifier (Figure 1). In this classifier, each data value is first plotted as a coordinate in n-dimensional space. Classification then finds the hyperplane that separates the different classes. SVMs are memory efficient and work well when there is a clear margin of separation. However, training time grows with input size, and SVMs are less accurate when different classes overlap (Sevik et al., 2014).
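The hyperplane idea can be illustrated with a minimal Python sketch. It shows only the decision rule of a linear classifier of the kind an SVM produces, not the training procedure; the function names, weights and bias are hypothetical values chosen for illustration and are not taken from any study cited here.

```python
# Illustrative linear decision rule of the kind an SVM learns.
# The weights and bias are hypothetical, not fitted to real data.

def decision_value(weights, bias, point):
    """Signed score w.x + b; its sign gives the side of the hyperplane."""
    return sum(w * x for w, x in zip(weights, point)) + bias


def classify(weights, bias, point):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if decision_value(weights, bias, point) >= 0 else -1


def distance_to_hyperplane(weights, bias, point):
    """Geometric distance from the point to the separating hyperplane."""
    norm = sum(w * w for w in weights) ** 0.5
    return abs(decision_value(weights, bias, point)) / norm


weights = [1.0, -1.0]  # hypothetical hyperplane x1 - x2 = 0
bias = 0.0
print(classify(weights, bias, [3.0, 1.0]))
print(classify(weights, bias, [1.0, 3.0]))
```

Points far from the hyperplane are classified with a wide margin, whereas overlapping classes place many points close to it, which is consistent with the loss of accuracy noted above.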
Traditional supervised training algorithms suffer when training on huge amounts of data. To solve this problem, Deep learning (DL), a subset of AI, is used. DL algorithms take inspiration from the functioning and structure of the human brain. Using DL algorithms, the performance of neural networks increases rapidly. DL is based on Artificial Neural Networks (ANN), which are inspired by biological neural networks. The unique feature of ANNs is that each layer learns different features with different weights for different stimuli. When multiple layers are used, the approach is termed deep learning. A Convolutional Neural Network (CNN) is a class of deep neural network, most commonly applied to analysing visual imagery.
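As a rough illustration of what a single convolutional layer computes, the sketch below slides a small kernel over an image and applies a ReLU activation. It is a plain-Python toy under stated assumptions: the 3 x 4 image and the edge-detecting kernel are made up, and a real CNN learns its kernels during training rather than having them fixed by hand.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        output.append(row)
    return output


def relu(feature_map):
    """Keep positive responses and set negative ones to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


# Hypothetical image with a dark-to-bright vertical edge.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
# Hypothetical kernel that responds to a left-to-right intensity increase.
kernel = [[-1, 1]]

print(relu(conv2d_valid(image, kernel)))
```

The feature map is non-zero only where the edge lies, which is the sense in which each layer responds to particular features of its input.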
At present, AI has applications in fields such as cancer, cardiology and neurology. Given deep learning's practical implications in image processing, the technology needs to improve to be able to process various other imaging modalities such as computed tomography scans and magnetic resonance imaging. While AI tools vary in functionality, their use raises questions about their reliability, because humans select the data used to train the AI program and the potential for human bias may undermine the platform. This review explores the current state of artificial intelligence applications in ophthalmology and discusses the current limitations and the future challenges in clinical practice.

Method of literature search
A detailed and systematic literature review was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines to identify all potentially relevant studies reporting artificial intelligence in ophthalmology, using the PubMed®, Springer®, Medline®, Scopus®, Web of Knowledge® and Inspec® databases. The articles were searched with the search terms: "Artificial intelligence" and "ophthalmology", "Artificial intelligence" and "glaucoma", "Artificial intelligence" and "diabetic retinopathy", "Artificial intelligence" and "macular degeneration", "Artificial intelligence" and "cataract".

Inclusion and exclusion criteria
The primary goal of this systematic review was to include all published articles that utilized AI platforms for automatic detection of retinal diseases. Figure 2 shows a flow chart of the study selection process for the articles considered for analysis. A search was performed using the keywords: Ophthalmology, artificial intelligence, CNN, machine learning, and deep learning. The inclusion and exclusion criteria for the articles were as follows: a) Exposure of interest: Only patients with confirmed ocular diseases were included in the study. b) Language: Only studies written in English were included. c) Reported outcomes: Only outcomes obtained using objective measures were included. d) Type of publication: Review articles were excluded.

Data extraction
Titles and abstracts were screened for all identified studies. Irrelevant and duplicate papers were excluded, and a full-text review was conducted to examine the remaining articles for compliance with the inclusion and exclusion criteria. The data extracted at this stage included the title, year of publication, authors, study objective, type of study, diagnostic criteria and selection criteria of participants, method, algorithm, results and performance metrics. Recent articles were handpicked, curated for the current year and subjected to the same inclusion criteria. For articles cited in the bibliographies of the results, the same search strategy was used.

Results
A total of 773 articles were identified in the initial search. As per the inclusion and exclusion criteria, 454 articles were excluded after title screening and a further 296 after abstract screening. Based on full-text accessibility from the databases listed in the method of literature search, 23 articles that met all inclusion criteria were included in the review. A further 32 articles, which highlighted the methods and algorithms of application, challenges and future perspectives of artificial intelligence in healthcare, were included after searching bibliographies.
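The selection counts reported above can be checked with simple arithmetic. The short Python sketch below only restates the figures given in this section; the variable names are illustrative.

```python
# Counts as reported in the Results section.
identified = 773
excluded_on_title = 454
excluded_on_abstract = 296
added_from_bibliographies = 32

remaining_after_title = identified - excluded_on_title
included_from_search = remaining_after_title - excluded_on_abstract
total_included = included_from_search + added_from_bibliographies

print(remaining_after_title, included_from_search, total_included)
```

The two exclusion steps leave the 23 articles stated in the text, and adding the 32 articles found through bibliographies gives 55 articles in total.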

Building artificial intelligence models
Various imaging methods used in healthcare, such as Computed Tomography (CT), X-rays and fundus images, support AI platforms in diagnosis. Fundus images and Optical Coherence Tomography (OCT) scans are the most prevalent and widely available methods used. To create an AI model, image data are first pre-processed, and the model is then trained, validated and assessed with the corresponding classifiers, as shown in Figure 3.

Data pre-processing
The images used are often obtained from multiple sources, and to provide consistent input for the classifier, some pre-processing steps need to be carried out on the data sets. Pre-processing often involves converting the images into greyscale or extracting a particular RGB channel to obtain a more detailed image. The green channel is generally preferred because it shows higher contrast than the blue and red channels (Farooq & Sattar, 2015).
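The two pre-processing options mentioned here can be sketched in a few lines of Python. The example assumes an image stored as nested lists of (R, G, B) tuples, which is a simplification chosen for illustration; the greyscale weights are the commonly used ITU-R BT.601 luma coefficients, an assumption not stated in the studies reviewed.

```python
def green_channel(rgb_image):
    """Keep only the green component of every pixel."""
    return [[pixel[1] for pixel in row] for row in rgb_image]


def to_greyscale(rgb_image):
    """Weighted sum of R, G and B (BT.601 weights, an assumed choice)."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_image]


# Hypothetical 1 x 2 image: a reddish pixel and a white pixel.
tiny_image = [[(180, 60, 40), (255, 255, 255)]]
print(green_channel(tiny_image))
print(to_greyscale(tiny_image))
```

In practice an imaging library would perform these steps on whole arrays, but the per-pixel operations are the same.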

Training and validation
After pre-processing the images, the classifier is trained with labelled data (supervised-learning). To ensure better accuracy, datasets are divided into training and testing sets. While the classifier defines parameters for the model using the training data, a validation set (partitioned from the training set) is used to verify the model's performance and tune the parameters if necessary. To evaluate the trained model, the testing set is used.
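A minimal sketch of such a partition is given below, assuming a 70/15/15 split and a fixed random seed; both values, and the function name, are illustrative assumptions rather than settings reported by the reviewed studies.

```python
import random


def split_dataset(samples, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle once, then cut into training, validation and testing sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    n_val = round(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


# Hypothetical dataset of 100 image identifiers.
train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))
```

The testing set is set aside before any tuning so that the final evaluation is made on images the model has never seen.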

Evaluation
A Receiver Operating Characteristic (ROC) curve shows how the diagnostic ability of a binary classifier system changes as its discrimination threshold is varied (Hajian-Tilaki, 2013). It is created by plotting the probability of detection (true positive rate) against the probability of false alarm (false positive rate). To assess a model in AI diagnosis, the area under the ROC curve (AUC) is widely used (Paulus et al., 2010). AUC values of useful models lie between 0.5, which corresponds to chance-level discrimination, and 1, which corresponds to perfect discrimination.
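The AUC can be computed directly from its probabilistic interpretation: it equals the probability that a randomly chosen diseased case receives a higher score than a randomly chosen healthy case, with ties counted as one half. The Python sketch below uses this pairwise definition; the labels and scores are made-up values for illustration only.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical ground truth (1 = disease) and model scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))
```

The pairwise form is quadratic in the number of cases and is shown for clarity; library implementations usually obtain the same value from sorted scores.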

Applications in ophthalmology
Many studies have shown that deep learning models can nearly achieve and sometimes even exceed human performance (Al-Fadhili et al., 2017). Among recent publications, almost 85%-90% focus on DR, AMD, glaucoma and cataract. The probability deviation map acts as a virtual environment filter that helps to understand the problems a patient faces while walking by demonstrating which regions of the patient's vision have been affected [9]. Incorrect identification of intraretinal cystoid fluid regions in OCT images might result in existing cysts being discarded or false positives being retained, hindering the performance of the AI system (Al-Fadhili et al., 2017; Moura et al., 2017).

Diabetic retinopathy
A pooled assessment of 22,896 people with diabetes from 35 population-based studies (from 1980 to 2008), carried out in the USA, Australia, Europe and Asia, showed that the prevalence of DR (in type 1 and type 2 diabetes) was 34.6%, with 7% having vision-related retinopathy (Yau et al., 2012). Various health care professionals, including ophthalmologists, optometrists, general practitioners and clinical photographers, can perform DR screening. Screening procedures include direct ophthalmoscopic screening, dilated biomicroscopy with hand-held lenses (90 D or 78 D), mydriatic or non-mydriatic screening, teleretinal screening and video recording. However, DR screening programmes face questions about implementation, the availability of human assessors and long-term economic sustainability.
DR is one of the most commonly researched fields of AI applications globally for visual impairment. The image-recognition ability of neural network (NN) models allows widely available fundus images to be used for early identification. Gulshan et al. were the first to identify the applications of DL models for DR identification (Gulshan et al., 2016). They used supervised learning and very large fundus data sets to train a deep CNN (DCNN). The large number of input images resulted in very good statistical performance, with an AUC of 0.99. Chowdhury et al. developed a Random Forest classifier evaluated with five performance measures, which obtained an accuracy of 93.58% and an AUC of 0.893, significantly outperforming the Naïve Bayes classifier (NBC) (Chowdhury et al., 2018).
To counter the shortage of specialists in the field, Hnoohom et al. used a local thresholding method to separate the foreground region from the background region and so clearly identify the optic disc (OD) and exudates, which in turn facilitated preliminary decisions and reduced the detection time (Hnoohom & Tanthuwapathom, 2017). Exudates in the layers of the retina obstruct vision, and early detection helps to estimate the severity of the condition. Ravindraiah et al. (Ravindraiah & Chandra Mohan Reddy, 2018) presented a framework for exudate detection using a Spatial Possibilistic C-means Clustering (SPCM) algorithm, based on the approach proposed by Krishnapuram et al. (Krishnapuram & Keller, 1993), to avoid the problem of outliers and noise in the traditional Fuzzy c-means clustering algorithm (FCM). Takahashi et al. tried a different approach towards DR staging by using nonmydriatic posterior pole photographs to train a modified GoogLeNet DCNN (Takahashi et al., 2017).
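The general idea of local thresholding can be sketched as follows: each pixel is compared with the mean intensity of its own neighbourhood rather than with one global cut-off. This is a generic mean-based variant written for illustration, with a hypothetical window radius and offset; it is not a reconstruction of the specific method used by Hnoohom et al.

```python
def local_threshold(image, radius=1, offset=0):
    """Mark a pixel as foreground (1) if it is brighter than its local mean."""
    height, width = len(image), len(image[0])
    mask = []
    for i in range(height):
        row = []
        for j in range(width):
            # Collect the neighbourhood, clipped at the image border.
            window = [image[a][b]
                      for a in range(max(0, i - radius), min(height, i + radius + 1))
                      for b in range(max(0, j - radius), min(width, j + radius + 1))]
            local_mean = sum(window) / len(window)
            row.append(1 if image[i][j] > local_mean + offset else 0)
        mask.append(row)
    return mask


# Hypothetical greyscale patch with one bright, exudate-like pixel.
patch = [[10, 10, 10],
         [10, 200, 10],
         [10, 10, 10]]
print(local_threshold(patch))
```

Because the threshold adapts to the surrounding intensity, bright structures can be separated from the background even when illumination varies across the fundus image.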

Age-related macular degeneration
AMD is an important cause of impaired vision in the world's elderly population. The Age-Related Eye Disease Study (AREDS) classified AMD into no, early, intermediate and late stages. The American Academy of Ophthalmology recommends that intermediate AMD patients be seen at least once every 2 years. An estimated 288 million people are expected to have some form of AMD by 2040, about 10% of them with intermediate AMD or worse. There is an urgent need for a clinical system to screen these elderly patients in tertiary eye care centres (Bogunovic et al., 2017; García-Floriano et al., 2019). AMD is a retinal disease that occurs when the macula is damaged and, hence, central vision is lost. This occurs secondary to deposition of lipid in the macular region; these deposits are known as drusen. A gradient segmentation-based algorithm that identifies the borders of the drusen is used in the detection and segmentation process. Anti-vascular endothelial growth factor (anti-VEGF) therapy inhibits vascular endothelial growth factor, thus reducing the growth of new vessels in the macular region of the retina. Using ML to predict anti-VEGF injection requirements for AMD can reduce the economic burden on patients. Bogunovic et al. trained a random forest classifier on optical coherence tomography (OCT) images of patients on anti-VEGF medication to predict future requirements and obtained an AUC between 70% and 80% for the predictive model (Bogunovic et al., 2017). However, when a DCNN was trained with OCT scans regarding anti-VEGF injection requirements, Prahs et al. obtained better accuracy than with an RF classifier (Prahs et al., 2017). These studies proved to be an important step towards using imaging modalities to predict treatment intervals for AMD medication.
Typically, stages of AMD were classified as none, early-stage, intermediate-stage and advanced (Philippe Burlina et al., 2017). Nevertheless, when a 2-stage classification system (none or early-stage versus intermediate or advanced AMD) was used, the diagnostic accuracy was better than that of the 4-stage classification system (Schmidt-Erfurth et al., 2018). Various DL platforms have been used for the automatic detection of abnormalities such as drusen and exudates.

Glaucoma
Glaucoma is a condition caused by an increase in intraocular pressure, which subsequently affects the optic nerve. When it is not detected early, glaucoma can lead to permanent visual loss. For individuals 40-80 years old, the global prevalence of glaucoma is 3.4%, and it is forecast that around 112 million people will be affected globally by 2040. Improvements in disease detection, in assessing progressive structural and functional damage, in optimizing therapy for visual impairment and in making precise long-term forecasts would be welcomed by both clinicians and patients (Galilea et al., 2007; Wang et al., 2019). Glaucoma is an optic nerve disorder that is clinically manifested by changes of the optic nerve head (ONH), characterized by excavation and erosion of the neuroretinal rim. However, because the ONH area varies about fivefold between individuals, virtually no single Cup to Disc Ratio (CDR) defines pathological cupping, which hampers detection of the disease. The most common type of glaucoma is open-angle glaucoma, which occurs when fluid does not flow normally out through the trabecular meshwork. Most glaucoma patients suffer from high intraocular pressure (IOP), which leads to retinal nerve fiber layer defects with concomitant damage to the optic nerve and, in turn, to visual loss. Early diagnosis and automatic detection in older patients have proven to be highly beneficial. Apreutesei et al. (Apreutesei et al., 2018) attempted to establish a relation between open-angle glaucoma and diabetes in patients. Fundus images were pre-processed, and a neural model was trained using a back-propagation algorithm. The feed-forward neural model, with parameters such as cup-disc ratio, intraocular pressure and glycosylated haemoglobin levels, attained an accuracy of 95%.
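To make the notion of a feed-forward model concrete, the sketch below shows the forward pass of a small network with one hidden layer that maps three clinical inputs to a single score between 0 and 1. All weights, biases and input values are hypothetical placeholders; in the study cited above the parameters would be learned by back-propagation, a step that is not shown here, and the actual architecture is not described in this review.

```python
import math


def sigmoid(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases):
    """One fully connected layer followed by a sigmoid activation."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(inputs, hidden_w, hidden_b, out_w, out_b):
    """Input -> hidden layer -> single output score."""
    hidden = dense(inputs, hidden_w, hidden_b)
    return dense(hidden, [out_w], [out_b])[0]


# Hypothetical inputs: cup-disc ratio, IOP (mmHg), glycosylated haemoglobin (%).
patient = [0.6, 24.0, 7.5]
# Hypothetical, untrained parameters chosen only so the example runs.
hidden_w = [[1.0, 0.05, 0.1], [-0.5, 0.02, 0.2]]
hidden_b = [-2.0, -1.0]
out_w = [1.5, 0.8]
out_b = -1.0

print(forward(patient, hidden_w, hidden_b, out_w, out_b))
```

The output has no clinical meaning with these placeholder weights; the sketch only shows how the listed parameters would flow through such a model.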
The second type of glaucoma is angle-closure glaucoma (ACG), which may occur when the drainage space between the iris and cornea becomes too narrow (Apreutesei et al., 2018). This form of glaucoma causes a sudden increase in intraocular pressure and is an ocular emergency. Niwas et al. (2016) proposed segmenting anterior segment OCT (AS-OCT) images based on the four major mechanisms causing ACG, as the required medication differs for each mechanism. Morphological features were extracted from the AS-OCT images, features with minimum redundancy were used for training with Neighborhood-Based Clustering (NBC), and an accuracy of 89.2% was obtained using a leave-one-out cross-validation method. However, the most promising results were obtained by Galilea et al. (Galilea et al., 2007), who implemented a multilayer perceptron ANN with backpropagation and analyzed nerve fibers using laser polarimetry. This allowed them to analyze more parameters, and the final neural network model had an accuracy of 100%.
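Leave-one-out cross-validation, the evaluation scheme mentioned above, holds out each case in turn, trains on the remaining cases and checks the prediction for the held-out case. The Python sketch below demonstrates the procedure with a simple nearest-neighbour classifier as a stand-in; the classifier and the toy feature values are assumptions made for illustration and do not reproduce the method of Niwas et al.

```python
def nearest_neighbour_predict(train_x, train_y, query):
    """Return the label of the training sample closest to the query."""
    best = min(range(len(train_x)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(train_x[k], query)))
    return train_y[best]


def leave_one_out_accuracy(features, labels, predict=nearest_neighbour_predict):
    """Hold out each sample once and score the prediction made without it."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)


# Hypothetical one-dimensional morphological feature and class labels.
features = [[0.0], [0.1], [1.0], [1.1]]
labels = [0, 0, 1, 1]
print(leave_one_out_accuracy(features, labels))
```

This scheme uses every case for testing exactly once, which makes it attractive for the small data sets that are common in studies of less frequent conditions.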
Diagnostic models have been built using both fundus images and OCT scans. A primary cause of inaccurate results is misalignment of the optic disc on the fundus images. Li et al. showed in their study that DL can be applied to identify referable glaucomatous optic neuropathy with high sensitivity and specificity (Ran et al., 2018). These results demonstrate the potential of clinical decision support software when compared with human clinician accuracy, particularly given the practical value of being able to recognize many specific referable diseases.
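Sensitivity and specificity follow directly from the counts of true and false positives and negatives. The Python sketch below computes both from binary labels and predictions; the example values are invented for illustration and do not come from any of the studies discussed.

```python
def sensitivity_specificity(labels, predictions):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical ground truth (1 = referable disease) and model output.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
predictions = [1, 1, 0, 0, 0, 1, 0, 1]
print(sensitivity_specificity(labels, predictions))
```

Reporting both values matters in screening, because a model can reach high sensitivity simply by referring almost everyone, at the cost of low specificity.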

Cataract
A cataract is the excessive build-up of protein in the natural lens of the eye that leads to opacification of the lens. Since this is an age-related phenomenon, it is common in older adults. Early detection and treatment at the appropriate stage can help to reduce the blindness caused by this condition. AI platforms using random forests and SVMs can accurately detect cataracts using fundus images. Ran et al. used an elaborate six-level cataract grading random forest based on the feature datasets generated by a DCNN (Ran et al., 2018). The results show that random forests mitigate the limitations of the DCNN on smaller datasets, achieving an average accuracy of 90.69%. Long et al. classified and graded patients with pediatric cataract using a CNN-based computer-aided diagnosis (CAD) framework (Long et al., 2017).
Beyond the diseases discussed above, AI systems can be used to detect keratoconus (Almeida & De et al., 2015), to formulate plans for horizontal strabismus (Koprowski et al., 2016), to evaluate corneal power after myopic corneal refractive surgery (Xu et al., 2017), and to detect pigment epithelial detachment in polypoidal choroidal vasculopathy. In this review, the studies on DR, AMD, glaucoma and cataracts using various DL techniques are outlined in Table 2.

Discussions
Automatic screening and diagnostics with AI assistance for common eye diseases may ultimately contribute towards maximizing the role of doctors in the clinic. Outside the clinic, AI platforms offer more health opportunities to patients and reduce barriers to eye care where an ophthalmologist is not available. To a certain extent, new AI-based technologies can reduce social inequalities (Khalid et al., 2018). AI-assisted systems have the potential to alleviate the problems of an overburdened healthcare system.
In particular, there are primarily three steps in the method of automatically detecting a disease (Sengur et al., 2017; Yau et al., 2012). Firstly, a large collection of images is necessary, and relevant experts must label the characteristic lesions. Secondly, computers extract disease characteristics from the labelled input images via a specific program. Lastly, the statistical characteristics of the target lesions are used to distinguish a given image from those of other types of disease.
Bourne et al. (Fleming et al., 2006) detailed the global prevalence of vision impairment. As of 2015, more than 200 million people worldwide were suffering from moderate to severe vision impairment. The age-standardized prevalence was highest in developing regions of the world such as South Asia, western sub-Saharan Africa and North Africa (Bengio et al., 2013; Ting Daniel et al., 2011). Visual defects and retinal diseases affect the quality of life and economic opportunities of people in these regions.
The retina has a complex morphology and is among the most complex parts of the human body. With the availability of high-resolution scanners, AI systems are more reliable than their human counterparts in the long run. Many retinal conditions must be critically evaluated because their assessment is highly subjective. Some of the characteristics of diabetic eyes are so minute and rare that specialists will not check for them without a specific reason (Bellemo et al., 2019). An AI platform, however, can examine all these features reliably and impartially. As far as the concerns that doctors might lose their employment owing to automation are concerned, automation will allow more
Most automated diagnostic studies on retinal diseases concentrate on one problem at a time. However, this is often not the case in clinical practice, where patients may have several retinal diseases. To enhance the application of various AI platforms, multiple conditions need to be detected with high precision (Elze et al., 2015). Using multiple imaging modalities, rather than a single data source, helps to confirm a given illness and provide a more concrete diagnosis. The classifiers are also dependent on image quality (Abràmoff et al., 2016). Studies have shown that preliminary algorithms have been created for conditions such as DR, AMD, glaucoma and cataract. However, even with so many reports, 100% accuracy and sensitivity are seldom achieved. In other words, not every image is identified correctly, and some cases are missed. The accuracy of the results obtained depends not only on computer technology but also on the quality of the input images (Christopher et al., 2018; Elze et al., 2015). The primary factors leading to poor image quality for segmentation include head and eyeball movement, an undilated pupil, frequent blinking, opaque refractive media and poor fixation. Image annotation is the basis of machine training. The annotators must therefore be trained to achieve a uniform standard (Bengio et al., 2013; Ting Daniel et al., 2011).

Future challenges
Although AI-based models are highly accurate in many ophthalmic diseases, numerous clinical and technological difficulties remain in the real-time use of these models in clinical practice. In research and clinical settings, these challenges can arise at different stages. Many studies have used comparatively homogeneous populations for their training sets. Retinal images used for AI training and testing are often subject to many variables, including field width, field of view, image magnification, image quality and the ethnicity of participants (Gargeya & Leng, 2017; Mayro et al., 2019). Diversifying the data set in terms of ethnicity and imaging hardware could help to tackle this issue. The limited availability of high-quality data, both for unusual conditions (e.g., ocular tumours) and for prevalent diseases that are not regularly imaged in clinical practice, is another challenge for the development of AI models in ophthalmology (Zheng et al., 2012). Firstly, building an algorithm requires considerable computational expense and preparation. This means that AI may be useful only for diseases with high morbidity and might not be accessible for rare diseases. Secondly, the computer mechanically recognizes a structure or function, so AI cannot recognize a disease that falls entirely outside what it was trained on. There will always be a small portion of characteristics and variations that are uncommon (Nguyen et al., 2015). AI can therefore identify the vast majority, but not all, of the people with a disease. Thirdly, the task is somewhat complex: the features of a disorder and the algorithm parameters vary from task to task. Finally, the machine may not be able to construct a model if the relationship between the input and the anticipated output is complicated. Most importantly, this can lead to errors. Nguyen et al. described how neural networks can misclassify data.
Even though AI can effectively perform a task, it is essential to have a certain level of human intervention during the process [55].

Research limitations
Current limitations of accurate diagnosis of retinal diseases are: a) Quality of the training sets: Models trained on labelled image sets tend to have low accuracy if the training images do not have strong reference standards. b) Black box dilemma: Most image recognition models use Convolutional Neural Network (CNN) based systems. Whenever a CNN analyses data, it follows self-generated rules, and it is difficult to interpret the decisions made by the algorithm. c) Incorrect diagnosis: CNN systems are very sensitive to even minor pixel-level changes in the images, which can lead to inaccurate diagnosis. d) Image quality: Current state-of-the-art models are accurate when detecting the retinal diseases they were trained on. However, they cannot recognize when an image does not contain the target disease. For example, they might confuse a central retinal vein occlusion with DR. Blurry or partial images may also reduce the accuracy of a model.

Conclusions
Artificial intelligence techniques such as deep learning and machine learning have dramatically altered the healthcare sector. AI-based platforms have obtained clinically acceptable diagnostic efficiency in the automated diagnosis of many retinal diseases. Future research is crucial to assess the clinical deployment and cost-effectiveness of various AI systems in clinical practice. It is important to build on existing and future methodologies to improve the clinical acceptance of AI systems. Although challenges lie ahead, AI-based automated detection platforms are likely to influence the field of medicine and ophthalmology in the coming decades.

Disclosure Statement
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this article.