Dynamic Educational Recommender System Based on Improved Recurrent Neural Networks Using Attention Technique

ABSTRACT Most web-based educational systems have some drawbacks compared with traditional classrooms. In particular, the large number of online learning resources makes it difficult for teachers to guide students toward an appropriate one. Students, in turn, find it difficult to choose educational resources that suit their own circumstances. A resource recommender system can be employed in an educational environment to recommend resources to students, so that the recommendations match each student's preferences and needs. This paper presents a resource recommender system that combines MLP, BiLSTM, and LSTM deep learning networks improved with the attention method. Compared with similar studies that use DBN networks and focus only on users' recent interests and preferences, the proposed system provides higher accuracy and more appropriate recommendations by considering the user's current interests in addition to long-term past interests. With an accuracy of 0.96 and a loss of 0.0822, the proposed recommender system recommends resources to students better than the other methods.


Introduction
Many years ago, the book was the primary tool for education. Today, however, the development of computers and the World Wide Web and the growth of heterogeneous information have created a need for systems that generate the most meaningful recommendations and thereby simplify selection and activity processes. Recommender systems have evolved over the past two decades (Adomavicius and Tuzhilin 2005; Cremonesi et al. 2011). As the Internet has become a comprehensive medium and e-learning has grown rapidly, users' expectations of these systems have links student achievements across different courses, ERS which take into account physical distance between students and use of ERS to motivate students to work continuously (Muthukumar and Bhalaji 2020; Zhang et al. 2018). Many people rely on recommenders when analyzing facts during decision-making (Prem and Vikas 2010). The development and improvement of existing systems is an active research topic worldwide. These developments build on the continuous evolution of statistical methods, machine learning, artificial intelligence, data mining, and information retrieval (Prem and Vikas 2010; Manouselis et al. 2012). This paper aims to design an Educational Recommender System (ERS) that recommends relevant resources to users based on their interests and on the features in the dataset. Our recommender system uses individual features such as educational background and age, together with the individual's interests as reflected in previously clicked and downloaded resources and the score the user gave to each resource. Such a system must therefore be trained before it can recommend new resources to users.
Deep neural networks are very efficient for this task when there is an inherent structure that the model can exploit. For instance, CNNs and RNNs have long exploited the internal structure of machine vision and natural language data. Because recommending textbooks depends on time and on a long-term review of student performance, the sequential structure of sessions or click logs is well suited to the inductive biases of convolutional or recurrent models. In many of these methods, all of a user's interests receive the same weight during learning, and only the user's past information is used. In the present article, by contrast, a network that looks both backward and forward can also capture changes in learner behavior and offer more up-to-date recommendations.
The remainder of the paper is organized as follows. First, the literature on the basic methods on which ERS operate is reviewed. Then, examples of different ERS classifications are given according to their specific characteristics and basic methods. The next section presents the proposed method, which is a combination of deep learning network architectures (MLP, LSTM, and BiLSTM). The fourth section examines the results of implementing the proposed algorithm in terms of accuracy and efficiency. Finally, the fifth section offers suggestions for future work.

Recommender Systems: A Review
In recommender systems, we encounter a set of users, a set of options, and the users' transactions in the system. Options are entities that are recommended to users based on their content and on the users' transactions with the system. These recommendations relate to various decision-making processes, such as buying an item, listening to music, or reading online news. A recommender system usually focuses on a specific type of entity, for example articles or news items, so that it can provide valuable and practical recommendations for that particular type. In recent years, research on recommender systems has increased compared to other information systems and methods (such as datasets with search engines) (Covington, Adams, and Sargin 2016). Many recommender systems are in use today, based on different approaches and methods. The methods are shown in Figure 1.

Deep Learning-based Methods
Deep learning has revolutionized the structure of recommenders. It has attracted the attention of many researchers by overcoming many of the barriers faced by traditional models and by generating high-quality recommendations. Deep learning can capture non-linear user-item relationships and learn abstract representations of data at higher layers. Moreover, it can extract complex relationships within conceptual, textual, and visual data (Zhang, Yao, and Sun 2017). An attractive feature of neural networks and deep learning is that they are end-to-end differentiable and provide suitable inductive biases for the type of input data. As such, deep neural networks can combine several neural building blocks into a single differentiable function and be trained end-to-end. This is a key advantage for content-based recommendation systems, because multi-modal data is very common in user-item modeling on the web. For instance, when working with textual data such as reviews (Zheng, Noroozi, and Yu 2017) or tweets (Gong and Zhang 2016), or with image data (social posts, product images), CNNs and RNNs become the main building blocks. Here, less attention has been paid to the traditional solutions such as modality-specific features, and as a result, the recommender system cannot take advantage of video learning (Zhang et al. 2018).

Educational Recommender Systems (ERS)
Educational recommender systems are increasingly used as tools to assist students and teachers in the learning process (Muthukumar and Bhalaji 2020; Zhang et al. 2018). E-learning is one of the fields whose use has become indispensable for improving the quality of education. E-learning is a form of education delivered through various electronic tools (Internet, intranet, extranet, satellite networks, audio and video tapes, CDs). It can be controlled in different ways (self-directed or controlled by the educator), and it is free of geographical and time restrictions (synchronous or asynchronous learning). Other terms used to describe this method of learning and teaching include online learning, virtual learning, distributed education, and web-based learning (Mubarak, Cao, and Ahmed 2021; Tejeda-Lorente et al. 2015).

The Most Important Challenges in ERS
Despite the many improvements that recommendation systems have made, they still encounter challenges, which can be summarized as follows: automatic information retrieval, the cold-start problem (new users or items), highly specialized content, lack of diversity in recommendations, data sparsity, fraud, and the critical mass problem (Zhang et al. 2018).
In the automatic information retrieval problem, the main issue is that today's algorithms have limited ability to analyze the content of recommended items automatically. Items with associated textual content (such as books and web pages) are usually easy to describe using different approaches to information retrieval from text. The most developed algorithms are tailored to analyzing textual content (Santos and Boticario 2011). They take keywords and phrases found in the text and compare them with the search parameters. The higher the correlation between these data, the greater the likelihood that a particular text will be recommended to the designated user (Adomavicius and Tuzhilin 2005).
The cold-start (new user/item) problem appears when an ERS encounters, for the first time, a user or an item that could be recommended. In such cases, the system does not have enough information about the user or the item to prepare a meaningful recommendation (Al Mamunur, Karypis, and Riedl 2008). Consequently, the system depends on initial parameters about the user or the items that are entered manually by the user or the system administrator. Implicitly gathered information, which does not require the user's cooperation, gives the system more accurate information about the user's interests, how the user uses the system, what content is recommended, and so on (Reddy 2016). However, implicit data collection requires the user to use the system for a certain period of time. In open education environments there is a danger that, due to the lack of information about new items, the system will not treat them as new. In these cases, the ERS relies only on the available information about items, which in some cases depends on the other users of the system (through ratings, etc.) (Adomavicius and Tuzhilin 2005; Al Mamunur, Karypis, and Riedl 2008).
The content overspecialization and non-diversity problem is pronounced when the ERS only recommends items that score highly against the user's profile. In these cases, there is a risk that the user will only be recommended very similar items. In ERS, this issue is more pronounced in open educational environments, which usually determine recommendations by matching the user's profile against the recommended items (Sunil and Saini 2013). In formal education environments, teachers can adjust the system to ensure that a variety of items is recommended (in accordance with the objectives of the course). In open education environments, on the other hand, the most common approach to this problem is to introduce a random selection of recommended content, while ensuring that this content is suitably correlated with the content the user is interested in (Adomavicius and Tuzhilin 2005; Cremonesi et al. 2011).
The Sparsity and Gray Sheep problem usually appears when ERS depends on the ratings of items by the system's users or when the recommendation is done based on grouping and comparing similar users. Suppose some items that the system can recommend have been evaluated by a few users of these items, regardless of their quality. In that case, they will not be widely recommended to other users. In addition to the items' content, the problem of sparsity could appear among system users. The user who does not fit well in any of the groups will not get good recommendations.
In formal educational environments, these problems can be solved through interventions done by teachers. However, in open education environments, there is a risk that they will remain unresolved (Adomavicius and Tuzhilin 2005).
The fraud problem in ERS is related to the data entered by the user. These data could be basic data about the user's profile or data collected through tests used to monitor the user's progress through the course. Fraud is not a meaningful concern in open education environments, but in formal education environments, where achievement in an assignment may affect the user's overall success, fraud is possible. This can happen when the user is not monitored while using the ERS.

Examples of Educational Recommender Systems
Today, many different ERSs are in use. Their objective is to facilitate the modernization of the educational process in both formal and open educational environments. These systems usually combine design and behavior, methods, and strategies to create recommendations. ERS can accordingly be divided into systems that recommend teaching methods, learning objects, teamwork for joint activities, or different teaching methods through learning cases based on the user's unique preferences, and systems that help create a personal learning path (PLP) (Zhang et al. 2018). Likewise, ERS that recommend teaching methods can be divided into those used in formal learning environments and those freely available on the World Wide Web.
Given the widespread use of Web 2.0 tools for e-learning, most ERSs recommend a combination of these methods. In addition, some ERSs help teachers perform part of student supervision (Tejeda-Lorente et al. 2015) or find ways to recommend learning topics (Huang, Zeng, and Chen 2007; Mubarak, Cao, and Ahmed 2021). Aher and Lobo (2012) proposed an ERS that recommends available courses to students, based on the best combination of available lessons and each user's unique interests. Imran et al. (2016) developed the Personalized Learning Object Recommender System (PLORS), an ERS within an LMS that uses various learning objects to personalize the formal educational process, based on monitoring the activities of previous students and comparing them with other students and their activities. El-Bishouty et al. (2014) proposed an ERS model that helps teachers adapt e-learning content to their students' different learning styles.
Moreover, the E-learning Activities Recommender System (ELARS) (Hoic-Bozic, Holenko Dlab, and Mornar 2016) uses visual, auditory, read/write, and kinesthetic (VARK) descriptions (Fleming 1995) and learning styles as an essential element of the user profile. Marian, Popescu, and Costel (2015) suggest using ERS to help students find groups that can help them solve a particular problem in learning the content of a specific course. In several different systems, the use of ERS to communicate with students anonymously has been proposed.
In addition, one of the objectives of ELARS is to make recommendations when a group is formed to work on a specific problem or project. When this feature is added to an ERS, students are usually free to decide independently whether to accept the recommendations and communicate with the suggested colleagues or to ignore them (Zhang et al. 2018). Determining a personal learning path is one of the objectives of a number of ERSs. These systems employ different input parameters to define a unique path of educational content for each user. Chin Ming et al. (Chin Ming, Chih Ming, and Mei Hui 2005) arranged the syllabus so that the system uses the student's incorrect answers to devise further learning paths, allowing the user to obtain sufficient knowledge of the course content.
Latha and Kirubakaran (2013) developed an ERS model whose algorithm employs graph theory and knowledge of different learning styles to recommend different PLPs for each user. In (Chin Ming et al. 2007), each user's basic knowledge level is compared with the complexity of individual learning objects, and the ERS recommends different learning paths according to the result of this comparison. In (Onah and Sinclair 2015), the PLP is built by comparing the user profile with the learning goal defined by the user. The ERS monitors the user's progress and changes the learning path to ensure that all the knowledge required to succeed in further learning is acquired.
To improve the performance of the algorithms used in ERS, several artificial intelligence methods (fuzzy sets, artificial neural networks, evolutionary strategies) or combinations of them are used (Zhang et al. 2018). Tejeda-Lorente et al. (2015) and Jamsandekar and Mudholkar (2013) use fuzzy inference methods to process data on student success in order to better monitor students' progress through course content. Artificial neural networks are used to develop algorithms capable of self-learning from data in a given domain (Negnevitsky 2005). In ERS, they are used to model complex relationships between user profiles and their interests (De Gemmis et al. 2009), as well as to model the relationship between recommended objects and other parameters in the ERS in order to determine recommendations that are unique to each user (Adomavicius and Tuzhilin 2005; Jamsandekar and Mudholkar 2013). Combining these methods can obtain better overall results in the same environment than using only one of them (Zhang et al. 2018). Artificial intelligence methods based on evolutionary computation include genetic algorithms, evolutionary strategies, and genetic programming (Negnevitsky 2005). Genetic algorithms and various evolutionary strategies are commonly used in ERS (Zhang et al. 2018). Sengupta, Sahu, and Dasgupta (2011) used the Ant Colony Optimization approach (an evolutionary algorithm) to identify effective and optimal learning paths for system users. This system is used to access information about unfamiliar terms that the user encounters during the learning process. Chin Ming et al. (2007) used a genetic algorithm to create a personal learning path for the user, while Cayzer and Aickelin (2002) used a model of the biological immune system to obtain a set of possible recommendations.

Proposed Method
In the first phase, we obtain a dataset of users that includes their interest in the resources under study, the extent to which they use and click on these resources, and related features. The useful items among them are then selected. In the second phase, we train our recommender system with deep neural networks to an acceptable accuracy, and finally we recommend resources to users using the trained network. The recommender algorithm comprises data extraction from the OULAD source files, data preprocessing, construction of a deep learning network of MLP, LSTM, and BiLSTM networks improved by the attention method, initialization of parameters, training, and finally score prediction. In the proposed architecture, illustrated in Figure 2, each layer contains one BiLSTM cell and one LSTM cell for each feature in the dataset, so that each cell focuses on one feature of each record and considers only the pattern of that feature. Combining these patterns leads to a better result.

Predicting Resource Scores
During model training, data labeled with classes is used as the training set. According to the user-lesson attribute vector, the recommendation problem then becomes a category prediction problem. In this paper, using the rating class labels, the loss is propagated to each layer from top to bottom to fine-tune the parameters in a supervised manner. After the model has been trained to a certain loss, a test set can be used to test the performance of the recommender model. The data in the test set consists of two parts: the user-lesson attribute vector and the lesson assessment. Each user-lesson attribute vector corresponds to a category level, and each level corresponds to a score. All lessons that correspond to a user are sorted by their expected scores, and the lesson recommendations are then generated.
Because of its bidirectional nature, the BiLSTM structure in the first layer captures both short-term and long-term interests. In this architecture, the LSTM and BiLSTM layers are used together to extract the general patterns in the whole dataset. The output of these two layers is then sent to the attention layer. We use the sequence-weighted (SeqWeighted) form of the attention technique to reduce useless features and the side effects of noisy data. The classification step of the proposed architecture begins with a Dense layer; since the inputs of such layers are vectors, a Flatten layer converts the output of the preceding layers into vectors. As shown in Figure 2, dropout is applied before the last layer, which is the softmax output that determines the probability of belonging to each class. Dropout provides an implicit ensemble effect; it also prevents overfitting and increases the generalization capability of the model.
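The ranking step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes that the "expected score" of a lesson is the probability-weighted average of the scores attached to each category level, and the names `expected_score`, `recommend`, `level_scores`, and the sample predictions are all hypothetical.

```python
def expected_score(class_probs, level_scores):
    """Probability-weighted average of the scores attached to each level.

    Assumption: the model outputs one probability per category level.
    """
    return sum(p * s for p, s in zip(class_probs, level_scores))


def recommend(predictions, level_scores, top_n=3):
    """Sort a user's lessons by expected score and keep the best top_n."""
    scored = [
        (lesson, expected_score(probs, level_scores))
        for lesson, probs in predictions.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


# Hypothetical example: five category levels mapped to scores 1..5.
level_scores = [1, 2, 3, 4, 5]
predictions = {
    "lesson_a": [0.1, 0.1, 0.2, 0.3, 0.3],
    "lesson_b": [0.6, 0.2, 0.1, 0.1, 0.0],
    "lesson_c": [0.0, 0.1, 0.2, 0.2, 0.5],
}
ranked = recommend(predictions, level_scores, top_n=2)
ranked_lessons = [lesson for lesson, _ in ranked]
```

If the system instead assigns each lesson the score of its single most probable level, `expected_score` would be replaced by an argmax lookup; the sorting step stays the same.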
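To make the role of the attention layer concrete, the following sketch shows one common form of sequence-weighted attention: each time step of the recurrent output receives a scalar relevance score, the scores are normalized with a softmax over time, and the steps are averaged with those weights. It is written in plain Python for readability; the function names and the toy inputs are hypothetical, and the exact layer used in the paper may differ in detail.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of numbers."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def seq_weighted_attention(sequence, weights, bias=0.0):
    """Pool a sequence of feature vectors into one vector.

    sequence: list of T vectors (e.g. recurrent outputs per time step).
    weights:  one scoring weight per feature (learned in a real network).
    Returns the pooled vector and the attention weight of each step.
    """
    logits = [
        sum(w * h for w, h in zip(weights, step)) + bias for step in sequence
    ]
    alphas = softmax(logits)
    dim = len(sequence[0])
    pooled = [
        sum(a * step[d] for a, step in zip(alphas, sequence))
        for d in range(dim)
    ]
    return pooled, alphas


# Toy input: three time steps with two features each (hypothetical values).
sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = [2.0, 0.0]  # only the first feature influences the relevance score
pooled, alphas = seq_weighted_attention(sequence, weights)
```

Steps whose score is low (here the second one) contribute little to the pooled vector, which is how the layer suppresses uninformative or noisy inputs.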

Implementation
The dataset contains about 11 million records; after eliminating records with missing fields, 10,543,682 records with 12 features remain. These records were gathered from the activities of 23,326 different students. The implementation is carried out in two parts. The first part contains the hybrid architecture (see Figure 2) that embodies the idea of this paper. The second part implements several traditional methods as well as the DBN-based deep learning approach of Zhang et al. (2018), which recommended scientific resources. We used three train-test divisions (70%-30%, 80%-20%, and 90%-10%) to train and test the proposed model, and for each division three validation split values (0.1, 0.2, and 0.3) (Table 7). After the epochs were completed, the graphs and performance results show that the accuracy and loss of our work are far better than those obtained by implementing the network proposed by Zhang et al. (2018).
Students generate a variety of behavioral data by learning in an online learning environment. This behavioral data is gathered and stored through data collection methods (Kuzilek, Hlosta, and Zdrahal 2017). The resources dataset provides the data sources for this platform and can be used to extract content features that reflect students' interest in resources. The feature vectors of students are generated by combining student features and lesson features. Afterward, the combined vectors of behavioral features and user-lesson features are created. The dataset covers three kinds of entities: students, teachers, and lessons. It includes information about 22 courses, 32,593 students, their assessment results, and their interaction with the Virtual Learning Environment (VLE), which is summarized as students' daily clicks on various resources (10,655,280 entries).
The dataset was anonymized using the ARX data anonymization tool [PK15]. The data was checked for losses and then confirmed and published by the Open Data Institute.
OULAD is a sample subset of the collected student data, which contains student demographics, student performance in the course assessments, and student behavior in the VLE. This resource provides a unique dataset of student performance and offers an opportunity to create new generations of learning management systems. In the first stage, courses (called modules) with a history of at least two successive presentations were chosen. This course covers the topic of learning and the set of sessions completed with the test.
Module-presentation: represents the academic year in which the courses are taught. After that, the data is converted and de-identified using a data anonymizer tool. Figure 3 depicts the overall structure of the dataset, which is organized around students and then courses. The main table, StudentInfo, contains the student records that are linked to the courses (a student can have more than one registered course). Each course has several assessments, which are linked to the student through the student assessment table containing the history of the student's assessment results.

Dataset Plan
• courses.csv: This file contains a list of all available and presented courses.
• assessments.csv: This file contains information about the assessments of the presented courses. Usually, each course has a number of assessments followed by a final exam. There are three types of assessment: Tutor Marked Assessment (TMA), Computer Marked Assessment (CMA), and Final Exam (Exam).

Data Pre-processing
Figure 4 shows the pre-processing steps. The input consists of four sections: provided resources, student features, held courses, and student performance and assessment history in each course. These four sections are combined to perform further analysis, categorization, feature mapping (Table 1), removal of blank or erroneous data, and feature normalization. The features are normalized to the range [0, 1] before any further processing. Eq. (1) is used to perform the normalization:
X* = (x - Xmin) / (Xmax - Xmin)    (1)
where Xmin denotes the lowest feature value, Xmin = min{X1, X2, . . ., Xn}; Xmax denotes the highest feature value, Xmax = max{X1, X2, . . ., Xn}; X* is the normalized value; and x is the original value. After pre-processing, the data is divided into a training set and a test set.
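The min-max normalization of Eq. (1) can be sketched for a single feature column as follows. The function name and the sample click counts are hypothetical, and mapping a constant column to zeros is an assumption made here to avoid division by zero; the paper does not state how that case is handled.

```python
def min_max_normalize(values):
    """Scale numbers to [0, 1] using X* = (x - Xmin) / (Xmax - Xmin)."""
    x_min = min(values)
    x_max = max(values)
    span = x_max - x_min
    if span == 0:
        # Assumption: a constant feature carries no information, map it to 0.
        return [0.0 for _ in values]
    return [(x - x_min) / span for x in values]


# Hypothetical raw click counts for one feature.
clicks = [3, 10, 7, 24]
normalized = min_max_normalize(clicks)
```

In practice the minimum and maximum would be computed on the training portion only and then reused for the validation and test portions, so that no information leaks from held-out data.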

Records Labeling
The dataset is labeled after the initial pre-processing stages as follows. For each student, among the activities registered for the student's courses, we take the course in which the student obtained the highest grade (which may reflect the influence of the resources studied in that course on the student's success) and choose as the label the resource with the most clicks in that course (which may indicate the student's taste and interest). In this way, 562 labels were generated and mapped to the integers 0 to 561. The frequency of the labels is illustrated in Figure 5.
Finally, after the pre-processing step, the data is divided into a training set, a validation set, and a test set. The training set is fed to the proposed network as input.
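The labeling rule above can be illustrated with a small sketch. It assumes two simple lookup tables (grades per student and course, clicks per student, course, and resource); these structures, the function name `build_labels`, and the sample values are hypothetical and do not reflect the actual OULAD file layout. Ties are resolved in favor of the first entry encountered, which is also an assumption.

```python
def build_labels(grades, clicks):
    """Return ({student: label_id}, {resource: label_id}).

    grades: {(student, course): grade}
    clicks: {(student, course, resource): number_of_clicks}
    """
    # Step 1: for every student, find the course with the highest grade.
    best_course = {}
    for (student, course), grade in grades.items():
        current = best_course.get(student)
        if current is None or grade > grades[(student, current)]:
            best_course[student] = course

    # Step 2: within that course, find the most-clicked resource.
    top_resource = {}
    for (student, course, resource), n_clicks in clicks.items():
        if best_course.get(student) != course:
            continue
        current = top_resource.get(student)
        if current is None or n_clicks > current[1]:
            top_resource[student] = (resource, n_clicks)

    # Step 3: map each distinct resource to an integer label 0, 1, 2, ...
    resource_ids = {}
    labels = {}
    for student in sorted(top_resource):
        resource = top_resource[student][0]
        labels[student] = resource_ids.setdefault(resource, len(resource_ids))
    return labels, resource_ids


# Hypothetical sample data.
grades = {("s1", "AAA"): 78, ("s1", "BBB"): 91, ("s2", "AAA"): 65}
clicks = {
    ("s1", "AAA", "forum"): 40,
    ("s1", "BBB", "quiz"): 12,
    ("s1", "BBB", "oucontent"): 30,
    ("s2", "AAA", "forum"): 5,
    ("s2", "AAA", "quiz"): 9,
}
labels, resource_ids = build_labels(grades, clicks)
```

Note that student s1 clicked most often on "forum", but in a lower-graded course, so the label comes from the best-graded course instead.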
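As a rough illustration of how the three-way division interacts with the splits reported in the Implementation section (70%-30%, 80%-20%, 90%-10%, each with validation splits of 0.1, 0.2, and 0.3), the sketch below computes the resulting set sizes. It assumes that the split is made at record level and that the validation share is taken from the training portion; neither detail is stated in the text, and the function name is hypothetical.

```python
def split_sizes(n_records, test_percent, validation_percent):
    """Return (n_train, n_validation, n_test) for integer percentages.

    Assumption: the validation set is carved out of the non-test portion.
    """
    n_test = n_records * test_percent // 100
    n_fit = n_records - n_test
    n_validation = n_fit * validation_percent // 100
    n_train = n_fit - n_validation
    return n_train, n_validation, n_test


N_RECORDS = 10_543_682  # record count reported after cleaning

# All nine configurations: three test shares times three validation shares.
grid = {
    (test_pct, val_pct): split_sizes(N_RECORDS, test_pct, val_pct)
    for test_pct in (30, 20, 10)
    for val_pct in (10, 20, 30)
}
```

Integer percentages are used instead of floating-point fractions so that the three sizes always add up exactly to the number of records.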

Network Construction
The research was conducted on a Google Colab server. Google Colab is a cloud service provided by Google that supports Python programming and makes it possible to install and work with Python packages and deep learning frameworks such as TensorFlow, Keras, and PyTorch. The service provides users with a free GPU, which greatly increases its usefulness. Our instance was equipped with an Nvidia Tesla P100 and 25.51 GB of RAM. The LSTM and Bidirectional layers of the Keras library were used to build the proposed network. The input data first enters the Bidirectional layer with 1536 neurons. The output of this layer enters the LSTM layer with 512 neurons, whose output in turn enters the attention layer. In this implementation, the SeqWeightedAttention layer available for Keras was used to implement the attention technique.
Initialization of network parameters: During model training, the model parameters must be adjusted repeatedly to achieve better results in feature extraction. The mini-batch method is used during learning to cope with the large data volume. The parameters are set as follows: batch size: 2048; learning rate: 0.0001; epochs: 120; activation function: softmax. The higher the number of network parameters, the higher the computational load in the training phase. In the training phase of the proposed network, a validation accuracy of 0.95 is first observed at epoch 74, and the minimum loss of 0.08 is reached at epoch 115. After the training phase, we fed the test data to the network, and the final result obtained was 0.96, with a generalization difference of one percent.
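The mini-batch processing mentioned above can be sketched as follows, using the reported batch size of 2048. The helper names and the dictionary of hyperparameters are hypothetical conveniences for illustration; a deep learning framework would normally handle batching internally.

```python
# Hyperparameters as reported in the text, gathered in one place.
HYPERPARAMS = {
    "batch_size": 2048,
    "learning_rate": 0.0001,
    "epochs": 120,
    "output_activation": "softmax",
}


def iter_minibatches(n_samples, batch_size):
    """Yield (start, stop) index pairs that cover all samples once."""
    for start in range(0, n_samples, batch_size):
        yield start, min(start + batch_size, n_samples)


def steps_per_epoch(n_samples, batch_size):
    """Number of parameter updates in one pass over the data (ceiling)."""
    return -(-n_samples // batch_size)


# Hypothetical small example: 5000 samples give two full batches and a rest.
batches = list(iter_minibatches(5000, HYPERPARAMS["batch_size"]))
n_steps = steps_per_epoch(5000, HYPERPARAMS["batch_size"])
```

Each slice `data[start:stop]` is processed on its own, so only one batch needs to be held in accelerator memory at a time.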

Methods and Tools of Data Analysis
The recommender system developed in this paper aims to predict the best sequence of educational resources. Many criteria exist for measuring different aspects of recommendation performance. Two essential criteria, Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE), are used to measure the accuracy of the predicted scores (P) for each educational resource relative to the correct scores (R).
Here, RMSE is used to assess the loss of the implemented methods. Both the accuracy and the loss of our architecture are then compared with those of the architecture of Zhang et al. (2018), which has an application similar to ours (recommending scientific resources). Table 7 and Figure 6 show the accuracy and loss of our network in the training phase.
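For reference, the two criteria follow their standard definitions: MAE is the mean of |P - R| and RMSE is the square root of the mean of (P - R)^2 over all predictions. A minimal sketch, with hypothetical score values:

```python
import math


def mae(predicted, actual):
    """Mean absolute difference between predicted and correct scores."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Square root of the mean squared difference."""
    squared = sum((p - r) ** 2 for p, r in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


# Hypothetical predicted scores P and correct scores R.
P = [3.0, 4.5, 2.0, 5.0]
R = [3.0, 4.0, 3.0, 4.0]
mae_value = mae(P, R)
rmse_value = rmse(P, R)
```

RMSE is never smaller than MAE on the same data and penalizes large individual errors more heavily, which is why the two are usually reported together.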

Investigate the Effect of the Number of Cells in Each Layer
As Table 3 shows, single-cell structures were implemented and investigated in three architectures (one, two, and three layers), and the results were not desirable. A single LSTM cell could not adequately learn the patterns of the different features, the relationship of each feature to itself, or its relationships to other features in combinations, permutations, and other forms. As shown in Table 4, single-layer multi-cell architectures were then implemented in two variants, single-layer LSTM and single-layer BiLSTM, and produced more favorable results.

Investigating the Effect of Using the Attention Mechanism
As mentioned before, the attention technique can filter out useless features from raw inputs and reduce the side effects of noisy data. Applying the attention technique to recommender systems makes it possible to eliminate useless content and select the most representative items while maintaining interpretability. We added the attention technique to the two single-layer architectures, LSTM and BiLSTM. As Table 5 shows, using the attention technique in the network architecture had a positive effect on the results.

Investigating the Effect of the Number of Layers on Neural Network Structure
In a deep learning network architecture, the first problem is the relationship of the LSTM cells to one another, which was investigated in detail above. The second problem is the relationship between the different layers of the implemented architecture; this part of the architecture provides the general understanding of the features in the dataset. To examine the effectiveness of the relationship between layers, we implemented and trained two two-layer architectures, a two-layer LSTM and a two-layer GRU. As shown in Table 6, the results of the two-layer architectures are more favorable than those of the single-layer architectures.
Because the BiLSTM deep neural network considers both the long-term and short-term interests of the user, and because of its gradual learning nature, it accommodates changes in learners' behavior. Hence, we implemented our proposed two-layer architecture as a combination of the LSTM and BiLSTM networks improved with the attention technique. As Table 7 shows, the results of the proposed architecture are very acceptable. According to the loss curve in Figure 6, the chosen number of epochs is appropriate. Moreover, the accuracy plot shows that the training and validation accuracy curves are almost the same, so overfitting does not occur in this experiment.

Investigating the Effect of Unconventional Data on Model Accuracy
As shown in Figure 5, we are faced with an unbalanced data set. In such cases, in addition to the model's accuracy, it is better to use other evaluation metrics such as recall and F1 score, to ensure that the model has not simply assigned all the data to a frequent class during training in order to achieve high accuracy. At the end of the test phase, the average value of all three metrics (precision, recall, and F1 score) is 96%. We examined the classification accuracy of each group separately and found that the data of all classes were categorized very well. For the evaluation, we trained and tested our model with three different data splits. As an example, Table 8 lists, from the test results, the ten groups with the lowest frequency and the ten groups with the highest frequency.
As can be seen, in all three splits (30-70, 20-80, and 10-90), most recall and F1 scores are equal to 1 in the groups with support = 1. On the other hand, in the groups with the most members, recall and F1 scores are rarely exactly 1. This shows that our model has met the unbalanced data challenge in addition to the data volume challenge.
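The reason accuracy alone is insufficient on unbalanced data can be illustrated with a short sketch that computes per-class precision, recall, F1 score, and support, together with their macro averages. The labels and predictions below are hypothetical toy data, not OULAD results, and the helper names per_class_report, accuracy, and macro_average are introduced here only for illustration.

```python
def per_class_report(y_true, y_pred):
    """Return precision, recall, F1 and support for every class label."""
    labels = sorted(set(y_true) | set(y_pred))
    report = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": tp + fn,  # number of true samples of this class
        }
    return report

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_average(report, key):
    """Unweighted mean of one metric over all classes."""
    return sum(row[key] for row in report.values()) / len(report)

# Toy unbalanced data: 18 samples of class "A", one each of "B" and "C".
y_true = ["A"] * 18 + ["B", "C"]
# A degenerate model that always predicts the most frequent class.
majority_pred = ["A"] * 20

report = per_class_report(y_true, majority_pred)
acc = accuracy(y_true, majority_pred)
macro_recall = macro_average(report, "recall")
```

Here the degenerate model reaches an accuracy of 0.9 although its recall is 0 for both classes with support = 1, so its macro-averaged recall is only about 0.33. Reporting recall and F1 score per class, as in Table 8, exposes this failure mode, which accuracy alone would hide.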

Comparison of the Performance of the Proposed Model with Other Models
We compared the results of the proposed model, given in the first row of Table 8, with other methods presented in related work or implemented by ourselves. As can be seen, the proposed model achieves better results than the other implemented methods across the different evaluation metrics. All evaluations were performed on the shared OULAD data.
The method proposed in (Zhang et al. 2018) was implemented, trained, tested, and evaluated with the OULAD data. As shown in Table 9, it performed worse than our proposed model in terms of both error and accuracy. Five methods were implemented and studied in (Lic. María Emilia Charnelli et al. 2019). The results show that the SVD algorithm, with an error of 0.839, is the best of those schemes, yet its error is still several times higher than that of our proposed model.
In (Hui Chen et al. 2020), three criteria (Recall, Prec, and F1) are examined for three methods: itemCF, Clustering + itemCF, and AROLS. The results show that the proposed algorithm (AROLS) has a better Prec than the other two methods, while F1 and Recall remain relatively steady across top-n recommendations.
The work (Rumei Li et al. 2019) shows that AROLS performs much better than traditional collaborative filtering, especially User-AROLS, whose recall and accuracy have more than tripled. Also, the recall of UserCF is much lower than that of ItemCF, probably because UserCF focuses more on the interests of learners who are similar to a particular learner. At the same time, the ItemCF recommendation is more personal because it primarily suggests similar items based on the learner's own interests. As can be seen in the first row, the proposed model performed better than all seven reviewed methods.
In (Lingyao Yan et al. 2021), the results show that OLS features can make the recommendation algorithm more accurate and robust, but as seen in the first row, the proposed model performed better than both studied methods.

Conclusion and Future Works
Today's ERSs do not usually include mechanisms designed to help teachers reduce their workload. Such systems focus more on the needs of students. In the work proposed by (Bhojak, Jain, and Muralidharan 2012), some internal algorithms are designed to assist teachers. Overall, the data collected by an ERS can be employed to help teachers (Zhang et al. 2018). Because of this diversity, systems developed for one environment may not be usable efficiently in a different learning environment without a significant change in how they work. Currently, the employed systems are specialized for one of these two learning environments. Indeed, one direction of future research and development is to construct ERSs that work appropriately in both environments with minimal changes. With the introduction of the Bologna process in higher education, particularly its continuous monitoring and evaluation of student work, the amount of teacher work has increased significantly (Zhang et al. 2018). Given the lack of such functionality, one area of further ERS research and development will certainly focus on teacher support, especially in formal learning environments.
Systems should take teachers' workload fully into account, especially when continuous monitoring and evaluation of student work is needed during the semester. Although some algorithms developed for an ERS and investigated in one course could be employed without any modification in another course (the algorithms do not depend on the course content), systems generally do not link student achievement across different courses. In fact, since education programs are predefined in terms of learning outcomes and general and specific qualifications, the results obtained in one course can be used to provide recommendations in another course (Zhang et al. 2018). The differences between study methods suited to different areas require the system to be flexible enough to satisfy the needs of all users. With these differences in mind, an ERS model can be designed and built to provide satisfactory services to the students and teachers who use it. These areas for the future development of ERS confirm that there are still numerous opportunities for further scientific advancement in the field (Zhang et al. 2018).