An approach for learning resource recommendation using deep matrix factorization

ABSTRACT In traditional learning, learners can meet their lecturers or tutors face-to-face, and in such lectures the lecturers or tutors can introduce printed book tutorials. However, in several circumstances, such as distance education, learners cannot interact with their teachers directly. Online learning resources then help learners gain knowledge. Because these resources are numerous and diverse, selecting appropriate ones is very important. This study presents a deep matrix factorization model, extended from standard matrix factorization, to recommend learning resources based on learners' abilities and requirements. We test the proposed model on two groups of experimental data: students' learning outcomes at a university, used for course recommendation, and a group of five datasets of user ratings on learning resources, used to provide valuable recommendations for supporting learners. The experiments show promising results compared to several baselines. The model is expected to be a good choice for large-scale datasets.


Introduction
Learning is the lifelong work of the learner. With the traditional method, learners and lecturers can meet face-to-face. However, when learners cannot interact with lecturers, learning resources such as books, textbooks, lectures, and magazines can be very useful for gaining knowledge. With the development of information technology, learning tends to shift from traditional to online settings. Nowadays, information can be searched in electronic libraries or on the Internet, and learners tend to search for their preferred learning resources on online resource management systems.
Learning resources cover all aspects of social life and help learners explore the problem they need to study from many angles. However, the large number of different systems of materials and resources makes it difficult for learners to choose appropriate learning resources. With the explosion of electronic libraries, the volume of data has increased significantly. Most learning resource systems record rating information about learning resources; for student learning outcome data, the ratings are the students' scores. Research on solutions that help learners choose the most appropriate learning resources is therefore necessary for the learning process to achieve better results.
Learning resources and materials are an essential part of education, and both learners and educators can benefit from using them. Firstly, learning resources provide digital learning environments such as open textbooks, open visual materials, open courses, and self-assessment tools. Secondly, numerous high-quality learning resources can be updated and edited frequently. Another advantage of learning resources is that they support self-study, testing, and group learning. The development of information technology may turn traditional reading into online reading. Nowadays, information can be searched or exploited from e-libraries or the Internet, and students and educators search for their preferred learning materials on online resource systems. Moreover, universities have also provided open educational resources, and they may consider an investment in open resources an investment in sustainable human development. Open learning resources increase access to high-quality education and lower the cost of education worldwide, so that people around the world can share, contribute, and access knowledge. These resources cover all aspects of social life, helping readers explore the problem they need to research. However, the variety of educational resource and material systems may make it difficult for users to select appropriate learning resources. With the explosion of e-library websites, volumes of data have been increasing. Today, most resource and material websites allow their users to write comments and/or reviews to express more precisely and in more detail how they feel about the materials they have read. This information enables us to conduct a study based on users' ratings to provide valuable suggestions.
Rating (ranking) prediction and recommendation cannot be separated. A rating prediction is needed first; the items with the best predicted ratings are then chosen for suggestion. In education, predicting learning outcomes or predicting the effective use of learning resources is a prerequisite for suggesting learning resources suitable for each learner.
This study is extended from our previous work (Dien et al., 2021) at the ICCCI (International Conference on Computational Collective Intelligence) 2021 conference. This work proposes an architecture based on deep learning and the state-of-the-art matrix factorization models to recommend learning resources using two groups of data, including (1) datasets about learning resource recommendation and (2) datasets about course recommendation based on learning outcomes of students.
The contributions beyond the previous work (Dien et al., 2021) are as follows:
• We restructure and expand the description of the standard matrix factorization and the deep matrix factorization models so that they are easier for readers to follow, and we analyze the differences between the two.
• We apply the proposed model to another domain, course recommendation, and show how to map the course recommendation problem to a recommender system problem.
• We test the proposed model on more datasets in course recommendation and compare it with other methods.
• We add more related work on both learning resource and course recommendation.
• We discuss tasks for future work.
In the rest of this work, Section 2 presents related work on book recommendation systems. Section 3 presents the proposed methods. Section 4 discusses the experimental results of the proposed methods under different parameter settings. Section 5 concludes the paper.

Related work
Numerous studies have proposed automatic book recommendation systems using various methods, including classic machine learning, deep learning techniques, and hybrid methods, on large datasets that range from a few features to hundreds of attributes.
The authors in Wang and Huang (2020) stated that existing recommendation systems skipped the characteristics of readers' personalized information. Hence, their work proposed a library recommendation system based on a restricted Boltzmann machine and collaborative filtering algorithm to improve performance and provide a good application effect. The work of Xu et al. (2020) provided a system for exploring the data concerning reading, including three categories.
The authors in Hyun-Yang (2020) introduced another interesting work, based on user preference, for libraries for the disabled and other libraries. The keywords selected from users' usage data were divided into ten subject categories and ranked by frequency of appearance. The proposed method treated books in which the keywords appear with high frequency as preferred targets for producing alternative materials. Alharthi and Inkpen (2019) trained on more than a hundred features learned from the full text of books, aiming to provide accurate lists of suggested books. The authors investigated the textual elements that are thought to play a role in generating a high-quality book.
Based on Naïve Bayes enhanced with Optimal Feature Selection, Sang Nguyen (2019) provided an efficient model-based book recommender system. This work examined several methods for data processing and for selecting features and classifiers. According to this author, Naïve Bayes can be the best choice for book recommendation, with acceptable run-time and accuracy. Jomsri (2018) introduced a book recommendation system for university libraries that explores the interests of users related by topic and faculty, providing the most suitable books according to the faculty in the user profile, the book category, and the book loan records, a technique called FUCL (Faculty, User, Category, Loan). The work showed that FUCL mining techniques could be suitably applied to book recommender software in libraries. Okon et al. (2018) used an object-oriented analysis and design methodology to enhance the collaborative filtering algorithm and combined it with the quick sort algorithm to improve the speed and scalability of the book recommendations generated by the system.
The authors in Sohail et al. (2016) explored and analyzed opinions from customers' online book reviews via ordered weighted aggregation, a well-known fuzzy averaging operator, to quantify the scores of the features. The proposed method was expected to provide new insights to better address users' expectations and find relevant books. Alharthi et al. (2017) analyzed users' interests from social media data, represented as word embeddings, to produce book recommendations. As a result, new users can get book recommendations as accurate as those for current users. Finally, Musto et al. (2016) proposed a content-based method that learns user profiles from word embeddings of textual features extracted from Wikipedia. Their work was evaluated on two state-of-the-art datasets and achieved good performance in high-sparsity recommendation scenarios.
Numerous methods and models have been proposed, including user/item collaborative filtering, content-based filtering, association rule mining, and hybrid recommender systems. Although these techniques have shown interesting characteristics, they can be inefficient in generating appropriate recommendations in particular cases. Therefore, numerous hybrid approaches have been investigated that integrate various methods to produce better recommendations (Passi et al., 2019). A hybrid method presented in Hariadi and Nurjanah (2017) uses both the attributes and the personalities of users for book recommendation. Another hybrid method was introduced by Nirwan et al. (2016), which considers customer demographic information such as sex, age, and geographical location, and book information such as title and ISBN, to predict book ratings with values ranging from 1 to 5 using Multilayer Perceptron models. NOVA, a hybrid book recommendation engine, was proposed by Pathak et al. (2013). It was based on a unique hybrid recommendation algorithm and was expected to give customers the best and most efficient book recommendations.
In Vaz et al. (2012), the authors integrated two item-based collaborative filtering algorithms to predict books and authors that the user may like and produced a booklist that then provides the top-n book recommendations. Ali et al. (2016) explored table of contents and association rule mining in books and opinions of similar users to propose a hybrid book recommender system. They observed the results and concluded that integrating content-based filtering and collaborative filtering techniques and association rule mining can obtain the best recommendations. Finally, Chandak et al. (2015) summarized hybrid techniques for optimizing book recommender systems on collaborative filtering, content-based, and demographic techniques to utilize the powers of each technique in a hybrid manner.
The work of Tarus et al. (2018) introduced a hybrid recommendation approach that integrates context awareness, sequential pattern mining, and collaborative filtering algorithms to produce learning resource recommendations for learners. W. Chen et al. (2014) presented a two-step algorithm: the first step discovers content-related item sets using item-based collaborative filtering, and the second applies the item sets to a sequential pattern mining algorithm to filter items according to common learning sequences. Aissaoui and Oughdir (2020) proposed a model learned from the learner profile and the learning content using two ontologies.
The work in Arabi et al. (2020) considered that contextual information, including location and emotion, could significantly enhance product or service recommendations when included in the recommender system. The authors determined several user characteristics and product features. In some studies, students' scores or courses are used as learning resources for deep learning techniques. Dien, Luu, et al. (2020) leveraged techniques such as Quantile Transformation and MinMax Scaler to perform the prediction tasks. Another study by Dien, Hoai-Sang, et al. (2020) proposed using a Multilayer Perceptron and pre-processing methods on four million mark records of a university to give appropriate recommendations on course selection.
Moreover, there are several works in learning resource recommendation using different approaches. H. Chen et al. (2020) introduced a learning style model to represent features of online learners. They present an enhanced recommendation method named Adaptive Recommendation based on Online Learning Style, which implements learning resource adaptation by mining learners' behavioural data. The authors used Collaborative Filtering (CF) and association rule mining to extract the preferences and behavioural patterns of each cluster. Liu et al. (2020) proposed an online learning resource recommendation method based on Wide and Deep and Elmo model. This method explores the deep features of learner characteristics and course content features under the condition of high-dimensional data sparseness.
In other works, knowledge domain is also mentioned on the recommendation system. Do et al. (2020) proposed a method for intelligent searching on ontology-based knowledge domain in e-learning. The intelligent searching system based on a knowledge base has been applied to construct a search engine for the knowledge of high-school mathematics. This engine can do searching works and retrieve required information in mathematics for high-school students.
Besides, Nguyen et al. (2021) proposed a model that uses ontology technology for relational knowledge. The method is applied to build an intelligent chatbot for answering questions on the Introduction to Programming course contents in a university. It is helpful for self-learning to enhance learners' skills.
Building on the previous works mentioned above, in this study we propose an approach based on deep learning and state-of-the-art matrix factorization models to provide sound recommendations on learning resources (e.g. books, journals, courses, etc.). We also compare the proposed model with other well-known recommender system methods to assess its performance.

Proposed method
This work focuses on two groups of recommendations: learning resources based on user feedback (such as rating) and course recommendations based on student performance.

Problem formulation
Learning resource recommendation
Let u denote the user/learner, i the learning resource (e.g. book, journal, paper, etc.), and r the feedback (rating) from user u on learning resource i. In general, learning resource recommendation can be mapped to the rating prediction problem in recommender systems as follows:
Learner, reader, or student ↦ User
Learning resource (book, journal, paper, etc.) ↦ Item
Feedback (rating, number of views or clicks, etc.) ↦ Rating
• Prediction phase: Given a dataset D with available (u, i, r) as presented in Figure 1(a), we would like to build a model that predicts the rating (score) of the learning resources that have not been seen/read by the user (the empty values in this matrix).
• Recommendation phase: After obtaining the prediction results, we sort the rating scores in descending order and select the top N learning resources with the highest scores for recommendation (N could be 3, 5, or another value depending on the system interface).
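The two phases above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `recommend_top_n`, the toy data, and the fixed score table standing in for a trained model are all assumptions made for the example and are not part of the original work.

```python
from typing import Callable, Dict, Hashable, List, Set, Tuple


def recommend_top_n(
    user: Hashable,
    all_items: List[Hashable],
    seen: Dict[Hashable, Set[Hashable]],
    predict: Callable[[Hashable, Hashable], float],
    n: int = 5,
) -> List[Tuple[Hashable, float]]:
    """Return the n unseen items with the highest predicted rating for `user`."""
    already_rated = seen.get(user, set())
    # Prediction phase: score only the empty cells of the user's row.
    scored = [(item, predict(user, item))
              for item in all_items if item not in already_rated]
    # Recommendation phase: sort by predicted score, descending, keep the top n.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Toy (user, item, rating) triples standing in for the dataset D.
ratings = [("u1", "book_a", 5.0), ("u1", "book_b", 3.0), ("u2", "book_a", 4.0)]
items = ["book_a", "book_b", "book_c", "book_d"]

seen_by_user: Dict[Hashable, Set[Hashable]] = {}
for u, i, _ in ratings:
    seen_by_user.setdefault(u, set()).add(i)

# Hypothetical stand-in for a trained model: a fixed table of predicted scores.
toy_scores = {
    ("u1", "book_c"): 4.2, ("u1", "book_d"): 3.1,
    ("u2", "book_b"): 2.5, ("u2", "book_c"): 4.8, ("u2", "book_d"): 3.9,
}


def toy_predict(user: Hashable, item: Hashable) -> float:
    return toy_scores.get((user, item), 0.0)


top_for_u2 = recommend_top_n("u2", items, seen_by_user, toy_predict, n=2)
print(top_for_u2)
```

The same function applies unchanged to course recommendation when users are students, items are courses, and the predicted value is a course score.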

Course recommendation based on student performance
Let u denote the student/learner, i the course studied by the learner, and r the score/mark obtained by student u on course i. In general, course recommendation can be mapped to the rating prediction problem in recommender systems as follows:
Learner or student ↦ User
Course ↦ Item
Score/Mark ↦ Rating
• Prediction phase: Given a dataset D with available (u, i, r) as presented in Figure 1(b), we would like to build a model that predicts the student performance (score) on the courses that the student has not yet taken.
• Recommendation phase: After obtaining the prediction results, we sort the predicted scores in descending order and select the top N courses with the highest scores for recommendation (N could depend on the student's learning plan).

Matrix factorization in deep learning environment
Matrix Factorization has been widely used in recommender systems (Khanal et al., 2020; Koren et al., 2009). In this approach, a matrix X is decomposed into two sub-matrices W and H such that $X \approx WH^{T}$, as demonstrated in Figure 2.
Here, $W \in \mathbb{R}^{|U| \times K}$, $H \in \mathbb{R}^{|I| \times K}$, and K is the number of latent factors, with $K \ll |U|$ and $K \ll |I|$. The parameters W and H can be obtained by optimizing an objective function (equation 1) using stochastic gradient descent (Koren et al., 2009), where λ is a regularization parameter to prevent over-fitting.
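For reference, the regularized squared-error objective commonly used for matrix factorization (Koren et al., 2009) can be written in the notation above as shown below. This is the standard form and is assumed, rather than confirmed, to correspond to equation (1); here D denotes the set of observed (u, i, r) triples, and $w_{uk}$ and $h_{ik}$ are the entries of W and H.

```latex
\min_{W,H} \; \sum_{(u,i,r_{ui}) \in D}
  \left( r_{ui} - \sum_{k=1}^{K} w_{uk} h_{ik} \right)^{2}
  + \lambda \left( \lVert W \rVert_F^{2} + \lVert H \rVert_F^{2} \right)
```

The first term penalizes the prediction error on observed ratings only, and the second term shrinks the factors toward zero, with λ controlling the strength of the regularization.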
After the optimization process, we obtain the parameters W and H. Using these parameters, the rating $\hat{r}_{ui}$ of learner/student u on learning resource/course i is predicted by equation (2) as the dot product of the corresponding latent factor vectors, $\hat{r}_{ui} = \sum_{k=1}^{K} w_{uk} h_{ik}$.
This matrix factorization approach can be presented in a deep learning environment as in Figure 3. First, an input layer represents the current user and learning resource; this layer is passed to an embedding layer that reduces the dimensions of the user/item factors (acting as the K parameter in Figure 2). Next, a dot product is calculated between the user and learning resource factor vectors. Finally, an output layer is used for prediction. This model is trained with stochastic gradient-based optimization (the Adam optimizer in the deep learning setting).
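A minimal sketch of standard matrix factorization trained with plain stochastic gradient descent is given below, using only the Python standard library. It is not the authors' implementation: the function names, the toy ratings, and the hyper-parameter values (`k`, `lr`, `reg`, `epochs`) are assumptions chosen for illustration, and plain SGD is used in place of Adam to keep the example short.

```python
import math
import random
from typing import List, Tuple

Rating = Tuple[int, int, float]  # (user index, item index, rating)


def train_mf(ratings: List[Rating], n_users: int, n_items: int, k: int = 2,
             lr: float = 0.02, reg: float = 0.02, epochs: int = 500,
             seed: int = 0):
    """Learn W (|U| x K) and H (|I| x K) so that W[u] . H[i] approximates r."""
    rng = random.Random(seed)
    W = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    H = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - sum(W[u][f] * H[i][f] for f in range(k))
            for f in range(k):
                w_uf, h_if = W[u][f], H[i][f]
                # Gradient step on the regularized squared error
                # (the constant factor 2 is absorbed into the learning rate).
                W[u][f] += lr * (err * h_if - reg * w_uf)
                H[i][f] += lr * (err * w_uf - reg * h_if)
    return W, H


def predict(W, H, u: int, i: int) -> float:
    """Predicted rating: dot product of the user and item factor vectors."""
    return sum(w * h for w, h in zip(W[u], H[i]))


def train_rmse(ratings: List[Rating], W, H) -> float:
    return math.sqrt(sum((r - predict(W, H, u, i)) ** 2
                         for u, i, r in ratings) / len(ratings))


# Toy data: 3 users, 3 items, 6 observed ratings.
toy = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
       (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]

W0, H0 = train_mf(toy, 3, 3, epochs=0)   # factors at initialization
W, H = train_mf(toy, 3, 3)               # factors after training
print(train_rmse(toy, W0, H0), train_rmse(toy, W, H))
```

In a deep learning framework, the rows of W and H correspond to the weights of the two embedding layers, and the dot product plays the role of the layer that combines them.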

Proposed deep matrix factorization for learning resource and course recommendation
This study proposes using Deep Matrix Factorization (DMF), which is extended from matrix factorization (Guo et al., 2017; Zhang et al., 2018), for learning resource and course recommendation. The model is described in detail in Figure 4. Like the standard matrix factorization (MF), the proposed model has four layers. First, an input layer represents the current user and learning resource, followed by an embedding layer that reduces the dimensions of the user and learning resource features. In the MF model in Figure 3, a dot product combines the user and item factors for prediction. The dot product is fast to compute, but it is only a linear combination of the two latent factor vectors. In DMF, the dot product is replaced by a deep neural network that learns a nonlinear combination: the two embedding vectors are concatenated and used as the input of a Multilayer Perceptron (MLP) layer. Finally, an output layer produces the predicted score. The MLP has 128 nodes (neurons); however, the number of hidden layers (e.g. adding more layers) and the number of neurons can be adjusted for different datasets/domains. In this study, the number of nodes is selected by hyper-parameter search, which is presented in the experimental results section (Section 4.3). The network is trained with the Adam optimizer, using a batch size of 256 and a learning rate of 0.001.
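The forward pass of the architecture described above can be sketched as follows, using only the Python standard library. This is an illustrative sketch, not the authors' code: the class name `TinyDMF`, the embedding size `k=8`, the ReLU activation, and the random initialization are assumptions (the text fixes only the 128 hidden nodes), and training with Adam, a batch size of 256, and a learning rate of 0.001 is omitted.

```python
import random
from typing import List

Vector = List[float]


def dense(x: Vector, weights: List[Vector], bias: Vector) -> Vector:
    """Fully connected layer: one weight row per output neuron."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


class TinyDMF:
    """Embedding -> concatenation -> one hidden MLP layer -> score."""

    def __init__(self, n_users: int, n_items: int, k: int = 8,
                 hidden: int = 128, seed: int = 0) -> None:
        rng = random.Random(seed)

        def init(rows: int, cols: int) -> List[Vector]:
            return [[rng.gauss(0, 0.1) for _ in range(cols)]
                    for _ in range(rows)]

        self.user_emb = init(n_users, k)     # embedding layer (users)
        self.item_emb = init(n_items, k)     # embedding layer (items)
        self.w_hidden = init(hidden, 2 * k)  # MLP layer on the concatenation
        self.b_hidden = [0.0] * hidden
        self.w_out = init(1, hidden)         # output layer: a single score
        self.b_out = [0.0]

    def predict(self, u: int, i: int) -> float:
        # Concatenate the two embeddings instead of taking their dot product.
        x = self.user_emb[u] + self.item_emb[i]
        h = relu(dense(x, self.w_hidden, self.b_hidden))
        return dense(h, self.w_out, self.b_out)[0]


model = TinyDMF(n_users=3, n_items=4)
score = model.predict(1, 2)   # untrained, so the value itself is arbitrary
print(score)
```

The only structural difference from the MF sketch is the replacement of the dot product by the hidden layer, which is what allows a nonlinear interaction between user and item factors.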

Data description
Besides the datasets for learning resource recommendation used in the original conference work, this study also uses new datasets for course recommendation to validate the proposed model.

Data on learning resources
(1) Ratings dataset 1 contains all users' ratings of the books (980,000 ratings, for 10,000 books, from 53,424 users). It introduces and demonstrates collaborative filtering, allowing us to take a deeper look at data-driven book recommendations. The number of users, learning resources, and ratings of these datasets are described in Table 1. These datasets are very sparse, which means that users or learning resources may have only a few ratings. This sparsity is a challenge for every machine learning method. Therefore, for comparison purposes, we also present filtered versions of these datasets that keep only those users/learning resources with at least five ratings. The filtered versions are presented in Table 2.
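The filtering step can be sketched as below. The text does not say whether the filter was applied once or repeated until no user or item falls below the threshold, so this sketch assumes a single pass with counts taken on the unfiltered data; the function names and toy ratings are illustrative only.

```python
from collections import Counter
from typing import Hashable, List, Tuple

Rating = Tuple[Hashable, Hashable, float]  # (user, item, rating)


def filter_min_ratings(ratings: List[Rating], min_count: int = 5) -> List[Rating]:
    """Keep ratings whose user and item both have at least min_count ratings."""
    user_counts = Counter(u for u, _, _ in ratings)
    item_counts = Counter(i for _, i, _ in ratings)
    return [(u, i, r) for u, i, r in ratings
            if user_counts[u] >= min_count and item_counts[i] >= min_count]


def sparsity(ratings: List[Rating]) -> float:
    """Fraction of empty cells in the user-item matrix (distinct pairs assumed)."""
    n_users = len({u for u, _, _ in ratings})
    n_items = len({i for _, i, _ in ratings})
    return 1.0 - len(ratings) / (n_users * n_items)


# Toy data with a threshold of 2 instead of 5 to keep the example small.
toy = [("u1", "a", 4.0), ("u1", "b", 5.0), ("u2", "a", 3.0),
       ("u2", "b", 2.0), ("u3", "c", 1.0)]
kept = filter_min_ratings(toy, min_count=2)
print(len(toy), sparsity(toy), len(kept), sparsity(kept))
```

On the toy data, the single rating of user `u3` on item `c` is dropped, and the remaining matrix is fully observed.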

Data on course recommendation
We have also collected data on student performance for course recommendation. Three datasets are presented in Table 3. The first dataset (Student Performance) was collected from Can Tho University; it is the 'original dataset'. The second dataset contains the records of students who have studied at least ten courses, extracted from the original dataset. Similarly, the third dataset retains the records of students who have taken at least 20 courses. In these datasets, a user is a student, an item is a course, and the ratings are the course scores that the students have achieved. Student Performance 10 and Student Performance 20 can be regarded as datasets with reduced data sparsity; we also use these two datasets to study building models to predict learning outcomes presented in the previous chapter.

Evaluation measure and baselines for comparison
For evaluating the models, this work uses the Root Mean Square Error (RMSE), a popular measure in recommender systems. It is calculated by equation (3):
$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2} \qquad (3)$$
where $y_i$ is the true value, $\hat{y}_i$ is the predicted value, and n is the test size. To evaluate the effectiveness of the proposed DMF model, we compare it with other popular baselines for recommendation systems, including User KNN, Item KNN, Global Average, User Average, Item Average, and Matrix Factorization, as detailed in Dien et al. (2021).
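The measure and the three averaging baselines can be sketched as follows. The function names and toy data are illustrative, and falling back to the global average for a user or item that does not appear in the training set is an assumption of this sketch, not something stated in the text.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

Rating = Tuple[Hashable, Hashable, float]  # (user, item, rating)


def rmse(y_true: List[float], y_pred: List[float]) -> float:
    """Root Mean Square Error over n test ratings."""
    n = len(y_true)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n)


def fit_average_baselines(train: List[Rating]):
    """Return the global average and the per-user and per-item averages."""
    global_avg = sum(r for _, _, r in train) / len(train)
    by_user: Dict[Hashable, List[float]] = defaultdict(list)
    by_item: Dict[Hashable, List[float]] = defaultdict(list)
    for u, i, r in train:
        by_user[u].append(r)
        by_item[i].append(r)
    user_avg = {u: sum(v) / len(v) for u, v in by_user.items()}
    item_avg = {i: sum(v) / len(v) for i, v in by_item.items()}
    return global_avg, user_avg, item_avg


train = [("u1", "a", 4.0), ("u1", "b", 2.0), ("u2", "a", 5.0)]
test = [("u1", "c", 3.0), ("u2", "b", 4.0)]

global_avg, user_avg, item_avg = fit_average_baselines(train)
y_true = [r for _, _, r in test]

# Unknown users/items fall back to the global average (sketch assumption).
rmse_global = rmse(y_true, [global_avg for _ in test])
rmse_user = rmse(y_true, [user_avg.get(u, global_avg) for u, _, _ in test])
rmse_item = rmse(y_true, [item_avg.get(i, global_avg) for _, i, _ in test])
print(rmse_global, rmse_user, rmse_item)
```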

Results on learning resource recommendation
Figure 5 shows the relationship between RMSE and the number of neurons for the DMF models, Figure 6 shows the effect of the number of latent factors (features) on the prediction error, and Figure 7 shows the training and test performance during the learning process. The two charts in Figure 5, which illustrate the results on datasets 1 and 3, exhibit similar patterns. Increasing the number of neurons improves performance up to a peak, after which the loss goes up and tends to saturate. On dataset 1, the lowest RMSE is achieved with around 100 neurons, and a similar result is obtained on dataset 3. Figure 7 evaluates overfitting during learning. As shown in the figure, the training and testing errors of MF tend to converge after four epochs, while the error of DMF stabilizes after only two epochs. Moreover, MF suffers less overfitting than DMF as learning progresses further.
We also compare DMF with a variety of methods, including Item Average, User-KNN CF, Global Average, User Average, and Matrix Factorization, in terms of RMSE; the results are shown in the corresponding figures, with Figure 12 covering dataset 5. For datasets 2 and 3, we have deployed the experiments on a server with 320 GB of RAM. However, User-KNN CF could not be trained on datasets 3 and 4 due to memory limitations. In general, DMF outperforms most of the baselines. Better results are obtained when only users/books with at least five ratings are considered. Matrix Factorization appears to be the worst among the considered methods, with the highest error rates on the full datasets 1 and 5 and on the version of dataset 3 restricted to books/users with at least five ratings. The performance of User Average is close to that of DMF, but it is better only on the full dataset 4 and worse in all other cases. From the prediction results, online learning resource systems can sort the resources/materials by the predicted rating for each user and then suggest appropriate resources/materials to that user, as mentioned in the work of Dien, Hoai-Sang, et al. (2020). If a user is logged into the system, it can extract the five or ten resources and materials with the highest predicted rating for that user. If the user is not logged in, the system can provide the five or ten resources and materials with the highest average predicted rating over all users.

Results on course recommendation
As with the learning resource datasets, in this experiment we evaluate the DMF model using the standard error measure RMSE. The assessment uses three datasets: the entire original dataset (Student Performance); the dataset that keeps students with at least ten learning-result records, i.e. at least ten ratings (Student Performance 10); and the dataset that keeps students with at least 20 academic performance records (Student Performance 20). We also compare the DMF model with other recommender system methods, namely Global Average, User Average, Item Average, and Matrix Factorization, on these three datasets.
Figures 13 and 14 compare the RMSE of the DMF model with that of the other recommender system methods on the original dataset and on each filtered dataset. The experimental results show that DMF performs well compared to the other methods. For the datasets in which users/courses have at least ten and at least 20 ratings, the RMSE of the DMF model is smaller than that of the other methods. Furthermore, DMF predicts learning outcomes more effectively than the other methods in the presence of data sparsity. The results also show that the DMF model improves as data sparsity is reduced: the RMSE decreases from 0.7451 on the original dataset to 0.7011 on the dataset with at least ten ratings and 0.6915 on the dataset with at least 20 ratings.
We also observe the performance of the DMF and MF models in the training and test phases on the student learning outcome datasets. For example, Figure 15 shows the training and test errors for the Student Performance dataset. The figure shows that the training and test errors of the MF model tend to converge after 6 to 8 epochs, while the DMF model saturates early and converges after only about three epochs, which reduces training time. The datasets with at least ten and at least 20 ratings show similar results.
The rating prediction results can be used to suggest learning resources. Specifically, once predictions are available, the system can sort learning resources by the predicted rating for each user and then suggest appropriate learning resources to that learner. When a user logs into the learning resource management system, the learning resources with the best predicted ratings (for example, the top 5 or 10) are provided. If the user is not logged in, the system can provide the top 5 or top 10 learning resources with the best average predicted rating over all users. Courses can be suggested to learners in the same way after their results have been predicted.
The proposed DMF model can be applied to various learning resource recommendation tasks, such as recommending related lectures, course books, articles, and other digital learning resources.
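The logged-in and not-logged-in cases described above can be sketched as a single function. The names `recommend` and `average_predicted`, the nested-dictionary layout of the predictions, and the toy scores are assumptions made for this illustration.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Tuple

Predictions = Dict[Hashable, Dict[Hashable, float]]  # user -> item -> score


def average_predicted(predicted: Predictions) -> Dict[Hashable, float]:
    """Average predicted rating of each item over all users."""
    totals: Dict[Hashable, float] = defaultdict(float)
    counts: Dict[Hashable, int] = defaultdict(int)
    for scores in predicted.values():
        for item, score in scores.items():
            totals[item] += score
            counts[item] += 1
    return {item: totals[item] / counts[item] for item in totals}


def recommend(user: Optional[Hashable], predicted: Predictions,
              n: int = 5) -> List[Tuple[Hashable, float]]:
    """Per-user top n when the user is known, otherwise top n by average."""
    if user is not None and user in predicted:
        scores = predicted[user]                 # logged-in user
    else:
        scores = average_predicted(predicted)    # anonymous visitor
    return sorted(scores.items(), key=lambda p: p[1], reverse=True)[:n]


# Toy predicted ratings for two users over three learning resources.
predicted = {
    "u1": {"a": 4.0, "b": 2.0, "c": 1.0},
    "u2": {"a": 2.0, "b": 5.0, "c": 3.0},
}
print(recommend("u1", predicted, n=2))
print(recommend(None, predicted, n=2))
```

In the toy data, user `u1` is offered resource `a` first, while an anonymous visitor is offered `b` first because it has the highest average predicted rating.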

Conclusion
In this work, we have proposed a DMF model, extended from standard matrix factorization, for learning resource recommendation. We validate the proposed model on five public learning resource datasets and three student performance datasets collected at a university. We also compare the proposed model with other well-known recommender system baselines. The experimental results show that the DMF model works well on several different datasets, especially on the filtered, less sparse datasets. The experiments also show that the error of the DMF model tends to saturate and converge earlier than that of the traditional MF model, which reduces the training time of the model.
One drawback of algorithms in recommender systems is the cold-start problem: when a new user (or a new item) is added to the system, the algorithms cannot predict the user's rating because no past rating information is available. Meta-data (information) about the items could tackle this problem. For example, item information could be represented as a TF-IDF vector, and this vector could then be plugged into the DMF model. This work will be continued in the future.
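As an illustration of the item representation mentioned above, the sketch below computes TF-IDF vectors for tokenized item descriptions. It uses one common variant (term frequency normalized by document length, inverse document frequency as log(N/df)); the function name, the toy descriptions, and the choice of variant are assumptions, and how such vectors would be fed into the DMF model is left open here, as in the text.

```python
import math
from collections import Counter
from typing import List, Tuple


def tfidf_vectors(docs: List[List[str]]) -> Tuple[List[str], List[List[float]]]:
    """Return the sorted vocabulary and one TF-IDF vector per document.

    Each document is a non-empty list of tokens.
    """
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n_docs / df[term]) for term in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append([tf[term] / len(doc) * idf[term] for term in vocab])
    return vocab, vectors


# Hypothetical item descriptions, already tokenized.
docs = [
    ["matrix", "factorization", "course"],
    ["deep", "learning", "course"],
]
vocab, vectors = tfidf_vectors(docs)
print(vocab)
print(vectors[0])
```

A term that appears in every description, such as "course" here, receives a weight of zero, so the vector emphasizes what distinguishes one item from another.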