Collaborative filtering recommendation using fusing criteria against shilling attacks

The collaborative filtering recommendation (CFR) technique is widely used in recommender systems: the nearest neighbours of a target user are selected, and their profiles are used to predict ratings for items that the target user has not yet rated. However, malicious users can inject fake user profiles to undermine the security and reliability of a recommender system; this is called a shilling attack. It is therefore crucial to make the recommendation technique robust against shilling attacks. Malicious users typically rely on a single criterion, such as user ratings, to perform shilling attacks, so fusing multiple criteria when constructing a CFR should make such attacks harder. We propose a novel CFR against shilling attacks, called CFR-F. In our approach, a candidate set of users with similar interests is first obtained by integrating the users' dynamic interest model and social tags. Then, the nearest neighbours are selected from this candidate set according to a strategy that assigns preference influence weights based on user background. Our experimental results show that our approach recommends accurate information resources and achieves a Mean Absolute Error (MAE) and an Average Prediction Shift (APS) that are lower than those of traditional techniques by 50% and 20%, respectively.


Introduction
Information overload is becoming increasingly serious (Tang et al., 2021; Zhang et al., 2021), and using information resources effectively is a big challenge. Information recommendation systems help users sift through a large amount of information to select the items they are interested in (Lian & Tang, 2022; Liu et al., 2022). However, because information recommendation techniques are open, they are vulnerable to shilling attacks (Gao et al., 2011; Gao et al., 2010; Li et al., 2022). In a shilling attack, malicious users inject fake user profiles to affect the recommendation results (Yu et al., 2017). Shilling attacks threaten the security of the recommendation system and cause losses to its users. Therefore, it is crucial to study recommendation techniques against shilling attacks (Si & Li, 2020; Arora & Taneja, 2021).
Currently, information recommendation techniques are classified into three types (Adomavicius & Tuzhilin, 2005; Yu et al., 2020): (1) collaborative filtering recommendation (CFR), (2) content-based recommendation, and (3) hybrid recommendation. CFR is one of the most popular and successful techniques in the information recommendation community, and it assumes that similar users have similar tastes. A user-based CFR technique makes recommendations for a target user by finding neighbours with similar user profiles, which are assumed to represent the preferences of many different individuals. If the profile database contains fake data, the fake profiles may be treated as neighbours of target users and eventually produce biased recommendations; this is how a shilling attack succeeds (Shah & Bhanderi, 2014; Wu et al., 2021).
In the event of a shilling attack, the key for CFR techniques is to ensure that the selected neighbours can be trusted (Zhang & Kulkarni, 2013). A trust mechanism is a common way to counter shilling attacks in CFR. Depending on how the initial trust value is obtained, trust can be explicit or implicit. With explicit trust, users state the trust values between themselves and other users in the system, which sets the initial trust value, and the system maintains and adjusts each user's trust list. With implicit trust, the initial trust value is calculated from the user profile (Mobasher et al., 2007). Trust relationships in social networks are combined with traditional recommendation systems to reduce the possibility of attacks. However, the trust relationship is computed from static user behaviours or user ratings, which are exactly what malicious users forge, so a trust relationship that depends entirely on mining such data carries the risk of fraud. At the same time, maintaining user trust relationships in social networks adds a huge amount of extra work to the recommendation system, and taking trust dynamics into account makes the problem more complicated (Mehta et al., 2008a; Arul & Razia, 2021).
Generally, a malicious user fakes data based on a single criterion, such as user ratings, to perform shilling attacks, and finds it difficult to fake data that is consistent across multiple criteria. Therefore, fusing multiple criteria to construct a novel CFR can effectively resist shilling attacks. Unlike traditional CFR, we fuse two criteria, dynamic social tagging and user ratings, to form users' dynamic interests: the user interest constructed from dynamic social tagging is fused with the user interest based on user ratings. Because user interests change over time, they are difficult for malicious users to imitate. Collaborative filtering recommendation using fusing criteria therefore makes it harder for a malicious user to fake attack data and improves the ability of the recommendation system to resist shilling attacks. Furthermore, based on the weight of each combination of scenario attributes, an influence weight selection strategy is designed to find the nearest neighbours of the target user and provide reliable recommendations.
The main contributions of our work are as follows: (1) We fuse multiple criteria to construct dynamic interest, which makes it difficult for malicious users to fake data in user scenarios. (2) We select the nearest neighbours with a selection strategy based on attribute combinations to exclude malicious user information. (3) We verify our approach by experiments on the two synthetic datasets SMovie and SLast.fm.
The rest of the paper is organised as follows. Section 2 introduces related work. Section 3 presents the framework of our approach and its details. Section 4 reports the experimental results and analysis. The last section concludes our work.

Shilling attack framework
A recommendation system contains both natural noise and artificial noise. In practice, some users rate items strictly and others rate them leniently; the bias introduced by such rating habits is natural noise. Artificial noise mainly refers to malicious ratings of certain items, that is, shilling attacks. If a malicious user forges a rating vector that is similar to that of a target user, the forged vector can affect the recommendations made to the target user. In a shilling attack, a malicious user attempts to manipulate the recommendation results by injecting fake profiles that are close to those of genuine users. The general model of shilling attacks is shown in Figure 1 (Adomavicius & Tuzhilin, 2005). The rating vector (user profile) of an attacker usually consists of four parts, $I_S$, $I_P$, $I_\emptyset$ and $i_r$, where $I_S$ is the set of selected items, $I_P$ is the set of filler items, $I_\emptyset$ is the set of unrated items, and $i_r$ is a single target item whose recommendation frequency the attacker tries to raise or lower. The malicious user uses the filler items to disguise himself as a regular user and uses the selected items to become the nearest neighbour of as many normal users as possible. Together, the filler items and the selected items enhance the attack efficiency.
Shilling attacks can generally be divided into three categories (Mehta et al., 2008b; Jiang et al., 2016): push attacks, nuclear attacks, and malicious disruption attacks. An attack intended to increase the recommendation frequency of the target item is called a push attack, an attack intended to lower the recommendation ranking of the target item is called a nuclear attack, and an attempt to disable the recommendation system is called a malicious disruption attack.
There are two common forms of anti-attack: attack detection (Hao et al., 2021; Julio et al., 2021) and attack defence. Attack detection is generally based on mathematical statistics, whereas attack defence, for example, uses user relationships in social networks to exclude attackers as far as possible. A lot of work has been done on detection. Most detectors model shilling attack detection as a classification problem and, from the perspective of machine learning, fall into three groups: detection algorithms based on supervised learning, unsupervised learning, and semi-supervised learning. We mainly focus on techniques for defending against shilling attacks.
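The profile structure described above can be illustrated with a minimal Python sketch. It is not the attack generator used in the experiments: the function name `make_attack_profile`, the 1-5 rating range, the choice to give selected items the maximum rating, and the random ratings for filler items are all assumptions made for illustration.

```python
import random


def make_attack_profile(all_items, target_item, selected_items, filler_size,
                        push=True, r_min=1, r_max=5, seed=0):
    """Build one fake profile as a dict {item: rating}.

    Items that do not appear in the dict play the role of the unrated set.
    All parameter names and rating choices are illustrative assumptions.
    """
    rng = random.Random(seed)
    profile = {}

    # Selected items: rated high here (assumption) so that the fake profile
    # looks similar to as many genuine users as possible.
    for item in selected_items:
        profile[item] = r_max

    # Filler items: randomly chosen and randomly rated (assumption) so that
    # the fake profile resembles a regular user.
    candidates = [i for i in all_items
                  if i != target_item and i not in profile]
    for item in rng.sample(candidates, filler_size):
        profile[item] = rng.randint(r_min, r_max)

    # Target item: highest rating to promote it, lowest rating to demote it.
    profile[target_item] = r_max if push else r_min
    return profile


push_profile = make_attack_profile(list(range(20)), 7, [1, 2], 5, push=True)
demote_profile = make_attack_profile(list(range(20)), 7, [1, 2], 5, push=False)
```

The two calls differ only in the rating given to the target item, which is the distinction between promoting and demoting an item.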

CFR against shilling attacks
CFR recommends resources that may be of interest to a target user based on the interests of the most similar neighbours (Deldjoo et al., 2020). User rating similarity and user interest similarity jointly determine user similarity (Ma et al., 2009). By injecting fake user profile information (such as user ratings) that is close to that of other individuals, the attacker attempts to influence the user similarity calculation and thereby manipulate the recommendation results. User interests are expressed along multiple dimensions, including the user's interaction with the environment and the user's context. Social behaviours such as tagging are one form of interaction between users and the environment. Social tagging refers to the labelling of resources in a social community from the users' perspective. In a social labelling system, the content of a label and the time at which it was applied reflect the user's interest preference and how it is changing. As the result of users' labelling behaviour, social tags can dynamically reflect users' real interests (Eleftherios et al., 2011), and dynamic tags can reflect the similarity of users' interests from the perspective of behaviour. Interest that users display actively in this way is difficult for others to track and grasp, especially with semantically rich labels (Xi et al., 2014), and it is even harder for attackers to forge and fill in such data.
Building on users' dynamic interests and on a selection strategy that weights the influence of the user's situation on preferences, we propose an information resource recommendation method that integrates dynamic interest with the selection strategy to resist shilling attacks.

Framework
Our idea for defending against shilling attacks is to first obtain a candidate set of users with similar interests by fusing dynamic social tags and user ratings, then re-rank this candidate set by preference influence weight using the selection strategy, and finally determine the nearest-neighbour recommendation set of the target user.
Our CFR technique is illustrated in Algorithm 1. The approach has two phases: (1) Generating a neighbour set. In this phase, the dynamic interests are computed by fusing criteria. The resulting user similarity no longer depends entirely on user ratings, which makes it difficult for malicious users to manipulate the generation of the target user's nearest neighbours (lines 2-6).
(2) Resource items recommendation. In this phase, the nearest neighbours are screened with the preference selection strategy to form the nearest neighbour set of the target user (lines 8-9). The recommendation technique is described in Algorithm 1, and the two phases are discussed next.

Algorithm 1 Our CFR-F Technique Framework
Input: Target user $u_T$, label dataset $R_{m\times n}$, user-rating item matrix $M_{i\times j}$, user-label pseudo-matrix, neighbour number $k$;
Output: Top-N recommendation result.
1: //Phase I: Generating neighbour set
2: For each user $u$, find the items that have been scored jointly with the target user $u_T$ and record them;
3: According to the label dataset $R_{m\times n}$, calculate the label weight $M_{u,r}$ and save its value in the matrix $M_{i\times j}$;
4: Based on the matrix $M_{i\times j}$, calculate the similarity $sim_1(a, b)$ between user labels;
5: According to the matrix $R_{m\times n}$, calculate the similarity $sim_2(a, b)$ between user ratings;
6: Generate the similarity $sim(a, b)$ between the target user $u_T$ and other users;
7: //Phase II: Resource items recommendation
8: Calculate the predicted score of all items of the target user, and select the nearest neighbours of the target user according to the influence weight of user preference;
9: Generate the Top-N recommendation for the target user.

Dynamic interest model
(1) Label information and time weight. When users use social tags to label resources, the frequency of tag usage reflects the user's preference for the related resources. A tag that is used frequently has a great influence on the user's interest preference and should have a higher weight. This paper uses the TF-IUF method to calculate the weight of a single label, as shown in Equation (1):

$$ w_i = f_i \times \log\frac{N}{n_i} \tag{1} $$

where $w_i$ is the weight of tag $t_i$ in user labels, $f_i$ is the frequency of tag $t_i$, $N$ is the total number of users in the test set, and $n_i$ is the number of different user tag sets in which tag $t_i$ appears.
In general, recently applied tags have a greater impact on users' interests and preferences, and the corresponding tagged resources should have a higher weight than those labelled earlier. Cheng et al. proposed an adaptive exponential decay function to calculate the time weight of labels, as shown in Equation (2) (Cheng et al., 2008):

$$ w_{time}(u, r) = 0.5^{\,time(u, r)/hl_u} \tag{2} $$

where $w_{time}(u, r)$ is the time weight of the label, which represents the degree to which the user's preference has decayed, with $w_{time}(u, r) \geq 0$ and $time(u, r) \in \mathbb{N}$. The value $time(u, r) = 0$ denotes the most recent annotation of resource $r$ by user $u$, $time(u, r) = 1$ denotes the penultimate annotation, and so on. $hl_u$ is the half-life of user $u$, whose value varies with the user's life cycle.
(2) Dynamic label weight. The dynamic label weight $M_{u,r}$ represents the total score of user $u_i$ on tag $t_j$ and is obtained by weighting the tag information and the time weight, as shown in Equation (3):

$$ M_{u,r} = \chi\, w_i + (1 - \chi)\, w_{time}(u, r) \tag{3} $$

where the parameter $\chi \in (0, 1)$ is a harmonic factor. The label weight reflects user preference, and the time weight reflects the change in user preference, so the value of $\chi$ can be adjusted according to the relative importance of $w_i$ and $w_{time}(u, r)$. If the two factors are equally important, we set $\chi = 0.5$. If the annotation information has a more important influence on user preference, $\chi$ takes a larger value; otherwise, $\chi$ takes a smaller value.
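The three weights described in this subsection can be sketched in a few lines of Python. The formulas are assumptions based on the textual definitions: the usual TF-IUF form f_i * log(N / n_i), a half-life decay 0.5 ** (time / hl_u), and a convex combination controlled by χ. The function names are hypothetical.

```python
import math


def tf_iuf(tag_freq, n_users, n_users_with_tag):
    """Tag weight: usage frequency scaled by inverse user frequency (assumed form)."""
    return tag_freq * math.log(n_users / n_users_with_tag)


def time_weight(rank, half_life):
    """Time weight of an annotation; rank 0 is the most recent annotation.

    Assumed half-life decay: the weight halves every `half_life` ranks.
    """
    return 0.5 ** (rank / half_life)


def dynamic_label_weight(w_tag, w_time, chi=0.5):
    """Blend tag weight and time weight with the harmonic factor chi in (0, 1)."""
    return chi * w_tag + (1 - chi) * w_time


# A tag used by every user carries no discriminative weight.
common_tag = tf_iuf(tag_freq=2, n_users=100, n_users_with_tag=100)
# The weight of an annotation one half-life back is half that of the latest one.
older = time_weight(rank=3, half_life=3)
blended = dynamic_label_weight(1.0, older, chi=0.5)
```

With χ = 0.5 the two factors contribute equally; raising χ makes the tag information dominate, as the text describes.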

Neighbour set generation
The similarity $sim_1(a, b)$ based on user labels is calculated with cosine similarity, as shown in Equation (4):

$$ sim_1(a, b) = \frac{\sum_{i \in I_{a,b}} M_{a,i}\, M_{b,i}}{\sqrt{\sum_{i \in I_{a,b}} M_{a,i}^{2}}\ \sqrt{\sum_{i \in I_{a,b}} M_{b,i}^{2}}} \tag{4} $$

where $I_{a,b}$ is the set of resources annotated by both users $a$ and $b$, and $M_{a,i}$ is the dynamic label weight of user $a$ for resource $i$.
Assume that the user set is $U = \{u_1, u_2, \cdots, u_m\}$ and the resource item set is $I = \{i_1, i_2, \cdots, i_n\}$, and let $r_{u,i}$ represent the behaviour of user $u$ towards resource item $i$ (such as a purchase or a rating). When the user does nothing, $r_{u,i}$ is set to 0. The behaviour of all users towards resource items is represented as an $m \times n$ matrix, denoted $R_{m\times n} = [r_{u,i}]$, where the value of $R(u_m, i_n)$ is the rating of user $u_m$ for resource item $i_n$.
The Pearson correlation coefficient is computed over the resource item set $I_{a,b}$ jointly rated by users $a$ and $b$ to obtain the rating similarity between the two users, as shown in Equation (5):

$$ sim_2(a, b) = \frac{\sum_{i \in I_{a,b}} (r_{a,i} - \bar{r}_a)(r_{b,i} - \bar{r}_b)}{\sqrt{\sum_{i \in I_{a,b}} (r_{a,i} - \bar{r}_a)^{2}}\ \sqrt{\sum_{i \in I_{a,b}} (r_{b,i} - \bar{r}_b)^{2}}} \tag{5} $$

where $r_{a,i}$ is the rating of user $a$ for resource item $i$, $r_{b,i}$ is the rating of user $b$ for resource item $i$, $\bar{r}_a$ is the average score of user $a$, and $\bar{r}_b$ is the average score of user $b$. The weighted user similarity is shown in Equation (6):

$$ sim(a, b) = \alpha\, sim_1(a, b) + \beta\, sim_2(a, b) \tag{6} $$

where $sim(a, b)$ is the comprehensive similarity after integration, $\alpha$ and $\beta$ are the weights of $sim_1(a, b)$ and $sim_2(a, b)$, respectively, and $0 \leq \alpha \leq 1$, $0 \leq \beta \leq 1$, $\alpha + \beta = 1$. The top-k candidate neighbours of the target user are found by ranking users by comprehensive similarity. Because the user dynamic interests fuse two criteria, the candidate neighbours of the target user no longer depend entirely on a single criterion such as user ratings. This reduces the weight of user ratings and the likelihood that fake profiles built from filler items enter the target user's nearest neighbour set.
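As a rough illustration of the neighbour-set generation step, the following Python sketch computes a cosine similarity over label weights, a Pearson similarity over ratings, and their weighted fusion. The dictionary-based data layout, the function names, and the choice to take each user's mean over all of that user's ratings are assumptions, not details confirmed by the paper.

```python
import math


def cosine_sim(weights_a, weights_b):
    """Cosine similarity over the resources annotated by both users."""
    common = set(weights_a) & set(weights_b)
    num = sum(weights_a[i] * weights_b[i] for i in common)
    den = (math.sqrt(sum(weights_a[i] ** 2 for i in common))
           * math.sqrt(sum(weights_b[i] ** 2 for i in common)))
    return num / den if den else 0.0


def pearson_sim(ratings_a, ratings_b):
    """Pearson correlation over co-rated items (means over all own ratings)."""
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return 0.0
    mean_a = sum(ratings_a.values()) / len(ratings_a)
    mean_b = sum(ratings_b.values()) / len(ratings_b)
    num = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in common)
    den = (math.sqrt(sum((ratings_a[i] - mean_a) ** 2 for i in common))
           * math.sqrt(sum((ratings_b[i] - mean_b) ** 2 for i in common)))
    return num / den if den else 0.0


def fused_sim(sim_tags, sim_ratings, alpha=0.6):
    """Weighted fusion with beta = 1 - alpha, so the weights sum to 1."""
    return alpha * sim_tags + (1 - alpha) * sim_ratings


s1 = cosine_sim({"r1": 1.0, "r2": 2.0}, {"r1": 2.0, "r2": 4.0})
s2 = pearson_sim({"m1": 5, "m2": 3, "m3": 1}, {"m1": 4, "m2": 3, "m3": 2})
s = fused_sim(s1, s2, alpha=0.6)
```

Because the fused score mixes two signals, a fake profile that only mimics ratings cannot reach a high score unless its tagging behaviour also matches.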

Resource items recommendation
Different attribute combinations influence user preferences differently, so the selection strategy assigns user preference weights per situational instance. The influence weight of user preference is set according to the similarity between the target user and a candidate neighbour under a specific attribute combination: a situational instance (reflected by specific attribute values) with a high degree of similarity receives a large weight, and one with a low degree of similarity receives a small weight. Our selection strategy is therefore as follows. First, according to the application needs, several situational attributes are selected as a combination of situational attributes. Then, the influence weight of the attribute combination on user preference is set for each situational instance. Finally, the candidate neighbours are re-sorted by the weight given to the situational similarity between the target user and each candidate, and the Top-N neighbours whose interests are most similar to those of the target user are selected from the candidate neighbour set.
Generally, in an application that relies heavily on user background, such as watching movies, user preference is closely related to factors such as gender, age, occupation, and education. The influence weight of user preference is given according to the situational instance and the degree to which the attribute combination influences user preference. For watching movies, two user background attributes can be selected: "age" and "occupation". Of these, "occupation" influences the preference for watching movies more than "age" does. We take a target user a and a candidate neighbour b as an example to illustrate the influence weight selection strategy.
The attributes "occupation" and "age" yield four situational instances. If users a and b have the same occupation and the same age, the candidate has the highest preference weight for watching movies, W a,b = 1. If users a and b have the same occupation but different ages, the candidate has the second-highest preference weight, W a,b = 0.8. If users a and b have different occupations but the same age, the candidate has the third-highest preference weight, W a,b = 0.5. If users a and b differ in both occupation and age, the candidate has the lowest preference weight, W a,b = 0.3. The selection strategy for the preference influence weight is shown in Table 1, which gives the preference weight W a,b between the target user and a candidate neighbour in each situation.
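The four situational instances above amount to a small lookup table. The Python sketch below uses the weights stated in the text; the dictionary keys (`occupation`, `age_group`), the function names, and the choice to re-rank candidates by the product of similarity and preference weight are assumptions made for illustration.

```python
# Preference influence weights for (same occupation?, same age?), from the text.
PREFERENCE_WEIGHTS = {
    (True, True): 1.0,
    (True, False): 0.8,
    (False, True): 0.5,
    (False, False): 0.3,
}


def preference_weight(user_a, user_b):
    """Look up the weight for the situational instance of users a and b."""
    same_occupation = user_a["occupation"] == user_b["occupation"]
    same_age = user_a["age_group"] == user_b["age_group"]
    return PREFERENCE_WEIGHTS[(same_occupation, same_age)]


def rerank_neighbours(target, candidates, sims, top_n):
    """Re-sort candidates by similarity times preference weight (assumed rule)."""
    ranked = sorted(
        candidates,
        key=lambda b: sims[b["id"]] * preference_weight(target, b),
        reverse=True,
    )
    return ranked[:top_n]


target = {"id": 0, "occupation": "student", "age_group": "18-24"}
b1 = {"id": 1, "occupation": "student", "age_group": "25-34"}
b2 = {"id": 2, "occupation": "artist", "age_group": "18-24"}
# b2 has the higher raw similarity, but b1 shares the more influential attribute.
top = rerank_neighbours(target, [b1, b2], sims={1: 0.5, 2: 0.7}, top_n=1)
```

In this toy case the candidate with the lower raw similarity wins after re-ranking, because sharing an occupation outweighs sharing an age group.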
After the top-k neighbours of a target user are obtained, the corresponding information recommendations are provided for the target user. The rating of the target user a for each candidate item is predicted with the weighted average of rating differences, as shown in Equation (7):

$$ r^{c}_{a,j} = \bar{r}_a + \frac{\sum_{b \in U^{c}_{a}} sim(a, b)\, W_{a,b}\, (r_{b,j} - \bar{r}_b)}{\sum_{b \in U^{c}_{a}} \left| sim(a, b)\, W_{a,b} \right|} \tag{7} $$

where $r^{c}_{a,j}$ is the predicted score of user $a$ for item $j$, $\bar{r}_a$ is the average score of user $a$, $U^{c}_{a}$ is the nearest neighbour set of user $a$, $sim(a, b)$ is the similarity between users $a$ and $b$, $W_{a,b}$ is the weight that the user background attribute or attribute combination chosen by the selection strategy places on the preferences of users $a$ and $b$, and the prediction is made for the items $j \in I_a$, where $I_a = \{i \in I \mid r_{a,i} = 0\}$ is the set of items that user $a$ has not rated. Finally, the top-k items with the highest predicted scores are recommended to the target user.
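A minimal Python sketch of the prediction step follows. It assumes a mean-centred weighted average in which each neighbour's rating deviation is scaled by the product of its similarity and its preference weight; the exact normalisation used by the authors is not recoverable from the text, so the denominator here is an assumption, as are the function name and the tuple layout.

```python
def predict_rating(target_mean, neighbours, item):
    """Predict the target user's score for `item`.

    `neighbours` is a list of (similarity, preference_weight, ratings, mean)
    tuples, where `ratings` maps items to that neighbour's scores.
    Neighbours that have not rated the item are skipped.
    """
    numerator = 0.0
    denominator = 0.0
    for sim, weight, ratings, mean in neighbours:
        if item not in ratings:
            continue
        numerator += sim * weight * (ratings[item] - mean)
        denominator += abs(sim * weight)
    if denominator == 0:
        # No usable neighbour: fall back to the target user's own average.
        return target_mean
    return target_mean + numerator / denominator


single = predict_rating(3.0, [(1.0, 1.0, {"x": 5}, 4.0)], "x")
mixed = predict_rating(
    3.0,
    [(0.5, 1.0, {"x": 4}, 3.0), (0.5, 0.8, {"x": 2}, 3.0)],
    "x",
)
```

In the second call the neighbour with the larger preference weight pulls the prediction slightly above the target user's mean, even though both neighbours have the same similarity.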
Because our selection strategy considers the attribute combination of the target user, the similar candidate neighbours it selects differ from the ordinary candidate neighbour set. Reordering the candidate neighbours reduces the chance that a malicious attacker succeeds by forging scores and dynamic tags, and keeps fake user data out of the target user's nearest neighbour set.

Empirical study
We implemented our approach as a tool called CFR-F and used it to conduct an empirical study on two datasets synthesised from MovieLens and last.fm. We present an empirical evaluation of the effectiveness of CFR-F in scenarios with and without shilling attacks. Notably, we seek answers to the following four research questions:

RQ2. How well does our CFR-F work without shilling attacks?
RQ3. How effective is our CFR-F approach under different filling scales and attack injection scales?

RQ4. How effective is our CFR-F approach under different shilling attack models?

Data synthesis and description
Several public datasets exist for social networks and product users. However, no suitable dataset is available to validate a CFR that uses fusing criteria against shilling attacks. Inspired by Yu et al. (2021), we design an approach to construct synthetic datasets based on MovieLens (Note 1) and last.fm (Note 2), which have been widely used in the CFR community.
MovieLens has two datasets: MovieLens_D1 and MovieLens_D2. MovieLens_D1 contains about 1 million public ratings from 6040 users in 2000. MovieLens_D2 contains about 10 million public ratings, covering 71,567 users, 10,681 movies, and 95,580 tags. MovieLens_D1 contains user background and user rating information, whereas MovieLens_D2 contains user ratings, social labels that users add freely, and the time at which each label was applied. By computing the similarity between user ratings, classes of similar users are obtained and used as a bridge to construct a new synthetic dataset. Last.fm is an Internet radio and music community based in the UK, with over 15 million active listeners in 232 countries. The dataset includes 2100 users, 18,745 artists, and 12,648 tags; its data items include user ID, timestamp, artist ID, artist, song ID, song name, etc. Specifically, it records the number of times users listened to artists from 2005 to 2011, users' tags on different artists, and the friendship relationships between users. The Last.fm dataset does not directly provide user ratings or user background. Considering that film and music both belong to entertainment and that films contain music, it is reasonable to combine these two datasets. Therefore, the Last.fm dataset is pre-processed and combined with MovieLens_D1. The pre-processing maps the listening frequency range to a rating of the artist, with a higher listening frequency corresponding to a higher rating.
The construction process of the synthetic dataset SMovie is as follows: (1) according to the similarity between user ratings, we derive classes of similar users; (2) using these classes as a bridge, we attach the user background information in MovieLens_D1 to MovieLens_D2 to form a synthetic dataset (containing user-item ratings, film category, background information, and labels). The user-item rating data contain user ID, item, score value, and timestamp; the tag data contain user ID, item, tag, and timestamp.
The construction process of the synthetic dataset SLast.fm is as follows. First, the number of times a user listened to an artist is mapped to a score from 1 to 5 according to the interval it falls in. Then, the score records of last.fm and MovieLens are matched to construct a new synthetic dataset. If a score in MovieLens is the same as a score in last.fm, the two score records are matched successfully, and a new record is formed in the synthetic dataset that carries the user background information of the corresponding record in MovieLens_D1. If the two scores are different, the match fails, and the score record is deleted.
Through the above construction process, the two synthetic datasets, SMovie and SLast.fm, have user scores, user background, social tags, timestamps, and other data attributes. Each synthetic dataset was randomly divided into a training dataset and a test dataset.
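The first pre-processing step, mapping listening counts to scores from 1 to 5, can be sketched as an interval lookup. The paper does not state the interval boundaries, so the thresholds below are purely hypothetical placeholders, as is the function name.

```python
from bisect import bisect_right

# Hypothetical interval boundaries: the paper does not specify them.
LISTEN_THRESHOLDS = (10, 50, 200, 1000)


def listens_to_score(listen_count, thresholds=LISTEN_THRESHOLDS):
    """Map a listening count to a score in 1..5; more listens give a higher score."""
    return 1 + bisect_right(thresholds, listen_count)


scores = [listens_to_score(c) for c in (5, 10, 120, 999, 5000)]
```

Any monotone binning would satisfy the stated requirement that a higher listening frequency corresponds to a higher rating; in practice the boundaries could be derived from quantiles of the listening counts.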

Evaluation criteria
To evaluate the effectiveness of our recommendation technique, we evaluate the performance of our approach using mean absolute error (MAE) and average prediction shift (APS).
(1) Mean Absolute Error (MAE). The accuracy of the recommendation system is measured by the absolute difference between the user's actual score and the score predicted by the recommendation technique. The smaller the MAE, the higher the recommendation accuracy of the system; the larger the MAE, the worse its recommendation accuracy.
Assuming that the predicted user evaluation set is $\{p_1, p_2, \cdots, p_N\}$ and the corresponding actual user evaluation set is $\{q_1, q_2, \cdots, q_N\}$, the MAE is calculated as shown in Equation (9):

$$ MAE = \frac{1}{N} \sum_{i=1}^{N} \left| p_i - q_i \right| \tag{9} $$

where $N$ is the number of scored items used to test the algorithm, $p_i$ is the score predicted by the algorithm, and $q_i$ is the real score given by users in the test set.
(2) Average Prediction Shift (APS). The robustness of the recommendation technique is measured by the average prediction shift. Let $U$ and $I$ be the test user set and the target item set, respectively. $P_{u,i}$ is the score predicted for user $u$ on item $i$ before the attack, $Q_{u,i}$ is the score predicted after the shilling attack, and the prediction shift $PS_{u,i} = |P_{u,i} - Q_{u,i}|$ is the difference between the two. The APS is then defined as Equation (10):

$$ APS = \frac{\sum_{u \in U} \sum_{i \in I} PS_{u,i}}{|U| \times |I|} \tag{10} $$

The larger the average prediction shift, the more vulnerable the CFR is to shilling attacks. The closer the value is to 0, the stronger the robustness of the CFR.
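Both evaluation criteria are simple averages and can be sketched directly in Python. The data layout is an assumption: MAE takes two equal-length lists, and APS takes two dictionaries keyed by (user, item) pairs that hold the predictions made before and after the attack; the function names are hypothetical.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and actual scores."""
    return sum(abs(p - q) for p, q in zip(predicted, actual)) / len(predicted)


def average_prediction_shift(before, after):
    """Average absolute change in prediction over all (user, item) pairs."""
    return sum(abs(after[key] - before[key]) for key in before) / len(before)


mae = mean_absolute_error([4, 3, 5], [5, 3, 3])
aps = average_prediction_shift(
    {("u1", "i1"): 3.0, ("u2", "i1"): 4.0},
    {("u1", "i1"): 3.5, ("u2", "i1"): 4.5},
)
```

An APS of 0 would mean that injecting the fake profiles did not change any prediction for the target items.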

Analysis of experimental results
In the mean absolute error experiment, the synthetic dataset is split into an 80% training set and a 20% test set. In the average prediction shift experiment, the performance of our approach under shilling attacks is tested by inserting fake user profiles. Two attack strategies are used: push attack and nuclear attack. In push attacks and nuclear attacks, the target items of the forged user profiles are assigned the highest and lowest ratings, respectively. The size of the forged profiles is an important factor in the performance of the recommendation algorithm. The experiment investigates the anti-attack capability of the system under attack scales of 1%, 2%, 5%, 10% and 20%.
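To make the attack scales concrete, the following Python sketch converts percentages into counts. It assumes the common convention that the attack scale is the number of fake profiles relative to the number of genuine users and that the filling scale is the number of filler items relative to the total number of items; the paper does not spell out these definitions, and the function name and the sizes in the example are hypothetical.

```python
def attack_sizes(n_genuine_users, n_items, attack_scale, filler_scale):
    """Return (number of fake profiles, number of filler items per profile)."""
    n_fake_profiles = round(n_genuine_users * attack_scale)
    n_filler_items = round(n_items * filler_scale)
    return n_fake_profiles, n_filler_items


# Attack scales from the experiment, applied to a hypothetical system
# with 1000 genuine users, 500 items, and a 10% filling scale.
plan = {scale: attack_sizes(1000, 500, scale, 0.10)
        for scale in (0.01, 0.02, 0.05, 0.10, 0.20)}
```

Under these assumptions, a 20% attack on the hypothetical system would inject 200 fake profiles, each rating 50 filler items in addition to the target item.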

Analysis of experimental results without shilling attacks
To verify the effectiveness of our method without attacks, we divide the synthetic dataset into the conventional 80% training set and 20% test set and perform experiments to answer RQ1 and RQ2.
(1) Answer RQ1. Our approach fuses two criteria to improve CFR, so the weights of the two criteria influence its effectiveness. Let α be the weight of the user similarity based on dynamic social tagging and β be the weight of the user rating similarity. We set the fusion weights to (α = 0.75, β = 0.25), (α = 0.6, β = 0.4), and (α = 0.35, β = 0.65) under the same experimental setting and compare the MAE of our approach with that of traditional CFR. The results on dataset SMovie are shown in Table 2, and those on dataset SLast.fm are shown in Table 3. Table 2 shows that, for the same number of neighbours on dataset SMovie, the recommendation accuracy of our approach is higher than that of traditional CFR. The recommendation accuracy varies with the choice of similarity weights, and our approach performs best when α = 0.6, β = 0.4. Similarly, Table 3 shows that on dataset SLast.fm the recommendation accuracy is best when α = 0.6, β = 0.4. Therefore, α = 0.6, β = 0.4 are taken as the similarity weights when comparing the recommendation performance of our approach with other CFRs.
(2) Answer RQ2. To answer this question, our approach is compared with traditional CFRs, including CFR based on scoring (called CFR-T), CFR based on social tags (called CFR-S), and CFR based on user background (called CFR-B). The results are shown in Figures 2 and 3. With the similarity weights α = 0.6, β = 0.4, the MAE curve of our approach on dataset SMovie is shown in Figure 2, and the MAE curve on dataset SLast.fm is shown in Figure 3. In Figure 2, when the number of nearest neighbours is less than 30, MAE decreases as the number of neighbours increases, indicating that the recommendation accuracy improves with more neighbours and that the recommendation quality is affected by the number of neighbours. When the number of neighbours is 30, the recommendation accuracy of the algorithm is the highest. As the number of neighbours increases beyond 30, the MAE curve of the proposed algorithm tends to be stable. Our approach improves the recommendation accuracy significantly compared with the other three recommendation algorithms. In Figure 3, on dataset SLast.fm, the recommendation accuracy of the algorithm also increases with the number of neighbours and gradually tends to be stable.

Analysis of experimental results with shilling attacks
To verify the effectiveness of our CFR-F approach with shilling attacks, we perform experiments to answer RQ3 and RQ4.
(1) Answer RQ3. To investigate the robustness of our approach, APS is used to evaluate the effectiveness of our CFR-F and the three CFR approaches from the above experiments when attack data of different sizes are injected. The closer APS is to 0, the less impact the attack has on the recommendation system. Figures 4-7 show the results for the push attack and the nuclear attack with a filling scale of 5% on datasets SMovie and SLast.fm, respectively.
Attack data at scales of 1%, 2%, 5%, 10%, and 20% are injected (larger scales are not considered because a large attack is easy to discover manually), and the change in the APS of the four recommendation algorithms is observed. Figures 8-11 show the APS changes after attack data at scales of 1%, 2%, 5%, 10% and 20% are injected in the push attack and the nuclear attack with a filling scale of 10% on datasets SMovie and SLast.fm. The APS values of the four recommendation algorithms increase with the attack scale. Therefore, the more attacking users there are, the lower the stability of the recommendation system. In addition, when attack data of the same scale are injected, the average prediction shift of our algorithm is nearly 50% lower than that of CFR-T, CFR-B and CFR-S, and it remains more stable as the attack scale increases. Even with a 10% filling scale and a 20% attack scale, the average prediction shift of our algorithm is still lower than 0.3. As shown in Figures 5, 7, 9 and 11, our CFR-F also performs well under the nuclear attack model, indicating that it has a strong anti-attack ability.

Conclusion and our future works
How to improve the reliability and accuracy of recommendation systems has become a very important research problem. A trust mechanism is a conventional means to resist attacks, but the workload of maintaining the dynamic trust relationships between users is large, and a trust relationship obtained by mining users' historical behaviour and scores risks being forged. Since it is difficult for attackers to grasp both the user interests modelled from social tags and the selection strategy that weights the influence of scenario attributes or attribute combinations on preferences, we propose an information resource recommendation approach based on fusing criteria against shilling attacks. Experimental results show that, compared with traditional recommendation methods, our approach can effectively improve the reliability and accuracy of recommendations. Despite its effectiveness, our approach requires more information because it fuses multiple-view data. In our future work, we will focus on how to collect multiple kinds of data to support our proposed methods.

Notes
1. https://grouplens.org/datasets/movielens/
2. https://grouplens.org/datasets/hetrec-2011/

Disclosure statement
No potential conflict of interest was reported by the author(s).