API recommendation for Mashup creation based on neural graph collaborative filtering

With the increasing number of open APIs on the Web, reusing or combining them to develop novel applications (e.g. Mashups) has attracted great interest from developers. However, quickly finding a suitable one among a huge number of APIs to meet a developer's requirement is a non-trivial issue, so a high-quality API recommendation system is desirable. Although a number of collaborative filtering methods have been proposed for API recommendation, their recommendation accuracy is limited and needs to be further improved. Based on the neural graph collaborative filtering technique, this paper proposes an API recommendation method that exploits the high-order connectivity between APIs and API users. To evaluate the proposed method, extensive experiments are conducted on a real API dataset, and the results show that the proposed method outperforms state-of-the-art methods in API recommendation.


Introduction
With the emergence of new software technologies, such as cloud computing, mobile computing and blockchain (Liang et al., 2019), APIs (Application Programming Interfaces) are playing an increasingly important role in software development. Driven by the API economy, many enterprises, such as Google, Amazon and Microsoft, have published open APIs (typically Web-based) that let third parties access their key resources in a programmable way. Consequently, the number of open APIs on the Internet grows rapidly. According to the latest statistics of the largest Web API portal, ProgrammableWeb.com, the number of open Web APIs has exceeded 24,000. With such abundant Web APIs, it has become very popular for developers to reuse or combine them to develop value-added services or new applications, such as Mashups (Tang et al., 2019a). Mashups are web applications that integrate multiple data sources or APIs into one interface (Zang & Rosson, 2008). The prevalence of Mashups has given rise to the API-Mashup ecosystem, which consists of APIs, Mashups, providers, developers, etc. Figure 1 is a toy example of the API-Mashup ecosystem built on ProgrammableWeb.com, which shows that Web APIs from different providers can be combined into Mashups by developers.
The rapid increase in the number of APIs, however, has posed serious challenges to efficient API discovery and reuse. In addition, since most Web APIs are described in plain text or HTML rather than a structured language, automatic API discovery becomes even harder. To address this issue, API recommendation has attracted considerable attention from academia, and a number of API recommendation methods have been proposed for automatic API discovery.
API recommendation usually relies on users' requirements and historical data of API invocations to estimate the probability of users choosing an API. As in many other recommender systems, Collaborative Filtering (CF) is very popular in API recommendation. Usually, a CF model has two key components: (1) an embedding module, which converts users and items into vector representations; (2) an interaction modelling module, which reconstructs historical interactions based on the embeddings of users and items. For example, matrix factorisation (MF) uses a latent feature vector of real values to represent each user and item, and models the interactions between users and items with the inner product (Koren et al., 2009). Collaborative deep learning extends the MF embedding function by learning deep representations of user/item preferences integrated with rich auxiliary information; neural collaborative filtering uses a multi-layer perceptron (MLP) to learn the user-item interaction function, which unifies the latent structures of user and item modelling (He et al., 2017a).
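As a concrete illustration of these two CF components, the following minimal sketch shows an MF-style embedding module and its inner-product interaction module. It is not code from any cited system; the sizes and random initial values are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, d = 4, 6, 3  # illustrative sizes, not from the paper

# Embedding module: one latent vector per user and per item.
P = rng.normal(size=(n_users, d))  # user latent factors
Q = rng.normal(size=(n_items, d))  # item latent factors

# Interaction module: MF predicts the preference of user u for item i
# as the inner product of their latent vectors.
Y_hat = P @ Q.T                     # all user-item scores at once
score_u0_i2 = float(P[0] @ Q[2])    # single prediction for (u0, i2)
```

In training, P and Q would be fitted to the observed interactions; here they only illustrate the structure of the model.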
Previous CF methods for API recommendation usually exploit only the direct interactions between users and APIs, but ignore the deeper collaborative signals behind the interactions. As such, most existing API recommendation methods may not be accurate enough in their embedding representations of users and APIs. To further improve the accuracy of API recommendation, this work exploits the high-order connectivity of the user-API bipartite graph and overcomes the shortcomings of previous embedding representations.
In summary, the contributions of this paper are as follows:
• We are the first to integrate high-order connectivity and CF for API recommendation, aiming at mining deeper collaborative signals from the messy interactions between users and APIs to assist API recommendation.
• Based on neural graph collaborative filtering, we propose a method that can effectively capture more semantic information from user-API interactions and thus improves the embedding representations for more accurate API recommendation.
• Through experiments on a dataset from ProgrammableWeb.com, we show that the proposed API recommendation method outperforms other state-of-the-art recommendation methods.
The rest of this article is organised as follows: Section 2 surveys related work on API recommendation. Section 3 introduces the motivation and research issue of this work. Section 4 presents the details of the proposed method. Section 5 describes the experimental evaluation and analysis. Finally, Section 6 concludes this paper and outlines future work.

Related work
Web API or service recommendation is crucial for users to efficiently find suitable APIs or services for their application development (Liang et al., 2021a; Tang et al., 2021). Previous API recommendation methods can be roughly divided into two categories: content-based and CF-based.
Content-based methods usually rely on the user's requirement and the functional description of a Web API to calculate their matching degree, and recommend the Web APIs with the highest matching degrees to the user. Gu et al. (2016) divide the description of the user's demand into sentence blocks with a discourse parser, and then calculate the similarity between the sentence blocks and the API's description to predict the user's preference score. Tang et al. (2019b) use TF/IDF to mine keywords in the API function description to expand the API tag set, and make API recommendations based on word-level similarity. In addition, Zhang et al. (2017a) use linguistic analysis to extend synonyms for keywords that match user needs. Li et al. (2013) adopt the topic model LDA to obtain the topic features of Web API descriptions and user requirements, and then accomplish the recommendation task by calculating the matching degree between the two sets of topics.
CF-based methods complete the API recommendation task by taking into account the historical data of other users and APIs, in addition to the target user and the target API. CF-based methods have been widely used for QoS-aware service or API recommendation (Liang et al., 2021b; Liu et al., 2016; Tang et al., 2016; Zhang et al., 2019a; Zhang et al., 2019b; Zheng et al., 2020), with a focus on recommending services or APIs with optimal QoS. Different from them, this paper focuses on recommending APIs that meet the user's functional requirements. In the following we survey related work addressing a similar issue. Cao et al. (2017) propose a CF-based API recommendation method which first calculates the similarity between user needs and Mashup applications based on text mining, and then recommends the APIs in the most similar Mashups to users. Yao et al. (2015) propose a CF method that integrates implicit API correlation regularisation and matrix factorisation for API recommendation. Fletcher (2019) incorporates the user's implicit preferences (i.e. invocation history) into a matrix factorisation model to improve the accuracy and diversity of recommendations. Ma et al. (2020) employ Node2Vec to decompose the Mashup-service invocation matrix into Mashup and service latent representations, and then integrate them to predict user preference scores of APIs. Xu et al. (2013) fuse multiple attributes of Mashups and APIs with the Mashup-API invocation matrix in matrix decomposition to derive the representations of users and APIs. Hao et al. (2017) incorporate API popularity into a CF-based model to recommend APIs based on the user's Mashup development needs.
There are also some studies which employ network or graph models to represent the data of users (or applications) and APIs, and based on which make API recommendations. For instance, He et al. (2017b) model the API cooperative relationships as a network, and formulate API recommendation as an optimal Steiner tree problem. Liang et al. (2016) construct a heterogeneous network by jointly using API, Mashup, tag and API provider attributes, and adopt meta-paths to calculate the similarity between Mashups.
Although previous CF-based methods have exploited direct or indirect connections between users and APIs for API recommendation, they ignore or fail to explore the high-order connectivity of users and APIs. This work fills this gap by exploring the high-order connectivity of the user-API bipartite graph.

Problem definition
The Web API recommendation task studied in this paper can be defined as follows: given a set of APIs, a set of API users (e.g. Mashup developers) and a set of invocation records of users on APIs, for any target user, recommend new APIs to him/her as accurately as possible.
To address the above problem, we first model the interactions between users and APIs as a bipartite graph, and then explore the high-order connectivity of the user-API bipartite graph for API recommendation. Figure 2 illustrates the research problem and motivation of this work. The left part of Figure 2 is a bipartite graph illustrating the interactions between APIs and API users. A link between a user u and an API i indicates that the user has used the API to create a Mashup. The right part of Figure 2 shows the high-order connectivity between APIs and users. For example, u_1 and i_1 have first-order connectivity since they are directly connected in the user-API bipartite graph, while u_2 and u_1 have second-order connectivity since the shortest path between them consists of two edges. The collaborative relationships between users and APIs are also referred to as collaborative signals. Typical CF-based recommendation systems usually adopt only the collaborative relationships between a user and its first-order neighbours to make recommendations. Therefore, they fail to exploit the second-order or higher-order collaborative signals between APIs and users, which are certainly useful. By extracting the necessary collaborative signals and explicitly encoding the key features into the API and user embeddings, we can optimise the user and API representations for the recommendation task.
Let us further use Figure 2 as an example to explain how to explore the high-order connectivity of the user-API bipartite graph. As Figure 2 illustrates, the high-order connectivity sub-graph originating from the active user u_1 has a tree-like structure, where l denotes the number of hops for the other users/APIs to reach u_1. In this setting, the preference scores of u_1 for APIs can be inferred. Intuitively, since there are two paths from u_1 to i_4 (i_4 → u_2 → i_2 → u_1 and i_4 → u_3 → i_3 → u_1), but only one path from u_1 to i_5 (i_5 → u_3 → i_3 → u_1), and all these paths have the same length l = 3, it is reasonable to infer that the interest of user u_1 in i_4 is probably greater than that in i_5. Besides the number of paths between a user and an API, path length also plays an important part in determining how likely the API will be used by the user. Although the one-hop, i.e. first-order, connections are the most important, high-order connections also carry rich semantic information. Therefore, we introduce them into the representations of APIs/users, thus optimising the node embeddings in the user-API bipartite graph.
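To make the path-counting intuition concrete, the toy graph above can be encoded as a binary interaction matrix, and the number of length-3 connections between a user and an API obtained with plain matrix algebra. The matrix below is our reconstruction of the Figure 2 example and is illustrative only:

```python
import numpy as np

# Toy user-API interaction matrix reconstructed from the Figure 2
# description: rows are users u1..u3, columns are APIs i1..i5.
R = np.array([
    [1, 1, 1, 0, 0],  # u1 used i1, i2, i3
    [0, 1, 0, 1, 0],  # u2 used i2, i4
    [0, 0, 1, 1, 1],  # u3 used i3, i4, i5
])

# In a bipartite graph every user-to-API walk has odd length; the number
# of length-3 walks from user u to API i is the (u, i) entry of R R^T R.
walks3 = R @ R.T @ R

paths_u1_i4 = walks3[0, 3]  # two routes: via i2-u2 and via i3-u3
paths_u1_i5 = walks3[0, 4]  # one route: via i3-u3
```

On this toy graph the length-3 walk counts coincide with the path counts discussed in the text, matching the intuition that u_1 is more likely interested in i_4 than in i_5.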

The proposed method
Our proposed Web API recommendation method is inspired by the neural graph collaborative filtering model (Wang et al., 2019). The model of the proposed method is overviewed in Figure 3. It mainly consists of two components: (1) the embedding propagation layers, which initialise the embeddings of APIs and users and then refine them by propagating messages along the high-order connectivity between APIs and users; (2) the API prediction layer, which integrates the embeddings from the multiple propagation layers and calculates the user's preference scores for all candidate APIs. The details of the proposed method are presented as follows.

Embedding propagation layer
The user and API embedding vectors are represented as e_u ∈ R^d and e_i ∈ R^d respectively, where d denotes the size of the embedding vector. In this setting, we construct a parameter matrix as a lookup table for the embeddings:

E = [e_{u_1}, ..., e_{u_N}, e_{i_1}, ..., e_{i_M}] (1)

where N and M denote the numbers of users and APIs respectively. It is worth mentioning that this lookup table serves as the initial state of the API and user embeddings, so that it can be optimised in an end-to-end way.
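A minimal sketch of such a lookup table, assuming randomly initialised embeddings and illustrative sizes (in a full implementation E would be a trainable parameter of the model):

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_apis, d = 3, 5, 8  # illustrative sizes, not taken from the paper

# Parameter matrix E stacking user embedding rows first, then API rows.
# It is the layer-0 state of all embeddings and would be optimised
# end-to-end together with the rest of the model.
E = rng.normal(scale=0.1, size=(n_users + n_apis, d))

def lookup(user_id=None, api_id=None):
    """Fetch an embedding row; API rows are offset by the number of users."""
    return E[user_id] if user_id is not None else E[n_users + api_id]
```

Storing users and APIs in one matrix matches the matrix-form propagation used later, where all embeddings are updated together.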
Next, based on the message passing structure of GNNs (Xu et al., 2018), we plan to capture the collaborative signals along the graph structure and optimise the embedding of users and APIs. We first introduce the propagation rules with only one layer, and then extend it to multiple stacked layers.

First-order propagation
In the API recommendation scenario, an API called by a user can be regarded as a collaborative feature of the API, which can be used to measure the similarity between different APIs. Based on this consideration, our embedding propagation can be performed between connected users and APIs, and the process can be formulated through the two main operations of message construction and message aggregation.
Message Construction: For a user u and an API i with an interaction (u, i), we define the message from i to u as:

m_{u←i} = f(e_i, e_u, p_{ui}) (2)

where m_{u←i} denotes the message embedding (i.e. the information propagated from the API to the user), f(·) is a message encoding function taking the two embeddings and a coefficient as input, and the coefficient p_{ui} controls the decay factor of each propagation on the edge (u, i).
In this paper, we implement f(·) as:

m_{u←i} = p_{ui} (W_1 e_i + W_2 (e_i ⊙ e_u)) (3)

where W_1 and W_2 are trainable weight matrices used to extract useful propagation information, and ⊙ denotes the element-wise product. The traditional graph convolutional network considers only the contribution of e_i; different from it, we encode the interaction between e_i and e_u into the message through e_i ⊙ e_u. This operation makes the message depend on the affinity between e_i and e_u, so that more information is conveyed from similar APIs, which improves not only the representation ability of the model but also the recommendation performance.
Following the parameter settings of the graph convolutional network (Kipf & Welling, 2016), we set p_{ui} to the graph Laplacian norm 1/√(|N_u| |N_i|), where N_u and N_i denote the first-hop neighbours of user u and API i respectively. From the perspective of representation learning, p_{ui} reflects how much the historical item contributes to the user preference. From the perspective of message passing, p_{ui} can be interpreted as a discount factor, since the messages being propagated should decay with the path length.
Message Aggregation: In this step, the information of the neighbour nodes of user u is integrated into a new representation of u through an aggregation operator. The aggregation function is defined as:

e_u^(1) = LeakyReLU( m_{u←u} + Σ_{i∈N_u} m_{u←i} ) (4)

where e_u^(1) denotes the embedding representation of u obtained after the first embedding propagation layer, and LeakyReLU(·) is the activation function, which alleviates the vanishing-gradient problem and speeds up convergence. In addition to the messages propagated from the neighbours N_u to user u, we also consider the self-connection of u:

m_{u←u} = W_1 e_u (5)

which allows the original feature information to be preserved. Similarly, we can obtain the representation e_i^(1) of API i by gathering information from its connected users. As can be seen, the embedding propagation layer explicitly uses first-order connectivity information to associate user and API representations.
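The message construction and aggregation steps for a single user can be sketched as follows. This is an illustrative NumPy version, not the authors' implementation, and the LeakyReLU slope of 0.2 is an assumption, as the text does not state its value:

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    # slope 0.2 is an assumed value; the paper does not specify it
    return np.where(x > 0, x, slope * x)

def propagate_user(e_u, neighbor_embs, neighbor_degs, deg_u, W1, W2):
    """One first-order propagation step for a single user u: the
    self-connection message W1 e_u plus the decayed messages
    p_ui (W1 e_i + W2 (e_i * e_u)) from each neighbouring API,
    passed through LeakyReLU."""
    agg = W1 @ e_u  # self-connection message m_{u<-u}
    for e_i, deg_i in zip(neighbor_embs, neighbor_degs):
        p_ui = 1.0 / np.sqrt(deg_u * deg_i)  # graph Laplacian discount
        agg = agg + p_ui * (W1 @ e_i + W2 @ (e_i * e_u))
    return leaky_relu(agg)  # user representation after one layer
```

The symmetric operation over a user's neighbourhood yields the API-side update in the same way.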

High-order propagation
Here we explore higher-order connectivity information by stacking multiple embedding propagation layers. The high-order connectivity between users and APIs is important for encoding the collaborative signals used to calculate the correlation scores.
After stacking l embedding propagation layers, a user or an API can receive the messages propagated from its l-hop neighbours. As shown in Figure 2, in the l-th layer, the representation of user u can be recursively defined as:

e_u^(l) = LeakyReLU( m_{u←u}^(l) + Σ_{i∈N_u} m_{u←i}^(l) ) (6)

where the messages being propagated are defined as:

m_{u←i}^(l) = p_{ui} ( W_1^(l) e_i^(l−1) + W_2^(l) ( e_i^(l−1) ⊙ e_u^(l−1) ) ),  m_{u←u}^(l) = W_1^(l) e_u^(l−1) (7)

where W_1^(l) and W_2^(l) are trainable transformation matrices, and e_i^(l−1) is the low-dimensional representation of the API generated by the previous message-passing step, which stores the messages from its (l−1)-hop neighbours and helps to further obtain the representation of user u at layer l. Similarly, we can obtain the representation of API i at layer l.
Based on the high-order connectivity representation between users and APIs (as shown in Figures 2 and 3), we can capture collaborative signals like u_1 ← i_3 ← u_3 ← i_5 during the embedding propagation process, so that the message from i_5 is explicitly encoded in e_{u_1}^(3). In this way, collaborative signals can be seamlessly injected into the representation learning process by stacking multiple embedding propagation layers.
Rule of Matrix Propagation: To clarify the embedding propagation and enable batch training, we provide the matrix form of the layer-wise propagation rule, which is equivalent to formulas (6) and (7):

E^(l) = LeakyReLU( (L + I) E^(l−1) W_1^(l) + ( L E^(l−1) ⊙ E^(l−1) ) W_2^(l) ) (8)

where E^(l) denotes the embeddings of all users and APIs after l propagation steps, and I is the identity matrix. In the initial message-passing phase, we set E^(0) = E, i.e. the initial representations of users and APIs are e_u^(0) = e_u and e_i^(0) = e_i. The notation L denotes the Laplacian matrix of the user-API graph:

L = D^(−1/2) A D^(−1/2), A = [[0, R], [R^T, 0]] (9)

where D is the diagonal degree matrix whose t-th diagonal element is D_tt = |N_t|, A is the adjacency matrix, and R is the user-API interaction matrix. Thus, the non-zero off-diagonal entry L_ui = 1/√(|N_u| |N_i|) equals the coefficient p_{ui} used in Equation (3). With the above matrix propagation rule, the representations of all users and APIs can be updated at the same time, and it allows us to discard the node sampling procedure usually required when training graph convolutional networks on large graphs (Berg et al., 2017).
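The Laplacian construction and the matrix-form propagation rule can be sketched as follows, assuming dense NumPy matrices for clarity (a real implementation would use sparse operations):

```python
import numpy as np

def build_laplacian(R):
    """L = D^{-1/2} A D^{-1/2}, with A = [[0, R], [R^T, 0]] the adjacency
    matrix of the user-API bipartite graph and D its degree matrix."""
    n_u, n_i = R.shape
    A = np.block([[np.zeros((n_u, n_u)), R],
                  [R.T, np.zeros((n_i, n_i))]])
    deg = A.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, deg, 1.0) ** -0.5  # guard isolated nodes
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def propagate_matrix(E, L, W1, W2, slope=0.2):
    """One layer of the matrix-form rule
    E' = LeakyReLU((L + I) E W1 + (L E * E) W2),
    updating all user and API embeddings at once."""
    out = (L + np.eye(L.shape[0])) @ E @ W1 + ((L @ E) * E) @ W2
    return np.where(out > 0, out, slope * out)
```

For any edge (u, i), the corresponding entry of L is 1/√(|N_u| |N_i|), which matches the per-edge discount coefficient used in the element-wise form.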

API Prediction layer
After l layers of message propagation, we obtain multiple representations of user u, denoted by {e_u^(1), ..., e_u^(l)}. Since the representations obtained at different layers emphasise the messages delivered over different connections, they play different roles in reflecting user preferences. Therefore, we concatenate the representations of the different layers to form the final embedding of the user, and perform the same operation on the API to obtain its final embedding:

e_u* = e_u^(0) || ... || e_u^(l),  e_i* = e_i^(0) || ... || e_i^(l) (10)

where || is the concatenation operator. In this way, not only are the initial embeddings enriched, but the propagation range can also be controlled by adjusting the number of propagation layers l. The advantage of the concatenation operation has also been demonstrated in previous work on graph neural networks (Xu et al., 2018). Besides being simple to use and easy to understand, it involves no additional parameters to learn and plays a crucial role in the layer-aggregation mechanism.

After obtaining the final embeddings of users and APIs according to Equation (10), we use them for recommendation score prediction. Since our focus is on optimising the embedding learning by exploiting high-order connectivity, we simply adopt the inner product for preference prediction:

ŷ(u, i) = (e_u*)^T e_i* (11)

For model parameter training, we use the pairwise BPR loss function, which is widely applied in recommender systems (Rendle et al., 2012). Specifically, BPR assumes that observed interactions should be assigned higher predictive values than unobserved ones, as they better reflect the user's preferences. The objective function is defined as:

Loss = Σ_{(u,i,j)∈O} −ln σ( ŷ_{ui} − ŷ_{uj} ) + λ ||Θ||_2^2 (12)

where O = {(u, i, j) | (u, i) ∈ R^+, (u, j) ∈ R^−} denotes the pairwise training data, in which R^+ is the set of observed interactions and R^− the set of unobserved interactions.
Notation σ(·) denotes the sigmoid function, Θ represents all trainable model parameters, and λ controls the strength of the L2 regularisation to prevent overfitting. We use mini-batch Adam (Kingma & Ba, 2014) to train the prediction model and update the model parameters.
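Putting the prediction layer together, the following sketch shows the layer concatenation, the inner-product score, and the BPR term for a single (u, i, j) triple. It is an illustrative NumPy version, not the authors' implementation:

```python
import numpy as np

def final_embedding(layer_embs):
    """Concatenate a node's layer-0..l representations (the || operator)."""
    return np.concatenate(layer_embs)

def predict(e_u_star, e_i_star):
    """Inner-product preference score for a (user, API) pair."""
    return float(e_u_star @ e_i_star)

def bpr_loss(y_pos, y_neg, params, lam=1e-5):
    """Pairwise BPR term for one (u, i, j) triple, -ln sigmoid(y_ui - y_uj),
    plus L2 regularisation over all trainable parameters; lam = 1e-5
    matches the best value reported in the hyperparameter study."""
    sigma = 1.0 / (1.0 + np.exp(-(y_pos - y_neg)))
    l2 = sum(float((p ** 2).sum()) for p in params)
    return -np.log(sigma) + lam * l2
```

In training, the loss would be summed over mini-batches of sampled triples and minimised with Adam.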

Dataset description
In our experiments, we use the most popular online Web API repository, Programmableweb.com (PW for short), to evaluate our method. PW is a website that collects meta-data about Web APIs and the corresponding applications (e.g. Mashups) that use them. We crawled all Web APIs and Mashups from PW, and analysed the interactions between Web APIs and API users (Mashups). The dataset includes 21,900 APIs, 6,435 Mashups and 13,340 interactions between Mashups and APIs. Some details of our experimental dataset are shown in Table 1. For the sake of evaluation, we remove the API users (i.e. Mashups) that have only one API invocation in the dataset. Eventually, 80% of the interaction records are used as the training set, and the remaining 20% are used as the test set.

Evaluation metrics
For each user in the test set, we treat every API that the user has not invoked as a negative item. Each model outputs the user's preference scores for all APIs. To estimate the effectiveness of top-K recommendation and user preference ranking, we use two evaluation metrics, Recall@K and nDCG@K, which have been widely used in various recommendation systems (Yang et al., 2018). In this experimental evaluation, we set K = 5, 10, 15, 20, 25, respectively. After calculating the evaluation metrics on our test set, the metrics are averaged over all users.
Recall@K is the proportion of the APIs actually used by the user that appear in the top-K recommendation list. It can be defined as:

Recall@K = |{APIs actually used by the user} ∩ {top-K recommended APIs}| / |{APIs actually used by the user}|

nDCG@K assigns a different weight to each API in the top-K recommendation list, with higher-ranked APIs receiving larger weights. One of its frequently used definitions is:

nDCG@K = DCG@K / IDCG@K,  DCG@K = Σ_{i=1}^{K} rel(i) / log2(i + 1)

where rel(i) is a binary value indicating whether the user has actually used the i-th candidate API: if true, rel(i) = 1, otherwise rel(i) = 0. IDCG@K is the ideal DCG@K obtained by ranking the c APIs that are truly used by the user at the top of the candidate list, where c is the number of such APIs in the top-K candidate list. Normalising DCG@K by IDCG@K yields a measure of the recommendation accuracy.
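The two metrics for a single user can be sketched as follows, assuming a ranked list of API ids and the set of APIs the user truly used; the results would then be averaged over all test users:

```python
import numpy as np

def recall_at_k(ranked, relevant, k):
    """Share of the user's truly-used APIs recovered in the top-K list."""
    hits = len(set(ranked[:k]) & set(relevant))
    return hits / len(relevant)

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance nDCG@K: positionally discounted gain over the
    top-K list, normalised by the ideal ordering (all hits ranked first)."""
    rel = [1.0 if item in relevant else 0.0 for item in ranked[:k]]
    dcg = sum(r / np.log2(pos + 2) for pos, r in enumerate(rel))
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / np.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0
```

Note that `pos + 2` appears because Python positions are 0-based while the formula's rank i starts at 1, giving the usual log2(i + 1) discount.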

Baseline methods
In order to validate the effectiveness of our proposed method, we choose the following methods for comparison:
• MF (Koren et al., 2009): Matrix factorisation is a ranking-oriented recommendation algorithm proposed for implicit feedback scenarios. This method explores the direct interactions of nodes (first-order connectivity) on the user-API bipartite graph.
• NeuMF (He et al., 2017a): This method adopts a deep neural network to implement CF, using a multi-layer perceptron (MLP) to learn user-API interaction features and modelling the implicit feedback (first-order connectivity) of user-API pairs with the linear capability of MF. Here we use a two-layer architecture with a fixed hidden-layer size.
• GCN (Kipf & Welling, 2016): This method adopts a local first-order approximation of spectral convolution to build the convolutional network structure over users and APIs. Each node thus contains the information of its direct neighbours, and the information of 2-hop neighbours is included when the next layer is computed.
• GC-MC (Berg et al., 2017): This recommendation method employs a user-item bipartite graph and uses a graph auto-encoder framework to make recommendations from the perspective of link prediction.

Experimental results
In this subsection, we compare the performance of our method with that of the baseline methods mentioned previously. The experimental results are shown in Tables 2 and 3, and we explain them as follows:
• According to Tables 2 and 3, our method consistently performs better than the baselines in all cases. More specifically, in terms of Recall@K, our method improves over the best baseline by 7.22-26.10%; in terms of nDCG@K, it improves over the best baseline by 20.96-36.77%. A user usually needs fewer than 5 APIs to create a Mashup; thus, as the number of recommended APIs increases, the improvement in recall gradually decreases. The experimental results indicate that exploiting high-order connectivity information can significantly improve recommendation performance.
• We can also observe that MF and NeuMF have the worst performance. This indicates that simply using the inner product in MF models is not enough to exploit the deep interactions between APIs and users. GCN performs better than MF and NeuMF because it explicitly encodes the connections between users and APIs in embedding learning. Among all baselines, GC-MC has the best performance, which indicates that introducing the first-order neighbourhood and a differentiable graph auto-encoding framework can improve the embeddings of users and APIs to some extent.

Hyperparameters analysis
In this subsection, we discuss the impact of the model's hyperparameters on recommendation performance. We fix the other parameters and vary only the hyperparameter under study in each experiment. The hyperparameters include the user/API embedding size, the L2 regularisation factor λ, the node dropout rate, and the message dropout rate used during training. Figure 4 shows the impact of the embedding size of users and APIs on Recall@25. We can observe that increasing the embedding size initially improves the recommendation performance. More specifically, when the embedding size grows from 16 to 32, Recall@25 increases from 0.5259 to 0.5338. However, when the embedding size exceeds 32, the Recall@25 values begin to decline rapidly. This observation indicates that a moderate embedding size provides enough information storage space during training. If the embedding size is too small, the embeddings may lose information about some users or APIs. On the contrary, if the embedding size is too large, it may cause information redundancy and increase the time overhead of model training.
Node dropout and message dropout techniques can be used to prevent model overfitting (Berg et al., 2017). Figure 5 shows the impact of the node dropout ratio r1 and the message dropout ratio r2 on Recall@25. Of the two dropout strategies, node dropout performs better than message dropout in most cases, as we can observe from the figure. Moreover, node dropout seems to have less impact than message dropout on Recall@25 when their values are varied. For example, when the node dropout ratio increases from 0 to 1, Recall@25 only changes slightly within a small range, with no consistent tendency. In contrast, when the message dropout ratio increases to 0.6, the value of Recall@25 declines significantly; when the ratio continues to increase beyond 0.6, the Recall@25 value rises again. Overall, smaller dropout ratios appear more appropriate for the proposed method.
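The two dropout strategies can be sketched as follows, assuming dense matrices for clarity; the rescaling by 1/(1 - rate) follows standard dropout practice and is an assumption about the exact implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def node_dropout(L, rate):
    """Drop whole nodes: zero every entry of the normalised adjacency
    incident to a sampled node set, rescaling kept entries."""
    keep = rng.random(L.shape[0]) >= rate
    return L * np.outer(keep, keep) / (1.0 - rate)

def message_dropout(M, rate):
    """Drop individual propagated messages (matrix entries) independently."""
    mask = rng.random(M.shape) >= rate
    return M * mask / (1.0 - rate)
```

Node dropout removes all messages flowing through a node at once, while message dropout thins out individual edges' contributions, which is consistent with the differing sensitivities observed in Figure 5.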
Besides the embedding size and the dropout ratios, the most important hyperparameter is the coefficient λ of the L2 regularisation. Regularisation is a commonly used technique in machine learning whose main purpose is to control model complexity and reduce overfitting. The basic approach is to add a penalty term to the original objective (cost) function to "punish" models with high complexity. Here, we explore the impact of different λ values on model performance. As shown in Figure 6, when λ is set to 1e-5, Recall@25 reaches its optimal value. When λ is greater than 1e-5, the performance of API recommendation declines rapidly, which indicates that excessive regularisation adversely affects the normal training of the model.

Conclusion and future work
This paper proposes an API recommendation method based on neural graph collaborative filtering, which exploits the historical user-API interactions. The key to the proposed method lies in the embedding propagation, which mines the deeper interactions between users and APIs to obtain richer collaborative signals, thereby improving the embedding representations of APIs and users. The experiments on a real dataset demonstrate that introducing the user-API interaction graph into the embedding learning and extracting more collaborative signals between users and APIs can indeed improve the performance of API recommendation.
In the future, we will consider integrating content and connections of APIs to recommend APIs. We will also consider incorporating the QoS of APIs (such as reputation) into API recommendation.