A pricing model for subscriptions in data transactions

With the increasing demand for data, subscription schemes have emerged to price an extensive and unfixed number of data items. However, in the existing subscription scheme, the diversity of customers in a real market may make the scheme unstable, risking a failure to set prices. Additionally, the existing scheme imposes an arbitrage-free requirement, an essential economic concept, in a form that is not reasonable for data items. To address these problems, this paper designs an improved subscription scheme with two components: the calculation and the specific validity. On the one hand, the calculation improves on the existing scheme by building a new structure that combines different customers' behaviours, instead of calculating the profit from each type separately, and it can steadily set subscription prices that maximise the sellers' profit even in a real market. On the other hand, the specific validity improves the treatment of arbitrage-freeness by taking the characteristics of data subscriptions into account, which makes the scheme more rational.


Introduction
As data become more widely used, the demand for data transactions increases, and so does the demand for pricing models for data transactions. In recent years, many large platforms providing data transaction services have become indispensable, such as FACTUAL (2021), crunchbase (2021), Hu et al. (2021), Fernandez et al. (2020), SDTE from Dai et al. (2020) and GBDEX (2021). Different demands for data transactions emerge on these platforms, creating the need for pricing models. For example, individuals may plan to buy a definite and limited number of data items each time, so pricing every single item precisely is necessary. Under precise data pricing schemes, e.g. query-based models and game-theoretic methods, individuals obtain the precise price only after an inefficient and unintuitive calculation. Meanwhile, large organisations, such as firms and research institutes, may have continuous and extensive demands over a long period, e.g. a potential demand of 200-300 items each year. When choosing a data supplier to cooperate with, they may focus mainly on the total expenditure rather than on the cost of any specific item, and they are not willing to wait for an unintuitive and inefficient pricing process again and again. Therefore, an intuitive and efficient pricing strategy for continuous and extensive demand is needed; such a strategy is commonly called the subscription model.
Generally speaking, a subscription scheme is a strategy that tells customers intuitively how much they should pay for a chosen number of items. Specifically, the subscription scheme has three main features. Firstly, a subscription model usually fixes the total price for a given number of data items in advance and ignores which items the customers purchase. Thus, the subscription model is efficient and intuitive, which makes it completely different from precise pricing schemes. Secondly, to set the price, the subscription model takes the behaviours of customers and sellers into account: because the model ignores the differences among items, whether customers and sellers can reach an agreement under the scheme determines the sellers' profit. Thirdly, the subscription scheme is usually multi-stepped, and customers can choose the step they need and pay for it. Each step contains a price and an upper bound on the number of items customers can purchase.
The features above bring new and essential challenges that have not yet been solved together. Firstly, a well-designed scheme should set the multi-stepped model efficiently and intuitively, telling customers the price to pay in advance. Secondly, since the scheme ignores which items customers actually choose, it must incorporate the behaviours of customers and sellers into pricing. How can the agreement be guaranteed to maximise the sellers' profit? Thirdly, the pricing scheme should work steadily in a real market despite the various kinds of customers. Finally, the subscription model should be arbitrage-free to be reasonable, which is a basic but essential economic requirement for pricing. Given the characteristics of data transactions, what kind of arbitrage-freeness data subscription schemes should satisfy, and how to achieve it, remain open challenges.
To overcome these challenges, this paper first analyses the existing subscription scheme. The existing mainstream subscription scheme, proposed by Kushal et al. (2011), models the behaviours of customers and sellers and thereby overcomes the first two challenges. However, two challenges remain. First, because it calculates the profit from each type of customer separately, its calculation may miss the maximum profit, so it may not work steadily in many real conditions; the detailed analysis is given in Section 4. Second, its arbitrage-free requirement is not reasonable enough for subscription models in data transactions, where the features of data products should be considered.
To address these two remaining challenges, this paper improves the existing scheme into a new strategy with two components: the calculation and the specific validity. By combining the behaviours of all types of customers in advance, the new calculation fixes the shortcomings of the separated calculation, and the analysis shows why it works steadily. Considering the irregularity of steps in data subscriptions, this paper proposes a new arbitrage-free standard, demonstrates that the standard is reasonable, and then presents an operation to reach it.
In summary, this paper has three main contributions as follows.
(1) The problems in existing subscription models are analysed, and a way to improve them is identified.
(2) A combined calculation is put forward to build a new subscription model that targets different kinds of customers. In this way, the model works steadily.
(3) This paper proves why the common arbitrage-free requirements do not fit data items. A new arbitrage-free standard for data subscriptions is proposed, and the way to reach the standard is also presented.

Related work
Although Muschalle et al. (2012) and Balazinska et al. (2011) have studied the requirements of data pricing, to the best of our knowledge there is still no mainstream pricing scheme. Different schemes have different focuses, and their advantages and disadvantages vary. These schemes can be roughly divided into two groups: traditional methods and modern methods. Some large platforms, e.g. GBDEX (2021), still use traditional methods, such as auctions and bargaining, to reach an agreement. Although some research, e.g. Zhao et al. (2020), has attempted to improve the efficiency of data auctions, the traditional methods remain relatively inefficient.
The modern pricing schemes can be divided into two types according to their purpose: those that price every item precisely and those that price items in total.
The schemes that price data items one by one fall into two main families: query-based models (Tang et al., 2013, 2015; Zhang et al., 2020) and methods based on game theory (Bataineh et al., 2021; Liu et al., 2019; Niyato et al., 2016; Xu et al., 2020). Koutris et al. (2012) first set out the requirements for the query-based model. Since then, some data markets, e.g. crunchbase (2021), have adopted models that offer the final price after a query. First, the data provider defines some views with prices in its local data warehouse. Then, every query creates an undefined new view, which must be decomposed into a combination of defined views, and the price of the new view is derived from that combination. Tang et al. (2015) evaluate the scheme and conclude that it can price even a single tuple precisely. Because decomposing the new view into a combination of defined views is an NP-hard combinatorial problem, Tang et al. (2013) propose a method, named Minicon, that accelerates the computation so that the price is returned in time. Moreover, Zhang et al. (2020) extend Minicon to fit their IoT data market. Query-based models work well in pricing each query precisely, but they also have many disadvantages. For instance, these models seriously limit the scale of data trading: although Tang et al. (2013) and Zhang et al. (2020) improve them, computing the price still takes a lot of time. Nowadays, many purchasers have continuous and extensive demands for data transactions and are not willing to wait for hundreds of pricing computations. Additionally, the defined views may not cover all possible views; in other words, a new view may not be decomposable into a combination of the defined views, and customers may have to wait for remedies. Niyato et al. (2016) offer a scheme based on game theory, the Stackelberg model.
This scheme divides participants into three groups, namely, the data exchanges, data vendors and data purchasers. When the data exchange obtains data from the vendors, machine learning is used to score the quality of every dataset. Then, using the Stackelberg model, the exchange determines a final price for the dataset. This scheme aims to maximise the difference between the revenue the exchange gets from customers and the cost it pays to vendors. In other words, the main difference from other schemes is that this scheme maximises the interests of the exchanges instead of those of the sellers. Besides, the Stackelberg model is used by Xu et al. (2020) to build a blockchain-based data platform for car sharing and by Liu et al. (2019) to set a two-step data pricing scheme. However, these schemes face a serious challenge: before these exchanges gain dominance, why would vendors agree to supply data if they cannot obtain the largest profit? And why would purchasers and vendors trust a third party instead of trading with each other directly? Furthermore, these models cannot give a total price for many data products to serve continuous and extensive demand. Finally, these models take customers' willingness into consideration but still ignore the variety of customers.
Considering that many purchasers have a constant demand for data, the price of every single item is no longer the most important issue. As a consequence, many data markets use subscription schemes that price data items in total. These schemes can replace the precise strategies as well as supplement them. In a subscription model, purchasers choose a step and acquire as many items as they wish, up to the step's upper bound. However, the original subscription models were usually experience-based rather than scientifically grounded. Kushal et al. (2011) then give a multi-stepped model that maximises the sellers' profit, together with an operation that makes the model valid, and Chen et al. (2019) give an example of providing customers with subscription services. Unfortunately, when facing a real, complex market with various customers, the scheme of Kushal et al. (2011) does not work steadily. Furthermore, its standard and operations for arbitrage-freeness are not reasonable enough for data transactions, which also limits its use in real markets.
Because of the problems of the different pricing models, designing a good pricing scheme for extensive and continuous demand in a real market is challenging. This paper presents a subscription model that works steadily to maximise the sellers' profit. Moreover, by considering the characteristics of data transactions, it also makes the pricing model fit arbitrage-freeness better.

Basic framework
This section introduces the common basis of subscription schemes in data pricing from Kushal et al. (2011). Since none of these basics is changed in this paper, the section is kept brief. Three basic aspects of data subscriptions are presented, namely, the basic assumptions, the roles that the participants play, and the requirements for the pricing model.

Assumptions
Here are the assumptions of the pricing model in subscription schemes.
(1) The pricing model depends only on the number of items customers select, not on which items they select. In other words, the price is determined by the purchased amount alone.
(2) The basic price of each item is determined independently before modeling.
(3) The seller incurs a cost for producing the dataset only once, when it is published on the data market; there is no additional cost each time it is sold.
(4) All customers of the same kind share the same CWTP function. This can be justified by averaging over the customers of each kind.

Roles in subscription models
In common subscription models, the roles of participants can be roughly divided into three categories: sellers, customers and data markets.
(1) Sellers: The sellers allow customers to browse the data items and choose what they want. In any event, sellers aim to make their profit as large as possible.
(2) Customers: Customers are those who are interested in buying data. Generally, they agree to a deal only when the price the seller asks for is lower than their subjective threshold. In a real market, the willingness of customers may vary.
(3) Data markets: A data market brings sellers and customers together. Because the commission of the data market is a constant, it does not affect the pricing scheme, and the sellers set prices independently.

Requirements
To maximise the sellers' profit, a well-designed pricing scheme should meet several requirements. Let $P(n)$ denote the pricing model, an increasing function of $n$, where $n$ is the number of items a customer decides to buy.
(1) Arbitrage-free: A pricing model should be arbitrage-free, an important concept in economics described in Section 6.
(2) Diminishing returns: The price should grow sub-linearly with the purchase amount, i.e. $P(n)/n$ is a non-increasing function of $n$.
(3) Consumer buying power: The pricing model should capture and closely follow the buying power of the customers, i.e. it should represent customers' maximum willingness to pay.
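The two requirements that are directly checkable, a non-decreasing price and diminishing returns, can be tested on a finite grid of purchase amounts. The sketch below is illustrative only: the helper name `satisfies_requirements` and the two sample price functions are hypothetical, a grid check can expose a violation but cannot prove the property for every $n$, and arbitrage-freeness and consumer buying power are not checked.

```python
import math
from typing import Callable, Dict, Iterable


def satisfies_requirements(price: Callable[[float], float],
                           ns: Iterable[float]) -> Dict[str, bool]:
    """Check, on a finite grid of purchase amounts n > 0, whether P(n) is
    non-decreasing and has diminishing returns (P(n)/n non-increasing)."""
    grid = sorted(ns)
    prices = [price(n) for n in grid]
    per_item = [p / n for p, n in zip(prices, grid)]
    return {
        "non_decreasing": all(p1 <= p2 for p1, p2 in zip(prices, prices[1:])),
        "diminishing_returns": all(u1 >= u2 for u1, u2 in zip(per_item, per_item[1:])),
    }


grid = range(1, 501)
print(satisfies_requirements(lambda n: 3.0 * math.sqrt(n), grid))  # concave price
print(satisfies_requirements(lambda n: 0.01 * n ** 2, grid))       # super-linear price
```

A concave price such as $3\sqrt{n}$ passes both checks, while a super-linear price such as $0.01n^2$ is increasing but violates diminishing returns.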

The calculation part of subscription schemes
This part introduces the existing subscription scheme and our improvements to it. Section 4.1 first presents the basic modelling framework on which both Kushal et al. (2011) and this paper are based. The second subsection then shows that, although Kushal et al. (2011) claim that their calculation works in a complex market, it is still unstable in many common conditions. Therefore, designing subscription pricing schemes for different types of customers remains a problem. After that, our improved subscription scheme that solves this problem is presented. Finally, the existing scheme and the proposed model are compared.

Basic modeling
Given the three roles above, two functions are established to characterise the behaviours of the sellers and the customers.

Cumulative willingness to pay
Cumulative willingness to pay (CWTP(n)) is defined as the average maximum willingness of customers to pay for a total of n items, which has the following properties.
In a real market, there are usually different kinds of customers. According to Choudhary (2010), customers for information goods can be divided into two types, type c and type d. Consequently, two CWTP functions, $\mathrm{CWTP}_c$ and $\mathrm{CWTP}_d$, characterise customers' behaviours. When a type c customer wants to buy one more item, the additional amount they are willing to pay is fixed; in other words, $\mathrm{CWTP}_c$ is linear. In contrast, type d customers have a declining marginal willingness, with a sub-linear $\mathrm{CWTP}_d$. Let $k$ be the proportion of type c customers and $(1-k)$ the proportion of type d customers. For example, $\mathrm{CWTP}_c$ and $\mathrm{CWTP}_d$ generally take the forms $\mathrm{CWTP}_c(n) = wn$ and $\mathrm{CWTP}_d(n) = a\sqrt{n}$. With customers divided into two types, the functions look clear. Section 4.2 illustrates why setting the subscription scheme in the presence of two CWTP functions remains unsteady in the existing work.
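The difference between the two customer types is easiest to see in the marginal willingness to pay for one more item. The sketch below assumes the linear form $wn$ for type c and the power form $an^x$ for type d (the general form adopted later in this section, with $x = 1/2$ as the basic case); the function names and the parameter values $w$, $a$, $k$ are hypothetical and chosen only for illustration.

```python
def cwtp_c(n: float, w: float) -> float:
    """Type c: linear CWTP; every extra item is worth the same amount w."""
    return w * n


def cwtp_d(n: float, a: float, x: float = 0.5) -> float:
    """Type d: sub-linear CWTP a * n**x with 0 < x < 1 (x = 0.5 gives a * sqrt(n))."""
    return a * n ** x


def cwtp_average(n: float, k: float, w: float, a: float, x: float = 0.5) -> float:
    """Average willingness when a fraction k of customers is type c."""
    return k * cwtp_c(n, w) + (1 - k) * cwtp_d(n, a, x)


def marginal(f, n: int) -> float:
    """Extra willingness to pay for the (n + 1)-th item."""
    return f(n + 1) - f(n)


w, a, k = 2.5, 20.0, 0.4  # illustrative values only
print([marginal(lambda n: cwtp_c(n, w), n) for n in (1, 10, 100)])  # constant
print([marginal(lambda n: cwtp_d(n, a), n) for n in (1, 10, 100)])  # declining
print(cwtp_average(100, k, w, a))
```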

Pricing model
The pricing model gives the price that sellers ask customers to pay: $P(n)$ means that the purchaser pays $P(n)$ for $n$ selected data items, and it should meet the following conditions.
A subscription scheme is a particular kind of pricing model. It usually contains multiple levels of the form $(P_1, T_1), (P_2, T_2), \dots, (P_m, T_m)$, where $m$ is the total number of steps, and both $\{P_j\}_{j=1}^{m}$ and $\{T_j\}_{j=1}^{m}$ are increasing sequences. This structure means that customers can buy up to $T_j$ items by paying $P_j$. For instance, a customer who chooses the step (20.00, 100) can buy up to 100 data items at a cost of 20.00.
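Reading a step scheme as a price function can be sketched as follows. The helper names `validate` and `charge` and the three-step scheme are hypothetical; the only rule encoded is an assumption consistent with the structure above: a customer who wants $n$ items pays for the cheapest step whose bound $T_j$ is at least $n$, which, because both sequences increase, is simply the first such step.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

Step = Tuple[float, int]  # (P_j, T_j): pay P_j to buy up to T_j items


def validate(steps: List[Step]) -> None:
    """Both the prices P_j and the bounds T_j must be increasing."""
    for (p1, t1), (p2, t2) in zip(steps, steps[1:]):
        if not (p1 < p2 and t1 < t2):
            raise ValueError("prices and item bounds must both increase")


def charge(n: int, steps: List[Step]) -> Optional[float]:
    """Price for n items: the first step with T_j >= n, which is also the
    cheapest admissible step because P_j increases with j.
    Returns None when n exceeds the largest bound T_m."""
    validate(steps)
    bounds = [t for _, t in steps]
    j = bisect_left(bounds, n)  # index of the first bound >= n
    return steps[j][0] if j < len(steps) else None


scheme = [(20.00, 100), (35.00, 250), (60.00, 600)]  # hypothetical three-step scheme
for n in (80, 100, 101, 700):
    print(n, charge(n, scheme))
```

With this scheme, 80 or 100 items cost 20.0, 101 items fall into the second step at 35.0, and 700 items exceed every bound, so no step applies.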

The calculation part in existing subscription scheme
We first recall the calculation part of Kushal et al. (2011) and then analyse its shortcomings when customers of types c and d coexist. Although that scheme is claimed to work, a steady subscription scheme for the real market remains an open problem.

A recall on the existing subscription scheme
This section recalls how the existing subscription models set the pricing scheme. Considering the two kinds of customers in the market, this part analyses the relations between the pricing model and the customers' willingness. Next, the scheme calculates the profit from each type of customer separately and sums them up. Finally, the price to be set on each step is obtained by maximising the total profit.
Analysis: The figures above present the relations between customers' behaviours and the sellers' pricing model. By the definition of CWTP, for every fixed $n$, customers are willing to pay for $n$ items only if $\mathrm{CWTP}(n) \ge P(n)$. The intersection point on step $j$ is denoted $n_j$, i.e. $\mathrm{CWTP}(n_j) = P_j$. On each step, depending on the relations among $n_j$, $T_{j-1}$ and $T_j$, three different cases may appear, as shown in Figure 1. In case1, the CWTP curve lies above the pricing model on the step, while in case3 it lies below. When a smart seller sets the subscription model, however, only case2, in which the CWTP curve crosses the pricing model within the step, can occur; Kushal et al. (2011) give a detailed proof.
In case2 (i.e. $T_{j-1} < n_j < T_j$), customers buy $n$ items from level $j$ when $n_j \le n < T_j$, and buy $T_i$ items from level $i$ when $T_{j-1} \le n < n_j$, where $i = \max\{l \le j : P_l \le \mathrm{CWTP}(T_l)\}$. The profit from level $j$, $\pi_j$, can then be calculated with Equation (1). In conclusion, when $T_{j-1} < n_j < T_j$, the profit from the $j$th level takes the following form. To calculate the profit from various kinds of customers, the scheme computes the profit from each type separately and then sums them up.
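The profit accounting in case2 can be illustrated numerically. Because the profit equations are not reproduced here, the sketch rests on assumptions of our own, stated explicitly: customers' desired amounts $n$ are spread with unit density over $[0, T_m]$, every step is in case2, and a customer with $T_{j-1} \le n < n_j$ falls back to level $i = j-1$, paying $P_{j-1}$. Under these assumptions $\pi_j = P_{j-1}(n_j - T_{j-1}) + P_j(T_j - n_j)$ with $P_0 = T_0 = 0$. The helper name `level_profits` and the example numbers are hypothetical.

```python
from typing import Callable, List, Sequence


def level_profits(prices: Sequence[float], bounds: Sequence[float],
                  inv_cwtp: Callable[[float], float]) -> List[float]:
    """Profit pi_j from each level under the assumed model:
    pi_j = P_{j-1} * (n_j - T_{j-1}) + P_j * (T_j - n_j), with P_0 = T_0 = 0,
    where n_j = inv_cwtp(P_j) solves CWTP(n_j) = P_j."""
    profits = []
    prev_p, prev_t = 0.0, 0.0
    for p, t in zip(prices, bounds):
        n_crit = inv_cwtp(p)
        if not prev_t < n_crit < t:
            raise ValueError("step is not in case2: need T_{j-1} < n_j < T_j")
        # customers in [T_{j-1}, n_j) fall back to level j-1; those in [n_j, T_j) pay P_j
        profits.append(prev_p * (n_crit - prev_t) + p * (t - n_crit))
        prev_p, prev_t = p, t
    return profits


# Type c example with CWTP(n) = w * n, so n_j = P_j / w (w = 2 is illustrative).
w = 2.0
pis = level_profits([60.0, 120.0], [50.0, 100.0], lambda p: p / w)
print(pis, sum(pis))  # [1200.0, 5400.0] 6600.0
```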
The profit from type c only: The profit from type c customers only has the following form. The critical point is given by $P_j = w n_j$, and the $P_j$'s are constrained by $T_{j-1} \le P_j / w \le T_j$.
The seller can maximise the profit by searching for the local extremum point. After solving the $m$ equations $\partial \pi_s / \partial P_j = 0$, the price to be set on the $j$th interval is obtained.
The profit from type d only: In the same way, the profit from type d customers only is as follows. This gives the critical point $n_j$, while the $P_j$'s are constrained accordingly. Setting $\partial \pi_s / \partial P_j = 0$ yields very complicated solutions; for example, with $m = 2$, the prices on the intervals are as follows.
The profit from all customers in total: After calculating the profit from type c and type d separately, the scheme finally turns to the real market containing both kinds of customers and directly sums the profit from type c in Equation (4) and from type d in Equation (7), giving $\pi_s$. To find the extremum point on each step and set the price, the $m$ equations $\partial \pi_s / \partial P_j = 0$ need to be solved. However, this scheme shows neither the exact results in this complex condition nor the constraints on the domain of each $P_j$. In fact, this scheme may not work steadily.
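As a worked illustration for type c only, consider an assumed profit model of our own (unit demand density on $[0, T_m]$, case2 on every step, fallback to level $j-1$), so that $\pi_s = \sum_j [P_{j-1}(n_j - T_{j-1}) + P_j(T_j - n_j)]$ with $P_0 = T_0 = 0$ and $n_j = P_j/w$. By our own derivation, the first-order conditions reduce to $P_{j+1} - 2P_j + P_{j-1} = 0$ for $j < m$ and $2P_m - P_{m-1} = wT_m$, whose solution is the equally spaced schedule $P_j = jwT_m/(m+1)$. This is not necessarily the form of the omitted equations above; the sketch only checks numerically that this schedule is a stationary point and that moving any single price lowers the profit. The intermediate bounds are a hypothetical choice that keeps every step in case2.

```python
def total_profit(prices, bounds, w):
    """pi_s for type-c customers only (n_j = P_j / w) under the assumed
    uniform-demand, case2 profit model; P_0 = T_0 = 0."""
    total, prev_p, prev_t = 0.0, 0.0, 0.0
    for p, t in zip(prices, bounds):
        n_crit = p / w
        total += prev_p * (n_crit - prev_t) + p * (t - n_crit)
        prev_p, prev_t = p, t
    return total


def type_c_prices(m, w, t_m):
    """Candidate stationary point: P_j = j * w * T_m / (m + 1)."""
    return [j * w * t_m / (m + 1) for j in range(1, m + 1)]


def gradient(f, xs, h=1e-4):
    """Central finite-difference gradient of f at xs."""
    grad = []
    for i in range(len(xs)):
        up, down = list(xs), list(xs)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


m, w, t_m = 4, 2.5, 400.0  # illustrative values
prices = type_c_prices(m, w, t_m)
n_crit = [p / w for p in prices]
# Hypothetical intermediate bounds halfway between critical points (keeps case2).
bounds = [(n_crit[j] + n_crit[j + 1]) / 2 for j in range(m - 1)] + [t_m]
grad = gradient(lambda ps: total_profit(ps, bounds, w), prices)
print(prices, max(abs(g) for g in grad))
```

Since $\pi_s$ is quadratic in the prices for type c, the finite-difference gradient is exact up to rounding, so a near-zero maximum component confirms the stationary point.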

The problems of the existing scheme in real markets
Here we analyse the problems in the existing scheme. When solving the equations $\partial \pi_s / \partial P_j = 0$, we usually cannot obtain the results directly. The analysis also helps us find a way to improve the subscription scheme so that it works steadily for both kinds of customers. Three definitions and one proposition are introduced for the following analysis.
Definition 1: Solvable: Regardless of the constraints on the domains of the variables, when the equations to find extremum points have solutions, we call this scheme solvable. Otherwise, it is unsolvable.
Definition 2: Feasible: When the scheme to find extremum points is solvable, and the solutions obey the constraints on the domains, the scheme is feasible. Otherwise, when the solutions are beyond the domains, it is infeasible.
Definition 3: Replaceable: According to the extreme value theorem, when the scheme is infeasible or unsolvable, it may be meaningful to take the maximum point on the boundary instead of the extremum point. The scheme is replaceable when meaningful and unreplaceable when meaningless.

Claim 1: A subscription scheme can set an optimised price on each step to maximise the sellers' total profit if and only if it is feasible or replaceable. Such a scheme is called workable.
Proof: (sufficiency) The proof of sufficiency is apparent. The target of the scheme is maximising the sellers' profit on each step. When the equations are feasible or replaceable, solutions can always be found to maximise the profit, which is workable.
Proof: (necessity) The maximum profit is attained either at a solution of $\partial \pi_s / \partial P_j = 0$ or on the boundary of the domain. When the scheme is infeasible (or unsolvable), no extremum point exists within the domain. If, in addition, the scheme is unreplaceable, picking the optimised $P_j^*$ on the boundary is meaningless. As a result, the profit on the step cannot be maximised; in other words, the scheme is not workable.

Proposition 1: A subscription scheme is workable only if it is feasible or replaceable.
Based on the definitions and Proposition 1, the following paragraphs show why the scheme of Kushal et al. (2011) is not steadily workable, so that it cannot be used to optimise the subscription model. Table 1 shows that for m = 3, 4, 5, 6 and 7, the scheme is neither solvable nor feasible.
Unsolvable: First and foremost, this scheme may be unsolvable for some common values of $m$. For instance, for $m = 3, 4, 5$, it returns no solution for any step at all, so it cannot be used to set the price. The reason is as follows. The integration in this framework assumes that intelligent sellers choose a pricing model in case2 of Figure 1. But in this complex condition, when both CWTP curves exist, the two curves need not both fall into case2. For example, having one CWTP in case1 and the other in case2, or one CWTP in case3 and the other in case2, may lead to a larger profit. Specifically, Figure 2 shows that when $P_j$ declines from (b) to (a), the profit from $\mathrm{CWTP}_c$ goes down while the profit from $\mathrm{CWTP}_d$ grows, so it is difficult to tell whether the total profit increases. In the same way, whether the profit in (b) is larger than that in (c) is unknown. But the integration above only covers the condition in which both types are in case2. In short, summing the two separately computed profit integrals may miss the extremum point in many conditions, which may result in an infeasible or unsolvable scheme.
Infeasible: Even when solutions exist, the expression of each $P_j^*$ gives no guarantee that $P_j^*$ lies in its domain, i.e. that the corresponding critical point $n_j$ lies between $T_{j-1}$ and $T_j$. In other words, the scheme may be infeasible.
Unreplaceable: By the extreme value theorem, a continuous function attains a maximum on a closed and bounded interval. The constraint on $P_j$ is such an interval, so when no extremum point exists inside it, the maximum is attained on the boundary of the domain. Thus, when the scheme is unsolvable or infeasible, is it possible to pick the solutions on the boundary instead?
The answer is no. If the scheme were only occasionally infeasible or unsolvable according to the exact equations, picking the solutions on the boundary would be meaningful. But if analytical solutions do not exist at all for some values of $m$, the scheme cannot work steadily. Would a seller choose a subscription model that fails whenever a three- or five-step model is wanted? Therefore, picking the solutions on the boundary is meaningless here, and the scheme is unreplaceable.
In conclusion, when different kinds of customers coexist, the existing scheme may be unsolvable, infeasible or unreplaceable. In other words, by Proposition 1, the scheme is not workable in many conditions. Although the existing scheme is established, a steady subscription model for a market with diverse customers remains an open problem.

The new calculation part
Given the disadvantages of the existing scheme, we look for a new way to set up the subscription model in the real market. Clearly, a data pricing scheme should be workable no matter how many steps the sellers choose. Calculating the profit for each type separately and then summing the results cannot work steadily. Naturally, this paper instead combines customers' behaviours in advance and maximises the profit afterwards.
This part presents an analysis of the combined calculation and introduces the new scheme. Moreover, a theoretical analysis proves that the proposed model, the new subscription scheme, is workable in the real market.

Analysis and the properties
The main reason the existing scheme cannot work steadily is that calculating the profit from each type of customer separately may miss the maximum points. Therefore, we combine the customers' behaviours in advance into a mixed CWTP, $\mathrm{CWTP}_{mix}$, and then optimise the pricing model. This part analyses how to combine the behaviours of type c and type d customers in advance and presents the properties the combination should have.
Analysis: The exponent of $\mathrm{CWTP}_d$ is generally only around 0.5 rather than exactly 0.5. Let us therefore consider a more general situation, where $\mathrm{CWTP}_c = wn$ and $\mathrm{CWTP}_d = an^x$ with $0 < x < 1$. For every fixed $n$, a fraction $k$ of customers is willing to pay up to $\mathrm{CWTP}_c(n)$ for $n$ items, while a fraction $(1-k)$ can afford no more than $\mathrm{CWTP}_d(n)$. Hence $\mathrm{CWTP}_{real}$, which combines the two types of customers, has the following form at $n$:
$$\mathrm{CWTP}_{real}(n) = k\,\mathrm{CWTP}_c(n) + (1-k)\,\mathrm{CWTP}_d(n) = kwn + (1-k)an^x.$$
The single curve $\mathrm{CWTP}_{real}$ should fall into case2 on each step. On the $j$th step, the critical point $n_j$ is the solution to
$$kwn_j + (1-k)an_j^x = P_j.$$
Using $\mathrm{CWTP}_{real}(n)$ to calculate the profit $\pi_s = \sum_{j=1}^{m} \pi_j$ is the best and easiest way to represent both kinds of customers. In this way, however, the $m$ equations $\partial \pi_s / \partial P_j = 0$ may be impossible to solve, because the value of $n_j$ is needed in the integration. For example, when $x = 1/2$, $n_j$ can easily be obtained from a quadratic equation in $\sqrt{n_j}$, though $\pi_s$ may be very complicated. More generally, on the one hand, no single analytical solution fits all values of $x$; on the other hand, finding $n_j$ for each $x$ is very hard. In particular, when $x = 1/q$ for an integer $q \ge 5$, substituting $u = n_j^{1/q}$ turns the equation into a polynomial equation of degree $q \ge 5$, for which no general solution in radicals exists, a classic result proved by Abel.
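Although $n_j$ has no closed form for general $x$, it is easy to compute numerically, because $\mathrm{CWTP}_{real}$ is continuous, strictly increasing and zero at $n = 0$. The sketch below uses plain bisection; this is a numerical workaround, not the approach taken in this paper, and a numerical root does not by itself give the analytical $n_j$ that the integration needs. Function names and parameter values are hypothetical. For $x = 1/2$ the result is checked against the quadratic-formula solution in $u = \sqrt{n_j}$, and for $x = 1/5$ (a quintic in $u = n_j^{1/5}$) it is checked by substituting back.

```python
import math


def cwtp_real(n: float, k: float, w: float, a: float, x: float) -> float:
    """Population-average CWTP: k * w * n + (1 - k) * a * n**x."""
    return k * w * n + (1 - k) * a * n ** x


def critical_point(price, k, w, a, x, hi=1e9, rel_tol=1e-12):
    """Solve cwtp_real(n) = price for n >= 0 by bisection.

    cwtp_real is continuous, strictly increasing and equals 0 at n = 0,
    so a unique root exists whenever 0 <= price <= cwtp_real(hi)."""
    if not 0 <= price <= cwtp_real(hi, k, w, a, x):
        raise ValueError("price outside the range of CWTP_real on [0, hi]")
    lo = 0.0
    for _ in range(200):  # far more halvings than double precision needs
        mid = (lo + hi) / 2
        if cwtp_real(mid, k, w, a, x) < price:
            lo = mid
        else:
            hi = mid
        if hi - lo <= rel_tol * hi:
            break
    return (lo + hi) / 2


def critical_point_sqrt(price, k, w, a):
    """Closed form for x = 1/2: k*w*u**2 + (1-k)*a*u - price = 0 with u = sqrt(n)."""
    qa, qb = k * w, (1 - k) * a
    u = (-qb + math.sqrt(qb * qb + 4 * qa * price)) / (2 * qa)
    return u * u


k, w, a, price = 0.4, 2.5, 20.0, 300.0  # illustrative values
print(critical_point(price, k, w, a, 0.5), critical_point_sqrt(price, k, w, a))
n5 = critical_point(price, k, w, a, 0.2)  # quintic case, solved numerically
print(n5, cwtp_real(n5, k, w, a, 0.2))
```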
Properties: Since the critical-point equation for $\mathrm{CWTP}_{real}(n) = kwn + (1-k)an^x$ generally has no analytical solution, a replacement for $\mathrm{CWTP}_{real}(n)$ is needed. The new CWTP function, $\mathrm{CWTP}_{mix}(n)$, should have the following properties.
(1) It should meet the basic requirements for a normal CWTP function, e.g. being non-decreasing, non-negative, continuous and concave.
(2) It should represent the behaviours of both kinds of customers with a single expression, so that it can replace $\mathrm{CWTP}_c$ and $\mathrm{CWTP}_d$ in general. Meanwhile, the loss between $\mathrm{CWTP}_{mix}(n)$ and $k\,\mathrm{CWTP}_c(n) + (1-k)\,\mathrm{CWTP}_d(n)$ should be small enough to be ignored.
(3) Under $\mathrm{CWTP}_{mix}$, the critical points $n_j$ should always have an analytical form, whatever the parameters are.
(4) The subscription scheme must be steadily workable, i.e. either feasible (and solvable) or replaceable.

The calculation part in the new subscription scheme
This part introduces a new model that combines the customers' behaviours in advance and then maximises the total profit. First, we show how to set the model, especially its parameters, to meet the requirements above. Then we prove that, unlike the existing scheme, the proposed scheme is always solvable on each step. Next, since some $P_j^*$'s might lie outside their intervals, we present an operation that guarantees the scheme is feasible. In this way, the sellers always obtain a recommended model after the calculation, and the scheme is therefore workable.
The way to build the new model: Considering the common ranges of $x$ and $k$ from Choudhary (2010), and after examining different candidate expressions, such as $b n^{k+(1-k)x}$ with $b = k/xw$, we choose $\mathrm{CWTP}_{mix} = b n^{(x+1)/2}$ with $b = 2\sqrt{k(1-k)aw}$, which performs best and clearly meets requirement (1). The following explains why it also meets requirements (2), (3) and (4). First, we show why $b = 2\sqrt{k(1-k)aw}$ is chosen in $\mathrm{CWTP}_{mix}$ to represent both kinds of customers' behaviours. We use dif to denote the difference between $\mathrm{CWTP}_{mix}$ and $k\,\mathrm{CWTP}_c + (1-k)\,\mathrm{CWTP}_d$; more specifically, dif is defined as the integral of the L1 loss. To approximate $\mathrm{CWTP}_c$ and $\mathrm{CWTP}_d$ as closely as possible, $b$ is set at the minimum point of dif. After substitution and calculation, the minimum of dif is attained at $b = 2\sqrt{k(1-k)aw}$. As a result, $\mathrm{CWTP}_{mix}(n) = 2\sqrt{k(1-k)aw}\, n^{(x+1)/2}$. Figure 3 shows the L1 loss between $\mathrm{CWTP}_{mix}$ and the target function, $\mathrm{CWTP}_{real}$, for some common values of $x$ around 0.5 and a fixed $w = 2.5$. The MAE is small enough when $n$ is not too large. As far as we know, in most data markets each data item itself contains a large number of data records or even whole datasets, so the number of items involved in a transaction is limited in practice. The figure also presents, for comparison, the MAE of a common substitute function, $b n^{k+(1-k)x}$ with $b = k/xw$. As shown above, different $\mathrm{CWTP}_c$ and $\mathrm{CWTP}_d$ yield different values of $b$, which enables $\mathrm{CWTP}_{mix}$ to describe the combined behaviour of $\mathrm{CWTP}_c$ and $\mathrm{CWTP}_d$. Furthermore, the resulting scheme is proved below to be strictly solvable and feasible.
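The fit of $\mathrm{CWTP}_{mix}$ can be inspected numerically. The sketch below estimates the mean absolute gap between $\mathrm{CWTP}_{mix}$ and $\mathrm{CWTP}_{real}$ on $[0, n_{max}]$ with the midpoint rule; the values of $k$, $a$ and $n_{max}$ are illustrative assumptions (only $w = 2.5$ comes from the text), and the function names are hypothetical. One property worth noting, which follows from the AM-GM inequality rather than from the text: $kwn + (1-k)an^x \ge 2\sqrt{k(1-k)aw}\,n^{(x+1)/2}$, so with this $b$, $\mathrm{CWTP}_{mix}$ is a pointwise lower bound on $\mathrm{CWTP}_{real}$ that touches it where $kwn = (1-k)an^x$.

```python
import math


def cwtp_real(n, k, w, a, x):
    """k * w * n + (1 - k) * a * n**x."""
    return k * w * n + (1 - k) * a * n ** x


def cwtp_mix(n, k, w, a, x):
    """b * n**((x + 1) / 2) with b = 2 * sqrt(k * (1 - k) * a * w)."""
    b = 2 * math.sqrt(k * (1 - k) * a * w)
    return b * n ** ((x + 1) / 2)


def mean_abs_gap(k, w, a, x, n_max, steps=20_000):
    """Midpoint-rule estimate of (1 / n_max) * integral over [0, n_max]
    of |CWTP_mix(n) - CWTP_real(n)| dn."""
    h = n_max / steps
    total = 0.0
    for i in range(steps):
        n = (i + 0.5) * h
        total += abs(cwtp_mix(n, k, w, a, x) - cwtp_real(n, k, w, a, x))
    return total * h / n_max


k, w, a = 0.4, 2.5, 20.0  # illustrative values; w = 2.5 as in Figure 3
for x in (0.4, 0.5, 0.6):
    print(x, mean_abs_gap(k, w, a, x, n_max=300.0))
```

For $k = 0.4$, $w = 2.5$, $a = 20$ and $x = 1/2$, the two curves touch at $n = 144$, where both equal 288.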

Why the new scheme is solvable:
We now show why $\mathrm{CWTP}_{mix}$ makes Equation (18) solvable on each step. The operation that keeps the scheme feasible is described in the next paragraph.
When $\mathrm{CWTP}_{mix}$ represents both kinds of customers, the intelligent sellers analysed in Kushal et al. (2011) should guarantee that the scheme falls into case2 and crosses every step. Accordingly, on the $j$th step we have the following integrals (now with no restriction on the domain of $P_j$). Since $b n_j^{(x+1)/2} = P_j$, each $n_j$ is given by
$$n_j = \left(\frac{P_j}{b}\right)^{2/(x+1)}.$$
Summing all the $\pi_j$'s gives the total profit $\pi_s$. To maximise the total profit, the following $m$ equations are solved to find the price that should be set at each step:
$$\frac{\partial \pi_s}{\partial P_j} = 0, \quad j = 1, 2, \dots, m. \qquad (18)$$
This yields $P_m^*$ and the recurrence relations among the $P_j^*$'s. Clearly, each $P_j^*$ depends recursively on $\alpha_j$, and the $\alpha_j$'s themselves satisfy recurrence relations. We do not give a closed-form expression for each $P_j^*$, because the extremum point is obtained by computing each $P_j^*$ and $\alpha_j$ step by step from these equations.
Each $P^*_j$ can then be obtained from $P^*_m$ and $\alpha_j$ using Equations (19b) and (20). In other words, the extremum-point equations (18) have a solution at every step, so the scheme is solvable. How to keep the scheme feasible is explained in the next paragraph.
The operation to keep the scheme feasible: We now show how to guarantee that the $P^*_j$ obtained above are solutions within their domains. As the equations show, $P^*_m$ depends only on $T_m$, and every $P^*_j$ is determined by $P^*_m$. Consequently, the $T_j$ other than $T_m$ do not affect the extremum points, so we can move each $T_j$ to ensure that it lies between $n_{j-1}$ and $n_j$. According to the equations, this operation not only preserves the sellers' profit but also ensures that each $P^*_j$ lies between $T_{j-1}$ and $T_j$. Specifically, the moving-boundary operation (Operation 1) replaces $T_j$ with $T^*_j$ for $j = 1, 2, \dots, m-1$; it is an O(n) algorithm and guarantees that the structure, and hence $\mathrm{CWTP}_{mix}$, falls into case 2 on every step.

After this operation, the only remaining constraint on the $P^*_j$ is the one on $P^*_m$. Since $\mathrm{CWTP}_d$ is sub-linear, x < 1; hence $\alpha_1 < 1$ and, in turn, every $\alpha_j < 1$. Thus the denominator of $P^*_m$ is greater than 1, and Equation (22) is satisfied. With this constraint satisfied, the scheme is feasible. The $P^*_j$ are stationary points of $\pi_s$, and the expansion of Equation (18) shows that each $P^*_j$ is an extremum, specifically a local maximum. Hence the $P^*_j$ are solutions to Equation (18): they are the prices sellers should set at each step after optimisation in our scheme (Figure 4). The optimisation in this work is therefore solvable and feasible, so the profit can always be maximised step by step. The optimised subscription scheme in this work is as follows:
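Operation 1 itself is not reproduced in this section. The sketch below is a hypothetical reading of the moving-boundary idea described above: each $T_j$ for $j = 1, \dots, m-1$ is clamped into $[n_{j-1}, n_j]$ in a single pass, while $T_m$, on which $P^*_m$ depends, is left untouched. The function name and the input layout (n given as $n_0, \dots, n_m$ in non-decreasing order) are assumptions.

```python
from typing import List


def move_boundaries(T: List[float], n: List[float]) -> List[float]:
    """Hypothetical moving-boundary pass (one loop over the steps).

    T: step limits [T_1, ..., T_m].
    n: transaction counts [n_0, n_1, ..., n_m], assumed non-decreasing.
    Returns [T*_1, ..., T*_{m-1}, T_m] with each T*_j clamped into [n_{j-1}, n_j].
    """
    m = len(T)
    if len(n) != m + 1:
        raise ValueError("n must list n_0, ..., n_m")
    T_star = list(T)
    for j in range(1, m):  # paper indices j = 1, ..., m-1; T_m is kept as is
        lo, hi = n[j - 1], n[j]
        T_star[j - 1] = min(max(T[j - 1], lo), hi)
    return T_star


print(move_boundaries([5, 50, 100], [0, 10, 40, 90]))  # [5, 40, 100]
```

A single linear pass is used because, as argued above, moving one $T_j$ does not change the extremum points, so the boundaries can be adjusted independently.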

A comparison between the existing scheme and the new subscription scheme
We have shown that, whatever the number of steps and the parameter values, the scheme always lets sellers obtain the maximum profit after moving the $T_j$. We now compare the scheme in this paper with the existing scheme, taking as examples the same sellers a, b and c from Kushal et al. (2011), whose schemes have 2, 3 and 5 steps (Table 2). The first two columns give the original transaction limits ($T_j$) and the prices on each step ($P_j$); the third and fourth columns give $(T^*_j, P^*_j)$ computed by the method of Kushal et al. (2011); and the fifth and sixth columns give the results of the optimisation in this work.
As the table shows, the calculation in the existing scheme often returns no result (labelled 'null'), e.g. for m = 3 and m = 5. This is because the separated calculation in the existing scheme sometimes fails to capture the maximum profit, so $\partial \pi_s / \partial P_j = 0$ has no solution and the sellers' profit cannot be increased. With the combined calculation in this work, the sellers always obtain $T^*_j$ that increase the expected profit, whether m is 2, 3 or 5. These three cases are only examples; further tests would make the comparison more complete.

Summary
To summarise, this paper shows that the separated calculation in the existing subscription scheme does not work steadily in complex markets. To address this, $\mathrm{CWTP}_{mix}$ combines customers' behaviours in advance, together with an attached operation, so that the new model can handle different kinds of customers. The new model is always solvable and feasible, and is therefore workable. Finally, the comparison shows that the scheme in this paper works much more steadily.

The specific validity part
This part presents the work towards arbitrage-free pricing. It can either follow the calculation part or be carried out independently. First, we introduce arbitrage-free pricing, an important requirement in economics. We then show that the traditional arbitrage-free requirement is not reasonable for data subscriptions. We therefore present a more reasonable arbitrage-free requirement for data items, named specific validity, together with an operation to achieve it. Finally, we show why the operation towards specific validity is more reasonable under the same conditions.

The concept of arbitrage-free
Arbitrage-free is an essential concept in economics: a pricing model should guarantee that there is no arbitrage in a risk-free market. In other words, when the market is risk-free, the price of a bundle should equal the sum of the prices of its parts, i.e. $P(n_1 + n_2) = P(n_1) + P(n_2)$. Otherwise, if $P(n_1 + n_2) > P(n_1) + P(n_2)$, customers can buy $n_1$ and $n_2$ separately and resell them as $n_1 + n_2$ to make an arbitrage profit; likewise, if $P(n_1 + n_2) < P(n_1) + P(n_2)$, they can buy $n_1 + n_2$ and resell the parts separately. When arbitrage exists, the pricing scheme may not work reasonably from an economic point of view. Arbitrage-free is therefore a necessary requirement, even though meeting it may reduce the profit from the calculation part.

However, since real markets are not risk-free, the arbitrage-free requirement is usually weakened to $P(n_1 + n_2) \le P(n_1) + P(n_2)$. This is also the requirement that existing subscription schemes meet, and it has a more formal expression. Let O(t) be the least price to obtain t transactions, and let I(t) be the price given by the pricing model, e.g. $I(t) = P_{j+1}$ if $T_j < t \le T_{j+1}$. If O(t) = I(t) for all t, the model is called valid. The existing arbitrage-free conditions, stated as propositions, are as follows.

Proposition 2: $P(n_1 + n_2) \le P(n_1) + P(n_2),\ \forall n_1, n_2 \iff O(t) = I(t),\ \forall t$.

Proposition 3: A multi-step pricing function (P, T) is valid if and only if

$$P_{k+1} \le \min_{1 \le j \le k} \left\{ P_j + I(T_k + \varepsilon - T_j) \right\}, \quad \forall k = 0, 1, \dots, m-1,\ \varepsilon \to 0.$$
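To make the definitions of O(t), I(t) and validity concrete, the sketch below checks validity by brute force for integer transaction counts. It assumes $T_0 = 0$, that O(t) is the cheapest way to obtain exactly t transactions by combining purchases (any combination splits into two smaller purchases, hence the recurrence), and that t never exceeds $T_m$. The function names are hypothetical, and the running time is quadratic in $T_m$, so it is only meant for small examples.

```python
from typing import List


def step_price(t: int, T: List[int], P: List[float]) -> float:
    """I(t) = P_{j+1} for T_j < t <= T_{j+1}, with T_0 = 0."""
    for T_j, P_j in zip(T, P):
        if t <= T_j:
            return P_j
    raise ValueError("t exceeds the last step limit T_m")


def least_price(T: List[int], P: List[float]) -> List[float]:
    """O(t) for t = 0..T_m: cheapest combination of purchases giving exactly t transactions."""
    O = [0.0] * (T[-1] + 1)
    for t in range(1, T[-1] + 1):
        best = step_price(t, T, P)        # buy t transactions directly
        for s in range(1, t // 2 + 1):    # or split into two smaller purchases
            best = min(best, O[s] + O[t - s])
        O[t] = best
    return O


def is_valid(T: List[int], P: List[float]) -> bool:
    """Validity: O(t) == I(t) for every t in 1..T_m."""
    O = least_price(T, P)
    return all(O[t] == step_price(t, T, P) for t in range(1, T[-1] + 1))


print(is_valid([10, 20], [5, 8]))   # True
print(is_valid([10, 50], [5, 20]))  # False: two step-1 purchases cost 10 < P_2 for t = 11..20
```

The second example violates $P(n_1 + n_2) \le P(n_1) + P(n_2)$ at, for instance, $n_1 = 10$ and $n_2 = 1$, which is exactly what Proposition 2 predicts for an invalid model.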

Analysis
Nevertheless, given the characteristics of data subscriptions, this requirement is still too strict. In a data subscription model, step lengths may be much more uneven than for traditional products, so the existing requirement is not reasonable enough for data subscriptions. To illustrate, consider a two-step pricing model with constant T, in which the second step is the interval (T, aT].
Under the arbitrage-free conditions (Proposition 3 with k = 1), $P_2$ is strongly limited:

$$P_2 \le P_1 + I(T + \varepsilon - T) = P_1 + I(\varepsilon) = 2P_1.$$

Hence the price on the second interval (T, aT] can be at most $2P_1$. However, a can be any value with a ≥ 1, so the price on (T, aT] is capped at $2P_1$ whether a is 2, 3 or arbitrarily large. The bound on the price of an interval is completely independent of the interval's upper end. Intuitively, the price of a step should depend on both the bounds and the length of its interval. For some common products, steps may be of roughly uniform length, i.e. a is around 2. In data markets, however, steps are uneven: the subscription examples from Kushal et al. (2011) show that a can be 5 or even larger, which can make a large difference.
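The two-step bound can also be checked mechanically. The sketch below tests the condition of Proposition 3 directly, reading $\varepsilon \to 0$ as one extra transaction when transaction counts are integers (our assumption). With two steps, the largest admissible $P_2$ comes out as $2P_1$ whatever the value of a. The function names are hypothetical.

```python
from typing import List


def step_price(t: int, T: List[int], P: List[float]) -> float:
    """I(t) = P_{j+1} for T_j < t <= T_{j+1}, with T_0 = 0."""
    for T_j, P_j in zip(T, P):
        if t <= T_j:
            return P_j
    raise ValueError("t exceeds the last step limit T_m")


def satisfies_prop3(T: List[int], P: List[float]) -> bool:
    """Check P_{k+1} <= min_{1<=j<=k} (P_j + I(T_k + 1 - T_j)) for k = 1..m-1."""
    for k in range(1, len(T)):               # k = 0 imposes no constraint
        T_k = T[k - 1]
        bound = min(P[j - 1] + step_price(T_k + 1 - T[j - 1], T, P)
                    for j in range(1, k + 1))
        if P[k] > bound:                     # P[k] is P_{k+1}
            return False
    return True


T1, P1 = 10, 10
for a in (2, 5, 10):
    print(a, satisfies_prop3([T1, a * T1], [P1, 2 * P1]),
          satisfies_prop3([T1, a * T1], [P1, 2 * P1 + 1]))
# each line prints "a True False": the cap on P_2 is 2 * P_1 for every a
```

Only the first step enters the bound for k = 1, which is why the length of the second step, and hence a, never appears in it.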

Specific validity, a new standard to meet arbitrage-free in data items
This part introduces an improved arbitrage-free requirement for subscription models, which fits data markets better.
To take the uneven steps of data items into account, we relax the validity requirement and call the relaxed requirement specific validity. Under specific validity, the price I(t) given by the pricing model need not equal the least price O(t) for every t.
The following claim, together with its proof, connects specific validity with the multi-step model.

Proof: (necessity)
We prove this part by mathematical induction.
Basis: when k = 0, the necessary condition is satisfied.
Induction hypothesis: assume the necessary condition holds for a fixed k; we now consider k + 1.
Induction step: recall the definitions of I(t) and O(t). For t on step k + 1, $I(t) = P_{k+1}$. It can then be concluded that any alternative way of obtaining t transactions must be a combination of purchases on smaller steps. By the induction hypothesis, O(t) = I(t) holds on the first k steps. As a result, I(t) ≤ O(t) can be satisfied.
Proceeding step by step, we now show I(t) = O(t) for step k + 1. By induction, the equality holds for every t with $\frac{1}{2}(T_j + T_{j+1}) \le t \le T_{j+1}$.
This completes the proof of necessity and sufficiency, so the claim becomes a proposition (Proposition 4), which states the requirements a model must meet for specific validity.
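Since the statement of the claim is not reproduced in this section, the sketch below encodes one hedged reading of the proof's final line: a multi-step model is taken to be specifically valid if I(t) = O(t) on the upper half $[\frac{1}{2}(T_j + T_{j+1}), T_{j+1}]$ of every step, rather than for all t. It reuses a brute-force O(t) for integer transaction counts (exact-count combinations, $T_0 = 0$); all names are hypothetical.

```python
from typing import List


def step_price(t: int, T: List[int], P: List[float]) -> float:
    """I(t) = P_{j+1} for T_j < t <= T_{j+1}, with T_0 = 0."""
    for T_j, P_j in zip(T, P):
        if t <= T_j:
            return P_j
    raise ValueError("t exceeds the last step limit T_m")


def least_price(T: List[int], P: List[float]) -> List[float]:
    """O(t) for t = 0..T_m: cheapest combination of purchases giving exactly t transactions."""
    O = [0.0] * (T[-1] + 1)
    for t in range(1, T[-1] + 1):
        best = step_price(t, T, P)
        for s in range(1, t // 2 + 1):
            best = min(best, O[s] + O[t - s])
        O[t] = best
    return O


def is_specifically_valid(T: List[int], P: List[float]) -> bool:
    """Hypothetical check: O(t) == I(t) only on the upper half of each step."""
    O = least_price(T, P)
    T_prev = 0
    for T_j, P_j in zip(T, P):
        start = max(1, -(-(T_prev + T_j) // 2))  # ceil((T_{j-1} + T_j) / 2)
        if any(O[t] != P_j for t in range(start, T_j + 1)):
            return False
        T_prev = T_j
    return True


print(is_specifically_valid([10, 40], [10, 30]))  # True
print(is_specifically_valid([10, 40], [10, 31]))  # False: three step-1 purchases cost 30 at t = 25
```

By contrast, the first model fails strict validity, because two step-1 purchases give O(11) = 20 < 30; this is the kind of case the relaxation is meant to admit.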

Operation
Having discussed the arbitrage-free requirements for the pricing scheme, we now need an operation that adjusts the scheme to meet specific validity. An original subscription model, without the calculation in Section 4, can also apply this operation to become arbitrage-free. To satisfy Proposition 4, (P, T) must be transformed into (P′, T) while staying close to (P, T). This operation (Operation 2) is an $O(n^2)$ algorithm. The dual operation, which transforms (P, T) into (P, T′), is not repeated here.
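Operation 2 is not reproduced in this section. As a hypothetical illustration of the same kind of adjustment (changing P while keeping T fixed, and lowering prices no further than needed), the sketch below makes a single forward pass that lowers each price to the bound of Proposition 3. Note that it targets plain validity rather than specific validity, and the function name is an assumption. It evaluates a quadratic number of candidate bounds in the number of steps m, each with a binary-search lookup.

```python
from bisect import bisect_left
from typing import List


def repair_prices(T: List[int], P: List[float]) -> List[float]:
    """Hypothetical forward pass: lower each P_{k+1} just enough to meet Proposition 3.

    Each bound only uses prices of earlier steps, because T_k + 1 - T_j <= T_k
    (for T_j >= 1), so those prices are already repaired when step k+1 is processed.
    """
    new_P = list(P)

    def price(t: int) -> float:
        # I(t) under the prices repaired so far: the first step with T_j >= t.
        return new_P[bisect_left(T, t)]

    for k in range(1, len(T)):
        T_k = T[k - 1]
        bound = min(new_P[j - 1] + price(T_k + 1 - T[j - 1]) for j in range(1, k + 1))
        new_P[k] = min(new_P[k], bound)
    return new_P


print(repair_prices([10, 50], [5, 20]))         # [5, 10]
print(repair_prices([10, 20, 50], [5, 8, 30]))  # [5, 8, 13]
```

A version aimed at specific validity would replace the bound with the constraint of Proposition 4; the forward structure would stay the same as long as each bound depends only on earlier steps.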

Comparison
We now compare validity and specific validity on the same two-step example. To meet specific validity, the proposition gives the following constraint.
Equation (24) precisely reflects the lengths and the distribution of the steps: as a grows, the bound on $P_2$ increases linearly. For example, if a = 4, 6 or 10, then $P((T, aT]) \le 3T$, 4T or 6T respectively, instead of a fixed 2T. Table 3 compares validity and specific validity. The first two rows give the original scheme (P, T) of sellers a, b and c, and the third and fourth rows give the prices adjusted to meet validity and specific validity, respectively. When the steps are uneven, as for seller a, the prices meeting specific validity are more reasonable, whereas those meeting validity are too tight. When the steps are even, as for sellers b and c, neither validity nor specific validity changes the original scheme much. In short, given the uneven steps in data subscriptions and the aim of arbitrage-free pricing, a scheme that meets specific validity is much more reasonable than one that meets validity. Specific validity can help sellers set their models better from an economic point of view.
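The figures quoted above can be reproduced under a hedged reading of Equation (24), which is not shown in this section: in the two-step example, specific validity is taken to require $P_2 = O(t)$ only for $t \ge \frac{1}{2}(T + aT)$, where the competing purchases are step-1 purchases of up to T transactions at price $P_1$ each. The normalisation $P_1 = T$ and integer transaction counts are our assumptions; the helper names are hypothetical.

```python
def ceil_div(p: int, q: int) -> int:
    """Integer ceiling of p / q for positive q."""
    return -(-p // q)


def p2_bound_validity(P1: int) -> int:
    """Proposition 3 with k = 1: P_2 <= P_1 + I(eps) = 2 * P_1, independent of a."""
    return 2 * P1


def p2_bound_specific(T: int, a: int, P1: int) -> int:
    """Largest P_2 not exceeding the cost of covering t with step-1 purchases,
    for every integer t on the upper half [(T + aT) / 2, aT] of the second step."""
    start = ceil_div(T + a * T, 2)
    return min(ceil_div(t, T) * P1 for t in range(start, a * T + 1))


T = 10
P1 = T  # assumed normalisation P_1 = T
for a in (4, 6, 10):
    print(a, p2_bound_validity(P1), p2_bound_specific(T, a, P1))
# 4 20 30
# 6 20 40
# 10 20 60
```

With $P_1 = T$ this gives 3T, 4T and 6T for a = 4, 6 and 10, against the fixed 2T under validity; the binding point is the midpoint of the second step, where roughly (a + 1)/2 step-1 purchases are needed.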

Conclusions and future work
After reviewing current work on data pricing, this paper points out a problem of existing schemes: they mainly price a single record or dataset, whereas pricing several items together is an important supplement or alternative to them. Building on the existing subscription model, which connects customers' buying power with the sellers' pricing model, this paper proposes an improvement that handles both kinds of customers. This paper also rigorously proves that the new scheme works steadily in a complex market and can always optimise the model to increase the sellers' profit. Furthermore, the paper shows that the standard arbitrage-free requirement is too strict for data products such as datasets and data records, because data pricing models in data markets have their own characteristics. As a result, the paper proposes specific validity and an $O(n^2)$ algorithm to achieve it.
However, the work in this paper has some limitations. For instance, although we propose a better calculation part, there is still a gap between $\mathrm{CWTP}_{mix}$ and $\mathrm{CWTP}_{real}$; how to choose a better $\mathrm{CWTP}_{mix}$ remains open. Moreover, although we consider it worthwhile, specific validity reduces the profit sellers can obtain from the calculation part. How can arbitrage-free pricing and expected profit be balanced more reasonably? Game theory may be useful here. Finally, the paper treats the subscription model independently of precise models, whereas in practice they are usually closely related.
In the future, we may concentrate on the limitations above, especially the relations between subscription models and precise models. Balancing the two kinds of pricing schemes is necessary for comprehensive commercial use of data exchange. Besides, as Wu et al. (2021), Wang et al. (2019), Chen et al. (2020) and Sreeja (2019) mention, the quality of data items has an important impact in practice, and combining data quality with subscription pricing schemes is one of our concerns. Furthermore, Zhang et al. (2021) and Liang et al. (2018) illustrate the importance of the security of data exchange systems, and Hu et al. (2021), Dai et al. (2020), Yu et al. (2020) and Murugajothi and Rajakumari (2020) focus on building secure exchange systems. However, these system designs give little consideration to pricing methods. As mentioned in the related work, pricing schemes, the ways of exchanging data and system design are closely related to each other. Integrating pricing schemes into exchange system design deserves more attention in the future.

Disclosure statement
No potential conflict of interest was reported by the author(s).