A Data Mining Approach for Developing Online Streaming Recommendations

ABSTRACT Online streaming has become increasingly popular with the availability of broadband networks and the increase in computing power and electronic distribution. Online streaming operators have difficulty developing flexible business alternatives that respond to users’ changing streaming behaviors, and thus struggle to build a good and profitable business model. In terms of e-commerce development, live-streaming platforms that stream the main merchandise to users, allowing them to consume directly via live streaming, have become a critical issue. In this regard, personalized recommendation systems can use the user’s interests and purchasing behavior to recommend information and merchandise. Thus, this study investigates the online streaming experiences of Taiwanese consumers to evaluate online streaming users and their online purchase behaviors for developing online recommendations. This study uses a snowflake schema, which is an extension of the star schema. In addition, this study develops a rule-based recommendation approach for investigating online streaming and purchasing behaviors in terms of online recommendations. This study is the first to determine how online streaming proprietors and their affiliates are disseminated using online streaming consumer behaviors in terms of online recommendations for further electronic commerce development.


Introduction
With the increasing popularity of broadband networks and the improvement of computing power, live-streaming media that was originally only suitable for electronic media and enterprises can be extended to users and families on the Internet (Singh and Sharma 2020). Live streaming builds on the advantages of the Internet and uses video for online live broadcasting. A stream can display product demonstrations, conferences, background introductions, program evaluations, online surveys, dialog interviews, online training, online libraries, and other online applications (Anderson 2017). Online, using the fast, intuitive, and rich content of the Internet, as well as its strong interactivity and lack of geographical restrictions, audiences can be segmented to strengthen the effect of communication (Wang 2019). After the live stream is over, creators can continue to provide replay or on-demand services at any time, thereby effectively expanding the time and space of the live stream. There are two types of live streaming. One provides TV signal viewing on the Internet, such as live broadcasts of various sports games and cultural activities. The principle of this type of live streaming is to capture analog TV signals and input them to a computer, which uploads them in real time to websites for people to watch; this is equivalent to Internet TV. The other is live streaming in the true sense: an independent signal acquisition device (audio + video) is set up on site, its output is fed to the director (a guidance device or platform), and the result is uploaded to the server through the console (Bilal, Erbad, and Hefeeda 2018). The biggest difference between this type of streaming media and the previous one is the autonomy of live streaming: independent and controllable collection of audio and video is completely different from a single broadcast of TV signals.
As a result, the huge business opportunities brought by online shopping have also turned retail companies selling various products into virtual e-commerce suppliers, which has led to more common online shopping behaviors among consumers (Smink et al. 2019). In particular, Facebook launched the fan page function in 2007 and the Facebook Live Stream function in 2015, which not only made Facebook a platform for interpersonal interaction but also expanded its popularity, making Facebook an essential tool for online marketing. Because online streaming allows multiple viewers to watch the same video file at the same time, and each viewer can play the file interactively at any time, live-stream shopping models such as the current Facebook one have been developed to include auction-based activities (Lim and Kim 2018). Regarding interactivity, live streaming allows consumers to browse and participate in real time, and strengthens the connection between people. This is a consumption attribute that traditional e-commerce websites cannot achieve (Huang et al. 2015). In addition, in terms of transparency, online streaming allows online and offline operators to display sites and products to consumers, thereby enabling consumers to evaluate whether the products meet their needs. This marketing model overcomes a key limitation of online shopping, namely that consumers cannot see the physical product or feel its texture. Live streaming enables consumers to better grasp the texture of the product and helps them imagine actually using it. Thus, through live streaming of product features, functions, prices, advantages, and disadvantages, product attributes can be fully presented to consumer groups.
The live-streaming industry in Taiwan has developed rapidly, and its output value has increased by more than 24 times from 2016 to 2018. In 2020, it will reach an economic market of 300 billion TWD. In terms of the Taiwanese online streaming market, the results of the Taiwan MIC survey conducted by the Taiwan Council show that 31.3% of Taiwanese have the habit of tracking specific broadcasters. Most of these followers interact with the broadcasters through methods such as click to support (38.6%), pure viewing (34.8%) and barrage/comments (26.9%). However, 15% of followers have spent money rewarding live broadcasters, 14.3% have purchased peripheral goods, 10.9% have participated in offline activities, and male followers (18.6%) are more willing than female followers (10.8%) to spend money on rewards (Taiwan Market Intelligence & Consulting Institute (Taiwan MIC) 2019). The above findings indicate that an influential economy has great development potential. Generally speaking, live broadcasting has different development paths in different regions. For example, China has developed a unique live-streaming media market, and it also has a special high-frequency, high-value reward culture. Among the first-tier, second-tier, and third-tier cities, the types of live programs and age levels can be classified. On the other hand, live-streaming media in Europe and the United States mostly follow the operating brand model, with the ultimate goal of creating their own brand and e-commerce operations. For example, a live broadcaster can start with teaching beauty, then move to endorsements and selling cosmetics, and finally create an e-commerce model around his/her own brand. Therefore, rewards or platform sharing are only the initial profit model (Taiwan TSRI 2019).
Thus, this study is the first to investigate the online streaming experiences of Taiwanese consumers to evaluate online streaming users and their online purchase behaviors for developing online recommendations. This study uses a snowflake schema, which is an extension of the star schema, including 2,124 data records, and combines association rules with rough sets to develop a data mining approach. In addition, this study develops a rule-based recommendation approach for investigating online streaming and purchasing behaviors in terms of online recommendations. This study is the first to determine how online streaming proprietors and their affiliates are disseminated using online streaming consumer behaviors in terms of online recommendations for further electronic commerce development.

Online Streaming
With the availability of broadband networks and the increase in computing power and electronic distribution, online streaming has become more and more popular (Hamid et al. 2016; Aljukhadar and Senecal 2017). Online streaming media is a kind of multimedia that is continuously received by and presented to the end user while it is being delivered by the provider. The term live streaming refers to the delivery method of the medium, not the medium itself. Live-streaming media is an alternative to file download, in which the end user obtains the entire file of the content before watching or listening to it. In contrast, live streaming is the real-time transmission of Internet content, just as real-time TV broadcasts content over the airwaves through TV signals. Live streaming requires a form of source media (e.g., a video camera, an audio interface, screen capture software, etc.), an encoder to digitize the content, a media publisher, and a content delivery network to distribute and deliver the content. Live-streaming media does not need to be recorded at the point of origin, although recording is often required. However, the development of live-streaming media on the Internet presents challenges. For example, in terms of long-term behavior observation, it is difficult to investigate user behavior through data collection and analysis. In addition, it is difficult for online streaming media operators to develop flexible business alternatives based on the changing streaming media behavior of users, and they thus fail to produce good and profitable business models.
In addition, Table 1 summarizes the literature on power-law behaviors and on how online streaming can be used for business purposes.
By examining this literature on online streaming technologies and applications, we infer that past studies leave a gap in investigating users' behavior and preferences on online streaming in terms of e-commerce business model development. This gap is the main motivation of this paper, which develops online streaming recommendations.

Personalized Recommendation Systems
A personalized recommendation system (PRS) uses users' interests and purchasing behavior to recommend information and products. With the expansion of e-commerce and the increase in the number of products offered online, customers need time to find the products they want to buy (Chang and Jung 2017). Browsing a large number of irrelevant information sources and product processes means that consumers will experience information overload. Therefore, a personalized recommendation system (PRS) is proposed (Modarresi 2016). PRS is an advanced business intelligence platform that uses massive data mining to allow e-commerce websites to provide their customers with complete personalized decision support and information services (Xu et al. 2016). The recommendation system of shopping websites recommends products to customers and automatically completes the process of personalized product selection to meet the individual needs of customers (Contreras, Salamó, and Pascual 2015). The main algorithms for e-commerce recommendation systems include association rule-based recommendation (Santos et al. 2018), content-based recommendation (Zheng et al. 2018) and collaborative filtering recommendation (Akcayol et al. 2018). The biggest advantage of personalized recommendation is that it can collect user data and provide online users with personalized recommendations based on user characteristics (such as interests or preferences) (Lee and Brusilovsky 2017). Various online platforms and social networks that provide personalized services also need the support of the recommendation system (Yin et al. 2018). In an increasingly competitive environment, personalized recommendation systems can retain customers and improve the services of e-commerce systems. A successful recommendation system (RS) can use accurate big data analysis to bring huge benefits (Gao et al. 2019). In the electronic market, more and more recommendation systems are adopted to improve the pre-selection of available products and services (Hwangbo, Kim, and Cha 2018). Determining user preferences is an important condition for effective operation of these automatic recommendation systems (Dixit, Gupta, and Jain 2018).

Table 1. Literature on online streaming technologies and applications.

| Study | Technologies | Applications |
|---|---|---|
| Mahanti et al. (2013) | Pareto and Zipf distributions method | Power-laws |
| Huang et al. (2015) | Heterogeneous vehicular networks | Adaptive live streaming mechanism |
| Thomson, Mahanti, and Gong (2017) | One-click file hosting indexes | Copyright infringement |
| Farrelly et al. (2017) | Video characteristics of xHamster | Video streaming service |
| Thomson, Mahanti, and Gong (2018) | One-click file hosting services (OCHs) | Internet piracy |
| Song and Mahanti (2019) | Mobile streaming | Academic Web Server |
| Yang et al. (2019) | Fuzzy Petri Nets | News streams |
| Wong, Song, and Mahanti (2020) | Video streaming; Tagging; Data mining | Online social networks |
This study develops a rule-based recommendation method for investigating online streaming and purchasing behavior based on online personalized recommendations.

Rough Set Theory
Rough set theory (RST) was introduced by Pawlak in the 1980s (Pawlak 1982, 2002) as a mathematical approach to aid decision-making in the presence of uncertainty. The rough set philosophy assumes that every object is associated with a certain amount of information (data, knowledge), expressed by means of some attributes used for the object's description. Objects having the same description are indiscernible (similar) with respect to the available information. The indiscernibility relation thus generated constitutes the mathematical basis of RST. It induces a partition of the universe into blocks of indiscernible objects, called elementary sets, that can be employed to build knowledge about a real or abstract world. The use of the indiscernibility relation results in information granulation (Greco, Matarazzo, and Slowinski 2001). In other words, when the available knowledge is employed, boundary-line cases cannot be properly classified. Therefore, rough sets can be considered as uncertain or imprecise, as illustrated in the following (Liao and Chang 2016). An attribute $a$ is a mapping $a : U \to V_a$, where $U$ is a non-empty finite set of objects (called the universe) and $V_a$ is the value set of $a$. An information system is a pair $T = (U, A)$ of the universe $U$ and a non-empty finite set $A$ of attributes. Let $B$ be a subset of $A$. The $B$-indiscernibility relation is defined by an equivalence relation $I_B$ on $U$ such that $I_B = \{(x, y) \in U^2 \mid \forall a \in B : a(x) = a(y)\}$. The equivalence class of $I_B$ for each object $x \in U$ is denoted by $[x]_B$. Let $X$ be a subset of $U$. We define the lower and upper approximations of $X$ by $\underline{B}(X) = \{x \in U \mid [x]_B \subseteq X\}$ and $\overline{B}(X) = \{x \in U \mid [x]_B \cap X \neq \emptyset\}$. The accuracy and coverage of a decision rule $r$ of the form $\varphi \to (d = d_i)$ are respectively defined as follows:

$$\mathrm{accuracy}(r) = \frac{|U_i \cap [\![\varphi]\!]_T|}{|[\![\varphi]\!]_T|}, \qquad \mathrm{coverage}(r) = \frac{|U_i \cap [\![\varphi]\!]_T|}{|U_i|}.$$

In these evaluations, $|U_i|$ is the number of objects in a decision class $U_i$ and $|[\![\varphi]\!]_T|$ is the number of objects in the universe $U = U_1 \cup \dots \cup U_{|V_d|}$ that satisfy condition $\varphi$ of rule $r$. Therefore, $|U_i \cap [\![\varphi]\!]_T|$ is the number of objects satisfying the condition $\varphi$ restricted to a decision class $U_i$.
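The indiscernibility relation and the accuracy and coverage of a decision rule can be illustrated with a minimal Python sketch. The toy information system, its attribute names (`platform`, `donates`, `buys`) and the helper functions are hypothetical and are not data or code from this study; accuracy and coverage follow the standard rough-set reading of the three counts described above.

```python
from collections import defaultdict

def indiscernibility_classes(table, attrs):
    """Partition object ids by their values on attrs (the B-indiscernibility relation)."""
    blocks = defaultdict(set)
    for obj, row in table.items():
        blocks[tuple(row[a] for a in attrs)].add(obj)
    return list(blocks.values())

def rule_accuracy_coverage(table, condition, decision_attr, decision_value):
    """Accuracy and coverage of the rule: condition -> (decision_attr = decision_value)."""
    # Objects satisfying the rule condition
    matching = {o for o, r in table.items()
                if all(r[a] == v for a, v in condition.items())}
    # Objects in the decision class U_i
    decision_class = {o for o, r in table.items() if r[decision_attr] == decision_value}
    hits = matching & decision_class
    accuracy = len(hits) / len(matching) if matching else 0.0
    coverage = len(hits) / len(decision_class) if decision_class else 0.0
    return accuracy, coverage

# Hypothetical toy information system (assumption, not survey data).
toy = {
    "x1": {"platform": "Facebook", "donates": "yes", "buys": "yes"},
    "x2": {"platform": "Facebook", "donates": "yes", "buys": "no"},
    "x3": {"platform": "Twitch", "donates": "no", "buys": "no"},
    "x4": {"platform": "Facebook", "donates": "no", "buys": "yes"},
}

classes = indiscernibility_classes(toy, ["platform", "donates"])
acc, cov = rule_accuracy_coverage(
    toy, {"platform": "Facebook", "donates": "yes"}, "buys", "yes"
)
```

In this toy table, x1 and x2 are indiscernible on the two condition attributes but differ on the decision, which is exactly the kind of boundary case that rough sets are meant to expose.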

Database Modeling - Snowflake Schema Database
Berson et al. (1997) noted that a snowflake schema database is constructed using dimensional data tables and a fact table, and that the dimensional structure of the data can be presented in a star schema, a snowflake schema, or a fact constellation schema. A multi-dimensional data model is necessary to organize and store a large amount of data in a data warehouse so that queries and analysis decisions can be optimized. This study uses a snowflake schema, which is an extension of the star schema, including 2,124 data records. This schema supports querying the data, saves storage space, and is easier to maintain than other schema modes that use a relational database. The snowflake schema database is shown in Figure 1 and includes 17 fact tables, 9 relationships and 76 dimensions.
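As a rough illustration of the snowflake idea (a fact table whose dimension tables are themselves normalized into further tables), the following sketch builds a tiny in-memory database with Python's standard `sqlite3` module. All table and column names (`region`, `respondent`, `platform`, `platform_ranking`, `choice_rank`) and the rows are hypothetical; they are not the schema shown in Figure 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Sub-dimension: normalizing 'region' out of 'respondent' is what makes this a snowflake
CREATE TABLE region (region_id INTEGER PRIMARY KEY, region_name TEXT);
CREATE TABLE respondent (
    respondent_id INTEGER PRIMARY KEY,
    gender TEXT,
    region_id INTEGER REFERENCES region(region_id)
);
CREATE TABLE platform (platform_id INTEGER PRIMARY KEY, platform_name TEXT);
-- Fact table: one row per respondent and ranked platform
CREATE TABLE platform_ranking (
    respondent_id INTEGER REFERENCES respondent(respondent_id),
    platform_id INTEGER REFERENCES platform(platform_id),
    choice_rank INTEGER,
    PRIMARY KEY (respondent_id, platform_id)
);
""")

# Hypothetical rows (assumption, not survey data)
conn.executemany("INSERT INTO region VALUES (?, ?)", [(1, "North"), (2, "South")])
conn.executemany("INSERT INTO respondent VALUES (?, ?, ?)", [(1, "F", 1), (2, "M", 2)])
conn.executemany("INSERT INTO platform VALUES (?, ?)", [(1, "Facebook"), (2, "YouTube")])
conn.executemany(
    "INSERT INTO platform_ranking VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 2, 2), (2, 2, 1)],
)

# First-choice platform per region: the query walks fact -> dimension -> sub-dimension
rows = conn.execute("""
    SELECT rg.region_name, p.platform_name
    FROM platform_ranking AS f
    JOIN respondent AS r ON r.respondent_id = f.respondent_id
    JOIN region AS rg ON rg.region_id = r.region_id
    JOIN platform AS p ON p.platform_id = f.platform_id
    WHERE f.choice_rank = 1
    ORDER BY rg.region_name
""").fetchall()
```

The extra join through `region` is the cost of normalization; the benefit is that each region name is stored once, which is the storage and maintenance advantage the paragraph above refers to.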

Ordinal Scale Data Processing - Rough Set Processing
An ordinal scale is a scale that uses labels to classify cases into ordered categories. The classes must be arranged in order so that each case in one class is considered greater than (or less than) each case in another class, and cases in the same category are considered equivalent. Traditional association rules ignore the rules found in ordered data. Data science is a critical issue in applied artificial intelligence in terms of developing information science (Jadhav, Pramod, and Ramanathan 2019). This study combines association rules with rough sets to create an application for ordinal-scale data. This study conducts a market survey using a questionnaire to collect data. A total of 2,124 valid questionnaires were collected; the questionnaire was divided into six sections with 29 items in the database design. All questions were designed using ordinal scales, and all items were multiple-choice questions. An example is given as follows: Which of the following live-streaming social platforms have you used to watch a live stream?
Facebook, YouTube, Instagram, Twitch, Live.me, 17 Stream, PikoLive, Others (Please list your top three choices in order: __ / __ / __).
The processing of ordinal-scale data is described below.

Definition1:
Transform the questionnaire answers into an information system $IS = (U, A)$, where $U = \{x_1, x_2, \dots, x_n\}$ is a finite set of objects and $A = \{a_1, a_2, \dots, a_m\}$ is a finite set of general attributes/criteria with $j = 1, \dots, m$. $f_a : U \times A \to V_a$ is called the information function, $V_a$ is the domain of the attribute/criterion $a$, and $f_a$ is an ordinal function set such that $f(x, a) \in V_a$ for each $x_i \in U$. Table 2 shows the ranking of live-streaming platforms, from the first to eighth, by $x_1$, named Facebook, YouTube, Instagram, Twitch, Live.me, 17 Stream, PikoLive and Others. Then:

$f_{a_1} = \{1\}$, $f_{a_2} = \{2, 3, 5, 6, 7\}$, $f_{a_3} = \{2, 3, 6, 7, 8\}$, $f_{a_4} = \{2, 4, 5\}$, $f_{a_5} = \{3, 4, 6, 7\}$, $f_{a_6} = \{2, 5, 6\}$, $f_{a_7} = \{3, 4, 8\}$, $f_{a_8} = \{6, 7, 8\}$

Definition2:
According to the specific universe of discourse classification, a similarity relation of the general attributes $a \in A$ is denoted by $U/A$. All of the similarity relations are denoted by $R(a_j)$.
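The transformation in Definition1 can be sketched as follows: each respondent's ranking is a row of the information system, and the ordinal function set of a platform collects the ranks that platform received. The three respondents and the four-platform subset below are hypothetical and do not reproduce Table 2.

```python
PLATFORMS = ["Facebook", "YouTube", "Instagram", "Twitch"]  # subset used for illustration

def ordinal_function_sets(responses):
    """Map each platform (attribute a) to the sorted set of ranks it received, f_a."""
    ranks = {p: set() for p in PLATFORMS}
    for ranking in responses.values():
        for platform, rank in ranking.items():
            ranks[platform].add(rank)
    return {p: sorted(r) for p, r in ranks.items()}

# Hypothetical rankings (assumption): 1 = most preferred
responses = {
    "x1": {"Facebook": 1, "YouTube": 2, "Instagram": 3, "Twitch": 4},
    "x2": {"Facebook": 1, "YouTube": 3, "Instagram": 2, "Twitch": 4},
    "x3": {"Facebook": 1, "YouTube": 2, "Instagram": 4, "Twitch": 3},
}

f_a = ordinal_function_sets(responses)
```

Here Facebook always receives rank 1, so its ordinal function set is a singleton, in the same way that $f_{a_1} = \{1\}$ in the text.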

Example:
The information system is composed of ordinal-scale data. Therefore, an ordinal response occurs between two attributes, where $D_a$ represents the pairwise comparison result of the ordinal-scale data. Then, using the concept of the similarity relation in rough set theory, we find the value of the ordinal-scale data between $a_i$ and $a_j$, where $ind(B)$ is the core attribute value of the ordinal-scale data in the first step, and $B$ is a subset of $A$.
Example: According to the similarity relation and membership in the same fundamental set, the ordinal function sets are $f_{a_5} = \{3, 4, 6, 7\}$ and $f_{a_6} = \{2, 5, 6\}$. Therefore, $a_5$ and $a_6$ are both core attribute values of the ordinal-scale data for live-streaming platforms, and for users $x_1$, $x_3$ and $x_4$, $a_5$ is always placed after $a_6$, denoted by $D_a^{+}$. The pairwise comparison of $a_5$ and $a_6$ is shown in Table 3.
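A pairwise comparison of this kind can be sketched in a few lines of Python. The function below returns `"+"` when one attribute is ranked after the other for every object considered. The `"-"` and `"0"` labels for the remaining cases are assumptions introduced for illustration, and the rank values are hypothetical choices drawn from the ordinal function sets in the example, not the contents of Table 3.

```python
def pairwise_order(ranks_i, ranks_j):
    """Compare two attributes object by object.

    Returns "+" if a_i is ranked after a_j for every object,
    "-" if a_i is always ranked before a_j, and "0" otherwise.
    """
    diffs = [ri - rj for ri, rj in zip(ranks_i, ranks_j)]
    if all(d > 0 for d in diffs):
        return "+"
    if all(d < 0 for d in diffs):
        return "-"
    return "0"

# Hypothetical ranks for users x1, x3 and x4 (assumption)
a5_ranks = [3, 6, 7]
a6_ranks = [2, 5, 6]

result = pairwise_order(a5_ranks, a6_ranks)
```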

Rough Set Association Rules
Definition 1: As a first step, this study identifies the core attribute values of the ordinal-scale data. In this step, the objects generate the rough association rules. The other attributes are considered together with the core attributes of the ordinal-scale data, which serve as the highest decision-making attributes, to establish the decision table and to generate rules, as shown in Table 4.
In the decision table, $U = \{x_1, x_2, \dots, x_n\}$ is a finite set of objects with $i = 1, \dots, n$, and $Q$ is usually divided into two parts: $G = \{g_1, g_2, \dots, g_m\}$ is a finite set of general attributes/criteria with $j = 1, \dots, m$, and $D = \{d_1, d_2, \dots, d_p\}$ is a set of decision attributes with $k = 1, \dots, p$. $f_g : U \times G \to V_g$ is called the information function, $V_g$ is the domain of the attribute/criterion $g$, and $f_g$ is a total function such that $f(x, g) \in V_g$ for each $g \in Q$, $x \in U$. $f_d : U \times D \to V_d$ is called the sorting decision-making information function, $V_d$ is the domain of the decision attribute/criterion $d$, and $f_d$ is a total function such that $f(x, d) \in V_d$ for each $d \in Q$, $x \in U$. Then:

$f_{g_1} = \{g_{11}, g_{12}\}$, $f_{g_2} = \{g_{21}, g_{22}\}$, $f_{g_3} = \{g_{31}, g_{32}\}$, $f_{g_4} = \{g_{41}, g_{42}\}$

Definition2: According to the specific universe of discourse classification, a similarity relation for the general attributes is denoted by $U/G$. All of the similarity relations are denoted by $R(g_t)$, where $t$ is the combination of all the general attributes.

Definition3: By the similarity relation and the determination of the reduct and core, if ignoring an attribute $g$ of the set $G$ has no effect, then $g$ is an unnecessary attribute and can be reduced, with $R \subseteq G$ and $\forall g \in R$.
A similarity relation for the general attributes of the decision table is denoted by $ind(G)$. If $ind(G) = ind(G - g_1)$, then $g_1$ is the reduct attribute, and if $ind(G) \neq ind(G - g_1)$, then $g_1$ is the core attribute.

Table 4. Decision table.

| U | $g_1$ | $g_2$ | $g_3$ | $g_4$ | Stream ranking |
|---|---|---|---|---|---|
| $x_1$ | $g_{11}$ | $g_{21}$ | $g_{31}$ | $g_{41}$ | 3, $a_5$ |
| $x_2$ | $g_{11}$ | $g_{22}$ | $g_{31}$ | $g_{42}$ | 6, $a_5$ |
| $x_3$ | $g_{12}$ | $g_{21}$ | $g_{32}$ | $g_{41}$ | 6, $a_5$ |
| $x_4$ | $g_{12}$ | $g_{21}$ | $g_{32}$ | $g_{41}$ | 7, $a_5$ |
| $x_5$ | $g_{11}$ | $g_{22}$ | $g_{31}$ | $g_{42}$ | 5, $a_5$ |

Example: When considering $g_1$ alone, $g_1$ is the reduct attribute, but when considering $g_1$ and $g_3$ simultaneously, $g_1$ and $g_3$ are the core attributes.
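The reduct/core check in the example can be reproduced with a short Python sketch that uses the condition-attribute values listed for $x_1$ to $x_5$ in Table 4. The helper name `ind` and the positional encoding of attributes (0 for $g_1$ through 3 for $g_4$) are choices made here for illustration.

```python
from collections import defaultdict

# Condition attribute values (g1, g2, g3, g4) per object, as listed in Table 4
TABLE4 = {
    "x1": ("g11", "g21", "g31", "g41"),
    "x2": ("g11", "g22", "g31", "g42"),
    "x3": ("g12", "g21", "g32", "g41"),
    "x4": ("g12", "g21", "g32", "g41"),
    "x5": ("g11", "g22", "g31", "g42"),
}

def ind(table, keep):
    """Partition induced by the attribute positions in keep, as a set of frozensets."""
    blocks = defaultdict(set)
    for obj, values in table.items():
        blocks[tuple(values[i] for i in keep)].add(obj)
    return {frozenset(b) for b in blocks.values()}

full = ind(TABLE4, [0, 1, 2, 3])       # ind(G)
without_g1 = ind(TABLE4, [1, 2, 3])    # ind(G - g1)
without_g1_g3 = ind(TABLE4, [1, 3])    # ind(G - {g1, g3})

g1_is_reduct = full == without_g1            # dropping g1 alone changes nothing
g1_g3_are_core = full != without_g1_g3       # dropping g1 and g3 together merges classes
```

Dropping $g_1$ alone leaves the three classes {x1}, {x2, x5} and {x3, x4} unchanged, whereas dropping $g_1$ and $g_3$ together merges x1 with x3 and x4, which matches the statement in the example.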

Definition4:
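The lower approximation, denoted as $\underline{G}(X)$, is defined as the union of all of the elementary sets $[x_i]_G$ that are contained in $X$. More formally: $\underline{G}(X) = \bigcup \{[x_i]_G \in U/G \mid [x_i]_G \subseteq X\}$. The upper approximation, denoted as $\overline{G}(X)$, is the union of those elementary sets that have a non-empty intersection with $X$. More formally: $\overline{G}(X) = \bigcup \{[x_i]_G \in U/G \mid [x_i]_G \cap X \neq \emptyset\}$. The difference $Bn_G(X) = \overline{G}(X) - \underline{G}(X)$ is called the boundary of $X$.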
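A minimal sketch of the lower approximation, upper approximation and boundary follows, assuming the standard set-based definitions. The elementary sets are those induced by the four condition attributes of Table 4; the target set (objects whose stream ranking value is at least 6) is an illustrative choice made here, not one taken from the study.

```python
def approximations(elementary_sets, target):
    """Return (lower, upper, boundary) of target with respect to the elementary sets."""
    lower, upper = set(), set()
    for block in elementary_sets:
        if block <= target:      # block entirely inside the target
            lower |= block
        if block & target:       # block overlaps the target
            upper |= block
    return lower, upper, upper - lower

# Elementary sets induced by g1..g4 in Table 4
elementary_sets = [{"x1"}, {"x2", "x5"}, {"x3", "x4"}]

# Illustrative target (assumption): objects with stream ranking value >= 6
X = {"x2", "x3", "x4"}

lower, upper, boundary = approximations(elementary_sets, X)
```

Objects x2 and x5 share the same description but fall on different sides of the target, so their block lands in the boundary rather than in the lower approximation.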

Definition5:
The Support, Confidence and Lift values are calculated using the traditional association rule measures.

Definition6:
Rough set association rules.
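For reference, the textbook forms of these three measures for a rule $A \Rightarrow B$ are support $= P(A \cap B)$, confidence $= P(B \mid A)$ and lift $= P(B \mid A) / P(B)$. The sketch below computes them over a handful of hypothetical records; the item names are invented, and the exact formulas used in this study may differ in detail (for example, some tools report antecedent support rather than rule support).

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    n = len(transactions)
    n_a = sum(1 for t in transactions if antecedent <= t)
    n_c = sum(1 for t in transactions if consequent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_both / n
    confidence = n_both / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift

# Hypothetical records (assumption, not survey data)
transactions = [
    {"watch_twitch", "donate"},
    {"watch_twitch", "donate", "buy"},
    {"watch_youtube", "buy"},
    {"watch_twitch"},
]

support, confidence, lift = rule_metrics(transactions, {"watch_twitch"}, {"donate"})
```

A lift above 1 means the consequent is more frequent among records matching the antecedent than in the data overall, which is why lift values greater than 1 indicate a positive association.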

Pattern 1: Associations between Users' Behaviors and Live Stream Platform/Content
In terms of the users' preferences to use an online stream platform and content, the types of stream platforms that are preferred by the users are determined by investigating the users' habits and behaviors, so an online stream operator can tailor information for users. This study uses a minimum antecedent support of 2% and a minimum rule confidence of 30% to generate ten meaningful association rules, as shown in Table 5 and Figure 2. The lift values are all greater than 1.
In terms of R1, regarding the users' participation motivation for watching a live stream platform, we found that most of the users like to interact with live streamers and viewers using the PikoLive platform. They like to donate to their favorite live stream moderator on the live-streaming platform in the hope of improving their online gaming technique. In terms of R6, the participation motivation of the users was to celebrate birthdays and festivals using the Twitch platform. They like to participate in live community discussions and interact with streamers and viewers, spending time with a specific favorite streamer. In terms of R10, the participation motivation of the users was to engage in dialog with live streamers using the YouTube platform. They like to recommend stream content to their friends, enjoy the fun atmosphere and relieve stress.
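The thresholds used for this pattern (minimum antecedent support of 2%, minimum rule confidence of 30%, lift greater than 1) can be applied as a filter over candidate rules. The sketch below enumerates single-item rules over hypothetical records and keeps those passing the thresholds. It is a simplified stand-in for a full rule miner and not the procedure that produced Table 5; the item names are invented.

```python
from itertools import permutations

MIN_ANTECEDENT_SUPPORT = 0.02  # 2%, as stated for Pattern 1
MIN_CONFIDENCE = 0.30          # 30%, as stated for Pattern 1

def mine_pair_rules(records):
    """Return (antecedent, consequent, confidence, lift) for rules passing the thresholds."""
    n = len(records)
    items = sorted(set().union(*records))
    rules = []
    for a, b in permutations(items, 2):
        n_a = sum(1 for r in records if a in r)
        n_b = sum(1 for r in records if b in r)
        n_ab = sum(1 for r in records if a in r and b in r)
        antecedent_support = n_a / n   # share of records matching the antecedent only
        confidence = n_ab / n_a
        lift = confidence / (n_b / n)
        if (antecedent_support >= MIN_ANTECEDENT_SUPPORT
                and confidence >= MIN_CONFIDENCE
                and lift > 1):
            rules.append((a, b, round(confidence, 3), round(lift, 3)))
    return rules

# Hypothetical records (assumption, not survey data)
records = [
    {"PikoLive", "donate"},
    {"PikoLive", "donate"},
    {"YouTube", "recommend_to_friends"},
    {"YouTube"},
    {"Twitch", "donate"},
]

rules = mine_pair_rules(records)
```

Note that antecedent support counts records matching the left-hand side only, so a rule can pass this threshold even when the full rule (both sides together) is rarer.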

Pattern 2: Associations between Users' Behaviors and Online Purchase Preferences
In regard to pattern 2, this study investigates live-streaming users' behaviors and their online purchase behaviors. This study uses a minimum antecedent support of 2% and a minimum rule confidence of 30% to generate nine meaningful association rules, as shown in Table 6 and Figure 3. The lift values are all greater than 1.
In terms of R1, users increase their purchase intention by sharing their product/service experience with live-streaming content. The reason for consuming via the live stream is to bargain over products and prices. Reducing the frequency of offline shopping is a main factor behind changing purchase behaviors in consumption patterns. In addition, users like to read messages and absorb new information when interacting with other viewers of the stream. For R6, live shopping information increases purchase intention by solving problems through chat room talk, offers are promoted through stream shopping, special offers are provided by stream operators, and users enjoy other viewers' sharing when interacting with them. On the other hand, in terms of R9, the main reason for stream users to increase purchase intention is that stream platforms can provide product classification. Also, increased viewer intentions, online to offline interaction, discounts on various payment methods and stream moderators talking to the fans will affect users' online purchase preferences.

Pattern 3: Associations between Online Stream Services and Online Recommendations
When it comes to online stream services and online recommendations, the content of stream services preferred by the users is determined by investigating the users' online streaming and purchase behaviors, so that an online stream operator can provide online recommendations to users, improve stream service and create online business opportunities. This study uses a minimum antecedent support of 1% and a minimum rule confidence of 30% to generate 10 meaningful association rules, as shown in Table 7 and Figure 4. The lift values are all greater than 1. In terms of R2, live stream content should include differentiation of features and themes. Valuing VIP users, daily chat, more product introductions and discount offers are the antecedents of this rule associating online stream services with online recommendations. For R4, organizing events, commenting on current affairs, making a good impression of the product and selecting the most sponsored fans are associated with improved message speed and social interaction. On the other hand, in terms of R9, increased placement marketing can incorporate events for new product development, comments on current affairs, recommendations of items trusted by users and discount offers by operators when online streaming operators implement online stream services and online recommendations.
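One way such rules could drive recommendations is sketched below: a rule fires when all of its antecedent items appear in a profile, and the fired consequents are ranked by confidence. The `Rule` class, the `recommend` function, the item names and the confidence values are all hypothetical; they loosely echo the antecedents discussed for R2 and R4 but are not the rules in Table 7.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    antecedent: frozenset
    consequent: str
    confidence: float

def recommend(profile, rules, top_n=3):
    """Return consequents of rules whose antecedents all hold in profile, best first."""
    best = {}
    for rule in rules:
        if rule.antecedent <= profile and rule.consequent not in profile:
            best[rule.consequent] = max(best.get(rule.consequent, 0.0), rule.confidence)
    # Sort by descending confidence, then alphabetically for a stable order
    ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
    return [consequent for consequent, _ in ranked[:top_n]]

# Hypothetical rules and confidence values (assumption)
rules = [
    Rule(frozenset({"organizes_events", "comments_on_current_affairs"}),
         "improve_message_speed", 0.45),
    Rule(frozenset({"daily_chat"}), "offer_discounts", 0.60),
    Rule(frozenset({"daily_chat", "values_vip_users"}), "differentiate_themes", 0.35),
]

profile = {"daily_chat", "values_vip_users"}
suggestions = recommend(profile, rules)
```

Ranking by confidence is only one possible design choice; lift or a combination of measures could be used instead.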

Live Streaming Platform/content Recommendations
In regard to platform recommendations, this study suggests that live-streaming platform operators can use the advantages of hardware facilities or featured live streamers to attract viewers. For example: 17 stream broadcasts have strong foundations of celebrity endorsements and beautiful live broadcasts to viewers. These broadcasts are dedicated to content production and interaction on the live-streaming platform, and thus have stable viewing groups. In regard to content recommendations, this study suggests that streaming platform operators or moderators should pay attention to the streaming program quality and content so that viewers can increase the exposure of the platform by sharing stream content on social networking sites. As for social media/network recommendations, this study suggests that streaming platform operators or moderators can try to provide current topics and issues that are popular in online networks. Instagram, for example, allows users to receive more timely information in the social media/network through live streaming. Furthermore, Facebook Live Stream does not only provide a streaming platform for online marketing and sales, but also develops membership and fans by expanding social media/network influence through online recommendations.

User's Behaviors and Online Purchase Recommendations
In regard to users' behavior recommendations, this study suggests that stream platform operators or moderators can provide users more integrated streaming activities by encouraging more live-streaming participation. Under the trend of live-streaming platforms that combine e-commerce activities, the current AI technology will be used to classify consumer behaviors and stream viewing habits in order to effectively improve user behavior when using online streaming. As for online purchasing recommendations, this study suggests that streaming platforms and e-commerce operators target specific user groups and increase the exposure of online shopping brands and products. In addition, through e-commerce combined with live streaming, online events can be held and promotional discounts can be given to stream viewers, thereby increasing the exposure period of products and achieving the goal of advertising. Moreover, this study suggests that streaming and e-commerce operators can offer preferential promotions to increase users' intentions to purchase online. This can make up for the shortcomings of TV shopping and e-commerce, which fail to provide human interaction and physical service.

Online Stream Service Recommendations
As for online service recommendations, this study suggests that streaming platforms and e-commerce operators both launch discount activities specific to products/services when selling placement products, to stimulate buying momentum, and that they consider new marketing techniques. Furthermore, operators can encourage stream users to integrate their consumer behaviors into the interaction of online and offline purchase services, in addition to increasing the fans' loyalty to the platform through product/service recommendations, which can turn the streaming actions of fans into actual purchasing behaviors. On the other hand, online live-streaming messages that promote consumer loyalty can also be sent using the online to offline business model. Restaurants and food outlets might allow reservations online and provide the final service offline by using streaming platforms to complete an online to offline business model. Thus, online to offline could be a business model that allows retailers to build on a transactional platform that co-operates with online live-streaming operators.

Conclusion and Future Work
The gift and sponsorship-streaming business model refers to content provided by the live streamer, who broadcasts it to the user through the live-streaming platform, which provides the user with stored value to purchase virtual gifts for the live streamer or Internet celebrity, or provides services such as higher viewing quality, extra features, and opportunities to interact with the live stream. In addition, one type of live-streaming platform allows users to pay directly for sponsorship through the live streaming of a game, or to donate money to a fundraising proposal. The e-commerce and shopping guide streaming business model adopts a broader definition. It refers to a business model in which the live-streaming platform provides streaming of the main merchandise to users, allowing the users to directly consume via live streaming. This business model is also integrated with e-commerce platforms. Through the live guide who is live streaming, users are directed to an e-commerce platform to purchase goods online, and e-commerce platforms are utilized in this process. In terms of future work, different data mining methods might be used to investigate online streaming issues in further research, in order to obtain diversified knowledge and application results.

Disclosure Statement
No potential conflict of interest was reported by the author(s).

Funding
This work was supported by the Ministry of Science and Technology, Taiwan [MOST 110-2410-H-032-023 -].