Marketing strategies evaluation based on big data analysis: a CLUSTERING-MCDM approach

Nowadays, a huge amount of data is generated as a result of the rapid development of Information and Communication Technology (ICT). In this paper, a digital banking strategy based on such big data is proposed for the Iranian banking industry. This strategy would guide Iranian banks to analyse and distinguish customers' needs and to offer services tailored to their behaviour. In this research, the balances of more than 2,600,000 accounts in a bank are computed over more than 400 weeks. These accounts are clustered with the k-means method based on adapted RFM parameters: the maximum balances, the number of maximum balances and the last week number with the maximum balance. Subsequently, the clusters are prioritised using the Best Worst Method-COmplex PRoportional ASsessment (BWM-COPRAS) methods, considering the diverse inner value of each cluster. The accounts are classified into six clusters, which the experts named special, loyal, silver-high interaction, silver-low interaction, bronze and averted-low interaction. The silver-low interaction and loyal clusters are selected, in that order, by the experts and BWM-COPRAS as the most influential clusters, and the digital banking strategy is developed for them. RFM parameters are modelled for customers' accounts individually; the aggregation of a customer's separate accounts should also be considered.


Introduction
Today, the customers of organisations and business enterprises have different needs and tastes. Goods and service providers compete fiercely to attract these customers and create value for them. The winners will be the organisations that identify customers' behaviour quickly and take suitable actions. Concentrating on some customer groups can be profitable, while the same approach does not work for others. In the strategic management process, customers are examined in the analysis and recognition stage. Customers, customer orientation and customer satisfaction are influential for organisations, notably service organisations. Identifying customers, understanding their expectations, setting up processes based on their desires and, finally, achieving customer satisfaction and loyalty are part of today's business goals (Amoozad Mahdiraji & Razavi Hajiagha, 2017). Consequently, grouping and clustering customers and determining an interaction strategy for each group is significant.
A huge volume of data is generated, gathered, stored and refined by customers every day. These data cannot be analysed and processed, nor can knowledge be extracted from them, with traditional ICT and data mining tools. Big data have four features (the 4 Vs): volume, variety, velocity and value (Qi, Tao, Zuo, & Zhao, 2018).
At first glance these data seem valueless and logically unrelated to each other; however, in recent years they have been called 'the new oil' (Kantarcioglu, Ferrari, & States, 2019). Refining and analysing these data can create value, provide competitive advantage and improve organisational performance. Organisations can study their customers' behaviours and needs, respond to them and distinguish between customers in order to provide goods and services tailored to them (Grover, Chiang, Liang, & Zhang, 2018).
According to Gartner, global investment in big data analysis is rising steadily: approximately 2.1 billion dollars was invested in this field in 2013 and nearly 3.8 billion dollars in 2014. In a survey of big businesses (e.g., GE), 87% believed that the analysis and proper use of big data would change their competitive landscape within the next three years. Additionally, 89% believed that without big data analysis techniques they would lose a significant portion of their market share within the next year.
In order to extract knowledge from these data, data mining tools are used to identify, predict and respond to customers' actions; clustering methods are among these tools. In this paper, customer analysis with a strategic management approach and data mining tools is employed to choose the best strategy for offering goods and services to customers and creating value for them. For this purpose, the market is clustered based on the performance of 2,600,000 accounts in an Iranian bank, the clusters are prioritised using the Best Worst Method (henceforth BWM)-COmplex PRoportional ASsessment (henceforth COPRAS), and experts then develop an appropriate strategy.
The remainder of this paper is organised as follows. First, the theoretical foundation of the research is reviewed, including data mining, the Recency, Frequency, Monetary (hereafter RFM) model, clustering, Multi Criteria Decision Making (henceforth MCDM) and digital banking strategies. Afterwards, previous studies are presented, explained and analysed. Next, the clustering variables are computed based on the RFM parameters. Subsequently, the market is clustered using the k-means method. Finally, the clusters are labelled by experts and prioritised by BWM-COPRAS, and the marketing group develops a digital banking strategy for the two most influential customer clusters.

Data mining
The ability to generate, gather and process data from various sources has improved with the spread of computers and ICT. The explosive growth in storing and transferring data has highlighted the need for automated techniques to convert data into knowledge. This need has led to the emergence of a new field in computer science called data mining (Cheng, Chen, Sun, Zhang, & Tao, 2018).
In other words, data mining is the extraction of patterns that express knowledge from data banks, the web, databases and data streams (Han, Kamber, & Pei, 2011). Data mining draws on statistics, machine learning, pattern recognition, database technology, information retrieval, network science, knowledge-based systems, artificial intelligence, etc. Figure 1 shows some of the tools available in data mining; of these, clustering is used in this research.

Data modelling based on RFM model in banking industry
RFM is a customer value analysis model. It assesses customer value based on three parameters: recency, frequency and monetary value. The basis of the suggested model is the balance of customers' accounts (Roshan & Afsharinezhad, 2017). In this research, the average balance of each account has been computed over 403 weeks using the following steps.
1. The account balances are computed for each day ($Remain_{Acc\text{-}No,\,Day\text{-}No}$).
2. The weekly average balance is calculated from the daily balances of the previous step ($Remain_{Acc\text{-}No,\,Week\text{-}No}$).
3. The maximum of the weekly averages is determined using (1):

$$Max_{Acc\text{-}No} = \max_{Week\text{-}No} Remain_{Acc\text{-}No,\,Week\text{-}No} \qquad (1)$$

4. Each account therefore has a single maximum weekly average. Formula (2) is used to compute the peak balance and analyse customers with more than one account:

$$Peak_{Acc\text{-}No} = a \times Max_{Acc\text{-}No} \qquad (2)$$

Note that in formula (2), a is a number between zero and one. The experts chose a = 0.5 in this research, reflecting the conservative conditions of Iran's banking industry. The RFM parameters are defined as follows.
Recency: the number of the last week in which the weekly average balance is equal to or greater than the peak balance.
Frequency: the number of weeks in which the weekly average balance is equal to or greater than the peak balance.
Monetary: the average balance over the peak weeks.
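Steps 3 and 4 and the three RFM definitions can be sketched in Python for a single account as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `rfm_from_weekly_balances` is hypothetical, and the input is assumed to be one account's weekly average balances in week order, with week 1 first.

```python
from statistics import mean


def rfm_from_weekly_balances(weekly_avg, a=0.5):
    """Return (recency, frequency, monetary) for one account.

    weekly_avg: the account's weekly average balances in week order (week 1 first).
    a: coefficient of formula (2), between zero and one (the experts chose 0.5).
    """
    if not 0 <= a <= 1:
        raise ValueError("a must lie between zero and one")
    max_balance = max(weekly_avg)  # formula (1): maximum weekly average
    peak = a * max_balance         # formula (2): peak balance
    # 1-based week numbers whose weekly average is at or above the peak balance
    peak_weeks = [week for week, balance in enumerate(weekly_avg, start=1)
                  if balance >= peak]
    if not peak_weeks:  # can only happen when every weekly average is negative
        return 0, 0, 0.0
    recency = peak_weeks[-1]      # number of the last qualifying week
    frequency = len(peak_weeks)   # how many weeks qualify
    monetary = mean(weekly_avg[week - 1] for week in peak_weeks)  # average of peak weeks
    return recency, frequency, monetary


# Hypothetical five-week history: the maximum is 900, so the peak is 450 and weeks 2-4 qualify
print(rfm_from_weekly_balances([100, 900, 500, 450, 200]))
```

Using `>=` follows the "equal to or greater than" wording of the definitions; a strict comparison would drop weeks that sit exactly on the peak.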
Moreover, the following assumptions are made in the RFM modelling:
1. Weeks start on Saturday and end on Friday, following the Iranian national calendar;
2. The balances are calculated at the end of working hours each day;
3. RFM has been modelled for customers' accounts individually; the aggregation of a customer's separate accounts should be discussed (Gultom et al., 2018).
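Steps 1 and 2, combined with the Saturday-to-Friday week assumption, can be sketched as follows. The helper names are hypothetical, and the sketch assumes the input already holds one end-of-day balance per calendar day for a single account (for example, with non-working days carried forward from the previous day); the paper does not say how missing days are handled.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean

SATURDAY = 5  # date.weekday() numbers Monday as 0, so Saturday is 5


def week_start(d: date) -> date:
    """Return the Saturday that opens the Saturday-to-Friday week containing d."""
    return d - timedelta(days=(d.weekday() - SATURDAY) % 7)


def weekly_average_balances(daily_balances: dict) -> list:
    """Average end-of-day balances per Saturday-to-Friday week, oldest week first.

    Assumes one balance per calendar day (gaps already filled), so no week
    is missing and list position + 1 equals the week number.
    """
    weeks = defaultdict(list)
    for day, balance in daily_balances.items():
        weeks[week_start(day)].append(balance)
    return [mean(weeks[start]) for start in sorted(weeks)]


# Hypothetical account: 6 January 2024 is a Saturday, 13 January opens the next week
days = {date(2024, 1, 6) + timedelta(days=i): 100 for i in range(7)}
days[date(2024, 1, 13)] = 200
days[date(2024, 1, 14)] = 400
print(weekly_average_balances(days))
```

Keying weeks by their opening Saturday rather than by ISO week number keeps the grouping aligned with the Iranian week, whose boundaries differ from ISO's Monday start.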

Clustering
The process of placing data into groups whose members are most similar in some features is called clustering. Each cluster contains data that are similar to the other data in that cluster and different from the data in other clusters (Han et al., 2011). The key issue in clustering is the similarity and difference between samples: similar samples are placed in the same cluster, and data features are used to compare samples.
Distance is the similarity criterion, and the formula used to measure distance is significant in clustering. Distances guide movement through the data space and the formation of clusters, and the closeness of data points is perceived by measuring the distance between them. There are different ways to measure distance (Jintana & Mori, 2019).
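To illustrate that the choice of distance matters, the snippet below compares three common measures on two RFM vectors. The vectors are made up for illustration, and the paper does not state which measure it used; standard k-means assigns points by (squared) Euclidean distance.

```python
import math


def euclidean(p, q):
    """Straight-line distance."""
    return math.dist(p, q)


def manhattan(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(p, q))


def chebyshev(p, q):
    """Largest single coordinate difference."""
    return max(abs(x - y) for x, y in zip(p, q))


# Hypothetical standardised (recency, frequency, monetary) vectors
a = (0.0, 0.0, 0.0)
b = (3.0, 4.0, 0.0)
print(euclidean(a, b), manhattan(a, b), chebyshev(a, b))  # 5.0 7.0 4.0
```

The same pair of accounts is 4, 5 or 7 units apart depending on the measure, so cluster boundaries can shift when the measure changes.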

Best worst method
BWM is a recent and efficient MCDM technique used to derive the weights of criteria in decision making. Several approaches to BWM exist, as illustrated in Figure 2.
In this paper, the nonlinear approach to BWM is employed. The steps of nonlinear BWM are as follows (Rezaei, 2015):
1. A set of decision criteria $\{C_1, C_2, \ldots, C_n\}$ is determined.
2. The best (most important) and the worst (least important) criteria are selected by each expert or by focus group decision.
3. The preference of the best criterion over all the criteria is determined with a number between 1 and 9, $A_B = (a_{B1}, a_{B2}, \ldots, a_{Bn})$, by each expert or by focus group decision.
4. The preference of all the criteria over the worst criterion is determined with a number between 1 and 9, $A_W = (a_{1W}, a_{2W}, \ldots, a_{nW})$, by each expert or by focus group decision.
5. The optimal weights $(w_1^*, w_2^*, \ldots, w_n^*)$ are found by solving the nonlinear programming (NLP) model (3):

$$\begin{aligned} &\min\ \xi\\ &\text{s.t.}\quad \left|\frac{w_B}{w_j} - a_{Bj}\right| \le \xi, \quad \left|\frac{w_j}{w_W} - a_{jW}\right| \le \xi \quad \text{for all } j,\\ &\phantom{\text{s.t.}}\quad \sum_{j} w_j = 1, \quad w_j \ge 0 \quad \text{for all } j \end{aligned} \qquad (3)$$

BWM has been employed in many studies in recent years. Garoosi Mokhtarzadeh et al. (2018) used BWM to find the weights of criteria for ranking technologies for R&D in an Iranian high-tech company (Garoosi Mokhtarzadeh, Amoozad Mahdiraji, Beheshti, & Zavadskas, 2018).
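The paper solves the nonlinear BWM model with LINGO. As a dependency-free alternative, the sketch below uses a reformulation of our own, not the paper's procedure. For a fixed ξ, each constraint |w_B/w_j − a_Bj| ≤ ξ or |w_j/w_W − a_jW| ≤ ξ bounds the ratio of two weights. Taking logarithms turns these bounds into difference constraints, whose feasibility Bellman-Ford can check, and the smallest feasible ξ is found by bisection. The function names are hypothetical. The sketch assumes comparison values on the 1-9 scale and strictly positive weights, which the ratios in the model already require.

```python
import math


def _solve_for_xi(a_best, a_worst, best, worst, xi):
    """Return log-weights meeting every BWM constraint at this xi, or None."""
    n = len(a_best)
    edges = []  # (v, u, c) encodes log w_u - log w_v <= c

    def add_ratio_bounds(num, den, target):
        # target - xi <= w_num / w_den <= target + xi, with positive weights
        edges.append((den, num, math.log(target + xi)))
        if target - xi > 0:  # otherwise the lower bound always holds
            edges.append((num, den, -math.log(target - xi)))

    for j in range(n):
        add_ratio_bounds(best, j, a_best[j])    # best-to-others comparisons
        add_ratio_bounds(j, worst, a_worst[j])  # others-to-worst comparisons

    dist = [0.0] * n  # a virtual source reaches every node at cost 0
    for _ in range(n):  # Bellman-Ford relaxation passes
        changed = False
        for v, u, c in edges:
            if dist[v] + c < dist[u] - 1e-12:
                dist[u] = dist[v] + c
                changed = True
        if not changed:
            return dist  # shortest distances satisfy all the constraints
    return None  # still relaxing after n passes: negative cycle, infeasible


def bwm_nonlinear(a_best, a_worst, best, worst, tol=1e-9):
    """Return (weights, xi) minimising xi in the nonlinear BWM model."""
    lo, hi = 0.0, float(max(max(a_best), max(a_worst)))  # hi is always feasible
    x = _solve_for_xi(a_best, a_worst, best, worst, hi)
    while hi - lo > tol:  # bisection on xi
        mid = (lo + hi) / 2
        candidate = _solve_for_xi(a_best, a_worst, best, worst, mid)
        if candidate is None:
            lo = mid
        else:
            hi, x = mid, candidate
    weights = [math.exp(v) for v in x]
    total = sum(weights)
    return [w / total for w in weights], hi


# Hypothetical judgements: criterion 0 is best, criterion 2 is worst
weights, xi = bwm_nonlinear([1, 3, 8], [8, 2, 1], best=0, worst=2)
print([round(w, 3) for w in weights], round(xi, 4))
```

For these inputs the judgements cannot all hold exactly, since 3 × 2 ≠ 8, and the optimum is ξ* = √11 − 3 ≈ 0.317. The nonlinear model can have several optimal weight vectors; this sketch returns one of them.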
Furthermore, Gupta used BWM to prioritise service quality attributes in the airline industry (Gupta, 2018). Moreover, Rezaei et al. (2018) applied BWM to assign weights to the logistics performance index, which is significant for policymakers (Rezaei, Roekel, Van, & Tavasszy, 2018). Recently, the integrations and applications of this method have also been analysed and presented (Xiaomei et al., 2019).

COPRAS method
COPRAS is an MCDM technique introduced by Zavadskas to rank alternatives based on decision criteria (Zavadskas, Kaklauskas, & Sarka, 1994). Its steps are as follows.
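1. The decision matrix is formed using (4):

$$X = [x_{ij}]_{m \times n} \qquad (4)$$

In (4), X is the decision matrix and $x_{ij}$ denotes the value of the j-th criterion for the i-th alternative; m is the number of alternatives and n is the number of criteria.
2. The decision matrix is normalised using (5):

$$R = [r_{ij}]_{m \times n}, \qquad r_{ij} = \frac{x_{ij}}{\sum_{i=1}^{m} x_{ij}} \qquad (5)$$

In (5), R is the normalised matrix and $r_{ij}$ denotes its normalised elements.
3. Weights are applied to the normalised matrix using (6):

$$D = [y_{ij}]_{m \times n}, \qquad y_{ij} = w_j \, r_{ij} \qquad (6)$$

Note that D is the weighted normalised matrix, $y_{ij}$ denotes the weighted normalised elements and $w_j$ is the weight of the j-th criterion.
4. The weighted normalised scores are calculated for the beneficial criteria and for the cost criteria using (7):

$$S_{+i} = \sum_{j \in \text{beneficial}} y_{ij}, \qquad S_{-i} = \sum_{j \in \text{cost}} y_{ij} \qquad (7)$$

In (7), $S_{+i}$ is the beneficial score and $S_{-i}$ is the cost score of the i-th alternative.
5. The relative priority of each alternative ($Q_i$) is obtained using (8), where $S_{-\min} = \min_i S_{-i}$; the alternative with the highest $Q_i$ has the highest priority:

$$Q_i = S_{+i} + \frac{S_{-\min} \sum_{i=1}^{m} S_{-i}}{S_{-i} \sum_{i=1}^{m} \dfrac{S_{-\min}}{S_{-i}}} \qquad (8)$$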
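The five COPRAS steps can be sketched as below. The sketch assumes the standard sum normalisation in (5) and the usual form of (8), in which S₋min cancels out. It also assumes all matrix entries are positive. The function name is hypothetical, and the benefit/cost type of each criterion in the paper's decision matrix is not stated here, so the `beneficial` flags must be supplied by the user.

```python
def copras(matrix, weights, beneficial):
    """Return COPRAS relative priorities Q_i (higher means higher priority).

    matrix: m x n decision matrix (alternatives x criteria), positive values.
    weights: n criterion weights, e.g. from BWM.
    beneficial: n flags, True for benefit criteria, False for cost criteria.
    """
    n = len(weights)
    col_sums = [sum(row[j] for row in matrix) for j in range(n)]
    # Steps 2-3: sum-normalise each column (5) and apply the weights (6)
    d = [[weights[j] * row[j] / col_sums[j] for j in range(n)] for row in matrix]
    # Step 4: beneficial and cost scores (7)
    s_plus = [sum(y for y, b in zip(row, beneficial) if b) for row in d]
    s_minus = [sum(y for y, b in zip(row, beneficial) if not b) for row in d]
    if all(beneficial):
        return s_plus  # assumption: with no cost criteria, Q_i reduces to S_+i
    # Step 5: relative priority (8); S_-min cancels, leaving this simplified form
    inv_sum = sum(1 / s for s in s_minus)
    total = sum(s_minus)
    return [sp + total / (sm * inv_sum) for sp, sm in zip(s_plus, s_minus)]


# Hypothetical two alternatives, one benefit and one cost criterion, equal weights
q = copras([[1, 2], [3, 2]], [0.5, 0.5], [True, False])
ranking = sorted(range(len(q)), key=lambda i: q[i], reverse=True)
print(q, ranking)
```

In this toy case both alternatives have the same cost score, so the ranking is decided by the benefit criterion alone.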
Moreover, a mixed fuzzy approach combining this method with BWM has been used by Amoozad Mahdiraji et al. to rank the key factors of sustainable architecture (Amoozad Mahdiraji, Arzaghi, Stauskis, & Zavadskas, 2018). Furthermore, Roy et al. (2019) applied COPRAS to prioritise hotels across several criteria from the tourists' point of view (Roy, Sharma, Kar, Zavadskas, & Saparauskas, 2019). Recently, a novel uncertain approach to this method has been presented (Garg & Nancy, 2019).

Digital banking strategy
Technological development in the banking sector is critical to attracting and retaining customers. Digital banking has been used to achieve this aim via telephone, the internet and mobile devices (Mbama & Ezepue, 2018). Digital banking will intensify competition among banks and between banks and other financial actors, e.g., fintechs (Grym, Koskinen, & Mannien, 2018). Hence, developing an appropriate digital banking strategy is crucial.
Digital banking influences every aspect of banking and offers many competitive advantages, such as more accessible products and fewer cash payments in markets (Nguyen & Dang, 2018). At the same time, banks face a wide range of challenges on the way to digitalisation, for instance a lack of infrastructure, network problems and customers' resistance to technology (Nayak, 2018).
Previous studies in the fields of data mining, clustering methods, big data and MCDM are reviewed in Table 1.
According to Table 1, there are three types of related studies:
Type 1. Studies focusing on reviewing and developing algorithms and methods;
Type 2. Studies demonstrating the application of these algorithms and methods in different fields, e.g., the banking industry;
Type 3. Studies comparing and analysing the use of different algorithms and methods on a similar problem.
Although big data can create value in serving customers, few studies have focused on the application of data mining in the Iranian banking industry. Moreover, the volume of data employed in other studies has been limited in most cases. Besides, Iranian banks lack adequate expertise in developing digital banking strategies, which can harm their success. Finally, the capacity of data mining tools and their combination with MCDM methods have not been exploited sufficiently.
This research belongs to Type 2: it combines clustering and MCDM techniques in a banking system to segment customers and develop a strategy for each cluster. It takes a new approach to RFM and employs expert opinion, which is novel among similar studies. Furthermore, a huge volume of data is used, covering over 2,600,000 accounts over 403 weeks (more than 200 million transactions) in a bank. This research provides a framework for developing a digital banking strategy by harnessing the power of big data and combining data mining and MCDM methods. The framework can be applied in the Iranian banking industry and by analogous banks in other countries.

Research methodology
This research is performed on a specific real-world case study that employs transaction data from a bank over 403 weeks. The study population is all the customers of a bank in Iran, which is not named for security reasons. There are 2,636,540 accounts in this bank; note that an account holder can hold more than one account at the same time. Over 200 million transactions from these weeks have been analysed in the following steps.

Table 1 (excerpt):
Applying data analysis to better serve customers and conduct customer segmentation more effectively; RFM, k-means clustering; clustering the customers based on RFM.
(Motlagh, Berry, & Neil, 2019): clustering of residential customers based on their load characteristics; load clustering; discussing important advantages of model-based time series clustering approaches.

Step 1. Data Collection.
In the first step, the customers' financial transaction data are gathered from an Oracle database. These data are anonymised to preserve confidentiality: customers' personal data (e.g., account and phone numbers) are altered in a way that does not adversely affect the research. In total, the data contain 260 million records.
Step 2. RFM Calculation.
The RFM values described in Section 2.2 are calculated from the prepared data for each account. In total, 2,636,540 records have been formed, each representing one account.
Step 3. Clustering.
The number of clusters is set to six based on the needs of the bank (experts' opinions). In this step, the RFM data are clustered with the k-means method in IBM SPSS Modeler.
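The paper runs k-means in IBM SPSS Modeler with k = 6. The following pure-Python sketch of Lloyd's k-means algorithm shows what this step computes; it is not the tool's implementation. It first z-score standardises the three RFM features, because monetary values and week numbers lie on very different scales. The paper does not say whether its inputs were rescaled, so this standardisation is an assumption, as are the random initialisation and the hypothetical function names.

```python
import random
from statistics import mean, pstdev


def standardise(rows):
    """Z-score each feature column so that no single feature dominates distances."""
    cols = list(zip(*rows))
    mus = [mean(col) for col in cols]
    sds = [pstdev(col) or 1.0 for col in cols]  # guard against constant columns
    return [[(v - mu) / sd for v, mu, sd in zip(row, mus, sds)] for row in rows]


def squared_distance(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points, k, seed=0, max_iter=100):
    """Lloyd's k-means: returns (labels, centroids) for the given points."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centroids
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (squared Euclidean)
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # converged: assignments no longer change
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [mean(col) for col in zip(*members)]
    return labels, centroids


# Hypothetical (recency, frequency, monetary) rows; the paper used k = 6 on 2,636,540 accounts
accounts = [(10, 2, 100), (12, 3, 120), (400, 90, 9000), (398, 95, 9500)]
labels, _ = kmeans(standardise(accounts), k=2, seed=1)
print(labels)
```

A run on millions of accounts would rely on an optimised library implementation with several restarts rather than this single-start loop.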
Step 4. Naming and Ranking.
In this step, the clusters are named and ranked using experts' opinions and the BWM-COPRAS methods; LINGO software is employed for this step.
Step 5. Strategy Development.
In this step, the significant clusters are selected and experts develop an appropriate strategy for them. Figure 3 shows the methodology process.

Research findings
In total, 2,636,540 data records have been formed, each representing one account. These data are gathered from transactions in Iran's banking systems over 403 weeks; hence, over 200 million transactions have been analysed. The RFM parameters are computed after the data collection described in step 1. These data have been collected in collaboration with Kish Informatics Services Corporation, which provides the banking information and communication infrastructure at the national level.
In the example shown in Figure 4, the maximum weekly average balance (in bold) is 900 according to formula (1). The experts set a = 0.5; therefore, the peak balance is 0.5 × 900 = 450 according to formula (2). The weekly average balance exceeds the peak balance in 8 weeks, the average balance of these weeks is 796, and the last week above the peak balance is week 18. Hence, for this account R = 18, F = 8 and M = 796. The results of step 2 are shown in Table 2.
Next, the accounts are clustered with the k-means method in step 3; the result is shown in Table 3.
Cluster 6 has the most members and cluster 2 the fewest. Figure 5 shows the relative frequency of the clusters.
In step 4, the clusters are named and ranked by the experts and prioritised by BWM-COPRAS. First, the experts selected the account balance as the most significant criterion. The names and ranks of the clusters based on the experts' opinions are shown in Table 4. It is worth noting that the experts in this research are senior managers from the Central Bank of Iran; thus, the results are expected to reflect the real situation closely. The meaning of each cluster, as characterised by the experts, is given in Table 5; for example, the averted-low interaction cluster consists of accounts that have had no interaction for a while.
Next, five criteria are selected based on the opinion of a focus group of experts: monetary average, frequency average, monetary standard deviation (SD), frequency SD and the frequency of members. The decision matrix is presented in Table 6.
The BWM method is used to derive the weights of the criteria. The experts determined the monetary average as the most important criterion and the frequency of members as the least important. Table 7 displays the resulting weights.
Finally, the priorities of the customer clusters obtained with COPRAS are shown in Table 8. As shown in Table 8, cluster 2 is the most significant cluster, which agrees with the expert opinion expressed in Table 4. Table 9 and Figure 6 compare the expert opinion and COPRAS ranks. The expert opinion and COPRAS results agree for clusters 2, 3 and 4 but differ for clusters 1, 5 and 6. The difference may be due to the frequency-of-members criterion, which is not meaningful to the experts but is included in the COPRAS method.
Cluster 1 (silver-low interaction) and cluster 4 (loyal) were selected, in that order, by the experts and COPRAS as the most influential clusters, and the digital banking strategy developed for them is described in Table 10.

Conclusion
This research has been implemented with a customer-oriented and service-centric approach. First, customers' behaviour has been identified and studied through bank transaction analysis. Next, customers have been clustered with k-means and ranked by experts and BWM-COPRAS. Two influential clusters have been selected and a digital banking strategy has been developed for them, as shown in Table 10. Among the various weighting methods, nonlinear BWM has been employed in this research. Other weighting methods, such as AHP, FARE, DEMATEL and Entropy, or other BWM variants, such as Euclidean, fuzzy, interval, multiplicative or Z-number BWM, may lead to different results.
In the proposed approach, experts' opinions have been used to weight and rank the clusters under certain (deterministic) conditions. Extending this research to uncertain conditions using grey, fuzzy, hesitant fuzzy, intuitionistic fuzzy or interval-valued fuzzy sets could be interesting. Moreover, the ranking results could be compared with those of similar methods such as VIKOR, LINMAP, TOPSIS, CODAS or EDAS.

Disclosure statement
The author reports no conflicts of interest.