Data-driven algorithm for throughput bottleneck analysis of production systems

ABSTRACT The digital transformation of manufacturing industries is expected to yield increased productivity. Companies collect large volumes of real-time machine data and are seeking new ways to use it in furthering data-driven decision making. A challenge for these companies is identifying throughput bottlenecks using the real-time machine data they collect. This paper proposes a data-driven algorithm to better identify bottleneck groups and provide diagnostic insights. The algorithm is based on the active period theory of throughput bottleneck analysis. It integrates available manufacturing execution systems (MES) data from the machines and tests the statistical significance of any bottlenecks detected. The algorithm can be automated to allow data-driven decision making on the shop floor, thus improving throughput. Real-world MES datasets were used to develop and test the algorithm, producing research outcomes useful to manufacturing industries. This research pushes standards in throughput bottleneck analysis, using an interdisciplinary approach based on production and data sciences.


Introduction
The digital transformation of manufacturing industries is arguably the key to manufacturing companies' future success in improving productivity and staying competitive. Today's manufacturing companies are seeing an increase in available data from sensor technologies, manufacturing execution systems (MES), enterprise resource planning systems (ERP) and other production planning systems (Chand & Davis, 2010). For example, an automotive manufacturer in Sweden collects 50 rows of machine data per hour via MES, which amounts to an average of 500,000 rows of machine data per machine per year (Subramaniyan, 2015). When scaled up to production-system level, this increased availability of large volume data is termed 'big data' (Lee, Lapira, Bagheri, & Kao, 2013). It brings new opportunities to improve manufacturing by enabling data-driven decision making (Liao, Deschamps, De, & Loures, 2017; Shao, Shin, & Jain, 2015). Creating data-driven decision-making algorithms means drawing meaningful insights from high volumes of fast-moving data. Accordingly, many researchers and companies have begun examining ways of using data to reach fact-based decisions (Lavalle, Lesser, Shockley, Hopkins, & Kruschwitz, 2011; Harding, Shahbaz, Srinivas, & A, 2006; Wuest et al., 2016).
'Throughput', also known as 'production rate', is one of the major indicators of production system performance. Throughput is constrained by one or more machines in a production system, known as 'bottlenecks' (Goldrat & Cox, 1990). Since production resources (machines, robots, operators and so on) are usually scarce, they must be used efficiently to increase system throughput (Li, Ambani, & Ni, 2009). In maximising throughput, it is essential to identify bottleneck machines in a production system so that maintenance (and other production improvement activities) can be focused on these (Gopalakrishnan, Skoogh, & Christoph, 2013; Wedel, Von Hacht, Hieber, Metternich, & Abele, 2015; Guner, Chinnam, & Murat, 2016). A system-level decision support tool is therefore needed to analyse these bottleneck machines (Jin, Weiss, Siegel, & Lee, 2016). The requirement for such a tool in a digitalised manufacturing environment was also identified by Bokrantz, Skoogh, Berlin, and Stahre (2017), who conducted a Delphi-based scenario study of the future of maintenance organisations up to 2030. They also pointed out that real-time data analytics may be used by production and maintenance engineers as a tool to make decisions about the production system. Moreover, Li, Blumenfeld, et al. (2009) point out that the availability of real-time data will provide new research opportunities in detecting bottlenecks on the shop floor.
Most current research efforts to identify throughput bottlenecks are based on descriptive performance metrics (active times, queue length and so on). These, in turn, are derived from discrete event simulation models of the production system. However, a simulation model of a production system is time-consuming to develop, difficult to keep updated with improvements made in the actual production system and involves various approximations and assumptions in its construction (Fowler, 2004; Li, Chang, & Ni, 2009). These limit the use of simulation-based approaches in detecting true bottlenecks in production systems. The alternative to simulation-based approaches is the data-driven approach, in which real-time data collected from the manufacturing systems is used to detect bottlenecks (Li, Blumenfeld et al., 2009). More recently, there has been increased research into developing data-driven algorithms without using a discrete event simulation model. These methods are called 'data-driven bottleneck detection'. Data-driven bottleneck detection has many advantages compared to discrete event simulation-based approaches. The main ones are that it involves no approximations in the input data, can be carried out in real time and offers more practical value because the real-time data reflects the true system dynamics (Li et al., 2013; Subramaniyan et al., 2016). Furthermore, there can be different types of bottleneck on the shop floor. They might be due to random downtime, variations in processing times, setup time and so on. There could be multiple bottlenecks of different types occurring simultaneously in the production system (Li, Blumenfeld et al., 2009). These types are sometimes considered equal in the literature, but this is not always true in practice. Treating different types of bottleneck as equal results in poor planning of improvement activities, especially in an environment with multiple bottlenecks of different types (Gopalakrishnan, Skoogh, & Christoph, 2014).
Therefore, to support maintenance and production improvement activities, it is important to identify bottlenecks and also understand bottleneck machine types. The existing literature on bottleneck analysis provides no support in terms of diagnostic insights to aid understanding of bottleneck types. This means a system-level decision support tool is required which not only detects bottlenecks but also gives some diagnostic insights into their types.
The purpose of this paper is to improve throughput by facilitating bottleneck analysis using actual machine data. We propose a data-driven descriptive and diagnostic algorithm for bottleneck analysis, based on the active period theory of bottleneck detection which was previously developed and tested in a simulation environment by Roser, Nakano, and Tanaka (2001). The algorithm tests the statistical significance of the machines that are detected as bottlenecks. The main result of this research is data-driven bottleneck identification and the creation of diagnostic insights for understanding bottleneck types. The main industrial contribution is that the algorithm can be easily computer-automated, thereby allowing system performance to be monitored and analysed. This allows engineers to make quick decisions on bottleneck identification and mitigation. Using an interdisciplinary approach of production and data sciences, it will raise the standard of throughput bottleneck analysis.

Literature review
The first part of this section studies and briefly discusses different applications of data analytics in the context of manufacturing. Current bottleneck detection methods and the development tools used are then studied. This is followed by a detailed description of the active period theory of bottleneck detection.

Types of data analytics
The term 'analytics' is defined as the science of logical sequence of steps used to transform data into actions through analysis and insights (Liberatore & Luo, 2010). The main applications of data analytics in understanding and explaining past performance from real data are descriptive and diagnostic analytics. These are briefly discussed in the context of manufacturing.
• Descriptive analytics: the science of identifying what has happened and what is happening (Delen & Demirkan, 2013). It includes quantitative description of data using graphical or tabular representation, or summary statistics of data that is useful as a basis for decisions (Banerjee, Bandyopadhyay, & Acharya, 2013). Examples include average throughput, machine downtimes and machine blockage and starvation times. • Diagnostic analytics: the science of identifying why something happened (Banerjee et al., 2013). Useful in identifying the causes behind performance (Shao et al., 2015) and exploratory in nature. For example, increased machine downtime can be tracked to any or all of various possible factors, such as non-availability of spare parts, worker absenteeism or increased priority of another machine.

Previous work on bottleneck detection
Approaches to bottleneck detection can be broadly classified into three major categories: (1) discrete event simulation-model-based bottleneck detection, (2) purely data-driven bottleneck detection and (3) real-time data coupled with discrete event simulation-model-based bottleneck detection (hybrid approach). An exhaustive list of the existing methods of bottleneck detection is given in Table 1. Table 1 shows that most of the approaches are limited to validation in a discrete event simulation-based environment. However, limited research has been done into how these methods can be used with respect to the real-time data captured from machines on the shop floor. Moreover, the different bottleneck detection methods use different metrics to explain machine performance (Table 1). The active times, blockage and starvation times, inter-arrival time of parts, inactive times and waiting time are the metrics developed in the literature to identify bottlenecks. When these metrics for all individual machines in a production system are compared, the bottleneck machines can be determined.
Table 1. Different bottleneck detection methods and support tools used to develop them.
Inactive period: inactive periods (blockage and starvation probabilities) (Sengupta, Das, & VanTil, 2008)
Inter-departure time variance: variance in arrival rate of parts at machines (Betterton & Silver, 2012)
(2) Data-driven approaches
Turning point: total of blockage and starvation times
(3) Hybrid approach: real-time data coupled with simulation model
Sensitivity-based bottleneck detection: throughput sensitivity of machine (Chang, Ni, Bandyopadhyay, Biller, & Xiao, 2007)

The information from these performance metrics is sufficient to identify bottlenecks from a systems perspective. However, they do not give sufficient information to help plan for specific bottleneck improvement strategies. This is because they are heavily influenced by various factors such as random machine downtimes, variations in processing times, setup times or any combination of these (Chiang, Kuo, & Meerkov, 1998). These uncertainties are jointly correlated to machine performance metrics, which provide no explicit information on the above. To manage bottlenecks effectively, it is important to identify the contributions of the various underlying factors as to why a machine constitutes a bottleneck. This, in turn, is useful in better understanding the type of bottleneck that occurs. For example, a machine can be a bottleneck based on cycle time, downtime, setup time and so on. The existing literature in Table 1 is limited to identifying bottlenecks without explaining the types. Out of all the methods proposed in Table 1, the active period method is the only one that can potentially give diagnostic insights. This is because it aggregates different individual active-state durations such as downtime, setup time and the like, to calculate the overall active-time metric of the machine across a production run (Roser et al., 2001). Therefore, the active time may be considered a derived metric, as it is based on the consolidation of various active-state durations. This technique enables the creation of diagnostic insights in terms of individual active states; these can be used to understand the type of bottleneck occurring. The other bottleneck detection methods use standalone measured metrics to identify bottlenecks. For example, in the turning point method, the blockage and starvation times are measured directly from online records and thus enable no further diagnostic insights. Similarly, the queue-based method uses the average waiting time of parts as its metric to detect a bottleneck (Faget et al., 2005) and does not enable diagnostic analysis of bottlenecks. For example, a part could be queued for many reasons, including machine down-state or longer processing times.
The reason cannot be deduced from the data simply by interpreting the average waiting time. The inter-departure time variance method, in turn, uses variances in the arrival rate of parts at the station to detect bottlenecks. Again, the variance data does not support diagnostic analytics: inter-departure time variance can be caused by various factors, but the variance alone cannot reveal which of them is responsible (Betterton & Silver, 2012). Thus, the active period theory's aggregation of machine states is unique in its ability to provide diagnostic insights. Roser et al. (2001) demonstrate the active period method using a discrete event simulation model of the production system. This method of bottleneck detection is based on machine states during the production run. The term active state describes the machine's state when an activity is being performed by or on it, that is, when the machine is producing a part or being set up, retooled, repaired and so on. Figure 1 shows a sample timeline of machine states during a production run.

Description of the active period method
The active period percentage can be determined by computing the total time the machine is in an active state during its entire production run. This is done by aggregating the various active states of the machine. For example, in Figure 1, the active period percentage is the total of the individual active states of the machine, such as producing, changeover and down states, aggregated over the production run from $t_0$ to $t_9$. By comparing the active period percentages for all machines in the production line for the period $t_0$ to $t_9$, a bottleneck machine is determined as the one with the highest active period percentage compared to all other machines in the production system.
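The aggregation described above can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the event-log layout (a time-ordered list of timestamp and state pairs per machine), the state names in `ACTIVE_STATES`, the function name `active_period_percentage` and the sample timelines are all hypothetical and are not taken from Roser et al. (2001) or from the MES data used in this paper.

```python
from datetime import datetime

# Assumption: these state names count as "active"; all other states are inactive.
ACTIVE_STATES = {"producing", "changeover", "down"}


def active_period_percentage(events, run_end):
    """Share of a production run (in %) spent in active states.

    events  -- time-ordered list of (timestamp, state); each state lasts
               until the next event, the last one until run_end.
    run_end -- timestamp at which the production run ends.
    """
    if not events:
        raise ValueError("at least one event is required")
    run_start = events[0][0]
    total = (run_end - run_start).total_seconds()
    if total <= 0:
        raise ValueError("run_end must be later than the first event")
    active = 0.0
    # Pair every event with the start time of its successor (or run_end).
    next_starts = [ts for ts, _ in events[1:]] + [run_end]
    for (start, state), end in zip(events, next_starts):
        if state in ACTIVE_STATES:
            active += (end - start).total_seconds()
    return 100.0 * active / total


def at(hour, minute):
    # Helper for the hypothetical sample data below.
    return datetime(2020, 1, 1, hour, minute)


timelines = {
    "M1": [(at(8, 0), "producing"), (at(8, 30), "waiting"),
           (at(8, 40), "down"), (at(8, 50), "producing")],
    "M2": [(at(8, 0), "producing"), (at(8, 15), "waiting"),
           (at(8, 45), "changeover")],
}
percentages = {m: active_period_percentage(ev, at(9, 0))
               for m, ev in timelines.items()}
# The machine with the highest active period percentage is the candidate bottleneck.
bottleneck = max(percentages, key=percentages.get)
```

In this made-up example M1 is active for 50 of the 60 minutes and M2 for 30, so M1 is the candidate bottleneck for this single run.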

Summary of literature review
The above literature analysis shows that bottleneck analysis can be divided into two steps, to prioritise the right maintenance and production improvement activities. The first includes a procedure to determine which machines in the production system constitute bottlenecks (system-level bottleneck detection). The second is to provide diagnostic insights into the type of bottleneck (machine-level diagnostics).
As explained in Section 2.2, previous research efforts were mainly focussed on detecting bottlenecks, with most of them using discrete event simulation-based approaches but seldom considering the second aspect. By contrast, the active period theory of bottleneck detection has the potential to detect the bottlenecks as well as give diagnostic insights. However, this method has been developed only for detecting bottlenecks; it has been demonstrated in a discrete event simulation environment, with no data-driven model proposed so far. A data-driven algorithm to detect bottlenecks and give diagnostic insights into them must therefore be constructed for the active period method. Figure 2 shows the framework of the proposed approach. The Cross Industry Standard Process for Data Mining (CRISP-DM™) methodology was used to design the algorithm (Pete et al., 2000). The CRISP-DM methodology is a systematic methodology for data-mining projects and comprises the following steps: (1) problem definition, (2) data definition, (3) data preparation, (4) analysis and modelling and (5) evaluation and deployment. It is used extensively in data-mining applications in manufacturing (Gröger, Niedermann, & Mitschang, 2012). The method provides detailed neutral guidelines, meaning it could be used for any data-mining project. It also provides an iterative approach for evaluating the process at each step, in relation to the problem definition (Pete et al., 2000). One main advantage of the CRISP-DM model is that it can be fully or partially adopted, depending on the problem and requirements (Harding et al., 2006).

Methodology
The CRISP-DM methodology was adapted to mine MES data and to develop an algorithm that models the data to describe the bottlenecks and diagnose them. The methodology is broadly divided into two phases: the algorithm development phase and the verification and validation of the algorithm. The algorithm development phase has three steps: (1) a literature study, (2) a study of a sample MES dataset from a real-world production line and (3) the design of the algorithm. The verification and validation steps include data preparation, data modelling, application of the algorithm to real-world datasets and the evaluation of results.

Algorithm development phase
The theory behind the active period percentage method (proposed by Roser et al. (2001)) was studied in detail. As shown in Table 2, a real-world MES dataset from a production line was also studied, to understand the type of information captured and support the descriptive and diagnostic analysis of bottlenecks.  In Table 2, 'production area' refers to the production line, 'work area' refers to the machine number, 'date and time' refers to the time stamp and 'state of machine' refers to the relevant machine's state. Insights gained from the MES dataset and detailed literature studies were used to design and develop the data-driven algorithm for the active period percentage method.

Verification and validation phase
In this phase, the developed algorithm is tested on three different real-world MES datasets taken from the automotive industry. The first step is data preparation, in which the various MES datasets are cleaned and prepared for application of the algorithm by removing duplicate data, outliers and any data points outside the relevant time limits. The next step is data modelling, including application of the algorithm to the dataset. The final step is evaluation, including the study and interpretation of results to identify production line bottlenecks and obtain diagnostic insights into bottleneck machines.
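A minimal Python sketch of the data preparation step is given below. It covers only duplicate removal and filtering to a time window; outlier removal is left out because the criterion used for it is not specified here. The record layout (machine, timestamp, state) and the function name `prepare_records` are assumptions made for illustration, not the format of the actual MES exports.

```python
from datetime import datetime


def prepare_records(records, window_start, window_end):
    """Drop exact duplicates and records outside [window_start, window_end].

    records -- iterable of (machine, timestamp, state) tuples (assumed layout).
    Returns the cleaned records sorted by machine and timestamp.
    """
    seen = set()
    cleaned = []
    for record in records:
        _machine, timestamp, _state = record
        if record in seen:
            continue  # exact duplicate of an earlier row
        if not (window_start <= timestamp <= window_end):
            continue  # outside the relevant time limits
        seen.add(record)
        cleaned.append(record)
    cleaned.sort(key=lambda r: (r[0], r[1]))
    return cleaned


# Hypothetical raw rows: one duplicate and one row before the window.
raw = [
    ("M2", datetime(2020, 1, 1, 8, 5), "Producing"),
    ("M1", datetime(2020, 1, 1, 8, 0), "Producing"),
    ("M1", datetime(2020, 1, 1, 8, 0), "Producing"),
    ("M1", datetime(2020, 1, 1, 7, 0), "Waiting"),
    ("M1", datetime(2020, 1, 1, 8, 20), "Error"),
]
prepared = prepare_records(raw,
                           datetime(2020, 1, 1, 8, 0),
                           datetime(2020, 1, 1, 16, 0))
```

Sorting by machine and timestamp is a convenience for the later steps, which need each machine's states in chronological order.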
Verification is the process of evaluating whether the rule definitions of the algorithm satisfy the necessary conditions. In other words, checking whether the algorithm represents the problem description and specification (Lengyel, 2015). This was done by testing the algorithm on three real-world MES datasets from three different production lines in the automotive industry and verifying whether their definitions and rules satisfied the bottleneck detection theory, as developed by Roser et al. (2001). In this case, validation is the process of evaluating whether these rules meet end-user requirements (Lengyel, 2015). The algorithm was validated using the multiple-test studies approach by examining whether it satisfied the requirements of the production and maintenance personnel leading the different production lines.

Descriptive and diagnostic data-driven algorithm for active period percentage method
The algorithm consists of two steps: descriptive analytics and diagnostic analytics. The descriptive part of the algorithm analyses the real MES data to summarise the machines' performance in terms of active times. This enables detection of groups of machines which are likely bottlenecks. The diagnostic part of the algorithm then analyses any detected bottlenecks and explains the proportions of each active state. This helps identify the type of bottleneck which, in turn, provides insights into why a particular machine is behaving as a bottleneck.

Descriptive analytics: detecting bottlenecks
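The descriptive analytics consists of two parts. Firstly, the mean active period percentage for each machine in the MES is calculated over a specified number of production runs. A suitable statistical significance test is then run (using these percentages) to identify a set of probable bottleneck machines. Each pair $(m, n)$ of machine $m \in \{1, \dots, M\}$ and production run $n \in \{1, \dots, N\}$ is taken, where M is the number of machines and N is the number of production runs. The active period percentage $a_{mn}$, that is, the share of production run n during which machine m is in an active state, is calculated for all these pairs. The mean active period percentage for each machine m can then be calculated as:
$$A_m = \frac{1}{N} \sum_{n=1}^{N} a_{mn} \qquad (2)$$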

Detection of bottlenecks and statistical significance
The assumption made when constructing the algorithm is that, for each machine, the active periods are independent across production runs. This is derived from the findings of Roser and Nakano (2003), that the active states of a machine are independent of each other. Moreover, the active period percentages of each machine are assumed to be a sample of a normally distributed population. The standard deviation of the active period percentage for machine m can be calculated as:
$$S_m = \sqrt{\frac{1}{N-1} \sum_{n=1}^{N} \left( a_{mn} - A_m \right)^2} \qquad (3)$$
The standard error of the active period percentage for machine m can be calculated as:
$$SE_m = \frac{S_m}{\sqrt{N}} \qquad (4)$$
The confidence interval of the mean active period can then be calculated as:
$$A_m \pm t_{\alpha}(r) \, SE_m \qquad (5)$$
where α is the selected confidence level, and the critical value $t_{\alpha}(r)$ is found from the t-distribution table for r degrees of freedom (degrees of freedom = N−1) (Knezevic, 2008). This is done to account for uncertainty in estimating the mean value. In other words, an interval is estimated which most likely includes the true mean of the population from which the sample is drawn.
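As an illustration of the quantities described above, the following Python sketch computes the mean, sample standard deviation, standard error and confidence interval for one machine. The function name `active_period_summary` and the sample values are hypothetical. The critical value is passed in by the caller, as it would be when read from a t-distribution table; the value 4.303 used here is the two-sided 95% critical value for 2 degrees of freedom, matching the three made-up runs.

```python
import math
import statistics


def active_period_summary(percentages, t_critical):
    """Mean, sample SD, standard error and confidence interval.

    percentages -- active period percentages of one machine, one per run.
    t_critical  -- critical t value for N-1 degrees of freedom, supplied
                   by the caller (e.g. looked up in a t-distribution table).
    """
    n = len(percentages)
    if n < 2:
        raise ValueError("at least two production runs are required")
    mean = statistics.mean(percentages)
    sd = statistics.stdev(percentages)   # sample SD (divides by N-1)
    se = sd / math.sqrt(n)               # standard error of the mean
    half_width = t_critical * se
    return {"mean": mean, "sd": sd, "se": se,
            "ci": (mean - half_width, mean + half_width)}


# Hypothetical active period percentages from three production runs.
summary = active_period_summary([80.0, 85.0, 90.0], t_critical=4.303)
```

With so few runs the interval is wide (roughly 72.6% to 97.4% in this example), which illustrates why more production runs make it easier to separate machines.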
In the next step, the machine with the highest mean active period percentage is determined. For this purpose, let $k = \arg\max_{m \in \{1, \dots, M\}} A_m$ (meaning that k is the machine with the highest mean active period percentage) and let $A_k$ denote its corresponding mean active period percentage. Now, the differences between the mean active period percentage of machine k and those of the other machines need to be statistically tested. This means the statistical significance of the difference (Knezevic, 2008) between $A_m$ and $A_k$ is tested for all machines $m \in \{1, \dots, M\} \setminus \{k\}$ using the t-statistic of Equation (6). The difference in the mean active period percentage is statistically significant if the condition in Equation (7) holds. Equation (7) is used to determine the probable set of bottleneck machines in the production line: machine k, together with every machine whose mean active period percentage is not significantly different from $A_k$. Let the set of bottleneck machines be represented as {BM}.
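The grouping step can be sketched in Python as follows. The exact test statistic and decision rule are not reproduced in the text above, so this sketch makes its own assumptions, which should not be read as the authors' formulation: it uses an independent two-sample t-statistic, t = (A_m − A_k) / sqrt(SE_m² + SE_k²), and treats a difference as significant when |t| exceeds a critical value supplied by the caller. The function name `bottleneck_group`, the sample data and the critical value 2.447 (two-sided 95%, 6 degrees of freedom) are likewise illustrative.

```python
import math
import statistics


def bottleneck_group(runs_by_machine, t_critical):
    """Return (reference machine k, t-statistics, bottleneck group).

    runs_by_machine -- dict: machine -> list of active period percentages,
                       one value per production run.
    t_critical      -- assumed two-sided critical value chosen by the caller.
    """
    stats = {}
    for machine, values in runs_by_machine.items():
        mean = statistics.mean(values)
        se = statistics.stdev(values) / math.sqrt(len(values))
        stats[machine] = (mean, se)

    # k is the machine with the highest mean active period percentage.
    k = max(stats, key=lambda m: stats[m][0])
    mean_k, se_k = stats[k]

    t_stats = {}
    group = [k]
    for machine, (mean, se) in stats.items():
        if machine == k:
            continue
        denom = math.sqrt(se ** 2 + se_k ** 2)
        if denom == 0.0:
            # No variation in either machine: equal means cannot be separated.
            t = 0.0 if mean == mean_k else -math.inf
        else:
            t = (mean - mean_k) / denom
        t_stats[machine] = t
        if abs(t) <= t_critical:
            group.append(machine)  # not significantly different from k
    return k, t_stats, group


# Hypothetical data: four production runs for each of three machines.
data = {
    "M1": [88.0, 90.0, 92.0, 90.0],
    "M2": [87.0, 91.0, 89.0, 91.0],
    "M3": [60.0, 62.0, 61.0, 61.0],
}
k, t_stats, group = bottleneck_group(data, t_critical=2.447)
```

Here M1 is the reference machine, M2 cannot be distinguished from it and joins the group, while M3 is clearly lower and is excluded. Because the reference machine has the highest mean, every t-statistic computed this way is zero or negative.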

Diagnostic analytics: exploring bottleneck types
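Using the set of identified bottleneck machines, the reason for their appearance can be diagnosed. The mean percentage elapsed time ($E_{jx}$) in each active state for the bottleneck machines is calculated as:
$$E_{jx} = \frac{1}{N} \sum_{n=1}^{N} \left( \frac{a_{xjn}}{a_{xn}} \right) \times 100; \quad j \in \{1, \dots, I\}; \; x \in \{BM\} \qquad (8)$$
where $a_{xjn}$ is the contribution of active state j to the active period $a_{xn}$ of machine x in production run n, and I is the number of active states.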
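The diagnostic breakdown can be sketched in Python as below. This is an illustrative reading of the calculation rather than the authors' implementation: for each production run, the time in each active state is divided by the machine's total active time in that run, and these shares are averaged over the runs. The function name `state_breakdown`, the input layout and the sample durations are assumptions, and skipping runs with no active time is a choice made here to avoid division by zero.

```python
def state_breakdown(runs):
    """Mean share (in %) of each active state within the active period.

    runs -- list with one dict per production run for a single bottleneck
            machine, mapping active-state name -> duration in that run
            (any consistent time unit).
    """
    states = sorted({state for run in runs for state in run})
    totals = {state: 0.0 for state in states}
    counted = 0
    for run in runs:
        active_time = sum(run.values())
        if active_time == 0:
            continue  # assumption: runs with no active time are skipped
        counted += 1
        for state in states:
            totals[state] += 100.0 * run.get(state, 0.0) / active_time
    if counted == 0:
        raise ValueError("no production run with active time")
    return {state: total / counted for state, total in totals.items()}


# Hypothetical active-state durations (minutes) for one machine, two runs.
breakdown = state_breakdown([
    {"Producing": 60.0, "Down": 40.0},
    {"Producing": 50.0, "Down": 50.0},
])
```

In this example the machine spends on average 55% of its active period producing and 45% down, so both cycle time and downtime would merit attention.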

Industrial test results
The data-driven algorithm was tested, verified and validated by applying it to three different real-world MES datasets from the production lines, at different automotive manufacturing companies. These tests, referred to as test studies 1, 2 and 3 are described below, with their results. The production and maintenance experts working in all three production lines were tasked with detecting throughput bottlenecks from a system perspective, using the MES data and to diagnose the bottlenecks, allowing improvement activities to be planned. In all three cases, the algorithm is verified by evaluating whether it is able to detect the bottlenecks and is able to give diagnostic insights into them. In all three cases, the algorithm is validated by evaluating the results with production and maintenance experts from the different production lines, to determine whether it satisfies their requirements in terms of bottleneck analysis.

Test study 1
The first test study was carried out on a machining production line at an automotive engine manufacturing company in Sweden. The production system consists of 12 processing machines, as shown in Figure 3. M1 and M2, M3 and M4, M5 and M6, M7 and M8, M9 and M10, and M11 and M12 are pairs of parallel machines, with buffers between each pair. Each machine is connected to an MES, which records its activity during the production run. As recorded by MES, the various states of the machine are: Producing, Part Changing, Error, Comlink Down, Comlink Up, Waiting, Not Active and Empty Run. The definition for each state (as given by the production and the maintenance experts working in this production line) is shown in Table 3. At any given point in time, each machine is in one of these states. A sample MES dataset of the production line number, machine number, machine state and corresponding date and time stamps is shown in Table 2.
From Table 2, it can be seen that the MES monitors the machine state and the corresponding time stamp for each machine state is shown. This satisfies the primary requirements of the algorithm.

Application of the algorithm
Data preparation: MES data for all machines in the production line was collected for 62 production runs.
Data modelling: the machine states as shown in Table 3 were classified as 'active' or 'inactive' based on the production and maintenance experts' guidance. Also, based on their inputs, the states Comlink Down, Comlink Up and Empty Run are considered as producing states of the machine and are therefore treated as active states.
Evaluation: (a) Detecting bottlenecks: the algorithm results show the active period percentage for each machine plus the 95% confidence interval band (see Figure 4). Machine M2 has the highest active period percentage and is therefore used as a reference. The t-test results (active period percentages of other machines in relation to M2) indicate insufficient evidence that the M1 and M8 values are statistically different from that of M2 for the given period. The t-statistic values are −7.15 and −0.07, respectively. Thus, M2, M1 and M8 are the bottleneck group.
(b) Exploring bottleneck types: M2, M1 and M8 constitute the identified bottleneck group. Dividing each active machine state according to active period percentage for each bottleneck machine aids understanding of the bottleneck type. Figure 5 shows the division of active states for each bottleneck machine. It shows that M1, M2 and M8 are  mainly Producing bottlenecks. This is because the Producing state is high compared to the Part-changing and Error states. Thus, actions to reduce machine cycle times, or reduce the random variations in the cycle times (running them during breaks, scheduling them over time and so on) can increase overall production system throughput. Although M1, M2 and M8 are mainly Producing bottlenecks, maintenance teams need to prioritise these machines and carry out maintenance-related activities to ensure maximum availability.

Test study 2
The second test study was carried out on an automated assembly production line at a manufacturing company that assembles car body parts. The production system included five workstations (labelled S1 to S5) and two buffers (B1 and B2) as shown in Figure 6.
Each workstation is connected to a monitoring system. The monitoring system records the Down, Blocked, Starved and Producing states of the station. Table 4 shows definitions of these station states, as given by the production and maintenance experts. Table 5 shows examples of the station data that is recorded, such as location in the production line, alarm category, station state, product type and corresponding date and time stamps from the monitoring systems.

Application of the algorithm
Data preparation: station state data (recorded by the monitoring system) was collected for 40 production runs.
Data modelling: station states recorded in MES (see Table 4) are classified as 'active' or 'inactive', to compute the active period percentages.
Evaluation: (a) Detecting bottlenecks: the active period percentage of each station and 95% confidence interval were computed, see graph in Figure 7. This shows station S2 has the highest active period percentage and the t stat values of other stations relative to S2 are all positive. Hence, S2 is the bottleneck in this production line.
(b) Exploring the type of bottlenecks: station S2 is the only bottleneck in the production line. Figure 8 shows the division of active period components for S2. From this, it can be seen that for 55.62% of its active time, the station is in the Producing state, which is the highest. However, S2 is Down for 44.38% of the active period, which is also high. This input is useful to production and maintenance teams in deciding which station state needs addressing, to increase overall system throughput.

Test study 3
The third study was carried out at an automotive component machining production line. The production system has 26 machines (M1 to M26) and five gantries (G1 to G5) to transport material between the machines, as shown in Figure 9.
Each machine and gantry has an ANDON colour light which the MES records. These ANDON lights have four colours: red, yellow, green and white. At any given time, the machine/gantry may show one or more ANDON lights. Table 6 explains the ANDON light definitions, as given by the production and maintenance experts. Table 7 gives a sample MES record of one machine's ANDON lights during a production run, including date and time stamps plus ANDON light duration.

Figure 7. Active period percentage of the stations with t-statistics.

Application of the algorithm
Data preparation: MES data for all machines in the production line was collected for 31 production runs.
Data modelling: based on guidance given by the production and maintenance teams, the ANDON lights were classified into active and inactive states, see Table 6.
Evaluation: (a) Detecting bottlenecks: Figure 10 shows the active percentages of all machines and gantries with 95% confidence intervals obtained after application of the algorithm. This shows M20 has the highest active period percentage of the machines. The t-test results indicate that M20's active period percentage is not statistically significantly different from that of M26 for the period analysed, as the t-statistic is −0.29. Hence, M20 and M26 form the main bottleneck group in this production line.
(b) Exploring bottleneck types: Figure 11 shows the division of active periods for M20 and M26. The Producing state of machines M20 and M26 is high, compared to the down states. This indicates that these machines are mostly Producing bottlenecks. This should guide in-depth analysis by the production and maintenance teams, for example of variations in actual cycle times for the various products in the machine, and help them frame strategies to manage those bottlenecks. However, the Down state of M26 requires more attention from the maintenance teams.

Discussion
The aim of this paper was to develop a descriptive, diagnostic, data-driven algorithm for bottleneck analysis using the active period percentage method. The algorithm models the MES data to describe the machines' active times using descriptive analytic techniques (Shao et al., 2015) and identifies statistically significant bottleneck machines from a system perspective (Jin et al., 2016). Moreover, the algorithm gives diagnostic information on the proportion of different active states of the machines (Delen & Demirkan, 2013). In contrast to the bottleneck analysis methods in the literature, this approach is the first to explore opportunities for obtaining diagnostic information on bottlenecks. Furthermore, the active period method of bottleneck detection (proposed by Roser et al. (2001) and previously used only in data-rich environments such as discrete event simulations) can also be used with MES data from the shop floor to aid data-driven decision making (Li, Blumenfeld et al., 2009). Demonstrating this algorithm in real-world production lines is a step towards closing the widening gap (pointed out by Liao et al., 2017) between laboratory-based or simulation-based solutions and industrial applications. Exploring the contribution of each active state to understanding bottleneck types is particularly important when prioritising improvement activities. For example, in the case of a Producing bottleneck (Chiang, Kuo, & Meerkov, 2001), as shown in Test study 1 (Section 5.1), cycle time reduction activities can be carried out or variations in processing times further analysed and reduced. Reactive maintenance activities can also be prioritised, especially for cycle-time bottlenecks (Gopalakrishnan et al., 2013). For machines which constitute downtime bottlenecks, corresponding sensor-level information from their components can be further analysed to explain any abnormal behaviour.
Understanding bottleneck types is also critical when there are multiple types of bottleneck machines in a production system (Li, Blumenfeld et al., 2009). For example, if a production system has a combination of producing and downtime bottlenecks, maintenance teams can make an engineering decision on prioritising and optimising the amount of preventive and reactive maintenance action on each type of bottleneck. Such an approach enables systematic planning of bottleneck-based improvement activities (Guner et al., 2016; Gopalakrishnan et al., 2013; Li, Ambani et al., 2009).
The proposed algorithm has several advantages. Firstly, it uses only data on machine states, plus corresponding timestamps, as recorded in MES. Secondly, it eliminates the use of simulation models to identify bottlenecks. Figure 12 shows a comparison of simulation-based and algorithm-based bottleneck detection. As there is no simulation model, no approximations are made to the inputs (Fowler, 2004; Leemis, 2004). Thirdly, the algorithm can be integrated with existing maintenance decision-support tools, to improve workflow and prioritise preventive and reactive maintenance activities (Li, Ambani et al., 2009). This makes it easier for engineers to view data and results from different systems across the facility. It means they can access key indicators to aid understanding of the bottleneck type affecting a specific production system, thus improving bottleneck response times. Therefore, machine information captured by an MES can be used to improve engineers' decision-making strategies (Lavalle et al., 2011). Lastly, the same algorithm can also be used to analyse material-handling equipment (such as gantries) in combination with the machines (demonstrated in Test study 3, see Section 5.3). This is because such equipment can negatively impact production line throughput in the same way as an individual machine (Roser et al., 2001).
Although the data-driven algorithm has several advantages, it also rests on a few working assumptions and has some limitations. Firstly, it can only be used if there is sufficient historical machine data describing machine activities, with corresponding timestamps. Large sets of machine data are also required, to reduce the width of confidence intervals and find potential bottleneck groups (Roser et al., 2001). Moreover, the historical machine data should be representative of the production system's steady-state behaviour. Secondly, the descriptive and diagnostic insights provided by the algorithm are limited to the types of data available in MES. This means we cannot draw conclusions beyond the dataset under consideration. However, the algorithm gives different active-state components as a percentage of active durations. This aspect will guide engineers as they further investigate the different states and identify and understand the root causes. Thirdly, this algorithm detects bottlenecks from a utilisation perspective only. While this is important in improving the production flow and maintenance planning, it should be acknowledged that a machine can also be a bottleneck from a quality perspective. Lastly, the descriptive algorithm is constructed on the assumption that a machine's active periods are independent across its production runs. Thus, a future research direction would be to adjust the descriptive bottleneck detection algorithm to examine correlations between active periods across production runs.
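The link between the amount of data and the width of the confidence intervals can be made concrete with a short Python sketch. It is an illustration under stated assumptions rather than the statistical procedure used in this study: it applies a normal approximation to the mean active-period percentage across production runs, relies on the same independence assumption noted above, and uses invented percentages. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(values, confidence=0.95):
    """Normal-approximation confidence interval for the mean of `values`.

    Assumes the per-run active-period percentages are independent.
    """
    n = len(values)
    if n < 2:
        raise ValueError("at least two production runs are needed")
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z * stdev(values) / sqrt(n)
    centre = mean(values)
    return centre - half_width, centre + half_width


def intervals_overlap(a, b):
    """True if two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Invented active-period percentages for one machine over a few runs,
# and the same pattern repeated to mimic a larger historical dataset.
few_runs = [70.0, 80.0, 75.0, 85.0]
many_runs = few_runs * 4

ci_few = mean_confidence_interval(few_runs)
ci_many = mean_confidence_interval(many_runs)
width_few = ci_few[1] - ci_few[0]
width_many = ci_many[1] - ci_many[0]
print(ci_few, ci_many)
```

With more runs the interval narrows, which makes it easier to separate machines. One plausible reading, offered here as an assumption, is that machines whose intervals overlap with that of the machine with the highest mean active-period percentage cannot be distinguished from it statistically and would therefore be reported together as a potential bottleneck group.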

Conclusion
Developing data-driven algorithms will remain necessary as an enabler of data-driven decision making on the shop floor. In this study, we attempted to address the question of how real-time machine data can be used for throughput bottleneck analysis. We proposed a data-driven, descriptive, diagnostic algorithm using active period theory as an alternative to the discrete event simulation-based modelling used in bottleneck analysis. The algorithm we developed was tested in three real-world production lines. Demonstrating the proposed algorithm in real-world production lines helps produce research outcomes that are useful in industry, thus enhancing the use of scientific knowledge. The proposed algorithm can be computer-automated to facilitate real-time decision making. The diagnostic information it yields can then be evaluated by engineers with practical experience in the production system, and appropriate bottleneck management strategies can be framed. Thus, the aim of the descriptive and diagnostic algorithm is to complement engineers' efforts to manage true bottlenecks more effectively. Compared to existing literature, which focuses mainly on detecting bottlenecks, the research presented in this paper reinforces the importance not only of identifying bottlenecks but also of knowing their type, so that they may be managed effectively.
Using data-driven descriptive and diagnostic research, more sophisticated diagnostic algorithms may be built on top of simpler ones by including new information, for example from sensors. Thus, a viable research direction would be to use MES-based data-driven techniques to first detect and then understand types of bottleneck. Further aspects of bottlenecks can then be explored. An important future research direction is therefore to integrate the proposed data-driven algorithm with other sensor-based systems, to deliver deeper diagnostic insights into bottleneck machines, leading to actual root causes.

Maheshwaran Gopalakrishnan http://orcid.org/0000-0001-5102-6559