Methodology of monitoring key risk indicators

As with many preventive activities, the effectiveness of a risk policy is hard to measure. The objective of this article is to present a methodology for monitoring a risk management policy. We use Markov chains to calculate the key risk indicators (K.R.I.s) of the process. By observing the K.R.I.s over time, it is possible to judge whether the risk-prevention measures applied to the observed process are having a positive effect. The methodology is illustrated on the case of archaeological excavations.


Introduction
In order to better manage future incidents, each company must be able to capitalise on previous mistakes (Mouatassim & Ibenrissoul, 2015). In this spirit, enterprise risk management (E.R.M.) has been implemented and is becoming more and more popular (Daud, Yazid, & Hussin, 2010; Gorzeń-Mitka, 2013; Ahmad, Ng, & McManus, 2014). Since the mid-1970s, the concept has expanded beyond manufacturing organisations and has been adopted by the public sector (Drennan & McConnell, 2007; Schiller & Prpich, 2014). Daud et al. (2010) point out that many different things are understood under the term E.R.M. This article deals with the risk connected to processes, which Ciocoiu and Dobrea (2010) call the 'mitigating process' or 'risks of business processes'.
Two challenges initiated this article. The first is that the I.S.O. 31000 (2009) standard, in its Section 5.6, requires the monitoring and review of risk management. The second is tightly connected to the first: I.S.O. does not give any methodology or solution for how to do that. It is therefore no wonder that John Loxley has written that 'while publications all call for risk to be measured, the majority of them do not indicate how that might be done' (Loxley, 2010, p. 70).
Many articles about risk management use Failure Mode and Effects (Criticality) Analysis (F.M.E.[C.]A.). This is one of the most proactive methods of risk management (Yong et al., 2018; Borković, Milčić, & Donevski, 2017; Erbıyık, Can, & Kuşçu, 2014). In our research, we aim to establish a mechanism for supervising the success of risk management. We developed key risk indicators (K.R.I.s) through which a risk process can be monitored and supervised. The approach is illustrated on archaeological excavations.
Several articles show that risk management in the museum sphere is necessary. Tétreault (2008) reports on the enormous damage suffered by Canadian cultural heritage institutions. Kuzucuoglu (2014) reports two deaths in 2010 and as many as five in 2011 in Turkish libraries, archives and museums. Some warnings about digging have been issued (Patterson, 2013; similarly SWA, 2015), but without following a risk methodology or calculating K.R.I.s. With the implementation of proper risk treatments, we expect to minimise the occurrence of risk events and to alleviate their consequences when they do occur. Comparing K.R.I.s over time shows how successful the risk policy is.
Bosilj Vukšić, Brkić, and Tomičić-Pupek (2018) underline the importance of risk management by proposing that, besides strategic alignment, top management support, a process-oriented structure, process performance measurement and other social aspects, it should become one of the critical success factors in the adoption of business process management software.

Literature overview
K.R.I.s are similar to key performance indicators (K.P.I.s) and are often mistaken for them. While K.P.I.s focus on historical process performance, K.R.I.s are oriented towards future threats (Scarlat, Chirita, & Bradea, 2012; Beasley, Branson, & Hancock, 2010). Coleman (2009) defines K.R.I.s as statistics or measurements that can provide a perspective on a company's risk position. The goal of developing effective K.R.I.s is to identify relevant metrics that provide useful insights into potential risks that may affect the organisation's objectives (Beasley et al., 2010).
It is beneficial to measure them over time in order to detect trends and provide contextual information (Davies, Finlay, McLenaghen, & Wilson, 2006; Coleman, 2009; Frigo & Anderson, 2011; Beasley et al., 2010). Scandizzo (2005) also highlighted the importance of establishing a database of quantitative information that can be used to model the operational risk profile of the organisation and to guide management action in both corrective and preventive terms.
Risks, and hence K.R.I. values, can be reduced in two ways: by decreasing the probability that the risk event occurs, or by minimising its consequences. Analyses of the impact or probability of risk events, and consequently of their effect and weight in the process, can be found in articles dealing with multi-criteria decision making (M.C.D.M.). Gigović et al. (2017) implement their own methodology in a geographical information system for floods in an urban area, while Roy et al. (2018) introduced a D.E.M.A.T.E.L. model of key success factors with the aim of controlling hospital service quality management. An interesting variety of ranking methods can be found in Mukhametzyanov and Pamučar (2018) and Komazec, Mladenović, and Dabižljević (2018).
In choosing the calculation method, we followed the principle that simple methods do not provide precise solutions, while complicated ones require too much effort and time (Golubović et al., 2018). Processes are able to transit from one state to another under the influence of many factors (Ginevičius, Trishch, & Petraškevičius, 2015). These transitions can be represented through different techniques, such as text descriptions, various diagrams and graphs. One of them is presentation in matrix form. We chose this form because it is the basis for using Markov chains to calculate the K.R.I.s. Markov analysis is also one of the risk assessment techniques suggested by the I.S.O. 31010 (2009) standard, where it is classified under code B.24. The Markov chain model is one of the most widely used probabilistic models because of its simplicity and its ability to model various types of phenomena (Škulj, 2009). An increasing number of implementations of Markov chains are recorded, for example for modelling student progress (Alawadhi & Konsowa, 2010; Adeleke, Oguntuase, & Ogunsakin, 2014; Brezavšček et al., 2017). Gurning and Cahoon (2009) […] Koubek, 2015; Shuff, 2015; Wawak, 2015). The Public Risk Management Association has also placed more emphasis on I.S.O. (PERI, 2010). Gjerdrum and Peter (2011) find that C.O.S.O. and I.S.O. have more in common than in opposition (similarly Bosetti, 2015). Figure 1 presents the main part of the standard.
For our research, we defined the probability or frequency of each risk event in the process of archaeological excavations. For each risk event, we also estimated its consequences (in phase 5.4.4, Risk evaluation), giving a minimum and a maximum estimate of its effects. In the next phase, countermeasures were prepared.
I.S.O. 31000 requires ensuring that controls are effective and efficient, and obtaining further information to improve the risk assessment by analysing events, changes, trends, etc. (I.S.O. 31000, 2009). I.S.O. 31004 (2013) makes a similar demand, stating that a comprehensive programme should be in place to monitor and record risk performance indicators. To fulfil these requirements, we developed K.R.I.s for this process and defined the following metrics. The first and most important is the cost of a risk event in terms of time. Generally, archaeological excavations start when an investor requests permission to build on a certain site and the responsible ministry expresses interest in investigating it. The archaeologists get a narrow timeframe in which to complete their work; in most cases, this is 35 working days. The majority of risk events consume some of this valuable time. The second indicator measures the number of people involved in the event.
The third indicator measures the direct expenditure caused by the event. The fourth K.R.I. is the overall cost, which combines the time lost and the number of workers involved (valued at an average hourly rate) with the direct costs.
Every indicator was estimated from a minimum and a maximum perspective. For example, the theft of equipment can cause no time loss if the damage is negligible, meaning that the loss is less than 50 EUR and not worth reporting to the police. In a second scenario, the theft is reported to the police, but they do not start an investigation at the site. In this case, one worker loses one or two hours preparing a report and communicating with the police. In a third scenario, the damage is serious, and the police start an investigation and search for biological traces. In such a case, all workers (on average 20) stop their tasks for two to four hours until the police finish their activities.

The calculation method
In Markov analysis, a process can be considered as a sequence of random variables $X_0, X_1, \dots, X_n$, where each $X_i$ is interpreted as the state of the system at time $i$. The number of states is finite; let $m$ denote the number of states in which the system can be found. There is also a set of numbers $p_{ij}$, $i, j = 1, \dots, m$, where $p_{ij}$ is the probability that the system, currently in state $i$, will move to state $j$ at the next step. The collection $\{X_n, n \ge 0\}$ constitutes the Markov chain. Some states are transient: the process passes through them and, with probability one, eventually leaves them for good. Other states are absorbing: once the system reaches an absorbing state, it remains there forever. A discrete process with absorbing states can be written in the canonical matrix form

$$P = \begin{pmatrix} Q & R \\ 0 & I \end{pmatrix},$$

where $Q$ is the matrix of transition probabilities among the transient states, $R$ is the matrix of transitions from transient to absorbing states, $I$ is the identity matrix and $0$ is the zero matrix.
To achieve the goal of our research, we need to calculate the expected number of times the process visits each transient state before absorption. In our case, absorption is the end of the project. We calculate this through the fundamental matrix $N$:

$$N = (I - Q)^{-1},$$

where the identity matrix $I$ has the same size as the matrix of transient states $Q$, with entries $I_{ij} = 1$ for $i = j$ and $I_{ij} = 0$ for $i \neq j$. The entry $n_{ij}$ of $N$ is the expected number of visits to transient state $j$ before absorption, given that the process starts in state $i$. The mathematical proof and explanation are given in Hudoklin Božič (1999) and Beichelt (2006).
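The article performs this inversion in Excel. As an illustration only, the Python sketch below computes $N = (I - Q)^{-1}$ by Gauss-Jordan elimination with exact fractions; the function names `invert` and `fundamental_matrix` and the two-state example are hypothetical, not taken from the article.

```python
from fractions import Fraction


def invert(a):
    """Invert a square matrix by Gauss-Jordan elimination on [A | I]."""
    n = len(a)
    aug = [
        [Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
        for i, row in enumerate(a)
    ]
    for col in range(n):
        # Find a row with a non-zero entry in this column and swap it up.
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so that the pivot becomes 1.
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    # The right half of the augmented matrix now holds the inverse.
    return [row[n:] for row in aug]


def fundamental_matrix(q):
    """Return N = (I - Q)^-1, where q is the transient-to-transient block."""
    n = len(q)
    i_minus_q = [[Fraction(int(i == j)) - Fraction(q[i][j]) for j in range(n)] for i in range(n)]
    return invert(i_minus_q)


# Hypothetical two-state example: each transient state moves to the other
# with probability 1/2 and is absorbed with probability 1/2.
Q = [[0, Fraction(1, 2)], [Fraction(1, 2), 0]]
N = fundamental_matrix(Q)
print(N)  # [[Fraction(4, 3), Fraction(2, 3)], [Fraction(2, 3), Fraction(4, 3)]]
```

In the example, a process starting in the first transient state is expected to be in it 4/3 times (counting the start) and in the second state 2/3 times before absorption. Exact fractions avoid rounding noise for small matrices such as this one.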
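We then multiply the expected number of visits to each state by the impact of that state, using in turn the minimum and maximum expected lost time, number of involved workers and costs. Summing over the states gives the K.R.I.: $\mathrm{K.R.I.} = \sum_{j} n_{0j}\, c_j$, where $n_{0j}$ is the expected number of visits to state $j$ for a project that starts in the initial (normal) state, numbered 0, and $c_j$ is the estimated impact of one visit to state $j$.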
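The weighted sum behind each K.R.I. is short enough to sketch directly. The helper `kri` below is hypothetical, and the numbers in the example are invented for illustration; they are not the values of Table 7.

```python
def kri(weights, impacts):
    """Key risk indicator: sum over states of (expected visits x impact per visit).

    weights -- first row of the fundamental matrix, one entry per transient state
    impacts -- estimated impact of one visit to each state, in the same order
               (hours lost, workers involved, direct costs or overall costs)
    """
    if len(weights) != len(impacts):
        raise ValueError("weights and impacts must have the same length")
    return sum(w * c for w, c in zip(weights, impacts))


# Hypothetical weights and impacts for three states, not taken from Table 7.
weights = [35.0, 0.05, 0.02]   # state 0 (normal), state A, state B
min_hours = [0, 1, 2]          # the normal state causes no loss
max_hours = [0, 4, 8]
print(kri(weights, min_hours), kri(weights, max_hours))
```

Running the same function once per impact row (minimum and maximum hours, workers, direct costs and overall costs) yields the full set of K.R.I. ranges.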

Implementation of risk management in the process of archaeological excavation
We obtained the basic data about what went wrong, and how often, from the archaeological archive and from workers employed on these excavations. For activity 5.4.2, Risk identification (see Figure 1), we analysed the archive of all archaeological investigations in the north-eastern part of our country for the period 2006-2016. During this period, there were 460 excavations. After eliminating terrain analyses and soundings, we focused on 320 projects. The main statistical data from the journals and diaries of these excavations show that the most common duration of a single excavation was 35 days, with 20 workers employed on average.
Unfortunately, the documentation from those digs was incomplete and therefore not useful for identifying risk events. We therefore collected data about events that had happened through interviews with participants. They reported 376 risk events, which we classified into 48 types. The British Columbia Museums Association listed four major areas of risk: people, property, funding and community perception (Hall & Duckles, 2005). In our research, focused on archaeological digging, we identified five sources of risk events: assets, human factors, location, weather and administration. Table 1 lists 9 of the 48 event types, accounting for 90 of the 376 event occurrences. For each event type, the number of occurrences is recorded in the third column, labelled Freq.
In the following text, we present the analysis of the events involving assets only, that is, the risk events coded AS1 to AS5.
In the next step, following activity 5.4.3 of I.S.O. 31000, Risk analysis (see Figure 1), we analysed each event that was recognised as a risk event and treated it as a state in which the observed process can be.
The malfunctioning of a machine was numbered as state no. 1. It occurred 14 times. From this state, the system can transfer into one of three states: repair in the field, where the end user (operator) eliminates a simple defect (state no. 2), which happened seven times out of 14; servicing involving the intervention of a specialist (state no. 3), which occurred five times; and replacement of the damaged machine (state no. 4), which occurred twice.
Each of these states transfers into the normal state at the next step. Fire in the aggregate (state no. 5) is a simple event: after it occurs, the system moves back to its normal state.
Failure of the submersible pump (state no. 6) happened 11 times. Four times, the end user eliminated the defect without assistance (state 2); the remaining seven times, the device was replaced (state 4).
Defect in a camera or drone is marked as state no. 7. It transferred to either the state of self-repair (state 2) or to the state of lost pictures (state 8). The first case happened six times out of seven occurrences. Pictures were lost one time.
State no. 9 is damage to expensive equipment, such as a theodolite or tachymeter. In two out of nine cases, the equipment was repaired in the field (state 2), in 6 cases it was replaced (state 4) and in one case the malfunction was detected too late. Figure 2 represents a graph of the system states with the probabilities of the transitions.
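The transition probabilities in Figure 2 are relative frequencies of the observed outcomes. A minimal Python sketch, assuming that the 'detected too late' outcome of state 9 is numbered state 10 (the text does not give its number), turns the counts quoted above into exact probabilities; the names `TRANSITION_COUNTS` and `transition_probabilities` are hypothetical.

```python
from fractions import Fraction

# Observed outcomes of each risk state, as reported in the text
# (state -> {next state: number of times}). Numbering the "detected too late"
# outcome of state 9 as state 10 is an assumption.
TRANSITION_COUNTS = {
    1: {2: 7, 3: 5, 4: 2},    # machine malfunction -> field repair / servicing / replacement
    6: {2: 4, 4: 7},          # submersible pump failure -> field repair / replacement
    7: {2: 6, 8: 1},          # camera or drone defect -> self-repair / lost pictures
    9: {2: 2, 4: 6, 10: 1},   # theodolite or tachymeter damage
}


def transition_probabilities(counts):
    """Turn raw outcome counts into relative-frequency transition probabilities."""
    probs = {}
    for state, outcomes in counts.items():
        total = sum(outcomes.values())
        probs[state] = {nxt: Fraction(n, total) for nxt, n in outcomes.items()}
    return probs


P = transition_probabilities(TRANSITION_COUNTS)
for state, row in P.items():
    assert sum(row.values()) == 1  # outgoing probabilities of each state sum to 1
print(P[1])  # {2: Fraction(1, 2), 3: Fraction(5, 14), 4: Fraction(1, 7)}
```

Because the probabilities are estimated from small counts, a single additional occurrence can shift them noticeably; this is one reason to recompute them regularly.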
In the phase of risk evaluation (5.4.4, see Figure 1), we estimated the consequences of each registered event using three criteria: the hours lost due to the event, the estimated number of workers involved and the estimated direct costs. Every criterion was estimated through its minimum and maximum impact. As a final figure, we calculated the overall cost caused by the event as the time lost multiplied by the number of workers involved and by the average hourly cost, plus the direct costs. The average hourly rate per worker was estimated at 35 EUR. An example of risk evaluation data is presented in Table 2. According to Table 2, fixing a malfunctioning work machine in the field involves 1 to 2 workers, costs between 0 and 50 EUR in direct costs and results in one to four lost hours. The overall cost is therefore at least 35 EUR (one worker occupied with the repair for one hour, no direct costs) and at most 330 EUR (two workers occupied for four hours, 2 × 4 × 35 EUR = 280 EUR, plus 50 EUR of direct costs).

In the risk treatment phase (no. 5.5, see Figure 1), we prepared countermeasures for every recorded event. For the 48 event types, we prepared 328 proposed actions covering risk avoidance, minimising likelihood, changing the consequences, sharing the risk, etc., all within the scope of options suggested by the standards. Table 3 presents an example for event AS2.
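The overall-cost rule can be checked against the field-repair example from the risk evaluation. The sketch below uses a hypothetical function name, `overall_cost`, and pairs the minimum hours with the minimum workers and direct costs, and likewise for the maxima, as the text does.

```python
HOURLY_RATE_EUR = 35  # average cost of one worker-hour, as stated in the text


def overall_cost(hours_lost, workers, direct_cost_eur, rate=HOURLY_RATE_EUR):
    """Overall cost of one event: hours lost x workers x hourly rate + direct costs."""
    return hours_lost * workers * rate + direct_cost_eur


# Field repair of a malfunctioning work machine (values quoted in the text).
minimum = overall_cost(hours_lost=1, workers=1, direct_cost_eur=0)
maximum = overall_cost(hours_lost=4, workers=2, direct_cost_eur=50)
print(minimum, maximum)  # 35 330
```

Pairing all minima and all maxima gives the widest plausible range; it does not model events where, say, many workers are involved but little time is lost.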

Monitoring and review
To present the method of monitoring and review (activity no. 5.6 in Figure 1), we focus on the events classified as 'Assets' in the sample presented in Table 1 (Part of the List of Identified Risks): malfunctioning of a work machine, fire in the aggregate, submersible pump malfunction, defect in a camera or drone, and damage to a theodolite or tachymeter.
We formed a process matrix from the data collected during the risk analysis. Owing to space limitations, we use only the five potential accidents mentioned above to present the K.R.I. calculation. Overall, they occurred 42 times, whereas during risk identification (see Section 4) we recorded 48 threat types that occurred 376 times. Another very important factor behind this decision is that the data were collected from the recollections of the interviewed workers, so the frequencies of individual occurrences are not reliable. Our only goal here is to present the methodology of the K.R.I. calculation.
Every risk occurrence can generate additional steps. For instance, a machine malfunction can take the process into three different states: (1) an operator repairs the machine, bringing the process into state no. 2; (2) an operator calls for professional servicing, which moves the process into state no. 3; or (3) the machine is replaced by another one (state no. 4). Therefore, for the selected sample of five risk events, we have 10 different process states (see Figure 2, Graph of the system). We added two further states: state 0, which represents everything functioning without any problems, and state 11, the end of the project.
Table 4 shows the process matrix P for the selected risk events. State no. 11 (end of the project) is an absorbing, or sink, state. States 0 through 10 are transient and form the matrix Q, so state no. 11 is excluded from further calculations. The number 11,200 in the denominators of Table 4 is the product of the 320 excavations and the average excavation duration of 35 days (320 × 35 = 11,200). Following the formula for the fundamental matrix, we subtracted the matrix of transient states Q from the identity matrix I to obtain the matrix $(I - Q)$ (see Table 5). We then calculated the fundamental matrix $(I - Q)^{-1}$ using Excel's array function MINVERSE. The results are presented in Table 6.

Table 3. Example of countermeasures for event AS2, fire in the aggregate.
AS2, Fire in the aggregate:
- Logging of electrical aggregates must be done by an accredited person.
- Damaged electrical equipment must be replaced immediately.
- It is forbidden to use the aggregate in a closed area.
- The casing of the aggregate must be grounded.
- Before starting the aggregate, all electrical supplies must be switched off.
- The exhaust pipe must be at least 7 m away from other objects.
- In the case of a petrol spill, all contaminated earth must be removed.
- Fire appliances must be available at all times.

Table 4. Matrix of the process for the selected risk events.
In the fundamental matrix (Table 6), each entry gives the expected number of times the process occupies a given state when it starts from a given state. In our case, only the first row of the matrix is relevant, because every project starts from the beginning, when everything is OK (state 0).
The fundamental matrix can thus be read as giving the weight of each process state. By multiplying these weights by the estimated lost time, the number of people involved and the expenses caused by the risk event, we obtained the K.R.I.s, which are calculated in Table 7.
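To show how the pieces fit together, the sketch below assembles Q for states 0 to 10 from the counts quoted in the text and computes the first row of the fundamental matrix. Table 4 itself is not reproduced here, so the sketch rests on assumptions that are ours, not the article's: the project ends with probability 320/11,200 per excavation-day, the outcome states 2, 3, 4, 5, 8 and 10 return to state 0 at the next step, the 'detected too late' outcome is state 10, and the single fire in the aggregate is inferred from the total of 42 occurrences. Its output may therefore differ from the published Table 6; all names are hypothetical.

```python
from fractions import Fraction

EXCAVATION_DAYS = 320 * 35  # 11,200 excavation-days

# Occurrences of each initial event over the 320 excavations (from the text).
# The single fire in the aggregate (state 5) is inferred from the total of 42.
EVENT_FREQ = {1: 14, 5: 1, 6: 11, 7: 7, 9: 9}

# Follow-up transitions reported in the text; numbering "detected too late"
# as state 10 is an assumption.
FOLLOW_UP = {
    1: {2: 7, 3: 5, 4: 2},
    6: {2: 4, 4: 7},
    7: {2: 6, 8: 1},
    9: {2: 2, 4: 6, 10: 1},
}

# Assumption: these outcome states return to the normal state 0 at the next step.
RETURN_STATES = (2, 3, 4, 5, 8, 10)


def build_q(size=11):
    """Transient block Q over states 0..10; state 11 (end of project) is omitted."""
    q = [[Fraction(0)] * size for _ in range(size)]
    for state, freq in EVENT_FREQ.items():
        q[0][state] = Fraction(freq, EXCAVATION_DAYS)
    # Assumption: on each excavation-day the project ends with probability 320 / 11,200.
    end_prob = Fraction(320, EXCAVATION_DAYS)
    q[0][0] = 1 - sum(q[0]) - end_prob  # remaining probability: stay in state 0
    for state, outcomes in FOLLOW_UP.items():
        total = sum(outcomes.values())
        for nxt, count in outcomes.items():
            q[state][nxt] = Fraction(count, total)
    for state in RETURN_STATES:
        q[state][0] = Fraction(1)
    return q


def first_row_of_fundamental_matrix(q):
    """Row 0 of N = (I - Q)^-1, via Gauss-Jordan elimination on [I - Q | I]."""
    n = len(q)
    aug = [
        [Fraction(int(i == j)) - q[i][j] for j in range(n)]
        + [Fraction(int(i == j)) for j in range(n)]
        for i in range(n)
    ]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return aug[0][n:]


weights = first_row_of_fundamental_matrix(build_q())
print(weights[0], weights[1])  # 35 7/160
```

Under these assumptions, state 0 is visited 35 times (once per excavation-day) and the weight of every other state equals its average number of occurrences per excavation, for example 14/320 for state 1. Multiplying each weight by the corresponding impact and summing over the states then yields a K.R.I.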

Results
The results of our research are presented in Table 7. The columns represent the observed states; the last column contains the K.R.I.s. The first row is a copy of the first row of the matrix $(I - Q)^{-1}$ in Table 6; these numbers represent the weight of each individual state.
The next two rows are the estimated impacts, or consequences, in terms of the time spent in each state: the first gives the minimum estimated time spent when the process reaches the state, and the second the maximum. These data are copied from Table 2 (Part of the List of Evaluated Risk Event Consequences). The next two rows give the minimum and maximum number of workers involved when the process reaches the state, followed by the minimum and maximum direct costs. The last two rows give the overall costs: the estimated average price per worker-hour (35 EUR) is multiplied by the expected number of workers involved and by the number of hours spent if the process reaches the state, and the direct costs are added.
The last column of the table contains the K.R.I.s, calculated as the sum of the weights of the states multiplied by the estimated impacts or consequences. For example, the average minimum lost time was calculated as $\sum_{j} n_{0j}\, t_j^{\min} = 0.366$ hours, where $n_{0j}$ is the weight of state $j$ and $t_j^{\min}$ is the minimum time lost in that state. A similar calculation for the maximum number of hours lost per archaeological dig gives a value of 1. Therefore, between 0.366 and 1 hour can be expected to be lost to negative events on an average dig.
The same calculation provided the remaining K.R.I.s: between 0.209 and 0.506 workers can be expected to be involved in negative events per archaeological dig; these events will cause direct losses of between 58 and 430 EUR on average; and the overall costs will be between 71 and 815 EUR per archaeological excavation project.
It is vital to stress that these figures cover only the five selected risk event types out of the 48 recorded, representing 42 of the 376 events. We did not calculate all the events because the data were gathered through interviews with participating workers, whose memories of events spanning 10 years will have faded, so the data are not reliable.

Conclusion
We have now established a database containing the main entities: the event, in which the risk event type is described; the occurrence, with a description of the concrete event; and the consequences and countermeasures, with the related work assignments. These are connected to the ongoing projects. The database ensures the strongest possible anonymity of data input, so workers are protected with regard to any 'confessed failures'. From now on, all accidents, risk events and their impacts will be promptly recorded.
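The register described above could take many forms. A minimal sketch using Python's built-in `sqlite3` module is shown below; all table and column names, and the sample rows, are hypothetical, and anonymity is approximated simply by not storing who reported an occurrence.

```python
import sqlite3

# Hypothetical schema sketch; table and column names are illustrative only.
SCHEMA = """
CREATE TABLE project (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE event_type (
    code TEXT PRIMARY KEY,          -- e.g. 'AS2'
    description TEXT NOT NULL
);
CREATE TABLE occurrence (
    id INTEGER PRIMARY KEY,
    event_code TEXT NOT NULL REFERENCES event_type(code),
    project_id INTEGER NOT NULL REFERENCES project(id),
    description TEXT,
    hours_lost REAL,
    workers_involved INTEGER,
    direct_cost_eur REAL
    -- no reporter column: entries stay anonymous
);
CREATE TABLE countermeasure (
    id INTEGER PRIMARY KEY,
    event_code TEXT NOT NULL REFERENCES event_type(code),
    action TEXT NOT NULL,
    assigned_to TEXT                -- work assignment
);
"""


def open_register(path=":memory:"):
    """Open (or create) the register and apply the schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


conn = open_register()
conn.execute("INSERT INTO project (id, name) VALUES (1, 'Example dig')")
conn.execute("INSERT INTO event_type VALUES ('AS2', 'Fire in the aggregate')")
conn.execute(
    "INSERT INTO occurrence (event_code, project_id, hours_lost, workers_involved, direct_cost_eur)"
    " VALUES ('AS2', 1, 2, 3, 40)"
)
# Frequency per event type: the raw input for next year's matrix of states.
print(conn.execute("SELECT event_code, COUNT(*) FROM occurrence GROUP BY event_code").fetchall())
```

Keeping consequences on each occurrence row means the minimum and maximum impact estimates can later be replaced by recorded values.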
Table 7. Key risk indicators.

With the aim of observing the K.R.I.s, we recommend that a matrix of the possible states be developed and the K.R.I.s calculated every year. The method is simple enough to be used anytime and anywhere, and only Excel is needed for the calculation. It is now possible to monitor the quality and effectiveness of the risk management policy, as required by Section 5.6 of the I.S.O. 31000 standard. It is not possible to predict all events that can occur in the observed process; new events will keep surprising us. On the other hand, the probabilities of risk occurrence are very low. To limit the strong impact of new events on the K.R.I.s, we suggest calculating them over a three-year period.
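The recommended three-year window amounts to pooling the recorded occurrences before estimating the transition probabilities. The sketch below is hypothetical (function names, years and counts are invented) and only shows the pooling step; the resulting per-day probabilities would then take the place of the single-year frequencies in the matrix of possible states.

```python
from collections import Counter


def pooled_counts(yearly_counts, window=3):
    """Sum per-year event counts over the most recent `window` years."""
    recent_years = sorted(yearly_counts)[-window:]
    pooled = Counter()
    for year in recent_years:
        pooled.update(yearly_counts[year])
    return pooled


def daily_probabilities(pooled, excavation_days):
    """Probability of each event type per excavation-day over the pooled period."""
    return {code: count / excavation_days for code, count in pooled.items()}


# Hypothetical yearly records (event code -> occurrences), not data from the article.
yearly = {
    2021: {"AS1": 2},
    2022: {"AS1": 1, "AS3": 1},
    2023: {"AS3": 2},
    2024: {"AS1": 1},
}
pooled = pooled_counts(yearly)                    # uses 2022, 2023 and 2024 only
probs = daily_probabilities(pooled, 3 * 30 * 35)  # hypothetical: 30 digs of 35 days per year
print(dict(pooled), probs)
```

Pooling smooths out a single unusual year, at the cost of reacting more slowly when countermeasures start to work.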

Disclosure statement
No potential conflict of interest was reported by the author(s).