Estimating the return to scale of an integer-valued data envelopment analysis model: efficiency assessment of a higher education institution

Abstract The integer radial data envelopment analysis (IDEA) approach has been adopted in numerous efficiency studies to accurately measure real-life applications in which some inputs and/or outputs are restricted to integer values. However, none of these studies has addressed the nature of return to scale (RTS) for an output-oriented integer-valued DEA model. This paper therefore discusses the RTS of an output-oriented integer-valued DEA model and identifies its regions (increasing, decreasing and constant). These regions are identified by classifying the efficiency scores of the output-oriented integer-valued DEA model under constant return to scale (CRS) and variable return to scale (VRS), together with the values of the intensity factors of the model under CRS. For illustration, the output-oriented integer-valued DEA model under the two technologies was applied to empirical data on public universities in Malaysia. The results showed that only half (50%) and slightly over half (55%) of the universities were efficient under CRS and VRS, respectively. Some of the inefficient universities could improve their efficiency by decreasing their inputs and increasing their outputs. In addition, the calculated scale efficiency and RTS regions showed that 50% of the universities were operating under CRS, and 25% each were operating under increasing and decreasing RTS.


Introduction
Data envelopment analysis (DEA) was first proposed by Charnes, Cooper, and Rhodes (1978), building on the seminal work of Farrell (1957). Since then, DEA has been applied in many government organizations to assess the relative efficiency of a homogeneous set of decision making units (DMUs) (Abbott & Doucouliagos, 2003; Amirteimoori, Emrouznejad, & Khoshandam, 2013; Johnes, 2006). Many studies have sought to enhance the evaluation power of DEA (e.g. Amirteimoori et al., 2013; Avkiran, 2001; Chen, Deng, & Gingras, 2011; Cook & Seiford, 2009; Lu, 2012). There are two basic types of DEA efficiency frontier, depending on the return-to-scale assumption (Yang, Li, Chen, & Liang, 2014). The first is the CCR model proposed by Charnes, Cooper and Rhodes (1978), which assumes that return to scale (RTS) is constant (CRS). The second is the BCC model proposed by Banker, Charnes and Cooper (1984), which assumes that RTS is variable (VRS). The main difference between the two models is that the BCC model imposes a convexity constraint, which makes it possible to estimate the RTS. As an important efficiency measure, RTS is used to classify whether a DMU under assessment is operating in the decreasing, increasing or constant RTS region (Banker, Cooper, Seiford, Thrall, & Zhu, 2004; Golany & Yu, 1997).
Classical DEA models, however, treat all inputs and outputs as real-valued (non-integer) data. This assumption conflicts with many real-life applications in which some inputs and outputs can only take integer values, and it affects the accuracy of the efficiency scores. To obtain accurate efficiency scores, Lozano and Villa (2006) integrated the requirements of mixed integer linear programming (MILP) into the CCR-DEA model. Lozano and Villa (2007) then extended the model to the VRS technology. However, these two models violated the convexity constraint since they treated all inputs and outputs as integer values. To rectify this issue, Kuosmanen and Matin (2009) introduced a hybrid integer CCR model that partitions a dataset into subsets of integer and real values. The drawback of Kuosmanen and Matin's (2009) model is that it cannot determine the nature of RTS. Matin and Kuosmanen (2009) then considered the RTS of the input-oriented model, and Du, Chen, Chen, Cook, and Zhu (2012) extended Matin and Kuosmanen's (2009) model to the output-oriented VRS case. However, the Du et al. (2012) model only treats outputs as hybrid integer values, with inputs as real values. In addition, none of the integer DEA studies has used the two technologies of VRS and CRS to estimate the nature of RTS (increasing, decreasing, or constant) for an output-oriented integer-valued DEA model. This paper addresses these two shortcomings of the existing model.
Integer-valued DEA models have many empirical applications. This paper therefore modifies the Du et al. (2012) model, improving its objective function and constraints so that it handles mixed integer values of both inputs and outputs. It also estimates the nature of RTS for the modified model by calculating its efficiency scores under VRS and CRS, and then identifies the region of RTS (i.e. increasing, decreasing, or constant RTS). To the best of our knowledge, no previous study has considered the RTS of the output-oriented model with integer-valued inputs and outputs. The modified model has two salient advantages: 1) it estimates the nature of RTS when some inputs and/or outputs are integer-valued, and 2) it identifies the real and integer values of the input and output slacks. We illustrate the usefulness of the modified model through an empirical study measuring the operational efficiency of public universities in Malaysia.
The rest of this paper is organized as follows. Section 2 reviews previous studies on the mixed integer radial DEA model. Section 3 estimates the RTS of the output-oriented integer-valued DEA model, whose development builds on the previous models. Section 4 discusses an illustrative application involving 20 public universities in Malaysia. Finally, Section 5 provides some concluding remarks.

Previous studies
Suppose that there are n DMUs to be assessed. Each DMU uses m inputs $x_{ij}$ ($i = 1, \ldots, m$) to produce s outputs $y_{rj}$ ($r = 1, \ldots, s$), where $x_{ij}$ denotes the amount of the ith input consumed by the jth DMU and $y_{rj}$ denotes the amount of the rth output produced by the jth DMU ($j = 1, \ldots, n$). The DMU under evaluation is denoted $\mathrm{DMU}_o$, $o \in \{1, \ldots, n\}$. All input and output values are non-negative, i.e. $x_{ij}, y_{rj} \ge 0$. The non-negative intensity weight $\lambda_j$ attached to $\mathrm{DMU}_j$ defines the reference point of $\mathrm{DMU}_o$ on the efficient frontier constructed by all efficient units. The existing output-oriented model for evaluating the efficiency of $\mathrm{DMU}_o$ under the CRS technology can be presented as follows (Cooper, Seiford, & Tone, 2006):

$$
\begin{aligned}
\max\quad & \delta_o + \varepsilon\Big(\sum_{i=1}^{m} s_{io}^{-} + \sum_{r=1}^{s} s_{ro}^{+}\Big) \\
\text{s.t.}\quad & \sum_{j=1}^{n} \lambda_j x_{ij} = x_{io} - s_{io}^{-}, && i = 1, \ldots, m, \\
& \sum_{j=1}^{n} \lambda_j y_{rj} = \delta_o y_{ro} + s_{ro}^{+}, && r = 1, \ldots, s, \\
& \lambda_j \ge 0, && j = 1, \ldots, n, \\
& \delta_o,\ s_{io}^{-},\ s_{ro}^{+} \ge 0,
\end{aligned}
\tag{1}
$$

where $\varepsilon$ is a non-Archimedean infinitesimal. Based on form (1), an efficient $\mathrm{DMU}_o$, which lies on the efficient frontier, obtains a unity score ($\delta_o^{*} = 1$) and all its input and output slacks are zero (i.e. $s_{io}^{-*} = s_{ro}^{+*} = 0$ for all i and r). Otherwise, $\mathrm{DMU}_o$ is inefficient. Output-oriented model (1) suggests how an inefficient $\mathrm{DMU}_o$ can be improved at its current level of inputs under the CRS technology, since it does not impose the convexity constraint $\sum_{j=1}^{n} \lambda_j = 1$. Banker et al. (1984) added the convexity constraint to form (1) to evaluate the efficiency of $\mathrm{DMU}_o$ under the VRS technology. However, the Banker et al. (1984) model assumes that all input and output values are real under both the VRS and CRS technologies.
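With a single input and a single output, the CRS version of model (1) has a closed-form solution that makes the meaning of $\delta_o$ concrete: the CRS frontier is the ray through the origin with the largest output/input ratio, and $\mathrm{DMU}_o$ is expanded radially until it reaches that ray. The Python sketch below assumes exactly one input and one output and uses hypothetical data; it is not a general solver for model (1), which in the multi-input, multi-output case requires a linear programming solver. The function name `output_crs_single` is our own illustrative choice.

```python
from typing import List, Tuple


def output_crs_single(x: List[float], y: List[float], o: int) -> Tuple[float, float]:
    """Return (delta*, lambda*) of the output-oriented CRS model for DMU o.

    Assumes exactly one input and one output per DMU (hypothetical setting).
    Under CRS the frontier is the ray through the origin with the highest
    output/input ratio, so the optimum puts all weight on that best DMU.
    """
    ratios = [yj / xj for xj, yj in zip(x, y)]
    best = max(range(len(ratios)), key=ratios.__getitem__)
    # Radial output expansion needed to reach the frontier ray.
    delta = ratios[best] / (y[o] / x[o])
    # Intensity weight on the best DMU so that its scaled input equals x_o.
    lam = x[o] / x[best]
    return delta, lam


# Hypothetical data: three DMUs with one input and one output each.
x = [2.0, 4.0, 5.0]
y = [4.0, 6.0, 5.0]
for o in range(len(x)):
    delta, lam = output_crs_single(x, y, o)
    # The efficiency score is reported as 1/delta (equal to 1 for efficient DMUs).
    print(f"DMU{o + 1}: delta={delta:.4f}, score={1 / delta:.4f}, lambda={lam:.2f}")
```

Here DMU1 defines the frontier ($\delta^{*} = 1$), while DMU2 and DMU3 would need to expand their outputs by factors of about 1.33 and 2, respectively, giving scores of 0.75 and 0.5.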
To make the model more appropriate for real-life situations, Kuosmanen and Matin (2009) constructed a hybrid integer-valued DEA model by partitioning the inputs and outputs into subsets of real and integer values (i.e. hybrid integer-valued data). Their model denotes the real- and integer-valued inputs and outputs by $I^{NI}$, $I^{I}$, $O^{NI}$ and $O^{I}$, respectively. The real- and integer-valued subsets are mutually disjoint and satisfy $I^{NI} \cup I^{I} = I$ and $O^{NI} \cup O^{I} = O$, where $I = \{1, \ldots, m\}$ and $O = \{1, \ldots, s\}$ (Hussain, Ramli, & Khalid, 2015). Kuosmanen and Matin's (2009) model was later extended by Du et al. (2012) to an output-oriented VRS model for hybrid integer-valued data, which can be presented as follows:

The symbols $s_{io}^{-}$, $s_{r^{I}o}^{+}$ and $s_{r^{NI}o}^{+}$ denote the non-radial slacks of a real input, an integer output and a real output, respectively. Note that form (2) distinguishes between two types of slacks. The first type, $s_{io}^{-}$, is the absolute difference between the reference-set input level $\sum_{j=1}^{n} \lambda_j x_{ij}$ and the input $x_{io}$ of the DMU under evaluation. The second type, $s_{r^{I}o}^{+}$ and $s_{r^{NI}o}^{+}$, is the absolute difference between the reference-set levels of the integer and real outputs, $\sum_{j=1}^{n} \lambda_j y_{r^{I}j}$ and $\sum_{j=1}^{n} \lambda_j y_{r^{NI}j}$, and their projections $\delta_o y_{r^{I}o}$ and $\delta_o y_{r^{NI}o}$. Since each slack measures the absolute distance between the reference set and the projection point of the related input or output, the reference set of an integer DEA model is free from the integer restriction (Du et al., 2012). The efficiency of form (2) is revealed by the optimal value $\delta_o^{*}$, which equals one for an efficient $\mathrm{DMU}_o$; for an inefficient unit $\delta_o^{*} > 1$, so its efficiency score $1/\delta_o^{*}$ is less than one.
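The partition of indices into real- and integer-valued subsets can be checked mechanically before a model is built. The Python sketch below is a hypothetical helper (`is_valid_partition` is our own name, not part of Kuosmanen and Matin, 2009); it uses the paper's 1-based indexing and verifies that the two subsets are disjoint and together cover all indices.

```python
from typing import Iterable


def is_valid_partition(size: int, real_idx: Iterable[int], int_idx: Iterable[int]) -> bool:
    """Check that real and integer index sets partition {1, ..., size}.

    Mirrors the requirement that, e.g., I^NI and I^I are disjoint and that
    their union is I = {1, ..., m}.
    """
    real, integer = set(real_idx), set(int_idx)
    full = set(range(1, size + 1))
    return real.isdisjoint(integer) and (real | integer) == full


# Hypothetical example with s = 3 outputs: output 2 is real-valued,
# outputs 1 and 3 are integer-valued (counts).
print(is_valid_partition(3, real_idx=[2], int_idx=[1, 3]))     # True
print(is_valid_partition(3, real_idx=[1, 2], int_idx=[2, 3]))  # False: overlap
print(is_valid_partition(3, real_idx=[1], int_idx=[2]))        # False: 3 missing
```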
Moreover, DEA provides two types of efficiency measures: technical efficiency and scale efficiency. Each type views the inputs and outputs from its own perspective when calculating the aggregate efficiency of a DMU (Fried, Lovell, Schmidt, & Schmidt, 2008). A DMU is technically efficient (TE) if it is unable to produce more outputs from its current level of inputs (Jahanshahloo & Khodabakhshi, 2004). Technical efficiency thus evaluates how effectively the available resources are used to produce the maximum amounts of outputs. A DMU may be technically efficient but still operate at an unsuitable scale, consuming too many inputs or producing too few outputs; this information is derived from scale efficiency (Abbott & Doucouliagos, 2003). Scale efficiency is measured as the ratio of the efficiency score obtained under CRS to that obtained under VRS (Wu & Zhou, 2015).
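Scale efficiency is a simple ratio once the two technical efficiency scores are known. The Python sketch below computes it for a few hypothetical units; the function name, the tolerance and the input checks are our own assumptions, not part of the cited studies.

```python
def scale_efficiency(te_crs: float, te_vrs: float, tol: float = 1e-9) -> float:
    """Scale efficiency SE = TE_CRS / TE_VRS.

    Both scores are assumed to lie in (0, 1], with TE_CRS <= TE_VRS because
    the CRS technology contains the VRS technology.
    """
    if te_vrs <= 0 or te_crs <= 0:
        raise ValueError("efficiency scores must be positive")
    if te_crs > te_vrs + tol:
        raise ValueError("TE_CRS should not exceed TE_VRS")
    return te_crs / te_vrs


# Hypothetical scores for three units.
for name, crs, vrs in [("A", 1.0, 1.0), ("B", 0.6, 0.8), ("C", 0.9, 1.0)]:
    print(name, round(scale_efficiency(crs, vrs), 4))
```

A value of 1 means the unit is scale efficient; a value below 1 means it is scale inefficient, and whether it is too small or too large is determined separately through the RTS classification.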

Methodology of RTS
This paper makes two contributions to the DEA literature. First, it proposes an output-oriented integer-valued DEA model that handles real and integer values of both inputs and outputs by modifying the objective function and input-output constraints of Du et al.'s (2012) model. Second, it estimates the RTS of the proposed model. Because the RTS of a DEA model depends on the two technologies of CRS and VRS, the technical efficiency of the modified DEA model is calculated under both. Hence, form (2) can be reformulated as a two-stage output-oriented integer-valued DEA model as follows: for all DMUs (see Lozano & Villa, 2006).
The methodology of this paper is illustrated graphically in Fig. 1, which presents a schematic view of estimating the RTS of the modified output-oriented integer-valued DEA model. Forms (3) and (4) of the modified model compute the efficiency measures (i.e. the efficiency scores and the input and output slacks) under VRS. To compute the corresponding efficiency measures under CRS, the convexity constraint is dropped from the same model. The main aim of calculating the efficiency scores under both CRS and VRS is to estimate the nature of RTS of the output-oriented integer-valued DEA model. To achieve this, the paper employs the three conditions proposed by Zhu and Shen (1995), which depend on the values of the intensity factors $\lambda_j$ obtained under the CRS technology (Saranga, 2009). The three conditions are:
i. If the technical efficiency score under CRS equals the technical efficiency score under VRS, then RTS is constant (CRS).
ii. If the technical efficiency score under CRS differs from the technical efficiency score under VRS and $\sum_{j=1}^{n} \lambda_j < 1$, then RTS is increasing (IRS).
iii. If the technical efficiency score under CRS differs from the technical efficiency score under VRS and $\sum_{j=1}^{n} \lambda_j > 1$, then RTS is decreasing (DRS).
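The three conditions translate directly into a small decision rule. The Python sketch below assumes that the CRS and VRS scores and the optimal CRS intensity factors have already been obtained from a solver. The numerical tolerance `tol` and the `"undetermined"` fallback (for the case $\sum_j \lambda_j = 1$ with unequal scores, which the three conditions do not cover) are our own safeguards, not part of Zhu and Shen (1995).

```python
from typing import Sequence


def classify_rts(te_crs: float, te_vrs: float, lambdas_crs: Sequence[float],
                 tol: float = 1e-6) -> str:
    """Classify RTS with the three conditions of Zhu and Shen (1995).

    te_crs, te_vrs : technical efficiency scores under CRS and VRS
    lambdas_crs    : optimal intensity factors lambda_j from the CRS model
    tol            : numerical tolerance (assumed; solvers return floats)
    """
    if abs(te_crs - te_vrs) <= tol:
        return "CRS"          # condition i
    total = sum(lambdas_crs)
    if total < 1 - tol:
        return "IRS"          # condition ii
    if total > 1 + tol:
        return "DRS"          # condition iii
    # Not covered by the three conditions; flagged rather than guessed.
    return "undetermined"


# Hypothetical results for three DMUs.
print(classify_rts(1.0, 1.0, [1.0]))         # CRS
print(classify_rts(0.6, 0.8, [0.3, 0.4]))    # IRS
print(classify_rts(0.6, 0.8, [0.9, 0.5]))    # DRS
```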
Note that the RTS classification has an unambiguous meaning only when the DMU under assessment is projected onto the VRS efficient frontier. For efficient DMUs, CRS indicates that adjusting the scale of their inputs cannot increase their efficiency, since the model assumes that the relationship between the scale of operations and efficiency is not significant. IRS indicates that a proportional increase in all inputs yields a more than proportional increase in outputs, whereas DRS indicates that it yields a less than proportional increase in outputs (Banker, 1984; Saranga, 2009). The RTS of an inefficient DMU can be classified once the DMU is projected onto the efficient frontier (Seiford & Zhu, 1999). Since inefficient DMUs are not operating at optimal scale, the main justification for calculating RTS is to determine whether each inefficient DMU operates in the IRS or DRS region. To illustrate the concept of RTS, this paper estimates the RTS of public universities subject to integer restrictions.

Dataset
The sample used in this paper covers 20 public universities in Malaysia in 2011. Each university is treated as an independent decision making unit (DMU). The data were collected from two sources: 1) the statistical report of the Ministry of Education (MoE) published online (Ministry of Education, 2012), and 2) a study by Irliana et al. (2014), from which some inputs and outputs were retrieved. The model uses three inputs and three outputs. The inputs are: 1) the number of postgraduate students enrolled ($x_1$), 2) the number of undergraduate students enrolled ($x_2$) (Katharaki & Katharakis, 2010; Kuah & Wong, 2011), and 3) the number of academic staff ($x_3$) (Hussain, Ramli, & Khalid, 2016; Taleb, Ramli, & Khalid, 2018). The outputs are: 1) the number of graduated postgraduate students ($y_1$) (Daraio, Bonaccorsi, & Simar, 2015), 2) the number of graduated undergraduate students ($y_2$) (Daraio et al., 2015), and 3) the number of graduates who went on to further studies ($y_3$) (Cordero-Ferrera, Pedraja-Chaparro, & Salinas-Jiménez, 2008; Taleb et al., 2018). Descriptive statistics of the inputs and outputs are reported in Table 1.

Empirical testing and discussion
Based on the three inputs and three outputs, the efficiency measures of the output-oriented integer-valued DEA model under the CRS and VRS technologies are presented in Table 2. According to the three conditions of Zhu and Shen (1995) considered in Section 3, the RTS of 10 out of 20 DMUs is CRS, since each of these DMUs is efficient under both technologies and lies on the efficient frontier (i.e. its efficiency score equals one). These DMUs may therefore have more external and internal development opportunities for their students and staff. The remaining 10 DMUs operate in the IRS and DRS regions because they are inefficient under CRS. An inefficient DMU can be projected onto the efficient frontier using the approach illustrated in Section 4.3.
The results show that five inefficient DMUs are classified in the DRS region, while the remaining five are classified in the IRS region; that is, half of the inefficient DMUs operate under DRS. This implies that their scale efficiency (i.e. the ratio of the technical efficiency scores under the CRS and VRS technologies), presented in the seventh column of Table 2, is below the optimal level. For the DRS universities, any additional inputs to the educational process would yield less than proportional returns. The universities operating in the IRS region have excess capacity, and additional inputs would yield more than proportional returns (Banker et al., 2004). These IRS universities are good at attracting students; they should therefore focus on their inputs in order to increase their educational outcomes and reach the most productive scale size (Banker, 1984; Seiford & Zhu, 1999). Since all the inefficient universities operate in the DRS or IRS regions, and half of all universities are scale inefficient, the VRS technology is preferred for calculating the input and output slacks that can guide the improvement of the inefficient universities (Ahn, Charnes, & Cooper, 1988).

Further analysis
Section 4.2 calculates the TE scores of the efficient and inefficient universities under VRS and CRS and estimates the RTS region and scale efficiency of each university in the sample. Eleven universities are fully efficient under VRS. The inefficient universities still have room to improve their efficiency, as indicated by their integer input and output slacks presented in Table 3.
Table 3 shows that all inefficient universities owe their inefficiency to input slacks (input excesses) and, in some cases, to output slacks (output shortfalls) in graduated postgraduate and undergraduate students. For instance, DMU 4 has excesses in all its inputs and shortfalls in its postgraduate and undergraduate graduates. To improve the efficiency of DMU 4, its three inputs should therefore be decreased by 2244, 1556 and 1063, respectively, while its numbers of graduated postgraduates and undergraduates should be increased by 1 and 563, respectively. These slacks are used to calculate the projected (target) inputs and outputs that would improve the performance of the inefficient universities, a process known as the projection of inputs and outputs (see Cooper et al., 2006, p. 48; Taleb, Ramli, & Khalid, 2019). Note that the output shortfall in graduates pursuing further studies is zero for all the inefficient universities, as shown in the last column of Table 3. This indicates that the inefficient universities perform well on this output, and that all the universities in the examined year supported and encouraged their students to pursue further studies.
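The projection step can be sketched with the standard output-oriented formulas (Cooper et al., 2006): $\hat{x}_{io} = x_{io} - s_{io}^{-*}$ and $\hat{y}_{ro} = \delta_o^{*} y_{ro} + s_{ro}^{+*}$. The Python sketch below applies them to a hypothetical university; the data, $\delta_o^{*}$ and slack values are invented for illustration (they are not taken from Table 3), and `project_output_oriented` is our own helper name.

```python
from typing import List, Tuple


def project_output_oriented(x_o: List[float], y_o: List[float], delta: float,
                            s_minus: List[float], s_plus: List[float]
                            ) -> Tuple[List[float], List[float]]:
    """Targets for DMU o from an output-oriented solution.

    Inputs are reduced by their excesses; outputs are expanded radially by
    delta and then increased by their shortfalls.
    """
    x_hat = [xi - si for xi, si in zip(x_o, s_minus)]
    y_hat = [delta * yr + sr for yr, sr in zip(y_o, s_plus)]
    return x_hat, y_hat


# Hypothetical university (not from Table 3): 3 inputs, 3 outputs.
x_hat, y_hat = project_output_oriented(
    x_o=[1000, 12000, 900], y_o=[400, 2000, 300], delta=1.25,
    s_minus=[200, 1500, 60], s_plus=[2, 150, 0])
print(x_hat)  # [800, 10500, 840]
print(y_hat)  # [502.0, 2650.0, 375.0]
```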

Comparison of the results between the modified model and the existing model
To show how the efficiency scores and the input and output slacks differ between our modified output-oriented integer-valued DEA model and the existing output-oriented model, both models were run under the CRS and VRS technologies. The results are reported in Table 4. Table 4 shows that the efficiency scores from existing model (1) are never greater than those from modified model (3) (see Proposition 1); that is, the efficiency scores are at least as high when the DMUs are evaluated subject to integer requirements. In addition, for both models (1) and (3), every efficiency score under CRS is less than or equal to the corresponding score under VRS. For example, the efficiency scores of DMU10 resulting from existing model (1) and modified model (3) under CRS and VRS are 0.64273, 0.64809, 0.68363 and 0.68563, respectively. Moreover, modified model (3) yields the same numbers of efficient and inefficient DMUs as existing model (1): ten efficient DMUs under CRS and 11 under VRS. The average efficiency scores of all the inefficient DMUs from the existing model under CRS and VRS (0.628318 and 0.704479, respectively) are also lower than those from the modified model (0.677472 and 0.798113). The input and output slacks of all inefficient DMUs resulting from existing model (2) and modified model (4) are shown in Table 5.
The input and output slacks of the inefficient DMUs were obtained from stage II of existing model (2) and modified model (4), in both cases under the VRS technology. Under VRS, a proportional increase in inputs need not produce a proportional increase in outputs. VRS is preferred when the relationship between efficiency and the scale of operations is significant (Avkiran, 2001), so this paper relies on VRS to calculate the input and output slacks of both models, as shown in Table 5. Table 5 shows that the two models yield different slacks. For example, the three input excesses and three output shortfalls of DMU8 obtained from existing model (2) are 354.00290, 2278.96400, 1662.71500, 15.22032, 0 and 0, respectively, whereas the integer-restricted slacks from modified model (4) are 353, 2276, 1662, 16, 3 and 0, respectively. The integer input and output slacks are therefore more appropriate for calculating the input and output targets of higher education institutions, whose enrollments, staff and graduates are counted in whole numbers. Hence, the integer slacks can be used to improve the performance of the inefficient universities and provide their decision makers with new insights into institutional development.
Based on the integer input excesses and integer output shortfalls, the sources of inefficiency for all inefficient DMUs can be determined (Tyagi, Yadav, & Singh, 2009). This paper identifies the inefficiency sources by running stage II of model (4) and then calculating the average integer input excess and integer output shortfall. Fig. 2 depicts the main inefficiency sources of the universities in 2011.
The main inefficiency sources are undergraduate enrollment ($x_2$), academic staff ($x_3$) and postgraduate enrollment ($x_1$). Decision makers at the inefficient universities should therefore focus on these three inputs and remove their excess amounts to achieve efficient status. This result is consistent with Section 4.2, which states that the inefficient universities classified in the IRS region have excess inputs. In contrast, the output of graduates pursuing further studies ($y_3$) does not contribute to inefficiency, since all its slacks are zero; the universities therefore operated effectively with respect to this output in 2011. The outputs of graduated undergraduates ($y_2$) and postgraduates ($y_1$) contribute little to inefficiency.
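The averaging step behind Fig. 2 can be sketched as follows. The Python code below averages each integer slack over the inefficient DMUs and ranks the variables by that average; the two DMUs and their slack values are hypothetical (not the paper's data), and the helper names are our own. Ranking raw averages, as described in the text, mixes variables measured in different units (students versus staff), so the ranking is only indicative.

```python
from statistics import mean
from typing import Dict, List


def average_slacks(slacks: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Average each input excess / output shortfall over the inefficient DMUs."""
    variables = next(iter(slacks.values())).keys()
    return {v: mean(d[v] for d in slacks.values()) for v in variables}


def rank_sources(avg: Dict[str, float]) -> List[str]:
    """Order variables from the largest to the smallest average slack."""
    return sorted(avg, key=avg.get, reverse=True)


# Hypothetical integer slacks for two inefficient DMUs (not the paper's data).
slacks = {
    "DMU_a": {"x1": 300, "x2": 2200, "x3": 1600, "y1": 16, "y2": 3, "y3": 0},
    "DMU_b": {"x1": 500, "x2": 1800, "x3": 1000, "y1": 0, "y2": 40, "y3": 0},
}
avg = average_slacks(slacks)
print(avg)
print(rank_sources(avg))  # ['x2', 'x3', 'x1', 'y2', 'y1', 'y3']
```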
In general, simply rounding input and output slacks to their nearest integer values could place the resulting targets outside the production possibility set. Our output-oriented integer-valued DEA model addresses this issue and provides decision makers with more accurate efficiency measures and targets. The integer restrictions in the modified model also allow the benchmarks of each inefficient DMU to be calculated more precisely. The model can therefore be used effectively to evaluate the performance of institutions whose inputs and outputs must be integer-valued.
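A one-input, one-output CRS example shows how naive rounding can leave the production possibility set. The Python sketch below uses hypothetical data and exact fractions; it only illustrates the rounding problem for an integer-valued output target and is not the modified model itself, which imposes the integer restrictions inside the optimization. The helper names `crs_max_output` and `in_pps` are our own.

```python
import math
from fractions import Fraction
from typing import List, Tuple


def crs_max_output(x: int, data: List[Tuple[int, int]]) -> Fraction:
    """Largest output attainable with input x on a one-input, one-output CRS frontier."""
    best_ratio = max(Fraction(yj, xj) for xj, yj in data)
    return best_ratio * x


def in_pps(x: int, y: int, data: List[Tuple[int, int]]) -> bool:
    """Feasibility check: (x, y) lies in the CRS production possibility set."""
    return y <= crs_max_output(x, data)


# Hypothetical (input, output) observations; the best ratio is 8/5.
data = [(5, 8), (4, 5), (2, 2)]
frontier = crs_max_output(3, data)   # 24/5 = 4.8 for input x = 3
nearest = round(frontier)            # naive rounding to the nearest integer
feasible = math.floor(frontier)      # largest integer output not above the frontier
print(frontier, nearest, in_pps(3, nearest, data))    # 24/5 5 False
print(frontier, feasible, in_pps(3, feasible, data))  # 24/5 4 True
```

Rounding 4.8 to the nearest integer gives an output target of 5, which no observed technology can produce with 3 units of input, whereas the integer target 4 remains attainable.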

Conclusions
Integer-valued DEA models have been used in many real situations where inputs and/or outputs are integer-valued, e.g. the numbers of enrolled students, academic staff, graduates and academic papers published by a higher education institution. However, no previous study has addressed the RTS of an output-oriented integer DEA model. This paper contributes to the literature by estimating the RTS of the output-oriented integer-valued DEA model. To show its applicability, the model was used to measure the efficiency of public universities in Malaysia. The technical efficiency scores under the CRS and VRS technologies, the scale efficiency and the RTS region were calculated and reported. According to the empirical results, 50% and 55% of the universities were efficient under CRS and VRS, respectively.
The technical and scale efficiency results suggest that the public universities in Malaysia are operating at a fairly high level of efficiency relative to each other. Nevertheless, some universities still have room for efficiency improvement. The findings from the output-oriented integer DEA model under both CRS and VRS reveal that 50% of the universities are operating under CRS, and 25% each are operating under IRS and DRS. CRS indicates that the efficient universities cannot increase their efficiency by adjusting the scale of their inputs. DRS indicates that an increase in inputs yields only a less than proportional increase in outputs, whereas IRS indicates that it yields a more than proportional increase in outputs.
To illustrate the importance of incorporating integer restrictions on inputs and outputs into the standard output-oriented DEA model in order to obtain more accurate efficiency measures, the efficiency scores from the existing model and from our modified output-oriented integer-valued DEA model were compared under CRS and VRS. The efficiency scores obtained from the existing model are never greater than those obtained from our model under either technology. The input excesses and output shortfalls of the inefficient universities were also calculated using both models. This information can provide the decision makers of the universities with more precise measures and targets for improving their performance through the projection of inputs and outputs.
This study has two limitations. First, the efficiency measures would be more accurate if the data included other important factors, e.g. the number of international faculty members, papers published in conference proceedings or reputable scientific journals, and registered research patents. Second, the analysis was based on a single year of data, which may not fully reflect the universities' performance. Future work could use panel data to capture the dynamics of public universities in Malaysia.