Analysis of the energy service in non-interconnected zones of Colombia using business intelligence

Abstract This work evaluates the provision of energy services in non-interconnected zones (NIZ) between 2018 and 2019 using open data from the Colombian government and a business intelligence methodology. The analysis follows the ETL (Extract, Transform, Load) approach. It focuses on the variables obtained from the National Monitoring Centre (CNM), which monitors the operation of these zones through a telemetry system that provides monthly information on location variables and energy variables, such as active and reactive energy, maximum power, maximum power per day, and hours of service delivery. We concluded that in 2018–2019 energy consumption in the NIZ remained stable, with a variation of 0.19%. On the other hand, the NIZ include 56% of Colombia’s states, and energy consumption is concentrated in 6 of the 93 municipalities, which account for 80% of national consumption. The challenges in these areas are associated not only with the quality and provision of energy services but also with the environment: most of these areas obtain their energy supply from fossil fuels even though they are protected areas such as ecological reserves, the biosphere, or the jungle.


PUBLIC INTEREST STATEMENT
This work aims to evaluate the provision of energy services in non-interconnected zones (NIZ) between 2018 and 2019 using open data from the Colombian government and a business intelligence approach. The NIZ include 56% of Colombia's states. Their challenges are associated not only with the quality and provision of energy services but also with the environment: most of these areas obtain their energy supply from fossil fuels even though they are protected areas such as ecological reserves, the biosphere or the jungle.

Introduction
Data analysis plays an essential role in generating knowledge, obtaining patterns, and predictions important for strategy formulation. Business intelligence (BI) is a methodology to get information from an organisation or process systematically to allow stakeholders (mainly strategic and tactical levels) to understand the organisation's dynamics from the data to make informed decisions.
BI encompasses a wide variety of software applications and processes that enable organisations to, among other things, collect data from various sources, evaluate it, and prepare it for analysis, develop and run queries, create reports, dashboards, and data visualisations to answer specific questions for making and implementing business decisions (TechTarget, n.d.).
BI makes it possible to optimise the management of the company's data and information, follow up the fulfillment of medium- and long-term goals, and improve response and decision-making capacity to obtain higher efficiency (ORACLE, n.d.).
Within the BI architecture, the interaction between its components is important. Brannon (2010) describes four parts of BI, explained as follows: a) System source: define the data to be obtained, identify the types of data available and characterise the data; b) Data acquisition: a process of extracting data into a single repository; c) Storage: where the information is stored after data cleaning processes; d) Reporting: tools that allow the analysis of information through reports, dashboards, and dynamic queries supported by statistical or predictive analysis.
The Non-Interconnected Zones (NIZ) are the areas of Colombia that do not have access to the National Interconnected System of energy (SIN): municipalities, townships, localities and off-grid villages that are isolated from the SIN due to geographical and environmental conditions. About 3.04% of the population of Colombia lives in these zones, which comprise about 66% of the national territory, and the energy needed by their inhabitants must be generated in the same zone.
The companies providing public electricity service in the NIZ can carry out, in an integrated manner, the activities of generation, distribution, and commercialisation; the formulas that determine the costs of these activities are established by the Energy and Gas Regulation Commission (CREG). Electricity in the NIZ is provided mainly by diesel generation plants, solar panels, and small hydroelectric plants, depending on each zone's conditions and generation potential.
This paper presents a business intelligence analysis of the current state of service provision in the Colombian non-interconnected zones, using information from the National Monitoring System available on the governmental open data portal. The methodology used for data analysis is the ETL approach (Extract, Transform, Load), and the results are shown for each of the following stages: data characterisation, findings of the data cleaning process, selection of variables of interest, and dashboard design using the Power BI tool.
The discussion analyses the results regarding the location of the NIZ, energy consumption at the national and regional levels, and hours with no service. Finally, the conclusions highlight the importance of open data quality in facilitating its use by researchers and the challenges regarding energy access in non-interconnected areas.

Business intelligence
Business Intelligence (BI), according to Tdwi (2005), is the methodology that comprises processes, tools, and technologies to turn data into information and information into knowledge, in order to improve understanding of the dynamics of the organisation and support decision-making. It relies on back-end technologies, such as data warehousing, and on front-end processes to generate queries, reports, analyses and tools to display information, as shown in Figure 1.
Business intelligence solutions generate competitive and strategic advantages for organisations by increasing profits, improving efficiency, optimising processes, saving time and costs, and impacting decision-making. Information systems based on emerging technologies have high efficiency and capacity to capture, store and process data. BI tools make it easy for the user to create queries, analyse data using techniques such as data mining and statistical analysis, as well as forecasting (Bhatiasevi & Naglis, 2020).
The amount of data is increasing; therefore, functionality that makes the analysis and visualisation of data easier, through flexible and interactive reports that help to understand the information and its impact on the business, is very relevant. Organisations need agile access to the information underlying these systems, and the ability to analyse data in real time for strategic decisions is one of the potentials of business intelligence (Turan & Ugur, 2018). BI provides integrated data that management draws upon to improve decision-making scenarios and deliver valuable information to organisations' tactical and operational levels.
We consider relevant references for Business Intelligence and its application in the energy sector, where two trends can be identified. (a) Oriented to strategic aspects of the sector: Muntean et al. (2021) designed a process to develop information artifacts, which are the essence of a business intelligence & analytics system, to monitor the SDG 7 about SDG 13 indicators; Schinkinger (2020) explores how data analytics can be performed in real time and help energy suppliers in decision making, sales, and strategy development by turning data into information and gaining new insights from it; Zia et al. (2019) describe outputs which may include the system feasibility, greenhouse gas emissions, cumulative financial costs, and natural resource use; Harison (2012) discusses, based on an analytical framework, the possible technical, organisational, and personal factors that affect the failure or the partial or full success of BI system implementations; and Argotte et al. (2009) present a comprehensive survey of BI applications for the Energy Markets (EM) to show trends and useful methods for tackling everyday EM challenges.
(b) Oriented to operational aspects: Pinheiro et al. (2020) apply BI and data visualisation tools to analyse the improvement of energy efficiency in residential buildings in Portugal; Lin et al. (2020) use a data-driven methodology that leverages individual energy consumption, available customer demographic data, third-party data, and innovative open-source data sets; Flieder (2012) focuses on transparency of energy consumption through a targeted analysis that makes use of business intelligence; and Gawin and Marcinkowski (2017) provide a foundation for holistic analyses, optimisation, and forecasting of electrical power consumption within facility management for retail facility networks.
Business intelligence platforms are tools to assist in the analysis and visualisation of data. While the platforms share many functionalities, they differ in others. Each platform has strengths and weaknesses, advantages and disadvantages, so choosing one over another is complex.
For the implementation of a BI model, the characteristics that the software must have for the visualisation of results must be taken into account. Ali et al. (2016) present some characteristics on which to compare different tools used to perform BI, and Table 1 compares the most relevant software tools.
Regarding the challenges, Turan and Ugur (2018) explain that "companies expect to obtain competitive advantages by adopting BI applications, but these benefits can vary significantly depending on the individuals within the company who are the end-users. While the decision to adopt BI applications is typically made at the organizational level, the ultimate success of the BI program and the effective use of BI applications is determined by the individual users and, in particular, by individual factors". Hasan et al. (2016) identified some challenges to implementing a Business Intelligence solution, such as the definition of business objectives, data management, limited funding, training, and user acceptance; since BI know-how is still developing, the acceptance of the new technology becomes a significant barrier in the adoption process.
Two objectives of applying BI are consistency and transformation; the organizations adopting BI for data consistency use a comprehensive data collection strategy, while organizations adopting BI for transformation use a problem-driven data collection strategy (Ramakrishnan et al., 2011).

Open data
Open Data generally refers to information that is accessible to everyone, machine-readable, available online at zero cost, and with no limits on reuse and redistribution. Advances in information technologies and their growing adoption by public administrations through e-government and now intelligent government strategies have allowed a large amount of data from a wide variety of sources to be collected, processed, disseminated, and preserved. As mentioned by Naser and Concha (2012), "All data produced by public administrations, are public data and only the data available online in open, usable, reusable formats and under open licenses will be really public." While the concept of open data is directly related to open government, it is important to emphasise that open data is seen as an essential enabler of open government, contributing to its characteristics of transparency, collaboration, and participation.
The concept of Open Government Data (OGD) is a working philosophy to empower citizens and grant them access and license to use data generated by public entities to use, store, redistribute and integrate with other data sources (Solar et al., 2012). This is promoted under the premise of open government to promote democratisation, motivate citizen participation, and generate  (Attard et al., 2015), OGD is a subset of open data, and it is the data generated and related to government that is open and accessible to all in general (Kučera et al., 2013) to be used by citizens to add value. OGD can include budget information, spending, population, census, geographic information, public services, and national statistics.

Methodology
The BI process includes phases such as: identification of information needs, data acquisition, storage and analysis of information. BI manages raw data to be analysed according to established criteria and processed through human analysis in order to generate information that adds value to the organisation (Bouaoula et al., 2019).
The methodological process assumes the ETL (Extract, Transform, Load) approach, as presented in Figure 2.
Adapted from https://www.analytics.cl/analytics/las-ventajas-del-data-warehouse/
• Extraction: corresponds to the obtaining of data from the various sources (web, CRM, databases, sensors) to be stored in a database.
• Transform: it is the stage where data from different sources are evaluated, cleaned, and transformed according to the requirements of the organisation. The data warehousing is defined, the data from different datasets are normalised, and the entity-relationship model of the databases to be analysed is established.
• Loading: once the data is cleaned and transformed, it is loaded into the data warehouse, where it will be stored as new dataset updates are generated.
• Reporting: the visualisations of the variables of interest for the interested parties are designed.

Figure 2. ETL methodology.
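The four stages listed above can be illustrated with a minimal sketch. It is not the pipeline used in this study (which was built in Power BI); it assumes a hypothetical monthly CSV report with illustrative column names and uses an in-memory SQLite database as a stand-in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical monthly report; column names and values are illustrative only.
RAW_CSV = """municipality,month,active_energy_kwh,hours_of_service
San Andres,2018-01,1000.5,24
Leticia,2018-01,,20
Mitu,2018-01,300,18
"""


def extract(text):
    """Extraction: read raw rows from a CSV source into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim text, cast numeric fields, keep missing values as None."""
    cleaned = []
    for row in rows:
        energy = row["active_energy_kwh"].strip()
        cleaned.append((
            row["municipality"].strip(),
            row["month"].strip(),
            float(energy) if energy else None,
            float(row["hours_of_service"]),
        ))
    return cleaned


def load(records, conn):
    """Loading: append the cleaned records to the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS service "
        "(municipality TEXT, month TEXT, active_energy_kwh REAL, hours REAL)"
    )
    conn.executemany("INSERT INTO service VALUES (?, ?, ?, ?)", records)
    conn.commit()


def report(conn):
    """Reporting: total active energy per month (SQL SUM skips NULL values)."""
    cur = conn.execute(
        "SELECT month, SUM(active_energy_kwh) FROM service GROUP BY month"
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(report(conn))
```

Keeping each stage as a separate function mirrors the methodology: new monthly files only need to pass through `extract`, `transform`, and `load` again, while the reporting query stays unchanged.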
The following is a description of the activities of the methodology based on the proposal by (Balachandran & Prasad, 2017):

Extracting
To apply Business Intelligence, data were obtained on the status of service provision in the different non-interconnected zones (NIZ) of Colombia that have telemetry, as identified by the Institute for Planning and Promotion of Energy Solutions for Non-Interconnected Zones (IPSE) and monitored by the National Monitoring Centre (CNM). Figure 3 shows the metadata of each dataset and the entity that manages it.
These datasets were collected from the national government's open data system (https://datos.gov.co/), where the monthly report of the CNM is stored, giving an account of each of the municipalities of the NIZ. The variables measured relate to the state of the provision of the electrical energy service. The evolution of the datasets from 2017 to date is shown in Table 2.
At the end of the data extraction process from the open data portal, the raw data recorded were as follows: 31 datasets (CSV files), with an average of 80 records per month and at least 13 fields, corresponding to 2017-2019.

Transforming
Once the information from the open data portal was collected, a single dataset was generated for 2017-2018. When analysing the obtained datasets, some difficulties for data integration were identified. For data integration and quality, the Practical Manual for Improving the Quality of Open Data (Gobierno De España, 2017) and some quality criteria of the Ministry of Information Technology of Colombia (Ministerio de Tecnologías de la Información y las Comunicaciones de Colombia, 2020) were taken as a reference, as shown in Table 3.
Given the above findings, the following activities are carried out for data integration and cleaning:
• Correction of incorrectly reported numerical data.
• Identification of null, empty and missing data.
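The two cleaning activities above can be sketched as follows. This is an illustrative example rather than the procedure applied in Power BI: the field names, the list of tokens treated as null, and the assumption that a lone comma is a decimal separator are all hypothetical.

```python
# Tokens treated as "no data"; this list is an assumption for illustration.
NULL_TOKENS = {"", "null", "nan", "n/a"}


def parse_number(raw):
    """Return a float, or None when the cell is null, empty, or unparseable."""
    if raw is None:
        return None
    text = str(raw).strip()
    if text.lower() in NULL_TOKENS:
        return None
    if "," in text and "." not in text:
        # Assumed convention: a lone comma is a decimal separator ("120,5").
        text = text.replace(",", ".")
    else:
        # Otherwise commas are taken as thousands separators ("1,234.5").
        text = text.replace(",", "")
    try:
        return float(text)
    except ValueError:
        # Incorrectly reported values are flagged as missing, not guessed.
        return None


def count_missing(rows, fields):
    """Count, per field, the cells that are null, empty, or not numeric."""
    counts = {field: 0 for field in fields}
    for row in rows:
        for field in fields:
            if parse_number(row.get(field)) is None:
                counts[field] += 1
    return counts


# Hypothetical records with the kinds of problems described in the text.
rows = [
    {"active_energy": "120,5", "max_power": "30"},
    {"active_energy": "", "max_power": "n/a"},
    {"active_energy": "98.2", "max_power": None},
]
print(count_missing(rows, ["active_energy", "max_power"]))
```

Returning `None` instead of a default value keeps missing and invalid cells visible, so they can be counted and handled explicitly in a later step.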
After each dataset's cleaning process (normalisation of data types, identification of outliers, null, empty, or inconsistent data), the data were stored in a single dataset with 29,003 values (13 fields × 2231 records). There were 1388 empty and zero values, which represent approximately 4.38%, and 167 empty records, representing 7%, distributed as follows: 88 in 2017, 69 in 2018, and 10 in 2019, showing that the completeness of the data has been improving.
The null or empty monthly data are mainly due to failures in the measurement system. For the six states that have the highest percentage of active energy, which represents 90% of the national total, the previous year's data is taken as a reference; the other municipalities are eliminated since they do not significantly affect the trends in the national analysis.
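The use of the previous year's data as a reference can be sketched as below. The sketch assumes that "previous year's data" means the value reported for the same month one year earlier; the function name and the sample values are hypothetical.

```python
def impute_from_previous_year(series):
    """Fill missing monthly values with the same month of the previous year.

    `series` maps (year, month) to a value, or to None when no data exist.
    Keys are processed chronologically so a filled value can be reused
    for the following year. Gaps with no earlier reference stay as None.
    """
    filled = dict(series)
    for year, month in sorted(series):
        if filled[(year, month)] is None:
            filled[(year, month)] = filled.get((year - 1, month))
    return filled


# Hypothetical active-energy readings for one municipality.
readings = {
    (2018, 2): 10.0,
    (2018, 3): 12.0,
    (2019, 2): None,   # measurement failure, reference available
    (2019, 3): 12.5,
    (2019, 4): None,   # measurement failure, no reference in 2018
}
print(impute_from_previous_year(readings))
```

Values that cannot be imputed remain `None`, which matches the decision to exclude series without a usable reference instead of estimating them.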

Load
The Business Intelligence process regarding the provision of energy services in non-interconnected zones was carried out in the Power BI software, establishing the entity-relationship model (Figure 4) and loading the datasets after the cleaning process.
Table 3. Data quality criteria and findings.
Criterion: Finding
Fragmented, duplicate and difficult-to-access data: There was more than one file with a similar name and referring to the same month, making it challenging to identify the right information. Additionally, the datasets have changed in their reported variables, so it was necessary to analyse which fields existed for each dataset and set a pattern in which all datasets were left with the same number of fields.
Obsolete and outdated data: Generally, one month's information is available at least three months after the measurement.
Outdated metadata for updated data: There is no description of the variables in the metadata. There is no uniformity in the data types described in the metadata with the datasets, even with the different updates.
Accuracy: Some of the data entered were incorrect, making it difficult to clean up.
Consistency: Some data types were not consistent with the data entered; for example, dates defined as text or text did not support UTF8. In some datasets, the unit of measurement of the report was not clear, and this was found in the variables of active energy, reactive energy, and power.
Understandability: In this aspect, it has been improved significantly, but it is relevant to emphasise that, in many months, the comprehensibility was very low, and much information had to be deduced.
Availability: The information was always available, although with some delay in availability for measurement.
Traceability: There is always information on updates and creation dates.

We obtained an overview of the relevant tools to perform descriptive data analysis and Business Intelligence (BI) through Table 1. Considering the characteristics of the tools and their implementation costs, disadvantages, and advantages, Microsoft's Power BI tool was selected in its free version. This software is currently leading the Business Intelligence market; it is an easy-to-learn tool with a powerful language of data analysis expressions, useful to get the most out of the information.
Once the information was organized and consistent in the software, the variables of interest were selected:

Visualisation
The objectives for the visualization were defined, and the dashboard was designed as shown in

Identification of NIZ
Regarding the results obtained from the data analysis, the Non-Interconnected Zones of Colombia (Figure 6) represent 56% of the national territory (18 of 32 departments) and 9% of the municipalities (98 of 1102), concentrated mainly in the Pacific Coast region (Chocó, Cauca and Nariño), the Amazon region (Caquetá, Guainía, Guaviare, Putumayo, Vaupés and Vichada) and, in the Caribbean, the department of La Guajira. All these zones are border areas (mainly with Venezuela, Brazil and Ecuador) and biodiverse territories located far from the center of the country; they face problems due to conflict situations and lag behind on the country's poverty and economic development indicators.
According to information from DANE (National Administrative Department of Statistics), between 2016 and 2018, 1.1 million people entered multidimensional poverty, with Guainía (65.0%), Vaupés (59.4%), Vichada (55.0%), La Guajira (51.4%) and Chocó (45.1%) topping the list. Concerning monetary poverty, the poorest regions of the country are Chocó, where 61.1% of the population is classified as poor according to this indicator, followed by La Guajira (53.7%) and Cauca (50.5%); these are the regions with the largest population facing obstacles in accessing basic needs and quality of life elements (Becerra, 2019), and among these needs is access to energy.
Regarding the characteristics of the territory, it should be noted that the department of San Andrés and Providencia is part of the Seaflower Biosphere Reserve, the largest in the world. Biosphere reserves are areas of terrestrial or coastal/marine ecosystems, or a combination thereof, recognized internationally under UNESCO's Man and the Biosphere (MAB) program (MinAmbiente et al, n.d.). On the other hand, the municipalities of Leticia, Puerto Carreño, and Mitú are part of the Amazon rainforest, which supplies moisture to all of South America, contributes to the stabilization of the global climate and has the highest biodiversity in the world (Gentry, 1986). Approximately 6.3% of Chocó is protected by ecological reserves and national parks. In other words, these areas of great importance for climate change are at high risk due to the current conditions of electric power generation, which increase the probability of accidents that could endanger the delicate ecosystem of the NIZ territories, with possible impacts at a global level.
Additionally, research by the foundations Ideas for Peace, Insight Crime, and Peace and Reconciliation has collected data from the authorities to identify the areas most affected by the conflict, which include Non-Interconnected Zones such as Nariño, Cauca, Chocó, Vichada, Guainía, Caquetá, Putumayo and La Guajira (Redacción, 2018).

Energy consumption
Regarding energy consumption at the national level in the NIZ (Figure 7), according to the national monitoring system, 80% of the consumption is concentrated in 6 of the 18 departments, and within them mainly in 6 of the 98 municipalities, as follows: San Andrés accounts for 54% of national consumption, followed at a considerable distance by Leticia with 12%, Puerto Carreño with 7%, and Mitú and Providencia with 4% each; the remaining 33% is distributed among the remaining 92 municipalities.
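The concentration of consumption can be examined with a cumulative-share calculation. The sketch below uses only the five municipal shares quoted above; the function names are illustrative and the calculation is not the Power BI measure used for the dashboard.

```python
# Shares of national NIZ consumption (%), as quoted in the text.
SHARES = {
    "San Andrés": 54,
    "Leticia": 12,
    "Puerto Carreño": 7,
    "Mitú": 4,
    "Providencia": 4,
}


def cumulative_shares(shares):
    """Return (name, cumulative share) pairs, largest consumer first."""
    ordered = sorted(shares.items(), key=lambda item: item[1], reverse=True)
    running = 0.0
    result = []
    for name, share in ordered:
        running += share
        result.append((name, running))
    return result


def municipalities_to_reach(shares, threshold):
    """Number of top consumers needed to reach `threshold` percent, or None."""
    for position, (_, running) in enumerate(cumulative_shares(shares), start=1):
        if running >= threshold:
            return position
    return None


for name, running in cumulative_shares(SHARES):
    print(f"{name}: {running:.0f}%")
```

Sorting before accumulating shows how quickly a small group of municipalities reaches a given share of consumption: with these figures, the first two entries already add up to 66%.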
It should be noted that, in most of these municipalities, this energy demand is supplied using fossil fuels (diesel generation plants). For example, a study by the Inter-American Development Bank (BID & De, 2016) estimated that 15 million gallons of diesel are required per year to operate the 18 generation units distributed around San Andres and Providencia, using one million gallons. This situation increases the damage to environmental conditions.
The national consumption of the NIZ with telemetry was estimated at 369.15 MW for 2018 and 371.04 MW for 2019, which represents a variation of 0.5%, as shown in Figures 8 and 9; consumption is therefore similar in both years. According to the data, the days of maximum demand in the week are Thursday and Saturday, mainly after 7:00 pm. The states of San Andres and Providencia had a consumption of 211.98 MW, Amazonas of 46.96 MW, and Vichada of 35.23 MW, which together represent the highest energy consumption of the country (90%). This consumption corresponds to their capital cities, which have a larger number of inhabitants and are the only economic, commercial, and health centers in these areas. San Andres has an average monthly consumption of 16.6 MW, Leticia of 3.78 MW, and Puerto Carreño of 2.19 MW (Figures 9 and 10).
We can observe a decrease in energy demand in February of both 2018 and 2019; in general, the trend at the national level is a drop in active energy in this month, as shown in Figure 10. This is particularly the case for San Andres and Amazonas, which represent 66% of the demand and are highly dependent on tourism: according to tourism statistics, the months with the highest occupancy and tourist visits are December, January and July, and those with the lowest are February and May. These months show a similar behaviour in the 2 years analysed.
On the other hand, given their biodiversity, these territories are tourist attractions of national recognition. For example, the islands of San Andres and Providencia are one of the leading destinations of Colombia, which exceeded one million visitors during 2018 and 2019. According to the Hotel and Tourism Association of Colombia-COTELCO, hotel occupancy in 2019 was 57%, 1.5% more than in 2018, and San Andres led the indicators of hotel occupancy in 2019 with 71.6% (MinICT; MinICT & De Industria, 2020). Similarly, tourism in Leticia has increased significantly, especially in the last 6 years, reaching 90,000 visitors per year. Regarding the growth projections for this sector, the Amazon has been positioned as a tourism alternative for locals and foreigners, and the accommodation capacity of Leticia has grown by 20% since 2017 (Portafolio, 2018). This is reflected in these two municipalities representing 66% of national demand for the Non-Interconnected Zones of Colombia (Figure 10).

Hours without service
Regarding the hours without service nationwide, a total of 10,414 hours was estimated for 2019 and 8,619 hours for 2018, which represents a 20.8% variation compared to the previous year, as shown in Figure 11. These unserved hours are mainly concentrated in the departments of Chocó (21.7%), Cauca (21.4%), and Nariño (21.3%), which represent 70% of the unserved hours nationwide. These states are located on the Pacific coast of Colombia, and the municipalities with the most hours without service are Tumaco, Timbiquí, López, Unguía, and Bojayá, which account for 47% of the total national daily hours reported. According to Figure 12, the month with the most hours without service was January in 2018 and December in 2019, and the month with the fewest hours without service was May in 2018 and July in 2019. Finally, we can observe that in Colombia there are localities with more than 10 hours per day without access to energy.
These municipalities are isolated areas on the Pacific coast, and the lack of service directly impacts these communities' quality of life and welfare.
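The year-on-year variations reported in this discussion follow from a relative-change calculation. The short sketch below reproduces them from the figures quoted in the text; the function name is illustrative.

```python
def percent_change(previous, current):
    """Relative change from `previous` to `current`, in percent."""
    if previous == 0:
        raise ValueError("previous value must be non-zero")
    return (current - previous) / previous * 100


# Hours without service nationwide, 2018 -> 2019 (figures from the text).
hours_variation = percent_change(8619, 10414)

# National NIZ consumption with telemetry, 2018 -> 2019 (figures from the text).
consumption_variation = percent_change(369.15, 371.04)

print(f"Hours without service: {hours_variation:.1f}%")
print(f"Energy consumption: {consumption_variation:.1f}%")
```

Using the earlier year as the denominator matches the wording "variation compared to the previous year".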

Conclusion
Colombia has significant challenges related to the provision of services in the so-called Non-Interconnected Zones (NIZ), where roughly two million people do not have access to energy and where there are high levels of poverty and violence. Furthermore, these zones are exposed to high environmental risks due to their power generation based on fossil fuels. Additionally, it is evident that, although energy consumption remained stable between 2018 and 2019, San Andres and Providencia, along with Amazonas and Vichada, are the determinants of consumption at the national level of the NIZ. For this reason, it is necessary to join efforts to diversify the energy matrix of these areas and promote renewable energy, given the special conditions of these territories with regard to environmental impact.
The Colombian government and public bodies must advance in the generation of quality open data that allows for its more efficient use in research processes. The challenges associated with accuracy, traceability, and availability are critical to exploiting these data and enabling academia and the public sector to create value from them.
For future work, it is expected to analyse other contextual information about these areas, such as population and some environmental and geographical conditions that are available as open data, in order to generate different types of analysis. It is also expected to perform predictive analysis of variables such as energy consumption, hours without service, and maximum power from the data being analysed, and to create a web platform to facilitate this analysis.