Investigating photovoltaic solar power output forecasting using machine learning algorithms

Solar power integration in electrical grids is complicated by dependence on volatile weather conditions. To address this issue, continuous research and development is required to determine the best machine learning (ML) algorithm for PV solar power output forecasting. Existing studies have established the superiority of the artificial neural network (ANN) and random forest (RF) algorithms in this field. However, more recent studies have demonstrated promising PV solar power output forecasting performances by the decision tree (DT), extreme gradient boosting (XGB), and long short-term memory (LSTM) algorithms. Therefore, the present study aims to address a research gap in this field by determining the best performer among these five algorithms. A data set from the United States' National Renewable Energy Laboratory (NREL), consisting of weather parameters and solar power output data for a monocrystalline silicon PV module in Cocoa, Florida, was utilized. Comparisons of forecasting scores show that the ANN algorithm is superior, as the ANN16 model produces the best mean absolute error (MAE), root mean squared error (RMSE), and coefficient of determination (R²), with values of 0.4693 W, 0.8816 W, and 0.9988, respectively. It is concluded that ANN is the most reliable and applicable algorithm for PV solar power output forecasting.


Introduction
Solar energy, which is the radiant light and heat originating from the Sun, is a type of renewable energy that can be harnessed to produce solar power. Solar power is clean and sustainable as its utilization does not emit greenhouse gases (GHG) into the atmosphere, and it is continuously available so long as the Sun continues to radiate light and heat onto the Earth. However, a key factor which impacts the feasibility of solar power systems is the volatility involved in photovoltaic (PV) solar power generation. This volatility is caused by changes in weather and meteorological conditions. The steady production of solar power depends heavily on the presence of optimal weather conditions that enable a stable and abundant supply of solar energy to be captured by PV solar modules. These optimal weather conditions are uncontrollable and intermittent, hence causing the solar power output to fluctuate depending on the weather conditions (Das et al., 2018; Kim et al., 2019; Marcos et al., 2011; Nespoli et al., 2019; Seyedmahmoudian et al., 2018; Shivashankar et al., 2016).
To address the issue of uncertainty and intermittency in PV solar power output, scientists and engineers have turned to machine learning (ML). ML is a branch of artificial intelligence (AI) that makes predictions or forecasts by accessing data sets and learning through the analysis of complex trends. Many contemporary real-world problems from different fields have been tackled through ML implementations, as shown by recent case studies and literature reviews. Deep neural networks have been used in studies on pattern recognition within healthcare systems (Shamshirband et al., 2021). The random forest (RF), deep learning-based gated recurrent unit (GRU), and convolutional neural network (CNN) have been utilized in studies on daily reference evapotranspiration estimation within irrigation systems (Ferreira & da Cunha, 2020; Sattari et al., 2020). The intrusion detection tree (IntruDTree) and computational intelligence (CI) have been developed to detect intrusions in studies within the cybersecurity field (Sarker et al., 2020; Shamshirband et al., 2020). The least squares support vector machine (LSSVM) and RF have been employed in studies relating to electrical efficiency evaluation and optimal design of photovoltaic thermal collectors (Ahmadi et al., 2020; Shahsavar et al., 2020). The adaptive neuro-fuzzy inference system (ANFIS) and bagged semiparametric neural network (SNN) have been implemented in studies on the effects of climate change on wind power resources and crop yields (Crane-Droesch, 2018; Nabipour et al., 2020). In the field of PV solar power generation, the idea is to utilize ML and available weather data to forecast the PV solar power output beforehand. This helps reduce or eliminate uncertainty towards the availability or abundance of solar power at a certain time, hence increasing the confidence of consumers and hybrid power system operators in solar power.
By forecasting PV solar power output, the proportion of solar power injection into a hybrid power system can be better modulated (Das et al., 2018; Khandakar et al., 2019; Seyedmahmoudian et al., 2018) and the mechanism to alternate between different types of energy and power in a hybrid power system can be improved (Touati et al., 2017). Many different weather parameters have been shown to be correlated with PV solar power output, with the most frequently utilized weather parameters including the plane-of-array irradiance (POA), relative humidity (RH), and dry bulb temperature (DBT) (AlKandari & Ahmad, 2019; Chen et al., 2020; Das et al., 2018; Dolara et al., 2018; Khandakar et al., 2019; Meng & Song, 2020; Nespoli et al., 2019; Persson et al., 2017; Rana et al., 2016; Seyedmahmoudian et al., 2018; Tato & Brito, 2019; Theocharides et al., 2018; Touati et al., 2017; Van Tai, 2019; Van-Deventer et al., 2019; Wang, Li, et al., 2018). Other weather parameters that have been used are the panel back-surface temperature (BST), atmospheric pressure (ATP), global horizontal irradiance (GHI), diffuse horizontal irradiance (DHI), direct normal irradiance (DNI), and daily accumulated precipitation (DAP). Throughout the years, the artificial neural network (ANN) and the random forest (RF) algorithms have been frequently shown to be the most accurate ML algorithms for PV solar power output forecasting.
Single-algorithm studies on PV solar power output forecasting using either ANN or RF have produced high-accuracy forecasts (Alomari et al., 2018; Dolara et al., 2018; Erduman, 2020; Liu & Sun, 2019; Massaoudi et al., 2021), while many multiple-algorithm studies on the forecasting of PV solar power output have demonstrated ANN and RF to outperform other tested algorithms such as ANFIS, support vector regression (SVR), k-nearest neighbors (kNN), extreme learning machine (ELM), non-linear autoregressive neural network with exogenous inputs (NARXNN), linear regression (LR), multiple linear regression (MLR), elastic net (EN), Gaussian process regression (GPR), adaptive boosting (AdaBoost), and others (Jawaid & Nazirjunejo, 2017; Khandakar et al., 2019; Kim et al., 2019; Meng & Song, 2020; Rana et al., 2016; Rana & Rahman, 2020; Su et al., 2019; Tato & Brito, 2019; Theocharides et al., 2018; Van Tai, 2019).
However, recently there have been new studies published that have documented the ability of other promising algorithms to accurately forecast PV solar power output, namely the decision tree (DT), extreme gradient boosting (XGB) and long short-term memory (LSTM). The study by Rahul et al. (2021) in India forecasted PV solar power output using the DT algorithm with high accuracy and investigated the relation of PV solar power output with several weather parameters. The web application project by Shetty et al. (2021) has shown the DT algorithm to forecast PV solar power output with better accuracy compared to several other algorithms, including the RF algorithm. Meanwhile, the XGB algorithm has produced better PV solar power output forecasting performance than state-of-the-art ML techniques including the RF algorithm in a comparative analysis by Carrera and Kim (2020) in South Korea. Interestingly, in a study by Munawar and Wang (2020) to develop and evaluate a framework for the selection of ML algorithms and data features for PV solar power output forecasting, it was found that the XGB algorithm outperforms both the ANN and RF algorithms. Additionally, the LSTM algorithm has been recently demonstrated to produce high-accuracy PV solar power output forecasts in a univariate forecasting analysis by Konstantinou et al. (2021) and multivariate forecasting analyses by Harrou et al. (2020) and Zhou et al. (2020). Apart from the field of PV solar power output forecasting, DTs, XGBs, and LSTMs have also proven their merit in many other fields. 
DTs have produced good predictive performances in studies relating to traffic flow, construction safety, rock bursts, heart diseases, and extreme precipitation (Cho et al., 2017; Crosby et al., 2016; Maji & Arora, 2019; Pu et al., 2018; Wei et al., 2020); XGBs have shown success in predictions relating to bioactive molecules, wind turbine gearbox monitoring, anode effect, wet deposition, and pipe heat transfer (Babajide Mustapha & Saeed, 2016; Qian et al., 2020; Stojić et al., 2019; Wang et al., 2017; Zhou, Zhang, et al., 2019); while LSTMs have been used to successfully make predictions in fields relating to domain generation algorithms, air pollutant concentration, battery life, supercapacitor life, and air quality (Li et al., 2017; Ma et al., 2020; Woodbridge et al., 2016; Zhang et al., 2018; Zhou, Huang, et al., 2019). Table 1 shows the proposed ML algorithms for PV solar power output forecasting based on existing literature. Given the promise shown by these three algorithms recently, a new research gap has surfaced in which it is currently unclear which among the aforementioned algorithms, consisting of the established algorithms (ANN, RF) and the promising algorithms (DT, XGB, LSTM), is the best algorithm for PV solar power output forecasting in terms of accuracy and applicability. Based on the new evidence from recent research (Carrera & Kim, 2020; Harrou et al., 2020; Konstantinou et al., 2021; Munawar & Wang, 2020; Rahul et al., 2021; Shetty et al., 2021; Zhou et al., 2020), the testable hypothesis for the present study is that the promising algorithms (DT, XGB, LSTM) are better at forecasting PV solar power output compared to the established algorithms (ANN, RF).
In the present research, these five ML algorithms are investigated with regards to the forecasting of PV solar power output given a historical data set on weather and PV solar power output. The forecasting performance of each ML algorithm is evaluated and compared using selected performance measures.
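As a concrete illustration, the three scores reported in the abstract (MAE, RMSE, and R²) can be computed directly from observed and forecast values. The sketch below uses NumPy and made-up power values in watts; the function name and data are illustrative, not taken from the study.

```python
import numpy as np

def evaluate_forecast(y_true, y_pred):
    """Compute MAE, RMSE, and R^2 for a forecast against observed output."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred
    mae = np.mean(np.abs(errors))                       # mean absolute error
    rmse = np.sqrt(np.mean(errors ** 2))                # root mean squared error
    ss_res = np.sum(errors ** 2)                        # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)      # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                          # coefficient of determination
    return mae, rmse, r2

# Toy example with hypothetical power outputs (watts)
observed = [120.0, 95.0, 60.0, 0.0]
forecast = [118.0, 97.0, 58.0, 1.0]
mae, rmse, r2 = evaluate_forecast(observed, forecast)
```

Lower MAE and RMSE values and an R² closer to 1 indicate a better forecast, which is how the models in this study are ranked.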
The application and utilization of ML algorithms in a certain field must be continuously trialed and researched, as better-suited ML algorithms may be discovered for a certain task as time passes and technology evolves. Hence, the main motivation of this study is to discover the currently most suitable and reliable ML algorithm which can be easily implemented by solar and hybrid grid operators to forecast PV solar power output. By determining the best ML algorithm for this task and obtaining PV solar power output forecasts of higher accuracy, solar and hybrid grids may effectively minimize their losses in profit due to the power imbalances that occur as a result of the volatility and intermittency of solar power generation (Harrou et al., 2020; Zhou et al., 2020). A good PV solar power output forecasting system will greatly aid in maintaining a cost-effective grid and balancing the supply and demand of power, as stakeholders will be able to effectively decide on common issues that come with the operation of solar power in power grids, such as the supply of backup power using conventional fossil fuel resources, peak load shifting, power storage, and power supply continuity (Carrera & Kim, 2020; Konstantinou et al., 2021; Laopaiboon et al., 2019; Mohammed & Aung, 2016; Persson et al., 2017). Therefore, the primary contribution of this research to the current body of knowledge is the investigation to establish the most accurate and applicable ML algorithm among the established algorithms (ANN, RF) and the promising algorithms (DT, XGB, LSTM) for the purpose of PV solar power output forecasting, given the reports on the promising ML algorithms in recent publications (Carrera & Kim, 2020; Harrou et al., 2020; Konstantinou et al., 2021; Munawar & Wang, 2020; Rahul et al., 2021; Shetty et al., 2021; Zhou et al., 2020).
This research is important to investigate whether there currently exist ML algorithms that perform better than the ANN and RF algorithms in forecasting PV solar power output. Depending on the findings, this research sets out either to further support the claim that the ANN and RF algorithms are the best algorithms to forecast PV solar power output, or to solidify evidence that there are possibly better ML algorithms for this particular forecasting task. In addition, this research also studies the effect of different weather input parameter combinations on the forecasting performance of each ML algorithm, and attempts to determine the best weather input parameter combination and the crucial weather parameters required for the forecasting of PV solar power output, given the available data set. Additional factors affecting forecasting performance, namely the output parameter forecast horizons and the utilization of additional performance-enhancing techniques, are comparatively analyzed to better understand the applicability of each ML algorithm in forecasting PV solar power output. The performance scores of the best model in the present study are also compared with those from existing studies to understand the effectiveness of the methods used in the present study and the feasibility of the model in real-world applications. The findings from the present study are suitable for broad international interest, as the materials and methods utilized are reproducible regardless of geographic location, and the most reliable ML algorithm in the present study, along with its required input parameters, can be adopted by any solar or hybrid grid operator around the globe. By replicating the input parameter circumstances, test set-up, and hyperparameter tuning, other parties may reproduce the PV solar power output forecasting performance shown in the present study.

Materials and methods
This section explains the processes the present study employs for the forecasting of PV solar power output in Cocoa, Florida. Important information regarding the case study, forecasting process, ML algorithms, data pre-processing, and evaluation measures is described.

Case study
In this section, information regarding the data used for this study is explained. Then, the background of the location at which the data was collected is described.

Data
The present study utilizes data from a publicly available data set provided by the National Renewable Energy Laboratory (NREL) through a conference paper by Marion et al. (2014). This data set was prepared to be utilized in the validation of forecasting models for the performance of flat-plate PV modules. It comprises weather and PV solar power output data for a period of one year at three climatologically different locations in the United States, namely Cocoa, Florida; Eugene, Oregon; and Golden, Colorado, which represent subtropical, marine west coast, and semi-arid climates, respectively. The weather and PV solar power output data were measured at the three locations for seven different PV module types with different efficiencies. The seven types of PV modules are monocrystalline silicon modules; multi-crystalline silicon modules; microcrystalline silicon modules; copper indium gallium selenide modules; cadmium telluride modules; amorphous silicon tandem and triple-junction modules; and heterojunction with intrinsic thin-layer modules. Forty-three data parameters relating to the weather and solar power output for the seven PV modules in the three aforementioned locations are available in the data set.

Location
From the available locations in the data set provided by the NREL, Cocoa was chosen as the case study location due to its varying weather and climate throughout the year, which helps produce a more comprehensive study, as different weather conditions generate different amounts of PV solar power. Cocoa, located in the Southern United States, experiences a subtropical climate which brings about hot and humid summers, and cold to mild winters. The PV module is located at the Florida Solar Energy Center (FSEC) at coordinates 28.3872°N, 80.7568°W, hence the weather and PV solar power output data were recorded by the FSEC authorities. The surrounding infrastructure and location of the FSEC can be seen in Figures 1 and 2, respectively. From Figure 3, the average global horizontal solar irradiance of Florida, which is an indication of the amount and intensity of sunlight in the area, is determined to be between 5.00 and 5.25 kWh/m²/day, with the highest average global horizontal solar irradiance in the United States, more than 5.75 kWh/m²/day, recorded in California, Arizona, and New Mexico.

Forecasting process
From the seven different PV module types provided in the data set by the NREL, the monocrystalline silicon PV module is selected in this research for the development of ML forecasting models, as it is the PV module type most commonly used in commercial buildings (Jumaat et al., 2019). The monocrystalline silicon PV module in the present study has an efficiency of 13.6% and a module area of 0.642 m². The ML forecasting process employed in the present study is shown in Figure 4.

Machine learning algorithms
Based on the literature review conducted, five ML algorithms were selected for PV solar power output forecasting, which is treated as a regression-based problem. Two established ML algorithms in the PV solar power output forecasting field, namely ANN and RF, and three promising ML algorithms, namely DT, XGB, and LSTM, are chosen for utilization in the present study. A comparison of the PV solar power output forecasting performance of these ML algorithms is carried out. The Python programming language is selected to develop the ML models in the present study due to its English-like syntax, which makes it easy to read and understand, and its extensive library support. The Python programming language has been commonly used in existing studies on PV solar power generation (AlKandari & Ahmad, 2019; Carrera & Kim, 2020; Khandakar et al., 2019; Kim et al., 2019; Li et al., 2016; Meng & Song, 2020; Persson et al., 2017; Rana & Rahman, 2020). The experimental setup for the development of the ML models is shown in Table 2.

Decision tree (DT)
DTs, also known as classification and regression trees (CARTs), are ML algorithms capable of solving both classification and regression-based problems. DTs utilize a structure similar to that of a tree, where each internal node represents a test of an attribute, each leaf node represents a category, and each branch represents an output of a test. This ML algorithm essentially segments predictors into a few groups for interpretation (Carrera & Kim, 2020). In PV solar power output forecasting, a decision tree regression model is developed based on the minimum square error norm, using a method that recursively generates binary trees through the selection of optimum split points and features. For a regression-based decision tree, the function structure does not need to be pre-set, as the structure of the decision tree can be adjusted according to the data set's characteristics, allowing DTs to deal with continuous and discrete variables simultaneously.
The process in a decision tree regressor starts with the regressor model being fitted using each input variable (Mohammed & Aung, 2016). The best split for each individual input variable is then determined by utilizing the mean squared error, with the maximum feature number for each split depending on the total feature number. A notable disadvantage of DTs is that when a decision tree regression structure is too complex, overfitting or trapping in local minimum points may occur (Carrera & Kim, 2020; Wang, Li, et al., 2018). The general structure of DTs can be seen in Figure 5.
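The split-selection process described above is available in common libraries. A minimal sketch using scikit-learn's DecisionTreeRegressor on synthetic weather-like data follows; the feature ranges and coefficients are illustrative assumptions, not values from the NREL data set. Limiting max_depth is one way to guard against the overfitting noted above.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)

# Hypothetical weather inputs: columns = irradiance (W/m^2), temperature (C), humidity (%)
X = rng.uniform([0.0, 10.0, 20.0], [1000.0, 40.0, 100.0], size=(500, 3))
# Synthetic power target: roughly proportional to irradiance, plus noise
y = 0.08 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(0.0, 1.0, 500)

# max_depth caps tree complexity to reduce overfitting risk
model = DecisionTreeRegressor(max_depth=6, random_state=0)
model.fit(X, y)            # recursive binary splitting on mean squared error
pred = model.predict(X[:5])
```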
The hyperparameter tuning for the DT algorithm in this study is shown in Table 3. For the DT algorithm, almost all hyperparameters are kept at their default values as good forecasts are obtained. The training time complexity of the DT algorithm is as follows: Training time complexity = O(nd log(n)) (1) where n is the number of data points and d is the number of dimensions.

Random forest (RF)
An RF is an ensemble method that improves learning performance through a voting system over a set number of decision trees (Meng & Song, 2020; Su et al., 2019; Tato & Brito, 2019), and can be utilized to solve both classification-based and regression-based problems (Tato & Brito, 2019). To solve regression-based problems, RFs essentially construct numerous decision trees during training and produce an output which is the mean prediction of the decision trees (Carrera & Kim, 2020).
RFs are based on the idea that a group of weak learners can combine to become a strong learner (Mohammed & Aung, 2016). In the solar irradiation forecasting and PV solar power output forecasting fields, RFs have been customarily used as their applicability and capability in these fields are proven through existing research (Meng & Song, 2020; Tato & Brito, 2019). This ensemble method is able to reduce overfitting risk through the process of balancing decision trees and is especially useful when handling small samples (Meng & Song, 2020). RFs function by training a large number of simple decision trees, with each decision tree trained on a random subset of the data set (Tato & Brito, 2019). This enables RFs to identify data patterns (Tato & Brito, 2019). The results of each decision tree are then aggregated (Tato & Brito, 2019). As PV solar power output is heavily impacted by weather and meteorological factors including solar irradiance, humidity, and temperature, the data series is typically very noisy. Noisy data affects the performance of an ML model as it reduces the model's ability to generalize (Meng & Song, 2020). RFs are suitable for implementation in PV solar power output forecasting due to their characteristics, namely random feature selection, bootstrap sampling, out-of-bag error estimation, and full-depth decision tree growing (Meng & Song, 2020). RFs reduce noise in data by utilizing random feature selection and bootstrap sampling (Meng & Song, 2020). RF algorithms first extract a few samples using bootstrap sampling after inputting data samples, then select the features of these samples randomly, hence enabling them to handle noisy data better (Meng & Song, 2020).
One of the primary drawbacks of the RF algorithm is the complexity in obtaining an interpretation of causal links between predictors and outputs due to the utilization of multiple decision trees within the RF algorithm, hence this algorithm is useful in situations where the need for high prediction accuracy is prioritized over the need for interpretation (Aria et al., 2021;Wongvibulsin et al., 2019).
The general steps in RF algorithm prediction are preceded by selecting samples through the bootstrap method (Meng & Song, 2020; Su et al., 2019). These selected samples are regarded as training sets. An initial tree is grown in each set, then the best node split of this tree is calculated according to its features. Next, the nodes are split so that the samples are from the same class. Lastly, all the trees are aggregated into a forest, with the RF algorithm's prediction represented by the mean value of the trees' results. RFs employ a few main parameters, specifically the criterion index, number of estimators, and maximum features. Each of these parameters has a specific function in RF prediction processes. The general structure of RFs can be seen in Figure 6.
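The steps above, including the bootstrap sampling and out-of-bag error estimation mentioned earlier, can be sketched with scikit-learn's RandomForestRegressor. The data below is synthetic and the parameter values are illustrative, not the ones tuned in this study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
# Hypothetical weather features with a mostly linear, noisy power target
X = rng.uniform(0.0, 1000.0, size=(400, 3))
y = 0.1 * X[:, 0] + 0.02 * X[:, 1] + rng.normal(0.0, 2.0, 400)

# n_estimators = number of bootstrap-trained trees; predictions are averaged.
# oob_score=True uses the out-of-bag samples for a built-in validation estimate.
forest = RandomForestRegressor(n_estimators=100, oob_score=True, random_state=0)
forest.fit(X, y)
oob_r2 = forest.oob_score_   # R^2 estimated on samples each tree never saw
```

Because each tree only sees a bootstrap sample, the left-out (out-of-bag) rows give a validation score without a separate hold-out set, which complements the train/test split used later.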
The hyperparameter tuning for the RF algorithm in this study is shown in Table 4. For the RF algorithm, almost all hyperparameters are kept at their default values as good forecasts are obtained. The training time complexity of the RF algorithm is as follows: Training time complexity = O(ndk log(n)) (3) where n is the number of data points, d is the number of dimensions, and k is the number of trees.

Extreme gradient boosting (XGB)
The XGB algorithm is a supervised ensemble method based on gradient boosting that is used in many fields of practical application and has frequently outperformed RFs, SVMs, and deep neural networks (Stojić et al., 2019). This boosting ensemble method works by trying to reduce a model's bias as it sequentially trains different models to improve the models produced earlier (Carrera & Kim, 2020). XGB outputs are computed as the leaf weight sum in a decision tree, with the final score obtained by summing up all weights from the decision or CART trees (Qian et al., 2020). XGB models aim to select the splitting threshold for each tree node and then choose optimal weights (Qian et al., 2020). For XGBs, the total loss function can be described as a squared loss plus a regularization term to decrease the complexity of a model (Qian et al., 2020). Similar to RFs, XGBs essentially aggregate weak classifiers, namely decision or CART trees, to generate a strong predictive model (Qian et al., 2020). However, unlike RFs, which split features to reduce loss functions, XGB converts loss functions into a new scoring function to select the most suitable threshold (Qian et al., 2020). This ensemble method can use a variety of data samples for performance enhancement, as long as every CART tree's performance is slightly better than random guessing (Qian et al., 2020). Compared to traditional gradient boosting, XGB utilizes a better regularization technique to handle the problem of overfitting (Carrera & Kim, 2020). A limitation of the XGB algorithm is its inability to work with non-numeric or categorical data by itself, hence needing hybridization or prior data transformation in cases where non-numeric data is present (Dey et al., 2016; Jafarzadeh et al., 2021). However, as the present study only works with numeric data, the performance of the XGB algorithm is not affected.
Additionally, the XGB algorithm also struggles to interpret causal links between predictors and outputs, similar to the RF algorithm (Aria et al., 2021; Wongvibulsin et al., 2019). The general XGB framework is described by Figure 7 (Liu & Liu, 2022).
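The sequential error-correcting behaviour of boosting can be illustrated by hand: each new tree is fitted to the residuals of the current ensemble, and its (shrunken) prediction is added to the running total. The sketch below uses plain scikit-learn decision trees and synthetic data to show this core idea only; dedicated implementations such as the xgboost library's XGBRegressor add the regularized scoring function and optimized split finding described above.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(300, 2))
y = 10.0 * X[:, 0] + np.sin(6.0 * X[:, 1])   # synthetic nonlinear target

learning_rate = 0.1                          # shrinkage applied to each tree
prediction = np.full(len(y), y.mean())       # boosting starts from the mean
trees = []
for _ in range(50):
    residual = y - prediction                # what earlier rounds got wrong
    weak_tree = DecisionTreeRegressor(max_depth=2)
    weak_tree.fit(X, residual)               # each new tree corrects current errors
    prediction += learning_rate * weak_tree.predict(X)
    trees.append(weak_tree)

final_error = np.sqrt(np.mean((y - prediction) ** 2))  # training RMSE
```

Each weak tree only needs to be slightly better than guessing for the ensemble error to shrink round by round, matching the condition cited from Qian et al. (2020).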
The general hyperparameters, booster hyperparameters, and learning task parameters are kept at their default values as good forecasts are obtained. The general hyperparameter tuning, which guides the overall functioning of the XGB algorithm, is shown in Table 5. The training time complexity of the XGB algorithm is as follows: Training time complexity = O(ndk log(n)) (5) where n is the number of data points, d is the number of dimensions, and k is the number of trees.

Artificial neural network (ANN)
ANNs represent heuristic ML algorithms that use adjustable weights to interconnect non-linear elements (Alomari et al., 2018;Theocharides et al., 2018). They are robust algorithms that can be used for the interpretation of real-time sensor data and have been commonly utilized in the PV solar energy field for both stand-alone PV systems and large-scale PV systems (Alomari et al., 2018).
ANNs are based on the biological functions involved in the human brain. ANNs depend on neurons, which are essentially units that process inputs to produce outputs using an activation function (Jawaid & Nazirjunejo, 2017; Theocharides et al., 2018), also known as a transfer function. Neuron inputs may comprise external stimuli or direct outputs from other neurons (Li et al., 2016; Theocharides et al., 2018). Using the transfer function, neurons compute the sum of input weights to produce output (Li et al., 2016). Through training, the weighted sums are adjusted to minimize errors in the training data, using metrics such as the mean squared error (MSE), root mean squared error (RMSE), and summed squared error (SSE) (Li et al., 2016).
The training processes or algorithms commonly used in ANNs to optimize weights include back propagation, the quasi-Newton method, and the Levenberg-Marquardt method (Li et al., 2016). To avoid the issue of overfitting in ANNs, regularization is typically used by controlling the effective complexity of the ANN (Theocharides et al., 2018). ANNs mainly consist of three layers of neuron nodes: the input layer, the hidden layer, and the output layer (Li et al., 2016). The input layer of an ANN receives raw data, which is then processed by the hidden layers and forwarded to the output layer to be delivered as computed information (Alomari et al., 2018).
An advantage of ANNs is that assumptions are not required for the inputs or outputs (Li et al., 2016). The ANN model structure, which includes the number of hidden layers, the number of neurons in each hidden layer, and the learning algorithm, only needs to be defined for the model to work (Li et al., 2016). However, a disadvantage of the ANN algorithm is that it is very reliant on hardware capability and requires high computational power. On top of that, there are no specific rules for modeling or coding the ANN structure and hyperparameters, hence trial-and-error processes are required to achieve an ANN architecture suitable for solving a specific problem (Mijwil, 2018; Poblete et al., 2017; Zor et al., 2017). The ANN general structure can be seen as shown in Figure 8. The hyperparameter tuning for the ANN algorithm in this study is shown in Table 6. All other unmentioned hyperparameters, including the initializer, regularizer, and constraint hyperparameters, are kept at their default values as good forecasts are obtained. The training time complexity of the ANN algorithm, for a network with two hidden layers, is as follows: Training time complexity = O(ne(ij + jk + kl)) where n is the number of data points, e is the number of epochs, i is the number of input layer neurons, j is the number of second layer neurons, k is the number of third layer neurons, and l is the number of output layer neurons.
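A minimal ANN regressor reflecting the structure described above (input layer, one hidden layer, output layer) can be sketched with scikit-learn's MLPRegressor. The data is synthetic, the hidden-layer size of 16 merely echoes the ANN16 naming in the abstract, and the scaling step reflects the general practice of normalizing ANN inputs rather than this study's exact pipeline.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(7)
# Hypothetical weather features: irradiance (W/m^2), temperature (C), humidity (%)
X = rng.uniform([0.0, 10.0, 20.0], [1000.0, 40.0, 100.0], size=(500, 3))
# Synthetic normalized power target driven mainly by irradiance
y = 0.9 * X[:, 0] / 1000.0 + 0.1 * X[:, 1] / 40.0

# ANNs are sensitive to feature scale, so inputs are min-max scaled first
X_scaled = MinMaxScaler().fit_transform(X)

# One hidden layer of 16 neurons; weights are optimized by backpropagation (adam)
ann = MLPRegressor(hidden_layer_sizes=(16,), activation="relu",
                   solver="adam", max_iter=3000, random_state=0)
ann.fit(X_scaled, y)
score = ann.score(X_scaled, y)   # R^2 on the training data
```

The trial-and-error nature noted above shows up here as the choice of hidden-layer size, activation, and solver, which have no fixed rules and are typically tuned per problem.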

Long short-term memory (LSTM)
LSTMs are among the most advanced and specialized recurrent neural networks (RNNs). They have produced excellent results in existing research (AlKandari & Ahmad, 2019; Chen et al., 2020). RNNs are essentially neural networks with recurrently connected neurons, enabling the network to learn from present and past information to produce better predictions or solutions (AlKandari & Ahmad, 2019). The LSTM was proposed as a solution to the issue of the conventional RNN, where useful information was challenging to obtain due to the issues of gradient vanishing and explosion (AlKandari & Ahmad, 2019; Li et al., 2017). LSTMs are different from conventional RNNs, as LSTMs utilize memory blocks that are connected by successive layers (Li et al., 2017). These memory blocks allow LSTMs to selectively memorize useful input training data through a unique structure of three gates. This enables LSTMs to learn nonlinear tasks with multivariate influences (AlKandari & Ahmad, 2019; Chen et al., 2020). This three-gate structure consists of a forgetting gate, an input gate, and an output gate (AlKandari & Ahmad, 2019; Chen et al., 2020; Li et al., 2017). The forgetting gate functions to reset memory blocks when status expiry occurs, the input gate allows the modification of memory cells' states by incoming signals, while the output gate controls the memory cell state from influencing other neurons (Li et al., 2017). By adjusting these three gates, LSTMs have the ability to read, store, and erase data from the memory blocks as needed (AlKandari & Ahmad, 2019), hence making LSTMs effective in handling long time dependencies (Li et al., 2017). With regards to limitations, the LSTM algorithm is also computationally expensive, similar to the ANN algorithm, as it is a deep learning ML algorithm (Belagoune et al., 2021; Pan et al., 2018).
The LSTM algorithm may also require more time to train, depending on the difficulty of the problem to be solved and the utilized LSTM architecture (Choi & Han, 2020). In addition to that, dropout regularization and early call-back mechanisms are needed to minimize overfitting effects, as the LSTM is prone to overfitting (Denkena et al., 2020). The general LSTM structure is shown in Figure 9. The hyperparameter tuning for the LSTM algorithm in this study is shown in Table 7. All other unmentioned hyperparameters, including the initializer, regularizer, and constraint hyperparameters, are kept at their default values as good forecasts are obtained. As LSTMs are local in time and space (Tsironi et al., 2017), the overall computational complexity of an LSTM per time step is $O(w)$, where w is the number of weights.
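The three-gate mechanism described above can be sketched as a single forward step of an LSTM cell in numpy. The weights below are random placeholders and the layer sizes are arbitrary; a trained LSTM would learn these parameters from data.

```python
# Minimal numpy sketch of one LSTM cell step, illustrating the forget,
# input, and output gates. Weights are random placeholders.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    # W, U, b each hold parameters for gates f, i, o and candidate g.
    f = sigmoid(W["f"] @ x + U["f"] @ h_prev + b["f"])  # forget gate: resets memory
    i = sigmoid(W["i"] @ x + U["i"] @ h_prev + b["i"])  # input gate: admits new signal
    o = sigmoid(W["o"] @ x + U["o"] @ h_prev + b["o"])  # output gate: exposes state
    g = np.tanh(W["g"] @ x + U["g"] @ h_prev + b["g"])  # candidate cell update
    c = f * c_prev + i * g                              # new cell state
    h = o * np.tanh(c)                                  # new hidden state
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = {k: rng.normal(size=(n_hid, n_in)) for k in "fiog"}
U = {k: rng.normal(size=(n_hid, n_hid)) for k in "fiog"}
b = {k: np.zeros(n_hid) for k in "fiog"}
h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_hid), np.zeros(n_hid), W, U, b)
print(h.shape, c.shape)
```

Because the hidden state h is the output gate times a tanh of the cell state, each component of h is bounded in magnitude by 1, which is what allows gradients to flow more stably than in a conventional RNN.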

Data pre-processing
This section explains the pre-processing steps performed on the raw data set obtained from the NREL. The data pre-processing steps include the handling of missing data, the handling of negative values, data splitting into a training set and a test set, feature scaling, and selection of input parameters.

Missing data
To deal with missing values within the raw data set in the present study, listwise deletion is employed. This is because the missing data is regarded as missing completely at random (MCAR) according to Kang (2013), as it arises from equipment problems or failure to meet quality assessment thresholds (Marion et al., 2014). Listwise deletion involves the complete removal of the rows of data containing missing parameters. Listwise deletion is the most suitable strategy in this case study as the data set is large and the assumption of MCAR is satisfied (Kang, 2013).
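Listwise deletion can be sketched with pandas, where any row containing a missing value is dropped whole. The column names and values below are illustrative, not taken from the NREL data set.

```python
# Sketch of listwise deletion: any row with a missing value is removed
# whole. Column names are illustrative placeholders.
import numpy as np
import pandas as pd

df = pd.DataFrame({"GHI": [520.1, np.nan, 610.3],
                   "DBT": [24.5, 25.1, np.nan],
                   "P":   [180.2, 175.9, 190.4]})
cleaned = df.dropna()   # listwise deletion: only fully complete rows remain
print(len(df), len(cleaned))
```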

Negative values
Within the data set provided by the NREL, there exist negative readings for several weather parameters, namely the direct normal irradiance (DNI), diffuse horizontal irradiance (DHI), and global horizontal irradiance (GHI). This should not be possible as irradiance cannot be negative. These negative irradiance measurements were most probably caused by slight errors in the application, calibration, or zeroing of the measuring equipment. The irradiance readings should be nil when there is no sunlight, in particular during the night time (Rana & Rahman, 2020). In order to preserve the logic and accuracy of the developed ML models, the rows of data containing negative DNI, DHI, or GHI readings are completely removed.
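The removal of rows with negative irradiance readings can be sketched as a simple mask over the three irradiance columns. The values below are illustrative.

```python
# Sketch: remove rows where any irradiance reading (DNI, DHI, GHI) is
# negative. Column names mirror the parameter abbreviations in the text.
import pandas as pd

df = pd.DataFrame({"DNI": [300.0, -0.2, 450.0],
                   "DHI": [120.0, 110.0, -1.5],
                   "GHI": [400.0, 95.0, 430.0]})
mask = (df[["DNI", "DHI", "GHI"]] >= 0).all(axis=1)   # True only if all three are non-negative
cleaned = df[mask]
print(len(cleaned))
```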

Data partitioning
The monocrystalline silicon PV module studied was deployed by the NREL at the FSEC in Cocoa, Florida, from 21st January 2011 to 3rd March 2012, which accounts for 407 days. The weather and PV solar power output data is provided with a time step of 5 min from approximately 0730 h to 1730 h every day. In total, the provided data accounts for 38,989 rows. After cleaning up the data set by removing the aforementioned missing and negative values, the remaining data totals 37,131 rows. The cleaned-up data set is partitioned to form a training set and a testing set. The purpose of the training set is to develop the ability of the ML models to recognize data patterns and make forecasts, while the testing set is required for observation and evaluation of the ability of the ML models to make forecasts based on their training. Based on the study by Kannangara et al. (2018), the 80:20 ratio for training and testing, respectively, is deemed to be the optimal data partitioning ratio, as results in that particular study did not improve after increasing the training set ratio past 80%. Therefore, with reference to the study by Kannangara et al. (2018) and existing studies on PV solar power output forecasting by Alomari et al. (2018) and Khandakar et al. (2019), the data for the present study is partitioned so that about 80% and 20% of the total data is allocated for training and testing, respectively. Following this ratio, the data from 21st January 2011 to 2nd December 2011 is used for training while the data from 5th December 2011 to 3rd March 2012 is used for testing.
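A chronological 80:20 split over the cleaned data set can be sketched as follows. Because the data form a time series, rows are not shuffled: the first 80% (earlier dates) train the model and the final 20% (later dates) test it.

```python
# Sketch of a chronological 80:20 train/test split, without shuffling.
import numpy as np

n_rows = 37131                      # cleaned data set size from the text
split = int(0.8 * n_rows)           # boundary index between train and test
train_idx = np.arange(0, split)     # earlier dates -> training
test_idx = np.arange(split, n_rows) # later dates -> testing
print(len(train_idx), len(test_idx))
```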

Feature scaling
Feature scaling is a data pre-processing step that normalizes or standardizes the range of input parameters within a data set. This step is generally performed when the ML algorithm to be used is sensitive to the scale of input parameter data, as is the case with deep learning algorithms. Therefore, feature scaling is performed in the present study when utilizing the deep learning algorithms, which are ANN and LSTM. Feature scaling the input parameter data before feeding them to the ANN and LSTM algorithms ensures that data is weighted accurately for effective, accurate, and fast training. On the other hand, DT, RF, and XGB do not require feature scaling as they are tree-based methods that are not sensitive to the scale of input parameter data.

Table 8. Input parameters utilized in existing studies.

Study | Input parameters
Meng and Song (2020) | Total solar irradiance, atmospheric pressure, wind speed, dry bulb temperature, relative humidity, PM2.5 concentration
Khandakar et al. (2019) | Dry bulb temperature, relative humidity, back-surface temperature, panel of array irradiance, dust accumulation, wind speed
Van Tai (2019) | Panel of array irradiance, back-surface temperature, wind speed
AlKandari and Ahmad (2019) | Panel of array irradiance, dry bulb temperature, relative humidity, daily accumulated precipitation
Nespoli et al. (2019) | Dry bulb temperature, global horizontal irradiance, panel of array irradiance, wind speed, wind direction, atmospheric pressure, daily accumulated precipitation, cloud cover, cloud type
Tato and Brito (2019) | Global horizontal irradiance, direct normal irradiance
Theocharides et al. (2018) | Incident global irradiance, relative humidity, wind direction, wind speed, dry bulb temperature
Dolara et al. (2018) | Dry bulb temperature, global horizontal irradiance, panel of array irradiance, wind speed, wind direction, atmospheric pressure, daily accumulated precipitation, cloud cover, cloud type
Wang, Li, et al. (2018) | Panel of array irradiance, dry bulb temperature
Persson et al. (2017) | Dry bulb temperature, cloud cover, relative humidity, daily accumulated precipitation, wind speed
Touati et al. (2017) | Panel of array irradiance, accumulated dust, relative humidity, dry bulb temperature, back-surface temperature
Rana et al. (2016) | Panel of array irradiance, dry bulb temperature, relative humidity, wind speed
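As a concrete illustration of feature scaling, min-max normalization rescales each input column to the [0, 1] range. The sketch below is a generic example on synthetic data, not the exact scaler used in the study; note that in practice the scaler is fit on the training set only and the same transform is then applied to the test set, to avoid information leakage.

```python
# Sketch of min-max normalization for deep learning inputs.
import numpy as np

def minmax_fit(x_train):
    # Learn per-column minimum and maximum from the training set only.
    return x_train.min(axis=0), x_train.max(axis=0)

def minmax_apply(x, lo, hi):
    # Rescale each column into [0, 1] using the training-set bounds.
    return (x - lo) / (hi - lo)

rng = np.random.default_rng(0)
x_train = rng.uniform(0, 1000, size=(100, 3))   # stand-in irradiance-scale inputs
lo, hi = minmax_fit(x_train)
scaled = minmax_apply(x_train, lo, hi)
print(scaled.min(), scaled.max())
```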

Selection of input parameters
The selection of input parameters in the present study is performed with consideration to existing studies. The input parameters are chosen through a review of the parameters that have been established and confirmed by existing literature to influence the generation of PV solar power. Based on Table 8, it can be found that several input parameters have been utilized frequently in existing studies, namely the panel of array irradiance, the relative humidity, and the dry bulb temperature. The frequency of utilization of these input parameters highlights their importance in the forecasting of PV solar power output. Other input parameters that have been used, but less frequently, include the daily accumulated precipitation, PV panel back-surface temperature, global horizontal irradiance, atmospheric pressure, diffuse horizontal irradiance, and direct normal irradiance. The data set supplied by the NREL (Marion et al., 2014) comprises 43 data parameters relating to weather and PV solar power output. The main objective in the present study is to use the available weather data to forecast the PV solar power generation, which is defined in the present study as the instantaneous maximum solar power output, denoted as P. With consideration to the input parameters used in existing studies as shown in Table 8, the data of nine weather parameters are selected and extracted from the available data set for use in the present study, as shown in Figure 10. A statistical analysis of the data parameters used is shown in Table 9.
The Pearson correlation coefficient, utilized in the research by Jumin et al. (2020), is used in the present study to determine the weather parameters that have the highest correlation with the output parameter, which is the PV solar power output, P. The Pearson correlation coefficient is essentially the ratio of the covariance of two variables to the product of those two variables' standard deviations. This coefficient, denoted by $r_{xy}$, is defined as follows:

$r_{xy} = \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}$

where $\bar{x}$ and $\bar{y}$ are the respective data means; $x_i$ and $y_i$ are individual respective data points; and n is the sample size.
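The formula above can be verified numerically against numpy's built-in implementation on synthetic, strongly correlated stand-in data:

```python
# Direct implementation of the Pearson correlation coefficient, checked
# against numpy's built-in corrcoef.
import numpy as np

def pearson_r(x, y):
    xm, ym = x - x.mean(), y - y.mean()
    # covariance term over the product of the two standard-deviation terms
    return (xm * ym).sum() / np.sqrt((xm ** 2).sum() * (ym ** 2).sum())

rng = np.random.default_rng(0)
x = rng.uniform(size=200)
y = 2.0 * x + rng.normal(0, 0.1, 200)   # strongly correlated stand-in data
r = pearson_r(x, y)
assert abs(r - np.corrcoef(x, y)[0, 1]) < 1e-12   # matches the library value
print(round(r, 3))
```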
Based on the Pearson correlation matrix constructed in Figure 11, it can be found that PAI has a near-perfect correlation with P. Input parameters BST, DNI, and GHI have a strong positive correlation with P, while RH has a moderate negative correlation with P. Therefore, these input parameters may have strong predictive power for the forecasting of PV solar power output. DBT, ATP, DAP, and DHI are shown to have weak correlations with P.

(Figure 11. Pearson's correlation matrix based on available data parameters.)
A sensitivity analysis is utilized to determine the best input parameter combination for the forecasting of PV solar power output, P. The sensitivity analysis is performed by designing different cases of input parameter combinations, as demonstrated in studies by Abobakr Yahya et al. (2019), Sattari et al. (2020), and Zhou et al. (2020). This is crucial to establish the most effective input parameter combination for accurate PV solar power output forecasting. In the present study, 20 different cases were designed. Cases 1-10 were designed to investigate the effect of omitting each input weather parameter on the PV solar power output forecasting accuracy, which also helps in understanding their individual importance in the forecasting results. Cases 11-20 were designed by considering the Pearson's correlation coefficient between PV solar power output and input weather parameters; and the input parameter combinations that have been successfully employed in existing studies, as shown in Table 10.
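The case-based sensitivity analysis can be sketched as a loop over input-parameter combinations: each case omits one input, the model is refit, and test error is recorded. Linear regression stands in here for the study's algorithms, and the three columns and cases are illustrative, not the 20 cases of Table 10.

```python
# Hedged sketch of the case-based sensitivity analysis: each case drops
# a different input column and records the resulting test MAE.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
cols = ["PAI", "BST", "RH"]                       # illustrative subset of inputs
X = pd.DataFrame(rng.uniform(size=(300, 3)), columns=cols)
y = 5.0 * X["PAI"] + 0.5 * X["BST"] + rng.normal(0, 0.05, 300)

results = {}
for dropped in cols:                              # one "case" per omitted input
    use = [c for c in cols if c != dropped]
    model = LinearRegression().fit(X[use][:240], y[:240])   # chronological 80:20
    results[dropped] = mean_absolute_error(y[240:], model.predict(X[use][240:]))
worst = max(results, key=results.get)
print(worst)   # the omission that degrades the forecast the most
```

On this synthetic data the dominant predictor is the stand-in PAI column, so omitting it produces the largest error, mirroring how the designed cases expose each parameter's individual importance.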

Evaluation measures
For evaluation, two types of analyses are performed which are a performance analysis and an uncertainty analysis. Performance analyses are conducted on all of the ML models, to evaluate their accuracy and to decide the best performing model. After that, an uncertainty analysis is performed on the best performing ML model in order to determine the capability of the model in handling uncertainties or variability of input.

Performance analysis
The performance analysis measures utilized are the mean absolute error (MAE), mean absolute percentage error (MAPE), root mean squared error (RMSE), coefficient of determination (R2), and ranking mean (RM), which are selected with reference to existing research (Ahmed et al., 2021; Carrera & Kim, 2020; Khandakar et al., 2019; Meng & Song, 2020; Theocharides et al., 2018). MAE, MAPE, and RMSE have been shown to be the most popular performance analysis measures used in ML prediction research according to a study by Botchkarev (2019). The selected performance measures are described and explained in the following sections.

Mean absolute error (MAE).
The MAE measures the average magnitude of the forecast errors, and a lower MAE is generally desired. The equation that defines the MAE is as follows:

$\mathrm{MAE} = \dfrac{1}{n}\sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$

where $y_i$ is the real value, $\hat{y}_i$ is the predicted value, and n is the sample size.

Mean absolute percentage error (MAPE).
MAPE is a measure which determines the accuracy of forecasts from a forecasting model, hence it is often deemed the fairest and most important precision evaluator (Meng & Song, 2020). A lower MAPE is generally desired. The equation that defines the MAPE is as follows:

$\mathrm{MAPE} = \dfrac{1}{n}\sum_{i=1}^{n} \left| \dfrac{y_i - \hat{y}_i}{y_i} \right|$

where $y_i$ is the real value, $\hat{y}_i$ is the predicted value, and n is the sample size.

Root mean squared error (RMSE).
The RMSE represents a common accuracy evaluator (Meng & Song, 2020) that is an especially good indicator of large errors. Therefore, a lower RMSE is generally desired. The RMSE is measured in units of Watts (W) in the present study. The equation for determining the RMSE is as follows:

$\mathrm{RMSE} = \sqrt{\dfrac{1}{n}\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}$

where $y_i$ is the real value, $\hat{y}_i$ is the predicted value, and n is the sample size.

Coefficient of determination (R2).
The R2 computes the correlation between real and predicted values. It essentially indicates the prediction model's performance in comparison with the naive mean model. Values of R2 always lie between 0 and 1, with a value nearer to 1 indicating a higher correlation between real and predicted values. The equation for calculating R2 is:

$R^2 = 1 - \dfrac{\sum_{i=1}^{n}\left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n}\left( y_i - \bar{y} \right)^2}$

where $y_i$ is the real value, $\hat{y}_i$ is the predicted value, $\bar{y}$ is the mean of the real values, and n is the sample size.

Ranking mean (RM).
The RM is a metric to determine the best overall model in a comparison of multiple performance analysis measures (Ahmed et al., 2021). In the present study, there are four performance analysis measures selected which include MAE, MAPE, RMSE, and R 2 . The developed models are ranked in a numerically ascending sequence from the best to the worst values based on each of the aforementioned performance analysis measures. Then, the RM of a model is obtained by calculating the mean of the rankings, with the model with the lowest ranking established as the best overall performing model and the model with the highest ranking deemed as the poorest overall performing model.
The RM is generally calculated as:

$\mathrm{RM} = \dfrac{1}{n}\sum_{i=1}^{n} R_i$

where $R_i$ is the model's ranking based on the i-th performance analysis measure, and n is the number of performance analysis measures used, which is 4 in the present study.
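Assuming the standard definitions of the four measures given above, the scoring and ranking-mean procedure can be sketched as follows; the two toy models and their data are illustrative.

```python
# Sketch of the four performance measures and the ranking mean (RM).
import numpy as np

def mae(y, yhat):  return np.mean(np.abs(y - yhat))
def mape(y, yhat): return np.mean(np.abs((y - yhat) / y))
def rmse(y, yhat): return np.sqrt(np.mean((y - yhat) ** 2))
def r2(y, yhat):   return 1.0 - np.sum((y - yhat) ** 2) / np.sum((y - np.mean(y)) ** 2)

def ranking_mean(scores):
    # scores: {model: (MAE, MAPE, RMSE, R2)}. Lower is better for the
    # first three measures; higher is better for R2.
    models = list(scores)
    ranks = {m: [] for m in models}
    for i in range(4):
        reverse = (i == 3)                        # R2 ranks in descending order
        order = sorted(models, key=lambda m: scores[m][i], reverse=reverse)
        for rank, m in enumerate(order, start=1):
            ranks[m].append(rank)
    return {m: float(np.mean(r)) for m, r in ranks.items()}

y = np.array([100.0, 150.0, 200.0])               # toy actual outputs
good = np.array([101.0, 149.0, 201.0])            # close forecasts
bad = np.array([110.0, 140.0, 215.0])             # poorer forecasts
scores = {"good": (mae(y, good), mape(y, good), rmse(y, good), r2(y, good)),
          "bad":  (mae(y, bad),  mape(y, bad),  rmse(y, bad),  r2(y, bad))}
rm = ranking_mean(scores)
print(rm)
```

The model that ranks first on every measure obtains the lowest RM and is deemed the best overall performer.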

Uncertainty analysis
The uncertainty analysis measures used are the P-factor and the d-factor, which are utilized in existing studies (Jumin et al., 2020; Mehan et al., 2017; Noori et al., 2010; Singh et al., 2014; Yaseen et al., 2019; Zhao et al., 2018). These measures indicate the goodness of fit and the degree to which the forecasting models are able to account for uncertainties. These two uncertainty analysis measures are described and explained in the following sections.

P-factor.
The P-factor is specified as the percentage of data bracketed by the 95% prediction uncertainty (95PPU). The 95PPU is calculated at the 2.5% and 97.5% percentiles of the cumulative distribution of the predicted variable. Theoretically, the P-factor ranges between 0% and 100%, and the best P-factor value is 100%, which indicates that all of the predicted values fall within the 95PPU brackets. The P-factor typically depends on the quality of the data set. A data set of good quality should result in 80% to 100% of the predicted values being within the 95PPU, while a data set of poor quality should still produce at least 50% of predicted values within the 95PPU. The average distance of variable x between the upper and lower 95PPU bounds, $\bar{d}_x$, is defined through the following equation:

$\bar{d}_x = \dfrac{1}{n}\sum_{i=1}^{n} \left( X_U - X_L \right)_i$

where n is the sample size, $X_U$ is the upper percentile value of the 95PPU, and $X_L$ is the lower percentile value of the 95PPU.

d-factor.
The d-factor is the ratio of the average 95PPU band thickness of a predicted data set to the standard deviation of that predicted data set. Desirable d-factor values are typically less than 1; however, the d-factor may range between 0 and ∞. The equation utilized to compute the d-factor is as follows:

$d\text{-factor} = \dfrac{\bar{d}_x}{\sigma_x}$

where $\bar{d}_x$ is the average distance of variable x between the upper and lower 95PPU bounds, and $\sigma_x$ is the standard deviation of variable x.
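The P-factor and d-factor computations can be sketched on a synthetic prediction ensemble, with the 95PPU bounds taken at the 2.5% and 97.5% percentiles as described above. All quantities here are stand-ins, not the study's actual forecasts.

```python
# Sketch of P-factor and d-factor on a synthetic prediction ensemble.
import numpy as np

rng = np.random.default_rng(0)
n, n_ensemble = 500, 200
center = rng.uniform(50, 250, n)                        # stand-in model output
ensemble = center + rng.normal(0, 5, (n_ensemble, n))   # stand-in prediction ensemble
observed = center + rng.normal(0, 5, n)                 # stand-in observations

x_l = np.percentile(ensemble, 2.5, axis=0)              # lower 95PPU bound
x_u = np.percentile(ensemble, 97.5, axis=0)             # upper 95PPU bound

p_factor = np.mean((observed >= x_l) & (observed <= x_u))  # share bracketed by 95PPU
d_bar = np.mean(x_u - x_l)                                 # average band thickness
d_factor = d_bar / observed.std()                          # thickness vs. data variability
print(round(p_factor, 2), round(d_factor, 3))
```

A P-factor near 1 with a d-factor well below 1 indicates predictions that bracket the observations without resorting to an excessively wide uncertainty band.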

Results and discussion
This section presents and discusses the performances of the developed ML models. The best models developed by each ML algorithm are then compared and analysed. Table 11 shows the performance scores of the 20 models tested using the DT algorithm. It can be seen that DT10 produces the best MAE, MAPE, RMSE, and R2, with scores of 0.6183, 0.0388, 1.0722, and 0.9983, respectively. It can also be observed that all the DT models, with the exception of DT2 and DT19, have relatively similar performances. DT2 and DT19 produce relatively poor performances compared to the other DT models. The similarity shared between these two models is the exclusion of PAI as an input parameter. This highlights the importance of PAI as an input parameter to obtain high-accuracy PV solar power output forecasts using the DT algorithm. By utilizing the RM metric, it is established that the best model produced using the DT algorithm is DT10. This model ranks first in all the performance analysis measures, hence obtaining the best (lowest) RM among all the tested DT models, with a value of 1. DT10 uses all available input parameters except ATP. As the performance of the DT10 model is enhanced when ATP is not included as an input parameter, it can be deduced that ATP acts as noise that hinders the capability and performance of the DT algorithm in making forecasts of PV solar power output. Figure 12 shows that the plotted points based on forecasted and actual PV solar power outputs sit close to the regressed diagonal line, as the R2 of the DT10 model is high with a score of 0.9983. This indicates that the DT10 model has a strong goodness of fit, and the majority of the predicted PV solar power outputs are relatively close to the actual PV solar power outputs.

Decision tree (DT) models' performance
To further interpret the DT10 results, several dates within the test set, which spans 5th December 2011 to 3rd March 2012, are selected for comparison of the forecasted and actual PV solar power outputs. Figure 13 indicates that the PV solar power outputs forecasted by the DT10 model are very similar to the actual PV solar power outputs. This further demonstrates the accuracy of the DT10 model in forecasting PV solar power output. Table 12 shows the performance scores of the 20 models tested using the RF algorithm. It can be seen that RF10 produces the best MAE and MAPE with scores of 0.4852 and 0.0314, respectively, and RF16 produces the best RMSE with a score of 0.9198. The highest R2 produced is 0.9987, which is exhibited by RF10, RF16, and several of the other RF models. It can also be observed that all the RF models, with the exception of RF2 and RF19, have relatively similar performances. RF2 and RF19 produce relatively poor performances compared to the other RF models. The similarity shared between these two models is the exclusion of PAI as an input parameter. This highlights the importance of PAI as input for accurate forecasts of PV solar power output using the RF algorithm. By utilizing the RM metric, it is established that the best model produced using the RF algorithm is RF10. This model ranks first in the MAE, MAPE, and R2 performance measures, and second in the RMSE performance measure. RF16 ranks first in the RMSE performance measure. Therefore, this gives RF10 the best (lowest) RM with a value of 1.25. RF10 uses all available input parameters except ATP. As the performance of the RF10 model is enhanced when ATP is not included as an input parameter, it can be deduced that ATP acts as noise that hinders the capability and performance of the RF algorithm in making forecasts.
Figure 14 shows that the plotted points based on forecasted and actual PV solar power outputs sit close to the regressed diagonal line, as the R2 of the RF10 model is high with a score of 0.9987. This indicates that the RF10 model has a strong goodness of fit, and the majority of the forecasted PV solar power outputs are relatively close to the actual PV solar power outputs.

Random forest (RF) models' performance
To further interpret the RF10 results, several dates within the test set, which spans 5th December 2011 to 3rd March 2012, are selected for comparison of the forecasted and actual PV solar power outputs. Figure 15 indicates that the PV solar power outputs forecasted by the RF10 model are very similar to the actual PV solar power outputs. This further demonstrates the accuracy of the RF10 model in forecasting PV solar power output. Table 13 shows the performance scores of the 20 models tested using the XGB algorithm. It can be seen that XGB13 produces the best MAE and MAPE with scores of 0.6208 and 0.0392, respectively, and XGB7 produces the best RMSE and R2 with scores of 1.0667 and 0.9983, respectively. It can also be observed that all the XGB models, with the exception of XGB2 and XGB19, have relatively similar performances. XGB2 and XGB19 produce relatively poor performances compared to the other XGB models. The similarity shared between these two models is the exclusion of PAI as an input parameter. This highlights the importance of PAI as an input parameter for accurate forecasts of PV solar power output using the XGB algorithm. By utilizing the RM metric, it can be found that XGB7 and XGB13 are tied with an RM of 1.75. XGB7 is ranked first based on the RMSE and R2 performance measures, while XGB13 is ranked first based on the MAE and MAPE performance measures. However, the best model produced using the XGB algorithm is determined to be XGB7 due to the relatively larger difference in RMSE compared to the differences in the other performance measures. XGB7 uses all available input parameters except DHI. As the performance of the XGB7 model is enhanced when DHI is not included as an input parameter, it can be deduced that DHI acts as noise that hinders the capability and performance of the XGB algorithm in making forecasts of PV solar power output.
Figure 16 shows that the plotted points based on forecasted and actual PV solar power outputs sit close to the regressed diagonal line, as the R2 of the XGB7 model is high with a score of 0.9983. This indicates that the XGB7 model has a strong goodness of fit, and the majority of the forecasted PV solar power outputs are relatively close to the actual PV solar power outputs.

Extreme gradient boosting (XGB) models' performance
To further interpret the XGB7 results, several dates within the test set, which spans 5th December 2011 to 3rd March 2012, are selected for comparison of the forecasted and actual PV solar power outputs. Figure 17 indicates that the PV solar power outputs forecasted by the XGB7 model are very similar to the actual PV solar power outputs. This further demonstrates the accuracy of the XGB7 model in forecasting PV solar power output. Table 14 shows the performance scores of the 20 models tested using the ANN algorithm. It can be seen that ANN4 produces the best MAE with a score of 0.4654, and ANN16 produces the best MAPE and RMSE with scores of 0.0338 and 0.8816, respectively. ANN4, ANN16, and two of the other ANN models produce the highest R2 with a score of 0.9988. It can also be observed that all the ANN models, with the exception of ANN2 and ANN19, have relatively similar performances. ANN2 and ANN19 produce relatively poor performances compared to the other ANN models. The similarity shared between these two models is the exclusion of PAI as an input parameter. This highlights the importance of PAI as an input parameter for accurate forecasts of PV solar power output using the ANN algorithm. By utilizing the RM metric, it is established that the best model produced using the ANN algorithm is ANN16. This model ranks first in the MAPE, RMSE, and R2 performance measures, and second in the MAE performance measure. ANN4 ranks first in the MAE performance measure. Therefore, this gives ANN16 the best (lowest) RM with a value of 1.25. ANN16 uses all available input parameters except DAP and ATP. As the performance of the ANN16 model is enhanced when DAP and ATP are not included as input parameters, it can be deduced that DAP and ATP act as noise that hinders the capability and performance of the ANN algorithm in making forecasts of PV solar power output.
Figure 18 shows that the plotted points based on forecasted and actual PV solar power outputs sit close to the regressed diagonal line, as the R2 of the ANN16 model is high with a score of 0.9988. This indicates that the ANN16 model has a strong goodness of fit, and the majority of the forecasted PV solar power outputs are relatively close to the actual PV solar power outputs.

Artificial neural network (ANN) models' performance
To further interpret the ANN16 results, several dates within the test set, which spans 5th December 2011 to 3rd March 2012, are selected for comparison of the forecasted and actual PV solar power outputs. Figure 19 indicates that the PV solar power outputs forecasted by the ANN16 model are very similar to the actual PV solar power outputs. This further demonstrates the accuracy of the ANN16 model in forecasting PV solar power output. Table 15 shows the performance scores of the 20 models tested using the LSTM algorithm. It can be seen that LSTM1 produces the best MAE with a score of 5.4499, and LSTM8 produces the best MAPE, RMSE, and R2 with scores of 0.3115, 10.0652, and 0.8474, respectively. It can also be observed that all the LSTM models have relatively similar performances, as the performance scores indicate that there are no particularly outstanding scores. This means that the addition or removal of any of the available input parameters does not significantly impact the performance of the LSTM algorithm in forecasting PV solar power output. By utilizing the RM metric, it is established that the best model produced using the LSTM algorithm is LSTM8. This model ranks first in the MAPE, RMSE, and R2 performance measures, and second in the MAE performance measure. LSTM1 ranks first in the MAE performance measure. Therefore, this gives LSTM8 the best (lowest) RM with a value of 1.25. LSTM8 uses all available input parameters except DBT. As the performance of the LSTM8 model is enhanced when DBT is not included as an input parameter, it can be deduced that DBT acts as noise that hinders the capability and performance of the LSTM algorithm in forecasting PV solar power output.

Long short-term memory (LSTM) models' performance
Note: best algorithm, model, and scores in bold. Figure 20 shows that the majority of the plotted points based on forecasted and actual PV solar power outputs do not sit accurately on the regressed diagonal line, as the R2 of the LSTM8 model is moderate with a score of 0.8474. This indicates that the majority of forecasted values differ significantly from the actual values.
To further interpret the LSTM8 results, several dates within the test set which are between 5th December 2011 and 3rd March 2012, are selected for comparison of the forecasted and actual PV solar power outputs. Figure 21 indicates that the LSTM8 model forecasts trends correctly, however, it is not able to forecast values accurately.

Overall evaluation of the best models
Based on the performance score comparison of the best models for each ML algorithm, as can be seen in Table 16, ANN outperforms the other ML algorithms in terms of the MAE, RMSE, and R2 scores, while RF has the superior MAPE score. The RF algorithm produces the best MAPE score as it appears better able to minimize the average percent deviation between the forecasted and actual solar power output in the present study. However, the ANN algorithm comes in a very close second in terms of the best MAPE score. It can also be observed that there is a very small difference of about 0.0001 between the best R2 scores of the ANN and RF algorithms, which makes these two algorithms very comparable, as they outperform the other tested algorithms in this regard.
Other than that, there is a noticeable contrast in the performance scores between the LSTM algorithm and the other algorithms. Based on Table 16, it can be deduced that the performance of the LSTM algorithm is significantly poorer than that of the other algorithms tested in the present study. Using the RM metric, ANN is determined to be the best performing algorithm for the forecasting of PV solar power output. The ANN16 model ranks first in the MAE, RMSE, and R2 performance measures, and second in the MAPE performance measure, giving it the best (lowest) RM with a value of 1.25. The RF algorithm is accordingly the second-best performing algorithm for the forecasting of PV solar power output in the present study. Based on the RM of the best models produced by the other algorithms, it is concluded that DT is the third-best performing algorithm in the present study, followed by the XGB algorithm as the fourth best. The LSTM algorithm is found to be the poorest performing algorithm for PV solar power output forecasting in the present study, as the LSTM8 model obtains a ranking of 5 for all performance measures, hence giving it the poorest (highest) RM with a value of 5.
A bar chart depicting the ranking mean for the best models based on each ML algorithm can be observed in Figure 22. Additionally, a spider chart visualizing the rankings of the best models based on each ML algorithm with regards to each performance measure is shown in Figure 23. Figure 24 reveals that the ANN, RF, DT, and XGB algorithms are able to forecast the PV solar power output with good accuracy on the selected dates within the test set between 5th December 2011 and 3rd March 2012. On the other hand, the LSTM algorithm is not able to forecast the PV solar power outputs as accurately as the other tested ML algorithms, but it is still able to correctly forecast the increasing or decreasing trends of the PV solar power output.

Uncertainty analysis of the best model
As aforementioned, an uncertainty analysis is performed on the best model developed in the present study in order to evaluate the capability of the model in handling uncertainties. Therefore, an uncertainty analysis is performed on the ANN16 model using the P-factor and d-factor metrics. It is found that the P-factor, which is the percentage of predictions within the 95PPU range, is 95.49%, and the d-factor is 0.00049, as can be seen in Table 17. As the P-factor is between 80% and 100% while the d-factor is less than 1, it can be determined that the ANN16 model has an acceptable goodness of fit and is able to accommodate uncertainties sufficiently (Jumin et al., 2020; Singh et al., 2014).

Comparison of ANN16 performance scores with other studies
The present study's performance scores are compared with performance scores from existing comparable literature, in which the instantaneous solar power output is forecasted. The MAPE and R2 performance indicators are selected for comparison as they are consistent in units across the available literature. The comparison of performance scores is shown in Table 18. It can be observed that the MAPE score of the ANN algorithm in the present study is the best compared to the existing comparable literature, while the R2 score is among the highest. This highlights the effectiveness of the methods used to develop the ANN16 model in the present study. Therefore, this model is feasible in real-world applications and may be adopted to obtain accurate solar power output forecasts.

Input parameter analysis
The findings in the present study highlight the fact that different algorithms utilize different input parameter combinations to produce the best PV solar power output forecasting models, as shown in Table 19. The input parameters omitted also differ between the algorithms, hence indicating that different input parameters act as noise depending on the utilized algorithm. It is evident that no single input combination simultaneously produces the best PV solar power output forecasting performance for all algorithms, as different algorithms work differently. Due to this, it is clear that extensive testing, as has been performed in the present study, is required to determine the best input parameter combination for each algorithm and to differentiate between key input parameters and noisy input parameters for each algorithm. Five input parameters, namely PAI, GHI, DNI, BST, and RH, were found to have been utilized in all the best models, as shown in Table 19. This underlines the importance of these five input parameters in obtaining good PV solar power output forecasts. Therefore, these five input parameters can generally be considered in developing PV solar power output forecasting models in real-world applications using any ML algorithm. An additional point that can be noted is that the ANN algorithm requires fewer input parameters to produce its best PV solar power output forecasting performance, compared to the other tested algorithms.

Analysis of additional factors impacting forecasting performances
The performance scores in Table 16 show that ANN and RF are the two best performing algorithms in the present study. From Table 20, it can be seen that although the output parameters are similar, in that all studies forecast the solar power output, the forecast horizon differs between studies; some horizons span minutes, others hours or days. In addition, some studies forecast the total solar power output, while others forecast the instantaneous solar power output. These factors greatly affect the accuracy of solar power output forecasting (Konstantinou et al., 2021). With regard to the DT algorithm, Table 20 shows a similarity among the solar power output forecasting studies that use it: DT excels when the forecast target is the daily total solar power output, especially in the study by Shetty et al. (2021), where it performs better than the established RF algorithm. In the present study, however, the instantaneous solar power output is forecasted at 5-minute intervals, and the DT algorithm produces inferior forecasting performances compared to the established algorithms, as shown in Table 16. Comparing the findings of the studies in Table 20 indicates that the instantaneous solar power output is much more volatile than the daily total solar power output. It is therefore inferred that the DT algorithm is more suitable for forecasting less volatile parameters, but not highly volatile parameters such as the instantaneous solar power output. The established algorithms (ANN, RF) perform better in forecasting the instantaneous solar power output, as shown in the present study.
The XGB algorithm has been shown to produce better solar power output forecasting performances than both established algorithms (ANN, RF) in studies by Carrera and Kim (2020) and Munawar and Wang (2020). Based on Table 20, the similarity between these two studies is that additional advanced methods were applied, namely grid search, subset selection, feature importance, and principal component analysis (PCA), whereas the present study did not employ any of these methods. It is therefore inferred that the XGB algorithm can outperform the established algorithms (ANN, RF), but only when additional advanced techniques are utilized. Deploying these techniques requires a higher level of expertise, more computational power, and more time, which cannot always be provided in real-world scenarios.
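Of the advanced methods named above, grid search is the most mechanical and can be sketched as follows. The hyperparameter names in the usage example are illustrative XGB-style settings, not the grids actually used in the cited studies, and `evaluate` again stands in for training and scoring a model.

```python
from itertools import product

def grid_search(param_grid, evaluate):
    """Try every hyperparameter combination in the grid and return
    the settings with the lowest validation error."""
    names = sorted(param_grid)
    best_params, best_error = None, float("inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        # `evaluate` would fit a model with `params` and
        # return its validation error (e.g. RMSE).
        error = evaluate(params)
        if error < best_error:
            best_params, best_error = params, error
    return best_params, best_error

# Illustrative grid over two assumed XGB hyperparameters:
# grid = {"max_depth": [2, 4, 6], "learning_rate": [0.05, 0.1, 0.2]}
```

The cost grows multiplicatively with each added hyperparameter, which illustrates why the extra computational power and time noted above are required.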
Recent studies demonstrating the ability of the LSTM algorithm to forecast solar power output (Harrou et al., 2020; Konstantinou et al., 2021; Zhou et al., 2020) show that the LSTM algorithm is less accurate when there are sharp fluctuations or non-linear changes in the actual solar power output, regardless of the parameters and methods used, as shown in Table 20. The sharp fluctuations are typically caused by ramp events, in which the solar power output suddenly drops due to cloud cover and ramps up again as soon as the cloud cover passes (Mohammed & Aung, 2016). The reduced accuracy of the LSTM algorithm at sharp fluctuations is further emphasized in the present study, as shown in Figure 24. At these fluctuations, the LSTM algorithm still correctly forecasts the trends, but at a lagged rate, which reduces forecast accuracy. The LSTM algorithm makes forecasts based on its training process and on a number of past observations predetermined by a selected time step number. In the present study, extensive trial-and-error testing found the optimal time step number to be 120, which accounts for weather observations spanning one and a half days. Therefore, when the LSTM algorithm forecasts the PV solar power output for the current day, it takes into account the weather observations made over the past one and a half days, during which the weather may have differed from the current day. The lags possibly occur because the LSTM algorithm needs a certain number of time steps before its forecasts are driven by new weather observations from the current day rather than heavily influenced by observations from the past one and a half days.
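The windowing behaviour described above, in which each forecast is conditioned on a fixed number of past observations, can be sketched as follows. `make_sequences` is a hypothetical helper, not the study's code; it only illustrates how a time step number such as 120 ties every forecast to the preceding window of observations, which is the mechanism behind the lag discussed above.

```python
def make_sequences(features, targets, time_steps=120):
    """Turn a chronological series into (window, next-value) training
    pairs: each sample holds the previous `time_steps` observations,
    and the model learns to forecast the value that follows them."""
    X, y = [], []
    for i in range(len(features) - time_steps):
        X.append(features[i : i + time_steps])  # past window
        y.append(targets[i + time_steps])       # value to forecast
    return X, y
```

Because every target is paired with the full preceding window, a change in current conditions only gradually displaces the older observations inside the window, consistent with the lagged forecasts observed in Figure 24.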
The LSTM algorithm is mainly effective and accurate in time series problems that depend heavily on time and exhibit a clear time-based pattern. Although solar power output forecasting can be described as a form of time series forecasting, in which the solar power output can typically be expected to be low at the start or end of a day and high at mid-day, the output is extremely volatile depending on weather changes. It can therefore be deduced that solar power output is more weather-dependent than time-dependent, whereas the LSTM algorithm heavily weights time in making forecasts. For these reasons, the LSTM algorithm produces the poorest forecasting performance compared to the other tested algorithms (DT, RF, XGB, ANN), which depend solely on input weather parameters and have access to all data points within the training set (Gers et al., 2001). In summary, LSTM is the least suitable algorithm for scenarios where solar power forecasting accuracy is of utmost importance and the required input parameters are abundantly available. A considerable advantage of the LSTM algorithm, however, is that reliable forecasts may still be obtained when additional input parameters are unavailable or limited, as shown in the studies by Harrou et al. (2020) and Konstantinou et al. (2021), in which the solar power output was forecasted univariately using the LSTM algorithm.
The emergence of the ANN and RF algorithms as the best performers in the present study supports findings from existing literature (Alomari et al., 2018; Dolara et al., 2018; Erduman, 2020; Jawaid & Nazirjunejo, 2017; Khandakar et al., 2019; Kim et al., 2019; Liu & Sun, 2019; Massaoudi et al., 2021; Meng & Song, 2020; Rana et al., 2016; Rana & Rahman, 2020; Su et al., 2019; Tato & Brito, 2019; Theocharides et al., 2018; Van Tai, 2019). The main point that can be taken from Table 20 is that the ANN and RF algorithms are highly adaptable and robust, as they produce the best forecasting performances across multiple studies with different input parameters, data sets, forecast horizons, and additional techniques. Given this adaptability, they are generally the best algorithms for real-world solar power output forecasting scenarios, as they can be applied effectively without extensive beforehand study of the suitability of the available data set, the types of input and output parameters, the forecast horizon, or the additional techniques to be utilized. In the context of the present study, however, the ANN algorithm produces better forecasting performances than the RF algorithm. The ANN algorithm effectively learns and models non-linear and complex relationships, generalizes relationships between initial inputs, and does not set boundaries on input parameters. These key advantages make the ANN algorithm more effective in solar power output forecasting than the other algorithms.
Future studies may focus on improving the predictive performance of the ML algorithms investigated in the present study through hybridization, as existing studies have shown PV solar power output forecasting accuracy to improve by combining different ML algorithms or applying optimizers (AlKandari & Ahmad, 2019; Nespoli et al., 2019; Seyedmahmoudian et al., 2018; Su et al., 2019; VanDeventer et al., 2019). A necessary criterion for selecting an algorithm for hybridization is its performance and accuracy as a single algorithm. As no hybridization was performed in the present study, the best performing single algorithms are clearly the ANN and RF algorithms; these two may therefore represent the best algorithms to be hybridized, optimized, or improved using additional advanced techniques. With successful enhancement of PV solar power output forecasting accuracy, the developed hybrid predictive models may then be deemed practical for implementation and usage by power grid operators and stakeholders.
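One simple form of hybridization consistent with the suggestion above is forecast blending, in which the outputs of two trained models are combined by a weighted average. This is only a sketch under assumed inputs, not a method from the cited studies; in practice, the blending weight would be tuned on validation data, possibly by an optimizer.

```python
def hybrid_forecast(ann_preds, rf_preds, weight=0.5):
    """Blend two models' forecasts point by point. `weight` is the
    share given to the first model's forecast (0.5 = simple average);
    it could be tuned on a validation set, e.g. by an optimizer."""
    return [weight * a + (1 - weight) * r
            for a, r in zip(ann_preds, rf_preds)]
```

A weight of 1.0 recovers the first model's forecast unchanged, so a tuned weight can never do worse than the better single model on the tuning data, which is one motivation for hybridizing the two strongest single algorithms.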

Conclusion
Given a historical data set of weather and PV solar power output, this research set out to identify the best ML algorithm for the forecasting of PV solar power output, given the recent promise of the DT, XGB, and LSTM algorithms shown in new research (Carrera & Kim, 2020; Harrou et al., 2020; Konstantinou et al., 2021; Munawar & Wang, 2020; Rahul et al., 2021; Shetty et al., 2021; Zhou et al., 2020). These promising algorithms (DT, XGB, LSTM) were compared in terms of PV solar power output forecasting performance with the established algorithms in the field, namely ANN and RF. Based on quantitative comparisons, the ANN algorithm produces the best PV solar power output forecasting performance in the present study. The ANN16 model, which utilizes all available weather input parameters except DAP and ATP, produces the best MAE, RMSE, and R² with scores of 0.4693, 0.8816 W, and 0.9988 respectively, and the highest RM with a score of 1.25 when compared with the best models of the other tested algorithms. The RF algorithm obtained the best MAPE, with a score of 0.0314 by the RF10 model. Through comparison, the algorithms can be ranked from best to worst for the forecasting of PV solar power output in the present study as ANN, RF, DT, XGB, LSTM. The MAPE and R² scores of the ANN16 model were compared to those from existing studies, and the ANN16 model produced among the best MAPE and R² scores, highlighting its accuracy and feasibility for real-world applications.
The ANN and RF algorithms are generally the most adaptable and reliable algorithms for the task of PV solar power output forecasting, as they can be utilized effectively to produce the best forecasts without extensive beforehand study of the suitability of the forecasting scenario. Future research may therefore focus on improving and hybridizing the ANN or RF algorithms for PV solar power output forecasting. In addition, it was inferred that the DT algorithm is more suitable for forecasting less volatile parameters, such as the daily total PV solar power output, than highly volatile parameters, such as the instantaneous PV solar power output. The XGB algorithm is able to produce better PV solar power output forecasting performances than the established algorithms (ANN, RF), but requires additional advanced techniques that demand a higher level of expertise, more computational power, and more time. This research also found that different algorithms utilize different weather input parameter combinations to produce their best PV solar power output forecasting models. However, five weather input parameters, namely PAI, GHI, DNI, BST, and RH, were utilized by all the best models in the present study, which highlights the importance of these five parameters in forecasting PV solar power output.
To conclude, the present study addresses a research gap in which the best performing algorithm for PV solar power output forecasting among the DT, RF, XGB, ANN, and LSTM algorithms was previously unknown. Through high-accuracy forecasting of PV solar power output using the ANN algorithm, and the ANN16 model specifically, it is hoped that solar or hybrid grid stakeholders may be aided in minimizing losses due to volatility while maintaining a cost-effective grid that balances the supply and demand of solar power. This research supports existing studies suggesting the ANN algorithm as the best algorithm for PV solar power output forecasting (Alomari et al., 2018; Dolara et al., 2018; Erduman, 2020; Jawaid & Nazirjunejo, 2017; Khandakar et al., 2019; Rana et al., 2016; Su et al., 2019; Theocharides et al., 2018; Van Tai, 2019).