Heuristic grey-box modelling for photovoltaic power systems

ABSTRACT Amongst non-conventional generators, photovoltaics (PVs) are becoming more popular owing to their relatively low costs and convenience. However, the intermittency of the PV generator outputs require accurate forecasting, planning, and optimal management. Existing forecasting methods, which are based on either clear-box or black-box modelling, have room for improvement, especially in the accuracy of capturing the underlying PV characteristics and forecasting of PV yields. This paper explores the use of a priori knowledge of PV systems to heuristically improve their clear-box and black-box models. The paper then further explores the use of heuristic grey-box modelling to identify uncertain parameters of physical principle. Incorporating black boxes to account for such un-modelled uncertainties inherent in a clear box provides the resultant grey box with improved forecasting performance and offers a new approach to practical system modelling. The experimental results on an installed PV system have confirmed the usefulness of this approach.


Introduction
Photovoltaic (PV) is now positioned amongst the top three new power generation sources installed in Europe and this trend is expected to stay (Global Market Outlook for photovoltaics -2017. Power from PV sources provides a number of benefits over other renewable energy sources (RESs). It can be supplied locally to loads, reducing the cost of transmission lines and their associated power losses. Furthermore, advances in technology and large-scale manufacturing have led to the decline in PV costs at a steady rate (Swanson, 2009). Despite a substantial capital setup cost, PV operation and maintenance cost is almost zero (Renewable Energy Technologies: Cost Analysis Series, Photovoltaics, 2012).
Nonetheless, like other RES, PV sources pose a number of integration challenges to the grid. A major criticism of PV sources is the variability in irradiance due to cloud transients and diurnal effects, which has negative impacts on the grid's voltage profile (Stetz, Marten, & Braun, 2013;Woyte, Thong, Belmans, & Nijs, 2006). In particular, intermittency of the PV sources is a major factor affecting the load following capabilities of the grid (Ma et al., 2012). While it is recognized that advance knowledge of the expected yield from PV sources will help tackle these challenges, to allow for proper planning of available generation sources and provide insights into the impact of PVs CONTACT Yun Li yun.li@glasgow.ac.uk on the power network, accurate forecasting of PV yield is not a trivial task. This can be largely attributed to the noncontinuous and nonlinear nature of PV sources, which are due to the interplay of factors such as the variability in sunrise and the amount of sunshine, sudden changes in atmospheric conditions, cloud movements and dust (Cibulka, Brown, Miller, & Meier, 2012). The PV power data can thus be viewed as consisting of two parts: the deterministic and the stochastic. The former represents the mathematical equations of irradiance that depend on location, sun's position, and equations of PV cells, whilst the latter represents the sudden atmospheric changes such as dust, clouds, and wind blow. Various mathematical models, such as first principlebased clear-box models, have been developed to forecast the PV yield. With underlying PV physics, these models provide a structured and meaningful forecast of power outputs and typically require fewer parameters to estimate than other models (Forsell & Lindskog, 1997), such as a black-box-based generic approximation models. Compared with black-box models, however, clear boxes are less accurate, especially for large systems (Tao, Shanxu, & Changsong, 2010). This is because in reality, most systems are complex and nonlinear and often contain dynamics that are hard to capture mathematically.
Black-box models are data driven and can be based on statistics in approximating nonlinear systems. They have found broad applications in engineering practice, including handling nonlinear time series, as they are relatively simpler to use. For example, dynamic neural networks (DNN) such as the 'focused time-delay neural networks' (FTDNN) and the 'distributed time-delay neural networks' have been applied to the PV forecasting problem (Al-Messabi, Li, El-Amin, & Goh, 2012). Black-box models however require good data -with both quality and quantityfor an accurate forecast, and its lack of underlying physics makes it hard to understand the results and their causes. Such models are also difficult to design structurally due to the large number of parameters required because of missing physics.
This paper focuses on the development of a greybox model, which takes advantages of both the blackbox and the clear-box models (Tan & Li, 2002), for the identification of improved PV models. The work reported starts with a clear-box model followed by a simple blackbox model, and then grey-box models. Section 2 provides an overview of related work on clear-box models for solar PV modelling. Section 3 develops a clear-box PV model using manufacturer data. Section 4 develops a data-driven FTDNN black-box model. In Section 5, uncertain parameters in the clear-box model are identified and optimized using heuristic Direct Pattern search (PS) and particle swarm optimization (PSO) algorithms, resulting in an evolutionary clear-box model. In Section 6, the evolutionary model is extended to grey-box models to account for unknown effects at forecast. Results and observations are discussed in Section 7 and conclusions are drawn in Section 8.
Another approach to forecasting PV power is to model solar data using statistical methods. Regression models are used to express power values as a regression of previous power values, irradiance, and temperature (Kroposki, Emery, Myers, & Mrig, 1994). Statistical approaches adopt classical time-series forecasting methods that assume the data stationary. Auto-regressive (AR), AR with exogenous input, and AR with integrated moving average (Bacher et al., 2009) are some of the well-known statistical models used in solar PV forecasting. As the parameters in these models usually do not represent a physical phenomenon or quantity, such models are often referred to as 'blackbox' models or a type of functional approximators. The artificial neural network (NN) is another example of these models and is gaining popularity in PV forecasting owing to their modularity in handling nonlinear models. There are various structures of an NN, but they can be categorized into two according to their temporal characteristics: the static NN (Mellit & Pavan, 2010) and the dynamic NN (Fernandez-Jimenez et al., 2012). However, data-fit models can suffer from a generalization deficiency. Further, there is no systematic way in arriving at the most appropriate structure of the model due to omitting the underlying physics.
An alternative to this is the 'clear-box' model based on physical principles. The benefits of these models were outlined in the introduction and their equations are detailed in the preceding section.

PV module physical equations
There are various physical-principle-based models developed for PV modules, for example, a double diode model (Gow & Manning, 1999), a simplified single diode model (SSDM), and further SSDMs in a descending order of complexity. Higher complexity can provide better accuracy on the expense of increased computational burden, which is not suitable for real-time or online applications.
The best model that gives a good compromise between simplicity and accuracy is the SSDM shown in Figure 1 (Villalva, Gazoli, & Filho, 2009).
The following equations describe the relation between the current and voltage output of the PV cell/array: where I is the output current of the cell in amperes, V is the solar cell voltage in volts, I ph is the photocurrent in amperes, I d is the Shockley diode equation, I o is the reverse saturation or leakage current of the diode, V t = kT/q is the thermal voltage of the array, q is the electron charge (1.60217646 × 10 −19 C), k is the Boltzman constant (1.3806503 × 10 -23 J/K), T is the temperature of the cell in kelvin, and a is the ideality factor constant. More details of these equations can be found in Villalva et al. (2009). To calculate power yield, values for I and V are usually computed using numerical methods (Gow & Manning, 1999;Villalva et al., 2009). The mathematical approach is usually tedious especially when applied to large or widely spread PV systems (Tao et al., 2010).

Simplified PV equations
The aforementioned equations of PV modules require numerical solutions and thus are sometimes replaced with simplified equations that relate the power output to the efficiency of the system and variation in radiation and temperature (Omran, Kazerani, & Salama, 2010;Osterwald, 1986). These equations are a translation of performance measurement from standard test conditions (STC; Air Mass 1.5 spectrum with global irradiance G = 1000 W/m 2 and module temperature = 25°C). One famous simple method is that of Osterwald (1986), which can be described as follows: where P m is the cell/module maximum power (W), P mo is the cell/module maximum power in STC (W), γ is the cell maximum power coefficient (°C −1 ), which ranges from −0.005 to −0.003°C −1 in crystalline silicon and can be assumed to be −0.0035°C −1 with good accuracy. Another version of Equation (3) is given below (Omran et al., 2010): where G t is the global irradiance on the titled surface in W/m 2 , K T is thermal derating coefficient of the PV module in %/°C, A a area of the PV array in m 2 , η m is the module efficiency, η dust is 1-the fractional power loss due to dust on the PV array, η mis is 1 -the fractional power loss due module mismatch, η DCloss is 1 -the fractional power loss in the DC side, η MPPT is 1 -fractional power loss due to the maximum power point tracking (MPPT) algorithm, T C is the cell temperature in°C, T ao is the ambient temperature at STC conditions in°C. The ac power of the PV system is then estimated by using manufacturer's efficiency curve of three phase inverter.
The simplified PV equation adopted for this work is given below (Rahman & Yamashiro, 2007): In this equation, miscellaneous losses including dust were lumped together in η loss ; PV cell efficiency η PV and MPPT or inverter efficiency η inv are kept separate. T m is the module temperature.
The aforementioned Equations (3)-(5) require detailed modelling of the global irradiance falling on a tilted surface G t as outlined in the next section.

Irradiance falling on a tilted surface: Hottel's equations
There exist various models for calculating irradiance on a tilted panel. However, some of these models rely on other meteorological data such as total irradiance on horizontal surface, diffuse irradiance on horizontal surface, and beam normal irradiance. Models of this type include those of Perez, Scott, and Stewart (1983), Perez, Seals, Ineichen, Stewart, and Menicucci (1987) and Klucher (1979). Others are not accurate in cloudy conditions, Temps and Coulson (1977), or in clear skies, Liu and Jordan (1963). Simple models that require no additional solar measurements were proposed by Hottel (Chupong & Plangklang, 2011;Duffie & Beckman, 2006;Hottel, 1976;Tao et al., 2010) and are adopted in this work. Description of this model is outlined as follows.
To explain irradiance equations, it is important first to present equations of solar angles as they are a prerequisite to calculate solar equations. The derivation of irradiance on tilted surfaces requires the calculation of different solar angles. These equations are mainly based on Duffie & Beckman (2006); Iqbal (1983). Solar angles that define the position of the sun with respect to a PV plane are illustrated in Figure 2.
In Figure 2, β = tilt angle of array; A s = solar elevation (altitude): the angle between the horizontal and line to the sun; θ = angle of incidence: the angle between normal to array surface and direct irradiance on a tilted surface (or line to the sun); θ z = zenith angle: the angle between vertical line to earth and line to the sun; γ s = solar azimuth angle: the angular displacement from south of the projection of beam radiation on the horizontal plane. Displacements east of south are negative and west of south are positive; γ = surface azimuth angle: the deviation of the projection on a horizontal plane of the normal to the surface from the local meridian, with zero due to south, east negative, and west positive; −180°≤ γ ≤ 180°. The zenith angle θ z can be written as follows: where δ is the declination angle given by δ = 23.45 sin 360 284 + n 365 , and ϕ = the latitude in degrees is the angular location north or south of the equator, north positive; −90°≤ ϕ ≤ 90°; ω = the hour angle which is the angular displacement of the sun east or west of the local meridian due to rotation of the earth on its axis at 15°per hour; morning negative, afternoon positive. The hour angle can be calculated by first calculating the solar time given by where L st is the standard meridian for the local time zone, L loc is the longitude of the location in question, and longitudes are in degrees west. The parameter E is the equation of time in minutes and is given by where B is calculated as follows: The hour angle ω can then be written as Furthermore, the incidence angle θ can be calculated using the following formula: cos θ = sin δ sin ϕ cos β − sin δ cos ϕ sin β cos γ + cos δ cos ϕ cos β cos ω + cos δ sin ϕ sin β cos γ cos ω + cos δ sin β sin γ sin ω.
The solar irradiance falling on a tilted surface, G t (W/m 2 ) is composed of three parts: the direct irradiance G tb (W/m 2 ), the diffuse irradiance G td (W/m 2 ), and reflected irradiance G tr (W/m 2 ), that is, The three components of irradiance can be calculated as follows: where G on is the extraterrestrial radiation (W/m 2 ), τ b is the beam atmospheric transmittance, τ d is the diffuse atmospheric transmittance, and τ r is the reflected atmospheric transmittance. G on can be calculated as follows: where G sc is 1367 ± 5 W/m 2 and d is the day of the year. The atmospheric transmittances atmospheric transmittances, τ b , τ d , and τ r can be calculated as follows: where a 0 , a 1 , and k are constants that can be calculated as follows: where A is the altitude of the location in km, r 0 , r 1 , and r k are correction factors for different types of climates and are given in Table 1.  Figure 3. These values represent clear-sky values expected to fall on Masdar PV panels.
Furthermore, the expected yield from the PV plant was simulated using Equation (5), where values of parameters given in Table 2 were used. Power yield for five days in July and January are given in Figures 4 and 5, respectively.
The absolute error residuals between modelled and actual power were calculated and shown in Figures 4 and 5. To measure the accuracy of the model, the Root Mean Square Error (RMSE) between actual and identified model in terms of PV power output was calculated. The RMSE is    calculated as follows: where P i a is the ith actual output power, P i p is the ith predicted power by model, and n is number of data points. The RMSE for the clear-box model is given in Table 6. The error plot and RMSE indicate large errors when standard values are used in the clear-box models. It also indicates that the error changes with yield of the PV system.

Enhancement of clear-box PV model
The clear-box model assumes that different parameters, with values outlined in data sheet, are constant all time. However, in reality, values change with variations in atmospheric conditions and degradation of materials. For the PV system studied in this work, uncertainties in the operating values of parameters such as PV efficiencies, albedo of ground, temperature coefficient and miscellaneous losses may have contributed to the under par forecasting performance. It is possible to further enhance the accuracy of the clear-box model by identifying optimal values for these parameters. Heuristic algorithms are ideal candidates for handling uncertainties since they are based on 'intuitive' learning and do not strictly require formulas to work. Here, two such heuristic algorithms, namely a PS and a swarm intelligence-based search, are applied to tune the clear-box model.

PS approach
A simple derivative-free direct PS method, suitable for a constrained search problem, is first used to find the best set of values for the parameters outlined in Table 3. The concept of the algorithm is to search a mesh of points around the current initial point. The pattern is a scalar multiple of a set of vectors that are added to the current point. If a point in the mesh provides an improved fitness value, it becomes the current point for the next iteration of the search.
The search seized as mesh size reached the minimum tolerance size (1 × 10 −6 ). The initialization of the search  required several repetitions as PS is sensitive to initial values. The data obtained from the system was used in two phases: first to tune the model and second to validate its performance. The model is trained using the hourly PV power and module temperature) from days 5-20 in July 2010 (summer) and days 5-20 in January 2011 (winter). The fitness function of the identification is chosen as the average error of July and January as shown below: Once the model is identified, five consecutive days in both July 2010 and January 2011 are used to validate its performance. Forecasting error is measured using the average RMSE (RMSE test ).
The modelled output power and error plots are shown in Figures 6 and 7. The RMSE value is given in Table 6. The results obtained show an improvement (585.2 kW reduced error corresponding to a 36% improvement over clear-box model) in the forecasted output. This indicates that further enhancement of the clear-box model is possible when parameters are tuned to reduce uncertainties in the operating values. The list of identified parameters is given in Table 4.

PSO approach
PSO is inspired by the social behaviour of bird flocking or fish schooling (Eberhat & Shi, 2001). PSO has also been used here to optimize the parameters in the evolutionary clear-box model. For consistency, the same objective function of (23) was used in the optimization. The optimization including flying particles, potential solutions, in the dimension of solution space at different velocities. The particles' positions and velocities are updated in each iteration until optimum solution is reached; details of the algorithm can be found in Kennedy, Eberhart, & Shi (2001). The number of particles is set to n = 50 and the maximum number of search iterations at 500. The RMSE resulting from the PSO is given in Table 6. It is clear from the results obtained that with PSO enhancement; the clear-box model is able to achieve better forecasting accuracies. Nonetheless, no marked improvement was observed when compared to the PS enhanced clear-box model.
The convergence of both algorithms is shown in Figure 8. It is observed that whilst PSO took a shorter time to find the global minimum, more evaluations were required. This is mainly due to the higher numbers of initial particles and objective evaluations required in each iteration of the search process. On the other hand, PS is sensitive to initial conditions. For both methods, improvement in forecasting accuracies is evident.

Global black-box approach
Another simple approach to forecast PV power is to use time-series or data-driven models. Owing to the capability to handle nonlinearity and time-series data and absence of requirement for transformation to stationary data (Mellit & Pavan, 2010), DNN have been studied for PV forecasting; specifically FTDNN was explored (Al-Messabi et al., 2012). FTDNN was trained on PV power yield for days 5-20 in July 2010 (summer) and 5 days 5-20 in January 2011 (winter). The PV power of Masdar is of an hourly interval resolution. A similar approach to (Al-Messabi et al., 2012) was used in building a global PV FTDNN model. Delayed inputs of D = [1:10] are fed to the network, that is, y(n) = f (u(n − 1), u(n − 2), . . . , u(n − 10)).   Eight neurons were chosen for the hidden layer. The network was tested for one to several steps ahead for the entire system. Five consecutive days in both July 2010 and January 2011 were used to test the models and to compute the average forecasting RMSE (RMSE test ). Average RMSE values for different step ahead forecasts are given in Figure 11. Sample plots of the forecasts are given in Figures 9 and 10.
It is noted that the black-box FTDNN model is suitable for few steps (i.e. few hours) ahead where RMSE is comparable with clear-box values. However, for longer steps ahead, the error increases, although it shows a declining trend for days (24 and 48 hours) ahead forecast.

Heuritic grey-box PV modelling
Enhancement of the clear-box model is achievable by introducing local black-boxes to account for unknown effects. These represent nonlinearities not captured by the clear-box model. These will be accounted for by incorporating black-box parts to model operating-point sensitive coefficients. For example, in PVs, the parameters of efficiency of inverter are known to be adaptive and are function of their loading (Pregelj, 2003).
The black-box models incorporated can take any form of generic function approximators: power series polynomials, NNs, fuzzy logic, etc. In this work, we use polynomials and rational functions or the Pade approximators. The following two grey-box models are thus developed to accommodate changes in the clear-box efficiencies with loading: where P o is the fractional loading calculated by dividing the P DC output of PV by nominal DC rating of the inverter. The first model, grey box 1, is based on Padé approximation while the second, grey box 2, needs higher order polynomials because of no denominator and hence no recursion in the approximation. Both will be artificially   evolved for optimal PV models. The coefficients of these models will be identified again by PS and PSO such that objective function (23) is minimized with the same set of training and test data. The heuristically evolved parameters of the models are given in Table 5. The accuracy of all the models is compared in Table 6.
For grey box 1, the search algorithms produced closer results, and PSO slightly outperformed PS. For the higher order grey box 2, PS failed to find the optimum solution while PSO succeed. The progresses of the search for grey boxes 1 and 2 are shown in Figures 14 and 16, respectively. The power yields and absolute errors of the greybox models are shown in 12-15 and 17-19. It can be seen that the grey boxes provide improved forecasts especially in transient period as power starts to rise or fall.

Discussion of results
(1) In general, tuning the parameters of a clear-box model improves forecasting results, as parameters from manufacturers require further calibration according to the specific atmospheric and location conditions of a given PV application. (2) Evolutionary clear-box models provide forecasts more accurately, especially on long horizons, than black-box FTDNN models. (3) Black-box models without exogenous inputs are found suitable only for very short horizon forecasting. (4) On the whole, grey-box models enhance the modelling accuracy compared with pure clear boxes     (6) Increasing or decreasing the order of grey boxes 1 and 2 is found to deteriorate the modelling or generalization accuracy and hence the given values are optimal numbers of the associated black boxes. (7) Generally the models coincides more with actual yield in clear days (such as in July) as they are based on clear-sky Hottel equations. (8) General observation: the simplified model, Equation (5), is found more sensitive to module temperature in July than in January, that is, excluding the temperature coefficient part (in brackets) has had a higher impact (worsen accuracy) in July than in January. This is expected as higher temperatures in July will impact the performance of the PV panels.
Areas of improvement include exploring different structures for grey-box models. Further, clear-box model can be explored to incorporate parameters replaced by black boxes to account for uncertainties.

Conclusion
The enhancement of clear-box modelling and forecasting of PV power through the introduction of black-box models has been studied and developed in this paper. It has been found that practical values of parameters can be tuned to improve the accuracy of the models. Further enhancement can be achieved through the introduction of grey-box models to account for uncertainties in the PV models. Derivative-free heuristics can be utilized in the identification processes, which has found particularly beneficial with grey-box model identification. The work presented is a novel step towards improved PV forecasting and a better integration of this RES in the power network.

Disclosure statement
No potential conflict of interest was reported by the authors.