VLCC’s fuel consumption prediction modeling based on noon report and automatic identification system

Abstract Where cost efficiency and environmental friendliness are top priorities, making the correct decisions to save fuel is extremely important. The fuel consumption rate of a ship is influenced in a complicated way by many parameters, such as average daily sailing speed, ship displacement, cargo, ballast water and bunker, trim and sea conditions (wind, wave and current). In this study, noon report (NR) and automatic identification system (AIS) data of four Very Large Crude Carriers (VLCC) are used to establish a prediction model. Because the accuracy of a statistical model depends on the consistency and quality of the collected data, a novel combination methodology is applied to the NR and AIS data to prepare a valid data population of vessel speed, fuel consumption and sea state. The consistency of the populations is then improved by eliminating out-of-range or spurious members using different methods, i.e., T-test, normality control and outlier score base (OSB). Finally, multiple linear regression is applied considering all parameters that influence fuel consumption. Results show a high correlation between the independent and dependent variables. Consequently, the generated formula predicts the fuel consumption of the vessels under all variable conditions in good agreement with the recorded fuel consumption data.


PUBLIC INTEREST STATEMENT
Since fuel is the most considerable item among all ship operational and voyage cost elements, ship owners are keen to reduce this cost on any envisaged voyage. From the owner's point of view, one approach to reducing it is to predict the fuel consumption of each voyage precisely, taking into consideration other influential factors such as sea condition. This paper presents a practical prediction formula for Very Large Crude Carriers (VLCC) based on the data collected and reported to the owners on a daily basis. It helps owners not only to forecast their yearly fuel budget and to avoid any unwanted factor that increases fuel consumption, but also to decrease consumption through full awareness of the real factors affecting it on each voyage. In addition, the paper presents an economical parcel size to be loaded with regard to maximizing the owner's income for the voyage.

Introduction
The fuel consumption rate of a ship is influenced in a complicated way by many parameters, such as average daily sailing speed, weight of the ship, cargo, ballast water and bunker, trim and sea conditions (wind, wave and current). Many researchers have investigated ways of reducing fuel consumption.
After the significant increase in fuel price in 2008, the fuel consumed onboard ships, commonly referred to as "bunkers", has become the largest cost item of a ship's operational expenses, accounting today for almost 50% of a voyage cost, even greater than crew wages (Stopford, 2009). The level of interest in designing a fuel-efficient ship is linearly related to the fuel price. Between 1970 and 1980 the fuel oil price increased significantly (nearly ten-fold), leading to ships with high fuel consumption being laid up (Wijnolst & Wergeland, 2009). During the period 1985-2000 fuel oil prices fell, and research and development on energy efficiency did not receive particular attention from the maritime industry. However, from the year 2000 the crude oil cost started to climb again, which pushed engine manufacturers, shipyards and designers to re-investigate design and operational solutions for reducing fuel consumption and increasing energy efficiency. Shipping is no different from other industries and is highly affected by fuel prices. However, a ship's fuel consumption can be controlled to a certain extent by fitting technical innovations or by better ship operation such as weather routing, trimming, slow steaming, etc. (DNV, 2010). Even though the oil price decreased for a brief period after the 2008 recession, it is again at record high levels today. Ship operators therefore cannot ignore this expense as in the past, or simply embody it in the price of the commodities carried; there is a need to design and operate more efficient ships that consume less fuel per carrying capacity. Furthermore, the intense focus on environmental protection, supported by considerable research findings, has led the International Maritime Organization (IMO) to take concerted measures in this direction by limiting the environmental footprint of ships significantly. In particular, one of the top environmental topics is global warming due to increasing greenhouse gases (GHG) in the atmosphere.
The shipping industry contributed about 4% of the world carbon dioxide (CO2) emissions in 2007 (Reynolds, 2009). The aim of reducing CO2 emissions comes hand in hand with the increasing fuel price and is leading towards the adoption of technological and operational innovations in order to decrease fuel consumption. In order to find means of improving a ship's fuel efficiency, it is first required to define the prevailing fuel consumption rate. For this purpose, several publications highlight the importance of carrying out a full-scale ship performance analysis, which offers benefits to designers and operators. The aim of such an analysis can be, for example, the prediction of the required propulsion power (Pedersen & Larsen, 2009) or the monitoring of hull resistance due to fouling (Aas-Hansen, 2011). Boom et al. suggested that, since sensors are already found onboard along with equipment to transmit the information, continuous monitoring may be achieved with adequate analysis (Boom, Koning, & Aalbrets, 2005). Coraddu et al., after collecting the needed data onboard, created a data analytics method to calculate vessel fuel consumption. They stated that the method can be used to reduce vessel consumption by optimizing the vessel's operational conditions. In particular, they showed that gray-box models (GBM) are able to combine the high prediction accuracy of black-box models (BBM) with a reduction in the amount of data required for training, by adding a white-box model (WBM) component. The resulting GBM is then used for optimizing the trim of the vessel, suggesting that between 0.5 and 2.3% fuel savings can be obtained by appropriately trimming the ship, depending on the extent of the range for varying the trim (Coraddu, Oneto, Baldi, & Anguita, 2006). Henggeler, Henry, and Kenny (n.d.) proposed chart-based correlations to estimate energy efficiency by measuring diesel fuel consumption.
A study on Arctic climate change, economy and society examined the calculation of fuel consumption per mile for various ship types and ice conditions in the past, present and future. It stated that there is no unambiguous relation between the ice situation (extent, thickness, coverage) and fuel consumption and exhaust emissions. The reason is that as the ice extent increases towards the winter period, fewer ships are able to travel the northern routes in a reasonable time. Additionally, in the intermediate periods (freeze-up and melting) ships will be restricted in speed for safety reasons. In order to accumulate exhaust emissions for the Arctic region for future times, the number of ships that may operate under reasonably safe and economic conditions has to be determined. This number will depend on the development of the region and its infrastructure (socio-economic factors). Moreover, the travel time and operating condition of the different ship types will be of major concern, as the speed profile will depend not only on technical ability but also on freight rates and the type of goods to be transported along the northern sea route (7th Arctic Climate Chang Economy and Society, 2014). Donggon (2016) studied the economic impact of fuel consumption uncertainty for tankers in an M.Sc. thesis. His empirical results show that optimal speed is very sensitive to bunker price rather than freight when subject to various weather and hull fouling conditions. The economic impact of uncertainty in weather and hull fouling conditions had not previously been empirically estimated on the basis of detailed noon report (NR) data. In this regard, as a first attempt, he analyzed how various factors in real-life ship operation change the decision making for profit maximization and speed optimization. Bialystocki and Konovessis (2016) proposed a statistical method for the prediction of fuel consumption, as shown in Figure 1.
Figure 1 illustrates the algorithm developed for the prediction of the fuel consumption and speed curve. By utilizing the correlation curve, it is possible to estimate the fuel consumption of a future voyage based on predetermined information. Initially, three corrections are applied before a preliminary curve is plotted:
(1) The ground speed was calculated by dividing the traveled main distance by the steaming time. Moreover, the engine fuel consumption is recorded per steaming time, so the fuel consumption is corrected accordingly.
(2) Departure and arrival drafts for each voyage were also recorded, and the intermediate drafts of the ship were calculated by interpolation. Thereafter, the fuel consumption at the actual draft is corrected to the design draft using the Admiralty coefficient.
(3) The ground speed was corrected to take the current into consideration, if present. When the current flowed aftwards it was added to the ground speed, while when the current flowed forward it was deducted from the ground speed.
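The three corrections above can be sketched in a few lines of Python. This is only an illustrative sketch, not the cited authors' implementation: the function names are hypothetical, the normalization of consumption to a 24-hour day is an assumption, and the draft correction assumes that fuel consumption is proportional to power so that, at constant speed, the Admiralty coefficient gives a scaling with displacement to the power 2/3.

```python
def speed_over_ground(distance_nm, steaming_hours):
    """Correction (1): ground speed in knots = distance / steaming time."""
    if steaming_hours <= 0:
        raise ValueError("steaming time must be positive")
    return distance_nm / steaming_hours


def consumption_per_day(fuel_tonnes, steaming_hours):
    """Correction (1): scale fuel burned while steaming to a 24-hour rate.

    Normalizing to 24 hours is an assumption made for this sketch.
    """
    if steaming_hours <= 0:
        raise ValueError("steaming time must be positive")
    return fuel_tonnes * 24.0 / steaming_hours


def correct_to_design_displacement(fuel_per_day, actual_disp, design_disp):
    """Correction (2): Admiralty-coefficient scaling at constant speed.

    Assumes fuel consumption is proportional to power, so that
    fuel ~ displacement ** (2/3) when the speed is unchanged.
    """
    return fuel_per_day * (design_disp / actual_disp) ** (2.0 / 3.0)


def speed_through_water(ground_speed_kn, current_kn):
    """Correction (3): remove the effect of the current.

    current_kn > 0: current flows forward (with the ship), so it is deducted.
    current_kn < 0: current flows aftwards (against the ship), so it is added.
    """
    return ground_speed_kn - current_kn
```

For instance, a ship covering 288 nautical miles in 24 hours against a 1-knot current would be assigned a corrected speed of 13 knots by these functions.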
As mentioned, the required data are collected from NR and AIS for VLCCs. These data are used to establish reliable equations for predicting fuel consumption. NR data are filled in by the chief officer or captain of the ship once a day. Human error is always part of any report made by a person. Therefore, when reviewing NR data, it is often found that some data are odd and not in harmony with the others. Many statistical methods exist for identifying such inconsistent data. In order to acquire valid NR data for thoroughly defining the relation between the parameters that influence tanker fuel consumption, the NR data of four VLCC ships are combined with their respective AIS data. Then, the out-of-range or spurious data are identified by different methods, i.e., T-test, normality control and OSB. Furthermore, the outlier score based method is used for filtering the data to obtain a better correlation between parameters (Safaei, Ghassemi & Ghiasi, 2018, 2019). The proposed formula may be used for the prediction of fuel consumption, taking into consideration the independent and dependent variables including ship speed, ship displacement, trim, sea state and fuel consumption rates. Figures 2-5 show the raw data of vessel speed, fuel consumption and sea state for the four ships (DUNE, DIONA, SEASTAR III, SERENA).

Prediction methods
The Durbin-Watson test was performed to validate the independence of all variables. This well-known test is applicable to inferential models in which serial correlation would be a serious deficiency (e.g., when assessing the confidence in the predicted value of a dependent variable). In addition, owing to its linear nature, the method can find multivariable correlation with less complexity. The superposition rule can also be deployed to use more correlated variables. The test statistic of the Durbin-Watson procedure is $d$, calculated as follows (O'Brien, 2007):
$$ d = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2} $$
where $e_t$ represents the observed error term (i.e., the residual):
$$ e_t = y_t - \hat{y}_t $$
It can be shown that the value of $d$ lies between zero and four, zero corresponding to perfect positive correlation and four to perfect negative correlation. If the error terms $e_t$ and $e_{t-1}$ are uncorrelated, the expected value of $d$ is two. When the calculated value of $d$ is less than two, it shows that the independent variables have first-order serial correlation; otherwise they have a nonlinear or parallel relation. Unfortunately, the Durbin-Watson test can be inconclusive. As a general rule, if the $d$ value is between 1.5 and 2.5, the independence of the variables is satisfactory. The independent variables in this study are denoted $W_n$, $V_v$, $D_s$ and $W_v$. The results of the Durbin-Watson test for the variables are presented in Table 1.
Safaei et al., Cogent Engineering (2019), 6: 1595292 https://doi.org/10.1080/23311916.2019
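As a minimal sketch of how the Durbin-Watson statistic described above is computed from a sequence of regression residuals, the following Python function applies the standard definition (sum of squared successive differences divided by the sum of squared residuals). The function name and the sample residuals are hypothetical and are not taken from the study's data.

```python
def durbin_watson(residuals):
    """Durbin-Watson d for residuals ordered in time (e.g., by report date)."""
    if len(residuals) < 2:
        raise ValueError("at least two residuals are required")
    # Numerator: squared differences between consecutive residuals.
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    # Denominator: sum of squared residuals.
    denominator = sum(e * e for e in residuals)
    if denominator == 0:
        raise ValueError("residuals are all zero; d is undefined")
    return numerator / denominator


# Residuals that alternate in sign are negatively correlated, so d exceeds 2;
# identical residuals are perfectly positively correlated, so d is 0.
d_alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])
d_constant = durbin_watson([1.0, 1.0, 1.0, 1.0])
```

A value computed this way can then be compared with the 1.5 to 2.5 rule of thumb quoted in the text.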
The $d$ value calculated for the four parameters $W_n$, $V_v$, $D_s$ and $W_v$ is about 1.3, which indicates correlation between the variables. A second test was carried out ignoring $W_n$ (because of the probable correlation between $W_v$ and $W_n$). The results are presented in Table 2. With $W_n$ ignored, the $d$ value becomes 1.67, which shows that the regression can be performed on the remaining variables. Table 3 shows the results of the collinearity control of the independent variables. A tolerance of less than 0.20 or 0.10 and/or a variance inflation factor (VIF) of 5 or 10 and above indicates a collinearity problem (O'Brien, 2007). As is evident in the table, the wave and wind parameters have significant collinearity with other independent variables, so the correlation of the independent variables should be checked pairwise.
In the second step, Pearson's correlation coefficient is calculated for the data sets. This coefficient, often referred to as Pearson's R, is a statistic that measures the strength of the relationship between two variables; its value can range between −1 and +1. The results of the Pearson test are shown in Table 4. It is evident from Table 4 that there is a significant correlation between wind and wave, and ignoring one of these two variables can lead to better regression results. Accordingly, on the basis of orientation considerations, the wind effect has been ignored in the regression.
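The pairwise check described above can be illustrated with a short Python sketch. It computes Pearson's correlation coefficient for two series and, for the special case of exactly two predictors, the corresponding tolerance (1 − r²) and VIF (1 / tolerance). In a model with more predictors, the VIF of a variable uses the R² obtained by regressing it on all the other predictors, which this sketch does not do. The function names and the sample series are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least two")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def pairwise_tolerance_and_vif(x, y):
    """Tolerance and VIF for a model with exactly two predictors x and y."""
    r = pearson_r(x, y)
    tolerance = 1.0 - r * r
    if tolerance == 0:
        return 0.0, math.inf  # perfectly collinear pair
    return tolerance, 1.0 / tolerance


# Hypothetical wind and wave readings, for illustration only.
wind = [1.0, 2.0, 3.0, 4.0]
wave = [1.0, 3.0, 2.0, 4.0]
r = pearson_r(wind, wave)
tolerance, vif = pairwise_tolerance_and_vif(wind, wave)
```

With these made-up readings the correlation is 0.8, giving a tolerance of 0.36 and a VIF below the threshold of 5 quoted in the text.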
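The Breusch-Pagan (BP) test is one of the most common tests for heteroscedasticity. It begins by allowing the heteroscedasticity process to be a function of one or more of the independent variables, and it is usually applied by assuming that heteroscedasticity may be a linear function of all the independent variables in the model. This assumption can be expressed as follows:
$$ \varepsilon_i^2 = \alpha_0 + \alpha_1 X_{i1} + \cdots + \alpha_p X_{ip} + u_i $$
The values of $\varepsilon_i^2$ are not known in practice, so the $\hat{\varepsilon}_i^2$ are calculated from the residuals and used as proxies for $\varepsilon_i^2$. Generally, the BP test is based on the estimation of:
$$ \hat{\varepsilon}_i^2 = \alpha_0 + \alpha_1 X_{i1} + \cdots + \alpha_p X_{ip} + u_i $$
Alternatively, a BP test can be performed by estimating:
$$ \hat{\varepsilon}_i^2 = \delta_0 + \delta_1 \hat{Y}_i + v_i $$
where $\hat{Y}$ represents the predicted values from:
$$ Y_i = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_p X_{ip} + \varepsilon_i $$
In this study, the model for $Y$ was first estimated using ordinary least squares (OLS) and the predicted $Y$ values were obtained. The auxiliary regression of $\hat{\varepsilon}_i^2$ on $\hat{Y}_i$ was then estimated using OLS, and its $R^2$ value, denoted $R^2_{\hat{\varepsilon}^2}$, was retained. Finally, the F-statistic or the chi-squared statistic is calculated as follows:
$$ F = \frac{R^2_{\hat{\varepsilon}^2} / 1}{\left(1 - R^2_{\hat{\varepsilon}^2}\right) / (n - 2)}, \qquad \chi^2 = n R^2_{\hat{\varepsilon}^2} $$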
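The BP procedure outlined above (fit the model by OLS, regress the squared residuals on the fitted values, then form the F and chi-squared statistics from the auxiliary R²) can be sketched in plain Python as follows. This is a sketch under stated assumptions rather than the study's actual computation: the helper names are hypothetical, OLS is solved through the normal equations with a small Gauss-Jordan routine, and the p-value uses the fact that a chi-squared variable with one degree of freedom is the square of a standard normal variable.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (collinear regressors?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_fitted(rows, y):
    """Fitted values from regressing y on the columns of rows plus an intercept."""
    x = [[1.0] + list(r) for r in rows]
    k = len(x[0])
    xtx = [[sum(xi[p] * xi[q] for xi in x) for q in range(k)] for p in range(k)]
    xty = [sum(xi[p] * yi for xi, yi in zip(x, y)) for p in range(k)]
    beta = solve(xtx, xty)
    return [sum(bj * xj for bj, xj in zip(beta, xi)) for xi in x]


def r_squared(y, fitted):
    """Coefficient of determination, clamped at zero against rounding error."""
    mean_y = sum(y) / len(y)
    ss_tot = sum((v - mean_y) ** 2 for v in y)
    ss_res = sum((v - f) ** 2 for v, f in zip(y, fitted))
    return max(0.0, 1.0 - ss_res / ss_tot)


def breusch_pagan(rows, y):
    """Return (F, chi-squared, chi-squared p-value) for the fitted-value BP test."""
    n = len(y)
    y_hat = ols_fitted(rows, y)                         # step 1: main regression
    sq_resid = [(yi - fi) ** 2 for yi, fi in zip(y, y_hat)]
    aux_fit = ols_fitted([[f] for f in y_hat], sq_resid)  # step 2: auxiliary
    r2 = r_squared(sq_resid, aux_fit)
    f_stat = (r2 / 1.0) / ((1.0 - r2) / (n - 2))
    chi2 = n * r2
    p_value = math.erfc(math.sqrt(chi2 / 2.0))          # chi-squared, 1 d.o.f.
    return f_stat, chi2, p_value
```

A p-value above 0.05 from such a calculation would correspond to not rejecting the null hypothesis of homoscedasticity.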

Result
The degrees of freedom for the F-test are equal to 1 in the numerator and n − 2 in the denominator. The degrees of freedom for the chi-squared test are equal to 1. If either of these test statistics is significant, there is evidence of heteroscedasticity. If not, the null hypothesis of homoscedasticity cannot be rejected. The result of the BP test is presented in Table 5. As shown in Table 5, the P-value is more than 0.05, so the null hypothesis is not rejected; the variables are therefore homoscedastic and multiple linear regression can be performed.
For this purpose, the stepwise method is used. The results are presented in Table 6.
Performing the regression on the variables of the final model (Model No. 3), the significance values show that there is a significant relation between the independent variables and fuel consumption; the β coefficients also show that all the independent variables have a positive effect on fuel consumption.
The prediction formula is obtained from the coefficients of the final model. As an example, the fuel consumption over 5 consecutive days was calculated with the formula and compared with the NR data. The results show that the accuracy of the prediction is about 97% on those days. The results are presented in Table 7. Figure 6 depicts the predicted fuel consumption versus the measured values for all four VLCC ships. The voyage for all ships was from Khark Island, laden, to a south Chinese port. As shown, the predictions are in close agreement with the reported fuel consumptions.
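The source does not state how the prediction accuracy was computed. One plausible reading, shown here purely as an assumption, is the average of 100 × (1 − |predicted − reported| / reported) over the compared days. The function name and the two sample values below are hypothetical and are not taken from Table 7.

```python
def prediction_accuracy(predicted, reported):
    """Mean percentage accuracy of predictions against reported values.

    Assumed definition: 100 * (1 - |predicted - reported| / reported),
    averaged over all days. Reported values must be positive.
    """
    if len(predicted) != len(reported) or not reported:
        raise ValueError("series must be non-empty and of equal length")
    if any(r <= 0 for r in reported):
        raise ValueError("reported consumption must be positive")
    daily = [
        100.0 * (1.0 - abs(p - r) / r) for p, r in zip(predicted, reported)
    ]
    return sum(daily) / len(daily)


# Made-up daily consumptions in tonnes, for illustration only.
accuracy = prediction_accuracy([97.0, 103.0], [100.0, 100.0])
```

Under this definition, predictions that deviate by 3% from the reported value on each day give an accuracy of 97%.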

Conclusion
In this study, raw data on fuel consumption, speed and sea state are collected from NR and AIS for four VLCC ships. These data are analyzed in order to acquire valid NR data for thoroughly defining the relation between the parameters that influence tanker fuel consumption, by taking the following steps. In the first step, the NR data of the four VLCC ships are combined with their respective AIS data, and the out-of-range or spurious data are then identified by different methods, i.e., T-test, normality control and OSB. By performing the regression, a prediction formula is produced that indicates a high correlation between the independent variables (ship speed, displacement, wind and sea states) and the dependent variable (fuel consumption). The calculated $R^2$ value for estimating the accuracy of the model is about 0.75; therefore, the independent variables can explain 75% of the variation in fuel consumption. According to the proposed model, obtained by stepwise regression, ship speed has the greatest coefficient in the equation. The wind and displacement coefficients are, respectively, 0.3 and 0.07 times that of ship speed. For future study, it is recommended to use nonlinear regression methods to increase the accuracy of the prediction, owing to the nonlinear physical relation between ship speed and fuel consumption.