Spatio-temporal modelling of the influence of climatic variables and seasonal variation on PM10 in Malaysia using multivariate regression (MVR) and GIS

Abstract In an era of rapidly changing climate, investigating the impacts of climate parameters on major air pollutants such as particulate matter (PM10) is imperative to mitigate their adverse effects. This study utilizes Geographic Information System (GIS), a multivariate regression (MVR) model, and Pearson correlation analysis to examine the inter-relationship between PM10 and major climate parameters such as temperature, wind speed, and humidity. Although the application of MVR for predicting PM10 has been examined in previous studies, the spatial modelling and prediction of this air pollutant remain limited. Accurate spatial assessment of pollutants' hazard susceptibility in relation to climate change can accelerate mitigation initiatives. Thus, to understand the behavior, seasonal pattern, and trend of PM10 concentration, which is vital for good air quality, GIS is essential for enhanced visualization and interpretation of the predicted occurrence of the pollutant. The acquired data were randomly divided into 80% and 20% for training and validation of the MVR model, respectively, while GIS was used to model the spatial distribution of the predicted ambient PM10 concentration, highlighting the hotspots of future PM10 hazard. PM10 was positively correlated with temperature and wind speed, whereas humidity showed a negative correlation. The regression model yielded R² = 0.298, RMSE = 12.737, and MAE = 10.343, with the highest PM10 concentration correlated with the warming event in the southwest monsoon. Temperature, wind speed, and humidity were identified as the most critical variables influencing PM10 concentration in the study area, in descending order of importance. This study's outcome provides valuable spatio-temporal information on future climate change impact on PM10 in the study area, with the potential to support effective air quality management.


Introduction
Particulate matter (PM) comprises hazardous pollutants that have tremendous negative impacts on human health and environmental sustainability (Choubin et al. 2020). Exposure to atmospheric pollution harms human health, causing heart diseases, respiratory diseases, and increased risk of stroke (World Health Organization 2016a), which has led to an estimated 3 million deaths annually (World Health Organization 2016b). Rapid urbanization characterized by population growth (United Nations 2015) is influencing the environment and affecting the air quality (Choubin et al. 2020; Liu et al. 2014). Recent projections indicate that the world's population will exceed 9.5 billion by 2050 (United Nations 2015), and a high percentage of cities with over 100,000 citizens, mostly in the underdeveloped and developing countries, are unable to meet standard air quality guidelines (Osseiran and Chriscaden 2016).
Particulate matter with a diameter of 10 µm or less (PM10) is an air pollutant that can easily penetrate the lung tract (de Rooij et al. 2017), and it is considered the most dangerous air pollutant in urbanized regions in the world (Yadav et al. 2014; Stoimenova et al. 2017; Ganguly et al. 2019; Cujia et al. 2019; Han et al. 2020). Exposure to PM10 for a short or prolonged period has severe impacts on global health, causing lung and heart diseases, and deaths if its concentration increases above its strata in the atmosphere (Czernecki et al. 2017; Feng et al. 2019; Yao et al. 2015; Zheng et al. 2015). For instance, a rise in the concentration of PM10 in Malaysia from 50 µg/m³ to 150 µg/m³ has been linked to an increase in respiratory diseases (12%), exacerbated asthmatic conditions (19%), and rhinitis (26%) (Wong et al. 2017). This has been influenced by rapid urbanization leading to an increase in the population and industrialization (Qureshi et al. 2015), thereby depleting the air quality (Abdullah et al. 2017). PM10 is the dominant atmospheric pollutant in Peninsular Malaysia due to its higher level of concentration in the Air Pollution Index (API) (Abdullah et al. 2017). In comparison to other air pollutants such as sulfur dioxide (SO2), ozone (O3), nitrogen oxides (NOx), and carbon monoxide (CO), it has mostly been considered the principal air pollutant for the API calculation, which indicates the Malaysian API level (Shaadan et al. 2015; Althuwaynee et al. 2020).
Despite the recent focus on PM2.5 (Zaman et al. 2019; Li et al. 2019), further studies to deepen the understanding of the behavior, trend, seasonal patterns, and sources of PM10 remain crucial due to its adverse impacts on the respiratory and cardiovascular system (Al-Hemoud et al. 2017; Sun et al. 2018). Its potential to increase the death rate for a 10 µg/m³ rise in concentration (Althuwaynee et al. 2020; de Rooij et al. 2017) is equally a source of concern.

Causes and enablers of air pollution (PM10)
PM10 occurs naturally or anthropogenically. Natural sources of PM10 include dust, sea salt, and carbon emitted from open burning (Czernecki et al. 2017). Emission from open burning in Malaysia and neighbouring countries, particularly Indonesia (Wong et al. 2017), is the main source of PM10 in the country (Khan et al. 2015; Latif et al. 2011). Human-induced sources of PM10 include vehicular emission, dust from soils, and emission from industries and power plants (Azarmi et al. 2016; Taheri Shahraiyni and Sodoudi 2016). These anthropogenic sources have a considerable influence on the release of PM10 to the atmosphere (Khan et al. 2015; Abdullah et al. 2017).
Climatic factors such as temperature, wind speed, and humidity have a significant impact on the formation, transportation, and deposition of atmospheric pollutants (Khan et al. 2015;Zhang and Ding 2017). Temperature and humidity affect the physical and chemical components of PM 10 (Khan et al. 2015). Also, the temperature has a reductive effect on particulate matters through the convectional method (Li et al. 2015), and it is an important variable for secondary PM 10 formation (Taheri Shahraiyni and Sodoudi 2016). Wind speed influences the transportation of ambient pollutants (Wang and Ogawa 2015;Ganguly et al. 2019) while relative humidity makes PM 10 heavier and also aids the dry deposition removal (Kleine Deters et al. 2017). Recent studies have highlighted a correlation between air pollution and seasonal variation since different pollutants exhibit diverse properties at different times of the year due to climatic variability (Wang and Ogawa 2015;Kleine Deters et al. 2017).
An accurate assessment of the correlation between climatic variables and air pollutants is crucial for projecting PM 10 concentration in cities (Althuwaynee et al. 2020). The knowledge of the nexus between air pollutants and climatic variables will ensure a proper understanding of how it influences air pollution. Thus, understanding this inter-relationship will assist in the mitigation, monitoring, and control of air pollution emission, transportation, and distribution. Also, identifying the climatic variables that exert the most significant impact on air pollutants and the period when the pollutants are most lethal will assist stakeholders to prioritize intervention plans in the right context. For example, understanding the climatic variables that exhibit high positive correlation and high predictive performance as a predictor will provide nuggets of information on the leading climatic factors causing air pollution.
However, it is noteworthy that the interdependence of air pollutants and climatic conditions varies across regions, which makes it difficult to generalize trends. For example, while temperature is highly correlated with PM10 in Seoul, Korea (Kim 2019), its correlation outcome in Auckland, New Zealand (Hernandez et al. 2017) is different. Similarly, Giri et al. (2008) obtained a negative correlation between temperature and PM10 in the Himalayan Kingdom of Nepal, while Sharma and Sharma (2016) obtained varied and positive correlation indices in four different seasons in Northern India.
Considering the influence of climatic conditions on air pollution occurrence and management in other regions, it is crucial to accurately assess the correlation of climatic variables with PM 10 in Malaysia too, taking into consideration its peculiar seasonal variations and regular occurrence of wildfire and haze pollution.

Modelling air pollutants
Different methods exist for modelling air pollution. The deterministic method which uses air transport, release, and chemical modules (Zhou et al. 2019) is often utilized (Djalalova et al. 2015;Choubin et al. 2020). Statistical models such as linear regression have evolved to show higher predictive performance and accuracy than the deterministic method (Song et al. 2015). Regression has been used to accurately predict air pollution because of its simplicity in computation and implementation (Abdullah et al. 2017;Fong et al. 2018). The most widely adopted regression method is the multi-linear regression model which utilizes climatic factors as predictors for modelling air quality.
Previous studies have implemented statistical models for PM 10 concentration prediction. Nazif et al. (2019), Fong et al. (2018), and Abdullah et al. (2017) used multiple regression for forecasting air pollution in Malaysia. However, many of these studies do not model the spatial distribution of PM 10, which can aid the prompt detection of hazardous areas. Spatial modelling of pollutants is essential because it can help stakeholders and decision-makers to determine regions that have poor air quality.
Further, studies that consider seasonal variation in modelling the emission and distribution of PM10 in Malaysia are scant. Investigating the emission trend of PM10 in different months, years, and seasons will offer further insights on the impacts of climatic variables such as temperature, wind speed and humidity on the pollutant and enable evidence-based intervention. The main contributions of this paper are as follows:
i. The assessment of the impacts of climatic variables on PM10 concentration, taking into consideration seasonal variation. This approach reveals the effect of climatic variables such as temperature, humidity, and wind speed on PM10 variation in Malaysia.
ii. Development and validation of an algorithm to predict the concentration of PM10. This will assess the potential of climatic variables to predict PM10 concentration.
iii. Leveraging advanced geospatial techniques to model the PM10 hazard level. This ensures better visualization, analysis and interpretation of the model's result.

Study area
Malaysia (Figure 1) is an Asian country located at latitude 4.2105° N and longitude 101.9758° E (Pour et al. 2020). The federal constitutional monarchy consists of 13 states and three federal territories. The country is separated by the South China Sea into two regions, namely Peninsular and East Malaysia. The study area was once dominated by palm oil plantation. However, there is a rapid increase in population due to urban sprawl, especially in Peninsular Malaysia (Andaya and Andaya 2016). Its population is over 30 million with a land area of 320,000 km² (123,553 sq mi) (Ab Rahman et al. 2013). It has an equatorial climate that is warm and humid throughout the year (Tang 2019). The temperature ranges from 25.5 °C to 33 °C and is moderated by the presence of the surrounding oceans, while annual rainfall ranges between 200 cm and 400 cm. Malaysia has two main seasons, also referred to as 'monsoons'. The northeast monsoon runs from November to March, while the southwest monsoon extends from May to September (Fong et al. 2018; Nazif et al. 2019). The two transitional periods are between April-May and October-November, called intermonsoon 1 and 2, respectively (Abdullah et al. 2017). According to Andaya and Andaya (2016), the monsoon winds are associated with rainfall variations; the wettest season corresponds to the northeast monsoon, while the southwest monsoon is characterized by temperate weather. Changes in the monsoon pattern occur due to El Niño effects, which are prevalent in the southwest monsoon (Ab Rahman et al. 2013). This causes a rise in temperature and a more extended period of warm weather (Andaya and Andaya 2016). According to Baker (2008), it cannot be fully established that there is a dry season in Malaysia because even in the southwest monsoon there is rainfall for at least 7 days, which makes the two seasons either rainy or rainier.

Data utilized
The data used in this study are the meteorological and air quality data obtained from the Department of Environment (DOE) Malaysia. The climatic variables include the hourly temperature (°C), humidity (%), and wind speed (m/s), and the air quality data comprise the hourly concentration of PM10 (µg/m³). These data, covering the period from 2012 to 2016, were acquired from five air monitoring stations (Figure 2). The stations are strategically mounted around industrialized and urbanized regions (Ahmat et al. 2015) across five states in Peninsular Malaysia. The Bukit Rambai and Nilai air quality stations are located in the industrial hub in Peninsular Malaysia; the Cederawasih, Taman Inderawasih, Perai station is in Pahang, the second most populated state; and the Klang air quality station is situated in Klang Valley, the trade centre and industrialized region in the most populated state (Selangor) in Malaysia (Ahamad et al. 2014). The data were classified into months, years, and seasons for analysis. The seasons considered in this study are the southwest and northeast monsoons. The spatial distribution of these stations is displayed in Figure 2.
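The seasonal grouping described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the month ranges follow the study-area description (northeast monsoon from November to March, southwest monsoon from May to September), while the treatment of April and October as inter-monsoon months and the names `monsoon_season` and `group_by_season` are assumptions introduced here, since the stated monsoon and inter-monsoon ranges overlap in May and November.

```python
def monsoon_season(month: int) -> str:
    """Map a calendar month (1-12) to a season label."""
    if month not in range(1, 13):
        raise ValueError("month must be in 1..12")
    if month in (11, 12, 1, 2, 3):
        return "northeast"       # November to March
    if 5 <= month <= 9:
        return "southwest"       # May to September
    return "inter-monsoon"       # April and October (assumption)


def group_by_season(records):
    """Group (month, pm10) pairs into lists of PM10 values per season."""
    groups = {}
    for month, pm10 in records:
        groups.setdefault(monsoon_season(month), []).append(pm10)
    return groups
```

Seasonal means or boxplot inputs can then be derived from each list returned by `group_by_season`.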

Data analysis
First, a thorough investigation was carried out on the PM10 concentration to understand its distribution and behaviour in Malaysia. To achieve this, the PM10 concentration was studied based on months (January-December), years (2012-2016), and seasons (southwest and northeast monsoons). This ensures a better understanding of the temporal and seasonal behaviour of PM10. Then, we observed and compared the relationship between the meteorological parameters and the atmospheric PM10 concentration. This was done by studying the monthly, yearly, and seasonal distribution of the three climatic variables and PM10. Pearson's coefficient of correlation was used to investigate the inter-link between the climatic factors and PM10, adopting the methodology of related studies such as Rahman et al. (2019). Afterwards, the capability of multivariate regression to predict PM10 using climatic variables as predictors was investigated. The PM10 modelling was done in Python, while the other analyses were carried out in Minitab software.
Finally, the PM10 hazard susceptibility map of the study area was generated in the ArcGIS environment using the regression model. The map was generated using the coefficient prediction value of the model, as earlier employed by Bozdağ et al. (2020), through the application of geostatistical analysis. Also, the spatial distribution of PM10 concentration was generated based on the Air Pollution Index (API) format of the DOE Malaysia using spatial interpolation analysis in ArcGIS. Spatial interpolation is a common geostatistical technique used to map air pollution (Jumaah et al. 2019). This approach uses the point values of known regions to estimate the point values of unknown regions within the spatial range of the known values. The method depends on the number and even distribution of point values, which limits its performance or spatial variability (Tian 2016; Kim et al. 2014). However, this approach is still widely utilized for spatial assessment and distribution of air pollutants (Bozdağ et al. 2020; Shukla et al. 2020; Ahmed et al. 2018; Ma et al. 2019). Among the different types of interpolation techniques, Inverse Distance Weighting (IDW) is one of the most widely used. IDW can be used for multivariate interpolation (Chen and Liu 2012), and its performance has been proven in air pollution studies (Jumaah et al. 2019; Kumar et al. 2016). Thus, IDW was employed in this study.
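The idea behind IDW can be illustrated with a short sketch. It is not the ArcGIS implementation used in the study, whose tool settings are not reported here; the function name `idw`, the default power of 2, and the use of planar Euclidean distances are assumptions made for illustration.

```python
import math


def idw(stations, x, y, power=2.0):
    """Inverse-distance-weighted estimate at the point (x, y).

    stations: list of (xi, yi, value) tuples for the known points.
    power:    distance exponent; 2 is a common default (assumption).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for xi, yi, value in stations:
        distance = math.hypot(x - xi, y - yi)
        if distance == 0.0:
            return value  # the query point coincides with a station
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total
```

Because the weights decay with distance, the estimate at any location is pulled towards the nearest stations and always lies between the smallest and largest observed values, which is one reason the result depends strongly on how many stations exist and how evenly they are spread.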

Multivariate regression (MVR)
Multivariate regression is used to determine the relationship between a dependent variable and multiple independent variables (Abdullah et al. 2016). The model accounts for the influence of the multiple control variables on a single dependent variable. It is also used to check the impacts of one or more independent variables on the dependent variable (Abba et al. 2017). An advantage of MVR is that the observable changes in the dependent variable can be explained by the explanatory variables, and the contribution of each of these variables can be easily determined. The equation of multivariate regression is shown in Eq. (1):
$$Y = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_n X_n + \varepsilon \quad (1)$$
where $Y$ is the dependent variable, $b_0$ is the constant y-intercept, $b_1, b_2, \ldots, b_n$ are the regression slope coefficients for the independent variables, $X_1, X_2, \ldots, X_n$ are the independent variables, and $\varepsilon$ is the residual. Although simple linear regression has been implemented in air pollution studies, it has a low predictive accuracy for particulate matter (Wang and Ogawa 2015). In contrast, multiple linear regression has a higher probability of achieving a better model fit than simple linear regression due to the inclusion of more than one explanatory variable in the prediction of the dependent variable (Gupta 2019). Thus, the capability and strength of the explanatory variables in predicting the dependent variable using MVR will be determined. Moreover, with the application of MVR, the significance of the independent variables (e.g. climatic factors) in predicting the dependent variable (PM10) will be determined. Understanding the multicollinearity between the independent variables is fundamental to determining the statistical significance of the independent variables (Allen 1997; Yoo et al. 2014; Daoud 2017) in predicting the dependent variable. Thus, pairwise correlation analysis was performed to check for multicollinearity among the independent variables.
The following statistical indices (Table 1) were used to evaluate the model's performance.
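The three indices reported for the model (R², RMSE, and MAE) can be computed as in the sketch below. The definitions used are the conventional ones and are an assumption here, since the contents of Table 1 are not fully reproduced in the text; the function name `regression_metrics` is hypothetical.

```python
import math


def regression_metrics(observed, predicted):
    """Return R^2, RMSE and MAE for paired observed/predicted values."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - o for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    # R^2 as the coefficient of determination; undefined (division by
    # zero) when all observed values are identical.
    r2 = 1.0 - ss_res / ss_tot
    return {"r2": r2, "rmse": rmse, "mae": mae}
```

RMSE and MAE are in the units of the dependent variable (here µg/m³), whereas R² is dimensionless, so the three are best read together rather than in isolation.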

Results and discussion
The statistical description of the variables considered for this study is presented in Table 2. The total mean concentration of PM10 recorded for 12 months is 55.77 µg/m³. The total maximum mean concentration of PM10 in 12 months (68.33 µg/m³) exceeded the guidelines set by the DOE, Malaysia for controlling the PM10 concentration, as shown in Table 3. The guidelines state that the annual concentration of PM10 should not exceed 50 µg/m³ (Department of Environment 2013). This limit is itself higher than those set by the European Union (EU) (40 µg/m³) (Khan et al. 2015) and the World Health Organization (20 µg/m³) (World Health Organization 2006) to reduce poor air quality. The exceedance may be influenced by the biomass and bush burning used in Indonesia for land cultivation. The smoke from this combustion is transported to Malaysia, a neighbouring country, under the influence of wind and the seasons (Kusumaningtyas and Aldrian 2016).

Variation in PM 10 concentration in Malaysia
The time series of the monthly mean concentration of PM10 is displayed in Figure 3. The monthly mean concentration ranges from 44.08 µg/m³ to 68.33 µg/m³. Approximately 67% of the monthly concentration of PM10 for five years is above the Malaysian Ambient Air Quality Guidelines (MAAQG). In terms of the monsoons, 20% of the high PM10 concentration above 50 µg/m³ occurred in the northeast monsoon while 40% of the high PM10 concentration above 50 µg/m³ occurred during the southwest monsoon.
Specifically, a trend of high concentration of PM10 is observed in June, October, and March, respectively. The highest concentration of PM10 occurs in June (68.33 µg/m³), the second month of the southwest monsoon, while the lowest concentration of PM10 falls in November (44.08 µg/m³), which is the starting month of the northeast monsoon. This finding aligns with the work of Hassan et al. (2020), in which the highest PM10 concentration started and was dominant in the southwest monsoon. Thus, high PM10 concentration is correlated with the warming event in the monsoon. Also, the lowest concentration of PM10 in the starting month of the northeast monsoon might be due to the abundant rainfall in this monsoon compared to the southwest monsoon. Rainfall enables the reduction of PM10 in the atmosphere (Chu 2015) by wet deposition (Luan et al. 2019). Therefore, it is inferred that as temperature affects air quality, rainfall aids the removal of air pollutants in the atmosphere in the study area.
Table 1 (fragment). One index expresses the model's fitness within the range of −1 to +1: a negative value indicates low performance, while positive values that extend towards +1 indicate a high predictive capability. MAE evaluates the predictive accuracy by estimating the mean absolute difference between the predicted and true values: $\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} |b_i - a_i|$.
Some earlier studies (Özdemir and Taner 2014; Nazif et al. 2019) used the boxplot to understand the concentration and trend of PM10 and to compare the PM10 concentration either in months or seasons.
The box plot, which is a graphical expression for devising comparative discussion and understanding the different variables (Abdullah et al. 2017), is shown in Figure 4. It is noteworthy that a distribution is skewed when one tail is longer than the other. Thus, it can be inferred from Figure 4 that the concentration of PM10 is skewed to the left, as the tail points towards the negative direction. The negative skewness may indicate that there were more moderate to low concentrations of ambient PM10 than the concentrations represented by the positive skewness in Ng and Awang (2018). The box plot also depicts the effect of seasonal changes on the formation and accumulation of PM10 in the atmosphere. It shows the average monthly concentration of PM10 in different seasons. A higher average concentration of PM10 was recorded in the southwest monsoon (72.01 µg/m³) than in the northeast monsoon (60.23 µg/m³). The higher concentration in the southwest monsoon, which represents a drier season, may be driven by the rise in temperature in this monsoon (Kassomenos et al. 2014).
Malaysia is a tropical country with high temperatures (Abdullah et al. 2017). Anthropogenic activities such as combustion will undoubtedly affect the warming in the country, especially during the dry season. According to Abdullah et al. (2020), the transboundary haze that affects the air quality in Malaysia occurs more often during the southwest monsoon, which has less rainfall. Temperature also causes a variation in the circulation of wind (a medium for transboundary pollution) in the atmosphere (Abdullah et al. 2017). This explains the 9% higher concentration of PM10 in the warm southwest monsoon season than in the wet northeast monsoon.
This finding is supported by some previous studies on the causes of variation in the concentration of air pollutants. Hassan et al. (2020) concluded that the high concentration of PM10 in Malaysia in the southwest monsoon is due to the warming effect in the dry season (southwest monsoon). Also, a high concentration of PM10 was observed in the warm season in the study of Kassomenos et al. (2014), which examined the variation in the atmospheric PM2.5 and PM10 in European cities. Nazif et al. (2019) also discovered that seasonal changes influence the PM10 variation in the atmosphere, with a higher concentration of PM10 in the warm season. According to Wang and Ogawa (2015), warm seasons impact the formation of atmospheric particles by enhancing the physical and chemical reactions of atmospheric precursors.
Figure 5 shows the trend of the PM10 concentration with variations in the annual PM10 level from 2012 to 2016. The annual concentrations of PM10 exceed the recommended Malaysian Air Quality Guidelines (RMG) of 50 µg/m³. The highest concentration values are recorded in 2013 (56.81 µg/m³) and 2015 (63.73 µg/m³). The relatively higher concentration in 2013 and 2015 is likely due to the haze pollution in Malaysia, which exacerbates poor air quality. PM10 has been linked to haze episodes, a significant challenge in Southeast Asia since the 1980s, due to the combustion of biomass (Shaadan et al. 2015). Transboundary pollution from Indonesian wildfires has been the main causative factor of haze pollution in Malaysia (Wen et al. 2016). The smoke generated from the wildfires is dispersed and transported by wind, polluting the atmosphere across borders. Studies have shown that wind speed influences the transportation of ambient atmospheric pollutants to other regions (Zhang et al. 2014). The haze event in 2015 caused a decline in the air quality in Peninsular Malaysia (Sulong et al. 2017).
Also, according to a study by Hassan et al. (2020), severe El Niño in dry seasons affects the weather, especially in Peninsular Malaysia, which in turn influences the PM10 variation. This study's outcome also corroborates the earlier research by How and Ling (2016) on the impact of particulate matter on the API during haze and non-haze periods. Their study identified particulate matter as the major contributor to air quality variation in Malaysia, highlighting a higher concentration of PM10 during the haze period in comparison to the non-haze period. The lower concentration in some years (Figure 5) is likely to have been influenced by the El Niño effect coupled with the transboundary and regional emission sources.
Thus, PM10 concentration tends to be more dominant in the southwest monsoon, with high warming in western Malaysia. In the northeast monsoon, there is a lower concentration of PM10 due to abundant rainfall. Climatic variables, such as temperature and wind, influence the concentration and variation of atmospheric PM10, in conjunction with the El Niño effect.

Inter-relationship between climatic variables and PM 10
It is noteworthy that the association between climatic variables and PM10 is complex and sometimes not linear (Czernecki et al. 2017). Nevertheless, a Pearson correlation analysis was carried out to examine the linear association between the climatic variables and the PM10 concentration.
Most of the fixed monitoring stations are located around residential areas, which according to Rani et al. (2018) contribute to high accumulation of atmospheric PM 10 . Thus, according to the authors, this location aids the high Air Pollution Index (API) value due to the atmospheric distribution of air pollutants. This is due to meteorological conditions that determine the emission and distribution strength, spatial location, and the seasons of this emission (Folberth et al. 2015). Pearson's correlation was used to ascertain the degree of linear correlation between the climatic variables and PM 10 . The correlation analysis was performed based on the geographic location of the stations and seasons. The correlation index of the climatic variables (humidity, temperature, and wind speed) with PM 10 is shown in Tables 3 and 4.
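For reference, the Pearson coefficient used in this analysis can be computed as in the following sketch. The study itself reports using Minitab for this step; the function name `pearson_r` is hypothetical and any sample values passed to it are illustrative, not the station data.

```python
import math


def pearson_r(x, y):
    """Pearson's linear correlation coefficient between two sequences."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # Undefined (division by zero) if either sequence is constant.
    return cov / math.sqrt(var_x * var_y)
```

Applying the function separately to the records of each station, or of each monsoon, would give per-station and per-season indices of the kind reported in Tables 3 and 4. The coefficient captures only the linear part of the association, which is why a weak value does not by itself rule out a non-linear dependence.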
We observed that temperature showed a positive correlation with PM10 concentration in all five stations, which agrees with studies in some other countries. For instance, Lee and Kim (2018) obtained a positive correlation between temperature and PM10 in Seoul, Korea while analyzing the effect of climatic variables on PM10 variation. Similarly, Dotse et al. (2016) obtained a positive correlation between temperature and PM10 while investigating the impacts of haze on PM10 concentration in Brunei Darussalam. The outcome supports the notion that temperature influences the PM10 formation and its variability in the atmosphere. This is due to the impacts of high temperature in inducing combustion activities and evaporation from the earth's surface in Malaysia, which explains the positive correlation between temperature and PM10 in this study.
Analysis of the seasons reveals that temperature exhibits a higher correlation index with PM10 during the wet northeast monsoon than during the warmer southwest monsoon. Studies such as Huang et al. (2016) and Hernandez et al. (2017) had earlier concluded that the concentration of PM10 declines in extremely hot weather conditions. In contrast, humidity tends to show a negative correlation with PM10 concentration in all five air monitoring stations (Table 3). This is logical because high humidity is associated with high rainfall, which decreases PM10 concentration in the atmosphere (Ng and Awang 2018; Gvozdić et al. 2011).
This outcome aligns with the study of Wang and Ogawa (2015) in Nagasaki, Japan, which examined the effect of meteorological factors on particulate matter. A negative correlation was also obtained in the study of Afzali et al. (2014) in Pasir Gudang, Johor. The authors used monitoring stations to determine the effects of meteorological variables on PM10. Comparatively, a negative correlation was observed in the southwest monsoon while a positive but very low correlation index exists in the northeast monsoon. This result indicates that the relationship between PM10 and humidity is not strictly linear: although an increase in humidity generally reduces the concentration of PM10, the sign of the correlation differs between the two monsoons. Zhang et al. (2015) reported a similar relationship between humidity and PM10 in three metropolitan cities in China. Two of these cities (Shanghai and Guangzhou) exhibit a negative correlation between humidity and PM10.
The wind speed shows a positive to negative correlation with PM10 concentration in Bukit Rambai and Nilai. Also, a positive correlation is observed in the monsoons, with the highest correlation in the northeast monsoon. Generally, high wind speed shows a positive correlation with PM10 in the study area. High wind speed can increase the velocity of transportation of air pollutants to other regions as opposed to low wind speed. An increase in wind speed accelerates air pollutant dispersion in the atmosphere (Afzali et al. 2014). This implies that low wind speed influences the accumulation of PM10 in the atmosphere. The negative correlation of wind speed and PM10 in some of the air quality stations may be due to the large ventilation of the air masses at these stations and the removal of the air pollutants to remote areas. As shown in the study of Cichowicz et al. (2020), wind speed exhibited an inverse relationship with PM10 in some air monitoring locations in Poland. That is, higher wind speed corresponds to a lower PM10 concentration and vice versa. Also, the location of the stations is another influencing factor, considering the conurbation and the emission from the vehicle fleet. This aligns with the outcome of earlier studies such as the study by Czernecki et al. (2017) of the impacts of atmospheric conditions on PM10 in Poland.
Thus, the result of the correlation analysis at the five monitoring stations indicates that high wind speed and temperature are major contributors to the dispersion of atmospheric pollutants in Malaysia, confirming the findings of Folberth et al. (2015) and Rani et al. (2018). Also, both temperature and wind exhibit a positive correlation with PM 10 in the northeast monsoon and southwest monsoon.

Prediction of PM 10 concentration using MLR
Due to the numerous assumptions about the regression model, which can lead to biased or unacceptable estimation of the dependent variable, multicollinearity among the independent variables was investigated. Multicollinearity describes the degree of correlation between the independent variables. A high correlation between the independent variables poses a problem, while a moderate correlation can be resolved (Daoud 2017). Table 5 shows the relationship between the independent variables.
According to Table 5, the three variables are uncorrelated except for wind speed and temperature. This agrees with the study by Zakaria et al. (2017), also in Malaysia, which reported a correlation index of 0.633 between wind speed and temperature. We therefore first fitted the regression model with all the variables and then excluded one of these variables to observe the changes in the model output. This was done because omitting a variable that is correlated with another variable in the model is one way to resolve multicollinearity (Daoud 2017).
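The multicollinearity screen described above can be sketched as a pairwise Pearson correlation check among the predictors. This is only an illustration: the readings are made-up values rather than the study's data, and the 0.8 cut-off for a "high" correlation is an assumed rule of thumb, not a threshold stated by the authors.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical readings (illustrative only, not the study's data).
predictors = {
    "wind_speed": [1.2, 1.8, 2.1, 2.6, 3.0, 3.4],
    "temperature": [26.5, 27.1, 27.0, 28.2, 28.0, 29.1],
    "humidity": [84.0, 79.0, 86.0, 77.0, 83.0, 80.0],
}

HIGH = 0.8  # assumed rule-of-thumb cut-off for a "high" correlation

# Correlation for every unordered pair of predictors.
pairwise = {
    (a, b): pearson(predictors[a], predictors[b])
    for a, b in combinations(predictors, 2)
}
flagged = [pair for pair, r in pairwise.items() if abs(r) >= HIGH]

for (a, b), r in pairwise.items():
    print(f"{a} vs {b}: r = {r:+.3f}")
print("Pairs above the cut-off:", flagged)
```

A flagged pair is a candidate for the omission test: refit the model without one member of the pair and compare the fit statistics.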
Multiple linear regression (MLR) was used to predict the concentration of PM10. The predictors considered for this model are the climatic variables: temperature, humidity, and wind speed. The dataset was divided into training and testing subsets: 80% of the data was used to train the model, while the remaining 20% was used to test and validate the predictions in Python.
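A minimal sketch of the 80/20 split and the least-squares fit is given below, assuming NumPy is available. The data are synthetic, the coefficients used to generate them are arbitrary, and the paper does not state which Python library was used, so this should not be read as the authors' actual script.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in data; the station records are not reproduced here.
n = 500
wind = rng.uniform(0.5, 4.0, n)
temp = rng.uniform(20.0, 35.0, n)
hum = rng.uniform(60.0, 95.0, n)
X = np.column_stack([wind, temp, hum])
# Arbitrary generating coefficients plus noise (hypothetical values).
y = 10.0 + 1.5 * wind + 2.0 * temp - 0.5 * hum + rng.normal(0.0, 5.0, n)

# Random 80/20 split into training and testing rows.
idx = rng.permutation(n)
n_train = int(0.8 * n)
train_idx, test_idx = idx[:n_train], idx[n_train:]


def add_intercept(features):
    """Prepend a column of ones so the fit includes an intercept."""
    return np.column_stack([np.ones(len(features)), features])


# Ordinary least squares on the training rows only.
coef, *_ = np.linalg.lstsq(add_intercept(X[train_idx]), y[train_idx], rcond=None)

# Predictions for the held-out 20%.
y_pred = add_intercept(X[test_idx]) @ coef

print("intercept, wind, temperature, humidity:", np.round(coef, 3))
```

Holding out the test rows before fitting keeps the validation statistics independent of the data used to estimate the coefficients.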
The statistical indices used are R², MAE, and RMSE. The generated MVR equation for the three predictors is given in Eq. (2).
$$\mathrm{PM}_{10} = -251.20 + 1.31\,(\text{Wind Speed}) + 4.03\,(\text{Temperature}) - 0.89\,(\text{Humidity}) \tag{2}$$
Figure 6 shows the residual plots of the PM10 regression model prediction. The acceptability and reliability of a linear regression model are usually validated by residual plots (Law and Jackson 2017). A residual is the difference between the observed and the predicted value. The residuals (errors) are essential for determining the performance of a statistical model (Abdullah et al. 2017). The adequacy of a linear regression model depends on homoscedasticity (Law and Jackson 2017), that is, on whether the residuals have the same variance and are evenly distributed. The normal probability plot of the residuals showed a normally distributed variance (i.e. it shows how well the data are distributed). Normally distributed data lie close to the straight line (Adio et al. 2019).
In this study, the normal probability plot is linear (Figure 6); thus, we postulate that the error terms are normally distributed. The residuals-versus-fits plot is used to test the goodness of fit of the model (Bowen 2018). To check the validity of the model assumptions, the residuals are plotted against the fitted values in a scatterplot. The distance of a point from 0 indicates how good or bad the prediction is, that is, whether the regression model under-predicted, over-predicted, or correctly predicted the value.
Our model showed a symmetrically distributed prediction because the points cluster towards the centre of the plot, which is zero (0), and are randomly distributed around the horizontal axis. From this, we establish that our regression model is acceptable and appropriate to an extent for the variables used. Also, the regression model is significant at the 95% confidence level, with a reported p-value of 0.000 (i.e. p < 0.001) and a standard error (S) of 12.8.
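The residual checks described above can be summarised numerically as well as graphically. The sketch below assumes NumPy and uses hypothetical observed and fitted values; the function name `residual_summary` is introduced here for illustration only.

```python
import numpy as np


def residual_summary(observed, fitted):
    """Summarise residuals (observed minus fitted) for a quick diagnostic."""
    observed = np.asarray(observed, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    resid = observed - fitted
    return {
        # A mean close to zero suggests no systematic over- or under-prediction.
        "mean": float(resid.mean()),
        "std": float(resid.std(ddof=1)),
        # Negative residual: the model predicted more than was observed.
        "share_overpredicted": float((resid < 0).mean()),
        # A value close to zero suggests no linear trend in residuals vs fits.
        "corr_with_fitted": float(np.corrcoef(fitted, resid)[0, 1]),
    }


# Hypothetical values (illustrative only, not the study's data).
observed = [42.0, 55.0, 38.0, 61.0, 47.0, 50.0]
fitted = [45.0, 50.0, 41.0, 57.0, 49.0, 48.0]

summary = residual_summary(observed, fitted)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

These numbers complement, rather than replace, the visual inspection of the residuals-versus-fits and normal probability plots.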
The statistical indices used to rank the performance of the PM10 concentration prediction are shown in Table 6. Generally, the MVR's predictive accuracy is acceptable, with a very low RMSE and a positive and high coefficient of determination. The coefficient of determination revealed that approximately 30% of the variation in PM10 is explained by the climatic variables. Low RMSE and MAE values are indicative of the model's acceptable predictive performance; this model has an RMSE of 12.737 and an MAE of 10.343. Some previous studies have also established the accuracy of MVR for air pollution prediction. The study of Karatzas et al. (2018) on urban air quality prediction in Poland recorded a high predictive performance for MVR, which outperformed the Artificial Neural Network (ANN) results. The RMSE of their model ranged from 6.690 to 22.391, and the MAE from 6.380 to 15.050. Afzali et al. (2014) also obtained an acceptable predictive performance (R² = 0.18) in Johor using a multiple regression model. Thus, MVR has proven to have better performance for the prediction of PM10 concentration in the study area.
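For reference, the three indices can be computed from paired observed and predicted values as sketched below. The sample numbers are hypothetical and serve only to show the arithmetic; they are not taken from Table 6.

```python
from math import sqrt


def regression_metrics(observed, predicted):
    """Return R2, RMSE, and MAE for paired observed and predicted values."""
    n = len(observed)
    errors = [o - p for o, p in zip(observed, predicted)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = sqrt(ss_res / n)
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    # R2 is the share of observed variance explained by the predictions.
    r2 = 1.0 - ss_res / ss_tot
    return {"R2": r2, "RMSE": rmse, "MAE": mae}


# Hypothetical values (illustrative only).
observed = [40.0, 52.0, 35.0, 60.0, 48.0]
predicted = [44.0, 50.0, 39.0, 54.0, 48.0]

metrics = regression_metrics(observed, predicted)
for name, value in metrics.items():
    print(f"{name} = {value:.3f}")
```

RMSE and MAE are in the units of the response, so their size should be judged against the typical PM10 level, whereas R² is unitless.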
A histogram of the actual and predicted PM10 concentrations is presented in Figure 7. It shows that the observed and predicted values differ by only a small margin.
Temperature and wind speed are critical factors that showed a strong influence on the prediction. This is reflected in the coefficients fitted for these two variables in Eq. (2): temperature has the largest absolute value, followed by wind speed. To further confirm the effects of the individual variables on the linear regression model, a Pareto chart was used. According to Harvey and Sotardi (2018), the Pareto chart is used to determine the relative importance of parameters to the model result. It plots the variables against their standardized effects (Adio et al. 2019). The horizontal axis represents the t-statistic, and the vertical axis lists the predictors used in the regression model. At the 95% confidence level, the critical t-value is 1.963, marked by a vertical dashed line that represents the significance level. Figure 8 shows that all the predictors are significant. In descending order of significance, the predictors are temperature, wind speed, and humidity.
To further investigate the significance of the variables in the output model, each variable was excluded in turn. Table 7 shows the regression output of the variable omission. The exclusion of any one of the variables affects the model outcome. Excluding temperature reduces the model performance to an R² of 0.237, compared with 0.298 for the full model. The MAE and RMSE of 10.618 and 13.280, respectively, also exceed those of the three-variable model, suggesting that temperature is an essential variable in the model. Omitting humidity also reduces the performance of the model, to an R² of 0.238, and the MAE of 10.782 and RMSE of 13.272 again exceed those of the three-variable model. Likewise, the model performance reduces when wind speed is excluded, giving an R² of 0.240, an MAE of 10.678, and an RMSE of 13.255.
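The variable-omission test can be sketched as refitting the model with each predictor left out and comparing the held-out statistics. The example assumes NumPy and uses synthetic data with arbitrary generating coefficients, so its numbers do not correspond to Table 7.

```python
import numpy as np

rng = np.random.default_rng(0)
names = ["wind speed", "temperature", "humidity"]

# Synthetic stand-in data (hypothetical values, not the study's records).
n = 400
X = np.column_stack([
    rng.uniform(0.5, 4.0, n),
    rng.uniform(24.0, 34.0, n),
    rng.uniform(60.0, 95.0, n),
])
y = 20.0 + 3.0 * X[:, 0] + 2.0 * X[:, 1] - 0.5 * X[:, 2] + rng.normal(0.0, 6.0, n)

# 80/20 split, as in the main model.
order = rng.permutation(n)
train, test = order[: int(0.8 * n)], order[int(0.8 * n):]


def design(rows, columns):
    """Design matrix with an intercept for the chosen rows and predictors."""
    return np.column_stack([np.ones(len(rows)), X[np.ix_(rows, columns)]])


def fit_and_score(columns):
    """Fit OLS on the training rows; return test-set R2, RMSE, and MAE."""
    coef, *_ = np.linalg.lstsq(design(train, columns), y[train], rcond=None)
    err = y[test] - design(test, columns) @ coef
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y[test] - y[test].mean()) ** 2)
    return {
        "R2": float(r2),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAE": float(np.mean(np.abs(err))),
    }


results = {"all predictors": fit_and_score([0, 1, 2])}
for i, name in enumerate(names):
    kept = [j for j in range(len(names)) if j != i]
    results[f"without {name}"] = fit_and_score(kept)

for label, scores in results.items():
    print(label, {k: round(v, 3) for k, v in scores.items()})
```

A larger drop in R² (or rise in RMSE and MAE) when a predictor is removed indicates that the predictor carries more of the model's explanatory power.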

Spatial prediction of PM10
After the regression model was tested, the coefficients of the independent variables were used to produce the PM10 hazard susceptibility map. The map was further classified into five classes (very high, high, moderate, low, and very low) in ArcGIS using the natural breaks (Jenks) classification. The maps of the independent variables are shown in Figure 9.
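To make the classification step concrete, the sketch below implements a natural breaks classification on a small list of predicted values using exact dynamic programming (the Fisher-Jenks formulation). It is an illustration under stated assumptions: the predicted values are hypothetical, the function names are introduced here, and the ArcGIS implementation may differ in its details.

```python
from bisect import bisect_left


def jenks_breaks(values, n_classes):
    """Upper class limits that minimise the within-class squared deviation."""
    data = sorted(values)
    n = len(data)
    # Prefix sums give the squared deviation of any sorted slice in O(1).
    s1, s2 = [0.0], [0.0]
    for v in data:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def ssd(i, j):
        """Squared deviation from the mean of data[i:j], with j > i."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: best cost of splitting the first j values into k classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    cut = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1][i] + ssd(i, j)
                if c < cost[k][j]:
                    cost[k][j], cut[k][j] = c, i
    # Walk back through the stored cut points to recover the class limits.
    breaks, j = [], n
    for k in range(n_classes, 0, -1):
        breaks.append(data[j - 1])
        j = cut[k][j]
    return breaks[::-1]


LABELS = ["very low", "low", "moderate", "high", "very high"]


def hazard_class(value, breaks):
    """Label a value by the first class whose upper limit is not below it."""
    return LABELS[min(bisect_left(breaks, value), len(breaks) - 1)]


# Hypothetical predicted PM10 values for a handful of map cells.
predicted = [31.0, 32.5, 33.0, 38.0, 39.5, 44.0, 45.0, 46.5, 52.0, 53.0, 60.0, 61.5]
breaks = jenks_breaks(predicted, 5)
print("upper class limits:", breaks)
print([hazard_class(v, breaks) for v in predicted])
```

The exact search is quadratic in the number of values for each class, so for a full raster it would normally be run on a sample of cells or replaced by a library routine.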
Notably, the variation of the wind speed in Figure 9(a) may be due to the El Niño effect and the terrain of the region (Shukla et al. 2020). Also, offshore wind speeds are usually greater than coastal wind speeds (Najid et al. 2009). Since wind speed increases with height, the wind speed may be lower in the coastal regions because of their low elevation.
Figure 10 shows the PM10 hazard map, which is classified into five classes: very low, low, moderate, high, and very high. Regions with a high concentration of PM10 are susceptible to greater health impacts than those with a low concentration. For instance, according to Malaysia's API table (APIMS 2020), areas with high air pollution are susceptible to dangerous health conditions such as lung and respiratory complications. Similarly, a very high PM10 hazard level is risky for public health, with the possibility of causing deaths if not well managed (Manisalidis et al. 2020). Conditions in areas with a moderate concentration of PM10 may deteriorate to an unhealthy level, while the low and very low hazard classes pose no significant threat to the community's health.
The very high hazard class covers 8153.82 km² (20.57%), high covers 3263.01 km² (8.23%), moderate covers 10,897.76 km² (27.49%), low covers 16,005.31 km² (40.38%), and the very low PM10 hazard class covers 1318.69 km² (3.33%). Most of the study area therefore falls within the moderate (27.49%) and low (40.38%) PM10 hazard levels (Table 8). In this study, the climatic variables have proven effective for predicting PM10. However, further work is required to include other spatial data such as road networks.
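The percentage shares quoted above follow directly from the class areas, as the short check below shows. Only the areas reported in the text are used; the variable names are introduced here for illustration.

```python
# Class areas in km2, as reported in the text.
areas_km2 = {
    "very high": 8153.82,
    "high": 3263.01,
    "moderate": 10897.76,
    "low": 16005.31,
    "very low": 1318.69,
}

total_km2 = sum(areas_km2.values())
# Share of the mapped area in each hazard class, in percent.
shares = {name: 100.0 * area / total_km2 for name, area in areas_km2.items()}

print(f"total mapped area: {total_km2:.2f} km2")
for name, share in shares.items():
    print(f"{name}: {share:.2f}%")
```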

Spatial distribution of PM10
An Air Pollution Index (API) susceptibility map for the study area was produced using data acquired from 2012 to 2016. To obtain the API, the sub-index value of PM10 was calculated using the formula in Table 9 (Rani et al. 2018). Most often, PM10 is the dominant pollutant in the API value (Department of Environment 2019). For instance, PM10 was found to be the main contributor to the haze event in 1997, so the API values were based on the PM10 sub-index (Department of Environment Malaysia 1997). More recently, however, PM2.5 concentration has become the highest among the pollutants and is now used to determine the API value. Owing to data accessibility constraints, and considering the sustained interest in the impacts of PM10, the current study used PM10 to determine the API value. Figure 11 presents the generated API risk map of the study area. The API values range from 41 to 58, denoting good to moderate air quality. According to the Department of Environment (2019), this does not pose any threat to human health. The generated API map (Figure 11) has a significant correlation with the PM10 hazard level above (Figure 10). Also, according to the Air Pollution Index of Malaysia (APIMS), the API level ranges from moderate to good (http://apims.doe.gov.my/public_v2/api_table.html). Thus, plans and regulations should be used to maintain good air quality for the sustainability of people and the environment.
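As a small illustration of how the reported API range maps onto air quality status, the sketch below assigns a status band to an API value. The band limits (0 to 50 good, 51 to 100 moderate, 101 to 200 unhealthy, 201 to 300 very unhealthy, above 300 hazardous) are an assumption based on the commonly published Malaysian API scale and should be checked against the APIMS table; the sub-index formula of Table 9 is not reproduced here.

```python
def api_category(api):
    """Map an API value to a status band (band limits are assumed, see text)."""
    bands = [
        (50, "good"),
        (100, "moderate"),
        (200, "unhealthy"),
        (300, "very unhealthy"),
    ]
    for upper, label in bands:
        if api <= upper:
            return label
    return "hazardous"


# The two ends of the API range reported for the study area.
for value in (41, 58):
    print(value, "->", api_category(value))
```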

Conclusion
This study investigated the impacts of three climatic factors on PM10 concentration in Malaysia, considering seasonal variation. PM10 was chosen because it has been the dominant pollutant in deriving the API value adopted in Malaysia. Considering the recent availability of and potential access to PM2.5 data, it is recommended that future studies focus on PM2.5. Correlation analysis and a multivariate linear regression model were used to determine the relationship between the climatic variables (temperature, wind speed, and humidity) and PM10. A relatively high and positive correlation index was obtained for temperature and wind speed; however, humidity correlates negatively with PM10. The regression analysis also showed a high predictive performance, which suggests a link between the climatic variables and PM10. The MVR model showed a positive correlation and high accuracy, with statistical indices of R² (0.30), RMSE (12.74), and MAE (10.34). The critical factors were identified, and the Pareto chart was used to determine the model's performance vis-a-vis the significance of each criterion. Although all the variables exhibit a significant influence on PM10, the chart indicates that temperature and wind speed are the most influential factors affecting PM10 in the study area. This corroborates the outcome of the correlation analysis. Given the nexus between the climatic variables and PM10 established in this study, projecting the future variation in the mass concentration of PM10 due to climate change is feasible. According to the spatial modelling of the PM10 hazard level, the study area is classified as very low (3.33%), low (40.38%), moderate (27.49%), high (8.23%), and very high (20.57%) PM10 hazard. The API map also showed good to moderate air quality, which is not dangerous to human health.
However, the increasing trend of climate change and urbanization in the world should be considered, as it may increase PM10 concentration in the future. Some other findings of this study are as follows. The annual mean concentration of PM10 exceeds the MAAQG guidelines. Also, the season plays a vital role in influencing the accumulation and concentration of PM10: according to this study, the PM10 concentration in the southwest monsoon is 9% greater than that in the northeast monsoon. Given the haphazard behaviour of the climatic variables with respect to the seasons and climate change effects, policies and guidelines need to be formulated and re-formulated to monitor the PM10 concentration with respect to these effects. Considering the effect of climate change on warming, one of the causative factors of bushfires in Malaysia, citizens should be made aware of its multiple impacts on the atmosphere and health. This will provide a fast means to curb bad air quality from this source. Finally, carbon emissions should be regulated in urbanized and industrialized areas, while green infrastructure should be encouraged and maximized to address urban and climate impacts in this context.
Figure 11. API map of the study area.