Predicting PM2.5 levels over the north of Iraq using regression analysis and geographical information system (GIS) techniques

Abstract Particulate matter (PM2.5) concentrations are a serious human health concern, and global models, the common method for estimating PM2.5, disregard local changes and factors. In this study, a polynomial model for PM2.5 prediction was proposed to examine the correlations among PM2.5, PM10, and meteorological parameters. The study was carried out in the north of Iraq and covered two provinces, Kirkuk and Sulaymaniyah. The data were gathered from different sources. Two datasets were used, collected during July 2019 and February 2020. To test the methodology, the model was applied to a small subset of the study area (5.6 km²) inside Kirkuk province. Observation and ground truth datasets were used to examine the model. Based on the July 2019 dataset, the mean local R² values were estimated at 0.98 in the north of Iraq and 0.97 inside Kirkuk province (the small subset). Based on the February 2020 dataset, the mean local R² value was estimated at 0.98 inside Kirkuk province. Prediction accuracies of 82% and 96% were obtained for July and February, respectively. Moreover, our findings highlighted that air quality, and hence its health impact, varied from moderate to unhealthy in the region.


Introduction
Systematic and long-term air quality monitoring enables sustainable planning to reduce and control particle pollution, contaminants, and air pollution.
Various studies point out the correlation between the concentration of fine particles and epidemics, and their effect on health is a subject of interest to researchers (He et al. 2001; Marcazzan et al. 2001; Ito et al. 2006; Borrego et al. 2016; Jumaah et al. 2018; Crippa et al. 2019; David et al. 2019). Particulate matter (PM) consists of a mixture of solids and liquids in the atmosphere that is introduced into the air by natural and anthropogenic sources (Querol et al. 2004; Hu et al. 2013). Particulate matter 2.5 (PM2.5) is mainly derived from combustion processes; it contains particles with a carbon core (with related hydrocarbons and elements), hydrocarbons, and minor constituents formed from sulfur oxides and nitrogen (Adams et al. 2015). The World Health Organization (WHO) concluded that PM itself is responsible for health impacts in related diseases and epidemics, an effect supported by the toxic traces it carries (Boldo et al. 2006). Due to the complexity of PM composition, it is necessary to control its sources, which contribute the toxic components of PM composites (Adams et al. 2015). Given the multiplicity of sources, PM occurs in different physical and chemical patterns and depends on climatic and geographic factors such as air temperature, humidity, radiation, rainfall, topography, and the proximity of a region to desert areas (Querol et al. 2004).
Air quality data are provided by monitoring stations as an air quality index (AQI) or as other indices with different meanings according to epidemiologic studies. When the AQI rises, air contamination becomes severe and results in adverse health effects (Wang and Chen 2017). Air contamination has become one of the main disturbing consequences of urbanization; therefore, air contamination monitoring and assessment are necessary (Jassim and Coskuner 2017). The AQI is a helpful indicator to describe the daily quality of air, and it might answer concerns related to health impacts (Liu 2002). The AQI was first used in 1999, and it covers six main air contaminants (Jassim and Coskuner 2017): fine particles (PM2.5), coarse particles (PM10), carbon monoxide (CO), nitrogen dioxide (NO2), ozone (O3), and sulfur dioxide (SO2). Most environmental studies have focused on understanding PM and on the ability to predict PM concentrations in a certain region (Ul-Saufie et al. 2013; Nazif et al. 2016; Ganesh et al. 2018; Jumaah et al. 2018; Sahu and Patra 2020). Generally, mathematical models are developed to give an effective description of the quantitative or statistical correlation that exists between the independent factors and particle levels (Tian and Chen 2010). PM2.5 contamination is especially harmful to human health because of the presence of toxic materials and strong acids, and because the small particle size permits penetration into the respiratory system (Jassim and Coskuner 2017). As a result, various countries have set quality standards that define limit values for PM concentration, together with dedicated networks to monitor air quality on regional and global scales (Manikonda et al. 2016). Additionally, worldwide measurements of PM are significant to epidemiological studies in terms of strategies for air quality control and forecasting (Van Donkelaar et al. 2006).
Given serious environmental pollution problems, studying and understanding air quality through monitoring is of great importance (Yang et al. 2018).
In this regard, researchers analyze and measure the correlation between different weather variables and PM2.5 (Li et al. 2019). Multiple regression analysis is generally used to estimate a dependent parameter based on the impact of independent parameters (Jumaah et al. 2018). The applied regression processes deploy geographical information system (GIS) and statistical methods (e.g., inverse distance weighting (IDW), multiple regression analysis, and a polynomial model). The approach is based on the idea that air quality at a certain place is influenced by nearby pollutant sources, which affect health more than distant sources. Using regression models and spatial data analysis enables us to inspect and describe the spatial relationships of the variables (their link to geographical locations). Defining the model configuration and estimating PM2.5 levels are crucial. Building a deterministic model that clearly describes the contamination usually requires dense datasets. For instance, one of the important input requirements is to define the source of pollution, although it might be difficult to determine or quantify the exact origin of pollutant emissions. Tian and Chen (2010) specified that local-scale investigations are mainly required in order to treat urban areas as a variable source of pollution.
Owing to the availability of various software packages for fitting multiple linear models, least-squares fitting of the linear mean function to geostatistical data leads to highly satisfactory results (Gelfand et al. 2010). Geostatistical analysis removes many of the errors and restrictions of traditional statistics on the basis of the theory of irregular distribution (Setianto and Triandini 2015). In recent times, geospatial analysis using GIS techniques has become one of the significant and efficient methods to define air effluent emissions (Tuna and Buluc 2015). Skidmore (2017) grouped statistical models into empirical, numerical, and derived models. Empirical models are based on observations and experiments, where an expectation is accepted once it is confirmed by real experience. Empirical models are therefore considered site-related, because they rely on local data and adequate samples. Accordingly, Hvidtfeldt et al. (2018) considered that covering and mapping air contamination exposure for various epidemiological considerations was infeasible and called for refining the modeling processes. Jumaah et al. (2018) applied a geographically weighted regression approach to an unmanned aerial vehicle (UAV)-based dataset of PM2.5 correlated with meteorological parameters and obtained model performance accuracies of 82% to 94%. In addition, the air-GIS framework applied by Hvidtfeldt et al. (2018) revealed a high level of accuracy for modeling the levels of black carbon, which was assessed based on PM2.5 and PM10. The authors noted that a correlation was expected between epidemiological investigations of health and PM impact. Also, changes in PM2.5 levels with seasonal variations were explored by Saxena and Jagdeesh (2019), who suggested that additional investigation of this anomaly was needed in the region.
Consequently, air quality models are increasingly essential for public awareness, air quality management, and research purposes (Donnelly et al. 2019). Traffic-related air pollution makes a significant contribution to early mortality rates in developed and developing countries, and most studies (Chambliss et al. 2014; Silva et al. 2016; Anenberg et al. 2019) have highlighted the contribution of transport emissions to PM2.5 exposure. Also, the effect of some pollutants on Kirkuk air quality was studied by Mohamedali et al. (2020) through mapping the pollutant levels at different places and stations within the city. Pollutant concentrations exceeded the standards at most of the stations as a consequence of refineries and many traffic intersections. According to Fan et al. (2020), AQI in China increased by 36% when coal-fired winter heating was turned on, which in turn increased the mortality rate by up to 14%. Similarly, long-term exposure to fine particulate (PM2.5) pollutants is associated with heart disease and higher mortality rates; where extra risks occur and levels exceed the specified US agency standards, continuous reduction of air contamination is called for (Hayes et al. 2020).
Ultimately, basic statistics, detailed maps of environmental pollutant distribution, and regression analysis help to predict pollutant levels (Fathoni et al. 2013; Tuna and Buluc 2015; Jumaah et al. 2018; Wu et al. 2018; Jumaah et al. 2019; Gui et al. 2020). However, an adaptive neuro-fuzzy inference system is more precise than the regression approach in forecasting time-series records (Zeinalnezhad et al. 2020), given that air contamination modeling has nonlinear and complex components. Their study intended to address that limitation by improving the precision of the day-to-day estimate of contaminants. Using three algorithms (support vector machine, naive Bayes, and random forest), Bali (2020) analyzed air quality predictions for different inputs and reported a high prediction accuracy of 99%. Therefore, the precise modeling and estimation of AQI and its relationship with other factors need more investigation.
Recently, some cities in Iraq have been at a higher risk of air pollution, and this phenomenon occurs widely and frequently in the region. In terms of air pollution, based on the global air quality service provider (downloaded from https://air-matters.com/), the three cities of Kirkuk, Baghdad, and Najaf exhibited very poor air quality in 2019. In this study, we applied statistical analysis to two types of datasets (observation and ground truth) to predict and examine air pollution in Kirkuk and Sulaymaniyah, with enhanced experimental spatial patterns and an exploration of the properties of the prediction parameters. Regression processes combined with GIS techniques allow a better description of the correlation between the main pollutant and other variables and explain their spatial effects. The principal goals of this research were: 1) to build an air quality forecast model for the urban area capable of estimating PM2.5 at all sites of the study area using linear regression, a polynomial model, and GIS techniques, 2) to model the spatial variation on a large scale, and 3) to examine the health impacts of particulate matter (PM) on humans.

The study area
The study area lies in the north of Iraq and covers two provinces: Kirkuk (9,679 km²) and Sulaymaniyah (20,144 km²). It is located between latitudes 36°27′15″ and 36°37′50″ N and longitudes 43°10′37″ and 46°17′50″ E (Figure 1). Geographically, Iraq is located in Southwest Asia. The average temperature ranges between 50 °C in summer and 0 °C in winter, and the annual rainfall varies from 100 to 180 mm. The heaviest rainfall occurs between December and April, and the mountainous regions in northern Iraq receive more rainfall than other regions (Al-Bayati and Al-Salihi 2019). Kirkuk has a hot semi-arid climate with extremely hot, dry summers and cold winters (Buraihi and Shariff 2015). The climate of the Sulaymaniyah region is continental and arid, with dry hot summers, cold winters, and high evaporation in summer due to high temperatures and relatively low humidity (Ali et al. 2015). According to Ajaj et al. (2018), an exploratory analysis in Kirkuk using a GIS-based spatial technique reported a high incidence of blood disease patients in 2017. Based on their findings, the highest incidence of blood disease occurred in the southern parts of the city, whereas the lowest prevalence was recorded in the northern parts of the city and several quarters in the city center. This raises concerns about the environmental health threat and its correlation with air quality in the region as a research subject.

Data and processes
Weather data such as temperature, surface wind speed (m/s), and humidity (%) significantly improve model performance (Hu et al. 2013), and meteorological parameters are essential features that affect PM2.5 levels (Kong and Tian 2020). Therefore, daily PM2.5, PM10, temperature, and humidity values for Iraq (July 2019) were acquired from Air Matters, the global air quality service provider (https://air-matters.com/). Missing data, such as PM2.5 and PM10 for some locations, were compiled from Meteoblue, the worldwide local weather information site (https://www.meteoblue.com/). The wind data were gathered from Weather Online Ltd. meteorological services (https://www.weatheronline.co.uk/). All data were in point format and were collected from nine stations inside and around the study area. The procedures applied to the collected datasets are described in Figure 2. All collected datasets were processed geostatistically using ArcGIS version 10.3, and an IDW analysis was applied to the acquired data from the existing air quality and weather stations to obtain continuous and detailed parameters for the region.
The significance of the IDW method is that a smooth and connected grid can be produced in which information is extrapolated from the data in the given area (Zaki et al. 2019). It is one of the most commonly used methods in geoscience calculations because of its simple hypothesis (Sun et al. 2019). IDW is reported to be the best interpolation process for predicting the state of air contamination and to be more reliable than the ordinary kriging (OK) or universal kriging (UK) interpolation methods (Gong et al. 2014; Vorapracha et al. 2015; Jumaah et al. 2019). Interpolation determines the value of cells at locations that lack measured data (Ajaj et al. 2017; Jumaah et al. 2019). In addition, interpolation is built on the concept of spatial dependence; it measures the degree of dependency between near and distant features (Ajaj et al. 2017). Moreover, a remotely sensed moderate resolution imaging spectroradiometer (MODIS) image captured on 10 February 2020 was downloaded from NASA Earthdata. Additional measurements were taken during February 2020 with an air quality multimeter (Figure 3) over an area of 5.6 km² inside Kirkuk.
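The weighting idea behind IDW can be illustrated with a minimal sketch. It is not the ArcGIS implementation used in the study: the function name `idw_interpolate`, the power of 2, and the station values are assumptions chosen only for illustration.

```python
import math

def idw_interpolate(stations, target, power=2.0):
    """Estimate a value at `target` (x, y) from (x, y, value) station tuples.

    Each station is weighted by 1 / distance**power, so nearby stations
    dominate the estimate. The power of 2 is an assumed, commonly used value.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in stations:
        distance = math.hypot(target[0] - x, target[1] - y)
        if distance == 0.0:
            return value  # the target coincides with a station
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

# Hypothetical stations: (x, y, PM2.5 in µg/m³); these are not data from the study.
stations = [(0.0, 0.0, 10.0), (2.0, 0.0, 30.0), (0.0, 2.0, 50.0)]
midpoint_estimate = idw_interpolate(stations, (1.0, 0.0))
```

Because the estimate is a weighted average, it always stays between the smallest and largest station values, which is one reason IDW surfaces look smooth but cannot reproduce peaks that lie between stations.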

Regression analysis and modelling
Based on all station data, the IDW interpolation was applied. Afterward, 36 input points were randomly chosen within the study area from the IDW outputs and used to build the polynomial model. In addition, the ground truth dataset (southern part of Kirkuk province, an area of 5.6 km²) was used to validate the model and measure its transferability using the same procedures and parameters that were applied for the two provinces. Measurements were taken manually across this small subset of the region using the air quality multimeter, during July 2019 and February 2020.
PM2.5 was used as the dependent variable, and the other parameters (PM10, temperature, humidity, and wind) were used as independent factors. Similarly, for the validation process, the polynomial model for predicting PM2.5 levels was also applied in the subset area. The U.S. Environmental Protection Agency (EPA) defines the values of AQI and PM2.5 levels based on their impacts on human health. Table 1 lists the daily PM10 and PM2.5 levels (µg/m³) and the AQI as modified by the EPA, used as the reference for this study.
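Since Table 1 is not reproduced in this text, the following sketch shows how a PM2.5 concentration can be mapped to an air quality category. The breakpoints are an assumption: they follow the US EPA 24-hour PM2.5 scheme that applied before its 2024 revision, and the function name `pm25_category` is hypothetical.

```python
# Assumed upper limits (µg/m³, 24-hour PM2.5) for each category, following the
# US EPA breakpoints that applied before the 2024 revision. Table 1 of the study
# may differ in detail.
PM25_BREAKPOINTS = [
    (12.0, "Good"),
    (35.4, "Moderate"),
    (55.4, "Unhealthy for sensitive groups"),
    (150.4, "Unhealthy"),
    (250.4, "Very unhealthy"),
]

def pm25_category(concentration):
    """Return the category label for a PM2.5 concentration in µg/m³.

    The EPA truncates concentrations to one decimal before the lookup;
    this sketch skips that step for brevity.
    """
    if concentration < 0:
        raise ValueError("concentration must be non-negative")
    for upper_limit, label in PM25_BREAKPOINTS:
        if concentration <= upper_limit:
            return label
    return "Hazardous"

# Values quoted later in the text as prediction ranges.
categories = [pm25_category(c) for c in (29.43, 47.65, 61.9)]
```

With these assumed breakpoints, the three example values fall into the moderate, unhealthy for sensitive groups, and unhealthy categories.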
Figure 3. The air quality multimeter.
Many common statistical models decompose random response variables into a systematic structure that explains the mean and a random structure that defines the variation and covariation among the responses. The linear model equation is expressed as:
$$Z = X\beta + \delta \tag{1}$$
where $Z$ is the response vector, here the predicted PM2.5; $X$ is the design matrix of the regression variables, here PM10, temperature, humidity, and wind; $\beta$ is the vector of parameters (the coefficients); and $\delta$ is the vector of random errors. For data with many variables, the linear model (square root) is specified as:
$$Z_{ijk}^{1/2} = \mu + \tau_k + \beta_{jk} + \delta_{ijk} \tag{2}$$
where $\mu$ is the intercept coefficient, $\tau_k$ is an effect due to treatment $k$, $\beta_{jk}$ is an effect associated with the $j$th column of treatment $k$, and $\delta_{ijk}$ is a random error (Cressie 1992).
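A least-squares fit of the linear model can be sketched as follows. This is an illustration under stated assumptions, not the procedure run in the study: the sample values and the coefficients in `true_beta` are hypothetical, and NumPy's `numpy.linalg.lstsq` stands in for whatever solver the GIS software uses.

```python
import numpy as np

# Hypothetical samples (not data from the study).
# Columns: PM10 (µg/m³), temperature (°C), humidity (%), wind speed (m/s).
predictors = np.array([
    [30.0, 44.0, 21.0, 2.1],
    [35.0, 43.0, 25.0, 2.3],
    [42.0, 40.0, 33.0, 2.6],
    [50.0, 36.0, 45.0, 3.0],
    [58.0, 30.0, 55.0, 3.2],
    [66.0, 20.0, 68.0, 3.5],
])

# Design matrix X: a leading column of ones carries the intercept.
design = np.column_stack([np.ones(len(predictors)), predictors])

# Synthetic response built from assumed coefficients (intercept first) with
# zero random error, so that the fit can be checked against a known answer.
true_beta = np.array([5.0, 0.4, 0.3, -0.1, 1.5])
response = design @ true_beta

# Least-squares estimate of the coefficient vector and the fitted PM2.5 values.
beta_hat, *_ = np.linalg.lstsq(design, response, rcond=None)
predicted = design @ beta_hat
```

With error-free synthetic data the estimated coefficients reproduce `true_beta`; with real measurements the residual `response - predicted` plays the role of the random error term.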

Model performance and accuracy assessment
The regression model of the proposed polynomial method and its performance were assessed by the coefficient of determination, R-squared (R²), and the probability value (P-value) (Jumaah et al. 2019). The P-value is the probability of observing a correlation at least as strong as the measured one if no true correlation existed; a value below 5% (P < 0.05) indicates a statistically significant correlation coefficient. A high R² is preferred because it reflects the accuracy of model performance. In the final stage of model estimation, forward computations were used to obtain a higher value of R² adjusted for the model complexity. Moreover, a common method to measure the fitness and accuracy of the model is data fitting (Jumaah et al. 2019). Fitting processes apply mathematical analytical functions. The polynomial equation can be set as:
$$f(x) = c_0 + c_1 x + \dots + c_n x^n \tag{3}$$
for certain coefficients $c_0, \dots, c_n$. If $c_n \neq 0$, the function is of order $n$. For the fitted coefficients, the confidence bounds can be defined as:
$$C = b \pm t\sqrt{S} \tag{4}$$
where $b$ is the coefficient generated by the fit, $t$ depends on the level of confidence, and $S$ is the vector of diagonal elements from the estimated covariance matrix. Simultaneous prediction bounds for the function and for all predictor values are specified by:
$$P_{s,p} = \hat{y} \pm f\sqrt{x S x^{T}} \tag{5}$$
where $\hat{y}$ is the fitted value at the predictor value, $x$ is the corresponding row of the design matrix, $S$ here denotes the full estimated covariance matrix of the coefficients, and $f$ depends on the confidence level and is computed using the inverse of the F cumulative distribution function (Shareef et al. 2014; Jumaah et al. 2018).
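The fit statistics described above can be sketched for a single predictor as follows. The helper `fit_polynomial` and the sample data are hypothetical, and the adjusted R² formula shown is the standard one, assumed here because the text does not state which variant was used.

```python
import numpy as np

def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit.

    Returns the coefficients ordered c_0..c_n, R², and adjusted R².
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # np.polyfit returns the highest-order coefficient first; reverse to c_0..c_n.
    coefficients = np.polyfit(x, y, degree)[::-1]
    fitted = sum(c * x ** k for k, c in enumerate(coefficients))
    ss_res = float(np.sum((y - fitted) ** 2))    # residual sum of squares
    ss_tot = float(np.sum((y - y.mean()) ** 2))  # total sum of squares
    r_squared = 1.0 - ss_res / ss_tot
    n_obs = len(y)
    # Standard adjustment for the number of fitted terms besides the intercept.
    adjusted = 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - degree - 1)
    return coefficients, r_squared, adjusted

# Hypothetical data generated from y = 1 + 2x + 0.5x² (not measurements from the study).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = 1.0 + 2.0 * x + 0.5 * x ** 2
coefficients, r_squared, adjusted = fit_polynomial(x, y, degree=2)
```

Adjusted R² never exceeds R² and drops when extra terms add little explanatory power, which is why it is the safer quantity to compare across polynomial orders.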

Geo-statistics outputs
Based on the July 2019 dataset and ArcGIS geostatistical analysis, the IDW method was adopted to represent the distribution of the factors and to interpolate between site sample points, and result maps were created for each parameter in the study area (Figure 4). Figure 4 shows the IDW output maps of PM2.5, PM10, temperature, humidity, and wind. Generally, the obtained PM2.5 values varied between 10 and 47.99 µg/m³ in the entire region. More specifically, in the study area the PM2.5 values ranged between 10 and 12.13 µg/m³ (good air quality, in green) over a small part of the area, 12.13-35.49 µg/m³ (moderate, in yellow) with predominant coverage, and 35.49-47.99 µg/m³ (unhealthy air for sensitive people, in orange). PM10 values ranged from 27.65 to 67.99 µg/m³ in the whole region, mostly representing good quality with values below 55 µg/m³ (in green). Temperatures rose from approximately 11 to 45 °C from north-east to west in the study area. In contrast, humidity increased from 20% to 70% from west to east. Wind speed ranged from 2 to 3.56 m/s, with a slight increase from west to north-east, south, and south-east.
Based on the remotely sensed image, PM2.5 values were mapped. The values ranged between 18 and 200 µg/m³, which confirmed the unhealthy air inside the study area during February 2020. Figure 5 shows the PM2.5 distribution map based on the MODIS image. Table 2 lists the regression outputs of the July 2019 dataset, where most of the independent parameters exhibited a high level of correlation with PM2.5. Table 3 lists the regression outputs of the February 2020 dataset.

Regression outputs
To construct the model, the correlation was tested to measure the strength of the linear relationship between PM2.5 and the other variables. All parameters showed a P-value lower than 0.05, indicating a significant correlation with PM2.5. In this step of the analysis, the main objective was to identify the correlations among the parameters that influence the linear relationship.
To create a regression model, it is essential to study the influential factors, taking those with a P-value < 0.05 as inputs, preferably with high values of R². The regression was implemented based on equation 2 (the multiple linear model equation). The case involved an analysis of the relationship between each independent parameter and the dependent variable. In the prediction equation, PM2.5 (Estimated) is the calculated concentration, in µg/m³, of particulate matter with a diameter of 2.5 microns, PM10 is the concentration, in µg/m³, of particulate matter with a diameter of 10 microns, T is the surface air temperature in °C, H is the humidity of the environment in %, and W is the wind velocity in m/s. It is necessary to detect possible dependence among the predictors when constructing the model. The predicted model is also beneficial for PM2.5 estimation in non-monitored locations (Thongthammachart and Jinsart 2020). The prediction equation was then applied to a small subset of Kirkuk province for some randomly selected points. The estimation achieved a high R² value (0.96) for the July 2019 dataset. The model was also constructed from 20 points collected in February 2020 with the air quality multimeter; the estimation achieved an R² of 0.98. Afterward, model test validation was performed using additional values of PM2.5. Figure 6a shows the prediction map of PM2.5 in Kirkuk and Sulaymaniyah provinces, north of Iraq, for July 2019; Figure 6b shows the prediction map of PM2.5 in Kirkuk over an area of 5.6 km² for July 2019; and Figure 6c shows the prediction map of PM2.5 in Kirkuk over an area of 5.6 km² for February 2020.
It is important to note that the estimated model does not yield the real or absolute value, but its prediction can be considered close to the real value (Shareef et al. 2014; Jumaah et al. 2019). The predicted PM2.5 values ranged between 35.92 and 47.65 µg/m³, indicating unhealthy air quality for sensitive groups in the two provinces in the north of Iraq during July 2019. The estimated PM2.5 values in the small subset area inside Kirkuk during July 2019 ranged between 47.46 and 47.57 µg/m³. The February 2020 prediction map for the small subset area inside Kirkuk showed high PM2.5 values, ranging between 29.43 and 61.9 µg/m³ and indicating three categories of air quality: moderate (in yellow), unhealthy for sensitive groups (in orange), and unhealthy (in red).

Cross-validation outputs
Validation was performed together with model building in the study area; that is, the estimated PM2.5 values were fitted against the measured (ground truth) PM2.5 values (Figure 7, PM2.5 cross-validation). The correlation coefficient was calculated to evaluate the predictive potential of the regression model. The results showed that all the parameters within the model equation were statistically correlated, which indicated that the predictions made from the model were in good agreement with the inputs. As a result, the regression model generated a high correlation coefficient, with an R² value of 0.98 in Kirkuk and Sulaymaniyah provinces. Furthermore, an R² value of 0.97 was obtained by validating the same equation that was used to estimate PM2.5 inside Kirkuk province within the area of 5.6 km², based on the July 2019 dataset. An R² value of 0.98 was also obtained in Kirkuk over the 5.6 km² area by validating the February 2020 dataset collected with the device. Moreover, model validation was performed by fitting the predicted PM2.5 against the measured PM2.5 data. Figure 8 shows the model validation in Kirkuk over the 5.6 km² area.
The obtained R² was equal to 0.82 and 0.96 for July 2019 and February 2020, respectively. Compared with the previous predictions in the north of Iraq, the lower value might be due to variation in PM2.5 during the measurements, which report the average PM2.5 value at each point. Nevertheless, these values describe the predictive ability of the model as 82% and 96%.
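A validation of this kind, comparing predicted against measured values, can be sketched as below. The sample readings and the function names are hypothetical, and treating R² as the squared Pearson correlation is an assumption, since the text does not specify how the validation R² was computed.

```python
import math

def squared_correlation(measured, predicted):
    """Squared Pearson correlation between measured and predicted values."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    covariance = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return covariance ** 2 / (var_m * var_p)

def rmse(measured, predicted):
    """Root-mean-square error, in the same units as the inputs."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)

# Hypothetical ground truth and model output in µg/m³ (not data from the study).
measured = [30.0, 40.0, 50.0, 60.0]
predicted = [32.0, 39.0, 52.0, 58.0]
r2_example = squared_correlation(measured, predicted)
rmse_example = rmse(measured, predicted)
```

The squared correlation is insensitive to a constant offset or scale error in the predictions, so an error measure such as RMSE is a useful companion when judging how close predicted concentrations are to the measured ones.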

Discussion
The PM2.5 concentration in Figure 4 indicated unhealthy air for sensitive people in all of Kirkuk and the west part of Sulaymaniyah province. However, moving away from the western border of Sulaymaniyah province, PM2.5 was at a moderate level in the north, south, east, and center of this province. The prediction of PM2.5 using the regression method (Figure 6) indicated unhealthy air quality for sensitive groups along the central border between the two provinces in July 2019. The risk increased in Kirkuk during February 2020, when some locations in the south of Kirkuk province showed unhealthy air quality. In addition, the mapped PM2.5 distribution of the study area in February showed unhealthy air quality in the two provinces. The recent increase in air pollution (PM2.5) in Sulaymaniyah is related to unhampered industrial development, which resulted in poor air quality (Arif et al. 2018) and caused serious health problems. Based on blood disease maps for 2017, the increase in patients was detected in the southern areas of Kirkuk city, with fewer blood disease patients in the city center and northern parts. In general, the disease conditioning factors were the reason for the disease occurrences in the study area (Ajaj et al. 2018). The economic and industrial growth that took place in Sulaymaniyah city led to poor air quality associated with crucial health problems, increasing the risk of death from cardiopulmonary diseases and lung cancer, specifically when people are exposed to high levels of pollutants over time (Arif et al. 2018). The roads and transportation sector was the most important source of PM2.5 concentrations (Al-Arkawazi 2020), and long-term exposure to fine particles leads to respiratory problems (Attiya and Jones 2020).
The health impact of PM10 corresponded to good and moderate air quality. Based on the PM10 results and distribution, the zones exposed to higher PM2.5 with unhealthy air quality (all of Kirkuk and the west part of Sulaymaniyah province) were classified as moderate AQI in terms of PM10. The rest of the region qualified as good AQI with regard to PM10. Thus, there was no critical impact on human health in the study area in terms of the presence of PM10.
According to the IDW outputs of the meteorological variables, Kirkuk province showed higher temperatures (42.55-44.99 °C), lower humidity (20.01-32.12%), and lower wind speed (2.01-2.39 m/s) than Sulaymaniyah province. The concentration of PM2.5 showed a correlation with those factors, which might contribute to the air quality and pollution in Kirkuk. The proposed model also indicated a high correlation between PM2.5 and the independent factors. To test the contribution of each parameter, we investigated the relationship between PM2.5 and each parameter individually. Temperature was the most important independent variable and contributed most to the assessment, followed by PM10, wind, and humidity. In Figure 7, cross-validation was applied to the model to assess its typical precision and to examine the range of agreement at the recorded positions. The results showed that the full recorded datasets were within the confidence bounds, with an R² value of 0.98 in the north of Iraq (the two provinces) and R² values of 0.97 and 0.98 in the small subset inside Kirkuk for July 2019 and February 2020, respectively. As shown in Figure 8, the ground truth data were used to establish the degree of confidence in the observations and to test the performance of the model. The predicted values were verified with R² equal to 0.82 and 0.96 for July 2019 and February 2020, respectively.

Conclusion
To calculate ground-level PM2.5 concentrations with a polynomial model, this paper studied the correlation of PM2.5 with PM10 and with significant meteorological parameters (humidity, surface air temperature, and wind speed) as the independent variables. Two equations to estimate PM2.5 concentrations were developed and evaluated based on the July 2019 and February 2020 datasets. The results showed that the use of all meteorological datasets could significantly improve model performance in the region. Additionally, our findings indicated that PM10 also had a significant relationship with PM2.5 prediction.
Furthermore, the prediction reached high accuracy, with R² = 0.98. Accuracy was also assessed in a small subset in Kirkuk using measured samples as ground truth data. For the chosen points, the results displayed high accuracy, with R² = 0.97 and 0.98 for July 2019 and February 2020, respectively. Model cross-validation evaluated the model at R² = 0.82 and 0.96 for July 2019 and February 2020, respectively. The predicted PM2.5 and its distribution in the two provinces implied moderate to unhealthy air quality for people with respiratory diseases, and sensitive people are advised to limit outdoor exertion. In addition, unhealthy air quality increased in 2020.
The research highlighted the effect of industrial zones and recommended monitoring, controlling, and reducing exposure to particles and pollutants from factories using alternative methods and mitigation strategies. Promoting clean and renewable energies instead of fossil fuels, increasing public awareness of the impact of pollutants, and increasing afforestation around the cities to reduce the effects of pollution, along with early warning and prediction, would make the future more promising. To improve public health and to gain a broader understanding of the relationship between the compounds of PM2.5 and PM10 and their toxicological impacts, further investigations of sampling sites and causes of air contamination should be made in the study area. Moreover, our method could inform the study of epidemiology and of the recent COVID-19 pandemic in terms of PM concentrations and the spatial distribution of infections. Future work will address other influential factors behind exceedances of PM concentrations and their spatial distribution in the region, in combination with more meteorological parameters.