Hybrid electric vehicle data-driven insights on hot-stabilized exhaust emissions and driving volatility

Abstract Despite the fuel use and emission benefits of hybrid electric vehicles (HEVs), few studies have characterized in detail emission patterns and driving volatility profiles from HEVs in different road types under real-world driving conditions. This article examined the relationship between hot-stabilized emissions, engine variables, internal combustion engine status, roadway characteristics, and vehicular jerk types. Data were collected from a Toyota HEV subcompact on a driving route over rural, urban, and highway roads in the Aveiro region (Portugal). Driving volatility was represented by six driving styles based on combinations of acceleration/deceleration and vehicular jerk. Clustering and Disjoint Principal Component Analysis (CDPCA) was applied to extract principal components and identify specific clusters among variables. Both route type and internal combustion engine (ICE) operating behavior showed to have an impact on the distribution of vehicular jerk types. The proposed CDPCA constrained to the road sector exhibited different shapes in the clusters of the jerk types between ICE operation status. This article can provide insights into emissions monitoring analysis of the new generation of HEVs about the description of volatile driving behaviors.


Introduction and literature review
Vehicles powered by internal combustion engines (ICE) contribute to environmental pollution, thus endangering human health, and the harmful gases they produce have accelerated climate change effects (Mayyas et al., 2017). There is a consensus that the mitigation of environmental pollution can be achieved by using cleaner and more efficient methods of energy use. Owing to their notable fuel savings and lower tailpipe emissions of carbon dioxide (CO2) relative to ICE vehicles (Wang et al., 2020; Zeng et al., 2021), and in line with sustainable development goals (Kang, 2021), hybrid electric vehicles (HEVs) are becoming popular in the United States of America (USA) and European Union (EU) countries. In the first quarter of 2022, HEVs represented 25% of total passenger car sales in the EU (ACAE, 2022).
The main purpose of using an HEV propulsion system is to decrease fuel consumption by using an ICE assisted by an electrical motor (EM) to provide the required overall vehicle power output. The EM uses the energy stored in the batteries, which is either produced by the ICE or through regeneration from braking, to power auxiliary loads and reduce engine idling when the vehicle is stopped (Mi & Masrur, 2017). HEVs are typically categorized according to their capability for full-electric driving (full HEV) or not (mild HEV) and divided in terms of powertrain configurations (parallel, series, power-split, and multi-mode) (Zhuang et al., 2020). Optimization strategies for HEVs aim at minimizing fuel use or emissions, which in turn depend on the above HEV types (Borthakur & Subramanian, 2019).
Although many countries have frequently used vehicle taxes and purchase subsidies as incentives to spur the electric vehicle (EV) market, the penetration of these vehicles into the privately-owned passenger vehicle fleet is still lagging behind new vehicle sales. The forecasts for 2031 predict that EVs will represent about 9% and 20% of all new vehicle sales in the USA and Europe, respectively (Wall, 2019), meaning that ICE vehicles and HEVs will still be dominant in the worldwide vehicle fleet in the medium term.
A good body of research has been devoted to exploring passenger car HEVs and conventional gasoline/diesel vehicles' fuel use and emissions under real driving emissions (RDE) conditions. Two general observations hold across the state-of-the-art: (1) HEV fuel use and CO2 are usually lower than those of a comparable gasoline ICE-powered vehicle (Holmén & Sentoff, 2015; Huang et al., 2019; Wang et al., 2020); and (2) restart of the ICE results in sporadic spikes in gas-phase and ultrafine particle emissions in HEVs exceeding those obtained in hot-stabilized conditions (Conger & Holmén, 2015; Duarte et al., 2014; Fernandes et al., 2021; Huang et al., 2019; Robinson & Holmén, 2011; Wang et al., 2021).
Emission performance of HEV powertrains also depends on vehicle operating mode, ambient temperature, road grade, road geometry, congestion levels, or type of traffic control (Alvarez & Weilenmann, 2012; Alvarez et al., 2010; Fernandes et al., 2021; Robinson & Holmén, 2020; Sullivan & Sentoff, 2020; Zhai et al., 2011). One of the first studies was conducted by Zhai et al. (2011), who developed a modal emissions model for a 2001 Toyota Prius based on vehicle-specific power (VSP) and associated with startup and shutdown of the ICE. Alvarez and Weilenmann (2012) examined the impact of low ambient temperature on CO2 emissions for five HEV models and revealed that the HEVs could reduce the cold-start extra emissions by 30% to 85% in comparison to a gasoline ICE vehicle. Ambient temperature is a relevant factor in the performance of a hybrid system battery, which in turn affects the CO2 emitted by HEVs (Alvarez et al., 2010). A recent study operating one HEV passenger car on four roadway sections demonstrated the relevance of using real-world road grade as an input parameter in the estimation of CO2 using a VSP model; the coefficient of determination (R²) was 0.6 and 0.87 by using grade VSP and non-grade VSP models, respectively (Robinson & Holmén, 2020). Another study (Fernandes et al., 2021) reported significant variations in HEV-related fuel use and emissions under speed variations.
One factor overlooked in HEV emission analysis has been mobilizing drivers to adopt eco-driving behaviors (Barkenbus, 2010). Eco-driving is recognized as more cost-effective than fleet retrofit strategies, with low-cost and immediate in-vehicle implementation (Sivak & Schoettle, 2012; Xu et al., 2017). Eco-driving also improves fuel efficiency and reduces CO2 emissions (Huang et al., 2018). Despite the claimed benefits, eco-driving lacks an integration of qualitative driving patterns into vehicle hardware that would generate more constant and uniform improvements under real driving conditions (Barkenbus, 2010).
Some of the major factors of eco-driving are acceleration, deceleration, and driving speed (Huang et al., 2018). HEV and EV drivers very often adopt calm driving behaviors to obtain significant fuel economies (Liu et al., 2015), i.e., they tend to have less volatile driving styles than drivers of ICE vehicles. Volatile driving stems from reckless and aggressive driving styles that are associated with higher variations in acceleration/deceleration or higher acceleration values (Huang et al., 2019), thus resulting in higher power requests to the vehicle powertrain. It can also depend on road type and on traffic conditions related to congestion or the presence of traffic incidents (Huang et al., 2019). Such driving styles have different impacts on fuel consumption and emissions even in identical test conditions (Liu et al., 2016; Rios-Torres et al., 2019; Zhang et al., 2022).
Besides speed and acceleration/deceleration, vehicular jerk (the first derivative of acceleration) is often used to represent volatility in instantaneous driving decisions (Wang et al., 2015). One advantage of using vehicular jerk is that it can be classified into different types to represent different driving behavior patterns (Wang et al., 2015; Zhang et al., 2022). Although transportation literature is rich in studies of driving volatility and hot-stabilized emissions under RDE (Fernandes et al., 2021; Ferreira et al., 2022a), to the authors' knowledge, the relationship between HEV emissions, VSP, ICE operation status, and driving volatility according to the road type has not been explored before. A previous work conducted by the research team developed emission models for one 2019 Toyota SUV and associated driving behaviors with nine vehicular jerk types. Engine speed-based models were shown to be good predictors of CO2 and Particulate Matter (PM), exhibiting R² higher than 0.70 for both ICE on/off states. However, the correlation of road sections with volatile driving behaviors and the impacts on HEV hot-stabilized emissions were not addressed (Fernandes et al., 2021).
Although the existing literature is mostly devoted to studying global and local pollutant emissions of HEVs based on real-world driving conditions, less attention has been paid to how instantaneous driving decisions and their variation, i.e., driving volatility, may explain those emissions. Given the variability of HEV operation, a complete characterization of emission patterns and driving profile variability of speed and acceleration at road section level remains particularly challenging. Moreover, understanding the roadway characteristics and ICE operation status that cause certain volatile driving behaviors is of interest to further decrease fuel use and emissions in hybrid powertrains.
With these forethoughts in mind, the main objective of this article is to identify relationships between emissions, engine operation status, and driving volatility in HEV operation. For this purpose, unique data on tailpipe emissions under hot-stabilized conditions, more specifically, second-by-second CO2 and nitrogen oxides (NOx) emissions, and vehicle engine and dynamics from an HEV subcompact were collected on three road types (urban, rural, and highway), and six vehicular jerk types (Wang et al., 2015) were considered in the analysis. A modified principal component analysis technique with emissions, engine activity data, road characteristics, and vehicular jerk types as variables was applied in this study to extract principal components and identify specific clusters. In particular, these variables were evaluated using the two-step semidefinite programming (SDP) algorithm (Macedo, 2015) based on the clustering and disjoint principal component analysis (CDPCA) technique (Macedo & Freitas, 2015; Vichi & Saporta, 2009). The CDPCA allows clustering of objects and, simultaneously, partitioning of variables in a disjoint way. This is an asset for interpreting the relationships within the set of variables, since each original variable contributes to a single principal component, and within the set of objects, since each data point belongs to a single cluster (Freitas et al., 2021; Macedo & Freitas, 2015). The advantage of using the two-step-SDP algorithm lies in its performance when compared to the Alternating Least-Squares (ALS) algorithm: it revealed a significant improvement in terms of computational time and the proportion of variance explained by the disjoint components, and a better recovery of both the true number of object clusters and the true variable partition (Freitas et al., 2021).
This article also intends to answer the following research questions:
1. How does the level of volatility among vehicular jerk types change across three roadway types (urban, rural, and highway) and HEV ICE operation?
2. Which variables related to exhaust emissions, vehicle dynamics, and engine activity allow hidden clusters in vehicular jerk types to be identified according to the route type and ICE status?
The study was carried out in the Aveiro region (Portugal), an area with more than 350 thousand inhabitants. The testing route included urban, rural, and highway roads that are relevant trip generators (Bandeira et al., 2018; Fernandes et al., 2020) since they serve the main hubs of the region, such as the University of Aveiro, the Port of Aveiro, one hospital, and several high-density industrial complexes. This route was selected because it represents a normal driver operating over a wide range of driving conditions, making the study less dependent on the RDE boundary conditions. Field measurements were collected from one HEV using a micro-Portable Emission Measurement System (PEMS), an On-Board Diagnostic (OBD) tool, and a Global Positioning System (GPS) receiver to record second-by-second tailpipe emissions (CO2 and NOx), engine activity [vehicle speed, engine speed (RPM), mass air flow (MAF), fuel flow rate (FFR), and engine coolant temperature (ECT)], and latitude and longitude coordinates. To assure the quality of the real-world monitoring data, several methodological steps were taken, including the calibration of the PEMS, the leak check of gas analyzers, the time alignment of PEMS, OBD, and GPS signals through a fixed temporal shift, the checking of erroneous emission data, the comparison between PEMS and vehicle manufacturer emission data, and the accuracy analysis of PEMS and OBD in measuring vehicle fuel consumption. Six repetitions covering around 15,500 s of raw data per variable were performed on different days to include variability and reproducibility in the collected data.
It must be stressed that the main purpose of this research is to provide a methodological framework that gives transportation professionals, academics, and practitioners a more insightful interpretation of kinematic, engine, road type, and driving volatility data in the context of emissions monitoring. Although results cannot be generalized to other cars, the proposed method can be applied to any vehicle type. Since classification into six different vehicular jerk types allowed for a characterization of different driving behavior patterns, results are relatively less dependent on the choice of the driver. The focus of the study is neither to certify emissions from a specific HEV in the market nor to provide a comparative evaluation of the emissions and driving volatility results against a conventional vehicle of the same model (e.g., Toyota Yaris subcompact petrol).
With the advent of emission monitoring research, and through seamless integration of emissions and driving volatility and its consequences on safety, this vehicle characterization-oriented paper is novel as it provides a multivariate analysis method capable of investigating vehicle data with emission, engine, kinematic, and vehicle-operation variables without taking into consideration the fundamental class structure or size. This contributes to the emissions monitoring analysis of the new generation of HEVs in what concerns the description of volatile driving behaviors. Such information can be used in vehicle electronic control units and navigation systems to give drivers feedback about their emission rates and the vehicular jerk they impose on the vehicle.

Study setup
The study collected hot-stabilized exhaust emissions, engine activity, and vehicle dynamics data from one 2020 Toyota Yaris HEV subcompact with an initial odometer mileage of 2500 km. The choice of the Yaris relied on the predominance of the well-known Hybrid Synergy Drive, which was used in 41% of HEVs sold in Europe in 2020 (ACAE, 2021). The main characteristics of the test vehicle are listed in Table 1. The fuel consumption (FC) of the 2020 Toyota Yaris HEV in the Worldwide Harmonized Light-Duty Vehicles Test Procedure (WLTP) declared by the manufacturer is 3.5, 3.9, 5.5, and 4.3 L.100 km⁻¹ for the low-speed cycle (urban), high-speed cycle (rural), extra-high-speed cycle (highway), and combined cycle, respectively. Gasoline fuel used for tests was provided by PRIO energy company and complies with the EN 228 standard.
Tests were completed by one driver traversing a specified 44 km driving route over 4 days in April and May 2021 in Aveiro region (Portugal).The route was chosen because of its variability concerning the speed limits, traffic volumes, traffic control treatments (such as roundabouts, traffic lights, stop-controlled intersections, and priority intersections), and road grade (EC, 2017).
Past research conducted in the studied region has confirmed the relevance of road traffic on air pollution caused by NOx road traffic emissions (Rafael et al., 2020; Vicente et al., 2018).
Figure 1 depicts the selected route for this study, where point A represents the start of the trip and point D the end of the trip. The route typology was collapsed into three sectors (urban, rural, and highway) based on speed limits and minimum distance, i.e., 13 km (Table 2). The rural sector corresponds to the beginning of the route (A-B), where the vehicle was driven at speeds lower than 90 km.h⁻¹. In the urban sector (B-C), the vehicle was driven at speeds lower than 50 km.h⁻¹ in 86% of the trip section time.
The highway sector comprises segment C-D, with traveling speeds higher than 90 km.h⁻¹ in more than 44% of the trip section time. It is important to note that the highway ends in the middle of this route (westernmost point, in blue), and drivers need to cross a hilly road section to approach and negotiate an intersection, and then make a U-turn to re-enter the highway. The selected route also includes a short urban sector (~1.8 km) corresponding to segment D-A to conclude the trip. It is worth noting that traffic conditions and control treatments did not allow all the sectors to be driven at the posted speed limits, although some speeding situations were observed in some sectors along the trip. This is especially true in urban sectors, where the driver exceeded the speed limit in more than 10% of the trip section time.

Exhaust emissions and vehicle data collection
This article used the 3DATX ParSYNC micro-PEMS (3DATX, 2018) to collect CO2 (in volume fraction, with a range of 0-20%), nitric oxide (NO), and nitrogen dioxide (NO2) (both with a range of 0-5000 ppm) concentrations of exhaust pulled directly from the tailpipe, sampling at a rate of 1 Hz. NOx concentration was defined as the sum of the concentration signals of NO and NO2 (Sandhu & Frey, 2013). Table 3 lists the range, sensitivity, and resolution of each gas sensor for the current version of the PEMS equipment.
This micro-PEMS was selected because of its ability to provide reliable emissions measurements in light-duty vehicles. A good body of research has also demonstrated the effectiveness of micro-PEMS and simplified PEMS as tools for collecting tailpipe emissions from light-duty vehicles and light diesel trucks with different propulsion types on specified real-world routes (Fernandes et al., 2019, 2021, 2022; Khan et al., 2020; Vu et al., 2020; Yang et al., 2018; Yuan et al., 2019). In addition, the equipment is easy to deploy and install, and it offers lower costs and weight compared to a 1065-compliant PEMS (Yang et al., 2018). Although the selected PEMS is not used for regulatory certification testing, it is capable of identifying different driving behavior profiles with reasonable accuracy (Ferreira et al., 2022a; Wei & Frey, 2021).
Trip test preparation included routine calibrations of gas analyzers with certified zero/span gases before and after each trip using the UN 1956 gas mixture (300 ppm of NO; 100 ppm of NO2; 6% of CO2; and 93% of nitrogen) with a flow rate of approximately 2 L.min⁻¹. The mean zero NOx drift was lower than 5 ppm. Each time the PEMS was installed in the vehicle, a gas analyzer system leak check was carried out as prescribed by the equipment manufacturer. During the parSYNC experimental routine, a warm-up phase of around 15 km preceded all trips, allowing the ECT to stabilize at values greater than 80 °C (Sullivan & Sentoff, 2020). The parSYNC neither has an exhaust flowmeter nor includes an internal OBD reader. Thus, the iCar PRO BLE 4.0 (Sivak & Schoettle, 2012) Bluetooth OBD-II was connected to the electronic control unit (ECU) according to standard protocol ISO 15765-4 CAN (11 bit ID, 500 Kbaud) to record at a frequency of 1 Hz the following engine parameters: OBD speed in km.h⁻¹; engine speed in RPM; MAF [sensor accuracy of 10% (Fernandes et al., 2019; Giechaskiel et al., 2018)] in g.s⁻¹; FFR in L.s⁻¹; ECT in °C.
A QSTARZ GPS Travel Recorder (position accuracy of 3 m) was used to get second-by-second vehicle position and elevation.
A single male driver (27 years old) drove all sampling runs [a total of 6, according to the methodology described in Pascale et al. (2021)] as an intentional study design with the main purpose of creating a similar driving style across the road sectors. The weather conditions observed during driving sessions were characterized by ambient temperatures between 19 °C and 22 °C, and relative humidity lower than 70%.

Road grade calculation
The segment method suggested in (Boroujeni & Frey, 2014) was used to obtain the road grade (r).This two-step procedure consists of correcting instantaneous vehicle altitude data followed by the computation of precise road grade at every location along the trip.
Although the GPS device provided latitude, longitude, and altitude data without any signal loss, second-by-second altitude data were acquired from the GPS Visualizer tool (Rafael et al., 2020), followed by a correction under the conditions given in Equation (1) (Fernandes et al., 2019), where $h_{i,\mathrm{corrected}}$ = corrected altitude (m above sea level); $h_{i-1}$ = altitude in the second of travel $i-1$ (m above sea level); $h_i$ = altitude in the second of travel $i$ (m above sea level). After that, and following the recommendations made by Boroujeni and Frey (2014), the entire route was divided into multiple segments of 100-m length, and the road grade (r) of each segment was computed using linear regression through all of its altitude data.
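The segment step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `segment_grades`, the use of cumulative travelled distance as the regression abscissa, and the handling of segments with too few points are all assumptions made for the example.

```python
def segment_grades(distance_m, altitude_m, segment_len=100.0):
    """Least-squares slope of altitude versus distance for each fixed-length segment.

    distance_m: cumulative travelled distance per sample (m), assumed input.
    altitude_m: (corrected) altitude per sample (m above sea level).
    Returns {segment index: grade as a decimal fraction, or None if undefined}.
    """
    segments = {}
    for d, h in zip(distance_m, altitude_m):
        segments.setdefault(int(d // segment_len), []).append((d, h))

    grades = {}
    for idx, pts in sorted(segments.items()):
        n = len(pts)
        mean_d = sum(d for d, _ in pts) / n
        mean_h = sum(h for _, h in pts) / n
        sxx = sum((d - mean_d) ** 2 for d, _ in pts)
        if n < 2 or sxx == 0.0:
            grades[idx] = None  # not enough spread to fit a line (assumed policy)
            continue
        sxy = sum((d - mean_d) * (h - mean_h) for d, h in pts)
        grades[idx] = sxy / sxx  # regression slope = road grade of the segment
    return grades


# Hypothetical data: a 5% climb over the first 100 m, then a 2% descent.
distance = [0, 25, 50, 75, 100, 125, 150, 175]
altitude = [10.0, 11.25, 12.5, 13.75, 15.0, 14.5, 14.0, 13.5]
grades = segment_grades(distance, altitude)
```

Fitting one slope per segment, rather than differencing consecutive altitudes, damps second-by-second altitude noise, which is the motivation usually given for segment-based grade estimation.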

Data processing and quality assurance
Time alignment of PEMS, OBD, and GPS data was done before computation of mass emissions, following the methodology described by Sandhu and Frey (2013). The alignment method was based on pairs of variables exhibiting concurrent trends (e.g., a rise in MAF and a concordant rise in the concentration of exhaust gas) and was further evaluated using the Pearson Correlation Coefficient (PCC), which measures the degree of linear correlation between two variables. The use of engine RPM and NOx concentration as the pair to be synchronized in gasoline powertrains is recommended (Sandhu & Frey, 2013). The process was repeated every day the PEMS was started. The maximum PCC values for the RPM-NOx synchronized pairs lay around 0.40-0.41 and thus in the range of values recommended in the literature (Sandhu & Frey, 2013).
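As an illustration of PCC-based alignment, the sketch below searches for the fixed time shift that maximizes the Pearson correlation between a reference signal (e.g., RPM) and a delayed signal (e.g., NOx concentration). The function names, the non-negative lag search window, and the sample data are assumptions for the example; the procedure in Sandhu and Frey (2013) may differ in detail.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0  # correlation undefined for a constant signal
    return sxy / (sxx * syy) ** 0.5


def best_lag(reference, delayed, max_lag=10):
    """Return (lag, pcc) where delayed[t + lag] best matches reference[t]."""
    best = (0, -2.0)
    for lag in range(max_lag + 1):
        y = delayed[lag:]
        n = min(len(reference), len(y))
        if n < 3:
            break  # too few overlapping samples to correlate
        r = pearson(reference[:n], y[:n])
        if r > best[1]:
            best = (lag, r)
    return best


# Hypothetical 1 Hz signals: the gas signal trails the engine signal by 3 s.
rpm = [1000, 1200, 1500, 1100, 900, 2000, 2500, 1800, 1300, 1000,
       950, 1700, 2100, 1600, 1200, 1000, 1400, 2300, 1900, 1100]
nox = [0.0] * 3 + [0.05 * v for v in rpm[:-3]]
lag, pcc = best_lag(rpm, nox)
```

Once the lag is found, the delayed series is shifted by that many seconds before any mass emission is computed, which corresponds to the fixed temporal shift mentioned in the text.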
Frequently used indicator variables in synchronizing OBD and GPS data are OBD speed versus vehicle speed.These variables produced PCC peak values of approximately 0.997, which means that datasets were properly synchronized (Sandhu & Frey, 2013).
Erroneous data were checked against quality assurance screening criteria (Sandhu & Frey, 2013). The most common errors found during the monitoring campaigns were negative NO and NO2 concentrations, which represent values below the instrument detection limit. Such negative concentration values were set to zero for emission analysis. It must be emphasized that these negative values accounted for only 2.1% of the raw data.
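A minimal sketch of this screening step is given below, combined with the NOx definition used in the study (sum of NO and NO2). The function name and the way the affected share is counted (per sample, if either gas is negative) are assumptions for illustration.

```python
def screen_nox(no_ppm, no2_ppm):
    """Zero negative NO/NO2 readings and return (NOx series, share of affected samples)."""
    nox = []
    affected = 0
    for no, no2 in zip(no_ppm, no2_ppm):
        if no < 0 or no2 < 0:
            affected += 1  # reading below the detection limit (assumed counting rule)
        nox.append(max(no, 0.0) + max(no2, 0.0))  # NOx = NO + NO2 after zeroing
    share = affected / len(nox) if nox else 0.0
    return nox, share


# Hypothetical second-by-second concentrations in ppm.
nox_ppm, negative_share = screen_nox([10.0, -1.0, 5.0, 0.0], [2.0, 3.0, -0.5, 1.0])
```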

Vehicle-specific power
VSP represents the vehicle instantaneous power per unit mass, and it accounts for engine power demand associated with changes in both vehicle potential and kinetic energies, aerodynamic drag, and rolling resistance (USEPA, 2002). An extensive body of research has applied the concept of VSP for the analysis of different HEVs under real driving conditions (Duarte et al., 2014; Fernandes et al., 2021; Ferreira et al., 2022a; Holmén & Sentoff, 2015; Pascale et al., 2021; Robinson & Holmén, 2020; Zhai et al., 2011). The second-by-second collected data of OBD speed ($v_i$) and acceleration ($a_i$), together with r, were used to quantify instantaneous VSP with light-duty vehicle constants based on the work of Jimenez (USEPA, 2002), as denoted by Equation (2), where $VSP_i$ = vehicle-specific power in the second of travel $i$ (kW.ton⁻¹); and where $m_{ex}$ = exhaust mass flow rate (g.s⁻¹); $\rho_{fuel}$ = fuel density (730 g.L⁻¹ at 15 °C).
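The body of Equation (2) is not reproduced in the text above. As a hedged illustration only, the sketch below uses the generic light-duty VSP form commonly attributed to Jimenez, $VSP = v\,[1.1a + 9.81\sin(\arctan r) + 0.132] + 0.000302\,v^3$ with $v$ in m.s⁻¹; these coefficients are an assumption here and may differ from those the authors applied. The function name and the km.h⁻¹ input are likewise illustrative choices.

```python
import math


def vsp_kw_per_ton(speed_kmh, accel_ms2, grade):
    """Instantaneous VSP (kW/ton) with assumed generic light-duty coefficients.

    speed_kmh: OBD speed in km/h (converted to m/s internally).
    accel_ms2: acceleration in m/s^2.
    grade:     road grade as a decimal fraction (e.g., 0.05 for 5%).
    """
    v = speed_kmh / 3.6  # km/h -> m/s
    kinetic_and_grade = 1.1 * accel_ms2 + 9.81 * math.sin(math.atan(grade))
    rolling = 0.132                   # rolling resistance term (assumed coefficient)
    aero = 0.000302 * v ** 3          # aerodynamic drag term (assumed coefficient)
    return v * (kinetic_and_grade + rolling) + aero


cruise = vsp_kw_per_ton(36.0, 0.0, 0.0)      # steady 36 km/h on flat road
accelerating = vsp_kw_per_ton(36.0, 1.0, 0.0)
downhill = vsp_kw_per_ton(36.0, 0.0, -0.05)
```

Under these assumptions, braking or a sufficiently steep downgrade yields negative VSP, which is the region where the text later reports the ICE being typically off.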

Vehicular jerk types
Instantaneous driving decisions are short-term decisions made to accommodate real-time changes during a trip, including pavement conditions, approaching traffic control treatments, or the presence of adjacent vehicles. Wang et al. (2015) stated that instantaneous driving decisions are related to acceleration/deceleration, constant speed (zero acceleration), jerking the vehicle (rate of change of acceleration or deceleration), and/or maintaining constant acceleration and deceleration (zero vehicular jerk) episodes. Vehicular jerk (j) is the derivative of acceleration with respect to time and is considered a proper kinematic variable for capturing drivers' abrupt adjustments. This kinematic parameter was chosen because it is widely recognized as a good measure of driving volatility and driver behavior, regardless of the road type (Feng et al., 2017; Ferreira et al., 2022b; Wang et al., 2015). Given that volatility reflects the degree to which vehicles move, vehicular jerk can be used in finding traffic conflicts undetected by common surrogate safety measures (Bagdadi & Várhelyi, 2011; Zaki et al., 2014). Other applications of vehicular jerk include the development of controllers in vehicle manufacturing with the main purpose of mitigating discomfort for car occupants caused by oscillations in longitudinal vehicle acceleration (Scamarcio et al., 2020). Past studies also confirmed vehicular jerk as a proper explanatory variable for predicting exhaust emissions and fuel consumption in both conventional vehicles and HEVs (Fernandes et al., 2021, 2022; Ferreira et al., 2022a; Zhang et al., 2022), resulting in generally better models than those obtained based on VSP.
Several parameters are reported in the literature to characterize driving behavior, including maximum and minimum acceleration, relative positive acceleration, mean positive acceleration, and speeding (Choi & Kim, 2017; Deligianni et al., 2017; Gallus et al., 2016, 2017). None of them gives complete information about different acceleration patterns over time or cruise speeds. Six different types of vehicular jerk profiles can be defined based on acceleration and deceleration profiles at each sampling speed. Such instantaneous driving behaviors include jerk enhancements (Types B and D), jerk reversals (Types C and F), and jerk mitigations (Types A and E), as follows (Wang et al., 2015):
Type A. Acceleration followed by lower acceleration: $a_i \geq 0 \;\&\; a_{i+1} > 0 \;\&\; a_i > a_{i+1} \rightarrow j < 0$;
Type B. Acceleration followed by higher acceleration: $a_i \geq 0 \;\&\; a_{i+1} > a_i \rightarrow j > 0$;
Type C. Acceleration followed by deceleration: $a_i \geq 0 \;\&\; a_{i+1} < 0 \rightarrow j < 0$;
Type D. Deceleration followed by lower deceleration: $a_i < 0 \;\&\; a_i < a_{i+1} < 0 \rightarrow j > 0$;
Type E. Deceleration followed by higher deceleration: $a_i < 0 \;\&\; a_{i+1} < a_i < 0 \rightarrow j < 0$;
Type F. Deceleration followed by acceleration.
Note that $a_i$ and $a_{i+1}$ represent acceleration (m.s⁻²) in the second of travel $i$ and $i+1$, respectively.
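The six-type classification can be sketched in a few lines. The conditions for Types A to E follow the list above; the condition for Type F is not stated in the text, so the rule used here ($a_i < 0$ and $a_{i+1} \geq 0$) is an assumption, as are the function names and the choice to return `None` for pairs that match no listed condition (e.g., zero jerk).

```python
def jerk_type(a_now, a_next):
    """Classify one pair of consecutive accelerations (m/s^2) into Types A-F."""
    if a_now >= 0:
        if a_next > 0 and a_now > a_next:
            return "A"  # acceleration followed by lower acceleration
        if a_next > a_now:
            return "B"  # acceleration followed by higher acceleration
        if a_next < 0:
            return "C"  # acceleration followed by deceleration
        return None     # e.g., constant acceleration (zero jerk)
    if a_now < a_next < 0:
        return "D"      # deceleration followed by lower deceleration
    if a_next < a_now:
        return "E"      # deceleration followed by higher deceleration
    if a_next >= 0:
        return "F"      # ASSUMED condition: deceleration followed by acceleration
    return None         # constant deceleration (zero jerk)


def classify_trip(accelerations, dt=1.0):
    """Return (jerk, type) for every pair of consecutive samples at 1 Hz by default."""
    pairs = zip(accelerations, accelerations[1:])
    return [((b - a) / dt, jerk_type(a, b)) for a, b in pairs]


# Hypothetical acceleration trace in m/s^2.
labels = [t for _, t in classify_trip([0.5, 1.0, 0.4, -0.3, -0.8, -0.2, 0.6])]
```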

Two-step-SDP algorithm
To detect the most relevant information behind emission and driving volatility data, the CDPCA methodology (Vichi & Saporta, 2009) is of special interest because it simultaneously partitions the variables into a reduced set of disjoint components (i.e., each component is characterized by a disjoint set of variables) and unveils patterns among the objects by clustering them around a set of centroids, so that the between-cluster deviance of the components in the reduced space is maximized. The CDPCA model results from applying K-means on the data matrix for clustering the objects and, simultaneously, performing sparse Principal Component Analysis (PCA) on the variables, which leads to an improvement of interpretability (Freitas et al., 2021; Vichi & Saporta, 2009). The CDPCA model can be given by Equation (7):
$$X = U\bar{Y}A^{\top} + E \qquad (7)$$
where $X$ = data matrix ($I \times J$); $U$ = object membership assignment matrix ($I \times K$); $K$ = number of clusters of objects; $A$ = orthonormal component loadings matrix ($J \times Q$); $\bar{Y} := \bar{X}A$, where $\bar{X}$ is the ($K \times J$) object centroid matrix in the original space, represents the ($K \times Q$) cluster centroid score matrix; $E$ = error matrix associated with the model ($I \times J$).
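To make the roles of the model matrices concrete, the sketch below evaluates the model for a given object clustering and a given disjoint loading matrix: it computes the object centroids, projects them to obtain the centroid scores, and measures the squared residual of the fitted values. It does not estimate the clustering or the loadings, and it is not the two-step-SDP algorithm; the function name, the label encoding of the membership matrix, and the toy data are assumptions.

```python
def cdpca_fit_error(X, labels, A):
    """Return (centroid scores, sum of squared residuals) for fixed clusters and loadings.

    X:      I x J data matrix as a list of rows.
    labels: cluster index (0..K-1) per object, i.e., the membership matrix in label form.
            Every cluster is assumed to contain at least one object.
    A:      J x Q loading matrix; each variable must load on exactly one component.
    """
    if any(sum(1 for w in row if w != 0.0) != 1 for row in A):
        raise ValueError("loadings are not disjoint")
    n_clusters = max(labels) + 1
    n_vars, n_comp = len(A), len(A[0])

    # Object centroids in the original space (K x J).
    sums = [[0.0] * n_vars for _ in range(n_clusters)]
    counts = [0] * n_clusters
    for row, k in zip(X, labels):
        counts[k] += 1
        for j, x in enumerate(row):
            sums[k][j] += x
    centroids = [[s / counts[k] for s in sums[k]] for k in range(n_clusters)]

    # Centroid scores in the reduced space (K x Q): centroids times loadings.
    scores = [[sum(centroids[k][j] * A[j][q] for j in range(n_vars))
               for q in range(n_comp)] for k in range(n_clusters)]

    # Residual between the data and the fitted values (scores mapped back through A).
    sse = 0.0
    for row, k in zip(X, labels):
        for j, x in enumerate(row):
            fitted = sum(scores[k][q] * A[j][q] for q in range(n_comp))
            sse += (x - fitted) ** 2
    return scores, sse


# Toy example: variables 0 and 1 form one component, variable 2 forms another.
s = 2 ** -0.5
A = [[s, 0.0], [s, 0.0], [0.0, 1.0]]
X = [[1.0, 1.0, 0.0], [1.0, 1.0, 0.0], [3.0, 3.0, 2.0], [3.0, 3.0, 2.0]]
scores, sse = cdpca_fit_error(X, [0, 0, 1, 1], A)
```

In the toy data the objects coincide with their cluster centroids and the variables within a block move together, so the residual is numerically zero; real data would leave a positive residual that the fitting algorithm tries to reduce.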
An iterative heuristic procedure suggested by Macedo and Freitas (2015) and an approximation algorithmic framework based on SDP introduced by Macedo (2015) have been proposed to solve the CDPCA problem. The latter is called the two-step-SDP algorithm and is based on SDP relaxations of two clustering problems and a K-means step in the reduced space (Macedo, 2015). The two-step-SDP approach outperforms the ALS, yielding improvements not only in computational time but also in the proportion of variance explained by the disjoint components (Freitas et al., 2021). This algorithm also shows a better recovery of both the true number of object clusters and the true variable partition (Freitas et al., 2021).
The two-step-SDP algorithm for CDPCA involves a first phase where the K clusters of objects and Q clusters of attributes are initially estimated considering orthogonal projections and 0-1 SDP models. These SDP models are relaxed to convex models and solved using a singular value decomposition-based approach. After that, a rounding procedure based on K-means applied in the reduced space of centroids is performed to obtain the CDPCA model solutions. More details about the procedure can be found in Macedo (2015).

Analysis of engine performance and emissions
Collected data covered approximately 15,500 s and a road coverage of almost 265 km (EC, 2017).
From these results, values of average mass trip emissions and standard deviation were computed. Although tests were performed in hot-stabilized conditions, the PEMS CO2 emissions per kilometer (97.8 ± 3.3 g.km⁻¹) and FC (4.2 ± 0.2 L.100 km⁻¹) are quite close to the vehicle type-approval values, as reported in Table 1. The deviation between the experimental NOx trip value (7.9 ± 2.5 mg.km⁻¹) and WLTP NOx was 15.6%. Other PEMS validation studies found differences of approximately 15% at the 20 mg.km⁻¹ NOx range (Varella et al., 2018), which also reinforces the reasonability of the obtained results. To assess the accuracy of the equipment, the FC values computed from OBD and PEMS were compared against FC WLTP values for the high-speed (rural) and extra-high-speed (highway) cycles, as suggested in Xu et al. (2017). The average OBD accuracy was 4.9% (median = 4.2%; standard deviation, STD = 3.0) and -3.2% (median = -5.0%; STD = 4.5) in rural and highway trips, respectively, and thus within the ±5% set by the regulation (EC, 2017). As for the PEMS values, the average accuracy ranged from -2.5% (median = -0.9%) for all rural trips to 9.0% (median = 8.7%) for all highway trips.
Figure 2a-d shows an HEV trip driving sample in terms of instantaneous OBD speed, RPM, ICE mode and VSP, and cumulative CO2 and NOx. The first part of the trip corresponds to the road sectors in rural and urban roads, and, under these driving conditions, the engine speeds are significantly different because of the cyclical on-off states of the ICE. This is especially true in the urban sector, where resulting values of cumulative CO2 increase slowly throughout its length. In HEV highway driving, noticeable increments for both CO2 and NOx emissions are obtained, which are expected to be a consequence of the acceleration episodes during the highway entrance ramp and high vehicle power demand in uphill road sections. For all road sectors, the expected relationship between ICE on-off and VSP is observed; RPM is typically 0 for VSP values equal to or lower than 0 kW.ton⁻¹ (idling, braking, and low-speed operation). Scatter plots also revealed that the electric mode of the HEV only occurs up to VSP values of 15 kW.ton⁻¹, which is coherent with the available EM power alone and the testing vehicle weight (Table 1).
The distribution of FFR, CO2, NOx, distance, and travel time by road sector is exhibited in Figure 3. The values in the graphs represent the average values of all trips performed by the HEV. The highway road sector contributes the greatest portion of the CO2 and NOx emissions (respectively, 42% and 54%) generated by the vehicle along the route. This segment corresponds to about 24% of the travel time. The expected fuel-use and emissions reductions attained from driving HEV technology are confirmed in urban road sectors, as already observed in previous studies (Fernandes et al., 2021; Holmén & Sentoff, 2015; Huang et al., 2019; Wang et al., 2020). Together, the urban sectors account for less than 20% of the fuel use and emissions while taking more than 50% of the trip time.
To identify the locations with higher emissions in each road sector, the distribution of CO2 and NOX emissions per unit distance along the route is shown in Figures 4 and 5. Each point on the map indicates the average of the trip emissions per unit distance in each 250-m segment. The highest values clearly occur in the rural and highway sectors, which exhibit, for instance, CO2 values higher than 100 g.km⁻¹ in more than 50% of their segments (Figure 4a-d). A close view of the urban road sectors reveals several locations with no CO2 emitted by the HEV (EV mode), which accounted for 15% of segments in both road sectors. It was also possible to find some locations in EV mode in the rural and highway sectors, which can be explained by two main factors: regenerative braking at low speed and/or during deceleration, and the existence of downgrade segments. The analysis of NOX indicated that rural and highway sectors recorded values above 0.005 g.km⁻¹ in about 48% and 40% of their segments, respectively (Figure 5a-d). This can be explained by the effects of roundabouts and uphill segments, which cause instantaneous fuel injection after fuel cutoff during the acceleration shifting process, leading to peaks in the NOX emitted by HEVs (Fernandes et al., 2021; Zhang et al., 2020).
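The 250-m aggregation described above can be sketched as follows. This is an illustrative sketch for a single trip, not the authors' processing chain: the function name, the input layout (cumulative distance in metres and emission rate in grams per second at 1 Hz), and the choice to divide by the nominal segment length are all assumptions.

```python
def emissions_per_km_by_segment(distance_m, emission_g_per_s,
                                segment_m=250.0, dt_s=1.0):
    """Aggregate second-by-second emissions into fixed-length road segments.

    distance_m       -- cumulative distance travelled at each sample (m)
    emission_g_per_s -- emission mass rate at each sample (g/s)
    Returns {segment_index: emissions in g/km}. Assumes every segment is
    fully traversed, so the nominal segment length is used as denominator.
    """
    if len(distance_m) != len(emission_g_per_s):
        raise ValueError("inputs must have the same length")
    mass_g = {}
    for dist, rate in zip(distance_m, emission_g_per_s):
        idx = int(dist // segment_m)  # which segment this second falls in
        mass_g[idx] = mass_g.get(idx, 0.0) + rate * dt_s
    segment_km = segment_m / 1000.0
    return {idx: mass / segment_km for idx, mass in sorted(mass_g.items())}


# Hypothetical samples: three seconds with the engine on, two in EV mode.
distance = [0.0, 100.0, 200.0, 300.0, 400.0]
co2_rate = [1.0, 1.0, 1.0, 0.0, 0.0]
per_segment = emissions_per_km_by_segment(distance, co2_rate)
```

Segments whose value is zero correspond to stretches driven entirely in EV mode. Averaging the per-segment values over several trips would give one point per segment, as in the maps discussed in the text.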

Analysis of driving volatility indicators
To examine the typical profile of regular driving volatility by road type, vehicular jerk is calculated by speed bins in 1 km.h⁻¹ increments from the second-by-second dataset (Figure 6a-c). Scatter plots also include upper and lower bands (mean plus or minus STD) for the aggregated positive and negative vehicular jerk values. The upper band characterizes the most typical driving practice on the roadway. The red and orange points outside the bands are defined as "highly" volatile driving behaviors, while blue and green points represent "moderately" volatile driving behaviors (Wang et al., 2015).
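The banding procedure can be illustrated with a short sketch. Several details are assumptions rather than statements from the source: jerk is obtained by backward finite differences of the 1 Hz speed signal, the upper band is the mean plus one STD of the positive jerk values in a speed bin, and the lower band is the mean minus one STD of the negative values. All function names are hypothetical.

```python
from statistics import mean, pstdev


def jerk_series(speed_kmh, dt_s=1.0):
    """Jerk (m/s^3) from a speed trace (km/h) using finite differences.

    The result has two fewer samples than the input; element i is the
    jerk attributed to speed sample i + 2.
    """
    v = [s / 3.6 for s in speed_kmh]  # km/h -> m/s
    acc = [(v[i + 1] - v[i]) / dt_s for i in range(len(v) - 1)]
    return [(acc[i + 1] - acc[i]) / dt_s for i in range(len(acc) - 1)]


def volatility_bands(speed_kmh, jerk, bin_kmh=1.0):
    """Per speed bin, return (lower_band, upper_band) for aligned inputs."""
    positive, negative = {}, {}
    for speed, j in zip(speed_kmh, jerk):
        speed_bin = int(speed // bin_kmh)
        if j > 0:
            positive.setdefault(speed_bin, []).append(j)
        elif j < 0:
            negative.setdefault(speed_bin, []).append(j)
    bands = {}
    for speed_bin in set(positive) | set(negative):
        pos = positive.get(speed_bin)
        neg = negative.get(speed_bin)
        upper = mean(pos) + pstdev(pos) if pos else 0.0
        lower = mean(neg) - pstdev(neg) if neg else 0.0
        bands[speed_bin] = (lower, upper)
    return bands


def is_highly_volatile(speed_kmh, jerk, bands, bin_kmh=1.0):
    """True when a jerk value falls outside the bands of its speed bin."""
    lower, upper = bands[int(speed_kmh // bin_kmh)]
    return jerk > upper or jerk < lower


# Hypothetical speed trace (km/h); jerk is aligned with speed[2:].
speed = [0.0, 3.6, 10.8, 14.4]
jerk = jerk_series(speed)
bands = volatility_bands(speed[2:], jerk)
```

Under these assumptions, samples inside the bands would be the "moderately" volatile points and samples outside them the "highly" volatile ones.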
Urban driving covers a wider band of vehicular jerks than the other road sectors. About 62% of the urban driving data samples are highly volatile (44% and 17% for positive and negative vehicular jerk, respectively). Low-speed data (<45 km.h⁻¹) exhibit a larger bandwidth than high-speed data. The rural road sector yields 28% of the data sample as highly volatile (15% and 13% for positive and negative vehicular jerk, respectively), meaning that typical driving behavior is less volatile under these conditions. For speed values higher than 50 km.h⁻¹, the absolute vehicular jerk values are lower than 1 m.s⁻³. Under highway driving, the largest bandwidth is between 10 and 40 km.h⁻¹, and it decreases markedly when speed is higher than 60 km.h⁻¹. The latter speed intervals are represented by vehicular jerk values lower than 0.3 m.s⁻³, as reflected in the upper band.
The mapping of the vehicular jerk data for one sample is depicted in Figure 7a-c. For clarity of comparison, the dataset is divided into negative, zero and positive jerk. The number of seconds with non-zero vehicular jerk (red and green points) is notably high in urban driving conditions, which is reasonable because vehicles are continuously braking and accelerating due to the presence of intersections, other vehicles, and traffic incidents on the road. A recent study conducted by Fernandes et al. (2021) also identified this pattern. Rural and highway sections show more data corresponding to constant acceleration and speed (blue points), especially in sections with less stop-and-go traffic, where vehicles can move in free-flow conditions with smoother variations in acceleration. These findings are relevant when combined with the information given in Figure 6. They confirm that the vehicular jerk profile varies widely according to the type of road, which in turn can explain specific values of tailpipe emissions and engine variables of the HEV. Analysis of other trips showed identical trends in the spatial distribution of vehicular jerk.
To understand how much time is spent in different driving volatility states, the time spent in each speed bin is aggregated by vehicular jerk type and road sector, as shown in Figure 8a-c. Because the number of data points at lower speeds is small (less than 10% of the dataset), the analysis is centered on speed values higher than 40 km.h⁻¹ and 60 km.h⁻¹ in the rural and highway sectors, respectively. Zero vehicular jerk accounts for the highest share of time, especially in locations outside urban environments. These conditions of constant acceleration accounted for 19%, 29% and 41% of driving time in the urban, rural, and highway sectors, respectively. When the vehicle is stopped, zero jerk can represent more than 85% of the sample collected in the urban road sector. Vehicular jerk Types A and B show significant amounts of time spent on urban and rural roads, with approximately 20% each. Drivers maintaining vehicle deceleration (Types D and E) represent a small portion of the dataset (between 4% and 12%, depending on the road sector and jerk type). However, the percentage of these volatile states is higher in urban driving than in other road environments.
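The time shares discussed above amount to counting seconds per jerk type within each road sector. A minimal sketch, assuming one record per second that already carries a road sector label and a jerk type label (the labels and the function name here are hypothetical, and the classification rule itself is not reproduced):

```python
from collections import Counter, defaultdict


def time_share_by_jerk_type(records):
    """Percentage of each sector's driving time spent in each jerk type.

    records -- iterable of (road_sector, jerk_type) tuples, one per second.
    Returns {road_sector: {jerk_type: percent_of_sector_time}}.
    """
    counts = defaultdict(Counter)
    for sector, jerk_type in records:
        counts[sector][jerk_type] += 1
    shares = {}
    for sector, counter in counts.items():
        total = sum(counter.values())
        shares[sector] = {jt: 100.0 * n / total for jt, n in counter.items()}
    return shares


# Hypothetical labelled seconds, not data from the study.
sample = [("urban", "zero"), ("urban", "A"), ("urban", "zero"),
          ("urban", "B"), ("rural", "zero")]
shares = time_share_by_jerk_type(sample)
```

The same counting can be repeated per speed bin by adding the bin index to the grouping key.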

Numerical results of CDPCA
The two-step-SDP algorithm was applied to the above dataset to perform CDPCA, identify the relevant factors that explain the variability, and reveal the cluster structure hidden in the data. Two cases were considered: (1) the dataset constrained to the road sector with K = 2, which corresponds to the ICE and EV modes; and (2) the dataset constrained to both the road sector and the operating mode (ICE or EV), in which case K = 7, which corresponds to the jerk type classes.
In all experiments, the tolerance was set to 10⁻⁵, the maximum number of iterations was 100, the number of runs of the algorithm for the final solution was 50, and three principal components (Q = 3) were considered. CDPCA was performed on the data points of each road sector, with RPM, OBD speed, acceleration, vehicular jerk, VSP, vehicular jerk type, CO2 and NOX chosen as variables.

Case 1
The algorithm converged to an optimal solution after 7, 8, and 8 iterations for the urban, rural, and highway cases, respectively. The three components of the CDPCA explain almost 70% of the data variability for the urban and highway data, while for the rural case this value is around 57%. Table 4 lists the component loadings from CDPCA with the dataset constrained to the road type. RPM can be considered a variable that explains the variability within the ICE or EV mode. For the urban case, the four factors contributing to Component 1 are RPM, VSP, CO2, and NOX emissions, which can be taken to reflect the distinctive differences between ICE operation statuses. RPM, acceleration and VSP seem to contribute most to the differences in operating mode in the rural sector, while for the highway sector the most relevant factors contributing to the first principal component are RPM, speed, vehicular jerk, and CO2.
The plots in Figure 9, which project the observations onto a pair of principal components, give visual information on the shape of the data. In particular, the authors found different patterns that allow the ICE and EV modes to be clearly identified; the between-cluster deviance of CDPCA is 55% of total deviance for the urban section, and 63% for the rural and highway sections. The weights of the attributes comprising the principal components govern these shapes. Each road sector exhibits a different clustering structure. In the urban case, the cluster for the ICE mode presents a significantly thinner and longer shape than that for the EV mode. These results indicate more variability in the data, for instance in acceleration, in urban environments, which are more prone to larger changes and driving volatility. Such engine/energy demand yields constant switching between ICE and EV modes. In the rural section, the clusters are well separated and more balanced in shape because of the wider variation in road grade (see Table 1), which is clearly seen in the ICE turning on and off for positive and negative grade values, respectively. Road grade impacts VSP, which was previously identified as one of the main contributing factors to the operating mode in this sector. Finally, the highway sector presents clusters that allow identifying which points belong to the ICE or EV modes. A thin and long shape can also be observed for the ICE mode in this sector, mainly due to the characteristics of the segment in the middle of the route (as explained in the characterization of the case study in Subsection 2.1), which in turn leads to variations in RPM and vehicle speed.
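The between-cluster deviance quoted above is the share of total deviance (sum of squared distances to the grand centroid) that is accounted for by the cluster centroids. The sketch below computes that ratio for arbitrary labelled points; it is not the two-step-SDP algorithm, and applying it to the component scores of the observations is an assumption about how the reported percentages are obtained. The function name is hypothetical.

```python
def between_cluster_deviance_ratio(points, labels):
    """Between-cluster deviance divided by total deviance (0 to 1).

    points -- list of equal-length coordinate tuples (e.g. component scores)
    labels -- cluster label of each point
    """
    if len(points) != len(labels) or not points:
        raise ValueError("points and labels must be non-empty and aligned")
    n, dim = len(points), len(points[0])
    grand = [sum(p[d] for p in points) / n for d in range(dim)]
    total = sum((p[d] - grand[d]) ** 2 for p in points for d in range(dim))
    if total == 0:
        raise ValueError("total deviance is zero")
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    between = 0.0
    for members in clusters.values():
        size = len(members)
        centroid = [sum(p[d] for p in members) / size for d in range(dim)]
        # Each cluster contributes its size times the squared distance
        # between its centroid and the grand centroid.
        between += size * sum((centroid[d] - grand[d]) ** 2 for d in range(dim))
    return between / total


# Hypothetical two-cluster example (for instance, ICE versus EV points).
scores = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
modes = ["EV", "EV", "ICE", "ICE"]
ratio = between_cluster_deviance_ratio(scores, modes)
```

Values close to 1 indicate compact, well-separated clusters; a value of 0.55, as reported for the urban sector, means that a little more than half of the total deviance lies between the cluster centroids.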

Case 2
The optimal solution for ICE mode was found after 28, 23, and 36 iterations for the urban, rural, and highway cases, respectively. The three obtained components explain between 71% and 78% of the total variance for the urban, rural, and highway cases. Tables 5 and 6 show the loadings of the three principal components obtained by performing CDPCA on the dataset constrained to the road type and operating mode, ICE and EV, respectively. RPM plays a significant role in reflecting the differences between jerk types for all cases. The variance explained by Component 1, which comprises RPM, VSP, CO2, and NOX, is higher than 41%, and these four variables can be used to distinguish the jerk type for the urban sector. While for the urban case VSP contributes to the first principal component and the other relevant acceleration-dependent factors contribute to the second component, the rural and highway sections present a similar composition of the most relevant factors, which are related to acceleration-dependent variables and pollutant emissions. Under Component 1, RPM, speed, and CO2 account for approximately 30% of the total explained variance for the rural sector, which together with the second component reaches more than 55%, where the contributors are the acceleration-dependent variables (acceleration, vehicular jerk, and VSP). It should be mentioned that NOX contributes as a single variable to the third principal component. Concerning the highway results, Component 1 reveals a strong relationship between RPM and the pollutant emission variables, while the block of three acceleration-dependent variables contributes to Component 2, each of them representing approximately 30% of the total explained variance. As in the urban case, speed is the only variable that contributes to the third component. The two-step-SDP algorithm returned a clustering of the objects with a clear separation by jerk type (Figure 10); the between-cluster deviance of CDPCA is clearly above 70% of total deviance for all road sections.
The optimal solution for EV mode was found after 28, 52, and 25 iterations for the urban, rural, and highway sectors, respectively. The first two components explain more than 58% of the total variance for the urban, rural, and highway data. Results reveal that the variables contributing to each of the three principal components coincide in all road sectors. The variance explained by Component 1, which comprises RPM, CO2, and NOX, is between 31% and 34% (depending on the road sector). Component 2 includes, in all cases, the acceleration-dependent variables, and its explained variance is approximately 28%-30%.
For all cases, the between-cluster deviance of CDPCA is around 80% of total deviance, and the results highlight a clearly different pattern between the ICE and EV modes. In the EV mode, for all road sections, the factors contributing to each component coincide: RPM and pollutant emission variables contribute to Component 1, acceleration-dependent variables to Component 2, and the third principal component reflects the relevance of speed. Figure 11 shows the clustering results after performing CDPCA to identify the patterns by jerk type. When comparing the EV results with the ICE results (Figure 10), a clear difference in the shape of the clusters with respect to the jerk types can be observed.

Conclusions
This research focused on a comprehensive understanding of the hot-stabilized exhaust emissions, engine variables, and driving volatility of an HEV in different road sectors (urban, rural, and highway) and for both ICE statuses. Driving volatility was categorized by several driving behaviors associated with vehicular jerk, including zero jerk (constant speed and/or acceleration), jerk enhancements (increasing accelerations or decelerations), jerk reversals (acceleration followed by deceleration), and jerk mitigations (decreasing accelerations or decelerations).
Analysis of vehicular jerk showed the largest bandwidth under urban driving conditions, especially at speeds lower than 45 km.h⁻¹. The distribution of vehicular jerk differed widely across road sectors. Zero jerk represented 19%, 29%, and 41% of driving time in the urban, rural, and highway sectors, respectively. Results also indicated discrepancies in the distribution of vehicular jerk types between ICE and EV modes. CDPCA results constrained to the road sector confirmed that RPM, VSP, CO2, and NOX are factors that allow identifying the cluster structure hidden in the vehicular jerk type and ICE operation status data.
This article highlighted the importance of measuring driving volatility when evaluating HEV operation in the context of emission monitoring. The analysis of driving behavior based on the extent of driving volatility, which goes beyond simply labeling a driver as aggressive or nonaggressive, and its further correlation with emissions are thus key study contributions. Vehicular jerk classification can be integrated into driving behavior monitoring and feedback devices capable of using volatility information to provide alerts and warnings according to the ICE operation status. Applications can be embedded in vehicle navigation systems to warn drivers when high CO2 and NOX emissions and jerking movements are detected during a trip. This research also makes a scientific contribution regarding which variables can be used as indicators to distinguish and identify different clusters of jerk types between ICE operation statuses, using the two-step-SDP algorithm to perform CDPCA.
Although the methodology presented here can be used in any HEV or ICE vehicle, the data used in this research are from a single HEV, so the findings and conclusions may only apply to similarly operated vehicles. Other HEVs may be operated following different optimization strategies and might reveal a different influence of road type characteristics and engine variables on vehicular jerk distributions and tailpipe emissions. The exclusion of cold-start events from the experiments, during which vehicle emissions are expected to be much higher because the catalytic converter has not yet reached its operating temperature, could be considered a limitation, but the major focus of this study was to provide a thorough analysis of an HEV under hot-stabilized conditions. Thus, it should be mentioned that such events can dictate other relationships among emission, engine and driving volatility variables. Nevertheless, and considering any of these situations, the proposed methodological framework is valid and can be used for data interpretation. Future work will be devoted to the characterization of HEV-specific CO2, NOX and PM based on different driving behavior styles, engine operating temperatures (cold-start and hot-stabilized exhaust emissions) and exhaust gas treatment systems. Since powertrain management is influenced by the battery state of charge, there is a need to collect state-of-charge data in HEVs to obtain robust conclusions on hidden clusters. The next research steps will also focus on the development of a graphical interface capable of remotely connecting with the ECU system to both incorporate and map vehicular jerk classification, as well as to define a driver index to measure the extent of variations and emission rates in driving. There is a need for an independent verification of the results obtained here by collecting additional emission, engine and driving volatility data in comparable conventional vehicles and HEVs with similar characteristics to possibly generalize the research findings. Differences in the distribution of jerk classification types obtained from OBD and GPS data will also be examined.

ρCO2: density of CO2 at the standard conditions
ρfuel: fuel density
ρNOX: density of NOX at the standard conditions

Figure 1. Specific route of HEV tests in Aveiro region (Portugal).

Figure 2. Example of driving route from a single trip (trip 1) of the HEV: (a) OBD speed and RPM; (b) RPM and VSP; (c) OBD speed and cumulative CO2; and (d) OBD speed and cumulative NOX. The ICE mode is OFF when the engine speed is 0 RPM, and ON for the remaining engine speed values.

Figure 3. Distribution of FFR, CO2, NOX, travel time and distance of the HEV by road sector.

Figure 6. Vehicular jerk distribution by speed bins: (a) urban; (b) rural; and (c) highway. Note: STD is the standard deviation value.

Figure 9. CDPCA results on the operating mode constrained to road sector: (a) urban; (b) rural; and (c) highway.

Figure 10. CDPCA results on the jerk types constrained to road sector and ICE mode: (a) urban; (b) rural; and (c) highway.

Figure 11. CDPCA results on the jerk types constrained to road sector and EV mode: (a) urban; (b) rural; and (c) highway.

Table 1. Technical specifications for the HEV used in this study.

Table 3. Specifications for the 3DATX ParSYNC integrated PEMS (Fernandes et al., 2019).
Second-by-second CO2 and NOX emission rates (mass per time) were computed based on the Regulatory Information 40 CFR 86.144 for tailpipe emissions (EPA, 2018). The exhaust mass flow rate from data as reported by the ECU (Fernandes et al., 2019) can be computed as follows (Equation (3)):

Table 5. Component loadings from CDPCA for case 2 and ICE mode.

Table 6. Component loadings from CDPCA for case 2 and EV mode.