Analysis of minimum pulse shape information needed for accurate chest injury prediction in real life frontal crashes

Abstract The relationship between crash pulse shape and injury risk has been studied primarily with laboratory studies, but these are not necessarily representative of most real-life crashes. For the past decade, pulse information from real-life crashes has been available through event data recorders. The aim of this study is to evaluate how crash pulses from event data recorders can be parameterized with as few parameters as possible without losing the ability to accurately predict occupant injury. Pulses from 122 NASS/CDS cases with a delta velocity over 40 km/h were parameterized using eigenvector analysis. Six different pulses were created for each of these cases, including the original pulse and five approximations with gradually more pulse information. Using a finite-element sled model with the detailed THUMS human body model, the risk of chest injury was evaluated for each pulse version in each case. By comparing the results from each pulse approximation to the original pulse, the change in chest injury could be evaluated as a function of pulse approximation for each case. Using linear regression to analyse the chest injury error results it was found that a pulse with as few as four parameters—delta velocity, duration, and two shape parameters—can sufficiently describe the pulse shape from a chest injury point of view.


Introduction
The deceleration as a function of time that a vehicle experiences during a frontal collision is referred to as the crash pulse. The characteristics of the crash pulse depend on parameters such as the mass and structural stiffness of the subject vehicle and the crash object as well as how they interact during the crash. One example of an interaction is the case when the subject vehicle collides with a pole. If the collision occurs in the centre of the front of the vehicle, the longitudinal beams of the subject vehicle will not be engaged and the pulse will be dependent only on the drive train load path, which will result in a relatively sharply rising crash pulse.
The severity of the crash is often described using the delta velocity (DV), which is the integration of vehicle acceleration over time, and the relationship between DV and the risk of injury to vehicle passengers is well established [1][2][3]. In addition to the crash severity, other properties of the crash pulse influence the injury risk. Kullgren [1] used data from crash pulse recorders and showed that combinations of DV, mean acceleration, and peak acceleration enhance the prediction of injury risk compared to using only one of these components alone. These results were confirmed by Ydenius [3]. Thus, the crash pulse in these studies can be considered to consist of a severity parameter (DV) and a shape parameter (the mean acceleration or the peak acceleration).
A more hands-on example of the influence of the crash pulse shape was shown in [4] by comparing a 60 km/h EU Offset to a 56 km/h USNCAP crash test. Even though the DV in the USNCAP crash test was lower, the Hybrid 3 injury values were higher in this test. According to the authors, 'the slow onset of deceleration in the EU Offset crash test leads to a lower occupant to interior contact velocities and a less severe environment for occupant restraint systems. ' In recent years, there have been dramatic developments of finite element (FE) models in passive safety applications. The low cost for each virtual crash test has allowed researchers to perform large parameter studies, stochastic simulations, and optimizations. Furthermore, if the crash pulse is parameterized, there is also the possibility to search for relationships between these pulse parameters and injury outcome. In parallel, detailed FE Human Body Models (HBMs) have been introduced as a complement to anthropometric test devices. The high level of anatomical details in HBMs, facilitates the assessment of injury at a higher level of detail, e.g. on tissue level.
A simple form of parameterization was mentioned above in terms of DV, mean acceleration, and peak acceleration. Other authors have used other types of pulse parameterization to study what parts of the pulse are relevant for injury outcome. Huang et al. [5] simplified a pulse using low-order polynomials or Fourier components. The conclusion was that even very low-order polynomial or Fourier components, described with as few as four parameters, were enough to get a response within ±5% of the occupant injury recordings. Lundell [6] continued this work by using a half sine pulse and adding high-frequency content at different onset times to study the sensitivity of the injury prediction to high-frequency content. The main conclusion of that study was that total DV was the most important factor for injury rating, and the high-frequency content had less influence as long as it did not change the final velocity. Wu et al. [7] used a two-step piecewise linear approximation of the acceleration pulse and reached a similar accuracy as Huang et al. However, they noted that a one-step or three-step approach would be more appropriate in some cases.
All these studies evaluated pulses from barrier crashes such as the US-NCAP and the Euro-NCAP, but the shapes of these pulses are not necessarily representative of real-life crashes. For the past decade, the National Highway Traffic Safety Administration (NHTSA) has published event data recorder (EDR) data from a selection of real-life crashes in the National Automotive Sampling System (NASS) database, and this provides an opportunity to study the pulse shapes of real life crashes.
The aim of this study is to determine how real-life crash pulses, as represented by EDR data, can be parameterized with as few parameters as possible without losing the ability to accurately predict occupant injury, as measured on tissue level using a HBM. These results can then be used for FE simulations of real-life crashes.

Method
The analysis of pulse shapes is not as straightforward as comparing scalar values such as DV. A real pulse is a continuous measure that is recorded using sampling. To be able to compare different approximation levels the pulse needs to be parametrized in a way that the addition of one more parameter improves the pulse approximation slightly more, up to the point that inclusion of all parameters will recreate the original pulse. This study will utilize singular value decomposition to parametrizes NASS/CDS EDR pulses [8]. The method can be summarized in the following steps with more details presented in the next sections: 1. collect real-life frontal EDR crash pulse data from NASS, 2. make all pulses comparable using resampling and normalization, 3. remove cases with incomplete pulse information, 4. parameterize the pulses using singular value decomposition, and 5. analyse the correlation between pulse information and occupant injury. To exclude cases where the source of vehicle damage was unclear, cases involving rolling (NASS variable ROLLTYPE) were removed. For cases involving multiple impacts, a judgment was made as to whether the other impacts could have influenced the studied pulse or resulted in overlapping damage to the subject vehicle. If this was judged to be a possibility, the case was excluded.
Finally, to exclude pulses where the major pulse component was not longitudinal, all cases with a principle direction of force (NASS variable PDOF) greater than ±45 degrees were removed. This resulted in a total sample of 145 cases for further analysis.

Make all pulses comparable using resampling and normalization
In NASS/CDS most EDR pulses are reported as DV versus time which is not well suited for this analysis. Thus, the first step was to convert the EDR DV signals to acceleration signals using a differentiation scheme (Equation 1; [8]).
where DVðtÞ is the change of velocity recorded in the EDR module at time t, and Dt is the sampling interval (for EDRs normally 10 milliseconds). The next task was to normalize the pulses which was performed in two steps. First, the x-axes of all pulses were scaled to unit time using the pulse duration as the scaling parameter, and second the y-axis (acceleration) of all pulses was scaled to a unit area under the pulse.
Finally, all pulses were resampled using linear interpolation to have 51 sample points.

Remove cases with incomplete pulse information
The definition of incomplete pulses used in this study was adopted from [11]. In summary, a pulse for which the last recorded acceleration was above 45% of the peak acceleration was considered to be incomplete and was removed from further analysis.

Parameterize the pulses using singular value decomposition
Pulse shapes have previously been parameterized in many different ways, including polynomials or Fourier components [5], half sine approximations [6], two-step approximations [7], and different geometrical approximations such as haversine and triangular shapes [12]. Since the aim of this study is to evaluate how much of the pulse information is relevant from an injury point of view, a parameterization that can be gradually improved with the addition of parameters-all the way back to the original pulse-is required. This is not possible with fixed geometrical shapes like halfsines, haversines, or triangles. This can be done using Fourier components, but for this study eigenvectors were chosen. Eigenvectors have similarities to Fourier components, but instead of having the sine curve as the base vector the shapes of the eigenvectors are based on the actual data being studied. This means that eigenvectors should be more efficient in parameterizing the pulse shape, and for a fixed number of parameters the eigenvectors will give a better approximation of the actual pulse shape.
The eigenvalue analysis was carried out using R version 3.0.3 [13]. First, the average pulse shape for all pulses was subtracted from all pulses to create a residual pulse shape for each pulse. Second, the residual pulse shapes were analysed using singular value decomposition. This numerical method yields fundamental vectors, i.e. shapes, called eigenvectors, and there are as many eigenvectors as there are points in the discretization of the pulse. The eigenvectors are stored such that the first represents 'most' of the residual variation, the second represents the next most, and so on. In addition, one eigenvalue for each eigenvector is calculated. By summing the eigenvalues multiplied by the eigenvectors and then adding the previously subtracted average pulse shape, the original pulse shape can be recreated (see Equation 2).
To create a pulse shape approximation, this summation is carried out by reducing the number of eigen vectors used. The addition of each eigenvector adds another parameter, and the approximation gets closer and closer to the original pulse shape (Figure 1).
This study uses the same eigenvectors derived in [8] and are thus based on a larger set of data from EDRs. More details on the eigenvalue analysis can be found in [8].
Analyse the correlation between pulse information and occupant injury As shown in Figure 1, the approximations using more eigenvalues more closely follow the original pulse shape. The question, therefore, is how many eigenvalues-or how much information-is needed to sufficiently predict occupant injury. Here, sufficient means that there should not be any statistical difference between the injury prediction using the approximation or the original pulse. In this study, this will be analysed using an FE simulation model. This model uses a driver's position environment that consists of a simplified seat, a three-point belt system, a deformable steering wheel, a steering wheel airbag, a generic instrument panel, and a generic door structure ( Figure 2). The three-point belt system utilizes a force limiter with a constant force of 4.5 kN. No pretensioner was modelled. The properties of the deformable steering wheel were set according to [14]. The steering column was modelled with an energy-absorbing feature with a maximum stroke of approximately 70 mm and a force increasing linearly from 4.5 kN to 6.5 kN over the stroke. Except for the applied pulse, this model is the same as the one used in the study by [11] where it is described in detail. The airbag trigger time was set to 13 ms, which was equal to the median trigger time for the EDR cases.  The occupant was the THUMS 50% male version 2.21-040407 human body model (HBM) with improvements according to [15]. This model was previously validated for side and frontal loading by [16] and [17] and for oblique loading by [18].
For each of the EDR cases, six pulse versions were created. The first version was the original pulse as recorded by the EDR module. As this pulse is based on only the longitudinal DV, it was assigned to the sled longitudinal acceleration. The lateral acceleration was set to zero, despite the PDOF reported in the NASS file. In addition, five different simplifications of the pulses were created, and assigned to five other versions of the FE model. The first approximation was to use only the average pulse shape scaled to the correct DV. In the other four approximations, one, two, three, or ten eigenvectors were added to the average pulse shape. This is equal to summing up to max ¼ 1, 2, 3, or 10 according to Equation 2.
For each EDR case and for all six versions, the risk of fracture for each rib was computed according to the probabilistic framework developed by [19]. This is a tissue-based injury criterion based on rib strain, taking full power of detailed HBMs. A tissue-based chest injury criterion, measured on element level, should be more sensitive to variations in the boundary conditions, compared to traditional ATD chest criteria like CTI, max VC or 3 ms maximum acceleration. However, instead of using the empirical cumulative distribution function (ECDF) for the rib fracture risk originally used by Forman et al., a Weibull distribution was fitted to the data of [20] and [21] to create a smooth risk function. The benefit of a smooth risk function is that a small increase in strain will give a small increase in risk for rib fracture. For a non-smooth risk function, e.g. an ECDF, a small increase in strain will give either a zero increase or a large increase in risk for rib fracture. This will induce more noise in the statistical calculations, and more details can be found in Appendix A. The risk of a rib fracture was based on the element with the highest maximum principal strain measured at the neutral layer of the cortical bone with the assumption that the occupant was either 30 or 70 years old. The fracture risk was age adjusted according to a linear function as shown in Appendix A, and no attempt was made to calculate multiple fractures within a single rib. The expected number of fractured ribs was then computed using the assumption of independence between fractured ribs. With this assumption, the expected number of fractured ribs was computed as a simple sum of the fracture risk for each rib according to Equation 3.

Expected number of fractured ribs
where p i is the risk of fracture for each rib.

Statistical analysis
To evaluate each crash pulse approximation, the simulations using these approximations were compared to the simulations using the original pulses. For each approximation, a linear regression model was created using the expected number of fractured ribs based on the original pulse as the independent variable and the expected number of fractured ribs using the approximate pulse as the dependent variable. However, because the distribution of DV was heavily skewed towards low-speed impacts, most cases would give a low risk of injury and the injury results would thus also be skewed towards low injury risk. Therefore, to avoid violating the assumption of homoscedasticity in the linear regression, the injury risk, computed as the expected number of fractured ribs, was first log transformed. For each regression model, the 95% confidence intervals were computed for the intercept and the slope. A systematic difference between the approximation and original pulse shapes was shown when both the confidence interval for the intercept did not include zero and when the confidence interval for the slope did not include zero. All statistical analyses were performed using R version 3.0.3 [13].

Results
Out of the 145 identified cases in the method section, 23 cases were judged to have incomplete EDR recordings. In other words, of the 145 identified EDR recordings, 122 cases fulfilled the inclusion criteria and were used for the analysis in this study. Seventy-five percent of these cases had a NASS estimated within PDOF 10 , and ninety-two percent within PDOF 20 . Figure 3 shows the relationship between the original crash pulse and the first three approximations for a 30-yearold occupant. The x-axis shows the result of the original pulse, and the y-axis shows the pulse approximation. The dashed line (y ¼ x) represents the condition in which the approximation perfectly predicts the original pulse. Any results below that line mean that the pulse approximation underestimates the injury risk. The solid line represents a linear regression model fitted to the data points. If this line does not match the dashed line, then there is a systematic difference between the approximation and the original pulse.
The results for the 30-year-occupant are further shown in Table 1. For a pulse approximation without any systematic error, the regression intercept should be zero and the slope should be equal to one. It is clear in the data that adding more eigenvectors results in a smaller approximation error. The first two approximations in which the pulse shape is approximated using only the average shape or adding one eigenvector have regression parameter with confidence intervals not covering zero and one, respectively. This means that there is empirical evidence that these approximations on average underestimate the injury risk. The results for the approximations using more than one eigenvector show that on average these approximations underestimate the injury risk slightly, but this underestimation is not significant at the 95% confidence level.
The results for a 70-year-old occupant are presented in Table 2. The convergence rate is slightly higher, but the results are essentially the same as for the 30-year-old occupant. It should be noted that the model predicts more rib fractures for the 70-year-old occupant compared to the 30-year-old occupant at the same crash severity. However, the model predicts more rib fractures both for the original pulse and for the pulse approximations meaning that the ratio is essentially the same as for the 30-year-old occupant.

Discussion
In Equation 2 and Figure 1, it is clear that by adding more eigenvectors the pulse shape approximation will become more and more similar to the original pulse shape. It should be noted that the area under the curve is the same for all approximations meaning that the total DV is correct for all approximations. The average acceleration will also be correct because the duration is the same for all the approximations. Therefore, the reduction in approximation error when more eigenvectors are added originates from other improvements of the pulse shape.
In Figure 3 and Table 1, it appears that even with the simplest pulse shape approximation, i.e. only the average pulse shape, the injury prediction was still reasonable. The linear regression slope was as high as 0.83 in the log-log scale, which corresponds to an average error of 17%. However, some cases showed more error than others, and therefore a natural question is what characterizes these cases?
Studies of real life data have shown that including peak acceleration in addition to DV (or mean acceleration) can enhance the injury prediction capabilities [1,3]. A more correct representation of the peak acceleration for approximations with more eigenvectors is, therefore, a natural candidate for explaining the improved injury prediction. To explore this with the data from this study, the approximation error was plotted against the error in peak acceleration in Figure 4. The left graph shows the results for the simplest pulse shape approximation, and the right graph shows the results for the pulse shape approximation with two eigenvectors, which according to Table 1 has a very small approximation error. In the left graph, there is a weak trend indicating that in cases where the simplified pulse shape underestimates the peak acceleration the injury prediction will also be underestimated. However, because the scatter is very large the predictive power of the peak acceleration variable is low. It can also be seen that there are many cases with a large error in peak acceleration but a very small error in injury outcome. Furthermore, it is seen in Figure 3 and Table 1 that an approximation with two eigenvectors gives injury estimates very close to the original pulse. As seen in the right graph, most of the injury prediction error is removed (all dots are close to zero on the y-axis). However, many of the two-eigenvector pulse approximations still underestimate the true peak acceleration by up to 35% (which can be seen by the spread along the x-axis). Thus, the convergence of the injury prediction seen for the two-eigenvector approximation cannot be explained by a convergence of peak acceleration in the approximation because the peak acceleration is far from converged in the two-eigenvector approximation.
A detailed analysis of the data showed that in many of the cases where the approximation underestimates the injury, the peak acceleration and much of the change in  velocity occurs early in the crash. This indicates that instead of the magnitude of the peak acceleration the early DV history seems to be important for the injury outcome. One way to measure the effect of the early DV history is to compute the maximum relative occupant-to-vehicle velocity. The definition of 'early' using this measure is from time zero to the time of maximum relative occupant-to-vehicle velocity, which in this study occurs roughly at 60 ± 12 ms. This relative occupant-to-vehicle velocity can be modified either by changing the interior safety systems of the vehicle or by changing the early part of the pulse. Because the interior safety system was the same for all simulations in this study, the difference in relative occupant-to-vehicle velocity is only due to the early DV history. In Figure 5, the difference in relative occupant-to-vehicle velocity is plotted against the difference in injury outcome. The left graph shows the results for the simplest approximation, and a trend can be seen with cases having a large difference in relative occupant-to-vehicle velocity also showing a large difference in injury outcome. In the right graph, this trend disappears in the results from the approximation with two eigenvectors. In fact, except for some scatter around zero, both the injury outcome and relative occupant-to-vehicle velocity have converged.
This convergence means that the important part of the pulse that is improved by adding the first two eigenvectors to the average pulse shape is the representation of the early DV history. A better approximation of this part of the pulse, as seen in the right graph of Figure 5, means that there will not be any statistically significant difference between the injury prediction using the approximate and original pulse.
The influence of the first two eigenvectors on the pulse shape is shown in Figure 6. The column labelled 'Mean' shows the average pulse shape. The other columns show how the average pulse shape is changed by the eigenvector when the corresponding eigenvalue has a value of plus/ Figure 4. Scatterplot of the error in approximation (y-axis)-measured as the difference in the expected number of fractured ribs-versus the relative difference in peak acceleration (x-axis). The error was computed according to [Expected number of fractured ribs using the approximate pulse À Expected number of fractured ribs using the original pulse]. The relative difference in peak acceleration was computed according to [peak acceleration using the approximate pulsepeak acceleration using the original pulse]/peak acceleration using the original pulse. Figure 5. Scatterplot of the error in the approximation (y-axis)-measured as the difference in the expected number of fractured ribs-versus the difference in peak relative velocity between the spine and the bucket (x-axis). The error was computed as [Expected number of fractured ribs using the approximate pulse À Expected number of fractured ribs using the original pulse]. The difference in relative occupant-to-vehicle velocity was computed according to [peak acceleration using the approximate pulsepeak acceleration using the original pulse]/peak acceleration using the original pulse. The occupant's velocity was measured at the T7 level of the spine. minus one or two standard deviations. The first eigenvector mainly moves the location of the peak acceleration, while the second eigenvector changes the pointiness of the pulse. The most improvement is from adding eigenvector two, and this seems to be most important from the chest injury point of view.
This study indicates that four parameters are enough to describe the pulse from an occupant chest injury point of view. This is consistent with the result of Huang et al. [5] who also concluded that 'approximations to vehicle deceleration involving as few as four parameters are adequate in many cases … to evaluate occupant dynamics'. While Huang et al. used a simple 2-D representation of the occupant and restraint system, the present study confirms these results using detailed 3-D modelling, including a FE HBM, and tissue-based injury criteria.

Limitations
The major limitation of this study is that the same vehicle interior, with the same interior safety system, was used for all cases. In other word, the results, in terms of convergence rate, could be different for another vehicle interior. A second limitation is the sampling of the data. Although the cases in the NASS/CDS database are selected using stratified sampling, not all NASS/CDS cases include EDR information. This means that the sample used in this study is not a truly random sample. Finally, although it is within the scope of this study, it is a limitation that only the risk of rib fractures was analysed. Other body parts, like the head or pelvis, might be more or less sensitive to pulse variations, and this would then affect the convergence rate.

Conclusion
The information in a crash pulse can be reduced to four parameters without any statistically significant reduction in the ability to predict rib fractures using tissue-based injury criteria. In this study these four parameters are DV, the pulse duration, and two additional parameters that modulate the pulse shape. Using the assumption of a linear agerelated fracture risk, this study shows that the results are consistent for younger and older vehicle occupants. An accurate representation of the peak acceleration is not necessary; instead it is a correct representation of the initial part of the pulse, i.e. the time up to peak loading, that is the most important from a chest injury point of view. Figure 6. The shape of first two eigenvectors and how these influence the pulse shape when they are added to the average pulse shape. The first column shows the two first eigenvectors. The column labelled 'Mean' shows the shape of the average pulse. The other columns show how the average pulse shape is changed by the eigenvector when the corresponding eigenvalue has a value of plus/minus one or two standard deviations according to the equation Pulse shape ¼ Average pulse shape þ ðEigenvalue i ÃEigenvector i Þ where i ¼ 1 or 2 in the figure.
In order to create a smooth risk function for rib fracture, the original data from [20] and [21] were treated in the same way as in [19] with regard to averaging the test of each subject and normalizing the data to a 25-year-old person. After this, the strain data points were scaled to the desired age using the same procedure as in [19] according to Equation A1. e fail:age of interest ¼ e 25yr 1 À age À 25 ð Þ 0:051 10 (A1) Finally, instead of creating an empirical cumulative distribution function, a Weibull distribution was fitted to the strain data points. In Figure A1, the cumulative Weibull distribution is compared to the empirical cumulative distribution function for a 30-year-old person. Figure A1. Comparison of an empirical cumulative distribution function and fitted Weibull distribution for a 30-year-old person.