Influence of road environmental elements on pedestrian and cyclist road crossing behaviour

The pedestrian and cyclist-related accident fatality rate is higher than that of other traffic accidents. One of the pedestrian behaviours that leads to traffic accidents is the act of moving rapidly onto the road from a blind spot without warning. Expert drivers practice hazard-anticipatory driving and will naturally seek to reduce uncertainty by attempting to fit their current driving context into a pre-existing category. Risk management is the process of identifying hazards and assessing and controlling risks to attain safety. The purpose of this study was to evaluate the influence that driving context-altering road environmental elements exert on the road-crossing behaviour of pedestrians and cyclists. Thus, this study attempted to identify covert hazards (obscured pedestrians and cyclists). A logistic regression analysis was employed along with data from the near-miss incident database, in which approximately 140,000 near-crash-relevant events were registered in 2017. By using the logistic regression analysis along with the annotations recorded in the database, we constructed a predictive model to identify covert hazards. The study demonstrated the feasibility of using a set of environmental elements that shape the driving context to construct a predictive model that identifies covert hazards.


Introduction
Erratic pedestrian and cyclist behaviour, such as moving into the road from a blind spot, is one of the factors that contribute to the occurrence of accidents and near-misses. Accidents involving either pedestrians (36.5%) or cyclists (13.4%) [1] are some of the most common types of accidents that lead to traffic fatalities. Pedestrians and cyclists experience an increased accident risk when crossing roads, owing to the shared road space [2]. Among the fatal pedestrian- and cyclist-related accidents, 66.5% occurred when they crossed the road. Prevention of these accidents is vital for decreasing future traffic accident fatalities.
Urban areas that are crowded with residential and commercial buildings and stores contain many blind spots in which the driver's field of view is poor. One of the pedestrian and cyclist behaviours that leads to traffic accidents is the act of moving rapidly onto the road from a blind spot without warning. While the driving environment leads to (i) cognitive errors, (ii) decision errors and aggressive behaviours, and (iii) performance errors [3,4] on the part of the driver, pedestrians can also cause accidents. For example, distracted pedestrians [5] or risk-takers may neglect to look for an approaching car while crossing a road. Distraction can arise from activities such as talking with friends or walking a dog. Additionally, using a cell phone or camera or reading a magazine can cause inattentional blindness.
Car manufacturers have developed advanced driver assistance systems (ADAS) to prevent traffic accidents pertaining to pedestrian and cyclist behaviour. The functions of an ADAS include enhancing driver perception, drawing attention to potential risks, setting off warning signals, and performing safety control [6]. For example, night vision/pedestrian detection, forward collision warning, and forward collision mitigation/avoidance systems have already been put into practical use. Night vision/pedestrian detection systems show objects captured by an infrared camera on an in-vehicle display. When a forward collision warning system detects a pedestrian in front of the vehicle, it provides a warning (alert) to the driver to allow them to take evasive action. If evasive actions are not taken, the system activates autonomous emergency braking to avoid a crash. However, these ADASs support the driver only after they have detected a pedestrian or vehicle. Thus, these systems fail to address potential risks. When a pedestrian or a cyclist initiates a road crossing from a blind spot, conventional forward collision mitigation/avoidance systems may not be able to avoid a crash because the time margin is too short. In such situations, expert drivers will naturally seek to reduce uncertainty by attempting to fit their current driving context into a pre-existing category [7]. Experienced drivers perform preventive driving, reducing their speed in advance based on knowledge and experience. Our research objective is the quantification of the hazard-anticipatory knowledge acquired by drivers.
Related works that have investigated the factors that influence vehicle-pedestrian crashes fall into two categories: (1) microscopic spatial analysis [8,9], which is carried out at specific locations in order to identify the relevant factors and solve safety issues, and (2) macroscopic spatial analysis [10-15], which focuses on zonal-level traffic accidents across the entirety of the specified area and captures spatial trends and safety issues.
Chen and Zhou [15] explored the relationship of vehicle-pedestrian crash frequency and risk to various built environment factors such as road network and land use, intersection type, stop sign density, zonal speed limit, and school density. However, most macroscopic spatial analyses cannot accurately track driver behaviour at the time an accident occurs.
Today, powerful sensor technology is available (e.g. cameras, light detection and ranging (LIDAR), and radar [16]), and drive recorders [17] can also be useful for quantifying hazard-anticipatory knowledge. Since 2004, the Smart Mobility Research Center (SMRC) at the Tokyo University of Agriculture and Technology has managed the near-miss incident database. The recorded number of near-miss events and traffic accidents exceeded 140,000 in 2017. The near-miss incident data were recorded by drive recorders on taxis. When the longitudinal deceleration exceeded 0.45 G (i.e. the longitudinal acceleration fell below −0.45 G), the drive data were automatically recorded from 10 s before the trigger to 5 s after the trigger. This database includes vehicle and driver information. In addition, qualitative data (annotations) describing environmental elements that altered the driving context, such as area type, road type, intersection type, and traffic density, were added by rating experts. Previous studies [18-20] have proposed a potential risk estimation method, which is based on modelling the knowledge and experience contained in the near-miss database. These investigations focused on analysing the near-miss event formation process and proposed a machine learning approach to predict the time margin allowed for drivers to perform evasive action to avoid a crash, which represents the severity level of a near-miss.
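The recording trigger described above can be illustrated with a short Python sketch. It is not the drive recorder's actual logic: the function name, the sampling rate, the sign convention (negative values mean braking), and the choice to ignore re-triggers inside an already recorded window are all assumptions made for illustration. Only the 0.45 G threshold and the 10 s / 5 s window come from the text.

```python
G = 9.80665  # standard gravity in m/s^2


def extract_event_windows(accel_ms2, sample_hz, threshold_g=-0.45,
                          pre_s=10.0, post_s=5.0):
    """Return (start, end) sample indices around each braking trigger.

    accel_ms2 : longitudinal acceleration samples in m/s^2 (negative = braking)
    sample_hz : sampling rate in Hz (hypothetical; not given in the text)
    """
    pre = int(pre_s * sample_hz)
    post = int(post_s * sample_hz)
    windows = []
    last_end = -1
    for i, a in enumerate(accel_ms2):
        if i <= last_end:
            continue  # assumption: one event per recorded window
        if a / G < threshold_g:  # deceleration stronger than 0.45 G
            start = max(0, i - pre)
            end = min(len(accel_ms2) - 1, i + post)
            windows.append((start, end))
            last_end = end
    return windows
```

With a 10 Hz signal and a single hard-braking sample at index 150, the sketch returns one window covering samples 50 to 200.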
The purpose of this study is to evaluate the influence that driving context-altering environmental elements exert on the road-crossing behaviour of pedestrians and cyclists. Whereas previous studies have attempted to estimate the potential risk (i.e. severity), the present study attempts to identify covert hazards (concealed pedestrians and cyclists). A near-miss incident database is used to achieve this purpose. In this study, we propose the following hypothesis:
• The environmental elements that alter the driving context can determine whether a pedestrian or a cyclist will initiate a road-crossing from a blind spot.
Predicting or knowing the hazard information that could determine whether a pedestrian or a cyclist will move rapidly onto the road from a blind spot without warning can be useful in terms of assisting drivers to avoid near-miss incidents. Thus, the contribution of this study is in obtaining the hazard-anticipatory information related to whether a pedestrian or a cyclist will move rapidly onto the road from a blind spot without warning. The remainder of this paper is organized as follows: In Section 2, the methodology is described, which includes data extraction, pre-processed annotation, and data analysis methods. In Section 3, the results obtained from the analysis are presented. In Section 4, a discussion is provided regarding the abovementioned hypothesis.

Data extraction
We searched and extracted the specified data by using the management tool provided in the near-miss incident database. In this study, the near-miss incident data were extracted in the following manner:
• Near-miss incidents that were extracted were separated into severity levels that included low, medium, and high. These incident severity levels were defined by the time margin allowed for drivers to perform evasive action to avoid a crash, and the rating procedure was based on a subjective method that depended on the experience of rating experts.
• Data related to pedestrian and cyclist behaviour were selected.
• Data recorded between 1 January 2011 and 1 January 2015 were selected.
The number of data points extracted using this procedure is shown in Table 1. Our target scenarios involved a pedestrian or cyclist initiating a road-crossing from a blind spot at an unsignalized intersection. In the database, the information related to blind spots was not annotated and was available only in the recorded video. Therefore, the following information was manually extracted while referring to recorded video footage:
• Whether a pedestrian or a cyclist moved rapidly onto the road from a blind spot. Data pertaining to pedestrians were marked with the symbol "0," and data pertaining to cyclists were marked with the symbol "1."
• Elements that create a blind spot. These were annotated and categorized into four different groups: building walls, parked vehicles, motionless vehicles, and moving vehicles.
• Whether or not signalized intersections were involved. This study focused on investigating scenarios that did not involve a signalized intersection. Near-miss data at signalized intersections were thus excluded.
• Whether or not the road on which the vehicle was travelling was a priority road. This study focused on investigating the erratic behaviour of pedestrians and cyclists. The events in which the road travelled by the pedestrian or cyclist had priority were excluded.
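The manual selection criteria listed above can be summarized as a simple filter. The following Python sketch is only an illustration: the record structure and every field name (`from_blind_spot`, `signalized_intersection`, `crosser_has_priority`, `road_user`) are hypothetical and do not correspond to the actual database schema. The 0/1 coding for pedestrians and cyclists follows the text.

```python
def select_target_events(events):
    """Keep blind-spot crossings at unsignalized intersections where the
    vehicle's road has priority, and attach the 0/1 outcome label."""
    selected = []
    for event in events:
        if not event["from_blind_spot"]:
            continue  # only crossings that start from a blind spot
        if event["signalized_intersection"]:
            continue  # signalized intersections are excluded
        if event["crosser_has_priority"]:
            continue  # pedestrian/cyclist road had priority: excluded
        labelled = dict(event)
        # Coding from the text: pedestrian -> 0, cyclist -> 1
        labelled["label"] = 1 if event["road_user"] == "cyclist" else 0
        selected.append(labelled)
    return selected
```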
The extracted data as well as the classification of the blind spot elements are presented in Tables 2 and 3. The percentage (67.4%) of events in which cyclists initiated a road-crossing from a blind spot created by a building wall was relatively high compared to pedestrians (35.6%).

Environmental elements that alter the driving context
The annotations that expressed environmental elements had been added to the recorded data in the database by SMRC rating experts. However, annotations were not added to all data. For example, annotations were not added to low-severity near-miss events. In addition, the criteria for classification of annotations were not publicly available. Therefore, in this study, the criteria for the classification of the annotations were redefined as shown in Table 4. Based on these criteria, annotations were attached to all of the extracted data (total 1423 events). A total of 13 parameters were used. These contextual properties could be classified into three groups: (a) static context properties, (b) dynamic context properties, and (c) other context properties, as shown in Table 4. The 12 parameters excluding Y_gap were qualitative properties. The lateral distance to the object causing the blind spot, Y_gap, was one factor that could contribute to a reduction in the driver's field of view. The value of the lateral distance for each event was estimated using a distance measurement tool provided in the database. Although it is not easy for a vehicle system equipped with sensor technology to estimate the lateral distance to an object that causes a blind spot in real time, the lateral distance, Y_gap, was treated as though it could be estimated in this study. The sidewalk type, as shown in Table 4, was defined based on the degree of separation between the car driving corridor and the footpath. SIDE1 was defined as the condition in which the car driving corridor and the footpath were not completely separated. SIDE2 was defined as the condition in which the car driving corridor and the footpath were separated by a white line. SIDE3 was defined as the condition in which the car driving corridor and the footpath were clearly separated by curbs, and SIDE4 was defined as the condition in which the car driving corridor and the footpath were clearly separated by guardrails. The definitions for the density of parked vehicles, pedestrians, traffic, and leading vehicle are also shown in Table 4. With the exception of Y_gap, the data used in this study were transformed into dummy variables (symbol "1" or "0"). Thus, a total of 38 categories were used.
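The dummy-variable transformation mentioned above can be sketched in Python as follows. This is an illustrative encoding only: the parameter names `sidewalk` and `weather`, the helper `encode_event`, and the example Y_gap value are hypothetical, while the category labels SIDE1 to SIDE4, SUNNY, and RAIN are taken from Table 4. The sketch keeps one indicator per category and does not drop a reference level, which a regression fit would normally require.

```python
def encode_event(record, levels, continuous=("Y_gap",)):
    """Turn one annotated event into 0/1 dummy variables.

    record     : dict mapping parameter name -> annotated category
    levels     : dict mapping parameter name -> list of category labels
    continuous : parameters copied through unchanged (here only Y_gap)
    """
    row = {}
    for param, names in levels.items():
        for name in names:
            row[name] = 1 if record[param] == name else 0
    for key in continuous:
        row[key] = float(record[key])
    return row


# Two of the qualitative parameters, with category labels from Table 4.
LEVELS = {
    "sidewalk": ["SIDE1", "SIDE2", "SIDE3", "SIDE4"],
    "weather": ["SUNNY", "RAIN"],
}

# Hypothetical event; the Y_gap value is illustrative, not taken from the data.
event = {"sidewalk": "SIDE2", "weather": "SUNNY", "Y_gap": 1.8}
encoded = encode_event(event, LEVELS)
```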

Model description
The purpose of this analysis was to investigate the associations between pedestrian or cyclist behaviour and possible contributory variables (environmental elements) by using the data recorded in the near-miss incident database. This study used a logistic regression analysis to achieve this purpose. The goal of the analysis using logistic regression was to find the best fit parameters through the use of the model-building technique.
In the logistic regression model, a latent variable can be described using the following expression:

$$ z = \beta_0 + \sum_{j=1}^{p} \beta_j x_j $$

where z is the latent variable, β_0 is the intercept, x_j is the value of the jth explanatory variable, β_j is the corresponding coefficient for j = 1, 2, 3, ..., p, and p is the number of explanatory variables. With the latent variable, the specific form of the logistic regression model was formulated as follows:

$$ \pi(x) = \frac{\exp(z)}{1 + \exp(z)} $$

where π(x) is the conditional probability of a positive outcome. The logistic regression model can also be expressed as a logit transformation:

$$ \operatorname{logit}\,\pi(x) = \ln\frac{\pi(x)}{1 - \pi(x)} = \beta_0 + \sum_{j=1}^{p} \beta_j x_j $$

The maximum likelihood method was then employed to measure the associations. The logistic regression model determines the coefficients that produce the observed outcome (a scenario in which either a pedestrian or a cyclist moves rapidly onto the road from a blind spot). Thus, the best estimate of β can be obtained.

Notes to Table 4 (the final table entry reads: two categories, SUNNY and RAIN):
a The sidewalk type was defined by the degree of separation between the car driving corridor and the footpath. SIDE1 was defined as the condition in which the car driving corridor and the footpath are not completely separated. SIDE2 was defined as the condition in which they are separated by a white line. SIDE3 was defined as the condition in which they are clearly separated by a ramp, and SIDE4 was defined as the condition in which they are clearly separated by guardrails.
b The parked vehicle density was defined as the number of parked vehicles present in the approximately 10 s before the near-miss event occurred. LOW PV was defined as fewer than two vehicles, MID PV as three to five vehicles, and HIGH PV as 6 or more vehicles.
c The pedestrian density was defined as the number of pedestrians present in the approximately 10 s before the near-miss event occurred. LOW PED was defined as fewer than two pedestrians, MID PED as three to nine pedestrians, and HIGH PED as 10 or more pedestrians.
d The traffic density was defined as the number of running vehicles present in the approximately 10 s before the near-miss event occurred. Low traffic density was defined as fewer than two running vehicles, medium as three to nine running vehicles, and high as 10 or more running vehicles.
e The leading vehicle (LV) was defined as a vehicle in front of the own vehicle, and the distinction was made based on whether or not there was an LV in front of the own vehicle.
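As a numerical illustration of the model form, the following Python sketch evaluates the latent variable, the probability π(x), and the logit for given coefficients. It assumes the standard logistic regression form with an intercept `beta0`; the function names and any coefficient values passed to them are illustrative assumptions, not estimates from this study. Following the coding used for the data, a positive outcome corresponds to a cyclist (label 1).

```python
import math


def latent(beta0, betas, xs):
    """Linear predictor: intercept plus the weighted sum of explanatory variables."""
    return beta0 + sum(b * x for b, x in zip(betas, xs))


def prob_positive(beta0, betas, xs):
    """Conditional probability of the positive outcome (cyclist, coded 1)."""
    z = latent(beta0, betas, xs)
    return 1.0 / (1.0 + math.exp(-z))  # equal to exp(z) / (1 + exp(z))


def logit(p):
    """Log-odds; applying it to prob_positive recovers the linear predictor."""
    return math.log(p / (1.0 - p))
```

For example, `prob_positive(0.0, [0.5, -0.5], [1, 1])` gives 0.5 because the two contributions cancel and the latent variable is zero.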
In the logistic regression model, the influence that the contributory variables exert on the outcome can be revealed by the odds ratio:

$$ \mathrm{OR}_j = \exp(\beta_j) $$

with a 95% confidence interval, whose bounds are the lower and upper confidence limits (CL). An odds ratio of 1 denotes that the environmental element does not have an impact on the outcome. Therefore, if the upper limit of the odds ratio (Upper CL) is less than 1, the environmental element has an impact on the outcome, and it favours a scenario in which a pedestrian moves rapidly onto the road from a blind spot. Conversely, if the lower limit of the odds ratio (Lower CL) is greater than 1, the environmental element also has an impact on the outcome, and it favours a scenario in which a cyclist moves rapidly onto the road from a blind spot. In this study, the explanatory variables were chosen using the stepwise method, based on the Akaike Information Criterion (AIC) as the index of goodness of fit. The overall effectiveness of the model in describing the associations between pedestrian or cyclist behaviour and possible contributory variables (environmental elements) was assessed numerically using the Hosmer-Lemeshow statistic.
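The interpretation rule for the odds ratio can be sketched in Python as follows. The sketch assumes Wald-type confidence limits, exp(β ± 1.96·SE), which is a common choice but is not stated in the text; the function names are hypothetical and the numbers used in any call are illustrative, not values from Table 7. The `aic` helper uses the standard definition AIC = 2k − 2 ln L, where a lower value indicates a better trade-off between fit and model size.

```python
import math


def odds_ratio_with_limits(beta, std_err, z=1.96):
    """Odds ratio with lower and upper 95% confidence limits (Wald-type, assumed)."""
    return (math.exp(beta),
            math.exp(beta - z * std_err),
            math.exp(beta + z * std_err))


def favoured_road_user(lower_cl, upper_cl):
    """Apply the rule from the text: pedestrians are coded 0, cyclists 1."""
    if upper_cl < 1.0:
        return "pedestrian"
    if lower_cl > 1.0:
        return "cyclist"
    return "no clear effect"  # interval contains 1


def aic(log_likelihood, n_params):
    """Akaike Information Criterion for a fitted model."""
    return 2 * n_params - 2 * log_likelihood
```

For instance, a coefficient of −1.0 with a standard error of 0.2 gives an upper limit below 1, so the element would be read as favouring pedestrian crossings.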

Data description
The dataset used in this analysis was derived from a sample in which a pedestrian or cyclist initiated a road-crossing from a blind spot created by a building wall. To facilitate the interpretation of the results of the analysis, the number of data points was made equal for the two groups. Among the 267 scenarios in which pedestrians moved rapidly onto the road from blind spots created by building walls (Table 2), 250 scenarios were randomly selected. Among the 459 scenarios in which cyclists moved rapidly onto the road from blind spots created by building walls (Table 3), 250 scenarios were randomly selected. The explanatory variables could be continuous or discrete, as can be seen in Table 4.
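The balanced random selection can be sketched in Python with the standard library. The function name and the fixed seed are assumptions for illustration; the text does not state how the random selection was implemented. The group sizes 267, 459, and 250 are taken from the text.

```python
import random


def balanced_sample(pedestrian_ids, cyclist_ids, n=250, seed=0):
    """Draw n events without replacement from each group.

    The seed is a hypothetical choice that only makes the sketch reproducible.
    """
    rng = random.Random(seed)
    return rng.sample(pedestrian_ids, n), rng.sample(cyclist_ids, n)


# 267 pedestrian and 459 cyclist building-wall events, identified by index.
ped_sample, cyc_sample = balanced_sample(list(range(267)), list(range(459)))
```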

Performance measure
The logistic regression model provides a two-class classification based on a threshold. To judge the model's performance, four possible outcomes were calculated: true positives, true negatives, false positives, and false negatives. True positives were defined as the model correctly determining that a cyclist moved rapidly onto the road from a blind spot. True negatives were defined as the model correctly determining that a pedestrian moved rapidly onto the road from a blind spot. False negatives occurred when the model determined that a pedestrian moved rapidly onto the road from a blind spot, when in fact a cyclist moved rapidly onto the road from a blind spot. In contrast, false positives occurred when the model determined that a cyclist moved rapidly onto the road from a blind spot, when in fact a pedestrian moved rapidly onto the road from a blind spot. Based on these four possible outcomes, the accuracy, precision, recall, and specificity were then calculated. The area under the curve (AUC) was also calculated based on the receiver operating characteristic (ROC). Regarding the two-class classification, this study employed the threshold value that maximized accuracy. k-fold cross-validation was conducted to evaluate the generalization performance of the constructed predictive model. In this study, as can be seen in Table 5, k was set to 5. The following procedure was followed for each of the k folds. The model was trained using k−1 of the folds as the training data. The constructed model was then validated on the remaining fold, which served as test data for calculating performance measures such as accuracy. The performance measures calculated in each loop of the k-fold cross-validation were then averaged.
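The performance measures and the fold construction described above can be sketched in Python as follows. Model fitting is deliberately omitted; the sketch only shows how predicted probabilities would be scored and how the data would be split. All function names, the candidate-threshold scheme, and the shuffling seed are assumptions for illustration. As in the text, the positive class is the cyclist (label 1).

```python
import random


def confusion(y_true, y_prob, threshold):
    """Count true/false positives and negatives at a given threshold."""
    tp = tn = fp = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 0 and y == 0:
            tn += 1
        elif pred == 1 and y == 0:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and specificity from the four outcomes."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


def best_threshold(y_true, y_prob):
    """Pick the candidate threshold (an observed probability) with highest accuracy."""
    candidates = sorted(set(y_prob))
    return max(candidates,
               key=lambda t: metrics(*confusion(y_true, y_prob, t))["accuracy"])


def k_fold_indices(n, k=5, seed=0):
    """Yield (train, test) index lists; each fold is the test set exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]
```

With 500 events and k = 5, each iteration yields 400 training indices and 100 test indices.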

Results
The descriptive statistical values for the environmental elements are listed in Table 6. To identify the significant environmental elements that affected pedestrian or cyclist road-crossing behaviour, logistic regression analysis was employed using the R software package (R 4.0.1). The result of the logistic regression analysis is shown in Table 7. This represents one result from a loop of the 5-fold cross-validation. The explanatory variables (environmental elements) were chosen using the stepwise method. The significance of the explanatory variables (environmental elements) was assessed using the p-values, with the significance level (alpha) set at 0.05. The chosen variables were found to be statistically significant (p < 0.05), with the exception of URBAN. This means that the partial regression coefficients of these variables differed significantly from zero. The Hosmer-Lemeshow test was conducted, and the test revealed that the model was well-calibrated (p > 0.05).
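To make the calibration check concrete, the following Python sketch computes a Hosmer-Lemeshow statistic and its p-value. It assumes the common textbook form of the test (events sorted by predicted probability, split into 10 equal-sized groups, chi-square reference distribution with groups − 2 degrees of freedom); the grouping used by the authors' R analysis is not stated, so this is an illustration rather than a reproduction. The function names are hypothetical. The chi-square tail probability uses the closed form that holds for an even number of degrees of freedom, which covers the default of 8.

```python
import math


def hosmer_lemeshow(y_true, y_prob, groups=10):
    """Hosmer-Lemeshow statistic over equal-sized groups of sorted predictions."""
    pairs = sorted(zip(y_prob, y_true), key=lambda t: t[0])
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        observed = sum(y for _, y in chunk)   # observed positives in the group
        expected = sum(p for p, _ in chunk)   # expected positives in the group
        denom = expected * (1.0 - expected / size)
        if denom > 0:
            stat += (observed - expected) ** 2 / denom
    return stat


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; exact closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow_p(y_true, y_prob, groups=10):
    """p-value of the test; a value above 0.05 is read as adequate calibration."""
    return chi2_sf_even_df(hosmer_lemeshow(y_true, y_prob, groups), groups - 2)
```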
From the analysis, it was found that there were four significant environmental elements, namely ONE WAY, BOTH WAY, MID PED, and W/O LEADING VEHICLE (LV), for the pedestrian road-crossing behaviour. These environmental elements indicated a negative partial regression coefficient. Additionally, the upper limits of the odds ratio (Upper CL) were less than 1. This indicated an increased likelihood of a pedestrian initiating a road-crossing from a blind spot. For the MID PED condition, the results showed that pedestrians were more likely to initiate a road-crossing. As can be seen in Table 6, the number of events in which pedestrians initiated a road-crossing under the MID PED condition was large compared to cyclists (54 pedestrians vs. 23 cyclists). Typically, under these circumstances, cyclist movement behaviours are limited compared to those of pedestrians. For the WITH LV condition, the results also showed that pedestrians were more likely to initiate a road-crossing. The number of events in which pedestrians initiated a road-crossing under the WITH LV condition was large compared to cyclists (57 pedestrians vs. 27 cyclists). The results suggested that pedestrian behaviour could be characterized by quickness and flexibility.
In addition, it was found that there were nine significant environmental elements associated with cyclist road-crossing behaviour, namely SIDE1, SIDE2, SIDE3, 2- or 3-FORKED ROAD, 4- or 5-FORKED ROAD, 1 LANE, WITH CROSSWALK, SUNNY, and TIME16-20. These environmental elements indicated a positive partial regression coefficient, and the lower limits of the odds ratio (Lower CL) were more than 1. Under these conditions, the likelihood of a cyclist initiating a road-crossing from a blind spot was higher than that of a pedestrian doing so. For the SIDE1, SIDE2, and SIDE3 conditions, the results showed that cyclists were more likely to initiate a road-crossing. SIDE4 was defined as the condition in which the car driving corridor and the footpath were clearly separated by guardrails. In general, pedestrians have more freedom of movement than cyclists. As can be seen in Table 6, the number of events in which pedestrians initiated a road-crossing under the SIDE4 condition was large compared to cyclists (62 pedestrians vs. 30 cyclists). Thus, the results suggested that SIDE1, SIDE2, and SIDE3 elements could contribute to cyclist road-crossing behaviour. For the 2- or 3-FORKED ROAD and 4- or 5-FORKED ROAD conditions, the results also showed that cyclists were more likely to initiate a road-crossing. The number of ...
The predictive model used to classify between the two scenarios involving either a pedestrian or a cyclist initiating road-crossing behaviour was constructed using the logistic regression model and incorporated the influence of predictor factors (environmental elements). The four possible outcomes in judging the model's performance are listed in Table 8. From Table 8, it can be seen that the model demonstrated an accuracy of 67.0%. Table 9 shows the results of 5-fold cross-validation along with a list of the descriptive statistical values for accuracy, precision, recall, and specificity. The model validation demonstrated an accuracy of 69% on average, indicating that the binary logit model possessed moderately accurate predictive power (i.e. the AUC was more than 0.7 on average).
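The AUC referred to above can be computed directly from predicted probabilities without drawing the ROC curve, using its rank interpretation: the probability that a randomly chosen positive case (cyclist) receives a higher score than a randomly chosen negative case (pedestrian), with ties counted as one half. The following Python sketch illustrates this and the averaging over folds. The function names are hypothetical, and the scores used in any call are illustrative, not results from Table 9.

```python
def auc(y_true, y_score):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_over_folds(values):
    """Average a performance measure across the cross-validation folds."""
    return sum(values) / len(values)
```

A value of 0.5 corresponds to chance-level ranking, and 1.0 to perfect separation of the two classes.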

Discussion
This study evaluated the influence that driving context-altering environmental elements exert on pedestrian and cyclist road-crossing behaviour. This was done through the construction of a logistic regression model using near-miss incident data. The data included driving data that were recorded automatically 10 s before and 5 s after a trigger. Thus, the set of environmental elements that altered the driving context was based on the driver's point of view because the data were not recorded from the pedestrian's or cyclist's point of view. The reason for using this data, which was based on the driver's point of view, was that our research goal was the development of a hazard-anticipatory driving assistance system to improve driving safety that would not require a communication system. With this goal in mind, the data had to be information that could be registered by on-board vehicle sensors. Although the sets of environmental elements faced by drivers and other road users (pedestrians and cyclists) have similar characteristics, their sets of environmental elements cannot be matched. Therefore, the challenge of this study was to identify covert hazards (pedestrians and cyclists) by using information that could be registered by the on-board sensors of a vehicle. The original hypothesis was that environmental elements that alter the driving context could be used to determine whether a pedestrian or a cyclist would initiate a road-crossing from a blind spot.
The results of the logistic regression analysis revealed the significant environmental elements that contributed to the occurrence of pedestrian and cyclist road-crossing behaviour. Based on the calculated odds ratios, we were able to distinguish between the elements that were significant for pedestrian and cyclist road crossing behaviours. The pedestrian and cyclist characteristics, such as quickness and flexibility in behaviour, could be visualized as elements that were significant for those road crossing behaviours. Through the 5-fold cross-validation, the classification performance of the constructed predictive model was evaluated. The result showed an accuracy of 69% on average and revealed that the binary logit model possessed moderately accurate predictive power. Human behaviour is not simple enough to be explained using environmental elements alone. Human behaviour can also be affected by psychological factors. The main finding was that it was possible to construct a predictive model with 69% accuracy using only a set of environmental elements. The study demonstrated the feasibility of constructing a predictive model to identify covert hazards by using the set of environmental elements that altered the driving context based on the driver's point of view. The results essentially supported this hypothesis.
This study had several limitations. The question of whether the information that was used could indeed be obtained from a vehicle's on-board sensors remains to be addressed. The use of obtainable information is expected from a systems development perspective. The effect that missing information has on prediction accuracy has not yet been examined. Due to the use of annotations, the information used in this study was limited. Thus, the use of image information also has the potential to enable the extraction of more information or features compared to the set of environmental elements that we considered in this study. It is also necessary to consider different methods, such as deep learning techniques, to improve the accuracy of the predictive model.

Conclusions
Risk management is the process of identifying hazards, assessing risks, and controlling risks to attain safety. The purpose of this study was to evaluate the influence that driving context-altering environmental elements exert on the road-crossing behaviour of pedestrians and cyclists. Thus, this study attempted to identify covert hazards (concealed pedestrians and cyclists). To investigate this, a near-miss incident database was used. Through a logistic regression analysis incorporating the annotations recorded in the database, we constructed a predictive model to identify covert hazards. The study demonstrated the feasibility of constructing a predictive model to identify covert hazards by using a set of environmental elements that alter the driving context based on a driver's point of view. Identifying the contextual information on road environment elements that contribute to the occurrence of pedestrian and cyclist road-crossing behaviour would be useful for acquiring hazard-anticipatory knowledge. Further investigation is required that could involve the addition of annotations along with the incorporation of additional training data to improve the prediction accuracy. The proposed future work is as follows:
• In order to improve safety, it would be useful to know from which blind spot elements other road users initiate a road-crossing. A future research challenge is the construction of a model to identify the priority of blind spot elements.
• It would be useful to know the velocity of other road users at a road-crossing. A future research challenge is the construction of a model to identify the velocity of covert hazards.
• A vital part of the development of an ADAS incorporating hazard-anticipatory knowledge is the construction of the framework for context-sensitive hazard anticipation to predict the potential risk of situations based on the driving context and the driver's behavioural state.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Notes on contributors
Yuichi Saito received the BS and MS degrees in mechanical engineering from the Shibaura Institute of Technology in 2008, and 2010, and the Doctor's degree in engineering from University of Tsukuba in 2015. Since 2019, he has been with the University of Tsukuba, where he is an Assistant Professor with the Faculty of Engineering, Information and Systems. His research interests include smart collaborations between human and machine, shared control, and vehicle dynamics.
Fuma Kochi received the BS degree in information engineering from University of Tsukuba in 2020. Since 2020, he has been a student in the Master's Program in Risk and Resilience Engineering, University of Tsukuba. His research interests include information science, risk, and resilience engineering.
Makoto Itoh received the BS, MS, and Doctor's degrees in engineering from University of Tsukuba in 1993, 1995, and 1999, respectively. Since 2013, he has been a Professor with the Faculty of Engineering, Information and Systems, University of Tsukuba. His research interests include enhancement of operator's situation awareness, adaptive automation, and building of appropriate trust as well as prevention of over-trust and distrust.