Video frame feeding approach for validating the performance of an object detection model in real-world conditions

The challenge of evaluating deep learning-based object detection models in complex traffic scenarios, characterized by changing weather and lighting conditions, is addressed in this study. Because real-world testing is time- and cost-intensive, a Video Frame Feeding (VFF) approach is proposed as a solution. The VFF approach acts as a bridge between object detection models and simulated environments, enabling the generation of realistic scenarios. Leveraging the CarMaker (CM) tool to simulate realistic scenarios, the framework uses a virtual camera to capture the simulated environment and feed video frames to an object identification model. The VFF algorithm, with automated validation against simulated ground truth data, achieves detection accuracy above 95% at 30 frames per second within 130 meters. Employing You Only Look Once (YOLO) version 4 and the German Traffic Sign Recognition Benchmark (GTSRB) dataset, the study assesses a traffic signboard identification model across various climatic conditions. Notably, the VFF algorithm improves accuracy by 2% to 5% in challenging scenarios such as foggy days and nights. This approach not only identifies object detection issues efficiently but also offers a versatile solution applicable to any object detection model, promising improved dataset quality and robustness for enhanced model performance.


Introduction
Ensuring road safety is a critical component of modern vehicular systems, particularly with the increasing integration of computer vision systems in vehicles. These systems often include a camera mounted behind the Inside Rear View Mirror (IRVM) that plays a crucial role in recognizing and interpreting traffic signs, thereby aiding safe and informed driving. The efficiency of such systems hinges on the robustness of the underlying object detection algorithms, which must accurately identify, track, and label objects in real time from video feeds [1]. Despite advancements in this domain, existing systems predominantly rely on statistical evaluation methods, utilizing Precision-Recall (PR) curves to assess the performance of object identification models [2]. While these methods provide a baseline for evaluation, they exhibit limitations, particularly in real-world scenarios characterized by dynamic and adverse environmental conditions [3]. Weather phenomena such as rain, fog, and nighttime conditions can significantly impede a model's ability to accurately detect and label traffic signs, leading to reduced precision and reliability. Furthermore, pre-processing of video feeds, aimed at enhancing visibility, has proven insufficient in addressing these challenges, often resulting in a low detection rate and limited robustness in rapidly changing environments. This highlights a critical gap in current evaluation and validation methodologies, necessitating a more comprehensive and versatile approach to testing object detection models across a spectrum of environmental conditions. This paper introduces a novel traffic sign detection validation method, leveraging the capabilities of CM, an advanced environmental simulation tool used for vehicle dynamics and Advanced Driver Assistance Systems (ADAS) [4][5][6]. Through this tool, we generate photorealistic simulations that encompass a variety of road networks, static and moving objects, and an array of traffic entities. Utilizing IPG Movie for visualization and a virtual camera configured within CM, we extract video feeds of the simulated environment under various manually adjusted conditions. Central to our approach is the VFF algorithm, a unique automation algorithm designed to identify regions where the object detection model falters. This algorithm facilitates the use of video frames reconstructed from traffic sign data streams, evaluated by a model previously trained on the GTSRB dataset [7,8] with the YOLOv4 CSPDarknet53 framework [9,10]. Accompanied by a pre-processing phase incorporating Non-Local Means (NL-Means) estimation [11] and image enhancement techniques for contrast and brightness adjustment, our method ensures a comprehensive and nuanced validation of object detection models. Our study specifically employs YOLOv4 for object detection, focusing on traffic sign detection as a practical application. We assert that by retraining the model with conditions that initially led to failure, we can significantly augment the model's accuracy and reliability, ultimately contributing to safer driving experiences and more robust computer vision systems in vehicles.
Key features of the VFF algorithm:
• The VFF algorithm efficiently processes video frames from a simulation environment. These frames carry embedded traffic sign data and are evaluated by a model pre-trained on the GTSRB dataset using the YOLOv4 CSPDarknet53 framework.
• It incorporates a pre-processing step using NL-Means estimation to smooth irregular object boundaries and edges, using pixel similarity for refinement.
• The visual quality of the video frames is improved by adjusting their contrast and brightness.
• The algorithm is cost-effective, allowing the simulation of various real-time environmental conditions. Conditions that are challenging to capture in reality can be easily simulated, offering broader perspectives of target objects.
• The study confirmed that around 95% accuracy in traffic signboard detection is achieved. Specifically, an improvement of 2-5% over the pre-VFF baseline was reached for foggy day, cloudy, foggy night, and nighttime weather conditions.
The rest of this paper is organized as follows: Section 2 discusses related works, including recent studies on this topic. Section 3 discusses the methodology, which comprises the three phases of the model testing approach: (i) simulation of the video stream using the CM tool, (ii) conversion of video frames (generation, pre-processing, and contrast and brightness enhancement), and (iii) validation of the object detection model using the proposed VFF algorithm. Section 4 presents the results and analysis to demonstrate the efficiency of the proposed method. Section 5 briefly summarizes the entire work and suggests possible future directions.

Related works
This section provides an overview of recent articles on the performance evaluation of deep learning-based object detection models in various weather conditions, with a specific focus on traffic signboard detection. It elaborates on the different weather conditions used for validation, as well as the datasets, comprising images and video clips, adapted for each paper under discussion.

Deep learning-based object detection models
The exploration of deep learning-based object detection models has garnered significant attention, addressing various challenges posed by environmental conditions and specific use cases. Sharma et al. [12] employed YOLOv5 to recognize entities such as cars, traffic lights, and pedestrians, showcasing its robustness in both rainy and normal weather conditions. Al-Haija et al. [13] introduced a powerful detection system leveraging transfer learning and an Nvidia GPU, with a comparative study across three deep learning models demonstrating superior performance in diverse weather conditions. Humayum et al. [14] utilized the CSPDarkNet53 architecture, aimed at vehicle recognition under low illumination and blurred visibility, achieving enhanced performance across various challenging conditions. The pedestrian detection algorithm implemented by Liu et al. [15] demonstrated effective pedestrian identification during rainy conditions, addressing occlusion problems and eliminating rain streaks. Rothmeier and Huber [16] evaluated state-of-the-art object detectors in both normal and foggy conditions, creating dynamic test scenarios to analyze performance degradation under fog. Hnewa and Radha [17] analyzed the efficiency and limitations of de-raining techniques for object detection, providing insights into their impact on performance. Hasirlioglu and Riener [18] compared the performance of object detection algorithms under clear and adverse weather conditions, systematically investigating the effects of rainfall on various sensor data.

Deep learning model for traffic sign detection
Deep learning models have also been tailored specifically for traffic sign detection, aiming to enhance accuracy and adaptability. Azfar et al. [19] focused on training deep learning models for vehicle detection in complex urban traffic conditions, addressing challenges such as dust and wind sway. Jeon et al. [20] developed a model containing three primary detection modules to detect taillights rapidly and precisely in traffic. Chen et al. [21] integrated visibility complementation modules with YOLOv3, striving to enhance vehicle detection systems in low-visibility conditions. Zhu et al. [22] employed RetinaNet for traffic sign recognition and detection, improving detection performance through virtual simulation. Ren et al. [23] used the Recurrent Rolling Convolution (RRC) model to analyze real-time environmental influences on object detection, providing a systematic evaluation of these impacts. Kuo and Lin [24] investigated CNNs for real-time road sign detection, validating their approach with real-world video data. Lei et al. [25] focused on semantic segmentation, training and testing pixel-wise labelling models even in snowy conditions. Li and Wang [26] explored Deep Neural Networks (DNNs) adapted to traffic sign detection through transfer learning, evaluating their performance across various metrics and conditions. Serna et al.
[27] utilized Mask R-CNN for the detection and classification of a wide range of traffic signs, demonstrating successful application across multiple sign categories. These studies collectively highlight the advancements and adaptations made in deep learning models to cater specifically to the challenges posed by traffic sign detection. Table 1 presents a summary of related works, encompassing research focus, methodology, datasets, evaluation metrics, and inferences. From this table, it becomes evident that current validation methods predominantly emphasize statistical evaluations. Such evaluations entail inputting images or recorded videos into an object identification model, with the PR curve employed to assess the model's output accuracy. In real-world situations, numerous edge cases exist where the model may fail to recognize objects due to various environmental conditions. Practically, it is challenging to test a model across all possible environments simultaneously. To acquire specific environmental conditions, one would need to travel to locations with the requisite climate and capture relevant scenarios. This approach is not only time-consuming and labour-intensive but also incurs substantial expenses.
We have identified a gap in the evaluation of the real-time performance of object detection models across diverse environmental conditions and scenarios. To bridge this gap, we propose a novel model validation approach that enables the virtual creation of a wide range of real-world scenarios within a simulation environment, facilitating the comprehensive evaluation of object detection models' real-time performance and the identification of potential edge cases.

Proposed method
This section presents our three-fold methodology for object detection model validation: simulation of driving scenarios using the CM tool, conversion of these simulations into enhanced video frames, and model validation using the proposed VFF algorithm. Together, these phases ensure a robust evaluation of the model's performance under varied conditions.

Overview
Figure 1 depicts the performance evaluation of the YOLOv4 CSPDarknet53 model for detecting traffic signs, utilizing the proposed VFF algorithm. The process unfolds as follows: the CM simulation tool creates a photorealistic simulation of the driving environment, encompassing road networks, traffic signs, static objects (such as buildings and trees), and various traffic participants. Key parameters such as the vehicle model, driving maneuver, and road profile are set in CM to facilitate this simulation, aligning the scenarios with the GTSRB dataset. In these scenarios, a virtual camera, linked to the VFF algorithm, is mounted behind the IRVM below the vehicle's windshield. As the simulation progresses, this virtual camera captures the frontal road view, inclusive of traffic signs, road markings, and other road users. The captured data is then transformed into a data stream, embedded with the simulation time and a list of traffic signs from CM, and sent to a predefined port.
The VFF algorithm connects to this port, retrieving the video streams and separating the embedded data from the video content. The video data is utilized to reconstruct frames at a specific image size, which are subsequently fed into a pre-processing model that serves to eliminate noise and enhance image quality. Following pre-processing, the frames are input into the YOLOv4 Darknet framework, loaded with the newly trained GTSDM. The GTSDM then performs traffic sign identification, yielding a list of identified objects corresponding to the simulation time. Concurrently, the ground truth data for traffic signboard identification is extracted from the embedded data provided by CM.
The final step involves comparing the object lists generated by the GTSDM with the CM ground truth data, specific to the simulation time. This comparison facilitates a thorough evaluation, highlighting instances where the GTSDM may fail to accurately identify traffic signs across varying environmental conditions. Through this process, the proposed methodology not only identifies potential shortcomings in the model but also opens avenues for model enhancement, particularly in challenging scenarios and conditions.
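The ground-truth comparison described above can be sketched as a per-timestamp set difference. This is an illustrative sketch only: the function name and the assumption that detections and ground truth are keyed by simulation time are ours, not the paper's implementation.

```python
def compare_detections(detected, ground_truth):
    """Compare model detections against CarMaker ground truth.

    Both arguments map simulation time (seconds) to a set of traffic-sign
    class labels. Returns a dict of times at which at least one
    ground-truth sign was missed, with the missing labels.
    """
    failures = {}
    for t, expected in ground_truth.items():
        found = detected.get(t, set())
        missed = expected - found
        if missed:
            failures[t] = missed
    return failures

# Example: at t=1.0 the model missed the TS43 sign.
gt = {1.0: {"TS10", "TS43"}, 2.0: {"TS33"}}
det = {1.0: {"TS10"}, 2.0: {"TS33"}}
print(compare_detections(det, gt))  # {1.0: {'TS43'}}
```

Timestamps where the returned dict is non-empty mark exactly the conditions under which the model should be retrained.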

Simulation of video stream using CM tool
A virtual camera is mounted behind the IRVM, placed below the vehicle's windshield, and continuously captures the frontal road simulation environment. This includes oncoming traffic signs, road markings, buildings, and other road users, which are displayed in IPG Movie. To facilitate this process, all intrinsic and extrinsic parameters of the camera must be defined within CM, and the camera configuration parameters must be properly initialized, as detailed in Table 2. The environmental data captured by the virtual camera is then transformed into a data stream. This stream, along with embedded data containing the simulation time and a list of traffic signs from CM, is fed to a pre-defined port.
The provided pseudocode outlines a simplified process for configuring a virtual camera to capture a simulation environment in CM, and it can be divided into three main parts: initialization, configuration file creation, and simulation environment capture.
Initialization: Consider input parameters including the number of camera views (C_view), frame size dimensions (W for width and H for height), camera field of view (FoV), export format (V_export), frame rate (f_r), camera mounting position (C_pos_X, C_pos_Y, and C_pos_Z), camera rotation angles (φ_x_rot, φ_y_rot, and φ_z_rot), and the port number for CM-VFF algorithm communication (P_socket). These parameters are then stored in a dictionary named "params" for easy access and organization.

Configuration File Creation:
The pseudocode proceeds to create and configure the virtual camera by writing the parameters to a configuration file, VDS.cfg.The file is opened in write mode, and for each parameter in the params dictionary, a line is written to the file in the format "parameter: value".After writing all the parameters, the configuration file is closed.
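A minimal Python sketch of this step is given below; the parameter keys and values are placeholders we introduce for illustration, since the paper does not list CarMaker's exact configuration key names.

```python
# Illustrative camera parameters (key names and values are assumptions,
# not CarMaker's actual configuration keys).
params = {
    "Camera.Views": 1,
    "Camera.Width": 1280,
    "Camera.Height": 720,
    "Camera.FoV": 60,
    "Camera.FrameRate": 30,
    "Camera.Pos": "2.0 0.0 1.3",  # C_pos_X, C_pos_Y, C_pos_Z
    "Socket.Port": 2210,          # P_socket
}

# Write each parameter to VDS.cfg as "parameter: value", then close the file.
with open("VDS.cfg", "w") as cfg:
    for key, value in params.items():
        cfg.write(f"{key}: {value}\n")
```

The `with` block closes the configuration file automatically once all parameters are written, mirroring the final step of the pseudocode.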

Simulation Environment Capture:
A connection to CM is established using the specified port number. If the connection is not successful, an error message is logged and the programme exits. If the connection is successful, the pseudocode enters a loop to continuously capture frames from the CM simulation environment. This loop runs as long as the simulation is running, and each captured frame is then processed.

Proposed VFF algorithm
The VFF algorithm establishes a connection with CM via the same port. It initiates an empty binary string and assigns it to a variable named "data", then calculates the payload size. While the data's length is smaller than the payload size, the VFF algorithm continues to receive video streams from that port. It then segregates the embedded simulation data from the video data streams based on the payload size. The embedded simulation data includes the number of camera output channels, the actual simulation time, the length of the data, and a list of CM object identifications in bytes.
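This receive-and-split step might look like the following sketch, which assumes a simple fixed-size header carrying the simulation time, channel count, and payload length; the actual CM stream layout is more elaborate, so treat the header format as an assumption.

```python
import socket
import struct

# Assumed header layout: simulation time (double), channel count and
# video-data length (unsigned ints), big-endian; 16 bytes in total.
HEADER_FMT = ">dII"
PAYLOAD_SIZE = struct.calcsize(HEADER_FMT)

def receive_exact(sock, n):
    """Accumulate exactly n bytes from the socket, handling partial reads."""
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("video stream closed")
        data += chunk
    return data

def read_packet(sock):
    """Split one packet into embedded simulation data and raw video bytes."""
    sim_time, channels, data_len = struct.unpack(
        HEADER_FMT, receive_exact(sock, PAYLOAD_SIZE))
    video_bytes = receive_exact(sock, data_len)
    return sim_time, channels, video_bytes
```

The accumulation loop in `receive_exact` is the "while data length is smaller than payload size" condition from the text.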

Generation of video frames
The VFF algorithm reads the video streams of the simulation environment from the specified port and uses the received stream to regenerate the video frames with the NumPy package. The video data streams provide the frame information in bytes. While the length of the accumulated data is less than the image length (frame size; for RGB: height × width × 3), the algorithm continues receiving frame data up to the image length. When the length of the frame data matches the image length, the frame data contains the information of one complete frame in byte format. The NumPy package is used to transform the frame data from byte strings into an array and thereby recreate the video frame. In instances where the received frame is not in RGB format, the OpenCV package is employed to convert the frame data to RGB. The newly created video frame encompasses the simulation environment data from CM, including traffic objects and signboards.
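The frame regeneration can be sketched with NumPy alone; the channel flip shown is equivalent to the OpenCV BGR-to-RGB conversion the text mentions, and the helper name is ours.

```python
import numpy as np

def rebuild_frame(frame_bytes, width, height, bgr=False):
    """Reconstruct an H x W x 3 video frame from a raw byte string.

    The stream carries height * width * 3 bytes per RGB frame. If the
    source emits BGR ordering instead, the channels are flipped
    (equivalent to cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)).
    """
    expected = height * width * 3
    if len(frame_bytes) != expected:
        raise ValueError(f"need {expected} bytes, got {len(frame_bytes)}")
    frame = np.frombuffer(frame_bytes, dtype=np.uint8).reshape(height, width, 3)
    if bgr:
        frame = frame[..., ::-1]
    return frame

# A tiny 2x4 test frame (24 bytes).
img = rebuild_frame(bytes(range(24)), width=4, height=2)
print(img.shape)  # (2, 4, 3)
```

The length check reflects the condition in the text: a frame is only reconstructed once exactly height × width × 3 bytes have been received.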

Pre-processing of video frames
The video frame is then pre-processed to eliminate noise and enhance its quality. Noise is removed from the generated frame using the NL-Means denoising approach, whose underlying principle is to replace a pixel's colour with an average of the colours of similar pixels. The method therefore scans a large image area to find all pixels closely resembling the pixel targeted for denoising; similarity is assessed by examining the entire window around each pixel, rather than the colour alone. To remove noise from a colour image, pixel-wise calculations are applied, with each pixel represented as x = (x_1, x_2, x_3). The filtered image value at position p is given in Eq. (1):

x(p) = (1 / N(p)) · Σ_{q ∈ A} w(p, q) · y(q)    (1)

where x(p) is the filtered image value at position p, y(q) is the unfiltered image value at point q, w(p, q) is the weighting function, and N(p) is a normalizing factor given in Eq. (2):

N(p) = Σ_{q ∈ A} w(p, q)    (2)

The weighting function, based on a normal distribution with mean μ = M(p), is given in Eq. (3):

w(p, q) = exp(−|M(q) − M(p)|² / h²)    (3)

where h is the filtering parameter and M(p) is the local mean value in the region around p in the image, calculated as shown in Eq. (4):

M(p) = (1 / |R(p)|) · Σ_{q ∈ R(p)} y(q)    (4)

where |R(p)| represents the number of pixels in region R(p), and R(p) ⊆ A is a square region of pixels surrounding p with dimensions (2r + 1) × (2r + 1). Due to computational constraints, the search zone is confined to a fixed square neighbourhood. For moderate and small values of σ, the window size is set to 21 × 21; for larger values of σ, the search window increases to 35 × 35, allowing more similar pixels to be identified and noise to be further minimized. In NL-Means, each pixel value is restored by averaging the most similar pixels, with similarity determined from the colour image. Consequently, each channel value of every pixel results from averaging the values of similar pixels.
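For illustration, a deliberately simplified single-channel implementation of this pixel-wise scheme, following Eqs. (1)-(4), is shown below. The parameter values are assumptions, and a production pipeline would typically call OpenCV's `fastNlMeansDenoisingColored` instead.

```python
import numpy as np

def nl_means_gray(y, h=10.0, r=1, search=5):
    """Simplified single-channel NL-Means.

    Each pixel is replaced by a weighted average of pixels in a search
    window whose local means M(q) resemble its own M(p). h is the
    filtering parameter, r the local-mean radius, and search the
    search-window radius (all illustrative values).
    """
    y = y.astype(np.float64)
    rows, cols = y.shape
    # Eq. (4): local mean M(p) over a (2r+1) x (2r+1) region.
    pad = np.pad(y, r, mode="reflect")
    M = np.zeros_like(y)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            M += pad[r + dy:r + dy + rows, r + dx:r + dx + cols]
    M /= (2 * r + 1) ** 2
    out = np.zeros_like(y)
    for i in range(rows):
        for j in range(cols):
            i0, i1 = max(0, i - search), min(rows, i + search + 1)
            j0, j1 = max(0, j - search), min(cols, j + search + 1)
            # Eq. (3): weights from similarity of local means.
            w = np.exp(-((M[i0:i1, j0:j1] - M[i, j]) ** 2) / h ** 2)
            # Eqs. (1)-(2): normalized weighted average.
            out[i, j] = (w * y[i0:i1, j0:j1]).sum() / w.sum()
    return out
```

For a colour frame, the same restoration is applied per channel, with similarity judged on the full colour image as described above.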

Enhancement of contrast and brightness
After the initial pre-processing step, the frames received from CM undergo noise reduction using the NL-Means de-noising model, resulting in frames free from noise artifacts. The effectiveness of this noise reduction is quantified using the Peak Signal-to-Noise Ratio (PSNR), which in this case yields a value of 28.9, indicating a substantial enhancement in image quality. Subsequently, the de-noised frames are subjected to image enhancement techniques to further refine their visual quality, including adjustments to brightness, contrast, and sharpness, ensuring that the frames are optimally prepared for the subsequent object detection tasks.

Upon completion of the image enhancement process, the frames are fed into the YOLOv4 CSPDarkNet53 model. This model has been specifically trained for traffic signboard detection and is adept at performing object detection under varied environmental conditions. The darknet helper function is utilized to load the pre-trained model, which then processes the input frames to identify and classify objects within them. The output from the model comprises a list of detected object classes, along with their associated confidence scores and bounding box coordinates, providing a comprehensive overview of the detected objects within each frame.

Concurrently, the proposed VFF algorithm receives a list of objects detected by CM, along with the corresponding simulation times. This information is compared with the output of the YOLOv4 CSPDarkNet53 model to assess the accuracy and reliability of the object detection process. By conducting a comparative analysis between the detected object classes from the YOLOv4 CSPDarkNet53 model and the CM object identification list, extracted from the embedded simulation data, we are able to evaluate the model's performance across different simulation times. This comparative analysis not only provides valuable insights into the model's proficiency in traffic signboard detection but also highlights potential areas for improvement, ensuring that the model can be further refined and optimized for future applications.
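The enhancement and its quality check can be sketched as follows; the linear transform is the same operation performed by OpenCV's `convertScaleAbs`, and the alpha/beta values are illustrative rather than the paper's tuned settings.

```python
import numpy as np

def adjust_contrast_brightness(frame, alpha=1.2, beta=15):
    """Linear enhancement out = alpha * frame + beta, clipped to [0, 255].

    alpha > 1 raises contrast, beta > 0 raises brightness (the values
    here are illustrative assumptions).
    """
    out = alpha * frame.astype(np.float64) + beta
    return np.clip(out, 0, 255).astype(np.uint8)

def psnr(reference, test):
    """Peak Signal-to-Noise Ratio in dB for 8-bit images."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10 * np.log10(255.0 ** 2 / mse)
```

A PSNR around 28.9 dB, as reported above, indicates a mean squared error that is small relative to the 8-bit dynamic range.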
Pseudocode of the proposed VFF algorithm for validating traffic signboard detection is given below; it is designed to process video streams and produce enhanced video frames. Initially, a connection is established with a predefined socket using specified port and host IP parameters. Once connected, the algorithm continuously receives video data packets and processes them in chunks defined by a payload size. For each chunk, the video data is extracted, and once its length matches the expected image size, the video frame is regenerated and passed through the pre-processing and enhancement steps.

Results and analysis

Initially, the model's detection accuracy was determined without considering various weather conditions. To enhance this accuracy without increasing hardware costs or the execution time for training and testing, we utilized the CM simulation tool. This tool was used to create a highway simulation environment in which multiple weather conditions were introduced simultaneously. This setup allows for an evaluation of YOLOv4 CSPDarkNet53's performance under these conditions. Our analysis focuses on two main aspects: (i) identifying traffic signboards located within 150 meters of the driving vehicle, and (ii) detecting multiple traffic signs situated in the same region but separated by varying distances. All evaluations were conducted under six different weather conditions generated using the CM tool. A comparative analysis was then performed between the ground truth data from CM and the output of the YOLOv4 CSPDarkNet53 model, taking into account the influence of the proposed VFF algorithm for traffic signboard detection.

GTSRB Dataset
The GTSRB dataset is selected for the proposed research application. It comprises 900 images from 43 distinct traffic signboard classes, divided into four categories: prohibitory, danger, mandatory, and priority. The classification of the GTSRB dataset for this study is illustrated in Fig. 2. The YOLOv4 Darknet53 variant serves as the object detection model for this research. The dataset has been prepared in the YOLOv4 format, which means that the 900 GTSRB images needed to be annotated accordingly. Prior to annotation, the GTSRB dataset underwent pre-processing techniques such as scaling, contrast enhancement, sharpening, and Principal Component Analysis (PCA) [29]. The PCA technique reduces the dataset's dimensionality while ensuring minimal information loss, thereby enhancing its interpretability. An automated annotation algorithm facilitated the conversion of raw data to the YOLO format. Of the original 900 images, 80% were designated for training and the remaining 20% for testing. The resultant dataset consists of images and their corresponding annotation files, which detail each object's class, its coordinates, and its dimensions in terms of height and width. The training and testing dataset folders are named OBJ and Test, respectively.
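The YOLO annotation format and the 80/20 split described above can be illustrated with a short sketch; the fixed seed is an addition for reproducibility, not something the paper specifies.

```python
import random

def yolo_annotation(class_id, x_center, y_center, w, h):
    """One line of a YOLO label file: class index followed by the box
    centre and size, all normalized to [0, 1] by image width/height."""
    return f"{class_id} {x_center:.6f} {y_center:.6f} {w:.6f} {h:.6f}"

def split_dataset(image_names, train_frac=0.8, seed=42):
    """Shuffle image names and split them 80/20 into training and test
    sets, mirroring the OBJ and Test folders. The seed is an assumption
    added here for reproducibility."""
    names = sorted(image_names)
    random.Random(seed).shuffle(names)
    cut = int(len(names) * train_frac)
    return names[:cut], names[cut:]

train, test = split_dataset([f"img{i:03d}.png" for i in range(900)])
print(len(train), len(test))  # 720 180
```

With 900 images this yields 720 training and 180 test images, matching the split stated above.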

Experimental setup
For our experimental analysis, we randomly selected four traffic signboards from a total of 43. Based on the chosen environmental parameters, we developed a CM scenario. We then ran the simulation at a constant vehicle speed of 72 kmph and evaluated the detection capabilities of the trained model using the proposed VFF algorithm under six different weather conditions. Observations indicate that the model was able to classify the traffic signboards into the selected classes with an accuracy exceeding 95%.

Analysis of detection accuracy (%) with random GTSRB detection under six conditions
This analysis primarily centres on the detection capabilities of the YOLOv4 DarkNet53 model after being influenced by the proposed VFF algorithm. This approach involves furnishing the model with clearer frames, which becomes notably crucial under certain weather conditions. For instance, in foggy day, foggy night, and standard night scenarios, the model required closer proximities to detect with precision. The detailed analysis is summarized per class as follows:
Prohibitory: Achieved near-perfect detection during the day at 138 meters, but on foggy days the model required a much closer range of just 12 meters. Performance remained high during cloudy conditions and dusk, at 124 and 130 meters respectively. Foggy nights and standard nights required distances of 35 and 77 meters for optimal detection.
Danger: Exhibited optimal performance during the day at 148 meters and on foggy days at a reduced 52 meters. Cloudy conditions and dusk demanded 145 and 150 meters respectively. On foggy nights, accuracy dipped slightly at 38 meters, but rebounded on standard nights at 60 meters.
Mandatory: Consistently detected during the day and at dusk at 150 meters. On foggy and cloudy days, the model required 55 meters and distances greater than 150 meters respectively. Foggy nights and standard nights needed 30 and 45 meters for accurate detection.
Priority: Demonstrated consistent performance across all conditions. It detected optimally at 130 meters during the day and at 90 meters on foggy days. Cloudy conditions, dusk, foggy nights, and standard nights required distances of 125, 95, 60, and 30 meters respectively.
In essence, while the YOLOv4 DarkNet53 model showed high accuracy across all classes, its detection range varied significantly with weather conditions, with foggy scenarios necessitating closer proximities for accurate detection, as clearly depicted in Table 3.

Analysis of accuracy (%) for traffic signs in close proximity but at different distances
In our study, we selected a traffic junction point at random that has multiple traffic signboards separated by specific distances. For instance, there is a 15-meter gap between the prohibitory sign (TS10-No Overtaking) and the priority sign (TS43-Stop). Likewise, there is a 25-meter distance between the mandatory sign (TS33-Go Right) and the priority sign (TS43-Stop). It is important to note that the danger and priority classes are positioned at the same location but separated by a minimal distance of 2 meters. Fig. 3 illustrates the model verification in different environmental conditions. We observed variations in accuracy percentages, particularly for the prohibitory and priority classes during foggy days, foggy nights, and nights. This variation can be attributed to cross-validation conflicts within the classes. For instance, within the prohibitory class there is a conflict between TS10 (no overtaking) and TS11 (no overtaking by trucks). Similarly, within the priority class, we observed variations due to conflicts between TS28 (pedestrian crossing) and TS29 (school crossing), as clearly depicted in Table 4. Such cross-validation conflicts extend the execution time, reducing the detection accuracy. Consequently, more accurate detection is achieved when the vehicle is closer to the target point. For the prohibitory class, the detection accuracy is 99.85% during the day, dropping slightly to 97.09% on foggy days, 97.71% on cloudy days, a high of 99.87% during dusk, 97.47% on foggy nights, and 99.29% during regular nights. For the danger class, the accuracy remains consistently high across all conditions: 99.87% during the day, 99.95% on foggy days, 99.93% on cloudy days and at dusk, 99.92% on foggy nights, and 99.88% during the night. The mandatory class has the following accuracies: 99.96% during the day, 99.94% on foggy days, 99.9% on cloudy days, 99.93% during dusk, 99.57% on foggy nights, and 99.86% during the night. Lastly, for the priority class, the accuracy is 99.93% during the day, 99.98% on foggy days, 99.98% on cloudy days, and 99.97% during dusk; it drops to 91% on foggy nights and recovers to 97.47% during regular nights.
Table 4 reveals that under weather conditions such as foggy days, cloudy weather, foggy nights, and nighttime, the detection accuracy for prohibitory and priority signs is notably lower than for the other categories. This observation stems from the simulation video frames in CM used to test traffic signboard detection under various weather scenarios. The diminished detection accuracy in these conditions is attributed to reduced visibility at greater distances; the system only recognizes the signs when the vehicle is relatively close to them. Consequently, the detection performance falls short of its standard accuracy levels. The proposed VFF algorithm offers a solution by ensuring high detection accuracy from longer distances relative to the target point, thereby correctly identifying all four types of traffic signs regardless of the weather conditions. An improvement ranging from 2% to 5% in detection accuracy was noted with the VFF algorithm during foggy days (up to 99.24%) and cloudy conditions (up to 99.86%) for the prohibitory class. For the priority class, the accuracy increased to 96.15% on foggy nights and 99.62% at night after applying the VFF algorithm.

Evaluation metrics
Figure 4(a) presents the output chart of average loss and mean Average Precision (mAP) against the number of iterations during model training. In the graph's early stages, the average loss begins at a notably high value, a common trait for models in their initial training phase. However, as the number of iterations increases, the loss shows a marked decline. This trend is emblematic of the model progressively refining its predictions and improving its alignment with the true labels of the training data. Generally, a lower average loss indicates increased precision in the model's predictions. While the average loss gives insight into the model's evolving predictive capability, the mAP provides a measure of its accuracy over time. The mAP values, displayed atop the graph, fluctuate initially but stabilize around a commendable 97% in the latter stages. This highlights the model's proficiency in accurately predicting the correct labels as training progresses. In practice, average loss can range from as low as 0.05 (for a small model on a less complex dataset) to as high as 3.0 (for a larger model trained on intricate datasets). For the traffic signboard detection task this graph represents, the model achieves an average loss of 0.098. This value, coupled with an mAP of 96.6%, signifies a highly precise model. Overall, the graph illustrates a successful training progression: the loss decreases significantly, indicating better predictions as the model learns, and the mAP stabilizes at a high percentage, showing that the model achieves high precision in its predictions.
The evaluation metrics in Fig. 4(b) illustrate precision, recall, F1 score, True Positives (TP), False Positives (FP), False Negatives (FN), average Intersection over Union (IoU), and mAP across the 8000 iterations. As the iterations progress, FP and FN identifications decrease while TP increases, leading to an enhancement in both precision and recall with each iteration: the model becomes more adept at correctly identifying objects. The F1 score also shows an increasing trend with each iteration. An average IoU greater than 50% is used to calculate the mAP of the model. As observed, the average IoU increases with each iteration, and the mAP rises concurrently.
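As a concrete reference for how these quantities relate, a minimal sketch of IoU and the count-based metrics follows (box coordinates and counts are illustrative, not taken from the study):

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 score from raw detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Under the 50% threshold mentioned above, a detection whose IoU with a ground-truth box exceeds 0.5 is counted as a TP; as FP and FN counts shrink over training, both ratios rise.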
Tables 5 and 6 provide a detailed evaluation of the four traffic signboard classes across different environmental conditions, using several key metrics: mAP, recall, precision, F1 score, TP, FP, and average IoU. The data specifically highlight the performance of the newly created pre-processed video, whose contrast and brightness were enhanced by CM, when fed into the YOLOv4 framework with the proposed VFF algorithm. The tables aid in comparing the efficacy of the detection model under various conditions: day, foggy day, cloudy, dusk, foggy night, and night.
The comparative analysis in Fig. 5 distinctly showcases the superiority of the proposed YOLOv4 CSPDarkNet53 model for traffic signboard detection across diverse weather conditions. Even when juxtaposed with established detection methodologies such as R-CNN, Fast R-CNN [30], Faster R-CNN [31], SSD [32], and YOLOv2 [33], the proposed model consistently registers a remarkable detection accuracy of 97%. This uniformly high performance across all conditions underscores the model's robustness and adaptability. The enhancements incorporated into YOLOv4 CSPDarkNet53 evidently render it a more efficient and reliable choice for real-world applications, particularly in dynamic environments where weather variations pose significant challenges to accurate traffic sign detection.

Conclusion
The proposed method simplifies the real-time testing and validation of object detection models. In this study, the YOLOv4 object detection and classification model was trained on the GTSRB dataset and validated under various environmental conditions (day, foggy day, cloudy, dusk, foggy night, and night) using the VFF approach, at a vehicle speed of 72 km/h. Impressively, the model recognized traffic signs with an accuracy exceeding 95% at a detection speed of 30 fps under these varying conditions. A comparative analysis juxtaposed the model's traffic sign detections with the ground truth detection list from CM, providing insights into the model's accuracy, detection range, and the number of TP, FP, and FN detections. This makes it convenient to pinpoint inconsistencies in object identification for specific classes, thereby guiding dataset improvements and subsequent model retraining; the VFF plays a pivotal role in fine-tuning the model to enhance its performance. A salient feature of the proposed method is its seamless compatibility with various object detection models, including R-CNN, Fast R-CNN, Faster R-CNN, SSD, and YOLO. Moreover, VFF expedites the validation process and demonstrates a noteworthy improvement in accuracy, achieving a 2% to 5% increase across diverse environmental conditions such as foggy days, overcast skies, fog-enshrouded nights, and clear nights. Given that CM is a simulation tool tailored for automotive environments, this approach is well suited to all automotive object detection models, facilitating testing and validation in realistic scenarios. It is worth noting that this study did not encompass rain and snow conditions due to limitations within the CM tool. Looking ahead, there is potential to augment this study by integrating VFF with other simulation tools, enabling the evaluation of object detection models under a broader spectrum of weather conditions.
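The comparison between a model's detection list and the simulated ground truth can be sketched as a simple class-aware, IoU-based matching step. This is a minimal illustration of the idea only; the data layout is an assumption, not the CM export format:

```python
def match_detections(detections, ground_truth, iou_thr=0.5):
    """Greedily match predicted boxes to same-class ground-truth boxes
    and count TP/FP/FN.  Each entry is (class_name, (x1, y1, x2, y2))."""
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union else 0.0

    unmatched = list(range(len(ground_truth)))  # ground-truth indices left
    tp = fp = 0
    for cls, box in detections:
        best, best_iou = None, iou_thr
        for gi in unmatched:
            g_cls, g_box = ground_truth[gi]
            overlap = iou(box, g_box)
            if g_cls == cls and overlap >= best_iou:
                best, best_iou = gi, overlap
        if best is not None:
            unmatched.remove(best)  # each ground-truth box matches once
            tp += 1
        else:
            fp += 1                 # no ground-truth counterpart
    fn = len(unmatched)             # ground-truth boxes never detected
    return tp, fp, fn
```

Run per frame against the simulator's ground-truth list, such counts expose exactly which classes and conditions drive the FP/FN detections discussed above.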

Informed consent
Informed consent was obtained from all the authors.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Figure 1.
Figure 1. Operational flow of assessing the object detection model with the proposed VFF algorithm. Note: the diverse environmental conditions are generated using the CM tool, aiding in evaluating traffic sign detection performance amidst real-time variations.

Figure 3.
Figure 3. Model verification in different environmental conditions: (a) Day, (b) Foggy Day, (c) Cloudy, (d) Dusk, (e) Foggy Night, and (f) Night. Note: traffic signboards in close proximity but at different distances.

Figure 4.
Figure 4. Evaluation chart: (a) progression of average loss and mAP against the number of iterations during model training, and (b) evaluation metrics across the entire 8000 iterations.

Figure 5.
Figure 5. Comparative analysis of traffic signboard detection methods.

Table 1.
Summary of related works.

Table 2.
Camera configuration in the CM tool.

Table 3.
Results of detection accuracy (%) with random GTSRB detection under six weather conditions.

Table 4.
Results of YOLOv4 DarkNet53 model verification in various weather conditions.