Validation of a full-shift benzene exposure empirical model developed for work on offshore petroleum installations on the Norwegian continental shelf

Abstract
Workers on offshore petroleum installations might be exposed to benzene, a carcinogenic agent. Recently, a full-shift benzene exposure model was developed based on personal measurements. This study aimed to validate this exposure model by using datasets not included in the model. The exposure model was validated against an internal dataset of measurements from offshore installations owned by the same company that provided data for the model, and an external dataset from installations owned by another company. We used Tobit regression to estimate geometric mean (GM) benzene exposure overall and for individual job groups. Bias, relative bias, precision, and correlation were estimated to evaluate the agreement between measured exposures and the levels predicted by the model. Overall, the model overestimated exposure in the internal dataset by a factor of 1.7, with a relative bias of 73%, a precision of 0.6, a correlation coefficient of 0.72 (p = 0.019), and a Lin’s Concordance Correlation Coefficient (CCC) of 0.53. The model underestimated exposure in the external dataset by a factor of about 2, with a relative bias of −45%, a precision of 1.2, a correlation coefficient of 0.31 (p = 0.544), and a Lin’s CCC of 0.25. Although the exposure model overestimated benzene exposure in the internal validation dataset, the precision and the correlation between the measured and predicted exposure levels were high. Differences in measurement strategies could be one of the reasons for the discrepancy. The exposure model agreed less with the external dataset.


Introduction
Benzene is a constituent of the petroleum production stream. Hence, there is a potential for exposure to benzene during daily work for several job groups on offshore petroleum installations. Benzene is classified as carcinogenic (IARC 2018), and exposure should remain as low as reasonably practicable (Ministry of Labour and Social Inclusion 2013). Consequently, knowledge about determinants with a significant impact on benzene exposure is a central part of the chemical risk assessment and constitutes an essential basis for success in reducing exposure levels.
Chemical risk assessments have traditionally included personal exposure measurements, using a sampling strategy that aims to cover representative workdays for the job groups in question. Industry-specific empirical models have been developed in several industries as a tool for risk assessment or as a part of exposure assessment in epidemiological studies, e.g., in the rubber industry (Kromhout et al. 1994), furniture industry (Mikkelsen et al. 2002), bridge painting (Qian et al. 2010), farming (Basinas et al. 2013), and oil drilling (Steinsvåg et al. 2006). Empirical exposure models are often based on measurements collected during a specific period. When such models are used to predict current exposure levels, care should be taken. Thus, before the models are implemented as a part of a risk assessment, they should be validated against other datasets containing measurements from the same type of industry, to check the accuracy and precision of the predicted exposure levels (Cherrie and Schneider 1999).
The authors recently published an empirical exposure model for full-shift exposure to benzene among workers on offshore petroleum installations (Ridderseth et al. 2022a). The exposure model was developed using exposure measurements collected between 2002 and 2018, mainly from one oil and gas company. Here, we aim to validate this model by comparing predicted exposure levels with measured benzene exposure from data collected between 2019 and 2021 from the same oil and gas company. In addition, we assessed agreement with an external dataset collected between 2002 and 2006 from another company that did not contribute measurements for the development of the exposure model.

Materials and methods
The exposure model for benzene on offshore petroleum installations was based on 924 personal benzene measurements collected between 2002 and 2018 from 25 offshore oil and gas installations (Ridderseth et al. 2022a). The exposure model comprised the following determinants: job group, design of the installation, season, and wind speed (Table 1). In addition, the exposure model was adjusted for sampling duration. Thus, the exposure model can be used to predict geometric means of benzene exposure for different job groups under various conditions. To validate the exposure model for benzene (Ridderseth et al. 2022a), an independent dataset of personal benzene exposure measurements was compiled (internal dataset). This dataset included 92 full-shift measurements collected by experienced occupational hygienists in the period 2019 to 2021 on four installations from the same company, of which three installations contributed data during the 2002 to 2018 period.
The selection of the four offshore installations was based on a set of criteria that aimed to cover differences in the design and size of the installations included in the exposure model: two selected installations were considered small and two large, judged by the number of beds (about 70 beds vs. about 350 beds). The measurements were planned to be conducted on random days during different seasons: three installations were measured during winter (November, December, and March) and one during summer (May), with seasons designated according to Table 1. Three job groups were chosen to participate: laboratory technicians, mechanics, and process operators. On the two smallest installations, there were no laboratory technicians, and their tasks were performed by the process operators. At the end of the work shift, employees were interviewed about their activity during the workday and factors that may have influenced the level of benzene exposure. Employees also wore a direct-reading instrument, the CUB (ION Science Ltd, UK), to record continuous volatile organic compound (VOC) exposures. The workers were asked to explain which activities or tasks had been performed during peak exposure periods.
The exposure model was also evaluated against measurement data from two Floating Production, Storage, and Offloading (FPSO) vessels owned by another oil company (external dataset). The external dataset was collected between 2002 and 2006 and contained 84 personal measurements collected by an offshore nurse in cooperation with an experienced occupational hygienist. According to the measurement reports, the measurements were mainly compliance-driven. The FPSO vessels were equipped with similar processing systems for crude oil and gas as the installations where measurements in the internal dataset were taken. Further, personal measurements were collected from the same three job groups. Information on determinants included in the exposure model was identified in the monitoring reports. However, information about wind speed was missing for 50% of the measurements. For these measurements, we used the mean wind speed at the Gullfaks C installation as a proxy, as reported by the Norwegian Center for Climate Services for the respective days of measurements (https://seklima.met.no/).

Measurement method and analyses
The measurements in the internal dataset were collected using passive automated thermal desorption (ATD) and analyzed using the TD-GC/MS (thermal desorption-gas chromatography/mass spectrometry) method. The limit of detection (LOD) for benzene was 0.001 ppm. Exposure measurements were performed for the whole work shift (12 hr) (mean sampling duration: 677 min).
In the external dataset, 3M 3500 passive diffusion dosimeters for organic vapors were used, and the measurements covered the whole work shift (mean sampling duration: 645 min). The LOD for benzene was stated to be 0.001 ppm in one of the reports, while the other reports did not provide information about the LOD. The same analyzing laboratory was used for both the internal and external datasets. The authors assumed an LOD of 0.001 ppm for measurements that were reported as "below LOD" without a numerical LOD value.

Statistical analyses
The exposure model was validated against measured exposure levels according to Burstyn et al. (2002), who argued that for these estimations it is preferable to use average exposure assigned to a group rather than individual measurements. We predicted geometric means (GMs) and 95% confidence intervals (95% CI) based on the Tobit regression model (Ridderseth et al. 2022a) to take into account measurements below LOD. We estimated GM benzene exposure overall, for each job group, and for each job group within the installations. On the two smallest installations in the internal dataset, there were no laboratory technicians because their tasks were performed by the process operators.
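As an illustration of how a Tobit-type likelihood handles measurements below the LOD, the sketch below fits an intercept-only censored lognormal model and returns the GM. It is a simplification and not the authors' Stata model, which also includes determinants such as job group, season, and wind speed. The function names, the grid-search optimizer, and the example values are assumptions made for this sketch.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the censored term


def censored_loglik(mu, sigma, log_detects, log_lods):
    """Log-likelihood of a normal model on the log scale with left-censoring."""
    ll = 0.0
    for y in log_detects:  # detected values contribute the normal density
        z = (y - mu) / sigma
        ll += -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)
    for c in log_lods:  # values below the LOD contribute P(Y < log LOD)
        ll += math.log(max(STD_NORMAL.cdf((c - mu) / sigma), 1e-300))
    return ll


def fit_censored_gm(detects, lods, iterations=60, shrink=0.7):
    """Return (GM, log-scale SD) maximizing the censored likelihood.

    Uses a coarse-to-fine grid search, chosen for transparency, not speed.
    """
    log_detects = [math.log(v) for v in detects]
    log_lods = [math.log(v) for v in lods]
    all_logs = log_detects + log_lods
    mu = sum(all_logs) / len(all_logs)  # start from LOD substitution
    log_sigma = 0.0
    span_mu, span_ls = 3.0, 2.0
    offsets = [i / 10.0 - 1.0 for i in range(21)]  # -1.0, -0.9, ..., 1.0
    for _ in range(iterations):
        best = (censored_loglik(mu, math.exp(log_sigma), log_detects, log_lods),
                mu, log_sigma)
        for a in offsets:
            for b in offsets:
                cand_mu = mu + a * span_mu
                cand_ls = log_sigma + b * span_ls
                ll = censored_loglik(cand_mu, math.exp(cand_ls),
                                     log_detects, log_lods)
                if ll > best[0]:
                    best = (ll, cand_mu, cand_ls)
        _, mu, log_sigma = best
        span_mu *= shrink  # narrow the search window around the best point
        span_ls *= shrink
    return math.exp(mu), math.exp(log_sigma)


# Hypothetical measurements (ppm): three detects, two results below LOD = 0.001 ppm.
gm, sd_log = fit_censored_gm([0.002, 0.004, 0.008], [0.001, 0.001])
print(f"GM = {gm:.4f} ppm, log-scale SD = {sd_log:.2f}")
```

Because censored results only contribute the probability of falling below the LOD, the fitted GM is lower than the value obtained by substituting the LOD itself for each non-detect.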
To evaluate the agreement between the predicted and measured GM exposure levels, bias, antilog of the bias, relative bias, precision, correlation, and Lin's Concordance Correlation Coefficient (CCC) with associated 95% CI were estimated. The bias is the mean difference between predicted GMs and measured GMs for exposure on a logarithmic scale (1). This bias was also expressed as a multiplier by taking the antilog of the bias (2). Relative bias is the difference between the mean of the predicted GM exposure and the mean of the measured GM exposure, expressed as percent (3). Precision reflects the reliability of the predicted exposure. It is expressed as the standard deviation of the difference between measured GM and predicted GM for exposure (4) (Friesen et al. 2005):
$$\text{bias} = \frac{1}{n'} \sum_{i=1}^{n'} (\hat{y}_i - y_i) \qquad (1)$$
$$\text{multiplicative bias} = \exp(\text{bias}) \qquad (2)$$
$$\text{relative bias} = \left(\exp(\text{bias}) - 1\right) \times 100\% \qquad (3)$$
$$\text{precision} = \sqrt{\frac{\sum_{i=1}^{n'} \left((\hat{y}_i - y_i) - \text{bias}\right)^2}{n' - 1}} \qquad (4)$$
where:
$\hat{y}_i$ = natural log of predicted GM exposure level
$y_i$ = natural log of measured GM exposure
$n'$ = number of GMs in the validation set
Lin's CCC (Lin 1989):
$$\rho_c = \frac{2 \rho \sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2 + (\mu_x - \mu_y)^2}$$
where:
$\rho$ = correlation coefficient between measured and predicted GM exposure level
$\sigma^2$ = variance
$\mu_x$ = mean of the natural log of predicted GM exposure level
$\mu_y$ = mean of the natural log of measured GM exposure
Spearman correlation (r) was used to study the association between the estimated GM benzene exposure for each job group within the four installations and the corresponding predicted exposures from the model.
A Bland-Altman plot was used to visualize the agreement between the GM exposure level for the individual job groups at each of the installations, both from the internal and external datasets, and the corresponding predicted exposures.
Statistical analyses were conducted in Stata, version 17 (StataCorp LLC, US).
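The agreement measures described in this paragraph can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' Stata code: the function names are hypothetical, the precision uses the sample standard deviation (n' − 1 denominator), and Lin's CCC uses 1/n moments as in Lin (1989). The example inputs are the rounded job-group GMs reported for the internal dataset. The published metrics were computed on job groups within installations, so the output will not necessarily reproduce the values in Table 3.

```python
import math
from statistics import mean, stdev


def agreement_metrics(predicted_gm, measured_gm):
    """Bias, multiplicative bias, relative bias (%), and precision on the log scale."""
    diffs = [math.log(p) - math.log(m) for p, m in zip(predicted_gm, measured_gm)]
    bias = mean(diffs)  # mean of log(predicted) - log(measured)
    return {
        "bias": bias,
        "factor": math.exp(bias),  # antilog of the bias
        "relative_bias_pct": 100.0 * (math.exp(bias) - 1.0),
        "precision": stdev(diffs),  # SD of the log-scale differences
    }


def lin_ccc(x, y):
    """Lin's concordance correlation coefficient with 1/n moments."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    # 2 * rho * sigma_x * sigma_y equals twice the covariance
    return 2.0 * sxy / (sxx + syy + (mx - my) ** 2)


# Rounded job-group GMs (ppm) for the internal dataset, used here only as example input.
predicted = [0.004, 0.002, 0.003]
measured = [0.005, 0.001, 0.001]
metrics = agreement_metrics(predicted, measured)
ccc = lin_ccc([math.log(v) for v in predicted], [math.log(v) for v in measured])
print(metrics, ccc)
```

A factor above 1 (positive relative bias) means the model overestimates on average, while a smaller precision value means less scatter in the log-scale differences.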
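The quantities behind a Bland-Altman plot can be computed as sketched below. This is an illustration under stated assumptions: the difference is taken as predicted minus measured (matching the sign convention of the bias), the 95% limits of agreement are a standard Bland-Altman element that the text does not confirm was drawn, and the function name and inputs are hypothetical. The least-squares slope corresponds to the regression lines described for Figure 2.

```python
from statistics import mean, stdev


def bland_altman(measured, predicted):
    """Return the plot coordinates plus summary lines for a Bland-Altman plot."""
    means = [(m + p) / 2.0 for m, p in zip(measured, predicted)]  # x-axis
    diffs = [p - m for m, p in zip(measured, predicted)]          # y-axis
    d_bar = mean(diffs)
    sd = stdev(diffs)
    # Least-squares regression of the differences on the pairwise means:
    # a non-zero slope indicates that disagreement changes with exposure level.
    m_bar = mean(means)
    sxx = sum((a - m_bar) ** 2 for a in means)
    slope = sum((a - m_bar) * (d - d_bar) for a, d in zip(means, diffs)) / sxx
    return {
        "means": means,
        "diffs": diffs,
        "mean_diff": d_bar,
        "limits": (d_bar - 1.96 * sd, d_bar + 1.96 * sd),
        "slope": slope,
        "intercept": d_bar - slope * m_bar,
    }


# Hypothetical GM pairs (ppm) for four job-group-by-installation cells.
result = bland_altman([0.001, 0.002, 0.005, 0.012], [0.002, 0.003, 0.004, 0.008])
print(result["mean_diff"], result["limits"], result["slope"])
```

A slope close to zero with points scattered tightly around the zero line corresponds to good agreement, whereas a clearly non-zero slope reflects a difference that grows or shrinks with the exposure level.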

Results
The overall measured GM exposure level of benzene in the original dataset used for the development of the exposure model was 0.004 ppm (range: <LOD to 16.75 ppm), and 26% of the measurements were below the LOD (Ridderseth et al. 2022a) (Table 2).

Internal validation
The measured GM exposure level in the internal dataset was 0.001 (range: < 0.001 to 0.078) ppm, and 40% of the measurements were below LOD (Table 2). Overall, the model overestimated the exposure by a factor of 1.7, with a relative bias of 73% and a precision of 0.6 (Table 3). There was a strong and significant correlation between exposure predicted by the model and the measurements (r = 0.72, p = 0.019) (Figure 1), and Lin's CCC was 0.53.
The measured GM exposure levels for the laboratory technicians, mechanics, and process operator job groups were 0.005, 0.001, and 0.001 ppm, and the corresponding predicted GMs for the same job groups were 0.004, 0.002, and 0.003 ppm, respectively (Table 2). The model overestimated the GM exposure of the groups of the internal dataset by a factor of 1.6 for the mechanics and 2.7 for the process operators, with relative biases of 62% and 170%, respectively (Table 3). In contrast, the model underestimated the laboratory technicians' exposure by a factor of 0.8, with a relative bias of −18%. The precision of the predicted estimates was highest for laboratory technicians.
For measurements taken for internal validation, laboratory technicians reported daily tasks associated with benzene exposure such as collecting samples from the produced water system, crude oil, and gas. The mechanics and process operators had few daily activities involving benzene exposure during measurement periods. However, some activities were performed such as changing or cleaning filters, recertifying or changing valves, and pipeline inspection gauge (PIG) operation.
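As a worked check of how the reported factor and relative bias relate, assuming the Table 3 footnote definition (relative bias = exp(bias) − 1, expressed as percent), the overall internal result can be written as:

$$\exp(\text{bias}) = 1 + \frac{73}{100} = 1.73 \approx 1.7, \qquad \text{bias} = \ln(1.73) \approx 0.55$$

Under the same assumption, a relative bias of −45% corresponds to a multiplicative bias of 0.55, i.e., predicted GMs that are on average about half the measured GMs.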

External validation
For the external dataset, the overall measured GM exposure level of benzene was 0.004 (0.004-0.011) ppm, with 30% of the measurements below LOD (Table 2). The model underestimated the geometric mean exposure of the groups of the external dataset by a factor of 0.6, the relative bias was −45%, and the precision was 1.2. The Lin's CCC was 0.25 (Table 3). The correlation between the geometric mean exposure predicted by the model and the measured geometric mean exposure was not significant (r = 0.55, p = 0.25) (Figure 1).
For laboratory technicians, mechanics, and process operator job groups, the measured GM exposures were 0.018, 0.002, and 0.012 ppm, and the corresponding predicted GM exposure levels were 0.002, 0.003, and 0.008 ppm, respectively (Table 2). When grouping job groups into each of the installations, the model underestimated geometric mean exposure for laboratory technicians and process operators by factors of 0.2 and 0.6, respectively (Table 3). In contrast, the model overestimated the exposure for the mechanics by a factor of 1.4. The relative bias for the job groups was −79% for laboratory technicians, 30% for mechanics, and −37% for process operators. The precision of the predictions was higher for process operators than for laboratory technicians and mechanics.
Table 2. Descriptive data for the measured and predicted exposures in the internal and external datasets used for validation of the exposure model for benzene on offshore petroleum installations (Ridderseth et al. 2022a).

Discussion
Overall, the full-shift benzene exposure model developed for laboratory technicians, mechanics, and process operators (Ridderseth et al. 2022a) overestimated exposure at four offshore petroleum installations between 2019 and 2021, by a factor of 1.7. The correlation between measured and predicted benzene exposure levels for the internal dataset was strong and statistically significant. The exposure model agreed less with external measurements.
Based on the relatively high precision, the high correlation between measured and predicted GM exposures, and the narrow scatter around the zero bias line of the Bland-Altman plot (Figure 2), the exposure model shows reasonable agreement with the internal dataset. However, if applying the exposure model for risk assessment, one must consider biased estimates and the fact that the internal validation covered only the lower range of exposure levels in the original dataset used for developing the model. The exposure model showed less agreement with the external dataset, including a lower precision, no significant correlation between measured and predicted exposures, and, as illustrated by the Bland-Altman plot, an increasing difference between measured and predicted exposures as the average benzene exposure increases. Also, Lin's CCC estimates indicated better agreement between predicted and measured geometric means for the internal than for the external dataset.
The reported agreement between predicted and measured geometric mean exposures for job groups within installations corresponded with analogous findings in Table 2, which are based on predictions of individual measurements. For the internal dataset, the model overestimated overall exposure for both the group and individual-based approaches, while for the external dataset, the model underestimated exposure for both approaches.
Similar validation studies of empirical exposure models have been done for several other industries but for airborne constituents other than benzene. For instance, such validation has been performed for bitumen fumes and benzo(a)pyrene exposure in asphalt paving (Burstyn et al. 2002), exposure in the rubber manufacturing industry (Vermeulen and Kromhout 2005), general dust in sawmills (Friesen et al. 2005), cotton in the Chinese textile industry (Astrakianakis et al. 2006), and for carbon nanotubes and nanofibers (Dahm et al. 2019). Exposure models in these studies underestimated the exposure, with relative biases ranging from 1% to 70%. In contrast, the exposure model in this study overestimated the exposure when compared to the internal dataset. The precision for the internal measurements was higher (0.6; a lower value indicates less scatter) than the precision reported in other validation studies on occupational exposure. Burstyn et al. (2002) reported precision at 1.35 and 1.72 in the asphalt study. The validation of the empirical model performed by Friesen et al. (2005) in the wood industry reported precisions for dust and wood dust of 0.87 and 0.89, respectively.
There might be several reasons for the bias between measured and predicted values in this study. One reason might be differences in the set of tasks performed by the workers in the measurements conducted for validation, compared to the tasks covered by the measurements used for the development of the exposure model. More than 50% of the measurements used for developing the exposure model were conducted on days when benzene exposure was expected due to the opening of process systems (Ridderseth et al. 2022a). A recent study showed that several work tasks on offshore installations were associated with increased exposure to benzene and that the tasks that led to benzene escape into the environment were a significant exposure determinant (Ridderseth et al. 2022b). However, sufficient information on performed work tasks and benzene sources was not available to introduce these factors as possible determinants in the full-shift exposure model (Ridderseth et al. 2022a).
In contrast to the measurements used for the development of the full-shift exposure model, internal measurements for validation were collected on random days, with no specific a priori information about worker activities and non-routine tasks to be completed. According to interviews with the process operators and mechanics, validation measurements were primarily taken on days when few of the tasks known to be associated with increased benzene exposure, such as those described by Ridderseth et al. (2022b), were completed. Due to the COVID-19 pandemic during the collection of the internal dataset, there was restricted access to the installations and a reduced number of available beds offshore. Such measures may have impacted the availability of sampling days or resulted in the completion of fewer non-routine tasks with known benzene exposures in process areas. On the other hand, the laboratory technicians reported carrying out their regular duties, comprising sampling and analysis in the laboratory, which may explain the better agreement between the measured and predicted values for this job group than for process operators and mechanics. Other factors not accounted for in the full-shift exposure model, such as benzene source and control measures, might also have contributed to the observed differences between measured and predicted exposure levels.
The external dataset consisted of compliance-driven measurements, implying that measurements were more commonly taken on days involving tasks with a known potential for exposure to benzene. The measurements in the upper exposure range seem to drive the bias and the increasing difference between measured and predicted exposures for increasing benzene exposure. However, the measurement reports did not contain sufficient information to interpret these findings.
This study is, to our knowledge, the only published validation of an empirical model for benzene exposure in the petroleum industry. We were able to collect new measurements for validation from three of the job groups used in the exposure model. However, when the new measurements in the internal dataset were taken, the number of tasks associated with known benzene exposures was low, which led to a high number of measurements below the limit of detection. A larger number of exposure measurements, collected over more days and distributed over a longer period, would have been preferable, especially for the mechanics and process operators, who had relatively few tasks involving benzene exposure during the days when validation measurements were collected. Some workers may have provided exposure measurements from several installations in the dataset used to develop the exposure model. However, due to the high number of workers eligible for sampling over the years and because workers in the included job groups do not frequently change installations, the authors assumed that this factor has had an insignificant impact on the results.
In future development of a model for benzene exposure in the oil and gas industry, work tasks performed during measurements and the type of benzene source worked on should be considered as determinants. Exact sampling durations for many of the measurements in the external dataset were missing, and reports generally stated that measurements were taken for the entire shift (720 min). If the measurement period were shorter, the model presented in this study would underestimate the exposure to a greater degree.

Conclusion
Overall, the exposure model overestimated benzene exposure in the internal dataset. However, the precision and the correlation between the measured and predicted exposure levels were considered sufficient. When using the exposure model in risk assessment, one must consider the biased estimates and that the internal validation covers only the lower range of exposure levels. In future development of a benzene exposure model for the oil and gas industry, work tasks and the specific benzene source resulting in benzene exposures should be considered as possible determinants.

GM exposure levels in the respective subsets of data. b Bias = (log predicted GM − log observed GM)/n′, where n′ = number of GMs in the validation set. c exp(bias). d Relative bias = exp(bias) − 1. Insufficient number of measurements to determine the correlation and 95% CI.

Figure 1. Scatterplot of measured and predicted geometric mean benzene exposure for the different job groups at each installation for the internal dataset (circles) and the external dataset (triangles).

Figure 2. Bland-Altman plot of the GM benzene exposures (ppm) for the laboratory technicians, mechanics, and process operators job groups on each of the installations from the internal and external datasets, respectively. The solid black line is the regression line for the internal dataset and the dashed line is for the external dataset.

Table 3. Spearman correlation (r), bias, relative bias, precision, Lin's Concordance Correlation Coefficient (CCC), and associated 95% confidence interval (CI) of the full-shift benzene exposure model relative to the internal and external datasets.