Review of 2,4-dichlorophenoxyacetic acid (2,4-D) biomonitoring and epidemiology

A qualitative review of the epidemiological literature on the herbicide 2,4-dichlorophenoxyacetic acid (2,4-D) and health after 2001 is presented. In order to compare the exposure of the general population, bystanders and occupational groups, their urinary levels were also reviewed. In the general population, 2,4-D exposure is at or near the limit of detection (LOD). Among individuals with indirect exposure, i.e. bystanders, the urinary 2,4-D levels were also very low except in individuals with opportunity for direct contact with the herbicide. Occupational exposure, where exposure was highest, was positively correlated with behaviors related to the mixing, loading and applying process and use of personal protection. Information from biomonitoring studies increases our understanding of the validity of the exposure estimates used in epidemiology studies. The 2,4-D epidemiology literature after 2001 is broad and includes studies of cancer, reproductive toxicity, genotoxicity, and neurotoxicity. A few publications have reported statistically significant associations. However, most lack precision and the results are not replicated in other independent studies. In the context of biomonitoring, the epidemiology data give no convincing or consistent evidence for any chronic adverse effect of 2,4-D in humans.

Introduction
While most regulatory agencies rely heavily on toxicological data, the results of epidemiology studies are becoming more and more important in this area. For this reason and for public health purposes, in general, a critical review of the epidemiologic literature on crop protection products is of increasing value. The herbicide 2,4-dichlorophenoxyacetic acid (2,4-D) is an example of a pesticide for which the epidemiology data are continually reviewed and debated. Recently re-registered for use by Health Canada (2008) and the US Environmental Protection Agency (EPA, 2005), 2,4-D is currently being re-evaluated by the European Union. A past review by Gandhi et al. (2000) emphasized the inconsistent nature of the studies up to 2000 that precluded drawing any conclusion on carcinogenicity in humans. Reviewing the literature through 2001, Garabrant and Philbert (2002) observed more strongly that the epidemiology data "provide scant evidence that supports a conclusion that exposure to 2,4-D is associated with STS [soft tissue sarcoma], NHL [non-Hodgkin lymphoma] or HD [Hodgkin's Disease]." Since no qualitative review of the 2,4-D epidemiology literature since 2001 has been published, we present an update in this paper.
The herbicide 2,4-D has been registered for use since the 1940s. As a selective herbicide, 2,4-D is used to control broadleaf weeds in a variety of settings, from crops, rights-of-way, lawns and forests to aquatic settings. In aerobic environments, 2,4-D degrades rapidly, within 2 to 13 days (Wilson et al., 1997). In humans, 2,4-D is excreted unmetabolized in urine with a half-life of 10 to 33 hours (average 17.7 hours) (CDC, 2009; Sauerhoff et al., 1977). 2,4-D is cleared into urine of both animals and humans by a saturable organic anion transporter, OAT-1; toxicity of 2,4-D in rodents is typically limited to dose levels that saturate renal clearance (>50 mg/kg/day). Toxicity observed in rodents at doses above renal saturation is generally not regarded as relevant to human health risk (Timchalk, 2004; EPA, 2005). Chronic toxicity of 2,4-D has been tested in laboratory animals at a wide range of dose levels (Garabrant & Philbert, 2002; EPA, 2005). Studies in both rats and mice have shown no carcinogenic effect of 2,4-D and the US EPA classifies 2,4-D as a Group D chemical (not classifiable as to human carcinogenicity). In two-generation reproduction studies there were no effects seen on fertility indices at doses up to and including 72 mg/kg/day. Reduced bodyweight gain has been seen at the higher doses (generally above the level at which renal excretion is saturated), with reduced food intake usually seen in parallel. Experimental in vitro and in vivo animal studies show no genotoxic potential for 2,4-D. Neurotoxicity studies showed aberrations in locomotion and open field behavior at high doses saturating renal clearance but no histopathological findings in neural tissues. The neurotoxic signs were reversible and were at doses that exceeded general toxicity. As one of the most widely used herbicides in the world, 2,4-D continues to be one of the most studied pesticides, both in animals and in humans.
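To give a feel for what the urinary half-life quoted above implies for the timing of urine sampling, the following minimal sketch computes the fraction of an absorbed dose not yet excreted after a given time. It assumes simple first-order (exponential) elimination, which is an illustrative simplification and not a model taken from the cited studies; the function name and the 48-hour time point are hypothetical choices.

```python
def fraction_remaining(hours: float, half_life_hours: float) -> float:
    """Fraction of an absorbed dose still in the body after `hours`.

    Assumes first-order elimination (an illustrative assumption):
    the amount halves once per half-life.
    """
    if half_life_hours <= 0:
        raise ValueError("half-life must be positive")
    return 0.5 ** (hours / half_life_hours)


# Half-lives quoted in the text: 10 to 33 hours, average 17.7 hours.
for half_life in (10.0, 17.7, 33.0):
    left = fraction_remaining(48.0, half_life)
    print(f"half-life {half_life} h: {left:.1%} remaining after 48 h")
```

Under this assumption, a urine sample collected several days after an application reflects only a small fraction of the absorbed dose, which is one reason the timing of collection matters when comparing biomonitoring studies.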
The purpose of this review is to provide a qualitative review of the human biomonitoring and epidemiology data since the summaries by Garabrant and Philbert (2002) and Munro et al. (1992). The biomonitoring data on urine samples include publications since 1991, as they provide the most reliable information on human exposure. To our knowledge these data have not yet been comprehensively reviewed. Some exposure studies were conducted in the 1980s on commercial applicators to turf, forestry and crops (Kolmodin-Hedman & Erne, 1980; Kolmodin-Hedman et al., 1983; Yeary, 1986). These studies will not be included since technical changes, specifically improvements in analytical methods, have made these early studies less comparable to recent ones.

Scope
In 1986, the International Agency for Research on Cancer (IARC) classified the chlorophenoxy herbicides as Group 2B (possible) carcinogens (IARC, 1986). This monograph evaluated the group, which in addition to 2,4-D, includes other phenoxies such as MCPA and 2,4,5-trichlorophenoxyacetic acid (2,4,5-T) and related impurities of polychlorinated dioxins. Notably, IARC has not identified 2,4-D, per se, as a possible carcinogen. IARC sponsored an occupational cohort mortality study of 21,863 workers across 12 countries exposed to phenoxy herbicides. In their 1997 report, Kogevinas and colleagues stratified the workers by those exposed to the contaminant 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD) and other chlorinated dioxins and workers unexposed to dioxins (Kogevinas et al., 1997). These investigators noted that the dioxin contaminants have similar mechanisms of action and toxicities whereas the herbicides such as 2,4-D were not known to contain TCDD. In their review, Garabrant and Philbert (2002) also designated cohorts with co-exposure to TCDD. The current review of human epidemiology and 2,4-D will use a similar approach. Where possible, results specific to 2,4-D will be presented and reviewed.

Interpretation
There are many approaches to compiling and reviewing a body of literature. The criteria or guidelines as proposed by Bradford Hill provide useful components to assess data. Swaen and Amelsvoort (2009) proposed a quantitative approach to empirically consider the Hill criteria. They tested their approach using the agents classified as Class 1 or 2A carcinogens by the IARC and observed that in addition to the experimental evidence, the criteria of strength of association and consistency had the greatest weight. In the current review, we did not evaluate the animal or experimental data. Summaries are available from government regulatory agencies such as the US EPA and Health Canada (US EPA, 2005; Health Canada, 2008). A strong association was considered to be a risk estimate or odds ratio greater than 2. We looked for consistency or replication of statistically significant results both within the same study and in independent studies. For example, results were considered to be internally inconsistent if exploratory analyses of all subjects were statistically significant (p < 0.05, or the upper and lower bounds of the confidence limits excluded the null value) but analyses by dose level were not statistically significant. To meet the guideline for external consistency, we looked for statistically significant results for a specific outcome to be replicated in more than one study.
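The two screening rules stated above (an odds ratio greater than 2 counts as a strong association; a confidence interval that excludes the null value counts as statistically significant) can be sketched in a few lines. This is an illustration of the stated criteria only, not the authors' procedure; the function name `assess` and its parameters are hypothetical, and the example values are odds ratios quoted later in this review.

```python
def assess(odds_ratio: float, ci_lower: float, ci_upper: float,
           null_value: float = 1.0, strong_threshold: float = 2.0) -> dict:
    """Apply the review's two screening rules to one risk estimate.

    strong:      odds ratio greater than the threshold (2 in the text)
    significant: the 95% confidence interval excludes the null value
    """
    if not ci_lower <= odds_ratio <= ci_upper:
        raise ValueError("point estimate must lie within its confidence interval")
    significant = ci_lower > null_value or ci_upper < null_value
    strong = odds_ratio > strong_threshold
    return {"strong": strong, "significant": significant}


# Estimates quoted in this review (OR, lower limit, upper limit).
examples = {
    "Miligi et al., all participants": (0.9, 0.5, 1.8),
    "McDuffie et al. (2001)": (1.32, 1.01, 1.73),
    "Tanner et al. (2009)": (2.59, 1.03, 6.48),
}
for label, (or_, lo, hi) in examples.items():
    print(label, assess(or_, lo, hi))
```

A result that passes both rules still has to be checked for internal consistency (dose-level analyses) and external replication, which a single-estimate check like this cannot capture.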
Biomonitoring data were also considered to be informative. Limited data on exposure are generally considered to be the most critical component of epidemiology studies on pesticide exposure and potential health outcomes. Good quality biomonitoring data can significantly strengthen this shortcoming and can even provide a reliable metric for internal exposure. The biomonitoring data also provide an excellent opportunity to put the exposure to 2,4-D of the various human populations into its proper perspective. Therefore we have included a specific review of the available biomonitoring data on 2,4-D. These data were organized by opportunity for exposure (general population, bystander and occupational) to be in line with populations that are included in epidemiology studies and this approach is similar to that taken by regulators and risk assessors.

Search strategy
As part of a larger project to report the toxicology of 2,4-D, a systematic literature search using the terms "2,4-dichlorophenoxyacetic acid" and the Chemical Abstracts Service (CAS) registry number, "94-75-7", was conducted on the databases BIOSIS, Pascal, Agricola, Chemical Safety News base, ELSEVIER, and CA SEARCH for the period from 2000 to 5 January 2012. Another search added the terms "epidemiology," "human," "occupation," "workers," "women," "men," and "farmers." More than 12,000 references were identified. The abstracts were manually screened by topic and relevance. The studies relevant to human, epidemiology and biomonitoring were selected for this review. Although English language was not a criterion, no relevant studies were identified that needed translation into English. In addition, papers were reviewed when cited as part of other studies and publications.

Inclusion and exclusion
Studies that evaluated human health effects following exposure to herbicides in general were not considered. There are many chemicals used to control vegetation. The US EPA reported that in 2001, 2,4-D use in the United States ranked third among herbicides, after glyphosate and atrazine (US EPA, 2012). A survey of more than 35,000 pesticide applicators as part of the Agricultural Health Study observed that pesticide use varies by residence and commodity (Alavanja et al., 1999). Without further information from selected studies, we concluded that we would not attribute to 2,4-D a reported association of "herbicides" and a health effect. For example, we excluded the cancer mortality and illness and injury studies of herbicide applicators (Wesseling et al., 2001; Swaen et al., 2004) for this reason. We excluded other studies for which the results were not specific to 2,4-D because they implied exposure from residence (Schreinemachers, 2003) or occupation (Garaj-Vrhovac & Zeljezic, 2000, 2001; Gómez-Arroyo et al., 2000; Martínez-Valenzuela et al., 2009; Remor et al., 2009).
Included were results for phenoxy herbicides and 2,4-D. In publications that reported analyses for all pesticides combined as well as individual exposures, only the results specifically addressing 2,4-D were reviewed.

General population
The largest general population studies of 2,4-D urinary levels in children and adults are those conducted by the US Centers for Disease Control and Prevention (CDC) (2009) and Health Canada (2010). With 2412 and 5480 subjects, respectively, both studies were designed to be representative of the national population (Table 1). Neither study detected 2,4-D at the 50th percentile (less than 1 part per billion, ppb). These findings are similar to an earlier study by CDC (Hill et al., 1995). The US EPA's study of children and adults in North Carolina and Ohio (Morgan et al., 2008) and a study of children in Thailand (Panuwet et al., 2009) both observed exposure levels to be at or below 1 ppb. The maximum values shown in Table 1 ranged from 2 to 37 ppb. These studies demonstrate that background exposure to 2,4-D is low to undetectable in most populations. Certain individuals had the opportunity for exposure as evidenced by the maximum urinary levels. However, few of the studies collected data relevant to diet or activities to determine how the contact or exposure to 2,4-D occurred.

Bystanders, indirect exposure
There are a number of urinary measurements collected from groups of individuals considered to be bystanders (Table 2). These individuals do not mix, load or apply 2,4-D, but may have the opportunity for indirect exposure over and above the general population. Examples include spouses and children of applicators, as well as applicators who apply other herbicides. The largest study reported that urinary 2,4-D was detected (LOD = 0.2 ppb) in all but 2% of 196 farm workers (Arcury et al., 2010) although actual urinary levels were not reported. Another large study of 125 spouses of farmer applicators observed that most of the individuals were below 1 ppb (Arbuckle & Ritter, 2005). Other studies of women reported the geometric mean 2,4-D level to be 1 ppb or less (Harris et al., 1992; Cooper et al., 2001; Alexander et al., 2007). The monitoring levels of bystander children were slightly higher than those of the parents (Arbuckle et al., 2004; Alexander et al., 2007; Arcury et al., 2007). However, a few children and spouses were involved in the application process, invalidating their classification as "bystanders." The remaining bystander studies collected urine from crop applicators before the application (Alexander et al., 2007) or from crop applicators who applied other products (Arbuckle et al., 2002; Curwin et al., 2005; Panuwet et al., 2008; Bakke et al., 2009) and licensed applicators in Minnesota (Garry et al., 2001). In general, the central tendency for urinary levels among these applicator bystanders was from less than LOD to 3 ppb.

Manufacturers and applicators, direct exposure
The occupationally exposed include individuals working in 2,4-D application on crops, forests and turf and 2,4-D manufacture (Table 3). This group is highly heterogeneous with respect to exposure due to their varying opportunity for direct exposure to 2,4-D. Most have detectable urinary levels but not all. The data for crop and forestry applicators tend to be skewed, with geometric means between 5 and 45 ppb but maximum levels from 410 to 2500 ppb (Knopp & Glass, 1991; Garry et al., 2001; Arbuckle et al., 2002; Curwin et al., 2005; Alexander et al., 2007; Panuwet et al., 2008; Bhatti et al., 2010; Thomas et al., 2010). The unprotected turf applicators of liquid 2,4-D had the highest urinary levels reported in a controlled study (Harris et al., 1992). The highest levels in another turf study were observed during the spring season, compared to the summer and fall (Harris et al., 2010). Other exposure studies of professional turf applicators reported internal dose estimates and are not directly comparable to the studies in Table 3 (Harris et al., 2002, 2005). However, modeling of the variation of exposure in these reports confirmed that doses were influenced by type of spray nozzle and use of gloves whereas job title alone was a poor determinant. Only one older study evaluated exposure among manufacturers in the 1980s (Knopp, 1994). This population had the highest range with a maximum of 12,963 ppb.

Summary of biomonitoring
In general, urinary levels of 2,4-D are correlated with individual behavior, performed tasks and opportunity for direct contact with the herbicide. Urinary 2,4-D levels are near or below 1 ppb in most of the sampled general populations. Many of the studies of bystanders and applicators reported information on activities, use of protective equipment and application methods. The highest urinary levels were observed in "bystanders" with opportunity for direct contact with 2,4-D by assisting with the application, being present during the application, or handling the herbicide (Arbuckle et al., 2004, 2005; Alexander et al., 2007). Work practices among applicators were also demonstrated to predict urinary 2,4-D. These include glove use, repairing equipment, application method, acres treated and personal hygiene practices. In seasonal applicators, Bhatti et al. (2010) observed that these factors explained only 16% of the variance between workers, suggesting that other factors remain to be identified. Exposure classifications in epidemiology studies can be strengthened and validated by means of high quality biomonitoring data. Currently, pesticide exposure in epidemiology studies is frequently based upon unvalidated self-reported use and/or activities with probable contact with herbicides. From a given publication, it is often difficult to differentiate whether a participant applied 2,4-D or was near an application and thus assigned as "exposed." Epidemiologists are always concerned about reporting and recall bias. However, the 2,4-D urinary biomonitoring studies suggest that we can also focus on collecting better data relevant to the application. Biomonitoring studies of applicators and their spouses have established that contact with 2,4-D through mixing, loading and applying is a strong determinant of internal exposure whereas living near the application is not (Arbuckle et al., 2002; Arbuckle & Ritter, 2005; Alexander et al., 2007).
For example, in their study of NHL, Hartge et al. (2005) measured levels of 2,4-D in household dust from vacuum cleaner bags and subsequently calculated risk estimates by dust levels in quintiles from below detection to greater than 10,000 ng/g. However, without corresponding urinary monitoring, no internal dose estimates are possible. Levels of 2,4-D in carpet dust and urine were collected in another study by Morgan et al. (2008). Although 2,4-D was detected in more than 80% of the samples, with a maximum level of 21,700 ng/g, urine levels were not statistically significantly correlated with the 2,4-D concentrations in the dust samples. Although 2,4-D can be identified in household dust, detection has not been demonstrated to be a good predictor of exposure based on urinary levels, and its use in epidemiology studies should be viewed with caution. This is consistent with the conclusions of Curwin et al. (2005) that the key determinant for exposure is actually applying the pesticide of interest.
In conclusion, the biomonitoring data for 2,4-D provide important information about the plausibility and validity of exposure estimates in the epidemiology literature.

Cancers affecting the lymphatic system (NHL, MM, leukemia)
Cancer studies of 2,4-D have historically focused on the lymphatic system, primarily non-Hodgkin lymphoma (NHL). The current decade is no different, with 9 case-control studies (12 publications) reporting on lymphohematopoietic cancers (Table 4). The largest NHL study is the Italian multicenter investigation with 1575 cases reported in 2003 and 1925 cases by 2006 (Miligi et al., 2003, 2006). The estimate among all participants demonstrated no increased risk from 2,4-D use (OR = 0.9, 95% CI = 0.5-1.8). The odds ratios were not significant when stratified by men (0.7, 95% CI = 0.3-1.9) and women (1.5, 95% CI = 0.4-5.7). Nine cases and 3 controls reported 2,4-D use (greater than low) and not using protective equipment. The large odds ratio of 4.4 was statistically significant but imprecise (95% CI = 1.1-29.1).
Another large NHL case-control study, from Sweden (Eriksson et al., 2008), reported no statistically significant increased risk for 2,4-D and/or 2,4,5-T use among 910 cases and 1016 controls. There was no increase in odds ratios by days used nor any significant increase by any of eight cancer entities. A statistically significant risk for NHL and phenoxy herbicides, in general, was influenced by the significant odds ratios for MCPA.
Data were pooled from three previously published case-control studies in Kansas (Hoar et al., 1986), Nebraska (Zahm et al., 1990) and Iowa and Minnesota (Cantor et al., 1992) resulting in 870 NHL cases and 2569 controls (De Roos et al., 2003; Lee et al., 2004a). There were 123 cases and 314 controls who reported using 2,4-D. No significant increases for using 2,4-D and NHL were reported in analyses to address possible associations among asthmatics and adjusting for co-exposure to other pesticides (De Roos et al., 2003; Lee et al., 2004a). Another similarly sized study (679 cases and 510 controls) reported no significant increase in risk of NHL and 2,4-D in carpet dust (Hartge et al., 2005).
A large case-control study of 517 NHL cases and 1506 controls from six Canadian provinces observed a statistically significant odds ratio for 2,4-D use after adjusting for medical variables (OR = 1.32, 95% CI = 1.01-1.73) (McDuffie et al., 2001). However, multivariate analyses and stratification by days of use yielded no statistically significant results. In their analyses of the use of the insect repellent DEET (N,N-diethyl-m-toluamide), the authors reported no statistically significant increase for NHL and 2,4-D (McDuffie et al., 2005). The odds ratio for using gloves, 2,4-D and DEET was 1.77 (95% CI = 0.90-3.45). The authors discuss that DEET may increase permeability of rubber gloves. However, one would expect a similar or equal risk compared to no glove use, not higher. The analyses of these data by Hohenadel et al. (2011) reported an increased odds ratio among respondents who used both malathion and 2,4-D (OR = 2.06, 95% CI = 1.45-2.93). This is in contrast with the pooled study of De Roos et al. (2003) that reported no association for use of both pesticides.
The smallest case-control study published in the past decade was the United Farm Worker analysis of 60 cases of NHL. With the small sample size and only 15% of the subjects classified as exposed to 2,4-D, the results are imprecise. For example, the confidence interval for the NHL-extranodal association (OR = 9.73, 95% CI = 2.68-35.3) is very wide (confidence interval ratio = 13).
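The confidence interval ratio used above as a measure of imprecision is simply the upper confidence limit divided by the lower one. The sketch below reproduces that arithmetic for intervals quoted in this section; the function name is hypothetical and the comparison is illustrative only.

```python
def confidence_limit_ratio(ci_lower: float, ci_upper: float) -> float:
    """Upper confidence limit divided by the lower limit.

    Larger values indicate a wider, less precise interval.
    """
    if ci_lower <= 0 or ci_upper < ci_lower:
        raise ValueError("expected 0 < ci_lower <= ci_upper")
    return ci_upper / ci_lower


# 95% confidence intervals quoted in this section (lower, upper).
intervals = {
    "UFW, NHL-extranodal (OR = 9.73)": (2.68, 35.3),
    "Miligi et al., no protective equipment (OR = 4.4)": (1.1, 29.1),
    "McDuffie et al. (2001) (OR = 1.32)": (1.01, 1.73),
}
for label, (lo, hi) in intervals.items():
    print(f"{label}: ratio = {confidence_limit_ratio(lo, hi):.1f}")
```

Because the ratio does not depend on the point estimate, it allows the precision of small and large studies to be compared on the same scale.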
Two occupational cohort studies of workers in 2,4-D and phenoxy herbicide manufacturing have been updated since the 2001 review. Neither study reported a statistically significant increase in NHL (Boers et al., 2010; Burns et al., 2011). The cohort of foresters was too small for a meaningful evaluation of 3 NHL cases (Thörn et al., 2000).
Other lymphohematopoietic cancers such as leukemia, Hodgkin lymphoma and multiple myeloma were also investigated and also were not found to be associated with 2,4-D exposure (Pahwa et al., 2006; Boers et al., 2010; Burns et al., 2011).
The largest, most robust studies found no dose response and no statistically significant association between use of 2,4-D and NHL. The few statistically significant results reported were internally inconsistent with other dose and multivariate analyses in the same study, were imprecise and were not confirmed in studies of other populations. These studies provide inconsistent evidence of increased risk of NHL or other cancers of the lymphatic system.

Other cancers
The inconsistencies in the epidemiology data are often attributed to poor quality and inadequate sample size. The Agricultural Health Study (AHS) was designed to address these inadequacies by enrolling approximately 90,000 farmer applicators and their spouses. There are more than 100 publications from the AHS (http://aghealth.nci.nih.gov/). Although the AHS has not reported on 2,4-D applicators as a group specifically, nested case-control studies of prostate cancer, breast cancer, colorectal cancer, melanoma of the skin and childhood cancer have evaluated 2,4-D. None of these studies reported an association with 2,4-D. Each is detailed below by cancer site. In addition to the AHS publications, other studies that report these cancer sites and 2,4-D are also discussed (Table 4). The prostate cancer case-control study nested in the AHS reported no significant increase in use of 2,4-D by cases (Alavanja et al., 2003). The much smaller Dutch manufacturing cohort also reported no significant increase (Boers et al., 2010) and the other manufacturing cohort reported a statistically significant deficit of prostate cancer among 2,4-D workers (Burns et al., 2011). Exploratory analyses in a recent case-control study were strong and statistically significant (OR = 2.72, 95% CI = 1.12-6.57) but with only 12 exposed cases, dose-response analyses were not examined (Band et al., 2011). Together, these studies provide inconsistent evidence of increased risk of prostate cancer and 2,4-D exposure.
No significant increase of breast cancer was reported by the Engel et al. (2005) analysis of the AHS women who reported using 2,4-D. This is in contrast to a UFW case-control study that reported a statistically significant odds ratio of 2.14 (95% CI = 1.06-4.32) for cases diagnosed after 1994. However, the odds ratio was inexplicably less than expected for early diagnosed (before 1994) cases (OR = 0.6, 95% CI = 0.23-1.69). There was no dose response since the ORs were similar for low and high use. Exposure misclassification may exist in this study since exposures were assumed for all farm workers and the determination of 2,4-D exposure was not timed by application or entry into the field. These studies of breast cancer in women are inconsistent.
Cancers of the colon and rectum were evaluated in the AHS by Lee et al. (2007). No significant increase was reported for rectal cancer and a statistically significant inverse dose response by lifetime exposure days (p = 0.011) was observed for colon cancer and 2,4-D. The manufacturing cohorts of Boers et al. (2010) and Burns et al. (2011) reported no significant increases. These studies provide no evidence of increased risk of colorectal cancer. Dennis et al. (2010) reported no significant increase in risk of melanoma of the skin associated with 2,4-D exposure. This is consistent with both the Dutch and US 2,4-D manufacturing cohorts (Boers et al., 2010; Burns et al., 2011). These studies provide no evidence of increased risk of melanoma.
A study of children of participants of the AHS who ever applied 2,4-D was reported by Flower et al. (2004). No significant increase in cancer was observed.
The association of 2,4-D exposure with two other cancer sites has been addressed by other non-AHS studies. A statistically significant association was reported by Lee et al. among glioma proxy respondents (OR = 3.3, 95% CI = 1.5-7.2) but not among self respondents (OR = 0.6, 95% CI = 0.2-1.6). There may be appreciable recall bias (Lee et al., 2005). No significant increase for 2,4-D and glioma was reported by Carreón et al. (2005). The risk of brain cancer was also not increased in the 2,4-D occupational Pahwa et al. (2006) and Pahwa et al. (2011) (Burns et al., 2011). These studies provide inconsistent evidence of increased risk of brain cancer. No significant association between stomach and oesophageal cancers and 2,4-D exposure was reported in a case-control study (Lee et al., 2004b) and the results were inconsistent for stomach cancer in the UFW case-control study by Mills et al. (2007). Neither the Dutch nor the US occupational cohort study reported significant increased risk of stomach cancer (Boers et al., 2010; Burns et al., 2011). These studies provide inconsistent evidence of increased risk of stomach cancer.

Reproductive toxicity
Using a cross sectional design, Swan et al. (2003) evaluated urinary 2,4-D levels, semen quality, sperm concentration, morphology, and motility in a multi-center study of partners of pregnant women (Table 5). They found no significant associations of semen quality, concentration, morphology or motility with 2,4-D levels. An exposure study of 2,4-D farmers did identify 2,4-D in the sperm of study participants. These findings are difficult to interpret with respect to farmer practices because detectable levels were found in over 50% of the subjects with no reported use of 2,4-D and nearly half of those who did report using 2,4-D were below detection (summarized in Table 5).
To our knowledge, the only other study to evaluate semen quality and 2,4-D exposure is a small study of 32 farmers and 25 controls (Lerda & Rizzi, 1991). Using urine and sperm samples collected during a growing season, the authors reported differences in sperm quality at one stage but not another among 2,4-D exposed farmers compared to controls. The validity and generalizability of this study, however, are questionable. The study conduct was poorly described, making evaluation of the study quality impossible. No information was provided about timing of either the urine or semen collection with respect to 2,4-D application. Lastly, the mean urinary levels of the exposure group at 9.0 mg/L (9000 ppb) are 200 to 500 times higher than the mean levels measured in other farmer studies (Alexander et al., 2007; Thomas et al., 2010). It is unclear if this is due to unique practices or incorrect analytical methods. Regardless, from a dose interpretation, these urinary levels are unanticipated for current farmers. These few publications provide no convincing evidence for sperm cell toxicity in humans from 2,4-D exposure.
Only one study evaluated rates of spontaneous abortion following use of 2,4-D during the pregnancy period (Arbuckle et al., 2001). The overall results showed no association with 2,4-D use in the preconception period (OR = 1.2, 95% CI = 0.8-1.6) or the postconception period (OR = 1.0, 95% CI = 0.7-1.6). A single analysis comparing pre- and post-conception exposure at less than 12 weeks gestation was statistically significant (OR = 2.9, 95% CI = 1.1-8.0). The results were sensitive to the use of 12 or 13 weeks as the cutpoint for exposure period, suggesting a random finding.

Three studies evaluated birth defects but none identified a significant increase with 2,4-D exposure (Schreinemachers, 2003; Weselak et al., 2008; Waller et al., 2010). A publication on respiratory endpoints in children whose parents used 2,4-D during the pregnancy period reported a significantly increased prevalence of hay fever and/or allergies (OR = 1.66, 95% CI = 1.11-2.49). However, the study did not evaluate and cannot exclude exposure to allergens after birth (Weselak et al., 2007). These studies provide no evidence of increased risk of birth defects due to 2,4-D exposure.

Genotoxicity, including hormones and immune system
A few investigators have reported on cytogenetic damage among agricultural workers, but none was specific to 2,4-D exposure (Garaj-Vrhovac & Zeljezic, 2000, 2001; Gómez-Arroyo et al., 2000; Martínez-Valenzuela et al., 2009; Remor et al., 2009). A small study of farmer applicators reported no association with micronuclei scores, lymphocyte phenotypes and blood counts (Figgs et al., 2000). Some estimates of replicative index were significantly associated with urinary 2,4-D levels but others were not. These findings do not support the hypothesis of changes in immunological variables after a 2,4-D application (Faustini et al., 1996). Another small study identified an association with luteinizing hormone (LH) levels and urinary 2,4-D levels, but found no additional associations with hormone levels or chromosomal aberrations (Garry et al., 2001). Neither study finding has been confirmed by other independent investigations. No significant increase in antinuclear antibodies positivity among 2,4-D users was reported in the cross sectional study by Semchuk et al. (2007) (summarized in Table 6).

Neurotoxicity
Parkinsonism, Parkinson's Disease and use of 2,4-D have been evaluated in four independent studies and five publications (Kamel et al., 2007; Dhillon et al., 2008; Hancock et al., 2008; Tanner et al., 2009; Tanner et al., 2011). A multicenter case-control study (519 cases and 511 controls) by Tanner et al. (2009) reported a strong and statistically significant increased risk of Parkinsonism among 2,4-D users (OR = 2.59, 95% CI = 1.03-6.48). However, further analyses by job and exposure duration did not support a dose response. No significantly increased risk of Parkinson's Disease with 2,4-D use was identified in the other studies (summarized in Table 7).

General toxicity
Various respiratory endpoints have been addressed in certain groups within the AHS. These endpoints are all self-reported and include wheeze, farmer's lung, chronic bronchitis, asthma, and rhinitis. The only associations with 2,4-D use were atopic asthma among the AHS women and rhinitis (runny nose) among the commercial applicators. In the absence of a dose response and in the context of many co-exposures, the implication of these associations for 2,4-D per se is unclear. No other publication has confirmed these associations (summarized in Table 8).
Another publication used the national data from the CDC to compare urinary 2,4-D levels in the US population with a battery of tests relevant to risk factors for heart disease. Although all the endpoints were within the normal range and only 14% of the subjects had 2,4-D levels above the LOD, some endpoints were statistically significantly associated with detectable urinary 2,4-D. Because this was a cross-sectional analysis of data collected at a single point in time, with endpoints within the normal range, the etiologic implications are unclear. No other published study to date has tested these findings.

Summary of epidemiology
The 2,4-D epidemiology literature after 2001 related to carcinogenic, reproductive, genotoxic, neurotoxic and general outcomes in humans is overwhelmingly negative. For cancer, the data are broad and consistent and give no credible or plausible evidence for any increased risk of cancer in humans associated with 2,4-D exposure. The most frequently reported cancer endpoint was NHL. The overall results for 7 studies reported prior to 2001 (Woods & Polissar, 1989; Pearce, 1989; Smith & Christophers, 1992; Hardell et al., 1994; Kogevinas et al., 1997; Zahm, 1997; Hardell & Eriksson, 1999) and the 9 new studies reviewed since 2001 are shown in Figure 1. Unfortunately, it was not possible to compare NHL results across studies by dose estimates. Several studies used duration as a proxy for increasing exposure, but the maximum levels ranged from 7 days (McDuffie et al., 2001) through 29 days (Eriksson et al., 2008) and 5 years (Burns et al., 2011) to 17 years (Chiu et al., 2006). Miligi et al. (2006) considered no use of protective equipment (i.e. gloves) to be the highest level, whereas McDuffie et al. (2005) categorized glove use with 2,4-D and DEET as the highest level. Overall, there are a few statistically significant positive observations, but the data are not consistent across studies, particularly in the past 10 years.
Statistical power to detect an association is influenced by the sample size of the study, the number of exposed participants, the magnitude of their exposure and the strength of the association. A lack of statistical significance can be played down on the grounds that a study had too few participants. Conversely, small studies with statistically significant and strong results may emphasize the role of statistical testing. A useful way to evaluate multiple results within and across studies is to compare the confidence interval ratio (CIR) (Poole, 2001). For example, as shown in Table 9, the female participants in the Italian study by Miligi et al. (2003, 2006) might be assumed to have a higher risk of NHL than the men based upon the non-significant odds ratios (OR = 1.5 vs. OR = 0.7). Further, the risk when not using protective equipment (OR = 4.4) attained statistical significance and was highlighted in the publication abstract. However, the estimates were imprecise, as shown by their wide confidence intervals and large confidence interval ratios. The risk estimates may be subject to chance variation. The non-significant estimate of 0.9 for the overall group is the most precise, and should be considered the most trustworthy with the least random error. Thus, precision is an important consideration along with replication and statistical significance when reviewing epidemiology study results. In discussing specific health results, it is also important to take into account concepts related to reducing both random error and exposure misclassification. Investigators endeavor to maximize both sample size and exposure potential in selected populations. However, it is difficult to do one without sacrificing the other. The very large AHS enrolled approximately 90 000 farmer applicators and their spouses but relied upon questionnaire-based exposure determination.
Urinary biomonitoring of a few AHS participants confirmed that internal doses of 2,4-D were well below exposure guidance values and that exposure was not uniform across applicators (Aylward et al., 2010; Thomas et al., 2010). Conversely, the occupational cohort studies of manufacturing workers of phenoxy herbicides have potentially higher and more frequent exposure but the sample sizes tend to be small. Much has been written about the problems and direction of exposure misclassification. In short, if unexposed participants are categorized as exposed, the risk estimates can be higher or lower (Thomas, 1995; Jurek et al., 2008). Keeping these strengths and weaknesses in mind, we looked for replication from one study to another with the view that no single epidemiology study can confirm or refute a putative association.
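As an illustration of the precision comparison discussed above, the confidence interval ratio can be computed as the upper confidence limit divided by the lower confidence limit, so that a larger ratio indicates a less precise estimate. The following is a minimal sketch in Python, not part of the original analysis. It uses two odds ratios quoted earlier in this review, and the function name confidence_interval_ratio is a hypothetical helper introduced only for this example.

```python
def confidence_interval_ratio(lower, upper):
    """Return the CIR: upper confidence limit divided by lower confidence limit.

    Hypothetical helper for illustration; a larger value means a wider
    (less precise) interval on the ratio scale.
    """
    if lower <= 0 or upper < lower:
        raise ValueError("limits must satisfy 0 < lower <= upper")
    return upper / lower


# (odds ratio, lower 95% limit, upper 95% limit) as quoted in the text of this review
estimates = {
    "Parkinsonism, Tanner et al. (2009)": (2.59, 1.03, 6.48),
    "Hay fever/allergies, Weselak et al. (2007)": (1.66, 1.11, 2.49),
}

for label, (odds_ratio, lower, upper) in estimates.items():
    cir = confidence_interval_ratio(lower, upper)
    print(f"{label}: OR = {odds_ratio}, CIR = {cir:.2f}")
```

Under these inputs the Parkinsonism estimate has a CIR of about 6.3 and the hay fever estimate a CIR of about 2.2. Both are statistically significant, but the first is considerably less precise, which is the kind of distinction the CIR is meant to make visible alongside replication and significance testing.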

Conclusion
Our interpretation of the epidemiology literature is that there is no convincing evidence for any chronic adverse effect of 2,4-D in humans. However, the epidemiology data are only a small portion of the information available.
There is an abundance of information from the biomonitoring literature. These data inform us that persons with direct contact with 2,4-D have the highest exposure and that most non-agricultural populations have little to no measurable exposure. Guided by biomonitoring we can better control exposure and target populations that are the most exposed for future health research.

Declaration of interest
The authors' affiliation is as shown on the cover page.