Patterns of Pediatric Cancers in Florida: 2000–2015

This study identifies pediatric cancer clusters in Florida for the years 2000–2015. Unlike previous publications on pediatric cancers in Florida, it draws upon an Environmental Protection Agency dataset on carcinogenic air pollution, the National Air Toxics Assessment, as well as more customary demographic variables (age, sex, race). The focus is upon the three most widely seen pediatric cancer types in the USA: brain tumors, leukemia, and lymphomas. The covariates are used in a Poisson regression to predict cancer incidence. The adjusted cluster analysis quantifies the role of each covariate. Using Florida Association of Pediatric Tumor Programs data for 2000–2015, we find statistically significant pediatric cancer clusters, but we cannot associate air pollution with the cancer incidence. Supplementary materials for this article are available online.

Article history: Received October 2017; accepted January 2019.


Introduction
Amin et al. (2010) studied the clustering of pediatric cancer incidence in Florida for 2000-2007. The paper used data from the Florida Association of Pediatric Tumor Programs (FAPTP) and focused on the three major subtypes of pediatric cancers: brain tumors, leukemia, and lymphoma. The study initially adjusted for age and sex, but not race, because of the likelihood that race and environmental pollution are confounded with each other (see American Academy of Pediatrics, Committee on Pediatric Research 2000). If so, then adjusting for race could remove some of the possible effect that pollution has on cancer rates. It is more likely that race is associated with cancer mortality, which is known to be associated with environmental pollution, but this may not be the case for cancer incidence. To better understand the role of race in pediatric cancer incidence, this study adjusts for race in addition to age and sex, using data covering the period 2000-2015. To study the possible relationship between pollution and cancer incidence directly, this paper uses the Environmental Protection Agency (EPA)'s National Air Toxics Assessment (NATA) dataset on carcinogenic air pollution. The EPA provides good documentation of the data sources along with the types of carcinogenic pollutants that went into the NATA score to assess cancer risk. The latest NATA released by the EPA as of 2017 was the 2011 assessment, which gives a relatively short interval between exposure and disease expression for the cancers considered. An option would be to use the NATA 2005 results instead. It is assumed that carcinogenic releases by major chemical companies into the air do not change abruptly from year to year. We perform a purely spatial analysis for the data covering the years 2000-2015. In 2014, the journal Statistics and Public Policy had five teams of statisticians and epidemiologists independently analyze the FAPTP data for the years 2000-2010.
The analysts used different models and made different assumptions, and the five papers were published in a special issue of the journal: Amin et al. (2014), Lawson and Rotejanaprasert (2018), Wang and Rodriguez (2014), Heaton (2014), and Zhang, Lim, and Maiti (2014). Waller (2015) provided a detailed overview of each of the five articles on the pediatric cancers in Florida, along with a very useful comparison of the characteristics of each of the approaches used. In particular, he discussed four specific questions that each of the five research articles addressed slightly differently, with the common goal to identify clusters if they existed (Waller 2015). The questions were: (i) Which general question do we want to answer? (ii) Which type of data are available to address that question? (iii) Which specific questions does each methodology answer? (iv) What do the answered questions reveal about the motivating primary question? No method was declared as being inferior or superior. The methods used differed based on what the goals were in each of the five research articles that analyzed the dataset. Our main goal here is to test for any outlying values in cancer rates, and to make our current results more comparable with our past two articles on this topic. We strive for comparability without compromising the improvements that this paper makes over the earlier analyses. For these reasons we opted to use SaTScan, as was done in our past two papers on pediatric cancers in Florida (Amin et al. 2010, 2014). Using the same surveillance software package will reduce any differences in how clusters are identified, and this should allow for a clearer comparison with our past papers. This paper extends the work of Amin et al. (2014) in three ways: (i) it covers a longer span of time (2000-2015), (ii) it brings in the EPA's NATA carcinogenic data, and (iii) it now also adjusts for race.

Data
The FAPTP data consist of cancer incidence counts for children aged 0-19 years for the years 2000-2015. FAPTP is not the only source of data on pediatric cancers in Florida, but it is one part of a system by which the Florida Cancer Data System (FCDS) collects such data. FAPTP is a reliable source for pediatric cancer data in Florida (Roush et al. 1993). Amin et al. (2010) provide a useful discussion of the FAPTP data. The data on leukemia, lymphoma, and brain/CNS cancers include information on year of birth, age, and residence at the time of cancer diagnosis, sex, race, and the FAPTP diagnosis code.
The population data were obtained from the 2000 and 2010 U.S. Censuses. Linear interpolation provided estimates for the years 2001-2009. For the years 2011-2015, we used estimates from the Census Bureau's American Community Survey (ACS). Since the ACS population data are based on sampling, each population estimate has an associated confidence interval (Spielman and Folch 2015; Spielman and Singleton 2015). We did not use these confidence intervals to modify the analyses for the years 2011-2015, though there are arguments for doing so. A reasonable additional check on the cluster analysis results would be to study the demographics of the most prominent cluster in more depth by obtaining the standard error of each population estimate and comparing it with the population estimates and accompanying standard errors for counties falling outside the cluster. A large sampling error in a small estimate can reduce the usefulness of the estimate, whereas a small sampling error in a large estimate suggests that the estimate is reliable. Such adjustments or tests are outside the scope of this article.
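The interpolation step for the intercensal years can be illustrated with a minimal Python sketch. The function name and the ZCTA population figures are hypothetical and are not taken from the study data.

```python
def interpolate_population(pop_2000, pop_2010, year):
    """Linearly interpolate a population count between the 2000 and 2010 censuses."""
    if not 2000 <= year <= 2010:
        raise ValueError("year must lie between 2000 and 2010")
    fraction = (year - 2000) / 10  # share of the decade that has elapsed
    return pop_2000 + fraction * (pop_2010 - pop_2000)


# Hypothetical ZCTA with 1,000 children in 2000 and 1,500 in 2010.
estimates = {year: interpolate_population(1000, 1500, year)
             for year in range(2000, 2011)}
```

The endpoints reproduce the two census counts, and each intermediate year receives an equal share of the decade's change.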
The EPA collects ambient and exposure concentrations to monitor chemical facilities in the USA, in addition to using street monitors to obtain data for the NATA tool. The 2011 NATA is a national-level risk assessment based on the emissions of air toxics that produces census-tract level estimates of ambient and exposure concentrations for 180 air toxics, plus diesel particulate matter (PM), which the EPA assessed for noncancer effects only. Using the concentration estimates for the 180 air toxics and diesel PM, NATA estimates cancer risk and noncancer hazard for 138 of these. For the other 42 air toxics, concentration estimates are available but health effect information is not. The EPA uses the Hazardous Air Pollutant Exposure Model (HAPEM) to estimate the final exposure concentrations in NATA by combining information on ambient concentrations of air toxics, population data from the U.S. Census Bureau, population activity data, and microenvironmental data (Office of Air Quality Planning and Standards 2002). The U.S. census data are used to build the simulation population. The EPA calculates a daily-averaged exposure and dose for each individual to obtain a distribution of exposure and dose for the population. The estimates are created for each state at both county and census-tract levels. The sources of the air toxic emissions in the United States are categorized as point, nonpoint, mobile on-road, nonroad, biogenic, and fires.
While the EPA does not provide data or information on specific air pollutants that have been linked to specific pediatric cancer types, the literature includes several articles in which the authors claim such associations. For example, Hernández and Menéndez (2016) and Reynolds et al. (1993) found associations linking pesticide exposure to childhood leukemia. Filippini et al. (2015) provided a review and meta-analysis of outdoor air pollution and risk of pediatric leukemia. Their findings support a link between ambient exposure to traffic pollution and pediatric leukemia risk. García-Pérez et al. (2015) identified excess risk of childhood leukemia for children living near industrial and urban sites that use organic solvents and near glass and mineral fiber industries.

Methods
We used the disease surveillance software SaTScan (Kulldorff 1993; Kulldorff and Information Management Services, Inc. 2009) to identify spatial cancer clusters (Kulldorff and Nagarwalla 2014). No space-time analysis was done as the focus here is on a purely spatial analysis. Our analysis reapplies the methodology in Amin et al. (2010, 2014), performing univariate cluster analyses for three cancer types (brain tumors, leukemia, lymphomas). The units analyzed for the cancer rates are ZIP Code Tabulation Areas (ZCTAs). The ZCTAs were created by the U.S. Census Bureau to correct for the instability over the years 2000-2015 of U.S. Postal Service ZIP codes, which are based on mail delivery routes (www.census.gov/geo/reference/zctas.html) rather than fixed locations. One challenge is that ZIP codes match ZCTA numbers for most, but not all, addresses. The ZCTAs incorporate full census blocks, but some ZIP codes may change within blocks. Each census block is assigned the ZIP code that is associated with the majority of addresses within the block. The census blocks are aggregated to create ZCTAs, and the cancer incidence counts in each ZCTA are used for spatial analysis. We assumed that pediatric cancer counts in each ZCTA follow a Poisson distribution with parameter depending on the location. This method tests the null hypothesis that cancer rates are constant for all ZCTAs in Florida.
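To make the null hypothesis concrete, the following Python sketch computes the number of cases expected in each ZCTA if every stratum (for example, an age-sex group) had the same rate statewide. It assumes covariate adjustment by indirect standardization, which is our reading of how categorical covariates are typically handled in this setting. The function name, the two ZCTAs, and all counts are hypothetical.

```python
def expected_counts(cases, population):
    """Expected cases per ZCTA under the null of constant stratum-specific rates.

    cases[z][s] and population[z][s] hold the case count and population
    of stratum s (e.g., an age-sex group) in ZCTA z.
    """
    strata = {s for z in population for s in population[z]}
    rates = {}
    for s in strata:
        total_cases = sum(cases[z].get(s, 0) for z in cases)
        total_pop = sum(population[z].get(s, 0) for z in population)
        rates[s] = total_cases / total_pop if total_pop else 0.0
    # Apply the statewide stratum rates to each ZCTA's own population mix.
    return {z: sum(population[z][s] * rates[s] for s in population[z])
            for z in population}


# Hypothetical data: two ZCTAs, two age strata.
cases = {"A": {"0-9": 3, "10-19": 1}, "B": {"0-9": 1, "10-19": 1}}
population = {"A": {"0-9": 1000, "10-19": 1000},
              "B": {"0-9": 3000, "10-19": 1000}}
expected = expected_counts(cases, population)
```

By construction the expected counts sum to the total number of observed cases, so the comparison of observed and expected counts in any window is conditional on the statewide total.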
The Centers for Disease Control defines a cancer cluster as a statistically significant excess over the expected number of cancer cases among people in a geographic area during a period of time (www.cdc.gov/nceh/clusters/default.htm). SaTScan searches for clusters by imposing a moving window on a map, including different sets of neighboring ZCTAs whose centroids lie within the window. If the window includes the centroid of a specific ZCTA, then this geographical unit is included in the window. For each window, the spatial scan statistic tests the null hypothesis of equal risk of cancer incidence for all ZCTAs against the alternative that there is an elevated risk of cancer for ZCTAs within the scan window. The likelihood function for the Poisson model is proportional to
$$\left(\frac{n}{E}\right)^{n}\left(\frac{N-n}{N-E}\right)^{N-n} I(n > E),$$
where n is the number of cancer incidences within the scan window, N is the total number of cancer incidences in the population, E is the expected number of cancer incidences within the window under the null hypothesis, and I(n > E) is an indicator equal to 1 when the window contains more cases than expected and 0 otherwise. SaTScan performs a Monte Carlo approximation of a one-tailed permutation test. It can be shown that for fixed N and E, the likelihood increases as the number of incidences (n) increases in the scan window. This means that the likelihood ratio increases as one adds more incident cases to a potential cluster with a fixed population size and expected number of cases. The literature is rich with modern cluster analysis algorithms, such as EigenSpot (Fanaee-T and Gama 2015), while more established statistics, such as Moran's I, the Getis-Ord Gi*, or Geary's C, are also available to researchers. SaTScan can detect outbreak locations on the map and also detect and test for space-time interactions if they exist. Moran's I or Geary's C can estimate the strength of clustering or dependence, which could be monitored over time to check for any changes.
SaTScan allows for circular or elliptical windows, and there is little practical difference between the two types of analysis. The National Cancer Institute seems to favor elliptical clusters, while Martin Kulldorff, the creator of SaTScan, seems to favor circular clusters. The other methods that Waller (2015) discussed do not assume windows of particular shapes. Also, FleXScan allows for irregularly shaped clusters (Tango and Takahashi 2005) and is based on the strategy for the detection of arbitrarily shaped clusters (Duczmal and Assunção 2004).
EigenSpot (Fanaee-T and Gama 2015) is a new cluster methodology by which the space-time correlation structure is tracked without making assumptions about the data distribution or hotspot shape. It can handle only a single hotspot for each cluster analysis. There are a variety of cluster analysis algorithms available to the researcher, and our choice to use SaTScan does not imply that other algorithms are inferior.
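The scan statistic and its Monte Carlo test can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not SaTScan's implementation: the candidate windows are supplied as explicit lists of unit indices rather than generated from centroids, the null replicates are drawn by allocating the N cases to units in proportion to their expected counts, and all counts are hypothetical.

```python
import math
import random


def log_likelihood_ratio(n, E, N):
    """Poisson log likelihood ratio for a window with n observed cases,
    E expected cases, and N total cases (requires 0 < E < N)."""
    if n <= E:
        return 0.0  # only an excess inside the window counts as a cluster
    inside = n * math.log(n / E)
    outside = (N - n) * math.log((N - n) / (N - E)) if n < N else 0.0
    return inside + outside


def max_llr(counts, expected, windows):
    """Largest log likelihood ratio over all candidate windows."""
    N = sum(counts)
    return max(
        log_likelihood_ratio(sum(counts[i] for i in w),
                             sum(expected[i] for i in w), N)
        for w in windows
    )


def monte_carlo_p_value(counts, expected, windows, n_sims=999, seed=0):
    """Rank the observed maximum among maxima from null replicates."""
    rng = random.Random(seed)
    N = sum(counts)
    units = range(len(counts))
    observed = max_llr(counts, expected, windows)
    exceed = 0
    for _ in range(n_sims):
        sim = [0] * len(counts)
        # Under the null, each case falls in unit i with probability E_i / N.
        for i in rng.choices(units, weights=expected, k=N):
            sim[i] += 1
        if max_llr(sim, expected, windows) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_sims + 1)


# Hypothetical example: four units on a line, windows of one or two neighbors.
counts = [2, 3, 12, 3]
expected = [5.0, 5.0, 5.0, 5.0]
windows = [[0], [1], [2], [3], [0, 1], [1, 2], [2, 3]]
observed, p_value = monte_carlo_p_value(counts, expected, windows)
```

Taking the maximum over windows in every replicate is what accounts for the multiple windows examined, so a single p-value is reported for the most likely cluster.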

Results
We performed a spatial cluster analysis for five cases:
• counts adjusted for age and sex
• counts adjusted for age, sex, and race
• counts adjusted for age, sex, and air pollution
• counts adjusted for age, sex, race, and air pollution
• counts adjusted for age, with data for boys and girls analyzed separately
To clarify comparisons between the first two cases, each of the three cancer types is analyzed first by using Poisson regression to obtain residuals representing counts that are corrected for contributions from age and sex, followed immediately by a cluster map that uses cancer counts which were similarly corrected with Poisson regression for age, sex, and race. To show any clusters clearly on each cluster map, the counties falling into a significant cluster are shown in color, while the rest of the counties are not colored based on cancer rates. Adjusted cancer rates for each county in Florida are not shown because adjusted rates could not be extracted from SaTScan. In the Appendix, heat maps showing the raw cancer rates for all counties are provided. Only clusters with p < 0.05 are considered significant. Clusters with p slightly greater than 0.05 may still be shown on the cluster maps, since such clusters may warrant some attention if the situation there worsens in the future. The "Most Likely Cluster" is the cluster that is least likely to be due to chance. Figure 1 shows the annual unadjusted brain tumor rates for Florida and for the "Most Likely Cluster" (based on what is shown in Figure 2). The graph shows that brain tumor rates in the cluster stayed above the Florida rates in almost every year for 2000-2015, indicating that this spatial cluster is persistent. Brain tumor rates, adjusted for age and sex, yield only one significant cluster (p = 0.00037; RR = 1.60), shown in Figure 2. The area between Jacksonville and Orlando has 60% higher brain tumor rates than the rest of Florida.
We also show a nearly significant possible cluster containing Miami, with p = 0.0543 and RR = 1.30. When we adjust for age, sex, and race, the first cluster persists but shifts to the north, as seen in Figure 3. The secondary cluster vanishes, implying that race explains the elevated brain tumor rates near Miami. The race distribution in the population plays a role in the brain tumor clustering. Figure 4 shows the annual unadjusted leukemia rates for Florida and for the "Most Likely Cluster" (based on what is shown in Figure 5). The graph shows that leukemia rates in the cluster stayed above the Florida rates each year for 2000-2015 except in 2003, indicating that this spatial cluster is persistent. For leukemia rates, the analysis that adjusts for age and sex finds a small but significant cluster of seven ZCTAs northwest of Tampa near Clearwater (p = 0.0075; RR = 2.35), shown in Figure 5. There is a possible secondary cluster near Ocala (p = 0.0547; RR = 2.80), also shown in Figure 5, but its p-value exceeds 0.05. Without using race as a covariate, the cluster analysis could be associated with social justice issues regarding the siting of unhealthy industries and occupations. After adjusting leukemia rates for age, sex, and race, a large cluster emerges in south Florida (p = 0.0168; RR = 1.34) based on the race-adjusted risk. Several secondary clusters are also shown in Figure 6. The large cluster only appeared after adjusting for race, so it is associated with race. The most significant secondary cluster (p = 0.0310; RR = 2.90) is located near Ocala, in the same ZCTAs as the brain tumor cluster. The third most significant cluster is close to Clearwater (p = 0.0536; RR = 2.12), in the same location as the brain tumor cluster. The last possible cluster is in Jacksonville (p = 0.0743; RR = 1.98).
While the significance level has been set to 0.05 in this study, it is useful to also report clusters with p-values smaller than 0.10 as clusters that need to be monitored over time for a possible disease outbreak. Figure 7 shows the annual unadjusted lymphoma rates for Florida and for the "Most Likely Cluster" (based on what is shown in Figure 8). The graph shows that lymphoma rates in the cluster stayed above the Florida rates in almost every year for 2000-2015, indicating that this spatial cluster is persistent. In Figure 8, the lymphoma analysis (adjusting for age and sex) finds a cluster of eight ZCTAs north of Miami (p = 0.0091; RR = 2.39). Its lymphoma rate is 239% that of the rest of Florida. No secondary clusters are identified. After also adjusting for race, the same ZCTAs remained a cluster (p = 0.0040; RR = 2.54), as seen in Figure 9. Figure 10 shows the total cancer risk for each ZCTA in Florida. The darker the coloring of a ZCTA, the higher the cancer risk. To explore a possible association between cancer and air pollution, we used the software Stata to carry out Poisson regression with covariates age, sex, race, and air pollution (using the NATA data). This methodology is widely used in the literature for adjusting cancer incidence or mortality for covariates. The NATA data are at the census tract level while data on age, sex, and race are measured at the ZCTA level, so ArcGIS was used to aggregate all data to the ZCTA level for the Poisson regression. We first used Poisson regression to predict counts for each of the three cancer types using age, sex, and NATA. Using the random-effects model, the Poisson regression adjusted for spatial correlation. Next, we repeated this, but also included race in the model. This gives additional insight into the extent to which race is confounded with pollution exposure.
Specifically, the regression model for the first case, under a maximum likelihood random-effects specification (see, e.g., Allison 2009; Stata 2018), was as follows:
$$\log E(y_{it} \mid x_{it}, v_i) = x_{it}\beta + v_i,$$
for $i = 1, 2, \ldots, n$ and $t = 1, 2$, where $y_{it}$ is the Poisson-distributed cancer count, with predictor variables $x_{it}$ (age, sex, air pollution) and parameter vector $\beta$. In the standard random-effects model, $v_i$ is assumed to be iid such that $\exp(v_i)$ is gamma distributed with mean one and variance $\alpha$ (estimated from the data). If normal is specified, $v_i$ is assumed to be iid $N(0, \sigma^2)$. For more details on the differences between models using Stata, see chapter 4 in Allison (2009). In the second case, we just added the race covariate to obtain cancer counts (and rates) adjusted for age, sex, race, and air pollution.
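As a rough illustration of how a covariate such as the NATA score enters a Poisson regression, the Python sketch below fits a deliberately simplified model by Newton's method. It is not the Stata random-effects specification described above: it omits the random effect, uses a single covariate, and assumes a population offset, so that log E[y_i] = log(pop_i) + b0 + b1 * x_i. The function name and all data values are hypothetical.

```python
import math


def fit_poisson(counts, x, population, n_iter=50):
    """Fit log E[y_i] = log(pop_i) + b0 + b1 * x_i by Newton's method.

    Returns the estimates b0 and b1 and the standard error of b1.
    """
    b0 = math.log(sum(counts) / sum(population))  # start at the overall rate
    b1 = 0.0
    for _ in range(n_iter):
        mu = [p * math.exp(b0 + b1 * xi) for p, xi in zip(population, x)]
        # Score vector: gradient of the Poisson log likelihood.
        g0 = sum(y - m for y, m in zip(counts, mu))
        g1 = sum((y - m) * xi for y, m, xi in zip(counts, mu, x))
        # Entries of the 2 x 2 information matrix.
        h00 = sum(mu)
        h01 = sum(m * xi for m, xi in zip(mu, x))
        h11 = sum(m * xi * xi for m, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        # Newton step: inverse information matrix times the score.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < 1e-10:
            break
    se_b1 = math.sqrt(h00 / det)  # from the inverse information matrix
    return b0, b1, se_b1


# Hypothetical ZCTA-level data: case counts, pollution scores, child population.
counts = [2, 5, 3, 6, 4]
pollution = [0.5, 1.0, 1.5, 2.0, 2.5]
population = [2000, 4000, 2500, 4500, 3500]
b0, b1, se_b1 = fit_poisson(counts, pollution, population)
z_stat = b1 / se_b1  # Wald statistic for the pollution coefficient
```

A Wald statistic close to zero corresponds to a non-significant covariate, which is the kind of evidence the p-values reported for NATA summarize.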
Many epidemiologic studies have shown an association between air pollution levels and mortality in the USA (Gwynn and Thurston 2001). In our study, for brain tumors the NATA covariate was not significant (p = 0.980) in the model with age, sex, and NATA. When race was added to this model, neither NATA nor race was significant. For leukemia, in the model with age, sex, and NATA, no variable was significant at the 0.05 level; when race was added, race was significant (p < 0.001) but NATA was not (p = 0.142). Last, for lymphomas, NATA was not significant in the first model (p = 0.384). When race was added, neither NATA nor race was significant. Based on these results, carcinogenic air pollution cannot be linked to any of these three cancer types (brain tumors, leukemia, lymphomas) in our study.
Finally, we used SaTScan to study boys and girls separately. The literature shows differences in childhood cancer incidence between males and females (Dorak and Karpuzoglu 2012). Dorak and Karpuzoglu conclude that some cancers are more common in females, but that overall, males have higher susceptibility. In this study, for brain tumors in girls, SaTScan found one significant ZCTA west of Fort Lauderdale with population 1370 (p = 0.0481; RR = 11.73). For boys, it identified a large cluster (p = 0.0024; RR = 1.67) of 101 ZCTAs close to Jacksonville. For girls, there was one significant leukemia cluster (p = 0.0089; RR = 1.32) in southern Florida. For boys, the only significant leukemia cluster (p = 0.0231; RR = 1.37) was in Fort Lauderdale. There were no significant lymphoma clusters for either boys or girls.

Discussion
The most important finding was a negative one, but it is valuable. There is no apparent relationship between the atmospheric carcinogens included in the NATA files and any of the three classes of pediatric cancers considered in this paper. It would be good to see if a similar result holds for waterborne carcinogens. In terms of public health, the analysis found a number of pediatric cancer clusters that appear to be significant and problematic. For brain tumors, the area between Jacksonville and Orlando has a small p-value and a large relative risk. For leukemia, the area near Clearwater stands out. For lymphomas, there are eight ZCTAs north of Miami that appear dangerous. We note that in some clusters, but not all, the inference depends upon whether one adjusts for race or not. It is noteworthy that for each of the three types of cancer the annual raw rates for the most likely clusters were much higher than the corresponding rates in Florida. Future research could look into the possibility of confounding between race and exposure to carcinogens, which can result from income and education disparities.
Also, we emphasize that a single study, or even a collection of studies, is not definitive. As Waller (2015) pointed out, causal epidemiological conclusions are difficult to draw, and apparent associations often collapse or become ambiguous under closer scrutiny. Nonetheless, the point of this kind of study is to flag locations that appear problematic. It directs future research resources to areas and topics that are more likely to be useful, and can help in prioritizing public health attention.