Quantifying oil palm expansion in Southeast Asia from 2000 to 2015: A data fusion approach

ABSTRACT The fusion of optical imagery with radar data can provide more accurate land cover change analysis of deforestation and tree-based agriculture. Radar data is limited temporally with most geographic areas not covered prior to 2007. This paper presents a new methodology to classify land cover change related to oil palm expansion that takes historic data limitations into account. Our approach utilizes Hansen’s Global Forest Cover data, optical imagery, and texture information, to extract land cover information in Sumatra and Western Malaysia, where historical data is absent. Our method demonstrates how to accurately classify oil palm without radar data with overall accuracies for optical only experiments within 4.4% of optical plus radar classifications. Our results show agricultural land use was the primary driver of land cover change with the largest increase due to oil palm expansion (6.1%). Better estimations of oil palm expansion could be used in sustainable land management policies.


Introduction
Global deforestation is primarily driven by the expansion of agriculture, which currently covers 38% of the Earth's land surface (Ramankutty et al., 2008). The highest rates of deforestation are observed in the Amazon Basin followed by Southeast Asia (Gutiérrez-Vélez et al., 2011;M. C. Hansen et al., 2013;De Alban et al., 2018). The expansion of tree-based plantations and other cash crops (e.g. coffee, cacao) are the largest drivers of deforestation in Southeast Asia (Hansen et al., 2009;Austin et al., 2019; M. C. Hansen et al., 2013). Indonesia, Malaysia, and Thailand are home to 80% of the world's oil palm plantations, which is driven by global demand for oil palm-derived products, such as renewable energy, food-based, and health/beauty products (Khatun et al., 2017;Mekhilef et al., 2011;Pirker et al., 2016;Wicke et al., 2011). Since 2000, high deforestation rates have occurred in Indonesia and Malaysia (M. C. Hansen et al., 2013), affecting biodiversity, local and regional climate, and land and water quality (Ahrends et al., 2015;Qaim et al., 2020).
A variety of remote sensing data and methods are used to classify tropical deforestation and monitor agriculture expansion. Methods include visual classification from aerial photographs (Skole & Tucker, 1993), supervised and unsupervised classification methods (Giri et al., 2011), vegetation indices (e.g. normalized difference vegetation index (NDVI), enhanced vegetation index (EVI;DeVries et al., 2015;Huete et al., 2002), statistical methods (e.g. regression analyses (Hansen et al., 2010), bagging and bootstrapping (Hansen et al., 2008), expert systems and multicriteria methods (Nangendo et al., 2007;Vaiphasa et al., 2006), deep-and sub-pixel analysis (DeFries et al., 2002), and more recently, machine learning and time series functions (Caughlin et al., 2021). These methods -and automated approaches -are most effective at classifying tropical forests in areas that include a combination of tropical forests, agriculture, urban areas, water, and bare soil. Classifying different types of tropical forests and/or tree-based agriculture can be challenging due to their spectral similarity, limiting the ability to differentiate between multiple forest types. This is particularly problematic when canopy penetrating sensors, such as radar data, are widely unavailable prior to 2007. Given the need to better monitor the extent of deforestation, methods to classify oil palm without radar are needed.
Quantifying the expansion of oil palm plantations is critical for understanding environmental and social impacts both globally and locally (Meijjard et al., 2020). Meijjard et al. (2020) estimate that oil palm plantations covered at least 19.5 Mha globally in 2019. The problem they point out is the significant variations in estimates of the extent of oil palm plantations, suggesting we need to better quantify the total area earlier this century as well as later. Better estimations of oil palm expansion could be used to develop sustainable land management policies that balance economic development with environmental and social implications of deforestation.
Data availability is the main challenge of measuring the expansion of oil palm plantations for dates prior to 2007. Radar data (e.g. SAR), which became widely available in 2007, can improve the classification of oil palm because it returns a signature specific to the physical structure of the plant (Dong et al., 2013;Reiche et al., 2015;Cheng et al., 2018;Joshi et al., 2016;Torbick et al., 2016;Chong et al., 2017;De Alban et al., 2018). Most oil palm plantations studies that utilized radar data have been at the local, sub-region scale (De Alban et al., 2018;Morel et al., 2012;Ramdani, 2019;Zeng et al., 2018) or single date classifications (Cheng et al., 2018;Koh et al., 2011;Miettinen et al., 2016;Sarzynski et al., 2020). From the 1970s to 2007, optical remotely sensed imagery (e.g. Landsat) has been the primary data source used to quantify the extent of deforestation (M. C. Hansen et al., 2013). The challenge with classifying oil palm using only Landsat is the spectral similarity between native forests and monoculture tree plantations. Very high-resolution imagery (e.g. <5 m) can provide detailed information needed to distinguish oil palm agriculture from forests, but the geographic coverage and earlier temporal coverage is extremely limited. This paper presents a new methodology to classify land cover change related to oil palm expansion and other tree-based agriculture that takes historic data limitations into account prior to radar availability. Our approach utilizes Hansen's Global Forest Cover data (available from 2000 to 2020), optical imagery (Landsat 7), and texture information, to extract land cover information in Sumatra and Western Malaysia, where historical radar and high-resolution imagery data are absent. We tested different classifiers for 2015 and selected the best classifier using datasets available only in 2000 to classify land cover (including oil palm) for 2000. We quantify land cover change from 2000 to 2015 and discuss the implications of our results. More accurate information on these long-term trends can be used to better assess the tradeoffs between economic gains and negative consequences of oil palm expansion. Furthermore, this approach provides insights into how land cover classification can be enhanced with limited data.

Study area
We calculated oil palm expansion for Sumatra and West Malaysia from 2000 to 2015 (Figure 1) where oil palm plantations have expanded here at a greater rate than any other tropical country due to suitable climate and economic need (Margono et al., 2014). These two locations lie in the archipelago of Southeast Asia between the Indian and North Pacific Oceans resulting in a climate that favors year-round warm temperatures and high levels of rainfall suitable for oil palm. The terrain consists mostly of coastal lowlands with the Bukit Barisan Mountain range, spanning north to south on Sumatra. The total land area studied covers 560,000 square kilometers and is comprised of urban areas, croplands including forest-based plantations (e.g. rubber, oil palm), and multiple types of tropical forests such as evergreen broadleaf, upper and lower mountain, peatland, sago and mangrove. Native tropical forests, which cover approximately half of Sumatra, continue to decrease rapidly due to agricultural expansion associated with large monoculture plantations (Miettinen et al., 2016) mostly in lowland areas. However, tree-based plantations are expanding into highland areas (Ahrends et al., 2015;Hurni et al., 2017;Zeng et al., 2018). The population of Indonesia and Malaysia has seen a 22% and 31% increase between 2000 and 2015; a pace which is expected to increase into 2050 (United Nations, Department of Economic and Social Affairs, Population Division, 2019).

Data
Three primary remote sensing data sources were used for land cover classification: optical (2000 and 2015), nighttime lights composites (2000 and 2015), and radar (2015 only). These data were accessed through Google Earth Engine (GEE) and were pre-processed for consistent geo-registration allowing imagery from many sources to be compared. GEE is a cloud-computing platform optimized for parallel processing of geospatial data with access to time-series satellite imagery and coding platforms (Gorelick et al., 2017). Additionally, two land cover datasets and forest cover change data were used to augment training and validation samples for land cover classification in 2000.

Optical imagery
Landsat 7 and 8 imagery were acquired for the years 2000 and 2015, respectively. Landsat imagery has been used extensively to assess land cover and land cover change because of the satellite's long data collection history (Goldblatt et al., 2018;Woodcock et al., 2008). In GEE, we obtained all available Landsat 7 imagery Tier 1 from 1 January 2000 to 31 December 2000 (n = 543) and Landsat 8 imagery Tier 1 from 1 January 2015 to 31 December 2015 (n = 787), which have a spatial resolution of 30 meters. Annual cloud free composites were created using GEE Landsat simple composite algorithm, which uses the median of the least cloudy pixels based on a simple cloud score and converts the pixels' digital numbers to top-of-atmosphere (TOA) reflectance (Gorelick et al., 2017).

Nighttime lights
Nighttime light information was used to help the classifier distinguish urban areas from bare soil (Goldblatt et al., 2018 (2000), a cloud-free annual composite was acquired at a spatial resolution of 30 arc seconds. Mean average radiance was calculated using the stable lights band. For VIIRS (2015), an annual composite was created from monthly composites. These data have a spatial resolution of 15 arc seconds and have been corrected for cloud cover and stray lights. Mean average radiance was calculated using the average radiance band.

Radar
For 2015, Synthetic Aperture Radar-2 (SAR) mosaic data were used to distinguish oil palm plantations from forest. SAR penetrates forest canopies and provides a signal based on the physical structure of the trees, improving the ability to differentiate a monoculture tree plantation from the complex structure of a forest (Cheng et al., 2018;De Alban et al., 2018). We acquired SAR data from Japan Aerospace Exploration Agency's (JAXA) Japan Earth Resources Satellite (JERS-1) Advanced Land Observing Satellite-2 Phased Array L-band (ALOS-2/PALSAR-2; Shimada & Osawa, 2012). ALOS-2/ PALSAR-2 data are provided in 1 × 1 degree mosaic tiles with a spatial resolution of 25 meters and include two dual-polarized bands (i.e. horizontal transmit-horizontal receive (HH) polarization and horizontal transmit-vertical receive (HV) polarization). The Refined Lee filter was applied to HH and HV bands to reduce signal noise in raw SAR imagery and preserve edge information between adjacent land cover types (J. S. Lee et al., 2009). Digital numbers (DN) were converted to sigmanaught values (σ 0 ) using the following equation: where a calibration factor (CF) of −83.0 dB was used for the PALSAR-2 mosaic (J. S. Lee et al., 2009).

Indices
For each Landsat composite, four vegetation indices, two urban indices, and one moisture index were calculated ( To improve the accuracy of land cover classification, texture indices were calculated for nearinfrared, short-wave infrared, and dual-polarization bands (HH, HV) (see , Table 1). Texture information, which can calculate the intensity differences between groups of neighboring pixels (Coburn & Roberts, 2004;Marceau et al., 1990), can capture the uniformity of plantation canopies, spacing, and structure relative to tropical forests (De Alban et al., 2018). For each band, Grey-Level Co-Occurrence Matrices (GLCM) and a 3 × 3 window were used to calculate second-order texture measures of contrast (CON), dissimilarity (DIS), inverse difference moment (IDM), angular second momentum (ASM), entropy (ENT), mean (AVG), correlation (COR), and variance (VAR; Marceau et al., 1990).
Band ratio indices were calculated from radar data to help distinguish between tropical forest and tree-based agriculture and other land cover classes (e.g. tropical forests, urban, water), (see , Table 1

Forest cover change
Hansen Global Forest Change (GFC) data and Center of Remote Imaging, Sensing and Processing (CRISP) data were used to augment training and validate our oil palm classification. Hansen GFC data (Version 1.5) characterize forest extent, loss, and gain from 2000 to 2020 at a 30 m spatial resolution and have a high overall accuracy (Galiatsatos et al., 2020). GFC data have been used to quantify global deforestation (Curtis et al., 2018;Zeng et al., 2018) and regional land cover change (Austin et al., 2019) and assess drivers of direct and indirect land use change (Austin et al., 2019;Magliocca et al., 2020). We extracted forest gain and forest loss to aid in oil palm classification (see, Figure 2). Land cover data were obtained from CRISP for the years of 2000 and 2015 to assist classifying and validating our classification of oil palm. These data were helpful for training and validation because the 2015 data explicitly includes oil palm. These data have a spatial resolution of 250 meter, overall accuracies greater than 80% (see, Miettinen et al., 2016) and include 12 (year 2000) and 13 (year 2015) land cover classes within the study area. CRISP data were reclassified to remove class distinction based on elevation for 2000 and 2015, leaving eight classes for 2000 (no oil palm) and nine classes (because of the addition of oil palm) for 2015 (see Appendix A).

Land cover classifier
To accurately classify oil palm plantations in 2000 for land cover change analysis, we created a classifier with comparable accuracy as those in 2015. We tested eight candidates that used different combinations of indices from optical, nighttime lights, and radar data (see, (Exp 1-4) did not use SAR and four classifiers (Exp 5-8) used SAR data. All land cover classifications were performed in GEE using a random forest classifier, which is an ensemble, non-parametric, machine learning classifier. Random forest is computationally efficient and able to handle multidimensional datasets (

Land cover classifier for 2015
Land cover classifiers (Exp 1-8) were created in three steps: 1) labeling training and validation samples, 2) performing Random Forest classifier using training samples and 3) validating classified images with validation samples. For training and validation samples, we selected a weighted random sample of polygons (30X30 m) proportional to the percent of total land cover using 2015 CRISP land cover data (see, Figure 3a). Using CRISP data to weight samples reduces time required to obtain a sufficient sample size for each land cover class and reduces potential sampling bias. To ensure accurate representation of reference data, each polygon was hand-labeled using visual interpretation of high-resolution imagery in Google Earth (see, Figure 3a). Open and mosaic land cover classes, defined as a mixture of cropland, shrubland, grasslands, and forest in CRISP land cover data, were used to obtain hand label samples for cropland. Sampling for bare soil was also augmented manually since this class was not defined in CRISP land cover data. To avoid potential bias, hand label samples were randomly partitioned into training and validating. Two-thirds of the samples were assigned as training data (n = 4,144) and one-third as validating data (n = 1,106). We used two approaches for classifying the images according to land cover type. Water was classified using a LSWI threshold of zero or less (see, Xu, 2006). The remaining classes (forest, cropland, oil palm, regrowth, peatland, wetland, bare soil, urban) were classified using supervised classification. For each experiment, the inputs (e.g. bands, indices) were stacked to train the random forest classifier. We used the training data (n = 4,144) to train the Random Forest classifier with 100 trees.  To validate classified images (Exp 1-8), we used the validation points (n = 1,106) to calculate five accuracy metrics. We calculated User's, Producer's, Balanced, overall, and F-measures statistics. We compared the results of Exp 5-8 (with SAR data) to Exp 1-4 (without SAR) to select the best combination of bands and indices for the 2000 classifier.

Land cover classifier for 2000
The 2000 image was classified similar to 2015 using training data, Random Forest classifier, and validation data. The differences were 1) the way we selected the training data and 2) that we used the band/index combination from Exp 4 that had the best accuracy for oil palm.
For training and validation data, we utilized Hansen GFC data to reduce the amount of manual labeling. We identified areas where no land cover change occurred between 2000 and 2015 (stable forest and stable non-forest see, Figures 2-3). Using 2015 training and validation samples, we extracted land cover samples from only stable forest and non-forest areas, resulting in n = 3,150 training points for 2000 ( Figure 3c). Because we removed 994 samples where Hansen GFC data showed land cover change, we added 500 samples to the 2000 training dataset using a weighted sampling of CRISP land cover data. To augment land cover classes that had low number of samples, we created a weighted random sample of polygons (30X30 m) proportional to the total land cover, similar to the 2015 sample data, and visually interpreted the land cover class from Landsat imagery. Oil palm samples, however, were identified only using the stable-forest scenario since oil palm cannot be identified using Landsat imagery. We assumed the 2000 and 2015 land cover were the same when no forest cover change was identified. Oil palm trees, reach maturity between 2-4 years and are economically viable from 4 to 30 years (Barcelos et al., 2015), whereas tropical forests have an average lifespan of 101 years or older (Rozendaal & Zuidema, 2011). These samples were randomly partitioned into training and validating, where two-thirds were assigned as training data (n = 3,650) and one-third as validating data (n = 998).
Land cover for 2000 was classified using indices and supervised classification. Water bodies were classified using the LSWI threshold defined above. The remaining land cover classes were classified using supervised classification and indices from Exp 4 that resulted in the highest oil palm accuracy. Bands and indices were stacked and used to train the Random Forest classifier with 100 trees.

Accuracy analysis
Accuracy metrics (User's, Producer's, Balanced, and overall), F-measures statistics, and McNemar's test were calculated to assess the accuracy and statistical significance of the 2015 land cover classification for the eight experimental classifiers. These metrics were calculated using the 2015 validating dataset (n = 1,106). From the accuracy metrics, the most accurate land cover classification with SAR data was selected for the 2015 land cover classification. Additionally, McNemar's test was used to assess the statistical significance of the 2015 land cover classifications with and without SAR and texture information based on overall and Producer's accuracies for the oil palm class. McNemar's (1947) test examines the consistency in responses across two variables that have a dichotomous relationship (e.g. with and without radar data).
To determine variable importance both with and without SAR, we calculated the contribution of each variable to the overall accuracy. Training samples were extracted for the top performing 2015 land cover classifications and 2000 land cover classification. To evaluate the variable of importance (VOI), we computed the mean decrease Gini index (Liaw & Wiener, 2002) using the randomForest package in DisplayR (R Core Team, 2014).

Land cover change
Analysis of land transitions globally and in the tropics provide insight into regional drivers and impacts of deforestation land cover change (Magliocca et al., 2020). Land transitions are typically quantified by calculating the area of land covers between two time periods (Damian et al., 2021;Lambin et al., 2003). For this study, land cover change was calculated from 2000 to 2015 and transitions from one land cover type to another are visualized as a Sankey diagram, showcasing not just the overall change but detailed insight into dynamic gains and losses. To better understand the fragmentation of land cover, we calculated the number of patches and mean patch size using landscape metrics R package (Hesselbarth et al., 2019;McGarigal et al., 2012). Both 2000 and 2015 land cover maps were aggregated to 500 m resolution using the majority rule to run metric calculations on commodity hardware (Stuhlmacher et al., 2020).

Land cover classification performance
For 2015 land cover classification (see , Table 3), Exp 7 and Exp 8 (optical, SAR, and texture information) produced the highest overall accuracies (88.18%), while Exp 1 (optical only) had the lowest overall accuracy (83.78%). In general, 'optical only' experiments performed well, falling within 4.4% of the overall accuracies of 'optical plus' radar experiments. Despite small differences in overall accuracies, radar data was shown to be statistically significant from 'optical only' experiments (see ,  Table 4) in agreement with previous research (De Alban et al., 2018;Chong et al., 2017;Joshi et al., 2016). While texture information increased overall accuracies by 1-2% in 'optical only' experiments (e.g. Exp 1 vs Exp 4) and 'optical plus' radar experiments (e.g. Exp 5 vs. Exp 8), only radar texture information was statistically significant (see , Table 4).
Radar data and texture information did not substantially improve overall accuracies but enhanced the distinction between vegetation classes. Radar data and texture information improved balanced accuracies and F-measures for oil palm by more than 10% (see , Table 3). This enhancement was mostly attributed to the inclusion of radar data (texture and non-texture information) with an increase of 8-9% in balanced accuracies and F-measures that was statistically significant at the 95% confidence level (i.e. Exp 1 vs. 5, Exp 4 vs. 7). This trend was also apparent with regrowth and cropland classes resulting in a 4-5% increase in balanced accuracies and F-measures. Balanced accuracies and F-measures for peatland, wetland, and regrowth classes improved mostly due to the addition of texture information, by approximately 4-6%. Exp 8 consistently had the highest accuracy metrics for forest (~95%), oil palm (~90%), and regrowth (~81%) compared to accuracy metrics for Exp 7. Therefore, Exp 8 was chosen as the most accurate land cover classification for 2015 and provided the baseline for land cover change analysis. Of the 'optical only' experiments, Exp 4 was chosen because of the highest accuracy metrics with the addition of NIR and SWIR texture information (see, Table 5). The land cover classification for 2000 had relatively high accuracy with an overall accuracy of 84.72% (see , Table 5). This accuracy fell within 0.2% of the overall accuracy of Exp 4 for the 2015 land cover classification, indicating high skill in land cover classification absent of SAR.  The experiments showed notable differences in terms of bands that were most important to the 2015 and 2000 land cover classifications. In the 2015 land cover classification, radar texture (HV) bands were consistently the top performing variables in the most accurate 2015 land cover classifications (Exp 7 and 8) (see, Figure 4) and statistically significant in the case of oil palm classification. When omitting radar data (texture and non-texture bands), the SWIR average texture band became the most important variable in the 2000 land cover classification with a mean decrease GINI value of 157.3. In both 2000 and 2015 land cover classifications, LSWI, NDBI, and nighttime lights were the next most important variables, providing information about land use and/or vegetation type and health (see, Figure 4).
For 2015 and 2000, land cover classifications showed high discriminating capabilities between vegetative and non-vegetative classes with forest and urban classes consistently above 88%. Accuracy metrics for most 2000 land cover classes were within 6% of accuracy metrics for 2015 land cover classes (see, Tables 3 and 5), indicating good accuracy and agreement. Discrimination between certain vegetation classes, however, was difficult for the 2000 land cover classification. Accuracy metrics for 2000 oil palm and wetland classes were below 75%, which were ~19% lower compared to the accuracy metrics for 2015 oil palm and wetland classes.

Land cover change
Land cover change from 2000 to 2015 was considerable, amounting to 19% of the total area (114,105.4 square kilometers) (see, Table 6). The largest changes in land cover from 2000 to 2015 occurred as a result of oil palm expansion (increase by 6.1% or 36,033.3 square kilometers) and deforestation (decrease by 5.8% -34,832.2 square kilometers) (see , Table 6 and Figure 5). Oil palm expansion was most evident in the central part of Sumatra (see, Figures 5-6). Approximately 72.2% of  the net gain in oil palm was due to conversions from forest, cropland, and regrowth (see, Table 6). This expansion was in addition to oil palm plantations established in 2000 or earlier, which amounted to 5.8% of the total area (see, Figure 7 and Table 7). More than half of net deforestation resulted from agricultural expansion (i.e. oil palm and cropland) with total gross forest losses in 2000 attributed to the conversion to cropland, oil palm, and regrowth. These losses were most apparent on the eastern side of Bukit Barisan mountains (see, Figure 5), similar to the findings of Zeng et al. (2018) that noted agricultural expansion into the highland areas of Southeast Asia. Net changes in cropland from 2000 to 2015 were mostly offset by changes in other land covers (see, Figure 7 and Table 7). Only 40.6% of cropland remained primarily in the southern part of Sumatra (see, Figure 5 and Table 7). For cropland, gross losses were attributed to oil palm expansion, afforestation (forest and regrowth), and to a lesser extent, urbanization. These losses were offset by gross gains from forest, oil palm, regrowth, and peatland classes, pointing to shifts in cultivation. Similar changes were observed in the regrowth class pointing to shifts in oil palm plantations and cropland.
Forest cover became less fragmented in part due to the rise of large-scale monoculture plantations. From 2000 to 2015, the number of oil palm patches decreased by 24% while the mean size of oil palm patches almost doubled (see, Table 8). Patches of forests were replaced by oil palm mostly in the eastern part of Sumatra (see, Figure 6). Maximum patch size of oil palm also increased by an order of magnitude (0.39 square kilometers vs 2.30 square kilometers). Maximum patch size also increased substantially for regrowth (0.20 square kilometers vs. 0.04 square kilometers). For regrowth, the number of patches and mean patch size decreased by 12.5% and 21.0%, respectively. For peatland, number of patches increased slightly (3%) and mean patch size increased by 27.4%.
In addition to agriculture, urbanization also increased. From 2000 to 2015, urbanization more than doubled from 1.0% to 2.6% (~9,608.9 square kilometers -see, Tables 6-7 and Figure 7). The net gain in urban areas was mainly due to the conversion from agricultural lands (i.e. croplands, oil palm plantations) on the periphery of cities (see, Figures 5-6). The number of urban patches quadrupled (2,368 vs. 9,189), while mean patch size decreased by 36.2% (see , Table 8). Urban losses, although minor (less than 0.2%), were observed in 2000, indicating some issues in class distinction between urban, cropland, and oil palm.

Land cover classifier
By evaluating land cover classifiers in the context of data limitations, our results demonstrate how to accurately classify oil palm without radar data and determine the contributions that different input data have in improving accuracy. We were able to achieve overall classification accuracies for 2000 within 4-5% of land cover classifications that used radar data (Exp 5-8) (see, Tables 3, 5) by using land cover change information and the most accurate land cover classification approach without radar data (Exp 4). In the 2015 optical only land cover classifications, NIR and SWIR texture information better discriminated between oil palm, forest, peatland, and wetland; improving balanced and F-measure accuracies by 4-5% (Exp 1 vs. Exp 4). These bands better discriminate oil palm from forest and other vegetation classes based on texture differences associated with oil palm's distinctive canopy crown (star-like) and grid-like planting. This additional information yielded relatively high balanced and F-statistic for the 2015 oil palm class (82.38%) and may explain why the SWIR average texture band was the most important variable in 2015 optical only land cover classification. Following the SWIR average texture band, LSWI, NDBI, and NTL bands were important variables in random forest classification due to their ability to discriminate vegetation content and land cover types (i.e. forests from plantations, croplands from urban areas) (see , Table 1 and Figure 3).
Oil palm classification, however, was most improved with the addition of radar data and texture information of HH and HV bands, which were statistically significant and consistent with the findings of De Alban et al. (2018) and Sarzynski et al. (2020). Radar information better distinguishes oil palm from tropical forests by returning a signature specific to the physical structure of the plant, penetrating through canopy cover (De Alban et al., 2018). Although radar information proved to be the most important variables in the most accurate land cover classification for 2015 (Exp 6 ~ 88.18%) (see , Table 3 and Figure 4), radar data only increased overall accuracies by 4%.
The methodology presented here provides a more efficient and accurate approach to monitor land cover change where historic data is limited. Land cover change analysis was supported by 1) evaluating land cover classifiers that consider data limitations and 2) extracting land cover change information from highly accurate forest cover change datasets and existing land cover maps. In this study, stable-forest and stable non-forest scenarios (no loss or gain) were generated using Hansen's GFC data to obtain oil palm samples for the 2000 land cover classification, since oil palm cannot be identified using traditional methods (i.e. visual interpretation, digitizing). These data have high overall accuracy (~99.5%) and have been used in other land cover change studies related to deforestation (M. C. Hansen et al., 2013), agricultural expansion (Zeng et al., 2018), and oil palm plantations in Kalimantan (Gaveau et al., 2016). Additionally, time required for sampling was reduced by using CRISP land cover data to show where samples should be scattered. These samples were hand-labeled to ensure accurate representation of reference data, thereby also addressing potential misclassification errors associated with Hansen's GFC data and CRISP land cover (Tropek et al., 2014). By identifying the best data combination for historical land cover classification prior to radar, this approach may better estimate oil palm expansion, especially where oil palm plantation locations are not reported, and can be applied to other geographical areas, where differentiating between vegetation types is difficult. Furthermore, this method provides a strategy for building a better classifier for time series analysis when earlier time periods lack richer datasets.

Land change analysis
The improvements in our classification's efficiency still yielded the expected trends in the land cover transitions over a 15-year timespan in Sumatra and Western Malaysia. We found that human activities had a particularly large impact, with increases to urban areas, agriculture, and cropland as well as the decreases to forest and wetland. Our results show that agricultural land use was the primary driver of land cover change from 2000 to 2015 (see , Table 6 and Figure 7). Oil palm and croplands were responsible for approximately 72% of the deforestation. These findings were similar to Imai et al. (2018), De Alban et al. (2018, and Zeng et al. (2018). Oil palm expansion increased by 6.1% (~36,033 square kilometers), resulting in the largest increase in land cover, supporting a regional assessment of deforestation drivers in Indonesia by Austin et al. (2019). In addition to agriculture, urbanization also increased from 1.0% to 2.6% (~9,608.9 square kilometers) primarily due to the conversion from agricultural lands on the periphery of established cities and the 20-30% increase in population between 2000 and 2015 (United Nations, 2019). While differences in land cover from 2000 to 2015 could partially be attributed to the difference in model inputs (e.g. 'optical plus radar' vs. 'optical only', our land cover change analysis shows similar trends with Zeng et al. (2018) and Austin et al. (2019). Understanding the extent of oil palm expansion and other land cover changes are vital to developing sustainable policies that balance the long-term environmental implications with economic development goals.
As Figure 7 shows, land cover transitions were complex. While forest remained at 39.9% of the total land area in 2015 compared to 45.7% in 2000, only 63.4% remained native/virgin forest. Many oil palm plantations have been established through clear-cutting of primary forests (Margono et al., 2014) as well as replaced cash crops in agricultural areas or degraded or secondary forests in regrowth areas (Gatto et al., 2015;Qaim et al., 2020). Large-scale monocultures such as oil palm are structurally less complex than forests and can lead to habitat destruction for rare and endemic species with global consequences for biodiversity (Qaim et al., 2020;Sodhi et al., 2004). Oil palm expansion, which has been incentivized by global demand and supported by government for economic development, has led to the rise of large-scale monoculture plantations (Khatun et al., 2017). Approximately 60% of oil palm plantations are managed by industry and hold long-term concessions, whereas only 40% are operated by smallholders (Byerlee et al., 2017;Qaim et al., 2020). Smallholder oil palm operations have been shown to be more biodiversity friendly and can also help alleviate poverty among rural farmers (Azhar et al., 2017). Institutions and policy, especially agricultural policy, have the potential to dramatically alter landscapes (Stuhlmacher et al., 2020). Efficient and accurate classifications, like the ones we proposed and tested, are essential to policy efforts to preserve remaining primary and secondary forests.

Future work
In addition to data fusion methods tested here, future work should examine land cover change algorithms in their ability to detect historical oil palm. Landsat-based detection of trends in disturbance and recovery (LandTrendr) and vegetation change tracking (VCT) algorithms may better detect land cover changes related to tree-based agriculture because they can capture abrupt signal changes in time series imagery associated with clear-cutting followed by green-up period of maturing oil palm trees (Kennedy et al., 2010;Hu & Dong, 2018). These algorithms could better address sample size issues with historic classifications and potentially reduce spectral confusion between land cover classes (e.g. urban | cropland | bare soil). Continuous Change Detection and Classification (CCDC), similar to LandTrendR and VCT, uses more complex calculations to detect changes in surface reflectance and thermal imagery and may better detect land cover changes related to tree-based agriculture (Zhu & Woodcock, 2014;Hu & Dong, 2018). CCDC could be especially useful in discriminating between mature and immature oil palm since immature oil palm are likely misclassified as cropland or bare soil due to their higher percentage of exposed soils and lower vegetation fraction (see, Ordway et al., 2019). These algorithms could provide more accurate results related to oil palm expansion.

Conclusion
We developed a more accurate land cover classifier that can distinguish between native forests and plantations for the year 2000. Radar data, which have been shown to improve the accuracy of oil palm agriculture by 8 or 9% over optical data only, are only available after 2007. To classify oil palm land cover for 2000, we evaluated 2015 land cover classifiers in the context of data limitations using Landsat imagery, nighttime lights data, radar data, and derived indices. We were able to achieve overall classification accuracies for 2000 within 4-5% of land cover classifications that used radar data (Exp 5-8) using a combination of vegetation indices to better discriminate vegetation moisture (e.g. LSWI) and plantations from forests, nighttime lights and NDBI to distinguish croplands from urban areas, and SWIR and NIR texture information to detect the spatial homogeneity of plantations relative to tropical forests.
Land cover change analysis was supported by extracting land cover change information from highly accurate forest cover change datasets and existing land cover maps. This approach allows us to analyze and better estimate land cover change related to tree-based agriculture for time periods prior to 2007 and could be applicable to other applications where differentiating between vegetation types is difficult and radar data are limited. In our study, oil palm plantations expanded from 15% to nearly 21% of the total land area of Sumatra and West Malaysia between 2000 and 2015.
These findings are consistent with those of Austin et al. (2019), who also found that agricultural land use was the primary driver of land cover change in Sumatra and Western Malaysia. Oil palm expansion combined with croplands were responsible for approximately 72% of the deforestation with 39.4% and 34.7% attributable to oil palm expansion and shifting cultivation, respectively. Better monitoring of oil palm expansion, through methods like data fusion, is needed to help policy-makers develop sustainable land management practices that balance economic development driven by global demand with environmental and social implications of deforestation.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Funding
This work was supported by the National Science Foundation CR-Water Sustainability and Climate [EAR-1204774, 2012]; NSF 1204774.

Supplementarty material
Supplemental data for this article can be accessed here