Cross validations of radio-frequency interference signature in AMSR-E data using two detection methods

Abstract
Radio-frequency interference (RFI) detection for low-frequency microwave measurements is an important step before these data are applied to geophysical parameter retrieval or data assimilation. There are several robust techniques to identify the RFI signals, such as the mean/standard deviation method and the normalized principal component analysis method. However, verification of these existing detection methods remains an open issue in the absence of a reliable validation data set of the ‘true’ RFI signals. In this paper, a cross-validation scheme using two independent RFI detection methods is proposed to derive the thresholds for identifying the RFI-contaminated data for the Advanced Microwave Scanning Radiometer for Earth Observing System (AMSR-E). It is shown that the new scheme is effective in the quantitative classification of the RFI signals in the AMSR-E C- and X-band channels over the continents. Strong RFI signals are found to be concentrated over cities of the United States at AMSR-E C-band, while RFIs at X-band are mainly observed over Europe and Japan.


Introduction
The Advanced Microwave Scanning Radiometer for Earth Observing System (AMSR-E) is a passive remote sensing radiation imager onboard NASA's Aqua satellite, which was launched on 4 May 2002. The AMSR-E instrument was developed by the Japan Aerospace Exploration Agency (JAXA) for observing water-related geophysical parameters applied to global change science and monitoring (JAXA 2006; Kawanishi et al. 2003). The lower frequency channels of AMSR-E are sensitive to surface emissivity and are designed to enhance surface sensing capabilities, providing soil moisture, surface temperature, sea surface wind speed, vegetation water content, and snow cover (Kawanishi et al. 2003; Weng and Grody 1994; Wentz et al. 2000). AMSR-E has similar frequencies to the WindSat radiometer onboard the Coriolis satellite, except for the 89-GHz channel. In addition, it has all the frequencies of the MicroWave Radiation Imager (MWRI) onboard the FengYun (FY)-3B/C satellites, plus an additional 6.925-GHz channel. FY-3B and FY-3C were launched on 5 November 2010 and 23 September 2013, respectively. AMSR-2, onboard the Global Change Observation Mission 1st-Water satellite, was successfully launched on 18 May 2012. The channels of AMSR-2 are the same as those of AMSR-E, except for the channels at 7.3 GHz.
Similar to WindSat and MWRI, the AMSR-E C- and X-band channels operate in unprotected bands, since many active commercial services employ the microwave spectrum, such as weather radar, air traffic control, cell phones, garage door remote control, defense tracking, and vehicle speed detection. These C- and X-band channel measurements of the Earth's relatively weak thermal emissions may suffer from interference from signals originating from manmade microwave transmitters, collectively referred to as radio-frequency interference (RFI). RFI signals are often directional, narrow-banded, isolated in space, and persistent in time. The RFI signatures in brightness temperature measurements, if not identified and removed, introduce errors in AMSR-E retrievals. It is therefore important to identify RFI-contaminated data prior to carrying out geophysical parameter retrieval and data assimilation.
Currently, several robust RFI detection methods exist for various microwave imagers, including the spectral difference method (Li et al. 2004; McKague, Puckett, and Ruf 2010; Njoku et al. 2005; Wu and Weng 2011), the mean/standard deviation method (Njoku et al. 2005), and the principal component analysis (PCA) method (Li et al. 2006). There are also some extended PCA techniques, including normalized PCA (NPCA) (Zou et al. 2012; Zou, Tian, and Weng 2014) and double PCA (DPCA) (Feng, Zou, and Zhao 2016; Guan, Xia, and Zhang 2015; Zhao, Zou, and Weng 2013). However, the verification of all RFI detection methods remains unresolved, since there is no reliable validation data set of the 'true' RFI signals. Also, how to determine the exact RFI detection threshold by using each method is still an open issue. In this study, a cross-validation scheme is developed to obtain the thresholds for classifying the RFI-contaminated AMSR-E data over the continents.
The paper is organized as follows: Section 2 provides a description of the Aqua AMSR-E data characteristics. Two RFI identification methods are briefly described in section 3. The cross-validation scheme and numerical results are presented in section 4. Section 5 gives a summary and some conclusions.

AMSR-E data description
The AMSR-E instrument onboard Aqua flies in an afternoon-configured (1330 local time) sun-synchronous orbit at an altitude of 705 km, with an observation swath width of 1445 km (JAXA 2006). The antenna beams view the Earth at a constant earth incidence angle of 55° in the conically scanning mode. The six frequency channels at 6.925, 10.65, 18.7, 23.8, 36.5, and 89.0 GHz are dual polarimetric, providing the horizontal and vertical polarizations simultaneously. The field of view (FOV) size and the location of each frequency's pixels differ from one another. The higher the frequency, the smaller the FOV. Table 1 provides the channel characteristics of AMSR-E.
The level 2A AMSR-E brightness temperatures for a one-month period in February 2011 are used in this study and binned onto a 1/3° × 1/3° grid over the globe. A coastline mask from the level 2A AMSR-E data is applied to remove data over ocean and large inland water regions.

Mean/standard deviation method
Earlier studies indicate that the brightness temperature spectral differences (i.e. TB_6H − TB_10H and TB_10H − TB_18H) are usually negative at frequencies below 30 GHz over most land surfaces, as the scattering effect is significantly weaker than emission (Li et al. 2004). The subscripts refer to the frequency and polarization. An RFI signal at C-band (X-band) produces a positive spectral gradient by increasing the brightness temperature at C-band (X-band). Thus, the spectral difference index is widely employed as an RFI discriminator (Li et al. 2004; McKague, Puckett, and Ruf 2010; Wu and Weng 2011). However, the scattering effect from snow and ice can also reverse the spectral difference gradient by decreasing the brightness temperatures at higher frequencies (Zou et al. 2012). As RFI signals originating from manmade transmitters are often pulsed or intermittent, the standard deviation of the spectral difference can be chosen as an additional discriminator. Njoku et al. (2005) combined the mean and standard deviation of the RFI-sensitive spectral difference indices to detect RFIs at the AMSR-E C- and X-band channels over the global continents. However, the thresholds for classifying RFI signatures were determined subjectively according to the temporal variations of the means and standard deviations.

NPCA method
In order to distinguish RFI signatures from the scattering effect over frozen grounds, the NPCA method was presented to detect RFIs over land (Zou et al. 2012; Zou, Tian, and Weng 2014). It utilizes the multi-channel correlation differences between natural thermal radiation and RFI-contaminated data, as well as the normalized RFI indices mitigating the disparities between non-scattering and scattering surfaces. Zou et al. (2012) concluded that this detection method works well in identifying MWRI X-band RFI signals during both summer and winter. The second RFI detection method used in this paper is the same as that employed in Zou et al. (2012). To identify RFIs by using NPCA, the normalized RFI indices matrix is reconstructed from five principal component (PC) modes. High values of the PC coefficient for the first PC mode indicate a high probability of the presence of RFI.
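The NPCA step described above can be illustrated with a minimal sketch. It is not the implementation of Zou et al. (2012): the function name `npca_first_pc_coefficient`, the choice of indices (target channel minus each of five other channels), the normalization by the spatial mean and standard deviation, and the sign convention of the first mode are all assumptions made for illustration.

```python
import numpy as np

def npca_first_pc_coefficient(tb_target, tb_others):
    """Return the first-PC coefficient at each data point (hypothetical helper).

    tb_target: (n,) brightness temperatures of the channel being screened.
    tb_others: (k, n) brightness temperatures of k other channels.
    """
    # RFI indices: spectral differences between the target and each other channel.
    indices = tb_target[np.newaxis, :] - tb_others            # shape (k, n)

    # Normalize each index over the data points (assumed normalization;
    # assumes no index is constant, so the standard deviation is non-zero).
    mean = indices.mean(axis=1, keepdims=True)
    std = indices.std(axis=1, keepdims=True)
    normalized = (indices - mean) / std

    # Covariance matrix of the k normalized indices and its eigen-decomposition.
    cov = normalized @ normalized.T / normalized.shape[1]     # shape (k, k)
    eigvals, eigvecs = np.linalg.eigh(cov)                    # ascending eigenvalues
    first_mode = eigvecs[:, -1]                               # largest-variance mode

    # Eigenvector signs are arbitrary; orient the mode so that positive
    # index anomalies map to positive coefficients (assumed convention).
    if first_mode.sum() < 0:
        first_mode = -first_mode

    # Projection of every data point onto the first PC mode.
    return first_mode @ normalized
```

With this orientation, a data point whose target-channel brightness temperature is raised relative to all other channels receives a large coefficient. The sign and scale of the coefficients in the paper follow its own convention and need not match this sketch.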

Numerical results
The existing RFI detection methods, including the mean/standard deviation method, the NPCA method and the DPCA method, appear to be robust in detecting moderate and strong RFI signals for microwave imagers. Nevertheless, none of these methods can objectively obtain the threshold for classifying RFIs, which is desired in operational data processing and data assimilation systems. In this study, the RFI results from the mean/standard deviation method and NPCA are compared to explore an exact threshold for quantitatively classifying the RFI-contaminated AMSR-E data over land.
Figure 1(a)-(d) present the regional distributions of the monthly means and standard deviations of the spectral indices between 6.925 and 10.65 GHz for horizontal and vertical polarization for the AMSR-E ascending data over the United States in February 2011. Large positive mean and standard deviation values indicate the presence of strong RFI at 6.925 GHz, most remarkably from Boston along the eastern corridor to New York, Washington D.C., Richmond, and Atlanta. The other places with high values are scattered in the central and western regions (e.g. Denver, La Crosse, etc.). Some areas show high positive standard deviations but low mean values (even negative mean values), such as the tongue-shaped region over (37°-50°N, 100°-90°W). This feature is related to the time-varying weather phenomena with snowfall, which leads to high standard deviations at the monthly time scale (Figure 1(c)). However, this will not be confused with RFI signals, as the dual criteria of mean and standard deviation are employed.
The NPCA-based RFI distributions over the United States at 6.925 GHz for horizontal and vertical polarization during February 2011 are shown in Figure 1(e) and (f). The 'RFI signal' is actually the PC coefficient for the first PC mode of the data matrix constructed in NPCA (Zou et al. 2012). Overall, similar results for strong RFIs are obtained by using NPCA. However, the mean/standard deviation method is more sensitive to the snow scattering effect, especially for the horizontal polarization, which leads to remarkably high standard deviations in the northwest of the United States. In addition, NPCA identifies a wider range of 'RFI signals' than the former method does along the eastern corridor. It is also seen that the NPCA-based RFIs at 6.925 GHz are evidently stronger for horizontal polarization than for vertical polarization, while the intensities of RFIs detected by the mean/standard deviation method are quite similar for these two polarizations.
The similarity in the results obtained by using the two independent RFI detection methods gives confidence in the RFIs identified over the United States. Therefore, the RFI results from the mean/standard deviation method and NPCA are compared point-by-point to find the thresholds for the mean, standard deviation and 1st PC coefficient (marked as μ, σ, and α, respectively) at which the number of RFI signals classified simultaneously by these two methods reaches its maximum value. Figure 1(c) and (d) show that over the non-scattering surfaces (snow-free and ice-free surfaces) the distribution range varies relatively little with respect to the monthly standard deviation value when the inequality σ ≥ 3 K holds true. Njoku et al. (2005) also concluded that the standard deviations are more temporally stable than the means. Therefore, only the variations of the temporal mean of spectral differences (μ) and the 1st PC coefficient (α) are taken into account, while the threshold of standard deviation remains constant at 3 K. Figure 2 illustrates the variations of the percentages of 'RFI-contaminated' data points classified simultaneously by both detection methods with respect to μ and α. As the value of μ increases, the percentage of 'RFI-contaminated' data points increases initially and then decreases rapidly, with the maximum value at μ = 5 K (μ = 8 K) for horizontal (vertical) polarization. With regard to α, the percentages reach the maximum when α is equal to −0.4 and −0.2 for horizontal and vertical polarization, respectively. As a result, the thresholds of μ = 5 K and σ = 3 K for horizontal polarization, and μ = 8 K and σ = 3 K for vertical polarization, are chosen to classify RFI at 6.925 GHz over the United States by using the mean/standard deviation method. On the other hand, the thresholds of α = −0.4 for horizontal polarization and α = −0.2 for vertical polarization are applied in the NPCA method. These thresholds are selected by objective cross-validation of the RFI results, rather than subjective inspection.
To examine the appropriateness of the thresholds obtained by the cross-validation scheme, 2D histograms are constructed for the standard deviations (σ) and means (μ), with data points whose 1st PC coefficient (α) satisfies α ≥ −0.4 (horizontal polarization) or α ≥ −0.2 (vertical polarization) indicated in color, and those with α < −0.4 or α < −0.2 indicated in black (Figure 3). The thresholds of μ and σ are displayed as horizontal and vertical dashed lines, respectively. The majority of black points are clustered within the range of −20 K ≤ μ ≤ 10 K and 0 K ≤ σ ≤ 10 K. Basically, the points of color have positive mean values and high standard deviations. Color points far from the axes are indicative of strong RFI contamination. However, some color points with small means of spectral differences (horizontal polarization) or standard deviations (vertical polarization) exist, which will be classified as RFI signals by using the thresholds of μ and σ. Also noticeable is that some black points with large positive means and standard deviations are not identified as RFI-contaminated by NPCA.
By applying the thresholds of α = −0.4 and α = −0.2 to NPCA, we obtain the RFI distributions at 6.925 GHz for horizontal and vertical polarization over the United States in February 2011 (Figure 4(a) and (b)). The intensities of RFI signals, expressed by the 1st PC coefficients, are indicated using the same color convention as in Figure 3. The RFI signals are found mostly near the big cities or populated areas. The RFI signatures for horizontal polarization are obviously stronger than those for vertical polarization. Similar results are obtained by using the mean/standard deviation method with the aforementioned threshold values (Figure 4(c) and (d)). Compared to the NPCA-based RFI maps, the RFI signals detected by the latter method are more widely distributed in the northwest of the United States, while slightly fewer RFIs are classified along the eastern corridor. The RFI increase in the northwestern region is mainly due to the time-varying snow features, leading to both large positive spectral differences and high standard deviations, which is also shown in Figure 3.
To test the performance of the proposed cross-validation scheme in other areas of the globe, it is further applied to AMSR-E data over eastern Asia (25°-45°N, 110°-145°E) and Europe, respectively. The RFIs at 6.925 GHz over Asia and Europe are much sparser than those at 10.65 GHz (figures omitted). For classifying RFI signals at 10.65 GHz, the thresholds of μ = 5.5 K, σ = 3 K, and α = −0.1 for horizontal polarization [...]. The classification maps of NPCA-based RFI at 10.65 GHz over eastern Asia and Europe using these thresholds are provided in Figure 5. In the eastern Asia region, significant RFIs are found over Japan. Very few RFIs are found over China, except over the cities of Beijing and Shanghai for horizontal polarization (Figure 5(a) and (b)). The locations of RFI-contaminated AMSR-E data are close to those of FY-3B MWRI data for the 10.65-GHz channels over eastern Asia (Zou et al. 2012). In Europe the RFI signals are mainly located in the United Kingdom, Italy, and the west of Turkey. The RFIs of horizontal polarization resemble those of vertical polarization, except that the west of Turkey has a slightly wider distribution of RFI signals for vertical polarization. Njoku et al. (2005) showed similar 10.65-GHz RFI results for vertical polarization over Europe.

Summary and conclusions
This paper presents results of a cross-validation scheme to derive the thresholds for quantitatively classifying RFI-contaminated AMSR-E data over the continental United States, Asia and Europe by using the mean/standard deviation method and the NPCA method. The mean/standard deviation method combines the spatial and temporal features of the spectral differences for RFI-contaminated brightness temperatures. NPCA takes advantage of the multi-channel correlation differences between natural thermal radiation and RFI-contaminated data, as well as a set of normalized RFI indices mitigating the disparities between non-scattering and scattering surfaces. Classifications using these two methods appear to be robust in identifying strong RFI over the continents. Therefore, the RFI results from these two methods are compared quantitatively to obtain the exact classification thresholds by maximizing the percentages of the 'RFI-contaminated' data identified by the two methods simultaneously.
The RFI signals are concentrated over cities of the United States at the AMSR-E C-band horizontal polarization channel, while RFIs at the corresponding vertical polarization channel are clearly weaker. In Japan, strong RFI is found at both the C- and X-band channels. The RFI signals at X-band are mainly observed over Europe and Japan. In general, strong RFI signals are located in big cities or populated areas. Applications to AMSR-E RFI detection over various continents show the robustness of the proposed cross-validation scheme for deriving the RFI classification thresholds.
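The threshold selection used in the cross-validation scheme can be sketched as a small grid search. The text does not state how the percentage of jointly classified points is normalized, so this sketch assumes it is the number of points flagged by both methods divided by the number flagged by either method. The function names `agreement` and `select_thresholds` and the candidate grids are hypothetical. Only the fixed σ threshold of 3 K and the rule that a point is flagged when its value meets or exceeds a threshold are taken from the text.

```python
import numpy as np

SIGMA_THRESHOLD_K = 3.0  # fixed standard deviation threshold, as stated in the text

def agreement(mu, sigma, alpha, mu_thr, alpha_thr, sigma_thr=SIGMA_THRESHOLD_K):
    """Percentage of points flagged by both methods (assumed: relative to the union)."""
    # Mean/standard deviation method: dual criteria on mean and standard deviation.
    flagged_msd = (mu >= mu_thr) & (sigma >= sigma_thr)
    # NPCA method: first-PC coefficient at or above its threshold.
    flagged_npca = alpha >= alpha_thr
    union = np.count_nonzero(flagged_msd | flagged_npca)
    if union == 0:
        return 0.0
    both = np.count_nonzero(flagged_msd & flagged_npca)
    return 100.0 * both / union

def select_thresholds(mu, sigma, alpha, mu_candidates, alpha_candidates):
    """Return (score, mu_thr, alpha_thr) maximizing the agreement percentage."""
    best = (-1.0, None, None)
    for mu_thr in mu_candidates:
        for alpha_thr in alpha_candidates:
            score = agreement(mu, sigma, alpha, mu_thr, alpha_thr)
            if score > best[0]:  # strict comparison keeps the first maximum found
                best = (score, mu_thr, alpha_thr)
    return best
```

Here `mu`, `sigma`, and `alpha` are one-dimensional arrays holding, for each grid point, the monthly mean and standard deviation of the spectral difference and the first-PC coefficient. Other normalizations of the percentage are possible and would change which thresholds are selected.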