A unified moment magnitude earthquake catalog for Northeast India

Abstract Earthquake-related studies on seismicity and seismic hazard assessment need a homogeneous earthquake catalog for the region studied. A homogeneous earthquake catalog for the Northeast India region was compiled using regional and global empirical relationships between different magnitudes and moment magnitude, derived with an improved error-corrected methodology suggested in the recent literature. To convert smaller magnitude earthquakes, global empirical equations were derived and used. A procedure is suggested for converting different magnitudes into moment magnitude. A homogeneous earthquake catalog of 9845 events was compiled for the time period 1897–2012. The entire magnitude range (EMR) method was found to be the most reasonable method for estimating the magnitude of completeness. The derived local and global empirical equations can be useful for seismic hazard and seismicity studies. The complete and consistent homogenized earthquake catalog prepared in this study could provide good data for studying earthquake distribution in Northeast India. By carefully converting the original magnitudes into homogenized Mw magnitudes, an obstacle is removed for the consistent assessment of seismic hazards in Northeast India.


Introduction
Northeast India is one of the most seismically active regions in the world, lying within the geographical coordinates 20°–30° N latitude and 87°–98° E longitude. Two great earthquakes, namely the Shillong earthquake of 12 June 1897 and the Assam earthquake of 15 August 1950, have occurred in this region in the recent past. Moderate to intermediate magnitude earthquakes occur frequently in the region.
To properly understand earthquake phenomena in a region, a complete and consistent earthquake catalog is essential. In general, earthquake catalogs for regional seismicity are heterogeneous in magnitude types, whereas a homogeneous and complete earthquake catalog is a basic requirement for studying earthquake distribution in a region as a function of space, time and magnitude. Because of the inherently complex nature of earthquake phenomena and variations in instrumental characteristics, network coverage and observational practice, different definitions and methodologies have been devised for determining earthquake size, leading to different magnitude scales. Because different magnitude scales are used, an earthquake database becomes inhomogeneous in terms of earthquake size. So, in order to build a homogeneous earthquake catalog for a seismic region, the regression relationships used for converting different magnitude types into a preferred magnitude scale (i.e. moment magnitude Mw, as it does not saturate at higher magnitude levels) are critically important, since bias introduced during the conversion propagates errors into the frequency-magnitude distribution parameters and consequently into the seismic hazard estimates. Most of the regression relationships used for magnitude conversion are based on the assumption that one of the magnitudes (the independent variable) is error-free. When both magnitude types contain measurement errors, the standard least-squares regression procedure leads to systematic errors as high as 0.3–0.4 in magnitude conversion (Castellaro et al. 2006); it is therefore inadequate and, even more importantly, may result in catalog incompleteness. General orthogonal regression (GOR) analysis is more appropriate for estimating regression relationships between different magnitude types (Thingbaijam et al. 2008; Ristau 2009; Das et al. 2012a, 2012b, 2013, 2014a, 2014b, 2018a, 2018b). However, the usability of the GOR procedure for obtaining an unbiased estimate of the dependent variable has been addressed in several studies (Das et al. 2012a, 2012b, 2013, 2014a, 2014b, 2018a, 2018b; Wason et al. 2012).
With regard to the homogenization of an earthquake catalog for Northeast India using regression relationships, several authors have worked on empirical relationships between different magnitudes and the moment magnitude scale (Thingbaijam et al. 2008; Yadav et al. 2009; Das et al. 2012a, 2012b; Anbazhagan and Balakumar 2019; Nath et al. 2017; Pandey et al. 2017). Thingbaijam et al. (2008) derived GOR relationships for the conversion of body and surface wave magnitudes into moment magnitude, finding significant dispersion in the conversion of mb,ISC into Mw. In a study conducted for this region, Yadav et al. (2009) used the standard regression technique for the conversion of mb and Ms into Mw. Das et al. (2012a, 2012b) derived regression relationships for mb to Mw and Ms to Mw using standard regression, GOR and inverted standard regression, considering the error variance ratio (η) as 0.36 and 1, respectively. Considering the global earthquake database, these values were further modified to η = 0.2 and 0.56 in a separate study by Das et al. (2014a, 2014b). Errors of different magnitude scales are shown in Table 1. In this study, an improved GOR methodology, as suggested by Das et al. (2018b), was used for deriving regression relationships to convert body and surface wave magnitudes into moment magnitudes on a regional and global basis. A dataset of 9845 earthquake events, expressed in terms of Mw in the magnitude range 1.6–8.7 and belonging to the region studied (20°–30° N, 87°–98° E), was used for the period 1897–2012. GOR relationships for the conversion of mb and Ms into Mw were derived using regional and global datasets. In the compiled earthquake catalog for Northeast India, most events fall in the lower magnitude range and are reported as mb or Ms, moment magnitude values being available only for a few events.
In view of the scarce moment magnitude data on the lower magnitude side for Northeast India, global seismicity-based GOR relationships were used for the conversion of mb and Ms into Mw. The homogeneous catalog will remain open to being complemented in the near future with the advanced seismic moment scale Mwg (Das magnitude scale; Das et al. 2019), the energy magnitude (Me) and/or other related earthquake size scales that allow a better characterization of the rupture mode and released seismic energy. The homogeneous catalog will serve not only as a sound database for seismic risk assessment, but also for many other purposes. Following the unification of the earthquake catalog, declustering was conducted and the completeness of the catalog was assessed, as described in the following sections.

GOR methodology
The GOR methodology used here is based on the minimization of the Euclidean distance between the observed points and the corresponding points on the GOR line (Madansky 1959; Kendall and Stuart 1979; Fuller 1987; Carroll and Ruppert 1996; Das et al. 2012a, 2012b, 2014a, 2014b, 2018a, 2018b; Karimiparidari et al. 2013; Wason et al. 2012; Goitom et al. 2017). Critical details of the GOR procedure are explained in the Appendix.
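As an illustration of the fitting step, the following minimal Python sketch computes the slope and intercept of a GOR line from paired magnitudes. It assumes the standard errors-in-variables closed form (Fuller 1987) and takes the error variance ratio (the quantity quoted as 0.2, 0.56 or 1 in the text, called `eta` here) as the error variance of the dependent magnitude divided by that of the independent magnitude. The function name `gor_fit` and the sample values are hypothetical and are not taken from the catalog.

```python
import numpy as np


def gor_fit(x, y, eta=1.0):
    """Fit y = intercept + slope * x by general orthogonal regression.

    eta is the assumed ratio of the error variance of y to that of x.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    s_xx = np.var(x, ddof=1)            # sample variance of x
    s_yy = np.var(y, ddof=1)            # sample variance of y
    s_xy = np.cov(x, y, ddof=1)[0, 1]   # sample covariance of x and y
    diff = s_yy - eta * s_xx
    slope = (diff + np.sqrt(diff ** 2 + 4.0 * eta * s_xy ** 2)) / (2.0 * s_xy)
    intercept = y.mean() - slope * x.mean()
    return slope, intercept


# Made-up body wave and moment magnitude pairs, for illustration only.
mb = [4.8, 5.0, 5.2, 5.5, 5.8, 6.0]
mw = [4.9, 5.0, 5.3, 5.5, 6.0, 6.1]
slope, intercept = gor_fit(mb, mw, eta=0.2)
print(f"Mw = {slope:.3f} * mb + {intercept:.3f}")
```

With `eta = 1` the fit reduces to ordinary orthogonal regression; smaller values of `eta` treat the dependent magnitude as the more precise of the two.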

Regional regression relationships
For the conversion of mb,ISC and mb,NEIC into Mw,GCMT in the magnitude range 4.8 ≤ mb,ISC / mb,NEIC ≤ 6.1, the regression relationships were derived using the newly developed GOR procedure with η = 0.2 (Das et al. 2011), based on datasets of 116 and 106 events, respectively, for the period 1976–2007. The regression parameters obtained for these conversion relationships are shown in Table 2 and the plots are shown in Figure 1(a and b), respectively. In addition, the regression relationships between Ms,ISC and Mw,GCMT using 93 events in the range 4.1 ≤ Ms,ISC ≤ 6.1, and between Ms,NEIC and Mw,GCMT using 57 events in the range 4.2 ≤ Ms,NEIC ≤ 6.1, were derived following the newly developed GOR procedure. The regression parameters for the corresponding regression relationships are shown in Table 2.

Global regression relationships
For the conversion of mb,ISC to Mw,GCMT in the magnitude range 2.9 ≤ mb,ISC ≤ 6.1, and of mb,NEIC to Mw,GCMT in the magnitude range 3.8 ≤ mb,NEIC ≤ 6.1, we derived the GOR relationships with η = 0.2 based on datasets of 22,803 and 22,340 events, respectively, for the period 1976–2006. The regression parameters obtained for these conversion relationships are shown in Table 2 and the plots are shown in Figure 1(e and f).
For the GOR relationship between Ms,ISC and Mw,GCMT, 15,728 events were used in the range 3.0 ≤ Ms,ISC ≤ 6.1. Similarly, 7579 events were used in the magnitude range 3.6 ≤ Ms,NEIC ≤ 6.1 for the GOR relationship between Ms,NEIC and Mw,GCMT. Furthermore, for the conversion of higher surface wave magnitudes, 2026 events were used, combining ISC and NEIC data in the magnitude range 6.2 ≤ Ms ≤ 8.4. The regression parameters for the corresponding regression relationships are shown in Table 2 and the plots are shown in Figure 1(g-i), respectively.
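Because each relationship is valid only within the magnitude range of the data used to derive it, a conversion routine has to check the range before applying a relationship. The sketch below encodes the validity ranges quoted in the text. Preferring a regional relationship wherever both a regional and a global one apply is an assumption made for illustration, and the names `pick_relation`, `REGIONAL_RANGES` and `GLOBAL_RANGES` are hypothetical. The regression coefficients themselves are those of Table 2 and are not reproduced here.

```python
from typing import Optional

# Validity ranges (min, max) of the independent magnitude, as quoted in the text.
REGIONAL_RANGES = {
    ("mb", "ISC"): (4.8, 6.1),
    ("mb", "NEIC"): (4.8, 6.1),
    ("Ms", "ISC"): (4.1, 6.1),
    ("Ms", "NEIC"): (4.2, 6.1),
}
GLOBAL_RANGES = {
    ("mb", "ISC"): (2.9, 6.1),
    ("mb", "NEIC"): (3.8, 6.1),
    ("Ms", "ISC"): (3.0, 6.1),
    ("Ms", "NEIC"): (3.6, 6.1),
}
# Combined ISC and NEIC relationship for higher surface wave magnitudes.
GLOBAL_HIGH_MS_RANGE = (6.2, 8.4)


def _inside(value, bounds):
    """True when bounds exist and contain value (inclusive)."""
    return bounds is not None and bounds[0] <= value <= bounds[1]


def pick_relation(mag_type: str, agency: str, value: float) -> Optional[str]:
    """Return which relationship covers the value, or None if none does."""
    key = (mag_type, agency)
    if _inside(value, REGIONAL_RANGES.get(key)):
        return "regional"          # assumed preference: regional first
    if _inside(value, GLOBAL_RANGES.get(key)):
        return "global"
    if mag_type == "Ms" and _inside(value, GLOBAL_HIGH_MS_RANGE):
        return "global_high_Ms"
    return None


print(pick_relation("mb", "ISC", 3.5))
```

Returning `None` for values outside every range makes extrapolation an explicit decision rather than a silent default.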
A magnitude-intensity relationship between independent MMI (I0) and moment magnitude (Mw) values, identified from different sources for 29 earthquakes in India and nearby regions from 1897 to 2016, was developed (Figure 1(j)), where I0 is the maximum epicentral intensity. Historical earthquakes with only intensity values were converted into Mw using this empirical intensity relationship.

Local magnitudes
The relationship between local magnitude and moment magnitude was derived from 100 earthquakes in Northeast India for the time period 1976–2005. The derived GOR1 relationship with η = 1 is given as follows:

Duration magnitude
The relationship between duration magnitude and moment magnitude, based on 376 earthquakes worldwide collected from the ISC database, was derived as follows:

Scheme for magnitude conversion into moment magnitude M w
The following scheme was adopted for the conversion of different magnitude types into the unified moment magnitude Mw.
(1) The unassigned magnitudes in the catalog by Gupta et al. (1986) are treated as Ms (Thingbaijam et al. 2008; Das et al. 2013).
(2) When both mb and Ms are reported for an event, mb is chosen for magnitudes up to 6.0 and Ms is chosen above 6.0.
(3) For small mb events, the appropriate global relationship is used in the absence of a regional regression relationship.
(4) If only intensity data are available for an event, Mw is obtained using the relationship between intensity and Mw developed in this study.
(5) Scordilis (2006) and Das et al. (2011) reported the equivalence of Mw,GCMT and Mw,NEIC. Therefore, in the absence of a primary Mw,GCMT value, the Mw,NEIC values in the catalog are considered almost identical proxies.
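The selection logic of this scheme can be sketched in Python as follows. This is an illustrative reading rather than the procedure actually coded for the catalog: the `Event` record and the function `select_source` are hypothetical, the 6.0 switch in rule (2) is assumed to be applied to the reported mb value, and the choice between regional and global relationships (rule 3) is left to the conversion step that follows.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Event:
    """Hypothetical catalog record; any field may be missing."""
    mw_gcmt: Optional[float] = None
    mw_neic: Optional[float] = None
    mb: Optional[float] = None
    ms: Optional[float] = None
    unassigned: Optional[float] = None   # magnitude reported without a type
    intensity: Optional[float] = None    # maximum epicentral intensity I0


def select_source(ev: Event, mb_ms_switch: float = 6.0) -> Tuple[str, float]:
    """Return (label, value) of the quantity to be converted into Mw."""
    if ev.mw_gcmt is not None:
        return ("Mw,GCMT", ev.mw_gcmt)
    if ev.mw_neic is not None:                   # rule 5: proxy for Mw,GCMT
        return ("Mw,NEIC", ev.mw_neic)
    ms = ev.ms if ev.ms is not None else ev.unassigned   # rule 1
    if ev.mb is not None and ms is not None:     # rule 2 (assumed reading)
        return ("mb", ev.mb) if ev.mb <= mb_ms_switch else ("Ms", ms)
    if ev.mb is not None:
        return ("mb", ev.mb)
    if ms is not None:
        return ("Ms", ms)
    if ev.intensity is not None:                 # rule 4
        return ("I0", ev.intensity)
    raise ValueError("event has no usable size measure")


print(select_source(Event(mb=5.2, ms=5.0)))
```

Keeping the selection separate from the numerical conversion means the regression coefficients can be updated without touching the priority rules.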
The unified catalog compiled here can be obtained from the author on request. A sample dataset of a homogenized catalog with 500 events is provided in the electronic version.

Earthquake catalog declustering
An earthquake sequence generally consists of foreshocks, main shocks and aftershocks. The foreshocks and aftershocks, being dependent events, should be eliminated from the catalog before seismic hazards are estimated. Several methods have been proposed for declustering a catalog (e.g. Gardner and Knopoff 1974; Reasenberg 1985; Uhrhammer 1986). Declustering was conducted using the method of Uhrhammer (1986), which follows a moving space and time window approach (Figure 2). The time and distance windows are shown in Table 3. Declustering removed 3454 events from the catalog for the period 1897–2012, which had been homogenized with the GOR1 procedure.
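A window-based declustering pass can be sketched as below. This is a simplified illustration, not the implementation used for the catalog. The window coefficients are those commonly quoted for Uhrhammer (1986) and should be checked against Table 3. The same time window is applied before and after each main shock, which is an assumption, and the function names and sample events are hypothetical.

```python
import math


def uhrhammer_window(mag):
    """Return (distance_km, time_days); coefficients as commonly quoted."""
    return math.exp(-1.024 + 0.804 * mag), math.exp(-2.87 + 1.235 * mag)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2.0 * 6371.0 * math.asin(math.sqrt(a))


def decluster(events, window=uhrhammer_window):
    """events: list of (time_days, lat, lon, mag). True marks a retained event."""
    keep = [True] * len(events)
    # Visit candidate main shocks from the largest magnitude downwards.
    order = sorted(range(len(events)), key=lambda i: events[i][3], reverse=True)
    for i in order:
        if not keep[i]:
            continue
        t0, lat0, lon0, m0 = events[i]
        d_km, t_days = window(m0)
        for j, (t, lat, lon, m) in enumerate(events):
            if j == i or not keep[j] or m > m0:
                continue  # never remove a larger event
            if abs(t - t0) <= t_days and haversine_km(lat0, lon0, lat, lon) <= d_km:
                keep[j] = False  # dependent event (foreshock or aftershock)
    return keep


# Made-up events: (time in days, latitude, longitude, Mw).
sample = [
    (0.0, 26.0, 92.0, 6.0),
    (1.0, 26.03, 92.0, 4.0),
    (1000.0, 26.0, 92.0, 4.5),
    (2.0, 29.0, 97.0, 4.0),
]
print(decluster(sample))
```

In this example only the second event falls inside both the distance and time windows of the magnitude 6.0 event, so it is the only one flagged as dependent.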

Determination of magnitude of completeness
To determine the completeness of the homogenized catalog after declustering, eight different methods were employed using the ZMAP software (Table 4). The methods most frequently used were the entire magnitude range (EMR) method (Ogata and Katsura 1993; modified by Woessner and Wiemer 2005), the maximum curvature (MAXC) method (Wiemer and Wyss 2000), the goodness-of-fit test (GFT) (Wiemer and Wyss 2000) and Mc determination by b-value instability (Cao and Gao 2002). The EMR method gives a reasonable value for the complete dataset and is therefore preferred for estimating the magnitude of completeness.
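The EMR method involves fitting a model to the whole frequency-magnitude distribution and is not reproduced here. As a simpler illustration of how a magnitude of completeness is estimated, the sketch below implements the maximum curvature (MAXC) idea, taking Mc as the magnitude bin with the highest non-cumulative count, together with the Aki-Utsu maximum-likelihood b-value above Mc. The function names, the bin width of 0.1 and the synthetic magnitudes are assumptions for illustration and do not correspond to ZMAP calls or to catalog data.

```python
import numpy as np


def mc_maxc(mags, bin_width=0.1):
    """Magnitude of completeness as the most populated magnitude bin."""
    mags = np.asarray(mags, dtype=float)
    idx = np.rint(mags / bin_width).astype(int)   # integer bin index
    counts = np.bincount(idx - idx.min())         # non-cumulative counts
    return (idx.min() + int(np.argmax(counts))) * bin_width


def b_value_aki(mags, mc, bin_width=0.1):
    """Aki-Utsu maximum-likelihood b-value for events at or above mc."""
    mags = np.asarray(mags, dtype=float)
    complete = mags[mags >= mc - bin_width / 2.0]
    return np.log10(np.e) / (complete.mean() - (mc - bin_width / 2.0))


# Synthetic magnitudes, for illustration only.
mags = [2.0] * 5 + [2.1] * 10 + [2.2] * 30 + [2.3] * 20 + [2.4] * 10
mc = mc_maxc(mags)
print(f"Mc = {mc:.1f}, b = {b_value_aki(mags, mc):.2f}")
```

MAXC is fast but is often reported to underestimate Mc for gradually curved distributions, which is one reason for comparing several methods as in Table 4.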

Summary and conclusions
This study aims to obtain a homogenized and complete earthquake catalog by developing GOR conversion relationships, following the improved error-corrected methodology of Das et al. (2018b). In this regard, regional regression relationships were derived for mb to Mw from the ISC and NEIC databases in the magnitude range 4.8 ≤ mb,ISC / mb,NEIC ≤ 6.1, using 116 and 106 events, respectively. Similarly, for the conversion of Ms into Mw, regression relationships were derived for the magnitude ranges 4.1 ≤ Ms,ISC ≤ 6.1 and 4.2 ≤ Ms,NEIC ≤ 6.1, using datasets of 93 and 53 events for ISC and NEIC, respectively. As the regional relationships do not cover the smallest magnitude range and the catalog contains a large number of smaller magnitude events, global relationships were derived for the conversion of body and surface wave magnitudes into moment magnitudes. In this regard, regression relationships were derived for the conversion of mb,ISC into Mw,GCMT in the magnitude range 2.9 ≤ mb,ISC ≤ 6.1, and of mb,NEIC into Mw,GCMT in the magnitude range 3.8 ≤ mb,NEIC ≤ 6.1. These GOR relationships were derived with η = 0.2, based on datasets of 22,803 and 22,340 events, respectively. For the regression relationship between Ms,ISC and Mw,GCMT, data from 15,728 events were used in the range 3.0 ≤ Ms,ISC ≤ 6.1. Similarly, for the relationship between Ms,NEIC and Mw,GCMT, 7579 events were used in the magnitude range 3.6 ≤ Ms,NEIC ≤ 6.1. In addition, for the conversion of higher surface wave magnitudes, 2026 events, combining ISC and NEIC data in the magnitude range 6.2 ≤ Ms ≤ 8.4, were used.
A complete and consistent homogenized earthquake catalog prepared using these relationships could provide good data for studying earthquake distribution in Northeast India. By carefully changing these original magnitudes into homogenized M w magnitudes, an obstacle was removed for the consistent assessment of seismic hazards in Northeast India.
For the region studied, data from 9845 earthquake events for the period 1897–2012 were compiled from various databases (e.g. ISC, NEIC, GCMT, IMD, NEIST). For the historical seismicity from 1897 to 1962, data were taken from the catalog by Gupta et al. (1986). For the period 1964–2010, data were compiled from the global ISC and NEIC databases. Data for 1963 were taken from the International Seismological Summary (ISS). In addition, GCMT and NEIC moment magnitude data were considered for the periods 1978–2012 and 1964–2012, respectively. The annual numbers of earthquakes recorded by ISC, GCMT and NEIC for Northeast India are shown in Figure 3. Some events for the period 1999–2006 from the India Meteorological Department (IMD) seismological bulletins, New Delhi, and from the catalog by Bapat et al. (1983) were also considered. In addition, data from global events were used to develop regression relationships for the conversion of lower magnitude ranges not covered by the regional conversion relationships. A sample of 500 homogenized events is shown in the Electronic Supplement (Table 6).

Disclosure statement
No potential conflict of interest was reported by the authors.
and the regression model is

$$y_t = \alpha + \beta x_t, \qquad \text{(A3)}$$

where β and α are the slope and intercept of the linear relationship, respectively. If $s^2_{Y_t}$ and $s^2_{X_t}$ are the sample variances of Y_t and X_t, and $s_{X_tY_t}$ is the sample covariance between Y_t and X_t, then

$$s^2_{Y_t} = \beta^2 s^2_{x_t} + \sigma^2_{\varepsilon}, \qquad \text{(A4)}$$

$$s^2_{X_t} = s^2_{x_t} + \sigma^2_{\delta}, \qquad \text{(A5)}$$

$$s_{X_tY_t} = \beta s^2_{x_t}, \qquad \text{(A6)}$$

where $s^2_{x_t}$ is the variance of the true abscissa x_t, $\sigma^2_{\varepsilon}$ and $\sigma^2_{\delta}$ are the error variances of Y_t and X_t, and the error variance ratio is

$$\eta = \frac{\sigma^2_{\varepsilon}}{\sigma^2_{\delta}}. \qquad \text{(A7)}$$

From the above simultaneous Equations (A4)-(A6), we obtain

$$\beta = \frac{s^2_{Y_t} - \eta s^2_{X_t} + \sqrt{\left(s^2_{Y_t} - \eta s^2_{X_t}\right)^2 + 4 \eta s_{X_tY_t}^2}}{2 s_{X_tY_t}}. \qquad \text{(A8)}$$

Figure 4. Plots showing the theoretical true points (x_t, y_t) corresponding to the given observed values (X_t, Y_t) using (a) SLR; (b) GOR. The plots also show the deviations between the theoretical true values y_t and the dependent-variable estimates obtained on direct substitution of X_t in Equation (3). The plots also illustrate that the Euclidean distance used in the derivation of GOR is not maintained in the estimation (Das et al. 2018a).

The estimator for α can be obtained from the relation

$$\alpha = \bar{Y}_t - \beta \bar{X}_t, \qquad \text{(A9)}$$

where $\bar{X}_t$ and $\bar{Y}_t$ are the average observed values. GOR estimations are inappropriately used in most of the seismic literature; it is therefore important to discuss the issues in order to provide a complete view of the subject. The discussion is conducted using two cases (Das et al. 2018a, 2018b):

Case 1. A standard linear regression (SLR) line is derived considering the data pairs (X_1, Y_1), (X_2, Y_2) and (X_3, Y_3), with the corresponding true points on the SLR fitting line, that is, (x_1 = X_1, y_1), (x_2 = X_2, y_2) and (x_3 = X_3, y_3), respectively, obtained by vertical distance minimization. These true points on the line are used for estimating the best-fitting SLR line, minimizing the vertical residuals. On substituting the independent variables X_1, X_2, X_3 in the SLR line obtained, the exact true points are recovered (see Figure 4(a)). Thus, the statistical distance used in the derivation of the SLR line is also used in the estimation of the dependent variable for a given independent variable.
Case 2. A GOR fitting line is derived considering the data pairs (X_1, Y_1), (X_2, Y_2) and (X_3, Y_3), with errors in both variables. The true points of these data pairs on the GOR line, that is, (x_1, y_1), (x_2, y_2) and (x_3, y_3), are obtained by minimizing the Euclidean distance. On substituting the values X_1, X_2, X_3 in the GOR line obtained, the corresponding true points cannot be recovered, unlike in Case 1. Therefore, the conventional GOR procedure does not follow the statistical Euclidean distance criterion: instead of the true points, entirely different points on the GOR line are obtained on substitution (see Figure 4(b)).
In view of the above discussion of Cases 1 and 2, the author recommended first estimating the true value x_t through an SLR relationship between the true abscissa (x_t) on the GOR line and X_t, and then obtaining the Y_t estimate by substituting the estimated x_t in the GOR relationship; alternatively, an SLR relationship between X_t and the true ordinate (y_t) can be derived directly.
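The recommended two-step estimation can be sketched in Python as follows. The sketch makes three assumptions that are not spelled out in the text: the GOR slope is computed with the standard errors-in-variables closed form, the error variance ratio `eta` is the error variance of the dependent variable divided by that of the independent variable, and the true abscissa is taken as the point on the GOR line closest to each observation in the error-variance-weighted sense (which reduces to perpendicular projection for `eta = 1`). All function names and sample values are hypothetical.

```python
import numpy as np


def gor_fit(x, y, eta):
    """Slope and intercept of the GOR line (standard closed form)."""
    s_xx = np.var(x, ddof=1)
    s_yy = np.var(y, ddof=1)
    s_xy = np.cov(x, y, ddof=1)[0, 1]
    diff = s_yy - eta * s_xx
    slope = (diff + np.sqrt(diff ** 2 + 4.0 * eta * s_xy ** 2)) / (2.0 * s_xy)
    return slope, y.mean() - slope * x.mean()


def true_abscissa(x, y, slope, intercept, eta):
    """Abscissa of the point on the GOR line nearest to each (X_t, Y_t)."""
    return x + slope * (y - intercept - slope * x) / (eta + slope ** 2)


def improved_gor_predictor(x, y, eta=1.0):
    """Return a function that estimates the dependent variable from new X values."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = gor_fit(x, y, eta)
    x_true = true_abscissa(x, y, slope, intercept, eta)
    # Step 1: SLR of the true abscissa on the observed X_t (x_t ~ c + d * X_t).
    d, c = np.polyfit(x, x_true, 1)

    def predict(x_new):
        # Step 2: substitute the estimated x_t in the GOR relationship.
        x_est = c + d * np.asarray(x_new, dtype=float)
        return intercept + slope * x_est

    return predict


# Made-up magnitude pairs, for illustration only.
X = np.array([4.1, 4.6, 5.0, 5.3, 5.9, 6.2])
Y = np.array([4.4, 4.7, 5.2, 5.3, 6.0, 6.1])
predict = improved_gor_predictor(X, Y, eta=0.2)
print(predict([4.5, 5.5]))
```

Because y_t is a linear function of x_t on the GOR line, regressing the true ordinate directly on X_t gives the same predictor, which corresponds to the second option mentioned above.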
It is observed that Castellaro et al. (2006) and Castellaro and Bormann (2007) used the Euclidean distance inappropriately when deriving the GOR line. Thus, their conclusions are invalid for the conventional GOR procedure, and it is incorrect to rely on these studies for future reference. To point out the inaccuracy, the Euclidean distance of Fuller (1987) is denoted by S_F, that of Williamson (1968) by S_W and the distance of Castellaro et al. (2006) by S_C. Table 5 shows that S_F and S_W are the same, but S_C is quite different because Castellaro et al. (2006) and Castellaro and Bormann (2007) used incorrect equations (details are given in Das et al. 2018a).