Semi-automated template matching and machine-learning based analysis of the August 2020 Castelsaraceno microearthquake sequence (southern Italy)

Abstract The accurate characterization of microearthquake sequences allows seismologists to better understand the physical processes involved in earthquake nucleation and rupture propagation and to gain insights on fault geometry at depth. Standard procedures for seismic sequences analysis are based on manual detection and phase-picking, requiring a huge amount of work from expert seismologists, particularly in the case of microseismic events. Here we show how the investigation of a low-magnitude foreshock-mainshock-aftershock sequence, occurred in August 2020 close to Castelsaraceno village (southern Italy), greatly benefited from the application of a semi-automated template matching and machine-learning based workflow. The phase-picking was automatically performed through a deep-learning algorithm on 202 microearthquakes detected between July and October 2020, followed by an automatic multi-step absolute and relative earthquake location procedure. The 72 relocated events of the seismic sequence were clustered in time (7–12 August) and in a narrow range of depths (10–12 km). The Ml 2.1 foreshock doublet and the Ml 2.9 mainshock identified a persistent asperity. The joint analysis of aftershocks distribution, the mainshock focal mechanism, and the geology of the study area suggest the occurrence of the sequence along a NNE-SSW left-lateral, transtensional fault in the brittle portion of the crystalline basement.


Introduction
When talking about earthquakes, public opinion immediately turns its attention to the most catastrophic seismic events, which are usually the ones making the news CONTACT Serena Panebianco serena.panebianco@imaa.cnr.it;Vincenzo Serlenga vincenzo.serlenga@imaa.cnr.itSupplemental data for this article can be accessed online at https://doi.org/10.1080/19475705.2023.2207715.and affecting people's lives the most.Nevertheless, the number of large earthquakes is definitely smaller than the low magnitude ones (Gutenberg and Richter 1954).This latter is the reason why the seismological community puts its efforts into the analysis of microseismicity: the higher number of available data may serve as a magnifying glass through which it is possible to inspect the mechanisms underlying seismic activity (Brodsky 2019).
In this context, an important role is played by micro-earthquake sequences, during which the seismic events are spatially, temporally and dynamically related to each other.Mogi (1963) classified three distinct types of seismic sequences: a) mainshockaftershock; b) foreshock-mainshock-aftershock; c) earthquake swarm.Each distinct typology is characterized by a different pattern of successive shock occurrence, in turn related to the structural state and the space distribution of stress inside the crust.Over the years, seismologists dug into waveforms generated by microearthquake sequences for several purposes.The most immediate is to gain insights on the geometry of fault structures at seismogenic depth: in that way, previously unmapped fault structures as well as geometrical complexities due to the presence of multiple fault strands, kinks, stepovers have been illuminated by the space distribution of earthquake hypocenters, obtained by both absolute and relative seismic location methods (e.g.Waldhauser and Ellsworth 2002;Valoroso et al. 2013;Shelly et al. 2015;Stabile et al. 2021).The accurate study of micro-earthquake sequences may also allow gaining precious insights on the foreshock behaviors and on nucleation processes (Ross et al. 2019b), on the role of fluids and/or creep governing the migration of hypocenters along faults (e.g.Dublanchet et al. 2015;Shelly et al. 2016) and the aftershock propagation (Miller et al. 2004), the role of static stress transfer in the event-to-event triggering (Stabile et al. 2012;Ellsworth and Bulut 2018).
The seismological appeal of very small earthquakes pushes seismologists to overcome the intrinsic difficulty of their detection: the lower is the magnitude, the lower is the signal-to-noise ratio (SNR, hereafter) and the more demanding will be the challenge to unearth them in the continuous seismic data-stream.Nonetheless, the manual detection of low magnitude seismic events has a twofold disadvantage: firstly, it is quite time-consuming, secondly the standard earthquake catalogs, filled in by the activities of human analysts, are inherently incomplete (Mignan and Woessner 2012) being thus strongly deficient in resolution to highlight some precious characteristics about the space-time evolution of seismic swarms and sequences.Therefore, several algorithms have been developed and adopted by the seismological community for lowering the completeness magnitude of finer resolution seismic catalogs through the detection of almost completely hidden earthquakes (e.g.Adinolfi et al. 2020), this effort being particularly worthy during seismic sequences (e.g.Roberts et al. 1989;Yoon et al. 2015;Caffagni et al. 2016;Bergen and Beroza 2018;Festa et al. 2021;Stabile et al. 2021).Indeed, detecting low magnitude events leads to a decrease of the average time separating consecutive earthquakes, thus allowing a better understanding of the hidden processes which may justify the possible connection of an event of the seismic sequence to another.Nonetheless, driving down the minimum magnitude of detection of seismic events determines a meaningful increase in the data to be analyzed.Therefore, Machine Learning (ML hereinafter) algorithms are surging forward to help seismologists in the management and in the semi-automated analysis of the huge seismic data volume currently acquired by the modern and high-density seismic networks.
It is in this very challenging and current scientific context that our study is situated.We apply a semi-automated, machine-learning based approach on a microearthquake seismic sequence, with local magnitudes ranging from À0.8 to 2.9.The seismic sequence occurred in August 2020 close to the municipality of Castelsaraceno in the High Agri Valley (HAV), southern Italy.This area is well known for its high seismic hazard, strictly related to both induced (Stabile et al. 2014a;2014b;Improta et al. 2015;Rinaldi et al. 2020) and up to M7 natural seismicity, as testified by the strong earthquake that occurred in this area in 1857.Therefore, a widespread monitoring of the HAV is carried out by the deployment of seismic stations and multiparametric geophysical networks (e.g.Stabile et al. 2020).Furthermore, the intense oil exploration activity led to an advanced knowledge of the geology of this sector of the chain (Finetti et al. 2005;Patacca andScandone 2007, 2013;Bonardi et al. 2009;Vezzani et al. 2010).These two factors make the HAV a natural laboratory for seismological studies.
Hereafter, in the first section we will provide the geological framework of the seismic sequence; then we will describe the seismic networks adopted for our study and the methodological workflow implemented and adopted, which is composed of: earthquake detection, automated phase arrival time picking and hypocenter location in a detailed 3D velocity model, and source characterization.The main results will be discussed in the last section, considering the tectonic and geologic conditions of the investigated area.

Geological framework of the seismic sequence
The low-magnitude earthquake sequence that occurred in August 2020 to the northwest of the Castelsaraceno village was characterized by a Ml 2.1 foreshock (origin time: 2020-08-07, 08:52:31 UTC), followed a few hours later by the Ml 2.9 mainshock (origin time 2020-08-07, 13:34:37 UTC).After the mainshock, several aftershocks occurred, classifiable as microseismic events with Ml 1.1.
The sequence is located in the southwestern sector of the HAV, in the axial region of southern Apennines (Figure 1a).This orogenic segment is a fold-and-thrust belt developed from the Upper Oligocene-Lower Miocene towards NE (Gueguen et al. 1997;Patacca and Scandone 2007).Figure 1b shows the location of the seismic sequence within the geological and structural setting of the study area along with the trace of a geological section (AB, Figures 1b and 10) suitably chosen to highlight the structural architecture of the main tectono-stratigraphic units.The highest mesozoic North-calabrian Unit with related thrust-sheet-top deposits of the Albidona Fm overthrust both the Lower Miocene-Jurassic shallow water carbonates of the Appennine Platform, and deep-sea Lower Cretaceous-Triassic Lagonegro Units (Carbone et al. 1991(Carbone et al. , 2018)).These units, in turn, overthrust the Upper Messinian -Lower Pliocene siliciclastic sediment of the Apulian Platform (green unit in Figure 10) consisting, from the top to bottom, of: i) Messinian -Lower Pliocene foredeep siliciclastic sediments and ramp carbonates succession some hundred meters thick; ii) Lower Cretaceous-Triassic shallow-water carbonates platform, stromatolitic dolomites and evaporites about 6 km thick (Patacca andScandone 2001, 2007).The Apulian Platform succession was deposited on terrigenous and subordinately carbonate Permo-Triassic sediments, several kilometers thick in turn covering the crystalline basement of the upper continental crust (Patacca et al. 2008).From the tectonic point of view, detailed geological-structural studies on the evolution of the southern Apennines and the neighboring sectors of the study area (Catalano et al. 2004;Mazzoli et al. 2014;La Bruna et al. 2017;Bucci et al. 2019;Bello et al. 2022) showed that a complete orogenic cycle is recorded, with the following tectonic steps: (1) lithosphere subduction below a NE verging accretionary wedge and emplacement of the allochthonous units by means of low-angle thrust faults (Lower Messinian -Early Pliocene boundary) coupled with re-activation of high angle reverse faults within the Apulian carbonates (Lower Messinian -Pliocene); (2) left-lateral, strike-slip motion along NW-SE striking faults (Lower Pleistocene); (3) extensional stress regime with the activation of several normal faults and reactivation of pre-existing tectonic structures (Middle Pleistocene-Holocene).This extensional tectonic regime is still active, at relatively low rates (2-5 mm/yr; Ferranti et al. 2014) as witnessed by NW-SE striking, oppositely dipping, high-angle normal and oblique fault systems (Figure 1a).These border the HAV basin and currently represent the main seismogenic structures in the area: the Monti della Maddalena Fault System (MMFS) and the Eastern Agri Fault System (EAFS) (Cello et al. 2003;Maschio et al. 2005;Giocoli et al. 2015;Improta et al. 2017;Bello et al. 2022).Looking at its historical seismicity (e.g.CPTI11 catalogue; Rovida et al. 2011) the HAV is one of the highest seismic hazard areas in Italy, with seven Mw > 4.5 historical events occurring in the last 200 years, including the 1857 Mw 7.0 Basilicata earthquake (Burrato and Valensise 2008).However, the natural seismicity currently consists of sparse, low-magnitude (Ml 4.0) earthquakes (Stabile et al. 2020) which can be locally clustered in microseismic sequences and swarms (Stabile et al. 2015;Improta et al. 2017;Serlenga and Stabile 2019).

Data and methods
In this section, we describe the virtual seismic network which provided seismic records analyzed in this work; then, we illustrate the procedure of manual earthquake identification.Finally, we depict in detail the proposed semi-automated workflow (Figure 2) which was applied for the refined detection, location and characterization of the Castelsaraceno sequence.

Virtual seismic network
Seismic data were mainly recorded by the stations of the local HAVO network (formerly INSIEME; Stabile et al. 2020), located at a maximum epicentral distance of $20 km from the sequence cluster.The seismic network (FDSN code: VD, https:// doi.org/10.7914/SN/VD) is part of the High Agri Valley Geophysical Observatory (HAVO), a multi-parametric network managed by the CNR-IMAA research institute designed to achieve two main purposes: (a) to study the seismic processes related to the occurrence of events belonging to two induced seismicity clusters in the HAV and (b) to provide the scientific community with open-access, high-quality seismic data (Stabile et al. 2020).The HAVO network is composed of eight stations deployed in an area of about 17 km Â 11 km.The minimum distance between stations ranges between 2.7 km (SARSB and SARCL stations) and 5.4 km (MONCM and MONTM).Each station is equipped with triaxial weak-motion broadband sensors: six 0.05-100 Hz and two 0.0083-100 Hz (MONCM, SARCL) Trillium Compact Posthole (TCPH) seismometers (see Stabile et al. 2020, for further details).To better investigate the main properties of the seismic sequence and to improve the quality of earthquake locations, a virtual seismic network made by a total of 27 seismic stations was put together, adding recordings of external stations located within about 60 km distance from the center of the HAVO network (Table 1): (a) 11 seismic stations of the Italian National Seismic Network (RSN) with a 100 Hz data sampling frequency (FDSN codes: IV, https://doi.org/10.13127/SD/X0FXnH7QfY;MN, https://doi.org/10.13127/SD/fBBBtDtd6q) and managed by Italian National Institute of Geophysics and Volcanology (INGV); (b) 7 stations belonging to the Irpinia Seismic Network (Weber et al. 2007;Stabile et al. 2013) with data sampled at 250 Hz (FDSN code: IX), and (c) the MARCO station belonging to the GEOFON network (FDSN code: GE, https:// doi.org/10.14470/TR560404)with a sampling frequency of 100 Hz.

Semi-automated workflow
Standard workflows for the study of low-magnitude earthquake sequences mainly involve manual event detection and phase-picking, both quite time-consuming.In this study, we used manually detected earthquakes occurred in the period between 7 and 10 August as input to our workflow, thus simulating a dataset produced by a seismic observatory from manual revision by expert seismologists.Indeed, the starting catalog has been realized by a visual inspection of the seismic data stream through a dedicated intranet system (WebObs, Beauducel et al. 2020) available at the CNR-IMAA data center.WebObs consists of a near real-time multi-record strip-chart (SefraN) of the seismograms of the configured stations from the local HAVO network and some of the stations belonging to the virtual network (MARCO, SLCN, MCEL, MTSN, SIRI, SCHR).
The proposed semi-automated, 4-stages workflow was aimed at a more thorough and faster analysis of seismic sequences, involving the integration of manual, semiautomated and automated techniques.The analysis developed as follows (Figure 2): 1. we applied a single-station waveform template matching technique looking for microearthquakes belonging to the sequence (Roberts et al. 1989;Stabile et al. 2021); 2. we discriminated between true and false events detected in the previous automatic step by visual inspection; 3. the P-and S-wave arrival times automatic picking was performed by a deepneural-network algorithm (PhaseNet; Zhu and Beroza 2018); 4. first, absolute locations were automatically performed using a probabilistic nonlinear approach (NLLoc; Lomax et al. 2000) in an iterative multi-step procedure to clean arrival times from inconsistent picks and progressively reduce location errors; then, relative double-difference locations (HypoDD; Waldhauser and Ellsworth 2000) have been automatically performed starting from the absolute locations from the previous stage: to this purpose, we used restrictive location parameters (see Section 3.2.2) to further select the located earthquakes belonging to the sequence and to improve the seismogenic fault imaging resolution.
Then we characterized the source of the catalogue earthquakes retrieved by the workflow through the estimation of: a. the local magnitude for all the absolute located events; b. the mainshock source parameters (seismic moment M 0 , corner frequency f c , and stress drop Dr) through the modeling of the S-wave displacement spectra (SourceSpec algorithm, Satriano 2021); furthermore, we estimated the focal mechanism of the mainshock (BISTROP, De Matteis et al. 2016) and assumed it as representative of the sequence; c. the source radius of all the earthquakes: under the hypothesis of self-similarity of the micro-seismic sequence (Aki 1967) we set the stress drop Dr equal to the one computed from the S-wave displacement spectral inversion of the mainshock (see Section 3.2.3).

Earthquake detection and phase arrival time picking
For the detection of small events possibly missed by the manual approach due principally to their low SNR, we applied the single-station template matching algorithm (Stabile et al. 2021).It takes into account waveform similarity between 3-component continuous data streams recorded at a single station, and properly chosen earthquake waveforms (master templates), recorded at the same station.The analysis was performed on seismic data recorded by the two nearest stations to the cluster: SARCL (3.5 km epicentral distance from the mainshock) and SPINS (5.9 km epicentral distance from the mainshock).We used 4 and 6 master templates for SARCL and SPINS stations, respectively (Supplementary Figure S1 and S2) which were therefore cross-correlated with the respective continuous data stream recorded in the period between July and October 2020.In order to prevent any subjective choice of the master templates, the following procedure guided the templates' selection: i) as initial step, the mainshock was used as master template; ii) a list of automatically detected events was thus obtained; iii) the first 'not-manually-identified' earthquake belonging to the list of the template matching detected events was then used as further master template; iv) a new list of detected events was retrieved; v) as in the point iii), and so on, until the catalog of manually detected events was fully retrieved.The final selected templates are summarized in Supplementary Table ST1: three common master events were used, such that a total number of seven templates was selected for the analysis.Following a similar procedure to that one adopted by Stabile et al. (2021), all the three-component templates and the continuous data streams ground velocities of each analyzed station were first band-pass filtered in the range 1-35 Hz, and then differentiated to obtain their ground acceleration to amplify the peak frequency of the signals.For each template, a cross-correlation window of 1.2 s was selected around the first P-wave arrival on the vertical component (CHZ) and around the S-wave arrival on the horizontal components: CH1, CH2 for SARCL and CHE, CHN for SPINS.Finally, only the signals that simultaneously fulfilled the following two main conditions were detected: (1) the obtained cross-correlation parameter (XC') between the master template and the station component data stream was greater than 0.5; (2) the previous condition was satisfied on the vertical component and at least on one of the two horizontal components.We set a relatively low value of the detection threshold (XC' ¼ 0.5) as we wanted to maximize the number of detected events.However, these led to a high number of false detected events which were manually dismissed by visual inspection performed by expert seismologists (e.g.Stabile et al. 2021).The procedure enabled us to increase our manually detected dataset of more than a hundred events (see Section 4) and to obtain a total number of 202 detected earthquakes in the analyzed timespan (July-October 2020).
For the P-and S-waves arrival time pickings, we applied a completely automated approach through the application of the PhaseNet algorithm (Zhu and Beroza 2018).PhaseNet is a deep neural network algorithm, which allows to perform seismic phase picking without any hand-designed input parameter.Indeed, thanks to their multilayer neural networks structures, deep-learning algorithms are capable of 'understanding' data in a hierarchical way, avoiding the need for human operators to formally specify the training features that machines require to 'learn' efficiently.For each earthquake, three minutes-long waveforms recorded on the three-components of all the stations of the virtual network, bandpass filtered in the range 2-40 Hz, have been given in input to PhaseNet, starting 60 s before the detection time (obtained from the single-station template matching algorithm), and ending 120 s later; for each record, a 30-seconds sliding window was chosen for the analysis.The P-wave, S-wave and noise probability distributions are calculated in output from the deep-learning algorithm, which defines the corresponding P-and S-wave arrival times (Figure 3) where the resulting probability distribution peak values overcome a certain threshold.A threshold of 0.3 was chosen for our dataset (Figure 3).

Absolute and relative earthquake locations
The P-and S-wave arrival times automatedly picked by PhaseNet were inverted in the 3-D velocity model of the HAV (Serlenga and Stabile 2019) for retrieving the absolute location of earthquakes.We applied the differential time (EDT) method implemented in the non-linear global approach (NonLinLoc).To our purpose, we implemented an automated, iterative four-step absolute earthquake location: 1) a first absolute location is performed from the inversion of all the P-and S-wave arrival times; 2) we inverted only the data with residuals lower than 3 s, properly corrected by the station delays obtained in the step 1); 3) we inverted only the data with residuals lower than 1 s, properly corrected by the station delays obtained in the step 2); 4) we inverted the remaining data, corrected by the station delays obtained in the step 3).Such a procedure aims at progressively discarding, at each iteration, the high residuals seismic phases (both P and S phases residuals), possibly related to wrong and/or multiple automated arrival time picks.
Finally, the locations were refined by applying the double-difference method (HypoDD code, Waldhauser and Ellsworth 2000), starting from the hypocentral parameters determined from the absolute locations and solving the double-difference equations in the same 3-D P-and S-wave velocity model used for absolute locations.Relative locations were performed with highly restrictive inversion parameters.Thus, only events with a maximum hypocentral separation of 7 km (MAXSEP parameter) and with a minimum number of 6 P-and S-wave linked differential arrival time observations (MINLINK parameter) were relocated.We set a maximum distance between linked pair of seismic events equal to 2 km (WDCC and WDCT parameters), and a threshold value for phase residuals equal to 0.1 s (WRCC and WRCT parameters).Relative locations were performed by linking each event to a maximum of 20 other events of the cluster (MAXNGH parameter), thus obtaining a total number of 14382 differential arrival times of P-and S-waves, the latter being computed both using catalog data (CT) and waveforms cross-correlation (CC) computed in the frequency domain using the CCHAR program (Rowe et al. 2002).The conjugate gradient method (LSQR, Paige and Saunders 1982) implemented in HypoDD was used for the minimization of the double-difference residuals for pairs of earthquakes at each station.Indeed, it is able to solve a large system of equations efficiently and hence it is more suitable for automatic locations (Waldhauser and Ellsworth 2000).Nevertheless, since it is well known that generally location errors are grossly underestimated with LSQR (Waldhauser and Ellsworth 2000), we a posteriori manually assessed them by using SVD (Singular Value Decomposition) on a subset of events.

Source characterization: magnitude computation, focal mechanism, and spectral inversion
Along with the hypocentral location, the characterization of seismic source parameters enables to gather insights on the physical and mechanical processes involved in the earthquake nucleation and the energy release.For each absolute located event, we estimated the local magnitude applying the formulation proposed for southern Italy by Bobbio et al. (2009): where Ml i is local magnitude of the i-th located event, A ij is the peak displacement (mm) measured from the signal of the i-th event recorded by the j-th station convolved for the response function of the Wood-Anderson seismograph, R ij the hypocentral distance (km) of the j-th station from the i-th event and w is the Huber mean estimator applied to the set of single-station magnitude values (Huber 1964).
We determined the focal mechanism to retrieve information on the mainshock fault plane orientation and the stress field in which it occurs (Figure 7).Knowing the azimuth and take-off angle at which the ray leaves the source -computed for an assumed location and seismic velocity model -the most used approach for focal mechanism calculation is the observation of the P-wave first-motion polarities at different stations, and their projection over a reference sphere (focal sphere) around the source.To better constrain the fault plane solutions, we instead adopted a Bayesian approach implemented in BISTROP (De Matteis et al. 2016), which allows for the joint inversion of the P-wave polarities and the long-period spectral level P/S ratios.This method leverages the relation between the theoretical radiation pattern and the low frequency spectral amplitude computed from the P-and S-waves and exploits the possibility of easily computing the spectral ratios for low-magnitude earthquakes (M < 3) as well.Twelve first motions (FM) polarities and nine P/S ratios were adopted for the inversion.The robustness of the final solution was confirmed by the low values of the Kagan angle median (KA-Med) and median absolute deviation (KA-Mad), equal to 4.3 and 1.3 , respectively (Figure S3).These values are computed from the difference between the final focal mechanism and each solution with a probability higher than 90% of the probability of the best solution retrieved.Therefore, the smaller the KA-Med and KA-Mad are, the more constrained the focal solution is (Adinolfi et al. 2022).
The source parameters (M 0 , f c , Dr) of the mainshock were retrieved through the S-wave displacement spectra inversion using the SourceSpec algorithm (1.5 Release) (Satriano 2021).The code estimates single event source parameters from station recordings through a x À2 Brune's source spectrum inversion model (Brune 1970).A total of 17 stations of the virtual network (out of 27) were used for the inversion: seven stations from the HAVO Network (sampled at 250 Hz); MARCO, CAGG, SIRI, MGR, SLCN, PTRP (data sampled at 100 Hz) and MRN3, CGG3, PGN3, SRN3 (data sampled at 125 Hz).We performed the inversion only for stations with a minimum signal-to-noise spectral ratio (spectral_sn_min) equal to 5.0, ignoring also the ones lacking in automatic picks.The source parameters inversion was performed through a full grid search approach to efficiently sample all the combinations in the (M 0 , f c , t Ã ) parameters space.The t Ã is the attenuation parameter, defined as the ratio between the wave travel time and the quality factor Q. We discretized the parameters domain in an 80 Â 200 x 150 nodes' grid.The initial values of corner frequency and seismic moment were estimated at the first inversion step by the code.On the other hand, the t Ã initial value for the inversion (t Ã 0 ) was set by us based on the knowledge of elastic and anelastic properties already available for the study area.It was chosen a value of 0.045, as calculated from: The term x is the mean epicentral distance calculated as the average epicentral distance of all the stations from the mainshock; V S and Q S are the average S-wave velocity and quality factor of the area, set equal to 3.4 km/s (Improta et al. 2017) and 200 (Amoroso et al. 2017), respectively.With a similar approach the t Ã bounds of the inversion grid (t Ã min , t Ã max ) were estimated equal to 0.01 and 0.085, respectively, given by: where x min and x max are the minimum (12 km, SARCL) and maximum (44 km, PGN3) station epicentral distance from the earthquake, respectively; Q S,min and Q S,max are the minimum and maximum S-wave quality factor estimated for the area equal to 150 and 300, respectively (Amoroso et al. 2017).Finally, we assumed a crustal density of 2700 kg/m 3 and a P-wave velocity (V P ) of 6.5 km/s (Improta et al. 2017).We defined the seismic moment inversion bounds in terms of the respective moment magnitude (M W ) bounds allowing a variability of 0.35 around the initial value.The f c bounds were automatically set by the inversion code.
The mainshock source parameters were estimated by inverting the spectral amplitudes computed at each station on a 2.0 s time window (win_length) for both S-wave signal and noise; the two time-windows were zero-padded up to 10 s, thus allowing the spectral resolution to be increased.We chose a start time of the noise window, with respect to the P-wave arrival time (pre_p_time), of 3.0 s and a start time of the S-wave window (pre_s_time), antecedent to the S-wave arrival time, of 0.4 s (Figure 6).Signal waveforms were bandpass filtered at each station before the analysis.Since stations of different networks adopted different sensors (e.g.accelerometer, short period, broadband) and were characterized by distinctive sampling frequencies, different frequency ranges for the filtering were chosen.The same minimum value of 0.1 Hz was selected, while the maximum filtering value for each station was chosen equal to the Nyquist frequency (except for the HAVO stations for which a max frequency of 110 Hz was chosen).
Looking at the S-waves and noise displacement spectra, we observed at many stations an abruptfall of both the noise and the signal amplitude at about 40-60 Hz (Supplementary Figure S4).The latter effect was investigated after checking that no mistakes were made in the signal pre-processing.Looking at stations power spectral densities (PSD) calculated for a long-time span of about one year (from 01-07-2020 to 30-05-2021 at http://services.iris.edu/mustang/)we observed a constant value of about À130 dB between 40-100 Hz (Supplementary Figure S5).This anomaly, consistent with the observed abrupt fall of the spectra at those frequencies, could be related to the near-site attenuation K(x) (Hanks 1982;Scherbaum 1990;de Lorenzo et al. 2010).These effects on the high-frequency level are a known problem for source parameters studies, particularly for microearthquakes, since they can alter the shape of the earthquake spectra at frequencies comparable to the corner frequency (Moratto et al. 2019).Moreover, attenuation site effects are also involved in the so-called corner frequency saturation (Hanks 1982), which is an apparent departure of small-magnitude events from the scaling law by Aki (1967).On these grounds, we performed the mainshock source parameters spectral inversion in the frequency range 0.5-40 Hz (Figure 6).Given the microseismic nature of the Castelsaraceno sequence, we chose to characterize all the other earthquakes belonging to it only in terms of the seismic moment (M 0 ) and the resulting moment magnitude (Mw) to avoid underestimations of the corner frequency.Indeed, the M 0 estimation is better constrained by the inversion because it is related to the low frequency amplitudes of displacement spectra.With this purpose we estimated M 0 for each microearthquake of the sequence by using the same inversion parameters applied for the mainshock, except for: (a) a lower minimum signal-to-noise spectral ratio (spectral_sn_min) of 3.0; (b) a zero-padding of the selected time window (spectral_win_length) up to 5 s (instead of 10 s).Then, the retrieved M 0 values were used to estimate the source radius r of all the events of the sequence, in the assumption of a circular plane rupture and using as fixed stress drop (Dr) the value estimated for the mainshock.Thus, we calculated the source radius of each event as (Borok 1959;Madariaga 1976):

Results and discussion
Manual detections through the WebObs interface let us initially identify a total number of 65 local earthquakes that occurred in a time span between August 7 and August 10, 2020, which were well recorded by the eight broadband stations of the HAVO network.Among them, we identified a seismic sequence close to Castelsaraceno town with the main Ml 2.1 foreshock and the Ml 2.9 mainshock occurred a few hours later.The last manually recognized earthquake of the sequence was detected on August 10 (origin time: 09:35:49 UTC).The high-resolution of the local HAVO seismic network allowed us to manually detect low magnitude events (median Ml ¼ -0.1) (Supplementary figure S6).However, the subsequent application of the single station template matching algorithm to SPINS and SARCL stations enabled us to increase the total number of catalog earthquakes to 202.We thus proved the template matching method both to be a powerful tool to identify a larger number of microearthquakes escaped from manual detection and to fastly investigate a wider period (July-October 2020) looking for further events ascribable to the Castelsaraceno sequence.The described automated absolute location approach (see Section 3.2.2) enabled us to progressively discard high residual seismic phases, possibly related to accidental picking mistakes or multiple pickings performed by the PhaseNet algorithm.At the end of the procedure, 167 out of the total of 202 events detected between 28/07/2020 and 12/10/2020 were located, and only 109 earthquakes occurred at 8 km epicentral distance from the mainshock (Figure 4a).The subsequent automatic double-difference, relative location allowed us to refine earthquake locations and further select events belonging to the sequence, obtaining a more accurate imaging of the fault.Indeed, earthquake relocations are characterized by both low horizontal and vertical location errors (median values of 0.08 km and 0.07 km, respectively), which are one order of magnitude smaller than the ones of the absolute locations (Supplementary Figure S7).The a posteriori assessment of relocation errors through SVD, performed on a subset of 50 events located near the mainshock (within 1.5 km epicentral distance), provided equivalent results (median horizontal error of 0.05 km; median vertical error of 0.04 km).
In conclusion, the initial number of 202 manually and automatedly detected earthquakes was reduced to a final dataset of 101 relatively relocated events.It comprised earthquakes that occurred between August 7 and October 12, 2020.Most of these fall inside the same cluster around the mainshock identified from absolute locations (Figure 4a).
The cluster comprises events with latitudes in the range 40.183 -40.253 , longitudes in the range 15.916 -15.943 and depths of 10-12 km, including the mainshock.Besides, the spatial distribution of the relocated events (Figure 4b) highlights three additional small clusters occurred in the area: (a) the first, consisting of 4 events, occurred between 7 and 8 August at similar depths of the main cluster ($11 km) but located to the East  ) of the study area and $9 km far from the mainshock; (b) the second is a swarm-like cluster with 13 events occurred in a larger timespan (21/08/2020 À 12/10/2020) which are characterized by shallower depths ($ 7 km) and are located $5 km far from the Castelsaraceno sequence  just to the SW of the Pertusillo Lake (Figure 4b); (c) lastly, a wider group of 9 events was located at $7 km south from the mainshock, in the southernmost limit of the virtual network , and is characterized by depths ranging between 8 and 11 km and occurred between 7 and 19 September 2020.Both spatial and temporal distributions of refined locations (Figure 5) allow a clear discrimination between the seismic events belonging to the sequence and the others.Indeed, the main cluster identified in the map around the mainshock is localized in a narrow range of depths between 10-12 km (black pointed circle, Figure 5a).These events occurred at relatively close distances but also in a relatively short time span (black dotted circle, Figure 5b), thus identifying the seismic sequence.The previous observations enabled us to identify 72 refined located earthquakes belonging to the cluster, which occurred between 7-12 August 2020 with a typical behavior of a foreshock-mainshock-aftershock sequence.Before the mainshock, 17 events were recorded in the morning on August 7 -between 4 and 11 am (UTC) -among which the aforementioned Ml 2.1 foreshock; furthermore, 54 aftershocks occurred on August 7 and for the following 5 days, with the last event of the sequence occurred on August 12, 2020 (origin time 17:50:02 UTC).
We estimated the source parameters (M 0 , f c , Dr) for the mainshock of the sequence through the S-wave displacement spectral inversion described in Section 3.2.3.The full inversion results at all the analyzed stations are reported in Supplementary material.The grid search misfit functions show the intrinsic correlations of the inversion parameters (Sonley and Abercrombie 2006;Zollo et al. 2014) with Mw negatively correlating with f c and quite well constrained by the inversion: the observed anti-correlation between the two parameters may be due to the irregular shape of the low-frequency displacement spectrum.The t Ã correlates with both corner frequency and Mw.The narrow shape of the misfit functions around the absolute minima of each subspace (Figure 6c) testifies that the best fit model parameters of the inversion are well constrained.The analysis provided average values of mainshock Mw, f c and Brune Dr equal to 2.7 (± 0.1), 11 (-2; þ3) Hz and 2.6 (-1.6; þ3.8) MPa, respectively.Referring to the relationship between Mw and Ml proposed by Zollo et al. (2014) for the Irpinia area (southern Apennines), for which Mw obtained for the mainshock is coherent, within the error, with the estimated local magnitude (Section 3.2.2) (Ml 2.9).Moreover, the obtained stress drop Dr ¼ 2.6 MPa is consistent with the stress drop estimations obtained from other authors for natural events in southern Italy (e.g.Stabile et al. 2012;Festa et al. 2021).
The application of the BISTROP method enabled us to obtain the fault plane solutions (FPS) of the main event: (a) the first nodal plane (FPS1) is characterized by a strike of 308 , a dip of 68 and rake of À126 ; (b) the second nodal plane (FPS2) has strike 190 , dip 41 and rake À34 , which are both consistent with the actual extensional stress-regime.We projected the refined hypocenters of the 72 earthquakes belonging to the sequence onto two sections orthogonal to the FPSs and centered in correspondence of the mainshock (Figure 7) with the aim to identify the principal plane through the aftershock signature.Indeed, the FPS2 orientation is in great accordance with the microearthquakes alignment along the dip in the corresponding orthogonal section (Figure 7c), thus indicating an anti-apenninic left-lateral strike-slip faulting kinematic with normal component (rake À34 ).The fault plane solution of the mainshock, constrained by the distribution of aftershocks (strike 190 ; dip 41 ), suggests the reactivation in the current extensional stress regime of a thrust which is compatible with those associated with left-lateral strike-slip NW-SE trending faults developed during Lower Pleistocene.
The retrieved seismic moment values of the other events of the sequence range between a minimum of 4.31 x 10 9 N m and a maximum of 1.47 x 10 13 N m.Looking at the cumulative seismic moment release (Figure 8a), it is clear that the main event is the main player: its seismic moment (M 0 ¼ 1.68 x 10 13 Nm; Mw ¼ 2.7) is about one order of magnitude larger than the cumulative seismic moment of the other microearthquakes (9.55 x 10 12 N m) (Figure 8a).However, we can recognize a minor contribution of the main foreshock (M 0 ¼ 1.73 x 10 12 N m; Mw ¼ 2.1), the only event of the sequence, alongside the mainshock, with an estimated Mw > 2.0.
The estimation of the source radius through the retrieved M 0 values (see Section 3.2.3)enabled us to gain further insights on the rupture processes which originated the sequence.We found that the source radius of most of the microearthquakes belonging to the sequence has an order of tens of meters (median of 15 m) except for the mainshock radius which is an order of magnitude larger (124 m).When looking at their locations, projected along the strike-dip plane and represented as function of time with respect to mainshock origin time (Figure 8b), we can notice and argue that: 1. most of the events of the sequence occurred within 1500 m along dip and within 700 m along strike, thus suggesting a preferential along-dip rupture propagation; 2. there is not an evident space-time migration of microearthquakes.It would exclude the contribution of hidden forcing transients in the sequence evolution such as pore-fluid diffusion or aseismic slip processes, but rather indicates the static stress transfer to be the driving factor: indeed, 'most aftershocks are located within few rupture lengths, where the static stress changes are relatively high' (Brodsky 2019); 3. the main foreshock and the mainshock are co-located on the rupture surface (Figure 8c).
In addition, we observed that: 1. the foreshock is a doublet earthquake with two similar seismic events occurred at a short distance in time (2 seconds).This similarity was assessed by means of a cross-correlation analysis in two steps: first we performed the autocorrelation for the foreshock waveform on a 10-seconds time window starting from the earthquake origin time (Figure 9a).We obtained a cross-correlation coefficient equal to 1.0 for time lag 0 and equal to 0.5 for time lags ± 500 samples, i.e. ± 2 seconds.Then, we computed the cross-correlation in a 1 s time window starting from the S-wave arrival time of the two seismic events of the doublet: a crosscorrelation coefficient equal to 1 for time lag 0 was again retrieved (Figure 9b); 2. there is a high similarity between the foreshock doublet and the mainshock waveforms.Indeed, both the cross-correlation analysis over the respective full-waveforms (Figure 9c) -performed in a 13 seconds window starting from the origin time of each event -and on the S-waves (Figure 9d) -in a 3.3 seconds time window starting from the S-phase arrival time of each event -confirmed their very high similarity, with a cross-correlation coefficient of 1.0 for time lag 0 (Figure 9c).
The observations from (3) to (5) allow us to define the foreshock and the mainshock as a seismic multiplet.Its occurrence may reflect seismic failures on a single, or on a set of coplanar asperities interacting in a small region of the same fault segment (Dublanchet et al. 2015).We finally suppose that the nucleation process of the sequence consists of a repeated failure occurring along a common ruptured seismogenic patch, thus identifying an asperity.
Finally, we investigated the frequency-magnitude distribution (FMD) (supplementary Figure S8) of the relocated events belonging to the Castelsaraceno sequence.Most earthquakes of the sequence have negative Ml values, with an estimated magnitude of completeness (Mc) of À0.4 ± 0.1 (using ZMAP code; Wiemer 2001), thus revealing the efficiency of the automated approach in detecting and locating small events which would have been missed with the manual approach.This helped us to better estimate the b-value.Indeed, we fit the FMD in the range Mc Ml 1.0 (only 5 events of the sequence have Ml > 1.0) by applying the nonlinear Levenberg-Marquardt least-squares algorithm (Marquardt 1963), obtaining a b-value of 0.73 ± 0.04 which is significantly lower than the value (about 1) estimated for southern Apennines (Gulia and Meletti 2007;Stabile et al. 2013).This low b-value is not surprising since Schorlemmer et al. (2005) demonstrated that different faulting styles within one seismogenic zone follow different recurrence laws, with the b-value estimated for strike-slip faults lower than the one estimated for normal faults.The latter, indeed, are the geological structures accommodating the current and the prevalent extensional stress regime in Southern Apennines (Mariucci and Montone 2020;Scarf ı et al. 2021).
The retrieved low b-value may also indicate that the sequence occurs in a region with lower or negligible pore-fluid pressure (Wyss 1973), higher stress levels (Scholz 1968), and less heterogeneous material (Mogi 1962) with respect to the regions where events generally occur in this study area.Indeed, in the HAV events generally occur in the allochthonous units or in the Apulian Platform whereas this sequence is characterized by deeper locations of microearthquakes (10-12 km b.s.l.) likely occurred in the crystalline basement (Figure 10), which is less heterogeneous with respect to the overburden units.Furthermore, the observed absence of a migration pattern of microearthquakes (Figure 8b) is consistent with a negligible contribution of pore fluid pressure which is the driving physical mechanism of induced seismicity in this area with b-values ranging from 1.1 to 1.5 (Stabile et al. 2021;Picozzi et al. 2022;and references therein).On the other hand, the foreshock-mainshock-aftershock characteristic of the Castelsaraceno sequence suggests some degrees of heterogeneity and hence a not uniform distribution of stress (Mogi 1963), which is in agreement with the hypothesized presence of an asperity (Madariaga 1979).We also suggest that the confinement of the studied seismic sequence in the depth range 10 -12 km may be due to the coexistence of the following elements: 1) the limited spatial extension of the static Coulomb stress transfer related to the low-magnitude mainshock; 2) the ductile rheology of the overlaying Permo-Trias deposits; 3) the brittle-ductile transition below 12 -13 km depths (which occurs in the upper crust at temperatures between 300 and 450 ; Dragoni 1993) according to the average geothermal gradient of about 20-25 C/km in the study area (Sciamanna et al. 2004;Megna et al. 2014).

Conclusions
A semi-automated workflow for earthquake detection and location was applied in this work for the characterization of a natural, low-magnitude earthquake sequence that occurred close to the Castelsaraceno village (High Agri Valley, southern Apennines) from 7 to 12 August 2020.(Carbone et al. 1991;2018;Nicolai and Gambini 2007;Patacca and Scandone 2007;2013;Patacca et al. 2008).
Starting from 65 manually detected earthquakes, the workflow developed as follows: (a) a semi-automated micro-earthquakes detection through a template matching algorithm; (b) a fully automated phase-picking though a deep neural network (PhaseNet); (c) an automated, iterative, multi-step absolute non-linear earthquake location (NLLoc); (d) the locations' refinement through an automated relative double-difference approach (HypoDD) using both catalogue and cross-correlation differential times.
The characterization of the microseismic sequence greatly benefited of the applied workflow since: more than twice the number of manually detected earthquakes were additionally identified through the single-station template matching technique; the application of the deep-learning based algorithm let us perform in few minutes the phase-picking; the automated, iterative absolute earthquake location let us efficiently remove the P-and S-waves phase pickings with high residuals; the subsequent automated relative earthquake location allowed the further selection of only clustered events while dramatically reducing location errors.
The further analyses carried out on the obtained refined catalog enabled us to define the Castelsaraceno foreshock-mainshock-aftershock sequence being characterized by: a Ml 2.1 foreshock doublet (origin time: 2020-08-07, 08:52:31 UTC), followed a few hours later by the Ml 2.9 mainshock (origin time 2020-08-07, 13:34:37 UTC) that ruptured the same patch, thus identifying a persistent asperity.Afterwards, a total of 54 microseismic aftershocks (Ml 1.1) occurred preferentially along-dip without showing a migration pattern.Looking at both spatial and temporal distributions of refined locations we identified a total of 72 events belonging to the sequence occurred at relatively close distances  and in a narrow range of depths (10-12 km).
Aftershock distribution and the focal mechanism of the mainshock allowed us to define the geometry and kinematic of an unprecedentedtly mapped fault, which may be an ancient Lower Pleistocene thrust re-activated in the current extensional regime.Indeed, our results show that it is an anti-apenninic (strike 190 ), left-lateral NNE-SSW strike-slip fault with normal kinematic component (rake À34 ).The estimated b-value (0.73 ± 0.04), lower than that expected for the study area, supports the strike-slip faulting kinematic instead of the typical normal faulting kinematic of the HAV faults.It also suggests the fault activation in negligible pore-fluid pressure conditions and in a relatively low-heterogeneity material.We also speculate that the limited depth range within which the sequence occurred may be due to the combined effect of static stress changes restricted in a very narrow area and of the confinement of a crystalline basement brittle layer between two regions with more ductile rheology.The tested workflow would represent a starting point for the improvement of the current standard procedures for seismic sequences analysis, which are actually quite time-consuming and less effective in microearthquake detection as also proved in this work.As the applied workflow only constitutes a first-order semi-automated procedure for characterization of microearthquake sequence, we aim to further improve our workflow and to extend its application to continuous data streams.The future developments would regard the integration of machine learning tools in the workflow to automatically identify false/true events in the template-matching detection step (e.g.ML classifier algorithms) and the automated seismic phase association (e.g.PhaseLink; Ross et al. 2019a).The procedure would enable us to easily analyze the spatio-temporal distribution of the seismicity in the HAV in larger timespans, verifying the possible occurrence of repeated earthquakes along the identified persistent asperity and/or other seismogenic structures in the area.

Figure 1 .
Figure 1.Map view of the study area.(a) Geographical representation of the High Agri Valley with the main tectonic elements.Seismic stations of the virtual network are represented in the map with triangles.A red star in the top left inset figure shows the location of the study area.(b) Geological map (simplified from Carbone et al. 1991; 2018) of the area delimited by the red rectangle in figure a).The Castelsaraceno sequence is reported (red dots) along with its mainshock (yellow star).

Figure 2 .
Figure 2. Summary of the workflow applied for the study of the Castelsaraceno sequence.Matched filter microearthquake detection, phase picking and (absolute and relative) earthquake location are fully automated.Event selection and source characterization are manual procedures.

Figure 3 .
Figure 3. Example of probability distributions predicted by the deep neural network algorithm (PhaseNet, Zhu and Beroza 2018) giving in input unfiltered three-component seismic waveforms: (i) East-West, (ii) North-South, and (iii) Z components of the seismogram; (iv) probability distributions of the P-(green dotted line) and S-wave (violet dashed line) arrival time, along with the threshold selected in this study (horizontal gray solid line).

Figure 4 .
Figure 4. Comparison between the results of (a) the absolute locations (NonLinLoc), and (b) the relative locations (HypoDD).The yellow star represents the epicentral position of the Ml 2.9 mainshock.The size of the circles representing the epicentral positions is proportional to the estimated magnitude, whereas their color represents the hypocentral depth.

Figure 5 .
Figure 5. 3D spatial (a) and temporal (b) distribution at depth of the refined locations.

Figure 6 .
Figure 6.Example of S-wave displacement spectral inversion for the mainshock source parameters estimation at SPINS station.It shows (a) the mainshock waveform with the S-waves (yellow panel) and noise (red panel) windows selected for the inversion; (b) the spectra calculated for each signal component (Z, N, E) and their vectorial composition, for which Brune's fit is calculated; (c) the grid search misfit function in the three inversion parameters subspaces -(Mw, fc), (t Ã , fc) and (Mw, t Ã ) -explored to find the best fit model parameters: the small white dot identifies the minimum in the space of parameters.

Figure 7 .
Figure 7. (a) Focal mechanism (beachball) and fault plane solutions (FPS1, FPS2) of the mainshock estimated using FPFIT code.Projection of relocated hypocenters of the seismic sequence along sections orthogonal to (b) the FPS1 and (c) the FPS2, respectively.Earthquakes up to 2 km away from each section have been projected on it.Black dashed lines in panels (b) and (c) represent the corresponding FPSs projections.

Figure 8 .
Figure 8.(a) Cumulative seismic moment and cumulative number of events of the sequence; the yellow and orange stars indicate the mainshock and the largest foreshock of the sequence, respectively.(b) Events of the sequence projected on the main event fault plane, with a size proportional to the estimated seismic radius (m) and with a color as a function of the time with respect to mainshock occurrence (minutes).(c) zoomed area of 1 km 2 around the mainshock (dashed black square in b) representing the events distribution along the strike-dip plane; horizontal and vertical refined location errors are also displayed by the respective error bars.

Figure 9 .
Figure 9. Similarity between the two events constituting the foreshock doublet and the mainshock.(a) Autocorrelation of the foreshock doublet.The red and blue rectangles indicate the Swave windows used in (b).(b) Cross-correlation between the first (red) and second (blue) S-wave visible in (a).(c) Cross-correlation between the foreshock doublet and the mainshock.The red and blue rectangles indicate the S-wave windows used in (d).(d) Cross-correlation between the foreshock double S-waves (red) and the mainshock S-wave (blue).

Figure 10 .
Figure10.Geological cross section AB of Figure1.The relocated microearthquakes far up to 2 km from the section have been projected on it.The structural architecture has been reconstructed from literature data(Carbone et al. 1991; 2018;Nicolai and Gambini 2007;Patacca and Scandone 2007; 2013;Patacca et al. 2008).

Table 1 .
Seismic stations belonging to the virtual network.