Feature subset selection in structural health monitoring data using an advanced binary slime mould algorithm

ABSTRACT Feature Selection (FS) is an important step in data-driven structural health monitoring approaches. In this paper, an Advanced version of the Binary Slime Mould Algorithm (ABSMA) is introduced for feature subset selection, to improve the performance of structural damage classification techniques. Two operators, mutation and crossover, are embedded into the algorithm to overcome the stagnation behaviour involved in the Binary Slime Mould Algorithm (BSMA). The proposed ABSMA is then embedded in a new data-driven SHM framework which consists of three main steps. In the first step, structural time domain responses are collected and pre-processed to extract statistical features. In the second step, the dimension of the extracted feature set is reduced using an optimization algorithm that finds a minimal subset of salient features by removing irrelevant and redundant data. Finally, the optimized feature vectors are used as inputs to Neural Network (NN) based classification models. Benchmark datasets of a timber bridge model and a three-story frame structure are employed to validate the proposed algorithm. The results show that the proposed ABSMA provides better performance and a faster convergence rate compared to other commonly used binary optimization algorithms.


Introduction
Vibration-based structural health monitoring (SHM) has been widely explored over the past decades. Avci et al. (2021) and Das et al. (2016) presented comprehensive reviews of vibration-based damage detection methods and their applications to civil infrastructure. Recently, with the fast development of sensing technologies (Corbally & Malekjafarian, 2022; Malekjafarian et al., 2021), signal processing techniques (Silik et al., 2021, 2022) and machine learning approaches (Ghiasi et al., 2016; Malekjafarian et al., 2019), data-driven SHM approaches have attracted high attention for damage detection of civil infrastructure (Gharehbaghi et al., 2021; Gomes et al., 2018). Vibration-based SHM methods can be mainly classified into two categories: (a) modal-based approaches, which are based on the vibratory characteristics of structural systems, such as natural frequencies, mode shapes and curvatures (Avci et al., 2021), and (b) data-driven approaches, which extract sensitive features from time domain responses to assess the structural condition (Dadras Eslamlou & Huang, 2022). Data-driven damage detection approaches can be performed in the time domain on the raw sensor data, or in the feature domain, in which damage-sensitive features are first extracted from the time series. This process is referred to as feature extraction (FE) (Soleimani-Babakamali et al., 2022). Due to the dimensionality of high-frequency acceleration data, the selection of features to be extracted from raw time domain signals is central to the success of data-driven SHM methods. For this purpose, feature extraction methods are normally employed to find useful lower-dimension metrics from the raw time domain signals.
The features extracted from acceleration signals are normally calculated over a set time window and provide a summary of the dynamic characteristics of the structure over that window. The change in these features over time is indicative of the behaviour of the dynamical system under measurement. However, most datasets contain irrelevant, highly correlated or noisy features that can be removed without a significant loss of information. This process is referred to as feature selection (FS) (Paniri et al., 2021).
FS is normally used in machine learning-based algorithms, especially when the learning task involves high-dimensional datasets. The primary purpose of FS is to choose a subset of the available features by eliminating features with little or no predictive information, as well as redundant features that are strongly correlated (Buckley et al., 2022; Paniri et al., 2021). In addition, the large volume of data represents a challenge to classification algorithms. Each feature used in the classification process should ideally provide an independent set of information. However, features are often highly correlated, which can indicate a degree of redundancy in the available information and may have a negative impact on the classification accuracy (CA) (Pashaei & Pashaei, 2022a). Thus, FS approaches are needed to tackle these shortcomings.
The current methods for FS are generally divided into three categories: filter-based methods, wrapper-based methods and hybrid methods (Pashaei & Pashaei, 2022a). The filter-based methods use the statistical information of the data to select features before the actual learning algorithm. These methods calculate the relevance of each feature with respect to the target classes. The features can then be sorted based on their individual relevance to the classes, and the top-ranked features can be selected for modelling (Buckley et al., 2022). Examples of filter methods include analysis of variance (ANOVA) (Buckley et al., 2022), maximally relevant and minimally redundant (MRMR) (Zhao et al., 2019) and minimisation of joint mutual information (JMI) (Bennasar et al., 2015). The wrapper-based methods use the CA of a predetermined learning model as a fitness function for the subset evaluation.
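As an illustration of the filter-based ranking described above, each feature can be scored against the class labels with a one-way ANOVA F-statistic and the top-ranked features kept. This is a hedged sketch, not the paper's method; the function names and the choice of ANOVA as the scoring criterion are illustrative.

```python
import numpy as np

def anova_f(feature, labels):
    """One-way ANOVA F-statistic of a single feature against class labels."""
    classes = np.unique(labels)
    overall_mean = feature.mean()
    # Between-class and within-class sums of squares
    ss_between = sum(
        len(feature[labels == c]) * (feature[labels == c].mean() - overall_mean) ** 2
        for c in classes
    )
    ss_within = sum(
        ((feature[labels == c] - feature[labels == c].mean()) ** 2).sum()
        for c in classes
    )
    df_between = len(classes) - 1
    df_within = len(feature) - len(classes)
    return (ss_between / df_between) / (ss_within / df_within)

def rank_features(X, y, k):
    """Return indices of the k top-ranked columns of X (highest F first)."""
    scores = np.array([anova_f(X[:, i], y) for i in range(X.shape[1])])
    return np.argsort(scores)[::-1][:k]
```

Ranking like this is cheap because no classifier is trained, which is exactly the trade-off the filter category makes against wrapper methods.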
The best feature set is the one which maximises the classification prediction accuracy (Pashaei & Pashaei, 2022b). The hybrid methods use independent measures to decide the best subsets for given features, and use a mining algorithm to select the final best subset among the available subsets (Cai et al., 2018). FS algorithms have been reviewed in Colaco et al. (2019).
When there is a large number of features, evaluating all states is computationally challenging, and therefore metaheuristic search methods are required. Due to the inefficiency of traditional search approaches in solving complex combinatorial optimization problems, several researchers have adopted metaheuristic algorithms (Xue et al., 2019). For instance, the Binary Coot bird optimization algorithm (BCOOT) (Pashaei & Pashaei, 2023) was developed as a wrapper feature selection method. Additionally, an enhanced version of the Black Hole Algorithm (BHA), namely the hybrid dragonfly black hole algorithm, was designed for real-world applications (Pashaei & Pashaei, 2021). Pashaei and Pashaei (2022a) introduced an efficient Binary Chimp Optimization Algorithm (BChOA) that integrated the crossover operator to improve the ChOA's exploratory behaviour. In another study, Pashaei and Pashaei (2022b) developed a modified Binary Arithmetic Optimization Algorithm (BAOA) for gene selection in high-dimensional biomedical data. Moreover, binary versions of the Rat Swarm Optimizer (BRSO) (Awadallah, Al-Betar, et al., 2022) and the Horse herd optimization (BHHO) (Awadallah, Hammouri, et al., 2022) were proposed to solve feature selection problems as wrapper methods. Recently, the non-dominated sorting genetic algorithm-III (NSGA-III) was developed for feature selection in databases with missing data (Xue et al., 2021).
The slime mould algorithm (SMA) (Li et al., 2020) is a novel and robust metaheuristic algorithm proposed to solve continuous problems. It is inspired by the propagation and foraging of the slime mould and includes a unique mathematical model. Feature selection is inherently a binary optimization problem (Ghiasi et al., 2021): the dimension of the problem is equal to the number of features, and each solution vector represents the selection (1) or non-selection (0) of each feature. The binary version of the SMA (BSMA), proposed in Abdollahzadeh et al. (2021), is used as the main optimization algorithm in this article. A comprehensive survey of SMA applications and its variants is presented in Soleimanian et al. (2023). Moreover, the robustness of four variants of BSMA as FS algorithms is shown in Abdel-Basset et al. (2021). Ghiasi and Malekjafarian (2022) discussed that BSMA suffers from stagnation and low population diversity in SHM applications, which might reduce the efficiency of classification.
In this paper, an Advanced version of the Binary Slime Mould Algorithm (ABSMA) is introduced by incorporating two new operators, mutation and crossover, into the BSMA. Mutation and crossover are the key operators used in genetic algorithms to make changes in the genes of chromosomes (Sivanandam & Deepa, 2008). These operators have also been embedded in several optimization algorithms, such as the Whale Optimization Algorithm (WOA) (Qi et al., 2022) and the Salp Swarm Algorithm (SSA) (Faris et al., 2018), to increase their efficiency. In this paper, for the first time, they are used in combination within the BSMA. The main focus of this work is to identify the minimal set of features that maximises the ability of a learning model to detect and distinguish between damage states in structural systems. For this purpose, a three-step framework is presented based on the proposed ABSMA. First, statistical characteristics of structural response signals under ambient vibration are extracted, and feature vectors are obtained. Subsequently, the best feature subset is selected by the ABSMA based on a desirability index using the F-score (Kashef & Nezamabadi-Pour, 2015). In the final step, the selected feature subset is employed to train a classification model based on a radial basis function Neural Network (NN). The performance of the proposed framework is evaluated statistically using benchmark datasets of a timber bridge model (Kullaa, 2011) and a three-story frame structure (Figueiredo & Flynn, 2009; Ghiasi & Ghasemi, 2018). Furthermore, the efficiency of using ABSMA as the main FS algorithm is compared to several state-of-the-art MOAs, such as Binary Particle Swarm Optimization (BPSO) (Chuang et al., 2011), Binary Harris Hawks Optimization (BHHO) (Thaher et al., 2020), Binary Whale Optimization Algorithm (BWOA) (Qi et al., 2022) and Binary Farmland Fertility Optimization Algorithm (BFFA) (Naseri & Gharehchopogh, 2022). Moreover, the main part of the binary version of the SMA is a transfer function that is responsible for mapping a continuous search space to a discrete one (Abdollahzadeh et al., 2021; Too et al., 2019). Therefore, the impact of various transfer functions (such as S-shaped and V-shaped) on the accuracy of the proposed ABSMA is also assessed. The primary contributions of this study can be summarized as follows:
(1) A novel algorithm called ABSMA is proposed by adding the two operators of mutation and crossover to the BSMA algorithm, to overcome the stagnation involved in the original version of BSMA.
(2) The proposed ABSMA is embedded in a data-driven SHM framework to show its performance against other algorithms in the literature.
The rest of this article is organized as follows: the SMA algorithm and its binary version are explained in Section 2. The proposed ABSMA is presented in Section 3. Details of the proposed data-driven SHM framework using ABSMA are provided in Section 4. In Section 5, the proposed framework's performance is examined using two real datasets from the SHM community.

Traditional SMA
The SMA was proposed by Li et al. (2020) based on the oscillation mode of slime moulds in nature. The SMA has a unique mathematical model that uses adaptive weights to simulate the process of producing positive and negative feedback from the propagation wave of slime moulds, based on a bio-oscillator. It uses these features to form the optimal path for connecting to food, with excellent exploratory ability and exploitation propensity (Ghiasi et al., 2022). This is carried out within three phases: (1) approach food, (2) wrap food and (3) grabble food. The logic of the SMA and each of these phases are shown in Figure 1 and explained below. More details of the SMA are provided by Li et al. (2020).

Approach food
Equation (1) represents the approaching behaviour of the slime mould to replicate the contraction mode (Li et al., 2020):

$$\vec{X}(t+1) = \begin{cases} \vec{X}_b(t) + \vec{vb}\cdot\left(W\cdot\vec{X}_A(t) - \vec{X}_B(t)\right), & r < p \\ \vec{vc}\cdot\vec{X}(t), & r \geq p \end{cases} \tag{1}$$

where $W$ is the weight of the slime mould, $\vec{vb}$ is a parameter in the range $[-a, a]$, $\vec{vc}$ decreases linearly from 1 to 0, $t$ represents the current iteration, $\vec{X}_b$ represents the individual location with the highest odour concentration found so far, $\vec{X}$ represents the location of the slime mould, and $\vec{X}_A$ and $\vec{X}_B$ represent two individuals randomly selected from the swarm. The formula for $p$ is given as:

$$p = \tanh\left|S(i) - DF\right| \tag{2}$$

where $i \in \{1, 2, \ldots, n\}$ ($n$ = number of moulds) and $S(i)$ represents the fitness of $\vec{X}$. The best fitness acquired over all iterations is denoted by $DF$. The $\vec{vb}$ formula is as follows:

$$\vec{vb} = [-a, a], \qquad a = \operatorname{arctanh}\!\left(-\frac{t}{t_{\max}} + 1\right) \tag{3, 4}$$

The formula for $W$ is organized as follows:

$$W(\text{SmellIndex}(i)) = \begin{cases} 1 + r\cdot\log\!\left(\dfrac{bF - S(i)}{bF - wF} + 1\right), & \text{condition} \\[2mm] 1 - r\cdot\log\!\left(\dfrac{bF - S(i)}{bF - wF} + 1\right), & \text{others} \end{cases} \tag{5}$$

$$\text{SmellIndex} = \operatorname{sort}(S) \tag{6}$$

where "condition" indicates that $S(i)$ ranks in the first half of the population, $r$ denotes a random value in the interval $[0, 1]$, $bF$ and $wF$ denote the optimal and worst fitness obtained in the current iteration, respectively, and SmellIndex represents the sequence of sorted fitness values (ascending for minimisation problems).
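The update rules above can be sketched in a few lines of Python. This is an illustrative implementation of one iteration of the "approach food" phase, assuming a population matrix `X` of shape `(n, D)` and a fitness vector `S`; the handling of `vc` and the `t/(t_max + 1)` offset (to keep `arctanh` finite at `t = 0`) are pragmatic choices, not part of the original formulation.

```python
import numpy as np

def sma_step(X, S, X_best, DF, t, t_max, rng):
    """One SMA position update (Eqs. 1-6); minimisation convention."""
    n, D = X.shape
    a = np.arctanh(1.0 - t / (t_max + 1))      # Eq. (4); offset avoids arctanh(1)
    order = np.argsort(S)                      # SmellIndex, Eq. (6)
    bF, wF = S[order[0]], S[order[-1]]
    # Adaptive weight W, Eq. (5): +log for the better half, -log for the rest
    W = np.empty((n, D))
    for rank, i in enumerate(order):
        r = rng.random(D)
        term = np.log10((bF - S[i]) / (bF - wF - 1e-12) + 1)
        W[i] = 1 + r * term if rank < n // 2 else 1 - r * term
    X_new = X.copy()
    for i in range(n):
        p = np.tanh(abs(S[i] - DF))            # Eq. (2)
        vb = rng.uniform(-a, a, D)             # Eq. (3)
        vc = rng.uniform(-1, 1, D) * (1 - t / t_max)  # linearly shrinking (assumption)
        A, B = rng.integers(0, n, 2)
        if rng.random() < p:                   # approach food, Eq. (1), r < p branch
            X_new[i] = X_best + vb * (W[i] * X[A] - X[B])
        else:                                  # r >= p branch
            X_new[i] = vc * X[i]
    return X_new
```

A full optimiser would wrap this step in a loop over `t`, re-evaluating `S`, `X_best` and `DF` after each update.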

Wrap food
This phase replicates the contraction mode of the venous tissue of the slime mould. Equation (7) describes the position update of the slime mould:

$$\vec{X}(t+1) = \begin{cases} \operatorname{rand}\cdot(UB - LB) + LB, & \operatorname{rand} < z \\ \vec{X}_b(t) + \vec{vb}\cdot\left(W\cdot\vec{X}_A(t) - \vec{X}_B(t)\right), & r < p \\ \vec{vc}\cdot\vec{X}(t), & r \geq p \end{cases} \tag{7}$$

where $LB$ and $UB$ represent the lower and upper boundaries of the search range, rand and $r$ denote random values in $[0, 1]$, and $z$ is a small probability controlling random re-initialization (set to 0.03 in Li et al., 2020).

Grabble food
As the number of iterations increases, the value of $\vec{vb}$ oscillates randomly between $[-a, a]$ and gradually approaches zero. The value of $\vec{vc}$ oscillates between $[-1, 1]$ and eventually tends to zero. The pseudo-code of the SMA is presented in Algorithm 1 (Li et al., 2020).

Binary Slime Mould Algorithm (BSMA)
FS is generally an NP-hard combinatorial binary optimization problem, in which the number of possible solutions increases exponentially with the number of features. For example, if $D$ is the total number of features, the number of possible solutions is $2^D - 1$ (Too et al., 2019). The BSMA was first proposed by Abdollahzadeh et al. (2021) for solving binary optimization problems. They compared the effectiveness of BSMA with several binary metaheuristics, such as Binary Harris Hawks Optimization (BHHO), the Branch and Bound algorithm (BB), the Binary Tunicate Swarm Algorithm (BTSA), the Binary Farmland Fertility optimization Algorithm (BFFA), Binary Particle Swarm Optimization (BPSO), Binary Teaching-Learning-based Optimization (BTLBO) and the Binary Archimedes Optimization Algorithm (BAOA), and concluded that BSMA is the most robust among them. Therefore, BSMA was chosen as the main algorithm in this study. Meta-heuristic optimization algorithms (MOAs) normally start with an initialization step that spreads the solutions within the search space of the problem. Accordingly, the BSMA is initialized by creating a population of n moulds. Each mould represents a solution to the optimization problem and has D dimensions, equal to the number of features in the dataset. The FS problem is a discrete problem, as it is based on choosing the subset of features that leads to the best classification accuracy. Therefore, for each dimension, BSMA is randomly initialized with a value of 1 for an accepted feature or 0 for a rejected one, as shown in Figure 2. This provides the representation of an initial solution for the FS. At the end of each iteration, each mould holds a solution in the form of a binary vector with the same length as the number of features, where 1 means selecting and 0 means deselecting the corresponding feature. This process continues for all iterations and, finally, the feature subset with the lowest classification error is returned as the best result.
It should be noted that the values generated by the standard SMA are continuous, whereas the variables in FS problems are binary: 1 for a selected feature and 0 for a non-selected one. Therefore, a transfer function is needed to map solutions from the continuous space to the binary space. According to the literature (Saremi et al., 2015), using a transfer function is one of the most effective ways to convert a continuous optimizer into a binary one. In comparison with other operators, the transfer function is user-friendly and less computationally expensive (Saremi et al., 2015). A wide range of transfer functions belonging to the families of V-shaped and S-shaped functions (Mirjalili & Lewis, 2013) can convert continuous values into binary ones. The V-shaped and S-shaped transfer functions used in this study are listed in Table 1. A transfer function receives a real value from the standard SMA as an input and normalizes it between 0 and 1 using one of the formulas in Table 1. The normalized value is then converted into a binary value using Equation (8) (Abdollahzadeh et al., 2021):

$$x_d(t+1) = \begin{cases} 1, & \text{if } \operatorname{rand} < S(a) \\ 0, & \text{otherwise} \end{cases} \tag{8}$$

where $x_d$ is the d-th element of the binary solution, rand is a random number in $[0, 1]$ and $S(a)$ is the S-shaped transfer function.
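The binarisation step can be sketched as follows. The two transfer functions shown here (the standard sigmoid and |tanh|) are representative members of the S-shaped and V-shaped families of Mirjalili and Lewis (2013); the exact set used in the paper is the one listed in Table 1. Note that some BSMA implementations use V-shaped outputs to flip the current bit rather than set it directly; the direct rule of Equation (8) is used here for simplicity.

```python
import numpy as np

def s_shaped(a):
    """Standard sigmoid: an S-shaped transfer function."""
    return 1.0 / (1.0 + np.exp(-a))

def v_shaped(a):
    """|tanh(a)|: a V-shaped transfer function."""
    return np.abs(np.tanh(a))

def binarise(x_continuous, transfer, rng):
    """Map a continuous slime-mould position to a 0/1 feature mask (Eq. 8)."""
    prob = transfer(x_continuous)                    # normalised to [0, 1]
    return (rng.random(x_continuous.shape) < prob).astype(int)
```

Because the comparison is against a fresh random number per dimension, large positive continuous values make selection (1) very likely under the S-shaped rule, while values near zero leave the bit essentially random.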

The proposed advanced binary slime mould algorithm
To overcome the inefficiency of the BSMA in solving feature selection problems in the SHM domain, an advanced version of BSMA (Ghiasi & Malekjafarian, 2022) is proposed in this section. In the proposed version, two ideas from the GA (Sivanandam & Deepa, 2008) are implemented in the BSMA to enhance its capability for FS and to resolve its low population diversity and stagnation. New solutions in the GA are mainly created by two operators: crossover and mutation. In the crossover operator, two solution sets are randomly selected and some portions are exchanged, resulting in two new solutions. In the mutation operator, a randomly selected bit of a particular solution is flipped: a 1 is changed to 0 and a 0 is changed to 1. To implement these two operations in the BSMA, a three-step procedure is developed, as shown in Figure 3. In the first step, a random solution is generated, and a crossover operation is applied to the randomly generated solution and the best available solution. In the second step, the solution obtained from the crossover operation is given as input to the mutation operation. Finally, if the new solution is better than the current one, it replaces the current solution. The main purpose of these operations is to increase the population diversity and to escape from local optima, improving the quality of the solutions. In other words, integrating the BSMA with both crossover and mutation operators simultaneously improves both the exploration and exploitation capabilities of the BSMA. The mutation operator improves the exploitation capability of the algorithm by searching around the best solution, while crossover improves the exploration capability by searching around a randomly created slime mould. The pseudo-code of the ABSMA is presented in Algorithm 2.
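The three-step refinement described above can be sketched as follows, applied to binary feature masks. Function names, the uniform-crossover choice and the mutation rate are illustrative assumptions; the paper specifies only the overall crossover-mutation-greedy-replacement structure.

```python
import numpy as np

def crossover(parent_a, parent_b, rng):
    """Uniform crossover: each bit taken at random from one of the parents."""
    mask = rng.random(parent_a.shape) < 0.5
    return np.where(mask, parent_a, parent_b)

def mutate(solution, rate, rng):
    """Flip each bit independently with probability `rate`."""
    flips = rng.random(solution.shape) < rate
    return np.where(flips, 1 - solution, solution)

def absma_refine(current, best, fitness_fn, rng, mutation_rate=0.05):
    """One ABSMA refinement pass (minimisation convention)."""
    # Step 1: random solution, crossed over with the best available solution
    random_sol = (rng.random(best.shape) < 0.5).astype(int)
    child = crossover(random_sol, best, rng)
    # Step 2: mutate the crossover result
    child = mutate(child, mutation_rate, rng)
    # Step 3: greedy replacement if the new solution is better
    return child if fitness_fn(child) < fitness_fn(current) else current
```

The greedy acceptance in step 3 guarantees the refinement never worsens a mould, while steps 1 and 2 inject the diversity that the plain BSMA lacks.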

Fitness function
The fitness function (FF) plays an important role in the efficiency of the ABSMA algorithm, as shown in Algorithm 2. As the framework developed in this study is based on a wrapper feature selection method (Pashaei & Pashaei, 2022a), the fitness function is built from the classification model accuracy and the efficiency of the selected subset of features. The classification model accuracy is obtained by evaluating the classification of the test data using the trained model. In addition, the efficiency of the selected subset of features is evaluated using the F-score, which measures the desirability of the features and is defined in the next subsection. The ABSMA selects the vector with the smallest fitness value when the completion conditions are satisfied. The fitness function of the ABSMA is formed as follows:

$$\text{Fitness} = W\cdot(1 - CA) + (1 - W)\cdot\left(1 - \frac{\sum_{i \in S} F_{score_i}}{\sum_{i=1}^{n} F_{score_i}}\right) \tag{9}$$

where $W$ is a weighting factor between 0 and 1, $n$ is the total number of features, $S$ denotes the selected feature subset and $F_{score_i}$ is defined below. The CA is used to define the quality of a solution; it is the percentage of samples correctly classified, evaluated as Equation (10):

$$CA = \frac{\text{number of correctly classified samples}}{\text{total number of samples}} \times 100\% \tag{10}$$
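A hedged sketch of this wrapper fitness is given below: a weighted combination of the classification error and an F-score-based desirability term, normalised so that both terms lie in [0, 1]. The exact normalisation of the F-score term is an assumption; `classifier_accuracy` and `f_scores` are assumed to come from the trained model and the F-score computation, respectively.

```python
import numpy as np

def fitness(mask, classifier_accuracy, f_scores, weight):
    """Lower is better. `mask` is the 0/1 feature-selection vector,
    `classifier_accuracy` is CA in [0, 1], `f_scores` holds the per-feature
    F-scores, and `weight` is the trade-off factor W in [0, 1]."""
    # Desirability of the selected subset, normalised by the total F-score
    desirability = f_scores[mask == 1].sum() / (f_scores.sum() + 1e-12)
    return weight * (1.0 - classifier_accuracy) + (1.0 - weight) * (1.0 - desirability)
```

With `weight` near 1 the search is driven almost entirely by classification error; lowering it rewards subsets that concentrate on high-F-score features, which is how the paper's sweep of W from 0.6 to 0.9 produces different feature sets.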

F-score
A desirability value for each feature generally represents the attractiveness of that feature, and can be any subset evaluation function, such as an entropy-based measure or a rough set dependency measure (Kashef & Nezamabadi-Pour, 2015). In this paper, the F-score is used as an index for measuring the desirability of the features. The F-score is a measurement that evaluates the discrimination ability of feature i. Equation (11) defines the F-score of the i-th feature. The numerator specifies the discrimination among the categories of the target variable, and the denominator indicates the discrimination within each category. A larger F-score implies a greater likelihood that the feature is discriminative (Kashef & Nezamabadi-Pour, 2015):

$$F_{score_i} = \frac{\sum_{k=1}^{c}\left(\bar{x}_i^{k} - \bar{x}_i\right)^2}{\sum_{k=1}^{c}\dfrac{1}{N_i^{k} - 1}\sum_{j=1}^{N_i^{k}}\left(x_{ij}^{k} - \bar{x}_i^{k}\right)^2} \tag{11}$$

where $c$ is the number of classes and $n$ is the number of features; $N_i^k$ is the number of samples of feature i in class k ($k = 1, 2, \ldots, c$; $i = 1, 2, \ldots, n$); $x_{ij}^k$ is the j-th training sample of feature i in class k ($j = 1, 2, \ldots, N_i^k$); $\bar{x}_i$ is the mean value of feature i over all classes; and $\bar{x}_i^k$ is the mean value of feature i over the samples in class k (Kashef & Nezamabadi-Pour, 2015).
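The F-score of Equation (11) translates directly into code; the sketch below computes it for a single feature column against the class labels.

```python
import numpy as np

def f_score(feature, labels):
    """Discrimination ability of one feature across classes (Eq. 11)."""
    classes = np.unique(labels)
    x_bar = feature.mean()                     # mean of the feature over all classes
    numerator = 0.0
    denominator = 0.0
    for k in classes:
        x_k = feature[labels == k]
        x_bar_k = x_k.mean()                   # class-k mean of the feature
        numerator += (x_bar_k - x_bar) ** 2            # between-class discrimination
        denominator += ((x_k - x_bar_k) ** 2).sum() / (len(x_k) - 1)  # within-class
    return numerator / denominator
```

A feature whose class means are far apart relative to its within-class scatter receives a large F-score, exactly the "maximum distance between classes, minimum distance within classes" criterion used later in the feature selection step.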
It should be mentioned that the performance of the proposed algorithm is evaluated with state-of-the-art metrics such as precision, recall, accuracy, F1-score and the Feature-Reduction index (F_r). The F1-score is a weighted average of precision and recall, and can additionally be weighted to account for class imbalance: it is calculated independently for each class and then weighted based on the number of true instances of each class. F_r, which is used to compare the feature reduction rate of different algorithms, is defined as:

$$F_r = 1 - \frac{p}{n} \tag{12}$$

where $n$ is the total number of features and $p$ is the number of features selected by the FS algorithm. F_r is the average feature reduction; the closer it is to 1, the more features are reduced and the lower the classifier complexity.

The novel data-driven SHM framework using the proposed ABSMA

In this section, an SHM framework is presented using the optimal feature subset selection proposed in Section 3. The method consists of three main steps: (A) feature extraction, (B) FS using ABSMA and (C) feature classification. The detail of FS using ABSMA is described in the previous section. The following subsections show the details of steps A and C. The detailed flowchart of the proposed three-stage framework is depicted in Figure 4.
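The evaluation metrics described above (the feature-reduction index F_r and the class-weighted F1-score) are straightforward to compute; the sketch below follows their standard definitions.

```python
import numpy as np

def feature_reduction(n_total, n_selected):
    """F_r: 1 - p/n; closer to 1 means more features removed."""
    return 1.0 - n_selected / n_total

def weighted_f1(y_true, y_pred):
    """F1 per class, weighted by the number of true instances of each class."""
    classes, counts = np.unique(y_true, return_counts=True)
    f1s = []
    for c in classes:
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1s.append(f1)
    return np.average(f1s, weights=counts)
```

Weighting by true-instance counts keeps the score honest on imbalanced SHM datasets, where a classifier could otherwise look good by favouring the majority (healthy) class.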

Feature extraction
In this paper, the functions given in Table 2 are used to form the feature vectors from the time domain signals collected by the sensors. These features are computed in the time domain and provide a summary of the statistical characteristics of the signal over the feature extraction window. It should be mentioned that these features are selected based on recommendations from previous works in this field (Buckley et al., 2022; Ghiasi et al., 2021). These features represent the energy, the time series distribution and the vibration amplitude of the signals in the time domain (Buckley et al., 2022).
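Window-based statistical feature extraction of this kind can be sketched as below. The seven statistics shown (mean, standard deviation, RMS, skewness, kurtosis, peak and crest factor) are common time-domain choices consistent with the description above, but the exact feature set is the one listed in Table 2.

```python
import numpy as np

def extract_features(window):
    """Seven time-domain statistics of one acceleration window (1-D array)."""
    x = np.asarray(window, dtype=float)
    mu, sigma = x.mean(), x.std()
    rms = np.sqrt(np.mean(x ** 2))                 # signal energy measure
    skewness = np.mean((x - mu) ** 3) / sigma ** 3 # distribution asymmetry
    kurtosis = np.mean((x - mu) ** 4) / sigma ** 4 # distribution tailedness
    peak = np.max(np.abs(x))                       # vibration amplitude
    crest_factor = peak / rms
    return np.array([mu, sigma, rms, skewness, kurtosis, peak, crest_factor])

def feature_matrix(signals):
    """Concatenate the features of all sensor channels into one vector."""
    return np.concatenate([extract_features(s) for s in signals])
```

With 15 sensor channels and 7 statistics per channel, this yields a 105-element feature vector per experiment, matching the dimensionality used in the case studies.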

Feature classification
Wrapper-based feature selection methods require a supervised learning approach, where knowledge of the varying damage states or classes is available, in order to identify the subset of informative features that best discriminates between classes (Pashaei & Pashaei, 2022a). Therefore, in this step, a well-trained classification model is applied to classify the various conditions of the structure. In this model, the input matrix includes the selected features, and the outputs are the corresponding damage conditions. In recent years, many neural network models have been proposed or employed for various components of SHM to perform pattern classification, function approximation and regression (Altabey et al., 2021). Among them, the RBF network is a type of feed-forward neural network that learns using a supervised training technique. Lowe and Broomhead (1988) first exploited the use of the RBF for designing neural networks. Radial functions are a type of function whose response decreases or grows monotonically with the distance from a centre point. The RBF network is a popular alternative to the well-known multilayer perceptron (MLP), since it has a simpler structure and a much faster training process (Wu et al., 2012). Therefore, an RBF neural network is used as the feature classifier in this paper.
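A minimal RBF-network classifier can be sketched as follows: Gaussian hidden units centred on a random subset of training points, with a linear output layer fitted by least squares. This is an illustrative stand-in, not the exact network or training procedure used in the paper; the centre-selection strategy and `gamma` width are assumptions.

```python
import numpy as np

class RBFClassifier:
    def __init__(self, n_centres=10, gamma=1.0, seed=0):
        self.n_centres, self.gamma = n_centres, gamma
        self.rng = np.random.default_rng(seed)

    def _phi(self, X):
        # Gaussian radial basis activations: exp(-gamma * ||x - c||^2)
        d2 = ((X[:, None, :] - self.centres[None, :, :]) ** 2).sum(-1)
        return np.exp(-self.gamma * d2)

    def fit(self, X, y):
        idx = self.rng.choice(len(X), self.n_centres, replace=False)
        self.centres = X[idx]                       # centres = random training points
        self.classes = np.unique(y)
        T = (y[:, None] == self.classes[None, :]).astype(float)  # one-hot targets
        # Linear output weights via least squares on the hidden activations
        self.W, *_ = np.linalg.lstsq(self._phi(X), T, rcond=None)
        return self

    def predict(self, X):
        return self.classes[np.argmax(self._phi(X) @ self.W, axis=1)]
```

The one-shot least-squares fit of the output layer is what makes RBF training so much faster than backpropagating through an MLP, which is the trade-off cited above.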

Experimental results
In this section, two benchmark datasets from the SHM community are selected to evaluate the effectiveness of the proposed FS algorithm. The first dataset includes acceleration responses of a timber bridge model recorded in the laboratory of Helsinki Polytechnic Stadia (Kullaa, 2011), and the second dataset contains the responses measured from a three-story frame scale structure published by Los Alamos National Laboratory (Figueiredo et al., 2011). Table 3 outlines the datasets used in this work. All analyses in this paper were carried out in MATLAB 9.13 on a computer with an Intel Core i7-3340 3.1 GHz processor and 16 GB of random access memory (RAM).

The timber bridge dataset
This dataset was collected in the laboratory of Helsinki Polytechnic Stadia (Kullaa, 2011) and is available open access (Kullaa, 2018). The data were collected from a timber bridge model, as shown in Figure 5. In this experimental campaign, Kullaa (2013) used a random excitation generated by an electrodynamic shaker to activate the vertical, transverse and torsional modes. The responses were measured at three different longitudinal positions using 15 accelerometers. The sampling frequency was 256 Hz and the length of the signals was 32 s. The data were filtered using a low-pass filter at 128 Hz and resampled for sufficient redundancy. The measurements were repeated several times, and it was noticed that the dynamic properties of the structure vary due to environmental changes; the main influencing factors were assumed to be changes in temperature and humidity (Kullaa, 2013). Kullaa (2013) modelled the damage by adding masses to the original structure. As described in the original paper (Kullaa, 2013), five artificial damage scenarios were introduced by adding small point masses of different sizes to the structure. The mass sizes were 23.5, 47.0, 70.5, 123.2 and 193.7 g. The point masses were attached on the top flange, 600 mm to the left of the midspan (Figure 5). The added masses were small compared to the total mass of the bridge (36 kg); the highest mass increase was only 0.5%. A total of 273 experiments were carried out on the structure. One hundred and ninety of the measurements were selected as the training data. The test data consisted of both healthy and abnormal measurements.

Feature extraction
The first step in the framework proposed in this research is to extract statistical features from the recorded acceleration responses of the structure. For this purpose, the statistical features shown in Table 2 are extracted from the responses of the 15 sensors on the timber bridge. The total number of extracted features for each experiment, based on Table 2, is 105 (15 sensors × 7 features). To show how the features change across damage classes, eight samples were randomly selected from each damage class, and the results are shown in Figure 6. It can be seen that the RMS of sensor 7 has a distinct threshold and boundary between damage classes, whereas the skewness of sensor 10 does not.

Feature selection
The automatic feature selection approach introduced in this research is then used to select the best subset of features. As reflected in the objective function of the ABSMA, the selected features should meet two conditions: maximum distance between classes and minimum distance within classes. Considering the high impact of the transfer function on the performance of the ABSMA, this function must be selected first. Since MOAs are stochastic and may give slightly different results in each independent run, the performance of the algorithms is compared using the best, worst, average and standard deviation (SD) of the fitness values obtained over 20 independent runs, following the approach used by other researchers (Varaee & Ghasemi, 2017). Columns ABSMA-V1 to ABSMA-V4 and ABSMA-S1 to ABSMA-S4 in Table 4 give the results for the transfer functions V1-V4 and S1-S4, respectively. ABSMA-V2 shows the best performance in most indexes (best, average and worst) in comparison with the other transfer functions. Therefore, V2 is selected as the transfer function in this study. For simplicity, ABSMA-V2 is denoted as ABSMA in the rest of the article.

Feature classification
In this section, the accuracy and effectiveness of the proposed framework for feature extraction/selection in the SHM domain are evaluated. Furthermore, the results obtained by the proposed ABSMA algorithm are compared to BSMA, BPSO, BHHO, BWOA and BFFA, which are reported to be effective algorithms for FS (Abdollahzadeh et al., 2021). The parameters of these algorithms are set to the best values reported in the original studies and are shown in Table 5. To maintain a fair comparison, the population size for all algorithms is set to 50 and the maximum number of iterations to 200. The weighting factor W in the fitness function is varied from 0.6 to 0.9 to obtain different sets of features. The results are averaged over 20 independent runs for every algorithm. The dimension of the search space is equal to the total number of features of each dataset.
Table 6 gives the mean CA and the best, worst, average and SD of the results for each algorithm. The number in brackets in each cell shows the ranking of each algorithm, and the best result is highlighted in bold. It can be seen that the ABSMA scored the best fitness value, followed by BSMA. From Table 6, the algorithm with the lowest average fitness value was ABSMA, followed by BSMA and BHHO. In addition, BSMA produced the most consistent results, with the lowest SD values for this dataset.
A comparison of the average precision, recall, F1-score and F_r values for the different algorithms is given in Table 7. It can be concluded that, in most cases, the proposed ABSMA algorithm obtains a better CA using a smaller feature set compared to the other algorithms. Although the BFFA and BPSO algorithms can also reduce the number of features, they eliminate relevant features, resulting in unsatisfactory performance. The number of selected features and the average F_r for each optimization algorithm are shown in Figures 7 and 8. It can be seen that the ABSMA not only finds smaller feature subsets than the other algorithms, but the number of selected features also decreases much faster. It can be concluded that the ABSMA provides a higher degree of exploration than the other algorithms, which enables it to explore the search space for a solution that selects a smaller number of features with better performance.
Among the 20 independent runs in Section 5.1.3, the highest overall performance of the ABSMA was achieved in the 19th run and the worst in the 13th run. Figure 9 shows the confusion matrix for the 13th, worst-performing run of the ABSMA. The majority of misclassifications occur between successive damage classes, with values being misclassified as the previous damage state. The separation between the healthy and damaged states and the reduction of false alarms are critical for SHM applications (Buckley et al., 2022). For the healthy state, 83.3% of the unseen healthy data are correctly predicted, and the False Negative Rate (FNR) for the healthy state is 16.7%. Therefore, despite the poor overall classification prediction for this run, the combination of the ABSMA and the NN has reasonable accuracy in distinguishing between the healthy and damaged states.
Figure 10 shows the confusion matrix for the 19th run, which has the best classification performance of ABSMA. In this run, only 2 samples are misclassified. One concern with an imbalanced dataset is that a classifier may learn to improve prediction performance by assigning datapoints to the majority classes (Krawczyk, 2016). The confusion matrices show that this is not the case here, as the majority of misclassifications across the runs occur when the unseen test data lies at the boundaries between classes, particularly between damage classes 2 and 3.
To confirm the efficiency of the proposed feature selection framework, the CA of the NN for the selected feature subsets is compared with that for the full feature set in Table 8. The results show a higher CA for the reduced number of features. In the RBF case, the accuracy is increased from 87% to 94% with an 81% reduction in data. This result is reasonable because the main benefit of FS is to improve prediction performance and provide faster and more cost-effective predictors. Using too many features can degrade prediction performance even when all features are relevant and contain information about the response variable.

The three-story frame structure dataset
The experimental dataset of a three-story frame structure published by Los Alamos National Laboratory is used here as the second case study (Figueiredo et al., 2011). Figure 11(a) shows the three-story frame structure, in which an electrodynamic shaker was used to excite the frame laterally with Gaussian white noise at the base floor along the structural centreline, under various damage conditions. The excitation force applied by the shaker to the structure was recorded with a load cell mounted on the stringer, and the structural responses were measured using four accelerometers attached at the centreline of each floor, as shown in Figure 11(b). The data were collected and processed at a sampling frequency of 320 Hz with a data acquisition system. For each structural damage state, 10 shaking tests were conducted, considering the variability of excitations and structural properties (Figueiredo et al., 2011).
The main goal of this benchmark study is to detect damage when the structure has also undergone structural changes caused by operational and environmental effects (Figueiredo et al., 2011). For this purpose, the present study selects four structural conditions of the three-story building from the openly accessible database (Figueiredo, 2007) to examine the effectiveness of the proposed FS in damage localization. The damage cases were simulated through the introduction of nonlinearities into the structure: a bumper and a suspended column were installed with different gaps between them, as shown in Figure 12. The gap between the bumper and the suspended column was varied (0.10 and 0.20 mm) to introduce different degrees of nonlinearity.
The selected conditions include the baseline condition without structural damage (termed D0), a structural condition with a gap equal to 0.20 mm and a mass on the 1st floor, representative of operational and environmental condition changes (D1), a structural condition with a gap equal to 0.10 mm and a mass on the 1st floor (D2), and a structural condition with a gap equal to 0.20 mm and a mass on the base floor (D3). Table 9 summarizes all structural conditions investigated in this paper.

Feature selection
The ABSMA proposed in this study is used for FS on this dataset. Table 10 gives the CA of ABSMA compared to the other methods. As for the first dataset, each feature selection algorithm is executed for 20 runs with different random seeds, and the averaged results of the 20 runs are used for the performance comparison.
Table 10 shows that the best mean CA is obtained by ABSMA (90%), followed by BFFA (85%). In comparison with the original BSMA, ABSMA has a higher chance of avoiding being trapped in a local optimum.
In addition, Table 11 reports the mean, best and STD of the computational time of the compared methods over 20 independent runs. ABSMA shows the fastest processing speed in this work, indicating that it can find the optimal feature subset in a very short time. The reason for the very short computational time of ABSMA is that it uses the mutation and crossover strategies together, which perform the position update for the best slime mould. It can be concluded that ABSMA not only provides a strong feature selection performance but also has the lowest computational cost.
Moreover, Table 12 shows the number of selected features for the proposed method in comparison with the other MOAs. It is observed that not all the features are required in the classification process. A proper selection of features can lead to a higher classification performance with lower complexity.

Figure 12. The adjustable bumper and the suspended column (Figueiredo & Flynn, 2009).
As presented in Table 12, ABSMA yields the smallest number of features in comparison with the other wrapper-based FS algorithms. This means that ABSMA can achieve a promising CA while keeping a smaller number of features. In contrast, BFFA has a higher mean number of selected features (18). It can be inferred that BFFA does not evaluate the relevance of features very well, leading to a poorer classification performance in this work. Finally, according to the results, adding the desirability index, mutation and crossover operators to the BSMA increases the exploration of the search and guides the algorithm towards more salient features.
Figure 13 shows the convergence curves of the compared methods on the Los Alamos dataset, i.e. the fitness value at each iteration for the different algorithms. It can be seen that ABSMA reaches the lowest fitness value, which means it maintains good diversity and thus the ability to escape from local optima. Unlike BPSO and BHHO, ABSMA keeps tracking the global optimum, leading to a very good performance. BPSO and BHHO, on the other hand, converge faster but then stagnate, which shows that they are easily trapped in local optima. It can be concluded that ABSMA is effective and reliable in finding the optimal feature subset.
To investigate the effect of the two operators on the efficiency of ABSMA, the convergence history of the algorithm when only one operator is implemented is compared with the case where both are employed in Figure 14. It can be seen that the separate implementation of the mutation and crossover operators does not significantly increase the efficiency of ABSMA. However, when they are used simultaneously, the algorithm gains the ability to escape from local optima and reach the best subset of features.
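On binary feature masks, the two operators discussed above take a simple form. The sketch below shows bit-flip mutation and single-point crossover; the mutation rate and the single-point crossover variant are illustrative assumptions, since the paper's exact operator settings are given in its parameter tables rather than here.

```python
import random

def bit_flip_mutation(mask, rate=0.1, rng=random):
    """Flip each bit with probability `rate` to inject diversity."""
    return [1 - b if rng.random() < rate else b for b in mask]

def single_point_crossover(parent_a, parent_b, rng=random):
    """Exchange the tails of two masks to recombine promising subsets."""
    point = rng.randrange(1, len(parent_a))
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b
```

Mutation alone perturbs a single solution, while crossover alone recombines existing material; used together, as in Figure 14, they both diversify and exploit the population, which is consistent with the observation that only the combined scheme escapes local optima.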
To present the underlying relationships between the features and structural damage, the features are plotted in scatter diagrams for the different damage classes in Figure 15.
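The quantities plotted in Figure 15 are standard statistical features of a sensor's time-domain signal. A minimal sketch of their computation is given below (mean, standard deviation, skewness, and shape factor); the exact feature definitions used in the paper may differ slightly, and the function assumes a non-constant signal.

```python
import math

def statistical_features(signal):
    """Compute basic statistical features of one time-domain signal."""
    n = len(signal)
    mean = sum(signal) / n
    var = sum((x - mean) ** 2 for x in signal) / n
    std = math.sqrt(var)
    # Skewness: normalized third central moment (asymmetry of the signal).
    skewness = sum((x - mean) ** 3 for x in signal) / (n * std ** 3)
    # Shape factor: RMS divided by the mean absolute value.
    rms = math.sqrt(sum(x * x for x in signal) / n)
    mean_abs = sum(abs(x) for x in signal) / n
    shape_factor = rms / mean_abs
    return {"mean": mean, "std": std,
            "skewness": skewness, "shape_factor": shape_factor}
```

Applied window-by-window to each accelerometer channel, such features reduce long acceleration records to a compact vector that the FS stage can then prune.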

Feature classification
In this subsection, the performance of the NN as a classification algorithm is compared with five other commonly used machine learning (ML) classifiers: Random Forest (RF) (Ghiasi et al., 2018), k-nearest neighbor (KNN, with Euclidean distance and k = 5) (Too et al., 2019), Support Vector Machine (SVM, with radial basis kernel function) (Santos et al., 2016), Decision Tree (DT) (Charbuty & Abdulazeez, 2021), and Cascade Forward Neural Network (CFNN, with 2 hidden layers) (Fathnejat et al., 2014). In addition, to assess the proposed wrapper-based approach, five well-known filter-based methods are selected from the literature: Principal Component Analysis (PCA) (Santos et al., 2016), Neighborhood Component Analysis (NCA) (Malan & Sharma, 2019), Term Variance (TV) (Malan & Sharma, 2019), Pearson Correlation Coefficient (PCC) (Saidi et al., 2019) and Relief-F (Urbanowicz et al., 2018). Table 15 compares the performance of the selected filter-based approaches with the proposed framework.
Generally, in comparison with the filter-based models, the wrapper model achieves a higher CA and tends to produce a smaller subset size; however, it has a higher time complexity (Kashef & Nezamabadi-Pour, 2015).
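To make the filter/wrapper contrast concrete, the sketch below implements one of the filter-based baselines listed above: ranking features by the absolute Pearson correlation between each feature column and the class label, then keeping the top k. Unlike a wrapper, no classifier is trained during selection. The toy data layout (features as a list of columns) and the value of k are illustrative assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def pcc_filter(features, labels, k):
    """Return indices of the k features most correlated with the labels."""
    scores = [abs(pearson(col, labels)) for col in features]
    return sorted(range(len(features)), key=lambda i: -scores[i])[:k]
```

Because each feature is scored independently of the classifier and of the other features, such filters are fast but can miss feature interactions, which is consistent with the lower CA reported for them in Table 15.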
Finally, to benchmark the proposed framework on damage detection with the Los Alamos dataset, the CA values of similar ML algorithms from another study (He et al., 2022) are shown in Table 16. As can be seen in Table 16, by using ABSMA the SHM framework selects more salient features, which enhances the capability of the classifier in detecting the damage class.
In practice, users might have difficulty selecting the best features for each SHM problem. Unlike traditional feature selection methods, the ABSMA can be applied to select potential features without prior knowledge. ABSMA then automatically selects the optimal features for the specific problem, and that feature subset can be used in real-world applications. This, in turn, reduces the complexity and improves the performance of the damage detection system. In summary, the proposed ABSMA is a useful tool for feature selection.

Conclusion
In this paper, a new algorithm called ABSMA is proposed for FS in SHM problems, with the aim of enhancing the capability of the SMA in this domain. Mutation and crossover operators are employed in the proposed ABSMA, which increase diversity, prevent premature convergence during the optimization process, and help the algorithm escape local optima. Two benchmark datasets from the SHM community are employed in this paper. The ABSMA is initially evaluated using eight transfer functions that convert continuous solutions to binary ones, from which the best transfer function (V2) is selected. The results obtained from the proposed algorithm are compared with four state-of-the-art metaheuristic algorithms: BHHO, BPSO, BWOA and BFFA. The experimental results indicate a significant improvement of the proposed algorithm over the other ones. Moreover, the proposed framework can remove irrelevant and redundant information by choosing useful features as the input of the classification model. It is also shown that the proposed FS approach based on the ABSMA optimization algorithm reaches a better feature set in terms of CA in comparison with the full feature set. In addition, it can be concluded that ABSMA not only yields the best classification performance but also provides the smallest feature size and a very low computational cost. Finally, the experimental results show that NN and CFNN usually achieve the highest CA in comparison with KNN, SVM, RF and DT. The features extracted in the time domain are used in this paper to identify the state of the structure; using features extracted in the frequency domain and comparing their performance can be considered as a future extension of the current work. Furthermore, a supervised scheme is used for training and testing the ML algorithms in the proposed framework, while in some real-life cases there is limited access to labelled data. In such cases, unsupervised learning schemes should be used. Moreover, it is suggested to use a chaotic map to fine-tune the parameters of ABSMA in future works. The base code and extracted feature data have been made available at https://github.com/raminqs/ABSMA.git.
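The transfer function V2 selected in the conclusion converts a continuous slime-mould position into a bit-flip probability. The sketch below uses V2(x) = |tanh(x)|, which is the standard V2 in the binary metaheuristics literature and is assumed here to match the paper's definition; the V-shaped update rule (flip the current bit with probability V2(x)) is likewise the conventional one.

```python
import math

def v2(x):
    """V-shaped transfer function V2: maps a continuous step to [0, 1)."""
    return abs(math.tanh(x))

def binarize(position, current_bits, rng):
    """V-shaped binarization: flip bit i with probability v2(position[i]).

    A large |step| means the corresponding bit is likely to change state,
    while a near-zero step leaves the current bit unchanged.
    """
    return [1 - b if rng.random() < v2(x) else b
            for x, b in zip(position, current_bits)]
```

Unlike S-shaped functions, which set the bit directly from the probability, V-shaped functions toggle the current state, so a stationary search agent (zero step) keeps its current feature subset.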

Figure 2. An initial solution to the FS problem.

Figure 3. The process of implementing the crossover and mutation operators on the solution vector of the ABSMA.

Figure 4. The detailed flowchart of the proposed framework for feature selection and classification.

Figure 5. The experimental case study: (a) the timber bridge model and (b) the locations of the 15 sensors and the damage (D) (Kullaa, 2013).

Figure 8. Average of Fr for each optimization algorithm with respect to the number of iterations.

Figure 13. The convergence curves of six different feature selection methods for the Los Alamos dataset.

Figure 14. The convergence curves of ABSMA with the mutation and crossover operators implemented separately and together, on the Los Alamos dataset.
Figure 15(a) shows two features randomly selected from the feature set, and Figure 15(b) shows two features selected by ABSMA. It can be concluded that, by minimizing the objective function, ABSMA selects the features that have the highest ability to differentiate between the different damage classes.

Figure 15. Scatter diagrams of features for the various damage classes: (a) shape factor of sensor 1 vs. mean of sensor 4; (b) std of sensor 1 vs. skewness of sensor 4.
Figure 16 shows the boxplots of CA for the different ML algorithms on the Los Alamos dataset. In these figures, the red line in the box represents the median value, and the symbol "+" denotes outliers. As can be seen, ABSMA shows a competitive median value in most cases. Furthermore, among the ML algorithms, NN and CFNN provide better classification performance than the KNN, SVM, RF and DT algorithms.

Table 1. V-shaped and S-shaped transfer functions.

Table 3. The list of used datasets.

Table 4. The best fitness values under eight different transfer functions.

Table 5. Parameter settings for the comparative algorithms.

Table 7. Comparison of the performance (precision, recall, F1-score and Fr) of the algorithms on the timber bridge dataset.

Figure 7. Number of selected features for each optimization algorithm.

Table 6. CA of each algorithm for the tested datasets of the timber bridge.

Figure 9. Confusion matrix for the worst result of ABSMA (13th run).

Figure 10. Confusion matrix for the best result of ABSMA (19th run).

Table 8. Comparing the performance of the RBF neural network for the selected features and all features.

Table 9. The structural conditions of the three-story frame structure dataset.

Table 10. CA of each algorithm for the tested datasets of the three-story frame.

Table 11. Computational time of each algorithm for the tested datasets of the three-story frame.

Table 12. Number of selected features by each algorithm for the tested datasets of the three-story frame.

Table 13. P-values of the Wilcoxon signed-rank test.

Table 14. P-values of the Wilcoxon signed-rank test.

Table 15. CA of five different filter-based feature selection methods and the proposed approach.

Table 16. CA of the proposed framework for classification of damage states in comparison with the other framework.