Leveraging involution and convolution in an explainable building damage detection framework

ABSTRACT Timely and accurate building damage mapping is essential for supporting disaster response activities. While remote sensing (RS) satellite imagery can provide the basis for building damage map generation, detecting building damage with traditional methods is generally challenging. Traditional building damage mapping approaches rely on bi-temporal pre/post-earthquake datasets, and extracting information from bi-temporal images is difficult. Furthermore, these methods require manual feature engineering for supervised learning models. To tackle these limitations of traditional damage detection frameworks, this research proposes a novel building damage map generation approach based only on post-event RS satellite imagery and advanced deep feature extractor layers. The proposed deep learning (DL) based framework is applied in an end-to-end manner without additional processing. The method is conducted in five main steps: (1) pre-processing, (2) model training and optimization of model parameters, (3) damage map generation, (4) accuracy assessment, and (5) visual explanation of the proposed method’s predictions. The performance of the proposed method is evaluated on two real-world RS datasets: the Haiti earthquake and the Bata explosion. The damage mapping results show that the proposed method is highly efficient, yielding an overall accuracy (OA) of more than 84%, which is superior to other advanced DL-based damage detection methods.


Introduction
A natural disaster is an extreme event that occurs within Earth's system and can cause widespread destruction, substantial collateral damage, and even loss of life as the result of forces independent of human activities (Davies et al., 2021; Kapucu et al., 2022). Among the types of natural disasters, earthquakes are the deadliest, typically causing thousands of deaths (Binici et al., 2022; Lin et al., 2022; Ünlü & Kiriş, 2021). Earthquake damage mapping is a critical analysis after each earthquake (Li et al., 2019), as it provides valuable information about damage for applications such as relief and response as well as insurance company assessments (Cui et al., 2021). In urban areas, in particular, buildings are the main components damaged by earthquakes (Rupnik et al., 2018). Thus, accurate and timely building damage mapping is very important for managing the response and performing subsequent analyses (Jundullah & Wahyu Wijayanto, 2022; Weber et al., 2022).
Advancements in technology have increased interest in building damage mapping as a research topic (Cotrufo et al., 2018). Approaches to assessing earthquake-induced building damage can be grouped by the type of dataset used. Damage assessment is mainly based on light detection and ranging (Lidar) (Talreja et al., 2021), synthetic aperture radar (SAR) (Boloorani et al., 2021; ElGharbawi & Zarzoura, 2021; Wang et al., 2022), very high resolution (VHR) satellite imagery (D'Addabbo et al., 2022), and unmanned aerial vehicle (UAV) imagery (Naito et al., 2020).
Due to its availability and simple interpretation, the VHR dataset is more convenient for mapping building damage than other kinds of datasets. Thus, many damage detection methods have been developed in previous studies. Damage detection based on VHR data can be categorized into two main groups: (a) damage detection based on a bi-temporal VHR dataset and (b) damage detection based on only a single post-event dataset.
Damage detection methods based on bi-temporal datasets are the more popular group in building damage mapping. These methods extract damage information from pre/post-event datasets by change detection analysis. For instance, Gupta and Shah (2020) designed RescueNet, a building damage mapping method based on a bi-temporal pre/post-event dataset. RescueNet is based on ResNet50 and employs the atrous spatial pyramid pooling module for extracting multi-scale features. Furthermore, RescueNet segments the buildings and assesses individual damage levels simultaneously. Ji et al. (2020) employ a pre-trained CNN model to map damaged buildings by incorporating a bi-temporal pre/post-earthquake VHR dataset. The pre-trained VGG-Net model is used to classify collapsed buildings, and the fine-tuned VGG-Net demonstrates superior performance to the model trained from scratch. Merlin and Wiselin Jiji (2019) propose a strategy for mapping damage based on change detection on the pre/post-event VHR dataset. This framework follows several steps for detecting building damage: (1) building detection by the integration of color invariants and thresholding, (2) change detection by image differencing, (3) spectral and spatial feature extraction followed by feature selection with a feature-ranking approach, and (4) classification by a feature-mean-ratio algorithm. Although these methods have obtained promising results in building damage detection, they rely on change detection with a bi-temporal pre/post-event VHR dataset, and finding a pre-event dataset is sometimes a challenge. Furthermore, the results of these algorithms can be affected by other conditions in a bi-temporal dataset (changes can originate from sources other than damage), which can result in false alarms (Seydi & Hasanlou, 2021).
Damage detection methods based on a single post-event dataset extract damage patterns from the post-event dataset alone. Unlike first-group methods, these methods do not require the pre-event dataset or additional pre-processing (e.g. image-to-image registration). Since the pre-event dataset is not always available, these methods can be more effective for building damage assessment. To this end, some studies considered damage detection methods using only a single post-event dataset. For instance, Ünlü and Kiriş (2021) designed a deep learning-based damage mapping framework using a convolutional neural network (CNN) and image segmentation. This framework is applied in two phases: (1) image segmentation by a k-means clustering method that separates the "damaged" and "non-damaged" categories and (2) labeling by deep learning-based methods (e.g. visual geometry group or VGG-16), in which CNN models label the segments in three classes: damaged, less-damaged and non-damaged.
Furthermore, Zheng et al. (2021) designed an object-based semantic segmentation method for building damage assessment. A deep object localization network replaces the super-pixel segmentation commonly used in the conventional object-based image analysis (OBIA) procedure for generating accurate building objects, an approach that offers seamless integration of OBIA and DL. Additionally, a unified semantic change detection network is constructed using a deep object localization network as well as a deep damage classification network. Ji et al. (2019) compare the efficiency of texture features derived from a grey-level co-occurrence matrix and deep features by utilizing a random forest classifier and a pre/post-earthquake VHR dataset. The results of this damage assessment show that the deep features outperformed texture features in identifying collapsed buildings. Furthermore, the combination of CNN with the random forest classifier has greater accuracy in damage mapping than CNN alone. Although the above-mentioned methods have provided acceptable results in damage mapping, accurate damage detection from only post-event datasets remains a major challenge. This issue originates from the complexity of urban areas and its effect on damage assessment: urban areas are complex regions containing different types of buildings of different shapes and sizes. Consequently, damage detection remains a challenging research topic.
Results of previous studies indicate that RS VHR imagery has a high potential for damage mapping. The result of damage mapping depends on the quality of the features and the classifier algorithm used. The above-mentioned methods in both groups mainly utilized standard convolution layers with additional pre-processing such as OBIA. The efficiency of advanced deep feature extraction layers (e.g. involution) and attention mechanisms has been overlooked by recent studies in building damage assessment. To address these challenges and enhance the performance of damage detection, it is crucial to develop an advanced procedure for the identification of damaged buildings.
DL-based algorithms have recently demonstrated very promising results in many RS applications, such as crop mapping (Natteshan & Suresh Kumar, 2020), classification (Wang & Miao, 2022; Xu et al., 2021), and algae monitoring (Huynh et al., 2022). To this end, this research focuses on a novel damage detection framework based on DL. The proposed method uses the post-earthquake VHR dataset and building vector maps to assess building damage. The presented DL framework is based on a combination of multi-scale convolution layers and a recent variant of the convolution layer called the involution kernel. Furthermore, an attention mechanism is incorporated to enhance the robustness of the network.
Black-box artificial intelligence algorithms remain a barrier in practice, since they lack interpretability and explainability (Petch et al., 2021). In contrast to simpler and self-explanatory models, DL-based methods lack interpretability due to their complexity and nonlinear nature. An explainable artificial intelligence (XAI) model is an artificial intelligence model that generates an output that humans can understand, the opposite of a "black box" model (Langer et al., 2021; Rojat et al., 2021). That is, XAI models provide an understanding of what is causing the model output. By using XAI, we can assess the rationality of the training dataset for damage mapping, which is vital in supervised learning methods.
In this study, the following main contributions are made: (1) we present a novel DL-based framework for building damage assessment; (2) we apply an XAI approach to the building damage mapping model that describes how and which features contribute to the model output; (3) we employ an advanced feature representation method based on the involution kernel and multi-scale convolution; and (4) we compare the efficiency of our framework with that of other DL-based methods in two different real-world case study areas.

Methodology
The general overview of the building damage detection framework is shown in Figure 1. The building damage detection process is applied in five main steps: (1) pre-processing, which prepares the input data for subsequent analysis; (2) model training and optimization of model parameters; (3) damage map generation, in which the trained predictive model is used to generate the final building damage map: building footprints are extracted by overlaying the unlabeled buildings with the vector maps, each footprint is fed into the predictive model to obtain a label, and the predicted label is assigned to all pixels within the footprint in the raster map (a visual representation of this process can be seen in Figure 2); (4) accuracy assessment, which evaluates the damage mapping results by comparison with a reference map; and (5) visual explanation of the proposed method, which helps to understand what is causing the model output.
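The label-assignment part of step (3) can be illustrated with a minimal sketch. It assumes that each building footprint has already been rasterised into a boolean mask and that the model has returned one integer label per building; the function name `paint_damage_map`, the label codes and the toy masks are hypothetical and not taken from the paper.

```python
import numpy as np

def paint_damage_map(footprint_masks, predicted_labels, shape, background=0):
    """Assign each building's predicted label to all pixels of its footprint.

    footprint_masks: list of boolean arrays of `shape`, one per building
                     (assumed to be rasterised beforehand from the vector map).
    predicted_labels: list of ints, here 1 = non-damaged, 2 = damaged (assumed codes).
    """
    damage_map = np.full(shape, background, dtype=np.uint8)
    for mask, label in zip(footprint_masks, predicted_labels):
        damage_map[mask] = label
    return damage_map

# Toy example with two rectangular footprints on a 6 x 6 raster.
shape = (6, 6)
mask_a = np.zeros(shape, dtype=bool)
mask_a[0:2, 0:3] = True
mask_b = np.zeros(shape, dtype=bool)
mask_b[3:6, 2:5] = True
damage_map = paint_damage_map([mask_a, mask_b], [1, 2], shape)
```

Pixels outside every footprint keep the background value, so the raster only carries labels where a building polygon exists.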

Pre-processing
Pre-processing is the first step of our damage mapping framework and prepares the input datasets for the DL model. The main pre-processing steps are as follows. (1) Registration of the VHR dataset with the vector map: the registration is checked by overlaying the building polygons on the image dataset. (2) Building footprint extraction: building footprints are extracted by overlaying the building polygons (vector maps) on the VHR datasets, and each building footprint is used as an input sample for the proposed framework. It is worth noting that buildings have different sizes, while the input size of the network is constant. Therefore, small buildings are up-sampled by interpolation and large buildings are down-sampled by aggregation. (3) Data augmentation: the size of the sample dataset is increased by applying several operations. This research employs data augmentation for the Bata explosion dataset, using operations such as rotation (90 degrees), random flip (left and right), random flip (up and down), and transpose. Buildings are classified as non-damaged or damaged according to the presence of debris. Accordingly, two classes are defined as follows:
Non-Damaged Building: A building with an intact roof is classified as a non-damaged building.
Damaged Building: A building whose roof has been destroyed by an earthquake is considered a damaged building.
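The resizing and augmentation steps can be sketched as follows. This is a simplified illustration rather than the authors' implementation: nearest-neighbour resampling stands in for both the interpolation and the aggregation mentioned above, the flips are applied deterministically rather than at random, and the function names are hypothetical. The 25 × 25 patch with four channels mirrors the Bata explosion setting.

```python
import numpy as np

def resize_nearest(patch, size):
    """Resample an (H, W, C) patch to (size, size, C) by nearest-neighbour lookup."""
    h, w = patch.shape[:2]
    rows = np.arange(size) * h // size   # source row for every target row
    cols = np.arange(size) * w // size   # source column for every target column
    return patch[rows][:, cols]

def augment(patch):
    """Return the original patch plus four augmented variants."""
    return [
        patch,
        np.rot90(patch, k=1, axes=(0, 1)),   # 90 degree rotation
        patch[:, ::-1],                      # left-right flip
        patch[::-1, :],                      # up-down flip
        np.transpose(patch, (1, 0, 2)),      # transpose of the two spatial axes
    ]

# A 10 x 14 footprint with four spectral channels, brought to a fixed 25 x 25 input.
footprint = np.arange(10 * 14 * 4, dtype=np.float32).reshape(10, 14, 4)
fixed = resize_nearest(footprint, 25)
variants = augment(fixed)
```

Because the target patch is square, every variant keeps the same shape and can be stacked directly into a training batch.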

Proposed network training
The proposed method has several parameters that require tuning. The network is trained by the back-propagation method using the training and validation datasets. The model parameters are initialized by an initializer such as He-Normal (He et al., 2015). They are then optimized by learning from the training samples and evaluated using the validation dataset. The error of the network, calculated by the loss function, is fed to an optimizer such as adaptive moment estimation (Adam) (Kingma & Jimmy, 2017) to update the network parameters. This process continues until a stop condition (e.g. the number of iterations) is reached. After training, the optimum model is employed for the next analysis. We employ the binary cross-entropy (BCE) as the loss function, which is defined in Equation (1).
$$L_{\mathrm{BCE}} = -\left[v \log(p) + (1 - v) \log(1 - p)\right] \tag{1}$$

where $v$ is the real label and $p$ is the output of the method.

Proposed DL architecture
The proposed network has some novelty and differences from other similar networks for building damage detection:
(1) Utilizing multi-scale kernels instead of a single-kernel convolution, which enhances the robustness of the network against size variation.
(2) Employing an involution kernel to obtain rich feature representations.
(3) Utilizing a squeeze-and-excitation (SE) block to extract informative deep features.

The SE attention block
The squeeze-and-excitation (SE) block enhances channel interdependencies with minimal additional computational cost. The three main operations in SE are (1) squeezing, which reduces the spatial dimensions of each feature map to a single value through global average pooling; (2) excitation, which learns adaptive scaling weights for each channel by dense layers with different activation functions; and (3) rescaling, which restores the original size via element-wise multiplication of the input feature maps by the learned weights. Figure 4 illustrates the overview of the SE attention block.
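The three SE operations can be written out as a minimal NumPy sketch. The choice of ReLU and sigmoid for the two dense layers and the reduction ratio `r` are assumptions based on the common SE design, since the text only states that the dense layers use different activation functions; the weights here are random placeholders.

```python
import numpy as np

def se_block(x, w1, b1, w2, b2):
    """Squeeze-and-excitation on a feature map x of shape (H, W, C)."""
    squeezed = x.mean(axis=(0, 1))                      # squeeze: one value per channel, (C,)
    hidden = np.maximum(0.0, squeezed @ w1 + b1)        # excitation, dense + ReLU (assumed), (C // r,)
    scale = 1.0 / (1.0 + np.exp(-(hidden @ w2 + b2)))   # excitation, dense + sigmoid (assumed), (C,)
    return x * scale                                    # rescale: broadcast over H and W

rng = np.random.default_rng(0)
C, r = 8, 4                                             # r is a hypothetical reduction ratio
x = rng.normal(size=(5, 5, C))
w1 = rng.normal(size=(C, C // r))
b1 = np.zeros(C // r)
w2 = rng.normal(size=(C // r, C))
b2 = np.zeros(C)
y = se_block(x, w1, b1, w2, b2)
```

Each channel is multiplied by a single weight between 0 and 1, so the block re-weights channels without changing the spatial layout of the feature map.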

Multi-scale convolution layers
Convolution layers are the basic operators in DL-based procedures. These layers automatically provide high-level, meaningful deep features from the given image (Jagannathan & Divya, 2021). The feature value ($V$) of the $l$-th layer with input data $x$ and a nonlinear activation $G$ is obtained as follows in Equation (2) (Seydi et al., 2021):

$$V^{l} = G\left(w^{l} x + b^{l}\right) \tag{2}$$

where $b$ and $w$ are the bias and weight vectors in the $l$-th layer, respectively. The output of a convolution layer with kernel size ($M \times N$) at position ($x, y$) is calculated as follows in Equation (3) (Yu et al., 2020):

$$V^{l}_{x,y} = G\left(b^{l} + \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} w^{l}_{m,n}\, V^{l-1}_{x+m,\,y+n}\right) \tag{3}$$

where $V^{l-1}$ is the output of the previous layer. The multi-scale kernel convolution layer uses different kernel sizes (i.e. 3×3, 5×5, and 7×7) instead of a single constant kernel size. It improves the robustness of the network against size variations by adopting multi-scale blocks.
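A minimal sketch of a multi-scale block is given below for a single-channel image. It assumes zero padding so that all branches keep the input size, a ReLU activation, and concatenation of the three branch outputs along the channel axis; how the branches are actually merged in the proposed network is not stated in the text, and the averaging kernels are placeholders for learned weights.

```python
import numpy as np

def conv2d_same(image, kernel, bias=0.0):
    """Single-channel 2D convolution (cross-correlation) with zero padding and ReLU."""
    m, n = kernel.shape
    padded = np.pad(image, ((m // 2, m // 2), (n // 2, n // 2)))
    out = np.empty_like(image, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(kernel * padded[i:i + m, j:j + n]) + bias
    return np.maximum(out, 0.0)

def multi_scale_block(image, kernels):
    """Apply every kernel to the same input and stack the results as channels."""
    return np.stack([conv2d_same(image, k) for k in kernels], axis=-1)

image = np.arange(49, dtype=float).reshape(7, 7)
kernels = [np.full((k, k), 1.0 / (k * k)) for k in (3, 5, 7)]   # placeholder weights
features = multi_scale_block(image, kernels)
```

Each output channel sees the same location through a different window size, which is what gives the block its tolerance to buildings of different sizes.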

Involution kernel
Convolution kernels are space-agnostic and channel-specific, which makes them incapable of adapting to diverse visual patterns according to location (Li et al., 2021). Convolutional layers can capture richer spatial context information and longer-range spatial interactions with larger kernels, since the receptive field grows with the kernel size. However, the number of parameters of a convolution kernel increases quadratically with the kernel size, so in practice convolution has difficulty capturing long-range spatial interactions because of its limited receptive field. To address this limitation, the involution kernel was introduced, which is location-specific and channel-agnostic (Meng et al., 2021). Because it requires fewer parameters and less computation, the involution layer is more efficient than traditional convolution layers. Thus, combining involution layers with basic layers can help to increase the performance of networks with few parameters. The general structure of the involution layer is shown in Figure 5. Let $\mathcal{H} \in \mathbb{R}^{H \times W \times \rho \times \rho \times \Phi}$ indicate the involution kernel with $\Phi$ groups. An involution kernel at position $(x, y)$ is $\mathcal{H}_{x,y,\cdot,\cdot,\varphi} \in \mathbb{R}^{\rho \times \rho}$, $\varphi = 1, 2, \ldots, \Phi$. The output feature map of the involution kernel ($F$) at the coordinate $(x, y)$ for the input feature map ($X$) is defined in Equation (4) (Li et al., 2021).

$$F_{x,y,c} = \sum_{(u,v) \in \Delta\rho} \mathcal{H}_{x,y,\,u+\lfloor \rho/2 \rfloor,\,v+\lfloor \rho/2 \rfloor,\,\lceil c\Phi/C \rceil}\; X_{x+u,\,y+v,\,c} \tag{4}$$

where $\mathcal{H}_{x,y}$ is generated solely conditioned on the spectral feature vector $X_{x,y} \in \mathbb{R}^{C}$ for efficiency, as defined in Equation (5) (Li et al., 2021). Furthermore, taking the pixel at $(x, y)$ as the center, $\Delta\rho \in \mathbb{Z}^{2}$ represents the neighborhood offsets.

$$\mathcal{H}_{x,y} = \phi\left(X_{x,y}\right) = W_{1}\, \sigma\left(\mathrm{BN}\left(W_{0} X_{x,y}\right)\right) \tag{5}$$

where $\phi: \mathbb{R}^{C} \to \mathbb{R}^{\rho \times \rho \times \Phi}$ denotes the kernel generation function; $W_{0} \in \mathbb{R}^{\frac{C}{r} \times C}$ and $W_{1} \in \mathbb{R}^{(\rho \times \rho \times \Phi) \times \frac{C}{r}}$ are the parameters of the first and second linear transformations, respectively. Furthermore, BN denotes batch normalization, $\sigma$ is a nonlinear activation function, and $r$ is the reduction ratio.
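The involution operation can be sketched in NumPy as below. This is an illustrative simplification under stated assumptions: batch normalization is omitted, ReLU is assumed as the activation between the two linear transformations, zero padding is used at the borders, and the weight matrices `w0` and `w1` are random placeholders stored in transposed form so that they can be applied with `@`.

```python
import numpy as np

def involution(x, w0, w1, kernel_size=3, groups=1):
    """Involution on x of shape (H, W, C); BN is omitted in this sketch."""
    h, w, c = x.shape
    k = kernel_size
    # Kernel generation: one k x k kernel per pixel and per group,
    # computed only from the feature vector at that pixel.
    hidden = np.maximum(0.0, x @ w0)                      # (H, W, C // r), ReLU assumed
    kernels = (hidden @ w1).reshape(h, w, k, k, groups)   # (H, W, k, k, groups)
    padded = np.pad(x, ((k // 2, k // 2), (k // 2, k // 2), (0, 0)))
    group_of = np.arange(c) // (c // groups)              # channel index -> group index
    out = np.zeros_like(x, dtype=float)
    for u in range(k):
        for v in range(k):
            neighbours = padded[u:u + h, v:v + w, :]            # shifted input, (H, W, C)
            weights = kernels[:, :, u, v, :][:, :, group_of]    # shared within a group, (H, W, C)
            out += weights * neighbours
    return out

rng = np.random.default_rng(1)
x = rng.normal(size=(6, 6, 4))
w0 = rng.normal(size=(4, 2))            # C -> C // r with r = 2 (hypothetical)
w1 = rng.normal(size=(2, 3 * 3 * 2))    # C // r -> k * k * groups
out = involution(x, w0, w1, kernel_size=3, groups=2)
```

In contrast to a convolution, the kernel differs at every pixel but is shared by all channels of a group, so the generated weights scale with the kernel area and the number of groups rather than with the square of the channel count.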

XAI Grad-CAM interpretation
Gradient-weighted class activation mapping (Grad-CAM) is a well-known class activation mapping-based method that employs backpropagation to score the positions of the feature maps in a layer (Sattarzadeh et al., 2021). Recently, this method has outperformed other XAI methods in remote sensing applications (e.g. classification) (Kakogeorgiou & Karantzalos, 2021; Stomberg et al., 2022). Unlike other XAI methods (e.g. CAM), the Grad-CAM method does not need global average pooling; as a result, Grad-CAM is widely used in the visualization of key features (Kakogeorgiou & Karantzalos, 2021). This method uses the output of the features in the last convolution layer for the saliency map (Sattarzadeh et al., 2021). The saliency map can be calculated from the feature map ($\Psi \in \mathbb{R}^{w \times h \times N}$) in the last convolution layer for class $c$, as shown in Equations (6) and (7).

$$\alpha_{i}^{c} = \frac{1}{w h} \sum_{x=1}^{w} \sum_{y=1}^{h} G_{i}(x, y), \qquad G_{i} = \frac{\partial y^{c}}{\partial \Psi_{i}} \tag{6}$$

$$S^{c} = \mathrm{ReLU}\left(\sum_{i=1}^{N} \alpha_{i}^{c}\, \Psi_{i}\right) \tag{7}$$

where $G$ is the corresponding gradient for the feature map $\Psi_{i}$, $y^{c}$ is the model score for class $c$, $\alpha_{i}^{c}$ is the resulting weight of the $i$-th feature map, and $S^{c}$ is the saliency map.

Figure 3. Proposed involution/convolutional neural network architecture for building damage mapping.
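The Grad-CAM computation can be sketched in a few lines of NumPy, following the standard formulation of the method: the gradients are averaged spatially to give one weight per feature map, the weighted maps are summed, and negative values are removed by a ReLU. The sketch assumes the feature maps and their gradients have already been obtained from a trained network; the final scaling to [0, 1] is added here only for display and the toy arrays are hypothetical.

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """feature_maps, gradients: arrays of shape (w, h, N). Returns a (w, h) saliency map."""
    weights = gradients.mean(axis=(0, 1))                          # one weight per feature map, (N,)
    cam = np.maximum(0.0, (feature_maps * weights).sum(axis=-1))   # weighted sum followed by ReLU
    peak = cam.max()
    return cam / peak if peak > 0 else cam                         # scale to [0, 1] for display

# Toy example: map 0 supports the class, map 1 speaks against it.
feature_maps = np.zeros((4, 4, 2))
feature_maps[1, 1, 0] = 2.0
feature_maps[3, 3, 1] = 5.0
gradients = np.stack([np.full((4, 4), 0.5), np.full((4, 4), -1.0)], axis=-1)
saliency = grad_cam(feature_maps, gradients)
```

Only the location activated by the positively weighted map remains in the saliency map, which is the behaviour that makes the method useful for checking what the classifier attends to inside a footprint.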

Accuracy assessment
Accuracy assessment is performed through two procedures: (1) visual analysis and (2) numerical analysis by measurement indices. The numerical analysis is based on the comparison of the results of the proposed method in the test areas with the testing dataset. This research uses the confusion matrix and six widely adopted indices derived from it to assess thematic quality. These indices are overall accuracy, the Kappa coefficient, omission error, commission error, recall, and precision. Furthermore, to evaluate the performance of our framework, five common DL-based methods are implemented: (1) CNN (Kalantar et al., 2020), which includes two convolution layers, one max-pooling layer and two dense layers; (2) residual CNN (Res-CNN), which is built from one stem block for shallow deep feature generation, three residual blocks and one dense layer; (3) vision transformer (ViT) (Dosovitskiy et al., 2020), which includes several multi-head attention layers and two fully connected layers; (4) channel-expanded CNN (CECNN) (Qing et al., 2022), which has five convolutional layers, two max-pooling layers and three fully connected layers; and (5) VGG-19 (Ünlü & Kiriş, 2022), which is built from 16 convolutional layers organized into five blocks, interspersed with max-pooling layers. These methods are applied under the same conditions as the proposed method, with the same hyperparameters.
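The six indices can be derived from a 2 × 2 confusion matrix as in the sketch below. The class ordering (index 0 = non-damaged, index 1 = damaged), the convention that rows are reference classes, and the counts in the example are assumptions made for illustration; recall and precision are computed for the damaged class, with omission and commission errors taken as their complements.

```python
import numpy as np

def binary_metrics(cm):
    """cm[i, j] = number of polygons of reference class i predicted as class j."""
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    oa = np.trace(cm) / total                                    # overall accuracy
    expected = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total ** 2
    kappa = (oa - expected) / (1.0 - expected)                   # chance-corrected agreement
    tp, fn, fp = cm[1, 1], cm[1, 0], cm[0, 1]
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return {
        "OA": oa,
        "Kappa": kappa,
        "recall": recall,
        "precision": precision,
        "omission_error": 1.0 - recall,
        "commission_error": 1.0 - precision,
    }

# Hypothetical counts, not taken from the paper's tables.
metrics = binary_metrics([[40, 10], [5, 45]])
```

For these hypothetical counts the overall accuracy is 0.85 and the Kappa coefficient is 0.70, which illustrates how Kappa discounts the agreement expected by chance.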

Dataset and case study #1: Bata explosion
During the afternoon of 7 March 2021, a series of explosions occurred at the Nkuantoma armory and military barracks in Bata, Equatorial Guinea's economic center. As a result of these explosions, more than 100 people were killed and more than 600 were injured. Figure 6(a) shows the very high-resolution dataset, and Figure 6(b,c) presents the location of the first case study area (Bata explosion). This dataset was captured by Worldview-III on 9 March 2021, with four spectral channels and a spatial resolution close to 50 cm.
Figure 7 illustrates the building vector map for the study area. This vector map was generated from a pre-event dataset and manually digitized by a local expert. It includes 706 building footprints, among which 338 polygons belong to non-damaged buildings (green polygons) and 368 polygons belong to the damaged buildings class (red polygons). Additionally, Table 1 presents the details of the sample dataset used for the Bata explosion case study. This sample is divided into three groups: training data, validation data, and testing data.

Dataset and case study #2: Haiti earthquake
In the western portion of the Republic of Haiti, approximately 25 km south of the capital city of Port-au-Prince (Figure 8(b,c)), a magnitude 7.0 earthquake struck at 4:53 pm local time on 12 January 2010. The Haitian government reported that over 316,000 people died or went missing, 300,000 were injured, and 1.3 million were left homeless by the earthquake (DesRoches et al., 2011). Worldview-II satellite imagery acquired on 15 January 2010 was used in this study to evaluate the proposed method (Figure 8(a)). This dataset has a 50 cm spatial resolution and three spectral channels (red, green, blue). The testing area includes buildings of different sizes and roof shapes.
The classification results are strongly influenced by the quality and quantity of the sample dataset. One of the critically important evaluations of classifier methods is the assessment of their generalization capability. To evaluate generalization, this research uses two different regions for training and evaluating the network. Figure 9 illustrates the distribution of the sample data (red and green) for the two classes of damaged and intact polygons. In addition, the yellow polygons are used to assess the damage detection algorithms. A ground truth dataset for Haiti-Earthquake can be found on the website (https://dataverse.harvard.edu).
The details of the sample data used are presented in Table 2. As with the first case study, the sample dataset is divided into three groups: training data, validation data, and testing data.

Experiment and results
DL-based algorithms have several parameters that need to be set. The values of these parameters are set as follows: the mini-batch size is 550, the dropout rate is 0.2, the number of epochs is 500, the learning rate is $10^{-3}$, and the numbers of neurons in the first and second fully connected layers are 550 and 250, respectively. Finally, due to the structure of buildings in both study areas, the final input patch sizes of the model for the Bata explosion and the Haiti earthquake are set to 25 × 25 and 50 × 50, respectively.
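For reproducibility, these settings can be collected in a single configuration object. The sketch below only restates the values listed above; the class and field names are hypothetical and do not come from the authors' code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainingConfig:
    batch_size: int = 550
    dropout_rate: float = 0.2
    epochs: int = 500
    learning_rate: float = 1e-3
    dense_units: tuple = (550, 250)   # first and second fully connected layers
    patch_size: int = 25              # side length of the square input patch

bata_config = TrainingConfig(patch_size=25)    # Bata explosion
haiti_config = TrainingConfig(patch_size=50)   # Haiti earthquake
```

Keeping the two case studies as instances of one frozen class makes it explicit that only the input patch size differs between them.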

Results from damage mapping of the Bata explosion
The results of the damage detection methods for the Bata explosion are shown in Figure 10. It can be seen that most damaged buildings are located in the center, while the non-damaged buildings are located around the study area. Generally, the results show that all methods provide acceptable performance, but they differ in the finer details of the mapping. For example, some non-damaged buildings are detected as damaged buildings by the algorithms. The proposed method provides excellent performance in damage mapping of both classes.
Figure 11 shows enlarged sample buildings with the results derived from the different classification methods. As can be seen, the proposed method correctly detects most damaged and most undamaged building polygons, while the other methods misclassify some buildings.
The accuracy assessment for the building damage map is presented in Table 3. As observed, the CNN, Res-CNN, and ViT algorithms provide an overall accuracy of under 80% and a Kappa coefficient close to 0.5. The CECNN and VGG-19 damage detection models achieve an overall accuracy of 81.6% and 81.4%, respectively. The proposed method provides considerable improvement in building damage mapping, as its overall accuracy and Kappa coefficient are over 84% and 0.68, respectively. All methods detect damaged buildings more effectively than non-damaged buildings. Furthermore, the CNN, Res-CNN, CECNN, VGG-19, and ViT algorithms provide recall and precision under 82%, while the proposed method provides more than 83%.
The confusion matrices from the building damage detection results are presented in Figure 12. These results indicate that the compared methods detected fewer than 160 out of 203 non-damaged building polygons, while the proposed method detected more than 170 out of 203 non-damaged building polygons. Similarly, the proposed method detected 185 out of the 221 buildings in the damaged class, while other methods detected fewer than 174 damaged polygons. The elements of the secondary diagonal of the confusion matrix show the detection errors. For the proposed method these errors amount to no more than 36 polygons, while the other methods misclassify more than 43 building polygons. We also observed that the VGG-19, CECNN, and the proposed method have closer results in the secondary diagonal than other combinations of compared methods.

Results from damage mapping from the Haiti earthquake
Results of building damage mapping by DL-based methods for the testing area of the Haiti earthquake are shown in Figure 13. As can be seen, most of the methods provided similar results in building damage mapping, although their details differ.
For clarity, we selected some random building polygons from the results of building damage mapping, which are presented in Figure 14. This figure shows enlarged results for six building polygons. We find that the proposed method has good performance in mapping both damaged and non-damaged building polygons.
The numerical results of building damage mapping for the Haiti earthquake are presented in Table 4. Based on the quality indices, all methods yielded greater accuracy in the testing area than on the Bata explosion dataset. The overall accuracy of the DL-based methods ranged from 81% to 90%. Among them, the proposed method provides the highest accuracy, with an overall accuracy of 90.76%. The proposed method improves the overall accuracy by more than 9, 5, 3, 2, and 7 percentage points over the CNN, Res-CNN, CECNN, VGG-19, and ViT algorithms, respectively. Furthermore, a significant improvement can be seen in the Kappa coefficient: the improvement of the proposed method is more than 0.5 compared with other methods. The CECNN and VGG-19 models perform better than the proposed method for the non-damaged and damaged classes, respectively. However, these models perform worse in the detection of the damaged and non-damaged classes, respectively.
The confusion matrices of the building damage detection methods are presented in Figure 15. As can be seen, most methods provide good performance in the detection of non-damaged building polygons: among 943 non-damaged building polygons, more than 758 are correctly detected by most methods. The proposed method detects 895 of the 943 non-damaged building polygons. Furthermore, the efficiency of the proposed method in detecting damaged building polygons is considerable, as it accurately detects 412 out of 497 building polygons. However, the VGG-19 approach correctly detects 914 polygons, which is better than the proposed method, although its results are degraded in detecting damaged buildings. It is worth noting that the CECNN method provides the same performance in mapping damaged buildings, although it is less effective in mapping non-damaged buildings.
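As a consistency check on these counts, and assuming the 943 non-damaged and 497 damaged polygons make up the whole test set, the overall accuracy of the proposed method follows directly from the two correctly detected totals:

```latex
\mathrm{OA} = \frac{895 + 412}{943 + 497} = \frac{1307}{1440} \approx 0.9076
```

This agrees with the overall accuracy of 90.76% reported for the proposed method on the Haiti earthquake dataset.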

XAI results in building damage mapping
This research used the Grad-CAM XAI model to visualize the critical portions of the input data for the proposed method. Table 5 shows the result of the Grad-CAM XAI model for some building polygons in the last convolution layer of the proposed method for the Bata explosion. Based on these results, we can see that for non-damaged buildings the model focuses on the whole area of the building. Thus, for non-damaged buildings the model considers all parts of a building. For damaged buildings, the model tries to focus on collapsed areas, where high texture is considered for building damage classification. Thus, the model learns to classify a building into two classes based on its texture characteristics. The superior performance of the model is therefore logical and not unexpected.
Similarly, the result of employing the Grad-CAM algorithm for the Haiti earthquake is illustrated in Table 6. As seen, the model provides selected results from the Bata explosion dataset. This originates from the features of the buildings in this study area. For non-damaged buildings, the model tried to focus on different areas of buildings; these areas have smooth textures. Moreover, since for damaged buildings the model concentrates on non-smooth texture areas (debris), the model learns the properties of both damaged and non-damaged areas.

Ablation analysis
This analysis aims to determine how the removal of a portion of the model affects the overall model performance. We investigated the impact of ablation on the proposed framework using three scenarios: (S1) without the SE module, (S2) without the involution layer, and (S3) without the fully connected layers. Table 7 presents the result of the ablation analysis on building damage mapping for the Bata explosion. As can be seen, the fully connected layers have the lowest impact on the structure of the proposed framework. Furthermore, the involution layer plays a key role in the proposed method, since removing it reduces the performance of the model by more than 3% in terms of OA.
Table 8 provides the result of the ablation analysis for the Haiti earthquake. Similarly, all components play a key role in the performance of the proposed model. Based on these results, the SE module has the highest impact on the effectiveness of the model (S1). Further, the fully connected layers have the least impact on the structure of the proposed framework (S3).
Table 9 illustrates the Haiti earthquake feature maps in the SE module and the involution layers. As seen, the involution layers focus on key points in the first and second layers. The SE module considers a specific region in the earlier layers. Furthermore, the visualization of feature maps for the third layer shows that both the involution layer and the SE module focus on the whole of the building surface when making a decision.

Discussion
This section discusses the challenges and issues related to the building damage mapping process and summarizes the performance of the proposed method in different scenarios. This research evaluates the performance of building damage detection in two different real-world study areas. Moreover, the results of building damage detection are compared with other state-of-the-art methods. Based on the results presented in Figures 10 through 15 and Tables 3 and 4, the proposed method outperforms other state-of-the-art methods. The efficiency is demonstrated on both datasets and presented as numerical results in Table 5. Most building damage detection methods use a bi-temporal pre/post-event dataset, and change detection-based building damage detection methods can provide promising results. For instance, Gupta and Shah (2020) proposed a deep learning-based framework for a bi-temporal dataset that provided an overall score of 0.77. Furthermore, Merlin and Wiselin Jiji (2019) provided an accuracy of 88% in damage detection by the DL-based method. Thus, the proposed method achieved an accuracy comparable to that of bi-temporal damage detection methods. It is worth noting that this study only uses a post-event dataset for damage mapping, whereas preparing a bi-temporal dataset and extracting change information from it is more time-consuming and challenging. Furthermore, finding a pre-event dataset requires more consideration, presenting a substantial challenge to building damage mapping. The proposed framework uses a building vector map for extracting the footprints of buildings, which is available from the OpenStreetMap (OSM) website and many organizations. In other words, the proposed method simplifies the process, as it does not require additional (pre-event) datasets to be processed. In addition, the model focuses only on the post-event dataset, which helps to reduce the complexity of the model. This benefit of the proposed method can help apply damage mapping in real-world applications.
Unlike semantic segmentation DL-based methods, which demand a large sample dataset, the proposed method can be applied with a small sample dataset. A pre-trained model can also address the challenge of obtaining sample data, but it does not suit all study areas because of negative transfer. Negative transfer arises when the dataset used to pre-train the model and the target dataset are not sufficiently similar. For example, Ji et al. (2020) utilized a pre-trained CNN model for building damage detection that provided an overall accuracy of 88%. Because the structure of built-up areas differs between cities, this similarity is not guaranteed, which affects building damage detection results. Thus, the utilization of a pre-trained model is not effective in all cases.
The presented results suggest that increasing the size of the sample dataset can improve the performance of DL-based methods. For example, the Bata explosion test case uses approximately 210 samples, and the proposed method provides an OA of 84%. In contrast, for the Haiti earthquake the proposed method provides an OA of more than 90% using 500 samples to train the model. Thus, training the model with a sufficiently large sample dataset can considerably improve damage detection results.
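The OA index quoted above is the fraction of building polygons that are classified correctly. The following minimal Python sketch shows the calculation for a two-class (non-damaged/damaged) confusion matrix. The function name and the counts are assumptions made purely for illustration; they are not results from this study.

```python
def overall_accuracy(confusion):
    """Return OA = correctly classified samples / all samples.

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    # Correct predictions lie on the main diagonal.
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


# Hypothetical counts (rows: true non-damaged, true damaged).
example = [[45, 5],
           [10, 40]]
oa = overall_accuracy(example)
print(f"OA = {oa:.2%}")
```

Because OA pools both classes, it can hide poor performance on a minority class, so it is usually read alongside per-class measures.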
The main differences between the buildings in the study areas are orientation, size, and color. The model can focus on the texture of buildings to classify them into non-damaged and damaged classes, and the XAI results show that the model does focus on building texture for damage mapping. It is worth noting that buildings are highly complex components of urban areas. Thus, their features might mislead the downstream model.
Generalization is an important criterion for DL-based methods in building damage detection. To evaluate the generalization of the DL-based method, we separated the training areas from the testing areas for the Haiti earthquake dataset. The building damage detection results for this dataset show that the model generalizes well to unseen samples.
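One simple way to keep training and testing areas apart is to tag every sample with the spatial block it comes from and to split on that tag. The sketch below illustrates this idea only; the `area` key, the area names, and the function name are hypothetical and do not describe the exact partition used for the Haiti dataset.

```python
def split_by_area(samples, test_areas):
    """Split samples so that no test area contributes training data.

    Each sample is a dict with an 'area' key naming its spatial block.
    """
    train = [s for s in samples if s["area"] not in test_areas]
    test = [s for s in samples if s["area"] in test_areas]
    return train, test


# Hypothetical building samples tagged with an invented area identifier.
samples = [
    {"id": 1, "area": "north", "label": "damaged"},
    {"id": 2, "area": "north", "label": "non-damaged"},
    {"id": 3, "area": "south", "label": "damaged"},
    {"id": 4, "area": "east", "label": "non-damaged"},
]
train, test = split_by_area(samples, test_areas={"south"})

# No area may appear on both sides of the split.
assert {s["area"] for s in train}.isdisjoint({s["area"] for s in test})
print(len(train), "training samples,", len(test), "test samples")
```

Splitting by area rather than by individual building avoids placing neighboring, visually similar buildings on both sides of the split.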
Table 10 compares the parameter counts of various deep learning models, which serve as an indicator of their respective computational costs. Notably, models with larger parameter counts, such as VGG-19 and ViT, typically require more computational resources for both training and deployment. Conversely, our proposed model, with its leaner parameter count, has the potential to provide improved efficiency while maintaining performance. The CNN model has the lowest computational cost among the models compared; however, this efficiency may come at the expense of optimal performance.
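Parameter counts such as those in Table 10 can be derived layer by layer from closed-form expressions. The sketch below gives the standard counts for a 2-D convolution layer and a fully connected layer. The toy layer sizes are assumptions chosen for illustration and do not correspond to any model compared in Table 10.

```python
def conv2d_params(in_channels, out_channels, kernel_size, bias=True):
    """Trainable parameters of a standard 2-D convolution layer."""
    weights = kernel_size * kernel_size * in_channels * out_channels
    return weights + (out_channels if bias else 0)


def dense_params(in_features, out_features, bias=True):
    """Trainable parameters of a fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)


# Hypothetical toy network, NOT one of the models compared in Table 10.
toy_layers = [
    conv2d_params(3, 32, 3),    # 3*3*3*32 weights + 32 biases
    conv2d_params(32, 64, 3),   # 3*3*32*64 weights + 64 biases
    dense_params(64, 2),        # 64*2 weights + 2 biases
]
total = sum(toy_layers)
print(f"total trainable parameters: {total:,}")
```

The parameter count is only a proxy: two models with the same number of parameters can differ in runtime, because the number of operations also depends on the spatial size of the feature maps.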

Conclusion
This study proposed a novel deep learning-based framework for rapid and accurate building damage detection in two different areas and event types (earthquake and explosion). Informative feature extraction is the most important task in supervised learning methods, yet common damage detection methods have ignored the capacity of advanced deep feature extraction methods. To this end, we proposed a DL framework that combines involution and convolution layers to increase the capacity of the model for robust feature extraction. Furthermore, the proposed method took advantage of a multi-scale block and an attention mechanism to improve the building damage detection results. The results showed that the proposed method was highly efficient in mapping building damage for both datasets. Our method achieved high accuracy with only post-event datasets, while other building damage detection methods focused on bi-temporal datasets. In addition, we utilized building polygon vectors so that the model focused on classifying building polygons (damaged or non-damaged). This design helps to reduce computational cost and model complexity.

Table 9. An illustration of the feature maps of the SE module and involution layer for the Haiti earthquake.
We employed the Grad-CAM XAI model to visualize the critical features of the input data in building damage mapping. The last convolution layer of the proposed method was used for visualization. Results from the Grad-CAM XAI model demonstrated that the texture of the building played a key role in the classification results. The proposed model tended to classify a building with a smooth texture as non-damaged, while a building with a rough texture (debris areas) was classified as damaged. Thus, sample data that include the different types of debris areas can help the model generate reliable results for damaged polygons. Grad-CAM can therefore be used to analyze the data in more detail and to build generic sample data that can improve the efficiency of the model in the future.
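For reference, the standard Grad-CAM computation weights each feature map of the chosen convolution layer by the spatial average of the class-score gradient for that map, sums the weighted maps, and applies a ReLU. The sketch below illustrates this with plain Python lists. It assumes that the activations and gradients have already been extracted from the network by a deep learning framework, and the toy values are invented for illustration.

```python
def grad_cam(activations, gradients):
    """Compute a Grad-CAM heat map from one layer's outputs.

    activations[k][i][j]: feature map k of the chosen convolution layer.
    gradients[k][i][j]:   gradient of the class score w.r.t. that activation.
    """
    height, width = len(activations[0]), len(activations[0][0])
    # Channel weight = spatial average of the gradients for that channel.
    weights = [sum(map(sum, g)) / (height * width) for g in gradients]
    cam = [[0.0] * width for _ in range(height)]
    for w, fmap in zip(weights, activations):
        for i in range(height):
            for j in range(width):
                cam[i][j] += w * fmap[i][j]
    # ReLU keeps only the regions that raise the class score.
    return [[max(v, 0.0) for v in row] for row in cam]


# Toy input: two 2x2 feature maps with constant gradients (+1 and -1).
activations = [[[1.0, 0.0], [0.0, 2.0]],
               [[0.0, 3.0], [1.0, 0.0]]]
gradients = [[[1.0, 1.0], [1.0, 1.0]],
             [[-1.0, -1.0], [-1.0, -1.0]]]
heat = grad_cam(activations, gradients)
print(heat)
```

In practice the heat map is upsampled to the input image size and overlaid on the building patch, which is how texture-related regions such as debris become visible.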
Overall, the proposed method has several advantages: (1) it extracts robust deep features that can detect damage with less error and high accuracy; (2) it generalizes better than other DL-based methods; and (3) it uses only a post-event dataset for building damage mapping instead of a bi-temporal dataset.

Figure 3 presents the general overview of our DL network for building damage mapping. As can be seen from this structure, the proposed framework is composed of two fundamental components: (1) deep feature extraction and (2) classification. The deep feature extraction component includes convolution and involution layers that extract high-level meaningful deep features. In addition, the squeeze-and-excitation (SE)

Figure 1. Flowchart of the proposed framework for building damage mapping.

Figure 2. The labeling process of the proposed damage mapping framework.

Figure 5. The involution schema. ⊕ and ⊗ refer to the summation and multiplication operations, respectively.

Figure 6. The dataset used in the first study area: (a) post-event high-resolution data; (b) and (c) the geographical location of the study area.

Figure 7. The sample building footprints for the first study area in both classes.

Figure 8. Dataset used for the Haiti earthquake: (a) post-earthquake VHR image; (b) and (c) the geographical location of the study area.

Figure 9. Spatial distribution of the sample dataset for the Haiti earthquake.

Figure 11. Enlarged sample buildings after the Bata explosion; green polygons denote damaged buildings and red polygons denote undamaged buildings.

Figure 14. Enlarged sample buildings in study areas for the Haiti earthquake.

Table 1. Characteristics of the sample dataset for the Bata explosion.

Table 2. Characteristics of the sample dataset for building damage mapping for the Haiti earthquake.

Table 3. Comparison of the accuracy of different algorithms for building damage mapping for the first study area.

Table 4. Comparison of the accuracy of different classification algorithms for damage mapping for the Haiti earthquake.

Table 5. Visualization results of Grad-CAM for the Bata explosion.

Table 6. Visualization results of Grad-CAM for the Haiti earthquake.

Table 7. Ablation analysis of the proposed method for the Bata explosion.

Table 8. Ablation analysis of the proposed method for the Haiti earthquake.

Table 10. A comparison of the computational cost of deep learning models.