Enhanced EEG classification using adaptive DWT and heuristic-ICA algorithm

Electroencephalography (EEG) signals contain important information about the inner functioning of the brain. Effective extraction of this information helps in detecting brain-related health conditions and a person's emotions, and it can also serve as a communication medium between humans and machines. In the proposed system, we introduce Adaptive DWT, which combines the temporal resolution capability of the DWT with the special capability of the Fourier transform to remove artefacts from the signal. This is achieved by using an adaptive thresholding function rather than hard or soft thresholding to improve the quality parameters of the signal. The proposed filtering model improves the signal-to-noise ratio compared with traditional filtering techniques. EEG features are extracted with Heuristic-Independent Component Analysis (ICA), which applies covariance to equalize or improve the data. The main drawback of the existing CNN algorithm is gradient vanishing during training, which reduces the overall performance of the algorithm during classification. Therefore, a memory function that stores the value from the previous iteration is used to improve the classification accuracy and reduce the vanishing gradient problem. The proposed technique achieves an accuracy of about 98% in classifying autism and epilepsy datasets.


Introduction
Electroencephalography (EEG) is a neurophysiological test that detects the neural activity of the brain. EEG signals have many applications, including emotion detection and psychological disorder detection. Many studies in recent years claim that we can even interact with a machine through effective extraction of EEG signals. The Brain Control Interface (BCI) can also act as the interface between humans and machines. The disadvantages of BCI technologies include amplitude suppression and the need for wet electrodes, which make outdoor applications difficult [1,2]. Many studies have experimented with human-machine interfaces. The main disadvantages of these previous techniques are their low accuracy and the need for training through test experiments; in spite of extensive training, the accuracy of selecting a particular target is about 63% or less [3][4][5]. This is because the data from each channel is a linear combination of the outputs of many channels. The linear combination of each electrode output (YS) is given by the following function.
Frequency-based analysis, or Fourier analysis, is not a suitable alternative for signal noise extraction because it cannot capture signal discontinuities and breakdowns. This creates a need for an effective signal-filtering method. The main advantage of using a wavelet filter for noise reduction is its ability to extract the transient behaviour of the signal. There are many types of wavelets. A wavelet is a signal that starts at some point and ends at some finite point. This is not the case for a sinusoidal signal, which extends without end, as shown in Figure 1.
The main concepts in the wavelet transform are scaling and shifting. Scaling refers to a change in the centre frequency of the wavelet, whereas shifting refers to a change in the position of the wavelet. Wavelets are waveforms whose average value is zero. The important attribute of the wavelet transform is that the wavelet can be scaled and its position altered accordingly. After the wavelet transform, a threshold is applied to the decomposed signal to extract the noiseless EEG signal. Gradient-based algorithms are used to select a proper threshold value for the function. The main drawback of a gradient descent-based algorithm is the number of iterations it takes to reach convergence: the more complex the signal, the greater the number of iterations. Therefore, population-based threshold-finding methods have gained interest [6][7][8][9].
Because of pseudo-Gibbs effects and deviation problems during signal reconstruction, hard and soft thresholding functions cannot be used for noise suppression [10][11][12][13]. Therefore, an effective thresholding function needs to be implemented along with wavelets for signals that have abrupt changes in the data. The Fourier transform retains no time or space information, so it cannot be used for EEG data smoothing. The main function of ICA is to extract the original EEG signal from the mixed signal. The main assumptions made before applying ICA are that the signals are non-Gaussian and independent of one another. The main disadvantage of existing ICA techniques is that they use blunt elimination functions to extract the noisy content from the signal. To improve the signal quality and enhance the signal equalization, we need a separate technique that uses the mutual information values of the signals to extract the output of individual electrodes. The applications of these techniques are numerous: easy identification of affected brain parts, effective seizure detection, and even BCI applications. The main hurdle is to find an effective classification function to extract the required signal component from the signal.
The main alternatives for classification are simple heuristic algorithms, which do not serve the purpose of accuracy enhancement, and machine learning algorithms, which increase the computation time and cost. Therefore, a more effective and computationally reasonable algorithm is needed. A viable alternative is the SVM classifier. The effectiveness of SVM classifiers can be improved with effective kernel functions, which can increase the classification accuracy for our problem. These points are addressed in the following sections.
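The decompose, threshold and reconstruct sequence described above can be illustrated with a minimal sketch. It assumes a one-level Haar wavelet and plain soft thresholding, chosen only because they are short to write; the function names and the threshold value `lam` are illustrative and are not taken from the proposed method.

```python
import math

def haar_dwt(signal):
    """One-level Haar DWT: returns (approximation, detail) coefficients."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuilds the signal from both coefficient sets."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out

def soft_threshold(coeffs, lam):
    """Shrink each coefficient towards zero by lam; small ones become zero."""
    return [math.copysign(max(abs(c) - lam, 0.0), c) for c in coeffs]

def denoise(signal, lam):
    """Threshold only the detail coefficients, then reconstruct."""
    approx, detail = haar_dwt(signal)
    return haar_idwt(approx, soft_threshold(detail, lam))
```

With a very large `lam` every detail coefficient is zeroed, so each pair of samples is replaced by its mean. A practical filter would use several decomposition levels and a smoother wavelet.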

Literature review
Ocular artefacts can be removed using Haar wavelet-based ICA techniques. Independent component analysis relies on the assumption that the electrode sources are linearly independent of each other and non-Gaussian. In these methods, the artefacts are removed using wavelet-based methods, which remove the harmonic components in the signal. Therefore, the wavelet transform and ICA techniques are natural partners for removing noise from EEG [14,15]. The double-density wavelet method can also be used along with the ICA algorithm to remove artefacts from the signal. In this method, the ICA algorithm is used to decompose the input EEG signal, and the decomposed signals are then analysed using the wavelet transform [16]. The main problem with ICA algorithms is their inability to reach a global maximum; this can be improved using ICA-R (ICA-reference) and constrained-ICA (cICA) methods. The ICA-R algorithm separates each source from the input using a deflationary scheme. To overcome the problems of the penalized contrast function in ICA-R, a combination of ICA and DE (Differential Evolution) algorithms [17][18][19] is used. Motor movement artefacts in EEG signals can be removed using clustering algorithms like KNN. The signal is decomposed using ICA, and the distance values of the decomposed signals are calculated and clustered to find the motor noise signals. The main problem with clustering techniques is their inability to denoise multivariate inputs [20].
Hybrid methods are also used to extract the original EEG signals. The methods discussed above are offline noise cancellation methods; for online noise cancellation, a combination of ICA and Adaptive Noise Cancellation (ANC) can be used. This is done by calculating the interference pattern between the input signals [21]. Deep learning methods are also employed to remove artefact noise from the signal. Ocular artefacts in the signal are stored and used to train the Deep Learning Network (DLN). The deep learning methods show better efficiency than ICA methods [22][23][24][25]. The correlation between signals can also be used to remove noise. The Canonical Correlation Analysis method is used to find signals with high and low correlation. These correlation values are then selected with the spectral-slope rejection technique [26][27][28][29].

Methodology
The EEG classification is basically divided into two parts: noise removal and classification. The flowchart in Figure 2 gives a better understanding of how the proposed algorithm works.

Pre-processing using DWT
To overcome the common problem of premature convergence in wavelet-based filtering with the global thresholding technique, a new technique called adaptive thresholding is proposed. The thresholding function T(W_{i,k}, λ) can be expressed as follows, where W_{i,k} is the ith wavelet coefficient of the signal at scale k, λ is the threshold value and n is any positive integer. The threshold value is calculated from a population created with the help of the population selection function.
where lr is the learning rate of the algorithm. This reduces the number of iterations needed to find a suitable threshold value for each EEG channel.
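For contrast with the adaptive scheme, the following sketch computes one widely used global threshold, the universal threshold λ = σ·sqrt(2 ln N), with σ estimated from the median absolute value of the detail coefficients. This is a common baseline and an assumption on our part about what "global thresholding" refers to; it is not the adaptive thresholding function or the population-based search of the proposed method.

```python
import math

def median(values):
    """Median of a non-empty list of numbers."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else 0.5 * (ordered[mid - 1] + ordered[mid])

def universal_threshold(detail_coeffs, n_samples):
    """Universal threshold sigma * sqrt(2 ln N).

    sigma is the robust noise estimate median(|d|) / 0.6745 computed from
    the finest-scale detail coefficients; n_samples is the signal length N.
    """
    sigma = median([abs(c) for c in detail_coeffs]) / 0.6745
    return sigma * math.sqrt(2.0 * math.log(n_samples))
```

A single value of this kind is applied to every coefficient of a channel, which is the behaviour a per-channel adaptive threshold is meant to improve on.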

ICA signal decomposition
The noisy EEG signal can be modelled as the product of S_i and n, where S_i is the pure EEG signal from channel "I" and n is the mixing signal. Independent Component Analysis is used to separate each of the independent components from the mixed signals. The most important assumption made before applying the ICA algorithm is that all the signal sources are independent of each other. This is not the case for EEG signals, because the sensors pick up the same type of output from the brain at different electrodes. Therefore, in the proposed method we find the mutual correlation value between two electrodes. This can be found using Shannon's information theory.
where H is the entropy estimate and f is the probability of mutual correlation between electrodes x and y.
The mutual information of each electrode can be calculated using the following formula, where I(x) is the true information of electrode x and H is the entropy value of the electrodes. The mutual information is used to find the covariance between two electrodes. The mutual information values are sorted in ascending order, and the electrodes of order "k" are used for separating independent components.
The covariance of each electrode is calculated with respect to the adjacent electrodes that have the highest mutual information. This value is then used to find the eigenvalues and eigenvectors of the covariance. The covariance values are assigned on the basis of the "I" value in a matrix of order n × k. The value of n is set equal to k to make the matrix square so that eigenvalues can be calculated. The covariance values are organized as shown in the following equation.
To eliminate the noise region from the data, we need to separate the data using the heuristic feature of support vector machines. The components that we classify are highly non-linear in nature; to make the data linearly separable, we use kernel multiplication (Form) to map the data into a higher-dimensional space. Choosing the best kernel for the classifier can improve the training accuracy of the SVM classifier. The Gaussian kernel is used for the separation of data points.
Here, ‖x_1 − x_2‖ is the Euclidean distance between two data points and γ is the kernel width parameter. The output data are iterated every epoch, and the Euclidean distance between the kernel and the data points is calculated. This is done on every iteration, and the data with the minimum Euclidean distance are selected to classify noisy regions from the data. The inverse ICA function is then used to reconstruct the signal.
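As an illustration of the entropy-based quantities above, the sketch below estimates Shannon entropy and mutual information from two discretized electrode signals using the standard identity I(X;Y) = H(X) + H(Y) − H(X,Y). The equal-width binning in `quantize` and the function names are assumptions made for this example; the estimator used in the proposed method may differ.

```python
import math
from collections import Counter

def quantize(signal, n_bins):
    """Map real samples to integer bin indices with equal-width bins."""
    lo, hi = min(signal), max(signal)
    width = (hi - lo) / n_bins or 1.0  # avoid a zero width for flat signals
    return [min(int((v - lo) / width), n_bins - 1) for v in signal]

def entropy(symbols):
    """Shannon entropy in bits of a sequence of discrete symbols."""
    n = len(symbols)
    counts = Counter(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for equal-length discrete sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))
```

Two identical binary sequences with balanced symbols share 1 bit of information, whereas two sequences whose symbol pairs are all equally likely share none. Electrode pairs could then be ranked by this value.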
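The covariance and eigenvalue step can be sketched as follows. The example builds the sample covariance matrix of a few channels and extracts its dominant eigenpair by power iteration; power iteration is used here only to keep the example dependency-free and is an assumption, not the eigen-solver of the proposed method.

```python
import math

def covariance_matrix(channels):
    """Sample covariance matrix of equal-length channels (list of lists)."""
    n_samples = len(channels[0])
    if n_samples < 2:
        raise ValueError("need at least two samples per channel")
    means = [sum(ch) / n_samples for ch in channels]
    centred = [[v - m for v in ch] for ch, m in zip(channels, means)]
    return [[sum(a * b for a, b in zip(ci, cj)) / (n_samples - 1)
             for cj in centred] for ci in centred]

def dominant_eigenpair(matrix, n_iter=200):
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix.

    Uses power iteration from a fixed, non-uniform start vector; the method
    fails if that start vector is orthogonal to the dominant eigenvector.
    """
    size = len(matrix)
    vec = [float(i + 1) for i in range(size)]
    norm = math.sqrt(sum(v * v for v in vec))
    vec = [v / norm for v in vec]
    for _ in range(n_iter):
        nxt = [sum(row[j] * vec[j] for j in range(size)) for row in matrix]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:
            return 0.0, vec
        vec = [v / norm for v in nxt]
    # Rayleigh quotient of the converged unit vector gives the eigenvalue.
    value = sum(vec[i] * sum(matrix[i][j] * vec[j] for j in range(size))
                for i in range(size))
    return value, vec
```

For two perfectly correlated channels the covariance matrix has rank one, so the dominant eigenvalue equals the sum of the two channel variances.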
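A minimal sketch of the Gaussian kernel, assuming the common form K(x_1, x_2) = exp(−γ‖x_1 − x_2‖²), is given below. The exact expression used in the proposed method is not reproduced here, and the function names and the value of γ are illustrative.

```python
import math

def gaussian_kernel(x1, x2, gamma):
    """Gaussian (RBF) kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-gamma * sq_dist)

def kernel_matrix(points, gamma):
    """Pairwise kernel values for a list of points (symmetric matrix)."""
    return [[gaussian_kernel(p, q, gamma) for q in points] for p in points]
```

The kernel equals 1 for identical points and decays towards 0 as the distance grows; a larger γ makes the decay faster and the decision boundary more local.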

Classification
The output signal extracted by the heuristic-ICA technique is then classified using M-CNN (Memory-based CNN). The main function of M-CNN is to eliminate the vanishing gradient problem and to reduce the computational cost of the CNN algorithm. This is done by comparing the output value of the previous iteration to select the data, instead of using other pooling methods in the layer. The main advantage of the proposed method is that a selection layer is eliminated from the GRU method, as shown in the update gate architecture in Figure 3a,b. With the reduced number of gates and activation functions, the computational cost is reduced by 25%.
The proposed M-CNN has the advantages of both CNN and LSTM algorithms. To reduce the effect of the vanishing gradient, the output value from the previous iteration is saved for the next iteration. The output value is used for the next layer, and the value of h_t is stored for the next iteration. The proposed M-CNN consists of five layers: three convolutional layers with the Leaky-ReLU activation function and two layers with the memory function.
The slope function of Leaky-ReLU is updated on each iteration; the slope for positive values is higher than for negative values.
By combining the convolutional layer with the sequential data processing capability of the GRU-RNN layer, we can develop an effective classification algorithm for EEG abnormality classification.
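For reference, a standard GRU cell, from which the memory layer is said to be derived, can be written as below. This is the textbook formulation in one common convention (some references swap the roles of z_t and 1 − z_t); the weights W, U and biases b are generic symbols, and which gate the proposed M-CNN removes is specified only in Figure 3.

```latex
\begin{aligned}
z_t &= \sigma\left(W_z x_t + U_z h_{t-1} + b_z\right) && \text{(update gate)} \\
r_t &= \sigma\left(W_r x_t + U_r h_{t-1} + b_r\right) && \text{(reset gate)} \\
\tilde{h}_t &= \tanh\left(W_h x_t + U_h \left(r_t \odot h_{t-1}\right) + b_h\right) && \text{(candidate state)} \\
h_t &= \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t && \text{(new state)}
\end{aligned}
```

Removing one gate removes one matrix-vector product pair and one sigmoid per time step, which is where a reduction in computational cost would come from.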
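The Leaky-ReLU activation and its derivative can be sketched as follows. The negative slope of 0.01 is a commonly used default and an assumption here; the rule by which the slope is updated on each iteration in the proposed method is not shown.

```python
def leaky_relu(x, negative_slope=0.01):
    """Identity for positive inputs, small linear slope for the rest."""
    return x if x > 0 else negative_slope * x

def leaky_relu_grad(x, negative_slope=0.01):
    """Derivative of leaky_relu; never exactly zero, unlike plain ReLU."""
    return 1.0 if x > 0 else negative_slope
```

Because the gradient for negative inputs is small but non-zero, units do not stop learning when their input becomes negative.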

Data processing steps
The EEG data were collected from healthy individuals at a sampling rate of 256, with electrodes placed according to international guidelines. The signal was recorded for five seconds for each interval, and eye blinks and eyeball movements were noted and saved in an annotation file. To simulate motion artefacts in the signal, two nearby electrodes were moved in different directions, simulating motion in individuals. The times of these motion artefacts were saved. To analyse the performance of the algorithms, the motion artefact and eye blink data are used to measure the sensitivity of the algorithms. Data processing was done with EEGLAB version 2021.1 in MATLAB.

Results
To demonstrate the effectiveness of the proposed model, its output variables are compared with existing techniques, from heuristic models to machine learning models. The results were promising compared with previous methods. The first step of our model is noise removal.
To assess the efficiency of the A-DWT algorithm, we compared the SNR and RMSE values of its output with those of previous techniques. The formula for SNR is given in (11), where X_d is the denoised EEG signal and x is the noisy EEG signal. The input EEG signal is filtered by the proposed A-DWT algorithm with the adaptive thresholding function. The output EEG signal after the A-DWT algorithm is compared with the input EEG signal in Figure 4.
The proposed A-DWT algorithm is compared with existing techniques that use transform functions. The noise reduction performance of the proposed A-DWT and of existing transform algorithms is shown in Table 1. PSNR relates the peak power of the signal to the power of the noise. The higher the PSNR, the higher the quality of the signal and the lower the noise. The mean squared error is the average of the squared differences between the actual values and the original values. The table shows that the proposed algorithm performs better in noise removal and error reduction.
After denoising the input EEG signal, independent component analysis is applied to it. The value of k is set equal to n, which makes the input a square matrix and reduces the error in eigenvalue calculation. The independent component values calculated using the proposed technique are shown in Figure 5.
Figure 5 shows the components of the individual electrodes after decomposition by ICA. The total number of components is equal to the number of input channels.
Figure 6 shows the topographic plot of the EEG data for all 64 channels after applying the proposed method. The figure shows the sensitivity of the electrode values after denoising using ICA. The blue regions show lower intensity and the yellow regions show higher intensity.
The scatter plot shows the sensitivity of the EEG equalization performed by the proposed Heuristic-ICA algorithm. The sensitivity calculation for epilepsy and autism detection on different datasets shows that the proposed ICA algorithm is effective in separating different types of data, as shown in Figure 7. The variance is used to measure the amount of correlation between two different electrodes. These component variance values are used to separate the amount of signal integrated between the electrodes. Figure 8 shows that components IC1, IC2 and IC3 have the highest variance among all components. These components carry the largest amount of shared information between electrodes.
Shannon entropy is used to measure the amount of information shared between all the components. Therefore, the higher the entropy, the higher the chance that correlation is present. Figure 9 shows that components with high variance also have higher entropy.
The ROC curve relates the sensitivity and specificity values. The greater the distance between the output curve and the 45° diagonal, the higher the accuracy of the algorithm's predictions. Figure 10 shows that the proposed M-CNN algorithm performs better than other deep learning techniques. Its accuracy is higher than that of deep learning algorithms with a memory function and of other algorithms with a heuristic function.
The computational cost of the neural networks for different batch sizes is compared in Figure 11, which shows that the proposed M-CNN algorithm consumes 30% less computational time and power than RNN and CNN algorithms [30][31][32][33][34][35][36][37][38][39]. Owing to the fewer gates and memory functions in the algorithm, the training time is reduced drastically.
Table 2 shows the accuracy and sensitivity comparison of the existing deep learning algorithms. Accuracy and sensitivity are calculated as Accuracy = (TP + TN)/(TP + FP + FN + TN) and Sensitivity = TP/(TP + FN). It is clear from the comparison table that the proposed M-CNN deep learning method has higher accuracy and sensitivity in detecting both autism and epilepsy. Therefore, the proposed algorithm is sensitive to different types of abnormalities and can be used for classifying other abnormalities.
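The two quality measures can be computed as in the sketch below. It assumes one common definition of SNR, the ratio of the energy of the reference signal to the energy of the residual between the reference and the denoised signal, expressed in decibels; the exact form of formula (11) may differ, and the argument names are illustrative.

```python
import math

def snr_db(reference, denoised):
    """SNR in dB: reference energy over residual energy."""
    signal_power = sum(v * v for v in reference)
    noise_power = sum((x - d) ** 2 for x, d in zip(reference, denoised))
    if noise_power == 0.0:
        return math.inf  # the two signals are identical
    return 10.0 * math.log10(signal_power / noise_power)

def rmse(reference, denoised):
    """Root mean squared error between two equal-length signals."""
    n = len(reference)
    return math.sqrt(sum((x - d) ** 2 for x, d in zip(reference, denoised)) / n)
```

A higher SNR and a lower RMSE both indicate that the filtered output stays closer to the reference signal.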
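The following sketch shows how the MSE and PSNR can be computed for a pair of signals. It assumes that the peak value is taken as the largest absolute amplitude of the reference signal, which is one of several conventions; the function names are illustrative.

```python
import math

def mse(reference, estimate):
    """Mean of the squared sample-wise differences."""
    n = len(reference)
    return sum((r - e) ** 2 for r, e in zip(reference, estimate)) / n

def psnr_db(reference, estimate):
    """PSNR in dB: squared peak amplitude over the mean squared error."""
    error = mse(reference, estimate)
    if error == 0.0:
        return math.inf  # no error between the two signals
    peak = max(abs(v) for v in reference)
    return 10.0 * math.log10(peak * peak / error)
```

Since PSNR is a decreasing function of the MSE for a fixed peak value, the two columns of a comparison table should rank the filters in the same order.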
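How far a ROC curve lies from the diagonal is usually summarized by the area under the curve (AUC). The sketch below builds the ROC points by sweeping the decision threshold over the classifier scores and integrates them with the trapezoidal rule; the labels and scores in the usage note are made up for illustration and are not results from this study.

```python
def roc_points(labels, scores):
    """(false positive rate, true positive rate) points of the ROC curve.

    labels are 1 for the positive class and 0 otherwise; a higher score
    means the classifier is more confident in the positive class.
    """
    pos = sum(1 for label in labels if label == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample of a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under a piecewise-linear curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

For example, labels `[1, 0, 1, 0]` with scores `[0.9, 0.8, 0.7, 0.1]` give an AUC of 0.75, while a classifier that ranks every positive above every negative gives 1.0 and the diagonal corresponds to 0.5.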
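The two formulas translate directly into code. In the sketch below the confusion-matrix counts are obtained from true and predicted labels; the function names and the example counts are illustrative and are not values from Table 2.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels, 1 being the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """Fraction of all samples that are classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def sensitivity(tp, fn):
    """Fraction of positive samples that are detected (recall)."""
    return tp / (tp + fn)
```

With 8 true positives, 9 true negatives, 1 false positive and 2 false negatives, the accuracy is 17/20 = 0.85 and the sensitivity is 8/10 = 0.8.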

Conclusion
We designed an efficient hybrid classification technique to classify different types of abnormality data. The proposed A-DWT algorithm removes noise better than other filters, reducing the overall noise in the signal by about 20%. The filtered EEG signal is then equalized with the Heuristic-ICA algorithm. The proposed system was effective in separating different types of datasets. Shannon entropy and variance values are used to calculate the mutual information between different independent components. After equalization, the EEG signal is classified using the M-CNN algorithm. The proposed M-CNN algorithm reduces the computation cost by up to 30% during the classification of abnormalities. Apart from the reduced computation time, the M-CNN algorithm has a higher accuracy of about 98.1% and higher sensitivity for different abnormalities. Because the proposed method is sensitive to different types of abnormality, this technique can be used for different types of prediction from EEG signals. In future, this method needs to be implemented in BCI applications and its performance evaluated.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Figure 3. (a, b) Architecture of the proposed memory pooling layer. (a) Memory pooling layer architecture. (b) Update gate architecture.

Figure 6. Topoplot of EEG data at different time intervals after the Heuristic ICA algorithm.

Figure 7. Data quality before and after the proposed Heuristic ICA algorithm. (a) Data after heuristic-ICA equalization. (b) Data before heuristic-ICA equalization.

Figure 9. Entropy value of the proposed heuristic ICA algorithm.

Figure 11. Computation performance of the proposed M-CNN algorithm.

Table 1. Comparative analysis of proposed denoising techniques.

Table 2. Accuracy and sensitivity comparison between different algorithms.