Design of a hybrid model for cardiac arrhythmia classification based on Daubechies wavelet transform

Material and methods. The design phase of the classification model comprises the following stages: preprocessing of the cardiac signal by eliminating detail coefficients that contain noise, feature extraction through the Daubechies wavelet transform, and arrhythmia classification using a collaborative decision from a K-nearest neighbor (KNN) classifier and a support vector machine (SVM). The proposed model is able to classify 5 arrhythmia classes as per the ANSI/AAMI EC57:1998 classification standard. Level 1 of the proposed model involves classification using the KNN, and the classifier is trained with examples from all classes. Level 2 involves classification using an SVM trained specifically to classify overlapped classes. The final classification of a test heartbeat into a particular class is made using the proposed KNN/SVM hybrid model.


Introduction
Cardiovascular disease is a leading cause of global mortality. Hence, there is a need to develop automation strategies for the management of sudden cardiac death.1 The objective of this work is to automate cardiac arrhythmia classification. An abnormality in the normal rhythm of a heartbeat causes arrhythmia. The ANSI/AAMI EC57:1998 classification standard categorizes arrhythmias into 5 classes, namely: non-ectopic beat (N), supra-ventricular ectopic beat (S), ventricular ectopic beat (V), fusion beat (F), and unknown beat (Q). The diagnosis of a specific class of arrhythmia is done by careful monitoring of a long-term electrocardiograph (ECG) signal. Automation in ECG arrhythmia classification is essential in order to make a fast and accurate decision about the arrhythmia class.
The key requirements of an automated system are reduced complexity, fast decision making, and low memory use. Several research projects have been carried out on automated arrhythmia classification. In general, the algorithm used for automated classification includes (i) preprocessing, (ii) feature extraction, and (iii) feature classification. The preprocessing of recorded ECG signals is done in order to eliminate the main noises that degrade classifier performance, such as baseline wander, motion artifact, power line interference, and high-frequency noise.5-7,9,25 Commonly extracted ECG features include (i) temporal features of the heartbeat, such as the P-Q interval, the QRS interval, the S-T interval, the Q-R interval, the R-S interval, and the R-R interval between adjacent heartbeats; (ii) amplitude-based features, such as the P, Q, R, S, and T peak amplitudes; (iii) wavelet transform-based features that include Haar wavelets, Daubechies wavelets, and discrete Meyer wavelets at decomposition levels of 4, 6, and 8; and (iv) Stockwell transform-based features, including statistical features taken from the complex matrix of the Stockwell transform, the time-frequency contour, and the time-max amplitude contour.
Parameters such as accuracy, sensitivity, and specificity are used in the literature for evaluating the performance of a classifier.12-16,21,24-26 Most of the research works reported more than 90% average accuracy, average sensitivity, and average specificity taken over all 5 classes. However, the classifiers output very poor sensitivity when the sensitivity of individual classes is considered. The reason is that, in a medical scenario, the number of training examples for each class of ECG arrhythmia may not be uniform. Usually, the normal class of heartbeats dominates the entire population, which leads to classification biased towards the classes with more examples. Some of the common limitations in the literature are listed as follows:
1. Time interval features are used in many automated systems.2,5,7,8 Hidden information in the ECG signal cannot be completely recovered from those time domain features.
2. Few researchers have used the entire data set of the MIT-BIH arrhythmia database for experimentation. A random selection of only a few records from the entire database may not reflect the actual performance of the proposed system.2,5,7-9,23,24,28
3. A few research works did not follow a standard classification scheme, such as the ANSI/AAMI EC57:1998 standard.2,12,13,16,23,24
4. Classes with major and minor training examples are treated equally in almost all projects, which may lead to results biased towards the major classes. No special care is taken to overcome this issue.
This work eliminates the above limitations by extracting features from the time-frequency representation of the ECG signal through wavelet transform. The entire dataset of a benchmark database (the MIT-BIH arrhythmia database) is used and the proposed model adheres to the classification standard. The proposed model trains the classifiers in such a way that the minority class is better predicted using a hybrid approach.

Material and methods
The MIT-BIH arrhythmia database was used in this work. It contains 48 half-hour excerpts of 2-channel ambulatory ECG recordings obtained from 47 subjects studied by the BIH arrhythmia laboratory. The recordings were digitized at 360 samples per second per channel with 11-bit resolution over a 10-mV range. The reference annotations for each beat are included in the database. Four records containing paced beats (102, 104, 107, and 217) were removed from the analysis, as specified by the AAMI. The total number of heartbeats in each class is given in Table 1. Figure 1 shows the architecture of the proposed work. All experiments were carried out using Matlab R2012a (MathWorks, Natick, USA). The details of the methodology are summarized below.

Data preprocessing
The records contain continuous ECG recordings with a duration of 30 min. The raw ECG signals include baseline wander, motion artifact, and power line interference noise. The discrete wavelet transform (DWT) is used for denoising the ECG signal and for extracting the important features from the original ECG signal.22,27 The DWT captures both temporal and frequency information. The DWT of the original ECG signal is computed by successive high pass and low pass filtering of that signal. This can be mathematically represented as in equations (1) and (2):

y_high[k] = Σ_n x[n] g[2k − n]  (1)

y_low[k] = Σ_n x[n] h[2k − n]  (2)

where x[n] denotes the original ECG signal samples, g and h are the impulse responses of the high pass and low pass filters, respectively, and y_high[k] and y_low[k] are the outputs of the high pass and low pass filters after sub-sampling by 2. This procedure is repeated until the required decomposition level is reached. The low frequency component is called the approximation and the high frequency component is called the detail.
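The filtering-and-downsampling step of equations (1) and (2) can be sketched as follows. This is a minimal illustration using Haar filters rather than the db8 coefficients used in the paper, and the function name is ours:

```python
import numpy as np

def dwt_step(x, h, g):
    """One DWT level: filter with h (low pass) and g (high pass),
    then sub-sample the outputs by 2, as in equations (1) and (2)."""
    approx = np.convolve(x, h)[1::2]  # y_low[k]
    detail = np.convolve(x, g)[1::2]  # y_high[k]
    return approx, detail

# Haar filters as a stand-in for the Daubechies filters used in the paper
h = np.array([1.0, 1.0]) / np.sqrt(2)   # low pass
g = np.array([1.0, -1.0]) / np.sqrt(2)  # high pass
approx, detail = dwt_step(np.ones(8), h, g)
```

A constant signal has no high-frequency content, so its detail coefficients come out as zeros, while the approximation keeps the (rescaled) signal.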
In this work, the raw ECG signals sampled at 360 Hz were decomposed into approximation and detail sub-bands up to level 9 using the Daubechies ('db8') wavelet basis function.18 The first and second level detail coefficients were set to zero and were not used for reconstruction of the denoised signal, since most of the ECG information is contained within the 40-Hz frequency range, while the sub-bands at the first and second levels cover the frequency ranges 90-180 Hz and 45-90 Hz, respectively. Moreover, power line interference noise occurs at 50 Hz or 60 Hz. Baseline wander noise occurs at frequencies below 0.5 Hz, and therefore the level 9 approximation sub-band, covering the frequency range 0-0.351 Hz, was also not used for reconstruction. The denoised signal was obtained by applying the inverse DWT to the detail coefficients of levels 3-9, with the coefficients of detail sub-bands 1 and 2 and the approximation sub-band 9 set to 0.
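The denoising scheme described above can be sketched with the PyWavelets package (an assumption on our part; the original work used Matlab's wavelet functions):

```python
import numpy as np
import pywt  # PyWavelets, a common Python substitute for Matlab's wavelet toolbox

def denoise_ecg(x, wavelet='db8', level=9):
    """Decompose to level 9 with db8, zero the noisy sub-bands,
    and reconstruct, following the scheme described in the text."""
    coeffs = pywt.wavedec(x, wavelet, level=level)  # [cA9, cD9, ..., cD1]
    coeffs[0] = np.zeros_like(coeffs[0])    # approximation 9: baseline wander (<0.351 Hz)
    coeffs[-1] = np.zeros_like(coeffs[-1])  # detail 1: 90-180 Hz noise
    coeffs[-2] = np.zeros_like(coeffs[-2])  # detail 2: 45-90 Hz noise
    return pywt.waverec(coeffs, wavelet)

x = np.random.default_rng(0).standard_normal(8192)
denoised = denoise_ecg(x)
```

Because a constant (pure baseline) signal is carried entirely by the approximation sub-band, running it through this pipeline returns a near-zero signal, which is a quick sanity check of the sub-band zeroing.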
After denoising, the continuous ECG waveform was segmented into individual heartbeats. This segmentation is done by identifying the R peaks using the Pan-Tompkins algorithm and taking the 99 samples before and the 100 samples after each R peak.19 This choice of 200 samples, including the R peak, was made because it covers one cardiac cycle with the P, QRS, and T waves. Figure 2 shows a segment of a recorded ECG waveform of patient No. 123 before and after preprocessing.
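The windowing around each R peak can be sketched as follows (the R-peak positions are assumed to come from a Pan-Tompkins detector, which is not shown; the function name is ours):

```python
import numpy as np

def segment_beats(signal, r_peaks, pre=99, post=100):
    """Cut a 200-sample window (99 samples before the R peak, the R peak
    itself, and 100 samples after) around each detected R peak.
    Peaks too close to the record boundaries are skipped."""
    beats = []
    for r in r_peaks:
        if r - pre >= 0 and r + post < len(signal):
            beats.append(signal[r - pre : r + post + 1])
    return np.array(beats)

# toy signal where sample value equals sample index, so windows are easy to check
beats = segment_beats(np.arange(1000.0), [200, 500, 950])
```

The third "peak" at sample 950 is dropped because its window would run past the end of the record; each kept beat has its R peak at index 99 of the window.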

Feature extraction
The entire database (97,890 heartbeats) is divided into 10 sets, each containing 9,789 heartbeats. Nine sets are used for training (88,101 heartbeats) and 1 set for testing (9,789 heartbeats). From each heartbeat, wavelet-based features are extracted using the Daubechies wavelet ('db4'). A Daubechies wavelet with level 4 decomposition was selected in this project after performance comparisons with a discrete Meyer wavelet and other Daubechies wavelets, including 'db2' and 'db6'. A total of 107 features were produced by the 4th level approximation sub-band and another 107 features by the 4th level detail sub-band. Principal component analysis (PCA) was applied to remove redundant information from the extracted features and to reduce the dimensionality. After dimensionality reduction was applied separately to the approximation and detail sub-bands, a total of 12 features were obtained. The choice of 6 features from each sub-band was made since there is no significant improvement in classification when more than 6 features are used.

Training of classifiers
The training and testing matrices were computed, in which each row represents an ECG heartbeat and the features occupy the columns. The KNN (with distance metrics such as Euclidean, correlation, Mahalanobis, standardized Euclidean, and Spearman), tree, and discriminant (linear and quadratic) classifiers are trained with a training matrix 88,101 × 12 in size, which includes training examples from all 5 classes. The sensitivity, specificity, accuracy, positive predictivity, and F-score of those classifiers in classifying ECG arrhythmias were compared. The classifier that produced the best sensitivity and F-score was selected for level 1 of the proposed model. The radial basis function SVM was used at level 2 of the proposed model and was trained with examples from the entire class S and down-sampled examples from class N. Random down-sampling of class N is done in order to match the sample size of class S (2,646 × 12). The reason for this design is that samples from classes S and N overlap strongly, and many class S samples are wrongly predicted as class N at level 1 because of the large number of class N training examples (87,643 × 12). More weight is given to the decision from the SVM classifier when a test heartbeat is determined to be other than class S. The advantage of the SVM classifier is that it performs well on datasets with many attributes, even when few training examples are available. The drawback of SVM is its limited speed and scalability during both training and testing. Because of this limitation, SVM is not used for the training and classification of all classes; it is used only to make the final decision for the highly overlapped minority class. A description of the classifiers used is given in the following sections.
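The random down-sampling of class N to match the class S sample size can be sketched as follows (the function name and fixed seed are ours, for illustration):

```python
import numpy as np

def downsample_majority(x_major, n_minor, seed=0):
    """Randomly keep only as many majority-class (N) beats as there are
    minority-class (S) beats, so the SVM sees a balanced training set."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(x_major), size=n_minor, replace=False)
    return x_major[idx]

class_n = np.zeros((1000, 12))                  # stand-in for the 87,643 x 12 class N matrix
balanced_n = downsample_majority(class_n, 264)  # stand-in for the 2,646 class S beats
```

Sampling without replacement guarantees that no class N beat is duplicated in the balanced training set.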

K nearest neighbor classifier
KNN is a simple instance-based classification algorithm. For a training data set of N points and corresponding labels, given by {(x1, y1), (x2, y2), …, (xN, yN)}, where (xi, yi) represents data pair i with xi as the input feature vector and yi as its corresponding target class label, the most likely class of a test beat x is determined by finding the K training points closest to it. The prediction of the class is determined by majority vote, with the distance taken as the weighting factor for voting. The main advantage in selecting the KNN classifier is that complex tasks can be learned using simple procedures by local approximation. The training process for KNN consists only of storing the feature vectors and their corresponding labels. It also works well on classes with different characteristics for different subsets.20
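The distance-weighted vote described above can be sketched as follows (a minimal Euclidean-distance version; the paper also evaluated other metrics):

```python
import numpy as np

def knn_predict(train_x, train_y, x, k=4):
    """Distance-weighted KNN: find the k nearest training beats by
    Euclidean distance and vote, weighting each vote by 1/distance."""
    dists = np.linalg.norm(train_x - x, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = {}
    for i in nearest:
        weight = 1.0 / (dists[i] + 1e-12)  # small offset avoids division by zero
        votes[train_y[i]] = votes.get(train_y[i], 0.0) + weight
    return max(votes, key=votes.get)

train_x = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
train_y = ['N', 'N', 'V', 'V']
label = knn_predict(train_x, train_y, np.array([0.0, 0.5]), k=4)
```

With k equal to the full training set, the inverse-distance weights still let the two nearby 'N' beats outvote the two distant 'V' beats.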

Tree-based classifier
The decision tree algorithm works by selecting the best attribute to split the data and expanding the leaf nodes of the tree until the stopping criterion is met. After the tree is built, pruning is performed to reduce its size; this is done in order to avoid overfitting and to improve the generalization capability of the tree. The class of a test heartbeat is determined by following the branches of the tree until a leaf node is reached; the class of that leaf node is then assigned to the test heartbeat. The advantages of this algorithm are its simplicity and its good performance on larger data sets. Gini's diversity index is used as the split criterion in this work.
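Gini's diversity index, the split criterion named above, is simple to compute (a sketch; the function name is ours):

```python
def gini_index(labels):
    """Gini's diversity index: 1 minus the sum of squared class
    proportions. 0 for a pure node; larger values mean a more mixed node."""
    n = len(labels)
    props = [labels.count(c) / n for c in set(labels)]
    return 1.0 - sum(p * p for p in props)

pure = gini_index(['N'] * 10)              # a node holding only class N beats
mixed = gini_index(['N'] * 5 + ['V'] * 5)  # an evenly mixed two-class node
```

The tree chooses the attribute split that most reduces this index, i.e., the split producing the purest child nodes.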

Discriminant classifier
The algorithm creates a new variable from one or more linear combinations of the input variables. Linear discriminant analysis is done by calculating the sample mean of each class. The sample covariance is calculated by subtracting the sample mean of each class from the observations of that class and taking the empirical covariance matrix of the result. In the linear discriminant model, only the means vary between classes, while the covariance matrix remains the same. In quadratic discriminant analysis, both the mean and the covariance of each class vary.
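Under the shared-covariance assumption of the linear discriminant model, each class k gets a score that is linear in x, and the highest-scoring class is predicted. A sketch (function name and toy numbers are ours):

```python
import numpy as np

def lda_scores(x, means, cov_inv, priors):
    """Linear discriminant score for each class k:
    x' S^-1 mu_k - 0.5 mu_k' S^-1 mu_k + log(prior_k),
    where S is the covariance matrix shared by all classes."""
    scores = []
    for mu, p in zip(means, priors):
        scores.append(x @ cov_inv @ mu - 0.5 * mu @ cov_inv @ mu + np.log(p))
    return np.array(scores)

means = [np.array([0.0, 0.0]), np.array([4.0, 4.0])]  # two class means
cov_inv = np.eye(2)                                   # identity covariance for the sketch
scores = lda_scores(np.array([0.5, 0.2]), means, cov_inv, [0.5, 0.5])
```

The test point lies near the first class mean, so the first score dominates. In the quadratic variant, each class would use its own covariance matrix and the score would gain a quadratic term in x.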

Support vector machine
The support vector machine (SVM) constructs a hyperplane in such a way that the margin of separation between positive examples (minority class S) and negative examples (majority class N) is maximized. Since classes S and N overlap strongly, the classes are not linearly separable and a hyperplane cannot be constructed without classification error. For such overlapped patterns, the SVM performs a nonlinear mapping of the input vector into a high-dimensional feature space, and an optimal hyperplane is constructed to separate the mapped features. The hyperplane is constructed in such a way that it minimizes the probability of a classification error. For a training set X with N training examples, let {(xi, di)} be the ith training example, where xi is the input vector, di is its corresponding target output, αi is the ith Lagrange multiplier, K(x, xi) is the inner-product kernel, and b is the bias. The optimal separating hyperplane is then defined as in equation (3):

Σ_{i=1..N} αi di K(x, xi) + b = 0  (3)

A radial basis function SVM was used in this work, instead of a polynomial or two-layer perceptron kernel, because of its higher discrimination ability. The inner-product kernel K(x, xi) of a radial basis function with width σ is given by equation (4):

K(x, xi) = exp(−‖x − xi‖² / (2σ²))  (4)

The performance of the proposed model was evaluated using performance metrics such as sensitivity, specificity, positive predictivity, F-score, and accuracy. These metrics are computed from the true positive (TP), true negative (TN), false positive (FP), and false negative (FN) counts and are defined as follows: sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), positive predictivity = TP / (TP + FP), F-score = 2TP / (2TP + FP + FN), and accuracy = (TP + TN) / (TP + FP + FN + TN). The process is repeated 10 times so that each set is used once for testing. The overall performance of the classifier is computed by taking the average over all 10 folds.
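The radial basis function kernel of equation (4) and the evaluation metrics defined above translate directly into code (a sketch; the function names are ours):

```python
import math

def rbf_kernel(x, xi, sigma=1.0):
    """Radial basis function kernel of equation (4):
    exp(-||x - xi||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, xi))
    return math.exp(-sq_dist / (2 * sigma ** 2))

def metrics(tp, tn, fp, fn):
    """Performance metrics exactly as defined in the text."""
    return {
        'sensitivity': tp / (tp + fn),
        'specificity': tn / (tn + fp),
        'positive_predictivity': tp / (tp + fp),
        'f_score': 2 * tp / (2 * tp + fp + fn),
        'accuracy': (tp + tn) / (tp + fp + fn + tn),
    }
```

The kernel equals 1 when a test beat coincides with a training beat and decays toward 0 as they move apart, with σ controlling how quickly.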

Results and discussion
The reliability of a classifier in accurately predicting the class of a test heartbeat is measured mainly by the sensitivity and F-score. The reason for not considering accuracy is that even a poor classifier can show good accuracy by favoring a class with more training examples. It can be observed from Fig. 3 that the discriminant classifiers with linear and quadratic functions produce consistently lower sensitivity than the KNN and tree classifiers. The KNN with the Euclidean distance metric produces the highest sensitivity.
Figure 4 shows the specificity of all classifiers in each of the 10 folds. The discriminant classifier produces the lowest specificity, while the KNN classifier produces the highest. Figure 5 shows the F-score of all classifiers in all 10 folds. The discriminant classifier with a linear function produces the lowest F-score. The tree classifier and the quadratic discriminant classifier produce a nearly uniform F-score, while the KNN classifier achieves the highest F-score.
The KNN with the Euclidean distance metric achieves the highest accuracy compared to the other classifiers, as shown in Fig. 6.
Table 2 shows the average classification results of all classifiers at level 1. One can see from Table 2 that the KNN with the Euclidean distance metric and 4 neighbors produces better sensitivity, specificity, positive predictivity, F-score, and accuracy than the other 2 classifiers considered. Hence, KNN is used at level 1 of the proposed model. KNN with 3 neighbors also produces results comparable to KNN with 4 neighbors and, compared to KNN with 4 neighbors, has a greater discrimination capability for class S.
From the confusion matrix obtained from tenfold cross-validation using the different classifiers, it was found that a high number of class S heartbeats are misclassified as class N. The SVM at level 2 is therefore trained specifically to classify class S from class N. The classification result of KNN with the Euclidean distance metric (4 and 3 neighbors) is compared with the prediction of the SVM, and a test heartbeat is concluded to be class S if at least 2 of these classifiers predict it as class S. A sample confusion matrix of the proposed hybrid model is shown in Table 4.
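The "at least 2 of 3" rule for the hybrid decision can be sketched as follows. The fallback to the level-1 KNN label when the vote fails is our assumption, since the paper only specifies the condition for assigning class S:

```python
def hybrid_decision(knn4_pred, knn3_pred, svm_pred, minority='S'):
    """A test beat is labeled class S only when at least 2 of the 3
    classifiers (KNN with 4 neighbors, KNN with 3 neighbors, SVM) predict S;
    otherwise the level-1 KNN (4-neighbor) label is kept (assumed fallback)."""
    votes = [knn4_pred, knn3_pred, svm_pred].count(minority)
    return minority if votes >= 2 else knn4_pred
```

For example, a beat that the level-2 SVM and one KNN call S is assigned to class S even when the 4-neighbor KNN voted N.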

Conclusions
In this paper, a hybrid classification model is proposed which inherits the abilities of both SVM and KNN. Instead of using a single simple classifier such as KNN for predicting highly overlapped classes, this mixed model improves the sensitivity of the minority classes, which are dominated by the majority class. The SVM is specifically trained to classify the overlapped classes, while the low-complexity KNN classifier is trained to classify all 5 classes. Hence, the final decision on a test heartbeat is made using the classifiers at both levels of the hierarchy. The performance of this model is supported by experimental results on the entire MIT-BIH arrhythmia database. Future work will experiment with other combinations of classifiers.

Fig. 1. Architecture of the proposed work

Fig. 2. A segment of an ECG waveform before and after preprocessing. A − raw ECG signal; B − approximation sub-band level 9 and detail sub-band levels 1-4 (bottom left corner); C − detail sub-band levels 5-9 (top right corner); D − preprocessed ECG waveform with R peaks detected.

Table 1. Number of heartbeats in each class. N − non-ectopic beat; S − supra-ventricular ectopic beat; V − ventricular ectopic beat; F − fusion beat; Q − unknown beat.

Table 2. Average classification results of tenfold cross-validation for the classifiers at level 1

Table 5. Sensitivity and F-score of class S before and after using the proposed model. EUC 4 − K-nearest neighbour classifier with Euclidean distance and 4 neighbors.