Developing a hybrid algorithm to detect brain tumors from MRI images

Image processing technologies have been developed over the past two decades to help clinicians diagnose tumors from medical images. Computer-aided diagnosis (CAD) systems have proven their ability to increase clinicians' detection rate of positive cases by 10% and have become integrated with many medical imaging systems and technologies. This study aimed to develop a hybrid algorithm to help doctors detect brain tumors in magnetic resonance imaging (MRI) images. We reached a detection accuracy of 96.6% and designed a computer application that lets the user load an image and, if a tumor exists, identifies its location, along with many additional features. This approach can be improved by using different segmentation techniques, extracting additional features, or using other classifiers.


Background
Brain cancer is considered one of the most dangerous and most common types of cancer. Hence, research has focused on improving the quality of brain images acquired in a non-invasive manner, namely magnetic resonance imaging (MRI). MRI depends on stimulating the protons of water molecules in the human body with radio waves, to which the protons respond by emitting radio energy [1].
Along with the significant development in the medical image processing field, image processing algorithms have become an essential part of the MRI device software, from simple operations such as contrast control, edge detection, and gray-level transformations, to image segmentation, classification, and brain image diagnosis [2].
In the past couple of years, several critical studies have been conducted in this field. In 2019, a study entitled "Brain tumor classification using MRI images with K-nearest neighbor method" was published [3]. It detected brain tumors and classified them into three types using watershed segmentation and the K-nearest neighbor (KNN) classification algorithm, but the accuracy reached only 89%, which is not sufficient.
This was followed by a study in 2020 entitled "Detection and classification of brain tumor using support vector machine-based GUI" [4]. This study relied on the wavelet transform to extract features and used principal component analysis (PCA) to reduce their number. A graphical user interface (GUI) was also designed to display the processing results. One disadvantage of this study is that the accuracy was not calculated to determine its success; in addition, the designed GUI displays the values of the extracted features, which do not mean much to the user.
The same year, a study entitled "Semantic segmentation of brain tumor MRI images and SVM classification using GLCM features" was published [5]. This study also used watershed segmentation, extracted gray-level co-occurrence matrix (GLCM) features, and then compared the classification results of six support vector machine (SVM) classifiers; the highest accuracy, 93%, was achieved by both the linear SVM and the quadratic SVM. One disadvantage of this study is that it used only 36 images for training, which is not an adequate number; employing additional features, such as shape features, could also have raised the accuracy further.
The above papers represent the benchmark we attempted to outperform by applying a method based on several algorithms that achieved higher accuracy than all previous similar studies.
The primary motive for this research is the extremely large number of medical images, which consumes a lot of time and effort to diagnose, together with the clinician's occasional inability to identify all suspicious areas in an image and the lack of previous studies that reached a satisfactory result. We therefore designed an innovative hybrid algorithm, relying on a database of 150 cross-sectional MRI images of the brain.
We followed a methodology that consists of two main stages:
1. Classifying magnetic resonance images of the brain into images with or without a tumor and displaying the tumor if it exists. This stage consists of several steps: preprocessing and enhancement, segmentation, feature extraction, and classification in a hybrid manner based on the combined results of three classifiers. This method reached an accuracy of 96.6%.
2. Designing and programming a graphical user interface and a standalone application using MATLAB 2018a.
The importance of this research lies in the ability to diagnose a large number of images in a short period of time, thus reducing the burden on the doctor. It also gives more accurate diagnoses because of its ability to distinguish areas that may not be visible to the naked eye, and such programs may also be used to train medical students to diagnose images and identify suspicious areas.

Database
The dataset analyzed during the study is a standard dataset available on the internet through Kaggle (https://www.kaggle.com/navoneel/brain-mri-images-for-brain-tumor-detection) [6]. It is divided into two subfolders: the "YES" subfolder, which contains 93 brain images with tumors, and the "NO" subfolder, which contains 57 healthy brain images.

Work stages
The process begins with preprocessing and image enhancement, then cropping the tumor area, followed by feature extraction and classification, and ends with designing a standalone application to display the results. The flow chart in Fig. 1 shows the stages of work.

Preprocessing and enhancement
Images entered by the user may vary in resolution, clarity, size, and color. Consequently, before operating on an image, some improvements and adjustments are necessary to standardize image quality and obtain better results in the subsequent processing stages. First, we converted the image to grayscale and then resized it to 300 × 300 pixels to standardize the images in terms of resolution and processing time.
Then we noticed that the black background represents a large part of the image and does not contain any valuable information, which places a massive computational burden on the subsequent stages. Thus, we cropped the parts of the image outside the rectangle containing the beneficial information, using vertical and horizontal projections.
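The projection-based cropping can be sketched as follows. This is a minimal NumPy illustration, not the paper's MATLAB code; the intensity threshold of 10 is an assumed value for deciding which rows and columns still count as background.

```python
import numpy as np

def crop_to_content(img, thresh=10):
    """Crop away the black background using vertical and horizontal
    projections: keep the bounding rectangle of the rows/columns whose
    summed intensity exceeds a small threshold (assumed value)."""
    rows = np.where(img.sum(axis=1) > thresh)[0]
    cols = np.where(img.sum(axis=0) > thresh)[0]
    if rows.size == 0 or cols.size == 0:     # fully black image
        return img
    return img[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

# toy example: a bright rectangle on a black 300 x 300 background
img = np.zeros((300, 300), dtype=np.uint8)
img[80:200, 50:220] = 255
print(crop_to_content(img).shape)  # (120, 170)
```

Cropping before segmentation shrinks every later per-pixel operation to the informative region only.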
Afterward, we filtered the image using two types of filtering:
• the median filter, which acts as a smoothing filter to suppress noise in the image [7];
• unsharp masking, which acts as a sharpening filter to reinforce and clarify the details in the image [8].
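Both filters can be illustrated with a small NumPy sketch (the paper used MATLAB's built-in filters; the 3 × 3 window size and the sharpening amount below are assumptions):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def median_filter3(img):
    """3x3 median filter; edge pixels handled by reflection padding."""
    p = np.pad(img.astype(float), 1, mode='reflect')
    w = sliding_window_view(p, (3, 3))
    return np.median(w, axis=(2, 3))

def unsharp_mask(img, amount=1.0):
    """Unsharp masking: sharpened = img + amount * (img - blurred)."""
    p = np.pad(img.astype(float), 1, mode='reflect')
    blurred = sliding_window_view(p, (3, 3)).mean(axis=(2, 3))
    return np.clip(img + amount * (img - blurred), 0, 255)

# a single impulse-noise pixel is removed by the median filter
img = np.zeros((5, 5)); img[2, 2] = 255
print(median_filter3(img)[2, 2])  # 0.0
```

The median filter removes salt-and-pepper impulses without blurring edges, which is why it is applied before the detail-boosting unsharp mask.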

Segmentation
This stage aims to detect the region of interest, i.e., the tumor area, so that features can be extracted from it for later classification. We achieved this through two steps:
• Skull stripping: this step consists of four basic operations:
• Thresholding using the average gray-level value method [7, 8].
• Using the connected-components algorithm to keep only the largest component (the brain tissue) [8].
• Performing a closing operation on the image to fill all the holes [7, 8].
• Retrieving the original pixel values.
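The skull-stripping steps can be sketched as below. This is a simplified Python illustration under stated assumptions: the threshold is the image's mean gray level, connectivity is 4-connected, and the morphological closing step is omitted for brevity.

```python
import numpy as np
from collections import deque

def largest_component(mask):
    """Keep only the largest 4-connected component of a binary mask."""
    labels = np.zeros(mask.shape, dtype=int)
    best, best_size, cur = 0, 0, 0
    for i, j in zip(*np.nonzero(mask)):
        if labels[i, j]:
            continue
        cur += 1
        labels[i, j] = cur
        q, size = deque([(i, j)]), 0
        while q:                                  # breadth-first flood fill
            y, x = q.popleft(); size += 1
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if (0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]
                        and mask[ny, nx] and not labels[ny, nx]):
                    labels[ny, nx] = cur
                    q.append((ny, nx))
        if size > best_size:
            best, best_size = cur, size
    if best_size == 0:
        return np.zeros_like(mask, dtype=bool)
    return labels == best

def skull_strip(img):
    """Sketch of the described steps (closing omitted here)."""
    mask = img > img.mean()            # threshold at the average gray level
    mask = largest_component(mask)     # keep the brain tissue only
    return np.where(mask, img, 0)      # retrieve original pixel values

# toy image: a small 4-pixel blob and a larger 16-pixel blob
img = np.zeros((10, 10)); img[1:3, 1:3] = 200; img[5:9, 5:9] = 200
out = skull_strip(img)
print(out[6, 6], out[1, 1])  # 200.0 0.0
```

Only the larger blob survives, mirroring how the brain tissue (the largest bright component) is kept while smaller structures are discarded.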

• Segmenting the tumor area: this is one of the most complicated steps of the extraction process because the tumor area overlaps with the brain tissue. It also consists of four basic operations:
• Gamma transformation to increase the contrast between the tumor and the brain tissue [8].
• Thresholding using Otsu's method to automatically binarize the gamma-transformed image [9].
• Morphological operations to delete all the unwanted parts and fill the gaps in the image.
• Retrieving the original pixel values.
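The gamma transformation and Otsu thresholding can be sketched as follows (a NumPy illustration, not the paper's MATLAB code; the gamma value of 2.0 and the toy bimodal image are assumptions):

```python
import numpy as np

def gamma_transform(img, gamma=2.0):
    """Power-law transform; gamma > 1 darkens mid-tones, boosting the
    contrast between a bright tumor and the surrounding tissue."""
    return (255 * (img / 255.0) ** gamma).astype(np.uint8)

def otsu_threshold(img):
    """Otsu's method: pick the gray level that maximizes the
    between-class variance of the resulting two classes."""
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                    # class-0 probability
    mu = np.cumsum(prob * np.arange(256))      # cumulative mean
    mu_t = mu[-1]
    with np.errstate(divide='ignore', invalid='ignore'):
        sigma_b = (mu_t * omega - mu) ** 2 / (omega * (1 - omega))
    sigma_b[~np.isfinite(sigma_b)] = 0
    return int(np.argmax(sigma_b))

# toy bimodal image: 900 dark tissue pixels and 100 bright tumor pixels
img = np.concatenate([np.full(900, 40), np.full(100, 220)]).astype(np.uint8)
t = otsu_threshold(img)
print((img > t).sum())  # 100 — only the bright mode is foreground
```

Because Otsu's threshold is computed per image, the binarization adapts automatically to each scan's intensity distribution.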

Feature extraction
After the acquisition of the tumor area, different elements of the image must be represented as a set of features to use in classification and diagnosis.
Following are the features that we extracted:
• Texture features [10, 11]:
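The paper's exact feature list is not reproduced above; as an illustration only, a few first-order texture statistics commonly computed on a segmented tumor region can be sketched as follows (all feature choices here are assumptions, not the authors' list):

```python
import numpy as np

def texture_features(region):
    """First-order texture statistics of a segmented region; background
    (zero) pixels are excluded. Illustrative feature set, assumed."""
    r = region[region > 0].astype(float)
    hist = np.bincount(r.astype(int), minlength=256) / r.size
    p = hist[hist > 0]
    return {
        'mean': r.mean(),                      # average intensity
        'std': r.std(),                        # contrast measure
        'entropy': -(p * np.log2(p)).sum(),    # randomness of gray levels
        'smoothness': 1 - 1 / (1 + r.var()),   # 0 for a flat region
    }

# a perfectly uniform region has zero entropy and zero smoothness
print(texture_features(np.full((5, 5), 100)))
```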

Classification
At this stage, we used the features extracted in the previous step to classify the image as one with a tumor "Yes" or one without a tumor "No".
We trained three different classification algorithms: MG-SVM, Fine KNN, and Cosine KNN. What caught our attention is that the images one classifier misclassifies can differ from those another misclassifies; for instance, MG-SVM may misclassify an image that the other two classify correctly. Based on that idea, we devised a hybrid algorithm that relies on the three classifiers together: each image is classified by each classifier, and the final result follows the majority opinion, since it is unlikely that all or most of the classifiers will get the same image wrong. For example, if MG-SVM classifies an image as "NO" while Fine KNN and Cosine KNN classify it as "YES", the final result is more likely to be "YES".
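The majority-vote rule can be expressed in a few lines. The lambdas below are toy stand-ins for the three trained models (MG-SVM, Fine KNN, Cosine KNN), which in the actual work are trained on the extracted features:

```python
from collections import Counter

def hybrid_classify(features, classifiers):
    """Majority vote: each classifier casts a 'YES'/'NO' vote and the
    label with the most votes becomes the final decision."""
    votes = [clf(features) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]

# toy stand-ins for the three trained classifiers (assumed behavior)
mg_svm     = lambda x: 'NO'
fine_knn   = lambda x: 'YES'
cosine_knn = lambda x: 'YES'
print(hybrid_classify(None, [mg_svm, fine_knn, cosine_knn]))  # YES
```

With three voters the result is never tied, so the rule always yields a definite label.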

Standalone application
We designed a GUI using App Designer (a tool bundled with MATLAB 2018a) and programmed it using object-oriented programming. This interface enables the user to load an image and display the tumor if it exists, the result of each processing stage, the classification decision of each classifier, and the final classification result, in addition to other display features such as contrast and brightness control. We then converted this interface into a standalone application that can be transferred, installed, and used on any computer, even one without an installed version of MATLAB.

Evaluating the classification
After applying all stages, from enhancement and segmentation to feature extraction and classification, we had to test the accuracy of the hybrid classifier and compare it with the accuracy of each classifier separately. We achieved this by drawing a confusion matrix for each classifier, from which we derived several values reflecting the success and effectiveness of the classification. But first, some basic concepts need to be clarified:
• True Positive (TP): images of an injured brain that have been classified as injured.
• False Positive (FP): images of a healthy brain that have been classified as injured.
• True Negative (TN): images of a healthy brain that have been classified as healthy.
• False Negative (FN): images of an injured brain that have been classified as healthy [12].
Building on the aforementioned concepts, we clarify the most significant values deduced from the confusion matrix:
• Sensitivity: the classifier's ability to recognize diseased cases, calculated as the number of injured images classified as injured over the total number of injured images [12], i.e., Sensitivity = TP / (TP + FN).
• Specificity: the classifier's ability to recognize healthy cases, calculated as the number of healthy images classified as healthy over the total number of healthy images [12], i.e., Specificity = TN / (TN + FP).
• Positive Predictive Value (PPV): the number of injured images classified as injured over the total number of images classified as injured [13], i.e., PPV = TP / (TP + FP).
• Negative Predictive Value (NPV): the number of healthy images classified as healthy over the total number of images classified as healthy [13], i.e., NPV = TN / (TN + FN).
• Accuracy: the classifier's ability to classify correctly, calculated as the number of correctly classified images over the total number of images [14], i.e., Accuracy = (TP + TN) / (TP + TN + FP + FN).
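All five values follow directly from the four counts; a small Python sketch (the counts below are made-up toy numbers, not the paper's results):

```python
def confusion_metrics(tp, fp, tn, fn):
    """Derive the evaluation values from the confusion-matrix counts."""
    return {
        'sensitivity': tp / (tp + fn),
        'specificity': tn / (tn + fp),
        'ppv':         tp / (tp + fp),
        'npv':         tn / (tn + fn),
        'accuracy':    (tp + tn) / (tp + fp + tn + fn),
    }

# toy counts for illustration only
m = confusion_metrics(tp=50, fp=5, tn=40, fn=5)
print(m['accuracy'])  # 0.9
```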

Results
We will review the resulting images after applying each stage of the algorithm to an image from the database. In Fig. 2, we can see an original image from the database, and in Fig. 3, we may observe the result of the preprocessing and enhancement stage, which mainly includes eliminating the black background and filtering. Figure 4 illustrates the result of the skull-stripping process. It is worth mentioning that the importance of this step is due to the great similarity between the gray levels of the skull and the tumor, so cropping the skull pixels reduces the error. Figure 5 depicts the result of segmenting the tumor after thresholding using Otsu's method.
Figures 6 and 7 show the designed GUI. We may notice that the interface enables the user to display the result of each work stage and the classification result of each classifier separately, in addition to the final classification result. It also provides a set of additional features such as tumor encircling and brightness/contrast control.

Discussion
After completing all the work stages, it was necessary to evaluate the accuracy of the results we obtained. We therefore calculated the testing and training accuracy of each model separately, with the following results. The proposed hybrid model reached a testing accuracy of 96.7%.
We plotted the confusion matrix of each classifier to calculate the essential values needed to evaluate the results (Fig. 8, Table 1).
The following table demonstrates the values extracted from the confusion matrix for each classifier separately, along with the hybrid classifier.
We note from the table that the final accuracy of the hybrid classifier is higher than the accuracy of each classifier separately, which indicates the success of our innovative hybrid method and its clear and remarkable improvement in results.

Conclusions
Due to the large field of image processing, there is always room for development and improvement. Therefore, we will not stop at this point, but rather seek to develop and improve this work, whether by adopting better segmentation methods, extracting more features, or even training more classifiers such as Decision Trees and Logistic Regression, and comparing the results. The algorithm can also be developed to diagnose the type of tumor, according to the available datasets.
In the end, despite all the difficulties, we believe that we have been able to reach a good result and an innovative algorithm that has not been implemented before, hoping that it will be the beginning of better achievements and results in the future.