DOI: 10.1145/3328833.3328872
research-article

Feature Extraction and Analysis for Lung Nodule Classification using Random Forest

Published: 09 April 2019

Abstract

Early detection of lung nodules reduces the risk of lung cancer reaching advanced stages. Random forest (RF), a machine learning classifier, is used to detect lung nodules by classifying soft-tissue regions as nodules or non-nodules. A lung nodule classification approach is proposed to improve the early detection of nodules. A five-stage model was built and tested using 165 cases from the LIDC database. Stage 1 is image acquisition and preprocessing. Stage 2 extracts 119 features from the CT images. Stage 3 refines the feature vectors by removing all duplicate instances and undersampling the non-nodule class. Stage 4 tunes the RF parameters. Stage 5 examines different combinations of the extracted feature sets to select those that score best for classification. RF achieved the highest accuracy compared with other machine learning classifiers such as k-nearest neighbors (KNN), support vector machines (SVM), and decision trees (DT). The proposed method aims to analyze and select the features that maximize classification performance. The pixel-based and wavelet-based feature sets gave the highest accuracy. RF was tuned with 170 trees and an in-bag fraction of 0.007. The best results achieved by the proposed model were a sensitivity of 90.67%, a specificity of 90.8%, and an accuracy of 90.73%.
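The abstract names three concrete operations that are easy to misread: the Stage 3 clean-up (duplicate removal plus undersampling of the majority non-nodule class), the RF "in-bag fraction" (the share of the training set drawn, with replacement, to grow each tree), and the three reported rates. The standard-library Python sketch below illustrates them. It is not the authors' code: the function names `refine`, `in_bag_indices`, and `rates` are hypothetical, and it assumes that a "duplicate" is an identical feature vector with the same label, that undersampling is random down to a 1:1 class ratio, and that label 1 marks a nodule. The abstract does not say which toolkit was used or how the undersampling was performed.

```python
import random
from typing import List, Sequence, Tuple

# One labelled instance: (feature vector, label); label 1 = nodule, 0 = non-nodule (assumed encoding).
Instance = Tuple[Tuple[float, ...], int]


def refine(instances: Sequence[Instance], seed: int = 0) -> List[Instance]:
    """Stage 3 sketch: drop duplicate instances, then undersample non-nodules."""
    seen = set()
    unique: List[Instance] = []
    for features, label in instances:
        key = (tuple(features), label)
        if key not in seen:  # keep only the first copy of each instance
            seen.add(key)
            unique.append(key)
    nodules = [inst for inst in unique if inst[1] == 1]
    non_nodules = [inst for inst in unique if inst[1] == 0]
    # Random undersampling (assumed): keep as many non-nodules as nodules.
    rng = random.Random(seed)
    kept = rng.sample(non_nodules, min(len(nodules), len(non_nodules)))
    return nodules + kept


def in_bag_indices(n: int, fraction: float, rng: random.Random) -> List[int]:
    """Indices used to grow one tree: round(fraction * n) draws with replacement."""
    size = max(1, round(fraction * n))
    return [rng.randrange(n) for _ in range(size)]


def rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Return (sensitivity, specificity, accuracy), with label 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)  # share of true nodules detected
    specificity = tn / (tn + fp)  # share of non-nodules correctly rejected
    accuracy = (tp + tn) / len(pairs)
    return sensitivity, specificity, accuracy


if __name__ == "__main__":
    toy = [((0.1, 0.2), 1), ((0.1, 0.2), 1), ((0.5, 0.9), 1),
           ((0.3, 0.3), 0), ((0.7, 0.1), 0), ((0.2, 0.8), 0), ((0.3, 0.3), 0)]
    print(len(refine(toy)))  # 4: two unique nodules, two of three unique non-nodules
    print(len(in_bag_indices(10_000, 0.007, random.Random(1))))  # 70
    print(rates([1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
                [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]))  # sensitivity 0.75, specificity ~0.833, accuracy 0.8
```

With an in-bag fraction of 0.007, each tree is grown from only about 0.7% of the training instances (70 draws out of 10,000 in the toy call). The term matches the `InBagFraction` option of MATLAB's `TreeBagger`; in scikit-learn, `RandomForestClassifier(n_estimators=170, max_samples=0.007)` with the default `bootstrap=True` plays a similar role. Neither is confirmed as the toolkit the authors used.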

    Published In

    ACM Other conferences
    ICSIE '19: Proceedings of the 8th International Conference on Software and Information Engineering
    April 2019
    276 pages
    ISBN:9781450361057
    DOI:10.1145/3328833

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Classification
    2. Computed tomography
    3. Feature Extraction
    4. Lung Nodule
    5. Machine Learning
    6. Medical Images
    7. Random Forest
    8. Wavelet

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICSIE '19
