Early Lung Cancer Prediction Using Neural Network with Cross-validation


Shawni Dutta
Samir Kumar Bandyopadhyay


Lung cancer, also known as lung carcinoma, is a malignant tumor characterized by uncontrolled cell growth in lung tissue. It is generally caused by smoking and the use of tobacco products. It is classified into two broad categories: small-cell lung carcinoma and non-small-cell lung carcinoma. Treatments include surgery, radiation therapy, chemotherapy, and targeted therapy. Lung cancer is one of the most prominent causes of death worldwide. Early detection of the disease can help medical care units and physicians provide countermeasures to patients. The objective of this paper is to present an automated tool that takes influential causes of lung cancer as input and detects patients with a higher probability of being affected by the disease. A neural network classifier accompanied by k-fold cross-validation is proposed as the predictive tool. The proposed method is then compared with a baseline classifier, the Gradient Boosting Classifier, to justify its prediction performance. Experimental results show that analyzing the influential causes of lung cancer yields a disease classification model with an accuracy of 95%.

Lung cancer prediction, neural network, cross-validation, gradient boosting classifier, automated tool.
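
The k-fold cross-validation procedure named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the helper names (k_fold_indices, cross_validate), the toy data, and the majority-class stand-in classifier are all assumptions made for illustration. In practice the fit and predict callables would wrap the neural network or the gradient boosting baseline.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at i,
    # so fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def cross_validate(fit, predict, X, y, k=5, seed=0):
    """Return per-fold accuracies for a classifier given as fit/predict callables."""
    folds = k_fold_indices(len(X), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Train on every fold except the i-th, then test on the i-th.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = fit([X[j] for j in train_idx], [y[j] for j in train_idx])
        correct = sum(predict(model, X[j]) == y[j] for j in test_idx)
        scores.append(correct / len(test_idx))
    return scores


# Hypothetical stand-in classifier (an assumption, not the paper's model):
# it always predicts the most frequent label seen during training.
def fit_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)


def predict_majority(model, x):
    return model


if __name__ == "__main__":
    # Toy data, not the paper's data set: 20 samples, 14 of them labelled 1.
    X = [[i] for i in range(20)]
    y = [1] * 14 + [0] * 6
    scores = cross_validate(fit_majority, predict_majority, X, y, k=5)
    print("per-fold accuracy:", scores)
    print("mean accuracy:", mean(scores))
```

Reporting the mean of the per-fold scores, rather than a single train/test split, is what makes the accuracy estimate less dependent on one particular partition of the data.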

Article Details

How to Cite
Dutta, S., & Bandyopadhyay, S. K. (2020). Early Lung Cancer Prediction Using Neural Network with Cross-validation. Asian Journal of Research in Infectious Diseases, 4(4), 15-22. https://doi.org/10.9734/ajrid/2020/v4i430153
Original Research Article

