Biomarker Identification for Lung Cancer Using Deep Learning Approaches
Authors
Arlan Vincent John V. German
University of the Philippines Cebu
Demelo M. Lao
University of the Philippines Cebu
Abstract
Lung cancer is one of the most lethal diseases worldwide. Discovering discriminatory biomarkers for it is crucial to improving the precision of diagnosis and prognosis and to lowering the cost of early detection. This study sampled 12 series from the Gene Expression Omnibus (GEO) database containing human gene expression data related to lung cancer diagnosis, sub-typing, prognosis, and biomarker identification. The healthy control and lung cancer samples were used to train, fine-tune, and evaluate the proposed “DeepGene” model, which classified the samples with an AUC of 0.99, slightly above the top-performing baseline model (AUC = 0.98). By applying DeepSHAP feature importance to the DeepGene model, a novel biomarker discovery method was proposed, which identified 13 discriminatory biomarkers. Four of these are novel (CCDC141 and the Affymetrix probes 238891_at, AFFX-r2-Bs-dap-3_at, and AFFX-ThrX-5_at), while the remaining nine are validated by existing studies. These 13 biomarkers attained an AUC of 0.99 on the testing dataset. For future work, validation of the four novel biomarkers is recommended.
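To illustrate the general idea of attribution-based biomarker ranking and AUC evaluation, the following is a minimal sketch, not the study's implementation. It assumes that per-sample attribution values (such as those DeepSHAP produces) are already available as a samples-by-genes matrix. The function names, gene labels, and toy values are hypothetical, and the study's actual selection procedure may differ.

```python
import numpy as np


def rank_features_by_attribution(attributions, feature_names, top_k):
    """Rank features by mean absolute attribution across samples.

    attributions: array of shape (n_samples, n_features), e.g. SHAP values.
    Returns the top_k (name, importance) pairs, most important first.
    """
    importance = np.abs(np.asarray(attributions, dtype=float)).mean(axis=0)
    order = np.argsort(importance)[::-1][:top_k]
    return [(feature_names[i], float(importance[i])) for i in order]


def roc_auc(labels, scores):
    """AUC as the probability that a positive sample outscores a negative one.

    Ties between a positive and a negative score count as one half.
    """
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]  # every positive/negative pair
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (len(pos) * len(neg)))


# Toy data only (hypothetical): 3 samples, 3 placeholder genes.
toy_attributions = np.array([
    [0.5, -0.1, 0.0],
    [-0.7, 0.2, 0.1],
    [0.6, -0.3, 0.0],
])
toy_genes = ["gene_a", "gene_b", "gene_c"]

top_genes = rank_features_by_attribution(toy_attributions, toy_genes, top_k=2)
toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Averaging absolute values rather than signed values keeps a gene that pushes predictions strongly in both directions from cancelling itself out. This is a common convention for global importance and is used here as an assumption.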