Moumita Karmakar¹, Pei-Chun Lai¹, Samiran Sinha¹, Shannon Glaser² and Sanjukta Chakraborty²
¹ Department of Statistics, Texas A&M University, College Station, TX 77843, USA
² Department of Medical Physiology, Texas A&M Health Science Center, College of Medicine, Medical Research and Education Building, Bryan, TX 77807, USA
Correspondence to: Sanjukta Chakraborty, email: schakraborty@tamu.edu
Keywords: miRNA; supervised clustering; random forest; TCGA; head neck cancer
Received: January 11, 2021 Accepted: June 22, 2021 Published: July 20, 2021
ABSTRACT
Lymphovascular invasion (LVI) is an important prognostic indicator of lymph node metastasis and disease aggressiveness, but the molecular mechanisms mediating it in head and neck cancers (HNSC) remain undefined. To identify microRNAs (miRNAs) in HNSC that are associated with, and predictive of, increased risk of LVI, we analyzed miRNA expression profiles in the TCGA HNSC database using a combination of clustering algorithms, multiple regression analyses, and machine learning approaches. As a first step, we identified miRNAs associated with LVI, treated as a binary variable. To determine whether these miRNAs formed functional clusters that also indicate increased risk of LVI, we carried out both unsupervised and supervised clustering. Our results identified distinct clusters of miRNAs that are predictive of increased LVI. We further refined these findings using a random forest approach and identified miR-203a-3p, miR-10a-5p, and miR-194-5p as most strongly associated with LVI. Pathway enrichment analysis showed that these miRNAs target genes involved in Hippo signaling and fatty acid oxidation, pathways that mediate lymph node metastasis. We also found specific associations between the LVI-associated miRNAs and the expression of several lymphangiogenic genes, which could be critical for determining therapeutic strategies.
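The abstract does not specify the regression model behind the first screening step, in which each miRNA is tested for association with LVI coded as a binary variable. One common way to do such a screen is sketched below, assuming (our assumption, not necessarily the authors' pipeline) a separate univariate logistic regression of LVI status on each miRNA's expression, a Wald test on the slope, and Benjamini-Hochberg correction across miRNAs. The function names (`logistic_wald`, `benjamini_hochberg`, `screen_mirnas`) and the toy data, including the placeholder miRNAs "miR-A" and "miR-B", are hypothetical and for illustration only; the clustering and random forest steps are not shown.

```python
import math
from typing import Dict, List, Tuple


def _sigmoid(t: float) -> float:
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def _score_and_info(x: List[float], y: List[int], b0: float, b1: float):
    # Score (g0, g1) and observed-information entries (h00, h01, h11)
    # of the logistic log-likelihood evaluated at (b0, b1).
    g0 = g1 = h00 = h01 = h11 = 0.0
    for xi, yi in zip(x, y):
        p = _sigmoid(b0 + b1 * xi)
        w = p * (1.0 - p)
        g0 += yi - p
        g1 += (yi - p) * xi
        h00 += w
        h01 += w * xi
        h11 += w * xi * xi
    return g0, g1, h00, h01, h11


def logistic_wald(x: List[float], y: List[int], iters: int = 50) -> Tuple[float, float]:
    """Fit logit P(LVI=1) = b0 + b1*x by Newton-Raphson; return (b1, two-sided Wald p)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0, g1, h00, h01, h11 = _score_and_info(x, y, b0, b1)
        det = h00 * h11 - h01 * h01
        # Newton step: beta += I^{-1} g, using the closed-form 2x2 inverse.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < 1e-10:
            break
    # Var(b1) is the (1,1) entry of the inverse information at the final estimate.
    _, _, h00, h01, h11 = _score_and_info(x, y, b0, b1)
    se1 = math.sqrt(h00 / (h00 * h11 - h01 * h01))
    z = b1 / se1
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    return b1, math.erfc(abs(z) / math.sqrt(2.0))


def benjamini_hochberg(pvals: Dict[str, float]) -> Dict[str, float]:
    """Step-up Benjamini-Hochberg adjusted p-values, capped at 1."""
    m = len(pvals)
    ranked = sorted(pvals.items(), key=lambda kv: kv[1])
    adjusted: Dict[str, float] = {}
    running_min = 1.0
    for rank in range(m, 0, -1):
        name, p = ranked[rank - 1]
        running_min = min(running_min, p * m / rank)
        adjusted[name] = running_min
    return adjusted


def screen_mirnas(expr: Dict[str, List[float]], lvi: List[int]):
    """Return {miRNA: (slope, p, BH-adjusted p)} from a per-miRNA logistic screen."""
    fits = {name: logistic_wald(x, lvi) for name, x in expr.items()}
    q = benjamini_hochberg({name: p for name, (_, p) in fits.items()})
    return {name: (b1, p, q[name]) for name, (b1, p) in fits.items()}


# Toy, made-up data: 10 tumors with LVI coded 0/1 and two placeholder miRNAs.
lvi = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
expr = {
    "miR-A": [1.0, 2.0, 1.5, 3.0, 2.5, 3.5, 2.8, 4.0, 4.5, 5.0],
    "miR-B": [2.0, 3.0, 1.0, 4.0, 2.5, 3.0, 2.0, 1.0, 4.0, 2.5],
}
results = screen_mirnas(expr, lvi)
for name, (b1, p, q) in sorted(results.items(), key=lambda kv: kv[1][1]):
    print(f"{name}: log-odds per unit={b1:.2f}  p={p:.3g}  BH q={q:.3g}")
```

In this toy example "miR-A" tends to be higher in LVI-positive samples and receives a positive slope, while "miR-B" has identical value distributions in both groups and receives a slope of zero. A real TCGA analysis would typically add covariates (the "multiple regression" mentioned above) and check for separation, which this sketch does not handle.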