11 October 2019 (13:40): IEEE AP/MTT/EMC/ED Turkey Seminar Series (S.53)

Speaker: Asst. Prof. Ercüment Çiçek, Bilkent University

Topic: "SPADIS: An Algorithm for Selecting Predictive and Diverse SNPs in Genome-wide Association Studies"

Location: Middle East Technical University, Ankara, Turkey

Abstract: Complex traits often cannot be explained by individual variants. Therefore, the efficient selection of multiple loci that explain the phenotype is critical for understanding the genetic basis of these traits. Selecting multiple loci is a computationally challenging problem whose search space grows exponentially with the number of genomic variants. Many methods tackle this problem by focusing on coding regions to reduce its complexity. However, these approaches ignore the non-coding regions and introduce literature bias. As one alternative, regularized regression methods have been used; however, they do not allow the incorporation of background biological knowledge and suffer from long execution times. Currently, there is only one machine learning method in the literature that aims to select a large set of loci efficiently by incorporating biological background information: SConES. SConES selects a set of features guided by a SNP-SNP network and favors the selection of SNPs that are connected on the network. We argue that while the connectedness assumption is frequently used for functionally related features, it leads to the selection of redundant features when the goal is to explain a complex phenotype. In the current study, we hypothesize that selecting features on a SNP-SNP network that are diverse in terms of location corresponds to incorporating complementary terms and thus helps to explain the phenotype better. We present SPADIS, which implements this idea by maximizing a submodular set function with a greedy algorithm that guarantees a constant-factor approximation to the optimal solution. We compare SPADIS to the state-of-the-art method SConES on a dataset of Arabidopsis thaliana genotypes and continuous flowering time phenotypes.
We show that (i) SPADIS has better average phenotype prediction performance in 15 out of 17 phenotypes when the same number of SNPs is selected, and provides consistent and statistically significant improvements in regression performance on average across multiple networks and settings, (ii) it identifies more candidate genes, and (iii) it runs much faster than other methods. We also perform rigorous simulation experiments, compare SPADIS with off-the-shelf regression-based feature selection methods, and show that SPADIS outperforms its counterparts.
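The abstract says SPADIS maximizes a submodular set function greedily but does not give the objective itself. The sketch below illustrates only the general technique, under loudly stated assumptions: the objective is a hypothetical one made of nonnegative per-SNP relevance scores plus a weighted count of network nodes "covered" by the closed neighborhoods of the selected SNPs. Both terms are monotone submodular, so picking k SNPs by largest marginal gain carries the classic (1 - 1/e) guarantee for cardinality-constrained monotone submodular maximization. The function name, the `scores`/`neighbors`/`lam` parameters, and the objective are illustrative assumptions, not the SPADIS implementation.

```python
from typing import Dict, List, Optional, Set


def greedy_diverse_selection(
    scores: Dict[int, float],
    neighbors: Dict[int, Set[int]],
    k: int,
    lam: float = 1.0,
) -> List[int]:
    """Hypothetical objective (NOT the published SPADIS objective):
        f(S) = sum_{i in S} scores[i] + lam * |union of closed neighborhoods N[i], i in S|
    With nonnegative scores and lam >= 0, f is monotone submodular, so
    greedy selection gives a (1 - 1/e) approximation for |S| <= k.
    """
    selected: List[int] = []
    covered: Set[int] = set()  # network nodes already covered by selected SNPs
    candidates = set(scores)
    for _ in range(min(k, len(candidates))):
        best: Optional[int] = None
        best_gain = float("-inf")
        for v in sorted(candidates):  # sorted for deterministic tie-breaking
            region = neighbors.get(v, set()) | {v}
            # Marginal gain f(S + v) - f(S): own score plus newly covered nodes.
            gain = scores[v] + lam * len(region - covered)
            if gain > best_gain:
                best, best_gain = v, gain
        assert best is not None
        selected.append(best)
        covered |= neighbors.get(best, set()) | {best}
        candidates.remove(best)
    return selected
```

On a toy network where SNPs 0, 1 and 2 form a tightly connected clique and SNPs 3 and 4 form a separate pair, `lam=0` simply returns the two highest-scoring SNPs (0 and 1, both from the clique), while `lam=0.5` makes the second pick come from the other region (SNP 3), because SNP 1 no longer covers any new nodes. This mirrors, in a simplified form, the abstract's argument that favoring diverse rather than connected SNPs avoids redundant selections.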

Bio: Ercument Cicek earned his BS (2007) and MS (2009) degrees in Computer Science and Engineering from Sabanci University. He received his Ph.D. degree in Computer Science from Case Western Reserve University in 2013. During his Ph.D., he visited Cold Spring Harbor Laboratory in 2012 to work on gene discovery algorithms for Autism Spectrum Disorder. After graduation, he worked as a Lane Fellow in Computational Biology at Carnegie Mellon University until 2015. Since then, he has been an Asst. Prof. in the Computer Engineering Department of Bilkent University and an adjunct faculty member in the Computational Biology Department of Carnegie Mellon University. His research mainly focuses on designing machine learning algorithms for the analysis of large-scale biological data. He is the recipient of the Simons Foundation Autism Research Initiative (SFARI) Explorer Award, SFARI Pilot Award, TUBITAK Career Award, TUBA-GEBIP Award and Parlar Foundation Research Incentive Award.
