Exploratory Data Analysis for Gene Expression Data based on K-Formal Concept Analysis
Gene Expression Data (GED) analysis poses a major challenge to the scientific community, one that can be framed within the Knowledge Discovery in Databases (KDD) and Data Mining (DM) disciplines.
In this project, we develop a framework in which GED analysis is understood as an Exploratory Data Analysis (EDA) process: we support human interaction with the data in order to improve the step of hypothesis abduction and assessment. We focus on adapting the interpretation and visualization of the output of a Data Mining process to human cognition.
Within the framework of EDA, we introduce a set of interactive analysis tools based on K-Formal Concept Analysis (KFCA), an extension of Formal Concept Analysis. The tools include a new biclustering algorithm for GED analysis and visualization capabilities that support the exploration of GED. Exploration is backed by two quality indicators: the persistence of a bicluster, a measure we define ourselves, and the p-values computed as Gene Set Enrichment statistical confidence measures. The latter are facilitated by the ability of KFCA to index external databases such as Gene Ontology (GO).
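To make the two ingredients mentioned above more concrete, the following is a minimal sketch of how biclusters can be read off as formal concepts and how an enrichment p-value can be computed. It is not the project's algorithm. It assumes plain Formal Concept Analysis on an expression matrix binarized at a hypothetical threshold `phi`, enumerates concepts by brute force (only feasible for toy data), and uses a one-sided hypergeometric test as a stand-in for the enrichment statistic. All function names and the example matrix are illustrative assumptions.

```python
from itertools import combinations
from math import comb


def binarize(matrix, phi):
    """Assumption: gene g 'has' condition c when its expression is >= phi."""
    return [[value >= phi for value in row] for row in matrix]


def formal_concepts(context):
    """Enumerate all formal concepts (extent, intent) of a binary context.

    Brute force over every subset of conditions, so only usable on toy data.
    Each concept pairs a maximal set of genes (extent) with the maximal set
    of conditions they all share (intent), i.e. a bicluster of ones.
    """
    n_genes = len(context)
    n_conds = len(context[0])
    concepts = set()
    for size in range(n_conds + 1):
        for conds in combinations(range(n_conds), size):
            # Genes that have every condition in the candidate set.
            extent = frozenset(
                g for g in range(n_genes) if all(context[g][c] for c in conds)
            )
            # Conditions shared by every gene in that extent (closure).
            intent = frozenset(
                c for c in range(n_conds) if all(context[g][c] for g in extent)
            )
            concepts.add((extent, intent))
    return concepts


def enrichment_p_value(population, annotated, drawn, hits):
    """One-sided hypergeometric tail: P(X >= hits).

    population: total number of genes
    annotated:  genes carrying the annotation (e.g. one GO term)
    drawn:      genes in the bicluster
    hits:       annotated genes found in the bicluster
    """
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


# Toy matrix: rows are genes, columns are experimental conditions.
expression = [
    [2.1, 0.3, 1.8],
    [1.9, 0.2, 0.1],
    [0.4, 1.7, 2.2],
]
toy_concepts = formal_concepts(binarize(expression, phi=1.0))
```

Each pair in `toy_concepts` is a candidate bicluster: a set of gene indices together with the condition indices in which all of them exceed the threshold. Varying `phi` changes the binary context and therefore the set of concepts.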
The currently dominant paradigm of Confirmatory or Predictive Data Analysis is difficult to apply to GED for reasons intrinsic to the problem definition, mainly the lack of ground-truth data. In contrast, we believe that framing GED analysis in an EDA setting (possibly complemented with subsequent empirical verification of the findings) is a principled and relevant change of paradigm that eases the understanding of the process of scientific discovery.
In addition, a graphical interface to our interactive tool for analysis and decision, WebGeneKFCA, is made available as a web service. It allows researchers to analyse gene expression data with no previous knowledge of the experiment conditions and to interface with external gene ontologies.
- Francisco J. Valverde-Albacete, José María González-Calabozo, Anselmo Peñas, Carmen Peláez-Moreno, Supporting scientific knowledge discovery with extended, generalized Formal Concept Analysis, Expert Systems with Applications, Volume 44, February 2016, Pages 198-216, ISSN 0957-4174, http://dx.doi.org/10.1016/j.eswa.2015.09.022. http://www.sciencedirect.com/science/article/pii/S0957417415006442
Demos & Tools (or Downloads)
- WebGeneKFCA: An EDA (Exploratory Data Analysis) framework based on K-FCA specifically designed for Gene Expression Analysis including interfaces with GO and KEGG: https://webgenekfca.com/webgenekfca
- A more generic tool developed as an aid for scientific discovery is also available at: https://webgenekfca.com/general/changetype/kfca
Carmen Peláez-Moreno (carmen at tsc.uc3m.es)
Francisco J. Valverde-Albacete (fva at tsc.uc3m.es)