Saliency and Attention: MUltimodality, context-awaReness, self-Adaptation and bio-Inspiration
Funded by MINECO (Ministry of Economy and Competitiveness)
TEC2014-53390-P (Convocatoria 2014 de Proyectos de I+D del Programa Estatal de Fomento de la Investigación Científica y Técnica de Excelencia)
Jan. 2015 – Dec. 2017
Saliency, attention, multimodality, bioinspiration, deep learning, latent topics, exploratory analysis, cognitive computation
Our knowledge of the world is shaped by human perception. Our sensory and motor capabilities allow us to understand and interact with reality; cognition is the result of these interactions. Mimicking such brain functions is one of the most challenging scientific endeavours that technologists have embraced under the name of cognitive computation, which aims at building biologically inspired intelligent machines.
Saliency is a key cognitive mechanism that prioritizes particular stimuli over others: our brain makes decisions about what is relevant in every particular situation as it explores the world.
From a research perspective we identify the following key directions for advancing this technology:
- Multimodality: humans do not conceive the world through a single modality, yet most research results specialize in one particular modality and have a limited understanding of the others. Based on our experience, we propose an integration of the two main human modalities: aural and visual. This integration pivots around two main ideas: first, an information-theoretic perspective on evaluation, which improves the interpretability of the results and is expected to provide a helpful metric for the fusion; and second, an understanding of the role of time.
- Bio-inspiration: deep learning algorithms have had a profound impact on a large number of computational tasks and have also been employed to build models of visual saliency, mainly for fusing maps based on different features. However, to the best of our knowledge, their application to aural saliency has not been explored. Mathematical morphology has also proven to be an advantageous tool for mimicking psychoacoustical properties of the human auditory system.
- Context-awareness and self-adaptation: in contrast with the abundant literature on visual bottom-up saliency (that based on low-level features or stimuli), the modeling of top-down visual attention remains an open problem, since its solutions are, in general, task-dependent. We aim to integrate bottom-up and top-down models by adopting a general framework in which user goals and their relationship with low-level stimuli can be learnt and adapted to a particular context (task, individual, environment, etc.). Our proposed contribution in this area is the capability of acquiring knowledge through the discovery of latent classes, topics, tasks or events, together with the adoption of exploratory analysis guided by experts.
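As a toy illustration of the morphological filtering mentioned above (a hypothetical sketch, not the project's actual pipeline), a grayscale opening with a flat structuring element can suppress narrow peaks in a time-frequency envelope, which is the kind of psychoacoustically motivated smoothing mathematical morphology enables:

```python
# Hypothetical 1-D grayscale morphology sketch: erosion and dilation with a
# flat structuring element, combined into an opening that removes peaks
# narrower than the element -- e.g. impulsive bursts in a spectral envelope.

def erode(signal, width):
    """Erosion with a flat element: minimum over a sliding window."""
    half = width // 2
    n = len(signal)
    return [min(signal[max(0, i - half):min(n, i + half + 1)])
            for i in range(n)]

def dilate(signal, width):
    """Dilation with a flat element: maximum over a sliding window."""
    half = width // 2
    n = len(signal)
    return [max(signal[max(0, i - half):min(n, i + half + 1)])
            for i in range(n)]

def opening(signal, width):
    """Opening = dilation of the erosion; suppresses narrow positive peaks."""
    return dilate(erode(signal, width), width)

# A flat envelope with one isolated spike (e.g. an impulsive noise burst):
env = [1.0] * 10
env[5] = 9.0
print(opening(env, 3))  # the spike is removed: all values return to 1.0
```

Because the opening only removes structures narrower than the element, broad spectro-temporal patterns survive while short-lived artifacts are filtered out.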
From a methodological point of view, we adopt an end-user perspective, since knowledge of the perceptual relevance of audio-visual items can be applied to several problems, e.g. object recognition, action classification or event detection. This not only involves developing algorithms that incorporate saliency, but also changing the evaluation protocol, moving from the traditional assessment of the alignment between saliency maps and human fixations to a more meaningful one.
Under this conceptual framework, the purpose of this project is twofold: first, to advance the technology in each of the three previous directions; and second, to develop a set of multi-purpose computational tools ready to be assembled into different applications such as event detection, object recognition, video annotation and indexing, personalized information retrieval, recommender systems, bio-imaging-based diagnosis, healthcare, etc.
Figure 1. Conceptual Axes of the project
Demos & Tools
- WebKFCA: an EDA (Exploratory Data Analysis) framework based on K-FCA, developed as an aid for scientific discovery. A more ad hoc tool, specifically designed for gene expression analysis, is also available at (url_WebGeneKFCA4GPM). https://webgenekfca.com/webgenekfca/general/changetype/webkfca
- The Entropy Triangle: new Weka implementations of the set of information-theoretic tools for the assessment of multi-class classifiers, including the Entropy Triangle, NIT and EMA. Visit url_ET4GPM for previous developments in Matlab and R. http://apastor.github.io/entropy-triangle-weka-package/
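The Entropy Triangle rests on an entropy-balance decomposition of a classifier's confusion matrix. The sketch below (illustrative names and API, not those of the Weka package) computes the three normalized coordinates, which always sum to one:

```python
# Entropy-balance decomposition behind the Entropy Triangle:
#   H_Ux + H_Uy = DeltaH + 2*MI + VI
# where H_Ux, H_Uy are the entropies of uniform marginals, DeltaH the
# divergence of the actual marginals from uniformity, MI the mutual
# information, and VI the variation of information (remaining confusion).
from math import log2

def entropy(probs):
    """Shannon entropy in bits of a probability vector."""
    return -sum(p * log2(p) for p in probs if p > 0)

def entropy_triangle(confusion):
    """Normalized (DeltaH, 2*MI, VI) coordinates of a confusion matrix."""
    total = sum(sum(row) for row in confusion)
    pxy = [[c / total for c in row] for row in confusion]
    px = [sum(row) for row in pxy]                # true-class marginal
    py = [sum(col) for col in zip(*pxy)]          # predicted-class marginal
    hx, hy = entropy(px), entropy(py)
    hxy = entropy([p for row in pxy for p in row])
    h_u = log2(len(px)) + log2(len(py))           # uniform joint entropy
    delta_h = h_u - hx - hy                       # distance from uniformity
    mi = hx + hy - hxy                            # mutual information
    vi = 2 * hxy - hx - hy                        # variation of information
    return delta_h / h_u, 2 * mi / h_u, vi / h_u  # coordinates sum to 1

# A perfect classifier on balanced two-class data transfers all information:
print(entropy_triangle([[50, 0], [0, 50]]))  # -> (0.0, 1.0, 0.0)
```

Plotting these three coordinates in a 2-simplex yields the triangle itself; classifiers near the 2*MI vertex genuinely transfer information, while those with large DeltaH may only be exploiting skewed class priors.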
- Francisco J. Valverde-Albacete, José María González-Calabozo, Anselmo Peñas, Carmen Peláez-Moreno, Supporting scientific knowledge discovery with extended, generalized Formal Concept Analysis, Expert Systems with Applications, Volume 44, February 2016, Pages 198-216, ISSN 0957-4174, http://dx.doi.org/10.1016/j.eswa.2015.09.022.
- F. de-la-Calle-Silos, F. J. Valverde-Albacete, A. Gallardo-Antolín and C. Peláez-Moreno, “Morphologically Filtered Power-Normalized Cochleograms as Robust, Biologically Inspired Features for ASR,” in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23, no. 11, pp. 2070-2080, Nov. 2015. doi: 10.1109/TASLP.2015.2464691.