SAMURAI

Saliency and Attention: MUltimodality, context-awaReness, self-Adaptation and bio-Inspiration

Abstract:

Our knowledge of the world is shaped by human perception. Our sensory and motor capabilities allow us to understand and interact with reality. Cognition is the result of these interactions. Mimicking such brain functions is one of the most challenging scientific endeavours that technologists have embraced, under the name of cognitive computation, with the aim of building biologically inspired intelligent machines.

Saliency is a key cognitive mechanism that prioritizes particular stimuli over others: while exploring the world, our brain decides what is relevant in each particular situation.

From a research perspective we identify the following key directions for advancing this technology:

  1. Multimodality: humans cannot conceive of the world through a single modality, yet most research specializes in one modality and has a limited understanding of the others. Based on our experience, we propose integrating the two main human modalities: aural and visual. This integration pivots around two main ideas: first, adopting an information-theoretic perspective for evaluation, which improves the interpretability of the results and is expected to provide a helpful metric for the fusion; and second, understanding the role of time.
  2. Bio-inspiration: deep learning algorithms have had a profound impact on a large number of computational tasks and have also been employed to build models of visual saliency, mainly for fusing maps based on different features. However, to our knowledge, their application to aural saliency has not been explored. Mathematical morphology has also proven to be an advantageous tool for mimicking psychoacoustic properties of the human auditory system.
  3. Context-awareness and self-adaptation: in contrast with the abundant literature on visual bottom-up saliency (that based on low-level features or stimuli), the modeling of top-down visual attention remains an open problem, since its solutions are, in general, task-dependent. We aim to integrate bottom-up and top-down models by adopting a general framework in which user goals and their relationship with low-level stimuli can be learnt and adapted to a particular context (task, individual, environment, etc.). Our proposed contribution in this area combines the capability of acquiring knowledge through the discovery of latent classes, topics, tasks or events with exploratory analysis guided by experts.

From a methodological point of view, we adopt an end-user perspective, since knowledge of the perceptual relevance of audio-visual items can be applied to several problems, e.g. object recognition, action classification or event detection. This involves not only developing algorithms that incorporate saliency but also changing the evaluation protocol, moving from the traditional evaluation, which assesses the alignment of saliency maps with human fixations, to a more meaningful one.
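As an illustration of the traditional fixation-alignment evaluation mentioned above, the following minimal sketch computes one commonly used score of that kind, the Normalized Scanpath Saliency (NSS): the saliency map is z-scored and then averaged at the fixated pixels. The function name, the toy map and the fixation coordinates are assumptions made for this example only; they are not taken from the project's tools or data.

```python
from statistics import mean, pstdev


def normalized_scanpath_saliency(saliency_map, fixations):
    """Mean z-scored saliency at fixated pixels (higher means better alignment).

    saliency_map: list of rows of floats (hypothetical toy input).
    fixations: list of (row, col) pixel coordinates of human fixations.
    """
    values = [v for row in saliency_map for v in row]
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("saliency map is constant; NSS is undefined")
    return mean((saliency_map[r][c] - mu) / sigma for r, c in fixations)


# Toy 3x3 map with a single bright centre pixel (invented values).
toy_map = [
    [0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0],
]

# A fixation on the peak yields a positive score, one off the peak a negative score.
print(normalized_scanpath_saliency(toy_map, [(1, 1)]))
print(normalized_scanpath_saliency(toy_map, [(0, 0)]))
```

A score of this kind only measures how well a map predicts where people look; it says nothing about whether the map helps a downstream task such as object recognition or event detection, which is the motivation for the task-oriented evaluation described above.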

Under this conceptual framework, the purpose of this project is twofold: first, to advance the technology in each of the three directions above; and second, to develop a set of multi-purpose computational tools ready to be assembled into different applications such as event detection, object recognition, video annotation and indexing, personalized information retrieval or recommender systems, bio-imaging based diagnosis, healthcare, etc.

Figure 1. Conceptual Axes of the project

Keywords:

Saliency, attention, multimodality, bio-inspiration, deep learning, latent topics, exploratory analysis, cognitive computation

Publications

  • [DOI] M. Á. Fernández-Torres, I. González-Díaz, and F. Díaz-de-María, “A probabilistic topic approach for context-aware visual attention modeling,” in 2016 14th International Workshop on Content-Based Multimedia Indexing (CBMI), 2016, pp. 1-6.
    [Bibtex]
    @inproceedings{fer16,
    author={M. Á. Fernández-Torres and I. González-Díaz and F. Díaz-de-María},
    booktitle={2016 14th International Workshop on Content-Based Multimedia Indexing (CBMI)},
    title={A probabilistic topic approach for context-aware visual attention modeling},
    year={2016},
    volume={},
    number={},
    pages={1-6},
    keywords={image fusion;probability;ubiquitous computing;video signal processing;complex visual processes;context-aware visual attention modeling;generic fusion schemes;image frames;probabilistic topic approach;video category;video frames;Adaptation models;Computational modeling;Context modeling;Feature extraction;Image color analysis;Probabilistic logic;Visualization},
    doi={10.1109/CBMI.2016.7500272},
    ISSN={},
    month={June},
    key = {samurai}}
  • A. Zlotnik, J. M. Montero Martínez, R. San Segundo Hernández, and A. Gallardo-Antolín, “Random forest-based prediction of Parkinson’s disease progression using acoustic, ASR and intelligibility features,” in 16th Annual Conference of the International Speech Communication Association (INTERSPEECH 2015), 2015, pp. 503-507.
    [Bibtex]
    @inproceedings{zlo15,
           booktitle = {16th Annual Conference of the International Speech Communication Association (INTERSPEECH 2015)},
               title = {Random forest-based prediction of Parkinson's disease progression using acoustic, ASR and intelligibility features},
              author = {Alexander Zlotnik and Juan Manuel {Montero Mart{\'i}nez} and Rub{\'e}n {San Segundo Hern{\'a}ndez} and Ascensi{\'o}n Gallardo-Antol{\'i}n},
                year = {2015},
               pages = {503--507},
            keywords = {Random forest, regression, Parkinson's disease, ASR features, intelligibility},
                 url = {http://oa.upm.es/42002/},
            abstract = {The Interspeech ComParE 2015 PC Sub-Challenge consists of automatically determining the degree of Parkinson's condition using exclusively the patient's voice. In this paper, we face this problem as a regression task and in order to succeed, we propose the use of an ensemble learning method, Random Forest (RF), in combination with features of different nature: acoustic characteristics, features derived from the output of an Automatic Speech Recognition system (ASR) and non-intrusive intelligibility measures. The system outperforms the baseline results achieving a relative improvement higher than 19\% in the development set.},
            key ={samurai}
    }
  • [DOI] J. Ludeña-Choez and A. Gallardo-Antolín, “Acoustic Event Classification using spectral band selection and Non-Negative Matrix Factorization-based features,” Expert Systems with Applications, vol. 46, pp. 77-86, 2016.
    [Bibtex]
    @article{lud16,
    title = "Acoustic Event Classification using spectral band selection and Non-Negative Matrix Factorization-based features",
    journal = "Expert Systems with Applications",
    volume = "46",
    pages = "77--86",
    year = "2016",
    issn = "0957-4174",
    doi = "10.1016/j.eswa.2015.10.018",
    url = "http://www.sciencedirect.com/science/article/pii/S0957417415007137",
    author = "Jimmy Ludeña-Choez and Ascensión Gallardo-Antolín",
    keywords = "Acoustic Event Classification, Feature extraction, Temporal feature integration, Feature selection, Mutual information, Non-Negative Matrix Factorization",
    key = {samurai}
    }
  • [DOI] A. Jiménez-Moreno, E. Martínez-Enríquez, and F. Díaz-de-María, “Complexity Control Based on a Fast Coding Unit Decision Method in the HEVC Video Coding Standard,” IEEE Transactions on Multimedia, vol. 18, iss. 4, pp. 563-575, 2016.
    [Bibtex]
    @article{jim16,
    author={A. Jiménez-Moreno and E. Martínez-Enríquez and F. Díaz-de-María},
    journal={IEEE Transactions on Multimedia},
    title={Complexity Control Based on a Fast Coding Unit Decision Method in the HEVC Video Coding Standard},
    year={2016},
    volume={18},
    number={4},
    pages={563-575},
    keywords={computational complexity;data structures;video coding;CC algorithm;HEVC video coding standard;coding complexity;coding tool;computational complexity control algorithm;encoding configuration;encoding time;fast coding unit decision method;flexible data representation;hierarchical approach;prediction unit;target complexity reduction method;transform unit;video content;Complexity theory;Encoding;Proposals;Standards;Streaming media;Complexity control;Complexity control (CC);HEVC;fast coding unit decision;high efficiency video coding (HEVC);on the fly estimation},
    doi={10.1109/TMM.2016.2524995},
    ISSN={1520-9210},
    month={April},
    key = {samurai}}
  • F. de-la-Calle-Silos, F. J. Valverde-Albacete, A. Gallardo-Antolín, and C. Peláez-Moreno, “Preliminary experiments on the robustness of biologically motivated features for DNN-based ASR,” in 2015 4th International Work Conference on Bioinspired Intelligence, 2015, pp. 169-175.
    [Bibtex]
    @inproceedings{RN415,
       author = {de-la-Calle-Silos, F. and Valverde-Albacete, Francisco J. and Gallardo-Antolín, A. and Peláez-Moreno, C.},
       title = {Preliminary experiments on the robustness of biologically motivated features for {DNN-based ASR}},
       booktitle = {2015 4th International Work Conference on Bioinspired Intelligence},
       pages = {169-175},
       ISBN = {978-1-4673-7846-8},
       url = {<Go to ISI>://WOS:000380501500026},
       year = {2015},
       key = {samurai}
    }
  • [DOI] F. de-la-Calle-Silos, F. J. Valverde-Albacete, A. Gallardo-Antolín, and C. Peláez-Moreno, “Morphologically Filtered Power-Normalized Cochleograms as Robust, Biologically Inspired Features for ASR,” IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), vol. 23, iss. 11, pp. 2070-2080, 2015.
    [Bibtex]
    @article{cal:val:gal:pel:15,
        Author = {de-la-Calle-Silos, Fernando and Valverde-Albacete, Francisco J and Gallardo-Antol{\'i}n, Ascensi{\'o}n and Pel{\'a}ez-Moreno, Carmen},
       title = {Morphologically Filtered Power-Normalized Cochleograms as Robust, Biologically Inspired Features for {ASR}},
      Journal = {{IEEE/ACM} Transactions on Audio, Speech and Language Processing {(TASLP)}},
       volume = {23},
       number = {11},
       pages = {2070-2080},
       ISSN = {2329-9290},
       DOI = {10.1109/taslp.2015.2464691},
       url = {<Go to ISI>://WOS:000360835000031},
       year = {2015},
       key = {samurai},
       type = {Journal Article}
    }
  • [DOI] V. Buso, I. González-Díaz, and J. Benois-Pineau, “Goal-oriented top-down probabilistic visual attention model for recognition of manipulated objects in egocentric videos,” Signal Processing-Image Communication, vol. 39, pp. 418-431, 2015.
    [Bibtex]
    @article{RN410,
       author = {Buso, Vincent and González-Díaz, Ivan and Benois-Pineau, Jenny},
       title = {Goal-oriented top-down probabilistic visual attention model for recognition of manipulated objects in egocentric videos},
       journal = {Signal Processing-Image Communication},
       volume = {39},
       pages = {418-431},
       ISSN = {0923-5965},
       DOI = {10.1016/j.image.2015.05.006},
       url = {<Go to ISI>://WOS:000367412800009},
       year = {2015},
       type = {Journal Article},
       key = {samurai}
    }
  • V. Buso, I. González-Díaz, and J. Benois-Pineau, “Object recognition with top-down visual attention modeling for behavioral studies,” in 2015 IEEE International Conference on Image Processing, 2015, pp. 4431-4435.
    [Bibtex]
    @inproceedings{RN416,
       author = {Buso, Vincent and González-Díaz, Ivan and Benois-Pineau, Jenny},
       title = {Object recognition with top-down visual attention modeling for behavioral studies},
       booktitle = {2015 {IEEE} International Conference on Image Processing},
       series = {IEEE International Conference on Image Processing ICIP},
       pages = {4431-4435},
       ISBN = {978-1-4799-8339-1},
       url = {<Go to ISI>://WOS:000371977804114},
       year = {2015},
       key = {samurai}
    }
  • [DOI] M. de-Frutos-Lopez, J. L. González-de-Suso, S. Sanz-Rodriguez, C. Peláez-Moreno, and F. Díaz-de-María, “Two-level sliding-window VBR control algorithm for video on demand streaming,” Signal Processing-Image Communication, vol. 36, pp. 1-13, 2015.
    [Bibtex]
    @article{RN412,
       author = {de-Frutos-Lopez, Manuel and González-de-Suso, José Luis and Sanz-Rodriguez, Sergio and Peláez-Moreno, Carmen and Díaz-de-María, Fernando},
       title = {Two-level sliding-window VBR control algorithm for video on demand streaming},
       journal = {Signal Processing-Image Communication},
       volume = {36},
       pages = {1-13},
       ISSN = {0923-5965},
       DOI = {10.1016/j.image.2015.05.004},
       url = {<Go to ISI>://WOS:000360874700001},
       year = {2015},
       type = {Journal Article},
       key = {samurai}
    }
  • [DOI] F. J. Valverde-Albacete, J. M. González-Calabozo, A. Peñas, and C. Peláez-Moreno, “Supporting scientific knowledge discovery with extended, generalized Formal Concept Analysis,” Expert Systems with Applications, vol. 44, pp. 198-216, 2016.
    [Bibtex]
    @article{val:gon:pen:pel:15old,
      Author = {Francisco J. Valverde-Albacete and J. M. Gonz{\'a}lez-Calabozo and A. Pe{\~n}as and Carmen Pel{\'a}ez-Moreno},
       title = {Supporting scientific knowledge discovery with extended, generalized Formal Concept Analysis},
       journal = {Expert Systems with Applications},
       volume = {44},
       pages = {198-216},
       ISSN = {0957-4174},
       DOI = {10.1016/j.eswa.2015.09.022},
       url = {<Go to ISI>://WOS:000365051500019},
       year = {2016},
       type = {Journal Article}
    }
  • [DOI] F. J. Valverde-Albacete and C. Peláez-Moreno, “The Linear Algebra in Formal Concept Analysis over Idempotent Semifields,” in Formal Concept Analysis, J. Baixeries, C. Sacarea, and M. Ojeda-Aciego, Eds., 2015, vol. 9113, pp. 97-113.
    [Bibtex]
    @inbook{RN417,
       author = {Valverde-Albacete, Francisco J. and Peláez-Moreno, Carmen},
       title = {The Linear Algebra in Formal Concept Analysis over Idempotent Semifields},
       booktitle = {Formal Concept Analysis},
       editor = {Baixeries, J. and Sacarea, C. and Ojeda-Aciego, M.},
       series = {Lecture Notes in Artificial Intelligence},
       volume = {9113},
       pages = {97-113},
       ISBN = {978-3-319-19545-2; 978-3-319-19544-5},
       DOI = {10.1007/978-3-319-19545-2_6},
       url = {<Go to ISI>://WOS:000364534600006},
       year = {2015},
       type = {Book Section},
       key = {samurai}
    }

 

Funded by:

MINECO (Ministry of Economy and Competitiveness)

TEC2014-53390-P (2014 Call for R&D Projects of the State Programme for the Promotion of Scientific and Technical Research of Excellence)

Jan. 2015 – Dec. 2017

Contact Persons
Ascensión Gallardo-Antolín (gallardo at tsc.uc3m.es)
Carmen Peláez-Moreno (carmen at tsc.uc3m.es)
