Context-aware automatic speech recognition under cognitive stress aided by multimodal biometric detection (TD-10)
Funded by Airbus Group Defence and Space and CDTI (Center for Industrial Technological Development) project IDI-20141068
As part of the macro projects SAVIER (Situational Awareness Virtual EnviRonment) and SAVIERX2 (Situational Awareness Virtual EnviRonment Demonstrator)
Jul. 2013 – Jun. 2017
It is now fairly well understood how speech articulation and recognition change under cognitive stress, such as when HMI (Human Machine Interfacing) takes place while the user performs a cognitively demanding task, a situation very likely to arise in a GCS (Ground Control Station). The cognitive sciences have provided a solid background for these phenomena. However, ASR (Automatic Speech Recognition) systems capable of adapting to such situations are still lacking, for several reasons: first, the absence of robust multimodal cognitive stress detection methods; second, the need to develop computational models of these phenomena; and third, the need for flexible acoustic models with enough plasticity to fully capture this variability. We believe that closely modeling the phenomena described by cognitive scientists, detecting their acoustic cues, and employing advanced acoustic models can together lead to a stress-aware, adaptable ASR system suitable for a GCS.
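As an illustration of the adaptation idea sketched above, the fragment below shows one minimal, hypothetical way a multimodal stress estimate could drive acoustic-model adaptation: per-utterance stress scores from a biometric channel and an acoustic channel are fused, and the fused level interpolates the log-likelihoods of a neutral and a stress-adapted acoustic model. All function names, weights, and the interpolation scheme are illustrative assumptions, not the project's actual method.

```python
# Hypothetical sketch of stress-aware acoustic-model adaptation.
# Assumes two pre-trained acoustic models (neutral vs. stressed speech)
# whose log-likelihoods are blended according to the detected stress level.

def fuse_stress_scores(biometric: float, acoustic: float, w: float = 0.5) -> float:
    """Late fusion of two per-utterance stress scores, each in [0, 1]."""
    return w * biometric + (1.0 - w) * acoustic

def interpolated_loglik(ll_neutral: float, ll_stressed: float, stress: float) -> float:
    """Linear interpolation of acoustic log-likelihoods by stress level."""
    return (1.0 - stress) * ll_neutral + stress * ll_stressed

# Example: a heart-rate-based detector reports 0.8, a voice-based one 0.6.
stress = fuse_stress_scores(0.8, 0.6)            # fused stress level
score = interpolated_loglik(-12.0, -9.0, stress)  # blended acoustic score
```

In a real decoder the interpolation would typically happen per frame or per state rather than per utterance, but the same principle applies.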
SAVIER is a macro-project integrating 12 research lines to develop, in a unified way, human-machine interfacing technologies for unmanned aerial vehicle systems, including mission planning, situational awareness, operators’ stress management and decision support.
Automatic Speech Recognition, ASR, Unmanned Aerial Vehicles, UAV, Ground Control Stations, GCS, Human Machine Interfaces, HMI, cognition, cognitive stress, robustness, interfering speakers, multimodality
- F. de-la-Calle-Silos, F. J. Valverde-Albacete, A. Gallardo-Antolín, “ASR feature extraction with morphologically-filtered power-normalized cochleograms,” in Proceedings of Interspeech (15th International Conference on Speech Communication and Technology), pp. 2430-2434, 2014. http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7180326
- F. de-la-Calle-Silos, F. J. Valverde-Albacete, A. Gallardo-Antolín and C. Peláez-Moreno, “Morphologically Filtered Power-Normalized Cochleograms as Robust, Biologically Inspired Features for ASR,” in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23, no. 11, pp. 2070-2080, Nov. 2015. doi: 10.1109/TASLP.2015.2464691.
- F. de-la-Calle-Silos, A. Gallardo-Antolín, and C. Peláez-Moreno, “Deep max-out networks applied to noise-robust speech recognition,” in Advances in Speech and Language Technologies for Iberian Languages (J. Navarro Mesa, A. Ortega, A. Teixeira, E. Hernández Pérez, P. Quintana Morales, A. Ravelo García, I. Guerra Moreno, and D. Toledano, eds.), vol. 8854 of Lecture Notes in Computer Science, pp. 109-118, Springer International Publishing, 2014. http://link.springer.com/chapter/10.1007%2F978-3-319-13623-3_12
- F. de-la-Calle-Silos, F. J. Valverde-Albacete, A. Gallardo-Antolín, and C. Peláez-Moreno, “Preliminary experiments on the robustness of biologically motivated features for DNN-based ASR,” in Bioinspired Intelligence (IWOBI), 2015 4th International Work Conference on, pp. 169-176, June 2015. http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=7160162