Emotion Detection from Speech: A Comprehensive Approach Using Speech-to-Text Transcription and Ensemble Learning

Fatima M Inamdar; Sateesh Ambesange; Parikshit Mahalle; Nilesh P. Sable; Ritesh Bachhav; Chaitanya Ganjiwale; Shantanu Badmanji; Sarthak Agase

doi:10.13052/2794-7254.002

Published 2024-03-21

Fatima M Inamdar

Vishwakarma Institute of Information Technology, Pune, Maharashtra, India

Sateesh Ambesange

Pragyan Smartai Technology Llp, Bangalore, Karnataka, India

Parikshit Mahalle

Vishwakarma Institute of Information Technology, Pune, Maharashtra, India

Nilesh P. Sable

Vishwakarma Institute of Information Technology, Pune, Maharashtra, India

Ritesh Bachhav

Vishwakarma Institute of Information Technology, Pune, Maharashtra, India

Chaitanya Ganjiwale

Vishwakarma Institute of Information Technology, Pune, Maharashtra, India

Shantanu Badmanji

Vishwakarma Institute of Information Technology, Pune, Maharashtra, India

Sarthak Agase

Vishwakarma Institute of Information Technology, Pune, Maharashtra, India

Wireless World: Research and Trends Magazine

Keywords

Text-to-Emotion analysis
natural language processing
sentiment analysis
emotion recognition
computational linguistics
emotional insights
textual sentiment
emotional nuances
user sentiment
language and emotion

Abstract

This study investigates text-to-emotion analysis, using an interactive methodology to uncover subtle emotional cues in textual data. It examines the relationship between language and emotion independently of any particular frontend or backend technology. Through a broad but systematic examination, the research aims to improve our understanding of how textual information conveys emotional nuance. Because no specific libraries or backends are prescribed, the emphasis is on the general concepts and techniques of text-to-emotion analysis.

The findings show that meaningful emotional context can be derived from text, enabling applications in fields where analysing user sentiment is essential. The study contributes to research on emotional intelligence in computational linguistics and lays the groundwork for further development of text analysis techniques.
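The title refers to speech-to-text transcription followed by ensemble learning, while the abstract keeps the discussion technology-agnostic. A minimal sketch of how an ensemble might map an already-transcribed utterance to an emotion label is given here, assuming the speech-to-text step has run elsewhere. The three base classifiers (a toy keyword lexicon, multinomial Naive Bayes, and a cosine-similarity profile matcher), the hard majority-vote rule, the labels `joy`/`anger`/`sadness`, and the training sentences are all hypothetical illustrations, not the authors' method.

```python
# Illustrative sketch only, not the paper's method. Assumes speech has already been transcribed.
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy training transcripts; a real system would use a labelled corpus.
TRAIN = [
    ("i am so happy today, this is great", "joy"),
    ("what a wonderful surprise, i love it", "joy"),
    ("i am furious about the delay", "anger"),
    ("this is terrible and i hate waiting", "anger"),
    ("i feel lonely and sad tonight", "sadness"),
    ("i miss my old friends so much", "sadness"),
]

# Hypothetical toy emotion lexicon (illustrative only).
LEXICON = {
    "joy": {"happy", "glad", "great", "love", "wonderful"},
    "anger": {"angry", "furious", "hate", "annoyed", "terrible"},
    "sadness": {"sad", "miss", "lonely", "cry", "unhappy"},
}


def tokenize(text):
    """Lowercase the transcript and keep runs of letters/apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lexicon_vote(tokens):
    """Return the emotion with the most lexicon hits, or None if nothing matches."""
    scores = {label: sum(t in words for t in tokens) for label, words in LEXICON.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, tokens):
        total = sum(self.label_counts.values())
        known = [t for t in tokens if t in self.vocab]  # ignore words never seen in training
        best_label, best_score = None, -math.inf
        for label in sorted(self.label_counts):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.label_counts[label] / total)  # log prior
            score += sum(math.log((counts[t] + 1) / denom) for t in known)  # log likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


class ProfileCosine:
    """Pick the label whose summed bag-of-words profile is most cosine-similar."""

    def fit(self, docs, labels):
        self.profiles = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.profiles[label].update(tokenize(doc))
        return self

    @staticmethod
    def _cosine(a, b):
        dot = sum(v * b[k] for k, v in a.items())  # Counter returns 0 for missing keys
        norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
        return dot / norm if norm else 0.0

    def predict(self, tokens):
        query = Counter(tokens)
        return max(sorted(self.profiles), key=lambda label: self._cosine(query, self.profiles[label]))


def ensemble_predict(text, nb, profile):
    """Hard majority vote over the three base classifiers; ties go to Naive Bayes."""
    tokens = tokenize(text)
    votes = [nb.predict(tokens), profile.predict(tokens), lexicon_vote(tokens)]
    tally = Counter(v for v in votes if v is not None)  # lexicon may abstain (None)
    top = max(tally.values())
    winners = {label for label, n in tally.items() if n == top}
    return votes[0] if votes[0] in winners else min(winners)


docs, labels = zip(*TRAIN)
nb = NaiveBayes().fit(docs, labels)
profile = ProfileCosine().fit(docs, labels)
print(ensemble_predict("i really love this, i am so glad", nb, profile))  # prints "joy"
```

Hard voting keeps each base model independent, so in this sketch any single component (for example, a classifier trained on real transcripts) could be replaced without changing the combination rule.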
