A Review on Emotion and Fluency Analyzer using Image Processing and Audio Extraction

Swati Mishra; Arshita Verma; Disha Verma; Mayank Kunal

doi:10.13052/2794-7254.019

Published 2025-08-11

Swati Mishra

Department of Electrical & Electronics Engineering, JSS Academy of Technical Education, Noida, India

Arshita Verma

Department of Electrical & Electronics Engineering, JSS Academy of Technical Education, Noida, India

Disha Verma

Department of Electrical & Electronics Engineering, JSS Academy of Technical Education, Noida, India

Mayank Kunal

Department of Electrical & Electronics Engineering, JSS Academy of Technical Education, Noida, India

Keywords

Audio extraction
convolutional neural networks
Gabor filter
Haar cascade classifiers
image processing
logistic regression
mel frequency cepstral coefficients
random forest

Abstract

Recent advances in combining image processing with audio extraction have opened a new dimension in emotion and fluency assessment. This paper proposes a system that combines image processing algorithms with audio extraction methods to analyze emotional states and speech fluency. The system uses Gabor filters, an efficient texture-representation and feature-extraction method for facial-expression systems, to analyze the facial movements that characterize particular emotions, and it applies a Haar cascade classifier for simple, practical face detection in the input image. For sound characterization, Mel-frequency cepstral coefficients (MFCCs) are extracted to capture the emotional content of the voice and the fluency of connected speech. The extracted features are then processed by a set of machine-learning techniques. Logistic regression serves as a classical classifier for the initial emotion categorization. Convolutional neural networks (CNNs) are used in one of the deep neural network (DNN) components because they can recognize and learn complex patterns in both images and sound. A random forest, which combines the predictions of many decision trees, improves the accuracy, robustness, and predictive performance of the model. The results indicate that the system effectively recognizes different emotional states and changes in fluency level, which makes it useful in areas such as mental health and education. Future work will focus on improving the system’s precision with additional models and on extending the system to everyday situations that require multilingual and multimodal analysis.
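
The Gabor-filter stage described in the abstract can be illustrated with a minimal sketch of one common way to turn Gabor responses into texture features: build the real part of a 2-D Gabor kernel at several orientations, correlate it with a grayscale face crop, and pool the response magnitudes into a mean and a variance per orientation. The kernel size, sigma, wavelength lambd, aspect ratio gamma, the four orientations, the mean/variance pooling, and all function names are illustrative assumptions; the paper does not report its filter-bank settings.

```python
import math


def gabor_kernel(size, sigma, theta, lambd, gamma=0.5, psi=0.0):
    """Real part of a size x size 2-D Gabor kernel (size should be odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the cosine carrier varies along angle theta.
            x_r = x * math.cos(theta) + y * math.sin(theta)
            y_r = -x * math.sin(theta) + y * math.cos(theta)
            # Gaussian envelope (elongated by gamma) times a cosine carrier.
            envelope = math.exp(-(x_r ** 2 + (gamma * y_r) ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * x_r / lambd + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def correlate_valid(image, kernel):
    """Slide the kernel over the image without padding and return the response map."""
    k = len(kernel)
    out_h, out_w = len(image) - k + 1, len(image[0]) - k + 1
    return [[sum(image[i + u][j + v] * kernel[u][v]
                 for u in range(k) for v in range(k))
             for j in range(out_w)]
            for i in range(out_h)]


def gabor_features(image, size=7, sigma=2.0, lambd=4.0, n_orientations=4):
    """Mean and variance of |response| for each orientation in the filter bank."""
    features = []
    for n in range(n_orientations):
        theta = n * math.pi / n_orientations
        response = correlate_valid(image, gabor_kernel(size, sigma, theta, lambd))
        mags = [abs(v) for row in response for v in row]
        mean = sum(mags) / len(mags)
        var = sum((m - mean) ** 2 for m in mags) / len(mags)
        features.extend([mean, var])
    return features


# Usage: vertical stripes respond strongly at theta = 0 and weakly at theta = pi/2.
stripes = [[1.0 if c % 4 < 2 else 0.0 for c in range(12)] for _ in range(12)]
feats = gabor_features(stripes)  # 8 values: (mean, variance) for 4 orientations
```

The plain loops only expose the arithmetic; a practical system would more likely build the kernels with OpenCV's getGaborKernel and apply them with filter2D.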
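
The Haar cascade classifier named in the abstract rests on two ideas: rectangle-sum ("Haar-like") features that an integral image lets us evaluate with four lookups, and a cascade of stages that rejects most non-face windows early. The sketch below illustrates both under simplifying assumptions: only two-rectangle edge features are shown, each weak classifier is a plain threshold with a fixed vote weight, and the stage format and function names are hypothetical. It is not the trained OpenCV cascade (for example haarcascade_frontalface_default.xml) that a practical system would typically load.

```python
def integral_image(img):
    """ii[y][x] holds the sum of img over rows < y and columns < x (zero-padded)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle whose top-left pixel is (x, y), in constant time."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def edge_feature_lr(ii, x, y, w, h):
    """Two-rectangle Haar-like feature: left half minus right half (w even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


def edge_feature_tb(ii, x, y, w, h):
    """Two-rectangle Haar-like feature: top half minus bottom half (h even)."""
    half = h // 2
    return rect_sum(ii, x, y, w, half) - rect_sum(ii, x, y + half, w, half)


def cascade_accepts(ii, x, y, stages):
    """Pass the window at (x, y) through the stages; reject at the first failing stage.

    Hypothetical stage format: (weak_classifiers, stage_threshold), where each
    weak classifier is (feature_fn, w, h, threshold, weight) and votes `weight`
    when its feature value exceeds `threshold`.
    """
    for weak_classifiers, stage_threshold in stages:
        score = sum(weight for feature_fn, w, h, threshold, weight in weak_classifiers
                    if feature_fn(ii, x, y, w, h) > threshold)
        if score < stage_threshold:
            return False  # early rejection keeps the average cost per window low
    return True


# Usage: a 4x4 window that is bright on the left and dark on the right.
window = [[1, 1, 0, 0] for _ in range(4)]
ii = integral_image(window)
stages = [([(edge_feature_lr, 4, 4, 4, 1.0)], 1.0)]
accepted = cascade_accepts(ii, 0, 0, stages)
```

In a trained detector the features, thresholds, and stage thresholds are selected with AdaBoost, and the window is slid across the image at several scales.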
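
The MFCC extraction mentioned in the abstract can be sketched as the standard chain: pre-emphasis, framing with a Hamming window, power spectrum, triangular mel filterbank, logarithm, and a DCT-II that keeps the first few coefficients. This sketch assumes 256-sample frames with a 128-sample hop, 20 mel filters, 13 coefficients, a pre-emphasis factor of 0.97, and the common mel formula 2595 * log10(1 + f / 700); none of these values comes from the paper. A naive DFT keeps the example dependency-free, whereas real code would use an FFT or a library routine such as librosa.feature.mfcc.

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale over DFT bins 0..n_fft//2."""
    n_bins = n_fft // 2 + 1
    top = hz_to_mel(sample_rate / 2.0)
    mels = [i * top / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    bank = []
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        filt = [0.0] * n_bins
        for k in range(left, center):      # rising edge
            filt[k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge, peak of 1.0 at center
            filt[k] = (right - k) / (right - center)
        bank.append(filt)
    return bank


def power_spectrum(frame, n_fft):
    """|X[k]|^2 / n_fft for k = 0..n_fft//2 via a naive O(N^2) DFT."""
    return [abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                    for n in range(len(frame)))) ** 2 / n_fft
            for k in range(n_fft // 2 + 1)]


def mfcc(signal, sample_rate, frame_len=256, hop=128, n_filters=20, n_ceps=13, alpha=0.97):
    """Return one list of n_ceps coefficients per frame (frame_len is also the DFT size)."""
    # 1. Pre-emphasis y[n] = x[n] - alpha * x[n-1] lifts the high frequencies.
    x = [signal[0]] + [signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))]
    # 2. Hamming window to reduce spectral leakage at frame edges.
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1)) for n in range(frame_len)]
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = [x[start + n] * window[n] for n in range(frame_len)]
        # 3. Power spectrum, then 4. log energy per mel band (floor avoids log(0)).
        spec = power_spectrum(frame, frame_len)
        log_e = [math.log(sum(w * s for w, s in zip(filt, spec)) + 1e-10) for filt in bank]
        # 5. Unnormalized DCT-II decorrelates the log energies; keep the first n_ceps.
        frames.append([sum(log_e[m] * math.cos(math.pi * c * (m + 0.5) / n_filters)
                           for m in range(n_filters))
                       for c in range(n_ceps)])
    return frames


# Usage: 0.1 s of a 440 Hz tone sampled at 8 kHz gives 5 frames of 13 coefficients.
sr = 8000
tone = [math.sin(2 * math.pi * 440 * t / sr) for t in range(800)]
coeffs = mfcc(tone, sr)
```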
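
Logistic regression as an emotion classifier can be sketched in its multinomial (softmax) form, since emotion recognition usually involves more than two classes; the abstract does not say whether the binary or multinomial variant is used. The code below fits weights by full-batch gradient descent on the cross-entropy loss. It assumes each sample is already a fixed-length feature vector (for example pooled Gabor or MFCC features), and the learning rate, epoch count, and toy data are illustrative assumptions.

```python
import math


def softmax(z):
    """Numerically stable softmax: subtract the largest logit before exponentiating."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def logits(W, b, x):
    return [sum(w * v for w, v in zip(W[c], x)) + b[c] for c in range(len(W))]


def fit_softmax_regression(X, y, n_classes, lr=0.1, epochs=500):
    """Multinomial logistic regression trained by full-batch gradient descent."""
    n, d = len(X), len(X[0])
    W = [[0.0] * d for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        grad_W = [[0.0] * d for _ in range(n_classes)]
        grad_b = [0.0] * n_classes
        for xi, yi in zip(X, y):
            p = softmax(logits(W, b, xi))
            for c in range(n_classes):
                # Gradient of cross-entropy w.r.t. logit c is p_c - 1[c == y].
                err = p[c] - (1.0 if c == yi else 0.0)
                grad_b[c] += err
                for j in range(d):
                    grad_W[c][j] += err * xi[j]
        for c in range(n_classes):
            b[c] -= lr * grad_b[c] / n
            for j in range(d):
                W[c][j] -= lr * grad_W[c][j] / n
    return W, b


def predict(W, b, x):
    """Index of the most probable class."""
    scores = logits(W, b, x)
    return max(range(len(scores)), key=scores.__getitem__)


# Usage: toy 2-D features for three hypothetical emotion classes 0, 1, 2.
X = [[-3, -2], [-2.8, -2.2], [-3.2, -1.9],
     [3, -2], [2.9, -2.1], [3.1, -1.8],
     [0, 3], [0.2, 3.1], [-0.1, 2.8]]
y = [0, 0, 0, 1, 1, 1, 2, 2, 2]
W, b = fit_softmax_regression(X, y, n_classes=3)
```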
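
The convolutional layers in the CNN component can be illustrated by the forward pass of a single stage: convolution (implemented, as in most deep-learning libraries, as cross-correlation), ReLU, and 2x2 max pooling. In a real CNN the kernels are learned by backpropagation and stacked over many layers and channels; here hand-set kernels stand in for learned ones, so this is only a sketch of the data flow, and the function names are hypothetical.

```python
def conv2d(image, kernel, bias=0.0):
    """'Valid' 2-D cross-correlation, which CNN libraries call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[bias + sum(image[i + u][j + v] * kernel[u][v]
                        for u in range(kh) for v in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def relu(fmap):
    """Element-wise max(0, x) non-linearity."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Non-overlapping size x size max pooling; incomplete edge windows are dropped."""
    return [[max(fmap[i + u][j + v] for u in range(size) for v in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]


def conv_stage(image, kernels, biases):
    """One CNN stage: convolution -> ReLU -> max pooling, one feature map per kernel."""
    return [max_pool(relu(conv2d(image, k, b))) for k, b in zip(kernels, biases)]


# Usage: a 6x6 patch with a dark-to-bright vertical edge between columns 2 and 3.
patch = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
# Hand-set 1x2 kernels (rising edge, falling edge) stand in for learned weights.
maps = conv_stage(patch, [[[-1, 1]], [[1, -1]]], [0.0, 0.0])
# maps[0] is [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]; ReLU zeroes the falling-edge map.
```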
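
The random-forest component combines many decision trees, each grown on a bootstrap sample of the training data with a random subset of features considered at every split, and predicts by majority vote. The sketch below implements that idea with small Gini-based trees; the number of trees, the depth limit, the square-root feature subset, the midpoint thresholds, and the function names are conventional choices assumed here, not settings reported in the paper.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def best_split(X, y, feature_ids):
    """(feature, threshold) with the lowest weighted Gini impurity, or None if none helps."""
    best, best_score = None, gini(y)
    for f in feature_ids:
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint threshold, so both sides are non-empty
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def grow_tree(X, y, depth, max_depth, n_sub, rng):
    """Internal nodes are (feature, threshold, left, right) tuples; leaves are labels."""
    if depth == max_depth or len(set(y)) == 1:
        return majority(y)
    split = best_split(X, y, rng.sample(range(len(X[0])), n_sub))  # random feature subset
    if split is None:
        return majority(y)
    f, t = split
    go_left = [row[f] <= t for row in X]
    left = grow_tree([r for r, g in zip(X, go_left) if g],
                     [l for l, g in zip(y, go_left) if g], depth + 1, max_depth, n_sub, rng)
    right = grow_tree([r for r, g in zip(X, go_left) if not g],
                      [l for l, g in zip(y, go_left) if not g], depth + 1, max_depth, n_sub, rng)
    return (f, t, left, right)


def tree_predict(node, x):
    while isinstance(node, tuple):  # assumes the labels themselves are not tuples
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def fit_forest(X, y, n_trees=25, max_depth=4, seed=0):
    rng = random.Random(seed)
    n_sub = max(1, int(len(X[0]) ** 0.5))  # sqrt(n_features) candidate features per split
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]  # bootstrap: rows drawn with replacement
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx],
                                0, max_depth, n_sub, rng))
    return forest


def forest_predict(forest, x):
    """Majority vote over the trees."""
    return majority([tree_predict(tree, x) for tree in forest])


# Usage: two well-separated toy clusters standing in for two emotion classes.
X = [[1, 1], [2, 1], [1, 2], [2, 2], [1.5, 1.5],
     [8, 8], [9, 8], [8, 9], [9, 9], [8.5, 8.5]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
forest = fit_forest(X, y, seed=42)
```

Bootstrap sampling and per-split feature subsampling decorrelate the trees, which is what lets the vote reduce variance relative to a single tree.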
