Comparative Analysis of Stochastic Gradient Descent Optimization and Adaptive Moment Estimation in Emotion Classification from Audio Using Convolutional Neural Network

Authors

  • Aldelia Jocelyn Tutuhatunewa, Universitas Pattimura

DOI:

https://doi.org/10.33005/jasid.v1i1.5

Keywords:

CNN, classification, MFCC, SGD, Adam

Abstract

Emotion is a fundamental aspect of human life that shapes behavior, social interaction, and decision-making. Effective communication and mutual understanding between individuals depend heavily on recognizing and expressing emotions accurately. Among the channels of emotional expression, sound is a powerful and direct medium that reflects a speaker's emotional state, which makes audio-based emotion recognition an important and rapidly evolving field of study. With advances in information technology and artificial intelligence, research on recognizing emotions from sound signals has gained significant momentum. Machine learning algorithms, particularly deep neural networks, have shown strong capabilities in identifying and classifying emotions expressed through text, images, video, and especially audio. Among neural network architectures, Convolutional Neural Networks (CNNs) have proven especially effective for audio emotion classification because they learn hierarchical, local features directly from their input data. This study compares two widely used optimization algorithms, Stochastic Gradient Descent (SGD) and Adaptive Moment Estimation (Adam), for training CNN models that classify emotions in audio recordings. On the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), the CNN trained with SGD achieved an overall accuracy of 53%, compared with 48% for the CNN trained with Adam. These results suggest that, in this setting, SGD may be the better choice for fine-tuning deep learning models for audio-based emotion recognition.
Researchers and practitioners may therefore consider SGD optimization when seeking to improve the performance and robustness of audio-based emotion classification systems.
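The abstract contrasts SGD and Adam without showing how their update rules differ. The following is a minimal sketch, assuming the textbook forms: plain SGD without momentum, and Adam with the default hyperparameters proposed by Kingma and Ba. It applies both to a hypothetical two-parameter quadratic rather than to the study's CNN; the objective, the learning rate of 0.05, the starting point, and the step count are illustrative assumptions, not the settings used in the paper.

```python
import math


def sgd_step(theta, grad, lr=0.01):
    """Plain SGD (no momentum): move each parameter against its gradient."""
    return [p - lr * g for p, g in zip(theta, grad)]


class Adam:
    """Adam update rule in the form given by Kingma and Ba (2014)."""

    def __init__(self, n_params, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * n_params  # running mean of gradients (first moment)
        self.v = [0.0] * n_params  # running mean of squared gradients (second moment)
        self.t = 0                 # step counter used for bias correction

    def step(self, theta, grad):
        self.t += 1
        new_theta = []
        for i, (p, g) in enumerate(zip(theta, grad)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            # Bias-corrected estimates (both moments are initialized at zero).
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            # Per-parameter step, scaled by the running gradient magnitude.
            new_theta.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return new_theta


def toy_loss_and_grad(theta):
    """Hypothetical quadratic f(x, y) = 0.5 * (x^2 + 10 * y^2), minimum at (0, 0)."""
    x, y = theta
    return 0.5 * (x * x + 10 * y * y), [x, 10 * y]


def minimize(update, theta0=(3.0, 1.0), steps=1000):
    """Apply an update rule repeatedly and return the final loss."""
    theta = list(theta0)
    for _ in range(steps):
        _, grad = toy_loss_and_grad(theta)
        theta = update(theta, grad)
    return toy_loss_and_grad(theta)[0]


if __name__ == "__main__":
    print("SGD final loss: ", minimize(lambda th, g: sgd_step(th, g, lr=0.05)))
    print("Adam final loss:", minimize(Adam(n_params=2, lr=0.05).step))
```

Because Adam divides each coordinate's step by the square root of its running squared-gradient estimate, its first update moves both parameters by roughly `lr` even though their gradients differ by a factor of ten, whereas SGD's step is proportional to the raw gradient. A toy quadratic says nothing about which optimizer generalizes better on RAVDESS; that comparison rests on the study's own experiments.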

Published

2025-05-28

How to Cite

Aldelia Jocelyn Tutuhatunewa. (2025). Comparative Analysis of Stochastic Gradient Descent Optimization and Adaptive Moment Estimation in Emotion Classification from Audio Using Convolutional Neural Network. Jurnal Aplikasi Sains Data, 1(1), 17–29. https://doi.org/10.33005/jasid.v1i1.5