Multimodal Multilingual Affective Computing: Databases and Algorithms
The objective of this research is to enhance the cultural adaptability of affective computing systems through the integration of multimodal data and the refinement of labelling frameworks. While emotion recognition has advanced significantly, existing models are predominantly trained on Western, Educated, Industrialized, Rich, and Democratic (WEIRD) populations, resulting in biased interpretations in diverse cultural and linguistic contexts.
Emotion expression and perception vary across societies due to differences in social norms, communication styles, and language structures. For instance, Japanese honorifics (keigo) encode hierarchical relationships, whereas Dutch social interactions emphasize sincerity over overt expressiveness. Similarly, Cantonese and written Chinese differ markedly in structure and tone, which affects how emotion is interpreted in communication (Mesquita, 2022). Nonverbal cues such as bowing in Japan or Korea, or hand gestures in Mediterranean cultures, likewise convey nuanced emotional and social meanings that AI models often fail to capture. Despite these complexities, many current affective computing models rely on standardized label sets that oversimplify cultural differences, resulting in misinterpretation.
This study aims to address these gaps by developing a multimodal framework that integrates facial expressions, speech prosody, and body language across diverse cultural settings, while redefining labelling methodologies to minimize cultural bias. The integration of diverse cultural datasets and interdisciplinary insights from psychology, linguistics, and human-computer interaction (HCI) is central to the research. The goal is to create context-aware affective computing systems and to investigate real-world applications in human-AI interaction, particularly in education, healthcare, and customer service, where accurate emotion recognition can enhance user experience and engagement.
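To make the framework concrete, the sketch below illustrates one possible realization of the two ideas named above: late fusion of face, prosody, and body-language features, and training against soft (distributional) labels that preserve annotator disagreement across cultural groups rather than forcing a single category. This is a minimal illustrative sketch, not the study's actual architecture; the module names, feature dimensions, label count, and the use of PyTorch with a KL-divergence objective are all assumptions introduced here for exposition.

```python
# Illustrative sketch only: a late-fusion multimodal emotion classifier with
# distributional (soft) labels. All dimensions and names are hypothetical.
import torch
import torch.nn as nn

class LateFusionEmotionModel(nn.Module):
    def __init__(self, face_dim=512, prosody_dim=128, body_dim=256,
                 hidden_dim=256, num_labels=8):
        super().__init__()
        # One encoder per modality; in practice these would sit on top of
        # pretrained feature extractors (face CNN, prosody network, pose encoder).
        self.face_enc = nn.Sequential(nn.Linear(face_dim, hidden_dim), nn.ReLU())
        self.prosody_enc = nn.Sequential(nn.Linear(prosody_dim, hidden_dim), nn.ReLU())
        self.body_enc = nn.Sequential(nn.Linear(body_dim, hidden_dim), nn.ReLU())
        # Fusion head over the concatenated modality embeddings.
        self.classifier = nn.Linear(3 * hidden_dim, num_labels)

    def forward(self, face_feat, prosody_feat, body_feat):
        fused = torch.cat([self.face_enc(face_feat),
                           self.prosody_enc(prosody_feat),
                           self.body_enc(body_feat)], dim=-1)
        return self.classifier(fused)  # logits over emotion labels

# Soft labels encode how annotators from different cultural groups distributed
# their judgements; KL divergence trains the model toward that distribution
# instead of a single "correct" tag.
model = LateFusionEmotionModel()
face, prosody, body = torch.randn(4, 512), torch.randn(4, 128), torch.randn(4, 256)
logits = model(face, prosody, body)
soft_labels = torch.softmax(torch.randn(4, 8), dim=-1)  # placeholder annotation distribution
loss = nn.KLDivLoss(reduction="batchmean")(torch.log_softmax(logits, dim=-1), soft_labels)
```

One design point this sketch is meant to surface: keeping per-group label distributions, rather than collapsing them into one standardized label, is one way to let the model represent culturally divergent readings of the same expression instead of averaging them away.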