Automatic Assessment of Individual Culture Attribute of Power Distance Using a Social Context-Enhanced Prosodic Network Representation
Abstract
Culture is a collective social norm of human societies that often influences a person's values, thoughts, and social behaviors during interactions at an individual level. In this work, we present a computational analysis toward automatically assessing an individual's culture attribute of power distance, i.e., a measure of his or her beliefs about status, authority, and power in organizations, by modeling the individual's expressive prosodic structures during social encounters with people of different power status. Specifically, we propose a center-loss embedded network architecture that jointly considers the effect of social interaction contexts on individuals' prosodic manifestations in order to learn an enhanced representation for power distance recognition. Our proposed prosodic network achieves an overall accuracy of 78.6% in the binary classification task of recognizing high versus low power distance. Our experiment demonstrates improved discriminability (17.6% absolute improvement) over a prosodic neural network without social context enhancement. Further visualization reveals that the diversity in prosodic manifestation appears to be higher for individuals with low power distance than for those with high power distance.
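The joint objective mentioned in the abstract can be illustrated with a minimal NumPy sketch. It assumes the commonly used form of center loss (half the mean squared distance between each embedding and its assigned center) added to softmax cross entropy with a weight `lam`. The function names, the `center_ids` argument, and the value of `lam` are illustrative assumptions; how the paper assigns centers to social settings is not specified on this page.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean softmax cross entropy over a batch of integer class labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def center_loss(embeddings, center_ids, centers):
    """Half the mean squared distance from each embedding to its assigned center.

    center_ids is a hypothetical index array; it could encode a class,
    a social setting, or a combination of both.
    """
    diffs = embeddings - centers[center_ids]
    return 0.5 * (diffs ** 2).sum(axis=1).mean()

def joint_loss(logits, labels, embeddings, center_ids, centers, lam=0.1):
    """Cross entropy plus a weighted center-loss term (lam is an assumed weight)."""
    ce = softmax_cross_entropy(logits, labels)
    return ce + lam * center_loss(embeddings, center_ids, centers)
```

In a trainable network the centers would be updated alongside the weights; this sketch only evaluates the loss for fixed centers.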
Figures
The complete architecture of our social context-enhanced prosodic network for automatic power distance recognition: dynamic modeling of the prosodic pitch and energy contours, training of the prosodic network by jointly optimizing a setting-wise center loss with the standard cross-entropy criterion, and recognition using a functional encoding of the network output layer with support vector classification.
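The final recognition stage in the caption, a functional encoding of the network outputs, might look like the sketch below. It assumes that "functional encoding" means summarizing a variable number of per-segment output vectors with fixed statistics. The choice of mean, standard deviation, minimum, and maximum, and the function name `functional_encoding`, are assumptions rather than the authors' stated configuration.

```python
import numpy as np

def functional_encoding(segment_outputs):
    """Collapse (num_segments, dim) network outputs into one fixed-length vector.

    The four statistics used here are an assumed set of functionals,
    not necessarily those used in the paper.
    """
    x = np.asarray(segment_outputs, dtype=float)
    return np.concatenate([
        x.mean(axis=0),  # average activation per output dimension
        x.std(axis=0),   # spread across segments (population std)
        x.min(axis=0),
        x.max(axis=0),
    ])
```

Each speaker then yields one vector of length `4 * dim`, which can be passed to any support vector classifier, for example scikit-learn's `SVC`.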
Keywords
behavioral signal processing | prosody | center-loss embedding | culture attribute | power distance
Authors
Hao-Chun Yang, Chi-Chun Lee
Publication Date
2018/09/02
Conference
Interspeech 2018
DOI
10.21437/Interspeech.2018-1523
Publisher
ISCA