Abstract
Formally, the problem we present is that of identifying the hidden attributes of the system that modulates the body's signals, uncovered through novel signal processing and machine learning applied to large-scale multimodal data (Figure 1). Signal processing is the keystone that supports this mapping from data to representations of behaviors and mental states. The pipeline begins with raw signals, such as those from visual, auditory, and physiological sensors. Information is then localized to the corresponding behavioral channels, such as the face, body, and voice. Next, the signals are denoised and modeled to extract meaningful information, such as the words that are said and the patterns of how they are spoken. The coordination across channels can also be assessed with time-series modeling techniques. Moreover, since an individual's behavior does not occur in isolation, but is influenced by a communicative partner's actions and by the environment (e.g., interview versus casual discussion, home versus clinic), temporal modeling must account for these contextual effects. Finally, given a representation of behavior derived from the signals, machine learning is used to infer mental states to support human or autonomous decision making.
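To make the stages concrete, the following is a minimal, self-contained sketch of such a pipeline on synthetic data. The channel names, filter settings, windowed features, coordination measure, and classifier are illustrative assumptions for exposition only, not the method described in this work.

```python
"""Toy sketch: raw multimodal signals -> denoising -> features ->
coordination + context -> mental-state inference (synthetic data)."""

import numpy as np
from scipy.signal import butter, filtfilt
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
fs = 100                              # assumed sampling rate (Hz)
n_sessions, n_samples = 200, fs * 30  # 30-second recordings per session

def denoise(x, low=0.5, high=10.0):
    """Band-pass filter one raw channel (denoising step)."""
    b, a = butter(4, [low, high], btype="band", fs=fs)
    return filtfilt(b, a, x)

def window_features(x, win=fs):
    """Summarize a signal per 1-second window (mean and variance)."""
    w = x[: len(x) // win * win].reshape(-1, win)
    return np.c_[w.mean(axis=1), w.var(axis=1)]

X, y = [], []
for _ in range(n_sessions):
    # 1. Raw signals from two behavioral channels (e.g., voice energy, physiology);
    #    the hidden state weakly shifts both channels in this synthetic example.
    state = rng.integers(2)
    voice = rng.normal(size=n_samples) + 0.5 * state
    physio = rng.normal(size=n_samples) + 0.3 * state

    # 2-3. Denoise each channel and extract per-window features.
    fv = window_features(denoise(voice))
    fp = window_features(denoise(physio))

    # 4. Coordination between channels: correlation of windowed means.
    coord = np.corrcoef(fv[:, 0], fp[:, 0])[0, 1]

    # 5. Context as an extra covariate (e.g., interview vs. casual discussion).
    context = rng.integers(2)

    # Session-level representation of behavior.
    X.append(np.r_[fv.mean(axis=0), fp.mean(axis=0), coord, context])
    y.append(state)

# 6. Machine learning maps the behavioral representation to a mental-state inference.
clf = LogisticRegression(max_iter=1000)
print("CV accuracy:", cross_val_score(clf, np.array(X), np.array(y), cv=5).mean())
```

In practice, each stage here would be replaced by far richer components (face and body tracking, speech recognition, prosodic analysis, and temporal models that condition on partner and environment), but the flow from raw signals to a behavioral representation to an inferred mental state follows the same structure.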