Pain is an unpleasant and distressing sensory experience usually induced by physical damage, and its intensity is further modulated by the site where the pain is experienced. Objective assessment of pain is critical in a variety of clinical practices; however, the status quo in medical practice relies solely on self-report. Recent advances have been made in automatic pain assessment from audio-video recordings, but most approaches do not consider the complex clinical dependency between pain level and pain site. In this study, we propose a Task Specific Encoder with Soft Layer Ordering structure (TSEN-SLO) that uses a learnable tensor to flexibly share information between the pain level and pain site tasks while keeping the representations of each task in their own encoding layers, with the aim of improving pain level recognition. Our network learns from both face and voice data and achieves accuracies of 70% and 48.1% in binary and ternary self-reported pain level classification, respectively, in a challenging in-the-wild setting. These results correspond to relative improvements of 6.5% and 9.1% compared to previous work on the same dataset. Further analysis also demonstrates that the way self-reported pain level is reflected in facial and acoustic features varies across pain sites, which points toward a potential relationship between the neural mechanism behind internal pain sensation and its effect on expressive facial/vocal behaviors.
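To make the sharing mechanism concrete, the following is a minimal sketch of soft layer ordering combined with task-specific encoders, written in plain Python. It is an illustration under stated assumptions, not the implementation used in this study: the dimensions, the ReLU dense layers, the random weights, and all names (`soft_ordered_forward`, `mix_logits`, `encoders`, `shared_layers`) are hypothetical. The idea shown is that each task first passes through its own encoder, and then, at every depth, mixes the outputs of all shared layers using softmax-normalized weights taken from a learnable tensor indexed by (task, depth, layer).

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dense_relu(weights, x):
    """Apply relu(W x), where W is given as a list of rows."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]


def random_matrix(rng, rows, cols):
    """Small random weight matrix (stand-in for trained parameters)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def soft_ordered_forward(x, task, encoders, shared_layers, mix_logits):
    """Forward pass for one task.

    encoders[task]          : task-specific encoder weights (not shared)
    shared_layers[l]        : weights of shared layer l (used by all tasks)
    mix_logits[task][depth] : learnable logits over shared layers (assumed shape
                              tasks x depth x layers), softmax-normalized here
    """
    h = dense_relu(encoders[task], x)  # task-specific encoding
    for depth in range(len(shared_layers)):
        alphas = softmax(mix_logits[task][depth])
        outputs = [dense_relu(layer, h) for layer in shared_layers]
        # Weighted sum of every shared layer's output at this depth.
        h = [sum(a * out[i] for a, out in zip(alphas, outputs))
             for i in range(len(h))]
    return h


# Hypothetical sizes and task names, chosen only for illustration.
rng = random.Random(0)
IN_DIM, HID, DEPTH = 6, 4, 3
TASKS = ("pain_level", "pain_site")

encoders = {t: random_matrix(rng, HID, IN_DIM) for t in TASKS}
shared_layers = [random_matrix(rng, HID, HID) for _ in range(DEPTH)]
# Zero logits give a uniform mixture; training would adjust these per task.
mix_logits = {t: [[0.0] * DEPTH for _ in range(DEPTH)] for t in TASKS}

features = [rng.uniform(0.0, 1.0) for _ in range(IN_DIM)]  # stand-in for fused face/voice features
level_repr = soft_ordered_forward(features, "pain_level", encoders, shared_layers, mix_logits)
site_repr = soft_ordered_forward(features, "pain_site", encoders, shared_layers, mix_logits)
```

In this sketch the two tasks share every entry of `shared_layers` but keep separate encoders and separate mixing logits, so the degree of sharing at each depth is something a training procedure could learn per task rather than fix in advance.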