Researchers across many disciplines study affective phenomena, particularly arousal. Expressed affective modulations, which reflect both an individual's internal state and external factors, are central to communication. Bone et al. developed a robust, unsupervised (rule-based) method that produces a scale-continuous, bounded arousal rating from the vocal signal. In this study, we investigate the joint dynamics of child and psychologist vocal arousal in autism spectrum disorder (ASD) diagnostic interactions, assessing arousal synchrony with multiple methods. Results indicate that children with higher ASD severity tend to lead the arousal dynamics more, apparently because these children are less responsive to the psychologist's affective modulations. We also propose a vocal arousal model that incorporates social and conversational constructs. The model captures relations among the conversational signals and distinguishes between high and low ASD severity with accuracy well above chance.
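The abstract does not name the synchrony measures used. One common way to ask which of two aligned signals "leads" is lagged cross-correlation. The sketch below illustrates that idea under stated assumptions and is not the authors' method. It assumes two equal-length, frame-aligned arousal series, for example a child series `x` and a psychologist series `y`. The function names `lagged_correlations` and `leader` are hypothetical and are not part of Bone et al.'s method.

```python
from math import sqrt
from typing import Dict, Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / sqrt(va * vb)


def lagged_correlations(
    x: Sequence[float], y: Sequence[float], max_lag: int
) -> Dict[int, float]:
    """Hypothetical helper: r(k) = corr(x[t], y[t + k]) over all valid t.

    A peak at k > 0 means y tends to follow x by k frames (x leads);
    a peak at k < 0 means x tends to follow y (y leads).
    """
    if len(x) != len(y):
        raise ValueError("x and y must be aligned, equal-length series")
    n = len(x)
    out: Dict[int, float] = {}
    for k in range(-max_lag, max_lag + 1):
        # Keep only time indices where both x[t] and y[t + k] exist.
        pairs = [(x[t], y[t + k]) for t in range(n) if 0 <= t + k < n]
        if len(pairs) < 2:
            continue  # too few overlapping frames to correlate
        a, b = zip(*pairs)
        out[k] = pearson(a, b)
    return out


def leader(
    x: Sequence[float], y: Sequence[float], max_lag: int
) -> Tuple[str, int]:
    """Hypothetical helper: report which series leads, from the peak lag."""
    r = lagged_correlations(x, y, max_lag)
    best = max(r, key=r.get)  # ties resolve to the most negative lag
    if best > 0:
        return "x", best
    if best < 0:
        return "y", best
    return "neither", 0


if __name__ == "__main__":
    child = [0, 1, 0, 2, 0, 3, 0, 1, 2, 0]
    psych = [0, 0] + child[:-2]  # psychologist echoes the child 2 frames later
    print(leader(child, psych, max_lag=3))
```

With the toy series above, the psychologist's values repeat the child's two frames later, so the peak correlation falls at lag 2 and `leader` reports the child series as leading. Real arousal streams would usually need smoothing, windowing, and a significance test before one reads anything into the peak lag.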