Vocal expression is essential for conveying the emotion during social interaction. Although vocal emotion has been explored in previous studies, little is known about how perception of different vocal emotional expressions modulates the functional brain network topology. In this study, we aimed to investigate the functional brain networks under different attributes of vocal emotion by graph-theoretical network analysis. Functional magnetic resonance imaging (fMRI) experiments were performed on 36 healthy participants. We utilized the Power-264 functional brain atlas to calculate the interregional functional connectivity (FC) from fMRI data under resting state and vocal stimuli at different arousal and valence levels. The orthogonal minimal spanning trees method was used for topological filtering. The paired-sample t-test with Bonferroni correction across all regions and arousal–valence levels were used for statistical comparisons. Our results show that brain network exhibits significantly altered network attributes at FC, nodal and global levels, especially under high-arousal or negative-valence vocal emotional stimuli. The alterations within/between well-known large-scale functional networks were also investigated. Through the present study, we have gained more insights into how comprehending emotional speech modulates brain networks. These findings may shed light on how the human brain processes emotional speech and how it distinguishes different emotional conditions.