Affective media videos have been widely used as stimuli to investigate individuals' affective physiological responses. In this study, we aim to develop a network learning strategy for robust cross-corpus emotion recognition using physiological features jointly with affective video content. Specifically, we present a novel Visual Semantic Graph Learning Convolutional Network (VGLCN) framework for recognizing individual emotional states from physiological signals in transfer learning tasks. The content of the stimulus videos is integrated into a learnable graph structure to weight the importance of the physiological features on the two emotion dimensions, valence and arousal. Furthermore, we evaluate the proposed framework on two public emotion databases with a rigorous cross-validation method; our model achieves the best unweighted average recall (UAR), namely 67.9% and 56.9% for arousal and 79.8% and 70.4% for valence in the two cross-dataset recognition experiments, respectively. Further analyses reveal that 1) VGLCN is especially effective on the transfer valence binary task, 2) the physiological features (ECG, EDA) are highly informative for emotion recognition, and 3) the affective media videos are an important constraint to include in the framework to stabilize recognition performance.