This paper presents the RECOD approaches used in the MediaEval 2014 Violent Scenes Detection task. Our system combines visual, audio, and text features, which are merged through a fusion scheme. We also evaluate the performance of a convolutional network as a feature extractor. We participated in both the main task and the generalization task.