Automatically detecting violence in videos is paramount for enforcing the law and for providing society with better policies for safer public places. It may also be essential for protecting minors from accessing inappropriate content online and for helping parents choose suitable movie titles for their children. However, this remains an open problem, as the very definition of violence is subjective and may vary from one society to another. Detecting such nuances in video footage without human supervision is very challenging. When designing a computer-aided solution to this problem, we need detection methods that are both efficient (able to quickly harness large troves of data) and effective (able to robustly filter what needs special attention and further analysis). In this vein, we explore a content description method for violence detection founded upon temporal robust features that quickly capture video sequences, allowing violent videos to be classified automatically. The method also holds promise for fast and effective classification in other recognition tasks (e.g., detecting pornography and other inappropriate material). Compared with more complex counterparts for violence detection, the method shows similar classification quality while being several times more efficient in terms of runtime and memory footprint.