Embedded security system for multi-modal surveillance in a railway carriage
ZOUAOUI ; AUDIGIER ; AMBELLOUIS ; CAPMAN ; BENHADDA ; JOUDRIER ; SODOYER ; LAMARQUE
Document type
INTERNATIONAL CONFERENCE PAPER WITH PROCEEDINGS (ACTI)
Language
English
Authors
ZOUAOUI ; AUDIGIER ; AMBELLOUIS ; CAPMAN ; BENHADDA ; JOUDRIER ; SODOYER ; LAMARQUE
Abstract
Public transport security is one of the main priorities of public authorities in the fight against crime and terrorism. In this context, there is strong demand for autonomous systems able to detect abnormal events such as violent acts aboard coaches and intrusions while the train is parked at the depot. To this end, we present an innovative approach that aims to provide efficient automatic event detection by fusing video and audio analytics, reducing the false alarm rate compared with classical video-only detection. The multi-modal system is composed of two microphones and one camera and integrates onboard video analytics, audio analytics and fusion capabilities. For intrusion detection, the system fuses the detection of 'unusual' audio events with intrusion detections from video processing. The audio analysis consists of modeling the normal ambience and detecting deviations from the trained models at test time. This unsupervised approach is based on clustering automatically extracted segments of acoustic features and on statistical modeling of each cluster with a Gaussian Mixture Model (GMM). The video intrusion detection is based on 3D detection and tracking of individuals. For violent event detection, the system fuses unsupervised and supervised audio algorithms with video event detection. The supervised audio technique detects specific events such as shouts: a GMM is used to capture the formant structure of a shout signal. The video analytics use an original approach that detects aggressive motion by focusing on the erratic motion patterns specific to violent events. As data containing violent events is not easily available, a normality model of structured motion is learned from non-violent videos and used for one-class classification. A fusion algorithm based on Dempster-Shafer theory analyses the asynchronous detection outputs and computes the degree of belief of each probable event.
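The fusion step mentioned at the end of the abstract can be illustrated with Dempster's rule of combination. The following is a minimal sketch under stated assumptions, not the authors' algorithm: the two-hypothesis frame ("violence", "normal"), the mass values, and the function names `combine` and `belief` are all hypothetical and chosen only for illustration. The sketch also ignores the asynchronous timing of the detector outputs, which the abstract says the actual system handles.

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function maps a frozenset of hypotheses to a mass,
    with the masses summing to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Agreeing evidence supports the intersection of the two sets.
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Disjoint sets contribute to the conflict mass.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    norm = 1.0 - conflict
    return {k: v / norm for k, v in combined.items()}

def belief(m, hypothesis):
    """Degree of belief: total mass of all subsets of the hypothesis."""
    return sum(w for s, w in m.items() if s <= hypothesis)

# Hypothetical frame of discernment and detector outputs (illustrative values).
FRAME = frozenset({"violence", "normal"})
VIOLENCE = frozenset({"violence"})
NORMAL = frozenset({"normal"})

# Mass assigned to FRAME expresses each detector's uncertainty.
audio = {VIOLENCE: 0.6, FRAME: 0.4}
video = {VIOLENCE: 0.5, NORMAL: 0.2, FRAME: 0.3}

fused = combine(audio, video)
print("belief in violence:", belief(fused, VIOLENCE))
```

Assigning mass to the whole frame lets a detector say "I do not know" rather than forcing a choice between hypotheses, which is one reason evidence theory suits the combination of heterogeneous audio and video detectors.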