The HERMES Research Project Bridges the Semantic Gap for Smarter Police Surveillance

Cutting-edge software developed by a team of European researchers could soon be helping police and security services. It automatically detects human motion, behaviour and facial expressions in surveillance video, generates a running commentary of what is happening and re-enacts events virtually.

Bridging the semantic gap for smarter police surveillance

This new system, developed by a team of researchers from five European countries, provides a comprehensive and innovative solution to the information overload facing police forces and public and private security services.

With millions of surveillance cameras across Europe capturing what happens on city streets and major meeting points like airports, malls and buildings, monitoring and analysing these video streams has become an epic task. Technology such as automated motion detection, object tracking and behaviour analysis has eased some of the burden, but a gap continues to exist between what surveillance cameras see and how it can be described and interpreted in terms a human operator or computer can understand. Bridging this semantic gap is important because meaningful descriptions of events can trigger meaningful automated or human responses that could spot a crime in progress, prevent injuries or save lives.

"The semantic gap in the analysis of human behaviour from digital video is huge," explains Andrew Bagdanov, a senior researcher at the Computer Vision Centre (CVC) of the Universitat Autonoma in Barcelona, Spain. "Most surveillance software operates only at a very low level… in order to bridge the gap it is necessary to build an artificial cognitive solution that operates at a much higher level, which is able to analyse footage, describe the events taking place and reason about what is going on."

Thanks to new research carried out by a multidisciplinary team working in the HERMES project, an EU-funded initiative named, fittingly, after the messenger of the gods in Greek mythology, such a solution now exists.

The state-of-the-art HERMES system consists of a scalable, flexible platform, integrating software components that not only detect events in real time as they are filmed by surveillance cameras but also describe them semantically and react to them intelligently. It operates at three levels: tracking the movement of people and objects; monitoring the behaviour of people; and, in the case of high-resolution footage taken at close quarters, detecting changes in facial expression.

Monitoring motion, detecting behaviour
Whereas most surveillance video tracking systems operate in a state of perpetual surprise, dumbly following a single target and struggling to reacquire it if lost, the HERMES tracking technology functions more like a human monitoring the same scene, making predictions about where a target is heading and also reacting to any other events in the scene that appear unusual.
"Say two people meet in the street and start to run. The system will detect the change in behaviour and start to follow them. It could alert a human operator if the pattern of behaviour seems suspicious… such as if it appears someone has had their bag stolen," Bagdanov, who oversaw the project's validation activities, says.
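
The article does not describe how HERMES implements this, so the following is only a minimal sketch of the two ideas in the passage above: predicting where a target is heading, and noticing when someone starts to run. It assumes a track is a list of timestamped ground-plane positions in metres; the `Observation` type, the constant-velocity prediction and the 2.5 m/s running threshold are all illustrative assumptions, not details from the project.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Optional

# Assumed threshold separating walking from running (not from the article).
RUN_SPEED_M_PER_S = 2.5


@dataclass
class Observation:
    t: float  # time in seconds
    x: float  # ground-plane position in metres
    y: float


def speed(a: Observation, b: Observation) -> float:
    """Average speed between two consecutive observations."""
    dt = b.t - a.t
    if dt <= 0:
        raise ValueError("observations must be in increasing time order")
    return hypot(b.x - a.x, b.y - a.y) / dt


def predict_next(a: Observation, b: Observation, dt: float) -> Observation:
    """Extrapolate the target's position dt seconds after b,
    assuming it keeps the velocity it had between a and b."""
    step = b.t - a.t
    if step <= 0:
        raise ValueError("observations must be in increasing time order")
    vx = (b.x - a.x) / step
    vy = (b.y - a.y) / step
    return Observation(b.t + dt, b.x + vx * dt, b.y + vy * dt)


def started_running(track: List[Observation]) -> Optional[int]:
    """Return the index of the first observation at which the speed
    rises from below the threshold to at or above it, else None."""
    for i in range(2, len(track)):
        before = speed(track[i - 2], track[i - 1])
        after = speed(track[i - 1], track[i])
        if before < RUN_SPEED_M_PER_S <= after:
            return i
    return None
```

A real system would smooth noisy detections and use a richer motion model; the sketch only shows why a predicted position helps reacquire a lost target and how a change in speed can be turned into an event worth flagging to an operator.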

Using a combination of static cameras, which provide an overall view of an area, and Pan-Tilt-Zoom (PTZ) cameras, so-called "eyes in the sky" that zoom and move to follow a target, the system is able to automatically track a person as they walk down a street or even across an entire city.

This smarter tracking is made possible by the HERMES researchers' approach to bridging the semantic gap. Instead of tracking objects in a scene directly - the current, low-level approach - the HERMES platform generates a running commentary in natural language text of what is going on: "A pedestrian labelled 'Actor 3' appears in the field of view," "He moves on the southeastern sidewalk," "Actor 3 stands nearby another pedestrian" and so on.
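
One simple way to picture how detected events could become a running commentary is template filling, sketched below. This is an illustration only: the article does not say how HERMES generates its text, and the `Event` type, the event kinds and the template wording are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str         # hypothetical kinds: "appears", "moves_on", "stands_near"
    actor: int        # numeric label assigned to the tracked person
    target: str = ""  # place or other actor, where the event kind needs one


# Hypothetical English templates, loosely modelled on the example
# sentences quoted in the article.
TEMPLATES = {
    "appears": "A pedestrian labelled 'Actor {actor}' appears in the field of view.",
    "moves_on": "Actor {actor} moves on the {target}.",
    "stands_near": "Actor {actor} stands near {target}.",
}


def describe(event: Event) -> str:
    """Render one detected event as a sentence of commentary."""
    try:
        template = TEMPLATES[event.kind]
    except KeyError:
        raise ValueError(f"no template for event kind {event.kind!r}") from None
    return template.format(actor=event.actor, target=event.target)


if __name__ == "__main__":
    for e in (
        Event("appears", 3),
        Event("moves_on", 3, "southeastern sidewalk"),
        Event("stands_near", 3, "another pedestrian"),
    ):
        print(describe(e))
```

Keeping the events structured and rendering them to text only at the last step is one plausible design for supporting several output languages, since each language would just need its own table of templates.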

This semantic information, generated automatically in real time, is then used by the artificial cognitive system to reason about events and behaviours of interest. Human operators, in turn, receive a more accurate description of what is occurring, and can more easily and quickly retrieve specific scenes from a recording with a simple text-based search. The current version of the system can generate text in six different languages.
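
To illustrate why a text commentary makes recordings easier to search, here is a minimal sketch of keyword retrieval over a timestamped log. The log format (pairs of seconds and sentence) and the matching rule (every query word must appear in the sentence) are assumptions made for the example, not a description of the HERMES search component.

```python
import re
from typing import List, Set, Tuple

LogEntry = Tuple[float, str]  # (seconds into the recording, commentary sentence)


def tokenize(text: str) -> Set[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(log: List[LogEntry], query: str) -> List[LogEntry]:
    """Return the entries whose sentence contains every word of the query."""
    wanted = tokenize(query)
    if not wanted:
        return []
    return [(t, s) for t, s in log if wanted <= tokenize(s)]


if __name__ == "__main__":
    log = [
        (12.0, "Actor 3 appears in the field of view."),
        (15.5, "Actor 3 moves on the southeastern sidewalk."),
        (20.0, "Actor 4 stands near Actor 3."),
    ]
    for timestamp, sentence in search(log, "actor 3 sidewalk"):
        print(f"{timestamp:>6.1f}s  {sentence}")
```

The timestamps returned by such a search are what would let an operator jump straight to the matching scene in the recorded video.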

3D models, automatically and in near real time
Generating semantic information from video in this way also enabled the HERMES researchers to develop another powerful tool as part of the system: a virtual 3D representation of the scene.

"The virtual graphical representation of the footage is generated in near real time and can be displayed alongside the actual video stream. Because it is virtual and 3D it allows operators to look at events from angles they would otherwise be unable to," Bagdanov notes.
The outdoor applications for the system - focused, primarily, on motion and behaviour detection - were tested extensively in Barcelona earlier this year, where cameras attached to the CVC building were used to monitor events in the street outside.

"The system held up better than we expected, though when there are more than 20 people in the scene it starts to break down. This, however, is a problem that can be solved with more cameras and more computer processing power, so the system should scale well," Bagdanov says.

Indoor applications of the system were developed and tested at ETH Zurich in Switzerland and Oxford University in the United Kingdom, both project partners. There, the facial expression recognition component showed the potential for the system to detect different emotions, especially powerful ones such as fear or anger.

Though facial expression detection does have security applications, Bagdanov notes that the technology could also prove useful in research on human-computer interaction, for example, to make communication between humans and robots more natural.

"The HERMES project focused principally on developing technology for security and surveillance, but our research has uses in many other fields, not least human-computer interaction, natural language processing, multimedia communications and semantic annotation and search," the project technical coordinator says.

He notes that several project partners are developing commercial applications based on the work carried out in HERMES, and that one or more spin-off companies are under consideration.
The HERMES project received funding from the ICT strand of the EU's Sixth Framework Programme for research.

Media note: This feature can be republished without charge provided ICT Results is acknowledged as the source at the top or the bottom of the story. You must request permission before you use any of the photographs on the site. If you do republish, we would be grateful if you could link back to the ICT Results site. Let us know if you republish so as to help us provide you with a better service. If you want further contact information on any of the projects cited in this story please contact us.


Contact Author

Christian Nielsen
ICT Results
+32 2 639 02 77