Video Surveillance

Video Surveillance – Can Video Images Really Spot Suspects?

We might imagine that identifying suspected terrorists, or alleged criminals, in a busy shopping centre or railway station is a simple task in the modern surveillance age.

Not so, reports the National Institute of Standards and Technology (NIST), after running a wide-ranging test known as the Face in Video Evaluation (FIVE).

NIST ran the FIVE test to help developers of video facial recognition technology improve identification.

They reported that the accuracy of video facial recognition depends on good algorithms, a dedicated design, a multidisciplinary team of experts, image databases limited in size, and field tests to improve the technology.

FIVE ran 36 prototype algorithms from 16 commercial suppliers over 109 hours of video in varied settings, including footage of people in hats, looking at smartphones, or just looking away from the camera. Some faces were blocked by other, taller people and some were badly lit.

The study matched faces from the videos to photographic databases of some 48,000 people. Because people in the videos were often not facing the camera, the technology had to cope with big changes in the appearance of their faces.

The report concluded that, even with the more accurate algorithms, faces may be identified anywhere from around 60% to more than 99% of the time, depending on the video or image quality and the algorithm’s ability to cope with different scenarios.

Basically, lower-quality faces in video, such as those that are small, badly lit or not facing forward, make identification difficult, whereas when NIST compared facial photographs against a database of millions of portrait pictures, accuracy rates could reach 99%.

The report said that accuracy in video-based facial recognition can be improved, and the NIST guidance aims to help people working with this technology, from system designers and algorithm developers to policymakers using the systems.

What is needed is accuracy, a limited gallery for comparisons, and high-quality imagery. A system that uses video algorithms to control entry to a secure site, for example, might populate the gallery with only the appropriate people and use good still photographs for matching to achieve the best results.

What is also needed is a multidisciplinary team of experts to design systems that capture high-quality video images, with videographers identifying the best lighting and most effective camera positions.
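One way to see why a limited gallery helps is a rough back-of-the-envelope calculation. The sketch below is an illustration only and is not taken from the NIST report: it assumes each comparison between a video face and a gallery photograph is independent and has the same small chance of a false match. The function name and the per-comparison rate of 1 in 100,000 are hypothetical; the gallery size of 48,000 is the figure quoted earlier.

```python
def false_match_probability(fmr, gallery_size):
    """Chance of at least one false match when one face is compared
    against every gallery entry.

    Assumes independent comparisons that all share the same false match
    rate `fmr` (a simplification made for this illustration).
    """
    return 1.0 - (1.0 - fmr) ** gallery_size


if __name__ == "__main__":
    fmr = 1e-5  # hypothetical per-comparison false match rate
    for size in (100, 48_000):
        p = false_match_probability(fmr, size)
        print(f"gallery of {size}: {p:.3f} chance of a false match")
```

Under these assumptions the chance of a wrong identification grows steadily with the number of people in the gallery, which is consistent with the advice to keep the gallery limited to those who need to be in it.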

Work at Carnegie Mellon University, Pittsburgh, also shows that recognising a face in a crowd, or any small or distant object within a large image, is a big challenge for computer vision systems.

Its researchers say that the solution to finding small objects is to look for bigger ones associated with them. Deva Ramanan, associate professor of robotics, and Peiyun Hu, a PhD student in robotics, demonstrated a significant advance in detecting tiny faces and reducing errors, with their “foveal descriptors” system finding 81% of faces, compared with 29% to 64% for earlier methods.

“It’s like spotting a toothpick in someone’s hand,” Ramanan said. “The toothpick is easier to see when you have hints that someone might be using a toothpick. The orientation of the fingers and the motion and position of the hand are major clues.”

Finding a face only a few pixels in size, Ramanan said, is helped by using “foveal descriptors” to encode context in a way similar to how human vision is structured. The centre of the human field of vision focuses on the retina’s fovea, where visual sharpness is highest, providing sharp detail for a small part of the image, with the surrounding area more of a blur.

By blurring the peripheral image, the foveal descriptor provides enough context to be helpful in understanding the part shown in high focus. Simply increasing the resolution of an image may not help in finding tiny objects, as it creates a Where’s Waldo problem: the pixels of the objects are lost in a sea of pixels.
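The idea of a sharp centre with a blurred surround can be sketched in a few lines of Python. This is a toy illustration of the principle described above, not the Carnegie Mellon researchers' actual method: the function names, the patch sizes and the use of simple block averaging as the blur are all assumptions made for the example. It assumes a greyscale image held in a 2-D NumPy array.

```python
import numpy as np


def crop(image, cy, cx, half):
    """Cut a (2*half) x (2*half) window centred on (cy, cx), zero-padding at the edges."""
    padded = np.pad(image, half, mode="constant")
    # After padding by `half`, original pixel (cy, cx) sits at (cy + half, cx + half),
    # so this slice covers original rows cy-half .. cy+half-1 (and likewise for columns).
    return padded[cy:cy + 2 * half, cx:cx + 2 * half]


def block_mean(patch, factor):
    """Blur and shrink a patch by averaging non-overlapping factor x factor blocks."""
    h, w = patch.shape
    return patch.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


def foveal_descriptor(image, cy, cx, fovea_half=4, context_factor=4):
    """Sharp pixels near (cy, cx) plus a coarse, blurred view of the wider surround."""
    sharp = crop(image, cy, cx, fovea_half)                     # full detail, small area
    context = crop(image, cy, cx, fovea_half * context_factor)  # wider area
    blurred = block_mean(context, context_factor)               # same area, low detail
    return np.concatenate([sharp.ravel(), blurred.ravel()])


if __name__ == "__main__":
    image = np.random.default_rng(0).random((64, 64))
    print(foveal_descriptor(image, 32, 32).shape)
```

With these default sizes the descriptor holds an 8 x 8 sharp patch and an 8 x 8 summary of the surrounding 32 x 32 region, so the context is kept without storing every one of its pixels.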

While neither the NIST evaluation nor the Carnegie Mellon study looked at detecting signs of stress in someone in a crowd, Chinese scientists have previously claimed to have created a camera that could spot a stressed person in a crowd.

Researchers at China’s Southwest University in Chongqing used hyperspectral imaging to detect the amount of oxygen in blood and calculate stress levels. They say that they can spot both people whose high oxygen levels are due to physical exertion and those whose levels are caused by stress.

Chen Tong, an associate professor of electronic information engineering, said that a system like that could have helped to prevent the Kunming massacre, in which 29 people were murdered and 130 injured in stabbings at a train station in south-west China.

He said that the attackers’ level of mental stress must have been extremely high before they launched their attacks, and told the South China Morning Post: “The higher the mental stress, the higher the blood oxygenation. Our technology can detect such people, so law enforcement officers can take precautions and prevent these tragedies.”

What is clear, however, is that although technology is advancing at a rapid rate, video surveillance has a long way to go in protecting the public from attack. People are still the best weapon in our defensive armoury, which is why the Metropolitan Police ask members of the public for “continued vigilance and if you see anything that causes you concern or raises your suspicions do not hesitate to call us – 0800 789 321 – or in an emergency 999.”