A robust Video Analytics system must combine advanced intelligence with the ability to operate accurately and reliably in varied environments. This does not happen by accident: the strength of the iOmniscient systems comes from the effort put into their design. The design objective is to be best-in-class for each capability. Some of the key features that have resulted are described here.
Many camera vendors survive by creating proprietary interfaces to their systems. Once customers have bought such a system, they are locked in: they can never purchase another brand, no matter how superior it is, because it will not interface with their existing system.
The system must not be proprietary to any camera manufacturer. It must be able to interface with any camera - analog or IP (IP stands for internet protocol and refers to cameras that can be addressed directly over an IP network based on the camera's IP address). iOmniscient has built a Universal Connectivity Module that allows it to interface with almost any camera with little effort. Such a capability is important as the user may change camera types and brands over the years and the system needs to be able to cope with such change.
Users need to be wary about being locked into one supplier for the life of their system.
To address this issue, the industry has been attempting to establish standards for interfaces between suppliers. Standards for camera interfaces and for receiving video are now maturing. Standards for Video Analytics are still at a very early stage in their development. This is because there is such a big difference in the capability of different suppliers that it is difficult to establish a standard. A standard set based on the most advanced technologies is unacceptable to those who provide technology with very limited intelligence. On the other hand, setting a standard based on the capabilities of the majority of suppliers establishes a very low benchmark and there is effectively no standard for the more advanced capabilities.
The iOmniscient system complies with all standards that exist, the prevalent one being ONVIF. It can receive video in the specified formats and can send alarm information in the required format (at least for those limited types of situations in which a standard is defined).
iOmniscient has also built a Universal Connectivity Module that allows it to receive video input even from cameras that do not comply with the standard, as long as they use certain basic video streaming protocols. It also sends out all alarm information and metadata in an extended form of the existing standard, to ensure that this is easy for other systems to receive.
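As an illustration of the idea, an alarm message in such an open, extended format might look like the sketch below. The field names here are hypothetical examples, not iOmniscient's or ONVIF's actual schema; the point is simply that a self-describing format is easy for any receiving system to parse.

```python
import json

# Hypothetical alarm/metadata message; field names are illustrative only,
# not the actual ONVIF or iOmniscient schema.
alarm = {
    "camera_id": "cam-017",
    "event_type": "intrusion",
    "timestamp": "2024-05-01T14:32:07Z",
    "bounding_box": {"x": 120, "y": 80, "w": 60, "h": 140},  # pixel region of the detection
    "confidence": 0.92,
}

message = json.dumps(alarm)      # serialized form sent out over the network
received = json.loads(message)   # any standards-aware system can parse it back
print(received["event_type"])    # → intrusion
```

A receiving VMS that does not understand an extended field can simply ignore it, which is what makes this style of interface forward-compatible.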
The module has been developed in line with iOmniscient's commitment to openness. The company believes that the customer should always have a choice and they should always be able to purchase products on merit without being forced into ongoing bad decisions because a vendor has succeeded in locking them into a particular product.
In a complex environment, a camera may be required to perform different functions at different times or indeed to be set up with different configurations at different times. For instance, from 9am till noon, the system may be used for counting but from then on, except on public holidays, it may be required for intrusion detection. The ability to schedule different functions in a robust but sophisticated manner is critical for the effectiveness of a good system.
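The scheduling logic described above can be sketched in a few lines. The holiday list and function names below are illustrative assumptions, not iOmniscient's actual configuration format.

```python
from datetime import datetime, time, date

# Hypothetical public-holiday list for the schedule; illustrative only.
PUBLIC_HOLIDAYS = {date(2024, 12, 25)}

def active_function(now: datetime) -> str:
    """Return which analytics function the camera should run at 'now'."""
    if time(9, 0) <= now.time() < time(12, 0):
        return "counting"                      # 9am till noon: counting
    if now.time() >= time(12, 0) and now.date() not in PUBLIC_HOLIDAYS:
        return "intrusion_detection"           # afternoon, except public holidays
    return "idle"

print(active_function(datetime(2024, 5, 2, 10, 0)))    # → counting
print(active_function(datetime(2024, 5, 2, 15, 0)))    # → intrusion_detection
print(active_function(datetime(2024, 12, 25, 15, 0)))  # → idle (public holiday)
```

A production scheduler would of course read such rules from configuration rather than hard-coding them, but the time-of-day and calendar checks are the essence of the capability.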
Images from a video are invariably a 2 dimensional representation of a 3 dimensional space. Humans can understand perspective because they see everything around them in 3D with two eyes. Systems have to be sufficiently intelligent to understand perspective, e.g., to know that an object of a given size looks much smaller in the distance than it does in the foreground of the image.
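The geometry behind this is the simple pinhole-camera relation: an object's apparent size in pixels is inversely proportional to its distance from the camera. A toy sketch, where the focal length in pixels is an assumed calibration value:

```python
# Pinhole-camera relation: apparent size shrinks with distance.
FOCAL_PX = 1000          # assumed focal length expressed in pixels
PERSON_HEIGHT_M = 1.7    # real-world height of an average person

def pixel_height(distance_m: float) -> float:
    """Apparent height in pixels of a person at the given distance."""
    return FOCAL_PX * PERSON_HEIGHT_M / distance_m

print(pixel_height(5))    # → 340.0 (foreground)
print(pixel_height(50))   # → 34.0  (ten times farther, ten times smaller)
```

A perspective-aware analytics system uses exactly this kind of relation, derived from calibration, to decide whether a 34-pixel blob in the distance is really person-sized.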
Normally, if an event occurs, it should be brought to the attention of the operator, who can decide whether to archive that particular footage for later review. If no operator is available, it should be possible to set the system up to automatically archive any event footage. The amount of time that elapses before the system decides that no human is available to intervene must be configurable by the user.
The users of a system may not always be close to events or even near the system used to monitor and analyze them. They may or may not have the resources to manage and maintain their systems locally. For such situations, the iOmniscient system has been designed to allow implementation, diagnostics and maintenance to be performed remotely. The remote access capability can be used for more than diagnosing problems; it can also be used for configuration when the system is being implemented.
In many situations, it may not be possible to monitor the system continuously. There may be insufficient staff to run a command and control center. To ensure that the people in charge receive information on events immediately, all iOmniscient systems come with Mobile Client systems that operate on Android-based smartphones. Operators, supervisors and senior management can monitor events and manage their system even when they are not inside a control room.
Location Based Knowledge
In a busy city, it is not sufficient to be advised when an incident occurs. One also needs to know where it is occurring and what resources are available to address it. For this reason, a good system knows the GPS location of all cameras. Beyond this, it is aware of the GPS locations of the police and emergency vehicles nearest to the incident. If there is a fire, the system knows the location of the nearest fire station and of every fire engine. If there is an accident, it knows the location of the nearest ambulance.
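Finding the nearest resource reduces to a distance calculation over GPS coordinates. A minimal sketch using the standard haversine formula (the vehicle positions are made-up illustrative data):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two GPS coordinates."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))  # 6371 km: mean Earth radius

# Illustrative vehicle positions (lat, lon); not real data.
ambulances = {
    "unit_a": (51.5074, -0.1278),
    "unit_b": (51.5155, -0.0922),
}
incident = (51.5101, -0.1340)   # GPS position of the camera that raised the alarm

nearest = min(ambulances, key=lambda u: haversine_km(*ambulances[u], *incident))
print(nearest)   # → unit_a
```

A real dispatch system would query live vehicle positions and factor in road travel time, but the core idea is this lookup keyed on the camera's known GPS location.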
In certain jurisdictions, various emergency authorities may not want to initiate an automated response but, at least, the system can provide vital information to the person co-ordinating the response.
So far, we have only talked about intelligence as it relates to the images from a particular camera.
The next generation of video management goes beyond the analysis of single images. It involves using the information from multiple cameras to provide the whole network with intelligence.
The best way to visualize Network Intelligence is with an example. Consider a large theme park. Such a venue has to manage long queues for its various rides and activities. Some of these queues are extremely long. One could, for example, start at the entrance of a building, wind through underground corridors and emerge at a different building. No single camera can see every part of the queue. There are often many entrances to the queue, and there may be many points where people leave it, possibly out of frustration.
In order to manage these queues, management requires information like the average waiting time for a person who enters the queue. They also need to know the length of the queue or where the queue ends, especially when all parts of the queue are not visible. Some of this information may also be made visible to the public to help them understand how long they may have to wait.
This is a very good example of an application that requires intelligence that goes beyond the use of a single camera. This type of application requires cameras to be placed strategically at all the entrance and exit points for the queue. Every camera is then used to count the number of people that pass that point. The cameras are also used to determine where the queue ends. The information from all the cameras is pooled together to provide the information that the park management requires. This allows management to open up more service points and to put up electronic signage advising customers of the expected waiting time for their ride.
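The arithmetic that pools these counts is straightforward. In the sketch below the per-camera figures are invented for illustration: entrance and exit counts give the queue's rate of change, while the running queue length divided by the service rate gives an expected waiting time in the style of Little's law.

```python
# Hypothetical per-camera counts over the last minute; entrance cameras count
# people joining, exit cameras count people leaving (served or abandoning).
entries_per_min = [12, 8, 5]      # one figure per entrance camera
exits_per_min = [18, 4]           # service point plus an abandonment point

people_in_queue = 350             # running total maintained from the same counts
service_rate = 18                 # people served per minute at the ride itself

net_change = sum(entries_per_min) - sum(exits_per_min)   # positive: queue growing
expected_wait_min = people_in_queue / service_rate       # Little's-law style estimate

print(f"queue change: {net_change:+d}/min, expected wait ≈ {expected_wait_min:.0f} min")
```

It is this pooled figure, not any single camera's view, that can be shown on electronic signage or used to decide when to open more service points.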
Traditional suppliers of Video Analysis systems are still focused on analyzing the information from a single camera. Network Intelligence is only available from the most sophisticated providers of Video Analysis.
All analytics (video or otherwise) use explicit or implicit rules to determine whether certain incidents have occurred. For instance, the system can raise an alarm if a person falls down or if a car exceeds the speed limit.
In real life, many incidents may occur in some combination and the way in which they are combined can provide more information on the situation. A person falling down may have slipped. However, if a gunshot is heard at the same time, it is possible that the person has been shot and a different type of response may be appropriate.
This ability to combine rules with the AND and OR conjunctions of Boolean logic can provide greater insight into a situation and can help the stakeholder mount a more appropriate response.
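The gunshot example above can be expressed directly as Boolean combinations of primitive events. The event flags, rule names and responses below are illustrative assumptions, not an actual rule language.

```python
# Hypothetical event flags from the video and audio analytics for one time window.
events = {"person_fell": True, "gunshot_heard": True, "car_speeding": False}

# Each rule combines primitive events with AND / OR and maps to a response.
rules = [
    ("possible_shooting", lambda e: e["person_fell"] and e["gunshot_heard"], "dispatch police"),
    ("medical_incident",  lambda e: e["person_fell"] and not e["gunshot_heard"], "dispatch ambulance"),
    ("traffic_violation", lambda e: e["car_speeding"], "log and fine"),
]

triggered = [(name, response) for name, cond, response in rules if cond(events)]
print(triggered)   # → [('possible_shooting', 'dispatch police')]
```

Note how the same primitive event, a person falling, produces a different response depending on what it is combined with.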
User Friendly Displays
The information generated by intelligent systems ultimately needs to be communicated to humans. Humans are known to have short attention spans and a limited ability to pull out key information from huge masses of data. Therefore, it is important that information is presented to human operators in a manner that is easy for them to absorb and use.
This is the primary challenge for VMS and Command and Control systems. Showing the video image from different cameras is the trivial part of such systems' capability. For a Video Management System (VMS) or Physical Security Information Management (PSIM) system to present information effectively, it must be able to accept information about the events that occur and then present it. This means that the system must have a sophisticated interface for accepting such information from an advanced analytics system.
Unfortunately, many systems, while quite capable of showing the video that comes from cameras, have little ability to show what is actually happening in it. Often, they provide only text messages about alarms. Only those systems that have been fully integrated with an advanced Video Analytics system can provide all the information that is available from it.
Many existing VMS systems maintain a proprietary interface limiting their own ability to interface with advanced Video Analytics systems. The analogy would be two people speaking in different languages. Let us assume that A speaks Japanese and B speaks English. If A says something to B in Japanese, B can look it up in a dictionary and translate the words to English to understand him. However, there may be some words that just cannot be translated because the concepts do not occur in English. This would mean that B cannot understand that particular concept precisely.
Similarly, if the designers of the VMS system have not understood a particular Video Analytics concept, their product would have no ability to display the appropriate information. This creates a dilemma. The Video Analytics system can provide increasingly advanced intelligence but this is of little use if it cannot be displayed and communicated to the user by the VMS system.
iOmniscient, with its commitment to openness, provides ALL metadata about ALL events in a simple-to-use, standard format. Unfortunately, this is not usable by those VMS systems that have proprietary interfaces.
To ensure that all information can be effectively displayed, iOmniscient does offer its own VMS and Command and Control system which have been specifically designed to understand and display ALL the intelligent information that is available from the Video Analytics system.
To provide the operator with context, the display can also be integrated into drawings, plans or maps of the site or into Geographical Information Systems (GIS) that have other information available about the environment. The icons for the sensors can be embedded on the maps or images. These icons are dynamic and they can indicate if there is an alarm on that particular camera or if it is not working effectively.
As a further advance on this concept, some vendors have developed a 3D rendering capability of the type used in video games. Essentially, an artist is required to convert a 2D image into a 3D simulation of the environment. This can be quite effective but very expensive to implement for a large network of cameras.
The ultimate requirement of a good display system is to be able to extract all the important information that is available from the analytics system and to display it in a way that is meaningful for the user.
Optimized Storage and Networking
Storage and networking are standard components of any surveillance system. Calculating the required amount of storage and the bandwidth of the network is normally fairly straightforward.
However, as described in the section on Automated Intelligent Surveillance, software such as iOmniscient's iQ-Hawk can actually reduce the required amount of storage and network bandwidth.
The savings can be enormous. Storage and networking costs constitute a huge proportion of the total cost of the system, often greater than the cost of the software itself. As a result, a system with iQ-Hawk software can end up costing less than the same system without it. In other words, the cost of the software is much less than the benefits derived from using it.
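Back-of-envelope arithmetic shows why the savings can dominate. The bitrates and event rates below are illustrative assumptions, not vendor figures: the comparison is continuous high-resolution recording versus recording low-resolution continuously and high-resolution only around events.

```python
# Illustrative figures only; actual bitrates and event rates vary by site.
cameras = 100
days = 30
highres_gb_per_day = 40        # continuous high-resolution recording, per camera
lowres_gb_per_day = 4          # continuous low-resolution recording, per camera
event_fraction = 0.02          # share of each day kept in high resolution

continuous = cameras * days * highres_gb_per_day
event_based = cameras * days * (lowres_gb_per_day + event_fraction * highres_gb_per_day)

print(f"continuous: {continuous} GB, event-based: {event_based:.0f} GB")
print(f"saving: {100 * (1 - event_based / continuous):.0f}%")
```

Under these assumed numbers the monthly storage requirement drops from 120 TB to under 15 TB, which is why the hardware savings can exceed the cost of the analytics software.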
As discussed in the final chapter on implementation, this is a key reason for selecting the software before one begins to design the hardware infrastructure.
Hardware Architecture for iQ-Smart City
Any system that is implemented in a busy city must survive and operate over a long period of time. It must be adaptable to evolving needs as the technology and the expectations of its citizens change over the years. The system must therefore have a flexible architecture which protects the city's investment in hardware over the next decade and longer.
The hardware and network architecture for a Smart City can be Centralized, Distributed or a hybrid of the two. Centralized architectures suit applications in which all the information needs to be captured and stored for future retrieval. There are applications, however, in which it does not make sense to store all the data, as doing so would impose a heavy load on the network; situations in which megapixel cameras are used are an example. In such cases it makes sense to store centrally only the high-resolution images from actual incidents, with all other information stored in a low-resolution format.
In these situations, the analysis can be done on a small computing device placed near or even inside the camera, an approach known as computing at the Edge.
Since a city is a very distributed environment, at least a proportion of the surveillance would best be done on Edge devices. These are essentially little computers that sit inside or near the cameras (at the "edge" of the network). The traditional Edge devices that have been available to date just do not have the power to provide all the computing required. Hence, iOmniscient, in partnership with Intel, has built a Super Edge device which has the power to process the analytics for 4 cameras at a time in a rugged device.
The cameras used today will not be the cameras used tomorrow or the day after as the technology evolves. The system must be able to cope with analog or IP cameras (cameras that can be accessed using the internet protocol) without having to change the Edge device.
For that reason, iOmniscient's hardware solution separates the encoding function from the device that performs the analytics. Four miniature encoders, or a single 4-channel encoder, can be used in conjunction with a Super Edge device that handles basic analytics for 4 cameras simultaneously. Over time, if some analog cameras are replaced with IP cameras, the Super Edge does not have to be replaced, as the new cameras can connect directly into it via a switch.
The Super Edge itself has many capabilities beyond the normal Edge devices that are generally available on the market. For instance, today's Super Edge can have up to 500 GB of hard disk embedded in it. It is therefore able to operate independently and to continue operating even if the network connection is lost for a few hours or even for days or months.
Super Edge Device
The Super Edge is a fan-less device and therefore can cope with difficult, dusty environments. With few moving parts, it requires less maintenance.
The Super Edge is very powerful compared with the TI chip based Edge systems available on the market today. Whereas a TI based Edge device can run applications up to iQ-100, the Super Edge can run iQ-Infinity for multiple cameras, as well as applications such as Face Recognition in a Crowd, License Plate Recognition and iQ-Hawk.
For Smart Cities, the ideal architecture is a hybrid distributed environment which ensures that the most appropriate computing platform is used for each application. Some applications run at the Edge whilst others run at a central location, and all run with the same effectiveness. The architecture is transparent to the user. This provides great flexibility when implementing Network Intelligence as opposed to intelligence on single cameras. The Super Edge is the most advanced Edge based device currently available for CCTV.
The right Cameras for each Job
The cameras are the eyes of the system and the software is the brain.
Whilst iOmniscient prides itself on working with any brand of camera that has a standard interface protocol, this does not mean that every camera is suited to every task.
Accurate analysis depends on providing the system with an image of appropriate quality, which in turn requires a camera with the right characteristics. If cameras have to see at night, they may need to work in the infrared or thermal spectrum, and the scene may need to be illuminated with an infrared lamp. If vehicles are moving at high speed, the camera needs a high shutter speed to capture the image clearly. Different camera characteristics are required for applications such as Face Recognition, License Plate Recognition, behavior analysis or counting. Hence, it is important to understand the objective for a camera before selecting it.
There are many good quality cameras available on the market. But few camera companies have tested their cameras for particular Video Analysis applications, and hence they are not able to guarantee that a particular camera is fit for a particular purpose. This is where companies like AnalyticsReady come in. Their cameras are designed and built for a particular application, and they guarantee that their cameras will work in that situation.
Megapixel versus PTZ Cameras
As already noted, PTZ cameras are easily defeated. They also have many moving parts which makes them expensive to maintain. However, if they are already installed, they can be used for Video Analytic systems as long as they are in a fixed position when the Video Analysis is being performed.
However, for new sites, it does not make sense to use PTZ cameras. Today, megapixel cameras can provide the required quality of images at a lower price, and they can be used for Video Analytics. Operators have no ability to watch thousands of cameras, so cameras without intelligence are only useful after an event. To be useful in a pre-emptive manner, they need to be armed with intelligence. Megapixel cameras are far more useful than PTZ cameras for this purpose.
Thermal and Infrared Cameras
Ultimately all cameras require light to see, but the light does not have to be visible to humans. Camera sensors today are sensitive to infrared light and even to heat.
If a camera needs to see at night, it must have a sensor that is either very sensitive to light in order to be able to see even in very low light or it must have a sensor that can see in the infrared range. To see in the infrared spectrum, one usually needs to illuminate the area with infrared light which has a longer wavelength than visible light and which is invisible to humans.
There are several types of chips used in cameras that are sensitive to infrared light. Some are more sensitive than others and it is important, especially for applications, such as License Plate Recognition at night, that cameras with appropriate sensors are used.
Thermal cameras can see energy in the thermal spectrum. In other words, they can see heat. They can tell the difference between warm bodies and inanimate objects. They are particularly useful to see humans in the dark over relatively long distances. However, the image quality is usually insufficient to do any level of recognition on the person and thermal cameras are much more expensive than other types of cameras.
For major public events where huge crowds gather, it may be necessary to have a camera to view the scene from a height. Such events occur sporadically and it is usually not possible to have a fixed camera located at exactly the right place at the right time. To address this need, a hovering camera can be used. Several brands of hovering cameras are available powered by drone helicopter type devices.
However, today's Video Analytics only work on cameras with a fixed view. For the hovering camera to be useful for Video Analytics, it must be able to provide a steady, unmoving view. A hovering camera is available from AnalyticsReady that flies on a gyroscopically controlled drone helicopter. It operates at or below 50m and is tethered to a vehicle on the road below, moving as the vehicle moves. The drone helicopter can carry quite a heavy load, certainly sufficient for a large camera and illuminator. The tether provides two-way communication as well as power.
As emphasized earlier, Video Analysis does require a fixed camera view. The hovering camera's small movements may be sufficient to cause a Video Analysis system to give false readings. The analytics system has to compensate for these small movements to be able to use such a hovering camera effectively.
360 Degree Cameras
As innovation in camera technology continues, new types of cameras appear on the market. One of them is the 360 degree camera which can see in all directions. The older versions of this technology used four cameras pointing in different directions and their images were stitched together. Newer cameras use fish-eye lenses that can see in all directions. However, the lens tends to distort the images. De-warping software is usually supplied by the camera manufacturer to remove the distortion.
As these are fixed cameras, the undistorted images can be used for Video Analysis.
Video requires images. Audio analysis requires sensors that pick up sound. Smell analytics requires sensors that sense the molecules of a gas and register its "smell".
Many cameras today come with a microphone and a loudspeaker. This allows any sound occurring in the vicinity of the camera to be heard and analyzed.
Smell sensors currently operate independently of cameras, but the information they gather can be closely tied to the information collected from cameras and audio sensors.
Just as humans have five senses which can be used in combination, Smart City systems can now have three types of sensor that work together to provide a more comprehensive view of what is happening in an area.
Power from the Sun and Wind
All electronic devices draw power, and it is sometimes very difficult to provide power in remote locations such as on a freeway. Solar and wind powered cameras are now available that can operate independently in such locations. There is no difference in their capability other than that they draw power from a different source. Of course, the power unit must be sized to provide sufficient power for the device to operate.