ChatGPT Meets Video Security: A New Era of Intelligent Surveillance

In this blog, we'll cover how foundation models like GPT-4 enable new use cases in physical security, particularly in security camera systems. We will also discuss the benefits they offer businesses by saving time and surfacing new insights.

Ashesh Jain

Oct 30, 2023

Introduction

1. Understanding Foundation Models

Foundation models have taken the world by storm. One well-known class of them is Large Language Models (LLMs). Notable products built on LLMs include ChatGPT, Bard, and Claude. Foundation models are machine learning models trained on vast amounts of internet data, encompassing text, images, videos, and audio. These models can perform many tasks straight out of the box without customization, such as answering questions, summarizing documents or images, or even generating an image based on a textual description. As they become integrated into various tools and products, an obvious question emerges: How can they elevate security camera systems, or even physical security more broadly?

2. Driving Forces Behind the AI-Driven Security Revolution

We expect a major transformation in how users engage with their security camera systems within the next few years. This shift is largely propelled by two trends:

The transition of physical security to the cloud: Cloud-based security cameras have become a part of every organization's physical security plan. The cloud has democratized access to and management of physical security systems. For instance, a company with multiple sites equipped with numerous cameras can remotely oversee all cameras from a unified interface right from their web browser and leverage AI features.
Innovations in LLMs and computer vision foundation models: Vision foundation models can interpret an image and translate it into human-readable text or answer questions about it, like, "Is smoke or fire detected in this image?" Cloud access makes it practical to use these models, which may be too resource-intensive to run on the camera or NVR, or may be available only through an API.

At Coram AI, we are building a cloud-first video security platform empowered by foundation models. In the following sections, we'll highlight how LLMs and vision foundation models are unlocking new applications in physical security.

3. How Foundation Models Transform Business Video Security

Streamlined Video Search to Expedite Investigations

Camera systems predominantly serve the purpose of incident investigation. For instance, if you're trying to trace a box that was misplaced sometime in the past week, that could mean hours of footage to sift through. Normally, users would scan all instances of "motion" in the designated area over the week, potentially spending 10 to 30 minutes to find the right video clip. But imagine the convenience of just typing, "Display videos of someone lifting a box." Even better, what if you could narrow down this text search to a specific region of interest in the image? With foundation models powering Coram AI, such searches can be completed in a mere 30 seconds.

Presently, most security cameras can only detect predefined categories like "motion," "people," or "vehicles." The most advanced search might involve attributes like "red car." However, foundation models unlock flexible searches. Users can search for "blue Tesla," "individual picking up trash," "open door," or virtually anything else. This can significantly expedite investigations.
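To make the idea of free-text search over footage more concrete, here is a minimal sketch. It is not a description of Coram AI's implementation. It assumes that some vision-language model has already mapped both the text query and each detected clip region into the same vector space; the `ClipRegion` type, the clip IDs, and the three-dimensional vectors are all made up for illustration. Search then reduces to ranking clips by cosine similarity, optionally keeping only those that overlap a region of interest.

```python
import math
from dataclasses import dataclass


@dataclass
class ClipRegion:
    clip_id: str
    box: tuple      # (x1, y1, x2, y2) in normalized image coordinates, 0..1
    embedding: list  # hypothetical vector produced by a vision-language model


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def overlaps(box, roi):
    """True if two axis-aligned boxes share any area."""
    ax1, ay1, ax2, ay2 = box
    bx1, by1, bx2, by2 = roi
    return ax1 < bx2 and bx1 < ax2 and ay1 < by2 and by1 < ay2


def search(query_embedding, index, roi=None, top_k=3):
    """Return the clip IDs most similar to the query, optionally limited to a region."""
    candidates = [r for r in index if roi is None or overlaps(r.box, roi)]
    ranked = sorted(
        candidates,
        key=lambda r: cosine(query_embedding, r.embedding),
        reverse=True,
    )
    return [r.clip_id for r in ranked[:top_k]]


# Toy index: the vectors are invented, not real model output.
INDEX = [
    ClipRegion("cam1_0930", (0.1, 0.1, 0.4, 0.5), [0.9, 0.1, 0.0]),
    ClipRegion("cam1_1415", (0.6, 0.6, 0.9, 0.9), [0.8, 0.2, 0.1]),
    ClipRegion("cam2_1102", (0.2, 0.2, 0.5, 0.6), [0.0, 0.1, 0.9]),
]

# Stand-in for the embedding of the text "someone lifting a box".
QUERY = [1.0, 0.0, 0.0]

if __name__ == "__main__":
    print(search(QUERY, INDEX, top_k=2))
    print(search(QUERY, INDEX, roi=(0.5, 0.5, 1.0, 1.0)))
```

The design point is that nothing in the index is tied to a predefined category: any phrase that the embedding model can represent becomes a valid query.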

Conversational Interface with a Physical Security AI Agent

Many interactions with physical security systems can be streamlined through a chat interface. Instead of endless clicks on a dashboard and self-analysis, the system can directly respond to user queries. Consider the possibility of asking your security system questions like:

How many individuals visited the office between 1 pm and 3 pm yesterday?
Can I view their faces?
When did the last person exit?
Notify me when FedEx delivers a package at the main entrance.
Alert me if a vehicle with License Plate ABC1234 appears.

By integrating vision foundation models with LLMs, such capabilities aren't just a distant dream; they're a reality that's continually improving. At Coram AI, we are building a physical security AI assistant that swiftly responds to such queries, making the system more user-friendly and better suited to mobile use. Our users already use this feature on our platform to search for specifics like "students on skateboards," to find instances when someone is skateboarding in an area where they shouldn't be, or "a person in a blue shirt holding a box."
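One way to picture how a request like "Alert me if a vehicle with License Plate ABC1234 appears" could become something a system can act on is sketched below. This is an illustrative assumption, not Coram AI's design: it presumes an LLM has already translated the sentence into a structured rule, and that a vision model emits detections with a label and attributes. The `AlertRule` and `Detection` types and the camera names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Detection:
    camera: str
    label: str                                      # e.g. "vehicle" or "person"
    attributes: dict = field(default_factory=dict)  # e.g. {"license_plate": "..."}


@dataclass
class AlertRule:
    label: str
    attributes: dict = field(default_factory=dict)
    camera: Optional[str] = None                    # None means "any camera"


def matches(rule, detection):
    """True if the detection satisfies every condition in the rule."""
    if rule.label != detection.label:
        return False
    if rule.camera is not None and rule.camera != detection.camera:
        return False
    return all(
        detection.attributes.get(key) == value
        for key, value in rule.attributes.items()
    )


# Structured form that an LLM is assumed to have produced from the user's sentence.
RULE = AlertRule(label="vehicle", attributes={"license_plate": "ABC1234"})

# Made-up detections standing in for vision model output.
EVENTS = [
    Detection("main_entrance", "vehicle", {"license_plate": "XYZ9876"}),
    Detection("parking", "vehicle", {"license_plate": "ABC1234"}),
    Detection("parking", "person"),
]

if __name__ == "__main__":
    for event in EVENTS:
        if matches(RULE, event):
            print(f"ALERT: {event.label} on camera {event.camera}")
```

Splitting the problem this way keeps the language model responsible only for understanding the request, while the matching itself stays simple and auditable.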

4. The Road Ahead: Expanding AI's Reach in Physical Security

While our initial efforts centered on applying foundation models to security cameras, our ultimate vision encompasses all physical security devices. This includes access controls, alarms, speakers, and environmental sensors. We aim to ensure interoperability among these devices, allowing users to pose open-ended queries and receive prompt responses. For instance, a question like "Show images of individuals 10 minutes prior to detecting vape smoke on the first floor" requires the system to correlate camera footage with environmental sensor data.
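The vape smoke example boils down to joining two event streams on location and time. The sketch below shows that join under simple assumptions: sensor alerts and person sightings are already available as timestamped records, and they share a location label. The `SensorEvent` and `PersonSighting` types, the image IDs, and the timestamps are hypothetical and not taken from any real product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SensorEvent:
    kind: str        # e.g. "vape_smoke"
    location: str    # e.g. "floor_1"
    at: datetime


@dataclass
class PersonSighting:
    image_id: str
    location: str
    at: datetime


def sightings_before(event, sightings, window=timedelta(minutes=10)):
    """Image IDs of people seen at the event's location within `window` before it."""
    start = event.at - window
    return [
        s.image_id
        for s in sightings
        if s.location == event.location and start <= s.at <= event.at
    ]


# Made-up data for illustration.
VAPE_ALERT = SensorEvent("vape_smoke", "floor_1", datetime(2023, 10, 30, 14, 30))

SIGHTINGS = [
    PersonSighting("img_001", "floor_1", datetime(2023, 10, 30, 14, 18)),  # too early
    PersonSighting("img_002", "floor_1", datetime(2023, 10, 30, 14, 24)),
    PersonSighting("img_003", "floor_2", datetime(2023, 10, 30, 14, 26)),  # wrong floor
    PersonSighting("img_004", "floor_1", datetime(2023, 10, 30, 14, 29)),
    PersonSighting("img_005", "floor_1", datetime(2023, 10, 30, 14, 35)),  # after alert
]

if __name__ == "__main__":
    print(sightings_before(VAPE_ALERT, SIGHTINGS))
```

The join only works if cameras and sensors report comparable timestamps and location labels, which is one practical reason interoperability between devices matters.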

5. The Importance of an Open Physical Security Platform

Embracing an open-platform, hardware-neutral strategy helps ensure that every organization can leverage these innovations. Ideally, AI functionalities should be compatible with any IP cameras, access control devices, or environmental sensors. This is the direction Coram AI is pursuing. Our cloud-based NVR can work with any IP camera and supports access control and environmental sensors from various vendors. We're devoted to creating a state-of-the-art, foundation model-integrated open physical security platform, so that a wide array of organizations can benefit without being tethered to a specific vendor or hardware.

6. Choosing the Right Security Camera System for Your Needs

To best harness the imminent AI innovations, decision-makers should:

Decouple AI processing from the camera: AI models' computational needs are evolving swiftly. You maximize flexibility by situating all computations within a cloud-based NVR rather than a proprietary cloud camera. This allows you to select security vendors boasting the latest AI features without the hassle of camera replacement. If there's an AI chip upgrade, updating a cloud NVR is a breeze and doesn't necessitate camera changes, which can be quite expensive.
Ask about the vendor's internet bandwidth requirements: LLMs can run in the cloud because they require minimal data transmission, but the true potential is unlocked by computer vision foundation models. These models call for local, on-prem processing of the video feed, since continuously transmitting video to the cloud is bandwidth-intensive. Consequently, it's crucial for the cloud-based NVR to have the computational capacity to run these vision foundation models. Integrating this level of computation within closed cloud-connected cameras poses significant challenges.
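A quick back-of-the-envelope calculation shows why continuous video upload is a concern. The figures below (50 cameras, 2 Mbps per stream) are illustrative assumptions, not measurements from any particular deployment; substitute the numbers for your own site.

```python
def uplink_mbps(num_cameras, bitrate_mbps_per_camera):
    """Sustained upload bandwidth needed to stream every camera to the cloud."""
    return num_cameras * bitrate_mbps_per_camera


def daily_upload_gb(total_mbps):
    """Convert a sustained rate in megabits/s to decimal gigabytes per day."""
    seconds_per_day = 86_400
    megabits_per_day = total_mbps * seconds_per_day
    megabytes_per_day = megabits_per_day / 8
    return megabytes_per_day / 1000


if __name__ == "__main__":
    # Assumed example site: 50 cameras, each streaming at 2 Mbps.
    total = uplink_mbps(50, 2)
    print(f"Sustained uplink: {total} Mbps")
    print(f"Daily upload: {daily_upload_gb(total)} GB")
```

Under these assumptions the site would need a sustained 100 Mbps uplink and would upload roughly a terabyte per day, which is why processing video on-prem and sending only text, metadata, or selected clips to the cloud is attractive.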

Conclusion

Foundation models will enable new ways for users to interact with their physical security systems. They will allow users to get the exact information they need from a simple conversational interface, reducing the time they spend on the system. The security camera system will have a much deeper understanding of the scene it is capturing. To fully leverage these benefits, customers should choose a physical security architecture that is open and hardware-neutral and that moves AI computing out of the camera and into a cloud-based NVR.