In this post, we'll cover how foundation models like GPT-4 enable new use cases in physical security, particularly in security camera systems. We will also discuss the benefits they offer businesses: saving time and surfacing new insights.
Foundation models have taken the world by storm. Their best-known incarnation is the Large Language Model (LLM), and notable products built on LLMs include ChatGPT, Bard, and Claude. Foundation models are machine learning models trained on vast amounts of internet data, encompassing text, images, videos, and audio. These models can perform many tasks straight out of the box without customization, such as answering questions, summarizing documents or images, or even generating an image from a textual description. As they become integrated into various tools and products, an obvious question emerges: How can they elevate security camera systems, or even physical security more broadly?
We expect a major transformation in how users engage with their security camera systems within the next few years. This shift is largely propelled by two trends:
At Coram AI, we are building a cloud-first video security platform powered by foundation models. In the following sections, we'll highlight how LLMs and vision foundation models are unlocking new applications in physical security.
Streamlined Video Search to Expedite Investigations
Camera systems are used predominantly for incident investigation. For instance, tracing a box that was misplaced sometime in the past week could mean sifting through hours of footage. Normally, users would scan every instance of "motion" in the relevant area over that week, potentially spending 10 to 30 minutes to find the right video clip. But imagine the convenience of simply querying, "Display videos of someone lifting a box." Even better, what if you could narrow this text search to a specific region of interest in the image? With foundation models powering Coram AI, such searches can be completed in a mere 30 seconds.
Presently, most security cameras can only detect predefined categories like "motion," "people," or "vehicles." The most advanced search might involve attributes like "red car." Foundation models, by contrast, unlock flexible searches. Users can search for "blue Tesla," "individual picking up trash," "open door," or virtually anything else. This can significantly expedite investigations.
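To make the idea of free-text search with a region of interest more concrete, here is a minimal sketch. It is not Coram AI's implementation. It assumes that each clip has already been summarized as an embedding vector plus a bounding box for where the activity occurred, and it swaps in a toy word-count `embed` function where a real system would use a vision-language foundation model that maps text and video frames into a shared space. All names here (`Clip`, `embed`, `search`, the clip IDs) are hypothetical.

```python
import math
from dataclasses import dataclass

# Toy vocabulary standing in for a learned embedding space (assumption).
VOCAB = ["person", "box", "lifting", "car", "blue", "door", "open"]

def embed(text: str) -> list[float]:
    """Toy stand-in for a vision-language encoder: counts vocabulary words."""
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

@dataclass
class Clip:
    clip_id: str
    embedding: list[float]  # a real system would compute this from frames
    box: tuple[float, float, float, float]  # (x1, y1, x2, y2), normalized 0..1

def overlaps(a, b) -> bool:
    """True if two (x1, y1, x2, y2) rectangles share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def search(query: str, clips: list[Clip], roi=None, top_k: int = 3) -> list[str]:
    """Rank clips by similarity to the query, optionally limited to a region."""
    q = embed(query)
    candidates = [c for c in clips if roi is None or overlaps(c.box, roi)]
    scored = [(cosine(q, c.embedding), c.clip_id) for c in candidates]
    scored = [s for s in scored if s[0] > 0.0]  # drop clips with no match
    scored.sort(key=lambda s: s[0], reverse=True)
    return [clip_id for _, clip_id in scored[:top_k]]

# Hypothetical indexed clips; the captions only feed the toy encoder.
clips = [
    Clip("cam1-0930", embed("person lifting box"), (0.1, 0.1, 0.4, 0.5)),
    Clip("cam1-1015", embed("blue car"), (0.5, 0.5, 0.9, 0.9)),
    Clip("cam2-1100", embed("person lifting box"), (0.6, 0.6, 0.9, 0.9)),
]

# Search only the top-left quarter of the image.
print(search("someone lifting a box", clips, roi=(0.0, 0.0, 0.5, 0.5)))
```

The design point is that the query is compared against embeddings rather than a fixed list of categories, so new phrases need no new detector; the region filter simply narrows the candidate set before ranking.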
Conversational Interface with a Physical Security AI Agent
Many interactions with physical security systems can be streamlined through a chat interface. Instead of requiring endless clicks on a dashboard followed by manual analysis, the system can respond directly to user queries. Consider the possibility of asking your security system questions like:
By integrating vision foundation models with LLMs, such capabilities are no longer a distant dream; they are a reality that is continually improving. At Coram AI, we are building a physical security AI assistant that responds swiftly to such queries, making the system more user-friendly and better suited to mobile use. Our users already use this feature on our platform to search for specifics like "a person in a blue shirt holding a box" or "students on skateboards," the latter to find instances when someone is skateboarding in an area where they shouldn't be.
While our initial efforts centered on using foundation models for camera security, our ultimate vision encompasses all physical security devices. This includes access controls, alarms, speakers, and environmental sensors. We aim to ensure interoperability among these devices, allowing users to pose open-ended queries and receive prompt responses. For instance, a question like "Show images of individuals 10 minutes prior to detecting vape smoke on the first floor" requires the system to correlate camera footage with environmental sensor data.
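The vape-smoke query above boils down to a time-window join between two device streams. The sketch below illustrates that correlation step only, assuming that sensor alerts and camera detections have already been normalized into simple records with a shared location label and timestamp. The record types and the `detections_before` helper are hypothetical and do not describe Coram AI's actual data model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class SensorEvent:
    kind: str        # e.g. "vape_smoke"
    location: str    # e.g. "first_floor"
    time: datetime

@dataclass
class Detection:
    camera_id: str
    location: str
    label: str       # e.g. "person"
    time: datetime

def detections_before(event, detections, window=timedelta(minutes=10), label="person"):
    """Return matching detections at the event's location within the window
    leading up to the event, ordered oldest first."""
    start = event.time - window
    matches = [
        d for d in detections
        if d.label == label
        and d.location == event.location
        and start <= d.time <= event.time
    ]
    return sorted(matches, key=lambda d: d.time)

# Hypothetical data: one sensor alert and a handful of camera detections.
alert = SensorEvent("vape_smoke", "first_floor", datetime(2024, 1, 8, 10, 0))
detections = [
    Detection("cam-3", "first_floor", "person", datetime(2024, 1, 8, 9, 45)),   # too early
    Detection("cam-3", "first_floor", "person", datetime(2024, 1, 8, 9, 52)),   # in window
    Detection("cam-7", "second_floor", "person", datetime(2024, 1, 8, 9, 58)),  # wrong floor
    Detection("cam-3", "first_floor", "vehicle", datetime(2024, 1, 8, 9, 59)),  # wrong label
]

for d in detections_before(alert, detections):
    print(d.camera_id, d.time.isoformat())
```

In a conversational system, an LLM would be responsible for turning the user's question into the parameters of a call like this (event kind, location, window length); the join itself stays simple as long as every device reports into a common schema, which is one practical argument for interoperability.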
An open-platform, hardware-neutral strategy lets every organization leverage these innovations. Ideally, AI functionality should be compatible with any IP camera, access control device, or environmental sensor. This is the direction Coram AI is pursuing. Our cloud-based NVR works with any IP camera and supports access control and environmental sensors from various vendors. We're devoted to creating a state-of-the-art open physical security platform with foundation models built in, so that a wide array of organizations can benefit without being tethered to a specific vendor or hardware.
To best harness the imminent AI innovations, decision-makers should:
Foundation models will enable new ways for users to interact with their physical security systems. They will allow users to get the exact information they need from a simple conversational interface, reducing the time they spend in the system. The security camera system will also have a much deeper understanding of the scene it is capturing. To fully leverage these benefits, customers should choose a physical security architecture that is open and hardware-neutral and that moves AI computing off the camera and into a cloud-based NVR.