Today, during CEO Jensen Huang’s CES 2025 opening keynote, Nvidia launched a blueprint for AI agents that can analyze video.
The new Nvidia AI blueprint powered by Metropolis lets organizations and individuals increase productivity and safety, and could even help Nvidia’s CEO improve his fastball pitch.
The next big moment in AI is in sight — literally.
Today, more than 1.5 billion enterprise-level cameras deployed worldwide are generating roughly 7 trillion hours of video per year. Yet, only a fraction of it gets analyzed.
It’s estimated that less than 1% of video from industrial cameras is watched live by humans, meaning critical operational incidents can go largely unnoticed.
This comes at a high cost. For example, manufacturers are losing trillions of dollars annually to poor product quality or defects that they could have spotted earlier, or even predicted, by using AI agents that can perceive, analyze and help humans take action.
Interactive AI agents with built-in visual perception capabilities can serve as always-on video analysts, helping factories run more efficiently, bolster worker safety, keep operations running smoothly and even up an athlete’s game.
To accelerate the creation of such agents, Nvidia today announced early access to a new version of the Nvidia AI blueprint for video search and summarization. Built on top of the Nvidia Metropolis platform — and now supercharged by Nvidia Cosmos Nemotron vision language models (VLMs), Nvidia Llama Nemotron large language models (LLMs) and Nvidia NeMo Retriever — the blueprint provides developers with the tools to build and deploy AI agents that can analyze large quantities of video and image content.
The blueprint integrates the Nvidia AI Enterprise software platform — which includes Nvidia NIM microservices for VLMs, LLMs and advanced AI frameworks for retrieval-augmented generation — to enable batch video processing that’s 30 times faster than watching it in real time.
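To put the 30x figure in perspective, the arithmetic can be sketched in a few lines of Python. The function name and the 24-hour sample input are illustrative assumptions, not part of the blueprint. Only the 30x speedup comes from the announcement, and actual throughput would presumably depend on hardware and model choice.

```python
def batch_processing_hours(video_hours: float, speedup: float = 30.0) -> float:
    """Hours needed to process footage at `speedup` times real-time playback."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return video_hours / speedup


# Illustrative input: one full day of footage from a single camera.
day_minutes = batch_processing_hours(24) * 60
print(f"24 h of video -> {day_minutes:.0f} min of processing")
```

Under that assumption, a day of footage from one camera would take about 48 minutes to process in batch.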
The blueprint contains several agentic AI features — such as chain-of-thought reasoning, task planning and tool calling — that can help developers streamline the creation of powerful and diverse visual agents to solve a range of problems.
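As a rough illustration of what task planning and tool calling mean in general, the sketch below runs a short plan by looking up each step in a registry of callable tools. It is a generic pattern, not the blueprint's API: the tool names, the file name and the `run_plan` helper are all hypothetical, and a real agent would have a language model generate the plan rather than hard-coding it.

```python
from typing import Callable, Dict, List, Tuple


# Hypothetical tools: stand-ins for real video-analysis services.
def count_people(clip: str) -> str:
    return f"counted people in {clip}"


def summarize(clip: str) -> str:
    return f"summary of {clip}"


# Registry mapping tool names to the functions that implement them.
TOOLS: Dict[str, Callable[[str], str]] = {
    "count_people": count_people,
    "summarize": summarize,
}


def run_plan(plan: List[Tuple[str, str]]) -> List[str]:
    """Execute each (tool_name, argument) step in order and collect results."""
    results = []
    for tool_name, arg in plan:
        tool = TOOLS.get(tool_name)
        if tool is None:
            # Report unknown tools instead of crashing mid-plan.
            results.append(f"unknown tool: {tool_name}")
            continue
        results.append(tool(arg))
    return results


# Hard-coded plan for illustration; a model would normally produce this.
plan = [("count_people", "dock_cam.mp4"), ("summarize", "dock_cam.mp4")]
for line in run_plan(plan):
    print(line)
```

Keeping the tools in a plain dictionary means a new capability can be added by registering one more function, without touching the loop that executes the plan.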
AI agents with video analysis abilities can be combined with agents with different skill sets to enable even more sophisticated agentic AI services.
Enterprises have the flexibility to build and deploy their AI agents from the edge to the cloud.
How video-analyst AI agents can help industrial businesses
AI agents with visual perception and analysis skills can be fine-tuned to help businesses with industrial operations by:
● Increasing productivity and reducing waste: Agents can help ensure standard operating procedures are followed during complex industrial processes like product assembly. They can also be fine-tuned to carefully watch and understand nuanced actions, and the sequence in which they’re implemented.
● Boosting asset management efficiency through better space utilization: Agents can help optimize inventory storage in warehouses by performing 3D volume estimation and centralizing understanding across various camera streams.
● Improving safety through auto-generation of incident reports and summaries: Agents can process huge volumes of video and summarize it into contextually informative reports of accidents. They can also help ensure personal protective equipment compliance in factories, improving worker safety in industrial settings.
● Preventing accidents and production problems: AI agents can identify atypical activity to quickly mitigate operational and safety risks, whether in a warehouse, factory or airport, or at an intersection or other municipal setting.
● Learning from the past: Agents can search through operations video archives, find relevant information from the past and use it to solve problems or create new processes.
Video analysts for sports, entertainment and more
Another industry where video analysis AI agents stand to make a mark is sports — a $500 billion market worldwide, with hundreds of billions in projected growth over the next several years.
Coaches, teams and leagues — whether professional or amateur — rely on video analytics to evaluate and enhance player performance, prioritize safety and boost fan engagement through player analytics platforms and data visualization. With visually perceptive AI agents, athletes now have unprecedented access to deeper insights and opportunities for improvement.
During his CES opening keynote, Nvidia’s Huang demonstrated an AI video analytics agent that assessed the fastball pitching skills of an amateur baseball player compared with a professional’s. Using video captured from the ceremonial first pitch that Huang threw for the San Francisco Giants baseball team, the video analytics AI agent was able to suggest areas for improvement.
The $3 trillion media and entertainment industry is also poised to benefit from video-analyst AI agents. Through the Nvidia Media2 initiative, these agents will help drive the creation of smarter, more tailored and more impactful content that can adapt to individual viewer preferences.
Worldwide adoption and availability
Partners from around the world are integrating the blueprint for building AI agents for video analysis into their own developer workflows, including Accenture, Infosys, Linker Vision, Pegatron, Tata Consultancy Services (TCS), Telit Cinterion and VAST.