Developing and deploying vision AI applications is complex and expensive. Organizations need data scientists and machine learning engineers to create training and inference pipelines based on unstructured data such as images and videos. Given the shortage of engineers trained in machine learning, building and integrating intelligent vision AI applications has become costly for enterprises.
On the other hand, companies like Google, Intel, Meta, Microsoft, NVIDIA, and OpenAI are making pre-trained models available to customers. Pre-trained models such as face detection, emotion detection, pose detection, and vehicle detection are openly available for developers to build intelligent vision-based applications. Many organizations have invested in CCTV, surveillance, and IP cameras for security. Although these cameras can be connected to existing pre-trained models, the plumbing required to connect the two is complex.
Building vision AI inference pipelines
Building a vision AI inference pipeline that connects existing cameras to pre-trained or custom models involves processing, encoding, and normalizing video streams to match the target model. Once in place, the inference output must be captured along with metadata to provide insight through visual dashboards and analytics.
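As a toy illustration of the normalization step, here is a dependency-free sketch that resizes a frame and scales pixel values into the [0, 1] range. The target size and value range are illustrative, not tied to any particular model; a real pipeline would use OpenCV or a similar library.

```python
def normalize_frame(frame, target_size=(224, 224)):
    """Nearest-neighbor resize plus [0, 1] scaling for 8-bit grayscale frames.

    `frame` is a list of rows of pixel values in 0..255. The 224x224 default
    is a common model input size, used here only as an example.
    """
    src_h, src_w = len(frame), len(frame[0])
    dst_h, dst_w = target_size
    # Map each destination pixel back to its nearest source pixel.
    resized = [
        [frame[r * src_h // dst_h][c * src_w // dst_w] for c in range(dst_w)]
        for r in range(dst_h)
    ]
    # Scale 0..255 values into the 0..1 range many models expect.
    return [[px / 255.0 for px in row] for row in resized]

# A tiny 2x2 "frame" upscaled to 4x4.
out = normalize_frame([[0, 255], [128, 64]], target_size=(4, 4))
```

The same idea applies per channel for color frames; the point is only that every frame must arrive at the model in a consistent shape and value range.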
For platform providers, the vision AI inference pipeline presents an opportunity to create tools and development environments to connect the dots across video sources, models, and the analytics engine. If your development environment offers a no-code/low-code approach, it further speeds up and simplifies the process.
About Vertex AI Vision
Google’s Vertex AI Vision enables organizations to seamlessly integrate computer vision AI into applications without the need for plumbing or heavy lifting. It is an integrated environment that combines video sources, machine learning models, and data warehouses to deliver rich insights and analytics. Customers can use pre-trained models available in the environment or bring in custom models trained on the Vertex AI platform.
A Vertex AI Vision application starts with a blank canvas, which is used to create an AI vision inference pipeline by dragging and dropping components from a visual palette.
The palette contains various connectors, including camera/video streams, a collection of pre-trained models, specialized models targeted at specific industry verticals, custom models built with AutoML or Vertex AI, and data warehouses in the form of BigQuery and Vertex AI Vision Warehouse.
According to Google Cloud, Vertex AI Vision has the following services:
- Vertex AI Vision Streams: An endpoint service for ingesting video and image streams over a geographically distributed network. Connect any camera or device from anywhere and let Google handle the scaling and ingest.
- Vertex AI Vision Applications: A serverless orchestration platform that developers can use to create extensive, auto-scaling media processing and analytics pipelines.
- Vertex AI Vision Models: Pre-built vision models for common analytical tasks, including occupancy counting, PPE detection, face blur, and retail product recognition. Users can also build and deploy their own models trained on the Vertex AI platform.
- Vertex AI Vision Warehouse: An integrated serverless rich media storage system that combines Google search technology with managed video storage. Petabytes of video data can be ingested, stored, and searched within the warehouse.
For example, the pipeline below ingests video from a single source, forwards it to the people/vehicle counter, and stores the input and output metadata (inference results) in the AI Vision Warehouse for simple queries. The warehouse can be replaced with BigQuery to integrate with existing applications or to perform complex SQL-based queries.
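Conceptually, such a pipeline is just a chain of stages: stream in, model inference, warehouse out. The following minimal sketch mimics that dataflow in plain Python; the stage names and the in-memory "warehouse" are illustrative stand-ins, not the Vertex AI Vision API.

```python
from dataclasses import dataclass, field

@dataclass
class Frame:
    frame_id: int
    people: int      # pretend detections from an upstream model
    vehicles: int

@dataclass
class Warehouse:
    """Stand-in for AI Vision Warehouse: stores inputs plus inference metadata."""
    records: list = field(default_factory=list)

    def store(self, record):
        self.records.append(record)

def people_vehicle_counter(frame):
    # In Vertex AI Vision this is a pre-trained model; here we just total counts.
    return {"frame_id": frame.frame_id,
            "occupancy": frame.people + frame.vehicles}

def run_pipeline(stream, warehouse):
    # Each frame flows through the counter, and both input and output are stored.
    for frame in stream:
        inference = people_vehicle_counter(frame)
        warehouse.store({"input": frame, "output": inference})

stream = [Frame(1, people=3, vehicles=2), Frame(2, people=1, vehicles=0)]
wh = Warehouse()
run_pipeline(stream, wh)
```

Swapping the warehouse stage for a different sink (say, BigQuery) leaves the rest of the pipeline untouched, which is the design idea behind the drag-and-drop canvas.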
Deploying a Vertex AI Vision pipeline
Once the pipeline is built visually, it can be deployed to start making inferences. The green check marks in the following screenshot indicate a successful deployment.
The next step is to start ingesting the video stream to trigger the inference. Google provides a command-line tool called vaictl that takes a video stream from a source and passes it to the Vertex AI Vision endpoint. It supports both static video files and RTSP streams based on H.264 encoding.
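Based on Google's getting-started material, ingesting a local H.264 file looks roughly like the following. The project ID, region, cluster, and stream names are placeholders, and the exact flags may differ across vaictl versions, so treat this as a sketch rather than a definitive invocation.

```shell
# Send a local H.264 video file to a Vertex AI Vision stream.
# my-project-id, us-central1, application-cluster-0, and my-input-stream
# are all placeholders for your own project, region, cluster, and stream.
vaictl -p my-project-id \
       -l us-central1 \
       -c application-cluster-0 \
       --service-endpoint visionai.googleapis.com \
  send video-file to streams my-input-stream \
       --file-path ./sample-video.mp4 --loop
```

For a live camera, the `send video-file` portion would be replaced with the tool's RTSP ingestion mode pointed at the camera's stream URL.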
Once the pipeline is activated, the input and output streams can be monitored from the console, as shown.
Since the inference results are stored in the AI Vision Warehouse, they can be queried based on search criteria. For example, the following screenshot shows frames that contain at least five people or vehicles.
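That kind of filter is easy to express over exported inference records. Here is a sketch assuming each record carries per-frame people and vehicle counts; the field names are illustrative, not the warehouse's actual schema.

```python
def frames_of_interest(records, min_count=5):
    """Return frames where people + vehicles meets the threshold."""
    return [r for r in records
            if r["people"] + r["vehicles"] >= min_count]

# Hypothetical per-frame inference records.
records = [
    {"frame_id": 1, "people": 2, "vehicles": 1},
    {"frame_id": 2, "people": 4, "vehicles": 3},
    {"frame_id": 3, "people": 5, "vehicles": 0},
]
matches = frames_of_interest(records)  # frames 2 and 3 qualify
```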
Google provides an SDK to communicate with the warehouse programmatically. BigQuery developers can use existing client libraries to run advanced queries based on ANSI SQL.
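When the pipeline writes to BigQuery instead, the same filter becomes an ordinary SQL query. The table and column names below are hypothetical; the actual schema produced by Vertex AI Vision will differ, and running the query requires the google-cloud-bigquery library and credentials.

```python
def build_occupancy_query(table, min_count=5):
    """Build an ANSI SQL query for frames at or above an occupancy threshold.

    `table` is a fully qualified BigQuery table name; the column names here
    (people_count, vehicle_count) are illustrative placeholders.
    """
    return (
        f"SELECT frame_id, people_count, vehicle_count "
        f"FROM `{table}` "
        f"WHERE people_count + vehicle_count >= {int(min_count)}"
    )

sql = build_occupancy_query("my-project.vision_dataset.occupancy")

# With credentials configured, the query could then be executed with the
# standard BigQuery client library, e.g.:
#   from google.cloud import bigquery
#   for row in bigquery.Client().query(sql):
#       print(row)
```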
Integrations and support for Vertex AI Vision at the edge
Vertex AI Vision has tight integration with Vertex AI, Google’s managed machine learning PaaS. Clients can train models via AutoML or custom training. To add custom processing of the output, Google integrated Cloud Functions, which can manipulate the output to add additional annotations or metadata.
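A post-processing function of that kind is typically a small transform over the model's output. The sketch below shows the shape of such a transform in plain Python; the payload fields, the zone label, and the alert rule are all made up for illustration, and a real Cloud Functions handler would receive the event format documented by Vertex AI Vision.

```python
def enrich_output(inference, camera_zone="entrance"):
    """Toy post-processor: attach extra metadata to a model's output.

    `camera_zone` and the alert threshold are illustrative assumptions,
    not part of the Vertex AI Vision output schema.
    """
    enriched = dict(inference)          # leave the original record untouched
    enriched["zone"] = camera_zone      # added annotation
    enriched["alert"] = inference.get("occupancy", 0) > 10
    return enriched

result = enrich_output({"frame_id": 7, "occupancy": 12})
```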
The true potential of the Vertex AI Vision platform lies in its no-code approach and the ability to integrate with other Google Cloud services such as BigQuery, Cloud Functions, and Vertex AI.
While Vertex AI Vision is an excellent step toward simplifying vision AI, more support is needed to deploy applications at the edge. Industry verticals such as healthcare, insurance, and automotive prefer to run vision AI pipelines at the edge to avoid latency and meet compliance requirements. Adding edge support will become a key driver of Vertex AI Vision adoption.
Copyright © 2022 IDG Communications, Inc.