Who is the assignee on this patent?

Toyota Eng & Mfg North America, Toyota Motor Co Ltd

What technology area does this patent fall under?

Primary CPC classification G06V10/764. Mapped technology areas include Physics.

When was this patent published?

Publication date Dec 16, 2025 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Systems and methods for efficient video instance segmentation for vehicles using edge computing

US12499555B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12499555-B2
Application number	US-202318227453-A
Country	US
Kind code	B2
Filing date	Jul 28, 2023
Priority date	Jul 28, 2023
Publication date	Dec 16, 2025
Grant date	Dec 16, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for video instance segmentation is provided. The method includes inputting a plurality of video frames collected by a sensor of a vehicle to a trained machine learning model to obtain an n-th output from an n-th layer of the trained machine learning model and an n+1-st output from an n+1-st layer of the trained machine learning model, the trained machine learning model comprising a deep learning model and early-exit subnets, and in response to determining that a difference between the n-th output and the n+1-st output is less than a threshold value, controlling the vehicle based on the n+1-st output, the n+1-st output includes information about instances in the plurality of video frames.
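The early-exit decision described in the abstract can be sketched in a few lines. This is a minimal illustration, not the patented implementation: the names `run_with_early_exit`, `mask_difference`, `toy_layer`, and `toy_head` are hypothetical, masks are represented as plain nested lists, and the pixel-fraction difference measure is an assumption (the abstract does not say how the difference is computed).

```python
from typing import Callable, List, Optional, Sequence, Tuple

Grid = List[List[float]]  # hypothetical per-pixel feature scores
Mask = List[List[int]]    # hypothetical binary segmentation mask


def mask_difference(a: Mask, b: Mask) -> float:
    """Fraction of pixels that differ between two equally sized masks."""
    total = 0
    differing = 0
    for row_a, row_b in zip(a, b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += 1
            differing += int(pixel_a != pixel_b)
    return differing / total if total else 0.0


def run_with_early_exit(
    layers: Sequence[Callable[[Grid], Grid]],
    exit_heads: Sequence[Callable[[Grid], Mask]],
    frames: Grid,
    threshold: float,
) -> Tuple[Optional[Mask], int]:
    """Run layers in order and stop once the outputs of two consecutive
    layers differ by less than `threshold`. Returns the output that is
    used and the index of the layer that produced it."""
    hidden = frames
    previous: Optional[Mask] = None
    output: Optional[Mask] = None
    used_layer = -1
    for index, (layer, head) in enumerate(zip(layers, exit_heads)):
        hidden = layer(hidden)
        output = head(hidden)  # early-exit subnet attached to this layer
        used_layer = index
        if previous is not None and mask_difference(previous, output) < threshold:
            break  # n-th and n+1-st outputs agree: use the n+1-st output
        previous = output
    return output, used_layer


# Toy stand-ins for a real model (assumptions for illustration only).
def toy_layer(hidden: Grid) -> Grid:
    return [[min(1.0, value + 0.2) for value in row] for row in hidden]


def toy_head(hidden: Grid) -> Mask:
    return [[1 if value >= 0.5 else 0 for value in row] for row in hidden]


frames = [[0.0, 0.35], [0.9, 0.9]]
output, used_layer = run_with_early_exit(
    [toy_layer] * 4, [toy_head] * 4, frames, threshold=0.1
)
print(output, used_layer)
```

In this toy run the second layer's mask matches the first layer's mask, so the remaining two layers are skipped. With a threshold of 0 the comparison never succeeds and every layer runs.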

First claim

Opening claim text (preview).

What is claimed is:
1. A method for video instance segmentation, the method comprising: inputting a plurality of video frames collected by a sensor of a vehicle to a trained machine learning model to obtain an n-th output from an n-th layer of the trained machine learning model and an n+1-st output from an n+1-st layer of the trained machine learning model, the trained machine learning model comprising a deep learning model and early-exit subnets; and in response to determining that a difference between the n-th output and the n+1-st output is less than a threshold value, controlling the vehicle based on the n+1-st output, the n+1-st output includes information about instances in the plurality of video frames.
2. The method of claim 1, wherein the deep learning model is a transformer-based model, the n-th layer of the trained machine learning model is an n-th layer of the transformer-based model, and the n+1-st layer of the trained machine learning model is an n+1-st layer of the transformer-based model.
3. The method of claim 1, further comprising: preprocessing video data collected by the sensor of the vehicle; and determining whether the plurality of video frames is the same as or greater than a threshold number; and in response to determining that the plurality of video frames is the same as or greater than a threshold number, inputting the plurality of video frames to the trained machine learning model.
4. The method of claim 1, wherein each of the n-th output and the n+1-st output includes instance segmentation masks of the video frames, and the difference between the n-th output and the n+1-st output is determined by comparing boundaries of instance segmentation masks in the n-th output and boundaries of instance segmentation masks in the n+1-st output.
5. The method of claim 1, wherein each of the n-th output and the n+1-st output includes instance segmentation of the video frames, and the difference between the n-th output and the n+1-st output is determined by comparing classified objects in the n-th output and classified objects in the n+1-st output.
6. The method of claim 1, wherein each of the n-th output and the n+1-st output includes instance segmentation of the video frames, and the difference between the n-th output and the n+1-st output is determined by comparing pixels of instance segmentation masks in the n-th output and pixels of instance segmentation masks in the n+1-st output.
7. The method of claim 1, further comprising: training an initial machine learning model to obtain the trained machine learning model by: training the deep learning model of the initial machine learning model using a training data set including a plurality of video frames as input and instance segmentation masks as output; and training the early-exit subnets of the initial machine learning model using a training data set including a plurality of video frames as input and instance segmentation masks as output.
8. The method of claim 7, further comprising: optimizing the trained initial machine learning model by removing redundant or unnecessary layers or parameters of the trained initial machine learning model.
9. A vehicle comprising: a sensor configured to collect a plurality of video frames; and a controller programmed to: input the plurality of video frames collected by the sensor to a trained machine learning model to obtain an n-th output from an n-th layer of the trained machine learning model and an n+1-st output from an n+1-st layer of the trained machine learning model, the trained machine learning model comprising a deep learning model and early-exit subnets; and in response to determining that a difference between the n-th output and the n+1-st output is less than a threshold value, control the vehicle based on the n+1-st output, the n+1-st output includes information about instances in the plurality of video frames.
10. The vehicle of claim 9, wherein the deep learning model is a transformer-based model, the n-th layer of the trained machine learning model is an n-th layer of the transformer-based model, and the n+1-st layer of the trained machine learning model is an n+1-st layer of the transformer-based model.
11. The vehicle of claim 9, wherein the controller is further programmed to: preprocess video data collected by the sensor; and determine whether the plurality of video frames is the same as or greater than a threshold number; and in response to determining that the plurality of video frames is the same as or greater than a threshold number, input the plurality of video frames to the trained machine learning model.
12. The vehicle of claim 9, wherein each of the n-th output and the n+1-st output includes instance segmentation masks of the video frames, and the difference between the n-th output and the n+1-st output is determined by comparing boundaries of instance segmentation masks in the n-th output and boundaries of instance segmentation masks in the n+1-st output.
13. The vehicle of claim 9, wherein each of the n-th output and the n+1-st output includes instance segmentation of the video frames, and the difference between the n-th output and the n+1-st output is determined by comparing classified objects in the n-th output and classified objects in the n+1-st output.
14. The vehicle of claim 9, wherein each of the n-th output and the n+1-st output includes instance segmentation of the video frames, and the difference between the n-th output and the n+1-st output is determined by comparing pixels of instance segmentation masks in the n-th output and pixels of instance segmentation masks in the n+1-st output.
15. The vehicle of claim 9, wherein the vehicle autonomously drives based on the n+1-st output.
16. A system comprising: a server; and a vehicle comprising: a sensor configured to collect a plurality of video frames; and a processor programmed to: input the plurality of video frames collected by the sensor to a trained machine learning model to obtain an n-th output from an n-th layer of the trained machine learning model and an n+1-st output from an n+1-st layer of the trained machine learning model, the trained machine learning model comprising a deep learning model and early-exit subnets; and in response to determining that a difference between the n-th output and the n+1-st output is less than a threshold value, control the vehicle based on the n+1-st output, the n+1-st output includes information about instances in the plurality of video frames.
17. The system of claim 16, wherein the server is further programmed to: train an initial machine learning model to obtain the trained machine learning model by: training the deep learning model of the initial machine learning model using a training data set including a plurality of video frames as input and instance segmentation masks as output; and training the early-exit subnets of the initial machine learning model using a training data set including a plurality of video frames as input and instance segmentation masks as output.
18. The system of claim 16, wherein the server is programmed to: optimize the trained initial machine learning model by removing redundant or unnecessary layers or parameters of the trained initial machine learning model.
19. The system of claim 16, wherein the deep learning model is a transformer-based model, the n-th layer of the trained machine learning model is an n-th layer of the transformer-based mode
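The dependent claims name three ways to measure the difference between the n-th and n+1-st outputs: comparing mask boundaries, comparing classified objects, and comparing mask pixels. The sketch below shows one possible reading of each. The claims do not specify a metric, so the use of Jaccard distance, the 4-neighbour boundary definition, and all function names here are assumptions made for illustration.

```python
from collections import Counter
from typing import List, Sequence, Set, Tuple

Mask = List[List[int]]   # hypothetical binary instance mask
Pixel = Tuple[int, int]  # (row, column)


def foreground(mask: Mask) -> Set[Pixel]:
    """Coordinates of all non-zero pixels."""
    return {(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v}


def boundary(mask: Mask) -> Set[Pixel]:
    """Foreground pixels with a 4-neighbour that is background or off-image."""
    fg = foreground(mask)
    return {
        (r, c)
        for (r, c) in fg
        if any(n not in fg for n in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)))
    }


def jaccard_distance(a: Set[Pixel], b: Set[Pixel]) -> float:
    """1 - |a ∩ b| / |a ∪ b|; defined as 0.0 when both sets are empty."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def pixel_difference(m1: Mask, m2: Mask) -> float:
    """Pixel comparison: distance between the foreground sets."""
    return jaccard_distance(foreground(m1), foreground(m2))


def boundary_difference(m1: Mask, m2: Mask) -> float:
    """Boundary comparison: distance between the boundary pixel sets."""
    return jaccard_distance(boundary(m1), boundary(m2))


def class_difference(labels1: Sequence[str], labels2: Sequence[str]) -> float:
    """Classified-object comparison: distance between label multisets."""
    c1, c2 = Counter(labels1), Counter(labels2)
    union = sum((c1 | c2).values())
    return 1.0 - sum((c1 & c2).values()) / union if union else 0.0


full = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
clipped = [[0, 1, 1], [1, 1, 1], [1, 1, 1]]
print(pixel_difference(full, clipped))
print(boundary_difference(full, clipped))
print(class_difference(["car", "car", "pedestrian"], ["car", "pedestrian"]))
```

Any of these values could be compared against the threshold in the early-exit check. Which measure, and which threshold, suits a given vehicle application is not stated in the claim text shown here.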

Assignees

Inventors

Classifications

G06V10/761
Proximity, similarity or dissimilarity measures · CPC title
G06V10/764 (Primary)
using classification, e.g. of video objects · CPC title
G06T2207/20081
Training; Learning · CPC title
G06V20/56
exterior to a vehicle by using sensors mounted on the vehicle · CPC title
G06T2207/20084
Artificial neural networks [ANN] · CPC title
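The classification symbols above follow the standard CPC layout: a section letter, a two-digit class, a subclass letter, a main group, and a subgroup after the slash. The sketch below splits a symbol into those parts and looks up the section title, which is how a code such as G06V10/764 maps to "Physics". The function name `parse_cpc` and the returned dictionary keys are hypothetical; the section titles are taken from the CPC scheme, with the Y section title abbreviated.

```python
import re
from typing import Dict

# Top-level CPC sections.
CPC_SECTIONS = {
    "A": "Human Necessities",
    "B": "Performing Operations; Transporting",
    "C": "Chemistry; Metallurgy",
    "D": "Textiles; Paper",
    "E": "Fixed Constructions",
    "F": "Mechanical Engineering; Lighting; Heating; Weapons; Blasting",
    "G": "Physics",
    "H": "Electricity",
    "Y": "General Tagging of New Technological Developments",
}

# section letter, 2-digit class, subclass letter, main group, "/", subgroup
CPC_PATTERN = re.compile(r"^([A-HY])(\d{2})([A-Z])(\d{1,4})/(\d{2,6})$")


def parse_cpc(symbol: str) -> Dict[str, str]:
    """Split a CPC symbol such as 'G06V10/764' into its hierarchy levels."""
    match = CPC_PATTERN.match(symbol.strip())
    if match is None:
        raise ValueError(f"not a CPC symbol: {symbol!r}")
    section, cls, subclass, main_group, subgroup = match.groups()
    return {
        "section": section,
        "section_title": CPC_SECTIONS[section],
        "class": section + cls,
        "subclass": section + cls + subclass,
        "main_group": main_group,
        "subgroup": subgroup,
    }


for code in ["G06V10/761", "G06V10/764", "G06T2207/20081", "G06V20/56", "G06T2207/20084"]:
    parsed = parse_cpc(code)
    print(code, parsed["subclass"], parsed["section_title"])
```

All five codes listed for this patent fall in section G, split between subclasses G06V and G06T.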

Patent family

Related publications grouped by family.

View patent family 94372362

External sources


Related patents

Multi-attention machine learning for object detection and classification

Method of recognizing stop line of autonomous vehicle

Systems and methods for determining real-time lane level snow accumulation

Systems and methods for generating a task offloading strategy for a vehicular edge-computing environment

Target tracking method and apparatus, medium, and device

Motion determination system and method thereof