Machine-learned architecture for efficient object attribute and/or intention classification

US12260651B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12260651-B2
Application number: US-202418621922-A
Country: US
Kind code: B2
Filing date: Mar 29, 2024
Priority date: Nov 9, 2021
Publication date: Mar 25, 2025
Grant date: Mar 25, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system for faster object attribute and/or intent classification may include a machine-learned (ML) architecture that processes temporal sensor data (e.g., multiple instances of sensor data received at different times) and includes a cache in an intermediate layer of the ML architecture. The ML architecture may be capable of classifying an object's intent to enter a roadway, idling near a roadway, or active crossing of a roadway. The ML architecture may additionally or alternatively classify indicator states, such as indications to turn, stop, or the like. Other attributes and/or intentions are discussed herein.
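The speed-up the abstract attributes to an intermediate-layer cache can be illustrated with a small sketch. Everything below is an assumption made for illustration, not the patented implementation: `first_stage` is a hypothetical stand-in for the expensive per-frame part of the architecture, the averaging step stands in for whatever later layers consume the cached outputs, and the window size of 3 is arbitrary. The sketch only shows that caching yields the same result while running the first stage once per time step instead of once per frame in the window.

```python
from collections import deque

# Counts how often the (hypothetical) expensive first stage runs in each variant.
calls = {"naive": 0, "cached": 0}

def first_stage(frame, counter_key):
    """Stand-in for the expensive per-frame layers (assumption, not from the patent)."""
    calls[counter_key] += 1
    return frame * 2.0  # placeholder "feature"

def classify_naive(frames, window):
    """Recompute the first stage for every frame in the window at each time step."""
    results = []
    for t in range(len(frames)):
        recent = frames[max(0, t - window + 1): t + 1]
        feats = [first_stage(f, "naive") for f in recent]
        results.append(sum(feats) / len(feats))
    return results

def classify_cached(frames, window):
    """Run the first stage once per frame and reuse earlier outputs from a cache."""
    cache = deque(maxlen=window)  # oldest output is evicted automatically
    results = []
    for f in frames:
        cache.append(first_stage(f, "cached"))
        results.append(sum(cache) / len(cache))
    return results

frames = [1.0, 2.0, 3.0, 4.0, 5.0]
naive_out = classify_naive(frames, window=3)
cached_out = classify_cached(frames, window=3)
```

With five frames and a window of three, both variants produce identical outputs, but the naive variant invokes the first stage 12 times and the cached variant 5 times.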

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: receiving first sensor data associated with an object; receiving second sensor data associated with the object; determining, by a first machine-learned model and based at least in part on the first sensor data, a first output; storing the first output in a cache; retrieving, from the cache a second output previously generated by the first machine-learned model and based at least in part on the second sensor data; aggregating, as an aggregated output, the first output and the second output; determining, by a second machine-learned model and based at least in part on the aggregated output, an attribute associated with the object in an environment, the attribute indicating at least one of a motion of the object, a state of the object, a predicted intent of the object, or an association of the object with another object; and controlling a vehicle based at least in part on the attribute.

2. The method of claim 1, wherein the attribute indicates at least one of: an indication of an object motion state, an indication of an object indicator state, an indication that the object is idling, an indication that the object intends to enter a roadway, or an indication that the object is not associated with the roadway.

3. The method of claim 1, wherein the first machine-learned model comprises an object detection machine-learned model and a set of machine-learned layers that receives outputs from the object detection machine-learned model.

4. The method of claim 3, wherein aggregating the first output and the second output comprises: determining, by the object detection machine-learned model and based at least in part on the first sensor data, a first intermediate output; determining, by the object detection machine-learned model and based at least in part on the second sensor data, a second intermediate output; determining, based at least in part on a first machine-learned layer of the set of machine-learned layers and the first intermediate output, a third intermediate output; determining, based at least in part on a second machine-learned layer of the set of machine-learned layers and the second intermediate output, a fourth intermediate output; and determining the aggregated output by a third machine-learned layer.

5. The method of claim 1, wherein: the first sensor data is a first subset of a first set of sensor data; the second sensor data is a second subset of a second set of sensor data; and the first subset and the second subset are determined by a pre-processing machine-learned component based at least in part on the first set of sensor data and the second set of sensor data.

6. The method of claim 1, wherein the cache stores n number of outputs of the first machine-learned model, wherein n is a positive integer associated with n previous time steps.

7. A system comprising: one or more processors; and a memory storing processor-executable instructions that, when executed by the one or more processors, cause the system to perform operations comprising: receiving first sensor data associated with an object; receiving second sensor data associated with the object; determining, by a first machine-learned model and based at least in part on the first sensor data, a first output, wherein the first machine-learned model comprises an object detection machine-learned model and a set of machine-learned layers that receives outputs from the object detection machine-learned model; determining, by the first machine-learned model and based at least in part on the second sensor data, a second output; aggregating, as an aggregated output, the first output and the second output; determining, by a second machine-learned model and based at least in part on the aggregated output, an attribute associated with the object in an environment, wherein the attribute indicates at least one of: an indication of an object motion state, an indication of an object indicator state, an indication that the object is idling, an indication that the object intends to enter a roadway, or an indication that the object is not associated with the roadway; and controlling a vehicle based at least in part on the attribute.

8. The system of claim 7, wherein aggregating the first output and the second output comprises: determining, by the object detection machine-learned model and based at least in part on the first sensor data, a first intermediate output; determining, by the object detection machine-learned model and based at least in part on the second sensor data, a second intermediate output; determining, based at least in part on a first machine-learned layer of the set of machine-learned layers and the first intermediate output, a third intermediate output; determining, based at least in part on a second machine-learned layer of the set of machine-learned layers and the second intermediate output, a fourth intermediate output; and determining the aggregated output by a third machine-learned layer.

9. The system of claim 7, wherein: the first sensor data is a first subset of a first set of sensor data; the second sensor data is a second subset of a second set of sensor data; and the first subset and the second subset are determined by a pre-processing machine-learned component based at least in part on the first set of sensor data and the second set of sensor data.

10. The system of claim 7, wherein the operations further comprise retrieving the second output from a memory, wherein the memory is a cache and the cache stores n number of outputs of the first machine-learned model, wherein n is a positive integer associated with n previous time steps.

11. The system of claim 7, wherein the second sensor data is received before the first sensor data and the second output is determined before the first output.

12. One or more non-transitory computer-readable media storing processor-executable instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising: receiving first sensor data associated with an object; receiving second sensor data associated with the object; determining, by a first machine-learned model and based at least in part on the first sensor data, a first output; determining a second output by retrieving the second output from a memory, wherein the memory is a cache and the cache stores n number of outputs of the first machine-learned model, wherein n is a positive integer associated with n previous time steps; aggregating, as an aggregated output, the first output and the second output; determining, by a second machine-learned model and based at least in part on the aggregated output, an attribute associated with the object in an environment, wherein the attribute indicates at least one of: an indication of an object motion state, an indication of an object indicator state, an indication that the object is idling, an indication that the object intends to enter a roadway, or an indication that the object is not associated with the roadway; and controlling a vehicle based at least in part on the attribute.

13. The one or more non-transitory computer-readable media of claim 12, wherein the first machine-learned model comprises an object detection machine-learned model and a set of machine-learned layers that receives outputs from the object detection machine-learned model.

14. The one or more non-transitory computer-readable media of claim 13, wherein aggregating the first output and the second output comprises: determining, by the object detection machine-learned model
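The step sequence recited in claim 1, together with the n-entry cache of claim 6, can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: `first_model`, `aggregate`, `second_model`, `control_vehicle`, and `AttributePipeline` are hypothetical names, the "models" are trivial hand-written functions rather than trained networks, concatenation is just one possible aggregation, and the attribute labels and vehicle actions are invented for the example.

```python
from collections import deque
from typing import Deque, List

def first_model(sensor_data: List[float]) -> List[float]:
    """Hypothetical stand-in for the first ML model: returns [mean, max] features."""
    return [sum(sensor_data) / len(sensor_data), max(sensor_data)]

def aggregate(outputs: List[List[float]]) -> List[float]:
    """Concatenate per-time-step outputs, oldest first (one possible aggregation)."""
    return [x for out in outputs for x in out]

def second_model(aggregated: List[float]) -> str:
    """Hypothetical stand-in for the second ML model: features -> attribute label."""
    means = aggregated[0::2]  # the mean feature of each time step, oldest first
    return "intends_to_enter_roadway" if means[-1] > means[0] else "idling"

def control_vehicle(attribute: str) -> str:
    """Toy controller (assumption): pick an action from the attribute."""
    return "yield" if attribute == "intends_to_enter_roadway" else "proceed"

class AttributePipeline:
    def __init__(self, n: int) -> None:
        # Cache of the first model's outputs for up to n time steps.
        self.cache: Deque[List[float]] = deque(maxlen=n)

    def step(self, sensor_data: List[float]) -> str:
        first_output = first_model(sensor_data)  # only the new data is processed
        self.cache.append(first_output)          # store the first output in the cache
        cached_outputs = list(self.cache)        # retrieve earlier outputs plus the new one
        attribute = second_model(aggregate(cached_outputs))
        return control_vehicle(attribute)

pipeline = AttributePipeline(n=3)
actions = [pipeline.step(d) for d in ([0.0, 0.1], [0.2, 0.3], [0.5, 0.9])]
```

In this toy run the first step has no history and yields "proceed"; once later steps show a rising mean feature across the cached outputs, the action becomes "yield".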

Assignees

Zoox Inc

Inventors

Classifications

  • Control of position or course in two dimensions [2D]

  • using external object recognition

  • Learning methods

  • from positioning sensors located off-board the vehicle, e.g. from cameras

  • Recognition of whole body movements, e.g. for sport training

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12260651B2 cover?
A system for faster object attribute and/or intent classification may include a machine-learned (ML) architecture that processes temporal sensor data (e.g., multiple instances of sensor data received at different times) and includes a cache in an intermediate layer of the ML architecture. The ML architecture may be capable of classifying an object's intent to enter a roadway, idling near a roa…
Who is the assignee on this patent?
Zoox Inc
What technology area does this patent fall under?
Primary CPC classification G06V20/58. Mapped technology areas include Physics.
When was this patent published?
Published Mar 25, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).