Prediction on top-down scenes based on object motion
US-2021192748-A1 · Jun 24, 2021 · US
US12260651B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12260651-B2 |
| Application number | US-202418621922-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 29, 2024 |
| Priority date | Nov 9, 2021 |
| Publication date | Mar 25, 2025 |
| Grant date | Mar 25, 2025 |
Official abstract text for this publication.
A system for faster object attribute and/or intent classification may include a machine-learned (ML) architecture that processes temporal sensor data (e.g., multiple instances of sensor data received at different times) and includes a cache in an intermediate layer of the ML architecture. The ML architecture may be capable of classifying an object's intent to enter a roadway, idling near a roadway, or active crossing of a roadway. The ML architecture may additionally or alternatively classify indicator states, such as indications to turn, stop, or the like. Other attributes and/or intentions are discussed herein.
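The caching idea in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the class name `CachedTemporalClassifier`, the toy stage functions, the concatenation used for aggregation, and the 0.5 threshold are all hypothetical choices made for illustration. The sketch only shows the general pattern of running a first stage once per sensor frame, keeping its outputs for up to n earlier time steps in a cache, and feeding the aggregated outputs to a second stage.

```python
from collections import deque
from typing import Callable, Deque, List, Sequence

Vector = List[float]


class CachedTemporalClassifier:
    """Hypothetical two-stage pipeline with a cache between the stages."""

    def __init__(
        self,
        first_stage: Callable[[Sequence[float]], Vector],
        second_stage: Callable[[Vector], str],
        n: int,
    ) -> None:
        if n < 1:
            raise ValueError("n must be a positive integer")
        self.first_stage = first_stage
        self.second_stage = second_stage
        # Holds first-stage outputs for at most n previous time steps;
        # the oldest entry is dropped automatically when the cache is full.
        self.cache: Deque[Vector] = deque(maxlen=n)
        self.first_stage_calls = 0

    def step(self, sensor_frame: Sequence[float]) -> str:
        # Only the newest frame goes through the first stage.
        current = self.first_stage(sensor_frame)
        self.first_stage_calls += 1
        # Earlier outputs are retrieved from the cache, not recomputed.
        previous = list(self.cache)
        self.cache.append(current)
        # Assumption: aggregation is plain concatenation, oldest first.
        aggregated = [x for feat in previous + [current] for x in feat]
        return self.second_stage(aggregated)


def toy_first_stage(frame: Sequence[float]) -> Vector:
    # Stand-in for a learned feature extractor: the mean of the frame.
    return [sum(frame) / len(frame)]


def toy_second_stage(aggregated: Vector) -> str:
    # Stand-in for a learned classifier: a large change over time reads as motion.
    return "moving" if max(aggregated) - min(aggregated) > 0.5 else "idling"


clf = CachedTemporalClassifier(toy_first_stage, toy_second_stage, n=2)
frames = ([0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [1.0, 1.0])
labels = [clf.step(frame) for frame in frames]
print(labels)
print(clf.first_stage_calls)
```

In this sketch the first stage runs exactly once per frame, however many earlier time steps the second stage looks at. That reuse is the source of the speed-up the abstract refers to, under the assumptions above.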
Opening claim text (preview).
What is claimed is:
1. A method comprising: receiving first sensor data associated with an object; receiving second sensor data associated with the object; determining, by a first machine-learned model and based at least in part on the first sensor data, a first output; storing the first output in a cache; retrieving, from the cache a second output previously generated by the first machine-learned model and based at least in part on the second sensor data; aggregating, as an aggregated output, the first output and the second output; determining, by a second machine-learned model and based at least in part on the aggregated output, an attribute associated with the object in an environment, the attribute indicating at least one of a motion of the object, a state of the object, a predicted intent of the object, or an association of the object with another object; and controlling a vehicle based at least in part on the attribute.
2. The method of claim 1, wherein the attribute indicates at least one of: an indication of an object motion state, an indication of an object indicator state, an indication that the object is idling, an indication that the object intends to enter a roadway, or an indication that the object is not associated with the roadway.
3. The method of claim 1, wherein the first machine-learned model comprises an object detection machine-learned model and a set of machine-learned layers that receives outputs from the object detection machine-learned model.
4. The method of claim 3, wherein aggregating the first output and the second output comprises: determining, by the object detection machine-learned model and based at least in part on the first sensor data, a first intermediate output; determining, by the object detection machine-learned model and based at least in part on the second sensor data, a second intermediate output; determining, based at least in part on a first machine-learned layer of the set of machine-learned layers and the first intermediate output, a third intermediate output; determining, based at least in part on a second machine-learned layer of the set of machine-learned layers and the second intermediate output, a fourth intermediate output; and determining the aggregated output by a third machine-learned layer.
5. The method of claim 1, wherein: the first sensor data is a first subset of a first set of sensor data; the second sensor data is a second subset of a second set of sensor data; and the first subset and the second subset are determined by a pre-processing machine-learned component based at least in part on the first set of sensor data and the second set of sensor data.
6. The method of claim 1, wherein the cache stores n number of outputs of the first machine-learned model, wherein n is a positive integer associated with n previous time steps.
7. A system comprising: one or more processors; and a memory storing processor-executable instructions that, when executed by the one or more processors, cause the system to perform operations comprising: receiving first sensor data associated with an object; receiving second sensor data associated with the object; determining, by a first machine-learned model and based at least in part on the first sensor data, a first output, wherein the first machine-learned model comprises an object detection machine-learned model and a set of machine-learned layers that receives outputs from the object detection machine-learned model; determining, by the first machine-learned model and based at least in part on the second sensor data, a second output; aggregating, as an aggregated output, the first output and the second output; determining, by a second machine-learned model and based at least in part on the aggregated output, an attribute associated with the object in an environment, wherein the attribute indicates at least one of: an indication of an object motion state, an indication of an object indicator state, an indication that the object is idling, an indication that the object intends to enter a roadway, or an indication that the object is not associated with the roadway; and controlling a vehicle based at least in part on the attribute.
8. The system of claim 7, wherein aggregating the first output and the second output comprises: determining, by the object detection machine-learned model and based at least in part on the first sensor data, a first intermediate output; determining, by the object detection machine-learned model and based at least in part on the second sensor data, a second intermediate output; determining, based at least in part on a first machine-learned layer of the set of machine-learned layers and the first intermediate output, a third intermediate output; determining, based at least in part on a second machine-learned layer of the set of machine-learned layers and the second intermediate output, a fourth intermediate output; and determining the aggregated output by a third machine-learned layer.
9. The system of claim 7, wherein: the first sensor data is a first subset of a first set of sensor data; the second sensor data is a second subset of a second set of sensor data; and the first subset and the second subset are determined by a pre-processing machine-learned component based at least in part on the first set of sensor data and the second set of sensor data.
10. The system of claim 7, wherein the operations further comprise retrieving the second output from a memory, wherein the memory is a cache and the cache stores n number of outputs of the first machine-learned model, wherein n is a positive integer associated with n previous time steps.
11. The system of claim 7, wherein the second sensor data is received before the first sensor data and the second output is determined before the first output.
12. One or more non-transitory computer-readable media storing processor-executable instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising: receiving first sensor data associated with an object; receiving second sensor data associated with the object; determining, by a first machine-learned model and based at least in part on the first sensor data, a first output; determining a second output by retrieving the second output from a memory, wherein the memory is a cache and the cache stores n number of outputs of the first machine-learned model, wherein n is a positive integer associated with n previous time steps; aggregating, as an aggregated output, the first output and the second output; determining, by a second machine-learned model and based at least in part on the aggregated output, an attribute associated with the object in an environment, wherein the attribute indicates at least one of: an indication of an object motion state, an indication of an object indicator state, an indication that the object is idling, an indication that the object intends to enter a roadway, or an indication that the object is not associated with the roadway; and controlling a vehicle based at least in part on the attribute.
13. The one or more non-transitory computer-readable media of claim 12, wherein the first machine-learned model comprises an object detection machine-learned model and a set of machine-learned layers that receives outputs from the object detection machine-learned model.
14. The one or more non-transitory computer-readable media of claim 13, wherein aggregating the first output and the second output comprises: determining, by the object detection machine-learned model
Control of position or course in two dimensions [2D] · CPC title
using external object recognition · CPC title
Learning methods · CPC title
from positioning sensors located off-board the vehicle, e.g. from cameras · CPC title
Recognition of whole body movements, e.g. for sport training · CPC title