Deep structured scene flow for autonomous devices

US12198358B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12198358-B2
Application number  US-202217962624-A
Country             US
Kind code           B2
Filing date         Oct 10, 2022
Priority date       Nov 16, 2018
Publication date    Jan 14, 2025
Grant date          Jan 14, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems, methods, tangible non-transitory computer-readable media, and devices associated with motion flow estimation are provided. For example, scene data including representations of an environment over a first set of time intervals can be accessed. Extracted visual cues can be generated based on the representations and machine-learned feature extraction models. At least one of the machine-learned feature extraction models can be configured to generate a portion of the extracted visual cues based on a first set of the representations of the environment from a first perspective and a second set of the representations of the environment from a second perspective. The extracted visual cues can be encoded using energy functions. Three-dimensional motion estimates of object instances at time intervals subsequent to the first set of time intervals can be determined based on the energy functions and machine-learned inference models.
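As a rough illustration of how cues from two perspectives and two time steps could combine into a three-dimensional motion vector, the sketch below lifts a pixel to 3D with the standard rectified-stereo relation Z = f * B / d, then differences the lifted points across time using an optical flow offset. It is not the method described in the abstract, which encodes cues in energy functions and relies on machine-learned models. The function names, the pinhole camera parameters, and the idea of treating a single pixel in isolation are assumptions made only for illustration.

```python
from typing import Tuple

Point3 = Tuple[float, float, float]


def backproject(u: float, v: float, disparity: float,
                focal_px: float, baseline_m: float,
                cx: float, cy: float) -> Point3:
    """Lift pixel (u, v) with a known stereo disparity to a 3D camera-frame point.

    Assumes a rectified stereo pair and a pinhole model (hypothetical setup).
    """
    if disparity <= 0:
        raise ValueError("disparity must be positive")
    z = focal_px * baseline_m / disparity   # depth from disparity
    x = (u - cx) * z / focal_px
    y = (v - cy) * z / focal_px
    return (x, y, z)


def pixel_scene_flow(u: float, v: float,
                     disp_t0: float, flow_uv: Tuple[float, float], disp_t1: float,
                     focal_px: float, baseline_m: float,
                     cx: float, cy: float) -> Point3:
    """3D displacement of one pixel between two times.

    disp_t0 / disp_t1 play the role of stereo cues at the first and second time;
    flow_uv plays the role of an optical flow cue (pixel offset between times).
    """
    p0 = backproject(u, v, disp_t0, focal_px, baseline_m, cx, cy)
    u1, v1 = u + flow_uv[0], v + flow_uv[1]
    p1 = backproject(u1, v1, disp_t1, focal_px, baseline_m, cx, cy)
    return (p1[0] - p0[0], p1[1] - p0[1], p1[2] - p0[2])
```

For example, with a hypothetical focal length of 1000 px and a 0.5 m baseline, a pixel at the principal point whose disparity halves from 50 px to 25 px has moved from 10 m to 20 m in depth. Per-pixel differencing like this is noisy, which is one reason to fit a single motion per object instance instead.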

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: accessing a first pair of stereo images and a second pair of stereo images of an environment of an autonomous vehicle from a pair of stereo cameras, wherein the first pair of stereo images comprises a first image of the environment from a first perspective at a first time and a second image of the environment from a second perspective at the first time, wherein the second pair of stereo images comprise a first image of the environment from the first perspective at a second time and a second image of the environment from the second perspective at the second time; generating, in less than 10 seconds, a three-dimensional motion estimate for an object in the environment using a machine-learned motion flow model by: determining, using the machine-learned motion flow model, a plurality of extracted features from the first pair of stereo images and the second pair of stereo images, wherein at least one feature of the plurality of features describes an object instance associated with an object in the environment; and processing, using the machine-learned motion flow model, the plurality of extracted features, wherein the machine-learned motion flow model is configured to solve an energy optimization function for characterizing three-dimensional rigid motion of the object instance; and controlling a motion of the autonomous vehicle based on the three-dimensional motion estimate.

2. The computer-implemented method of claim 1, wherein the three-dimensional motion estimate is generated in less than 1 second.

3. The computer-implemented method of claim 2, wherein processing, using the machine-learned motion flow model, the plurality of extracted features comprises: executing a solver to optimize, over a plurality of steps, one or more energy terms associated with motion of the object instance; wherein the solver executes in less than 1 second.

4. The computer-implemented method of claim 1, wherein the machine-learned motion flow model comprises a machine-learned segmentation model.

5. The computer-implemented method of claim 4, wherein the machine-learned segmentation model is configured to associate the object instance with a portion of at least one of the first pair of stereo images and the second pair of stereo images.

6. The computer-implemented method of claim 1, wherein the machine-learned motion flow model is trained end-to-end.

7. The computer-implemented method of claim 1, wherein the plurality of extracted features from the first pair of stereo images and the second pair of stereo images comprise one or more visual cues, the one or more visual cues comprising at least one of an instance segmentation cue, an optical flow cue, or a stereo cue.

8. The computer-implemented method of claim 1, wherein the machine-learned motion flow model is configured to solve the energy optimization function using a Gaussian-Newton (GN) algorithm implemented as layers in a neural network.

9. The computer-implemented method of claim 1, the method further comprising: removing uncertain pixels from the plurality of extracted features before processing the plurality of extracted features with the machine-learned motion flow model.

10. A computing system comprising: one or more processors; and one or more tangible non-transitory computer readable media storing computer-readable instructions that are executable by the one or more processors to cause the one or more processors to perform operations, the operations comprising: accessing a first pair of stereo images and a second pair of stereo images of an environment of an autonomous vehicle from a pair of stereo cameras, wherein the first pair of stereo images comprises a first image of the environment from a first perspective at a first time and a second image of the environment from a second perspective at the first time, wherein the second pair of stereo images comprise a first image of the environment from the first perspective at a second time and a second image of the environment from the second perspective at the second time; generating, in less than 10 seconds, a three-dimensional motion estimate for an object in the environment using a machine-learned motion flow model by: determining, using the machine-learned motion flow model, a plurality of extracted features from the first pair of stereo images and the second pair of stereo images, wherein at least one feature of the plurality of features describes an object instance associated with an object in the environment; and processing, using the machine-learned motion flow model, the plurality of extracted features, wherein the machine-learned motion flow model is configured to solve an energy optimization function for characterizing three-dimensional rigid motion of the object instance; and controlling a motion of the autonomous vehicle based on the three-dimensional motion estimate.

11. The computing system of claim 10, wherein the three-dimensional motion estimate is generated in less than 1 second.

12. The computing system of claim 10, wherein the plurality of extracted features from the first pair of stereo images and the second pair of stereo images comprise one or more visual cues, the one or more visual cues comprising at least one of an instance segmentation cue, an optical flow cue, or a stereo cue.

13. The computing system of claim 10, wherein processing, using the machine-learned motion flow model, the plurality of extracted features comprises: executing a solver to optimize, over a plurality of steps, one or more energy terms associated with motion of the object instance; wherein the solver executes in less than 1 second.

14. The computing system of claim 13, wherein the plurality of steps are implemented as a plurality of layers of a neural network.

15. The computing system of claim 14, comprising: a graphical processing unit (GPU); wherein the operations comprise: executing the layers of the neural network on the GPU.

16. The computing system of claim 10, wherein the machine-learned motion flow model is trained end-to-end.

17. The computing system of claim 10, wherein the pair of stereo cameras is configured to obtain the first pair of stereo images and the second pair of stereo images of an environment associated with an augmented reality system.

18. One or more tangible non-transitory computer readable media storing computer-readable instructions that are executable by one or more processors to cause the one or more processors to perform operations, the operations comprising: accessing a first pair of stereo images and a second pair of stereo images of an environment of an autonomous vehicle from a pair of stereo cameras, wherein the first pair of stereo images comprises a first image of the environment from a first perspective at a first time and a second image of the environment from a second perspective at the first time, wherein the second pair of stereo images comprise a first image of the environment from the first perspective at a second time and a second image of the environment from the second perspective at the second time; generating, in less than 10 seconds, a three-dimensional motion estimate for an object in the environment using a machine-learned motion flow model by: determining, using the machine-learned motion flow model, a plurality of extracted features from the first pair of stereo images and the second pair of stereo images, wherein at least one feature of the plurality of features describes an object instance associated with an object in t
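The claims refer to a solver that optimizes energy terms over a plurality of steps, a Gauss-Newton style algorithm, and removal of uncertain pixels before processing. The claim preview does not spell out the energy terms, so the sketch below substitutes a simple point-to-point energy, E(R, t) = sum_i ||R p_i + t - q_i||^2, over 3D correspondences for one object instance. It runs a fixed number of Gauss-Newton steps in plain Python, loosely analogous to unrolling the steps, and drops correspondences whose confidence is below a threshold. The function name fit_rigid_motion, the helper names, the confidence scores, and the threshold are all hypothetical; this is not the claimed model or a neural network implementation.

```python
import math


def _skew(v):
    x, y, z = v
    return [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]


def _matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def _matvec(m, v):
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


def _exp_so3(w):
    """Rodrigues' formula: rotation matrix for the axis-angle vector w."""
    theta = math.sqrt(sum(c * c for c in w))
    k = _skew(w)
    k2 = _matmul(k, k)
    if theta < 1e-9:
        a, b = 1.0, 0.5                      # small-angle limits
    else:
        a, b = math.sin(theta) / theta, (1.0 - math.cos(theta)) / theta ** 2
    return [[(1.0 if i == j else 0.0) + a * k[i][j] + b * k2[i][j]
             for j in range(3)] for i in range(3)]


def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_rigid_motion(src, dst, conf, min_conf=0.5, steps=5):
    """Fixed-step Gauss-Newton fit of rotation R and translation t with R p + t ~ q."""
    # Remove uncertain correspondences before solving (hypothetical threshold).
    pairs = [(p, q) for p, q, c in zip(src, dst, conf) if c >= min_conf]
    rot = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    t = [0.0, 0.0, 0.0]
    for _ in range(steps):
        h = [[0.0] * 6 for _ in range(6)]    # J^T J
        g = [0.0] * 6                        # J^T r
        for p, q in pairs:
            rp = _matvec(rot, p)
            r = [rp[i] + t[i] - q[i] for i in range(3)]
            s = _skew(rp)
            # Jacobian of the residual w.r.t. (rotation increment, translation): [-skew(Rp) | I]
            jac = [[-s[i][0], -s[i][1], -s[i][2]] + [1.0 if i == j else 0.0 for j in range(3)]
                   for i in range(3)]
            for a in range(6):
                g[a] += sum(jac[i][a] * r[i] for i in range(3))
                for b in range(6):
                    h[a][b] += sum(jac[i][a] * jac[i][b] for i in range(3))
        delta = _solve(h, [-x for x in g])   # normal equations: (J^T J) delta = -J^T r
        rot = _matmul(_exp_so3(delta[:3]), rot)
        t = [t[i] + delta[3 + i] for i in range(3)]
    return rot, t
```

The function needs at least three non-collinear confident correspondences, otherwise the 6x6 system is singular. Every operation in the loop is differentiable, which is what makes it possible in principle to place such steps inside a network and train the pipeline end-to-end as claims 6 and 16 describe.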

Assignees

Aurora Operations Inc

Inventors

Classifications

  • exterior to a vehicle by using sensors mounted on the vehicle · CPC title

  • using neural networks · CPC title

  • Extraction of image or video features · CPC title

  • Determining parameters from multiple pictures (depth or shape recovery from multiple images G06T7/55; stereo camera calibration G06T7/85) · CPC title

  • Stereo images · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12198358B2 cover?
Systems, methods, tangible non-transitory computer-readable media, and devices associated with motion flow estimation are provided. For example, scene data including representations of an environment over a first set of time intervals can be accessed. Extracted visual cues can be generated based on the representations and machine-learned feature extraction models. At least one of the machine-le…
Who is the assignee on this patent?
Aurora Operations Inc
What technology area does this patent fall under?
Primary CPC classification G06T7/285. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 14, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).