Data segmentation using masks

US11620753B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11620753-B2
Application number: US-202117541580-A
Country: US
Kind code: B2
Filing date: Dec 3, 2021
Priority date: Apr 26, 2018
Publication date: Apr 4, 2023
Grant date: Apr 4, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A vehicle can include various sensors to detect objects in an environment. Sensor data can be captured by a perception system in a vehicle and represented in a voxel space. Operations may include analyzing the data from a top-down perspective. From this perspective, techniques can associate and generate masks that represent objects in the voxel space. Through manipulation of the regions of the masks, the sensor data and/or voxels associated with the masks can be clustered or otherwise grouped to segment data associated with the objects.
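
The flow described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the voxel size, the function names, and the hand-written masks are all assumptions made for illustration. In the patent the masks would be produced by the perception system, whereas here they are simply given as sets of top-down grid cells. The sketch bins points into voxels, then assigns each voxel's points to whichever mask covers its (x, y) cell.

```python
from collections import defaultdict

VOXEL_SIZE = 0.5  # metres per voxel edge; an assumed value, not from the patent


def voxel_index(point, voxel_size=VOXEL_SIZE):
    """Map an (x, y, z) point to integer voxel coordinates."""
    x, y, z = point
    return (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))


def build_voxel_space(points, voxel_size=VOXEL_SIZE):
    """Group raw sensor points by the voxel that contains them."""
    voxels = defaultdict(list)
    for p in points:
        voxels[voxel_index(p, voxel_size)].append(p)
    return voxels


def segment_with_masks(voxels, masks):
    """Assign points to objects using top-down masks.

    masks: dict mapping a hypothetical object id to a set of (ix, iy) cells.
    A voxel belongs to the first mask that covers its top-down cell,
    whatever its height index iz is.
    """
    segments = {obj: [] for obj in masks}
    for (ix, iy, _iz), pts in voxels.items():
        for obj, cells in masks.items():
            if (ix, iy) in cells:
                segments[obj].extend(pts)
                break
    return segments


# Four sample points: two stacked vertically, one elsewhere, one unmasked.
points = [(0.1, 0.2, 0.3), (0.2, 0.3, 1.4), (3.1, 3.2, 0.2), (9.0, 9.0, 0.1)]
masks = {"object_a": {(0, 0)}, "object_b": {(6, 6)}}

voxels = build_voxel_space(points)
segments = segment_with_masks(voxels, masks)
```

Because the mask is two-dimensional, the two points at different heights above cell (0, 0) both land in `object_a`, while the point at (9.0, 9.0, 0.1) falls under no mask and stays unassigned.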

First claim

Opening claim text (preview).

What is claimed is:

1. A system comprising: one or more processors; and one or more computer-readable media storing instructions executable by the one or more processors, wherein the instructions, when executed, cause the system to perform operations comprising: receiving sensor data indicative of at least a portion of an object in an environment; associating the sensor data with a three-dimensional space; determining a representation associated with a portion of the three-dimensional space, the representation representing at least a portion of the object using a different perspective; and determining, based at least in part on the representation, at least a portion of the sensor data associated with the object.

2. The system of claim 1, wherein: the representation comprises at least one of a two-dimensional mask, a bounding box, or segmentation information; and the three-dimensional space comprises a voxel space.

3. The system of claim 1, wherein the representation is a first representation, the operations further comprising: determining a second representation based at least in part on the first representation and a third representation associated with another detected object in the three-dimensional space.

4. The system of claim 1, the operations further comprising: determining, for a region of the three-dimensional space, a semantic segmentation probability; and determining the portion of the sensor data associated with the object further based at least in part on the semantic segmentation probability.

5. The system of claim 4, wherein the semantic segmentation probability is one of a plurality of semantic segmentation probabilities associated with the region.

6. The system of claim 1, wherein determining the at least the portion of the sensor data comprises segmenting the sensor data using a region growing algorithm.

7. One or more non-transitory computer-readable media storing instructions executable by one or more processors, wherein the instructions, when executed, cause the one or more processors to perform operations comprising: receiving sensor data indicative of an object in an environment; associating the sensor data with a three-dimensional space; determining a representation associated with a portion of the three-dimensional space, the representation representing at least a portion of the object using a different perspective; and determining, based at least in part on the representation, a portion of the sensor data associated with the object.

8. The one or more non-transitory computer-readable media of claim 7, the operations further comprising: generating, based at least in part on the portion of the sensor data associated with the object, a trajectory for an autonomous vehicle; and controlling, based at least in part on the trajectory, the autonomous vehicle to traverse the environment.

9. The one or more non-transitory computer-readable media of claim 7, the operations further comprising: associating the sensor data with a multi-channel image; inputting the multi-channel image into a machine learning algorithm; and receiving, as the representation, an output of the machine learning algorithm.

10. The one or more non-transitory computer-readable media of claim 9, wherein the multi-channel image comprises a number of channels based at least in part on a height of a voxel space associated with the multi-channel image and one or more features.

11. The one or more non-transitory computer-readable media of claim 10, wherein the one or more features comprise at least one of: an average of sensor data, a number of times sensor data is associated with a voxel, a covariance of sensor data, a probability of a voxel belonging to one or more classifications, a ray casting information associated with a voxel; or an occupancy of a voxel.

12. The one or more non-transitory computer-readable media of claim 7, wherein the representation is further based at least in part on a classification associated with the object.

13. The one or more non-transitory computer-readable media of claim 12, wherein the classification is at least one or more of a vehicle, a bicycle, or a pedestrian.

14. The one or more non-transitory computer-readable media of claim 7, wherein the representation is a first representation, the operations further comprising: generating a second representation based at least in part on an intersection of an expansion of the first representation and a third representation associated with another object associated with the three-dimensional space.

15. The one or more non-transitory computer-readable media of claim 7, wherein determining the portion of the sensor data associated with the object comprises at least one of: associating the portion of the sensor data with the representation; associating one or more pseudo pixels of a multi-channel image with the representation; or associating one or more voxels of a voxel space with the representation.

16. A method comprising: receiving sensor data representing at least a portion of an object in an environment; associating the sensor data with a three-dimensional space; receiving a representation associated with a portion of the three-dimensional space, the representation representing at least a portion of the object using a different perspective; and determining, based at least in part on the representation, at least a portion of the sensor data associated with the object.

17. The method of claim 16, wherein: the representation comprises at least one of a two-dimensional mask, a bounding box, or segmentation information; and the three-dimensional space comprises a voxel space.

18. The method of claim 16, wherein the representation is a first representation, the method further comprising: determining a second representation based at least in part on the first representation and a third representation associated with another detected object in the three-dimensional space.

19. The method of claim 16, further comprising: inputting a multi-channel image representing a voxel space into a machine learning algorithm; and receiving, as the representation, an output of the machine learning algorithm, wherein the multi-channel image comprises a length associated with a first dimension of the three-dimensional space, a width associated with a second dimension of the three-dimensional space, and a number of channels, and further wherein the number of channels is based, at least in part, on a third dimension of the three-dimensional space.

20. The method of claim 19, wherein the number of channels is further based at least in part on one or more features comprising at least one of: an average of sensor data; a covariance of sensor data; a number of observations of sensor data; occupancy data; or one or more probabilities associated with a semantic classification.
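
Claims 10, 19, and 20 describe a multi-channel image whose channel count depends on the height of the voxel space and on the per-voxel features. One possible reading, sketched below, is that each top-down cell carries one group of feature channels per height layer, giving height × features channels in total. The two features used (occupancy and observation count), the channel ordering, and every name in the code are assumptions for illustration and are not specified by the claims.

```python
FEATURES = ("occupancy", "count")  # two illustrative per-voxel features


def multi_channel_image(voxel_counts, length, width, height):
    """Flatten a voxel grid into a top-down multi-channel image.

    voxel_counts: dict mapping (ix, iy, iz) to the number of sensor
    returns observed in that voxel.
    Returns image[ix][iy], a list of height * len(FEATURES) channel
    values, laid out layer by layer from the lowest voxel upward.
    """
    n_channels = height * len(FEATURES)
    image = [[[0.0] * n_channels for _ in range(width)] for _ in range(length)]
    for (ix, iy, iz), count in voxel_counts.items():
        if not (0 <= ix < length and 0 <= iy < width and 0 <= iz < height):
            continue  # ignore voxels that fall outside the grid
        base = iz * len(FEATURES)
        image[ix][iy][base] = 1.0 if count > 0 else 0.0  # occupancy
        image[ix][iy][base + 1] = float(count)           # observation count
    return image


# A 2 x 2 x 3 voxel space with three occupied voxels.
voxel_counts = {(0, 0, 0): 3, (0, 0, 2): 1, (1, 1, 1): 5}
image = multi_channel_image(voxel_counts, length=2, width=2, height=3)
```

With a height of 3 voxels and 2 features, each top-down cell (a "pseudo pixel" in the wording of claim 15) holds 6 channels. An image in this form could be passed to a two-dimensional model, which is how claim 9 describes obtaining the representation.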

Assignees

Zoox Inc

Inventors

Classifications

Primary CPC: G06T7/11

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11620753B2 cover?
A vehicle can include various sensors to detect objects in an environment. Sensor data can be captured by a perception system in a vehicle and represented in a voxel space. Operations may include analyzing the data from a top-down perspective. From this perspective, techniques can associate and generate masks that represent objects in the voxel space. Through manipulation of the regions of the …
Who is the assignee on this patent?
Zoox Inc
What technology area does this patent fall under?
Primary CPC classification G06T7/11. Mapped technology areas include Physics.
When was this patent published?
Publication date Apr 4, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).