Data segmentation using masks
US-2020218278-A1 · Jul 9, 2020 · US
US11620753B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11620753-B2 |
| Application number | US-202117541580-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 3, 2021 |
| Priority date | Apr 26, 2018 |
| Publication date | Apr 4, 2023 |
| Grant date | Apr 4, 2023 |
Abstract
A vehicle can include various sensors to detect objects in an environment. Sensor data can be captured by a perception system in a vehicle and represented in a voxel space. Operations may include analyzing the data from a top-down perspective. From this perspective, techniques can associate and generate masks that represent objects in the voxel space. Through manipulation of the regions of the masks, the sensor data and/or voxels associated with the masks can be clustered or otherwise grouped to segment data associated with the objects.
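The top-down idea in the abstract can be illustrated with a minimal sketch. It assumes the voxel space is a boolean NumPy occupancy grid of shape (X, Y, Z) and that a mask for one object is already available as a boolean (X, Y) array. The function name `segment_with_mask`, the grid size, and the example data are hypothetical and do not come from the patent, which does not prescribe a particular data layout.

```python
import numpy as np


def segment_with_mask(occupancy: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Return (x, y, z) indices of occupied voxels whose column lies under the mask.

    occupancy: bool array of shape (X, Y, Z); True where a voxel holds sensor data.
    mask: bool array of shape (X, Y); a top-down mask for one object (assumed given).
    """
    if occupancy.shape[:2] != mask.shape:
        raise ValueError("mask must match the top-down footprint of the voxel grid")
    # Broadcast the 2D mask along the height axis, then keep occupied voxels only.
    selected = occupancy & mask[:, :, None]
    return np.argwhere(selected)


# Hypothetical 4 x 4 x 3 voxel grid containing two objects.
occupancy = np.zeros((4, 4, 3), dtype=bool)
occupancy[0, 0, 0:2] = True  # object A, two voxels tall
occupancy[0, 1, 0] = True    # object A, one voxel
occupancy[3, 3, 0:3] = True  # object B, three voxels tall

# Top-down mask covering the footprint of object A only.
mask_a = np.zeros((4, 4), dtype=bool)
mask_a[0:2, 0:2] = True

voxels_a = segment_with_mask(occupancy, mask_a)
print(voxels_a.tolist())  # [[0, 0, 0], [0, 0, 1], [0, 1, 0]]
```

Every voxel in a column under the mask is assigned to the object regardless of its height, so the grouping step reduces to a 2D lookup. How the mask itself is produced (for example by a machine learning model) is outside this sketch.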
Claims
What is claimed is:
1. A system comprising: one or more processors; and one or more computer-readable media storing instructions executable by the one or more processors, wherein the instructions, when executed, cause the system to perform operations comprising: receiving sensor data indicative of at least a portion of an object in an environment; associating the sensor data with a three-dimensional space; determining a representation associated with a portion of the three-dimensional space, the representation representing at least a portion of the object using a different perspective; and determining, based at least in part on the representation, at least a portion of the sensor data associated with the object.
2. The system of claim 1, wherein: the representation comprises at least one of a two-dimensional mask, a bounding box, or segmentation information; and the three-dimensional space comprises a voxel space.
3. The system of claim 1, wherein the representation is a first representation, the operations further comprising: determining a second representation based at least in part on the first representation and a third representation associated with another detected object in the three-dimensional space.
4. The system of claim 1, the operations further comprising: determining, for a region of the three-dimensional space, a semantic segmentation probability; and determining the portion of the sensor data associated with the object further based at least in part on the semantic segmentation probability.
5. The system of claim 4, wherein the semantic segmentation probability is one of a plurality of semantic segmentation probabilities associated with the region.
6. The system of claim 1, wherein determining the at least the portion of the sensor data comprises segmenting the sensor data using a region growing algorithm.
7. One or more non-transitory computer-readable media storing instructions executable by one or more processors, wherein the instructions, when executed, cause the one or more processors to perform operations comprising: receiving sensor data indicative of an object in an environment; associating the sensor data with a three-dimensional space; determining a representation associated with a portion of the three-dimensional space, the representation representing at least a portion of the object using a different perspective; and determining, based at least in part on the representation, a portion of the sensor data associated with the object.
8. The one or more non-transitory computer-readable media of claim 7, the operations further comprising: generating, based at least in part on the portion of the sensor data associated with the object, a trajectory for an autonomous vehicle; and controlling, based at least in part on the trajectory, the autonomous vehicle to traverse the environment.
9. The one or more non-transitory computer-readable media of claim 7, the operations further comprising: associating the sensor data with a multi-channel image; inputting the multi-channel image into a machine learning algorithm; and receiving, as the representation, an output of the machine learning algorithm.
10. The one or more non-transitory computer-readable media of claim 9, wherein the multi-channel image comprises a number of channels based at least in part on a height of a voxel space associated with the multi-channel image and one or more features.
11. The one or more non-transitory computer-readable media of claim 10, wherein the one or more features comprise at least one of: an average of sensor data, a number of times sensor data is associated with a voxel, a covariance of sensor data, a probability of a voxel belonging to one or more classifications, a ray casting information associated with a voxel; or an occupancy of a voxel.
12. The one or more non-transitory computer-readable media of claim 7, wherein the representation is further based at least in part on a classification associated with the object.
13. The one or more non-transitory computer-readable media of claim 12, wherein the classification is at least one or more of a vehicle, a bicycle, or a pedestrian.
14. The one or more non-transitory computer-readable media of claim 7, wherein the representation is a first representation, the operations further comprising: generating a second representation based at least in part on an intersection of an expansion of the first representation and a third representation associated with another object associated with the three-dimensional space.
15. The one or more non-transitory computer-readable media of claim 7, wherein determining the portion of the sensor data associated with the object comprises at least one of: associating the portion of the sensor data with the representation; associating one or more pseudo pixels of a multi-channel image with the representation; or associating one or more voxels of a voxel space with the representation.
16. A method comprising: receiving sensor data representing at least a portion of an object in an environment; associating the sensor data with a three-dimensional space; receiving a representation associated with a portion of the three-dimensional space, the representation representing at least a portion of the object using a different perspective; and determining, based at least in part on the representation, at least a portion of the sensor data associated with the object.
17. The method of claim 16, wherein: the representation comprises at least one of a two-dimensional mask, a bounding box, or segmentation information; and the three-dimensional space comprises a voxel space.
18. The method of claim 16, wherein the representation is a first representation, the method further comprising: determining a second representation based at least in part on the first representation and a third representation associated with another detected object in the three-dimensional space.
19. The method of claim 16, further comprising: inputting a multi-channel image representing a voxel space into a machine learning algorithm; and receiving, as the representation, an output of the machine learning algorithm, wherein the multi-channel image comprises a length associated with a first dimension of the three-dimensional space, a width associated with a second dimension of the three-dimensional space, and a number of channels, and further wherein the number of channels is based, at least in part, on a third dimension of the three-dimensional space.
20. The method of claim 19, wherein the number of channels is further based at least in part on one or more features comprising at least one of: an average of sensor data; a covariance of sensor data; a number of observations of sensor data; occupancy data; or one or more probabilities associated with a semantic classification.
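Claims 10, 19, and 20 tie the channel count of the multi-channel image to the voxel-space height and the per-voxel features. One plausible reading, shown here only as a sketch, is that a grid of shape (X, Y, Z) with F features per voxel flattens to an image of shape (X, Y, Z * F), where each (x, y) location acts as a pseudo pixel. The function name `voxels_to_multichannel_image`, the grid dimensions, and the choice of two features (occupancy and observation count) are assumptions for illustration; the claims do not fix an exact channel layout.

```python
import numpy as np


def voxels_to_multichannel_image(features: np.ndarray) -> np.ndarray:
    """Flatten per-voxel features of shape (X, Y, Z, F) into an image (X, Y, Z * F).

    Channel index z * F + f holds feature f of the voxel at height z.
    """
    if features.ndim != 4:
        raise ValueError("expected an array of shape (X, Y, Z, F)")
    x, y, z, f = features.shape
    return features.reshape(x, y, z * f)


# Hypothetical grid: 8 x 8 footprint, 4 voxels tall, 2 features per voxel.
rng = np.random.default_rng(0)
counts = rng.integers(0, 3, size=(8, 8, 4))  # observations per voxel (made up)
features = np.stack(
    [(counts > 0).astype(np.float32),  # feature 0: occupancy
     counts.astype(np.float32)],       # feature 1: observation count
    axis=-1,
)

image = voxels_to_multichannel_image(features)
print(image.shape)  # (8, 8, 8)
```

An image in this form can be passed to a 2D convolutional network, which would then return a top-down representation such as a mask. The network itself is not sketched here.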
Planning or execution of driving tasks · CPC title
Pedestrians · CPC title
Cycles · CPC title
Learning methods · CPC title
Convolutional networks [CNN, ConvNet] · CPC title