Patient volume segmentation apparatus and method
US-2024408409-A1 · Dec 12, 2024 · US
US12394220B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12394220-B2 |
| Application number | US-202318161661-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 30, 2023 |
| Priority date | Nov 10, 2022 |
| Publication date | Aug 19, 2025 |
| Grant date | Aug 19, 2025 |
Official abstract text for this publication.
Embodiments described herein provide a system for three-dimensional (3D) object detection. The system includes an input interface configured to obtain 3D point data describing spatial information of a plurality of points, and a memory storing a neural network based 3D object detection model having an encoder and a decoder. The system also includes processors to perform operations including: encoding, by the encoder, a first set of coordinates into a first set of point features and a set of object features; sampling a second set of point features from the first set of point features; generating, by attention layers at the decoder, a set of attention weights by applying cross-attention over at least the set of object features and the second set of point features; and generating, by the decoder, a predicted bounding box among the plurality of points based at least in part on the set of attention weights.
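The cross-attention step in the abstract can be illustrated with a minimal sketch. It assumes scaled dot-product attention, with object features as queries and sampled point features as keys; the abstract does not specify the attention formulation, and the learned query/key projections of a real decoder are omitted. The function name `cross_attention_weights` and the toy feature values are hypothetical.

```python
import math

def cross_attention_weights(object_feats, point_feats):
    """Return one softmax-normalized weight row per object feature.

    Assumption: scaled dot-product attention (object features act as
    queries, point features as keys). The patent text does not fix this
    choice; it is used here only for illustration.
    """
    dim = len(object_feats[0])
    scale = 1.0 / math.sqrt(dim)
    weights = []
    for q in object_feats:
        # Similarity of this object query to every point feature.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in point_feats]
        # Numerically stable softmax over the point axis.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Toy data: 2 object features, 3 sampled point features, dimension 2.
object_feats = [[1.0, 0.0], [0.0, 1.0]]
point_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attn = cross_attention_weights(object_feats, point_feats)
```

Each row of `attn` sums to 1 and indicates how strongly one object feature attends to each sampled point; a decoder would use such weights when aggregating point features for the bounding-box prediction.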
Opening claim text (preview).
What is claimed is:

1. A system for three-dimensional (3D) object detection, the system comprising: an input interface configured to obtain 3D point data including a plurality of coordinates describing spatial information of a plurality of points; a memory storing a neural network based 3D object detection model comprising an encoder and a decoder, and a plurality of processor-executable instructions; and one or more processors executing the plurality of processor-executable instructions to perform operations comprising: encoding, by the encoder, a first set of coordinates into a first set of point features and a set of object features; sampling a second set of point features from the first set of point features, wherein the second set of point features are obtained by: upsampling the first set of coordinates into a second set of coordinates that contains more sample points than the first set of coordinates; determining, for each sampled point in the second set of coordinates, a respective subset of nearest neighbors from the first set of point features; and computing a corresponding point feature for the each sampled point in the second set of coordinates based on an interpolation of the respective subset of nearest neighbors; generating, by one or more attention layers at the decoder, a set of attention weights by applying cross-attention over at least the set of object features and the second set of point feature, and generate, by the decoder, a predicted bounding box among the plurality of points based on at least in part on the set of attention weights.

2. The system of claim 1, wherein the determining of the second set of point features comprises: determining, by the encoder, three nearest neighbor points of the each sampled point in the second set of coordinates; determining, by the encoder, point features of the three nearest neighbor points in the first set of point features; performing, by the encoder, a weighted interpolation of the point features of the three nearest neighbor points; and projecting, by the encoder, the interpolated point feature into a feature representation of the each sampled point in the second set of coordinates.

3. The system of claim 2, wherein the weighted interpolation comprises weighting each of the point features of the three nearest neighbor points by an inverse of the respective Euclidean distance to the each sampled point in the second set of coordinates.

4. The system of claim 1, wherein the generating of the set of attention weights comprises: generating a first attention weight using the first set of point features and the set of object features; generating a second attention weight using the second set of point features and the set of object features; and concatenating the first attention weight and the second attention weight to form the set of attention weights.

5. The system of claim 1, wherein the second set of coordinates contains at least twice a number of sampled points than the first set of coordinates.

6. The system of claim 1, wherein the second set of point features are obtained by: predicting, by the decoder, an intermediate bounding box proposal based on the set of object features; performing cross-attention between the set of object features and candidate points in the intermediate bounding box proposal; and determining, from the first set of point features, a sampled point feature that belongs to the intermediate bounding box proposal based on the cross-attention.

7. The system of claim 6, wherein the set of attention weights are obtained by: performing multi-head attention between a batch of object features from the set of object features and a batch of point features from the second set of point features.

8. The system of claim 7, wherein the batch of point features are obtained by processing the second set of point features to have a same token length through padding or truncating tokens.

9. A method of three-dimensional (3D) object detection, the method comprising: receiving, via a data interface, 3D point data including a plurality of coordinates describing spatial information of a plurality of points; encoding, by an encoder, a first set of coordinates into a first set of point features and a set of object features; sampling a second set of point features from the first set of point features, wherein the second set of point features are obtained by: upsampling the first set of coordinates into a second set of coordinates that contains more sample points than the first set of coordinates; determining, for each sampled point in the second set of coordinates, a respective subset of nearest neighbors from the first set of point features; and computing a corresponding point feature for the each sampled point in the second set of coordinates based on an interpolation of the respective subset of nearest neighbors; generating, by one or more attention layers at a decoder, a set of attention weights by applying cross-attention over at least the set of object features and the second set of point feature, and generate, by the decoder, a predicted bounding box among the plurality of points based on at least in part on the set of attention weights.

10. The method of claim 9, wherein the determining of the second set of point features comprises: determining, by the encoder, three nearest neighbor points of the each sampled point in the second set of coordinates; determining, by the encoder, point features of the three nearest neighbor points in the first set of point features; performing, by the encoder, a weighted interpolation of the point features of the three nearest neighbor points; and projecting, by the encoder, the interpolated point feature into a feature representation of the each sampled point in the second set of coordinates.

11. The method of claim 10, wherein the performing of the weighted interpolation comprises weighting each of the point features of the three nearest neighbor points by an inverse of the respective Euclidean distance to the each sampled point in the second set of coordinates.

12. The method of claim 9, wherein the generating of the set of attention weights comprises: generating a first attention weight using the first set of point features and the set of object features; generating a second attention weight using the second set of point features and the set of object features; and concatenating the first attention weight and the second attention weight to form the set of attention weights.

13. The method of claim 9, wherein the second set of coordinates contains at least twice a number of sampled points than the first set of coordinates.

14. The method of claim 9, wherein the second set of point features are obtained by: predicting, by the decoder, an intermediate bounding box proposal based on the set of object features; performing cross-attention between the set of object features and candidate points in the intermediate bounding box proposal; and determining, from the first set of point features, a sampled point feature that belongs to the intermediate bounding box proposal based on the cross-attention.

15. The method of claim 14, wherein the set of attention weights are obtained by: performing multi-head attention between a batch of object features from the set of object features and a batch of point features from the second set of point features.

16. The method of claim 15, wherein the batch of point features are obtained by processing the second set of point features to have a same token length through padding or truncating tokens.
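The interpolation recited in claims 2-3 and 10-11 (three nearest neighbors, weighted by inverse Euclidean distance) can be sketched as follows. This is an illustrative reading of the claim language, not the patentee's implementation: the function name `interpolate_features`, the `eps` guard against division by zero, and the toy coordinates are assumptions, the upsampled coordinates are taken as given, and the learned projection step of claims 2 and 10 is omitted.

```python
import math

def interpolate_features(coarse_xyz, coarse_feats, dense_xyz, k=3, eps=1e-8):
    """Inverse-distance-weighted interpolation over the k nearest coarse points.

    coarse_xyz / coarse_feats: the "first set" of coordinates and point features.
    dense_xyz: the upsampled "second set" of coordinates (assumed given).
    eps is a hypothetical guard so a zero distance does not divide by zero.
    """
    dim = len(coarse_feats[0])
    dense_feats = []
    for p in dense_xyz:
        # k nearest coarse points by Euclidean distance, as (distance, index).
        nearest = sorted((math.dist(p, c), i) for i, c in enumerate(coarse_xyz))[:k]
        inv = [1.0 / (d + eps) for d, _ in nearest]
        total = sum(inv)
        # Weighted average of the neighbors' features, one channel at a time.
        feat = [
            sum(w * coarse_feats[i][j] for w, (_, i) in zip(inv, nearest)) / total
            for j in range(dim)
        ]
        dense_feats.append(feat)
    return dense_feats

# Toy data: 4 coarse points with 1-D features, 2 upsampled query points.
coarse_xyz = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (10.0, 10.0, 10.0)]
coarse_feats = [[1.0], [3.0], [5.0], [100.0]]
dense_xyz = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
dense_feats = interpolate_features(coarse_xyz, coarse_feats, dense_xyz)
```

In the toy data, the first query point coincides with a coarse point, so its feature is dominated by that point; the second is equidistant from three coarse points, so it receives their plain average. The distant fourth point falls outside the three nearest neighbors and contributes nothing.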
Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features (colour feature extraction G06V10/56) · CPC title
using neural networks · CPC title
based on interpolation, e.g. bilinear interpolation (image demosaicing G06T3/4015; edge-driven or edge-based scaling G06T3/403) · CPC title
Combinations of networks · CPC title
Depth or shape recovery · CPC title