Systems and methods for multi-modal multi-dimensional image registration

US12462335B2 · US · B2

Patent metadata
Publication number: US-12462335-B2
Application number: US-202318092531-A
Country: US
Kind code: B2
Filing date: Jan 3, 2023
Priority date: Mar 3, 2022
Publication date: Nov 4, 2025
Grant date: Nov 4, 2025

Abstract

Official abstract text for this publication.

A method of multi-modal image registration is provided. The method includes receiving as input a fixed image from a first imaging device, receiving as input a moving image from a second imaging device, performing feature extraction on the fixed image via a first feature extractor to generate a fixed image feature map, performing feature extraction on the moving image via a second feature extractor to generate a moving image feature map, performing cross-modal attention on the fixed image feature map and the moving image feature map to generate cross-modal feature attention data, performing deep registration on the cross-modal feature attention data via a deep registrator, and outputting a multi-modal registered image.
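
The data flow described in the abstract can be sketched as a composition of stages. This is a minimal illustration under stated assumptions, not the patented implementation: the `RegistrationPipeline` class, its field names, and the toy stand-in stages are hypothetical, and the final `apply_transform` step is an assumption about how the registrator output becomes a registered image. Real stages would be neural networks operating on image volumes.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class RegistrationPipeline:
    """Hypothetical container wiring together the stages named in the abstract."""

    fixed_extractor: Callable[[Any], Any]        # first feature extractor
    moving_extractor: Callable[[Any], Any]       # second feature extractor
    cross_modal_attention: Callable[[Any, Any], Any]
    deep_registrator: Callable[[Any], Any]
    apply_transform: Callable[[Any, Any], Any]   # assumption: warps the moving image

    def register(self, fixed_image, moving_image):
        # Each modality gets its own extractor.
        fixed_map = self.fixed_extractor(fixed_image)
        moving_map = self.moving_extractor(moving_image)
        # Both feature maps feed the cross-modal attention stage.
        attention_data = self.cross_modal_attention(fixed_map, moving_map)
        # The registrator turns attention data into a transformation estimate.
        transform = self.deep_registrator(attention_data)
        return self.apply_transform(moving_image, transform)


# Toy stand-ins (not neural networks): 1-D "images" that differ by a constant offset.
pipeline = RegistrationPipeline(
    fixed_extractor=lambda img: sum(img) / len(img),
    moving_extractor=lambda img: sum(img) / len(img),
    cross_modal_attention=lambda f, m: (f, m),
    deep_registrator=lambda pair: pair[0] - pair[1],
    apply_transform=lambda img, shift: [v + shift for v in img],
)
registered = pipeline.register([3.0, 4.0, 5.0], [1.0, 2.0, 3.0])
```

In this toy run the "transformation" is a single offset, so the moving image is shifted onto the fixed image. Keeping the stages as separate callables mirrors the way the abstract lists them as distinct steps.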

First claim

Opening claim text (preview).

What is claimed is:

1. A method of multi-modal image registration, the method comprising: receiving as input a fixed image from a first imaging device; receiving as input a moving image from a second imaging device; performing feature extraction on the fixed image via a first feature extractor to generate a fixed image feature map; performing feature extraction on the moving image via second feature extractor to generate a moving image feature map; performing cross-modal attention on the fixed image feature map and the moving image feature map to generate cross-modal feature attention data; performing deep registration on the cross-modal feature attention data via a deep registrator; and outputting a multi-modal registered image.
2. The method of claim 1, wherein the first imaging device is a magnetic resonance imaging (“MRI”) device, and the fixed image is an MRI volume of a subject.
3. The method of claim 1, wherein the second imaging device is an ultrasound device, and the moving image is a transrectal ultrasound volume of a subject.
4. The method of claim 1, wherein performing the cross-modal attention comprises: inputting the fixed image feature map as a primary input into a first cross-modal attention block and inputting the moving image feature map as a cross-modal input into the first cross-modal attention block to generate a first cross-modal attention block output; inputting the moving image feature map as a primary input into a second cross-modal attention block and inputting the fixed image feature map as a cross-modal input into the second cross-modal attention block to generate a second cross-modal attention block output; inputting the first cross-modal attention block output into a common convolution layer to generate a first cross-modal attention convolution output; inputting the second cross-modal attention block output into the common convolution layer to generate a second cross-modal attention convolution output; and performing element-wise addition on the first cross-modal attention convolution output and the second cross-modal attention convolution output to generate the cross-modal feature attention data.
5. The method of claim 4, wherein each of the first cross-modal attention block and the second cross-modal attention block are configured to perform a first matrix multiplication of the primary input and the cross-modal input to generate a first matrix output, perform a second matrix multiplication of the primary input and the first matrix output to generate a second matrix output, and perform a concatenation of the cross-modal input and the second matrix output to generate the respective cross-modal attention block output.
6. The method of claim 5, wherein the concatenation comprises a plurality of channels, and features of the fixed image feature map are arranged in a first half of the plurality of channels and features of the moving image feature map are arranged in a last half of the plurality of channels.
7. The method of claim 1, wherein the deep registrator is configured to perform rigid deep registration on the cross-modal feature attention data to generate an estimated transformation data, the deep registrator comprising a rectified linear unit, two convolution blocks, and three fully connected layers.
8. The method of claim 7, further comprising performing a rigid registration implementation on the estimated transformation data to generate the multi-modal registered image.
9. The method of claim 7, wherein each of the first feature extractor and the second feature extractor comprise two convolution blocks.
10. The method of claim 9, wherein each convolution block comprises a convolution layer and a batch normalization and rectified linear unit layer.
11. The method of claim 1, wherein the deep registrator is configured to perform deformable deep registration on the cross-modal feature attention data to generate a predicted deformation field, the deep registrator comprising a rectified linear unit, a first convolution block, a second convolution block, and a convolution layer.
12. The method of claim 11, wherein each of the first feature extractor and the second feature extractor comprise a first convolution block, a second convolution block, and a third convolution block, and wherein performing the deep registration further comprises: performing a first channel-wise concatenation of the outputs of the third convolution blocks of the first feature extractor and the second feature extractor; inputting the output of the first channel-wise concatenation through a first intermediate convolution layer; and performing a second channel-wise concatenation of the outputs of the first intermediate convolution layer and the rectified linear unit of the deep registrator.
13. The method of claim 12, wherein performing the deep registration further comprises: performing a third channel-wise concatenation of the outputs of the second convolution blocks of the first feature extractor and the second feature extractor; inputting the output of the third channel-wise concatenation through a second intermediate convolution layer; and performing a fourth channel-wise concatenation of the outputs of the second intermediate convolution layer and the first convolution block of the deep registrator.
14. The method of claim 12, wherein each convolution block comprises a first convolution layer, a first batch normalization and rectified linear unit layer, a second convolution layer, and a second batch normalization and rectified linear unit layer.
15. The method of claim 11, further comprising performing a deformable registration implementation on the predicted deformation field to generate the multi-modal registered image.
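
The cross-modal attention of claims 4 and 5 can be illustrated with a small NumPy sketch. Several details here are assumptions that the claims do not specify: feature maps are flattened to shape (N, C) with N spatial positions and C channels, the first matrix output is normalized with a softmax, the operand order and transposes are chosen so the shapes line up, and the common convolution layer is modeled as a 1x1x1 convolution (a per-position linear map with shared weights). The function names are hypothetical, and the channel ordering of claim 6 is not modeled.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along one axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_modal_attention_block(primary, cross):
    """One attention block in the spirit of claim 5.

    primary, cross: arrays of shape (N, C), i.e. a feature volume
    flattened over its spatial axes.
    """
    # First matrix multiplication: primary and cross-modal inputs -> (N, N).
    scores = cross @ primary.T
    # Assumption: softmax normalization; the claim does not name one.
    weights = softmax(scores, axis=-1)
    # Second matrix multiplication: primary input and first output -> (N, C).
    attended = weights @ primary
    # Concatenate the cross-modal input with the second output -> (N, 2C).
    return np.concatenate([cross, attended], axis=-1)


def cross_modal_attention(fixed_map, moving_map, shared_weight):
    """Two blocks with swapped roles, a shared layer, element-wise addition (claim 4)."""
    first = cross_modal_attention_block(primary=fixed_map, cross=moving_map)
    second = cross_modal_attention_block(primary=moving_map, cross=fixed_map)
    # Assumption: the common convolution layer acts as a shared linear map
    # from 2C channels to C_out channels at every position.
    return first @ shared_weight + second @ shared_weight


rng = np.random.default_rng(0)
fixed_map = rng.normal(size=(8, 4))      # 8 positions, 4 channels
moving_map = rng.normal(size=(8, 4))
shared_weight = rng.normal(size=(8, 4))  # maps 2C = 8 channels to 4
attention_data = cross_modal_attention(fixed_map, moving_map, shared_weight)
print(attention_data.shape)  # (8, 4)
```

Because both branches pass through the same layer and are then summed, swapping the fixed and moving inputs leaves the result of this sketch unchanged.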

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12462335B2 cover?
A method of multi-modal image registration. A fixed image from a first imaging device and a moving image from a second imaging device each pass through their own feature extractor; the resulting feature maps are combined by cross-modal attention, and a deep registrator uses the attention data to produce a multi-modal registered image.
Who is the assignee on this patent?
Rensselaer Polytech Inst
What technology area does this patent fall under?
Primary CPC classification G06T3/14. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 4, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).