Medical visual question answering

US11901047B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11901047-B2
Application number  US-202017082334-A
Country             US
Kind code           B2
Filing date         Oct 28, 2020
Priority date       Oct 28, 2020
Publication date    Feb 13, 2024
Grant date          Feb 13, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Aspects of the invention include a computer-implemented method including extracting a domain-specific object feature from a first image data, wherein the feature describes an object in the first image data. A domain-specific semantic meaning of text data is determined. The object feature is mapped to a portion of the text data, wherein the portion of the text data describes the object. A joint representation of the object and the portion of the text data is created. A second image data and a query directed towards an object in the second image data is received. An answer to the query is generated based on the joint representation.
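The flow described in the abstract (object feature, text meaning, joint representation, answer) can be illustrated with a toy sketch. This is not the patented implementation: the abstract does not say how the joint representation is built or how answers are generated, so plain concatenation and nearest-neighbour lookup by cosine similarity are assumptions made here for illustration. All names (`joint_representation`, `answer`, `TRAINING`) and all vectors and labels are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def joint_representation(object_feature, text_embedding):
    # Assumption: fuse the two modalities by simple concatenation.
    return list(object_feature) + list(text_embedding)

# Hypothetical toy data: (object feature from the first image,
# embedding of the text portion describing it, text of that portion).
TRAINING = [
    ([0.9, 0.1], [1.0, 0.0], "left lung opacity"),
    ([0.1, 0.8], [0.0, 1.0], "normal heart outline"),
]
MEMORY = [(joint_representation(f, t), text) for f, t, text in TRAINING]

def answer(query_object_feature, query_embedding):
    """Return the stored text whose joint representation is closest to the query."""
    probe = joint_representation(query_object_feature, query_embedding)
    best = max(MEMORY, key=lambda item: cosine(probe, item[0]))
    return best[1]

# Feature of an object in a second image, plus an embedded query about it.
print(answer([0.8, 0.2], [0.9, 0.1]))
```

A real system would replace the hand-written vectors with learned features and would generate an answer rather than retrieve one; the sketch only shows how an image-side feature and a text-side meaning can share one representation.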

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: extracting, by a processor, a domain-specific object feature from a first image data, wherein the feature describes an object in the first image data; determining, by the processor, domain-specific semantic meaning of text data; mapping, by the processor, the object feature to a portion of the text data, wherein the portion of the text data describes the object; creating, by the processor, a joint representation of the object and the portion of the text data; receiving, by the processor, a second image data and a query directed towards an object in the second image data; and generating, by the processor, an answer to the query based on the joint representation.

2. The computer-implemented method of claim 1, wherein extracting the domain-specific object feature comprises: generating a bounding box around the object in the first image data; and extracting the object feature from within the bounding box.

3. The computer-implemented method of claim 1, wherein determining the domain-specific semantic meaning comprises: organizing the text data into a parse tree, wherein the parse tree is segmented into tokens; masking a token of the segmented tokens in the parse tree; and determining a semantic meaning of the masked token based at least in part on tokens surrounding the masked token.

4. The computer-implemented method of claim 1 further comprising: providing a training image and a training query; determining an object in the training image associated with the training query; and generating a natural language response to the training query based on the joint representation.

5. The computer-implemented method of claim 4 further comprising displaying the natural language response on a display of a user computing device.

6. The computer-implemented method of claim 1, wherein the domain-specific object feature are extracted by a region-based convolutional neural network (R-CNN) and the semantic meaning is determined by a recurrent neural network (RNN).

7. The computer-implemented method of claim 1, wherein the first image data and the text data are related to a healthcare domain.

8. A system comprising: a memory having computer readable instructions; and one or more processors for executing the computer readable instructions, the computer readable instructions controlling the one or more processors to perform operations comprising: extracting a domain-specific object feature from a first image data, wherein the feature describes an object in the first image data; determining domain-specific semantic meaning of text data; mapping the object feature to a portion of the text data, wherein the portion of the text data describes the object; creating a joint representation of the object and the portion of the text data; receiving a second image data and a query directed towards an object in the second image data; and generating, by the processor, an answer to the query based on the joint representation.

9. The system of claim 8, wherein extracting the domain-specific object feature comprises: generating a bounding box around the object in the first image data; and extracting the object feature from within the bounding box.

10. The system of claim 8, wherein determining the domain-specific semantic meaning comprises: organizing the text data into a parse tree, wherein the parse tree is segmented into tokens; masking a token of the segmented tokens in the parse tree from at least one layer of the neural network; and determining a semantic meaning of the masked token based at least in part on tokens surrounding the masked token.

11. The system of claim 8, the operations further comprising: providing the neural network with a training image and a training query; determining an object in the training image associated with the training query; and generating a natural language response to the training query based on the joint representation.

12. The system of claim 11, the operations further comprising displaying the natural language response on a display of a user computing device.

13. The system of claim 8, wherein the domain-specific object feature are extracted by a region-based convolutional neural network (R-CNN) and the semantic meaning is determined by a recurrent neural network (RNN).

14. The system of claim 8, wherein the first image data and the text data are related to a healthcare domain.

15. A computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor of a neural network to cause the processor to perform operations comprising: extracting a domain-specific object feature from a first image data, wherein the feature describes an object in the first image data; determining domain-specific semantic meaning of text data; mapping the object feature to a portion of the text data, wherein the portion of the text data describes the object; creating a joint representation of the object and the portion of the text data; receiving a second image data and a query directed towards an object in the second image data; and generating, by the processor, an answer to the query based on the joint representation.

16. The computer program product of claim 15, wherein extracting the domain-specific object feature comprises: generating a bounding box around the object in the first image data; and extracting the object feature from within the bounding box.

17. The computer program product of claim 15, wherein determining the domain-specific semantic meaning comprises: organizing the text data into a parse tree, wherein the parse tree is segmented into tokens; masking a token of the segmented tokens in the parse tree from at least one layer of the neural network; and determining a semantic meaning of the masked token based at least in part on tokens surrounding the masked token.

18. The computer program product of claim 15, the operations further comprising: providing the neural network with a training image and a training query; determining an object in the training image associated with the training query; and generating a natural language response to the training query based on the joint representation.

19. The computer program product of claim 18, the operations further comprising displaying the natural language response on a display of a user computing device.

20. The computer program product of claim 15, wherein the domain-specific object feature are extracted by a region-based convolutional neural network (R-CNN) and the semantic meaning is determined by a recurrent neural network (RNN).
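Two of the dependent claims describe steps that a small example may make more concrete: extracting a feature from within a bounding box, and inferring a masked token from the tokens around it. The sketch below is only a rough illustration under stated assumptions. It swaps in mean pixel intensity for an R-CNN feature, a flat token list for a parse tree, and neighbour co-occurrence counts for a neural language model. The function names, the toy image, and the toy corpus are all hypothetical.

```python
from collections import Counter

def box_feature(image, box):
    """Mean pixel intensity inside box = (top, left, bottom, right).

    bottom and right are exclusive. Stand-in for a learned region feature.
    """
    top, left, bottom, right = box
    pixels = [image[r][c] for r in range(top, bottom) for c in range(left, right)]
    return sum(pixels) / len(pixels)

def guess_masked_token(tokens, mask_index, corpus):
    """Guess a masked token from its immediate left and right neighbours.

    Counts which corpus words appear between the same two neighbours
    and returns the most frequent one, or None if there is no match.
    """
    left = tokens[mask_index - 1] if mask_index > 0 else None
    right = tokens[mask_index + 1] if mask_index + 1 < len(tokens) else None
    votes = Counter()
    for sentence in corpus:
        for i, word in enumerate(sentence):
            prev = sentence[i - 1] if i > 0 else None
            nxt = sentence[i + 1] if i + 1 < len(sentence) else None
            if prev == left and nxt == right:
                votes[word] += 1
    return votes.most_common(1)[0][0] if votes else None

# Hypothetical 4x4 grayscale image with a bright 2x2 object in the middle.
IMAGE = [
    [0, 0, 0, 0],
    [0, 8, 6, 0],
    [0, 4, 2, 0],
    [0, 0, 0, 0],
]
# Hypothetical tokenized report sentences.
CORPUS = [
    ["the", "left", "lung", "is", "clear"],
    ["the", "right", "lung", "is", "clear"],
    ["the", "left", "lung", "appears", "clear"],
]

print(box_feature(IMAGE, (1, 1, 3, 3)))
print(guess_masked_token(["the", "[MASK]", "lung", "is", "clear"], 1, CORPUS))
```

The point of the bounding box is that the feature depends only on pixels inside the region, and the point of masking is that the meaning of a token is recovered from its context; the particular statistics used here are placeholders.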

Assignees

Inventors

Classifications

  • Supervised learning · CPC title

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title

  • Auto-encoder networks; Encoder-decoder networks · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • G16H10/20 · Primary

    for electronic clinical trials or questionnaires · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11901047B2 cover?
Aspects of the invention include a computer-implemented method including extracting a domain-specific object feature from a first image data, wherein the feature describes an object in the first image data. A domain-specific semantic meaning of text data is determined. The object feature is mapped to a portion of the text data, wherein the portion of the text data describes the object. A joint …
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G16H10/20. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 13, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).