Systems and Methods for Pretraining Image Processing Models
US-2023281400-A1 · Sep 7, 2023 · US
US11901047B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11901047-B2 |
| Application number | US-202017082334-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 28, 2020 |
| Priority date | Oct 28, 2020 |
| Publication date | Feb 13, 2024 |
| Grant date | Feb 13, 2024 |
Official abstract text for this publication.
Aspects of the invention include a computer-implemented method including extracting a domain-specific object feature from a first image data, wherein the feature describes an object in the first image data. A domain-specific semantic meaning of text data is determined. The object feature is mapped to a portion of the text data, wherein the portion of the text data describes the object. A joint representation of the object and the portion of the text data is created. A second image data and a query directed towards an object in the second image data is received. An answer to the query is generated based on the joint representation.
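The flow in the abstract (map object features to text portions, build a joint representation, answer a query about a second image) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the three-dimensional vectors, the example phrases, the use of cosine similarity, and the helper names `map_objects_to_text`, `joint_representation`, and `answer_query` do not come from the patent, which does not prescribe a similarity measure or a vector layout.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def map_objects_to_text(object_feats, text_feats):
    """Hypothetical mapping step: pair each object feature with its most similar text portion."""
    return {
        obj: max(text_feats, key=lambda phrase: cosine(vec, text_feats[phrase]))
        for obj, vec in object_feats.items()
    }

def joint_representation(object_feats, text_feats, mapping):
    """Hypothetical joint representation: concatenate the object vector and its mapped text vector."""
    return {obj: object_feats[obj] + text_feats[phrase] for obj, phrase in mapping.items()}

def answer_query(query_vec, joint, mapping):
    """Answer with the text portion whose paired object best matches the query's object feature."""
    dim = len(query_vec)
    best = max(joint, key=lambda obj: cosine(query_vec, joint[obj][:dim]))
    return mapping[best]

# Made-up features standing in for the first image data and the text data.
object_feats = {"region_0": [0.9, 0.1, 0.0], "region_1": [0.0, 0.2, 0.9]}
text_feats = {"enlarged heart": [1.0, 0.0, 0.1], "clear lungs": [0.1, 0.1, 1.0]}

mapping = map_objects_to_text(object_feats, text_feats)
joint = joint_representation(object_feats, text_feats, mapping)

# Made-up feature for the object that the query targets in the second image.
answer = answer_query([0.8, 0.2, 0.1], joint, mapping)
```

In a real system the vectors would come from trained image and text models; the sketch only shows how the steps could fit together.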
Claim text.
What is claimed is:
1. A computer-implemented method comprising: extracting, by a processor, a domain-specific object feature from a first image data, wherein the feature describes an object in the first image data; determining, by the processor, domain-specific semantic meaning of text data; mapping, by the processor, the object feature to a portion of the text data, wherein the portion of the text data describes the object; creating, by the processor, a joint representation of the object and the portion of the text data; receiving, by the processor, a second image data and a query directed towards an object in the second image data; and generating, by the processor, an answer to the query based on the joint representation.
2. The computer-implemented method of claim 1, wherein extracting the domain-specific object feature comprises: generating a bounding box around the object in the first image data; and extracting the object feature from within the bounding box.
3. The computer-implemented method of claim 1, wherein determining the domain-specific semantic meaning comprises: organizing the text data into a parse tree, wherein the parse tree is segmented into tokens; masking a token of the segmented tokens in the parse tree; and determining a semantic meaning of the masked token based at least in part on tokens surrounding the masked token.
4. The computer-implemented method of claim 1 further comprising: providing a training image and a training query; determining an object in the training image associated with the training query; and generating a natural language response to the training query based on the joint representation.
5. The computer-implemented method of claim 4 further comprising displaying the natural language response on a display of a user computing device.
6. The computer-implemented method of claim 1, wherein the domain-specific object feature are extracted by a region-based convolutional neural network (R-CNN) and the semantic meaning is determined by a recurrent neural network (RNN).
7. The computer-implemented method of claim 1, wherein the first image data and the text data are related to a healthcare domain.
8. A system comprising: a memory having computer readable instructions; and one or more processors for executing the computer readable instructions, the computer readable instructions controlling the one or more processors to perform operations comprising: extracting a domain-specific object feature from a first image data, wherein the feature describes an object in the first image data; determining domain-specific semantic meaning of text data; mapping the object feature to a portion of the text data, wherein the portion of the text data describes the object; creating a joint representation of the object and the portion of the text data; receiving a second image data and a query directed towards an object in the second image data; and generating, by the processor, an answer to the query based on the joint representation.
9. The system of claim 8, wherein extracting the domain-specific object feature comprises: generating a bounding box around the object in the first image data; and extracting the object feature from within the bounding box.
10. The system of claim 8, wherein determining the domain-specific semantic meaning comprises: organizing the text data into a parse tree, wherein the parse tree is segmented into tokens; masking a token of the segmented tokens in the parse tree from at least one layer of the neural network; and determining a semantic meaning of the masked token based at least in part on tokens surrounding the masked token.
11. The system of claim 8, the operations further comprising: providing the neural network with a training image and a training query; determining an object in the training image associated with the training query; and generating a natural language response to the training query based on the joint representation.
12. The system of claim 11, the operations further comprising displaying the natural language response on a display of a user computing device.
13. The system of claim 8, wherein the domain-specific object feature are extracted by a region-based convolutional neural network (R-CNN) and the semantic meaning is determined by a recurrent neural network (RNN).
14. The system of claim 8, wherein the first image data and the text data are related to a healthcare domain.
15. A computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor of a neural network to cause the processor to perform operations comprising: extracting a domain-specific object feature from a first image data, wherein the feature describes an object in the first image data; determining domain-specific semantic meaning of text data; mapping the object feature to a portion of the text data, wherein the portion of the text data describes the object; creating a joint representation of the object and the portion of the text data; receiving a second image data and a query directed towards an object in the second image data; and generating, by the processor, an answer to the query based on the joint representation.
16. The computer program product of claim 15, wherein extracting the domain-specific object feature comprises: generating a bounding box around the object in the first image data; and extracting the object feature from within the bounding box.
17. The computer program product of claim 15, wherein determining the domain-specific semantic meaning comprises: organizing the text data into a parse tree, wherein the parse tree is segmented into tokens; masking a token of the segmented tokens in the parse tree from at least one layer of the neural network; and determining a semantic meaning of the masked token based at least in part on tokens surrounding the masked token.
18. The computer program product of claim 15, the operations further comprising: providing the neural network with a training image and a training query; determining an object in the training image associated with the training query; and generating a natural language response to the training query based on the joint representation.
19. The computer program product of claim 18, the operations further comprising displaying the natural language response on a display of a user computing device.
20. The computer program product of claim 15, wherein the domain-specific object feature are extracted by a region-based convolutional neural network (R-CNN) and the semantic meaning is determined by a recurrent neural network (RNN).
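The two preprocessing ideas recited in the dependent claims (extracting from within a bounding box, and masking a token so its meaning is inferred from surrounding tokens) can be sketched in a few lines. This is a simplified illustration under stated assumptions: the image is a plain nested list rather than pixel data from an R-CNN, the tokens form a flat list rather than the parse tree the claims recite, and the helper names `crop_to_box`, `mask_token`, and `context_window` are hypothetical. The step that actually predicts the masked token's meaning is model-specific and is not shown.

```python
def crop_to_box(image, box):
    """Return the sub-image inside box = (top, left, bottom, right); bottom and right are exclusive."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]

def mask_token(tokens, index, mask="[MASK]"):
    """Return a copy of tokens with the token at index replaced by a mask placeholder."""
    masked = list(tokens)
    masked[index] = mask
    return masked

def context_window(tokens, index, radius=2):
    """Return up to `radius` tokens on each side of index, excluding the token at index itself."""
    lo = max(0, index - radius)
    return tokens[lo:index] + tokens[index + 1:index + 1 + radius]

# Made-up 4x4 "image" whose object occupies the central 2x2 region.
image = [
    [0, 0, 0, 0],
    [0, 5, 6, 0],
    [0, 7, 8, 0],
    [0, 0, 0, 0],
]
region = crop_to_box(image, (1, 1, 3, 3))

# Made-up sentence; the masked position is the token a model would have to infer.
tokens = ["the", "heart", "appears", "enlarged"]
masked = mask_token(tokens, 1)
context = context_window(tokens, 1)
```

A model trained this way would receive `masked` as input and be scored on recovering the hidden token from `context`.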
Supervised learning · CPC title
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
Auto-encoder networks; Encoder-decoder networks · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
for electronic clinical trials or questionnaires · CPC title