Weighted deep fusion architecture

US2022019867A1 · US · A1

Patent metadata
Field               Value
Publication number  US-2022019867-A1
Application number  US-202016928094-A
Country             US
Kind code           A1
Filing date         Jul 14, 2020
Priority date       Jul 14, 2020
Publication date    Jan 20, 2022
Grant date          (not listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method, a computer program product, and a computer system fuse features for multi-modal classifications for a plurality of modality inputs. The method includes receiving a request indicative of the modality inputs to be selected. The method includes performing an embeddings level fusion operation to concatenate features from the modality inputs. The method includes performing a multi-modal discriminative feature level fusion operation that integrates feature representations learned by applying different network structures on the modality inputs. The method includes determining weights of the concatenated features and the feature representations based on a measure of the concatenated features and the feature representations indicative of affecting a final prediction performance. The method includes generating fused features for the modality inputs based on the concatenated features, the feature representations, and the weights. The method includes generating a response to the request based on the fused features. The method includes transmitting the response.
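
The fusion steps named in the abstract can be illustrated with a minimal Python sketch. It is not the patent's implementation. The abstract does not say how the weights are measured, so the sketch assumes a hypothetical per-branch score (for example, validation accuracy) normalized with a softmax. The function names, the toy vectors, and the stand-in `feature_repr` are all illustrative assumptions.

```python
import math


def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def embeddings_level_fusion(modality_embeddings):
    """Concatenate per-modality embedding vectors into one vector."""
    fused = []
    for emb in modality_embeddings:
        fused.extend(emb)
    return fused


def weighted_fusion(branches, scores):
    """Scale each branch's features by its weight, then concatenate.

    branches: list of feature vectors, one per fusion branch
    scores:   hypothetical per-branch measure of how much the branch
              affects prediction performance (an assumption, not from
              the patent text)
    """
    weights = softmax(scores)
    fused = []
    for w, feats in zip(weights, branches):
        fused.extend(w * x for x in feats)
    return fused, weights


# Toy inputs: a text embedding and an image embedding (made-up values).
text_emb = [0.2, 0.4]
image_emb = [1.0, 0.0, 0.5]

# Embeddings-level fusion: plain concatenation across modalities.
concat = embeddings_level_fusion([text_emb, image_emb])

# Stand-in for a learned feature-level representation (hypothetical).
feature_repr = [0.3, 0.7]

# Equal scores give equal weights to both branches.
fused, weights = weighted_fusion([concat, feature_repr], scores=[0.8, 0.8])
print(weights)
print(fused)
```

In a trained system the scores would presumably be learned or estimated from held-out data rather than fixed by hand; the sketch only shows how weights can rescale each branch before the fused vector is passed to a classifier.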

First claim

Opening claim text (preview).

1. A computer-implemented method for fusing data for multi-modal classifications for a plurality of modality inputs, the method comprising: receiving a request indicative of the modality inputs to be selected; performing an embeddings level fusion operation to concatenate features from the modality inputs; performing a multi-modal discriminative feature level fusion operation that integrates feature representations learned by applying different network structures on the modality inputs; determining weights of the concatenated features and the feature representations based on a measure of the concatenated features and the feature representations indicative of affecting a final prediction performance; generating fused features for the modality inputs based on the concatenated features, the feature representations, and the weights; generating a response to the request based on the fused features; and transmitting the response.

2. The computer-implemented method of claim 1, wherein the modality inputs have a deep architecture including a convolution neural network, a recurrent neural network, or a combination thereof.

3. The computer-implemented method of claim 1, wherein the features in the modality inputs are concatenated based on a distribution, an embedding, or a combination thereof of the feature in the modality inputs.

4. The computer-implemented method of claim 1, wherein the multi-modal discriminative level feature fusion operation includes a deep correlation fusion operation that determines contributions of correlations of the feature representations.

5. The computer-implemented method of claim 4, wherein the deep correlation fusion operation determines a degree of correlation of a first one of the feature representations to a second one of the feature representations.

6. The computer-implemented method of claim 5, wherein the deep correlation fusion operation determines a corresponding contribution for each of the modality inputs through a weighted sum of each degree of correlation of the feature representations.

7. The computer-implemented method of claim 1, wherein the multi-modal discriminative level feature fusion operation includes a pair-wise matching fusion operation is indicative of a pair-wise matching degree of the features according to embeddings obtained for different modality inputs.

8. A computer program product for fusing data for multi-modal classifications for a plurality of modality inputs, the computer program product comprising: one or more non-transitory computer-readable storage media and program instructions stored on the one or more non-transitory computer-readable storage media capable of performing a method, the method comprising: receiving a request indicative of the modality inputs to be selected; performing an embeddings level fusion operation to concatenate features from the modality inputs; performing a multi-modal discriminative feature level fusion operation that integrates feature representations learned by applying different network structures on the modality inputs; determining weights of the concatenated features and the feature representations based on a measure of the concatenated features and the feature representations indicative of affecting a final prediction performance; generating fused features for the modality inputs based on the concatenated features, the feature representations, and the weights; generating a response to the request based on the fused features; and transmitting the response.

9. The computer program product of claim 8, wherein the modality inputs have a deep architecture including a convolution neural network, a recurrent neural network, or a combination thereof.

10. The computer program product of claim 8, wherein the features in the modality inputs are concatenated based on a distribution, an embedding, or a combination thereof of the feature in the modality inputs.

11. The computer program product of claim 8, wherein the multi-modal discriminative level feature fusion operation includes a deep correlation fusion operation that determines contributions of correlations of the feature representations.

12. The computer program product of claim 11, wherein the deep correlation fusion operation determines a degree of correlation of a first one of the feature representations to a second one of the feature representations.

13. The computer program product of claim 12, wherein the deep correlation fusion operation determines a corresponding contribution for each of the modality inputs through a weighted sum of each degree of correlation of the feature representations.

14. The computer program product of claim 8, wherein the multi-modal discriminative level feature fusion operation includes a pair-wise matching fusion operation is indicative of a pair-wise matching degree of the features according to embeddings obtained for different modality inputs.

15. A computer system for fusing data for multi-modal classifications for a plurality of modality inputs, the computer system comprising: one or more computer processors, one or more computer-readable storage media, and program instructions stored on the one or more of the computer-readable storage media for execution by at least one of the one or more processors capable of performing a method, the method comprising: receiving a request indicative of the modality inputs to be selected; performing an embeddings level fusion operation to concatenate features from the modality inputs; performing a multi-modal discriminative feature level fusion operation that integrates feature representations learned by applying different network structures on the modality inputs; determining weights of the concatenated features and the feature representations based on a measure of the concatenated features and the feature representations indicative of affecting a final prediction performance; generating fused features for the modality inputs based on the concatenated features, the feature representations, and the weights; generating a response to the request based on the fused features; and transmitting the response.

16. The computer system of claim 15, wherein the modality inputs have a deep architecture including a convolution neural network, a recurrent neural network, or a combination thereof.

17. The computer system of claim 15, wherein the features in the modality inputs are concatenated based on a distribution, an embedding, or a combination thereof of the feature in the modality inputs.

18. The computer system of claim 15, wherein the multi-modal discriminative level feature fusion operation includes a deep correlation fusion operation that determines contributions of correlations of the feature representations.

19. The computer system of claim 18, wherein the deep correlation fusion operation determines a degree of correlation of a first one of the feature representations to a second one of the feature representations.

20. The computer system of claim 19, wherein the deep correlation fusion operation determines a corresponding contribution for each of the modality inputs through a weighted sum of each degree of correlation of the feature representations.
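
The dependent claims on deep correlation fusion (claims 4 to 6) and pair-wise matching (claim 7) can be illustrated with a small Python sketch. The claims do not name a specific correlation or matching measure, so this sketch assumes Pearson correlation for the "degree of correlation" and cosine similarity for the "pair-wise matching degree". The function names, the optional `pair_weights` argument, and the toy vectors are hypothetical and are not taken from the patent.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length vectors (assumed measure)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant vector carries no correlation signal
    return cov / math.sqrt(var_a * var_b)


def cosine(a, b):
    """Cosine similarity, used here as a stand-in matching degree."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def contributions(reprs, pair_weights=None):
    """Per-modality contribution as a weighted sum of its correlations
    with every other modality's feature representation."""
    out = {}
    for i in reprs:
        total = 0.0
        for j in reprs:
            if i == j:
                continue
            # Hypothetical per-pair weight; defaults to 1.0 when not given.
            w = 1.0 if pair_weights is None else pair_weights.get((i, j), 1.0)
            total += w * pearson(reprs[i], reprs[j])
        out[i] = total
    return out


# Toy feature representations for three modalities (made-up values).
reprs = {
    "text": [1.0, 2.0, 3.0],
    "image": [2.0, 4.0, 6.0],
    "audio": [3.0, 2.0, 1.0],
}

contrib = contributions(reprs)
match = cosine(reprs["text"], reprs["image"])
print(contrib)
print(match)
```

Here the text and image vectors are perfectly correlated while audio runs opposite to both, so audio receives the lowest contribution. In practice the pair weights would presumably be learned alongside the network rather than supplied by hand.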

Assignees

  • IBM

Inventors

Classifications

  • G06N3/08 (Primary)

    Learning methods · CPC title

  • Combinations of networks · CPC title

  • Recurrent networks, e.g. Hopfield networks · CPC title

  • Supervised learning · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2022019867A1 cover?
It covers a method, a computer program product, and a computer system that fuse features from a plurality of modality inputs for multi-modal classification. The approach combines an embeddings level fusion operation (concatenating features) with a multi-modal discriminative feature level fusion operation, weights both according to how they affect final prediction performance, and generates a response to the request from the fused features.
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06N3/08. Mapped technology areas include Physics.
When was this patent published?
Published Jan 20, 2022 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).