Granular neural network architecture search over low-level primitives
US-2024428071-A1 · Dec 26, 2024 · US
US2022019867A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2022019867-A1 |
| Application number | US-202016928094-A |
| Country | US |
| Kind code | A1 |
| Filing date | Jul 14, 2020 |
| Priority date | Jul 14, 2020 |
| Publication date | Jan 20, 2022 |
| Grant date | — |
Abstract
A method, a computer program product, and a computer system fuse features for multi-modal classifications for a plurality of modality inputs. The method includes receiving a request indicative of the modality inputs to be selected. The method includes performing an embeddings level fusion operation to concatenate features from the modality inputs. The method includes performing a multi-modal discriminative feature level fusion operation that integrates feature representations learned by applying different network structures on the modality inputs. The method includes determining weights of the concatenated features and the feature representations based on a measure of the concatenated features and the feature representations indicative of affecting a final prediction performance. The method includes generating fused features for the modality inputs based on the concatenated features, the feature representations, and the weights. The method includes generating a response to the request based on the fused features. The method includes transmitting the response.
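The fusion steps in the abstract can be illustrated with a minimal sketch. This is not the implementation disclosed in the patent: the function names, the use of a softmax to turn importance scores into weights, and the choice to scale each feature block by its weight are all assumptions made for illustration. The abstract only states that weights reflect how much each feature group affects final prediction performance.

```python
import math


def concat_embeddings(embeddings):
    """Embeddings-level fusion: concatenate per-modality feature vectors."""
    fused = []
    for vec in embeddings:
        fused.extend(vec)
    return fused


def softmax(scores):
    """Normalize raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(concatenated, representations, scores):
    """Scale each feature block by its weight and join the blocks.

    `scores` holds one hypothetical importance score per block: the
    concatenated block first, then each learned representation. How the
    scores are measured is left open here (assumption).
    """
    blocks = [concatenated] + list(representations)
    weights = softmax(scores)
    fused = []
    for weight, block in zip(weights, blocks):
        fused.extend(weight * x for x in block)
    return fused, weights


# Hypothetical inputs: a text embedding, an image embedding, and one
# representation learned by some other network structure.
text_emb, image_emb = [1.0, 2.0], [3.0]
concatenated = concat_embeddings([text_emb, image_emb])
fused, weights = fuse(concatenated, [[0.5, 0.5]], scores=[0.0, 0.0])
```

With equal scores, both blocks receive weight 0.5; raising one score shifts weight toward that block, which is one simple way to model a block that contributes more to the final prediction.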
Claims
1. A computer-implemented method for fusing data for multi-modal classifications for a plurality of modality inputs, the method comprising: receiving a request indicative of the modality inputs to be selected; performing an embeddings level fusion operation to concatenate features from the modality inputs; performing a multi-modal discriminative feature level fusion operation that integrates feature representations learned by applying different network structures on the modality inputs; determining weights of the concatenated features and the feature representations based on a measure of the concatenated features and the feature representations indicative of affecting a final prediction performance; generating fused features for the modality inputs based on the concatenated features, the feature representations, and the weights; generating a response to the request based on the fused features; and transmitting the response.
2. The computer-implemented method of claim 1, wherein the modality inputs have a deep architecture including a convolution neural network, a recurrent neural network, or a combination thereof.
3. The computer-implemented method of claim 1, wherein the features in the modality inputs are concatenated based on a distribution, an embedding, or a combination thereof of the feature in the modality inputs.
4. The computer-implemented method of claim 1, wherein the multi-modal discriminative level feature fusion operation includes a deep correlation fusion operation that determines contributions of correlations of the feature representations.
5. The computer-implemented method of claim 4, wherein the deep correlation fusion operation determines a degree of correlation of a first one of the feature representations to a second one of the feature representations.
6. The computer-implemented method of claim 5, wherein the deep correlation fusion operation determines a corresponding contribution for each of the modality inputs through a weighted sum of each degree of correlation of the feature representations.
7. The computer-implemented method of claim 1, wherein the multi-modal discriminative level feature fusion operation includes a pair-wise matching fusion operation is indicative of a pair-wise matching degree of the features according to embeddings obtained for different modality inputs.
8. A computer program product for fusing data for multi-modal classifications for a plurality of modality inputs, the computer program product comprising: one or more non-transitory computer-readable storage media and program instructions stored on the one or more non-transitory computer-readable storage media capable of performing a method, the method comprising: receiving a request indicative of the modality inputs to be selected; performing an embeddings level fusion operation to concatenate features from the modality inputs; performing a multi-modal discriminative feature level fusion operation that integrates feature representations learned by applying different network structures on the modality inputs; determining weights of the concatenated features and the feature representations based on a measure of the concatenated features and the feature representations indicative of affecting a final prediction performance; generating fused features for the modality inputs based on the concatenated features, the feature representations, and the weights; generating a response to the request based on the fused features; and transmitting the response.
9. The computer program product of claim 8, wherein the modality inputs have a deep architecture including a convolution neural network, a recurrent neural network, or a combination thereof.
10. The computer program product of claim 8, wherein the features in the modality inputs are concatenated based on a distribution, an embedding, or a combination thereof of the feature in the modality inputs.
11. The computer program product of claim 8, wherein the multi-modal discriminative level feature fusion operation includes a deep correlation fusion operation that determines contributions of correlations of the feature representations.
12. The computer program product of claim 11, wherein the deep correlation fusion operation determines a degree of correlation of a first one of the feature representations to a second one of the feature representations.
13. The computer program product of claim 12, wherein the deep correlation fusion operation determines a corresponding contribution for each of the modality inputs through a weighted sum of each degree of correlation of the feature representations.
14. The computer program product of claim 8, wherein the multi-modal discriminative level feature fusion operation includes a pair-wise matching fusion operation is indicative of a pair-wise matching degree of the features according to embeddings obtained for different modality inputs.
15. A computer system for fusing data for multi-modal classifications for a plurality of modality inputs, the computer system comprising: one or more computer processors, one or more computer-readable storage media, and program instructions stored on the one or more of the computer-readable storage media for execution by at least one of the one or more processors capable of performing a method, the method comprising: receiving a request indicative of the modality inputs to be selected; performing an embeddings level fusion operation to concatenate features from the modality inputs; performing a multi-modal discriminative feature level fusion operation that integrates feature representations learned by applying different network structures on the modality inputs; determining weights of the concatenated features and the feature representations based on a measure of the concatenated features and the feature representations indicative of affecting a final prediction performance; generating fused features for the modality inputs based on the concatenated features, the feature representations, and the weights; generating a response to the request based on the fused features; and transmitting the response.
16. The computer system of claim 15, wherein the modality inputs have a deep architecture including a convolution neural network, a recurrent neural network, or a combination thereof.
17. The computer system of claim 15, wherein the features in the modality inputs are concatenated based on a distribution, an embedding, or a combination thereof of the feature in the modality inputs.
18. The computer system of claim 15, wherein the multi-modal discriminative level feature fusion operation includes a deep correlation fusion operation that determines contributions of correlations of the feature representations.
19. The computer system of claim 18, wherein the deep correlation fusion operation determines a degree of correlation of a first one of the feature representations to a second one of the feature representations.
20. The computer system of claim 19, wherein the deep correlation fusion operation determines a corresponding contribution for each of the modality inputs through a weighted sum of each degree of correlation of the feature representations.
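The deep correlation fusion operation recited in claims 4 to 6 (and mirrored in claims 11 to 13 and 18 to 20) can be sketched as follows. The claims do not say how the "degree of correlation" is computed or how the weighted sum is parameterized, so this example assumes a Pearson correlation between equal-length feature representations and an optional, hypothetical `pair_weights` matrix. It is an illustration of the idea, not the claimed implementation.

```python
import math


def pearson(a, b):
    """Degree of correlation between two equal-length representations.

    Pearson correlation is an assumption; the claims do not name a measure.
    Returns 0.0 when either vector is constant.
    """
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def contributions(representations, pair_weights=None):
    """Per-modality contribution as a weighted sum of correlation degrees.

    `pair_weights[i][j]` is a hypothetical weight on the correlation of
    representation i with representation j; it defaults to 1.0.
    """
    n = len(representations)
    result = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            if i == j:
                continue  # skip self-correlation
            weight = 1.0 if pair_weights is None else pair_weights[i][j]
            total += weight * pearson(representations[i], representations[j])
        result.append(total)
    return result


# Hypothetical representations for three modality inputs.
reps = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 2.0, 1.0]]
scores = contributions(reps)
```

In this toy input the first two representations move together and the third moves against both, so the third modality ends up with the lowest contribution score.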
CPC classifications:
Learning methods
Combinations of networks
Recurrent networks, e.g. Hopfield networks
Supervised learning
Convolutional networks [CNN, ConvNet]