Dictionary learning method and means for zero-shot recognition

US11798264B2 · US · B2

Patent metadata
Publication number: US-11798264-B2
Application number: US-202217588218-A
Country: US
Kind code: B2
Filing date: Jan 29, 2022
Priority date: Oct 22, 2021
Publication date: Oct 24, 2023
Grant date: Oct 24, 2023

Abstract

Official abstract text for this publication.

The dictionary learning method and means for zero-shot recognition establish an alignment between visual space and semantic space at both the category layer and the image layer, enabling high-precision zero-shot image recognition. The dictionary learning method includes the following steps: (1) training a category-layer cross-domain dictionary by a cross-domain dictionary learning method; (2) generating the semantic attributes of each image from the category-layer cross-domain dictionary learned in step (1); (3) training an image-layer cross-domain dictionary from the image semantic attributes generated in step (2); (4) recognizing images of invisible (unseen) categories with the image-layer cross-domain dictionary learned in step (3).

First claim

Opening claim text (preview).

What is claimed is:

1. A dictionary learning method for zero-shot recognition, comprising the following steps: (1) training a cross-domain dictionary of a category layer based on a cross-domain dictionary learning method; (2) generating semantic attributes of an image based on the cross-domain dictionary of the category layer learned in step (1); (3) training a cross-domain dictionary of the image layer based on the image semantic attributes generated in step (2); (4) completing a recognition task of invisible category images based on the cross-domain dictionary of the image layer learned in step (3); wherein step (1) comprises:

(1.1) extracting a category prototype $P_v$ of visual space by calculating the category center of the visible category images, as follows:

$$\mathcal{L}_p = \|Y_v - P_v H\|_F^2 \tag{1}$$

wherein $Y_v$ is the sample feature matrix and $H$ is the sample label matrix;

(1.2) forming a pair of inputs from the category prototype $P_v$ and the category semantic attributes $P_s$, training the cross-domain dictionary at the category layer, and establishing a relationship between visual space and semantic space at the category layer by constraining the category prototype and the category semantic attributes to share the sparse coefficients, as in formula (2):

$$\mathcal{L}_{seen} = \|P_v - D_v X_p\|_F^2 + \lambda \|P_s - D_s X_p\|_F^2 \tag{2}$$

wherein the first term is the reconstruction error term of the visual space dictionary, the second term is the reconstruction error term of the semantic space dictionary, $D_v$ is the visual space dictionary, $D_s$ is the semantic space dictionary, $X_p$ is the sparse coefficient matrix, and $\lambda$ is a harmonic parameter;

(1.3) introducing an adaptive loss function for the invisible categories, formula (3), in order to reduce the impact of the domain difference between visible and invisible categories on model accuracy and to improve the model's ability to recognize invisible category samples:

$$\mathcal{L}_{unseen} = \|P_v^u - D_v X_p^u\|_F^2 + \lambda \|P_s^u - D_s X_p^u\|_F^2 \tag{3}$$

wherein $P_v^u$ is the category prototype of the unseen categories, to be solved for, $P_s^u$ is the semantic attribute matrix of the invisible categories, and $X_p^u$ is the sparse coefficient matrix corresponding to the invisible categories; the overall loss function of the category-layer model is:

$$\mathcal{L}_{class} = \mathcal{L}_{seen} + \alpha \mathcal{L}_{unseen} + \beta \mathcal{L}_p \tag{4}$$

and the training objective of the category layer is to minimize the loss function in equation (4), solving for the visual space dictionary $D_v$, the semantic space dictionary $D_s$, the seen category prototype $P_v$, the invisible category prototype $P_v^u$, the seen category sparse coefficients $X_p$, and the invisible category sparse coefficients $X_p^u$.

2. The dictionary learning method for zero-shot recognition according to claim 1, wherein step (2) comprises:

(2.1) generating the sparse coefficients $X_y$ of the images by using the visual space dictionary $D_v$, as in formula (5):

$$\min_{X_y} \|Y_v - D_v X_y\|_F^2 + w_x \|X_y - X_p H\|_F^2 \tag{5}$$

wherein the first term is a reconstruction error term, the second term is a constraint term that keeps each generated image sparse coefficient close to the sparse coefficient of its category obtained with the same visual space dictionary $D_v$, and $w_x$ is a harmonic parameter;

(2.2) generating the semantic attributes $Y_s$ of the images by using the semantic space dictionary $D_s$ and the category semantic attributes $P_s$, as in formula (6):

$$Y_s = \frac{\lambda D_s X_y + w_p P_s H}{\lambda + w_p} \tag{6}$$

wherein $w_p$ is a harmonic parameter.

3. The dictionary learning method for zero-shot recognition according to claim 2, wherein step (3) comprises: training the cross-domain dictionary of the image layer based on the image semantic attributes generated in step (2), in order to extract further information from the images and improve the generalization performance of the model, as in formula (7):

$$\mathcal{L}_{seen} = \|Y_v - D_v^{image} X\|_F^2 + \mu \|Y_s - D_s^{image} X\|_F^2 \tag{7}$$

wherein the first term is the reconstruction error term of visual space, the second term is the reconstruction error term of semantic space, $D_v^{image}$ and $D_s^{image}$ are the visual space dictionary and the semantic space dictionary of the image layer, respectively, $X$ is the sparse coefficient matrix, and $\mu$ is a harmonic parameter.

4. The dictionary learning method for zero-shot recognition according to claim 3, wherein step (4) comprises:

comparison in visual space: first generating sparse coefficients $X_u$ from the invisible category semantic attributes $P_s^u$ through the image-layer semantic space dictionary $D_s^{image}$, as in formula (8):

$$\min_{X_u} \|P_s^u - D_s^{image} X_u\|_F^2 \tag{8}$$

then generating the visual space representation of the categories, $P_v^{u\prime} = D_v^{image} X_u$, with the image-layer visual space dictionary $D_v^{image}$, computing the cosine distance $D_c$ between a test image $y_v$ and the description of each category $P_v^{u\prime}[c]$, and assigning the test image to the category at the smallest distance, as in formula (9):

$$\min_c D_c\left(P_v^{u\prime}[c],\, y_v\right) \tag{9}$$

comparison in the sparse domain: extracting the representation $x_u$ of the test image in sparse space with the image-layer visual space dictionary, as in formula (10):

$$\min_{x_u} \|y_v - D_v^{image} x_u\|_F^2 \tag{10}$$

then computing the cosine distance between $x_u$ and the description of each category in sparse space $X_u[c]$; the category closest to the test image is the category of the image, as in formula (11):

$$\min_c D_c\left(X_u[c],\, x_u\right) \tag{11}$$

comparison in semantic space: first, encoding the test image to attain
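The four claimed steps can be illustrated with a small NumPy sketch. It is not the patent's implementation and rests on several assumptions: an alternating (block-coordinate) least-squares solver, since the claim text names none; a small ridge term eps on every coefficient and dictionary update, because the claim text reproduced here has no explicit sparsity (l1) penalty, so the "sparse" coefficients below are not actually sparse; the lambda in formula (6) being the same lambda as in formula (2); and D_c read as cosine distance, 1 minus cosine similarity. All function names (ridge_solve, dict_update, train_category_layer, and so on) and the toy data are hypothetical, and the semantic-space comparison is left out because the claim text is cut off.

```python
import numpy as np

# Sketch only: solver choice, eps ridge terms and all names are assumptions, not the patent's.
def ridge_solve(A, B, eps=1e-3):
    """Coefficients X minimising ||B - A X||_F^2 + eps ||X||_F^2."""
    return np.linalg.solve(A.T @ A + eps * np.eye(A.shape[1]), A.T @ B)

def dict_update(targets, codes, weights, eps=1e-3):
    """Dictionary D minimising sum_i w_i ||T_i - D X_i||_F^2 + eps ||D||_F^2."""
    gram = eps * np.eye(codes[0].shape[0])
    cross = 0.0
    for T, X, w in zip(targets, codes, weights):
        gram = gram + w * X @ X.T
        cross = cross + w * T @ X.T
    return np.linalg.solve(gram, cross.T).T  # cross @ inv(gram); gram is symmetric

def train_category_layer(Y_v, H, P_s, P_s_u, k, lam=1.0, alpha=1.0, beta=1.0, iters=30, seed=0):
    """Alternating least squares on L_class = L_seen + alpha L_unseen + beta L_p, formula (4)."""
    rng = np.random.default_rng(seed)
    P_v = Y_v @ H.T @ np.linalg.inv(H @ H.T)        # (1.1) class centres minimise L_p
    D_v = rng.standard_normal((Y_v.shape[0], k))
    D_s = rng.standard_normal((P_s.shape[0], k))
    P_v_u = np.zeros((Y_v.shape[0], P_s_u.shape[1]))
    r, I_c = np.sqrt(lam), np.eye(H.shape[0])
    for _ in range(iters):
        # (1.2)/(1.3) shared codes: stack the visual and sqrt(lam)-weighted semantic terms
        A = np.vstack([D_v, r * D_s])
        X_p = ridge_solve(A, np.vstack([P_v, r * P_s]))
        X_p_u = ridge_solve(A, np.vstack([P_v_u, r * P_s_u]))
        P_v_u = D_v @ X_p_u                         # P_v^u only appears in L_unseen
        # P_v balances ||P_v - D_v X_p||^2 against beta ||Y_v - P_v H||^2
        P_v = (D_v @ X_p + beta * Y_v @ H.T) @ np.linalg.inv(I_c + beta * H @ H.T)
        # seen terms weight 1, unseen terms weight alpha (lam cancels in the D_s update)
        D_v = dict_update([P_v, P_v_u], [X_p, X_p_u], [1.0, alpha])
        D_s = dict_update([P_s, P_s_u], [X_p, X_p_u], [1.0, alpha])
    return D_v, D_s, P_v, P_v_u, X_p, X_p_u

def image_semantic_attributes(Y_v, H, D_v, D_s, X_p, P_s, lam=1.0, w_x=1.0, w_p=1.0):
    """Formula (5) in closed form, then formula (6)."""
    k = D_v.shape[1]
    X_y = np.linalg.solve(D_v.T @ D_v + w_x * np.eye(k), D_v.T @ Y_v + w_x * X_p @ H)
    return (lam * D_s @ X_y + w_p * P_s @ H) / (lam + w_p)

def train_image_layer(Y_v, Y_s, k, mu=1.0, iters=30, seed=0):
    """Alternating least squares on formula (7)."""
    rng = np.random.default_rng(seed)
    Dv = rng.standard_normal((Y_v.shape[0], k))
    Ds = rng.standard_normal((Y_s.shape[0], k))
    r = np.sqrt(mu)
    for _ in range(iters):
        X = ridge_solve(np.vstack([Dv, r * Ds]), np.vstack([Y_v, r * Y_s]))
        Dv, Ds = dict_update([Y_v], [X], [1.0]), dict_update([Y_s], [X], [1.0])
    return Dv, Ds

def cosine_distance(A, b):
    """1 - cosine similarity between each column of A and vector b."""
    return 1.0 - (A.T @ b) / (np.linalg.norm(A, axis=0) * np.linalg.norm(b) + 1e-12)

def recognise(y_v, P_s_u, Dv, Ds):
    """Visual-space (8)-(9) and sparse-domain (10)-(11) comparisons."""
    X_u = ridge_solve(Ds, P_s_u)                                     # (8)
    by_visual = int(np.argmin(cosine_distance(Dv @ X_u, y_v)))       # (9), Dv X_u = P_v^u'
    x_u = ridge_solve(Dv, y_v)                                       # (10)
    by_sparse = int(np.argmin(cosine_distance(X_u, x_u)))            # (11)
    return by_visual, by_sparse

# Hypothetical toy data: visual features are a noisy linear map W of class attributes.
rng = np.random.default_rng(1)
d_v, d_s, n_seen, n_unseen, per_class, k = 30, 10, 5, 3, 8, 6
W = rng.standard_normal((d_v, d_s))
P_s = rng.standard_normal((d_s, n_seen))
P_s_u = rng.standard_normal((d_s, n_unseen))
labels = np.repeat(np.arange(n_seen), per_class)
H = np.eye(n_seen)[:, labels]                                        # one-hot label matrix
Y_v = W @ P_s @ H + 0.1 * rng.standard_normal((d_v, labels.size))

D_v, D_s, P_v, P_v_u, X_p, X_p_u = train_category_layer(Y_v, H, P_s, P_s_u, k)
Y_s = image_semantic_attributes(Y_v, H, D_v, D_s, X_p, P_s)
Dv_img, Ds_img = train_image_layer(Y_v, Y_s, k)
y_test = W @ P_s_u[:, 1] + 0.1 * rng.standard_normal(d_v)           # an unseen-class image
print(recognise(y_test, P_s_u, Dv_img, Ds_img))
```

Each update solves its own least-squares subproblem in closed form. The unseen prototypes P_v^u get a trivial update (D_v X_p^u) because they appear only in L_unseen. The printed pair holds the predicted unseen-class index from the visual-space and sparse-domain comparisons; on toy data the two need not agree.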

Assignees

Univ Beijing Technology

Classifications

  • G06V10/772 (primary)

    Determining representative reference patterns, e.g. averaging or distorting patterns; Generating dictionaries · CPC title

  • Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items (segmenting video sequences G06V20/49) · CPC title

  • G06F18/28 (primary)

    Determining representative reference patterns, e.g. by averaging or distorting; Generating dictionaries · CPC title

  • Organisation of the process, e.g. bagging or boosting · CPC title

  • Machine learning · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Univ Beijing Technology
What technology area does this patent fall under?
Primary CPC classification G06V10/772. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 24, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).