Method for keyword extraction and electronic device implementing the same

US12135940B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12135940-B2
Application number  US-202117142566-A
Country             US
Kind code           B2
Filing date         Jan 6, 2021
Priority date       Jan 6, 2020
Publication date    Nov 5, 2024
Grant date          Nov 5, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for keyword extraction, an apparatus, an electronic device, and a computer-readable storage medium, which relate to the field of artificial intelligence are provided. The method includes collecting feature information corresponding to an image to be processed, the feature information including text representation information and image visual information and then extracting keywords from the image to be processed based on the feature information. The text representation information includes text content and text visual information corresponding to each text line in the image to be processed. The method for keyword extraction, apparatus, electronic device, and computer-readable storage medium provided in the embodiments of the disclosure may extract the keywords from an image to be processed.
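
The feature information named in the abstract can be pictured as a small data structure. The sketch below is illustrative only: the class names `TextLine` and `FeatureInfo`, the field names, and the helper `candidate_words` are hypothetical and do not come from the patent. It only shows how text content, per-line text visual information, and image visual information might be held together before any keyword extraction step.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TextLine:
    """One text line found in the image (hypothetical container)."""
    words: List[str]                  # text content: the word sequence of the line
    box: Tuple[int, int, int, int]    # text position as (x, y, width, height)
    visual_features: List[float]      # stand-in for encoded font size, color, shape


@dataclass
class FeatureInfo:
    """Feature information collected for one image to be processed."""
    text_lines: List[TextLine]        # text representation information
    image_features: List[float]       # stand-in for image visual information


def candidate_words(info: FeatureInfo) -> List[str]:
    """Return each distinct word from the text lines, in reading order.

    Words are compared case-insensitively; the first spelling seen is kept.
    """
    seen = set()
    out = []
    for line in info.text_lines:
        for word in line.words:
            key = word.lower()
            if key not in seen:
                seen.add(key)
                out.append(word)
    return out
```

A real system would fill `visual_features` and `image_features` from learned encoders; here they are plain lists of floats so the example stays self-contained.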

First claim

Opening claim text (preview).

What is claimed is:

1. A method for keyword extraction, the method comprising: collecting feature information corresponding to an image to be processed, the feature information including text representation information and image visual information, the text representation information including text content and text visual information corresponding to a text line in the image to be processed, the text visual information comprising a text feature map corresponding to the text line, the text feature map including text line visual features obtained by encoding visual information including a font size, a font color, and a font shape of the text line; and based on the image visual information and the text representation information including the text content and the text visual information including the text feature map including text line visual features obtained by encoding the visual information including the font size, the font color, and the font shape of the text line, extracting keywords from the image to be processed, the keywords extracted from the image to be processed being representative words or phrases summarizing the image to be processed.

2. The method of claim 1, wherein, for the text line, the text visual information corresponding to the text line further comprises at least one of: a text position in the image to be processed; word positions in the text line in the image to be processed; or word relative positions in the text line.

3. The method of claim 2, wherein the extracting of the keywords from the image to be processed comprises: encoding the feature information to obtain an encoded result of the feature information; and based on the encoded result, extracting keywords from the image to be processed, wherein the encoded result includes a text context representation, an image feature representation, and at least one of structure information or topic information representations of all text lines, wherein the text context representation is obtained based on the text representation information, wherein the image feature representation is obtained based on the image visual information, and wherein the structure information and topic information representations of all text lines are obtained based on the text context representation.

4. The method of claim 3, further comprising: decoding the text context representation, the image feature representation, and the at least one of structure information or topic information representations of all text lines; and based on the decoding, obtaining a keyword sequence comprising the keywords extracted from the image to be processed.

5. The method of claim 3, wherein the extracting of the keywords from the image to be processed further comprises: based on the encoded result, determining target prediction modes corresponding to each decoding time operation, respectively, and determining a prediction word corresponding to the target prediction modes; outputting prediction words corresponding to each decoding time operation, respectively; and based on a prediction word sequence of all decoding time operations, obtaining keywords.

6. The method of claim 5, wherein, for a decoding time operation, determining a target prediction mode corresponding to the decoding time operation and determining a prediction word corresponding to the target prediction mode comprises at least one of: based on the encoded result, determining prediction words of each pre-configured prediction mode corresponding to the decoding time operation, respectively, and determining the target prediction mode corresponding to the decoding time operation, and based on the prediction words of each pre-configured prediction mode and the target prediction mode corresponding to the decoding time operation, obtaining a prediction word corresponding to a target pre-stored mode; or based on the encoded result, determining the target prediction mode corresponding to the decoding time operation from each pre-configured prediction mode, and obtaining the prediction word corresponding to the target prediction mode.

7. The method of claim 6, wherein a pre-configured prediction mode comprises: a first prediction mode in which a keyword prediction is performed based on a common word dictionary; and a second prediction mode in which the keyword prediction is performed based on all words in input text lines.

8. The method of claim 7, wherein the determining of the prediction word corresponding to the target prediction mode comprises: in response to the target prediction mode being the second prediction mode, determining a weight corresponding to each word contained in the text content in the image to be processed based on the encoded result; and based on the weight corresponding to each word, determining the prediction word corresponding to the target prediction mode.

9. The method of claim 8, wherein the determining of the weight corresponding to each word contained in the text content in the image to be processed based on the encoded result comprises: based on the encoded result, obtaining a hidden vector corresponding to a current decoding time operation through feature fusion processing; and based on the text context representation and the hidden vector, determining the weight corresponding to each word contained in the text content in the image to be processed.

10. The method of claim 3, wherein the encoding of the feature information to obtain the encoded result corresponding to the feature information comprises at least one of: encoding the text representation information to obtain a text line representation; encoding the text line representation to obtain the text context representation; or encoding the text context representation to obtain a representation of the structure information and the topic information representations of all text lines.

11. The method of claim 10, wherein the text content includes a word sequence corresponding to the text line, wherein, for one text line, the text content of the one text line includes a word sequence corresponding to the one text line, and wherein, for the one text line, encoding the text content to obtain one text line representation comprises: encoding the word sequence corresponding to the one text line to obtain a character-based word representation; and based on the character-based word representation, determining the one text line representation corresponding to the one text line.

12. The method of claim 10, wherein the text line representation comprises a text line representation corresponding to at least one text line, and wherein the encoding of the text line representation to obtain the text context representation comprises: encoding the text line representation respectively to obtain a local text context representation corresponding to the text line; encoding all text line representations as a whole to obtain a global text context representation corresponding to all text lines; and based on the local text context representation corresponding to the text line and the global text context representation corresponding to all text lines, determining the text context representation corresponding to the text line.

13. A non-transitory computer-readable storage medium having stored thereon computer programs which, when are executed by a processor, perform the method of claim 1.

14. The method of claim 1, wherein the visual information used to obtain the text line visual features further includes a text background color, structure information, and border information, and wherein the text line visual features are furt
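
Claims 5 to 9 describe a decoding step that first picks a prediction mode (a common word dictionary, or the words in the input text lines) and, in the second mode, weights each input word using a hidden vector and the text context representation. The sketch below is one way to picture a single step, under stated assumptions: the dot-product scoring with softmax, the scalar `copy_gate`, its 0.5 threshold, and all function names are hypothetical choices for illustration and are not taken from the patent.

```python
import math
from typing import Dict, List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def copy_weights(hidden: Sequence[float],
                 context: Sequence[Sequence[float]]) -> List[float]:
    """One weight per input word from the hidden vector and context vectors.

    Assumption: dot-product scores normalized by softmax.
    """
    return softmax([dot(hidden, c) for c in context])


def decode_step(hidden: Sequence[float],
                context: Sequence[Sequence[float]],
                input_words: Sequence[str],
                dictionary_scores: Dict[str, float],
                copy_gate: float) -> Tuple[str, str]:
    """Choose a prediction mode, then the prediction word for this step.

    copy_gate is a hypothetical scalar in [0, 1]; a trained model would
    derive it from the encoded result rather than receive it as input.
    """
    if copy_gate >= 0.5:
        # Second mode: pick the highest-weighted word from the input text lines.
        weights = copy_weights(hidden, context)
        best = max(range(len(input_words)), key=weights.__getitem__)
        return "input_words", input_words[best]
    # First mode: pick the highest-scoring word from the common word dictionary.
    return "dictionary", max(dictionary_scores, key=dictionary_scores.get)
```

Running `decode_step` once per decoding time operation and collecting the returned words would give the prediction word sequence from which keywords are obtained.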

Assignees

Samsung Electronics Co Ltd

Classifications

  • using neural networks · CPC title

  • Classification techniques · CPC title

  • Document-oriented image-based pattern recognition · CPC title

  • Character recognition · CPC title

  • Extracting the logical structure, e.g. chapters, sections or page numbers; Identifying elements of the document, e.g. authors · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12135940B2 cover?
A method for keyword extraction, an apparatus, an electronic device, and a computer-readable storage medium, which relate to the field of artificial intelligence are provided. The method includes collecting feature information corresponding to an image to be processed, the feature information including text representation information and image visual information and then extracting keywords fro…
Who is the assignee on this patent?
Samsung Electronics Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06F40/295. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 5, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).