What technology area does this patent fall under?

Primary CPC classification G06F40/247. Mapped technology areas include Physics.

When was this patent published?

Publication date Aug 16, 2022 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Automated identification of concept labels for a set of documents

US11416684B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11416684-B2
Application number	US-202016784145-A
Country	US
Kind code	B2
Filing date	Feb 6, 2020
Priority date	Feb 6, 2020
Publication date	Aug 16, 2022
Grant date	Aug 16, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques are described for intelligently identifying concept labels for a set of multiple documents where the identified concept labels are representative of and semantically relevant to the information contained by the set of documents. The technique includes extracting semantic units (e.g., paragraphs) from the set of documents and determining concept labels applicable to the semantic units based on relevance scores computed for the concept labels. The technique includes determining an initial set of concept labels for the set of documents based on the applicable concept labels. The technique further includes obtaining a reference hierarchy associated with the reference set of concept labels and determining a final set of concept labels for the set of documents using a reference hierarchy, the initial set of concept labels, and the relevance scores. The technique includes outputting information identifying the final set of concept labels for the set of documents.
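The first two steps of the abstract (extracting semantic units and picking applicable labels by relevance score) can be sketched as follows. This is a minimal illustration, not the patented implementation: the abstract does not say how relevance is computed, so the Jaccard token overlap used here is an assumption, as are the function names and the `reference` mapping from label to reference text.

```python
import re


def extract_semantic_units(documents):
    """Split each document into paragraphs (blank-line separated)."""
    units = []
    for doc in documents:
        units.extend(p.strip() for p in re.split(r"\n\s*\n", doc) if p.strip())
    return units


def tokens(text):
    """Lowercased set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def relevance_scores(unit, reference):
    """Assumed scoring: Jaccard overlap between the unit and each
    label's reference text. `reference` maps label -> text."""
    unit_tokens = tokens(unit)
    scores = {}
    for label, ref_text in reference.items():
        ref_tokens = tokens(ref_text)
        union = unit_tokens | ref_tokens
        scores[label] = len(unit_tokens & ref_tokens) / len(union) if union else 0.0
    return scores


def applicable_labels(unit, reference, top_k=2):
    """Keep the top_k labels with a non-zero score, highest first."""
    scores = relevance_scores(unit, reference)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(label, score) for label, score in ranked[:top_k] if score > 0]
```

Each paragraph returned by `extract_semantic_units` would be passed to `applicable_labels` together with the reference set; any real system would likely replace the token overlap with a stronger semantic similarity measure.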

First claim

Opening claim text (preview).

What is claimed is:
1. A method comprising: extracting, by a computer system, a plurality of semantic units from a plurality of documents; for each semantic unit in the plurality of semantic units, determining, by the computer system, from a reference set of concept labels, one or more concept labels applicable to the semantic unit; based on the concept labels determined for the plurality of semantic units, determining, by the computer system, an initial set of concept labels for the plurality of documents, wherein determining the initial set of concept labels comprises: for each semantic unit in the plurality of semantic units, computing an entropy value for the semantic unit based on the one or more concept labels determined to be applicable to the semantic unit, wherein the entropy value for the semantic unit indicates a degree of specificity of the one or more concept labels to the semantic unit; ordering the plurality of semantic units based on the computed entropy values; and using the ordered plurality of semantic units to determine the initial set of concept labels for the plurality of documents; obtaining, by the computer system, a reference hierarchy associated with the reference set of concept labels, the reference hierarchy identifying hierarchical relationships between two or more concept labels in the reference set of concept labels; determining, by the computer system, a final set of concept labels for the plurality of documents using the reference hierarchy and the initial set of concept labels; and outputting, by the computer system, information identifying the final set of concept labels for the plurality of documents.
2. The method of claim 1, wherein the reference set of concept labels comprise titles of a plurality of reference documents, and wherein the plurality of reference documents comprise Wikipedia articles.
3. The method of claim 1, wherein the plurality of semantic units comprise a plurality of paragraphs in the plurality of documents.
4. The method of claim 1, wherein determining, from the reference set of concept labels, for each semantic unit in the plurality of semantic units, the one or more concept labels applicable to the semantic unit comprises: for each semantic unit in the plurality of semantic units: for each concept label in the reference set of concept labels, computing, by the computer system, a relevance score for the concept label for the semantic unit, the relevance score for the concept label indicative of a degree of relevance of the concept label to contents of the semantic unit; and based on the relevance scores computed for the concept labels in the reference set of concept labels for the semantic unit, selecting, by the computer system, the one or more concept labels applicable to the semantic unit from the reference set of concept labels.
5. The method of claim 1, wherein: a semantic unit in the plurality of semantic units with a higher computed entropy value is placed lower in the ordered plurality of semantic units than a semantic unit in the plurality of semantic units having a lower computed entropy value.
6. The method of claim 1, wherein determining the initial set of concept labels for the plurality of documents further comprises: (a) selecting an unprocessed semantic unit in the ordered plurality of semantic units with the lowest entropy value; (b) adding to the initial set of concept labels, any concept label associated with the semantic unit that is not already in the initial set of concept labels; and (c) marking the selected semantic unit as processed.
7. The method of claim 6 further comprising repeating (a), (b), and (c) until all the semantic units in the ordered plurality of semantic units have been processed or until a first threshold criterion is satisfied, wherein the first threshold criterion is satisfied when a preconfigured threshold number of concept labels are included in the initial set of concept labels.
8. The method of claim 7 further comprising: determining that the first threshold criterion is satisfied; and adding additional one or more concept labels to the initial set of concept labels to ensure that each semantic unit in the plurality of semantic units is associated with at least one concept label in the initial set of concept labels.
9. The method of claim 8, wherein adding the additional one or more concept labels to the initial set of concept labels comprises: for at least one unprocessed semantic unit in the ordered plurality of semantic units: identifying that a first concept label associated with the at least one unprocessed semantic unit is not included in the initial set of concept labels; and adding the first concept label to the initial set of concept labels.
10. The method of claim 1, wherein determining the final set of concept labels comprises: identifying, based upon the reference hierarchy, hierarchical relationships between concept labels in the initial set of concept labels; generating a Directed Acyclic Graph (DAG) of nodes for representing the hierarchical relationships, each node in the DAG of nodes representing a concept label in the initial set of concept labels and, wherein connections between the nodes in the DAG of nodes represent the hierarchical relationships; identifying, based upon the reference hierarchy, a set of ancestor concept labels for the concept labels in the initial set of concept labels, wherein, for at least a first concept label in the initial set of concept labels, the set of ancestor concept labels comprises multiple concept labels that are ancestors of the first concept label in the reference hierarchy and the multiple concept labels are not in the initial set of concept labels; and updating the DAG of nodes to add nodes corresponding to the set of ancestor concept labels to the DAG of nodes, wherein the updating comprises adding connections to the DAG of nodes to represent hierarchical relationships between the nodes representing the set of ancestor concept labels and the nodes representing the concept labels in the initial set of concept labels.
11. The method of claim 10, wherein determining the final set of concept labels further comprises: assigning a weight to each node in the DAG of nodes based on relevance scores associated with the concept labels represented by the DAG of nodes; computing a usefulness score for each node in the DAG of nodes based on the weight of the node, wherein the usefulness score for each node in the DAG of nodes is computed based on a weighted relevance score computed for the node and a weighted relevance score computed for one or more descendant nodes of the node in the DAG of nodes; selecting a node from the DAG of nodes with the highest usefulness score; and adding a concept label represented by the node selected from the DAG of nodes to the final set of concept labels.
12. The method of claim 11 further comprising: (a) removing the selected node from the DAG of nodes to generate an updated DAG of nodes; (b) re-computing a weight for each node remaining in the updated DAG of nodes; (c) re-computing a usefulness score for each node in the updated DAG of nodes; (d) selecting a node from the updated DAG of nodes with the highest usefulness score; and (e) adding a concept label represented by the node selected from the updated DAG of nodes to the final set of concept labels.
13. The method of claim 12 further comprising: repeating (a), (b), (c), (d), and (e) until a number of concept labels included in the final set of concept labels equals or is higher than a pre-configured threshold number of concept labels.
14.
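The entropy-ordered selection of the initial label set (claims 1 and 5 to 9) can be sketched as below. This is an illustrative reading under stated assumptions, not the patent's implementation: the claim preview does not give the entropy formula, so Shannon entropy over a unit's normalized relevance scores is assumed, and the names `unit_entropy`, `initial_label_set`, and `max_labels` are hypothetical.

```python
import math


def unit_entropy(label_scores):
    """Assumed formula: Shannon entropy (bits) of the unit's relevance
    scores after normalizing them to sum to 1. Lower entropy means the
    unit's labels are more specific to it."""
    total = sum(label_scores.values())
    if total <= 0:
        return 0.0
    probs = [s / total for s in label_scores.values() if s > 0]
    return -sum(p * math.log2(p) for p in probs)


def initial_label_set(unit_labels, max_labels):
    """unit_labels maps unit id -> {label: relevance score}.

    Units are processed from lowest to highest entropy (claims 5-6).
    Processing stops once max_labels labels are collected (claim 7);
    the check runs per unit, so the set may slightly exceed max_labels.
    """
    ordered = sorted(unit_labels, key=lambda u: (unit_entropy(unit_labels[u]), u))
    initial = []
    remaining = list(ordered)
    while remaining and len(initial) < max_labels:
        unit = remaining.pop(0)  # (a) lowest-entropy unprocessed unit
        for label in unit_labels[unit]:
            if label not in initial:  # (b) add labels not yet included
                initial.append(label)
        # (c) the unit counts as processed once popped from `remaining`

    # Coverage pass (claims 8-9): every unprocessed unit should share at
    # least one label with the initial set. Picking the unit's
    # highest-scoring label is an assumption made for this sketch.
    for unit in remaining:
        labels = unit_labels[unit]
        if labels and not any(label in initial for label in labels):
            initial.append(max(labels, key=labels.get))
    return initial
```

For example, with units `p1: {A: 0.9, B: 0.1}`, `p2: {C: 1.0}`, `p3: {A: 0.5, D: 0.5}`, `p4: {E: 0.6, F: 0.4}` and `max_labels=3`, the units are visited in the order p2, p1, p4, p3; the threshold is reached after p1, and the coverage pass then adds `E` for p4 while p3 is already covered by `A`.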

Assignees

Adobe Inc

Inventors

Classifications

G06F40/247 · Primary
Thesauruses; Synonyms · CPC title
G06F40/30 · Primary
Semantic analysis · CPC title
G06F18/29
Graphical models, e.g. Bayesian networks · CPC title
G06V30/416
Extracting the logical structure, e.g. chapters, sections or page numbers; Identifying elements of the document, e.g. authors · CPC title
G06F40/216
using statistical methods · CPC title

Patent family

Related publications grouped by family.

View patent family 77178737

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11416684B2 cover?: Techniques are described for intelligently identifying concept labels for a set of multiple documents where the identified concept labels are representative of and semantically relevant to the information contained by the set of documents. The technique includes extracting semantic units (e.g., paragraphs) from the set of documents and determining concept labels applicable to the semantic units…
Who is the assignee on this patent?: Adobe Inc