Generating and presenting a text-based graph object
US-12169526-B2 · Dec 17, 2024 · US
US12499374B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12499374-B2 |
| Application number | US-202117566418-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 30, 2021 |
| Priority date | Dec 30, 2021 |
| Publication date | Dec 16, 2025 |
| Grant date | Dec 16, 2025 |
Official abstract text for this publication.
The present disclosure relates to extracting entities from a collection of digital content items based on text from within the digital content items. For example, the present disclosure describes a customizable entity extraction system that utilizes a number of models to extract entities, rank entities, and classify certain entities using a combination of rule-based and machine learning approaches. In one or more embodiments, a customizable entity extraction system applies a set of rules to unstructured text of a collection of digital content items to extract and classify a set of entities in connection with a specific domain of interest.
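The first stages described in the abstract (rule-based extraction, importance scoring, and filtering) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the capitalized-phrase rule, the function names, the sample posts, and the 0.5 threshold do not come from the patent. A plain frequency share stands in for the trained entity ranking machine learning model, and the zero-shot classification step is not shown.

```python
import re
from collections import Counter

# Hypothetical static rule: a run of capitalized words is a candidate entity.
CAPITALIZED_RUN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")


def extract_entities(text):
    """Apply the static rule to one unstructured text portion."""
    return CAPITALIZED_RUN.findall(text)


def importance_scores(items):
    """Score each entity by its share of all extracted mentions in the corpus.

    This frequency share is only a stand-in for a trained ranking model.
    """
    counts = Counter(e for text in items for e in extract_entities(text))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {entity: n / total for entity, n in counts.items()}


def filter_entities(scores, threshold):
    """Keep entities whose score is greater than or equal to the threshold."""
    return {e: s for e, s in scores.items() if s >= threshold}


# Made-up sample posts, lowercase at sentence start so the rule stays simple.
posts = [
    "the new Contoso Phone ships next week",
    "my Contoso Phone battery drains fast",
    "compared it with a Fabrikam Tablet yesterday",
]

scores = importance_scores(posts)
key_entities = filter_entities(scores, threshold=0.5)
print(key_entities)
```

In this sketch the filter runs before any classification step, so a later and more expensive model would only see the entities that pass the threshold.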
Opening claim text (preview).
What is claimed is:

1. A method, comprising: receiving a plurality of digital content items, the plurality of digital content items including a collection of entities contained within unstructured text portions of the plurality of digital content items; applying a rule-based model to each digital content item from the plurality of digital content items to extract a proper subset of entities of the collection of entities, the rule-based model including a static set of rules to be applied to an unstructured text portion of a given digital content item for identifying any number of entities from the given digital content item for inclusion in the proper subset of entities; selectively applying an entity ranking machine learning model only to the proper subset of entities of the collection of entities to determine an importance score for each entity from the proper subset of entities, the entity ranking machine learning model being trained to output an importance score for a given entity within a given digital content item, the importance score indicating a metric of importance of an associated entity within an associated digital content item from the plurality of digital content items; receiving, based on a user input, a candidate term associated with a domain of interest; and applying a zero-shot classification model to the proper subset of entities and associated importance scores to determine key entities from the collection of entities associated with the candidate term, the zero-shot classification model being trained to associate a given input term with at least one term from a set of base terms embedded within a code of the zero-shot classification model, the set of base terms being independent from the candidate terms or the proper subset of entities extracted from the plurality of digital content items.

2. The method of claim 1, wherein the proper subset of entities include a proper subset of terms from the unstructured text content.

3. The method of claim 1, wherein the static set of rules of the rule-based model can be uniformly applied to a given portion of text to identify at least one term from the given portion of text based on characteristics of the terms within the given portion of text.

4. The method of claim 1, wherein the importance score for each entity from the proper subset of entities is determined based on a frequency of each entity within a corpus of text represented by the plurality of digital content items.

5. The method of claim 1, wherein the candidate term has a semantic meaning, and wherein the zero-shot classification model is trained to determine a semantic meaning for a given candidate term.

6. The method of claim 1, wherein the zero-shot classification model is configured to: associate a semantic meaning of the candidate term to a base term from the set of base terms embedded within the code of the zero-shot classification model; receive one or more entities as input entities to the zero-shot classification model; and associate the one or more entities with the candidate term based on a determined association between the one or more entities and the base term.

7. The method of claim 1, further comprising determining a filtered set of entities from the proper subset of entities based on importance scores of the filtered set of entities being greater than or equal to a threshold importance score.

8. The method of claim 7, wherein applying the zero-shot classification model to the proper subset of entities includes selectively providing only the filtered set of entities as inputs to the zero-shot classification model, and wherein the filtered set of entities is a filtered subset of the proper subset of entities based on determined associations between the proper subset of entities and the candidate term.

9. The method of claim 1, wherein the plurality of digital content items includes text portions of a plurality of posts shared by users of a social networking system.

10. The method of claim 1, further comprising generating an extraction report for the plurality of digital content items, the extraction report including a listing of the proper subset of entities from the plurality of digital content items and indications of an estimated importance of respective entities from the proper subset of entities based on importance scores for the proper subset of entities determined by the entity ranking machine learning model.

11. The method of claim 10, further comprising generating a correlation graph object for the collection of digital content items including a plurality of nodes associated with the proper subset of entities and a plurality of edges based on co-occurrence of the proper subset of entities and one or more additional terms included within the collection of digital content items.

12. A system, comprising: at least one processor; memory in electronic communication with the at least one processor; and instructions stored in the memory, the instruction being executable by the at least one processor to: receive a plurality of digital content items, the plurality of digital content items including a collection of entities contained within unstructured text portions of the plurality of digital content items; apply a rule-based model to each digital content item from the plurality of digital content items to extract a proper subset of entities of the collection of entities, the rule-based model including a static set of rules to be applied to an unstructured text portion of a given digital content item for identifying any number of entities from the given digital content item for inclusion in the proper subset of entities; selectively apply an entity ranking machine learning model only to the proper subset of entities of the collection of entities to determine an importance score for each entity from the proper subset of entities, the entity ranking machine learning model being trained to output an importance score for a given entity within a given digital content item, the importance score indicating a metric of importance of an associated entity within an associated digital content item from the plurality of digital content items; receive, based on a user input, a candidate term associated with a domain of interest; and apply a zero-shot classification model to the proper subset of entities and associated importance scores to determine key entities from the collection of entities associated with the candidate term, the zero-shot classification model being trained to associate a given input term with at least one term from a set of base terms embedded within a code of the zero-shot classification model, the set of base terms being independent from the candidate terms or the proper subset of entities extracted from the plurality of digital content items.

13. The system of claim 12, wherein the proper subset of entities include a proper subset of terms from the unstructured text content, and wherein the static set of rules of the rule-based model can be uniformly applied to a given portion of text to identify at least one term from the given portion of text based on characteristics of the terms within the given portion of text.

14. The system of claim 12, wherein the importance score for each entity from the proper subset of entities is determined based on a frequency of each entity within the plurality of digital content items.

15. The system of claim 12, wherein the zero-shot classification model is configured to: associate a semantic meaning of the candidate term to a base term from the set of base terms embedded within
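
Claim 11 recites a correlation graph object whose nodes are associated with extracted entities and whose edges are based on co-occurrence. One way such a structure could be assembled is sketched below. This is an illustration under assumptions, not the patented implementation: the function name `cooccurrence_graph`, the choice to count co-occurrence per content item, and the sample data are all hypothetical, and entities and additional terms are treated alike.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(items_terms):
    """Build nodes and weighted undirected edges from per-item term lists.

    Two terms are linked when they appear in the same content item; the edge
    weight counts how many items contain both. (Assumed definition.)
    """
    nodes = set()
    edges = Counter()
    for terms in items_terms:
        unique = sorted(set(terms))  # sort so each pair has one canonical order
        nodes.update(unique)
        for a, b in combinations(unique, 2):
            edges[(a, b)] += 1
    return nodes, edges


# Made-up terms already extracted from three content items.
per_item = [
    ["battery", "Contoso Phone"],
    ["Contoso Phone", "battery", "charger"],
    ["Fabrikam Tablet"],
]

nodes, edges = cooccurrence_graph(per_item)
for (a, b), weight in sorted(edges.items()):
    print(f"{a} -- {b}: {weight}")
```

A term that never co-occurs with another, such as the last item's single entity here, still becomes a node but has no edges.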
Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title
by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation · CPC title
Semantic analysis · CPC title
Recognition of textual entities · CPC title
into predefined classes · CPC title