Who is the assignee on this patent?

Microsoft Technology Licensing LLC

What technology area does this patent fall under?

Primary CPC classification G06N5/025. Mapped technology areas include Physics.

When was this patent published?

Publication date: Dec 16, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Extracting and classifying entities from digital content items

US12499374B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12499374-B2
Application number	US-202117566418-A
Country	US
Kind code	B2
Filing date	Dec 30, 2021
Priority date	Dec 30, 2021
Publication date	Dec 16, 2025
Grant date	Dec 16, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present disclosure relates to extracting entities from a collection of digital content items based on text from within the digital content items. For example, the present disclosure describes a customizable entity extraction system that utilizes a number of models to extract entities, rank entities, and classify certain entities using a combination of rule-based and machine learning approaches. In one or more embodiments, a customizable entity extraction system applies a set of rules to unstructured text of a collection of digital content items to extract and classify a set of entities in connection with a specific domain of interest.
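The extract, rank, and filter flow described in the abstract can be pictured with a minimal Python sketch. Everything here is illustrative rather than the patented implementation: the capitalized-phrase regex stands in for the static rule set, a normalized frequency count stands in for the trained entity ranking model, and the names `ENTITY_RULE`, `extract_entities`, `score_entities`, and `filter_entities` are hypothetical. The zero-shot classification step is left out because it would need a trained model.

```python
import re
from collections import Counter

# Hypothetical static rule: a run of capitalized words is a candidate entity.
ENTITY_RULE = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")


def extract_entities(text):
    """Apply the static rule to one unstructured text item."""
    return ENTITY_RULE.findall(text)


def score_entities(items):
    """Stand-in for the ranking model: frequency normalized to [0, 1]."""
    counts = Counter(e for text in items for e in extract_entities(text))
    top = max(counts.values(), default=1)
    return {entity: n / top for entity, n in counts.items()}


def filter_entities(scores, threshold):
    """Keep entities whose score is greater than or equal to the threshold."""
    return {e: s for e, s in scores.items() if s >= threshold}


# Made-up example posts, not data from the patent.
posts = [
    "Contoso Cloud outage reported by users in Seattle.",
    "Seattle team says Contoso Cloud is back online.",
    "Contoso Cloud pricing update.",
]
scores = score_entities(posts)
key_entities = filter_entities(scores, threshold=0.7)
print(key_entities)
```

In this sketch only the filtered entities would be passed on to a classifier, which mirrors the idea of running the more expensive model on a reduced set.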

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising:
receiving a plurality of digital content items, the plurality of digital content items including a collection of entities contained within unstructured text portions of the plurality of digital content items;
applying a rule-based model to each digital content item from the plurality of digital content items to extract a proper subset of entities of the collection of entities, the rule-based model including a static set of rules to be applied to an unstructured text portion of a given digital content item for identifying any number of entities from the given digital content item for inclusion in the proper subset of entities;
selectively applying an entity ranking machine learning model only to the proper subset of entities of the collection of entities to determine an importance score for each entity from the proper subset of entities, the entity ranking machine learning model being trained to output an importance score for a given entity within a given digital content item, the importance score indicating a metric of importance of an associated entity within an associated digital content item from the plurality of digital content items;
receiving, based on a user input, a candidate term associated with a domain of interest; and
applying a zero-shot classification model to the proper subset of entities and associated importance scores to determine key entities from the collection of entities associated with the candidate term, the zero-shot classification model being trained to associate a given input term with at least one term from a set of base terms embedded within a code of the zero-shot classification model, the set of base terms being independent from the candidate terms or the proper subset of entities extracted from the plurality of digital content items.

2. The method of claim 1, wherein the proper subset of entities include a proper subset of terms from the unstructured text content.

3. The method of claim 1, wherein the static set of rules of the rule-based model can be uniformly applied to a given portion of text to identify at least one term from the given portion of text based on characteristics of the terms within the given portion of text.

4. The method of claim 1, wherein the importance score for each entity from the proper subset of entities is determined based on a frequency of each entity within a corpus of text represented by the plurality of digital content items.

5. The method of claim 1, wherein the candidate term has a semantic meaning, and wherein the zero-shot classification model is trained to determine a semantic meaning for a given candidate term.

6. The method of claim 1, wherein the zero-shot classification model is configured to:
associate a semantic meaning of the candidate term to a base term from the set of base terms embedded within the code of the zero-shot classification model;
receive one or more entities as input entities to the zero-shot classification model; and
associate the one or more entities with the candidate term based on a determined association between the one or more entities and the base term.

7. The method of claim 1, further comprising determining a filtered set of entities from the proper subset of entities based on importance scores of the filtered set of entities being greater than or equal to a threshold importance score.

8. The method of claim 7, wherein applying the zero-shot classification model to the proper subset of entities includes selectively providing only the filtered set of entities as inputs to the zero-shot classification model, and wherein the filtered set of entities is a filtered subset of the proper subset of entities based on determined associations between the proper subset of entities and the candidate term.

9. The method of claim 1, wherein the plurality of digital content items includes text portions of a plurality of posts shared by users of a social networking system.

10. The method of claim 1, further comprising generating an extraction report for the plurality of digital content items, the extraction report including a listing of the proper subset of entities from the plurality of digital content items and indications of an estimated importance of respective entities from the proper subset of entities based on importance scores for the proper subset of entities determined by the entity ranking machine learning model.

11. The method of claim 10, further comprising generating a correlation graph object for the collection of digital content items including a plurality of nodes associated with the proper subset of entities and a plurality of edges based on co-occurrence of the proper subset of entities and one or more additional terms included within the collection of digital content items.

12. A system, comprising:
at least one processor;
memory in electronic communication with the at least one processor; and
instructions stored in the memory, the instruction being executable by the at least one processor to:
receive a plurality of digital content items, the plurality of digital content items including a collection of entities contained within unstructured text portions of the plurality of digital content items;
apply a rule-based model to each digital content item from the plurality of digital content items to extract a proper subset of entities of the collection of entities, the rule-based model including a static set of rules to be applied to an unstructured text portion of a given digital content item for identifying any number of entities from the given digital content item for inclusion in the proper subset of entities;
selectively apply an entity ranking machine learning model only to the proper subset of entities of the collection of entities to determine an importance score for each entity from the proper subset of entities, the entity ranking machine learning model being trained to output an importance score for a given entity within a given digital content item, the importance score indicating a metric of importance of an associated entity within an associated digital content item from the plurality of digital content items;
receive, based on a user input, a candidate term associated with a domain of interest; and
apply a zero-shot classification model to the proper subset of entities and associated importance scores to determine key entities from the collection of entities associated with the candidate term, the zero-shot classification model being trained to associate a given input term with at least one term from a set of base terms embedded within a code of the zero-shot classification model, the set of base terms being independent from the candidate terms or the proper subset of entities extracted from the plurality of digital content items.

13. The system of claim 12, wherein the proper subset of entities include a proper subset of terms from the unstructured text content, and wherein the static set of rules of the rule-based model can be uniformly applied to a given portion of text to identify at least one term from the given portion of text based on characteristics of the terms within the given portion of text.

14. The system of claim 12, wherein the importance score for each entity from the proper subset of entities is determined based on a frequency of each entity within the plurality of digital content items.

15. The system of claim 12, wherein the zero-shot classification model is configured to: associate a semantic meaning of the candidate term to a base term from the set of base terms embedded within
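The correlation graph object of claim 11 (nodes for extracted entities, edges based on co-occurrence) can be pictured with a small Python sketch. This is an assumption-laden illustration, not the claimed implementation: the function name `cooccurrence_graph` is hypothetical, entities are matched by plain substring search, an edge weight is the number of content items in which both entities appear, and the "additional terms" mentioned in the claim are not modeled.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_graph(items, entities):
    """Return (nodes, edges); edges maps a sorted entity pair to a count."""
    nodes = set()
    edges = Counter()
    for text in items:
        # Entities found in this item, sorted so each pair has one canonical key.
        present = sorted(e for e in entities if e in text)
        nodes.update(present)
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return nodes, edges


# Made-up example items and entities, not data from the patent.
items = [
    "Contoso Cloud outage in Seattle",
    "Seattle office opens",
    "Contoso Cloud and Fabrikam partner in Seattle",
]
entities = ["Contoso Cloud", "Seattle", "Fabrikam"]

nodes, edges = cooccurrence_graph(items, entities)
for pair, weight in sorted(edges.items()):
    print(pair, weight)
```

A plain dictionary of pair counts keeps the sketch dependency-free; a graph library could hold the same nodes and weighted edges if layout or traversal were needed.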

Assignees

Microsoft Technology Licensing LLC

Classifications

G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
G06F18/2113: by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation
G06F40/30: Semantic analysis
G06F40/279: Recognition of textual entities
G06F16/353: into predefined classes

Patent family

Related publications grouped by family.

Family ID: 85108804

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12499374B2 cover?: The present disclosure relates to extracting entities from a collection of digital content items based on text from within the digital content items. For example, the present disclosure describes a customizable entity extraction system that utilizes a number of models to extract entities, rank entities, and classify certain entities using a combination of rule-based and machine learning approac…

Related patents

Generating and presenting a text-based graph object

Generating a domain-specific knowledge graph from unstructured computer text

Learning graph-based priors for generalized zero-shot learning

Natural language processing and artificial intelligence based search system

Entity fingerprints

Generating snippet modules on online social networks
