Enterprise knowledge graphs using enterprise named entity recognition

US12086546B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12086546-B2
Application number: US-202016933907-A
Country: US
Kind code: B2
Filing date: Jul 20, 2020
Priority date: Jul 20, 2020
Publication date: Sep 10, 2024
Grant date: Sep 10, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Examples described herein generally relate to a computer system including a knowledge graph storing a plurality of entities. A mining of a set of enterprise source documents within an enterprise intranet is performed, by an enterprise named entity recognition (ENER) model, to determine a plurality of entity names. An entity record is generated within a knowledge graph for a mined entity name from the linked entity names based on an entity schema and ones of the set of enterprise source documents associated with the mined entity name. The entity record includes attributes aggregated from the ones of the set of enterprise source documents associated with the mined entity name.
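The idea of an entity record whose attributes are aggregated from several source documents can be illustrated with a small sketch. This is not the patented implementation: the class `EntityRecord`, the helpers `build_record` and `visible_attributes`, the attribute names, and the sample documents are all hypothetical. The sketch assumes each attribute value keeps the ids of the documents that support it, which is also what lets a page show only the attributes a given user is permitted to see, as the claims on this page describe.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class EntityRecord:
    """Hypothetical project entity record; field names are illustrative."""
    name: str
    # attribute name -> value -> ids of the source documents supporting it
    attributes: dict = field(
        default_factory=lambda: defaultdict(lambda: defaultdict(set))
    )

    def add(self, attribute, value, doc_id):
        self.attributes[attribute][value].add(doc_id)


def build_record(name, mentions):
    """Aggregate (doc_id, attribute, value) mentions into one record."""
    record = EntityRecord(name)
    for doc_id, attribute, value in mentions:
        record.add(attribute, value, doc_id)
    return record


def visible_attributes(record, readable_docs):
    """Keep only values supported by at least one document the user can read."""
    view = {}
    for attribute, values in record.attributes.items():
        allowed = sorted(v for v, docs in values.items() if docs & readable_docs)
        if allowed:
            view[attribute] = allowed
    return view


# Made-up sample data for illustration only.
mentions = [
    ("doc1", "member", "alice"),
    ("doc2", "member", "bob"),
    ("doc2", "related_document", "spec.docx"),
]
record = build_record("Project Falcon", mentions)
print(visible_attributes(record, {"doc1"}))  # {'member': ['alice']}
```

A user who can read only `doc1` sees one member; a user with no readable supporting documents would see an empty page under this sketch.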

First claim

Opening claim text (preview).

What is claimed is:

1. A computer system comprising: a memory storing computer-executable instructions; a processor configured to execute the instructions to: perform, by an enterprise named entity recognition (ENER) model, a mining of a set of enterprise source documents within an enterprise intranet to determine a plurality of entity names, wherein the ENER model is trained in a multi-stage training process with public data and non-public enterprise data, the multi-stage training process comprising: in a first stage, training the ENER model using the public data, the public data comprising named entity recognition training data converted from an online reference encyclopedia using a structured knowledge graph; in a second stage, tuning the ENER model using collected data from enterprise documents from the non-public enterprise data and NER training corpora from academic research; and filtering entities that have a number of disconnected instances of potential entity attributes indicative of duplication that exceeds a threshold; generate an entity record within a knowledge graph for a mined entity name from the entity names based on an entity schema and ones of the set of enterprise source documents associated with the mined entity name, the entity record including attributes aggregated from the ones of the set of enterprise source documents associated with the mined entity name, wherein the entity record is a project entity record and includes metadata defining supporting enterprise source documents for each of the attributes of the entity record and the mining of the set of enterprise source documents comprises: comparing the set of enterprise source documents to a set of templates defining potential entity attributes to identify instances within the set of enterprise source documents; partitioning the instances by potential entity names into a plurality of partitions; and clustering the instances within each partition to identify the mined entity name for each partition; filter common words from the instances; filter the plurality of entity names to remove at least one mined entity name where all of the clustered instances for the mined entity name are derived from templates that do not define a project name according to the entity schema; and display an entity page including at least a portion of the attributes of the entity record based on permissions to view the ones of the set of enterprise source documents associated with the mined entity name.

2. The computer system of claim 1, wherein the public data is Wikipedia data.

3. The computer system of claim 1, wherein: the entity record includes metadata defining supporting enterprise source documents for each of the attributes of the entity record; and the processor is configured to display respective ones of the portion of the attributes included in the entity page in response to determining that a user has permission to access at least one of the enterprise source documents that supports the respective ones of the portion of the attributes.

4. The computer system of claim 1, wherein the processor is configured to: receive a curation action on the entity record from a first user associated with the entity record via the mining; update the entity record based on the curation action.

5. The computer system of claim 1 wherein the entity record is a project entity record and the entity schema defines an identifier, a name, one or more members, one or more related groups or sites, and one or more related documents.

6. The computer system of claim 5, wherein the entity schema further defines one or more managers, one or more related emails, or one or more related meetings.

7. The computer system of claim 1, wherein the processor is further configured to: identify a reference to the entity record within an enterprise document accessed by a user; and wherein to display the portion of the entity page further comprises to display an entity card including a portion of the entity page within an application used to access the enterprise document.

8. A method of managing an entity record within a knowledge graph, comprising performing, by an enterprise named entity recognition (ENER) model, a mining of a set of enterprise source documents within an enterprise intranet to determine a plurality of entity names, wherein the ENER model is trained in a multi-stage training process with public data and non-public enterprise data, the multi-stage training process comprising: in a first stage, training the ENER model using the public data, the public data comprising named entity recognition training data converted from an online reference encyclopedia using a structured knowledge graph; in a second stage, tuning the ENER model using collected data from enterprise documents from the non-public enterprise data and NER training corpora from academic research; and filtering entities that have a number of disconnected instances of potential entity attributes indicative of duplication that exceeds a threshold; generating an entity record within a knowledge graph for a mined entity name from the entity names based on an entity schema and ones of the set of enterprise source documents associated with the mined entity name, the entity record including attributes aggregated from the ones of the set of enterprise source documents associated with the mined entity name, wherein the entity record is a project entity record and includes metadata defining supporting enterprise source documents for each of the attributes of the entity record and the mining of the set of enterprise source documents comprises: comparing the set of enterprise source documents to a set of templates defining potential entity attributes to identify instances within the set of enterprise source documents; partitioning the instances by potential entity names into a plurality of partitions; and clustering the instances within each partition to identify the mined entity name for each partition; filter common words from the instances; filter the plurality of entity names to remove at least one mined entity name where all of the clustered instances for the mined entity name are derived from templates that do not define a project name according to the entity schema; and displaying an entity page including at least a portion of the attributes of the entity record based on permissions to view the ones of the set of enterprise source documents associated with the mined entity name.

9. The method of claim 8, wherein the entity record includes metadata defining supporting enterprise source documents for each of the attributes of the entity record, and wherein displaying the entity page comprises displaying respective ones of the portion of the attributes included in the entity page in response to determining that a user has permission to access at least one of the supporting enterprise source documents that supports the respective ones of the portion of the attributes.

10. The method of claim 8, wherein the public data is Wikipedia data.

11. The method of claim 8, further comprising identifying a reference to the entity record within an enterprise document accessed by a user; and wherein displaying the portion of the entity page comprises displaying an entity card including a portion of the entity page within an application used to access the enterprise document.

12. A non-transitory computer-readable medium storing computer-executable instructions that when executed by a computer processor cause the computer processor to: performing, by an enterprise named entity recognition (ENER) model, a mining of a set of enterprise source documents within an
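The mining steps recited in claims 1 and 8 (template comparison, partitioning by potential entity name, clustering within each partition, common-word filtering, and removal of names supported only by templates that do not define a project name) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the claimed system: the regex templates, the stop list, the sample documents, and the function names are hypothetical. The "clustering" here is a simple stand-in that picks the most frequent surface form in each partition, and common words are filtered before partitioning for convenience. The ENER model and its multi-stage training are not modeled.

```python
import re
from collections import Counter, defaultdict

# Hypothetical templates: (template id, regex, whether it defines a project name).
TEMPLATES = [
    ("project_name", re.compile(r"[Pp]roject (\w+)"), True),
    ("team_of", re.compile(r"(\w+) team"), False),
]
COMMON_WORDS = {"the", "a", "this", "our", "new"}  # assumed stop list


def find_instances(docs):
    """Compare each document to every template and collect the matches."""
    instances = []
    for doc_id, text in docs.items():
        for template_id, pattern, defines_name in TEMPLATES:
            for match in pattern.finditer(text):
                instances.append({
                    "doc": doc_id,
                    "template": template_id,
                    "defines_name": defines_name,
                    "candidate": match.group(1),
                })
    return instances


def mine_entity_names(docs):
    # Drop instances whose candidate name is a common word.
    instances = [i for i in find_instances(docs)
                 if i["candidate"].lower() not in COMMON_WORDS]
    # Partition the instances by a normalized potential entity name.
    partitions = defaultdict(list)
    for inst in instances:
        partitions[inst["candidate"].lower()].append(inst)
    mined = []
    for group in partitions.values():
        # Remove names whose instances all come from non-name templates.
        if not any(inst["defines_name"] for inst in group):
            continue
        # Stand-in for clustering: choose the most frequent surface form.
        surface_forms = Counter(inst["candidate"] for inst in group)
        mined.append(surface_forms.most_common(1)[0][0])
    return sorted(mined)


# Made-up sample documents for illustration only.
docs = {
    "d1": "Project Falcon kickoff notes. The Falcon team meets weekly.",
    "d2": "Status for project falcon and the platform team.",
    "d3": "Our team shipped the release.",
}
print(mine_entity_names(docs))  # ['Falcon']
```

In this sample, "platform" is dropped because only the non-name template matched it, and "Our" is dropped by the common-word filter. A real system would presumably use a proper clustering method and learned templates rather than two regular expressions.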

Assignees

Microsoft Technology Licensing LLC

Inventors

Classifications

  • G06F16/313 (Primary)

    Selection or weighting of terms for indexing

  • Parsing

  • Knowledge representation; Symbolic representation

  • Clustering; Classification

  • G06F40/295 (Primary)

    Named entity recognition

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12086546B2 cover?
Examples described herein generally relate to a computer system including a knowledge graph storing a plurality of entities. A mining of a set of enterprise source documents within an enterprise intranet is performed, by an enterprise named entity recognition (ENER) model, to determine a plurality of entity names. An entity record is generated within a knowledge graph for a mined entity name fr…
Who is the assignee on this patent?
Microsoft Technology Licensing LLC
What technology area does this patent fall under?
Primary CPC classification G06F16/313. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 10, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).