Enterprise knowledge graphs using user-based mining
US-2022019908-A1 · Jan 20, 2022 · US
US12086546B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12086546-B2 |
| Application number | US-202016933907-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 20, 2020 |
| Priority date | Jul 20, 2020 |
| Publication date | Sep 10, 2024 |
| Grant date | Sep 10, 2024 |
Official abstract text for this publication.
Examples described herein generally relate to a computer system including a knowledge graph storing a plurality of entities. A mining of a set of enterprise source documents within an enterprise intranet is performed, by an enterprise named entity recognition (ENER) model, to determine a plurality of entity names. An entity record is generated within a knowledge graph for a mined entity name from the linked entity names based on an entity schema and ones of the set of enterprise source documents associated with the mined entity name. The entity record includes attributes aggregated from the ones of the set of enterprise source documents associated with the mined entity name.
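As a rough illustration of the aggregation the abstract describes, the following minimal Python sketch builds one entity record from the documents associated with a mined name. The `SourceDoc` and `EntityRecord` types, the attribute keys, and the sample data are hypothetical and do not come from the patent; the sketch only shows that each aggregated attribute value can keep a pointer to the documents that support it.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SourceDoc:
    """Hypothetical stand-in for an enterprise source document."""
    doc_id: str
    entity_name: str   # mined entity name this document was associated with
    attributes: tuple  # (attribute_name, value) pairs extracted from the document


@dataclass
class EntityRecord:
    name: str
    # attribute name -> value -> set of supporting document ids
    attributes: dict = field(
        default_factory=lambda: defaultdict(lambda: defaultdict(set))
    )


def build_record(name, docs):
    """Aggregate attributes from every document associated with `name`."""
    record = EntityRecord(name=name)
    for doc in docs:
        if doc.entity_name != name:
            continue  # document belongs to a different mined entity
        for attr, value in doc.attributes:
            record.attributes[attr][value].add(doc.doc_id)
    return record


# Assumed sample data, for illustration only.
docs = [
    SourceDoc("d1", "Project Falcon", (("member", "alice"), ("site", "falcon-wiki"))),
    SourceDoc("d2", "Project Falcon", (("member", "alice"), ("member", "bob"))),
    SourceDoc("d3", "Project Osprey", (("member", "carol"),)),
]
record = build_record("Project Falcon", docs)
```

Keeping the supporting document ids next to each value is one simple way to carry the per-attribute metadata that the claims later rely on for permission checks.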
Opening claim text (preview).
What is claimed is:
1. A computer system comprising: a memory storing computer-executable instructions; a processor configured to execute the instructions to: perform, by an enterprise named entity recognition (ENER) model, a mining of a set of enterprise source documents within an enterprise intranet to determine a plurality of entity names, wherein the ENER model is trained in a multi-stage training process with public data and non-public enterprise data, the multi-stage training process comprising: in a first stage, training the ENER model using the public data, the public data comprising named entity recognition training data converted from an online reference encyclopedia using a structured knowledge graph; in a second stage, tuning the ENER model using collected data from enterprise documents from the non-public enterprise data and NER training corpora from academic research; and filtering entities that have a number of disconnected instances of potential entity attributes indicative of duplication that exceeds a threshold; generate an entity record within a knowledge graph for a mined entity name from the entity names based on an entity schema and ones of the set of enterprise source documents associated with the mined entity name, the entity record including attributes aggregated from the ones of the set of enterprise source documents associated with the mined entity name, wherein the entity record is a project entity record and includes metadata defining supporting enterprise source documents for each of the attributes of the entity record and the mining of the set of enterprise source documents comprises: comparing the set of enterprise source documents to a set of templates defining potential entity attributes to identify instances within the set of enterprise source documents; partitioning the instances by potential entity names into a plurality of partitions; and clustering the instances within each partition to identify the mined entity name for each partition; filter common words from the instances; filter the plurality of entity names to remove at least one mined entity name where all of the clustered instances for the mined entity name are derived from templates that do not define a project name according to the entity schema; and display an entity page including at least a portion of the attributes of the entity record based on permissions to view the ones of the set of enterprise source documents associated with the mined entity name.
2. The computer system of claim 1, wherein the public data is Wikipedia data.
3. The computer system of claim 1, wherein: the entity record includes metadata defining supporting enterprise source documents for each of the attributes of the entity record; and the processor is configured to display respective ones of the portion of the attributes included in the entity page in response to determining that a user has permission to access at least one of the enterprise source documents that supports the respective ones of the portion of the attributes.
4. The computer system of claim 1, wherein the processor is configured to: receive a curation action on the entity record from a first user associated with the entity record via the mining; update the entity record based on the curation action.
5. The computer system of claim 1 wherein the entity record is a project entity record and the entity schema defines an identifier, a name, one or more members, one or more related groups or sites, and one or more related documents.
6. The computer system of claim 5, wherein the entity schema further defines one or more managers, one or more related emails, or one or more related meetings.
7. The computer system of claim 1, wherein the processor is further configured to: identify a reference to the entity record within an enterprise document accessed by a user; and wherein to display the portion of the entity page further comprises to display an entity card including a portion of the entity page within an application used to access the enterprise document.
8. A method of managing an entity record within a knowledge graph, comprising performing, by an enterprise named entity recognition (ENER) model, a mining of a set of enterprise source documents within an enterprise intranet to determine a plurality of entity names, wherein the ENER model is trained in a multi-stage training process with public data and non-public enterprise data, the multi-stage training process comprising: in a first stage, training the ENER model using the public data, the public data comprising named entity recognition training data converted from an online reference encyclopedia using a structured knowledge graph; in a second stage, tuning the ENER model using collected data from enterprise documents from the non-public enterprise data and NER training corpora from academic research; and filtering entities that have a number of disconnected instances of potential entity attributes indicative of duplication that exceeds a threshold; generating an entity record within a knowledge graph for a mined entity name from the entity names based on an entity schema and ones of the set of enterprise source documents associated with the mined entity name, the entity record including attributes aggregated from the ones of the set of enterprise source documents associated with the mined entity name, wherein the entity record is a project entity record and includes metadata defining supporting enterprise source documents for each of the attributes of the entity record and the mining of the set of enterprise source documents comprises: comparing the set of enterprise source documents to a set of templates defining potential entity attributes to identify instances within the set of enterprise source documents; partitioning the instances by potential entity names into a plurality of partitions; and clustering the instances within each partition to identify the mined entity name for each partition; filter common words from the instances; filter the plurality of entity names to remove at least one mined entity name where all of the clustered instances for the mined entity name are derived from templates that do not define a project name according to the entity schema; and displaying an entity page including at least a portion of the attributes of the entity record based on permissions to view the ones of the set of enterprise source documents associated with the mined entity name.
9. The method of claim 8, wherein the entity record includes metadata defining supporting enterprise source documents for each of the attributes of the entity record, and wherein displaying the entity page comprises displaying respective ones of the portion of the attributes included in the entity page in response to determining that a user has permission to access at least one of the supporting enterprise source documents that supports the respective ones of the portion of the attributes.
10. The method of claim 8, wherein the public data is Wikipedia data.
11. The method of claim 8, further comprising identifying a reference to the entity record within an enterprise document accessed by a user; and wherein displaying the portion of the entity page comprises displaying an entity card including a portion of the entity page within an application used to access the enterprise document.
12. A non-transitory computer-readable medium storing computer-executable instructions that when executed by a computer processor cause the computer processor to: performing, by an enterprise named entity recognition (ENER) model, a mining of a set of enterprise source documents within an
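The mining steps recited in claim 1 (comparing documents to templates, partitioning instances by potential entity name, clustering within each partition, and filtering) and the permission-based display can be sketched in a few lines of Python. This is an illustrative toy, not the patented implementation: the regular-expression templates, the `COMMON_WORDS` list, the case-insensitive grouping used as a stand-in for clustering, and all function names are assumptions made for the example.

```python
import re
from collections import Counter, defaultdict

# Hypothetical templates: (pattern, whether the template defines a project name).
TEMPLATES = [
    (re.compile(r"[Pp]roject ([A-Z]\w+)"), True),
    (re.compile(r"([A-Z]\w+) team meeting"), False),
]
COMMON_WORDS = {"The", "Weekly", "Status"}  # assumed stop list


def find_instances(docs):
    """Compare each document to the templates and yield matched instances."""
    for doc_id, text in docs.items():
        for pattern, defines_name in TEMPLATES:
            for match in pattern.finditer(text):
                yield {"name": match.group(1), "doc": doc_id,
                       "defines_name": defines_name}


def mine(docs):
    """Partition instances by potential name, pick a name, and filter."""
    partitions = defaultdict(list)
    for inst in find_instances(docs):
        if inst["name"] in COMMON_WORDS:  # filter common words
            continue
        partitions[inst["name"].lower()].append(inst)  # partition key

    mined = {}
    for instances in partitions.values():
        # Toy "clustering": the most frequent surface form names the partition.
        name = Counter(i["name"] for i in instances).most_common(1)[0][0]
        # Drop names supported only by templates that do not define a project name.
        if not any(i["defines_name"] for i in instances):
            continue
        mined[name] = sorted({i["doc"] for i in instances})
    return mined


def visible_entities(mined, readable_docs):
    """Keep an entity only if the user may view a supporting document."""
    return {name: [d for d in doc_ids if d in readable_docs]
            for name, doc_ids in mined.items()
            if set(doc_ids) & readable_docs}


# Assumed sample documents, for illustration only.
docs = {
    "d1": "Kickoff notes for Project Falcon.",
    "d2": "Falcon team meeting moved; project Falcon budget attached.",
    "d3": "Weekly team meeting and Osprey team meeting agenda.",
}
mined = mine(docs)
```

In this toy data, "Osprey" is matched only by the meeting template, which does not define a project name, so it is filtered out; "Weekly" is dropped as a common word. Calling `visible_entities(mined, {"d2"})` then returns only the supporting documents that the hypothetical user is allowed to read.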
Selection or weighting of terms for indexing · CPC title
Parsing · CPC title
Knowledge representation; Symbolic representation · CPC title
Clustering; Classification · CPC title
Named entity recognition · CPC title