US11544312B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11544312-B2 |
| Application number | US-202016792456-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 17, 2020 |
| Priority date | Feb 17, 2020 |
| Publication date | Jan 3, 2023 |
| Grant date | Jan 3, 2023 |
Official abstract text for this publication.
A mechanism is provided in a data processing system to implement a cognitive natural language processing (NLP) system with descriptor uniqueness identification to support named entity mention clustering. The mechanism annotates a set of documents from a corpus of documents for entity types and mentions, collects descriptor usages from all documents in the corpus of documents, analyzes the descriptor usages to classify the descriptors as base terms or modifier terms, generates compatibility scores for the descriptors, and performs entity merging of entity clusters based on the compatibility scores.
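The last step of the abstract, merging entity clusters based on compatibility scores, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names `cluster_score` and `merge_clusters`, the toy score table, the summed aggregation, and the greedy strategy are hypothetical and are not taken from the patent, whose claims mention classification, distance-based, and Markov Chain Monte Carlo based models instead.

```python
from itertools import combinations

# Hypothetical signed scores for descriptor pairs:
# positive = compatible, negative = incompatible.
TOY_SCORES = {
    frozenset({"CEO of Acme", "founder of Acme"}): 1.5,
    frozenset({"CEO of Acme", "CEO of Globex"}): -2.0,
    frozenset({"founder of Acme", "CEO of Globex"}): -0.5,
}

def toy_pair_score(x, y):
    """Look up a pair score; unknown pairs are treated as neutral (0.0)."""
    return TOY_SCORES.get(frozenset({x, y}), 0.0)

def cluster_score(a, b, pair_score):
    """Sum pairwise descriptor scores across two clusters (assumed aggregation)."""
    return sum(pair_score(x, y) for x in a for y in b)

def merge_clusters(clusters, pair_score, threshold=0.0):
    """Greedily merge the best-scoring pair until no pair scores above threshold."""
    clusters = [set(c) for c in clusters]
    while len(clusters) > 1:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_score(clusters[ij[0]], clusters[ij[1]], pair_score),
        )
        if cluster_score(clusters[i], clusters[j], pair_score) <= threshold:
            break  # every remaining pair is neutral or incompatible
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters

result = merge_clusters(
    [{"CEO of Acme"}, {"founder of Acme"}, {"CEO of Globex"}],
    toy_pair_score,
)
```

With these toy scores the two Acme descriptors end up in one cluster, while the Globex descriptor stays separate because its summed score against the merged cluster is negative.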
Opening claim text (preview).
What is claimed is:
1. A method, in a data processing system comprising a processor and a memory, the memory comprising instructions that are executed by the processor to specifically configure the processor to implement a cognitive question answering (QA) system with descriptor uniqueness identification to support named entity mention clustering, the method comprising: receiving, by an entity uniqueness identification and entity clustering engine executing within the cognitive QA system, an open-domain corpus of text documents and a domain-specific corpus of text documents; annotating, by an entity tagger within the entity uniqueness and entity clustering engine, a set of documents from the open-domain corpus of text documents and the domain-specific corpus of text documents for entity types and mentions; collecting, by an entity uniqueness identification component within the entity uniqueness and entity clustering engine, descriptor usages of descriptors from all documents in the open-domain corpus of text documents; analyzing, by the entity uniqueness identification component, the descriptor usages to classify the descriptors as base terms or modifier terms; building, by the entity uniqueness identification component, a frequency count of descriptor co-occurrences; generating, by the entity uniqueness identification component, specificity markers for the descriptors, wherein each specificity marker specifies whether the uniqueness of the corresponding descriptor is definite or indefinite; generating, by the entity uniqueness identification component, compatibility scores for combinations of the descriptors based on the frequency counts of descriptor co-occurrences and the specificity markers for the descriptors, wherein the compatibility scores comprise real-valued scores such that a negative score indicates incompatibility and a positive score indicates compatibility, with larger magnitude scores indicating strength of determination or confidence; performing, by an entity clustering component within the entity uniqueness and entity clustering engine, entity merging of entity clusters based on the compatibility scores; and generating, by the cognitive QA system, a set of candidate answers from passages within the domain-specific corpus of text documents for an input question based on results of the entity merging of entity clusters.
2. The method of claim 1, further comprising removing context dependent descriptor terms.
3. The method of claim 1, wherein annotating the set of documents comprises replacing each text-level mention with its corresponding entity type.
4. The method of claim 1, wherein each description is in a grammatical construction selected from the group consisting of copula, pre-nominal, sentence-initial adverbial, and appositive.
5. The method of claim 1, wherein generating compatibility scores for the descriptors comprises using a rule-based scorer with a taxonomy and synonym resource.
6. The method of claim 1, wherein generating compatibility scores for the descriptors comprises using a trained statistical scorer.
7. The method of claim 1, wherein performing entity merging comprises using a classification model, a distance-based model, or a Markov Chain Monte Carlo based inference model.
8. A computer program product comprising a computer readable storage medium having a computer readable program stored therein, wherein the computer readable program comprises instructions, which when executed on a processor of a computing device causes the computing device to implement a cognitive question answering (QA) system with descriptor uniqueness identification to support named entity mention clustering, wherein the computer readable program causes the computing device to: receive, by an entity uniqueness identification and entity clustering engine executing within the cognitive QA system, an open-domain corpus of text documents and a domain-specific corpus of text documents; annotate, by an entity tagger within the entity uniqueness and entity clustering engine, a set of documents from the open-domain corpus of text documents and the domain-specific corpus of text documents for entity types and mentions; collect, by an entity uniqueness identification component within the entity uniqueness and entity clustering engine, descriptor usages of descriptors from all documents in the open-domain corpus of text documents; analyze, by the entity uniqueness identification component, the descriptor usages to classify the descriptors as base terms or modifier terms; build, by the entity uniqueness identification component, a frequency count of descriptor co-occurrences; generate, by the entity uniqueness identification component, specificity markers for the descriptors, wherein each specificity marker specifies whether the uniqueness of the corresponding descriptor is definite or indefinite; generate, by the entity uniqueness identification component, compatibility scores for combinations of the descriptors based on the frequency counts of descriptor co-occurrences and the specificity markers for the descriptors, wherein the compatibility scores comprise real-valued scores such that a negative score indicates incompatibility and a positive score indicates compatibility, with larger magnitude scores indicating strength of determination or confidence; perform, by an entity clustering component within the entity uniqueness and entity clustering engine, entity merging of entity clusters based on the compatibility scores; and generate, by the cognitive QA system, a set of candidate answers from passages within the domain-specific corpus of text documents for an input question based on results of the entity merging of entity clusters.
9. The computer program product of claim 8, wherein the computer readable program causes the computing device to remove context dependent descriptor terms.
10. The computer program product of claim 8, wherein annotating the set of documents comprises replacing each text-level mention with its corresponding entity type.
11. The computer program product of claim 8, wherein each description is in a grammatical construction selected from the group consisting of copula, pre-nominal, sentence-initial adverbial, and appositive.
12. The computer program product of claim 8, wherein generating compatibility scores for the descriptors comprises using a rule-based scorer with a taxonomy and synonym resource.
13. The computer program product of claim 8, wherein generating compatibility scores for the descriptors comprises using a trained statistical scorer.
14. A computing device comprising: a processor; and a memory coupled to the processor, wherein the memory comprises instructions, which when executed on a processor of a computing device causes the computing device to implement a cognitive question answering (QA) system with descriptor uniqueness identification to support named entity mention clustering, wherein the instructions cause the processor to: receive, by an entity uniqueness identification and entity clustering engine executing within the cognitive QA system, an open-domain corpus of text documents and a domain-specific corpus of text documents; annotate, by an entity tagger within the entity uniqueness and entity clustering engine, a set of documents from the open-domain corpus of text documents and the domain-specific corpus of text documents for entity types and mentions; collect, by an entity uniqueness identification component within the entity uniqueness and entity clustering engine, descriptor usages of descriptors from all documents in the open-domain corpus of text document
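The claims describe compatibility scores as real-valued, with sign indicating compatibility and magnitude indicating confidence, derived from descriptor co-occurrence counts and specificity markers. The claims do not give a formula, so the sketch below is only one possible reading: it uses a smoothed log-ratio of observed to expected co-occurrence (a PMI-style measure, swapped in here as a stand-in) and doubles the magnitude when both descriptors carry a definite marker. The function name `compatibility`, the smoothing constant, the doubling factor, and the toy counts are all hypothetical.

```python
import math
from collections import Counter

def compatibility(d1, d2, pair_counts, single_counts, total, definite, smoothing=0.5):
    """Signed score: > 0 suggests compatible descriptors, < 0 incompatible.

    Assumed formula (not from the patent): smoothed log-ratio of observed
    co-occurrence to the count expected if the descriptors were independent.
    """
    joint = pair_counts[frozenset((d1, d2))] + smoothing
    expected = single_counts[d1] * single_counts[d2] / total + smoothing
    score = math.log(joint / expected)
    # Assumption: two definite (unique) descriptors give a more confident verdict.
    if d1 in definite and d2 in definite:
        score *= 2.0
    return score

# Toy counts over 100 hypothetical entities.
single_counts = Counter({"ceo": 40, "founder": 30, "intern": 30})
pair_counts = Counter({frozenset(("ceo", "founder")): 25})
total = 100

often_together = compatibility("ceo", "founder", pair_counts, single_counts, total, definite=set())
never_together = compatibility("ceo", "intern", pair_counts, single_counts, total, definite=set())
both_definite = compatibility("ceo", "founder", pair_counts, single_counts, total,
                              definite={"ceo", "founder"})
```

Here `often_together` comes out positive because the pair co-occurs more than independence would predict, `never_together` comes out negative, and `both_definite` has the same sign as `often_together` with twice the magnitude.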
Natural language query formulation · CPC title
Annotation, e.g. comment data or footnotes · CPC title
Creation or modification of classes or clusters · CPC title
using statistics or function optimisation, e.g. modelling of probability density functions · CPC title
Named entity recognition · CPC title