Identifying ambiguity in semantic resources

US11379669B2 · US · B2

Patent metadata
Publication number: US-11379669-B2
Application number: US-201916524818-A
Country: US
Kind code: B2
Filing date: Jul 29, 2019
Priority date: Jul 29, 2019
Publication date: Jul 5, 2022
Grant date: Jul 5, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments relate to a system, program product, and method for dictionary membership management directed at identifying ambiguity in semantic resources. A dictionary of seed terms is applied to a text corpus and matching items in the corpus are identified. The linguistic properties for each matching item are characterized and a context pattern of each matching item is constructed. Each context pattern is applied to the dictionary and matching content between the seed terms and the context pattern is identified and quantified. Lexicon items from the dictionary that have anomalous behavior reflected in the quantification are identified. One or more seed words identified as having anomalous behavior are selectively removed from the dictionary.
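The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names (`build_patterns`, `score_seeds`, `prune`), the one-word context window, and the rule of leaving the dictionary untouched when no term stands out are all assumptions made for illustration. A context pattern is assumed here to be the pair of words to the left and right of a matched seed term, and a seed term's score is the number of distinct patterns it fills.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def context_of(tokens, i, window):
    """Return the (left, right) words around position i as a hashable pattern."""
    left = tuple(tokens[max(0, i - window):i])
    right = tuple(tokens[i + 1:i + 1 + window])
    return (left, right)


def build_patterns(corpus, seeds, window=1):
    """Collect one context pattern per seed-term match in the corpus.

    `seeds` is assumed to be a set of lowercase single-word terms.
    """
    patterns = set()
    for sentence in corpus:
        tokens = tokenize(sentence)
        for i, tok in enumerate(tokens):
            if tok in seeds:
                patterns.add(context_of(tokens, i, window))
    return patterns


def score_seeds(corpus, seeds, patterns, window=1):
    """Score each seed term by the number of distinct patterns matching it."""
    matched = {term: set() for term in seeds}
    for sentence in corpus:
        tokens = tokenize(sentence)
        for i, tok in enumerate(tokens):
            ctx = context_of(tokens, i, window)
            if tok in seeds and ctx in patterns:
                matched[tok].add(ctx)
    return {term: len(ctxs) for term, ctxs in matched.items()}


def prune(seeds, scores):
    """Remove the seed term(s) with the highest score.

    Assumption: if every term has the same score, nothing is anomalous
    and the dictionary is returned unchanged.
    """
    if not scores or max(scores.values()) == min(scores.values()):
        return set(seeds)
    top = max(scores.values())
    return {s for s in seeds if scores[s] < top}
```

With a hypothetical fruit dictionary `{"apple", "banana", "orange"}` and a corpus in which "orange" also appears as a color, "orange" fills more distinct contexts than the other terms and is the one removed. A production system would likely use richer linguistic features than adjacent words and a tuned threshold rather than a strict maximum.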

First claim

Opening claim text (preview).

What is claimed is:

1. A computer system comprising: a processing unit operatively coupled to memory; and an artificial intelligence (AI) platform in communication with the processing unit, the AI platform configured to manage dictionary membership, including: a dictionary manager configured to apply natural language processing (NLP) to a textual corpus, including configured to apply seed terms of a dictionary to the textual corpus and identify one or more matching items in the textual corpus; a context manager configured to characterize linguistic properties of each matching item in the textual corpus, including the context manager configured to construct a context pattern from the textual corpus for each of the identified one or more matching items; a director operatively coupled to the context manager and the dictionary manager and configured to: apply each constructed context pattern to the dictionary; identify matching content between the seed terms of the dictionary and content of the context pattern; and quantify the identified matching content, including to assess scores for the seed terms, the score for the seed term representative of a quantity of the context patterns matching the seed term; and the dictionary manager further configured to: identify one or more anomalous seed terms from the seed terms of the dictionary, the one or more anomalous seed terms having ambiguous behavior as reflected by the quantification; and selectively remove the identified one or more anomalous seed terms from the dictionary, including to selectively remove at least the anomalous seed term associated with the score corresponding to the highest quantity of matching context patterns.

2. The system of claim 1, wherein the score characterizes a quantity of the identified matching content between the constructed context pattern and the seed terms.

3. The system of claim 2, wherein the score calculation comprises the director configured to review a set of terms produced in the constructed pattern, and quantify a match of each term in the set of terms with seed terms in the dictionary.

4. The system of claim 2, further comprising the dictionary manager configured to attach the score to the corresponding seed term as metadata, wherein the metadata represents a degree of ambiguity and/or spuriousness of the seed term with respect to the textual corpus.

5. The system of claim 1, wherein application of NLP to the textual corpus further comprises the dictionary manager configured to randomly select portions of the textual corpus subject to review.

6. The system of claim 1, wherein the selective removal of the one or more anomalous seed terms from the dictionary eliminates anomalous items from the dictionary.

7. The system of claim 1, wherein the processing unit is configured to receive an electronic search request containing a plurality of keywords, identify a keyword of the plurality of keywords that matches the one or more anomalous seed terms, and carry out an electronic search that excludes the keyword matched to the one or more anomalous seed terms.

8. A computer program product for dictionary membership management, the computer program product comprising: a computer readable storage medium; and program code embodied therewith with the computer readable storage medium, the program code executable by a processor to: apply natural language processing (NLP) to a textual corpus, including apply seed terms of a dictionary to the textual corpus and identify one or more matching items in the textual corpus; characterize linguistic properties of each matching item in the textual corpus, including construct a context pattern from the textual corpus for each of the identified one or more matching items; apply each constructed context pattern to the dictionary; identify matching content between the seed terms of the dictionary and content of the context pattern; quantify the identified matching content, including assess scores for the seed terms, the score for the seed term representative of a quantity of the context patterns matching the seed term; identify one or more anomalous terms from the seed terms of the dictionary, the one or more anomalous seed terms having ambiguous behavior as reflected in the quantification; and selectively remove the identified one or more anomalous seed terms from the dictionary, including to selectively remove at least the anomalous seed term associated with the score corresponding to the highest quantity of matching context patterns.

9. The computer program product of claim 8, wherein the score characterizes a quantity of the identified matching content between the constructed context pattern and the seed terms.

10. The computer program product of claim 9, wherein the score calculation comprises program code executable by the processor to review a set of terms produced in the constructed pattern, and quantify a match of each term in the set of terms with seed terms in the dictionary.

11. The computer program product of claim 9, further comprising program code executable by the processor to attach the score to the corresponding seed term as metadata, wherein the metadata represents a degree of ambiguity and/or spuriousness of the seed term with respect to the textual corpus.

12. The computer program product of claim 8, wherein the program code executable by the processor to selectively remove one or more seed terms from the dictionary comprises program code executable by the processor to eliminate anomalous items from the dictionary.

13. The computer program product of claim 8, further comprising the program code executable by the processor to: receive an electronic search request containing a plurality of keywords; identify a keyword of the plurality of keywords that matches the one or more anomalous seed terms; and carry out an electronic search that excludes the keyword matched to the one or more anomalous seed terms.

14. A method comprising: applying natural language processing (NLP) to a textual corpus, including applying seed terms of a dictionary to the textual corpus and identifying one or more matching items in the textual corpus; characterizing linguistic properties for each matching item in the textual corpus, including constructing a context pattern from the textual corpus for each of the identified one or more matching items; applying each constructed context pattern to the dictionary; identifying matching content between the seed terms of the dictionary and content of the constructed context pattern corresponding to the seed terms; quantifying the identified matching content, including assessing scores for the seed terms, the score for the seed term representative of a quantity of the context patterns matching the seed term; identifying one or more anomalous terms from the seed terms of the dictionary, the one or more anomalous terms having ambiguous behavior as reflected in the quantification; and selectively removing the identified one or more anomalous seed terms from the dictionary, including selectively removing at least the anomalous seed term associated with the score corresponding to the highest quantity of matching context patterns.

15. The method of claim 14, wherein the score characterizes a quantity of the identified matching content between the constructed context pattern and the seed terms.

16. The method of claim 15, wherein calculating the score comprises reviewing a set of terms produced in the constructed pattern and quantifying a match of each term in the set of terms with seed terms in the dictionary.
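Several dependent claims describe supporting behavior: attaching the score to a seed term as metadata (claims 4 and 11), randomly selecting portions of the corpus for review (claim 5), and excluding keywords that match anomalous seed terms from a search request (claims 7 and 13). The Python sketch below illustrates one way these could look. The `SeedEntry` class, the `ambiguity_score` metadata key, and all function names are hypothetical and do not come from the patent.

```python
import random
from dataclasses import dataclass, field


@dataclass
class SeedEntry:
    """A dictionary entry that carries free-form metadata (hypothetical shape)."""
    term: str
    metadata: dict = field(default_factory=dict)


def attach_scores(entries, scores):
    """Store each term's score as metadata; terms without a score get 0."""
    for entry in entries:
        entry.metadata["ambiguity_score"] = scores.get(entry.term, 0)
    return entries


def sample_corpus(corpus, k, seed=None):
    """Randomly pick up to k portions of the corpus for review.

    `corpus` is assumed to be a list of text portions; `seed` makes the
    selection reproducible.
    """
    rng = random.Random(seed)
    return rng.sample(corpus, min(k, len(corpus)))


def filter_query(keywords, anomalous_terms):
    """Drop search keywords that match an anomalous seed term (case-insensitive)."""
    blocked = {term.lower() for term in anomalous_terms}
    return [kw for kw in keywords if kw.lower() not in blocked]
```

For example, `filter_query(["Orange", "juice", "recipe"], {"orange"})` keeps only `"juice"` and `"recipe"`, so the search runs without the keyword flagged as ambiguous.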

Assignees

IBM

Inventors

Classifications

  • Dictionaries · CPC title

  • G06F40/30 · Primary

    Semantic analysis · CPC title

  • G06F40/20 · Primary

    Natural language analysis (semantic analysis of natural language G06F40/30) · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11379669B2 cover?
Embodiments relate to a system, program product, and method for dictionary membership management directed at identifying ambiguity in semantic resources. A dictionary of seed terms is applied to a text corpus and matching items in the corpus are identified. The linguistic properties for each matching item are characterized and a context pattern of each matching item is constructed. Each context…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F40/30. Mapped technology areas include Physics.
When was this patent published?
Publication date Jul 5, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).