Electronically based thesaurus querying documents while leveraging context sensitivity

US10073839B2 · US · B2

Patent metadata
Publication number: US-10073839-B2
Application number: US-201313930660-A
Country: US
Kind code: B2
Filing date: Jun 28, 2013
Priority date: Jun 28, 2013
Publication date: Sep 11, 2018
Grant date: Sep 11, 2018

Abstract

Official abstract text for this publication.

Arrangements described herein relate to language enhancement. Source text can be automatically gathered from a plurality of text sources, the plurality of text sources including at least one social media website, and storing the source text to a thesaurus data infrastructure. Subject text being exposed to thesaurus processing can be received, a context of the subject text can be identified, and the thesaurus data infrastructure can be accessed while the thesaurus queries previously acquired source texts or documents having similar context to identify source text having context similar to the context of the subject text. The identified source text can be analyzed to identify at least one candidate word or phrase contained in the source text to recommend as a replacement for at least one word or phrase contained in the subject text. The identified at least one candidate word or phrase can be recommended as the replacement for the at least one word or phrase contained in the subject text.
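The flow in the abstract (identify the context of the subject text, find stored source text with a similar context, then pull candidate replacements from it) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the patent: the function names, the stopword list, the bag-of-words cosine similarity used as the "context" measure, the 0.3 threshold, and the neighbor-matching heuristic that stands in for the analysis step.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "was", "is", "i", "that", "of", "to", "in"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def context_vector(text):
    """Represent context as a bag of non-stopword tokens (an assumption)."""
    return Counter(t for t in tokenize(text) if t not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def similar_sources(subject_text, source_texts, threshold=0.3):
    """Return stored source texts whose context resembles the subject text."""
    subject = context_vector(subject_text)
    scored = [(cosine(subject, context_vector(s)), s) for s in source_texts]
    return [s for score, s in sorted(scored, reverse=True) if score >= threshold]


def candidate_replacements(target, subject_text, sources):
    """Collect words that sit between the same neighbors as the target word."""
    target = target.lower()
    tokens = tokenize(subject_text)
    i = tokens.index(target)  # raises ValueError if the target is absent
    left = tokens[i - 1] if i > 0 else None
    right = tokens[i + 1] if i + 1 < len(tokens) else None
    found = []
    for source in sources:
        words = tokenize(source)
        for j, word in enumerate(words):
            prev_word = words[j - 1] if j > 0 else None
            next_word = words[j + 1] if j + 1 < len(words) else None
            if (prev_word, next_word) == (left, right) and word != target:
                if word not in found:
                    found.append(word)
    return found


subject = "The movie was really good and the cast was great."
stored = [
    "That movie was really awesome and I loved the cast.",
    "The stock market was really volatile today.",
]
matches = similar_sources(subject, stored)
suggestions = candidate_replacements("good", subject, matches)
```

In this toy run only the movie-related source passes the similarity threshold, so the suggestion for "good" comes from text written in a comparable context rather than from a fixed synonym list.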

First claim

Opening claim text (preview).

What is claimed is:
1. A method of language enhancement, the method comprising: automatically gathering source text from a plurality of text sources, wherein at least a portion of the source text is stored as natural language documents, the plurality of text sources including at least one social media website, and storing the source text to a thesaurus data infrastructure; receiving subject text being exposed to thesaurus processing; identifying a context of the subject text; identifying source text having context similar to the context of the subject text by accessing the thesaurus data infrastructure and processing the source text using dynamically created rules to identify the source text having context similar to the context of the subject text, the dynamically created rules generated by performing initial processing on the source text when the source text is gathered; analyzing, using a processor, the identified source text to identify at least one candidate word or phrase contained in the source text to recommend as a replacement for at least one word or phrase contained in the subject text by performing natural language inference processing on the source text stored as natural language documents; and recommending the identified at least one candidate word or phrase as the replacement for the at least one word or phrase contained in the subject text by presenting the at least one candidate word or phrase on a display; wherein the recommendations follow the thesaurus further querying previously acquired source texts or documents having similar context.
2. The method of claim 1, wherein the at least one candidate word or phrase contained in the source text comprises a plurality of candidate words or phrases, the method further comprising: assigning a ranking to each of the of the candidate words or phrases, each ranking assigned to a respective candidate word or phrase indicating a level of confidence that the respective candidate word or phrase is an appropriate replacement for the word or phrase contained in the subject text.
3. The method of claim 2, wherein recommending the identified at least one candidate word or phrase as the replacement for at least one word or phrase contained in the subject text comprises: presenting to a user each of the candidate words or phrases and the respective ranking assigned to each of the candidate words or phrases.
4. The method of claim 1, wherein the context of the subject text is identified by scanning at least a portion of a document containing the subject text.
5. The method of claim 1, wherein the context of the subject text is identified by scanning an entire document containing the subject text.
6. The method of claim 1, wherein automatically gathering source text from a plurality of text sources comprises: performing automated web crawling of social media websites to identify new source text to be stored in the thesaurus data infrastructure.
7. A system comprising: a processor programmed to initiate executable operations comprising: automatically gathering source text from a plurality of text sources, wherein at least a portion of the source text is stored as natural language documents, the plurality of text sources including at least one social media website, and storing the source text to a thesaurus data infrastructure; receiving subject text being exposed to thesaurus processing; identifying a context of the subject text; identifying source text having context similar to the context of the subject text by accessing the thesaurus data infrastructure and processing the source text using dynamically created rules to identify the source text having context similar to the context of the subject text, the dynamically created rules generated by performing initial processing on the source text when the source text is gathered; analyzing the identified source text to identify at least one candidate word or phrase contained in the source text to recommend as a replacement for at least one word or phrase contained in the subject text by performing natural language inference processing on the source text stored as natural language documents; and recommending the identified at least one candidate word or phrase as the replacement for the at least one word or phrase contained in the subject text by presenting the at least one candidate word or phrase on a display; wherein the recommendations follow the thesaurus further querying previously acquired source texts or documents having similar context.
8. The system of claim 7, wherein the at least one candidate word or phrase contained in the source text comprises a plurality of candidate words or phrases, the executable operations further comprising: assigning a ranking to each of the of the candidate words or phrases, each ranking assigned to a respective candidate word or phrase indicating a level of confidence that the respective candidate word or phrase is an appropriate replacement for the word or phrase contained in the subject text.
9. The system of claim 8, wherein recommending the identified at least one candidate word or phrase as the replacement for at least one word or phrase contained in the subject text comprises: presenting to a user each of the candidate words or phrases and the respective ranking assigned to each of the candidate words or phrases.
10. The system of claim 7, wherein the context of the subject text is identified by scanning at least a portion of a document containing the subject text.
11. The system of claim 7, wherein the context of the subject text is identified by scanning an entire document containing the subject text.
12. The system of claim 7, wherein automatically gathering source text from a plurality of text sources comprises: performing automated web crawling of social media websites to identify new source text to be stored in the thesaurus data infrastructure.
13. A computer program product for enhancing language, the computer program product comprising a computer readable storage device having program code stored thereon, wherein the computer readable storage device is not a transitory, propagating signal per se, the program code executable by a processor to perform a method comprising: automatically gathering, by the processor, source text from a plurality of text sources, wherein at least a portion of the source text is stored as natural language documents, the plurality of text sources including at least one social media website, and storing the source text to a thesaurus data infrastructure; receiving, by the processor, subject text being exposed to thesaurus processing; identifying, by the processor, a context of the subject text; identifying, by the processor, source text having context similar to the context of the subject text by accessing the thesaurus data infrastructure and processing the source text using dynamically created rules to identify the source text having context similar to the context of the subject text, the dynamically created rules generated by performing initial processing on the source text when the source text is gathered; analyzing, by the processor, the identified source text to identify at least one candidate word or phrase contained in the source text to recommend as a replacement for at least one word or phrase contained in the subject text by performing natural language inference processing on the source text stored as natural language documents; and recommending, by the processor, the identified at least one candidate word or phrase as the replacement for the at least one word or phrase contained in the subject text by presenting the at least one candida
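The dependent claims on ranking (claims 2 and 3, mirrored in claims 8 and 9) describe assigning each candidate a ranking that reflects confidence and presenting candidates together with their rankings. One way this could look is sketched below. The scoring rule is an assumption for illustration only: each candidate's confidence is taken as its share of the summed context-similarity scores of the sources it was found in. The claims do not specify a formula, and the names `rank_candidates` and `present` are hypothetical.

```python
from collections import defaultdict


def rank_candidates(observations):
    """Rank candidates by their share of total context similarity.

    `observations` is an iterable of (candidate, similarity) pairs, one pair
    per source text in which the candidate was found (assumed input shape).
    Returns (candidate, confidence) pairs, highest confidence first.
    """
    totals = defaultdict(float)
    for candidate, similarity in observations:
        totals[candidate] += similarity
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    # Sort by descending score, breaking ties alphabetically for stability.
    ordered = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [(candidate, score / grand_total) for candidate, score in ordered]


def present(ranked):
    """Format each candidate with its rank and confidence for display."""
    return [
        f"{position}. {candidate} (confidence {confidence:.0%})"
        for position, (candidate, confidence) in enumerate(ranked, start=1)
    ]


observations = [("awesome", 0.5), ("excellent", 0.25), ("awesome", 0.25)]
ranked = rank_candidates(observations)
lines = present(ranked)
```

With these sample inputs, "awesome" is found in two similar sources and so ranks above "excellent"; a user interface would show both along with their confidence values.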

Assignees

IBM

Inventors

Classifications

  • Business processes related to social networking or social networking services
  • Thesaurus
  • Noise filtering
  • G06F40/247 (Primary): Thesauruses; Synonyms
  • Indexing; Web crawling techniques

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10073839B2 cover?
It covers language enhancement through a context-sensitive thesaurus. Source text is gathered automatically from multiple text sources, including at least one social media website, and stored in a thesaurus data infrastructure. When subject text is received, its context is identified, stored source text with a similar context is located, and candidate words or phrases from that source text are recommended as replacements for words or phrases in the subject text.
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F40/247. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 11, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).