What technology area does this patent fall under?

The primary CPC classification is G06F40/247 (Thesauruses; Synonyms). Mapped technology areas include Physics.

When was this patent published?

Publication date: Sep 11, 2018 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Electronically based thesaurus querying documents while leveraging context sensitivity

US10073839B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10073839-B2
Application number	US-201313930660-A
Country	US
Kind code	B2
Filing date	Jun 28, 2013
Priority date	Jun 28, 2013
Publication date	Sep 11, 2018
Grant date	Sep 11, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Arrangements described herein relate to language enhancement. Source text can be automatically gathered from a plurality of text sources, the plurality of text sources including at least one social media website, and storing the source text to a thesaurus data infrastructure. Subject text being exposed to thesaurus processing can be received, a context of the subject text can be identified, and the thesaurus data infrastructure can be accessed while the thesaurus queries previously acquired source texts or documents having similar context to identify source text having context similar to the context of the subject text. The identified source text can be analyzed to identify at least one candidate word or phrase contained in the source text to recommend as a replacement for at least one word or phrase contained in the subject text. The identified at least one candidate word or phrase can be recommended as the replacement for the at least one word or phrase contained in the subject text.
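The retrieval step in the abstract (finding stored source text whose context is similar to the context of the subject text) can be illustrated with a minimal sketch. The patent text on this page does not say how context is represented or compared, so everything here is an assumption for illustration: context is modeled as a bag of content words, similarity as cosine similarity, and the names `context_vector`, `cosine`, `similar_sources`, the stopword list, and the `threshold` value are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (assumption, not from the patent).
STOPWORDS = {"a", "an", "the", "of", "to", "and", "in", "is", "was", "on", "for", "with"}


def context_vector(text):
    """Stand-in for 'context': counts of lowercase content words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similar_sources(subject_text, source_texts, threshold=0.2):
    """Return stored source texts whose context resembles the subject text, best match first."""
    subject = context_vector(subject_text)
    scored = [(cosine(subject, context_vector(s)), s) for s in source_texts]
    return [s for score, s in sorted(scored, reverse=True) if score >= threshold]
```

In this sketch the list passed as `source_texts` plays the role of the thesaurus data infrastructure; a real system would presumably precompute and index these vectors when the source text is gathered rather than recomputing them for every query.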

First claim

Opening claim text (preview).

What is claimed is:
1. A method of language enhancement, the method comprising: automatically gathering source text from a plurality of text sources, wherein at least a portion of the source text is stored as natural language documents, the plurality of text sources including at least one social media website, and storing the source text to a thesaurus data infrastructure; receiving subject text being exposed to thesaurus processing; identifying a context of the subject text; identifying source text having context similar to the context of the subject text by accessing the thesaurus data infrastructure and processing the source text using dynamically created rules to identify the source text having context similar to the context of the subject text, the dynamically created rules generated by performing initial processing on the source text when the source text is gathered; analyzing, using a processor, the identified source text to identify at least one candidate word or phrase contained in the source text to recommend as a replacement for at least one word or phrase contained in the subject text by performing natural language inference processing on the source text stored as natural language documents; and recommending the identified at least one candidate word or phrase as the replacement for the at least one word or phrase contained in the subject text by presenting the at least one candidate word or phrase on a display; wherein the recommendations follow the thesaurus further querying previously acquired source texts or documents having similar context.
2. The method of claim 1, wherein the at least one candidate word or phrase contained in the source text comprises a plurality of candidate words or phrases, the method further comprising: assigning a ranking to each of the of the candidate words or phrases, each ranking assigned to a respective candidate word or phrase indicating a level of confidence that the respective candidate word or phrase is an appropriate replacement for the word or phrase contained in the subject text.
3. The method of claim 2, wherein recommending the identified at least one candidate word or phrase as the replacement for at least one word or phrase contained in the subject text comprises: presenting to a user each of the candidate words or phrases and the respective ranking assigned to each of the candidate words or phrases.
4. The method of claim 1, wherein the context of the subject text is identified by scanning at least a portion of a document containing the subject text.
5. The method of claim 1, wherein the context of the subject text is identified by scanning an entire document containing the subject text.
6. The method of claim 1, wherein automatically gathering source text from a plurality of text sources comprises: performing automated web crawling of social media websites to identify new source text to be stored in the thesaurus data infrastructure.
7. A system comprising: a processor programmed to initiate executable operations comprising: automatically gathering source text from a plurality of text sources, wherein at least a portion of the source text is stored as natural language documents, the plurality of text sources including at least one social media website, and storing the source text to a thesaurus data infrastructure; receiving subject text being exposed to thesaurus processing; identifying a context of the subject text; identifying source text having context similar to the context of the subject text by accessing the thesaurus data infrastructure and processing the source text using dynamically created rules to identify the source text having context similar to the context of the subject text, the dynamically created rules generated by performing initial processing on the source text when the source text is gathered; analyzing the identified source text to identify at least one candidate word or phrase contained in the source text to recommend as a replacement for at least one word or phrase contained in the subject text by performing natural language inference processing on the source text stored as natural language documents; and recommending the identified at least one candidate word or phrase as the replacement for the at least one word or phrase contained in the subject text by presenting the at least one candidate word or phrase on a display; wherein the recommendations follow the thesaurus further querying previously acquired source texts or documents having similar context.
8. The system of claim 7, wherein the at least one candidate word or phrase contained in the source text comprises a plurality of candidate words or phrases, the executable operations further comprising: assigning a ranking to each of the of the candidate words or phrases, each ranking assigned to a respective candidate word or phrase indicating a level of confidence that the respective candidate word or phrase is an appropriate replacement for the word or phrase contained in the subject text.
9. The system of claim 8, wherein recommending the identified at least one candidate word or phrase as the replacement for at least one word or phrase contained in the subject text comprises: presenting to a user each of the candidate words or phrases and the respective ranking assigned to each of the candidate words or phrases.
10. The system of claim 7, wherein the context of the subject text is identified by scanning at least a portion of a document containing the subject text.
11. The system of claim 7, wherein the context of the subject text is identified by scanning an entire document containing the subject text.
12. The system of claim 7, wherein automatically gathering source text from a plurality of text sources comprises: performing automated web crawling of social media websites to identify new source text to be stored in the thesaurus data infrastructure.
13. A computer program product for enhancing language, the computer program product comprising a computer readable storage device having program code stored thereon, wherein the computer readable storage device is not a transitory, propagating signal per se, the program code executable by a processor to perform a method comprising: automatically gathering, by the processor, source text from a plurality of text sources, wherein at least a portion of the source text is stored as natural language documents, the plurality of text sources including at least one social media website, and storing the source text to a thesaurus data infrastructure; receiving, by the processor, subject text being exposed to thesaurus processing; identifying, by the processor, a context of the subject text; identifying, by the processor, source text having context similar to the context of the subject text by accessing the thesaurus data infrastructure and processing the source text using dynamically created rules to identify the source text having context similar to the context of the subject text, the dynamically created rules generated by performing initial processing on the source text when the source text is gathered; analyzing, by the processor, the identified source text to identify at least one candidate word or phrase contained in the source text to recommend as a replacement for at least one word or phrase contained in the subject text by performing natural language inference processing on the source text stored as natural language documents; and recommending, by the processor, the identified at least one candidate word or phrase as the replacement for the at least one word or phrase contained in the subject text by presenting the at least one candida
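The dependent claims on ranking describe giving each candidate word or phrase a ranking that reflects confidence in it as a replacement, and presenting candidates together with those rankings. A minimal sketch of one way this could look follows. The claims do not define how confidence is computed, and the "natural language inference processing" they mention is not specified on this page, so this sketch swaps in a simple slot-matching heuristic: a word counts as a candidate when it appears in a source text between the same left and right neighbors as the target word. The names `tokenize` and `rank_candidates` and the frequency-share confidence score are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a simplification of real linguistic analysis."""
    return re.findall(r"[a-z']+", text.lower())


def rank_candidates(subject_text, target, source_texts):
    """Rank replacement candidates for `target` with a confidence in [0, 1].

    Assumed heuristic (not from the patent): a source word is a candidate if
    it sits between the same neighbors as `target` does in the subject text.
    Confidence is the candidate's share of all matches found.
    """
    tokens = tokenize(subject_text)
    i = tokens.index(target)  # raises ValueError if target is not in the subject text
    left = tokens[i - 1] if i > 0 else None
    right = tokens[i + 1] if i + 1 < len(tokens) else None

    counts = Counter()
    for source in source_texts:
        words = tokenize(source)
        for j, word in enumerate(words):
            if word == target:
                continue  # the original word is not a replacement for itself
            prev_word = words[j - 1] if j > 0 else None
            next_word = words[j + 1] if j + 1 < len(words) else None
            if prev_word == left and next_word == right:
                counts[word] += 1

    total = sum(counts.values())
    return [(word, n / total) for word, n in counts.most_common()]
```

A caller would display each returned `(candidate, confidence)` pair to the user, highest confidence first. The `source_texts` argument is meant to be the subset of stored text already judged similar in context to the subject text.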

Assignees

IBM

Inventors

Classifications

G06Q10/40
Business processes related to social networking or social networking services · CPC title
G06F16/374
Thesaurus · CPC title
G10L21/0208
Noise filtering · CPC title
G06F40/247 · Primary
Thesauruses; Synonyms · CPC title
G06F16/951
Indexing; Web crawling techniques · CPC title

Patent family

Related publications grouped by family.

View patent family 52116442

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10073839B2 cover?: Arrangements described herein relate to language enhancement. Source text can be automatically gathered from a plurality of text sources, the plurality of text sources including at least one social media website, and storing the source text to a thesaurus data infrastructure. Subject text being exposed to thesaurus processing can be received, a context of the subject text can be identified, and…
Who is the assignee on this patent?: IBM