Methods and systems for processing language with standardization of source data

US11200378B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11200378-B2
Application number  US-201816157573-A
Country             US
Kind code           B2
Filing date         Oct 11, 2018
Priority date       Oct 11, 2018
Publication date    Dec 14, 2021
Grant date          Dec 14, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments for processing language by one or more processors are described. A plurality of document portions are detected. Each of the plurality of document portions includes text in a respective language type. The text of each of the plurality of document portions is converted to a standardized language type. A language processing method is caused to be performed on the plurality of document portions after the converting of the text of each of the plurality of document portions to the standardized language type.
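
The three steps in the abstract (detect portions, convert each to a standardized language type, then run a language processing method) can be illustrated with a minimal sketch. Everything named here is hypothetical and chosen only for illustration: the `Portion` class, the `WORD_LISTS` table, the marker-term heuristic for detecting a language type, and the "plain" label for the standardized type. None of it comes from the patent, which does not specify an implementation.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Portion:
    text: str
    language_type: str  # hypothetical label such as "legal", "medical", "plain"


# Hypothetical word lists: specialist term -> standardized (plain) term.
WORD_LISTS: Dict[str, Dict[str, str]] = {
    "legal": {"lessee": "tenant", "lessor": "landlord"},
    "medical": {"myocardial infarction": "heart attack"},
}


def detect_portions(document: str) -> List[Portion]:
    """Split on blank lines and guess each portion's language type.

    Assumption: a portion belongs to the first language type whose
    marker terms appear in it; otherwise it is treated as "plain".
    """
    portions = []
    for block in document.split("\n\n"):
        block = block.strip()
        if not block:
            continue
        lowered = block.lower()
        language_type = "plain"
        for name, words in WORD_LISTS.items():
            if any(term in lowered for term in words):
                language_type = name
                break
        portions.append(Portion(block, language_type))
    return portions


def standardize(portion: Portion) -> Portion:
    """Rewrite a portion into the standardized ("plain") language type."""
    text = portion.text
    for term, plain in WORD_LISTS.get(portion.language_type, {}).items():
        text = re.sub(rf"\b{re.escape(term)}\b", plain, text, flags=re.IGNORECASE)
    return Portion(text, "plain")


def process(document: str, language_processing: Callable[[str], str]) -> List[str]:
    """Run the language processing step only after every portion is standardized."""
    standardized = [standardize(p) for p in detect_portions(document)]
    return [language_processing(p.text) for p in standardized]
```

As a usage example, `process(doc, str.lower)` applies a stand-in processing step (lowercasing) to each standardized portion; a real system would substitute whatever analysis it needs at that point.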

First claim

Opening claim text (preview).

The invention claimed is:

1. A method, by one or more processors, for processing language comprising: detecting a plurality of document portions, wherein each of the plurality of document portions includes text in a respective language type, the respective language type associated with a technical skill level requisite to understand the text; selecting a word list for each of the plurality of document portions based, at least in part, on an age of the plurality of document portions; converting the text of each of the plurality of document portions from a detected source language to a standardized language type, wherein the standardized language type is based on the detected source language of the respective language type such that the converting transforms the text into linguistic terms in which the technical skill level requisite to understand the text is standardized notwithstanding whether the detected source language of the respective language type and the standardized language type are of a same language, and wherein the converting is based on the word list such that the word list includes at least synonyms for those portions of the plurality of document portions having the linguistic terms of which meanings thereof have changed over time according to the age of the plurality of document portions; and causing a language processing method to be performed on the plurality of document portions after the converting of the text of each of the plurality of document portions to the standardized language type.

2. The method of claim 1, wherein the language type of the text in each of the plurality of document portions is different than the language type of the text in the others of the plurality of document portions.

3. The method of claim 1, wherein the converting is further based on a context of the respective document portion.

4. The method of claim 1, wherein the selecting of the word list for at least some of the plurality of document portions is based on at least one of a natural language of the respective language type and a knowledge base associated with the respective document portion.

5. The method of claim 1, wherein the selecting of the word list for at least some of the plurality of document portions is based on definitions within the respective documents.

6. The method of claim 1, wherein the language processing method is at least one of performed utilizing a synonym table, includes natural language processing (NPL), and includes a cognitive analysis.

7. A system for processing language comprising: at least one processor that detects a plurality of document portions, wherein each of the plurality of document portions includes text in a respective language type, the respective language type associated with a technical skill level requisite to understand the text; selects a word list for each of the plurality of document portions based, at least in part, on an age of the plurality of document portions; converts the text of each of the plurality of document portions from a detected source language to a standardized language type, wherein the standardized language type is based on the detected source language of the respective language type such that the converting transforms the text into linguistic terms in which the technical skill level requisite to understand the text is standardized notwithstanding whether the detected source language of the respective language type and the standardized language type are of a same language, and wherein the converting is based on the word list such that the word list includes at least synonyms for those portions of the plurality of document portions having the linguistic terms of which meanings thereof have changed over time according to the age of the plurality of document portions; and causes a language processing method to be performed on the plurality of document portions after the converting of the text of each of the plurality of document portions to the standardized language type.

8. The system of claim 7, wherein the language type of the text in each of the plurality of document portions is different than the language type of the text in the others of the plurality of document portions.

9. The system of claim 7, wherein the converting is further based on a context of the respective document portion.

10. The system of claim 7, wherein the selecting of the word list for at least some of the plurality of document portions is based on at least one of a natural language of the respective language type and a knowledge base associated with the respective document portion.

11. The system of claim 7, wherein the selecting of the word list for at least some of the plurality of document portions is based on definitions within the respective documents.

12. The system of claim 7, wherein the language processing method is at least one of performed utilizing a synonym table, includes natural language processing (NPL), and includes a cognitive analysis.

13. A computer program product for processing language by one or more processors, the computer program product comprising a non-transitory computer-readable storage medium having computer-readable program code portions stored therein, the computer-readable program code portions comprising: an executable portion that detects a plurality of document portions, wherein each of the plurality of document portions includes text in a respective language type, the respective language type associated with a technical skill level requisite to understand the text; an executable portion that selects a word list for each of the plurality of document portions based, at least in part, on an age of the plurality of document portions; an executable portion that converts the text of each of the plurality of document portions from a detected source language to a standardized language type, wherein the standardized language type is based on the detected source language of the respective language type such that the converting transforms the text into linguistic terms in which the technical skill level requisite to understand the text is standardized notwithstanding whether the detected source language of the respective language type and the standardized language type are of a same language, and wherein the converting is based on the word list such that the word list includes at least synonyms for those portions of the plurality of document portions having the linguistic terms of which meanings thereof have changed over time according to the age of the plurality of document portions; and an executable portion that causes a language processing method to be performed on the plurality of document portions after the converting of the text of each of the plurality of document portions to the standardized language type.

14. The computer program product of claim 13, wherein the language type of the text in each of the plurality of document portions is different than the language type of the text in the others of the plurality of document portions.

15. The computer program product of claim 13, wherein the converting is further based on a context of the respective document portion.

16. The computer program product of claim 13, wherein the selecting of the word list for at least some of the plurality of document portions is based on at least one of a natural language of the respective language type and a knowledge base associated with the respective document portion.

17. The computer program product of claim 13, wherein the selecting of the word list for at least some of the plurality of document portions is b
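
Claim 1 recites selecting a word list based on the age of a document portion, where the list holds synonyms for terms whose meanings have changed over time. A minimal sketch of that one step follows. The era cutoffs, the `ERA_WORD_LISTS` table, the example terms, and the function names are all assumptions made up for illustration. The patent does not say how ages are bucketed or what the lists contain.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical word lists, each applying to documents written in or before
# the given cutoff year. Entries map an older usage to a present-day synonym.
ERA_WORD_LISTS: List[Tuple[int, Dict[str, str]]] = [
    (1900, {"awful": "awe-inspiring", "computer": "person who calculates"}),
    (1950, {"computer": "person who calculates", "wireless": "radio"}),
]


def select_word_list(year: int) -> Dict[str, str]:
    """Return the first word list whose cutoff year covers the document's year."""
    for cutoff, words in ERA_WORD_LISTS:
        if year <= cutoff:
            return words
    return {}  # recent documents: assume no meaning drift to correct


def convert(text: str, year: int) -> str:
    """Replace era-specific terms with their present-day synonyms."""
    words = select_word_list(year)
    if not words:
        return text
    # Longest terms first so multi-word entries would win over their substrings.
    alternatives = "|".join(re.escape(w) for w in sorted(words, key=len, reverse=True))
    pattern = re.compile(rf"\b({alternatives})\b", re.IGNORECASE)
    return pattern.sub(lambda m: words[m.group(1).lower()], text)
```

Under these assumptions, a sentence dated 1940 has "computer" and "wireless" rewritten, while the same sentence dated 2018 passes through unchanged because no word list applies to it.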

Assignees

IBM

Inventors

Classifications

  • G06F40/30 (Primary)

    Semantic analysis

  • Thesauruses; Synonyms

  • Dictionaries

  • G06F40/40 (Primary)

    Processing or translation of natural language (natural language analysis G06F40/20; semantic analysis G06F40/30)

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11200378B2 cover?
Embodiments for processing language by one or more processors are described. A plurality of document portions are detected. Each of the plurality of document portions includes text in a respective language type. The text of each of the plurality of document portions is converted to a standardized language type. A language processing method is caused to be performed on the plurality of document …
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F40/30. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 14, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).