Readability awareness in natural language processing systems
US-9910912-B2 · Mar 6, 2018 · US
US10664507B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10664507-B2 |
| Application number | US-201916445825-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 19, 2019 |
| Priority date | Jan 5, 2016 |
| Publication date | May 26, 2020 |
| Grant date | May 26, 2020 |
Official abstract text for this publication.
Electronic natural language processing in a natural language processing (NLP) system, such as a Question-Answering (QA) system. A computer receives electronic text input, in question form, and determines a readability level indicator in the question. The readability level indicator includes at least a grammatical error, a slang term, and a misspelling type. The computer determines a readability level for the electronic text input based on the readability level indicator, and retrieves candidate answers based on the readability level.
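The flow described in the abstract (detect indicators in the question, then derive a readability level from them) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the word lists `SLANG_TERMS` and `DICTIONARY`, the repeated-word grammar check, and the linear mapping from indicator counts to a level in [0, 1] are hypothetical and are not taken from the patent.

```python
import re
from dataclasses import dataclass

# Hypothetical word lists for illustration only; the patent does not specify them.
SLANG_TERMS = {"gonna", "wanna", "lol", "dunno"}
DICTIONARY = {"what", "is", "the", "best", "way", "to", "treat", "a", "cold"}


@dataclass
class Indicators:
    grammatical_errors: int
    slang_terms: int
    misspellings: int


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def find_indicators(text):
    """Count the three indicator kinds named in the abstract (toy heuristics)."""
    tokens = tokenize(text)
    slang = sum(1 for t in tokens if t in SLANG_TERMS)
    # Assumed misspelling test: the token is in neither word list.
    misspelled = sum(
        1 for t in tokens if t not in DICTIONARY and t not in SLANG_TERMS
    )
    # Assumed grammar test: an immediately repeated word, e.g. "the the".
    grammar = sum(1 for a, b in zip(tokens, tokens[1:]) if a == b)
    return Indicators(grammar, slang, misspelled)


def query_readability(text):
    """Map indicator counts to a level in [0, 1]; 1.0 means none were found."""
    tokens = tokenize(text)
    if not tokens:
        return 1.0
    ind = find_indicators(text)
    total = ind.grammatical_errors + ind.slang_terms + ind.misspellings
    return max(0.0, 1.0 - total / len(tokens))


if __name__ == "__main__":
    print(find_indicators("wat is the the best way to treat a cold lol"))
    print(query_readability("what is the best way to treat a cold"))
```

A production system would presumably replace each heuristic with a real grammar checker, slang lexicon, and spell checker; the sketch only shows how the three indicator kinds could feed a single level.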
Opening claim text (preview).
What is claimed is:
1. A method for electronic natural language processing in an electronic natural language processing (NLP) system comprising a question-answering (QA) pipeline having a plurality of processing stages, wherein one or more of steps of the method are performed by one or more of the plurality of processing stages, comprising: determining, by the QA pipeline, one or more readability level indicators in a plurality of received natural language documents; filtering, by the QA pipeline, one or more of the natural language documents to exclude one or more natural language documents from processing by at least one other processing stage; and providing, by the QA pipeline in response to receiving a query text, at least one natural language document whose readability level is within a threshold distance of a readability level of the query text.
2. The method of claim 1, further comprising: training a data model based on determining the readability level for the one or more of the plurality of natural language documents.
3. The method of claim 1, wherein further comprising: querying, based on a received electronic text input from a user, a database storing the plurality of natural language documents; and retrieving a set of candidate answers in response to the query, wherein a candidate answer comprises at least a portion of a natural language document.
4. The method of claim 3, further comprising: identifying the received electronic text input as a question.
5. The method of claim 1, wherein the readability level of the query text is based on one or more readability level indicators including at least one of a grammatical error, a slang term, or a misspelling type in the query text.
6. The method of claim 3, wherein retrieving a set of candidate answers in response to the query comprises: defining a score function having as an input at least a readability level, wherein the set of candidate answers comprise natural language documents whose score meets a threshold value.
7. The method of claim 1, wherein determining a readability level for one or more of the plurality of natural language documents comprises: determining a readability level for at least two portions of at least one natural language document.
8. The method of claim 1, wherein a score assigned to the query text is based on at least a misspelling type, wherein the misspelling type comprises one or more of: a misspelling in a word falling within a defined range; a misspelling of a word, where the word is found in at least one dictionary, and not found in at least another dictionary; and a number of auto-corrections detected during an input process for the query text, the input process comprising receiving the query text from a user via an input device.
9. A computer system for electronic natural language processing (NLP), comprising: one or more computer devices each having one or more processors and one or more tangible storage devices; and a program embodied on at least one of the one or more storage devices, the program having a plurality of program instructions for execution by the one or more processors, the program instructions comprising programming instructions for a question-answering (QA) pipeline having a plurality of processing stages, wherein one or more of steps of a method are performed by executing one or more of the plurality of processing stages, and the computer programming instructions further comprising instructions for: determining, by the QA pipeline, one or more readability level indicators in a plurality of received natural language documents; filtering, by the QA pipeline, one or more of the natural language documents to exclude one or more natural language documents from processing by at least one other processing stage; and providing, by the QA pipeline in response to receiving a query text, at least one natural language document whose readability level is within a threshold distance of a readability level of the query text.
10. The system of claim 9, wherein the program instructions further comprise program instructions for: training a data model based on determining the readability level for the one or more of the plurality of natural language documents.
11. The system of claim 9, wherein the program instructions further comprise program instructions for: querying, based on a received electronic text input from a user, a database storing the plurality of natural language documents; and retrieving a set of candidate answers in response to the query, wherein a candidate answer comprises at least a portion of a natural language document.
12. The system of claim 11, wherein the program instructions further comprise program instructions for: identifying the received electronic text input as a question.
13. The system of claim 9, wherein the readability level of the query text is based on one or more readability level indicators including at least one of a grammatical error, a slang term, or a misspelling type in the query text.
14. The system of claim 11, wherein the program instructions further comprise program instructions for: defining a score function having as an input at least a readability level, wherein the set of candidate answers comprise natural language documents whose score meets a threshold value.
15. The system of claim 9, wherein the program instructions for determining a readability level for one or more of the plurality of natural language documents comprise program instructions for: determining a readability level for at least two portions of at least one natural language document.
16. A computer program product for electronic natural language processing (NLP) in an electronic NLP system comprising a question-answering (QA) pipeline having a plurality of processing stages, the computer program product, comprising a non-transitory tangible storage device having program code embodied therewith, the program code executable by a processor of a computer to perform a method, the method comprising: determining, by the QA pipeline, one or more readability level indicators in a plurality of received natural language documents; filtering, by the QA pipeline, one or more of the natural language documents to exclude one or more natural language documents from processing by at least one other processing stage; and providing, by the QA pipeline in response to receiving a query text, at least one natural language document whose readability level is within a threshold distance of a readability level of the query text.
17. The computer program product of claim 16, wherein the method further comprises: training, by the computer, a data model based on determining the readability level for the one or more of the plurality of natural language documents.
18. The computer program product of claim 16, wherein the method further comprises: querying, by the computer, based on a received electronic text input from a user, a database storing the plurality of natural language documents; and retrieving, by the computer, a set of candidate answers in response to the query, wherein a candidate answer comprises at least a portion of a natural language document.
19. The computer program product of claim 18, wherein the method further comprises: identifying, by the computer, the received electronic text input as a question.
20. The computer program product of claim 16, wherein the readability level of the query text is based on one or more readability level indicators includi
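The document-side steps recited in claim 1 (filter documents out of later stages, then provide documents whose readability level lies within a threshold distance of the query's level) and the score function of claim 6 might be sketched as follows. This is an illustrative reading, not the claimed implementation: the `Document` type, the precomputed `readability` field, the minimum-level filter criterion, and the particular score formula are all assumptions, since the claims leave those details open.

```python
from dataclasses import dataclass


@dataclass
class Document:
    text: str
    readability: float  # hypothetical precomputed level in [0, 1]


def filter_documents(docs, min_level):
    """Exclude documents from later stages.

    Assumed criterion: drop anything below a minimum readability level.
    The claims do not state which criterion the filter uses.
    """
    return [d for d in docs if d.readability >= min_level]


def within_threshold(docs, query_level, threshold):
    """Keep documents whose level is within `threshold` of the query's level."""
    return [d for d in docs if abs(d.readability - query_level) <= threshold]


def score(relevance, doc_level, query_level):
    """Hypothetical score that takes readability as one input (cf. claim 6).

    Relevance is discounted by the readability gap between document and query.
    """
    return relevance * (1.0 - abs(doc_level - query_level))


if __name__ == "__main__":
    corpus = [
        Document("a", 0.1),
        Document("b", 0.5),
        Document("c", 0.9),
    ]
    kept = filter_documents(corpus, min_level=0.2)
    matches = within_threshold(kept, query_level=0.6, threshold=0.15)
    print([d.text for d in matches])
```

Filtering early keeps low-value documents away from more expensive stages, while the distance check runs per query because it depends on the query's own level.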
Parsing · CPC title
Summarisation for human users · CPC title
Orthographic correction, e.g. spell checking or vowelisation · CPC title
Processing or translation of natural language (natural language analysis G06F40/20; semantic analysis G06F40/30) · CPC title
Grammatical analysis; Style critique · CPC title