What technology area does this patent fall under?

Primary CPC classification G06F16/3322. Mapped technology areas include Physics.

When was this patent published?

Published Aug 6, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Methods and system for fast, adaptive correction of misspells

US10372814B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10372814-B2
Application number	US-201615296794-A
Country	US
Kind code	B2
Filing date	Oct 18, 2016
Priority date	Oct 18, 2016
Publication date	Aug 6, 2019
Grant date	Aug 6, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments are directed to a spellcheck module for an enterprise search engine. The spellcheck module includes a candidate suggestion generation module that generates a number of candidate words that may be the correction of the misspelled word. The candidate suggestion generation module implements an algorithm for indexing, searching, and storing terms from an index with a constrained edit distance, using words in a collection of documents. The spellcheck module further includes a candidate suggestion ranking module. In one embodiment, a non-contextual approach using a linear combination of distance and probability scores is utilized; while in another embodiment, a context sensitive approach accounting for real-word misspells and adopting deep learning models is utilized. In use, a query is provided to the spellcheck module to generate results in the form of a ranked list of generated candidate entries that may be an entry a user accidentally misspelled.
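
The non-contextual ranking mentioned in the abstract, a linear combination of distance and probability scores, could look roughly like the sketch below. The weight `alpha`, the normalisation of the distance score, and the use of relative corpus frequency as the probability score are all assumptions made for illustration; the abstract does not specify them, and the function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def rank_candidates(misspelled, frequencies, alpha=0.5, max_distance=2):
    """Rank corpus words as corrections for `misspelled`.

    Assumed scoring (not taken from the patent):
      score = alpha * distance_score + (1 - alpha) * probability_score
    where distance_score falls as edit distance grows and
    probability_score is the word's relative frequency in the corpus.
    """
    total = sum(frequencies.values())
    scored = []
    for word, count in frequencies.items():
        d = edit_distance(misspelled, word)
        if d > max_distance:          # constrained edit distance
            continue
        distance_score = 1.0 - d / (max_distance + 1)
        probability_score = count / total
        score = alpha * distance_score + (1 - alpha) * probability_score
        scored.append((score, word))
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [word for _, word in scored]


# Hypothetical word counts standing in for a document collection.
freq = {"search": 50, "starch": 5, "seared": 2, "engine": 40}
print(rank_candidates("serch", freq))  # ['search', 'starch']
```

Scanning every corpus word, as this sketch does, is only practical for small vocabularies; the indexed candidate generation described in the abstract exists to avoid that scan.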

First claim

Opening claim text (preview).

We claim:

1. A computer-implemented method for adaptive correction of misspelling, the method comprising: pre-training, by a processor, a pre-trained word vector; receiving, at the processor from a user device connected to the processor, a text for spelling analysis; creating, by the processor, a table of entries for each correctly spelled word in a corpus, wherein each table of entries includes alternative words of a particular correctly spelled word, each alternative word having one or more characters less than the particular correctly spelled word, the number of occurrences of each alternative word in the corpus, and links of the alternative words to the particular correctly spelled word; comparing, by the processor, a particular misspelled word in the text to the table of entries having an edit distance from the particular misspelled word and a minimum frequency of occurrence in the corpus to form a candidate set of entries; mapping, by the processor, each word in the text to the pre-trained word vector; obtaining, by the processor, a first vector representing a left context of the particular misspelled word and a second vector representing a right context of the particular misspelled word using a recurrent neural network (RNN); inputting, by the processor, the first vector and the second vector to a fully connected layer through the RNN, and inputting, by the processor, a third vector representing the particular misspelled word directly to the fully connected layer; replacing, by the processor, the particular misspelled word with each candidate in the candidate set of entries; outputting, by the processor, a context sensitive score from a logistic unit for each candidate, wherein the logistic unit is connected to the fully connected layer; ranking, by the processor, the candidate set of entries utilizing the context sensitive score so that each candidate has a ranking; ordering, by the processor, at least some of the candidates based on the ranking to identify corrections to the particular misspelled word; and displaying, to a user, the corrections to the particular misspelled word.

2. The method of claim 1, wherein the pre-trained word vector is implemented with an embedding method comprising one or more of: cbow, skip-gram, GloVe, LSA, PLSA, LDA, HAL, and NNMF.

3. The method of claim 1, wherein the RNN comprises one or more of simple RNN, LSTM, and GRU.

4. The method of claim 1, wherein the edit distance from the particular misspelled word and the minimum frequency of occurrence are predefined parameters.

5. The method of claim 1, wherein the pre-trained word vector is implemented with an embedding method comprising one or more of: cbow, skip-gram, GloVe, LSA, PLSA, HAL, and NNMF.

6. A system for adaptive correction of misspelling, the system comprising: a processor coupled to one or more user devices to receive user-generated search queries from the one or more user devices, the processor configured to: pre-train a pre-trained word vector; receive, from a first user device of the one or more user devices, a text for spelling analysis; create a table of entries for each correctly spelled word in a corpus, wherein each table of entries includes alternative words of a particular correctly spelled word, each alternative word having one or more characters less than the particular correctly spelled word, the number of occurrences of each alternative word in the corpus, and links of the alternative words to the particular correctly spelled word; compare a particular misspelled word in the text to the table of entries having an edit distance from the particular misspelled word and a minimum frequency of occurrence in the corpus to form a candidate set of entries; map each word in the text to the pre-trained word vector; obtain a first vector representing a left context and a second vector representing a right context using a recurrent neural network (RNN); output, the first vector and the second vector from the RNN to a fully connected layer, and output a third vector representing the particular misspelled word in the text directly to the fully connected layer; replace the particular misspelled word with each candidate in the candidate set of entries; output a context sensitive score from a logistic unit for each candidate, wherein the logistic unit is connected to the fully connected layer; rank the candidate set of entries utilizing the context sensitive score so that each candidate has a ranking; order at least some of the candidates based on the ranking to identify corrections to the particular misspelled word; and display, to a user, the corrections to the particular misspelled word.

7. The system of claim 6, wherein the pre-trained word vector is implemented with an embedding method comprising one or more of: cbow, skip-gram, GloVe, LSA, PLSA, LDA, HAL, and NNMF.

8. The system of claim 6, wherein the RNN comprises one or more of simple RNN, LSTM, and GRU.

9. The system of claim 6, wherein the edit distance from the particular misspelled word and the minimum frequency of occurrence are predefined parameters.

10. The system of claim 6, wherein the pre-trained word vector is implemented with an embedding method comprising one or more of: cbow, skip-gram, GloVe, LSA, PLSA, HAL, and NNMF.

11. A computer program product for adaptive correction of misspelling, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor coupled to one or more user devices to receive user-generated search queries from the one or more user devices to cause the processor to: pre-train a pre-trained word vector; receive, from a first user device of the one or more user devices, a text for spelling analysis; create a table of entries for each correctly spelled word in a corpus, wherein each table of entries includes alternative words of a particular correctly spelled word, each alternative word having one or more characters less than the particular correctly spelled word, the number of occurrences of each alternative word in the corpus, and links of the alternative words to the particular correctly spelled word; compare a particular misspelled word in the text to the table of entries having an edit distance from the particular misspelled word and a minimum frequency of occurrence in the corpus to form a candidate set of entries; map each word in the text to the pre-trained word vector; obtain a first vector representing a left context and a second vector representing a right context using a recurrent neural network (RNN); output the first vector and the second vector from the RNN to a fully connected layer, and output a third vector representing the particular misspelled word in the text directly to the fully connected layer; replace the particular misspelled word with each candidate in the candidate set of entries; output a context sensitive score from a logistic unit for each candidate, wherein the logistic unit is connected to the fully connected layer; rank the candidate set of entries utilizing the context sensitive score so that each candidate has a ranking; order at least some of the candidates based on the ranking to identify corrections to the particular misspelled word; and display, to a user, the corrections to the particular misspelled word.

12. The computer program product of claim 11, wherein the pre-trained word vector is implemented with an embedding method comprising one or more of: cbow, skip-gram, GloVe, LSA, PLSA, LDA, HAL, and NNMF.

13. The computer program product of claim 11, wherein the RNN
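
The table-building and comparison steps recited in claim 1 might be sketched as follows. This is one illustrative reading of the claim language, not the patented implementation: the function names (`deletions`, `build_table`, `candidates`), the dictionary layout, and the choice to treat every corpus token as correctly spelled are assumptions made for the example.

```python
from collections import Counter, defaultdict
from itertools import combinations


def deletions(word, max_deletes):
    """Strings formed by removing 1..max_deletes characters from `word`
    (never the empty string)."""
    out = set()
    for k in range(1, min(max_deletes, len(word) - 1) + 1):
        for keep in combinations(range(len(word)), len(word) - k):
            out.add("".join(word[i] for i in keep))
    return out


def build_table(corpus_tokens, max_deletes=2):
    """Map each word and each shorter 'alternative word' to its corpus
    count and to links back to the full words it was derived from.

    Assumption: every token in the corpus is correctly spelled.
    """
    counts = Counter(corpus_tokens)
    table = defaultdict(lambda: {"count": 0, "links": set()})
    for word, count in counts.items():
        table[word]["count"] = count
        table[word]["links"].add(word)
        for alt in deletions(word, max_deletes):
            table[alt]["count"] = counts.get(alt, 0)  # occurrences of alt itself
            table[alt]["links"].add(word)             # link to the full word
    return table


def candidates(misspelled, table, max_deletes=2, min_frequency=1):
    """Look up the misspelled word and its own deletions in the table and
    keep linked words that meet the minimum corpus frequency."""
    keys = {misspelled} | deletions(misspelled, max_deletes)
    found = set()
    for key in keys:
        entry = table.get(key)  # .get() does not create missing entries
        if entry is None:
            continue
        for word in entry["links"]:
            if table[word]["count"] >= min_frequency:
                found.add(word)
    return found


# Hypothetical toy corpus.
tokens = "search engine search index starch".split()
table = build_table(tokens, max_deletes=1)
print(candidates("serch", table, max_deletes=1))  # {'search'}
```

Because deletions are applied on both the table side and the query side, a lookup like this can return words whose true edit distance exceeds `max_deletes`; a fuller version would presumably verify each candidate's distance before ranking.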

Assignees

IBM

Inventors

Classifications

G06F16/3322 · Primary
using system suggestions (G06F16/3325 takes precedence) · CPC title
G06F40/232 · Primary
Orthographic correction, e.g. spell checking or vowelisation · CPC title
G06F17/273 · Primary
Physics · mapped topic

Patent family

Related publications grouped by family.

View patent family 61904530

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10372814B2 cover?: Embodiments are directed to a spellcheck module for an enterprise search engine. The spellcheck module includes a candidate suggestion generation module that generates a number of candidate words that may be the correction of the misspelled word. The candidate suggestion generation module implements an algorithm for indexing, searching, and storing terms from an index with a constrained edit di…
Who is the assignee on this patent?: IBM