End-to-end memory networks for contextual language understanding
US-2017372200-A1 · Dec 28, 2017 · US
US10372814B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10372814-B2 |
| Application number | US-201615296794-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 18, 2016 |
| Priority date | Oct 18, 2016 |
| Publication date | Aug 6, 2019 |
| Grant date | Aug 6, 2019 |
Official abstract text for this publication.
Embodiments are directed to a spellcheck module for an enterprise search engine. The spellcheck module includes a candidate suggestion generation module that generates a number of candidate words that may be the correction of the misspelled word. The candidate suggestion generation module implements an algorithm for indexing, searching, and storing terms from an index with a constrained edit distance, using words in a collection of documents. The spellcheck module further includes a candidate suggestion ranking module. In one embodiment, a non-contextual approach using a linear combination of distance and probability scores is utilized; while in another embodiment, a context sensitive approach accounting for real-word misspells and adopting deep learning models is utilized. In use, a query is provided to the spellcheck module to generate results in the form of a ranked list of generated candidate entries that may be an entry a user accidentally misspelled.
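The two stages named in the abstract, candidate generation under a constrained edit distance and non-contextual ranking by a linear combination of distance and probability scores, can be illustrated with a minimal sketch. The abstract does not give the indexing scheme or the scoring formula, so everything specific here is an assumption: the deletion-based index, the function names (`build_index`, `candidates`, `rank`), the weight `alpha`, the length-normalised similarity, and the unigram-frequency probability are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def deletes(word, max_deletes):
    """All strings obtained by deleting 1..max_deletes characters from word."""
    out = set()
    for k in range(1, min(max_deletes, len(word) - 1) + 1):
        for idx in combinations(range(len(word)), k):
            skip = set(idx)
            out.add("".join(c for i, c in enumerate(word) if i not in skip))
    return out


def edit_distance(a, b):
    """Levenshtein distance (insert, delete, substitute), computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def build_index(corpus_words, max_deletes=2):
    """Count corpus words and link each shortened variant back to its source word."""
    freq = Counter(corpus_words)
    index = defaultdict(set)
    for word in freq:
        index[word].add(word)
        for variant in deletes(word, max_deletes):
            index[variant].add(word)
    return freq, index


def candidates(query, freq, index, max_distance=2, min_freq=1):
    """Look up the query and its own deletions, then filter by distance and frequency."""
    found = set()
    for key in {query} | deletes(query, max_distance):
        found |= index.get(key, set())
    return {
        w for w in found
        if freq[w] >= min_freq and edit_distance(query, w) <= max_distance
    }


def rank(query, cands, freq, alpha=0.7):
    """Order candidates by alpha * similarity + (1 - alpha) * unigram probability."""
    total = sum(freq.values())

    def score(w):
        similarity = 1 - edit_distance(query, w) / max(len(query), len(w))
        return alpha * similarity + (1 - alpha) * freq[w] / total

    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(cands, key=lambda w: (-score(w), w))
```

A possible usage, on a toy corpus chosen only for illustration:

```python
freq, index = build_index("search search search perch serve the index".split())
cands = candidates("serch", freq, index)
print(rank("serch", cands, freq))  # ['search', 'perch', 'serve']
```

Matching deletions of the query against deletions of the indexed words avoids enumerating insertions and substitutions over the alphabet; the final `edit_distance` check discards lookups that exceed the configured bound.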
Opening claim text (preview).
We claim:

1. A computer-implemented method for adaptive correction of misspelling, the method comprising: pre-training, by a processor, a pre-trained word vector; receiving, at the processor from a user device connected to the processor, a text for spelling analysis; creating, by the processor, a table of entries for each correctly spelled word in a corpus, wherein each table of entries includes alternative words of a particular correctly spelled word, each alternative word having one or more characters less than the particular correctly spelled word, the number of occurrences of each alternative word in the corpus, and links of the alternative words to the particular correctly spelled word; comparing, by the processor, a particular misspelled word in the text to the table of entries having an edit distance from the particular misspelled word and a minimum frequency of occurrence in the corpus to form a candidate set of entries; mapping, by the processor, each word in the text to the pre-trained word vector; obtaining, by the processor, a first vector representing a left context of the particular misspelled word and a second vector representing a right context of the particular misspelled word using a recurrent neural network (RNN); inputting, by the processor, the first vector and the second vector to a fully connected layer through the RNN, and inputting, by the processor, a third vector representing the particular misspelled word directly to the fully connected layer; replacing, by the processor, the particular misspelled word with each candidate in the candidate set of entries; outputting, by the processor, a context sensitive score from a logistic unit for each candidate, wherein the logistic unit is connected to the fully connected layer; ranking, by the processor, the candidate set of entries utilizing the context sensitive score so that each candidate has a ranking; ordering, by the processor, at least some of the candidates based on the ranking to identify corrections to the particular misspelled word; and displaying, to a user, the corrections to the particular misspelled word.

2. The method of claim 1, wherein the pre-trained word vector is implemented with an embedding method comprising one or more of: cbow, skip-gram, GloVe, LSA, PLSA, LDA, HAL, and NNMF.

3. The method of claim 1, wherein the RNN comprises one or more of simple RNN, LSTM, and GRU.

4. The method of claim 1, wherein the edit distance from the particular misspelled word and the minimum frequency of occurrence are predefined parameters.

5. The method of claim 1, wherein the pre-trained word vector is implemented with an embedding method comprising one or more of: cbow, skip-gram, GloVe, LSA, PLSA, HAL, and NNMF.

6. A system for adaptive correction of misspelling, the system comprising: a processor coupled to one or more user devices to receive user-generated search queries from the one or more user devices, the processor configured to: pre-train a pre-trained word vector; receive, from a first user device of the one or more user devices, a text for spelling analysis; create a table of entries for each correctly spelled word in a corpus, wherein each table of entries includes alternative words of a particular correctly spelled word, each alternative word having one or more characters less than the particular correctly spelled word, the number of occurrences of each alternative word in the corpus, and links of the alternative words to the particular correctly spelled word; compare a particular misspelled word in the text to the table of entries having an edit distance from the particular misspelled word and a minimum frequency of occurrence in the corpus to form a candidate set of entries; map each word in the text to the pre-trained word vector; obtain a first vector representing a left context and a second vector representing a right context using a recurrent neural network (RNN); output, the first vector and the second vector from the RNN to a fully connected layer, and output a third vector representing the particular misspelled word in the text directly to the fully connected layer; replace the particular misspelled word with each candidate in the candidate set of entries; output a context sensitive score from a logistic unit for each candidate, wherein the logistic unit is connected to the fully connected layer; rank the candidate set of entries utilizing the context sensitive score so that each candidate has a ranking; order at least some of the candidates based on the ranking to identify corrections to the particular misspelled word; and display, to a user, the corrections to the particular misspelled word.

7. The system of claim 6, wherein the pre-trained word vector is implemented with an embedding method comprising one or more of: cbow, skip-gram, GloVe, LSA, PLSA, LDA, HAL, and NNMF.

8. The system of claim 6, wherein the RNN comprises one or more of simple RNN, LSTM, and GRU.

9. The system of claim 6, wherein the edit distance from the particular misspelled word and the minimum frequency of occurrence are predefined parameters.

10. The system of claim 6, wherein the pre-trained word vector is implemented with an embedding method comprising one or more of: cbow, skip-gram, GloVe, LSA, PLSA, HAL, and NNMF.

11. A computer program product for adaptive correction of misspelling, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor coupled to one or more user devices to receive user-generated search queries from the one or more user devices to cause the processor to: pre-train a pre-trained word vector; receive, from a first user device of the one or more user devices, a text for spelling analysis; create a table of entries for each correctly spelled word in a corpus, wherein each table of entries includes alternative words of a particular correctly spelled word, each alternative word having one or more characters less than the particular correctly spelled word, the number of occurrences of each alternative word in the corpus, and links of the alternative words to the particular correctly spelled word; compare a particular misspelled word in the text to the table of entries having an edit distance from the particular misspelled word and a minimum frequency of occurrence in the corpus to form a candidate set of entries; map each word in the text to the pre-trained word vector; obtain a first vector representing a left context and a second vector representing a right context using a recurrent neural network (RNN); output the first vector and the second vector from the RNN to a fully connected layer, and output a third vector representing the particular misspelled word in the text directly to the fully connected layer; replace the particular misspelled word with each candidate in the candidate set of entries; output a context sensitive score from a logistic unit for each candidate, wherein the logistic unit is connected to the fully connected layer; rank the candidate set of entries utilizing the context sensitive score so that each candidate has a ranking; order at least some of the candidates based on the ranking to identify corrections to the particular misspelled word; and display, to a user, the corrections to the particular misspelled word.

12. The computer program product of claim 11, wherein the pre-trained word vector is implemented with an embedding method comprising one or more of: cbow, skip-gram, GloVe, LSA, PLSA, LDA, HAL, and NNMF.

13. The computer program product of claim 11, wherein the RNN
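
The context-sensitive scoring path recited in the independent claims (left-context and right-context vectors from an RNN, a third vector fed directly to a fully connected layer, and a logistic unit producing one score per candidate) might be wired together as in the sketch below. This is only an illustration of the data flow, not the patented implementation: the class names `SimpleRNN` and `ContextScorer`, the layer sizes, the tanh and ReLU activations, reading the right context in reverse, the zero vector for unknown words, and the randomly initialised (untrained) weights are all assumptions, and the toy embeddings stand in for a pre-trained word vector table. Because the weights are untrained, the scores carry no linguistic meaning.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class SimpleRNN:
    """Elman-style RNN: h_t = tanh(W_in x_t + W_rec h_{t-1}); returns the final state."""

    def __init__(self, dim, hidden, rng):
        self.w_in = rand_matrix(hidden, dim, rng)
        self.w_rec = rand_matrix(hidden, hidden, rng)
        self.hidden = hidden

    def encode(self, vectors):
        h = [0.0] * self.hidden
        for x in vectors:
            pre = [a + b for a, b in zip(matvec(self.w_in, x), matvec(self.w_rec, h))]
            h = [math.tanh(z) for z in pre]
        return h


class ContextScorer:
    def __init__(self, embeddings, hidden=4, fc=6, seed=0):
        rng = random.Random(seed)
        self.emb = embeddings
        self.dim = len(next(iter(embeddings.values())))
        self.left = SimpleRNN(self.dim, hidden, rng)
        self.right = SimpleRNN(self.dim, hidden, rng)
        # Fully connected layer over [left context, right context, word vector].
        self.w_fc = rand_matrix(fc, 2 * hidden + self.dim, rng)
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(fc)]

    def vec(self, word):
        # Unknown words map to a zero vector (an assumption of this sketch).
        return self.emb.get(word, [0.0] * self.dim)

    def score(self, tokens, position, candidate):
        """Score the text with tokens[position] replaced by candidate."""
        left = self.left.encode([self.vec(t) for t in tokens[:position]])
        right = self.right.encode(
            [self.vec(t) for t in reversed(tokens[position + 1:])]
        )
        # The candidate's vector bypasses the RNNs and goes straight to the FC layer.
        features = left + right + self.vec(candidate)
        hidden = [max(0.0, z) for z in matvec(self.w_fc, features)]
        logit = sum(w * h for w, h in zip(self.w_out, hidden))
        return 1.0 / (1.0 + math.exp(-logit))  # logistic unit

    def rank(self, tokens, position, candidates):
        return sorted(candidates, key=lambda c: (-self.score(tokens, position, c), c))
```

A possible usage with hypothetical three-dimensional embeddings:

```python
emb = {"the": [0.1, 0.2, 0.0], "search": [0.5, -0.1, 0.3],
       "perch": [-0.2, 0.4, 0.1], "engine": [0.3, 0.3, -0.2]}
scorer = ContextScorer(emb)
print(scorer.rank(["the", "serch", "engine"], 1, ["search", "perch"]))
```

In a trained system the weights would be learned so that candidates fitting the surrounding context score closer to 1; the sketch only shows how the three vectors meet at the fully connected layer.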
using system suggestions (G06F16/3325 takes precedence) · CPC title
Orthographic correction, e.g. spell checking or vowelisation · CPC title
Physics · mapped topic