Methods and apparatus for performing spelling corrections using one or more variant hash tables

US9552349B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-9552349-B2
Application numberUS-51378206-A
CountryUS
Kind codeB2
Filing dateAug 31, 2006
Priority dateAug 31, 2006
Publication dateJan 24, 2017
Grant dateJan 24, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods and apparatus are provided for performing spelling corrections using one or more variant hash tables. The spelling of at least one candidate word is corrected by obtaining at least one variant dictionary hash table based on variants of a set of known correctly spelled words, wherein the variants are obtained by applying one or more of a deletion, insertion, replacement, and transposition operation on the correctly spelled words; obtaining from the candidate word one or more lookup variants using one or more of the deletion, insertion, replacement, and transposition operations; evaluating one or more of the candidate word and the lookup variants against the at least one variant dictionary hash table; and indicating a candidate correction if there is at least one match in the at least one variant dictionary hash table.

First claim

Opening claim text (preview).

What is claimed is: 1. A method for correcting spelling of at least one candidate word, said method comprising: obtaining at least one variant dictionary hash table based on variants of a set of known correctly spelled words, wherein said variants are obtained by applying one or more of a deletion, insertion, replacement, and transposition operation on said correctly spelled words, wherein said variants obtained by applying one or more of an insertion and replacement operation on said correctly spelled words comprise a wildcard character, wherein for a given correctly spelled word having a length W, said at least one variant dictionary hash table comprises W of said variants each comprising said wildcard character in a different position, and wherein said variants are stored in said at least one variant dictionary hash table and are mapped to one of said corresponding correctly spelled words from said set of known correctly spelled words which are stored in a dictionary hash table that is distinct from said at least one variant dictionary hash table; performing one or more of a deletion, insertion, replacement, and transposition operation on the at least one candidate word to obtain one or more lookup variants, wherein said lookup variants obtained by applying one or more of an insertion and replacement operation on said at least one candidate word comprise a wildcard character; evaluating one or more of said lookup variants utilizing said at least one variant dictionary hash table to identify matches; and indicating a candidate correction if said evaluation step indicates that there is at least one match in the at least one variant dictionary hash table. 2. The method of claim 1 , wherein said dictionary hash table is pre-created prior to obtaining said candidate word, further comprising the step of obtaining said dictionary hash table having entries in a dictionary of known correctly spelled words and wherein said dictionary hash table and said at least one variant dictionary hash table are based on said dictionary and are comprised of at least one distance one variation for each of said entries, wherein said distance one variation comprises one or more of a deletion, insertion, replacement, and transposition operation performed on said entries; and wherein the step of evaluating said lookup variants utilizing said at least one variant dictionary hash table further comprises the step of evaluating one or more distance one variants utilizing said at least one variant dictionary hash table. 3. The method of claim 2 , wherein said distance one variation comprises a replacement operation to generate a replacement hash table having entries of single character wild card replacements of said entries in said dictionary and said method further comprises the steps of generating single character replacements and insertions of said candidate word and comparing said single character replacements and insertions against said replacement hash table. 4. The method of claim 3 , wherein the replacement hash table is obtained by: generating variants of each word in the dictionary, each variant is comprised of replacing any one character in the word with a wild card character and leaving other characters unchanged, thereby generating W variants for each word of length W; and for each generated variant of a word in the dictionary, storing a key-value pair in a hash table, wherein a key is a generated variant having a value that is the word itself. 5. The method of claim 2 , further comprising the steps of generating one or more distance one variants of said at least one candidate word and testing said distance one variants against one or more of said dictionary hash table and said at least one variant dictionary hash table, and accumulating matches. 6. The method of claim 5 , wherein said one or more distance one variants comprises adjacent character transpositions of said at least one candidate word obtained by generating all variants of the candidate word wherein any one pair of adjacent characters are interchanged, and the remaining characters are left unchanged. 7. The method of claim 5 , wherein said one or more distance one variants comprises single character wild card replacements of said at least one candidate word obtained by generating variants of said at least one candidate word by replacing any one character in said at least one candidate word with a chosen wild card character and leaving other characters unchanged, thereby generating W variants of said at least one candidate word of length W. 8. The method of claim 5 , wherein said one or more distance one variants comprises single character wild card insertions of said at least one candidate word obtained by generating variants of said at least one word which comprise inserting a wild card character before or after any character of said at least one candidate word and leaving the other characters unchanged, thereby generating W+1 variants of said at least one candidate word of length W. 9. The method of claim 2 , further comprising the step of testing said at least one candidate word against a deletion hash table, a deletions-transposition hash table, and a double deletion hash table and accumulating matches. 10. The method of claim 2 , further comprising the steps of generating all adjacent character transpositions of said at least one candidate word and testing said adjacent character transpositions against transposition and deletion hash tables, and accumulating matches. 11. The method of claim 2 , further comprising the steps of generating all single character deletions of said at least one candidate word and testing said single character deletions against transposition and deletion hash tables, and accumulating matches. 12. The method of claim 2 , further comprising the steps of generating all single character replacements of said at least one candidate word and testing said single character replacements against a transposition-replacement, deletion-replacement, and insertion-replacement hash tables, and accumulating matches. 13. The method of claim 2 , further comprising the steps of generating two character deletions of said at least one candidate word and testing said two character deletions against a double deletion hash table, and accumulating matches. 14. The method as recited in claim 2 , further comprising a step of generating arbitrary character transpositions of said at least one candidate word by generating all variants of the candidate word wherein any one pair of not necessarily adjacent characters are interchanged, and the remaining characters are left unchanged. 15. The method of claim 2 , further comprising the steps of generating all not necessarily adjacent character transpositions of said at least one candidate word and testing said not necessarily adjacent character transpositions against the transposition hash table and a deletion hash tables, and accumulating matches. 16. A system for correcting spelling of at least one candidate word, said system comprising: a memory; and at least one hardware processor, coupled to the memory, operative to: obtain at least one variant dictionary hash table based on variants of a set of known correctly spelled words, wherein said variants are obtained by applying one or more of a deletion, insertion, replacement, and transposition operation on said correctly spelled words, wherein said variants obtained by applying one or more of an insertion and replacement operation on said correctly spelled words comprise a wildcard character, wherein for a given correctly spelled word having a length W, said at le

Assignees

Inventors

Classifications

  • Converting codes to words; Guess-ahead of partial word inputs · CPC title

  • G06F40/232Primary

    Orthographic correction, e.g. spell checking or vowelisation · CPC title

  • Dictionaries · CPC title

  • Physics · mapped topic

  • G06F17/273Primary

    Physics · mapped topic

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9552349B2 cover?
Methods and apparatus are provided for performing spelling corrections using one or more variant hash tables. The spelling of at least one candidate word is corrected by obtaining at least one variant dictionary hash table based on variants of a set of known correctly spelled words, wherein the variants are obtained by applying one or more of a deletion, insertion, replacement, and transpositio…
Who is the assignee on this patent?
Hantler Sidney L, Laker Meir M, Lenchner Jonathan, and 2 more
What technology area does this patent fall under?
Primary CPC classification G06F40/232. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Jan 24 2017 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).