Matching large sets of words

US9659059B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9659059-B2
Application number: US-201514847769-A
Country: US
Kind code: B2
Filing date: Sep 8, 2015
Priority date: Jul 20, 2012
Publication date: May 23, 2017
Grant date: May 23, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Word phrases are stored in a phrase structure. Each word is stored as a keyword in a keyword structure. Each keyword is associated with usage attributes identifying use of a word in a word phrase. Any preceding words associated with a keyword, and a mapping from any preceding words to a word phrase, is stored for each word. A word string is input. Match attributes are updated in a match structure if a word in the word string matches any keyword and if any preceding words associated with any matching keyword includes a preceding word which precedes the word in the word string. The match attributes indicate use of the matching word in the word string and in a word phrase. Whether a word phrase is present in the word string is determined based on the usage attributes and the match attributes associated with multiple matching words.
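One way to picture the scheme in the abstract is as an inverted index keyed by word and then by preceding word. The Python sketch below is a minimal illustration under stated assumptions, not the patented implementation: the names `build_keywords` and `find_phrases`, the tuple layout of the usage attributes, and the restriction to contiguous phrases without arbitrary-word gaps are all hypothetical choices made for this example.

```python
def build_keywords(phrases):
    """Keyword structure (assumed layout): word -> preceding word -> usage attributes.

    Each usage attribute is a tuple (phrase_id, position_in_phrase, is_terminal).
    A preceding word of None marks the first word of a phrase.
    """
    keywords = {}
    for pid, phrase in enumerate(phrases):
        words = phrase.lower().split()
        for pos, word in enumerate(words):
            prev = words[pos - 1] if pos > 0 else None
            usage = (pid, pos, pos == len(words) - 1)
            keywords.setdefault(word, {}).setdefault(prev, []).append(usage)
    return keywords


def find_phrases(phrases, text):
    """Return the set of phrases that occur contiguously in text."""
    keywords = build_keywords(phrases)
    # Match structure: one slot per phrase, mapping a position in the phrase
    # to the most recent position in the word string where the phrase
    # matched up to and including that position.
    match = [{} for _ in phrases]
    found = set()
    words = text.lower().split()
    for i, word in enumerate(words):
        by_prev = keywords.get(word, {})
        # Phrase-initial uses need no preceding word; other uses are looked
        # up through the word that actually precedes this one in the string.
        candidates = list(by_prev.get(None, ()))
        if i > 0:
            candidates += by_prev.get(words[i - 1], ())
        # Decide every update against the state left by the previous word,
        # then apply them together.
        pending = [
            (pid, pos, terminal)
            for pid, pos, terminal in candidates
            if pos == 0 or match[pid].get(pos - 1) == i - 1
        ]
        for pid, pos, terminal in pending:
            match[pid][pos] = i
            if terminal:
                found.add(phrases[pid])
    return found
```

In this sketch each word of the input string costs one dictionary lookup plus work proportional to the number of phrases that use that word after that particular preceding word, so the string is scanned once regardless of how many phrases are stored.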

First claim

Opening claim text (preview).

The invention claimed is:

1. A system for matching large sets of words, the system comprising: one or more processors; and a non-transitory computer readable medium storing a plurality of instructions, which when executed, cause the one or more processors to: store a plurality of word phrases in a phrase-based data structure; store each word in the phrase-based data structure as a corresponding keyword in a keyword-based data structure, wherein each corresponding keyword is associated with corresponding usage attributes identifying use of a corresponding word in a corresponding word phrase in the phrase-based data structure; store, for each word in the phrase-based data structure, any corresponding preceding words associated with a corresponding keyword, and a mapping from any corresponding preceding words to a corresponding word phrase; determine whether a word from an inputted word string matches any keyword in the keyword-based data structure; determine whether any corresponding preceding words associated with any matching keyword comprises a preceding word which precedes the matching word in the word string in response to a determination that the word in the word string matches any keyword in the keyword-based data structure; update corresponding match attributes in a match-based data structure in response to a determination that any corresponding preceding words associated with any matching keyword comprises the preceding word which precedes the matching word in the word string, wherein the corresponding match attributes indicate use of the matching word in the word string and use of the matching word in a corresponding word phrase in the phrase-based data structure; determine, based on the usage attributes and the match attributes associated with a plurality of matching words, whether at least one of the word phrases in the phrase-based data structure is present in the word string.

2. The system of claim 1, wherein at least one word phrase in the phrase-based data structure comprises a specified number of arbitrary words between words, and wherein any arbitrary words are ignored when identifying a preceding word.

3. The system of claim 1, wherein the usage attributes comprise a numerical identifier of a corresponding word phrase in the phrase-based data structure, a position of a corresponding word in the corresponding word phrase; whether the corresponding word is a terminal word in the corresponding word phrase, and a number of arbitrary words which the corresponding word follows.

4. The system of claim 1, wherein the match attributes comprise: a numerical identifier of the word string, a position of a corresponding word in a corresponding word phrase, and a position of the corresponding word in the word string.

5. The system of claim 1, wherein the match-based data structure comprises an array having a length equal to a total number of word phrases in the phrase-based data structure, and an index corresponding to the total number of word phrases in the phrase-based data structure.

6. The system of claim 1, comprising further instructions, which when executed, cause the one or more processors to determine whether corresponding usage attributes associated with any matching keyword identify the use of the word in the word string in response to a determination that any corresponding preceding words associated with any matching keyword comprises the preceding word which precedes the matching word in the word string.

7. The system of claim 1, comprising further instructions, which when executed, cause the one or more processors to determine whether corresponding match attributes in the match-based data structure are consistent with the use of the matching word in the word string in response to a determination that corresponding usage attributes associated with any matching keyword identify the use of the word in the word string.

8. A computer program product comprising computer-readable program code to be executed by one or more processors when retrieved from a non-transitory computer-readable medium, the program code including instructions to: store a plurality of word phrases in a phrase-based data structure; store each word in the phrase-based data structure as a corresponding keyword in a keyword-based data structure, wherein each corresponding keyword is associated with corresponding usage attributes identifying use of a corresponding word in a corresponding word phrase in the phrase-based data structure; store, for each word in the phrase-based data structure, any corresponding preceding words associated with a corresponding keyword, and a mapping from any corresponding preceding words to a corresponding word phrase; determine whether a word from an inputted word string matches any keyword in the keyword-based data structure; determine whether any corresponding preceding words associated with any matching keyword comprises a preceding word which precedes the matching word in the word string in response to a determination that the word in the word string matches any keyword in the keyword-based data structure; update corresponding match attributes in a match-based data structure in response to a determination that any corresponding preceding words associated with any matching keyword comprises the preceding word which precedes the matching word in the word string, wherein the corresponding match attributes indicate use of the matching word in the word string and use of the matching word in a corresponding word phrase in the phrase-based data structure; determine, based on the usage attributes and the match attributes associated with a plurality of matching words, whether at least one of the word phrases in the phrase-based data structure is present in the word string.

9. The computer program product of claim 8, wherein at least one word phrase in the phrase-based data structure comprises a specified number of arbitrary words between words, and wherein any arbitrary words are ignored when identifying a preceding word.

10. The computer program product of claim 8, wherein the usage attributes comprise a numerical identifier of a corresponding word phrase in the phrase-based data structure, a position of a corresponding word in the corresponding word phrase; whether the corresponding word is a terminal word in the corresponding word phrase, and a number of arbitrary words which the corresponding word follows.

11. The computer program product of claim 8, wherein the match attributes comprise: a numerical identifier of the word string, a position of a corresponding word in a corresponding word phrase, and a position of the corresponding word in the word string.

12. The computer program product of claim 8, wherein the match-based data structure comprises an array having a length equal to a total number of word phrases in the phrase-based data structure, and an index corresponding to the total number of word phrases in the phrase-based data structure.

13. The computer program product of claim 8, wherein the program code comprises further instructions to determine whether corresponding usage attributes associated with any matching keyword identify the use of the word in the word string in response to a determination that any corresponding preceding words associated with any matching keyword comprises the preceding word which precedes the matching word in the word string.

14. The computer program product of claim 8, wherein the program code comprises further instructions to determine whether corresponding match attributes in the match-based data structure are consistent with the use of the matching word in the word string in
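The attribute lists in claims 2 to 5 (and their counterparts in claims 9 to 12) can be made concrete with a small Python sketch. Everything here is an assumption for illustration rather than the claimed implementation: the `*` placeholder syntax for an arbitrary word, the class names `Usage` and `Match`, the functions `build_usage_index` and `scan_string`, and the choice to keep a set of match records in each slot of the per-phrase array. Patterns are assumed to contain at least one non-placeholder word.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    """Usage attributes of a keyword, following the list in claim 3."""
    phrase_id: int      # numerical identifier of the word phrase
    position: int       # position among the phrase's real (non-'*') words
    is_terminal: bool   # True if this is the last real word of the phrase
    gap: int            # number of arbitrary words this word follows


@dataclass(frozen=True)
class Match:
    """Match attributes, following the list in claim 4."""
    string_id: int        # numerical identifier of the word string
    phrase_position: int  # position of the word in the phrase
    string_position: int  # position of the word in the word string


def build_usage_index(patterns):
    """Map each keyword to its Usage records; '*' stands for one arbitrary word."""
    keywords = {}
    for pid, pattern in enumerate(patterns):
        tokens = pattern.lower().split()
        last = max(i for i, tok in enumerate(tokens) if tok != "*")
        position, gap = 0, 0
        for i, tok in enumerate(tokens):
            if tok == "*":
                gap += 1
                continue
            keywords.setdefault(tok, []).append(Usage(pid, position, i == last, gap))
            position, gap = position + 1, 0
    return keywords


def scan_string(keywords, match, string_id, text):
    """Return the ids of phrases found in text, in order of completion.

    match is an array with one slot per phrase; each slot is a set of
    Match records. Records carry string_id, so records left over from
    earlier strings never satisfy a lookup for the current string.
    """
    hits = []
    for i, word in enumerate(text.lower().split()):
        for u in keywords.get(word, ()):
            # The previous real word must sit exactly gap + 1 positions back.
            needed = Match(string_id, u.position - 1, i - u.gap - 1)
            if u.position == 0 or needed in match[u.phrase_id]:
                match[u.phrase_id].add(Match(string_id, u.position, i))
                if u.is_terminal:
                    hits.append(u.phrase_id)
    return hits
```

A short usage example under the same assumptions: with `patterns = ["annual * report", "data breach"]`, `kw = build_usage_index(patterns)` and `match = [set() for _ in patterns]`, the call `scan_string(kw, match, 1, "The annual financial report covers a data breach")` returns `[0, 1]`, while `scan_string(kw, match, 2, "annual report")` returns `[]` because the one-word gap is not present. Tagging every record with the string identifier is one plausible reason for that attribute in claim 4: the same array can be reused for the next string without being cleared first.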

Assignees

Salesforce Com Inc

Inventors

Classifications

  • of unstructured textual data (document management systems G06F16/93) · CPC title

  • Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS] · CPC title

  • G06F40/247 · Primary

    Thesauruses; Synonyms · CPC title

  • Recognition of textual entities · CPC title

  • Query execution · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9659059B2 cover?
Word phrases are stored in a phrase structure. Each word is stored as a keyword in a keyword structure. Each keyword is associated with usage attributes identifying use of a word in a word phrase. Any preceding words associated with a keyword, and a mapping from any preceding words to a word phrase, is stored for each word. A word string is input. Match attributes are updated in a match structu…
Who is the assignee on this patent?
Salesforce Com Inc
What technology area does this patent fall under?
Primary CPC classification G06F40/247. Mapped technology areas include Physics.
When was this patent published?
Publication date May 23, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).