Search pattern suggestions for large datasets

US2021042363A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2021042363-A1
Application number: US-201916536645-A
Country: US
Kind code: A1
Filing date: Aug 9, 2019
Priority date: Aug 9, 2019
Publication date: Feb 11, 2021
Grant date: (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed herein are system, method, and computer program product embodiments for performing phrase extraction on documents. A document may be broken down into fragments based on punctuation marks, and those fragments broken down into phrases based on stop-words. These phrases may be scored based on a frequency of appearance within the document, and the highest scoring phrases mapped to the document for search purposes. Those mapped phrases may also be used to provide suggestions for a search. Furthermore, phrases within a document may be scored against phrases across a set of documents to classify the document on the basis of these scores and a classification of documents that share similar phrase scores.
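The first three steps of the abstract (fragmenting on punctuation, splitting on stop-words, scoring by frequency) can be illustrated with a minimal sketch. This is not the patent's implementation: the stop-word list, the punctuation set, the lowercasing, and the function names `split_fragments`, `split_phrases`, `score_phrases`, and `top_phrases` are all assumptions made for illustration, and the score here is a plain occurrence count.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the publication does not enumerate one here.
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to"}


def split_fragments(text):
    """Break a document into fragments at punctuation marks (assumed set)."""
    parts = re.split(r"[.,;:!?()]+", text.lower())
    return [p.strip() for p in parts if p.strip()]


def split_phrases(fragment):
    """Break a fragment at stop-words, dropping the stop-words themselves."""
    phrases, current = [], []
    for word in fragment.split():
        if word in STOP_WORDS:
            # A stop-word closes the phrase collected so far.
            if current:
                phrases.append(" ".join(current))
                current = []
        else:
            current.append(word)
    if current:
        phrases.append(" ".join(current))
    return phrases


def score_phrases(text):
    """Count how often each phrase appears in the document."""
    counts = Counter()
    for fragment in split_fragments(text):
        counts.update(split_phrases(fragment))
    return counts


def top_phrases(text, k=3):
    """Return the k most frequent phrases (assumed scoring: raw count)."""
    return [phrase for phrase, _ in score_phrases(text).most_common(k)]


if __name__ == "__main__":
    sample = "The credit card is lost. Report the credit card, and the bank account is safe."
    print(score_phrases(sample))
    print(top_phrases(sample, 1))
```

In this sketch, "credit card" appears in two fragments of the sample text and therefore outranks the phrases that appear once. A real system would likely need a fuller stop-word list and a more careful tokenizer.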

First claim

Opening claim text (preview).

1. A computer implemented method, comprising: breaking, by one or more computing devices, a document on a punctuation mark to create a fragment; breaking, by the one or more computing devices, the fragment on a stop-word to create a phrase filtering out the stop-word; generating, by the one or more computing devices, a set of subphrases, wherein the set of subphrases comprises combinations of consecutive words of the phrase broken down into all individual subphrases that range in length from one word to the number of words in the phrase; and mapping, by the one or more computing devices, the document to a subphrase of the set of subphrases having a highest frequency score in the document in a document map.

2. The computer implemented method of claim 1, further comprising: storing, by the one or more computing devices, the document map in a memory instance; storing, by the one or more computing devices, the subphrase as a key value of a suggestions map in the memory instance; and suggesting, by the one or more computing devices, the subphrase from the suggestions map based on matching a character sequence from a search input to a character sequence of the subphrase.

3. The computer implemented method of claim 2, further comprising: suggesting, by the one or more computing devices, the document from the document map when the search input matches the subphrase.

4. The computer implemented method of claim 2, wherein storing the document map in the memory instance comprises: storing, by the one or more computing devices, the document map including a plurality of additional subphrases, wherein the additional subphrases are selected for storage based on a minimum frequency score.

5. The computer implemented method of claim 2, further comprising: storing, by the one or more computing devices, the document map and the suggestions map in an additional memory instance, wherein suggesting the subphrase from the suggestions map comprises obtaining the suggestions map from the memory instance or the additional memory instance based on a load of the memory instance and the additional memory instance.

6. The computer implemented method of claim 1, further comprising: calculating, by the one or more computing devices, a frequency score of the subphrase based on a frequency of words in the subphrase.

7. The computer implemented method of claim 1, further comprising: scoring, by the one or more computing devices, a frequency of the subphrase against subphrases of a first classification document and subphrases of a second classification document; determining, by the one or more computing devices, that the frequency of the subphrase is higher relative to the first classification document than the second classification document; and assigning, by the one or more computing devices, a classification of the first classification document to the document.

8. A system, comprising: a memory configured to store operations; and one or more processors configured to perform the operations, the operations comprising: breaking a document on a punctuation mark to create a fragment, breaking the fragment on a stop-word to create a phrase filtering out the stop-word, generating a set of subphrases, wherein the set of subphrases comprises combinations of consecutive words of the phrase broken down into all individual subphrases that range in length from one word to the number of words in the phrase, and mapping the document to a subphrase of the set of subphrases having a highest frequency score in the document in a document map.

9. The system of claim 8, the operations further comprising: storing the document map in a memory instance; storing the subphrase as a key value of a suggestions map in the memory instance; and suggesting the subphrase from the suggestions map based on matching a character sequence from a search input to a character sequence of the subphrase.

10. The system of claim 9, the operations further comprising: suggesting the document from the document map when the search input matches the subphrase.

11. The system of claim 9, wherein storing the document map in the memory instance comprises: storing the document map including a plurality of additional subphrases, wherein the additional subphrases are selected for storage based on a minimum frequency score.

12. The system of claim 9, the operations further comprising: storing the document map and the suggestions map in an additional memory instance, wherein suggesting the subphrase from the suggestions map comprises obtaining the suggestions map from the memory instance or the additional memory instance based on a load of the memory instance and the additional memory instance.

13. The system of claim 8, the operations further comprising: calculating a frequency score of the subphrase based on a frequency of words in the subphrase.

14. The system of claim 8, the operations further comprising: scoring a frequency of the subphrase against subphrases of a first classification document and subphrases of a second classification document; determining that the frequency of the subphrase is higher relative to the first classification document than the second classification document; and assigning a classification of the first classification document to the document.

15. A computer readable storage device having instructions stored thereon, execution of which, by one or more processing devices, causes the one or more processing devices to perform operations comprising: breaking a document on a punctuation mark to create a fragment; breaking the fragment on a stop-word to create a phrase filtering out the stop-word; generating a set of subphrases, wherein the set of subphrases comprises combinations of consecutive words of the phrase broken down into all individual subphrases that range in length from one word to the number of words in the phrase; and mapping the document to a subphrase of the set of subphrases having a highest frequency score in the document in a document map.

16. The computer readable storage device of claim 15, the operations further comprising: storing the document map in a memory instance; storing the subphrase as a key value of a suggestions map in the memory instance; and suggesting the subphrase from the suggestions map based on matching a character sequence from a search input to a character sequence of the subphrase.

17. The computer readable storage device of claim 16, the operations further comprising: suggesting the document from the document map when the search input matches the subphrase.

18. The computer readable storage device of claim 16, wherein storing the document map in the memory instance comprises: storing the document map including a plurality of additional subphrases, wherein the additional subphrases are selected for storage based on a minimum frequency score.

19. The computer readable storage device of claim 15, the operations further comprising: calculating a frequency score of the subphrase based on a frequency of words in the subphrase.

20. The computer readable storage device of claim 15, the operations further comprising: scoring a frequency of the subphrase against subphrases of a first classification document and subphrases of a second classification document; determining that the frequency of the subphrase is higher relative to the first classification document than the second classification document; and as
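The subphrase generation of claim 1 and the suggestions map of claims 2 and 3 can be sketched as follows. This is a rough illustration under stated assumptions, not the claimed implementation: the frequency score is taken to be a raw count of subphrase occurrences, ties are broken in favor of the longer subphrase, "matching a character sequence" is read as prefix matching, and the names `subphrases`, `build_document_map`, `build_suggestions_map`, and `suggest` are hypothetical.

```python
from collections import Counter, defaultdict


def subphrases(phrase):
    """All runs of consecutive words, from one word up to the whole phrase."""
    words = phrase.split()
    n = len(words)
    return [" ".join(words[i:j]) for i in range(n) for j in range(i + 1, n + 1)]


def build_document_map(phrases_by_doc):
    """Map each document id to its highest-scoring subphrase.

    phrases_by_doc: dict of document id -> list of already extracted phrases.
    Assumed score: how many times the subphrase occurs across the phrases.
    """
    document_map = {}
    for doc_id, phrases in phrases_by_doc.items():
        counts = Counter(s for p in phrases for s in subphrases(p))
        if counts:
            # Assumed tie-break: prefer the longer subphrase at equal count.
            best = max(counts, key=lambda s: (counts[s], len(s.split())))
            document_map[doc_id] = best
    return document_map


def build_suggestions_map(document_map):
    """Invert the document map: subphrase key -> set of document ids."""
    suggestions = defaultdict(set)
    for doc_id, sub in document_map.items():
        suggestions[sub].add(doc_id)
    return dict(suggestions)


def suggest(suggestions_map, search_input):
    """Return stored subphrases that start with the typed characters."""
    prefix = search_input.lower()
    return sorted(key for key in suggestions_map if key.startswith(prefix))


if __name__ == "__main__":
    phrases_by_doc = {
        "doc1": ["credit card fraud", "credit card"],
        "doc2": ["bank account", "savings account"],
    }
    doc_map = build_document_map(phrases_by_doc)
    sugg = build_suggestions_map(doc_map)
    print(doc_map)
    print(suggest(sugg, "cre"), sugg["credit card"])
```

A phrase of n words yields n(n+1)/2 subphrases, so long phrases grow the candidate set quadratically. The minimum frequency score of claim 4 is one way to bound what gets stored.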

Assignees

  • Capital One Services, LLC

Inventors

Classifications

  • Machine learning · CPC title

  • G06F16/35 · Primary

    Clustering; Classification · CPC title

  • Inference or reasoning models · CPC title

  • using directory or table look-up (use of a directory or look-up table in file systems G06F16/13) · CPC title

  • using system suggestions · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2021042363A1 cover?
Disclosed herein are system, method, and computer program product embodiments for performing phrase extraction on documents. A document may be broken down into fragments based on punctuation marks, and those fragments broken down into phrases based on stop-words. These phrases may be scored based on a frequency of appearance within the document, and the highest scoring phrases mapped to the doc…
Who is the assignee on this patent?
Capital One Services, LLC
What technology area does this patent fall under?
Primary CPC classification G06F16/35. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 11, 2021 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).