Generation of pruning index for pattern matching queries

US11113286B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-11113286-B2
Application numberUS-202117218962-A
CountryUS
Kind codeB2
Filing dateMar 31, 2021
Priority dateDec 26, 2019
Publication dateSep 7, 2021
Grant dateSep 7, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A query directed at a source table organized into a set of batch units is received. The query includes a pattern matching predicate that specifies a search pattern. A set of N-grams are generated based on the search pattern. A pruning index associated with the source table is accessed. The pruning index comprises a set of filters that index distinct N-grams in each column of the source table. The pruning index is used to identify a subset of batch units to scan for matching data based on the set of N-grams generated for the search pattern. The query is processed by scanning the subset of batch units.

First claim

Opening claim text (preview).

What is claimed is: 1. A system comprising: at least one hardware processor; and at least one memory storing instructions that cause the at least one hardware processor to perform operations comprising: accessing a source table organized into a set of batch units; generating a pruning index based on the source table, the pruning index comprising a set of filters that index distinct N-grams in each column of the source table, each filter in the set of filters corresponding to a batch unit in the set of batch units, the generating of the pruning index comprising generating the set of filters, the generating of the set of filters comprising populating a cell in a first filter with a fingerprint generated based on an N-gram of a data value in the source table; and storing, in a database, the pruning index with an association with the source table. 2. The system of claim 1 , wherein the operations further comprise: preprocessing the data value, the preprocessing of the data value includes generating one or more preprocessed variants of the data value; and generating a set of N-grams for the data value based on the one or more preprocessed variant of the data value, the set of N-grams including the N-gram of the data value. 3. The system of claim 2 , wherein the preprocessing of the data value includes generating a case-agnostic variant of the data value. 4. The system of claim 2 , wherein the preprocessing of the data value includes generating one or more misspelled variants of the data value. 5. The system of claim 2 , wherein the preprocessing of the data value includes generating one or more synonymous variants corresponding to synonyms of the data value. 6. The system of claim 2 , wherein the preprocessing of the data value includes generating a variant with one or more special characters to mark a start and end of the data value. 7. The system of claim 1 , wherein the generating of the set of filters further comprises generating the fingerprint by computing a hash of the N-gram. 8. The system of claim 1 , wherein the operations further comprise: receiving a query directed at the source table, the query including a pattern matching predicate specifying a search pattern; identifying, using the pruning index, a subset of batch units to scan for matching data; and processing the query by scanning the subset of batch units. 9. The system of claim 8 , wherein identifying the subset of batch units comprises: generating a set of fingerprints based on N-grams of the search pattern; and comparing the set of fingerprints to the pruning index. 10. The system of claim 9 , wherein identifying the subset of batch units further comprises: identifying one or more values in the pruning index that match one or more fingerprints in the set of fingerprints; and identifying the subset of batch units based on the one or more values. 11. A method comprising: accessing a source table organized into a set of batch units; generating, using one or more hardware processors, a pruning index based on the source table, the pruning index comprising a set of filters that index distinct N-grams in each column of the source table, each filter in the set of filters corresponding to a batch unit in the set of batch units, the generating of the pruning index comprising generating the set of filters, the generating of the set of filters comprising populating a cell in a first filter with a fingerprint generated based on an N-gram of a data value in the source table; and storing, in a database, the pruning index with an association with the source table. 12. The method of claim 11 , further comprising: preprocessing the data value, the preprocessing of the data value includes generating one or more preprocessed variants of the data value; and generating a set of N-grams for the data value based on the one or more preprocessed variant of the data value, the set of N-grams including the N-gram of the data value. 13. The method of claim 12 , wherein the preprocessing of the data value includes generating a case-agnostic variant of the data value. 14. The method of claim 12 , wherein the preprocessing of the data value includes generating one or more misspelled variants of the data value. 15. The method of claim 12 , wherein the preprocessing of the data value includes generating one or more synonymous variants corresponding to synonyms of the data value. 16. The method of claim 12 , wherein the preprocessing of the data value includes generating a variant with one or more special characters to mark a start and end of the data value. 17. The method of claim 11 , wherein the generating of the set of filters further comprises generating the fingerprint by computing a hash of the N-gram. 18. The method of claim 11 , further comprising: receiving a query directed at the source table, the query including a pattern matching predicate specifying a search pattern; identifying, using the pruning index, a subset of batch units to scan for matching data; and processing the query by scanning the subset of batch units. 19. The method of claim 18 , wherein identifying the subset of batch units comprises: generating a set of fingerprints based on N-grams of the search pattern; and comparing the set of fingerprints to the pruning index. 20. A computer-storage medium comprising instructions that, when executed by one or more processors of a machine, configure the machine to perform operations comprising: accessing a source table organized into a set of batch units; generating a pruning index based on the source table, the pruning index comprising a set of filters that index distinct N-grams in each column of the source table, each filter in the set of filters corresponding to a batch unit in the set of batch units, the generating of the pruning index comprising generating the set of filters, the generating of the set of filters comprising populating a cell in a first filter with a fingerprint generated based on an N-gram of a data value in the source table; and storing, in a database, the pruning index with an association with the source table. 21. The computer-storage medium of claim 20 , wherein the operations further comprise: preprocessing the data value, the preprocessing of the data value includes generating one or more preprocessed variants of the data value; and generating a set of N-grams for the data value based on the one or more preprocessed variant of the data value, the set of N-grams including the N-gram of the data value. 22. The computer-storage medium of claim 21 , wherein the preprocessing of the data value includes at least one of: generating a case-agnostic variant of the data value; generating one or more misspelled variants of the data value; generating one or more synonymous variants corresponding to synonyms of the data value; and generating a variant with one or more special characters to mark a start and end of the data value. 23. The computer-storage medium of claim 21 , wherein generating of the set of filters further comprises generating the fingerprint by computing a hash of the N-gram. 24. The computer-storage medium of claim 21 , wherein the operations further comprise: receiving a query directed at the source table, the query including a pattern matching predicate specifying a search pattern; identifying, using the pruning index, a subset of batch units to scan for matching data; and processing the query by s

Assignees

Inventors

Classifications

  • Efficient disk access during query execution · CPC title

  • Hash tables · CPC title

  • G06F16/221Primary

    Column-oriented storage; Management thereof · CPC title

  • Management thereof · CPC title

  • Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11113286B2 cover?
A query directed at a source table organized into a set of batch units is received. The query includes a pattern matching predicate that specifies a search pattern. A set of N-grams are generated based on the search pattern. A pruning index associated with the source table is accessed. The pruning index comprises a set of filters that index distinct N-grams in each column of the source table. T…
Who is the assignee on this patent?
Snowflake Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/24557. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Sep 07 2021 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).