Simplified Hash Table
US-2024422006-A1 · Dec 19, 2024 · US
US11314730B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-11314730-B1 |
| Application number | US-202016828188-A |
| Country | US |
| Kind code | B1 |
| Filing date | Mar 24, 2020 |
| Priority date | Mar 24, 2020 |
| Publication date | Apr 26, 2022 |
| Grant date | Apr 26, 2022 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
Techniques for memory-efficient streaming count estimation for multisets are described. A method for memory-efficient streaming count estimation for multisets may include obtaining data from a plurality of data sources, and estimating a count for one or more attributes of the data using a telescoping count-min sketch (CMS) data structure, the telescoping CMS including at least a first table and a second table, wherein count values for the data are stored in a plurality of cells of the first table and when a cell of the first table is saturated, the count values for that cell are stored in a corresponding cell of the second table determined based at least on the cell of the first table.
Opening claim text (preview).
What is claimed is: 1. A computer-implemented method comprising: identifying a first plurality of blocks based at least on a plurality of records stored in a storage service of a provider network; identifying a plurality of sets of matching blocks from the first plurality of blocks; deleting the plurality of sets of matching blocks except for a first block from each set from the plurality of sets of matching blocks; estimating a count for each block from the first plurality of blocks using a telescoping count-min sketch (CMS) data structure, the telescoping CMS data structure including at least a first table and a second table, wherein count values for the plurality of records are stored in a plurality of cells of the first table and when a cell of the first table is saturated, the overflowed count values for that cell are stored in a corresponding cell of the second table determined based at least on the cell of the first table; and iteratively performing dynamic blocking based at least on the telescoping CMS data structure to generate subsequent pluralities of blocks until the subsequent pluralities of blocks are below a threshold size. 2. The computer-implemented method of claim 1 , wherein a first table width of the first table is larger than a second table width of the second table, wherein a first cell width of each cell of the first table is smaller than a second cell width of each cell of the second table, and wherein each table of the telescoping CMS data structure has an equal number of rows, each row corresponding to a hashing algorithm independent of the other rows. 3. The computer-implemented method of claim 1 , wherein the corresponding cell of the second table is determined by calculating a modulo of the cell of the first table and a table width of the second table. 4. A computer-implemented method comprising: obtaining data from a plurality of data sources; and estimating a count for one or more attributes of the data using a telescoping count-min sketch (CMS) data structure, the telescoping CMS data structure including at least a first table and a second table, wherein count values for the data are stored in a plurality of cells of the first table and when a cell of the first table is saturated, the overflowed count values for that cell are stored in a corresponding cell of the second table determined based at least on the cell of the first table. 5. The computer-implemented method of claim 4 , wherein a first table width of the first table is larger than a second table width of the second table. 6. The computer-implemented method of claim 4 , wherein a first cell width of each cell of the first table is smaller than a second cell width of each cell of the second table. 7. The computer-implemented method of claim 4 , wherein each table of the telescoping CMS data structure has an equal number of rows, each row corresponding to a hashing algorithm independent of the other rows. 8. The computer-implemented method of claim 4 , wherein the corresponding cell of the second table is determined by calculating a modulo of the cell of the first table and a table width of the second table. 9. The computer-implemented method of claim 4 , further comprising: determining an estimated number of intersecting blocks of the data using the telescoping CMS data structure is below a threshold; and performing intersection dynamic blocking using the blocks of data. 10. The computer-implemented method of claim 4 , further comprising: storing the data in a data lake service in a provider network, wherein the data lake service includes a plurality of storage instances in which the data are stored and, wherein each storage instance generates a local telescoping CMS data structure for a portion of data stored on that instance. 11. The computer-implemented method of claim 10 , further comprising: merging a plurality of the local telescoping CMS data structures generated by the plurality of storage instances to generate a merged telescoping CMS data structure. 12. The computer-implemented method of claim 1 , wherein a statistical distribution associated with a frequency of the plurality of records is a power distribution. 13. A system comprising: a first one or more electronic devices to implement a storage service in a multi-tenant provider network; and a second one or more electronic devices to implement a data analysis service in the multi-tenant provider network, the data analysis service including instructions that upon execution cause the data analysis service to: obtain a plurality of records from a plurality of data sources; estimate a count for one or more attributes of the data using a telescoping count-min sketch (CMS) data structure, the telescoping CMS data structure including at least a first table and a second table, wherein count values for the data are stored in a plurality of cells of the first table and when a cell of the first table is saturated, the overflowed count values for that cell are stored in a corresponding cell of the second table determined based at least on the cell of the first table. 14. The system of claim 13 , wherein a first table width of the first table is larger than a second table width of the second table. 15. The system of claim 13 , wherein a first cell width of each cell of the first table is smaller than a second cell width of each cell of the second table. 16. The system of claim 13 , wherein each table of the telescoping CMS data structure has an equal number of rows, each row corresponding to a hashing algorithm independent of the other rows. 17. The system of claim 13 , wherein the corresponding cell of the second table is determined by calculating a modulo of the cell of the first table and a table width of the second table. 18. The system of claim 13 , wherein the instructions, when executed, further cause the data analysis service to: determine an estimated number of intersecting blocks of the data using the telescoping CMS data structure is below a threshold; and perform intersection dynamic blocking using the blocks of data. 19. The system of claim 13 , wherein the instructions, when executed, further cause the data analysis service to: storie the plurality of records in a data lake service in a provider network, wherein the data lake service includes a plurality of storage instances in which the data are stored and, wherein each storage instance generates a local telescoping CMS data structure for a portion of the data stored on that instance. 20. The system of claim 19 , wherein the instructions, when executed, further cause the data lake service to: merge a plurality of the local telescoping CMS data structures generated by the plurality of storage instances to generate a merged telescoping CMS data structure.
Hash tables · CPC title
Updates performed during online database operations; commit processing · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.