Approximate unique count
US-11487668-B2 · Nov 1, 2022 · US
US11720570B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11720570-B2 |
| Application number | US-202117214247-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 26, 2021 |
| Priority date | Mar 26, 2021 |
| Publication date | Aug 8, 2023 |
| Grant date | Aug 8, 2023 |
Official abstract text for this publication.
Querying a distributed database including a table sharded into shards distributed to database instances includes receiving a data-query that includes an aggregation clause on a first column and a grouping clause on a second column; obtaining and outputting results data. Obtaining the results data includes receiving, by a query coordinator, intermediate results data; and combining, by the query coordinator, the intermediate results to obtain the results data. Receiving the intermediate results data includes receiving, from a first database instance, first aggregation values indicating, on a per-group basis in accordance with the grouping clause, a respective aggregation value of distinct values of the first column in accordance with the aggregation clause, and receiving, from a second database instance, second aggregation values indicating, on a per-group basis in accordance with the grouping clause, a respective aggregation value of distinct values of the first column in accordance with the aggregation clause.
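The abstract's flow can be illustrated with a minimal sketch, assuming the aggregation is a distinct count and that the coordinator combines per-shard results by summation. The abstract only says the intermediate results are "combined"; summation is an assumption that is valid here because the table is sharded on the aggregated column, so each first-column value lives in exactly one shard. The function names, the CRC-based sharding rule, and the sample rows are all hypothetical.

```python
import zlib
from collections import defaultdict

def shard_index(first_col_value, num_shards):
    # Hypothetical sharding criterion: rows with the same first-column
    # value always land in the same shard.
    return zlib.crc32(str(first_col_value).encode("utf-8")) % num_shards

def partition(rows, num_shards):
    # Distribute (first_column, second_column) rows to shards.
    shards = [[] for _ in range(num_shards)]
    for first, second in rows:
        shards[shard_index(first, num_shards)].append((first, second))
    return shards

def shard_distinct_counts(shard_rows):
    # Work done by one database instance: for each group (second column),
    # count the distinct first-column values present in this shard.
    seen = defaultdict(set)
    for first, second in shard_rows:
        seen[second].add(first)
    return {group: len(values) for group, values in seen.items()}

def coordinator_combine(intermediate_results):
    # Assumed combine step: because shards are disjoint on the first column,
    # per-shard distinct counts for a group can simply be added.
    totals = defaultdict(int)
    for per_group in intermediate_results:
        for group, count in per_group.items():
            totals[group] += count
    return dict(totals)

# Hypothetical table: (user_id, country), i.e. (first column, second column).
rows = [
    ("u1", "US"), ("u2", "US"), ("u1", "US"),
    ("u3", "DE"), ("u2", "DE"), ("u3", "DE"),
]
shards = partition(rows, num_shards=2)
result = coordinator_combine(shard_distinct_counts(s) for s in shards)
print(result)
```

The point of the design is that each instance ships only one number per group to the coordinator instead of the full set of distinct values, which is what sharding on the aggregated column makes possible.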
Opening claim text (preview).
What is claimed is:
1. A method for querying a distributed database, comprising: receiving a data-query at the distributed database, wherein the distributed database comprises a table, the table comprises a first column and a second column, the table is partitioned into shards according to a sharding criterion, wherein the sharding criterion indicates the first column, such that a first shard includes one or more rows of the table having a first value of the first column and a second shard omits a row of the table having the first value of the first column, the shards are distributed to database instances of the distributed database, and the data-query comprises an aggregation clause on the first column and a grouping clause on the second column; identifying a query coordinator for processing the data-query; obtaining results data responsive to the data-query, wherein obtaining the results data comprises: receiving, by the query coordinator and from at least a subset of the database instances of the distributed database, intermediate results data responsive to at least a portion of the data-query, wherein receiving the intermediate results data includes: receiving, from a first database instance for a first shard, first aggregation values indicating, on a per-group basis in accordance with the grouping clause, a respective aggregation value of distinct values of the first column in accordance with the aggregation clause, and receiving, from a second database instance for a second shard, second aggregation values indicating, on a per-group basis in accordance with the grouping clause, a respective aggregation value of distinct values of the first column in accordance with the aggregation clause; and combining, by the query coordinator, the intermediate results to obtain the results data; and outputting the results data.
2. The method of claim 1, wherein the aggregation clause comprises determining a cardinality of distinct values of the first column.
3. The method of claim 2, wherein combining, by the query coordinator, the intermediate results to obtain the results data comprises: querying at least some of the shards for respective shard-specific distinct values of the first column grouped by shard-specific values of the second column; and for each shard-specific value of the second column, obtaining a respective count of the respective shard-specific distinct values.
4. The method of claim 1, further comprising: receiving, at a low-latency data analysis system that includes the distributed database, data expressing a usage intent, in response to user input associated with a user, wherein receiving the data-query comprises: obtaining the data-query in response to receiving the data expressing the usage intent; and wherein outputting the results data comprises: outputting at least a portion of the results data for presentation to the user.
5. The method of claim 1, wherein the sharding criterion consists of the first column of the table.
6. The method of claim 1, wherein the sharding criterion comprises the first column of the table and the second column of the table.
7. The method of claim 6, wherein the sharding criterion comprises sharding on the first column followed by sharding on the second column.
8. The method of claim 1, wherein the aggregation clause comprises at least one of a minimum clause or a maximum clause.
9. The method of claim 1, wherein the aggregation clause further comprises determining a cardinality of distinct values of a third column of the table, wherein the sharding criterion does not include the third column of the table, and wherein an intermediate result of the intermediate results further comprises distinct values of the third column for the each value of the second column.
10. The method of claim 1, wherein the data-query further comprises a sampling clause.
11. A device for querying a distributed database, comprising: a memory; and a processor, the processor configured to execute instructions stored in the memory to: receive a data-query at the distributed database, wherein the distributed database comprises a table, the table comprises a first column and a second column, the table is partitioned into shards according to a sharding criterion, wherein the sharding criterion indicates the first column, such that a first shard includes one or more rows of the table having a first value of the first column and a second shard omits a row of the table having the first value of the first column, the shards are distributed to database instances of the distributed database, and the data-query comprises an aggregation clause on the first column and a grouping clause on the second column; identify a query coordinator for processing the data-query; obtain results data responsive to the data-query, wherein to obtain the results data comprises to: receive, by the query coordinator and from at least a subset of the database instances of the distributed database, intermediate results data responsive to at least a portion of the data-query, wherein to receive the intermediate results data includes to: receive, from a first database instance for a first shard, first aggregation values indicating, on a per-group basis in accordance with the grouping clause, a respective aggregation value of distinct values of the first column in accordance with the aggregation clause, and receive, from a second database instance for a second shard, second aggregation values indicating, on a per-group basis in accordance with the grouping clause, a respective aggregation value of distinct values of the first column in accordance with the aggregation clause; and combine, by the query coordinator, the intermediate results to obtain the results data; and output the results data.
12. The device of claim 11, wherein the aggregation clause comprises determining a cardinality of distinct values of the first column.
13. The device of claim 12, wherein to combine, by the query coordinator, the intermediate results to obtain the results data comprises to: query at least some of the shards for respective shard-specific distinct values of the first column grouped by shard-specific values of the second column; and for each shard-specific value of the second column, obtain a respective count of the respective shard-specific distinct values.
14. The device of claim 11, wherein the processor is further configured to execute instructions to: receive, at a low-latency data analysis system that includes the distributed database, data expressing a usage intent, in response to user input associated with a user, wherein to receive the data-query comprises to: obtain the data-query in response to receiving the data expressing the usage intent; and wherein to output the results data comprises to: output at least a portion of the results data for presentation to the user.
15. The device of claim 11, wherein the sharding criterion consists of the first column of the table.
16. The device of claim 11, wherein the sharding criterion comprises the first column of the table and the second column of the table.
17. The device of claim 16, wherein the sharding criterion comprises sharding on the first column followed by sharding on the second column.
18. The device of claim 11, wherein the aggregation clause comprises at least one of a minimum clause or a maximum clause.
19. The device of claim 11, wherein the aggregation clause further comprises determining a cardinality of distinct value
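Claims 8 and 9 describe two variations on the combine step: a minimum or maximum aggregation, and a distinct count over a third column that is not part of the sharding criterion. The sketch below is one possible reading, not the patented implementation. It assumes each shard returns a per-group count and minimum for the sharded first column, plus the set of distinct third-column values, and that the coordinator adds the counts, takes the minimum of the minima, and unions the sets. The row layout, dictionary keys, and function names are hypothetical.

```python
from collections import defaultdict

def shard_intermediate(shard_rows):
    # Per-shard work on hypothetical (first, second, third) rows.
    firsts = defaultdict(set)
    thirds = defaultdict(set)
    for first, second, third in shard_rows:
        firsts[second].add(first)
        thirds[second].add(third)
    return {
        group: {
            "first_count": len(firsts[group]),  # safe to add across shards
            "first_min": min(firsts[group]),    # min of minima is the global min
            "third_values": thirds[group],      # must be shipped as values
        }
        for group in firsts
    }

def combine(intermediates):
    # Coordinator-side merge of the per-shard, per-group partial results.
    merged = {}
    for per_group in intermediates:
        for group, part in per_group.items():
            acc = merged.setdefault(
                group, {"count": 0, "min": None, "third": set()}
            )
            acc["count"] += part["first_count"]
            acc["min"] = (
                part["first_min"]
                if acc["min"] is None
                else min(acc["min"], part["first_min"])
            )
            acc["third"] |= part["third_values"]
    return {
        group: {
            "distinct_first": acc["count"],
            "min_first": acc["min"],
            "distinct_third": len(acc["third"]),
        }
        for group, acc in merged.items()
    }

# Two shards, already partitioned on the first column:
# values 1 and 2 live only in shard_a, values 3 and 4 only in shard_b.
shard_a = [(1, "US", "web"), (2, "US", "app"), (1, "DE", "web")]
shard_b = [(3, "US", "web"), (4, "DE", "web"), (3, "US", "app")]

results = combine([shard_intermediate(shard_a), shard_intermediate(shard_b)])
print(results)
```

Because the same third-column value can appear in several shards, adding per-shard counts for that column would overcount (4 instead of 2 for the "US" group in this sample), which is consistent with claim 9 having the intermediate result carry the distinct values themselves.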
Aggregation; Duplicate elimination · CPC title
Tablespace storage structures; Management thereof · CPC title
Presentation of query results · CPC title
Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor · CPC title
Grouping and aggregation · CPC title