Aggregation operations in a distributed database

US11720570B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11720570-B2
Application number: US-202117214247-A
Country: US
Kind code: B2
Filing date: Mar 26, 2021
Priority date: Mar 26, 2021
Publication date: Aug 8, 2023
Grant date: Aug 8, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Querying a distributed database including a table sharded into shards distributed to database instances includes receiving a data-query that includes an aggregation clause on a first column and a grouping clause on a second column; obtaining and outputting results data. Obtaining the results data includes receiving, by a query coordinator, intermediate results data; and combining, by the query coordinator, the intermediate results to obtain the results data. Receiving the intermediate results data includes receiving, from a first database instance, first aggregation values indicating, on a per-group basis in accordance with the grouping clause, a respective aggregation value of distinct values of the first column in accordance with the aggregation clause, and receiving, from a second database instance, second aggregation values indicating, on a per-group basis in accordance with the grouping clause, a respective aggregation value of distinct values of the first column in accordance with the aggregation clause.
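One way to read the abstract: if the table is sharded on the aggregated (first) column, every row with a given first-column value lands in the same shard, so the distinct values seen by different database instances do not overlap. Under that reading, each instance can compute its own per-group distinct counts and the query coordinator can simply add them. The Python sketch below illustrates this under stated assumptions: in-memory rows, a toy byte-sum sharding function, and the hypothetical names `shard_of`, `local_distinct_counts`, and `combine_counts`, none of which come from the patent.

```python
from collections import Counter, defaultdict


def shard_of(first_value, num_shards):
    """Hypothetical sharding criterion: route a row by its first-column value."""
    return sum(str(first_value).encode()) % num_shards


def local_distinct_counts(rows):
    """Runs on one database instance: COUNT(DISTINCT first) GROUP BY second."""
    seen = defaultdict(set)
    for first, second in rows:
        seen[second].add(first)
    return {group: len(values) for group, values in seen.items()}


def combine_counts(intermediate_results):
    """Runs on the query coordinator: add the per-group counts from each shard.

    Adding is only valid because no first-column value appears in two shards.
    """
    total = Counter()
    for partial in intermediate_results:
        total.update(partial)  # Counter.update adds counts key by key
    return dict(total)


# Rows are (first column, second column), e.g. (user id, country).
rows = [
    ("u1", "US"), ("u1", "US"), ("u2", "US"),
    ("u3", "DE"), ("u1", "DE"), ("u4", "US"),
]

NUM_SHARDS = 2
shards = [[] for _ in range(NUM_SHARDS)]
for row in rows:
    shards[shard_of(row[0], NUM_SHARDS)].append(row)

result = combine_counts(local_distinct_counts(shard) for shard in shards)
print(result)
```

The coordinator never sees individual first-column values here, only one number per group per shard, which is what keeps the intermediate results small.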

First claim

Opening claim text (preview).

What is claimed is:

1. A method for querying a distributed database, comprising: receiving a data-query at the distributed database, wherein the distributed database comprises a table, the table comprises a first column and a second column, the table is partitioned into shards according to a sharding criterion, wherein the sharding criterion indicates the first column, such that a first shard includes one or more rows of the table having a first value of the first column and a second shard omits a row of the table having the first value of the first column, the shards are distributed to database instances of the distributed database, and the data-query comprises an aggregation clause on the first column and a grouping clause on the second column; identifying a query coordinator for processing the data-query; obtaining results data responsive to the data-query, wherein obtaining the results data comprises: receiving, by the query coordinator and from at least a subset of the database instances of the distributed database, intermediate results data responsive to at least a portion of the data-query, wherein receiving the intermediate results data includes: receiving, from a first database instance for a first shard, first aggregation values indicating, on a per-group basis in accordance with the grouping clause, a respective aggregation value of distinct values of the first column in accordance with the aggregation clause, and receiving, from a second database instance for a second shard, second aggregation values indicating, on a per-group basis in accordance with the grouping clause, a respective aggregation value of distinct values of the first column in accordance with the aggregation clause; and combining, by the query coordinator, the intermediate results to obtain the results data; and outputting the results data.

2. The method of claim 1, wherein the aggregation clause comprises determining a cardinality of distinct values of the first column.

3. The method of claim 2, wherein combining, by the query coordinator, the intermediate results to obtain the results data comprises: querying at least some of the shards for respective shard-specific distinct values of the first column grouped by shard-specific values of the second column; and for each shard-specific value of the second column, obtaining a respective count of the respective shard-specific distinct values.

4. The method of claim 1, further comprising: receiving, at a low-latency data analysis system that includes the distributed database, data expressing a usage intent, in response to user input associated with a user, wherein receiving the data-query comprises: obtaining the data-query in response to receiving the data expressing the usage intent; and wherein outputting the results data comprises: outputting at least a portion of the results data for presentation to the user.

5. The method of claim 1, wherein the sharding criterion consists of the first column of the table.

6. The method of claim 1, wherein the sharding criterion comprises the first column of the table and the second column of the table.

7. The method of claim 6, wherein the sharding criterion comprises sharding on the first column followed by sharding on the second column.

8. The method of claim 1, wherein the aggregation clause comprises at least one of a minimum clause or a maximum clause.

9. The method of claim 1, wherein the aggregation clause further comprises determining a cardinality of distinct values of a third column of the table, wherein the sharding criterion does not include the third column of the table, and wherein an intermediate result of the intermediate results further comprises distinct values of the third column for the each value of the second column.

10. The method of claim 1, wherein the data-query further comprises a sampling clause.

11. A device for querying a distributed database, comprising: a memory; and a processor, the processor configured to execute instructions stored in the memory to: receive a data-query at the distributed database, wherein the distributed database comprises a table, the table comprises a first column and a second column, the table is partitioned into shards according to a sharding criterion, wherein the sharding criterion indicates the first column, such that a first shard includes one or more rows of the table having a first value of the first column and a second shard omits a row of the table having the first value of the first column, the shards are distributed to database instances of the distributed database, and the data-query comprises an aggregation clause on the first column and a grouping clause on the second column; identify a query coordinator for processing the data-query; obtain results data responsive to the data-query, wherein to obtain the results data comprises to: receive, by the query coordinator and from at least a subset of the database instances of the distributed database, intermediate results data responsive to at least a portion of the data-query, wherein to receive the intermediate results data includes to: receive, from a first database instance for a first shard, first aggregation values indicating, on a per-group basis in accordance with the grouping clause, a respective aggregation value of distinct values of the first column in accordance with the aggregation clause, and receive, from a second database instance for a second shard, second aggregation values indicating, on a per-group basis in accordance with the grouping clause, a respective aggregation value of distinct values of the first column in accordance with the aggregation clause; and combine, by the query coordinator, the intermediate results to obtain the results data; and output the results data.

12. The device of claim 11, wherein the aggregation clause comprises determining a cardinality of distinct values of the first column.

13. The device of claim 12, wherein to combine, by the query coordinator, the intermediate results to obtain the results data comprises to: query at least some of the shards for respective shard-specific distinct values of the first column grouped by shard-specific values of the second column; and for each shard-specific value of the second column, obtain a respective count of the respective shard-specific distinct values.

14. The device of claim 11, wherein the processor is further configured to execute instructions to: receive, at a low-latency data analysis system that includes the distributed database, data expressing a usage intent, in response to user input associated with a user, wherein to receive the data-query comprises to: obtain the data-query in response to receiving the data expressing the usage intent; and wherein to output the results data comprises to: output at least a portion of the results data for presentation to the user.

15. The device of claim 11, wherein the sharding criterion consists of the first column of the table.

16. The device of claim 11, wherein the sharding criterion comprises the first column of the table and the second column of the table.

17. The device of claim 16, wherein the sharding criterion comprises sharding on the first column followed by sharding on the second column.

18. The device of claim 11, wherein the aggregation clause comprises at least one of a minimum clause or a maximum clause.

19. The device of claim 11, wherein the aggregation clause further comprises determining a cardinality of distinct value
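Claims 8 and 9 describe two further cases that combine differently at the coordinator. Minimum and maximum values merge by taking the min of the mins and the max of the maxes. A distinct count on a third column that is not part of the sharding criterion cannot be summed, because the same third-column value may appear in several shards; the claim instead has each intermediate result carry the distinct values themselves. The Python sketch below is one possible illustration of both cases, assuming in-memory rows of the form (first, second, third); all function names are hypothetical and not taken from the patent.

```python
from collections import defaultdict


def local_min_max(rows):
    """Per instance: (MIN, MAX) of the first column for each second-column group."""
    out = {}
    for first, second, _third in rows:
        lo, hi = out.get(second, (first, first))
        out[second] = (min(lo, first), max(hi, first))
    return out


def local_distinct_values(rows):
    """Per instance: the set of distinct third-column values for each group."""
    out = defaultdict(set)
    for _first, second, third in rows:
        out[second].add(third)
    return dict(out)


def combine_min_max(partials):
    """Coordinator: min of the per-shard mins, max of the per-shard maxes."""
    merged = {}
    for partial in partials:
        for group, (lo, hi) in partial.items():
            cur_lo, cur_hi = merged.get(group, (lo, hi))
            merged[group] = (min(cur_lo, lo), max(cur_hi, hi))
    return merged


def combine_distinct_counts(partials):
    """Coordinator: union the per-shard value sets, then count each group."""
    union = defaultdict(set)
    for partial in partials:
        for group, values in partial.items():
            union[group] |= values
    return {group: len(values) for group, values in union.items()}


# Two shards split on the first column (values 1-2 versus 3-4).
shard_a = [(1, "x", "red"), (2, "x", "blue"), (2, "y", "red")]
shard_b = [(3, "x", "red"), (4, "y", "green")]

min_max = combine_min_max([local_min_max(shard_a), local_min_max(shard_b)])
third_counts = combine_distinct_counts(
    [local_distinct_values(shard_a), local_distinct_values(shard_b)]
)
print(min_max)
print(third_counts)
```

In this data, "red" occurs in group "x" on both shards. Adding the per-shard counts for that group (2 + 1) would overcount, while the set union gives the exact figure, which is why the non-sharded column needs the heavier intermediate result.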

Assignees

Thoughtspot Inc

Inventors

Classifications

  • Aggregation; Duplicate elimination · CPC title

  • Tablespace storage structures; Management thereof · CPC title

  • Presentation of query results · CPC title

  • Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor · CPC title

  • G06F16/244 · Primary

    Grouping and aggregation · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11720570B2 cover?
Querying a distributed database including a table sharded into shards distributed to database instances includes receiving a data-query that includes an aggregation clause on a first column and a grouping clause on a second column; obtaining and outputting results data. Obtaining the results data includes receiving, by a query coordinator, intermediate results data; and combining, by the query …
Who is the assignee on this patent?
Thoughtspot Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/24556. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 8, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).