What technology area does this patent fall under?

Primary CPC classification G06F16/24556. Mapped technology areas include Physics.

When was this patent published?

Publication date Aug 8, 2023 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Aggregation operations in a distributed database

US11720570B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11720570-B2
Application number	US-202117214247-A
Country	US
Kind code	B2
Filing date	Mar 26, 2021
Priority date	Mar 26, 2021
Publication date	Aug 8, 2023
Grant date	Aug 8, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection. Read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Querying a distributed database including a table sharded into shards distributed to database instances includes receiving a data-query that includes an aggregation clause on a first column and a grouping clause on a second column; obtaining and outputting results data. Obtaining the results data includes receiving, by a query coordinator, intermediate results data; and combining, by the query coordinator, the intermediate results to obtain the results data. Receiving the intermediate results data includes receiving, from a first database instance, first aggregation values indicating, on a per-group basis in accordance with the grouping clause, a respective aggregation value of distinct values of the first column in accordance with the aggregation clause, and receiving, from a second database instance, second aggregation values indicating, on a per-group basis in accordance with the grouping clause, a respective aggregation value of distinct values of the first column in accordance with the aggregation clause.
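The coordinator/instance split described in the abstract can be illustrated with a small sketch. It is not the patented implementation. It assumes a count-distinct aggregation (as in claim 2), a table sharded on the aggregated column, and a hypothetical modulo sharding rule. The function names `partition`, `local_distinct_counts`, and `combine` are invented for illustration. Under these assumptions, a given first-column value lives on exactly one shard, so the per-group distinct counts returned by each instance can simply be added by the coordinator.

```python
from collections import defaultdict

def partition(rows, num_shards):
    """Shard rows on the first column (hypothetical modulo rule), so that
    every row with a given first-column value lands on the same shard."""
    shards = [[] for _ in range(num_shards)]
    for first, second in rows:
        shards[first % num_shards].append((first, second))
    return shards

def local_distinct_counts(shard):
    """Database-instance step: COUNT(DISTINCT first) GROUP BY second,
    computed over one shard only."""
    seen = defaultdict(set)
    for first, second in shard:
        seen[second].add(first)
    return {group: len(values) for group, values in seen.items()}

def combine(intermediate_results):
    """Query-coordinator step: add the per-group counts. This is only valid
    because no first-column value appears on more than one shard."""
    totals = defaultdict(int)
    for partial in intermediate_results:
        for group, count in partial.items():
            totals[group] += count
    return dict(totals)

# Illustrative rows: (first column, second column).
rows = [(1, "US"), (2, "US"), (1, "US"), (3, "DE"), (4, "US"), (3, "DE"), (2, "DE")]
shards = partition(rows, num_shards=2)
result = combine(local_distinct_counts(shard) for shard in shards)
print(result)
```

With this layout each instance sends back one small number per group rather than the distinct values themselves, which is the point of aligning the sharding criterion with the aggregated column.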

First claim

Opening claim text (preview).

What is claimed is:
1. A method for querying a distributed database, comprising: receiving a data-query at the distributed database, wherein the distributed database comprises a table, the table comprises a first column and a second column, the table is partitioned into shards according to a sharding criterion, wherein the sharding criterion indicates the first column, such that a first shard includes one or more rows of the table having a first value of the first column and a second shard omits a row of the table having the first value of the first column, the shards are distributed to database instances of the distributed database, and the data-query comprises an aggregation clause on the first column and a grouping clause on the second column; identifying a query coordinator for processing the data-query; obtaining results data responsive to the data-query, wherein obtaining the results data comprises: receiving, by the query coordinator and from at least a subset of the database instances of the distributed database, intermediate results data responsive to at least a portion of the data-query, wherein receiving the intermediate results data includes: receiving, from a first database instance for a first shard, first aggregation values indicating, on a per-group basis in accordance with the grouping clause, a respective aggregation value of distinct values of the first column in accordance with the aggregation clause, and receiving, from a second database instance for a second shard, second aggregation values indicating, on a per-group basis in accordance with the grouping clause, a respective aggregation value of distinct values of the first column in accordance with the aggregation clause; and combining, by the query coordinator, the intermediate results to obtain the results data; and outputting the results data.
2. The method of claim 1, wherein the aggregation clause comprises determining a cardinality of distinct values of the first column.
3. The method of claim 2, wherein combining, by the query coordinator, the intermediate results to obtain the results data comprises: querying at least some of the shards for respective shard-specific distinct values of the first column grouped by shard-specific values of the second column; and for each shard-specific value of the second column, obtaining a respective count of the respective shard-specific distinct values.
4. The method of claim 1, further comprising: receiving, at a low-latency data analysis system that includes the distributed database, data expressing a usage intent, in response to user input associated with a user, wherein receiving the data-query comprises: obtaining the data-query in response to receiving the data expressing the usage intent; and wherein outputting the results data comprises: outputting at least a portion of the results data for presentation to the user.
5. The method of claim 1, wherein the sharding criterion consists of the first column of the table.
6. The method of claim 1, wherein the sharding criterion comprises the first column of the table and the second column of the table.
7. The method of claim 6, wherein the sharding criterion comprises sharding on the first column followed by sharding on the second column.
8. The method of claim 1, wherein the aggregation clause comprises at least one of a minimum clause or a maximum clause.
9. The method of claim 1, wherein the aggregation clause further comprises determining a cardinality of distinct values of a third column of the table, wherein the sharding criterion does not include the third column of the table, and wherein an intermediate result of the intermediate results further comprises distinct values of the third column for the each value of the second column.
10. The method of claim 1, wherein the data-query further comprises a sampling clause.
11. A device for querying a distributed database, comprising: a memory; and a processor, the processor configured to execute instructions stored in the memory to: receive a data-query at the distributed database, wherein the distributed database comprises a table, the table comprises a first column and a second column, the table is partitioned into shards according to a sharding criterion, wherein the sharding criterion indicates the first column, such that a first shard includes one or more rows of the table having a first value of the first column and a second shard omits a row of the table having the first value of the first column, the shards are distributed to database instances of the distributed database, and the data-query comprises an aggregation clause on the first column and a grouping clause on the second column; identify a query coordinator for processing the data-query; obtain results data responsive to the data-query, wherein to obtain the results data comprises to: receive, by the query coordinator and from at least a subset of the database instances of the distributed database, intermediate results data responsive to at least a portion of the data-query, wherein to receive the intermediate results data includes to: receive, from a first database instance for a first shard, first aggregation values indicating, on a per-group basis in accordance with the grouping clause, a respective aggregation value of distinct values of the first column in accordance with the aggregation clause, and receive, from a second database instance for a second shard, second aggregation values indicating, on a per-group basis in accordance with the grouping clause, a respective aggregation value of distinct values of the first column in accordance with the aggregation clause; and combine, by the query coordinator, the intermediate results to obtain the results data; and output the results data.
12. The device of claim 11, wherein the aggregation clause comprises determining a cardinality of distinct values of the first column.
13. The device of claim 12, wherein to combine, by the query coordinator, the intermediate results to obtain the results data comprises to: query at least some of the shards for respective shard-specific distinct values of the first column grouped by shard-specific values of the second column; and for each shard-specific value of the second column, obtain a respective count of the respective shard-specific distinct values.
14. The device of claim 11, wherein the processor is further configured to execute instructions to: receive, at a low-latency data analysis system that includes the distributed database, data expressing a usage intent, in response to user input associated with a user, wherein to receive the data-query comprises to: obtain the data-query in response to receiving the data expressing the usage intent; and wherein to output the results data comprises to: output at least a portion of the results data for presentation to the user.
15. The device of claim 11, wherein the sharding criterion consists of the first column of the table.
16. The device of claim 11, wherein the sharding criterion comprises the first column of the table and the second column of the table.
17. The device of claim 16, wherein the sharding criterion comprises sharding on the first column followed by sharding on the second column.
18. The device of claim 11, wherein the aggregation clause comprises at least one of a minimum clause or a maximum clause.
19. The device of claim 11, wherein the aggregation clause further comprises determining a cardinality of distinct value
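
Claims 8 and 9 touch on two cases where the coordinator cannot simply add per-shard counts. The following is a minimal sketch of one way to read them, not the patent's implementation. The names `local_partials` and `combine_partials`, the dictionary layout, and the sample rows are all assumptions made for illustration. For a minimum clause, each instance returns its per-group minimum and the coordinator takes the minimum of those. For a distinct count on a third column that is not part of the sharding criterion, the same value can appear on several shards, so each instance returns the distinct values themselves and the coordinator unions them before counting.

```python
def local_partials(shard):
    """Database-instance step: per group, the minimum of the first column and
    the set of distinct third-column values seen on this shard."""
    partials = {}
    for first, second, third in shard:
        entry = partials.setdefault(
            second, {"min_first": first, "third_values": set()}
        )
        entry["min_first"] = min(entry["min_first"], first)
        entry["third_values"].add(third)
    return partials

def combine_partials(intermediate_results):
    """Query-coordinator step: minimum of minimums, and union of the distinct
    third-column values (they may repeat across shards) before counting."""
    merged = {}
    for partial in intermediate_results:
        for group, entry in partial.items():
            target = merged.setdefault(
                group, {"min_first": entry["min_first"], "third_values": set()}
            )
            target["min_first"] = min(target["min_first"], entry["min_first"])
            target["third_values"] |= entry["third_values"]
    return {
        group: {
            "min_first": entry["min_first"],
            "distinct_third": len(entry["third_values"]),
        }
        for group, entry in merged.items()
    }

# Illustrative shards, already split on the first column (even vs. odd values).
# Rows are (first column, second column, third column).
shard_a = [(2, "US", "web"), (4, "US", "app"), (2, "DE", "web")]
shard_b = [(1, "US", "web"), (3, "DE", "app"), (3, "DE", "web")]

summary = combine_partials([local_partials(shard_a), local_partials(shard_b)])
print(summary)
```

In this sample, adding the per-shard distinct counts of the third column for the "US" group would give 3 (2 from the first shard plus 1 from the second), while the true answer is 2, which is why the sets are shipped and merged instead.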

Assignees

Thoughtspot Inc

Inventors

Classifications

G06F16/24556 (Primary) · Aggregation; Duplicate elimination
G06F16/2282 · Tablespace storage structures; Management thereof
G06F16/248 · Presentation of query results
G06F16/27 · Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
G06F16/244 (Primary) · Grouping and aggregation

Patent family

Related publications grouped by family.

View patent family 80623926

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11720570B2 cover?: Querying a distributed database including a table sharded into shards distributed to database instances includes receiving a data-query that includes an aggregation clause on a first column and a grouping clause on a second column; obtaining and outputting results data. Obtaining the results data includes receiving, by a query coordinator, intermediate results data; and combining, by the query …
Who is the assignee on this patent?: Thoughtspot Inc

Related patents

Approximate unique count

Approximate Unique Count

Aggregation Operations In A Distributed Database

Minimizing processing using an index when non leading columns match an aggregation key

Minimizing processing using an index when non-leading columns match an aggregation key

Driving massive scale out through rewrites of analytical functions

Index Sharding