Aggregation operations in a distributed database

US12591579B2 · US · B2

Patent metadata
Publication number: US-12591579-B2
Application number: US-202418754315-A
Country: US
Kind code: B2
Filing date: Jun 26, 2024
Priority date: Mar 26, 2021
Publication date: Mar 31, 2026
Grant date: Mar 31, 2026

Abstract

Official abstract text for this publication.

A distributed database that includes multiple database instances receives a data-query that includes an aggregation clause on a first column of a table. The table is partitioned into shards according to a sharding criterion based on the first column such that all rows having the same value for the first column are included in the same shard. The shards are distributed to the multiple database instances. Respective intermediate results are received from at least some of the database instances. Each intermediate result received from a respective database instance that includes a respective shard aggregates values of the first column in the respective shard. The respective intermediate results are combined to obtain a final result of the data-query. The final result is then output.
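
The abstract's key property is that rows with the same value in the first column always land in the same shard, so no value can be counted by two instances. The following is a minimal sketch of that idea for a distinct count, not the patented implementation. The names `shard_of`, `partition`, `local_distinct_count`, and `NUM_SHARDS` are hypothetical, and the hash-based routing is an assumption; the abstract only requires that equal values share a shard.

```python
from collections import defaultdict
from zlib import crc32

NUM_SHARDS = 4  # hypothetical number of shards / database instances


def shard_of(value: str) -> int:
    """Assumed sharding criterion: a deterministic hash of the first column."""
    return crc32(value.encode("utf-8")) % NUM_SHARDS


def partition(rows, column):
    """Route each row to a shard; equal column values always share a shard."""
    shards = defaultdict(list)
    for row in rows:
        shards[shard_of(row[column])].append(row)
    return shards


def local_distinct_count(shard_rows, column):
    """Intermediate result computed by one instance over its own shard."""
    return len({row[column] for row in shard_rows})


def distinct_count(rows, column):
    """Combine intermediate results; a plain sum is exact because the
    per-shard value sets cannot overlap."""
    shards = partition(rows, column)
    intermediate = [local_distinct_count(s, column) for s in shards.values()]
    return sum(intermediate)


rows = [
    {"user_id": "u1", "region": "east"},
    {"user_id": "u2", "region": "east"},
    {"user_id": "u1", "region": "west"},
    {"user_id": "u3", "region": "west"},
    {"user_id": "u2", "region": "east"},
]

print(distinct_count(rows, "user_id"))  # 3
```

If the table were sharded on some other column, the same value could appear in several shards and the sum would overcount. In that case the combiner would have to merge the value sets themselves rather than add their sizes.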

First claim

Opening claim text (preview).

What is claimed is:
1. A method, comprising: receiving, at a distributed database that includes multiple database instances, a data-query including an aggregation clause on a first column of a table that is partitioned into shards according to a sharding criterion based on the first column such that all rows having a same value for the first column are included in a same shard, wherein the shards are distributed to the multiple database instances; receiving, from at least some of the database instances, respective intermediate results, wherein each respective intermediate result received from a respective database instance that includes a respective shard is an aggregation of values of the first column in the respective shard; combining the respective intermediate results to obtain a combined result of the data-query; and outputting the combined result.
2. The method of claim 1, wherein combining the respective intermediate results to obtain the combined result of the data-query comprises: obtaining the combined result by using a union operation to remove duplicate values from the respective intermediate results.
3. The method of claim 1, wherein the aggregation clause includes a distinct count of the first column.
4. The method of claim 3, wherein the combined result includes a count of distinct values of the first column.
5. The method of claim 1, wherein the sharding criterion includes at least one column of the table.
6. The method of claim 1, wherein the data-query includes a sampling clause.
7. The method of claim 1, wherein the combined result includes a sum of aggregated values from the respective intermediate results.
8. The method of claim 1, wherein the data-query includes a grouping clause on a second column of the table.
9. The method of claim 8, wherein the respective intermediate results are aggregated based on the values of the second column.
10. The method of claim 9, further comprising: calculating a distinct count of the first column for each group of the second column.
11. The method of claim 1, wherein the respective intermediate results include minimum values of the first column for each shard.
12. The method of claim 1, wherein the respective intermediate results include maximum values of the first column for each shard.
13. A system, comprising: one or more memories; and one or more processors, the one or more processors configured to execute instructions stored in the one or more memories to: receive, at a distributed database that includes multiple database instances, a data-query including an aggregation clause on a first column of a table that is partitioned into shards according to a sharding criterion based on the first column such that all rows having a same value for the first column are included in a same shard, wherein the shards are distributed to the multiple database instances; receive, from at least some of the database instances, respective intermediate results, wherein each respective intermediate result received from a respective database instance that includes a respective shard is an aggregation of values of the first column in the respective shard; combine the respective intermediate results to obtain a combined result of the data-query; and output the combined result.
14. The system of claim 13, wherein the instructions to combine the respective intermediate results to obtain the combined result of the data-query comprise instructions to: obtain the combined result by using a union operation to remove duplicate values from the respective intermediate results.
15. The system of claim 13, wherein the one or more processors configured to execute instructions stored in the one or more memories to: calculate a distinct count of the first column for each group of a second column.
16. The system of claim 13, wherein the respective intermediate results include minimum values of the first column for each shard or maximum values of the first column for the each shard.
17. One or more non-transitory computer readable media storing instructions operable to cause one or more processors to perform operations comprising: receiving, at a distributed database that includes multiple database instances, a data-query including an aggregation clause on a first column of a table that is partitioned into shards according to a sharding criterion based on the first column such that all rows having a same value for the first column are included in a same shard, wherein the shards are distributed to the multiple database instances; receiving, from at least some of the database instances, respective intermediate results, wherein each respective intermediate result received from a respective database instance that includes a respective shard is an aggregation of values of the first column in the respective shard; combining the respective intermediate results to obtain a combined result of the data-query; and outputting the combined result.
18. The one or more non-transitory computer readable media of claim 17, wherein combining the respective intermediate results to obtain the combined result of the data-query comprises: obtaining the combined result by using a union operation to remove duplicate values from the respective intermediate results.
19. The one or more non-transitory computer readable media of claim 17, wherein the operations further comprise: calculating a distinct count of the first column for each group of a second column.
20. The one or more non-transitory computer readable media of claim 17, wherein the respective intermediate results include minimum values of the first column for each shard or maximum values of the first column for the each shard.
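
The dependent claims name several ways of combining intermediate results: a union that removes duplicates, a sum, a per-group distinct count, and per-shard minimum and maximum values. The sketch below illustrates one plausible reading of each combiner on two hand-built shards. All function names and the sample data are hypothetical and are not taken from the patent; the shards are assumed to be partitioned on `user_id`, so no `user_id` appears in both.

```python
from collections import Counter

# Two hypothetical shards of (user_id, region) rows, sharded on user_id.
shard_a = [("u1", "east"), ("u1", "west"), ("u2", "east")]
shard_b = [("u3", "east"), ("u3", "east"), ("u4", "west")]


def local_values(shard):
    """Per-shard set of first-column values."""
    return {user for user, _ in shard}


def local_grouped_distinct(shard):
    """Per-shard distinct count of user_id for each region (second column)."""
    groups = {}
    for user, region in shard:
        groups.setdefault(region, set()).add(user)
    return {region: len(users) for region, users in groups.items()}


def local_min_max(shard):
    """Per-shard minimum and maximum of the first column."""
    users = [user for user, _ in shard]
    return min(users), max(users)


def combine_union(value_sets):
    """Union combiner: merging sets drops any duplicate values."""
    combined = set()
    for values in value_sets:
        combined |= values
    return combined


def combine_grouped(group_counts):
    """Sum combiner per group; exact here because shards share no user_id."""
    total = Counter()
    for counts in group_counts:
        total.update(counts)  # Counter.update adds counts key by key
    return dict(total)


def combine_min_max(ranges):
    """Global min is the min of shard mins; likewise for max."""
    mins, maxs = zip(*ranges)
    return min(mins), max(maxs)


shards = [shard_a, shard_b]
print(sorted(combine_union(local_values(s) for s in shards)))
print(combine_grouped(local_grouped_distinct(s) for s in shards))
print(combine_min_max([local_min_max(s) for s in shards]))
```

Summing per-group counts is only safe under the sharding assumption stated above. The union combiner stays correct even when shards overlap, at the cost of shipping value sets instead of single numbers.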

Assignees

Thoughtspot Inc

Inventors

Classifications

  • Presentation of query results

  • Tablespace storage structures; Management thereof

  • Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

  • Distributed queries

  • Aggregation; Duplicate elimination

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Thoughtspot Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/24556. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 31, 2026 (kind code B2). Legal status and post-grant events are not shown on this page.