Intra-shard parallelization of data stream processing using virtual shards

US11550505B1 · US · B1

Patent metadata
Field               Value
Publication number  US-11550505-B1
Application number  US-202017008998-A
Country             US
Kind code           B1
Filing date         Sep 1, 2020
Priority date       Sep 1, 2020
Publication date    Jan 10, 2023
Grant date          Jan 10, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A data stream may include a plurality of records that are ordered, and the plurality of records may be assigned to a processing shard. A first set of virtual shards may be formed, the first set of virtual shards having a first quantity of virtual shards that perform parallel processing operations on behalf of the processing shard. First records of the plurality of records may be processed using the first set of virtual shards. The first quantity of virtual shards may be modified, based at least in part on an observed record age, to a second quantity of virtual shards that perform parallel processing operations on behalf of the processing shard. A second set of virtual shards may be formed having the second quantity of virtual shards. Second records of the plurality of records may be processed using the second set of virtual shards.
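The adjustment described in the abstract can be sketched as a small control step. This is only an illustration under stated assumptions, not the patented implementation: the names `ScalingConfig` and `next_virtual_shard_count` are hypothetical, the step size of one virtual shard per adjustment is an assumption (the claims allow adding or removing "one or more"), and comparing the observed record age against a target, bounded by minimum and maximum parallelization factors, follows the wording of the claims.

```python
from dataclasses import dataclass


@dataclass
class ScalingConfig:
    """Hypothetical settings; field names are not taken from the patent."""
    target_record_age_s: float      # target record age, in seconds
    min_parallelization: int = 1    # minimum parallelization factor
    max_parallelization: int = 10   # maximum parallelization factor


def next_virtual_shard_count(current: int, observed_age_s: float,
                             cfg: ScalingConfig) -> int:
    """Return the second quantity of virtual shards, given the first.

    Assumption: move by one virtual shard per adjustment.
    """
    if observed_age_s > cfg.target_record_age_s:
        proposed = current + 1   # records are aging: add a virtual shard
    elif observed_age_s < cfg.target_record_age_s:
        proposed = current - 1   # ahead of target: remove a virtual shard
    else:
        proposed = current
    # Clamp to the configured minimum and maximum parallelization factors.
    return max(cfg.min_parallelization,
               min(cfg.max_parallelization, proposed))


if __name__ == "__main__":
    cfg = ScalingConfig(target_record_age_s=5.0, max_parallelization=3)
    print(next_virtual_shard_count(2, observed_age_s=9.0, cfg=cfg))
```

A real controller would likely smooth the observed record age over a window before acting on it; that detail is outside what the abstract states.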

First claim

Opening claim text (preview).

What is claimed is:

1. A computing system comprising: one or more processors; and one or more memories having stored therein computing instructions that, upon execution by the one or more processors, cause the computing system to perform operations comprising: receiving a data stream including a plurality of records that are ordered, wherein the plurality of records are assigned to a processing shard; receiving an indication of a target record age; forming a first set of virtual shards, the first set of virtual shards having a first quantity of virtual shards that execute in parallel on behalf of the processing shard; processing first records of the plurality of records using the first set of virtual shards; determining an observed record age associated with the processing shard; modifying, based at least in part on a relationship between the observed record age and the target record age, the first quantity of virtual shards to a second quantity of virtual shards that execute in parallel on behalf of the processing shard; forming a second set of virtual shards, the second set of virtual shards having the second quantity of virtual shards, wherein the second set of virtual shards is formed by adding or removing one or more virtual shards to or from the first set of virtual shards; and processing second records of the plurality of records using the second set of virtual shards.

2. The computing system of claim 1, wherein the modifying of the first quantity of virtual shards to the second quantity of virtual shards comprises increasing the first quantity of virtual shards to the second quantity of virtual shards based at least in part on an increase in the observed record age.

3. The computing system of claim 1, wherein the modifying of the first quantity of virtual shards to the second quantity of virtual shards comprises decreasing the first quantity of virtual shards to the second quantity of virtual shards based at least in part on a decrease in the observed record age.

4. The computing system of claim 1, wherein the operations further comprise receiving an indication of a maximum parallelization factor, wherein the modifying of the first quantity of virtual shards to the second quantity of virtual shards is based in part on the maximum parallelization factor.

5. The computing system of claim 1, wherein the operations further comprise receiving an indication of a minimum parallelization factor, wherein the modifying of the first quantity of virtual shards to the second quantity of virtual shards is based in part on the minimum parallelization factor.

6. A computer-implemented method comprising: receiving a data stream including a plurality of records that are ordered, wherein the plurality of records are assigned to a processing shard; forming a first set of virtual shards, the first set of virtual shards having a first quantity of virtual shards that execute in parallel on behalf of the processing shard; processing first records of the plurality of records using the first set of virtual shards; determining an observed record age associated with the processing shard; modifying, based at least in part on the observed record age, the first quantity of virtual shards to a second quantity of virtual shards that execute in parallel on behalf of the processing shard; forming a second set of virtual shards, the second set of virtual shards having the second quantity of virtual shards, wherein the second set of virtual shards is formed by adding or removing one or more virtual shards to or from the first set of virtual shards; and processing second records of the plurality of records using the second set of virtual shards.

7. The computer-implemented method of claim 6, further comprising receiving an indication of a target record age, wherein the modifying of the first quantity of virtual shards to the second quantity of virtual shards is based at least in part on a relationship between the observed record age and the target record age.

8. The computer-implemented method of claim 6, wherein the modifying of the first quantity of virtual shards to the second quantity of virtual shards comprises increasing the first quantity of virtual shards to the second quantity of virtual shards based at least in part on an increase in the observed record age.

9. The computer-implemented method of claim 6, wherein the modifying of the first quantity of virtual shards to the second quantity of virtual shards comprises decreasing the first quantity of virtual shards to the second quantity of virtual shards based at least in part on a decrease in the observed record age.

10. The computer-implemented method of claim 6, further comprising receiving an indication of a maximum parallelization factor, wherein the modifying of the first quantity of virtual shards to the second quantity of virtual shards is based in part on the maximum parallelization factor.

11. The computer-implemented method of claim 6, further comprising receiving an indication of a minimum parallelization factor, wherein the modifying of the first quantity of virtual shards to the second quantity of virtual shards is based in part on the minimum parallelization factor.

12. The computer-implemented method of claim 6, wherein the plurality of records are ordered based at least in part on a plurality of key values, and wherein each record of the plurality of records is assigned a respective key value of the plurality of key values.

13. The computer-implemented method of claim 12, further comprising maintaining a mapping of the plurality of key values to virtual shards, wherein the mapping indicates, for each key value of the plurality of key values, a respective virtual shard for processing of the key value.

14. One or more non-transitory computer-readable storage media having stored thereon computing instructions that, upon execution by one or more computing devices, cause the one or more computing devices to perform operations comprising: receiving a data stream including a plurality of records that are ordered, wherein the plurality of records are assigned to a processing shard; forming a first set of virtual shards, the first set of virtual shards having a first quantity of virtual shards that execute in parallel on behalf of the processing shard; processing first records of the plurality of records using the first set of virtual shards; determining an observed record age associated with the processing shard; modifying, based at least in part on the observed record age, the first quantity of virtual shards to a second quantity of virtual shards that execute in parallel on behalf of the processing shard; forming a second set of virtual shards, the second set of virtual shards having the second quantity of virtual shards, wherein the second set of virtual shards is formed by adding or removing one or more virtual shards to or from the first set of virtual shards; and processing second records of the plurality of records using the second set of virtual shards.

15. The one or more non-transitory computer-readable storage media of claim 14, further comprising receiving an indication of a target record age, wherein the modifying of the first quantity of virtual shards to the second quantity of virtual shards is based at least in part on a relationship between the observed record age and the target record age.

16. The one or more non-transitory computer-readable storage media of claim 14, wherein the modifying of the first quantity of virtual shards to the second quantity of virtual shards comprises increasing the first quantity of vi…
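Claims 12 and 13 describe records that carry key values and a maintained mapping from each key value to a virtual shard. A minimal sketch of such a mapping follows. It rests on assumptions the claims do not state: the function names are hypothetical, new keys are placed with a CRC32 hash modulo the current quantity of virtual shards, and the purpose shown (keeping records that share a key on one virtual shard so their relative order survives parallel processing) is one plausible reading rather than a quoted requirement.

```python
import zlib
from collections import defaultdict


def assign_virtual_shard(key: str, quantity: int) -> int:
    """Pick a virtual shard index for a key (assumption: stable CRC32 hash)."""
    return zlib.crc32(key.encode("utf-8")) % quantity


def partition(records, quantity):
    """Route (key, payload) records to virtual shards.

    Returns the key-to-virtual-shard mapping and one queue per virtual
    shard that received records. Records sharing a key always land in
    the same queue, in arrival order.
    """
    mapping = {}
    queues = defaultdict(list)
    for key, payload in records:
        if key not in mapping:
            mapping[key] = assign_virtual_shard(key, quantity)
        queues[mapping[key]].append((key, payload))
    return mapping, dict(queues)


if __name__ == "__main__":
    stream = [("a", 1), ("b", 2), ("a", 3), ("b", 4)]
    mapping, queues = partition(stream, quantity=4)
    print(mapping)
    print(queues)
```

When the quantity of virtual shards changes, an explicit mapping like this allows existing keys to stay on their assigned virtual shard until their in-flight records drain. How the patent handles that transition is not shown in the claim preview above.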

Assignees

Amazon Tech Inc

Inventors

Classifications

  • Improving or facilitating administration, e.g. storage management · CPC title

  • using a plurality of independent parallel functional units · CPC title

  • G06F3/0659 (Primary)

    Command handling arrangements, e.g. command buffers, queues, command scheduling · CPC title

  • by program, e.g. task dispatcher, supervisor, operating system · CPC title

  • Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11550505B1 cover?
A data stream may include a plurality of records that are ordered, and the plurality of records may be assigned to a processing shard. A first set of virtual shards may be formed, the first set of virtual shards having a first quantity of virtual shards that perform parallel processing operations on behalf of the processing shard. First records of the plurality of records may be processed using…
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G06F3/0659. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 10, 2023 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).