Schema-driven distributed data processing

US12411816B2 · US · B2

Patent metadata
Field | Value
Publication number | US-12411816-B2
Application number | US-202217671046-A
Country | US
Kind code | B2
Filing date | Feb 14, 2022
Priority date | Feb 14, 2022
Publication date | Sep 9, 2025
Grant date | Sep 9, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

One embodiment of the present invention sets forth a technique for performing schema-driven data processing. The technique includes detecting a first change to a first producer schema for a first dataset produced by a first data processor. The technique also includes performing a compatibility check between the first change and a first consumer schema associated with processing of the first dataset by a second data processor, wherein the first consumer schema includes a set of fields required by the second data processor. The technique further includes modifying an operation of the second data processor based on a result of the compatibility check.
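
The flow in the abstract (detect a producer schema change, check it against the fields a consumer requires, then act on the result) can be sketched as follows. This is an illustrative sketch only, not the patented implementation: `SchemaChange`, `detect_change`, `check_compatibility`, `decide`, and `Action` are hypothetical names, schemas are reduced to plain sets of field names, and termination is used as the "modified operation" because that is the response named in claim 1.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    """Hypothetical outcomes the controller can apply to a consumer."""
    CONTINUE = "continue"
    TERMINATE = "terminate"


@dataclass(frozen=True)
class SchemaChange:
    """Hypothetical record of fields added to or removed from a producer schema."""
    added: frozenset[str] = frozenset()
    removed: frozenset[str] = frozenset()


def detect_change(old_fields: set[str], new_fields: set[str]) -> SchemaChange:
    """Diff two versions of a producer schema (assumed here to be sets of field names)."""
    return SchemaChange(
        added=frozenset(new_fields - old_fields),
        removed=frozenset(old_fields - new_fields),
    )


def check_compatibility(change: SchemaChange, consumer_required: set[str]) -> set[str]:
    """Return the consumer's required fields that the change removes.

    An empty result means the change is compatible with the consumer schema.
    """
    return set(change.removed) & consumer_required


def decide(change: SchemaChange, consumer_required: set[str]) -> Action:
    """Choose how to modify the consumer's operation based on the check."""
    broken = check_compatibility(change, consumer_required)
    return Action.TERMINATE if broken else Action.CONTINUE


if __name__ == "__main__":
    change = detect_change({"id", "title", "genre"}, {"id", "title", "rating"})
    print(decide(change, {"id", "genre"}))   # consumer needs the removed field
    print(decide(change, {"id", "title"}))   # consumer is unaffected
```

Only field removal is treated as breaking in this sketch; a fuller check would also compare field types, nullability, and primary keys, which claim 9 lists as schema attributes.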

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method, the method comprising: detecting, by a controller computing device, a first change to a first producer schema for a first dataset produced by a first data processor, wherein the first producer schema comprises a first set of fields corresponding to data produced by the first data processor, wherein the controller computing device manages one or more operations of a first set of data processors, including the first data processor, that are configured to act, at least in part, as data producers, and a second set of data processors, including a second data processor, that are configured to act, at least in part, as data consumers, and wherein the controller computing device terminates execution of at least one process executing on one or more data processors in the second set of data processors in response to detecting one or more schema incompatibilities; performing, by the controller computing device, a compatibility check between the first change and a first consumer schema associated with processing of the first dataset by the second data processor that is separate from the controller computing device, wherein the first consumer schema comprises a second set of fields required by the second data processor, and wherein at least one field in the second set of fields overlaps with the first set of fields; when a result of the compatibility check indicates a schema incompatibility between the first change and the first consumer schema, terminating, by the controller computing device, execution of one or more processes executing on the second data processor; and when the result of the compatibility check indicates that the first change and the first consumer schema are compatible, propagating one or more fields associated with the first change to a second producer schema for the second data processor based on metadata associated with the second data processor.

2. The computer-implemented method of claim 1, further comprising: generating a topic schema for a topic based on one or more versions of the first producer schema for the first dataset; and transmitting the topic schema to the second data processor, wherein the topic schema is used by the second data processor to read one or more messages written to the topic by the first data processor.

3. The computer-implemented method of claim 2, wherein the generating the topic schema comprises: specifying, within the topic schema, that a first field is required when the first field is required in each of the one or more versions of the first producer schema; and specifying, within the topic schema, that a second field is optional when the second field is not required in at least one version of the first producer schema.

4. The computer-implemented method of claim 1, further comprising: determining that a second change to the second producer schema for a second dataset is to be propagated to a third data processor that consumes the second dataset; and propagating the second change to a third producer schema for a third dataset produced by the third data processor.

5. The computer-implemented method of claim 4, further comprising: deploying the third data processor with the second change propagated to the third producer schema for the third dataset; and based on the deployed third data processor, propagating the second change to a fourth producer schema for a fourth dataset produced by a fourth processor that consumes the third dataset.

6. The computer-implemented method of claim 4, wherein the determining that the second change is to be propagated to the third data processor comprises: determining that the third data processor consumes the second dataset based on a logical representation of a data pipeline; and determining that the third data processor has opted into schema propagation from the second dataset based on metadata associated with the third data processor.

7. The computer-implemented method of claim 1, further comprising outputting a notification of the schema incompatibility to an entity associated with at least one of the first data processor or the second data processor.

8. The computer-implemented method of claim 1, wherein the first change comprises a removal of a field from the first producer schema.

9. The computer-implemented method of claim 1, wherein the first producer schema and the first consumer schema comprise at least one of a schema name, a schema namespace, a field name, a field type, a field nullability, or a primary key.

10. A non-transitory computer readable medium storing instructions that, when executed by a processor, cause the processor to perform the steps of: detecting, by a controller computing device, a first change to a first producer schema for a first dataset produced by a first data processor, wherein the first producer schema comprises a first set of fields corresponding to data produced by the first data processor and the first change comprises a removal of a field from the first producer schema, wherein the controller computing device manages one or more operations of a first set of data processors, including the first data processor, that are configured to act, at least in part, as data producers, and a second set of data processors, including a second data processor, that are configured to act, at least in part, as data consumers, and wherein the controller computing device terminates execution of at least one process executing on one or more data processors in the second set of data processors in response to detecting one or more schema incompatibilities; performing, by the controller computing device, a compatibility check between the first change and a first consumer schema associated with processing of the first dataset by the second data processor that is separate from the controller computing device, wherein the first consumer schema comprises a second set of fields required by the second data processor and wherein at least one field in the second set of fields overlaps with the first set of fields; when a result of the compatibility check indicates a schema incompatibility between the first change and the first consumer schema terminating, by the controller computing device, execution of one or more processes executing on the second data processor; and when the result of the compatibility check indicates that the first change and the first consumer schema are compatible, propagating one or more fields associated with the first change to a second producer schema for the second data processor based on metadata associated with the second data processor.

11. The non-transitory computer readable medium of claim 10, wherein the instructions further cause the processor to perform the steps of: determining that a second change to the second producer schema for a second dataset cannot be used with a first topic to which the second dataset is written; and creating a second topic associated with the second dataset, wherein a third data processor that produces the second dataset writes one or more messages that reflect the second change to the second topic.

12. The non-transitory computer readable medium of claim 11, wherein the second change comprises at least one of a change to a field type included in the second producer schema or a change to a primary key in the second producer schema.

13. The non-transitory computer readable medium of claim 10, wherein the instructions further cause the processor to perform the steps of: determining that a second change to the second producer schema for a second dataset is to be propagated to a third data processor that consumes the second dataset; pro
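
Two of the rules in the claims above are small enough to illustrate in code: the required/optional merge used to build a topic schema (claim 3) and the decision to create a new topic (claims 11 and 12). The sketch below is a reading of the claim language, not the patented implementation. `build_topic_schema`, `needs_new_topic`, and `Schema` are hypothetical names, each schema version is assumed to be a mapping from field name to a "required" flag, and a field missing from a version is assumed to count as not required in that version.

```python
from dataclasses import dataclass


def build_topic_schema(versions: list[dict[str, bool]]) -> dict[str, bool]:
    """Merge producer schema versions into one topic schema.

    Each version maps field name -> required flag. A field is required in the
    topic schema only if it is required in every version; otherwise it is
    optional. Fields absent from a version are assumed not required there.
    """
    all_fields = set().union(*versions)
    return {
        name: all(version.get(name, False) for version in versions)
        for name in sorted(all_fields)
    }


@dataclass
class Schema:
    """Hypothetical schema shape: field types plus a primary key."""
    field_types: dict[str, str]
    primary_key: tuple[str, ...]


def needs_new_topic(old: Schema, new: Schema) -> bool:
    """Return True when the change cannot reuse the existing topic.

    Follows the two cases named in claim 12: a changed primary key or a
    changed type on a field present in both versions.
    """
    if old.primary_key != new.primary_key:
        return True
    shared = old.field_types.keys() & new.field_types.keys()
    return any(old.field_types[name] != new.field_types[name] for name in shared)


if __name__ == "__main__":
    versions = [
        {"id": True, "title": True},
        {"id": True, "title": False, "rating": True},
    ]
    print(build_topic_schema(versions))

    old = Schema({"id": "long", "title": "string"}, ("id",))
    retyped = Schema({"id": "string", "title": "string"}, ("id",))
    print(needs_new_topic(old, retyped))
```

Adding a field leaves `needs_new_topic` false in this sketch, which matches the claims' split between changes that propagate on the existing topic and changes that require a second topic.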

Assignees

Netflix Inc

Inventors

Classifications

  • with details for schema evolution support

  • G06F16/211 (Primary)

    Schema design and management

  • Data stream processing; Continuous queries

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12411816B2 cover?
It covers a technique for schema-driven data processing: detecting a change to a producer schema for a dataset produced by one data processor, checking that change for compatibility against the consumer schema of a second data processor that processes the dataset, and modifying the operation of the second data processor based on the result.
Who is the assignee on this patent?
Netflix Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/211. Mapped technology areas include Physics.
When was this patent published?
Publication date Sep 9, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 10 related publications on this page (citations in our corpus or others sharing the same primary CPC).