Translating between versions of data object schemas for data producers and data consumers
US-2023095852-A1 · Mar 30, 2023 · US
US12411816B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12411816-B2 |
| Application number | US-202217671046-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 14, 2022 |
| Priority date | Feb 14, 2022 |
| Publication date | Sep 9, 2025 |
| Grant date | Sep 9, 2025 |
Official abstract text for this publication.
One embodiment of the present invention sets forth a technique for performing schema-driven data processing. The technique includes detecting a first change to a first producer schema for a first dataset produced by a first data processor. The technique also includes performing a compatibility check between the first change and a first consumer schema associated with processing of the first dataset by a second data processor, wherein the first consumer schema includes a set of fields required by the second data processor. The technique further includes modifying an operation of the second data processor based on a result of the compatibility check.
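As an illustration only, the three steps in the abstract (detect a change, check it against the consumer schema, modify the consumer's operation) might be sketched as follows. The names `SchemaChange`, `diff_schemas`, `is_compatible`, and `handle_change` are hypothetical and do not come from the patent. The sketch also assumes schemas are plain sets of field names, whereas the claims additionally mention field types, nullability, and primary keys.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaChange:
    """Hypothetical record of one change to a producer schema."""
    added: frozenset = frozenset()
    removed: frozenset = frozenset()


def diff_schemas(old_fields, new_fields):
    """Detect a change by comparing two versions of a producer schema."""
    old, new = set(old_fields), set(new_fields)
    return SchemaChange(added=frozenset(new - old), removed=frozenset(old - new))


def is_compatible(change, consumer_required):
    """Assumption: a change is incompatible only when it removes a field
    that the consumer schema lists as required."""
    return not (change.removed & set(consumer_required))


def handle_change(change, consumer_required, consumer_opted_in):
    """Decide how the controller modifies the consumer's operation."""
    if not is_compatible(change, consumer_required):
        return "terminate"   # stop the consumer's processes
    if consumer_opted_in and change.added:
        return "propagate"   # copy new fields into the consumer's own producer schema
    return "no-op"


change = diff_schemas({"id", "title", "rating"}, {"id", "title", "genre"})
print(handle_change(change, {"id", "rating"}, consumer_opted_in=True))
print(handle_change(change, {"id", "title"}, consumer_opted_in=True))
```

In this sketch the opt-in flag stands in for the "metadata associated with the second data processor" that the claims use to decide whether fields are propagated.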
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method, the method comprising:
detecting, by a controller computing device, a first change to a first producer schema for a first dataset produced by a first data processor, wherein the first producer schema comprises a first set of fields corresponding to data produced by the first data processor, wherein the controller computing device manages one or more operations of a first set of data processors, including the first data processor, that are configured to act, at least in part, as data producers, and a second set of data processors, including a second data processor, that are configured to act, at least in part, as data consumers, and wherein the controller computing device terminates execution of at least one process executing on one or more data processors in the second set of data processors in response to detecting one or more schema incompatibilities;
performing, by the controller computing device, a compatibility check between the first change and a first consumer schema associated with processing of the first dataset by the second data processor that is separate from the controller computing device, wherein the first consumer schema comprises a second set of fields required by the second data processor, and wherein at least one field in the second set of fields overlaps with the first set of fields;
when a result of the compatibility check indicates a schema incompatibility between the first change and the first consumer schema, terminating, by the controller computing device, execution of one or more processes executing on the second data processor; and
when the result of the compatibility check indicates that the first change and the first consumer schema are compatible, propagating one or more fields associated with the first change to a second producer schema for the second data processor based on metadata associated with the second data processor.
2. The computer-implemented method of claim 1, further comprising: generating a topic schema for a topic based on one or more versions of the first producer schema for the first dataset; and transmitting the topic schema to the second data processor, wherein the topic schema is used by the second data processor to read one or more messages written to the topic by the first data processor.
3. The computer-implemented method of claim 2, wherein the generating the topic schema comprises: specifying, within the topic schema, that a first field is required when the first field is required in each of the one or more versions of the first producer schema; and specifying, within the topic schema, that a second field is optional when the second field is not required in at least one version of the first producer schema.
4. The computer-implemented method of claim 1, further comprising: determining that a second change to the second producer schema for a second dataset is to be propagated to a third data processor that consumes the second dataset; and propagating the second change to a third producer schema for a third dataset produced by the third data processor.
5. The computer-implemented method of claim 4, further comprising: deploying the third data processor with the second change propagated to the third producer schema for the third dataset; and based on the deployed third data processor, propagating the second change to a fourth producer schema for a fourth dataset produced by a fourth processor that consumes the third dataset.
6. The computer-implemented method of claim 4, wherein the determining that the second change is to be propagated to the third data processor comprises: determining that the third data processor consumes the second dataset based on a logical representation of a data pipeline; and determining that the third data processor has opted into schema propagation from the second dataset based on metadata associated with the third data processor.
7. The computer-implemented method of claim 1, further comprising outputting a notification of the schema incompatibility to an entity associated with at least one of the first data processor or the second data processor.
8. The computer-implemented method of claim 1, wherein the first change comprises a removal of a field from the first producer schema.
9. The computer-implemented method of claim 1, wherein the first producer schema and the first consumer schema comprise at least one of a schema name, a schema namespace, a field name, a field type, a field nullability, or a primary key.
10. A non-transitory computer readable medium storing instructions that, when executed by a processor, cause the processor to perform the steps of:
detecting, by a controller computing device, a first change to a first producer schema for a first dataset produced by a first data processor, wherein the first producer schema comprises a first set of fields corresponding to data produced by the first data processor and the first change comprises a removal of a field from the first producer schema, wherein the controller computing device manages one or more operations of a first set of data processors, including the first data processor, that are configured to act, at least in part, as data producers, and a second set of data processors, including a second data processor, that are configured to act, at least in part, as data consumers, and wherein the controller computing device terminates execution of at least one process executing on one or more data processors in the second set of data processors in response to detecting one or more schema incompatibilities;
performing, by the controller computing device, a compatibility check between the first change and a first consumer schema associated with processing of the first dataset by the second data processor that is separate from the controller computing device, wherein the first consumer schema comprises a second set of fields required by the second data processor and wherein at least one field in the second set of fields overlaps with the first set of fields;
when a result of the compatibility check indicates a schema incompatibility between the first change and the first consumer schema terminating, by the controller computing device, execution of one or more processes executing on the second data processor; and
when the result of the compatibility check indicates that the first change and the first consumer schema are compatible, propagating one or more fields associated with the first change to a second producer schema for the second data processor based on metadata associated with the second data processor.
11. The non-transitory computer readable medium of claim 10, wherein the instructions further cause the processor to perform the steps of: determining that a second change to the second producer schema for a second dataset cannot be used with a first topic to which the second dataset is written; and creating a second topic associated with the second dataset, wherein a third data processor that produces the second dataset writes one or more messages that reflect the second change to the second topic.
12. The non-transitory computer readable medium of claim 11, wherein the second change comprises at least one of a change to a field type included in the second producer schema or a change to a primary key in the second producer schema.
13. The non-transitory computer readable medium of claim 10, wherein the instructions further cause the processor to perform the steps of: determining that a second change to the second producer schema for a second dataset is to be propagated to a third data processor that consumes the second dataset; pro
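The topic-schema rule recited in claim 3 (a field is required only when every producer schema version requires it, and optional otherwise) can be illustrated with a short sketch. The function name `build_topic_schema` and the representation of a schema version as a mapping from field name to a required flag are assumptions made for illustration. Treating a field that is absent from a version as "not required" in that version is also an assumption, since the claim text does not spell out that case.

```python
def build_topic_schema(versions):
    """Merge producer schema versions into one topic schema.

    Each version is a dict mapping field name -> True if required.
    A field is required in the topic schema only if it is required in
    every version; a field missing from a version counts as not required
    there (an assumption of this sketch).
    """
    all_fields = set().union(*versions)
    return {
        name: all(version.get(name, False) for version in versions)
        for name in sorted(all_fields)
    }


v1 = {"id": True, "title": True, "rating": False}
v2 = {"id": True, "title": False, "genre": True}
print(build_topic_schema([v1, v2]))
# {'genre': False, 'id': True, 'rating': False, 'title': False}
```

A consumer reading with the merged schema can then handle messages written under either version, because only `id` is guaranteed to be present in both.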
with details for schema evolution support · CPC title
Schema design and management · CPC title
Data stream processing; Continuous queries · CPC title