Data processing method in stream computing system, control node, and stream computing system

US10630737B2 · US · B2

Patent metadata
Publication number: US-10630737-B2
Application number: US-201816112236-A
Country: US
Kind code: B2
Filing date: Aug 24, 2018
Priority date: Mar 6, 2014
Publication date: Apr 21, 2020
Grant date: Apr 21, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A stream computer system and a method for processing a data stream in a stream computing system are disclosed. The method includes a first working node invokes at least one execution unit to process a data stream according to an initial parallelism degree, a control node collects information reflecting data traffic between the first working node and a second working node, and information reflecting data processing speed of the first working node, determines an optimized parallelism degree for the first working node according to the collected information, and adjusts the parallelism degree of the first working node to be consistent with the optimized parallelism degree.
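
The abstract describes a control node that derives an optimized parallelism degree from collected traffic and processing-speed information. The following is a minimal sketch of one way such a calculation could look, not the patented implementation. The function names, the input counters, and the specific formula (enough execution units that tuples are finished as fast as they arrive, that is, the processing time divided by the arrival interval, rounded up) are all assumptions made for illustration. The sketch also measures the arrival interval at the node as a whole rather than per execution unit.

```python
import math


def estimate_times(tuples_received, tuples_processed, busy_seconds, window_seconds):
    """Estimate average per-tuple times over one observation window.

    Hypothetical helper: the counters are assumed to be reported by the
    working node; none of these parameter names come from the patent.
    """
    if tuples_received <= 0 or tuples_processed <= 0:
        raise ValueError("need at least one received and one processed tuple")
    arrival_time = window_seconds / tuples_received    # seconds between arrivals
    processing_time = busy_seconds / tuples_processed  # seconds of work per tuple
    return arrival_time, processing_time


def optimized_parallelism(arrival_time, processing_time, minimum=1):
    """Assumed rule: use enough execution units to keep up with arrivals."""
    if arrival_time <= 0 or processing_time <= 0:
        raise ValueError("times must be positive")
    return max(minimum, math.ceil(processing_time / arrival_time))


if __name__ == "__main__":
    # 40 tuples arrive in a 20 s window; processing them took 70 s of unit time.
    arrival, processing = estimate_times(40, 40, 70.0, 20.0)
    print(optimized_parallelism(arrival, processing))
```

In this sketch a node that receives a tuple every 0.5 s but needs 1.75 s per tuple would be assigned four execution units, while a node whose tuples arrive more slowly than it processes them stays at the minimum of one.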

First claim

Opening claim text (preview).

What is claimed is:

1. A stream computing system, comprising: a control node; and a plurality of working nodes coupled to the control node, wherein a first working node of the plurality of working nodes is configured to invoke at least one execution unit to process a data stream being input to the first working node, wherein a quantity of execution unit being invoked by the first working node to process the data stream is indicated by an initial parallelism degree of the first working node, wherein the control node is configured to: obtain information reflecting input data traffic and data processing speed of the first working node; determine an optimized parallelism degree for the first working node according to the obtained information; and adjust the initial parallelism degree of the first working node to be consistent with the optimized parallelism degree, wherein the data stream comprises a plurality of tuples, and the optimized parallelism degree is determined according to a ratio of a tuple arrival time to a tuple processing time, and wherein the tuple arrival time and the tuple processing time are estimated according to the obtained information.

2. The stream computing system according to claim 1, wherein the control node is further configured to adjust the initial parallelism degree of the first working node by adding at least one execution unit or deleting at least one execution unit for the first working node according to the optimized parallelism degree, and wherein a quantity of execution unit being invoked by the first working node to process the data stream after the adjustment is same as the optimized parallelism degree.

3. The stream computing system according to claim 1, wherein the tuple arrival time reflects an average time interval at which a tuple arrives an execution unit of the first working node, and tuple processing time reflects an average time required by the execution unit to process a tuple.

4. The stream computing system according to claim 2, wherein the control node is further configured to: adjust a data distribution policy of a second working node according to the added or deleted at least one execution unit, wherein the data distribution policy indicates data distribution path; and send the adjusted data distribution policy to the second working node.

5. The stream computing system according to claim 4, wherein the initial parallelism degree of the first working node is configured in a flow graph that represents a data processing logic of the stream computing system.

6. The stream computing system according to claim 5, wherein the second working node is an upstream working node of the first working node according to the flow graph.

7. A method for processing a data stream in a stream computing system, comprising: invoking, by a control node, at least one execution unit to process a data stream being input to a first working node, wherein a quantity of execution unit being invoked by the first working node to process the data stream is indicated by an initial parallelism degree of the first working node, and wherein the data stream comprises a plurality of tuples; obtaining, by the control node, information reflecting input data traffic and data processing speed of the first working node; determining, by the control node, an optimized parallelism degree for the first working node according to a ratio of a tuple arrival time to a tuple processing time, wherein the tuple arrival time and the tuple processing time are estimated according to the obtained information; and adjusting, by the control node, the initial parallelism degree of the first working node to be consistent with the optimized parallelism degree.

8. The method according to claim 7, wherein adjusting the initial parallelism degree of the first working node comprises adding at least one execution unit or deleting at least one execution unit for the first working node according to the optimized parallelism degree, and a quantity of execution units invoked by the first working node to process the data stream after the adjustment is the same as the optimized parallelism degree.

9. The method according to claim 7, wherein the tuple arrival time reflects an average time interval at which a tuple arrives an execution unit of the first working node, and tuple processing time reflects an average time required by the execution unit to process a tuple.

10. The method according to claim 8, further comprising: adjusting a data distribution policy of a second working node according to the added or deleted at least one execution unit, wherein the data distribution policy indicates data distribution path; and sending the adjusted data distribution policy to the second working node.

11. The method according to claim 10, wherein the initial parallelism degree of the first working node is configured in a flow graph that represents a data processing logic of the stream computing system.

12. The method according to claim 11, wherein the second working node is an upstream working node of the first working node according to the flow graph.

13. A non-transitory computer readable medium including instructions, which, when executed by a processor, will cause the processor to perform the following operations: invoking at least one execution unit to process a data stream being input to a first working node; wherein a quantity of execution unit being invoked is indicated by an initial parallelism degree of the first working node, and wherein the data stream comprises a plurality of tuples; obtaining information reflecting input data traffic and data processing speed of the first working node; determining an optimized parallelism degree for the first working node according to a ratio of a tuple arrival time to a tuple processing time, wherein the tuple arrival time and the tuple processing time are estimated according to the obtained information; and adjusting the initial parallelism degree of the first working node to be consistent with the optimized parallelism degree.

14. The non-transitory computer readable medium according to claim 13, wherein the initial parallelism degree of the first working node is adjusted by adding at least one execution unit or deleting at least one execution unit for the first working node according to the optimized parallelism degree, and a quantity of execution units invoked to process the data stream after the adjustment is the same as the optimized parallelism degree.

15. The non-transitory computer readable medium according to claim 13, wherein the tuple arrival time reflects an average time interval at which a tuple arrives an execution unit of the first working node, and tuple processing time reflects an average time required by the execution unit to process a tuple.

16. The non-transitory computer readable medium according to claim 13, wherein the initial parallelism degree of the first working node is configured in a flow graph that represents a data processing logic.
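
The dependent claims describe adding or deleting execution units until their count matches the optimized parallelism degree, then sending an adjusted data distribution policy to the upstream working node. A rough sketch of that adjustment step follows, under stated assumptions: execution units are represented by integer ids, the distribution policy is a simple round-robin target list, and all function names are hypothetical. The claims do not specify any of these details.

```python
def adjust_execution_units(units, target):
    """Return a new list of unit ids whose length equals `target`.

    Assumption: units are plain integer ids; new ids continue from the
    highest existing id, and surplus units are dropped from the end.
    """
    if target < 1:
        raise ValueError("target parallelism must be at least 1")
    adjusted = list(units)
    next_id = max(adjusted, default=-1) + 1
    while len(adjusted) < target:   # add execution units
        adjusted.append(next_id)
        next_id += 1
    del adjusted[target:]           # delete surplus execution units, if any
    return adjusted


def build_distribution_policy(units):
    """Assumed policy format: round-robin over the current unit ids."""
    return {"mode": "round_robin", "targets": list(units)}


def route(policy, sequence_number):
    """Pick the execution unit for the n-th tuple sent by the upstream node."""
    targets = policy["targets"]
    return targets[sequence_number % len(targets)]


if __name__ == "__main__":
    units = adjust_execution_units([0, 1], target=4)
    policy = build_distribution_policy(units)  # would be sent upstream
    print(units, [route(policy, n) for n in range(6)])
```

Rebuilding the policy from the adjusted unit list keeps the upstream node from routing tuples to a deleted unit or ignoring a newly added one, which is the coupling that claims 4 and 10 describe.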

Assignees

Huawei Tech Co Ltd

Inventors

Classifications

  • Allocation of resources per group of connections, e.g. per group of users · CPC title

  • G06F9/5083 (Primary): Techniques for rebalancing the load in a distributed system · CPC title

  • for graphical visualisation of monitoring data · CPC title

  • where at least one of the additional parallel sessions is real time or time sensitive, e.g. white board sharing, collaboration or spawning of a subconference · CPC title

  • G06F9/5005 (Primary): to service a request · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10630737B2 cover?
A stream computer system and a method for processing a data stream in a stream computing system are disclosed. The method includes a first working node invokes at least one execution unit to process a data stream according to an initial parallelism degree, a control node collects information reflecting data traffic between the first working node and a second working node, and information reflec…
Who is the assignee on this patent?
Huawei Tech Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06F9/5083. Mapped technology areas include Physics.
When was this patent published?
Publication date: Apr 21, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).