Transparent efficiency for in-memory execution of map reduce job sequences
US-9147373-B2 · Sep 29, 2015 · US
US9342564B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9342564-B2 |
| Application number | US-201213590805-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 21, 2012 |
| Priority date | Feb 27, 2012 |
| Publication date | May 17, 2016 |
| Grant date | May 17, 2016 |
Official abstract text for this publication.
A distributed data processing apparatus and method through hardware acceleration are provided. The data processing apparatus includes a mapping node including mapping units configured to process input data in parallel to generate and output mapping results. The data processing apparatus further includes a shuffle node including shuffle units and a memory buffer, the shuffle units configured to process the mapping results output from the mapping units in parallel to generate and output shuffle results, and the shuffle node configured to write the shuffle results output from the shuffle units in the memory buffer. The data processing apparatus further includes a merge node including merge units configured to merge the shuffle results written in the memory buffer to generate merging results.
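The data flow described in the abstract (mapping units, then shuffle units writing to a memory buffer, then merge units) can be sketched in software. This is a minimal illustration only, not the patented hardware design: the names `run_pipeline` and `word_pairs`, the round-robin distribution of records, and the use of a word-count workload are all assumptions made for the example, and ordinary sequential loops stand in for units that the patent describes as running in parallel.

```python
from collections import defaultdict


def word_pairs(record):
    """Hypothetical map function: emit a (word, 1) pair for each word."""
    return [(word, 1) for word in record.split()]


def run_pipeline(records, map_fn, num_map_units=2, num_shuffle_units=2):
    """Simulate mapping node -> shuffle node (+ memory buffer) -> merge node."""
    # Mapping node: hand records to mapping units one record at a time
    # (round-robin is an assumption); each unit emits (key, value) pairs.
    map_outputs = [[] for _ in range(num_map_units)]
    for i, record in enumerate(records):
        map_outputs[i % num_map_units].extend(map_fn(record))

    # Shuffle node: each shuffle unit groups the pairs it receives by key
    # and writes its result into its own section of the memory buffer.
    mapped = [pair for unit in map_outputs for pair in unit]
    memory_buffer = [defaultdict(list) for _ in range(num_shuffle_units)]
    for i, (key, value) in enumerate(mapped):
        memory_buffer[i % num_shuffle_units][key].append(value)

    # Merge node: read the shuffle results back from the buffer and merge
    # the per-unit groups into one group per key.
    merged = defaultdict(list)
    for section in memory_buffer:
        for key, values in section.items():
            merged[key].extend(values)
    return {key: sorted(values) for key, values in merged.items()}
```

Calling `run_pipeline(["a b", "b c", "a"], word_pairs)` groups the emitted values by word; a later reduction step (as in the combining and reduce nodes of the dependent claims) could then sum each group.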
Opening claim text (preview).
What is claimed is:
1. A data processing apparatus comprising: a mapping node comprising mapping units configured to process input data in parallel to generate and output mapping results; a shuffle node comprising shuffle units and a memory buffer, the shuffle units configured to process the mapping results output from the mapping units in parallel to generate and output shuffle results, and the shuffle node configured to write the shuffle results output from the shuffle units in the memory buffer; a merge node comprising merge units configured to merge the shuffle results written in the memory buffer to generate merging results; and an input distribution node configured to distribute the input data among the mapping units on a record-by-record basis, wherein a number of the mapping units is determined based on a unit time taken by the input distribution node to input a record of the input data into one of the mapping units, or a unit time taken by the one of the mapping units to process the record, or any combination thereof.
2. The data processing apparatus of claim 1, further comprising: a combining node configured to perform a reduction operation on the merging results to generate and output a combining result, the merging results being output from the merge units.
3. The data processing apparatus of claim 2, further comprising: a reduce node configured to perform a reduction operation on the merging results output from the merge units or the combining result output from the combining node to generate a reduce result.
4. The data processing apparatus of claim 1, wherein the data processing apparatus is implemented through hardware acceleration on a field programmable gate array.
5. A data processing apparatus comprising: a mapping node comprising mapping units configured to process input data in parallel to generate and output mapping results; a shuffle node comprising shuffle units and a memory buffer, the shuffle units configured to process the mapping results output from the mapping units in parallel to generate and output shuffle results, and the shuffle node configured to write the shuffle results output from the shuffle units in the memory buffer; a merge node comprising merge units configured to merge the shuffle results written in the memory buffer to generate merging results; and an output distribution node configured to combine the mapping results output from the mapping units into a piece of data, and distribute the piece of data among the shuffle units on a record-by-record basis, wherein a number of the shuffle units is determined based on a unit time taken by the output distribution node to input a record of the piece of data into one of the shuffle units, or a unit time taken by the one of the shuffle units to process the record, or any combination thereof.
6. A data processing apparatus comprising: a mapping node comprising mapping units configured to process input data in parallel to generate and output mapping results; a shuffle node comprising shuffle units and a memory buffer, the shuffle units configured to process the mapping results output from the mapping units in parallel to generate and output shuffle results, and the shuffle node configured to write the shuffle results output from the shuffle units in the memory buffer; and a merge node comprising merge units configured to merge the shuffle results written in the memory buffer to generate merging results, wherein the memory buffer comprises a matrix of memory section rows and memory section columns, and wherein the shuffle units correspond to the memory section rows, respectively, and the shuffle node is further configured to write the shuffle results output from the respective shuffle units in the corresponding memory section rows.
7. A data processing apparatus comprising: a mapping node comprising mapping units configured to process input data in parallel to generate and output mapping results; a shuffle node comprising shuffle units and a memory buffer, the shuffle units configured to process the mapping results output from the mapping units in parallel to generate and output shuffle results, and the shuffle node configured to write the shuffle results output from the shuffle units in the memory buffer; and a merge node comprising merge units configured to merge the shuffle results written in the memory buffer to generate merging results, wherein the memory buffer comprises a matrix of memory section rows and memory section columns, and wherein the merge units correspond to the memory section columns, respectively, and the merge units are further configured to merge the respective shuffle results written in the corresponding memory section columns to generate the merging results.
8. A data processing method comprising: processing input data in parallel to generate mapping results at a mapping node; processing the mapping results in parallel to generate shuffle results at a shuffle node; writing the shuffle results in a memory buffer; merging the shuffle results written in the memory buffer to generate merging results; distributing the input data to mapping units of the mapping node on a record-by-record basis; and determining a number of the mapping units based on a unit time taken to input a record of the input data into one of the mapping units, or a unit time taken by the one of the mapping units to process the record, or any combination thereof.
9. The data processing method of claim 8, further comprising: combining the mapping results into a piece of data; and distributing the piece of data among shuffle units of the shuffle node on a record-by-record basis.
10. The data processing method of claim 8, further comprising: performing a reduction operation on the merging results to generate a combining result.
11. The data processing method of claim 10, further comprising: performing a reduction operation on the merging results or the combining result to generate a reduce result.
12. The data processing method of claim 8, wherein the data processing method is implemented through hardware acceleration on a field programmable gate array.
13. A data processing method comprising: processing input data in parallel to generate mapping results; processing the mapping results in parallel to generate shuffle results at a shuffle node; writing the shuffle results in a memory buffer; and merging the shuffle results written in the memory buffer to generate merging results, wherein the memory buffer comprises a matrix of memory section rows and memory section columns, and shuffle units of the shuffle node correspond to the memory section rows, respectively, and wherein the writing of the shuffle results comprises writing the shuffle results output from the respective shuffle units in the corresponding memory section rows.
14. A data processing method comprising: processing input data in parallel to generate mapping results; processing the mapping results in parallel to generate shuffle results; writing the shuffle results in a memory buffer; and merging the shuffle results written in the memory buffer to generate merging results at a merge node, wherein the memory buffer comprises a matrix of memory section rows and memory section columns, and merge units of the merge node correspond to the memory section columns, respectively, and wherein the merging of the shuffle results comprises merging the respective shuffle results written in the corresponding memory section columns to generate the merging results.
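Two ideas in the claims lend themselves to a small software sketch: the memory buffer organised as a matrix whose rows belong to shuffle units and whose columns belong to merge units (claims 6, 7, 13 and 14), and the sizing of the unit count from per-record timings (claims 1, 5 and 8). The code below is an illustration under stated assumptions, not the claimed implementation. The claims do not say how a shuffle result is assigned to a column, so the CRC32-based `column_for` partitioner is hypothetical. The claims also give no formula for the unit count, so `units_needed` uses one plausible reading: the ratio of processing time to input time, rounded up, so that a unit is free each time the distributor is ready with the next record. The class and function names are likewise invented for the example.

```python
import math
import zlib


def units_needed(input_time, process_time):
    """Assumed sizing rule (not stated in the claims):
    ceil(process_time / input_time), with a minimum of one unit."""
    if input_time <= 0 or process_time <= 0:
        raise ValueError("unit times must be positive")
    return max(1, math.ceil(process_time / input_time))


def column_for(key, num_columns):
    """Hypothetical partitioner: deterministic column index for a string key."""
    return zlib.crc32(key.encode("utf-8")) % num_columns


class MatrixBuffer:
    """Memory buffer as a matrix of sections: one row per shuffle unit,
    one column per merge unit."""

    def __init__(self, num_shuffle_units, num_merge_units):
        self.num_merge_units = num_merge_units
        self.sections = [
            [[] for _ in range(num_merge_units)]
            for _ in range(num_shuffle_units)
        ]

    def write(self, shuffle_unit, key, value):
        # A shuffle unit writes only into its own row; the column is chosen
        # by the (assumed) partitioner so equal keys share a column.
        column = column_for(key, self.num_merge_units)
        self.sections[shuffle_unit][column].append((key, value))

    def merge_column(self, merge_unit):
        # A merge unit reads its own column across every row and merges
        # the sections into one sorted run.
        merged = []
        for row in self.sections:
            merged.extend(row[merge_unit])
        return sorted(merged)
```

Because each shuffle unit owns a row and each merge unit owns a column, no two writers share a section and no two mergers read the same section, which is one way to understand why the layout suits parallel hardware.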
Distributed queries · CPC title
using a plurality of independent parallel functional units · CPC title
Physics · mapped topic