Functional unit having tree structure to support vector sorting algorithm and other algorithms
US-9405538-B2 · Aug 2, 2016 · US
US2016283549A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2016283549-A1 |
| Application number | US-201514671954-A |
| Country | US |
| Kind code | A1 |
| Filing date | Mar 27, 2015 |
| Priority date | Mar 27, 2015 |
| Publication date | Sep 29, 2016 |
| Grant date | — |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
Apparatuses, systems, and methods may sort a value. A sorter may include a data exchanger to exchange a value that is to be held by a first execution element of a plurality of execution elements with a value that is to be held by a second execution element of the plurality of execution elements. The sorter may also include a data comparator to compare a value that is to be held by the first execution element with the value from the second execution element and a value that is to be held by the second execution element with the value from the first execution element. The values involved in the exchange and the compare may remain locally sorted and globally shuffled in a final output of the sorter that is to be stored.
Opening claim text (preview).
We claim: 1 . A computing system comprising: a hardware execution unit to implement an execution block including a plurality of execution elements; and a sorter including: a data exchanger to exchange a value that is to be held by a first execution element of a plurality of execution elements with a value that is to be held by a second execution element of the plurality of execution elements; and a data comparator to compare a value that is to be held by the first execution element with the value from the second execution element and a value that is to be held by the second execution element with the value from the first execution element, wherein the values involved in the exchange and the compare are to remain locally sorted and globally shuffled in a final output of the sorter that is to be stored. 2 . The computing system of claim 1 , wherein the sorter is to include one or more of: an XOR operator to execute an XOR operation for each of the execution elements based on an execution element identifier (ID) of each of the execution elements; and an AND operation to execute an AND operation for each of the execution elements based on the execution element ID of each of the execution elements. 3 . The computing system of claim 1 , further including a data storer to store the values in the final output of the sorter to memory in a globally sorted order. 4 . A sorter comprising: a data exchanger to exchange a value that is to be held by a first execution element of a plurality of execution elements with a value that is to be held by a second execution element of the plurality of execution elements; and a data comparator to compare a value that is to be held by the first execution element with the value from the second execution element and a value that is to be held by the second execution element with the value from the first execution element, wherein the values involved in the exchange and the compare are to remain locally sorted and globally shuffled in a final output of the sorter that is to be stored. 5 . The sorter of claim 4 , wherein a hardware execution unit is to implement an execution block including the plurality of execution elements, wherein the exchange is to occur in a phase of a sorting network, and wherein the compare is to occur in the same phase of the sorting network after the exchange. 6 . The sorter of claim 4 , further including a data assigner to assign a pair of vectors each including a set of values to each of the execution elements, wherein the pair of vectors are to be involved in the exchange and the compare. 7 . The sorter of claim 4 , further including a combiner to merge two vectors that are each to include a set of values at each of the execution elements to sort the set of values of each of the vectors into a locally sorted order. 8 . The sorter of claim 4 , further including a shuffler to directly exchange the value from the first execution element with the value from second execution element without an access to memory. 9 . The sorter of claim 4 , further including one or more of: an XOR operator to execute an XOR operation for each of the execution elements based on an execution element identifier (ID) of each of the execution elements; and an AND operator to execute an AND operation for each of the execution elements based on the execution element ID of each of the execution elements. 10 . The sorter of claim 9 , wherein the XOR operator is to execute the XOR operation for each of the execution elements between the execution element identifier ID of each of the execution elements and a number of a stage in a sorting network. 11 . The sorter of claim 4 , further including: an XOR operator to execute an initial XOR operation for an initial phase of a multi-phase stage of a sorting network between an execution element ID of each of the execution elements and a number of the multi-phase stage, wherein each output of the XOR operation is to identify an exchange partner for each of the execution elements in the initial phase; and a value selector to select a greater value of each exchange partner in the initial phase for the exchange. 12 . The sorter of claim 4 , further including: an XOR operator to execute a subsequent XOR operation for each subsequent phase of a multi-phase stage of a sorting network between an execution element ID of each of the execution elements and a number beginning with one that doubles with each subsequent phase of the multi-phase stage, wherein each output of the subsequent XOR operation is to identify an exchange partner for each of the execution elements in the subsequent phase; an AND operator to execute an AND operation for each of the execution elements between each output of the subsequent XOR operation and the number beginning with one that doubles, wherein each output of the AND operation is to identify one or more of a value and a channel to be involved in the exchange in the subsequent phase; and a value selector to: select a greater value when an output of the AND operation is to be zero; and select a lesser value when the output of the AND operation is to be one. 13 . The sorter of claim 4 , further including a data storer to store the values in the final output of the sorter to memory in a globally sorted order. 14 . The sorter of claim 13 , further including: a local ID establisher to determine a local ID for each execution element ID of each of the execution elements; and a mirrored ID establisher to determine a mirrored ID for each local ID, wherein the data storer is to store the values in the final output based on each mirrored ID. 15 . The sorter of claim 14 , wherein the local ID establisher is to determine a binary pattern for each execution element ID to determine each local ID, and wherein the mirrored ID establisher is to determine a mirrored binary pattern for each local ID to determine each mirrored ID. 16 . The sorter of claim 14 , wherein the data storer is to scatter the values in the final output to a corresponding location in the memory based on each mirrored ID. 17 . A method comprising: exchanging a value that is held by a first execution element of a plurality of execution elements with a value that is held by a second execution element of the plurality of execution elements; and comparing a value that is held by the first execution element with the value from the second execution element and a value that is held by the second execution element with the value from the first execution element, wherein the values involved in the exchange and the compare remain locally sorted and globally shuffled in a final output that is to be stored. 18 . The method of claim 17 , further including: assigning a pair of vectors each including a set of values to each of the execution elements, wherein the pair of vectors are involved in the exchange and the compare; directly exchanging the value from the first execution element with the value from second execution element without an access to memory; and merging two vectors that each include a set of values corresponding at each of the execution elements to sort the set of values of each of the vectors into a locally sorted order. 19 . The method of claim 17 , further including one or more of: executing an XOR operation for each of the execution elements based on an execution element identifier (ID) of each of the execution elements; and executing an AND operation for each of the execution elements based on the execution element ID of each of th
Physics · mapped topic
Physics · mapped topic
using a mask · CPC title
controlled by a single instruction for multiple data lanes [SIMD] · CPC title
Instructions to perform operations on packed data, e.g. vector, tile or matrix operations · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.