Technologies for dividing work across accelerator devices
US-2024143410-A1 · May 2, 2024 · US
US9870267B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9870267-B2 |
| Application number | US-38644306-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 22, 2006 |
| Priority date | Mar 22, 2006 |
| Publication date | Jan 16, 2018 |
| Grant date | Jan 16, 2018 |
Official abstract text for this publication.
Methods and apparatus to provide virtualized vector processing are disclosed. In one embodiment, a processor includes a decode unit to decode a first instruction into a decoded first instruction and a second instruction into a decoded second instruction, and an execution unit to: execute the decoded first instruction to cause allocation of a first portion of one or more operations corresponding to a virtual vector request to a first processor core, and generation of a first signal corresponding to a second portion of the one or more operations to cause allocation of the second portion to a second processor core, and execute the decoded second instruction to cause a first computational result corresponding to the first portion of the one or more operations and a second computational result corresponding to the second portion of the one or more operations to be aggregated and stored to a memory location.
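The split-then-aggregate flow in the abstract can be illustrated with a small software sketch. This is not the patented hardware mechanism. It models the two instructions as ordinary functions, stands in for processor cores with worker threads, and uses a sum of squares as an arbitrary example operation. The names `Signal`, `split_request`, `run_portion`, `virtual_vector_sum_squares`, and the `capacity` parameter are all hypothetical; the signal fields (originating core identifier, start value, end value) follow the wording of claim 17.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Signal:
    """Hypothetical message describing the portion handed to a second core."""
    origin_core: int  # identifier of the core that issued the request
    start: int        # first element index of the forwarded portion
    end: int          # one past the last element index of the forwarded portion


def split_request(n_elements: int, origin_core: int,
                  capacity: int) -> Tuple[range, Optional[Signal]]:
    """Sketch of the first instruction: keep up to `capacity` elements for the
    first core and describe the remainder in a Signal for a second core."""
    local_end = min(capacity, n_elements)
    local = range(0, local_end)
    signal = (Signal(origin_core, local_end, n_elements)
              if local_end < n_elements else None)
    return local, signal


def run_portion(data: Sequence[int], indices: range) -> int:
    """Work done by one core on its portion (assumed operation: sum of squares)."""
    return sum(data[i] * data[i] for i in indices)


def virtual_vector_sum_squares(data: Sequence[int], capacity: int) -> int:
    """Split the request, run both portions, then aggregate and store."""
    local, signal = split_request(len(data), origin_core=0, capacity=capacity)
    remote = range(signal.start, signal.end) if signal else range(0)
    with ThreadPoolExecutor(max_workers=2) as pool:
        first = pool.submit(run_portion, data, local)    # first "core"
        second = pool.submit(run_portion, data, remote)  # second "core"
        # Sketch of the second instruction: aggregate both computational
        # results and store the total to a (simulated) memory location.
        memory = {"result": first.result() + second.result()}
    return memory["result"]
```

Here `capacity` is an assumed stand-in for the "available resources" that several dependent claims mention as a basis for allocation; a real implementation would decide the split in hardware rather than from a caller-supplied number.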
Opening claim text (preview).
What is claimed is:

1. An apparatus comprising: a decode unit to decode a first instruction into a decoded first instruction and a second instruction into a decoded second instruction; and an execution unit to: execute the decoded first instruction to cause a first logic circuit to allocate a first portion of one or more operations corresponding to a virtual vector request to a first processor core, and a second logic circuit to generate a first signal corresponding to a second portion of the one or more operations to cause allocation of the second portion to a second processor core, and execute the decoded second instruction to cause a first computational result corresponding to the first portion of the one or more operations and a second computational result corresponding to the second portion of the one or more operations to be aggregated and stored to a memory location.
2. The apparatus of claim 1, wherein the second processor core comprises: a third logic circuit to allocate the second portion of the one or more operations to the second processor core; and a fourth logic circuit to generate a second signal corresponding to a third portion of the one or more operations to cause allocation of the third portion to a third processor core.
3. The apparatus of claim 2, wherein the third logic circuit allocates the second portion based on information corresponding to one or more available resources of the second processor core.
4. The apparatus of claim 2, wherein the third logic circuit allocates the second portion based on information corresponding to the first signal.
5. The apparatus of claim 2, further comprising a fifth logic circuit to maintain information corresponding to one or more available resources of the second processor core.
6. The apparatus of claim 2, wherein the third logic circuit allocates the second portion based on overhead information corresponding to communication with one or more of the first processor core or a third processor core.
7. The apparatus of claim 2, further comprising a fifth logic circuit to transmit an acknowledgment signal to the first processor core after the second processor core has retired one or more operations corresponding to the second portion.
8. The apparatus of claim 1, wherein the first processor core comprises one or more of the first logic circuit or the second logic circuit.
9. The apparatus of claim 1, further comprising a third processor core.
10. The apparatus of claim 1, further comprising a third logic circuit to maintain information corresponding to one or more available resources of the first processor core.
11. The apparatus of claim 1, wherein the first logic circuit allocates the first portion based on information corresponding to one or more available resources of the first processor core.
12. The apparatus of claim 1, wherein the second processor core is to generate a second signal, and the first logic circuit allocates the first portion based on information corresponding to the second signal.
13. The apparatus of claim 1, further comprising a third logic circuit to schedule one or more tasks corresponding to the first portion.
14. The apparatus of claim 1, wherein the first logic circuit allocates the first portion based on overhead information corresponding to communication with the second processor core.
15. The apparatus of claim 1 wherein the memory location is an input operand of the second instruction.
16. The apparatus of claim 1, wherein the first processor core further comprises a third logic circuit to aggregate the first computational result corresponding to the first portion and the second computational result corresponding to the second portion.
17. The apparatus of claim 1, wherein the first signal comprises data corresponding to one or more of an identifier of the first processor core, a starting value corresponding to the second portion, or an end value corresponding to the second portion.
18. A method comprising: decoding a first instruction into a decoded first instruction with a decode unit of a processor; decoding a second instruction into a decoded second instruction with the decode unit of the processor; executing the decoded first instruction with an execution unit of the processor to allocate a first portion of one or more operations corresponding to a virtual vector request to a first processor core, and generate a first signal corresponding to a second portion of the one or more operations to cause allocation of the second portion to a second processor core; and executing the decoded second instruction with the execution unit of the processor to cause a first computational result corresponding to the first portion of the one or more operations and a second computational result corresponding to the second portion of the one or more operations to be aggregated and stored to a memory location.
19. The method of claim 18, further comprising allocating the first portion based on information corresponding to one or more available resources of the first processor core.
20. The method of claim 18, further comprising maintaining information corresponding to one or more available resources of one or more of the first processor core or the second processor core.
21. The method of claim 18, further comprising allocating the first portion based on overhead information corresponding to communication between the first processor core and the second processor core.
22. The method of claim 18, further comprising transmitting an acknowledgment signal to the first processor core after the second processor core has retired the second portion of the one or more operations.
23. A system comprising: a memory; and a processor comprising: a decode unit to decode a first instruction into a decoded first instruction and a second instruction into a decoded second instruction, and an execution unit to: execute the decoded first instruction to cause a first logic circuit to allocate a first portion of one or more operations corresponding to a virtual vector request to a first processor core, and a second logic circuit to generate a first signal corresponding to a second portion of the one or more operations to cause allocation of the second portion to a second processor core, and execute the decoded second instruction to cause a first computational result corresponding to the first portion of the one or more operations and a second computational result corresponding to the second portion of the one or more operations to be aggregated to a register and stored to a memory location of the memory when completely aggregated.
24. The system of claim 23, wherein the first processor core comprises the first logic circuit.
25. The system of claim 23, wherein the first logic circuit allocates one or more of the first portion or the second portion based on information corresponding to one or more available resources of at least one of the first processor core or the second processor core.
26. The system of claim 23, further comprising a third logic circuit of the first processor core to aggregate the first computational result corresponding to the first portion and the second computational result corresponding to the second portion.
27. The system of claim 23, further comprising a third logic circuit of the processor to schedule one or more tasks corresponding to one or more of the first p
considering hardware capabilities · CPC title
Vector processors · CPC title
Offload · CPC title