Methods and apparatus for motion search refinement in a SIMD array processor

US9300958B2 · US · B2

Patent metadata
Publication number: US-9300958-B2
Application number: US-201414187499-A
Country: US
Kind code: B2
Filing date: Feb 24, 2014
Priority date: Apr 26, 2006
Publication date: Mar 29, 2016
Grant date: Mar 29, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Various approaches for motion search refinement in a processing element are discussed. A k/2+L+k/2 register stores an expanded row of an L×L macro block. A k-tap filter horizontally interpolates over the expanded row generating horizontal interpolation results. A transpose storage unit stores the interpolated results generated by the k-tap filter for k/2+L+k/2 entries, wherein rows or columns of data may be read out of the transpose storage unit in pipelined register stages. A k-tap filter vertically interpolates over the pipelined register stages generating vertical interpolation results.
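As a rough illustration of the horizontal pass described in the abstract, the sketch below slides a k-tap filter over one expanded row of k/2+L+k/2 pixels. The abstract does not give filter coefficients; the 6-tap set (1, -5, 20, 20, -5, 1)/32 familiar from H.264 half-pel interpolation is assumed here, and the names `TAPS`, `clip8`, and `filter_row` are hypothetical.

```python
# Assumption: 6-tap half-pel coefficients as used in H.264.
# The abstract only says "k-tap"; these values are illustrative.
TAPS = (1, -5, 20, 20, -5, 1)


def clip8(v):
    """Clamp a filtered value to the 8-bit pixel range."""
    return max(0, min(255, v))


def filter_row(expanded_row, taps=TAPS):
    """Slide a k-tap FIR over one expanded row of k/2 + L + k/2 pixels."""
    k = len(taps)
    norm = sum(taps)  # 32 for the assumed taps
    out = []
    for i in range(len(expanded_row) - k + 1):
        acc = sum(t * p for t, p in zip(taps, expanded_row[i:i + k]))
        out.append(clip8((acc + norm // 2) // norm))  # round, then clamp
    return out


L = 16                               # macro block width
k = len(TAPS)
row = [100] * (k // 2 + L + k // 2)  # flat test row of 22 pixels
half_pels = filter_row(row)
print(len(row), len(half_pels))      # 22 17
```

With k = 6 and L = 16, the 22-pixel expanded row yields L+1 = 17 interpolated values, one for each gap between neighbouring pixels that has a full set of taps available.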

First claim

Opening claim text (preview).

We claim:

1. An apparatus for a motion search refinement function, the apparatus comprising: a processing element (PE) having an execution unit and the PE coupled by a first interconnection path to a local PE data memory, wherein the PE executes program instructions that output a memory address and control information from the execution unit to initiate operation of a motion search refinement function of video compression; and a hardware assist (HA) unit coupled to the execution unit and separately coupled by a second interconnection path to the local PE data memory, the HA unit responds to the memory address and the control information received from the PE to initiate the motion search refinement function on the HA unit, to read row data over the second interconnection path from the local PE data memory that is filtered by a first 2J-tap filter and interleaved to form interpolated row results overlapped with writing of the interpolated row results to a transpose memory configured to operate in the HA unit, wherein the row data is a row of pixels from a (J+L+J)×(J+L+J) pixel window, L is a number of the form 2^n, n is an integer greater than one, and J is a positive integer, wherein the interpolated row results are read in transpose order from the transpose memory as transposed output and then filtered by a second 2J-tap filter and interleaved to form interpolated column results that are stored over the second interconnection path in the local PE data memory to assemble a (2L+3)×(2L+3) search window for the motion search refinement function of video compression, wherein (2L+3) is greater than (J+L+J).

2. The apparatus of claim 1, wherein the row data is filtered by a first finite impulse response (FIR) filter.

3. The apparatus of claim 1, wherein the column pixels are filtered by a second finite impulse response (FIR) filter.

4. The apparatus of claim 1 further comprising: clusters of four PEs and clusters of four HA units both arranged in a 4×4 array organization, wherein each PE is separately coupled to an associated HA unit that provides a separate motion search refinement function for each PE; and clusters of four local PE data memories, wherein each local PE data memory is separately coupled by a corresponding first interconnection path to an associated PE and separately coupled by a corresponding second interconnection path to the associated HA unit that provides the separate motion search refinement function for the associated PE.

5. The apparatus of claim 1, wherein the transpose memory comprises: a byte addressable memory; and transpose address generation logic to selectively generate addresses to the byte addressable memory for storing the interpolated row results in (J+L+J)×(2L+3) form and for reading the interpolated row results in transpose order as the transposed output in (2L+3)×(J+L+J) form.

6. The apparatus of claim 1, wherein the HA unit comprises: execution state machines and a control unit configured to control a motion search refinement execution pipeline in response to the memory address and control information received from the PE, wherein the motion search refinement execution pipeline forms the interpolated column results that are stored over the second interconnection path in parallel with PE operations over the first interconnection path.

7. A method for motion search refinement using a hardware assist unit, the method comprising: sending a memory address and control information from a processing element (PE) in response to the PE executing program instructions to initiate operation of a motion search refinement function of video compression in a hardware assist (HA) unit, wherein the PE is coupled by a first interconnection path to a local PE data memory; reading row data over a second interconnection path from the local PE data memory, wherein the row data is filtered by a first 2J-tap filter and interleaved in the HA unit to form interpolated row results; writing the interpolated row results to a transpose memory configured to operate in the HA unit, wherein the row data is a row of pixels from a (J+L+J)×(J+L+J) pixel window, L is a number of the form 2^n, n is an integer greater than one, and J is a positive integer; reading the interpolated row results in transpose order from the transpose memory that are filtered by a second 2J-tap filter and interleaved to form interpolated column results; and storing over the second interconnection path the interpolated column results in the local PE data memory to assemble a (2L+3)×(2L+3) search window for the motion search refinement function of video compression, wherein (2L+3) is greater than (J+L+J).

8. The method of claim 7 further comprising: issuing a motion search refinement instruction which when executed sends the memory address and control information from the PE to the HA unit to initiate operation of the motion search refinement function to store the interpolated column results over the second interconnection path in parallel with PE operations over the first interconnection path.

9. The method of claim 7, wherein the (2L+3)×(2L+3) search window is used for a half pel refinement function.

10. The method of claim 7 further comprising: initiating in parallel a plurality of motion search refinement operations in k^2 HA units organized in k clusters of k HA units, wherein each cluster of k HA units includes k PEs and k local PE data memories.

11. The method of claim 7 further comprises: reading the row data from a (J+L+J)×(J+L+J) pixel window stored in the local PE data memory, wherein the row data is read as separate rows of J+L+J pixels; filtering each of the separate rows of J+L+J pixels to generate row results which are interleaved with selected row pixels to form the interpolated row results; and writing the interpolated row results to the transpose memory, wherein the writing the interpolated row results is overlapped with the reading the row data and the filtering to form the interpolated row results.

12. The method of claim 11, wherein the filtering is provided by a first finite impulse response (FIR) filter.

13. The method of claim 7 further comprises: reading the interpolated row results in transpose order from the transpose memory as column pixels; filtering the column pixels to generate column results which are interleaved with selected column pixels to form interpolated column results; and writing the interpolated column results to the local PE data memory, wherein reading the interpolated row results in transpose order is overlapped with the filtering the column pixels.

14. The method of claim 13, wherein the filtering is provided by a second finite impulse response (FIR) filter.

15. A computer readable non-transitory medium encoded with computer readable program data and code, the program data and code when executed operable to: send a memory address and control information from a processing element (PE) in response to the PE executing program instructions to initiate operation of a motion search refinement function of video compression in a hardware assist (HA) unit, wherein the PE is coupled by a first interconnection path to a local PE data memory; read row data over a second interconnection path from the local PE data memory, wherein the row data is filtered by a first 2J-tap filter and interleaved in the HA unit to form interpolated row results; write the interpolated row results to a transpose memory configured to operate in the HA unit, wherein the row data is a row of pixels from a (J+L+J)×(J+L+J) pixel window, L is a number of the form 2^n, n is an integer greater than o…
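The data flow recited in claims 1, 5, and 7 can be sketched in software to check the dimensions: each row of a (J+L+J)×(J+L+J) window is filtered and interleaved into 2L+3 values, stored in a transpose memory in (J+L+J)×(2L+3) form, read back column-wise, and filtered and interleaved again to give a (2L+3)×(2L+3) search window. This is a minimal sketch, not the patented hardware: the filter coefficients are assumed (the claims only say "2J-tap"), intermediate values are rounded to 8 bits for simplicity, and all names (`fir`, `interleave`, `TransposeMemory`, `refine_window`) are hypothetical.

```python
TAPS = (1, -5, 20, 20, -5, 1)  # assumed coefficients; 2J taps with J = 3


def fir(line, taps=TAPS):
    """2J-tap FIR with rounding and clamping to 8 bits."""
    k, norm = len(taps), sum(taps)
    return [max(0, min(255, (sum(t * p for t, p in zip(taps, line[i:i + k]))
                             + norm // 2) // norm))
            for i in range(len(line) - k + 1)]


def interleave(line, J):
    """Interleave L+2 selected pixels with L+1 filtered values (2L+3 total)."""
    half = fir(line)                      # L + 1 interpolated values
    full = line[J - 1:len(line) - J + 1]  # L + 2 selected input pixels
    out = []
    for f, h in zip(full, half):
        out += [f, h]
    out.append(full[-1])
    return out


class TransposeMemory:
    """Flat byte-addressable store: row-major writes, strided column reads."""

    def __init__(self, rows, cols):
        self.rows, self.cols = rows, cols
        self.mem = bytearray(rows * cols)

    def write_row(self, r, values):
        base = r * self.cols
        self.mem[base:base + self.cols] = bytes(values)

    def read_column(self, c):
        # Transposed read: step through memory by the stored row width.
        return [self.mem[r * self.cols + c] for r in range(self.rows)]


def refine_window(window, J):
    """Row pass, transpose, column pass; returns a (2L+3) x (2L+3) window."""
    n = len(window)                       # J + L + J
    width = 2 * (n - 2 * J) + 3           # 2L + 3
    assert width > n                      # claimed relation (2L+3) > (J+L+J)
    tm = TransposeMemory(n, width)        # (J+L+J) x (2L+3) form
    for r, row in enumerate(window):
        tm.write_row(r, interleave(row, J))
    cols = [interleave(tm.read_column(c), J) for c in range(width)]
    return [[cols[x][y] for x in range(width)] for y in range(width)]


J, L = 3, 16
window = [[10 * x for x in range(J + L + J)] for _ in range(J + L + J)]
result = refine_window(window, J)
print(len(result), len(result[0]))        # 35 35
```

For J = 3 and L = 16 the input window is 22×22, the transpose memory holds 22×35 bytes, and the assembled window is 35×35. On the horizontal ramp used above, the interpolated samples fall exactly halfway between their integer neighbours, which is a quick sanity check on the interleaving order.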

Assignees

Inventors

Classifications

  • H04N19/433 (Primary)

    characterised by techniques for memory access · CPC title

  • using parallelised computational arrangements · CPC title

  • Hardware specially adapted for motion estimation or compensation · CPC title

  • Motion estimation characterised by a search window with variable size or shape · CPC title

  • H04N19/61 (Primary)

    in combination with predictive coding · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9300958B2 cover?
Various approaches for motion search refinement in a processing element are discussed. A k/2+L+k/2 register stores an expanded row of an L×L macro block. A k-tap filter horizontally interpolates over the expanded row generating horizontal interpolation results. A transpose storage unit stores the interpolated results generated by the k-tap filter for k/2+L+k/2 entries, wherein rows or columns o…
Who is the assignee on this patent?
Stojancic Mihailo M, Pechanek Gerald George, Altera Corp
What technology area does this patent fall under?
Primary CPC classification H04N19/433. Mapped technology areas include Electricity.
When was this patent published?
Publication date: Mar 29, 2016 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).