Shared memory eigensolver

US2016299874A1 · US · A1

Patent metadata
FieldValue
Publication numberUS-2016299874-A1
Application numberUS-201615132085-A
CountryUS
Kind codeA1
Filing dateApr 18, 2016
Priority dateNov 8, 2013
Publication dateOct 13, 2016
Grant date

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed herein is a shared memory system that use a combination of SBR and MRRR techniques to calculate eigenpairs for dense matrices having very large numbers of rows and columns. The disclosed system allows for the use of a highly scalable tridiagonal eigensolver. The disclosed system likewise allows for allocating a different number of threads to each of the different computational stages of the eigensolver.

First claim

Opening claim text (preview).

What is claimed is: 1 . A method for computing eigenvectors, the method comprising: populating data in a matrix, the matrix densely populated with non-zero values; identifying a plurality of tile sets in the matrix according to a first matrix configuration; identifying a first set of a plurality of domains in a first tile set of the plurality of tile sets; allocating a plurality of central processing units (CPUs) for processing, wherein each of the plurality of central processing units are coupled to one or more memories; concurrently processing by the plurality of CPUs data from at least two domains of a plurality of domains in the first tile set, wherein the processing converts at least a portion of data in the at least two domains of the plurality of domains in the first tile set from a dense format into a first portion of data in a band format; and storing the at least first portion of data in the band format. 2 . The method of claim 2 , further comprising: concurrently processing by the plurality of CPUs data from at least two domains of the plurality of domains in a second tile set concurrently by the plurality of CPUs, wherein the processing converts data in the at least two domains of the plurality of domains in the second tile set from a dense format into at least a second portion of data in the band format; storing the at least second portion of data in the band format; processing by the plurality of CPUs data from at least two domains of the plurality of domains in a third tile set concurrently by the plurality of CPUs, wherein the processing converts data in the at least two domains of the plurality of domains in the third tile set from a dense format into at least a third portion of data in the band format; and storing the at least third portion of data in the band format. 3 . The method of claim 1 , further comprising: processing data from the matrix in the band format, wherein the processing of the data from the band format converts the data in the band format to at least a portion of data in a tridiagonal format; performing calculations on the at least portion of data in the tridiagonal format, wherein the calculations on the at least portion of data in the tridiagonal format generate a plurality of eigenvalues and a plurality of eigenvectors; storing the plurality of eigenvalues in a memory of the one or more memories; performing a back-transformation on the plurality of eigenvectors wherein the back transformation on the plurality of the eigenvectors converts the eigenvectors into a plurality of eigenvectors in a band format; and performing a second back-transformation, the second back transformation converting the plurality of eigenvectors in the band format in to eigenvectors in a format consistent with the dense format. 4 . The method of claim 1 , further comprising allocating memory to store the matrix. 5 . The method of claim 1 , wherein the plurality of CPUs are located at a plurality of different nodes that communicated with each other over one or more communication interfaces. 6 . The method of claim 3 , wherein a memory policy is changed from a global memory policy to a local memory policy before allocating and initializing memory for storing the computed Eigenvectors. 7 . The method of claim 1 , further comprising: identifying a problem size, wherein the problem size is associated with a number of elements required to describe an Eigenproblem; identifying a number of processor sockets to allocate to solving the Eigenproblem; identifying a tile size, wherein the tile size results in a number of tiles in a tile set that is divisible by the number of processor sockets; and identifying a padded problem size that is equal to or larger than the problem size and is evenly divisible by the tile size, wherein the matrix is populated with the number of elements required to describe the Eigenproblem and with a number of zero entries corresponding to the number of elements in the padded problem size minus the number of elements in the problem size. 8 . The method of claim 3 , further comprising: reading a second matrix from global memory; dividing the second matrix into a plurality of columns; and performing vector calculations on data contained in the plurality of columns, wherein the vector calculations are according to a column-wise formula that is equivalent to an original formula, and the calculations according to the column-wise formula are performed without reading the second matrix a third time from the global memory. 9 . The method of claim 3 , wherein a plurality of threads are allocated to execute on the plurality of CPUs according to a non-sequential distribution to each of the plurality of CPUs, and each of the plurality of CPUs correspond to a different processing socket of a plurality of processing sockets. 10 . A non-transitory computer readable storage medium having embodied thereon a program executable by one or more processing cores to perform a method for computing eigenvectors, the method comprising: populating data in a matrix, the matrix densely populated with non-zero values; identifying a plurality of tile sets in the matrix according to a first matrix configuration; identifying a first set of a plurality of domains in a first tile set of the plurality of tile sets; allocating a plurality of central processing units (CPUs) for processing, wherein each of the plurality of central processing units are coupled to one or more memories; concurrently processing by the plurality of CPUs data from at least two domains of a plurality of domains in the first tile set, wherein the processing converts at least a portion of data in the at least two domains of the plurality of domains in the first tile set from a dense format into a first portion of data in a band format; and storing the at least first portion of data in the band format. 11 . The non-transitory computer readable storage medium of claim 10 , the program further executable to: process by the plurality of CPUs data from at least two domains of the plurality of domains in a second tile set concurrently by the plurality of CPUs, wherein the processing converts data in the at least two domains of the plurality of domains in the second tile set from a dense format into at least a second portion of data in the band format; store the at least second portion of data in the band format; process by the plurality of CPUs data from at least two domains of the plurality of domains in a third tile set concurrently by the plurality of CPUs, wherein the processing converts data in the at least two domains of the plurality of domains in the third tile set from a dense format into at least a third portion of data in the band format; and store the at least third portion of data in the band format. 12 . The non-transitory computer readable storage medium of claim 10 , the program further executable to: process data from the matrix in the band format, wherein the processing of the data from the band format converts the data in the band format to at least a portion of data in a tridiagonal format; perform calculations on the at least portion of data in the tridiagonal format, wherein the calculations on the at least portion of data in the tridiagonal format generate a plurality of eigenvalues and a plurality of eigenvectors; store the plurality of eigenvalues in a memory of the one or more memories; perform a back-transformation on the plurality of eigenvectors wherein the back transformation on the plurality of the eigenvectors converts the eigenvectors into a plurality of eigenvectors in a band format; and perform a second back-trans

Assignees

Inventors

Classifications

  • G06F17/16Primary

    Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)} · CPC title

  • with multidimensional access, e.g. row/column, matrix · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016299874A1 cover?
Disclosed herein is a shared memory system that use a combination of SBR and MRRR techniques to calculate eigenpairs for dense matrices having very large numbers of rows and columns. The disclosed system allows for the use of a highly scalable tridiagonal eigensolver. The disclosed system likewise allows for allocating a different number of threads to each of the different computational stages …
Who is the assignee on this patent?
Silicon Graphics Int Corp
What technology area does this patent fall under?
Primary CPC classification G06F17/16. Mapped technology areas include Physics.
When was this patent published?
Publication date Thu Oct 13 2016 00:00:00 GMT+0000 (Coordinated Universal Time) (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).