Sparse linear algebra in column-oriented in-memory database

US10067909B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-10067909-B2
Application numberUS-201414314750-A
CountryUS
Kind codeB2
Filing dateJun 25, 2014
Priority dateJun 25, 2014
Publication dateSep 4, 2018
Grant dateSep 4, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments relate to storing sparse matrices in an in-memory column-oriented database system. Specifically, recent hardware shifts of primary storage from disc into memory, allow execution of linear algebra queries directly in the database engine. Dynamic matrix manipulation operations (like online insertion or deletion of elements) are not covered by most linear algebra frameworks. Therefore a hybrid architecture comprises a read-optimized main structure, and a write-optimized delta structure. The resulting system layout derived from the Compressed Sparse Row (CSR) representation, integrates well with a columnar database design. Moreover, the resulting architecture is amenable to a wide range of non-numerical use cases when dictionary encoding is used. Performance in specific examples is evaluated for dynamic sparse matrix workloads, by applying work flows of nuclear science and network graphs. Embodiments allow performing linear algebra operations on large, sparse matrices commonly associated with scientific computations and analytical business applications.

First claim

Opening claim text (preview).

What is claimed is: 1. A computer-implemented method comprising: causing an engine to store an updatable column representation of data including a table in a main structure and in a delta structure of an in-memory database, the data comprising a matrix defined by an extension of structured query language (SQL) by a matrix stack outside of the engine to include a matrix data type storing matrix dimensions as metadata recognized by a data manipulation primitive of the extension, the data manipulation primitive specifying a shrinking of the matrix dimensions; causing the engine to merge the delta structure into the main structure when a delta exceeds a threshold comprising a fixed percentage of the main structure; causing the engine to perform a reorganization to, improve the compression ratio by reordering rows of the table, and sort columns of the updatable column representation according to values of the table; causing the engine to derive an index according to the sorted columns; causing the engine to store the index within the engine; causing the matrix stack to reference the index to perform an algebraic operation comprising iterative matrix-vector multiplication in an inclusive breadth-first search; causing the engine to store a result of the algebraic operation; and causing the engine to update the delta structure by appending a triple utilizing the extension of SQL. 2. A method as in claim 1 wherein the column representation comprises a Compressed Sparse Row (CSR) representation and a value comprises a row pointer. 3. A method as in claim 1 wherein the matrix is fetched using a topological reference according to an access pattern. 4. A method as in claim 1 wherein the column representation utilizes dictionary encoding. 5. A method as in claim 1 further comprising updating the delta structure utilizing a validity control vector. 6. A non-transitory computer readable storage medium embodying a computer program for performing a method, said method comprising: causing an engine to store an updatable column representation of data including a table in a main structure and in a delta structure of an in-memory database, the data comprising a matrix defined by an extension of structured query language (SQL) by a matrix stack outside of the engine to include a matrix data type storing matrix dimensions as metadata recognized by a data manipulation primitive of the extension, the data manipulation primitive specifying a shrinking of the matrix dimensions; causing the engine to merge the delta structure into the main structure when a delta exceeds a threshold comprising a fixed percentage of the main structure; causing the engine to perform a reorganization to, improve the compression ratio by reordering rows of the table, and sort columns of the updatable column representation according to values of the table; causing the engine to derive an index according to the sorted columns; causing the engine to store the index within the engine; causing the matrix stack to reference the index to perform an algebraic operation comprising iterative matrix-vector multiplication in an inclusive breadth-first search; causing the engine to store a result of the algebraic operation; and causing the engine to update the delta structure by appending a triple utilizing the extension of SQL. 7. A non-transitory computer readable storage medium as in claim 6 wherein the column representation comprises a Compressed Sparse Row (CSR) representation and a value comprises a row pointer. 8. A non-transitory computer readable storage medium as in claim 6 wherein the matrix is fetched using a topological reference according to an access pattern. 9. A non-transitory computer readable storage medium as in claim 6 wherein the column representation utilizes dictionary encoding. 10. A non-transitory computer readable storage medium as in claim 6 further comprising updating the delta structure utilizing a validity control vector. 11. A computer system comprising: one or more processors; a software program, executable on said computer system, the software program configured to: cause an engine to store an updatable column representation of data including a table in a main structure and in a delta structure of an in-memory database, the data comprising a matrix defined by an extension of structured query language (SQL) by a matrix stack outside of the engine to include a matrix data type storing matrix dimensions as metadata recognized by a data manipulation primitive of the extension, the data manipulation primitive specifying a shrinking of the matrix dimensions; cause the engine to merge the delta structure into the main structure when a delta exceeds a threshold comprising a fixed percentage of the main structure; cause the engine to perform a reorganization to, improve the compression ratio by reordering rows of the table, and sort columns of the updatable column representation according to values of the table; cause the engine to derive an index according to the sorted columns; cause the engine to store the index within the engine; cause the engine to reference the matrix stack to perform an algebraic operation; cause the engine to store a result of the algebraic operation comprising iterative matrix-vector multiplication in an inclusive breadth-first search; and cause the engine to update the delta structure by appending a triple utilizing the extension of SQL. 12. A computer system as in claim 11 wherein the column representation comprises a Compressed Sparse Row (CSR) representation and a value comprises a row pointer. 13. A computer system as in claim 11 wherein the matrix is fetched using a topological reference according to an access pattern. 14. A computer system as in claim 11 wherein the column representation utilizes dictionary encoding. 15. A computer system as in claim 11 wherein the engine is further caused to update the delta structure utilizing a validity control vector. 16. A computer system as in claim 11 wherein the engine references a transposed structure of the matrix stack to perform the algebraic operation. 17. A method as in claim 1 wherein the engine references a transposed structure of the matrix stack to perform the algebraic operation. 18. A non-transitory computer readable storage medium as in claim 6 wherein wherein the engine references a transposed structure of the matrix stack to perform the algebraic operation.

Assignees

Inventors

Classifications

  • Column-oriented storage; Management thereof · CPC title

  • G06F17/16Primary

    Matrix or vector computation {, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization (matrix transposition G06F7/78)} · CPC title

  • using compression, e.g. sparse files · CPC title

  • Physics · mapped topic

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10067909B2 cover?
Embodiments relate to storing sparse matrices in an in-memory column-oriented database system. Specifically, recent hardware shifts of primary storage from disc into memory, allow execution of linear algebra queries directly in the database engine. Dynamic matrix manipulation operations (like online insertion or deletion of elements) are not covered by most linear algebra frameworks. Therefore …
Who is the assignee on this patent?
Kernert David, Koehler Frank, Lehner Wolfgang, and 1 more
What technology area does this patent fall under?
Primary CPC classification G06F17/16. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Sep 04 2018 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).