Molecular data storage systems and methods

US12443366B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12443366-B2
Application number  US-202017101824-A
Country             US
Kind code           B2
Filing date         Nov 23, 2020
Priority date       May 21, 2018
Publication date    Oct 14, 2025
Grant date          Oct 14, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A data storage system and method are provided, as well as systems and methods for fabrication, and writing and reading of data therein. The data storage system includes at least one population of molecular sequences including chains of basic molecular building-blocks, and defining at least one respective data-block encoding data in the data storage system. The data of the data-block is encoded in a sequence $S=(\pi_1, \pi_2, \ldots, \pi_k, \ldots, \pi_{K-1}, \pi_K)$ of encoded letters $\{\pi_k\}$ associated with an alphabet $\Sigma \equiv \{\sigma_m\}_{m=1}^{M}$, which are encoded according to the types of basic molecular building-blocks appearing at the respective locations $k$ along storage segments of the molecular sequences of the population. The molecular sequences include a number $Z$ of different types of basic molecular building-blocks $\{E_n\}_{n=1}^{Z}$, while the alphabet $\Sigma$ has a size $M$ strictly greater than the number $Z$ of types of building-blocks. Each alphabet letter $\sigma_m$ is associated with a vector $\{P_m^n\}_{n=1}^{Z}$ indicative of occurrences of basic molecular building-block $E_n$ of type $n$ in the alphabet letter $\sigma_m$. Accordingly, each encoded letter $\pi_k$ at location $k$ in the storage segments of molecular sequences of the data-block/population is mapped to a corresponding alphabet letter $\sigma_m$ by determining a match between the occurrence of basic molecular building-blocks of different types at that location $k$ of the molecular sequences of the population, with the vector $\{P_m^n\}_{n=1}^{Z}$ associated with the alphabet letter $\sigma_m$. In some implementations the component $P_m^n$ of the vector $\{P_m^n\}_{n=1}^{Z}$ associated with alphabet letter $\sigma_m$ is indicative of a probability that a basic molecular building-block $E_n$ of type $n$, $1 \le n \le Z$, appears at the location $k$ of the storage segment of a molecular strand of the at least one population in case the letter $\pi_k$ encoded at that location $k$ corresponds to the alphabet letter $\sigma_m$.
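
The encoding idea in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the patent's implementation: it assumes $Z=4$ building blocks (the nucleotides A, C, G, T) and a hypothetical alphabet of $M=6$ letters whose two extra letters are 50/50 mixtures. The names `BLOCKS`, `ALPHABET`, and `synthesize_population`, the mixture ratios, and the population size are all illustrative choices that do not appear in the source.

```python
import random

# Z = 4 basic building-block types (assumed here to be the DNA nucleotides).
BLOCKS = ["A", "C", "G", "T"]

# Hypothetical alphabet of M = 6 letters, so M > Z. Each letter is a
# probability vector over BLOCKS; the two mixed letters are illustrative only.
ALPHABET = {
    "A": [1.0, 0.0, 0.0, 0.0],
    "C": [0.0, 1.0, 0.0, 0.0],
    "G": [0.0, 0.0, 1.0, 0.0],
    "T": [0.0, 0.0, 0.0, 1.0],
    "M": [0.5, 0.5, 0.0, 0.0],  # equal mix of A and C
    "K": [0.0, 0.0, 0.5, 0.5],  # equal mix of G and T
}


def synthesize_population(letters, n_molecules, rng):
    """Simulate a population of strands for the letter sequence `letters`.

    At each location k, every molecule independently draws one building
    block from the probability vector of the letter encoded at k.
    """
    population = []
    for _ in range(n_molecules):
        strand = [rng.choices(BLOCKS, weights=ALPHABET[s])[0] for s in letters]
        population.append("".join(strand))
    return population


rng = random.Random(0)  # fixed seed so the simulation is repeatable
population = synthesize_population(["A", "M", "K", "T"], n_molecules=1000, rng=rng)
```

In this sketch no single strand carries the letter `M`; the letter exists only as the roughly even split between A and C at location 1 across the whole population, which is how an alphabet larger than the number of building-block types becomes possible.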

First claim

Opening claim text (preview).

The invention claimed is:

1. A method for storing data comprising: providing at least one data-block for encoding data in at least one respective population of molecular sequences, said molecular sequences comprising respective molecules which comprise sequences comprising a number $Z$ of different types of basic molecular building-blocks $\{E_n\}_{n=1}^{Z}$, by which the data of the data-block is encoded; encoding the data of the data-block in a sequence $S=(\pi_1, \pi_2, \ldots, \pi_k, \ldots, \pi_{K-1}, \pi_K)$ of encoded letters $\{\pi_k\}$ belonging to an alphabet $\Sigma$, whereby an identity of a letter $\pi_k \in \Sigma$ encoded at a location $k$ in the data-block is indicated by the types of basic molecular building-blocks occurring at the location $k$ in a multitude of molecular sequences of the population, wherein said alphabet $\Sigma$ has a size $M$ greater than the number $Z$ of different types of basic molecular building-blocks used in the data storage system ($M>Z$), and each alphabet letter $\sigma_m$ in the alphabet $\Sigma=\{\sigma_m\}_{m=1}^{M}$ is defined by a vector $P_m^n$ indicative of a composition of the types of basic molecular building-blocks to which the alphabet letter $\sigma_m$ corresponds and whereby a $P_m^n$ in the vector is indicative of whether a basic molecular building-block $E_n$ of certain type $n$ ($1 \le n \le Z$) should occur at a location $k$ in one or more molecular sequences of the population in case the encoded letter $\pi_k$ at that location $k$ corresponds to the alphabet letter $\sigma_m$; and synthesizing said at least one population of molecular sequences in accordance with the sequence $S$, wherein at each location $k$ in a plurality of the molecules of the at least one respective population of molecular sequences a respective composition of the types of basic molecular building-blocks correspond to a respective letter in the sequence $S$.

2. The method of claim 1, wherein data is stored in a plurality of populations of the molecular sequences defining a respective plurality of data-blocks encoding data in the data storage system; and wherein each molecular sequence of the molecular sequences includes a population identification segment comprising an identifying sequence of molecular building-blocks indicative of the population with which said molecular sequence is associated; and wherein said identifying sequence is different in molecular sequences associated with different ones of said plurality of populations.

3. The method of claim 2, wherein the molecular building-blocks of said identifying sequence are selected from said $Z$ types of basic molecular building-blocks.

4. The method of claim 2, wherein a difference between identifying sequences that are used in population identification segments of different respective populations exceeds a predetermined threshold measured by a certain predetermined distance metric of strings.

5. The method of claim 2, wherein molecular sequences of one or more of said plurality of populations are contained together in a common region; and wherein molecular sequences associated with a same population can be exclusively selected by utilizing binding molecules configured and operable for selectively binding to the population identification segment of the molecular sequences associated with said same population.

6. The method of claim 1, wherein data is stored in a plurality of populations of the molecular sequences defining a respective plurality of data-blocks encoding data in the data storage system, and comprising a structure defining a plurality of distinct regions at which molecular sequences of different respective populations reside respectively; and wherein the molecular sequences of different respective populations reside exclusively and respectively at said distinct regions.

7. The method of claim 1, wherein said types of basic molecular building-blocks comprise at least A, C, G, and T nucleotides and/or chemical modifications thereof.

8. The method of claim 1, wherein said types of basic molecular building-blocks are predetermined oligomers of a same length.

9. The method of claim 1, wherein the vector $\{P_m^n\}_{n=1}^{Z}$ is a probability vector defining the alphabet letter $\sigma_m$ and $P_m^n$ indicates a probability that a basic molecular building-block $E_n$ of type $n$, $1 \le n \le Z$, appears at the location $k$ of a storage segment of a molecular strand of said at least one population in case the letter $\pi_k$ which is encoded at that location $k$ corresponds to the alphabet letter $\sigma_m$.

10. The method of claim 9, wherein said at least one population of molecules is adapted to being read with $N$ fold nominal sequencing depth or higher, and wherein each encoded letter $\pi_k$ being read from the position $k$ is represented by an observed probability vector $X_k=\{x_k(E_n)/N\}_{n=1}^{Z}$ whereby $x_k(E_n)$ is a number of times the basic molecular building-block of type $E_n$ was read in the location $k$ out of the $N$ fold sequencing depth, being thereby indicative of an observed probability that the basic molecular building-blocks of type $E_n$, $n=1$ to $Z$, appear in the location $k$.

11. The method of claim 10, wherein mapping between an observed probability vector $X_k$ at the location $k$ and an inferred alphabet letter $\pi_k$ is performed by determining an alphabet letter $\sigma_k$ satisfying a minimum divergence from the observed probability vector $X_k$, $\sigma_k=\operatorname{ArgMin}_{\{\sigma_m\}_{m=1}^{M}} D(\sigma_m, X_k)$, where $D$ is a divergence function.

12. The method of claim 11, wherein the divergence function $D(\sigma_m, X_k)$ is at least one of the following: an $L^p$ distance function; Euclidean distance $D(\sigma_m, X_k)=\|\sigma_m - X_k\|$; Kullback-Leibler divergence $D(\sigma_m, X_k)=\mathrm{KL}(\sigma_m, X_k)$.

13. The method of claim 1, wherein the vector $P_m^n$ defining each alphabet letter $\sigma_m$ in the alphabet $\Sigma$ is a probability vector $\sigma_m=\{P_m^n\}_{n=1}^{Z}$ and whereby $P_m^n$ designates a probability of the appearance of basic molecular building-block of type $n$ at location $k$ in the molecular sequences of the population in case the encoded letter $\pi_k$ at that location $k$ corresponds to the alphabet letter $\sigma_m$.

14. A data storage system, comprising at least one population of molecular sequences having data stored thereon according to the method of claim 1.

15. A molecular label comprising the data storage system according to claim 14, wherein said at least one data-block is being respectively encoded by the at least one population of molecular sequences.

16. A method for reading data stored in a molecular data storage system, the method comprising at least the following operations: (i) providing a molecular data storage system comprising a population of molecular sequences defining a data-block of the system, said molecular sequences comprising respective molecules formed with a number $Z$ of different types of basic molecular building-blocks $\{E_n\}_{n=1}^{Z}$, by which the data of the data-block is encoded; (ii) applying sequencing of $N$ fold nominal sequencing depth to the population of molecular sequences to determine, per each location $k$ out of 1 to $K$ locations of storage segments of the molecular sequences of the population, an observed probability vector $X_k=\{x_k(E_n)/N'\}_{n=1}^{Z}$ whereby $x_k(E_n)$ is a number of times, out of an $N'$ fold actual sequencing depth obtained for the population, at which a basic molecular building-block of type $E_n$ was found in the location $k$ in a plurality of said molecules of said population of molecular sequences; (iii) associating each observed probability vector $X_k$ with one of alphabet letters $\{\sigma_m\}$ of an alphabet $\Sigma=\{\sigma_m\}_{m=1}^{M}$, whereby a size of said alphabet $\Sigma$ is greater than the number $Z$ of the different ty
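
The reading steps described in claims 10 to 12 and 16 (count building blocks per location, form an observed probability vector $X_k$, pick the alphabet letter with minimum divergence) can be sketched as follows. This is an illustration under assumptions, not the claimed implementation: the six-letter `ALPHABET`, the eight toy `reads`, the helper names, and the `eps` smoothing used in the Kullback-Leibler function are hypothetical and do not come from the source.

```python
import math
from collections import Counter

BLOCKS = ["A", "C", "G", "T"]  # Z = 4 building-block types (assumed)

# Hypothetical alphabet with M = 6 > Z letters, each a probability vector.
ALPHABET = {
    "A": [1.0, 0.0, 0.0, 0.0],
    "C": [0.0, 1.0, 0.0, 0.0],
    "G": [0.0, 0.0, 1.0, 0.0],
    "T": [0.0, 0.0, 0.0, 1.0],
    "M": [0.5, 0.5, 0.0, 0.0],  # equal mix of A and C
    "K": [0.0, 0.0, 0.5, 0.5],  # equal mix of G and T
}


def observed_vector(reads, k):
    """X_k: fraction of reads showing each building-block type at location k."""
    counts = Counter(read[k] for read in reads)
    depth = len(reads)  # actual sequencing depth N'
    return [counts[b] / depth for b in BLOCKS]


def euclidean(p, x):
    """Euclidean distance between a letter vector p and an observation x."""
    return math.sqrt(sum((pi - xi) ** 2 for pi, xi in zip(p, x)))


def kl_divergence(p, x, eps=1e-9):
    """KL(p || x); eps is an assumed smoothing term that avoids log(0)."""
    return sum(pi * math.log(pi / max(xi, eps)) for pi, xi in zip(p, x) if pi > 0)


def decode_letter(x, divergence=euclidean):
    """Return the name of the alphabet letter with minimum divergence from x."""
    return min(ALPHABET, key=lambda name: divergence(ALPHABET[name], x))


# Toy reads (made up for illustration): location 1 mixes A/C, location 2 mixes G/T.
reads = ["AAGT", "ACTT", "ACGT", "AATT", "ACGT", "AAGT", "ACTT", "AATT"]
decoded = [decode_letter(observed_vector(reads, k)) for k in range(4)]
decoded_kl = [decode_letter(observed_vector(reads, k), kl_divergence) for k in range(4)]
print(decoded)  # ['A', 'M', 'K', 'T']
```

With real sequencing data the observed fractions would be noisy, so how far apart the letter vectors sit, and how deep the sequencing is, would determine how reliably the minimum-divergence rule recovers the intended letter.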

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12443366B2 cover?
A data storage system and method are provided, as well as systems and methods for fabrication, and writing and reading of data therein. The data storage system includes at least one population of molecular sequences including chains of basic molecular building-blocks, and defining at least one respective data-block encoding data in the data storage system. The data of the data-block is encoded …
Who is the assignee on this patent?
Technion Res & Development Found Ltd
What technology area does this patent fall under?
Primary CPC classification G06F3/0659. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 14, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).