Multi-dimensional mapping of binary data to DNA sequences
US-11810651-B2 · Nov 7, 2023 · US
US12443366B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12443366-B2 |
| Application number | US-202017101824-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 23, 2020 |
| Priority date | May 21, 2018 |
| Publication date | Oct 14, 2025 |
| Grant date | Oct 14, 2025 |
Official abstract text for this publication.
A data storage system and method are provided, as well as systems and methods for fabrication, and writing and reading of data therein. The data storage system includes at least one population of molecular sequences including chains of basic molecular building-blocks, and defining at least one respective data-block encoding data in the data storage system. The data of the data-block is encoded in a sequence $S=(\pi_1, \pi_2, \ldots, \pi_k, \ldots, \pi_{K-1}, \pi_K)$ of encoded letters $\{\pi_k\}$ associated with an alphabet $\Sigma \equiv \{\sigma_m\}_{m=1}^{M}$, which are encoded according to the types of basic molecular building-blocks appearing at the respective locations $k$ along storage segments of the molecular sequences of the population. The molecular sequences include a number $Z$ of different types of basic molecular building-blocks $\{E_n\}_{n=1}^{Z}$, while the alphabet $\Sigma$ has a size $M$ strictly greater than the number $Z$ of types of building-blocks. Each alphabet letter $\sigma_m$ is associated with a vector $\{P^m_n\}_{n=1}^{Z}$ indicative of occurrences of basic molecular building-block $E_n$ of type $n$ in the alphabet letter $\sigma_m$. Accordingly, each encoded letter $\pi_k$ at location $k$ in the storage segments of molecular sequences of the data-block/population is mapped to a corresponding alphabet letter $\sigma_m$ by determining a match between the occurrence of basic molecular building-blocks of different types at that location $k$ of the molecular sequences of the population and the vector $\{P^m_n\}_{n=1}^{Z}$ associated with the alphabet letter $\sigma_m$. In some implementations the component $P^m_n$ of the vector $\{P^m_n\}_{n=1}^{Z}$ associated with alphabet letter $\sigma_m$ is indicative of a probability that a basic molecular building-block $E_n$ of type $n$, $1 \le n \le Z$, appears at the location $k$ of the storage segment of a molecular strand of the at least one population in case the letter $\pi_k$ encoded at that location $k$ corresponds to the alphabet letter $\sigma_m$.
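The abstract's central idea, an alphabet of size M larger than the number Z of building-block types, can be illustrated with a minimal Python sketch. It assumes Z = 4 nucleotides and letters whose probabilities are multiples of 1/resolution; the names `BLOCKS`, `composite_alphabet`, `synthesize_population`, and the `resolution` parameter are hypothetical and do not come from the patent, and random sampling is only a stand-in for chemical synthesis.

```python
import random
from itertools import product

# Assumption: Z = 4 building-block types E_n, here the plain nucleotides.
BLOCKS = ("A", "C", "G", "T")


def composite_alphabet(resolution):
    """Return every probability vector over BLOCKS whose entries are
    multiples of 1/resolution. Each vector plays the role of one letter sigma_m."""
    letters = []
    for counts in product(range(resolution + 1), repeat=len(BLOCKS)):
        if sum(counts) == resolution:
            letters.append(tuple(c / resolution for c in counts))
    return letters


def synthesize_population(sequence, n_molecules, rng):
    """Simulate a population of strands: at each location k the building
    block of every strand is drawn from the letter pi_k at that location."""
    return [
        "".join(rng.choices(BLOCKS, weights=letter)[0] for letter in sequence)
        for _ in range(n_molecules)
    ]


alphabet = composite_alphabet(resolution=2)  # M = 10 letters from Z = 4 blocks
rng = random.Random(0)
# A three-letter sequence S: pure T, an equal C/G mix, pure A.
message = [alphabet[0], alphabet[4], alphabet[9]]
population = synthesize_population(message, n_molecules=1000, rng=rng)
```

With `resolution=2` the sketch yields ten letters from four building blocks, so a single location carries more than the two bits a plain nucleotide would. No single strand reveals a mixed letter; it is visible only in the composition of the whole population at that location.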
Opening claim text (preview).
The invention claimed is:
1. A method for storing data comprising: providing at least one data-block for encoding data in at least one respective population of molecular sequences, said molecular sequences comprising respective molecules which comprise sequences comprising a number $Z$ of different types of basic molecular building-blocks $\{E_n\}_{n=1}^{Z}$, by which the data of the data-block is encoded; encoding the data of the data-block in a sequence $S=(\pi_1, \pi_2, \ldots, \pi_k, \ldots, \pi_{K-1}, \pi_K)$ of encoded letters $\{\pi_k\}$ belonging to an alphabet $\Sigma$, whereby an identity of a letter $\pi_k \in \Sigma$ encoded at a location $k$ in the data-block is indicated by the types of basic molecular building-blocks occurring at the location $k$ in a multitude of molecular sequences of the population, wherein said alphabet $\Sigma$ has a size $M$ greater than the number $Z$ of different types of basic molecular building-blocks used in the data storage system ($M>Z$), and each alphabet letter $\sigma_m$ in the alphabet $\Sigma=\{\sigma_m\}_{m=1}^{M}$ is defined by a vector $P^m_n$ indicative of a composition of the types of basic molecular building-blocks to which the alphabet letter $\sigma_m$ corresponds and whereby a $P^m_n$ in the vector is indicative of whether a basic molecular building-block $E_n$ of certain type $n$ ($1 \le n \le Z$) should occur at a location $k$ in one or more molecular sequences of the population in case the encoded letter $\pi_k$ at that location $k$ corresponds to the alphabet letter $\sigma_m$; and synthesizing said at least one population of molecular sequences in accordance with the sequence $S$, wherein at each location $k$ in a plurality of the molecules of the at least one respective population of molecular sequences a respective composition of the types of basic molecular building-blocks correspond to a respective letter in the sequence $S$.
2. The method of claim 1, wherein data is stored in a plurality of populations of the molecular sequences defining a respective plurality of data-blocks encoding data in the data storage system; and wherein each molecular sequence of the molecular sequences includes a population identification segment comprising an identifying sequence of molecular building-blocks indicative of the population with which said molecular sequence is associated; and wherein said identifying sequence is different in molecular sequences associated with different ones of said plurality of populations.
3. The method of claim 2, wherein the molecular building-blocks of said identifying sequence are selected from said $Z$ types of basic molecular building-blocks.
4. The method of claim 2, wherein a difference between identifying sequences that are used in population identification segments of different respective populations exceeds a predetermined threshold measured by a certain predetermined distance metric of strings.
5. The method of claim 2, wherein molecular sequences of one or more of said plurality of populations are contained together in a common region; and wherein molecular sequences associated with a same population can be exclusively selected by utilizing binding molecules configured and operable for selectively binding to the population identification segment of the molecular sequences associated with said same population.
6. The method of claim 1, wherein data is stored in a plurality of populations of the molecular sequences defining a respective plurality of data-blocks encoding data in the data storage system, and comprising a structure defining a plurality of distinct regions at which molecular sequences of different respective populations reside respectively; and wherein the molecular sequences of different respective populations reside exclusively and respectively at said distinct regions.
7. The method of claim 1, wherein said types of basic molecular building-blocks comprise at least A, C, G, and T nucleotides and/or chemical modifications thereof.
8. The method of claim 1, wherein said types of basic molecular building-blocks are predetermined oligomers of a same length.
9. The method of claim 1, wherein the vector $\{P^m_n\}_{n=1}^{Z}$ is a probability vector defining the alphabet letter $\sigma_m$ and $P^m_n$ indicates a probability that a basic molecular building-block $E_n$ of type $n$, $1 \le n \le Z$, appears at the location $k$ of a storage segment of a molecular strand of said at least one population in case the letter $\pi_k$ which is encoded at that location $k$ corresponds to the alphabet letter $\sigma_m$.
10. The method of claim 9, wherein said at least one population of molecules is adapted to being read with $N$ fold nominal sequencing depth or higher, and wherein each encoded letter $\pi_k$ being read from the position $k$ is represented by an observed probability vector $X_k=\{x_k(E_n)/N\}_{n=1}^{Z}$ whereby $x_k(E_n)$ is a number of times the basic molecular building-block of type $E_n$ was read in the location $k$ out of the $N$ fold sequencing depth, being thereby indicative of an observed probability that the basic molecular building-blocks of type $E_n$, $n=1$ to $Z$, appear in the location $k$.
11. The method of claim 10, wherein mapping between an observed probability vector $X_k$ at the location $k$ and an inferred alphabet letter $\pi_k$ is performed by determining an alphabet letter $\sigma_k$ satisfying a minimum divergence from the observed probability vector $X_k$, $\sigma_k = \arg\min_{\sigma_m,\; m=1,\ldots,M} D(\sigma_m, X_k)$, where $D$ is a divergence function.
12. The method of claim 11, wherein the divergence function $D(\sigma_m, X_k)$ is at least one of the following: an $L^p$ distance function; Euclidean distance $D(\sigma_m, X_k)=\lVert \sigma_m - X_k \rVert$; Kullback-Leibler divergence $D(\sigma_m, X_k)=\mathrm{KL}(\sigma_m, X_k)$.
13. The method of claim 1, wherein the vector $P^m_n$ defining each alphabet letter $\sigma_m$ in the alphabet $\Sigma$ is a probability vector $\sigma_m=\{P^m_n\}_{n=1}^{Z}$ and whereby $P^m_n$ designates a probability of the appearance of basic molecular building-block of type $n$ at location $k$ in the molecular sequences of the population in case the encoded letter $\pi_k$ at that location $k$ corresponds to the alphabet letter $\sigma_m$.
14. A data storage system, comprising at least one population of molecular sequences having data stored thereon according to the method of claim 1.
15. A molecular label comprising the data storage system according to claim 14, wherein said at least one data-block is being respectively encoded by the at least one population of molecular sequences.
16. A method for reading data stored in a molecular data storage system, the method comprising at least the following operations: (i) providing a molecular data storage system comprising a population of molecular sequences defining a data-block of the system, said molecular sequences comprising respective molecules formed with a number $Z$ of different types of basic molecular building-blocks $\{E_n\}_{n=1}^{Z}$, by which the data of the data-block is encoded; (ii) applying sequencing of $N$ fold nominal sequencing depth to the population of molecular sequences to determine, per each location $k$ out of 1 to $K$ locations of storage segments of the molecular sequences of the population, an observed probability vector $X_k=\{x_k(E_n)/N'\}_{n=1}^{Z}$ whereby $x_k(E_n)$ is a number of times, out of an $N'$ fold actual sequencing depth obtained for the population, at which a basic molecular building-block of type $E_n$ was found in the location $k$ in a plurality of said molecules of said population of molecular sequences; (iii) associating each observed probability vector $X_k$ with one of alphabet letters $\{\sigma_m\}$ of an alphabet $\Sigma=\{\sigma_m\}_{m=1}^{M}$, whereby a size of said alphabet $\Sigma$ is greater than the number $Z$ of the different ty
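The read-out described in claims 10 to 12 and in the visible part of claim 16 (count building blocks per location, normalize by the sequencing depth, then pick the alphabet letter with minimum divergence) can be sketched in Python as follows. This is an illustrative sketch under assumptions, not the patented implementation: the function names, the six-letter example alphabet, the example reads, and the `eps` smoothing in the Kullback-Leibler term are all hypothetical, and reads are assumed to be error-free, aligned, and of equal length.

```python
import math
from collections import Counter

# Assumption: Z = 4 building-block types E_n, here the plain nucleotides.
BLOCKS = ("A", "C", "G", "T")


def observed_vectors(reads):
    """Build X_k = {x_k(E_n) / N'} for every location k, where N' is the
    number of reads actually obtained (the actual sequencing depth)."""
    depth = len(reads)
    vectors = []
    for k in range(len(reads[0])):
        counts = Counter(read[k] for read in reads)
        vectors.append(tuple(counts[b] / depth for b in BLOCKS))
    return vectors


def euclidean(sigma, x):
    """Euclidean distance ||sigma - x||."""
    return math.sqrt(sum((s - v) ** 2 for s, v in zip(sigma, x)))


def kl_divergence(sigma, x, eps=1e-9):
    """KL(sigma || x); eps is an assumed smoothing term that avoids log(0)."""
    return sum(s * math.log(s / max(v, eps)) for s, v in zip(sigma, x) if s > 0)


def decode(reads, alphabet, divergence=euclidean):
    """Map each location to the letter sigma_m minimizing D(sigma_m, X_k)."""
    return [
        min(alphabet, key=lambda sigma: divergence(sigma, x))
        for x in observed_vectors(reads)
    ]


# Hypothetical alphabet with M = 6 > Z = 4: four pure letters, two 50/50 mixes.
alphabet = [
    (1.0, 0.0, 0.0, 0.0), (0.0, 1.0, 0.0, 0.0),
    (0.0, 0.0, 1.0, 0.0), (0.0, 0.0, 0.0, 1.0),
    (0.5, 0.5, 0.0, 0.0), (0.0, 0.0, 0.5, 0.5),
]
reads = ["AT", "AT", "AT", "AT", "AT", "CT", "CT", "CT"]  # N' = 8
decoded = decode(reads, alphabet)
decoded_kl = decode(reads, alphabet, divergence=kl_divergence)
```

In this example the first location is observed as 62.5% A and 37.5% C, which is closer to the A/C mixed letter than to pure A under both divergences, while the second location decodes as pure T. At low depth the sampling noise in $X_k$ grows, which is why the claims tie the read-out to a sequencing depth.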
DNA computing · CPC title
comprising bio-molecules · CPC title
related to nanotechnology · CPC title
comprising cells based on organic memory material · CPC title
Single storage device · CPC title