Code generation method, code generating apparatus and computer readable storage medium

US9830553B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9830553-B2
Application number  US-201515502528-A
Country             US
Kind code           B2
Filing date         Jul 31, 2015
Priority date       Aug 8, 2014
Publication date    Nov 28, 2017
Grant date          Nov 28, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A code book is generated for mapping source to target code words which allows encoding source data at reduced probability of incorrect decoding, e.g. for DNA storage. The target code words are grouped (102) into subsets and comprise identifying and remaining portions. The identifying portions of target code words corresponding to a same subset are identical. A first code symbol set of source code words is selected (103) for addressing the subsets. For the subsets, neighboring subsets are determined (104). The identifying portions of the target code words of neighboring subsets differ from those of the corresponding subset by up to a predetermined amount of symbols. Source code words are assigned (105) where the corresponding first code symbols address the same subset to said subset such that an amount of target code words of said subset having their remaining portions identical to their neighboring subsets corresponds to an optimization criterion.
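The grouping (102) and neighbor-determination (104) steps of the abstract can be illustrated with a minimal Python sketch. It is not the patent's implementation. It assumes that the identifying portion is the first `id_len` symbols of each target code word and that "differ by up to a predetermined amount of symbols" means a Hamming distance between 1 and `max_diff`. All function names, the example words, and the parameters `id_len` and `max_diff` are hypothetical. The last helper only counts remaining portions shared with neighboring subsets; the optimization criterion itself is not specified in the abstract and is not modeled here.

```python
from collections import defaultdict
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def group_by_identifying_portion(target_words, id_len):
    """Group target code words by their first id_len symbols.

    Assumption: the identifying portion is a fixed-length prefix.
    """
    subsets = defaultdict(list)
    for word in target_words:
        subsets[word[:id_len]].append(word)
    return dict(subsets)


def neighboring_subsets(subsets, max_diff):
    """Map each identifying portion to the identifying portions that
    differ from it in at least 1 and at most max_diff symbols."""
    neighbors = {key: [] for key in subsets}
    for a, b in combinations(subsets, 2):
        if 1 <= hamming(a, b) <= max_diff:
            neighbors[a].append(b)
            neighbors[b].append(a)
    return neighbors


def shared_remaining_count(subsets, neighbors, key, id_len):
    """Count remaining portions of subset `key` that also occur as
    remaining portions in at least one neighboring subset."""
    own = {w[id_len:] for w in subsets[key]}
    other = {w[id_len:] for n in neighbors[key] for w in subsets[n]}
    return len(own & other)


# Hypothetical 4-symbol target code words over the nucleotide alphabet.
words = ["ACGT", "ACTT", "AGGT", "TGCA"]
subsets = group_by_identifying_portion(words, id_len=2)
neighbors = neighboring_subsets(subsets, max_diff=1)
```

With these example words, the subset `AC` has the single neighbor `AG`, and `shared_remaining_count(subsets, neighbors, "AC", 2)` counts the one remaining portion (`GT`) that both subsets contain.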

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer-implemented code book generation method for mapping a plurality of source code words to a plurality of target code words comprising code symbols corresponding to nucleotides, comprising providing a plurality of source code words and a plurality of target code words; grouping the plurality of target code words into a plurality of subsets of the target code words, the target code words comprising an identifying portion and a remaining portion, wherein the identifying portions of the target code words corresponding to a same subset of the plurality of subsets are identical; selecting a first set of code symbols of the source code words to be associated with the plurality of subsets; determining for the subsets one or more corresponding neighboring subsets within the plurality of subsets, wherein the identifying portions of the target code words of the one or more neighboring subsets differ from the identifying portion of the target code words of the corresponding subset by up to a predetermined amount of code symbols; and assigning source code words where the corresponding first set of code symbols is associated with the same subset, to target code words of said subset such that an amount of the target code words of said subset said source code words are assigned to, having their remaining portions identical to the corresponding remaining portions of the target code words of their neighboring subsets corresponds to an optimization criterion.

2. Method according to claim 1, comprising removing target code words from the plurality of target code words according to a decoding related criterion before grouping the plurality of target code words into a plurality of subsets of the target code words.

3. Method according to claim 2, wherein according to the decoding related criterion target code words that comprise a run length of identical code symbols of more than a predefined maximum run length are removed.

4. Method according to claim 3, wherein target code words that comprise a run length of identical code symbols of more than the predefined maximum run length when being concatenated with another of the target code words are removed.

5. Method according to claim 1, wherein said determining comprises that the identifying portions of the one or more neighboring subsets differ from the corresponding subset by selected symbol flips corresponding to dominant sequencing errors based on a sequencing error probability of nucleotides within nucleic acid strands.

6. Method according to claim 1, wherein the pluralities of source code words and target code words are divided into source code words and target code words of a first code and of a second code, the target code words of the first code and of the second code both having the properties that the reverse complementary word of a target code word of the corresponding code still belongs to the corresponding code, and that there is no common code word between the first code and the second code, and that a target code word of the second code is neither equal to any portion of two cascaded target code words of the first code nor equal to any portion of cascaded one target code word of the first code and one target code word of the second code, and wherein the grouping, selecting, determining and assigning is applied to the first code.

7. Method according to claim 6, wherein the second code is generated according to the following: grouping the plurality of target code words of the second code into a plurality of subsets of the target code words of the second code, the target code words of the second code comprising an identifying portion and a remaining portion, wherein the identifying portions of the target code words of the second code corresponding to a same subset of the plurality of subsets of target code words of the second code are identical; selecting a first set of code symbols of the source code words of the second code to be associated with the plurality of subsets of target code words of the second code; assigning source code words of the second code where the corresponding first set of code symbols is associated with the same subset of target code words of the second code, to said subset according to a cost function minimizing a Hamming distance between the remaining portions of the target code words of the second code.

8. Method according to claim 7, wherein the cost function depends on a symbol error probability.

9. Method according to claim 8, wherein the symbol error probability is based on a sequencing error probability of nucleotides within nucleic acid strands.

10. Method according to claim 1, comprising generating at least one code word sequence from one or more of the target code words; and synthesizing at least one nucleic acid molecule comprising a segment wherein a sequence of nucleotides is arranged to correspond to the at least one code word sequence.

11. Code generating apparatus for mapping a plurality of source code words to a plurality of target code words comprising code symbols corresponding to nucleotides, comprising a first input for receiving target code words and a second input for receiving source code words; a code word grouping unit configured to group the plurality of target code words into a plurality of subsets of the target code words, the target code words comprising an identifying portion and a remaining portion, wherein the identifying portions of the target code words corresponding to a same subset of the plurality of subsets are identical; a selection unit connected to the code word grouping unit and configured to select a first set of code symbols of the source code words to be associated with the plurality of subsets; a determining unit connected to the code word grouping unit and configured to determine for the subsets one or more corresponding neighboring subsets within the plurality of subsets, wherein the identifying portions of the target code words of the one or more neighboring subsets differ from the identifying portion of the target code words of the corresponding subset by up to a predetermined amount of code symbols; and a mapping unit connected to the selection unit and the determining unit and configured to assign source code words where the corresponding first set of code symbols is associated with the same subset, to target code words of said subset such that an amount of the target code words of said subset said source code words are assigned to, having their remaining portions identical to the corresponding remaining portions of the target code words of their neighboring subsets corresponds to an optimization criterion.

12. Apparatus according to claim 11, comprising a code word sequence generating unit configured to generate at least one code word sequence from one or more of the target code words; and a synthesizer unit configured to synthesize at least one nucleic acid molecule comprising a segment wherein a sequence of nucleotides is arranged to correspond to the at least one code word sequence.

13. Computer readable storage medium having stored therein instructions enabling mapping a plurality of source code words to a plurality of target code words comprising code symbols corresponding to nucleotides, which, when executed by a computer, cause the computer to: provide a plurality of source code words and a plurality of target code words; group the plurality of target code words into a plurality of subsets of the target code words, the target code words comprising an identifying portion and a remaining portion, wherein the identifying portions of the target code words corresponding to a same subset of the plurality of subsets are identical; a same subse
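The run-length criterion of claims 3 and 4 can be illustrated with a short Python sketch. This is one possible reading, not the patented procedure. Claim 4 does not say how to decide which word of an offending pair is removed, so the greedy, order-dependent pass below is an assumption, as are the function names, the example words, and the parameter `max_run`.

```python
from itertools import groupby


def max_run_length(word: str) -> int:
    """Length of the longest run of identical symbols in a word."""
    return max((len(list(group)) for _, group in groupby(word)), default=0)


def remove_long_runs(words, max_run):
    """Reading of claim 3: drop words whose own longest run exceeds max_run."""
    return [w for w in words if max_run_length(w) <= max_run]


def remove_long_runs_on_concatenation(words, max_run):
    """Assumed reading of claim 4: keep a word only if concatenating it,
    in either order, with every word kept so far never produces a run
    longer than max_run. The result depends on input order."""
    kept = []
    for w in words:
        if max_run_length(w) > max_run:
            continue  # already excluded by the claim 3 criterion
        if all(
            max_run_length(w + v) <= max_run and max_run_length(v + w) <= max_run
            for v in kept
        ):
            kept.append(w)
    return kept


# Hypothetical target code words and run-length limit.
words = ["ACGT", "AAAA", "TTGC", "CATT"]
filtered = remove_long_runs(words, max_run=3)
filtered_concat = remove_long_runs_on_concatenation(words, max_run=3)
```

In this example `AAAA` fails the claim 3 check on its own, while `CATT` is dropped only by the concatenation check, because `CATT` followed by `TTGC` contains a run of four `T` symbols.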

Assignees

Inventors

Classifications

  • comprising bio-molecules · CPC title

  • G06N3/123 · Primary

    DNA computing · CPC title

  • H03M5/145 · Primary

    Conversion to or from block codes or representations thereof · CPC title

  • Data managing, e.g. manipulating data before writing or reading out, data bus switches or control circuits therefor · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9830553B2 cover?
A code book is generated for mapping source to target code words which allows encoding source data at reduced probability of incorrect decoding, e.g. for DNA storage. The target code words are grouped (102) into subsets and comprise identifying and remaining portions. The identifying portions of target code words corresponding to a same subset are identical. A first code symbol set of source …
Who is the assignee on this patent?
Thomson Licensing
What technology area does this patent fall under?
Primary CPC classification G06N3/123. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 28, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).