Method and apparatus for encoding information units in code word sequences avoiding reverse complementarity

US9774351B2 · US · B2

Patent metadata
Publication number: US-9774351-B2
Application number: US-201515317914-A
Country: US
Kind code: B2
Filing date: Jun 9, 2015
Priority date: Jun 17, 2014
Publication date: Sep 26, 2017
Grant date: Sep 26, 2017


Abstract

Official abstract text for this publication.

A method (100) and an apparatus (200) for encoding information in codeword sequences are described which help avoid synthesizing reverse complementary nucleotide sequences, making them suitable for synthesizing nucleic acid strands. Multiple codes are provided (102), consisting of a same amount of corresponding code words. No word belongs to more than one code. Each code could completely encode all information units which are encoded using code word sequences generated from the codes. Generating (105) a sequence comprises: selecting (106), from code words of a code, a next code word to be appended to the sequence; appending (108) the next code word if a concatenation of the sequence and the next code word does not contain a reverse complementary of any code symbol sequence that at least partly contains the next code word; and otherwise (109) selecting a corresponding next code word from a different code and repeating the appending.
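The generating loop summarized in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function names, the toy codes, the fixed word length, the `window` parameter (standing in for the "defined length" of the checked symbol sequence), and the choice to treat a window that equals its own reverse complement as a violation are all assumptions made for the example.

```python
from typing import Dict, List, Optional, Sequence

# Watson-Crick complement table for the four code symbols.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a string over A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def violates(sequence: str, word: str, window: int) -> bool:
    """Check whether appending `word` introduces a reverse complementary section.

    Every window of `window` symbols that at least partly overlaps the new
    word is examined; if its reverse complement occurs anywhere in the
    concatenation, the word is rejected. Assumption of this sketch: a window
    that is its own reverse complement also counts as a violation.
    """
    candidate = sequence + word
    first = max(0, len(sequence) - window + 1)
    for start in range(first, len(candidate) - window + 1):
        if reverse_complement(candidate[start:start + window]) in candidate:
            return True
    return False


def encode(units: Sequence[int], codes: List[List[str]], window: int) -> Optional[str]:
    """Encode unit indices, switching to another code whenever a word is rejected."""
    sequence = ""
    for unit in units:
        for code in codes:
            word = code[unit]  # the corresponding code word in this code
            if not violates(sequence, word, window):
                sequence += word
                break
        else:
            # No code offers an appendable word. Claim 8 describes removing the
            # most recently appended word and retrying; that is not sketched here.
            return None
    return sequence


def decode(sequence: str, codes: List[List[str]], word_len: int) -> List[int]:
    """Decode unambiguously, relying on no word belonging to more than one code."""
    lookup: Dict[str, int] = {w: i for code in codes for i, w in enumerate(code)}
    return [lookup[sequence[i:i + word_len]] for i in range(0, len(sequence), word_len)]


# Hypothetical toy codes: two disjoint codes, four 2-symbol words each.
CODES = [["AC", "GT", "CA", "TG"], ["AA", "CC", "GG", "TT"]]
encoded = encode([0, 1, 2], CODES, window=4)
print(encoded, decode(encoded, CODES, word_len=2))
```

In this toy run, the second unit's word from the first code would produce "ACGT", which is its own reverse complement, so the corresponding word from the second code is appended instead. Decoding still recovers the original units because each word identifies its unit regardless of which code supplied it.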

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer-implemented method for encoding a plurality of information units in at least one sequence of code words consisting of quaternary code symbols, to be used for storing information in synthesized nucleic acid molecules comprising sequences of nucleotides corresponding to the code words, comprising providing a plurality of codes, consisting of a same amount of corresponding code words, wherein the code words consist of code symbols representing nucleotides and none of the code words belongs to more than one of said codes and each of said codes is capable of completely encoding said plurality of information units, wherein the information units are mapped to corresponding code words; encoding the plurality of information units using at least one code word sequence generated from said plurality of codes during a generating process comprising: selecting, from the plurality of code words of a code of said plurality of codes, a next code word to be appended to the code word sequence; appending the next code word if a concatenation of the code word sequence and the next code word to be appended does not contain at least one section comprising a reverse complementary code symbol sequence of any code symbol sequence of a defined length that at least partly contains the next code word to be appended, wherein the reverse complementary code symbol sequence corresponds to the reverse complementary of a sequence of nucleotides that is represented by the code symbol sequence; and otherwise selecting a corresponding next code word from a different code of said plurality of codes and repeating the appending.

2. The method according to claim 1, wherein the code symbol sequence consists of the next code word to be appended and a most recently appended code word, and the at least one section comprises a reverse complementary code symbol sequence of the next code word to be appended being directly adjacent to a reverse complementary code symbol sequence of the most recently appended code word.

3. The method according to claim 1, wherein the selecting and appending is repeated to generate the code word sequence, until the code word sequence has at least a predefined length or the information units are completely encoded.

4. The method (100) according to claim 1, comprising synthesizing at least one nucleic acid molecule containing a segment wherein a sequence of nucleotides is arranged to correspond to the at least one code word sequence.

5. The method according to claim 4, wherein the synthesizing is performed during the generation of the at least one code word sequence.

6. The method according to claim 1, wherein said code symbol sequence consists of the next code word to be appended and the at least one section comprises the reverse complementary code word of the next code word to be appended.

7. The method according to claim 1, wherein said code symbol sequence consists of the next code word to be appended and a predefined amount of code words previously appended to the code word sequence and said at least one section comprises reverse complementary code words of the next code word to be appended and of the predefined amount of previously appended code words.

8. The method according to claim 1, wherein, if none corresponding code word of the plurality of codes is appendable as the next code word, the most recently appended code word is removed and a corresponding code word of a different code of the plurality of codes is selected as the next code word to be appended.

9. The method according to claim 1, wherein an additional code is provided, consisting of less code words than said plurality of codes, wherein none of the code words of the additional code belongs to any of the plurality of codes, the code words of the additional code have corresponding code words in said plurality of codes and the additional code is capable of incompletely encoding said plurality of information units.

10. The method according to claim 1, wherein the next code word is appended if the concatenation of the code word sequence and the next code word to be appended contains said at least one section and a distance between a location of the most recently appended code word and any location of said at least one section within said code word sequence is greater than a predefined distance.

11. The method according to claim 1, wherein the plurality of codes is generated from an initial plurality of code words having all code words containing a runlength of identical code symbols of more than a maximum runlength, either within a single code word or concatenated with another code word of the initial plurality of code words, removed.

12. An apparatus for encoding a plurality of information units in at least one sequence of code words consisting of quaternary code symbols, to be used for storing information in synthesized nucleic acid molecules comprising sequences of nucleotides corresponding to the code words, comprising a code generator unit configured to provide a plurality of codes, consisting of a same amount of corresponding code words, wherein the code words consist of code symbols representing nucleotides and none of the code words belongs to more than one of said codes and each of said codes is capable of completely encoding said plurality of information units, wherein the information units are mapped to corresponding code words; an information encoder unit configured to encode the plurality of information units using at least one code word sequence generated from said plurality of codes; and a code word sequence generator unit configured to generate said at least one code word sequence at least by: selecting, from the plurality of code words of a code of said plurality of codes, a next code word to be appended to the code word sequence; appending the next code word if a concatenation of the code word sequence and the next code word to be appended does not contain at least one section comprising a reverse complementary code symbol sequence of any code symbol sequence of a defined length that at least partly contains the next code word to be appended, wherein the reverse complementary code symbol sequence corresponds to the reverse complementary of a sequence of nucleotides that is represented by the code symbol sequence; and otherwise selecting a corresponding next code word from a different code of said plurality of codes and repeating the appending.

13. The apparatus according to claim 12, comprising a synthesizer unit configured to synthesize at least one nucleic acid molecule containing a segment wherein a sequence of nucleotides is arranged to correspond to the at least one code word sequence.

14. A non-transitory computer readable storage medium having stored therein instructions enabling encoding of a plurality of information units in at least one sequence of code words consisting of quaternary code symbols, to be used for storing information in synthesized nucleic acid molecules comprising sequences of nucleotides corresponding to the code words, which, when executed by a computer, cause the computer to: provide a plurality of codes, consisting of a same amount of corresponding code words, wherein the code words consist of code symbols representing nucleotides and none of the code words belongs to more than one of said codes and each of said codes is capable of completely encoding said plurality of information units, wherein the information units are mapped to corresponding code words; encode the plurality of information units using at least one code word sequence generated from said plurality of codes; and generate said at
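
Claim 11 describes pruning an initial pool of code words so that no run of identical symbols exceeds a maximum, either inside one word or across a concatenation of two words. A rough sketch of one way to do this is shown below. The specific rule used here (capping the leading run at half the maximum and the trailing run at the remainder) is an assumption chosen because it is a simple sufficient condition; the claim does not prescribe it, and the function names and parameters are hypothetical.

```python
from itertools import groupby, product
from typing import List


def run_lengths(word: str) -> List[int]:
    """Lengths of the maximal runs of identical symbols, left to right."""
    return [len(list(group)) for _, group in groupby(word)]


def filter_words(length: int, max_run: int) -> List[str]:
    """Keep quaternary words whose runs stay within `max_run`, alone or joined.

    Assumed rule: the leading run may be at most max_run // 2 symbols and the
    trailing run at most the remainder, so a trailing run followed by a leading
    run never exceeds max_run. This presumes length > max_run // 2, so that no
    kept word consists of a single run.
    """
    lead_limit = max_run // 2
    trail_limit = max_run - lead_limit
    kept = []
    for symbols in product("ACGT", repeat=length):
        word = "".join(symbols)
        runs = run_lengths(word)
        if max(runs) > max_run:
            continue  # run too long inside the word itself
        if runs[0] > lead_limit or runs[-1] > trail_limit:
            continue  # could form a too-long run when concatenated
        kept.append(word)
    return kept


words = filter_words(length=3, max_run=3)
print(len(words), words[:5])
```

The surviving pool would then still need to be partitioned into several disjoint codes of equal size, which this sketch does not attempt.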

Assignees

Thomson Licensing

Inventors

Classifications

  • Conversion to or from block codes or representations thereof · CPC title

  • G06N3/123 · Primary

    DNA computing · CPC title

  • H03M7/46 · Primary

    Conversion to or from run-length codes, i.e. by representing the number of consecutive digits, or groups of digits, of the same kind by a code word and a digit indicative of that kind · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9774351B2 cover?
A method (100) and an apparatus (200) for encoding information in codeword sequences are described which help avoid synthesizing reverse complementary nucleotide sequences, making them suitable for synthesizing nucleic acid strands. Multiple codes are provided (102), consisting of a same amount of corresponding code words. No word belongs to more than one code. Each code could completely …
Who is the assignee on this patent?
Thomson Licensing
What technology area does this patent fall under?
Primary CPC classification G06N3/123. Mapped technology areas include Physics.
When was this patent published?
Publication date: Sep 26, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).