Methods, systems, and software for identifying bio-molecules using models of multiplicative form

US9684771B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9684771-B2
Application number: US-201414167713-A
Country: US
Kind code: B2
Filing date: Jan 29, 2014
Priority date: Jan 31, 2013
Publication date: Jun 20, 2017
Grant date: Jun 20, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present invention provides methods for identifying bio-molecules with desired properties, or which are most suitable for acquiring such properties, from complex bio-molecule libraries or sets of such libraries. More specifically, some embodiments of the present invention provide methods for building sequence-activity models comprising multiplicative terms and using the models to guide directed evolution. In some embodiments, the sequence-activity models include one or more interaction terms, each of which includes an interaction coefficient representing the contribution to activity of two or more defined residues. In some embodiments, the models describe the relation between protein or nucleic acid sequences and protein activities. In some embodiments, the present invention also provides methods for preparing sequence-activity models, including but not limited to stepwise addition or subtraction techniques, Bayesian regression, ensemble regression and other methods. The present invention further provides digital systems and software for performing the methods provided herein.
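
As a rough illustration of what building a sequence-activity model of multiplicative form can look like, the sketch below fits a model of the form y = w0 × Π(1 + c_i × x_i), the (1 + coefficient × independent variable) shape recited in claim 16, with 0/1 dummy variables. It takes logarithms and uses ordinary least squares, which is an assumption made here for simplicity and not one of the fitting techniques the abstract names (stepwise, Bayesian, or ensemble regression). The function names, the toy variants, and the coefficient values are all hypothetical.

```python
import math

def solve_linear(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_multiplicative(x_rows, activities):
    """Fit y = w0 * prod(1 + c_i * x_i) for 0/1 dummy variables x_i.

    For x_i in {0, 1}, log y = log w0 + sum(x_i * log(1 + c_i)), which is
    linear in x_i, so least squares applies. Activities must be positive.
    """
    design = [[1.0] + [float(v) for v in row] for row in x_rows]
    log_y = [math.log(y) for y in activities]
    k = len(design[0])
    # Normal equations: (A^T A) beta = A^T log_y
    ata = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * ly for r, ly in zip(design, log_y)) for i in range(k)]
    beta = solve_linear(ata, aty)
    w0 = math.exp(beta[0])
    coeffs = [math.exp(b) - 1.0 for b in beta[1:]]
    return w0, coeffs

# Toy data invented for illustration: 3 candidate mutations, 5 variants.
variants = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]]
true_w0, true_c = 2.0, [0.5, -0.2, 0.1]
activities = [
    true_w0 * math.prod(1 + c * x for c, x in zip(true_c, row))
    for row in variants
]

w0, coeffs = fit_multiplicative(variants, activities)
print(round(w0, 6), [round(c, 6) for c in coeffs])
```

Because the toy activities are generated without noise from the same model form, the fit should recover the generating values; with real assay data the log transform also changes how measurement error is weighted, which is one reason other fitting methods may be preferred.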

First claim

Opening claim text (preview).

What is claimed is:
1. A method of conducting directed evolution of one or more polypeptide or polynucleotide molecules, the method comprising, (a) receiving sequence data of a plurality of polypeptide molecules or a plurality of polynucleotide molecules encoding the plurality of polypeptide molecules, wherein the sequence data comprises identities and positions of a plurality of amino acids for each molecule of the plurality of polypeptide molecules or a plurality of nucleotides for each molecule of the plurality of polynucleotide molecules; (b) receiving activity data of the plurality of polypeptide molecules; (c) fitting a sequence-activity model to the received sequence data and the received activity data, wherein the sequence-activity model receives as one or more inputs one or more amino acids of a polypeptide molecule or one or more nucleotides of a polynucleotide molecule encoding the polypeptide molecule, the sequence-activity model provides as an output an activity of the polypeptide molecule, the sequence-activity model has a form comprising a product of a plurality of terms, and each of two or more of the plurality of terms comprises an independent variable representing at least one amino acid or at least one nucleotide at a sequence position and a coefficient representing a contribution to the activity by the at least one amino acid or the at least one nucleotide at the sequence position; (d) determining one or more amino acid sequences or one or more nucleic acid sequences using the sequence-activity model; (e) synthesizing one or more amino acid molecules or one or more nucleic acid molecules based on the one or more amino acid sequences or one or more nucleic acid sequences; and (f) recombining or performing mutagenesis on the one or more amino acid molecules or one or more nucleic acid molecules to provide the one or more polypeptide or polynucleotide molecules.
2. The method of claim 1, wherein (d) comprises: selecting one or more mutations for a round of directed evolution by evaluating the coefficients of the two or more of the plurality of terms of the sequence-activity model to identify one or more defined amino acids or nucleotides at defined sequence positions that contribute to the activity; and determining a plurality of oligonucleotides containing or encoding the one or more mutations, wherein the plurality of oligonucleotides comprise at least portions of the one or more nucleic acid sequences.
3. The method of claim 2, wherein selecting mutations for a round of directed evolution comprises identifying one or more coefficients that are determined to be larger than others of the coefficients, and selecting the defined amino acid or nucleotide at a defined position represented by the one or more coefficients so identified.
4. The method of claim 2, wherein the recombining in (f) comprises shuffling a plurality of oligonucleotides containing or encoding the one or more mutations.
5. The method of claim 2, further comprises synthesizing the plurality of oligonucleotides using a nucleic acid synthesizer.
6. The method of claim 2, wherein the one or more mutations are located at the defined sequence positions or associated with the one or more defined amino acids or nucleotides.
7. The method of claim 1, wherein (f) comprises fragmenting and recombining a polynucleotide molecule encoding a polypeptide molecule that is predicted by the sequence-activity model to have a desired level of activity.
8. The method of claim 1, wherein (f) comprises performing saturation mutagenesis on a polypeptide molecule that is predicted by the model to have a desired level of activity.
9. The method of claim 1, wherein (d) comprises: selecting one or more mutations by evaluating the coefficients of the two or more of the plurality of terms of the sequence-activity model to identify one or more defined amino acids or nucleotides at defined sequence positions that contribute to the activity; and identifying a new protein or a new nucleic acid sequence comprising the one or more mutations.
10. The method of claim 9, further comprising using the new protein or new nucleic acid sequence as a starting point for further directed evolution.
11. The method of claim 9, further comprising conducting saturation mutagenesis at one or more positions of the selected mutations.
12. The method of claim 1, wherein (d) comprises: selecting one or more positions in an amino acid sequence or nucleic acid sequence by evaluating the coefficients of the two or more of the plurality of terms of the sequence-activity model to identify one or more defined amino acids or nucleotides at the one or more positions that contribute to the activity; and conducting saturation mutagenesis at the one or more positions.
13. The method of claim 1, wherein (d) comprises: applying multiple protein sequences or multiple amino acid sequences to the sequence-activity model and determining activity values predicted by the sequence-activity model for each of the multiple protein sequences or nucleic acid sequences; and selecting a new protein sequence or a new nucleic acid sequence from among the multiple protein sequences or multiple amino acid sequences by evaluating the activity values predicted by the sequence-activity model for the multiple sequences; and wherein (e) comprises: preparing and assaying a protein having the new protein sequence or a protein encoded by the new nucleic acid sequence.
14. The method of claim 13, wherein preparing the protein having the new protein sequence or the protein encoded by the new nucleic acid sequence comprises synthesizing a new protein molecule or a new nucleic acid molecule corresponding to the new protein sequence or the new nucleic acid sequence.
15. The method of claim 1, wherein each of the two or more of the plurality of terms comprises a product of the coefficient and the independent variable, wherein the independent variable represents the presence or absence of the at least one amino acid or the at least one nucleotide at the sequence position.
16. The method of claim 15, wherein each of the two or more of the plurality of terms are provided in the form of (1+coefficient×independent variable).
17. The method of claim 15, wherein the at least one amino acid or the at least one nucleotide at the sequence position is one amino acid or one nucleotide at the sequence position.
18. The method of claim 17, wherein the independent variable is a dummy variable with value 0 representing the absence of the amino acid or the nucleotide at the sequence position, and value 1 representing the presence of the amino acid or the nucleotide at the sequence position.
19. The method of claim 15, wherein the at least one amino acid or the at least one nucleotide at the sequence position comprises two amino acids or two nucleotides at the sequence position alternatively.
20. The method of claim 19, wherein the independent variable is a dummy variable with value −1 representing the presence of one of the two amino acids or one of the two nucleotides, and value 1 representing the presence of another of the two amino acids or another of the two nucleotides.
21. The method of claim 1, wherein the coefficients are provided in a look-up table.
22. The method of claim 1, further comprising forming a protein variant library using the one or more polypeptide or a polynucleotide molecules provided by operation (f).
23. The method o
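
To make the claim language more concrete, here is a minimal sketch of how a product-form model with 0/1 dummy variables (claims 15 to 18) and a coefficient look-up table (claim 21) could be used to score candidate sequences (claim 13) and to pick out the largest coefficients (claim 3). The positions, residues, coefficient values, baseline `W0`, and all function names are hypothetical and chosen only for illustration; they do not come from the patent.

```python
from math import prod

# Hypothetical look-up table: (sequence position, residue) -> coefficient.
COEFFS = {(10, "A"): 0.40, (10, "G"): -0.15, (25, "L"): 0.25, (47, "K"): -0.30}
W0 = 1.0  # assumed baseline activity of the reference backbone

def predict_activity(residues, coeffs=COEFFS, w0=W0):
    """Product-form model: w0 * prod(1 + c * x).

    x is a dummy variable: 1 if the residue is present at that position
    in the candidate, 0 otherwise.
    """
    return w0 * prod(
        1.0 + c * (1 if residues.get(pos) == aa else 0)
        for (pos, aa), c in coeffs.items()
    )

def rank_candidates(candidates):
    """Order candidate names by predicted activity, highest first."""
    return sorted(
        candidates,
        key=lambda name: predict_activity(candidates[name]),
        reverse=True,
    )

def top_mutations(coeffs, n):
    """Return the n (position, residue) pairs with the largest coefficients."""
    return sorted(coeffs, key=coeffs.get, reverse=True)[:n]

# Candidates described only at the modeled positions (position -> residue).
candidates = {
    "variant_1": {10: "A", 25: "L", 47: "R"},
    "variant_2": {10: "G", 25: "L", 47: "K"},
    "variant_3": {10: "A", 25: "V", 47: "K"},
}

ranking = rank_candidates(candidates)
best_mutations = top_mutations(COEFFS, 2)
print(ranking, best_mutations)
```

In this toy setup a term contributes a factor of 1 when its residue is absent, so each present residue scales the prediction by (1 + coefficient) independently of the others; interaction terms, which the abstract also mentions, are left out of the sketch.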

Assignees

Codexis Inc

Inventors

Classifications

  • In silico combinatorial chemistry

  • ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks

  • ICT specially adapted for analysing two-dimensional [2D] or three-dimensional [3D] molecular structures, e.g. structural or functional relations or structure alignment

  • ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations

  • G16B35/00 (Primary)

    ICT specially adapted for in silico combinatorial libraries of nucleic acids, proteins or peptides

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Codexis Inc
What technology area does this patent fall under?
Primary CPC classification G16B35/00. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 20, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).