Designing and folding structural proteins from the primary amino acid sequence

US12562236B2 · US · B2

Patent metadata
Field: Value
Publication number: US-12562236-B2
Application number: US-201916585679-A
Country: US
Kind code: B2
Filing date: Sep 27, 2019
Priority date: Sep 27, 2019
Publication date: Feb 24, 2026
Grant date: Feb 24, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method, computer system, and a computer program product for designing one or more folded structural proteins from at least one raw amino acid sequence is provided. The present invention may include computing one or more character embeddings based on the at least one raw amino acid sequence by utilizing a multi-scale neighborhood-based neural network (MNNN) model. The present invention may then include refining the computed one or more character embeddings with at least one set of sequence neighborhood information. The present invention may further include predicting one or more dihedral angles based on the refined one or more character embeddings.
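The first two steps in the abstract (per-residue character embeddings, then refinement with sequence neighborhood information) can be pictured with a small sketch. This is not the patented MNNN model: the one-hot encoding, the window radii `(1, 3, 7)`, the mean pooling, and the function names `one_hot` and `multi_scale_features` are all assumptions made for illustration only.

```python
import numpy as np

# The 20 standard amino acids, one letter each (illustrative alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(sequence):
    """Encode each residue as a 20-dimensional one-hot vector."""
    emb = np.zeros((len(sequence), len(AMINO_ACIDS)))
    for pos, aa in enumerate(sequence):
        emb[pos, AA_INDEX[aa]] = 1.0
    return emb


def multi_scale_features(sequence, radii=(1, 3, 7)):
    """Concatenate each residue's own embedding with the mean embedding
    of its sequence neighborhood at several window radii (hypothetical)."""
    emb = one_hot(sequence)
    n = len(sequence)
    blocks = [emb]
    for r in radii:
        ctx = np.zeros_like(emb)
        for i in range(n):
            lo, hi = max(0, i - r), min(n, i + r + 1)  # clip at the termini
            ctx[i] = emb[lo:hi].mean(axis=0)
        blocks.append(ctx)
    return np.concatenate(blocks, axis=1)


features = multi_scale_features("MKTAYIAKQR")
print(features.shape)  # one row per residue, 20 columns per scale
```

In a learned model these fixed averages would be replaced by trainable layers; the sketch only shows how information from windows of different sizes can be attached to each residue before angle prediction.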

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: obtaining at least one raw amino acid sequence; computing one or more amino acid character embeddings based on the at least one raw amino acid sequence by utilizing a multi-scale neighborhood-based neural network (MNNN) model; refining the computed one or more amino acid character embeddings with at least one set of sequence neighborhood information; predicting one or more dihedral angles based on the refined one or more character embeddings; and simulating a protein structure based on the predicted one or more dihedral angles.

2. The method of claim 1, further comprising: predicting at least one set of secondary structured information by utilizing a multilayer perception (MLP) layer of the MNNN model; performing protein structural analysis based on the predicted at least one set of secondary structured information; and predicting one or more dihedral angles associated with at least one next amino acid.

3. The method of claim 1, further comprising: generating an input request for a user, wherein input from the input request allows iterative user interaction in real-time to visualize the refined one or more character embeddings and the predicted one or more dihedral angles.

4. The method of claim 3, wherein generating the input request for the user for the iterative user interaction in real-time to visualize the refined one or more character embeddings and the predicted one or more dihedral angles, further comprises: receiving at least one piece of feedback on the predicted one or more dihedral angles by one or more experts; and modifying the MNNN model based on the received at least one piece of feedback.

5. The method of claim 1, further comprising: implementing one or more protein structural analysis based on the predicted one or more dihedral angles.

6. The method of claim 1, further comprising: transmitting at least one training set from at least one protein data file, wherein the at least one protein data file is associated with a known protein from a protein database; extracting a set of sequence data and a set of phi-psi angle data from the at least one protein data file; computing one or more known dihedral angles by analyzing a natural distribution of one or more phi-psi angles associated with the transmitted at least one training set; projecting the one or more known dihedral angles to a center of domains to reduce a degree of freedom using the natural distribution of the one or more known dihedral angles; and building a MNNN model to recognize one or more hidden patterns in a protein sequence associated with the transmitted at least one training set and the one or more phi-psi angles, wherein the one or more hidden patterns are mapped to the protein sequence for each amino acid in one or more proteins with a known 3D structure in the protein database.

7. The method of claim 6, further comprising: applying the built MNNN model to a second protein with a second 3D structure; predicting the one or more phi-psi angles associated with the second 3D structure; combining one or more raw sequence information and the predicted one or more phi-psi angles corresponding with the second 3D structure; and generating one or more folded structural proteins based on a translation of the combined one or more raw sequence information and the predicted one or more phi-psi angles corresponding with the second 3D structure to backbone orientation, wherein a plurality of intrinsic coordinates parameters is included with the generated one or more folded structural proteins.

8. The method of claim 7, further comprising: quantifying an acceleration of the predicted one or more phi-psi angles for the second 3D structure and a level of stability associated with the generated one or more folded structural proteins, wherein the acceleration is quantified by running one or more molecular dynamics simulations; and validating the generated one or more folded structural proteins.

9. The method of claim 8, wherein validating the generated one or more folded structural proteins, further comprises: comparing the generated one or more folded structural proteins with one or more experimental synthesis, wherein the one or more experimental synthesis is selected from the group consisting of existing experimental data and peptide synthesis data; and characterizing one or more structural features associated with the compared one or more folded structural proteins with a plurality of other similar protein structures.

10. The method of claim 1, wherein computing one or more character embeddings based on the at least one raw amino acid sequence by utilizing the MNNN model, further comprises: computing the one or more character embeddings in an absence of at least one template, at least one piece of co-evolution information, and at least one piece of structural biological knowledge.

11. A computer system for designing one or more folded structural proteins from at least one raw amino acid sequence, comprising: one or more processors, one or more computer-readable memories, one or more computer-readable tangible storage medium, and program instructions stored on at least one of the one or more tangible storage medium for execution by at least one of the one or more processors via at least one of the one or more memories, wherein the computer system is capable of performing a method comprising: obtaining the at least one raw amino acid sequence; computing one or more amino acid character embeddings based on the at least one raw amino acid sequence by utilizing a multi-scale neighborhood-based neural network (MNNN) model; refining the computed one or more amino acid character embeddings with at least one set of sequence neighborhood information; predicting one or more dihedral angles based on the refined one or more character embeddings; and simulating a protein structure based on the predicted one or more dihedral angles.

12. The computer system of claim 11, further comprising: predicting at least one set of secondary structured information by utilizing a multilayer perception (MLP) layer of the MNNN model; performing protein structural analysis based on the predicted at least one set of secondary structured information; and predicting one or more dihedral angles associated with at least one next amino acid.

13. The computer system of claim 11, further comprising: generating an input request for a user, wherein input from the input request allows iterative user interaction in real-time to visualize the refined one or more character embeddings and the predicted one or more dihedral angles.

14. The computer system of claim 13, wherein generating the input request for the user for iterative user interaction in real-time to visualize the refined one or more character embeddings and the predicted one or more dihedral angles, further comprises: receiving at least one piece of feedback on the predicted one or more dihedral angles by one or more experts; and modifying the MNNN model based on the received at least one piece of feedback.

15. The computer system of claim 11, further comprising: implementing one or more protein structural analysis based on the predicted one or more dihedral angles; and generating the one or more folded structural proteins based on the implemented one or more protein structural analysis.

16. The computer system of claim 11, further comprising: transmitting at least one training set from at least one protein data file, whe
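Claims 1 and 7 describe turning predicted phi-psi angles into a backbone structure. The claims do not say how that translation is done, so the sketch below swaps in a common technique, the Natural Extension Reference Frame (NeRF) construction, to place N, CA, and C atoms one at a time. The bond lengths and bond angles are approximate textbook values, and the names `place_atom`, `dihedral`, and `build_backbone` are hypothetical; none of them come from the patent.

```python
import numpy as np

# Approximate ideal backbone geometry (assumed values, not from the patent).
BOND = {"N-CA": 1.458, "CA-C": 1.525, "C-N": 1.329}          # angstroms
ANGLE = {"N-CA-C": 111.0, "CA-C-N": 116.2, "C-N-CA": 121.7}  # degrees
OMEGA = 180.0  # trans peptide bond


def place_atom(a, b, c, bond, angle_deg, torsion_deg):
    """Place atom d with |cd| = bond, angle(b, c, d) = angle_deg,
    and dihedral(a, b, c, d) = torsion_deg (NeRF construction)."""
    angle, torsion = np.radians(angle_deg), np.radians(torsion_deg)
    bc = (c - b) / np.linalg.norm(c - b)
    n = np.cross(b - a, bc)
    n /= np.linalg.norm(n)
    m = np.cross(n, bc)
    return (c - bond * np.cos(angle) * bc
            + bond * np.sin(angle) * np.cos(torsion) * m
            + bond * np.sin(angle) * np.sin(torsion) * n)


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points."""
    b0, b1, b2 = p0 - p1, p2 - p1, p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    return np.degrees(np.arctan2(np.dot(np.cross(b1, v), w), np.dot(v, w)))


def build_backbone(phi, psi):
    """Return a (3 * n, 3) array of N, CA, C coordinates for n residues.
    phi[0] and psi[-1] are ignored because they are undefined at the termini."""
    t = np.radians(ANGLE["N-CA-C"])
    ca = np.array([BOND["N-CA"], 0.0, 0.0])
    atoms = [np.zeros(3), ca,
             ca + BOND["CA-C"] * np.array([-np.cos(t), np.sin(t), 0.0])]
    for i in range(1, len(phi)):
        # Next N: torsion N-CA-C-N is psi of the previous residue.
        atoms.append(place_atom(*atoms[-3:], BOND["C-N"], ANGLE["CA-C-N"], psi[i - 1]))
        # Next CA: torsion CA-C-N-CA is omega.
        atoms.append(place_atom(*atoms[-3:], BOND["N-CA"], ANGLE["C-N-CA"], OMEGA))
        # Next C: torsion C-N-CA-C is phi of the current residue.
        atoms.append(place_atom(*atoms[-3:], BOND["CA-C"], ANGLE["N-CA-C"], phi[i]))
    return np.array(atoms)


# Four residues with helix-like angles (illustrative input, not patent data).
coords = build_backbone([-60.0] * 4, [-45.0] * 4)
print(coords.shape)
```

Recomputing `dihedral` on consecutive atoms of `coords` gives a quick consistency check that the built chain carries the requested angles. Side chains, hydrogens, and the molecular dynamics validation mentioned in claim 8 are outside the scope of this sketch.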

Assignees

Inventors

Classifications

  • Architecture, e.g. interconnection topology · CPC title

  • G16B15/20 · Primary

    Protein or domain folding · CPC title

  • ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding · CPC title

  • Learning methods · CPC title

  • Supervised learning · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12562236B2 cover?
A method, computer system, and a computer program product for designing one or more folded structural proteins from at least one raw amino acid sequence is provided. The present invention may include computing one or more character embeddings based on the at least one raw amino acid sequence by utilizing a multi-scale neighborhood-based neural network (MNNN) model. The present invention may the…
Who is the assignee on this patent?
IBM; Massachusetts Institute of Technology (MIT)
What technology area does this patent fall under?
Primary CPC classification G16B15/20. Mapped technology areas include Physics.
When was this patent published?
Publication date: Feb 24, 2026 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).