Machine learning for determining protein structures
US-2021304847-A1 · Sep 30, 2021 · US
US12562236B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12562236-B2 |
| Application number | US-201916585679-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 27, 2019 |
| Priority date | Sep 27, 2019 |
| Publication date | Feb 24, 2026 |
| Grant date | Feb 24, 2026 |
Official abstract text for this publication.
A method, computer system, and a computer program product for designing one or more folded structural proteins from at least one raw amino acid sequence is provided. The present invention may include computing one or more character embeddings based on the at least one raw amino acid sequence by utilizing a multi-scale neighborhood-based neural network (MNNN) model. The present invention may then include refining the computed one or more character embeddings with at least one set of sequence neighborhood information. The present invention may further include predicting one or more dihedral angles based on the refined one or more character embeddings.
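The three steps named in the abstract (character embeddings, refinement with sequence neighborhood information, dihedral angle prediction) can be illustrated with a minimal NumPy sketch. This is not the patented MNNN model: the function names (`embed`, `refine_with_neighborhoods`, `predict_dihedrals`, `demo`), the window sizes, the layer sizes, and the random untrained weights are all assumptions chosen for illustration, and the window-mean refinement is only one plausible reading of "multi-scale neighborhood".

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one letter each
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def embed(sequence, table):
    """Look up one embedding vector per residue character."""
    return table[[AA_INDEX[aa] for aa in sequence]]


def refine_with_neighborhoods(emb, window_sizes=(3, 7, 15)):
    """Append the mean embedding of each residue's neighborhood at several scales.

    Window sizes are assumed to be odd so each window is centered on a residue.
    """
    length = emb.shape[0]
    features = [emb]
    for w in window_sizes:
        half = w // 2
        # Repeat the edge residues so windows near the termini stay full-sized.
        padded = np.pad(emb, ((half, half), (0, 0)), mode="edge")
        means = np.stack([padded[i:i + w].mean(axis=0) for i in range(length)])
        features.append(means)
    return np.concatenate(features, axis=1)


def predict_dihedrals(features, w1, b1, w2, b2):
    """One hidden layer, then (sin, cos) pairs decoded to phi and psi in degrees."""
    hidden = np.tanh(features @ w1 + b1)
    out = hidden @ w2 + b2  # columns: sin(phi), cos(phi), sin(psi), cos(psi), unnormalized
    phi = np.degrees(np.arctan2(out[:, 0], out[:, 1]))
    psi = np.degrees(np.arctan2(out[:, 2], out[:, 3]))
    return phi, psi


def demo(sequence, embed_dim=8, hidden_dim=16, seed=0):
    """Run the pipeline with random (untrained) weights, purely to show the data flow."""
    rng = np.random.default_rng(seed)
    table = rng.normal(size=(len(AMINO_ACIDS), embed_dim))
    features = refine_with_neighborhoods(embed(sequence, table))
    w1 = rng.normal(size=(features.shape[1], hidden_dim))
    b1 = np.zeros(hidden_dim)
    w2 = rng.normal(size=(hidden_dim, 4))
    b2 = np.zeros(4)
    return predict_dihedrals(features, w1, b1, w2, b2)
```

Calling `demo("MKTAYIAKQR")` returns one phi and one psi value per residue. Because the weights are random, the angles carry no structural meaning; a real model would be trained on sequence and phi-psi data such as the training set described in the claims.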
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method comprising: obtaining at least one raw amino acid sequence; computing one or more amino acid character embeddings based on the at least one raw amino acid sequence by utilizing a multi-scale neighborhood-based neural network (MNNN) model; refining the computed one or more amino acid character embeddings with at least one set of sequence neighborhood information; predicting one or more dihedral angles based on the refined one or more character embeddings; and simulating a protein structure based on the predicted one or more dihedral angles.
2. The method of claim 1, further comprising: predicting at least one set of secondary structured information by utilizing a multilayer perception (MLP) layer of the MNNN model; performing protein structural analysis based on the predicted at least one set of secondary structured information; and predicting one or more dihedral angles associated with at least one next amino acid.
3. The method of claim 1, further comprising: generating an input request for a user, wherein input from the input request allows iterative user interaction in real-time to visualize the refined one or more character embeddings and the predicted one or more dihedral angles.
4. The method of claim 3, wherein generating the input request for the user for the iterative user interaction in real-time to visualize the refined one or more character embeddings and the predicted one or more dihedral angles, further comprises: receiving at least one piece of feedback on the predicted one or more dihedral angles by one or more experts; and modifying the MNNN model based on the received at least one piece of feedback.
5. The method of claim 1, further comprising: implementing one or more protein structural analysis based on the predicted one or more dihedral angles.
6. The method of claim 1, further comprising: transmitting at least one training set from at least one protein data file, wherein the at least one protein data file is associated with a known protein from a protein database; extracting a set of sequence data and a set of phi-psi angle data from the at least one protein data file; computing one or more known dihedral angles by analyzing a natural distribution of one or more phi-psi angles associated with the transmitted at least one training set; projecting the one or more known dihedral angles to a center of domains to reduce a degree of freedom using the natural distribution of the one or more known dihedral angles; and building a MNNN model to recognize one or more hidden patterns in a protein sequence associated with the transmitted at least one training set and the one or more phi-psi angles, wherein the one or more hidden patterns are mapped to the protein sequence for each amino acid in one or more proteins with a known 3D structure in the protein database.
7. The method of claim 6, further comprising: applying the built MNNN model to a second protein with a second 3D structure; predicting the one or more phi-psi angles associated with the second 3D structure; combining one or more raw sequence information and the predicted one or more phi-psi angles corresponding with the second 3D structure; and generating one or more folded structural proteins based on a translation of the combined one or more raw sequence information and the predicted one or more phi-psi angles corresponding with the second 3D structure to backbone orientation, wherein a plurality of intrinsic coordinates parameters is included with the generated one or more folded structural proteins.
8. The method of claim 7, further comprising: quantifying an acceleration of the predicted one or more phi-psi angles for the second 3D structure and a level of stability associated with the generated one or more folded structural proteins, wherein the acceleration is quantified by running one or more molecular dynamics simulations; and validating the generated one or more folded structural proteins.
9. The method of claim 8, wherein validating the generated one or more folded structural proteins, further comprises: comparing the generated one or more folded structural proteins with one or more experimental synthesis, wherein the one or more experimental synthesis is selected from the group consisting of existing experimental data and peptide synthesis data; and characterizing one or more structural features associated with the compared one or more folded structural proteins with a plurality of other similar protein structures.
10. The method of claim 1, wherein computing one or more character embeddings based on the at least one raw amino acid sequence by utilizing the MNNN model, further comprises: computing the one or more character embeddings in an absence of at least one template, at least one piece of co-evolution information, and at least one piece of structural biological knowledge.
11. A computer system for designing one or more folded structural proteins from at least one raw amino acid sequence, comprising: one or more processors, one or more computer-readable memories, one or more computer-readable tangible storage medium, and program instructions stored on at least one of the one or more tangible storage medium for execution by at least one of the one or more processors via at least one of the one or more memories, wherein the computer system is capable of performing a method comprising: obtaining the at least one raw amino acid sequence; computing one or more amino acid character embeddings based on the at least one raw amino acid sequence by utilizing a multi-scale neighborhood-based neural network (MNNN) model; refining the computed one or more amino acid character embeddings with at least one set of sequence neighborhood information; predicting one or more dihedral angles based on the refined one or more character embeddings; and simulating a protein structure based on the predicted one or more dihedral angles.
12. The computer system of claim 11, further comprising: predicting at least one set of secondary structured information by utilizing a multilayer perception (MLP) layer of the MNNN model; performing protein structural analysis based on the predicted at least one set of secondary structured information; and predicting one or more dihedral angles associated with at least one next amino acid.
13. The computer system of claim 11, further comprising: generating an input request for a user, wherein input from the input request allows iterative user interaction in real-time to visualize the refined one or more character embeddings and the predicted one or more dihedral angles.
14. The computer system of claim 13, wherein generating the input request for the user for iterative user interaction in real-time to visualize the refined one or more character embeddings and the predicted one or more dihedral angles, further comprises: receiving at least one piece of feedback on the predicted one or more dihedral angles by one or more experts; and modifying the MNNN model based on the received at least one piece of feedback.
15. The computer system of claim 11, further comprising: implementing one or more protein structural analysis based on the predicted one or more dihedral angles; and generating the one or more folded structural proteins based on the implemented one or more protein structural analysis.
16. The computer system of claim 11, further comprising: transmitting at least one training set from at least one protein data file, whe
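Claim 6 refers to extracting phi-psi angle data and projecting known dihedral angles to a center of domains. A minimal sketch of both ideas follows, under stated assumptions: `dihedral` uses the standard four-point torsion formula, and `project_to_domain` snaps a (phi, psi) pair to the nearest of a few hypothetical domain centers. The names and the center values in `DOMAIN_CENTERS` are illustrative approximations of common Ramachandran regions, not values taken from the patent, and the patent may define its domains differently.

```python
import numpy as np

# Hypothetical domain centers in degrees (phi, psi); illustrative values only.
DOMAIN_CENTERS = {
    "alpha_helix": (-60.0, -45.0),
    "beta_sheet": (-120.0, 130.0),
    "left_handed_helix": (60.0, 45.0),
}


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees defined by four points, e.g. backbone atoms."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # Components of b0 and b2 perpendicular to the central bond b1.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return np.degrees(np.arctan2(y, x))


def angular_difference(a, b):
    """Signed difference a - b in degrees, wrapped into [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def project_to_domain(phi, psi, centers=DOMAIN_CENTERS):
    """Return the name and center of the domain nearest to (phi, psi).

    Distance respects the 360-degree periodicity of both angles.
    """
    best_name, best_dist = None, float("inf")
    for name, (c_phi, c_psi) in centers.items():
        dist = np.hypot(angular_difference(phi, c_phi),
                        angular_difference(psi, c_psi))
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name, centers[best_name]
```

Replacing each residue's continuous angle pair with the label of its nearest center turns angle regression into a small classification problem, which is one way to read "reduce a degree of freedom". The wrapped distance matters near the plot edges: a psi of -175 degrees is treated as close to +130 degrees rather than far from it.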
Architecture, e.g. interconnection topology · CPC title
Protein or domain folding · CPC title
ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding · CPC title
Learning methods · CPC title
Supervised learning · CPC title