Models for Targeted Sequencing
US-2024321389-A1 · Sep 26, 2024 · US
US11923044B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-11923044-B1 |
| Application number | US-202016896907-A |
| Country | US |
| Kind code | B1 |
| Filing date | Jun 9, 2020 |
| Priority date | Jun 9, 2020 |
| Publication date | Mar 5, 2024 |
| Grant date | Mar 5, 2024 |
Official abstract text for this publication.
Techniques for predicting a protein sequence are described. An exemplary method includes receiving a request to predict a missing area of a protein's primary sequence and a corresponding three-dimensional position of the missing area; applying a machine learning model to backbone Cartesian coordinates of the protein's primary sequence and a protein vector of a representation of the protein's primary sequence including the missing area to predict a missing area of the protein primary sequence and a corresponding three-dimensional position for the missing area, wherein the machine learning model is selected from the group consisting of: an attention-based machine learning model, a bidirectional long short term memory-based model, and a convolutional neural network-based model; and outputting a result of the machine learning model.
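As a rough illustration of the kind of request the abstract describes, the sketch below builds a masked, character-based sequence together with per-residue backbone coordinates. The mask character `#`, the token ids, the dictionary layout, and the function names `mask_region` and `build_request` are all hypothetical; the publication does not specify them. Dropping coordinates inside the masked region is likewise an assumption, made here because those positions are what the model is asked to predict.

```python
# IUPAC one-letter codes for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = "#"  # hypothetical mask character; the source does not name one

# Token ids: 0..19 for residues, 20 for the mask token (illustrative only).
VOCAB = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
VOCAB[MASK] = len(AMINO_ACIDS)


def mask_region(sequence, start, end):
    """Replace residues in the half-open range [start, end) with MASK."""
    if not 0 <= start <= end <= len(sequence):
        raise ValueError("mask region out of range")
    return sequence[:start] + MASK * (end - start) + sequence[end:]


def build_request(sequence, coords, start, end):
    """Assemble a hypothetical request dictionary.

    coords holds one (x, y, z) tuple per residue. Coordinates inside the
    masked region are replaced with None (an assumption of this sketch).
    """
    if len(coords) != len(sequence):
        raise ValueError("need one coordinate triple per residue")
    masked = mask_region(sequence, start, end)
    tokens = [VOCAB[ch] for ch in masked]
    masked_coords = [
        None if start <= i < end else xyz for i, xyz in enumerate(coords)
    ]
    return {"sequence": masked, "tokens": tokens, "coords": masked_coords}
```

Calling `build_request("ACDE", coords, 1, 3)` with four coordinate triples would mask the two middle residues and leave the flanking residues and their coordinates intact.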
Claim text.
What is claimed is:

1. A computer-implemented method comprising: receiving, at a protein sequence predictor comprising one or more processors, a request to predict a missing area of a protein primary sequence and a corresponding three-dimensional position of the missing area, the request including a representation of the protein primary sequence, backbone Cartesian coordinates for the protein primary sequence, and an indication of ablations in the protein primary sequence; conditioning the protein primary sequence and the backbone Cartesian coordinates for the protein primary sequence by: passing the representation of the protein primary sequence as input to an attention-based machine learning model of the protein sequence predictor and applying an embedding of the attention-based machine learning model to the representation of the protein primary sequence, obtaining output of a protein vector from the attention-based machine learning model, passing the backbone Cartesian coordinates as input to the protein sequence predictor to capture features in sequence space, and obtaining output of processed backbone Cartesian coordinates from the protein sequence predictor; combining the processed backbone Cartesian coordinates and the protein vector to generate a combined coordinate vector and protein vector; passing the combined coordinate vector and protein vector as input to the attention-based machine learning model; obtaining output of a prediction of the missing area of the protein primary sequence and the corresponding three-dimensional position of the missing area from the attention-based machine learning model; and generating a three-dimensional representation of the protein based on the output of the prediction of the missing area of the protein primary sequence and the corresponding three-dimensional position of the missing area.

2. The computer-implemented method of claim 1, wherein the representation of the protein primary sequence uses an amino acid code consistent with International Union of Pure and Applied Chemistry usage.

3. The computer-implemented method of claim 1, wherein the attention-based machine learning model is a transformer-based model.

4. A computer-implemented method comprising: receiving, at a protein sequence predictor comprising one or more processors, a request to predict a missing area of a protein primary sequence and a corresponding three-dimensional position of the missing area; passing as input to a machine learning model of the protein sequence predictor backbone Cartesian coordinates of the protein primary sequence and a protein vector of a representation of the protein primary sequence including the missing area, wherein the machine learning model is selected from the group consisting of: an attention-based machine learning model, a bidirectional long short term memory-based model, and a convolutional neural network-based model; and obtaining output of a prediction of the missing area of the protein primary sequence and the corresponding three-dimensional position of the missing area from the machine learning model.

5. The computer-implemented method of claim 4, further comprising: applying an embedding of the machine learning model to the representation of the protein primary sequence to generate the protein vector.

6. The computer-implemented method of claim 5, wherein the representation of the protein primary sequence is a character-based representation.

7. The computer-implemented method of claim 6, wherein characters of the character-based representation conform to an amino acid code consistent with International Union of Pure and Applied Chemistry usage.

8. The computer-implemented method of claim 6, wherein the request includes an indication of regions of ablation using a set of mask tokens in the representation of the protein primary sequence.

9. The computer-implemented method of claim 4, wherein the request includes processed backbone Cartesian coordinates for the protein primary sequence and an embedded representation of the protein primary sequence as the protein vector.

10. The computer-implemented method of claim 4, wherein the machine learning model is a transformer-based model.

11. The computer-implemented method of claim 4, wherein the machine learning model is a convolutional neural network-based model comprising a stack of residual block layers.

12. The computer-implemented method of claim 4, wherein the machine learning model is a long short term memory-based model comprising a stack of bidirectional long short term memory-based layers.

13. The computer-implemented method of claim 4, further comprising: generating a 3-D representation from the output of the machine learning model.

14. The computer-implemented method of claim 4, further comprising: combining the backbone Cartesian coordinates of the protein primary sequence and the protein vector of the representation of the protein primary sequence prior to passing the input to the machine learning model.

15. A system comprising: a first one or more electronic devices to implement a three-dimensional generation service in a multi-tenant provider network; and a second one or more electronic devices to implement a protein sequence predictor service in the multi-tenant provider network, the protein sequence predictor service including memory storing instructions that upon execution by one or more processors of the protein sequence predictor service, cause the protein sequence predictor service to: receive a request to predict a missing area of a protein primary sequence and a corresponding three-dimensional position of the missing area, pass as input to a machine learning model of the protein sequence predictor service backbone Cartesian coordinates of the protein primary sequence and a protein vector of a representation of the protein primary sequence including the missing area, wherein the machine learning model is selected from the group consisting of: an attention-based machine learning model, a bidirectional long short term memory-based model, and a convolutional neural network-based model, and obtain output of a prediction of the missing area of the protein primary sequence and the corresponding three-dimensional position of the missing area from the machine learning model, wherein the three-dimensional generation service is to generate a three-dimensional representation of the output.

16. The system of claim 15, wherein the protein sequence predictor service is to apply an embedding of the machine learning model to the representation of the protein primary sequence to generate the protein vector.

17. The system of claim 16, wherein the representation of the protein primary sequence is a character-based representation.

18. The system of claim 17, wherein characters of the character-based representation conform to an amino acid code consistent with International Union of Pure and Applied Chemistry usage.

19. The system of claim 15, wherein the request includes an indication of regions of ablation using a set of mask tokens in the representation of the protein primary sequence.

20. The system of claim 15, wherein the request includes processed backbone Cartesian coordinates for the protein primary sequence and an embedded representation of the protein primary sequence as the protein vector.
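The claims recite embedding the sequence into a protein vector, processing the backbone coordinates, and combining the two before the model sees them, but they do not say how each step is carried out. The sketch below is one possible reading under stated assumptions: the embedding is a random lookup table, "processing" is stood in for by centering known coordinates on their centroid and zero-filling masked positions, and "combining" is per-residue concatenation. All function names and parameters are hypothetical and are not taken from the patent.

```python
import random


def make_embedding(vocab_size, dim, seed=0):
    """Random lookup table standing in for a learned embedding (assumption)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(vocab_size)]


def embed(tokens, table):
    """Map each token id to its embedding row, giving one vector per residue."""
    return [list(table[t]) for t in tokens]


def process_coords(coords):
    """Center known (x, y, z) triples on their centroid; zero-fill None entries.

    This is a stand-in for the unspecified coordinate processing step.
    """
    known = [c for c in coords if c is not None]
    if known:
        centroid = [sum(c[k] for c in known) / len(known) for k in range(3)]
    else:
        centroid = [0.0, 0.0, 0.0]
    return [
        [0.0, 0.0, 0.0] if c is None else [c[k] - centroid[k] for k in range(3)]
        for c in coords
    ]


def combine(protein_vector, processed_coords):
    """Concatenate each residue's embedding with its 3 coordinate features."""
    if len(protein_vector) != len(processed_coords):
        raise ValueError("sequence and coordinate lengths differ")
    return [e + c for e, c in zip(protein_vector, processed_coords)]


# Usage: three residues, the middle one masked (token id 20, no coordinates).
table = make_embedding(vocab_size=21, dim=4)
combined = combine(
    embed([0, 20, 3], table),
    process_coords([(0.0, 0.0, 0.0), None, (2.0, 0.0, 0.0)]),
)
```

Each row of `combined` has seven features here (four embedding values plus three coordinates) and would be the per-residue input to whichever model family is chosen: attention-based, bidirectional LSTM, or convolutional.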
ICT specially adapted for analysing two-dimensional [2D] or three-dimensional [3D] molecular structures, e.g. structural or functional relations or structure alignment · CPC title
Sequence assembly · CPC title
ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding · CPC title
Supervised data analysis · CPC title
Protein or domain folding · CPC title