What technology area does this patent fall under?

Primary CPC classification G16B15/00. Mapped technology areas include Physics.

When was this patent published?

Publication date: March 5, 2024 (kind code B1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Systems, methods, and apparatuses to predict protein sequence and structure

US11923044B1 · US · B1

Patent metadata
Field	Value
Publication number	US-11923044-B1
Application number	US-202016896907-A
Country	US
Kind code	B1
Filing date	Jun 9, 2020
Priority date	Jun 9, 2020
Publication date	Mar 5, 2024
Grant date	Mar 5, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques for predicting a protein sequence are described. An exemplary method includes receiving a request to predict a missing area of a protein's primary sequence and a corresponding three-dimensional position of the missing area; applying a machine learning model to backbone Cartesian coordinates of the protein's primary sequence and a protein vector of a representation of the protein's primary sequence including the missing area to predict a missing area of the protein primary sequence and a corresponding three-dimensional position for the missing area, wherein the machine learning model is selected from the group consisting of: an attention-based machine learning model, a bidirectional long short term memory-based model, and a convolutional neural network-based model; and outputting a result of the machine learning model.
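The abstract names two model inputs: a representation of the primary sequence that includes the missing area, and the backbone Cartesian coordinates. The following Python sketch shows one way such inputs could be prepared. It is illustrative only and is not the patentee's implementation. It assumes IUPAC one-letter amino acid codes (as in claims 2 and 7), a hypothetical `<mask>` token for ablated regions (claim 8 mentions mask tokens but not their form), a one-hot vector standing in for the learned embedding that produces the protein vector (claim 5), and three backbone atoms (N, CA, C) per residue, with zeros as an assumed placeholder where coordinates are unknown. The names `mask_regions`, `one_hot_embed`, and `combine` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# The 20 standard IUPAC one-letter amino acid codes.
IUPAC_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Hypothetical mask token for ablated residues (illustrative, not from the patent).
MASK_TOKEN = "<mask>"
# Token id 0 is the mask; amino acids take ids 1..20.
VOCAB = {MASK_TOKEN: 0, **{aa: i + 1 for i, aa in enumerate(IUPAC_AMINO_ACIDS)}}

Point = Tuple[float, float, float]
Backbone = Tuple[Point, Point, Point]  # assumed N, CA, C atoms per residue


def mask_regions(sequence: str, regions: Sequence[Tuple[int, int]]) -> List[str]:
    """Replace residues in each half-open [start, end) region with MASK_TOKEN."""
    tokens = list(sequence)
    for start, end in regions:
        for i in range(start, end):
            tokens[i] = MASK_TOKEN
    return tokens


def one_hot_embed(tokens: Sequence[str]) -> List[List[float]]:
    """Stand-in for a learned embedding: one one-hot vector per token."""
    vectors = []
    for tok in tokens:
        vec = [0.0] * len(VOCAB)
        vec[VOCAB[tok]] = 1.0  # raises KeyError for non-IUPAC characters
        vectors.append(vec)
    return vectors


def combine(protein_vector: Sequence[Sequence[float]],
            backbone: Sequence[Optional[Backbone]]) -> List[List[float]]:
    """Concatenate each residue's embedding with its 9 flattened backbone coordinates.

    Residues with unknown coordinates (None) get zeros, an assumed placeholder.
    """
    if len(protein_vector) != len(backbone):
        raise ValueError("need one backbone entry per residue")
    combined = []
    for emb, atoms in zip(protein_vector, backbone):
        flat = [0.0] * 9 if atoms is None else [c for atom in atoms for c in atom]
        combined.append(list(emb) + flat)
    return combined


if __name__ == "__main__":
    seq = "MKTAYIAK"
    tokens = mask_regions(seq, [(3, 5)])
    # Toy coordinates, not real geometry.
    coords: List[Optional[Backbone]] = [
        ((float(i), 0.0, 0.0), (float(i) + 1.46, 0.0, 0.0), (float(i) + 2.0, 1.0, 0.0))
        for i in range(len(seq))
    ]
    for i in range(3, 5):
        coords[i] = None  # coordinates of the missing area are to be predicted
    features = combine(one_hot_embed(tokens), coords)
    print(tokens)  # ['M', 'K', 'T', '<mask>', '<mask>', 'I', 'A', 'K']
    print(len(features), len(features[0]))  # 8 30  (21 embedding + 9 coordinate values)
```

Per-residue concatenation is only one possible reading of the "combining" step in claims 1 and 14; it keeps sequence and structure features aligned position by position. A trained model would replace the one-hot table with learned embeddings.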

Claims

Full text of claims 1 to 20.

What is claimed is:
1. A computer-implemented method comprising: receiving, at a protein sequence predictor comprising one or more processors, a request to predict a missing area of a protein primary sequence and a corresponding three-dimensional position of the missing area, the request including a representation of the protein primary sequence, backbone Cartesian coordinates for the protein primary sequence, and an indication of ablations in the protein primary sequence; conditioning the protein primary sequence and the backbone Cartesian coordinates for the protein primary sequence by: passing the representation of the protein primary sequence as input to an attention-based machine learning model of the protein sequence predictor and applying an embedding of the attention-based machine learning model to the representation of the protein primary sequence, obtaining output of a protein vector from the attention-based machine learning model, passing the backbone Cartesian coordinates as input to the protein sequence predictor to capture features in sequence space, and obtaining output of processed backbone Cartesian coordinates from the protein sequence predictor; combining the processed backbone Cartesian coordinates and the protein vector to generate a combined coordinate vector and protein vector; passing the combined coordinate vector and protein vector as input to the attention-based machine learning model; obtaining output of a prediction of the missing area of the protein primary sequence and the corresponding three-dimensional position of the missing area from the attention-based machine learning model; and generating a three-dimensional representation of the protein based on the output of the prediction of the missing area of the protein primary sequence and the corresponding three-dimensional position of the missing area.
2. The computer-implemented method of claim 1, wherein the representation of the protein primary sequence uses an amino acid code consistent with International Union of Pure and Applied Chemistry usage.
3. The computer-implemented method of claim 1, wherein the attention-based machine learning model is a transformer-based model.
4. A computer-implemented method comprising: receiving, at a protein sequence predictor comprising one or more processors, a request to predict a missing area of a protein primary sequence and a corresponding three-dimensional position of the missing area; passing as input to a machine learning model of the protein sequence predictor backbone Cartesian coordinates of the protein primary sequence and a protein vector of a representation of the protein primary sequence including the missing area, wherein the machine learning model is selected from the group consisting of: an attention-based machine learning model, a bidirectional long short term memory-based model, and a convolutional neural network-based model; and obtaining output of a prediction of the missing area of the protein primary sequence and the corresponding three-dimensional position of the missing area from the machine learning model.
5. The computer-implemented method of claim 4, further comprising: applying an embedding of the machine learning model to the representation of the protein primary sequence to generate the protein vector.
6. The computer-implemented method of claim 5, wherein the representation of the protein primary sequence is a character-based representation.
7. The computer-implemented method of claim 6, wherein characters of the character-based representation conform to an amino acid code consistent with International Union of Pure and Applied Chemistry usage.
8. The computer-implemented method of claim 6, wherein the request includes an indication of regions of ablation using a set of mask tokens in the representation of the protein primary sequence.
9. The computer-implemented method of claim 4, wherein the request includes processed backbone Cartesian coordinates for the protein primary sequence and an embedded representation of the protein primary sequence as the protein vector.
10. The computer-implemented method of claim 4, wherein the machine learning model is a transformer-based model.
11. The computer-implemented method of claim 4, wherein the machine learning model is a convolutional neural network-based model comprising a stack of residual block layers.
12. The computer-implemented method of claim 4, wherein the machine learning model is a long short term memory-based model comprising a stack of bidirectional long short term memory-based layers.
13. The computer-implemented method of claim 4, further comprising: generating a 3-D representation from the output of the machine learning model.
14. The computer-implemented method of claim 4, further comprising: combining the backbone Cartesian coordinates of the protein primary sequence and the protein vector of the representation of the protein primary sequence prior to passing the input to the machine learning model.
15. A system comprising: a first one or more electronic devices to implement a three-dimensional generation service in a multi-tenant provider network; and a second one or more electronic devices to implement a protein sequence predictor service in the multi-tenant provider network, the protein sequence predictor service including memory storing instructions that upon execution by one or more processors of the protein sequence predictor service, cause the protein sequence predictor service to: receive a request to predict a missing area of a protein primary sequence and a corresponding three-dimensional position of the missing area, pass as input to a machine learning model of the protein sequence predictor service backbone Cartesian coordinates of the protein primary sequence and a protein vector of a representation of the protein primary sequence including the missing area, wherein the machine learning model is selected from the group consisting of: an attention-based machine learning model, a bidirectional long short term memory-based model, and a convolutional neural network-based model, and obtain output of a prediction of the missing area of the protein primary sequence and the corresponding three-dimensional position of the missing area from the machine learning model, wherein the three-dimensional generation service is to generate a three-dimensional representation of the output.
16. The system of claim 15, wherein the protein sequence predictor service is to apply an embedding of the machine learning model to the representation of the protein primary sequence to generate the protein vector.
17. The system of claim 16, wherein the representation of the protein primary sequence is a character-based representation.
18. The system of claim 17, wherein characters of the character-based representation conform to an amino acid code consistent with International Union of Pure and Applied Chemistry usage.
19. The system of claim 15, wherein the request includes an indication of regions of ablation using a set of mask tokens in the representation of the protein primary sequence.
20. The system of claim 15, wherein the request includes processed backbone Cartesian coordinates for the protein primary sequence and an embedded representation of the protein primary sequence as the protein vector.
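Claims 1, 3, and 10 recite an attention-based (for example, transformer-based) model that consumes the combined coordinate and protein vector, but the claims do not specify its layers. As a hedged illustration of the general mechanism only, here is single-head scaled dot-product self-attention in plain Python, assuming identity query, key, and value projections in place of learned weights. `self_attention` and `softmax` are hypothetical helpers, not part of the patent or any library. Each output row is a weighted average of all input rows, which is how a masked position can draw on its unmasked context.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Sequence[Sequence[float]]) -> Tuple[Matrix, Matrix]:
    """Single-head scaled dot-product self-attention with identity Q/K/V projections.

    Returns (outputs, weights), where weights[i][j] is how much position i attends to j.
    """
    d = len(x[0])
    scale = 1.0 / math.sqrt(d)
    weights: Matrix = []
    for q in x:
        # Score of query q against every key, scaled by 1/sqrt(d).
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in x]
        weights.append(softmax(scores))
    # Each output row is the attention-weighted sum of the value rows (here, x itself).
    outputs: Matrix = [
        [sum(w[j] * x[j][c] for j in range(len(x))) for c in range(d)]
        for w in weights
    ]
    return outputs, weights


if __name__ == "__main__":
    # Third row is all zeros, standing in for a masked residue with no features.
    features = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5], [0.0, 0.0, 0.0]]
    out, w = self_attention(features)
    print([round(sum(row), 6) for row in w])  # [1.0, 1.0, 1.0]
```

A transformer layer would add learned projections, multiple heads, residual connections, and feed-forward sublayers around this core operation.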

Assignees

Amazon Tech Inc

Inventors

Classifications

G16B15/00 (primary): ICT specially adapted for analysing two-dimensional [2D] or three-dimensional [3D] molecular structures, e.g. structural or functional relations or structure alignment
G16B30/20: Sequence assembly
G16B40/00: ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
G16B40/20 (primary): Supervised data analysis
G16B15/20: Protein or domain folding

Patent family

Related publications grouped by family.

Patent family ID: 90062045

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11923044B1 cover?: Techniques for predicting a protein sequence are described. An exemplary method includes receiving a request to predict a missing area of a protein's primary sequence and a corresponding three-dimensional position of the missing area; applying a machine learning model to backbone Cartesian coordinates of the protein's primary sequence and a protein vector of a representation of the protein's pr…
Who is the assignee on this patent?: Amazon Tech Inc