High-order semi-Restricted Boltzmann Machines and Deep Models for accurate peptide-MHC binding prediction
US-2015278441-A1 · Oct 1, 2015 · US
US11557375B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11557375-B2 |
| Application number | US-201917059157-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 14, 2019 |
| Priority date | Aug 20, 2018 |
| Publication date | Jan 17, 2023 |
| Grant date | Jan 17, 2023 |
Official abstract text for this publication.
Techniques are provided for predicting MHC-peptide binding affinity. A plurality of training peptide sequences is obtained, and a neural network model is trained to predict MHC-peptide binding affinity using the training peptide sequences. An encoder of the neural network model comprising an RNN is configured to process an input training peptide sequence to generate a fixed-dimension encoding output by applying a final hidden state of the RNN at intermediate state outputs of the RNN to generate attention weighted outputs, and linearly combining the attention weighted outputs. A fully connected layer following the encoder is configured to process the fixed-dimension encoding output to generate an MHC-peptide binding affinity prediction output. A computing device is configured to use the trained neural network to predict MHC-peptide binding affinity for a test peptide sequence.
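The encoding step described in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration, not the patented implementation: it assumes a simple tanh RNN cell, a dot product as the function that applies the final hidden state to each intermediate state, a softmax to normalize the scores, and a single sigmoid output unit standing in for the fully connected layer. All function names, dimensions, and the random weights are hypothetical.

```python
import math
import random


def random_matrix(rows, cols, rng):
    """Hypothetical helper: small random weights for the sketch."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rnn_states(inputs, W_x, W_h):
    """Run a plain tanh RNN; return the hidden state after each position."""
    h = [0.0] * len(W_h)
    states = []
    for x in inputs:
        pre = [a + b for a, b in zip(matvec(W_x, x), matvec(W_h, h))]
        h = [math.tanh(p) for p in pre]
        states.append(h)
    return states


def attention_encode(states):
    """Score every intermediate state against the final hidden state
    (dot product, an assumption), then linearly combine the states
    using the softmax-normalized scores as attention weights."""
    final = states[-1]
    scores = [sum(f * s for f, s in zip(final, st)) for st in states]
    weights = softmax(scores)
    dim = len(final)
    encoding = [sum(w * st[d] for w, st in zip(weights, states))
                for d in range(dim)]
    return encoding, weights


def predict(encoding, w_out, b_out):
    """Stand-in for the fully connected layer: one sigmoid output unit."""
    z = sum(w * e for w, e in zip(w_out, encoding)) + b_out
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    rng = random.Random(0)
    input_dim, hidden_dim = 20, 4
    W_x = random_matrix(hidden_dim, input_dim, rng)
    W_h = random_matrix(hidden_dim, hidden_dim, rng)
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
    for length in (9, 11):  # two different sequence lengths
        inputs = [[rng.uniform(0, 1) for _ in range(input_dim)]
                  for _ in range(length)]
        encoding, weights = attention_encode(rnn_states(inputs, W_x, W_h))
        print(length, len(encoding), predict(encoding, w_out, 0.0))
```

Because the attention weights sum to one over however many positions the input has, the combined encoding always has the hidden-state dimension, which is what lets sequences of different lengths feed the same downstream layer.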
Opening claim text (preview).
We claim:

1. A computing system-implemented method of predicting major histocompatibility complex (MHC)-peptide binding affinity, the method comprising: obtaining a plurality of training peptide sequences of variable length; training, by one or more computing devices, a recurrent neural network (RNN) model comprising at least one fully connected layer to predict MHC-peptide binding affinity with respect to an MHC allele sequence, wherein training the RNN model comprises, for each training peptide sequence of the plurality of training peptide sequences of variable length, iteratively: inputting the training peptide sequence into the RNN model; generating a fixed-dimension encoding output by processing the training peptide sequence, wherein the processing comprises applying a final hidden state of the RNN model at intermediate states of the RNN model and attention weighting to one or more positions of the training peptide sequence; generating an MHC-peptide binding affinity prediction output between the training peptide sequence and the MHC allele sequence by processing the fixed-dimension encoding output using the at least one fully connected layer of the RNN model; determining a loss factor by comparing the attention weighting to a known MHC-peptide binding affinity value corresponding to the training peptide sequence; and updating at least one parameter of a set of parameters of the RNN model based on the loss factor; inputting a test peptide sequence into the trained RNN model; and generating, by the trained RNN model, an MHC-peptide binding affinity prediction output for the test peptide sequence with respect to the MHC allele sequence.

2. The method of claim 1, wherein applying the final hidden state at an intermediate state of the RNN model comprises taking a dot product, a weighted product, or other function, of the final hidden state and the intermediate state.

3. The method of claim 1, further comprising applying weights learned through the training of the RNN model to the final hidden state prior to applying the final hidden state at intermediate states of the RNN model.

4. The method of claim 1, further comprising concatenating the final hidden state with a final hidden state of an encoder of a second neural network model prior to applying the final hidden state at intermediate states of the RNN model.

5. The method of claim 4, wherein the second neural network model is configured based on the set of parameters of the trained RNN model to predict MHC-peptide binding affinity for an MHC allele input.

6. The method of claim 1, wherein the fixed-dimension encoding output generated by the RNN model comprises one or more positions each corresponding to an amino acid position of a training peptide sequence inputted into the RNN model.

7. The method of claim 6, wherein each of the one or more positions of the fixed-dimension encoding output is a single value.

8. The method of claim 1, wherein the RNN model comprises one of a Long Short Term Memory (LSTM) RNN and Gated Recurrent Unit (GRU) RNN or variant thereof.

9. The method of claim 1, wherein the RNN model comprises a bidirectional RNN model.

10. The method of claim 9, wherein the fixed-dimension encoding output is generated by concatenating outputs of the bidirectional RNN model.

11. The method of claim 1, wherein the plurality of training peptide sequences of variable length comprises two or more sequence lengths.

12. The method of claim 1, wherein the plurality of training peptide sequences is one of one-hot, BLOSUM, PAM, or learned embedding encoded.

13. The method of claim 1, wherein each training peptide sequence of the plurality of training peptide sequences is between 6-20 amino acids in length.

14. The method of claim 1, wherein each training peptide sequence of the plurality of training peptide sequences is between 10-30 amino acids in length.

15. The method of claim 1, wherein each training peptide sequence of the plurality of training peptide sequences is a positive MHC-peptide binding example.

16. The method of claim 1, wherein the test peptide sequence is between 6-20 amino acids in length.

17. The method of claim 1, wherein the test peptide sequence is between 10-30 amino acids in length.

18. The method of claim 1, wherein the test peptide sequence has a sequence length different from a sequence length of at least one of the plurality of training peptide sequences.

19. The method of claim 1, wherein the test peptide sequence is one of one-hot, BLOSUM, PAM, or learned embedding encoded.

20. The method of claim 1, wherein generating MHC-peptide binding affinity prediction output for the test peptide sequence comprises generating a single prediction value.

21. The method of claim 20, wherein the single prediction value relates to a likelihood of activating a T-cell response to a tumor.

22. The method of claim 1, wherein the at least one fully connected layer comprises two fully connected layers.

23. The method of claim 1, wherein the at least one fully connected layer comprises one of a deep convolutional neural network, a residual neural network, a densely connected convolutional neural network, a fully convolutional neural network, or an RNN.

24. The method of claim 1, wherein generating, by the trained RNN model, the MHC-peptide binding affinity prediction output for the test peptide sequence comprises: generating a fixed-dimension encoding output by processing the test peptide sequence, wherein the processing comprises applying a final hidden state of the RNN model at intermediate states of the RNN model and attention weighting to one or more positions of the test peptide sequence, and generating the MHC-peptide binding affinity prediction output by processing the fixed-dimension encoding output using the at least one fully connected layer of the trained RNN model.

25. A computer program product embedded in a non-transitory computer-readable medium comprising instructions executable by a computer processor for predicting major histocompatibility complex (MHC)-peptide binding affinity, which, when executed by a processor, cause the processor to perform one or more steps comprising: obtaining a plurality of training peptide sequences of variable length; training a recurrent neural network (RNN) model comprising at least one fully connected layer to predict MHC-peptide binding affinity with respect to an MHC allele sequence, wherein training the RNN model comprises, for each training peptide sequence of the plurality of training peptide sequences of variable length, iteratively: inputting the training peptide sequence into the RNN model; generating a fixed-dimension encoding output by processing the training peptide sequence, wherein the processing comprises applying a final hidden state of the RNN model at intermediate states of the RNN model and attention weighting to one or more positions of the training peptide sequence; generating an MHC-peptide binding affinity prediction output between the training peptide sequence and the MHC allele sequence by processing the fixed-dimension encoding output using the at least one fully connected layer of the RNN model; determining a loss factor by comparing the attention weighting to a known MHC-peptide binding affinity value corresponding to the training peptide sequence; and updating at least one parameter of a set of parameters of the RNN model based on the loss factor;
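Claims 12 and 19 list one-hot encoding as one option for representing peptide sequences. A minimal sketch of that option follows, assuming the 20 standard amino acids in an arbitrary alphabetical order of their one-letter codes; the names `AMINO_ACIDS`, `AA_INDEX`, and `one_hot_encode`, and the example strings, are illustrative and do not come from the patent.

```python
# Assumed alphabet: the 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(peptide):
    """Return one 20-element row per residue, with a single 1.0 at the
    index of that residue. The number of rows equals the peptide length,
    so variable-length peptides yield variable-length inputs."""
    rows = []
    for aa in peptide.upper():
        if aa not in AA_INDEX:
            raise ValueError(f"unsupported residue: {aa!r}")
        row = [0.0] * len(AMINO_ACIDS)
        row[AA_INDEX[aa]] = 1.0
        rows.append(row)
    return rows


if __name__ == "__main__":
    for peptide in ("SIINFEKL", "GILGFVFTL"):  # example strings only
        encoded = one_hot_encode(peptide)
        print(peptide, len(encoded), len(encoded[0]))
```

Each encoded peptide is a sequence of per-position vectors, which is the kind of input a recurrent encoder consumes one position at a time.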
Supervised data analysis · CPC title
Learning methods · CPC title
Drug targeting using structural data; Docking or binding prediction · CPC title
ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding · CPC title
Probabilistic models · CPC title