Methods and systems for improved major histocompatibility complex (MHC)-peptide binding prediction of neoepitopes using a recurrent neural network encoder and attention weighting

US11557375B2 · US · B2

Patent metadata
Publication number: US-11557375-B2
Application number: US-201917059157-A
Country: US
Kind code: B2
Filing date: Aug 14, 2019
Priority date: Aug 20, 2018
Publication date: Jan 17, 2023
Grant date: Jan 17, 2023

Abstract

Techniques are provided for predicting MHC-peptide binding affinity. A plurality of training peptide sequences is obtained, and a neural network model is trained to predict MHC-peptide binding affinity using the training peptide sequences. An encoder of the neural network model comprising an RNN is configured to process an input training peptide sequence to generate a fixed-dimension encoding output by applying a final hidden state of the RNN at intermediate state outputs of the RNN to generate attention weighted outputs, and linearly combining the attention weighted outputs. A fully connected layer following the encoder is configured to process the fixed-dimension encoding output to generate an MHC-peptide binding affinity prediction output. A computing device is configured to use the trained neural network to predict MHC-peptide binding affinity for a test peptide sequence.
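The encoder described in the abstract can be illustrated with a minimal sketch. It follows the abstract's outline (RNN over the peptide, final hidden state applied at each intermediate state, attention-weighted linear combination, fully connected layer) under several assumptions: one-hot input encoding (one of the options listed in claim 12), a plain tanh RNN cell standing in for the LSTM or GRU cells named in claim 8, dot-product scoring (one of the options in claim 2), softmax normalization of the scores, and a single fully connected layer with a sigmoid output. The class `AttentionRNNEncoder`, the hidden size, and the random initialization are hypothetical. The weights are untrained, so the scores carry no biological meaning; the point is the shapes.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues; order is arbitrary


def one_hot(peptide):
    """One 20-dimensional indicator vector per residue (unknown letters -> all zeros)."""
    return [[1.0 if aa == res else 0.0 for aa in AMINO_ACIDS] for res in peptide]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class AttentionRNNEncoder:
    """Hypothetical toy model: tanh RNN, final-state attention, one FC output layer."""

    def __init__(self, hidden=8, seed=0):
        rng = random.Random(seed)
        self.hidden = hidden
        self.w_in = rand_matrix(hidden, len(AMINO_ACIDS), rng)  # input -> hidden
        self.w_rec = rand_matrix(hidden, hidden, rng)           # hidden -> hidden
        self.w_fc = [rng.uniform(-0.1, 0.1) for _ in range(hidden)]  # FC weights
        self.b_fc = 0.0

    def encode(self, peptide):
        if not peptide:
            raise ValueError("peptide must be non-empty")
        h = [0.0] * self.hidden
        states = []  # intermediate hidden states, one per residue
        for x in one_hot(peptide):
            pre = [a + b for a, b in zip(matvec(self.w_in, x), matvec(self.w_rec, h))]
            h = [math.tanh(p) for p in pre]
            states.append(h)
        final = states[-1]
        # Apply the final hidden state at each intermediate state via a dot product.
        scores = [sum(f * s for f, s in zip(final, state)) for state in states]
        # Softmax (shifted by the max for numerical stability) -> attention weights.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Attention-weighted linear combination: length == hidden for any peptide.
        encoding = [sum(w * state[i] for w, state in zip(weights, states))
                    for i in range(self.hidden)]
        return encoding, weights

    def predict(self, peptide):
        encoding, _ = self.encode(peptide)
        z = sum(w * e for w, e in zip(self.w_fc, encoding)) + self.b_fc
        return 1.0 / (1.0 + math.exp(-z))  # sigmoid score in (0, 1)


model = AttentionRNNEncoder(hidden=8)
for pep in ["SIINFEKL", "GILGFVFTLTV"]:
    enc, attn = model.encode(pep)
    print(pep, len(enc), len(attn), round(model.predict(pep), 3))
```

The attention vector has one weight per residue, but because those weights sum to 1 and multiply hidden states of a fixed size, an 8-mer and an 11-mer both yield an encoding of length `hidden`. That fixed size is what lets one fully connected layer score peptides of different lengths, as claims 11 and 18 contemplate.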

First claim

Opening claim text (preview).

We claim:

1. A computing system-implemented method of predicting major histocompatibility complex (MHC)-peptide binding affinity, the method comprising: obtaining a plurality of training peptide sequences of variable length; training, by one or more computing devices, a recurrent neural network (RNN) model comprising at least one fully connected layer to predict MHC-peptide binding affinity with respect to an MHC allele sequence, wherein training the RNN model comprises, for each training peptide sequence of the plurality of training peptide sequences of variable length, iteratively: inputting the training peptide sequence into the RNN model; generating a fixed-dimension encoding output by processing the training peptide sequence, wherein the processing comprises applying a final hidden state of the RNN model at intermediate states of the RNN model and attention weighting to one or more positions of the training peptide sequence; generating an MHC-peptide binding affinity prediction output between the training peptide sequence and the MHC allele sequence by processing the fixed-dimension encoding output using the at least one fully connected layer of the RNN model; determining a loss factor by comparing the attention weighting to a known MHC-peptide binding affinity value corresponding to the training peptide sequence; and updating at least one parameter of a set of parameters of the RNN model based on the loss factor; inputting a test peptide sequence into the trained RNN model; and generating, by the trained RNN model, an MHC-peptide binding affinity prediction output for the test peptide sequence with respect to the MHC allele sequence.

2. The method of claim 1, wherein applying the final hidden state at an intermediate state of the RNN model comprises taking a dot product, a weighted product, or other function, of the final hidden state and the intermediate state.

3. The method of claim 1, further comprising applying weights learned through the training of the RNN model to the final hidden state prior to applying the final hidden state at intermediate states of the RNN model.

4. The method of claim 1, further comprising concatenating the final hidden state with a final hidden state of an encoder of a second neural network model prior to applying the final hidden state at intermediate states of the RNN model.

5. The method of claim 4, wherein the second neural network model is configured based on the set of parameters of the trained RNN model to predict MHC-peptide binding affinity for an MHC allele input.

6. The method of claim 1, wherein the fixed-dimension encoding output generated by the RNN model comprises one or more positions each corresponding to an amino acid position of a training peptide sequence inputted into the RNN model.

7. The method of claim 6, wherein each of the one or more positions of the fixed-dimension encoding output is a single value.

8. The method of claim 1, wherein the RNN model comprises one of a Long Short Term Memory (LSTM) RNN and Gated Recurrent Unit (GRU) RNN or variant thereof.

9. The method of claim 1, wherein the RNN model comprises a bidirectional RNN model.

10. The method of claim 9, wherein the fixed-dimension encoding output is generated by concatenating outputs of the bidirectional RNN model.

11. The method of claim 1, wherein the plurality of training peptide sequences of variable length comprises two or more sequence lengths.

12. The method of claim 1, wherein the plurality of training peptide sequences is one of one-hot, BLOSUM, PAM, or learned embedding encoded.

13. The method of claim 1, wherein each training peptide sequence of the plurality of training peptide sequences is between 6-20 amino acids in length.

14. The method of claim 1, wherein each training peptide sequence of the plurality of training peptide sequences is between 10-30 amino acids in length.

15. The method of claim 1, wherein each training peptide sequence of the plurality of training peptide sequences is a positive MHC-peptide binding example.

16. The method of claim 1, wherein the test peptide sequence is between 6-20 amino acids in length.

17. The method of claim 1, wherein the test peptide sequence is between 10-30 amino acids in length.

18. The method of claim 1, wherein the test peptide sequence has a sequence length different from a sequence length of at least one of the plurality of training peptide sequences.

19. The method of claim 1, wherein the test peptide sequence is one of one-hot, BLOSUM, PAM, or learned embedding encoded.

20. The method of claim 1, wherein generating MHC-peptide binding affinity prediction output for the test peptide sequence comprises generating a single prediction value.

21. The method of claim 20, wherein the single prediction value relates to a likelihood of activating a T-cell response to a tumor.

22. The method of claim 1, wherein the at least one fully connected layer comprises two fully connected layers.

23. The method of claim 1, wherein the at least one fully connected layer comprises one of a deep convolutional neural network, a residual neural network, a densely connected convolutional neural network, a fully convolutional neural network, or an RNN.

24. The method of claim 1, wherein generating, by the trained RNN model, the MHC-peptide binding affinity prediction output for the test peptide sequence comprises: generating a fixed-dimension encoding output by processing the test peptide sequence, wherein the processing comprises applying a final hidden state of the RNN model at intermediate states of the RNN model and attention weighting to one or more positions of the test peptide sequence, and generating the MHC-peptide binding affinity prediction output by processing the fixed-dimension encoding output using the at least one fully connected layer of the trained RNN model.

25. A computer program product embedded in a non-transitory computer-readable medium comprising instructions executable by a computer processor for predicting major histocompatibility complex (MHC)-peptide binding affinity, which, when executed by a processor, cause the processor to perform one or more steps comprising: obtaining a plurality of training peptide sequences of variable length; training a recurrent neural network (RNN) model comprising at least one fully connected layer to predict MHC-peptide binding affinity with respect to an MHC allele sequence, wherein training the RNN model comprises, for each training peptide sequence of the plurality of training peptide sequences of variable length, iteratively: inputting the training peptide sequence into the RNN model; generating a fixed-dimension encoding output by processing the training peptide sequence, wherein the processing comprises applying a final hidden state of the RNN model at intermediate states of the RNN model and attention weighting to one or more positions of the training peptide sequence; generating an MHC-peptide binding affinity prediction output between the training peptide sequence and the MHC allele sequence by processing the fixed-dimension encoding output using the at least one fully connected layer of the RNN model; determining a loss factor by comparing the attention weighting to a known MHC-peptide binding affinity value corresponding to the training peptide sequence; and updating at least one parameter of a set of parameters of the RNN model based on the loss factor;
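The per-peptide training loop recited in claims 1 and 25 (produce a prediction, compute a loss against a known affinity value, update parameters) can be sketched for the fully connected layer alone. This is a simplification under stated assumptions: the fixed-dimension encodings are toy vectors standing in for encoder outputs, affinity labels are assumed to be pre-scaled to [0, 1], the loss is squared error between the prediction output and the known value (a conventional choice assumed here, not a reading of the claim wording), and only the output layer's weights and bias are updated by plain gradient descent. A complete implementation would also backpropagate through the attention and RNN parameters, typically with an automatic-differentiation framework. All function names and data are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, encoding):
    """Fully connected layer plus sigmoid applied to a fixed-dimension encoding."""
    return sigmoid(sum(w * e for w, e in zip(weights, encoding)) + bias)


def mse_loss(weights, bias, data):
    """Mean squared error between predictions and known (scaled) affinities."""
    return sum((predict(weights, bias, enc) - y) ** 2 for enc, y in data) / len(data)


def train_head(data, dim, lr=0.5, epochs=200):
    """Per-example gradient steps on squared error, updating FC weights and bias."""
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        for enc, y in data:
            p = predict(weights, bias, enc)
            # d/dz of (p - y)^2 with p = sigmoid(z) is 2 (p - y) p (1 - p).
            grad_z = 2.0 * (p - y) * p * (1.0 - p)
            weights = [w - lr * grad_z * e for w, e in zip(weights, enc)]
            bias -= lr * grad_z
    return weights, bias


# Hypothetical toy encodings paired with hypothetical scaled affinity labels.
data = [
    ([0.9, 0.1, 0.3], 0.9),
    ([0.8, 0.2, 0.4], 0.8),
    ([0.1, 0.9, 0.2], 0.1),
    ([0.2, 0.7, 0.1], 0.2),
]
before = mse_loss([0.0] * 3, 0.0, data)
w, b = train_head(data, dim=3)
after = mse_loss(w, b, data)
print(round(before, 4), round(after, 4))
```

Updating one example at a time mirrors the "for each training peptide sequence ... iteratively" structure of the claims; mini-batch updates would be the more common engineering choice in practice.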

Assignees

Nantomics Llc

Classifications

  • G16B40/20 (primary): Supervised data analysis

  • Learning methods

  • Drug targeting using structural data; Docking or binding prediction

  • G16B40/00 (primary): ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding

  • Probabilistic models

Frequently asked questions

What does patent US11557375B2 cover?
Techniques are provided for predicting MHC-peptide binding affinity. A plurality of training peptide sequences is obtained, and a neural network model is trained to predict MHC-peptide binding affinity using the training peptide sequences. An encoder of the neural network model comprising an RNN is configured to process an input training peptide sequence to generate a fixed-dimension encoding o…
Who is the assignee on this patent?
Nantomics Llc
What technology area does this patent fall under?
Primary CPC classification G16B40/20. Mapped technology areas include Physics.
When was this patent published?
Publication date: January 17, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).