Determination of base modifications of nucleic acids

US11091794B2 · US · B2

Patent metadata
Publication number: US-11091794-B2
Application number: US-202016995607-A
Country: US
Kind code: B2
Filing date: Aug 17, 2020
Priority date: Aug 16, 2019
Publication date: Aug 17, 2021
Grant date: Aug 17, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods for using determination of base modification in analyzing nucleic acid molecules and acquiring data for analysis of nucleic acid molecules are described herein. Base modifications may include methylations. Methods to determine base modifications may include using features derived from sequencing. These features may include the pulse width of an optical signal from sequencing bases, the interpulse duration of bases, and the identity of the bases. Machine learning models can be trained to detect the base modifications using these features. The relative modification or methylation levels between haplotypes may indicate a disorder. Modification or methylation statuses may also be used to detect chimeric molecules.
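The per-nucleotide features named in the abstract can be illustrated with a minimal sketch. It assumes each detected pulse is available as a base call with a start and end time; the `Pulse` and `NucleotideFeatures` types, the `extract_features` function, and the time units are hypothetical and do not come from the patent. The interpulse duration is taken here as the gap to the preceding pulse, which is one possible reading of "neighboring nucleotide".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pulse:
    base: str     # called nucleotide identity, e.g. "A", "C", "G", "T"
    start: float  # pulse start time (hypothetical units: seconds)
    end: float    # pulse end time


@dataclass
class NucleotideFeatures:
    position: int               # index of the nucleotide within the molecule
    base: str                   # identity of the nucleotide
    pulse_width: float          # duration of the pulse for this nucleotide
    interpulse_duration: float  # gap between the previous pulse and this one


def extract_features(pulses: List[Pulse]) -> List[NucleotideFeatures]:
    """Derive identity, position, pulse width and interpulse duration per pulse."""
    features = []
    for i, pulse in enumerate(pulses):
        width = pulse.end - pulse.start
        # Assumption: the first pulse has no predecessor, so its gap is set to 0.0.
        gap = pulse.start - pulses[i - 1].end if i > 0 else 0.0
        features.append(NucleotideFeatures(i, pulse.base, width, gap))
    return features
```

Each returned record carries the four properties that the first claim lists for every sequenced nucleotide.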

First claim

Opening claim text (preview).

What is claimed is:

1. A method for detecting a modification of a nucleotide in a nucleic acid molecule, the method comprising:

(a) receiving data acquired by measuring pulses in an optical signal corresponding to nucleotides sequenced in a sample nucleic acid molecule and obtaining, from the data, values for the following properties: for each nucleotide: an identity of the nucleotide, a position of the nucleotide within the sample nucleic acid molecule, a width of the pulse corresponding to the nucleotide, and an interpulse duration representing a time between the pulse corresponding to the nucleotide and a pulse corresponding to a neighboring nucleotide;

(b) creating an input data structure, the input data structure comprising a window of the nucleotides sequenced in the sample nucleic acid molecule, wherein the input data structure includes, for each nucleotide within the window, the properties: the identity of the nucleotide, a position of the nucleotide with respect to a target position within the window, the width of the pulse corresponding to the nucleotide, and the interpulse duration;

(c) inputting the input data structure into a model, the model trained by: receiving a first plurality of first data structures, each first data structure of the first plurality of data structures corresponding to a respective window of nucleotides sequenced in a respective nucleic acid molecule of a plurality of first nucleic acid molecules, wherein each of the first nucleic acid molecules is sequenced by measuring pulses in the optical signal corresponding to the nucleotides, wherein the modification has a known first state in a nucleotide at a target position in each window of each first nucleic acid molecule, each first data structure comprising values for the same properties as the input data structure, storing a plurality of first training samples, each including one of the first plurality of first data structures and a first label indicating the first state of the nucleotide at the target position, and optimizing, using the plurality of first training samples, parameters of the model based on outputs of the model matching or not matching corresponding labels of the first labels when the first plurality of first data structures is input to the model, wherein an output of the model specifies whether the nucleotide at the target position in the respective window has the modification,

(d) determining, using the model, whether the modification is present in a nucleotide at the target position within the window in the input data structure.

2. The method of claim 1, wherein: the input data structure is one input data structure of a plurality of input data structures, the sample nucleic acid molecule is one sample nucleic acid molecule of a plurality of sample nucleic acid molecules, the plurality of sample nucleic acid molecules are obtained from a biological sample of a subject, and each input data structure corresponds to a respective window of nucleotides sequenced in a respective sample nucleic acid molecule of the plurality of sample nucleic acid molecules, and the method further comprising: receiving the plurality of input data structures, inputting the plurality of input data structures into the model, and determining, using the model, whether a modification is present in a nucleotide at a target location in the respective window of each input data structure.

3. The method of claim 2, further comprising: determining the modification is present at one or more nucleotides, and determining a classification of a disorder using the presence of the modification at one or more nucleotides.

4. The method of claim 3, wherein the disorder comprises cancer.

5. The method of claim 4, further comprising: determining that the classification of the disorder is that the subject has the disorder, and treating the subject for the disorder by chemotherapy, radiation, or surgery.

6. The method of claim 3, wherein determining the classification of the disorder uses the number of modifications or the sites of the modifications.

7. The method of claim 2, wherein the modification is a methylation, the method further comprising: determining the modification is present at one or more nucleotides, and determining a clinically-relevant DNA fraction, a fetal methylation profile, a maternal methylation profile, a presence of an imprinting gene region, or a tissue of origin using the presence of the modification at one or more nucleotides.

8. The method of claim 2, wherein each sample nucleic acid molecule of the plurality of sample nucleic acid molecules has a size greater than a cutoff size.

9. The method of claim 2, wherein: the plurality of sample nucleic acid molecules align to a plurality of genomic regions, for each genomic region of the plurality of genomic regions: a number of sample nucleic acid molecules is aligned to the genomic region, the number of sample nucleic acid molecules is greater than a cutoff number.

10. The method of claim 1, further comprising sequencing the sample nucleic acid molecule.

11. The method of claim 1, wherein the model includes a machine learning model, a principal component analysis, a convolutional neural network, or a logistic regression.

12. The method of claim 1, wherein: the window of nucleotides corresponding to the input data structure comprises nucleotides on a first strand of the sample nucleic acid molecule and nucleotides on a second strand of the sample nucleic acid molecule, and the input data structure further comprises for each nucleotide within the window a value of a strand property, the strand property indicating the nucleotide being present on either the first strand or the second strand.

13. The method of claim 12, wherein the sample nucleic acid molecule is a circular DNA molecule formed by: cutting a double-stranded DNA molecule using a Cas9 complex to form a cut double-stranded DNA molecule, and ligating a hairpin adaptor onto an end of the cut double-stranded DNA molecule.

14. The method of claim 1, wherein the nucleotides within the window are determined using a circular consensus sequence and without alignment of the sequenced nucleotides to a reference genome.

15. The method of claim 1, wherein each nucleotide within the window is enriched or filtered.

16. The method of claim 15, wherein each nucleotide within the window is enriched by: cutting a double-stranded DNA molecule using a Cas9 complex to form a cut double-stranded DNA molecule, and ligating a hairpin adaptor onto an end of the cut double-stranded DNA molecule, or filtered by: selecting double-stranded DNA molecules having a size with a size range.

17. The method of claim 1, wherein nucleotides within the window are determined without using a circular consensus sequence and without alignment of the sequenced nucleotides to a reference genome.

18. The method of claim 1, wherein the optical signal is a fluorescence signal from a dye-labeled nucleotide.

19. The method of claim 1, wherein each window associated with the first plurality of data structures comprises 4 consecutive nucleotides on a first strand of each first nucleic acid molecule.
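Steps (b) through (d) of claim 1 can be sketched in a few lines. This is an illustration only, not the patented implementation. It assumes a symmetric window of `flank` nucleotides on each side of the target and uses a plain logistic regression, one of the model types listed in claim 11, trained by stochastic gradient descent. The names `build_window`, `train`, and `predict`, the one-hot base encoding, and the hyperparameters are all hypothetical.

```python
import math
from typing import List, Tuple

BASES = "ACGT"

# One sequenced nucleotide: (base identity, pulse width, interpulse duration).
Nucleotide = Tuple[str, float, float]


def build_window(read: List[Nucleotide], target: int, flank: int) -> List[float]:
    """Step (b): flatten the window around `target` into one feature vector."""
    vec: List[float] = []
    for offset in range(-flank, flank + 1):
        base, width, ipd = read[target + offset]
        one_hot = [1.0 if base == b else 0.0 for b in BASES]
        # Per nucleotide: position relative to the target, identity, width, IPD.
        vec.extend([float(offset)] + one_hot + [width, ipd])
    return vec


def predict(weights: List[float], bias: float, vec: List[float]) -> float:
    """Step (d): probability that the target nucleotide carries the modification."""
    z = bias + sum(w * x for w, x in zip(weights, vec))
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well inside float range
    return 1.0 / (1.0 + math.exp(-z))


def train(samples: List[Tuple[List[float], int]], epochs: int = 500,
          lr: float = 0.1) -> Tuple[List[float], float]:
    """Step (c): fit parameters so that outputs match the known labels."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for vec, label in samples:
            err = predict(weights, bias, vec) - label  # gradient of the log loss
            weights = [w - lr * err * x for w, x in zip(weights, vec)]
            bias -= lr * err
    return weights, bias
```

A training sample pairs a window built from a molecule whose modification state at the target is known with a label of 1 (modified) or 0 (unmodified). A call such as `predict(weights, bias, build_window(read, target, flank)) > 0.5` then gives the decision for a new molecule.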

Assignees

Inventors

Classifications

  • being a microscope, e.g. atomic force microscopy [AFM]

  • Methylation detection other than bisulfite or methylation sensitive restriction endonucleases

  • Supervised data analysis

  • Signal processing, e.g. from mass spectrometry [MS] or from PCR

  • G16B30/00 (Primary)

    ICT specially adapted for sequence analysis involving nucleotides or amino acids

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11091794B2 cover?
Systems and methods for using determination of base modification in analyzing nucleic acid molecules and acquiring data for analysis of nucleic acid molecules are described herein. Base modifications may include methylations. Methods to determine base modifications may include using features derived from sequencing. These features may include the pulse width of an optical signal from sequencing…
Who is the assignee on this patent?
Univ Hong Kong Chinese
What technology area does this patent fall under?
Primary CPC classification G16B30/00. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 17, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).