Systems and methods for analyzing viral nucleic acids

US12173374B2 · US · B2

Patent metadata
Publication number: US-12173374-B2
Application number: US-202318324799-A
Country: US
Kind code: B2
Filing date: May 26, 2023
Priority date: Sep 1, 2015
Publication date: Dec 24, 2024
Grant date: Dec 24, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The invention provides systems and methods for analyzing viruses by representing viral genetic diversity with a directed acyclic graph (DAG), which allows genetic sequencing technology to detect rare variations and represent otherwise difficult-to-document diversity within a sample. Additionally, a host-specific sequence DAG can be used to effectively segregate viral nucleic acid sequence reads from host sequence reads when a sample from a host is subject to sequencing. Known viral genomes can be represented using a viral reference DAG, and the viral sequence reads from the sample can be compared to the viral DAG to identify viral species or strains from which the reads were derived. Where the viral sequence reads indicate great genetic diversity in the virus that was infecting the host, those reads can be assembled into a DAG that itself properly represents that diversity.
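The read segregation described in the abstract can be illustrated with a toy example. The sketch below is not the patented method: it assumes tiny hypothetical host and viral DAGs, uses exact substring matching against the sequences spelled by each path in place of real alignment, and all names (`HOST_DAG`, `VIRAL_DAG`, `LABELS`, `segregate`, `strains_hit`) are invented for illustration.

```python
# Toy DAGs: node id -> (sequence fragment, successor ids). All data is hypothetical.
HOST_DAG = {
    "h0": ("ACGT", ["h1", "h2"]),
    "h1": ("A", ["h3"]),   # reference allele
    "h2": ("G", ["h3"]),   # known host variant
    "h3": ("TTCA", []),
}
VIRAL_DAG = {
    "v0": ("GGCC", ["v1", "v2"]),
    "v1": ("T", ["v3"]),   # branch taken by hypothetical strain A
    "v2": ("A", ["v3"]),   # branch taken by hypothetical strain B
    "v3": ("CGAT", []),
}
# Hypothetical labels for the two sequences spelled by the viral DAG.
LABELS = {"GGCCTCGAT": "strain A", "GGCCACGAT": "strain B"}


def spelled_paths(dag, start):
    """Return every sequence spelled by a path from `start` to a sink node."""
    seq, successors = dag[start]
    if not successors:
        return [seq]
    return [seq + rest for nxt in successors for rest in spelled_paths(dag, nxt)]


def matches(read, dag, start):
    """Stand-in for alignment: True if the read occurs exactly on some path."""
    return any(read in path for path in spelled_paths(dag, start))


def segregate(reads, host_dag, host_start):
    """Split reads into host reads and candidate viral reads."""
    host, candidates = [], []
    for read in reads:
        (host if matches(read, host_dag, host_start) else candidates).append(read)
    return host, candidates


def strains_hit(read, dag, start, labels):
    """List the labelled viral paths that contain the read."""
    return [labels[p] for p in spelled_paths(dag, start) if read in p]


reads = ["CGTGTT", "GTATTC", "CCTCG", "GCCACG"]
host_reads, candidates = segregate(reads, HOST_DAG, "h0")
for read in candidates:
    print(read, strains_hit(read, VIRAL_DAG, "v0", LABELS))
```

Enumerating every path is only workable for a toy graph; a real aligner would score reads against the graph directly and tolerate mismatches and gaps.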

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: using at least one computer hardware processor to perform: accessing a genomic reference graph from at least one non-transitory memory, the genomic reference graph representing at least a portion of a human genome and comprising nodes and edges connecting the nodes, wherein the nodes are stored as objects in the at least one non-transitory memory, each of the objects comprising pointers to one or more objects representing other nodes, the nodes including a first node, wherein: the first node is stored as a first object, of the objects, in the at least one non-transitory memory, and the first object comprises a first list of one or more pointers to one or more other objects in the at least one non-transitory memory, the first list of one or more pointers being stored in the at least one non-transitory memory; aligning one or more non-viral sequence reads to the genomic reference graph to identify one or more variants that are not represented by the genomic reference graph, the one or more non-viral sequence reads having been obtained from a non-viral sample previously obtained from a subject, wherein the aligning comprises aligning the non-viral sequence reads to the genomic reference graph using the objects representing the nodes and the pointers to one or more objects representing other nodes; updating the genomic reference graph to represent the identified one or more variants, thereby obtaining an updated genomic reference graph, the updating comprising: creating one or more additional nodes representing the one or more variants that are not represented by the genomic reference graph; and storing the one or more additional nodes as one or more additional objects in the at least one non-transitory memory, each of the one or more additional objects comprising a respective list of pointers to one or more other objects in the at least one non-transitory memory; and aligning sequence reads from a viral sample to the updated genomic reference graph to identify one or more candidate viral sequence reads not represented by the updated genomic reference graph, wherein the aligning comprises aligning the sequence reads from the viral sample to the updated genomic reference graph using the objects representing the nodes and the one or more additional objects representing the one or more additional nodes, the viral sample containing a viral nucleic acid and having been previously-obtained from the subject.

2. The method of claim 1, further comprising: accessing a viral DNA reference graph from the at least one non-transitory memory, the viral DNA reference graph representing a plurality of viral sequences of a genome of at least one virus; and aligning the one or more candidate viral sequence reads to the viral DNA reference graph.

3. The method of claim 2, further comprising: determining an identity of a virus in the viral sample based on a result of aligning the one or more candidate viral sequence reads to the viral DNA reference graph.

4. The method of claim 3, further comprising generating a report that includes the identity of the virus in the viral sample.

5. The method of claim 3, further comprising characterizing a quasispecies of the virus in the viral sample.

6. The method of claim 2, further comprising: determining, based on a result of aligning the one or more candidate viral sequence reads to the viral DNA reference graph, identities of multiple viruses in the viral sample; and generating a report includes a list of viral species and/or viral strains based on the determined identities.

7. The method of claim 6, further comprising: quantifying an amount of at least one viral species or viral strain present in the viral sample.

8. The method of claim 2, wherein the viral DNA reference graph is a directed acyclic graph (DAG).

9. The method of claim 2, wherein the viral DNA reference graph represents a plurality of known variations in the genome of the at least one virus.

10. The method of claim 1, wherein the sequence reads from the viral sample comprise viral sequence reads and non-viral sequence reads, the viral sequence reads including the one or more candidate viral sequence reads.

11. The method of claim 10, wherein aligning the sequence reads from the viral sample to the genomic reference graph to identify the one or more candidate viral sequence reads comprises: determining whether any of the sequence reads from the viral sample fail to align to the updated genomic reference graph; and upon determining that one or more of the sequence reads from the viral sample fail to align to the updated genomic reference graph, identifying the one or more of the sequence reads as the one or more candidate viral sequence reads.

12. The method of claim 11, further comprising: upon determining that one or more of the sequence reads from the viral sample align to the updated genomic reference graph, identifying the one or more aligned sequence reads as one or more of the non-viral sequence reads of the sequence reads from the viral sample.

13. The method of claim 1, wherein the genomic reference graph is a directed acyclic graph (DAG).

14. The method of claim 1, wherein the genomic reference graph represents a plurality of known variations of at least the portion of the human genome.

15. The method of claim 1, wherein aligning the sequence reads from the viral sample to the updated genomic reference graph comprises determining alignment scores between the sequence reads and symbol strings associated with nodes of the updated genomic reference graph, the symbol strings representing sequences of one or more nucleotides.

16. The method of claim 15, wherein the first object comprises a first symbol string representing a first sequence of one or more nucleotides, wherein the one or more additional nodes of the updated genomic reference graph comprises a second node stored as a second object in the at least one non-transitory memory, the second object comprising a second symbol string representing a second sequence of one or more nucleotides, and wherein determining an alignment score for the second node comprises, for a first symbol in the second symbol string associated with the second node, determining the alignment score for the second node based on an alignment score associated with the first node.

17. The method of claim 16, wherein aligning a first sequence read of the sequence reads from the viral sample to the updated genomic reference graph comprises: determining a first alignment score between a portion of the first sequence read and a portion of the updated genomic reference graph preceding and including a symbol in the second symbol string.

18. The method of claim 17, wherein determining the first alignment score between the portion of the first sequence read and the portion of the updated genomic reference graph preceding and including the symbol in the second symbol string comprises: determining the first alignment score based on an alignment score associated with the first node, if and only if the symbol comprises the first symbol of the second symbol string.

19. A system, comprising: at least one computer hardware processor; and at least one non-transitory memory storing processor-executable instructions that, when executed by the at least one computer hardware processor, cause the at least one computer hardware processor to perform: accessing a genomic reference graph from the at least one non-transitory memory, the genomic reference g
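The claims describe nodes stored as objects that hold lists of pointers to other node objects (claim 1), new nodes added for newly observed variants (claim 1), and alignment scores in which only the first symbol of a node's string looks back at scores from a preceding node (claims 15 to 18). A minimal Python sketch of those ideas follows. It is an illustration under stated assumptions, not the implementation described in the patent: the scoring constants, the Smith-Waterman-style local recurrence, the single-root graph, and all names (`Node`, `link`, `add_variant`, `best_local_score`) are hypothetical.

```python
from dataclasses import dataclass, field

MATCH, MISMATCH, GAP = 2, -1, -2  # assumed scoring values, not taken from the patent


@dataclass(eq=False)  # eq=False keeps identity hashing so nodes can be dict keys
class Node:
    seq: str                                               # symbol string of the node
    parents: list["Node"] = field(default_factory=list)    # pointers to preceding nodes
    children: list["Node"] = field(default_factory=list)   # pointers to following nodes


def link(a, b):
    """Add a directed edge a -> b by storing pointers on both objects."""
    a.children.append(b)
    b.parents.append(a)


def add_variant(before, after, allele):
    """Create a node for a new variant and wire it between two existing nodes."""
    variant = Node(allele)
    link(before, variant)
    link(variant, after)
    return variant


def topo_order(root):
    """Order nodes reachable from `root` so each node follows all its parents."""
    order, seen = [], set()

    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for child in node.children:
            visit(child)
        order.append(node)

    visit(root)
    return order[::-1]


def best_local_score(read, root):
    """Best local alignment score of `read` against any path (single root assumed)."""
    last_col = {}                      # node -> DP column of the node's final symbol
    best = 0
    for node in topo_order(root):
        if node.parents:
            # First symbol of a node: look back at the parents' final columns.
            prev = [max(vals) for vals in zip(*(last_col[p] for p in node.parents))]
        else:
            prev = [0] * (len(read) + 1)
        for sym in node.seq:
            cur = [0] * (len(read) + 1)
            for i in range(1, len(read) + 1):
                diag = prev[i - 1] + (MATCH if read[i - 1] == sym else MISMATCH)
                cur[i] = max(0, diag, prev[i] + GAP, cur[i - 1] + GAP)
            best = max(best, max(cur))
            prev = cur                 # later symbols look back within the same node
        last_col[node] = prev
    return best


# Hypothetical reference ACGT -> A -> TTCA and a read carrying a G at the variant site.
n0, n1, n2 = Node("ACGT"), Node("A"), Node("TTCA")
link(n0, n1)
link(n1, n2)
read = "GTGTT"
before = best_local_score(read, n0)    # 7: four matches and one mismatch
add_variant(n0, n2, "G")               # update the graph with the observed variant
after = best_local_score(read, n0)     # 10: the read now matches a path exactly
print(before, after)
```

Taking the row-wise maximum over the parents' final columns before scoring a node's first symbol is equivalent to trying each parent separately and keeping the best result, because the same constant is added in every case.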

Assignees

Seven Bridges Genomics Inc

Inventors

Classifications

  • C12Q1/6809 (Primary)

    Methods for determination or identification of nucleic acids involving differential detection · CPC title

  • ICT specially adapted for sequence analysis involving nucleotides or amino acids · CPC title

  • involving virus or bacteriophage {(immunoassay for viruses G01N33/56983)} · CPC title

  • Polymorphic or mutational markers · CPC title

  • C12Q1/701 (Primary)

    Specific hybridization probes · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12173374B2 cover?
The invention provides systems and methods for analyzing viruses by representing viral genetic diversity with a directed acyclic graph (DAG), which allows genetic sequencing technology to detect rare variations and represent otherwise difficult-to-document diversity within a sample. Additionally, a host-specific sequence DAG can be used to effectively segregate viral nucleic acid sequence reads…

Who is the assignee on this patent?
Seven Bridges Genomics Inc

What technology area does this patent fall under?
Primary CPC classification C12Q1/6809. Mapped technology areas include Chemistry & Metallurgy.

When was this patent published?
Publication date Dec 24, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).