Systems and methods for detecting recombination

US11250931B2 · US · B2

Patent metadata
Publication number: US-11250931-B2
Application number: US-201615254258-A
Country: US
Kind code: B2
Filing date: Sep 1, 2016
Priority date: Sep 1, 2016
Publication date: Feb 15, 2022
Grant date: Feb 15, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for screening for disease in a genomic sample includes receiving a representation of a reference genome comprising a sequence of symbols. The presence of a predicted mutational event is identified in a location of the reference genome. An alternate path is created in the reference genome representing the predicted mutational event. A plurality of sequence reads are obtained from a genomic sample, wherein at least one sequence read comprises at least a portion of the predicted mutational event. The at least one sequence read is then mapped to the reference genome and a location is determined corresponding to the predicted mutational event. The predicted mutational event is then identified as present in the genomic sample. The method may be used to detect evidence of non-allelic homologous recombination (NAHR) occurring in genomic samples.
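
The flow in the abstract (build a reference, add an alternate path for a predicted event, then map reads against both paths) can be illustrated with a small sketch. It is not the patented implementation: the `Node` class, the function names, the toy sequences, and the use of exact substring matching in place of a real aligner are all assumptions made for illustration. The predicted event is modelled in simplified form as a deletion that joins the first repeat directly to whatever follows the second repeat, treating the two repeats as interchangeable.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based equality, so nodes behave like objects in memory
class Node:
    seq: str                                   # nucleotide sequence held by this node
    label: str                                 # hypothetical short name used in reports
    next: list = field(default_factory=list)   # references to adjacent (successor) nodes


def build_linear(segments):
    """Chain (label, sequence) pairs into a linear reference graph."""
    nodes = [Node(seq, label) for label, seq in segments]
    for a, b in zip(nodes, nodes[1:]):
        a.next.append(b)
    return nodes


def add_predicted_deletion(repeat_a, repeat_b):
    """Add an alternate path: leave repeat_a and continue with whatever follows repeat_b."""
    for successor in repeat_b.next:
        if successor not in repeat_a.next:
            repeat_a.next.append(successor)


def paths(node, prefix=()):
    """Yield every path from node to a sink as a tuple of nodes."""
    prefix = prefix + (node,)
    if not node.next:
        yield prefix
    for successor in node.next:
        yield from paths(successor, prefix)


def classify_read(start, read):
    """Return the labels of all paths whose spelled-out sequence contains the read."""
    hits = []
    for path in paths(start):
        if read in "".join(n.seq for n in path):
            hits.append("-".join(n.label for n in path))
    return hits


# Toy reference: left flank, repeat A, intervening sequence, repeat B, right flank.
nodes = build_linear([
    ("L", "ACGTAC"), ("A", "GGATCC"), ("M", "TTTTAAAA"),
    ("B", "GGATCC"), ("R", "CATGCA"),
])
add_predicted_deletion(nodes[1], nodes[3])

print(classify_read(nodes[0], "TACGGATCCCAT"))  # read spanning left flank, repeat, right flank
print(classify_read(nodes[0], "CCTTTTAA"))      # read reaching into the intervening sequence
```

A read that matches only the path using the added edge is the kind of evidence the abstract describes as identifying the predicted event in the sample. A production pipeline would score alignments rather than test exact containment.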

First claim

Opening claim text (preview).

What is claimed is:

1. A method of detecting a non-allelic homologous recombination (NAHR), the method comprising: accessing, by a computer system, a genomic reference stored in the computer system and including information specifying a graph having a plurality of nodes and edges, the nodes representing nucleotide sequences, wherein the plurality of nodes and edges is stored as a plurality of objects in a memory of the computer system, wherein a first object in the plurality of objects stores a list of pointers specifying one or more locations in the memory at which at least one other object in the plurality of objects is stored, wherein the at least one other object is adjacent to the first object in the graph; identifying within the genomic reference a first pair of homologous nucleotide sequences; modifying the genomic reference to include a new object connecting the first pair of homologous nucleotide sequences and storing the new object in the memory, wherein the new object specifies at least a part of a new path through nodes of the graph and indicates a nucleotide sequence that results from a predicted NAHR event, the new object including a pointer to an object, among the plurality of objects, representing one of the first pair of homologous nucleotide sequences; accessing a sample nucleotide sequence associated with a subject; aligning the sample nucleotide sequence from the subject to the first pair of homologous nucleotide sequences using the new object indicating the nucleotide sequence that results from the predicted NAHR event; and determining, based on results of the aligning, whether the sample nucleotide sequence is indicative of a NAHR event in the subject.

2. The method of claim 1, wherein the graph is a directed acyclic graph.

3. The method of claim 2, wherein aligning the sample nucleotide sequence to the first pair of homologous nucleotide sequences comprises assigning a score to each of a plurality of paths through the first pair of homologous nucleotide sequences in the genomic reference modified to include the new object.

4. The method of claim 1, further comprising: sequencing nucleic acid from the subject to generate a sequence read.

5. The method of claim 1, wherein identifying the first pair of homologous nucleotide sequences comprises determining whether nucleotide sequences of the first pair have at least about 90% sequence similarity.

6. The method of claim 5, wherein identifying the first pair of homologous nucleotide sequences comprises determining whether nucleotide sequences of the first pair are separated by a distance between 10 kilobases and 300 kilobases.

7. The method of claim 1, further comprising identifying the first pair of homologous nucleotide sequences from a database of known repeats.

8. The method of claim 1, further comprising: identifying a plurality of low copy repeats, each low copy repeat comprising a pair of homologous nucleotide sequences, and modifying the genomic reference to include, for a first low copy repeat of the identified plurality of low copy repeats, a second new object connecting the associated pair of homologous nucleotide sequences and storing the second new object in the memory.

9. The method of claim 1, wherein identifying the first pair of homologous nucleotide sequences comprises: reading a set of nodes and edges associated with a string of N bases from the genomic reference; and aligning the set of nodes and edges to the genomic reference to determine that nucleotide sequences of the first pair of homologous nucleotide sequences are homologous.

10. The method of claim 9, further comprising: reading a second set of nodes and edges associated with a string of bases offset along the genomic reference from the string of N bases; and aligning the second set of nodes and edges and the genomic reference.

11. The method of claim 10, further comprising: iteratively searching a window of nodes and edges of length N at a plurality of positions of the genomic reference; and for each position of the plurality of positions, aligning the second set of nodes and edges to a third set of nodes and edges of length N at the position of the genomic reference.

12. The method of claim 11, wherein each position of the plurality of positions is offset from a previous position by a fixed distance.

13. A computer system for detecting a non-allelic homologous recombination (NAHR), the computer system comprising: at least one processor; and at least one non-transitory computer-readable storage device storing processor-executable instructions that, when executed by the at least one processor, cause the at least one processor to perform: accessing a genomic reference stored in the at least one non-transitory computer-readable storage device and including information specifying a graph having a plurality of nodes and edges, the nodes representing nucleotide sequences, wherein the plurality of nodes and edges is stored as a plurality of objects in a memory of the computer system, wherein a first object in the plurality of objects stores a list of pointers specifying one or more locations in the memory at which at least one other object in the plurality of objects is stored, wherein the at least one other object is adjacent to the first object in the graph; identifying within the genomic reference a first pair of homologous nucleotide sequences; modifying the genomic reference to include a new object connecting the first pair of homologous nucleotide sequences and storing the new object in the memory, wherein the new object specifies at least a part of a new path through nodes of the graph and indicates a nucleotide sequence that results from a predicted NAHR event, the new object including a pointer to an object, among the plurality of objects, representing one of the first pair of the homologous nucleotide sequences; accessing a sample nucleotide sequence associated with a subject; aligning the sample nucleotide sequence from the subject to the first pair of homologous nucleotide sequences using the new object indicating the nucleotide sequence that results from the predicted NAHR event; and determining, based on results of the aligning, whether the sample nucleotide sequence is indicative of a NAHR event in the subject.

14. The computer system of claim 13, wherein the graph is a directed acyclic graph.

15. The method of claim 1, wherein the new object indicates a deletion of an intervening nucleotide sequence between the first pair of homologous nucleotide sequences.

16. The method of claim 1, wherein two objects in the plurality of objects represent the first pair of homologous nucleotide sequences.

17. The method of claim 16, the pointer is a pointer stored in a list of pointers in a first of the two objects and is a pointer to a physical location in the memory at which a second of the two objects is stored.

18. At least one computer readable storage device storing processor-executable instructions that, when executed by at least one processor, cause the at least one processor to perform: accessing a genomic reference stored in the at least one non-transitory computer-readable storage device and including information specifying a graph having a plurality of nodes and edges, the nodes representing nucleotide sequences, wherein the plurality of nodes and edges is stored as a plurality of objects in a memory, wherein a first object in the plurality of objects stores a list of pointers specifying one or more locations in the memory at which at least one other object in the plurality of objects is stored, wherein the at least one other object is adjacent to the first object in the plura
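
Claims 5, 6, and 9 through 12 describe finding candidate homologous pairs by comparing windows of N bases taken at fixed offsets, keeping pairs with at least about 90% similarity that lie within a distance range. The sketch below is one hedged way to picture that search on a plain string reference, not the claimed graph-based procedure. The function names and parameters are hypothetical, position-by-position identity stands in for a real alignment, and distance is measured start to start, which is an assumption. The toy example scales the 10 to 300 kilobase range down to a few dozen bases.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length strings."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def find_homologous_pairs(ref, n, step, min_dist, max_dist, min_identity=0.90):
    """Compare windows of length n taken every `step` bases.

    Returns (i, j, identity) for window starts i < j whose start-to-start
    distance lies in [min_dist, max_dist] and whose identity meets the threshold.
    """
    starts = range(0, len(ref) - n + 1, step)
    pairs = []
    for i in starts:
        for j in starts:
            if min_dist <= j - i <= max_dist:
                score = identity(ref[i:i + n], ref[j:j + n])
                if score >= min_identity:
                    pairs.append((i, j, score))
    return pairs


# Toy reference: two near-identical 10-base repeats (one mismatch) 30 bases apart.
repeat = "ACGTTGCAAG"
copy = "ACGTTGCAAC"
ref = "T" * 10 + repeat + "C" * 20 + copy + "G" * 10

print(find_homologous_pairs(ref, n=10, step=10, min_dist=20, max_dist=40))
```

The quadratic scan over window pairs is only practical for small inputs. On a real genome the distance limit would be used to restrict which windows are compared at all, and each reported pair would become a candidate for adding a predicted NAHR path to the reference.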

Assignees

Seven Bridges Genomics Inc

Inventors

Classifications

  • G16B20/20 · Primary

    Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection · CPC title

  • G16B20/00 · Primary

    ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations · CPC title

  • ICT specially adapted for sequence analysis involving nucleotides or amino acids · CPC title

  • Sequence alignment; Homology search · CPC title

  • for simulation or modelling of medical disorders · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11250931B2 cover?
A method for screening for disease in a genomic sample includes receiving a representation of a reference genome comprising a sequence of symbols. The presence of a predicted mutational event is identified in a location of the reference genome. An alternate path is created in the reference genome representing the predicted mutational event. A plurality of sequence reads are obtained from a g…
Who is the assignee on this patent?
Seven Bridges Genomics Inc
What technology area does this patent fall under?
Primary CPC classification G16B20/20. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 15, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).