Method, computer-accessible medium and system for base-calling and alignment

US10964408B2 · US · B2

Patent metadata
Publication number: US-10964408-B2
Application number: US-201013266662-A
Country: US
Kind code: B2
Filing date: Apr 27, 2010
Priority date: Apr 27, 2009
Publication date: Mar 30, 2021
Grant date: Mar 30, 2021

Abstract

Exemplary methods, procedures, computer-accessible medium, and systems for base-calling, aligning and polymorphism detection and analysis using raw output from a sequencing platform can be provided. A set of raw outputs can be used to detect polymorphisms in an individual by obtaining a plurality of sequence read data from one or more technologies (e.g., using sequencing-by-synthesis, sequencing-by-ligation, sequencing-by-hybridization, Sanger sequencing, etc.). For example, provided herein are exemplary methods, procedures, computer-accessible medium and systems, which can include and/or be configured for obtaining raw output from a sequencing platform configured to be used for reading fragment(s) of genomes, obtaining reference sequences for the genomes obtained independently from the raw output, and generating a base-call interpretation and/or alignment using the raw output and the reference sequences. For example, a score function can be determined based on information associated with the sequencing platform that can be used to analyze polymorphisms based on the base-call interpretation and/or alignment.
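The abstract describes combining raw platform output with an independently obtained reference through a platform-dependent score function. Below is a minimal Python sketch of that idea under assumptions the patent does not state. Each read position comes with four channel intensities in A, C, G, T order. The "platform model" simply normalizes those intensities, plus a hypothetical `noise` floor, into base likelihoods. The reference contributes a prior controlled by a hypothetical `snp_rate`. The function names are illustrative only and are not taken from the patent.

```python
import math

BASES = "ACGT"


def intensity_log_likelihoods(intensities, noise=0.05):
    """Hypothetical platform model: normalized channel intensities (plus a
    noise floor) are treated as the probability of each base."""
    total = sum(intensities) + noise * len(BASES)
    return {b: math.log((x + noise) / total) for b, x in zip(BASES, intensities)}


def reference_log_prior(base, ref_base, snp_rate=0.001):
    """Prior from the independently obtained reference: the reference base is
    expected with probability 1 - snp_rate; the other three share snp_rate."""
    if base == ref_base:
        return math.log(1.0 - snp_rate)
    return math.log(snp_rate / 3.0)


def call_bases(read_intensities, reference, snp_rate=0.001, noise=0.05):
    """Return (called sequence, positions where the call differs from the
    reference, i.e. candidate polymorphisms)."""
    calls, candidates = [], []
    for pos, (inten, ref_base) in enumerate(zip(read_intensities, reference)):
        log_lik = intensity_log_likelihoods(inten, noise)
        # Score each base as intensity evidence plus reference prior.
        scores = {b: log_lik[b] + reference_log_prior(b, ref_base, snp_rate)
                  for b in BASES}
        best = max(scores, key=scores.get)
        calls.append(best)
        if best != ref_base:
            candidates.append(pos)
    return "".join(calls), candidates
```

This score trades platform evidence against the reference. With the default noise floor, a lone strong T signal at a reference G is still called G. Lowering the noise floor, which models a cleaner platform, lets the same signal override the reference. The position is then flagged as a candidate polymorphism: `call_bases([[0, 0, 0, 1]], "G", noise=1e-6)` returns `("T", [0])`.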

First claim

What is claimed is:

1. A non-transitory computer-accessible medium having stored thereon computer executable instructions for assembling at least one genetic sequence which, when executed by a hardware processing arrangement, configure the hardware processing arrangement to:
(a) obtain a series of raw intensity outputs from a sequencing platform configured to (i) be used for reading a fragment of at least one genome and (ii) use a sequencing-by-ligation procedure, wherein each of the obtained raw intensity outputs comprises a plurality of randomly located short sequence reads, and wherein each of the randomly located short sequence reads has a read length of at least 48 base pairs (bps);
(b) obtain at least one reference sequence for the at least one genome, wherein the at least one reference sequence for the at least one genome is obtained independently from the series of first raw intensity outputs obtained from the sequencing platform;
(c) automatically generate a search tree comprising a plurality of nodes, wherein each of the plurality of nodes corresponds to a particular nucleotide base;
(d) automatically select a node of the plurality of nodes in the search tree;
(e) automatically expand the selected node by creating a plurality of child nodes, each of the plurality of child nodes corresponding to a particular further nucleotide base;
(f) automatically generate a score for one or more of the plurality of child nodes, wherein the score is a function of (i) at least one raw intensity output from the series of raw intensity outputs, (ii) the plurality of reference sequences, and (iii) the nucleotide base to which a particular one of the plurality of child nodes corresponds;
(g) automatically select one or more of the plurality of child nodes based on the score;
(h) automatically repeat procedures (e)-(g) for the selected child node;
(i) automatically generate a path through the plurality of nodes; and
(j) automatically assemble the at least one genetic sequence based on the path.

2. The computer-accessible medium of claim 1, wherein the processing arrangement is further configured to: automatically generate the score using a score function; determine the score function based on information associated with a sequencing platform from which the series of raw intensity outputs are obtained; and with the score function, analyze polymorphisms based on at least one of the raw intensity outputs or the reference sequences.

3. The computer-accessible medium of claim 1, wherein the sequencing platform is further configured to utilize at least one of a Sanger chemistry procedure or a sequencing-by-synthesis procedure.

4. The computer-accessible medium of claim 1, wherein the read length is at least 78 bps.

5. The computer-accessible medium of claim 1, wherein each of the raw intensity outputs further comprises at least one error associated with at least one of the plurality of randomly located short sequence reads.

6. The computer-accessible medium of claim 5, wherein the at least one error is related to at least one of an incorrect base-call, a missing base, one or more inserted bases, one or more deleted bases, or a homopolymeric compression.

7. The computer-accessible medium of claim 1, wherein the at least one genome comprises a genome from at least one of (i) one or more diseased cells, (ii) one or more normal cells, (iii) at least one individual organism, (iv) at least one population, or (v) at least one ecological system.

8. The computer-accessible medium of claim 1, wherein the at least one reference sequence is obtained from at least one of (i) a mathematical model, (ii) existing data, (iii) genomic single-molecules, or (iv) genomic materials that are at least one of amplified or otherwise modified.

9. The computer-accessible medium of claim 2, wherein the analyzing procedure comprises a branch-and-bound process.

10. The computer-accessible medium of claim 1, wherein the processing arrangement is further configured to generate the score based on an alignment between the raw intensity outputs and the at least one reference sequence.

11. The computer-accessible medium of claim 10, wherein the alignment includes determining, with the processing arrangement, if any of the raw intensity outputs is contained are within the reference sequences.

12. A method for assembling at least one genetic sequence, comprising:
(a) obtaining a series of raw intensity outputs from a sequencing platform configured to (i) be used for reading a fragment of at least one genome and (ii) use a sequencing-by-ligation procedure, wherein each of the obtained raw intensity outputs comprises a plurality of randomly located short sequence reads, and wherein each of the randomly located short sequence reads has a read length of at least 48 base pairs (bps);
(b) obtaining at least one reference sequence for the at least one genome, wherein the at least one reference sequence for the at least one genome is obtained independently from the series of raw intensity outputs obtained from the sequencing platform;
(c) automatically generating a search tree comprising a plurality of nodes, wherein each of the plurality of nodes corresponds to a particular nucleotide base;
(d) automatically selecting a node of the plurality of nodes in the search tree;
(e) automatically expanding the selected node by creating a plurality of child nodes, each of the child nodes corresponding to a particular further nucleotide base;
(f) automatically generating a score for one or more of the child nodes, wherein the score is a function of (i) at least one raw intensity output from the series of raw intensity outputs, (ii) the plurality of reference sequences, and (iii) the nucleotide base to which a particular one of the plurality of child nodes corresponds;
(g) automatically selecting one or more of the plurality of child nodes based on the score;
(h) automatically repeating procedures (e)-(g) for the selected child node;
(i) automatically generating a path through the plurality of nodes; and
(j) using a computer hardware arrangement, automatically assembling the at least one genetic sequence based on the path.

13. The method of claim 12, further comprising: automatically generating the score using a score function; automatically determining the score function based on information associated with a sequencing platform from which the series of raw intensity outputs are obtained; and with the score function, automatically analyzing polymorphisms based on at least one of the raw intensity outputs or the reference sequences.

14. The method of claim 12, wherein the sequencing platform is further configured to utilize at least one of a Sanger chemistry procedure or a sequencing-by-synthesis procedure.

15. The method of claim 12, wherein the read length is at least 78 bps.

16. The method of claim 12, wherein each of the raw intensity outputs further comprises at least one error associated with at least one of the plurality of randomly located short sequence reads.

17. The method of claim 16, wherein the at least one error is related to at least one of an incorrect base-call, a missing base, one or more inserted bases, one or more deleted bases, or a homopolymeric compression.

18. The method of claim 12, wherein the at least one genome comprises a genome from at least one of (i) one or more diseased cells, (ii) one or more normal cells, (iii) at least one individual organism, (iv) at least one population, or (v) at least one ecological system.

19. The method of cl
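Steps (c) through (j) of claims 1 and 12 describe a tree search over candidate bases. The Python sketch below is one plausible reading of those steps, not the patented implementation. The root is an empty path, and each expansion creates one child per base. Each child's score adds a hypothetical intensity log-likelihood to a reference-agreement term taken over all supplied references. The claims leave the selection rule open, so a beam that keeps the top `beam_width` children stands in for it here. Claim 9 names branch-and-bound for the polymorphism analysis; that approach would replace the beam with bound-based pruning. Reads are assumed to be already placed at offset 0 of each reference, and `snp_rate`, `noise` and `beam_width` are illustrative parameters.

```python
import math
from dataclasses import dataclass

BASES = "ACGT"


@dataclass
class Node:
    path: str     # bases on the path from the root to this node
    score: float  # cumulative log-score of that path


def child_score(intensity, references, pos, base, snp_rate=0.001, noise=0.05):
    """Hypothetical score for one child node: a function of the raw intensity
    at this position, the reference sequences, and the child's base."""
    total = sum(intensity) + noise * len(BASES)
    log_lik = math.log((intensity[BASES.index(base)] + noise) / total)
    # Agreement term: does any reference carry this base at this offset?
    agrees = any(pos < len(ref) and ref[pos] == base for ref in references)
    log_prior = math.log(1.0 - snp_rate) if agrees else math.log(snp_rate / 3.0)
    return log_lik + log_prior


def assemble(read_intensities, references, beam_width=2):
    """Beam search over the base tree; returns the best root-to-leaf path."""
    frontier = [Node("", 0.0)]  # root of the search tree
    for pos, intensity in enumerate(read_intensities):
        # Expand every selected node into one child per base, scoring each child.
        children = [
            Node(node.path + base,
                 node.score + child_score(intensity, references, pos, base))
            for node in frontier
            for base in BASES
        ]
        # Select the highest-scoring children and repeat from them.
        children.sort(key=lambda n: n.score, reverse=True)
        frontier = children[:beam_width]
    # The best surviving path is the assembled sequence.
    return max(frontier, key=lambda n: n.score).path
```

In this toy, each position's score is independent of the others, so the beam simply recovers the best base at each position. A tree search pays off once the score depends on the path, for example when modeling the inserted or deleted bases and homopolymeric compressions listed in claim 6. Supplying several references changes the outcome: `assemble([[0, 0, 0, 1]], ["G"])` keeps the reference G, while `assemble([[0, 0, 0, 1]], ["G", "T"])` calls T because a second reference supports it.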

Classifications

  • G16B30/00 (primary)

    ICT specially adapted for sequence analysis involving nucleotides or amino acids · CPC title

  • Sequence assembly · CPC title

  • Sequence alignment; Homology search · CPC title

  • Methods for sequencing · CPC title

Frequently asked questions

Who is the assignee on this patent?
Mishra Bhubaneswar, Narzisi Giuseppe, Univ New York
What technology area does this patent fall under?
Primary CPC classification G16B30/00. Mapped technology areas include Physics.
When was this patent published?
Published March 30, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).