Who is the assignee on this patent?

Mishra Bhubaneswar, Narzisi Giuseppe, Univ New York

What technology area does this patent fall under?

Primary CPC classification G16B30/20. Mapped technology areas include Physics.

When was this patent published?

Publication date: Nov 17, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Method, computer-accessible medium and systems for score-driven whole-genome shotgun sequence assemble

US10839940B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10839940-B2
Application number	US-200913139809-A
Country	US
Kind code	B2
Filing date	Dec 23, 2009
Priority date	Dec 24, 2008
Publication date	Nov 17, 2020
Grant date	Nov 17, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Exemplary embodiments of the present disclosure relate generally to methods, computer-accessible medium and systems for assembling haplotype and/or genotype sequences of at least one genome, which can be based upon, e.g., consistent layouts of short sequence reads and long-range genome related data. For example, a processing arrangement can be configured to perform a procedure including, e.g., obtaining randomly located short sequence reads, using at least one score function in combination with constraints based on, e.g., the long range data, generating a layout of randomly located short sequence reads such that the layout is globally optimal with respect to the score function, obtained through searching coupled with score and constraint dependent pruning to determine the globally optimal layout substantially satisfying the constraints, generating a whole and/or a part of a genome wide haplotype sequence and/or genotype sequence, and converting a globally optimal layout into one or more consensus sequences.
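The abstract's central idea is a search for a layout that is globally optimal under a score function, with branches pruned both by score and by constraints. A small branch-and-bound search can illustrate that idea. This is a hedged sketch, not the patented procedure. Several parts are assumptions made for illustration: the scoring model (the sum of overlap scores between adjacent reads), the hypothetical `before` pairs that stand in for long-range order constraints, and the function name `best_layout`. Because the search enumerates read orderings, it is only practical for a handful of reads.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy model (not from the patent): overlap[a][b] is a non-negative
# score for placing read b immediately after read a; each (a, b) in `before`
# means read a must precede read b (a stand-in for long-range constraints).
Overlaps = Dict[str, Dict[str, float]]


def best_layout(
    reads: List[str], overlap: Overlaps, before: Set[Tuple[str, str]]
) -> Tuple[float, Optional[List[str]]]:
    """Branch-and-bound search for the highest-scoring read order (toy sketch)."""
    # Largest score each read can earn as the left neighbour of another read.
    max_out = {r: max(overlap.get(r, {}).values(), default=0.0) for r in reads}
    best_score = float("-inf")
    best_order: Optional[List[str]] = None

    def violates(placed: Set[str], nxt: str) -> bool:
        # nxt may not be placed while a read required to precede it is unplaced.
        return any(b == nxt and a not in placed for a, b in before)

    def extend(order: List[str], score: float, remaining: Set[str]) -> None:
        nonlocal best_score, best_order
        if not remaining:
            if score > best_score:
                best_score, best_order = score, list(order)
            return
        # Optimistic bound: every read that can still be a left neighbour adds
        # its best overlap. Admissible only because all scores are non-negative.
        preds = ([order[-1]] if order else []) + list(remaining)
        if score + sum(max_out[p] for p in preds) <= best_score:
            return  # score-dependent pruning
        placed = set(order)
        for nxt in sorted(remaining):
            if violates(placed, nxt):
                continue  # constraint-dependent pruning
            if order and nxt not in overlap.get(order[-1], {}):
                continue  # adjacent reads in this toy layout must overlap
            gain = overlap[order[-1]][nxt] if order else 0.0
            order.append(nxt)
            extend(order, score + gain, remaining - {nxt})
            order.pop()

    extend([], 0.0, set(reads))
    return best_score, best_order


toy = {"r1": {"r2": 5, "r3": 1}, "r2": {"r3": 4, "r1": 2}, "r3": {"r1": 3}}
print(best_layout(["r1", "r2", "r3"], toy, set()))            # (9.0, ['r1', 'r2', 'r3'])
print(best_layout(["r1", "r2", "r3"], toy, {("r3", "r1")}))   # (8.0, ['r3', 'r1', 'r2'])
```

In this toy run, adding the constraint that r3 precede r1 moves the optimum from r1, r2, r3 to r3, r1, r2. When no ordering satisfies the constraints, the function returns `None` as the layout.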

First claim

Opening claim text (preview).

What is claimed is:

1. A non-transitory computer-accessible medium having stored thereon computer executable instructions for assembling at least one part of at least one of at least one haplotype sequence or at least one genotype sequence of at least one genome, wherein, when the executable instructions are executed by a computer processing arrangement, the processing arrangement is configured to perform at least one procedure comprising:
(a) obtaining (i) a plurality of randomly located short sequence reads, and (ii) overlap information about overlaps between the randomly located short sequence reads;
(b) obtaining long range information for the randomly located short sequence reads, wherein the long range information includes optical map data and mate-pair data;
(c) automatically randomly selecting a first read from the randomly located short sequence reads;
(d) automatically identifying one or more overlapping second reads of the randomly located short sequence reads that overlap with the first read;
(e) automatically generating one or more scores for the one or more overlapping second reads using the overlap information and the long range information;
(f) selecting a particular read of the one or more second overlapping reads based on the one or more scores;
(g) automatically generating a path through the plurality of randomly located short sequence reads by repeating procedures (e) and (f); and
(h) automatically assembling the at least one part of the at least one of the at least one haplotype sequence or the at least one genotype sequence of the genome based on the path.

2. The computer-accessible medium of claim 1, wherein the processing arrangement is further configured to generate the one or more scores based on at least one of a containment or an overhang among a single pair of the randomly located short sequence reads.

3. The computer-accessible medium of claim 2, wherein the processing arrangement is further configured to evaluate the at least one of the containment or the overhang using at least one of (i) an orientation of the randomly located short sequence reads, (ii) a location of the randomly located short sequence reads, or (iii) a haplotypic identity of the randomly located short sequence reads.

4. The computer-accessible medium of claim 1, wherein the processing arrangement is further configured to generate the one or more scores using a weighted transitivity score.

5. The computer-accessible medium of claim 1, wherein the processing arrangement is further configured to generate the one or more scores using a Bayesian likelihood.

6. The computer-accessible medium of claim 5, wherein the Bayesian likelihood is based on at least one penalty function.

7. The computer-accessible medium of claim 1, wherein the processing arrangement is further configured to generate the one or more scores based on a plurality of homologous reference sequences.

8. The computer-accessible medium of claim 1, wherein the processing arrangement is further configured to generate the one or more scores based on short range information.

9. The computer-accessible medium of claim 1, wherein the processing arrangement is further configured to prune at least one of the paths.

10. The computer-accessible medium of claim 9, wherein the processing arrangement is configured to prune the at least one of the paths based on the one or more scores.

11. The computer-accessible medium of claim 9, wherein the processing arrangement is configured to prune the at least one of the paths based on the overlap information.

12. The computer-accessible medium of claim 9, wherein the processing arrangement is configured to prune the at least one of the paths based on a maximum number of candidate paths allowed in a queue.

13. The computer-accessible medium of claim 12, wherein the maximum number of candidate paths allowed in the queue is fixed.

14. The computer-accessible medium of claim 9, wherein the processing arrangement is configured to prune the at least one of the paths based on a percentage of top ranking paths compared to an optimum score.

15. The computer-accessible medium of claim 14, wherein the percentage of top ranking paths compared to an optimum score dynamically changes over time.

16. The computer-accessible medium of claim 1, wherein the processing arrangement is further configured to obtain the randomly located short sequence reads using at least one of (i) Sanger chemistry, (ii) sequencing-by-synthesis, (iii) sequencing-by-hybridization, or (iv) sequencing-by-ligation.

17. The computer-accessible medium of claim 1, wherein the processing arrangement is further configured to obtain the randomly located short sequence reads using at least one method having at least one error, wherein the at least one error is at least one of: (i) incorrect base-calls, (ii) missing bases, (iii) inserted bases, or (iv) homopolymeric compression.

18. The computer-accessible medium of claim 1, wherein the long-range information further includes a physical map that is at least one of (i) an ordered restriction map, (ii) a probe map, or (iii) a base-distribution map.

19. The computer-accessible medium of claim 1, wherein the processing arrangement is further configured to evaluate the scoring procedure based on a consistency of the one or more scores with respect to the long-range information by determining a local alignment with an alignment score.

20. The computer-accessible medium of claim 1, wherein the randomly located short sequence reads are generated using at least one procedure having at least one error, and wherein the at least one error is at least one of: (i) incorrect base-calls, (ii) missing bases, (iii) inserted bases, (iv) homopolymeric compression or (v) expansion.

21. The computer-accessible medium of claim 1, wherein the long-range comprises approximately 10 Kb-200 mb of information associated with the at least one genome.

22. A method for assembling at least one part of at least one of at least one haplotype sequence or at least one genotype sequence of at least one genome, comprising:
(a) obtaining (i) a plurality of randomly located short sequence reads, and (ii) overlap information about overlaps between the randomly located short sequence reads;
(b) obtaining long range information for the randomly located short sequence reads, wherein the long range information includes optical map data and mate-pair data;
(c) automatically randomly selecting a first read from the randomly located short sequence reads;
(d) automatically identifying one or more overlapping second reads of the randomly located short sequence reads that overlap with the first read;
(e) automatically generating one or more scores regarding the one or more overlapping second reads using the overlap information and the long range information;
(f) selecting a particular read of the one or more second overlapping reads based on the one or more scores;
(g) automatically generating a path through the plurality of randomly located short sequence reads by repeating procedures (e) and (f); and
(h) using a computer hardware arrangement, automatically assembling the at least one part of the at least one of the at least one haplotype sequence or the at least one genotype sequence of the genome based on the path.

23. The method of claim 22, further comprising generating the one or more scores based on at least one of a containment or an overhang among a single pair of the randomly located
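Claim 1 describes extending a path from a randomly chosen read. At each step the overlapping candidates are scored with overlap and long-range information, and one is selected. Claims 9 to 15 add pruning of candidate paths by a queue limit and by a percentage of the best score. The sketch below combines these steps as a beam search. It is an illustration under stated assumptions, not the patented method. Reads are exact substrings with no sequencing errors. The overlap score is the suffix-prefix overlap length. The long-range signal is a hypothetical `mates` dictionary that adds a fixed bonus when a read's mate is already on the path; optical-map data is not modelled. `overlap_len`, `assemble`, `max_queue` and `keep_fraction` are hypothetical names.

```python
import random
from typing import Dict, List, Optional, Tuple


def overlap_len(a: str, b: str, min_olap: int = 2) -> int:
    """Longest proper suffix of a equal to a prefix of b (0 if shorter than min_olap)."""
    for k in range(min(len(a), len(b)) - 1, min_olap - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def assemble(reads: Dict[str, str], mates: Dict[str, str], mate_bonus: float = 2.0,
             max_queue: int = 4, keep_fraction: float = 0.5,
             start: Optional[str] = None, seed: int = 0) -> Tuple[float, List[str], str]:
    """Toy beam-search path extension over error-free reads (hypothetical sketch)."""
    # (a) overlap information: all pairwise suffix-prefix overlaps.
    olap = {(a, b): overlap_len(reads[a], reads[b])
            for a in reads for b in reads if a != b}
    # (c) pick a random first read unless the caller fixes one.
    if start is None:
        start = random.Random(seed).choice(sorted(reads))
    beam: List[Tuple[float, List[str]]] = [(0.0, [start])]
    finished: List[Tuple[float, List[str]]] = []
    while beam:
        grown: List[Tuple[float, List[str]]] = []
        for total, path in beam:
            last = path[-1]
            # (d) reads overlapping the last read that are not yet on this path.
            cands = [b for b in sorted(reads) if b not in path and olap[(last, b)] > 0]
            if not cands:
                finished.append((total, path))
            for b in cands:
                # (e) score = overlap length + bonus if b's mate is already placed.
                s = olap[(last, b)] + (mate_bonus if mates.get(b) in path else 0.0)
                grown.append((total + s, path + [b]))
        # (f)/(g) keep the best extensions and repeat; pruning caps the queue size
        # and drops paths scoring below keep_fraction of the current best.
        grown.sort(key=lambda t: t[0], reverse=True)
        best = grown[0][0] if grown else 0.0
        beam = [t for t in grown[:max_queue] if t[0] >= keep_fraction * best]
    # Prefer the longest finished path, then the highest score.
    score, path = max(finished, key=lambda t: (len(t[1]), t[0]))
    # (h) merge the reads along the path into one consensus string.
    seq = reads[path[0]]
    for a, b in zip(path, path[1:]):
        seq += reads[b][olap[(a, b)]:]
    return score, path, seq


reads = {"r1": "ATGCGTA", "r2": "GTACCTG", "r3": "CCTGA", "r4": "GTATTT"}
mates = {"r1": "r3", "r3": "r1"}
print(assemble(reads, mates, start="r1"))  # (9.0, ['r1', 'r2', 'r3'], 'ATGCGTACCTGA')
```

Setting `max_queue=1` reduces the beam to the single-candidate selection of step (f). Larger values keep alternative paths alive, so a choice at one step (here r2 versus r4, which tie on overlap length) does not have to be final.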


Classifications

G16B30/20 (Primary): Sequence assembly
G16B30/10: Sequence alignment; Homology search
G16B30/00 (Primary): ICT specially adapted for sequence analysis involving nucleotides or amino acids

Patent family

Related publications grouped by family.

Patent family ID: 42288439

