US9929746B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9929746-B2 |
| Application number | US-201515501804-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 5, 2015 |
| Priority date | Aug 5, 2014 |
| Publication date | Mar 27, 2018 |
| Grant date | Mar 27, 2018 |
Abstract
The present disclosure provides computer implemented methods and systems for analyzing datasets, such as large data sets output from nucleic acid sequencing technologies. In particular, the present disclosure provides for data analysis comprising computing the BWT of a collection of strings in an incremental, character by character, manner. The present disclosure also provides compression boosting strategies resulting in a BWT of a reordered collection of data that is more compressible by second stage compression methods compared to non-reordered computational analysis.
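The incremental, character-by-character BWT construction described above can be illustrated with a minimal Python sketch in the style of the BCR algorithm published by Bauer, Cox and Rosone. This page does not say that the patent uses exactly this procedure. The sketch assumes that all reads have the same length and that each read i ends with its own end marker $_i (printed as `$`), ordered $_1 < $_2 < ... and below every base. It also assumes that bases arrive in sequencing-cycle order, one per read, so the result is the multi-string BWT of the *reversed* reads. The names `IncrementalBWT`, `add_cycle`, `finish` and `naive_bwt` are hypothetical.

```python
from typing import List

END = "$"  # printed form of the per-read end marker $_i (ordered by read index)


class IncrementalBWT:
    """Hypothetical BCR-style multi-string BWT, updated one cycle at a time.

    The result is the BWT of the reversed reads, each terminated by its own
    end marker, with $_1 < $_2 < ... < every base.
    """

    def __init__(self, first_bases: str) -> None:
        self.m = len(first_bases)                # number of reads
        self.bwt: List[str] = list(first_bases)  # chars preceding the m end markers
        self.pos = list(range(self.m))           # slot of each read's newest suffix
        self.last = list(first_bases)            # base that leads the next new suffix
        self.finished = False

    def _insert(self, preceding: List[str]) -> None:
        """Insert one new suffix per read, using LF-mapping on the old BWT."""
        m, old_bwt = self.m, self.bwt
        merged: List = [None] * (len(old_bwt) + m)
        for i in range(m):
            c = self.last[i]
            # Suffixes starting with a smaller symbol: all m end markers plus
            # one suffix per smaller base already present in the old BWT.
            smaller = sum(1 for x in old_bwt if x != END and x < c)
            # Suffixes starting with c whose tail sorts before this read's tail.
            rank = old_bwt[: self.pos[i]].count(c)
            new_pos = m + smaller + rank
            merged[new_pos] = preceding[i]
            self.pos[i] = new_pos
        # Old entries keep their relative order and fill the remaining slots.
        old = iter(old_bwt)
        self.bwt = [x if x is not None else next(old) for x in merged]

    def add_cycle(self, bases: str) -> None:
        """Update the index with the next base of every read."""
        if self.finished or len(bases) != self.m:
            raise ValueError("expected one base per read before finish()")
        self._insert(list(bases))
        self.last = list(bases)

    def finish(self) -> str:
        """Insert the full-length suffixes (preceded by end markers)."""
        if not self.finished:
            self._insert([END] * self.m)
            self.finished = True
        return "".join(self.bwt)


def naive_bwt(strings: List[str]) -> str:
    """Reference BWT built by explicitly sorting every suffix of every string."""
    rows = []
    for i, s in enumerate(strings):
        for start in range(len(s) + 1):
            # End marker (0, i) sorts before any base (1, ch), then by read index.
            key = tuple((1, ch) for ch in s[start:]) + ((0, i),)
            rows.append((key, s[start - 1] if start > 0 else END))
    rows.sort()
    return "".join(prev for _, prev in rows)


reads = ["GATTACA", "GATTGCA", "CATTACA"]
index = IncrementalBWT("".join(r[0] for r in reads))
for cycle in range(1, len(reads[0])):
    index.add_cycle("".join(r[cycle] for r in reads))
print(index.finish() == naive_bwt([r[::-1] for r in reads]))  # expected: True
```

Each `add_cycle` call reads one new base per read and updates the existing partial BWT instead of rebuilding it, loosely mirroring the "identify an additional character, update the index" steps in claim 1. Ranks are computed here by a linear scan for clarity. A practical implementation would more likely keep the partial BWT in a rank-indexed or disk-backed structure.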
Claims
The invention claimed is:
1. A nucleic acid sequencing system for compressing sequencing data, comprising: a) a processor; and b) a memory coupled with the processor and having instructions that when executed by the processor perform a method comprising: i) receiving a collection of data strings corresponding to a first set of nucleotide data for a first nucleic acid fragment being sequenced in the system; ii) identifying a first character representing a first nucleotide in each of the data strings in the collection; iii) generating a first Burrows Wheeler transform index for a compressed data string containing the first characters corresponding to a first nucleotide of each data string; iv) identifying an additional character representing an additional nucleotide in each of the data strings; and v) updating the first Burrows Wheeler transform index with the additional characters corresponding to each additional nucleotide of the received collection of data strings to form compressed sequencing data.
2. The nucleic acid sequencing system of claim 1, wherein receiving a collection of data strings corresponding to a first set of nucleotide data comprises receiving a collection of nucleic acid reads from a target sequence in the nucleic acid sequencing system.
3. The nucleic acid sequencing system of claim 2, wherein the first set of nucleotide data comprises the first nucleotide from each data string corresponding to the target sequence.
4. The nucleic acid sequencing system of claim 2, wherein the target sequence is a genomic DNA sequence.
5. The nucleic acid sequencing system of claim 1, wherein the system repeats steps iv) and v) for each nucleotide in the collection of data strings to update the Burrows Wheeler transform index with all of the nucleotides in the collection of data strings.
6. The nucleic acid sequencing system of claim 1, further comprising a server comprising a copy of the collection of data strings and the first Burrows Wheeler transform index.
7. The nucleic acid sequencing system of claim 6, wherein the memory has instructions that when executed by the processor perform a further method comprising: vi) determining a predicted next nucleotide for each of the data strings; vii) determining a confirmed nucleotide by receiving a second set of nucleotide data that confirms the identity of the next nucleotide in the nucleic acid sequence; viii) creating a file of difference information comprising the differences between the predicted nucleotide and the confirmed nucleotide; and ix) compressing the file of difference information to form a compressed sequence data file.
8. The nucleic acid sequencing system of claim 7, wherein the memory has instructions that when executed by the processor determine the predicted next nucleotide based, at least partly, on the Burrows Wheeler transform index.
9. The nucleic acid sequencing system of claim 7, wherein the instructions, when executed by the processor, perform a further method comprising sending the compressed file of difference information to a server having a copy of the first set of nucleotide data.
10. The nucleic acid sequencing system of claim 7, wherein creating a file of difference information comprises creating a file with a zero for each confirmed nucleotide that is the same as the predicted nucleotide, and a character representing the confirmed nucleotide for each confirmed nucleotide that is different from the predicted nucleotide.
11. The nucleic acid sequencing system of claim 7, wherein compressing the file of difference information comprises replacing the zeros in the file of difference information with a reference to the number of zeros being replaced.
12. A nucleic acid sequencing system for compressing sequencing data, comprising: a) a processor; and b) a memory coupled with the processor and having instructions that when executed by the processor perform a method comprising: i) receiving a collection of data strings corresponding to a first set of nucleotide data for a first nucleic acid fragment being sequenced in the system; ii) identifying a first character representing a first nucleotide in each of the data strings in the collection; iii) determining a predicted next nucleotide for each of the data strings; iv) determining a confirmed nucleotide by receiving a second set of nucleotide data that confirms the identity of the next nucleotide in the nucleic acid sequence; v) creating a file of difference information comprising the differences between the predicted nucleotide and the confirmed nucleotide; and vi) compressing the file of difference information to form a compressed sequence data file.
13. The nucleic acid sequencing system of claim 12, further comprising a server having a processor with instructions that when executed perform a method comprising: receiving the compressed file of difference information; comparing the compressed file of difference information to a data string in a collection of data strings; and replacing predicted nucleotides in the data string with confirmed nucleotides from the compressed file of difference information to form an updated data string.
14. The nucleic acid sequencing system of claim 12, wherein determining the predicted next nucleotide for each of the data strings comprises performing a Burrows Wheeler transform.
15. The nucleic acid sequencing system of claim 12, wherein creating a file of difference information comprises creating a file with a zero for each confirmed nucleotide that is the same as the predicted nucleotide, and a character representing the confirmed nucleotide for each confirmed nucleotide that is different from the predicted nucleotide.
16. A method of compressing sequencing data, comprising: a. receiving a collection of data strings corresponding to a first set of nucleotide data for a first nucleic acid fragment being sequenced in the system; b. identifying a first character representing a first nucleotide in each of the data strings in the collection; c. determining a predicted next nucleotide for each of the data strings; d. determining a confirmed nucleotide by receiving a second set of nucleotide data that confirms the identity of the next nucleotide in the nucleic acid sequence; e. creating a file of difference information comprising the differences between the predicted nucleotide and the confirmed nucleotide; and f. compressing the file of difference information to form a compressed sequence data file.
17. The method of claim 16, wherein determining the predicted next nucleotide for each of the data strings comprises performing a Burrows Wheeler transform.
18. The method of claim 16, wherein creating a file of difference information comprises creating a file with a zero for each confirmed nucleotide that is the same as the predicted nucleotide, and a character representing the confirmed nucleotide for each confirmed nucleotide that is different from the predicted nucleotide.
19. The method of claim 16, wherein the first set of nucleotide data comprises the first nucleotide from each data string corresponding to the target sequence.
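Claims 7 and 10 to 13 describe a prediction and correction scheme. For each read, a predicted next base is compared with the confirmed base. A match is written as a zero, a mismatch keeps the confirmed base, and runs of zeros are replaced by their length. The receiving server then swaps confirmed bases into its copy. Below is a minimal Python sketch of that encoding and decoding. It assumes one predicted and one confirmed base per read for a single cycle. The predictor itself is out of scope: claims 8, 14 and 17 tie it to the Burrows Wheeler transform, but this page does not say how. The function names and the token format (integers for zero runs, letters for mismatches) are hypothetical.

```python
from typing import List, Union

Token = Union[int, str]  # int = run of correct predictions, str = confirmed base


def difference_file(predicted: str, confirmed: str) -> str:
    """'0' where the predicted base was confirmed, else the confirmed base."""
    if len(predicted) != len(confirmed):
        raise ValueError("need one predicted and one confirmed base per read")
    return "".join("0" if p == c else c for p, c in zip(predicted, confirmed))


def compress_zeros(diff: str) -> List[Token]:
    """Replace each run of '0' characters with the length of the run."""
    tokens: List[Token] = []
    run = 0
    for ch in diff:
        if ch == "0":
            run += 1
            continue
        if run:
            tokens.append(run)
            run = 0
        tokens.append(ch)
    if run:
        tokens.append(run)
    return tokens


def apply_differences(predicted: str, tokens: List[Token]) -> str:
    """Receiver side: keep predicted bases inside zero runs, swap in the rest."""
    out: List[str] = []
    i = 0
    for tok in tokens:
        if isinstance(tok, int):
            out.append(predicted[i:i + tok])
            i += tok
        else:
            out.append(tok)
            i += 1
    if i != len(predicted):
        raise ValueError("difference file does not cover every read")
    return "".join(out)


predicted = "ACGTACGT"  # hypothetical predictions for 8 reads in one cycle
confirmed = "ACGAACGG"  # bases actually called in that cycle
diff = difference_file(predicted, confirmed)  # "000A000G"
tokens = compress_zeros(diff)                 # [3, "A", 3, "G"]
assert apply_differences(predicted, tokens) == confirmed
```

If most predictions are correct, the difference file is dominated by zeros, and collapsing those runs into counts is what keeps it small. A general-purpose compressor could still be applied to the token stream afterwards.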
Physics · mapped topic
Precoding preceding compression, e.g. Burrows-Wheeler transformation · CPC title
Compression of genetic data · CPC title
ICT specially adapted for sequence analysis involving nucleotides or amino acids · CPC title
ICT programming tools or database systems specially adapted for bioinformatics · CPC title