Computerized systems and methods of data compression

US12050557B2 · US · B2

Patent metadata
Publication number: US-12050557-B2
Application number: US-202117532947-A
Country: US
Kind code: B2
Filing date: Nov 22, 2021
Priority date: May 19, 2017
Publication date: Jul 30, 2024
Grant date: Jul 30, 2024

Abstract

A computerized system and method of compressing symbolic information organized into a plurality of documents, each document having a plurality of symbols, the system and method including: (i) automatically identifying a plurality of sequential (also referred to as adjacent) and/or non-sequential (also referred to as non-adjacent) symbol pairs in an input document; (ii) counting the number of appearances of each unique symbol pair; and (iii) producing a compressed document that includes a replacement symbol at each position associated with one of the plurality of symbol pairs, at least one of which corresponds to a non-sequential symbol pair. For each non-sequential pair, the compressed document includes corresponding indicia indicating the distance between the locations of the non-sequential symbols of the pair in the input document.
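The pair-based scheme in the abstract can be illustrated with a minimal Python sketch. It assumes the document is already a list of symbols, counts pairs whose positions are 1 to `max_gap` apart (distance 1 being a sequential pair), and replaces each matched pair with a replacement symbol plus the distance between the two symbols in the input. The names `Ref`, `count_pairs`, `compress_pair`, `decompress`, and `max_gap`, the nearest-match pairing rule, and the single replacement pass are illustrative assumptions, not the patent's implementation.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Hashable, List, Tuple, Union


@dataclass(frozen=True)
class Ref:
    """Hypothetical output token: replacement symbol plus distance indicia."""
    symbol: str
    distance: int  # 1 = sequential (adjacent) pair, >1 = non-sequential pair


Item = Union[Hashable, Ref]


def count_pairs(doc: List[Hashable], max_gap: int) -> Counter:
    """Count (first, second) symbol pairs whose positions differ by 1..max_gap."""
    counts: Counter = Counter()
    for i, a in enumerate(doc):
        for d in range(1, max_gap + 1):
            if i + d < len(doc):
                counts[(a, doc[i + d])] += 1
    return counts


def compress_pair(doc: List[Hashable], pair: Tuple[Hashable, Hashable],
                  replacement: str, max_gap: int) -> List[Item]:
    """Replace occurrences of pair (a, b) with Ref(replacement, distance).

    Each unconsumed `a` is matched with the nearest unconsumed `b` at most
    max_gap positions later; that `b` slot is then dropped from the output.
    """
    a, b = pair
    consumed = [False] * len(doc)
    out: List[Item] = []
    for i, s in enumerate(doc):
        if consumed[i]:
            continue  # slot was the second symbol of an earlier pair
        match = None
        if s == a:
            match = next((d for d in range(1, max_gap + 1)
                          if i + d < len(doc) and not consumed[i + d]
                          and doc[i + d] == b), None)
        if match is None:
            out.append(s)
        else:
            consumed[i + match] = True
            out.append(Ref(replacement, match))
    return out


def decompress(items: List[Item],
               rules: Dict[str, Tuple[Hashable, Hashable]]) -> List[Hashable]:
    """Expand each Ref, re-inserting its second symbol `distance` slots later."""
    out: List[Hashable] = []
    pending: Dict[int, Hashable] = {}  # output index -> symbol to re-insert

    def flush() -> None:
        while len(out) in pending:
            out.append(pending.pop(len(out)))

    for item in items:
        flush()
        if isinstance(item, Ref):
            a, b = rules[item.symbol]
            pending[len(out) + item.distance] = b
            out.append(a)
        else:
            out.append(item)
    flush()
    return out
```

For example, in `["a", "x", "b", "a", "y", "b", "c"]` with `max_gap=2`, the pair `("a", "b")` occurs twice at distance 2, so `compress_pair` yields `[Ref("R", 2), "x", Ref("R", 2), "y", "c"]`, and `decompress` with `{"R": ("a", "b")}` restores the original list. Decoding works here because a single pass only removes second-symbol slots, so every kept symbol returns to its original index.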

Claims 1 to 12 (preview)

What is claimed is:

1. A computerized method of compressing symbolic information organized into a plurality of documents, each document having a plurality of symbols, the method comprising: (a) generating, by a computer based system, a symbol dictionary based on a first uncompressed document of the plurality of documents; performing, by the computer based system and with the symbol dictionary, a first data compression on the first uncompressed document by at least one of the adjacent pair dictionary method and the non-adjacent pair dictionary method to generate a compressed output document; (b) appending, by the computer based system, a new uncompressed document of the plurality of documents to the compressed output document to generate an appended compressed document; (c) updating, by the computer based system, the symbol dictionary based on the appended compressed document to generate an updated symbol dictionary; (d) identifying, by the computer based system, a plurality of symbol pairs, each symbol pair consisting of two sequential symbols in the first uncompressed document; (e) for each unique symbol pair of the plurality of symbol pairs, updating, by the computer based system, a count identifying the number of appearances of the unique symbol pair; and (f) producing, by the computer based system, the compressed output document by causing the compressed output document to include, at each position associated with one of the plurality of symbol pairs from the input document, a replacement symbol associated by a compression dictionary with the unique symbol pair matching the one of the plurality of symbol pairs, if the count for the unique symbol pair exceeds a threshold.

2. The method of claim 1, further comprising: performing, by the computer based system and with the updated symbol dictionary, a second data compression on the appended compressed document by at least one of the adjacent pair dictionary method and the non-adjacent pair dictionary method.

3. The method of claim 2, wherein performing the second compression comprises: (a) identifying, by the computer based system, a plurality of symbol pairs, each symbol pair consisting of two sequential symbols in the appended compressed document; (b) for each unique symbol pair of the plurality of symbol pairs, updating, by the computer based system, a count identifying the number of appearances of the unique symbol pair; and (c) producing, by the computer based system, a combined compressed document by causing the combined compressed document to include, at each position associated with one of the plurality of symbol pairs from the input document, a replacement symbol associated by a compression dictionary with the unique symbol pair matching the one of the plurality of symbol pairs, if the count for the unique symbol pair exceeds a threshold.

4. The method of claim 2, wherein performing the second compression comprises: (a) identifying, by the computer based system, a plurality of symbol pairs, each symbol pair consisting of two sequential or non-sequential symbols in the appended compressed document, one or more symbol pairs consisting of two non-sequential symbols in the appended compressed document; (b) for each unique symbol pair of the plurality of symbol pairs, updating, by the computer based system, a count identifying the number of appearances of the unique symbol pair; and (c) producing, by the computer based system, a combined compressed document by causing the combined compressed document to include, at each position associated with one of the plurality of symbol pairs from the input document, including one or more symbol pairs consisting of two non-sequential symbols, (i) a replacement symbol associated by a compression dictionary with the unique symbol pair matching the one of the plurality of symbol pairs, if the count for the unique symbol pair exceeds a threshold, and (ii) for at least those symbol pairs consisting of two non-sequential symbols, indicia indicating a distance between locations of the non-sequential symbols of the pair in the input document.

5. The method of claim 2, wherein the second data compression is only performed on an appended portion of the appended compressed document.

6. The method of claim 1, further comprising: performing, by the computer based system, an analysis of the appended compressed document based on the symbol dictionary to determine whether any new words are present.

7. The method of claim 6, further comprising: adding, by the computer based system, a new word to the symbol dictionary based on determining the presence of new words in the appended compressed document; and updating, by the computer based system, a frequency count of the symbol dictionary in response to adding the new words.

8. The method of claim 7, further comprising: sorting, by the computer based system, the symbol dictionary by order of frequency in response to updating the frequency count.

9. The method of claim 1, wherein performing the first data compression comprises: (a) identifying, by the computer based system, a plurality of symbol pairs, each symbol pair consisting of two sequential or non-sequential symbols in the input document, one or more symbol pairs consisting of two non-sequential symbols in the first uncompressed document; (b) for each unique symbol pair of the plurality of symbol pairs, updating, by the computer based system, a count identifying the number of appearances of the unique symbol pair; and (c) producing, by the computer based system, a compressed document by causing the compressed document to include, at each position associated with one of the plurality of symbol pairs from the input document, including one or more symbol pairs consisting of two nonsequential symbols, (i) a replacement symbol associated by a compression dictionary with the unique symbol pair matching the one of the plurality of symbol pairs, if the count for the unique symbol pair exceeds a threshold, and (ii) for at least those symbol pairs consisting of two nonsequential symbols, indicia indicating a distance between locations of the non-sequential symbols of the pair in the input document.

10. A computer system comprising: (a) a processor; and (b) a tangible, non-transitory memory configured to communicate with the processor, the tangible, non-transitory memory having instructions stored thereon that, in response to execution by the processor, cause the processor to perform operations comprising the computerized method steps of claim 1.

11. The computer system of claim 10 that further comprises performing, by the computer system and with the updated symbol dictionary, a second data compression on the appended compressed document by at least one of the adjacent pair dictionary method and the non-adjacent pair dictionary method.

12. The computer system of claim 11 that further comprises: (a) identifying, by the computer system, a plurality of symbol pairs, each symbol pair consisting of two sequential symbols in the appended compressed document; (b) for each unique symbol pair of the plurality of symbol pairs, updating, by the computer based system, a count identifying the number of appearances of the unique symbol pair; and (c) producing, by the computer based system, a combined compressed document by causing the combined compressed document to include, at each position associated with one of the plurality of symbol pairs from the input document, a replacement symbol associated by a compression dictionary with the unique symbol pair matching the one of the plurality of symbol pairs, if the count for the unique symbol pair exceeds a threshold.
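The flow of claim 1, together with the dictionary updates of claims 6 to 8 and the second adjacent-pair compression of claims 2 and 3, can be outlined in a hypothetical Python sketch (not the patentee's code). Assumptions: documents are whitespace-split words; the symbol dictionary maps each word to an integer id assigned on first appearance; replacement symbols are the strings `"P0"`, `"P1"`, ...; "exceeds a threshold" means a count strictly greater than `threshold`; and overlapping pairs are resolved greedily from left to right. `SymbolDictionary`, `replace_adjacent_pairs`, and `expand` are illustrative names.

```python
from collections import Counter
from typing import Dict, List, Tuple, Union

# Hypothetical symbol type: int = word id, str = pair replacement ("P0", ...).
Symbol = Union[int, str]
Rules = Dict[str, Tuple[Symbol, Symbol]]


class SymbolDictionary:
    """Word -> id map with frequency counts (assumed layout)."""

    def __init__(self) -> None:
        self.ids: Dict[str, int] = {}
        self.freq: Counter = Counter()

    def update(self, words: List[str]) -> List[str]:
        """Add unseen words, update frequency counts, return the new words."""
        new_words = [w for w in dict.fromkeys(words) if w not in self.ids]
        for w in new_words:
            self.ids[w] = len(self.ids)  # ids never change once assigned
        self.freq.update(words)
        return new_words

    def by_frequency(self) -> List[str]:
        """Words sorted by descending frequency (a sorted view, cf. claim 8)."""
        return [w for w, _ in self.freq.most_common()]

    def encode(self, words: List[str]) -> List[Symbol]:
        return [self.ids[w] for w in words]


def replace_adjacent_pairs(symbols: List[Symbol], threshold: int,
                           rules: Rules) -> List[Symbol]:
    """One adjacent-pair pass: count pairs, replace those seen > threshold times.

    Newly created replacement symbols are recorded in `rules` (name -> pair).
    """
    counts = Counter(zip(symbols, symbols[1:]))
    known = {pair: name for name, pair in rules.items()}
    active: Dict[Tuple[Symbol, Symbol], str] = {}
    for pair, n in counts.items():
        if n > threshold:
            if pair not in known:
                known[pair] = f"P{len(rules)}"
                rules[known[pair]] = pair
            active[pair] = known[pair]
    out: List[Symbol] = []
    i = 0
    while i < len(symbols):
        pair = tuple(symbols[i:i + 2])
        if pair in active:  # greedy, left to right
            out.append(active[pair])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def expand(symbols: List[Symbol], rules: Rules, d: SymbolDictionary) -> List[str]:
    """Recursively expand replacement symbols back into words."""
    words_by_id = {i: w for w, i in d.ids.items()}
    out: List[str] = []
    stack = list(reversed(symbols))
    while stack:
        s = stack.pop()
        if isinstance(s, str):
            a, b = rules[s]
            stack.extend([b, a])  # push b first so a is expanded first
        else:
            out.append(words_by_id[s])
    return out


if __name__ == "__main__":
    d, rules = SymbolDictionary(), {}
    doc1 = "the cat sat on the mat the cat sat".split()
    d.update(doc1)                                          # build dictionary
    first = replace_adjacent_pairs(d.encode(doc1), 1, rules)
    doc2 = "the cat sat on the hat".split()
    print(d.update(doc2))                                   # new words found
    combined = replace_adjacent_pairs(first + d.encode(doc2), 1, rules)
    assert expand(combined, rules, d) == doc1 + doc2
```

Keeping word ids stable, and exposing the frequency ordering as a separate `by_frequency` view, is a choice made for this sketch: renumbering ids on every sort would invalidate symbols already written to the compressed output.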

Assignees

Suzuki Takashi

Classifications

  • Information retrieval; Database structures therefor; File system structures therefor · CPC title

  • Trees, e.g. B+trees · CPC title

  • Indexing; Web crawling techniques · CPC title

  • Methods or arrangements to increase the throughput · CPC title

  • H03M7/3088 (primary): employing the use of a dictionary, e.g. LZ78 · CPC title

Frequently asked questions

Who is the assignee on this patent?
Suzuki Takashi
What technology area does this patent fall under?
Primary CPC classification H03M7/3088. Mapped technology areas include Electricity.
When was this patent published?
Published on July 30, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 11 related publications on this page (citations in our corpus or others sharing the same primary CPC).