System and method for data compaction with codebook statistical estimates

Patent metadata
Field	Value
Publication number	US-12373101-B2
Application number	US-202318520473-A
Country	US
Kind code	B2
Filing date	Nov 27, 2023
Priority date	Oct 30, 2017
Publication date	Jul 29, 2025
Grant date	Jul 29, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system and method for data compaction with codebook statistical estimates to improve entropy encoding methods to account for, and efficiently handle, previously-unseen data in data to be compacted. Training data sets are analyzed to determine the frequency of occurrence of each sourceblock in the training data sets. A mismatch probability estimate is calculated comprising an estimated frequency at which any given data sourceblock received during encoding will not have a codeword in the codebook. Entropy encoding is used to generate codebooks comprising codewords for data sourceblocks based on the frequency of occurrence of each sourceblock. A “mismatch codeword” is inserted into the codebook based on the mismatch probability estimate to represent those cases when a block of data to be encoded does not have a codeword in the codebook. During encoding, if a mismatch occurs, a secondary encoding process is used to encode the mismatched sourceblock.
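The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `MISMATCH` sentinel, the function names, the rescaling of training frequencies by (1 - q), and the choice of raw 8-bit bytes as the secondary encoding are all assumptions made for illustration. Only the overall shape (a frequency-based entropy codebook, an extra mismatch codeword, and a fallback encoding on mismatch) comes from the abstract.

```python
import heapq
from collections import Counter
from itertools import count

# Hypothetical sentinel standing in for the "mismatch sourceblock".
MISMATCH = object()


def build_codebook(training_blocks, q):
    """Build a Huffman codebook over training sourceblocks plus a mismatch symbol.

    Assumes training_blocks is non-empty and 0 < q < 1.
    """
    freqs = Counter(training_blocks)
    n = len(training_blocks)
    # Assumption: scale observed frequencies by (1 - q) so all weights sum to 1.
    weights = {block: (c / n) * (1 - q) for block, c in freqs.items()}
    weights[MISMATCH] = q
    tie = count()  # unique tie-breaker so the heap never compares dicts
    heap = [(w, next(tie), {sym: ""}) for sym, w in weights.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in left.items()}
        merged.update({s: "1" + code for s, code in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tie), merged))
    return heap[0][2]


def encode(blocks, codebook):
    """Encode bytes sourceblocks as a bit string, with a raw-byte fallback."""
    out = []
    for block in blocks:
        if block in codebook:
            out.append(codebook[block])
        else:
            # Assumed secondary encoding: mismatch codeword, then raw bits.
            out.append(codebook[MISMATCH])
            out.append("".join(f"{byte:08b}" for byte in block))
    return "".join(out)


def decode(bits, codebook, block_size):
    """Invert encode(); block_size is the fixed sourceblock length in bytes."""
    inverse = {code: sym for sym, code in codebook.items()}
    out, current, i = [], "", 0
    while i < len(bits):
        current += bits[i]
        i += 1
        if current in inverse:
            sym = inverse[current]
            if sym is MISMATCH:
                raw = bits[i:i + 8 * block_size]
                out.append(bytes(int(raw[k:k + 8], 2) for k in range(0, len(raw), 8)))
                i += 8 * block_size
            else:
                out.append(sym)
            current = ""
    return out


training = [b"ab"] * 6 + [b"cd"] * 3 + [b"ef"]
codebook = build_codebook(training, q=0.1)
data = [b"ab", b"zz", b"cd", b"ab"]  # b"zz" never appeared in training
roundtrip = decode(encode(data, codebook), codebook, block_size=2)
```

Because Huffman codes are prefix-free, the decoder can recognize the mismatch codeword unambiguously and then read a fixed number of raw bits. A real system could use any secondary encoding here; raw bytes are simply the easiest to show.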

First claim

Opening claim text (preview).

What is claimed is:

1. A system for encoding data using mismatch probability estimation, comprising: a computing device comprising a processor, a memory, and a non-volatile data storage device; a statistical analyzer comprising a first plurality of programming instructions stored in the memory which, when operating on the processor, causes the computing device to: calculate a mismatch probability estimate comprising a probability that any given sourceblock in a non-training data set will not be a sourceblock that was contained in a training data set; generate a mismatch sourceblock representing sourceblocks that were not contained in the training data set; and a codebook generator comprising a second plurality of programming instructions stored in the memory which, when operating on the processor, causes the computing device to: generate a codebook from the sourceblocks of the training data set and the mismatch sourceblock.

2. The system of claim 1, wherein a mismatch probability estimate is assigned to the generated mismatch sourceblock as the frequency of occurrence of the mismatch sourceblock; and wherein an entropy encoding method is used to assign codewords to each mismatch sourceblock based on its frequency of occurrence.

3. The system of claim 1, further comprising an encoder comprising a third plurality of programming instructions stored in the memory which, when operating on the processor, causes the computing device to: receive the non-training data set for encoding, the non-training data set comprising sourceblocks of data; for each sourceblock of the non-training data set, look up and return the codeword for that sourceblock in the codebook and insert that codeword into an encoded data stream; and generate a new codeword for the looked up sourceblock using a secondary encoding method if the sourceblock is a mismatch, and insert the new codeword into the encoded data stream.

4. The system of claim 1, further comprising a decoder comprising a fourth plurality of programming instructions stored in the memory which, when operating on the processor, causes the computing device to: receive an encoded data stream comprising codewords; for each codeword in the encoded data stream, look up and return the sourceblock for that codeword in the codebook and insert that sourceblock into a decoded data stream; and determine the sourceblock for that codeword using a secondary encoding method if the sourceblock is a mismatch, and insert the determined sourceblock into the decoded data stream.

5. The system of claim 1, wherein the training data set is a low-entropy data set, either having a small subset of sourceblocks of a given size relative to the total possible number of sourceblocks of that size or having a set of sourceblocks closely matching the set of sourceblocks expected in the non-training data set.

6. The system of claim 1, wherein the entropy encoding method is Huffman coding or a known variant thereof.

7. The system of claim 1, wherein the mismatch probability estimate, $q$, is calculated as $q = M/N$, where: $M$ is the number of times a previously-unobserved sourceblock appeared in the training data set; and $N$ is the total number of sourceblocks observed in the training data set.

8. The system of claim 7, wherein the mismatch probability estimate, $q$, is calculated as $q = M/N = \left(\sum_{j=1}^{N} X_j\right)/N$, where: $X_j = \begin{cases} 1 & \text{if } S_j \notin \{S_i : 1 \le i < j\} \\ 0 & \text{otherwise} \end{cases}$; and $N$ is the total number of sourceblocks observed in the training data set.

9. The system of claim 8, wherein an exponentially-weighted moving average is applied to the calculation of $q = \left(\sum_{j=1}^{N} X_j\right)/N$.

10. The system of claim 9, wherein the exponentially-weighted moving average is a modified form of an exponentially-weighted moving average of the form: $\mu_j = 1$ if $j = 0$, and $\mu_j = (1 - \beta_j)\,\mu_{j-1} + \beta_j X_j$ if $j$ …
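The estimate in claims 7 to 9 can be worked through with a short Python sketch. The function names are hypothetical. The constant `beta` in the moving-average variant is an assumption: the claim preview cuts off before the definition of the per-step weight, so a fixed weight is used here purely for illustration.

```python
def mismatch_indicators(sourceblocks):
    """Return X_j for each block: 1 on its first appearance, 0 on repeats."""
    seen = set()
    indicators = []
    for block in sourceblocks:
        indicators.append(0 if block in seen else 1)
        seen.add(block)
    return indicators


def mismatch_probability(sourceblocks):
    """q = M / N, where M counts first appearances and N is the total count."""
    x = mismatch_indicators(sourceblocks)
    if not x:
        raise ValueError("training data must contain at least one sourceblock")
    return sum(x) / len(x)


def ewma_mismatch(sourceblocks, beta=0.1):
    """Moving-average variant with mu_0 = 1 and an assumed constant weight beta."""
    mu = 1.0
    for x_j in mismatch_indicators(sourceblocks):
        mu = (1 - beta) * mu + beta * x_j
    return mu


training = ["a", "b", "a", "c", "a", "b"]
q = mismatch_probability(training)  # 3 first appearances out of 6 blocks
```

For the six-block sequence above, three blocks are first appearances, so q = 3/6 = 0.5. The moving-average form gives more weight to recent blocks, so it drifts toward 0 once new sourceblocks stop appearing.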

Assignees

Atombeam Technologies Inc

Inventors

Classifications

G06F3/067
Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS] · CPC title
G06F3/0623
in relation to content · CPC title
G06F3/0659
Command handling arrangements, e.g. command buffers, queues, command scheduling · CPC title
H03M7/6011
Encoder aspects · CPC title
H03M7/6005
Decoder aspects · CPC title

Patent family

Related publications grouped by family.

View patent family 85151955

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12373101B2 cover?: A system and method for data compaction with codebook statistical estimates to improve entropy encoding methods to account for, and efficiently handle, previously-unseen data in data to be compacted. Training data sets are analyzed to determine the frequency of occurrence of each sourceblock in the training data sets. A mismatch probability estimate is calculated comprising an estimated frequen…
Who is the assignee on this patent?: Atombeam Technologies Inc
What technology area does this patent fall under?: Primary CPC classification G06F3/0608. Mapped technology areas include Physics.
When was this patent published?: Publication date Jul 29, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).