Sample array coding for low-delay
US-2018367810-A1 · Dec 20, 2018 · US
US10411728B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10411728-B2 |
| Application number | US-201716076347-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 8, 2017 |
| Priority date | Feb 8, 2016 |
| Publication date | Sep 10, 2019 |
| Grant date | Sep 10, 2019 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
A device ( 100 ) for and method of determining clusters of sequences of instances of a first type of data for compacting a data set comprising sequences of instances of the first type of data is provided. Also a method of compacting a data set, a method of transmitting compacted data and a computer program product are provided. In a sequence clustering unit ( 110 ) of the device, sequences of a first set of data are clustered on basis of conditional probabilities. Each unique sequence of the first set of data is associated with one or more conditional probabilities that an instance of the second set of data has a specific value given the unique sequence. In the clustering a significant part of the mutual information between the first set of data and the second set of data is maintained.
Opening claim text (preview).
The invention claimed is: 1. A data reduction arrangement for compacting sequences of instances of a first type of data, the data reduction arrangement comprising: a device for determining clusters of sequences of instances of a first type of data for compacting a data set comprising sequences of instances of the first type of data, the instances of the first type of data comprising information for predicting instances of a second type of data, the instances of the second type of data comprising data based on a characteristic of a physical entity, the device comprising: a first data set unit for obtaining a first set of data comprising sequences of instances of the first type of data; a second data set unit for obtaining a second set of data comprising instances of the second type of data, each instance of the second set of data corresponds to a sequence in the first set of data; a sequence clustering unit for assigning the sequences of the first set of data to clusters, the assigning is based on conditional probabilities of data of the second type given a sequence of the first set of data, wherein each unique sequence of the first set of data is associated with one or more conditional probabilities that an instance of the second set of data has a specific value given the unique sequence, wherein the assigning of the sequences of the first set of data to clusters comprises applying the Context Tree Weighting method to the first set of data and the second set of data to obtain a context tree, in the Context Tree Weighting method every unique sequence of the first set of data is represented by a path in the context tree from a root node to a specific leaf node and counts stored in at least the leaf nodes of the context tree are based on the corresponding instances of the second set of data, wherein the estimated conditional probability of a respective leaf node is calculated on the basis of the counts of the respective leaf nodes; and a compaction unit configured for compacting the first data set by replacing the sequences of instances of the first type of data of the data set by an identification data of the cluster to which the sequence is assigned, wherein the identification data of a specific cluster uniquely identifies the specific cluster and is stored by a fewer number of bits than each sequence of the data set. 2. A computer-implemented method for determining clusters of sequences of instances of data of a first type of data for compacting a data set comprising sequences of instances of the first type of data, the instances of the first type of data comprising information for predicting instances of a second type of data, the instances of the second type of data comprising data being based on a characteristic of a physical entity, the method comprising: obtaining a first set of data comprising sequences of instances of the first type of data; obtaining a second set of data comprising instances of the second type of data, each instance of the second set of data corresponds to a sequence in the first set of data; assigning the sequences of the first set of data to clusters, the assigning is based on conditional probabilities of data of the second type given a sequence of the first set of data, wherein each unique sequence of the first set of data is associated with one or more conditional probabilities that an instance of the second set of data has a specific value given the unique sequence, wherein the assigning of the sequences of the first set of data to clusters comprises applying the Context Tree Weighting method to the first set of data and the second set of data to obtain a context tree, in the Context Tree Weighting method every unique sequence of the first set of data is represented by a path in the context tree from a root node to a specific leaf node and counts stored in at least the leaf nodes of the context tree are based on the corresponding instances of the second set of data, wherein the estimated conditional probability of a respective leaf node is calculated on the basis of the counts of the respective leaf nodes; and compacting the first data set by replacing the sequences of instances of the first type of data of the data set by an identification data of the cluster to which the sequence is assigned, wherein the identification data of a specific cluster uniquely identifies the specific cluster and is stored by a fewer number of bits than each sequence of the data set. 3. The method according to claim 2 , wherein the assigning of the sequences of the first set of data to clusters comprises forming the cluster on basis of estimated conditional probabilities of the leaf nodes of the context tree, wherein, if a specific leaf node is related to a specific cluster, then all sequences of the first set of data equal to the unique sequence ending in the specific leaf node are assigned to the specific cluster and wherein the estimated conditional probability of a respective leaf node is a Krichevsky and Trofimov estimator that is calculated on basis of the counts of the respective leaf nodes. 4. The method according to claim 2 , wherein the assigning of the sequences of the first set of data to clusters uses a k-means algorithm to form the clusters and assigns sequences of the first set to clusters, and wherein the k-means algorithm uses the estimated conditional probabilities of the leaf nodes of the context tree to form the clusters. 5. The method according to claim 4 , wherein, in the assigning of the sequence of the first set of data to clusters, sequences of the first set of data ending in leaf nodes having a total count that is smaller than a minimum number of observations are assigned to two additional clusters, sequences ending in leaf nodes having an estimated conditional probability smaller than 0.5 and having a total count that is smaller than the minimum number of observations are assigned to a first one of the two additional clusters, sequences ending in leaf nodes having an estimated conditional probability larger than 0.5 and having a total count that is smaller than the minimum number of observations are assigned to a second one of the two additional clusters, sequences ending in leaf nodes having an estimated conditional probability that is equal to 0.5 and having a total count smaller than the minimum number of observations are assigned to either the first one of the additional clusters or the second one of the additional clusters. 6. The method according to claim 2 , wherein the clusters of the sequences of the first set of data are further optimized by an iterative optimization method to minimize an optimization function that is based on a conditional entropy of the second set of data given the data of the clusters. 7. The method according to claim 6 , wherein the iterative optimization method comprises simulated annealing. 8. The method according to claim 2 , wherein the sequences of instances of the first type of data of the first set of data comprise time series of sensor data, each time series comprises results of measurements of one specific sensor at consecutive moments in time and the specific sensors are of an equal type. 9. The method according to claim 2 , wherein the instances of the second set of data are binary data instances. 10. The method according to claim 9 , wherein the assigning based on conditional probabilities is based on the conditional probabilities that the data of the second set of data is one given an unique sequence of the first set of data. 11. A method of transmitting compacted data comprising at least one sequence of instances of the first type of data, the at least one sequence is a sequence to be transmitted, the method comprising: obtaining
Prediction · CPC title
Clustering or classification · CPC title
for evaluating statistical data {, e.g. average values, frequency distributions, probability functions, regression analysis (forecasting specially adapted for a specific administrative, business or logistic context G06Q10/04)} · CPC title
Digital compression and data reduction techniques where the original information is represented by a subset or similar information, e.g. lossy compression · CPC title
Binary arithmetic codes · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.