US11038528B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-11038528-B1 |
| Application number | US-202016892418-A |
| Country | US |
| Kind code | B1 |
| Filing date | Jun 4, 2020 |
| Priority date | Jun 4, 2020 |
| Publication date | Jun 15, 2021 |
| Grant date | Jun 15, 2021 |
Official abstract text for this publication.
Techniques for genetic programming based compression determination are described herein. An aspect includes adding a first plurality of randomly generated compression algorithms to a first set of compression algorithms. Another aspect includes determining a respective mutated version of each of the first plurality of randomly generated compression algorithms. Another aspect includes adding the determined mutated versions to the first set of compression algorithms. Another aspect includes evaluating and ranking the first set of compression algorithms based on respective achieved degrees of compression.
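The loop the abstract describes (seed a set with random algorithms, add a mutated copy of each, then evaluate and rank by achieved compression) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration, not the patent's implementation: the four byte-level transforms stand in for the "library of reversible matrix operations", `zlib` is used as a fixed back-end coder so that a size can be measured, and `random_algorithm`, `mutate`, `compressed_size`, `rank_generation`, and `find_winner` are hypothetical helper names.

```python
import random
import zlib

# Hypothetical stand-ins for the library of reversible operations.
# Each maps bytes -> bytes and has an exact inverse (inverses not shown).
def delta(data: bytes) -> bytes:
    """Replace each byte with its difference from the previous byte (mod 256)."""
    out, prev = bytearray(), 0
    for b in data:
        out.append((b - prev) % 256)
        prev = b
    return bytes(out)

def xor_prev(data: bytes) -> bytes:
    """XOR each byte with the previous input byte."""
    out, prev = bytearray(), 0
    for b in data:
        out.append(b ^ prev)
        prev = b
    return bytes(out)

def reverse(data: bytes) -> bytes:
    """Reverse the byte order."""
    return data[::-1]

def swap_nibbles(data: bytes) -> bytes:
    """Swap the high and low 4 bits of every byte."""
    return bytes(((b << 4) & 0xF0) | (b >> 4) for b in data)

LIBRARY = [delta, xor_prev, reverse, swap_nibbles]

def random_algorithm(rng: random.Random, length: int = 3) -> list:
    """A candidate algorithm is a random sequence of library operations."""
    return [rng.choice(LIBRARY) for _ in range(length)]

def mutate(algorithm: list, rng: random.Random) -> list:
    """Substitute one operation with a library operation chosen at random
    (the replacement may happen to equal the original)."""
    mutated = list(algorithm)
    mutated[rng.randrange(len(mutated))] = rng.choice(LIBRARY)
    return mutated

def compressed_size(algorithm: list, sample: bytes) -> int:
    """Apply the operations in order, then measure size after zlib
    (assumption: zlib is a fixed back end used only for scoring)."""
    data = sample
    for op in algorithm:
        data = op(data)
    return len(zlib.compress(data))

def rank_generation(population: list, sample: bytes, rng: random.Random) -> list:
    """Add a mutated version of each member, then rank smallest output first."""
    candidates = list(population) + [mutate(a, rng) for a in population]
    return sorted(candidates, key=lambda a: compressed_size(a, sample))

def find_winner(sample: bytes, rng: random.Random,
                pop_size: int = 8, generations: int = 5, tier: int = 4) -> list:
    """Repeat mutate-and-rank, carrying only the top tier forward each round."""
    population = [random_algorithm(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population = rank_generation(population, sample, rng)[:tier]
    return population[0]
```

Calling `find_winner(bytes(range(256)) * 4, random.Random(0))` returns a list of three library operations. Because the top tier is carried forward unchanged, the best score in this sketch can never get worse from one generation to the next.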
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method comprising: adding, by a processor, a first plurality of randomly generated compression algorithms to a first set of compression algorithms, wherein the first plurality of randomly generated compression algorithms is generated based on a library of reversible matrix operations; determining, by the processor, a respective mutated version of each of the first plurality of randomly generated compression algorithms by substituting one or more operations in each of the first plurality of randomly generated compression algorithms based on the library of reversible matrix operations; adding, by the processor, the determined mutated versions to the first set of compression algorithms; evaluating and ranking, by the processor, the first set of compression algorithms based on respective achieved degrees of compression for a data type; and identifying, by the processor, a winning compression algorithm based on the evaluation and ranking of the first set of compression algorithms, wherein the winning compression algorithm is used to compress data files corresponding to the data type.

2. The method of claim 1, wherein adding the first plurality of randomly generated compression algorithms to the first set of compression algorithms comprises: generating, by a processor, an initial set of compression algorithms based on the library of reversible matrix operations; evaluating and ranking the initial set of compression algorithms based on respective achieved degrees of compression; and determining a top tier of the ranked initial set of compression algorithms, wherein the determined top tier of the ranked initial set of compression algorithms is the first plurality of randomly generated compression algorithms.

3. The method of claim 1, further comprising adding a second plurality of randomly generated compression algorithms to the first set of compression algorithms before evaluating and ranking the first set of compression algorithms.

4. The method of claim 1, further comprising: determining a top tier of the ranked first set of compression algorithms; adding the top tier of the ranked first set of compression algorithms to a second set of compression algorithms; determining a respective mutated version of each of the top tier of the ranked first set of compression algorithms; adding the determined mutated versions to the second set of compression algorithms; and evaluating and ranking the second set of compression algorithms based on respective achieved degrees of compression.

5. The method of claim 4, wherein evaluating and ranking the first set of compression algorithms based on respective achieved degrees of compression is performed based on first input data corresponding to the data type; and wherein evaluating and ranking the second set of compression algorithms based on respective achieved degrees of compression is performed based on second input data corresponding to the data type.

6. The method of claim 5, further comprising determining the winning compression algorithm and corresponding winning decompression algorithm for the data type based on the ranked second set of compression algorithms.

7. The method of claim 6, further comprising: receiving a file corresponding to the data type by a compression application, the compression application comprising a plurality of compression algorithms including the winning compression algorithm; compressing the file by the winning compression algorithm based on the data type; inserting an algorithm identifier corresponding to the winning compression algorithm into metadata of the compressed file; receiving the compressed file by a decompression application, the decompression application comprising a plurality of decompression algorithms including the winning decompression algorithm; and decompressing the compressed file by the winning decompression algorithm based on the algorithm identifier.

8. A system comprising: a memory having computer readable instructions; and one or more processors for executing the computer readable instructions, the computer readable instructions controlling the one or more processors to perform operations comprising: adding a first plurality of randomly generated compression algorithms to a first set of compression algorithms, wherein the first plurality of randomly generated compression algorithms is generated based on a library of reversible matrix operations; determining a respective mutated version of each of the first plurality of randomly generated compression algorithms by substituting one or more operations in each of the first plurality of randomly generated compression algorithms based on the library of reversible matrix operations; adding the determined mutated versions to the first set of compression algorithms; evaluating and ranking the first set of compression algorithms based on respective achieved degrees of compression for a data type; and identifying a winning compression algorithm based on the evaluation and ranking of the first set of compression algorithms, wherein the winning compression algorithm is used to compress data files corresponding to the data type.

9. The system of claim 8, wherein adding the first plurality of randomly generated compression algorithms to the first set of compression algorithms comprises: generating an initial set of compression algorithms based on the library of reversible matrix operations; evaluating and ranking the initial set of compression algorithms based on respective achieved degrees of compression; and determining a top tier of the ranked initial set of compression algorithms, wherein the determined top tier of the ranked initial set of compression algorithms is the first plurality of randomly generated compression algorithms.

10. The system of claim 8, wherein the operations further comprise adding a second plurality of randomly generated compression algorithms to the first set of compression algorithms before evaluating and ranking the first set of compression algorithms.

11. The system of claim 8, wherein the operations further comprise: determining a top tier of the ranked first set of compression algorithms; adding the top tier of the ranked first set of compression algorithms to a second set of compression algorithms; determining a respective mutated version of each of the top tier of the ranked first set of compression algorithms; adding the determined mutated versions to the second set of compression algorithms; and evaluating and ranking the second set of compression algorithms based on respective achieved degrees of compression.

12. The system of claim 11, wherein evaluating and ranking the first set of compression algorithms based on respective achieved degrees of compression is performed based on first input data corresponding to the data type; and wherein evaluating and ranking the second set of compression algorithms based on respective achieved degrees of compression is performed based on second input data corresponding to the data type.

13. The system of claim 12, wherein the operations further comprise determining the winning compression algorithm and corresponding winning decompression algorithm for the data type based on the ranked second set of compression algorithms.

14. The system of claim 13, wherein the operations further comprise: receiving a file corresponding to the data type by a compression application, the compression application comprising a plurality of compression algorithms including the winning compression algorithm; compressing the file by the winning compression algo
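Claim 7 recites a compression application that writes an algorithm identifier into the compressed file's metadata, and a decompression application that uses that identifier to select the matching decompressor. A minimal sketch of that dispatch follows, under stated assumptions: the "metadata" is a single leading byte, `zlib` and `bz2` stand in for evolved winning algorithms, and the `ALGORITHMS` and `WINNER_FOR_TYPE` tables, the data type names, and the function names are all hypothetical.

```python
import bz2
import zlib

# Hypothetical registry: identifier -> (compressor, decompressor).
# zlib and bz2 are stand-ins, not the algorithms the patent would evolve.
ALGORITHMS = {
    1: (zlib.compress, zlib.decompress),
    2: (bz2.compress, bz2.decompress),
}

# Hypothetical mapping from data type to its winning algorithm identifier.
WINNER_FOR_TYPE = {"text": 1, "sensor": 2}

def compress_file(payload: bytes, data_type: str) -> bytes:
    """Compress with the winner for this data type and prepend its identifier."""
    algorithm_id = WINNER_FOR_TYPE[data_type]
    compress, _ = ALGORITHMS[algorithm_id]
    return bytes([algorithm_id]) + compress(payload)

def decompress_file(blob: bytes) -> bytes:
    """Read the identifier byte, then dispatch to the matching decompressor."""
    algorithm_id = blob[0]
    _, decompress = ALGORITHMS[algorithm_id]
    return decompress(blob[1:])
```

Storing the identifier with the file means the decompression side needs no knowledge of the data type; it only needs a registry that contains the same identifiers as the compression side.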
according to the data type · CPC title
Selection between different types of compressors · CPC title
Compression Theory, e.g. compression of random number, repeated compression · CPC title
Selection strategies · CPC title
Updates performed during online database operations; commit processing · CPC title