Systems and methods for intelligent data compression

US12182075B2 · US · B2

Patent metadata
Publication number: US-12182075-B2
Application number: US-202318092696-A
Country: US
Kind code: B2
Filing date: Jan 3, 2023
Priority date: Jan 3, 2023
Publication date: Dec 31, 2024
Grant date: Dec 31, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems, computer program products, and methods are described herein for intelligent data compression, in accordance with an embodiment of the invention. The present invention may be configured to receive a plurality of files for storage in a database and perform a series of steps iteratively, for each file of the plurality of files, and until each file of the plurality of files is represented in the database. The series of steps may include identifying one or more data points in the respective file, where each identified data point was previously unidentified in the database and adding the identified one or more data points to the database. The series of steps may also include identifying one or more features of the respective file for storage in the database and storing the identified one or more features in the database as a surrogate for the respective file.
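The iterative procedure described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Database` class, the `compress_files` function, and the whitespace tokenizer are hypothetical names introduced here, and plain set membership and token counting stand in for the machine learning models that the claims recite.

```python
from dataclasses import dataclass, field


@dataclass
class Database:
    """Hypothetical in-memory store: known data points plus one surrogate per file."""
    data_points: set = field(default_factory=set)
    surrogates: dict = field(default_factory=dict)


def identify_new_data_points(content, db):
    # Stand-in for the first model: a data point counts as "previously
    # unidentified" when it is not yet in the database.
    # Assumption: data points are whitespace-separated tokens.
    return {token for token in content.split() if token not in db.data_points}


def identify_features(content):
    # Stand-in for the second model: a few summary features of the file.
    tokens = content.split()
    return {"token_count": len(tokens), "unique_tokens": len(set(tokens))}


def compress_files(files, db):
    """Process every file so that each one is represented in the database."""
    for name, content in files.items():
        new_points = identify_new_data_points(content, db)
        db.data_points |= new_points  # add only the unseen data points
        db.surrogates[name] = identify_features(content)  # stored in place of the file
    return db


files = {"a.txt": "red green blue", "b.txt": "green blue blue yellow"}
db = compress_files(files, Database())
print(sorted(db.data_points))
print(db.surrogates["b.txt"])
```

In this sketch the second file contributes only `yellow` to the stored data points, because `green` and `blue` were already added while processing the first file. The surrogate kept for each file is the small feature dictionary rather than the file content.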

First claim

Opening claim text (preview).

What is claimed is:

1. A system for intelligent data compression, the system comprising: at least one processing device; and at least one non-transitory storage device comprising computer-executable program code that, when executed by the at least one processing device, causes the at least one processing device to: receive a plurality of files for storage in a database, wherein each file of the plurality of files comprises data; and iteratively, for each file of the plurality of files and until each file of the plurality of files is represented in the database, compress a respective file into a surrogate to reduce an amount of memory resources required to store the data of the respective file in the database by: identifying, using a first machine learning model, one or more data points in the respective file, wherein each identified data point was previously unidentified with respect to a stored file in the database; adding the identified one or more data points to the database; identifying, using a second machine learning model, one or more features of the respective file for storage in the database; and storing the identified one or more features in the database as the surrogate for the respective file.

2. The system of claim 1, wherein the first machine learning model is an unsupervised machine learning model.

3. The system of claim 1, wherein the first machine learning model is a one-class support vector machine trained using a radial basis function kernel.

4. The system of claim 1, wherein the second machine learning model is an unsupervised machine learning model.

5. The system of claim 1, wherein the second machine learning model is at least one of a ridge regression model, an elastic net regression model, or a least squares regression model.

6. The system of claim 1, wherein the plurality of files is a second plurality of files, and wherein the at least one non-transitory storage device comprises computer-executable program code that, when executed by the at least one processing device, causes the at least one processing device to, before receiving the second plurality of files: receive a first plurality of files for storage in the database; identify data points in each of the first plurality of files; add the identified data points to the database; identify features in each of the first plurality of files; and store, for each of the first plurality of files, the identified features in the database as a surrogate for the respective file.

7. The system of claim 6, wherein the at least one non-transitory storage device comprises computer-executable program code that, when executed by the at least one processing device, causes the at least one processing device to, before receiving the second plurality of files, train the first machine learning model using data in the database to determine whether subsequent data points are previously unidentified data points.

8. The system of claim 7, wherein training the first machine learning model to determine whether subsequent data points are previously unidentified data points comprises training the first machine learning model to: determine, based on data in the database, a likelihood of a data point being previously unidentified; determine whether the likelihood satisfies a threshold; and identify, based on the likelihood satisfying the threshold, that the data point is a previously unidentified data point.

9. The system of claim 7, wherein the at least one non-transitory storage device comprises computer-executable program code that, when executed by the at least one processing device, causes the at least one processing device to, after a surrogate for each file of the second plurality of files is stored in the database and using the identified data points of the first plurality of files and the identified data points of the second plurality of files, retrain the first machine learning model to determine whether subsequent data points are previously unidentified data points.

10. The system of claim 6, wherein the at least one non-transitory storage device comprises computer-executable program code that, when executed by the at least one processing device, causes the at least one processing device to, before receiving the second plurality of files, train the second machine learning model using data in the database to determine whether subsequent features should be stored in the database.

11. The system of claim 10, wherein training the second machine learning model to determine whether subsequent features should be stored in the database comprises training the second machine learning model to: determine, based on data in the database, a likelihood of a feature being previously unidentified; determine whether the likelihood satisfies a threshold; and determine, based on the likelihood satisfying the threshold, that the feature should be stored in the database.

12. The system of claim 10, wherein the at least one non-transitory storage device comprises computer-executable program code that, when executed by the at least one processing device, causes the at least one processing device to, after a surrogate for each file of the second plurality of files is stored in the database and using the identified features of the first plurality of files and the identified features of the second plurality of files, retrain the second machine learning model to determine whether subsequent features should be stored in the database.

13. The system of claim 1, wherein the files comprise high-resolution image files, and wherein storing the identified one or more features in the database as a surrogate for the file comprises storing (i) at least one of a lower resolution image file or an image identifier corresponding to the file and (ii) metadata comprising the identified one or more features.

14. The system of claim 13, wherein the at least one non-transitory storage device comprises computer-executable program code that, when executed by the at least one processing device, causes the at least one processing device to: receive an indication of an error associated with a file of the plurality of files; identify, using a third machine learning model, other files in the database associated with the error; generate a report comprising the other files and the identified one or more features of the other files; and provide the report to a user.

15. The system of claim 13, wherein the high-resolution image files comprise images captured by one or more cameras of an autonomous vehicle, and the metadata comprises at least one of a number of people identified in each image, a number of objects identified in each image, or a number of vehicles identified in each image.

16. The system of claim 13, wherein the high-resolution image files comprise images captured by a microscope of a plurality of cells, and the metadata comprises at least one of a number of cells identified in each image, a wavelength of light used to capture each image, or an identifier of the microscope.

17. The system of claim 13, wherein the high-resolution image files comprise images of wafer devices, and the metadata comprises at least one of a step of manufacturing during which an image was captured, a defect identified in the wafer device, a location of a defect identified in the wafer device, or a type of defect identified in the wafer device.

18. A method for intelligent data compression, the method comprising: receiving a first plurality of files for storage in a database; identifying data points in each of the first plura…
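Claims 3 and 8 combine a radial basis function (RBF) kernel with a likelihood-and-threshold test for deciding whether a data point is previously unidentified. The sketch below illustrates only that pattern. It is a simplified stand-in, not a trained one-class support vector machine: the function names, the `gamma` and `threshold` values, the scoring rule (one minus the highest kernel similarity to any stored point), and the reading of "satisfies a threshold" as "greater than or equal to" are all assumptions made for illustration.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Radial basis function kernel: exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def novelty_likelihood(point, stored_points, gamma=0.5):
    """Score in [0, 1]: 0 when `point` equals a stored point, near 1 when it is far from all of them."""
    if not stored_points:
        return 1.0  # nothing stored yet, so every point is new
    return 1.0 - max(rbf_kernel(point, s, gamma) for s in stored_points)


def is_previously_unidentified(point, stored_points, threshold=0.5, gamma=0.5):
    # Pattern from claim 8: compute a likelihood, then compare it with a threshold.
    # Assumption: the threshold is "satisfied" when the likelihood is at least `threshold`.
    return novelty_likelihood(point, stored_points, gamma) >= threshold


stored = [(0.0, 0.0), (1.0, 1.0)]
print(is_previously_unidentified((0.1, 0.0), stored))  # close to a stored point
print(is_previously_unidentified((5.0, 5.0), stored))  # far from every stored point
```

A point close to stored data scores near zero and is treated as already represented, while a distant point scores near one and would be added to the database. A real one-class SVM would instead learn a decision boundary from the stored data during training.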

Assignees

Mellanox Technologies Ltd

Inventors

Classifications

  • using kernel methods, e.g. support vector machines [SVM] · CPC title

  • Machine learning · CPC title

  • using compression, e.g. sparse files · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12182075B2 cover?
Systems, computer program products, and methods are described herein for intelligent data compression, in accordance with an embodiment of the invention. The present invention may be configured to receive a plurality of files for storage in a database and perform a series of steps iteratively, for each file of the plurality of files, and until each file of the plurality of files is represented …
Who is the assignee on this patent?
Mellanox Technologies Ltd
What technology area does this patent fall under?
Primary CPC classification G06F16/1744. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 31, 2024 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).