Systems and methods for intelligent data compression

US12182075B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12182075-B2
Application number	US-202318092696-A
Country	US
Kind code	B2
Filing date	Jan 3, 2023
Priority date	Jan 3, 2023
Publication date	Dec 31, 2024
Grant date	Dec 31, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems, computer program products, and methods are described herein for intelligent data compression, in accordance with an embodiment of the invention. The present invention may be configured to receive a plurality of files for storage in a database and perform a series of steps iteratively, for each file of the plurality of files, and until each file of the plurality of files is represented in the database. The series of steps may include identifying one or more data points in the respective file, where each identified data point was previously unidentified in the database and adding the identified one or more data points to the database. The series of steps may also include identifying one or more features of the respective file for storage in the database and storing the identified one or more features in the database as a surrogate for the respective file.
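The four steps in the abstract can be read as a simple loop over incoming files. The following is a minimal sketch of that loop, not the patented implementation: the function name `compress_files`, the database layout, and the chosen features are hypothetical. Plain set membership stands in for the step that finds previously unidentified data points, and summary statistics stand in for feature identification.

```python
from statistics import mean


def compress_files(files, database=None):
    """Iteratively replace each file with a small surrogate record.

    `files` maps a file name to a list of numeric data points.
    Assumption: set membership stands in for novelty detection and
    summary statistics stand in for feature identification.
    """
    if database is None:
        database = {"data_points": set(), "surrogates": {}}

    for name, points in files.items():
        # Step 1: find data points the database has not seen before.
        novel = set(points) - database["data_points"]
        # Step 2: add those points to the database.
        database["data_points"].update(novel)
        # Step 3: identify features of the file (hypothetical feature set).
        features = {
            "count": len(points),
            "mean": mean(points) if points else None,
            "novel_count": len(novel),
        }
        # Step 4: store the features as the surrogate for the file.
        database["surrogates"][name] = features

    return database


db = compress_files({
    "a.dat": [1.0, 2.0, 3.0],
    "b.dat": [2.0, 3.0, 4.0],
})
print(db["surrogates"]["b.dat"])
# {'count': 3, 'mean': 3.0, 'novel_count': 1}
```

In this sketch the second file contributes only one new data point, because two of its three values were already added while processing the first file.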

First claim

Opening claim text (preview).

What is claimed is:

1. A system for intelligent data compression, the system comprising: at least one processing device; and at least one non-transitory storage device comprising computer-executable program code that, when executed by the at least one processing device, causes the at least one processing device to: receive a plurality of files for storage in a database, wherein each file of the plurality of files comprises data; and iteratively, for each file of the plurality of files and until each file of the plurality of files is represented in the database, compress a respective file into a surrogate to reduce an amount of memory resources required to store the data of the respective file in the database by: identifying, using a first machine learning model, one or more data points in the respective file, wherein each identified data point was previously unidentified with respect to a stored file in the database; adding the identified one or more data points to the database; identifying, using a second machine learning model, one or more features of the respective file for storage in the database; and storing the identified one or more features in the database as the surrogate for the respective file.

2. The system of claim 1, wherein the first machine learning model is an unsupervised machine learning model.

3. The system of claim 1, wherein the first machine learning model is a one-class support vector machine trained using a radial basis function kernel.

4. The system of claim 1, wherein the second machine learning model is an unsupervised machine learning model.

5. The system of claim 1, wherein the second machine learning model is at least one of a ridge regression model, an elastic net regression model, or a least squares regression model.

6. The system of claim 1, wherein the plurality of files is a second plurality of files, and wherein the at least one non-transitory storage device comprises computer-executable program code that, when executed by the at least one processing device, causes the at least one processing device to, before receiving the second plurality of files: receive a first plurality of files for storage in the database; identify data points in each of the first plurality of files; add the identified data points to the database; identify features in each of the first plurality of files; and store, for each of the first plurality of files, the identified features in the database as a surrogate for the respective file.

7. The system of claim 6, wherein the at least one non-transitory storage device comprises computer-executable program code that, when executed by the at least one processing device, causes the at least one processing device to, before receiving the second plurality of files, train the first machine learning model using data in the database to determine whether subsequent data points are previously unidentified data points.

8. The system of claim 7, wherein training the first machine learning model to determine whether subsequent data points are previously unidentified data points comprises training the first machine learning model to: determine, based on data in the database, a likelihood of a data point being previously unidentified; determine whether the likelihood satisfies a threshold; and identify, based on the likelihood satisfying the threshold, that the data point is a previously unidentified data point.

9. The system of claim 7, wherein the at least one non-transitory storage device comprises computer-executable program code that, when executed by the at least one processing device, causes the at least one processing device to, after a surrogate for each file of the second plurality of files is stored in the database and using the identified data points of the first plurality of files and the identified data points of the second plurality of files, retrain the first machine learning model to determine whether subsequent data points are previously unidentified data points.

10. The system of claim 6, wherein the at least one non-transitory storage device comprises computer-executable program code that, when executed by the at least one processing device, causes the at least one processing device to, before receiving the second plurality of files, train the second machine learning model using data in the database to determine whether subsequent features should be stored in the database.

11. The system of claim 10, wherein training the second machine learning model to determine whether subsequent features should be stored in the database comprises training the second machine learning model to: determine, based on data in the database, a likelihood of a feature being previously unidentified; determine whether the likelihood satisfies a threshold; and determine, based on the likelihood satisfying the threshold, that the feature should be stored in the database.

12. The system of claim 10, wherein the at least one non-transitory storage device comprises computer-executable program code that, when executed by the at least one processing device, causes the at least one processing device to, after a surrogate for each file of the second plurality of files is stored in the database and using the identified features of the first plurality of files and the identified features of the second plurality of files, retrain the second machine learning model to determine whether subsequent features should be stored in the database.

13. The system of claim 1, wherein the files comprise high-resolution image files, and wherein storing the identified one or more features in the database as a surrogate for the file comprises storing (i) at least one of a lower resolution image file or an image identifier corresponding to the file and (ii) metadata comprising the identified one or more features.

14. The system of claim 13, wherein the at least one non-transitory storage device comprises computer-executable program code that, when executed by the at least one processing device, causes the at least one processing device to: receive an indication of an error associated with a file of the plurality of files; identify, using a third machine learning model, other files in the database associated with the error; generate a report comprising the other files and the identified one or more features of the other files; and provide the report to a user.

15. The system of claim 13, wherein the high-resolution image files comprise images captured by one or more cameras of an autonomous vehicle, and the metadata comprises at least one of a number of people identified in each image, a number of objects identified in each image, or a number of vehicles identified in each image.

16. The system of claim 13, wherein the high-resolution image files comprise images captured by a microscope of a plurality of cells, and the metadata comprises at least one of a number of cells identified in each image, a wavelength of light used to capture each image, or an identifier of the microscope.

17. The system of claim 13, wherein the high-resolution image files comprise images of wafer devices, and the metadata comprises at least one of a step of manufacturing during which an image was captured, a defect identified in the wafer device, a location of a defect identified in the wafer device, or a type of defect identified in the wafer device.

18. A method for intelligent data compression, the method comprising: receiving a first plurality of files for storage in a database; identifying data points in each of the first plura
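Claim 8 describes a likelihood-and-threshold test for deciding whether a data point is previously unidentified, and claim 3 names a radial basis function kernel. The sketch below illustrates that test only; it is not a one-class support vector machine and not the patented model. The function names, the `gamma` and `threshold` values, and the reading of "satisfies a threshold" as "greater than or equal to" are all assumptions made for illustration.

```python
import math


def rbf_similarity(x, y, gamma=0.5):
    """Radial basis function kernel: exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def novelty_likelihood(point, stored_points, gamma=0.5):
    """Score in [0, 1]: 1 minus the highest similarity to any stored point.

    Assumption: a nearest-neighbor kernel score stands in for the
    likelihood that a trained model would produce.
    """
    if not stored_points:
        return 1.0  # nothing stored yet, so every point counts as new
    return 1.0 - max(rbf_similarity(point, s, gamma) for s in stored_points)


def is_previously_unidentified(point, stored_points, threshold=0.5, gamma=0.5):
    """Apply the threshold test to the likelihood (assumed to mean >=)."""
    return novelty_likelihood(point, stored_points, gamma) >= threshold


stored = [(0.0, 0.0), (1.0, 1.0)]
print(is_previously_unidentified((0.1, 0.0), stored))  # False: close to a stored point
print(is_previously_unidentified((5.0, 5.0), stored))  # True: far from every stored point
```

Only points flagged `True` would be added to the database under this reading, which is how repeated content across files avoids being stored twice.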

Assignees

Mellanox Technologies Ltd

Inventors

Classifications

G06N20/10
using kernel methods, e.g. support vector machines [SVM] · CPC title
G06N20/00
Machine learning · CPC title
G06F16/1744 (Primary)
using compression, e.g. sparse files · CPC title

Patent family

Related publications grouped by family.

View patent family 91666619

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12182075B2 cover?: Systems, computer program products, and methods are described herein for intelligent data compression, in accordance with an embodiment of the invention. The present invention may be configured to receive a plurality of files for storage in a database and perform a series of steps iteratively, for each file of the plurality of files, and until each file of the plurality of files is represented …
Who is the assignee on this patent?: Mellanox Technologies Ltd
What technology area does this patent fall under?: Primary CPC classification G06F16/1744. Mapped technology areas include Physics.
When was this patent published?: Publication date Dec 31, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Monitoring machine learning models using surrogate model output

Intelligently optimized machine learning models

Validation processing for candidate retraining data

Machine learning model for entity resolution

Apparatus, articles of manufacture, and methods for clustered federated learning using context data

Method and apparatus for decrypting and authenticating a data record

Density estimation network for unsupervised anomaly detection