What technology area does this patent fall under?

Primary CPC classification G06F11/3414. Mapped technology areas include Physics.

When was this patent published?

Publication date Jun 25, 2024 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 5 related publications on this page (publications cited in our corpus, or other publications sharing the same primary CPC classification).

Distributed file system performance optimization for path-level settings using machine learning

US12019532B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12019532-B2
Application number	US-202117230071-A
Country	US
Kind code	B2
Filing date	Apr 14, 2021
Priority date	Apr 14, 2021
Publication date	Jun 25, 2024
Grant date	Jun 25, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A non-transitory machine-readable medium can comprise executable instructions that, when executed by a processor, facilitate performance of operations, comprising: determining a workflow comprising a group of files of a distributed file system, determining a type of the workflow, determining a group of historical metrics associated with the type of the workflow, using a model generated based on machine learning applied to the group of historical metrics, determining respective predicted ranks for different settings that are able to be applied to the workflow, based on the respective predicted ranks, determining a setting, of the different settings, to use to apply to the workflow and that has a predicted rank of the respective predicted ranks that satisfies a defined criterion, and applying the setting to at least one of the group of files to decrease a latency associated with processing the workflow.
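The abstract's flow (filter historical metrics by workflow type, predict a rank for each candidate setting, then pick a setting that meets a criterion) can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the `Metric` schema and the setting names are hypothetical, and ranking by mean historical latency is a deliberately simple stand-in for the trained machine learning model, which the abstract does not specify. Applying the chosen setting to files is left out.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Metric:
    """One historical observation (hypothetical schema, not from the patent)."""
    workflow_type: str
    setting: str
    latency_ms: float


def predict_ranks(history, workflow_type, settings):
    """Stand-in for the trained model; a higher rank means a better setting.

    Only metrics for the given workflow type are used. Settings are ordered
    by mean historical latency: the slowest gets rank 1, the fastest gets
    rank len(settings). Settings with no history count as slowest.
    """
    latencies = defaultdict(list)
    for m in history:
        if m.workflow_type == workflow_type:
            latencies[m.setting].append(m.latency_ms)

    def mean_latency(setting):
        values = latencies.get(setting)
        return mean(values) if values else float("inf")

    # Slowest first, so enumerate() hands the fastest setting the highest rank.
    ordered = sorted(settings, key=mean_latency, reverse=True)
    return {setting: rank for rank, setting in enumerate(ordered, start=1)}


def choose_setting(ranks):
    """Assumed 'defined criterion': take the setting with the highest rank."""
    return max(ranks, key=ranks.get)


if __name__ == "__main__":
    history = [
        Metric("streaming", "filename_prefetch", 12.0),
        Metric("streaming", "filename_prefetch", 14.0),
        Metric("streaming", "default", 30.0),
        Metric("random", "filename_prefetch", 25.0),
    ]
    settings = ["filename_prefetch", "default", "endurant_cache"]
    ranks = predict_ranks(history, "streaming", settings)
    print(ranks)                  # {'endurant_cache': 1, 'default': 2, 'filename_prefetch': 3}
    print(choose_setting(ranks))  # filename_prefetch
```

Swapping `predict_ranks` for a real trained model would leave `choose_setting` unchanged, which mirrors the abstract's separation between predicting ranks and selecting a setting.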

First claim

Opening claim text (preview).

What is claimed is:

1. A system, comprising: a processor; and a memory that stores executable instructions that, when executed by the processor, facilitate performance of operations, comprising: determining a workload comprising at least one of a first workload associated with a group of files of a distributed file system or a second workload associated with a directory of the distributed file system comprising at least one file of the group of files; determining a type of the workload; determining a group of historical metrics associated with the type of the workload; generating training data based on test data comprising test workloads appliable to the distributed file system and the historical metrics, the generating comprising performing a series of file access operations on files of specified size and on defined directories according to a defined client mode; using a machine learning model generated based on machine learning applied to the group of historical metrics, determining respective predicted ranks for different settings that are able to be applied to the workload, wherein the machine learning model was trained using the training data; based on the respective predicted ranks, determining a setting of the different settings to use to apply to the workload, wherein the determining of the setting comprises determining the setting that has a predicted rank of the respective predicted ranks that satisfies a defined function; and applying the setting to at least one of the group of files or the directory of the distributed file system to at least one of decrease a latency associated with processing the workload or to increase a throughput associated with the processing of the workload by the distributed file system.

2. The system of claim 1, wherein the operations further comprise: updating a data store of settings to comprise the setting, resulting in updated settings stored by the data store, wherein the machine learning uses the updated settings to improve future determinations of settings.

3. The system of claim 1, wherein the determining of the setting further comprises determining the setting based on the type of the workload.

4. The system of claim 1, wherein the determining of the setting further comprises determining an access pattern comprising a streaming access pattern, a random-access pattern, or a concurrent access pattern.

5. The system of claim 1, wherein the determining of the setting further comprises determining a protection level to be applied to the workload.

6. The system of claim 1, wherein the determining of the setting further comprises determining metadata to be applied to accelerate the processing of the workload.

7. The system of claim 1, wherein the determining of the setting further comprises determining a filename pre-fetch setting for the processing of the workload.

8. The system of claim 1, wherein the determining of the setting further comprises determining an endurant cache setting for the processing of the workload.

9. The system of claim 1, wherein the determining of the setting that has the predicted rank that satisfies the defined function comprises determining the setting that has a highest predicted rank of the respective predicted ranks.

10. The system of claim 1, wherein the determining of the setting that has the predicted rank that satisfies the defined function comprises determining the setting that has at least a threshold predicted rank of the respective predicted ranks.

11. A non-transitory machine-readable medium, comprising executable instructions that, when executed by a processor, facilitate performance of operations, comprising: determining a workflow comprising a group of files of a distributed file system; determining a type of the workflow; determining a group of historical metrics associated with the type of the workflow; generating training data based on test data comprising test workloads applicable to the distributed file system and the historical metrics, the generating comprising performing a series of file access operations on files of specified size and on defined directories according to a defined client mode; using a machine learning model generated based on machine learning applied to the group of historical metrics, determining respective predicted ranks for different settings that are able to be applied to the workflow, wherein the machine learning model was trained using the training data; based on the respective predicted ranks, determining a setting, of the different settings, to use to apply to the workflow and that has a predicted rank of the respective predicted ranks that satisfies a defined criterion; and applying the setting to at least one of the group of files to decrease a latency associated with processing the workflow.

12. The non-transitory machine-readable medium of claim 11, wherein the determining of the setting comprises determining an access pattern comprising a streaming access pattern, a random-access pattern, or a concurrent access pattern, and wherein the determining of the setting is based on a metric associated with the group of files and a quantity of client devices associated with the workflow.

13. The non-transitory machine-readable medium of claim 11, wherein the determining of the setting comprises determining a protection level to be applied to the workflow, and wherein the protection level is based on a quantity of nodes in the distributed file system.

14. The non-transitory machine-readable medium of claim 11, wherein the determining of the setting comprises: determining a first subgroup of the group of files to allocate to a solid-state drive of the distributed file system, and determining a second subgroup of the group of files to allocate to a hard-disk drive if the distributed file system.

15. The non-transitory machine-readable medium of claim 11, wherein the determining of the setting comprises determining a filename pre-fetch setting for a filename pre-fetch operation used for processing of the workflow, wherein the filename pre-fetch operation comprises determining a common prefix name of files within the group of files, and wherein the operations further comprise, based on a result of the filename pre-fetch operation, retrieving the files that comprise the common prefix name.

16. The non-transitory machine-readable medium of claim 15, wherein the retrieving of the files comprises moving the files that comprise the common prefix name into a solid-state storage of the distributed file system.

17. A method, comprising: determining, by a system comprising a processor, a workload associated with a directory of a distributed file system comprising at least one file of a group of files; determining, by the system, a type of the workload; determining, by the system, a group of historical metrics associated with the type of the workload; generating training data based on test data comprising test workloads appliable to the distributed file system and the historical metrics, the generating comprising performing a series of file access operations on files of specified size and on defined directories according to a defined client mode; using the training data, training a machine learning model based on application of machine learning to the group of historical metrics; using the machine learning model, determining, by the system, respective predicted ranks for different settings that are able to be applied to the workload; based on the respective predicted ranks, selecting, by the system, a setting of the different settings to use to
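Claims 9, 10, and 15 name concrete selection and pre-fetch rules. The Python sketch below illustrates them under assumptions that are ours, not the patent's: predicted ranks are plain numbers where higher is better, the setting names and file names are hypothetical, and the common prefix is computed from files the workload has already opened and then used to pick sibling files that have not been read yet. `os.path.commonprefix` compares strings character by character rather than by path component, which suits filename prefixes such as numbered frame sequences.

```python
import os


def select_highest(ranks):
    """Claim 9 style criterion: the setting with the highest predicted rank."""
    return max(ranks, key=ranks.get)


def select_at_least(ranks, threshold):
    """Claim 10 style criterion: settings whose predicted rank meets a threshold."""
    return sorted(setting for setting, rank in ranks.items() if rank >= threshold)


def common_prefix_name(filenames):
    """Claim 15 style step: the longest common character prefix of the names."""
    return os.path.commonprefix(list(filenames))


def files_to_prefetch(group, accessed):
    """Pick not-yet-accessed files in the group that share the accessed files' prefix.

    Assumption: the prefix is learned from files the workload already opened.
    Returns an empty list when there is no usable (non-empty) prefix.
    """
    prefix = common_prefix_name(accessed)
    if not prefix:
        return []
    seen = set(accessed)
    return [name for name in group if name.startswith(prefix) and name not in seen]


if __name__ == "__main__":
    ranks = {"filename_prefetch": 3, "endurant_cache": 2, "default": 1}
    print(select_highest(ranks))      # filename_prefetch
    print(select_at_least(ranks, 2))  # ['endurant_cache', 'filename_prefetch']

    accessed = ["render/shot12_frame_0001.exr", "render/shot12_frame_0002.exr"]
    group = accessed + [
        "render/shot12_frame_0003.exr",
        "render/shot13_frame_0001.exr",
        "render/notes.txt",
    ]
    print(files_to_prefetch(group, accessed))  # ['render/shot12_frame_0003.exr']
```

In the spirit of claim 16, the files returned by `files_to_prefetch` would be the candidates to move into solid-state storage; the move itself depends on the file system and is not shown.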

Assignees

Emc Ip Holding Co Llc

Inventors

Classifications

G06F18/214
Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title
G06F16/1824
implemented using Network-attached Storage [NAS] architecture (distributed or networked storage systems G06F3/067; protocols for distributed storage of data in a network H04L67/1097) · CPC title
G06F16/1767
Concurrency control, e.g. optimistic or pessimistic approaches · CPC title
G06N20/00
Machine learning · CPC title
G06F16/1734
Details of monitoring file system events, e.g. by the use of hooks, filter drivers, logs · CPC title

Patent family

Related publications grouped by family.

Patent family 83601380

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12019532B2 cover?: A non-transitory machine-readable medium can comprise executable instructions that, when executed by a processor, facilitate performance of operations, comprising: determining a workflow comprising a group of files of a distributed file system, determining a type of the workflow, determining a group of historical metrics associated with the type of the workflow, using a model generated based on…
Who is the assignee on this patent?: Emc Ip Holding Co Llc

Related patents

Method and system for classifying entity objects of entities based on attributes of the entity objects using machine learning

Task code recommendation model

Utilizing a machine learning model for predicting issues associated with a closing process of an entity

Prioritizing sequential application tasks

System for Managing Effective Self-Service Analytic Workflows