Method and system for classifying entity objects of entities based on attributes of the entity objects using machine learning
US-2022327378-A1 · Oct 13, 2022 · US
US12019532B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12019532-B2 |
| Application number | US-202117230071-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 14, 2021 |
| Priority date | Apr 14, 2021 |
| Publication date | Jun 25, 2024 |
| Grant date | Jun 25, 2024 |
Official abstract text for this publication.
A non-transitory machine-readable medium can comprise executable instructions that, when executed by a processor, facilitate performance of operations, comprising: determining a workflow comprising a group of files of a distributed file system, determining a type of the workflow, determining a group of historical metrics associated with the type of the workflow, using a model generated based on machine learning applied to the group of historical metrics, determining respective predicted ranks for different settings that are able to be applied to the workflow, based on the respective predicted ranks, determining a setting, of the different settings, to use to apply to the workflow and that has a predicted rank of the respective predicted ranks that satisfies a defined criterion, and applying the setting to at least one of the group of files to decrease a latency associated with processing the workflow.
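The selection step in the abstract (predict a rank for each candidate setting, then pick one whose rank satisfies a defined criterion) can be illustrated with a minimal Python sketch. It is not the patented implementation. The `RankedSetting` type, the setting names, the numeric ranks, and the convention that a larger rank is better are all assumptions made for illustration. In the disclosure the ranks come from a trained machine learning model, which is replaced here by hard-coded values.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class RankedSetting:
    """A candidate setting paired with its predicted rank (hypothetical type)."""
    name: str
    predicted_rank: float  # assumption: larger means better


def select_highest(ranked: Sequence[RankedSetting]) -> RankedSetting:
    """Criterion A: choose the setting with the highest predicted rank."""
    return max(ranked, key=lambda s: s.predicted_rank)


def select_at_threshold(
    ranked: Sequence[RankedSetting], threshold: float
) -> Optional[RankedSetting]:
    """Criterion B: choose a setting whose rank is at least `threshold`.

    When several settings qualify, this sketch returns the best of them;
    when none qualifies, it returns None.
    """
    eligible = [s for s in ranked if s.predicted_rank >= threshold]
    return select_highest(eligible) if eligible else None


# Hard-coded stand-ins for model output (illustrative values only).
candidates = [
    RankedSetting("streaming", 0.62),
    RankedSetting("random", 0.35),
    RankedSetting("concurrent", 0.81),
]

best = select_highest(candidates)
good_enough = select_at_threshold(candidates, threshold=0.6)
nothing = select_at_threshold(candidates, threshold=0.9)
```

Keeping the criterion as a separate function means the same ranked list can be evaluated against either rule without re-running the model.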
Opening claim text (preview).
What is claimed is:
1. A system, comprising: a processor; and a memory that stores executable instructions that, when executed by the processor, facilitate performance of operations, comprising: determining a workload comprising at least one of a first workload associated with a group of files of a distributed file system or a second workload associated with a directory of the distributed file system comprising at least one file of the group of files; determining a type of the workload; determining a group of historical metrics associated with the type of the workload; generating training data based on test data comprising test workloads appliable to the distributed file system and the historical metrics, the generating comprising performing a series of file access operations on files of specified size and on defined directories according to a defined client mode; using a machine learning model generated based on machine learning applied to the group of historical metrics, determining respective predicted ranks for different settings that are able to be applied to the workload, wherein the machine learning model was trained using the training data; based on the respective predicted ranks, determining a setting of the different settings to use to apply to the workload, wherein the determining of the setting comprises determining the setting that has a predicted rank of the respective predicted ranks that satisfies a defined function; and applying the setting to at least one of the group of files or the directory of the distributed file system to at least one of decrease a latency associated with processing the workload or to increase a throughput associated with the processing of the workload by the distributed file system.
2. The system of claim 1, wherein the operations further comprise: updating a data store of settings to comprise the setting, resulting in updated settings stored by the data store, wherein the machine learning uses the updated settings to improve future determinations of settings.
3. The system of claim 1, wherein the determining of the setting further comprises determining the setting based on the type of the workload.
4. The system of claim 1, wherein the determining of the setting further comprises determining an access pattern comprising a streaming access pattern, a random-access pattern, or a concurrent access pattern.
5. The system of claim 1, wherein the determining of the setting further comprises determining a protection level to be applied to the workload.
6. The system of claim 1, wherein the determining of the setting further comprises determining metadata to be applied to accelerate the processing of the workload.
7. The system of claim 1, wherein the determining of the setting further comprises determining a filename pre-fetch setting for the processing of the workload.
8. The system of claim 1, wherein the determining of the setting further comprises determining an endurant cache setting for the processing of the workload.
9. The system of claim 1, wherein the determining of the setting that has the predicted rank that satisfies the defined function comprises determining the setting that has a highest predicted rank of the respective predicted ranks.
10. The system of claim 1, wherein the determining of the setting that has the predicted rank that satisfies the defined function comprises determining the setting that has at least a threshold predicted rank of the respective predicted ranks.
11. A non-transitory machine-readable medium, comprising executable instructions that, when executed by a processor, facilitate performance of operations, comprising: determining a workflow comprising a group of files of a distributed file system; determining a type of the workflow; determining a group of historical metrics associated with the type of the workflow; generating training data based on test data comprising test workloads applicable to the distributed file system and the historical metrics, the generating comprising performing a series of file access operations on files of specified size and on defined directories according to a defined client mode; using a machine learning model generated based on machine learning applied to the group of historical metrics, determining respective predicted ranks for different settings that are able to be applied to the workflow, wherein the machine learning model was trained using the training data; based on the respective predicted ranks, determining a setting, of the different settings, to use to apply to the workflow and that has a predicted rank of the respective predicted ranks that satisfies a defined criterion; and applying the setting to at least one of the group of files to decrease a latency associated with processing the workflow.
12. The non-transitory machine-readable medium of claim 11, wherein the determining of the setting comprises determining an access pattern comprising a streaming access pattern, a random-access pattern, or a concurrent access pattern, and wherein the determining of the setting is based on a metric associated with the group of files and a quantity of client devices associated with the workflow.
13. The non-transitory machine-readable medium of claim 11, wherein the determining of the setting comprises determining a protection level to be applied to the workflow, and wherein the protection level is based on a quantity of nodes in the distributed file system.
14. The non-transitory machine-readable medium of claim 11, wherein the determining of the setting comprises: determining a first subgroup of the group of files to allocate to a solid-state drive of the distributed file system, and determining a second subgroup of the group of files to allocate to a hard-disk drive of the distributed file system.
15. The non-transitory machine-readable medium of claim 11, wherein the determining of the setting comprises determining a filename pre-fetch setting for a filename pre-fetch operation used for processing of the workflow, wherein the filename pre-fetch operation comprises determining a common prefix name of files within the group of files, and wherein the operations further comprise, based on a result of the filename pre-fetch operation, retrieving the files that comprise the common prefix name.
16. The non-transitory machine-readable medium of claim 15, wherein the retrieving of the files comprises moving the files that comprise the common prefix name into a solid-state storage of the distributed file system.
17. A method, comprising: determining, by a system comprising a processor, a workload associated with a directory of a distributed file system comprising at least one file of a group of files; determining, by the system, a type of the workload; determining, by the system, a group of historical metrics associated with the type of the workload; generating training data based on test data comprising test workloads appliable to the distributed file system and the historical metrics, the generating comprising performing a series of file access operations on files of specified size and on defined directories according to a defined client mode; using the training data, training a machine learning model based on application of machine learning to the group of historical metrics; using the machine learning model, determining, by the system, respective predicted ranks for different settings that are able to be applied to the workload; based on the respective predicted ranks, selecting, by the system, a setting of the different settings to use to
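The filename pre-fetch operation recited in claim 15 (determine a common prefix name of files in the group, then retrieve the files that comprise that prefix) can be sketched in Python as follows. The claims do not say how the prefix is computed, so this sketch assumes it is the longest leading string shared by every filename in the group. The function names, the sample filenames, and the in-memory directory listing are hypothetical. Moving the matched files to solid-state storage, as in claim 16, is outside the scope of the sketch.

```python
import os
from typing import List, Sequence


def common_prefix_name(filenames: Sequence[str]) -> str:
    """Longest leading string shared by all filenames (assumed definition)."""
    # os.path.commonprefix compares character by character, not by path part.
    return os.path.commonprefix(list(filenames))


def files_to_prefetch(group: Sequence[str], listing: Sequence[str]) -> List[str]:
    """Return names in `listing` that start with the group's common prefix.

    An empty prefix would match everything, so it yields no candidates.
    """
    prefix = common_prefix_name(group)
    if not prefix:
        return []
    return [name for name in listing if name.startswith(prefix)]


# Hypothetical workflow group and directory listing.
group = ["frame_0001.exr", "frame_0002.exr"]
listing = ["frame_0001.exr", "frame_0002.exr", "frame_0003.exr", "notes.txt"]

prefix = common_prefix_name(group)
candidates = files_to_prefetch(group, listing)
```

Here the two files already accessed share the prefix `frame_000`, so the third frame becomes a pre-fetch candidate while `notes.txt` does not.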
Generating training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title
implemented using Network-attached Storage [NAS] architecture (distributed or networked storage systems G06F3/067; protocols for distributed storage of data in a network H04L67/1097) · CPC title
Concurrency control, e.g. optimistic or pessimistic approaches · CPC title
Machine learning · CPC title
Details of monitoring file system events, e.g. by the use of hooks, filter drivers, logs · CPC title