What technology area does this patent fall under?

Primary CPC classification G06F16/256. Mapped technology areas include Physics.

When was this patent published?

Publication date: Nov 24, 2022 (kind code A1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Search and recommendation engine allowing recommendation-aware placement of data assets to minimize latency

US2022374329A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2022374329-A1
Application number	US-202217875155-A
Country	US
Kind code	A1
Filing date	Jul 27, 2022
Priority date	Mar 29, 2016
Publication date	Nov 24, 2022
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A search engine responding to a user query to find relevant data assets in a federation business data lake (FBDL) system based on interactions of known users interacting with data assets in the FBDL system. Data assets are optimally placed for minimal latency or maximal load. Data asset recommendations and past data asset access information are input as features to a time-series model for predicting future data access patterns. An expected latency and load risk is then determined and scored by a weighted mean of these values, and placement optimization is simulated using an optimization method (e.g., genetic algorithm). Using the scoring and simulation, a data asset placement engine is then used to move the locations of the data assets to minimize latency and/or to minimize maximal load.
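The scoring and simulation steps named in the abstract can be illustrated with a small sketch. It is not the patented implementation: the access counts stand in for the time-series model's predictions, and every name, weight, latency figure, and the normalization of latency are assumptions chosen for illustration. The sketch scores a placement as a weighted mean of expected latency and load risk, then searches placements with a simple genetic algorithm.

```python
import random

# Hypothetical inputs: predicted accesses per asset (a stand-in for the
# time-series model's output) and per-location latency and capacity.
PREDICTED_ACCESSES = [120, 80, 40, 200, 10, 60]
LOCATION_LATENCY_MS = [2.0, 10.0, 45.0]  # e.g. local disk, array, remote data center
LOCATION_CAPACITY = [300, 300, 300]      # accesses each location can absorb


def expected_latency(placement):
    """Access-weighted mean latency (ms); placement[i] is asset i's location."""
    total = sum(PREDICTED_ACCESSES)
    return sum(n * LOCATION_LATENCY_MS[loc]
               for n, loc in zip(PREDICTED_ACCESSES, placement)) / total


def load_risk(placement):
    """Highest utilization (load / capacity) across all locations."""
    load = [0] * len(LOCATION_CAPACITY)
    for n, loc in zip(PREDICTED_ACCESSES, placement):
        load[loc] += n
    return max(used / cap for used, cap in zip(load, LOCATION_CAPACITY))


def score(placement, w_latency=0.5, w_load=0.5):
    """Weighted mean of normalized latency and load risk; lower is better."""
    norm_latency = expected_latency(placement) / max(LOCATION_LATENCY_MS)
    weighted = w_latency * norm_latency + w_load * load_risk(placement)
    return weighted / (w_latency + w_load)


def optimize(generations=60, pop_size=30, mutation_rate=0.1, seed=0):
    """Simulate placement optimization with a small genetic algorithm."""
    rng = random.Random(seed)
    n_assets, n_locs = len(PREDICTED_ACCESSES), len(LOCATION_LATENCY_MS)
    population = [[rng.randrange(n_locs) for _ in range(n_assets)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score)
        survivors = population[:pop_size // 2]  # keep the better half unchanged
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_assets)    # single-point crossover
            child = a[:cut] + b[cut:]
            # Mutation: occasionally reassign an asset to a random location.
            child = [rng.randrange(n_locs) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = survivors + children
    return min(population, key=score)


if __name__ == "__main__":
    best = optimize()
    print("placement:", best, "score:", round(score(best), 3))
```

In this sketch the returned list is the simulated target location for each asset; a placement engine would then move only the assets whose target differs from their current location. Changing `w_latency` and `w_load` shifts the trade-off between minimizing latency and minimizing maximal load.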

First claim

Opening claim text (preview).

What is claimed is:
1. A computer-implemented method of optimizing placement of data assets in a data retrieval system storing data assets for users in an enterprise, comprising: generating search results and data asset recommendations in response to a target user query; providing the recommendations and past data asset access information for data assets as features to a time-series model for predicting future data access patterns by the target user; determining and scoring an expected latency and load risk created by a weighted mean of these expected latency and load risk values; simulating a placement optimization using an optimization method; and moving the data assets from one location to another location in the system using the scoring and simulation so as to minimize latency.
2. The method of claim 1 wherein the data processing system is maintained by a large scale enterprise, and wherein the data assets comprise Big Data-scale data sets, and wherein the data assets comprise databases, stacks of databases, file systems, and enterprise services, and wherein the data assets are accessed through a Hadoop layer storing open source software components to control storing, processing, and analyzing the data.
3. The method of claim 2 wherein the data assets are stored in storage devices organized into arrays, and the arrays are located in one or more data centers of a federation business data lake (FBDL) storage system.
4. The method of claim 3 wherein the moving step moves data assets from one disk to another disk within a storage device, or from one array to another array, or from one data center to another data center.
5. The method of claim 3 wherein the data retrieval system comprises a search engine processing the query from the target user, the search engine returning one or more data asset recommendations responsive to the query.
6. The method of claim 5 further comprising: monitoring and recording, by a monitoring component of the server, all interactions of a plurality of known users, including a first user and a target user, each interaction comprising an activity that triggers a read/write cycle to the storage; first deriving a similarity of each of the plurality of known users to the target user based on respective past and current data retrieval patterns of each of known users for data queried in the search engine; modeling a plurality of possible users for whom there are no known interactions with the plurality of known users or the data assets to constitute missing features; training respective a generative model using different random seeds for an inference engine through reconstructive self-supervised learning (SSL) techniques to generate possible values for the missing features; and providing the generated possible values to a consensus mechanism to generate an integrated recommendation to the target user.
7. The method of claim 6 wherein the partial features comprise past searches, user interactions with the database, user interactions with each other, user profiles, product/service characteristics, user access patterns between users and the FBDL, user profiles and interactions among users.
8. The method of claim 7 wherein the optimization method comprises a genetic algorithm (GA).
10. A method of processing queries input to a data retrieval system storing data assets for users in an enterprise, comprising: storing, in a federation business data lake (FBDL) storage maintained for a large-scale data processing system, data assets retrievable by a user in one or more possible locations along a scale of disks within a storage device, storage devices within arrays, and arrays within data centers; providing a search engine for entry of queries by a target user looking for data assets in the FBDL; generating search results and data asset recommendations in response to a target user query input to the search engine; providing the recommendations and past data asset access information for data assets as features to a time-series model for predicting future data access patterns by the target user; simulating an optimized placement of the data assets to minimize latency using an optimization method; and moving the data assets from one location to another location in the system using the scoring and simulation.
11. The method of claim 10 wherein the latency comprises a measure of time required to perform an operation.
12. The method of claim 11 further comprising determining and scoring an expected latency and load risk created by a weighted mean of these expected latency and load risk values.
13. The method of claim 11 wherein the data processing system is maintained by a large scale enterprise, and wherein the data assets comprise Big Data-scale data sets, and wherein the data assets comprise databases, stacks of databases, file systems, and enterprise services, and wherein the data assets are accessed through a Hadoop layer storing open source software components to control storing, processing, and analyzing the data.
14. The method of claim 13 wherein the moving step moves data assets from one disk to another disk within a storage device, or from one array to another array, or from one data center to another data center.
15. The method of claim 13 further comprising: monitoring and recording, by a monitoring component of the server, all interactions of a plurality of known users, including a first user and a target user, each interaction comprising an activity that triggers a read/write cycle to the storage; first deriving a similarity of each of the plurality of known users to the target user based on respective past and current data retrieval patterns of each of known users for data queried in the search engine; modeling a plurality of possible users for whom there are no known interactions with the plurality of known users or the data assets to constitute missing features; training respective a generative model using different random seeds for an inference engine through reconstructive self-supervised learning (SSL) techniques to generate possible values for the missing features; and providing the generated possible values to a consensus mechanism to generate an integrated recommendation to the target user.
16. The method of claim 15 wherein the partial features comprise past searches, user interactions with the database, user interactions with each other, user profiles, product/service characteristics, user access patterns between users and the FBDL, user profiles and interactions among users.
17. A system for optimizing placement of data assets in a data retrieval system storing data assets for users in an enterprise, comprising: a search engine generating search results and data asset recommendations in response to a target user query; a time-series modeling component receiving the recommendations and past data asset access information for data assets as features, and configured to predict future data access patterns by the target user; and a data asset placement engine receiving the future data asset access pattern prediction and determining and scoring an expected latency and load risk created by a weighted mean of these expected latency and load risk values, the data asset placement engine further simulating a placement optimization using an optimization method, and moving the data assets from one location to another location in the system using the scoring and simulation so as to minimize latency.
18. The system of claim 17 wherein the data processing system is maintaine

Assignees

Dell Products Lp

Inventors

Classifications

G06F16/285 · Clustering or classification
G06F16/256 · in federated or virtual databases · Primary
G06F11/3438 · monitoring of user actions (tracking the activity of the user H04L67/535)
G06F17/40 · Data acquisition and logging (for input to computer G06F3/00)
G06F11/3457 · Performance evaluation by simulation · Primary

Patent family

Related publications grouped by family.

View patent family 84103684

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2022374329A1 cover?: A search engine responding to a user query to find relevant data assets in a federation business data lake (FBDL) system based on interactions of known users interacting with data assets in the FBDL system. Data assets are optimally placed for minimal latency or maximal load. Data asset recommendations and past data asset access information are input as features to a time-series model for predi…
Who is the assignee on this patent?: Dell Products Lp

Related patents

Search and recommendation engine allowing recommendation-aware placement of data assets to minimize maximal load

Business data lake search engine

Data processing systems for data-transfer risk identification, cross-border visualization generation, and related methods
