Management of indexed data to improve content retrieval processing

US11544502B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11544502-B2
Application number: US-201916721652-A
Country: US
Kind code: B2
Filing date: Dec 19, 2019
Priority date: Dec 19, 2019
Publication date: Jan 3, 2023
Grant date: Jan 3, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present disclosure relates to processing operations configured to uniquely utilize indexing of content to improve content retrieval processing, particularly when working with large data sets. The techniques described herein enables efficient content retrieval when working with large data sets such as those that may be associated with a plurality of tenants of a data storage application/service. Among other technical advantages, the present disclosure is applicable to train a classifier using relevant samples based on text search in tenant-specific scenarios, where accurate searching can be executed for content associated with one or more tenant accounts of an application/service concurrently in milliseconds even in instances where there may be millions of documents to be searched. As an example, exemplary data shards may be generated and managed for efficient and scalable content retrieval processing including training of a classifier (e.g., artificial intelligence classifier) and real-time (or near real-time) query processing.
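
As a rough illustration of the tenant-scoped text search the abstract describes, the Python sketch below keeps a small inverted index per data shard in memory and answers a query by looking only at the shards of one tenant. The inverted-index layout and the names `build_shard_index` and `search_tenant` are assumptions made for illustration; the abstract does not specify how shards are structured or queried.

```python
from collections import defaultdict


def build_shard_index(documents):
    """Map each lowercase token to the ids of the documents in one shard that contain it.

    `documents` is a hypothetical {doc_id: text} mapping standing in for indexed file content.
    """
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return dict(index)


def search_tenant(shards_by_tenant, tenant, query):
    """Return ids of the tenant's documents that contain every term of the query."""
    terms = query.lower().split()
    hits = set()
    # Only the shards already held in memory for this tenant are consulted.
    for shard in shards_by_tenant.get(tenant, []):
        matches = [shard.get(term, set()) for term in terms]
        if matches:
            hits |= set.intersection(*matches)
    return hits


# Hypothetical tenants and documents, used only to exercise the sketch.
shards_by_tenant = {
    "contoso": [
        build_shard_index({"d1": "Quarterly budget report", "d2": "Budget forecast"}),
        build_shard_index({"d3": "Travel budget report"}),
    ],
    "fabrikam": [build_shard_index({"d9": "Budget report"})],
}
relevant_samples = search_tenant(shards_by_tenant, "contoso", "budget report")
```

In this sketch the matching document ids would be the "relevant samples" handed to classifier training; because each lookup is a dictionary access on data already in memory, the cost of a query grows with the number of shards rather than the number of documents.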

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: retrieving indexing of file content associated with a tenant of an application or service; generating a plurality of data shards usable for training of an artificial intelligence classifier, wherein each of the plurality of data shards comprises a plurality of indexes from the indexing of the file content that are representative of a randomized sampling of the file content associated with the tenant; generating a processing queue that groups generated data shards for processing during execution of rounds of training of the artificial intelligence classifier, wherein the processing queue prioritizes the plurality of data shards as a first grouping that is processed during a round of training of the artificial intelligence classifier; pre-loading, prior to executing of the training of the artificial intelligence classifier, the plurality of data shards into a memory of a computing device that is configured to execute the training of the artificial intelligence classifier, wherein the pre-loading propagates the plurality of data shards together as the first grouping for training of the artificial intelligence classifier; and reading, from the memory, the plurality of data shards during executing of the training of the artificial intelligence classifier.

2. A system comprising: at least one processor; and a memory, operatively connected with the at least one processor, storing computer-executable instructions that, when executed by the at least one processor, causes the at least one processor to execute a method that comprises: retrieving indexing of file content associated with a tenant of an application or service; generating a plurality of data shards usable for training of an artificial intelligence classifier, wherein each of the plurality of data shards comprises a plurality of indexes from the indexing of the file content that are representative of a randomized sampling of the file content associated with the tenant; generating a processing queue that groups generated data shards for processing during execution of rounds of training of the artificial intelligence classifier, wherein the processing queue prioritizes the plurality of data shards as a first grouping that is processed during a round of training of the artificial intelligence classifier; pre-loading, prior to executing of the training of the artificial intelligence classifier, the plurality of data shards into a memory of a computing device that is configured to execute the training of the artificial intelligence classifier, wherein the pre-loading propagates the plurality of data shards together as the first grouping for training of the artificial intelligence classifier; and reading, from the memory, the plurality of data shards during executing of the training of the artificial intelligence classifier.

3. A computer-readable memory device having stored thereon instructions that, upon execution by one or more processors, cause the one or more processors to: retrieve indexing of file content for each of a plurality of tenant accounts of an application or service; generate a plurality of data shards usable for training an artificial intelligence classifier, wherein each of the plurality of data shards comprises a plurality of indexes from the indexing of the file content for a specific tenant account of the plurality of tenant accounts, the plurality of indexes being representative of a randomized sampling of the file content of the specific tenant account; generate processing queue that groups generated data shards for processing during execution of rounds of training of the artificial intelligence classifier, wherein the processing queue prioritizes the plurality of data shards as a first grouping that is processed during a round of training of the artificial intelligence classifier; pre-loading, prior to executing of the training of the artificial intelligence classifier, the plurality of data shards into a memory of a computing device that is configured to execute the training of the artificial intelligence classifier, wherein the pre-loading propagates the plurality of data shards together as the first grouping for training of the artificial intelligence classifier; and reading, from the memory, the plurality of data shards during executing of the training of the artificial intelligence classifier.

4. The method of claim 1, wherein the generating of the plurality of data shards further comprises identifying a predetermined number of files for a size of each of the plurality of data shards, and randomly selecting, as the randomized sampling, indexes associated with the predetermined number of files from the file content.

5. The method of claim 1, wherein the generating of the plurality of data shards further comprises applying preset rules to create the randomized sampling of the file content, and wherein the preset rules comprise a first rule that identifies a predetermined number of files for a size of a data shard, and a second rule that randomizes file types of the file content represented in the randomized sampling of file content.

6. The method of claim 1, wherein the processing queue further comprises a second grouping of a plurality of data shards specific to a second tenant of the application or service, and wherein the pre-loading further comprises preloading the second grouping of the plurality of data shards into the memory for reading that occurs during a second round of the training of the artificial intelligence classifier.

7. The method of claim 1, wherein each of the plurality of data shards is specific to a single tenant of the application or service, and wherein the method further comprising: executing a round of training of the artificial intelligence classifier using the plurality of data shards that are specific to the single tenant.

8. The method of claim 1, wherein the pre-loading automatically occurs based on a detection of user access to an artificial intelligence processing application or service that is configured for the training of the artificial intelligence classifier.

9. The method of claim 1, wherein the pre-loading automatically occurs based on a detection of a search query, entered into a user interface of an artificial intelligence processing application or service, that is used for the training of the artificial intelligence classifier, and wherein the method further comprising: executing a round of training of the artificial intelligence classifier based on the search query and the plurality of data shards.

10. The system of claim 2, wherein the generating of the plurality of data shards further comprises identifying a predetermined number of files for a size of each of the plurality of data shards, and randomly selecting, as the randomized sampling, indexes associated with the predetermined number of files from the file content.

11. The system of claim 2, wherein the generating of the plurality of data shards further comprises applying preset rules to create the randomized sampling of the file content, and wherein the preset rules comprise a first rule that identifies a predetermined number of files for a size of a data shard, and a second rule that randomizes file types of the file content represented in the randomized sampling of file content.

12. The system of claim 5, wherein the processing queue further comprises a second grouping of a plurality of data shards specific to a second tenant of the application or service, and wherein the pre-loading further comprises preloading the second grouping of the plurality of data shards into the memory for reading that occurs during a second round of the training of the
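
The steps recited in claims 1, 4 and 6 (randomized sampling into fixed-size shards, a processing queue with one grouping per tenant, pre-loading into memory, and reading from memory during training rounds) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the `Shard` type, the function names, the use of a `deque` as the queue and a plain dictionary as the "memory" are all hypothetical choices made for illustration.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Shard:
    tenant: str
    index_ids: tuple  # index entries sampled from the tenant's file-content index


def make_shards(tenant, index_ids, files_per_shard, shard_count, seed=None):
    """Build shards that each hold a random sample of a predetermined number of index entries."""
    rng = random.Random(seed)
    size = min(files_per_shard, len(index_ids))
    return [Shard(tenant, tuple(rng.sample(index_ids, size))) for _ in range(shard_count)]


def build_queue(shards_by_tenant):
    """Queue one grouping of shards per tenant; the first grouping is processed first."""
    return deque(shards_by_tenant.items())


def preload(queue):
    """Copy every queued grouping into an in-memory cache before training starts."""
    return {tenant: list(shards) for tenant, shards in queue}


def training_rounds(queue, memory):
    """Yield (tenant, shards) once per round, reading the shards from memory only."""
    while queue:
        tenant, _ = queue.popleft()
        yield tenant, memory[tenant]


# Hypothetical per-tenant index ids, used only to exercise the sketch.
tenant_index = {
    "tenant_a": [f"a-{i}" for i in range(1000)],
    "tenant_b": [f"b-{i}" for i in range(400)],
}
shards = {
    tenant: make_shards(tenant, ids, files_per_shard=50, shard_count=4, seed=0)
    for tenant, ids in tenant_index.items()
}
queue = build_queue(shards)
memory = preload(queue)
for tenant, group in training_rounds(queue, memory):
    pass  # a real system would run one round of classifier training on `group` here
```

Sampling is done without replacement inside a shard, so a shard never repeats a file, while different shards of the same tenant may overlap. Whether the patent intends overlapping or disjoint shards is not stated in the claim text shown here.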

Assignees

Microsoft Technology Licensing LLC

Inventors

Classifications

  • Indexing structures · CPC title

  • Machine learning · CPC title

  • Management thereof · CPC title

  • Run-time optimisation · CPC title

  • Probabilistic graphical models, e.g. probabilistic networks · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11544502B2 cover?
The present disclosure relates to processing operations configured to uniquely utilize indexing of content to improve content retrieval processing, particularly when working with large data sets. The techniques described herein enables efficient content retrieval when working with large data sets such as those that may be associated with a plurality of tenants of a data storage application/serv…
Who is the assignee on this patent?
Microsoft Technology Licensing LLC
What technology area does this patent fall under?
Primary CPC classification G06F16/2228. Mapped technology areas include Physics.
When was this patent published?
Publication date Jan 3, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).