Data ingest optimization

US9589065B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9589065-B2
Application number: US-201213604096-A
Country: US
Kind code: B2
Filing date: Sep 5, 2012
Priority date: Jan 28, 2011
Publication date: Mar 7, 2017
Grant date: Mar 7, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods and systems for optimizing the retrieval of data from multiple sources are described. A slot map including slots for the storage of data elements can be obtained. The data elements associated with the slots can be prioritized by weighting values with costs of retrieving the data elements from respective data sources. Each value can be associated with a different data element and can indicate a respective degree of importance of the associated data element. Further, the systems and methods can direct the retrieval of data elements from the respective data sources in an order in accordance with the priority of the data elements to optimize the quality of data obtainable within a critical time constraint. In addition, the retrieved data elements can be stored in corresponding slots on a storage medium.
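The prioritization step in the abstract can be illustrated with a small sketch. The abstract does not give a formula, so the weighting used here (importance times probability of success, divided by cost) is an assumption chosen for illustration, as are the `Slot` fields and the sample values; the success probability comes from the first claim rather than the abstract.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Slot:
    name: str         # hypothetical slot identifier
    quality: float    # value indicating the importance of the data element
    cost: float       # assumed cost of retrieving it (e.g. seconds)
    p_success: float  # assumed probability of retrieving valid data


def score(slot: Slot) -> float:
    # Assumed weighting, not taken from the patent:
    # expected importance gained per unit of retrieval cost.
    return slot.quality * slot.p_success / slot.cost


def build_priority_queue(slots):
    # heapq is a min-heap, so the score is negated to pop the best slot first.
    # The index breaks ties so Slot objects are never compared directly.
    heap = [(-score(s), i, s) for i, s in enumerate(slots)]
    heapq.heapify(heap)
    return heap


def drain(heap):
    # Pop every entry and return slot names in retrieval order.
    order = []
    while heap:
        _, _, slot = heapq.heappop(heap)
        order.append(slot.name)
    return order


slots = [
    Slot("profile", quality=8.0, cost=2.0, p_success=0.5),  # score 2.0
    Slot("news", quality=3.0, cost=1.0, p_success=1.0),     # score 3.0
    Slot("archive", quality=9.0, cost=9.0, p_success=1.0),  # score 1.0
]
order = drain(build_priority_queue(slots))
print(order)
```

Under this assumed weighting, a cheap and reliable element can outrank a more important one that is expensive or unlikely to be retrieved successfully.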

First claim

Opening claim text (preview).

What is claimed is:

1. A computer readable storage medium comprising a non-transitory computer readable program code, wherein the computer readable program code when executed on a computer causes the computer to: obtain a slot map including slots for the storage of data segments in a persistent storage device, the slot map including quality tag values associated with each of the data segments applied to each slot in the slot map; prioritize the data segments associated with the slots by weighting the quality tag values, each of which is associated with a different slot and data segment and indicates a respective degree of importance of the associated data segment, with costs of retrieving the data segments from respective data sources and probabilities of successfully retrieving valid data segments from each of the respective data sources at one or more particular future times, and output a priority queue of the data segments; populate the slot map with retrieved data segments and output the slot map; and direct a retrieval of the data segments from the respective data sources in an order in accordance with a determined priority of the data segments to optimize a quality of data obtainable within a critical time constraint.

2. The computer readable storage medium of claim 1, wherein the data segments provide material for analysis of a subject and wherein each value indicates a respective degree of importance of a corresponding data segment in the analysis.

3. The computer readable storage medium of claim 1, wherein each value is based upon an expectation of success of retrieving the data segment associated with the value from a corresponding data source.

4. The computer readable storage medium of claim 1, wherein each value is based upon an expected resource expenditure of retrieving the data segment associated with the value from a corresponding data source.

5. The computer readable storage medium of claim 1, wherein the retrieval is constrained by at least one of a resource budget or a hard-stop end time.

6. The computer readable storage medium of claim 1, wherein the retrieval comprises adding additional slots to the slot map and repeating the prioritize step for the additional slots.

7. A system for optimizing the retrieval of data from multiple sources comprising: a slot map generator configured to generate a slot map including slots for the storage of data segments in a persistent storage device, the slot map further including quality tag values associated with each of the data segments applied to each slot in the slot map; a priority module configured to prioritize data segments associated with the slots by weighting the quality tag values, each of which is associated with a different slot and data segment and indicates a respective degree of importance of the associated data segment, with costs of retrieving the data segments from respective data sources and probabilities of successfully retrieving valid data segments from each of the respective data sources at one or more particular future times; and a processor configured to direct a retrieval of the data segments from the respective data sources in an order in accordance with a determined priority of the data segments to optimize a quality of data obtainable within a critical resource constraint, the processor being further configured to output a priority queue of the data segments, populate the slot map with retrieved data segments, and output the slot map.

8. The system of claim 7, wherein the data segments provide material for analysis of a subject and wherein each value indicates a respective degree of importance of a corresponding data segment in the analysis.

9. The system of claim 7, wherein the priority module is further configured to base each value upon an expectation of success of retrieving the data segment associated with the value from a corresponding data source.

10. The system of claim 7, wherein the priority module is further configured to base each value upon an expected resource expenditure of retrieving the data segment associated with the value from a corresponding data source.

11. The system of claim 7, wherein the critical resource constraint is at least one of a resource budget or a critical time constraint.

12. The system of claim 7, wherein the processor is further configured to add additional slots to the slot map and to repeat the prioritizing for the additional slots.
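As a rough illustration of how the claimed steps might fit together (obtain a slot map, prioritize, retrieve in priority order under a resource constraint, populate the slots), here is a minimal sketch. It is not the patented implementation: the `Segment` type, the scoring formula, the simulated cost budget, and the stand-in `fetch` callables are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Segment:
    slot: str                           # hypothetical slot name
    quality: float                      # quality tag value (importance)
    cost: float                         # assumed retrieval cost in budget units
    p_success: float                    # assumed probability of valid retrieval
    fetch: Callable[[], Optional[str]]  # stand-in for a data source call


def ingest(
    segments: List[Segment], budget: float
) -> Tuple[Dict[str, Optional[str]], List[str]]:
    # Obtain a slot map with one empty slot per data segment.
    slot_map: Dict[str, Optional[str]] = {seg.slot: None for seg in segments}

    # Prioritize: assumed weighting of quality by success probability and cost.
    queue = sorted(
        segments,
        key=lambda s: s.quality * s.p_success / s.cost,
        reverse=True,
    )

    # Retrieve in priority order, skipping anything that exceeds the budget.
    spent = 0.0
    for seg in queue:
        if spent + seg.cost > budget:
            continue
        spent += seg.cost
        slot_map[seg.slot] = seg.fetch()  # None models a failed retrieval

    return slot_map, [seg.slot for seg in queue]


segments = [
    Segment("A", quality=10.0, cost=4.0, p_success=0.9, fetch=lambda: "a-data"),
    Segment("B", quality=6.0, cost=1.0, p_success=0.5, fetch=lambda: "b-data"),
    Segment("C", quality=5.0, cost=5.0, p_success=1.0, fetch=lambda: "c-data"),
]
slot_map, priority_order = ingest(segments, budget=6.0)
print(priority_order)
print(slot_map)
```

With a budget of 6 units, segments B and A are retrieved (total cost 5) and slot C stays empty because retrieving it would exceed the budget. A wall-clock deadline, as in the hard-stop end time of claim 5, could replace the simulated budget in the same loop.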

Assignees

Inventors

Classifications

  • G06F16/951 · Primary

    Indexing; Web crawling techniques · CPC title

  • by using parallel associative memories or content-addressable memories · CPC title

  • Physics · mapped topic

  • G06F16/957 · Primary

    Browsing optimisation, e.g. caching or content distillation · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9589065B2 cover?
Methods and systems for optimizing the retrieval of data from multiple sources are described. A slot map including slots for the storage of data elements can be obtained. The data elements associated with the slots can be prioritized by weighting values with costs of retrieving the data elements from respective data sources. Each value can be associated with a different data element and can ind…
Who is the assignee on this patent?
Bhagwan Varun, Grandison Tyrone W A, Gruhl Daniel F, and 1 more
What technology area does this patent fall under?
Primary CPC classification G06F16/951. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 7, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).