External data access with split index

US9715515B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9715515-B2
Application number  US-201414170493-A
Country             US
Kind code           B2
Filing date         Jan 31, 2014
Priority date       Jan 31, 2014
Publication date    Jul 25, 2017
Grant date          Jul 25, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A split-index can be employed for access to external data. The index can be created on a primary data storage system for data stored externally on a secondary data storage system. After creation, the index can be utilized to expedite at least query execution over the externally stored data. The index can be updated upon detection of changes to data. Further, even when the index is not completely up to date, the index can be exploited for query execution. Furthermore, hybrid execution is enabled with the index and without the index.
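
The hybrid execution described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `ExternalStore`, `SplitIndex`, and `query_equals` names, the per-file version counter used to detect staleness, and the equality-only filter are all assumptions made for illustration. The index answers the query for files it is current on, and files that changed after indexing are scanned directly.

```python
from dataclasses import dataclass, field


@dataclass
class ExternalStore:
    """Hypothetical stand-in for the secondary (external) storage system."""
    files: dict = field(default_factory=dict)     # file name -> list of row dicts
    versions: dict = field(default_factory=dict)  # file name -> change counter

    def write(self, name, rows):
        # Every write bumps the version so staleness can be detected later.
        self.files[name] = list(rows)
        self.versions[name] = self.versions.get(name, 0) + 1


@dataclass
class SplitIndex:
    """Hypothetical index held on the primary system for one column."""
    column: str
    entries: dict = field(default_factory=dict)           # value -> [(file, offset)]
    indexed_versions: dict = field(default_factory=dict)  # file -> version indexed

    def build(self, store):
        self.entries.clear()
        self.indexed_versions.clear()
        for name, rows in store.files.items():
            for offset, row in enumerate(rows):
                self.entries.setdefault(row[self.column], []).append((name, offset))
            self.indexed_versions[name] = store.versions[name]

    def stale_files(self, store):
        # A file is stale if it is new or was rewritten after indexing.
        return {n for n, v in store.versions.items()
                if self.indexed_versions.get(n) != v}


def query_equals(store, index, value):
    """Return rows whose indexed column equals `value`, using hybrid execution."""
    stale = index.stale_files(store)
    # Part 1: index lookup, restricted to files the index is current on.
    from_index = [store.files[f][off]
                  for f, off in index.entries.get(value, [])
                  if f not in stale]
    # Part 2: direct scan of the files that changed since indexing.
    from_scan = [row for f in sorted(stale)
                 for row in store.files[f]
                 if row[index.column] == value]
    return from_index + from_scan
```

This sketch assumes files are only added or rewritten, never deleted; handling deletions would need an extra check before dereferencing index entries.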

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: creating, by a processor, an index locally in a relational data storage system for data stored external to the relational data storage system in a non-relational data storage system; receiving a query, by the relational data storage system, for at least a portion of the data in the non-relational data storage system; determining that the index is current with respect to an indexed portion of the data stored external to the relational data storage system, and stale with respect to an updated portion of the data stored external to the relational data storage system; identifying, as a first selected data subset, data of the indexed portion that satisfies the query based on the index; scanning the updated portion of the data stored external to the relational data storage system to identify a second selected data subset matching the query; and returning the first selected data subset and the second selected data subset in response to the query.

2. The method of claim 1, further comprising: restricting identifying the data satisfying the query with the index to a query with selectivity according to a probability that a segment of the data satisfies a filter expression of a query.

3. The method of claim 1, further comprising: updating the index to reflect changes to the data stored externally in the non-relational data storage system.

4. The method of claim 1, further comprising: determining one or more updates to the index based on one or more changes to the data stored externally in the non-relational data storage system concurrently with query execution.

5. The method of claim 4, further comprising: committing the updates to the index during a temporary pause in processing activity.

6. The method of claim 1, further comprising: processing at least a portion of the query over the data in the non-relational data storage system without the index.

7. The method of claim 1, further comprising: executing a portion of the query over relational data stored in the relational data storage system in combination with data satisfying the query from the non-relational data storage system.

8. A system, comprising: a processor; and a memory storing instructions, wherein execution of the instructions by the processor causes a device to: create an index locally in a relational data storage system for data stored external to the relational data storage system in a non-relational data storage system; receive a query, by the relational data storage system, for at least a portion of the data in the non-relational data storage system; determine that the index is current with respect to an indexed portion of the data stored external to the relational data storage system, and stale with respect to an updated portion of the data stored external to the relational data storage system; identify, as a first selected data subset, data of the indexed portion that satisfies the query based on the index; scanning the updated portion of the data stored external to the relational data storage system to identify a second selected data subset matching the query; and return the first selected data subset and the second selected data subset in response to the query.

9. The system of claim 8, wherein the relational data storage system is a relational data warehouse system.

10. The system of claim 9, wherein the non-relational data storage system is a non-relational distributed file system.

11. The system of claim 8, wherein execution of the instructions further causes the device to execute the query over the data stored externally without the index for data that changed since the index was created.

12. The system of claim 8, wherein execution of the instructions further causes the device to control execution of a query over the data externally stored with the index based on a signal in the query that indicates that the index is to be utilized.

13. The system of claim 8, wherein execution of the instructions further causes the device to update the index in view of changes to the data.

14. The system of claim 13, wherein execution of the instructions further causes the device to perform an incremental update.

15. The system of claim 8, wherein execution of the instructions by the processor further causes the device to control execution of a query over the data externally stored with the index based on selectivity according to a probability that a segment of the data satisfies a filter expression of a query.

16. A computer-readable storage medium having instructions stored thereon, wherein execution of the instructions by a processor of a device causes the device to: create an index locally in a relational data warehouse system for data stored external to the relational data warehouse system in a non-relational distributed file system; receive a query, by the relational data warehouse system, for at least a portion of the data stored in the non-relational distributed file system; determine that the index is current with respect to an indexed portion of the data stored external to the relational data warehouse, and stale with respect to an updated portion of the data stored external to the relational data warehouse; identify, as a first selected data subset, data of the indexed portion that satisfies the query based on the index; scan the updated portion of the data stored external to the relational data storage system to identify a second selected data subset matching the query; return the first selected data subset and the second selected data subset in response to the query.

17. The computer-readable storage medium of claim 16, wherein execution of the instructions by the processor further causes the device to acquire a portion of the data from the non-relational distributed file system for the query without the index.

18. The computer-readable storage medium of claim 16, wherein execution of the instructions by the processor further causes the device to incrementally update the index in response to one or more changes to the data.

19. The computer-readable storage medium of claim 16, wherein execution of the instructions by the processor further causes the device to evaluating a portion of the query over relational data stored in the relational data warehouse system in conjunction with the data satisfying the query from the non-relational distributed file system.
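
Two of the dependent features can be sketched in Python under stated assumptions: gating index use on selectivity or on an explicit signal in the query (claims 2, 12, 15), and staging index updates during query execution and committing them during a pause (claims 4, 5). The function and class names, the 0.1 threshold, and the use of an active-query count to model a "temporary pause in processing activity" are hypothetical choices, not taken from the patent.

```python
from typing import Optional


def use_index(estimated_selectivity: float,
              threshold: float = 0.1,
              hint: Optional[bool] = None) -> bool:
    """Decide between an index lookup and a scan of the external data.

    estimated_selectivity: assumed probability that a data segment
    satisfies the query's filter expression.
    hint: optional explicit signal carried by the query itself.
    """
    if hint is not None:
        # An explicit signal in the query overrides the estimate.
        return hint
    # Assumption: the index pays off when few segments are expected to match.
    return estimated_selectivity <= threshold


class IncrementalIndex:
    """Hypothetical index with staged updates and a deferred commit."""

    def __init__(self):
        self.committed = {}  # value -> set of (file, offset) locations
        self.pending = []    # staged ("add" | "remove", value, location) tuples

    def stage(self, op, value, location):
        # Updates can be determined while queries are still running.
        self.pending.append((op, value, location))

    def commit_if_idle(self, active_queries: int) -> bool:
        # Apply staged updates only when no query is being processed.
        if active_queries > 0 or not self.pending:
            return False
        for op, value, location in self.pending:
            bucket = self.committed.setdefault(value, set())
            if op == "add":
                bucket.add(location)
            else:
                bucket.discard(location)
        self.pending.clear()
        return True

    def lookup(self, value):
        # Queries see only committed entries.
        return sorted(self.committed.get(value, set()))
```

Keeping staged updates invisible to `lookup` until commit means a running query always sees one consistent version of the index; data covered only by pending updates would still need the scan path described in claim 1.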

Assignees

Microsoft Corp, Microsoft Technology Licensing LLC

Inventors

Classifications

  • Physics · mapped topic

  • Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS] · CPC title

  • Management thereof · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9715515B2 cover?
A split-index can be employed for access to external data. The index can be created on a primary data storage system for data stored externally on a secondary data storage system. After creation, the index can be utilized to expedite at least query execution over the externally stored data. The index can be updated upon detection of changes to data. Further, even when the index is not completel…
Who is the assignee on this patent?
Microsoft Corp, Microsoft Technology Licensing LLC
What technology area does this patent fall under?
Primary CPC classification G06F17/30336. Mapped technology areas include Physics.
When was this patent published?
Publication date Jul 25, 2017 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).