Bloom filter utilization for join processing

US10248694B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-10248694-B2
Application numberUS-201514840797-A
CountryUS
Kind codeB2
Filing dateAug 31, 2015
Priority dateAug 31, 2015
Publication dateApr 2, 2019
Grant dateApr 2, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer-implemented method includes inserting a bloom filter creation stage after an inner data source identification stage, wherein a join operation is to be performed to join an outer data source with the inner data source. The method inserts a bloom filter search stage after an outer data source identification stage, wherein each row of data from the outer data source is searched against a bloom filter for the inner data source during the bloom filter search stage. The method initializes a read on the inner data source. Subsequent to determining the bloom filter creation stage is complete, the method initializes a read on the outer data source. The method performs the join operation at a join stage.

First claim

Opening claim text (preview).

What is claimed is: 1. A computer program product comprising: one or more computer readable tangible storage media and program instructions stored on at least one of the one or more storage media, the program instructions comprising: program instructions to insert a bloom filter creation stage after an inner data source identification stage, wherein a join operation is to be performed to join an outer data source with the inner data source; program instructions to, responsive to determining a plurality of optimization conditions are satisfied, wherein the plurality optimization conditions at least specify that the join operation does not have any additive stages between the inner data source and a join stage, determine to insert a first bloom filter search stage after an outer data source identification stage; program instructions to insert the first bloom filter search stage after the outer data source identification stage, wherein each row of data from the outer data source is searched against a first bloom filter for the inner data source during the first bloom filter search stage; program instructions to initialize a read on the inner data source, wherein a plurality of rows from the inner data source pass through to the join stage; program instructions to, responsive to determining the bloom filter creation stage is complete, initialize a read on the outer data source against the first bloom filter, wherein the plurality of rows from the inner data source pass through to the bloom filter creation stage; and program instructions to perform the join operation of the inner data source and the outer data source at the join stage. 2. The computer program product of claim 1 , further comprising program instructions, stored on the one or more computer readable storage media, which when executed by a processor, cause the processor to: create an inner data set based at least on the inner data source and the first bloom filter. 3. The computer program product of claim 1 , further comprising program instructions, stored on the one or more computer readable storage media, which when executed by a processor, cause the processor to: generate a second bloom filter based on reference data during a dataset creation job; insert the second bloom filter after a data source during the dataset creation job; and update a path to the second bloom filter in an inner dataset header for the inner data source. 4. The computer program product of claim 3 , further comprising program instructions, stored on the one or more computer readable storage media, which when executed by a processor, cause the processor to: subsequent to determining the plurality optimization conditions are satisfied, determine to insert a second bloom filter search stage after the outer data source identification stage. 5. The computer program product of claim 3 , further comprising program instructions, stored on the one or more computer readable storage media, which when executed by a processor, cause the processor to: subsequent to determining the plurality optimization conditions are not satisfied, determine to perform the join operation between the inner data source and the outer data source at the join stage without the second bloom filter. 6. The computer program product of claim 1 , wherein the plurality optimization conditions are further selected from a group consisting of: join keys pass through unmodified from the inner data source to the join, join keys pass through unmodified from the outer data source to the join, and a cost of a specific join operation without a bloom filter is greater than a processing cost of the specific join operation with the bloom filter. 7. The computer program product of claim 1 , further comprising program instructions, stored on the one or more computer readable storage media, which when executed by a processor, cause the processor to: send the plurality of rows from the inner data source to the join stage, wherein the one or more rows are not searched against the first bloom filter; responsive to determining the bloom filter creation stage is complete, notify the first bloom filter search stage that the bloom filter creation stage is complete; and load the first bloom filter at the first bloom filter search stage. 8. A computer system comprising: one or more computer processors; one or more computer readable storage media; and program instructions stored on the computer readable storage media for execution by at least one of the one or more computer processors, the program instructions comprising: program instructions to insert a bloom filter creation stage after an inner data source identification stage, wherein a join operation is to be performed to join an outer data source with the inner data source; program instructions to, responsive to determining a plurality of optimization conditions are satisfied, wherein the plurality optimization conditions at least specify that the join operation does not have any additive stages between the inner data source and a join stage, determine to insert a first bloom filter search stage after an outer data source identification stage; program instructions to insert the first bloom filter search stage after the outer data source identification stage, wherein each row of data from the outer data source is searched against a first bloom filter for the inner data source during the first bloom filter search stage; program instructions to initialize a read on the inner data source, wherein a plurality of rows from the inner data source pass through to the join stage; program instructions to, responsive to determining the bloom filter creation stage is complete, initialize a read on the outer data source against the first bloom filter, wherein the plurality of rows from the inner data source pass through to the bloom filter creation stage; and program instructions to perform the join operation of the inner data source and the outer data source at the join stage. 9. The computer system of claim 8 , further comprising program instructions, stored on the one or more computer readable storage media, which when executed by a processor, cause the processor to: create an inner data set based at least on the inner data source and the first bloom filter. 10. The computer system of claim 8 , further comprising program instructions, stored on the one or more computer readable storage media, which when executed by a processor, cause the processor to: generate a second bloom filter based on reference data during a dataset creation job; insert the second bloom filter after a data source during the dataset creation job; and update a path to the second bloom filter in an inner dataset header for the inner data source. 11. The computer system of claim 10 , further comprising program instructions, stored on the one or more computer readable storage media, which when executed by a processor, cause the processor to: subsequent to determining the plurality optimization conditions are satisfied, determine to insert a second bloom filter search stage after the outer data source identification stage. 12. The computer system of claim 10 , further comprising program instructions, stored on the one or more computer readable storage media, which when executed by a processor, cause the processor to: subsequent to determining the plurality optimization conditions are not satisfied, determine to perform the join operation between the inner data source and the outer data source at the join stage without the second bloom filter. 13. The computer system of claim 8 , wherein the plurality optimization conditions are further se

Assignees

Inventors

Classifications

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10248694B2 cover?
A computer-implemented method includes inserting a bloom filter creation stage after an inner data source identification stage, wherein a join operation is to be performed to join an outer data source with the inner data source. The method inserts a bloom filter search stage after an outer data source identification stage, wherein each row of data from the outer data source is searched against …
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F16/2456. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Apr 02 2019 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).