Managing user information - source prioritization
US-2015347690-A1 · Dec 3, 2015 · US
US9817860B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9817860-B2 |
| Application number | US-201113324202-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 13, 2011 |
| Priority date | Dec 13, 2011 |
| Publication date | Nov 14, 2017 |
| Grant date | Nov 14, 2017 |
Abstract
Methods of generating filters automatically from data processing jobs are described. In an embodiment, these filters are automatically generated from a compiled version of the data processing job using static analysis which is applied to a high-level representation of the job. The executable filter is arranged to suppress rows and/or columns within the data to which the job is applied and which do not affect the output of the job. The filters are generated by a filter generator and then stored and applied dynamically at a filtering proxy that may be co-located with the storage node that holds the data. In another embodiment, the filtered data may be cached close to a compute node which runs the job and data may be provided to the compute node from the local cache rather than from the filtering proxy.
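The filtering and caching path in the abstract can be pictured with a minimal in-memory sketch. Everything in it is an assumption for illustration, not the patented implementation. The class names `FilteringProxy` and `CachingProxy` are hypothetical. Keying stored filters by a job id is also a simplification. So is caching whole URIs, where claim 1 allows partial cache hits. A filter is modeled as a Python predicate that returns `True` for rows that may affect the job's output.

```python
from typing import Callable, Dict, List, Tuple

Row = str
RowFilter = Callable[[Row], bool]  # True if the row may influence the job's output


class FilteringProxy:
    """Hypothetical proxy co-located with the storage node that holds the data."""

    def __init__(self, storage: Dict[str, List[Row]]) -> None:
        self.storage = storage                    # URI -> rows held by the storage node
        self.filters: Dict[str, RowFilter] = {}   # job id -> stored executable filter

    def store_filter(self, job_id: str, row_filter: RowFilter) -> None:
        self.filters[job_id] = row_filter

    def fetch(self, job_id: str, uri: str) -> List[Row]:
        # Read from storage and apply the job's filter dynamically, so that
        # suppressed rows are never returned to the caller.
        row_filter = self.filters[job_id]
        return [row for row in self.storage[uri] if row_filter(row)]


class CachingProxy:
    """Hypothetical cache of filtered data kept close to the compute node."""

    def __init__(self, upstream: FilteringProxy) -> None:
        self.upstream = upstream
        self.cache: Dict[Tuple[str, str], List[Row]] = {}
        self.misses = 0

    def fetch(self, job_id: str, uri: str) -> List[Row]:
        key = (job_id, uri)
        if key not in self.cache:      # not cached: request it from the filtering proxy
            self.misses += 1
            self.cache[key] = self.upstream.fetch(job_id, uri)
        return self.cache[key]         # serve filtered data from the local cache


proxy = FilteringProxy({"part-0": ["GET /a 200", "POST /b 500", "GET /c 404"]})
proxy.store_filter("job-1", lambda row: row.startswith("GET"))
cache = CachingProxy(proxy)
print(cache.fetch("job-1", "part-0"))  # ['GET /a 200', 'GET /c 404']
```

In this sketch, suppressed rows are dropped at the proxy beside storage. A repeated request for the same job and URI is answered from the cache without contacting the filtering proxy again.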
Claims (preview)
The invention claimed is:
1. A method comprising: receiving, at a filter generator, a compiled input data processing job; creating a high-level representation of the compiled input data processing job; applying static analysis to the high-level representation to automatically generate an executable filter, the executable filter being arranged, when applied to data, to selectively suppress data elements which do not influence an output of the data processing job; correctness of the executable filter being ensured through enforcement of single-record correctness, globally-stateless mappers, and isolation; outputting the executable filter to a filtering proxy; and at least one of: modifying a uniform resource identifier within the compiled input data processing job to point to the filtering proxy, embedding a filter specification in each uniform resource identifier, and outputting the modified compiled input data processing job; applying static analysis to the high-level representation to automatically generate an executable filter further comprising at least one of: identifying a set of output instructions within the high-level representation, identifying, for each program point in the high-level representation, those instructions in the high-level representation that can affect values of variables at that program point, generating a set of control flow instructions from the high-level representation for each output instruction, the set of control flow instructions comprising those control flow instructions that may lead to the output instruction, and generating an executable filter comprising code corresponding to the sets of control flow instructions and those instructions that can affect values of variables in each control flow instruction with output labels replaced with a return true statement; identifying a set of output labels within the high-level representation, collecting a set of control flow labels that form part of an execution path leading to an output label, computing, for each point in the high-level representation, a map from a variable to labels selected from the set of labels which may influence a value of the variable at that point, identifying, for each control flow label in the set of control flow labels and using the map, at least one relevant set of control flow labels comprising those labels that can affect the value of a variable used in the control flow label to perform a jump, and generating an executable filter comprising code corresponding to the one or more relevant sets of control flow labels by replacing output labels with a return true statement and by inserting a return false statement at all other exits from the control flow graph; or assigning a state to each program point within the high-level representation, the state capturing a representation, at that program point, of an input to the data processing job as either a string or a sequence of tokens, identifying program points within the high-level representation that de-reference an input token used subsequently in the data processing job, at each identified program point, determining a constraint associated with at least one token used from a computed state at that program point, and generating an executable filter comprising code which iterates over all sequences of tokens and emits any input tokens that are dereferenced in the high-level representation; ensuring that, if a mapper class field is read in an execution path of the globally stateless mapper, at least one of, the mapper class field is set earlier in the execution path than the filter, or the mapper class field is not updated in any execution path of the globally stateless mapper; or receiving and storing the executable filter at the filtering proxy, intercepting, at a caching proxy, a request for data from a compute node, performing a comparison at the caching proxy, to determine if the requested data is available in a local cache, if at least a portion of the requested data is available in the local cache, providing cached filtered data to the compute node, if at least a portion of the requested data is not available in the local cache, sending a request for that unavailable data to the filtering proxy, and in response to receiving the request from the caching proxy at the filtering proxy, accessing the requested data from a storage node, dynamically applying the executable filter to the data to generate filtered data; and providing the filtered data to the compute node.
2. The method according to claim 1, further comprising: modifying the compiled input data processing job to reference the filtering proxy; and outputting the modified compiled input data processing job.
3. The method according to claim 2, wherein modifying the compiled input data processing job comprises: modifying a uniform resource identifier within the compiled input data processing job to point to the filtering proxy; and embedding a filter specification in each uniform resource identifier.
4. The method according to claim 1, wherein the data elements comprise rows within the data.
5. The method according to claim 4, wherein applying static analysis to the high-level representation to automatically generate an executable filter comprises: identifying a set of output instructions within the high-level representation; identifying, for each program point in the high-level representation, those instructions in the high-level representation that can affect values of variables at that program point; generating a set of control flow instructions from the high-level representation for each output instruction, the set of control flow instructions comprising those control flow instructions that may lead to the output instruction; and generating an executable filter comprising code corresponding to the sets of control flow instructions and those instructions that can affect values of variables in each control flow instruction with output labels replaced with a return true statement.
6. The method according to claim 4, wherein applying static analysis to the high-level representation to automatically generate an executable filter comprises: identifying a set of output labels within the high-level representation; collecting a set of control flow labels that form part of an execution path leading to an output label; computing, for each point in the high-level representation, a map from a variable to labels selected from the set of labels which may influence a value of the variable at that point; identifying, for each control flow label in the set of control flow labels and using the map, at least one relevant set of control flow labels comprising those labels that can affect the value of a variable used in the control flow label to perform a jump; and generating an executable filter comprising code corresponding to the one or more relevant sets of control flow labels by replacing output labels with a return true statement and by inserting a return false statement at all other exits from the control flow graph.
7. The method according to claim 1, wherein the data elements comprise column entries within a row of data.
8. The method according to claim 7, wherein applying static analysis to the high-level representation to automatically generate an executable filter comprises: assigning a state to each program point within the high-level representation, the state capturing a representation, at that program point, of an input to the data processing job as either a string or a sequence of tokens; identifying program points within the high-level representation that de-reference an input token used subsequently in the data processing job; at each identified program point, determining a constraint associated with at least …
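The row-filter construction of claim 6 is essentially a backward slice toward the output labels. It can be illustrated on a toy program, provided several loud assumptions are accepted. The `Instr` instruction list and its `assign`/`branch`/`output`/`exit` kinds are hypothetical stand-ins for the "high-level representation". So are all function names. The variable-to-label map is computed flow-insensitively, which is a conservative simplification of the per-program-point map the claim describes. Finally, the generated filter interprets the sliced program rather than emitting new code. Output labels act as "return true". Any label from which no output is reachable acts as "return false".

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set

Env = Dict[str, object]


@dataclass
class Instr:
    kind: str                                     # "assign", "branch", "output" or "exit"
    target: Optional[str] = None                  # variable written by an "assign"
    fn: Optional[Callable[[Env], object]] = None  # computes the value or branch condition
    uses: Set[str] = field(default_factory=set)   # variables read by fn
    jump_true: Optional[int] = None               # successor labels of a "branch"
    jump_false: Optional[int] = None


def successors(prog: List[Instr], i: int) -> List[int]:
    ins = prog[i]
    if ins.kind == "branch":
        return [ins.jump_true, ins.jump_false]
    if ins.kind == "exit":
        return []
    return [i + 1]  # "assign" and "output" fall through to the next label


def can_reach_output(prog: List[Instr]) -> Set[int]:
    # Backward fixpoint: a label reaches an output if it is one or a successor does.
    reach = {i for i, ins in enumerate(prog) if ins.kind == "output"}
    changed = True
    while changed:
        changed = False
        for i in range(len(prog)):
            if i not in reach and any(s in reach for s in successors(prog, i)):
                reach.add(i)
                changed = True
    return reach


def dependency_map(prog: List[Instr]) -> Dict[str, Set[int]]:
    # Flow-insensitive map: variable -> every "assign" label that may influence it.
    deps: Dict[str, Set[int]] = {}
    changed = True
    while changed:
        changed = False
        for i, ins in enumerate(prog):
            if ins.kind != "assign":
                continue
            new = {i}.union(*(deps.get(u, set()) for u in ins.uses))
            old = deps.setdefault(ins.target, set())
            if not new <= old:
                old |= new
                changed = True
    return deps


def generate_filter(prog: List[Instr]) -> Callable[[str], bool]:
    reach = can_reach_output(prog)
    deps = dependency_map(prog)
    branches = {i for i in reach if prog[i].kind == "branch"}
    relevant = set(branches)                 # labels the filter must still execute
    for b in branches:
        for var in prog[b].uses:
            relevant |= deps.get(var, set())

    def row_filter(row: str) -> bool:
        env: Env = {"row": row}              # the input record
        pc = 0
        while True:
            if pc not in reach:
                return False                 # no output reachable: "return false" exit
            ins = prog[pc]
            if ins.kind == "output":
                return True                  # output label replaced by "return true"
            if ins.kind == "branch":
                pc = ins.jump_true if ins.fn(env) else ins.jump_false
                continue
            if pc in relevant:               # assignments outside the slice are skipped
                env[ins.target] = ins.fn(env)
            pc += 1

    return row_filter


# Toy mapper: emit the lower-cased URL of every log line whose status is "500".
prog = [
    Instr("assign", "fields", lambda e: e["row"].split(), {"row"}),
    Instr("assign", "status", lambda e: e["fields"][2], {"fields"}),
    Instr("assign", "url", lambda e: e["fields"][1].lower(), {"fields"}),
    Instr("branch", fn=lambda e: e["status"] == "500", uses={"status"},
          jump_true=4, jump_false=5),
    Instr("output", uses={"url"}),
    Instr("exit"),
]

keep = generate_filter(prog)
print(keep("GET /a 500"))  # True
print(keep("GET /b 200"))  # False
```

Here the `url` assignment feeds only the emitted value, not the branch. The filter therefore skips it and evaluates just enough of the mapper to decide whether a row can reach the output.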
Querying · CPC title
Interprogram communication · CPC title
Intercept · CPC title
Physics · mapped topic