Generic Indexing for Efficiently Supporting Ad-Hoc Query Over Hierarchically Marked-Up Data
US-2015134670-A1 · May 14, 2015 · US
US2017124166A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2017124166-A1 |
| Application number | US-201615336961-A |
| Country | US |
| Kind code | A1 |
| Filing date | Oct 28, 2016 |
| Priority date | Oct 29, 2015 |
| Publication date | May 4, 2017 |
| Grant date | — |
Official abstract text for this publication.
Improved computer technology is disclosed for enabling high performance stream processing on data such as complex, hierarchical data. In an example embodiment, a dynamic field schema specifies a dynamic field format for expressing the incoming data. An incoming data stream is then translated according to the dynamic field schema into an outgoing data stream in the dynamic field format. Stream processing, including field-specific stream processing, can then be performed on the outgoing data stream.
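The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patent's implementation: the `Token` class, the `translate` and `mask_field` functions, and the comma-delimited input are all hypothetical, and the sketch handles only flat records rather than the nested hierarchical data the disclosure targets. It shows an incoming stream being translated into tokens that carry field metadata, followed by a field-specific operation that selects its target from that metadata alone.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Token:
    """Hypothetical token: metadata (field_id, record flags) plus payload data."""
    field_id: int
    data: str
    start_of_record: bool = False
    end_of_record: bool = False


def translate(lines: Iterable[str], delimiter: str = ",") -> Iterator[Token]:
    """Translate delimited records into a token stream, one token per field."""
    for line in lines:
        values = line.rstrip("\n").split(delimiter)
        last = len(values) - 1
        for i, value in enumerate(values):
            yield Token(i, value, start_of_record=(i == 0), end_of_record=(i == last))


def mask_field(tokens: Iterable[Token], field_id: int) -> Iterator[Token]:
    """Mask one field, chosen by its metadata; other tokens pass through untouched."""
    for tok in tokens:
        if tok.field_id == field_id:
            yield Token(tok.field_id, "*" * len(tok.data),
                        tok.start_of_record, tok.end_of_record)
        else:
            yield tok


# Usage: mask the second field (field_id 1) of every record.
masked = list(mask_field(translate(["alice,1234,NY", "bob,99,CA"]), field_id=1))
```

Because both functions are generators, records are processed as they arrive and the full stream never has to be held in memory.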
Opening claim text (preview).
What is claimed is:
1. A method comprising: translating, by a processor, a first data stream into a second data stream, the second data stream exhibiting a dynamic field format according to a dynamic field schema; and performing, by a processor, a processing operation on data within the second data stream over a sequentially advancing window of the second data stream, wherein the second data stream has a length that is longer than a length of the window.
2. The method of claim 1 wherein the dynamic field format comprises an ordered collection of fields that supports variable length fields and nested hierarchical data structures.
3. The method of claim 2 wherein the dynamic field format is flexible at execution time and self-describing with regard to field boundaries and the nested hierarchical data structures.
4. The method of claim 3 wherein the dynamic field format supports a plurality of types, the types comprising a simple type, a group type, an array type, and a switch type.
5. The method of claim 4 wherein the dynamic field schema specifies a plurality of the supported types.
6. The method of claim 5 wherein the dynamic field schema further specifies a start member.
7. The method of claim 3 wherein the second data stream comprises a plurality of tokens, each of a plurality of the tokens comprising metadata and data, the data comprising a plurality of data characters that serve as payload data for the second data stream, the metadata describing how the data characters relate to the dynamic field format.
8. The method of claim 7 wherein the performing step comprises: a processor selectively targeting a field of the second data stream for a data processing operation without analyzing the data characters of the second data stream.
9. The method of claim 8 wherein the selectively targeting step comprises the processor selectively targeting a field of the second data stream based on a plurality of field identifiers in the metadata of the tokens in the second data stream.
10. The method of claim 8 wherein the first data stream comprises a plurality of records, the records comprising a plurality of the data characters in a plurality of fields.
11. The method of claim 8 wherein the data processing operation comprises at least one of (1) field re-formatting, (2) data format conversion, (3) lookup and replace, (4) field masking, (5) regular expression pattern matching, (6) exact matching, (7) approximate matching, (8) address masking, (9) filtering and selection, (10) encryption, (11) decryption, (12) aggregation, (13) address validation, (14) email validation, and (15) data validation.
12. The method of claim 8 wherein the performing step further comprises: a processor pivoting data corresponding to a plurality of records within the second data stream to group the selectively targeted field in the plurality of records; and a processor performing the data processing operation on the grouped fields in parallel.
13. The method of claim 12 wherein the processor that performs the data processing operation on the grouped fields in parallel comprises a graphics processing unit (GPU).
14. The method of claim 7 wherein the metadata comprises: a token type; a start of data flag; an end of data flag; a start of record flag; an end of record flag; and a field identifier.
15. The method of claim 14 wherein the metadata further comprises a length for the data.
16. The method of claim 15 wherein the metadata further comprises an application-specific metadata value.
17. The method of claim 14 wherein the token type is a member of a group of available token types, the available token types comprising: a field type; a start of group type; and an end of group type.
18. The method of claim 14 wherein the tokens are variable length tokens.
19. The method of claim 7 wherein the translating step further comprises a processor bundling a plurality of the tokens in a single message of the second data stream such that the second data stream comprises a plurality of messages that include bundled tokens.
20. The method of claim 1 further comprising: a processor compiling the dynamic field schema into a program for use by the translating step, the program configured to define a plurality of rules for generating the second data stream from the first data stream in accordance with the dynamic field schema; and wherein the translating step comprises a processor executing the program to translate the first data stream into the second data stream.
21. The method of claim 20 wherein the program comprises an array of subroutines, each subroutine comprising an array of instructions, each instruction comprising an opcode and one or more opcode parameters, wherein the instructions are members of a group of available instructions, the group of available instructions comprising: an advance input instruction; a copy to stack instruction; a copy to output token instruction; a copy until delimiter instruction; a copy counted instruction; a handle group instruction; a handle array instruction; a handle switch instruction; a convert number instruction; a range check instruction; a basic math instruction; and an error instruction.
22. The method of claim 21 wherein the compiling step comprises a processor (1) reading the dynamic field schema, and (2) based on the read dynamic field schema, (i) arranging the subroutines for inclusion in the program, and (ii) selecting instructions from the group of available instructions for inclusion in each subroutine such that the subroutines and instructions are arranged and selected in accordance with the dynamic field schema.
23. The method of claim 1 wherein the translating step comprises the processor translating the first data stream into the second data stream using a runtime environment that comprises a translation program, a call stack, a data stack, a pending input buffer, and an error flag that operate with respect to the first data stream and the second data stream.
24. The method of claim 1 wherein the translating step is performed by a hardware logic circuit, the hardware logic circuit comprising: an input buffer; a command parser; a program buffer; a call stack; a data stack; a read buffer; and a state machine.
25. The method of claim 24 wherein the translating step comprises: the command parser reading a translation program from the input buffer, the translation program comprising a plurality of subroutines, the subroutines comprising a plurality of instructions; the command parser populating the program buffer with the translation program; the command parser reading data within the first data stream and writing the read data into the read buffer; the state machine interacting with the program buffer, the read buffer, the call stack and the data stack to control translation of the first data stream into the second data stream according to the translation program.
26. The method of claim 25 wherein the state machine comprises: an initial state; a program start state; a call subroutine state; an execute instruction state; a pop stack state; and an error state; and wherein the state machine interacting step comprises the state machine transitioning through the states based on contents of the read buffer, the program buffer,
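The translation program of claims 20 to 23 (an array of subroutines, each an array of opcode-plus-parameter instructions, executed with a call stack) can be sketched as a small interpreter. This is an illustrative sketch only: the opcode spellings, the parameter layouts, the tuple-based token representation, and the `run` function are assumptions, and only three of the claimed instructions (copy until delimiter, advance input, handle group) are modeled. The data stack, pending input buffer, and error flag of claim 23 are omitted.

```python
from typing import List, Tuple

Instruction = Tuple[str, tuple]     # (opcode, opcode parameters)
Program = List[List[Instruction]]   # array of subroutines
OutToken = Tuple[str, int, str]     # (token type, field identifier, data)


def run(program: Program, text: str) -> List[OutToken]:
    """Execute subroutine 0 of a hypothetical translation program over `text`."""
    out: List[OutToken] = []
    pos = 0
    # Each frame: [subroutine index, next instruction index, group field id or None]
    call_stack = [[0, 0, None]]
    while call_stack:
        frame = call_stack[-1]
        sub, ip, group_id = frame
        if ip >= len(program[sub]):
            # Subroutine finished: pop the frame and close its group, if any.
            call_stack.pop()
            if group_id is not None:
                out.append(("END_GROUP", group_id, ""))
            continue
        frame[1] += 1  # resume after this instruction when control returns
        opcode, params = program[sub][ip]
        if opcode == "COPY_UNTIL_DELIMITER":
            field_id, delim = params
            end = text.find(delim, pos)
            if end == -1:
                end = len(text)
            out.append(("FIELD", field_id, text[pos:end]))
            pos = end
        elif opcode == "ADVANCE_INPUT":
            pos += params[0]
        elif opcode == "HANDLE_GROUP":
            field_id, target = params
            out.append(("START_GROUP", field_id, ""))
            call_stack.append([target, 0, field_id])
        else:
            raise ValueError(f"unsupported opcode: {opcode}")
    return out


# Hypothetical program for input shaped like "id|first;last".
PROGRAM: Program = [
    [  # subroutine 0: the record
        ("COPY_UNTIL_DELIMITER", (1, "|")),
        ("ADVANCE_INPUT", (1,)),
        ("HANDLE_GROUP", (2, 1)),
    ],
    [  # subroutine 1: the nested name group
        ("COPY_UNTIL_DELIMITER", (3, ";")),
        ("ADVANCE_INPUT", (1,)),
        ("COPY_UNTIL_DELIMITER", (4, "\n")),
    ],
]

tokens = run(PROGRAM, "42|Ann;Lee")
```

The start-of-group and end-of-group tokens follow the token types listed in claim 17, so a downstream consumer can recover the nesting without re-parsing the payload characters.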
Protocols for interworking; Protocol conversion · CPC title
Organizing or formatting or addressing of data · CPC title
Command handling arrangements, e.g. command buffers, queues, command scheduling · CPC title
Arithmetic instructions · CPC title
Data buffering arrangements · CPC title