Opaque message parsing

US9882844B2 · US · B2

Patent metadata
Publication number: US-9882844-B2
Application number: US-201414512391-A
Country: US
Kind code: B2
Filing date: Oct 11, 2014
Priority date: Dec 10, 2013
Publication date: Jan 30, 2018
Grant date: Jan 30, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer-implemented method of parsing a message comprising a sequence of data fields, the method comprising evaluating program code for processing the parsed message to identify a first set of data fields of the message that are referenced in said program code; identifying the boundaries of the data fields in a schema defining the format of said message; identifying a second set of data fields in said schema related to the first set of data fields by reference, said second set further including the first set; and sequentially parsing the message using the identified data field boundaries, wherein said parsing step comprises skipping data fields in said sequence that precede the first data field belonging to the second set. A computer program product comprising program code for implementing this method and a data processing system adapted to implement this method are also disclosed.
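The steps in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the fixed-width wire format, the `SCHEMA` and `REFERENCES` tables, the `msg["..."]` access convention in `PROGRAM_CODE`, and all function names are hypothetical. For simplicity the sketch skips every field outside the second set, which corresponds to the variant in claim 5 rather than only the fields preceding the first needed one.

```python
import re

# Hypothetical schema: ordered (field name, byte width) pairs. The widths
# stand in for the "data field boundaries" identified in the schema.
SCHEMA = [("header", 4), ("customer", 6), ("address", 8), ("order_id", 4), ("total", 5)]

# Hypothetical reference links between fields: "total" refers to "order_id".
REFERENCES = {"total": {"order_id"}}

# Hypothetical processing code that will consume the parsed message.
PROGRAM_CODE = 'if float(msg["total"]) > 5: audit(msg["total"])'

# Hypothetical raw message laid out according to SCHEMA.
MESSAGE = b"HDR1Alice 5 Elm St0042 9.99"


def first_set(program_code):
    """Fields the processing code references (found here by a naive regex scan)."""
    return set(re.findall(r'msg\["(\w+)"\]', program_code))


def second_set(first, references):
    """Extend the first set with every field reachable by reference."""
    needed, stack = set(first), list(first)
    while stack:
        for ref in references.get(stack.pop(), ()):
            if ref not in needed:
                needed.add(ref)
                stack.append(ref)
    return needed


def parse(message, schema, needed):
    """Walk the message once; decode only needed fields and step over the rest."""
    parsed, skipped, offset = {}, [], 0
    for name, width in schema:
        if name in needed:
            parsed[name] = message[offset:offset + width].decode("ascii").strip()
        else:
            skipped.append(name)  # boundary is known from the schema, so just advance
        offset += width
    return parsed, skipped


needed = second_set(first_set(PROGRAM_CODE), REFERENCES)
parsed, skipped = parse(MESSAGE, SCHEMA, needed)
print(parsed, skipped)
```

In this sketch the three leading fields are never decoded; only their widths are used to advance the offset, which is where the saving in parsing work would come from.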

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer-implemented method of parsing a message comprising a sequence of data fields, the method comprising: evaluating program code for processing the message to identify a first set of data fields of the message that are referenced in the program code; identifying boundaries of the first set of data fields in a schema defining a format of the message; identifying a second set of data fields in the schema, the second set of data fields being related to the first set of data fields by reference, the second set of data fields further including the first set of data fields; sequentially parsing the message using the boundaries of the first set of data fields, wherein sequentially parsing comprises skipping, according to the sequence, a subset of data fields of the first set of data fields that precede in the sequence a first occurrence of a data field belonging to the second set of data fields, wherein the program code comprises a conditional expression including a plurality of branches, each of said branches referencing a different one of said data fields, said branch decision depending on a further data field downstream in said sequence relative to at least some of the data fields in said branches, the method further comprising: collecting run-time statistics from the parsing of a plurality of messages to determine the frequency of each branch being taken; selecting branches that are taken at a frequency above a defined threshold; identifying the data fields referenced by the selected branches; and skipping the parsing of data fields referenced by unselected branches that precede the data fields of the selected branches in said sequence.

2. The method of claim 1, wherein the message has a tree structure, wherein the data sequence of data fields comprises sequence groups, each group defining a parent node and N sibling nodes of the tree, wherein N is an integer of at least zero.

3. The method of claim 2, wherein the parsing step comprises skipping all data fields belonging to the same sequence group if the sequence group does not contain said first data field.

4. The method of claim 2, wherein the parsing step comprises skipping all data fields of a sequence group preceding said first data field.

5. The method of claim 1, wherein the parsing step comprises skipping all data fields not belonging to the second set.

6. The method of claim 1, wherein said selecting step comprises selecting the most frequently taken branch only.

7. The method of claim 1, further comprising: parsing the further data field; evaluating the parsing result; returning to the skipped data field referenced by an unselected branch in case the further data field requests the taking of said unselected branch; and repeating said parsing step starting from said previously skipped data field and including the previously skipped data field in said repeated parsing step.

8. The method of claim 1, further comprising producing an output including unparsed data fields not belonging to said second set and parsed data fields belonging to said second set.

9. The method of claim 1, wherein the message is an XML message.

10. A computer program product comprising a non-transitory computer-readable data carrier, said carrier comprising computer program code for implementing the method of claim 1 when executed on at least one processor of a data processing system.

11. A data processing system comprising at least one processor coupled to a memory having program code that is configured to perform, when executed by the at least one processor, steps of: evaluating program code for processing the message to identify a first set of data fields of the message that are referenced in the program code; identifying boundaries of the first set of data fields in a schema defining a format of the message; identifying a second set of data fields in the schema, the second set of data fields being related to the first set of data fields by reference, the second set of data fields further including the first set of data fields; and sequentially parsing the message using the boundaries of the first set of data fields, wherein sequentially parsing comprises skipping, according to the sequence, a subset of data fields of the first set data fields that precede in the sequence a first occurrence of a data field belonging to the second set of data fields, wherein the program code comprises a conditional expression including a plurality of branches, each of said branches referencing a different one of said data fields, said branch decision depending on a further data field downstream in said sequence relative to at least some of the data fields in said branches, the method further comprising: collecting run-time statistics from the parsing of a plurality of messages to determine the frequency of each branch being taken; selecting branches that are taken at a frequency above a defined threshold; identifying the data fields referenced by the selected branches; and skipping the parsing of data fields referenced by unselected branches that precede the data fields of the selected branches in said sequence.

12. The data processing system of claim 11, wherein the system further comprises processing the parsed message, wherein the schema defines the format of said message.

13. The data processing system of claim 11, wherein the data processing system is adapted to act as a message broker between a message producer and a message consumer.

14. The data processing system of claim 11, wherein the data processing system is adapted to implement a part of a service-oriented architecture.
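
The statistics-driven branch selection in claim 1 and the fallback in claim 7 can be illustrated with a small sketch. Everything here is an assumption made for illustration: the field names, the `BRANCH_FIELD` mapping, the dict used as a stand-in for an unparsed message, and the function names do not come from the patent.

```python
from collections import Counter

# Hypothetical layout: two candidate fields precede the "kind" field, and the
# value of "kind" decides which branch of the processing code is taken.
FIELDS = ["card_details", "bank_details", "kind"]
BRANCH_FIELD = {"card": "card_details", "bank": "bank_details"}  # branch -> field it reads


def select_branches(counts, threshold):
    """Keep branches whose observed share of past messages exceeds the threshold."""
    total = sum(counts.values())
    return {b for b, n in counts.items() if total and n / total > threshold}


def parse_speculatively(message, selected):
    """Parse in sequence order, skipping fields read only by unselected branches.

    `message` is a dict standing in for raw data; a lookup represents the
    parsing work that a real parser would do for that field.
    """
    wanted = {BRANCH_FIELD[b] for b in selected} | {"kind"}
    parsed = {name: message[name] for name in FIELDS if name in wanted}
    reparsed = False
    needed = BRANCH_FIELD[parsed["kind"]]
    if needed not in parsed:
        # The downstream field asked for an unselected branch: go back and
        # parse the field that was skipped on the first pass.
        parsed[needed] = message[needed]
        reparsed = True
    return parsed, reparsed


# Run-time statistics gathered from earlier messages (made-up numbers).
history = Counter(["card"] * 9 + ["bank"])
selected = select_branches(history, threshold=0.5)

common = {"card_details": "4111", "bank_details": "DE89", "kind": "card"}
rare = {"card_details": "4111", "bank_details": "DE89", "kind": "bank"}
print(parse_speculatively(common, selected))
print(parse_speculatively(rare, selected))
```

The trade-off in this sketch is that the common case avoids parsing `bank_details` altogether, while the rare case pays for a second visit to the skipped field.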

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9882844B2 cover?
A computer-implemented method of parsing a message comprising a sequence of data fields, the method comprising evaluating program code for processing the parsed message to identify a first set of data fields of the message that are referenced in said program code; identifying the boundaries of the data fields in a schema defining the format of said message; identifying a second set of data fiel…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification H04L51/04. Mapped technology areas include Electricity.
When was this patent published?
Publication date Jan 30, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).