Method and apparatus for accelerated format translation of data in a delimited data format

US11789965B2 · US · B2

Patent metadata
Publication number: US-11789965-B2
Application number: US-202016846868-A
Country: US
Kind code: B2
Filing date: Apr 13, 2020
Priority date: Oct 23, 2012
Publication date: Oct 17, 2023
Grant date: Oct 17, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Various methods and apparatuses are described for performing high speed format translations of incoming data, where the incoming data is arranged in a delimited data format. As an example, the data in the delimited data format can be translated to a structured format such as a fixed field format using pipelined operations. A reconfigurable logic device can be used in exemplary embodiments as a platform for the format translation.
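The translation the abstract describes can be illustrated in software. The sketch below is not the patent's pipeline, which targets hardware such as a reconfigurable logic device; it only shows the idea of converting delimited records to a fixed field layout so that a consumer can reach a field by offset. The function names `to_fixed_field` and `field_at`, the column widths, and the choice of space padding are all assumptions made for illustration.

```python
import csv
import io
from itertools import accumulate


def to_fixed_field(csv_text, widths, pad=" "):
    """Translate CSV records into fixed-width records (illustrative only).

    Each field is padded on the right to its assumed column width, so the
    delimiters and quote (shield) characters are no longer needed.
    """
    records = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != len(widths):
            raise ValueError("record does not match the assumed layout")
        padded = []
        for value, width in zip(row, widths):
            if len(value) > width:
                raise ValueError("field longer than its assumed width")
            padded.append(value.ljust(width, pad))
        records.append("".join(padded))
    return records


def field_at(record, widths, index, pad=" "):
    """Jump straight to one field by offset, without scanning earlier fields."""
    offsets = [0, *accumulate(widths)]
    return record[offsets[index]:offsets[index + 1]].rstrip(pad)


widths = [8, 16, 5]  # hypothetical layout: name, street, postal code
records = to_fixed_field('Smith,"Main St, Apt 4",63101\n', widths)
street = field_at(records[0], widths, 1)
```

Because every record has the same layout, `field_at` computes a slice from the widths alone. One limitation of this sketch is that stripping the padding also strips any trailing spaces that were part of the original data.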

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: receiving, by a pipeline, an incoming stream comprising a plurality of bytes arranged in a delimited data format, the incoming byte stream being representative of data arranged in a plurality of fields, the incoming byte stream comprising a plurality of data characters, a plurality of shield characters, and a plurality of field delimiter characters, the field delimiter characters defining a plurality of boundaries between the fields, wherein the pipeline is deployed on at least one of (1) a reconfigurable logic device, (2) a graphics processor unit (GPU), (3) an application-specific integrated circuit (ASIC), and (4) a chip multi-processor (CMP); the pipeline processing the bytes of the received byte stream as the bytes stream through the pipeline, wherein the processing step includes the pipeline translating the received byte stream to an outgoing byte stream arranged in a structured format, the outgoing byte stream comprising a plurality of the data characters of the received byte stream arranged in a plurality of fields and stripped of the field delimiter characters and the shield characters, wherein the structured format permits a downstream processing component to jump directly to a field of interest in the outgoing byte stream without requiring the downstream processing component to analyze the data characters of the outgoing byte stream leading up to the field of interest; selectively targeting a field of the outgoing byte stream for processing without analyzing the data characters of the outgoing byte stream; and performing a field-specific data processing operation on the selectively targeted field of the outgoing byte stream.

2. The method of claim 1 wherein the selectively targeting step and the performing step are performed by a computer system that executes software.

3. The method of claim 1 wherein the selectively targeting step and the performing step are performed by the pipeline.

4. The method of claim 1 wherein the field-specific data processing operation comprises an address validation operation as to whether the data characters in the selectively targeted field exhibit a correct postal service-recognized address format.

5. The method of claim 1 wherein the field-specific data processing operation comprises an email address validation operation as to whether the data characters in the selectively targeted field exhibit a correct email address format.

6. The method of claim 1 wherein the field-specific data processing operation comprises a date validation operation as to whether the data characters in the selectively targeted field exhibit a date in a correct range and format.

7. The method of claim 1 wherein the field-specific data processing operation comprises a query/replace operation that translates the data characters in the selectively targeted field.

8. The method of claim 1 wherein the field-specific data processing operation comprises a field masking or tokenization operation that obfuscates or tokenizes the data characters of the selectively targeted field.

9. The method of claim 1 wherein the field-specific data processing operation comprises a filtering/searching operation that matches data characters in the selectively targeted field against search criteria.

10. The method of claim 1 wherein the field-specific data processing operation comprises a data quality checking operation as part of an extract, transfer, load (ETL) procedure.

11. The method of claim 1 wherein the selectively targeting step comprises selectively targeting a plurality of fields of the outgoing byte stream for processing without analyzing the data characters of the outgoing byte stream; and wherein the performing step comprises performing a plurality of field-specific data processing operations in parallel on the selectively targeted fields of the outgoing byte stream.

12. The method of claim 1 wherein the pipeline is deployed on a reconfigurable logic device.

13. The method of claim 1 wherein the pipeline is deployed on a GPU.

14. The method of claim 1 wherein the delimited data format is a comma separated value (CSV) format.

15. The method of claim 1 wherein the structured format is a fixed field format.

16. The method of claim 1 wherein the translating step includes the pipeline using the shield characters to recognize data characters in the incoming byte stream that also happen to match field delimiter characters in the incoming byte stream as being data characters rather than field delimiter characters.

17. The method of claim 16 wherein the translating step further includes the pipeline (1) generating a shield character mask based on the shield characters in the incoming byte stream, wherein the shield character mask distinguishes between bytes in the incoming byte stream that may include field delimiter characters and bytes in the incoming byte stream that do not include field delimiter characters, (2) identifying the field delimiter characters in the incoming byte stream based on the shield character mask, and (3) removing the shield characters and the identified field delimiter characters from the outgoing byte stream.

18. An apparatus comprising: at least one of (1) a reconfigurable logic device, (2) a graphics processor unit (GPU), (3) an application-specific integrated circuit (ASIC), and (4) a chip multi-processor (CMP) on which a pipeline is deployed; and a data processing stage implemented on a processor or the pipeline; wherein the pipeline is configured to receive an incoming stream comprising a plurality of bytes arranged in a delimited data format, the incoming byte stream being representative of data arranged in a plurality of fields, the incoming byte stream comprising a plurality of data characters, a plurality of shield characters, and a plurality of field delimiter characters, wherein the field delimiter characters define a plurality of boundaries between the fields; wherein the pipeline is further configured to process the bytes of the received byte stream as the bytes stream through the pipeline to translate the received byte stream to an outgoing byte stream arranged in a structured format, the outgoing byte stream comprising a plurality of the data characters of the received byte stream arranged in a plurality of fields and stripped of the field delimiter characters and the shield characters, wherein the structured format permits the data processing stage to jump directly to a field of interest in the outgoing byte stream without requiring the data processing stage to analyze the data characters of the outgoing byte stream leading up to the field of interest; and wherein the data processing stage is configured to (1) selectively target a field of the outgoing byte stream for processing without analyzing the data characters of the outgoing byte stream, and (2) perform a field-specific data processing operation on the selectively targeted field of the outgoing byte stream.

19. An apparatus comprising: a processor; at least one of (1) a reconfigurable logic device, (2) a graphics processor unit (GPU), (3) an application-specific integrated circuit (ASIC), and (4) a chip multi-processor (CMP) on which a pipeline is deployed; and wherein the pipeline is configured to receive an incoming stream comprising a plurality of bytes arranged in a delimited data format, the incoming byte stream being representative of data arranged in a plurality of fields, the incoming byte stream comprising a plurality of data characters, a plurality of shield characters, and a
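
The three steps recited in claim 17 (build a shield character mask, identify field delimiters using the mask, then remove shield and delimiter characters) can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the claimed pipeline: it handles a single record, assumes the double quote as the shield character and the comma as the field delimiter, and does not handle escaped shield characters. The names `shield_mask` and `split_fields` are hypothetical.

```python
def shield_mask(data: bytes, shield: int = ord('"')) -> list:
    """Return one flag per byte: True if the byte may be a field delimiter.

    Bytes between a pair of shield characters, and the shield characters
    themselves, are flagged False because they cannot be delimiters.
    """
    mask = []
    inside = False
    for b in data:
        if b == shield:
            inside = not inside
            mask.append(False)
        else:
            mask.append(not inside)
    return mask


def split_fields(data: bytes, delimiter: int = ord(","),
                 shield: int = ord('"')) -> list:
    """Split one record into fields, dropping shields and real delimiters."""
    fields = []
    current = bytearray()
    for b, may_be_delimiter in zip(data, shield_mask(data, shield)):
        if may_be_delimiter and b == delimiter:
            # A delimiter outside any shielded region ends the current field.
            fields.append(bytes(current))
            current = bytearray()
        elif b != shield:
            # Data byte, including a shielded comma: keep it.
            current.append(b)
    fields.append(bytes(current))
    return fields


fields = split_fields(b'Smith,"Main St, Apt 4",63101')
```

The comma inside the quoted street address is masked out, so it is kept as data instead of being treated as a field boundary.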

Assignees

IP Reservoir LLC

Inventors

Classifications

  • G06F16/254 (Primary)

    Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses · CPC title

  • Data format conversion from or to a database · CPC title

  • Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP · CPC title

  • for patient-specific data, e.g. for electronic patient records · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
IP Reservoir LLC
What technology area does this patent fall under?
Primary CPC classification G06F16/254. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 17, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).