Method and Apparatus for Accelerated Format Translation of Data in a Delimited Data Format
US-2019108177-A1 · Apr 11, 2019 · US
| Field | Value |
|---|---|
| Publication number | US-11789965-B2 |
| Application number | US-202016846868-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 13, 2020 |
| Priority date | Oct 23, 2012 |
| Publication date | Oct 17, 2023 |
| Grant date | Oct 17, 2023 |
Official abstract text for this publication.
Various methods and apparatuses are described for performing high speed format translations of incoming data, where the incoming data is arranged in a delimited data format. As an example, the data in the delimited data format can be translated to a structured format such as a fixed field format using pipelined operations. A reconfigurable logic device can be used in exemplary embodiments as a platform for the format translation.
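The translation described in the abstract can be illustrated with a minimal software sketch. It is not the patent's pipelined hardware design; it only shows why a fixed field layout lets a consumer jump straight to a field by offset. The field widths, the space padding byte, and the function names `to_fixed_field` and `read_field` are assumptions made for illustration, and the input is assumed to contain no shield characters.

```python
# Hypothetical layout: one fixed width (in bytes) per field.
FIELD_WIDTHS = [8, 16, 10]


def to_fixed_field(record, widths=FIELD_WIDTHS, delimiter=b",", pad=b" "):
    """Translate one delimited record (bytes) into a fixed field record.

    Assumes the record has no shield characters, so every delimiter
    byte is a real field boundary.
    """
    fields = record.split(delimiter)
    if len(fields) != len(widths):
        raise ValueError("record does not match the assumed layout")
    out = bytearray()
    for value, width in zip(fields, widths):
        if len(value) > width:
            raise ValueError("field value exceeds its fixed width")
        # Delimiters are dropped; each field is padded to its fixed width.
        out += value.ljust(width, pad)
    return bytes(out)


def read_field(fixed, index, widths=FIELD_WIDTHS, pad=b" "):
    """Jump directly to a field by offset, without scanning earlier bytes."""
    start = sum(widths[:index])
    # Stripping the pad byte also removes genuine trailing spaces;
    # acceptable for this sketch only.
    return fixed[start:start + widths[index]].rstrip(pad)
```

With the assumed widths, `to_fixed_field(b"42,Jane Smith,2020-04-13")` yields a 34-byte record, and `read_field(fixed, 2)` computes the offset 24 arithmetically instead of parsing the first two fields.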
Opening claim text (preview).
What is claimed is:

1. A method comprising: receiving, by a pipeline, an incoming stream comprising a plurality of bytes arranged in a delimited data format, the incoming byte stream being representative of data arranged in a plurality of fields, the incoming byte stream comprising a plurality of data characters, a plurality of shield characters, and a plurality of field delimiter characters, the field delimiter characters defining a plurality of boundaries between the fields, wherein the pipeline is deployed on at least one of (1) a reconfigurable logic device, (2) a graphics processor unit (GPU), (3) an application-specific integrated circuit (ASIC), and (4) a chip multi-processor (CMP); the pipeline processing the bytes of the received byte stream as the bytes stream through the pipeline, wherein the processing step includes the pipeline translating the received byte stream to an outgoing byte stream arranged in a structured format, the outgoing byte stream comprising a plurality of the data characters of the received byte stream arranged in a plurality of fields and stripped of the field delimiter characters and the shield characters, wherein the structured format permits a downstream processing component to jump directly to a field of interest in the outgoing byte stream without requiring the downstream processing component to analyze the data characters of the outgoing byte stream leading up to the field of interest; selectively targeting a field of the outgoing byte stream for processing without analyzing the data characters of the outgoing byte stream; and performing a field-specific data processing operation on the selectively targeted field of the outgoing byte stream.

2. The method of claim 1 wherein the selectively targeting step and the performing step are performed by a computer system that executes software.

3. The method of claim 1 wherein the selectively targeting step and the performing step are performed by the pipeline.

4. The method of claim 1 wherein the field-specific data processing operation comprises an address validation operation as to whether the data characters in the selectively targeted field exhibit a correct postal service-recognized address format.

5. The method of claim 1 wherein the field-specific data processing operation comprises an email address validation operation as to whether the data characters in the selectively targeted field exhibit a correct email address format.

6. The method of claim 1 wherein the field-specific data processing operation comprises a date validation operation as to whether the data characters in the selectively targeted field exhibit a date in a correct range and format.

7. The method of claim 1 wherein the field-specific data processing operation comprises a query/replace operation that translates the data characters in the selectively targeted field.

8. The method of claim 1 wherein the field-specific data processing operation comprises a field masking or tokenization operation that obfuscates or tokenizes the data characters of the selectively targeted field.

9. The method of claim 1 wherein the field-specific data processing operation comprises a filtering/searching operation that matches data characters in the selectively targeted field against search criteria.

10. The method of claim 1 wherein the field-specific data processing operation comprises a data quality checking operation as part of an extract, transfer, load (ETL) procedure.

11. The method of claim 1 wherein the selectively targeting step comprises selectively targeting a plurality of fields of the outgoing byte stream for processing without analyzing the data characters of the outgoing byte stream; and wherein the performing step comprises performing a plurality of field-specific data processing operations in parallel on the selectively targeted fields of the outgoing byte stream.

12. The method of claim 1 wherein the pipeline is deployed on a reconfigurable logic device.

13. The method of claim 1 wherein the pipeline is deployed on a GPU.

14. The method of claim 1 wherein the delimited data format is a comma separated value (CSV) format.

15. The method of claim 1 wherein the structured format is a fixed field format.

16. The method of claim 1 wherein the translating step includes the pipeline using the shield characters to recognize data characters in the incoming byte stream that also happen to match field delimiter characters in the incoming byte stream as being data characters rather than field delimiter characters.

17. The method of claim 16 wherein the translating step further includes the pipeline (1) generating a shield character mask based on the shield characters in the incoming byte stream, wherein the shield character mask distinguishes between bytes in the incoming byte stream that may include field delimiter characters and bytes in the incoming byte stream that do not include field delimiter characters, (2) identifying the field delimiter characters in the incoming byte stream based on the shield character mask, and (3) removing the shield characters and the identified field delimiter characters from the outgoing byte stream.

18. An apparatus comprising: at least one of (1) a reconfigurable logic device, (2) a graphics processor unit (GPU), (3) an application-specific integrated circuit (ASIC), and (4) a chip multi-processor (CMP) on which a pipeline is deployed; and a data processing stage implemented on a processor or the pipeline; wherein the pipeline is configured to receive an incoming stream comprising a plurality of bytes arranged in a delimited data format, the incoming byte stream being representative of data arranged in a plurality of fields, the incoming byte stream comprising a plurality of data characters, a plurality of shield characters, and a plurality of field delimiter characters, wherein the field delimiter characters define a plurality of boundaries between the fields; wherein the pipeline is further configured to process the bytes of the received byte stream as the bytes stream through the pipeline to translate the received byte stream to an outgoing byte stream arranged in a structured format, the outgoing byte stream comprising a plurality of the data characters of the received byte stream arranged in a plurality of fields and stripped of the field delimiter characters and the shield characters, wherein the structured format permits the data processing stage to jump directly to a field of interest in the outgoing byte stream without requiring the data processing stage to analyze the data characters of the outgoing byte stream leading up to the field of interest; and wherein the data processing stage is configured to (1) selectively target a field of the outgoing byte stream for processing without analyzing the data characters of the outgoing byte stream, and (2) perform a field-specific data processing operation on the selectively targeted field of the outgoing byte stream.

19. An apparatus comprising: a processor; at least one of (1) a reconfigurable logic device, (2) a graphics processor unit (GPU), (3) an application-specific integrated circuit (ASIC), and (4) a chip multi-processor (CMP) on which a pipeline is deployed; and wherein the pipeline is configured to receive an incoming stream comprising a plurality of bytes arranged in a delimited data format, the incoming byte stream being representative of data arranged in a plurality of fields, the incoming byte stream comprising a plurality of data characters, a plurality of shield characters, and a
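
The shield character handling recited in claims 16 and 17 can be sketched in software as follows. This is an illustrative, sequential approximation and not the claimed pipeline. It assumes a double quote as the shield character, a comma as the field delimiter, and no escaped shield characters inside a field; the function names `shield_mask` and `split_fields` are hypothetical.

```python
def shield_mask(data, shield=ord('"')):
    """Return one flag per byte: True where the byte may be a field delimiter.

    Bytes between a pair of shield characters, and the shield characters
    themselves, are flagged False.
    """
    mask = []
    inside = False
    for b in data:
        if b == shield:
            inside = not inside   # toggle shielded state
            mask.append(False)    # a shield character is never a delimiter
        else:
            mask.append(not inside)
    return mask


def split_fields(data, delimiter=ord(","), shield=ord('"')):
    """Use the mask to find real delimiters, then drop them and the shields."""
    mask = shield_mask(data, shield)
    fields = [bytearray()]
    for b, may_delimit in zip(data, mask):
        if b == delimiter and may_delimit:
            fields.append(bytearray())  # real field boundary: start a new field
        elif b == shield:
            continue                    # shield characters are stripped
        else:
            fields[-1].append(b)        # data, including shielded delimiter bytes
    return [bytes(f) for f in fields]
```

For the input `b'42,"Smith, Jane",ok'`, the comma inside the quotes is flagged False in the mask and stays in the second field as data, while the two unshielded commas are treated as field boundaries and removed.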
Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses · CPC title
Data format conversion from or to a database · CPC title
Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP · CPC title
for patient-specific data, e.g. for electronic patient records · CPC title