Systems and methods for processing binary mainframe data files in a big data environment

US10360198B2 · US · B2

Patent metadata
Publication number: US-10360198-B2
Application number: US-201614994965-A
Country: US
Kind code: B2
Filing date: Jan 13, 2016
Priority date: Jan 13, 2016
Publication date: Jul 23, 2019
Grant date: Jul 23, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system may read an input file having an input file size and including a first record and a second record. The first and second record may each have a record length. The system may parse the input file into a first split file and a second split file, with the first split file including the first record and the second split file including the second record. The system may distribute the first split file to a first node to generate a first output file and the second split file to a second node to generate a second output file. Any number of additional split files may be distributed to generate any number of output files. The system may combine the output files to generate a converted data file.
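The flow in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes fixed-length records, uses ASCII text fields for readability (mainframe data would normally be EBCDIC), and runs the "nodes" as a plain loop in one process. The function names `plan_split_sizes`, `split_records`, `convert_split`, and `convert_file`, and the `widths` parameter, are hypothetical.

```python
from typing import List


def plan_split_sizes(file_size: int, record_length: int, num_splits: int) -> List[int]:
    """Return a byte size for each split; every size is a whole number of records."""
    if record_length <= 0 or num_splits <= 0:
        raise ValueError("record_length and num_splits must be positive")
    if file_size % record_length != 0:
        raise ValueError("file size is not a whole number of records")
    # Number of records = file size divided by record length.
    num_records = file_size // record_length
    base, extra = divmod(num_records, num_splits)
    # The first `extra` splits carry one additional record each.
    counts = [base + (1 if i < extra else 0) for i in range(num_splits)]
    return [count * record_length for count in counts if count > 0]


def split_records(data: bytes, record_length: int, num_splits: int) -> List[bytes]:
    """Cut the input on record boundaries so no record straddles two splits."""
    splits, offset = [], 0
    for size in plan_split_sizes(len(data), record_length, num_splits):
        splits.append(data[offset:offset + size])
        offset += size
    return splits


def convert_split(split: bytes, record_length: int, widths: List[int],
                  delimiter: str = "|") -> List[str]:
    """Stand-in for per-node work: turn each fixed-width record into a delimited row."""
    rows = []
    for start in range(0, len(split), record_length):
        record = split[start:start + record_length]
        fields, pos = [], 0
        for width in widths:
            fields.append(record[pos:pos + width].decode("ascii"))
            pos += width
        rows.append(delimiter.join(fields))
    return rows


def convert_file(data: bytes, record_length: int, widths: List[int],
                 num_splits: int) -> str:
    """Split, convert each split independently, then combine outputs in order."""
    outputs = [convert_split(s, record_length, widths)
               for s in split_records(data, record_length, num_splits)]
    return "\n".join(row for rows in outputs for row in rows)
```

For example, `convert_file(b"A111B222C333D444E555", 4, [1, 3], 2)` plans two splits of 12 and 8 bytes and returns five delimited rows. On a cluster, each call to `convert_split` would run on a separate node; the list comprehension here only stands in for that distribution.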

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: reading, by a processor, a binary data file in copybook; converting, by the processor, metadata of the binary data file into actionbook; retrieving, by the processor, a record length from the metadata in the actionbook; determining, by the processor, a number of records in the binary data file by dividing a size of the binary data file by the record length; determining, by the processor, file sizes for split input files based on the number of records in the binary data file; parsing, by the processor and based on the file sizes, records in the binary data file into the split input files; reading, by the processor using the metadata, the split input files; converting, by the processor using a node, the split input files into delimited output records; distributing, by the processor, the delimited output records to different nodes; and combining, by the processor, at least a subset of the delimited output records to generate a converted data file.

2. The method of claim 1, further comprising: combining, by the processor, partial records of the binary data file into a single record; and pushing, by the processor, the single record into a current split input file or a next split input file of the split input files.

3. The method of claim 1, further comprising: calculating, by the processor operating on the node, a file size of a split input file of the split input files; retrieving, by the processor and from the actionbook, a record length of the split input file; and calculating, by the processor, a number of records in the split input file using the record length.

4. The method of claim 3, further comprising storing, by the processor, the converted data file at least partially on a first node and at least partially on a second node.

5. The method of claim 1, further comprising reading, by the processor, record offsets corresponding to the respective records.

6. The method of claim 1, further comprising converting, by the processor using the node, data types, wherein the data types include at least one of comp decimals, signed decimals, zoned decimals, string, integer or float.

7. The method of claim 1, further comprising determining, by the processor, a current file position in a mapping process based on metadata stored in the actionbook.

8. The method of claim 1, wherein the delimited output records are text-based and contain text results of processing data from the split input file.

9. The method of claim 1, wherein the copybook describes layout information of binary data in the binary data file, wherein the layout information includes the metadata that describes records in the binary data file, wherein the records include at least one of record length, field length, field data type, number of records or columns per record, and wherein the metadata in the copybook is stored in a binary format or a test-based format.

10. The method of claim 1, wherein the actionbook includes the metadata stored in object-oriented format, wherein the object-oriented format includes at least one of JSON or XML, wherein the actionbook includes tagged metadata or structured metadata that parses and processes the binary data file, and wherein a node accesses actionbook to map and parse the split input files.

11. The method of claim 1, further comprising checking, by the processor, a file position a next split file start point to determine if a record was left out.

12. A computer-based system, comprising: a processor; and a tangible, non-transitory memory configured to communicate with the processor, the tangible, non-transitory memory having instructions stored thereon that, in response to execution by the processor, cause the processor to perform operations comprising: reading, by the processor, a binary data file in copybook; converting, by the processor, metadata of the binary data file into actionbook; retrieving, by the processor, a record length from the metadata in the actionbook; determining, by the processor, a number of records in the binary data file by dividing a size of the binary data file by the record length; determining, by the processor, file sizes for split input files based on the number of records in the binary data file; parsing, by the processor and based on the file sizes, records in the binary data file into the split input files; reading, by the processor using the metadata, the split input files; converting, by the processor using a node, the split input files into delimited output records; distributing, by the processor, the delimited output records to different nodes; and combining, by the processor, at least a subset of the delimited output records to generate a converted data file.

13. The computer-based system of claim 12, further comprising converting, by the processor using the node, data types, wherein the data types include at least one of comp decimals, signed decimals, zoned decimals, string, integer or float.

14. The computer-based system of claim 12, further comprising determining, by the processor, a current file position in a mapping process based on metadata stored in the actionbook.

15. An article of manufacture including a non-transitory, tangible computer readable storage medium having instructions stored thereon that, in response to execution by a processor, cause the processor to perform operations comprising: reading, by the processor, a binary data file in copybook; converting, by the processor, metadata of the binary data file into actionbook; retrieving, by the processor, a record length from the metadata in the actionbook; determining, by the processor, a number of records in the binary data file by dividing a size of the binary data file by the record length; determining, by the processor, file sizes for split input files based on the number of records in the binary data file; parsing, by the processor and based on the file sizes, records in the binary data file into the split input files; reading, by the processor using the metadata, the split input files; converting, by the processor using a node, the split input files into delimited output records; distributing, by the processor, the delimited output records to different nodes; and combining, by the processor, at least a subset of the delimited output records to generate a converted data file.

16. The article of claim 15, further comprising: combining, by the processor, partial records of the binary data file into a single record; and pushing, by the processor, the single record into a current split input file or a next split input file of the split input files.

17. The article of claim 15, further comprising: calculating, by the processor operating on the node, a file size of the split input file; retrieving, by the processor and from the actionbook, a record length of the split input file; and calculating, by the processor, a number of records in the split input file using the record length.

18. The article of claim 15, further comprising reading, by the processor, record offsets corresponding to the respective records.

19. The article of claim 15, further comprising converting, by the processor using the node, data types, wherein the data types include at least one of comp decimals, signed decimals, zoned decimals, string, integer or float.

20. The article of claim 15, further comprising determining, by the processor, a current file position in a mapping process based on metadata stored in the actionbook.
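The claims say the actionbook holds record metadata in JSON or XML and that a node converts mainframe data types into delimited text. The Python sketch below shows one way that could look; it is an illustration under stated assumptions, not the claimed implementation. The JSON schema (`record_length`, `fields`, `type`, `length`, `scale`) is hypothetical, since the page does not publish the actionbook layout. Reading "comp decimals" as packed decimal (COBOL COMP-3) is also an assumption. Strings are decoded as EBCDIC code page 037.

```python
import json
from decimal import Decimal

# Hypothetical actionbook: field names and layout are assumptions for illustration.
ACTIONBOOK_JSON = """
{
  "record_length": 10,
  "fields": [
    {"name": "name",    "type": "string", "length": 4},
    {"name": "balance", "type": "comp3",  "length": 3, "scale": 2},
    {"name": "units",   "type": "zoned",  "length": 3, "scale": 0}
  ]
}
"""


def decode_comp3(raw: bytes, scale: int) -> Decimal:
    """Packed decimal: two digits per byte, final nibble is the sign (0xD = negative)."""
    nibbles = []
    for byte in raw:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)
    sign_nibble = nibbles.pop()
    value = int("".join(str(n) for n in nibbles))
    if sign_nibble == 0x0D:
        value = -value
    return Decimal(value).scaleb(-scale)


def decode_zoned(raw: bytes, scale: int) -> Decimal:
    """Zoned decimal: one digit per byte; the last byte's high nibble carries the sign."""
    value = int("".join(str(byte & 0x0F) for byte in raw))
    if raw[-1] >> 4 == 0x0D:
        value = -value
    return Decimal(value).scaleb(-scale)


def decode_record(record: bytes, actionbook: dict, delimiter: str = "|") -> str:
    """Walk the field list in order and emit one delimited text row."""
    if len(record) != actionbook["record_length"]:
        raise ValueError("record does not match the actionbook record length")
    out, pos = [], 0
    for field in actionbook["fields"]:
        raw = record[pos:pos + field["length"]]
        pos += field["length"]
        if field["type"] == "string":
            out.append(raw.decode("cp037").rstrip())
        elif field["type"] == "comp3":
            out.append(str(decode_comp3(raw, field["scale"])))
        elif field["type"] == "zoned":
            out.append(str(decode_zoned(raw, field["scale"])))
        else:
            raise ValueError("unsupported field type: " + field["type"])
    return delimiter.join(out)


actionbook = json.loads(ACTIONBOOK_JSON)
# "JOHN" in EBCDIC, +123.45 as packed decimal, -42 as zoned decimal.
sample = "JOHN".encode("cp037") + bytes([0x12, 0x34, 0x5C]) + bytes([0xF0, 0xF4, 0xD2])
row = decode_record(sample, actionbook)
```

In this sketch, `row` is the delimited text for one 10-byte record. A node holding a split input file would apply `decode_record` to each `record_length`-sized slice of its split.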

Assignees

American Express Travel Related Services Co Inc

Inventors

Classifications

  • G06F16/22 (Primary)

    Indexing; Data structures therefor; Storage structures (CPC title)

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10360198B2 cover?
A system may read an input file having an input file size and including a first record and a second record. The first and second record may each have a record length. The system may parse the input file into a first split file and a second split file, with the first split file including the first record and the second split file including the second record. The system may distribute the first s…
Who is the assignee on this patent?
American Express Travel Related Services Co Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/22. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jul 23, 2019 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).