Apparatus and method for document format conversion

US9529781B2 · US · B2

Patent metadata
Publication number: US-9529781-B2
Application number: US-201314399337-A
Country: US
Kind code: B2
Filing date: Nov 4, 2013
Priority date: Jul 22, 2013
Publication date: Dec 27, 2016
Grant date: Dec 27, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An apparatus and method for document format conversion. The apparatus includes a document parsing unit for parsing a fixed layout document to acquire path primitives of the document; a path grouping unit for dividing the path primitives into groups to generate path groups; a font file generating unit for acquiring path groups that are used to represent characters and generating font files corresponding to the path groups, wherein if there are two or more path groups representing the same character, only one font file is generated and associated with the multiple path groups representing the same character; a document generating unit for generating a converted document using all font files that have been generated. With the above, the problem of data redundancy in fixed layout documents is solved; further, the incorrect rendering in reflowing processes may be solved to achieve better display effects.
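
The de-duplication step in the abstract (one font file shared by every path group that represents the same character) can be illustrated with a minimal Python sketch. This is not the patented implementation: the `PathGroup` class, the `build_fonts` function, and the `font_N` naming scheme are all hypothetical, and the sketch assumes each path group has already been recognized as a character. It also ignores the case in claim 8 where the same character appears with different outlines and receives a new font name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathGroup:
    """Hypothetical container: the recognized character plus its outline paths."""
    char: str      # character the group was recognized as (assumed given)
    paths: tuple   # opaque path primitives; their contents are not used here


def build_fonts(groups):
    """Assign one font name per distinct character.

    Returns (fonts, assignments): fonts maps a Unicode code point to a font
    name, and assignments lists the font name used by each path group.
    """
    fonts = {}
    assignments = []
    for group in groups:
        code_point = ord(group.char)
        if code_point not in fonts:
            # First time this character is seen: "generate" a font for it.
            fonts[code_point] = f"font_{len(fonts)}"
        # Later groups for the same character reuse the recorded font.
        assignments.append(fonts[code_point])
    return fonts, assignments


groups = [
    PathGroup("a", ("p1",)),
    PathGroup("b", ("p2",)),
    PathGroup("a", ("p3",)),
]
fonts, assignments = build_fonts(groups)
print(len(fonts), assignments)  # 2 ['font_0', 'font_1', 'font_0']
```

In this sketch three path groups yield two fonts, matching the claim language that the number of font files is less than the number of path groups representing characters.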

First claim

Opening claim text (preview).

What is claimed is:

1. An apparatus for document format conversion, comprising: a document parsing unit configured to parse a fixed layout document containing many path primitives and acquire path primitives of the fixed layout document, wherein the fixed layout document is a document containing a predetermined layout of a plurality of characters and non-characters that are each fixed relative to each other in the document, wherein each of the characters and non-characters are defined by one or more of the many path primitives; a path grouping unit configured to, in response to acquiring the path primitives, divide the path primitives into groups and to generate path groups which represent characters or non-characters; a font file generating unit, the font file generating unit being configured to, in response to generating the path groups, acquire path groups representing characters and generating font files corresponding to the path groups representing characters, wherein if there are two or more path groups representing an identical character, only one font file is generated, and is associated with the two or more path groups representing the identical character; a document generating unit configured to, in response to generating the font files, generate a converted document using all font files that have been generated, wherein the number of font files generated that represent the characters is less than the total number of path groups generated that represent the characters.

2. The apparatus for document format conversion according to claim 1, wherein the path grouping unit includes an enclosing rectangle acquisition subunit configured to acquire a minimum enclosing rectangle of each path primitive, wherein in the case of a character defined by more than one path primitive, the minimum enclosing rectangle is smaller than an enclosing rectangle corresponding to the character defined by more than one path primitive; a group processing subunit configured to detect position relationships between the minimum enclosing rectangles of the various path primitives; in the case of the minimum enclosing rectangles of two path primitives intersecting or in the case of a distance between the minimum enclosing rectangles of two path primitives being less than a predetermined character spacing, dividing the two path primitives into the same path group.

3. The apparatus for document format conversion according to claim 1, further comprising: a representation determination unit configured to recognize each path group through an Optical Character Recognition (OCR) technique, wherein if a character corresponding to a path group is recognized, the corresponding path group is used to represent the character for the processing of the font file generation unit.

4. The apparatus for document format conversion according to claim 1, further comprising: a Unicode recognition unit configured to recognize a Unicode value corresponding to a path group representing a character; a character representation unit configured to represent a character to be described using the recognized Unicode value and the font file corresponding therewith.

5. The apparatus for document format conversion according to claim 4, wherein the font file generation unit is configured to generate the font file using the Unicode value recognized by the Unicode recognition unit and the path group corresponding therewith.

6. The apparatus for document format conversion according to claim 5, wherein the font file generation unit comprises: a first table generation subunit configured to generate a first table using Unicode values and in which mappings between the Unicode values and font indexes are stored; a second table generation subunit configured to generate a second table using path primitives contained in the path groups, in which the font indexes and font data corresponding to the font indexes are stored; a table processing subunit configured to generate the font file using the first table and the second table.

7. The apparatus for document format conversion according to claim 5, further comprising: a record state determination unit configured to determine whether a Unicode value recognized by the Unicode recognition unit has been recorded; a data acquisition unit configured to determine, if the Unicode value has been recorded, that there is a path group representing the same character and acquiring the recorded Unicode value and the corresponding font file that has been generated, for representing the character to be described by the character representation unit; and configured to generate, if the Unicode value has not been recorded, a font file for representing the character to be described by the character representation unit.

8. The apparatus for document format conversion according to claim 7, further comprising: a file storage unit configured to store the font files for the character representation unit to represent a corresponding character using the name of a font file and a Unicode value corresponding to the Unicode file; and a coordinate determination unit configured to further acquire, if an acquired Unicode value of a specified path group has been recorded previously, coordinates of the specified path group, and determine whether the coordinates of the specified path group are identical to those of the recorded path group; wherein if identical, a determination of the same path group is made and no further process is required; otherwise, a new name is generated for the character representation unit to represent a corresponding character using the recorded Unicode value and the new name and for the font file generating unit to generate a font file named with the new name.

9. The apparatus for document format conversion according to claim 1, wherein the font file generating unit is configured to generate, when it is determined that a Unicode value has not been recorded for a character, a new font file for representing the character.

10. A method for document format conversion, the method comprising: parsing a fixed layout document containing many path primitives to acquire path primitives of the fixed layout document, wherein the fixed layout document is a document containing a predetermined layout of a plurality of characters and non-characters that are each fixed relative to each other in the document, wherein each of the characters and non-characters are defined by one or more of the many path primitives; in response to acquiring the path primitives, dividing the path primitives into groups to generate path groups which represent characters or non-characters; in response to generating the path groups, acquiring path groups that are used to represent characters and generating font files corresponding to the path groups that are used to represent characters, wherein if there are two or more path groups representing the same character, only one font file is generated and associated with the two or more path groups representing the same character; and in response to generating the font files, generating a converted document using all font files that have been generated, wherein the number of font files generated that represent the characters is less than the total number of path groups generated that represent the characters.

11. The method for document format conversion according to claim 10, wherein the step of dividing the path primitives into groups to generate path groups further comprises the steps of: acquiring a minimum enclosing rectangle of each path primitive, wherein in the case of a character defined by more than one path primitive, the minimum enclosing rectangle is smaller than an enclosing rectan
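
The grouping rule in claims 2 and 11 (two path primitives join the same path group when their minimum enclosing rectangles intersect, or when the distance between the rectangles is below a predetermined character spacing) can be sketched in Python as follows. This is an illustration only, not the patented implementation. It assumes each path primitive is a list of (x, y) points and that the rectangles are axis-aligned; the names `bbox`, `gap`, and `group_paths` are hypothetical; and the union-find merging is one possible way to make the grouping transitive, which the claims do not specify.

```python
from itertools import combinations


def bbox(points):
    """Minimum axis-aligned enclosing rectangle as (x0, y0, x1, y1)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


def gap(a, b):
    """Distance between two rectangles; 0.0 if they intersect or touch."""
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    return (dx * dx + dy * dy) ** 0.5


def group_paths(paths, char_spacing):
    """Return lists of path indexes that belong to the same path group."""
    parent = list(range(len(paths)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    boxes = [bbox(p) for p in paths]
    for i, j in combinations(range(len(paths)), 2):
        d = gap(boxes[i], boxes[j])
        # Merge when the rectangles intersect (d == 0) or are closer
        # than the assumed character spacing threshold.
        if d == 0 or d < char_spacing:
            parent[find(j)] = find(i)

    groups = {}
    for i in range(len(paths)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Dot and stem of an "i", followed by a separate "l" further right.
dot = [(0, 8), (1, 9)]
stem = [(0, 0), (1, 7)]
ell = [(4, 0), (5, 9)]
print(group_paths([dot, stem, ell], char_spacing=2))  # [[0, 1], [2]]
```

Here the dot and stem sit 1 unit apart, under the assumed spacing of 2, so they form one group, while the third outline is 3 units away and stays separate.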

Assignees

Univ Peking Founder Group Co, Founder Apabi Tech Ltd, Founder Information Ind Holdings Co Ltd

Inventors

Classifications

  • G06F40/12 (Primary)

    Use of codes for handling textual entities · CPC title

  • Display of layout of documents; Previewing · CPC title

  • Font handling; Temporal or kinetic typography · CPC title

  • Tablespace storage structures; Management thereof · CPC title

  • Editing, e.g. inserting or deleting · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9529781B2 cover?
An apparatus and method for document format conversion. The apparatus includes a document parsing unit for parsing a fixed layout document to acquire path primitives of the document; a path grouping unit for dividing the path primitives into groups to generate path groups; a font file generating unit for acquiring path groups that are used to represent characters and generating font files corre…
Who is the assignee on this patent?
Univ Peking Founder Group Co, Founder Apabi Tech Ltd, Founder Information Ind Holdings Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06F40/12. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 27, 2016 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).