US9953008B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9953008-B2 |
| Application number | US-201313745279-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 18, 2013 |
| Priority date | Jan 18, 2013 |
| Publication date | Apr 24, 2018 |
| Grant date | Apr 24, 2018 |
Official abstract text for this publication.
Determining relationships between graphical elements in a fixed format document is provided. Graphical element sizes and their relative positions may be analyzed to determine whether two or more graphical elements should be aggregated together or whether the graphical elements should belong to different graphical groups. Graphs and figures comprising objects that are absolutely positioned may be detected, as well as objects where inter-element positions need to be preserved from regular document flow. Additionally, background objects may be differentiated from regular text flow when the objects overlap with text.
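The aggregation step described in the abstract can be pictured with a small sketch. It assumes each graphical element has already been reduced to an axis-aligned bounding box (y grows downward) and that two elements are joined when the shortest distance between their boxes falls below a join threshold, as recited in claims 3 and 4. The `Box` type, the function names, the threshold value, and the transitive (union-find) grouping are illustrative assumptions, not the patent's implementation.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box; hypothetical type, y grows downward."""
    left: float
    top: float
    right: float
    bottom: float


def box_distance(a: Box, b: Box) -> float:
    """Shortest distance between two boxes (0.0 if they touch or overlap)."""
    dx = max(a.left - b.right, b.left - a.right, 0.0)
    dy = max(a.top - b.bottom, b.top - a.bottom, 0.0)
    return hypot(dx, dy)


def group_elements(boxes: list[Box], join_threshold: float) -> list[list[int]]:
    """Return groups of box indices.

    Assumption: grouping is transitive, so A joins C if A is near B and
    B is near C. The claims only describe the pairwise test.
    """
    parent = list(range(len(boxes)))

    def find(i: int) -> int:
        # Follow parent links to the group representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if box_distance(boxes[i], boxes[j]) < join_threshold:
                parent[find(i)] = find(j)

    groups: dict[int, list[int]] = {}
    for i in range(len(boxes)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


# Two shapes 2 units apart plus one distant shape; threshold chosen arbitrarily.
shapes = [Box(0, 0, 10, 10), Box(12, 0, 20, 10), Box(100, 100, 110, 110)]
grouped = group_elements(shapes, join_threshold=5.0)
```

With these inputs the first two boxes end up in one aggregation and the third stays on its own. A real converter would presumably derive the threshold from page metrics rather than hard-code it.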
Opening claim text (preview).
We claim:
1. A method for converting a fixed format document containing a plurality of graphical elements and a plurality of text lines into a flow format document, the method comprising: detecting a plurality of graphical elements on a page in a fixed format document; determining that a first graphical element is proximate to a second graphical element; grouping the first graphical element and the second graphical element on the page as a first graphic aggregation in the flow format document, wherein a bounding box is defined about the first graphic aggregation; vertically expanding the bounding box to include a first line of text on the page when: the first line of text is located above or below the bounding box, the first line of text does not extend beyond a left or a right side of the bounding box more than a first fraction of the width of the bounding box, and a vertical distance between the first line of text and the top or the bottom of the bounding box is less than an average text line height on the page; and horizontally expanding the bounding box to include a second line of text on the page when: the second line of text is located to the right or the left of the bounding box, and the second line of text is at a distance of less than a second fraction, the second fraction greater than the first fraction, of the width of the bounding box from the left or right side of the bounding box.
2. The method of claim 1, wherein the plurality of graphical elements comprise a path, an image, and text.
3. The method of claim 1, wherein determining that a first graphical element is proximate to a second graphical element further comprises: determining a shortest distance between the first graphical element and the second graphical element; and determining the shortest distance between the first graphical element and the second graphical element is less than a determined join threshold.
4. The method of claim 3, wherein determining a shortest distance between a first graphical element and a second graphical element comprises determining a shortest distance between a bounding box of the first graphical element and a bounding box of the second graphical element.
5. The method of claim 1, wherein the plurality of graphical elements comprise components of a flow chart, including a path, an image, and text.
6. The method of claim 1, wherein the second fraction is greater than first fraction.
7. The method of claim 6, wherein the first fraction comprises one-fifth.
8. The method of claim 7, wherein the second fraction comprises one-half.
9. The method of claim 1, further comprising expanding the bounding box to include a third line of text on the page when the third line of text overlaps the bounding box.
10. The method of claim 9, further comprising expanding the bounding box to include a fourth line of text on the page when the fourth line of text fails to overlap the bounding box but semantically belongs with the first graphic aggregation.
11. The method of claim 1, wherein the bounding box comprises a first bounding box, and wherein the method further comprises: defining a second graphic aggregation on the page, wherein a second bounding box is defined around the second graphic aggregation; and merging the second graphic aggregation with the first aggregation when the second graphic aggregation overlaps the first graphic aggregation.
12. A system for converting a fixed format document containing a plurality of graphical elements into a flow format document, the system comprising: one or more processors; and a memory coupled to the one or more processors, the one or more processors operable to: detect a plurality of graphical elements on a page in a fixed format document; determine that a first graphical element is proximate to a second graphical element; group display the first graphical element and the second graphical element on the page as a first graphic aggregation in the flow format document and define a bounding box about the first graphic aggregation; vertically expand the bounding box to include a first line of text on the page when: the first line of text is located above or below the bounding box, the first line of text does not extend beyond a left or a right side of the bounding box more than a first fraction of the width of the bounding box, and a vertical distance between the first line of text and the top or the bottom of the bounding box is less than an average text line height on the page; and horizontally expand the bounding box to include a second line of text on the page when: the second line of text is located to the right or the left of the bounding box, and the second line of text is at a distance of less than a second fraction of the width of the bounding box from the left or right side of the bounding box.
13. The system of claim 12, wherein the plurality of graphical elements comprise a path, an image, and text.
14. The system of claim 12, wherein in determining that a first graphical element is proximate to a second graphical element, the one or more processors are further operable to: determine a shortest distance between the first graphical element and the second graphical element; and determine the shortest distance between the first graphical element and the second graphical element is less than a determined join threshold.
15. The system of claim 14, wherein in determining a shortest distance between a first graphical element and a second graphical element, the one or more processors are operable to determine a shortest distance between a bounding box of the first graphical element and a bounding box of the second graphical element.
16. The system of claim 12, wherein the one or more processors are further operable to: determine that the first graphic aggregation and a second graphic aggregation intersect; and merge the first graphic aggregation and the second graphic aggregation into a single aggregation.
17. A computer readable storage device containing computer executable instructions which, when executed by a computer, perform a method for converting a fixed format document containing a plurality of graphical elements into a flow format document, the method comprising: detecting a plurality of graphical elements on a page in a fixed format document; determining that a first graphical element is proximate to a second graphical element; and grouping the first graphical element and the second graphical element on the page as a first graphic aggregation in the flow format document, wherein a bounding box is defined about the first graphic aggregation; vertically expanding the bounding box to include a first line of text on the page when: the first line of text is located above or below the bounding box, the first line of text does not extend beyond a left or a right side of the bounding box more than a first fraction of the width of the bounding box, and a vertical distance between the first line of text and the top or the bottom of the bounding box is less than an average text line height on the page; and horizontally expanding the bounding box to include a second line of text on the page when: the second line of text is located to the right or the left of the bounding box, and the second line of text is at a distance of less than a second fraction of the width of the bounding box from the left or right side of the bounding box.
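The bounding-box expansion conditions in the independent claims can be read as two predicates. The following is a minimal sketch under stated assumptions, not the patented implementation: boxes are axis-aligned with y growing downward, the fractions take the values named in claims 7 and 8 (one-fifth and one-half), and the horizontal test additionally requires the text line to overlap the box's vertical extent, which the claims do not state. All type and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box; hypothetical type, y grows downward."""
    left: float
    top: float
    right: float
    bottom: float

    @property
    def width(self) -> float:
        return self.right - self.left

    def union(self, other: "Box") -> "Box":
        return Box(min(self.left, other.left), min(self.top, other.top),
                   max(self.right, other.right), max(self.bottom, other.bottom))


FIRST_FRACTION = 1 / 5   # value from claim 7
SECOND_FRACTION = 1 / 2  # value from claim 8


def should_expand_vertically(box: Box, line: Box, avg_line_height: float) -> bool:
    """Vertical rule: line above/below, limited overhang, small vertical gap."""
    above = line.bottom <= box.top
    below = line.top >= box.bottom
    if not (above or below):
        return False
    # How far the line sticks out past the box's left or right side.
    overhang = max(box.left - line.left, line.right - box.right, 0.0)
    if overhang > FIRST_FRACTION * box.width:
        return False
    gap = box.top - line.bottom if above else line.top - box.bottom
    return gap < avg_line_height


def should_expand_horizontally(box: Box, line: Box) -> bool:
    """Horizontal rule: line to the left/right and closer than half the width."""
    # Assumption (not in the claims): the line must share vertical extent with the box.
    if not (line.bottom > box.top and line.top < box.bottom):
        return False
    if line.left >= box.right:
        gap = line.left - box.right
    elif line.right <= box.left:
        gap = box.left - line.right
    else:
        return False
    return gap < SECOND_FRACTION * box.width


def expand(box: Box, lines: list[Box], avg_line_height: float) -> Box:
    """Single pass over the text lines; each accepted line grows the box."""
    for line in lines:
        if (should_expand_vertically(box, line, avg_line_height)
                or should_expand_horizontally(box, line)):
            box = box.union(line)
    return box


figure = Box(100, 100, 200, 200)
caption = Box(110, 205, 190, 215)      # just below the figure
side_label = Box(230, 140, 260, 150)   # 30 units to the right
expanded = expand(figure, [caption, side_label], avg_line_height=12.0)
```

Because `expand` makes one pass, a line rejected early is not reconsidered after the box grows; whether the claimed method iterates to a fixed point is not stated in the claim text, so this sketch leaves that open.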
Formatting, i.e. changing of presentation of documents (automatic justification G06F40/189; automatic line break hyphenation G06F40/191) · CPC title
Fragmentation of text files, e.g. creating reusable text-blocks; Linking to fragments, e.g. using XInclude; Namespaces · CPC title
Text processing (natural language analysis G06F40/20; semantic analysis G06F40/30; processing or translation of natural language G06F40/40) · CPC title
including print-ready data, i.e. data already matched to the printing process · CPC title
Display of layout of documents; Previewing · CPC title