Electronic document content redaction
US-2015378973-A1 · Dec 31, 2015 · US
US10733434B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10733434-B2 |
| Application number | US-201816139884-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 24, 2018 |
| Priority date | Sep 24, 2018 |
| Publication date | Aug 4, 2020 |
| Grant date | Aug 4, 2020 |
Official abstract text for this publication.
A computer-implemented method, system and a computer program product are provided for automatically detecting redaction blocks in an image file document by analyzing the document to identify any redaction block areas and then detecting location information for each redaction block area identified in the document which may be mapped to any associated text fragments in the document based on the location information for each redaction block area and text fragment in the document.
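The mapping step in the abstract can be illustrated with a minimal sketch. The abstract only says that redaction blocks are mapped to associated text fragments based on location information; the association rule used here (a fragment shares the block's line and sits within a small horizontal gap) is an assumption chosen for illustration, and the names `Box`, `map_blocks_to_fragments`, and `max_gap` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixel coordinates (hypothetical layout)."""
    x: int  # left edge
    y: int  # top edge
    w: int  # width
    h: int  # height

    @property
    def right(self) -> int:
        return self.x + self.w

    @property
    def bottom(self) -> int:
        return self.y + self.h


def vertical_overlap(a: Box, b: Box) -> int:
    """Number of pixel rows shared by the two boxes (0 if none)."""
    return max(0, min(a.bottom, b.bottom) - max(a.y, b.y))


def horizontal_gap(a: Box, b: Box) -> int:
    """Horizontal distance between the boxes (0 if they overlap)."""
    return max(0, max(a.x, b.x) - min(a.right, b.right))


def map_blocks_to_fragments(blocks, fragments, max_gap=40):
    """Map each block index to the text of nearby fragments.

    Assumed rule (not taken from the patent): a fragment is associated
    with a block when both share at least one pixel row and are at most
    max_gap pixels apart horizontally.
    """
    mapping = {}
    for i, block in enumerate(blocks):
        mapping[i] = [
            text
            for text, box in fragments
            if vertical_overlap(block, box) > 0
            and horizontal_gap(block, box) <= max_gap
        ]
    return mapping


# Usage: one redaction block between two fragments on the same line.
blocks = [Box(100, 10, 80, 20)]
fragments = [
    ("Name:", Box(40, 10, 50, 20)),
    ("Address:", Box(40, 60, 70, 20)),
    ("born", Box(190, 12, 40, 18)),
]
print(map_blocks_to_fragments(blocks, fragments))
# {0: ['Name:', 'born']}
```

In practice the fragment boxes would come from an OCR engine and the block boxes from the redaction detector; both are supplied by hand here to keep the sketch self-contained.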
Claim text.
What is claimed is:
1. A computer-implemented method for automatically detecting redaction blocks in a document comprising: receiving, by an information handling system comprising a processor and a memory, the document as an image file; analyzing, by the information handling system, the document to identify any redaction block areas in the document; detecting, by the information handling system, location information for each redaction block area identified in the document; applying, by the information handling system, optical character recognition to the document to detect text fragments in the document; detecting, by the information handling system, location information for each text fragment identified in the document; and mapping, by the information handling system, each redaction block area to any associated text fragments in the document based on the location information for each redaction block area and text fragment in the document, wherein the redaction block areas are redacted block areas.
2. The method of claim 1, further comprising classifying, by the information handling system, each identified redaction block area as a type Ti selected from a group consisting of a text block, a table cell, a checkbox, and unknown.
3. The method of claim 2, where the type Ti is the checkbox which classifies a redaction block area located over a group of labels in the document.
4. The method of claim 1, where each redaction block area is a blacked-out area in the document.
5. The method of claim 1, where detecting location information comprises computing, by the information handling system, a geometric shape and x, y coordinates for each redaction block area.
6. The method of claim 1, further comprising inserting, by the information handling system, a sentinel string of predetermined characters into the document for each detected redaction block area.
7. The method of claim 1, where analyzing the document comprises applying a redaction block detection process to scan each line of the image file to identify any redacted text blocks by locating a threshold number T1 of consecutive black pixels that are aligned in a threshold number T1 of consecutive rows.
8. An information handling system comprising: one or more processors; a memory coupled to at least one of the processors; a set of instructions stored in the memory and executed by at least one of the processors to automatically detect redaction blocks in a document, wherein the set of instructions are executable to perform actions of: receiving, by the system, the document as an image file; analyzing, by the system, the document to identify any redaction block areas in the document; detecting, by the system, location information for each redaction block area identified in the document; applying, by the system, optical character recognition to the document to detect text fragments in the document; detecting, by the system, location information for each text fragment identified in the document; and mapping, by the system, each redaction block area to any associated text fragments in the document based on the location information for each redaction block area and text fragment in the document, wherein the redaction block areas are redacted block areas.
9. The information handling system of claim 8, where the set of instructions are executable to classify, by the system, each identified redaction block area as a type Ti selected from a group consisting of a text block, a table cell, a checkbox, and unknown.
10. The information handling system of claim 9, where the type Ti is the checkbox which classifies a redaction block area located over a group of labels in the document.
11. The information handling system of claim 8, where each redaction block area is a blacked-out area in the document.
12. The information handling system of claim 8, where the set of instructions are executable to detect location information by computing a geometric shape and x, y coordinates for each redaction block area.
13. The information handling system of claim 8, where the set of instructions are executable to insert, by the system, a sentinel string of predetermined characters into the document for each detected redaction block area.
14. The information handling system of claim 8, where the set of instructions are executable to analyze the document by applying a redaction block detection process to scan each line of the image file to identify any redacted text blocks by locating a threshold number T1 of consecutive black pixels that are aligned in a threshold number T1 of consecutive rows.
15. A computer program product stored in a computer readable storage medium, comprising computer instructions that, when executed by an information handling system, causes the system to automatically detecting redaction blocks in a document by performing actions comprising: receiving, by the system, the document as an image file; analyzing, by the information handling system, the document to identify any redaction blocks in the document, wherein each redaction block is a redacted block; detecting, by the information handling system, location information for each redaction block identified in the document; applying, by the information handling system, optical character recognition to the document to detect text fragments in the document; detecting, by the information handling system, location information for each text fragment identified in the document; mapping, by the information handling system, each redaction block to any associated text fragments in the document based on the location information for each redaction block and text fragment in the document; classifying, by the system, each identified redaction block as a redaction block type Ti selected from a group consisting of a text block, a table cell, a checkbox, and unknown; and generating, by the system, an output file which identifies, for the document, each text fragment and associated text fragment location information, along with each redaction block and associated redaction block fragment location information and redaction block type Ti.
16. The computer program product of claim 15, further comprising computer instructions that, when executed by the information handling system, causes the system to insert a sentinel string of predetermined characters into the document for each detected redaction block.
17. The computer program product of claim 15, where analyzing the document comprises applying a redaction block detection process to scan each line of the image file to identify any redacted text blocks by locating a threshold number T1 of consecutive black pixels that are aligned in a threshold number T1 of consecutive rows.
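The detection process recited in claims 7, 14, and 17 (runs of at least T1 consecutive black pixels aligned across at least T1 consecutive rows) could be sketched as follows. This is an illustrative reading of the claim language, not the patented implementation: the image is assumed to be already binarized into rows of 0/1 values (1 = black), "aligned" is interpreted as an identical column span in every row, and the function names are hypothetical. A real scan would likely need some tolerance for noise and skew.

```python
def black_runs(row, t1):
    """Return (start, end) column spans of at least t1 consecutive black pixels."""
    runs, start = [], None
    # A trailing 0 acts as a terminator so a run touching the right edge is closed.
    for x, pixel in enumerate(list(row) + [0]):
        if pixel and start is None:
            start = x
        elif not pixel and start is not None:
            if x - start >= t1:
                runs.append((start, x))
            start = None
    return runs


def detect_redaction_blocks(image, t1):
    """Find rectangles (x, y, width, height) of solid black pixels.

    A rectangle is reported when the same column span of at least t1
    black pixels repeats in at least t1 consecutive rows.
    """
    blocks = []
    open_runs = {}  # (start, end) column span -> first row where it appeared
    # A trailing empty row flushes spans that reach the bottom edge.
    for y, row in enumerate(list(image) + [[]]):
        current = set(black_runs(row, t1))
        for span in list(open_runs):
            if span not in current:
                y0 = open_runs.pop(span)
                if y - y0 >= t1:
                    blocks.append((span[0], y0, span[1] - span[0], y - y0))
        for span in current:
            open_runs.setdefault(span, y)
    return blocks


# Usage: a 4x3 black rectangle plus a one-pixel-high rule that is too thin.
image = [[0] * 8 for _ in range(6)]
for y in range(1, 4):
    for x in range(2, 6):
        image[y][x] = 1
image[5] = [1] * 8  # horizontal line, only one row high

print(detect_redaction_blocks(image, 3))
# [(2, 1, 4, 3)]
```

The returned tuples give a rectangle plus x, y coordinates for each block, which is one possible form of the location information described in claims 5 and 12.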
Extracting the logical structure, e.g. chapters, sections or page numbers; Identifying elements of the document, e.g. authors · CPC title
Classification of content, e.g. text, photographs or tables · CPC title
Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables · CPC title
Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text · CPC title
Physics · mapped topic