Records Access and Management
US-2024419838-A1 · Dec 19, 2024 · US
US9489376B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9489376-B2 |
| Application number | US-201313732501-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 2, 2013 |
| Priority date | Jan 2, 2013 |
| Publication date | Nov 8, 2016 |
| Grant date | Nov 8, 2016 |
Official abstract text for this publication.
A method, apparatus and computer program product to identify confidential information in a document. To examine a document for inclusion of confidential information, the document is compared against documents having similar structure and content from one or more other sources. When comparing documents (of similar structure and content) from different sources, confidential information is then made to stand out by searching for terms (from the sources) that are not shared between or among them. In contrast, common words or terms that are shared across the sources are ignored as likely being non-confidential information; what remains as not shared may then be classified as confidential information and protected accordingly (e.g., by omission, redaction, substitution or the like). Using this technique, non-confidential information may be safely segmented from confidential information in a dynamic, automated manner.
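The comparison described in the abstract can be illustrated with a minimal sketch. It assumes whitespace-and-punctuation tokenization and plain set difference; the function names `tokenize` and `unshared_terms` are hypothetical and are not taken from the patent, which does not prescribe a tokenizer or data structure.

```python
import re


def tokenize(text: str) -> set[str]:
    """Return the lowercase word tokens of a document as a set.

    Assumption: simple regex tokenization; the patent does not specify one.
    """
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def unshared_terms(target: str, others: list[str]) -> set[str]:
    """Terms in `target` that appear in none of the comparison documents.

    Shared terms are ignored as likely non-confidential; what remains
    is a candidate set of confidential terms.
    """
    shared = set().union(*(tokenize(doc) for doc in others))
    return tokenize(target) - shared


if __name__ == "__main__":
    target = "Error: disk quota exceeded for user alice on host db01"
    others = [
        "Error: disk quota exceeded for user bob on host web07",
        "Error: disk quota exceeded for user carol on host app03",
    ]
    print(sorted(unshared_terms(target, others)))  # ['alice', 'db01']
```

In this illustration the boilerplate of the log message is common to all three sources, so only the user name and host name from the examined document remain as candidates.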
Opening claim text (preview).
Having described our invention, what we now claim is as follows:
1. A method of identifying potential confidential information in a data item, the data item associated with a source, comprising: obtaining, from each of a set of alternative sources, a data item of a same type and format as the data item; comparing, using a hardware element, the data item to the data item(s) obtained from the set of alternative sources to identify occurrences of particular pieces of information in the data item, wherein multiple occurrences of a particular piece of information within a data item from each alternative source are treated as a single occurrence; and based on the occurrences of particular pieces of information in the data item and a given sensitivity criteria, and without knowledge that the particular pieces of information are considered by the source to be confidential, segmenting one or more pieces of information in the data item as representing the potential confidential information.
2. The method as described in claim 1 further including highlighting the one or more pieces of information.
3. The method as described in claim 2 further including taking a given action with respect to the one or more pieces of information that have been highlighted.
4. The method as described in claim 3 wherein the given action is one of: removing the piece of information, redacting the piece of information, and substituting non-confidential data for the piece of information.
5. The method as described in claim 3 further including outputting the data item without the one or more pieces of information.
6. The method as described in claim 1 wherein the data item is one of: a document, a report, a file, a log, a message, an email, and a communication.
7. The method as described in claim 1 wherein the given sensitivity criteria is a configurable threshold.
8. Apparatus, comprising: a processor; computer memory holding computer program instructions that when executed by the processor perform a method of identifying potential confidential information in a data item, the data item associated with a source, the method comprising: obtaining, from each of a set of alternative sources, a data item of a same type and format as the data item; comparing the data item to the data item(s) obtained from the set of alternative sources to identify occurrences of particular pieces of information in the data item, wherein multiple occurrences of a particular piece of information within a data item from each alternative source are treated as a single occurrence; and based on the occurrences of particular pieces of information in the data item and a given sensitivity criteria, and without knowledge that the particular pieces of information are considered by the source to be confidential, segmenting one or more pieces of information in the data item as representing the potential confidential information.
9. The apparatus as described in claim 8 wherein the method further includes highlighting the one or more pieces of information.
10. The apparatus as described in claim 9 wherein the method further includes taking a given action with respect to the one or more pieces of information that have been highlighted.
11. The apparatus as described in claim 10 wherein the given action is one of: removing the piece of information, redacting the piece of information, and substituting non-confidential data for the piece of information.
12. The apparatus as described in claim 10 wherein the method further includes outputting the data item without the one or more pieces of information.
13. The apparatus as described in claim 8 wherein the data item is one of: a document, a report, a file, a log, a message, an email, and a communication.
14. The apparatus as described in claim 8 wherein the given sensitivity criteria is a configurable threshold.
15. A computer program product in a non-transitory computer-readable storage medium in a data processing system, the computer program product holding computer program instructions which, when executed by the data processing system, perform a method of identifying potential confidential information in a data item, the data item associated with a source, the method comprising: obtaining, from each of a set of alternative sources, a data item of a same type and format as the data item; comparing the data item to the data item(s) obtained from the set of alternative sources to identify occurrences of particular pieces of information in the data item, wherein multiple occurrences of a particular piece of information within a data item from each alternative source are treated as a single occurrence; and based on the occurrences of particular pieces of information in the data item and a given sensitivity criteria, and without knowledge that the particular pieces of information are considered by the source to be confidential, segmenting one or more pieces of information in the data item as representing the potential confidential information.
16. The computer program product as described in claim 15 wherein the method further includes highlighting the one or more pieces of information.
17. The computer program product as described in claim 16 wherein the method further includes taking a given action with respect to the one or more pieces of information that have been highlighted.
18. The computer program product as described in claim 17 wherein the given action is one of: removing the piece of information, redacting the piece of information, and substituting non-confidential data for the piece of information.
19. The computer program product as described in claim 17 wherein the method further includes outputting the data item without the one or more pieces of information.
20. The computer program product as described in claim 15 wherein the data item is one of: a document, a report, a file, a log, a message, an email, and a communication.
21. The computer program product as described in claim 15 wherein the given sensitivity criteria is a configurable threshold.
22. Apparatus, comprising: a display interface; a processor; computer memory holding computer program instructions executed by the processor to identify potential confidential information in a data item, the data item associated with a source, by (i) comparing the data item to data items of a similar type and format received from a set of alternative sources to identify occurrences of particular pieces of information in the data item, and (ii) based on the occurrences of particular pieces of information in the data item, and without knowledge that the particular pieces of information are considered by the source to be confidential, identifying one or more pieces of information in the data item as representing the potential confidential information, wherein multiple occurrences of a particular piece of information within a data item from at least one particular alternative source are treated as a single occurrence, and (iii) outputting, on the display interface, a representation of the data item with the one or more pieces of information representing potential confidential information highlighted.
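The claimed steps (single occurrence per alternative source, a configurable sensitivity threshold, and a follow-up action such as redaction) might be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the claims do not define the form of the threshold, so it is assumed here to be the fraction of alternative sources that must contain a term for it to count as shared. The names `source_frequency`, `segment`, and `redact` and the `[REDACTED]` mask are hypothetical.

```python
import re
from collections import Counter

# Assumption: a "piece of information" is a single word token.
TOKEN = re.compile(r"[A-Za-z0-9']+")


def source_frequency(alternatives: list[str]) -> Counter:
    """Count, for each term, how many alternative data items contain it.

    Each item is reduced to a set first, so repeated occurrences inside
    one item are treated as a single occurrence.
    """
    counts: Counter = Counter()
    for item in alternatives:
        counts.update({t.lower() for t in TOKEN.findall(item)})
    return counts


def segment(data_item: str, alternatives: list[str], threshold: float = 0.5) -> set[str]:
    """Flag terms found in fewer than `threshold` of the alternative sources.

    Assumption: `threshold` is a fraction in (0, 1]; the claims only say
    the sensitivity criteria may be a configurable threshold.
    """
    if not alternatives:
        raise ValueError("at least one alternative data item is required")
    counts = source_frequency(alternatives)
    terms = {t.lower() for t in TOKEN.findall(data_item)}
    return {t for t in terms if counts[t] / len(alternatives) < threshold}


def redact(data_item: str, flagged: set[str], mask: str = "[REDACTED]") -> str:
    """Replace every flagged term in the data item with a mask string."""
    return TOKEN.sub(
        lambda m: mask if m.group().lower() in flagged else m.group(),
        data_item,
    )


if __name__ == "__main__":
    item = "Login failed for user alice on host db01"
    alts = [
        "Login failed for user bob on host web07 web07",
        "Login failed for user carol on host db01",
        "Login failed for user dave on host app03",
    ]
    flagged = segment(item, alts, threshold=0.5)
    print(sorted(flagged))         # ['alice', 'db01']
    print(redact(item, flagged))   # Login failed for user [REDACTED] on host [REDACTED]
```

Lowering the threshold makes the sketch less sensitive: with `threshold=0.2`, `db01` appears in one of three alternatives (about 0.33) and is no longer flagged, leaving only `alice`. Substitution or removal, the other actions named in the dependent claims, would follow the same pattern with a different replacement.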
Protecting personal data, e.g. for financial or medical purposes · CPC title
Phrasal analysis, e.g. finite state techniques or chunking · CPC title
Physics · mapped topic