Entropy-based detection of sensitive information in code

US9336381B1 · US · B1

Patent metadata
Publication number: US-9336381-B1
Application number: US-201313858448-A
Country: US
Kind code: B1
Filing date: Apr 8, 2013
Priority date: Apr 8, 2013
Publication date: May 10, 2016
Grant date: May 10, 2016

Abstract

Official abstract text for this publication.

Techniques are described for identifying security credentials or other sensitive information based on an entropy-based analysis of information included in documents such as source code files, object code files, or other types of files. A baseline information entropy may be determined for one or more documents, indicating a baseline level of randomness for information in the document(s). One or more of the documents may be analyzed to identify the presence of high entropy portions that have an information entropy above a threshold value. The threshold value may be based on the baseline information entropy, or based on other criteria such as a programming language of the document(s). Because security credentials may have a higher level of information entropy than the surrounding code, any high entropy portions of the document(s) may be identified as potential security risks.
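As a rough illustration of the approach in the abstract, the sketch below assumes Shannon entropy over characters (the abstract does not fix a formula), splits a document into candidate tokens with a hypothetical regular expression, takes the mean token entropy as the baseline, and sets the threshold to that baseline plus a margin. The names `shannon_entropy`, `find_high_entropy_tokens`, `TOKEN_RE`, and the 1.0-bit margin are assumptions made for this example and are not taken from the patent.

```python
import math
import re
from collections import Counter

# Hypothetical tokenization: runs of 8+ characters typical of identifiers and keys.
TOKEN_RE = re.compile(r"[A-Za-z0-9+/=_-]{8,}")


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def find_high_entropy_tokens(document: str, margin: float = 1.0):
    """Return (baseline, threshold, flagged) for one document.

    Assumption: the baseline is the mean entropy of all candidate tokens and
    the threshold is baseline + margin. Other choices are equally plausible.
    """
    tokens = [(m.start(), m.group()) for m in TOKEN_RE.finditer(document)]
    if not tokens:
        return 0.0, 0.0, []
    entropies = [shannon_entropy(tok) for _, tok in tokens]
    baseline = sum(entropies) / len(entropies)
    threshold = baseline + margin
    # A portion is flagged when its entropy is at least the threshold.
    flagged = [
        (pos, tok, h)
        for (pos, tok), h in zip(tokens, entropies)
        if h >= threshold
    ]
    return baseline, threshold, flagged


# The key below is a documentation-style placeholder, not a real credential.
SAMPLE = '''
import database_client

USERNAME = "service_account"
PASSWORD = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"

def connect_to_database():
    return database_client.connect(USERNAME, PASSWORD)
'''

if __name__ == "__main__":
    baseline, threshold, flagged = find_high_entropy_tokens(SAMPLE)
    print(f"baseline={baseline:.2f} bits/char, threshold={threshold:.2f} bits/char")
    for pos, tok, h in flagged:
        print(f"offset {pos}: {tok!r} has entropy {h:.2f}")
```

On this sample only the placeholder key is flagged, because ordinary identifiers repeat characters and are short, which keeps their per-character entropy well below the threshold. A fixed margin is the simplest option; a real scanner would likely need tuning to limit false positives.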

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for increasing security by identifying sensitive information in documents, the method comprising: analyzing an information entropy of a first document; determining, from the information entropy of the first document, a baseline information entropy for the first document that indicates a randomness of information included in the first document; from the baseline information entropy, determining a threshold information entropy; determining an information entropy of at least one portion of the first document; determining whether the information entropy of the at least one portion of the first document is at least the threshold information entropy; based on determining that the information entropy of the at least one portion of the first document is the at least the threshold information entropy, identifying the at least one portion of the first document as including at least one security risk; and generating a second document based on the first document, wherein the at least one portion of the first document identified as including the at least one security risk is one or more of disabled, removed, or replaced by the generating.

2. The method of claim 1, wherein the at least one security risk comprises one or more of a password, a cryptographic key, a certificate, an initialization vector for a cipher, an Ethernet address, a media access control (MAC) address, a universally unique identifier (UUID), or a uniform resource locator (URL).

3. The method of claim 1, wherein the threshold information entropy is based at least in part on a programming language or a natural language of the information included in the first document.

4. The method of claim 1, further comprising providing a notification describing the at least one portion of the first document as including the at least one security risk, wherein the notification is provided in real time as the at least one portion of the first document is identified as including the at least one security risk.

5. A system, comprising: a memory storing computer-executable instructions; and a processor in communication with the memory, the processor configured to access the memory and execute the computer-executable instructions to: analyze a first document to determine an information entropy of the first document; determine, from the information entropy of the analyzed first document, a baseline information entropy for the first document that indicates a randomness of information included in the first document; from the baseline information entropy, determine a threshold information entropy; determine an information entropy of at least one portion of the first document; determine whether the information entropy of the at least one portion of the first document is at least the threshold information entropy; based on determining that the information entropy of the at least one portion of the first document is the at least the threshold information entropy, identify the at least one portion of the first document as including at least one security risk, and generate a second document based on the first document, wherein the at least one portion of the first document identified as including the at least one security risk is one or more of disabled, removed, or replaced by the generating.

6. The system of claim 5, wherein the information included in the at least one portion of the first document comprises one or more of: source code describing a computer program; intermediate language code describing the computer program; machine-executable object code describing the computer program; formatted text; or unformatted text.

7. The system of claim 5, wherein the at least one security risk includes one or more of a password, a cryptographic key, a certificate, an initialization vector for a cipher, an Ethernet address, a media access control (MAC) address, a universally unique identifier (UUID), or a uniform resource locator (URL).

8. The system of claim 5, the processor further configured to provide a notification describing the at least one portion of the first document as including the at least one security risk.

9. The system of claim 8, wherein the notification is provided in real time as the at least one portion of the first document is identified as including the at least one security risk.

10. The system of claim 8, wherein the notification is provided during processing of the first document that is performed subsequently to a generation of the at least one portion of the first document.

11. The system of claim 5, the processor further configured to determine the threshold information entropy based at least in part on a frequency of one or more strings in the first document.

12. The system of claim 5, the processor further configured to determine the threshold information entropy based at least in part on one or more of: a programming language of the information included in the at least one portion of the first document; a natural language of the information included in the at least one portion of the first document; a location of one or more authors of the at least one portion of the first document; or a group affiliation of the one or more authors of the at least one portion of the first document.

13. One or more non-transitory computer-readable media storing instructions which, when executed, instruct at least one processor to perform actions comprising: analyzing a first document to determine an information entropy of the first document; determining, from the information entropy of the analyzed first document, a baseline information entropy for the first document that indicates a randomness of information included in the first document; from the baseline information entropy, determining a threshold information entropy; determining an information entropy of at least one portion of the first document; determining whether the information entropy of the at least one portion of the first document is at least the threshold information entropy; based on determining that the information entropy of the at least one portion of the first document is the at least the threshold information entropy, identifying the at least one portion of the first document as including at least one security risk; and generating a second document based on the first document, wherein the at least one portion of the first document identified as including the at least one security risk is one or more of disabled, removed, or replaced by the generating.

14. The one or more non-transitory computer-readable media of claim 13, the actions further comprising: providing at least one notification describing the at least one portion of the first document as the at least one security risk.

15. The one or more non-transitory computer-readable media of claim 13, the actions further comprising: based on determining that the information entropy of the at least one portion of the first document is the at least the threshold information entropy, removing, disabling, or replacing the at least one portion of the first document.

16. The one or more non-transitory computer-readable media of claim 13, the actions further comprising: based at least partly on determining that the information entropy of the at least one portion of the first document is the at least the threshold information entropy, identifying at least one unauthorized use associated with an insertion of the at least one portion of the document into the first document.

17. The one or more non-transitory computer-readable media of claim 13 , where
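The final step of the independent claims, generating a second document in which flagged portions are disabled, removed, or replaced, might look like the sketch below. It also shows one way a threshold could depend on programming language, as the dependent claims allow. Everything named here is hypothetical: `generate_second_document`, `LANGUAGE_MARGINS`, the margin values, the `<REDACTED>` placeholder, and the choice of mean token entropy as the baseline are assumptions for illustration, not details from the patent.

```python
import math
import re
from collections import Counter

# Hypothetical tokenization of candidate portions (runs of 8+ key-like characters).
TOKEN_RE = re.compile(r"[A-Za-z0-9+/=_-]{8,}")

# Hypothetical per-language margins, in bits per character above the baseline.
LANGUAGE_MARGINS = {"python": 1.0, "java": 1.2}
DEFAULT_MARGIN = 1.0


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum(
        (n / total) * math.log2(n / total) for n in Counter(text).values()
    )


def generate_second_document(first_document, language=None, placeholder="<REDACTED>"):
    """Return (second_document, flagged_tokens).

    Assumption: baseline = mean token entropy of the first document, and
    threshold = baseline + a margin looked up by language.
    """
    entropies = [shannon_entropy(m.group()) for m in TOKEN_RE.finditer(first_document)]
    if not entropies:
        return first_document, []
    baseline = sum(entropies) / len(entropies)
    threshold = baseline + LANGUAGE_MARGINS.get(language, DEFAULT_MARGIN)

    flagged = []

    def replace_if_risky(match):
        token = match.group()
        if shannon_entropy(token) >= threshold:
            flagged.append(token)
            return placeholder  # "replaced"; returning "" would model "removed"
        return token

    second_document = TOKEN_RE.sub(replace_if_risky, first_document)
    return second_document, flagged


if __name__ == "__main__":
    # The key below is a documentation-style placeholder, not a real credential.
    first = (
        'USERNAME = "service_account"\n'
        'PASSWORD = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"\n'
        "connect_to_database(USERNAME, PASSWORD)\n"
    )
    second, flagged = generate_second_document(first, language="python")
    print(second)
    print("flagged:", flagged)
```

The first document is left untouched and a new string is returned, which mirrors the claim language about a second document based on the first. The `flagged` list is where a notification step, such as the one in claims 4 and 8, could hook in.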

Assignees

Amazon Tech Inc

Inventors

Classifications

  • G06F21/577 (Primary)

    Assessing vulnerabilities and evaluating computer system security · CPC title

  • G06F21/50 (Primary)

    Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems · CPC title

  • Test or assess software · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G06F21/577. Mapped technology areas include Physics.
When was this patent published?
Publication date: May 10, 2016 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).