Tracking sensitive data in a distributed computing environment

US10515212B1 · US · B1

Patent metadata
Publication number: US-10515212-B1
Application number: US-201615189824-A
Country: US
Kind code: B1
Filing date: Jun 22, 2016
Priority date: Jun 22, 2016
Publication date: Dec 24, 2019
Grant date: Dec 24, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Computing resource service providers may operate a plurality of computing resources in a distributed computing environment. In addition, the computing resource service providers may provide customers with access to applications and/or services. The applications and/or services may include sensitive data. Sensitive data in the distributed computing environment may be tracked by analyzing source code associated with the applications and/or services. Analysis of the source code may include detecting operations associated with databases and generating schemas associated with the databases based at least in part on attributes included in the source code. Sensitive data may be detected based at least in part on the schemas generated by analyzing the source code.
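The analysis the abstract describes can be illustrated with a minimal sketch. It assumes the source code is Python, uses the standard-library `ast` module as a stand-in for whatever analysis a real system would perform (the claims recite lexical analysis; this sketch parses a syntax tree instead), and treats the method names in `WRITE_METHODS` as hypothetical markers of a write to a schemaless database. None of these names come from the patent.

```python
import ast

# Hypothetical method names treated as writes to a schemaless store.
WRITE_METHODS = {"put_item", "insert_one"}


def infer_schema(source: str) -> dict[str, set[str]]:
    """Collect attribute names from dict literals passed to write calls.

    Returns a mapping from the call's receiver (used here as a stand-in
    for the database or table name) to the attribute names seen.
    """
    schema: dict[str, set[str]] = {}
    for node in ast.walk(ast.parse(source)):
        # Only method calls such as `users.put_item(...)` are of interest.
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)):
            continue
        if node.func.attr not in WRITE_METHODS:
            continue
        target = ast.unparse(node.func.value)
        arguments = list(node.args) + [kw.value for kw in node.keywords]
        for argument in arguments:
            if not isinstance(argument, ast.Dict):
                continue
            for key in argument.keys:
                # Keep only literal string keys; skip `**other` unpacking.
                if isinstance(key, ast.Constant) and isinstance(key.value, str):
                    schema.setdefault(target, set()).add(key.value)
    return schema


# Illustrative application code to analyze (never executed, only parsed).
SAMPLE = '''
users.put_item(Item={"user_id": uid, "email": addr, "ssn": ssn})
orders.insert_one({"order_id": oid, "total": amount})
'''

schema = infer_schema(SAMPLE)
for table, attributes in schema.items():
    print(table, sorted(attributes))
```

The inferred attribute sets play the role of the generated schema: because a schemaless database does not declare its fields, the only place the field names appear is in the code that writes them.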

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method, comprising: obtaining source code from a data store, the source code associated with a software development project; detecting code whose execution results in a create, read, update, or delete operation to be performed in a schemaless database by at least performing lexical analysis of the source code for indication of an operation associated with the schemaless database; as a result of detecting code, obtaining a data structure included in the operation; generating a schema associated with the schemaless database by at least obtaining a set of attributes from the data structure; detecting sensitive data maintained in the schemaless database based at least in part on the schema; and providing an indication of a location of the sensitive data.

2. The computer-implemented method of claim 1, wherein the computer-implemented method further comprises updating a graph based at least in part on detecting sensitive data, the graph indicating locations from which sensitive data is accessible in a distributed computing environment.

3. The computer-implemented method of claim 1, wherein detecting sensitive data further comprises detecting sensitive data based at least in part on a result of pattern matching the set of attributes included in the schema.

4. The computer-implemented method of claim 1, wherein detecting code further comprises performing semantic analysis of the source code for indications of the schemaless database.

5. A system, comprising: at least one computing device implementing one or more services, wherein the one or more services: obtain a set of code objects from a data store, the set of code objects associated with one or more applications executed in a distributed computing environment; generate a set of schemas associated with one or more schemaless databases utilized by the one or more applications by at least: detecting a set of operations associated with the one or more schemaless databases based at least in part on performing lexical analysis on the set of code objects, wherein the set of operations comprises at least one of a create, read, update, or delete operation on the one or more schemaless databases; and obtaining a set of attributes from data structures associated with the set of operations; detect sensitive data maintained in the one or more schemaless databases based at least in part on the set of schemas; and provide a notification indicative of a location of the sensitive data.

6. The system of claim 5, wherein the one or more services further comprise transmitting a notification to a client device operated by a developer associated with a code object of the set of code objects for which sensitive data was detected, the notification indicating a risk to sensitive data.

7. The system of claim 5, wherein generating the set of schemas further comprises generating a set of lexical tokens based at least in part on the set of code objects, each lexical token of the set of lexical tokens including a set of lexemes.

8. The system of claim 5, wherein detecting sensitive data further comprises detecting sensitive data based at least in part on matching regular expressions associated with sensitive data with the set of attributes included in the set of schemas.

9. The system of claim 5, wherein detecting the set of operations further comprises detecting the set of operations based at least in part on a result of performing semantic analysis on the set of code objects.

10. The system of claim 5, wherein the one or more services further comprise generating a graphical representation based at least in part on detecting sensitive data, the graphical representation indicating a location of sensitive data in a computing resource service provider environment.

11. The system of claim 5, wherein sensitive data further comprises customer private information.

12. The system of claim 5, wherein the set of attributes further comprises a set of table-keys for storing data in the one or more schemaless databases.

13. A non-transitory computer-readable storage medium comprising executable instructions that, as a result of being executed by one or more processors of a computer system, cause the computer system to at least: obtain a set of code objects from a data store, the set of code objects associated with a software development project; detect an operation on an instantiation of a schemaless database based at least in part on performing lexical analysis of the set of code objects, wherein the operation corresponds to at least one of a create, read, update, or delete operation on the schemaless database; generate a schema associated with the schemaless database by at least obtaining attributes of data stored in the schemaless database associated with the operation; detect sensitive data maintained in the schemaless database based at least in part on the schema; and provide a location of the sensitive data.

14. The non-transitory computer-readable storage medium of claim 13, wherein the instructions further comprise instructions that, as a result of being executed by the one or more processors, cause the computer system to generate the schema for the schemaless database based at least in part on attribute data.

15. The non-transitory computer-readable storage medium of claim 14, wherein the instructions that cause the computer system to detect sensitive data further include instructions that cause the computer system to detect sensitive data by at least comparing the schema with at least one other database schema associated with sensitive data.

16. The non-transitory computer-readable storage medium of claim 13, wherein the instructions further comprise instructions that, as a result of being executed by the one or more processors, cause the computer system to: obtain additional attribute data; and detect additional sensitive data based at least in part on attribute data and additional attribute data.

17. The non-transitory computer-readable storage medium of claim 13, wherein the instructions that cause the computer system to detect the operation on the instantiation of the schemaless database further include instructions that cause the computer system to detect the operation based at least in part on semantic analysis of the set of code objects.

18. The non-transitory computer-readable storage medium of claim 13, wherein the instructions that cause the computer system to detect the operation on the instantiation of the schemaless database further include instructions that cause the computer system to detect the operation based at least in part on machine learning analysis of the set of code objects.

19. The non-transitory computer-readable storage medium of claim 13, wherein the instructions that cause the computer system to detect the operation on the instantiation of the schemaless database further include instructions that cause the computer system to detect the operation based at least in part on a result of performing pattern matching on the set of code objects.

20. The non-transitory computer-readable storage medium of claim 19, wherein the instructions that cause the computer system to perform pattern matching on the set of code objects further include instructions that cause the computer system to perform pattern matching using one or more regular expressions.
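The detection and graph steps recited in claims 2, 3, and 8 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the regular expressions in `SENSITIVE_PATTERNS`, the example schemas, and the `readers` mapping of databases to applications are all hypothetical, and the graph is a plain adjacency dictionary rather than any particular graph store.

```python
import re

# Hypothetical patterns for attribute names that suggest sensitive data.
SENSITIVE_PATTERNS = {
    "email": re.compile(r"e[-_]?mail", re.IGNORECASE),
    "government_id": re.compile(r"(^|_)ssn($|_)|social[-_]?security", re.IGNORECASE),
    "payment_card": re.compile(r"card[-_]?(number|num|no)$", re.IGNORECASE),
}


def find_sensitive(schemas):
    """Return (database, attribute, label) for each attribute matching a pattern."""
    findings = []
    for database, attributes in schemas.items():
        for attribute in sorted(attributes):
            for label, pattern in SENSITIVE_PATTERNS.items():
                if pattern.search(attribute):
                    findings.append((database, attribute, label))
    return findings


def build_access_graph(findings, readers):
    """Map each application to the sensitive locations it can reach.

    `readers` maps a database name to the applications known to access it.
    """
    graph = {}
    for database, attribute, _label in findings:
        for application in readers.get(database, []):
            graph.setdefault(application, set()).add(f"{database}.{attribute}")
    return graph


# Hypothetical inputs: schemas as inferred from source code, plus which
# applications touch which database.
schemas = {
    "users": {"user_id", "email", "ssn"},
    "orders": {"order_id", "total"},
}
readers = {
    "users": ["signup-service", "billing-service"],
    "orders": ["billing-service"],
}

findings = find_sensitive(schemas)
graph = build_access_graph(findings, readers)
for application, locations in graph.items():
    print(application, sorted(locations))
```

Matching on attribute names alone is a deliberately coarse heuristic: it can flag a location without reading any stored records, at the cost of missing sensitive values kept under uninformative names.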

Classifications

  • Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities · CPC title

  • Electronic shopping [e-shopping] · CPC title

  • Graphs; Linked lists (G06F16/9027 takes precedence) · CPC title

  • Schema design and management · CPC title

  • G06F21/562 (Primary): Static detection · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10515212B1 cover?
Computing resource service providers may operate a plurality of computing resources in a distributed computing environment. In addition, the computing resource service providers may provide customers with access to applications and/or services. The applications and/or services may include sensitive data. Sensitive data in the distributed computing environment may be tracked by analyzing source c…
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G06Q30/0601. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 24, 2019 (B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).