Storing tokenized information in untrusted environments

US9081978B1 · US · B1

Patent metadata
Publication number: US-9081978-B1
Application number: US-201313905815-A
Country: US
Kind code: B1
Filing date: May 30, 2013
Priority date: May 30, 2013
Publication date: Jul 14, 2015
Grant date: Jul 14, 2015

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques are described for tokenizing information to be stored in an untrusted environment. During tokenization, one or more strings in a file or data stream are replaced with a token. The token may be generated as a random number or a counter, such that the replaced string may not be derived based on the token. Token-to-string mapping data may be stored in a trusted environment, and the tokenized information may be stored in the untrusted environment. Users may search the tokenized information based on non-sensitive search terms present in a whitelist that is accessible from the untrusted environment, the whitelist providing a token-to-string mapping for the non-sensitive terms. The search results may be provided as redacted information, in which the non-sensitive strings have been detokenized based on the whitelist while the sensitive strings remain tokenized.
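The tokenization and redaction steps described in the abstract can be illustrated with a short Python sketch. It is not the patented implementation. The function names (`tokenize`, `build_whitelist`, `redact`), the `T`-prefixed 128-bit hex token format, and the use of plain dictionaries for the trusted mapping and the whitelist are all illustrative assumptions. Words are split on whitespace, as in claim 1, so the original spacing of the input is not preserved.

```python
import secrets


def tokenize(text, mapping):
    """Replace each whitespace-separated word with a token.

    `mapping` (word -> token) stands in for the trusted-side token mapping
    data. A word seen for the first time gets a fresh random token, and a
    repeated word reuses its existing token.
    """
    used = set(mapping.values())
    out = []
    for word in text.split():
        if word not in mapping:
            # Random token: it carries no information about the word it replaces.
            token = "T" + secrets.token_hex(16)
            while token in used:  # guard against an (unlikely) collision
                token = "T" + secrets.token_hex(16)
            mapping[word] = token
            used.add(token)
        out.append(mapping[word])
    return " ".join(out)


def build_whitelist(mapping, non_sensitive):
    """Token -> word for non-sensitive words only (held in the untrusted environment)."""
    return {mapping[w]: w for w in non_sensitive if w in mapping}


def redact(tokenized, whitelist):
    """Partly detokenize: whitelisted tokens become words, all others stay tokens."""
    return " ".join(whitelist.get(t, t) for t in tokenized.split())


if __name__ == "__main__":
    trusted_map = {}  # stays in the trusted environment
    stored = tokenize("card 4111111111111111 charged to account 12345", trusted_map)
    whitelist = build_whitelist(trusted_map, {"card", "charged", "to", "account"})
    print(redact(stored, whitelist))  # the two numbers remain opaque tokens
```

Because a random token cannot be reversed without the trusted mapping, the untrusted side can restore only the words its whitelist covers. For that reason `redact` leaves the card and account numbers tokenized.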

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method, comprising: in a trusted computing environment, parsing a file to determine a plurality of words included in the file, based on whitespace characters that separate the words in the file, the file comprising one or more sensitive words corresponding to financial account data; for individual words that are unique in the plurality of words, determining a corresponding token that corresponds to the word, such that the word is not derivable from the token; generating a tokenized file that includes corresponding tokens in place of the plurality of words; storing the tokenized file in an untrusted computing environment; in the trusted computing environment, storing a mapping of the plurality of words to the corresponding tokens; and in the untrusted computing environment: storing a whitelist mapping of a subset of the plurality of words to the corresponding tokens, the subset including non-sensitive words other than the one or more sensitive words; receiving a search request including one or more search terms; for the one or more search terms that are included in the whitelist, retrieving the corresponding token; for the one or more search terms that are not included in the whitelist, sending a request that the trusted computing environment retrieve the corresponding token; based at least in part on one or more tokens corresponding to the one or more search terms, perform a search of the tokenized file stored in the untrusted computing environment; identifying one or more tokens in the tokenized file that are included in the whitelist; replacing the identified one or more tokens with one or more corresponding words from the whitelist, to generate partly detokenized information; and providing the partly detokenized information in response to the search request.

2. The method of claim 1, further comprising: receiving identification of the a user associated with the search request; and determining, based on access control data, whether the user is permitted access to the one or more search terms included in the search request.

3. The method of claim 1, wherein the determining of the token is based on a random or pseudo-random number generation algorithm.

4. The method of claim 1, wherein the determining of the token includes generating the token as an ordinal value that is incremented or decremented for each unique word in the plurality of words.

5. A system, comprising: a token mapping datastore storing token mapping data that associates a plurality of strings with corresponding tokens, the token mapping datastore included in a first computing environment associated with a first trust level; a whitelist token mapping datastore storing whitelist token mapping data that associates a subset of the plurality of strings with the corresponding tokens, the whitelist token mapping datastore included in a second computing environment associated with a second trust level; a first computing device in communication with the token mapping datastore, the first computing device configured to execute a first set of computer-readable instructions that cause the first computing device to: generate tokenized information that includes one or more tokens that correspond to one or more strings of the plurality of strings; send the tokenized information to be stored in the second computing environment; and a second computing device in communication with the first computing device and the whitelist token mapping datastore, the second computing device configured to execute a second set of computer-readable instructions that cause the second computing device to: receive a search request including one or more search terms; for the one or more search terms that are included in the whitelist token mapping data, retrieve the corresponding token from the whitelist token mapping datastore; for the one or more search terms that are not included in the whitelist token mapping data, send a request that the first computing device retrieve the corresponding token from the token mapping datastore; based at least in part on one or more tokens corresponding to the one or more search terms, perform a search for the tokenized information stored in the second computing environment; identify one or more tokens in the tokenized information that are included in the whitelist token mapping data; replace the identified one or more tokens with one or more corresponding strings from the whitelist token mapping data, to generate partly detokenized information; and provide the partly detokenized information in response to the search request.

6. The system of claim 5, the first set of computer-readable instructions further comprising instructions that cause the first computing device to: parse information in the first computing environment to determine the one or more of the plurality of strings based on separation by one or more whitespace characters; and for individual strings that are unique in the one or more strings, determine a token that corresponds to the string, based at least partly on the token mapping data in the token mapping datastore.

7. The system of claim 5, wherein: the token mapping datastore includes the token mapping data for one or more of the plurality of strings that include sensitive data; and the whitelist token mapping datastore includes the whitelist token mapping data for one or more of the subset of the plurality of strings that do not include the sensitive data.

8. The system of claim 5, wherein: the search request further comprises a search range in dates, times, or both dates and times; the tokenized information includes timestamp data describing when the tokenized information was created; and the search includes a search for a version of the tokenized information having a timestamp within the search range.

9. The system of claim 5, the second computing device further configured to: for the one or more search terms that are not included in the whitelist token mapping data, modify the whitelist token mapping data to include an association of the search term with the corresponding token retrieved from the token mapping datastore.

10. The system of claim 5, the first computing device further configured to: receive, from the second computing device, the request that the first computing device retrieve the corresponding token from the token mapping datastore; analyze the one or more search terms included in the request, to determine a probability that the one or more search terms include sensitive information; and deny the request, based on the probability exceeding a predetermined threshold probability.

11. The system of claim 5, the first computing device further configured to: receive, from the second computing device, the request that the first computing device retrieve the corresponding token from the token mapping datastore; determine a frequency of a plurality of requests that include the request; and deny the request, based on the frequency exceeding a predetermined threshold frequency.

12. The system of claim 5, the first computing device further configured to: receive, from the second computing device, the request that the first computing device retrieve the corresponding token from the token mapping datastore, the request including an identification of a user associated with the search request, the request including the one or more search terms that are not included in the whitelist token mapping data; determine whether the user is permitted access to the one or more search terms included in the request, based on access control data; and deny the request, based on a d…
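The search path in claims 1 and 5 can be sketched in Python, combined with the ordinal tokens of claim 4 and the frequency-based denial of claim 11. Several details are assumptions made for illustration and do not come from the claims:

- the class names `TrustedTokenService` and `UntrustedSearch`;
- the `T1`, `T2`, … token format;
- the sliding-window rate limit;
- the rule that a result must contain every query token.

The claims do not say how matching or the frequency threshold is implemented.

```python
import itertools
import time
from collections import deque


class TrustedTokenService:
    """Trusted side: holds the full word -> token mapping (hypothetical API)."""

    def __init__(self, max_requests, window_seconds):
        self._mapping = {}
        self._counter = itertools.count(1)  # ordinal tokens, in the spirit of claim 4
        self._recent = deque()  # timestamps of recent lookup requests
        self.max_requests = max_requests
        self.window_seconds = window_seconds

    def token_for(self, word):
        """Assign a new ordinal token the first time a unique word is seen."""
        if word not in self._mapping:
            self._mapping[word] = f"T{next(self._counter)}"
        return self._mapping[word]

    def lookup(self, term):
        """Serve a token lookup for the untrusted side, denying bursts (cf. claim 11)."""
        now = time.monotonic()
        # Drop request timestamps that fall outside the sliding window.
        while self._recent and now - self._recent[0] > self.window_seconds:
            self._recent.popleft()
        if len(self._recent) >= self.max_requests:
            raise PermissionError("token lookup frequency exceeded")
        self._recent.append(now)
        return self._mapping.get(term)  # None if the term was never tokenized


class UntrustedSearch:
    """Untrusted side: tokenized documents plus a whitelist of non-sensitive words."""

    def __init__(self, trusted, whitelist, documents):
        self.trusted = trusted
        self.whitelist = dict(whitelist)  # word -> token, non-sensitive words only
        self.reverse = {tok: word for word, tok in self.whitelist.items()}
        self.documents = documents  # tokenized strings

    def search(self, terms):
        query = set()
        for term in terms:
            token = self.whitelist.get(term)
            if token is None:
                # Not whitelisted: ask the trusted side (may raise PermissionError).
                token = self.trusted.lookup(term)
            if token is None:
                return []  # term never occurs, so no document can contain all terms
            query.add(token)
        results = []
        for doc in self.documents:
            doc_tokens = doc.split()
            if query.issubset(doc_tokens):
                # Partial detokenization: only whitelisted tokens become words.
                results.append(" ".join(self.reverse.get(t, t) for t in doc_tokens))
        return results


if __name__ == "__main__":
    trusted = TrustedTokenService(max_requests=2, window_seconds=60)
    doc = " ".join(trusted.token_for(w) for w in "refund issued for account 12345".split())
    wl = {w: trusted.token_for(w) for w in ["refund", "issued", "for", "account"]}
    search = UntrustedSearch(trusted, wl, [doc])
    print(search.search(["12345"]))  # ['refund issued for account T5']
```

Ordinal tokens are compact, but they reveal the order in which words were first seen. The random tokens of claim 3 avoid that, which may matter when choosing between the two schemes.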

Assignees

Amazon Tech Inc

Inventors

Classifications

  • where protection concerns the structure of data, e.g. records, types, queries · CPC title

  • wherein the data content is protected, e.g. by encrypting or encapsulating the payload · CPC title

  • Protecting distributed programs or content, e.g. vending or licensing of copyrighted material (protection in video systems or pay television H04N7/16) {; Digital rights management [DRM]} · CPC title

  • G06F21/62 · Primary

    Protecting access to data via a platform, e.g. using keys or access control rules · CPC title

  • by anonymising data, e.g. decorrelating personal data from the owner's identification · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9081978B1 cover?
Techniques are described for tokenizing information to be stored in an untrusted environment. During tokenization, one or more strings in a file or data stream are replaced with a token. The token may be generated as a random number or a counter, such that the replaced string may not be derived based on the token. Token-to-string mapping data may be stored in a trusted environment, and the toke…
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G06F21/6227. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jul 14, 2015 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).