US9971809B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-9971809-B1 |
| Application number | US-201514868334-A |
| Country | US |
| Kind code | B1 |
| Filing date | Sep 28, 2015 |
| Priority date | Sep 28, 2015 |
| Publication date | May 15, 2018 |
| Grant date | May 15, 2018 |
Official abstract text for this publication.
The disclosed computer-implemented method for searching unstructured documents for structured data may include (1) receiving a request to search unstructured documents for a document that contains data (e.g., sensitive data) from a structured dataset, (2) generating a secure search index (e.g., a Bloom filter) for searching the unstructured documents for the sensitive data, (3) extracting a first token and a second token from an unstructured document, (4) generating a hashed key from the first token and the second token, (5) querying the secure search index to determine whether the hashed key is contained in the secure search index, and (6) responding, upon determining that the hashed key is contained in the secure search index, to the request with information about the unstructured document. Various other methods, systems, and computer-readable media are also disclosed.
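The indexing and query steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `BloomFilter` class, its bit-array size and hash count, the `hashed_key` helper, the `\x1f` separator, the lower-casing of values, and the sample records are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter; the size and hash count are illustrative assumptions."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item: bytes):
        # Derive several bit positions by salting SHA-256 with a counter.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(i.to_bytes(4, "big") + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: bytes) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def hashed_key(first: str, second: str) -> bytes:
    # Combine the two values into an intermediate value, then hash it.
    # The separator and the normalization are assumptions of this sketch.
    combined = f"{first.strip().lower()}\x1f{second.strip().lower()}"
    return hashlib.sha256(combined.encode("utf-8")).digest()


def build_index(records) -> BloomFilter:
    # One hashed key per record, built from its first- and second-field values.
    index = BloomFilter()
    for first, second in records:
        index.add(hashed_key(first, second))
    return index


# Hypothetical dataset rows: (first field, second field).
records = [("123-45-6789", "Smith"), ("987-65-4321", "Jones")]
index = build_index(records)

# Tokens extracted from a document are hashed the same way and looked up.
hit = hashed_key("123-45-6789", "smith") in index
miss = hashed_key("123-45-6789", "Jones") in index
```

Because only hashed keys of value pairs are stored, the index can be tested for membership without holding the plaintext values. A Bloom filter can return false positives but not false negatives, so a hit means "probably present" and would normally trigger further review.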
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method for searching unstructured documents for structured data, at least a portion of the method being performed by a computing device comprising at least one processor, the method comprising: receiving a request to search unstructured documents for a document that contains: a value from a first field of a dataset; and a value from a second field of the dataset; generating a secure search index for searching the unstructured documents by, for each record in the dataset: identifying, within the dataset, the record's value from the first field and the record's value from the second field; generating a first hashed key from the record's value from the first field and the record's value from the second field; adding the first hashed key to the secure search index; extracting a first token and a second token from an unstructured document; generating a second hashed key from the first token and the second token; querying the secure search index to determine whether the second hashed key is contained in the secure search index; responding, upon determining that the second hashed key is contained in the secure search index, to the request with information about the unstructured document.
2. The computer-implemented method of claim 1, wherein: values from the first field follow a known pattern; the request for the document specifies that the value from the first field is required to be within a specified distance from the value from the second field; extracting the first token and the second token from the unstructured document comprises: using the known pattern to identify the first token within the unstructured document; identifying the second token within the specified distance from the first token.
3. The computer-implemented method of claim 1, wherein: receiving the request to search unstructured documents for the document comprises receiving a request to search unstructured documents for a document that contains: the value from the first field of the dataset; the value from the second field of the dataset; and a value from a third field of the dataset, wherein: values from the first field follow a known pattern; values from the second field and values from the third field do not follow a known pattern; the computer-implemented method further comprises: generating an additional secure search index by, for each record in the dataset: identifying, within the dataset, the record's value from the first field and the record's value from the third field; generating a third hashed key from the record's value from the first field and the record's value from the third field; adding the third hashed key to the additional secure search index; extracting a third token from the unstructured document; generating a fourth hashed key from the first token and the third token; querying the additional secure search index to determine whether the fourth hashed key is contained in the additional secure search index; responding to the request with information about the unstructured document occurs upon determining that the fourth hashed key is contained in the additional secure search index.
4. The computer-implemented method of claim 1, wherein: receiving the request to search unstructured documents for the document comprises receiving a request to search unstructured documents for a document that contains: the value from the first field of the dataset; the value from the second field of the dataset; and a value from a third field of the dataset, wherein: values from the first field follow a first known pattern; values from the third field follow a second known pattern; values from the second field do not follow a known pattern; the computer-implemented method further comprises: generating an additional secure search index by, for each record in the dataset: identifying, within the dataset, the record's value from the second field and the record's value from the third field; generating a third hashed key from the record's value from the second field and the record's value from the third field; adding the third hashed key to the additional secure search index; extracting a third token from the unstructured document; generating a fourth hashed key from the second token and the third token; querying the additional secure search index to determine whether the fourth hashed key is contained in the additional secure search index; responding to the request with information about the unstructured document occurs upon determining that the fourth hashed key is contained in the additional secure search index.
5. The computer-implemented method of claim 1, wherein: the first hashed key is generated from the record's value from the first field, the record's value from the second field, and a cryptographic key; the second hashed key is generated from the first token, the second token, and the cryptographic key.
6. The computer-implemented method of claim 1, wherein the secure search index comprises a Bloom filter.
7. The computer-implemented method of claim 1, wherein generating the first hashed key comprises: generating an intermediate value from a combination of the record's value from the first field and the record's value from the second field; hashing the intermediate value to produce the hashed key.
8. The computer-implemented method of claim 1, wherein: the step of generating the secure search index for searching the unstructured documents is performed at a server-side computing device; the steps of extracting the first token and the second token, generating the second hashed key, and querying the secure search index are performed at a client-side computing device to which the secure search index has been distributed.
9. The computer-implemented method of claim 1, wherein: values from the first field follow a known pattern; extracting the first token from the unstructured document comprises using a regular expression based on the known pattern to identify the first token within the unstructured document.
10. The computer-implemented method of claim 1, wherein: at least the first field of the dataset comprises sensitive data; the first field of the dataset comprises at least one of: social security numbers; account numbers; credit card numbers.
11. A system for searching unstructured documents for structured data, the system comprising: a receiving module, stored in memory, that receives a request to search unstructured documents for a document that contains: a value from a first field of a dataset; and a value from a second field of the dataset; an index-generating module, stored in memory, that generates a secure search index for searching the unstructured documents by, for each record in the dataset: identifying, within the dataset, the record's value from the first field and the record's value from the second field; generating a first hashed key from the record's value from the first field and the record's value from the second field; adding the first hashed key to the secure search index; an extracting module, stored in memory, that extracts a first token and a second token from an unstructured document; a key-generating module, stored in memory, that generates a second hashed key from the first token and the second token; a querying module, stored in memory, that queries the secure search index to determine whether the second hashed key is contained in the secure search index; a responding module, stored in memory, that responds, upon determining that the second hashed key is contained in the secure search index, to the request with informa
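
Claims 2, 5, and 9 describe locating the first token with a regular expression, finding the second token within a specified distance, and mixing a cryptographic key into the hash. The following Python sketch shows one way these steps might fit together. The SSN-style pattern, the word-count distance measure, the `candidate_pairs` and `keyed_hash` helpers, the placeholder secret, and the use of a plain `set` in place of the secure search index are all assumptions, not details taken from the patent.

```python
import hashlib
import hmac
import re

# Assumed "known pattern" for the first field: a US SSN-like format.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
WORD_PATTERN = re.compile(r"[A-Za-z]+")


def candidate_pairs(text, max_distance=3):
    """Yield (first_token, second_token) pairs where the second token is a
    word at most max_distance whitespace-separated tokens from a pattern match."""
    tokens = text.split()
    for i, tok in enumerate(tokens):
        match = SSN_PATTERN.search(tok)
        if not match:
            continue
        lo = max(0, i - max_distance)
        hi = min(len(tokens), i + max_distance + 1)
        for j in range(lo, hi):
            if j == i:
                continue
            word = WORD_PATTERN.fullmatch(tokens[j].strip(".,;:()"))
            if word:
                yield match.group(0), word.group(0)


def keyed_hash(first, second, secret):
    # HMAC-SHA256 is used here as one possible keyed hash.
    message = f"{first}\x1f{second.lower()}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


SECRET = b"hypothetical-shared-key"  # placeholder, not a real key

# A plain set stands in for the secure search index in this sketch.
indexed = {keyed_hash("123-45-6789", "Smith", SECRET)}

text = "Please update the file for John Smith, SSN 123-45-6789, before Friday."
matches = [pair for pair in candidate_pairs(text)
           if keyed_hash(*pair, SECRET) in indexed]
```

Anchoring the search on the patterned field keeps the number of candidate pairs small: only words near a pattern match are hashed and looked up, rather than every pair of tokens in the document.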
Physics · mapped topic
Protecting personal data, e.g. for financial or medical purposes · CPC title
Management therefor · CPC title