Intelligent hashing of sensitive information

US11664998B2 · US · B2

Patent metadata
Publication number: US-11664998-B2
Application number: US-202016884728-A
Country: US
Kind code: B2
Filing date: May 27, 2020
Priority date: May 27, 2020
Publication date: May 30, 2023
Grant date: May 30, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Described are techniques for preserving data security for sensitive information. The techniques including identifying sensitive information in first audio data from a first client device. The techniques further comprise generating second audio data including hashed sensitive information, where the hashed sensitive information comprises an audio clip that replaces the sensitive information and that is based on the sensitive information. The techniques further comprise transmitting the second data including the hashed sensitive information to a second client device. The techniques further comprise receiving third audio data including the hashed sensitive information from the second client device. The techniques further comprise generating fourth audio data by replacing the hashed sensitive information with the sensitive information and transmitting the fourth audio data including the sensitive information to the first client device.
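The round trip in the abstract (replace on the way out, restore on the way back) can be sketched in a few lines. This is a minimal illustration, not the patented implementation: it works on a list of transcript words instead of audio clips, and the class name `HashingProxy`, the `TOKEN_` placeholder format, and the example card number are all assumptions made for the sketch. The lookup table plays the role of the mapping table described in claim 5.

```python
import secrets


class HashingProxy:
    """Toy stand-in for the data security layer; operates on words, not audio."""

    def __init__(self, sensitive_terms):
        self.sensitive_terms = set(sensitive_terms)
        self.mapping = {}  # placeholder -> original sensitive term

    def _placeholder_for(self, term):
        # Reuse an existing placeholder so a repeated term maps consistently.
        for placeholder, original in self.mapping.items():
            if original == term:
                return placeholder
        placeholder = f"TOKEN_{secrets.token_hex(4)}"  # hypothetical format
        self.mapping[placeholder] = term
        return placeholder

    def outbound(self, words):
        """First data -> second data: swap sensitive words for placeholders."""
        return [
            self._placeholder_for(w) if w in self.sensitive_terms else w
            for w in words
        ]

    def inbound(self, words):
        """Third data -> fourth data: swap placeholders back to the originals."""
        return [self.mapping.get(w, w) for w in words]


proxy = HashingProxy(sensitive_terms={"4111-1111"})
first = ["my", "card", "is", "4111-1111"]      # from the first client device
second = proxy.outbound(first)                 # sent to the second client device
third = ["confirming", second[3], "now"]       # reply that echoes the placeholder
fourth = proxy.inbound(third)                  # returned to the first client device
print(second)
print(fourth)
```

In this sketch the second party only ever sees the placeholder, while the first party receives the reply with the original value restored.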

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: identifying sensitive information in first audio data from a first client device; generating second audio data including hashed sensitive information, wherein the hashed sensitive information comprises an audio clip that replaces the sensitive information, that is based on the sensitive information, and that retains linguistic characteristics of the sensitive information, wherein the linguistic characteristics are selected from a group consisting of: phonetic characteristics, syntactic characteristics, and semantic characteristics; transmitting the second audio data including the hashed sensitive information to a second client device; receiving third audio data including the hashed sensitive information from the second client device; generating fourth audio data by replacing the hashed sensitive information with the sensitive information; and transmitting the fourth audio data including the sensitive information to the first client device.

2. The method of claim 1, wherein identifying the sensitive information in the first audio data further comprises: comparing extracted portions of the first audio data to a sensitive information database; and classifying respective extracted portions matching a respective entry in the sensitive information database as the sensitive information.

3. The method of claim 1, wherein identifying the sensitive information in the first audio data further comprises: determining that an extracted portion of the first audio data does not match any record in a sensitive information database; generating a sensitivity score for the extracted portion of the first audio data in response to determining that the extracted portion of the first audio data does not match any record in the sensitive information database; determining that the sensitivity score satisfies a sensitivity score threshold; and classifying the extracted portion of the first audio data as the sensitive information.

4. The method of claim 3, wherein the sensitivity score is generated by a content sensitivity model that is trained using machine learning algorithms.

5. The method of claim 1, wherein generating the second audio data including the hashed sensitive information further comprises storing a correspondence between the sensitive information and the hashed sensitive information in a mapping table; and wherein generating fourth audio data by replacing the hashed sensitive information with the sensitive information further comprises matching the hashed sensitive information with the sensitive information based on the correspondence in the mapping table.

6. The method of claim 1, wherein the hashed sensitive information includes an indicator that identifies the hashed sensitive information as data with a sensitive information classification.

7. The method of claim 6, wherein the indicator further includes an explanation of the sensitive information classification, wherein the explanation relates to a match in a sensitive information database.

8. The method of claim 7, wherein the method further comprises: receiving feedback related to an accuracy of the sensitive information classification; and updating, based on the feedback, the sensitive information database.

9. The method of claim 6, wherein the indicator further includes an explanation of the sensitive information classification, wherein the explanation relates to a sensitivity score generated by a sensitivity score model above a sensitivity score threshold.

10. The method of claim 9, wherein the method further comprises: receiving feedback related to an accuracy of the sensitive information classification; and updating, based on the feedback, the sensitivity score model.

11. The computer-implemented method of claim 1, wherein the method is performed by a data security application according to software that is downloaded to the data security application from a remote data processing system.

12. The computer-implemented method of claim 11, wherein the method further comprises: metering a usage of the software; and generating an invoice based on metering the usage.

13. The method of claim 1, wherein the linguistic characteristics comprise the phonetic characteristics.

14. The method of claim 1, wherein the linguistic characteristics comprise the syntactic characteristics.

15. The method of claim 1, wherein the linguistic characteristics comprise the semantic characteristics.

16. A system comprising: one or more processors; and one or more computer-readable storage media storing program instructions which, when executed by the one or more processors, are configured to cause the one or more processors to perform a method comprising: identifying sensitive information in first audio data from a first client device; generating second audio data including hashed sensitive information, wherein the hashed sensitive information comprises an audio clip that replaces the sensitive information, that is based on the sensitive information, and that retains linguistic characteristics of the sensitive information, wherein the linguistic characteristics are selected from a group consisting of: phonetic characteristics, syntactic characteristics, and semantic characteristics; transmitting the second audio data including the hashed sensitive information to a second client device; receiving third audio data including the hashed sensitive information from the second client device; generating fourth audio data by replacing the hashed sensitive information with the sensitive information; and transmitting the fourth audio data including the sensitive information to the first client device.

17. The system of claim 16, wherein identifying the sensitive information in the first audio data further comprises: comparing extracted portions of the first audio data to a sensitive information database; and classifying respective extracted portions matching a respective entry in the sensitive information database as the sensitive information.

18. The system of claim 16, wherein identifying the sensitive information in the first audio data further comprises: determining that an extracted portion of the first audio data does not match any record in a sensitive information database; generating, in response to determining that the extracted portion of the first audio data does not match any record in the sensitive information database, a sensitivity score for the extracted portion of the first audio data based on inputting the extracted portion of the first audio data to a content sensitivity model that is trained using machine learning algorithms; determining that the sensitivity score satisfies a sensitivity score threshold; and classifying the extracted portion of the first audio data as the sensitive information.

19. The system of claim 16, wherein generating the second audio data including the hashed sensitive information further comprises storing a correspondence between the sensitive information and the hashed sensitive information in a mapping table; and wherein generating fourth audio data by replacing the hashed sensitive information with the sensitive information further comprises matching the hashed sensitive information with the sensitive information based on the correspondence in the mapping table.

20. A computer program product comprising one or more computer readable storage media, and program instructions collectively stored on the one or more computer readable…
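The two-stage identification in claims 2 and 3 (database lookup first, scoring model as a fallback), together with the explanation attached to the classification in claims 7 and 9, might look roughly like the sketch below. Everything named here is hypothetical: `classify_portion`, the `Classification` record, and especially `digit_ratio`, which is a trivial stand-in for the trained content sensitivity model of claim 4. The sketch also assumes that "satisfies the threshold" means greater than or equal to, which the claims do not specify, and it operates on transcribed text, not audio.

```python
from dataclasses import dataclass


@dataclass
class Classification:
    sensitive: bool
    explanation: str  # why the portion was (or was not) flagged


def classify_portion(portion, database, score_fn, threshold):
    """Check the database first; fall back to a score only on a miss."""
    if portion in database:
        return Classification(True, "matched sensitive information database")
    score = score_fn(portion)
    if score >= threshold:  # assumption: 'satisfies' means >=
        return Classification(
            True, f"sensitivity score {score:.2f} >= threshold {threshold:.2f}"
        )
    return Classification(
        False, f"sensitivity score {score:.2f} below threshold {threshold:.2f}"
    )


def digit_ratio(portion):
    """Hypothetical stand-in for a trained model: fraction of digit characters."""
    if not portion:
        return 0.0
    return sum(ch.isdigit() for ch in portion) / len(portion)


database = {"jane doe"}
by_match = classify_portion("jane doe", database, digit_ratio, 0.5)
by_score = classify_portion("555-0199", database, digit_ratio, 0.5)
not_flagged = classify_portion("hello", database, digit_ratio, 0.5)
print(by_match, by_score, not_flagged, sep="\n")
```

Keeping the explanation alongside the flag is what would let a reviewer give the feedback described in claims 8 and 10, since they can see whether the database or the scoring model produced the classification.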

Assignees

IBM

Classifications

  • Machine learning · CPC title

  • H04L9/0643 (Primary)

    Hash functions, e.g. MD5, SHA, HMAC or f9 MAC · CPC title

  • H04L9/3236 (Primary)

    using cryptographic hash functions · CPC title
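As a rough illustration of how a cryptographic hash function could drive a replacement that keeps the category of the original (a name replaced by another name), the sketch below uses a keyed HMAC to pick a substitute from a fixed list. This is an assumption made for illustration only: nothing on this page says the patent uses HMAC this way, and the `keyed_substitute` function, the candidate list, and the key are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical pool of same-category substitutes (all personal names).
SUBSTITUTE_NAMES = ["Alex", "Blake", "Casey", "Devon", "Emery"]


def keyed_substitute(term, key, candidates=SUBSTITUTE_NAMES):
    """Deterministically pick a substitute for `term` using HMAC-SHA256."""
    digest = hmac.new(key, term.encode("utf-8"), hashlib.sha256).digest()
    # Interpret the first 8 bytes as an integer and map it onto the pool.
    index = int.from_bytes(digest[:8], "big") % len(candidates)
    return candidates[index]


key = b"example-secret-key"  # assumption: a secret held by the security layer
substitute = keyed_substitute("Jordan", key)
print(substitute)
```

Because several inputs can land on the same substitute, a selection like this is not reversible on its own; restoring the original would still depend on a stored correspondence such as the mapping table in claim 5.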

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11664998B2 cover?
Described are techniques for preserving data security for sensitive information. The techniques including identifying sensitive information in first audio data from a first client device. The techniques further comprise generating second audio data including hashed sensitive information, where the hashed sensitive information comprises an audio clip that replaces the sensitive information and t…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification H04L9/0643. Mapped technology areas include Electricity.
When was this patent published?
Publication date May 30, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 11 related publications on this page (citations in our corpus or others sharing the same primary CPC).