What technology area does this patent fall under?

Primary CPC classification G06F21/6245. Mapped technology areas include Physics.

When was this patent published?

Publication date Jul 9, 2024 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Masking personal information in audio recordings

US12032717B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12032717-B2
Application number	US-202016832976-A
Country	US
Kind code	B2
Filing date	Mar 27, 2020
Priority date	Mar 27, 2020
Publication date	Jul 9, 2024
Grant date	Jul 9, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

One example method includes transcribing a portion of the audio component to create a transcription file that includes text, searching the text of the transcription file and identifying information in the text that may include personal information, defining a textual window that includes the information, evaluating the text in the textual window to identify personal information, and masking the personal information in the audio component of the recording. The personal information may be masked with information of a non-personal nature.
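The abstract does not say how the audio itself is masked. Below is a minimal sketch. It assumes the speech-to-text step reports a start and end time for each word; compare claim 5, where a textual window is bounded by a start time and an end time. Given that, the samples under each flagged word can be overwritten. The `TimedWord` type, the helper names, the plain list of float samples, and the use of a 1 kHz tone as the "information of a non-personal nature" are all illustrative assumptions, not details taken from the patent.

```python
import math
from dataclasses import dataclass


@dataclass
class TimedWord:
    """A transcribed word with start and end times in seconds.

    Hypothetical type: assumes the speech-to-text engine reports
    word-level timestamps.
    """
    text: str
    start: float
    end: float


def spans_for(words, flagged):
    """Return (start, end) time spans of the words whose indices are flagged."""
    return [(words[i].start, words[i].end) for i in sorted(flagged)]


def mask_spans(samples, sample_rate, spans, tone_hz=1000.0, amplitude=0.2):
    """Return a copy of `samples` with every (start, end) span, in seconds,
    overwritten by a sine tone (assumed filler audio of a non-personal nature)."""
    out = list(samples)
    for start, end in spans:
        first = max(0, int(start * sample_rate))
        last = min(len(out), math.ceil(end * sample_rate))
        for n in range(first, last):
            out[n] = amplitude * math.sin(2 * math.pi * tone_hz * n / sample_rate)
    return out


if __name__ == "__main__":
    rate = 8000
    audio = [0.5] * rate  # one second of placeholder audio
    words = [TimedWord("my", 0.0, 0.2), TimedWord("pin", 0.2, 0.4),
             TimedWord("1234", 0.5, 0.75)]
    masked = mask_spans(audio, rate, spans_for(words, {2}))
    print(sum(1 for a, b in zip(audio, masked) if a != b), "samples replaced")
```

A tone is used here only because it is easy to verify. Silence or any other non-personal audio could be substituted in `mask_spans` without changing how the time spans are derived.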

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: creating a recording that includes an audio component of a client; transcribing a portion of the audio component to create a transcription file; receiving a regex; using the regex to search for a matching text between a portion of the transcription file and the regex; identifying one or more textual windows that include the matching text in the transcription file, wherein each identified textual window includes text preceding and following the matching text, and each identified textual window includes only text associated with the client; evaluating, by a trained machine learning classifier, the text in each identified textual window based on a bag of words model, in which a word in a bag of words has a relatively higher weight than other words in the bag of words based on a strength of correlation between words and personal information sought to be located; inferring, based on the evaluating of the text in each identified textual window, presence of personal information of the client in each identified textual window; and removing the personal information from any identified textual window in which presence of the personal information was inferred.

2. The method as recited in claim 1, wherein the audio component includes words spoken by a human.

3. The method as recited in claim 1, wherein the recording is an audio recording, or an audio/video recording.

4. The method as recited in claim 1, wherein the trained machine learning classifier maps words in the one or more identified textual windows as a vector of real numbers, and the vector is one of a group of vectors in a vector space.

5. The method as recited in claim 1, wherein each textual window comprises a portion of the recording that is bounded by a start time and an end time.

6. The method as recited in claim 1, wherein the method is performed on-the-fly as the recording is being created.

7. The method as recited in claim 1, wherein the personal information does not pertain to any person whose voice is in the recording.

8. The method as recited in claim 1, wherein the removed personal information is replaced with data of a non-personal nature.

9. The method as recited in claim 1, further comprising generating a set of training data and using the training data as a basis for searching the text of the transcription file.

10. The method as recited in claim 9, wherein generating the set of training data comprises: tagging data in the training data as comprising the personal information; automatically learning one or more regexes, including the regex; and training a machine learning classifier to infer presence of the personal information in the identified textual window.

11. A non-transitory storage medium having stored therein instructions that are executable by one or more hardware processors to perform operations comprising: creating a recording that includes an audio component of a client; transcribing a portion of the audio component to create a transcription file; receiving a regex; using the regex to search for a matching text between a portion of the transcription file and the regex; identifying one or more textual windows that include the matching text in the transcription file, wherein each identified textual window includes text preceding and following the matching text, and each identified textual window includes only text associated with the client; evaluating, by a trained machine learning classifier, the text in each textual window based on a bag of words model, in which a word in a bag of words has a relatively higher weight than other words in the bag of words based on a strength of correlation between words and personal information sought to be located; inferring, based on the evaluating of the text in each identified textual window, presence of personal information in each textual window; and removing the personal information of the client from any identified textual window in which presence of the personal information was inferred.

12. The non-transitory storage medium as recited in claim 11, wherein the audio component includes words spoken by a human.

13. The non-transitory storage medium as recited in claim 11, wherein the recording is an audio recording, or an audio/video recording.

14. The non-transitory storage medium as recited in claim 11, wherein the trained machine learning classifier maps words in the one or more identified textual windows as a vector of real numbers, and the vector is one of a group of vectors in a vector space.

15. The non-transitory storage medium as recited in claim 11, wherein each textual window comprises a portion of the recording that is bounded by a start time and an end time.

16. The non-transitory storage medium as recited in claim 11, wherein the operations are performed on-the-fly as the recording is being created.

17. The non-transitory storage medium as recited in claim 11, wherein the personal information does not pertain to any person whose voice is in the recording.

18. The non-transitory storage medium as recited in claim 11, wherein the removed personal information is replaced with data of a non-personal nature.

19. The non-transitory storage medium as recited in claim 11, further comprising generating a set of training data and using the training data as a basis for searching the text of the transcription file.

20. The non-transitory storage medium as recited in claim 19, wherein generating the set of training data comprises: tagging data in the training data as comprising the personal information; automatically learning one or more regexes, including the regex; and training a machine learning classifier to infer presence of the personal information in the identified textual window.
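The core steps of claim 1 can be sketched in Python as follows: a regex search, client-only textual windows that include text before and after each match, a weighted bag-of-words evaluation, and removal with non-personal data. This is a hypothetical illustration, not the patented implementation. The hand-set `WEIGHTS`, `BIAS`, and `THRESHOLD` stand in for a trained machine learning classifier (claim 10 describes actually training one). The transcript is assumed to be a list of (speaker, word) pairs, and the regex is assumed to match single transcribed tokens, such as a card number transcribed as one string of digits.

```python
import math
import re
from collections import Counter

# Hypothetical weights standing in for a trained bag-of-words classifier.
# Words more strongly correlated with the personal information sought
# (here, a payment card number) carry higher weights, as in claim 1.
WEIGHTS = {"card": 2.0, "number": 1.5, "account": 1.5, "credit": 1.0, "order": -1.0}
BIAS = -2.0
THRESHOLD = 0.5


def find_windows(words, pattern, context=3):
    """Yield (index, window) for every word that fully matches the regex.

    Each window holds up to `context` words before and after the match.
    """
    regex = re.compile(pattern)
    for i, word in enumerate(words):
        if regex.fullmatch(word):
            lo, hi = max(0, i - context), min(len(words), i + context + 1)
            yield i, words[lo:hi]


def pii_probability(window):
    """Logistic score over a weighted bag of words; word order is ignored."""
    bag = Counter(w.lower() for w in window)
    z = BIAS + sum(WEIGHTS.get(w, 0.0) * n for w, n in bag.items())
    return 1.0 / (1.0 + math.exp(-z))


def redact(transcript, client, pattern, placeholder="[REDACTED]", context=3):
    """Return a copy of `transcript` with inferred client PII replaced.

    `transcript` is a list of (speaker, word) pairs. Windows are built from
    the client's words only, so other speakers' text never enters a window.
    """
    positions = [i for i, (speaker, _) in enumerate(transcript) if speaker == client]
    words = [transcript[i][1] for i in positions]
    out = list(transcript)
    for k, window in find_windows(words, pattern, context):
        if pii_probability(window) >= THRESHOLD:
            out[positions[k]] = (client, placeholder)  # non-personal stand-in
    return out


if __name__ == "__main__":
    transcript = (
        [("agent", w) for w in "what is your card number".split()]
        + [("client", w) for w in "my card number is 4111111111111111 okay thanks so much".split()]
        + [("agent", w) for w in "anything else".split()]
        + [("client", w) for w in "the order id was 998877665544 yesterday".split()]
    )
    for speaker, word in redact(transcript, "client", r"\d{12,19}"):
        print(speaker, word)
```

The regex matches both digit strings in the example. In this sketch the card number is replaced because its window contains "card" and "number". The 12-digit order ID is kept because the only weighted word in its window is the negatively weighted "order". The window context, not the regex alone, decides whether personal information is inferred.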

Assignees

EMC IP Holding Co LLC

Inventors

Classifications

G10L15/26 · Speech to text systems (G10L15/08 takes precedence)
G10L15/063 · Training
G10L15/197 · Probabilistic grammars, e.g. word n-grams
G06N20/00 · Machine learning
G06F40/30 · Semantic analysis

Patent family

Related publications grouped by family.

Patent family ID: 77856224

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12032717B2 cover?: One example method includes transcribing a portion of the audio component to create a transcription file that includes text, searching the text of the transcription file and identifying information in the text that may include personal information, defining a textual window that includes the information, evaluating the text in the textual window to identify personal information, and masking the…
Who is the assignee on this patent?: EMC IP Holding Co LLC


Related patents

Method and system for confidential sentiment analysis

Systems and methods for automatically scrubbing sensitive data

System and method for speaker role determination and scrubbing identifying information

Systems and methods for securing data based on discovered relationships

Removing personal information from text using a neural network
