Privacy-sensitive speech model creation via aggregation of multiple user models

US9093069B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9093069-B2
Application number  US-201213668662-A
Country             US
Kind code           B2
Filing date         Nov 5, 2012
Priority date       Nov 5, 2012
Publication date    Jul 28, 2015
Grant date          Jul 28, 2015

Abstract

Techniques disclosed herein include systems and methods for privacy-sensitive training data collection for updating acoustic models of speech recognition systems. In one embodiment, the system locally creates adaptation data from raw audio data. Such adaptation can include derived statistics and/or acoustic model update parameters. The derived statistics and/or updated acoustic model data can then be sent to a speech recognition server or third-party entity. Since the audio data and transcriptions are already processed, the statistics or acoustic model data is devoid of any information that could be human-readable or machine readable such as to enable reconstruction of audio data. Thus, such converted data sent to a server does not include personal or confidential information. Third-party servers can then continually update speech models without storing personal and confidential utterances of users.
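
The abstract's idea of sending "derived statistics" instead of raw audio can be illustrated with a minimal sketch. The patent text here does not specify the form of the statistics, so the sketch assumes one common choice: per-dimension counts, sums, and sums of squares over feature frames, which a server can pool into a global mean and variance. The names `SufficientStats`, `derive_stats`, and `aggregate` are hypothetical, and the tiny feature vectors stand in for real acoustic features.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class SufficientStats:
    """Per-client accumulators (assumed stand-in for 'derived statistics')."""
    count: int
    total: List[float]     # per-dimension sum of feature values
    total_sq: List[float]  # per-dimension sum of squared feature values


def derive_stats(frames: Sequence[Sequence[float]]) -> SufficientStats:
    """Client side: reduce locally stored feature frames to accumulators."""
    dim = len(frames[0])
    total = [0.0] * dim
    total_sq = [0.0] * dim
    for frame in frames:
        for d, x in enumerate(frame):
            total[d] += x
            total_sq[d] += x * x
    return SufficientStats(len(frames), total, total_sq)


def aggregate(stats: Sequence[SufficientStats]) -> Tuple[List[float], List[float]]:
    """Server side: pool accumulators from many clients into a global mean and variance."""
    n = sum(s.count for s in stats)
    dim = len(stats[0].total)
    mean = [sum(s.total[d] for s in stats) / n for d in range(dim)]
    var = [sum(s.total_sq[d] for s in stats) / n - mean[d] ** 2 for d in range(dim)]
    return mean, var


# Two clients each reduce their own frames; only the accumulators are shared.
client_a = derive_stats([[1.0, 2.0], [3.0, 4.0]])
client_b = derive_stats([[5.0, 6.0], [7.0, 8.0]])
global_mean, global_var = aggregate([client_a, client_b])
print(global_mean, global_var)  # [4.0, 5.0] [5.0, 5.0]
```

The server never sees individual frames, only pooled sums. With very few frames per client the sums would reveal the inputs almost directly, so how well such statistics hinder reconstruction depends on how much audio each client accumulates before reporting.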

First claim

The invention claimed is:

1. A computer-implemented method of speech recognition processing, the computer-implemented method comprising: receiving a spoken utterance; storing audio data from the spoken utterance at a first device; creating adaptation data for updating at least one acoustic model, the adaptation data being created from the audio data via processing at the first device, the adaptation data being in a format that hinders reconstruction of the audio data; and transmitting the adaptation data to a second device for processing.

2. The computer-implemented method of claim 1, wherein creating the adaptation data occurs after collecting a predetermined amount of audio data.

3. The computer-implemented method of claim 1, wherein creating the adaptation data includes deriving statistical data from the audio data.

4. The computer-implemented method of claim 3, wherein transmitting the adaptation data includes transmitting derived statistical data to a server that aggregates derived statistical data from multiple client devices.

5. The computer-implemented method of claim 1, wherein creating the adaptation data includes creating updated acoustic model data.

6. The computer-implemented method of claim 5, wherein transmitting the adaptation data includes transmitting the updated acoustic model data to a server that aggregates local acoustic models into a global acoustic model.

7. The computer-implemented method of claim 5, wherein the updated acoustic model data is a version of an acoustic model used at the second device.

8. The computer-implemented method of claim 1, wherein creating the adaptation data from the audio data includes processing a subset of the audio data and discarding a remaining portion of the audio data.

9. The computer-implemented method of claim 1, wherein storing audio data at the first device includes storing audio data at a computer that is in network communication with a mobile device that received the spoken utterance.

10. The computer-implemented method of claim 1, wherein storing audio data at the first device includes storing audio data at a mobile device that received the spoken utterance.

11. The computer-implemented method of claim 1, wherein storing audio data from the spoken utterance includes storing audio waveform files and corresponding transcriptions; and wherein the adaptation data is in a format that hinders reconstruction of the corresponding transcriptions by human or machine, and hinders reconstruction of the corresponding waveform files by human or machine.

12. The computer-implemented method of claim 1, wherein the adaptation data is in a format that is not readable by human or machine.

13. The computer-implemented method of claim 1, wherein receiving the spoken utterance includes receiving a voice command or voice query at a mobile device.

14. The computer-implemented method of claim 1, wherein transmitting the adaptation data includes sending a compressed version of the adaptation data to the second device.

15. The computer-implemented method of claim 1, wherein transmitting the adaptation data includes sending an encrypted version of the adaptation data to the second device.

16. A system for speech processing, the system comprising: a processor; and a memory coupled to the processor, the memory storing instructions that, when executed by the processor, cause the system to perform the operations of: receiving a spoken utterance; storing audio data from the spoken utterance at a first device; creating adaptation data for updating at least one acoustic model, the adaptation data being created from the audio data via processing at the first device, the adaptation data being in a format that hinders reconstruction of the audio data; and transmitting the adaptation data to a second device for processing.

17. The system of claim 16, wherein creating the adaptation data occurs after collecting a predetermined amount of audio data.

18. The system of claim 16, wherein creating the adaptation data includes deriving statistical data from the audio data, and wherein transmitting the adaptation data includes transmitting derived statistical data to a server that aggregates derived statistical data from multiple client devices.

19. The system of claim 16, wherein creating the adaptation data includes creating updated acoustic model data, and wherein transmitting the adaptation data includes transmitting the updated acoustic model data to a server that aggregates local acoustic models into a global acoustic model.

20. A computer program product including a non-transitory computer-storage medium having instructions stored thereon for processing data information, such that the instructions, when carried out by a processing device, cause the processing device to perform the operations of: receiving a spoken utterance; storing audio data from the spoken utterance at a first device; creating adaptation data for updating at least one acoustic model, the adaptation data being created from the audio data via processing at the first device, the adaptation data being in a format that hinders reconstruction of the audio data; and transmitting the adaptation data to a second device for processing.
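
The client and server flow described in the claims (collecting a predetermined amount of audio, creating updated model data locally, sending a compressed version, and aggregating local models into a global model) might be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: `MIN_FRAMES`, `Client`, and `aggregate_models` are hypothetical names, the "acoustic model" is reduced to a single scalar mean, and the count-weighted average is one simple aggregation rule chosen for the example.

```python
import json
import zlib
from typing import Dict, List, Optional

MIN_FRAMES = 4  # hypothetical "predetermined amount" of audio


class Client:
    """First device: stores features locally and emits only adaptation data."""

    def __init__(self) -> None:
        self._frames: List[float] = []  # stands in for locally stored audio

    def receive(self, frames: List[float]) -> Optional[bytes]:
        """Buffer frames; return a compressed local model once enough is collected."""
        self._frames.extend(frames)
        if len(self._frames) < MIN_FRAMES:
            return None  # not enough data yet, nothing is transmitted
        n = len(self._frames)
        local_model = {"mean": sum(self._frames) / n, "count": n}
        self._frames.clear()  # the raw frames never leave the device
        return zlib.compress(json.dumps(local_model).encode("utf-8"))


def aggregate_models(payloads: List[bytes]) -> Dict[str, float]:
    """Second device: count-weighted average of local models (assumed rule)."""
    models = [json.loads(zlib.decompress(p)) for p in payloads]
    total = sum(m["count"] for m in models)
    mean = sum(m["mean"] * m["count"] for m in models) / total
    return {"mean": mean, "count": total}


a, b = Client(), Client()
first = a.receive([1.0, 2.0])      # below threshold: returns None
payload_a = a.receive([3.0, 4.0])  # 4 frames buffered, local mean 2.5
payload_b = b.receive([10.0] * 4)  # 4 frames, local mean 10.0
global_model = aggregate_models([payload_a, payload_b])
print(global_model)  # {'mean': 6.25, 'count': 8}
```

Compression here only reduces payload size. Any privacy benefit comes from transmitting the reduced model data rather than the buffered frames, and the encryption mentioned in claim 15 would be a separate step.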

Assignees

Nuance Communications Inc

Inventors

Classifications

  • wherein the identity of one or more communicating identities is hidden (cryptographic mechanisms or cryptographic arrangements for anonymous credentials or for identity based cryptographic systems H04L9/00) · CPC title

  • Protecting personal data, e.g. for financial or medical purposes · CPC title

  • G10L15/065 (Primary)

    Adaptation · CPC title

  • G10L15/04 (Primary)

    Segmentation; Word boundary detection · CPC title

  • to assure secure storage of data (address-based protection against unauthorised use of memory G06F12/14; record carriers for use with machines and with at least a part designed to carry digital markings G06K19/00) · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Nuance Communications Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/065. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jul 28, 2015 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).