Privacy-sensitive speech model creation via aggregation of multiple user models

US9424836B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9424836-B2
Application number  US-201514745630-A
Country             US
Kind code           B2
Filing date         Jun 22, 2015
Priority date       Nov 5, 2012
Publication date    Aug 23, 2016
Grant date          Aug 23, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques disclosed herein include systems and methods for privacy-sensitive training data collection for updating acoustic models of speech recognition systems. In one embodiment, the system locally creates adaptation data from raw audio data. Such adaptation can include derived statistics and/or acoustic model update parameters. The derived statistics and/or updated acoustic model data can then be sent to a speech recognition server or third-party entity. Since the audio data and transcriptions are already processed, the statistics or acoustic model data is devoid of any information that could be human-readable or machine readable such as to enable reconstruction of audio data. Thus, such converted data sent to a server does not include personal or confidential information. Third-party servers can then continually update speech models without storing personal and confidential utterances of users.
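The device-side step described in the abstract can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the acoustic model is a set of Gaussian component means, that each frame is hard-assigned to its nearest mean, and that the "derived statistics" are per-component frame counts and feature sums. The names `AdaptationStats`, `nearest_component`, and `accumulate_stats` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence

Frame = Sequence[float]


@dataclass
class AdaptationStats:
    """Per-component accumulators: the only data that would leave the device."""
    counts: List[float]
    sums: List[List[float]]
    num_utterances: int = 0


def nearest_component(frame: Frame, means: Sequence[Frame]) -> int:
    """Hard-assign a frame to the closest mean (squared Euclidean distance)."""
    best, best_dist = 0, float("inf")
    for k, mean in enumerate(means):
        dist = sum((x - m) ** 2 for x, m in zip(frame, mean))
        if dist < best_dist:
            best, best_dist = k, dist
    return best


def accumulate_stats(utterances: Sequence[Sequence[Frame]],
                     means: Sequence[Frame]) -> AdaptationStats:
    """Reduce feature frames to counts and sums; the frames are not retained."""
    dim = len(means[0])
    stats = AdaptationStats(
        counts=[0.0] * len(means),
        sums=[[0.0] * dim for _ in means],
    )
    for frames in utterances:
        for frame in frames:
            k = nearest_component(frame, means)
            stats.counts[k] += 1.0
            for d, x in enumerate(frame):
                stats.sums[k][d] += x
        stats.num_utterances += 1
    return stats


# Hypothetical toy data: two 2-dimensional components, two short utterances.
means = [[0.0, 0.0], [10.0, 10.0]]
utterances = [[[1.0, 1.0], [9.0, 11.0]], [[-1.0, 1.0]]]
stats = accumulate_stats(utterances, means)
print(stats.counts, stats.sums, stats.num_utterances)
```

Only the accumulated counts and sums would be transmitted in this sketch; a real system would more likely use soft (posterior-weighted) assignments and richer statistics.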

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer-implemented method comprising acts of: receiving, via at least one network, adaptation data generated at least in part by performing statistical processing on audio data comprising at least one user utterance; and using the adaptation data to update at least one acoustic model for use in speech recognition processing, wherein the adaptation data is in a format that prevents reconstruction of the audio data.

2. The computer-implemented method of claim 1, wherein the adaptation data is received via the at least one network in an encrypted form.

3. The computer-implemented method of claim 1, wherein: the adaptation data comprises first adaptation data received from a first device and second adaptation data received from a second device different from the first device; and the act of using the adaptation data comprises aggregating the first adaptation data and the second adaptation data.

4. The computer-implemented method of claim 1, wherein: the adaptation data comprises first adaptation data and second adaptation data; the first adaptation data is generated at least in part by performing statistical processing on first audio data comprising at least one first utterance spoken by a first user; the second adaptation data is generated at least in part by performing statistical processing on second audio data comprising at least one second utterance spoken by a second user different from the first user; and the act of using the adaptation data comprises aggregating the first adaptation data and the second adaptation data.

5. The computer-implemented method of claim 1, wherein the adaptation data is generated at least in part by performing statistical processing on at least a selected threshold amount of audio data.

6. The computer-implemented method of claim 5, wherein the selected threshold amount of audio data is at least 100 utterances.

7. The computer-implemented method of claim 1, wherein the adaptation data comprises at least one update to at least one component of the at least one acoustic model.

8. A system comprising: at least one memory storing executable instructions; and at least one processor programmed by the executable instructions to perform a method comprising acts of: receiving, via at least one network, adaptation data generated at least in part by performing statistical processing on audio data comprising at least one user utterance; and using the adaptation data to update at least one acoustic model for use in speech recognition processing, wherein the adaptation data is in a format that prevents reconstruction of the audio data.

9. The system of claim 8, wherein the adaptation data is received via the at least one network in an encrypted form.

10. The system of claim 8, wherein: the adaptation data comprises first adaptation data received from a first device and second adaptation data received from a second device different from the first device; and the act of using the adaptation data comprises aggregating the first adaptation data and the second adaptation data.

11. The system of claim 8, wherein: the adaptation data comprises first adaptation data and second adaptation data; the first adaptation data is generated at least in part by performing statistical processing on first audio data comprising at least one first utterance spoken by a first user; the second adaptation data is generated at least in part by performing statistical processing on second audio data comprising at least one second utterance spoken by a second user different from the first user; and the act of using the adaptation data comprises aggregating the first adaptation data and the second adaptation data.

12. The system of claim 8, wherein the adaptation data is generated at least in part by performing statistical processing on at least a selected threshold amount of audio data.

13. The system of claim 12, wherein the selected threshold amount of audio data is at least 100 utterances.

14. The system of claim 8, wherein the adaptation data comprises at least one update to at least one component of the at least one acoustic model.

15. At least one non-transitory computer-readable medium having encoded thereon executable instructions which, when executed by at least one processor, cause the at least one processor to perform a method comprising acts of: receiving, via at least one network, adaptation data generated at least in part by performing statistical processing on audio data comprising at least one user utterance; and using the adaptation data to update at least one acoustic model for use in speech recognition processing, wherein the adaptation data is in a format that prevents reconstruction of the audio data.

16. The at least one non-transitory computer-readable medium of claim 15, wherein the adaptation data is received via the at least one network in an encrypted form.

17. The at least one non-transitory computer-readable medium of claim 15, wherein: the adaptation data comprises first adaptation data received from a first device and second adaptation data received from a second device different from the first device; and the act of using the adaptation data comprises aggregating the first adaptation data and the second adaptation data.

18. The at least one non-transitory computer-readable medium of claim 15, wherein: the adaptation data comprises first adaptation data and second adaptation data; the first adaptation data is generated at least in part by performing statistical processing on first audio data comprising at least one first utterance spoken by a first user; the second adaptation data is generated at least in part by performing statistical processing on second audio data comprising at least one second utterance spoken by a second user different from the first user; and the act of using the adaptation data comprises aggregating the first adaptation data and the second adaptation data.

19. The at least one non-transitory computer-readable medium of claim 15, wherein the adaptation data is generated at least in part by performing statistical processing on at least a selected threshold amount of audio data.

20. The at least one non-transitory computer-readable medium of claim 19, wherein the selected threshold amount of audio data is at least 100 utterances.

21. The at least one non-transitory computer-readable medium of claim 15, wherein the adaptation data comprises at least one update to at least one component of the at least one acoustic model.
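The receiving side recited in the claims (aggregating adaptation data from several devices or users, then updating the acoustic model) might look roughly like the sketch below. It rests on several assumptions that the claims do not state: each submission is a dictionary of per-component frame counts and feature sums, the 100-utterance threshold is checked on the server purely for illustration (the claims say only that the data is generated from at least that much audio), and the model update is a MAP-style interpolation of component means with a hypothetical relevance factor `tau`. The names `aggregate` and `update_means` are likewise hypothetical.

```python
from typing import Dict, List, Optional, Sequence

MIN_UTTERANCES = 100  # threshold amount named in claims 6, 13, and 20


def aggregate(submissions: Sequence[Dict]) -> Optional[Dict]:
    """Sum counts and feature sums over submissions that meet the threshold."""
    accepted = [s for s in submissions if s["num_utterances"] >= MIN_UTTERANCES]
    if not accepted:
        return None
    n_comp = len(accepted[0]["counts"])
    dim = len(accepted[0]["sums"][0])
    counts = [0.0] * n_comp
    sums = [[0.0] * dim for _ in range(n_comp)]
    for s in accepted:
        for k in range(n_comp):
            counts[k] += s["counts"][k]
            for d in range(dim):
                sums[k][d] += s["sums"][k][d]
    return {"counts": counts, "sums": sums}


def update_means(means: Sequence[Sequence[float]], pooled: Dict,
                 tau: float = 10.0) -> List[List[float]]:
    """Interpolate each old mean with the pooled data (assumed MAP-style rule)."""
    new_means = []
    for k, mean in enumerate(means):
        n = pooled["counts"][k]
        new_means.append([
            (tau * m + pooled["sums"][k][d]) / (tau + n)
            for d, m in enumerate(mean)
        ])
    return new_means


# Hypothetical submissions from three devices; the third is below the threshold.
means = [[0.0], [4.0]]
first = {"num_utterances": 100, "counts": [10.0, 0.0], "sums": [[20.0], [0.0]]}
second = {"num_utterances": 150, "counts": [10.0, 5.0], "sums": [[40.0], [35.0]]}
too_small = {"num_utterances": 5, "counts": [100.0, 100.0], "sums": [[1000.0], [1000.0]]}

pooled = aggregate([first, second, too_small])
new_means = update_means(means, pooled)
print(pooled, new_means)
```

Because the server in this sketch only ever adds counts and sums, it never needs the underlying audio or transcriptions; whether a given statistic format actually prevents reconstruction depends on what is accumulated and over how much data.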

Assignees

Nuance Communications Inc

Inventors

Classifications

  • G10L15/065 (Primary)

    Adaptation · CPC title

  • Protecting personal data, e.g. for financial or medical purposes · CPC title

  • wherein the identity of one or more communicating identities is hidden (cryptographic mechanisms or cryptographic arrangements for anonymous credentials or for identity based cryptographic systems H04L9/00) · CPC title

  • to assure secure storage of data (address-based protection against unauthorised use of memory G06F12/14; record carriers for use with machines and with at least a part designed to carry digital markings G06K19/00) · CPC title

  • Segmentation; Word boundary detection · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9424836B2 cover?
Techniques disclosed herein include systems and methods for privacy-sensitive training data collection for updating acoustic models of speech recognition systems. In one embodiment, the system locally creates adaptation data from raw audio data. Such adaptation can include derived statistics and/or acoustic model update parameters. The derived statistics and/or updated acoustic model data can t…
Who is the assignee on this patent?
Nuance Communications Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/065. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 23, 2016 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).