Privacy-sensitive speech model creation via aggregation of multiple user models
US-9093069-B2 · Jul 28, 2015 · US
US9424836B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9424836-B2 |
| Application number | US-201514745630-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 22, 2015 |
| Priority date | Nov 5, 2012 |
| Publication date | Aug 23, 2016 |
| Grant date | Aug 23, 2016 |
Official abstract text for this publication.
Techniques disclosed herein include systems and methods for privacy-sensitive training data collection for updating acoustic models of speech recognition systems. In one embodiment, the system locally creates adaptation data from raw audio data. Such adaptation can include derived statistics and/or acoustic model update parameters. The derived statistics and/or updated acoustic model data can then be sent to a speech recognition server or third-party entity. Since the audio data and transcriptions are already processed, the statistics or acoustic model data is devoid of any information that could be human-readable or machine readable such as to enable reconstruction of audio data. Thus, such converted data sent to a server does not include personal or confidential information. Third-party servers can then continually update speech models without storing personal and confidential utterances of users.
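The local step described in the abstract, turning raw audio into derived statistics before anything leaves the device, can be illustrated with a minimal sketch. It assumes the audio has already been converted to fixed-length feature frames by some front end that is not shown, and it uses per-dimension counts, sums, and sums of squares as the derived statistics. That choice, along with the names `AdaptationStats` and `derive_stats`, is hypothetical and is not taken from the patent, which does not fix a particular statistic here. Whether a given set of statistics actually prevents reconstruction of the audio is a property the patent asserts, not something this sketch demonstrates.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class AdaptationStats:
    """Hypothetical container for derived statistics; it holds no raw audio."""
    frame_count: int
    feature_sum: List[float]
    feature_sq_sum: List[float]


def derive_stats(frames: Sequence[Sequence[float]]) -> AdaptationStats:
    """Reduce a sequence of feature frames to per-dimension accumulators.

    Only the frame count, the sum, and the sum of squares per dimension
    are kept; the individual frames and their order are discarded.
    """
    if not frames:
        raise ValueError("need at least one feature frame")
    dim = len(frames[0])
    total = [0.0] * dim
    sq_total = [0.0] * dim
    for frame in frames:
        if len(frame) != dim:
            raise ValueError("all frames must have the same dimension")
        for i, value in enumerate(frame):
            total[i] += value
            sq_total[i] += value * value
    return AdaptationStats(len(frames), total, sq_total)


# Usage: two toy 2-dimensional frames stand in for real acoustic features.
stats = derive_stats([[1.0, 2.0], [3.0, 4.0]])
```

In this sketch only the `AdaptationStats` object would be transmitted; a real system would likely accumulate over many utterances first and, as the dependent claims note, could encrypt the data in transit.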
Opening claim text (preview).
The invention claimed is:
1. A computer-implemented method comprising acts of: receiving, via at least one network, adaptation data generated at least in part by performing statistical processing on audio data comprising at least one user utterance; and using the adaptation data to update at least one acoustic model for use in speech recognition processing, wherein the adaptation data is in a format that prevents reconstruction of the audio data.
2. The computer-implemented method of claim 1, wherein the adaptation data is received via the at least one network in an encrypted form.
3. The computer-implemented method of claim 1, wherein: the adaptation data comprises first adaptation data received from a first device and second adaptation data received from a second device different from the first device; and the act of using the adaptation data comprises aggregating the first adaptation data and the second adaptation data.
4. The computer-implemented method of claim 1, wherein: the adaptation data comprises first adaptation data and second adaptation data; the first adaptation data is generated at least in part by performing statistical processing on first audio data comprising at least one first utterance spoken by a first user; the second adaptation data is generated at least in part by performing statistical processing on second audio data comprising at least one second utterance spoken by a second user different from the first user; and the act of using the adaptation data comprises aggregating the first adaptation data and the second adaptation data.
5. The computer-implemented method of claim 1, wherein the adaptation data is generated at least in part by performing statistical processing on at least a selected threshold amount of audio data.
6. The computer-implemented method of claim 5, wherein the selected threshold amount of audio data is at least 100 utterances.
7. The computer-implemented method of claim 1, wherein the adaptation data comprises at least one update to at least one component of the at least one acoustic model.
8. A system comprising: at least one memory storing executable instructions; and at least one processor programmed by the executable instructions to perform a method comprising acts of: receiving, via at least one network, adaptation data generated at least in part by performing statistical processing on audio data comprising at least one user utterance; and using the adaptation data to update at least one acoustic model for use in speech recognition processing, wherein the adaptation data is in a format that prevents reconstruction of the audio data.
9. The system of claim 8, wherein the adaptation data is received via the at least one network in an encrypted form.
10. The system of claim 8, wherein: the adaptation data comprises first adaptation data received from a first device and second adaptation data received from a second device different from the first device; and the act of using the adaptation data comprises aggregating the first adaptation data and the second adaptation data.
11. The system of claim 8, wherein: the adaptation data comprises first adaptation data and second adaptation data; the first adaptation data is generated at least in part by performing statistical processing on first audio data comprising at least one first utterance spoken by a first user; the second adaptation data is generated at least in part by performing statistical processing on second audio data comprising at least one second utterance spoken by a second user different from the first user; and the act of using the adaptation data comprises aggregating the first adaptation data and the second adaptation data.
12. The system of claim 8, wherein the adaptation data is generated at least in part by performing statistical processing on at least a selected threshold amount of audio data.
13. The system of claim 12, wherein the selected threshold amount of audio data is at least 100 utterances.
14. The system of claim 8, wherein the adaptation data comprises at least one update to at least one component of the at least one acoustic model.
15. At least one non-transitory computer-readable medium having encoded thereon executable instructions which, when executed by at least one processor, cause the at least one processor to perform a method comprising acts of: receiving, via at least one network, adaptation data generated at least in part by performing statistical processing on audio data comprising at least one user utterance; and using the adaptation data to update at least one acoustic model for use in speech recognition processing, wherein the adaptation data is in a format that prevents reconstruction of the audio data.
16. The at least one non-transitory computer-readable medium of claim 15, wherein the adaptation data is received via the at least one network in an encrypted form.
17. The at least one non-transitory computer-readable medium of claim 15, wherein: the adaptation data comprises first adaptation data received from a first device and second adaptation data received from a second device different from the first device; and the act of using the adaptation data comprises aggregating the first adaptation data and the second adaptation data.
18. The at least one non-transitory computer-readable medium of claim 15, wherein: the adaptation data comprises first adaptation data and second adaptation data; the first adaptation data is generated at least in part by performing statistical processing on first audio data comprising at least one first utterance spoken by a first user; the second adaptation data is generated at least in part by performing statistical processing on second audio data comprising at least one second utterance spoken by a second user different from the first user; and the act of using the adaptation data comprises aggregating the first adaptation data and the second adaptation data.
19. The at least one non-transitory computer-readable medium of claim 15, wherein the adaptation data is generated at least in part by performing statistical processing on at least a selected threshold amount of audio data.
20. The at least one non-transitory computer-readable medium of claim 19, wherein the selected threshold amount of audio data is at least 100 utterances.
21. The at least one non-transitory computer-readable medium of claim 15, wherein the adaptation data comprises at least one update to at least one component of the at least one acoustic model.
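The receiving side recited in the claims (aggregating adaptation data from different devices, with the 100-utterance threshold of claims 6, 13, and 20) can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the `DeviceUpdate` record, the `aggregate_mean` function, and the idea of each device reporting its own utterance count are all hypothetical, and the "model update" is reduced to pooling a single mean vector. In the claims the threshold applies to how much audio the adaptation data was generated from; checking a reported count on the server is one assumed way to enforce it.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Threshold taken from claims 6, 13, and 20 ("at least 100 utterances").
MIN_UTTERANCES = 100


@dataclass
class DeviceUpdate:
    """Hypothetical adaptation data received from one device."""
    utterance_count: int      # utterances the statistics were derived from
    frame_count: int          # feature frames accumulated on the device
    feature_sum: List[float]  # per-dimension sum over those frames


def aggregate_mean(updates: Iterable[DeviceUpdate]) -> Optional[List[float]]:
    """Pool statistics from several devices into one mean vector.

    Updates below the utterance threshold are ignored. Returns None
    when no update qualifies, so the caller can leave the model as-is.
    """
    accepted = [
        u for u in updates
        if u.utterance_count >= MIN_UTTERANCES and u.frame_count > 0
    ]
    if not accepted:
        return None
    dim = len(accepted[0].feature_sum)
    total_frames = sum(u.frame_count for u in accepted)
    pooled = [0.0] * dim
    for update in accepted:
        for i, value in enumerate(update.feature_sum):
            pooled[i] += value
    return [s / total_frames for s in pooled]


# Usage: two qualifying devices and one below the threshold.
first = DeviceUpdate(utterance_count=100, frame_count=2, feature_sum=[2.0, 4.0])
second = DeviceUpdate(utterance_count=150, frame_count=2, feature_sum=[6.0, 8.0])
too_small = DeviceUpdate(utterance_count=10, frame_count=1, feature_sum=[100.0, 100.0])
new_mean = aggregate_mean([first, second, too_small])
```

Pooling sums and counts rather than averaging per-device means weights each device by how much data it contributed, which is one reasonable design choice among several.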
Adaptation · CPC title
Protecting personal data, e.g. for financial or medical purposes · CPC title
wherein the identity of one or more communicating identities is hidden (cryptographic mechanisms or cryptographic arrangements for anonymous credentials or for identity based cryptographic systems H04L9/00) · CPC title
to assure secure storage of data (address-based protection against unauthorised use of memory G06F12/14; record carriers for use with machines and with at least a part designed to carry digital markings G06K19/00) · CPC title
Segmentation; Word boundary detection · CPC title