US9093069B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9093069-B2 |
| Application number | US-201213668662-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 5, 2012 |
| Priority date | Nov 5, 2012 |
| Publication date | Jul 28, 2015 |
| Grant date | Jul 28, 2015 |
Official abstract text for this publication.
Techniques disclosed herein include systems and methods for privacy-sensitive training data collection for updating acoustic models of speech recognition systems. In one embodiment, the system locally creates adaptation data from raw audio data. Such adaptation can include derived statistics and/or acoustic model update parameters. The derived statistics and/or updated acoustic model data can then be sent to a speech recognition server or third-party entity. Since the audio data and transcriptions are already processed, the statistics or acoustic model data is devoid of any information that could be human-readable or machine readable such as to enable reconstruction of audio data. Thus, such converted data sent to a server does not include personal or confidential information. Third-party servers can then continually update speech models without storing personal and confidential utterances of users.
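The abstract describes deriving statistics on the device and sending only those to a server. As a minimal sketch of that idea, assuming the audio has already been converted to feature frames and aligned to acoustic states (the abstract does not specify the statistic format), the code below accumulates per-state counts, sums, and sums of squares. The names `StateStats` and `derive_adaptation_data` and the toy frames are hypothetical, chosen for illustration only. Aggregates like these discard frame order and individual values, which hinders reconstruction of the audio, though this sketch makes no formal privacy guarantee.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple

Frame = Sequence[float]  # one acoustic feature vector (hypothetical format)


@dataclass
class StateStats:
    """Zeroth-, first- and second-order statistics for one acoustic state."""
    count: int = 0
    total: List[float] = field(default_factory=list)
    total_sq: List[float] = field(default_factory=list)

    def add(self, frame: Frame) -> None:
        # Lazily size the accumulators on the first frame seen.
        if not self.total:
            self.total = [0.0] * len(frame)
            self.total_sq = [0.0] * len(frame)
        self.count += 1
        for i, x in enumerate(frame):
            self.total[i] += x
            self.total_sq[i] += x * x


def derive_adaptation_data(
    aligned_frames: Sequence[Tuple[str, Frame]],
) -> Dict[str, StateStats]:
    """Reduce (state, frame) pairs to per-state aggregates.

    Only the aggregates are returned; the caller can then discard the
    frames and the underlying audio instead of uploading them.
    """
    stats: Dict[str, StateStats] = {}
    for state, frame in aligned_frames:
        stats.setdefault(state, StateStats()).add(frame)
    return stats


# Toy input: three frames aligned to two hypothetical states.
aligned = [("ah", [1.0, 2.0]), ("ah", [3.0, 4.0]), ("s", [0.5, -0.5])]
adaptation = derive_adaptation_data(aligned)
print(adaptation["ah"].count, adaptation["ah"].total, adaptation["ah"].total_sq)
# prints: 2 [4.0, 6.0] [10.0, 20.0]
```

In this sketch the per-state aggregates stand in for the adaptation data that would be transmitted; the stored audio and transcriptions stay on the first device.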
Opening claim text (preview).
The invention claimed is:
1. A computer-implemented method of speech recognition processing, the computer-implemented method comprising: receiving a spoken utterance; storing audio data from the spoken utterance at a first device; creating adaptation data for updating at least one acoustic model, the adaptation data being created from the audio data via processing at the first device, the adaptation data being in a format that hinders reconstruction of the audio data; and transmitting the adaptation data to a second device for processing.
2. The computer-implemented method of claim 1, wherein creating the adaptation data occurs after collecting a predetermined amount of audio data.
3. The computer-implemented method of claim 1, wherein creating the adaptation data includes deriving statistical data from the audio data.
4. The computer-implemented method of claim 3, wherein transmitting the adaptation data includes transmitting derived statistical data to a server that aggregates derived statistical data from multiple client devices.
5. The computer-implemented method of claim 1, wherein creating the adaptation data includes creating updated acoustic model data.
6. The computer-implemented method of claim 5, wherein transmitting the adaptation data includes transmitting the updated acoustic model data to a server that aggregates local acoustic models into a global acoustic model.
7. The computer-implemented method of claim 5, wherein the updated acoustic model data is a version of an acoustic model used at the second device.
8. The computer-implemented method of claim 1, wherein creating the adaptation data from the audio data includes processing a subset of the audio data and discarding a remaining portion of the audio data.
9. The computer-implemented method of claim 1, wherein storing audio data at the first device includes storing audio data at a computer that is in network communication with a mobile device that received the spoken utterance.
10. The computer-implemented method of claim 1, wherein storing audio data at the first device includes storing audio data at a mobile device that received the spoken utterance.
11. The computer-implemented method of claim 1, wherein storing audio data from the spoken utterance includes storing audio waveform files and corresponding transcriptions; and wherein the adaptation data is in a format that hinders reconstruction of the corresponding transcriptions by human or machine, and hinders reconstruction of the corresponding waveform files by human or machine.
12. The computer-implemented method of claim 1, wherein the adaptation data is in a format that is not readable by human or machine.
13. The computer-implemented method of claim 1, wherein receiving the spoken utterance includes receiving a voice command or voice query at a mobile device.
14. The computer-implemented method of claim 1, wherein transmitting the adaptation data includes sending a compressed version of the adaptation data to the second device.
15. The computer-implemented method of claim 1, wherein transmitting the adaptation data includes sending an encrypted version of the adaptation data to the second device.
16. A system for speech processing, the system comprising: a processor; and a memory coupled to the processor, the memory storing instructions that, when executed by the processor, cause the system to perform the operations of: receiving a spoken utterance; storing audio data from the spoken utterance at a first device; creating adaptation data for updating at least one acoustic model, the adaptation data being created from the audio data via processing at the first device, the adaptation data being in a format that hinders reconstruction of the audio data; and transmitting the adaptation data to a second device for processing.
17. The system of claim 16, wherein creating the adaptation data occurs after collecting a predetermined amount of audio data.
18. The system of claim 16, wherein creating the adaptation data includes deriving statistical data from the audio data, and wherein transmitting the adaptation data includes transmitting derived statistical data to a server that aggregates derived statistical data from multiple client devices.
19. The system of claim 16, wherein creating the adaptation data includes creating updated acoustic model data, and wherein transmitting the adaptation data includes transmitting the updated acoustic model data to a server that aggregates local acoustic models into a global acoustic model.
20. A computer program product including a non-transitory computer-storage medium having instructions stored thereon for processing data information, such that the instructions, when carried out by a processing device, cause the processing device to perform the operations of: receiving a spoken utterance; storing audio data from the spoken utterance at a first device; creating adaptation data for updating at least one acoustic model, the adaptation data being created from the audio data via processing at the first device, the adaptation data being in a format that hinders reconstruction of the audio data; and transmitting the adaptation data to a second device for processing.
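Claims 4 and 18 recite a server that aggregates derived statistical data from multiple client devices. The sketch below illustrates one way such pooling could work, assuming each client reports, per acoustic state, a frame count, a sum vector, and a sum-of-squares vector. The claims do not specify this format; the tuple layout, the function names `aggregate` and `gaussian_from_stats`, and the sample numbers are all hypothetical. Because these statistics are additive, the server can estimate a pooled mean and variance without ever receiving audio.

```python
from typing import Dict, List, Tuple

# Hypothetical report format for one acoustic state:
# (frame count, per-dimension sum, per-dimension sum of squares)
Stats = Tuple[int, List[float], List[float]]


def aggregate(client_reports: List[Dict[str, Stats]]) -> Dict[str, Stats]:
    """Pool per-state statistics from several clients by simple addition."""
    pooled: Dict[str, Stats] = {}
    for report in client_reports:
        for state, (n, s, ss) in report.items():
            if state not in pooled:
                pooled[state] = (n, list(s), list(ss))
            else:
                pn, ps, pss = pooled[state]
                pooled[state] = (
                    pn + n,
                    [a + b for a, b in zip(ps, s)],
                    [a + b for a, b in zip(pss, ss)],
                )
    return pooled


def gaussian_from_stats(stats: Stats) -> Tuple[List[float], List[float]]:
    """Estimate a diagonal Gaussian (mean, variance) from pooled statistics."""
    n, s, ss = stats
    if n == 0:
        raise ValueError("no frames observed for this state")
    mean = [x / n for x in s]
    # Clamp at zero to guard against small negative values from rounding.
    var = [max(q / n - m * m, 0.0) for q, m in zip(ss, mean)]
    return mean, var


# Two clients report statistics for the same hypothetical state "ah".
client_a = {"ah": (2, [4.0, 6.0], [10.0, 20.0])}
client_b = {"ah": (2, [8.0, 2.0], [40.0, 4.0])}

pooled = aggregate([client_a, client_b])
mean, var = gaussian_from_stats(pooled["ah"])
print(pooled["ah"][0], mean, var)
# prints: 4 [3.0, 2.0] [3.5, 2.0]
```

The alternative recited in claims 6 and 19, where clients send updated acoustic model data and the server merges local models into a global one, would replace the additive pooling step with a model-merging step; that variant is not shown here.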
wherein the identity of one or more communicating identities is hidden (cryptographic mechanisms or cryptographic arrangements for anonymous credentials or for identity based cryptographic systems H04L9/00) · CPC title
Protecting personal data, e.g. for financial or medical purposes · CPC title
Adaptation · CPC title
Segmentation; Word boundary detection · CPC title
to assure secure storage of data (address-based protection against unauthorised use of memory G06F12/14; record carriers for use with machines and with at least a part designed to carry digital markings G06K19/00) · CPC title