Distributed voice processing system

US10388273B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10388273-B2
Application number  US-201615233207-A
Country             US
Kind code           B2
Filing date         Aug 10, 2016
Priority date       Aug 10, 2016
Publication date    Aug 20, 2019
Grant date          Aug 20, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed herein are system, method, and computer program product embodiments for distributed voice processing. An embodiment operates by receiving audio data from microphones associated with a plurality of devices distributed across an area of interest. A trigger word is detected in the audio data received from at least one of the devices. Responsive to detecting the trigger word, a voice command processing system associated with a multimedia device is activated. Based on the audio data received from at least two or more of the devices, a voice command associated with the multimedia device is determined. The multimedia device is controlled in accordance with the voice command.
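
The control flow in the abstract (listen, detect a trigger word, activate command processing, determine a command from two or more devices, control the multimedia device) can be sketched as a small state machine. This is a minimal illustration, not the patented implementation: the class name, the trigger word, and the use of text transcripts in place of raw audio are all assumptions, and "determined from two or more devices" is simplified here to "the same command text arrives from at least two devices".

```python
from dataclasses import dataclass, field

TRIGGER_WORD = "hello"  # hypothetical trigger word; the patent does not name one


@dataclass
class VoiceController:
    """Toy controller: idle until the trigger word is heard, then accepts one command."""

    active: bool = False
    executed: list = field(default_factory=list)

    def on_audio(self, transcripts: dict[str, str]) -> None:
        """Process one batch; transcripts maps a device id to the text that device heard."""
        if not self.active:
            # Trigger detection: one device hearing the trigger word is enough.
            if any(TRIGGER_WORD in text.lower().split() for text in transcripts.values()):
                self.active = True
            return

        # Command determination (simplified assumption): count identical
        # command texts and require agreement from at least two devices.
        counts: dict[str, int] = {}
        for text in transcripts.values():
            command = text.strip().lower()
            if command:
                counts[command] = counts.get(command, 0) + 1

        for command, seen in counts.items():
            if seen >= 2:
                # Stand-in for controlling the multimedia device.
                self.executed.append(command)
                self.active = False
                return
```

In this sketch a command spoken before the trigger word is ignored, and a command heard by only one device after activation leaves the controller waiting for a batch in which two devices agree.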

First claim

Opening claim text (preview).

What is claimed is:

1. A method, comprising: receiving, at a computing device during a system sleep mode, audio data from microphones associated with a plurality of devices, including at least one laptop computer and at least one device that is other than a laptop computer, distributed across an area of interest; determining a first timestamp associated with audio data from a first one of the microphones from the at least one laptop computer and a second timestamp associated with audio data from a second one of the microphones from the at least one device that is other than a laptop computer are within a synchronization interval; synchronizing the received audio data from the at least one laptop computer and the at least one device that is other than a laptop computer based on the synchronization interval; detecting a trigger word in the synchronized audio data received from the plurality of devices; responsive to detecting the trigger word, both switching the system in an active mode in which subsequent audio data is received from the plurality of devices at a smaller time interval in the active mode relative a larger time interval in the sleep mode, and activating a voice command processing system associated with a multimedia device; determining, based on the audio data received from at least two or more of the plurality of devices, that the synchronized audio data includes a voice command associated with the multimedia device, wherein at least one microphone associated with each of the two or more of the plurality of devices receives at least a portion of the audio data corresponding to the voice command; and controlling the multimedia device in accordance with the voice command.

2. The method of claim 1, wherein the plurality of devices are communicatively coupled to the computing device via a wireless network.

3. The method of claim 1, wherein the determining that the audio data includes the voice command comprises: receiving, at the computing device, an indication from at least one of the plurality of devices that the trigger word has been received by the at least one of the plurality of devices, wherein the at least one of the plurality of devices determines that at least one microphone associated with the at least one of the plurality of devices received audio data corresponding to the trigger word.

4. The method of claim 1, wherein the determining that the audio data includes the voice command comprises: determining, by the computing device, that the audio data includes the trigger word based on audio data received from the at least two or more of the plurality of devices.

5. The method of claim 1, further comprising: determining a signal-to-noise ratio for audio data received from a subset of the microphones, wherein the subset of the microphones are physically positioned within or proximate to one of the plurality of devices.

6. The method of claim 1, further comprising: identifying which microphone is associated with a highest signal-to-noise ratio of the audio data received from the microphones; and wherein detecting the trigger word comprises using audio data received from the identified microphone to detect said trigger word.

7. The method of claim 1, further comprising: combining the received audio data based on signal-to-noise ratios, wherein one or more of the microphones with highest signal-to-noise ratios are used to enhance the voice command or trigger word from the received audio data, and wherein one or more of the microphones with lowest signal-to-noise ratios are used to cancel out noise from the received audio data associated with the highest signal-to-noise ratios.

8. The method of claim 1, wherein the audio data received from the laptop computer includes combined audio data received from a plurality of microphones associated with the laptop computer that has been combined into the combined audio data by the laptop computer prior to being received at the computing device.

9. The method of claim 1, wherein the synchronizing comprises: applying a first weight to the audio data received from the at least one laptop computer; applying a second weight to the audio data received from the at least one device that is other than a laptop computer; and combining the audio data based on the first weight and the second weight, wherein the first weight and the second weight are assigned based on a signal-to-noise ratio of the audio data received from the laptop computer relative to a signal-to-noise ratio of the audio data received from the one device that is other than the laptop computer.

10. The method of claim 9, wherein combining comprises: determining which audio data has a lower signal-to-noise ratio; and depressing an amount of noise in the audio data with a higher signal-to-noise ratio with the audio data with the lower signal-to-noise ratio.

11. A system comprising: a memory; and at least one processor communicatively coupled to the memory and configured to: receive audio data from microphones associated with a plurality of devices during the system sleep mode, including at least one laptop computer and at least one device that is other than a laptop computer, distributed across an area of interest; determine a first timestamp associated with audio data from a first one of the microphones from the at least one laptop computer and a second timestamp associated with audio data from a second one of the microphones from the at least one device that is other than a laptop computer are within a synchronization interval; synchronize the received audio data from the at least one laptop computer and the at least one device that is other than a laptop computer based on the synchronization interval; detect a trigger word in the synchronized audio data received from the plurality of devices; responsive to detecting the trigger word, both switch the system in an active mode in which subsequent audio data is received from the plurality of devices at a smaller time interval in the active mode relative a larger time interval in the sleep mode and activate a voice command processing system associated with a multimedia device; determine, based on the audio data received from at least two or more of the plurality of devices, that the audio data includes a voice command associated with the multimedia device, wherein at least one microphone associated with each of the two or more of the plurality of devices receives at least a portion of the voice command; and signal the multimedia device to perform an action in accordance with the voice command.

12. The system of claim 11, wherein to detect a trigger word in the audio data, the at least one processor is configured to: receive an indication from at least one of the plurality of devices that the trigger word has been received by one of the microphones, wherein the at least one of the plurality of devices determines that at least one microphone associated with the at least one of the plurality of devices received audio data corresponding to the trigger word.

13. The system of claim 11, wherein to determine that the received audio data includes the voice command, the at least one processor is configured to: determine that the audio data includes the trigger word based on the audio data received from the two or more of the plurality of devices.

14. The system of claim 11, wherein to receive audio data, the at least one processor is configured to: determine a signal-to-noise ratio for audio data received from each of the plurality of devices.

15. The system of claim 14, wherein to determine the signal-to-noise ratio
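
The timestamp check of claim 1 and the signal-to-noise steps of claims 5, 6, and 9 can be illustrated with a few helper functions. This is a sketch under stated assumptions, not the method the patent prescribes: all function names are hypothetical, audio frames are plain lists of samples, and the weights are taken to be proportional to linear SNR, a formula the claims do not specify.

```python
import math


def within_sync_interval(ts_a: float, ts_b: float, interval: float) -> bool:
    """True when two capture timestamps (seconds) fall within the synchronization interval."""
    return abs(ts_a - ts_b) <= interval


def snr_db(signal_power: float, noise_power: float) -> float:
    """Signal-to-noise ratio in decibels from signal and noise power."""
    return 10.0 * math.log10(signal_power / noise_power)


def best_microphone(snrs_db: dict[str, float]) -> str:
    """Return the id of the microphone with the highest SNR (compare claim 6)."""
    return max(snrs_db, key=snrs_db.get)


def combine_weighted(frames: dict[str, list[float]], snrs_db: dict[str, float]) -> list[float]:
    """Mix per-device frames with weights derived from SNR (compare claim 9).

    Assumption: each weight is the device's linear SNR divided by the sum of
    all linear SNRs, so the weights add up to 1.
    """
    linear = {device: 10.0 ** (snrs_db[device] / 10.0) for device in frames}
    total = sum(linear.values())
    length = min(len(samples) for samples in frames.values())
    return [
        sum(linear[device] / total * frames[device][i] for device in frames)
        for i in range(length)
    ]
```

For example, with a laptop frame at 10 dB and a second device at 0 dB, the laptop samples receive a weight of 10/11 and the other device 1/11. The noise-cancelling step of claims 7 and 10 is not modeled here.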

Assignees

Roku Inc

Inventors

Classifications

  • Noise filtering · CPC title

  • Word spotting · CPC title

  • G10L15/08 (Primary)

    Speech classification or search · CPC title

  • Execution procedure of a spoken command · CPC title

  • G10L15/22 (Primary)

    Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10388273B2 cover?
Disclosed herein are system, method, and computer program product embodiments for distributed voice processing. An embodiment operates by receiving audio data from microphones associated with a plurality of devices distributed across an area of interest. A trigger word is detected in the audio data received from at least one of the devices. Responsive to detecting the trigger word, a voice comm…
Who is the assignee on this patent?
Roku Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/08. Mapped technology areas include Physics.
When was this patent published?
Publication date: Aug 20, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).