Audio user interaction recognition and context refinement

US9736604B2 · US · B2

Patent metadata
Publication number: US-9736604-B2
Application number: US-201213674690-A
Country: US
Kind code: B2
Filing date: Nov 12, 2012
Priority date: May 11, 2012
Publication date: Aug 15, 2017
Grant date: Aug 15, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system which tracks a social interaction between a plurality of participants, includes a fixed beamformer that is adapted to output a first spatially filtered output and configured to receive a plurality of second spatially filtered outputs from a plurality of steerable beamformers. Each steerable beamformer outputs a respective one of the second spatially filtered outputs associated with a different one of the participants. The system also includes a processor capable of determining a similarity between the first spatially filtered output and each of the second spatially filtered outputs. The processor determines the social interaction between the participants based on the similarity between the first spatially filtered output and each of the second spatially filtered outputs.
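
The similarity step described in the abstract can be illustrated with a minimal sketch. It assumes the spatially filtered outputs are available as equal-length lists of samples and uses zero-lag normalized correlation, which is only one of the measures the claims list. The function names and the participant labels are hypothetical and do not come from the patent.

```python
import math


def normalized_correlation(x, y):
    """Zero-lag Pearson correlation between two equal-length sample lists."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("signals must be non-empty and of equal length")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    den = math.sqrt(
        sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y)
    )
    # A constant (zero-variance) signal carries no shape to compare against.
    return num / den if den > 0 else 0.0


def most_similar_participant(fixed_output, steerable_outputs):
    """Return (participant_id, score) for the steerable output that is most
    similar to the given fixed-beamformer output.

    steerable_outputs maps a hypothetical participant label to that
    participant's steerable-beamformer output.
    """
    scores = {
        pid: normalized_correlation(fixed_output, signal)
        for pid, signal in steerable_outputs.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]
```

In this sketch, a high score for one participant suggests that the participant's steerable beam is capturing the same source as the fixed beam. How such scores map to a social interaction or a participation status is defined by the patent's description, not by this example.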

First claim

Opening claim text (preview).

What is claimed:

1. A system to track social interactions between a plurality of participants, comprising: a fixed beamformer configured to: receive a plurality of second spatially filtered beam outputs from a plurality of steerable beamformers, each steerable beamformer configured to output a respective one of the second spatially filtered beam outputs and associated with a different participant of the plurality of participants; and generate a plurality of first spatially filtered beam outputs corresponding to a plurality of active speakers of the plurality of participants, the plurality of first spatially filtered beam outputs indicating a number of active speakers of the plurality of active speakers; and a processor configured to: determine similarities between the plurality of first spatially filtered beam outputs and the plurality of second spatially filtered beam outputs; based on the similarities, output a plurality of speaker identifiers (IDs), each speaker ID of the plurality of speaker IDs corresponding to a different active speaker of the plurality of active speakers; based on the similarities, determine the social interactions between the plurality of participants; and identify a participation status associated with each steerable beamformer based on the social interactions.

2. The system of claim 1, wherein the fixed beamformer comprises a fixed microphone array, and wherein each of the steerable beamformers comprises a steerable microphone array.

3. The system of claim 1, wherein the fixed beamformer and the processor are included within a mobile device configured to track the social interactions and display a representation of the social interactions between the plurality of participants.

4. The system of claim 1, wherein the fixed beamformer and the processor are included in at least one of a handset, a laptop, a tablet, a computer, or a netbook.

5. The system of claim 1, wherein each of the plurality of steerable beamformers is included within a respective device, and wherein each respective device is configured to be associated with a different one of the participants.

6. The system of claim 5, wherein each respective device comprises a headset worn by an associated participant.

7. The system of claim 1, further comprising a user interface configured to display information representative of the social interactions between the participants, the information including the participation status.

8. The system of claim 7, wherein a user interface display of the user interface is configured to graphically display representative indicators for all of the plurality of participants at once.

9. The system of claim 8, wherein the user interface display is configured to zoom in on one of the participants via the user interface.

10. The system of claim 1, further comprising a mobile device configured to track the social interactions and display a representation of the social interactions between the plurality of participants.

11. The system of claim 1, wherein the processor is configured to calculate a correlation between the plurality of first spatially filtered beam outputs of the fixed beamformer and a selected one of the second spatially filtered beam outputs of the steerable beamformers.

12. The system of claim 11, wherein the fixed beamformer is included within a first mobile device, and a selected steerable beamformer is included within a second mobile device that is different from the first mobile device.

13. The system of claim 1, wherein the similarities are determined based on at least one of: a correlation, a least square fit with allowable time adjustment in a time domain or a frequency domain, a feature based approach based on at least one of linear prediction coding (LPC), mel-frequency cepstral coefficients (MFCC), or cross-cumulant, an empirical Kullback-Leibler divergence, or an Itakura-Saito distance.

14. The system of claim 1, wherein the processor is further configured to determine a location of at least one of the participants.

15. A system to determine a similarity between an output of a fixed microphone array and outputs of a plurality of steerable microphone arrays, comprising: a processor configured to: receive first spatially filtered beam outputs from the fixed microphone array and second spatially filtered beam outputs from the steerable microphone arrays, wherein the first spatially filtered beam outputs are associated with a plurality of active speakers of a plurality of participants and the second spatially filtered beam outputs are associated with the plurality of participants, and wherein the first spatially filtered beam outputs indicate a number of active speakers of the plurality of participants; and determine similarities between the first spatially filtered beam outputs and the second spatially filtered beam outputs; and an output device that is configured to output, based on the similarities, a plurality of speaker identifiers (IDs), each speaker ID of the plurality of speaker IDs corresponding to a different active speaker of the plurality of active speakers, wherein the output device is further configured to output, based on the similarities, social interactions between the plurality of participants.

16. The system of claim 15, wherein each spatially filtered beam output comprises an audio beam output.

17. The system of claim 15, wherein the processor is further configured to determine the similarities between the first spatially filtered beam outputs and the second spatially filtered beam outputs a plurality of times, once for each of the steerable microphone arrays.

18. The system of claim 15, wherein the processor and the output device are included within a device comprising at least one of a handset, a laptop, a tablet, a computer, or a netbook.

19. The system of claim 15, wherein the processor is further configured to: determine a first active speaker of the plurality of participants based on an estimated direction of signal arrival; separate a spatially filtered beam output corresponding to the first active speaker from the output of the fixed microphone array using the estimated direction of signal arrival; and determine second similarities between the outputs of the steerable microphone arrays and the output of the fixed microphone array based on the first spatially filtered beam outputs, the second spatially filtered beam outputs, and the separated spatially filtered beam output corresponding to the first active speaker.

20. The system of claim 19, wherein the estimated direction of signal arrival is estimated in three dimensions (3D).

21. The system of claim 15, wherein the second spatially filtered beam outputs correspond to look directions of the plurality of participants.

22. The system of claim 19, wherein the second spatially filtered beam outputs are generated by fixed broadside beamforming from active noise control (ANC) headsets.

23. The system of claim 15, wherein the second spatially filtered beam outputs of the steerable microphone arrays indicate at least one look direction of at least one active speaker, and wherein to determine the similarities, the processor is configured to: for each active speaker of the at least one active speaker: find a maximum peak of a cross-correlation equation based on a separated output of the fixed microphone array and a look direction of the active speaker; and determine an angle of strong correlation associated w
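
The claim text is cut off in this record, so the following is only a generic sketch of finding "a maximum peak of a cross-correlation" between two signals, with a bounded time adjustment. It assumes both signals are plain lists of samples; the function name `cross_correlation_peak` and the `max_lag` parameter are hypothetical, and the mapping from the peak to an angle is not shown because the source does not give it here.

```python
import math


def cross_correlation_peak(fixed_beam, steerable_beam, max_lag):
    """Search lags in [-max_lag, max_lag] (in samples) and return (lag, peak),
    where peak is the largest energy-normalized cross-correlation value.

    A positive lag means steerable_beam is delayed relative to fixed_beam.
    """
    energy = math.sqrt(
        sum(v * v for v in fixed_beam) * sum(v * v for v in steerable_beam)
    )
    if energy == 0:
        return 0, 0.0  # at least one signal is silent; nothing to correlate

    best_lag, best_val = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for i, sample in enumerate(fixed_beam):
            j = i + lag
            if 0 <= j < len(steerable_beam):
                acc += sample * steerable_beam[j]
        val = acc / energy
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag, best_val
```

A peak close to 1 at some lag suggests the two beams carry the same source up to a delay; a low peak at every lag suggests they do not. The direct double loop keeps the sketch dependency-free; an FFT-based correlation would be the usual choice for long signals.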

Assignees

Qualcomm Inc

Inventors

Classifications

  • H04R29/005 · Primary

    Microphone arrays · CPC title

  • using ultrasonic, sonic or infrasonic waves · CPC title

  • the extracted parameters being correlation coefficients · CPC title

  • G10L25/48 · Primary

    specially adapted for particular use · CPC title

  • Conference systems · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9736604B2 cover?
A system which tracks a social interaction between a plurality of participants, includes a fixed beamformer that is adapted to output a first spatially filtered output and configured to receive a plurality of second spatially filtered outputs from a plurality of steerable beamformers. Each steerable beamformer outputs a respective one of the second spatially filtered outputs associated with a d…
Who is the assignee on this patent?
Qualcomm Inc
What technology area does this patent fall under?
Primary CPC classification H04R29/005. Mapped technology areas include Electricity.
When was this patent published?
Publication date: Aug 15, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).