Multi-modal far field user interfaces and vision-assisted audio processing

US11830289B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11830289-B2
Application number: US-202016898721-A
Country: US
Kind code: B2
Filing date: Jun 11, 2020
Priority date: Dec 11, 2017
Publication date: Nov 28, 2023
Grant date: Nov 28, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Far field devices typically rely on audio only for enabling user interaction and involve only audio processing. Adding a vision-based modality can greatly improve the user interface of far field devices to make them more natural to the user. For instance, users can look at the device to interact with it rather than having to repeatedly utter a wakeword. Vision can also be used to assist audio processing, such as to improve the beamformer. For instance, vision can be used for direction of arrival estimation. Combining vision and audio can greatly enhance the user interface and performance of far field devices.
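
The abstract mentions using vision for direction of arrival estimation to assist the beamformer, without saying how. As a rough illustration only, the sketch below assumes a pinhole camera co-located with a linear microphone array and a far-field plane wave; the function names, the field-of-view parameter, and the delay sign convention are hypothetical and are not taken from the patent.

```python
import math
from typing import List


def azimuth_from_pixel(x_center: float, image_width: int,
                       horizontal_fov_deg: float) -> float:
    """Map the horizontal centre of a detected person to an azimuth in degrees.

    Assumption: ideal pinhole camera, optical axis at the image centre.
    Negative values are left of the axis, positive values right of it.
    """
    # Focal length in pixels derived from the horizontal field of view.
    focal_px = (image_width / 2) / math.tan(math.radians(horizontal_fov_deg) / 2)
    return math.degrees(math.atan((x_center - image_width / 2) / focal_px))


def steering_delays(azimuth_deg: float, mic_positions_m: List[float],
                    speed_of_sound: float = 343.0) -> List[float]:
    """Per-microphone delays in seconds for a delay-and-sum beamformer.

    Assumption: microphones lie on a line (positions in metres along that
    line) and the source is far enough away to be treated as a plane wave.
    """
    s = math.sin(math.radians(azimuth_deg))
    return [p * s / speed_of_sound for p in mic_positions_m]


# Usage: a person centred at pixel column 480 in a 640-pixel-wide frame.
angle = azimuth_from_pixel(480, 640, horizontal_fov_deg=90)
delays = steering_delays(angle, [0.0, 0.05, 0.10])
```

A real device would also need camera-to-array calibration, which this sketch leaves out.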

First claim

Opening claim text (preview).

What is claimed is:

1. A method for interferer rejection in vision-based attention detection, comprising: applying a people detector to a video frame of a video stream, to determine a bounding box of a detected person; detecting an interferer in the video stream, wherein the interferer is an inanimate stationary rectangular object; and in response to determining that the bounding box of the detected person is contained within a bounding box of the interferer, ignoring the bounding box of the detected person for attention detection processing.

2. The method of claim 1, wherein: detecting the interferer comprises determining a lack of features indicating attention in an area of the video frame where a person was detected.

3. The method of claim 1, further comprising: detecting the interferer comprises applying a classifier trained to detect classes of interferers, to video frames of the video stream.

4. The method of claim 1, further comprising: in response to determining that the bounding box does not include the interferer, applying a frontal face detector to the bounding box to detect attention.

5. The method of claim 1, further comprising: maintaining a list of one or more detected interferers across video frames, wherein the list comprises one or more bounding boxes of the detected interferers.

6. The method of claim 1, further comprising: maintaining state information across video frames for one or more previously-detected people, wherein the state information for a given previously-detected person tracks a starting time when feature indicating attention is detected for the given previously-detected person.

7. The method of claim 1, further comprising: detecting a frontal face in the bounding box; and maintaining state information across video frames for one or more previously-detected persons, wherein the state information for a given previously-detected person tracks a period of time that the frontal face has been detected for the given previously-detected person.

8. The method of claim 7, further comprising: comparing the period of time that the frontal face has been detected for the given previously-detected person against a threshold; and outputting an attention event in response to determining that the period of time exceeds the threshold.

9. A method for interferer rejection in vision-based attention detection, comprising: detecting a user in a video frame of a video stream; detecting an interferer in the video stream, wherein the interferer is an inanimate rectangular object; and in response to determining that the interferer is co-located with the user, ignoring the user for attention detection processing being executed by a far field device.

10. The method of claim 9, wherein: detecting the interferer comprises determining a lack of features indicating attention in an area of the video frame where the user was detected.

11. The method of claim 9, further comprising: detecting the interferer comprises applying a classifier trained to detect classes of interferers to video frames of the video stream.

12. The method of claim 9, further comprising: waking up the far field device based on the user looking at the far field device.
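
The containment test of claim 1 and the dwell-time threshold of claims 7 and 8 can be illustrated with a small sketch. This is not the patented implementation: the box layout `(x_min, y_min, x_max, y_max)`, the names `contains`, `filter_people`, and `AttentionTimer`, and the use of integer person IDs are all assumptions made for illustration, and the people, interferer, and frontal-face detectors are taken as given.

```python
from typing import Dict, List, Tuple

# Hypothetical box layout: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def contains(outer: Box, inner: Box) -> bool:
    """True if `inner` lies entirely within `outer` (edges may touch)."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def filter_people(people: List[Box], interferers: List[Box]) -> List[Box]:
    """Drop person boxes contained in any interferer box (e.g. a TV screen)."""
    return [p for p in people if not any(contains(i, p) for i in interferers)]


class AttentionTimer:
    """Tracks, per person, how long a frontal face has been seen continuously."""

    def __init__(self, threshold_s: float) -> None:
        self.threshold_s = threshold_s
        self.start: Dict[int, float] = {}  # person id -> time face first seen

    def update(self, person_id: int, frontal_face: bool, now_s: float) -> bool:
        """Return True while the face has been visible longer than the threshold."""
        if not frontal_face:
            # Face lost: forget the start time so the dwell period restarts.
            self.start.pop(person_id, None)
            return False
        started = self.start.setdefault(person_id, now_s)
        return now_s - started > self.threshold_s


# Usage: the first person box sits inside the interferer box and is ignored.
kept = filter_people([(10, 10, 20, 20), (100, 100, 150, 200)], [(0, 0, 50, 50)])
timer = AttentionTimer(threshold_s=1.0)
```

In this sketch `update` keeps returning True on every frame after the threshold is crossed; a device that wants a single attention event per glance would need to latch the first True.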

Assignees

Analog Devices Inc

Inventors

Classifications

  • G06V40/172 (Primary)

    Classification, e.g. identification · CPC title

  • Determining position or orientation of objects or cameras (camera calibration G06T7/80) · CPC title

  • Control circuits for electronic adaptation of the sound field · CPC title

  • Video; Image sequence · CPC title

  • Face · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11830289B2 cover?
Far field devices typically rely on audio only for enabling user interaction and involve only audio processing. Adding a vision-based modality can greatly improve the user interface of far field devices to make them more natural to the user. For instance, users can look at the device to interact with it rather than having to repeatedly utter a wakeword. Vision can also be used to assist audio p…
Who is the assignee on this patent?
Analog Devices Inc
What technology area does this patent fall under?
Primary CPC classification G06V40/172. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 28, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 11 related publications on this page (citations in our corpus or others sharing the same primary CPC).