What technology area does this patent fall under?

Primary CPC classification G06V40/172. Mapped technology areas include Physics.

When was this patent published?

Publication date Nov 28, 2023 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 11 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Multi-modal far field user interfaces and vision-assisted audio processing

US11830289B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11830289-B2
Application number	US-202016898721-A
Country	US
Kind code	B2
Filing date	Jun 11, 2020
Priority date	Dec 11, 2017
Publication date	Nov 28, 2023
Grant date	Nov 28, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Far field devices typically rely on audio only for enabling user interaction and involve only audio processing. Adding a vision-based modality can greatly improve the user interface of far field devices to make them more natural to the user. For instance, users can look at the device to interact with it rather than having to repeatedly utter a wakeword. Vision can also be used to assist audio processing, such as to improve the beamformer. For instance, vision can be used for direction of arrival estimation. Combining vision and audio can greatly enhance the user interface and performance of far field devices.
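As a rough illustration of how vision could feed direction-of-arrival estimation, the sketch below converts the horizontal position of a detected person in the frame into an azimuth angle that a beamformer could steer toward. It is only a sketch under stated assumptions, not the method disclosed in the patent: it assumes an ideal pinhole camera with a known horizontal field of view, mounted at the same position and orientation as the microphone array. The function name `azimuth_from_pixel` and its parameters are hypothetical.

```python
import math


def azimuth_from_pixel(x_center: float, frame_width: int, hfov_deg: float) -> float:
    """Estimate azimuth in degrees for a pixel column.

    0 is the optical axis; positive angles are to the right of centre.
    Assumes an ideal pinhole camera (no lens distortion) with horizontal
    field of view hfov_deg. These are illustrative assumptions only.
    """
    # Focal length expressed in pixels, derived from the field of view.
    focal_px = (frame_width / 2) / math.tan(math.radians(hfov_deg) / 2)
    # Horizontal offset of the detection from the image centre.
    offset_px = x_center - frame_width / 2
    return math.degrees(math.atan2(offset_px, focal_px))


# Example: centre of a person's bounding box in a 640-pixel-wide frame.
angle = azimuth_from_pixel(x_center=480, frame_width=640, hfov_deg=90)
print(f"steer beamformer toward {angle:.1f} degrees")
```

A real device would also need camera calibration and a known offset between camera and microphones, which this sketch leaves out.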

First claim

Opening claim text (preview).

What is claimed is:

1. A method for interferer rejection in vision-based attention detection, comprising: applying a people detector to a video frame of a video stream, to determine a bounding box of a detected person; detecting an interferer in the video stream, wherein the interferer is an inanimate stationary rectangular object; and in response to determining that the bounding box of the detected person is contained within a bounding box of the interferer, ignoring the bounding box of the detected person for attention detection processing.
2. The method of claim 1, wherein: detecting the interferer comprises determining a lack of features indicating attention in an area of the video frame where a person was detected.
3. The method of claim 1, further comprising: detecting the interferer comprises applying a classifier trained to detect classes of interferers, to video frames of the video stream.
4. The method of claim 1, further comprising: in response to determining that the bounding box does not include the interferer, applying a frontal face detector to the bounding box to detect attention.
5. The method of claim 1, further comprising: maintaining a list of one or more detected interferers across video frames, wherein the list comprises one or more bounding boxes of the detected interferers.
6. The method of claim 1, further comprising: maintaining state information across video frames for one or more previously-detected people, wherein the state information for a given previously-detected person tracks a starting time when feature indicating attention is detected for the given previously-detected person.
7. The method of claim 1, further comprising: detecting a frontal face in the bounding box; and maintaining state information across video frames for one or more previously-detected persons, wherein the state information for a given previously-detected person tracks a period of time that the frontal face has been detected for the given previously-detected person.
8. The method of claim 7, further comprising: comparing the period of time that the frontal face has been detected for the given previously-detected person against a threshold; and outputting an attention event in response to determining that the period of time exceeds the threshold.
9. A method for interferer rejection in vision-based attention detection, comprising: detecting a user in a video frame of a video stream; detecting an interferer in the video stream, wherein the interferer is an inanimate rectangular object; and in response to determining that the interferer is co-located with the user, ignoring the user for attention detection processing being executed by a far field device.
10. The method of claim 9, wherein: detecting the interferer comprises determining a lack of features indicating attention in an area of the video frame where the user was detected.
11. The method of claim 9, further comprising: detecting the interferer comprises applying a classifier trained to detect classes of interferers to video frames of the video stream.
12. The method of claim 9, further comprising: waking up the far field device based on the user looking at the far field device.
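The logic of claims 1, 7, and 8 can be sketched in a few lines. The example below is an illustrative reading of the claim language, not the patentee's implementation: it assumes axis-aligned bounding boxes in pixel coordinates, treats "contained within" as full geometric containment, and takes the people, interferer, and frontal-face detections as given inputs. The names `Box`, `reject_interfered`, and `AttentionState` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixel coordinates (assumed format)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, other: "Box") -> bool:
        # True when `other` lies entirely inside this box.
        return (self.left <= other.left and self.top <= other.top
                and self.right >= other.right and self.bottom >= other.bottom)


def reject_interfered(people: List[Box], interferers: List[Box]) -> List[Box]:
    """Claim 1 sketch: ignore people whose box is inside any interferer box."""
    return [p for p in people if not any(i.contains(p) for i in interferers)]


@dataclass
class AttentionState:
    """Claims 7-8 sketch: track how long a frontal face has been seen."""
    frontal_since: Optional[float] = None

    def update(self, frontal_face_seen: bool, now: float, threshold_s: float) -> bool:
        # Reset the timer as soon as the frontal face is lost.
        if not frontal_face_seen:
            self.frontal_since = None
            return False
        # Record the starting time on the first frame with a frontal face.
        if self.frontal_since is None:
            self.frontal_since = now
        # Signal attention once the dwell time exceeds the threshold.
        return (now - self.frontal_since) > threshold_s


# A person shown on a television (the interferer) is ignored;
# a real viewer standing beside it is kept.
tv = Box(100, 50, 500, 300)
person_on_tv = Box(200, 100, 300, 250)
viewer = Box(550, 80, 650, 400)
print(reject_interfered([person_on_tv, viewer], [tv]))
```

In this sketch `update` keeps returning `True` on every frame after the threshold is crossed; a device that should emit a single attention event would need to latch or debounce that signal.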

Assignees

Analog Devices Inc

Inventors

Classifications

G06V40/172 · Primary
Classification, e.g. identification · CPC title
G06T7/70
Determining position or orientation of objects or cameras (camera calibration G06T7/80) · CPC title
H04S7/30
Control circuits for electronic adaptation of the sound field · CPC title
G06T2207/10016
Video; Image sequence · CPC title
G06T2207/30201
Face · CPC title

Patent family

Related publications grouped by family.

View patent family 66819463

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11830289B2 cover?: Far field devices typically rely on audio only for enabling user interaction and involve only audio processing. Adding a vision-based modality can greatly improve the user interface of far field devices to make them more natural to the user. For instance, users can look at the device to interact with it rather than having to repeatedly utter a wakeword. Vision can also be used to assist audio p…
Who is the assignee on this patent?: Analog Devices Inc