US11830289B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11830289-B2 |
| Application number | US-202016898721-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 11, 2020 |
| Priority date | Dec 11, 2017 |
| Publication date | Nov 28, 2023 |
| Grant date | Nov 28, 2023 |
Official abstract text for this publication.
Far field devices typically rely on audio only for enabling user interaction and involve only audio processing. Adding a vision-based modality can greatly improve the user interface of far field devices to make them more natural to the user. For instance, users can look at the device to interact with it rather than having to repeatedly utter a wakeword. Vision can also be used to assist audio processing, such as to improve the beamformer. For instance, vision can be used for direction of arrival estimation. Combining vision and audio can greatly enhance the user interface and performance of far field devices.
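The abstract mentions using vision for direction of arrival estimation but does not say how. One simple way to picture it, offered only as a sketch, is to convert the horizontal center of a detected person's bounding box into an azimuth angle with a pinhole camera model. The function name `bbox_center_to_azimuth_deg`, the `horizontal_fov_deg` parameter, and the assumption of an undistorted, centered camera are all illustrative and are not taken from the patent.

```python
import math


def bbox_center_to_azimuth_deg(x_min, x_max, frame_width, horizontal_fov_deg):
    """Map a bounding box's horizontal center to an azimuth in degrees.

    Hypothetical helper: assumes an ideal pinhole camera whose optical axis
    passes through the image center. 0 means straight ahead; positive angles
    are toward the right edge of the frame.
    """
    center_x = (x_min + x_max) / 2.0
    # Focal length in pixels implied by the horizontal field of view.
    focal_px = (frame_width / 2.0) / math.tan(math.radians(horizontal_fov_deg) / 2.0)
    # Angle between the optical axis and the ray through the box center.
    return math.degrees(math.atan2(center_x - frame_width / 2.0, focal_px))


# Example: a person centered in a 640-pixel-wide frame is straight ahead.
straight_ahead = bbox_center_to_azimuth_deg(300, 340, 640, 90.0)
# Example: a box collapsed onto the right edge sits at half the field of view.
right_edge = bbox_center_to_azimuth_deg(640, 640, 640, 90.0)
```

An angle obtained this way could, in principle, be handed to a beamformer as a steering hint; how the patented system actually combines the two modalities is described in the full specification, not here.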
Opening claim text (preview).
What is claimed is:
1. A method for interferer rejection in vision-based attention detection, comprising: applying a people detector to a video frame of a video stream, to determine a bounding box of a detected person; detecting an interferer in the video stream, wherein the interferer is an inanimate stationary rectangular object; and in response to determining that the bounding box of the detected person is contained within a bounding box of the interferer, ignoring the bounding box of the detected person for attention detection processing.
2. The method of claim 1, wherein: detecting the interferer comprises determining a lack of features indicating attention in an area of the video frame where a person was detected.
3. The method of claim 1, further comprising: detecting the interferer comprises applying a classifier trained to detect classes of interferers, to video frames of the video stream.
4. The method of claim 1, further comprising: in response to determining that the bounding box does not include the interferer, applying a frontal face detector to the bounding box to detect attention.
5. The method of claim 1, further comprising: maintaining a list of one or more detected interferers across video frames, wherein the list comprises one or more bounding boxes of the detected interferers.
6. The method of claim 1, further comprising: maintaining state information across video frames for one or more previously-detected people, wherein the state information for a given previously-detected person tracks a starting time when feature indicating attention is detected for the given previously-detected person.
7. The method of claim 1, further comprising: detecting a frontal face in the bounding box; and maintaining state information across video frames for one or more previously-detected persons, wherein the state information for a given previously-detected person tracks a period of time that the frontal face has been detected for the given previously-detected person.
8. The method of claim 7, further comprising: comparing the period of time that the frontal face has been detected for the given previously-detected person against a threshold; and outputting an attention event in response to determining that the period of time exceeds the threshold.
9. A method for interferer rejection in vision-based attention detection, comprising: detecting a user in a video frame of a video stream; detecting an interferer in the video stream, wherein the interferer is an inanimate rectangular object; and in response to determining that the interferer is co-located with the user, ignoring the user for attention detection processing being executed by a far field device.
10. The method of claim 9, wherein: detecting the interferer comprises determining a lack of features indicating attention in an area of the video frame where the user was detected.
11. The method of claim 9, further comprising: detecting the interferer comprises applying a classifier trained to detect classes of interferers to video frames of the video stream.
12. The method of claim 9, further comprising: waking up the far field device based on the user looking at the far field device.
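The containment test of claim 1, the interferer list of claim 5, and the dwell-time threshold of claims 6 to 8 can be pictured with a short sketch. This is an illustration under stated assumptions, not the patented implementation: the names `contains`, `reject_interfered`, and `AttentionTracker`, the `(x_min, y_min, x_max, y_max)` box layout, the per-person integer IDs, and the threshold value are all hypothetical. The people detector, interferer classifier, and frontal face detector are assumed to exist upstream and are represented here only by their outputs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def contains(outer: Box, inner: Box) -> bool:
    """Return True if `inner` lies entirely within `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def reject_interfered(people: List[Box], interferers: List[Box]) -> List[Box]:
    """Drop person boxes contained in any tracked interferer box
    (for example, a face shown on a television screen)."""
    return [p for p in people
            if not any(contains(i, p) for i in interferers)]


@dataclass
class AttentionTracker:
    """Per-person state: the time a frontal face was first seen."""
    threshold_s: float
    start_times: Dict[int, float] = field(default_factory=dict)

    def update(self, person_id: int, frontal_face: bool, now_s: float) -> bool:
        """Return True once the frontal face has persisted past the threshold.

        Note: this returns True on every frame after the threshold is
        exceeded; suppressing repeated events is left out of the sketch.
        """
        if not frontal_face:
            # Attention lost: forget the start time so the timer restarts.
            self.start_times.pop(person_id, None)
            return False
        start = self.start_times.setdefault(person_id, now_s)
        return (now_s - start) > self.threshold_s


# Usage: one person box sits inside a screen-shaped interferer, one does not.
interferer_list = [(100.0, 50.0, 300.0, 350.0)]
detections = [(110.0, 60.0, 200.0, 300.0), (400.0, 50.0, 500.0, 400.0)]
kept = reject_interfered(detections, interferer_list)

tracker = AttentionTracker(threshold_s=1.0)
events = [tracker.update(7, True, t) for t in (0.0, 0.5, 1.2)]
```

In this sketch only the second detection survives filtering, and the tracker reports attention for person 7 on the third frame, once the face has been visible for more than one second.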
Classification, e.g. identification · CPC title
Determining position or orientation of objects or cameras (camera calibration G06T7/80) · CPC title
Control circuits for electronic adaptation of the sound field · CPC title
Video; Image sequence · CPC title
Face · CPC title