Interactive method and device

US11056108B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11056108-B2
Application number	US-201816171242-A
Country	US
Kind code	B2
Filing date	Oct 25, 2018
Priority date	Nov 8, 2017
Publication date	Jul 6, 2021
Grant date	Jul 6, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An interactive method and a device thereof are provided. The method includes obtaining voice data of the object in response to determining that the object is facing the interactive device and is in the utterance state; and establishing an interaction between the object and the interactive device based on the voice data. The method solves the technical problems in which current interactions need to set up wakeup terms for interactive devices which are prone to false wakeups through the wakeup terms due to an existence of a relatively small number of wakeup terms. The above methods can implement the technical effects of remote interactions without the need of a wakeup term.
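The gating idea in the abstract (capture voice only when a person is facing the device and speaking, rather than after a wakeup term) can be illustrated with a minimal sketch. The `Observation` structure and `should_capture_voice` function are hypothetical names introduced here for illustration; the patent does not specify any data structures or code.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """One snapshot of a person near the device (hypothetical structure)."""
    facing_device: bool  # assumed to come from face recognition
    is_speaking: bool    # assumed to come from mouth-movement detection


def should_capture_voice(obs: Observation) -> bool:
    """Gate voice capture on facing + utterance state, with no wakeup term."""
    return obs.facing_device and obs.is_speaking


if __name__ == "__main__":
    # Someone looking at the device and talking triggers capture;
    # someone talking while looking elsewhere does not.
    print(should_capture_voice(Observation(facing_device=True, is_speaking=True)))
    print(should_capture_voice(Observation(facing_device=False, is_speaking=True)))
```

Because both conditions must hold, background speech from people who are not facing the device would not start an interaction under this reading.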

First claim

Opening claim text (preview).

What is claimed is:

1. A method implemented by an interactive device, the method comprising: determining, by a camera of the interactive device, whether a plurality of objects is facing the interactive device and a time duration of stay exceeds a preset time duration; in response to determining that the plurality of objects is facing the interactive device and the time duration of stay exceeds the preset time duration: initiating a voice interaction between a closest object of the plurality of objects and the interactive device by actively providing an initial voice inquiry from the interactive device to the closest object, the closest object being at a shortest linear distance from the interactive device among the plurality of objects; in response to the initial voice inquiry, receiving, by a microphone array of the interactive device, voice data of the closest object; performing, by the microphone array, directional de-noising of the voice data based on information obtained by the camera and the microphone array including enhancing the voice data in a direction of the closest object while suppressing noises in directions different from the direction of the closest object; and establishing the voice interaction between the closest object and the interactive device based on the voice data comprising: performing a semantic analysis on the voice data; determining whether the voice data is relevant to the interactive device based on a result of the semantic analysis; and establishing the voice interaction between the closest object and the interactive device in response to determining that the voice data is relevant to the interactive device.

2. The method of claim 1, wherein establishing the voice interaction between the closest object and the interactive device based on the voice data further comprising: performing a semantic analysis on the voice data; obtaining an operational instruction that matches a result of the semantic analysis; and controlling the interactive device according to the operational instruction.

3. The method of claim 2, wherein the operational instruction comprises at least one of a voice response, an interface display, or an execution of an action.

4. The method of claim 1, wherein determining whether the plurality of objects is facing the interactive device comprises: detecting mouth feature points from the plurality of objects; performing a real-time object monitoring on a coverage area of the camera of the interactive device; in response to detecting an appearance of the plurality of objects within the coverage area, performing a face recognition on each object of the plurality of objects and determining whether a corresponding object is facing the interactive device and is in an utterance state based on the mouth feature points from the corresponding object.

5. One or more computer readable media storing executable instructions that, when executed one or more processors, cause the one or more processors to perform acts comprising: determining, by a camera of an interactive device, whether a plurality of objects is facing the interactive device; in response to determining that the plurality of objects is facing the interactive device and a time duration of stay of the plurality of objects exceeds a preset time duration: initiating a voice interaction between a closest object of the plurality of objects and the interactive device by actively providing an initial voice inquiry from the interactive device to the closest object, the closest object being at a shortest linear distance from the interactive device among the plurality of objects; in response to the initial voice inquiry, receiving voice data of the closest object through a microphone array of the interactive device; performing, by the microphone array, directional de-noising of the voice data based on information obtained by the camera and the microphone array including enhancing the voice data in a direction of the closest object while suppressing noises in directions different from the direction of the closest object; establishing the voice interaction between the closest object and the interactive device; performing a semantic analysis on the voice data; determining whether the voice data is relevant to the interactive device based on a result of the semantic analysis; and establishing the voice interaction between the closest object and the interactive device in response to determining that the voice data is relevant to the interactive device.

6. The one or more computer readable media of claim 5, wherein determining whether the plurality of objects is facing the interactive device comprises: detecting whether the plurality of objects exists within a preset area of scope; and in response to determining that the plurality of objects exists within the preset area of scope, determining whether the plurality if objects is facing the interactive device.

7. The one or more computer readable media of claim 6, wherein detecting whether the plurality of objects exists within the preset area of scope comprises at least one of: detecting whether the plurality of objects exists within the preset area of scope through a sensor deployed in the preset area of scope; or detecting whether the plurality of objects exists within the preset area of scope through an infrared detector.

8. The one or more computer readable media of claim 6, wherein determining whether the plurality of objects is facing the interactive device comprises determining whether the plurality of objects is facing the interactive device through face recognition.

9. The one or more computer readable media of claim 5, the acts further comprising: determining whether each object of the plurality of objects is in an utterance state in response to determining that the plurality of objects is facing the interactive device and the time duration of stay exceeds the preset time duration; and wherein obtaining the voice data of the closest object includes obtaining the voice data of the closest object in response to determining that the closest object is in the utterance state.

10. A device comprising: a camera configured to obtain an image; one or more processors configured to: determine whether a plurality of objects is facing the interactive device and stays for a time duration that exceeds a preset time duration, and in response to determining that the plurality of objects is facing the interactive device and stays for the time duration that exceeds the preset time duration, initiate a voice interaction between a closest object of the plurality of objects and the interactive device by actively providing an initial voice inquiry from the interactive device to the closest object, the closest object being at a shortest linear distance from the interactive device among the plurality of objects; and a microphone array configured to: receive voice data of the closest object in response to the initial voice inquiry, and perform directional de-noising of the voice data based on information obtained by the camera and the microphone array including enhancing the voice data in a direction of the closest object while suppressing noises in directions different from the direction of the closest object, wherein the one or more processors are further configured to establish the voice interaction between the closest object and the interactive device according to the voice data by: performing a semantic analysis on the voice data; obtaining an operational instruction that matches a result of the semantic analysis; controlling the interactive device according to the operational instruction; determining whether the voice data is relevant to
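The target-selection step recited in the independent claims (among people facing the device who have stayed longer than a preset duration, address the one at the shortest linear distance) might be sketched as follows. This is one possible reading, not the patented implementation: `TrackedObject`, `pick_closest_target`, the field names, and the choice to filter out non-qualifying people rather than require every detected person to qualify are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class TrackedObject:
    """A person tracked by the camera (hypothetical structure)."""
    object_id: str
    facing_device: bool   # assumed output of face recognition
    stay_seconds: float   # how long the person has stayed in view
    distance_m: float     # linear distance to the device, in meters


def pick_closest_target(
    objects: Sequence[TrackedObject],
    preset_stay_seconds: float,
) -> Optional[TrackedObject]:
    """Return the nearest person who is facing the device and has stayed
    longer than the preset duration, or None if nobody qualifies."""
    candidates = [
        o for o in objects
        if o.facing_device and o.stay_seconds > preset_stay_seconds
    ]
    if not candidates:
        return None
    # Shortest linear distance wins; the device would then address this
    # person with its initial voice inquiry.
    return min(candidates, key=lambda o: o.distance_m)


if __name__ == "__main__":
    people = [
        TrackedObject("a", facing_device=True, stay_seconds=4.0, distance_m=2.5),
        TrackedObject("b", facing_device=True, stay_seconds=5.0, distance_m=1.2),
        TrackedObject("c", facing_device=False, stay_seconds=9.0, distance_m=0.5),
    ]
    target = pick_closest_target(people, preset_stay_seconds=3.0)
    print(target.object_id if target else "no target")
```

In the claims, the direction of the selected person is then used to steer the microphone array's directional de-noising, and a semantic relevance check decides whether the interaction is actually established; neither step is modeled in this sketch.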

Assignees

Alibaba Group Holding Ltd

Classifications

G06F3/167 · Primary
Audio in a user interface, e.g. using voice commands for navigating, audio feedback · CPC title
G06V40/161
Detection; Localisation; Normalisation · CPC title
G10L15/22 · Primary
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
G10L15/1815
Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning · CPC title
G10L15/25
using position of the lips, movement of the lips or face analysis · CPC title

Patent family

Related publications grouped by family.

View patent family 66328821

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11056108B2 cover?: An interactive method and a device thereof are provided. The method includes obtaining voice data of the object in response to determining that the object is facing the interactive device and is in the utterance state; and establishing an interaction between the object and the interactive device based on the voice data. The method solves the technical problems in which current interactions need…
Who is the assignee on this patent?: Alibaba Group Holding Ltd
What technology area does this patent fall under?: Primary CPC classification G06F3/167. Mapped technology areas include Physics.
When was this patent published?: Publication date Jul 6, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).