Interactive method and device

US11056108B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11056108-B2
Application number: US-201816171242-A
Country: US
Kind code: B2
Filing date: Oct 25, 2018
Priority date: Nov 8, 2017
Publication date: Jul 6, 2021
Grant date: Jul 6, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An interactive method and a device thereof are provided. The method includes obtaining voice data of the object in response to determining that the object is facing the interactive device and is in the utterance state; and establishing an interaction between the object and the interactive device based on the voice data. The method solves the technical problems in which current interactions need to set up wakeup terms for interactive devices which are prone to false wakeups through the wakeup terms due to an existence of a relatively small number of wakeup terms. The above methods can implement the technical effects of remote interactions without the need of a wakeup term.
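
The gating condition described in the abstract can be illustrated with a minimal Python sketch. The names `Observation` and `should_capture_voice` and both boolean fields are hypothetical; the patent does not define an API, and the facing and utterance detectors are assumed to exist upstream of this check.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Hypothetical per-frame result from upstream camera analysis."""
    facing_device: bool  # assumed output of a face-orientation detector
    uttering: bool       # assumed output of a mouth-movement detector


def should_capture_voice(obs: Observation) -> bool:
    """Return True only when the object faces the device and is speaking.

    In this sketch the check stands in for a wakeup term: no keyword is
    needed before voice data is obtained.
    """
    return obs.facing_device and obs.uttering
```

Under these assumptions, someone who is talking while facing away from the device does not trigger voice capture, which is how the abstract's approach avoids false wakeups.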

First claim

Opening claim text (preview).

What is claimed is:

1. A method implemented by an interactive device, the method comprising: determining, by a camera of the interactive device, whether a plurality of objects is facing the interactive device and a time duration of stay exceeds a preset time duration; in response to determining that the plurality of objects is facing the interactive device and the time duration of stay exceeds the preset time duration: initiating a voice interaction between a closest object of the plurality of objects and the interactive device by actively providing an initial voice inquiry from the interactive device to the closest object, the closest object being at a shortest linear distance from the interactive device among the plurality of objects; in response to the initial voice inquiry, receiving, by a microphone array of the interactive device, voice data of the closest object; performing, by the microphone array, directional de-noising of the voice data based on information obtained by the camera and the microphone array including enhancing the voice data in a direction of the closest object while suppressing noises in directions different from the direction of the closest object; and establishing the voice interaction between the closest object and the interactive device based on the voice data comprising: performing a semantic analysis on the voice data; determining whether the voice data is relevant to the interactive device based on a result of the semantic analysis; and establishing the voice interaction between the closest object and the interactive device in response to determining that the voice data is relevant to the interactive device.

2. The method of claim 1, wherein establishing the voice interaction between the closest object and the interactive device based on the voice data further comprising: performing a semantic analysis on the voice data; obtaining an operational instruction that matches a result of the semantic analysis; and controlling the interactive device according to the operational instruction.

3. The method of claim 2, wherein the operational instruction comprises at least one of a voice response, an interface display, or an execution of an action.

4. The method of claim 1, wherein determining whether the plurality of objects is facing the interactive device comprises: detecting mouth feature points from the plurality of objects; performing a real-time object monitoring on a coverage area of the camera of the interactive device; in response to detecting an appearance of the plurality of objects within the coverage area, performing a face recognition on each object of the plurality of objects and determining whether a corresponding object is facing the interactive device and is in an utterance state based on the mouth feature points from the corresponding object.

5. One or more computer readable media storing executable instructions that, when executed one or more processors, cause the one or more processors to perform acts comprising: determining, by a camera of an interactive device, whether a plurality of objects is facing the interactive device; in response to determining that the plurality of objects is facing the interactive device and a time duration of stay of the plurality of objects exceeds a preset time duration: initiating a voice interaction between a closest object of the plurality of objects and the interactive device by actively providing an initial voice inquiry from the interactive device to the closest object, the closest object being at a shortest linear distance from the interactive device among the plurality of objects; in response to the initial voice inquiry, receiving voice data of the closest object through a microphone array of the interactive device; performing, by the microphone array, directional de-noising of the voice data based on information obtained by the camera and the microphone array including enhancing the voice data in a direction of the closest object while suppressing noises in directions different from the direction of the closest object; establishing the voice interaction between the closest object and the interactive device; performing a semantic analysis on the voice data; determining whether the voice data is relevant to the interactive device based on a result of the semantic analysis; and establishing the voice interaction between the closest object and the interactive device in response to determining that the voice data is relevant to the interactive device.

6. The one or more computer readable media of claim 5, wherein determining whether the plurality of objects is facing the interactive device comprises: detecting whether the plurality of objects exists within a preset area of scope; and in response to determining that the plurality of objects exists within the preset area of scope, determining whether the plurality if objects is facing the interactive device.

7. The one or more computer readable media of claim 6, wherein detecting whether the plurality of objects exists within the preset area of scope comprises at least one of: detecting whether the plurality of objects exists within the preset area of scope through a sensor deployed in the preset area of scope; or detecting whether the plurality of objects exists within the preset area of scope through an infrared detector.

8. The one or more computer readable media of claim 6, wherein determining whether the plurality of objects is facing the interactive device comprises determining whether the plurality of objects is facing the interactive device through face recognition.

9. The one or more computer readable media of claim 5, the acts further comprising: determining whether each object of the plurality of objects is in an utterance state in response to determining that the plurality of objects is facing the interactive device and the time duration of stay exceeds the preset time duration; and wherein obtaining the voice data of the closest object includes obtaining the voice data of the closest object in response to determining that the closest object is in the utterance state.

10. A device comprising: a camera configured to obtain an image; one or more processors configured to: determine whether a plurality of objects is facing the interactive device and stays for a time duration that exceeds a preset time duration, and in response to determining that the plurality of objects is facing the interactive device and stays for the time duration that exceeds the preset time duration, initiate a voice interaction between a closest object of the plurality of objects and the interactive device by actively providing an initial voice inquiry from the interactive device to the closest object, the closest object being at a shortest linear distance from the interactive device among the plurality of objects; and a microphone array configured to: receive voice data of the closest object in response to the initial voice inquiry, and perform directional de-noising of the voice data based on information obtained by the camera and the microphone array including enhancing the voice data in a direction of the closest object while suppressing noises in directions different from the direction of the closest object, wherein the one or more processors are further configured to establish the voice interaction between the closest object and the interactive device according to the voice data by: performing a semantic analysis on the voice data; obtaining an operational instruction that matches a result of the semantic analysis; controlling the interactive device according to the operational instruction; determining whether the voice data is relevant to…
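
As an illustration of the selection step in claim 1 (facing check, stay-duration threshold, then the object at the shortest linear distance), here is a minimal Python sketch. Everything named in it is hypothetical: `TrackedObject`, its fields, and `pick_closest_object` are not from the patent. It also assumes one reading of the claim, namely that every detected object must be facing the device and must have stayed longer than the preset duration before the device speaks first.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class TrackedObject:
    """Hypothetical per-object state derived from camera frames."""
    object_id: str
    facing_device: bool   # assumed result of face-orientation analysis
    stay_seconds: float   # how long the object has remained in view
    distance_m: float     # assumed linear distance estimate to the device


def pick_closest_object(
    objects: Sequence[TrackedObject],
    preset_stay_seconds: float,
) -> Optional[TrackedObject]:
    """Return the object to address with the initial voice inquiry.

    Returns None when nobody is present, or when the facing/stay condition
    is not met for all objects (an assumed interpretation of the claim).
    """
    if not objects:
        return None
    ready = all(
        o.facing_device and o.stay_seconds > preset_stay_seconds
        for o in objects
    )
    if not ready:
        return None
    # Shortest linear distance among the plurality of objects.
    return min(objects, key=lambda o: o.distance_m)
```

A device loop would call this once per analysis cycle and, when it returns an object, issue the initial voice inquiry toward that object before listening for a reply. Directional de-noising and semantic relevance checking are out of scope for this sketch.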

Assignees

Alibaba Group Holding Ltd

Inventors

Classifications

  • G06F3/167 (Primary)

    Audio in a user interface, e.g. using voice commands for navigating, audio feedback · CPC title

  • Detection; Localisation; Normalisation · CPC title

  • G10L15/22 (Primary)

    Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

  • Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning · CPC title

  • using position of the lips, movement of the lips or face analysis · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11056108B2 cover?
An interactive method and a device thereof are provided. The method includes obtaining voice data of the object in response to determining that the object is facing the interactive device and is in the utterance state; and establishing an interaction between the object and the interactive device based on the voice data. The method solves the technical problems in which current interactions need…
Who is the assignee on this patent?
Alibaba Group Holding Ltd
What technology area does this patent fall under?
Primary CPC classification G06F3/167. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jul 6, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).