Controlling of device based on user recognition utilizing vision and speech features

US11615760B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11615760-B2
Application number  US-202017101770-A
Country             US
Kind code           B2
Filing date         Nov 23, 2020
Priority date       Nov 22, 2019
Publication date    Mar 28, 2023
Grant date          Mar 28, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An artificial intelligence-based control method is disclosed. In an artificial intelligence-based control method according to an exemplary embodiment of the present disclosure, when a user approaches within a set sensing range of a device, the device may capture a user image and predict whether the user has an intent to use the device by using motion features included in the captured image. An AI control method of the present disclosure may be associated with an artificial intelligent module, an unmanned aerial vehicle (UAV), a robot, an augmented reality (AR) device, a virtual reality (VR) device, a 5G service-related device, etc.
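
The intent prediction described in the abstract can be pictured with a minimal sketch. Everything below is an illustrative assumption, not the patented implementation: the track format (one tracked body point per frame, in metres, with the device at the origin), the two features chosen, the function names, and the logistic score that stands in for the learned classification model.

```python
import math


def in_sensing_range(position, max_distance):
    """True when a user position lies within a hypothetical circular sensing range."""
    return math.hypot(*position) <= max_distance


def motion_features(track, fps):
    """Build a small motion feature vector from per-frame (x, y) positions.

    Returns [walking speed, approach rate]; both features are assumptions
    chosen for illustration.
    """
    if len(track) < 2:
        raise ValueError("need at least two positions to measure motion")
    steps = [math.dist(a, b) for a, b in zip(track, track[1:])]
    duration = len(steps) / fps
    walking_speed = sum(steps) / duration
    # Positive when the user ends closer to the device than they started.
    approach = math.hypot(*track[0]) - math.hypot(*track[-1])
    return [walking_speed, approach / duration]


def intent_score(features, weights, bias):
    """Stand-in for a trained classifier: a logistic score in (0, 1)."""
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights; a real system would learn them from labeled data.
WEIGHTS, BIAS = [0.0, 2.0], 0.0

approaching = [(0.0, 4.0), (0.0, 3.0), (0.0, 2.0)]
if in_sensing_range(approaching[-1], max_distance=3.0):
    score = intent_score(motion_features(approaching, fps=1), WEIGHTS, BIAS)
    print("intent to use" if score > 0.5 else "no intent")
```

With these made-up weights, a user walking toward the device scores above 0.5 and a user walking away scores below it.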

First claim

Opening claim text (preview).

What is claimed is:

1. An artificial intelligence-based control method comprising:
    when a user approaches within a preset sensing range of a device, receiving an image including the user from the device;
    generating a first feature vector representing motion features from the image;
    producing a first output for determining whether there is an intent to use the device by applying the first feature vector to a first classification model;
    based on a determination that there is the intent to use the device according to the first output, generating a second feature vector representing gaze features from the image and producing a second output for determining whether there is the intent to use the device by applying the second feature vector to a second classification model;
    generating and transmitting a signal for controlling the device to turn on or off an information display function, wherein an on signal for controlling the device to turn on the information display function is generated and transmitted based on a determination that there is intent to use the device according to the first output and wherein an off signal for controlling the device to turn off the information display function is generated and transmitted based on a determination that there is no intent to use the device according to the second output;
    identifying a registered user based on vision features of the user including at least one among the motion features, facial expressions, and the gaze features;
    receiving a voice of the user;
    generating a third feature vector representing speech features from the voice;
    identifying a speaker having a most similar speech feature among a plurality of registered speakers by applying the third feature vector to a speaker identification model;
    based on an identification result based on the vision features and an identification result based on the speech features being different, modifying user information labeled with the vision features in such a way as to be mapped to user information identified based on the speech features,
    wherein the first output has a different value for each registered user.

2. The method of claim 1, wherein the first and second classification models are convolutional neural network-based learning models.

3. The method of claim 1, wherein the gaze features comprise at least one among a direction of gaze of the user, an amount of time the user looks at the device, and an angle between a camera placed in the device and irises.

4. The method of claim 1, wherein the motion features comprise at least one of either a moving pattern or walking speed based on a skeleton of the user.

5. The method of claim 1, further comprising generating a signal for performing control such that preferred content based on a registered history of use of the identified user is shown through a display.

6. The method of claim 1, wherein the sensing range is an angle of view of a camera provided in the device.

7. The method of claim 1, wherein the device is either a TV or an airport robot.

8. An intelligent device comprising:
    a communication module;
    a sensor configured to sense an access of a user; and
    a processor configured to:
    when the user approaches within a preset sensing range of the sensor, receive an image including the user from the device,
    generate a first feature vector representing motion features from the image,
    produce a first output for determining whether there is an intent to use the device by applying the first feature vector to a first classification model,
    based on a determination that there is the intent to use the device according to the first output, generate a second feature vector representing gaze features from the image and produce a second output for determining whether there is the intent to use the device by applying the second feature vector to a second classification model,
    generate and transmit a signal for controlling the device to turn on or off an information display function, wherein an on signal for controlling the device to turn on the information display function is generated and transmitted based on a determination that there is intent to use the device according to the first output and wherein an off signal for controlling the device to turn off the information display function is generated and transmitted based on a determination that there is no intent to use the device according to the second output,
    identify a registered user based on vision features of the user including at least one among the motion features, facial expressions, and the gaze features,
    receive a voice of the user,
    generate a third feature vector representing speech features from the voice,
    identify a speaker having a most similar speech feature among a plurality of registered speakers by applying the third feature vector to a speaker identification model, and
    based on an identification result based on the vision features and an identification result based on the speech features being different, modify user information labeled with the vision features in such a way as to be mapped to user information identified based on the speech features,
    wherein the first output has a different value for each registered user.

9. The intelligent device of claim 8, wherein the first and second classification models are convolutional neural network-based learning models.

10. The intelligent device of claim 8, wherein the gaze features comprise at least one among a direction of gaze of the user, an amount of time the user looks at the device, and an angle between a camera placed in the device and irises.

11. The intelligent device of claim 8, wherein the motion features comprise at least one of either a moving pattern or walking speed based on a skeleton of the user.

12. The intelligent device of claim 8, wherein the processor is further configured to generate a signal for performing control such that preferred content based on a registered history of use of the identified user is shown through a display.
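
The control flow recited in claim 1 can be sketched roughly as follows. This is a simplified reading under stated assumptions, not the claimed implementation: the function names are hypothetical, the two classifiers are reduced to boolean results, cosine similarity stands in for the speaker identification model's notion of "most similar", and the label store is a plain dictionary.

```python
import math


def display_signals(motion_intent, gaze_intent_fn):
    """Return the display signals emitted for one approach, in order.

    motion_intent: bool result of the first (motion) classifier.
    gaze_intent_fn: zero-argument callable giving the second (gaze)
    classifier's bool result; it is only called after a positive first stage.
    """
    signals = []
    if not motion_intent:
        return signals              # gaze stage is never evaluated
    signals.append("on")            # on signal follows the first output
    if not gaze_intent_fn():
        signals.append("off")       # off signal follows the second output
    return signals


def cosine(a, b):
    """Cosine similarity between two non-zero vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def identify_speaker(voice_vec, registered):
    """Pick the registered speaker whose stored vector is most similar."""
    return max(registered, key=lambda name: cosine(voice_vec, registered[name]))


def reconcile(vision_labels, sample_id, speech_user):
    """Relabel a vision sample when vision and speech identification disagree."""
    if vision_labels[sample_id] != speech_user:
        vision_labels[sample_id] = speech_user
    return vision_labels[sample_id]


# Made-up data for illustration only.
registered = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
vision_labels = {"img1": "bob"}     # vision-based identification said "bob"

print(display_signals(True, lambda: False))
speaker = identify_speaker([0.9, 0.1], registered)
print(reconcile(vision_labels, "img1", speaker))
```

Passing the gaze stage as a callable keeps the second model from running unless the first stage reports intent, which mirrors the ordering of the steps in the claim.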

Assignees

Inventors

Classifications

  • Supervised learning · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Audio in a user interface, e.g. using voice commands for navigating, audio feedback · CPC title

  • Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting · CPC title

  • G06F3/013 · Primary

    Eye tracking input arrangements (G06F3/015 takes precedence) · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11615760B2 cover?
An artificial intelligence-based control method is disclosed. In an artificial intelligence-based control method according to an exemplary embodiment of the present disclosure, when a user approaches within a set sensing range of a device, the device may capture a user image and predict whether the user has an intent to use the device by using motion features included in the captured image. An …
Who is the assignee on this patent?
LG Electronics Inc
What technology area does this patent fall under?
Primary CPC classification G06F3/013. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 28, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).