Lip-language identification method and apparatus, and augmented reality (AR) device and storage medium which identifies an object based on an azimuth angle associated with the AR field of view

US11527242B2 · US · B2

Patent metadata
Publication number: US-11527242-B2
Application number: US-201916610254-A
Country: US
Kind code: B2
Filing date: Apr 24, 2019
Priority date: Apr 26, 2018
Publication date: Dec 13, 2022
Grant date: Dec 13, 2022

Abstract

Official abstract text for this publication.

A lip-language identification method and an apparatus thereof, an augmented reality device and a storage medium. The lip-language identification method includes: acquiring a sequence of face images for an object to be identified; performing lip-language identification based on a sequence of face images so as to determine semantic information of speech content of the object to be identified corresponding to lip actions in a face image; and outputting the semantic information.
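The three steps in the abstract (acquire a face image sequence, identify lip language, output the semantic information) can be sketched as a small pipeline. This is a minimal illustration, not the patented implementation: the `LipReadingPipeline` class, the `Frame` representation, and the placeholder stages `keep_lower_half` and `fake_recognizer` are all hypothetical names introduced here, and the recognizer is a stub rather than a real lip-reading model.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Assumption: a frame is a grayscale image stored as rows of pixel values.
Frame = List[List[int]]


@dataclass
class LipReadingPipeline:
    """Three-stage flow from the abstract; every stage is a pluggable callable."""

    extract_face: Callable[[Frame], Frame]       # hypothetical face cropper
    recognize: Callable[[Sequence[Frame]], str]  # hypothetical lip-reading model
    emit: Callable[[str], None]                  # hypothetical output sink

    def run(self, frames: Sequence[Frame]) -> str:
        # Step 1: build the sequence of face images from the raw frames.
        face_images = [self.extract_face(frame) for frame in frames]
        # Step 2: map the lip actions in the sequence to semantic information.
        semantic_info = self.recognize(face_images)
        # Step 3: output the semantic information.
        self.emit(semantic_info)
        return semantic_info


# Placeholder stages, for illustration only.
def keep_lower_half(frame: Frame) -> Frame:
    return frame[len(frame) // 2:]


def fake_recognizer(faces: Sequence[Frame]) -> str:
    return f"<{len(faces)} face images decoded>"


shown: List[str] = []
pipeline = LipReadingPipeline(keep_lower_half, fake_recognizer, shown.append)
result = pipeline.run([[[0, 1], [2, 3]], [[4, 5], [6, 7]]])
```

Keeping each stage as a callable mirrors the claimed split between the device, which acquires and outputs, and the server, which performs the identification: `recognize` could equally wrap a network call.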

First claim

Opening claim text (preview).

What is claimed is:

1. A lip-language identification method based on an augmented reality device, the augmented reality device comprising a camera device and an infrared sensor, the method comprising: acquiring, by the augmented reality device, a sequence of face images for an object to be identified; sending, by the augmented reality device, the sequence of face images to a server; performing, by the server, lip-language identification based on the sequence of face images, so as to determine semantic information of speech content of the object to be identified corresponding to lip actions in the face images; and receiving, by the augmented reality device, the semantic information sent by the server and outputting the semantic information, wherein acquiring the sequence of face images for the object to be identified, comprises: acquiring a sequence of images including the object to be identified; positioning the object to be identified and acquiring azimuth of the object to be identified; and determining a position of a face region of the object to be identified in each frame of image in the sequence of images according to the positioned azimuth of the object to be identified; and generating the sequence of face images by cropping an image of the face region of the object to be identified from each frame of the images; and wherein positioning the azimuth of the object to be identified, comprises: positioning the azimuth of the object to be identified according to a voice signal emitted when the object to be identified is speaking, and positioning the azimuth of the object to be identified by sensing the object to be identified through the infrared sensor; wherein the azimuth of the object to be identified is an angle between the position of the object to be identified and a central axis of the field of view range of the camera device; wherein the semantic information is semantic text information and/or semantic audio information; wherein outputting the semantic information comprises: displaying, by the augmented reality device, the semantic text information within a visual field of a user wearing the augmented reality device, in response to receiving a display mode instruction; and playing, by the augmented reality device, the semantic audio information, in response to receiving an audio mode instruction.

2. The lip-language identification method according to claim 1, further comprising saving the sequence of face images, after acquiring the sequence of face images for the object to be identified.

3. The lip-language identification method according to claim 2, wherein sending the sequence of face images to the server comprises: sending the saved sequence of face images to the server upon receiving a sending instruction.

4. A lip-language identification apparatus, comprising: a processor; and a machine-readable storage medium, storing instructions that are executed by the processor for performing the lip-language identification method according to claim 1.

5. A storage medium that stores non-transitorily computer readable instructions that, when executed by a computer, the computer executes instructions for the lip-language identification method according to claim 1.

6. A lip-language identification apparatus, comprising: a face image sequence acquiring unit, configured to acquire a sequence of face images for an object to be identified; a sending unit, configured to send the sequence of face images to a server, wherein the server determines semantic information corresponding to lip actions in the face images by performing lip-language identification; and a receiving unit, configured to receive semantic information from the server, an output unit, configured to output semantic information; wherein the face image sequence acquiring unit comprises: an image sequence acquiring subunit, configured to acquire a sequence of images for the object to be identified; a positioning subunit, configured to position an azimuth of the object to be identified; and a face image sequence generation subunit, configured to determine a position of a face region of the object to be identified in each frame of image in the sequence of images according to the positioned azimuth of the object to be identified; and crop an image of the face region of the object to be identified from the each frame image so as to generate the sequence of face images; and wherein the positioning subunit is further configured to position the azimuth of the object to be identified according to a voice signal emitted when the object to be identified is speaking, and position the azimuth of the object to be identified by sensing the object to be identified through an infrared sensor; wherein the azimuth of the object to be identified is an angle between the position of the object to be identified and a central axis of the field of view range of a camera device; wherein the output unit comprises: an output mode instruction generation subunit, configured to generate a display mode instruction, wherein the output mode instruction includes a display mode instruction and an audio mode instruction; wherein the semantic information is semantic text information and/or semantic audio information, and the output unit further comprises: a display subunit, configured to display the semantic text information within a visual field of a user wearing an augmented reality device upon receiving the display mode instruction; and a play subunit, configured to play the semantic audio information upon receiving the audio mode instruction.

7. An augmented reality device, comprising the lip-language identification apparatus according to claim 6.

8. The augmented reality device according to claim 7, further comprising a camera device, a display device or a play device; wherein the camera device is configured to capture an image of the object to be identified; the display device is configured to display semantic information; and the play device is configured to play the semantic information.
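Claim 1 defines the azimuth as the angle between the object's position and the central axis of the camera's field of view, and uses it to locate the face region in each frame before cropping. The sketch below shows one way that mapping could work. It rests on assumptions the claim text does not state: an ideal pinhole camera, a known horizontal field of view, positive azimuth to the right, and an azimuth magnitude below 90 degrees. The function names and the fixed-width strip crop are hypothetical; a real system would likely refine the region with a face detector.

```python
import math
from typing import List, Sequence

# Assumption: a frame is a grayscale image stored as rows of pixel values.
Frame = List[List[int]]


def azimuth_to_column(azimuth_deg: float, image_width: int, hfov_deg: float) -> int:
    """Pixel column hit by a ray at azimuth_deg from the camera's central axis.

    Assumes an ideal pinhole camera; positive azimuth is to the right.
    """
    half_width = image_width / 2.0
    # Focal length in pixels, derived from the horizontal field of view.
    focal_px = half_width / math.tan(math.radians(hfov_deg) / 2.0)
    column = half_width + focal_px * math.tan(math.radians(azimuth_deg))
    # Clamp so that objects at the edge of the view still map to a valid column.
    return max(0, min(image_width - 1, round(column)))


def crop_face_strip(frame: Frame, center_col: int, box_width: int) -> Frame:
    """Crop box_width columns around center_col, shifted to stay inside the frame."""
    width = len(frame[0])
    left = max(0, min(width - box_width, center_col - box_width // 2))
    return [row[left:left + box_width] for row in frame]


def face_image_sequence(
    frames: Sequence[Frame], azimuth_deg: float, hfov_deg: float, box_width: int
) -> List[Frame]:
    """Apply the same azimuth-derived crop to every frame in the sequence."""
    faces = []
    for frame in frames:
        col = azimuth_to_column(azimuth_deg, len(frame[0]), hfov_deg)
        faces.append(crop_face_strip(frame, col, box_width))
    return faces
```

With a 640-pixel-wide frame and a 90-degree field of view, an azimuth of 0 maps to the center column, 320, and an azimuth at either edge of the view is clamped to the first or last column.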

Assignees

Boe Technology Group Co Ltd, Beijing Boe Technology Dev Co Ltd

Classifications

  • using position of the lips, movement of the lips or face analysis · CPC title

  • Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning · CPC title

  • G06V40/171 (Primary)

    Local features and components; Facial parts (eye characteristics G06V40/18); Occluding parts, e.g. glasses; Geometrical relationships · CPC title

  • Movements or behaviour, e.g. gesture recognition (recognition of facial expressions G06V40/16) · CPC title

  • Head mounted · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11527242B2 cover?
A lip-language identification method and an apparatus thereof, an augmented reality device and a storage medium. The lip-language identification method includes: acquiring a sequence of face images for an object to be identified; performing lip-language identification based on a sequence of face images so as to determine semantic information of speech content of the object to be identified corr…
Who is the assignee on this patent?
Boe Technology Group Co Ltd, Beijing Boe Technology Dev Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06V40/171. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 13, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).