Information processing apparatus, information processing method, and program
US-2020106884-A1 · Apr 2, 2020 · US
US11527242B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11527242-B2 |
| Application number | US-201916610254-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 24, 2019 |
| Priority date | Apr 26, 2018 |
| Publication date | Dec 13, 2022 |
| Grant date | Dec 13, 2022 |
Official abstract text for this publication.
A lip-language identification method and an apparatus thereof, an augmented reality device and a storage medium. The lip-language identification method includes: acquiring a sequence of face images for an object to be identified; performing lip-language identification based on a sequence of face images so as to determine semantic information of speech content of the object to be identified corresponding to lip actions in a face image; and outputting the semantic information.
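The three steps named in the abstract (acquire a sequence of face images, identify the lip language, output the semantic information) can be read as a small pipeline. The sketch below is only an illustration under stated assumptions: the function name `run_lip_reading`, the `Frame` type, and the three callbacks are hypothetical and do not come from the patent, and the recognizer is a placeholder rather than an actual lip-reading model.

```python
from typing import Callable, List, Sequence

# Hypothetical type: a frame is a 2-D grid of pixel values.
Frame = List[List[int]]


def run_lip_reading(
    acquire_faces: Callable[[], Sequence[Frame]],
    recognize: Callable[[Sequence[Frame]], str],
    output: Callable[[str], None],
) -> str:
    """Chain the three steps of the abstract; all callbacks are assumptions."""
    # Step 1: acquire a sequence of face images for the object to be identified.
    face_images = acquire_faces()
    if not face_images:
        raise ValueError("no face images were acquired")
    # Step 2: lip-language identification yields semantic information.
    semantic_text = recognize(face_images)
    # Step 3: output the semantic information (for example, show or play it).
    output(semantic_text)
    return semantic_text
```

Passing the three stages in as callbacks keeps the sketch neutral about where recognition runs; in the claims it is performed by a server rather than on the device.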
Opening claim text (preview).
What is claimed is:

1. A lip-language identification method based on an augmented reality device, the augmented reality device comprising a camera device and an infrared sensor, the method comprising:
acquiring, by the augmented reality device, a sequence of face images for an object to be identified;
sending, by the augmented reality device, the sequence of face images to a server;
performing, by the server, lip-language identification based on the sequence of face images, so as to determine semantic information of speech content of the object to be identified corresponding to lip actions in the face images; and
receiving, by the augmented reality device, the semantic information sent by the server and outputting the semantic information,
wherein acquiring the sequence of face images for the object to be identified, comprises: acquiring a sequence of images including the object to be identified; positioning the object to be identified and acquiring azimuth of the object to be identified; and determining a position of a face region of the object to be identified in each frame of image in the sequence of images according to the positioned azimuth of the object to be identified; and generating the sequence of face images by cropping an image of the face region of the object to be identified from each frame of the images; and
wherein positioning the azimuth of the object to be identified, comprises: positioning the azimuth of the object to be identified according to a voice signal emitted when the object to be identified is speaking, and positioning the azimuth of the object to be identified by sensing the object to be identified through the infrared sensor;
wherein the azimuth of the object to be identified is an angle between the position of the object to be identified and a central axis of the field of view range of the camera device;
wherein the semantic information is semantic text information and/or semantic audio information;
wherein outputting the semantic information comprises: displaying, by the augmented reality device, the semantic text information within a visual field of a user wearing the augmented reality device, in response to receiving a display mode instruction; and playing, by the augmented reality device, the semantic audio information, in response to receiving an audio mode instruction.

2. The lip-language identification method according to claim 1, further comprising saving the sequence of face images, after acquiring the sequence of face images for the object to be identified.

3. The lip-language identification method according to claim 2, wherein sending the sequence of face images to the server comprises: sending the saved sequence of face images to the server upon receiving a sending instruction.

4. A lip-language identification apparatus, comprising: a processor; and a machine-readable storage medium, storing instructions that are executed by the processor for performing the lip-language identification method according to claim 1.

5. A storage medium that stores non-transitorily computer readable instructions that, when executed by a computer, the computer executes instructions for the lip-language identification method according to claim 1.

6. A lip-language identification apparatus, comprising:
a face image sequence acquiring unit, configured to acquire a sequence of face images for an object to be identified;
a sending unit, configured to send the sequence of face images to a server, wherein the server determines semantic information corresponding to lip actions in the face images by performing lip-language identification; and
a receiving unit, configured to receive semantic information from the server,
an output unit, configured to output semantic information;
wherein the face image sequence acquiring unit comprises: an image sequence acquiring subunit, configured to acquire a sequence of images for the object to be identified; a positioning subunit, configured to position an azimuth of the object to be identified; and a face image sequence generation subunit, configured to determine a position of a face region of the object to be identified in each frame of image in the sequence of images according to the positioned azimuth of the object to be identified; and crop an image of the face region of the object to be identified from the each frame image so as to generate the sequence of face images; and
wherein the positioning subunit is further configured to position the azimuth of the object to be identified according to a voice signal emitted when the object to be identified is speaking, and position the azimuth of the object to be identified by sensing the object to be identified through an infrared sensor;
wherein the azimuth of the object to be identified is an angle between the position of the object to be identified and a central axis of the field of view range of a camera device;
wherein the output unit comprises: an output mode instruction generation subunit, configured to generate a display mode instruction, wherein the output mode instruction includes a display mode instruction and an audio mode instruction;
wherein the semantic information is semantic text information and/or semantic audio information, and the output unit further comprises: a display subunit, configured to display the semantic text information within a visual field of a user wearing an augmented reality device upon receiving the display mode instruction; and a play subunit, configured to play the semantic audio information upon receiving the audio mode instruction.

7. An augmented reality device, comprising the lip-language identification apparatus according to claim 6.

8. The augmented reality device according to claim 7, further comprising a camera device, a display device or a play device; wherein the camera device is configured to capture an image of the object to be identified; the display device is configured to display semantic information; and the play device is configured to play the semantic information.
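The claims define the azimuth as the angle between the object's position and the central axis of the camera's field of view, and use it to locate the face region in each frame before cropping. One way this could work is sketched below. Everything beyond that definition is an assumption made for illustration: an ideal pinhole camera, a purely horizontal azimuth that is positive to the right, a fixed crop width, and the hypothetical names `azimuth_to_column`, `crop_columns` and `face_sequence`. The patent does not specify this mapping.

```python
import math
from typing import List, Sequence

# Hypothetical type: a frame is a list of pixel rows.
Frame = List[List[int]]


def azimuth_to_column(azimuth_deg: float, image_width: int, hfov_deg: float) -> int:
    """Map a horizontal azimuth to a pixel column (assumed pinhole model)."""
    half_width = image_width / 2.0
    # Focal length in pixels, derived from the horizontal field of view.
    focal_px = half_width / math.tan(math.radians(hfov_deg) / 2.0)
    column = half_width + focal_px * math.tan(math.radians(azimuth_deg))
    # Clamp so that an object at the edge of the view still maps into the image.
    return min(image_width - 1, max(0, int(round(column))))


def crop_columns(frame: Frame, center_col: int, box_width: int) -> Frame:
    """Crop a vertical band of box_width columns centred near center_col."""
    width = len(frame[0])
    left = max(0, min(width - box_width, center_col - box_width // 2))
    return [row[left:left + box_width] for row in frame]


def face_sequence(
    frames: Sequence[Frame], azimuth_deg: float, hfov_deg: float, box_width: int
) -> List[Frame]:
    """Crop every frame around the column implied by the positioned azimuth."""
    crops = []
    for frame in frames:
        center = azimuth_to_column(azimuth_deg, len(frame[0]), hfov_deg)
        crops.append(crop_columns(frame, center, box_width))
    return crops
```

A single angle only narrows the search horizontally, so this sketch crops a vertical band rather than a tight face box; a practical system would presumably refine the region within that band, which the sketch does not attempt.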
using position of the lips, movement of the lips or face analysis · CPC title
Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning · CPC title
Local features and components; Facial parts (eye characteristics G06V40/18); Occluding parts, e.g. glasses; Geometrical relationships · CPC title
Movements or behaviour, e.g. gesture recognition (recognition of facial expressions G06V40/16) · CPC title
Head mounted · CPC title