Multi-modal interaction with intelligent assistants in voice command devices
US-2020312318-A1 · Oct 1, 2020 · US
US12437758B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12437758-B2 |
| Application number | US-202318169313-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 15, 2023 |
| Priority date | Nov 13, 2020 |
| Publication date | Oct 7, 2025 |
| Grant date | Oct 7, 2025 |
Official abstract text for this publication.
Some embodiments of the present application disclose a display apparatus and a voice control method for the display apparatus. The display apparatus comprises a display, a detector and a controller. The display is configured to present a user interface, and the detector is configured to acquire user voice information; and the controller is configured to cause the display apparatus to perform: acquiring voice information inputted from a user; in response to the voice information, extracting at least one keyword from the voice information; traversing action items in a configuration library; in response to determining that no action item in the configuration library matches the at least one keyword, obtaining text information of the user interface on the display in order to determine a control instruction according to the text information.
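The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: `ActionItem`, `extract_keywords`, `find_action_item`, and `dispatch` are hypothetical names, the keyword extraction is a naive "first word is the action" split, and the on-screen text is supplied by a caller-provided function rather than real screen capture.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass
class ActionItem:
    """Hypothetical configuration-library entry: a controlled object plus an action."""
    controlled_object: str
    action: str
    run: Callable[[], str]


def extract_keywords(voice_text: str) -> Tuple[str, str]:
    """Naive stand-in for keyword extraction (assumption): the first word is
    the action content, the remainder is the name content."""
    action, _, name = voice_text.strip().lower().partition(" ")
    return name, action


def find_action_item(library: Iterable[ActionItem], name: str, action: str) -> Optional[ActionItem]:
    """Traverse the configuration library and return the first matching item, if any."""
    for item in library:
        if item.controlled_object == name and item.action == action:
            return item
    return None


def dispatch(voice_text: str, library: List[ActionItem],
             read_screen_text: Callable[[], List[str]]) -> str:
    """Try the configuration library first; fall back to on-screen text on a miss."""
    name, action = extract_keywords(voice_text)
    item = find_action_item(library, name, action)
    if item is not None:
        return item.run()
    # No action item matched: consult the text currently shown in the user interface.
    on_screen = [text.lower() for text in read_screen_text()]
    if name in on_screen:
        return f"{action} on-screen control '{name}'"
    return "no match"
```

A call such as `dispatch("open Trending", library, lambda: ["Trending", "Home"])` takes the fallback path whenever the library has no entry for "trending".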
Opening claim text (preview).
What is claimed is:
1. A display apparatus, comprising: a display, configured to display an image from a broadcast system or a network, and/or a user interface; a detector, configured to acquire voice information from a user; and a controller, in connection with the display and the detector and configured to: display a user interface on the display; obtain the voice information input from the user while the user interface is displaying on the display; in response to the voice information, extract at least one keyword from the voice information, wherein the at least one keyword comprises a name content for indicating a controlled object and an action content for indicating an execution action; traverse action items in a configuration library, wherein controlled objects of the action items in the configuration library are configured according to applications built-in the display apparatus; in response to determining that no action item in the configuration library matches the at least one keyword, obtain text information of the user interface on the display, and obtain layout information of the user interface; extract a function control in a layout of the user interface according to the text information, wherein the function control is a control having a first text presented on the display and matched with the at least one keyword; and generate a control instruction according to the function control and the voice information; in response to determining that a first action item in the configuration library matches the at least one keyword, cause the display apparatus to execute the first action item; wherein the controller is further configured to: traverse positions of all controls in the layout information of the user interface; calculate a distance between a position of a second text in the text content in an image of the user interface and a position of a second control among the controls in the layout information of the user interface; and in response to determining that the distance is less than or equal to a preset distance threshold, mark the second control corresponding to the distance as the function control.
2. The display apparatus according to claim 1, wherein the controller is further configured to: acquire the voice information from the user via the detector; convert the voice information into a voice text; and extract the at least one keyword from the voice text.
3. The display apparatus according to claim 1, wherein the first action item comprises an action item of which a controlled object is the same as or similar to the name content in the at least one keyword and an action of which an execution action is the same as or similar to the execution action in the at least one keyword.
4. The display apparatus according to claim 2, wherein the controller is further configured to: determine whether the voice text include an action instruction through a preset semantic recognition model; in response to determining that the voice text include the action instruction, proceed to extract the at least one keyword from the voice text; and in response to determining that the voice text include no action instruction, cause the display to present a prompt, wherein the prompt comprises the voice text extracted from the voice information of the user.
5. The display apparatus according to claim 1, wherein the controller is further configured to: take a screenshot of the user interface on the display to generate an image of the user interface; and perform optical character recognition (OCR) on the image of the user interface to obtain the text information of the user interface, wherein the text information comprises a text content and a position of the text content in the image of the user interface.
6. The display apparatus according to claim 1, wherein the controller is further configured to: construct a set of words associated with the text information, wherein the set of associated words comprises a synonym of a name word in the text information; traverse all control names in the layout information of the user interface; compare the control names with the set of associated words; and in response to determining that a control name is the same as a content of any word item in the set of associated words, mark a control corresponding to the control name as the function control.
7. The display apparatus according to claim 1, wherein the controller is further configured to: obtain one or more operation types supported by the function control and an action type specified based on the voice information; compare the one or more operation types with the action type; and in response to determining that at least one of the one or more operation types is the same as the action type, generate the control instruction.
8. The display apparatus according to claim 1, wherein the controller is further configured to: execute the control instruction; and construct an action item in a configuration library based on the control instruction and the controlled object.
9. The display apparatus according to claim 1, wherein the function control comprises a control which is able to configure with a picture or text for visual presentation on a user interface and an application icon.
10. A voice control method for a display apparatus, comprising: displaying a user interface on a display of the display apparatus, wherein the display is configured to display an image from a broadcast system or a network, and/or display the user interface: obtaining voice information input from a user while the user interface is displaying on the display; in response to the voice information, extracting at least one keyword from the voice information, wherein the at least one keyword comprises a name content for indicating a controlled object and an action content for indicating an execution action; traversing action items in a configuration library, wherein controlled objects of the action items in the configuration library are configured according to applications built-in the display apparatus; in response to determining that no action item in the configuration library matches the at least one keyword, obtaining text information of the user interface on the display, and obtaining layout information of the user interface; extracting a function control in a layout of the user interface according to the text information, wherein the function control is a control having a first text presented on the display and matched with the at least one keyword; and generate a control instruction according to the function control and the voice information; in response to determining that a first action item in the configuration library matches the at least one keyword, causing the display apparatus to execute the first action item; wherein the voice control method further comprises: traversing positions of all controls in the layout information of the user interface; calculating a distance between a position of a second text in the text content in an image of the user interface and a position of a second control among the controls in the layout information of the user interface; and in response to determining that the distance is less than or equal to a preset distance threshold, marking the second control corresponding to the distance as the function control.
11. The voice control method according to claim 10, further comprising: acquiring the voice information from the user via the detector; converting the voice information into a voice text; and extracting the at least one keyword from the voice text.
12. The voice control method according to claim 10, wherein the first
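The distance-threshold step at the end of claims 1 and 10, together with the operation-type check of claim 7, can be sketched in Python as follows. This is an illustration under stated assumptions, not the claimed implementation: `TextBox`, `Control`, `find_function_control`, and `build_instruction` are hypothetical names, positions are treated as center points, the distance is Euclidean, and when several controls fall within the threshold the nearest one is chosen. The claims themselves only require the distance to be less than or equal to a preset threshold.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class TextBox:
    """OCR result (assumed shape): recognized text and its center in the screenshot."""
    text: str
    x: float
    y: float


@dataclass
class Control:
    """Layout entry (assumed shape): control name, center, supported operation types."""
    name: str
    x: float
    y: float
    operations: Tuple[str, ...]


def find_function_control(text_box: TextBox, controls: List[Control],
                          max_distance: float) -> Optional[Control]:
    """Traverse all control positions and return the nearest control whose
    distance to the text is within the threshold, or None if there is none."""
    best: Optional[Control] = None
    best_distance = math.inf
    for control in controls:
        distance = math.hypot(control.x - text_box.x, control.y - text_box.y)
        if distance <= max_distance and distance < best_distance:
            best, best_distance = control, distance
    return best


def build_instruction(control: Control, action_type: str) -> Optional[Dict[str, str]]:
    """Generate a control instruction only if the control supports the action type."""
    if action_type in control.operations:
        return {"target": control.name, "action": action_type}
    return None
```

With a text box at (104, 43) and a control at (100, 40), the distance is 5, so a threshold of 10 marks that control as the function control, while a control hundreds of pixels away is ignored.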
Speech classification or search · CPC title
Character recognition · CPC title
Recognising information on displays, dials, clocks · CPC title
Overlay text, e.g. embedded captions in a TV programme · CPC title
Audio in a user interface, e.g. using voice commands for navigating, audio feedback · CPC title