Methods, apparatuses and computer program products for facilitating actions based on text captured by head mounted devices

US2025182410A1 · US · A1

Patent metadata
Publication number: US-2025182410-A1
Application number: US-202318529827-A
Country: US
Kind code: A1
Filing date: Dec 5, 2023
Priority date: Dec 5, 2023
Publication date: Jun 5, 2025
Grant date:


Abstract

Official abstract text for this publication.

A system and method for determining interesting text to trigger actions of devices are provided. The system may include one or more head-mounted devices associated with a network. A head mounted device(s) may capture an image(s) and/or a video(s) corresponding to an environment detected in a field of view of a camera(s). The image(s) and/or the video(s) may include one or more text items associated with the environment. The head mounted device may determine whether a text item(s) of the one or more text items is interesting. The head mounted device may extract the text item(s) determined as being interesting and may superimpose the text item(s) at a position in the image(s) and/or the video(s). The head mounted device may trigger, based on the text item(s) determined as being interesting, one or more actions capable of being performed by the head mounted device.
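The flow described in the abstract (detect text, decide whether it is interesting, compute an overlay position, trigger actions) can be sketched roughly as follows. This is a minimal illustration only, not the implementation disclosed in the publication: the `TextItem` type, the keyword set standing in for a trained model, the `overlay_position` rule, and the `process_frame` function are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


@dataclass(frozen=True)
class TextItem:
    """A piece of text detected in a frame, with its bounding box."""
    text: str
    box: Box


# Assumption: a fixed keyword set stands in for whatever model decides
# that a text item is "interesting"; the publication does not fix one.
INTERESTING_KEYWORDS = {"sale", "exit", "menu"}


def is_interesting(item: TextItem) -> bool:
    """Return True if any word of the item matches a keyword."""
    words = item.text.lower().split()
    return any(word.strip(".,!?:") in INTERESTING_KEYWORDS for word in words)


def overlay_position(item: TextItem, margin: int = 4) -> Tuple[int, int]:
    """Place the overlay just above the item's box, clamped to the frame top."""
    left, top, _, _ = item.box
    return (left, max(0, top - margin))


def process_frame(
    items: List[TextItem],
    actions: Dict[str, Callable[[TextItem], str]],
) -> List[Tuple[str, Tuple[int, int], List[str]]]:
    """For each interesting item, compute an overlay position and run every action."""
    results = []
    for item in items:
        if not is_interesting(item):
            continue
        position = overlay_position(item)
        outputs = [action(item) for action in actions.values()]
        results.append((item.text, position, outputs))
    return results
```

In this sketch the actions are plain callables keyed by name, so a translation or read-aloud step could be registered without changing the loop. How a real device detects text or renders the overlay is outside what this example covers.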

First claim

Opening claim text (preview).

What is claimed:

1. A method comprising: capturing, via a head mounted device, at least one image or at least one video corresponding to an environment detected in a field of view of at least one camera, wherein the at least one image or the at least one video comprises one or more text items associated with the environment; determining whether at least one text item of the one or more text items is interesting; extracting the at least one text item determined as being interesting and superimposing the at least one text item at a position in the at least one image or the at least one video; and triggering, based on the at least one text item determined as being interesting, one or more actions capable of being performed by the head mounted device.

2. The method of claim 1, wherein the head mounted device comprises smart glasses or an augmented or virtual reality device.

3. The method of claim 1, further comprising: the determining whether the at least one text item of the one or more text items is interesting comprises determining that the at least one text item corresponds to at least one predetermined content item of text content designated as interesting associated with training data of one or more machine learning models.

4. The method of claim 1, further comprising: determining that the at least one text item is interesting in response to determining, based on the at least one image or the at least one video, that at least one hand of a user holds, or points to, an object associated with the at least one text item.

5. The method of claim 4, further comprising: cropping out one or more regions or other text items captured in the at least one image or the at least one video in response to determining the at least one hand of the user holds, or points to, the object associated with the at least one text item to obtain a second image or a second video comprising the at least one text item and excluding the one or more regions or the other text items.

6. The method of claim 1, further comprising: determining, based on analyzing the at least one image or the at least one video, at least one region of interest associated with the at least one text item determined as being interesting and one or more background items or other items associated with a scene of the environment; and cropping out the one or more background items or the other items to obtain a second image or a second video comprising the at least one region of interest and the at least one text item.

7. The method of claim 1, wherein at least one action of the one or more actions comprises translating the at least one text item in a first language to a translated text item in a second language associated with the head mounted device.

8. The method of claim 7, further comprising: performing the translating the at least one text item in the first language to the translated text item in the second language in response to detecting at least one finger of a user, associated with the head mounted device, pointing at or hovering over the at least one text item in the environment.

9. The method of claim 7, further comprising: presenting, by the head mounted device, the translated text item in the second language superimposed within the at least one image or the at least one video.

10. The method of claim 1, further comprising: generating a prompt, by the head mounted device, enabling a user associated with the head mounted device to select at least one action of the one or more actions to enable the head mounted device to perform the at least one action.

11. An apparatus comprising: one or more processors; and at least one memory storing instructions, that when executed by the one or more processors, cause the apparatus to: capture, via the apparatus, at least one image or at least one video corresponding to an environment detected in a field of view of at least one camera, wherein the at least one image or the at least one video comprises one or more text items associated with the environment; determine whether at least one text item of the one or more text items is interesting; extract the at least one text item determined as being interesting and superimposing the at least one text item at a position in the at least one image or the at least one video; and trigger, based on the at least one text item determined as being interesting, one or more actions capable of being performed by the apparatus.

12. The apparatus of claim 11, wherein the apparatus comprises a head mounted device, smart glasses or an augmented or virtual reality device.

13. The apparatus of claim 11, wherein when the one or more processors further execute the instructions, the apparatus is configured to: perform the determine whether the at least one text item of the one or more text items is interesting by determining that the at least one text item corresponds to at least one predetermined content item of text content designated as interesting associated with training data of one or more machine learning models.

14. The apparatus of claim 11, wherein when the one or more processors further execute the instructions, the apparatus is configured to: determine that the at least one text item is interesting in response to determining, based on the at least one image or the at least one video, that at least one hand of a user holds, or points to, an object associated with the at least one text item.

15. The apparatus of claim 14, wherein when the one or more processors further execute the instructions, the apparatus is configured to: crop out one or more regions or other text items captured in the at least one image or the at least one video in response to determining the at least one hand of the user holds, or points to, the object associated with the at least one text item to obtain a second image or a second video comprising the at least one text item and excluding the one or more regions or the other text items.

16. The apparatus of claim 11, wherein when the one or more processors further execute the instructions, the apparatus is configured to: determine, based on analyzing the at least one image or the at least one video, at least one region of interest associated with the at least one text item determined as being interesting and one or more background items or other items associated with a scene of the environment; and crop out the one or more background items or the other items to obtain a second image or a second video comprising the at least one region of interest and the at least one text item.

17. The apparatus of claim 11, wherein at least one action of the one or more actions comprises translating the at least one text item in a first language to a translated text item in a second language associated with the apparatus.

18. The apparatus of claim 17, wherein when the one or more processors further execute the instructions, the apparatus is configured to: perform the translating the at least one text item in the first language to the translated text item in the second language in response to detecting at least one finger of a user, associated with the apparatus, pointing at or hovering over the at least one text item in the environment.

19. A non-transitory computer-readable medium storing instructions that, when executed, cause: capturing, via a head mounted device, at least one image or at least one video corresponding to an environment detected in a field of view of at least one camera, wherein th
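Claims 4 and 5 describe treating a text item as interesting when the user's hand points to it, then cropping away everything else. One way this might look, as a sketch under stated assumptions: the fingertip location and text bounding boxes are taken as already detected, the image is a plain list of pixel rows, and the function names and the `max_distance` threshold are hypothetical choices made for this example rather than anything the claims specify.

```python
from typing import List, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive
Point = Tuple[int, int]          # (x, y)


def distance_to_box(point: Point, box: Box) -> float:
    """Euclidean distance from a point to a box; 0.0 if the point is inside."""
    x, y = point
    left, top, right, bottom = box
    dx = max(left - x, 0, x - (right - 1))
    dy = max(top - y, 0, y - (bottom - 1))
    return (dx * dx + dy * dy) ** 0.5


def pointed_box(
    fingertip: Point,
    boxes: Sequence[Box],
    max_distance: float = 20.0,  # assumed threshold, in pixels
) -> Optional[int]:
    """Index of the text box nearest the fingertip, or None if none is close enough."""
    best_index: Optional[int] = None
    best_distance = 0.0
    for index, box in enumerate(boxes):
        d = distance_to_box(fingertip, box)
        if d <= max_distance and (best_index is None or d < best_distance):
            best_index, best_distance = index, d
    return best_index


def crop(image: List[List[int]], box: Box) -> List[List[int]]:
    """Return the sub-image covered by the box, excluding everything else."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]
```

A caller would pass the detected fingertip and the list of text boxes to `pointed_box`, then feed the selected box to `crop` to obtain the second image that contains only the pointed-at text. The same selection step could plausibly gate the translation action in claims 7 and 8, though the claims do not prescribe a distance rule.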

Assignees

Meta Platforms Inc

Inventors

Classifications

  • G06F3/011 (Primary)

    Arrangements for interaction with the human body, e.g. for user immersion in virtual reality (blind teaching G09B21/00) · CPC title

  • Eyeglass type (eyeglass details G02C) · CPC title

  • Head mounted · CPC title

  • Static hand or arm · CPC title

  • Character recognition · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Meta Platforms Inc
What technology area does this patent fall under?
Primary CPC classification G06F3/011. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jun 5, 2025 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).