Hand-gesture activation of actionable items

US12554333B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12554333-B2
Application number  US-202318211507-A
Country             US
Kind code           B2
Filing date         Jun 19, 2023
Priority date       Jun 21, 2022
Publication date    Feb 17, 2026
Grant date          Feb 17, 2026
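
The timeline in the table above can be checked with plain date arithmetic. The following minimal Python sketch only subtracts the listed dates. It says nothing about legal status, patent term, or term adjustments, and the variable names are illustrative.

```python
from datetime import date

# Dates copied from the metadata table on this page.
PRIORITY = date(2022, 6, 21)
FILING = date(2023, 6, 19)
GRANT = date(2026, 2, 17)

# Subtracting two dates yields a timedelta; .days is the whole-day count.
priority_to_filing = (FILING - PRIORITY).days
filing_to_grant = (GRANT - FILING).days

print(f"Priority -> filing: {priority_to_filing} days")  # 363
print(f"Filing -> grant: {filing_to_grant} days")        # 974
```

The first interval is just under one year. The second is roughly two years and eight months, spanning the 2024 leap day.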

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In one implementation, a method of performing an action is performed at a device including an image sensor, one or more processors, and non-transitory memory. The method includes receiving, from the image sensor, one or more images of a physical environment. The method includes detecting, in the one or more images of the physical environment, one or more actionable items respectively associated with one or more actions. The method includes detecting, in the one or more images of the physical environment, a hand gesture indicating a particular actionable item. The method includes in response to detecting the hand gesture, performing an action associated with the particular actionable item.
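
The abstract does not say how a hand gesture is matched to a particular actionable item. One simple possibility, assumed here purely for illustration, is to test whether the fingertip position falls inside the bounding box of a detected item. The Python sketch below treats detection as already done and stubs the actions. All names (`ActionableItem`, `HandGesture`, `select_item`, `handle_frame`) are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical types: the patent does not specify any data structures.
# Boxes are (x_min, y_min, x_max, y_max) in image pixel coordinates.
BBox = tuple[float, float, float, float]
Point = tuple[float, float]


@dataclass
class ActionableItem:
    label: str                                   # e.g. "phone number", "lamp"
    bbox: BBox                                   # where the item was detected
    actions: list[Callable[[], str]] = field(default_factory=list)


@dataclass
class HandGesture:
    kind: str                                    # e.g. "tap", "circle"
    point: Point                                 # fingertip, same coordinates as bbox


def contains(bbox: BBox, point: Point) -> bool:
    x0, y0, x1, y1 = bbox
    x, y = point
    return x0 <= x <= x1 and y0 <= y <= y1


def select_item(items: list[ActionableItem], gesture: HandGesture) -> Optional[ActionableItem]:
    """Return the first detected item whose box contains the fingertip, or None."""
    for item in items:
        if contains(item.bbox, gesture.point):
            return item
    return None


def handle_frame(items: list[ActionableItem], gesture: HandGesture) -> Optional[str]:
    """Run the first action of the item the gesture indicates, if there is one."""
    item = select_item(items, gesture)
    if item is None or not item.actions:
        return None
    return item.actions[0]()


# Stand-in detections; a real system would obtain these from computer vision.
items = [
    ActionableItem("phone number", (10, 10, 120, 30), [lambda: "call 555-0100"]),
    ActionableItem("lamp", (200, 50, 260, 150), [lambda: "toggle lamp"]),
]
print(handle_frame(items, HandGesture("tap", (230, 100))))  # toggle lamp
```

Picking the first containing box is the simplest rule. Overlapping items would need a tie-break, such as preferring the smallest box, and the abstract does not specify one.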

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: at a device including an image sensor, one or more processors, and non-transitory memory: receiving, from the image sensor, one or more images of a physical environment; detecting, in the one or more images of the physical environment, one or more actionable items respectively associated with one or more actions, wherein detecting the one or more actionable items includes classifying the one or more actionable items to determine corresponding actions for the one or more actionable items, and wherein classifying the one or more actionable items includes performing computer vision on the one or more images of the physical environment; detecting, in the one or more images of the physical environment, a first hand gesture selecting a particular actionable item of the one or more actionable items, the particular actionable item associated with a plurality of actions including a first action and a second action different from the first action; in response to detecting the first hand gesture selecting the particular actionable item, without displaying a user interface element to perform the first action associated with the particular actionable item, performing the first action associated with the particular actionable item; detecting, in the one or more images of the physical environment, a second hand gesture selecting the particular actionable item; and in response to detecting the second hand gesture selecting the particular actionable item, without displaying a user interface element to perform the second action associated with the particular actionable item, performing the second action associated with the particular actionable item.

2. The method of claim 1, wherein detecting the one or more actionable items includes detecting machine-readable content.

3. The method of claim 1, wherein detecting the one or more actionable items includes detecting an object.

4. The method of claim 3, wherein performing the first action includes changing a state of the object.

5. The method of claim 1, wherein performing the first action includes playing audio based on the particular actionable item.

6. The method of claim 5, wherein the audio includes at least one of: a reading of the particular actionable item, a definition of the particular actionable item, or a translation of the particular actionable item.

7. The method of claim 1, wherein performing the first action includes initiating a phone call based on the particular actionable item.

8. The method of claim 1, wherein performing the first action includes storing, in the non-transitory memory, information based on the particular actionable item.

9. The method of claim 1, wherein performing the first action is further performed in response to a vocal command.

10. The method of claim 9, wherein performing the first action includes selecting, based on the vocal command, the first action from a plurality of actions associated with the particular actionable item.

11. The method of claim 1, wherein the device includes a communication interface, wherein the particular actionable item corresponds to another device, and wherein performing the first action includes transmitting data to the other device via the communication interface.

12. The method of claim 11, wherein the data indicates a request to change a state of the other device.

13. The method of claim 1, wherein the first hand gesture is a swipe hand gesture.

14. The method of claim 1, wherein the first hand gesture is a circling hand gesture.

15. The method of claim 1, wherein the first hand gesture is a tap hand gesture and the second hand gesture is a double-tap hand gesture.

16. The method of claim 1, wherein the first hand gesture is a circle hand gesture in which a finger contacts a thumb to form a circle.

17. A device comprising: an image sensor; a non-transitory memory; and one or more processors to: receive, from the image sensor, one or more images of a physical environment; detect, in the one or more images of the physical environment, one or more actionable items respectively associated with one or more actions, wherein detecting the one or more actionable items includes classifying the one or more actionable items to determine corresponding actions for the one or more actionable items, and wherein classifying the one or more actionable items includes performing computer vision on the one or more images of the physical environment; detect, in the one or more images of the physical environment, a first hand gesture selecting a particular actionable item of the one or more actionable items, the particular actionable item associated with a plurality of actions including a first action and a second action different from the first action; in response to detecting the first hand gesture selecting the particular actionable item, without displaying a user interface element to perform a the first action associated with the particular actionable item, perform the first action associated with the particular actionable item; detect, in the one or more images of the physical environment, a second hand gesture selecting the particular actionable item; and in response to detecting the second hand gesture selecting the particular actionable item, without displaying a user interface element to perform the second action associated with the particular actionable item, perform the second action associated with the particular actionable item.

18. The device of claim 17, wherein the one or more processors are to detect the one or more actionable items by detecting machine-readable content or an object.

19. The device of claim 17, wherein the device does not include a display.

20. A non-transitory memory storing one or more programs, which, when executed by one or more processors of a device including an image sensor cause the device to: receive, from the image sensor, one or more images of a physical environment; detect, in the one or more images of the physical environment, one or more actionable items respectively associated with one or more actions, wherein detecting the one or more actionable items includes classifying the one or more actionable items to determine corresponding actions for the one or more actionable items, and wherein classifying the one or more actionable items includes performing computer vision on the one or more images of the physical environment; detect, in the one or more images of the physical environment, a first hand gesture selecting a particular actionable item of the one or more actionable items, the particular actionable item associated with a plurality of actions including a first action and a second action different from the first action; in response to detecting the first hand gesture selecting the particular actionable item, without displaying a user interface element to perform a the first action associated with the particular actionable item, perform the first action associated with the particular actionable item; detect, in the one or more images of the physical environment, a second hand gesture selecting the particular actionable item; and in response to detecting the second hand gesture selecting the particular actionable item, without displaying a user interface element to perform the second action associated with the particular actionable item, perform the second action associated with the particular actionable item.
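
Claim 1 requires that two different gestures selecting the same item trigger two different actions, with no user interface element displayed for either. Claim 15 names tap and double tap as one such pair. Claim 10 lets a vocal command select the action, and claim 6 lists reading and translation as possible audio actions. The Python sketch below covers only that dispatch step. It assumes gesture recognition and computer-vision classification have already run. The names `Item`, `GESTURE_TO_ACTION_INDEX`, and `perform` are hypothetical, and the keyword match used for the voice command is an assumption, not the claimed technique.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Item:
    label: str
    # Action name -> callable; dict order defines the "first" and "second" action.
    actions: dict[str, Callable[[], str]]


# Hypothetical binding following claim 15: tap -> first action, double tap -> second.
GESTURE_TO_ACTION_INDEX = {"tap": 0, "double_tap": 1}


def perform(item: Item, gesture: str, vocal_command: Optional[str] = None) -> str:
    """Run an action on the selected item directly, without showing any UI element."""
    names = list(item.actions)
    if vocal_command is not None:
        # Claim 10 style: the spoken words choose among the item's actions.
        # Naive substring match, assumed for illustration only; if nothing
        # matches, fall through to the gesture binding below.
        spoken = vocal_command.lower()
        for name in names:
            if name in spoken:
                return item.actions[name]()
    index = GESTURE_TO_ACTION_INDEX.get(gesture)
    if index is None or index >= len(names):
        raise ValueError(f"no action bound to {gesture!r} for {item.label!r}")
    return item.actions[names[index]]()


word = Item("bonjour", {
    "read": lambda: "reading: bonjour",
    "translate": lambda: "translation: hello",
})
print(perform(word, "tap"))                    # reading: bonjour
print(perform(word, "double_tap"))             # translation: hello
print(perform(word, "tap", "Translate this"))  # translation: hello
```

Keeping the gesture-to-action binding in a table, rather than hard-coding it, makes it easy to substitute the other gestures named in claims 13, 14, and 16.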

Assignees

Apple Inc

Inventors

Classifications

  • Audio in a user interface, e.g. using voice commands for navigating, audio feedback · CPC title

  • Arrangements for interaction with the human body, e.g. for user immersion in virtual reality (blind teaching G09B21/00) · CPC title

  • Selection of displayed objects or displayed text elements (G06F3/0482 takes precedence) · CPC title

  • Multimodal input, i.e. interface arrangements enabling the user to issue commands by simultaneous use of input devices of different nature, e.g. voice plus gesture on digitizer · CPC title

  • Management of the audio stream, e.g. setting of volume, audio stream path · CPC title
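
The entries above are CPC titles. The FAQ on this page gives the primary symbol as G06F3/017 and maps it to Physics. CPC symbols follow a fixed layout: a section letter, a two-digit class, a subclass letter, then a main group and a subgroup separated by a slash. The section letter alone fixes the broad area, and section G is Physics. The small Python parser below is written for illustration. The regex and the names `CpcSymbol` and `parse_cpc` are assumptions, not part of any official CPC tool.

```python
import re
from typing import NamedTuple

# CPC section letters and their titles (Y shortened).
SECTIONS = {
    "A": "Human Necessities",
    "B": "Performing Operations; Transporting",
    "C": "Chemistry; Metallurgy",
    "D": "Textiles; Paper",
    "E": "Fixed Constructions",
    "F": "Mechanical Engineering; Lighting; Heating; Weapons; Blasting",
    "G": "Physics",
    "H": "Electricity",
    "Y": "General tagging of new technological developments",
}


class CpcSymbol(NamedTuple):
    section: str      # "G"
    cpc_class: str    # "G06"
    subclass: str     # "G06F"
    main_group: str   # "3"
    subgroup: str     # "017"


# Section letter, two-digit class, subclass letter, optional space,
# main group, "/", subgroup of at least two digits.
CPC_RE = re.compile(r"^([A-HY])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,6})$")


def parse_cpc(symbol: str) -> CpcSymbol:
    m = CPC_RE.match(symbol.strip())
    if m is None:
        raise ValueError(f"not a CPC symbol: {symbol!r}")
    sec, cls, sub, main, subgrp = m.groups()
    return CpcSymbol(sec, sec + cls, sec + cls + sub, main, subgrp)


sym = parse_cpc("G06F3/017")
print(sym.subclass, sym.main_group, sym.subgroup, SECTIONS[sym.section])
# G06F 3 017 Physics
```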

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12554333B2 cover?
In one implementation, a method of performing an action is performed at a device including an image sensor, one or more processors, and non-transitory memory. The method includes receiving, from the image sensor, one or more images of a physical environment. The method includes detecting, in the one or more images of the physical environment, one or more actionable items respectively associated…
Who is the assignee on this patent?
Apple Inc
What technology area does this patent fall under?
Primary CPC classification G06F3/017. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 17, 2026 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).