Image analysis using gaze tracking and utterance dictation

US12566495B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12566495-B2
Application number  US-202318511872-A
Country             US
Kind code           B2
Filing date         Nov 16, 2023
Priority date       Nov 22, 2022
Publication date    Mar 3, 2026
Grant date          Mar 3, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques for evaluating images are disclosed. An image is displayed. Data corresponding to a gaze fixation of a user who is viewing the image is obtained. An image artifact is identified based on the gaze fixation. The image artifact is at a location in the image where the gaze fixation occurred. A recording of the user's utterance is accessed. This recording is recorded during an overlapping time period with when the gaze fixation occurred. The recording is transcribed into text, which is then parsed. A key term is extracted from the parsed text. A determination is made as to whether the key term corresponds to the image artifact identified based on the gaze fixation.
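
The flow summarized in the abstract can be sketched roughly as follows. This is a minimal illustration only, not the patented implementation: the `Fixation` and `Artifact` types, the bounding-box hit test, the vocabulary-based key-term extraction, and every function name are assumptions introduced here, and the transcript is assumed to come from some external speech-to-text step.

```python
import re
from dataclasses import dataclass


@dataclass
class Fixation:
    start_s: float  # fixation start, seconds since the image was displayed
    end_s: float    # fixation end
    x: float        # gaze location in image pixel coordinates
    y: float


@dataclass
class Artifact:
    label: str  # hypothetical label describing the artifact, e.g. "nodule"
    box: tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def artifact_at(fix, artifacts):
    """Return the first artifact whose bounding box contains the fixation."""
    for artifact in artifacts:
        x0, y0, x1, y1 = artifact.box
        if x0 <= fix.x <= x1 and y0 <= fix.y <= y1:
            return artifact
    return None


def overlaps(fix, utt_start_s, utt_end_s):
    """True if the utterance and fixation time intervals intersect."""
    return fix.start_s < utt_end_s and utt_start_s < fix.end_s


def key_terms(transcript, vocabulary):
    """Naive stand-in for parsing: keep lowercase word tokens found in a vocabulary."""
    tokens = re.findall(r"[a-z]+", transcript.lower())
    return {t for t in tokens if t in vocabulary}


def term_matches_artifact(fix, artifacts, transcript, utt_start_s, utt_end_s, vocabulary):
    """Check whether a key term spoken during the fixation names the fixated artifact."""
    artifact = artifact_at(fix, artifacts)
    if artifact is None or not overlaps(fix, utt_start_s, utt_end_s):
        return False
    return artifact.label in key_terms(transcript, vocabulary)


artifacts = [Artifact("nodule", (100, 100, 150, 150))]
fix = Fixation(start_s=2.0, end_s=2.8, x=120, y=130)
print(term_matches_artifact(fix, artifacts, "I see a small nodule here",
                            1.5, 3.0, {"nodule", "fracture"}))  # True
```

An exact label match is the simplest possible comparison; a real system would likely need synonym handling or a richer parse, which the abstract does not specify.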

First claim

Opening claim text (preview).

What is claimed is:

1. A method for facilitating an evaluation of an image by comparing an identified image artifact included in the image against a gaze fixation that is directed toward the image artifact and against a recording of an utterance that is also associated with the image artifact, said method comprising:
    displaying an image;
    while the image is displayed, tracking a gaze of a user who is viewing the image, wherein tracking the gaze includes identifying a gaze fixation of the user with respect to the image;
    providing an instruction to the user to prompt the user's gaze to follow a specific gaze pattern when the image is displayed, wherein an image artifact is overlapped by the specific gaze pattern;
    determining that the image artifact is represented within the image at a location in the image where the gaze fixation occurred;
    determining whether the user's gaze fixated on the image artifact while the user's gaze followed the specific gaze pattern;
    while the image is displayed, recording an utterance of the user, said utterance being recorded during an overlapping time period with when the gaze fixation occurred;
    transcribing the recorded utterance into text;
    in response to parsing the text, extracting at least one key term from the parsed text; and
    determining whether the at least one key term accurately describes the image artifact where the gaze fixation occurred relative to the image.

2. The method of claim 1, wherein the image is a medical image.

3. The method of claim 1, wherein the gaze fixation of the user occurs in response to the user's gaze residing on the image artifact for at least a threshold time period.

4. The method of claim 3, wherein the threshold time period is at least 500 milliseconds.

5. The method of claim 1, wherein tracking the user's gaze is performed via a wearable gaze detector.

6. The method of claim 1, wherein tracking the user's gaze is performed via a device that is remote relative to the user such that the device is not a wearable device.

7. The method of claim 1, wherein tracking the user's gaze is performed via a headset that renders virtualized content.

8. The method of claim 1, wherein tracking the user's gaze is performed at a sampling rate that is between about 30 Hertz (Hz) and about 120 Hz.

9. The method of claim 1, wherein said method is performed by a handheld device.

10. The method of claim 1, wherein the image artifact is tagged with metadata describing subject matter of the image artifact, wherein determining whether the at least one key term accurately describes the image artifact is performed by comparing the metadata against the at least one key term.

11. The method of claim 1, wherein the image artifact is associated with label data describing subject matter of the image artifact.

12. The method of claim 11, wherein the label data is generated using a machine learning engine.

13. The method of claim 11, wherein the label data is manually provided via user input.

14. A computer system comprising: one or more processors; and one or more hardware storage devices that store instructions that are executable by the one or more processors to cause the computer system to:
    display an image;
    while the image is displayed, track a gaze of a user who is viewing the image, wherein tracking the gaze includes identifying a gaze fixation of the user with respect to the image;
    provide an instruction to the user to prompt the user's gaze to follow a specific gaze pattern when the image is displayed, wherein an image artifact is overlapped by the specific gaze pattern;
    determine that the image artifact is represented within the image at a location in the image where the gaze fixation occurred;
    determine whether the user's gaze fixated on the image artifact while the user's gaze followed the specific gaze pattern;
    while the image is displayed, record an utterance of the user, said utterance being recorded during an overlapping time period with when the gaze fixation occurred;
    transcribe the recorded utterance into text;
    in response to parsing the text, extract a key term from the parsed text; and
    determine whether the key term corresponds to the image artifact, which is identified based on the gaze fixation.

15. The computer system of claim 14, wherein at least one of a visual cue or an audio cue is provided to the user while the user's gaze is being tracked.

16. The computer system of claim 14, wherein touch input is further received, and wherein the touch input, the key term, and data for the gaze fixation is used to determine whether the image artifact is accurately being described.

17. A method comprising:
    displaying an image;
    obtaining data corresponding to a gaze fixation of a user who is viewing the image;
    providing an instruction to the user to prompt a gaze of the user to follow a specific gaze pattern when the image is displayed, wherein an image artifact is overlapped by the specific gaze pattern;
    determining that the image artifact is represented within the image and that the image artifact is at a location in the image where the gaze fixation occurred;
    determining whether the user's gaze fixated on the image artifact while the user's gaze followed the specific pattern;
    accessing a recording of an utterance of the user, said recording being recorded during an overlapping time period with when the gaze fixation occurred;
    transcribing the recording into text;
    in response to parsing the text, extracting a key term from the parsed text; and
    determining whether the key term corresponds to the image artifact identified based on the gaze fixation.

18. The method of claim 17, wherein the method further includes providing a second instruction to prompt the user to look at a second image artifact.

19. The method of claim 18, wherein the method further includes determining that the user's gaze fixated on the second image artifact.

20. The method of claim 19, wherein the method further includes at least one of: providing an audio respond stating that the user successfully fixated on the second image artifact or providing a visual alert stating that the user successfully fixated on the second image artifact.
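
The dwell-time condition in claims 3 and 4 (gaze residing on the artifact for at least a threshold period, such as 500 milliseconds) and the sampling rates in claim 8 suggest a simple run-length check over timestamped gaze samples. The sketch below is one hypothetical way to do that; the sample format `(t_ms, x, y)`, the rectangular artifact region, and the function name `find_fixations` are assumptions and do not come from the patent text.

```python
def find_fixations(samples, box, min_dwell_ms=500.0):
    """Find intervals where gaze stayed inside `box` for at least `min_dwell_ms`.

    samples: list of (t_ms, x, y) tuples sorted by time (assumed format).
    box: (x_min, y_min, x_max, y_max) region of the artifact (assumed shape).
    Returns a list of (start_ms, end_ms) tuples.
    """
    x0, y0, x1, y1 = box
    fixations = []
    run_start = None  # timestamp of the first sample in the current in-box run
    last_t = None     # timestamp of the latest in-box sample in that run
    for t, x, y in samples:
        if x0 <= x <= x1 and y0 <= y <= y1:
            if run_start is None:
                run_start = t
            last_t = t
        else:
            # Gaze left the region: keep the run only if it lasted long enough.
            if run_start is not None and last_t - run_start >= min_dwell_ms:
                fixations.append((run_start, last_t))
            run_start = None
    # Close a run that is still open when the samples end.
    if run_start is not None and last_t - run_start >= min_dwell_ms:
        fixations.append((run_start, last_t))
    return fixations


# Example at a 60 Hz sampling rate (within the 30 to 120 Hz range of claim 8).
period_ms = 1000 / 60
samples = (
    [(i * period_ms, 10, 10) for i in range(40)]            # about 650 ms inside
    + [(i * period_ms, 500, 500) for i in range(40, 45)]    # gaze moves away
    + [(i * period_ms, 10, 10) for i in range(45, 55)]      # about 150 ms inside
)
print(len(find_fixations(samples, (0, 0, 20, 20))))  # 1
```

Only the first run qualifies, because the second stays inside the region for well under 500 ms. Production eye-tracking pipelines commonly use dispersion- or velocity-based fixation detection instead; the region-dwell check here is chosen only because it mirrors the claim wording.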

Assignees

Intuitive Research And Tech Corporation, Uab Res Found

Inventors

Classifications

  • Labelling scene content, e.g. deriving syntactic or semantic representations

  • Speech to text systems (G10L15/08 takes precedence)

  • Parsing

  • Recognition of textual entities

  • G06F3/013 (Primary): Eye tracking input arrangements (G06F3/015 takes precedence)

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12566495B2 cover?
Techniques for evaluating images are disclosed. An image is displayed. Data corresponding to a gaze fixation of a user who is viewing the image is obtained. An image artifact is identified based on the gaze fixation. The image artifact is at a location in the image where the gaze fixation occurred. A recording of the user's utterance is accessed. This recording is recorded during an overlapping…
Who is the assignee on this patent?
Intuitive Research And Tech Corporation, Uab Res Found
What technology area does this patent fall under?
Primary CPC classification G06F3/013. Mapped technology areas include Physics.
When was this patent published?
Publication date: Mar 3, 2026 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).