What technology area does this patent fall under?

Primary CPC classification G06V10/454. Mapped technology areas include Physics.

When was this patent published?

Publication date Mar 11, 2021 (A1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Machine learning system to score alt-text in image data

US2021073617A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2021073617-A1
Application number	US-201916567277-A
Country	US
Kind code	A1
Filing date	Sep 11, 2019
Priority date	Sep 11, 2019
Publication date	Mar 11, 2021
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques are generally described for automatic scoring of alt-text for image data. In various examples, first image data and first text data describing the first image data may be received. A feature representation of the first image data may be determined using an encoder machine learning model. A hidden state representation may be determined using a decoder machine learning model based on the feature representation and a first word of the first text data. In some examples, a first score may be determined using the hidden state representation. The first score may include an indication of a descriptive capability of the first text data with respect to the first image data.
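As a rough illustration of the scoring idea in the abstract, the sketch below turns per-word probabilities into a single score for a candidate alt-text. The `TOY_PROBS` table and `word_probability` function are hypothetical stand-ins for the encoder and decoder models, and the length-normalized (geometric mean) aggregation is an assumption; the abstract does not say how the score is derived from the hidden state representation.

```python
import math

# Hypothetical stand-in for the encoder/decoder pair: maps
# (previous word, current word) to the probability that the current word
# fits the image. A real system would compute this from image features
# and a decoder hidden state.
TOY_PROBS = {
    ("<start>", "a"): 0.6,
    ("a", "dog"): 0.5,
    ("dog", "running"): 0.4,
    ("a", "car"): 0.01,
}
DEFAULT_PROB = 0.001  # assumed floor for word pairs the toy model has not seen


def word_probability(previous_word, current_word):
    """Look up the toy probability of current_word given previous_word."""
    return TOY_PROBS.get((previous_word, current_word), DEFAULT_PROB)


def score_alt_text(alt_text):
    """Return the geometric mean of per-word probabilities (0.0 for empty text)."""
    words = alt_text.lower().split()
    if not words:
        return 0.0
    log_sum = 0.0
    previous = "<start>"
    for word in words:
        log_sum += math.log(word_probability(previous, word))
        previous = word
    # Dividing by the word count keeps long descriptions from being penalized
    # simply for containing more words.
    return math.exp(log_sum / len(words))
```

With the toy table, `score_alt_text("a dog running")` comes out higher than `score_alt_text("a car")`, which is the behavior one would want if the image shows a running dog.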

First claim

Opening claim text (preview).

What is claimed is:
1. A computer-implemented method of generating a score for alternative text (alt-text) in HTML, comprising: receiving first image data representing a first image; receiving first text data describing the first image; sending the first image data to an input layer of a convolutional neural network (CNN) trained to recognize objects; determining first feature data from a last convolutional layer of the CNN, the first feature data representing the first image data; sending the first feature data and the first text data to a decoder model comprising a temporal recurrent neural network (RNN) and an attentional model; generating, by the temporal RNN using a previous word of the first text data, a hidden state representation h_t of a current word y_t of the first text data; generating, by the attentional model, an image-dependent attentional vector h_t using the hidden state representation h_t and the first feature data; determining a probability of a current word y_t by inputting the image-dependent attentional vector h_t into a softmax output layer of the attentional model; and displaying the score for the first text data, wherein the score is an indication of a descriptive capability of the first text data with respect to the first image.
2. The method of claim 1, further comprising: detecting, by the CNN, a first object represented in the first image data; and determining, by the CNN, a matrix F = [f_1; ...; f_K], wherein f_1 corresponds to a feature representation of at least a portion of the first object and wherein K corresponds to a spatial location of a given feature representation of the matrix F.
3. The method of claim 2, further comprising determining a projection v_k of the matrix F in a lower dimension; and determining a global image description f_g, wherein the first feature data comprises the projection v_k and the global image description f_g.
4. A method of scoring user-entered alt-text describing an image with a computing device, the method comprising: receiving, by the computing device, image data for an image; analyzing the image data with a classifier to identify features in the image; receiving, from a user input device alt-text data describing the features in the image; determining, by the computing device, a score for the user-entered alt-text data indicating how well the user-entered text data describes the features in the image; and causing a display of the score to appear on the user input device in association with the entered text data.
5. The method of claim 4, further comprising receiving the alt-text data in a field of graphical user interface (GUI) of the user input device, wherein the GUI displays the first image data.
6. The method of claim 5, further comprising using a model that receives as inputs the features detected in the image and previously entered alt-text data to determine a probability for a number of words that describe the features in the image and using the model to determine a score for the user-entered alt-text data, wherein the probability is recomputed after each alt-text word is entered by the user.
7. The method of claim 6, further comprising: causing the display of the score as one or more of a textual description, color code or numeric indication of how well the user-entered alt-text describes the features of the image on the user input device.
8. The method of claim 7, further comprising: using a temporal decoder model as the model to analyze the user-entered alt-text data, wherein the temporal decoder include a hidden representation h_t of a first word of user-entered alt-text data based at least in part on a second word of the user-entered alt-text data, wherein the first word follows the second word.
9. The method of claim 8, further comprising: determining, by an attentional decoder model, an attentional score comprising a weight emphasizing at least one portion of the image data that corresponds to the hidden representation h_t.
10. The method of claim 9, further comprising determining probability based at least in part on the attentional score.
11. The method of claim 4, further comprising: detecting, by a convolutional neural network (CNN), the features in image data; and determining, by the CNN, a matrix F = [f_1; ...; f_K], wherein G corresponds to a representation of at least a first feature at a first spatial location in the image data.
12. The method of claim 4, further comprising causing a display of the image with the user-entered alt-text data.
13. A system for scoring text describing an image, the system comprising: at least one processor; and at least one non-transitory computer-readable memory storing instructions that, when executed by the at least one processor, are effective to program the at least one processor to: receive first image data; receive first text data comprising candidate alt-text describing the first image data; determine a first score for the first text data based on a probability that the first text data describes the first image data; and display an indication of the first score in association with the first text data.
14. The system of claim 13, the at least one non-transitory computer-readable memory storing further instructions that, when executed by the at least one processor, are further effective to program the at least one processor to: receive the first text data in a field of graphical user interface (GUI), wherein the GUI displays the first image data; and display the first score in association with the first text data and the first image data.
15. The system of claim 14, the at least one non-transitory computer-readable memory storing further instructions that, when executed by the at least one processor, are further effective to program the at least one processor to generate a respective second score for the first text data as each word of the first text data is entered into the field of the GUI.
16. The system of claim 13, the at least one non-transitory computer-readable memory storing further instructions that, when executed by the at least one processor, are further effective to program the at least one processor to: generate, by a decoder machine learning model, second text data based at least in part on the first text data and the first image data; and display the first text data and the second text data in a field of a graphical user interface, wherein the second text data comprises suggested text that is descriptive of the first image data.
17. The system of claim 13, the at least one non-transitory computer-readable memory storing further instructions that, when executed by the at least one processor, are further effective to program the at least one processor to: determine, by a temporal decoder model, a hidden representation h_t of a first word of the first text data based at least in part on a second word of the first text data, wherein the first word follows the second word in the first text data.
18. The system of claim 17, the at least one non-transitory computer-readable memory storing further instructions that, when executed by the at least one processor, are further effective to program the at least one processor to: determine, by an attentional decoder model, an attentional score comprising a weight emphasizing at least one portion of the first image data that corresponds to the hidden representation h_t.
19. The system of claim 18, the at least
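The attention and softmax steps recited in claims 1, 9, and 10 can be sketched in plain Python. This is a minimal sketch under several assumptions: the dot-product attention scoring, the tanh combination of context and hidden state, and the toy vectors and `output_weights` matrix are illustrative choices, not details taken from the claims.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(hidden, features):
    """Weight K image feature vectors by similarity to the hidden state.

    Returns (weights, context), where context is the weighted sum of the
    feature vectors. Dot-product scoring is an assumption of this sketch.
    """
    weights = softmax([dot(hidden, f) for f in features])
    dim = len(features[0])
    context = [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]
    return weights, context


def word_distribution(hidden, features, output_weights):
    """Combine context and hidden state, then project to word probabilities."""
    weights, context = attend(hidden, features)
    # Assumed combination: elementwise tanh of (context + hidden).
    attentional = [math.tanh(c + h) for c, h in zip(context, hidden)]
    logits = [dot(row, attentional) for row in output_weights]
    return weights, softmax(logits)


# Toy inputs: K = 2 spatial locations, 2-dimensional features, 3-word vocabulary.
features = [[1.0, 0.0], [0.0, 1.0]]
hidden = [2.0, 0.0]
output_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
attention_weights, probs = word_distribution(hidden, features, output_weights)
```

Here the hidden state aligns with the first feature vector, so the first spatial location receives the larger attention weight and the first vocabulary entry ends up with the highest probability. The probability assigned to the word the user actually typed could then feed into a score.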

Assignees

Amazon Tech Inc

Inventors

Classifications

G06V10/454 (Primary)
Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN] · CPC title
G06V10/82
using neural networks · CPC title
G06V10/764
using classification, e.g. of video objects · CPC title
G06N3/045
Combinations of networks · CPC title
G06F18/24
Classification techniques · CPC title

Patent family

Related publications grouped by family.

View patent family 72422269

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2021073617A1 cover?: Techniques are generally described for automatic scoring of alt-text for image data. In various examples, first image data and first text data describing the first image data may be received. A feature representation of the first image data may be determined using an encoder machine learning model. A hidden state representation may be determined using a decoder machine learning model based on t…
Who is the assignee on this patent?: Amazon Tech Inc