US2021073617A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2021073617-A1 |
| Application number | US-201916567277-A |
| Country | US |
| Kind code | A1 |
| Filing date | Sep 11, 2019 |
| Priority date | Sep 11, 2019 |
| Publication date | Mar 11, 2021 |
| Grant date | — |
Official abstract text for this publication.
Techniques are generally described for automatic scoring of alt-text for image data. In various examples, first image data and first text data describing the first image data may be received. A feature representation of the first image data may be determined using an encoder machine learning model. A hidden state representation may be determined using a decoder machine learning model based on the feature representation and a first word of the first text data. In some examples, a first score may be determined using the hidden state representation. The first score may include an indication of a descriptive capability of the first text data with respect to the first image data.
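The abstract's pipeline (image features from an encoder, a decoder hidden state, and a word-level probability that feeds the score) can be illustrated with a toy sketch. This is not the applicant's implementation. The dot-product attention, the `tanh` combination of context and hidden state, and every name and number below are assumptions made for illustration only; the publication text shown here specifies only that an attentional model and a softmax output layer are involved.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(features, hidden):
    """Weight K region features by similarity to the hidden state.

    Dot-product scoring is an assumption; the source does not fix the form.
    Returns (weights over regions, weighted context vector).
    """
    weights = softmax([dot(f, hidden) for f in features])
    dim = len(features[0])
    context = [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]
    return weights, context

def word_probability(features, hidden, output_rows, word_index):
    """Probability of one vocabulary word given image features and a hidden state."""
    _, context = attend(features, hidden)
    # Hypothetical way to merge context and hidden state into an attentional vector.
    attentional = [math.tanh(c + h) for c, h in zip(context, hidden)]
    logits = [dot(row, attentional) for row in output_rows]
    return softmax(logits)[word_index]

# Toy inputs; all values are hypothetical.
features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]      # K = 3 image regions
hidden = [2.0, 0.0]                                   # decoder state for the current word
output_rows = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # one row per vocabulary word
vocab = ["dog", "grass", "car"]

weights, _ = attend(features, hidden)
probs = [word_probability(features, hidden, output_rows, i) for i in range(len(vocab))]
```

In this toy setup the first region receives the largest attention weight and the first vocabulary word the highest probability, because both align with the hidden state. A real system would obtain `features` from a trained CNN and `hidden` from a trained recurrent decoder.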
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method of generating a score for alternative text (alt-text) in HTML, comprising: receiving first image data representing a first image; receiving first text data describing the first image; sending the first image data to an input layer of a convolutional neural network (CNN) trained to recognize objects; determining first feature data from a last convolutional layer of the CNN, the first feature data representing the first image data; sending the first feature data and the first text data to a decoder model comprising a temporal recurrent neural network (RNN) and an attentional model; generating, by the temporal RNN using a previous word of the first text data, a hidden state representation h_t of a current word y_t of the first text data; generating, by the attentional model, an image-dependent attentional vector h_t using the hidden state representation h_t and the first feature data; determining a probability of a current word y_t by inputting the image-dependent attentional vector h_t into a softmax output layer of the attentional model; and displaying the score for the first text data, wherein the score is an indication of a descriptive capability of the first text data with respect to the first image.

2. The method of claim 1, further comprising: detecting, by the CNN, a first object represented in the first image data; and determining, by the CNN, a matrix F=[f_1; ...; f_K], wherein f_1 corresponds to a feature representation of at least a portion of the first object and wherein K corresponds to a spatial location of a given feature representation of the matrix F.

3. The method of claim 2, further comprising determining a projection v_k of the matrix F in a lower dimension; and determining a global image description f_g, wherein the first feature data comprises the projection v_k and the global image description f_g.

4. A method of scoring user-entered alt-text describing an image with a computing device, the method comprising: receiving, by the computing device, image data for an image; analyzing the image data with a classifier to identify features in the image; receiving, from a user input device alt-text data describing the features in the image; determining, by the computing device, a score for the user-entered alt-text data indicating how well the user-entered text data describes the features in the image; and causing a display of the score to appear on the user input device in association with the entered text data.

5. The method of claim 4, further comprising receiving the alt-text data in a field of graphical user interface (GUI) of the user input device, wherein the GUI displays the first image data.

6. The method of claim 5, further comprising using a model that receives as inputs the features detected in the image and previously entered alt-text data to determine a probability for a number of words that describe the features in the image and using the model to determine a score for the user-entered alt-text data, wherein the probability is recomputed after each alt-text word is entered by the user.

7. The method of claim 6, further comprising: causing the display of the score as one or more of a textual description, color code or numeric indication of how well the user-entered alt-text describes the features of the image on the user input device.

8. The method of claim 7, further comprising: using a temporal decoder model as the model to analyze the user-entered alt-text data, wherein the temporal decoder include a hidden representation h_t of a first word of user-entered alt-text data based at least in part on a second word of the user-entered alt-text data, wherein the first word follows the second word.

9. The method of claim 8, further comprising: determining, by an attentional decoder model, an attentional score comprising a weight emphasizing at least one portion of the image data that corresponds to the hidden representation h_t.

10. The method of claim 9, further comprising determining probability based at least in part on the attentional score.

11. The method of claim 4, further comprising: detecting, by a convolutional neural network (CNN), the features in image data; and determining, by the CNN, a matrix F=[f_1; ...; f_K], wherein G corresponds to a representation of at least a first feature at a first spatial location in the image data.

12. The method of claim 4, further comprising causing a display of the image with the user-entered alt-text data.

13. A system for scoring text describing an image, the system comprising: at least one processor; and at least one non-transitory computer-readable memory storing instructions that, when executed by the at least one processor, are effective to program the at least one processor to: receive first image data; receive first text data comprising candidate alt-text describing the first image data; determine a first score for the first text data based on a probability that the first text data describes the first image data; and display an indication of the first score in association with the first text data.

14. The system of claim 13, the at least one non-transitory computer-readable memory storing further instructions that, when executed by the at least one processor, are further effective to program the at least one processor to: receive the first text data in a field of graphical user interface (GUI), wherein the GUI displays the first image data; and display the first score in association with the first text data and the first image data.

15. The system of claim 14, the at least one non-transitory computer-readable memory storing further instructions that, when executed by the at least one processor, are further effective to program the at least one processor to generate a respective second score for the first text data as each word of the first text data is entered into the field of the GUI.

16. The system of claim 13, the at least one non-transitory computer-readable memory storing further instructions that, when executed by the at least one processor, are further effective to program the at least one processor to: generate, by a decoder machine learning model, second text data based at least in part on the first text data and the first image data; and display the first text data and the second text data in a field of a graphical user interface, wherein the second text data comprises suggested text that is descriptive of the first image data.

17. The system of claim 13, the at least one non-transitory computer-readable memory storing further instructions that, when executed by the at least one processor, are further effective to program the at least one processor to: determine, by a temporal decoder model, a hidden representation h_t of a first word of the first text data based at least in part on a second word of the first text data, wherein the first word follows the second word in the first text data.

18. The system of claim 18, the at least one non-transitory computer-readable memory storing further instructions that, when executed by the at least one processor, are further effective to program the at least one processor to: determine, by an attentional decoder model, an attentional score comprising a weight emphasizing at least one portion of the first image data that corresponds to the hidden representation h_t.

19. The system of claim 18, the at least
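Several of the claims above describe recomputing a score as each word is entered and showing it as a textual description, color code, or numeric value. One way this could work is sketched below, under stated assumptions: the claims do not say how per-word probabilities are combined, so the geometric mean used here is a stand-in, and the thresholds, labels, colors, and function names are all hypothetical.

```python
import math

def alt_text_score(word_probs):
    """Combine per-word probabilities into one score in [0, 1].

    Uses a geometric mean so longer texts are not penalized just for length.
    This aggregation is an assumption, not taken from the claims.
    """
    if not word_probs:
        return 0.0
    log_sum = sum(math.log(max(p, 1e-12)) for p in word_probs)
    return math.exp(log_sum / len(word_probs))

def running_scores(word_probs):
    """Score after each word is entered, mirroring per-word recomputation."""
    return [alt_text_score(word_probs[:i]) for i in range(1, len(word_probs) + 1)]

def score_label(score, thresholds=(0.2, 0.5)):
    """Map a numeric score to a (text, color) pair; cut-offs are hypothetical."""
    low, high = thresholds
    if score < low:
        return ("poor", "red")
    if score < high:
        return ("fair", "yellow")
    return ("good", "green")

# Hypothetical per-word probabilities as a user types three words.
typed_probs = [0.8, 0.2, 0.9]
history = running_scores(typed_probs)
label, color = score_label(history[-1])
```

The per-word probabilities would come from the decoder model; here they are supplied directly so the aggregation and display logic can be read in isolation.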
Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN] · CPC title
using neural networks · CPC title
using classification, e.g. of video objects · CPC title
Combinations of networks · CPC title
Classification techniques · CPC title