US12182525B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12182525-B2 |
| Application number | US-202217580951-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 21, 2022 |
| Priority date | Jan 21, 2022 |
| Publication date | Dec 31, 2024 |
| Grant date | Dec 31, 2024 |
Official abstract text for this publication.
Methods, apparatus, and processor-readable storage media for automatically generating context-based alternative text using artificial intelligence techniques are provided herein. An example computer-implemented method includes generating text captions for an image derived from a web page by processing the image using an artificial intelligence-based image captioning model; determining context information pertaining to the image by processing the image using an artificial intelligence-based context and emotion recognition library; generating context-based alternative text for at least a portion of the image by processing, using at least one artificial intelligence-based alternative text generation model, at least a portion of one or more of the generated text caption(s) for the image and the determined context information pertaining to at least a portion of the image; and performing one or more automated actions based on the generated context-based alternative text.
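The three processing stages named in the abstract (captioning, context and emotion recognition, alternative text generation) can be pictured as a simple composition of functions. The following is a minimal sketch under stated assumptions, not the patented implementation: `generate_alt_text`, `ImageContext`, and the three `stub_*` functions are hypothetical names introduced here, and the stubs return fixed values in place of real models.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ImageContext:
    """Hypothetical container for context information about an image."""
    emotions: List[str] = field(default_factory=list)
    scenery: List[str] = field(default_factory=list)


def generate_alt_text(
    image: bytes,
    captioner: Callable[[bytes], List[str]],
    context_recognizer: Callable[[bytes], ImageContext],
    alt_text_model: Callable[[List[str], ImageContext], str],
) -> str:
    """Chain the three stages: captions, context, then alternative text."""
    captions = captioner(image)
    context = context_recognizer(image)
    return alt_text_model(captions, context)


# Stand-ins for real models (assumptions for illustration only).
def stub_captioner(image: bytes) -> List[str]:
    return ["two people shaking hands"]


def stub_context_recognizer(image: bytes) -> ImageContext:
    return ImageContext(emotions=["happy"], scenery=["office"])


def stub_alt_text_model(captions: List[str], context: ImageContext) -> str:
    # Combine the first caption with whatever context was recognized.
    parts = [captions[0]]
    if context.emotions:
        parts.append("emotion: " + ", ".join(context.emotions))
    if context.scenery:
        parts.append("scene: " + ", ".join(context.scenery))
    return "; ".join(parts)


alt_text = generate_alt_text(
    b"", stub_captioner, stub_context_recognizer, stub_alt_text_model
)
```

Passing the models in as callables keeps the sketch independent of any particular captioning or recognition library, which the abstract does not name.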
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method comprising: generating one or more text captions for an image relating to a web page by processing at least a portion of the image using at least one artificial intelligence-based image captioning model; determining context information pertaining to at least a portion of the image by processing one or more portions of the image using at least one artificial intelligence-based context and emotion recognition library; generating context-based alternative text for at least a portion of the image by processing, using at least one artificial intelligence-based alternative text generation model, at least a portion of one or more of the one or more generated text captions for the image and the determined context information pertaining to at least a portion of the image; and performing one or more automated actions based at least in part on the generated context-based alternative text, wherein performing one or more automated actions comprises: automatically inserting at least a portion of the generated context-based alternative text into at least one particular portion of application code of the web page, and wherein the at least one particular portion of the application code is determined by parsing content contained within the application code for one or more code-related identifiers; and automatically updating at least a portion of the one or more code-related identifiers associated with the at least one particular portion of the application code; wherein the method is performed by at least one processing device comprising a processor coupled to a memory.
2. The computer-implemented method of claim 1, further comprising: automatically training the at least one artificial intelligence-based alternative text generation model using at least one of one or more supervised learning techniques and one or more unsupervised learning techniques.
3. The computer-implemented method of claim 1, wherein performing one or more automated actions comprises: obtaining user feedback pertaining to the generated context-based alternative text; and automatically training, using at least a portion of the user feedback, one or more of the at least one artificial intelligence-based image captioning model, the at least one artificial intelligence-based alternative text generation model, and the at least one artificial intelligence-based context and emotion recognition library.
4. The computer-implemented method of claim 1, wherein generating context-based alternative text for at least a portion of the image comprises updating an existing set of alternative text for the at least a portion of the image.
5. The computer-implemented method of claim 1, wherein determining context information pertaining to at least a portion of the image comprises: identifying at least one of one or more facial gestures and one or more body gestures in the image; and determining one or more emotional indications derived from the at least one of one or more identified facial gestures and one or more identified body gestures.
6. The computer-implemented method of claim 1, wherein processing at least a portion of the image using at least one artificial intelligence-based image captioning model comprises processing the at least a portion of the image using one or more deep learning models.
7. The computer-implemented method of claim 6, wherein processing the at least a portion of the image using one or more deep learning models comprises processing the at least a portion of the image using at least one of one or more convolutional neural networks, one or more residual neural networks, and one or more deep neural networks.
8. The computer-implemented method of claim 1, wherein determining context information pertaining to at least a portion of the image comprises automatically identifying one or more actions depicted in the at least a portion of the image.
9. The computer-implemented method of claim 1, wherein determining context information pertaining to at least a portion of the image comprises automatically identifying one or more scenery variables depicted in the at least a portion of the image.
10. The computer-implemented method of claim 1, wherein determining context information pertaining to at least a portion of the image comprises automatically identifying one or more event types depicted in the at least a portion of the image.
11. The computer-implemented method of claim 1, further comprising: extracting text from the image by processing one or more portions of the image using at least one artificial intelligence-based optical character recognition model; and automatically training the at least one artificial intelligence-based alternative text generation model using at least a portion of one or more of the one or more generated text captions for the image, the extracted text from the image, and the determined context information pertaining to at least a portion of the image.
12. A non-transitory processor-readable storage medium having stored therein program code of one or more software programs, wherein the program code when executed by at least one processing device causes the at least one processing device: to generate one or more text captions for an image relating to a web page by processing at least a portion of the image using at least one artificial intelligence-based image captioning model; to determine context information pertaining to at least a portion of the image by processing one or more portions of the image using at least one artificial intelligence-based context and emotion recognition library; to generate context-based alternative text for at least a portion of the image by processing, using at least one artificial intelligence-based alternative text generation model, at least a portion of one or more of the one or more generated text captions for the image and the determined context information pertaining to at least a portion of the image; and to perform one or more automated actions based at least in part on the generated context-based alternative text, wherein performing one or more automated actions comprises: automatically inserting at least a portion of the generated context-based alternative text into at least one particular portion of application code of the web page, wherein the at least one particular portion of the application code is determined by parsing content contained within the application code for one or more code-related identifiers; and automatically updating at least a portion of the one or more code-related identifiers associated with the at least one particular portion of the application code.
13. The non-transitory processor-readable storage medium of claim 12, wherein performing one or more automated actions comprises: obtaining user feedback pertaining to the generated context-based alternative text; and automatically training, using at least a portion of the user feedback, one or more of the at least one artificial intelligence-based image captioning model, the at least one artificial intelligence-based alternative text generation model, and the at least one artificial intelligence-based context and emotion recognition library.
14. The non-transitory processor-readable storage medium of claim 12, wherein generating context-based alternative text for at least a portion of the image comprises updating an existing set of alternative text for the at least a portion of the image.
15. An apparatus comprising: at least one processing device comprising a processor coupled to a memory; the at least one processing device being configured
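The insertion step recited in claims 1 and 12 (parsing the page's application code for identifiers, then inserting the generated text) and the update case of claims 4 and 14 can be illustrated with a short sketch. It assumes, as one possible reading only, that the "code-related identifiers" are HTML `<img>` tags and their `alt` attributes; the claims do not say this. The function name `insert_alt_text` and the regular expressions are hypothetical, and the matching is deliberately simple (double-quoted `src` values, no `>` inside attribute values).

```python
import html
import re

# Matches a whole <img ...> tag (simplified: assumes no ">" inside attributes).
IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
# Matches an existing alt attribute, but not attributes such as data-alt.
ALT_ATTR = re.compile(r"(?<![\w-])alt\s*=\s*(\"[^\"]*\"|'[^']*')", re.IGNORECASE)


def insert_alt_text(page_source: str, image_src: str, alt_text: str) -> str:
    """Set the alt attribute on every <img> whose src equals image_src."""
    new_attr = 'alt="' + html.escape(alt_text, quote=True) + '"'

    def rewrite(match: re.Match) -> str:
        tag = match.group(0)
        if f'src="{image_src}"' not in tag:
            return tag  # a different image; leave it untouched
        if ALT_ATTR.search(tag):
            # Update an existing set of alternative text.
            return ALT_ATTR.sub(lambda _: new_attr, tag, count=1)
        # No alt attribute yet: add one right after "<img".
        return tag[:4] + " " + new_attr + tag[4:]

    return IMG_TAG.sub(rewrite, page_source)


page = '<p><img src="team.png"><img src="logo.png" alt="old"></p>'
added = insert_alt_text(page, "team.png", 'Two people "shaking" hands')
updated = insert_alt_text(page, "logo.png", "Company logo")
```

A production system would more likely use a real HTML parser than regular expressions; the regex form is used here only to keep the example within the standard library and short.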
characterised by the processing or recognition method (segmentation of character regions G06V30/148) · CPC title
Region-based matching · CPC title
Natural language generation · CPC title
Semantic analysis · CPC title
using neural networks · CPC title