Method, medium, and system for intelligent online personal assistant with image text localization

US12223533B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12223533-B2
Application number  US-202117222251-A
Country             US
Kind code           B2
Filing date         Apr 5, 2021
Priority date       Nov 11, 2016
Publication date    Feb 11, 2025
Grant date          Feb 11, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems, methods, and computer program products for identifying a candidate product in an electronic marketplace based on a visual comparison between candidate product image visual text content and input query image visual text content. Unlike conventional optical character recognition (OCR) based systems, embodiments automatically localize and isolate portions of a candidate product image and an input query image that each contain visual text content, and calculate a visual similarity measure between the respective portions. A trained neural network may be re-trained to more effectively find visual text content by using the localized and isolated visual text content portions as additional ground truths. The visual similarity measure serves as a visual search result score for the candidate product. Any number of images of any number of candidate products may be compared to an input query image to enable text-in-image based product searching without resorting to conventional OCR techniques.
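The "localize and isolate" step described in the abstract can be illustrated with a small sketch. This is not the patented method: the abstract describes a trained neural network as the localizer, whereas the hypothetical `localize_text` below substitutes a simple brightness threshold so the example stays self-contained. The names `Image`, `Box`, `localize_text`, and `isolate`, and the threshold value, are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]          # grayscale pixels, 0 = background (assumed format)
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive


def localize_text(image: Image, threshold: int = 128) -> Optional[Box]:
    """Return the bounding box of all pixels at or above `threshold`.

    Placeholder only: the abstract describes a trained neural network
    for this step. A threshold rule is used here purely for illustration.
    """
    rows = [r for r, row in enumerate(image) if any(p >= threshold for p in row)]
    cols = [c for c in range(len(image[0]))
            if any(row[c] >= threshold for row in image)]
    if not rows or not cols:
        return None  # nothing that looks like text was found
    return (rows[0], cols[0], rows[-1] + 1, cols[-1] + 1)


def isolate(image: Image, box: Box) -> Image:
    """Crop the image to the localized region, discarding everything else."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


# Toy 4x5 image with a bright region standing in for visual text content.
sample = [
    [0,   0,   0,   0, 0],
    [0, 200,   0, 255, 0],
    [0,   0, 220,   0, 0],
    [0,   0,   0,   0, 0],
]
sample_box = localize_text(sample)
sample_crop = isolate(sample, sample_box) if sample_box else []
```

In this sketch the same two functions would be applied to both the candidate product image and the input query image, so that only the isolated portions are passed on to the similarity comparison.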

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising: localizing and isolating a candidate portion of a candidate product image with candidate visual text content; retraining a neural network using the candidate portion of the candidate product image; localizing and isolating, via the retrained neural network, an input query portion of an input query image with input query visual text content; calculating a visual similarity measure between the candidate portion and the input query portion based on a distance value between image signatures associated with the candidate product image and the input query image; ranking a candidate product in a product list based on the visual similarity measure; and causing presentation of the product list on a graphical user interface of a client device.

2. The method of claim 1, wherein the input query image comprises one of: a photograph; a video frame; a sketch; or a diagram.

3. The method of claim 1, wherein the candidate product image is associated with an electronic marketplace.

4. A non-transitory computer-readable storage medium having embedded therein a set of instructions which, when executed by one or more processors of a computer, causes the computer to execute operations comprising: localizing and isolating a candidate portion of a candidate product image with candidate visual text content; retraining a neural network using the candidate portion of the candidate product image; localizing and isolating, via the retrained neural network, an input query portion of an input query image with input query visual text content; calculating a visual similarity measure between the respective portions of the candidate product image and the input query image based on a distance value between image signatures associated with the candidate product image and the input query image; ranking a candidate product in a product list based on the visual similarity measure; and causing presentation of the product list on a graphical user interface of a client device.

5. A system comprising: at least one processor; and memory encoding computer-executable instructions that, when executed by the at least one processor, cause the system to perform operations comprising: localizing and isolating a candidate portion of a candidate product image with candidate visual text content; retraining a neural network using the candidate portion of the candidate product image; localizing and isolating, via the retrained neural network, an input query portion of an input query image with input query visual text content; calculating a visual similarity measure between the candidate portion and the input query portion based on a distance value between image signatures associated with the candidate product image and the input query image; ranking a candidate product in a product list based on the visual similarity measure; and causing presentation of the product list on a graphical user interface of a client device.

6. The system of claim 5, wherein the input query image comprises one of: a photograph; a video frame; a sketch; or a diagram.

7. The method of claim 1, wherein localizing and isolating the respective portions of the candidate product image and the input query image comprises localizing and isolating, using a machine learning model, the respective portions of the candidate product image and the input query image.

8. The method of claim 7, wherein the machine learning model comprises a deep neural network.

9. The method of claim 7, further comprising retraining the machine learning model using a new image of a new product provided to an electronic marketplace.

10. The method of claim 1, wherein the image signatures comprise binary vectors, and the distance value comprises one or more bits that are different in the binary vectors.

11. The non-transitory computer-readable storage medium of claim 4, wherein the input query image comprises one of: a photograph; a video frame; a sketch; or a diagram.

12. The non-transitory computer-readable storage medium of claim 4, wherein the candidate product image is associated with an electronic marketplace.

13. The non-transitory computer-readable storage medium of claim 4, wherein localizing and isolating the respective portions of the candidate product image and the input query image comprises localizing and isolating, using a machine learning model, the respective portions of the candidate product image and the input query image.

14. The non-transitory computer-readable storage medium of claim 13, wherein the machine learning model comprises a deep neural network.

15. The non-transitory computer-readable storage medium of claim 13, wherein the operations further comprise retraining the machine learning model using a new image of a new product provided to an electronic marketplace.

16. The non-transitory computer-readable storage medium of claim 4, wherein the image signatures comprise binary vectors, and the distance value comprises one or more bits that are different in the binary vectors.

17. The system of claim 5, wherein the candidate product image is associated with an electronic marketplace.

18. The system of claim 5, wherein localizing and isolating the respective portions of the candidate product image and the input query image comprises localizing and isolating, using a machine learning model, the respective portions of the candidate product image and the input query image.

19. The system of claim 18, wherein the operations further comprise retraining the machine learning model using a new image of a new product provided to an electronic marketplace.

20. The system of claim 5, wherein the image signatures comprise binary vectors, and the distance value comprises one or more bits that are different in the binary vectors.
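Claims 10, 16, and 20 describe image signatures as binary vectors, with the distance value based on the bits that differ between them. A minimal sketch of that idea follows, assuming each signature is stored as a Python integer of a fixed bit width. The function names, the conversion of distance into a similarity score (`1 - distance / n_bits`), and the sample signatures are illustrative assumptions and are not taken from the patent.

```python
from typing import Dict, List, Tuple


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two binary signatures differ."""
    return bin(a ^ b).count("1")


def similarity(a: int, b: int, n_bits: int) -> float:
    """Map a bit-difference count to a score in [0, 1].

    Assumed scoring rule: identical signatures score 1.0 and fully
    opposite signatures score 0.0. The patent does not specify this formula.
    """
    return 1.0 - hamming_distance(a, b) / n_bits


def rank_candidates(query_sig: int,
                    candidate_sigs: Dict[str, int],
                    n_bits: int) -> List[Tuple[str, float]]:
    """Order candidate products from most to least visually similar."""
    scored = [(name, similarity(query_sig, sig, n_bits))
              for name, sig in candidate_sigs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical 8-bit signatures for one query and three candidate products.
query = 0b10110010
candidates = {
    "product_c": 0b01001101,  # every bit differs from the query
    "product_a": 0b10110010,  # identical to the query
    "product_b": 0b10110011,  # one bit differs from the query
}
ranking = rank_candidates(query, candidates, n_bits=8)
# ranking == [("product_a", 1.0), ("product_b", 0.875), ("product_c", 0.0)]
```

Binary signatures make the comparison cheap: one XOR and a bit count per candidate, which is one plausible reason to prefer them when many candidate product images must be scored against a single query.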

Assignees

  • Ebay Inc

Inventors

Classifications

  • Supervised learning · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title

  • Industrial image inspection · CPC title


Patent family

Related publications grouped by family.


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12223533B2 cover?
Systems, methods, and computer program products for identifying a candidate product in an electronic marketplace based on a visual comparison between candidate product image visual text content and input query image visual text content. Unlike conventional optical character recognition (OCR) based systems, embodiments automatically localize and isolate portions of a candidate product image and …
Who is the assignee on this patent?
Ebay Inc
What technology area does this patent fall under?
Primary CPC classification G06Q30/0625. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 11, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).