Visual intent triggering for visual search

US12346370B2 · US · B2

Patent metadata
Publication number: US-12346370-B2
Application number: US-201816036224-A
Country: US
Kind code: B2
Filing date: Jul 16, 2018
Priority date: Jul 16, 2018
Publication date: Jul 1, 2025
Grant date: Jul 1, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Representative embodiments disclose mechanisms to perform visual intent classification or visual intent detection or both on an image. Visual intent classification utilizes a trained machine learning model that classifies subjects in the image according to a classification taxonomy. The visual intent classification can be used as a pre-triggering mechanism to initiate further action in order to substantially save processing time. Example further actions include user scenarios, query formulation, user experience enhancement, and so forth. Visual intent detection utilizes a trained machine learning model to identify subjects in an image, place a bounding box around the image, and classify the subject according to the taxonomy. The trained machine learning model utilizes multiple feature detectors, multi-layer predictions, multilabel classifiers, and bounding box regression.
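
The pre-triggering idea in the abstract can be illustrated with a minimal sketch: classification labels from the model decide which further actions (user scenarios) are worth starting, so expensive downstream work is skipped when no label qualifies. The label names, the scenario names, the `SCENARIOS_BY_LABEL` mapping, and the 0.5 threshold are all hypothetical and do not come from the patent.

```python
from dataclasses import dataclass

# Hypothetical mapping from taxonomy labels to user scenarios.
# Neither the labels nor the scenario names are taken from the patent.
SCENARIOS_BY_LABEL = {
    "fashion": "visual_search",
    "landmark": "web_search",
    "text": "text_recognition",
}


@dataclass
class LabelScore:
    """One classifier output: a taxonomy label and its confidence score."""
    label: str
    score: float


def pre_trigger(labels, threshold=0.5):
    """Return the scenarios to start, highest-scoring label first.

    A scenario is triggered only if some label mapped to it scores at or
    above the (assumed) threshold. An empty result means no further action.
    """
    triggered = []
    for item in sorted(labels, key=lambda x: x.score, reverse=True):
        scenario = SCENARIOS_BY_LABEL.get(item.label)
        if scenario is None or item.score < threshold:
            continue
        if scenario not in triggered:
            triggered.append(scenario)
    return triggered


if __name__ == "__main__":
    outputs = [
        LabelScore("fashion", 0.9),
        LabelScore("text", 0.2),
        LabelScore("landmark", 0.7),
    ]
    print(pre_trigger(outputs))
```

In this sketch a low-scoring label such as `text` never starts its scenario, which is the sense in which classification acts as a gate before further processing.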

First claim

Opening claim text (preview).

What is claimed is:

1. A computer implemented method, comprising: receiving an image as a query at a computer-implemented search engine, wherein the image includes an object; in response to receiving the image as the query, submitting the image to a multilabel classifier of the computer-implemented search engine, where the multilabel classifier is configured to: identify a plurality of objects in the image; place bounding boxes in the image, where each of the bounding boxes substantially bounds a corresponding object; and assign at least one classification label to each bounding box to identify the corresponding objects in the images; passing at least one classification label and an associated bounding box to a trained suppression model of the computer-implemented search engine, the trained suppression model computing scores for the bounding box and suppressing at least one classification label along with its associated bounding box based upon the scores; based on an unsuppressed classification label, selecting, by the computer-implemented search engine, a user intent scenario from amongst a predefined set of user intent scenarios, wherein the user intent scenario in the predefined set of user intent scenarios is selectable due to the classification label being assigned to the user intent scenario; generating, by the computer-implemented search engine, a query suggestion for review by a user who issued the image as the query, wherein the query suggestion is generated based upon the selected user intent scenario; subsequent to generating the query suggestion, receiving, by the computer-implemented search engine, an indication that the query suggestion has been selected by the user; and providing, by the computer-implemented search engine, output that is based upon the user intent scenario.

2. The method of claim 1 wherein the multilabel classifier comprises a MobileNet backbone trained using an error function comprising two multilabel classification losses, a first multilabel classification loss being a multilabel elementwise sigmoid loss and a second multilabel classification loss being a multilabel softmax loss.

3. The method of claim 1 wherein the multilabel classifier is trained using a cross-entropy loss given by $E = -\frac{1}{n} \sum_{n=1}^{N} \left[ p_n \log \hat{p}_n + (1 - p_n) \log(1 - \hat{p}_n) \right]$.

4. The method of claim 1 wherein the user intent scenario is a visual search, the method further comprising outputting images that have the classification label assigned thereto.

5. The method of claim 1 wherein multiple classification labels for the image are received from the multilabel classifier, the method further comprising selecting the classification label from amongst the multiple classification labels based upon a determination that the object is of interest to the user.

6. The method of claim 1 wherein the user intent scenario is performance of a search, wherein a query that includes the classification is constructed, the method further comprising outputting search results identified based upon the query.

7. A computing system comprising: a processor; and memory storing instructions that, when executed by the processor, cause the processor to perform acts comprising: receiving an image as a query by a computer-implemented search engine, wherein the image includes an object; in response to receiving the image as the query, submitting the image to a multilabel classifier, where the multilabel classifier is configured to: identify a plurality of objects in the image; place bounding boxes in the image, where each of the bounding boxes substantially bounds a corresponding object; and assign at least one classification label to each bounding box to identify the corresponding objects in the images; passing classification labels and associated bounding boxes to a trained suppression model, where the trained suppression model computes scores for the bounding boxes and the associated classification labels and suppresses a classification label and its associated bounding box based upon the scores; based on an unsuppressed classification label, selecting a user intent scenario from amongst a predefined set of user intent scenarios, wherein the user intent scenario in the predefined set of user intent scenarios is selectable due to the unsuppressed classification label being assigned to the user intent scenario; generating a query suggestion for review by a user who issued the image as the query, wherein the query suggestion is generated based upon the selected user intent scenario; subsequent to generating the query suggestion, receiving, by the search engine, an indication that the query suggestion has been selected by the user; and providing, by the search engine, output that is based upon the user intent scenario.

8. The computing system of claim 7 wherein the multilabel classifier comprises a MobileNet backbone trained using an error function comprising two multilabel classification losses, a first multilabel classification loss being a multilabel elementwise sigmoid loss and a second multilabel classification loss being a multilabel sof
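
The cross-entropy loss recited in claim 3 can be illustrated with a small worked sketch. The preview does not define the symbols, so this example assumes that p_n is the ground-truth indicator for label n (0 or 1) and that p̂_n is the predicted probability for that label. It also assumes the sum is averaged over the N terms. The function name and the clamping constant `eps` are illustrative choices, not part of the claim.

```python
import math


def multilabel_cross_entropy(targets, predictions, eps=1e-12):
    """Mean binary cross-entropy over N labels.

    targets:     ground-truth values p_n in [0, 1] (assumed meaning)
    predictions: predicted probabilities p_hat_n in [0, 1] (assumed meaning)
    eps:         clamp that keeps log() away from zero (illustrative choice)
    """
    if len(targets) != len(predictions):
        raise ValueError("targets and predictions must have the same length")
    if not targets:
        raise ValueError("at least one label is required")

    total = 0.0
    for p, p_hat in zip(targets, predictions):
        # Clamp so that log(0) is never evaluated.
        p_hat = min(max(p_hat, eps), 1.0 - eps)
        total += p * math.log(p_hat) + (1.0 - p) * math.log(1.0 - p_hat)

    # Negate and average over the number of labels (assumed normalizer).
    return -total / len(targets)


if __name__ == "__main__":
    # Two labels, both predicted at 0.5: the loss equals log(2).
    print(multilabel_cross_entropy([1, 0], [0.5, 0.5]))
    # Confident, correct predictions give a loss close to zero.
    print(multilabel_cross_entropy([1, 0], [0.99, 0.01]))
```

Each label contributes its own term, so an image can carry several correct labels at once, which is what distinguishes this multilabel form from a single-label softmax loss.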

Assignees

Microsoft Technology Licensing, LLC

Inventors

Classifications

  • Convolutional networks [CNN, ConvNet]

  • Supervised learning

  • Classification techniques

  • into predefined classes

  • Learning methods

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12346370B2 cover?
Representative embodiments disclose mechanisms to perform visual intent classification or visual intent detection or both on an image. Visual intent classification utilizes a trained machine learning model that classifies subjects in the image according to a classification taxonomy. The visual intent classification can be used as a pre-triggering mechanism to initiate further action in order to…
Who is the assignee on this patent?
Microsoft Technology Licensing, LLC
What technology area does this patent fall under?
Primary CPC classification G06F16/50. Mapped technology areas include Physics.
When was this patent published?
Publication date Jul 1, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).