Who is the assignee on this patent?

Microsoft Technology Licensing, LLC

What technology area does this patent fall under?

Primary CPC classification G06F16/50. Mapped technology areas include Physics.

When was this patent published?

Publication date: Jul 1, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Visual intent triggering for visual search

Patent metadata
Field	Value
Publication number	US-12346370-B2
Application number	US-201816036224-A
Country	US
Kind code	B2
Filing date	Jul 16, 2018
Priority date	Jul 16, 2018
Publication date	Jul 1, 2025
Grant date	Jul 1, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Representative embodiments disclose mechanisms to perform visual intent classification or visual intent detection or both on an image. Visual intent classification utilizes a trained machine learning model that classifies subjects in the image according to a classification taxonomy. The visual intent classification can be used as a pre-triggering mechanism to initiate further action in order to substantially save processing time. Example further actions include user scenarios, query formulation, user experience enhancement, and so forth. Visual intent detection utilizes a trained machine learning model to identify subjects in an image, place a bounding box around the image, and classify the subject according to the taxonomy. The trained machine learning model utilizes multiple feature detectors, multi-layer predictions, multilabel classifiers, and bounding box regression.
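The pre-triggering idea in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the label names, the `ACTIONS_BY_LABEL` mapping, the `pre_trigger` function, the 0.5 threshold, and the stand-in `fake_classifier` are all hypothetical and do not come from the patent text.

```python
from typing import Callable, Dict, List

# Hypothetical mapping from taxonomy labels to follow-up actions.
# Neither the labels nor the action names come from the patent.
ACTIONS_BY_LABEL: Dict[str, str] = {
    "fashion": "visual_search",
    "landmark": "query_formulation",
}


def pre_trigger(
    image: bytes,
    classify: Callable[[bytes], Dict[str, float]],
    threshold: float = 0.5,
) -> List[str]:
    """Return the follow-up actions whose labels score at or above threshold.

    An empty list means nothing is triggered, so any more expensive
    downstream processing can be skipped for this image.
    """
    scores = classify(image)
    actions = {
        ACTIONS_BY_LABEL[label]
        for label, score in scores.items()
        if score >= threshold and label in ACTIONS_BY_LABEL
    }
    return sorted(actions)


def fake_classifier(image: bytes) -> Dict[str, float]:
    # Stand-in for a trained model: returns fixed per-label scores.
    return {"fashion": 0.9, "landmark": 0.2, "animal": 0.7}


triggered = pre_trigger(b"", fake_classifier)
```

With the stand-in scores, only "fashion" both clears the threshold and has a mapped action, so `triggered` holds a single action. Gating on a cheap classification step like this is one plausible way to get the processing-time saving the abstract mentions.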

First claim

Opening claim text (preview).

What is claimed is:

1. A computer implemented method, comprising: receiving an image as a query at a computer-implemented search engine, wherein the image includes an object; in response to receiving the image as the query, submitting the image to a multilabel classifier of the computer-implemented search engine, where the multilabel classifier is configured to: identify a plurality of objects in the image; place bounding boxes in the image, where each of the bounding boxes substantially bounds a corresponding object; and assign at least one classification label to each bounding box to identify the corresponding objects in the images; passing at least one classification label and an associated bounding box to a trained suppression model of the computer-implemented search engine, the trained suppression model computing scores for the bounding box and suppressing at least one classification label along with its associated bounding box based upon the scores; based on an unsuppressed classification label, selecting, by the computer-implemented search engine, a user intent scenario from amongst a predefined set of user intent scenarios, wherein the user intent scenario in the predefined set of user intent scenarios is selectable due to the classification label being assigned to the user intent scenario; generating, by the computer-implemented search engine, a query suggestion for review by a user who issued the image as the query, wherein the query suggestion is generated based upon the selected user intent scenario; subsequent to generating the query suggestion, receiving, by the computer-implemented search engine, an indication that the query suggestion has been selected by the user; and providing, by the computer-implemented search engine, output that is based upon the user intent scenario.

2. The method of claim 1 wherein the multilabel classifier comprises a MobileNet backbone trained using an error function comprising two multilabel classification losses, a first multilabel classification loss being a multilabel elementwise sigmoid loss and a second multilabel classification loss being a multilabel softmax loss.

3. The method of claim 1 wherein the multilabel classifier is trained using a cross-entropy loss given by $E = -\frac{1}{n} \sum_{n=1}^{N} \left[ p_n \log \hat{p}_n + (1 - p_n) \log(1 - \hat{p}_n) \right]$.

4. The method of claim 1 wherein the user intent scenario is a visual search, the method further comprising outputting images that have the classification label assigned thereto.

5. The method of claim 1 wherein multiple classification labels for the image are received from the multilabel classifier, the method further comprising selecting the classification label from amongst the multiple classification labels based upon a determination that the object is of interest to the user.

6. The method of claim 1 wherein the user intent scenario is performance of a search, wherein a query that includes the classification is constructed, the method further comprising outputting search results identified based upon the query.

7. A computing system comprising: a processor; and memory storing instructions that, when executed by the processor, cause the processor to perform acts comprising: receiving an image as a query by a computer-implemented search engine, wherein the image includes an object; in response to receiving the image as the query, submitting the image to a multilabel classifier, where the multilabel classifier is configured to: identify a plurality of objects in the image; place bounding boxes in the image, where each of the bounding boxes substantially bounds a corresponding object; and assign at least one classification label to each bounding box to identify the corresponding objects in the images; passing classification labels and associated bounding boxes to a trained suppression model, where the trained suppression model computes scores for the bounding boxes and the associated classification labels and suppresses a classification label and its associated bounding box based upon the scores; based on an unsuppressed classification label, selecting a user intent scenario from amongst a predefined set of user intent scenarios, wherein the user intent scenario in the predefined set of user intent scenarios is selectable due to the unsuppressed classification label being assigned to the user intent scenario; generating a query suggestion for review by a user who issued the image as the query, wherein the query suggestion is generated based upon the selected user intent scenario; subsequent to generating the query suggestion, receiving, by the search engine, an indication that the query suggestion has been selected by the user; and providing, by the search engine, output that is based upon the user intent scenario.

8. The computing system of claim 7 wherein the multilabel classifier comprises a MobileNet backbone trained using an error function comprising two multilabel classification losses, a first multilabel classification loss being a multilabel elementwise sigmoid loss and a second multilabel classification loss being a multilabel sof
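As a rough illustration of the claim preview above, the following sketch computes the cross-entropy loss recited in claim 3 and walks through the suppress-then-select steps of claim 1. It rests on several assumptions: the leading factor in the loss is taken to average over the N labels; the claimed trained suppression model is replaced here by a plain fixed score threshold; and the `Detection` type, the `SCENARIOS` mapping, and every label and scenario name are hypothetical.

```python
import math
from typing import Dict, List, NamedTuple, Optional, Sequence, Tuple


class Detection(NamedTuple):
    label: str
    box: Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)
    score: float


def multilabel_cross_entropy(
    p: Sequence[float], p_hat: Sequence[float], eps: float = 1e-12
) -> float:
    """Mean binary cross-entropy over N labels (assumed reading of claim 3)."""
    if not p or len(p) != len(p_hat):
        raise ValueError("p and p_hat must be non-empty and equally long")
    total = 0.0
    for target, pred in zip(p, p_hat):
        pred = min(max(pred, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += target * math.log(pred) + (1.0 - target) * math.log(1.0 - pred)
    return -total / len(p)


def suppress(detections: Sequence[Detection], min_score: float = 0.5) -> List[Detection]:
    # Simplification: a fixed threshold stands in for the trained suppression model.
    return [d for d in detections if d.score >= min_score]


# Hypothetical assignment of classification labels to user intent scenarios.
SCENARIOS: Dict[str, str] = {"dog": "visual_search", "landmark": "web_search"}


def select_scenario(detections: Sequence[Detection]) -> Optional[str]:
    """Pick the scenario assigned to the highest-scoring unsuppressed label."""
    for d in sorted(detections, key=lambda d: d.score, reverse=True):
        if d.label in SCENARIOS:
            return SCENARIOS[d.label]
    return None


detections = [
    Detection("dog", (0, 0, 10, 10), 0.9),
    Detection("cat", (5, 5, 8, 8), 0.3),
]
kept = suppress(detections)
scenario = select_scenario(kept)
loss = multilabel_cross_entropy([1.0, 0.0], [0.5, 0.5])
```

In this toy run the low-scoring "cat" box is suppressed, the remaining "dog" label selects its assigned scenario, and the loss for two maximally uncertain predictions equals the natural log of 2.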

Assignees

Microsoft Technology Licensing, LLC

Inventors

Classifications

G06N3/0464
Convolutional networks [CNN, ConvNet]
G06N3/09
Supervised learning
G06F18/24
Classification techniques
G06F16/353
into predefined classes
G06N3/08
Learning methods

Patent family

Related publications grouped by family.

View patent family 67303506

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12346370B2 cover?: Representative embodiments disclose mechanisms to perform visual intent classification or visual intent detection or both on an image. Visual intent classification utilizes a trained machine learning model that classifies subjects in the image according to a classification taxonomy. The visual intent classification can be used as a pre-triggering mechanism to initiate further action in order to…

Related patents

Automated generation of pre-labeled training data

Image processing system to detect objects of interest

System and method for ranking search engine results

Task-focused search by image

Method and apparatus for providing visual search engine results

Natural language image search

Training image sampling
