Training data acquisition apparatus, training apparatus, and training data acquiring method

US11741153B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11741153-B2
Application number  US-202117249359-A
Country             US
Kind code           B2
Filing date         Feb 26, 2021
Priority date       Aug 27, 2020
Publication date    Aug 29, 2023
Grant date          Aug 29, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

According to one embodiment, an apparatus includes a first acquisition unit, a second acquisition unit, an identification unit, and an output unit. The first acquisition unit acquires a query image and a query text relating to a target object. The second acquisition unit acquires candidate images of the target object. The identification unit identifies from the candidate images a positive image containing a region demonstrating a similarity to the query image higher than or equal to a first threshold value, and identifies a position of the region in the positive image. The output unit outputs training data including the positive image, information representing the position of the region, and a correct label.
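The flow in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: it assumes each candidate image has already been split into regions with feature vectors, and it uses cosine similarity as the similarity measure, which the abstract does not specify. All names (`Region`, `Candidate`, `TrainingSample`, `acquire_training_data`) and the `(x, y, width, height)` box encoding are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Optional, Sequence, Tuple

# Hypothetical position encoding: (x, y, width, height).
Box = Tuple[int, int, int, int]


@dataclass
class Region:
    box: Box
    feature: Sequence[float]  # assumed precomputed feature vector of the region


@dataclass
class Candidate:
    image_id: str
    regions: List[Region]


@dataclass
class TrainingSample:
    image_id: str  # the positive image
    box: Box       # position of the matching region
    label: str     # correct label, taken from the query text


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed similarity measure; returns 0.0 for zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_region(query_feature: Sequence[float],
                candidate: Candidate) -> Optional[Tuple[Region, float]]:
    """Return the region most similar to the query image, with its score."""
    scored = [(r, cosine_similarity(query_feature, r.feature))
              for r in candidate.regions]
    return max(scored, key=lambda pair: pair[1]) if scored else None


def acquire_training_data(query_feature: Sequence[float], query_text: str,
                          candidates: List[Candidate],
                          first_threshold: float) -> List[TrainingSample]:
    """Keep candidates whose best region meets the first threshold."""
    samples = []
    for candidate in candidates:
        hit = best_region(query_feature, candidate)
        if hit is not None and hit[1] >= first_threshold:
            samples.append(TrainingSample(candidate.image_id, hit[0].box, query_text))
    return samples


# Usage with made-up features: only img_a has a region close to the query.
candidates = [
    Candidate("img_a", [Region((10, 20, 50, 40), [0.9, 0.1]),
                        Region((0, 0, 5, 5), [0.0, 1.0])]),
    Candidate("img_b", [Region((3, 3, 8, 8), [0.1, 0.9])]),
]
samples = acquire_training_data([1.0, 0.0], "mug", candidates, first_threshold=0.8)
```

In this sketch the query text is used directly as the label; the abstract only says the training data includes a correct label.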

First claim

Opening claim text (preview).

What is claimed is:

1. An apparatus comprising: at least one processor configured to: acquire a query image and a query text relating to a target object; acquire candidate images of the target object, using the query text; using the query image, identify from the candidate images a positive image containing a region demonstrating a similarity to the query image higher than or equal to a first threshold value, and identify a position of the region in the positive image; output training data including the positive image, information representing the position of the region in the positive image, and a correct label based on the query text; and train an object detection model for outputting information representing a position of the target object in an input image and the correct label, using the training data, wherein the candidate images include a first candidate image and a second candidate image, and the at least one processor is further configured to: acquire the first candidate image from a database storing an image group by conducting a search using the query text, and acquire the second candidate image by conducting a search using a query other than the query text; identify, from the candidate images, a negative image containing no region demonstrating a similarity to the query image higher than or equal to a second threshold value; and output negative data including the negative image and a label different from the correct label.

2. The apparatus according to claim 1, wherein the at least one processor is further configured to: sort the candidate images into positive candidate images and negative candidate images in accordance with the similarity to the query image; and identify as the positive image an image containing a region demonstrating a similarity to the query image higher than or equal to the first threshold value from the positive candidate images, identify a position of the region from the positive image, and identify as the negative image an image containing no region demonstrating a similarity to the query image higher than or equal to the second threshold value from the negative candidate images.

3. The apparatus according to claim 1, wherein the at least one processor is configured to acquire the query text based on at least one of input characters, an input image, input sound, or the query image.

4. An apparatus comprising: a first acquisition unit configured to acquire a query image and a query text relating to a target object; a second acquisition unit configured to acquire candidate images of the target object, using the query text; an identification unit configured to, using the query image, identify from the candidate images a positive image containing a region demonstrating a similarity to the query image higher than or equal to a threshold value, and identify a position of the region in the positive image; a training data output unit configured to output training data including the positive image, information representing the position of the region in the positive image, and a correct label based on the query text; and a training unit configured to train an object detection model for outputting information representing a position of the target object in an input image and the correct label, using the training data output from the training data output unit, wherein the candidate images include a first candidate image and a second candidate image, the second acquisition unit is configured to acquire the first candidate image from a database storing an image group by conducting a search using the query text, and acquire the second candidate image by conducting a search using a query other than the query text, the identification unit further identifies from the candidate images a negative image containing no region demonstrating a similarity to the query image higher than or equal to a second threshold value, and the training data output unit further outputs negative data including the negative image and a label different from the correct label.

5. The apparatus according to claim 4, wherein the identification unit comprises: a sort unit configured to sort the candidate images into positive candidate images and negative candidate images in accordance with the similarity to the query image; and a region identification unit configured to identify as the positive image an image containing a region demonstrating a similarity to the query image higher than or equal to the first threshold value from the positive candidate images, identify a position of the region from the positive image, and identify as the negative image an image containing no region demonstrating a similarity to the query image higher than or equal to the second threshold value from the negative candidate images.

6. The apparatus according to claim 4, wherein the first acquisition unit acquires the query text based on at least one of input characters, an input image, input sound, or the query image.

7. A method comprising: acquiring a query image and a query text relating to a target object; acquiring candidate images of the target object, using the query text; identifying, using the query image, from the candidate images a positive image containing a region demonstrating a similarity to the query image higher than or equal to a threshold value, and identifying a position of the region in the positive image; outputting training data including the positive image, information representing the position of the region in the positive image, and a correct label based on the query text; and training an object detection model for outputting information representing a position of the target object in an input image and the correct label, using the training data, wherein the candidate images include a first candidate image and a second candidate image, the acquiring includes acquiring the first candidate image from a database storing an image group by conducting a search using the query text, and acquiring the second candidate image by conducting a search using a query other than the query text, the identifying further includes identifying from the candidate images a negative image containing no region demonstrating a similarity to the query image higher than or equal to a second threshold value, and the outputting includes further outputting negative data including the negative image and a label different from the correct label.

8. The method according to claim 7, wherein the identifying includes: sorting the candidate images into positive candidate images and negative candidate images in accordance with the similarity to the query image; and identifying as the positive image an image containing a region demonstrating a similarity to the query image higher than or equal to the first threshold value from the positive candidate images, identifying a position of the region from the positive image, and identifying as the negative image an image containing no region demonstrating a similarity to the query image higher than or equal to the second threshold value from the negative candidate images.

9. The method according to claim 7, wherein the acquiring includes acquiring the query text based on at least one of input characters, an input image, input sound, or the query image.

10. A non-transitory computer readable medium including computer executable instructions, wherein the instructions, when executed by a processor, cause the processor to perform a method comprising: acquiring a query image and a query text relating to a target object; acquiring candidate images of the target object, using the query text; identifying, using the query image, from the candidate images a
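The sorting step and the two thresholds recited in the claims can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the candidates are assumed to have been gathered already (from a search using the query text and a search using another query), similarities are assumed to be precomputed, and the `sort_threshold` used to split candidates, the dictionary layout, and the `"background"` negative label are all hypothetical choices that the claims do not specify.

```python
from typing import Dict, List, Tuple

# Hypothetical position encoding: (x, y, width, height).
Box = Tuple[int, int, int, int]


def build_training_and_negative_data(
    candidates: List[Dict],
    query_text: str,
    sort_threshold: float,      # assumption: cut-off used to sort candidates
    first_threshold: float,     # a positive image needs a region at or above this
    second_threshold: float,    # a negative image has no region at or above this
    negative_label: str = "background",  # assumption: any label other than the correct one
) -> Tuple[List[Dict], List[Dict]]:
    """Sort candidates, then pick positive and negative images from each group."""
    if negative_label == query_text:
        raise ValueError("negative label must differ from the correct label")

    positives: List[Dict] = []
    negatives: List[Dict] = []
    for candidate in candidates:
        regions: List[Tuple[Box, float]] = candidate["regions"]
        if candidate["image_similarity"] >= sort_threshold:
            # Positive candidate: keep it only if its best region meets the first threshold.
            if regions:
                box, similarity = max(regions, key=lambda r: r[1])
                if similarity >= first_threshold:
                    positives.append({"image_id": candidate["image_id"],
                                      "box": box, "label": query_text})
        else:
            # Negative candidate: keep it only if every region stays below the second threshold.
            if all(similarity < second_threshold for _, similarity in regions):
                negatives.append({"image_id": candidate["image_id"],
                                  "label": negative_label})
    return positives, negatives


# Usage with made-up similarity scores.
candidates = [
    {"image_id": "a", "image_similarity": 0.9,
     "regions": [((1, 2, 3, 4), 0.95), ((0, 0, 2, 2), 0.3)]},
    {"image_id": "b", "image_similarity": 0.8, "regions": [((5, 5, 9, 9), 0.6)]},
    {"image_id": "c", "image_similarity": 0.1, "regions": [((0, 0, 4, 4), 0.05)]},
    {"image_id": "d", "image_similarity": 0.2, "regions": [((0, 0, 4, 4), 0.5)]},
]
positives, negatives = build_training_and_negative_data(
    candidates, "mug", sort_threshold=0.5, first_threshold=0.9, second_threshold=0.4)
```

With these scores, image `a` becomes training data, image `c` becomes negative data, and images `b` and `d` are discarded because they satisfy neither threshold test within their group.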

Assignees

Inventors

Classifications

  • G06F16/532 · Primary

    Query formulation, e.g. graphical querying · CPC title

  • Indexing; Data structures therefor; Storage structures · CPC title

  • Presentation of query results · CPC title

  • Clustering; Classification · CPC title

  • using extracted text · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11741153B2 cover?
According to one embodiment, an apparatus includes a first acquisition unit, a second acquisition unit, an identification unit, and an output unit. The first acquisition unit acquires a query image and a query text relating to a target object. The second acquisition unit acquires candidate images of the target object. The identification unit identifies from the candidate images a positive image…
Who is the assignee on this patent?
Toshiba KK
What technology area does this patent fall under?
Primary CPC classification G06F16/532. Mapped technology areas include Physics.
When was this patent published?
Publication date Aug 29, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).