Object recognition and description using multimodal recurrent neural network

US10970603B2 · US · B2

Patent metadata
Publication number: US-10970603-B2
Application number: US-201816205768-A
Country: US
Kind code: B2
Filing date: Nov 30, 2018
Priority date: Nov 30, 2018
Publication date: Apr 6, 2021
Grant date: Apr 6, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An embodiment of the invention may include a method, computer program product and computer system for image identification and classification. The method, computer program product and computer system may include a computing device which may receive one or more images of a first object from at least two angles and linguistic data associated with the first object. The computing device may input the one or more images of the first object into one or more first neural networks and the linguistic data of the first object into one or more second neural networks. The computing device may combine the output of the one or more first neural networks and the one or more second neural networks and generate an identification model based on the combined output of the one or more first neural networks and the one or more second neural networks.
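The combining step in the abstract can be pictured with a small sketch. The abstract does not say how the two network outputs are combined, so everything here is an assumption made for illustration: the per-angle image vectors and the text vector stand in for neural network outputs, the views are mean-pooled, and the fusion is an additive projection into a shared space followed by `tanh`. The names `mean_pool`, `linear`, `fuse`, and all dimensions are hypothetical.

```python
import math
import random


def linear(weights, vec):
    """Multiply a (rows x cols) weight matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def mean_pool(vectors):
    """Average several equal-length feature vectors (one per camera angle)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def fuse(image_views, text_features, w_img, w_txt):
    """Project both modalities into a shared space and add them.

    Additive fusion is an assumption for this sketch, not the patent's
    stated method.
    """
    pooled = mean_pool(image_views)
    img_proj = linear(w_img, pooled)
    txt_proj = linear(w_txt, text_features)
    return [math.tanh(a + b) for a, b in zip(img_proj, txt_proj)]


def random_matrix(rows, cols, rng):
    """Build a small random weight matrix (untrained, for illustration only)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
IMG_DIM, TXT_DIM, FUSED_DIM = 4, 3, 5  # hypothetical sizes
w_img = random_matrix(FUSED_DIM, IMG_DIM, rng)
w_txt = random_matrix(FUSED_DIM, TXT_DIM, rng)

# Stand-ins for the outputs of the first (image) and second (language) networks.
front_view = [0.9, 0.1, 0.0, 0.3]
side_view = [0.7, 0.2, 0.1, 0.5]
text_state = [0.2, 0.8, 0.4]

fused = fuse([front_view, side_view], text_state, w_img, w_txt)
print(len(fused))
```

In a trained system the weight matrices would be learned jointly with both networks; here they are random, so only the shapes and the data flow are meaningful.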

First claim

Opening claim text (preview).

What is claimed is:

1. A method for image identification and classification, the method comprising: receiving, by a computer device, one or more images of a first object from at least two angles; receiving, by the computing device, linguistic data associated with the first object, wherein the linguistic data of the first object describes the artist, art medium, age, color, symbol, pattern, function, and motif of the first object; inputting, by the computing device, the one or more images of the first object into one or more first neural networks; inputting, by the computing device, the linguistic data of the first object into one or more second neural networks; combining, by the computing device, an output of the one or more first neural networks and the one or more second neural networks; generating, by the computing device, an identification model based on the combined output of the one or more first neural networks and the one or more second neural networks, wherein the identification model generates a linguistic description for an unknown object; receiving, by the computer device, at least one image of an unknown second object, wherein the second object is the unknown object (multiple images from different angles); inputting, by the computer device, the at least one image of the unknown second object into the identification model to generate a linguistic description of the unknown second object; analyzing, by the computer device, the at least one image of the unknown second object to identify different features of the unknown second object; generating, by the computer device, a novel linguistic description identifying the unknown second object based on the identified different feature of the unknown second object, wherein the linguistic description includes a novel description of the unknown second object describing the unknown second object and the identified features of the unknown second object, wherein the generated linguistic description is based on a probability distribution of generating a word given previous linguistic data on the second neural networks and the one or more images on the first neural networks; and displaying, by the computer device, the novel linguistic description identifying the unknown second object to a user.

2. A method as in claim 1, wherein the first object is a known piece of art.

3. A method as in claim 1, wherein the one or more first neural networks are deep convolutional neural networks.

4. A method as in claim 1, wherein the one or more second neural networks are deep recurrent neural networks.

5. A method as in claim 1, wherein the first object and the second object may comprise one of the group consisting of: a painting, a mural, graffiti, a drawing, a photograph, a tapestry, a stained glass, a glasswork piece, a metalwork piece, a sculpture, a pottery piece, a porcelain piece, a ceramic piece, jewelry, clothing, furniture, and architecture.

6. A computer program product for image identification and classification, the computer program product comprising: a computer-readable storage medium having program instructions embodied therewith, wherein the computer readable storage medium is not a transitory signal per se, the program instructions comprising: program instructions to receive, by a computer device, one or more images of a first object from at least two angles; program instructions to receive, by the computing device, linguistic data associated with the first object, wherein the linguistic data of the first object describes the artist, art medium, age, color, symbol, pattern, function, and motif of the first object: program instructions to input, by the computing device, the one or more images of the first object into one or more first neural networks; program instructions to input, by the computing device, the linguistic data of the first object into one or more second neural networks; program instructions to combine, by the computing device, an output of the one or more first neural networks and the one or more second neural networks; program instructions to generate, by the computing device, an identification model based on the combined output of the one or more first neural networks and the one or more second neural networks, wherein the identification model generates a linguistic description for an unknown object; program instructions to receive, by the computer device, at least one image of an unknown second object, wherein the second object is the unknown object (multiple images from different angles); program instructions to input, by the computer device, the at least one image of the unknown second object into the identification model to generate a linguistic description of the unknown second object; program instructions to analyze, by the computer device, the at least one image of the unknown second object to identify different features of the unknown second object; program instructions to generate, by the computer device, a novel linguistic description identifying the unknown second object based on the identified different feature of the unknown second object, wherein the linguistic description includes a novel description of the unknown second object describing the unknown second object and the identified features of the unknown second object, wherein the generated linguistic description is based on a probability distribution of generating a word given previous linguistic data on the second neural networks and the one or more images on the first neural networks; and program instructions to display by the computer device the novel linguistic description identifying the unknown second object to a user.

7. A computer program product as in claim 6, wherein the first object is a known piece of art.

8. A computer program product as in claim 6, wherein the one or more first neural networks are deep convolutional neural networks.

9. A computer program product as in claim 6, wherein the one or more second neural networks are deep recurrent neural networks.

10. A computer system for image identification and classification, the system comprising: one or more computer processors, one or more computer-readable storage media, and program instructions stored on one or more of the computer-readable storage media for execution by at least one of the one or more processors, the program instructions comprising: program instructions to receive, by a computer device, one or more images of a first object from at least two angles; program instructions to receive, by the computing device, linguistic data associated with the first object; program instructions to input, by the computing device, the one or more images of the first object into one or more first neural networks; program instructions to input, by the computing device, the linguistic data of the first object into one or more second neural networks, wherein the linguistic data of the first object describes the artist, art medium, age, color, symbol, pattern, function, and motif of the first object; program instructions to combine, by the computing device, an output of the one or more first neural networks and the one or more second neural networks; program instructions to generate, by the computing device, an identification model based on the combined output of the one or more first neural networks and the one or more second neural networks, wherein the identification model generates a linguistic description for an unknown object; program instructions to receive, by the computer device, at least one image of an unknown second object wherein the second object is the unknown object (multiple images from different angles); program instructions to input, by the computer device, the at leas
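The claims state that the generated description is based on a probability distribution of generating a word given the previous linguistic data and the images. A minimal sketch of that idea follows, assuming a softmax over a tiny vocabulary and greedy word selection. The vocabulary, the scoring rule in `toy_scores`, and the `evidence` values are all hypothetical stand-ins; in the claimed system the scores would come from the trained networks rather than a hand-written rule.

```python
import math

# Hypothetical five-word vocabulary; "<end>" terminates the description.
VOCAB = ["<end>", "bronze", "sculpture", "painting", "blue"]


def softmax(scores):
    """Turn raw scores into a probability distribution over the vocabulary."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def toy_scores(previous_words, image_evidence):
    """Score each candidate word from the words so far and the image evidence.

    Hand-written stand-in for the combined network output (an assumption).
    """
    scores = []
    for word in VOCAB:
        if word == "<end>":
            # Ending becomes more likely as the description grows.
            scores.append(1.0 * len(previous_words))
        elif word in previous_words:
            # Discourage repeating a word already emitted.
            scores.append(-5.0)
        else:
            scores.append(image_evidence.get(word, 0.0))
    return scores


def describe(image_evidence, max_words=10):
    """Greedily pick the most probable next word until "<end>" wins."""
    words = []
    for _ in range(max_words):
        probs = softmax(toy_scores(words, image_evidence))
        best = max(range(len(VOCAB)), key=lambda i: probs[i])
        if VOCAB[best] == "<end>":
            break
        words.append(VOCAB[best])
    return words


# Hypothetical per-word evidence extracted from the images of the unknown object.
evidence = {"bronze": 2.5, "sculpture": 2.0, "painting": -1.0, "blue": 0.5}
print(describe(evidence))  # prints ['bronze', 'sculpture']
```

Greedy selection is the simplest decoding choice; sampling from the distribution or beam search are common alternatives, and the claims do not specify which is used.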

Assignees

Inventors

Classifications

  • G06N20/20 · Primary

    Ensemble learning · CPC title

  • Terrestrial scenes (scenes under surveillance with static cameras G06V20/52; scenes perceived from the exterior of a vehicle G06V20/56; scenes perceived from the interior of a vehicle G06V20/59) · CPC title

  • using neural networks · CPC title

  • using classification, e.g. of video objects · CPC title

  • Fusion techniques · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10970603B2 cover?
An embodiment of the invention may include a method, computer program product and computer system for image identification and classification. The method, computer program product and computer system may include a computing device which may receive one or more images of a first object from at least two angles and linguistic data associated with the first object. The computing device may input the o…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06N20/20. Mapped technology areas include Physics.
When was this patent published?
Publication date Apr 6, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).