Audio tagging

US9304657B2 · US · B2

Patent metadata
Field: Value
Publication number: US-9304657-B2
Application number: US-201414311851-A
Country: US
Kind code: B2
Filing date: Jun 23, 2014
Priority date: Dec 31, 2013
Publication date: Apr 5, 2016
Grant date: Apr 5, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Various embodiments are provided for enabling audio tagging of image files. The audio messages are obtained by the system, usually by recording an audio message from a user, and then converted into a textual tag, using speech recognition technology. In some implementations semantic analysis of text component of these messages is performed. In some implementations the textual tags are then propagated to other image files associated with the user.
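The propagation step mentioned in the abstract can be illustrated with a minimal sketch. The patent does not specify a comparison rule; the example below assumes, purely for illustration, that tags are copied to images in the same folder whose creation time is within ten minutes of the tagged image (file location and file creation date are two of the properties listed in claim 5). The `Image` class, the `propagate_tags` function, and the `max_gap` threshold are hypothetical names, not part of the patent.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from pathlib import PurePosixPath


@dataclass
class Image:
    path: str                      # hypothetical file location
    created: datetime              # file creation date
    tags: set = field(default_factory=set)


def propagate_tags(source, candidates, max_gap=timedelta(minutes=10)):
    """Copy tags from `source` to candidates judged similar to it.

    Similarity here is an assumed rule: same parent folder and a
    creation time within `max_gap` of the source image.
    """
    updated = []
    source_folder = PurePosixPath(source.path).parent
    for img in candidates:
        same_folder = PurePosixPath(img.path).parent == source_folder
        close_in_time = abs(img.created - source.created) <= max_gap
        if same_folder and close_in_time:
            img.tags |= source.tags
            updated.append(img.path)
    return updated


tagged = Image("trip/IMG_001.jpg", datetime(2014, 6, 23, 10, 0), {"anna", "beach"})
others = [
    Image("trip/IMG_002.jpg", datetime(2014, 6, 23, 10, 4)),   # same folder, 4 min later
    Image("trip/IMG_050.jpg", datetime(2014, 6, 23, 15, 0)),   # same folder, 5 h later
    Image("work/scan.jpg", datetime(2014, 6, 23, 10, 1)),      # different folder
]
updated = propagate_tags(tagged, others)
print(updated)
```

Only the first candidate receives the tags in this example. A real system could weigh other properties from claim 5, such as geographical location or image analysis results, instead of a fixed folder-and-time rule.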

First claim

Opening claim text (preview).

What is claimed is:

1. A method implemented by a data processing apparatus, the method comprising: obtaining, at one or more processors, an audio message associated with one or more image files, wherein the obtaining comprises: detecting that a first image file is being displayed on a device associated with a user, determining a first period of time when the first image file is displayed on the device associated with the user, and time stamping the obtained audio message; processing, at the one or more processors, the audio message using speech recognition technology to detect a text component of the audio message; determining, at the one or more processors, one or more textual tags for the one or more image files based on the detected text component, wherein the determining comprises determining a first portion of the detected text component corresponding to the first period of time using the time stamps of the obtained audio message and identifying a first set of the one or more textual tags that were determined based on the first portion of the detected text component; and assigning, at the one or more processors, the one or more textual tags to the one or more image files, wherein the assigning comprises assigning one or more of the textual tags from the first set of the one or more textual tags to the first image file.

2. The method of claim 1, wherein the determining of the one or more textual tags comprises performing semantic analysis of the text component.

3. The method of claim 2, wherein the semantic analysis of the text component comprises identifying one or more semantic classes for one or more portions of the detected text component; performing semantic clustering of the portions of the detected text components; and wherein the determining one or more textual tags for the one or more image files is at least partially based on the semantic clustering of the portions of the detected text.

4. The method of claim 1, wherein the one or more image files are from a plurality of image files associated with a user, the method further comprising: assigning the one or more textual tags to a second image file from the plurality of image files associated with the user based on a comparison of one or more properties of the one or more image files and the second image file.

5. The method of claim 4, wherein the one or more properties of the one or more image files and the second image file are selected from the following group: file name, file location, file metadata, file creation date, file size, geographical location of a place where the image was captured, and file image analysis results.

6. The method of claim 1, wherein the one or more image files are digital photographs.

7. The method of claim 1, wherein the one or more image files are digital video files.

8. The method of claim 1, wherein the determining the one or more textual tags comprises selecting the one or more textual tags from a tag library.

9. The method of claim 1, wherein the assigning the one or more textual tags to the one or more image files comprises assigning the one or more textual tags to a portion of an image or group of images in the one or more image files.

10. A system comprising: a machine-readable storage device having instructions stored thereon; and a data processing apparatus in communication with the machine-readable storage device and operable to execute the instructions to perform operations comprising: obtaining an audio message associated with one or more image files, wherein the obtaining comprises detecting that a first image file is being displayed on a device associated with a user, determining a first period of time when the first image file is displayed on the device associated with the user, and time stamping the obtained audio message; processing the audio message using speech recognition technology to detect a text component of the audio message; determining one or more textual tags for the one or more image files based on the detected text component, wherein the determining comprises determining a first portion of the detected text component corresponding to the first period of time using the time stamps of the obtained audio message and identifying a first set of the one or more textual tags that were determined based on the first portion of the detected text component; and assigning the one or more textual tags to the one or more image files, wherein the assigning comprises assigning one or more of the textual tags from the first set of the one or more textual tags to the first image file.

11. The system of claim 10, wherein the determining of the one or more textual tags comprises performing semantic analysis of the text component.

12. The system of claim 11, wherein the semantic analysis of the text component comprises identifying one or more semantic classes for one or more portions of the detected text component; performing semantic clustering of the portions of the detected text components; and wherein the determining one or more textual tags for the one or more image files is at least partially based on the semantic clustering of the portions of the detected text.

13. The system of claim 10, wherein the one or more image files are from a plurality of image files associated with a user, the method further comprising: assigning the one or more textual tags to a second image file from the plurality of image files associated with the user based on a comparison of one or more properties of the one or more image files and the second image file.

14. The system of claim 13, wherein the one or more properties of the one or more image files and the second image file are selected from the following group: file name, file location, file metadata, file creation date, file size, geographical location of a place where the image was captured, and file image analysis results.

15. The system of claim 10, wherein the one or more image files are digital photographs.

16. The system of claim 10, wherein the one or more image files are digital video files.

17. The system of claim 10, wherein the determining the one or more textual tags comprises selecting the one or more textual tags from a tag library.

18. The system of claim 10, wherein the assigning the one or more textual tags to the one or more image files comprises assigning the one or more textual tags to a portion of an image or group of images in the one or more image files.

19. A storage device having instructions stored thereon that, when executed by a data processing apparatus, cause the data processing apparatus to perform operations comprising: obtaining an audio message associated with one or more image files, wherein the obtaining comprises detecting that a first image file is being displayed on a device associated with a user, determining a first period of time when the first image file is displayed on the device associated with the user, and time stamping the obtained audio message; processing the audio message using speech recognition technology to detect a text component of the audio message; determining one or more textual tags for the one or more image files based on the detected text component, wherein the determining comprises determining a first portion of the detected text component corresponding to the first period of time using the time stamps of the obtained audio message and identifying a first set of the one or more textual tags that were determined based on the first portion of the detected text component; and assigning the one or m
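The time-alignment step shared by the independent claims (matching time-stamped portions of the recognized text to the period during which each image was displayed) can be sketched as follows. This is an illustrative reading under stated assumptions, not the patented implementation: the `Word` and `DisplayPeriod` structures, the per-word timestamps, and the stop-word filter standing in for tag determination are all hypothetical, and a real speech recognizer would supply the transcript.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float   # seconds from the start of the recording (assumed format)


@dataclass
class DisplayPeriod:
    image: str     # image file shown on the device
    start: float   # when it appeared, in the same clock as the audio
    end: float     # when it was replaced or closed


# Assumed stand-in for real tag determination (e.g. semantic analysis).
STOPWORDS = {"a", "the", "is", "this", "at", "and", "of", "in", "on", "our"}


def assign_tags(words, periods):
    """Assign each recognized word to the image displayed when it was spoken."""
    tags = {p.image: [] for p in periods}
    for w in words:
        for p in periods:
            if p.start <= w.start < p.end:
                token = w.text.lower()
                if token not in STOPWORDS and token not in tags[p.image]:
                    tags[p.image].append(token)
                break  # a word belongs to at most one display period
    return tags


# Hypothetical recognizer output with per-word timestamps.
words = [
    Word("this", 0.2), Word("is", 0.5), Word("Anna", 0.9),
    Word("at", 1.3), Word("the", 1.5), Word("beach", 1.8),
    Word("our", 4.1), Word("dog", 4.5),
]
periods = [
    DisplayPeriod("IMG_001.jpg", 0.0, 3.0),
    DisplayPeriod("IMG_002.jpg", 3.0, 6.0),
]
result = assign_tags(words, periods)
print(result)
```

Here the words spoken while `IMG_001.jpg` was on screen become its tags, and the words spoken after the switch go to `IMG_002.jpg`. Words that fall outside every display period are simply dropped in this sketch.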

Assignees

Inventors

Classifications

  • Formatting, i.e. changing of presentation of documents (automatic justification G06F40/189; automatic line break hyphenation G06F40/191)

  • using information manually generated, e.g. tags, keywords, comments, manually generated location and time information

  • Details of user interfaces specifically adapted to file systems, e.g. browsing and visualisation, 2d or 3d GUIs (query results presentation G06F16/156)

  • Trees, e.g. B+trees

  • Computer-aided management of electronic mailing [e-mailing]

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9304657B2 cover?
Various embodiments are provided for enabling audio tagging of image files. The audio messages are obtained by the system, usually by recording an audio message from a user, and then converted into a textual tag, using speech recognition technology. In some implementations semantic analysis of text component of these messages is performed. In some implementations the textual tags are then propa…
Who is the assignee on this patent?
Yan David, Anisimovich Konstantin, Abbyy Dev Llc
What technology area does this patent fall under?
Primary CPC classification G06F3/04817. Mapped technology areas include Physics.
When was this patent published?
Publication date Apr 5, 2016 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).