Intelligent media transcription

US12033619B2 · US · B2

Patent metadata
Publication number: US-12033619-B2
Application number: US-202017095797-A
Country: US
Kind code: B2
Filing date: Nov 12, 2020
Priority date: Nov 12, 2020
Publication date: Jul 9, 2024
Grant date: Jul 9, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The exemplary embodiments disclose a method, a computer program product, and a computer system for transcribing media. The exemplary embodiments may include collecting media, extracting one or more features from the media, and transcribing the media based on the extracted one or more features and one or more models.
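
The three stages named in the abstract (collect, extract, transcribe) can be pictured as a small pipeline. The following is only a minimal sketch under stated assumptions: `Media`, `extract_features`, `rule_model`, and `transcribe` are hypothetical names, the feature names are invented for illustration, and the simple word counts stand in for the machine learning feature extraction that the claims describe. It is not the patented implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Features = Dict[str, float]


@dataclass
class Media:
    """Hypothetical container: a list of spoken words stands in for real audio or video."""
    speaker: str
    words: List[str]


def extract_features(media: Media) -> Features:
    """Toy feature extractor (assumption: simple rates replace learned speech features)."""
    n = len(media.words) or 1  # avoid division by zero on empty media
    questions = sum(w.endswith("?") for w in media.words)
    emphasized = sum(w.isupper() for w in media.words)
    return {"question_rate": questions / n, "emphasis_rate": emphasized / n}


def rule_model(features: Features) -> str:
    """Hypothetical stand-in for a trained model: picks a transcription style."""
    return "outline" if features["emphasis_rate"] > 0.2 else "transcription"


def transcribe(media: Media, features: Features,
               model: Callable[[Features], str]) -> str:
    """Produce text tagged with the style the model chose from the features."""
    style = model(features)
    return f"[{style}] " + " ".join(media.words)


# Usage: collect -> extract -> transcribe
media = Media(speaker="lecturer", words=["KEY", "point", "one"])
features = extract_features(media)
print(transcribe(media, features, rule_model))
```

Passing the model in as a callable keeps the transcription step independent of how the style decision is made, which mirrors the abstract's separation of "features" from "models".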

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for transcribing media, the method comprising:
    collecting media of a user, wherein the media comprises content of a presentation given by the user;
    extracting one or more features from the media, wherein the extracting is performed using machine learning techniques comprising a convolutional neural network and long short-term memory to parse the collected media and extract the one or more features, and wherein the one or more extracted features comprise one or more speech features;
    determining, using one or more models, a transcription style based on the one or more extracted speech features, wherein the one or more models are trained through use of a feedback loop to weight the one or more extracted speech features such that features having a greater correlation with determined particular transcription styles are weighted greater than other features, and wherein the transcription style specifies a transcription format;
    transcribing, using the one or more models, the media, according to the determined transcription style, based on the one or more features, and their associated weights, wherein one or more text portions of the transcription are highlighted and bolded based on respective importance values extracted from the one or more features, and wherein an importance value of a text portion of the transcription indicates whether or not a topic of the text portion will be on an exam;
    notifying the user of the highlighted and bolded transcription in the determined particular style via a device of the user, wherein the notifying is performed according to preferences of the user; and
    receiving, from the user, confirmation of an accuracy of the transcription and approval of the transcription prior to notifying one or more other users of the transcription.

2. The method of claim 1, wherein the one or more models correlate the one or more features with an appropriate transcription style and appropriately transcribing the media.

3. The method of claim 1, further comprising receiving feedback indicative of whether the transcription was accurate; and adjusting the one or more models based on the received feedback.

4. The method of claim 1, further comprising: collecting training data; extracting training features from the training data; and training the one or more models based on the extracted training features.

5. The method of claim 1, wherein the transcription style is selected from a group comprising a transcription, outline, summary, presentation with notes, blog with comments, and tutorial with examples.

6. The method of claim 1, wherein: the user is notified of the transcription along with audio or video of the media; and the transcription notification is synchronized with the audio or video of the media, wherein the synchronization is based on the media's content.

7. The method of claim 1, wherein the transcription includes one or more timestamps.

8. The method of claim 1, wherein the transcription is searchable by the user.

9. The method of claim 1, wherein the one or more features include topics, importance, frequency, vocabulary, tones, moods, pointing, waving, facial expressions, eye direction, and eye movement.

10. A computer program product for transcribing media, the computer program product comprising: one or more non-transitory computer-readable storage media and program instructions stored on the one or more non-transitory computer-readable storage media capable of performing a method, the method comprising:
    collecting media of a user, wherein the media comprises content of a presentation given by the user;
    extracting one or more features from the media, wherein the extracting is performed using machine learning techniques comprising a convolutional neural network and long short-term memory to parse the collected media and extract the one or more features, and wherein the one or more extracted features comprise one or more speech features;
    determining, using one or more models, a transcription style based on the one or more extracted speech features, wherein the one or more models are trained through use of a feedback loop to weight the one or more extracted speech features such that features having a greater correlation with determined particular transcription styles are weighted greater than other features, and wherein the transcription style specifies a transcription format;
    transcribing, using the one or more models, the media, according to the determined transcription style, based on the one or more features and their associated weights, wherein one or more text portions of the transcription are highlighted and bolded based on respective importance values extracted from the one or more features, and wherein an importance value of a text portion of the transcription indicates whether or not a topic of the text portion will be on an exam;
    notifying the user of the highlighted and bolded transcription in the determined particular style via a device of the user, wherein the notifying is performed according to preferences of the user; and
    receiving, from the user, confirmation of an accuracy of the transcription and approval of the transcription prior to notifying one or more other users of the transcription.

11. The computer program product of claim 10, wherein the one or more models correlate the one or more features with an appropriate transcription style and appropriately transcribing the media.

12. A computer system for transcribing media, the computer system comprising: one or more computer processors, one or more computer-readable storage media, and program instructions stored on the one or more of the computer-readable storage media for execution by at least one of the one or more processors capable of performing a method, the method comprising:
    collecting media of a user, wherein the media comprises content of a presentation given by the user;
    extracting one or more features from the media, wherein the extracting is performed using machine learning techniques comprising a convolutional neural network and long short-term memory to parse the collected media and extract the one or more features, and wherein the one or more extracted features comprise one or more speech features;
    determining, using one or more models, a transcription style based on the one or more extracted speech features, wherein the one or more models are trained through use of a feedback loop to weight the one or more extracted speech features such that features having a greater correlation with determined particular transcription styles are weighted greater than other features, and wherein the transcription style specifies a transcription format;
    transcribing, using the one or more models, the media, according to the determined transcription style, based on the one or more features and their associated weights, wherein one or more text portions of the transcription are highlighted and bolded based on respective importance values extracted from the one or more features, and wherein an importance value of a text portion of the transcription indicates whether or not a topic of the text portion will be on an exam;
    notifying the user of the highlighted and bolded transcription in the determined particular style via a device of the user, wherein the notifying is performed according to preferences of the user; and
    receiving, from the user, confirmation of an accuracy of the transcription and approval of the transcription prior to notifying one or more other users of the transcription.

13. The computer system of claim 12, wherein the one or more models correlate the one or more features w
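
The independent claims describe models trained through a feedback loop so that speech features more strongly correlated with a transcription style receive greater weight, and text portions marked up according to importance values. The sketch below illustrates one way such weighting could behave. It is an assumption-heavy illustration: the claims do not name a learning rule, so a multiclass perceptron update is swapped in here, and `StyleModel`, `highlight`, the feature names, the learning rate, the threshold, and the `**` markers (a plain-text stand-in for highlighting and bolding) are all hypothetical.

```python
from typing import Dict, List, Tuple

# Subset of the styles listed in claim 5.
STYLES = ["transcription", "outline", "summary"]


class StyleModel:
    """Per-style feature weights adjusted by user feedback (perceptron-style stand-in)."""

    def __init__(self, feature_names: List[str], lr: float = 0.5) -> None:
        self.lr = lr
        self.weights: Dict[str, Dict[str, float]] = {
            s: {f: 0.0 for f in feature_names} for s in STYLES
        }

    def score(self, style: str, x: Dict[str, float]) -> float:
        # Keys of x must be among the feature names given at construction.
        return sum(self.weights[style][f] * v for f, v in x.items())

    def predict(self, x: Dict[str, float]) -> str:
        # max() returns the first maximal item, so ties go to the earliest style.
        return max(STYLES, key=lambda s: self.score(s, x))

    def feedback(self, x: Dict[str, float], correct_style: str) -> None:
        """Feedback loop: on a wrong guess, shift weight toward the correct style."""
        predicted = self.predict(x)
        if predicted == correct_style:
            return
        for f, v in x.items():
            self.weights[correct_style][f] += self.lr * v
            self.weights[predicted][f] -= self.lr * v


def highlight(portions: List[Tuple[str, float]], threshold: float = 0.5) -> str:
    """Wrap portions whose importance value meets the threshold in ** markers."""
    return " ".join(f"**{t}**" if imp >= threshold else t for t, imp in portions)


# Usage: one round of feedback teaches the model that list cues suggest an outline.
model = StyleModel(["pause_rate", "list_cues"])
lecture = {"pause_rate": 0.1, "list_cues": 0.9}
model.feedback(lecture, correct_style="outline")
print(model.predict(lecture))
print(highlight([("Intro", 0.1), ("Key theorem", 0.9)]))
```

After the single feedback step, the feature with the larger value for the corrected style ends up with the larger weight, which is the qualitative behavior the claim language describes. A real system would presumably train on many examples and use learned rather than hand-set importance values.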

Assignees

IBM

Inventors

Classifications

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Supervised learning · CPC title

  • Recurrent networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title

  • Auto-encoder networks; Encoder-decoder networks · CPC title

  • Training · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12033619B2 cover?
The exemplary embodiments disclose a method, a computer program product, and a computer system for transcribing media. The exemplary embodiments may include collecting media, extracting one or more features from the media, and transcribing the media based on the extracted one or more features and one or more models.
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06N20/00. Mapped technology areas include Physics.
When was this patent published?
Publication date Jul 9, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).