Intelligent media transcription

US12033619B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12033619-B2
Application number	US-202017095797-A
Country	US
Kind code	B2
Filing date	Nov 12, 2020
Priority date	Nov 12, 2020
Publication date	Jul 9, 2024
Grant date	Jul 9, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The exemplary embodiments disclose a method, a computer program product, and a computer system for transcribing media. The exemplary embodiments may include collecting media, extracting one or more features from the media, and transcribing the media based on the extracted one or more features and one or more models.
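
The three steps named in the abstract (collecting media, extracting features, transcribing with a model) can be pictured as a toy pipeline. The sketch below is illustrative only and is not the patented implementation: it uses plain text in place of real audio or video, and every name in it (`Media`, `collect_media`, `extract_features`, `toy_model`, `transcribe`) is hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Media:
    """Hypothetical container; text segments stand in for audio/video."""
    segments: list[str]


def collect_media(raw: str) -> Media:
    # Stand-in for capture: treat each sentence as one media segment.
    return Media([s.strip() for s in raw.split(".") if s.strip()])


def extract_features(media: Media) -> dict[str, float]:
    # Assumed toy features; the abstract does not say which features are used.
    words = [w.lower() for seg in media.segments for w in seg.split()]
    return {
        "segment_count": float(len(media.segments)),
        "avg_segment_words": len(words) / max(len(media.segments), 1),
        "vocabulary_size": float(len(set(words))),
    }


def toy_model(features: dict[str, float]) -> str:
    # Assumed rule standing in for a trained model: many segments -> outline.
    return "outline" if features["segment_count"] >= 3 else "transcription"


def transcribe(
    media: Media,
    features: dict[str, float],
    model: Callable[[dict[str, float]], str],
) -> str:
    # The model picks a style from the features; the style shapes the output.
    style = model(features)
    if style == "outline":
        return "\n".join(f"- {seg}" for seg in media.segments)
    return ". ".join(media.segments) + "."


if __name__ == "__main__":
    lecture = collect_media("Intro to graphs. Nodes and edges. Shortest paths.")
    print(transcribe(lecture, extract_features(lecture), toy_model))
```

Keeping the model as a plain callable is a design choice for the sketch: it lets the rule-based stand-in be swapped for a trained model without changing the rest of the pipeline.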

First claim

Opening claim text (preview).

What is claimed is:
1. A computer-implemented method for transcribing media, the method comprising: collecting media of a user, wherein the media comprises content of a presentation given by the user; extracting one or more features from the media, wherein the extracting is performed using machine learning techniques comprising a convolutional neural network and long short-term memory to parse the collected media and extract the one or more features, and wherein the one or more extracted features comprise one or more speech features; determining, using one or more models, a transcription style based on the one or more extracted speech features, wherein the one or more models are trained through use of a feedback loop to weight the one or more extracted speech features such that features having a greater correlation with determined particular transcription styles are weighted greater than other features, and wherein the transcription style specifies a transcription format; transcribing, using the one or more models, the media, according to the determined transcription style, based on the one or more features, and their associated weights, wherein one or more text portions of the transcription are highlighted and bolded based on respective importance values extracted from the one or more features, and wherein an importance value of a text portion of the transcription indicates whether or not a topic of the text portion will be on an exam; notifying the user of the highlighted and bolded transcription in the determined particular style via a device of the user, wherein the notifying is performed according to preferences of the user; and receiving, from the user, confirmation of an accuracy of the transcription and approval of the transcription prior to notifying one or more other users of the transcription.
2. The method of claim 1, wherein the one or more models correlate the one or more features with an appropriate transcription style and appropriately transcribing the media.
3. The method of claim 1, further comprising receiving feedback indicative of whether the transcription was accurate; and adjusting the one or more models based on the received feedback.
4. The method of claim 1, further comprising: collecting training data; extracting training features from the training data; and training the one or more models based on the extracted training features.
5. The method of claim 1, wherein the transcription style is selected from a group comprising a transcription, outline, summary, presentation with notes, blog with comments, and tutorial with examples.
6. The method of claim 1, wherein: the user is notified of the transcription along with audio or video of the media; and the transcription notification is synchronized with the audio or video of the media, wherein the synchronization is based on the media's content.
7. The method of claim 1, wherein the transcription includes one or more timestamps.
8. The method of claim 1, wherein the transcription is searchable by the user.
9. The method of claim 1, wherein the one or more features include topics, importance, frequency, vocabulary, tones, moods, pointing, waving, facial expressions, eye direction, and eye movement.
10. A computer program product for transcribing media, the computer program product comprising: one or more non-transitory computer-readable storage media and program instructions stored on the one or more non-transitory computer-readable storage media capable of performing a method, the method comprising: collecting media of a user, wherein the media comprises content of a presentation given by the user; extracting one or more features from the media, wherein the extracting is performed using machine learning techniques comprising a convolutional neural network and long short-term memory to parse the collected media and extract the one or more features, and wherein the one or more extracted features comprise one or more speech features; determining, using one or more models, a transcription style based on the one or more extracted speech features, wherein the one or more models are trained through use of a feedback loop to weight the one or more extracted speech features such that features having a greater correlation with determined particular transcription styles are weighted greater than other features, and wherein the transcription style specifies a transcription format; transcribing, using the one or more models, the media, according to the determined transcription style, based on the one or more features and their associated weights, wherein one or more text portions of the transcription are highlighted and bolded based on respective importance values extracted from the one or more features, and wherein an importance value of a text portion of the transcription indicates whether or not a topic of the text portion will be on an exam; notifying the user of the highlighted and bolded transcription in the determined particular style via a device of the user, wherein the notifying is performed according to preferences of the user; and receiving, from the user, confirmation of an accuracy of the transcription and approval of the transcription prior to notifying one or more other users of the transcription.
11. The computer program product of claim 10, wherein the one or more models correlate the one or more features with an appropriate transcription style and appropriately transcribing the media.
12. A computer system for transcribing media, the computer system comprising: one or more computer processors, one or more computer-readable storage media, and program instructions stored on the one or more of the computer-readable storage media for execution by at least one of the one or more processors capable of performing a method, the method comprising: collecting media of a user, wherein the media comprises content of a presentation given by the user; extracting one or more features from the media, wherein the extracting is performed using machine learning techniques comprising a convolutional neural network and long short-term memory to parse the collected media and extract the one or more features, and wherein the one or more extracted features comprise one or more speech features; determining, using one or more models, a transcription style based on the one or more extracted speech features, wherein the one or more models are trained through use of a feedback loop to weight the one or more extracted speech features such that features having a greater correlation with determined particular transcription styles are weighted greater than other features, and wherein the transcription style specifies a transcription format; transcribing, using the one or more models, the media, according to the determined transcription style, based on the one or more features and their associated weights, wherein one or more text portions of the transcription are highlighted and bolded based on respective importance values extracted from the one or more features, and wherein an importance value of a text portion of the transcription indicates whether or not a topic of the text portion will be on an exam; notifying the user of the highlighted and bolded transcription in the determined particular style via a device of the user, wherein the notifying is performed according to preferences of the user; and receiving, from the user, confirmation of an accuracy of the transcription and approval of the transcription prior to notifying one or more other users of the transcription.
13. The computer system of claim 12, wherein the one or more models correlate the one or more features w
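
Claim 1 describes two mechanisms that are easier to follow with a small example: a feedback loop that gives more weight to the speech features that best predict a transcription style, and marking up text portions by importance value. The sketch below is a rough illustration under stated assumptions, not the claimed method. It swaps in a multiclass perceptron update for the unspecified training procedure, and the feature names (`pause_rate`, `speech_rate`), the `StyleModel` class, the `highlight` function, the 0.5 threshold, and the `<mark><b>` markup are all hypothetical.

```python
from collections import defaultdict

# Two of the styles listed in claim 5, chosen arbitrarily for the sketch.
STYLES = ("outline", "summary")


class StyleModel:
    """Toy style picker trained by feedback (perceptron update, an assumption)."""

    def __init__(self, learning_rate: float = 0.5) -> None:
        self.lr = learning_rate
        # One weight per (style, feature name); unseen features start at 0.0.
        self.weights = {style: defaultdict(float) for style in STYLES}

    def score(self, style: str, features: dict) -> float:
        return sum(self.weights[style][name] * value
                   for name, value in features.items())

    def predict(self, features: dict) -> str:
        # Ties resolve to the first style in STYLES.
        return max(STYLES, key=lambda style: self.score(style, features))

    def feedback(self, features: dict, correct_style: str) -> None:
        # Feedback loop: on a wrong guess, strengthen the features' weights
        # for the correct style and weaken them for the style guessed.
        predicted = self.predict(features)
        if predicted == correct_style:
            return
        for name, value in features.items():
            self.weights[correct_style][name] += self.lr * value
            self.weights[predicted][name] -= self.lr * value


def highlight(portions: list, threshold: float = 0.5) -> str:
    """Wrap text portions whose importance meets the threshold in markup."""
    rendered = [
        f"<mark><b>{text}</b></mark>" if importance >= threshold else text
        for text, importance in portions
    ]
    return " ".join(rendered)


if __name__ == "__main__":
    model = StyleModel()
    slow_talk = {"pause_rate": 1.0, "speech_rate": 0.2}
    fast_talk = {"pause_rate": 0.1, "speech_rate": 1.0}
    model.feedback(slow_talk, "summary")
    model.feedback(fast_talk, "outline")
    print(model.predict(slow_talk), model.predict(fast_talk))
    print(highlight([("Dijkstra's algorithm", 0.9), ("is covered next week", 0.2)]))
```

After the two feedback rounds in the usage example, the `summary` style carries a larger weight on `pause_rate` than on `speech_rate`, which is the sense in which better-correlated features end up "weighted greater" in this toy setup.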

Assignees

Inventors

Classifications

G06N3/0464
Convolutional networks [CNN, ConvNet] · CPC title
G06N3/09
Supervised learning · CPC title
G06N3/0442
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
G06N3/0455
Auto-encoder networks; Encoder-decoder networks · CPC title
G10L15/063
Training · CPC title

Patent family

Related publications grouped by family.

View patent family 81454721

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12033619B2 cover? The exemplary embodiments disclose a method, a computer program product, and a computer system for transcribing media. The exemplary embodiments may include collecting media, extracting one or more features from the media, and transcribing the media based on the extracted one or more features and one or more models.
Who is the assignee on this patent? IBM
What technology area does this patent fall under? Primary CPC classification G06N20/00. Mapped technology areas include Physics.
When was this patent published? Publication date Jul 9, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb? We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).