US12541977B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12541977-B2 |
| Application number | US-202318205802-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 5, 2023 |
| Priority date | Jun 5, 2023 |
| Publication date | Feb 3, 2026 |
| Grant date | Feb 3, 2026 |
Official abstract text for this publication.
Disclosed herein are system, apparatus, article of manufacture, method and/or computer program product embodiments, and/or combinations and sub-combinations thereof, for cue point discovery for content. For example, system, apparatus, article of manufacture, method and/or computer program product embodiments, and/or combinations and sub-combinations thereof are provided for using unsupervised machine learning to automatically classify cue points for episodic content. The cue points can be associated with an opening credits section, an end credits section, a recap section, or a behind-the-scenes section.
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method, comprising: dividing, by at least one computer processor, a video associated with an episode of an episodic content into a plurality of sections; determining a representation for each of the plurality of sections; comparing a first representation of a first section of the plurality of sections of the video with a plurality of representations, wherein the plurality of representations are associated with one or more sections of one or more episodes of the episodic content; determining a plurality of similarity values for the first representation based on the comparison; determining one or more of the plurality of similarity values that satisfy a condition; determining a temporal position of the first representation in response to the one or more of the plurality of similarity values satisfying the condition; and determining a type of the first section of the plurality of sections of the video by comparing the temporal position with a temporal position threshold and further based at least one or more of first region information of a first geographical region where the episodic content is produced or second region information of a second geographical region where the episodic content is being shown, wherein the type of the first section comprises an opening credits section, an end credits section, a recap section, or a behind-the-scenes section, and wherein the temporal position threshold depends on at least one or more of the first region information or the second region information.
2. The computer-implemented method of claim 1, wherein the representation comprises an image embedding, an audio embedding, a text embedding, or a combination of two or more of the image embedding, the audio embedding, and the text embedding.
3. The computer-implemented method of claim 1, wherein determining the type of the first section further comprises using one or more temporal positions corresponding to one or more of the plurality of representations associated with the one or more of the plurality of similarity values that satisfy the condition.
4. The computer-implemented method of claim 1, wherein using the temporal position associated with the first representation comprises: determining that the type of the first section comprises the opening credits section in response to the temporal position being before the first temporal position threshold.
5. The computer-implemented method of claim 1, wherein using the temporal position associated with the first representation comprises: determining that the type of the first section comprises the end credits section in response to the temporal position being after the first temporal position threshold.
6. The computer-implemented method of claim 1, wherein determining the type of the first section further comprises: using a text detection method to determine text within the first section of the plurality of sections of the video; and using the determined text to determine that the type of the first section comprises the end credits section.
7. The computer-implemented method of claim 1, wherein determining the type of the first section further comprises: using production information associated with the episodic content to determine the type of the first section.
8. The computer-implemented method of claim 1, wherein determining one or more of the plurality of similarity values that satisfy the condition comprises: comparing the plurality of similarity values with a second threshold; and determining that the one or more of the plurality of similarity values are greater than the second threshold.
9. The computer-implemented method of claim 1, wherein determining the representation, comparing the first representation with the plurality of representations, determining the plurality of similarity values, and determining the one or more of the plurality of similarity values satisfying the condition are part of an unsupervised machine learning model.
10. The computer-implemented method of claim 1, further comprising: determining two or more sections of the plurality of sections of the video that have similarity values that satisfy the condition; determining a number of the two or more sections; and in response to the number of the two or more sections satisfying a second threshold, using temporal positions associated with the two or more sections to determine a type of the two or more sections of the plurality of sections of the video.
11. A system, comprising: one or more memories; and at least one processor each coupled to at least one of the memories and configured to perform operations comprising: dividing a video associated with an episode of an episodic content into a plurality of sections; determining a representation for each of the plurality of sections; comparing a first representation of a first section of the plurality of sections of the video with a plurality of representations, wherein the plurality of representations are associated with one or more sections of one or more episodes of the episodic content; determining a plurality of similarity values for the first representation based on the comparison; determining one or more of the plurality of similarity values that satisfy a condition; determining a temporal position of the first representation in response to the one or more of the plurality of similarity values satisfying the condition; and determining a type of the first section of the plurality of sections of the video by comparing the temporal position with a temporal position threshold and further based at least one or more of first region information of a first geographical region where the episodic content is produced or second region information of a second geographical region where the episodic content is being shown, wherein the type of the first section comprises an opening credits section, an end credits section, a recap section, or a behind-the-scenes section, wherein the temporal position threshold depends on at least one or more of the first region information or the second region information.
12. The system of claim 11, wherein: the representation comprises an image embedding, an audio embedding, a text embedding, or a combination of two or more of the image embedding, the audio embedding, and the text embedding; and determining the type of the first section further comprises using one or more temporal positions corresponding to one or more of the plurality of representations associated with the one or more of the plurality of similarity values that satisfy the condition.
13. The system of claim 11, wherein using the temporal position associated with the first representation comprises: determining that the type of the first section comprises the opening credits section in response to the temporal position being before the first temporal position threshold.
14. The system of claim 11, wherein using the temporal position associated with the first representation comprises: determining that the type of the first section comprises the end credits section in response to the temporal position being after the first temporal position threshold.
15. The system of claim 11, wherein determining the type of the first section further comprises: using a text detection method to determine text within the first section of the plurality of sections of the video; and using the determined text to determine that the type of the first section comprises the end credits section.
16. The system of clai
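The flow recited in claims 1, 4, 5, and 8 can be illustrated with a minimal Python sketch. It is not the patented implementation: the claims do not name a similarity measure, so cosine similarity is swapped in as an assumption, and the threshold values, the region codes, the `Section` class, and `classify_section` are all hypothetical. The sketch covers only the opening-credits and end-credits cases, not recap or behind-the-scenes sections.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical values: the claims do not specify any numeric thresholds.
SIMILARITY_THRESHOLD = 0.9
# Hypothetical region-dependent temporal position thresholds,
# expressed as a fraction of the episode length.
REGION_POSITION_THRESHOLD = {"US": 0.5, "JP": 0.4}
DEFAULT_POSITION_THRESHOLD = 0.5


@dataclass
class Section:
    position: float             # temporal position, as a fraction of episode length
    embedding: Sequence[float]  # representation, e.g. an image or audio embedding


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed similarity measure; the claims only require 'similarity values'."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify_section(section: Section,
                     other_sections: Sequence[Section],
                     region: str) -> Optional[str]:
    """Return a section type, or None when no similarity value satisfies the condition."""
    similarities = [cosine_similarity(section.embedding, other.embedding)
                    for other in other_sections]
    # Condition in the style of claim 8: at least one value above a threshold.
    if not any(value > SIMILARITY_THRESHOLD for value in similarities):
        return None
    # The temporal position threshold depends on region information (claim 1).
    threshold = REGION_POSITION_THRESHOLD.get(region, DEFAULT_POSITION_THRESHOLD)
    if section.position < threshold:
        return "opening credits"  # before the threshold (claim 4)
    # Assumption: a position exactly at the threshold is treated as "after" (claim 5).
    return "end credits"


# Sections drawn from another episode of the same (hypothetical) series.
other_episode = [Section(0.03, [0.98, 0.02, 0.1]), Section(0.5, [0.0, 1.0, 0.0])]
intro = Section(0.02, [1.0, 0.0, 0.1])
print(classify_section(intro, other_episode, "US"))  # prints: opening credits
```

Because the threshold is looked up per region, the same recurring section at position 0.45 would be labelled differently under the two hypothetical regions, which is the dependency the final clause of claim 1 describes.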
Scene text, e.g. street names · CPC title
Proximity, similarity or dissimilarity measures · CPC title
Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching · CPC title
Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes · CPC title
Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items (segmenting video sequences G06V20/49) · CPC title