What technology area does this patent fall under?

Primary CPC classification G06F40/30. Mapped technology areas include Physics.

When was this patent published?

Publication date: Feb 28, 2019 (kind code A1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Thematic segmentation of long content using deep learning and contextual cues

US2019066663A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2019066663-A1
Application number	US-201715684042-A
Country	US
Kind code	A1
Filing date	Aug 23, 2017
Priority date	Aug 23, 2017
Publication date	Feb 28, 2019
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A recurrent neural network (RNN) is trained to identify split positions in long content, wherein each split position is a position at which the theme of the long content changes. Each sentence in the long content is converted to a vector that corresponds to the meaning of the sentence. The sentence vectors are used as inputs to the RNN. The high-probability split points determined by the RNN may be combined with contextual cues to determine the actual split point to use. The split points are used to generate thematic segments of the long content. The multiple thematic segments may be presented to a user along with a topic label for each thematic segment. Each topic label may be generated based on the words contained in the corresponding thematic segment.
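The flow in the abstract (sentence vectors in, split probabilities out, segments as the result) can be illustrated with a minimal Python sketch. It is not the patent's implementation: the trained RNN is replaced by a plain list of split probabilities, the averaging of word vectors follows the approach named in claim 2, and the names `sentence_vector`, `segment`, `word_vectors`, and the 0.5 threshold are assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def sentence_vector(sentence: str,
                    word_vectors: Dict[str, List[float]],
                    dim: int) -> List[float]:
    """Average the vectors of the known words in a sentence.

    Words missing from word_vectors are skipped; a sentence with no
    known words maps to a zero vector of length dim.
    """
    vectors = [word_vectors[w] for w in sentence.lower().split()
               if w in word_vectors]
    if not vectors:
        return [0.0] * dim
    return [sum(component) / len(vectors) for component in zip(*vectors)]


def segment(sentences: Sequence[str],
            split_probs: Sequence[float],
            threshold: float = 0.5) -> List[List[str]]:
    """Cut the sentence list wherever a split probability exceeds threshold.

    split_probs[i] stands in for the model's probability that the theme
    changes between sentence i and sentence i + 1 (hypothetical input;
    no RNN is trained or run here).
    """
    segments: List[List[str]] = []
    current: List[str] = []
    for i, sentence in enumerate(sentences):
        current.append(sentence)
        if i < len(split_probs) and split_probs[i] > threshold:
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments
```

In a real system the probabilities would come from a trained recurrent model fed with the sentence vectors; here they are supplied by hand so that the segmentation step can be read in isolation.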

First claim

Opening claim text (preview).

What is claimed is:
1. A system comprising: a memory that stores instructions; one or more processors configured by the instructions to perform operations comprising: accessing a plurality of sentences; generating a plurality of sentence vectors, each sentence vector of the plurality of sentence vectors corresponding to a respective sentence of the plurality of sentences; providing a subset of the plurality of sentence vectors as an input to a recurrent neural network (RNN); based on an output of the RNN responsive to the input, determining that a subset of the plurality of sentences relate to a first topic; and providing an output comprising the subset of the plurality of sentences related to the first topic.
2. The system of claim 1, wherein the generating of each sentence vector of the plurality of sentence vectors comprises: accessing a plurality of word vectors, each word vector of the plurality of word vectors corresponding to a respective word of the sentence corresponding to the sentence vector; and averaging the plurality of word vectors to generate the sentence vector.
3. The system of claim 1, wherein the operations further comprise: accessing an audio file; and generating the plurality of sentences from the audio file using speech-to-text conversion.
4. The system of claim 1, wherein: the operations further comprise: providing a second subset of the plurality of sentence vectors as a second input to the RNN; and the determining that the subset of the plurality of sentences relate to the first topic is further based on a second output of the RNN responsive to the second input.
5. The system of claim 4, wherein: the second subset of the plurality of sentence vectors has a same number of sentence vectors as the first subset of the plurality of sentence vectors; the second subset of the plurality of sentence vectors has at least one vector in common with the first subset of the plurality of sentence vectors; and the second subset of the plurality of sentence vectors has at least one vector different from each vector in the first subset of the plurality of sentence vectors.
6. The system of claim 1, wherein: the determining that the subset of the plurality of sentences relate to the first topic comprises: comparing each value of a plurality of output values from the RNN to a predetermined threshold, each output value corresponding to a possible split position indicating a split between the first topic and a second topic; and the determining that the subset of the plurality of sentences relate to the first topic is based on results of the comparisons.
7. The system of claim 1, wherein: the plurality of sentences are embedded in a file that includes a paragraph change indicator at a position within the plurality of sentences; and the determining that the subset of the plurality of sentences relate to the first topic is based on the position of the paragraph change indicator.
8. The system of claim 1, wherein: the plurality of sentences are embedded in a file that includes a header indicator at a position within the plurality of sentences; and the determining that the subset of the plurality of sentences relate to the first topic is based on the position of the header indicator.
9. The system of claim 1, wherein the operations further comprise: determining a set of words comprised by the subset of the plurality of sentences; and generating a name of the first topic based on the set of words.
10. The system of claim 1, wherein: the operations further comprise: accessing a uniform resource locator (URL); accessing a media file using the URL; generating the plurality of sentences by using speech-to-text conversion on the media file; and identifying a second subset of the plurality of sentences related to a second topic, using the RNN; the output comprising the subset of the plurality of sentences related to the first topic is a first media file; and the operations further comprise: generating a first name for the first topic; generating a second media file comprising the second subset of the plurality of sentences related to the second topic; generating a second name for the second topic; and providing a user interface that includes the first name, a link to the first media file, the second name, and a link to the second media file.
11. A method comprising: accessing, by one or more processors, a plurality of sentences; generating, by the one or more processors, a plurality of sentence vectors, each sentence vector of the plurality of sentence vectors corresponding to a respective sentence of the plurality of sentences; providing, by the one or more processors, a subset of the plurality of sentence vectors as an input to a recurrent neural network (RNN); based on an output of the RNN responsive to the input, determining, by the one or more processors, that a subset of the plurality of sentences relate to a first topic; and providing, by the one or more processors an output comprising the subset of the plurality of sentences related to the first topic.
12. The method of claim 11, wherein the generating of each sentence vector of the plurality of sentence vectors comprises: accessing a plurality of word vectors, each word vector of the plurality of word vectors corresponding to a respective word of the sentence corresponding to the sentence vector; and averaging the plurality of word vectors to generate the sentence vector.
13. The method of claim 11, further comprising: accessing an audio file; and generating the plurality of sentences from the audio file using speech-to-text conversion.
14. The method of claim 11, wherein: the method further comprises: providing a second subset of the plurality of sentence vectors as a second input to the RNN; and the determining that the subset of the plurality of sentences relate to the first topic is further based on a second output of the RNN responsive to the second input.
15. The method of claim 14, wherein: the second subset of the plurality of sentence vectors has a same number of sentence vectors as the first subset of the plurality of sentence vectors; the second subset of the plurality of sentence vectors has at least one vector in common with the first subset of the plurality of sentence vectors; and the second subset of the plurality of sentence vectors has at least one vector different from each vector in the first subset of the plurality of sentence vectors.
16. The method of claim 11, wherein: the determining that the subset of the plurality of sentences relate to the first topic comprises: comparing each value of a plurality of output values from the RNN to a predetermined threshold, each output value corresponding to a possible split position indicating a split between the first topic and a second topic; and the determining that the subset of the plurality of sentences relate to the first topic is based on results of the comparisons.
17. The method of claim 11, wherein: the plurality of sentences are embedded in a file that includes a paragraph change indicator at a position within the plurality of sentences; and the determining that the subset of the plurality of sentences relate to the first topic is based on the position of the paragraph change indicator.
18. The method of claim 11, wherein: the plurality of sentences are embedded in a file that includes a header indicator at a position within the plurality of sentences; and

Assignees

SAP SE

Inventors

Classifications

G06N3/044
Recurrent networks, e.g. Hopfield networks · CPC title
G06F40/30 · Primary
Semantic analysis · CPC title
G10L15/26
Speech to text systems (G10L15/08 takes precedence) · CPC title
G06F40/258
Heading extraction; Automatic titling; Numbering · CPC title
G06N3/08
Learning methods · CPC title

Patent family

Related publications grouped by family.

View patent family 65435513

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2019066663A1 cover?: A recurrent neural network (RNN) is trained to identify split positions in long content, wherein each split position is a position at which the theme of the long content changes. Each sentence in the long content is converted to a vector that corresponds to the meaning of the sentence. The sentence vectors are used as inputs to the RNN. The high-probability split points determined by the RNN ma…
Who is the assignee on this patent?: SAP SE