Utilizing deep recurrent neural networks with layer-wise attention for punctuation restoration

US2020364576A1 · US · A1

Patent metadata
Field               Value
Publication number  US-2020364576-A1
Application number  US-201916411490-A
Country             US
Kind code           A1
Filing date         May 14, 2019
Priority date       May 14, 2019
Publication date    Nov 19, 2020
Grant date

Abstract

The present disclosure relates to utilizing a deep recurrent neural network for accurately performing punctuation restoration. For example, the disclosed systems can provide a sequence of words to a punctuation restoration neural network having multiple bi-directional recurrent layers and one or more neural attention mechanisms. In one or more embodiments, the punctuation restoration neural network incorporates layer-wise attentions and/or multi-head attention. The disclosed systems can utilize the punctuation restoration neural network to generate probabilities for each word, indicating the likelihood that each possible punctuation mark is associated with that word. Based on these probabilities, the disclosed systems can generate a punctuated transcript that includes punctuation before the appropriate words.
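
The last step of the abstract, turning per-word probabilities into a punctuated transcript, can be illustrated with a minimal sketch. The label set `LABELS`, the function name `punctuate`, and the choice of taking the single most probable label per word are assumptions made for illustration; the abstract does not specify them.

```python
# Hypothetical label set; the abstract does not enumerate the punctuation marks.
# The empty string stands for "no punctuation before this word".
LABELS = ["", ",", ".", "?"]


def punctuate(words, probs):
    """Place the most probable punctuation label before each word.

    `probs[i][j]` is the probability that LABELS[j] precedes words[i].
    """
    if len(words) != len(probs):
        raise ValueError("need one probability row per word")
    pieces = []
    for word, row in zip(words, probs):
        best = max(range(len(LABELS)), key=lambda j: row[j])
        mark = LABELS[best]
        # A mark "before" this word is written after the previous word.
        # A mark predicted before the very first word is ignored here.
        if mark and pieces:
            pieces[-1] += mark
        pieces.append(word)
    return " ".join(pieces)


words = ["hello", "how", "are", "you", "i", "am", "fine"]
probs = [
    [0.90, 0.05, 0.03, 0.02],  # hello
    [0.20, 0.70, 0.05, 0.05],  # how  -> comma before it
    [0.90, 0.05, 0.03, 0.02],  # are
    [0.90, 0.05, 0.03, 0.02],  # you
    [0.10, 0.10, 0.20, 0.60],  # i    -> question mark before it
    [0.90, 0.05, 0.03, 0.02],  # am
    [0.90, 0.05, 0.03, 0.02],  # fine
]
result = punctuate(words, probs)
print(result)  # hello, how are you? i am fine
```

Because each label is attached before a word, this sketch cannot place a mark after the final word; how the disclosed systems handle the end of the sequence is not stated in the abstract.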

First claim

Opening claim text (preview).

What is claimed is:

1. A non-transitory computer-readable medium storing instructions thereon that, when executed by at least one processor, cause a computing device to: generate, by each bi-directional recurrent neural network layer of a plurality of bi-directional recurrent neural network layers, a plurality of output states corresponding to words from a sequence of words; generate, utilizing one or more neural attention mechanisms, a plurality of attention outputs based on the plurality of output states; determine punctuation label probabilities for the words from the sequence of words based on the plurality of output states and the plurality of attention outputs; and generate a punctuated transcript comprising punctuation before one or more of the words from the sequence of words based on the punctuation label probabilities.

2. The non-transitory computer-readable medium of claim 1, wherein: the one or more neural attention mechanisms comprise a neural attention mechanism for each bi-directional recurrent neural network layer; and the instructions, when executed by the at least one processor, cause the computing device to generate, by each neural attention mechanism, a layer-wise attention weight for each output state from the plurality of output states of a corresponding bi-directional recurrent neural network layer.

3. The non-transitory computer-readable medium of claim 2, wherein the instructions, when executed by the at least one processor, cause the computing device to generate the plurality of attention outputs by concatenating, for a given state, the layer-wise attention weight corresponding to the state from each neural attention mechanism.

4. The non-transitory computer-readable medium of claim 3, wherein: each neural attention mechanism from the one or more neural attention mechanisms comprises a multi-head neural attention mechanism; the instructions, when executed by the at least one processor, cause the computing device to: utilize each multi-head neural attention mechanism to generate a plurality of layer-wise attention weights for each state; and generate the plurality of attention outputs by concatenating, for a given state, the plurality of layer-wise attention weights corresponding to the state from each neural attention mechanism.

5. The non-transitory computer-readable medium of claim 1, wherein: the one or more neural attention mechanisms comprise a multi-head neural attention mechanism; the instructions, when executed by the at least one processor, cause the computing device to: utilize the multi-head neural attention mechanism to generate a plurality of attention weights for each state; and generate the plurality of attention outputs by concatenating, for a given state, the plurality of attention weights corresponding to the state.

6. The non-transitory computer-readable medium of claim 1, wherein the one or more neural attention mechanisms comprise one or more scaled dot-product neural attention mechanisms.

7. The non-transitory computer-readable medium of claim 1, further storing instructions that, when executed by the at least one processor, cause the computing device to generate a set of final states based on the plurality of output states.

8. The non-transitory computer-readable medium of claim 7, wherein the instructions, when executed by the at least one processor, cause the computing device to determine the punctuation label probabilities for the words from the sequence of words based on the set of final states and the plurality of attention outputs utilizing a fully connected layer with a SoftMax classifier to generate, for a given word of the sequence of words, a punctuation label probability for each of a plurality of punctuation marks.

9. The non-transitory computer-readable medium of claim 1, wherein the instructions, when executed by the at least one processor, cause the computing device to generate, by a given bi-directional recurrent neural network layer, the plurality of output states by: generating a plurality of forward states by processing embeddings of the sequence of words in a forward direction utilizing a forward recurrent neural network layer of the given bi-directional recurrent neural network layer; generating a plurality of backward states by processing the embeddings of the sequence of words in a backward direction utilizing a backward recurrent neural network layer of the given bi-directional recurrent neural network layer; and combining, for each state, a forward state and a backward state corresponding to the state.

10. The non-transitory computer-readable medium of claim 1, further storing instructions that, when executed by the at least one processor, cause the computing device to perform a language understanding task based on the punctuated transcript, the language understanding task comprising at least one of generating a translation, generating a transcript summary, determining an answer to a question, performing sentiment analysis, performing syntactic parsing, or extracting information.

11. A system comprising: a memory comprising a punctuation restoration neural network trained to generate punctuation label probabilities, the punctuation restoration neural network comprising a plurality of bi-directional recurrent neural network layers and one or more neural attention mechanisms; at least one processor; and at least one non-transitory computer-readable medium storing instructions thereon that, when executed by the at least one processor, cause the system to: generate, by each bi-directional recurrent neural network layer of the punctuation restoration neural network, a plurality of output states by generating forward states and backward states and combining the forward states and backward states, wherein each state corresponds to words from a sequence of words; generate a set of final states based on the plurality of output states utilizing a gated recurrent unit of the punctuation restoration neural network; generate, utilizing one or more neural attention mechanisms, a plurality of attention outputs by combining the plurality of output states from each bi-directional layer and the set of final states; determine punctuation label probabilities for the words from the sequence of words based on the set of final states and the plurality of attention outputs; and generate a punctuated transcript comprising punctuation before one or more of the words from the sequence of words based on the punctuation label probabilities.

12. The system of claim 11, wherein: the one or more neural attention mechanisms comprise a plurality of neural attention mechanisms; each neural attention mechanism from the plurality of neural attention mechanisms corresponds to a bi-directional recurrent neural network layer from the plurality of bi-directional recurrent neural network layers; and the instructions, when executed by the at least one processor, cause the system to generate, by each neural attention mechanism, a layer-wise attention weight for each output state from the plurality of output states of a corresponding bi-directional recurrent neural network layer.

13. The system of claim 12, wherein: each neural attention mechanism from the plurality of neural attention mechanisms comprises a multi-head neural attention mechanism; the instructions, when executed by the at least one processor, cause the system to: utilize each multi-head neural attention mechanism to generate a plurality of layer-wise attention weights for each state; and generate the plurality of attention outputs by combining, for a given state, the pluralit
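
The claims name scaled dot-product attention (claim 6), one attention mechanism per recurrent layer with per-state concatenation (claims 2 and 3), and a fully connected layer with a SoftMax classifier (claim 8). The following is a rough, single-head sketch of how those pieces might fit together. It is not the patented implementation: using the final states as queries and each layer's output states as keys and values is an assumption, as are all function names, the toy dimensions, and the untrained placeholder weights.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """For each query, return the softmax(q.k / sqrt(d_k))-weighted sum of values."""
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs


def layer_wise_attention(final_states, layer_outputs):
    """Run one attention per recurrent layer and concatenate the results per word.

    Assumption: final states act as queries, layer output states as keys/values.
    """
    per_layer = [scaled_dot_product_attention(final_states, states, states)
                 for states in layer_outputs]
    return [[x for layer in per_layer for x in layer[t]]
            for t in range(len(final_states))]


def label_probabilities(final_states, attention_outputs, weights, bias):
    """Fully connected layer plus softmax over [final state ; attention output]."""
    rows = []
    for state, attended in zip(final_states, attention_outputs):
        features = state + attended
        logits = [sum(w * x for w, x in zip(row, features)) + b
                  for row, b in zip(weights, bias)]
        rows.append(softmax(logits))
    return rows


# Toy shapes: 3 words, 2 layers, state size 2, 4 hypothetical punctuation labels.
final_states = [[0.1, 0.2], [0.3, 0.1], [0.0, 0.5]]
layer_outputs = [
    [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
    [[0.2, 0.4], [0.6, 0.1], [0.3, 0.3]],
]
attended = layer_wise_attention(final_states, layer_outputs)
weights = [[0.1] * 6, [0.2] * 6, [-0.1] * 6, [0.0] * 6]  # untrained placeholders
bias = [0.0, 0.0, 0.0, 0.0]
probs = label_probabilities(final_states, attended, weights, bias)
```

Each row of `probs` is a distribution over the four placeholder labels for one word. A multi-head variant in the sense of claims 4 and 5 would repeat the attention step with several learned projections and concatenate the results; that is omitted here to keep the sketch short.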

Assignees

Adobe Inc

Inventors

Classifications

  • G06F40/30 (Primary): Semantic analysis

  • Document-oriented image-based pattern recognition

  • using neural networks

  • G06N3/084 (Primary): Backpropagation, e.g. using gradient descent

  • Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2020364576A1 cover?
The present disclosure relates to utilizing a deep recurrent neural network for accurately performing punctuation restoration. For example, the disclosed systems can provide a sequence of words to a punctuation restoration neural network having multiple bi-directional recurrent layers and one or more neural attention mechanisms.
Who is the assignee on this patent?
Adobe Inc
What technology area does this patent fall under?
Primary CPC classification G06F40/30. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 19, 2020 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).