Utilizing deep recurrent neural networks with layer-wise attention for punctuation restoration

US11521071B2 · US · B2

Patent metadata
Field                Value
Publication number   US-11521071-B2
Application number   US-201916411490-A
Country              US
Kind code            B2
Filing date          May 14, 2019
Priority date        May 14, 2019
Publication date     Dec 6, 2022
Grant date           Dec 6, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The present disclosure relates to utilizing a deep recurrent neural network for accurately performing punctuation restoration. For example, the disclosed systems can provide a sequence of words to a punctuation restoration neural network having multiple bi-directional recurrent layers and one or more neural attention mechanisms. In one or more embodiments, the punctuation restoration neural network incorporates layer-wise attentions and/or multi-head attention. The disclosed systems can utilize the punctuation restoration neural network to generate probabilities for each word, indicating the likelihood that each possible punctuation mark is associated with that word. Based on these probabilities, the disclosed systems can generate a punctuated transcript that includes punctuation before the appropriate words.
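The last step of the abstract, turning per-word probabilities into a punctuated transcript, can be sketched as follows. This is a minimal illustration, not the patented implementation: the label set `LABELS`, the function name `render_transcript`, the `</s>` end-of-sequence token, and the example probabilities are all assumptions introduced here.

```python
# Hypothetical label set; the text on this page does not enumerate the marks.
LABELS = ["", ",", ".", "?"]  # "" means "no punctuation"


def render_transcript(words, probs, end_token="</s>"):
    """Place the most likely punctuation mark before each word.

    words: list of tokens, optionally ending with an end-of-sequence token.
    probs: one probability list per word, aligned with LABELS.
    """
    if len(words) != len(probs):
        raise ValueError("words and probs must have the same length")
    pieces = []
    for word, dist in zip(words, probs):
        # Pick the label with the highest probability for this word.
        best = max(range(len(LABELS)), key=lambda i: dist[i])
        mark = LABELS[best]
        if mark and pieces:
            # A mark "before" this word is attached to the end of the previous one.
            pieces[-1] += mark
        if word != end_token:
            pieces.append(word)
    return " ".join(pieces)


words = ["hello", "how", "are", "you", "</s>"]
probs = [
    [0.90, 0.05, 0.03, 0.02],  # nothing before "hello"
    [0.20, 0.70, 0.05, 0.05],  # comma before "how"
    [0.90, 0.05, 0.03, 0.02],
    [0.90, 0.05, 0.03, 0.02],
    [0.10, 0.05, 0.15, 0.70],  # question mark before the end token
]
transcript = render_transcript(words, probs)
```

Because each label is predicted for the position before a word, the sketch uses an end-of-sequence token to carry the final mark; whether the patented system does the same is not stated in the abstract.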

First claim

Opening claim text (preview).

What is claimed is:

1. A non-transitory computer-readable medium storing instructions thereon that, when executed by at least one processor, cause a computing device to: generate, by each bi-directional recurrent neural network layer of a plurality of bi-directional recurrent neural network layers of a punctuation restoration neural network, a plurality of output states corresponding to words from a sequence of words; generate, utilizing a gated recurrent unit of the punctuation restoration neural network, a set of final states based on the plurality of output states; generate, utilizing a plurality of neural attention mechanisms of the punctuation restoration neural network, a plurality of attention outputs based on the plurality of output states, wherein each neural attention mechanism is associated with a bi-directional recurrent neural network layer from the plurality of bi-directional recurrent neural network layers and generates, for at least one state of the punctuation restoration neural network, one or more attention outputs based on the plurality of output states of the bi-directional recurrent neural network layer and a final state from the set of final states that corresponds to a previous state of the punctuation restoration neural network; determine punctuation label probabilities for the words from the sequence of words based on the set of final states and the plurality of attention outputs; and generate a punctuated transcript comprising punctuation before one or more words from the sequence of words based on the punctuation label probabilities.

2. The non-transitory computer-readable medium of claim 1, wherein: the instructions, when executed by the at least one processor, cause the computing device to generate, by each neural attention mechanism, a layer-wise attention weight for each output state from the plurality of output states of an associated bi-directional recurrent neural network layer.

3. The non-transitory computer-readable medium of claim 2, wherein the instructions, when executed by the at least one processor, cause the computing device to generate the plurality of attention outputs by concatenating, for a given state of the punctuation restoration neural network, the layer-wise attention weight corresponding to the given state from each neural attention mechanism.

4. The non-transitory computer-readable medium of claim 3, wherein: each neural attention mechanism from the plurality of neural attention mechanisms comprises a multi-head neural attention mechanism; and the instructions, when executed by the at least one processor, cause the computing device to: utilize each multi-head neural attention mechanism to generate a plurality of layer-wise attention weights for each state of the punctuation restoration neural network; and generate the plurality of attention outputs by concatenating, for a given state of the punctuation restoration neural network, the plurality of layer-wise attention weights corresponding to the given state from each neural attention mechanism.

5. The non-transitory computer-readable medium of claim 1, wherein the plurality of neural attention mechanisms comprises one or more scaled dot-product neural attention mechanisms.

6. The non-transitory computer-readable medium of claim 1, wherein instructions, when executed by the at least one processor, cause the computing device to generate the set of final states based on the plurality of output states by generating for each state of the punctuation restoration neural network, a final state based on an output state that corresponds to the state of the punctuation restoration neural network.

7. The non-transitory computer-readable medium of claim 6, wherein the instructions, when executed by the at least one processor, cause the computing device to determine the punctuation label probabilities for the words from the sequence of words based on the set of final states and the plurality of attention outputs utilizing a fully connected layer with a SoftMax classifier to generate, for a given word of the sequence of words, a punctuation label probability for each of a plurality of punctuation marks.

8. The non-transitory computer-readable medium of claim 1, wherein the instructions, when executed by the at least one processor, cause the computing device to generate, by a given bi-directional recurrent neural network layer, the plurality of output states by: generating a plurality of forward states by processing embeddings of the sequence of words in a forward direction utilizing a forward recurrent neural network layer of the given bi-directional recurrent neural network layer; generating a plurality of backward states by processing the embeddings of the sequence of words in a backward direction utilizing a backward recurrent neural network layer of the given bi-directional recurrent neural network layer; and combining, for each state of the punctuation restoration neural network, a forward state and a backward state corresponding to the state of the punctuation restoration neural network.

9. The non-transitory computer-readable medium of claim 1, further storing instructions that, when executed by the at least one processor, cause the computing device to perform a language understanding task based on the punctuated transcript, the language understanding task comprising at least one of generating a translation, generating a transcript summary, determining an answer to a question, performing sentiment analysis, performing syntactic parsing, or extracting information.

10. A system comprising: a memory comprising a punctuation restoration neural network trained to generate punctuation label probabilities, the punctuation restoration neural network comprising a plurality of bi-directional recurrent neural network layers and a plurality of neural attention mechanisms, wherein each neural attention mechanism is associated with a bi-directional recurrent neural network layer from the plurality of bi-directional recurrent neural network layers; at least one processor; and at least one non-transitory computer-readable medium storing instructions thereon that, when executed by the at least one processor, cause the system to: generate, by each bi-directional recurrent neural network layer of the punctuation restoration neural network, a plurality of output states by generating forward states and backward states and combining the forward states and backward states, wherein each state of the punctuation restoration neural network corresponds to words from a sequence of words; generate a set of final states based on the plurality of output states utilizing a gated recurrent unit of the punctuation restoration neural network; generate, utilizing the plurality of neural attention mechanisms, a plurality of attention outputs by utilizing each neural attention mechanism to generate, for at least one state of the punctuation restoration neural network, one or more attention outputs based on the plurality of output states of an associated bi-directional recurrent neural network layer and a final state from the set of final states that corresponds to a previous state of the punctuation restoration neural network; determine punctuation label probabilities for the words from the sequence of words based on the set of final states and the plurality of attention outputs; and generate a punctuated transcript comprising punctuation before one or more of the words from the sequence of words based on the punctuation label probabilities.

11. The system of claim 10, wherein: the instructions, when executed by the at least one processor, cause the system to generate, by each neural attention mechanism, a layer-w
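The attention and classification steps recited in claims 1, 3, 5, and 7 can be sketched roughly as below. This is a simplified reading under stated assumptions, not the patented implementation: the function names, the toy dimensions, and the weight values are hypothetical; the previous final state is assumed to have the same dimensionality as each layer's output states; and "attention output" is interpreted here as the attention-weighted sum of a layer's output states.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_attention(query, states):
    """Scaled dot-product attention of one query over one layer's output states.

    Returns (weights, context): one weight per state and their weighted sum.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * s for q, s in zip(query, state)) / scale for state in states]
    weights = softmax(scores)
    dim = len(states[0])
    context = [sum(w * st[d] for w, st in zip(weights, states)) for d in range(dim)]
    return weights, context


def layer_wise_attention(prev_final_state, layers):
    """Run one attention mechanism per recurrent layer and concatenate the results."""
    combined = []
    for states in layers:
        _, context = scaled_dot_attention(prev_final_state, states)
        combined.extend(context)
    return combined


def classify(final_state, attention_output, weight, bias):
    """Fully connected layer plus softmax over punctuation labels."""
    features = final_state + attention_output
    logits = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weight, bias)]
    return softmax(logits)


# Toy data: two recurrent layers, three words, two-dimensional output states.
layers = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[0.5, 0.5], [0.2, 0.8], [0.9, 0.1]],
]
prev_final = [1.0, 0.0]                      # final state of the previous step
attn = layer_wise_attention(prev_final, layers)
weight = [[0.1] * 6, [0.2] * 6, [-0.1] * 6]  # three hypothetical labels
bias = [0.0, 0.0, 0.0]
label_probs = classify([0.3, 0.7], attn, weight, bias)
```

The query for each step is the final state from the previous step, so every layer is attended to with the same query and the per-layer results are concatenated before classification. A multi-head variant (claim 4) would repeat `scaled_dot_attention` with several projected queries per layer and concatenate those results as well.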

Assignees

Adobe Inc

Inventors

Classifications

  • G06F40/30 · Primary

    Semantic analysis · CPC title

  • Document-oriented image-based pattern recognition · CPC title

  • using neural networks · CPC title

  • G06N3/084 · Primary

    Backpropagation, e.g. using gradient descent · CPC title

  • Probabilistic or stochastic networks · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11521071B2 cover?
The present disclosure relates to utilizing a deep recurrent neural network for accurately performing punctuation restoration. For example, the disclosed systems can provide a sequence of words to a punctuation restoration neural network having multiple bi-directional recurrent layers and one or more neural attention mechanisms. In one or more embodiments, the punctuation restoration neural net…
Who is the assignee on this patent?
Adobe Inc
What technology area does this patent fall under?
Primary CPC classification G06F40/30. Mapped technology areas include Physics.
When was this patent published?
Publication date: Dec 6, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).