System and method for residual long short term memories (LSTM) network

US10810482B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10810482-B2
Application number  US-201615343987-A
Country             US
Kind code           B2
Filing date         Nov 4, 2016
Priority date       Aug 30, 2016
Publication date    Oct 20, 2020
Grant date          Oct 20, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An apparatus and a method. The apparatus includes a plurality of long short term memory (LSTM) networks, wherein each of the plurality of LSTM networks is at a different network layer, wherein each of the plurality of LSTM networks is configured to determine a residual function, wherein each of the plurality of LSTM networks includes an output gate to control what is provided to a subsequent LSTM network, and wherein each of the plurality of LSTM networks includes at least one highway connection to compensate for the residual function of a previous LSTM network.
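
One way to read "an output gate to control what is provided to a subsequent LSTM network" together with the highway connection is the element-wise blend written out in claims 4 and 6 of the claim preview on this page. The following is a minimal sketch of those two output forms only, not the patented implementation. The function names `gated_output` and `scaled_sum_output` are hypothetical, and the vectors `main` and `highway` are made-up stand-ins for the main path (W_proj tanh(c_t)) and the highway path (W_h x_t).

```python
def gated_output(o, main, highway):
    """Claim 4 form: o * main + (1 - o) * highway, element by element."""
    return [g * m + (1.0 - g) * w for g, m, w in zip(o, main, highway)]


def scaled_sum_output(o, main, highway):
    """Claim 6 form: the gate scales the sum of both paths, o * (main + highway)."""
    return [g * (m + w) for g, m, w in zip(o, main, highway)]


# Illustrative values only (assumptions, not data from the patent).
main = [2.0, 2.0, 2.0]     # stands in for W_proj tanh(c_t)
highway = [4.0, 4.0, 4.0]  # stands in for W_h x_t
o = [1.0, 0.0, 0.5]        # output gate activations in [0, 1]

print(gated_output(o, main, highway))       # [2.0, 4.0, 3.0]
print(scaled_sum_output(o, main, highway))  # [6.0, 0.0, 3.0]
```

In the claim 4 form, a gate value of 1 passes only the main path and a gate value of 0 passes only the highway path; in the claim 6 form, a gate value of 0 blocks both paths.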

First claim

Opening claim text (preview).

What is claimed is:

1. An apparatus, comprising: a plurality of long short term memory (LSTM) networks, wherein each of the plurality of LSTM networks is at a different network layer; wherein each of the plurality of LSTM networks is configured to determine a residual function; wherein each of the plurality of LSTM networks includes an output gate to control what is provided to a subsequent LSTM network; and wherein each of the plurality of LSTM networks includes at least one highway connection to compensate for the residual function of a previous LSTM network and a projection matrix configured to control the at least one highway connection, wherein the projection matrix is further configured to control a ratio between a main path and a highway connection path.

2. The apparatus of claim 1, wherein each LSTM network is further configured to:

determine by a first function block $f_t^{L+1} = \mathrm{sigm}(W_{xf}^{L+1} x_t^{L+1} + W_{hf}^{L+1} h_{t-1}^{L+1} + W_{cf}^{L+1} \odot c_{t-1}^{L+1} + b_f^{L+1})$;

determine by a second function block $i_t^{L+1} = \mathrm{sigm}(W_{xi}^{L+1} x_t^{L+1} + W_{hi}^{L+1} h_{t-1}^{L+1} + W_{ci}^{L+1} \odot c_{t-1}^{L+1} + b_i^{L+1})$;

determine by a third function block $j_t^{L+1} = \tanh(W_{xc}^{L+1} x_t^{L+1} + W_{hc}^{L+1} h_{t-1}^{L+1} + b_j^{L+1})$;

determine by a fourth function block $o_t^{L+1} = \mathrm{sigm}(W_{xo}^{L+1} x_t^{L+1} + W_{ho}^{L+1} h_{t-1}^{L+1} + W_{co}^{L+1} c_t^{L+1} + b_o^{L+1})$;

determine by a fifth function block $c_t^{L+1} = f_t^{L+1} \odot c_{t-1}^{L+1} + i_t^{L+1} \odot j_t^{L+1}$; and

determine by a sixth function block $h_t^{L+1}$ as a function of $c_t^{L+1}$, $o_t^{L+1}$, and one of $x_t^{L+1}$ or $\sum_{i=1}^{N} W_{h,i}^{L+1} h_t^{L+1-i}$,

wherein $x_t^{L+1}$ is an input to the LSTM network, $h_{t-1}^{L+1}$ is an output of a previous time in the LSTM network, $c_{t-1}^{L+1}$ is a cell activation of the previous time in the LSTM network, $W_{xf}^{L+1}$, $W_{hf}^{L+1}$, $W_{cf}^{L+1}$, $W_{xi}^{L+1}$, $W_{hi}^{L+1}$, $W_{ci}^{L+1}$, $W_{xc}^{L+1}$, $W_{hc}^{L+1}$, $W_{xo}^{L+1}$, $W_{ho}^{L+1}$, and $W_{co}^{L+1}$ are weight matrices of the LSTM network, and $b_f^{L+1}$, $b_i^{L+1}$, $b_j^{L+1}$, $b_o^{L+1}$ are pre-determined bias values of the LSTM network.

3. The apparatus of claim 2, wherein $f_1$ and $f_2$ are each selected from a sigmoid function (sigm) and a hyperbolic tangent function (tanh).

4. The apparatus of claim 2, wherein the sixth function block is further configured to determine $h_t^{L+1} = o_t^{L+1} \odot W_{proj}^{L+1} \tanh(c_t^{L+1}) + (1 - o_t^{L+1}) \odot W_h^{L+1} x_t^{L+1}$.

5. The apparatus of claim 2, wherein the sixth function block is further configured to determine $h_t^{L+1} = o_t^{L+1} \odot W_{proj}^{L+1} \tanh(c_t^{L+1}) + (1 - o_t^{L+1}) \odot \left( \sum_{i=1}^{N} W_{h,i}^{L+1} h_t^{L+1-i} \right)$.

6. The apparatus of claim 2, wherein the sixth function block is further configured to determine $h_t^{L+1} = o_t^{L+1} \odot (W_{proj}^{L+1} \tanh(c_t^{L+1}) + W_h^{L+1} x_t^{L+1})$.

7. The apparatus of claim 2, wherein the sixth function block is further configured to determine $h_t^{L+1} = o_t^{L+1} \odot ( W_{proj}^{L+1}$ …
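
The six function blocks of claim 2, with the claim 4 form of the sixth block, can be sketched as a single time step of one layer. This is a minimal illustration under stated assumptions, not the patented implementation: every internal vector is assumed to share one dimension `n`, the peephole weights W_cf, W_ci, and W_co are assumed to be vectors applied element-wise, and the helper names (`residual_lstm_step`, `random_params`, the dictionary keys) and the random initial values are hypothetical.

```python
import math
import random


def sigm(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def tanh(v):
    return [math.tanh(a) for a in v]


def matvec(W, v):
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def add(*vs):
    return [sum(parts) for parts in zip(*vs)]


def mul(u, v):
    return [a * b for a, b in zip(u, v)]


def residual_lstm_step(p, x, h_prev, c_prev):
    """One time step of one layer; p maps the claim's subscripts to weights."""
    # First and second blocks: forget and input gates with peepholes on c_{t-1}.
    f = sigm(add(matvec(p["xf"], x), matvec(p["hf"], h_prev), mul(p["cf"], c_prev), p["bf"]))
    i = sigm(add(matvec(p["xi"], x), matvec(p["hi"], h_prev), mul(p["ci"], c_prev), p["bi"]))
    # Third block: candidate cell input.
    j = tanh(add(matvec(p["xc"], x), matvec(p["hc"], h_prev), p["bj"]))
    # Fifth block: new cell activation (computed before o, which peeks at c_t).
    c = add(mul(f, c_prev), mul(i, j))
    # Fourth block: output gate; element-wise peephole is an assumption here.
    o = sigm(add(matvec(p["xo"], x), matvec(p["ho"], h_prev), mul(p["co"], c), p["bo"]))
    # Sixth block, claim 4 form: blend the main path and the highway path.
    main = matvec(p["proj"], tanh(c))
    highway = matvec(p["h"], x)
    h = add(mul(o, main), mul([1.0 - g for g in o], highway))
    return h, c


def random_params(n_in, n, seed=0):
    """Small random weights, purely for demonstration."""
    rng = random.Random(seed)
    mat = lambda rows, cols: [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    vec = lambda size: [rng.uniform(-0.1, 0.1) for _ in range(size)]
    p = {}
    for gate in ("f", "i", "c", "o"):
        p["x" + gate] = mat(n, n_in)
        p["h" + gate] = mat(n, n)
    for name in ("cf", "ci", "co", "bf", "bi", "bj", "bo"):
        p[name] = vec(n)
    p["proj"] = mat(n, n)   # W_proj
    p["h"] = mat(n, n_in)   # W_h, the highway weight on the layer input
    return p


p = random_params(n_in=3, n=2)
h, c = [0.0, 0.0], [0.0, 0.0]
for x in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]):
    h, c = residual_lstm_step(p, x, h, c)
print(len(h), len(c))
```

To stack layers as the claims describe, the output `h` of one layer would serve as the input `x` of the next, each layer keeping its own parameters and its own `h_prev` and `c_prev`.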

Assignees

Samsung Electronics Co Ltd

Inventors

Classifications

  • G06N3/044 · Primary

    Recurrent networks, e.g. Hopfield networks · CPC title

  • Learning methods · CPC title

  • G06N3/0442 · Primary

    characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title

  • using simulation · CPC title

  • Combinations of networks · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10810482B2 cover?
An apparatus and a method built on a plurality of long short term memory (LSTM) networks, each at a different network layer. Each LSTM network is configured to determine a residual function, includes an output gate to control what is provided to a subsequent LSTM network, and includes at least one highway connection to compensate for the residual function of a previous LSTM network.
Who is the assignee on this patent?
Samsung Electronics Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06N3/044. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 20, 2020 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 10 related publications on this page (citations in our corpus or others sharing the same primary CPC).