Utilizing hidden state sharing modules to prevent catastrophic forgetting

US12530225B2 · US · B2

Patent metadata
Publication number: US-12530225-B2
Application number: US-202117473808-A
Country: US
Kind code: B2
Filing date: Sep 13, 2021
Priority date: Sep 13, 2021
Publication date: Jan 20, 2026
Grant date: Jan 20, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer-implemented method, system and computer program product for processing data. Data, including single data points (e.g., images) or entire sequences of data (e.g., speech, video), is received to be processed. A long short term memory structure is utilized to process the received data, where the long short term memory structure includes hidden state sharing modules for allowing information sharing in hidden states across different tasks. The hidden state sharing modules include broadcast modules which are configured to send hidden states of the current task to all previous modules and collect modules which are configured to collect all the hidden states from all the previous modules. In this manner, catastrophic forgetting is avoided by preventing the loss of previously learned information via the use of hidden state sharing modules.
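The abstract names broadcast and collect modules but gives no equations for them. The Python sketch below is one hypothetical reading, not IBM's implementation. Each task gets its own LSTM cell (`TaskModule`, a name introduced here). A collect path feeds the hidden states of earlier modules into a later module, and a broadcast path feeds a later module's hidden state back to the earlier ones. Following claim 7, the output hidden state is the sum over all modules. Two further choices are our own assumptions: shared states are taken from the previous time step, and they enter each cell as an extra block of gate inputs.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    # Dense matrix-vector product on nested lists.
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def vsum(vectors, size):
    # Element-wise sum of a list of vectors; all zeros if the list is empty.
    out = [0.0] * size
    for v in vectors:
        out = [a + b for a, b in zip(out, v)]
    return out


class TaskModule:
    """Hypothetical task-oriented LSTM module with its own weight matrix W and bias b."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        # Gate input is [x, own previous h, shared h] (assumed layout).
        cols = input_size + 2 * hidden_size
        self.W = [[rng.uniform(-0.5, 0.5) for _ in range(cols)]
                  for _ in range(4 * hidden_size)]
        self.b = [0.0] * (4 * hidden_size)

    def step(self, x, h, c, shared):
        H = self.hidden_size
        z = [a + b for a, b in zip(matvec(self.W, x + h + shared), self.b)]
        i = [sigmoid(v) for v in z[0:H]]            # input gate
        f = [sigmoid(v) for v in z[H:2 * H]]        # forget gate
        g = [math.tanh(v) for v in z[2 * H:3 * H]]  # candidate cell state
        o = [sigmoid(v) for v in z[3 * H:]]         # output gate
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new


def forward(modules, xs):
    """Run all modules over sequence xs; return per-step summed outputs and final hidden states."""
    H = modules[0].hidden_size
    hs = [[0.0] * H for _ in modules]
    cs = [[0.0] * H for _ in modules]
    outputs = []
    for x in xs:
        new_hs, new_cs = [], []
        for k, module in enumerate(modules):
            # Collect: hidden states of earlier modules (j < k), from step t-1.
            collected = [hs[j] for j in range(k)]
            # Broadcast: hidden states sent back by later modules (j > k), from step t-1.
            broadcast = [hs[j] for j in range(k + 1, len(modules))]
            shared = vsum(collected + broadcast, H)
            h, c = module.step(x, hs[k], cs[k], shared)
            new_hs.append(h)
            new_cs.append(c)
        hs, cs = new_hs, new_cs
        # Output hidden state: sum of all modules' hidden states (as in claim 7).
        outputs.append(vsum(hs, H))
    return outputs, hs


rng = random.Random(0)
modules = [TaskModule(3, 4, rng), TaskModule(3, 4, rng)]  # one module per task
outputs, final_hs = forward(modules, [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]])
print(len(outputs), len(outputs[0]))  # 2 4
```

Under this reading, each module receives the summed hidden states of every other module. Earlier modules reach a later one through its collect path, and later modules reach an earlier one through their broadcast paths. A real system might instead weight, gate, or restrict these paths.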

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer-implemented method for preventing catastrophic forgetting, the method comprising: receiving data; and processing said received data by utilizing a long short term memory structure, wherein said long short term memory structure comprises hidden state sharing modules for allowing information sharing in hidden states across different tasks, wherein said hidden state sharing modules broadcast hidden states to all previous modules and collect hidden states from all said previous modules thereby preventing a loss of previously learned information so as to avoid catastrophic forgetting.

2. The method as recited in claim 1, wherein said hidden state sharing modules comprise a first module configured to send hidden states of a task to all said previous modules.

3. The method as recited in claim 1, wherein said hidden state sharing modules comprise a second module configured to collect all hidden states from all said previous modules.

4. The method as recited in claim 1, wherein said data comprises a data set for a first task, wherein the method further comprises: updating model parameters of a first task-oriented module with said data set for said first task in response to processing said data set for said first task, wherein said model parameters of said first task-oriented module comprise a matrix and a bias, wherein said first task-oriented module comprises computational blocks that control information flow.

5. The method as recited in claim 4, wherein said data comprises a data set for a second task which is subsequent to said first task, wherein the method further comprises: immobilizing changes to said model parameters of said first task-oriented module in response to processing said second task; and creating a second task-oriented module for said second task in response to processing said second task, wherein said second task-oriented module comprises computational blocks that control information flow.

6. The method as recited in claim 5 further comprising: creating a first hidden state sharing module of said hidden state sharing modules configured to send hidden states of said second task to all said previous modules in response to processing said second task; and creating a second hidden state sharing module of said hidden state sharing modules configured to collect all hidden states from all said previous modules in response to processing said second task.

7. The method as recited in claim 1 further comprising: obtaining an output hidden state of said long short term memory structure by summing hidden states of all modules of said long short term memory structure.

8. A computer program product for preventing catastrophic forgetting, the computer program product comprising one or more computer readable storage mediums having program code embodied therewith, the program code comprising programming instructions for: receiving data; and processing said received data by utilizing a long short term memory structure, wherein said long short term memory structure comprises hidden state sharing modules for allowing information sharing in hidden states across different tasks, wherein said hidden state sharing modules broadcast hidden states to all previous modules and collect hidden states from all said previous modules thereby preventing a loss of previously learned information so as to avoid catastrophic forgetting.

9. The computer program product as recited in claim 8, wherein said hidden state sharing modules comprise a first module configured to send hidden states of a task to all said previous modules.

10. The computer program product as recited in claim 8, wherein said hidden state sharing modules comprise a second module configured to collect all hidden states from all said previous modules.

11. The computer program product as recited in claim 8, wherein said data comprises a data set for a first task, wherein the program code further comprises the programming instructions for: updating model parameters of a first task-oriented module with said data set for said first task in response to classifying, processing or making predictions using said data set for said first task, wherein said model parameters of said first task-oriented module comprise a matrix and a bias, wherein said first task-oriented module comprises computational blocks that control information flow.

12. The computer program product as recited in claim 11, wherein said data comprises a data set for a second task which is subsequent to said first task, wherein the program code further comprises the programming instructions for: immobilizing changes to said model parameters of said first task-oriented module in response to processing said second task; and creating a second task-oriented module for said second task in response to processing said second task, wherein said second task-oriented module comprises computational blocks that control information flow.

13. The computer program product as recited in claim 12, wherein the program code further comprises the programming instructions for: creating a first hidden state sharing module of said hidden state sharing modules configured to send hidden states of said second task to all said previous modules in response to processing said second task; and creating a second hidden state sharing module of said hidden state sharing modules configured to collect all hidden states from all said previous modules in response to processing said second task.

14. The computer program product as recited in claim 8, wherein the program code further comprises the programming instructions for: obtaining an output hidden state of said long short term memory structure by summing hidden states of all modules of said long short term memory structure.

15. A system, comprising: a memory for storing a computer program for preventing catastrophic forgetting; and a processor connected to said memory, wherein said processor is configured to execute program instructions of the computer program comprising: receiving data; and processing said received data by utilizing a long short term memory structure, wherein said long short term memory structure comprises hidden state sharing modules for allowing information sharing in hidden states across different tasks, wherein said hidden state sharing modules broadcast hidden states to all previous modules and collect hidden states from all said previous modules thereby preventing a loss of previously learned information so as to avoid catastrophic forgetting.

16. The system as recited in claim 15, wherein said hidden state sharing modules comprise a first module configured to send hidden states of a task to all said previous modules.

17. The system as recited in claim 15, wherein said hidden state sharing modules comprise a second module configured to collect all hidden states from all said previous modules.

18. The system as recited in claim 15, wherein said data comprises a data set for a first task, wherein the program instructions of the computer program further comprise: updating model parameters of a first task-oriented module with said data set for said first task in response to classifying, processing or making predictions using said data set for said first task, wherein said model parameters of said first task-oriented module comprise a matrix and a bias, wherein said first task-oriented module comprises computational blocks that control information flow.

19. The system as …
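Claims 4 to 6 describe how the structure grows as tasks arrive. The first task-oriented module's matrix and bias are updated on the first task. When the second task arrives, those parameters are immobilized, and the method creates a new task-oriented module plus one broadcast and one collect sharing module. The Python sketch below covers only that bookkeeping. It assumes an external training loop that supplies plain SGD gradients per module, and it omits the LSTM computation itself. `ContinualLSTM`, `add_task`, and `apply_update` are hypothetical names, not part of the patent.

```python
from dataclasses import dataclass


@dataclass
class TaskModule:
    """Parameters of one task-oriented module: a weight matrix W and a bias b."""
    task_id: int
    W: list
    b: list
    frozen: bool = False


@dataclass
class SharingModule:
    """A hidden state sharing module created when a new task arrives."""
    kind: str     # "broadcast" (new module -> previous) or "collect" (previous -> new)
    task_id: int  # task whose arrival created this sharing module
    peers: list   # ids of the previous modules it connects to


class ContinualLSTM:
    """Hypothetical bookkeeping for a module-per-task LSTM structure."""

    def __init__(self, rows, cols):
        self.rows, self.cols = rows, cols
        self.modules = []
        self.sharing = []

    def add_task(self):
        # Immobilize every existing module so earlier tasks' parameters stop changing.
        for m in self.modules:
            m.frozen = True
        previous = [m.task_id for m in self.modules]
        task_id = len(self.modules)
        W = [[0.0] * self.cols for _ in range(self.rows)]
        self.modules.append(TaskModule(task_id, W, [0.0] * self.rows))
        if previous:
            # One broadcast and one collect module per new task.
            self.sharing.append(SharingModule("broadcast", task_id, previous))
            self.sharing.append(SharingModule("collect", task_id, previous))
        return task_id

    def apply_update(self, grads, lr=0.1):
        """SGD step; grads maps task_id -> (grad_W, grad_b). Frozen modules are skipped."""
        for m in self.modules:
            if m.frozen or m.task_id not in grads:
                continue
            grad_W, grad_b = grads[m.task_id]
            for r in range(self.rows):
                m.b[r] -= lr * grad_b[r]
                for c in range(self.cols):
                    m.W[r][c] -= lr * grad_W[r][c]


net = ContinualLSTM(rows=2, cols=3)
g = ([[1.0] * 3 for _ in range(2)], [1.0, 1.0])
t0 = net.add_task()
net.apply_update({t0: g})
t1 = net.add_task()
net.apply_update({t0: g, t1: g})  # module t0 is frozen, so only t1 changes
print(net.modules[0].b, net.modules[1].b)  # [-0.1, -0.1] [-0.1, -0.1]
print([s.kind for s in net.sharing])       # ['broadcast', 'collect']
```

Here freezing is a flag that the update loop checks. In a framework with automatic differentiation, the same effect would usually come from excluding the frozen parameters from the optimizer or disabling their gradients.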

Assignees

IBM

Inventors

Classifications

  • G06N3/0442 (primary)

    Characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]

  • Transfer learning

  • Architecture, e.g. interconnection topology

  • Learning methods

  • G06F9/4881 (primary)

    Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12530225B2 cover?
A computer-implemented method, system and computer program product that process data with a long short-term memory (LSTM) structure. The data can be single data points such as images or sequences such as speech and video. The structure's hidden state sharing modules (broadcast modules and collect modules) share hidden states across tasks, so that previously learned information is not lost (catastrophic forgetting). The full text is in the Abstract above.
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06N3/0442. Mapped technology areas include Physics.
When was this patent published?
It was published on January 20, 2026 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).