Hierarchical machine learning system for lifelong learning

US11263524B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11263524-B2
Application number: US-201816197017-A
Country: US
Kind code: B2
Filing date: Nov 20, 2018
Priority date: Mar 7, 2018
Publication date: Mar 1, 2022
Grant date: Mar 1, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments described herein cover a hierarchical machine learning system with a separated perception subsystem (that includes a hierarchy of nodes having at least a first layer and a second layer) and application subsystem. In one example embodiment a first node in the first layer receives a first input and processes at least a portion of the first input to generate a first feature vector. A second node in the second layer processes a second input comprising at least a portion of the first feature vector to generate a second feature vector. The first node generates a first sparse feature vector from the first feature vector and/or the second node generates a second sparse feature vector from the second feature vector. A third node of the application subsystem then processes at least one of the first sparse feature vector or the second sparse feature vector to determine an output.
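
The data flow in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the Gaussian similarity to centroids, the top-k sparsification, the linear readout, and every name and number below (`feature_vector`, `sparsify`, `third_node`, the centroid values, the weights) are assumptions chosen only to make the flow concrete.

```python
import math

def feature_vector(centroids, x):
    """One feature element per centroid (assumed: Gaussian of squared distance)."""
    return [math.exp(-sum((a - b) ** 2 for a, b in zip(c, x))) for c in centroids]

def sparsify(features, k=1):
    """Keep the k largest feature elements and set the rest to zero (assumed rule)."""
    order = sorted(range(len(features)), key=lambda i: features[i], reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(features)]

def third_node(sparse_vectors, weights):
    """Assumed linear readout over the concatenated sparse feature vectors."""
    concat = [v for vec in sparse_vectors for v in vec]
    return sum(w * v for w, v in zip(weights, concat))

# Hypothetical two-layer hierarchy, one node per layer, three centroids per node.
layer1_centroids = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
layer2_centroids = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

x = [0.9, 1.1]                               # first input
f1 = feature_vector(layer1_centroids, x)     # first feature vector
f2 = feature_vector(layer2_centroids, f1)    # second input is the first feature vector
s1, s2 = sparsify(f1), sparsify(f2)          # a majority of elements become zero
output = third_node([s1, s2], weights=[0.5] * 6)
```

With `k=1` and three centroids per node, two of the three elements in each sparse vector are zero, which matches the "majority of feature elements have a value of zero" condition in claim 1.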

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for a machine learning system that mitigates catastrophic forgetting, comprising: receiving a first input at a first node in a first layer of a hierarchy of nodes; processing at least a portion of the first input by the first node to generate a first feature vector, the first feature vector comprising a first plurality of feature elements; processing a second input comprising at least a portion of the first feature vector by a second node in a second layer of the hierarchy of nodes to generate a second feature vector, the second feature vector comprising a second plurality of feature elements; generating a first sparse feature vector from the first feature vector, wherein a majority of feature elements in the first sparse feature vector have a value of zero; generating a second sparse feature vector from the second feature vector, wherein a majority of feature elements in the second sparse feature vector have a value of zero; and processing the first sparse feature vector and the second sparse feature vector by a third node to determine a first output.

2. The computer-implemented method of claim 1, wherein the second layer is a top layer of the hierarchy of nodes and the first layer is a second to top layer of the hierarchy of nodes.

3. The computer-implemented method of claim 1, wherein the first input further comprises at least a portion of a first previous feature vector that was generated by the first node based on a first previous input.

4. The computer-implemented method of claim 1, wherein the first input further comprises at least a portion of a second previous feature vector that was generated by the second node based on a second previous input comprising at least a portion of a first previous feature vector generated by the first node.

5. The computer-implemented method of claim 1, further comprising: receiving a data item at a bottom layer of the hierarchy of nodes, wherein the data item comprises a target, and wherein the first input and the second input are associated with the data item; determining an error associated with the first output based at least in part on a function of the first output and the target; determining a relevancy rating based at least in part on the error; determining whether the relevancy rating satisfies a relevancy criterion; determining a first novelty rating for the first node; determining whether the first novelty rating satisfies a novelty criterion; and updating the first node responsive to determining that the first novelty rating satisfies the novelty criterion and that the relevancy rating satisfies the relevancy criterion.

6. The computer-implemented method of claim 5, further comprising: determining a second novelty rating for the second node; determining whether the second novelty rating satisfies the novelty criterion; and updating the second node responsive to determining that the second novelty rating satisfies the novelty criterion and that the relevancy rating satisfies the relevancy criterion.

7. The computer-implemented method of claim 5, wherein the novelty criterion comprises a novelty threshold, the relevancy rating comprises a relevancy threshold, and the first node comprises a plurality of centroids, the method further comprising: determining that the first novelty rating is below the novelty threshold and that the relevancy rating is below the relevancy threshold; and updating a first centroid of the plurality of centroids in the first node, wherein the first centroid is associated with a first feature element of the first feature vector having a highest value.

8. The computer-implemented method of claim 5, wherein the novelty criterion comprises a novelty threshold, the relevancy rating comprises a relevancy threshold, and the first node comprises a plurality of centroids, the method further comprising: determining that the first novelty rating is above the novelty threshold and that the relevancy rating is above the relevancy threshold; and allocating a first new centroid for the first node, the first new centroid having values based on the first input.

9. The computer-implemented method of claim 1, further comprising: determining whether update criteria are satisfied for the first node; separately determining whether the update criteria are satisfied for the second node; and updating at least one of the first node or the second node.

10. The computer-implemented method of claim 1, wherein the first node and the second node are components of a hierarchical perception subsystem of a machine learning system and wherein the third node is a component of an application subsystem of the machine learning system, the method further comprising: co-training the perception subsystem and the application subsystem based on labeled data items and unlabeled data items, wherein a first function is used to train nodes in the hierarchical perception subsystem and a second function is used to train nodes in the application subsystem.

11. A non-transitory computer readable medium comprising instructions that, when executed by a processing device, cause the processing device to perform operations comprising: receiving, by a first subsystem of a machine learning system that mitigates catastrophic forgetting, a first sparse feature vector from a first node in a second subsystem of the machine learning system, wherein the first node is included in a first layer of a hierarchy of nodes in the second subsystem, and wherein the first sparse feature vector was generated in response to the second subsystem receiving a data item; receiving, by the first subsystem, a second sparse feature vector from a second node in the second subsystem, wherein the second node is included in a second layer of the hierarchy of nodes, and wherein the second sparse feature vector was generated in response to the second subsystem receiving the data item; and mapping the first sparse feature vector and the second sparse feature vector into an output space of possible outputs to generate an output associated with the data item.

12. The non-transitory computer readable medium of claim 11, the operations further comprising: determining that the data item is associated with a target output; making a comparison between the output and the target output; determining an error associated with the output based on the comparison; sending at least one of the error or a relevancy rating associated with the error to the first node in the first layer; and sending at least one of the error or the relevancy rating associated with the error to the second node in the second layer.

13. The non-transitory computer readable medium of claim 12, the operations further comprising: determining one or more feature elements of the first sparse feature vector that contributed to the output; determining one or more feature elements of the second sparse feature vector that contributed to the output; and updating one or more weights in a third node of the first subsystem that are associated with at least one of a) the one or more feature elements of the first sparse feature vector that contributed to the output or b) the one or more feature elements of the second sparse feature vector that contributed to the output, wherein weights associated with feature elements that did not contribute to output are not updated.

14. The non-transitory computer readable medium of claim 13, wherein a majority of feature elements in the first sparse feature vector have a zero value, wherein a majority of feature elements in the second sparse feature vector have …
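
The update gating described in claims 5, 7, and 8 can be sketched as follows. This is an illustration under stated assumptions, not the method as implemented by the patent holder: the claims do not define how the novelty rating or relevancy rating is computed, so the sketch assumes novelty is one minus the highest feature element and relevancy is the absolute error. The function names, thresholds, and learning rate are hypothetical.

```python
import math

def similarities(centroids, x):
    """Feature vector: one element per centroid (assumed Gaussian similarity)."""
    return [math.exp(-sum((a - b) ** 2 for a, b in zip(c, x))) for c in centroids]

def maybe_update(centroids, x, error, novelty_threshold=0.5,
                 relevancy_threshold=0.5, learning_rate=0.1):
    """Gate one node's update on novelty and relevancy; return the action taken."""
    feats = similarities(centroids, x)
    novelty = 1.0 - max(feats)   # assumed novelty rating
    relevancy = abs(error)       # assumed relevancy rating derived from the error
    if novelty > novelty_threshold and relevancy > relevancy_threshold:
        # Claim 8 case: allocate a new centroid with values based on the input.
        centroids.append(list(x))
        return "allocate"
    if novelty < novelty_threshold and relevancy < relevancy_threshold:
        # Claim 7 case: update the centroid behind the highest feature element.
        best = feats.index(max(feats))
        centroids[best] = [c + learning_rate * (xi - c)
                           for c, xi in zip(centroids[best], x)]
        return "update"
    return "none"                # mixed ratings: leave the node unchanged

centroids = [[0.0, 0.0], [1.0, 1.0]]
a1 = maybe_update(centroids, [1.1, 0.9], error=0.1)   # familiar input, low error
a2 = maybe_update(centroids, [5.0, 5.0], error=0.9)   # novel input, high error
a3 = maybe_update(centroids, [9.0, 9.0], error=0.1)   # novel input, low error
```

In this sketch, existing centroids change only slightly for familiar inputs and new structure is added only when an input is both unfamiliar and associated with a large error, which is one plausible reading of how such gating could limit catastrophic forgetting.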

Assignees

Inventors

Classifications

  • using neural networks

  • Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods

  • using classification, e.g. of video objects

  • Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection

  • based on sparsity criteria, e.g. with an overcomplete basis

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11263524B2 cover?
Embodiments described herein cover a hierarchical machine learning system with a separated perception subsystem (that includes a hierarchy of nodes having at least a first layer and a second layer) and application subsystem. In one example embodiment a first node in the first layer receives a first input and processes at least a portion of the first input to generate a first feature vector. A …
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06N20/00. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 1, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).