Continuous learning for machine learning models

US12488798B1 · US · B1

Patent metadata
Publication number: US-12488798-B1
Application number: US-202217852552-A
Country: US
Kind code: B1
Filing date: Jun 29, 2022
Priority date: Jun 8, 2022
Publication date: Dec 2, 2025
Grant date: Dec 2, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A first neural network (NN) model may generate labels for training a second NN model. The second NN model may represent instances of a NN model operating on multiple different devices (e.g., decentralized user and/or edge devices). The system may include using a “teacher” model to process data received by one or more of the devices to generate a labeled dataset. The system may use the labeled dataset and a “student” model to calculate gradient data for updating the student model. The student model may be the same or similar to NN model instances operating on the devices. The system may validate the updated student model to determine, for example, whether it exhibits improved performance when processing the newly received data and/or historical data. The system may distribute the validated update to the devices.
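The loop described in the abstract can be illustrated with a deliberately tiny sketch. Everything concrete in it is a hypothetical stand-in, not the patented implementation: the "student" is a one-variable linear model rather than an ASR network, the "teacher" is a fixed function that returns a label plus a confidence score, gradients come from mean squared error, gradients from several labeled datasets are averaged into one update, and validation compares error on a held-out labeled set before and after the update. The names `label_with_teacher`, `gradients`, `combine`, and `apply_update` are illustrative only.

```python
"""Toy teacher/student update loop (hypothetical stand-ins throughout)."""
from dataclasses import dataclass
from typing import Callable, List, Tuple

Sample = Tuple[float, float]  # (input x, label y)


@dataclass(frozen=True)
class Student:
    """Hypothetical stand-in for the on-device model: y = w * x + b."""
    w: float = 0.0
    b: float = 0.0

    def predict(self, x: float) -> float:
        return self.w * x + self.b

    def mse(self, data: List[Sample]) -> float:
        return sum((self.predict(x) - y) ** 2 for x, y in data) / len(data)


def label_with_teacher(inputs: List[float],
                       teacher: Callable[[float], Tuple[float, float]],
                       min_conf: float) -> List[Sample]:
    """Label device inputs with the teacher; keep only confident labels."""
    labeled = []
    for x in inputs:
        y, conf = teacher(x)
        if conf >= min_conf:
            labeled.append((x, y))
    return labeled


def gradients(student: Student, data: List[Sample]) -> Tuple[float, float]:
    """Mean-squared-error gradients with respect to (w, b)."""
    n = len(data)
    gw = sum(2 * (student.predict(x) - y) * x for x, y in data) / n
    gb = sum(2 * (student.predict(x) - y) for x, y in data) / n
    return gw, gb


def combine(grads: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Average gradient data computed from several labeled datasets."""
    k = len(grads)
    return sum(g[0] for g in grads) / k, sum(g[1] for g in grads) / k


def apply_update(model: Student, update: Tuple[float, float], lr: float) -> Student:
    """One gradient-descent step; each device applies the same update locally."""
    return Student(model.w - lr * update[0], model.b - lr * update[1])


# Hypothetical teacher: correct labels, but low confidence for negative inputs.
def teacher(x: float) -> Tuple[float, float]:
    return 2 * x + 1, (0.9 if x >= 0 else 0.3)


devices = [Student() for _ in range(3)]  # identical on-device model copies
student = Student()                      # server-side duplicate of that model

batch_a = label_with_teacher([0.0, 1.0, 2.0, -1.0], teacher, min_conf=0.5)
batch_b = label_with_teacher([3.0, 4.0, -2.0], teacher, min_conf=0.5)
update = combine([gradients(student, batch_a), gradients(student, batch_b)])
candidate = apply_update(student, update, lr=0.01)

# Validate on held-out labeled data; distribute only if the error drops.
held_out = [(5.0, 11.0), (-3.0, -5.0)]
if candidate.mse(held_out) < student.mse(held_out):
    devices = [apply_update(d, update, lr=0.01) for d in devices]
```

Averaging the per-dataset gradients loosely mirrors the claim language that the model update data also represents gradients from a second labeled dataset; a real system might instead weight datasets by size or aggregate many more of them.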

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: receiving, by a system component from a user device separate from the system component, first audio data representing an utterance, wherein the user device receives the first audio data and processes the first audio data using a first neural network model to generate first automatic speech recognition (ASR) data representing a first transcript of the utterance; processing the first audio data to determine first feature data, the first feature data representing normalized log-filterbank energies of frames of the first audio data; processing, by the system component using a second neural network model different from the first neural network model, the first audio data to determine second ASR data representing a second transcript of the utterance; determining, based on at least the second ASR data, to include the first feature data and the second ASR data in a first labeled dataset for updating the first neural network model, the first labeled dataset additionally including second feature data and third ASR data determined using second audio data; determining, by the system component using the first labeled dataset and a third neural network model different from the second neural network model, first gradient data representing gradients calculated for updating the third neural network model using the first labeled dataset, the third neural network model representing a duplicate of the first neural network model; determining, using the first gradient data, first model update data, the first model update data additionally representing second gradient data determined using a second labeled dataset; sending, from the system component to the user device, the first model update data; causing the user device to generate an updated first neural network model using the first model update data; and causing the user device to process third audio data, received by the user device, using the updated first neural network model to generate fourth ASR data.

2. The computer-implemented method of claim 1, further comprising: receiving third data representing a confidence that the second ASR data represents an accurate transcript of the utterance; and determining that the third data satisfies a condition, wherein determining to include the first feature data and the second ASR data in the first labeled dataset is additionally based on determining that the third data satisfies the condition.

3. The computer-implemented method of claim 1, further comprising: processing a third labeled dataset using the third neural network model to determine a first word error rate; determining a fourth neural network model using the third neural network model and the first model update data, the fourth neural network model representing an update of the third neural network model based on the first model update data; processing the third labeled dataset using the fourth neural network model to determine a second word error rate; and determining that the second word error rate is less than the first word error rate, wherein causing the user device to generate the updated first neural network model is based at least in part on determining that the second word error rate is less than the first word error rate.

4. The computer-implemented method of claim 1, wherein the first gradient data is determined using first training parameters, the method further comprising: determining a fourth neural network model using the third neural network model and the first model update data, the fourth neural network model representing an update of the third neural network model based on the first model update data; processing a third labeled dataset using the fourth neural network model to determine a first word error rate; determining, using the first labeled dataset and second training parameters different from the first training parameters, second model update data; determining a fifth neural network model using the third neural network model and the second model update data, the fifth neural network model representing an update of the third neural network model based on the second model update data; processing the third labeled dataset using the fifth neural network model to determine a second word error rate; and determining that the first word error rate is less than the second word error rate, wherein causing the user device to generate an updated first neural network model using the first model update data is based on determining that the first word error rate is less than the second word error rate.

5. A computer-implemented method comprising: receiving, by one or more system components from a user device, first audio data representing an utterance captured using a microphone of the user device, wherein the user device processes the first audio data using a first machine learning model to generate first output data representing a first transcript of the utterance; processing the first audio data using a second machine learning model different from the first machine learning model to determine second output data representing a second transcript of the utterance; determining, based on at least the second output data, to include the second output data in first data representing a portion of a first labeled dataset for updating the first machine learning model; determining, by the one or more system components using the first data and a third machine learning model different from the second machine learning model, second data representing first gradients calculated for updating the third machine learning model using the first labeled dataset; determining, using the second data, first model update data, the first model update data additionally representing second gradients determined using a second labeled dataset; sending, from the one or more system components to the user device, the first model update data; causing the user device to generate an updated first machine learning model using the first model update data; and causing the user device to process second audio data, received by the microphone, using the updated first machine learning model to generate third output data.

6. The computer-implemented method of claim 5, further comprising: receiving third data representing a confidence that the second output data represents an accurate transcript of the utterance; and determining that the third data satisfies a condition, wherein determining to include the second output data in the first data is additionally based on determining that the third data satisfies the condition.

7. The computer-implemented method of claim 5, further comprising: processing a third labeled dataset using the third machine learning model to determine a first performance metric; determining a fourth machine learning model using the third machine learning model and the first model update data, a fourth machine learning model representing an update of the third machine learning model; processing the third labeled dataset using the fourth machine learning model to determine a second performance metric; and determining, using the first performance metric and the second performance metric, to cause the user device to generate an updated first machine learning model using the first model update data.

8. The computer-implemented method of claim 5, wherein the second data is determined using first training parameters, the method further comprising: determining a fourth machine learning model using the third machine learning model and the second data; processing a third labeled dataset using the fourth machine learning model to determine a first performance metric; determining, using the first data a…
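Claims 3 and 4 gate the update on word error rate (WER): claim 3 compares the duplicate model's WER on a third labeled dataset with the WER after applying the update, and claim 4 compares updates produced with different training parameters. The sketch below is a hypothetical illustration only. It assumes the decoded transcripts of each model are already available as plain strings keyed by an invented utterance ID (standing in for actually running the fourth and fifth models), and it folds both comparisons into one illustrative `select_update` helper. WER is computed here as word-level edit distance divided by the number of reference words.

```python
"""WER-gated selection of a model update (hypothetical data throughout)."""
from typing import Dict, List, Optional


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution or match
        prev = cur
    return prev[-1]


def corpus_wer(refs: Dict[str, str], hyps: Dict[str, str]) -> float:
    """Total word edits divided by total reference words."""
    edits = sum(edit_distance(refs[k].split(), hyps[k].split()) for k in refs)
    words = sum(len(refs[k].split()) for k in refs)
    return edits / words


def select_update(refs: Dict[str, str],
                  baseline_hyps: Dict[str, str],
                  candidates: Dict[str, Dict[str, str]]) -> Optional[str]:
    """Return the lowest-WER candidate if it beats the baseline, else None."""
    best_name, best_wer = None, corpus_wer(refs, baseline_hyps)
    for name, hyps in candidates.items():
        wer = corpus_wer(refs, hyps)
        if wer < best_wer:
            best_name, best_wer = name, wer
    return best_name


# Hypothetical reference transcripts (the "third labeled dataset").
refs = {"u1": "turn on the kitchen lights", "u2": "play some jazz"}
# Hypothetical outputs of the current duplicate model and two updated models.
baseline = {"u1": "turn on the kitten lights", "u2": "play sum jazz"}
candidates = {
    "params_a": {"u1": "turn on the kitchen lights", "u2": "play sum jazz"},
    "params_b": {"u1": "turn the kitchen light", "u2": "play some jazz"},
}
choice = select_update(refs, baseline, candidates)  # → "params_a"
```

A `None` result means no candidate beat the current model, so nothing would be sent to the device. Claim 3 only requires the update to beat the current model and claim 4 only compares two candidates; requiring both here is a design choice of the sketch.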

Assignees

Amazon Tech Inc

Inventors

Classifications

  • Non-supervised learning, e.g. competitive learning · CPC title

  • Learning methods · CPC title

  • Recurrent networks, e.g. Hopfield networks · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12488798B1 cover?
It covers a continuous-learning scheme in which a server-side “teacher” model labels data received from user or edge devices, the labeled data is used to compute gradient updates for a “student” copy of the on-device model, and the update is validated before being distributed to the devices. The independent claims apply this to on-device automatic speech recognition (ASR), and some dependent claims gate the update on word error rate.
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G06N3/045. Mapped technology areas include Physics.
When was this patent published?
It was published on Dec 2, 2025, as a B1 document (a granted patent with no prior pre-grant application publication). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).