US2025218428A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2025218428-A1 |
| Application number | US-202519085675-A |
| Country | US |
| Kind code | A1 |
| Filing date | Mar 20, 2025 |
| Priority date | Sep 24, 2021 |
| Publication date | Jul 3, 2025 |
| Grant date | — |
Official abstract text for this publication.
Techniques are disclosed herein for focused training of language models and end-to-end hypertuning of the framework. In one aspect, a method is provided that includes obtaining a machine learning model pre-trained for language modeling, and post-training the machine learning model for various tasks to generate a focused machine learning model. The post-training includes: (i) training the machine learning model on an unlabeled set of training data pertaining to a task that the machine learning model was pre-trained for as part of the language modeling, and the unlabeled set of training data is obtained with respect to a target domain, a target task, or a target language, and (ii) training the machine learning model on a labeled set of training data that pertains to another task that is an auxiliary task related to a downstream task to be performed using the machine learning model or output from the machine learning model.
Opening claim text (preview).
What is claimed is:

1. A computer implemented method, comprising: obtaining a machine learning model pre-trained for language modeling; performing an iterative hypertuning process comprising: (a) selecting one or more original auxiliary tasks from a pool of auxiliary tasks based on one or more relationships of the one or more original auxiliary tasks to a downstream task, (b) assigning hyperparameters to the machine learning model, (c) post-training the machine learning model for the one or more auxiliary tasks using labeled data associated with the one or more auxiliary tasks and the assigned hyperparameters, wherein the post-training comprises performing iterative training operations to optimize model parameters of the machine learning model and generate a focused machine learning model, (d) obtaining, using the focused machine learning model, output associated with performance of the one or more auxiliary tasks, the downstream task, or both, (e) determining a performance metric based on the output, and (f) performing (a)-(e) based on the performance metric to optimize selecting the one or more auxiliary tasks and assigning the hyperparameters, wherein (a)-(e) are repeated through n number of iterations until an optimal combination of the one or more auxiliary tasks and the hyperparameters are found to solve an optimization or search problem; and providing the focused machine learning model comprising the optimized model parameters.

2. The computer implemented method of claim 1, wherein the one or more original auxiliary tasks are selected based on data that indicates the one or more relationships of the one or more original auxiliary tasks to the downstream task, which is indicative that when the machine learning model is focused on the one or more original auxiliary tasks the performance metric will be improved.

3. The computer implemented method of claim 1, wherein the assigning the hyperparameters comprises defining a command-line argument in a training service for each of the hyperparameters to be tuned, and using a value passed in the command-line argument to set the corresponding hyperparameter in code of a training application.

4. The computer implemented method of claim 3, wherein: the post-training is configured with hyperparameter tuning, and each of the hyperparameters to be tuned, type of each of the hyperparameters, and the range of values to try for the optimization are defined, the post-training is performed by the training service executing the training application, each of the hyperparameters are identified using a same name as a corresponding argument defined in the training service, and the training service includes the command-line arguments using the names when the training service executes the training application for post-training the machine learning model.

5. The computer implemented method of claim 1, wherein the iterative hypertuning process further comprises: (d) obtaining, using the focused machine learning model, output associated with performance of the one or more auxiliary tasks, (d.1) obtaining, using a separate machine learning model, output associated with performance of the downstream task based on the output of the focused machine learning model, (e) determining the performance metric based on the output of the focused machine learning model, (e.1) determining another performance metric based on the output of the separate machine learning model, and (f) performing (a)-(e.1) based on the performance metric, the another performance metric, or a combination thereof to optimize selecting the one or more auxiliary tasks and assigning the hyperparameters, wherein (a)-(e.1) are repeated through n number of iterations until the optimal combination of the one or more auxiliary tasks and the hyperparameters are found to solve the optimization or search problem.

6. The computer implemented method of claim 5, wherein optimizing the selecting the one or more auxiliary tasks and assigning the hyperparameters is performed using a tuning algorithm to search and identify a best combination of hyperparameters including the one or more auxiliary tasks to solve the optimization or search problem.

7. The computer implemented method of claim 6, wherein the tuning algorithm executes a search strategy that includes grid search or random search to search and identify the best combination of hyperparameters.

8. A system comprising: one or more data processors; and one or more non-transitory computer readable media storing instructions which, when executed by the one or more data processors, cause the one or more data processors to perform processing comprising: obtaining a machine learning model pre-trained for language modeling; performing an iterative hypertuning process comprising: (a) selecting one or more original auxiliary tasks from a pool of auxiliary tasks based on one or more relationships of the one or more original auxiliary tasks to a downstream task, (b) assigning hyperparameters to the machine learning model, (c) post-training the machine learning model for the one or more auxiliary tasks using labeled data associated with the one or more auxiliary tasks and the assigned hyperparameters, wherein the post-training comprises performing iterative training operations to optimize model parameters of the machine learning model and generate a focused machine learning model, (d) obtaining, using the focused machine learning model, output associated with performance of the one or more auxiliary tasks, the downstream task, or both, (e) determining a performance metric based on the output, and (f) performing (a)-(e) based on the performance metric to optimize selecting the one or more auxiliary tasks and assigning the hyperparameters, wherein (a)-(e) are repeated through n number of iterations until an optimal combination of the one or more auxiliary tasks and the hyperparameters are found to solve an optimization or search problem; and providing the focused machine learning model comprising the optimized model parameters.

9. The system of claim 8, wherein the one or more original auxiliary tasks are selected based on data that indicates the one or more relationships of the one or more original auxiliary tasks to the downstream task, which is indicative that when the machine learning model is focused on the one or more original auxiliary tasks the performance metric will be improved.

10. The system of claim 8, wherein the assigning the hyperparameters comprises defining a command-line argument in a training service for each of the hyperparameters to be tuned, and using a value passed in the command-line argument to set the corresponding hyperparameter in code of a training application.

11. The system of claim 10, wherein: the post-training is configured with hyperparameter tuning, and each of the hyperparameters to be tuned, type of each of the hyperparameters, and the range of values to try for the optimization are defined, the post-training is performed by the training service executing the training application, each of the hyperparameters are identified using a same name as a corresponding argument defined in the training service, and the training service includes the command-line arguments using the names when the training service executes the training application for post-training the machine learning model.

12. The system of claim 8, wherein the iterative hypertuning process further comprises: (d) obtaining, using the focused machine learning model, output associated with performance of the one or more auxiliary tasks, (d.1) obtaining, using a separate machine lea
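
The loop recited in claims 1, 3, 4 and 7 can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the applicant's implementation: the task names, the search space, the `training_app` stand-in and its synthetic score are all hypothetical, and step (a) here samples tasks at random rather than using relationship data as the claims describe. The only parts taken from the claims are the shape of steps (a)-(f), the use of random search, and the idea that the training service passes each hyperparameter as a command-line argument under the same name the training application declares.

```python
import argparse
import random

# Hypothetical pool of auxiliary tasks; the names are illustrative only.
AUX_TASK_POOL = ["ner", "pos_tagging", "sentiment", "paraphrase"]

# Hypothetical search space: hyperparameter name -> candidate values.
SEARCH_SPACE = {
    "learning_rate": [1e-5, 3e-5, 1e-4],
    "batch_size": [16, 32],
}


def training_app(argv):
    """Stand-in for the training application.

    Reads hyperparameters from command-line style arguments and returns a
    performance metric. The score below is synthetic (an assumption) so the
    loop is runnable; a real application would post-train and evaluate a model.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--learning_rate", type=float, required=True)
    parser.add_argument("--batch_size", type=int, required=True)
    parser.add_argument("--aux_tasks", nargs="+", required=True)
    args = parser.parse_args(argv)

    score = -abs(args.learning_rate - 3e-5) * 1e4
    score += 0.5 if "ner" in args.aux_tasks else 0.0
    score -= 0.1 * len(args.aux_tasks)
    score += 0.05 if args.batch_size == 32 else 0.0
    return score


def build_argv(aux_tasks, hyperparams):
    """Training-service side: emit each value under the same name that the
    training application declares as an argument."""
    argv = []
    for name, value in hyperparams.items():
        argv += [f"--{name}", str(value)]
    argv += ["--aux_tasks", *aux_tasks]
    return argv


def random_search(n_iterations, seed=0):
    """Repeat steps (a)-(e) n times and keep the best combination (f)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_iterations):
        # (a) select auxiliary tasks from the pool (random here, an assumption)
        k = rng.randint(1, len(AUX_TASK_POOL))
        aux_tasks = sorted(rng.sample(AUX_TASK_POOL, k))
        # (b) assign hyperparameters
        hyperparams = {n: rng.choice(v) for n, v in SEARCH_SPACE.items()}
        # (c)-(e) run the training application and read back the metric
        metric = training_app(build_argv(aux_tasks, hyperparams))
        # (f) keep the best combination seen so far
        if best is None or metric > best["metric"]:
            best = {"aux_tasks": aux_tasks,
                    "hyperparams": hyperparams,
                    "metric": metric}
    return best


if __name__ == "__main__":
    print(random_search(n_iterations=20))
```

Treating the auxiliary-task selection as one more searchable dimension alongside learning rate and batch size mirrors the wording of claim 6, which refers to a "best combination of hyperparameters including the one or more auxiliary tasks". Swapping `random_search` for an exhaustive loop over the same space would give the grid-search variant named in claim 7.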
using context dependencies, e.g. language models · CPC title
updating or merging of old and new templates; Mean values; Weighting · CPC title
Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound · CPC title
using kernel methods, e.g. support vector machines [SVM] · CPC title
Ensemble learning · CPC title