Framework for focused training of language models and techniques for end-to-end hypertuning of the framework

US2025218428A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2025218428-A1
Application number: US-202519085675-A
Country: US
Kind code: A1
Filing date: Mar 20, 2025
Priority date: Sep 24, 2021
Publication date: Jul 3, 2025
Grant date: (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques are disclosed herein for focused training of language models and end-to-end hypertuning of the framework. In one aspect, a method is provided that includes obtaining a machine learning model pre-trained for language modeling, and post-training the machine learning model for various tasks to generate a focused machine learning model. The post-training includes: (i) training the machine learning model on an unlabeled set of training data pertaining to a task that the machine learning model was pre-trained for as part of the language modeling, and the unlabeled set of training data is obtained with respect to a target domain, a target task, or a target language, and (ii) training the machine learning model on a labeled set of training data that pertains to another task that is an auxiliary task related to a downstream task to be performed using the machine learning model or output from the machine learning model.

First claim

Opening claim text (preview).

What is claimed is:

1. A computer implemented method, comprising:
  obtaining a machine learning model pre-trained for language modeling;
  performing an iterative hypertuning process comprising:
    (a) selecting one or more original auxiliary tasks from a pool of auxiliary tasks based on one or more relationships of the one or more original auxiliary tasks to a downstream task,
    (b) assigning hyperparameters to the machine learning model,
    (c) post-training the machine learning model for the one or more auxiliary tasks using labeled data associated with the one or more auxiliary tasks and the assigned hyperparameters, wherein the post-training comprises performing iterative training operations to optimize model parameters of the machine learning model and generate a focused machine learning model,
    (d) obtaining, using the focused machine learning model, output associated with performance of the one or more auxiliary tasks, the downstream task, or both,
    (e) determining a performance metric based on the output, and
    (f) performing (a)-(e) based on the performance metric to optimize selecting the one or more auxiliary tasks and assigning the hyperparameters, wherein (a)-(e) are repeated through n number of iterations until an optimal combination of the one or more auxiliary tasks and the hyperparameters are found to solve an optimization or search problem; and
  providing the focused machine learning model comprising the optimized model parameters.

2. The computer implemented method of claim 1, wherein the one or more original auxiliary tasks are selected based on data that indicates the one or more relationships of the one or more original auxiliary tasks to the downstream task, which is indicative that when the machine learning model is focused on the one or more original auxiliary tasks the performance metric will be improved.

3. The computer implemented method of claim 1, wherein the assigning the hyperparameters comprises defining a command-line argument in a training service for each of the hyperparameters to be tuned, and using a value passed in the command-line argument to set the corresponding hyperparameter in code of a training application.

4. The computer implemented method of claim 3, wherein:
  the post-training is configured with hyperparameter tuning, and each of the hyperparameters to be tuned, type of each of the hyperparameters, and the range of values to try for the optimization are defined,
  the post-training is performed by the training service executing the training application,
  each of the hyperparameters are identified using a same name as a corresponding argument defined in the training service, and
  the training service includes the command-line arguments using the names when the training service executes the training application for post-training the machine learning model.

5. The computer implemented method of claim 1, wherein the iterative hypertuning process further comprises:
    (d) obtaining, using the focused machine learning model, output associated with performance of the one or more auxiliary tasks,
    (d.1) obtaining, using a separate machine learning model, output associated with performance of the downstream task based on the output of the focused machine learning model,
    (e) determining the performance metric based on the output of the focused machine learning model,
    (e.1) determining another performance metric based on the output of the separate machine learning model, and
    (f) performing (a)-(e.1) based on the performance metric, the another performance metric, or a combination thereof to optimize selecting the one or more auxiliary tasks and assigning the hyperparameters, wherein (a)-(e.1) are repeated through n number of iterations until the optimal combination of the one or more auxiliary tasks and the hyperparameters are found to solve the optimization or search problem.

6. The computer implemented method of claim 5, wherein optimizing the selecting the one or more auxiliary tasks and assigning the hyperparameters is performed using a tuning algorithm to search and identify a best combination of hyperparameters including the one or more auxiliary tasks to solve the optimization or search problem.

7. The computer implemented method of claim 6, wherein the tuning algorithm executes a search strategy that includes grid search or random search to search and identify the best combination of hyperparameters.

8. A system comprising:
  one or more data processors; and
  one or more non-transitory computer readable media storing instructions which, when executed by the one or more data processors, cause the one or more data processors to perform processing comprising:
  obtaining a machine learning model pre-trained for language modeling;
  performing an iterative hypertuning process comprising:
    (a) selecting one or more original auxiliary tasks from a pool of auxiliary tasks based on one or more relationships of the one or more original auxiliary tasks to a downstream task,
    (b) assigning hyperparameters to the machine learning model,
    (c) post-training the machine learning model for the one or more auxiliary tasks using labeled data associated with the one or more auxiliary tasks and the assigned hyperparameters, wherein the post-training comprises performing iterative training operations to optimize model parameters of the machine learning model and generate a focused machine learning model,
    (d) obtaining, using the focused machine learning model, output associated with performance of the one or more auxiliary tasks, the downstream task, or both,
    (e) determining a performance metric based on the output, and
    (f) performing (a)-(e) based on the performance metric to optimize selecting the one or more auxiliary tasks and assigning the hyperparameters, wherein (a)-(e) are repeated through n number of iterations until an optimal combination of the one or more auxiliary tasks and the hyperparameters are found to solve an optimization or search problem; and
  providing the focused machine learning model comprising the optimized model parameters.

9. The system of claim 8, wherein the one or more original auxiliary tasks are selected based on data that indicates the one or more relationships of the one or more original auxiliary tasks to the downstream task, which is indicative that when the machine learning model is focused on the one or more original auxiliary tasks the performance metric will be improved.

10. The system of claim 8, wherein the assigning the hyperparameters comprises defining a command-line argument in a training service for each of the hyperparameters to be tuned, and using a value passed in the command-line argument to set the corresponding hyperparameter in code of a training application.

11. The system of claim 10, wherein:
  the post-training is configured with hyperparameter tuning, and each of the hyperparameters to be tuned, type of each of the hyperparameters, and the range of values to try for the optimization are defined,
  the post-training is performed by the training service executing the training application,
  each of the hyperparameters are identified using a same name as a corresponding argument defined in the training service, and
  the training service includes the command-line arguments using the names when the training service executes the training application for post-training the machine learning model.

12. The system of claim 8, wherein the iterative hypertuning process further comprises:
    (d) obtaining, using the focused machine learning model, output associated with performance of the one or more auxiliary tasks,
    (d.1) obtaining, using a separate machine lea…
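
Illustrative sketch (not part of the patent text). The loop in claim 1, steps (a)-(f), together with the command-line convention in claims 3 and 4 and the random search option in claim 7, might look roughly like the Python below. Everything named here is an assumption made for illustration: the task pool, the search space, the argument names, and especially `post_train_and_score`, which is a toy stand-in that returns a made-up score instead of post-training and evaluating a real model. Auxiliary task selection is simply sampled at random, treating it as one more searched hyperparameter in the spirit of claim 6; the claims describe selection based on relationships to the downstream task, which this sketch does not model.

```python
import argparse
import random

# Hypothetical auxiliary task pool and search space (illustrative names only).
AUX_TASK_POOL = ["ner", "pos_tagging", "sentiment", "paraphrase"]
SEARCH_SPACE = {
    "learning_rate": [1e-5, 3e-5, 1e-4],
    "batch_size": [16, 32],
}


def parse_hparams(argv):
    """Training-application side: read each tuned value from a same-named argument."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--learning_rate", type=float, required=True)
    parser.add_argument("--batch_size", type=int, required=True)
    parser.add_argument("--aux_tasks", nargs="+", required=True)
    return parser.parse_args(argv)


def build_argv(trial):
    """Training-service side: emit command-line arguments using the same names."""
    return [
        "--learning_rate", str(trial["learning_rate"]),
        "--batch_size", str(trial["batch_size"]),
        "--aux_tasks", *trial["aux_tasks"],
    ]


def sample_trial(rng):
    """Steps (a) and (b): sample auxiliary tasks and hyperparameter values."""
    trial = {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}
    k = rng.randint(1, len(AUX_TASK_POOL))
    trial["aux_tasks"] = sorted(rng.sample(AUX_TASK_POOL, k))
    return trial


def post_train_and_score(args):
    """Toy stand-in for steps (c)-(e); the score formula is invented."""
    score = 0.5
    score += 0.1 * ("ner" in args.aux_tasks)
    score += 0.05 * ("paraphrase" in args.aux_tasks)
    score -= 0.02 * len(args.aux_tasks)
    score -= abs(args.learning_rate - 3e-5) * 1000
    return score


def hypertune(n_iterations, seed=0):
    """Step (f): repeat (a)-(e) for n iterations and keep the best combination."""
    rng = random.Random(seed)
    best_trial, best_score = None, float("-inf")
    for _ in range(n_iterations):
        trial = sample_trial(rng)
        # Round-trip through the command line, as a training service would.
        args = parse_hparams(build_argv(trial))
        score = post_train_and_score(args)
        if score > best_score:
            best_trial, best_score = trial, score
    return best_trial, best_score


if __name__ == "__main__":
    best, score = hypertune(n_iterations=20)
    print(best, round(score, 4))
```

In a real setup the call to `parse_hparams(build_argv(trial))` would be replaced by the training service launching the training application as a separate process; the in-process round trip is used here only to keep the sketch self-contained.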

Assignees

Oracle Int Corp

Inventors

Classifications

CPC titles for this publication:

  • using context dependencies, e.g. language models

  • updating or merging of old and new templates; Mean values; Weighting

  • Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound

  • using kernel methods, e.g. support vector machines [SVM]

  • Ensemble learning

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2025218428A1 cover?
Techniques are disclosed herein for focused training of language models and end-to-end hypertuning of the framework. In one aspect, a method is provided that includes obtaining a machine learning model pre-trained for language modeling, and post-training the machine learning model for various tasks to generate a focused machine learning model. The post-training includes: (i) training the machin…
Who is the assignee on this patent?
Oracle Int Corp
What technology area does this patent fall under?
Primary CPC classification G10L15/063. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jul 3, 2025 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).