Separate deployment of machine learning model and associated embedding

US11803752B2 · US · B2

Patent metadata
Publication number: US-11803752-B2
Application number: US-202117165509-A
Country: US
Kind code: B2
Filing date: Feb 2, 2021
Priority date: Dec 13, 2018
Publication date: Oct 31, 2023
Grant date: Oct 31, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Implementations of the present specification provide a model-based prediction method and apparatus. The method includes: a model running environment receives an input tensor of a machine learning model; the model running environment sends a table query request to an embedding running environment, the table query request including the input tensor, to request low-dimensional conversion of the input tensor; the model running environment receives a table query result returned by the embedding running environment, the table query result being obtained by the embedding running environment by performing embedding query and processing based on the input tensor; and the model running environment inputs the table query result into the machine learning model, and runs the machine learning model to complete model-based prediction.
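The prediction flow in the abstract can be sketched in a few lines of Python. This is a minimal, in-process sketch under stated assumptions, not the patented implementation. The class names `EmbeddingRunningEnvironment` and `ModelRunningEnvironment` are hypothetical. The table query is modeled as a plain method call rather than a network request. The embedding table is a dictionary from feature ids to vectors. The looked-up vectors are combined by sum pooling, although the claims say only that they are combined into a single vector. A toy linear scorer stands in for a Wide & Deep or DeepFM model.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

Vector = List[float]


@dataclass
class EmbeddingRunningEnvironment:
    """Hypothetical embedding running environment holding an embedding table."""
    table: Dict[int, Vector]  # feature id -> dense vector (assumed layout)
    dim: int

    def table_query(self, input_tensor: Sequence[int]) -> Vector:
        # Look up one vector per feature id; unknown ids map to zeros (assumption).
        vectors = [self.table.get(i, [0.0] * self.dim) for i in input_tensor]
        if not vectors:
            return [0.0] * self.dim
        # Combine the vectors into a single vector. Sum pooling is an
        # assumption; the claims only say the vectors are combined.
        return [sum(column) for column in zip(*vectors)]


@dataclass
class ModelRunningEnvironment:
    """Hypothetical model running environment hosting a toy linear model."""
    weights: Vector
    bias: float
    embedding_env: EmbeddingRunningEnvironment

    def predict(self, input_tensor: Sequence[int]) -> float:
        # Send a table query request carrying the input tensor. In a real
        # deployment this would be a local or remote call, not a method call.
        table_query_result = self.embedding_env.table_query(input_tensor)
        # Feed the low-dimensional result into the model and run it.
        return sum(w * x for w, x in zip(self.weights, table_query_result)) + self.bias


embedding_env = EmbeddingRunningEnvironment(
    table={101: [0.5, 1.0], 202: [1.5, -1.0]}, dim=2
)
model_env = ModelRunningEnvironment(
    weights=[2.0, 3.0], bias=0.25, embedding_env=embedding_env
)
print(model_env.predict([101, 202]))  # 4.25
```

The model environment holds only the model weights and a handle to the embedding environment. As a result, the embedding table, which the claims describe as too large to share a node's memory with the model, can live on a different node. Replacing the method call with a network request leaves the control flow unchanged.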

First claim

Opening claim text (preview).

What is claimed is:

1. A model-based prediction method performed by a machine learning system, the method comprising: receiving, by a model running environment of a plurality of model running environments, an input tensor of a machine learning model including at least one of a Wide & Deep model or a Deep Factorization Machine model, wherein the model running environment accommodates the machine learning model on a first computing node of a plurality of computing nodes, and wherein another model running environment of the plurality of model running environments accommodates another machine learning model that is different than the machine learning model on a second computing node of the plurality of computing nodes; sending, by the model running environment, a table query request to an embedding model deployed in an embedding running environment of a plurality of embedding running environments, the table query request including the input tensor, to request low-dimensional conversion of the input tensor, wherein a memory of the first computing node is insufficient to accommodate both the machine learning model and the embedding model, the embedding running environment accommodates the embedding model on a third computing node of the plurality of computing nodes, and another embedding running environment of the plurality of embedding environments accommodates a copy of the embedding model on a fourth computing node of the plurality of computing nodes to be concurrently accessible to the other machine learning model; obtaining, by the embedding running environment, a plurality of vectors based on the table query request and the input tensor; generating a table query result based on the obtained plurality of vectors, wherein the table query result is generated, at least in part, by combining the obtained plurality of vectors into a single vector; receiving, by the model running environment, the table query result returned by the embedding running environment, the table query result being obtained by the embedding running environment by querying an embedding table for the low-dimensional conversion that is associated with the machine learning model based on the input tensor; and inputting, by the model running environment, the table query result into the machine learning model, and executing the machine learning model to complete model-based prediction.

2. The method according to claim 1, wherein the model running environment is a physical execution unit or a virtual execution unit, and the embedding running environment is a physical execution unit or a virtual execution unit.

3. The method according to claim 1, wherein each embedding running environment implements a single embedding model, and each model running environment implements a single machine learning model.

4. A machine learning method performed by a machine learning system executing on a plurality of running environments including a plurality of model running environments and a plurality of embedding running environments, the method comprising: receiving, by a model running environment of the plurality of model running environments, an input to a machine learning model implemented on the model running environment, wherein the model running environment accommodates the machine learning model on a first computing node of a plurality of computing nodes, and wherein another model running environment of the plurality of model running environments accommodates another machine learning model on a second computing node of the plurality of computing nodes; sending, from the model running environment to an embedding model deployed in an embedding running environment of the plurality of embedding environments, a request for converting the input, wherein a memory of the first computing node is insufficient to accommodate both the machine learning model and the embedding model, the embedding running environment accommodates the embedding model on a third computing node of the plurality of computing nodes, and another embedding running environment of the plurality of embedding environments accommodates a copy of the embedding model on a fourth computing node of the plurality of computing nodes to be concurrently accessible to the other machine learning model; receiving, by the model running environment, a result returned from the embedding running environment, the result including a low-dimensional representation of the input; and feeding, by the model running environment, the low-dimensional representation into the machine learning model to perform model-based prediction.

5. The method according to claim 4, wherein the sending, from the model running environment to the embedding running environment, a request for converting the input includes: sending a local request for converting the input, wherein the embedding running environment and the model running environment are located on a same physical node; or sending a remote request for converting the input, wherein the embedding running environment and the model running environment are located on different physical nodes.

6. The method according to claim 4, wherein different hardware resources are configured for different model running environments, the hardware resources being adapted to running requirements of machine learning models in the model running environments.

7. The method according to claim 6, wherein the hardware resources each include at least one of a central processing unit or a hardware accelerator.

8. The method of claim 7, wherein the hardware accelerator includes at least one of a field-programmable gate array or an application-specific integrated circuit chip designed for a specific purpose.

9. The method according to claim 4, wherein the machine learning model includes at least one of a Wide & Deep model or a Deep Factorization Machine model.

10. A machine learning system comprising a plurality of embedding running environments and a plurality of model running environments: wherein a model running environment of the plurality of model running environments is configured to receive an input for a machine learning model, send a table query request including the input to an embedding model deployed in an embedding running environment of the plurality of embedding running environments, receive a response including a low-dimensional converted value of the input from the embedding running environment, and feed the low-dimensional converted value into the machine learning model to execute model-based prediction; wherein the model running environment accommodates the machine learning model on a first computing node of a plurality of computing nodes, another model running environment of the plurality of model running environments accommodates another machine learning model on a second computing node of the plurality of computing nodes; and wherein the embedding running environment is configured to perform embedding query based on the input to obtain the low-dimensional converted value by obtaining a plurality of vectors based on the input and generating the response based, at least in part, on combining the plurality of vectors into a single vector, and send the response including the low-dimensional converted value back to the model running environment; and wherein a memory of the first computing node is insufficient to accommodate both the machine learning model and the embedding model, the embedding running environment accommodates the embedding model on a third computing node of the plurality of computing nodes, and another embedding running environment of the plurality of embedding environments accommodates a copy of the embedding model on a fourth computing node of the plurality of computing nodes to
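Claims 1, 4, and 5 add two deployment details. First, the embedding model runs on a different node when one node's memory cannot hold both it and the machine learning model. Second, copies of the embedding model on other nodes can serve other model running environments concurrently, and each request is local or remote depending on whether the two environments share a physical node. The sketch below illustrates this with hypothetical `Node`, `EmbeddingReplica`, and `EmbeddingRouter` types. The prefer-local rule and the round-robin choice among remote replicas are assumptions for illustration; the claims do not specify how a replica is chosen.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import List, Tuple


@dataclass(frozen=True)
class Node:
    """A computing node (hypothetical type); memory is in megabytes."""
    node_id: str
    memory_mb: int


@dataclass(frozen=True)
class EmbeddingReplica:
    """One copy of the same embedding model, hosted on one node."""
    replica_id: str
    node: Node


def fits_together(node: Node, model_mb: int, embedding_mb: int) -> bool:
    # The claims cover the case where a node's memory cannot hold both the
    # model and the embedding model, so the two must run on separate nodes.
    return model_mb + embedding_mb <= node.memory_mb


class EmbeddingRouter:
    """Hypothetical router from a model running environment to embedding replicas."""

    def __init__(self, replicas: List[EmbeddingReplica]) -> None:
        self._replicas = replicas
        self._round_robin = cycle(replicas)

    def route(self, model_node: Node) -> Tuple[str, str]:
        # Assumption: prefer a replica on the same physical node (local request).
        for replica in self._replicas:
            if replica.node == model_node:
                return ("local", replica.replica_id)
        # Assumption: otherwise spread remote requests over replicas round-robin.
        return ("remote", next(self._round_robin).replica_id)


node1 = Node("node-1", memory_mb=8_000)   # hosts one machine learning model
node2 = Node("node-2", memory_mb=8_000)   # hosts another machine learning model
node3 = Node("node-3", memory_mb=32_000)  # hosts the embedding model
node4 = Node("node-4", memory_mb=32_000)  # hosts a copy of the embedding model

print(fits_together(node1, model_mb=2_000, embedding_mb=20_000))  # False
router = EmbeddingRouter(
    [EmbeddingReplica("emb-a", node3), EmbeddingReplica("emb-b", node4)]
)
print(router.route(node1))  # ('remote', 'emb-a')
print(router.route(node2))  # ('remote', 'emb-b')
print(router.route(node3))  # ('local', 'emb-a')
```

With this placement, the model running environments on `node-1` and `node-2` each reach a different copy of the embedding model. This mirrors the third and fourth computing nodes in the claims.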

Assignees

Advanced New Technologies Co Ltd

Inventors

Classifications

  • Feedforward networks (CPC title)

  • G06N3/08 (Primary) · Learning methods

  • Architecture, e.g. interconnection topology (CPC title)

  • G06N20/00 (Primary) · Machine learning

  • G05B13/027 (Primary) · Using neural networks only

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11803752B2 cover?
It covers a model-based prediction method and apparatus in which a machine learning model and its embedding run in separate environments. The model running environment receives an input tensor and sends it in a table query request to an embedding running environment. It then receives the low-dimensional table query result and feeds that result into the model to complete the prediction.
Who is the assignee on this patent?
Advanced New Technologies Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06N3/08. Mapped technology areas include Physics.
When was this patent published?
Publication date: Oct 31, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).