US11803752B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11803752-B2 |
| Application number | US-202117165509-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 2, 2021 |
| Priority date | Dec 13, 2018 |
| Publication date | Oct 31, 2023 |
| Grant date | Oct 31, 2023 |
Official abstract text for this publication.
Implementations of the present specification provide a model-based prediction method and apparatus. The method includes: a model running environment receives an input tensor of a machine learning model; the model running environment sends a table query request to an embedding running environment, the table query request including the input tensor, to request low-dimensional conversion of the input tensor; the model running environment receives a table query result returned by the embedding running environment, the table query result being obtained by the embedding running environment by performing embedding query and processing based on the input tensor; and the model running environment inputs the table query result into the machine learning model, and runs the machine learning model to complete model-based prediction.
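The four steps in the abstract (receive the input tensor, send a table query request, receive the table query result, run the model) can be sketched as below. This is a minimal in-process illustration, not the patented implementation. The class names, the `handle_table_query` and `predict` methods, the toy embedding table, and `toy_model` are all hypothetical. In a real deployment the two environments would sit on separate computing nodes and the call between them would be a network request.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class EmbeddingRunningEnvironment:
    """Hypothetical stand-in for an embedding running environment."""
    table: Dict[int, List[float]]  # toy embedding table: sparse ID -> vector

    def handle_table_query(self, input_tensor: Sequence[int]) -> List[List[float]]:
        # Embedding query: map each ID in the input tensor to its
        # low-dimensional vector and return the result to the caller.
        return [self.table[i] for i in input_tensor]


@dataclass
class ModelRunningEnvironment:
    """Hypothetical stand-in for a model running environment."""
    model: Callable[[List[List[float]]], float]
    embedding_env: EmbeddingRunningEnvironment

    def predict(self, input_tensor: Sequence[int]) -> float:
        # Receive the input tensor and send it in a table query request.
        table_query_result = self.embedding_env.handle_table_query(input_tensor)
        # Receive the table query result and run the model on it.
        return self.model(table_query_result)


def toy_model(vectors: List[List[float]]) -> float:
    # Placeholder model (assumption): sums every component of every vector.
    return sum(sum(v) for v in vectors)


embedding_env = EmbeddingRunningEnvironment(
    table={0: [0.5, 0.25], 1: [1.0, 2.0], 2: [0.0, -1.0]}
)
model_env = ModelRunningEnvironment(model=toy_model, embedding_env=embedding_env)
prediction = model_env.predict([0, 2])
```

Because the model running environment only holds a reference to the embedding environment, the embedding table never has to fit in the same memory as the model, which is the motivation the claims give for the split.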
Opening claim text (preview).
What is claimed is:

1. A model-based prediction method performed by a machine learning system, the method comprising:
receiving, by a model running environment of a plurality of model running environments, an input tensor of a machine learning model including at least one of a Wide & Deep model or a Deep Factorization Machine model, wherein the model running environment accommodates the machine learning model on a first computing node of a plurality of computing nodes, and wherein another model running environment of the plurality of model running environments accommodates another machine learning model that is different than the machine learning model on a second computing node of the plurality of computing nodes;
sending, by the model running environment, a table query request to an embedding model deployed in an embedding running environment of a plurality of embedding running environments, the table query request including the input tensor, to request low-dimensional conversion of the input tensor, wherein a memory of the first computing node is insufficient to accommodate both the machine learning model and the embedding model, the embedding running environment accommodates the embedding model on a third computing node of the plurality of computing nodes, and another embedding running environment of the plurality of embedding environments accommodates a copy of the embedding model on a fourth computing node of the plurality of computing nodes to be concurrently accessible to the other machine learning model;
obtaining, by the embedding running environment, a plurality of vectors based on the table query request and the input tensor;
generating a table query result based on the obtained plurality of vectors, wherein the table query result is generated, at least in part, by combining the obtained plurality of vectors into a single vector;
receiving, by the model running environment, the table query result returned by the embedding running environment, the table query result being obtained by the embedding running environment by querying an embedding table for the low-dimensional conversion that is associated with the machine learning model based on the input tensor; and
inputting, by the model running environment, the table query result into the machine learning model, and executing the machine learning model to complete model-based prediction.

2. The method according to claim 1, wherein the model running environment is a physical execution unit or a virtual execution unit, and the embedding running environment is a physical execution unit or a virtual execution unit.

3. The method according to claim 1, wherein each embedding running environment implements a single embedding model, and each model running environment implements a single machine learning model.

4. A machine learning method performed by a machine learning system executing on a plurality of running environments including a plurality of model running environments and a plurality of embedding running environments, the method comprising:
receiving, by a model running environment of the plurality of model running environments, an input to a machine learning model implemented on the model running environment, wherein the model running environment accommodates the machine learning model on a first computing node of a plurality of computing nodes, and wherein another model running environment of the plurality of model running environments accommodates another machine learning model on a second computing node of the plurality of computing nodes;
sending, from the model running environment to an embedding model deployed in an embedding running environment of the plurality of embedding environments, a request for converting the input, wherein a memory of the first computing node is insufficient to accommodate both the machine learning model and the embedding model, the embedding running environment accommodates the embedding model on a third computing node of the plurality of computing nodes, and another embedding running environment of the plurality of embedding environments accommodates a copy of the embedding model on a fourth computing node of the plurality of computing nodes to be concurrently accessible to the other machine learning model;
receiving, by the model running environment, a result returned from the embedding running environment, the result including a low-dimensional representation of the input; and
feeding, by the model running environment, the low-dimensional representation into the machine learning model to perform model-based prediction.

5. The method according to claim 4, wherein the sending, from the model running environment to the embedding running environment, a request for converting the input includes:
sending a local request for converting the input, wherein the embedding running environment and the model running environment are located on a same physical node; or
sending a remote request for converting the input, wherein the embedding running environment and the model running environment are located on different physical nodes.

6. The method according to claim 4, wherein different hardware resources are configured for different model running environments, the hardware resources being adapted to running requirements of machine learning models in the model running environments.

7. The method according to claim 6, wherein the hardware resources each include at least one of a central processing unit or a hardware accelerator.

8. The method of claim 7, wherein the hardware accelerator includes at least one of a field-programmable gate array or an application-specific integrated circuit chip designed for a specific purpose.

9. The method according to claim 4, wherein the machine learning model includes at least one of a Wide & Deep model or a Deep Factorization Machine model.

10. A machine learning system comprising a plurality of embedding running environments and a plurality of model running environments:
wherein a model running environment of the plurality of model running environments is configured to receive an input for a machine learning model, send a table query request including the input to an embedding model deployed in an embedding running environment of the plurality of embedding running environments, receive a response including a low-dimensional converted value of the input from the embedding running environment, and feed the low-dimensional converted value into the machine learning model to execute model-based prediction;
wherein the model running environment accommodates the machine learning model on a first computing node of a plurality of computing nodes, another model running environment of the plurality of model running environments accommodates another machine learning model on a second computing node of the plurality of computing nodes; and
wherein the embedding running environment is configured to perform embedding query based on the input to obtain the low-dimensional converted value by obtaining a plurality of vectors based on the input and generating the response based, at least in part, on combining the plurality of vectors into a single vector, and send the response including the low-dimensional converted value back to the model running environment; and
wherein a memory of the first computing node is insufficient to accommodate both the machine learning model and the embedding model, the embedding running environment accommodates the embedding model on a third computing node of the plurality of computing nodes, and another embedding running environment of the plurality of embedding environments accommodates a copy of the embedding model on a fourth computing node of the plurality of computing nodes to
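Claims 1 and 10 recite combining the looked-up vectors into a single vector but, in the text shown here, do not say how. The sketch below illustrates two common ways this might be done: an element-wise sum and a concatenation. Both choices are assumptions, as are the function names and the toy table; the claims preview does not confirm either as the patented method.

```python
from typing import Dict, List, Sequence


def lookup_vectors(table: Dict[int, List[float]], ids: Sequence[int]) -> List[List[float]]:
    # Obtain a plurality of vectors: one embedding vector per input ID.
    return [table[i] for i in ids]


def combine_sum(vectors: List[List[float]]) -> List[float]:
    # Assumed combiner 1: element-wise sum, output keeps the embedding width.
    return [sum(components) for components in zip(*vectors)]


def combine_concat(vectors: List[List[float]]) -> List[float]:
    # Assumed combiner 2: concatenation, output width grows with the ID count.
    return [x for v in vectors for x in v]


toy_table = {1: [1.0, 2.0], 2: [3.0, 4.0]}  # hypothetical embedding table
vectors = lookup_vectors(toy_table, [1, 2])
summed = combine_sum(vectors)
concatenated = combine_concat(vectors)
```

Summation gives a fixed-width result regardless of how many IDs the input contains, while concatenation preserves each vector separately at the cost of a variable-width result. Which trade-off fits depends on what the downstream model expects as input.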
Feedforward networks · CPC title
Learning methods · CPC title
Architecture, e.g. interconnection topology · CPC title
Machine learning · CPC title
using neural networks only · CPC title