Error tolerant neural network model compression
US-10229356-B1 · Mar 12, 2019 · US
US2019318245A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2019318245-A1 |
| Application number | US-201916452290-A |
| Country | US |
| Kind code | A1 |
| Filing date | Jun 25, 2019 |
| Priority date | Dec 26, 2016 |
| Publication date | Oct 17, 2019 |
| Grant date | — |
Official abstract text for this publication.
This application provides a method, a terminal-side device, and a cloud-side device for data processing and a terminal-cloud collaboration system. The method includes: sending, by the terminal-side device, a request message to the cloud-side device; receiving, by the terminal-side device, a second neural network model that is obtained by compressing a first neural network model and that is sent by the cloud-side device, where the first neural network model is a neural network model on the cloud-side device that is used to process the cognitive computing task, and a hardware resource required when the second neural network model runs on the terminal-side device is within an available hardware resource capability range of the terminal-side device; and processing, by the terminal-side device, the cognitive computing task based on the second neural network model.
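The exchange in the abstract can be sketched as a small request/response loop. This is a minimal illustration, not the applicant's implementation: the names `Model`, `Request`, `CloudSide`, and `TerminalSide` are hypothetical, the units for compute and storage are invented, and the "compression" step is a stand-in that simply clamps the model to the budget reported by the terminal. Carrying the hardware budget inside the request is an assumption taken from the dependent claims rather than from the abstract itself.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Model:
    name: str
    compute_ops: int    # hypothetical unit: operations per inference
    storage_bytes: int  # hypothetical unit: bytes needed for parameters


@dataclass
class Request:
    task: str             # the cognitive computing task being requested
    available_ops: int    # assumed: terminal reports its compute budget
    available_bytes: int  # assumed: terminal reports its storage budget


class CloudSide:
    def __init__(self, models: Dict[str, Model]):
        self.models = models  # task name -> first (full-size) model

    def handle(self, request: Request) -> Model:
        first = self.models[request.task]
        # Stand-in for real compression: clamp the model to the terminal's budget.
        return Model(
            name=first.name + "-compressed",
            compute_ops=min(first.compute_ops, request.available_ops),
            storage_bytes=min(first.storage_bytes, request.available_bytes),
        )


class TerminalSide:
    def __init__(self, available_ops: int, available_bytes: int):
        self.available_ops = available_ops
        self.available_bytes = available_bytes
        self.model: Optional[Model] = None

    def fits(self, model: Model) -> bool:
        # The second model must run within the terminal's available resources.
        return (model.compute_ops <= self.available_ops
                and model.storage_bytes <= self.available_bytes)

    def process(self, task: str, cloud: CloudSide) -> str:
        request = Request(task, self.available_ops, self.available_bytes)
        second = cloud.handle(request)  # send request, receive second model
        if not self.fits(second):
            raise RuntimeError("received model exceeds terminal resources")
        self.model = second
        return f"{task} processed with {second.name}"


cloud = CloudSide({"image_classification": Model("first", 10_000, 500_000)})
terminal = TerminalSide(available_ops=2_000, available_bytes=100_000)
result = terminal.process("image_classification", cloud)
```

In this sketch the terminal re-checks the received model against its own budget before using it, which mirrors the abstract's condition that the second model's resource needs fall within the terminal's available capability.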
Opening claim text (preview).
1. A method for data processing, comprising: sending, by a terminal-side device, a request message to a cloud-side device, wherein the request message is used to request a neural network model to process a cognitive computing task; receiving, by the terminal-side device, a second neural network model obtained by trimming a first neural network model sent by the cloud-side device, wherein the first neural network model is on the cloud-side device that is used to process the cognitive computing task, and a hardware resource required to execute the second neural network model is within an available hardware resource capability range of the terminal-side device; and processing, by the terminal-side device, the cognitive computing task using the second neural network model.

2. The method according to claim 1, wherein the terminal-side device comprises a neural network basic platform, the neural network basic platform comprises a neural network architecture component and a neural network parameter component, and the neural network architecture component is decoupled from the neural network parameter component; and wherein processing the cognitive computing task using the second neural network model comprises: when the second neural network model comprises an architecture update component, updating the neural network architecture component based on the architecture update component; when the second neural network model comprises a parameter update component, updating the neural network parameter component based on the parameter update component; and processing the cognitive computing task using the neural network basic platform with the updated neural network architecture component and the updated neural network parameter component.

3. The method according to claim 1, wherein sending a request message to a cloud-side device comprises: sending, by the terminal-side device, the request message to the cloud-side device in response to determining that: the terminal-side device lacks a neural network model used to process the cognitive computing task; or an accuracy of processing the cognitive computing task using a neural network model on the terminal-side device does not meet a cognitive accuracy tolerance, wherein the cognitive accuracy tolerance represents an expected accuracy of processing the cognitive computing task by the terminal-side device; or a hardware resource required to execute a neural network model on the terminal-side device to process the cognitive computing task exceeds the available hardware resource capability of the terminal-side device.

4. The method according to claim 1, wherein the request message includes indication information used to indicate a cognitive accuracy tolerance, so that the cloud-side device trims the first neural network model to obtain the second neural network model that meets the cognitive accuracy tolerance, wherein the cognitive accuracy tolerance represents an expected accuracy of processing the cognitive computing task by the terminal-side device.

5. The method according to claim 1, wherein the request message includes indication information used to indicate the available hardware resource capability of the terminal-side device.

6. The method according to claim 1, wherein the request message further includes an identifier identifying the first neural network model, so that the cloud-side device determines the first neural network model based on the identifier; or the request message further includes function information to describe a function of processing the cognitive computing task, so that the cloud-side device determines the first neural network model based on the function information.

7. The method according to claim 1, wherein a computation amount and a required storage capacity of the second neural network model are respectively less than a computation amount and a required storage capacity of the first neural network model.

8. A method for data processing, comprising: receiving, by a cloud-side device, a request message from a terminal-side device, to request a neural network model to process a cognitive computing task; determining, by the cloud-side device based on the request message, a first neural network model to process the cognitive computing task; trimming, by the cloud-side device, the first neural network model to obtain a second neural network model, wherein a hardware resource required to execute the second neural network model is within an available hardware resource capability range of the terminal-side device; and sending, by the cloud-side device, the second neural network model to the terminal-side device, so that the terminal-side device processes the cognitive computing task using the second neural network model.

9. The method according to claim 8, wherein the trimming the first neural network model to obtain a second neural network model comprises: trimming, by the cloud-side device, a parameter component of the first neural network model to obtain the second neural network model, wherein a required storage capacity of a parameter component of the second neural network model is less than a required storage capacity of the parameter component of the first neural network model.

10. The method according to claim 8, wherein the trimming the first neural network model to obtain a second neural network model comprises: trimming, by the cloud-side device, an architecture component of the first neural network model to obtain a third neural network model, wherein a computation amount of a computation kernel of the third neural network model is less than a computation amount of a computation kernel of the first neural network model; and trimming, by the cloud-side device, a parameter component of the third neural network model to obtain the second neural network model, wherein a required storage capacity of a parameter component of the second neural network model is less than a required storage capacity of the parameter component of the third neural network model.

11. The method according to claim 8, wherein the request message includes indication information to indicate a cognitive accuracy tolerance, and the cognitive accuracy tolerance represents an expected accuracy of processing the cognitive computing task by the terminal-side device; and trimming the first neural network model to obtain a second neural network model comprises: trimming, by the cloud-side device based on the cognitive accuracy tolerance, the first neural network model to obtain the second neural network model, wherein an accuracy of processing the cognitive computing task using the second neural network model meets the cognitive accuracy tolerance.

12. The method according to claim 8, wherein the request message includes indication information to indicate an available hardware resource capability of the terminal-side device.

13. The method according to claim 8, wherein the request message further includes an identifier identifying the first neural network model; and determining, by the cloud-side device based on the request message, a first neural network model used to process the cognitive computing task comprises: determining, by the cloud-side device, the first neural network model based on the identifier.

14. The method according to claim 8, wherein the request message further includes function information to describe a function of processing the cognitive computing task; and determining, by the cloud-side device based on the request message, a first neural network model used to process the cognitive computing task comprises: determining, b
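The two-stage trimming in claims 10 and 11 (architecture first, then parameters, subject to the cognitive accuracy tolerance) can be illustrated with a toy model. This is a hedged sketch only: the `Model` fields, the fixed 0.5 keep ratios, and the 0.01 accuracy penalty per stage are invented for illustration and do not come from the patent, which does not specify how trimming is carried out in the text shown here.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Model:
    compute_ops: int    # hypothetical: computation amount of the kernel
    storage_bytes: int  # hypothetical: storage needed by the parameters
    accuracy: float     # hypothetical: measured task accuracy in [0, 1]


def trim_architecture(model: Model, keep_ratio: float) -> Model:
    # Toy assumption: trimming the architecture cuts compute and costs 0.01 accuracy.
    return replace(model,
                   compute_ops=int(model.compute_ops * keep_ratio),
                   accuracy=model.accuracy - 0.01)


def trim_parameters(model: Model, keep_ratio: float) -> Model:
    # Toy assumption: trimming parameters cuts storage and costs 0.01 accuracy.
    return replace(model,
                   storage_bytes=int(model.storage_bytes * keep_ratio),
                   accuracy=model.accuracy - 0.01)


def trim(first: Model, tolerance: float,
         available_ops: int, available_bytes: int) -> Model:
    third = trim_architecture(first, keep_ratio=0.5)  # first -> third model
    second = trim_parameters(third, keep_ratio=0.5)   # third -> second model
    if second.accuracy < tolerance:
        raise ValueError("trimmed model misses the cognitive accuracy tolerance")
    if second.compute_ops > available_ops or second.storage_bytes > available_bytes:
        raise ValueError("trimmed model exceeds the terminal's hardware capability")
    return second


first = Model(compute_ops=1_000, storage_bytes=4_000, accuracy=0.95)
second = trim(first, tolerance=0.90, available_ops=600, available_bytes=2_500)
```

A real cloud-side device would presumably search over trimming strengths rather than apply fixed ratios; the sketch only shows the ordering of the two stages and the two acceptance checks.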
Partitioning or combining of resources · CPC title
Combinations of networks · CPC title
in which an application is distributed across nodes in the network (software deployment G06F8/60; multiprogramming arrangements G06F9/46) · CPC title
Architecture, e.g. interconnection topology · CPC title
involving the movement of software or configuration parameters (network booting or remote initial program loading [RIPL] G06F9/4416) · CPC title