On-device Convolutional Neural Network Models for Assistant Systems

US2021117623A1 · US · A1

Patent metadata
Field               Value
Publication number  US-2021117623-A1
Application number  US-201916703700-A
Country             US
Kind code           A1
Filing date         Dec 4, 2019
Priority date       Oct 18, 2019
Publication date    Apr 22, 2021
Grant date

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In one embodiment, a method includes receiving a user input comprising one or more words at a client system, wherein each word comprises one or more characters, inputting the words to a convolutional neural network (CNN) model stored on the client system, accessing a plurality of character-embeddings for a plurality of characters, respectively, from a data store of the client system, generating one or more word-embeddings for the one or more words, respectively, based on the accessed character-embeddings by processing the accessed character-embeddings with one or more convolutional layers and one or more gated linear units of the CNN model, determining one or more tasks corresponding to the user input for execution based on an analysis of the one or more word-embeddings by the CNN model, and providing an output responsive to the user input based on the execution of the one or more tasks at the client system.
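
The word-embedding step in the abstract can be illustrated with a minimal sketch. It is not the patent's implementation: the dimensions, the random weights, the `CHAR_STORE` dictionary standing in for the on-device data store, and the max-pooling over character positions are all assumptions made for illustration. Only the overall flow (character-embeddings, then a convolution, then a gated linear unit) comes from the abstract.

```python
import math
import random

random.seed(0)

EMB_DIM = 4   # size of each character-embedding (assumed)
KERNEL = 3    # convolution width, in characters (assumed)
OUT_DIM = 5   # size of each word-embedding (assumed)

# Hypothetical on-device data store: one embedding vector per character.
CHAR_STORE = {c: [random.uniform(-1, 1) for _ in range(EMB_DIM)]
              for c in "abcdefghijklmnopqrstuvwxyz"}
UNKNOWN = [0.0] * EMB_DIM  # fallback for characters missing from the store

# Random weights stand in for trained parameters. The convolution emits
# 2 * OUT_DIM channels: the first half are values, the second half gates.
WEIGHTS = [[random.uniform(-0.5, 0.5) for _ in range(KERNEL * EMB_DIM)]
           for _ in range(2 * OUT_DIM)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def glu(channels):
    """Gated linear unit: first half of the channels times sigmoid(second half)."""
    half = len(channels) // 2
    return [a * sigmoid(b) for a, b in zip(channels[:half], channels[half:])]


def word_embedding(word):
    """Character-embeddings -> 1-D convolution -> GLU -> max-pool over positions."""
    chars = [CHAR_STORE.get(c, UNKNOWN) for c in word.lower()]
    # Zero-pad both ends so even a one-character word yields full windows.
    pad = [UNKNOWN] * (KERNEL - 1)
    chars = pad + chars + pad
    gated = []
    for start in range(len(chars) - KERNEL + 1):
        # Flatten KERNEL consecutive character vectors into one window.
        window = [v for vec in chars[start:start + KERNEL] for v in vec]
        conv = [sum(w * x for w, x in zip(row, window)) for row in WEIGHTS]
        gated.append(glu(conv))
    # Max-pool each output channel across all character positions (assumed).
    return [max(column) for column in zip(*gated)]


def embed_input(text):
    """Return one (word, embedding) pair per whitespace-separated word."""
    return [(w, word_embedding(w)) for w in text.split()]


if __name__ == "__main__":
    for word, vector in embed_input("play some music"):
        print(word, [round(v, 3) for v in vector])
```

Because the embedding is built from characters rather than looked up per word, a misspelled or unseen word still gets a vector, and the stored table only needs one row per character, which is presumably part of the appeal for an on-device model.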

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising, by a client system: receiving, at the client system, a user input comprising one or more words, wherein each word comprises one or more characters; inputting the one or more words to a convolutional neural network (CNN) model stored on the client system; accessing, from a data store of the client system, a plurality of character-embeddings for a plurality of characters, respectively; generating, based on the accessed character-embeddings, one or more word-embeddings for the one or more words, respectively, by processing the accessed character-embeddings with one or more convolutional layers and one or more gated linear units of the CNN model; determining, based on an analysis of the one or more word-embeddings by the CNN model, one or more tasks corresponding to the user input for execution; and providing, at the client system, an output responsive to the user input based on the execution of the one or more tasks.

2. The method of claim 1, further comprising: parsing, by a natural-language understanding module stored on the client system, the user input into the one or more words.

3. The method of claim 1, wherein the CNN model comprises a plurality of layers, wherein the plurality of layers comprise at least a convolutional layer, a pooling layer, a gated linear unit, a linear layer, and a residual connection with gradient clipping.

4. The method of claim 1, further comprising: determining one or more intents associated with the user input by analyzing the one or more word-embeddings based on the CNN model.

5. The method of claim 4, wherein determining the one or more intents comprises: generating, by the one or more convolutional layers and one or more pooling layers of the CNN model, a feature representation for the user input based on the one or more word-embeddings; calculating, by one or more linear layers of the CNN model, a plurality of probabilities corresponding to a plurality of intents based on the feature representation, wherein each probability indicates a likelihood that a corresponding intent is associated with the user input; and determining, based on the calculated probabilities, the one or more intents from the plurality of intents.

6. The method of claim 1, further comprising: determining one or more slots associated with the user input by analyzing the one or more word-embeddings based on the CNN model.

7. The method of claim 6, wherein determining the one or more slots comprises: calculating, by one or more linear layers of the CNN model, a plurality of probabilities corresponding to a plurality of slots based on the one or more word-embeddings, wherein each probability indicates a likelihood that a corresponding slot is associated with a respective word; and determining, based on the calculated probabilities, the one or more slots from the plurality of slots.

8. The method of claim 1, wherein the processing of the accessed character-embeddings with the one or more convolutional layers of the CNN model is based on one or more digital signal processing (DSP) algorithms, wherein the one or more DSP algorithms are determined based on hardware components of the client system.

9. The method of claim 1, wherein the analysis of the one or more word-embeddings by the CNN model is based on one or more digital signal processing (DSP) algorithms, wherein the one or more DSP algorithms are determined based on hardware components of the client system.

10. The method of claim 1, wherein generating the one or more word-embeddings is further based on a plurality of dictionary features.

11. The method of claim 1, wherein a plurality of parameters and a plurality of activations associated with the CNN model are quantized.

12. The method of claim 1, further comprising: sending, to one or more remote servers, the one or more tasks for execution, wherein the output is generated by the one or more remote servers based on the execution of the one or more tasks.

13. The method of claim 12, further comprising: receiving, at the client system from the one or more remote servers, instructions for providing the output.

14. The method of claim 1, wherein the CNN model comprises a plurality of layers, wherein the plurality of layers are generated based on one or more pruning algorithms, wherein the one or more pruning algorithms are determined based on hardware components of the client system.

15. The method of claim 1, wherein the CNN model comprises a plurality of parameters, wherein the plurality of parameters are determined based on one or more sparsification algorithms.

16. One or more computer-readable non-transitory storage media embodying software that is operable when executed to: receive, at the client system, a user input comprising one or more words, wherein each word comprises one or more characters; input the one or more words to a convolutional neural network (CNN) model stored on the client system; access, from a data store of the client system, a plurality of character-embeddings for a plurality of characters, respectively; generate, based on the accessed character-embeddings, one or more word-embeddings for the one or more words, respectively, by processing the accessed character-embeddings with one or more convolutional layers and one or more gated linear units of the CNN model; determine, based on an analysis of the one or more word-embeddings by the CNN model, one or more tasks corresponding to the user input for execution; and provide, at the client system, an output responsive to the user input based on the execution of the one or more tasks.

17. The media of claim 16, wherein the software is further operable when executed to: parse, by a natural-language understanding module stored on the client system, the user input into the one or more words.

18. The media of claim 16, wherein the CNN model comprises a plurality of layers, wherein the plurality of layers comprise at least a convolutional layer, a pooling layer, a gated linear unit, a linear layer, and a residual connection with gradient clipping.

19. The media of claim 16, wherein the software is further operable when executed to: determine one or more intents associated with the user input by analyzing the one or more word-embeddings based on the CNN model.

20. A system comprising: one or more processors; and a non-transitory memory coupled to the processors comprising instructions executable by the processors, the processors operable when executing the instructions to: receive, at the client system, a user input comprising one or more words, wherein each word comprises one or more characters; input the one or more words to a convolutional neural network (CNN) model stored on the client system; access, from a data store of the client system, a plurality of character-embeddings for a plurality of characters, respectively; generate, based on the accessed character-embeddings, one or more word-embeddings for the one or more words, respectively, by processing the accessed character-embeddings with one or more convolutional layers and one or more gated linear units of the CNN model; determine, based on an analysis of the one or more word-embeddings by the CNN model, one or more tasks corresponding to the user input for execution; and provide, at the client system, an output responsive to the user input based on the execution of the one or more tasks.
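
The intent and slot steps in claims 5 and 7 can be sketched as two small classification heads over the word-embeddings. This is a rough illustration under stated assumptions, not the claimed model: the label sets, the dimension, and the random weights are hypothetical, and a plain max-pool across words stands in for the convolution-plus-pooling feature extractor that claim 5 describes.

```python
import math
import random

random.seed(1)

DIM = 5                                                # word-embedding size (assumed)
INTENTS = ["play_music", "set_alarm", "get_weather"]   # hypothetical labels
SLOTS = ["O", "artist", "time", "location"]            # hypothetical labels


def make_linear(n_out, n_in):
    """Random weights stand in for trained linear-layer parameters."""
    return [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]


def linear(weights, vec):
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


INTENT_W = make_linear(len(INTENTS), DIM)
SLOT_W = make_linear(len(SLOTS), DIM)


def predict_intent(word_vectors):
    """One probability per intent for the whole input; return the most likely."""
    # Max-pool across words to get a single feature vector for the input
    # (a simplification of the convolution and pooling layers in claim 5).
    features = [max(column) for column in zip(*word_vectors)]
    probs = softmax(linear(INTENT_W, features))
    return INTENTS[argmax(probs)], probs


def predict_slots(word_vectors):
    """One probability distribution over slots per word; return a label per word."""
    labels = []
    for vec in word_vectors:
        probs = softmax(linear(SLOT_W, vec))
        labels.append(SLOTS[argmax(probs)])
    return labels


if __name__ == "__main__":
    # Placeholder word-embeddings for a two-word input.
    vectors = [[0.1] * DIM, [0.2, -0.3, 0.5, 0.0, 1.0]]
    print(predict_intent(vectors)[0], predict_slots(vectors))
```

The structural difference between the two heads mirrors the claims: the intent is chosen once per input from a pooled feature representation, while a slot is chosen once per word from that word's own embedding.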

Assignees

Facebook Tech Llc

Inventors

Classifications

  • G06Q10/40 · Primary

    Business processes related to social networking or social networking services · CPC title

  • using neural networks · CPC title

  • using classification, e.g. of video objects · CPC title

  • Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items (segmenting video sequences G06V20/49) · CPC title

  • Facial expression recognition · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2021117623A1 cover?
In one embodiment, a method includes receiving a user input comprising one or more words at a client system, wherein each word comprises one or more characters, inputting the words to a convolutional neural network (CNN) model stored on the client system, accessing a plurality of character-embeddings for a plurality of characters, respectively, from a data store of the client system, generating…
Who is the assignee on this patent?
Facebook Tech Llc
What technology area does this patent fall under?
Primary CPC classification G06Q10/40. Mapped technology areas include Physics.
When was this patent published?
Publication date Apr 22, 2021 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).