Data Synthesis for Domain Development of Natural Language Understanding for Assistant Systems

US2023419952A1 · US · A1

Patent metadata
Publication number: US-2023419952-A1
Application number: US-202217747345-A
Country: US
Kind code: A1
Filing date: May 18, 2022
Priority date: May 18, 2022
Publication date: Dec 28, 2023
Grant date: (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In one embodiment, a method includes receiving a request to train a natural-language understanding (NLU) model for a new domain, accessing a context-free grammar associated with the new domain, wherein the context-free grammar defines production rules with respect to ontology tokens associated with the new domain and utterance tokens for generating natural-language strings in the new domain, generating utterance-frame pairs based on traversing a hierarchical grammar tree associated with the context-free grammar based on the production rules, wherein each utterance-frame pair comprises an utterance and a corresponding frame, wherein each frame comprises ontology tokens associated with the new domain and utterance tokens corresponding to one or more of the ontology tokens of the frame, and training the NLU model based on the utterance-frame pairs.
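
To make the abstract concrete, the following is a minimal Python sketch of generating utterance-frame pairs by traversing a context-free grammar. It is an illustration under stated assumptions, not the implementation disclosed in the patent: the toy grammar, the "IN:" and "SL:" naming of intent and slot tokens, the bracketed frame format, and the function names expand and generate_pairs are all hypothetical.

```python
import itertools

# Hypothetical toy grammar for a "reminder" domain (not from the patent).
# Keys are non-terminals; here every non-terminal is an ontology token,
# written "IN:..." for an intent and "SL:..." for a slot (assumed naming).
# Any symbol that is not a key is a terminal, i.e. an utterance token.
GRAMMAR = {
    "IN:CREATE_REMINDER": [["remind", "me", "to", "SL:TODO", "SL:TIME"]],
    "SL:TODO": [["call", "mom"], ["buy", "milk"]],
    "SL:TIME": [["tomorrow"], ["at", "noon"]],
}


def expand(symbol):
    """Yield (utterance_tokens, frame_tokens) for each derivation of symbol."""
    if symbol not in GRAMMAR:
        # Terminal node: a single utterance token.
        yield [symbol], [symbol]
        return
    for rule in GRAMMAR[symbol]:
        # Expand every right-hand-side symbol, then combine all choices.
        options = [list(expand(child)) for child in rule]
        for combo in itertools.product(*options):
            tokens = [tok for toks, _ in combo for tok in toks]
            pieces = []
            for child, (_, frame) in zip(rule, combo):
                # The frame keeps nested ontology nodes, plus the raw
                # utterance tokens that sit directly under a slot.
                if child in GRAMMAR or symbol.startswith("SL:"):
                    pieces.extend(frame)
            yield tokens, ["[" + symbol, *pieces, "]"]


def generate_pairs(root):
    """Return every (utterance, frame) string pair derivable from root."""
    return [(" ".join(t), " ".join(f)) for t, f in expand(root)]


pairs = generate_pairs("IN:CREATE_REMINDER")
for utterance, frame in pairs:
    print(utterance, "->", frame)
```

This sketch enumerates every derivation exhaustively, which only works for small, non-recursive grammars; a larger grammar would need sampling instead. The resulting pairs are the kind of synthetic training data the abstract describes feeding to the NLU model.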

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising, by one or more computing systems: receiving a request to train a natural-language understanding (NLU) model for a new domain; accessing a context-free grammar associated with the new domain, wherein the context-free grammar defines one or more production rules with respect to ontology tokens associated with the new domain and utterance tokens for generating natural-language strings in the new domain; generating a plurality of utterance-frame pairs based on traversing a hierarchical grammar tree associated with the context-free grammar based on the one or more production rules, wherein each utterance-frame pair comprises an utterance and a corresponding frame, wherein each frame comprises one or more ontology tokens associated with the new domain and one or more utterance tokens corresponding to one or more of the ontology tokens of the frame; and training the NLU model based on the plurality of utterance-frame pairs.

2. The method of claim 1, further comprising: generating a plurality of synthesis representations corresponding to a plurality of utterances, wherein the plurality of synthesis representations are used to generate the plurality of utterance-frame pairs, respectively, and wherein each of the plurality of synthesis representations is an intermediate hybrid representation of the utterance and the corresponding frame of the respective utterance-frame pair.

3. The method of claim 2, wherein the synthesis representation comprises one or more of a prefix, an ontology token, or an utterance token, and wherein the ontology token comprises an intent or a slot associated with the new domain.

4. The method of claim 2, wherein the synthesis representation comprises one or more utterance tokens, wherein the method further comprises: generating the utterance in each utterance-frame pair based on extracting the one or more utterance tokens from the synthesis representation.

5. The method of claim 2, wherein the synthesis representation comprises one or more intents, one or more slots, and one or more utterance tokens, wherein the method further comprises: generating the frame in each utterance-frame pair based on extracting the one or more intents, the one or more slots, and one or more of the utterance tokens associated with the one or more slots from the synthesis representation.

6. The method of claim 2, further comprising: assigning one or more probabilities to the synthesis representation, wherein the one or more probabilities are associated with one or more of the prefix, the ontology token, or the utterance token, respectively.

7. The method of claim 1, wherein the hierarchical grammar tree comprises one or more non-terminal nodes and one or more terminal nodes, wherein each of the non-terminal nodes comprises one or more of an ontology token, and wherein each of the terminal nodes comprises an utterance token.

8. The method of claim 7, wherein the one or more production rules specify one or more paths from one or more of the non-terminal nodes to one or more of the terminal nodes, respectively.

9. The method of claim 8, wherein traversing the hierarchical grammar tree based on the one or more production rules comprises: selecting a path from the specified paths; identifying ontology tokens corresponding to non-terminal nodes along the path; and identifying utterance tokens corresponding to terminal nodes along the path.

10. The method of claim 1, wherein the frame in each utterance-frame pair is a structured representation of the corresponding utterance, wherein the structured representation is based on one or more of an intent, a slot, or an utterance token associated with the slot.

11. The method of claim 1, further comprising: receiving, from a client system, a user utterance associated with the new domain; determining, based on the trained NLU model, one or more intents and one or more slots associated with the user utterance, wherein the one or more intents and the one or more slots are associated with the new domain; determining, based on the one or more intents and the one or more slots, one or more tasks; executing the one or more tasks; and sending, to the client system, instructions for presenting execution results of one or more of the tasks.

12. The method of claim 1, wherein the trained NLU model is operable to take a user utterance as an input and generate a frame corresponding to the user utterance as an output.

13. One or more computer-readable non-transitory storage media embodying software that is operable when executed to: receive a request to train a natural-language understanding (NLU) model for a new domain; access a context-free grammar associated with the new domain, wherein the context-free grammar defines one or more production rules with respect to ontology tokens associated with the new domain and utterance tokens for generating natural-language strings in the new domain; generate a plurality of utterance-frame pairs based on traversing a hierarchical grammar tree associated with the context-free grammar based on the one or more production rules, wherein each utterance-frame pair comprises an utterance and a corresponding frame, wherein each frame comprises one or more ontology tokens associated with the new domain and one or more utterance tokens corresponding to one or more of the ontology tokens of the frame; and train the NLU model based on the plurality of utterance-frame pairs.

14. The media of claim 13, wherein the software is further operable when executed to: generate a plurality of synthesis representations corresponding to a plurality of utterances, wherein the plurality of synthesis representations are used to generate the plurality of utterance-frame pairs, respectively, and wherein each of the plurality of synthesis representations is an intermediate hybrid representation of the utterance and the corresponding frame of the respective utterance-frame pair.

15. The media of claim 14, wherein the synthesis representation comprises one or more of a prefix, an ontology token, or an utterance token, and wherein the ontology token comprises an intent or a slot associated with the new domain.

16. The media of claim 14, wherein the synthesis representation comprises one or more intents, one or more slots, and one or more utterance tokens, wherein the software is further operable when executed to: generate the utterance in each utterance-frame pair based on extracting the one or more utterance tokens from the synthesis representation.

17. The media of claim 14, wherein the synthesis representation comprises one or more utterance tokens, wherein the software is further operable when executed to: generate the frame in each utterance-frame pair based on extracting the one or more intents, the one or more slots, and one or more of the utterance tokens associated with the one or more slots from the synthesis representation.

18. The media of claim 14, wherein the software is further operable when executed to: assign one or more probabilities to the synthesis representation, wherein the one or more probabilities are associated with one or more of the prefix, the ontology token, or the utterance token, respectively.

19. The media of claim 13, wherein the trained NLU model is operable to take a user utterance as an input and generate a frame corresponding to the user utterance as an output.

20. A system comprising: one or more processors; a
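
Claims 2 through 5 describe a synthesis representation that mixes ontology tokens and utterance tokens, from which both the utterance and the frame are extracted. The Python sketch below shows one way this split could work. It is an illustration only: the bracketed string format, the "IN:" and "SL:" prefixes for intents and slots, and the function name split_synthesis are assumptions, not details taken from the claims.

```python
def split_synthesis(representation):
    """Split a bracketed hybrid string into (utterance, frame).

    Assumed format (hypothetical): "[LABEL" opens an ontology node, "]"
    closes the innermost open node, and any other token is an utterance token.
    """
    utterance, frame, open_nodes = [], [], []
    for token in representation.split():
        if token.startswith("["):
            # Ontology token (intent or slot): belongs to the frame only.
            open_nodes.append(token[1:])
            frame.append(token)
        elif token == "]":
            # Closing bracket: ends the innermost ontology node.
            open_nodes.pop()
            frame.append(token)
        else:
            # Utterance token: always part of the utterance, and part of
            # the frame only when it sits directly inside a slot.
            utterance.append(token)
            if open_nodes and open_nodes[-1].startswith("SL:"):
                frame.append(token)
    return " ".join(utterance), " ".join(frame)


synthesis = (
    "[IN:CREATE_REMINDER remind me to "
    "[SL:TODO call mom ] [SL:TIME tomorrow ] ]"
)
utterance, frame = split_synthesis(synthesis)
print(utterance)
print(frame)
```

Keeping one hybrid string and deriving both outputs from it means the utterance and its frame cannot drift out of sync, which appears to be the motivation for the intermediate representation in claim 2.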

Assignees

Meta Platforms Inc

Inventors

Classifications

  • G10L15/063 (Primary)

    Training · CPC title

  • Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars · CPC title

  • Semantic analysis · CPC title

  • Distributed recognition, e.g. in client-server systems, for mobile phones or network applications · CPC title

  • Processing or translation of natural language (natural language analysis G06F40/20; semantic analysis G06F40/30) · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2023419952A1 cover?
In one embodiment, a method includes receiving a request to train a natural-language understanding (NLU) model for a new domain, accessing a context-free grammar associated with the new domain, wherein the context-free grammar defines production rules with respect to ontology tokens associated with the new domain and utterance tokens for generating natural-language strings in the new domain, ge…
Who is the assignee on this patent?
Meta Platforms Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/063. Mapped technology areas include Physics.
When was this patent published?
Publication date Dec 28, 2023 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).