Voice-based auto-completions and auto-responses for assistant systems

US12475170B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12475170-B2
Application number  US-202017120013-A
Country             US
Kind code           B2
Filing date         Dec 11, 2020
Priority date       Dec 11, 2020
Publication date    Nov 18, 2025
Grant date          Nov 18, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In one embodiment, a method includes receiving a first input by a user from a client system associated with the user, wherein the first input is in a voice modality, analyzing the first input to generate one or more candidate hypotheses, determining one or more modalities for presenting output generated by the one or more computing systems to the user at the client system, and sending instructions to the client system for presenting one or more suggested auto-completions corresponding to one or more of the candidate hypotheses, respectively, wherein each suggested auto-completion comprises the corresponding candidate hypothesis, and wherein the one or more suggested auto-completions are presented in the one or more determined modalities.
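The four steps named in the abstract (receive a voice input, generate candidate hypotheses, determine output modalities, send presentation instructions) can be sketched as a small pipeline. This is a minimal illustration only, not the patented implementation: it assumes the voice input has already been transcribed to text, and the names `KNOWN_UTTERANCES`, `generate_hypotheses`, `determine_modalities`, `handle_voice_input`, and the `has_display` context key are all hypothetical. The prefix match stands in for whatever analysis the actual system performs.

```python
from dataclasses import dataclass

# Hypothetical toy data: longer utterances a user might be about to speak.
KNOWN_UTTERANCES = [
    "call mom on her mobile",
    "call me a taxi to the airport",
    "play some jazz in the kitchen",
]


@dataclass
class Instructions:
    """What gets sent back to the client system (assumed shape)."""
    modalities: list[str]
    auto_completions: list[str]


def generate_hypotheses(transcript: str) -> list[str]:
    """Toy stand-in for the analysis step: prefix match on known utterances."""
    prefix = transcript.strip().lower()
    return [u for u in KNOWN_UTTERANCES if u.startswith(prefix) and u != prefix]


def determine_modalities(context: dict) -> list[str]:
    """Toy rule (an assumption): show text if a display exists, else speak."""
    if context.get("has_display"):
        return ["visual"]
    return ["auditory"]


def handle_voice_input(transcript: str, context: dict) -> Instructions:
    """Chain the steps: hypotheses, modalities, then presentation instructions."""
    hypotheses = generate_hypotheses(transcript)
    modalities = determine_modalities(context)
    # Each suggested auto-completion carries its corresponding hypothesis.
    return Instructions(modalities=modalities, auto_completions=hypotheses)
```

For example, `handle_voice_input("call m", {"has_display": True})` would return the two "call ..." utterances to be presented in the visual modality.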

First claim

Opening claim text (preview).

What is claimed is:

1. A method comprising, by one or more computing systems: receiving, from a client system associated with a user, a first input by the user, wherein the first input is in a voice modality; analyzing the first input to generate one or more candidate hypotheses based on predictions of one or more long utterances the user is likely to speak given the first input; determining, based on context information associated with the user, one or more modalities for presenting, to the user at the client system, output generated by the one or more computing systems; and sending, to the client system, instructions for presenting one or more suggested auto-completions corresponding to one or more of the candidate hypotheses, respectively, wherein each suggested auto-completion comprises the corresponding candidate hypothesis, and wherein the one or more suggested auto-completions are presented in the one or more determined modalities, wherein analyzing the first input to generate the one or more candidate hypotheses is further based on the context information associated with the user.

2. The method of claim 1, wherein the client system comprises an extended-reality (XR) display device, wherein the determined modalities comprise a visual modality, and wherein the suggested auto-completions are presented as XR objects by the XR display device.

3. The method of claim 1, further comprising: accessing a media content associated with the user; generating one or more communication contents responsive to the accessed media content based on one or more of the accessed media content, a dialog state associated with the user, the context information associated with the user, location information associated with the user, or one or more multimodal signals associated with the user; and sending, to the client system, instructions for presenting the accessed media content and the one or more generated communication contents to the user.

4. The method of claim 3, further comprising: determining one or more modalities for presenting, to the user at the client system, the accessed media content and the one or more generated communication contents.

5. The method of claim 4, wherein the determination is based on one or more of: the accessed media content; a dialog state associated with the user; the context information associated with the user; location information associated with the user; or one or more multimodal signals associated with the user.

6. The method of claim 3, wherein the accessed media content comprises one or more of: a post; a message; an entity update; a comment on a post; an image; a video clip; a trending topic; or news.

7. The method of claim 3, wherein each of the one or more generated communication contents comprises one or more of: a comment on a post; a reply to a message; a comment on an entity update; a reply to a comment on a post; a comment on an image; a comment on a video clip; a message; or a review.

8. The method of claim 1, further comprising: determining whether to augment the first input based on one or more of: a dialog state associated with the user; the context information with the user; location information associated with the user; or one or more multimodal signals associated with the user.

9. The method of claim 8, wherein the first input comprises one or more pauses, and wherein determining whether to augment the first input is responsive to the one or more pauses.

10. The method of claim 1, further comprising: receiving, from the client system, a second input by the user indicating a selection by the user of a first suggested auto-completion of the one or more suggested auto-completions; and executing, via one or more agents, one or more tasks based on the first suggested auto-completion selected by the user.

11. The method of claim 10, wherein second input comprises one or more of: a motion input; a gesture input; a gaze input; or a voice input.

12. The method of claim 1, wherein analyzing the first input to generate the one or more candidate hypotheses is based on a personalized language model associated with the user.

13. The method of claim 12, wherein the personalized language model is trained based on a plurality of training data comprising one or more of: newsfeed posts associated with the user; newsfeed comments associated with the user; messages in one or more messaging interfaces associated with the user; data characterizing one or more domains; dialog states of one or more dialog sessions associated with the user; user profile data associated with the user; or task states associated with one or more tasks.

14. The method of claim 1, wherein analyzing the first input to generate the one or more candidate hypotheses is further based on one or more of: a dialog state associated with the user; location information associated with the user; and one or more multimodal signals associated with the user.

15. The method of claim 1, wherein the determined one or more modalities comprise one or more of: a textual modality; an auditory modality; or a visual modality.

16. The method of claim 1, wherein determining the one or more modalities is based on one or more of: the client system; a user preference associated with the user; the context information with the user; an environment associated with the user; or a prior interaction by the user.

17. The method of claim 1, wherein the first input comprises an incomplete natural-language utterance, and wherein each of one or more suggested auto-completions completes the incomplete natural-language utterance.

18. The method of claim 1, wherein the first input comprises a complete natural-language utterance, and wherein the method further comprising: determining, based on one or more of a dialog state associated with the user or the context information associated with the user, that at least one action is associated with the complete natural-language utterance; determining that at least one object needs to be embedded within the at least one action; and determining that at least one attribute needs to be declared for the at least one object, wherein each of the one or more suggested auto-completions comprises the at least one object and the at least one attribute.

19. The method of claim 1, further comprising: ranking the one or more suggested auto-completions based on one or more of: a dialog state associated with the user; the context information associated with the user; location information associated with the user; or one or more multimodal signals associated with the user, wherein the one or more suggested auto-completions are presented in an order based on their respective rankings.

20. One or more computer-readable non-transitory storage media embodying software that is operable when executed to: receive, from a client system associated with a user, a first input by the user, wherein the first input is in a voice modality; analyze the first input to generate one or more candidate hypotheses based on predictions of one or more long utterances the user is likely to speak given the first input; determine, based on context information associated with the user, one or more modalities for presenting, to the user at the client system, output generated by one or more computing systems; and send, to the client system, instructions for presen
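Claims 10 and 19 describe ranking the suggested auto-completions and then executing a task, via an agent, for the one the user selects. The sketch below illustrates that flow under loud assumptions: the claims do not specify a scoring function or an agent interface, so the word-overlap score, the `context_terms` set, and the keyword-to-agent mapping are all hypothetical choices made for the example.

```python
def rank_completions(completions, context_terms):
    """Order completions by how many context terms each mentions (toy score)."""
    def score(text):
        return len(set(text.lower().split()) & context_terms)

    # sorted() is stable, so equally scored completions keep their input order.
    return sorted(completions, key=score, reverse=True)


def execute_selection(ranked, index, agents):
    """Run the first agent whose keyword prefixes the selected completion."""
    selected = ranked[index]
    for keyword, agent in agents.items():
        if selected.startswith(keyword):
            return agent(selected)
    raise LookupError(f"no agent registered for: {selected!r}")


# Usage: context suggests travel, so the taxi completion is presented first.
completions = ["call mom on her mobile", "call me a taxi to the airport"]
ranked = rank_completions(completions, {"airport", "taxi"})

# Hypothetical agent registry: one callable per leading keyword.
agents = {"call": lambda text: f"calling: {text}"}
outcome = execute_selection(ranked, 0, agents)
```

Here the index passed to `execute_selection` stands in for the user's second input (a gesture, gaze, motion, or voice selection in claim 11); how that input is mapped to an index is outside this sketch.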

Assignees

Meta Platforms Inc

Inventors

Classifications

  • using artificial neural networks

  • Training

  • Eyeglass type (eyeglass details G02C)

  • Head-up displays

  • for supporting social networking services

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12475170B2 cover?
In one embodiment, a method includes receiving a first input by a user from a client system associated with the user, wherein the first input is in a voice modality, analyzing the first input to generate one or more candidate hypotheses, determining one or more modalities for presenting output generated by the one or more computing systems to the user at the client system, and sending instructi…
Who is the assignee on this patent?
Meta Platforms Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/90324. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 18, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).