Utilizing pre-event and post-event input streams to engage an automated assistant
US-2021065693-A1 · Mar 4, 2021 · US
US12555569B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12555569-B2 |
| Application number | US-202318230921-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 7, 2023 |
| Priority date | Jul 25, 2022 |
| Publication date | Feb 17, 2026 |
| Grant date | Feb 17, 2026 |
Official abstract text for this publication.
Provided is a system to provide natural utterance by a voice assistant and method thereof, wherein the system comprises an automatic speech recognition module for converting one or more unsegmented voice inputs into textual format in real-time. Further, a natural language understanding module extracts the information and intent of the user from the converted textual inputs, wherein the natural language understanding module comprises a communication classification unit for classifying the user inputs into one or more pre-defined classes. Further, the system comprises a processing module for analyzing and processing the inputs from the natural language understanding module and activity identification module, wherein the processing module provides real-time intuitive mingling responses based on the responses, contextual pauses and ongoing activity of the user.
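The abstract does not say how "contextual pauses" are measured or how queued responses are released around them. The Python sketch below shows one hypothetical reading, not the patented method. It assumes the speech recognition step emits word-level (start, end) timestamps in seconds. It defines the utterance interval as a multiple of the user's median inter-word gap; the `factor` and `floor` values are invented. Prioritized responses are held until the current silence exceeds that interval and the user's activity is outside a made-up `BUSY_ACTIVITIES` set. `PendingResponse`, `utterance_interval` and `release_responses` are illustrative names only.

```python
import heapq
from dataclasses import dataclass, field
from statistics import median


@dataclass(order=True)
class PendingResponse:
    priority: int                     # lower value = more urgent (assumed convention)
    seq: int                          # tie-breaker that preserves arrival order
    text: str = field(compare=False)  # response text, ignored when ordering


# Hypothetical activity labels during which the assistant stays silent.
BUSY_ACTIVITIES = {"phone_call", "driving"}


def utterance_interval(words, factor=2.0, floor=0.3):
    """Hypothetical 'contextual natural utterance interval' in seconds.

    words: list of (start_s, end_s) tuples, one per recognized word.
    Returns factor * median positive inter-word gap, never below floor.
    """
    gaps = [nxt[0] - cur[1] for cur, nxt in zip(words, words[1:])]
    gaps = [g for g in gaps if g > 0]  # ignore overlapping word timestamps
    return max(floor, factor * median(gaps)) if gaps else floor


def release_responses(queue, silence_s, interval_s, activity):
    """Pop all queued responses in priority order, but only when the user has
    paused at least interval_s and is not in a busy activity; otherwise the
    queue is left untouched and nothing is spoken."""
    if silence_s < interval_s or activity in BUSY_ACTIVITIES:
        return []
    released = []
    while queue:
        released.append(heapq.heappop(queue).text)
    return released


if __name__ == "__main__":
    words = [(0.0, 0.5), (1.0, 1.5), (2.0, 2.5)]  # 0.5 s gaps between words
    interval = utterance_interval(words)          # 2.0 * 0.5 = 1.0 s
    queue = []
    heapq.heappush(queue, PendingResponse(2, 0, "By the way, rain is likely later."))
    heapq.heappush(queue, PendingResponse(1, 1, "Your pasta timer is done."))
    print(release_responses(queue, silence_s=1.2, interval_s=interval, activity="cooking"))
```

Under these assumptions, a 1.2 s silence while cooking exceeds the 1.0 s interval, so the timer notice is spoken before the lower-priority weather aside. A heap keeps each release step at O(log n). A production assistant would more plausibly re-rank pending responses as the detected activity changes.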
Opening claim text (preview).
What is claimed is:

1. A method of providing natural utterance by a voice assistant, the method comprising: receiving, through a microphone, at least one unsegmented voice input from a user; converting the at least one unsegmented voice input into a textual format; extracting user information from the textual format; classifying the extracted user information into one or more pre-defined classes; subsequent to the classifying the extracted user data, tracking activity of the user and identifying an environment of the user based on data received through one or more sensors including the microphone; calculating at least one contextual natural utterance interval during conversations between the user and the voice assistant; prioritizing and sequencing at least one contextual response to the at least one unsegmented voice input, wherein the at least one contextual response is fetched from a virtual server; and providing one or more responses based on the at least one contextual natural utterance interval and ongoing environmental activity detected in the identified environment of the user.

2. The method of claim 1, further comprising: detecting an utterance of a wake word, wherein the at least one unsegmented voice input received from the user is received after detection of the wake word.

3. The method of claim 1, wherein the one or more pre-defined classes comprise a request and a command, and wherein the request and the command pre-defined classes each comprise a momentary communication containing at least one prompt response from the voice assistant.

4. The method of claim 1, wherein the one or more pre-defined classes comprise an instruction comprising a prolonged communication between the user and the voice assistant and further comprising a plurality of contextual responses.

5. The method of claim 1, further comprising: subsequent to the prioritizing and sequencing of the at least one contextual response to the at least one unsegmented voice input received from the user, identifying an active listening state of the user and performing at least one task based on the identified active listening state of the user and the at least one contextual natural utterance interval.

6. The method of claim 1, wherein the providing one or more responses further comprises: using machine pre-trained deep neural network.

7. A system for providing natural utterances by a voice assistant, the system comprising: a microphone; at least one memory storing at least one instruction; at least one processor in communication with the at least one memory and configured to execute the at least one instruction to: receive, through the microphone, at least one unsegmented voice input from a user, identify an utterance of a wake word received through the microphone, based on identifying the utterance of the wake word, convert the at least one unsegmented voice input into a textual format, extract user information from the textual format, classify the extracted user information into one or more pre-defined classes, subsequent to classifying the extracted user data, tracking activity of the user and identifying an environment of the user based on data received through one or more sensors including the microphone, calculate at least one contextual natural utterance interval during conversations between the user and the voice assistant, prioritize and sequence at least one contextual response to the at least one unsegmented voice input, wherein the at least one contextual response is fetched from a virtual server, and identify an active listening state of the user and perform at least one task based on the identified active listening state of the user.

8. The system of claim 7, wherein the one or more pre-defined classes comprise a request and a command, and wherein the request pre-defined class and the command pre-defined class each comprise a momentary communication triggering a prompt response from the voice assistant.

9. The system of claim 7, wherein the one or more pre-defined classes comprise an instruction comprising a prolonged communication between the user and the voice assistant and further comprising a plurality of contextual responses.

10. The system of claim 7, wherein the at least one processor is further configured to execute the at least one instruction to: fetch the at least one contextual response from the virtual server based on the extracted user information.

11. The system of claim 7, wherein the at least one processor is further configured to execute the at least one instruction to: convert the at least one contextual response from a text format into a speech format.

12. The system of claim 7, wherein the at least one processor is further configured to execute the at least one instruction to: convert the at least one unsegmented voice input into the textual format by detecting positive, negative and neutral sentiments present in the at least one unsegmented voice input provided by the user, and extract the user information from the textual format by classifying an intent of the user based on sentiments detected in the at least one unsegmented voice input, and by determining one or more subjects present in the at least one unsegmented voice input.

13. The system of claim 7, wherein the at least one processor is further configured to execute the at least one instruction to classify the extracted user information into one or more pre-defined classes by: splitting the converted text format of the at least one unsegmented voice input provided by a user into one or more tokens, wherein each token represents a word from the converted text format of the at least one unsegmented voice input provided by a user, classifying the tokens into one or more pre-defined classes, determining a context and a relation between the classified tokens, and determining a type of conversation based on determined context and the determined relation between the classified tokens.

14. The system of claim 7, wherein the at least one processor is further configured to execute the at least one instruction to: receive and organize unorganized raw data from a plurality of input devices located in a pre-defined vicinity, extract, from the organized data, information pertaining to the user, determine an intended action of the user based on the extracted information, identify the environment of the user based on the organized data, receive data pertaining to the environment of the user and based on the received data pertaining to the environment of the user, determine the location of the user, and classify data related to the user into activity type, activity state and environment based on at least one of the extracted information pertaining to the user, the determination of the environment of the user, and the determined location of the user.

15. A non-transitory computer readable medium having instructions stored therein, which when executed by at least one processor cause the at least one processor to execute a method of providing natural utterances by a voice assistant, the method comprising: receiving, through a microphone, at least one unsegmented voice input from a user; converting the at least one unsegmented voice input into a textual format; extracting user information from the textual format; classifying the extracted user information into one or more pre-defined classes; subsequent to the classifying the extracted user data, tracking activity of the user and identifying an environment of the user based on data received throug
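Claim 13 splits the transcript into word tokens, classifies the tokens, relates them to one another, and derives a conversation type. Claims 3 and 4 (and 8 and 9) tie the request and command classes to momentary exchanges and the instruction class to prolonged ones. The Python sketch below illustrates that flow under stated assumptions. The token classes (`POLITE`, `ACTION`, `SEQUENCE`), their word lists, and the rule in `conversation_type` are all hypothetical. The "context and relation" step is reduced to counting labels. Claim 6 and the CPC tags point to trained neural or statistical models, and a keyword lexicon only stands in for those here.

```python
import re

# Hypothetical token classes and word lists; the patent does not specify them.
TOKEN_LEXICON = {
    "POLITE": {"please", "could", "can", "would"},
    "ACTION": {"play", "set", "turn", "add", "stir", "boil", "chop", "call"},
    "SEQUENCE": {"then", "next", "after", "first", "finally", "step"},
}


def tokenize(text):
    """Split converted text into lowercase word tokens, one token per word."""
    return re.findall(r"[a-z']+", text.lower())


def classify_tokens(tokens):
    """Tag each token with the first lexicon class containing it, else 'OTHER'."""
    tagged = []
    for tok in tokens:
        label = next(
            (cls for cls, words in TOKEN_LEXICON.items() if tok in words), "OTHER"
        )
        tagged.append((tok, label))
    return tagged


def conversation_type(tagged):
    """Map tagged tokens to (class, conversation length).

    Assumed heuristic: a sequencing word or more than one action word marks a
    multi-step instruction (prolonged); otherwise a politeness marker marks a
    request, and anything else is a command (both momentary).
    """
    labels = [label for _, label in tagged]
    if "SEQUENCE" in labels or labels.count("ACTION") > 1:
        return "instruction", "prolonged"
    if "POLITE" in labels:
        return "request", "momentary"
    return "command", "momentary"


if __name__ == "__main__":
    for utterance in [
        "Could you play some jazz?",
        "Turn off the lights",
        "First chop the onions, then boil the pasta",
    ]:
        print(utterance, "->", conversation_type(classify_tokens(tokenize(utterance))))
```

Keeping tokenization, token classification, and conversation typing as separate functions mirrors the order of steps recited in claim 13. Any one stage can then be swapped for a trained model without touching the others.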
Word spotting · CPC title
using artificial neural networks · CPC title
using statistical methods · CPC title
Lexical analysis, e.g. tokenisation or collocates · CPC title
Natural language query formulation · CPC title