System to provide natural utterance by a voice assistant and method thereof

US12555569B2 · US · B2

Patent metadata

Publication number: US-12555569-B2
Application number: US-202318230921-A
Country: US
Kind code: B2
Filing date: Aug 7, 2023
Priority date: Jul 25, 2022
Publication date: Feb 17, 2026
Grant date: Feb 17, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Provided is a system to provide natural utterance by a voice assistant and method thereof, wherein the system comprises an automatic speech recognition module for converting one or more unsegmented voice inputs into textual format in real-time. Further, a natural language understanding module extracts the information and intent of the user from the converted textual inputs, wherein the natural language understanding module comprises a communication classification unit for classifying the user inputs into one or more pre-defined classes. Further, the system comprises a processing module for analyzing and processing the inputs from the natural language understanding module and activity identification module, wherein the processing module provides real-time intuitive mingling responses based on the responses, contextual pauses and ongoing activity of the user.
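
The abstract's communication classification unit, which claims 3, 4, and 13 spell out further, can be pictured with a minimal Python sketch. The patent does not disclose how the classification is performed, so the cue-word sets, the ranking rule, and the names `tokenize`, `classify`, and `Classification` are hypothetical stand-ins. The sketch splits the transcript into word tokens, labels the tokens that match a cue, and maps the result either to a request or command (a momentary communication) or to an instruction (a prolonged communication).

```python
import re
from dataclasses import dataclass

# Hypothetical cue words; the patent does not disclose its vocabulary
# or its classifier.
INSTRUCTION_CUES = {"guide", "teach", "walk", "recipe", "steps"}
COMMAND_CUES = {"turn", "set", "stop", "play", "open", "switch"}
REQUEST_CUES = {"what", "when", "where", "who", "how", "tell", "show"}


@dataclass
class Classification:
    tokens: list[str]
    token_classes: dict[str, str]
    conversation_class: str  # "request", "command" or "instruction"
    conversation_type: str   # "momentary" or "prolonged"


def tokenize(text: str) -> list[str]:
    """Split the transcript into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def classify(text: str) -> Classification:
    tokens = tokenize(text)
    # Label each token that matches one of the cue lists.
    token_classes: dict[str, str] = {}
    for tok in tokens:
        if tok in INSTRUCTION_CUES:
            token_classes[tok] = "instruction"
        elif tok in COMMAND_CUES:
            token_classes[tok] = "command"
        elif tok in REQUEST_CUES:
            token_classes[tok] = "request"
    # Simplified stand-in for "context and relation between tokens":
    # an instruction cue outranks a command cue, and "request" is the
    # default when neither is present.
    labels = set(token_classes.values())
    if "instruction" in labels:
        cls = "instruction"
    elif "command" in labels:
        cls = "command"
    else:
        cls = "request"
    # Claims 3 and 4: requests and commands are momentary,
    # instructions are prolonged.
    kind = "prolonged" if cls == "instruction" else "momentary"
    return Classification(tokens, token_classes, cls, kind)
```

For example, `classify("Turn off the lights").conversation_type` returns `"momentary"`, while a cooking walkthrough such as "Walk me through the recipe steps" is classed as a prolonged instruction. A production system would replace the keyword lookup with a trained model.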

First claim

Opening claim text (preview).

What is claimed is:

1. A method of providing natural utterance by a voice assistant, the method comprising: receiving, through a microphone, at least one unsegmented voice input from a user; converting the at least one unsegmented voice input into a textual format; extracting user information from the textual format; classifying the extracted user information into one or more pre-defined classes; subsequent to the classifying the extracted user data, tracking activity of the user and identifying an environment of the user based on data received through one or more sensors including the microphone; calculating at least one contextual natural utterance interval during conversations between the user and the voice assistant; prioritizing and sequencing at least one contextual response to the at least one unsegmented voice input, wherein the at least one contextual response is fetched from a virtual server; and providing one or more responses based on the at least one contextual natural utterance interval and ongoing environmental activity detected in the identified environment of the user.

2. The method of claim 1, further comprising: detecting an utterance of a wake word, wherein the at least one unsegmented voice input received from the user is received after detection of the wake word.

3. The method of claim 1, wherein the one or more pre-defined classes comprise a request and a command, and wherein the request and the command pre-defined classes each comprise a momentary communication containing at least one prompt response from the voice assistant.

4. The method of claim 1, wherein the one or more pre-defined classes comprise an instruction comprising a prolonged communication between the user and the voice assistant and further comprising a plurality of contextual responses.

5. The method of claim 1, further comprising: subsequent to the prioritizing and sequencing of the at least one contextual response to the at least one unsegmented voice input received from the user, identifying an active listening state of the user and performing at least one task based on the identified active listening state of the user and the at least one contextual natural utterance interval.

6. The method of claim 1, wherein the providing one or more responses further comprises: using machine pre-trained deep neural network.

7. A system for providing natural utterances by a voice assistant, the system comprising: a microphone; at least one memory storing at least one instruction; at least one processor in communication with the at least one memory and configured to execute the at least one instruction to: receive, through the microphone, at least one unsegmented voice input from a user, identify an utterance of a wake word received through the microphone, based on identifying the utterance of the wake word, convert the at least one unsegmented voice input into a textual format, extract user information from the textual format, classify the extracted user information into one or more pre-defined classes, subsequent to classifying the extracted user data, tracking activity of the user and identifying an environment of the user based on data received through one or more sensors including the microphone, calculate at least one contextual natural utterance interval during conversations between the user and the voice assistant, prioritize and sequence at least one contextual response to the at least one unsegmented voice input, wherein the at least one contextual response is fetched from a virtual server, and identify an active listening state of the user and perform at least one task based on the identified active listening state of the user.

8. The system of claim 7, wherein the one or more pre-defined classes comprise a request and a command, and wherein the request pre-defined class and the command pre-defined class each comprise a momentary communication triggering a prompt response from the voice assistant.

9. The system of claim 7, wherein the one or more pre-defined classes comprise an instruction comprising a prolonged communication between the user and the voice assistant and further comprising a plurality of contextual responses.

10. The system of claim 7, wherein the at least one processor is further configured to execute the at least one instruction to: fetch the at least one contextual response from the virtual server based on the extracted user information.

11. The system of claim 7, wherein the at least one processor is further configured to execute the at least one instruction to: convert the at least one contextual response from a text format into a speech format.

12. The system of claim 7, wherein the at least one processor is further configured to execute the at least one instruction to: convert the at least one unsegmented voice input into the textual format by detecting positive, negative and neutral sentiments present in the at least one unsegmented voice input provided by the user, and extract the user information from the textual format by classifying an intent of the user based on sentiments detected in the at least one unsegmented voice input, and by determining one or more subjects present in the at least one unsegmented voice input.

13. The system of claim 7, wherein the at least one processor is further configured to execute the at least one instruction to classify the extracted user information into one or more pre-defined classes by: splitting the converted text format of the at least one unsegmented voice input provided by a user into one or more tokens, wherein each token represents a word from the converted text format of the at least one unsegmented voice input provided by a user, classifying the tokens into one or more pre-defined classes, determining a context and a relation between the classified tokens, and determining a type of conversation based on determined context and the determined relation between the classified tokens.

14. The system of claim 7, wherein the at least one processor is further configured to execute the at least one instruction to: receive and organize unorganized raw data from a plurality of input devices located in a pre-defined vicinity, extract, from the organized data, information pertaining to the user, determine an intended action of the user based on the extracted information, identify the environment of the user based on the organized data, receive data pertaining to the environment of the user and based on the received data pertaining to the environment of the user, determine the location of the user, and classify data related to the user into activity type, activity state and environment based on at least one of the extracted information pertaining to the user, the determination of the environment of the user, and the determined location of the user.

15. A non-transitory computer readable medium having instructions stored therein, which when executed by at least one processor cause the at least one processor to execute a method of providing natural utterances by a voice assistant, the method comprising: receiving, through a microphone, at least one unsegmented voice input from a user; converting the at least one unsegmented voice input into a textual format; extracting user information from the textual format; classifying the extracted user information into one or more pre-defined classes; subsequent to the classifying the extracted user data, tracking activity of the user and identifying an environment of the user based on data received throug…
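
Claims 1, 5, and 7 call for calculating a contextual natural utterance interval, prioritizing and sequencing contextual responses, and acting on the user's active listening state, but they do not say how any of these are computed. The Python sketch below is one hypothetical reading. The interval is the median of the user's recent pauses scaled by an activity factor. Responses are ordered by priority and then by the order in which they were fetched. Nothing is scheduled while the user is not listening. The activity names, factor values, the 0.5 s floor, and the names `utterance_interval`, `Response`, `sequence`, and `schedule` are illustrative assumptions, not the patented method.

```python
from dataclasses import dataclass, field
from statistics import median

# Hypothetical activity factors: a busier user gets a longer pause.
# The patent does not specify these values.
ACTIVITY_FACTOR = {"idle": 1.0, "cooking": 1.5, "exercising": 2.0}


def utterance_interval(pauses_s: list[float], activity: str,
                       floor_s: float = 0.5) -> float:
    """Estimate a contextual natural utterance interval in seconds.

    Assumption: the median of the user's recent pauses, scaled by the
    ongoing activity and never shorter than floor_s.
    """
    base = median(pauses_s) if pauses_s else floor_s
    return max(floor_s, base * ACTIVITY_FACTOR.get(activity, 1.0))


@dataclass(order=True)
class Response:
    priority: int                     # lower value is delivered first
    step: int                         # fetched order, breaks priority ties
    text: str = field(compare=False)  # excluded from ordering


def sequence(responses: list[Response]) -> list[Response]:
    """Prioritize and sequence responses by (priority, step)."""
    return sorted(responses)


def schedule(responses: list[Response], pauses_s: list[float],
             activity: str, listening: bool) -> list[tuple[float, str]]:
    """Pair each response with the pause to wait before speaking it.

    If the user is not in an active listening state, nothing is
    scheduled; a fuller system would retry later.
    """
    if not listening:
        return []
    gap = utterance_interval(pauses_s, activity)
    return [(gap, r.text) for r in sequence(responses)]
```

For example, `schedule([Response(1, 0, "Preheat the oven.")], [1.0], "cooking", listening=True)` returns `[(1.5, "Preheat the oven.")]`. Using the median rather than the mean keeps one unusually long silence from stretching every later pause.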

Assignees

Samsung Electronics Co Ltd

Inventors

Classifications

  • Word spotting · CPC title

  • using artificial neural networks · CPC title

  • using statistical methods · CPC title

  • Lexical analysis, e.g. tokenisation or collocates · CPC title

  • Natural language query formulation · CPC title


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12555569B2 cover?
A voice assistant system and method that converts unsegmented speech to text in real time, extracts the user's information and intent, and classifies the input into pre-defined classes such as request, command, or instruction. It also tracks the user's activity and environment through sensors, then times and sequences its responses around contextual natural pauses and the user's ongoing activity. See the Abstract above for the full text.
Who is the assignee on this patent?
Samsung Electronics Co Ltd
What technology area does this patent fall under?
Primary CPC classification G10L15/08. Mapped technology areas include Physics.
When was this patent published?
Published Feb 17, 2026 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).