What technology area does this patent fall under?

Primary CPC classification G10L15/22. Mapped technology areas include Physics.

When was this patent published?

Publication date Jul 25, 2017 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Reducing the need for manual start/end-pointing and trigger phrases

US9715875B2 · US · B2

Patent metadata
Field	Value
Publication number	US-9715875-B2
Application number	US-201414502737-A
Country	US
Kind code	B2
Filing date	Sep 30, 2014
Priority date	May 30, 2014
Publication date	Jul 25, 2017
Grant date	Jul 25, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and processes for selectively processing and responding to a spoken user input are provided. In one example, audio input containing a spoken user input can be received at a user device. The spoken user input can be identified from the audio input by identifying start and end-points of the spoken user input. It can be determined whether or not the spoken user input was intended for a virtual assistant based on contextual information. The determination can be made using a rule-based system or a probabilistic system. If it is determined that the spoken user input was intended for the virtual assistant, the spoken user input can be processed and an appropriate response can be generated. If it is instead determined that the spoken user input was not intended for the virtual assistant, the spoken user input can be ignored and/or no response can be generated.
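The abstract says the spoken user input is identified by finding its start and end-points, but the text on this page does not say how. As a minimal sketch only, assuming a simple per-frame energy threshold (a common textbook approach, not something this page attributes to the patent), end-pointing could look like the following. The function name `find_endpoints`, its parameters, and the default values are all hypothetical.

```python
from typing import Optional, Sequence, Tuple


def find_endpoints(
    frame_energies: Sequence[float],
    energy_threshold: float = 0.1,    # hypothetical: energy at or above this counts as speech
    min_trailing_silence: int = 3,    # hypothetical: quiet frames needed to close the segment
) -> Optional[Tuple[int, int]]:
    """Return (start, end) frame indices of the first speech segment, or None."""
    start: Optional[int] = None
    last_voiced: Optional[int] = None
    silence_run = 0

    for i, energy in enumerate(frame_energies):
        voiced = energy >= energy_threshold
        if start is None:
            # Still waiting for the start-point.
            if voiced:
                start = i
                last_voiced = i
            continue
        if voiced:
            last_voiced = i
            silence_run = 0
        else:
            silence_run += 1
            if silence_run >= min_trailing_silence:
                # Enough trailing silence: the end-point is the last voiced frame.
                return start, last_voiced

    if start is None:
        return None
    # Audio ended while speech was still open; close the segment at the last voiced frame.
    return start, last_voiced


energies = [0.0, 0.02, 0.5, 0.6, 0.05, 0.7, 0.0, 0.0, 0.0, 0.4]
segment = find_endpoints(energies)
```

A short dip below the threshold (frame 4 in the sample) does not end the segment, because the end-point is only declared after several consecutive quiet frames.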

First claim

Opening claim text (preview).

What is claimed is:
1. A method for operating a virtual assistant on an electronic device, the method comprising: receiving, at the electronic device, an audio input; monitoring the audio input to identify a first spoken user input; identifying the first spoken user input in the audio input; determining whether to respond to the first spoken user input based on contextual information associated with the first spoken user input, wherein the contextual information comprises a determined distance between a user and the electronic device when the first spoken user input was received, the determined distance being based on the first spoken user input, wherein determining whether to respond to the first spoken user input comprises calculating a likelihood score that the virtual assistant should respond to the first spoken user input based on the contextual information associated with the first spoken user input, comparing the likelihood score to a threshold value, decreasing the likelihood score in response to the distance being greater than a threshold distance, and increasing the likelihood score in response to the distance being less than the threshold distance; in response to a determination to respond to the first spoken user input: generating a response to the first spoken user input; and monitoring the audio input to identify a second spoken user input; and in response to a determination not to respond to the first spoken user input, monitoring the audio input to identify the second spoken user input without generating the response to the first spoken user input.
2. The method of claim 1, wherein determining whether to respond to the first spoken user input based on contextual information associated with the first spoken user input excludes identifying one or more predetermined words at the start of the first spoken user input.
3. The method of claim 1, wherein generating the response to the first spoken user input comprises one or more of: performing speech-to-text conversion on the first spoken user input; determining a user intent based on the first spoken user input; determining a task to be performed based on the first spoken user input; determining a parameter for the task to be performed based on the first spoken user input; performing the task to be performed; displaying a text response to the first spoken user input; and outputting an audio response to the first spoken user input.
4. The method of claim 1, wherein determining whether to respond to the first spoken user input based on contextual information associated with the first spoken user input comprises: evaluating one or more conditional rules that depend on the contextual information associated with the first spoken user input.
5. The method of claim 1, further comprising determining the distance between the user and the electronic device.
6. The method of claim 5, wherein determining the distance between the user and the electronic device is based on distance data provided by a proximity sensor of the electronic device.
7. A non-transitory computer-readable storage medium comprising instructions for: receiving an audio input; monitoring the audio input to identify a first spoken user input; identifying the first spoken user input in the audio input; determining whether to respond to the first spoken user input based on contextual information associated with the first spoken user input, wherein the contextual information comprises a determined distance between a user and the electronic device when the first spoken user input was received, the determined distance being based on the first spoken user input, wherein determining whether to respond to the first spoken user input comprises calculating a likelihood score that the virtual assistant should respond to the first spoken user input based on the contextual information associated with the first spoken user input, comparing the likelihood score to a threshold value, decreasing the likelihood score in response to the distance being greater than a threshold distance, and increasing the likelihood score in response to the distance being less than the threshold distance; responsive to a determination to respond to the first spoken user input: generating a response to the first spoken user input; and monitoring the audio input to identify a second spoken user input; and responsive to a determination not to respond to the first spoken user input, monitoring the audio input to identify the second spoken user input without generating the response to the first spoken user input.
8. A system comprising: one or more processors; memory; and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for: receiving an audio input; monitoring the audio input to identify a first spoken user input; identifying the first spoken user input in the audio input; determining whether to respond to the first spoken user input based on contextual information associated with the first spoken user input, wherein the contextual information comprises a determined distance between a user and the electronic device when the first spoken user input was received, the determined distance being based on the first spoken user input, wherein determining whether to respond to the first spoken user input comprises calculating a likelihood score that the virtual assistant should respond to the first spoken user input based on the contextual information associated with the first spoken user input, comparing the likelihood score to a threshold value, decreasing the likelihood score in response to the distance being greater than a threshold distance, and increasing the likelihood score in response to the distance being less than the threshold distance; responsive to a determination to respond to the first spoken user input: generating a response to the first spoken user input; and monitoring the audio input to identify a second spoken user input; and responsive to a determination not to respond to the first spoken user input, monitoring the audio input to identify the second spoken user input without generating the response to the first spoken user input.
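The scoring step recited in the independent claims (calculate a likelihood score, lower it when the user is farther than a threshold distance, raise it when closer, then compare against a threshold value) can be illustrated with a short sketch. This is an illustration under stated assumptions, not the patented implementation: the starting score, the adjustment size, the units, the `>=` comparison, and every name below (`Context`, `likelihood_score`, `should_respond`) are hypothetical, since the claims give no numeric values.

```python
from dataclasses import dataclass


@dataclass
class Context:
    """Contextual information for one spoken input (hypothetical structure)."""
    distance_m: float  # estimated user-to-device distance; metres are an assumption


def likelihood_score(
    context: Context,
    base_score: float = 0.5,            # hypothetical starting score
    threshold_distance_m: float = 1.0,  # hypothetical threshold distance
    adjustment: float = 0.2,            # hypothetical size of each adjustment
) -> float:
    """Adjust the score down for far users and up for near users."""
    score = base_score
    if context.distance_m > threshold_distance_m:
        score -= adjustment
    elif context.distance_m < threshold_distance_m:
        score += adjustment
    return score


def should_respond(context: Context, threshold_value: float = 0.5) -> bool:
    """Compare the score to a threshold value; using >= is an assumption."""
    return likelihood_score(context) >= threshold_value


near = should_respond(Context(distance_m=0.3))
far = should_respond(Context(distance_m=3.0))
```

With these placeholder values a user 0.3 m away yields a decision to respond and a user 3 m away does not. In either case the claims have the device continue monitoring the audio input for a second spoken user input.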

Assignees

Apple Inc

Inventors

Classifications

G06F2203/0381
Multimodal input, i.e. interface arrangements enabling the user to issue commands by simultaneous use of input devices of different nature, e.g. voice plus gesture on digitizer
G06F3/167
Audio in a user interface, e.g. using voice commands for navigating, audio feedback
G10L15/1822
Parsing for meaning understanding
G06F3/013
Eye tracking input arrangements (G06F3/015 takes precedence)
G10L15/1815
Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning

Patent family

Related publications grouped by family.

Patent family ID: 53366324

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9715875B2 cover?: Systems and processes for selectively processing and responding to a spoken user input are provided. In one example, audio input containing a spoken user input can be received at a user device. The spoken user input can be identified from the audio input by identifying start and end-points of the spoken user input. It can be determined whether or not the spoken user input was intended for a vir…
Who is the assignee on this patent?: Apple Inc