Method for using pauses detected in speech input to assist in interpreting the input during conversational interaction for information retrieval

US9799328B2 · US · B2

Patent metadata
Publication number: US-9799328-B2
Application number: US-201313801837-A
Country: US
Kind code: B2
Filing date: Mar 13, 2013
Priority date: Aug 3, 2012
Publication date: Oct 24, 2017
Grant date: Oct 24, 2017

Abstract

Official abstract text for this publication.

A method for using speech disfluencies detected in speech input to assist in interpreting the input is provided. The method includes providing access to a set of content items with metadata describing the content items, and receiving a speech input intended to identify a desired content item. The method further includes detecting a speech disfluency in the speech input and determining a measure of confidence of a user in a portion of the speech input following the speech disfluency. If the confidence measure is lower than a threshold value, the method includes determining an alternative query input based on replacing the portion of the speech input following the speech disfluency with another word or phrase. The method further includes selecting content items based on comparing the speech input, the alternative query input (when the confidence measure is low), and the metadata associated with the content items.
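
The flow described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `ContentItem` schema, the phrase-set matching rule, the numeric confidence values, and the list of candidate replacement phrases are all assumptions made for illustration, and the catalog entries are placeholders.

```python
from dataclasses import dataclass


@dataclass
class ContentItem:
    title: str
    metadata: set  # lowercase descriptive phrases (hypothetical schema)


def build_queries(before, after, confidence, threshold, alternatives):
    """Return the query inputs to compare against item metadata.

    `before` / `after` are the portions of the input on either side of the
    disfluency. `alternatives` stands in for whatever replacement phrases
    the system proposes; how they are chosen is not modeled here.
    """
    queries = [(before, after)]  # the speech input as spoken
    if confidence < threshold:
        # Low confidence: also try the input with the trailing portion replaced.
        queries.extend((before, alt) for alt in alternatives)
    return queries


def select_items(items, queries):
    """Keep items whose metadata contains every phrase of at least one query."""
    return [
        item for item in items
        if any(all(phrase in item.metadata for phrase in query) for query in queries)
    ]


# Placeholder catalog; titles and names are invented for the example.
catalog = [
    ContentItem("Item A", {"thriller", "jane doe"}),
    ContentItem("Item B", {"thriller", "john roe"}),
    ContentItem("Item C", {"comedy", "john roe"}),
]

confident = build_queries("thriller", "jane doe", 0.9, 0.5, ["john roe"])
hesitant = build_queries("thriller", "jane doe", 0.2, 0.5, ["john roe"])

print([i.title for i in select_items(catalog, confident)])  # ['Item A']
print([i.title for i in select_items(catalog, hesitant)])   # ['Item A', 'Item B']
```

With a confident speaker only the spoken query is matched. When confidence in the portion after the disfluency falls below the threshold, the alternative query widens the result set.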

First claim

Opening claim text (preview).

What is claimed is:

1. A method for using speech disfluencies detected in speech input to assist in interpreting the input, the method comprising: providing access to a set of content items, each of the content items being associated with metadata that describes the corresponding content item; receiving a speech input from a user, the input intended by the user to identify at least one desired content item; detecting a speech disfluency in the speech input; computing a first search priority for a first portion of the speech input following the speech disfluency and a second search priority for a second portion of the speech input preceding the speech disfluency, wherein each of the first search priority and the second search priority is computed based on a measure of the disfluency; in response to determining that the first search priority is less than the second search priority, determining whether the first search priority is lower than a threshold minimum search priority; in response to determining that the first search priority is lower than the threshold minimum search priority, determining an alternative query input by automatically replacing the first portion of the speech input following the speech disfluency with another word or phrase; selecting a subset of content items from the set of content items based on comparing the speech input, the alternative query input, and the metadata associated with the subset of content items; and presenting the subset of content items to the user.

2. The method of claim 1, wherein the speech disfluency is a pause or an auditory time filler.

3. The method of claim 1, further comprising providing a user preference signature, the user preference signature describing preferences of the user for at least one of (i) particular content items and (ii) metadata associated with the content items, wherein each of the content items is associated with metadata that describes the corresponding content items and wherein the first portion of the speech input that is replaced is selected based on the user preference signature.

4. A method for using speech disfluencies detected in speech input to assist in interpreting the input, the method comprising: providing access to a set of content items, each of the content items being associated with metadata that describes the corresponding content item; receiving a speech input from a user, the input intended by the user to identify at least one desired content item; detecting a pause in the speech input, wherein the pause is a period of silence, and wherein the detecting comprises: identifying a start time when a sound intensity of the speech input decreases to a first value that is below a threshold cut-off intensity; identifying an end time when the sound intensity of the speech input increases to a second value that is greater than a threshold cut-off intensity; and computing a duration of the pause in the speech input based on a difference between the end time and the start time; in response to determining that the duration of the pause is less than a threshold minimum duration, assigning a higher weight to a first portion of the speech input following the pause than a second portion of the speech input preceding the pause; selecting a subset of content items based on the assigned weight by comparing the speech input and the metadata associated with the subset of content items; and presenting the subset of content items to the user.

5. The method of claim 4, further comprising inferring that the portion of the speech input following the pause is a title or a common phrase.

6. The method of claim 4, wherein detecting the pause further comprises: comparing the sound intensity of the speech input to the threshold cut-off intensity; determining, based on the comparing, that the sound intensity of the speech input is less than the threshold cut-off intensity; determining a length of time for which the sound intensity of the speech input is less than the threshold cut-off intensity; comparing the length of time to a minimum pause period; and determining that the length of time exceeds the minimum pause period.

7. The method of claim 6, wherein the minimum pause period is associated with a speed of the speech input.

8. The method of claim 6, wherein the threshold cut-off intensity is determined based on an average sound intensity of the speech input and a sound intensity of background noise.

9. A system for using speech disfluencies detected in speech input to assist in interpreting the input, the system comprising control circuitry configured to: provide access to a set of content items, each of the content items being associated with metadata that describes the corresponding content item; receive a speech input from a user, the input intended by the user to identify at least one desired content item; detect a speech disfluency in the speech input; compute a first search priority for a first portion of the speech input following the speech disfluency and a second search priority for a second portion of the speech input preceding the speech disfluency, wherein each of the first search priority and the second search priority is computed based on a measure of the disfluency; in response to determining that the first search priority is less than the second search priority, determine whether the first search priority is lower than a threshold minimum search priority; in response to determining that the first search priority is lower than the threshold minimum search priority, determine an alternative query input by automatically replacing the first portion of the speech input following the speech disfluency with another word or phrase; select a subset of content items from the set of content items based on comparing the speech input, the alternative query input, and the metadata associated with the subset of content items; and present the subset of content items to the user.

10. The system of claim 9, wherein the speech disfluency is a pause or an auditory time filler.

11. The system of claim 9, wherein the control circuitry is further configured to provide a user preference signature, the user preference signature describing preferences of the user for at least one of (i) particular content items and (ii) metadata associated with the content items and wherein each of the content items is associated with metadata that describes the corresponding content items and wherein the first portion of the speech input that is replaced is selected based on the user preference signature.

12. A system for using speech disfluencies detected in speech input to assist in interpreting the input, the system comprising control circuitry configured to: provide access to a set of content items, each of the content items being associated with metadata that describes the corresponding content item; receive a speech input from a user, the input intended by the user to identify at least one desired content item; detect a pause in the speech input, wherein the pause is a period of silence, and wherein the control circuitry configured to detect the pause in the speech input is further configured to: identify a start time when a sound intensity of the speech input decreases to a first value that is below a threshold cut-off intensity; identify an end time when the sound intensity of the speech input increases to a second value that is greater than a threshold cut-off intensity; and compute a duration of the pause in the speech input based on a difference between the end time and the start time; in response to determining that the duration of the pause is less than a threshold minim…
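
The pause-detection steps recited in claims 4, 6, and 8 can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: the input is taken to be a list of per-frame intensity values, times are counted in whole milliseconds, the cut-off is assumed to be the midpoint between average intensity and background noise (the claims say only that it is "based on" both), and the weight values, including the equal weights used when the pause is not short, are arbitrary placeholders. All function and parameter names are hypothetical.

```python
def cutoff_intensity(samples, noise_level):
    """Assumed cut-off: midpoint of average input intensity and background noise."""
    return (sum(samples) / len(samples) + noise_level) / 2


def detect_pauses(samples, frame_ms, cutoff, min_pause_ms):
    """Return (start_ms, end_ms, duration_ms) for each sufficiently long silence.

    A pause starts when intensity drops below `cutoff` and ends when it rises
    above `cutoff` again. Runs not longer than `min_pause_ms` are ignored.
    """
    pauses = []
    start = None
    for i, level in enumerate(samples):
        t = i * frame_ms
        if level < cutoff and start is None:
            start = t  # start time: intensity fell below the cut-off
        elif level > cutoff and start is not None:
            duration = t - start  # end time minus start time
            if duration > min_pause_ms:
                pauses.append((start, t, duration))
            start = None
    return pauses


def portion_weights(pause_ms, threshold_ms):
    """Return (weight_before, weight_after) for the portions around a pause.

    A pause shorter than the threshold gives the following portion the higher
    weight. The equal-weight fallback is an assumption of this sketch.
    """
    if pause_ms < threshold_ms:
        return (1.0, 2.0)
    return (1.0, 1.0)


intensities = [9, 8, 1, 1, 1, 9, 9, 1, 9]  # one value per 100 ms frame
pauses = detect_pauses(intensities, frame_ms=100, cutoff=5, min_pause_ms=150)
print(pauses)                     # [(200, 500, 300)]
print(portion_weights(300, 500))  # (1.0, 2.0)
```

The single-frame dip at 700 ms is shorter than the minimum pause period, so it is not reported as a pause.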

Assignees

Veveo Inc

Inventors

Classifications

  • G10L15/187 (Primary)

    Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams · CPC title

  • Physics · mapped topic

  • Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning · CPC title

  • Parsing for meaning understanding · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9799328B2 cover?
A method for using speech disfluencies detected in speech input to assist in interpreting the input is provided. The method includes providing access to a set of content items with metadata describing the content items, and receiving a speech input intended to identify a desired content item. The method further includes detecting a speech disfluency in the speech input and determining a measure…
Who is the assignee on this patent?
Veveo Inc
What technology area does this patent fall under?
Primary CPC classification G10L15/187. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 24, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).