Speaker verification using co-location information

US2016019889A1 · US · A1

Patent metadata
Field               Value
Publication number  US-2016019889-A1
Application number  US-201514805687-A
Country             US
Kind code           A1
Filing date         Jul 22, 2015
Priority date       Jul 18, 2014
Publication date    Jan 21, 2016
Grant date          (none listed)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for identifying a user in a multi-user environment. One of the methods includes receiving, by a first user device, an audio signal encoding an utterance, obtaining, by the first user device, a first speaker model for a first user of the first user device, obtaining, by the first user device for a second user of a second user device that is co-located with the first user device, a second speaker model for the second user or a second score that indicates a respective likelihood that the utterance was spoken by the second user, and determining, by the first user device, that the utterance was spoken by the first user using (i) the first speaker model and the second speaker model or (ii) the first speaker model and the second score.
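
The decision described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `spoken_by_first_user`, the device identifiers, the example score values, and the strict highest-score-wins rule (modeled on the comparison recited in claim 7) are all hypothetical choices made for the example.

```python
from typing import Mapping


def spoken_by_first_user(first_score: float,
                         second_scores: Mapping[str, float]) -> bool:
    """Return True when the first user's score beats every co-located score.

    first_score: likelihood that the utterance came from the first user.
    second_scores: hypothetical mapping from a co-located device id to the
        likelihood that the utterance came from that device's user.
    Assumption: a tie is treated as "not the first user".
    """
    return all(first_score > score for score in second_scores.values())


# Hypothetical scores reported for two co-located devices.
scores_from_colocated = {"device_b": 0.41, "device_c": 0.18}

accepted = spoken_by_first_user(0.76, scores_from_colocated)
rejected = spoken_by_first_user(0.35, scores_from_colocated)
```

With no co-located devices the mapping is empty and the function accepts the utterance. A real system would presumably also apply an absolute acceptance threshold to the first score, which this sketch leaves out.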

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: obtaining, by a first user device, a first score that indicates a respective likelihood that an utterance encoded in an audio signal was spoken by a first user of the first user device; obtaining, by the first user device for each of one or more second user devices that are each co-located with the first user device, a second score that indicates a respective likelihood that the utterance was spoken by a respective second user, wherein the respective second user device is used by the respective second user; determining, by the first user device, that the utterance was spoken by the first user using the first score that indicates a respective likelihood that the utterance was spoken by the first user of the first user device and the second scores that each indicate a respective likelihood that the utterance was spoken by the respective second user; and performing an action that corresponds with a spoken command encoded in the audio signal in response to determining that the utterance was spoken by the first user.

2. The method of claim 1 comprising determining, by the first user device, that each of the second user devices is co-located with the first user device.

3. The method of claim 2 wherein determining, by the first user device, that each of the second user devices is co-located with the first user device comprises determining, by the first user device, that the second user device is co-located in a physical area near a physical location of the first user device.

4. The method of claim 2 comprising determining, by the first user device, whether the first user device has one or more settings that allow the first user device access to the second score that indicates a respective likelihood that the utterance was spoken by the second user in response to determining that the second user device used by the second user is co-located with the first user device, wherein obtaining, by the first user device, the second score that indicates a respective likelihood that the utterance was spoken by the second user comprises obtaining, by the first user device, the second score that indicates a respective likelihood that the utterance was spoken by the second user in response to determining that the first user device has one or more settings that allow the first user device access to the second score.

5. The method of claim 1 comprising: generating, by the first user device, the first score that indicates a likelihood that the utterance was spoken by the first user using a portion of the audio signal and a first speaker model that is specific to the first user.

6. The method of claim 5 wherein: determining whether the first user device has the one or more settings that allow the first user device access to the second score that indicates the respective likelihood that the utterance was spoken by the second user comprises determining whether the first user device has one or more settings that allow the first user device access to a second speaker model that is specific to the second user; and obtaining, by the first user device, the second score that indicates a respective likelihood that the utterance was spoken by the second user comprises: obtaining, by the first user device, the second speaker model that is specific to the second user; and generating, by the first user device, the second score that indicates a respective likelihood that the utterance was spoken by the second user using a portion of the audio signal and the second speaker model that is specific to the second user.

7. The method of claim 6 comprising: comparing the first score with the second score to determine a highest score, wherein determining that the utterance was spoken by the first user comprises determining that the first score is the highest score.

8. The method of claim 1 wherein obtaining, by the first user device, a second score that indicates a respective likelihood that the utterance was spoken by the second user comprises: receiving, by the first user device, the second score that indicates a respective likelihood that the utterance was spoken by the second user from another device.

9. The method of claim 8 wherein receiving, by the first user device, the second score that indicates a respective likelihood that the utterance was spoken by the second user from another device comprises receiving the second score from a server.

10. The method of claim 8 wherein receiving, by the first user device, the second score that indicates a respective likelihood that the utterance was spoken by the second user from another device comprises receiving the second score from the second user device.

11. The method of claim 1 comprising: determining, by the first user device, one or more third speaker models, associated with the first user device, for other people who may be located in a physical area near a physical location of the first user device; and determining, by the first user device, that the utterance was spoken by the first user using the first score that indicates a respective likelihood that the utterance was spoken by the first user of the first user device, the second score that indicates a respective likelihood that the utterance was spoken by the second user associated with the second user device, and the third speaker models for other people who may be located in a physical area near a physical location of the first user device.

12. The method of claim 11 comprising: generating, by the first user device for each of the third speaker models, a respective third score using the respective third speaker model and a portion of the audio signal; and comparing, by the first user device, the first score, the second score, and the third scores to determine a highest score.

13. The method of claim 11 comprising: receiving, by the first user device for each of the third speaker models, a respective third score from a server; and comparing, by the first user device, the first score, the second score, and the third scores to determine a highest score.

14. The method of claim 11 comprising: determining, by the first user device for a third user device, a frequency with which the third user device is located in a physical area near a physical location of the first user device; determining, by the first user device, whether the frequency satisfies a threshold frequency; and associating, by the first user device, a third speaker model specific to a third user of the third user device with the first user device in response to determining that the frequency satisfies the threshold frequency.

15. The method of claim 14 wherein associating, by the first user device, the third speaker model specific to the third user of the third user device with the first user device comprises storing the third speaker model in a memory of the first user device.

16. The method of claim 14 wherein associating, by the first user device, the third speaker model specific to the third user of the third user device with the first user device comprises sending, by the first user device to a server, a message indicating that the third speaker model should be associated with the first user device.

17. A computer-implemented method comprising: receiving, by a first user device, an audio signal encoding an utterance; obtaining, by the first user device, a first score that indicates a respective likelihood that the utterance was spoken by the first user device; determining, by the first user device
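
The frequency test in claim 14 can be illustrated with a short sketch. Everything here is an assumption made for the example: the claim does not say how frequency is measured or what "satisfies" means, so the code counts raw sightings of each nearby device and treats a count at or above the threshold as satisfying it. The name `devices_to_associate` and the device identifiers are hypothetical.

```python
from collections import Counter
from typing import Iterable, Set


def devices_to_associate(sightings: Iterable[str], threshold: int) -> Set[str]:
    """Pick the devices whose speaker models should be associated.

    sightings: one entry per occasion on which a device was observed near
        the first user device (hypothetical log format).
    threshold: minimum number of sightings; "count >= threshold" is an
        assumed reading of "the frequency satisfies a threshold frequency".
    """
    counts = Counter(sightings)
    return {device for device, seen in counts.items() if seen >= threshold}


# Hypothetical log of nearby devices observed over some period.
nearby_log = ["tv", "phone_x", "tv", "tv", "phone_x", "tablet"]
frequent = devices_to_associate(nearby_log, threshold=2)
```

For each selected device, claims 15 and 16 describe two ways to make the association: storing the third speaker model locally, or messaging a server to record it.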

Assignees

Google Inc

Inventors

Classifications

  • G06F21/32 (Primary)

    using biometric data, e.g. fingerprints, iris scans or voiceprints · CPC title

  • Location-sensitive, e.g. geographical location, GPS · CPC title

  • Score normalisation · CPC title

  • Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis (in musical instruments G10H) · CPC title

  • using natural language modelling · CPC title


Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2016019889A1 cover?
Methods, systems, and apparatus, including computer programs encoded on computer storage media, for identifying a user in a multi-user environment. One of the methods includes receiving, by a first user device, an audio signal encoding an utterance, obtaining, by the first user device, a first speaker model for a first user of the first user device, obtaining, by the first user device for a sec…
Who is the assignee on this patent?
Google Inc
What technology area does this patent fall under?
Primary CPC classification G06F21/32. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 21, 2016 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).