Voice data transmission method and apparatus
US-2024363120-A1 · Oct 31, 2024 · US
US9870776B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9870776-B2 |
| Application number | US-201614988884-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 6, 2016 |
| Priority date | Oct 6, 2008 |
| Publication date | Jan 16, 2018 |
| Grant date | Jan 16, 2018 |
Official abstract text for this publication.
A method and system for using conversational biometrics and speaker identification and/or verification to filter voice streams during mixed mode communication. The method includes receiving an audio stream of a communication between participants. Additionally, the method includes filtering the audio stream of the communication into separate audio streams, one for each of the participants. Each of the separate audio streams contains portions of the communication attributable to a respective participant. Furthermore, the method includes outputting the separate audio streams to a storage system.
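The filtering step in the abstract can be pictured with a minimal sketch. It assumes an upstream speaker-identification step has already labelled each segment of the communication with a participant; the `Segment` type, its fields, and `split_by_participant` are hypothetical names used only for illustration and do not come from the patent.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One contiguous piece of the mixed audio stream (hypothetical type)."""
    start: float    # start time in seconds
    end: float      # end time in seconds
    speaker: str    # label assumed to come from an upstream identification step
    samples: tuple  # raw audio samples for this piece


def split_by_participant(segments):
    """Group segments into one time-ordered stream per participant."""
    streams = defaultdict(list)
    for seg in sorted(segments, key=lambda s: s.start):
        streams[seg.speaker].append(seg)
    return dict(streams)


if __name__ == "__main__":
    mixed = [
        Segment(0.0, 1.5, "caller", (0.1, 0.2)),
        Segment(1.5, 3.0, "agent", (0.3, 0.4)),
        Segment(3.0, 4.0, "caller", (0.5,)),
    ]
    for speaker, parts in split_by_participant(mixed).items():
        print(speaker, [(p.start, p.end) for p in parts])
```

Each resulting list holds only the portions attributable to one participant, which is the shape of output the abstract describes sending to a storage system. How the labels are produced is the subject of the claims and is not modelled here.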
Opening claim text (preview).
What is claimed is:

1. A method implemented in a computing system, the method comprising: removing background noise from a communication and performing normalization of the communication to enhance accuracy of the communication; extracting a plurality of audio streams from the enhanced communication, wherein: the plurality of audio streams correspond, respectively, to a plurality of participants in the enhanced communication, and the plurality of audio streams contain portions of the enhanced communication corresponding, respectively, to the plurality of participants; and matching one or more of the portions of the enhanced communication in the plurality of audio streams to voice prints by comparing the plurality of audio streams to only a plurality of the voice prints corresponding to identified participants within the enhanced communication and by utilizing physiological traits and behavioral traits found in speech of the identified participants; adapting a speaker model of the voice prints after successfully matching the one or more of the portions of the enhanced communication in the plurality of audio streams to the voice prints, wherein adapting the speaker model includes capturing long-term voice changes of the identified participants in the voice prints used for the matching.
2. The method of claim 1, further comprising performing a verification process for at least one of the plurality of participants.
3. The method of claim 1, wherein the voice prints include seed phrases received for each of the plurality of participants.
4. The method of claim 1, wherein: providers of the voice prints are each associated with a role; and each of the voice prints is stored in one of a plurality of discrete databases based on the role of the associated provider.
5. The method of claim 1, wherein the matching comprises: matching the one or more of the portions of the enhanced communication to the voice prints using at least one of conversational biometrics, speaker identification and speaker verification; and assigning the one or more portions of the enhanced communication attributable to each of the plurality of participants to the separate audio streams corresponding to each of the plurality of participants.
6. The method of claim 1 further comprising verifying at least one of the plurality of participants by authenticating a match between a voice sample of the at least one of the plurality of participants and a previously stored voice print of the at least one of the plurality of participants.
7. The method of claim 1 further comprising receiving the audio stream of the enhanced communication from a sensor.
8. The method of claim 1 further comprising converting the enhanced communication from speech to text.
9. The method of claim 1, wherein the enhanced communication is a call center communication and the plurality of participants includes at least two of: a caller, an agent and an interactive voice response (IVR) system.
10. The method of claim 9 further comprising collecting seed phrases for the agent and the IVR system prior to the receiving the audio stream.
11. The method of claim 1, wherein the enhanced communication is a medical professional communication and the plurality of participants include at least a medical professional and a patient.
12. The method of claim 1, wherein the extracting comprises using a voice print for speaker identification of all but one of the plurality of participants and a process of elimination for the one of the plurality of participants.
13. The method of claim 1, wherein the extracting the plurality of audio streams comprises one of: performed in real-time and performed in a batch.
14. The method of claim 1, wherein the extracting comprises determining an origin of voice of at least one of the plurality of participants.
15. The method of claim 1, further comprising filtering the audio streams into separate audio streams corresponding, respectively, to each of a plurality of participants in the enhanced communication, wherein each of the separate audio streams contains portions of the enhanced communication attributable to the corresponding one of the plurality of participants.
16. The method of claim 15, further comprising identifying the plurality of participants by matching one or more of the portions of the enhanced communication in each of the separate audio streams to voice prints.
17. The method of claim 16, wherein the matching comprises comparing the separate audio streams to only the plurality of the voice prints corresponding to the identified participants within the enhanced communication.
18. The method of claim 1, wherein the physiological traits includes acoustic patterns of the identified participants which reflect an anatomy of the identified participants.
19. The method of claim 18, wherein the behavioral traits are learned behavioral patterns in the identified participants.
20. The method of claim 19, wherein the behavioral traits comprises voice pitch and speaking style of the identified participants.
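The matching and adaptation steps of claim 1 can be illustrated with a rough sketch. The claims do not specify a feature representation, a similarity measure, or an update rule, so everything concrete here is an assumption: voice prints are modelled as plain feature vectors, cosine similarity stands in for the comparison, and an exponential moving average stands in for speaker-model adaptation. The names `match_and_adapt`, `threshold`, and `rate` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_and_adapt(features, voice_prints, candidates, threshold=0.8, rate=0.1):
    """Match a portion's features against only the candidate voice prints.

    voice_prints maps participant id -> feature vector (assumed format).
    candidates lists the participants identified as being in this
    communication; prints of anyone else are never compared.
    On a successful match the stored print is nudged toward the new
    features, a stand-in for capturing long-term voice changes.
    Returns the matched participant id, or None if nothing clears threshold.
    """
    best_id, best_score = None, threshold
    for pid in candidates:
        score = cosine_similarity(features, voice_prints[pid])
        if score >= best_score:
            best_id, best_score = pid, score
    if best_id is not None:
        old = voice_prints[best_id]
        voice_prints[best_id] = [
            (1 - rate) * o + rate * f for o, f in zip(old, features)
        ]
    return best_id


if __name__ == "__main__":
    prints = {"agent": [1.0, 0.0], "caller": [0.0, 1.0], "other": [0.7, 0.7]}
    who = match_and_adapt([0.9, 0.1], prints, candidates=["agent", "caller"])
    print(who, prints["agent"])
```

Restricting the loop to `candidates` mirrors the claim language about comparing "to only a plurality of the voice prints corresponding to identified participants". Adapting only after a successful match keeps unmatched audio from drifting a stored print.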
in combination with interactive voice response systems or voice portals, e.g. as front-ends · CPC title
using speaker recognition · CPC title
Speech interaction details (speech recognition per se G10L15/00) · CPC title
Call or contact centers supervision arrangements · CPC title
Use of distortion metrics or a particular distance between probe pattern and reference templates · CPC title