System and method for speaker role determination and scrubbing identifying information

US11062706B2 · US · B2

Patent metadata
Publication number: US-11062706-B2
Application number: US-201916397745-A
Country: US
Kind code: B2
Filing date: Apr 29, 2019
Priority date: Apr 29, 2019
Publication date: Jul 13, 2021
Grant date: Jul 13, 2021

Abstract

Official abstract text for this publication.

Methods for speaker role determination and scrubbing identifying information are performed by systems and devices. In speaker role determination, data from an audio or text file is divided into respective portions related to speaking parties. Characteristics classifying the portions of the data for speaking party roles are identified in the portions to generate data sets from the portions corresponding to the speaking party roles and to assign speaking party roles for the data sets. For scrubbing identifying information in data, audio data for speaking parties is processed using speech recognition to generate a text-based representation. Text associated with identifying information is determined based on a set of key words/phrases, and a portion of the text-based representation that includes a part of the text is identified. A segment of audio data that corresponds to the identified portion is replaced with different audio data, and the portion is replaced with different text.
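The scrubbing flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: it assumes the speech recognizer has already produced word-level timestamps, that identifying information is the fixed number of words following a key phrase, and that the "different audio data" is silence. The names `Word`, `find_portions`, `scrub`, `following`, and `mask` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Word:
    """One recognized word with its position in the audio (hypothetical shape)."""
    text: str
    start: float  # seconds
    end: float    # seconds


def find_portions(words: Sequence[Word], key_phrases: Sequence[str],
                  following: int) -> List[int]:
    """Return indices of words that directly follow any key phrase.

    Assumption: the `following` words after a key phrase such as
    "my name is" are treated as the identifying information.
    """
    tokens = [w.text.lower().strip(".,?!") for w in words]
    flagged = set()
    for phrase in key_phrases:
        p = phrase.lower().split()
        for i in range(len(tokens) - len(p) + 1):
            if tokens[i:i + len(p)] == p:
                lo = i + len(p)
                flagged.update(range(lo, min(lo + following, len(words))))
    return sorted(flagged)


def scrub(words: Sequence[Word], samples: Sequence[int], sample_rate: int,
          key_phrases: Sequence[str], following: int = 2,
          mask: str = "[REDACTED]") -> Tuple[str, List[int]]:
    """Replace flagged words with `mask` and their audio samples with silence."""
    flagged = set(find_portions(words, key_phrases, following))
    out_samples = list(samples)
    out_text = []
    for i, w in enumerate(words):
        if i in flagged:
            a = min(int(w.start * sample_rate), len(out_samples))
            b = min(int(w.end * sample_rate), len(out_samples))
            out_samples[a:b] = [0] * max(b - a, 0)  # silence stands in for "different audio"
            out_text.append(mask)
        else:
            out_text.append(w.text)
    return " ".join(out_text), out_samples
```

The text and the audio are edited from the same word indices, so the transcript and the recording stay aligned after scrubbing. A real system would likely substitute a tone or synthesized speech rather than zeros and would need a more careful notion of where the identifying text ends.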

First claim

Opening claim text (preview).

What is claimed is:
1. A system comprising: at least one processor; and a memory that stores computer program instructions that are executable by the at least one processor, the computer program instructions comprising: an apportioner configured to: receive a file that includes data related to a first speaking party and a second speaking party, and divide the data into portions based on one or more characteristics of the data; a characteristic identifier configured to identify classifying characteristics of speaking party roles in each of the portions; and a data aggregator configured to generate, respectively from the portions, data sets corresponding to the first speaking party and the second speaking party based at least on the identified classifying characteristics; the characteristic identifier being configured to assign a speaking party role for at least one of the data sets, the characteristic identifier being based at least on one or more of the classifying characteristics and including a speaking roles model trained by a machine learning algorithm that utilizes editing distances between verified transcribed representations of the data and representations based on one or more of automatic speech recognition of audio data comprising the data or diarization of the audio data.
2. The system of claim 1, wherein the one or more characteristics of the data comprise a pattern of speech in the data.
3. The system of claim 2, wherein to identify the classifying characteristics of the speaking party roles, the characteristic identifier is configured to analyze one of the portions against at least one stored audio sample.
4. The system of claim 3, further comprising: a speech recognizer configured to generate text data as a text representation of the audio data; and wherein the characteristic identifier is configured to identify one or more of the classifying characteristics based on textual patterns from the text representation.
5. The system of claim 4, wherein the characteristic identifier is configured to identify the classifying characteristics based on the at least one stored audio sample and the textual patterns at least partially concurrently.
6. The system of claim 1, wherein the data comprises text data derived via automatic speech recognition of audio data, and wherein the one or more classifying characteristics comprise a pattern of text in the text data.
7. The system of claim 6, wherein to identify the classifying characteristics, the characteristic identifier is configured to analyze the pattern of text against at least one text pattern set associated with one or more speaking party roles.
8. The system of claim 1, wherein the speaking roles model is configured to determine a lesser of two editing distances as corresponding to an assignment of two speaker roles for two alternative speaking party designations.
9. A computer-implemented method, comprising: dividing data related to one or more of a first speaking party or a second speaking party into portions based on one or more characteristics of the data; identifying classifying characteristics of speaking party roles in each of the portions; generating, respectively from the portions, data sets corresponding to the one or more of the first speaking party or the second speaking party based at least on the classifying characteristics identified; assigning a speaking party role for one of the data sets by a characteristic identifier, the characteristic identifier being based at least on one or more of the classifying characteristics and including a speaking roles model trained by a machine learning algorithm that utilizes editing distances between verified transcribed representations of the data and representations based on one or more of automatic speech recognition of audio data comprising the data or diarization of the audio data; and storing one of the data sets in a memory with an annotation identifying the speaking party role associated with the one of the data sets.
10. The computer-implemented method of claim 9, wherein the one or more characteristics of the data comprise a pattern of speech in the data.
11. The computer-implemented method of claim 10, wherein the data comprises audio data, and wherein said identifying the classifying characteristics of the speaking party roles includes analyzing one of the portions against at least one stored audio sample.
12. The computer-implemented method of claim 11, further comprising: generating text data as a text representation of the audio data.
13. The computer-implemented method of claim 12, further comprising: identifying the classifying characteristics based on the at least one stored audio sample and the textual pattern at least partially concurrently.
14. The computer-implemented method of claim 9, wherein the data comprises text data derived via automatic speech recognition of audio data, and wherein the one or more classifying characteristics comprise a pattern of text in the text data.
15. The computer-implemented method of claim 14, wherein said identifying classifying characteristics includes: analyzing the pattern of text against at least one text pattern set associated with one or more speaking party roles.
16. The computer-implemented method of claim 9, wherein the speaking roles model is configured to determine a lesser of two editing distances as corresponding to an assignment of two speaker roles for two alternative speaking party designations.
17. The computer-implemented method of claim 16, wherein the speaking roles model comprises a statistical probability algorithm that indicates a likelihood of a given one of the portions being associated with one of the one or more of the first speaking party or the second speaking party, and wherein said generating is based on the statistical probability algorithm.
18. A computer-readable storage medium having program instructions recorded thereon that, when executed by a processing device, perform a method comprising: dividing data related to one or more of a first speaking party or a second speaking party into portions based on one or more characteristics of the data; identifying classifying characteristics of speaking party roles in each of the portions; generating, respectively from the portions, data sets corresponding to the one or more of the first speaking party or the second speaking party based at least on the classifying characteristics identified; assigning a speaking party role for one of the data sets by a characteristic identifier, the characteristic identifier being based at least on one or more of the classifying characteristics and including a speaking roles model trained by a machine learning algorithm that utilizes editing distances between verified transcribed representations of the data and representations based on one or more of automatic speech recognition of audio data comprising the data or diarization of the audio data; and storing one of the data sets in a memory with an annotation identifying the speaking party role associated with the one of the data sets.
19. The computer-readable storage medium of claim 18, wherein the one or more characteristics of the data comprise a pattern of speech in the data.
20. The computer-readable storage medium of claim 18, wherein the data comprises audio data, and said identifying the classifying characteristics of the speaking party roles includes analyzing one of the portions against at least one stored audio sample; wher
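Claims 8 and 16 recite choosing the lesser of two editing distances to decide between two alternative speaker-role assignments. The sketch below illustrates only that comparison, under several assumptions the claims do not state: the editing distance is taken to be word-level Levenshtein distance, the verified transcript is already grouped by role, and the diarized recognizer output is grouped under two anonymous speaker labels. The function names, the role names "agent" and "customer", and the labels "S1" and "S2" are hypothetical. How such distances feed into training the speaking roles model is not shown here.

```python
from typing import Dict, Sequence, Tuple


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Levenshtein distance between two sequences (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]


def assign_roles(verified: Dict[str, str], hypothesis: Dict[str, str],
                 roles: Tuple[str, str] = ("agent", "customer")) -> Dict[str, str]:
    """Map two diarized speaker labels to two roles.

    verified:   role -> verified (human-checked) transcript text for that role
    hypothesis: speaker label -> recognized text for that diarized speaker
    Both candidate mappings are scored by total word-level edit distance,
    and the mapping with the lesser distance is returned.
    """
    s1, s2 = sorted(hypothesis)
    r1, r2 = roles

    def cost(mapping: Dict[str, str]) -> int:
        return sum(edit_distance(hypothesis[s].split(), verified[r].split())
                   for s, r in mapping.items())

    direct = {s1: r1, s2: r2}
    swapped = {s1: r2, s2: r1}
    return direct if cost(direct) <= cost(swapped) else swapped
```

For example, if speaker "S1" in the recognizer output said roughly what the verified customer said, the swapped mapping has the smaller total distance and "S1" is labeled as the customer. Ties are resolved in favor of the direct mapping, which is an arbitrary choice made for this sketch.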

Assignees

Microsoft Technology Licensing, LLC

Inventors

Classifications

  • Speech to text systems (G10L15/08 takes precedence) · CPC title

  • using properties of sound source · CPC title

  • G06Q10/10 (Primary)

    Office automation; Time management · CPC title

  • Speaker identification or verification techniques · CPC title

  • G10L15/22 (Primary)

    Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11062706B2 cover?
Methods for speaker role determination and scrubbing identifying information are performed by systems and devices. In speaker role determination, data from an audio or text file is divided into respective portions related to speaking parties. Characteristics classifying the portions of the data for speaking party roles are identified in the portions to generate data sets from the portions corre…
Who is the assignee on this patent?
Microsoft Technology Licensing, LLC
What technology area does this patent fall under?
Primary CPC classification G06Q10/10. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jul 13, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).