Who is the assignee on this patent?

Microsoft Technology Licensing, LLC

What technology area does this patent fall under?

Primary CPC classification G10L15/30. Mapped technology areas include Physics.

When was this patent published?

Published October 5, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Processing overlapping speech from distributed devices

US11138980B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11138980-B2
Application number	US-201916399175-A
Country	US
Kind code	B2
Filing date	Apr 30, 2019
Priority date	Apr 30, 2019
Publication date	Oct 5, 2021
Grant date	Oct 5, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computer implemented method includes receiving audio signals representative of speech via multiple audio streams transmitted from corresponding multiple distributed devices, performing, via a neural network model, continuous speech separation for one or more of the received audio signals having overlapped speech, and providing the separated speech on a fixed number of separate output audio channels.

First claim

Opening claim text (preview).

The invention claimed is:
1. A computer implemented method comprising: receiving audio signals representative of speech via multiple audio streams transmitted from corresponding multiple distributed devices; detecting overlapped speech during a first period of time during a meeting; detecting no overlapped speech during a second period of time during the meeting; performing for the first period of time, via a neural network model, continuous speech separation for one or more of the received audio signals having overlapped speech in response to detecting the overlapped speech, wherein the neural network model comprises a local observer comprising a set of stacked attention layers that map each audio signal into a representation; providing the separated speech on a fixed number of separate output audio channels; and providing the nonoverlapped speech for the second period of time on a further output audio channel without performing continuous speech separation.
2. The method of claim 1 wherein performing continuous speech separation is performed by the neural network model trained using permutation invariant training.
3. The method of claim 2 wherein the neural network model is configured to receive a varying number of inputs to support a dynamic change in a number of audio signals and locations of distributed devices during a meeting between multiple users.
4. The method of claim 1 wherein the multiple devices capture the audio signals during an ad-hoc meeting.
5. The method of claim 1 wherein the audio signals are received at a meeting server coupled to the distributed devices via a network.
6. The method of claim 1 and further comprising generating a transcript based on the separate audio channels.
7. The method of claim 6 and further comprising including speaker attribution in the generated transcript.
8. The method of claim 7 and further comprising sending the transcript to one or more of the distributed devices.
9. The method of claim 1 wherein at least two of the audio streams are provided by an ambient capture device having an array of microphones in fixed positions.
10. A machine-readable storage device having instructions for execution by a processor of a machine to cause the processor to perform operations to perform a method, the operations comprising: receiving audio signals representative of speech via multiple audio streams transmitted from corresponding multiple distributed devices; detecting overlapped speech during a first period of time during a meeting; detecting no overlapped speech during a second period of time during the meeting; performing for the first period of time, via a neural network model, continuous speech separation for one or more of the received audio signals having overlapped speech in response to detecting the overlapped speech, wherein the neural network model comprises a local observer comprising a set of stacked attention layers that map each audio signal into a representation; providing the separated speech on a fixed number of separate output audio channels; and providing the nonoverlapped speech for the second period of time on a further output audio channel without performing continuous speech separation.
11. The device of claim 10 wherein performing continuous speech separation is performed by a neural network model trained using permutation invariant training.
12. The device of claim 11 wherein the neural network model is configured to receive a varying number of inputs to support a dynamic change in a number of audio signals and locations of distributed devices during a meeting between multiple users.
13. The device of claim 10 wherein the multiple distributed devices comprise wireless devices associated with speakers in a meeting.
14. The device of claim 10 wherein the audio signals are received at a meeting server coupled to the distributed devices via a network.
15. The device of claim 10 and further comprising generating a speaker attributed transcript based on the separate audio channels.
16. The device of claim 15 and further comprising sending the transcript to one or more of the distributed devices.
17. The device of claim 10 wherein at least two of the audio streams are provided by an ambient capture device having an array of microphones in fixed positions.
18. A device comprising: a processor; and a memory device coupled to the processor and having a program stored thereon for execution by the processor to perform operations comprising: receiving audio signals representative of speech via multiple audio streams transmitted from corresponding multiple distributed devices; detecting overlapped speech during a first period of time during a meeting; detecting no overlapped speech during a second period of time during the meeting; performing for the first period of time, via a neural network model, continuous speech separation for one or more of the received audio signals having overlapped speech in response to detecting the overlapped speech, wherein the neural network model comprises a local observer comprising a set of stacked attention layers that map each audio signal into a representation; providing the separated speech on a fixed number of separate output audio channels; and providing the nonoverlapped speech for the second period of time on a further output audio channel without performing continuous speech separation.
19. The device of claim 18 wherein performing continuous speech separation is performed by a neural network model trained using permutation invariant training and wherein the neural network model is configured to receive a varying number of inputs to support a dynamic change in a number of audio signals and locations of distributed devices during a meeting between multiple users.
20. The device of claim 18 wherein the audio signals are received at a meeting server coupled to the distributed devices via a network, and wherein the meeting server performs addition operations comprising: generating a speaker attributed transcript based on the separate audio channels; and sending the transcript to one or more of the distributed devices.
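
For readers who find the claim language dense, the routing described in the independent claims can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the function names (route_windows, mix_down, toy_has_overlap, toy_separate), the window representation, and the choice of two separated channels are all hypothetical, and the toy detector and toy separator stand in for the overlap detection and the neural network model, which the page does not describe in code.

```python
from typing import Callable, List, Sequence

Window = List[float]  # hypothetical: one time window of mono samples from one device


def mix_down(streams: Sequence[Window]) -> Window:
    """Average the device streams sample by sample (an assumed pass-through rule)."""
    return [sum(samples) / len(streams) for samples in zip(*streams)]


def route_windows(
    windows: Sequence[Sequence[Window]],
    has_overlap: Callable[[Sequence[Window]], bool],
    separate: Callable[[Sequence[Window], int], List[Window]],
    num_separated_channels: int = 2,
) -> List[List[Window]]:
    """Route each time window to a fixed set of output channels.

    Channels 0..num_separated_channels-1 carry separated speech and are only
    filled when overlap is detected. The last channel is the "further" channel
    that carries nonoverlapped speech without running separation.
    """
    outputs: List[List[Window]] = [[] for _ in range(num_separated_channels + 1)]
    for streams in windows:
        silence = [0.0] * len(streams[0])
        if has_overlap(streams):
            separated = separate(streams, num_separated_channels)
            if len(separated) != num_separated_channels:
                raise ValueError("separator must return a fixed number of channels")
            for channel, signal in enumerate(separated):
                outputs[channel].append(signal)
            outputs[-1].append(silence)
        else:
            # No overlap: skip separation entirely and use the further channel.
            for channel in range(num_separated_channels):
                outputs[channel].append(silence)
            outputs[-1].append(mix_down(streams))
    return outputs


def toy_has_overlap(streams: Sequence[Window]) -> bool:
    """Toy stand-in: call it overlap when more than one device has any signal."""
    active = sum(1 for s in streams if any(abs(x) > 0.0 for x in s))
    return active > 1


def toy_separate(streams: Sequence[Window], n: int) -> List[Window]:
    """Toy stand-in for the neural separator: take the first n device streams."""
    silence = [0.0] * len(streams[0])
    picked = [list(s) for s in streams[:n]]
    return picked + [silence] * (n - len(picked))


demo_windows = [
    [[1.0, 1.0], [0.0, 0.0]],  # only one device active: no overlap
    [[1.0, 0.0], [0.0, 2.0]],  # both devices active: treated as overlap
]
demo_out = route_windows(demo_windows, toy_has_overlap, toy_separate, 2)
```

The point of the sketch is the shape of the output: the number of channels stays fixed regardless of how many devices send audio, and the separator is only invoked for windows flagged as overlapped.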

Assignees

Microsoft Technology Licensing, LLC

Inventors

Classifications

G10L21/0208
Noise filtering · CPC title
G10L15/26
Speech to text systems (G10L15/08 takes precedence) · CPC title
G10L15/30 · Primary
Distributed recognition, e.g. in client-server systems, for mobile phones or network applications · CPC title
G10L21/0272 · Primary
Voice signal separating · CPC title
G10L2021/02087
the noise being separate speech, e.g. cocktail party · CPC title

Patent family

Related publications grouped by family.

Patent family ID: 70293059

External sources

Related patents

Systems and methods for speech separation and neural decoding of attentional selection in multi-speaker environments

Multi-talker speech recognizer

Permutation invariant training for talker-independent multi-talker speech separation

Multi-speaker speech separation
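
Several claims and one of the related publications listed above refer to permutation invariant training. As general background, the idea is that a separator's output channels have no fixed speaker order, so the training loss is taken as the minimum over all assignments of outputs to reference speakers. The sketch below is a minimal illustration of that idea only; the function names (mse, pit_loss) and the use of mean squared error are assumptions, and nothing here is taken from the patent's own training procedure.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length signals."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def pit_loss(
    estimates: Sequence[Sequence[float]],
    references: Sequence[Sequence[float]],
) -> Tuple[float, Tuple[int, ...]]:
    """Return the lowest average MSE over all output-to-speaker assignments.

    perm[i] is the index of the reference speaker matched to estimate i.
    """
    if len(estimates) != len(references):
        raise ValueError("need the same number of estimates and references")
    n = len(references)
    best_loss = float("inf")
    best_perm: Tuple[int, ...] = ()
    for perm in permutations(range(n)):
        loss = sum(mse(estimates[i], references[p]) for i, p in enumerate(perm)) / n
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# The estimates are correct but come out in swapped order.
refs: List[List[float]] = [[1.0, 1.0], [0.0, 0.0]]
ests: List[List[float]] = [[0.0, 0.0], [1.0, 1.0]]
loss, perm = pit_loss(ests, refs)
```

A fixed-order loss would penalize the swapped outputs in this example, while the permutation-invariant loss does not. The exhaustive search over permutations grows factorially with the number of channels, which is workable here because the number of output channels is small and fixed.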
