Fully Supervised Speaker Diarization
US-2020219517-A1 · Jul 9, 2020 · US
US11200903B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11200903-B2 |
| Application number | US-202016781724-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 4, 2020 |
| Priority date | Feb 28, 2019 |
| Publication date | Dec 14, 2021 |
| Grant date | Dec 14, 2021 |
Official abstract text for this publication.
A method of speaker verification comprises receiving an audio signal representing speech. While the audio signal is being received, features of the received audio signal are extracted. The extracted features, of at least a part of the received audio signal corresponding to the speech of at least one speaker, are summarised, and the summarised extracted features are stored. In response to a request for a speaker verification process relating to at least one enrolled user, the speaker verification process is performed using the previously summarised features.
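The abstract describes summarising features while audio is still arriving, storing only the summary, and scoring it later on request. A minimal sketch of that flow follows, under stated assumptions: the abstract does not say which summary statistic or scoring function is used, so the running mean and variance (Welford's method), the cosine score, the `FeatureSummary` class, the `verify` function, and the 0.8 threshold are all hypothetical choices made for illustration.

```python
import math


class FeatureSummary:
    """Running per-dimension mean and variance of feature vectors.

    Assumption: the patent does not name a summary statistic; Welford's
    online update is used here only because it needs no stored frames.
    """

    def __init__(self, dim):
        self.count = 0
        self.mean = [0.0] * dim
        self._m2 = [0.0] * dim  # running sum of squared deviations

    def update(self, frame):
        """Fold one feature vector into the summary; the frame is not kept."""
        self.count += 1
        for i, x in enumerate(frame):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.count
            self._m2[i] += delta * (x - self.mean[i])

    def variance(self):
        """Sample variance per dimension (zeros until two frames are seen)."""
        if self.count < 2:
            return [0.0] * len(self.mean)
        return [m2 / (self.count - 1) for m2 in self._m2]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def verify(summary, enrolled_model, threshold=0.8):
    """Score the stored summary against an enrolled model on request.

    Assumption: cosine scoring and the threshold value are illustrative.
    """
    score = cosine(summary.mean, enrolled_model)
    return score, score >= threshold


# Features arrive frame by frame while the audio is being received.
summary = FeatureSummary(dim=3)
for frame in ([1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [2.0, 2.0, 2.0]):
    summary.update(frame)

# Later, a verification request uses only the stored summary.
score, accepted = verify(summary, enrolled_model=[1.0, 1.0, 1.0])
```

The point of the design is that the verification step never touches raw audio or per-frame features; it reads only the small stored summary, which is what allows the expensive scoring to be deferred until it is requested.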
Opening claim text (preview).
The invention claimed is:
1. A method of speaker verification, comprising: in a first module: receiving an audio signal representing speech; while the audio signal is being received, extracting features of the received audio signal; performing a provisional separation of the speech represented by the received audio signal into the speech of multiple speakers; summarising the extracted features of at least a part of the received audio signal corresponding to the speech of each of said speakers; and storing the summarised extracted features; and in a second module: in response to a request for a speaker verification process relating to at least one enrolled user, performing said speaker verification process using the previously summarised extracted features, wherein the first module has a lower power consumption than the second module.
2. The method according to claim 1, comprising: performing the provisional separation of the speech represented by the received audio signal into the speech of multiple speakers using speaker tracking.
3. The method according to claim 1, comprising: performing the provisional separation of the speech represented by the received audio signal into the speech of multiple speakers by performing a speaker change detection process.
4. The method according to claim 1, comprising: performing the provisional separation of the speech represented by the received audio signal into the speech of multiple speakers by clustering audio segments that have a high likelihood of being from the same speaker.
5. The method according to claim 4 comprising: dividing the received audio signal into segments of fixed length; and determining whether a segment represents the speech of the same speaker as a preceding segment.
6. The method according to claim 1, comprising, in response to a request for diarisation, performing said diarisation using the previously summarised features.
7. The method according to claim 1, comprising: identifying a keyword in the speech represented by the received audio signal; and interpreting the keyword as a request for a speaker verification process relating to a user speaking when the keyword was identified.
8. The method according to claim 1, wherein extracting features of the received audio signal comprises extracting biometrics features from the received audio signal, and wherein the step of performing said speaker verification process using the previously summarised features comprises performing a biometrics scoring of the extracted features.
9. The method according to claim 8, wherein performing said speaker verification process comprises comparing the previously summarised features with a model of the speech of at least one enrolled user to obtain a biometric score.
10. The method according to claim 1, further comprising performing an antispoofing check using the previously summarised features, in response to the request for the speaker verification process.
11. The method according to claim 1, further comprising performing an antispoofing check on the received audio signal, before storing the previously summarised features.
12. A system for speaker verification, comprising: a first module and a second module, wherein the first module is configured for: receiving an audio signal representing speech; while the audio signal is being received, extracting features of the received audio signal; performing a provisional separation of the speech represented by the received audio signal into the speech of multiple speakers; summarising the extracted features of at least a part of the received audio signal corresponding to the speech of each of said speakers; and causing the summarised extracted features to be stored; and wherein, the second module is configured, in response to a request for a speaker verification process relating to at least one enrolled user, for performing said speaker verification process using the previously extracted summarised features, wherein the first module has lower power consumption while operating than the second module.
13. The system as claimed in claim 12, wherein the first module is configured for always-on operation; and wherein the second module is maintained in low-power or inactive state until receiving the request for the speaker verification process.
14. The system as claimed in claim 12, wherein the first module and the second module are provided in separate integrated circuits.
15. The system as claimed in claim 14, wherein the first module and the second module are provided in separate integrated circuits within a device.
16. The system as claimed in claim 15, wherein the first module and the second module are provided in separate integrated circuits within a smartphone device.
17. The system as claimed in claim 14, wherein the first module and the second module are provided in separate devices.
18. The system as claimed in claim 12, wherein the first module is configured for causing the summarised extracted features to be stored in the first module.
19. The system as claimed in claim 12, wherein the first module is configured for causing the summarised extracted features to be stored in the second module.
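The split between the two modules in claims 1, 5, and 9 can be illustrated with a short sketch. It assumes each fixed-length segment has already been reduced to one feature vector, and it uses cosine similarity for both the same-speaker test and the biometric score. None of these choices come from the claims: the function names, the 0.9 and 0.95 thresholds, and the toy two-dimensional vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def provisional_separation(segment_vectors, same_speaker_threshold=0.9):
    """First module: label segments by comparing each with its predecessor.

    A new provisional speaker label starts whenever similarity to the
    preceding segment drops below the (hypothetical) threshold. This simple
    version never re-identifies a speaker who returns later.
    """
    labels = []
    current = 0
    for i, vec in enumerate(segment_vectors):
        if i > 0 and cosine(vec, segment_vectors[i - 1]) < same_speaker_threshold:
            current += 1
        labels.append(current)
    return labels


def summarise_per_speaker(segment_vectors, labels):
    """First module: store one mean vector per provisional speaker."""
    grouped = {}
    for vec, label in zip(segment_vectors, labels):
        grouped.setdefault(label, []).append(vec)
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def verify_speakers(summaries, enrolled_model, accept_threshold=0.95):
    """Second module: score each stored summary against an enrolled model."""
    return {
        label: cosine(summary, enrolled_model) >= accept_threshold
        for label, summary in summaries.items()
    }


# Toy input: two segments from one voice followed by two from another.
segments = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = provisional_separation(segments)
summaries = summarise_per_speaker(segments, labels)

# Runs only when a verification request arrives.
decisions = verify_speakers(summaries, enrolled_model=[1.0, 0.0])
```

In this sketch the first two functions are cheap enough to run continuously, while `verify_speakers` is called only on request, which mirrors the lower-power first module and the on-demand second module of claims 1 and 13. A fuller implementation would likely replace the predecessor comparison with clustering (claim 4), so that a returning speaker is mapped back to an existing label.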
Use of distortion metrics or a particular distance between probe pattern and reference templates · CPC title
Decision making techniques; Pattern matching strategies · CPC title
Voice signal separating · CPC title
Speaker identification or verification techniques · CPC title
Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction · CPC title