What technology area does this patent fall under?

Primary CPC classification G10L17/18. Mapped technology areas include Physics.

When was this patent published?

Publication date: Nov 22, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Voiceprint recognition method, model training method, and server

Patent metadata
Field	Value
Publication number	US-11508381-B2
Application number	US-202017085609-A
Country	US
Kind code	B2
Filing date	Oct 30, 2020
Priority date	Oct 10, 2018
Publication date	Nov 22, 2022
Grant date	Nov 22, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments of this application disclose a voiceprint recognition method performed by a computer. After obtaining a to-be-recognized target voice message, the computer obtains target feature information of the target voice message by using a voice recognition model, the voice recognition model being obtained through training according to a first loss function and a second loss function. Next, the computer determines a voiceprint recognition result according to the target feature information and registration feature information, the registration feature information being obtained from a voice message of a to-be-recognized object using the voiceprint recognition model. The normalized exponential function and the centralization function are used for jointly optimizing the voice recognition model, and can reduce an intra-class variation between depth features from the same speaker. The two functions are used for simultaneously supervising and learning the voice recognition model, and enable the depth feature to have better discrimination, thereby improving recognition performance.
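The joint supervision described in the abstract can be illustrated with a minimal sketch. It assumes the "normalized exponential function" is the usual softmax cross-entropy loss and the "centralization function" is the common center-loss form (half the squared distance between a deep feature and its speaker's center); this page does not show the second formula, so that form is an assumption. The function names and the balancing weight `lam` are hypothetical and do not come from the patent.

```python
import math


def softmax_loss(features, labels, weights, biases):
    """Sum over samples of -log(softmax probability of the true class).

    weights[j] is the weight vector for class j, biases[j] its bias.
    """
    total = 0.0
    for x, y in zip(features, labels):
        logits = [sum(w_k * x_k for w_k, x_k in zip(w, x)) + b
                  for w, b in zip(weights, biases)]
        m = max(logits)  # subtract the max for numerical stability
        log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_norm - logits[y]
    return total


def center_loss(features, labels, centers):
    """Assumed form: 0.5 * squared distance of each feature to its class center."""
    total = 0.0
    for x, y in zip(features, labels):
        total += 0.5 * sum((x_k - c_k) ** 2 for x_k, c_k in zip(x, centers[y]))
    return total


def joint_loss(features, labels, weights, biases, centers, lam=0.01):
    """Softmax loss plus a weighted center loss (lam is a hypothetical weight)."""
    return (softmax_loss(features, labels, weights, biases)
            + lam * center_loss(features, labels, centers))
```

In this sketch the softmax term pushes features of different speakers apart, while the center term pulls features of the same speaker toward a shared mean, which is the division of labor the abstract attributes to the two functions.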

First claim

Opening claim text (preview).

What is claimed is:

1. A voiceprint recognition method, comprising: obtaining a target voice message; obtaining text-independent target feature information of the target voice message by using a voiceprint recognition model, the voiceprint recognition model obtained through training according to a first loss function and a second loss function, the first loss function being a normalized exponential function that discriminates between deep features associated with different objects, and the second loss function being a centralization function that reduces variations in the deep features associated with the same object, and the voiceprint recognition model is obtained through training a convolutional neural network (CNN) by: obtaining a voice message set comprising voice messages corresponding to multiple training objects; capturing, from the voice messages, voice segments; inputting the captured voice segments to the CNN to obtain a deep feature of a sentence level for each of the voice messages; training the CNN as the voiceprint recognition model by joint supervision of the deep features of the voice messages corresponding to the training objects with the first loss function discriminating the deep features corresponding to different training objects in the voice messages and the second loss function reducing variations in the deep features of the same training object; and determining a voiceprint recognition result by comparing the target feature information and registration feature information, the registration feature information obtained from a voice message of an object using the voiceprint recognition model.

2. The method according to claim 1, wherein determining the voiceprint recognition result comprises: calculating a cosine similarity according to the target feature information and the registration feature information; determining that the target voice message is a voice message of the object in accordance with a determination that the cosine similarity reaches a first similarity threshold; and determining that the target voice message is not a voice message of the object in accordance with a determination that the cosine similarity does not reach the first similarity threshold.

3. The method according to claim 1, wherein determining the voiceprint recognition result comprises: calculating a log-likelihood ratio between the target feature information and the registration feature information using a PLDA classifier; determining that the target voice message is a voice message of the object in accordance with a determination that the log-likelihood ratio reaches a second similarity threshold; and determining that the target voice message is not a voice message of the object in accordance with a determination that the log-likelihood ratio does not reach the second similarity threshold.

4. The method according to claim 1, wherein training the CNN further comprises: determining, for each of the voice messages, a deep feature corresponding to the voice message using the CNN; obtaining a fully connected layer weight matrix according to the voice messages; and determining the first loss function according to the deep feature of each of the voice messages and the fully connected layer weight matrix.

5. The method according to claim 4, wherein determining the first loss function according to the deep feature of each of the voice messages and the fully connected layer weight matrix comprises: determining the first loss function according to:

$$L_s = -\sum_{i=1}^{M} \log \frac{e^{W_{y_i}^{T} x_i + b_{y_i}}}{\sum_{j=1}^{N} e^{W_j^{T} x_i + b_j}},$$

wherein $L_s$ represents the first loss function, $x_i$ represents the $i$-th deep feature from the $y_i$-th object, $W_j$ represents the $j$-th column in the fully connected layer weight matrix, $b_j$ represents a bias of the $j$-th class, each class corresponding to an object, $M$ represents a group size of a training set corresponding to the voice message set, and $N$ represents a quantity of objects corresponding to the voice message set.

6. The method according to claim 1, wherein training the CNN further comprises: determining, for each of the voice messages, a deep feature corresponding to the voice message using the CNN; calculating a deep feature gradient according to the deep feature of each of the voice messages; calculating a second voice mean according to the deep feature gradient and a first voice mean; and determining the second loss function according to the deep feature of each of the voice messages and the second voice mean.

7. The method according to claim 6, wherein calculating the deep feature gradient according to the deep feature of each of the voice messages comprises: calculating the deep feature gradient according to: Δμ_j = ∑_{i=
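The cosine-similarity decision in claim 2 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the function names are hypothetical, the default threshold of 0.7 is an arbitrary placeholder (the claim does not give a value), and "reaches the threshold" is read here as greater than or equal to.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def is_same_speaker(target_feature, registration_feature, threshold=0.7):
    """Accept when the similarity reaches the (hypothetical) threshold."""
    return cosine_similarity(target_feature, registration_feature) >= threshold
```

For example, `is_same_speaker([1.0, 0.0], [0.0, 1.0])` rejects the match because orthogonal features have a similarity of zero. The PLDA variant in claim 3 would replace the similarity score with a log-likelihood ratio and compare it against its own threshold in the same way.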

Assignees

Tencent Tech Shenzhen Co Ltd

Inventors

Classifications

G06N3/045
Combinations of networks · CPC title
G10L17/18 (Primary)
Artificial neural networks; Connectionist approaches · CPC title
G06N3/08
Learning methods · CPC title
G06F21/32
using biometric data, e.g. fingerprints, iris scans or voiceprints · CPC title
G10L17/04 (Primary)
Training, enrolment or model building · CPC title

Patent family

Related publications grouped by family.

View patent family 67644996

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11508381B2 cover? Embodiments of this application disclose a voiceprint recognition method performed by a computer. After obtaining a to-be-recognized target voice message, the computer obtains target feature information of the target voice message by using a voice recognition model, the voice recognition model being obtained through training according to a first loss function and a second loss function. Next, t…
Who is the assignee on this patent? Tencent Tech Shenzhen Co Ltd

Related patents

Acoustic model training method, speech recognition method, apparatus, device and medium

Identity verification method and apparatus based on voiceprint

Speaker verification
