Voiceprint recognition method, model training method, and server

US11508381B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11508381-B2
Application number: US-202017085609-A
Country: US
Kind code: B2
Filing date: Oct 30, 2020
Priority date: Oct 10, 2018
Publication date: Nov 22, 2022
Grant date: Nov 22, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments of this application disclose a voiceprint recognition method performed by a computer. After obtaining a to-be-recognized target voice message, the computer obtains target feature information of the target voice message by using a voiceprint recognition model, the voiceprint recognition model being obtained through training according to a first loss function (a normalized exponential function) and a second loss function (a centralization function). Next, the computer determines a voiceprint recognition result according to the target feature information and registration feature information, the registration feature information being obtained from a voice message of a to-be-recognized object using the voiceprint recognition model. The normalized exponential function and the centralization function are used for jointly optimizing the voiceprint recognition model, and can reduce an intra-class variation between depth features from the same speaker. The two functions are used for simultaneously supervising and learning the voiceprint recognition model, and enable the depth feature to have better discrimination, thereby improving recognition performance.
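The joint supervision described in the abstract can be sketched as a weighted sum of the two losses. This is a minimal illustration, not the patent's implementation: the function names, the per-class list layout of `weights`, the balancing weight `lam`, and the half-squared-distance form of the centralization term are all assumptions (the last follows the commonly used center loss formulation).

```python
import math

def softmax_loss(features, labels, weights, biases):
    """Normalized exponential (softmax) loss summed over a batch.

    weights[j] is the weight vector of class j; biases[j] is its bias.
    """
    total = 0.0
    for x, y in zip(features, labels):
        logits = [sum(w_k * x_k for w_k, x_k in zip(w, x)) + b
                  for w, b in zip(weights, biases)]
        top = max(logits)  # subtract the max for numerical stability
        log_norm = top + math.log(sum(math.exp(z - top) for z in logits))
        total -= logits[y] - log_norm
    return total

def center_loss(features, labels, centers):
    """Assumed centralization term: half the squared distance to each class center."""
    return 0.5 * sum(
        (x_k - c_k) ** 2
        for x, y in zip(features, labels)
        for x_k, c_k in zip(x, centers[y])
    )

def joint_loss(features, labels, weights, biases, centers, lam=0.01):
    """Joint objective; lam is a hypothetical balancing weight."""
    return (softmax_loss(features, labels, weights, biases)
            + lam * center_loss(features, labels, centers))
```

The softmax term pushes features of different speakers apart, while the centralization term pulls features of the same speaker toward a shared center, which is the intra-class variation reduction the abstract refers to.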

First claim

Opening claim text (preview).

What is claimed is:

1. A voiceprint recognition method, comprising: obtaining a target voice message; obtaining text-independent target feature information of the target voice message by using a voiceprint recognition model, the voiceprint recognition model obtained through training according to a first loss function and a second loss function, the first loss function being a normalized exponential function that discriminates between deep features associated with different objects, and the second loss function being a centralization function that reduces variations in the deep features associated with the same object, and the voiceprint recognition model is obtained through training a convolutional neural network (CNN) by: obtaining a voice message set comprising voice messages corresponding to multiple training objects; capturing, from the voice messages, voice segments; inputting the captured voice segments to the CNN to obtain a deep feature of a sentence level for each of the voice messages; training the CNN as the voiceprint recognition model by joint supervision of the deep features of the voice messages corresponding to the training objects with the first loss function discriminating the deep features corresponding to different training objects in the voice messages and the second loss function reducing variations in the deep features of the same training object; and determining a voiceprint recognition result by comparing the target feature information and registration feature information, the registration feature information obtained from a voice message of an object using the voiceprint recognition model.

2. The method according to claim 1, wherein determining the voiceprint recognition result comprises: calculating a cosine similarity according to the target feature information and the registration feature information; determining that the target voice message is a voice message of the object in accordance with a determination that the cosine similarity reaches a first similarity threshold; and determining that the target voice message is not a voice message of the object in accordance with a determination that the cosine similarity does not reach the first similarity threshold.

3. The method according to claim 1, wherein determining the voiceprint recognition result comprises: calculating a log-likelihood ratio between the target feature information and the registration feature information using a PLDA classifier; determining that the target voice message is a voice message of the object in accordance with a determination that the log-likelihood ratio reaches a second similarity threshold; and determining that the target voice message is not a voice message of the object in accordance with a determination that the log-likelihood ratio does not reach the second similarity threshold.

4. The method according to claim 1, wherein training the CNN further comprises: determining, for each of the voice messages, a deep feature corresponding to the voice message using the CNN; obtaining a fully connected layer weight matrix according to the voice messages; and determining the first loss function according to the deep feature of each of the voice messages and the fully connected layer weight matrix.

5. The method according to claim 4, wherein determining the first loss function according to the deep feature of each of the voice messages and the fully connected layer weight matrix comprises: determining the first loss function according to:

$$L_s = -\sum_{i=1}^{M} \log \frac{e^{W_{y_i}^{T} x_i + b_{y_i}}}{\sum_{j=1}^{N} e^{W_j^{T} x_i + b_j}},$$

wherein $L_s$ represents the first loss function, $x_i$ represents the $i$-th deep feature from the $y_i$-th object, $W_j$ represents the $j$-th column in the fully connected layer weight matrix, $b_j$ represents a bias of the $j$-th class, each class corresponding to an object, $M$ represents a group size of a training set corresponding to the voice message set, and $N$ represents a quantity of objects corresponding to the voice message set.

6. The method according to claim 1, wherein training the CNN further comprises: determining, for each of the voice messages, a deep feature corresponding to the voice message using the CNN; calculating a deep feature gradient according to the deep feature of each of the voice messages; calculating a second voice mean according to the deep feature gradient and a first voice mean; and determining the second loss function according to the deep feature of each of the voice messages and the second voice mean.

7. The method according to claim 6, wherein calculating the deep feature gradient according to the deep feature of each of the voice messages comprises: calculating the deep feature gradient according to: Δμ_j = ∑_{i = …
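The cosine-similarity decision in claim 2 above can be illustrated with a short sketch. The function names and the default threshold of 0.7 are hypothetical; the claim only requires that the similarity "reaches a first similarity threshold", which is read here as greater than or equal to.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("feature vectors must be non-zero")
    return dot / (norm_a * norm_b)

def is_same_speaker(target_feature, registration_feature, threshold=0.7):
    """Accept when the similarity reaches the (hypothetical) threshold."""
    return cosine_similarity(target_feature, registration_feature) >= threshold
```

In practice the threshold would be tuned on held-out data to balance false acceptances against false rejections; the value above is only a placeholder.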

Assignees

Tencent Tech Shenzhen Co Ltd

Inventors

Classifications

  • Combinations of networks · CPC title

  • G10L17/18 (Primary)

    Artificial neural networks; Connectionist approaches · CPC title

  • Learning methods · CPC title

  • using biometric data, e.g. fingerprints, iris scans or voiceprints · CPC title

  • G10L17/04 (Primary)

    Training, enrolment or model building · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11508381B2 cover?
A computer-performed voiceprint recognition method. The computer obtains target feature information from a target voice message using a voiceprint recognition model trained according to a first loss function and a second loss function, then determines a voiceprint recognition result from the target feature information and registration feature information.
Who is the assignee on this patent?
Tencent Tech Shenzhen Co Ltd
What technology area does this patent fall under?
Primary CPC classification G10L17/18. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 22, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).