Audio recognition method and system and machine device

US11900917B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11900917-B2
Application number  US-202117230515-A
Country             US
Kind code           B2
Filing date         Apr 14, 2021
Priority date       Jan 29, 2019
Publication date    Feb 13, 2024
Grant date          Feb 13, 2024

Abstract

Official abstract text for this publication.

A neural network training method is provided. The method includes obtaining an audio data stream, performing, for different audio data of each time frame in the audio data stream, feature extraction in each layer of a neural network, to obtain a depth feature outputted by a corresponding time frame, fusing, for a given label in labeling data, an inter-class confusion measurement index and an intra-class distance penalty value relative to the given label in a set loss function for the audio data stream through the depth feature, and updating a parameter in the neural network by using a loss function value obtained through fusion.
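
The fused loss described in the abstract can be sketched at the frame level as follows. This is a minimal illustration, not the patented implementation: it assumes the inter-class confusion measurement index is a cross-entropy term and the intra-class distance penalty is a center loss (as claims 4 to 6 suggest), combined with a weighting factor. The function names, array shapes, the 0.5 scaling of the squared distance, and the default weight `lam` are assumptions made for the example.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def fused_frame_loss(features, logits, labels, centers, lam=0.01):
    """Weighted sum of cross-entropy and center loss over T time frames.

    features: (T, D) depth features, one row per time frame
    logits:   (T, K) class scores produced from the depth features
    labels:   (T,)   integer label of each time frame
    centers:  (K, D) one center vector per label category
    lam:      weighting factor (hypothetical default)
    """
    num_frames = features.shape[0]
    probs = softmax(logits)
    # Inter-class term: mean negative log-probability of the given label.
    cross_entropy = -np.log(probs[np.arange(num_frames), labels]).mean()
    # Intra-class term: half the squared distance to the label's center vector.
    diffs = features - centers[labels]
    center_loss = 0.5 * (diffs ** 2).sum(axis=1).mean()
    return cross_entropy + lam * center_loss, cross_entropy, center_loss
```

In a training loop, the first returned value would be the quantity minimized when updating the network parameters. How the center vectors are maintained during training is not covered by this sketch.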

First claim

Opening claim text (preview).

What is claimed is:

1. A neural network training method for implementing audio recognition, applicable to an audio recognition terminal, the method comprising: obtaining an audio data stream for neural network training of audio recognition, the audio data stream including audio data respectively corresponding to a plurality of time frames; performing, for different audio data of each time frame in the audio data stream, feature extraction in each layer of a trained neural network, to obtain a depth feature outputted by a corresponding time frame; fusing, for a given label in labeling data, an inter-class confusion measurement index and an intra-class distance penalty value relative to the given label in a set loss function for the audio data stream through the depth feature; and obtaining, through fusion, a loss function value relative to a series of given labels in the labeling data, to update a parameter in the neural network.

2. The method according to claim 1, wherein the obtaining an audio data stream for neural network training of audio recognition comprises: obtaining a noisy and continuous audio data stream and training data with the neural network as labeling data.

3. The method according to claim 1, wherein the fusing the inter-class confusion measurement index and the intra-class distance penalty value comprises: obtaining, for the given label in the labeling data, a center vector corresponding to a category to which the given label belongs, the center vector being used for describing centers of all depth features in the category; and fusing, according to the depth feature and the center vector, an inter-class confusion measurement index and an intra-class distance penalty value relative to the given label in a set loss function for audio data of the time frame, to obtain a loss function value of the audio data relative to the given label.

4. The method according to claim 3, wherein the fusing the inter-class confusion measurement index and the intra-class distance penalty value comprises: calculating a center loss of the given label by using the depth feature and the center vector, to obtain an intra-class distance penalty value of the audio data of the time frame relative to the given label.

5. The method according to claim 4, wherein the fusing the inter-class confusion measurement index and the intra-class distance penalty value further comprises: calculating, according to the depth feature, an inter-class confusion measurement index of the audio data of the time frame relative to the given label by using a cross-entropy loss function.

6. The method according to claim 4, wherein the fusing the inter-class confusion measurement index and the intra-class distance penalty value further comprises: performing weighting calculation on the intra-class distance penalty value and the inter-class confusion measurement index of the audio data relative to the given label in the set loss function according to a specified weighting factor, to obtain the loss function value of the audio data relative to the given label.

7. The method according to claim 6, wherein audio data of different time frames in the audio data stream is labeled through the given label in the labeling data.

8. The method according to claim 1, wherein a blank label is added to the labeling data, and the fusing the inter-class confusion measurement index and the intra-class distance penalty value comprises: obtaining center vectors corresponding to categories to which the given label in the labeling data and the added blank label belong; and calculating, for a depth feature sequence formed by the audio data stream for the depth feature in a time sequence, a probability that the audio data stream is mapped to a given sequence label and distances of the given sequence label respectively relative to the center vectors, to obtain an intra-class distance penalty value of the audio data stream relative to the given sequence label, the given sequence label comprising the added blank label and the given label.

9. The method according to claim 8, wherein the fusing the inter-class confusion measurement index and the intra-class distance penalty value further comprises: calculating a probability distribution of the audio data stream relative to the given sequence label according to the depth feature, and calculating a log-likelihood cost of the audio data stream through the probability distribution as an inter-class confusion measurement index of the audio data stream relative to the given sequence label.

10. The method according to claim 8, wherein the fusing the inter-class confusion measurement index and the intra-class distance penalty value further comprises: performing weighting calculation on the inter-class confusion measurement index and the intra-class distance penalty value of the audio data stream relative to the given sequence label in the set loss function according to a specified weighting factor, to obtain a loss function value of the audio data stream relative to the given sequence label.

11. The method according to claim 10, wherein the labeling data of the audio data stream is an unaligned discrete label string, a blank label is added to the discrete label string, and the added blank label and the given label in the labeling data respectively correspond to audio data of different time frames in the audio data stream.

12. The method according to claim 1, wherein the obtaining the loss function value comprises: obtaining, through fusion, a loss function value relative to a series of given labels in the labeling data, to perform iterative training of updated parameters in each layer of the neural network, until a minimum loss function value is obtained; and updating a parameter corresponding to the minimum loss function value to each layer of the neural network.

13. A neural network training system for implementing audio recognition, the audio recognition system comprising: a memory storing computer program instructions; and a processor coupled to the memory and configured to execute the computer program instructions to perform: obtaining an audio data stream for neural network training of audio recognition, the audio data stream including audio data respectively corresponding to a plurality of time frames; performing, for different audio data of each time frame in the audio data stream, feature extraction in each layer of a trained neural network, to obtain a depth feature outputted by a corresponding time frame; fusing, for a given label in labeling data, an inter-class confusion measurement index and an intra-class distance penalty value relative to the given label in a set loss function for the audio data stream through the depth feature; and obtaining, through fusion, a loss function value relative to a series of given labels in the labeling data, to update a parameter in the neural network.

14. The neural network training system according to claim 13, wherein the fusing the inter-class confusion measurement index and the intra-class distance penalty value comprises: obtaining, for the given label in the labeling data, a center vector corresponding to a category to which the given label belongs, the center vector being used for describing centers of all depth features in the category; and fusing, according to the depth feature and the center vector, an inter-class confusion measurement index and an intra-class distance penalty value relative to the given label in a set loss function for audio data of the time frame, to obtain a loss function value of the audio data relative to the given label.

15. The
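
The sequence-level claims (8 to 11) add a blank label to an unaligned label string and use a log-likelihood cost over the whole audio data stream. That resembles a connectionist temporal classification (CTC) style objective, but the claim preview does not name the algorithm, so treating it as CTC is an assumption. The sketch below computes only the log-likelihood term with the standard CTC forward recursion. It omits the center-vector distance penalty of claim 8, and `BLANK`, the function names, and the unscaled arithmetic (suitable for short sequences only) are illustrative choices.

```python
import numpy as np

BLANK = 0  # assumed index of the added blank label


def extend_with_blanks(labels):
    """Interleave blanks around each label: [a, b] -> [blank, a, blank, b, blank]."""
    extended = [BLANK]
    for label in labels:
        extended += [label, BLANK]
    return extended


def collapse(path):
    """Map a frame-level path to a label string: merge repeats, then drop blanks."""
    out, prev = [], None
    for symbol in path:
        if symbol != prev and symbol != BLANK:
            out.append(symbol)
        prev = symbol
    return out


def ctc_neg_log_likelihood(probs, labels):
    """Negative log-probability that the frame posteriors map to `labels`.

    probs:  (T, K) per-frame posterior over K labels, blank included
    labels: unaligned label string without blanks, e.g. [1, 2]
    """
    ext = extend_with_blanks(labels)
    num_frames, num_states = probs.shape[0], len(ext)
    alpha = np.zeros((num_frames, num_states))
    # A valid path starts in the leading blank or in the first real label.
    alpha[0, 0] = probs[0, ext[0]]
    if num_states > 1:
        alpha[0, 1] = probs[0, ext[1]]
    for t in range(1, num_frames):
        for s in range(num_states):
            total = alpha[t - 1, s]
            if s >= 1:
                total += alpha[t - 1, s - 1]
            # Skipping the blank between two labels is allowed only if they differ.
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                total += alpha[t - 1, s - 2]
            alpha[t, s] = total * probs[t, ext[s]]
    # A valid path ends in the trailing blank or in the last real label.
    likelihood = alpha[-1, -1]
    if num_states > 1:
        likelihood += alpha[-1, -2]
    return -np.log(likelihood)
```

Under these assumptions, a sequence-level fused loss would add a weighted distance penalty to this value, analogous to the weighting in claim 10. A production version would run the recursion in log space to avoid underflow on long streams.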

Assignees

Inventors

Classifications

  • characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title

  • Supervised learning · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • G10L15/063 · Primary

    Training · CPC title

  • Architecture, e.g. interconnection topology · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11900917B2 cover?
A neural network training method is provided. The method includes obtaining an audio data stream, performing, for different audio data of each time frame in the audio data stream, feature extraction in each layer of a neural network, to obtain a depth feature outputted by a corresponding time frame, fusing, for a given label in labeling data, an inter-class confusion measurement index and an in…
Who is the assignee on this patent?
Tencent Tech Shenzhen Co Ltd
What technology area does this patent fall under?
Primary CPC classification G10L15/063. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 13, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).