US11568245B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11568245-B2 |
| Application number | US-201716760181-A |
| Country | US |
| Kind code | B2 |
| Filing date | Dec 15, 2017 |
| Priority date | Nov 16, 2017 |
| Publication date | Jan 31, 2023 |
| Grant date | Jan 31, 2023 |
Official abstract text for this publication.
The present invention provides artificial intelligence technology which has machine-learning-based information understanding capability, including metric learning providing improved classification performance, classification of an object considering a semantic relationship, understanding of the meaning of a scene based on the metric learning and the classification, and the like. An electronic device according to one embodiment of the present invention comprises a memory in which at least one instruction is stored, and a processor for executing the stored instruction. Here, the processor extracts feature data from training data of a first class, obtains a feature point by mapping the extracted feature data to an embedding space, and makes an artificial neural network learn in a direction for reducing a distance between the obtained feature point and an anchor point.
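The training direction described in the abstract (reduce the distance between a feature point and its anchor point) can be illustrated with a minimal sketch. It assumes a single linear layer as the embedding map, a squared Euclidean distance as the loss, and plain gradient descent; the names `embed`, `train_step`, `weights`, and the learning rate are hypothetical and are not taken from the patent, which describes a neural network rather than this toy model.

```python
def embed(weights, features):
    """Map a feature vector into the embedding space with one linear layer."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def train_step(weights, features, anchor, lr=0.05):
    """One gradient-descent step on L = ||W x - anchor||^2 (assumed loss)."""
    point = embed(weights, features)
    # dL/dW[i][j] = 2 * (point[i] - anchor[i]) * features[j]
    return [
        [w - lr * 2.0 * (point[i] - anchor[i]) * features[j]
         for j, w in enumerate(row)]
        for i, row in enumerate(weights)
    ]


# Hypothetical toy data: one training sample and the anchor of its class.
weights = [[1.0, 0.0], [0.0, 1.0]]
features = [1.0, 2.0]
anchor = [0.0, 0.0]

history = []  # squared distance to the anchor before each update
for _ in range(20):
    history.append(squared_distance(embed(weights, features), anchor))
    weights = train_step(weights, features, anchor)

print(history[0], history[-1])
```

Each step moves the embedded feature point closer to the anchor, so the recorded distances shrink monotonically. A loss that also penalizes feature points of another class for lying near this anchor, as in claim 2, would add a second term to the same update.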
Opening claim text (preview).
What is claimed is:

1. An electronic apparatus comprising: a memory configured to store at least one instruction; and a processor configured to execute the stored instruction to: extract feature data from a first class of training data, obtain a feature point by mapping the extracted feature data to an embedding space, and train an artificial neural network in a direction for reducing a distance between the obtained feature point and an anchor point in the embedding space for the first class of training data, and wherein the anchor point for the first class of training data comprises feature data extracted from representative data of the first class mapped to the embedding space and a position in the embedding space of the anchor point for the first class of training data is based on semantic relationship information between the first class and at least a second class, different from the first class, for a second class of training data.

2. The electronic apparatus as claimed in claim 1, wherein the processor is configured to train the artificial neural network using a loss function which defines that the closer the feature point of the first class of training data to the anchor point for the first class of training data, the less the loss, and the closer the feature point of the second class of training data to the anchor point for the first class of training data, the greater the loss.

3. The electronic apparatus as claimed in claim 1, wherein the processor is configured to train a convolutional neural network (CNN) layer for extracting the feature data of the first class of training data, and a metric learning layer for obtaining a distance between the feature point obtained by receiving data output from the CNN layer and the anchor point for the first class of training data collectively.

4. The electronic apparatus as claimed in claim 3, wherein the processor is configured to separate from the CNN layer, only the metric learning layer for obtaining a distance between the feature point obtained by receiving data output from the CNN layer for extracting the feature data of the training data of the first class and the anchor point for the first class of training data and train the separated metric learning layer.

5. The electronic apparatus as claimed in claim 1, wherein the artificial neural network comprises a metric learning layer which outputs cluster feature data formed on the embedding space, and wherein the processor is configured to train an object classification layer including a single layer that receives data output from the metric learning layer and output a confidence level by each class.

6. The electronic apparatus as claimed in claim 1, wherein the processor is configured to train the artificial neural network in a direction that the feature point of the training data of the first class is closer to the anchor point of the first class of training data, and at the same time, a feature point of the training data of the second class is closer to the anchor point of the second class of training data, in the embedding space.

7. The electronic apparatus as claimed in claim 6, wherein the semantic relationship information comprises a distance in a semantic tree between a keyword of the first class of training data and a keyword of the second class of training data, and wherein the semantic tree reflects semantic hierarchical relationships between each keyword, and the distance in the semantic tree between the keyword of the first class of training data and the keyword of the second class of training data increases as a number of nodes between a first node corresponding to the keyword of the first class of training data and a second node corresponding to the keyword of the second class of training data increases.

8. The electronic apparatus as claimed in claim 6, wherein the processor is configured to update a position in the embedding space of at least one of a first class cluster and a second class cluster based on the semantic relationship information, wherein the first class cluster comprises the feature point of the first class of training data and the anchor point of the first class of training data, and wherein the second class cluster comprises the feature point of the second class of training data and the anchor point of the second class of training data.

9. The electronic apparatus as claimed in claim 1, wherein the processor is configured to update the position of the anchor point of the first class of training data in the embedding space by reflecting the feature point of the first class of training data, and train the artificial neural network in a direction to reduce the distance between the feature point of the first class of training data and the updated anchor point.

10. The electronic apparatus as claimed in claim 9, wherein the processor is configured to not perform position update of the anchor point of the first class of training data in an initial training comprising a first iteration of a first time from a training start point and perform position update of the anchor point of the first class of training data in an iteration after the initial training.

11. The electronic apparatus as claimed in claim 10, wherein the performing position update of the anchor point of the first class of training data in the iteration after the initial training comprises performing position update of the anchor point of the first class of training data once every two or more iterations.

12. The electronic apparatus as claimed in claim 10, wherein the first time is set to a first value based on a type of the first class of training data being a first type, and is set to a second value based on the type of the first class of training data being a second type.

13. An electronic apparatus comprising: a memory configured to store at least one instruction; and a processor configured to execute the stored instruction to: obtain feature points in an embedding space of each of a plurality of objects extracted from an image using an object recognition model which outputs data related to feature points on the embedding space, and recognize a scene of the image by using a keyword of an anchor point, among a plurality of anchor points, closest to at least some of the feature points, wherein each anchor point comprises a representative image for a respective class of training data mapped onto the embedding space, and wherein the embedding space comprises a feature space in which a distance between anchor points is determined based on semantic relationship between the anchor points.

14. The electronic apparatus as claimed in claim 13, wherein the processor is configured to select a lower level anchor point closest to each of the mapped feature points, select at least some upper node from among nodes of a semantic tree corresponding to each of the selected lower level anchor points, and recognize the scene of the image by using a keyword corresponding to the selected upper node.

15. The electronic apparatus as claimed in claim 13, wherein the processor is configured to select an upper level anchor point closest to at least some of the mapped feature points, and recognize the scene of the image by using a keyword corresponding to the selected upper level anchor point.

16. The electronic apparatus as claimed in claim 13, wherein the processor is configured to select the object recognition model based on a type of the image.

17. The electronic apparatus as claimed in claim 13, wherein the processor is configured to select the object recognition model based on
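The semantic-tree distance of claim 7 and the nearest-anchor scene recognition of claims 13 and 14 can be sketched together as follows. This is an illustration under stated assumptions, not the claimed implementation: the tree, the keywords, the 2-D anchor coordinates, and the majority vote over parent nodes are all hypothetical, and the tree distance is counted in edges, which grows with the number of nodes between two keywords as the claim requires.

```python
import math
from collections import Counter

# Hypothetical semantic tree, stored as child -> parent links.
PARENT = {
    "kitchen": "scene", "street": "scene",
    "stove": "kitchen", "sink": "kitchen",
    "car": "street", "bus": "street",
}

# Hypothetical lower-level anchor points: keyword -> position in a 2-D
# embedding space. Semantically related keywords are placed close together.
ANCHORS = {
    "stove": (0.0, 0.0), "sink": (1.0, 0.0),
    "car": (10.0, 0.0), "bus": (11.0, 0.0),
}


def path_to_root(keyword):
    """List of nodes from a keyword up to the root of the tree."""
    path = [keyword]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def tree_distance(a, b):
    """Number of edges on the tree path between two keywords."""
    path_a, path_b = path_to_root(a), path_to_root(b)
    depth_in_b = {node: i for i, node in enumerate(path_b)}
    for i, node in enumerate(path_a):
        if node in depth_in_b:  # first shared ancestor
            return i + depth_in_b[node]
    raise ValueError("keywords are not in the same tree")


def nearest_anchor(point):
    """Keyword of the anchor point closest to a feature point."""
    return min(ANCHORS, key=lambda k: math.dist(point, ANCHORS[k]))


def recognize_scene(feature_points):
    """Vote over the upper (parent) nodes of each object's nearest anchor."""
    votes = Counter(PARENT[nearest_anchor(p)] for p in feature_points)
    return votes.most_common(1)[0][0]


# Hypothetical feature points of three objects detected in one image.
object_points = [(0.2, 0.1), (0.9, -0.1), (10.3, 0.2)]
labels = [nearest_anchor(p) for p in object_points]
scene = recognize_scene(object_points)
print(labels, scene)
```

In this toy layout two of the three objects map to anchors under the same upper node, so that node's keyword is returned as the scene. Claim 15 describes an alternative in which upper-level anchor points are matched directly instead of walking up the tree.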
Learning methods · CPC title
Architecture, e.g. interconnection topology · CPC title
Semantic analysis · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Supervised learning · CPC title