Deep multi-channel acoustic modeling using multiple microphone array geometries
US-11574628-B1 · Feb 7, 2023 · US
US11854528B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11854528-B2 |
| Application number | US-202117402045-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 13, 2021 |
| Priority date | Dec 22, 2020 |
| Publication date | Dec 26, 2023 |
| Grant date | Dec 26, 2023 |
Official abstract text for this publication.
An apparatus for detecting unsupported utterances in natural language understanding includes a memory storing instructions, and at least one processor configured to execute the instructions to classify a feature that is extracted from an input utterance of a user, as one of in-domain and out-of-domain (OOD) for a response to the input utterance, obtain an OOD score of the extracted feature, and identify whether the feature is classified as OOD. The at least one processor is further configured to execute the instructions to, based on the feature being identified to be classified as in-domain, identify whether the obtained OOD score is greater than a predefined threshold, and based on the OOD score being identified to be greater than the predefined threshold, re-classify the feature as OOD.
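The two-stage decision in the abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the patented implementation: the function name `final_label`, the string labels, and the threshold values in the usage note are hypothetical.

```python
def final_label(classifier_label: str, ood_score: float, threshold: float) -> str:
    """Combine the classifier decision with the OOD score.

    classifier_label: "in-domain" or "OOD" (hypothetical label strings).
    ood_score: score from the OOD detector; higher means more OOD-like.
    threshold: the predefined threshold the score is compared against.
    """
    if classifier_label == "OOD":
        # The classifier already rejected the utterance; keep that decision.
        return "OOD"
    # Classified as in-domain: re-classify only if the score exceeds the threshold.
    return "OOD" if ood_score > threshold else "in-domain"
```

For example, `final_label("in-domain", 5.0, 3.0)` returns `"OOD"`, while `final_label("in-domain", 1.0, 3.0)` keeps `"in-domain"`. The OOD score acts as a second check that can only override an in-domain decision, never an OOD one.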
Opening claim text (preview).
What is claimed is:

1. An apparatus for detecting unsupported utterances in natural language understanding, the apparatus comprising: a memory storing instructions; and at least one processor configured to execute the instructions to: classify a feature that is extracted from an input utterance of a user, as one of in-domain and out-of-domain (OOD) for a response to the input utterance, via a trained neural network-based classifier; obtain an OOD score of the extracted feature via a distribution-based OOD detector, based on a distance between the extracted feature and distributions of a predetermined distribution function; based on the feature being classified as in-domain via the trained neural network-based classifier, identify whether the OOD score of the feature is greater than a predefined threshold; and based on the OOD score being greater than the predefined threshold, re-classify the feature, which is classified as in-domain via the trained neural network-based classifier, as OOD.
2. The apparatus of claim 1, wherein the at least one processor is further configured to execute the instructions to obtain the OOD score of the feature, based on a class mean and a global covariance, the class mean is trained based on annotated utterances that are annotated by at least one user and are in-domain, and the global covariance is trained based on the trained class mean and unannotated utterances that are unannotated by the at least one user and are in-domain and OOD.
3. The apparatus of claim 2, wherein the at least one processor is further configured to execute the instructions to classify the extracted feature as one of in-domain and OOD for the response to the input utterance, using the trained neural network-based classifier that is trained based on the annotated utterances, and the neural network-based classifier is trained prior to the class mean and the global covariance being trained.
4. The apparatus of claim 2, wherein the predefined threshold is trained based on one or more OOD scores of validation data that is in-domain, and the predefined threshold is trained after the class mean and the global covariance being trained.
5. The apparatus of claim 2, wherein the at least one processor is further configured to obtain consent of the user, for using personalization data of the user.
6. The apparatus of claim 5, wherein the at least one processor is further configured to execute the instructions to, based on the consent of the user being obtained, store history of the user, the history comprising one or more utterances of the user.
7. The apparatus of claim 6, wherein the at least one processor is further configured to execute the instructions to update the global covariance, based on the stored history.
8. A method of detecting unsupported utterances in natural language understanding, the method being performed by at least one processor, and the method comprising: classifying a feature that is extracted from an input utterance of a user, as one of in-domain and out-of-domain (OOD) for a response to the input utterance, via a trained neural network-based classifier; obtaining an OOD score of the extracted feature via a distribution-based OOD detector, based on a distance between the extracted feature and distributions of a predetermined distribution function; based on the feature being classified as in-domain via the trained neural network-based classifier, identifying whether the OOD score of the feature is greater than a predefined threshold; and based on the OOD score being identified to be greater than the predefined threshold, re-classifying the feature, which is classified as in-domain via the trained neural network-based classifier, as OOD.
9. The method of claim 8, wherein the obtaining the OOD score comprises obtaining the OOD score of the feature, based on a class mean and a global covariance, the class mean is trained based on annotated utterances that are annotated by at least one user and are in-domain, and the global covariance is trained based on the trained class mean and unannotated utterances that are unannotated by the at least one user and are in-domain and OOD.
10. The method of claim 9, wherein the classifying the extracted feature comprises classifying the extracted feature as one of in-domain and OOD for the response to the input utterance, using the trained neural network-based classifier that is trained based on the annotated utterances, and the trained neural network-based classifier is trained prior to the class mean and the global covariance being trained.
11. The method of claim 9, wherein the predefined threshold is trained based on one or more OOD scores of validation data that is in-domain, and the predefined threshold is trained after the class mean and the global covariance being trained.
12. The method of claim 9, further comprising obtaining consent of the user, for using personalization data of the user.
13. The method of claim 12, further comprising, based on the consent of the user being obtained, storing history of the user, the history comprising one or more utterances of the user.
14. The method of claim 13, further comprising updating the global covariance, based on the stored history.
15. A non-transitory computer-readable storage medium storing instructions that, when executed by at least one processor, cause the at least one processor to: classify a feature that is extracted from an input utterance of a user, as one of in-domain and out-of-domain (OOD) for a response to the input utterance, via a trained neural network-based classifier; obtain an OOD score of the extracted feature via a distribution-based OOD detector, based on a distance between the extracted feature and distributions of a predetermined distribution function; based on the feature being classified as in-domain via the trained neural network-based classifier, identify whether the OOD score of the feature is greater than a predefined threshold; and based on the OOD score being identified to be greater than the predefined threshold, re-classify the feature, which is classified as in-domain via the trained neural network-based classifier, as OOD.
16. The non-transitory computer-readable storage medium of claim 15, wherein the instructions, when executed by the at least one processor, further cause the at least one processor to obtain the OOD score of the feature, based on a class mean and a global covariance, the class mean is trained based on annotated utterances that are annotated by at least one user and are in-domain, and the global covariance is trained based on the trained class mean and unannotated utterances that are unannotated by the at least one user and are in-domain and OOD.
17. The non-transitory computer-readable storage medium of claim 16, wherein the instructions, when executed by the at least one processor, further cause the at least one processor to classify the extracted feature as one of in-domain and OOD for the response to the input utterance, using the trained neural network-based classifier that is trained based on the annotated utterances, and the neural network-based classifier is trained prior to the class mean and the global covariance being trained.
18. The non-transitory computer-readable storage medium of claim 16, wherein the predefined threshold is trained based on one or more OOD scores of validation data that is in-domain, and the predefined threshold is trained after the class mean and the global covariance being trained.
19. The non-transi
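The dependent claims describe an OOD score built from a class mean and a global covariance, with a threshold fitted on in-domain validation scores. The sketch below is one possible reading, not the patent's method. It assumes a squared Mahalanobis distance to the nearest class mean (the claims do not name the distance), assigns each unannotated feature to its nearest class mean before estimating the shared covariance, adds a small ridge term for invertibility, and sets the threshold at a quantile of the validation scores. All function names, parameters, and toy data are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = List[float]
Matrix = List[List[float]]

def class_means(features: Sequence[Vector], labels: Sequence[str]) -> Dict[str, Vector]:
    """Mean feature vector per annotated in-domain class."""
    sums: Dict[str, Vector] = {}
    counts: Dict[str, int] = {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def global_covariance(unlabeled: Sequence[Vector], means: Dict[str, Vector],
                      ridge: float = 1e-3) -> Matrix:
    """Shared covariance of residuals around the nearest class mean (assumption)."""
    dim = len(unlabeled[0])
    cov = [[0.0] * dim for _ in range(dim)]
    for x in unlabeled:
        mu = min(means.values(),
                 key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m)))
        diff = [a - b for a, b in zip(x, mu)]
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += diff[i] * diff[j] / len(unlabeled)
    for i in range(dim):
        cov[i][i] += ridge  # keeps the matrix invertible on tiny samples
    return cov

def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def ood_score(x: Vector, means: Dict[str, Vector], cov: Matrix) -> float:
    """Squared Mahalanobis distance to the closest class mean (higher = more OOD-like)."""
    scores = []
    for mu in means.values():
        diff = [a - b for a, b in zip(x, mu)]
        scores.append(sum(d * s for d, s in zip(diff, solve(cov, diff))))
    return min(scores)

def fit_threshold(validation_scores: Sequence[float], quantile: float = 0.95) -> float:
    """Pick the threshold as a quantile of in-domain validation scores (assumption)."""
    ordered = sorted(validation_scores)
    return ordered[min(len(ordered) - 1, int(quantile * len(ordered)))]

# Toy 2-D features; real features would come from the trained classifier.
annotated = [[0.0, 0.0], [0.2, 0.0], [4.0, 4.0], [4.0, 4.2]]
labels = ["weather", "weather", "music", "music"]
unlabeled = [[0.1, 0.1], [3.9, 4.1], [0.0, 0.2], [4.1, 3.9], [2.0, 2.0]]
validation = [[0.1, 0.1], [4.0, 4.0], [0.0, -0.1], [4.1, 4.2]]

means = class_means(annotated, labels)
cov = global_covariance(unlabeled, means)
threshold = fit_threshold([ood_score(x, means, cov) for x in validation])
near = ood_score([0.1, 0.0], means, cov)    # sits on the "weather" mean
far = ood_score([20.0, -20.0], means, cov)  # far from both class means
```

The order of the steps mirrors the claims: class means come from annotated in-domain data, the global covariance is estimated afterwards from unannotated data using those means, and the threshold is fitted last. In this toy run `near` falls at or below `threshold` and `far` exceeds it, so only the second feature would be re-classified as OOD.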
Feature extraction for speech recognition; Selection of recognition unit · CPC title
using natural language modelling · CPC title
Parsing for meaning understanding · CPC title
Semantic analysis · CPC title
Phrasal analysis, e.g. finite state techniques or chunking · CPC title