System and method for a multi-primary wide gamut color system
US-2021343219-A1 · Nov 4, 2021 · US
US11527238B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11527238-B2 |
| Application number | US-202117154956-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 21, 2021 |
| Priority date | Oct 30, 2020 |
| Publication date | Dec 13, 2022 |
| Grant date | Dec 13, 2022 |
Official abstract text for this publication.
A computer device is provided that includes one or more processors configured to receive an end-to-end (E2E) model that has been trained for automatic speech recognition with training data from a source-domain, and receive an external language model that has been trained with training data from a target-domain. The one or more processors are configured to perform an inference of the probability of an output token sequence given a sequence of input speech features. Performing the inference includes computing an E2E model score, computing an external language model score, and computing an estimated internal language model score for the E2E model. The estimated internal language model score is computed by removing a contribution of an intrinsic acoustic model. The processor is further configured to compute an integrated score based at least on E2E model score, the external language model score, and the estimated internal language model score.
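The score combination described in the abstract can be illustrated with a minimal sketch. The abstract does not fix the combination rule; claim 5 describes subtracting the estimated internal language model score from a log-linear combination of the E2E model score and the external language model score, and that is the form assumed here. The names `CandidateScores`, `integrated_score`, `best_candidate`, the weights `ext_weight` and `ilm_weight`, and all numeric values are hypothetical and chosen only for illustration; they are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class CandidateScores:
    """Log-domain scores for one candidate output token sequence (hypothetical container)."""
    tokens: tuple
    e2e_logp: float     # E2E model score for the candidate given the speech features
    ext_lm_logp: float  # external (target-domain) language model score
    ilm_logp: float     # estimated internal language model score of the E2E model


def integrated_score(c, ext_weight=0.5, ilm_weight=0.3):
    """Log-linear combination of the E2E and external LM scores,
    minus the weighted internal LM estimate (weights are assumed values)."""
    return c.e2e_logp + ext_weight * c.ext_lm_logp - ilm_weight * c.ilm_logp


def best_candidate(candidates, ext_weight=0.5, ilm_weight=0.3):
    """Return the candidate with the highest integrated score."""
    return max(candidates, key=lambda c: integrated_score(c, ext_weight, ilm_weight))


# Two made-up candidates: the first is favoured by the source-domain E2E model,
# the second by the target-domain external language model.
candidates = [
    CandidateScores(("play", "the", "sun"), e2e_logp=-4.0, ext_lm_logp=-6.0, ilm_logp=-3.0),
    CandidateScores(("play", "the", "song"), e2e_logp=-4.5, ext_lm_logp=-3.0, ilm_logp=-5.0),
]
winner = best_candidate(candidates)
```

With these made-up numbers the second candidate is selected even though its E2E score alone is lower, because the external language model prefers it and the source-domain prior carried by the internal language model is discounted. In a beam search, as in claims 11 and 19, the same combination would be evaluated at each step rather than once per finished sequence.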
Opening claim text (preview).
The invention claimed is:

1. A computer device comprising: one or more processors configured to: receive an end-to-end (E2E) model that has been trained for automatic speech recognition with training data from a source domain; receive an external language model that has been trained with training data from a target domain; perform an inference of the probability of an output token sequence of tokenized text represented by one or more embedding vectors, given a sequence of input speech features by: computing an E2E model score for one or more candidate output token sequences based on the sequence of input speech features using the E2E model; computing an external language model score for the one or more candidate output token sequences using the external language model; computing an estimated internal language model score for the one or more candidate output token sequences for the E2E model, wherein the E2E model encodes an intrinsic language model and an intrinsic acoustic model, and wherein the estimated internal language model score is computed by removing a contribution of the intrinsic acoustic model; and computing an integrated score for the one or more candidate output token sequences based at least on the E2E model score, the external language model score, and the estimated internal language model score.
2. The computer device of claim 1, wherein the E2E model has been trained to minimize a standard E2E model loss.
3. The computer device of claim 1, wherein the E2E model has been trained to minimize a weighted combination of an internal language model loss and a standard E2E model loss.
4. The computer device of claim 3, wherein the internal language model loss is determined based on summing negative log probabilities of the intrinsic language model over a training corpus.
5. The computer device of claim 1, wherein the integrated score for one or more candidate output token sequence is computed by subtracting the estimated internal language model score from a log-linear combination of the E2E model score and the external language model score.
6. The computer device of claim 1, wherein the one or more processors are further configured to: receive a speech input associated with the target domain via an input device; and evaluate a set of input data from the target domain using the trained E2E model implementing language model integration with the trained external language model for the target domain.
7. The computer device of claim 1, wherein the E2E model is a recurrent neural network transducer (RNN-T) model, wherein the RNN-T model includes an encoder, a prediction network, and a joint network that combines an output of the encoder and an output of the prediction network via a feed-forward network, and wherein the estimated internal language model score is computed by removing a contribution of the encoder of the RNN-T model to the feed-forward network.
8. The computer device of claim 1, wherein the E2E model is an attention-based encoder-decoder (AED) model, wherein the AED model includes an encoder that maps sequence of input speech features into a sequence of hidden representations, an attention network that generates attention weights for encoded features in the sequence of hidden representations, a context vector that is computed as a linear combination of the sequence of hidden representations weighted by the attention weights, and a decoder that takes the context vector and a token sequence as input, and wherein the estimated internal language model score is computed by removing a contribution of the encoder to the decoder of the AED model.
9. The computer device of claim 1, wherein the E2E model is trained with training data that includes audio-transcript pairs.
10. The computer device of claim 1, wherein the external language model is trained with training data that includes text data.
11. The computer device of claim 1, wherein the integrated score is estimated at each step of a beam search inference algorithm.
12. A method comprising: at one or more processors of a computer device: receiving an end-to-end (E2E) model that has been trained for automatic speech recognition with training data from a source-domain; receiving an external language model that has been trained with training data from a target-domain; performing an inference of the probability of an output token sequence of tokenized text represented by one or more embedding vectors, given a sequence of input speech features by: computing an E2E model score for one or more candidate output token sequences based on the sequence of input speech features using the E2E model; computing an external language model score for the one or more candidate output token sequences using the external language model; computing an estimated internal language model score for the one or more candidate output token sequences for the E2E model, wherein the E2E model encodes an intrinsic language model and an intrinsic acoustic model, and wherein the estimated internal language model score is computed by removing a contribution of the intrinsic acoustic model; and computing an integrated score for the one or more candidate output token sequences based at least on the E2E model score, the external language model score, and the estimated internal language model score.
13. The method of claim 12, wherein the E2E model has been trained to minimize a standard E2E model loss.
14. The method of claim 12, wherein the E2E model has been trained to minimize a weighted combination of an internal language model loss and a standard E2E model loss.
15. The method of claim 14, wherein the internal language model loss is determined based on summing negative log probabilities of the intrinsic language model over a training corpus.
16. The method of claim 12, wherein the integrated score for one or more candidate output token sequences is computed by subtracting the estimated internal language model score from a log-linear combination of the E2E model score and the external language model score.
17. The method of claim 12, wherein the E2E model is a recurrent neural network transducer (RNN-T) model that includes an encoder, a prediction network, and a joint network that combines an output of the encoder and an output of the prediction network via a feed-forward network, and wherein the method includes computing the estimated internal language model score by removing a contribution of the encoder of the RNN-T model to the feed-forward network.
18. The method of claim 12, wherein the E2E model is an attention-based encoder-decoder (AED) model that includes an encoder that maps sequence of input speech features into a sequence of hidden representations, an attention network that generates attention weights for encoded features in the sequence of hidden representations, a context vector that is computed as a linear combination of the sequence of hidden representations weighted by the attention weights, and a decoder that takes the context vector and a token sequence as input, and wherein the method includes computing the estimated internal language model score by removing a contribution of the encoder to the decoder of the AED model.
19. The method of claim 12, wherein the integrated score is estimated at each step of a beam search inference algorithm.
20. A server system comprising: one or more processors configured to: receive an end-to-end (E2E) model that has been trained for automatic speech recognition with training data from a s
Combinations of networks · CPC title
Recurrent networks, e.g. Hopfield networks · CPC title
Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs · CPC title
using context dependencies, e.g. language models · CPC title
Training · CPC title