US11915682B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11915682-B2 |
| Application number | US-201917610934-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 20, 2019 |
| Priority date | May 15, 2019 |
| Publication date | Feb 27, 2024 |
| Grant date | Feb 27, 2024 |
Official abstract text for this publication.
Techniques are disclosed that enable generation of an audio waveform representing synthesized speech based on a difference signal determined using an autoregressive model. Various implementations include using a distribution of the difference signal values to represent sounds found in human speech with a higher level of granularity than sounds not frequently found in human speech. Additional or alternative implementations include using one or more speakers of a client device to render the generated audio waveform.
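One way to picture the "higher level of granularity" idea is a set of discrete difference values whose magnitudes are spaced evenly in the log domain, so that small sample-to-sample changes are represented more finely than large jumps. The sketch below is illustrative only: the function names, the 256-level size, and the `min_delta`/`max_delta` bounds are assumptions made for the example, and the patent text shown here does not specify how its distribution is built.

```python
import bisect
import math


def build_log_codebook(num_levels=256, max_delta=1.0, min_delta=1e-4):
    """Return a sorted list of `num_levels` candidate difference values.

    Magnitudes are evenly spaced in the log domain between `min_delta`
    and `max_delta` (both hypothetical bounds), then mirrored to cover
    negative differences. Entries are dense near zero and sparse near
    the extremes.
    """
    half = num_levels // 2
    log_range = math.log(max_delta / min_delta)
    magnitudes = [
        min_delta * math.exp(log_range * i / (half - 1)) for i in range(half)
    ]
    return sorted([-m for m in magnitudes] + magnitudes)


def quantize(delta, codebook):
    """Return the index of the codebook entry closest to `delta`."""
    pos = bisect.bisect_left(codebook, delta)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = [i for i in (pos - 1, pos) if 0 <= i < len(codebook)]
    return min(candidates, key=lambda i: abs(codebook[i] - delta))


codebook = build_log_codebook()
small_step = codebook[129] - codebook[128]   # spacing just above zero
large_step = codebook[255] - codebook[254]   # spacing near the maximum
```

With 256 entries, each index fits in 8 bits, and the spacing between neighbouring entries grows with magnitude, so fine changes get more resolution than abrupt ones.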
Opening claim text (preview).
What is claimed is:
1. A method implemented by one or more processors, the method comprising: iteratively generating samples of an audio waveform that is synthesized speech of provided text, wherein generating the samples of the audio waveform comprises: at each iteration of a plurality of sequential iterations: generating a respective difference signal for the iteration using an autoregressive model, wherein the respective difference signal is a predicted difference based on an amplitude of a respective preceding sample of the audio waveform generated in an immediately preceding iteration and an amplitude of a respective sample for the iteration, wherein an input to the autoregressive model comprises: a respective representation of at least part of the provided text, the respective preceding sample of the audio waveform generated in the immediately preceding iteration of the sequential iterations, and a respective preceding difference signal generated in the immediately preceding iteration; and determining the respective sample for the iteration using the respective difference signal for the iteration and the respective preceding sample of the audio waveform generated in the immediately preceding iteration, the respective sample for the iteration being one of the samples of the audio waveform; and causing a client device to render the audio waveform by rendering the samples of the audio waveform.
2. The method of claim 1, wherein the one or more processors are one or more processors of the client device, wherein the client device includes memory and one or more speakers, wherein the autoregressive model is stored in the memory, wherein the audio waveform is generated using one or more of the processors of the client device, and wherein the audio waveform is rendered using one or more of the speakers of the client device.
3. The method of claim 2, further comprising: determining that one or more conditions of the client device are satisfied; and in response to determining that the one or more conditions are satisfied: determining to utilize the autoregressive model to generate the audio waveform based on difference signals generated using the autoregressive model, instead of utilizing an alternative autoregressive model that is more resource intensive to utilize than the autoregressive model.
4. The method of claim 3, wherein the one or more conditions of the client device include the client device being powered by a battery which is not fully charged.
5. The method of claim 3, wherein the one or more conditions of the client device include the one or more of the processors of the client device being throttled by heat.
6. The method of claim 1, wherein the one or more processors are one or more processors of a server that is remote from the client device, wherein the autoregressive model is stored in memory of the server, wherein the audio waveform is generated using one or more of the processors of the server, and wherein causing the client device to render the audio waveform comprises transmitting the samples of the audio waveform to the client device.
7. The method of claim 6, further comprising: determining that one or more conditions of the server are satisfied; and in response to determining that the one or more conditions are satisfied: determining to utilize the autoregressive model to generate the audio waveform based on difference signals generated using the autoregressive model, instead of utilizing an alternative autoregressive model that is more resource intensive to utilize than the autoregressive model.
8. The method of claim 7, wherein the one or more conditions of the server include one or more of the processors of the server being throttled by heat.
9. The method of claim 1, wherein the autoregressive model is a recurrent neural network model.
10. The method of claim 1, wherein the difference signal generated for the iteration is a smaller number of bits than a number of bits for the respective sample of the audio waveform of the iteration.
11. The method of claim 1, wherein the difference signal is a discrete value selected from a difference signal distribution.
12. The method of claim 11, wherein the difference signal distribution is a log uniform distribution.
13. The method of claim 11, wherein the difference signal distribution includes 256 discrete values or 512 discrete values.
14. The method of claim 11, wherein the difference signal distribution includes at least a first difference signal value and a second difference signal value, wherein the first difference signal value represents a change in sound corresponding to a high amplitude high frequency sound not found in human speech, or found in human speech with less than a threshold frequency, wherein the second difference signal value represents a change in sound found in human speech, or found in human speech with greater than a threshold frequency, and wherein the change in sound represented by the first difference signal value is greater than the change in sound represented by the second difference signal value.
15. The method of claim 11, wherein the difference signal distribution excludes a difference signal value representing a high amplitude high frequency sound not found in human speech, or found in human speech with less than a threshold frequency.
16. The method of claim 1, wherein the audio waveform comprises the synthesized speech of the provided text representing an individual word.
17. The method of claim 1, wherein the audio waveform comprises the synthesized speech of the provided text representing an individual phoneme.
18. The method of claim 1, further comprising: training the autoregressive model using a speech synthesis training instance including provided training text and a ground truth audio waveform corresponding to the provided training text, wherein training the autoregressive model comprises: at each iteration of a plurality of sequential training iterations of generating samples of a training audio waveform: generating a respective training difference signal for the iteration using the autoregressive model, wherein the respective training difference signal is a predicted difference based on an amplitude of a respective preceding training sample of the training audio waveform generated in an immediately preceding iteration and an amplitude of a respective training sample for the iteration, wherein an input to the autoregressive model comprises: a respective representation of at least part of the provided training text, the respective preceding training sample of the training audio waveform generated in the immediately preceding iteration of the sequential training iterations, and a respective preceding training difference signal generated in the immediately preceding iteration; determining the respective training sample for the iteration using the respective training difference signal for the iteration and the respective preceding training sample of the training audio waveform generated in the immediately preceding iteration, the respective training sample for the iteration being one of the samples of the training audio waveform; determining a difference between the respective training sample for the iteration and a corresponding sample of the ground truth audio waveform; and updating one or more weights of the autoregressive model based on the determined difference.
19. The method of claim 1, wherein the client device executes an automated assistant client.
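The iteration recited in claim 1 can be sketched as a simple loop. This is a minimal illustration, not the patented implementation: it assumes each new sample is the preceding sample plus the predicted difference, and `synthesize`, `DiffModel`, and `toy_model` are hypothetical names. The toy model is a stand-in for a trained autoregressive network and does nothing speech-related.

```python
from typing import Callable, List, Sequence

# Hypothetical model signature:
# (text_features, previous_sample, previous_difference) -> predicted difference
DiffModel = Callable[[Sequence[float], float, float], float]


def synthesize(
    model: DiffModel,
    text_features_per_step: Sequence[Sequence[float]],
    initial_sample: float = 0.0,
) -> List[float]:
    """Generate waveform samples one at a time from predicted differences."""
    samples: List[float] = []
    prev_sample = initial_sample
    prev_diff = 0.0
    for features in text_features_per_step:
        # The model sees the text representation, the preceding sample,
        # and the preceding difference signal.
        diff = model(features, prev_sample, prev_diff)
        # Assumption: the new sample is the preceding sample plus the difference.
        sample = prev_sample + diff
        samples.append(sample)
        prev_sample, prev_diff = sample, diff
    return samples


def toy_model(features, prev_sample, prev_diff):
    """Stand-in for a trained network: moves halfway toward features[0]."""
    return 0.5 * (features[0] - prev_sample)


waveform = synthesize(toy_model, [[1.0]] * 4)
print(waveform)  # [0.5, 0.75, 0.875, 0.9375]
```

Because only the difference is predicted at each step, its representation can be narrower than that of a full sample, which is the point claim 10 makes about bit counts.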
characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU] · CPC title
Supervised learning · CPC title
Architecture of speech synthesisers · CPC title
Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination · CPC title
using neural networks · CPC title