Method and system of neural network keyphrase detection
US-2019043488-A1 · Feb 7, 2019 · US
US11875783B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11875783-B2 |
| Application number | US-202016892080-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 3, 2020 |
| Priority date | Jun 3, 2020 |
| Publication date | Jan 16, 2024 |
| Grant date | Jan 16, 2024 |
Official abstract text for this publication.
A method, system, and device are directed to audio input bit-size conversion for compatibility to audio processing systems with an expected input sample bit-size.
Opening claim text (preview).
What is claimed is:

1. An audio processing device comprising: memory storing audio input including human speech and in a form of initial samples with a first bit-size; and at least one processor communicatively coupled to the memory to operate by: dividing at least one of the initial samples into multiple sample parts; generating at least one gain formed by at least one neural network accelerator; applying the at least one gain to at least one of the sample parts to form at least one scaled sample part; and generating a scaled output sample in a second bit-size comprising combining at least portions of the multiple sample parts including the at least one scaled sample part, and wherein a portion of one of the sample parts being combined has most significant bits (MSBs) of the initial sample and a portion of another one of the sample parts being combined has least significant bits (LSBs) of the initial sample.

2. The device of claim 1, wherein the sample parts each have a size so that the sample parts cooperatively hold all of the bits from the initial sample.

3. The device of claim 1, wherein the sample parts are of the second bit-size.

4. The device of claim 1, wherein the sample parts comprise at least a high sample part filled with the most significant bits and other bits from the initial sample and a low sample part having the least significant bits from the initial sample and remaining bit spaces filled with zeros.

5. The device of claim 1, wherein the dividing comprises storing the initial sample in a container of a transition sample with a third bit-size that is larger than the first bit-size of the initial sample and evenly divisible into the sample parts.

6. The device of claim 5, wherein the first bit-size is 24 bits, the second bit-size is 16 bits, and the third bit-size is 32 bits.

7. The device of claim 5, wherein the at least one processor is arranged to operate by deinterleaving a sequence of the transition samples, wherein each transition sample has a high sample part and a low sample part, and the deinterleaving to generate a high sample vector of high sample parts separate from a low sample vector of low sample parts to separately input the high and low sample vectors into a neural network accelerator.

8. The device of claim 7, wherein the at least one processor to shift the low sample parts having the least significant bits (LSBs) of the initial samples to reserve a bit space in the low sample part for a sign bit using at least one neural network accelerator.

9. The device of claim 1, wherein the at least one processor operates by determining absolute value versions of the sample parts and a separate sign vector maintaining a sign of at least one of the sample parts to use to generate the scaled output sample.

10. A method of audio processing comprising: obtaining audio input including human speech and in a form of initial samples with a first bit-size; dividing at least one of the initial samples into multiple sample parts; generating, by at least one neural network accelerator, at least one gain; applying the at least one gain to at least one of the sample parts to form at least one scaled sample part; and generating a scaled output sample in a second bit-size comprising combining at least portions of the multiple sample parts and including the at least one scaled sample part, and wherein a portion of one of the sample parts being combined has most significant bits (MSBs) of the initial sample and a portion of another one of the sample parts being combined has least significant bits (LSBs) of the initial sample.

11. The method of claim 10, wherein the at least one gain is computed dynamically depending on the sample parts.

12. The method of claim 10, wherein the at least one gain is computed by using a count of a number of bit spaces occupied by one of the sample parts.

13. The method of claim 10, wherein the same at least one gain is used for multiple sample parts of a same sample set of multiple parts of multiple initial samples regardless of which sample part was used to form the gain.

14. The method of claim 10, wherein multiple initial samples of a sample set of initial samples are divided into sample parts, and wherein the at least one gain is generated by using only data of a high sample part with the highest value among all high sample parts of the set.

15. The method of claim 14, comprising determining the high sample part with the highest value by using max pooling layers of a neural network.

16. A computer-implemented system for audio processing comprising: at least one microphone to capture audio input including human speech; memory to store the audio input in a form of initial samples of a first bit-size; at least one processor communicatively coupled to the at least one microphone and at least one memory, and to operate by: dividing at least one of the initial samples into multiple sample parts; generating at least one gain formed by at least one neural network accelerator; applying the at least one gain to at least one of the sample parts to form at least one scaled sample part; and generating a scaled output sample in a second bit-size comprising combining at least portions of the multiple sample parts and including the at least one scaled sample part, and wherein a portion of one of the sample parts being combined has most significant bits (MSBs) of the initial sample and a portion of another one of the sample parts being combined has least significant bits (LSBs) of the initial sample.

17. The system of claim 16, wherein the at least one gain is arranged so that applying the at least one gain causes a bit-shift in the sample part to place a most significant bit of the sample part at the highest available bit space of a scaled sample part to be used to form the scaled output sample.

18. The system of claim 17, wherein the bit-shift provides empty bit spaces on the scaled sample part to receive bits of a scaled low sample part associated with the least significant bits of the initial sample.

19. The system of claim 16, wherein the scaled output sample is formed by combining at least portions of a scaled high sample part and a scaled low sample part.

20. At least one non-transitory machine readable medium comprising instructions that, in response to being executed on a computing device, cause the computing device to operate by: obtaining audio input including human speech and in a form of initial samples with a first bit-size; dividing at least one of the initial samples into multiple sample parts; generating, by at least one neural network accelerator, at least one gain; applying the at least one gain to at least one of the sample parts to form at least one scaled sample part; and generating a scaled output sample in a second bit-size comprising combining at least portions of the multiple sample parts and including the at least one scaled sample part, and wherein a portion of one of the sample parts being combined has most significant bits (MSBs) of the initial sample and a portion of another one of the sample parts being combined has least significant bits (LSBs) of the initial sample.

21. The machine readable medium of claim 20, wherein at least one of the dividing, applying the at least one gain, and generating a scaled output sample are performed by one or more neural network accelerators without the use of a digital signal processor (DSP).

22. The machine readable medium of claim 20, wherein the instructi
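
Illustrative sketch (not part of the patent text). The claims above describe dividing each sample into high and low parts, deriving one gain from the largest high part, and combining the scaled parts into a smaller output sample. The Python below is a minimal sketch of that idea under stated assumptions: it uses the 24-bit, 32-bit, and 16-bit sizes of claim 6, plain integer arithmetic in place of a neural network accelerator, and a power-of-two gain expressed as a left shift. The function name `convert_block` and the clamping of magnitudes to 23 bits are hypothetical choices made for this example only.

```python
def convert_block(samples_24bit):
    """Convert a block of signed 24-bit samples to signed 16-bit samples.

    Returns (outputs, shift), where shift is the shared left shift chosen
    for the whole block (the gain is 2**shift).
    """
    # Assumption: keep signs in a separate vector and work on magnitudes,
    # in the spirit of claim 9. Magnitudes are clamped to 23 bits.
    signs = [-1 if s < 0 else 1 for s in samples_24bit]
    mags = [min(abs(s), 0x7FFFFF) for s in samples_24bit]

    # Store each magnitude left-aligned in a 32-bit container, then
    # deinterleave into a high vector and a low vector of 16-bit parts.
    containers = [m << 8 for m in mags]
    high_vec = [c >> 16 for c in containers]    # MSBs and other upper bits
    low_vec = [c & 0xFFFF for c in containers]  # 8 LSBs followed by 8 zeros

    # One gain for the whole block, taken from the largest high part only.
    # max() stands in for the max pooling mentioned in claim 15.
    occupied_bits = max(high_vec, default=0).bit_length()
    shift = 15 - occupied_bits  # a signed 16-bit value has 15 magnitude bits

    outputs = []
    for sign, high, low in zip(signs, high_vec, low_vec):
        scaled_high = high << shift        # moves the MSB toward bit 14
        scaled_low = (low << shift) >> 16  # LSBs that fit in the freed bits
        outputs.append(sign * (scaled_high | scaled_low))
    return outputs, shift
```

For the block `[1, -256, 1000, -70000]` the largest high part occupies 9 bits, so the shift is 6 and the function returns `([0, -64, 250, -17500], 6)`. Dropping the low 8 bits without any gain would leave magnitudes of only 0, 1, 3, and 273, which is why the shared gain preserves more of a quiet signal.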
using artificial neural networks · CPC title
Interface to dedicated audio devices, e.g. audio drivers, interface to CODECs · CPC title
Management of the audio stream, e.g. setting of volume, audio stream path · CPC title
Programmable structures, i.e. where the code converter contains apparatus which is operator-changeable to modify the conversion process · CPC title
the radix thereof being two · CPC title