Methods and systems for invoking a user-intended internet of things (iot) device from a plurality of iot devices
US-2022301568-A1 · Sep 22, 2022 · US
US11908456B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11908456-B2 |
| Application number | US-202017006440-A |
| Country | US |
| Kind code | B2 |
| Filing date | Aug 28, 2020 |
| Priority date | Aug 6, 2018 |
| Publication date | Feb 20, 2024 |
| Grant date | Feb 20, 2024 |
Official abstract text for this publication.
Embodiments of this application disclose an azimuth estimation method performed at a computing device, the method including: obtaining, in real time, multi-channel sampling signals and buffering the multi-channel sampling signals; performing wakeup word detection on one or more sampling signals of the multi-channel sampling signals, and determining a wakeup word detection score for each channel of the one or more sampling signals; performing a spatial spectrum estimation on the buffered multi-channel sampling signals to obtain a spatial spectrum estimation result, when the wakeup word detection scores of the one or more sampling signals indicate that a wakeup word exists in the one or more sampling signals; and determining an azimuth of a target voice associated with the multi-channel sampling signals according to the spatial spectrum estimation result and a highest wakeup word detection score, thereby improving the accuracy of the azimuth estimation in a voice interaction process.
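The final step of the abstract, combining the spatial spectrum with the highest wakeup word score, can be sketched in Python as follows. This is an illustrative reading that borrows the rules spelled out in claims 3 to 5 and 8 of this publication (score threshold, main beam of the best-scoring channel, nearest local maximum, averaging on ties); it is not the patentee's implementation. All function and parameter names are hypothetical, azimuths are assumed to be in degrees on a circular grid, and falling back to the main beam when the spectrum has no peak is an added assumption.

```python
import math
from typing import List, Optional, Sequence


def signed_offset(azimuth: float, reference: float) -> float:
    """Signed angular offset from reference to azimuth, in [-180, 180) degrees."""
    return ((azimuth - reference + 180.0) % 360.0) - 180.0


def local_maxima(power: Sequence[float]) -> List[int]:
    """Indices whose power exceeds both neighbours on a circular grid."""
    n = len(power)
    return [
        i for i in range(n)
        if power[i] > power[(i - 1) % n] and power[i] > power[(i + 1) % n]
    ]


def estimate_azimuth(
    scores: Sequence[float],              # one wakeup word score per beam (N1 values)
    beam_azimuths: Sequence[float],       # main-beam direction of each channel (N1 values)
    power: Sequence[float],               # spatial spectrum power (N2 values)
    candidate_azimuths: Sequence[float],  # candidate directions (N2 values)
    score_threshold: float,
) -> Optional[float]:
    # The wakeup word is taken to exist only if some score exceeds the threshold.
    best = max(range(len(scores)), key=lambda i: scores[i])
    if scores[best] <= score_threshold:
        return None

    # Main beam of the channel with the highest wakeup word score.
    main = beam_azimuths[best]

    peaks = local_maxima(power)
    if not peaks:
        return main % 360.0  # assumption: fall back to the main beam direction

    # Keep the peak(s) closest to the main beam; average them if several tie.
    offsets = [signed_offset(candidate_azimuths[i], main) for i in peaks]
    nearest = min(abs(o) for o in offsets)
    tied = [o for o in offsets if math.isclose(abs(o), nearest, abs_tol=1e-9)]
    return (main + sum(tied) / len(tied)) % 360.0
```

With four beams at 0, 90, 180 and 270 degrees and twelve candidate azimuths spaced 30 degrees apart, a strong interfering peak at 240 degrees is ignored when the best-scoring beam points at 90 degrees and a weaker peak sits at 60 degrees. Averaging is done on signed offsets from the main beam so that ties on either side of the beam do not suffer from 0/360 wraparound.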
Opening claim text (preview).
What is claimed is:

1. An azimuth estimation method performed at a computing device having one or more processors and memory storing a plurality of computer programs to be executed by the processor, the method comprising: obtaining, in real time, multi-channel sampling signals and buffering the multi-channel sampling signals, wherein the multi-channel sampling signals are respectively obtained through N1 channels corresponding to N1 beams in N1 different directions; performing wakeup word detection on one or more sampling signals of the multi-channel sampling signals, and determining a wakeup word detection score for each channel of the one or more sampling signals; computing, upon a determination that the wakeup word detection scores of the one or more sampling signals indicates that a wakeup word exists in the one or more sampling signals, a time period from appearance to disappearance of the wakeup word based on the wakeup word detection scores, further including: determining a time point of the appearance of the wakeup word as a starting time point at which a wakeup word detection score starts changing; determining a time point of the disappearance of the wakeup word as an ending time point corresponding to the highest wakeup word detection score among the wakeup word detection scores; and determining the time period from the appearance to the disappearance of the wakeup word according to the starting time point and the ending time point; extracting a target sampling signal within the time period from the buffered multi-channel sampling signals; performing a spatial spectrum estimation on the target sampling signal extracted within the time period to obtain a spatial spectrum estimation result indicating signal power strengths at N2 candidate azimuths, wherein N2 is different from N1; and determining, among the N2 candidate azimuths, an azimuth of a target voice associated with the multi-channel sampling signals based on the spatial spectrum estimation result and a highest wakeup word detection score.

2. The method according to claim 1, wherein the performing the spatial spectrum estimation on the target sampling signal to obtain the spatial spectrum estimation result comprises: calculating the signal power strengths at the N2 candidate azimuths based on the target sampling signal, wherein N2 is larger than N1.

3. The method according to claim 2, wherein the determining an azimuth of a target voice associated with the multi-channel sampling signals based on the spatial spectrum estimation result and a highest wakeup word detection score comprises: determining an azimuth of a target main beam, the target main beam being a main beam, among the N1 beams, of a sampling signal corresponding to the highest wakeup word detection score; determining at least one local maximum value point among the signal power strengths at the N2 candidate azimuths; and determining the azimuth of the target voice based on the azimuth of the target main beam and the at least one local maximum value point.

4. The method according to claim 3, wherein the determining the azimuth of the target voice based on the azimuth of the target main beam and the local maximum value point comprises: determining, among the N2 candidate azimuths, a candidate azimuth corresponding to a local maximum value point that is closest to the azimuth of the target main beam among the at least one local maximum value point; and determining the candidate azimuth as the azimuth of the target voice.

5. The method according to claim 3, wherein the determining the azimuth of the target voice based on the azimuth of the target main beam and the at least one local maximum value point comprises: when there are at least two local maximum value points closest to the azimuth of the target main beam, determining an average value of candidate azimuths respectively corresponding to the at least two local maximum value points as the azimuth of the target voice.

6. The method according to claim 1, wherein during the performing the spatial spectrum estimation, the method further comprises: stopping the wakeup word detection on the one or more sampling signals in the multi-channel sampling signals within a duration from determination of the existence of the wakeup word to reappearance of the wakeup word.

7. The method according to claim 1, wherein the performing wakeup word detection on one or more sampling signals of the multi-channel sampling signals, and determining a wakeup word detection score for each channel of the one or more sampling signals comprises: performing the wakeup word detection on the one or more sampling signals in the multi-channel sampling signals, and determining a confidence of a wakeup word of each channel of the one or more sampling signals, the confidence being a similarity between content in the sampling signal and a preconfigured wakeup word; and determining the wakeup word detection score for each channel of the one or more sampling signals according to the confidence of the wakeup word of the sampling signal.

8. The method according to claim 1, further comprising: determining that the wakeup word exists when the wakeup word detection score of any channel of the one or more sampling signals is greater than a score threshold.

9. The method according to claim 1, further comprising: reserving sampling signals within a latest duration (M+N), and deleting sampling signals beyond the duration (M+N) in the buffered multi-channel sampling signals, M being an occupation duration of the wakeup word, and N being a preset duration.

10. A computing device, comprising one or more processors, memory and a plurality of computer programs stored in the memory that, when executed by the one or more processors, cause the computing device to perform a plurality of operations including: obtaining, in real time, multi-channel sampling signals and buffering the multi-channel sampling signals, wherein the multi-channel sampling signals are respectively obtained through N1 channels corresponding to N1 beams in N1 different directions; performing wakeup word detection on one or more sampling signals of the multi-channel sampling signals, and determining a wakeup word detection score for each channel of the one or more sampling signals; computing, upon a determination that the wakeup word detection scores of the one or more sampling signals indicates that a wakeup word exists in the one or more sampling signals, a time period from appearance to disappearance of the wakeup word based on the wakeup word detection scores, further including: determining a time point of the appearance of the wakeup word as a starting time point at which a wakeup word detection score starts changing; determining a time point of the disappearance of the wakeup word as an ending time point corresponding to the highest wakeup word detection score among the wakeup word detection scores; and determining the time period from the appearance to the disappearance of the wakeup word according to the starting time point and the ending time point; extracting a target sampling signal within the time period from the buffered multi-channel sampling signals; performing a spatial spectrum estimation on the target sampling signal extracted within the time period to obtain a spatial spectrum estimation result indicating signal power strengths at N2 candidate azimuths, wherein N2 is different from N1; and determining, among the N2 candidate azimuths, an azimuth of a target voice associated with the multi-channel sampling signals based on the spatial spectrum estimation result and a highest wakeup word detection score.

11. The computing device according to claim 10, whe
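The buffering rule of claim 9 and the time-period rule of claim 1 can be illustrated with a short Python sketch. It is one possible reading under stated assumptions, not the claimed implementation: the class and function names are hypothetical, time is measured in frame indices, and "starts changing" is interpreted as the first score that departs from the initial baseline score by more than a small tolerance.

```python
from collections import deque
from typing import List, Optional, Sequence, Tuple

Frame = Tuple[float, ...]  # one sample per channel


class MultiChannelBuffer:
    """Keeps only the latest (M + N) seconds of multi-channel frames."""

    def __init__(self, sample_rate: float, wake_word_seconds: float,
                 extra_seconds: float) -> None:
        # M is the duration occupied by the wakeup word, N a preset margin.
        max_frames = int(round((wake_word_seconds + extra_seconds) * sample_rate))
        self._frames: deque = deque(maxlen=max_frames)  # older frames drop out
        self._next_index = 0

    def push(self, frame: Sequence[float]) -> None:
        self._frames.append((self._next_index, tuple(frame)))
        self._next_index += 1

    def extract(self, start: int, end: int) -> List[Frame]:
        """Frames still buffered whose index lies in [start, end]."""
        return [f for idx, f in self._frames if start <= idx <= end]


def wake_word_window(score_track: Sequence[Tuple[int, float]],
                     change_tol: float = 1e-6) -> Optional[Tuple[int, int]]:
    """Return (start, end) frame indices of the wakeup word, or None.

    start: first point at which the score departs from its baseline
           (assumed reading of "starts changing").
    end:   point at which the score reaches its highest value.
    """
    if not score_track:
        return None
    baseline = score_track[0][1]
    start = next((t for t, s in score_track if abs(s - baseline) > change_tol), None)
    if start is None:
        return None
    end = max(score_track, key=lambda ts: ts[1])[0]
    return (start, end) if end >= start else None
```

The target sampling signal passed to spatial spectrum estimation would then be `buffer.extract(start, end)`. A bounded `deque` is used so that frames older than (M+N) are discarded automatically as new frames arrive, which mirrors the "reserving" and "deleting" wording of claim 9.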
Speech classification or search · CPC title
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
the extracted parameters being spectral information of each sub-band · CPC title
the extracted parameters being power information · CPC title
Word spotting · CPC title