What technology area does this patent fall under?

Primary CPC classification G10L15/08. Mapped technology areas include Physics.

When was this patent published?

Publication date: Feb 20, 2024 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 11 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Azimuth estimation method, device, and storage medium

US11908456B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11908456-B2
Application number	US-202017006440-A
Country	US
Kind code	B2
Filing date	Aug 28, 2020
Priority date	Aug 6, 2018
Publication date	Feb 20, 2024
Grant date	Feb 20, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments of this application discloses an azimuth estimation method performed at a computing device, the method including: obtaining, in real time, multi-channel sampling signals and buffering the multi-channel sampling signals; performing wakeup word detection on one or more sampling signals of the multi-channel sampling signals, and determining a wakeup word detection score for each channel of the one or more sampling signals; performing a spatial spectrum estimation on the buffered multi-channel sampling signals to obtain a spatial spectrum estimation result, when the wakeup word detection scores of the one or more sampling signals indicates that a wakeup word exists in the one or more sampling signals; and determining an azimuth of a target voice associated with the multi-channel sampling signals according to the spatial spectrum estimation result and a highest wakeup word detection score, thereby improving the accuracy of the azimuth estimation in a voice interaction process.
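The buffering and wakeup gating steps in the abstract can be illustrated with a minimal Python sketch. The names `RollingBuffer`, `wakeup_detected`, and `best_channel`, the frame representation, and the threshold value are all hypothetical and are not taken from the patent; the sketch only shows the general idea of keeping recent multi-channel samples and triggering the later spatial spectrum step once a per-channel detection score is high enough.

```python
from collections import deque


class RollingBuffer:
    """Keeps only the most recent `max_frames` multi-channel frames (assumed layout)."""

    def __init__(self, max_frames):
        # deque with maxlen silently drops the oldest frame when full
        self.frames = deque(maxlen=max_frames)

    def push(self, frame):
        self.frames.append(frame)


def wakeup_detected(scores, threshold):
    """True when the score of any channel exceeds the (assumed) threshold."""
    return any(score > threshold for score in scores)


def best_channel(scores):
    """Index of the channel with the highest wakeup word detection score."""
    return max(range(len(scores)), key=lambda i: scores[i])


buffer = RollingBuffer(max_frames=3)
for frame_id in range(5):
    buffer.push(frame_id)          # stand-in for one frame of samples per channel

scores = [0.1, 0.9, 0.3]           # illustrative per-channel scores
if wakeup_detected(scores, threshold=0.5):
    print("run spatial spectrum estimation; strongest channel:", best_channel(scores))
```

Only when the gate fires would the buffered samples be passed on to spatial spectrum estimation; the channel with the highest score is kept for the final azimuth decision.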

First claim

Opening claim text (preview).

What is claimed is:

1. An azimuth estimation method performed at a computing device having one or more processors and memory storing a plurality of computer programs to be executed by the processor, the method comprising: obtaining, in real time, multi-channel sampling signals and buffering the multi-channel sampling signals, wherein the multi-channel sampling signals are respectively obtained through N1 channels corresponding to N1 beams in N1 different directions; performing wakeup word detection on one or more sampling signals of the multi-channel sampling signals, and determining a wakeup word detection score for each channel of the one or more sampling signals; computing, upon a determination that the wakeup word detection scores of the one or more sampling signals indicates that a wakeup word exists in the one or more sampling signals, a time period from appearance to disappearance of the wakeup word based on the wakeup word detection scores, further including: determining a time point of the appearance of the wakeup word as a starting time point at which a wakeup word detection score starts changing; determining a time point of the disappearance of the wakeup word as an ending time point corresponding to the highest wakeup word detection score among the wakeup word detection scores; and determining the time period from the appearance to the disappearance of the wakeup word according to the starting time point and the ending time point; extracting a target sampling signal within the time period from the buffered multi-channel sampling signals; performing a spatial spectrum estimation on the target sampling signal extracted within the time period to obtain a spatial spectrum estimation result indicating signal power strengths at N2 candidate azimuths, wherein N2 is different from N1; and determining, among the N2 candidate azimuths, an azimuth of a target voice associated with the multi-channel sampling signals based on the spatial spectrum estimation result and a highest wakeup word detection score.

2. The method according to claim 1, wherein the performing the spatial spectrum estimation on the target sampling signal to obtain the spatial spectrum estimation result comprises: calculating the signal power strengths at the N2 candidate azimuths based on the target sampling signal, wherein N2 is larger than N1.

3. The method according to claim 2, wherein the determining an azimuth of a target voice associated with the multi-channel sampling signals based on the spatial spectrum estimation result and a highest wakeup word detection score comprises: determining an azimuth of a target main beam, the target main beam being a main beam, among the N1 beams, of a sampling signal corresponding to the highest wakeup word detection score; determining at least one local maximum value point among the signal power strengths at the N2 candidate azimuths; and determining the azimuth of the target voice based on the azimuth of the target main beam and the at least one local maximum value point.

4. The method according to claim 3, wherein the determining the azimuth of the target voice based on the azimuth of the target main beam and the local maximum value point comprises: determining, among the N2 candidate azimuths, a candidate azimuth corresponding to a local maximum value point that is closest to the azimuth of the target main beam among the at least one local maximum value point; and determining the candidate azimuth as the azimuth of the target voice.

5. The method according to claim 3, wherein the determining the azimuth of the target voice based on the azimuth of the target main beam and the at least one local maximum value point comprises: when there are at least two local maximum value points closest to the azimuth of the target main beam, determining an average value of candidate azimuths respectively corresponding to the at least two local maximum value points as the azimuth of the target voice.

6. The method according to claim 1, wherein during the performing the spatial spectrum estimation, the method further comprises: stopping the wakeup word detection on the one or more sampling signals in the multi-channel sampling signals within a duration from determination of the existence of the wakeup word to reappearance of the wakeup word.

7. The method according to claim 1, wherein the performing wakeup word detection on one or more sampling signals of the multi-channel sampling signals, and determining a wakeup word detection score for each channel of the one or more sampling signals comprises: performing the wakeup word detection on the one or more sampling signals in the multi-channel sampling signals, and determining a confidence of a wakeup word of each channel of the one or more sampling signals, the confidence being a similarity between content in the sampling signal and a preconfigured wakeup word; and determining the wakeup word detection score for each channel of the one or more sampling signals according to the confidence of the wakeup word of the sampling signal.

8. The method according to claim 1, further comprising: determining that the wakeup word exists when the wakeup word detection score of any channel of the one or more sampling signals is greater than a score threshold.

9. The method according to claim 1, further comprising: reserving sampling signals within a latest duration (M+N), and deleting sampling signals beyond the duration (M+N) in the buffered multi-channel sampling signals, M being an occupation duration of the wakeup word, and N being a preset duration.

10. A computing device, comprising one or more processors, memory and a plurality of computer programs stored in the memory that, when executed by the one or more processors, cause the computing device to perform a plurality of operations including: obtaining, in real time, multi-channel sampling signals and buffering the multi-channel sampling signals, wherein the multi-channel sampling signals are respectively obtained through N1 channels corresponding to N1 beams in N1 different directions; performing wakeup word detection on one or more sampling signals of the multi-channel sampling signals, and determining a wakeup word detection score for each channel of the one or more sampling signals; computing, upon a determination that the wakeup word detection scores of the one or more sampling signals indicates that a wakeup word exists in the one or more sampling signals, a time period from appearance to disappearance of the wakeup word based on the wakeup word detection scores, further including: determining a time point of the appearance of the wakeup word as a starting time point at which a wakeup word detection score starts changing; determining a time point of the disappearance of the wakeup word as an ending time point corresponding to the highest wakeup word detection score among the wakeup word detection scores; and determining the time period from the appearance to the disappearance of the wakeup word according to the starting time point and the ending time point; extracting a target sampling signal within the time period from the buffered multi-channel sampling signals; performing a spatial spectrum estimation on the target sampling signal extracted within the time period to obtain a spatial spectrum estimation result indicating signal power strengths at N2 candidate azimuths, wherein N2 is different from N1; and determining, among the N2 candidate azimuths, an azimuth of a target voice associated with the multi-channel sampling signals based on the spatial spectrum estimation result and a highest wakeup word detection score.

11. The computing device according to claim 10, whe
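The time window of claim 1 and the peak selection of claims 3 to 5 can be sketched in Python. This is one possible reading of the claim language, not the patentee's implementation. All function names are hypothetical. The sketch assumes per-frame scores for a single channel, candidate azimuths in degrees arranged on a full circle, a strict local maximum test with wrap-around neighbors, circular averaging of tied peaks relative to the beam direction, and a fallback to the beam azimuth when no peak exists.

```python
import math


def wakeup_window(scores):
    """(start, end) frame indices: first score change up to the highest score."""
    end = max(range(len(scores)), key=lambda i: scores[i])
    start = 0
    for i in range(1, end + 1):
        if scores[i] != scores[i - 1]:   # score "starts changing" here (assumed reading)
            start = i
            break
    return start, end


def signed_offset(azimuth, reference):
    """Signed angular difference in degrees, in the range [-180, 180)."""
    return (azimuth - reference + 180.0) % 360.0 - 180.0


def local_maxima(power):
    """Indices strictly greater than both circular neighbors."""
    n = len(power)
    return [i for i in range(n)
            if power[i] > power[i - 1] and power[i] > power[(i + 1) % n]]


def target_azimuth(power, candidate_azimuths, beam_azimuth):
    """Peak closest to the main beam; tied peaks are averaged (assumed circular mean)."""
    peaks = local_maxima(power)
    if not peaks:
        return beam_azimuth              # fallback chosen for this sketch only
    offsets = [signed_offset(candidate_azimuths[i], beam_azimuth) for i in peaks]
    nearest = min(abs(o) for o in offsets)
    tied = [o for o in offsets if math.isclose(abs(o), nearest)]
    return (beam_azimuth + sum(tied) / len(tied)) % 360.0


azimuths = [30.0 * k for k in range(12)]          # N2 = 12 candidate azimuths
power = [1, 2, 5, 2, 1, 1, 1, 3, 6, 3, 1, 1]      # peaks at 60 and 240 degrees
print(wakeup_window([0.0, 0.0, 0.2, 0.6, 0.9, 0.4]))   # (2, 4)
print(target_azimuth(power, azimuths, beam_azimuth=90.0))   # 60.0
print(target_azimuth(power, azimuths, beam_azimuth=150.0))  # 150.0
```

In the second call the global power maximum sits at 240 degrees, yet the result is 60 degrees because that peak lies closer to the main beam with the highest wakeup word detection score. The third call shows the tie case of claim 5, where two peaks are equally close and their average is returned.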

Assignees

Tencent Tech Shenzhen Co Ltd

Inventors

Classifications

G10L15/08 · Primary
Speech classification or search · CPC title
G10L15/22
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
G10L25/18
the extracted parameters being spectral information of each sub-band · CPC title
G10L25/21
the extracted parameters being power information · CPC title
G10L2015/088
Word spotting · CPC title

Patent family

Related publications grouped by family.

View patent family 67645177

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11908456B2 cover?: Embodiments of this application discloses an azimuth estimation method performed at a computing device, the method including: obtaining, in real time, multi-channel sampling signals and buffering the multi-channel sampling signals; performing wakeup word detection on one or more sampling signals of the multi-channel sampling signals, and determining a wakeup word detection score for each channe…
Who is the assignee on this patent?: Tencent Tech Shenzhen Co Ltd