Audio source localization
US-11671756-B2 · Jun 6, 2023 · US
US12298425B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12298425-B2 |
| Application number | US-202318215486-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 28, 2023 |
| Priority date | Dec 31, 2020 |
| Publication date | May 13, 2025 |
| Grant date | May 13, 2025 |
Official abstract text for this publication.
A sound source positioning method and apparatus, a computer-readable storage medium, and a sound pickup apparatus are provided, for accurately locating a sound source by using a microphone array and a radar. The method includes: obtaining first location information by using echo data of a radar (201), where the first location information includes a first angle of an object relative to the radar; obtaining an incident angle by using a voice signal captured by a microphone array (202), where the incident angle is an angle at which the voice signal is incident to the microphone array; and fusing the first angle and the incident angle to obtain second location information (205), where the second location information indicates a location of a sound source generating the voice signal.
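The fusion step can be illustrated with a small sketch. The claims require only that the radar angle's weight rise with the object's moving speed and that the microphone angle's weight fall with it; they do not give a formula. Everything concrete below is an assumption made for illustration: the function names `fusion_weights` and `fuse_angles`, the saturating ramp, the constants `half_speed` and `min_radar_weight`, and the use of a circular weighted mean to handle wraparound near 0/360 degrees.

```python
import math


def fusion_weights(speed_mps, half_speed=0.5, min_radar_weight=0.2):
    """Return (radar_weight, mic_weight) for a given object speed.

    Hypothetical rule: the radar weight grows monotonically with speed and
    the microphone weight shrinks, matching the correlations in the claims.
    The ramp shape and both constants are illustrative assumptions.
    """
    speed = max(0.0, speed_mps)
    ramp = speed / (speed + half_speed)  # 0 at rest, approaches 1 at high speed
    radar_weight = min_radar_weight + (1.0 - min_radar_weight) * ramp
    return radar_weight, 1.0 - radar_weight


def fuse_angles(radar_deg, mic_deg, speed_mps):
    """Weighted fusion of the radar angle and the microphone incident angle.

    Both angles are assumed to be in the same coordinate system. A circular
    weighted mean is used so that 350 and 10 degrees fuse near 0, not 180.
    """
    radar_weight, mic_weight = fusion_weights(speed_mps)
    r, m = math.radians(radar_deg), math.radians(mic_deg)
    x = radar_weight * math.cos(r) + mic_weight * math.cos(m)
    y = radar_weight * math.sin(r) + mic_weight * math.sin(m)
    return math.degrees(math.atan2(y, x)) % 360.0  # wrapped to the 0-360 range


if __name__ == "__main__":
    for speed in (0.0, 0.5, 5.0):
        print(speed, round(fuse_angles(40.0, 20.0, speed), 2))
```

With these assumed constants, a stationary object is located mostly from the microphone array, while a fast-moving object is located mostly from the radar, whose angle estimate is less affected by motion.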
Opening claim text (preview).
What is claimed is:

1. A method for identifying a position of a sound source, the method comprising: obtaining first location information based on echo data measured by a radar, wherein the first location information comprises a first angle of an object relative to the radar, the object associated with the sound source; obtaining an incident angle based on a plurality of voice signals respectively captured by a plurality of microphones of a microphone array, the plurality of voice signals correspond to the sound source, wherein the incident angle is an angle at which the plurality of voice signals is incident on the microphone array; and fusing the first angle and the incident angle to obtain second location information, wherein the second location information indicates the position of the sound source, wherein fusing the first angle and the incident angle to obtain the second location information comprises: determining a first weight corresponding to the first angle and a second weight corresponding to the incident angle, wherein the first weight and a moving speed of the object relative to the radar are positively correlated, and performing weighted fusion on the first angle and the incident angle based on the first weight and the second weight to obtain a fused angle.

2. The method according to claim 1, wherein the second weight and the moving speed of the object relative to the radar are negatively correlated; and wherein the second location information comprises the fused angle.

3. The method according to claim 1, wherein the method further comprises: extracting, based on the second location information, voice data of the sound source from the voice signals captured by the microphone array.

4. The method according to claim 3, wherein the extracting, based on the second location information, voice data of the sound source from the voice signals captured by the microphone array comprises: using data captured by the microphone array as input data of a preset beam separation network, and outputting the voice data of the sound source.

5. The method according to claim 4, wherein the beam separation network comprises a voice separation model for separating the voice data of the sound source and background data in the input data, and the method further comprises: determining a moving speed of the sound source based on the echo data; and updating the voice separation model based on the moving speed to obtain an updated voice separation model.

6. The method according to claim 5, wherein the updating the voice separation model based on the moving speed comprises: determining a parameter set of the voice separation model based on the moving speed, to obtain the updated voice separation model, wherein the parameter set is related to a change rate of a parameter of the voice separation model, and wherein the moving speed and the change rate are positively correlated.

7. The method according to claim 5, wherein the beam separation network further comprises a dereverberation model to filter out a reverberation signal in the input data, the method further comprising updating, based on a distance between the object and the radar, the dereverberation model to obtain an updated dereverberation model.

8. The method according to claim 7, wherein the updating, based on the distance between the object and the radar, the dereverberation model comprises: updating, based on the distance between the object and the radar, a delay parameter and a prediction order in the dereverberation model to obtain the updated dereverberation model, wherein the delay parameter indicates a duration of the reverberation signal lagging behind the voice data of the sound source, wherein the prediction order indicates a duration of reverberation, and wherein both the delay parameter and the prediction order are positively correlated with the distance.

9. The method according to claim 3, the method further comprising: in response to the voice data of the sound source does not meet a preset condition, removing a beam used to process the data corresponding to the sound source in the data captured by the microphone array.

10. The method according to claim 3, the method further comprising: extracting a feature from the voice data to obtain an acoustic feature of the sound source; recognizing, based on the acoustic feature, a first probability that the sound source is a living object; determining, based on the echo data of the radar, a second probability that the sound source is the living object; and fusing the first probability and the second probability to obtain a fusion result indicating whether the sound source is the living object.

11. The method according to claim 1, wherein the obtaining the incident angle by using the voice signals captured by the microphone array comprises: in response to a plurality of second angles being obtained by using the voice signals captured by the microphone array, and the first angle and the plurality of second angles are in a same coordinate system, selecting, a second angle from the plurality of second angles, wherein a difference between the second angle and the first angle is a smallest difference in a plurality of differences between the plurality of second angles and the first angle, or the difference falls within a first preset range; and using the second angle as the incident angle.

12. The method according to claim 1, wherein after the obtaining the incident angle by using the voice signal captured by the microphone array, the method further comprising: in response to a plurality of third angles are obtained based on data captured by the microphone array for another time, selecting, based on the moving speed of the object, a third angle from the plurality of third angles, and using the third angle as a new incident angle.

13. The method according to claim 12, wherein the selecting, based on the moving speed of the object, the third angle from the plurality of third angles, and using the third angle as the new incident angle comprises: in response to the moving speed of the object is greater than a preset speed, selecting, from the plurality of third angles, the third angle, wherein a difference between the third angle and the first angle falls within a second preset range, and using the third angle as the new incident angle.

14. The method according to claim 13, wherein the selecting, based on the moving speed of the object, the third angle from the plurality of third angles, and using the third angle as the new incident angle comprises: in response to the moving speed of the object is not greater than the preset speed, selecting, from the plurality of third angles, the third angle, wherein a difference between the third angle and the first angle falls within a third preset range, wherein the third preset range is greater than the second preset range, and using the third angle as the new incident angle.

15. The method according to claim 1, wherein before the obtaining the incident angle by using the voice signal captured by the microphone array, the method further comprising: in response to it is determined, by using the echo data, that the object is in a moving state and the object does not make a sound, adjusting a sound source detection threshold of the microphone array for the object, wherein the microphone array is configured to capture a voice signal of which a sound pressure is higher than the sound source detection threshold.

16. The method according to claim 1, wherein the first location information further comprises a first relati
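
The candidate-selection logic in claims 11 to 14 can be sketched as follows. The claims state that a candidate angle is chosen by its difference from the radar angle, and that the accepted range is narrower when the object moves faster than a preset speed and wider otherwise. The function names, the numeric thresholds (`preset_speed`, `fast_range_deg`, `slow_range_deg`), and the choice to combine the "smallest difference" rule with the range check are illustrative assumptions, not values or logic taken from the patent.

```python
def angular_difference(a_deg, b_deg):
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def select_incident_angle(candidates_deg, radar_deg, speed_mps,
                          preset_speed=1.0,
                          fast_range_deg=10.0,
                          slow_range_deg=25.0):
    """Pick the microphone-array candidate angle that best matches the radar.

    All angles are assumed to share one coordinate system. The thresholds
    are hypothetical: a fast object uses the narrower range, a slow or
    stationary object uses the wider one. Returns None when no candidate
    falls inside the allowed range.
    """
    if not candidates_deg:
        return None
    allowed = fast_range_deg if speed_mps > preset_speed else slow_range_deg
    best = min(candidates_deg, key=lambda c: angular_difference(c, radar_deg))
    return best if angular_difference(best, radar_deg) <= allowed else None


if __name__ == "__main__":
    # Three candidate directions from the array, radar reports 20 degrees.
    print(select_incident_angle([12.0, 95.0, 200.0], 20.0, speed_mps=2.0))
    print(select_incident_angle([40.0, 95.0], 20.0, speed_mps=2.0))
    print(select_incident_angle([40.0, 95.0], 20.0, speed_mps=0.5))
```

Returning `None` when nothing matches is a design choice of this sketch; a caller could fall back to the radar angle alone in that case.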
for combining the signals of two or more microphones (specially adapted for hearing aids H04R25/407) · CPC title
microphones · CPC title
Microphone arrays; Beamforming · CPC title
the noise being echo, reverberation of the speech · CPC title
using properties of sound source · CPC title