Storage medium, sound source direction estimation method, and sound source direction estimation device

US11295755B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11295755-B2
Application number  US-201916532188-A
Country             US
Kind code           B2
Filing date         Aug 5, 2019
Priority date       Aug 8, 2018
Publication date    Apr 5, 2022
Grant date          Apr 5, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A non-transitory computer-readable storage medium storing a program that causes a processor included in a computer mounted on a sound source direction estimation device to execute a process, the process includes calculating a sound pressure difference between a first voice data acquired from a first microphone and a second voice data acquired from a second microphone and estimating a sound source direction of the first voice data and the second voice data based on the sound pressure difference, outputting an instruction to execute a voice recognition on the first voice data or the second voice data in a language corresponding to the estimated sound source direction, and controlling a reference for estimating a sound source direction based on the sound pressure difference, based on a time length of the voice data used for the voice recognition based on the instruction and a voice recognition time length.
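
As an illustration of the two quantities the abstract relies on, the following minimal Python sketch computes a sound pressure difference for one pair of frames and the ratio of voice recognition time to voice data length. The abstract does not say how sound pressure is measured, so the RMS level in dB used here is an assumption, and all function names are hypothetical.

```python
import math


def sound_pressure_db(frame, eps=1e-12):
    """Level of one frame in dB, from its mean square (assumed measure)."""
    mean_square = sum(s * s for s in frame) / len(frame)
    # eps avoids log10(0) on an all-zero frame.
    return 10.0 * math.log10(mean_square + eps)


def pressure_difference_db(first_frame, second_frame):
    """First-microphone level minus second-microphone level, in dB."""
    return sound_pressure_db(first_frame) - sound_pressure_db(second_frame)


def recognition_time_ratio(recognition_seconds, voice_seconds):
    """Voice recognition time length relative to the voice data time length."""
    return recognition_seconds / voice_seconds


# Hypothetical frames: the first microphone is ten times louder in amplitude.
first_frame = [0.1] * 160
second_frame = [0.01] * 160
diff = pressure_difference_db(first_frame, second_frame)
ratio = recognition_time_ratio(1.5, 3.0)
```

A large positive difference suggests the speaker is nearer the first microphone. A ratio that is unusually high is what the abstract treats as a sign that the wrong language was chosen for recognition.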

First claim

Opening claim text (preview).

What is claimed is:

1. A non-transitory computer-readable storage medium storing a program that causes a processor included in a computer mounted on a sound source direction estimation device to execute a process, the process comprising: calculating a sound pressure difference between a first voice data acquired from a first microphone and a second voice data acquired from a second microphone to estimate a sound source direction based on the sound pressure difference between the first voice data and the second voice data; estimating, by using the estimated sound source direction, a language from among a plurality of languages each corresponding to a respective individual sound source, the estimated language being a language corresponding to a sound source located in the estimated sound source direction; outputting an instruction to execute, on at least any one of the first voice data or the second voice data, a voice recognition in the estimated language; and controlling a reference for estimating a sound source direction based on the sound pressure difference, based on a time length of the voice data used for the voice recognition based on the instruction and a voice recognition time length, wherein the process of estimating the sound source direction of the first voice data and the second voice data calculates a sound pressure difference between the first voice data acquired from the first microphone and the second voice data acquired from the second microphone, and estimates the sound source direction of the first voice data and the second voice data based on a comparison result between a first threshold value for determining the sound source direction of the first voice data and the second voice data, and the sound pressure difference, and the process of controlling the reference updates the first threshold value when the voice recognition time length with respect to the time length of the voice data used for the voice recognition based on the instruction, is larger than a second threshold value for determining whether a language of the first voice data or the second voice data to be input to the voice recognition is different from the language corresponding to the sound source direction.

2. The non-transitory computer-readable storage medium according to claim 1, wherein the process of outputting the instruction to execute the voice recognition when the sound pressure difference obtained by subtracting a sound pressure of the second voice data from a sound pressure of the first voice data is equal to or larger than the first threshold value, estimates that a voice is uttered from a first sound source present in a direction according to a directivity of the first microphone or a directivity based on the first microphone and a sound path structure where the first microphone is installed, and outputs the instruction to execute the voice recognition in a language corresponding to the first sound source on the first voice data, and when the sound pressure difference is less than the first threshold value, estimates that a voice is uttered from a second sound source present in a direction according to a directivity of the second microphone or a directivity based on the second microphone and a sound path structure where the second microphone is installed, and outputs the instruction to execute the voice recognition in a language corresponding to the second sound source on the second voice data.

3. The non-transitory computer-readable storage medium according to claim 1, wherein the process of controlling the reference increases the first threshold value when the sound pressure difference obtained by subtracting a sound pressure of the second voice data from a sound pressure of the first voice data is equal to or larger than the first threshold value and the voice recognition time length with respect to the time length of the voice data used for the voice recognition is larger than the second threshold value, and decreases the first threshold value when the sound pressure difference is less than the first threshold value and the voice recognition time length with respect to the time length of the voice data used for the voice recognition is larger than the second threshold value.

4. The non-transitory computer-readable storage medium according to claim 1, wherein the process of controlling the reference calculates the sound pressure difference for a plurality of frames, and updates the first threshold value based on an average value of the calculated sound pressure differences for the plurality of frames.

5. The non-transitory computer-readable storage medium according to claim 1, wherein the process of controlling the reference updates the first threshold value so that a difference between the sound pressure difference and the first threshold value becomes large when the voice recognition time length with respect to the time length of the voice data used for the voice recognition is equal to or less than the second threshold value, and the difference between the sound pressure difference and the first threshold value is equal to or less than a predetermined value.

6. The non-transitory computer-readable storage medium according to claim 1, wherein the process of outputting the instruction to execute the voice recognition outputs the instruction to execute the voice recognition in a language corresponding to a sound source different from the estimated sound source when the voice recognition time length with respect to the time length of the voice data used for the voice recognition is larger than a third threshold value which is equal to or larger than the second threshold value.

7. The non-transitory computer-readable storage medium according to claim 1, wherein the process of controlling the reference sets the first threshold value based on a difference between the sound pressure differences under a plurality of noise conditions.

8. A sound source direction estimation method comprising: calculating a sound pressure difference between a first voice data acquired from a first microphone and a second voice data acquired from a second microphone to estimate a sound source direction based on the sound pressure difference between the first voice data and the second voice data; estimating, by using the estimated sound source direction, a language from among a plurality of languages each corresponding to a respective individual sound source, the estimated language being a language corresponding to a sound source located in the estimated sound source direction; outputting an instruction to execute, on at least any one of the first voice data or the second voice data, a voice recognition in the estimated language; and controlling a reference for estimating a sound source direction based on the sound pressure difference, based on a time length of the voice data used for the voice recognition based on the instruction and a voice recognition time length, wherein the process of estimating the sound source direction of the first voice data and the second voice data calculates the sound pressure difference between the first voice data acquired from the first microphone and the second voice data acquired from the second microphone, and estimates the sound source direction of the first voice data and the second voice data based on a comparison result between a first threshold value for determining the sound source direction of the first voice data and the second voice data, and the sound pressure difference, and the process of controlling the reference updates the first threshold value when the voice recognition time length with respect to the time length of the voice data used for the voice recognition based on the instruction, is larger than
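
The threshold logic in claims 2 and 3 can be sketched roughly as follows. This is a minimal Python illustration, not the patented implementation: the class name, the fixed update step, and the language codes are all hypothetical, and the claims do not state how much the first threshold value changes on each update.

```python
from dataclasses import dataclass


@dataclass
class DirectionEstimator:
    first_threshold: float       # boundary for the sound pressure difference (dB)
    second_threshold: float      # limit on recognition time / voice data length
    step: float = 1.0            # hypothetical update step; not given in the claims
    first_language: str = "ja"   # hypothetical language of the first sound source
    second_language: str = "en"  # hypothetical language of the second sound source

    def estimate(self, diff_db):
        """Claim 2: a difference at or above the first threshold means the first source."""
        if diff_db >= self.first_threshold:
            return "first", self.first_language
        return "second", self.second_language

    def update(self, diff_db, recognition_seconds, voice_seconds):
        """Claim 3: move the first threshold only when recognition ran too long."""
        ratio = recognition_seconds / voice_seconds
        if ratio <= self.second_threshold:
            return self.first_threshold  # recognition time looks normal; no change
        if diff_db >= self.first_threshold:
            # The first source was chosen but the language likely did not match,
            # so raise the threshold to make that choice less likely next time.
            self.first_threshold += self.step
        else:
            # The second source was chosen and likely mismatched; lower the threshold.
            self.first_threshold -= self.step
        return self.first_threshold


estimator = DirectionEstimator(first_threshold=0.0, second_threshold=1.0)
choice = estimator.estimate(3.0)
raised = estimator.update(3.0, recognition_seconds=4.0, voice_seconds=2.0)
lowered = estimator.update(-2.0, recognition_seconds=4.0, voice_seconds=2.0)
unchanged = estimator.update(3.0, recognition_seconds=1.0, voice_seconds=2.0)
```

Claim 4 bases the update on an average of sound pressure differences over several frames rather than on a fixed step; the constant `step` here is only a stand-in to keep the sketch short.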

Assignees

Fujitsu Ltd

Inventors

Classifications

  • Language identification · CPC title

  • H04R3/005 · Primary

    for combining the signals of two or more microphones (specially adapted for hearing aids H04R25/407) · CPC title

  • Speech recognition (G10L17/00 takes precedence) · CPC title

  • Detection of presence or absence of voice signals (switching of direction of transmission by voice frequency in two-way loud-speaking telephone systems H04M9/10) · CPC title

  • Details {(G01S3/82, G01S3/84, G01S3/86 take precedence)} · CPC title

Patent family

Related publications grouped by family.

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11295755B2 cover?
It covers a storage medium, method, and device for estimating sound source direction. The process calculates the sound pressure difference between voice data from a first microphone and a second microphone, estimates the sound source direction from that difference, instructs voice recognition in the language corresponding to the estimated direction, and adjusts the reference used for direction estimation based on the voice recognition time length relative to the time length of the voice data. See the Abstract above for the official text.
Who is the assignee on this patent?
Fujitsu Ltd
What technology area does this patent fall under?
Primary CPC classification H04R3/005. Mapped technology areas include Electricity.
When was this patent published?
Publication date Apr 5, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).