Storage medium, sound source direction estimation method, and sound source direction estimation device

US11295755B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11295755-B2
Application number	US-201916532188-A
Country	US
Kind code	B2
Filing date	Aug 5, 2019
Priority date	Aug 8, 2018
Publication date	Apr 5, 2022
Grant date	Apr 5, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A non-transitory computer-readable storage medium storing a program that causes a processor included in a computer mounted on a sound source direction estimation device to execute a process, the process includes calculating a sound pressure difference between a first voice data acquired from a first microphone and a second voice data acquired from a second microphone and estimating a sound source direction of the first voice data and the second voice data based on the sound pressure difference, outputting an instruction to execute a voice recognition on the first voice data or the second voice data in a language corresponding to the estimated sound source direction, and controlling a reference for estimating a sound source direction based on the sound pressure difference, based on a time length of the voice data used for the voice recognition based on the instruction and a voice recognition time length.
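The first two steps of the abstract (comparing sound pressure between two microphones, then picking a recognition language from the estimated direction) can be sketched as follows. This is a minimal illustration, not the patented implementation: the RMS-in-dB level measure, the function names, and the language mapping are assumptions introduced here. Only the rule "first minus second, compared against a threshold" comes from the claims.

```python
import math

# Hypothetical mapping from estimated sound source to recognition language.
LANGUAGE_BY_SOURCE = {"first": "ja", "second": "en"}


def sound_pressure_db(samples):
    """Return the RMS level of one frame in dB (assumed level measure)."""
    if not samples:
        raise ValueError("frame must not be empty")
    mean_square = sum(s * s for s in samples) / len(samples)
    # Floor the value so that a silent frame does not produce log10(0).
    return 10.0 * math.log10(max(mean_square, 1e-12))


def estimate_direction(frame_mic1, frame_mic2, threshold_db):
    """Compare (mic1 level - mic2 level) against the threshold.

    Returns the estimated source ("first" or "second") and the difference.
    """
    diff_db = sound_pressure_db(frame_mic1) - sound_pressure_db(frame_mic2)
    source = "first" if diff_db >= threshold_db else "second"
    return source, diff_db


# Usage: the speaker is much louder at the first microphone.
source, diff_db = estimate_direction([0.5] * 160, [0.05] * 160, threshold_db=0.0)
language = LANGUAGE_BY_SOURCE[source]
```

The language code returned here would then be passed to whatever speech recognizer the device uses; that interface is outside the scope of this sketch.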

First claim

Opening claim text (preview).

What is claimed is:

1. A non-transitory computer-readable storage medium storing a program that causes a processor included in a computer mounted on a sound source direction estimation device to execute a process, the process comprising: calculating a sound pressure difference between a first voice data acquired from a first microphone and a second voice data acquired from a second microphone to estimate a sound source direction based on the sound pressure difference between the first voice data and the second voice data; estimating, by using the estimated sound source direction, a language from among a plurality of languages each corresponding to a respective individual sound source, the estimated language being a language corresponding to a sound source located in the estimated sound source direction; outputting an instruction to execute, on at least any one of the first voice data or the second voice data, a voice recognition in the estimated language; and controlling a reference for estimating a sound source direction based on the sound pressure difference, based on a time length of the voice data used for the voice recognition based on the instruction and a voice recognition time length, wherein the process of estimating the sound source direction of the first voice data and the second voice data calculates a sound pressure difference between the first voice data acquired from the first microphone and the second voice data acquired from the second microphone, and estimates the sound source direction of the first voice data and the second voice data based on a comparison result between a first threshold value for determining the sound source direction of the first voice data and the second voice data, and the sound pressure difference, and the process of controlling the reference updates the first threshold value when the voice recognition time length with respect to the time length of the voice data used for the voice recognition based on the instruction, is larger than a second threshold value for determining whether a language of the first voice data or the second voice data to be input to the voice recognition is different from the language corresponding to the sound source direction.

2. The non-transitory computer-readable storage medium according to claim 1, wherein the process of outputting the instruction to execute the voice recognition when the sound pressure difference obtained by subtracting a sound pressure of the second voice data from a sound pressure of the first voice data is equal to or larger than the first threshold value, estimates that a voice is uttered from a first sound source present in a direction according to a directivity of the first microphone or a directivity based on the first microphone and a sound path structure where the first microphone is installed, and outputs the instruction to execute the voice recognition in a language corresponding to the first sound source on the first voice data, and when the sound pressure difference is less than the first threshold value, estimates that a voice is uttered from a second sound source present in a direction according to a directivity of the second microphone or a directivity based on the second microphone and a sound path structure where the second microphone is installed, and outputs the instruction to execute the voice recognition in a language corresponding to the second sound source on the second voice data.

3. The non-transitory computer-readable storage medium according to claim 1, wherein the process of controlling the reference increases the first threshold value when the sound pressure difference obtained by subtracting a sound pressure of the second voice data from a sound pressure of the first voice data is equal to or larger than the first threshold value and the voice recognition time length with respect to the time length of the voice data used for the voice recognition is larger than the second threshold value, and decreases the first threshold value when the sound pressure difference is less than the first threshold value and the voice recognition time length with respect to the time length of the voice data used for the voice recognition is larger than the second threshold value.

4. The non-transitory computer-readable storage medium according to claim 1, wherein the process of controlling the reference calculates the sound pressure difference for a plurality of frames, and updates the first threshold value based on an average value of the calculated sound pressure differences for the plurality of frames.

5. The non-transitory computer-readable storage medium according to claim 1, wherein the process of controlling the reference updates the first threshold value so that a difference between the sound pressure difference and the first threshold value becomes large when the voice recognition time length with respect to the time length of the voice data used for the voice recognition is equal to or less than the second threshold value, and the difference between the sound pressure difference and the first threshold value is equal to or less than a predetermined value.

6. The non-transitory computer-readable storage medium according to claim 1, wherein the process of outputting the instruction to execute the voice recognition outputs the instruction to execute the voice recognition in a language corresponding to a sound source different from the estimated sound source when the voice recognition time length with respect to the time length of the voice data used for the voice recognition is larger than a third threshold value which is equal to or larger than the second threshold value.

7. The non-transitory computer-readable storage medium according to claim 1, wherein the process of controlling the reference sets the first threshold value based on a difference between the sound pressure differences under a plurality of noise conditions.

8. A sound source direction estimation method comprising: calculating a sound pressure difference between a first voice data acquired from a first microphone and a second voice data acquired from a second microphone to estimate a sound source direction based on the sound pressure difference between the first voice data and the second voice data; estimating, by using the estimated sound source direction, a language from among a plurality of languages each corresponding to a respective individual sound source, the estimated language being a language corresponding to a sound source located in the estimated sound source direction; outputting an instruction to execute, on at least any one of the first voice data or the second voice data, a voice recognition in the estimated language; and controlling a reference for estimating a sound source direction based on the sound pressure difference, based on a time length of the voice data used for the voice recognition based on the instruction and a voice recognition time length, wherein the process of estimating the sound source direction of the first voice data and the second voice data calculates the sound pressure difference between the first voice data acquired from the first microphone and the second voice data acquired from the second microphone, and estimates the sound source direction of the first voice data and the second voice data based on a comparison result between a first threshold value for determining the sound source direction of the first voice data and the second voice data, and the sound pressure difference, and the process of controlling the reference updates the first threshold value when the voice recognition time length with respect to the time length of the voice data used for the voice recognition based on the instruction, is larger than
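The threshold control described in claims 1 and 3 can be illustrated with a short sketch. Here "voice recognition time length with respect to the time length of the voice data" is read as a simple ratio of the two durations, which is an interpretation rather than a definition given in the claim text. The function name, parameter names, and the fixed step size are hypothetical; the claims state only that the first threshold is increased or decreased, not by how much.

```python
def update_first_threshold(first_threshold_db, diff_db,
                           recognition_time_s, voice_length_s,
                           second_threshold, step_db=1.0):
    """Return the (possibly updated) first threshold.

    A high ratio of recognition time to voice length is treated as a sign
    that recognition ran in the wrong language, i.e. the direction estimate
    was likely wrong. step_db is an assumed, illustrative adjustment size.
    """
    if voice_length_s <= 0:
        raise ValueError("voice_length_s must be positive")
    ratio = recognition_time_s / voice_length_s
    if ratio <= second_threshold:
        # Recognition finished quickly enough: keep the threshold (claim 1).
        return first_threshold_db
    if diff_db >= first_threshold_db:
        # Estimated "first source" but recognition was slow: raise the bar
        # for choosing the first source (claim 3, increase case).
        return first_threshold_db + step_db
    # Estimated "second source" but recognition was slow: lower the bar
    # (claim 3, decrease case).
    return first_threshold_db - step_db


# Usage: 3.0 s of recognition for 2.0 s of audio, ratio 1.5 > 1.0.
raised = update_first_threshold(0.0, 2.0, 3.0, 2.0, second_threshold=1.0)
lowered = update_first_threshold(0.0, -2.0, 3.0, 2.0, second_threshold=1.0)
unchanged = update_first_threshold(0.0, 2.0, 1.0, 2.0, second_threshold=1.0)
```

Claims 4 through 7 refine this rule (frame averaging, widening the margin, switching language above a third threshold, noise-dependent initialization) and are not covered by the sketch.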

Assignees

Fujitsu Ltd

Inventors

Classifications

G06F40/263
Language identification · CPC title
H04R3/005 (Primary)
for combining the signals of two or more microphones (specially adapted for hearing aids H04R25/407) · CPC title
G10L15/00
Speech recognition (G10L17/00 takes precedence) · CPC title
G10L25/78
Detection of presence or absence of voice signals (switching of direction of transmission by voice frequency in two-way loud-speaking telephone systems H04M9/10) · CPC title
G01S3/801
Details {(G01S3/82, G01S3/84, G01S3/86 take precedence)} · CPC title

Patent family

Related publications grouped by family.

Patent family 69406371

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11295755B2 cover?: A non-transitory computer-readable storage medium storing a program that causes a processor included in a computer mounted on a sound source direction estimation device to execute a process, the process includes calculating a sound pressure difference between a first voice data acquired from a first microphone and a second voice data acquired from a second microphone and estimating a sound sour…
Who is the assignee on this patent?: Fujitsu Ltd
What technology area does this patent fall under?: Primary CPC classification H04R3/005. Mapped technology areas include Electricity.
When was this patent published?: Publication date Apr 5, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).