Sound source localization using sensor fusion
US-2016249132-A1 · Aug 25, 2016 · US
US10332519B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10332519-B2 |
| Application number | US-201615529580-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 9, 2016 |
| Priority date | Apr 7, 2015 |
| Publication date | Jun 25, 2019 |
| Grant date | Jun 25, 2019 |
Official abstract text for this publication.
An apparatus including circuitry configured to determine a position of a mouth of a user that is distinguishable among a plurality of people, and control an acquisition condition for collecting a sound based on the determined position of the user's mouth.
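As an illustration of directing sound collection toward a determined mouth position, the sketch below computes the azimuth and elevation from a sound sensor at a known position to the mouth. The function name `steering_angles`, the Cartesian coordinate convention, and the use of degrees are assumptions made for this example; the patent text does not specify them.

```python
import math


def steering_angles(sensor_pos, mouth_pos):
    """Return (azimuth_deg, elevation_deg) pointing from sensor to mouth.

    Hypothetical helper: positions are (x, y, z) tuples in a shared
    coordinate frame, which is an assumption of this sketch.
    """
    dx = mouth_pos[0] - sensor_pos[0]
    dy = mouth_pos[1] - sensor_pos[1]
    dz = mouth_pos[2] - sensor_pos[2]
    # Azimuth is measured in the x-y plane from the +x axis.
    azimuth = math.degrees(math.atan2(dy, dx))
    # Elevation is measured upward from the x-y plane.
    elevation = math.degrees(math.atan2(dz, math.hypot(dx, dy)))
    return azimuth, elevation


if __name__ == "__main__":
    # Sensor at the origin, mouth one unit along x and y at the same height.
    print(steering_angles((0.0, 0.0, 0.0), (1.0, 1.0, 0.0)))
```

The resulting angles could serve as the orientation part of an acquisition condition; how a real sensor array is steered depends on hardware that this sketch does not model.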
Opening claim text (preview).
The invention claimed is:
1. An apparatus, comprising: circuitry configured to control at least one imaging sensor to determine a position of a mouth of each user that is distinguishable among a plurality of people, determine a reliability of the determined position of each user's mouth based on information obtained by the at least one imaging sensor, control an acquisition condition for collecting a sound based on the determined position of each user's mouth and the determined reliability of the determined position of each user's mouth, and collect the sound using at least one sound sensor according to the controlled acquisition condition, wherein each sound sensor of the at least one sound sensor is located in a predetermined position, and wherein the acquisition condition comprises orientation and width of a sound collection region for each sound sensor of the at least one sound sensor.
2. The apparatus according to claim 1, wherein the circuitry is further configured to: detect a body part of each user performing a gesture; and determine a relative position or a relative orientation of at least one portion of each user's body part at a plurality of points during the gesture, wherein the position of each user's mouth is determined as an estimate based on the determined relative position or the determined relative orientation of the at least one portion of each user's body part.
3. The apparatus according to claim 2, wherein the detected body part comprises an arm of each user and the at least one portion of each user's body part comprises one or more of a hand, a forearm, an elbow, and a shoulder of the user.
4. The apparatus according to claim 3, wherein the relative position or the relative orientation of the at least one portion of each user's body part is determined based on the relative position or the relative orientation of another one of the at least one portion of each user's body part.
5. The apparatus according to claim 2, wherein the circuitry is further configured to determine whether the detected body part is on a left side or a right side of each user.
6. The apparatus according to claim 1, wherein the determined position of each user's mouth is set to be a target position of sound collection, such that the orientation of the at least one sound collection region is directed toward each target position.
7. The apparatus according to claim 1, wherein the circuitry is further configured to determine the position of the mouth of each user of a plurality of users distinguishable among the plurality of people.
8. The apparatus according to claim 7, wherein the determined position of each mouth of the plurality of users is set to be a target position of sound collection, such that the orientation of each sound collection region is directed toward one of the plurality of target positions.
9. The apparatus according to claim 8, wherein a number of sound sensors is equal to or greater than a number of the plurality of users.
10. The apparatus according to claim 8, wherein each sound sensor collects sound within the orientation and the width of the sound collection region directed toward one of the plurality of target positions.
11. The apparatus according to claim 10, wherein an estimate of the plurality of target positions is based on a determined relative position or a determined relative orientation of at least one portion of a body part of each user of the plurality of users.
12. The apparatus according to claim 11, wherein the relative position or the relative orientation of the at least one portion of each user's body part is determined using the at least one imaging sensor at a plurality of points during a detected gesture of the user's body part.
13. The apparatus according to claim 12, wherein the determined reliability of the determined position of each user's mouth is based on an amount of data for each target position related to the relative position or the relative orientation of the at least one portion of each user's body part, and the width of a particular sound collection region decreases as the reliability of the estimate of a particular target position of the plurality of target positions increases.
14. The apparatus according to claim 1, wherein the circuitry is further configured to control a display device to display visual information indicating the control of the acquisition condition.
15. The apparatus according to claim 14, wherein the displayed visual information indicating the control of the acquisition condition is based on the determined reliability of the determined position of each user's mouth.
16. The apparatus according to claim 14, wherein a size of the displayed visual information is controlled according to the determined reliability of the determined position of each user's mouth.
17. The apparatus according to claim 1, wherein each imaging sensor of the at least one imaging sensor is located in the predetermined position of a respective sound sensor of the at least one sound sensor.
18. An information processing method, performed via at least one processor, the method comprising: controlling at least one imaging sensor to determine a position of a mouth of each user that is distinguishable among a plurality of people; determining a reliability of the determined position of each user's mouth based on information obtained by the at least one imaging sensor; controlling an acquisition condition for collecting a sound based on the determined position of each user's mouth and the determined reliability of the determined position of each user's mouth; and collecting the sound using at least one sound sensor according to the controlled acquisition condition, wherein each sound sensor of the at least one sound sensor is located in a predetermined position, and wherein the acquisition condition comprises orientation and width of a sound collection region for each sound sensor of the at least one sound sensor.
19. A non-transitory computer-readable medium having embodied thereon a program, which when executed by a computer causes the computer to execute a method, the method comprising: controlling at least one imaging sensor to determine a position of a mouth of each user that is distinguishable among a plurality of people; determining a reliability of the determined position of each user's mouth based on information obtained by the at least one imaging sensor; controlling an acquisition condition for collecting a sound based on the determined position of each user's mouth and the determined reliability of the determined position of each user's mouth; and collecting the sound using at least one sound sensor according to the controlled acquisition condition, wherein each sound sensor of the at least one sound sensor is located in a predetermined position, and wherein the acquisition condition comprises orientation and width of a sound collection region for each sound sensor of the at least one sound sensor.
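The relationship recited in claim 13, where reliability depends on the amount of data gathered for a target position and the collection region narrows as reliability rises, can be sketched as follows. The linear mappings, the function names, and the default values (`n_full=30`, widths of 10 and 90 degrees) are assumptions chosen for illustration only; the claims do not state a specific formula or any numeric values.

```python
def reliability_from_samples(n_samples, n_full=30):
    """Map an amount of observation data to a reliability in [0, 1].

    Hypothetical rule: reliability grows linearly with the number of
    samples and saturates at n_full samples.
    """
    if n_full <= 0:
        raise ValueError("n_full must be positive")
    return max(0.0, min(1.0, n_samples / n_full))


def collection_width_deg(reliability, min_width=10.0, max_width=90.0):
    """Return a sound collection width that shrinks as reliability grows.

    Hypothetical linear interpolation between max_width (no confidence)
    and min_width (full confidence).
    """
    r = max(0.0, min(1.0, reliability))
    return max_width - (max_width - min_width) * r


if __name__ == "__main__":
    for n in (0, 15, 30):
        r = reliability_from_samples(n)
        print(n, r, collection_width_deg(r))
```

Any monotonically decreasing mapping from reliability to width would satisfy the wording of claim 13; the linear form is used here only because it is easy to read.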
Procedures used during a speech recognition process, e.g. man-machine dialogue · CPC title
with detection of the device orientation or free movement in a three-dimensional [3D] space, e.g. 3D mice, 6-DOF [six degrees of freedom] pointers using gyroscopes, accelerometers or tilt-sensors · CPC title
Circuits for transducers (arrangements for producing a reverberation or echo sound G10K15/08; amplifiers H03F) · CPC title
Arrangements for interaction with the human body, e.g. for user immersion in virtual reality (blind teaching G09B21/00) · CPC title
Sound input; Sound output (speech processing G10L) · CPC title