US9398247B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9398247-B2 |
| Application number | US-201214127772-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 19, 2012 |
| Priority date | Jul 26, 2011 |
| Publication date | Jul 19, 2016 |
| Grant date | Jul 19, 2016 |
Official abstract text for this publication.
An information processing apparatus includes a processor that receives captured image data and captured sound data corresponding to an environment in which content is reproduced and detects a user based on the captured image data and analyzes a situation of the environment based on a result of the detection and the captured sound data and controls an audio volume corresponding to reproduced content based on a result of the analyzing.
Opening claim text (preview).
The invention claimed is:
1. An information processing apparatus comprising: an input circuit for reception of capture image data and captured sound data corresponding to an environment in which content is reproduced; a processor that: processes the captured image data and the captured sound data corresponding to the environment in which content is reproduced; detects a user based on the captured image data; analyzes a situation of the environment based on a result of the detection and the captured sound data; determines a direction in the captured image data to a source of the captured sound data; determines if the direction in the captured image data to the source of the captured sound data is coincident with a location of a face of a human detected in the captured image data; and controls an audio volume corresponding to reproduced content based on a result of the analyzing, wherein when a sound level corresponding to the captured sound data is greater than or equal to a predetermined threshold value, the processor controls the audio volume corresponding to the reproduced content to remain unchanged when it is determined that the direction in the captured image data corresponding to the source of the captured sound data which is a human voice is coincident with the location of the face detected in the captured image data, and the processor controls the audio volume corresponding to the reproduced content to increase when the processor determines that the direction in the captured image data corresponding to the source of the captured sound data which is a human voice is not coincident with the location of the face detected in the captured image data, when the processor increases the audio volume when the processor determined that the direction in the captured image data corresponding to the source of the captured sound data which is a human voice is not coincident with the location of the face detected in the captured image data, the processor determines a volume increase amount based on the captured image data of a distance between the location of the detected user and the source of the captured sound data, and in an event of a manual adjustment of a setting, the processor once an environmental situation is over automatically returns to a previous setting before the environmental situation occurred.
2. The information processing apparatus of claim 1, wherein the processor receives the captured image data from a camera positioned in the environment in which content is reproduced and detects the face based on the captured image data.
3. The information processing apparatus of claim 2, wherein the processor detects a position corresponding to the detected face based on the captured image data.
4. The information processing apparatus of claim 2, wherein the processor detects a plurality of faces based on the captured image data.
5. The information processing apparatus of claim 2 wherein the processor determines face information corresponding to the detected face, the face information including at least one of an individual, age and gender.
6. The information processing apparatus of claim 1, wherein the processor receives the sound data from a microphone positioned in the environment in which content is reproduced.
7. The information processing apparatus of claim 1, wherein the processor determines a sound level corresponding to the captured sound data.
8. The information processing apparatus of claim 1, wherein the processor determines whether the captured sound data is a human's voice or a sound other than a human's voice.
9. The information processing apparatus of claim 1, wherein the processor controls the audio volume corresponding to the reproduced content to remain unchanged when it is determined that the level is less than the predetermined threshold value.
10. The information processing apparatus of claim 1, wherein the processor determines whether the captured sound data is a human's voice or a sound other than a human's voice when it is determined that the level is greater than the predetermined threshold value.
11. The information processing apparatus of claim 10, wherein the processor controls the audio volume corresponding to the reproduced content to be lowered when it is determined that the captured sound data is a human's voice and a face is not detected based on the captured image data.
12. The information processing apparatus of claim 10, wherein the processor determines a direction corresponding to a source of the captured sound data when it is determined that the captured sound data is a human's voice and a face is detected based on the captured image data.
13. The information processing apparatus of claim 10, wherein the processor determines whether the captured sound data corresponds to an environmental sound registered in advance when it is determined that the captured sound data is determined to be a sound other than a human's voice.
14. The information processing apparatus of claim 13, wherein the processor controls the audio volume corresponding to the reproduced content to increase when it is determined that the captured sound data corresponds to an environmental sound that is registered in advance.
15. The information processing apparatus of claim 13, wherein the processor controls the audio volume corresponding to the reproduced content based on previously stored settings corresponding to the environmental sound when it is determined that the captured sound data corresponds to the environmental sound stored in advance.
16. The information processing apparatus of claim 1, wherein the processor determines an age of the detected user and, when the processor controls the audio volume to increase, the processor applies an increased gain to a predetermined audio frequency band.
17. A method performed by an information processing apparatus, the method comprising: receiving captured image data and captured sound data corresponding to an environment in which content is reproduced; detecting a user based on the captured image data; determining a direction in the captured image data to the source of the captured sound data; determining if the direction in the captured image data to the source of the captured sound data is coincident with a location of a face of a human detected in the captured image data; analyzing a situation of the environmental based on a result of the detection and the captured sound data; and controlling an audio volume corresponding to reproduced content based on a result of the analyzing, wherein when a sound level corresponding to the captured sound data is greater than or equal to a predetermined threshold value, the controlling includes controlling the audio volume corresponding to the reproduced content to remain unchanged when it is determined that the direction in the captured image data corresponding to the source of the captured sound data which is at human voice is coincident with the location of the face detected in the captured image data, and the controlling includes controlling the audio volume corresponding to the reproduced content to increase when the direction in the captured image data corresponding to the source of the captured sound data which is human voice is not coincident with the location of the face detected in the captured image data, when the controlling includes controlling the audio volume to increase when the direction in the captured image date corresponding to the source of the captured sound data which is a human voice is not co
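For readers who find the claim language dense, the volume decision recited in claims 1, 9, 11 and 14 can be sketched as a small decision function. This is an illustrative reading only, not the patentee's implementation: the names `Sound`, `Face`, `Action` and `decide`, the use of angles in degrees to represent "direction in the captured image data", and every numeric default (`threshold_db`, `tolerance_deg`, `db_per_deg`, `default_step_db`) are assumptions introduced here. The claims say only that the increase amount depends on the distance between the detected user and the sound source; the linear scaling below is one hypothetical way to model that. The behavior for an unregistered non-voice sound is not stated in the claims shown, so the sketch leaves the volume unchanged in that case.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence, Tuple


class Action(Enum):
    KEEP = "keep"
    LOWER = "lower"
    RAISE = "raise"


@dataclass
class Sound:
    level_db: float           # measured level of the captured sound
    is_voice: bool            # result of a voice / non-voice classifier (claim 8)
    direction_deg: float      # estimated direction of the source in the camera frame
    registered: bool = False  # matches an environmental sound registered in advance


@dataclass
class Face:
    direction_deg: float      # direction of a detected face in the camera frame


def decide(
    sound: Sound,
    faces: Sequence[Face],
    threshold_db: float = 60.0,    # hypothetical "predetermined threshold value"
    tolerance_deg: float = 10.0,   # hypothetical margin for "coincident"
    db_per_deg: float = 0.1,       # hypothetical distance-to-gain scaling
    default_step_db: float = 3.0,  # hypothetical step for registered sounds
) -> Tuple[Action, float]:
    """Return the volume action and, for RAISE, an increase amount in dB."""
    # Claim 9: below the threshold, the volume is left unchanged.
    if sound.level_db < threshold_db:
        return Action.KEEP, 0.0

    if sound.is_voice:
        # Claim 11: a voice with no detected face lowers the volume.
        if not faces:
            return Action.LOWER, 0.0
        # Claim 1: compare the sound direction with each detected face.
        separation = min(abs(sound.direction_deg - f.direction_deg) for f in faces)
        if separation <= tolerance_deg:
            return Action.KEEP, 0.0  # voice coincides with a face
        # Claim 1: voice does not coincide with a face, so raise the volume
        # by an amount that grows with the user-to-source separation.
        return Action.RAISE, separation * db_per_deg

    # Claim 14: a registered environmental sound raises the volume.
    if sound.registered:
        return Action.RAISE, default_step_db

    # Assumption: an unregistered non-voice sound leaves the volume unchanged.
    return Action.KEEP, 0.0
```

For example, `decide(Sound(70.0, True, 40.0), [Face(0.0)])` takes the "voice not coincident with a face" branch and returns a `RAISE` action, while the same sound with `Face(38.0)` falls within the tolerance and returns `KEEP`. The return-to-previous-setting behavior after a manual adjustment and the age-dependent frequency gain of claim 16 are outside this sketch.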
by muting the audio signal · CPC title
involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams (arrangements characterised by components specially adapted for monitoring, identification or recognition of audio in broadcast systems H04H60/58) · CPC title
involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream (arrangements characterised by components specially adapted for monitoring, identification or recognition of video in broadcast systems H04H60/59) · CPC title
involving end-user characteristics, e.g. viewer profile, preferences (monitoring of user activities for profile generation for accessing a video database G06F16/739; user profiles in network data switching protocols H04L67/306; processing of user preferences or user profiles in wireless networks H04W8/18) · CPC title
Cameras (H04N23/00 takes precedence) · CPC title