Primary speaker identification from audio and video data
US-2015088515-A1 · Mar 26, 2015 · US
US9875410B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-9875410-B2 |
| Application number | US-201514940988-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 13, 2015 |
| Priority date | Nov 26, 2014 |
| Publication date | Jan 23, 2018 |
| Grant date | Jan 23, 2018 |
Official abstract text for this publication.
A camera system includes: a camera configured to capture an image of a surveillance area; a microphone array which includes at least one microphone; and at least one processor configured to implement: a video processor which designates at least one subject in the image as a target; a beam-former which calculates a rotation angle of the microphone array based on a location of the subject; and a driving controller which rotates the microphone array toward the subject based on the rotation angle of the microphone array, wherein the beam-forming unit further performs signal processing on an audio input signal received through the microphone array rotated toward the subject and outputs an audio output signal corresponding to the audio input signal.
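The abstract states that the rotation angle of the microphone array is calculated from the location of the subject, but it gives no formula. As a minimal sketch only, assuming a pinhole camera model, a known horizontal field of view, and a microphone array mounted at a fixed sideways offset from the camera, the angle could be estimated as follows. The function names, parameters, and the geometry itself are illustrative assumptions, not taken from the patent.

```python
import math


def camera_bearing_deg(subject_x_px, image_width_px, horizontal_fov_deg):
    """Horizontal angle of the subject relative to the optical axis.

    Assumption: ideal pinhole camera, so the focal length in pixels
    follows from the image width and the horizontal field of view.
    """
    focal_px = (image_width_px / 2) / math.tan(math.radians(horizontal_fov_deg) / 2)
    offset_px = subject_x_px - image_width_px / 2
    return math.degrees(math.atan2(offset_px, focal_px))


def mic_array_rotation_deg(bearing_deg, distance_m, mic_offset_m=0.0):
    """Angle the microphone array would turn through to face the subject.

    Assumption: the array sits mic_offset_m to the right of the camera,
    on the same baseline, so the subject distance is needed to correct
    for parallax (hypothetical geometry, for illustration only).
    """
    bearing = math.radians(bearing_deg)
    x = distance_m * math.sin(bearing) - mic_offset_m  # sideways offset seen from the array
    z = distance_m * math.cos(bearing)                 # depth along the optical axis
    return math.degrees(math.atan2(x, z))
```

With a zero offset the array angle equals the camera bearing; the distance term only matters when the array and the camera are not co-located, which is one plausible reason a distance would enter the calculation.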
Opening claim text (preview).
What is claimed is:
1. A camera system comprising: a camera configured to capture an image of a surveillance area; a speaker array which comprises a plurality of speakers, a microphone array which comprises at least one microphone; and at least one processor configured to implement: a video processor which designates at least one subject in the image as a target; a beam-former which calculates a rotation angle of the microphone array based on a location of the at least one subject; and a driving controller which rotates the microphone array in a direction toward the at least one subject based on the rotation angle of the microphone array, wherein the beam-former further performs signal processing on an audio input signal received through the microphone array rotated in the direction toward the at least one subject, wherein the at least one microphone comprises a plurality of microphones, and the at least one subject comprises a plurality of subjects, and wherein the at least one processor is further configured to implement an audio processor which generates a plurality of audio output signals based on audio input signals received through the plurality of microphones and controls the beam-former to transmit the plurality of audio output signals toward the plurality of subjects through the plurality of speakers, respectively, in a time division multiplexing manner.
2. The camera system of claim 1, wherein in response to the at least one subject not being located in a center of the image, the video processor calculates a rotation angle of a lens of the camera, and the driving controller rotates the lens of the camera based on the rotation angle of the lens so that the at least one subject is located in the center of the image, and wherein in response to the at least one subject being located in the center of the image, the video processor designates the at least one subject as the target.
3. The camera system of claim 1, wherein the beam-former determines a rotation angle of the speaker array based on the rotation angle of the microphone array, and the driving controller rotates the speaker array toward the at least one subject based on the rotation angle of the speaker array.
4. The camera system of claim 1, wherein the beam-former calculates a distance from the camera to the at least one subject, and calculates the rotation angle of the microphone array using the calculated distance.
5. The camera system of claim 1, wherein the video processor generates an image analysis result by performing at least one of facial recognition of the at least one subject, a behavioral pattern analysis of the at least one subject, and a situation analysis with respect to the image, and wherein the audio processor recognizes sound of the at least one subject by matching the image analysis result with the audio input signal received through the microphone array and generates an audio output signal corresponding to the sound.
6. The camera system of claim 1, wherein the beam-former calculates rotations angles of the plurality of microphones based on locations of the plurality of subjects, respectively, and wherein the driving controller rotates the plurality of microphones toward the plurality of subjects based on the rotation angles of the plurality of microphones so that the beam-former performs the signal processing on the audio input signals received through the plurality of microphones rotated toward the at least one subject, respectively.
7. The camera system of claim 1, wherein the audio processor generates the plurality of audio output signals which are multiplexed and different in at least one of an amplitude and a phase, respectively.
8. A method of operating a camera system comprising a camera and a microphone array which comprises at least one microphone by using at least one processor, the method comprising: capturing an image of a surveillance area using the camera; designating at least one subject in the image as a target; calculating a rotation angle of the microphone array based on a location of the at least one subject; rotating the microphone array toward the at least one subject based on the rotation angle of the microphone array; and receiving an audio input signal through the microphone array rotated toward the at least one subject, wherein the at least one microphone comprises a plurality of microphones, and the at least one subject comprises a plurality of subjects, and wherein the method further comprises generating a plurality of audio output signals based on audio input signals received through the plurality of microphones, and transmitting the plurality of audio output signals towards the plurality of subjects, respectively, in a time division multiplexing manner.
9. The method of claim 8, further comprising: generating an audio output signal based on the audio input signal received through the microphone array; and transmitting the audio output signal toward the at least one subject through a speaker array which comprises at least one speaker.
10. The method of claim 8, further comprising: determining the location of the at least one subject; and in response to the at least one subject not being located in a center of the image, calculating a rotation angle of a lens of the camera, and rotating the lens based on the rotation angle of the lens so that the at least one subject is located in the center of the image, wherein the designating the at least one subject as the target is performed in response to the at least one subject being located in the center of the image.
11. The method of claim 8, wherein the camera system further comprises a speaker array which comprises at least one speaker, and wherein the method further comprises: determining a rotation angle of the speaker array based on the rotation angle of the microphone array, and rotating the speaker array toward the at least one subject based on the rotation angle of the speaker array; and generating an audio output signal toward the at least one subject corresponding to the audio input signal received through the microphone array.
12. The method of claim 8, further comprising calculating a distance from the camera to the at least one subject which is used to calculate the rotation angle of the microphone array.
13. The method of claim 8, further comprising: generating an image analysis result by performing at least one of facial recognition of the at least one subject, a behavioral pattern analysis of the at least one subject, and a situation analysis with respect to the image; and recognizing sound of the at least one subject by matching the image analysis result with the audio input signal received through the microphone array and generating an audio output signal corresponding to the sound.
14. The method of claim 8, wherein the calculating the rotation angle of the microphone array comprises calculating rotations angles of the plurality of microphones based on locations of the plurality of subjects, respectively, and wherein the rotating the microphone array toward the at least one subject comprises rotating the plurality of microphones toward the plurality of subjects based on the rotation angles of the plurality of microphones to receive the audio input signals through the plurality of microphones rotated toward the at least one subject, respectively.
15. The method of claim 8, wherein the plurality of audio output signals are multiplexed and different in at least one of an amplitude and a phase, respectively.
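Claims 1 and 8 recite transmitting a plurality of audio output signals toward a plurality of subjects in a time division multiplexing manner, without specifying slot lengths or ordering. The sketch below shows one possible reading, assuming equal-length slots assigned round-robin, with each slot carrying the output signal for one subject. The function name `tdm_schedule` and its parameters are hypothetical and are not taken from the patent.

```python
from itertools import cycle


def tdm_schedule(subject_ids, slot_ms, total_ms):
    """Assign consecutive time slots to subjects in round-robin order.

    Returns a list of (start_ms, end_ms, subject_id) tuples covering
    the interval [0, total_ms). The last slot is shortened if needed.
    Assumption: equal slot length and fixed ordering (illustrative only).
    """
    if slot_ms <= 0:
        raise ValueError("slot_ms must be positive")
    if not subject_ids:
        return []
    schedule = []
    start = 0
    for subject in cycle(subject_ids):
        if start >= total_ms:
            break
        end = min(start + slot_ms, total_ms)
        schedule.append((start, end, subject))
        start = end
    return schedule


# Two subjects sharing the speaker array in 10 ms slots over 35 ms.
slots = tdm_schedule(["A", "B"], slot_ms=10, total_ms=35)
```

In a real system each slot would also select the amplitude and phase settings that steer the speaker array toward the scheduled subject; that step is omitted here because the claims do not describe how it is performed.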
for receiving images from a single remote source · CPC title
of input or preprocessed data · CPC title
Surveillance or monitoring of activities, e.g. for recognising suspicious objects (recognising microscopic objects G06V20/69) · CPC title
Physics · mapped topic