Image pickup apparatus and control method therefor
US-11102389-B2 · Aug 24, 2021 · US
US11445145B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11445145-B2 |
| Application number | US-201916375399-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 4, 2019 |
| Priority date | Apr 4, 2018 |
| Publication date | Sep 13, 2022 |
| Grant date | Sep 13, 2022 |
Official abstract text for this publication.
The present application relates to the technical field of communication and provides a method and a device for controlling camera shooting, a smart device, and a computer storage medium. The method includes: collecting voice data of a sound source object; extracting a voice feature based on the voice data of the sound source object; determining a current voice scene according to the extracted voice feature and a voice feature corresponding to a preset voice scene; and acquiring a shooting mode corresponding to the current voice scene, and controlling movement of the camera according to the shooting mode corresponding to the current voice scene. With this method, frequent shaking of the camera can be avoided, and shooting efficiency and user experience can be improved.
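The flow in the abstract (voice data, then voice features, then voice scene, then shooting mode) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the `Utterance` type, the angle-spread rule, the 40-degree threshold, and the shooting-mode labels are all hypothetical. The claims describe a trained decision tree for scene determination; a single hand-written rule stands in for it.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Utterance:
    duration_s: float   # voice duration
    interval_s: float   # voice interval duration (gap before this utterance)
    angle_deg: float    # sound source angle


def extract_features(utterances):
    """Summarize a batch of utterances into scene-level voice features."""
    return {
        "mean_duration_s": mean([u.duration_s for u in utterances]),
        "mean_interval_s": mean([u.interval_s for u in utterances]),
        "angle_spread_deg": pstdev([u.angle_deg for u in utterances]),
    }


# Hypothetical threshold: speakers spread widely around the device are
# treated as a round table; tightly grouped speakers as a video conference.
ANGLE_SPREAD_THRESHOLD_DEG = 40.0

# Hypothetical shooting-mode labels, one per voice scene.
SHOOTING_MODES = {
    "round_table_conference": "follow_speakers_in_turn",
    "video_conference": "aim_at_speaking_region",
}


def determine_scene(features):
    """Stand-in for the trained decision tree: one illustrative rule."""
    if features["angle_spread_deg"] >= ANGLE_SPREAD_THRESHOLD_DEG:
        return "round_table_conference"
    return "video_conference"


def select_shooting_mode(utterances):
    """Return the (scene, shooting mode) pair for a batch of utterances."""
    scene = determine_scene(extract_features(utterances))
    return scene, SHOOTING_MODES[scene]
```

In a real system the rule in `determine_scene` would be replaced by a model trained on labeled sample voice data, as the claims describe; the lookup from scene to shooting mode would stay the same.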
Opening claim text (preview).
What is claimed is:
1. A method for controlling camera shooting comprising steps of: collecting voice data of a sound source object; extracting a voice feature based on the voice data of the sound source object; determining a current voice scene according to the extracted voice feature and a voice feature corresponding to a preset voice scene, wherein the voice feature comprises one or more selected from a group consisting of a voice duration, a voice interval duration, a sound source angle, a sound intensity of a voice, or a sound frequency of a voice; and acquiring a shooting mode corresponding to the current voice scene, and controlling the movement of the camera according to the shooting mode corresponding to the current voice scene, wherein the step of acquiring the shooting mode corresponding to the current voice scene, and controlling movement of the camera according to the shooting mode corresponding to the current voice scene comprises: acquiring a first sound source angle of the first voice data if a first voice data of a first sound source object is detected when the current voice scene is determined to be a round table conference scene; controlling the movement of the camera to the first sound source object corresponding to the first sound source angle according to the first sound source angle; predetermining, according to a scheduling mode corresponding to the round table conference scene, a subsequent second sound source object which sends voice data subsequently when the first voice data end; controlling in advance movement of the camera to the second sound source object according to a sound source angle of the second sound source object; or alternatively predetermining the second sound source object sending voice data and a third sound source object sending voice data according to the scheduling mode corresponding to the round table conference scene when the first voice data end; and controlling in advance movement of the camera to an intermediate position between the second sound source object and the third sound source object according to the sound source angle of the second sound source object and the sound source angle of the third sound source object.
2. The method of claim 1, wherein the step of extracting a voice feature based on the voice data of the sound source object comprises: extracting voice features of a specified amount of the voice data; determining the current voice scene by inputting the specified amount of the voice data into a trained machine learning model.
3. The method of claim 2, wherein steps of training the machine learning model comprises: acquiring a specified amount of sample voice data, and establishing a sample voice data set based on the sample voice data, wherein the sample voice data is marked with a voice scene, and the number of the sample voice data of each voice scene is no less than an average of the number of the sample voice data of each voice scene; extracting voice features according to the sample voice data, and establishing a feature vector set based on the voice features extracted; training a decision tree of the sample voice data set according to the feature vector set until an actual output value of the decision tree is the same as an ideal output value, and the training is completed.
4. The method of claim 1, wherein the step of determining the current voice scene according to the extracted voice feature and the voice feature corresponding to the preset voice scene comprises: acquiring a specified amount of a sample voice data; determining a distribution of the sound source angle, a voice duration distribution, and a voice interval time of the sample voice data; constructing a decision tree according to the distribution of the sound source angle, the voice duration distribution, and the voice interval time of the sample voice data acquired; determining a current scene according to the decision tree constructed and the voice features acquired.
5. The method of claim 1, wherein the step of acquiring the shooting mode corresponding to the current voice scene, and controlling the movement of the camera according to the shooting mode corresponding to the current voice scene comprises: acquiring voice data from a beginning of a video conference to a current moment when a current scene is a video conference scene; dividing speaking regions according to the voice data acquired, and determining region angles of the speaking regions divided; acquiring a sound source angle of the new voice data when the new voice data are detected; determining a speaking region to which the sound source angle of the new voice data belongs; controlling a turning angle of the camera according to the region angle of the speaking region.
6. The method of claim 5, wherein the step of controlling the turning angle of the camera according to the region angle of the speaking region comprises: acquiring the amount $n$ of the voice data from the beginning of the video conference to the current moment, and a voice duration $T_i$ and the sound source angle $A_i$ corresponding to each of the voice data; determining an angle $A_c$ of the camera to be rotated according to the following formula:
$$A_c = \frac{\sum_{i=1}^{n} A_i \times T_i}{\sum_{i=1}^{n} T_i}.$$
7. The method of claim 5, wherein the step of controlling the turning angle of the camera according to the region angle of the speaking region comprises: acquiring the amount $n$ of the voice data from the beginning of the video conference to the current moment, and the sound source angle $A_i$; determining an angle $A_c$ of the camera to be rotated according to the following formula:
$$A_c = \sum_{i=1}^{n} A_i.$$
8. A smart device, comprising: a memory, a processor, and a computer program stored in the memory and executable by the processor, wherein when the processor executes the computer program, the steps claimed according to claim 1 are implemented.
9. A computer storage medium, the computer storage medium is stored with a computer program, wherein when the computer program is executed by a processor, the steps claimed according to claim 1 are implemented.
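The video conference steps in claims 5 and 6 can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the claims do not say how speaking regions are divided, so the equal-width sectors, the 60-degree default width, and the function names are all hypothetical. Only `weighted_camera_angle` follows a formula given in the claims (the duration-weighted mean of sound source angles in claim 6).

```python
def region_index(angle_deg, region_width_deg=60.0):
    """Map a sound source angle to a speaking region.

    Assumption: regions are equal-width sectors starting at 0 degrees.
    """
    return int((angle_deg % 360.0) // region_width_deg)


def region_center(index, region_width_deg=60.0):
    """Region angle of a sector, taken here as its center (an assumption)."""
    return (index + 0.5) * region_width_deg


def weighted_camera_angle(angles_deg, durations_s):
    """Claim 6: Ac = sum(Ai * Ti) / sum(Ti) over all voice data so far."""
    if len(angles_deg) != len(durations_s):
        raise ValueError("angles and durations must have the same length")
    total_duration = sum(durations_s)
    if total_duration <= 0:
        raise ValueError("total voice duration must be positive")
    weighted_sum = sum(a * t for a, t in zip(angles_deg, durations_s))
    return weighted_sum / total_duration


# A new utterance at 75 degrees falls in sector 1, whose center is 90 degrees.
new_region = region_index(75.0)
turn_to = region_center(new_region)

# Two earlier utterances: 1 s at 30 degrees and 3 s at 60 degrees.
camera_angle = weighted_camera_angle([30.0, 60.0], [1.0, 3.0])  # 52.5
```

Weighting by duration pulls the camera toward whoever has spoken longest, which fits the stated goal of avoiding frequent camera movement. Note that a plain arithmetic mean of angles does not handle wrap-around at 0/360 degrees; this sketch assumes all angles lie in a range where that does not occur.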
Control of camera direction for changing a field of view, e.g. pan, tilt or based on tracking of objects · CPC title
Camera operation mode switching, e.g. between still and video, sport and normal or high- and low-resolution modes · CPC title
Control of cameras or camera modules · CPC title
Communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals (selecting H04Q) · CPC title
Conference systems · CPC title