US11289078B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11289078-B2 |
| Application number | US-201916456523-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jun 28, 2019 |
| Priority date | Jun 28, 2019 |
| Publication date | Mar 29, 2022 |
| Grant date | Mar 29, 2022 |
Official abstract text for this publication.
An apparatus, method and computer readable medium for a voice-controlled camera with artificial intelligence (AI) for precise focusing. The method includes receiving, by the camera, natural language instructions from a user for focusing the camera to achieve a desired photograph. The natural language instructions are processed using natural language processing techniques to enable the camera to understand the instructions. A preview image of a user desired scene is captured by the camera. Artificial Intelligence (AI) is applied to the preview image to obtain context and to detect objects within the preview image. A depth map of the preview image is generated to obtain distances from the detected objects in the preview image to the camera. It is determined whether the detected objects in the image match the natural language instructions from the user.
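The matching step at the end of the abstract can be illustrated with a minimal Python sketch. It assumes the object detector and the depth estimator have already run, and it only shows how detected labels and a depth map might be combined and compared against the objects named in the instruction. The names `Detection`, `object_distance`, and `match_targets`, the bounding-box layout, and the use of a median over the box are all illustrative assumptions, not details taken from the patent.

```python
from dataclasses import dataclass
from statistics import median
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Detection:
    """Hypothetical detector output: a label plus pixel bounds."""
    label: str
    box: Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def object_distance(depth_map: Sequence[Sequence[float]],
                    box: Tuple[int, int, int, int]) -> float:
    """Estimate an object's distance as the median depth inside its box."""
    left, top, right, bottom = box
    values = [depth_map[y][x]
              for y in range(top, bottom)
              for x in range(left, right)]
    return median(values)


def match_targets(requested_labels: Sequence[str],
                  detections: List[Detection],
                  depth_map: Sequence[Sequence[float]]) -> Optional[Dict[str, float]]:
    """Return {label: distance} if every requested label was detected, else None.

    When a label is detected more than once, the nearest instance is kept
    (an arbitrary choice made for this sketch).
    """
    nearest: Dict[str, float] = {}
    for det in detections:
        dist = object_distance(depth_map, det.box)
        if det.label not in nearest or dist < nearest[det.label]:
            nearest[det.label] = dist
    if all(label in nearest for label in requested_labels):
        return {label: nearest[label] for label in requested_labels}
    return None


# Toy 4x4 depth map in metres: left half near (2 m), right half far (9 m).
depth_map = [[2.0, 2.0, 9.0, 9.0] for _ in range(4)]
detections = [Detection("dog", (0, 0, 2, 4)), Detection("tree", (2, 0, 4, 4))]

matched = match_targets(["dog"], detections, depth_map)
unmatched = match_targets(["cat"], detections, depth_map)
```

In this toy scene a request to focus on the dog matches and yields its estimated distance, while a request for a cat returns `None`, which corresponds to the no-match case handled by recapturing the preview in claims 2 and 11.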
Opening claim text (preview).
What is claimed is:
1. A system for performing precise focusing comprising: a camera, the camera having a microphone to receive natural language instructions (NLIs) from a user for focusing the camera on one or more objects to achieve a desired user image; the camera coupled to one or more processors, the one or more processors coupled to one or more memory devices, the one or more memory devices including instructions, which when executed by the one or more processors, cause the system to: process the NLIs for understanding using natural language processing (NLP) techniques; capture a preview image of the desired user image and apply artificial intelligence (AI) scene analysis to the preview image to obtain context and to detect the one or more objects within the preview image; generate a depth map of the preview image to obtain distances of detected objects in the preview image to the camera; when the detected objects match the NLIs, determine and adjust camera focus point and camera settings based on the NLIs to obtain the desired user image; and take a photograph of the desired user image.
2. The system of claim 1, wherein when the detected objects in the image do not match the NLIs from the user, the one or more memory devices including further instructions, which when executed by the one or more processors, cause the system to: recapture the preview image of the desired user image; apply the AI scene analysis to the preview image to obtain the context and to detect the one or more objects within the preview image; generate the depth map of the preview image to obtain the distances from the detected objects in the preview image to the camera; when the detected objects in the preview image match the NLIs, determine and adjust the camera focus point and the camera settings based on the NLIs of the user to obtain the desired user image; and take the photograph of the desired user image.
3. The system of claim 1, wherein the camera continuously listens, via a microphone, to voice commands from the user based on a wake word, the wake word to operate as a trigger to inform the camera that the voice commands following the wake word are instructions for focusing the camera to achieve the desired user image.
4. The system of claim 1, wherein NLP uses deep learning techniques based on dense vector representations, wherein the deep learning techniques include one or more of Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), and recursive neural networks.
5. The system of claim 1, wherein the AI scene analysis uses Semantic Segmentation in real-time using Fully Convolutional Networks (FCN), R-CNN (Regional-based Convolutional Neural Network), Fast R-CNN, Faster R-CNN, YOLO (You Only Look Once), and Mask R-CNN Using TensorRT, or a combination of one or more of the above.
6. The system of claim 1, wherein the camera focus point and the camera settings are determined by calculating optical formulas for cameras based on the detected objects to be photographed, their position in the preview image, and their estimated depth or distance to the camera.
7. The system of claim 1, wherein the camera focus point and the camera settings are determined through experimentation by selecting a camera parameter and viewing an image of that selection using depth of field preview, wherein if the image is not good, continuously changing the camera parameter and viewing the image until the image is correct.
8. A method of performing precise focusing of a camera comprising: receiving, by the camera, natural language instructions (NLIs) from a user for focusing the camera on one or more objects to achieve a desired user image, wherein the NLIs are processed to understand the instructions using natural language processing (NLP); capturing, by the camera, a preview image of the desired user image, wherein artificial intelligence (AI) scene analysis is applied to the preview image to obtain context and to detect the one or more objects within the preview image; generating a depth map of the preview image to obtain distances of detected objects in the preview image to the camera; when the detected objects in the preview image match the NLIs, determining camera focus point and camera settings based on the NLIs and adjusting the camera focus point and the camera settings to obtain the desired user image; and taking a photograph of the desired user image.
9. The method of claim 8, wherein the photograph is taken automatically by the camera.
10. The method of claim 8, wherein the user is prompted to take the photograph.
11. The method of claim 8, wherein when the detected objects in the image do not match the NLIs, recapturing, by the camera, the preview image of the desired user image; applying the AI scene analysis to the preview image to obtain the context and to detect the one or more objects within the preview image; generating the depth map of the preview image to obtain the distances from the detected objects in the preview image to the camera; when the detected objects in the preview image match the NLIs, determining the camera focus point and the camera settings based on the NLIs and adjusting the camera focus point and the camera settings to obtain the desired user image; and taking the photograph of the desired user image.
12. The method of claim 8, wherein the camera continuously listens, via a microphone, to voice commands from the user based on a wake word, the wake word to operate as a trigger to inform the camera that the voice commands following the wake word are instructions for focusing the camera to achieve the desired user image.
13. The method of claim 8, wherein NLP uses deep learning techniques based on dense vector representations, wherein the deep learning techniques include one or more of Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), and recursive neural networks.
14. The method of claim 8, wherein the artificial intelligence scene analysis uses Semantic Segmentation in real-time using Fully Convolutional Networks (FCN), R-CNN (Regional-based Convolutional Neural Network), Fast R-CNN, Faster R-CNN, YOLO (You Only Look Once), and Mask R-CNN Using TensorRT, or a combination of one or more of the above.
15. The method of claim 8, wherein the camera focus point and the camera settings are determined by calculating optical formulas for cameras based on the detected objects to be photographed, their position in the preview image, and their estimated depth or distance to the camera.
16. The method of claim 8, wherein the camera focus point and the camera settings are determined through experimentation by selecting a camera parameter and viewing an image of that selection using depth of field preview, wherein if the image is not good, continuously changing the camera parameter and viewing the image until the image is correct.
17. The method of claim 8, wherein receiving and processing the natural language instructions and capturing and applying the AI scene analysis to the preview image are performed simultaneously.
18. At least one non-transitory computer readable medium, comprising a set of instructions, which when executed by one or more computing devices, cause the one or more computing devices to: receive, by the camera, natural language instructions (NLIs) from a user for focusing the camera on one or more objects to achieve a desired user image, wherein the NLIs are processed to understand the instructions using natural language processing (NLP); capture, by the camera, a preview image of the
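Claims 6 and 15 refer to "calculating optical formulas for cameras" without naming specific formulas. As one possible reading, the sketch below uses the standard thin-lens hyperfocal and depth-of-field approximations to pick a focus distance and the widest aperture that keeps both the nearest and farthest requested objects acceptably sharp. The function names, the 0.03 mm circle of confusion, the list of f-stops, and the harmonic-mean focus heuristic are assumptions made for illustration, not details from the patent.

```python
import math
from typing import Optional, Sequence, Tuple


def hyperfocal_mm(focal_mm: float, f_number: float, coc_mm: float = 0.03) -> float:
    """Hyperfocal distance H = f^2 / (N * c) + f, all lengths in millimetres."""
    return focal_mm ** 2 / (f_number * coc_mm) + focal_mm


def dof_limits_mm(focal_mm: float, f_number: float, focus_mm: float,
                  coc_mm: float = 0.03) -> Tuple[float, float]:
    """Near and far limits of acceptable sharpness when focused at focus_mm."""
    h = hyperfocal_mm(focal_mm, f_number, coc_mm)
    near = focus_mm * (h - focal_mm) / (h + focus_mm - 2 * focal_mm)
    if focus_mm >= h:
        far = math.inf  # focused at or beyond hyperfocal: sharp to infinity
    else:
        far = focus_mm * (h - focal_mm) / (h - focus_mm)
    return near, far


def choose_focus_and_aperture(
        focal_mm: float, nearest_mm: float, farthest_mm: float,
        stops: Sequence[float] = (1.8, 2.8, 4.0, 5.6, 8.0, 11.0, 16.0),
        coc_mm: float = 0.03) -> Optional[Tuple[float, float]]:
    """Return (focus distance, f-number) covering both objects, or None.

    Focus is placed at the harmonic mean of the two distances, a common
    heuristic for splitting depth of field between near and far subjects.
    Stops are tried from widest to narrowest so the widest usable one wins.
    """
    focus_mm = 2 * nearest_mm * farthest_mm / (nearest_mm + farthest_mm)
    for f_number in stops:
        near, far = dof_limits_mm(focal_mm, f_number, focus_mm, coc_mm)
        if near <= nearest_mm and far >= farthest_mm:
            return focus_mm, f_number
    return None


# Example: 50 mm lens, subjects at 2 m and 3 m (distances from a depth map).
setting = choose_focus_and_aperture(50.0, 2000.0, 3000.0)
```

For the example inputs, the sketch focuses at 2.4 m and selects f/8, the first stop in the list whose depth of field spans both subjects. When no listed stop is sufficient the function returns `None`, at which point a real system might fall back to the trial-and-error approach of claims 7 and 16.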
Computer-aided capture of images, e.g. transfer from script file into camera, check of taken image quality, advice or proposal for image composition or decision on when to take image · CPC title
Focus control based on electronic image sensor signals · CPC title
by using electronic viewfinders · CPC title
Control of parameters via user interfaces · CPC title
Combinations of networks · CPC title