Voice controlled camera with AI scene detection for precise focusing

US11289078B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11289078-B2
Application number  US-201916456523-A
Country             US
Kind code           B2
Filing date         Jun 28, 2019
Priority date       Jun 28, 2019
Publication date    Mar 29, 2022
Grant date          Mar 29, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An apparatus, method and computer readable medium for a voice-controlled camera with artificial intelligence (AI) for precise focusing. The method includes receiving, by the camera, natural language instructions from a user for focusing the camera to achieve a desired photograph. The natural language instructions are processed using natural language processing techniques to enable the camera to understand the instructions. A preview image of a user desired scene is captured by the camera. Artificial Intelligence (AI) is applied to the preview image to obtain context and to detect objects within the preview image. A depth map of the preview image is generated to obtain distances from the detected objects in the preview image to the camera. It is determined whether the detected objects in the image match the natural language instructions from the user.
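
The matching step in the abstract can be illustrated with a minimal Python sketch. It is not taken from the patent: the `DetectedObject` type, the `VOCABULARY` tuple, the keyword lookup standing in for natural language processing, and the use of a median over a bounding box to read the depth map are all assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import median

# Hypothetical set of classes the scene-analysis model can report.
VOCABULARY = ("person", "dog", "tree")


@dataclass
class DetectedObject:
    label: str   # class name reported by scene analysis, e.g. "dog"
    box: tuple   # (x0, y0, x1, y1) pixel bounds; x1 and y1 are exclusive


def object_distance(depth_map, box):
    """Median depth (metres) of the depth-map pixels inside a bounding box."""
    x0, y0, x1, y1 = box
    return median(depth_map[y][x] for y in range(y0, y1) for x in range(x0, x1))


def requested_labels(instruction, vocabulary=VOCABULARY):
    """Stand-in for NLP: keep vocabulary labels that appear as words in the text."""
    words = set(instruction.lower().replace(",", " ").split())
    return [label for label in vocabulary if label in words]


def plan_focus(instruction, detections, depth_map, vocabulary=VOCABULARY):
    """Return {label: distance} for the requested objects, or None on a mismatch."""
    wanted = requested_labels(instruction, vocabulary)
    found = {d.label: d for d in detections}
    if not wanted or any(label not in found for label in wanted):
        return None  # no match: the caller would recapture the preview image
    return {label: object_distance(depth_map, found[label].box) for label in wanted}


# Toy 4x4 depth map: left half is 2 m away, right half is 8 m away.
depth_map = [[2.0, 2.0, 8.0, 8.0] for _ in range(4)]
detections = [
    DetectedObject("dog", (0, 0, 2, 4)),
    DetectedObject("tree", (2, 0, 4, 4)),
]
print(plan_focus("focus on the dog", detections, depth_map))
```

A real system would replace the keyword lookup with a language model and the bounding boxes with segmentation masks; the sketch only shows how detections, depths and the instruction meet in one decision.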

First claim

Opening claim text (preview).

What is claimed is:

1. A system for performing precise focusing comprising: a camera, the camera having a microphone to receive natural language instructions (NLIs) from a user for focusing the camera on one or more objects to achieve a desired user image; the camera coupled to one or more processors, the one or more processors coupled to one or more memory devices, the one or more memory devices including instructions, which when executed by the one or more processors, cause the system to: process the NLIs for understanding using natural language processing (NLP) techniques; capture a preview image of the desired user image and apply artificial intelligence (AI) scene analysis to the preview image to obtain context and to detect the one or more objects within the preview image; generate a depth map of the preview image to obtain distances of detected objects in the preview image to the camera; when the detected objects match the NLIs, determine and adjust camera focus point and camera settings based on the NLIs to obtain the desired user image; and take a photograph of the desired user image.

2. The system of claim 1, wherein when the detected objects in the image do not match the NLIs from the user, the one or more memory devices including further instructions, which when executed by the one or more processors, cause the system to: recapture the preview image of the desired user image; apply the AI scene analysis to the preview image to obtain the context and to detect the one or more objects within the preview image; generate the depth map of the preview image to obtain the distances from the detected objects in the preview image to the camera; when the detected objects in the preview image match the NLIs, determine and adjust the camera focus point and the camera settings based on the NLIs of the user to obtain the desired user image; and take the photograph of the desired user image.

3. The system of claim 1, wherein the camera continuously listens, via a microphone, to voice commands from the user based on a wake word, the wake word to operate as a trigger to inform the camera that the voice commands following the wake word are instructions for focusing the camera to achieve the desired user image.

4. The system of claim 1, wherein NLP uses deep learning techniques based on dense vector representations, wherein the deep learning techniques include one or more of Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), and recursive neural networks.

5. The system of claim 1, wherein the AI scene analysis uses Semantic Segmentation in real-time using Fully Convolutional Networks (FCN), R-CNN (Regional-based Convolutional Neural Network), Fast R-CNN, Faster R-CNN, YOLO (You Only Look Once), and Mask R-CNN Using TensorRT, or a combination of one or more of the above.

6. The system of claim 1, wherein the camera focus point and the camera settings are determined by calculating optical formulas for cameras based on the detected objects to be photographed, their position in the preview image, and their estimated depth or distance to the camera.

7. The system of claim 1, wherein the camera focus point and the camera settings are determined through experimentation by selecting a camera parameter and viewing an image of that selection using depth of field preview, wherein if the image is not good, continuously changing the camera parameter and viewing the image until the image is correct.

8. A method of performing precise focusing of a camera comprising: receiving, by the camera, natural language instructions (NLIs) from a user for focusing the camera on one or more objects to achieve a desired user image, wherein the NLIs are processed to understand the instructions using natural language processing (NLP); capturing, by the camera, a preview image of the desired user image, wherein artificial intelligence (AI) scene analysis is applied to the preview image to obtain context and to detect the one or more objects within the preview image; generating a depth map of the preview image to obtain distances of detected objects in the preview image to the camera; when the detected objects in the preview image match the NLIs, determining camera focus point and camera settings based on the NLIs and adjusting the camera focus point and the camera settings to obtain the desired user image; and taking a photograph of the desired user image.

9. The method of claim 8, wherein the photograph is taken automatically by the camera.

10. The method of claim 8, wherein the user is prompted to take the photograph.

11. The method of claim 8, wherein when the detected objects in the image do not match the NLIs, recapturing, by the camera, the preview image of the desired user image; applying the AI scene analysis to the preview image to obtain the context and to detect the one or more objects within the preview image; generating the depth map of the preview image to obtain the distances from the detected objects in the preview image to the camera; when the detected objects in the preview image match the NLIs, determining the camera focus point and the camera settings based on the NLIs and adjusting the camera focus point and the camera settings to obtain the desired user image; and taking the photograph of the desired user image.

12. The method of claim 8, wherein the camera continuously listens, via a microphone, to voice commands from the user based on a wake word, the wake word to operate as a trigger to inform the camera that the voice commands following the wake word are instructions for focusing the camera to achieve the desired user image.

13. The method of claim 8, wherein NLP uses deep learning techniques based on dense vector representations, wherein the deep learning techniques include one or more of Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), and recursive neural networks.

14. The method of claim 8, wherein the artificial intelligence scene analysis uses Semantic Segmentation in real-time using Fully Convolutional Networks (FCN), R-CNN (Regional-based Convolutional Neural Network), Fast R-CNN, Faster R-CNN, YOLO (You Only Look Once), and Mask R-CNN Using TensorRT, or a combination of one or more of the above.

15. The method of claim 8, wherein the camera focus point and the camera settings are determined by calculating optical formulas for cameras based on the detected objects to be photographed, their position in the preview image, and their estimated depth or distance to the camera.

16. The method of claim 8, wherein the camera focus point and the camera settings are determined through experimentation by selecting a camera parameter and viewing an image of that selection using depth of field preview, wherein if the image is not good, continuously changing the camera parameter and viewing the image until the image is correct.

17. The method of claim 8, wherein receiving and processing the natural language instructions and capturing and applying the AI scene analysis to the preview image are performed simultaneously.

18. At least one non-transitory computer readable medium, comprising a set of instructions, which when executed by one or more computing devices, cause the one or more computing devices to: receive, by the camera, natural language instructions (NLIs) from a user for focusing the camera on one or more objects to achieve a desired user image, wherein the NLIs are processed to understand the instructions using natural language processing (NLP); capture, by the camera, a preview image of the
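
Claims 6 and 15 refer to "optical formulas for cameras" without naming them. As one possible reading, the sketch below uses the standard thin-lens depth-of-field relations (hyperfocal distance and near/far limits of acceptable sharpness) to pick a focus distance and the widest aperture that keeps the nearest and farthest requested objects sharp. The choice of formulas, the 0.03 mm circle of confusion, the list of f-stops and all function names are assumptions, not details from the patent.

```python
import math


def hyperfocal(focal_mm, f_number, coc_mm=0.03):
    """Hyperfocal distance in mm: H = f^2 / (N * c) + f."""
    return focal_mm ** 2 / (f_number * coc_mm) + focal_mm


def dof_limits(focal_mm, f_number, focus_mm, coc_mm=0.03):
    """Near and far limits of acceptable sharpness (mm) for a focus distance."""
    h = hyperfocal(focal_mm, f_number, coc_mm)
    near = focus_mm * (h - focal_mm) / (h + focus_mm - 2 * focal_mm)
    # Focusing at or beyond the hyperfocal distance keeps infinity sharp.
    far = math.inf if focus_mm >= h else focus_mm * (h - focal_mm) / (h - focus_mm)
    return near, far


def choose_settings(focal_mm, nearest_mm, farthest_mm,
                    stops=(1.8, 2.8, 4, 5.6, 8, 11, 16), coc_mm=0.03):
    """Pick a focus distance and the widest listed f-stop covering both subjects.

    The focus distance is the harmonic mean of the two subject distances,
    a common rule of thumb. Returns None when no listed stop is sufficient.
    """
    focus_mm = 2 * nearest_mm * farthest_mm / (nearest_mm + farthest_mm)
    for n in stops:
        near, far = dof_limits(focal_mm, n, focus_mm, coc_mm)
        if near <= nearest_mm and far >= farthest_mm:
            return focus_mm, n
    return None


# 50 mm lens, subjects at 2 m and 4 m (distances taken from the depth map).
print(choose_settings(50, 2000, 4000))
```

When `choose_settings` returns None, a camera could fall back to the trial-and-error approach of claims 7 and 16, or focus on the primary subject only.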

Assignees

Intel Corp

Inventors

Classifications

  • Computer-aided capture of images, e.g. transfer from script file into camera, check of taken image quality, advice or proposal for image composition or decision on when to take image · CPC title

  • Focus control based on electronic image sensor signals · CPC title

  • by using electronic viewfinders · CPC title

  • Control of parameters via user interfaces · CPC title

  • Combinations of networks · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11289078B2 cover?
An apparatus, method and computer readable medium for a voice-controlled camera with artificial intelligence (AI) for precise focusing. The method includes receiving, by the camera, natural language instructions from a user for focusing the camera to achieve a desired photograph. The natural language instructions are processed using natural language processing techniques to enable the camera to…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G10L15/22. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 29, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).