Video processing method and electronic device
US-12309447-B2 · May 20, 2025 · US
US12452531B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12452531-B2 |
| Application number | US-202318373078-A |
| Country | US |
| Kind code | B2 |
| Filing date | Sep 26, 2023 |
| Priority date | Sep 29, 2022 |
| Publication date | Oct 21, 2025 |
| Grant date | Oct 21, 2025 |
Official abstract text for this publication.
An electronic device for controlling a photographic system may obtain a video stream and a user query for a target event, obtain a set of photos from the video stream, obtain at least one photoshoot suggestion based on the user query via a language model, obtain a snapped photo for the target event based on the at least one photoshoot suggestion, in response to a given video frame included in the video stream satisfying a target content criterion, and output one or more photos selected from the set of photos and the snapped photo as event photos.
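The abstract does not define the "target content criterion". Claims 2 and 12 of this patent give one concrete version: a frame qualifies when its similarity to a photoshoot suggestion is higher than the similarity of every earlier frame in the stream. The following is a minimal Python sketch of that running-maximum test. It assumes each frame and each suggestion has already been mapped to an embedding in a shared space (for example by a CLIP-style image/text model; the claims do not name a model). The names `cosine` and `snap_frames` are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def snap_frames(frame_embeddings: Sequence[Sequence[float]],
                suggestion_embedding: Sequence[float]) -> List[int]:
    """Indices of frames whose similarity to the suggestion beats every
    earlier frame's similarity, i.e. each new running maximum."""
    best = -math.inf
    snapped = []
    for index, frame in enumerate(frame_embeddings):
        score = cosine(frame, suggestion_embedding)
        if score > best:  # strictly greater than all previous frames
            snapped.append(index)
            best = score
    return snapped


frames = [[1.0, 0.0], [0.8, 0.6], [0.6, 0.8], [0.7, 0.7]]
print(snap_frames(frames, [0.0, 1.0]))  # → [0, 1, 2]
```

As written, the first frame always qualifies because no earlier frames exist. A real system would probably add a warm-up period or a minimum score before triggering the second camera, but that detail is not stated in the source.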
Opening claim text (preview).
What is claimed is:

1. An electronic device for controlling a photographic system, the electronic device comprising: a memory storing one or more instructions; and one or more processors configured to: obtain a video stream and a user query for a target event; obtain a set of photos from the video stream; obtain at least one photoshoot suggestion based on the user query via a language model; obtain a snapped photo for the target event based on the at least one photoshoot suggestion, in response to a given video frame included in the video stream satisfying a target content criterion; and output one or more photos selected from the set of photos and the snapped photo as event photos.

2. The electronic device of claim 1, wherein the given video frame meets the target content criterion when a similarity score between a text embedding extracted from the current video frame and an image embedding extracted from the at least one photoshoot suggestion, is greater than similarity scores between each of text embeddings extracted from previous video frames within the video stream and the image embedding extracted from the at least one photoshoot suggestion.

3. The electronic device of claim 1, further comprising a first camera configured to acquire the video stream and a second camera configured to acquire the snapped photo, wherein: the at least one photoshoot suggestion comprises a plurality of photoshoot suggestions, any one or any combination of the one or more processors are configured to: extract an image embedding from the current video frame acquired at a current pose of the first camera; obtain a plurality of text embeddings from the plurality of photoshoot suggestions, respectively; compute similarity scores between the image embedding and each of the plurality of text embeddings; select a first photoshoot suggestion that has a highest similarity score, from among the similarity scores; increment a counter that is initially set for the selected first photoshoot suggestion over time; decrease the similarity score for the selected first photoshoot suggestion over time by reducing the similarity score by a value of the counter that increases over time; select a second photoshoot suggestion that initially had a second-highest similarity score and has surpassed all other photoshoot suggestions in similarity score; and adjust the current pose of the first camera to capture the selected second photoshoot suggestion.

4. The electronic device of claim 1, further comprising a first camera configured to acquire the video stream and a second camera configured to acquire the snapped photo, wherein any one or any combination of the one or more processors are configured to: extract an image embedding from the given video frame that is acquired at a current pose of the first camera; obtain a text embedding from the at least one photoshoot suggestion; acquire translation coordinates and rotation angles of a next pose of the first camera, based on a change in similarity between the image embedding and the text embedding with respect to change in each pixel in the video frame; adjust the pose of the first camera based on the translation coordinates and the rotation angles; and control the first camera to acquire a next video frame in the adjusted pose.

5. The electronic device of claim 1, further comprising a first camera configured to acquire the video stream and a second camera configured to acquire the snapped photo, wherein any one or any combination of the one or more processors are configured to: extract an image embedding from the video frame that is acquired at a current pose of the first camera; obtain a text embedding from the at least one photoshoot suggestion; acquire translation coordinates and rotation angles of a next pose of the first camera, based on a change in similarity between the image embedding and the text embedding with respect to change in camera pose parameters of the current pose of the first camera, adjust the pose of the first camera based on the translation coordinates and the rotation angles; and control the camera to acquire a next video frame in the adjusted pose.

6. The electronic device of claim 1, wherein any one or any combination of the one or more processors are configured to: construct a full query based on the user query; input the full query to the language model; acquire the at least one photoshoot suggestion as an output of the language model; and control a camera to obtain the snapped photo based on the at least one photoshoot suggestion.

7. The electronic device of claim 6, wherein any one or any combination of the one or more processors are configured to: obtain a voice signal during the target event; identify a key event descriptor based on the voice signal acquired during the target event; construct the full query based on the user query and the key event descriptor identified from the voice signal; and input the full query to the language model to acquire the least one photoshoot suggestion that reflects the identified key event descriptor.

8. The electronic device of claim 1, wherein any one or any combination of the one or more processors are configured to: identify a key event descriptor from the set of photos; construct a full query based on the key event descriptor identified from the set of photos and the user query; and input the full query to the language model to acquire the least one photoshoot suggestion that reflects the identified key event descriptor.

9. The electronic device of claim 1, wherein any one or any combination of the one or more processors are configured to: determine whether any one of the at least one photoshoot suggestion includes a photography composition directive; and discard the photoshoot suggestion including the photography composition directive.

10. The electronic device of claim 1, wherein any one or any combination of the one or more processors are configured to: determine whether to use a photo gallery application or a camera application based on device capabilities of the electronic device and the user query; based on the photo gallery application being activated, access a photo gallery of the electronic device to acquire the set of photos that has been stored in the memory; and based on the camera application being activated, acquire the set of photos and the snapped photo to be stored in the memory.

11. A method for controlling a photographic system, the method comprising: obtaining a video stream and a user query for a target event; obtaining a set of photos from the video stream; obtaining at least one photoshoot suggestion based on the user query via a language model; obtaining a snapped photo for the target event based on the at least one photoshoot suggestion, in response to a given video frame included in the video stream satisfying a target content criterion; and outputting one or more photos selected from the set of photos and the snapped photo as event photos.

12. The method of claim 11, further comprising: determining that the given video frame satisfies the target content criterion when a similarity score between a text embedding extracted from the given video frame and an image embedding extracted from the at least one photoshoot suggestion, is greater than similarity scores between each of text embeddings extracted from previous video frames within the video stream and the image embedding extracted from the at least one photoshoot suggestion.

13. The method of claim 11, wherein: the video stream is acquired by a first camera, and the snapped photo is acquired by a second camera, t
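Claim 3 describes a counter that wears down the score of the suggestion currently being targeted, so that the runner-up eventually overtakes it and the camera moves on. The following is a minimal Python sketch of that scheduling rule under stated assumptions. The multiplier `decay` is hypothetical: the claim subtracts the counter value itself, but similarity scores are typically in [-1, 1], so a scale factor keeps the switch from happening after a single step. The sketch also keeps applying the counter to whichever suggestion is selected, which goes beyond the single first-to-second switch the claim recites. The pose adjustment that follows the switch is not shown, and `schedule_suggestions` is a hypothetical name.

```python
from typing import List


def schedule_suggestions(scores: List[float], steps: int,
                         decay: float = 0.25) -> List[int]:
    """Return which suggestion is targeted at each time step.

    `scores` are the initial image-text similarity scores, one per
    photoshoot suggestion. Each time a suggestion is selected its counter
    is incremented, and its effective score is reduced by decay * counter,
    so a long-targeted suggestion is eventually overtaken by the runner-up.
    """
    counters = [0] * len(scores)
    schedule = []
    for _ in range(steps):
        effective = [s - decay * c for s, c in zip(scores, counters)]
        # max() returns the first index on ties, so selection is deterministic.
        chosen = max(range(len(scores)), key=lambda i: effective[i])
        counters[chosen] += 1
        schedule.append(chosen)
    return schedule


print(schedule_suggestions([0.9, 0.6, 0.1], steps=5))  # → [0, 0, 1, 0, 1]
```

The first suggestion that takes over from index 0 is index 1, the one with the second-highest initial score, which matches the order of selection recited in claim 3.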
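Claim 5 computes the next camera pose from how the image-text similarity changes with respect to the pose parameters, which amounts to gradient ascent on similarity over translation and rotation. The following is a minimal Python sketch of that idea. It swaps in central finite differences for the gradient because the claims do not say how the gradient is obtained; a real system would more likely use automatic differentiation through a differentiable model. The callback `similarity(pose)` is hypothetical and assumed to return the frame/suggestion similarity at a given pose, for example from a simulator. The six-element pose layout is also an assumption.

```python
from typing import Callable, List

# Hypothetical pose layout: [tx, ty, tz, roll, pitch, yaw].
Pose = List[float]


def next_pose(pose: Pose,
              similarity: Callable[[Pose], float],
              step: float = 0.1,
              eps: float = 1e-4) -> Pose:
    """One gradient-ascent step on frame/suggestion similarity.

    The gradient with respect to each pose parameter is estimated by
    central finite differences around the current pose.
    """
    grad = []
    for i in range(len(pose)):
        plus = list(pose)
        minus = list(pose)
        plus[i] += eps
        minus[i] -= eps
        grad.append((similarity(plus) - similarity(minus)) / (2 * eps))
    # Move each translation/rotation parameter uphill on the similarity surface.
    return [p + step * g for p, g in zip(pose, grad)]


# Toy similarity surface peaked at a known target pose (illustration only).
target = [0.5, -0.2, 0.0, 0.0, 0.1, -0.3]


def toy_similarity(p: Pose) -> float:
    return -sum((a - b) ** 2 for a, b in zip(p, target))


pose = [0.0] * 6
for _ in range(100):
    pose = next_pose(pose, toy_similarity)
```

On this toy surface each step shrinks the distance to the target by a constant factor, so the loop converges to `target`. Real similarity surfaces are unlikely to be this smooth, and step size and stopping rules would need tuning.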
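Claims 6 to 9 wrap the user query, and optionally a key event descriptor taken from speech or from photos already captured, into a "full query" for a language model, then discard any suggestion that contains a photography composition directive. The following is a minimal Python sketch of the text handling around the model. The prompt template, the keyword list used to detect composition directives, and the function names are all hypothetical. The language-model call itself is omitted because the claims do not name a model or API.

```python
from typing import List, Optional

# Hypothetical keyword list: the claims do not specify how a
# "photography composition directive" is recognized.
COMPOSITION_TERMS = (
    "rule of thirds",
    "leading lines",
    "symmetry",
    "golden ratio",
    "negative space",
)


def build_full_query(user_query: str, key_event: Optional[str] = None) -> str:
    """Combine the user query with an optional key event descriptor
    (claims 7 and 8) into one prompt. The wording is an assumption."""
    parts = [f"The user wants photos of: {user_query}."]
    if key_event:
        parts.append(f"Right now the following is happening: {key_event}.")
    parts.append("Suggest specific shots to take, one per line.")
    return " ".join(parts)


def parse_suggestions(model_output: str) -> List[str]:
    """Split the model's reply into one suggestion per non-empty line,
    dropping leading list markers such as '-' or '*'."""
    suggestions = []
    for line in model_output.splitlines():
        cleaned = line.strip().lstrip("-*").strip()
        if cleaned:
            suggestions.append(cleaned)
    return suggestions


def drop_composition_directives(suggestions: List[str]) -> List[str]:
    """Discard suggestions that contain a composition directive (claim 9)."""
    return [
        s for s in suggestions
        if not any(term in s.lower() for term in COMPOSITION_TERMS)
    ]


reply = ("- Child blowing out the candles\n"
         "- Use the rule of thirds for the cake\n"
         "* Guests singing together\n")
print(drop_composition_directives(parse_suggestions(reply)))
# → ['Child blowing out the candles', 'Guests singing together']
```

Keyword matching is the simplest possible detector and will miss paraphrased directives. A classifier or a second language-model pass would be more robust, but neither is described in the source.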
Semantic analysis · CPC title
by means of an audio-responsive input (audible safety signals B25J19/061) · CPC title
Video; Image sequence · CPC title
Determining position or orientation of objects or cameras (camera calibration G06T7/80) · CPC title
including video camera means · CPC title