What technology area does this patent fall under?

Primary CPC classification: G06F40/30 (Semantic analysis). Mapped technology area: Physics (CPC section G).

When was this patent published?

Publication date: April 19, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Smart cameras enabled by assistant systems

US11308284B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11308284-B2
Application number	US-201916659363-A
Country	US
Kind code	B2
Filing date	Oct 21, 2019
Priority date	Oct 18, 2019
Publication date	Apr 19, 2022
Grant date	Apr 19, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In one embodiment, a method includes receiving a user input from a user from a client system associated with the user, wherein the client system comprises one or more cameras, determining one or more points of interest in a field of view of the one or more cameras based on one or more machine-learning models and sensory data captured by the one or more cameras, generating a plurality of media files based on the one or more points of interest, wherein each media file is a recording of at least one of the one or more points of interest, generating one or more highlight files based on the plurality of media files, wherein each highlight file comprises a media file that satisfies a predefined quality standard, and sending instructions for presenting the one or more highlight files to the client system.
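The abstract describes a pipeline: find points of interest, record a media file for each, then keep only the media files that satisfy a predefined quality standard as highlight files. The last step can be sketched as a simple filter. This is an illustrative sketch only, not the patented implementation: the `MediaFile` structure, its numeric `quality` score, the `select_highlights` helper, and the 0.7 threshold are all hypothetical names and values introduced for this example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class MediaFile:
    """A recording of one point of interest (hypothetical structure)."""
    point_of_interest: str
    quality: float  # hypothetical score in [0, 1]; higher is better


def select_highlights(
    media_files: List[MediaFile],
    meets_standard: Callable[[MediaFile], bool],
) -> List[MediaFile]:
    """Keep only the media files that satisfy the quality standard."""
    return [m for m in media_files if meets_standard(m)]


# Hypothetical media files, one per detected point of interest.
media = [
    MediaFile("birthday cake", 0.90),
    MediaFile("dog", 0.40),
    MediaFile("friend smiling", 0.75),
]

# Assumed standard for illustration: quality score of at least 0.7.
highlights = select_highlights(media, lambda m: m.quality >= 0.7)
print([h.point_of_interest for h in highlights])
```

Passing the standard in as a callable keeps the filter independent of how quality is actually measured, which the abstract leaves open.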

First claim

Opening claim text (preview).

What is claimed is:
1. A method comprising, by one or more computing systems: accessing sensory data captured by one or more cameras associated with a client system; determining, based on one or more machine-learning models and the sensory data captured by the one or more cameras, one or more points of interest in a field of view of the one or more cameras; generating, based on the one or more points of interest, a plurality of media files, wherein each media file is associated with a recording of at least one of the one or more points of interest; generating, based on the plurality of media files, one or more highlight files, wherein each highlight file comprises a media file that satisfies a predefined quality standard, and wherein each highlight file is associated with a respective captioning; and sending, to the client system, instructions for presenting the one or more highlight files.
2. The method of claim 1, wherein the sensory data is based on one or more of textual signals, visual signals, or audio signals.
3. The method of claim 1, further comprising: receiving, from the client system, a user input based on one or more of a text input, an audio input, an image input, a video input, an eye gaze, a gesture, or a motion, wherein determining the one or more points of interest in the field of view of the one or more cameras is responsive to the user input.
4. The method of claim 1, wherein each of the plurality of media files comprises one or more of an image or a video clip.
5. The method of claim 1, wherein determining the points of interest comprises: detecting one or more people in the field of view; and determining, based on one or more facial recognition algorithms, one or more identifiers of one or more of the detected people.
6. The method of claim 5, wherein determining the points of interest is based on a measure of interestingness of one or more of the detected people, wherein the measure of interestingness is determined by the one or more machine-learning models based on one or more of the identifiers.
7. The method of claim 1, wherein determining the points of interest comprises: detecting one or more people in the field of view; and determining one or more facial expressions of one or more of the detected people.
8. The method of claim 7, wherein determining the points of interest is based on a measure of interestingness of one or more of the detected people, wherein the measure of interestingness is determined by the one or more machine-learning models based on one or more of the facial expressions.
9. The method of claim 1, wherein determining the points of interest comprises: detecting one or more objects in the field of view.
10. The method of claim 9, wherein determining the points of interest is based on a measure of interestingness of one or more of the detected objects, wherein the measure of interestingness is determined by the one or more machine-learning models based on one or more of the detected objects.
11. The method of claim 1, wherein determining the points of interest is based on eye gaze data of the user captured by the client system.
12. The method of claim 1, wherein the predefined quality standard is based on one or more of blurriness, lighting, or vividness of color.
13. The method of claim 1, further comprising: receiving, from the client system, a user query from the user in response to the highlight files; accessing a plurality of episodic memories associated with the user; identifying one or more episodic memories of the accessed episodic memories as related to the user query; retrieving one or more media files corresponding to the identified episodic memories, wherein each media file comprises one or more of a post, a comment, an image, or a video clip; and sending, to the client system, instructions for presenting the one or more media files corresponding to the identified episodic memories.
14. The method of claim 1, further comprising: sending, to the client system, instructions for zooming in one or more of the cameras to position one or more of the points of interest in a center of the field of view.
15. The method of claim 1, further comprising: sending, to the client system, instructions for zooming out one or more of the cameras to position one or more of the points of interest in a center of the field of view.
16. The method of claim 1, wherein the highlight files are personalized for the user based on one or more of: user profile data associated with the user; user preferences associated with the user; prior user inputs by the user; or user relationships with other users in a social graph.
17. The method of claim 1, further comprising: receiving, from the client system, a user request from the user to share one or more of the highlight files with one or more other users; and sending, to one or more other client systems associated with the one or more other users, respectively, instructions for presenting the shared highlight files.
18. The method of claim 1, further comprising: detecting a movement of the client system; and applying one or more visual stabilization algorithms to the sensory data captured by the one or more cameras.
19. One or more computer-readable non-transitory storage media embodying software that is operable when executed to: access sensory data captured by one or more cameras associated with a client system; determine, based on one or more machine-learning models and the sensory data captured by the one or more cameras, one or more points of interest in a field of view of the one or more cameras; generate, based on the one or more points of interest, a plurality of media files, wherein each media file is associated with a recording of at least one of the one or more points of interest; generate, based on the plurality of media files, one or more highlight files, wherein each highlight file comprises a media file that satisfies a predefined quality standard, and wherein each highlight file is associated with a respective captioning; and send, to the client system, instructions for presenting the one or more highlight files.
20. A system comprising: one or more processors; and a non-transitory memory coupled to the processors comprising instructions executable by the processors, the processors operable when executing the instructions to: access sensory data captured by one or more cameras associated with a client system; determine, based on one or more machine-learning models and the sensory data captured by the one or more cameras, one or more points of interest in a field of view of the one or more cameras; generate, based on the one or more points of interest, a plurality of media files, wherein each media file is associated with a recording of at least one of the one or more points of interest; generate, based on the plurality of media files, one or more highlight files, wherein each highlight file comprises a media file that satisfies a predefined quality standard, and wherein each highlight file is associated with a respective captioning; and send, to the client system, instructions for presenting the one or more highlight files.
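Claim 12 says the predefined quality standard may be based on blurriness, lighting, or vividness of color, but the claims do not say how these are measured. The sketch below shows one plausible way to score each property on a small RGB image held as nested lists. Everything here is an assumption made for illustration: the function names, the thresholds, the use of mean luma (Rec. 601 weights) for lighting, mean HSV saturation for vividness, and mean horizontal luma difference as a crude inverse proxy for blurriness.

```python
import colorsys
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each 0..255
Image = List[List[Pixel]]      # rows of pixels


def luma(p: Pixel) -> float:
    """Rec. 601 luma of one pixel, in 0..255."""
    r, g, b = p
    return 0.299 * r + 0.587 * g + 0.114 * b


def brightness(img: Image) -> float:
    """Lighting proxy: mean luma scaled to 0..1."""
    values = [luma(p) for row in img for p in row]
    return sum(values) / len(values) / 255.0


def vividness(img: Image) -> float:
    """Color vividness proxy: mean HSV saturation, 0..1."""
    sats = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)[1]
            for row in img for (r, g, b) in row]
    return sum(sats) / len(sats)


def sharpness(img: Image) -> float:
    """Inverse blurriness proxy: mean luma change between horizontal neighbours."""
    diffs = [abs(luma(a) - luma(b)) / 255.0
             for row in img for a, b in zip(row, row[1:])]
    return sum(diffs) / len(diffs) if diffs else 0.0


def meets_quality_standard(img: Image) -> bool:
    """Hypothetical standard; all thresholds are illustrative assumptions."""
    return (sharpness(img) >= 0.05
            and 0.2 <= brightness(img) <= 0.9
            and vividness(img) >= 0.2)


RED, YELLOW, GRAY = (255, 0, 0), (255, 255, 0), (128, 128, 128)
sharp_colorful = [[RED, YELLOW], [YELLOW, RED]]
flat_gray = [[GRAY, GRAY], [GRAY, GRAY]]

print(meets_quality_standard(sharp_colorful))
print(meets_quality_standard(flat_gray))
```

A production system would more likely use an image library and a stronger blur measure; the point here is only that each of the three named properties reduces to a scalar that can be compared against a threshold.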

Assignees

Facebook Tech LLC

Inventors

Classifications

G06V10/82: using neural networks
G06V10/764: using classification, e.g. of video objects
G06V20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items (segmenting video sequences G06V20/49)
G06V40/174: Facial expression recognition
G06F40/30 (primary): Semantic analysis
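A CPC symbol such as G06F40/30 is hierarchical: a section letter (G), a two-digit class (06), a subclass letter (F), a main group (40), and a subgroup (30). Section G is Physics, which is why this page maps the patent to that technology area. The sketch below splits a symbol into those parts; the `parse_cpc` function, the `CpcSymbol` type, and the regular expression are hypothetical helpers written for this example, not part of any official CPC tooling.

```python
import re
from typing import NamedTuple

# CPC section letters and their titles.
CPC_SECTIONS = {
    "A": "Human Necessities",
    "B": "Performing Operations; Transporting",
    "C": "Chemistry; Metallurgy",
    "D": "Textiles; Paper",
    "E": "Fixed Constructions",
    "F": "Mechanical Engineering; Lighting; Heating; Weapons; Blasting",
    "G": "Physics",
    "H": "Electricity",
    "Y": "General Tagging of New Technological Developments",
}


class CpcSymbol(NamedTuple):
    section: str
    cpc_class: str
    subclass: str
    main_group: str
    subgroup: str


# Assumed pattern for illustration: e.g. "G06F40/30" or "G06V40/174".
_CPC_RE = re.compile(r"^([A-HY])(\d{2})([A-Z])(\d{1,4})/(\d{2,6})$")


def parse_cpc(symbol: str) -> CpcSymbol:
    """Split a CPC symbol into its hierarchical parts."""
    match = _CPC_RE.match(symbol.replace(" ", ""))
    if match is None:
        raise ValueError(f"not a recognised CPC symbol: {symbol!r}")
    return CpcSymbol(*match.groups())


primary = parse_cpc("G06F40/30")
print(primary)
print(CPC_SECTIONS[primary.section])
```

All five classifications listed for this patent begin with G06, so they share the same section and class and differ only from the subclass downward.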

Patent family

Related publications grouped by family.

View patent family 75490741

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11308284B2 cover?: In one embodiment, a method includes receiving a user input from a user from a client system associated with the user, wherein the client system comprises one or more cameras, determining one or more points of interest in a field of view of the one or more cameras based on one or more machine-learning models and sensory data captured by the one or more cameras, generating a plurality of media f…
Who is the assignee on this patent?: Facebook Tech LLC