Multi-command single utterance input method
US-2015348551-A1 · Dec 3, 2015 · US
US11308284B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11308284-B2 |
| Application number | US-201916659363-A |
| Country | US |
| Kind code | B2 |
| Filing date | Oct 21, 2019 |
| Priority date | Oct 18, 2019 |
| Publication date | Apr 19, 2022 |
| Grant date | Apr 19, 2022 |
Official abstract text for this publication.
In one embodiment, a method includes receiving a user input from a user from a client system associated with the user, wherein the client system comprises one or more cameras, determining one or more points of interest in a field of view of the one or more cameras based on one or more machine-learning models and sensory data captured by the one or more cameras, generating a plurality of media files based on the one or more points of interest, wherein each media file is a recording of at least one of the one or more points of interest, generating one or more highlight files based on the plurality of media files, wherein each highlight file comprises a media file that satisfies a predefined quality standard, and sending instructions for presenting the one or more highlight files to the client system.
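The flow in the abstract (points of interest, then media files, then highlight files that meet a quality standard) can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the patent: the `PointOfInterest` and `MediaFile` types, the `record` callback, the numeric `quality` score, and the `min_quality` threshold are assumptions made only to show the filtering step.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class PointOfInterest:
    label: str  # hypothetical label, e.g. "person" or "dog"


@dataclass(frozen=True)
class MediaFile:
    poi: PointOfInterest
    quality: float  # hypothetical quality score in [0, 1]


def generate_highlights(
    points: Iterable[PointOfInterest],
    record: Callable[[PointOfInterest], MediaFile],
    min_quality: float = 0.7,  # assumed stand-in for the "predefined quality standard"
) -> List[MediaFile]:
    """Record one media file per point of interest and keep those meeting the threshold."""
    media_files = [record(p) for p in points]
    return [m for m in media_files if m.quality >= min_quality]


# Usage with a stub recorder that looks up a made-up quality score per label.
fake_quality = {"person": 0.9, "tree": 0.4}
points = [PointOfInterest("person"), PointOfInterest("tree")]
highlights = generate_highlights(
    points, lambda p: MediaFile(p, fake_quality[p.label])
)
```

In this sketch the quality standard is a single threshold on one score; the claims leave the actual standard open.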
Opening claim text (preview).
What is claimed is:
1. A method comprising, by one or more computing systems: accessing sensory data captured by one or more cameras associated with a client system; determining, based on one or more machine-learning models and the sensory data captured by the one or more cameras, one or more points of interest in a field of view of the one or more cameras; generating, based on the one or more points of interest, a plurality of media files, wherein each media file is associated with a recording of at least one of the one or more points of interest; generating, based on the plurality of media files, one or more highlight files, wherein each highlight file comprises a media file that satisfies a predefined quality standard, and wherein each highlight file is associated with a respective captioning; and sending, to the client system, instructions for presenting the one or more highlight files.
2. The method of claim 1, wherein the sensory data is based on one or more of textual signals, visual signals, or audio signals.
3. The method of claim 1, further comprising: receiving, from the client system, a user input based on one or more of a text input, an audio input, an image input, a video input, an eye gaze, a gesture, or a motion, wherein determining the one or more points of interest in the field of view of the one or more cameras is responsive to the user input.
4. The method of claim 1, wherein each of the plurality of media files comprises one or more of an image or a video clip.
5. The method of claim 1, wherein determining the points of interest comprises: detecting one or more people in the field of view; and determining, based on one or more facial recognition algorithms, one or more identifiers of one or more of the detected people.
6. The method of claim 5, wherein determining the points of interest is based on a measure of interestingness of one or more of the detected people, wherein the measure of interestingness is determined by the one or more machine-learning models based on one or more of the identifiers.
7. The method of claim 1, wherein determining the points of interest comprises: detecting one or more people in the field of view; and determining one or more facial expressions of one or more of the detected people.
8. The method of claim 7, wherein determining the points of interest is based on a measure of interestingness of one or more of the detected people, wherein the measure of interestingness is determined by the one or more machine-learning models based on one or more of the facial expressions.
9. The method of claim 1, wherein determining the points of interest comprises: detecting one or more objects in the field of view.
10. The method of claim 9, wherein determining the points of interest is based on a measure of interestingness of one or more of the detected objects, wherein the measure of interestingness is determined by the one or more machine-learning models based on one or more of the detected objects.
11. The method of claim 1, wherein determining the points of interest is based on eye gaze data of the user captured by the client system.
12. The method of claim 1, wherein the predefined quality standard is based on one or more of blurriness, lighting, or vividness of color.
13. The method of claim 1, further comprising: receiving, from the client system, a user query from the user in response to the highlight files; accessing a plurality of episodic memories associated with the user; identifying one or more episodic memories of the accessed episodic memories as related to the user query; retrieving one or more media files corresponding to the identified episodic memories, wherein each media file comprises one or more of a post, a comment, an image, or a video clip; and sending, to the client system, instructions for presenting the one or more media files corresponding to the identified episodic memories.
14. The method of claim 1, further comprising: sending, to the client system, instructions for zooming in one or more of the cameras to position one or more of the points of interest in a center of the field of view.
15. The method of claim 1, further comprising: sending, to the client system, instructions for zooming out one or more of the cameras to position one or more of the points of interest in a center of the field of view.
16. The method of claim 1, wherein the highlight files are personalized for the user based on one or more of: user profile data associated with the user; user preferences associated with the user; prior user inputs by the user; or user relationships with other users in a social graph.
17. The method of claim 1, further comprising: receiving, from the client system, a user request from the user to share one or more of the highlight files with one or more other users; and sending, to one or more other client systems associated with the one or more other users, respectively, instructions for presenting the shared highlight files.
18. The method of claim 1, further comprising: detecting a movement of the client system; and applying one or more visual stabilization algorithms to the sensory data captured by the one or more cameras.
19. One or more computer-readable non-transitory storage media embodying software that is operable when executed to: access sensory data captured by one or more cameras associated with a client system; determine, based on one or more machine-learning models and the sensory data captured by the one or more cameras, one or more points of interest in a field of view of the one or more cameras; generate, based on the one or more points of interest, a plurality of media files, wherein each media file is associated with a recording of at least one of the one or more points of interest; generate, based on the plurality of media files, one or more highlight files, wherein each highlight file comprises a media file that satisfies a predefined quality standard, and wherein each highlight file is associated with a respective captioning; and send, to the client system, instructions for presenting the one or more highlight files.
20. A system comprising: one or more processors; and a non-transitory memory coupled to the processors comprising instructions executable by the processors, the processors operable when executing the instructions to: access sensory data captured by one or more cameras associated with a client system; determine, based on one or more machine-learning models and the sensory data captured by the one or more cameras, one or more points of interest in a field of view of the one or more cameras; generate, based on the one or more points of interest, a plurality of media files, wherein each media file is associated with a recording of at least one of the one or more points of interest; generate, based on the plurality of media files, one or more highlight files, wherein each highlight file comprises a media file that satisfies a predefined quality standard, and wherein each highlight file is associated with a respective captioning; and send, to the client system, instructions for presenting the one or more highlight files.
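Claim 12 names blurriness, lighting, and vividness of color as possible bases for the quality standard but does not say how they are measured. The sketch below is one assumed way to score them on a small RGB image held as nested lists: variance of a 4-neighbour Laplacian as a sharpness proxy, mean luma for lighting, and mean HSV-style saturation for vividness. The function names and all thresholds are hypothetical and are not drawn from the patent.

```python
from statistics import mean, pvariance
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # 8-bit RGB
Image = List[List[Pixel]]


def luminance(p: Pixel) -> float:
    r, g, b = p
    return 0.299 * r + 0.587 * g + 0.114 * b  # Rec. 601 luma weights


def sharpness(img: Image) -> float:
    """Variance of a 4-neighbour Laplacian over interior pixels; low values suggest blur."""
    gray = [[luminance(p) for p in row] for row in img]
    h, w = len(gray), len(gray[0])
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1]
        - 4 * gray[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    return pvariance(responses) if responses else 0.0


def brightness(img: Image) -> float:
    return mean(luminance(p) for row in img for p in row)


def saturation(p: Pixel) -> float:
    hi, lo = max(p), min(p)
    return 0.0 if hi == 0 else (hi - lo) / hi


def vividness(img: Image) -> float:
    return mean(saturation(p) for row in img for p in row)


def meets_quality_standard(
    img: Image,
    min_sharpness: float = 100.0,        # assumed threshold
    brightness_range=(40.0, 220.0),      # assumed acceptable luma range
    min_vividness: float = 0.2,          # assumed threshold
) -> bool:
    lo, hi = brightness_range
    return (
        sharpness(img) >= min_sharpness
        and lo <= brightness(img) <= hi
        and vividness(img) >= min_vividness
    )
```

A uniformly colored image has zero Laplacian variance and fails the sharpness check, while a high-contrast checkerboard of saturated colors passes all three. A production system would more likely use an image library and tuned or learned thresholds.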
using neural networks · CPC title
using classification, e.g. of video objects · CPC title
Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items (segmenting video sequences G06V20/49) · CPC title
Facial expression recognition · CPC title
Semantic analysis · CPC title