US10863159B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10863159-B2 |
| Application number | US-201816477152-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 11, 2018 |
| Priority date | Jan 20, 2017 |
| Publication date | Dec 8, 2020 |
| Grant date | Dec 8, 2020 |
Systems and methods are described that provide for sending a first temporal portion of video to a client, wherein video data for a first spatial region associated with a first direction of view is sent with higher quality than video data for another spatial region not associated with the first direction of view; sending a second temporal portion of the video to the client; and, responsive to determining that a significant event (audio, video, or a combination) occurred during the second temporal portion and corresponds with a second direction of view that is associated with a second spatial region, sending higher quality video for the second spatial region than video data for another spatial region. Virtual reality (VR) video, including 360-degree VR video, may thus be foveated based on contextual information in the video data and corresponding audio data, combined with field of view predictions based upon user motion.
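The per-segment quality decision described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the 360-degree frame is split into equal-width tiles by yaw angle, and the names `tile_index_for_yaw` and `assign_tile_quality` are hypothetical. Keeping the viewer's current tile at high quality when an event occurs elsewhere is also an assumption; the abstract only requires that the event's region be sent at higher quality than some other region.

```python
from typing import List, Optional


def tile_index_for_yaw(yaw_deg: float, num_tiles: int) -> int:
    """Map a yaw angle in degrees to the equal-width tile that contains it."""
    tile_width = 360.0 / num_tiles
    return int((yaw_deg % 360.0) // tile_width)


def assign_tile_quality(
    num_tiles: int,
    view_yaw_deg: float,
    event_yaw_deg: Optional[float] = None,
) -> List[str]:
    """Return a quality label per tile for one temporal segment.

    The tile in the viewer's direction of view is sent at high quality.
    If a significant event was detected for this segment, the tile in the
    event's direction is also raised to high quality (assumption: both
    tiles stay high; all remaining tiles are low).
    """
    quality = ["low"] * num_tiles
    quality[tile_index_for_yaw(view_yaw_deg, num_tiles)] = "high"
    if event_yaw_deg is not None:
        quality[tile_index_for_yaw(event_yaw_deg, num_tiles)] = "high"
    return quality


# First segment: the viewer looks toward 10 degrees, no event detected.
first_segment = assign_tile_quality(8, view_yaw_deg=10.0)

# Second segment: a significant audio event is located at 200 degrees.
second_segment = assign_tile_quality(8, view_yaw_deg=10.0, event_yaw_deg=200.0)
```

With eight tiles, each tile covers 45 degrees, so the view direction falls in tile 0 and the event direction in tile 4.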
What is claimed:
1. A method of delivering video, wherein the video includes a plurality of temporal segments, each temporal segment having a plurality of spatial regions, the method comprising: sending a first temporal segment of video to a client, wherein a first spatial region in the first temporal segment associated with a first direction of view is sent with a higher video quality than a second spatial region in the first temporal segment that is not associated with the first direction of view; based on audio data associated with the video, determining that a first significant audio event has occurred during a second temporal segment that is subsequent to the first temporal segment, the first significant audio event occurring in a second direction of view; and sending the second temporal segment of video to the client, wherein responsive to determining that the first significant event has occurred during the second temporal segment, a third spatial region associated with the second direction of view in the second temporal segment is sent with a higher video quality than a fourth spatial region that is not associated with the second direction of view in the second temporal segment.
2. The method of claim 1 wherein the higher video quality comprises higher resolution.
3. The method of claim 1 wherein the higher video quality comprises at least one selected from the list consisting of: higher bit rate, higher frame rate, and smaller quantization parameter (QP).
4. The method of claim 1 wherein determining comprises determining on a server that is sending the first and second temporal video segments.
5. The method of claim 1 wherein determining comprises determining on the client and wherein the method further comprises: sending, from the client to a server that sent the first second temporal video segment, a request indicating the third spatial region to have higher video quality.
6. The method of claim 1 wherein the first direction of view is determined based upon information from the client regarding a user's direction of view.
7. The method of claim 1, further comprising calculating a contextual weight of the first significant event, wherein the contextual weight includes an audio contextual weight.
8. The method of claim 1, further comprising: calculating audio contextual weights for a plurality of tiles in the second temporal segment, including the first significant event; calculating video contextual weights for the plurality of tiles in the second temporal segment, including a second significant event; calculating a predicted focal region based, at least in part, on the audio contextual weights and the video contextual weights; determining a field of view (FOV) region based on the predicted focal region; and selecting one or more tiles from the plurality of tiles to encode at a higher quality based, at least in part, on the audio contextual weights and the video contextual weights.
9. The method of claim 8, further comprising: calculating a user contextual weight is based on a model of a user's physical head movements, wherein calculating a predicted focal region is further based on the user contextual weight, and selecting one or more tiles from the plurality of tiles to encode at a higher quality is further based on the user contextual weight.
10. A system comprising: a processor; and a non-transitory computer-readable medium storing instruction that are operative, if executed on the processor, to perform the functions of: sending a first temporal segment of video to a client, wherein a first spatial region in the first temporal segment associated with a first direction of view is sent with a higher video quality than a second spatial region in the first temporal segment that is not associated with the first direction of view; based on audio data associated with the video, determining that a first significant audio event has occurred during a second temporal segment that is subsequent to the first temporal segment, the first significant audio event occurring in a second direction of view; and sending the second temporal segment of video to the client, wherein responsive to determining that the first significant audio event has occurred during the second temporal segment, a third spatial region associated with the second direction of view in the second temporal segment is sent with a higher video quality than a fourth spatial region that is not associated with the second direction of view in the second temporal segment.
11. The system of claim 10 wherein the higher video quality comprises at least one selected from the list consisting of: higher resolution, higher bit rate, higher frame rate, and smaller quantization parameter (QP).
12. The system of claim 10 wherein the instruction are further operative to perform the functions of: calculating audio contextual weights for a plurality of tiles in the second temporal segment, including the first significant event; calculating video contextual weights for the plurality of tiles in the second temporal segment, including a second significant event; calculating a predicted focal region based, at least in part, on the audio contextual weights and the video contextual weights; determining a field of view (FOV) region based on the predicted focal region; and selecting one or more tiles from the plurality of tiles to encode at a higher quality based, at least in part, on the audio contextual weights and the video contextual weights.
13. The method of claim 1, further comprising determining an audio contextual weight for the third spatial region according to a location of an audio source of the first significant audio event, wherein the audio contextual weight for the third spatial region of the second temporal segment is calculated as a value that is proportional to a loudness of the audio source within the third spatial region of the second temporal segment.
14. The method of claim 1, wherein a video contextual weight is derived for the third spatial region of the second temporal segment according to visual objects within the third spatial region of the second temporal segment.
15. The system of claim 10, further comprising determining an audio contextual weight for the third spatial region according to a location of an audio source of the first significant audio event, wherein the audio contextual weight for the third spatial region of the second temporal segment is calculated as a value that is proportional to a loudness of the audio source within the third spatial region of the second temporal segment, and determining a video contextual weight for the third spatial region of the second temporal segment according to visual objects within the third spatial region of the second temporal segment.
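The tile-selection steps recited in claims 8 and 13 to 14 can be sketched roughly as below. Everything beyond the claim language is an assumption made for illustration: the function names, the `gain` constant, summing per-object scores as the video weight, the linear blend controlled by `audio_share`, taking the highest-weighted tile as the predicted focal region, and modelling the FOV region as that tile plus its wrap-around neighbours. The claims do not specify any of these formulas.

```python
from typing import Dict, List, Tuple


def audio_weights(
    loudness_by_tile: Dict[int, float], num_tiles: int, gain: float = 1.0
) -> List[float]:
    """Audio contextual weight per tile, proportional to source loudness in that tile."""
    return [gain * loudness_by_tile.get(t, 0.0) for t in range(num_tiles)]


def video_weights(
    object_scores_by_tile: Dict[int, List[float]], num_tiles: int
) -> List[float]:
    """Video contextual weight per tile, here the sum of scores of visual objects in it."""
    return [sum(object_scores_by_tile.get(t, [])) for t in range(num_tiles)]


def select_high_quality_tiles(
    audio: List[float],
    video: List[float],
    k: int,
    fov_half_width: int = 1,
    audio_share: float = 0.5,
) -> Tuple[int, List[int]]:
    """Return (predicted focal tile, tiles to encode at higher quality)."""
    n = len(audio)
    # Blend the two contextual weights (assumed linear combination).
    combined = [audio_share * a + (1.0 - audio_share) * v for a, v in zip(audio, video)]
    # Predicted focal region: the tile with the largest combined weight.
    focal_tile = max(range(n), key=combined.__getitem__)
    # FOV region: the focal tile and its neighbours, wrapping around 360 degrees.
    fov_region = [(focal_tile + d) % n for d in range(-fov_half_width, fov_half_width + 1)]
    # Within the FOV region, keep the k tiles with the highest combined weight.
    ranked = sorted(fov_region, key=lambda t: (-combined[t], t))
    return focal_tile, sorted(ranked[:k])


# Hypothetical inputs: one loud source in tile 5, detected objects in tiles 1, 5 and 6.
audio = audio_weights({5: 0.8}, num_tiles=8)
video = video_weights({1: [0.3], 5: [0.2, 0.1], 6: [0.4]}, num_tiles=8)
focal_tile, selected = select_high_quality_tiles(audio, video, k=2)
```

A user contextual weight derived from a head-movement model, as in claim 9, could be blended in as a third term in the same way.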
Virtual reality · CPC title
Arrangements for interaction with the human body, e.g. for user immersion in virtual reality (blind teaching G09B21/00) · CPC title
comprising a device modifying the resolution of the displayed image · CPC title
with head-mounted left-right displays · CPC title
involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements {(video transcoding H04N19/40; media packet handling at the source H04L65/762)} · CPC title