Emojicon puppeting

US2018336714A1 · US · A1

Patent metadata
Publication number: US-2018336714-A1
Application number: US-201715809875-A
Country: US
Kind code: A1
Filing date: Nov 10, 2017
Priority date: May 16, 2017
Publication date: Nov 22, 2018
Grant date: (none shown)

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods for generating a video of an emoji that has been puppeted using inputs from image, depth, and audio. The inputs can capture facial expressions of a user, eye, eyebrow, mouth, and head movements. A pose, held by the user, can be detected that can be used to generate supplemental animation. The emoji can further be animated using physical properties associated with the emoji and captured movements. An emoji of a dog can have its ears move in response to an up-and-down movement, or a shaking of the head. The video can be sent in a message to one or more recipients. A sending device can render the puppeted video in accordance with hardware and software capabilities of a recipient's computer device.
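
The abstract's dog-ear example (ears that move when the head bobs or shakes) can be illustrated with a damped spring. The following is only a minimal sketch under stated assumptions: the `EarSpring` class, its `stiffness` and `damping` values, and the 60 Hz time step are hypothetical and are not taken from the patent, which does not specify how the physical properties are modeled.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class EarSpring:
    """Hypothetical secondary-motion model: one emoji ear as a damped spring."""

    stiffness: float = 40.0  # assumed constant pulling the ear back to rest
    damping: float = 6.0     # assumed constant that makes the wobble die out
    angle: float = 0.0       # current ear deflection from its rest pose
    velocity: float = 0.0    # current angular velocity of the ear

    def step(self, head_accel: float, dt: float) -> float:
        """Advance one time step given the captured head acceleration."""
        # The ear lags behind the head, so head acceleration pushes it
        # the opposite way; the spring and damper pull it back to rest.
        accel = (-self.stiffness * self.angle
                 - self.damping * self.velocity
                 - head_accel)
        # Semi-implicit Euler: update velocity first, then position.
        self.velocity += accel * dt
        self.angle += self.velocity * dt
        return self.angle


def simulate(head_accels: Iterable[float], dt: float = 1.0 / 60.0) -> List[float]:
    """Return the ear deflection for each frame of captured head motion."""
    ear = EarSpring()
    return [ear.step(a, dt) for a in head_accels]


if __name__ == "__main__":
    # One sharp head movement followed by stillness: the ear swings, then settles.
    angles = simulate([10.0] + [0.0] * 599)
    print(angles[0], angles[-1])
```

In this sketch a still head leaves the ear at rest, while a single jolt makes it swing away and then settle, which is the qualitative behavior the abstract describes.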

First claim

Opening claim text (preview).

1. A computer-implemented method practiced on a computing device comprising an image sensor and a depth sensor, the method comprising: receiving, using the depth sensor, a plurality of frames of depth information representing a head of a person that is changing with respect to time; receiving a plurality of frames of image information representing the head of the person; generating a video of an emoji in accordance with the plurality of frames of depth information and image information, wherein the image information and the depth information capture one or more facial expressions and movements of the head of the person, and wherein the generating comprises, for each of the one or more facial expressions, determining, using a machine learning model, an amount of difference between an expression-neutral base mesh of the head of the person and a facial expression base mesh of the head of the person corresponding to the image information and the depth information, and activating a corresponding facial expression of the emoji in accordance with the amount of difference; and transmitting the video of the emoji to one or more recipient computing devices.

2. The method of claim 1, wherein the generating comprises generating one or more versions of the video of the emoji in accordance with the plurality of frames of depth information and image information, the one or more versions corresponding to respective device capabilities of one or more recipient computing devices, and wherein the transmitting comprises transmitting the one or more versions of the video of the emoji to the one or more recipient computing devices based on at least one of the respective device capabilities or device capabilities of the computing device.

3. The method of claim 1, wherein the transmitting comprises: transmitting a metadata tag that enables the one or more recipient computing devices to loop playback of the video of the emoji multiple times.

4-5. (canceled)

6. The method of claim 1, wherein the plurality of frames of depth information and the plurality of frames of image information are synchronized, wherein the synchronization comprises aligning the plurality of frames of image information and depth information in time such that a frame of image information and a frame of depth information that are aligned in time comprise a key frame, and one or more key frames are interleaved by one or more image information frames.

7. The method of claim 1, wherein the video of the emoji is transmitted through a messaging system that includes one or more identity servers and one or more message servers.

8. The method of claim 2, further comprising: transmitting, to a message service, a request to send the video of the emoji to the one or more recipient computing devices; and receiving, from the message service, a request to generate the one or more versions of the video of the emoji.

9. At least one non-transitory computer readable medium programmed with instructions that, when executed by a processing system coupled to an image sensor and a depth sensor, perform operations, comprising: receiving, using the depth sensor, a plurality of frames of depth information representing a head of a person that is changing with respect to time; receiving a plurality of frames of image information representing the head of the person; generating a video of an emoji in accordance with the plurality of frames of depth information and image information wherein the image information and the depth information capture one or more facial expressions and movements of the head of the person, and wherein the generating comprises, for each of the one or more facial expressions, determining, using a machine learning model, an amount of difference between an expression-neutral base mesh of the head of the person and a facial expression base mesh of the head of the person corresponding to the image information and the depth information, and activating a corresponding facial expression of the emoji in accordance with the amount of difference; and transmitting the video of the emoji to one or more recipient computing devices.

10. The medium of claim 9, wherein the generating comprises generating two versions of the video of the emoji in accordance with the plurality of frames of depth information and image information, the two versions corresponding to respective device capabilities of two recipient computing devices, and wherein the transmitting comprises transmitting the two versions of the video of the emoji to two recipient computing devices based on the respective device capabilities.

11. The medium of claim 9, further comprising: generating a rich link for a first of the one or more recipient computing devices based on device capabilities of the first recipient computing device, the rich link referring to a storage location of the video of the emoji.

12-13. (canceled)

14. The medium of claim 9, wherein the plurality of frames of depth information and the plurality of frames of image information are synchronized, wherein the synchronization comprises aligning the plurality of frames of image information and depth information in time such that a frame of image information and a frame of depth information that are aligned in time comprise a key frame, and one or more key frames are interleaved by one or more image information frames.

15. The medium of claim 9, wherein the video of the emoji is transmitted through a messaging system that includes one or more identity servers and one or more message servers.

16. The medium of claim 9, the operations further comprising: receiving a plurality of frames of audio information associated with the head of the person; and aligning the plurality of frames of audio information in time with the plurality of frames of image information and depth information wherein the generating the video of the emoji comprises adding audio based on the plurality of audio frames.

17. A system comprising: a processing system comprising a depth sensor and an image sensor, the processing system coupled to a memory programmed with executable instructions that, when executed by the processing system perform operations, the operations comprising: receiving, using the depth sensor, a plurality of frames of depth information representing a head of a person that is changing with respect to time; receiving a plurality of frames of image information representing the head of the person, generating a video of an emoji based on the plurality of frames of depth information and image information, wherein the image information and the depth information capture one or more facial expressions and movements of the head of the person, and wherein the generating comprises, for each of the one or more facial expressions, determining, using a machine learning model, an amount of difference between an expression-neutral base mesh of the head of the person and a facial expression base mesh of the head of the person corresponding to the image information and the depth information, and activating a corresponding facial expression of the emoji in accordance with the amount of difference; and transmitting the video of the emoji to one or more recipient computing devices.

18. The system of claim 17, the operations further comprising: determining, based on the plurality of frames of depth information and image information, that a particular facial expression was held for a predetermined period of time; and adding supplemental graphics to the video of the emoji based on the det
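
The independent claims turn an "amount of difference" between an expression-neutral mesh and an expression mesh into an activation of the matching emoji expression. The claims assign that measurement to a machine learning model that the page does not describe, so the sketch below swaps in a plain geometric projection purely for illustration. The function names, the mesh representation (a list of `(x, y, z)` vertices), and the clamping to the range 0 to 1 are all assumptions, not details from the patent.

```python
from typing import Dict, List, Sequence, Tuple

Vertex = Tuple[float, float, float]
Mesh = Sequence[Vertex]  # assumed representation: vertices in a fixed order


def _offsets(start: Mesh, end: Mesh) -> List[float]:
    """Flattened per-coordinate offsets from `start` to `end`."""
    return [e - s for vs, ve in zip(start, end) for s, e in zip(vs, ve)]


def activation(neutral: Mesh, full_expression: Mesh, observed: Mesh) -> float:
    """Estimate how strongly one expression is present in the observed mesh.

    Stand-in for the claimed machine learning model: project the observed
    displacement onto the displacement of the fully expressed mesh.
    """
    target = _offsets(neutral, full_expression)
    seen = _offsets(neutral, observed)
    denom = sum(t * t for t in target)
    if denom == 0.0:
        return 0.0  # expression mesh is identical to neutral; nothing to measure
    weight = sum(s * t for s, t in zip(seen, target)) / denom
    return max(0.0, min(1.0, weight))  # clamp to a usable activation range


def drive_emoji(neutral: Mesh, expressions: Dict[str, Mesh],
                observed: Mesh) -> Dict[str, float]:
    """Compute one activation per named emoji expression."""
    return {name: activation(neutral, mesh, observed)
            for name, mesh in expressions.items()}


if __name__ == "__main__":
    neutral = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    smile = [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]       # hypothetical full smile
    half_smile = [(0.0, 0.5, 0.0), (1.0, 0.5, 0.0)]  # hypothetical captured face
    print(drive_emoji(neutral, {"smile": smile}, half_smile))
```

The returned weights would then drive the corresponding expressions of the emoji, frame by frame. A real system would presumably handle overlapping expressions and head pose, which this projection ignores.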

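Claims 6 and 14 describe synchronization in which an image frame and a depth frame that line up in time form a key frame, with image-only frames in between. A minimal sketch of that pairing follows, assuming timestamps in seconds, sorted input, and a hypothetical matching tolerance. The `SyncedFrame` type, the `synchronize` function, and the 60 Hz and 15 Hz capture rates in the usage example are illustrative assumptions rather than values from the patent.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class SyncedFrame:
    timestamp: float            # capture time of the image frame, in seconds
    image_index: int            # position in the image-frame sequence
    depth_index: Optional[int]  # matching depth frame, or None if image-only

    @property
    def is_key_frame(self) -> bool:
        """A key frame pairs an image frame with a time-aligned depth frame."""
        return self.depth_index is not None


def synchronize(image_ts: Sequence[float], depth_ts: Sequence[float],
                tolerance: float = 0.004) -> List[SyncedFrame]:
    """Pair each image frame with a depth frame captured at about the same time.

    Both timestamp sequences are assumed to be sorted in ascending order.
    """
    frames: List[SyncedFrame] = []
    d = 0
    for i, t in enumerate(image_ts):
        # Skip depth frames that are already too old to match this image frame.
        while d < len(depth_ts) and depth_ts[d] < t - tolerance:
            d += 1
        if d < len(depth_ts) and abs(depth_ts[d] - t) <= tolerance:
            frames.append(SyncedFrame(t, i, d))  # aligned pair: key frame
            d += 1
        else:
            frames.append(SyncedFrame(t, i, None))  # image-only frame
    return frames


if __name__ == "__main__":
    image_ts = [i / 60 for i in range(8)]  # assumed 60 Hz image capture
    depth_ts = [i / 15 for i in range(2)]  # assumed 15 Hz depth capture
    print([f.is_key_frame for f in synchronize(image_ts, depth_ts)])
```

With these assumed rates, every fourth image frame becomes a key frame and the three frames between key frames carry image information only.
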
Assignees

Apple Inc

Inventors

Classifications

  • Physics · mapped topic

  • Constructional details of the terminal equipment, e.g. arrangements of the camera and the display · CPC title

  • Handheld terminals · CPC title

  • Communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals (selecting H04Q) · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2018336714A1 cover?
Systems and methods for generating a video of an emoji that has been puppeted using inputs from image, depth, and audio. The inputs can capture facial expressions of a user, eye, eyebrow, mouth, and head movements. A pose, held by the user, can be detected that can be used to generate supplemental animation. The emoji can further be animated using physical properties associated with the emoji a…
Who is the assignee on this patent?
Apple Inc
What technology area does this patent fall under?
Primary CPC classification G06T13/40. Mapped technology areas include Physics.
When was this patent published?
Publication date: Nov 22, 2018 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).