Speech prosody prediction in video games

US12296265B1 · US · B1

Patent metadata
Publication number: US-12296265-B1
Application number: US-202418407686-A
Country: US
Kind code: B1
Filing date: Jan 9, 2024
Priority date: Nov 20, 2020
Publication date: May 13, 2025
Grant date: May 13, 2025

Abstract

Official abstract text for this publication.

This specification describes a computer-implemented method of generating context-dependent speech audio in a video game. The method comprises obtaining contextual information relating to a state of the video game. The contextual information is inputted into a prosody prediction module. The prosody prediction module comprises a trained machine learning model which is configured to generate predicted prosodic features based on the contextual information. Input data comprising the predicted prosodic features and speech content data associated with the state of the video game is inputted into a speech audio generation module. An encoded representation of the speech content data dependent on the predicted prosodic features is generated using one or more encoders of the speech audio generation module. Context-dependent speech audio is generated, based on the encoded representation, using a decoder of the speech audio generation module.
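The data flow in the abstract (contextual information, then predicted prosodic features, then an encoded representation, then decoded audio) can be illustrated with a minimal sketch. Everything here is a placeholder chosen for illustration: the `Prosody` fields, the `excitement` context key, and the hand-written rule in `predict_prosody` are assumptions, and the rule merely stands in for the trained machine learning model the abstract describes. The toy encoder and decoder show only how prosody conditions the output, not how a real synthesiser works.

```python
from dataclasses import dataclass
from typing import Dict, List
import math


@dataclass
class Prosody:
    """Hypothetical prosodic features; the abstract does not enumerate them."""
    pitch_hz: float
    energy: float
    rate: float  # speaking-rate multiplier


def predict_prosody(context: Dict[str, float]) -> Prosody:
    """Stand-in for the trained prosody prediction model (assumption).

    A real system would run a learned model; this rule only illustrates
    the mapping from game context to prosodic features.
    """
    excitement = min(1.0, max(0.0, context.get("excitement", 0.0)))
    return Prosody(
        pitch_hz=110.0 + 60.0 * excitement,
        energy=0.5 + 0.5 * excitement,
        rate=1.0 + 0.3 * excitement,
    )


def encode(text: str, prosody: Prosody) -> List[List[float]]:
    """Toy encoder: one vector per character, dependent on the prosody."""
    return [
        [ord(ch) / 128.0, prosody.pitch_hz, prosody.energy, prosody.rate]
        for ch in text
    ]


def decode(encoded: List[List[float]], sample_rate: int = 8000) -> List[float]:
    """Toy decoder: renders each encoded step as a short sine burst."""
    samples: List[float] = []
    for _, pitch_hz, energy, rate in encoded:
        # A faster speaking rate yields fewer samples per step.
        n = max(1, int(0.05 * sample_rate / rate))
        samples.extend(
            energy * math.sin(2.0 * math.pi * pitch_hz * i / sample_rate)
            for i in range(n)
        )
    return samples


def generate_speech(context: Dict[str, float], text: str) -> List[float]:
    """Context -> prosody -> encoded representation -> audio samples."""
    prosody = predict_prosody(context)
    return decode(encode(text, prosody))


calm = generate_speech({"excitement": 0.0}, "Goal!")
excited = generate_speech({"excitement": 1.0}, "Goal!")
```

With the same speech content, the excited context produces a shorter and louder signal than the calm one, which is the sense in which the audio is context-dependent in this sketch.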

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer-implemented method of generating context-dependent speech audio in a video game, the method comprising: enabling, by at least one processor of a computing device, gameplay of the video game; determining, by a video game engine of the video game on the at least one processor, an in-game event for which context-dependent speech audio is to be generated during the gameplay of the video game, wherein the in-game event includes an action performed by a character of the video game; obtaining, by the video game engine of the video game, contextual information and speech content data relating to a current state of the gameplay; requesting, by the video game engine of the video game, the context-dependent speech audio from a speech audio generator of the video game; generating, by the speech audio generator responsive to the request, the context-dependent speech audio by: inputting the contextual information relating to the current state of the gameplay into a prosody prediction model, wherein the prosody prediction model comprises a trained machine learning model which is configured to generate predicted prosodic features based on the contextual information; generating, by the prosody prediction model, predicted prosodic features from the input contextual information; inputting, into a speech audio generation model, input data comprising: at least the predicted prosodic features; and the speech content data relating to the current state of the gameplay; generating, using one or more encoders of the speech audio generation model, an encoded representation of the speech content data dependent on the predicted prosodic features; decoding, using a decoder of the speech audio generation model, the encoded representation to generate the context-dependent speech audio; and causing, by the video game engine of the video game, the context-dependent speech audio that matches the current state of the video game to be played among the gameplay of the in-game event.

2. The computer-implemented method of claim 1, wherein the one or more encoders comprise a prosody encoder configured to generate an encoded representation of the predicted prosodic features, and a speech content encoder configured to generate the encoded representation of the speech content data based on the encoded representation of the predicted prosodic features.

3. The computer-implemented method of claim 1, wherein the video game is a sports video game, wherein obtaining the contextual information relating to the current state of the video game comprises determining contextual information relating to an in-progress match of the sports video game.

4. The computer-implemented method of claim 3, wherein the contextual information relating to the in-progress match of the sports video game comprises determining one or more of: statistics relating to one or more teams playing in the match; statistics relating to one or more players playing in the match; statistics relating to a current status of the match; and the type of sport being played in the match.

5. The computer-implemented method of claim 1, wherein the contextual information includes the speech content data associated with the current state of the video game.

6. The computer-implemented method of claim 1, wherein the input data further comprises speaker identifier data for a speaker of the generated speech audio.

7. A non-transitory computer-readable medium containing instructions, which when executed by one or more processors, causes the one or more processors to perform a method comprising: enabling, by at least one processor of a computing device, gameplay of a video game; determining, by a video game engine of the video game on the at least one processor, an in-game event for which context-dependent speech audio is to be generated during the gameplay of the video game, wherein the in-game event includes an action performed by a character of the video game; obtaining, by the video game engine of the video game, contextual information and speech content data relating to a current state of the gameplay; requesting, by the video game engine of the video game, the context-dependent speech audio from a speech audio generator of the video game; generating, by the speech audio generator responsive to the request, the context-dependent speech audio by: inputting the contextual information relating to the current state of the gameplay into a prosody prediction model, wherein the prosody prediction model comprises a trained machine learning model which is configured to generate predicted prosodic features based on the contextual information; generating, by the prosody prediction model, predicted prosodic features from the input contextual information; inputting, into a speech audio generation model, input data comprising: at least the predicted prosodic features; and the speech content data relating to the current state of the gameplay; generating, using one or more encoders of the speech audio generation model, an encoded representation of the speech content data dependent on the predicted prosodic features; decoding, using a decoder of the speech audio generation model, the encoded representation to generate the context-dependent speech audio; and causing, by the video game engine of the video game, the context-dependent speech audio that matches the current state of the video game to be played among the gameplay of the in-game event.

8. The non-transitory computer-readable medium of claim 7, wherein the speech audio generation model includes a synthesizer.

9. The non-transitory computer-readable medium of claim 8, wherein the speech content data comprises a plurality of speech content segments at a plurality of respective time steps and wherein inputting, into the speech audio generation model, the input data comprising the predicted prosodic features and the speech content data comprises generating, as output of a speech content encoder of the synthesizer, a speech content encoding for each time step of one or more time steps of the speech content data.

10. The non-transitory computer-readable medium of claim 9, wherein generating predicted prosodic features comprises generating predicted prosodic features for each time step of the one or more time steps of the speech content data.

11. The non-transitory computer-readable medium of claim 10, wherein inputting, into the speech audio generator, the input data comprising the predicted prosodic features and the speech content data comprises combining, for each time step of the one or more time steps, the speech content encoding and the predicted prosodic features of the time step.

12. A computer-implemented method of generating context-dependent speech audio in a video game, the method comprising: enabling, by at least one processor of a computing device, gameplay of the video game comprising requesting, by the at least one processor of the computing device, video game content from a video game server while a user is playing the video game; determining, by a video game engine of the video game on the at least one processor, an in-game event for which context-dependent speech audio is to be generated during the gameplay of the video game; obtaining, by the video game engine of the video game, contextual information and speech content data relating to a current state of the gameplay; requesting, by the video game engine of the video game, the context-dependent speech audio from a speech audio generator of the video game; generating, by the speech audio generator responsive to the request, the context-dependent speech audio based upon processing
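Claims 9 to 11 describe a per-time-step arrangement: a speech content encoding for each time step, predicted prosodic features for each time step, and a combination of the two at every step. The following is a rough sketch of that idea only. The claims do not say how the combination is performed, so concatenation is an assumption here, and the function names, vector sizes, segment labels, and the rising-pitch rule are all hypothetical stand-ins for trained components.

```python
from typing import List, Sequence

Vector = List[float]


def encode_content(segments: Sequence[str], dim: int = 4) -> List[Vector]:
    """Toy speech content encoder: one fixed-size encoding per time step."""
    encodings: List[Vector] = []
    for seg in segments:
        code = sum(ord(c) for c in seg)
        # Use the low bits of a character sum as a crude, deterministic encoding.
        encodings.append([float((code >> k) & 1) for k in range(dim)])
    return encodings


def predict_prosody_per_step(context: Vector, n_steps: int) -> List[Vector]:
    """Placeholder for the trained model: [pitch, energy] for each time step."""
    level = sum(context) / len(context) if context else 0.0
    # Hypothetical contour: pitch rises linearly toward the end of the utterance.
    return [[level * (t + 1) / n_steps, level] for t in range(n_steps)]


def combine(content: List[Vector], prosody: List[Vector]) -> List[Vector]:
    """Combine per time step; concatenation is one possible choice (assumption)."""
    if len(content) != len(prosody):
        raise ValueError("content and prosody must cover the same time steps")
    return [c + p for c, p in zip(content, prosody)]


# Hypothetical inputs: three speech content segments and two normalised
# context values (for example, match statistics scaled to [0, 1]).
segments = ["G", "OW", "L"]
context = [0.9, 0.7]
decoder_input = combine(
    encode_content(segments),
    predict_prosody_per_step(context, len(segments)),
)
```

Each element of `decoder_input` holds the content encoding followed by the prosodic features for the same time step, which is the shape a decoder would consume in this sketch.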

Assignees

Electronic Arts Inc

Classifications

  • Voice editing, e.g. manipulating the voice of the synthesiser

  • Sound input; Sound output (speech processing G10L)

  • A63F13/54 (Primary): involving acoustic signals, e.g. for simulating revolutions per minute [RPM] dependent engine sounds in a driving game or reverberation against a virtual wall

  • generating an output signal, e.g. under timing constraints, for spatialization

  • G06N3/08 (Primary): Learning methods

Frequently asked questions

Answers are generated from the same data shown on this page.

Who is the assignee on this patent?
Electronic Arts Inc
What technology area does this patent fall under?
Primary CPC classification A63F13/54. Mapped technology areas include Human Necessities.
When was this patent published?
Publication date: May 13, 2025 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 11 related publications on this page (citations in our corpus or others sharing the same primary CPC).