Systems and methods for training a model to determine a type of environment surrounding a user

US12311265B2 · US · B2

Patent metadata
Publication number: US-12311265-B2
Application number: US-202117541549-A
Country: US
Kind code: B2
Filing date: Dec 3, 2021
Priority date: Dec 3, 2021
Publication date: May 27, 2025
Grant date: May 27, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for determining an environment in which a user is located is described. The method includes receiving a plurality of sets of audio data based on sounds emitted in a plurality of environments. Each of the plurality of environments has a different combination of objects. The method further includes receiving input data regarding the plurality of environments, and training an artificial intelligence (AI) model based on the plurality of sets of audio data and the input data. The method includes applying the AI model to audio data captured from an environment surrounding the first user to determine a type of the environment.
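
The train-then-apply flow in the abstract can be illustrated with a minimal Python sketch. The abstract does not say what kind of AI model is used, so a nearest-centroid classifier stands in for it here. The function names `train` and `apply_model`, the two-number feature vectors, and the labels are all hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def train(feature_vectors, labels):
    """Average the feature vectors for each environment label (one centroid per label).

    `labels` plays the role of the "input data regarding the environments";
    this pairing is an assumption made for the sketch.
    """
    grouped = defaultdict(list)
    for vector, label in zip(feature_vectors, labels):
        grouped[label].append(vector)
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def apply_model(centroids, vector):
    """Return the label whose centroid is closest to the captured feature vector."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vector))


# Hypothetical training data: [loudness, echo strength] for four recordings.
training_vectors = [[0.2, 0.9], [0.3, 0.8], [0.7, 0.1], [0.8, 0.2]]
training_labels = ["indoor", "indoor", "outdoor", "outdoor"]

model = train(training_vectors, training_labels)
prediction = apply_model(model, [0.25, 0.7])  # audio captured around the first user
```

A real system would derive the feature vectors from recorded audio and would likely use a far richer model; the sketch only shows how labeled training data and a later inference step fit together.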

First claim

Opening claim text (preview).

The invention claimed is:

1. A method for determining a real-world environment in which a first user is located, comprising:
    receiving a plurality of sets of audio data generated from sounds emitted in a plurality of real-world environments, wherein each of the plurality of real-world environments has a different combination of objects;
    extracting a plurality of features from the plurality of sets of audio data, wherein the plurality of features include a plurality of amplitudes of the plurality of sets of audio data, a plurality of frequencies of the plurality of sets of audio data, and a plurality of sense directions in which the sounds are sensed;
    classifying the plurality of features to output associations between the plurality of features and a plurality of types of the plurality of real-world environments, the objects within the plurality of real-world environments, a plurality of arrangements of the objects, and a plurality of states of the objects within the plurality of real-world environments;
    receiving input data regarding the plurality of real-world environments;
    training an artificial intelligence (AI) model based on the plurality of sets of audio data generated from the sounds in the plurality of real-world environments and based on the input data regarding the plurality of real-world environments, wherein said training the AI model includes:
        providing, to the AI model, the associations between the plurality of features and the plurality of types of the plurality of real-world environments, the objects within the plurality of real-world environments, the plurality of arrangements of the objects, and the plurality of states of the objects within the plurality of real-world environments; and
        determining, by the AI model, a plurality of probabilities based on the associations between the plurality of features and the plurality of types of the plurality of real-world environments, the objects within the plurality of real-world environments, the plurality of arrangements of the objects, and the plurality of states of the objects within the plurality of real-world environments; and
    applying the AI model to audio data captured from the real-world environment surrounding the first user to determine that a type of the real-world environment in which the first user is located includes one of an indoor environment and an outdoor environment.

2. The method of claim 1, further comprising: receiving an indication of the type of the real-world environment to be simulated; accessing, based on the type of the real-world environment to be simulated, the audio data captured from the real-world environment; providing the audio data captured from the real-world environment to a client device for outputting a sound corresponding to the type of the real-world environment.

3. The method of claim 1, wherein the plurality of sets of audio data include audio data that is generated from sounds emitted from one or more of the objects in the plurality of real-world environments and sounds that reflected from remaining of the objects.

4. The method of claim 1, wherein the input data includes data identifying the objects in the plurality of real-world environments or image data captured by cameras in the plurality of real-world environments or a combination thereof.

5. The method of claim 1, wherein the plurality of sets of audio data is captured when a second user moves from one location to another.

6. The method of claim 1, wherein the plurality of sets of audio data is captured when a plurality of users including a second user and a third user are at a location.

7. The method of claim 1, further comprising applying the AI model to the audio data captured from the real-world environment surrounding the first user to identify one or more objects within the real-world environment surrounding the first user or one or more states of the one or more objects or an arrangement of the one or more objects or a combination thereof.

8. A server for determining a real-world environment in which a user is located, comprising:
    a processor configured to:
        receive, via a computer network, a plurality of sets of audio data generated from sounds emitted in a plurality of real-world environments, wherein each of the plurality of real-world environments has a different combination of objects;
        extract a plurality of features from the plurality of sets of audio data, wherein the plurality of features include a plurality of amplitudes of the plurality of sets of audio data, a plurality of frequencies of the plurality of sets of audio data, and a plurality of sense directions in which the sounds are sensed;
        classify the plurality of features to output associations between the plurality of features and a plurality of types of the plurality of real-world environments, the objects within the plurality of real-world environments, a plurality of arrangements of the objects, and a plurality of states of the objects within the plurality of real-world environments;
        receive, via the computer network, input data regarding the plurality of real-world environments;
        train an artificial intelligence (AI) model based on the plurality of sets of audio data generated from the sounds in the plurality of real-world environments and based on the input data regarding the plurality of real-world environments, wherein to train the AI model, the processor is configured to:
            provide, to the AI model, the associations between the plurality of features and the plurality of types of the plurality of real-world environments, the objects within the plurality of real-world environments, the plurality of arrangements of the objects, and the plurality of states of the objects within the plurality of real-world environments; and
            determine, using the AI model, a plurality of probabilities based on the associations between the plurality of features and the plurality of types of the plurality of real-world environments, the objects within the plurality of real-world environments, the plurality of arrangements of the objects, and the plurality of states of the objects within the plurality of real-world environments; and
        apply the AI model to audio data captured from the real-world environment surrounding the user to determine that a type of the real-world environment in which the user is located includes one of an indoor environment and an outdoor environment; and
    a memory device coupled to the processor.

9. The server of claim 8, wherein the processor is configured to: receive an indication of the type of the real-world environment to be simulated; access, based on the type of the real-world environment to be simulated, the audio data captured from the real-world environment; provide the audio data captured from the real-world environment to a client device for outputting a sound corresponding to the type of the real-world environment.

10. The server of claim 8, wherein the plurality of sets of audio data include audio data that is generated from sounds emitted from one or more of the objects in the plurality of real-world environments and sounds that are reflected from remaining of the objects.

11. The server of claim 8, wherein the input data includes data identifying the objects in the plurality of real-world environments or image data captured by cameras in the plurality of real-world environments or a combination thereof.

12. The server of claim 8, wherein the processor is configured to apply the AI model to the audio data captured from the real-world environment surrounding the user to identify one or more objects within the real-world environment surrounding the user or one or more states of the one or more objects
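
As a rough illustration of the feature-extraction and probability steps recited in claim 1, the Python sketch below computes an amplitude and a frequency from a buffer of samples, carries a sense direction alongside them, and turns labeled associations into probabilities by counting. The claims name these features but not how they are computed, so every choice here is an assumption: peak amplitude, a naive discrete Fourier transform for the dominant frequency, a direction supplied by the capture device in degrees, and the hypothetical names `extract_features` and `association_probabilities`.

```python
import cmath
import math
from collections import Counter, defaultdict


def extract_features(samples, sample_rate, sense_direction_deg):
    """Return (peak amplitude, dominant frequency in Hz, sense direction)."""
    n = len(samples)
    amplitude = max(abs(s) for s in samples)
    # Naive DFT over the positive-frequency bins; bin 0 (the DC offset) is skipped.
    best_bin, best_magnitude = 0, 0.0
    for k in range(1, n // 2 + 1):
        coefficient = sum(
            s * cmath.exp(-2j * math.pi * k * i / n) for i, s in enumerate(samples)
        )
        if abs(coefficient) > best_magnitude:
            best_bin, best_magnitude = k, abs(coefficient)
    frequency = best_bin * sample_rate / n
    # The direction is passed through unchanged (assumed to come from the microphone).
    return amplitude, frequency, sense_direction_deg


def association_probabilities(associations):
    """Estimate P(environment type | feature key) by counting labeled associations."""
    counts = defaultdict(Counter)
    for feature_key, environment_type in associations:
        counts[feature_key][environment_type] += 1
    return {
        key: {env: count / sum(per_key.values()) for env, count in per_key.items()}
        for key, per_key in counts.items()
    }


# A synthetic 50 Hz tone sampled at 800 Hz, sensed from a hypothetical 90 degrees.
tone = [0.5 * math.sin(2 * math.pi * 50 * i / 800) for i in range(160)]
amplitude, frequency, direction = extract_features(tone, 800, 90)

# Hypothetical associations between a coarse feature key and an environment type.
probabilities = association_probabilities(
    [("echoey", "indoor"), ("echoey", "indoor"), ("echoey", "outdoor"), ("windy", "outdoor")]
)
```

The naive DFT is quadratic in the buffer length and is used only to keep the sketch dependency-free; a practical implementation would use a fast Fourier transform.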

Assignees

Sony Interactive Entertainment Inc

Classifications

  • Management of the audio stream, e.g. setting of volume, audio stream path

  • generating an output signal, e.g. under timing constraints, for spatialization

  • for performing operations on behalf of the game client, e.g. rendering

  • Input via voice recognition

  • Context or environment of the image

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12311265B2 cover?
A method for determining an environment in which a user is located is described. The method includes receiving a plurality of sets of audio data based on sounds emitted in a plurality of environments. Each of the plurality of environments has a different combination of objects. The method further includes receiving input data regarding the plurality of environments, and training an artificial i…
Who is the assignee on this patent?
Sony Interactive Entertainment Inc
What technology area does this patent fall under?
Primary CPC classification A63F13/54. Mapped technology areas include Human Necessities.
When was this patent published?
Publication date: May 27, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).
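
The FAQ answer on technology area maps the CPC symbol A63F13/54 to Human Necessities. A minimal Python sketch of how such a symbol breaks down follows. The section titles are the standard CPC section names; the function name `parse_cpc`, the regular expression, and the returned dictionary layout are assumptions made for illustration and are not how this page computes its mapping.

```python
import re

# Standard CPC section titles (section Y is omitted here for brevity).
CPC_SECTIONS = {
    "A": "Human Necessities",
    "B": "Performing Operations; Transporting",
    "C": "Chemistry; Metallurgy",
    "D": "Textiles; Paper",
    "E": "Fixed Constructions",
    "F": "Mechanical Engineering; Lighting; Heating; Weapons; Blasting",
    "G": "Physics",
    "H": "Electricity",
}

# Section letter, two-digit class, subclass letter, main group, "/", subgroup.
CPC_PATTERN = re.compile(r"^([A-H])(\d{2})([A-Z])(\d+)/(\d+)$")


def parse_cpc(symbol):
    """Split a CPC symbol such as 'A63F13/54' into its hierarchical parts."""
    match = CPC_PATTERN.match(symbol.replace(" ", ""))
    if match is None:
        raise ValueError(f"not a recognised CPC symbol: {symbol!r}")
    section, class_digits, subclass_letter, main_group, subgroup = match.groups()
    return {
        "section": section,
        "section_title": CPC_SECTIONS[section],
        "class": section + class_digits,
        "subclass": section + class_digits + subclass_letter,
        "main_group": main_group,
        "subgroup": subgroup,
    }


parsed = parse_cpc("A63F13/54")
```

The leading letter alone determines the section, which is why a symbol starting with A falls under Human Necessities.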