Who is the assignee on this patent?

Toyota Motor Europe, ETH Zuerich, Toyota Motor Co Ltd, and 1 more

What technology area does this patent fall under?

Primary CPC classification G10L25/30. Mapped technology areas include Physics.

When was this patent published?

Publication date: Apr 29, 2025 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 10 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Method for training a neural network to describe an environment on the basis of an audio signal, and the corresponding neural network

US12288567B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12288567-B2
Application number	US-202017792073-A
Country	US
Kind code	B2
Filing date	Jan 10, 2020
Priority date	Jan 10, 2020
Publication date	Apr 29, 2025
Grant date	Apr 29, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A neural network, a system using this neural network and a method for training a neural network to output a description of the environment in the vicinity of at least one sound acquisition device on the basis of an audio signal acquired by the sound acquisition device, the method including: obtaining audio and image training signals of a scene showing an environment with objects generating sounds, obtaining a target description of the environment seen on the image training signal, inputting the audio training signal to the neural network so that the neural network outputs a training description of the environment, and comparing the target description of the environment with the training description of the environment.
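The four steps in the abstract (obtain paired audio and image signals, derive a target description from the image, run the audio through the network, compare the two) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the patent text on this page does not name a loss function, so pixel-wise cross-entropy (for a segmentation) and mean absolute error (for a depth map) are stand-ins, and `toy_teacher` and `toy_audio_model` are hypothetical placeholders for the image-based source of the target and the audio network.

```python
import math


def pixelwise_cross_entropy(pred_probs, target_labels):
    """Mean cross-entropy over pixels (assumed loss, not taken from the patent).

    pred_probs:    H x W x C nested lists of class probabilities.
    target_labels: H x W nested lists of integer class indices.
    """
    total, count = 0.0, 0
    for prob_row, label_row in zip(pred_probs, target_labels):
        for probs, label in zip(prob_row, label_row):
            total += -math.log(max(probs[label], 1e-12))  # clamp to avoid log(0)
            count += 1
    return total / count


def mean_abs_depth_error(pred_depth, target_depth):
    """Mean absolute difference between two H x W depth maps (assumed loss)."""
    total, count = 0.0, 0
    for pred_row, target_row in zip(pred_depth, target_depth):
        for pred, target in zip(pred_row, target_row):
            total += abs(pred - target)
            count += 1
    return total / count


def comparison_step(audio, image, audio_model, image_teacher, loss_fn):
    """One pass of the claimed sequence: target from image, prediction from audio, compare."""
    target = image_teacher(image)    # target description seen on the image signal
    prediction = audio_model(audio)  # training description produced from audio only
    return loss_fn(prediction, target)


# Hypothetical stand-ins, for illustration only.
def toy_teacher(image):
    """Label bright pixels as class 1 and dark pixels as class 0."""
    return [[0 if pixel < 0.5 else 1 for pixel in row] for row in image]


def toy_audio_model(audio):
    """An untrained 'network' that is maximally unsure about every pixel."""
    return [[[0.5, 0.5] for _ in range(2)] for _ in range(2)]


image = [[0.1, 0.9], [0.8, 0.2]]
audio = [0.0, 0.3, -0.2, 0.1]
loss = comparison_step(audio, image, toy_audio_model, toy_teacher,
                       pixelwise_cross_entropy)
```

In an actual training setup the value returned by `comparison_step` would drive a weight update of the audio network; that update is omitted here because the abstract describes only the comparison.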

First claim

Opening claim text (preview).

The invention claimed is:
1. A method for training a neural network to output a description of the environment in the vicinity of at least one sound acquisition device on the basis of an audio signal acquired by the sound acquisition device, the method comprising: obtaining audio and image training signals of a scene showing an environment with objects generating sounds, obtaining a target description of the environment seen on the image training signal, inputting the audio training signal to the neural network so that the neural network outputs a training description of the environment, and comparing the target description of the environment with the training description of the environment, wherein the description of the environment, the target description of the environment, and the training description of the environment include at least one of a semantic segmentation of a frame of the image training signal or a depth map of a frame of the image training signal.
2. The method of claim 1, wherein the audio training signal is acquired with a plurality of sound acquisition devices.
3. The method of claim 2, wherein the sound acquisition devices of the plurality of sound acquisition devices are all spaced apart from each other.
4. The method of claim 2, wherein at least one additional sound acquisition device is used to acquire an audio signal at a location which differs from the location of any one of the sound acquisition devices of the plurality of sound acquisition devices, the neural network being further configured to determine at least one predicted audio signal representative of the audio signal that is acquired by the at least one additional sound acquisition device, and the method further comprising comparing the predicted audio signal with an audio signal acquired by the at least one additional sound acquisition device.
5. The method of claim 1, wherein the audio training signal is acquired using at least one binaural sound acquisition device.
6. The method of claim 1, wherein the image training signal is acquired using a 360 degrees camera.
7. The method of claim 1, wherein the target description is obtained using at least one pre-trained neural network configured to receive an image signal as input and to output the target description.
8. A neural network trained using the method of claim 1.
9. The neural network of claim 8, comprising, for each possible audio signal to be used as input, four convolutional layers, a concatenation module for concatenating the outputs of every four convolutional layers, and an ASPP module.
10. A system comprising at least one sound acquisition device and a neural network in accordance with claim 8.
11. A vehicle comprising a system according to claim 10.
12. A system for training a neural network to output a description of the environment in the vicinity of at least one sound acquisition device on the basis of an audio signal acquired by the sound acquisition device, the system comprising: a module for obtaining audio and image training signals of a scene showing an environment with objects generating sounds, a module for obtaining a target description of the environment seen on the image training signal, a module for inputting the audio training signal to the neural network so that the neural network outputs a training description of the environment, and a module for comparing the target description of the environment with the training description of the environment, wherein the description of the environment, the target description of the environment, and the training description of the environment include at least one of a semantic segmentation of a frame of the image training signal or a depth map of a frame of the image training signal.
13. A non-transitory recording medium readable by a computer and having recorded thereon a computer program including instructions for executing a method for training a neural network to output a description of the environment in the vicinity of at least one sound acquisition device on the basis of an audio signal acquired by the sound acquisition device, the method comprising: obtaining audio and image training signals of a scene showing an environment with objects generating sounds, obtaining a target description of the environment seen on the image training signal, inputting the audio training signal to the neural network so that the neural network outputs a training description of the environment, and comparing the target description of the environment with the training description of the environment, wherein the description of the environment, the target description of the environment, and the training description of the environment include at least one of a semantic segmentation of a frame of the image training signal or a depth map of a frame of the image training signal.
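The layout recited in claim 9 (four convolutional layers per audio input, a concatenation module, then an ASPP module) can be sketched in one dimension with plain Python lists. This is a toy under stated assumptions, not the patented network: the 1-D signals, the kernel values, the dilation rates, the ReLU between layers, and the sharing of weights across audio inputs are all choices made here for brevity. ASPP (atrous spatial pyramid pooling) is represented only by its core idea, parallel convolutions whose taps are spaced at different dilation rates.

```python
def dilated_conv1d(signal, kernel, rate):
    """Same-length 1-D convolution with taps `rate` samples apart, zero padded.

    Uses the deep-learning convention (cross-correlation, kernel not flipped).
    """
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = i + (k - half) * rate
            if 0 <= j < len(signal):  # positions outside the signal count as zero
                acc += weight * signal[j]
        out.append(acc)
    return out


def encode_channel(signal, layer_kernels):
    """Apply a stack of convolution + ReLU layers to one audio input."""
    for kernel in layer_kernels:
        signal = [max(0.0, v) for v in dilated_conv1d(signal, kernel, rate=1)]
    return signal


def forward(audio_channels, layer_kernels, aspp_kernel, rates=(1, 2, 4)):
    """Per-input encoders, concatenation, then parallel dilated branches."""
    # One encoder per audio input (weights shared here; an assumption).
    encoded = [encode_channel(ch, layer_kernels) for ch in audio_channels]
    # Concatenation module: `encoded` is the channel-wise stack of encoder outputs.
    branches = []
    for rate in rates:
        per_channel = [dilated_conv1d(ch, aspp_kernel, rate) for ch in encoded]
        # Each branch mixes all concatenated channels by summing over them.
        branches.append([sum(values) for values in zip(*per_channel)])
    return branches


# Two audio inputs (for example, left and right ears), each a single impulse.
impulse = [0, 0, 0, 1, 0, 0, 0]
four_layers = [[0, 1, 0]] * 4  # four identity-like layers, illustration only
branches = forward([impulse, impulse], four_layers, [1, 1, 1], rates=(1, 2))
```

Larger dilation rates spread the same three taps over a wider span of the input, which is how the parallel branches see context at several scales without adding weights.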

Classifications

G06V10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
G06V20/10: Terrestrial scenes (scenes under surveillance with static cameras G06V20/52; scenes perceived from the exterior of a vehicle G06V20/56; scenes perceived from the interior of a vehicle G06V20/59)
G06V10/95: structured as a network, e.g. client-server architectures
G06V10/26: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
G06V10/82: using neural networks

Patent family

Related publications grouped by family.

View patent family 69159784
