Machine learning model to fuse emergency vehicle audio and visual detection

US11620903B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11620903-B2
Application number	US-202117149659-A
Country	US
Kind code	B2
Filing date	Jan 14, 2021
Priority date	Jan 14, 2021
Publication date	Apr 4, 2023
Grant date	Apr 4, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

According to various embodiments, systems, methods, and mediums for operating an autonomous driving vehicle (ADV) are described. The embodiments use a number of machine learning models to extract features individually from audio data and visual data captured by sensors mounted on the ADV, and then to fuse these extracted features to create a concatenated feature vector. The concatenated feature vector is provided to a multilayer perceptron (MLP) as input to generate a detection result related to the presence of an emergency vehicle in the surrounding environment. The detection result can be used by the ADV to take appropriate actions to comply with the local traffic rules.
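The fusion step described in the abstract can be illustrated with a small sketch. This is not the patented implementation: the vector sizes, layer widths, random untrained weights, the ReLU and sigmoid activations, and every function name below are assumptions chosen only to show the data flow (per-modality feature vectors, concatenation, then an MLP that outputs a detection score).

```python
import math
import random


def concatenate(audio_features, visual_features):
    """Fuse two per-modality feature vectors by plain concatenation."""
    return list(audio_features) + list(visual_features)


def dense(inputs, weights, biases):
    """One fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def random_layer(rng, n_in, n_out):
    """Untrained placeholder weights (assumption); a real system loads trained ones."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def mlp_detect(fused, layers):
    """Forward pass: ReLU hidden layers, sigmoid on a single output unit."""
    activations = fused
    for weights, biases in layers[:-1]:
        activations = relu(dense(activations, weights, biases))
    weights, biases = layers[-1]
    return sigmoid(dense(activations, weights, biases)[0])


rng = random.Random(0)
audio_vec = [rng.random() for _ in range(4)]    # stand-in for extracted audio features
visual_vec = [rng.random() for _ in range(6)]   # stand-in for extracted image features
fused = concatenate(audio_vec, visual_vec)      # 4 + 6 = 10 values
layers = [random_layer(rng, len(fused), 8), random_layer(rng, 8, 1)]
score = mlp_detect(fused, layers)               # value in (0, 1); meaningless until trained
print(len(fused), 0.0 < score < 1.0)
```

Because the weights are random, the score carries no meaning here. The sketch only shows that the MLP sees a single input whose length is the sum of the two feature-vector lengths.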

First claim

Opening claim text (preview).

What is claimed is:
1. A computer-implemented method of operating an autonomous driving vehicle (ADV), the method comprising: receiving, at an autonomous driving system (ADS) on the ADV, a stream of audio signals captured using one or more audio capturing devices and a sequence of image frames captured using one or more image capturing devices mounted on the ADV from a surrounding environment of the ADV; extracting, by the ADS, a first feature vector from the stream of captured audio signals, and a second feature vector from the sequence of captured image frames; concatenating, by the ADS, the first feature vector and the second vector to create a concatenated feature vector; and determining, by the ADS using a first neural network model based on the concatenated feature vector, that an emergency vehicle is present in the surrounding environment of the ADV.
2. The method of claim 1, wherein the first neural network model is a multi-layer perceptron (MLP) network.
3. The method of claim 1, further comprising: determining, using the first neural network model, a position of the emergency vehicle, and a moving direction of the emergency vehicle.
4. The method of claim 3, further comprising: controlling, based on the position and the moving direction of the emergency vehicle, the ADV, including at least one of steering the ADV out of a current driving lane or braking the ADV to decelerate, in response to determining the position of the ADV.
5. The method of claim 1, wherein extracting the first feature vector comprises: extracting, using a second neural network model, a third feature vector from the stream of captured audio signals, the third feature vector being a vector of basic audio features; extracting, using a third neural network model, a fourth feature vector from the stream of captured audio signals, the fourth feature vector being a vector of Mel Frequency Cepstral Coefficients (MFCC) features; and concatenating the third feature vector and the fourth feature vector into a single feature vector.
6. The method of claim 5, further comprising: extracting, using a fourth neural network model, a fifth feature vector from the stream of captured audio signals, the fifth feature vector being a vector of Mel histogram features; and concatenating the third feature vector, the fourth feature vector, and the fifth feature vector into the single feature vector.
7. The method of claim 1, wherein the ADS uses a convolutional neural network to extract the second feature vector.
8. The method of claim 1, wherein the one or more audio capturing devices include one or more microphones, and wherein the one or more image capturing devices include one or more cameras.
9. A non-transitory machine-readable medium having instructions stored therein, which when executed by a processor, cause the processor to perform operations of operating an autonomous driving vehicle (ADV), the operations comprising: receiving, at an autonomous driving system (ADS) on the ADV, a stream of audio signals captured using one or more audio capturing devices and a sequence of image frames captured using one or more image capturing devices mounted on the ADV from a surrounding environment of the ADV; extracting, by the ADS, a first feature vector from the stream of captured audio signals, and a second feature vector from the sequence of captured image frames; concatenating, by the ADS, the first feature vector and the second vector to create a concatenated feature vector; and determining, by the ADS using a first neural network model based on the concatenated feature vector, that an emergency vehicle is present in the surrounding environment of the ADV.
10. The non-transitory machine-readable medium of claim 9, wherein the first neural network model is a multi-layer perceptron (MLP) network.
11. The non-transitory machine-readable medium of claim 9, wherein the operations further comprise: determining, using the first neural network model, a position of the emergency vehicle, and a moving direction of the emergency vehicle.
12. The non-transitory machine-readable medium of claim 11, wherein the operations further comprise: controlling, based on the position and the moving direction of the emergency vehicle, the ADV, including at least one of steering the ADV out of a current driving lane or braking the ADV to decelerate, in response to determining the position of the ADV.
13. The non-transitory machine-readable medium of claim 9, wherein extracting the first feature vector comprises: extracting, using a second neural network model, a third feature vector from the stream of captured audio signals, the third feature vector being a vector of basic audio features; extracting, using a third neural network model, a fourth feature vector from the stream of captured audio signals, the fourth feature vector being a vector of Mel Frequency Cepstral Coefficients (MFCC) features; and concatenating the third feature vector and the fourth feature vector into a single feature vector.
14. The non-transitory machine-readable medium of claim 13, wherein the operations further comprise: extracting, using a fourth neural network model, a fifth feature vector from the stream of captured audio signals, the fifth feature vector being a vector of Mel histogram features; and concatenating the third feature vector, the fourth feature vector, and the fifth feature vector into the single feature vector.
15. The non-transitory machine-readable medium of claim 9, wherein the ADS uses a convolutional neural network to extract the second feature vector.
16. The non-transitory machine-readable medium of claim 9, wherein the one or more audio capturing devices include one or more microphones, and wherein the one or more image capturing devices include one or more cameras.
17. A data processing system, comprising: a processor; and a memory coupled to the processor to store instructions, which when executed by the processor, cause the processor to perform operations of operating an autonomous driving vehicle (ADV), the operations comprising: receiving, at an autonomous driving system (ADS) on the ADV, a stream of audio signals captured using one or more audio capturing devices and a sequence of image frames captured using one or more image capturing devices mounted on the ADV from a surrounding environment of the ADV, extracting, by the ADS, a first feature vector from the stream of captured audio signals, and a second feature vector from the sequence of captured image frames, concatenating, by the ADS, the first feature vector and the second vector to create a concatenated feature vector, and determining, by the ADS using a first neural network model based on the concatenated feature vector, that an emergency vehicle is present in the surrounding environment of the ADV.
18. The system of claim 17, wherein the first neural network model is a multi-layer perceptron (MLP) network.
19. The system of claim 17, wherein the operations further comprise: determining, using the first neural network model, a position of the emergency vehicle, and a moving direction of the emergency vehicle.
20. The system of claim 19, wherein the operations further comprise: controlling, based on the position and the moving direction of the emergency vehicle, the ADV, including at least one of steering the ADV out of a current driving lane or braking the ADV to decelerate, in response to determining the position of the ADV.
21. The system of
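Claims 5 and 6 build the audio feature vector from several sub-vectors: basic audio features, MFCC features, and optionally Mel histogram features. The sketch below illustrates only that assembly step. The claims assign each sub-vector to its own neural network model, which is not reproduced here: `basic_audio_features` uses RMS energy and zero-crossing rate as an assumed meaning of "basic", and `placeholder_extractor` is a hypothetical stand-in that merely returns a vector of the requested length. The sizes 13 and 8 are likewise assumptions.

```python
import math


def basic_audio_features(samples):
    """Assumed 'basic' features: RMS energy and zero-crossing rate."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    return [rms, crossings / (len(samples) - 1)]


def placeholder_extractor(size):
    """Hypothetical stand-in for a trained model that emits `size` features.

    It returns the mean absolute amplitude of `size` consecutive chunks,
    which is NOT an MFCC or Mel histogram computation.
    """
    def extract(samples):
        step = max(1, len(samples) // size)
        chunks = [samples[i * step:(i + 1) * step] for i in range(size)]
        return [sum(abs(s) for s in c) / len(c) if c else 0.0 for c in chunks]
    return extract


def fuse_audio_features(samples, extractors):
    """Run each extractor on the same samples and concatenate the results in order."""
    fused = []
    for extract in extractors:
        fused.extend(extract(samples))
    return fused


# 10 ms of a 1 kHz tone sampled at 16 kHz, used only as test input.
samples = [math.sin(2 * math.pi * 1000 * n / 16000) for n in range(160)]
mfcc_like = placeholder_extractor(13)       # stand-in for the MFCC sub-vector
mel_hist_like = placeholder_extractor(8)    # stand-in for the Mel histogram sub-vector

claim5_vector = fuse_audio_features(samples, [basic_audio_features, mfcc_like])
claim6_vector = fuse_audio_features(
    samples, [basic_audio_features, mfcc_like, mel_hist_like])
print(len(claim5_vector), len(claim6_vector))  # 15 23
```

Passing the extractors as a list keeps the claim 5 and claim 6 variants identical except for the number of sub-vectors, so the claim 6 vector is the claim 5 vector with the third sub-vector appended.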

Assignees

Baidu USA LLC

Inventors

Classifications

CPC code	CPC title
G06N3/09	Supervised learning
G06N3/0464	Convolutional networks [CNN, ConvNet]
B60W2050/0028	Mathematical models, e.g. for simulation
B60W10/18	including control of braking systems
B60W40/02	related to ambient conditions

Patent family

Related publications grouped by family.

View patent family 78938063

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11620903B2 cover?: According to various embodiments, systems, methods, and mediums for operating an autonomous driving vehicle (ADV) are described. The embodiments use a number of machine learning models to extract features individually from audio data and visual data captured by sensors mounted on the ADV, and then to fuse these extracted features to create a concatenated feature vector. The concatenated featu…
Who is the assignee on this patent?: Baidu USA LLC
What technology area does this patent fall under?: Primary CPC classification G06V20/58. Mapped technology areas include Physics.
When was this patent published?: Publication date Apr 4, 2023 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).