Audiovisual deepfake detection
US-2022121868-A1 · Apr 21, 2022 · US
US11620903B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11620903-B2 |
| Application number | US-202117149659-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 14, 2021 |
| Priority date | Jan 14, 2021 |
| Publication date | Apr 4, 2023 |
| Grant date | Apr 4, 2023 |
Official abstract text for this publication.
According to various embodiments, systems, methods, and mediums for operating an autonomous driving vehicle (ADV) are described. The embodiments use a number of machine learning models to extract features individually from audio data and visual data captured by sensors mounted on the ADV, and then fuse these extracted features to create a concatenated feature vector. The concatenated feature vector is provided to a multi-layer perceptron (MLP) as input to generate a detection result related to the presence of an emergency vehicle in the surrounding environment. The detection result can be used by the ADV to take appropriate actions to comply with the local traffic rules.
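The fusion step described in the abstract can be illustrated with a minimal sketch: two per-modality feature vectors are concatenated and passed through a small MLP that outputs a presence score. Everything specific here is an assumption made for illustration and does not come from the patent: the function names (`concat`, `mlp_detect`, `init_layer`), the vector sizes, the single hidden layer, the ReLU and sigmoid activations, and the random untrained weights. The placeholder input values stand in for the outputs of the audio and visual feature extractors.

```python
import math
import random


def concat(audio_vec, visual_vec):
    """Fuse the two modality vectors by simple concatenation."""
    return list(audio_vec) + list(visual_vec)


def dense(x, weights, biases):
    """Fully connected layer: one weighted sum per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(rng, n_in, n_out):
    """Random, untrained weights (assumption: a real model would be trained)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def mlp_detect(fused, layers):
    """Run the fused vector through hidden layers, then a sigmoid output unit."""
    hidden = fused
    for weights, biases in layers[:-1]:
        hidden = relu(dense(hidden, weights, biases))
    weights, biases = layers[-1]
    logit = dense(hidden, weights, biases)[0]
    return sigmoid(logit)


rng = random.Random(0)
audio_vec = [0.2, 0.7, 0.1]        # placeholder audio features (hypothetical)
visual_vec = [0.9, 0.4, 0.3, 0.8]  # placeholder visual features (hypothetical)

fused = concat(audio_vec, visual_vec)
layers = [init_layer(rng, len(fused), 5), init_layer(rng, 5, 1)]
score = mlp_detect(fused, layers)

# Score in (0, 1); with untrained weights it carries no real meaning.
print(f"fused length = {len(fused)}, presence score = {score:.3f}")
```

Concatenation keeps each modality's features intact and leaves it to the MLP to learn how they interact, which is one plausible reading of why the abstract fuses features before classification rather than combining separate audio and visual decisions.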
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method of operating an autonomous driving vehicle (ADV), the method comprising: receiving, at an autonomous driving system (ADS) on the ADV, a stream of audio signals captured using one or more audio capturing devices and a sequence of image frames captured using one or more image capturing devices mounted on the ADV from a surrounding environment of the ADV; extracting, by the ADS, a first feature vector from the stream of captured audio signals, and a second feature vector from the sequence of captured image frames; concatenating, by the ADS, the first feature vector and the second vector to create a concatenated feature vector; and determining, by the ADS using a first neural network model based on the concatenated feature vector, that an emergency vehicle is present in the surrounding environment of the ADV.
2. The method of claim 1, wherein the first neural network model is a multi-layer perceptron (MLP) network.
3. The method of claim 1, further comprising: determining, using the first neural network model, a position of the emergency vehicle, and a moving direction of the emergency vehicle.
4. The method of claim 3, further comprising: controlling, based on the position and the moving direction of the emergency vehicle, the ADV, including at least one of steering the ADV out of a current driving lane or braking the ADV to decelerate, in response to determining the position of the ADV.
5. The method of claim 1, wherein extracting the first feature vector comprises: extracting, using a second neural network model, a third feature vector from the stream of captured audio signals, the third feature vector being a vector of basic audio features; extracting, using a third neural network model, a fourth feature vector from the stream of captured audio signals, the fourth feature vector being a vector of Mel Frequency Cepstral Coefficients (MFCC) features; and concatenating the third feature vector and the fourth feature vector into a single feature vector.
6. The method of claim 5, further comprising: extracting, using a fourth neural network model, a fifth feature vector from the stream of captured audio signals, the fifth feature vector being a vector of Mel histogram features; and concatenating the third feature vector, the fourth feature vector, and the fifth feature vector into the single feature vector.
7. The method of claim 1, wherein the ADS uses a convolutional neural network to extract the second feature vector.
8. The method of claim 1, wherein the one or more audio capturing devices include one or more microphones, and wherein the one or more image capturing devices include one or more cameras.
9. A non-transitory machine-readable medium having instructions stored therein, which when executed by a processor, cause the processor to perform operations of operating an autonomous driving vehicle (ADV), the operations comprising: receiving, at an autonomous driving system (ADS) on the ADV, a stream of audio signals captured using one or more audio capturing devices and a sequence of image frames captured using one or more image capturing devices mounted on the ADV from a surrounding environment of the ADV; extracting, by the ADS, a first feature vector from the stream of captured audio signals, and a second feature vector from the sequence of captured image frames; concatenating, by the ADS, the first feature vector and the second vector to create a concatenated feature vector; and determining, by the ADS using a first neural network model based on the concatenated feature vector, that an emergency vehicle is present in the surrounding environment of the ADV.
10. The non-transitory machine-readable medium of claim 9, wherein the first neural network model is a multi-layer perceptron (MLP) network.
11. The non-transitory machine-readable medium of claim 9, wherein the operations further comprise: determining, using the first neural network model, a position of the emergency vehicle, and a moving direction of the emergency vehicle.
12. The non-transitory machine-readable medium of claim 11, wherein the operations further comprise: controlling, based on the position and the moving direction of the emergency vehicle, the ADV, including at least one of steering the ADV out of a current driving lane or braking the ADV to decelerate, in response to determining the position of the ADV.
13. The non-transitory machine-readable medium of claim 9, wherein extracting the first feature vector comprises: extracting, using a second neural network model, a third feature vector from the stream of captured audio signals, the third feature vector being a vector of basic audio features; extracting, using a third neural network model, a fourth feature vector from the stream of captured audio signals, the fourth feature vector being a vector of Mel Frequency Cepstral Coefficients (MFCC) features; and concatenating the third feature vector and the fourth feature vector into a single feature vector.
14. The non-transitory machine-readable medium of claim 13, wherein the operations further comprise: extracting, using a fourth neural network model, a fifth feature vector from the stream of captured audio signals, the fifth feature vector being a vector of Mel histogram features; and concatenating the third feature vector, the fourth feature vector, and the fifth feature vector into the single feature vector.
15. The non-transitory machine-readable medium of claim 9, wherein the ADS uses a convolutional neural network to extract the second feature vector.
16. The non-transitory machine-readable medium of claim 9, wherein the one or more audio capturing devices include one or more microphones, and wherein the one or more image capturing devices include one or more cameras.
17. A data processing system, comprising: a processor; and a memory coupled to the processor to store instructions, which when executed by the processor, cause the processor to perform operations of operating an autonomous driving vehicle (ADV), the operations comprising: receiving, at an autonomous driving system (ADS) on the ADV, a stream of audio signals captured using one or more audio capturing devices and a sequence of image frames captured using one or more image capturing devices mounted on the ADV from a surrounding environment of the ADV, extracting, by the ADS, a first feature vector from the stream of captured audio signals, and a second feature vector from the sequence of captured image frames, concatenating, by the ADS, the first feature vector and the second vector to create a concatenated feature vector, and determining, by the ADS using a first neural network model based on the concatenated feature vector, that an emergency vehicle is present in the surrounding environment of the ADV.
18. The system of claim 17, wherein the first neural network model is a multi-layer perceptron (MLP) network.
19. The system of claim 17, wherein the operations further comprise: determining, using the first neural network model, a position of the emergency vehicle, and a moving direction of the emergency vehicle.
20. The system of claim 19, wherein the operations further comprise: controlling, based on the position and the moving direction of the emergency vehicle, the ADV, including at least one of steering the ADV out of a current driving lane or braking the ADV to decelerate, in response to determining the position of the ADV.
21. The system of
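Claims 5 and 6 (mirrored in claims 13 and 14) build the audio feature vector by concatenating a basic-features vector, an MFCC vector, and optionally a Mel histogram vector. The sketch below shows only that assembly step. It rests on several assumptions that are not in the claims: the claims extract each vector with a neural network model, whereas this sketch computes three simple hand-written statistics (RMS energy, zero-crossing rate, peak amplitude) as a stand-in for "basic audio features" and takes the MFCC and Mel histogram vectors as precomputed inputs. All function names and sample values are hypothetical.

```python
import math


def basic_audio_features(samples):
    """Stand-in for the 'basic audio features' vector (assumed contents):
    RMS energy, zero-crossing rate, and peak absolute amplitude."""
    n = len(samples)
    rms = math.sqrt(sum(s * s for s in samples) / n)
    # A crossing is counted when consecutive samples have opposite signs.
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a * b < 0)
    zcr = crossings / (n - 1)
    peak = max(abs(s) for s in samples)
    return [rms, zcr, peak]


def build_audio_vector(basic, mfcc, mel_hist=None):
    """Concatenate the per-type vectors into the single audio feature vector.
    The Mel histogram part is optional, matching the dependent claim."""
    vector = list(basic) + list(mfcc)
    if mel_hist is not None:
        vector += list(mel_hist)
    return vector


samples = [0.0, 0.5, -0.5, 0.5, -0.5]  # toy waveform (hypothetical)
mfcc = [12.1, -3.4, 0.8]               # precomputed placeholder values
mel_hist = [0.1, 0.6, 0.3]             # precomputed placeholder values

basic = basic_audio_features(samples)
two_part = build_audio_vector(basic, mfcc)
three_part = build_audio_vector(basic, mfcc, mel_hist)
print(len(two_part), len(three_part))
```

Keeping each feature type in its own extractor and merging only at the end means a feature type can be added or dropped without touching the others, at the cost of a fused vector whose length depends on which extractors are enabled.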
Supervised learning · CPC title
Convolutional networks [CNN, ConvNet] · CPC title
Mathematical models, e.g. for simulation · CPC title
including control of braking systems · CPC title
related to ambient conditions · CPC title