US11635299B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11635299-B2 |
| Application number | US-202016784103-A |
| Country | US |
| Kind code | B2 |
| Filing date | Feb 6, 2020 |
| Priority date | Feb 6, 2020 |
| Publication date | Apr 25, 2023 |
| Grant date | Apr 25, 2023 |
Official abstract text for this publication.
A navigation system for providing driving instructions to a driver of a vehicle traveling on a route is provided. The driving instructions are generated by executing a multimodal fusion method that comprises extracting features from sensor measurements, annotating the features with directions for the vehicle to follow the route with respect to objects sensed by the sensors, and encoding the annotated features with a multimodal attention neural network to produce encodings. The encodings are transformed into a common latent space, and the transformed encodings are fused using an attention mechanism producing an encoded representation of the scene. The method further comprises decoding the encoded representation with a sentence generation neural network to generate a driving instruction and submitting the driving instruction to an output device.
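The fusion step in the abstract (transform per-sensor encodings into a common latent space, then combine them with attention weights) can be illustrated with a minimal sketch. This is not the patented implementation: the projection matrices, the dot-product scoring against a `query` vector, and the names `project` and `fuse` are all assumptions made for illustration, and a trained system would learn these parameters.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def project(matrix, vector):
    """Apply a linear map: each row of `matrix` is dotted with `vector`."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def fuse(encodings, projections, query):
    """Fuse per-modality encodings into one scene representation.

    encodings:   one vector per sensor modality (dimensions may differ)
    projections: one matrix per modality, mapping it into the common latent space
                 (hypothetical; stands in for learned parameters)
    query:       hypothetical attention query in the latent space
    """
    # Step 1: transform every encoding into the common latent space.
    latent = [project(W, e) for W, e in zip(projections, encodings)]
    # Step 2: score each modality against the query (assumed dot-product scoring).
    scores = [sum(q * z for q, z in zip(query, v)) for v in latent]
    # Step 3: attention weights are the softmax of the scores.
    weights = softmax(scores)
    # Step 4: the fused representation is the weighted combination.
    dim = len(query)
    fused = [sum(w * v[i] for w, v in zip(weights, latent)) for i in range(dim)]
    return fused, weights


# Example: a 3-dimensional "camera" encoding and a 2-dimensional "audio" encoding,
# both mapped into a 2-dimensional latent space.
camera, audio = [1.0, 2.0, 3.0], [0.5, 4.0]
W_camera = [[1, 0, 0], [0, 1, 0]]
W_audio = [[0, 1], [1, 0]]
fused, weights = fuse([camera, audio], [W_camera, W_audio], query=[1.0, 0.0])
```

The weights always sum to one, so the fused vector stays a convex combination of the transformed encodings, matching the "weighted combination" wording of the claims.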
Opening claim text (preview).
We claim:

1. A navigation system for providing driving instructions to a driver of a vehicle traveling on a route based on real-time description of objects in a scene pertinent to the route of the vehicle, wherein the navigation system is operatively connected through one or a combination of wired and wireless communication channels to multiple sensors configured to provide measurements of the scene and an output device configured to communicate the driving instructions to the driver of the vehicle, the navigation system comprising:

   an input interface configured to accept first measurements sensed by a first sensor of the multiple sensors and second measurements sensed by a second sensor of the multiple sensors;

   a memory configured to store executable instructions;

   a processor configured to execute the executable instructions to generate the driving instructions for the vehicle by performing a computer-implemented multimodal fusion method, wherein to perform the computer-implemented multimodal fusion method the processor is configured to:

   extract first features indicative of first attributes and first spatial relationships of first objects sensed by the first sensor, based on the first measurements;

   extract second features indicative of second attributes and second spatial relationships of second objects sensed by the second sensor, based on the second measurements;

   annotate the first features with encodings of a first direction for the vehicle to follow the route with respect to a first corresponding object of the first objects and the second features with encodings of a second direction for the vehicle to follow the route with respect to a second corresponding object of the second objects;

   encode the annotated first features and the annotated second features with a multimodal attention neural network by temporally correlating the annotated first features of the first measurements sensed at different instances of time to produce first encodings, temporally correlating the annotated second features of the second measurements sensed at different instances of time to produce second encodings, transforming the first encodings and the second encodings into a common latent space, and fusing the transformed first encodings and the transformed second encodings using an attention mechanism producing an encoded representation of the scene including a weighted combination of the first encodings and the second encodings with weights determined by the attention mechanism;

   decode the encoded representation of the scene with a sentence generation neural network to generate a driving instruction of the driving instructions using a vocabulary of types of salient objects, properties of the salient objects, and navigating actions; and

   submit the driving instruction of the driving instructions to the output device.

2. The navigation system of claim 1, wherein the processor is configured to extract the first features and the second features using one or multiple feature extractors trained to detect and extract the first attributes of the first objects and the second attributes of the second objects from the first measurements and the second measurements.

3. The navigation system of claim 2, wherein the processor is further configured to: determine the first spatial relationships of the first objects and the second spatial relationships of the second objects with respect to a location of the vehicle.

4. The navigation system of claim 1, wherein each of the first attributes and the second attributes includes one of an absolute attribute value, a relative attribute value with respect to a state of the vehicle on the route, a type, a size, a color, a distance from the vehicle, or a velocity.

5. The navigation system of claim 1, wherein the first objects and the second objects include at least one static object and at least one dynamic object.

6. The navigation system of claim 1, wherein a first mode of the first sensor governing a type of the first measurements is different from a second mode of the second sensor governing a type of the second measurements, wherein the first mode and the second mode are selected from a group including sounds, color images and depth images.

7. The navigation system of claim 1, wherein the multimodal attention neural network includes multiple input subnetworks, one for each of the multiple sensors, to encode a respective measurement of each of the multiple sensors, and includes a fusing subnetwork for fusing outputs of the multiple input subnetworks with the attention mechanism.

8. The navigation system of claim 7, wherein each of the multiple input subnetworks includes a unimode attention trained to provide temporal correlation of features of each of the multiple sensors.

9. The navigation system of claim 1, wherein the multiple sensors include a local sensor arranged on the vehicle and connected to the navigation system through the wired communication channel.

10. The navigation system of claim 1, wherein the multiple sensors include a remote sensor arranged outside of the vehicle and connected to the navigation system through the wireless communication channel.

11. The navigation system of claim 1, wherein the multiple sensors include one or a combination of a sensor arranged on a neighboring vehicle and a sensor arranged at a measurement system.

12. The navigation system of claim 1, wherein the sentence generation neural network includes a long short-term memory (LSTM) decoder.

13. The navigation system of claim 1, wherein the sentence generation neural network is trained based on a training set of measurements and corresponding driving instructions.

14. The navigation system of claim 1, wherein the output device generates an acoustical signal based on the driving instructions.
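Claims 1 and 8 describe temporally correlating the features of a single sensor across different instants of time before fusion. One common way to do this is attention pooling over time steps, sketched below. The claims do not specify the scoring function; the scaled dot-product score, the `query` vector, and the function name `temporal_attention` are assumptions for illustration only.

```python
import math


def temporal_attention(frames, query):
    """Pool a sequence of per-time-step feature vectors into one encoding.

    frames: list of feature vectors, one per measurement time (all the same length)
    query:  hypothetical query vector of the same length as each frame
    Returns the pooled encoding and the attention weight given to each time step.
    """
    scale = math.sqrt(len(query))
    # Score each time step by its scaled dot product with the query (assumed).
    scores = [sum(q * x for q, x in zip(query, f)) / scale for f in frames]
    # Softmax over time, shifted by the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The encoding is the attention-weighted sum of the frames.
    encoding = [
        sum(w * f[i] for w, f in zip(weights, frames)) for i in range(len(query))
    ]
    return encoding, weights


# Example: three time steps of 2-dimensional features from one sensor.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
encoding, weights = temporal_attention(frames, query=[10.0, 0.0])
```

With this query, the time steps whose first component is large receive most of the weight; under the claimed architecture, one such encoding per sensor would then feed the fusing subnetwork.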
Guidance using speech or audio output, e.g. text-to-speech (text to speech systems per se G10L13/00) · CPC title
whereby the further system is an optical system or imaging system · CPC title
Retrieval, searching and output of information related to real-time traffic, weather, or environmental conditions (arrangements for giving variable traffic instructions G08G1/09) · CPC title
Details, e.g. road map scale, orientation, zooming, illumination, level of detail, scrolling of road map or positioning of current position marker · CPC title
Details of the user input interface, e.g. buttons, knobs or sliders, including those provided on a touch screen; remote controllers; input using gestures · CPC title