Method and system for scene-aware interaction

US11635299B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11635299-B2
Application number  US-202016784103-A
Country             US
Kind code           B2
Filing date         Feb 6, 2020
Priority date       Feb 6, 2020
Publication date    Apr 25, 2023
Grant date          Apr 25, 2023

Abstract

A navigation system for providing driving instructions to a driver of a vehicle traveling on a route is provided. The driving instructions are generated by executing a multimodal fusion method that comprises extracting features from sensor measurements, annotating the features with directions for the vehicle to follow the route with respect to objects sensed by the sensors, and encoding the annotated features with a multimodal attention neural network to produce encodings. The encodings are transformed into a common latent space, and the transformed encodings are fused using an attention mechanism producing an encoded representation of the scene. The method further comprises decoding the encoded representation with a sentence generation neural network to generate a driving instruction and submitting the driving instruction to an output device.
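
The fusion step the abstract describes (transform each modality's encodings into a common latent space, then combine them with attention weights) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patented implementation: the linear projection matrices, the dot-product scoring against a `query` vector (standing in for whatever state conditions the attention, such as a decoder state), and the example sensor vectors are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def project(weights: Matrix, x: Sequence[float]) -> Vector:
    """Linear map W @ x: moves one modality's encoding into the common latent space."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in weights]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax; the resulting weights are positive and sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(encodings: Sequence[Sequence[float]],
                   projections: Sequence[Matrix],
                   query: Sequence[float]) -> Tuple[Vector, Vector]:
    """Project every modality into the shared space, score each projection
    against the (hypothetical) query, and return (weighted combination, weights)."""
    latent = [project(W, e) for W, e in zip(projections, encodings)]
    scores = [sum(q * z for q, z in zip(query, h)) for h in latent]
    weights = softmax(scores)
    dim = len(latent[0])
    fused = [sum(a * h[k] for a, h in zip(weights, latent)) for k in range(dim)]
    return fused, weights


# Hypothetical example: a 3-d camera encoding and a 2-d depth encoding,
# both mapped into a 2-d common latent space.
camera = [0.2, 0.9, 0.1]
depth = [0.7, 0.3]
W_camera = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0]]
W_depth = [[1.0, 0.0],
           [0.0, 1.0]]
query = [1.0, 0.0]

fused, weights = attention_fuse([camera, depth], [W_camera, W_depth], query)
```

Here the depth projection scores higher against the query (0.7 versus 0.2), so it receives the larger weight (about 0.62 versus 0.38) and the fused vector is pulled toward it. In a trained network the projections and the scoring function would be learned parameters rather than fixed matrices.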

Claims

We claim:

1. A navigation system for providing driving instructions to a driver of a vehicle traveling on a route based on real-time description of objects in a scene pertinent to the route of the vehicle, wherein the navigation system is operatively connected through one or a combination of wired and wireless communication channels to multiple sensors configured to provide measurements of the scene and an output device configured to communicate the driving instructions to the driver of the vehicle, the navigation system comprising:
    an input interface configured to accept first measurements sensed by a first sensor of the multiple sensors and second measurements sensed by a second sensor of the multiple sensors;
    a memory configured to store executable instructions;
    a processor configured to execute the executable instructions to generate the driving instructions for the vehicle by performing a computer-implemented multimodal fusion method, wherein to perform the computer-implemented multimodal fusion method the processor is configured to:
        extract first features indicative of first attributes and first spatial relationships of first objects sensed by the first sensor, based on the first measurements;
        extract second features indicative of second attributes and second spatial relationships of second objects sensed by the second sensor, based on the second measurements;
        annotate the first features with encodings of a first direction for the vehicle to follow the route with respect to a first corresponding object of the first objects and the second features with encodings of a second direction for the vehicle to follow the route with respect to a second corresponding object of the second objects;
        encode the annotated first features and the annotated second features with a multimodal attention neural network by temporally correlating the annotated first features of the first measurements sensed at different instances of time to produce first encodings, temporally correlating the annotated second features of the second measurements sensed at different instances of time to produce second encodings, transforming the first encodings and the second encodings into a common latent space, and fusing the transformed first encodings and the transformed second encodings using an attention mechanism producing an encoded representation of the scene including a weighted combination of the first encodings and the second encodings with weights determined by the attention mechanism;
        decode the encoded representation of the scene with a sentence generation neural network to generate a driving instruction of the driving instructions using a vocabulary of types of salient objects, properties of the salient objects, and navigating actions; and
        submit the driving instruction of the driving instructions to the output device.

2. The navigation system of claim 1, wherein the processor is configured to extract the first features and the second features using one or multiple feature extractors trained to detect and extract the first attributes of the first objects and the second attributes of the second objects from the first measurements and the second measurements.

3. The navigation system of claim 2, wherein the processor is further configured to: determine the first spatial relationships of the first objects and the second spatial relationships of the second objects with respect to a location of the vehicle.

4. The navigation system of claim 1, wherein each of the first attributes and the second attributes includes one of an absolute attribute value, a relative attribute value with respect to a state of the vehicle on the route, a type, a size, a color, a distance from the vehicle, or a velocity.

5. The navigation system of claim 1, wherein the first objects and the second objects include at least one static object and at least one dynamic object.

6. The navigation system of claim 1, wherein a first mode of the first sensor governing a type of the first measurements is different from a second mode of the second sensor governing a type of the second measurements, wherein the first mode and the second mode are selected from a group including sounds, color images and depth images.

7. The navigation system of claim 1, wherein the multimodal attention neural network includes multiple input subnetworks, one for each of the multiple sensors, to encode a respective measurement of each of the multiple sensors, and includes a fusing subnetwork for fusing outputs of the multiple input subnetworks with the attention mechanism.

8. The navigation system of claim 7, wherein each of the multiple input subnetworks includes a unimode attention trained to provide temporal correlation of features of each of the multiple sensors.

9. The navigation system of claim 1, wherein the multiple sensors include a local sensor arranged on the vehicle and connected to the navigation system through the wired communication channel.

10. The navigation system of claim 1, wherein the multiple sensors include a remote sensor arranged outside of the vehicle and connected to the navigation system through the wireless communication channel.

11. The navigation system of claim 1, wherein the multiple sensors include one or a combination of a sensor arranged on a neighboring vehicle and a sensor arranged at a measurement system.

12. The navigation system of claim 1, wherein the sentence generation neural network includes a long short-term memory (LSTM) decoder.

13. The navigation system of claim 1, wherein the sentence generation neural network is trained based on a training set of measurements and corresponding driving instructions.

14. The navigation system of claim 1, wherein the output device generates an acoustical signal based on the driving instructions.
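
Claim 1's per-sensor steps (annotating features with a direction encoding, then temporally correlating the annotated features across time instances) and claim 8's "unimode attention" could look roughly like the sketch below. It assumes details the claims do not give: the direction annotation is a one-hot code over a hypothetical `DIRECTIONS` tuple appended to each feature vector, and temporal correlation is plain scaled dot-product self-attention over time steps with identity query/key/value projections. A trained subnetwork would learn those projections instead.

```python
import math
from typing import List, Sequence

# Hypothetical direction vocabulary; the claims do not enumerate directions.
DIRECTIONS = ("left", "straight", "right")


def annotate(features: Sequence[float], direction: str) -> List[float]:
    """Append a one-hot code for the direction the vehicle should take
    relative to the sensed object."""
    if direction not in DIRECTIONS:
        raise ValueError(f"unknown direction: {direction!r}")
    return list(features) + [1.0 if d == direction else 0.0 for d in DIRECTIONS]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_self_attention(frames: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scaled dot-product self-attention across time steps (identity projections).
    Output row t is a convex combination of all frames, weighted by how similar
    each frame is to frame t."""
    dim = len(frames[0])
    scale = 1.0 / math.sqrt(dim)
    encodings = []
    for q in frames:
        weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in frames])
        encodings.append([sum(w * f[j] for w, f in zip(weights, frames))
                          for j in range(dim)])
    return encodings


# Hypothetical example: three time steps of 2-d features from one sensor,
# each annotated with the direction relative to the tracked object.
frames = [
    annotate([0.5, 0.1], "right"),
    annotate([0.6, 0.2], "right"),
    annotate([0.1, 0.9], "straight"),
]
encodings = temporal_self_attention(frames)
```

Under claim 7, each sensor would get its own instance of this step (one input subnetwork per sensor), and the per-sensor encodings would then be transformed into the common latent space and fused with the attention mechanism recited in claim 1.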

Assignees

Mitsubishi Electric Res Laboratories Inc

Classifications (CPC titles)

  • Guidance using speech or audio output, e.g. text-to-speech (text to speech systems per se G10L13/00)

  • whereby the further system is an optical system or imaging system

  • Retrieval, searching and output of information related to real-time traffic, weather, or environmental conditions (arrangements for giving variable traffic instructions G08G1/09)

  • Details, e.g. road map scale, orientation, zooming, illumination, level of detail, scrolling of road map or positioning of current position marker

  • Details of the user input interface, e.g. buttons, knobs or sliders, including those provided on a touch screen; remote controllers; input using gestures

Frequently asked questions

Who is the assignee on this patent?
Mitsubishi Electric Res Laboratories Inc
What technology area does this patent fall under?
Primary CPC classification G01C21/3629. Mapped technology areas include Physics.
When was this patent published?
Publication date Apr 25, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.