Method of lightweight simultaneous localization and mapping performed on a real-time computing and battery operated wheeled device
US-2022187841-A1 · Jun 16, 2022 · US
US12585285B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12585285-B2 |
| Application number | US-202418417504-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 19, 2024 |
| Priority date | Mar 24, 2023 |
| Publication date | Mar 24, 2026 |
| Grant date | Mar 24, 2026 |
A practical reading order for non-experts. Skip the full description unless you need deep technical detail.
What the patent document calls the invention.
A short plain-language summary of the technical disclosure.
Who owns or filed the patent and who is credited as inventor.
Filing, priority, publication, and grant dates set the timeline.
The legal scope of protection — read this for what is actually claimed.
Technology tags used to group this patent with similar filings.
Prior art links and similar publications in this corpus.
Official abstract text for this publication.
In conventional robot navigation techniques learning and planning algorithms act independently without guiding each other simultaneously. A method and system for robotic navigation with simultaneous local path planning and learning is disclosed. The method discloses an approach to learn and plan simultaneously by assisting each other and improve the overall system performance. The planner acts as an actuator and helps to balance exploration and exploitation in the learning algorithm. The synergy between dynamic window approach (DWA) as a planning algorithm and a disclosed Next best Q-learning (NBQ) as a learning algorithm offers an efficient local planning algorithm. Unlike the traditional Q-learning, dimension of Q-tree in the NBQ is dynamic and does not require to define a priori.
Opening claim text (preview).
What is claimed is: 1 . A method for robot navigation, the method further comprising: performing by a robotic agent executed by one or more hardware processors, a global path planning to obtain a plurality of way points to reach a goal position based on a current position, the goal position and two-dimensional floor plan of an environment the robotic agent is deployed into, wherein the current position of the robotic agent represents a current way point; and sequentially navigating by the robotic agent, through each of the plurality of way points to reach the goal position by simultaneously applying a) a Dynamic Window Approach (DWA) for a local path planning, and b) a Next best Q-learning (NBQ) approach that enables real-time learning while balancing between an exploitation approach and an exploration approach, wherein sequentially navigating through each of the plurality of way points to reach the goal position comprises iteratively performing a plurality of steps until the plurality of way points are covered, the plurality of steps further comprising: a) computing an optimal velocity vector for a local goal evaluated for the current way point at a current state among a plurality of states visited by the robotic agent; b) employing by the robotic agent, one of an exploration approach and an exploitation approach for the local path planning based on the optimal velocity vector, wherein i) the exploration approach is followed if the optimal velocity vector is empty, wherein value of a scalar parameter, required to tune a number of linear velocity samples and a number of angular velocity samples, is set to zero during the exploration approach, and ii) the exploitation approach is followed if the optimal velocity vector is not-empty, wherein value of the tuning scalar parameter is set to be greater than zero and less than one during the exploitation approach; c) computing the number of linear velocity samples and the number of angular velocity samples at each of the plurality of states based on value set for the scalar parameter; d) obtaining the optimal velocity vector and a score value for each velocity sample offered by the DWA based on the current state, the local goal, the number of linear velocity samples, and the number of angular velocity samples; e) evaluating a reward using a predefined reward function and updating a Q-value of a Q-tree, wherein dimension of the Q-tree in the NBQ is dynamically obtained without need to define a priori, wherein the Q-value adapts based on one or more uncertainties in the environment, wherein by the NBQ approach, the Q-value at the current state of the plurality of states due to an action is a summation of an immediate reward and a discounted next best Q-value, wherein the discounted next best Q-value is the Q-value corresponding to a next best action at the current state, wherein the Q-value is updated recursively at each state for the plurality of actions, wherein the immediate reward in the NBQ is adapted in accordance with a pruned score function that is a score without considering changes of a static obstacle in the environment as a product of a weight to control on the pruned score function and a distance between a nearest obstacle and robot's predicted path corresponding to linear velocity and an angular velocity; and f) recomputing the optimal velocity vector at the current state with the updated Q-tree and executing the optimal velocity vector to update the current waypoint by current position of the robotic agent. 2 . The method of claim 1 , wherein each of the plurality of states is a tuple further comprising a sector within sensing range of a sensor attached to the robotic agent, a current linear velocity of the robotic agent, and a current angular velocity of the robotic agent. 3 . A robotic agent for robot navigation, the system further comprising: a memory storing instructions; one or more Input/Output (I/O) interfaces; and one or more hardware processors coupled to the memory via the one or more I/O interfaces, wherein the one or more hardware processors are configured by the instructions to: perform a global path planning to obtain a plurality of way points to reach a goal position based on a current position, the goal position and two-dimensional floor plan of an environment the robotic agent is deployed into, wherein the current position of the robotic agent represents a current way point; and sequentially navigate through each of the plurality of way points to reach the goal position by simultaneously applying a) a Dynamic Window Approach (DWA) for a local path planning, and b) a Next best Q-learning (NBQ) approach that enables real-time learning while balancing between an exploitation approach and an exploration approach, wherein sequentially navigating through each of the plurality of way points to reach the goal position comprises iteratively performing a plurality of steps until the plurality of way points are covered, the plurality of steps further comprising: a) computing an optimal velocity vector for a local goal evaluated for the current way point at a current state among a plurality of states visited by the robotic agent; b) employing, by the robotic agent, one of an exploration approach and an exploitation approach for the local path planning based on the optimal velocity vector, wherein i) the exploration approach is followed if the optimal velocity vector is empty, wherein value of a scalar parameter, required to tune a number of linear velocity samples and a number of angular velocity samples, is set to zero during the exploration approach, and ii) the exploitation approach is followed if the optimal velocity vector is not-empty, wherein value of the tuning scalar parameter is set to be greater than zero and less than one during the exploitation approach; c) computing the number of linear velocity samples and the number of angular velocity samples at each of the plurality of states based on value set for the scalar parameter; d) obtaining the optimal velocity vector and a score value for each velocity sample offered by the DWA based on the current state, the local goal, the number of linear velocity samples, and the number of angular velocity samples; e) evaluating a reward using a predefined reward function and updating a Q-value of a Q-tree, wherein dimension of the Q-tree in the NBQ is dynamically obtained without need to define a priori, wherein the Q-value adapts based on one or more uncertainties in the environment, wherein by the NBQ approach, the Q-value at the current state of the plurality of states due to an action is a summation of an immediate reward and a discounted next best Q-value, wherein the discounted next best Q-value is the Q-value corresponding to a next best action at the current state, wherein the Q-value is updated recursively at each state for the plurality of actions, wherein the immediate reward in the NBQ is adapted in accordance with a pruned score function that is a score without considering changes of a static obstacle in the environment as a product of a weight to control on the pruned score function and a distance between a nearest obstacle and robot's predicted path corresponding to linear velocity and an angular velocity; and f) recomputing the optimal velocity vector at the current state with the updated Q-tree and executing the optimal velocity vector to update the current waypoint by current position of the robotic agent. 4 . The robotic agent of claim 3 , wherein state is a tuple further comprising a sector within sensing range of a sensor attached to the robotic agent, a current linear velocity of the robotic agent, and a current angular velocity of the robotic agent. 5 . One or more non-transitory machine-readable information storage mediums comprising one or mor
using a topologic or simplified map · CPC title
using machine learning, e.g. neural networks · CPC title
specially adapted for indoor navigation · CPC title
Industrial sites, e.g. warehouses or factories · CPC title
using environment maps, e.g. simultaneous localisation and mapping [SLAM] · CPC title
Related publications grouped by family.
Answers are generated from the same data shown on this page.