Robotic navigation with simultaneous local path planning and learning

US12585285B2 · US · B2

Patent metadata
FieldValue
Publication numberUS-12585285-B2
Application numberUS-202418417504-A
CountryUS
Kind codeB2
Filing dateJan 19, 2024
Priority dateMar 24, 2023
Publication dateMar 24, 2026
Grant dateMar 24, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

In conventional robot navigation techniques learning and planning algorithms act independently without guiding each other simultaneously. A method and system for robotic navigation with simultaneous local path planning and learning is disclosed. The method discloses an approach to learn and plan simultaneously by assisting each other and improve the overall system performance. The planner acts as an actuator and helps to balance exploration and exploitation in the learning algorithm. The synergy between dynamic window approach (DWA) as a planning algorithm and a disclosed Next best Q-learning (NBQ) as a learning algorithm offers an efficient local planning algorithm. Unlike the traditional Q-learning, dimension of Q-tree in the NBQ is dynamic and does not require to define a priori.

First claim

Opening claim text (preview).

What is claimed is: 1 . A method for robot navigation, the method further comprising: performing by a robotic agent executed by one or more hardware processors, a global path planning to obtain a plurality of way points to reach a goal position based on a current position, the goal position and two-dimensional floor plan of an environment the robotic agent is deployed into, wherein the current position of the robotic agent represents a current way point; and sequentially navigating by the robotic agent, through each of the plurality of way points to reach the goal position by simultaneously applying a) a Dynamic Window Approach (DWA) for a local path planning, and b) a Next best Q-learning (NBQ) approach that enables real-time learning while balancing between an exploitation approach and an exploration approach, wherein sequentially navigating through each of the plurality of way points to reach the goal position comprises iteratively performing a plurality of steps until the plurality of way points are covered, the plurality of steps further comprising: a) computing an optimal velocity vector for a local goal evaluated for the current way point at a current state among a plurality of states visited by the robotic agent; b) employing by the robotic agent, one of an exploration approach and an exploitation approach for the local path planning based on the optimal velocity vector, wherein i) the exploration approach is followed if the optimal velocity vector is empty, wherein value of a scalar parameter, required to tune a number of linear velocity samples and a number of angular velocity samples, is set to zero during the exploration approach, and ii) the exploitation approach is followed if the optimal velocity vector is not-empty, wherein value of the tuning scalar parameter is set to be greater than zero and less than one during the exploitation approach; c) computing the number of linear velocity samples and the number of angular velocity samples at each of the plurality of states based on value set for the scalar parameter; d) obtaining the optimal velocity vector and a score value for each velocity sample offered by the DWA based on the current state, the local goal, the number of linear velocity samples, and the number of angular velocity samples; e) evaluating a reward using a predefined reward function and updating a Q-value of a Q-tree, wherein dimension of the Q-tree in the NBQ is dynamically obtained without need to define a priori, wherein the Q-value adapts based on one or more uncertainties in the environment, wherein by the NBQ approach, the Q-value at the current state of the plurality of states due to an action is a summation of an immediate reward and a discounted next best Q-value, wherein the discounted next best Q-value is the Q-value corresponding to a next best action at the current state, wherein the Q-value is updated recursively at each state for the plurality of actions, wherein the immediate reward in the NBQ is adapted in accordance with a pruned score function that is a score without considering changes of a static obstacle in the environment as a product of a weight to control on the pruned score function and a distance between a nearest obstacle and robot's predicted path corresponding to linear velocity and an angular velocity; and f) recomputing the optimal velocity vector at the current state with the updated Q-tree and executing the optimal velocity vector to update the current waypoint by current position of the robotic agent. 2 . The method of claim 1 , wherein each of the plurality of states is a tuple further comprising a sector within sensing range of a sensor attached to the robotic agent, a current linear velocity of the robotic agent, and a current angular velocity of the robotic agent. 3 . A robotic agent for robot navigation, the system further comprising: a memory storing instructions; one or more Input/Output (I/O) interfaces; and one or more hardware processors coupled to the memory via the one or more I/O interfaces, wherein the one or more hardware processors are configured by the instructions to: perform a global path planning to obtain a plurality of way points to reach a goal position based on a current position, the goal position and two-dimensional floor plan of an environment the robotic agent is deployed into, wherein the current position of the robotic agent represents a current way point; and sequentially navigate through each of the plurality of way points to reach the goal position by simultaneously applying a) a Dynamic Window Approach (DWA) for a local path planning, and b) a Next best Q-learning (NBQ) approach that enables real-time learning while balancing between an exploitation approach and an exploration approach, wherein sequentially navigating through each of the plurality of way points to reach the goal position comprises iteratively performing a plurality of steps until the plurality of way points are covered, the plurality of steps further comprising: a) computing an optimal velocity vector for a local goal evaluated for the current way point at a current state among a plurality of states visited by the robotic agent; b) employing, by the robotic agent, one of an exploration approach and an exploitation approach for the local path planning based on the optimal velocity vector, wherein i) the exploration approach is followed if the optimal velocity vector is empty, wherein value of a scalar parameter, required to tune a number of linear velocity samples and a number of angular velocity samples, is set to zero during the exploration approach, and ii) the exploitation approach is followed if the optimal velocity vector is not-empty, wherein value of the tuning scalar parameter is set to be greater than zero and less than one during the exploitation approach; c) computing the number of linear velocity samples and the number of angular velocity samples at each of the plurality of states based on value set for the scalar parameter; d) obtaining the optimal velocity vector and a score value for each velocity sample offered by the DWA based on the current state, the local goal, the number of linear velocity samples, and the number of angular velocity samples; e) evaluating a reward using a predefined reward function and updating a Q-value of a Q-tree, wherein dimension of the Q-tree in the NBQ is dynamically obtained without need to define a priori, wherein the Q-value adapts based on one or more uncertainties in the environment, wherein by the NBQ approach, the Q-value at the current state of the plurality of states due to an action is a summation of an immediate reward and a discounted next best Q-value, wherein the discounted next best Q-value is the Q-value corresponding to a next best action at the current state, wherein the Q-value is updated recursively at each state for the plurality of actions, wherein the immediate reward in the NBQ is adapted in accordance with a pruned score function that is a score without considering changes of a static obstacle in the environment as a product of a weight to control on the pruned score function and a distance between a nearest obstacle and robot's predicted path corresponding to linear velocity and an angular velocity; and f) recomputing the optimal velocity vector at the current state with the updated Q-tree and executing the optimal velocity vector to update the current waypoint by current position of the robotic agent. 4 . The robotic agent of claim 3 , wherein state is a tuple further comprising a sector within sensing range of a sensor attached to the robotic agent, a current linear velocity of the robotic agent, and a current angular velocity of the robotic agent. 5 . One or more non-transitory machine-readable information storage mediums comprising one or mor

Assignees

Inventors

Classifications

  • using a topologic or simplified map · CPC title

  • using machine learning, e.g. neural networks · CPC title

  • specially adapted for indoor navigation · CPC title

  • Industrial sites, e.g. warehouses or factories · CPC title

  • using environment maps, e.g. simultaneous localisation and mapping [SLAM] · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12585285B2 cover?
In conventional robot navigation techniques learning and planning algorithms act independently without guiding each other simultaneously. A method and system for robotic navigation with simultaneous local path planning and learning is disclosed. The method discloses an approach to learn and plan simultaneously by assisting each other and improve the overall system performance. The planner acts …
Who is the assignee on this patent?
Tata Consultancy Services Ltd
What technology area does this patent fall under?
Primary CPC classification G05D1/229. Mapped technology areas include Physics.
When was this patent published?
Publication date Tue Mar 24 2026 00:00:00 GMT+0000 (Coordinated Universal Time) (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).