Object tracking and entity resolution

US12117838B1 · US · B1

Patent metadata
Field	Value
Publication number	US-12117838-B1
Application number	US-202117218621-A
Country	US
Kind code	B1
Filing date	Mar 31, 2021
Priority date	Mar 31, 2021
Publication date	Oct 15, 2024
Grant date	Oct 15, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Described herein is a system for tracking objects and performing dynamic entity resolution using image data. For example, the system may build an environment map and populate the map with objects present in the environment. As the devices move about the environment it may capture image data and, based on its position and/or configuration of its components, may determine updated locations of objects that move in the environment. Upon receiving a query from a user, based on the location of the objects relative to the device/user, the system can interpret gestures and voice commands to infer which object is specified by the voice command. To build the environment map, the system performs object detection to generate bounding boxes associated with an object, then clusters the bounding boxes into a three-dimensional (3D) object associated with 3D coordinates. As the system tracks the object using the 3D coordinates while maintaining two-dimensional (2D) information (e.g., bounding boxes and other features), the system can use existing 2D models to process objects in 3D.
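
The abstract's idea of combining a voice command with a gesture to infer which tracked object is meant can be illustrated with a minimal sketch. It is not the patented method: the `TrackedObject` type, the `resolve_entity` function, the exact-label match, and the 30-degree angular threshold are all assumptions made here for illustration. The sketch picks the object of the spoken type that lies closest to the user's pointing direction.

```python
import math
from dataclasses import dataclass


@dataclass
class TrackedObject:
    """Hypothetical map entry: an object label plus 3D map coordinates."""
    label: str
    position: tuple  # (x, y, z)


def _angle_between(u, v):
    """Angle in radians between two 3D vectors (pi if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return math.pi
    cos_angle = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cos_angle)


def resolve_entity(objects, spoken_label, user_position, pointing_direction,
                   max_angle=math.radians(30)):
    """Return the object matching the spoken label that is best aligned
    with the pointing direction, or None if nothing is within max_angle.
    The threshold is an illustrative assumption."""
    best, best_angle = None, max_angle
    for obj in objects:
        if obj.label != spoken_label:
            continue
        to_object = tuple(o - u for o, u in zip(obj.position, user_position))
        angle = _angle_between(pointing_direction, to_object)
        if angle <= best_angle:
            best, best_angle = obj, angle
    return best


objects = [
    TrackedObject("cup", (2.0, 0.0, 0.0)),
    TrackedObject("cup", (0.0, 2.0, 0.0)),
    TrackedObject("book", (2.0, 0.1, 0.0)),
]
# The user stands at the origin, says "cup", and points along +x.
target = resolve_entity(objects, "cup", (0.0, 0.0, 0.0), (1.0, 0.0, 0.0))
```

Here `target` is the cup at `(2.0, 0.0, 0.0)`: the other cup is 90 degrees away from the pointing direction, and the book is filtered out by its label.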

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method, the method comprising: receiving environment data representing a three-dimensional map of an environment; moving, by a device, to a first location in the environment; determining a configuration of a mechanical component of the device, the mechanical component comprising a camera; determining first position data representing the first location and the configuration; receiving, from the camera, first image data representing the environment; performing object detection using the first image data to determine an object; based on determining the object, determining first stored data corresponding to a previous location of the object; determining, using the environment data and the first position data, a first direction in which the camera is directed while the device is at the first location; determining a first bounding box corresponding to a portion of the first image data representing the object; based at least in part on the first position data and the first direction, determining the first bounding box corresponds to a second location; determining second position data corresponding to the second location; and determining second stored data corresponding to the object being located at the second location.

2. The computer-implemented method of claim 1, further comprising: receiving, by the device, first audio data representing speech of a user; performing speech processing on the first audio data to generate speech processing data; determining that the speech processing data indicates the object; and causing an action to be performed based at least in part on the second stored data.

3. The computer-implemented method of claim 2, further comprising: receiving, by the device, second image data including a second representation of the environment; processing the second image data to determine at least one of a first direction in which a face of the user is oriented or a second direction in which the user is pointing; and determining that at least one of the first direction or the second direction is associated with the second stored data, wherein determining that the speech processing data indicates the object is based at least in part on determining that at least one of the first direction or the second direction is associated with the second stored data.

4. A computer-implemented method, the method comprising: moving, by a device, to a first position in an environment; determining first position data representing the first position; receiving, from at least a first image capture component of the device, first image data of the environment; performing object detection using the first image data to determine an object; based on determining the object, determining first stored data corresponding to a previous position of the object; determining, using at least the first position data and the first image data, second position data corresponding to a current position of the object; determining second stored data corresponding to the current position of the object, receiving, by the device after determining the second stored data, first audio data representing speech of a first user; performing speech processing on the first audio data to generate speech processing data; determining that the speech processing data indicates the object; and causing an action to be performed based at least in part on the second stored data.

5. The computer-implemented method of claim 4, wherein the first position data comprises data representing a configuration of a mechanical component of the device, the mechanical component including the first image capture component.

6. The computer-implemented method of claim 4, further comprising including time data with the second stored data.

7. The computer-implemented method of claim 4, further comprising: performing user recognition using one or more of second image data or second audio data to determine a second user who interacted with the object; and including user data with the second stored data.

8. The computer-implemented method of claim 4, further comprising: receiving, by the device, second image data including a second representation of the environment; processing the second image data to determine at least one of a first direction in which a face of the first user is oriented or a second direction in which the first user is pointing; and determining that at least one of the first direction or the second direction is associated with the second stored data, wherein determining that the speech processing data indicates the object is based at least in part on determining that at least one of the first direction or the second direction is associated with the second stored data.

9. The computer-implemented method of claim 4, wherein causing the action to be performed further comprises: determining, using the speech processing data, output data indicating the previous position of the object; performing text-to-speech (TTS) processing using the output data to determine output audio data; and causing the device to playback the output audio data.

10. The computer-implemented method of claim 4, further comprising: receiving environment data representing a three-dimensional map of the environment; determining, using the environment data, a first direction in which the first image capture component is directed while the device is at the first position; determining a first bounding box corresponding to a portion of the first image data representing the object; and based at least in part on the position data and the first direction, determining that the first bounding box corresponds to the second position data.

11. The computer-implemented method of claim 4, further comprising: determining a first bounding box corresponding to a portion of the first image data representing the object, wherein the second stored data includes the second position data and data corresponding to the first bounding box.

12. The computer-implemented method of claim 4, wherein causing the action to be performed further comprises: determining, using the speech processing data and the second stored data, output data indicating the current position of the object; performing text-to-speech (TTS) processing using the output data to determine output audio data; and causing the device to playback the output audio data.

13. The computer-implemented method of claim 4, wherein causing the action to be performed further comprises: determining, using the speech processing data and the second stored data, a first location associated with the current position of the object; and moving, by the device, to the first location.

14. A system comprising: at least one processor; and memory including instructions operable to be executed by the at least one processor to cause the system to: move a device to a first position in an environment; determine first position data representing the first position, wherein the first position data comprises data representing a configuration of a mechanical component of the device, the mechanical component including a first image capture component; receive, from at least the first image capture component of the device, first image data of the environment; perform object detection using the first image data to determine an object; based on determining the object, determine first stored data corresponding to a previous position of the object; determine, using at least the first position data and the first image data, second position data corresponding to a current position of the o
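
Claim 1 recites combining the device's position, the camera's configuration, and a bounding box to determine where an object is located. A simplified top-down sketch of that kind of calculation follows. It is an illustration under stated assumptions, not the claimed implementation: the function name `bbox_to_map_location`, the pinhole camera model, the known horizontal field of view, and the externally supplied range estimate `range_m` (for example from a depth sensor) are all hypothetical and do not come from the claim text.

```python
import math


def bbox_to_map_location(device_xy, device_yaw, camera_pan, bbox,
                         image_width, hfov, range_m):
    """Estimate an object's 2D map location from one bounding box.

    device_xy   -- (x, y) of the device in map coordinates
    device_yaw  -- device heading in radians, counter-clockwise positive
    camera_pan  -- camera pan relative to the device, in radians
    bbox        -- (x_min, y_min, x_max, y_max) in pixels
    image_width -- image width in pixels
    hfov        -- horizontal field of view in radians (assumed known)
    range_m     -- assumed distance to the object in map units
    """
    x_min, _, x_max, _ = bbox
    center_x = (x_min + x_max) / 2.0
    # Pinhole model: focal length in pixels from the horizontal field of view.
    focal_px = (image_width / 2.0) / math.tan(hfov / 2.0)
    # Pixels to the right of the image center map to a clockwise offset.
    offset = math.atan((center_x - image_width / 2.0) / focal_px)
    bearing = device_yaw + camera_pan - offset
    return (device_xy[0] + range_m * math.cos(bearing),
            device_xy[1] + range_m * math.sin(bearing))


# Device at the origin facing +x, object centered in a 640-pixel-wide image.
centered = bbox_to_map_location((0.0, 0.0), 0.0, 0.0, (300, 100, 340, 200),
                                640, math.radians(90), 2.0)
```

With the bounding box centered in the image, the estimate lies straight ahead of the camera at the assumed range. A real system would also need the camera's tilt and height to recover full 3D coordinates.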

Assignees

Amazon Tech Inc

Inventors

Classifications

G06V40/20: Movements or behaviour, e.g. gesture recognition (recognition of facial expressions G06V40/16)
G06V40/171: Local features and components; Facial parts (eye characteristics G06V40/18); Occluding parts, e.g. glasses; Geometrical relationships
G06V40/193: Preprocessing; Feature extraction
G06V10/764: using classification, e.g. of video objects
G06V10/82: using neural networks

Patent family

Related publications grouped by family.

View patent family 93018338

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12117838B1 cover?: Described herein is a system for tracking objects and performing dynamic entity resolution using image data. For example, the system may build an environment map and populate the map with objects present in the environment. As the devices move about the environment it may capture image data and, based on its position and/or configuration of its components, may determine updated locations of obj…
Who is the assignee on this patent?: Amazon Tech Inc
What technology area does this patent fall under?: Primary CPC classification G06T7/73. Mapped technology areas include Physics.
When was this patent published?: Publication date Oct 15, 2024 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Semantic Sensing Analysis System

Obstacle recognition method for autonomous robots

User input processing method and electronic device supporting same

Endoscopic system and method for controlling the same

Fitness and sports applications for an autonomous unmanned aerial vehicle

Moving-object position estimating system, information processing apparatus and moving-object position estimating method

Multi-tracker object tracking
