Semantic Sensing Analysis System
US-2023418287-A1 · Dec 28, 2023 · US
US12117838B1 · US · B1
| Field | Value |
|---|---|
| Publication number | US-12117838-B1 |
| Application number | US-202117218621-A |
| Country | US |
| Kind code | B1 |
| Filing date | Mar 31, 2021 |
| Priority date | Mar 31, 2021 |
| Publication date | Oct 15, 2024 |
| Grant date | Oct 15, 2024 |
Official abstract text for this publication.
Described herein is a system for tracking objects and performing dynamic entity resolution using image data. For example, the system may build an environment map and populate the map with objects present in the environment. As the device moves about the environment, it may capture image data and, based on its position and/or the configuration of its components, may determine updated locations of objects that move in the environment. Upon receiving a query from a user, based on the location of the objects relative to the device/user, the system can interpret gestures and voice commands to infer which object is specified by the voice command. To build the environment map, the system performs object detection to generate bounding boxes associated with an object, then clusters the bounding boxes into a three-dimensional (3D) object associated with 3D coordinates. Because the system tracks the object using the 3D coordinates while maintaining two-dimensional (2D) information (e.g., bounding boxes and other features), the system can use existing 2D models to process objects in 3D.
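The localization step in the abstract (device position plus component configuration plus a 2D bounding box yields an updated 3D object location) can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the patent: the `DevicePose`, `BoundingBox`, and `ObjectRecord` types, the pinhole camera model, the default image size and field of view, and the assumption that a range to the object along the viewing ray is available (for example from a depth sensor or from intersecting the ray with the 3D map). Adding the pixel's angular offsets to the camera's pan and tilt is an approximation that is exact only at the image center.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Point3 = Tuple[float, float, float]

@dataclass
class DevicePose:
    """Hypothetical position data: device location plus camera mount configuration."""
    x: float      # metres, map frame
    y: float
    z: float      # camera height above the floor
    yaw: float    # device heading in the map frame, radians, counter-clockwise
    pan: float    # camera pan relative to the device, radians
    tilt: float   # camera tilt, radians, positive is up

@dataclass
class BoundingBox:
    x_min: float  # pixels, origin at the top-left of the image
    y_min: float
    x_max: float
    y_max: float

@dataclass
class ObjectRecord:
    label: str
    location: Point3
    previous_location: Optional[Point3] = None

def locate_object(box: BoundingBox, pose: DevicePose, range_m: float,
                  image_w: int = 640, image_h: int = 480,
                  hfov: float = math.radians(60)) -> Point3:
    """Estimate the map-frame location seen through the centre of a bounding box."""
    cx = (box.x_min + box.x_max) / 2
    cy = (box.y_min + box.y_max) / 2
    focal_px = (image_w / 2) / math.tan(hfov / 2)   # pinhole focal length in pixels
    az_off = math.atan2(cx - image_w / 2, focal_px)  # positive: right of image centre
    el_off = math.atan2(image_h / 2 - cy, focal_px)  # positive: above image centre
    azimuth = pose.yaw + pose.pan - az_off           # right in the image is clockwise
    elevation = pose.tilt + el_off
    return (pose.x + range_m * math.cos(elevation) * math.cos(azimuth),
            pose.y + range_m * math.cos(elevation) * math.sin(azimuth),
            pose.z + range_m * math.sin(elevation))

def update_object(store: Dict[str, ObjectRecord], label: str,
                  location: Point3) -> ObjectRecord:
    """Record a new location, keeping the last known one as the previous location."""
    old = store.get(label)
    record = ObjectRecord(label, location, old.location if old else None)
    store[label] = record
    return record

if __name__ == "__main__":
    store: Dict[str, ObjectRecord] = {}
    update_object(store, "mug", (0.0, 0.0, 0.0))
    pose = DevicePose(x=1.0, y=2.0, z=1.0, yaw=math.pi / 2, pan=0.0, tilt=0.0)
    seen = locate_object(BoundingBox(300, 220, 340, 260), pose, range_m=3.0)
    print(update_object(store, "mug", seen))
```

Keeping the bounding box alongside the 3D coordinates, as the abstract describes, would let an existing 2D model keep operating on the image crop while the map tracks the object in 3D; this sketch stores only the coordinates for brevity.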
Opening claim text (preview).
What is claimed is:
1. A computer-implemented method, the method comprising: receiving environment data representing a three-dimensional map of an environment; moving, by a device, to a first location in the environment; determining a configuration of a mechanical component of the device, the mechanical component comprising a camera; determining first position data representing the first location and the configuration; receiving, from the camera, first image data representing the environment; performing object detection using the first image data to determine an object; based on determining the object, determining first stored data corresponding to a previous location of the object; determining, using the environment data and the first position data, a first direction in which the camera is directed while the device is at the first location; determining a first bounding box corresponding to a portion of the first image data representing the object; based at least in part on the first position data and the first direction, determining the first bounding box corresponds to a second location; determining second position data corresponding to the second location; and determining second stored data corresponding to the object being located at the second location.
2. The computer-implemented method of claim 1, further comprising: receiving, by the device, first audio data representing speech of a user; performing speech processing on the first audio data to generate speech processing data; determining that the speech processing data indicates the object; and causing an action to be performed based at least in part on the second stored data.
3. The computer-implemented method of claim 2, further comprising: receiving, by the device, second image data including a second representation of the environment; processing the second image data to determine at least one of a first direction in which a face of the user is oriented or a second direction in which the user is pointing; and determining that at least one of the first direction or the second direction is associated with the second stored data, wherein determining that the speech processing data indicates the object is based at least in part on determining that at least one of the first direction or the second direction is associated with the second stored data.
4. A computer-implemented method, the method comprising: moving, by a device, to a first position in an environment; determining first position data representing the first position; receiving, from at least a first image capture component of the device, first image data of the environment; performing object detection using the first image data to determine an object; based on determining the object, determining first stored data corresponding to a previous position of the object; determining, using at least the first position data and the first image data, second position data corresponding to a current position of the object; determining second stored data corresponding to the current position of the object, receiving, by the device after determining the second stored data, first audio data representing speech of a first user; performing speech processing on the first audio data to generate speech processing data; determining that the speech processing data indicates the object; and causing an action to be performed based at least in part on the second stored data.
5. The computer-implemented method of claim 4, wherein the first position data comprises data representing a configuration of a mechanical component of the device, the mechanical component including the first image capture component.
6. The computer-implemented method of claim 4, further comprising including time data with the second stored data.
7. The computer-implemented method of claim 4, further comprising: performing user recognition using one or more of second image data or second audio data to determine a second user who interacted with the object; and including user data with the second stored data.
8. The computer-implemented method of claim 4, further comprising: receiving, by the device, second image data including a second representation of the environment; processing the second image data to determine at least one of a first direction in which a face of the first user is oriented or a second direction in which the first user is pointing; and determining that at least one of the first direction or the second direction is associated with the second stored data, wherein determining that the speech processing data indicates the object is based at least in part on determining that at least one of the first direction or the second direction is associated with the second stored data.
9. The computer-implemented method of claim 4, wherein causing the action to be performed further comprises: determining, using the speech processing data, output data indicating the previous position of the object; performing text-to-speech (TTS) processing using the output data to determine output audio data; and causing the device to playback the output audio data.
10. The computer-implemented method of claim 4, further comprising: receiving environment data representing a three-dimensional map of the environment; determining, using the environment data, a first direction in which the first image capture component is directed while the device is at the first position; determining a first bounding box corresponding to a portion of the first image data representing the object; and based at least in part on the position data and the first direction, determining that the first bounding box corresponds to the second position data.
11. The computer-implemented method of claim 4, further comprising: determining a first bounding box corresponding to a portion of the first image data representing the object, wherein the second stored data includes the second position data and data corresponding to the first bounding box.
12. The computer-implemented method of claim 4, wherein causing the action to be performed further comprises: determining, using the speech processing data and the second stored data, output data indicating the current position of the object; performing text-to-speech (TTS) processing using the output data to determine output audio data; and causing the device to playback the output audio data.
13. The computer-implemented method of claim 4, wherein causing the action to be performed further comprises: determining, using the speech processing data and the second stored data, a first location associated with the current position of the object; and moving, by the device, to the first location.
14. A system comprising: at least one processor; and memory including instructions operable to be executed by the at least one processor to cause the system to: move a device to a first position in an environment; determine first position data representing the first position, wherein the first position data comprises data representing a configuration of a mechanical component of the device, the mechanical component including a first image capture component; receive, from at least the first image capture component of the device, first image data of the environment; perform object detection using the first image data to determine an object; based on determining the object, determine first stored data corresponding to a previous position of the object; determine, using at least the first position data and the first image data, second position data corresponding to a current position of the o
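Claims 3 and 8 tie the spoken reference to a direction in which the user is facing or pointing. One plausible way to picture that association, offered only as a sketch, is to pick the stored object whose label matches the speech processing result and whose position lies closest to the pointing ray. The function names, the `(label, location)` store layout, the exact-match label comparison, and the 20 degree tolerance are all assumptions made for illustration; the claims do not specify how the association is computed.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

Point3 = Tuple[float, float, float]

def angle_between(u: Sequence[float], v: Sequence[float]) -> float:
    """Angle in radians between two 3D vectors; pi if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return math.pi
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / (norm_u * norm_v))))

def resolve_reference(spoken_label: str,
                      user_position: Point3,
                      pointing_direction: Point3,
                      stored_objects: Dict[str, Tuple[str, Point3]],
                      max_angle: float = math.radians(20)) -> Optional[str]:
    """Return the id of the matching object nearest the pointing ray, or None."""
    best_id: Optional[str] = None
    best_angle = max_angle
    for object_id, (label, location) in stored_objects.items():
        if label != spoken_label:
            continue  # speech processing data indicates a different kind of object
        to_object = tuple(l - p for l, p in zip(location, user_position))
        angle = angle_between(pointing_direction, to_object)
        if angle <= best_angle:
            best_id, best_angle = object_id, angle
    return best_id

if __name__ == "__main__":
    # Hypothetical stored data: object id -> (label, last known 3D location).
    stored = {
        "lamp_1": ("lamp", (2.0, 0.0, 1.0)),
        "lamp_2": ("lamp", (0.0, 2.0, 1.0)),
        "mug_1": ("mug", (2.0, 0.1, 1.0)),
    }
    print(resolve_reference("lamp", (0.0, 0.0, 1.0), (1.0, 0.0, 0.0), stored))
```

The same comparison could use a face-orientation vector in place of the pointing vector, since the claims treat either direction as sufficient.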
Movements or behaviour, e.g. gesture recognition (recognition of facial expressions G06V40/16) · CPC title
Local features and components; Facial parts (eye characteristics G06V40/18); Occluding parts, e.g. glasses; Geometrical relationships · CPC title
Preprocessing; Feature extraction · CPC title
using classification, e.g. of video objects · CPC title
using neural networks · CPC title