Spatio-temporal consistency embeddings from multiple observed modalities

US12277483B2 · US · B2

Patent metadata
Publication number: US-12277483-B2
Application number: US-202318126557-A
Country: US
Kind code: B2
Filing date: Mar 27, 2023
Priority date: Apr 1, 2021
Publication date: Apr 15, 2025
Grant date: Apr 15, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Provided is a process that includes obtaining data indicative of state of a dynamic mechanical system and an environment of the dynamic mechanical system, the data comprising a plurality of channels of data from a plurality of different sensors including a plurality of cameras and other sensors indicative of state of actuators of the dynamic mechanical system; forming a training set from the obtained data by segmenting the data by time and grouping segments from the different channels by time to form units of training data that span different channels among the plurality of channels; training a metric learning model to encode inputs corresponding to the plurality of channels as vectors in an embedding space with self-supervised learning based on the training set; and using the trained metric learning model to control the dynamic mechanical system or another dynamic mechanical system.
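The training-set step in the abstract (segment each channel by time, then group segments from different channels that share a time span) can be illustrated with a minimal sketch. This is not the patent's implementation: the function name `form_training_units`, the fixed window width `window_s`, the channel names, and the choice to drop windows that lack any channel are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical sample type: (timestamp in seconds, reading).
Sample = Tuple[float, object]


def form_training_units(
    channels: Dict[str, List[Sample]],
    window_s: float,
) -> List[Dict[str, List[object]]]:
    """Segment each channel by time and group the segments across channels.

    Each returned unit maps a channel name to the readings that fell in the
    same time window. Windows missing any channel are dropped (an assumption
    of this sketch, not something the abstract specifies).
    """
    if window_s <= 0:
        raise ValueError("window_s must be positive")

    # window index -> channel name -> readings observed in that window
    buckets: Dict[int, Dict[str, List[object]]] = defaultdict(dict)
    for name, samples in channels.items():
        for timestamp, reading in samples:
            index = int(timestamp // window_s)
            buckets[index].setdefault(name, []).append(reading)

    units = []
    for index in sorted(buckets):
        unit = buckets[index]
        if len(unit) == len(channels):  # keep windows that span every channel
            units.append(unit)
    return units


# Made-up readings from two cameras and one actuator sensor.
channels = {
    "camera_left": [(0.00, "L0"), (0.50, "L1"), (1.10, "L2")],
    "camera_right": [(0.05, "R0"), (0.55, "R1"), (1.20, "R2")],
    "servo_position": [(0.10, 0.12), (0.60, 0.15), (2.30, 0.40)],
}
units = form_training_units(channels, window_s=1.0)
print(units)
```

With a one-second window, only the first window contains readings from all three channels, so a single unit is produced. A real system would likely need per-channel window widths and resampling, which this sketch leaves out.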

First claim

Opening claim text (preview).

The invention claimed is:

1. A computer-implemented method comprising: obtaining input data representative of a state of a robot system relative to an environment, the robot system comprising a plurality of types of sensors, wherein the input data comprises a temporal sequence of sensor data for each of a plurality of different channels of sensor data reporting properties sensed by respective types of sensors; forming a set of training records from the input data, wherein each training record includes temporally matched sensor data selected from across the plurality of different channels of sensor data, wherein each type of sensor outputs a corresponding one of the different channels of sensor data; training a first model to encode inputs corresponding to the plurality of different channels of sensor data as vectors in an embedding space with self-supervised learning based on the set of training records, wherein the training comprises iteratively adjusting parameters of the first model based on outputs of an objective function, wherein the objective function causes the parameters of the first model to be adjusted in directions that cause vectors in the embedding space to encode temporal consistency of properties sensed by the respective types of sensors in the set of training records; causing the robot system to attempt to perform a task using the first model; and updating the first model based on the performance of the robot system on the attempt to perform the task using the first model.

2. The computer-implemented method of claim 1, further comprising training a second model to classify states within the embedding space with reinforcement learning based on outputs of the first model.

3. The computer-implemented method of claim 2, wherein causing the robot system to attempt to perform the task using the first model comprises: outputting a first embedding vector from the first model; determining, by the second model, a next state for the task in a sequence of states that correspond to the task based on a location of the first embedding vector relative to the sequence of states; and outputting one or more commands for one or more components of the robot system in a direction that minimizes a distance between a current state of the task and the next state of the task.

4. The computer-implemented method of claim 1, further comprising: determining, for each temporal sequence of sensor data, a width of a temporal window from which to select sensor data records; and segmenting each temporal sequence of sensor data based on the respective temporal window width.

5. The computer-implemented method of claim 4, further comprising iteratively selecting from each temporal sequence of sensor data a set of sensor data records within a respective temporal window to form the set of training records.

6. The computer-implemented method of claim 1, wherein iteratively adjusting parameters of the first model based on the outputs of the objective function comprises: iteratively adjusting parameters of a distance metric to maximize distance between output vectors for dissimilar training records and minimize distance between output vectors for similar training records.

7. The computer-implemented method of claim 1, wherein training the first model to encode inputs corresponding to the plurality of different channels of sensor data as vectors in an embedding space based on the set of training records comprises: forming the set of training records that span the plurality of different channels of sensor data by selecting, for each training record, a subset of sensor data records from each of the sets of sensor data records for each channel that occurred over same unit of time; and selecting subsets of sensor data records that include multiple sensor data records in temporal sequence.

8. The computer-implemented method of claim 7, further comprising: selecting adjustments to parameters that cause vectors corresponding to temporal sequences of training records to embed at locations for which transitions between locations maintain temporally consistent properties.

9. The computer-implemented method of claim 1, wherein obtaining input data representative of a state of a robot system relative to the environment comprises: issuing a sequence of commands to the robot system; and recording values outputted on the plurality of different channels based on a response of the robot system to the sequence of commands.

10. A system comprising: a robot system comprising a plurality of types of sensors and a plurality of different channels of sensor data reporting properties sensed by respective types of sensors; one or more processing units coupled to memory; and one or more computer readable storage media storing instructions that when executed cause the one or more processing units to perform operations comprising: obtaining input data comprising temporal sequences of sensor data for the plurality of different channels of sensor data from the robot system; forming a set of training records from the input data, wherein each training record includes temporally matched sensor data selected from across the plurality of different channels of sensor data, wherein each type of sensor outputs a corresponding one of the different channels of sensor data; training a first model to encode inputs corresponding to the plurality of different channels of sensor data as vectors in an embedding space with self-supervised learning based on the set of training records, wherein the training comprises iteratively adjusting parameters of the first model based on outputs of an objective function, wherein the objective function causes the parameters of the first model to be adjusted in directions that cause vectors in the embedding space to encode temporal consistency of properties sensed by the respective types of sensors in the set of training records; causing the robot system to attempt to perform a task using the first model; and updating the first model based on the performance of the robot system on the attempt to perform the task using the first model.

11. The system of claim 10, wherein two or more of the respective types of sensors are selected from a video camera, an infrared camera, a depth camera, a touch sensor, a strain sensor, a position sensor, and a sensor of a servo or stepper motor.

12. The system of claim 10, wherein the plurality of different channels of sensor data comprises a first channel comprising image data from a first camera in a first position and orientation, a second channel comprising image data from a second camera in a second position and orientation different than that of the first camera, and third channel comprising data from a sensor selected from a LiDAR sensor, a touch sensor, a strain sensor, a position sensor, and a sensor of a servo or stepper motor.

13. The system of claim 10, wherein the operations further comprise training a second model to classify states within the embedding space with reinforcement learning based on outputs of the first model.

14. The system of claim 13, wherein causing the robot system to attempt to perform a task using the first model comprises: outputting a first embedding vector from the first model; determining, by the second model, a next state for the task in a sequence of states that correspond to the task based on a location of the first embedding vector relative to the sequence of states; and outputting one or more commands for one or more components of the robot system in a direction that minimizes a distance between a current state of the task and the next state of the task.
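Two ideas in the claims lend themselves to a small illustration: the objective in claim 6, which pulls similar training records together and pushes dissimilar ones apart in the embedding space, and the control step in claims 3 and 14, which moves from the current state toward the next state in a task sequence. The sketch below is an assumption-laden stand-in, not the claimed method. It uses a triplet margin loss as one common example of such an objective, a nearest-neighbor lookup in place of the claimed second model, and a plain step in embedding space in place of real robot commands. All function names and the example vectors are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 1.0) -> float:
    """Triplet margin loss, used here as an assumed example objective.

    The loss is zero once the dissimilar (negative) vector is at least
    `margin` farther from the anchor than the similar (positive) vector.
    """
    return max(0.0, distance(anchor, positive) - distance(anchor, negative) + margin)


def next_state_index(embedding: Vector, states: Sequence[Vector]) -> int:
    """Pick the state that follows the one nearest to `embedding`.

    Nearest-neighbor lookup is a stand-in for the second model in the claims.
    """
    nearest = min(range(len(states)), key=lambda i: distance(embedding, states[i]))
    return min(nearest + 1, len(states) - 1)


def step_toward(current: Vector, target: Vector, step_size: float) -> List[float]:
    """Move `current` toward `target` by at most `step_size` in embedding space."""
    gap = distance(current, target)
    if gap <= step_size:
        return list(target)
    scale = step_size / gap
    return [c + scale * (t - c) for c, t in zip(current, target)]


# Anchor and positive: embeddings of temporally matched records (made up).
anchor, positive = [0.0, 0.0], [0.3, 0.4]
far_negative, near_negative = [3.0, 4.0], [0.6, 0.8]
loss_far = triplet_loss(anchor, positive, far_negative)
loss_near = triplet_loss(anchor, positive, near_negative)

# A hypothetical task expressed as a sequence of states in embedding space.
task_states = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
target_index = next_state_index([0.9, 0.1], task_states)
moved = step_toward([1.0, 0.0], task_states[target_index], step_size=0.25)
print(loss_far, loss_near, target_index, moved)
```

In a trained system the vectors would come from the first model, and the step toward the next state would be translated into actuator commands by a controller. Both are outside the scope of this sketch.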

Assignees

Sanctuary Cognitive Systems Corp

Inventors

Classifications

  • Reinforcement learning · CPC title

  • Transfer learning · CPC title

  • Auto-encoder networks; Encoder-decoder networks · CPC title

  • Convolutional networks [CNN, ConvNet] · CPC title

  • Weakly supervised learning, e.g. semi-supervised or self-supervised learning · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12277483B2 cover?
Provided is a process that includes obtaining data indicative of state of a dynamic mechanical system and an environment of the dynamic mechanical system, the data comprising a plurality of channels of data from a plurality of different sensors including a plurality of cameras and other sensors indicative of state of actuators of the dynamic mechanical system; forming a training set from the ob…
Who is the assignee on this patent?
Sanctuary Cognitive Systems Corp
What technology area does this patent fall under?
Primary CPC classification G06N20/00. Mapped technology areas include Physics.
When was this patent published?
Publication date Apr 15, 2025 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 6 related publications on this page (citations in our corpus or others sharing the same primary CPC).