System and method for vision-based event detection

US12524503B1 · US · B1

Patent metadata
Publication number: US-12524503-B1
Application number: US-202318525398-A
Country: US
Kind code: B1
Filing date: Nov 30, 2023
Priority date: Feb 11, 2019
Publication date: Jan 13, 2026
Grant date: Jan 13, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

This disclosure describes systems and techniques for identifying events that occur within an environment using image data captured at the environment. For example, one or more cameras may generate image data representative of a user interacting with an item on the shelf. This image data may be used to generate feature data associated with the user and the item, which may be analyzed by one or more classifiers for identifying an interaction between the user and the item. The systems and techniques may then generate interaction data, which in turn may be analyzed by one or more additional classifiers for identifying an event, such as the user picking a particular item from the shelf within the environment. Event data indicative of the event may then be used to update a virtual cart of the user.
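The staged flow in the abstract (image data, then feature data, then interaction data, then event data, then a cart update) can be sketched as below. This is a minimal illustration, not the patented implementation: the class names, field names, and the `run_pipeline` helper are hypothetical, and the three stages are passed in as plain callables standing in for the feature generator and the classifiers.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional, Sequence


@dataclass
class Interaction:
    """Hypothetical shape for the output of the first classification stage."""
    start: float      # start of the time range, in seconds
    end: float        # end of the time range, in seconds
    location: str     # e.g. a shelf identifier


@dataclass
class Event:
    """Hypothetical shape for the output of the second classification stage."""
    action: str       # "pick" or "return" in this toy example
    item_id: str
    location: str


@dataclass
class VirtualCart:
    items: Dict[str, int] = field(default_factory=dict)

    def apply(self, event: Event) -> None:
        """Adjust the item count according to a single event."""
        count = self.items.get(event.item_id, 0)
        if event.action == "pick":
            self.items[event.item_id] = count + 1
        elif event.action == "return" and count > 0:
            if count == 1:
                del self.items[event.item_id]
            else:
                self.items[event.item_id] = count - 1


def run_pipeline(
    frames: Sequence[Any],
    extract_features: Callable[[Sequence[Any]], Any],
    detect_interaction: Callable[[Any], Optional[Interaction]],
    classify_event: Callable[[Interaction, VirtualCart], Optional[Event]],
    cart: VirtualCart,
) -> Optional[Event]:
    """Chain the stages; each callable is a stand-in for a trained model."""
    features = extract_features(frames)
    interaction = detect_interaction(features)
    if interaction is None:
        return None
    # The event stage also sees the current cart state.
    event = classify_event(interaction, cart)
    if event is not None:
        cart.apply(event)
    return event


# Usage with stub stages (all values are made up for illustration).
cart = VirtualCart()
event = run_pipeline(
    frames=["frame0", "frame1"],
    extract_features=lambda frames: {"hand_near_shelf": True},
    detect_interaction=lambda f: Interaction(0.0, 1.5, "shelf-A")
    if f["hand_near_shelf"] else None,
    classify_event=lambda i, c: Event("pick", "item-123", i.location),
    cart=cart,
)
```

Passing the stages in as callables keeps the sketch independent of any particular model or framework; only the order of the stages and the data handed between them follow the abstract.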

First claim

Opening claim text (preview).

What is claimed is:

1. A computer device comprising: one or more processors; memory; a first component stored in the memory and executable on the one or more processors, wherein the first component is configured to: receive image data from one or more cameras in an environment, wherein the image data represents a first user interacting with a first item in the environment; and generate, using the image data, feature data associated with at least one of the first user or the first item, wherein the feature data comprises at least one segmentation map representing a plurality of regions of pixels of the image data and labels for each of the plurality of regions of pixels, and wherein at least one of the labels corresponds to a portion of the first user; a second component stored in the memory and executable on the one or more processors, wherein the second component is configured to: receive the feature data; input the feature data into a first artificial neural network that has been trained to determine whether the feature data represents the interaction between the first user and the first item; and generate, using the first artificial neural network, first data indicating at least a first time range of the interaction and a first location of the interaction; a third component stored in the memory and executable on the one or more processors, wherein the third component is configured to: receive the first data; input the first data into a second artificial neural network that has been trained to determine whether data indicating a time range and a location represents at least one predefined activity of a plurality of predefined activities; determine, using the second artificial neural network, that the first data represents a first predefined activity of the plurality of predefined activities; receive second data associated with a virtual cart of the first user; determine, based at least in part on the first data and the second data, a first action taken by the first user with respect to the first item and a first identity of the first item; and generate, based at least in part on the first data and the second data, third data indicating at least a second time range of the first predefined activity, a second location of the first predefined activity, the first action taken by the first user with respect to the first item, and the first identity of the first item; a fourth component stored in the memory and executable on the one or more processors, wherein the fourth component is configured to: receive the third data; and automatically update the virtual cart based at least in part on the third data to indicate the first identity of the first item and the first action taken with respect to the first item.

2. The computer device of claim 1, wherein the at least one segmentation map indicates that a first region of pixels of the image data depicts a hand of the first user, wherein the feature data further comprises: a customer-interaction score map indicating that a second region of pixels of the image data represents an interaction between the hand of the first user and the first item; or at least one of a direction or velocity of the hand of the first user in the image data.

3. The computer device of claim 1, wherein the third component is further configured to: input the first data and the second data into a third artificial neural network to determine the first location of the first predefined activity, the first action taken with respect to the first item, and the first identity of the first item; and generate, using the third artificial neural network, the third data indicating at least the second time range of the first predefined activity, the second location of the first predefined activity, the first action taken by the first user with respect to the first item, and the first identity of the first item.

4. The computer device of claim 1, wherein the third component is further configured to: identify a second predefined activity based at least in part on the first data; and generate fourth data indicative of the second predefined activity, the fourth data indicating at least a third time range of the second predefined activity, a third location of the second predefined activity, a second action taken by a second user with respective to a second item, and a second identity of the second item.

5. The computer device of claim 1, wherein the feature data indicates a state of a hand of the first user, and wherein the state of the hand indicates at least one of: that the hand of the first user was empty while moving toward an inventory location and that the hand of the first user held the first item while moving away from the inventory location; that the hand of the first user held the first item while moving toward the inventory location and the hand of the first user was empty while moving away from the inventory location; that the hand of the first user was empty while moving toward the inventory location and the hand of the first user was empty while moving away from the inventory location; or that the hand of the first user held the first item while moving toward the inventory location and the hand of the first user held the first item while moving away from the inventory location.

6. A method comprising: generating, using one or more cameras in an environment that includes inventory locations holding items, image data representing a first user in the environment; generating feature data based at least in part on the image data, wherein the feature data comprises a segmentation map, and wherein the segmentation map represents a plurality of sets of pixels of the image data and labels for each of the plurality of sets of pixels; inputting the feature data into a first artificial neural network that has been trained to determine whether feature data represents an interaction between a user and an item; generating, using the first artificial neural network based at least in part on the feature data, first interaction data indicative of a first interaction between the first user and a first item, wherein the interaction data indicates a time range and at least one inventory location of the environment; inputting the first interaction data into a second artificial neural network that has been trained to determine whether the interaction data corresponds to one of a plurality of predefined activities in the environment; determining, using the second artificial neural network based at least in part on the interaction data, that the first interaction data corresponds to a first predefined activity in the environment, wherein the first predefined activity comprises a first action performed by the first user with respect to the first item at the at least one inventory location; receiving virtual cart data representing a current state of a virtual cart of the first user; generating, based at least in part on the interaction data and the virtual cart data, first event data indicating the first action performed by the first user and a first identity of the first item; and automatically updating the virtual cart of the first user to indicate the first action performed by the first user and the first identity of the first item based at least in part on the first event data.

7. The method of claim 6, further comprising: inputting the interaction data and the virtual cart data into a third artificial neural network; and generating the first event data using the third artificial neural network based at least in part on the interaction data and the virtual cart data.

8. The method of claim 6, wherein the feature data further comprises a customer-interaction score map, wherein the
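Claim 5 enumerates four combinations of hand state (empty or holding the item) while moving toward and away from an inventory location. The lookup below sketches one way to turn such a pair into an action label. The labels "pick", "return", and "no_change" are assumptions made for illustration; the claim lists the state pairs only and does not name the resulting actions, and the `Hand` enum and `infer_action` function are hypothetical.

```python
from enum import Enum


class Hand(Enum):
    EMPTY = "empty"
    HOLDING = "holding"


# Keys are (state moving toward the shelf, state moving away from it).
# The action labels are illustrative assumptions, not claim language.
ACTION_BY_STATES = {
    (Hand.EMPTY, Hand.HOLDING): "pick",
    (Hand.HOLDING, Hand.EMPTY): "return",
    (Hand.EMPTY, Hand.EMPTY): "no_change",
    (Hand.HOLDING, Hand.HOLDING): "no_change",
}


def infer_action(toward: Hand, away: Hand) -> str:
    """Map a pair of hand states to an action label."""
    return ACTION_BY_STATES[(toward, away)]
```

For example, `infer_action(Hand.EMPTY, Hand.HOLDING)` yields the "pick" label. In the claims this determination is made by trained networks over feature data, so a fixed table like this is only a way to read the four cases.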

Assignees

Amazon Tech Inc

Inventors

Classifications

  • Locating goods or services, e.g. based on physical position of the goods or services within a shopping facility · CPC title

  • comprising goods or services offered by multiple providers, e.g. multiple merchants · CPC title

  • G06F18/241 (Primary)

    relating to the classification model, e.g. parametric or non-parametric approaches · CPC title

  • Recognition of hand or arm movements, e.g. recognition of deaf sign language (static hand signs G06V40/113) · CPC title

  • Extraction of image or video features · CPC title
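A CPC symbol such as G06F18/241 is hierarchical: a section letter, a two-digit class, a subclass letter, a main group, and a subgroup after the slash. The sketch below splits a symbol into those parts and looks up the section title, which is why a symbol starting with G is reported under Physics. The regular expression is a simplified assumption about the symbol format, and `parse_cpc` and `CpcSymbol` are hypothetical names, not part of any official tool.

```python
import re
from typing import NamedTuple

# Top-level CPC section titles.
CPC_SECTIONS = {
    "A": "Human Necessities",
    "B": "Performing Operations; Transporting",
    "C": "Chemistry; Metallurgy",
    "D": "Textiles; Paper",
    "E": "Fixed Constructions",
    "F": "Mechanical Engineering; Lighting; Heating; Weapons; Blasting",
    "G": "Physics",
    "H": "Electricity",
    "Y": "General Tagging of New Technological Developments",
}


class CpcSymbol(NamedTuple):
    section: str
    cls: str
    subclass: str
    main_group: str
    subgroup: str


# Simplified pattern: section, 2-digit class, subclass letter, group/subgroup.
_CPC_RE = re.compile(r"^([A-HY])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,6})$")


def parse_cpc(symbol: str) -> CpcSymbol:
    """Split a CPC symbol into its hierarchical parts."""
    match = _CPC_RE.match(symbol.strip())
    if match is None:
        raise ValueError(f"not a recognizable CPC symbol: {symbol!r}")
    return CpcSymbol(*match.groups())


primary = parse_cpc("G06F18/241")
section_title = CPC_SECTIONS[primary.section]
```

Here `primary` has section "G", class "06", subclass "F", main group "18", and subgroup "241", and `section_title` is "Physics".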

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12524503B1 cover?
This disclosure describes systems and techniques for identifying events that occur within an environment using image data captured at the environment. For example, one or more cameras may generate image data representative of a user interacting with an item on the shelf. This image data may be used to generate feature data associated with the user and the item, which may be analyzed by one or m…
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G06F18/241. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 13, 2026 (kind code B1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).