US12079769B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12079769-B2 |
| Application number | US-202217741110-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 10, 2022 |
| Priority date | Jun 26, 2020 |
| Publication date | Sep 3, 2024 |
| Grant date | Sep 3, 2024 |
Official abstract text for this publication.
Automated techniques provide for recalibrating cameras in a real space in which puts and takes of items are tracked. The method includes first processing one or more selected images selected from a plurality of sequences of images received from a plurality of cameras calibrated using a set of calibration images that were used to calibrate the cameras previously. The first processing includes a process step to match one or more features from the selected images with features extracted from the set of calibration images using a trained neural network classifier. The features correspond to points located at displays or structures that remain substantially immobile. Camera calibrations can be updated when transform information between features matched meets or exceeds a threshold.
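The update rule in the abstract (recalibrate only when the transformation meets or exceeds a threshold) can be illustrated with a minimal sketch. It assumes each camera pose is stored as a 3x3 rotation matrix and a translation vector in meters, and it borrows the 1 centimeter and 1 degree figures from claims 12 and 13 as default thresholds. The function names, the pose representation, and the choice to combine the two thresholds with a logical OR are hypothetical and not taken from the patent.

```python
import math


def rotation_angle_deg(r_old, r_new):
    """Angle, in degrees, of the relative rotation between two 3x3 rotation matrices."""
    # trace(R_old^T @ R_new) equals the sum of element-wise products.
    trace = sum(r_old[i][j] * r_new[i][j] for i in range(3) for j in range(3))
    # For a rotation by theta, trace = 1 + 2*cos(theta); clamp against rounding error.
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_theta))


def translation_change_m(t_old, t_new):
    """Euclidean distance, in meters, between two camera positions."""
    return math.dist(t_old, t_new)


def needs_recalibration(r_old, t_old, r_new, t_new,
                        min_translation_m=0.01, min_rotation_deg=1.0):
    """Return True when the pose change meets or exceeds either threshold.

    Defaults mirror the 1 cm and 1 degree values in claims 12 and 13;
    combining them with OR is an assumption made for this sketch.
    """
    return (translation_change_m(t_old, t_new) >= min_translation_m
            or rotation_angle_deg(r_old, r_new) >= min_rotation_deg)


def rotation_about_z(degrees):
    """Helper for the usage example: rotation matrix about the z axis."""
    a = math.radians(degrees)
    return [[math.cos(a), -math.sin(a), 0.0],
            [math.sin(a), math.cos(a), 0.0],
            [0.0, 0.0, 1.0]]


identity = rotation_about_z(0.0)
# A 2 degree rotation with no translation exceeds the rotation threshold.
rotated_flag = needs_recalibration(identity, (0, 0, 0), rotation_about_z(2.0), (0, 0, 0))
# A 5 mm shift with no rotation stays below both thresholds.
small_shift_flag = needs_recalibration(identity, (0, 0, 0), identity, (0.005, 0, 0))
```

Comparing against the stored calibration rather than against the previous frame keeps small drifts from accumulating unnoticed, which appears to be the intent of matching against the original set of calibration images.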
Opening claim text (preview).
What is claimed is:

1. A method for recalibrating cameras in a real space for tracking puts and takes of items by subjects, the method including: first processing one or more images selected from a plurality of sequences of images received from a plurality of cameras, in which selected images in the plurality of sequences of images have respective fields of view in the real space, to: match one or more features corresponding to points located at displays or relatively immobile structures extracted from the selected images using a trained neural network classifier with features from a set of calibration images; obtain based upon features as matched, transformation information between the selected images and the set of calibration images; and update calibration of a camera with the transformation information whenever the transformation information for the camera meets or exceeds a first threshold.

2. The method of claim 1, wherein the trained neural network classifier has been trained using a synthetic shapes dataset created by a second neural network.

3. The method of claim 2, wherein the second neural network has been trained using a plurality of synthetic shapes having no ambiguity in interest point locations, wherein the synthetic shapes comprise three-dimensional models created automatically, and a plurality of viewpoints generated for the three-dimensional models for matching features; and wherein three-dimensional models are finetuned by data collected from like real space environments having matching features annotated between different images captured from different viewpoints.

4. The method of claim 1, wherein feature descriptors corresponding to points located at displays or structures that remain substantially immobile are extracted using a scale invariant feature transform.

5. The method of claim 1, further including second processing sequences of images of the plurality of sequences of images, to track puts and takes of items by subjects within respective fields of view in the real space; and wherein first processing and second processing occur substantially contemporaneously, thereby enabling cameras to be calibrated without clearing subjects from the real space or interrupting tracking puts and takes of items by subjects.

6. The method of claim 5, wherein second processing at least one sequence of images of the plurality of sequences of images to track a take or put event, further includes, detecting the take or put event using a trained neural network.

7. The method of claim 6, wherein second processing to track puts and takes of items by subjects includes tracking inventory caches involved in an exchange that move over time having locations in three dimensions.

8. The method of claim 7, wherein locations of the inventory caches include locations corresponding to hands of identified subjects, and wherein processing the plurality of sequences of images includes using an image recognition engine to detect an inventory item in hands of a subject identified in the exchange as detected.

9. The method of claim 5, wherein second processing at least one sequence of images of the plurality of sequences of images to track a take or put event, further including, detecting the take or put event using a trained random forest.

10. The method of claim 1, further including storing the transformation information and images used to calibrate the cameras in a database.

11. The method of claim 1, wherein the transformation information is determined relative to an origin point that is selected as a reference point for calibration.

12. The method of claim 1, wherein updating calibration of a camera with the transformation information further includes updating calibration of a camera with the transformation information whenever the transformation information obtained for the camera meets or exceeds a second threshold of at least a 1 centimeter change in camera translation value.

13. The method of claim 1, wherein updating calibration of a camera with the transformation information further includes updating calibration of a camera with the transformation information whenever the transformation information obtained for the camera meets or exceeds a third threshold of at least a 1 degree change in camera rotation value.

14. A system including one or more processors and memory accessible by the processors, the memory loaded with computer instructions recalibrating cameras in a real space for tracking puts and takes of items by subjects between inventory caches which can act as at least one of sources and sinks of inventory items in exchanges of inventory items, which computer instructions, when executed on the processors, implement actions comprising: first processing one or more images selected from a plurality of sequences of images received from a plurality of cameras, in which selected images in the plurality of sequences of images have respective fields of view in the real space, to: match one or more features corresponding to points located at displays or relatively immobile structures extracted from the selected images using a trained neural network classifier with features from a set of calibration images; obtain based upon features as matched, transformation information between the selected images and the set of calibration images; and update calibration of a camera with the transformation information whenever the transformation information for the camera meets or exceeds a threshold.

15. The system of claim 14, wherein the trained neural network classifier has been trained using a synthetic shapes dataset created by a second neural network.

16. The system of claim 15, wherein the second neural network has been trained using a plurality of synthetic shapes having no ambiguity in interest point locations, wherein the synthetic shapes comprise three-dimensional models created automatically, and a plurality of viewpoints generated for the three-dimensional models for matching features; and wherein three-dimensional models are finetuned by data collected from like real space environments having matching features annotated between different images captured from different viewpoints.

17. The system of claim 14, wherein feature descriptors corresponding to points located at displays or structures that remain substantially immobile are extracted using a scale invariant feature transform.

18. The system of claim 14, further including second processing sequences of images of the plurality of sequences of images, to track puts and takes of items by subjects within respective fields of view in the real space; and wherein first processing and second processing occur substantially contemporaneously, thereby enabling cameras to be calibrated without clearing subjects from the real space or interrupting tracking puts and takes of items by subjects.

19. The system of claim 18, wherein second processing at least one sequence of images of the plurality of sequences of images to track a take or put event, further includes, detecting the take or put event using a trained neural network.

20. The system of claim 19, wherein second processing to track puts and takes of items by subjects includes tracking inventory caches involved in an exchange which move over time having locations in three dimensions.

21. The system of claim 20, wherein locations of the inventory caches include locations corresponding to hands of identified subjects, and wherein processing the plurality of sequences of images includ
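The step in claims 1 and 14 of obtaining transformation information from matched features can be illustrated with a deliberately simplified sketch. It assumes the matches are already available as paired 2D pixel coordinates (calibration image versus current image) and fits a least-squares 2D rotation plus translation in closed form. This is a stand-in chosen for brevity: the patent does not state that the transformation is a 2D rigid fit, and the function name `estimate_rigid_2d` and the sample points are hypothetical.

```python
import math


def estimate_rigid_2d(src, dst):
    """Least-squares rotation angle (radians) and translation mapping src onto dst.

    src and dst are equal-length sequences of matched (x, y) points.
    """
    n = len(src)
    if n < 2 or n != len(dst):
        raise ValueError("need at least two matched point pairs")

    # Centroids of both point sets.
    cx_s = sum(p[0] for p in src) / n
    cy_s = sum(p[1] for p in src) / n
    cx_d = sum(p[0] for p in dst) / n
    cy_d = sum(p[1] for p in dst) / n

    # Accumulate dot and cross products of the centered point pairs;
    # their ratio gives the tangent of the best-fit rotation angle.
    dot = 0.0
    cross = 0.0
    for (xs, ys), (xd, yd) in zip(src, dst):
        ax, ay = xs - cx_s, ys - cy_s
        bx, by = xd - cx_d, yd - cy_d
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx

    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation that carries the rotated source centroid onto the target centroid.
    tx = cx_d - (c * cx_s - s * cy_s)
    ty = cy_d - (s * cx_s + c * cy_s)
    return theta, (tx, ty)


# Usage: synthesize a 3 degree rotation and a (5, -2) pixel shift, then recover it.
calibration_points = [(100.0, 50.0), (400.0, 60.0), (250.0, 300.0), (120.0, 280.0)]
true_angle = math.radians(3.0)
current_points = [
    (math.cos(true_angle) * x - math.sin(true_angle) * y + 5.0,
     math.sin(true_angle) * x + math.cos(true_angle) * y - 2.0)
    for x, y in calibration_points
]
theta, (tx, ty) = estimate_rigid_2d(calibration_points, current_points)
```

A production system would more plausibly estimate a full 3D camera pose and reject outlier matches before fitting; the closed-form 2D fit only shows how matched points at immobile structures translate into rotation and translation values that can be compared with a threshold.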
Convolutional networks [CNN, ConvNet] · CPC title
Supervised learning · CPC title
Weakly supervised learning, e.g. semi-supervised or self-supervised learning · CPC title
using neural networks · CPC title
Image or video pattern matching; Proximity measures in feature spaces · CPC title