Weighted factorization for human-object-interaction detection

US12548372B2 · US · B2

Patent metadata
Field               Value
Publication number  US-12548372-B2
Application number  US-202318153166-A
Country             US
Kind code           B2
Filing date         Jan 11, 2023
Priority date       Jan 11, 2023
Publication date    Feb 10, 2026
Grant date          Feb 10, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Implementations include actions of receiving an image, providing a set of features for the image, determining a set of HOIs including one or more HOIs that are potentially represented in the image, providing sets of feature scores by, for each HOI in the set of HOIs, determining, by a first ML model, a set of feature scores for respective features in the set of features, generating, by a second ML model, sets of weights based on the set of HOIs, providing a set of final scores by, for each HOI in the set of HOIs, determining a final score based on a respective set of weights and the set of feature scores, each final score corresponding to a respective HOI in the set of HOIs, and selecting an output HOI for the image from the set of HOIs based on the set of final scores.
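The scoring and selection steps in the abstract can be illustrated with a minimal Python sketch. It assumes the four features named in claim 1 (object, human, pose, relationship), the weighted-sum form of claim 4, and the highest-score selection of claim 3. The HOI labels, scores, and weights are made-up illustration values, and the names `final_score` and `select_hoi` are hypothetical. In the patent the scores and weights come from ML models; here they are hard-coded.

```python
from dataclasses import dataclass

# Feature names taken from claim 1; everything else here is illustrative.
FEATURES = ("object", "human", "pose", "relationship")


@dataclass
class ScoredHOI:
    label: str
    final_score: float


def final_score(feature_scores, weights):
    """Weighted sum of the per-feature scores for one candidate HOI."""
    return sum(weights[name] * feature_scores[name] for name in FEATURES)


def select_hoi(candidates):
    """Pick the candidate HOI with the highest final score.

    `candidates` maps an HOI label to a (feature_scores, weights) pair,
    where both elements are dicts keyed by feature name.
    """
    scored = [
        ScoredHOI(label, final_score(scores, weights))
        for label, (scores, weights) in candidates.items()
    ]
    return max(scored, key=lambda s: s.final_score)


# Made-up values standing in for the outputs of the first and second ML models.
candidates = {
    "hold cup": (
        {"object": 0.9, "human": 0.8, "pose": 0.4, "relationship": 0.7},
        {"object": 0.4, "human": 0.1, "pose": 0.2, "relationship": 0.3},
    ),
    "drink from cup": (
        {"object": 0.9, "human": 0.8, "pose": 0.2, "relationship": 0.3},
        {"object": 0.2, "human": 0.1, "pose": 0.4, "relationship": 0.3},
    ),
}

best = select_hoi(candidates)
print(best.label)
```

Because each candidate HOI carries its own weights, the same feature score can count for more or less depending on which interaction is being evaluated.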

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for determining human-object-interactions (HOIs) in images, the method comprising: receiving an image; providing a tuple of features for the image, the tuple of features comprising an object feature, a human feature, a pose feature, and a relationship feature; determining a set of HOIs comprising one or more HOIs that are potentially represented in the image; providing sets of feature scores by, for each HOI in the set of HOIs, determining, by a first machine learning (ML) model, a set of feature scores for respective features in the tuple of features, each set of feature scores corresponding to a respective HOI in the set of HOIs; generating, by a second ML model, sets of weights based on the set of HOIs, wherein generating sets of weights based on the set of HOIs comprises: generating, by a third ML model, sets of text embeddings, each set of text embeddings corresponding to a respective HOI, processing, by the second ML model, the sets of text embeddings to generate the sets of weights; providing a set of final scores by, for each HOI in the set of HOIs, determining a final score based on a respective set of weights and the set of feature scores corresponding to the respective HOI, each final score in the set of final scores corresponding to a respective HOI in the set of HOIs; and selecting an output HOI for the image from the set of HOIs based on the set of final scores.

2. The method of claim 1, wherein each set of weights in the sets of weights comprises an object feature weight representing a relative importance of the object feature in selecting the output HOI for the image, a human feature weight representing a relative importance of the human feature in selecting the output HOI for the image, a pose feature weight representing a relative importance of the pose feature in selecting the output HOI for the image, and a relationship feature weight representing a relative importance of the relationship feature in selecting the output HOI for the image.

3. The method of claim 1, wherein determining the output HOI for the image from the set of HOIs based on the set of final scores comprises selecting the output HOI as a HOI with a highest final score in the set of final scores.

4. The method of claim 1, wherein each final score is a weighted sum of features scores by applying respective weights in the set of weights.

5. The method of claim 1, further comprising: comparing the output HOI with a step in a pre-defined standard operation procedure (SOP) for a task; and proving feedback representative of whether the output HOI correspond to the step in the SOP.

6. The method of claim 1, wherein the sets of weights are specific to the image and an HOI depicted in the image.

7. A system, comprising: one or more processors; and a computer-readable storage device coupled to the one or more processors and having instructions stored thereon which, when executed by the one or more processors, cause the one or more processors to perform operations for determining human-object-interactions (HOIs) in images, the operations comprising: receiving an image; providing a tuple of features for the image, the tuple of features comprising an object feature, a human feature, a pose feature, and a relationship feature; determining a set of HOIs comprising one or more HOIs that are potentially represented in the image; providing sets of feature scores by, for each HOI in the set of HOIs, determining, by a first machine learning (ML) model, a set of feature scores for respective features in the tuple of features, each set of feature scores corresponding to a respective HOI in the set of HOIs; generating, by a second ML model, sets of weights based on the set of HOIs, wherein generating sets of weights based on the set of HOIs comprises: generating, by a third ML model, sets of text embeddings, each set of text embeddings corresponding to a respective HOI, processing, by the second ML model, the sets of text embeddings to generate the sets of weights; providing a set of final scores by, for each HOI in the set of HOIs, determining a final score based on a respective set of weights and the set of feature scores corresponding to the respective HOI, each final score in the set of final scores corresponding to a respective HOI in the set of HOIs; and selecting an output HOI for the image from the set of HOIs based on the set of final scores.

8. The system of claim 7, wherein each set of weights in the sets of weights comprises an object feature weight representing a relative importance of the object feature in selecting the output HOI for the image, a human feature weight representing a relative importance of the human feature in selecting the output HOI for the image, a pose feature weight representing a relative importance of the pose feature in selecting the output HOI for the image, and a relationship feature weight representing a relative importance of the relationship feature in selecting the output HOI for the image.

9. The system of claim 7, wherein determining the output HOI for the image from the set of HOIs based on the set of final scores comprises selecting the output HOI as a HOI with a highest final score in the set of final scores.

10. The system of claim 7, wherein each final score is a weighted sum of features scores by applying respective weights in the set of weights.

11. The system of claim 7, wherein operations further comprise: comparing the output HOI with a step in a pre-defined standard operation procedure (SOP) for a task; and proving feedback representative of whether the output HOI correspond to the step in the SOP.

12. The system of claim 7, wherein the sets of weights are specific to the image and an HOI depicted in the image.

13. A non-transitory computer-readable storage media coupled to one or more processors and having instructions stored thereon which, when executed by the one or more processors, cause the one or more processors to perform operations for determining human-object-interactions (HOIs) in images, the operations comprising: receiving an image; providing a tuple of features for the image, the tuple of features comprising an object feature, a human feature, a pose feature, and a relationship feature; determining a set of HOIs comprising one or more HOIs that are potentially represented in the image; providing sets of feature scores by, for each HOI in the set of HOIs, determining, by a first machine learning (ML) model, a set of feature scores for respective features in the tuple of features, each set of feature scores corresponding to a respective HOI in the set of HOIs; generating, by a second ML model, sets of weights based on the set of HOIs, wherein generating sets of weights based on the set of HOIs comprises: generating, by a third ML model, sets of text embeddings, each set of text embeddings corresponding to a respective HOI, processing, by the second ML model, the sets of text embeddings to generate the sets of weights; providing a set of final scores by, for each HOI in the set of HOIs, determining a final score based on a respective set of weights and the set of feature scores corresponding to the respective HOI, each final score in the set of final scores corresponding to a respective HOI in the set of HOIs; and selecting an output HOI for the image from the set of HOIs based on the set of final scores.

14. The non-transitory computer-readable storage media of claim 13, wherein each set of weights in the sets of weights comprises an object feature weight repr
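Two claimed steps lend themselves to a short illustration: generating per-HOI weights from text embeddings (the independent claims) and comparing the output HOI with a step in a standard operation procedure (claims 5 and 11). The Python sketch below is a toy stand-in only. The claims do not specify any model architecture, so the single linear projection followed by a softmax, the three-dimensional embeddings, the use of one vector per HOI instead of a set, and all names (`weights_from_embedding`, `check_against_sop`, `projection`) are assumptions made for illustration.

```python
import math

# Feature names taken from the claims; all numeric values below are made up.
FEATURES = ("object", "human", "pose", "relationship")


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def weights_from_embedding(embedding, projection):
    """Toy stand-in for the 'second ML model' (assumed: linear layer + softmax).

    `projection` has one row per feature; each row is dotted with the
    text embedding to produce a logit for that feature.
    """
    logits = [sum(p * e for p, e in zip(row, embedding)) for row in projection]
    return dict(zip(FEATURES, softmax(logits)))


def check_against_sop(output_hoi, sop_steps, step_index):
    """Compare the selected HOI with the expected SOP step and report feedback."""
    expected = sop_steps[step_index]
    return {"expected": expected, "observed": output_hoi,
            "matches": output_hoi == expected}


# Hypothetical outputs of the 'third ML model' (a text encoder).
text_embeddings = {
    "hold cup": [0.2, 0.9, 0.1],
    "drink from cup": [0.7, 0.1, 0.6],
}
# Hypothetical learned parameters, one row per feature.
projection = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.5, 0.5, 0.5],
]

weights = {hoi: weights_from_embedding(vec, projection)
           for hoi, vec in text_embeddings.items()}
sop = ["pick up cup", "hold cup", "drink from cup"]
feedback = check_against_sop("hold cup", sop, step_index=1)
print(feedback["matches"])
```

The softmax is a design choice of this sketch, not of the patent: it keeps the four weights positive and summing to one, which makes them easy to read as relative importances in the sense of claim 2.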

Assignees

  • Accenture Global Solutions Ltd

Inventors

Classifications

  • Extraction of image or video features · CPC title

  • G06V10/77 (Primary)

    Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation · CPC title

  • using neural networks · CPC title

  • G06V40/20 (Primary)

    Movements or behaviour, e.g. gesture recognition (recognition of facial expressions G06V40/16) · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12548372B2 cover?
Implementations include actions of receiving an image, providing a set of features for the image, determining a set of HOIs including one or more HOIs that are potentially represented in the image, providing sets of feature scores by, for each HOI in the set of HOIs, determining, by a first ML model, a set of feature scores for respective features in the set of features, generating, by a second…
Who is the assignee on this patent?
Accenture Global Solutions Ltd
What technology area does this patent fall under?
Primary CPC classification G06V10/77. Mapped technology areas include Physics.
When was this patent published?
Publication date: Feb 10, 2026 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).