Information processing apparatus for detecting overfitting of learned model
US-2025191349-A1 · Jun 12, 2025 · US
US12548372B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12548372-B2 |
| Application number | US-202318153166-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 11, 2023 |
| Priority date | Jan 11, 2023 |
| Publication date | Feb 10, 2026 |
| Grant date | Feb 10, 2026 |
Official abstract text for this publication.
Implementations include actions of receiving an image, providing a set of features for the image, determining a set of HOIs including one or more HOIs that are potentially represented in the image, providing sets of feature scores by, for each HOI in the set of HOIs, determining, by a first ML model, a set of feature scores for respective features in the set of features, generating, by a second ML model, sets of weights based on the set of HOIs, providing a set of final scores by, for each HOI in the set of HOIs, determining a final score based on a respective set of weights and the set of feature scores, each final score corresponding to a respective HOI in the set of HOIs, and selecting an output HOI for the image from the set of HOIs based on the set of final scores.
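The scoring flow in the abstract can be sketched in a few lines. This is a minimal illustration, not the patented implementation: the first and second ML models are replaced by hard-coded dictionaries, the HOI labels and numeric values are invented for the example, and the names `final_score` and `select_hoi` are hypothetical. The final score is computed as a weighted sum, which is the form recited in dependent claim 4.

```python
from typing import Dict, Sequence, Tuple

# Order of the feature tuple named in the claims.
FEATURES = ("object", "human", "pose", "relationship")


def final_score(feature_scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of per-feature scores for one candidate HOI."""
    if len(feature_scores) != len(weights):
        raise ValueError("feature_scores and weights must have the same length")
    return sum(w * s for w, s in zip(weights, feature_scores))


def select_hoi(
    scores_by_hoi: Dict[str, Sequence[float]],
    weights_by_hoi: Dict[str, Sequence[float]],
) -> Tuple[str, Dict[str, float]]:
    """Return the HOI with the highest final score, plus all final scores."""
    finals = {
        hoi: final_score(scores_by_hoi[hoi], weights_by_hoi[hoi])
        for hoi in scores_by_hoi
    }
    best = max(finals, key=finals.get)
    return best, finals


# Assumption: stand-ins for the first ML model's output (feature scores per HOI).
example_scores = {
    "hold cup": (0.5, 0.5, 1.0, 0.0),
    "drink from cup": (0.5, 0.5, 0.0, 1.0),
}
# Assumption: stand-ins for the second ML model's output (weights per HOI).
example_weights = {
    "hold cup": (0.5, 0.25, 0.25, 0.0),
    "drink from cup": (0.25, 0.25, 0.25, 0.25),
}

best_hoi, all_scores = select_hoi(example_scores, example_weights)
print(best_hoi, all_scores)
```

In the claimed method the weights come from a model that processes text embeddings of each candidate HOI, so each HOI gets its own weighting of the four features; the dictionaries above only mimic the shape of that output.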
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method for determining human-object-interactions (HOIs) in images, the method comprising: receiving an image; providing a tuple of features for the image, the tuple of features comprising an object feature, a human feature, a pose feature, and a relationship feature; determining a set of HOIs comprising one or more HOIs that are potentially represented in the image; providing sets of feature scores by, for each HOI in the set of HOIs, determining, by a first machine learning (ML) model, a set of feature scores for respective features in the tuple of features, each set of feature scores corresponding to a respective HOI in the set of HOIs; generating, by a second ML model, sets of weights based on the set of HOIs, wherein generating sets of weights based on the set of HOIs comprises: generating, by a third ML model, sets of text embeddings, each set of text embeddings corresponding to a respective HOI, processing, by the second ML model, the sets of text embeddings to generate the sets of weights; providing a set of final scores by, for each HOI in the set of HOIs, determining a final score based on a respective set of weights and the set of feature scores corresponding to the respective HOI, each final score in the set of final scores corresponding to a respective HOI in the set of HOIs; and selecting an output HOI for the image from the set of HOIs based on the set of final scores.

2. The method of claim 1, wherein each set of weights in the sets of weights comprises an object feature weight representing a relative importance of the object feature in selecting the output HOI for the image, a human feature weight representing a relative importance of the human feature in selecting the output HOI for the image, a pose feature weight representing a relative importance of the pose feature in selecting the output HOI for the image, and a relationship feature weight representing a relative importance of the relationship feature in selecting the output HOI for the image.

3. The method of claim 1, wherein determining the output HOI for the image from the set of HOIs based on the set of final scores comprises selecting the output HOI as a HOI with a highest final score in the set of final scores.

4. The method of claim 1, wherein each final score is a weighted sum of feature scores by applying respective weights in the set of weights.

5. The method of claim 1, further comprising: comparing the output HOI with a step in a pre-defined standard operation procedure (SOP) for a task; and providing feedback representative of whether the output HOI corresponds to the step in the SOP.

6. The method of claim 1, wherein the sets of weights are specific to the image and an HOI depicted in the image.

7. A system, comprising: one or more processors; and a computer-readable storage device coupled to the one or more processors and having instructions stored thereon which, when executed by the one or more processors, cause the one or more processors to perform operations for determining human-object-interactions (HOIs) in images, the operations comprising: receiving an image; providing a tuple of features for the image, the tuple of features comprising an object feature, a human feature, a pose feature, and a relationship feature; determining a set of HOIs comprising one or more HOIs that are potentially represented in the image; providing sets of feature scores by, for each HOI in the set of HOIs, determining, by a first machine learning (ML) model, a set of feature scores for respective features in the tuple of features, each set of feature scores corresponding to a respective HOI in the set of HOIs; generating, by a second ML model, sets of weights based on the set of HOIs, wherein generating sets of weights based on the set of HOIs comprises: generating, by a third ML model, sets of text embeddings, each set of text embeddings corresponding to a respective HOI, processing, by the second ML model, the sets of text embeddings to generate the sets of weights; providing a set of final scores by, for each HOI in the set of HOIs, determining a final score based on a respective set of weights and the set of feature scores corresponding to the respective HOI, each final score in the set of final scores corresponding to a respective HOI in the set of HOIs; and selecting an output HOI for the image from the set of HOIs based on the set of final scores.

8. The system of claim 7, wherein each set of weights in the sets of weights comprises an object feature weight representing a relative importance of the object feature in selecting the output HOI for the image, a human feature weight representing a relative importance of the human feature in selecting the output HOI for the image, a pose feature weight representing a relative importance of the pose feature in selecting the output HOI for the image, and a relationship feature weight representing a relative importance of the relationship feature in selecting the output HOI for the image.

9. The system of claim 7, wherein determining the output HOI for the image from the set of HOIs based on the set of final scores comprises selecting the output HOI as a HOI with a highest final score in the set of final scores.

10. The system of claim 7, wherein each final score is a weighted sum of feature scores by applying respective weights in the set of weights.

11. The system of claim 7, wherein operations further comprise: comparing the output HOI with a step in a pre-defined standard operation procedure (SOP) for a task; and providing feedback representative of whether the output HOI corresponds to the step in the SOP.

12. The system of claim 7, wherein the sets of weights are specific to the image and an HOI depicted in the image.

13. A non-transitory computer-readable storage media coupled to one or more processors and having instructions stored thereon which, when executed by the one or more processors, cause the one or more processors to perform operations for determining human-object-interactions (HOIs) in images, the operations comprising: receiving an image; providing a tuple of features for the image, the tuple of features comprising an object feature, a human feature, a pose feature, and a relationship feature; determining a set of HOIs comprising one or more HOIs that are potentially represented in the image; providing sets of feature scores by, for each HOI in the set of HOIs, determining, by a first machine learning (ML) model, a set of feature scores for respective features in the tuple of features, each set of feature scores corresponding to a respective HOI in the set of HOIs; generating, by a second ML model, sets of weights based on the set of HOIs, wherein generating sets of weights based on the set of HOIs comprises: generating, by a third ML model, sets of text embeddings, each set of text embeddings corresponding to a respective HOI, processing, by the second ML model, the sets of text embeddings to generate the sets of weights; providing a set of final scores by, for each HOI in the set of HOIs, determining a final score based on a respective set of weights and the set of feature scores corresponding to the respective HOI, each final score in the set of final scores corresponding to a respective HOI in the set of HOIs; and selecting an output HOI for the image from the set of HOIs based on the set of final scores.

14. The non-transitory computer-readable storage media of claim 13, wherein each set of weights in the sets of weights comprises an object feature weight repr
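Claims 5 and 11 add a comparison of the output HOI against a step in a pre-defined standard operation procedure (SOP), with feedback on whether they correspond. A minimal sketch of that check follows. The claims do not say how an SOP is represented, so modelling it as an ordered list of HOI labels compared by exact string match is an assumption, and the `Feedback` record and `check_step` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Feedback:
    """Result of comparing one detected HOI with one SOP step."""
    step_index: int
    expected: str
    observed: str
    matches: bool


def check_step(sop_steps: Sequence[str], step_index: int, output_hoi: str) -> Feedback:
    """Compare the selected HOI with the SOP step at step_index.

    Assumption: each SOP step is stored as the HOI label expected at that step.
    """
    expected = sop_steps[step_index]
    return Feedback(step_index, expected, output_hoi, expected == output_hoi)


# Illustrative SOP and detections; the labels are invented for this example.
sop = ["pick up screwdriver", "tighten screw"]
print(check_step(sop, 0, "pick up screwdriver"))
print(check_step(sop, 1, "pick up screwdriver"))
```

A real system would likely need a looser notion of correspondence than string equality, for example a mapping from several HOI labels to one SOP step.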
Extraction of image or video features · CPC title
Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation · CPC title
using neural networks · CPC title
Movements or behaviour, e.g. gesture recognition (recognition of facial expressions G06V40/16) · CPC title