Method, device, and computer program product for determining node of decision tree

US12592071B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12592071-B2
Application number	US-202318522570-A
Country	US
Kind code	B2
Filing date	Nov 29, 2023
Priority date	Oct 27, 2023
Publication date	Mar 31, 2026
Grant date	Mar 31, 2026

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments of the present disclosure relate to a method, a device, and a computer program product for determining a node of a decision tree. The method includes determining multiple features of multiple modals corresponding to input information. The method further includes generating a multi-modal feature representation by combining the multiple features of the multiple modals. The method further includes determining a target path in a decision tree that is associated with the multi-modal feature representation, the decision tree comprising multiple nodes. The method further includes determining, in the target path based on the multi-modal feature representation, a target node associated with the input information and used to indicate a question or an answer. This method enables the fusion of feature representations corresponding to input information of different modals to determine a multi-modal feature representation. In this way, it is possible to determine richer and more accurate user intentions.
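The fusion and node-selection steps summarized in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not fix a fusion operator or a similarity measure, so the sketch uses the mean of L2-normalized vectors and cosine similarity, and the names `fuse_features`, `select_target_node`, and the node identifiers are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_features(features_by_modal):
    """Combine per-modal feature vectors into one representation.

    Assumption: fusion is the element-wise mean of L2-normalized vectors
    of equal length. The patent text does not prescribe this operator.
    """
    vectors = [l2_normalize(v) for v in features_by_modal.values()]
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all modal features must share one dimension")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_target_node(fused, path):
    """Pick the node on the path whose embedding is most similar to `fused`.

    `path` is a list of (node_id, embedding) pairs; the embeddings are
    hand-written stand-ins for learned node embeddings.
    """
    return max(path, key=lambda item: cosine_similarity(fused, item[1]))[0]


# Hypothetical features for two modals of one user input.
features = {"text": [1.0, 0.0, 0.0], "image": [0.0, 1.0, 0.0]}
fused = fuse_features(features)

# Hypothetical target path: question and answer nodes with toy embeddings.
path = [
    ("ask_device_type", [1.0, 0.0, 0.0]),
    ("ask_error_code", [1.0, 1.0, 0.0]),
    ("answer_reset", [0.0, 0.0, 1.0]),
]
target = select_target_node(fused, path)
print(target)
```

In this toy input the fused vector draws equally on the text and image features, so the node whose embedding reflects both modals scores highest. A learned cross-modal fusion model would replace the simple mean in practice.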

First claim

Opening claim text (preview).

What is claimed is:
1. A method for determining a node of a decision tree, comprising: determining multiple features of multiple modals corresponding to input information; generating a multi-modal feature representation by combining the multiple features of the multiple modals; determining a target path in a decision tree that is associated with the multi-modal feature representation, the decision tree comprising multiple nodes; and determining, in the target path based on the multi-modal feature representation, a target node associated with the input information and used to indicate a question or an answer; wherein the method is performed by an electronic device, the electronic device comprising at least one processor and memory coupled to the at least one processor.
2. The method according to claim 1, wherein determining multiple features of multiple modals corresponding to input information comprises: generating the multiple features of the multiple modals by separately encoding information of different modals in the input information via a multi-modal encoder.
3. The method according to claim 2, wherein generating a multi-modal feature representation by combining the multiple features of the multiple modals comprises: determining a joint representation corresponding to the multiple features of the multiple modals by means of cross-modal fusion of the multiple features of the multiple modals; and determining the multi-modal feature representation according to the joint representation.
4. The method according to claim 1, wherein determining a target path in a decision tree that is associated with the multi-modal feature representation comprises: generating decision tree features corresponding to the decision tree by encoding the decision tree via a graph neural network; and determining the target path associated with the multi-modal feature representation by comparing the decision tree features with the multi-modal feature representation.
5. The method according to claim 4, wherein generating decision tree features corresponding to the decision tree by encoding the decision tree via a graph neural network comprises: generating multiple embeddings corresponding to the multiple nodes in the decision tree by encoding the multiple nodes via the graph neural network.
6. The method according to claim 5, wherein determining the target path associated with the multi-modal feature representation by comparing the decision tree features with the multi-modal feature representation comprises: determining a first set of candidate nodes of a preset number based on similarities between the multi-modal feature representation and embeddings corresponding to first nodes at a next depth to a root node of the decision tree; determining a second set of candidate nodes of the preset number based on similarities between the multi-modal feature representation and embeddings corresponding to second nodes at a next depth to each of the first set of candidate nodes; determining an ith set of candidate nodes of the preset number by repeating the step of determining a second set of candidate nodes of the preset number based on similarities between the multi-modal feature representation and embeddings corresponding to second nodes at the next depth to each of the first set of candidate nodes, until a node at the next depth is a leaf node or the depth reaches a predetermined depth threshold, the i being greater than 2; and determining the target path based on the first set of candidate nodes, the second set of candidate nodes, and the ith set of candidate nodes.
7. The method according to claim 1, wherein the input information comprises at least two of natural language text, image data, and speech data.
8. The method according to claim 1, wherein the target path comprises multiple candidate nodes connected in sequence, and determining, in the target path based on the multi-modal feature representation, a target node associated with the input information and used to indicate a question or an answer comprises: determining a similarity between the multi-modal feature representation and the multiple candidate nodes; and determining, from the multiple candidate nodes according to the similarity, the target node associated with the input information and used to indicate a question or an answer.
9. The method according to claim 1, further comprising: determining an application scenario parameter associated with the input information; and selecting the decision tree matching the application scenario parameter from multiple decision trees.
10. The method according to claim 2, further comprising: acquiring a sample text feature and a sample image feature, contents of the sample text feature being associated with contents of the sample image feature; constructing the multi-modal encoder, the multi-modal encoder having training parameters set therein; inputting the sample text feature and the sample image feature into the multi-modal encoder to generate a prediction result; and adjusting the training parameters iteratively based on differences of the prediction result with the sample text feature and the sample image feature until the differences satisfy preset requirements.
11. An electronic device, comprising: at least one processor; and a memory coupled to the at least one processor and having instructions stored therein, wherein the instructions, when executed by the at least one processor, cause the electronic device to perform actions comprising: determining multiple feature representations of multiple modals corresponding to input information; generating a multi-modal feature representation by combining the multiple feature representations of the multiple modals; determining a target path in a decision tree that is associated with the multi-modal feature representation, the decision tree comprising multiple nodes; and determining, in the target path based on the multi-modal feature representation, a target node associated with the input information and used to indicate a question or an answer.
12. The electronic device according to claim 11, wherein determining multiple feature representations of multiple modals corresponding to input information comprises: generating the multiple feature representations of the multiple modals by separately encoding information of different modals in the input information via a multi-modal encoder.
13. The electronic device according to claim 12, wherein generating a multi-modal feature representation by combining the multiple feature representations of the multiple modals comprises: determining a joint representation corresponding to the features of the multiple modals by means of cross-modal fusion of the multiple feature representations of the multiple modals; and determining the multi-modal feature representation according to the joint representation.
14. The electronic device according to claim 11, wherein determining a target path in a decision tree that is associated with the multi-modal feature representation comprises: generating decision tree features corresponding to the decision tree by encoding the decision tree via a graph neural network; and determining the target path associated with the multi-modal feature representation by comparing the decision tree features with the multi-modal feature representation.
15. The electronic device according to claim 14, wherein generating decision tree features corresponding to the decision tree by encoding the decision tree via a graph neural network comprises:
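The depth-by-depth candidate search recited in claim 6 resembles a beam search over the tree. The following is a rough sketch under stated assumptions, not the claimed implementation: node embeddings are hand-written toy vectors rather than graph-neural-network outputs, candidates are ranked by cumulative cosine similarity (the claim only says "based on similarities"), and `find_target_path`, `beam_width`, `max_depth`, and all node names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def find_target_path(query, children, embeddings, root, beam_width=2, max_depth=3):
    """Search the tree one depth at a time, keeping `beam_width` candidates.

    `beam_width` stands in for the "preset number" of candidate nodes and
    `max_depth` for the "predetermined depth threshold". Ranking by the sum
    of similarities along the path is an assumption of this sketch.
    Simplification: a candidate that is a leaf is dropped if other
    candidates can still be expanded at the same depth.
    """
    beam = [(0.0, [root])]  # entries are (cumulative_score, path)
    for _ in range(max_depth):
        expanded = []
        for score, path in beam:
            for child in children.get(path[-1], []):
                sim = cosine_similarity(query, embeddings[child])
                expanded.append((score + sim, path + [child]))
        if not expanded:  # every current candidate is a leaf node
            break
        expanded.sort(key=lambda entry: entry[0], reverse=True)
        beam = expanded[:beam_width]
    return max(beam, key=lambda entry: entry[0])[1]


# Hypothetical troubleshooting tree and toy two-dimensional embeddings.
children = {
    "root": ["hardware", "software"],
    "hardware": ["screen", "battery"],
    "software": ["driver", "os_update"],
}
embeddings = {
    "hardware": [1.0, 0.0],
    "software": [0.0, 1.0],
    "screen": [0.9, 0.1],
    "battery": [0.5, 0.5],
    "driver": [0.1, 0.9],
    "os_update": [0.0, 1.0],
}
query = [1.0, 0.1]  # stand-in for the multi-modal feature representation

best_path = find_target_path(query, children, embeddings, "root")
shallow_path = find_target_path(query, children, embeddings, "root", max_depth=1)
print(best_path, shallow_path)
```

Keeping several candidates per depth, rather than only the single best child, lets the search recover when the most similar node at a shallow depth does not lead to the most similar node deeper in the tree.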

Assignees

Dell Products L.P.

Inventors

Classifications

G06V10/751 (Primary): Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
G06V10/761: Proximity, similarity or dissimilarity measures
G06V10/87: using selection of the recognition techniques, e.g. of a classifier in a multiple classifier system
G06V10/82 (Primary): using neural networks
G06N5/01: Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound

Patent family

Related publications grouped by family.

View patent family 95471232

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US12592071B2 cover?: Embodiments of the present disclosure relate to a method, a device, and a computer program product for determining a node of a decision tree. The method includes determining multiple features of multiple modals corresponding to input information. The method further includes generating a multi-modal feature representation by combining the multiple features of the multiple modals. The method furt…
Who is the assignee on this patent?: Dell Products L.P.
What technology area does this patent fall under?: Primary CPC classification G06V10/751. Mapped technology areas include Physics.
When was this patent published?: Publication date Mar 31, 2026 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Multi-granularity alignment for visual question answering

Generating a question answering system for flowcharts

Multi-modal visual question answering system
