Machine learning collaboration techniques
US-2024420212-A1 · Dec 19, 2024 · US
US2023122093A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2023122093-A1 |
| Application number | US-202217992041-A |
| Country | US |
| Kind code | A1 |
| Filing date | Nov 22, 2022 |
| Priority date | Jan 17, 2022 |
| Publication date | Apr 20, 2023 |
| Grant date | — |
Abstract
A method for determining a text topic includes the following. A word sequence corresponding to a text to be processed is determined, along with the number of words spaced between each two words of the word sequence in that text. A graph structure corresponding to the text to be processed may then be determined based on these spacing counts; a topic distribution corresponding to the text may be determined based on the word sequence and the graph structure; and a topic corresponding to the text may be determined based on the topic distribution.
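The abstract's first two steps, combined with the edge rule of claim 2 (connect two words when fewer than a first threshold of words separate them), can be sketched in Python. This is an illustrative sketch, not the applicant's implementation: the whitespace tokenizer, the use of token positions as graph nodes, and the default `first_threshold` value are assumptions, because the text does not specify them.

```python
from itertools import combinations


def word_sequence(text: str) -> list[str]:
    """Hypothetical preprocessing: lowercase the text and split on whitespace."""
    return text.lower().split()


def spacing_counts(words: list[str]) -> dict[tuple[int, int], int]:
    """For every pair of positions i < j, count the words lying strictly between them."""
    return {(i, j): j - i - 1 for i, j in combinations(range(len(words)), 2)}


def build_graph(words: list[str], first_threshold: int = 2) -> set[tuple[int, int]]:
    """Claim 2 rule: add a connection edge when the gap is below first_threshold.

    Nodes are token positions (an assumption); the default threshold is arbitrary.
    """
    return {pair for pair, gap in spacing_counts(words).items() if gap < first_threshold}


words = word_sequence("Graph based topic models use word spacing")
edges = build_graph(words, first_threshold=1)
print(sorted(edges))
```

With `first_threshold=1` only adjacent words are connected, so the seven-word example yields six edges; raising the threshold to 2 also links words separated by a single word, for eleven edges. Using positions rather than distinct word types keeps repeated words apart; merging repeats into one node would be an equally plausible reading of the claims.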
Claims (preview)
What is claimed is:

1. A computer-implemented method for determining a text topic, comprising: determining a word sequence corresponding to a text to be processed and a number of spaced words in the text to be processed between each two words in the word sequence; determining a graph structure corresponding to the text to be processed based on the number of spaced words; determining a topic distribution corresponding to the text to be processed based on the word sequence and the graph structure; and determining a topic corresponding to the text to be processed based on the topic distribution.

2. The method of claim 1, wherein determining the graph structure corresponding to the text to be processed comprises: in response to a number of words in the text to be processed between any two words being less than a first threshold, determining that there is a connection edge between the any two words; and generating the graph structure corresponding to the text to be processed based on the determined connection edges between words in the word sequence.

3. The method of claim 1, wherein determining the topic distribution corresponding to the text comprises: determining topic distribution labels corresponding to each word in the word sequence based on topic distribution labels corresponding to each word in a preset word list; and determining the topic distribution corresponding to the text by merging the topic distribution labels corresponding to each word in the word sequence based on dependency probabilities between preset topics and connection edges between words in the graph structure.

4. The method of claim 3, further comprising: obtaining a training data set, wherein the training data set comprises a plurality of texts; determining a reference graph structure and a reference word set corresponding to each text by preprocessing each of the texts; determining an initial topic distribution corresponding to each text based on an initial topic distribution function; determining, based on initial topic distribution labels corresponding to each word in an initial word list and initial dependency probabilities between the preset topics, a first probability of generating the reference graph structure based on the initial topic distribution and a second probability of generating the reference word set based on the initial topic distribution; determining a loss value based on the first probability and the second probability; and obtaining the topic distribution labels corresponding to each word in the preset word list and dependency probabilities between the preset topics by correcting based on the loss value, the topic distribution labels corresponding to each word in the initial word list, the initial dependency probabilities between the preset topics, and the initial topic distribution function.

5. The method of claim 4, wherein determining the first probability of generating the reference graph structure based on the initial topic distribution and the second probability of generating the reference word set based on the initial topic distribution comprises: determining topic distribution labels corresponding to each reference word based on the initial topic distribution; determining the first probability of generating the reference graph structure based on the initial topic distribution, on the basis of the initial dependency probabilities between the preset topics and the topic distribution labels for each reference word; and determining the second probability of generating the reference word set based on the reference graph structure and the initial topic distribution, on the basis of the topic distribution labels corresponding to each word in the initial word list.

6. An electronic device, comprising: at least one processor; and a memory configured to store instructions executable by the at least one processor, wherein the at least one processor is configured to: determine a word sequence corresponding to a text to be processed and a number of spaced words in the text to be processed between each two words in the word sequence; determine a graph structure corresponding to the text to be processed based on the number of spaced words; determine a topic distribution corresponding to the text to be processed based on the word sequence and the graph structure; and determine a topic corresponding to the text to be processed based on the topic distribution.

7. The apparatus of claim 6, wherein the at least one processor is further configured to: in response to a number of words in the text to be processed between any two words being less than a first threshold, determine that there is a connection edge between the any two words; and generate the graph structure corresponding to the text to be processed based on the determined connection edges between words in the word sequence.

8. The apparatus of claim 6, wherein the at least one processor is further configured to: determine topic distribution labels corresponding to each word in the word sequence based on topic distribution labels corresponding to each word in a preset word list; and determine the topic distribution corresponding to the text by merging the topic distribution labels corresponding to each word in the word sequence based on dependency probabilities between preset topics and connection edges between words in the graph structure.

9. The apparatus of claim 8, wherein at least one processor is further configured to: acquire a training data set, in which the training data set includes a plurality of texts; determine a reference graph structure and a reference word set corresponding to each text by preprocessing each of the texts; determine an initial topic distribution corresponding to each text based on an initial topic distribution function; determine, based on initial topic distribution labels corresponding to each word in an initial word list and initial dependency probabilities between the preset topics, a first probability of generating the reference graph structure based on the initial topic distribution and a second probability of generating the reference word set based on the initial topic distribution; determine a loss value based on the first probability and the second probability; and obtain the topic distribution labels corresponding to each word in the preset word list and dependency probabilities between the preset topics by correcting based on the loss value, the topic distribution labels corresponding to each word in the initial word list, the initial dependency probabilities between the preset topics, and the initial topic distribution function.

10. The apparatus of claim 9, wherein the at least one processor is further configured to: determine topic distribution labels corresponding to each reference word based on the initial topic distribution; determine the first probability of generating the reference graph structure based on the initial topic distribution, on the basis of the initial dependency probabilities between the preset topics and the topic distribution labels for each reference word; and determine the second probability of generating the reference word set based on the reference graph structure and the initial topic distribution on the basis of the topic distribution labels corresponding to each word in the initial word list.

11. A non-transitory computer-readable storage medium having computer instructions stored thereon, wherein the computer instructions are configured to cause a computer to implement a method for determining a text topic, the method comprising: determining a word sequence corresponding to a text to be processed
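Claim 3 says the per-word topic distribution labels are merged using the dependency probabilities between preset topics and the graph's connection edges, and claim 4 derives a loss from two generation probabilities, but neither claim fixes a formula. The Python sketch that follows fills those gaps with hypothetical choices: unknown words get a uniform label, each edge pushes both endpoint labels through a topic-dependency matrix, the final topic is the most probable index, and the loss is assumed to be a negative log-likelihood. Names such as `merge_topic_distribution`, `word_labels`, and `dependency` are illustrative and do not come from the patent.

```python
import math

Distribution = list[float]


def merge_topic_distribution(
    words: list[str],
    edges: set[tuple[int, int]],
    word_labels: dict[str, Distribution],
    dependency: list[list[float]],
) -> Distribution:
    """Hypothetical merge rule for claim 3.

    Each word contributes its own topic label. Each connection edge additionally
    contributes both endpoint labels pushed through the dependency matrix
    (assumed: dependency[a][b] ~ P(topic b | topic a)). The total is normalized.
    """
    k = len(dependency)
    uniform = [1.0 / k] * k
    labels = [word_labels.get(w, uniform) for w in words]  # unknown words: uniform
    totals = [0.0] * k
    for label in labels:
        for t in range(k):
            totals[t] += label[t]
    for i, j in edges:
        for src in (i, j):
            for a in range(k):
                for b in range(k):
                    totals[b] += labels[src][a] * dependency[a][b]
    norm = sum(totals)
    return [t / norm for t in totals]


def pick_topic(distribution: Distribution) -> int:
    """Final step of claim 1, assumed here to be the most probable topic index."""
    return max(range(len(distribution)), key=distribution.__getitem__)


def training_loss(first_probability: float, second_probability: float) -> float:
    """Claim 4 loss, assumed to be the negative log-likelihood of generating the
    reference graph structure (first) and the reference word set (second)."""
    return -(math.log(first_probability) + math.log(second_probability))


words = ["stock", "market", "rally"]
edges = {(0, 1), (1, 2)}
word_labels = {"stock": [0.9, 0.1], "market": [0.8, 0.2]}
dependency = [[1.0, 0.0], [0.0, 1.0]]
dist = merge_topic_distribution(words, edges, word_labels, dependency)
print(dist, pick_topic(dist))
```

With the identity dependency matrix used here, each edge simply re-adds its endpoints' labels, so well-connected words weigh more; a non-identity matrix would let a word lend probability mass to topics that depend on its own. Training as in claim 4 would adjust `word_labels` and `dependency` to reduce the loss; that optimization step is not shown.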
CPC titles:
- Semantic analysis
- using statistical methods
- Lexical analysis, e.g. tokenisation or collocates
- Machine learning
- Tagging; Marking up (details of markup languages G06F40/143); Designating a block; Setting of attributes (style sheets, e.g. eXtensible Stylesheet Language Transformation [XSLT], G06F40/154)