Method for determining text topic, and electronic device

US2023122093A1 · US · A1

Patent metadata
Publication number: US-2023122093-A1
Application number: US-202217992041-A
Country: US
Kind code: A1
Filing date: Nov 22, 2022
Priority date: Jan 17, 2022
Publication date: Apr 20, 2023
Grant date:

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method for determining a text topic includes: after a word sequence corresponding to a text to be processed and a number of spaced words in the text to be processed between each two words in the word sequence are determined, a graph structure corresponding to the text to be processed may be determined based on the number of spaced words between each two words in the text to be processed, a topic distribution corresponding to the text may be determined based on the word sequence and the graph structure, a topic corresponding to the text may be determined based on the topic distribution.
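The first step described in the abstract can be illustrated with a minimal sketch. It assumes whitespace tokenization and assumes that the word sequence is the text's tokens with a small, hypothetical stop-word list removed, so that the "number of spaced words" between two retained words is the count of original tokens lying strictly between them. The function names, the stop-word list, and the tokenization are illustrative assumptions, not details taken from the patent.

```python
from itertools import combinations

# Hypothetical stop-word list; the abstract does not say how the
# word sequence is selected from the text.
STOP_WORDS = {"the", "a", "of", "on", "is"}


def word_sequence_with_positions(text):
    """Return (word, original_position) pairs for the retained words."""
    tokens = text.lower().split()
    return [(tok, i) for i, tok in enumerate(tokens) if tok not in STOP_WORDS]


def spaced_word_counts(seq):
    """Map each index pair (i, j), i < j, of the word sequence to the number
    of original-text words lying strictly between the two words."""
    return {
        (i, j): seq[j][1] - seq[i][1] - 1
        for i, j in combinations(range(len(seq)), 2)
    }


seq = word_sequence_with_positions("the cat sat on the mat")
gaps = spaced_word_counts(seq)
print(seq)
print(gaps)
```

Keeping the original token positions alongside each retained word is what lets the gap between any two words be computed with a single subtraction.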

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method for determining a text topic, comprising: determining a word sequence corresponding to a text to be processed and a number of spaced words in the text to be processed between each two words in the word sequence; determining a graph structure corresponding to the text to be processed based on the number of spaced words; determining a topic distribution corresponding to the text to be processed based on the word sequence and the graph structure; and determining a topic corresponding to the text to be processed based on the topic distribution.

2. The method of claim 1, wherein determining the graph structure corresponding to the text to be processed comprises: in response to a number of words in the text to be processed between any two words being less than a first threshold, determining that there is a connection edge between the any two words; and generating the graph structure corresponding to the text to be processed based on the determined connection edges between words in the word sequence.

3. The method of claim 1, wherein determining the topic distribution corresponding to the text comprises: determining topic distribution labels corresponding to each word in the word sequence based on topic distribution labels corresponding to each word in a preset word list; and determining the topic distribution corresponding to the text by merging the topic distribution labels corresponding to each word in the word sequence based on dependency probabilities between preset topics and connection edges between words in the graph structure.

4. The method of claim 3, further comprising: obtaining a training data set, wherein the training data set comprises a plurality of texts; determining a reference graph structure and a reference word set corresponding to each text by preprocessing each of the texts; determining an initial topic distribution corresponding to each text based on an initial topic distribution function; determining, based on initial topic distribution labels corresponding to each word in an initial word list and initial dependency probabilities between the preset topics, a first probability of generating the reference graph structure based on the initial topic distribution and a second probability of generating the reference word set based on the initial topic distribution; determining a loss value based on the first probability and the second probability; and obtaining the topic distribution labels corresponding to each word in the preset word list and dependency probabilities between the preset topics by correcting based on the loss value, the topic distribution labels corresponding to each word in the initial word list, the initial dependency probabilities between the preset topics, and the initial topic distribution function.

5. The method of claim 4, wherein determining the first probability of generating the reference graph structure based on the initial topic distribution and the second probability of generating the reference word set based on the initial topic distribution comprises: determining topic distribution labels corresponding to each reference word based on the initial topic distribution; determining the first probability of generating the reference graph structure based on the initial topic distribution, on the basis of the initial dependency probabilities between the preset topics and the topic distribution labels for each reference word; and determining the second probability of generating the reference word set based on the reference graph structure and the initial topic distribution, on the basis of the topic distribution labels corresponding to each word in the initial word list.

6. An electronic device, comprising: at least one processor; and a memory configured to store instructions executable by the at least one processor, wherein the at least one processor is configured to: determine a word sequence corresponding to a text to be processed and a number of spaced words in the text to be processed between each two words in the word sequence; determine a graph structure corresponding to the text to be processed based on the number of spaced words; determine a topic distribution corresponding to the text to be processed based on the word sequence and the graph structure; and determine a topic corresponding to the text to be processed based on the topic distribution.

7. The apparatus of claim 6, wherein the at least one processor is further configured to: in response to a number of words in the text to be processed between any two words being less than a first threshold, determine that there is a connection edge between the any two words; and generate the graph structure corresponding to the text to be processed based on the determined connection edges between words in the word sequence.

8. The apparatus of claim 6, wherein the at least one processor is further configured to: determine topic distribution labels corresponding to each word in the word sequence based on topic distribution labels corresponding to each word in a preset word list; and determine the topic distribution corresponding to the text by merging the topic distribution labels corresponding to each word in the word sequence based on dependency probabilities between preset topics and connection edges between words in the graph structure.

9. The apparatus of claim 8, wherein at least one processor is further configured to: acquire a training data set, in which the training data set includes a plurality of texts; determine a reference graph structure and a reference word set corresponding to each text by preprocessing each of the texts; determine an initial topic distribution corresponding to each text based on an initial topic distribution function; determine, based on initial topic distribution labels corresponding to each word in an initial word list and initial dependency probabilities between the preset topics, a first probability of generating the reference graph structure based on the initial topic distribution and a second probability of generating the reference word set based on the initial topic distribution; determine a loss value based on the first probability and the second probability; and obtain the topic distribution labels corresponding to each word in the preset word list and dependency probabilities between the preset topics by correcting based on the loss value, the topic distribution labels corresponding to each word in the initial word list, the initial dependency probabilities between the preset topics, and the initial topic distribution function.

10. The apparatus of claim 9, wherein the at least one processor is further configured to: determine topic distribution labels corresponding to each reference word based on the initial topic distribution; determine the first probability of generating the reference graph structure based on the initial topic distribution, on the basis of the initial dependency probabilities between the preset topics and the topic distribution labels for each reference word; and determine the second probability of generating the reference word set based on the reference graph structure and the initial topic distribution on the basis of the topic distribution labels corresponding to each word in the initial word list.

11. A non-transitory computer-readable storage medium having computer instructions stored thereon, wherein the computer instructions are configured to cause a computer to implement a method for determining a text topic, the method comprising: determining a word sequence corresponding to a text to be processed
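As a rough illustration of claims 2 and 3, the sketch below connects two words whenever fewer than a first threshold of words separate them, then merges per-word topic labels along those edges. The claims do not give a merging formula, so the rule used here (each word's label is reinforced by its neighbors' labels, weighted by a topic dependency matrix, and the result is summed and normalized) is purely an assumption for demonstration. The gap counts, label vectors, dependency values, and function names are all hypothetical.

```python
def build_edges(gaps, first_threshold):
    """Claim 2 sketch: keep a connection edge for every word pair whose
    number of intervening words is less than first_threshold."""
    return [pair for pair, gap in sorted(gaps.items()) if gap < first_threshold]


def merge_topic_labels(labels, edges, dependency):
    """Illustrative merging rule (an assumption, not the patent's formula).

    labels[w][a]     : topic label weight of word w for topic a
    dependency[a][b] : assumed dependency probability between topics a and b
    Returns a normalized topic distribution for the whole text.
    """
    k = len(dependency)
    scores = [list(vec) for vec in labels]  # start from each word's own label
    for u, v in edges:
        for a in range(k):
            # Reinforce topic a at each endpoint using the other endpoint's label.
            scores[u][a] += sum(dependency[a][b] * labels[v][b] for b in range(k))
            scores[v][a] += sum(dependency[a][b] * labels[u][b] for b in range(k))
    totals = [sum(word_scores[a] for word_scores in scores) for a in range(k)]
    norm = sum(totals)
    return [t / norm for t in totals]


# Hypothetical inputs: gap counts for three words, two preset topics.
gaps = {(0, 1): 0, (0, 2): 3, (1, 2): 2}
labels = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]   # stand-in for a preset word list lookup
dependency = [[0.8, 0.2], [0.2, 0.8]]

edges = build_edges(gaps, first_threshold=3)
distribution = merge_topic_labels(labels, edges, dependency)
topic = max(range(len(distribution)), key=distribution.__getitem__)
print(edges, distribution, topic)
```

Taking the highest-weight entry of the distribution as the text's topic is one simple reading of the final step of claim 1; a thresholded or multi-topic selection would fit the claim language equally well.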

Assignees

  • Beijing Baidu Netcom Sci & Tech Co Ltd

Inventors

Classifications

  • G06F40/30 (Primary)

    Semantic analysis · CPC title

  • G06F40/216 (Primary)

    using statistical methods · CPC title

  • Lexical analysis, e.g. tokenisation or collocates · CPC title

  • Machine learning · CPC title

  • Tagging; Marking up (details of markup languages G06F40/143); Designating a block; Setting of attributes (style sheets, e.g. eXtensible Stylesheet Language Transformation [XSLT], G06F40/154) · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2023122093A1 cover?
A method for determining a text topic includes: after a word sequence corresponding to a text to be processed and a number of spaced words in the text to be processed between each two words in the word sequence are determined, a graph structure corresponding to the text to be processed may be determined based on the number of spaced words between each two words in the text to be processed, a to…
Who is the assignee on this patent?
Beijing Baidu Netcom Sci & Tech Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06F40/30. Mapped technology areas include Physics.
When was this patent published?
Published Apr 20, 2023 (kind code A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).