Method, device, equipment, and storage medium for mining topic concept

US11651164B2 · US · B2

Patent metadata
Publication number: US-11651164-B2
Application number: US-202017036609-A
Country: US
Kind code: B2
Filing date: Sep 29, 2020
Priority date: Apr 15, 2020
Publication date: May 16, 2023
Grant date: May 16, 2023

Abstract

Official abstract text for this publication.

The present disclosure provides a method, a device, an equipment and a storage medium for mining a topic concept. The method includes: acquiring a plurality of candidate topic concepts based on a query; performing word segmentation on the plurality of candidate topic concepts and performing part-of-speech tagging on words obtained after performing the word segmentation, to obtain a part-of-speech sequence of each of the plurality of candidate topic concepts; and filtering the plurality of candidate topic concepts based on the part-of-speech sequence, to filter out a topic concept corresponding to a target part-of-speech sequence among the plurality of candidate topic concepts, in which a proportion of accurate topic concepts in the target part-of-speech sequence is lower than or equal to a first preset threshold, or a proportion of inaccurate topic concepts in the target part-of-speech sequence is higher than or equal to a second preset threshold.
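The filtering step in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption for illustration only: the toy lexicon, the whitespace segmentation, the "-" separator, the tag letters, and the 0.5 threshold are hypothetical stand-ins, and the patent does not specify any of them. The sketch counts the proportion of accurate concepts for each part-of-speech sequence in a small labeled sample, treats sequences at or below the first threshold as target sequences, and filters out candidates that share a target sequence.

```python
from collections import defaultdict

# Hypothetical toy lexicon standing in for a real word segmenter and POS tagger.
TOY_LEXICON = {
    "wedding": "n", "photography": "n", "how": "r", "to": "p",
    "cook": "v", "rice": "n", "buy": "v",
}
SEPARATOR = "-"  # assumed part-of-speech separator


def pos_sequence(concept, lexicon=TOY_LEXICON):
    """Segment on whitespace and join each word's tag with SEPARATOR."""
    words = concept.lower().split()
    return SEPARATOR.join(lexicon.get(w, "x") for w in words)


def accurate_proportions(labeled):
    """Map each part-of-speech sequence to its share of accurate concepts."""
    totals = defaultdict(int)
    accurate = defaultdict(int)
    for concept, is_accurate in labeled:
        seq = pos_sequence(concept)
        totals[seq] += 1
        accurate[seq] += int(is_accurate)
    return {seq: accurate[seq] / totals[seq] for seq in totals}


def filter_candidates(candidates, labeled, first_threshold=0.5):
    """Drop candidates whose sequence has an accurate share <= first_threshold."""
    proportions = accurate_proportions(labeled)
    targets = {seq for seq, p in proportions.items() if p <= first_threshold}
    return [c for c in candidates if pos_sequence(c) not in targets]


# Manually tagged sample: (concept, is it an accurate topic concept?)
labeled_sample = [
    ("wedding photography", True),
    ("how to cook", False),
    ("how to buy", False),
]
candidates = ["wedding photography", "how to cook", "cook rice", "how to buy"]
kept = filter_candidates(candidates, labeled_sample)
print(kept)
```

In this sketch a sequence that never appears in the labeled sample is kept by default; whether unseen sequences should be kept or dropped is a design choice the abstract does not settle.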

First claim

Opening claim text (preview).

What is claimed is:

1. A method for mining a topic concept in a search text, which is performed by a computer, wherein the computer comprises one or more processors, a memory, one or more interfaces for connecting the one or more processors and the memory, an input device, and an output device, the method comprising: acquiring, through calling and executing a program stored in the memory by the one or more processors, a plurality of candidate topic concepts in the search text outputted by the output device based on a query inputted by the input device, wherein each of the plurality of candidate topic concepts in the search text comprises (i) one or more things, (ii) one or more events, or (iii) one or more characters; performing, through calling and executing the program stored in the memory by the one or more processors, word segmentation on the plurality of candidate topic concepts and performing part-of-speech tagging on a plurality of words respectively obtained after performing the word segmentation, to obtain a part-of-speech sequence of each of the plurality of candidate topic concepts, wherein the part-of-speech tagging on the plurality of words respectively obtained after performing the word segmentation comprises tagging a part of speech of each word of the plurality of words obtained after performing the word segmentation, and wherein the part-of-speech sequence of each of the plurality of candidate topic concepts is a sequence that includes one or more parts of speech and one or more part-of-speech separators for each word used to represent each of the plurality of candidate topic concepts; and filtering through the plurality of candidate topic concepts, through calling and executing the program stored in the memory by the one or more processors, by using a candidate topic concept template preset in advance, based on the part-of-speech sequence, to select one or more candidate topic concepts that meet at least one requirement preset in advance among the plurality of candidate topic concepts, wherein the candidate topic concept template preset in advance comprises one or more accurate topic concepts preset in advance or one or more inaccurate topic concepts preset in advance, wherein when the candidate topic concept template preset in advance comprises the one or more accurate topic concepts preset in advance, one or more candidate topic concepts that do match the candidate topic concept template preset in advance are selected among the plurality of candidate topic concepts; or wherein when the candidate topic concept template preset in advance comprises the one or more inaccurate topic concepts preset in advance, one or more candidate topic concepts that do not match the candidate topic concept template preset in advance are selected among the plurality of candidate topic concepts.

2. The method according to claim 1, before the filtering through the plurality of candidate topic concepts, through calling and executing the program stored in the memory by the one or more processors, to select one or more candidate topic concepts that meet the at least one requirement preset in advance among the plurality of candidate topic concepts, the method further comprising: tagging, through calling and executing the program stored in the memory by the one or more processors, a part of topic concepts among the plurality of candidate topic concepts, to obtain a tagging result, the tagging result indicating whether each topic concept in the part of topic concepts is accurate, and a part-of-speech sequence of the part of topic concepts including the part-of-speech sequence of the plurality of candidate topic concepts; and counting, through calling and executing the program stored in the memory by the one or more processors, one of (i) a proportion of the accurate topic concepts and (ii) a proportion of the inaccurate topic concepts, in each target part-of-speech sequence according to the tagging result.

3. The method according to claim 1, wherein the acquiring, through calling and executing the program stored in the memory by the one or more processors, the plurality of candidate topic concepts in the search text outputted by the output device based on the query inputted by the input device comprises: performing, through calling and executing the program stored in the memory by the one or more processors, word segmentation on a first query to obtain a first word segmentation result; performing, through calling and executing the program stored in the memory by the one or more processors, word segmentation on a first multimedia content to obtain a second word segmentation result, wherein the first multimedia content is a multimedia content hit by searching the first query; and determining, through calling and executing the program stored in the memory by the one or more processors, a first candidate topic concept according to the first word segmentation result and the second word segmentation result, wherein the first candidate topic concept is a word content in which a word continuously appears in the first query and a word continuously appears in the first multimedia content, and the first candidate topic concept is one of the plurality of candidate topic concepts.

4. The method according to claim 3, wherein the first candidate topic concept is a longest one among a plurality of continuous contents, the plurality of continuous contents being a word content in which a word continuously appears in the first query and a word continuously appears in the first multimedia content.

5. The method according to claim 1, after the filtering through the plurality of candidate topic concepts, through calling and executing the program stored in the memory by the one or more processors, to select one or more candidate topic concepts that meet the at least one requirement preset in advance among the plurality of candidate topic concepts, the method further comprising: deleting, through calling and executing the program stored in the memory by the one or more processors, a target candidate topic concept from filtered candidate topic concepts according to a target template, wherein one of (i) in a case that the target template is an inaccurate topic concept template, the target candidate topic concept is a topic concept matching the target template, and (ii) in a case that the target template is an accurate topic concept template, the target candidate topic concept is a topic concept not matching the target template among the filtered candidate topic concepts.

6. An electronic equipment comprising: one or more processors; a memory communicatively connected with the one or more processors; one or more interfaces for connecting the one or more processors and the memory; an input device; and an output device, the memory storing one or more program instructions for mining a topic concept in a search text, wherein the one or more processors are configured to execute the one or more program instructions so as to realize a method for mining the topic concept in the search text, which is performed by the electronic equipment, the method comprising: acquiring, through calling and executing a program stored in the memory by the one or more processors, a plurality of candidate topic concepts in the search text outputted by the output device based on a query inputted by the input device, wherein each of the plurality of candidate topic concepts in the search text comprises (i) one or more things, (ii) one or more events, or (iii) one or more characters; performing, through calling and executing the program stored in the memory by the one or more processors, word segmentation on the plurality of candidate topic concepts and performing part-of-speech tagging on a plurality of words respectively obtained after performing the
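Claims 3 and 4 describe taking the longest run of words that appears contiguously in both the segmented query and the segmented multimedia content. One way to sketch this is the standard longest-common-substring dynamic program applied to word lists. The function name, the whitespace segmentation, and the sample strings below are hypothetical; the claims do not prescribe a particular algorithm or segmenter.

```python
def longest_common_word_run(query_words, content_words):
    """Return the longest run of words contiguous in both word lists."""
    best_len = 0
    best_end = 0  # exclusive end index of the best run in query_words
    prev = [0] * (len(content_words) + 1)
    for i in range(1, len(query_words) + 1):
        curr = [0] * (len(content_words) + 1)
        for j in range(1, len(content_words) + 1):
            if query_words[i - 1] == content_words[j - 1]:
                # Extend the run that ended at the previous word in both lists.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr
    return query_words[best_end - best_len:best_end]


# Assumed toy segmentation: split on whitespace.
query = "best wedding photography tips".split()
content = "wedding photography ideas and tips for beginners".split()
candidate = " ".join(longest_common_word_run(query, content))
print(candidate)
```

When several runs tie for the longest length, this sketch returns the first one found in the query; the claims do not say how ties should be resolved.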

Assignees

Beijing Baidu Netcom Sci & Tech Co Ltd

Inventors

Classifications

  • G06F40/279 (Primary): Recognition of textual entities

  • Data mining

  • Query processing support for facilitating data mining operations in structured databases

  • Filtering based on additional data, e.g. user or group profiles

  • Morphological analysis

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11651164B2 cover?
It covers a method, device, equipment, and storage medium for mining topic concepts. Candidate topic concepts are acquired based on a query, segmented into words and tagged with parts of speech, and then filtered according to their part-of-speech sequences.
Who is the assignee on this patent?
Beijing Baidu Netcom Sci & Tech Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06F40/279. Mapped technology areas include Physics.
When was this patent published?
It was published on May 16, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 4 related publications on this page (citations in our corpus or others sharing the same primary CPC).