Who is the assignee on this patent?

Baidu Online Network Technology (Beijing) Co., Ltd.

What technology area does this patent fall under?

Primary CPC classification G06F40/279. Mapped technology areas include Physics.

When was this patent published?

Publication date: Oct 11, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).

Method and apparatus for performing word segmentation on text, device, and medium

US11468236B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11468236-B2
Application number	US-202017020166-A
Country	US
Kind code	B2
Filing date	Sep 14, 2020
Priority date	Jan 14, 2020
Publication date	Oct 11, 2022
Grant date	Oct 11, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments of the present disclosure provide a method and apparatus for performing word segmentation on a text, a device and a medium, which relate to the field of data processing technology and particularly to a smart search technology. The method may include: dividing a to-be-segmented text into at least two layers of character fragment combinations, any layer of character fragments being child character fragments of a previous layer of character fragments and/or parent character fragments of a next layer of character fragments; and segmenting the to-be-segmented text according to a target word granularity based on the at least two layers of character fragment combinations.
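The layered structure described in the abstract can be pictured as a tree in which each fragment is the parent of the finer fragments beneath it, and the target word granularity decides how deep to descend. The following is a minimal sketch under stated assumptions, not the patented implementation: the `Fragment` class, the `segment` function, the sample text, and the hand-built layers are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Fragment:
    """A character fragment and its child fragments in the next layer (hypothetical type)."""
    text: str
    children: List["Fragment"] = field(default_factory=list)


def segment(fragment: Fragment, target_layer: int, layer: int = 0) -> List[str]:
    """Return the fragments found at target_layer, reading the tree left to right.

    A fragment with no children is returned as-is, even if the target layer
    is deeper (an assumption made so the output always covers the whole text).
    """
    if layer == target_layer or not fragment.children:
        return [fragment.text]
    words: List[str] = []
    for child in fragment.children:
        words.extend(segment(child, target_layer, layer + 1))
    return words


# Hand-built layers for illustration only; the patent derives them from usage data.
root = Fragment("北京大学生物系", [
    Fragment("北京大学", [Fragment("北京"), Fragment("大学")]),
    Fragment("生物系", [Fragment("生物"), Fragment("系")]),
])

print(segment(root, 1))  # ['北京大学', '生物系']
print(segment(root, 2))  # ['北京', '大学', '生物', '系']
```

In this sketch a coarser granularity maps to a shallower layer and a finer granularity to a deeper one; the concatenation of the returned fragments always reproduces the original text.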

First claim

Opening claim text (preview).

What is claimed is:

1. A method for performing word segmentation on a text, comprising: dividing a to-be-segmented text into at least two layers of character fragment combinations, any layer of character fragments being child character fragments of a previous layer of character fragments and/or parent character fragments of a next layer of character fragments; and segmenting the to-be-segmented text according to a target word granularity based on the at least two layers of character fragment combinations; wherein the dividing the to-be-segmented text into at least two layers of character fragment combinations comprises: extracting candidate character fragments of at least one kind of length from the previous layer of character fragments, the previous layer of character fragments belonging to a previous layer of character fragment combination; combining the extracted candidate character fragments to obtain candidate character fragment combinations; and determining a current layer of character fragment combination from the candidate character fragment combinations according to an overlapping relationship between the candidate character fragments and historical usage information of the candidate character fragments, the current layer of character fragment combination including at least one character fragment of the current layer.

2. The method according to claim 1, wherein the determining the current layer of character fragment combination from the candidate character fragment combinations according to the overlapping relationship between the candidate character fragments and historical usage information of the candidate character fragments comprises: filtering a candidate character fragment combination having an overlap from the candidate character fragment combinations, to obtain target character fragment combinations; and determining the current layer of character fragment combination from the target character fragment combinations according to a number of candidate character fragments included in the target character fragment combinations and historical usage information of the candidate character fragments.

3. The method according to claim 2, wherein the determining the current layer of character fragment combination from the target character fragment combinations according to the number of candidate character fragments included in the target character fragment combinations and historical usage information of the candidate character fragments comprises: calculating an information entropy of the candidate character fragments according to historical adjacent character information of the candidate character fragments; determining weights of the target character fragment combinations according to the calculated information entropy; and determining the current layer of character fragment combination from the target character fragment combinations according to the number of the candidate character fragments included in the target character fragment combinations and the weights of the target character fragment combinations.

4. The method according to claim 1, wherein the segmenting the to-be-segmented text according to the target word granularity based on the at least two layers of character fragment combinations comprises: determining target segmentation fragments from character fragments of the character fragment combinations according to historical usage information of character fragments in the character fragment combinations and a parent-child relationship between character fragments in different character fragment combinations; and combining the target segmentation fragments, and segmenting the to-be-segmented text according to the target word granularity based on the combination of target segmentation fragments.

5. The method according to claim 4, wherein the determining target segmentation fragments from character fragments of the character fragment combinations according to historical usage information of character fragments in the character fragment combination and a parent-child relationship between character fragments in different character fragment combinations comprises: determining, according to historical usage information of a parent character fragment in the character fragment combinations, a weight of the parent character fragment; determining, according to historical usage information of a child character fragment associated with the parent character fragment, a comprehensive weight of the child character fragment; and comparing the weight of the parent character fragment with the comprehensive weight of the child character fragment; and terminating a traversal for a branch to which the parent character fragment belongs and using the child character fragment associated with the parent character fragment as the target segmentation fragment, in response to the weight of the parent character fragment is greater than the comprehensive weight of the child character fragment.

6. The method according to claim 1, wherein after segmenting the to-be-segmented text, the method further comprises: comparing a target segmentation word obtained through the segmentation with an existing segmentation word, the existing segmentation word being obtained by segmenting the to-be-segmented text based on an existing word segmentation logic; and determining a to-be-mined word from the target segmentation word according to a comparison result.

7. An electronic device, comprising: at least one processor; and a memory, communicatively connected with the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions when executed by the at least one processor, cause the at least one processor to perform operations, the operations comprising: dividing a to-be-segmented text into at least two layers of character fragment combinations, any layer of character fragments being child character fragments of a previous layer of character fragments and/or parent character fragments of a next layer of character fragments; and segmenting the to-be-segmented text according to a target word granularity based on the at least two layers of character fragment combinations; wherein the dividing the to-be-segmented text into at least two layers of character fragment combinations comprises: extracting candidate character fragments of at least one kind of length from the previous layer of character fragments, the previous layer of character fragments belonging to a previous layer of character fragment combination; combining the extracted candidate character fragments to obtain candidate character fragment combinations; and determining a current layer of character fragment combination from the candidate character fragment combinations according to an overlapping relationship between the candidate character fragments and historical usage information of the candidate character fragments, the current layer of character fragment combination including at least one character fragment of the current layer.

8. The electronic device according to claim 7, wherein the determining the current layer of character fragment combination from the candidate character fragment combinations according to the overlapping relationship between the candidate character fragments and historical usage information of the candidate character fragments comprises: filtering a candidate character fragment combination having an overlap from the candidate character fragment combinations, to obtain target character fragment combinations; and determining the current layer of character fragment combination from the target character fragment combinations according to a number of candidate character fragments included in the target
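The layer-building steps in claims 1 to 3 (extract candidate fragments of several lengths, combine them, filter out combinations with overlaps, then rank the rest using information entropy computed from historically adjacent characters) can be sketched as follows. This is a toy illustration under loud assumptions rather than the claimed method: the function names, the `history` table of right-adjacent character counts, the requirement that a combination cover the whole parent fragment, and the ranking rule (higher mean entropy first, then fewer fragments) are all hypothetical choices, since the claims do not fix them.

```python
import math
from collections import Counter
from itertools import combinations


def adjacent_entropy(neighbors: Counter) -> float:
    """Shannon entropy (bits) of the characters historically seen next to a fragment."""
    total = sum(neighbors.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in neighbors.values())


def extract_candidates(parent, lengths, history):
    """Spans (start, end) of substrings with the given lengths that appear in history."""
    return [(start, start + length)
            for length in lengths
            for start in range(len(parent) - length + 1)
            if parent[start:start + length] in history]


def has_overlap(combo) -> bool:
    """True if any two spans share a character position."""
    ordered = sorted(combo)
    return any(a[1] > b[0] for a, b in zip(ordered, ordered[1:]))


def choose_layer(parent, lengths, history):
    """Pick one non-overlapping combination that covers parent (assumed ranking rule)."""
    spans = extract_candidates(parent, lengths, history)
    best_key, best_combo = None, None
    # Brute-force enumeration: exponential, acceptable only for a toy example.
    for size in range(1, len(spans) + 1):
        for combo in combinations(spans, size):
            if has_overlap(combo):
                continue  # the overlap filter of claim 2
            if sum(end - start for start, end in combo) != len(parent):
                continue  # assumption: keep only combinations covering the parent
            weight = sum(adjacent_entropy(history[parent[s:e]]) for s, e in combo) / size
            key = (weight, -size)  # assumption: higher mean entropy, then fewer fragments
            if best_key is None or key > best_key:
                best_key, best_combo = key, sorted(combo)
    if best_combo is None:
        return [parent]
    return [parent[s:e] for s, e in best_combo]


# Hypothetical historical counts of right-adjacent characters for each fragment.
history = {
    "北京": Counter({"大": 3, "市": 3, "人": 2}),
    "大学": Counter({"生": 2, "的": 2}),
    "京大": Counter({"学": 5}),
    "北": Counter({"京": 4}),
    "学": Counter({"生": 1, "校": 1}),
}

print(choose_layer("北京大学", (1, 2), history))  # ['北京', '大学']
```

A fragment that is followed by many different characters in the history has high entropy, which this sketch treats as evidence that it stands alone as a word; a fragment such as "京大" that is always followed by the same character scores zero and loses to the alternative split.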

Assignees

Baidu Online Network Technology (Beijing) Co., Ltd.

Inventors

Classifications

G06F40/279 · Primary
Recognition of textual entities · CPC title
G06F16/3344 · Primary
using natural language analysis · CPC title
Y02D10/00
Energy efficient computing, e.g. low power processors, power management or thermal management · CPC title

Patent family

Related publications grouped by family.

View patent family 71001864
