Temporal based word segmentation
US-2017061291-A1 · Mar 2, 2017 · US
US10650096B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10650096-B2 |
| Application number | US-201815934410-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 23, 2018 |
| Priority date | Jun 14, 2017 |
| Publication date | May 12, 2020 |
| Grant date | May 12, 2020 |
Official abstract text for this publication.
Embodiments of the present disclosure disclose a word segmentation method based on artificial intelligence, a server and a storage medium. The word segmentation method may include: acquiring a corpus to be segmented and a segmentation model corresponding to a preset segmentation template; matching the corpus to be segmented with the segmentation model according to a preset matching algorithm, and acquiring a target phrase satisfying a first preset rule in the corpus to be segmented; modifying an emission matrix corresponding to the segmentation model and the corpus to be segmented according to the target phrase; and performing a word segmentation on the corpus to be segmented according to the emission matrix modified, to acquire a first segmentation result.
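The matching step in the abstract can be illustrated with a small sketch. The abstract does not say which "preset matching algorithm" or "first preset rule" is used, so the code below assumes forward maximum matching against a set of template phrases, with "longest match at each position" standing in for the rule. The function name `find_target_phrases` and the sample phrases are hypothetical and do not come from the patent.

```python
def find_target_phrases(text, phrases):
    """Return (start, end) spans of template phrases found in `text`.

    Assumption: forward maximum matching, i.e. at each position the
    longest phrase from `phrases` wins and scanning resumes after it.
    """
    max_len = max((len(p) for p in phrases), default=0)
    spans = []
    i = 0
    while i < len(text):
        match_end = None
        # Try the longest candidate first, then shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + length] in phrases:
                match_end = i + length
                break
        if match_end is None:
            i += 1  # no template phrase starts here
        else:
            spans.append((i, match_end))
            i = match_end
    return spans


# Hypothetical template phrases and corpus, for illustration only.
template_phrases = {"北京", "天安门", "天安"}
print(find_target_phrases("我爱北京天安门", template_phrases))  # [(2, 4), (4, 7)]
```

The returned spans are the target phrases whose characters would then have their emission weights adjusted before decoding.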
Opening claim text (preview).
What is claimed is:
1. A word segmentation method based on artificial intelligence, comprising: acquiring a corpus to be segmented and a segmentation model corresponding to a preset segmentation template; matching the corpus to be segmented with the segmentation model according to a preset matching algorithm, and acquiring a target phrase satisfying a first preset rule in the corpus to be segmented; modifying an emission matrix corresponding to the segmentation model and the corpus to be segmented according to the target phrase; and performing a word segmentation on the corpus to be segmented according to the emission matrix modified, to acquire a first segmentation result; wherein modifying the emission matrix corresponding to the segmentation model and the corpus to be segmented according to the target phrase comprises: acquiring the emission matrix corresponding to the segmentation model and the corpus to be segmented; determining a modifying parameter corresponding to a Chinese character in the target phrase; and modifying a weight corresponding to the Chinese character in the emission matrix according to the modifying parameter.
2. The word segmentation method according to claim 1, wherein determining the modifying parameter corresponding to the Chinese character in the target phrase comprises: determining a label of the modifying parameter corresponding to the Chinese character according to a location of the Chinese character in the target phrase, in which the label comprises a head part, an intermediate part, a trailing part and a single word phrase; and determining a value of the modifying parameter corresponding to the Chinese character according to a preset value.
3. The word segmentation method according to claim 2, wherein modifying the weight corresponding to the Chinese character in the emission matrix according to the modifying parameter comprises: determining the weight to be modified of the Chinese character in the emission matrix according to the label; and modifying the weight to be modified according to the value of the modifying parameter.
4. The word segmentation method according to claim 3, wherein modifying the weight to be modified according to the value of the modifying parameter comprises: performing a summation on the value of the modifying parameter and a value of the weight to be modified, and determining a result of the summation as a modified value of the weight to be modified.
5. The word segmentation method according to claim 1, wherein before acquiring the corpus to be segmented and the segmentation model corresponding to the preset segmentation template, the method comprises: performing the word segmentation on the preset segmentation template using an original segmentation model, to acquire a second segmentation result; comparing the second segmentation result with the preset segmentation template according to a second preset rule, to acquire an update parameter; modifying the original segmentation model according to the update parameter to acquire a modified segmentation model; and generating the segmentation model corresponding to the preset segmentation template according to the modified segmentation model and the preset segmentation template.
6. The word segmentation method according to claim 1, wherein performing the word segmentation on the corpus to be segmented according to the emission matrix modified, to acquire the first segmentation result comprises: acquiring a transfer matrix corresponding to the segmentation model; and performing a Markov decoding on the transfer matrix and the emission matrix modified, to acquire the first segmentation result.
7. A server, comprising: one or more processors; and a memory configured to store one or more programs; wherein, when the one or more programs are executed by the one or more processors, the one or more processors are configured to execute the word segmentation method based on artificial intelligence, comprising: acquiring a corpus to be segmented and a segmentation model corresponding to a preset segmentation template; matching the corpus to be segmented with the segmentation model according to a preset matching algorithm, and acquiring a target phrase satisfying a first preset rule in the corpus to be segmented; modifying an emission matrix corresponding to the segmentation model and the corpus to be segmented according to the target phrase; and performing a word segmentation on the corpus to be segmented according to the emission matrix modified, to acquire a first segmentation result; wherein modifying the emission matrix corresponding to the segmentation model and the corpus to be segmented according to the target phrase comprises: acquiring the emission matrix corresponding to the segmentation model and the corpus to be segmented; determining a modifying parameter corresponding to a Chinese character in the target phrase; and modifying a weight corresponding to the Chinese character in the emission matrix according to the modifying parameter.
8. The server according to claim 7, wherein determining the modifying parameter corresponding to the Chinese character in the target phrase comprises: determining a label of the modifying parameter corresponding to the Chinese character according to a location of the Chinese character in the target phrase, in which the label comprises a head part, an intermediate part, a trailing part and a single word phrase; and determining a value of the modifying parameter corresponding to the Chinese character according to a preset value.
9. The server according to claim 8, wherein modifying the weight corresponding to the Chinese character in the emission matrix according to the modifying parameter comprises: determining the weight to be modified of the Chinese character in the emission matrix according to the label; and modifying the weight to be modified according to the value of the modifying parameter.
10. The server according to claim 9, wherein modifying the weight to be modified according to the value of the modifying parameter comprises: performing a summation on the value of the modifying parameter and a value of the weight to be modified, and determining a result of the summation as a modified value of the weight to be modified.
11. The server according to claim 7, wherein before acquiring the corpus to be segmented and the segmentation model corresponding to the preset segmentation template, the method comprises: performing the word segmentation on the preset segmentation template using an original segmentation model, to acquire a second segmentation result; comparing the second segmentation result with the preset segmentation template according to a second preset rule, to acquire an update parameter; modifying the original segmentation model according to the update parameter to acquire a modified segmentation model; and generating the segmentation model corresponding to the preset segmentation template according to the modified segmentation model and the preset segmentation template.
12. The server according to claim 7, wherein performing the word segmentation on the corpus to be segmented according to the emission matrix modified, to acquire the first segmentation result comprises: acquiring a transfer matrix corresponding to the segmentation model; and performing a Markov decoding on the transfer matrix and the emission matrix modified, to acquire the first segmentation result.
13. A non-transitory storage medium comprising computer executable instructions, wherein when the computer executable instructions are executed by a computer processor, the word s
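Claims 2 to 4 and claim 6 can be illustrated with a minimal sketch. It assumes the four labels map to the common B/M/E/S tagging scheme (head, intermediate, trailing, single-word phrase), that weights are additive log-domain scores, and that the "Markov decoding" is a Viterbi search over the transfer matrix and the modified emission matrix. The names `boost_emissions`, `viterbi`, `labels_to_words` and `TRANSFER`, and all numeric values, are hypothetical and chosen only for illustration.

```python
LABELS = ["B", "M", "E", "S"]  # head, intermediate, trailing, single-word phrase
NEG = -1e9                     # score for label transitions that cannot occur

# Assumed transfer matrix: 0.0 for valid label transitions, NEG otherwise.
VALID = {("B", "M"), ("B", "E"), ("M", "M"), ("M", "E"),
         ("E", "B"), ("E", "S"), ("S", "B"), ("S", "S")}
TRANSFER = {p: {c: (0.0 if (p, c) in VALID else NEG) for c in LABELS}
            for p in LABELS}


def boost_emissions(emission, spans, bonus):
    """Claims 2-4: pick a label from each character's position in the
    target phrase, then add the preset value to that label's weight."""
    boosted = [dict(row) for row in emission]  # leave the input untouched
    for start, end in spans:
        for pos in range(start, end):
            if end - start == 1:
                label = "S"
            elif pos == start:
                label = "B"
            elif pos == end - 1:
                label = "E"
            else:
                label = "M"
            boosted[pos][label] += bonus  # summation, as in claim 4
    return boosted


def viterbi(emission, transfer):
    """Claim 6 (assumed to be Viterbi): best-scoring label sequence."""
    if not emission:
        return []
    score = [dict(emission[0])]
    back = []
    for t in range(1, len(emission)):
        row, ptr = {}, {}
        for cur in LABELS:
            prev = max(LABELS, key=lambda p: score[-1][p] + transfer[p][cur])
            row[cur] = score[-1][prev] + transfer[prev][cur] + emission[t][cur]
            ptr[cur] = prev
        score.append(row)
        back.append(ptr)
    path = [max(LABELS, key=lambda label: score[-1][label])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


def labels_to_words(text, path):
    """Cut the text after every E or S label."""
    words, current = [], ""
    for ch, label in zip(text, path):
        current += ch
        if label in ("E", "S"):
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return words


text = "去天安门"
# Toy emission matrix that, on its own, prefers single-character words.
emission = [{"B": 0.0, "M": 0.0, "E": 0.0, "S": 1.0} for _ in text]
boosted = boost_emissions(emission, [(1, 4)], bonus=5.0)  # target phrase 天安门
print(labels_to_words(text, viterbi(emission, TRANSFER)))  # ['去', '天', '安', '门']
print(labels_to_words(text, viterbi(boosted, TRANSFER)))   # ['去', '天安门']
```

With the toy scores, the unmodified matrix splits every character apart, while adding the preset value to the B, M and E weights of the matched phrase makes the decoder keep 天安门 together.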
Grammatical analysis; Style critique · CPC title
Processing of non-Latin text (kana-to-kanji conversion G06F40/129; vowelisation G06F40/232) · CPC title
Phrasal analysis, e.g. finite state techniques or chunking · CPC title
Lexical analysis, e.g. tokenisation or collocates · CPC title
Physics · mapped topic