Word segmentation method based on artificial intelligence, server and storage medium

US10650096B2 · US · B2

Patent metadata
Field               Value
Publication number  US-10650096-B2
Application number  US-201815934410-A
Country             US
Kind code           B2
Filing date         Mar 23, 2018
Priority date       Jun 14, 2017
Publication date    May 12, 2020
Grant date          May 12, 2020

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Embodiments of the present disclosure disclose a word segmentation method based on artificial intelligence, a server and a storage medium. The word segmentation method may include: acquiring a corpus to be segmented and a segmentation model corresponding to a preset segmentation template; matching the corpus to be segmented with the segmentation model according to a preset matching algorithm, and acquiring a target phrase satisfying a first preset rule in the corpus to be segmented; modifying an emission matrix corresponding to the segmentation model and the corpus to be segmented according to the target phrase; and performing a word segmentation on the corpus to be segmented according to the emission matrix modified, to acquire a first segmentation result.
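
The matching and emission-matrix steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `find_phrases`, `boost_emissions`, `position_labels`, and `bonus`, the B/M/E/S label order, and the choice of forward maximum matching against a phrase set are all assumptions made for illustration; this page does not specify the "preset matching algorithm" or the "first preset rule".

```python
import numpy as np

# Assumed column order of the emission matrix:
# B = head, M = intermediate, E = trailing, S = single-character word.
LABELS = ["B", "M", "E", "S"]

def position_labels(phrase_len):
    """Label each character of a phrase by its position in the phrase."""
    if phrase_len == 1:
        return ["S"]
    return ["B"] + ["M"] * (phrase_len - 2) + ["E"]

def find_phrases(text, lexicon):
    """Forward maximum matching (an assumed matching algorithm).

    Returns (start, length) for the longest lexicon entry at each position.
    """
    max_len = max(map(len, lexicon), default=0)
    spans, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + n] in lexicon:
                spans.append((i, n))
                i += n
                break
        else:
            i += 1  # no lexicon entry starts here
    return spans

def boost_emissions(emission, spans, bonus=5.0):
    """Add a preset value to the weight selected by each character's label."""
    out = emission.copy()
    for start, length in spans:
        for offset, label in enumerate(position_labels(length)):
            out[start + offset, LABELS.index(label)] += bonus
    return out

text = "南京市长江大桥"
spans = find_phrases(text, {"长江大桥"})
modified = boost_emissions(np.zeros((len(text), 4)), spans)
```

Here the emission matrix has one row per character and one column per label, so the target phrase only raises the weights that agree with its own boundaries; every other weight is left unchanged.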

First claim

Opening claim text (preview).

What is claimed is:

1. A word segmentation method based on artificial intelligence, comprising: acquiring a corpus to be segmented and a segmentation model corresponding to a preset segmentation template; matching the corpus to be segmented with the segmentation model according to a preset matching algorithm, and acquiring a target phrase satisfying a first preset rule in the corpus to be segmented; modifying an emission matrix corresponding to the segmentation model and the corpus to be segmented according to the target phrase; and performing a word segmentation on the corpus to be segmented according to the emission matrix modified, to acquire a first segmentation result; wherein modifying the emission matrix corresponding to the segmentation model and the corpus to be segmented according to the target phrase comprises: acquiring the emission matrix corresponding to the segmentation model and the corpus to be segmented; determining a modifying parameter corresponding to a Chinese character in the target phrase; and modifying a weight corresponding to the Chinese character in the emission matrix according to the modifying parameter.

2. The word segmentation method according to claim 1, wherein determining the modifying parameter corresponding to the Chinese character in the target phrase comprises: determining a label of the modifying parameter corresponding to the Chinese character according to a location of the Chinese character in the target phrase, in which the label comprises a head part, an intermediate part, a trailing part and a single word phrase; and determining a value of the modifying parameter corresponding to the Chinese character according to a preset value.

3. The word segmentation method according to claim 2, wherein modifying the weight corresponding to the Chinese character in the emission matrix according to the modifying parameter comprises: determining the weight to be modified of the Chinese character in the emission matrix according to the label; and modifying the weight to be modified according to the value of the modifying parameter.

4. The word segmentation method according to claim 3, wherein modifying the weight to be modified according to the value of the modifying parameter comprises: performing a summation on the value of the modifying parameter and a value of the weight to be modified, and determining a result of the summation as a modified value of the weight to be modified.

5. The word segmentation method according to claim 1, wherein before acquiring the corpus to be segmented and the segmentation model corresponding to the preset segmentation template, the method comprises: performing the word segmentation on the preset segmentation template using an original segmentation model, to acquire a second segmentation result; comparing the second segmentation result with the preset segmentation template according to a second preset rule, to acquire an update parameter; modifying the original segmentation model according to the update parameter to acquire a modified segmentation model; and generating the segmentation model corresponding to the preset segmentation template according to the modified segmentation model and the preset segmentation template.

6. The word segmentation method according to claim 1, wherein performing the word segmentation on the corpus to be segmented according to the emission matrix modified, to acquire the first segmentation result comprises: acquiring a transfer matrix corresponding to the segmentation model; and performing a Markov decoding on the transfer matrix and the emission matrix modified, to acquire the first segmentation result.

7. A server, comprising: one or more processors; and a memory configured to store one or more programs; wherein, when the one or more programs are executed by the one or more processors, the one or more processors are configured to execute the word segmentation method based on artificial intelligence, comprising: acquiring a corpus to be segmented and a segmentation model corresponding to a preset segmentation template; matching the corpus to be segmented with the segmentation model according to a preset matching algorithm, and acquiring a target phrase satisfying a first preset rule in the corpus to be segmented; modifying an emission matrix corresponding to the segmentation model and the corpus to be segmented according to the target phrase; and performing a word segmentation on the corpus to be segmented according to the emission matrix modified, to acquire a first segmentation result; wherein modifying the emission matrix corresponding to the segmentation model and the corpus to be segmented according to the target phrase comprises: acquiring the emission matrix corresponding to the segmentation model and the corpus to be segmented; determining a modifying parameter corresponding to a Chinese character in the target phrase; and modifying a weight corresponding to the Chinese character in the emission matrix according to the modifying parameter.

8. The server according to claim 7, wherein determining the modifying parameter corresponding to the Chinese character in the target phrase comprises: determining a label of the modifying parameter corresponding to the Chinese character according to a location of the Chinese character in the target phrase, in which the label comprises a head part, an intermediate part, a trailing part and a single word phrase; and determining a value of the modifying parameter corresponding to the Chinese character according to a preset value.

9. The server according to claim 8, wherein modifying the weight corresponding to the Chinese character in the emission matrix according to the modifying parameter comprises: determining the weight to be modified of the Chinese character in the emission matrix according to the label; and modifying the weight to be modified according to the value of the modifying parameter.

10. The server according to claim 9, wherein modifying the weight to be modified according to the value of the modifying parameter comprises: performing a summation on the value of the modifying parameter and a value of the weight to be modified, and determining a result of the summation as a modified value of the weight to be modified.

11. The server according to claim 7, wherein before acquiring the corpus to be segmented and the segmentation model corresponding to the preset segmentation template, the method comprises: performing the word segmentation on the preset segmentation template using an original segmentation model, to acquire a second segmentation result; comparing the second segmentation result with the preset segmentation template according to a second preset rule, to acquire an update parameter; modifying the original segmentation model according to the update parameter to acquire a modified segmentation model; and generating the segmentation model corresponding to the preset segmentation template according to the modified segmentation model and the preset segmentation template.

12. The server according to claim 7, wherein performing the word segmentation on the corpus to be segmented according to the emission matrix modified, to acquire the first segmentation result comprises: acquiring a transfer matrix corresponding to the segmentation model; and performing a Markov decoding on the transfer matrix and the emission matrix modified, to acquire the first segmentation result.

13. A non-transitory storage medium comprising computer executable instructions, wherein when the computer executable instructions are executed by a computer processor, the word s
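
The decoding step in claims 6 and 12 combines a transfer matrix with the modified emission matrix. The sketch below assumes that "Markov decoding" can be read as Viterbi decoding over additive scores, which the claim text does not state. The names `TRANSFER`, `START`, `END`, `viterbi`, and `to_words`, the B/M/E/S label order, and all numeric values are hypothetical. The example shows how a boosted phrase changes the decoded segmentation.

```python
import numpy as np

LABELS = ["B", "M", "E", "S"]  # head, intermediate, trailing, single
NEG = -1e9                      # score for forbidden moves

# Hypothetical transfer matrix: rows are the previous label, columns the next.
TRANSFER = np.full((4, 4), NEG)
for prev, nxt in ["BM", "BE", "MM", "ME", "EB", "ES", "SB", "SS"]:
    TRANSFER[LABELS.index(prev), LABELS.index(nxt)] = 0.0

START = np.array([0.0, NEG, NEG, 0.0])  # a sentence starts with B or S
END = np.array([NEG, NEG, 0.0, 0.0])    # a sentence ends with E or S

def viterbi(emission):
    """Return the highest-scoring label sequence for additive scores."""
    steps, _ = emission.shape
    score = START + emission[0]
    back = np.zeros(emission.shape, dtype=int)
    for t in range(1, steps):
        cand = score[:, None] + TRANSFER       # cand[i, j]: arrive at j from i
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + emission[t]
    path = [int((score + END).argmax())]
    for t in range(steps - 1, 0, -1):          # follow back-pointers
        path.append(int(back[t, path[-1]]))
    return [LABELS[k] for k in reversed(path)]

def to_words(text, labels):
    """Cut the text after every E or S label."""
    words, current = [], ""
    for ch, label in zip(text, labels):
        current += ch
        if label in ("E", "S"):
            words.append(current)
            current = ""
    return words + ([current] if current else [])

text = "南京市长江大桥"
emission = np.zeros((len(text), 4))
emission[:, 3] = 1.0                           # toy model: prefers single characters
before = to_words(text, viterbi(emission))

boosted = emission.copy()                      # phrase 长江大桥 occupies rows 3..6
for row, label in zip(range(3, 7), "BMME"):
    boosted[row, LABELS.index(label)] += 5.0   # summation with a preset value
after = to_words(text, viterbi(boosted))
```

With these toy scores, `before` splits every character apart, while `after` keeps 长江大桥 as one word. Adding the preset value, rather than overwriting the weight, leaves the underlying model scores in play for characters outside the target phrase.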

Assignees

Beijing Baidu Netcom Science And Technology Co Ltd

Inventors

Classifications

  • Grammatical analysis; Style critique · CPC title

  • G06F40/53 · Primary

    Processing of non-Latin text (kana-to-kanji conversion G06F40/129; vowelisation G06F40/232) · CPC title

  • G06F40/289 · Primary

    Phrasal analysis, e.g. finite state techniques or chunking · CPC title

  • Lexical analysis, e.g. tokenisation or collocates · CPC title

  • Physics · mapped topic

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10650096B2 cover?
Embodiments of the present disclosure disclose a word segmentation method based on artificial intelligence, a server and a storage medium. The word segmentation method may include: acquiring a corpus to be segmented and a segmentation model corresponding to a preset segmentation template; matching the corpus to be segmented with the segmentation model according to a preset matching algorithm, a…
Who is the assignee on this patent?
Beijing Baidu Netcom Science And Technology Co Ltd (also recorded as Beijing Baidu Netcom Sci & Tec)
What technology area does this patent fall under?
Primary CPC classification G06F40/53. Mapped technology areas include Physics.
When was this patent published?
Publication date May 12, 2020 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).