Intention identification method, intention identification apparatus, and computer-readable recording medium

US11468233B2 · US · B2

Patent metadata
Publication number: US-11468233-B2
Application number: US-202016750182-A
Country: US
Kind code: B2
Filing date: Jan 23, 2020
Priority date: Jan 29, 2019
Publication date: Oct 11, 2022
Grant date: Oct 11, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An intention identification method includes generating a heterogeneous text network based on a language material sample; using a graph embedding algorithm to perform learning with respect to the heterogeneous text network and obtain a vector representation of the language material sample and a word, and determining keywords of the language material sample based on a similarity in terms of a vector between the language material sample and the word in the language material sample; training an intention identification model until a predetermined training termination condition is satisfied, by using the keywords of the language material samples, and obtaining the trained intention identification model; and receiving a language material query, and using the trained intention identification model to identify an intention of the language material query.
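The first two steps of the abstract can be sketched in a few lines. This is a minimal illustration, not the patented implementation: the function names `build_text_network` and `select_keywords`, the edge-count representation, and the use of cosine similarity are all assumptions made for the example. The graph embedding step is not implemented here; the vectors passed to `select_keywords` stand in for whatever a graph embedding algorithm would produce.

```python
import math
from collections import defaultdict
from itertools import combinations


def build_text_network(samples):
    """Build the two edge sets of a heterogeneous text network.

    samples maps a sample id to its list of (already preprocessed) tokens.
    Returns (doc_word, word_word):
      doc_word[(sample_id, word)]  -> how often the word occurs in the sample
      word_word[(word_a, word_b)]  -> number of samples containing both words
    """
    doc_word = defaultdict(int)
    word_word = defaultdict(int)
    for sample_id, tokens in samples.items():
        for word in tokens:
            doc_word[(sample_id, word)] += 1
        # Sorting gives each unordered word pair one canonical key.
        for word_a, word_b in combinations(sorted(set(tokens)), 2):
            word_word[(word_a, word_b)] += 1
    return dict(doc_word), dict(word_word)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_keywords(sample_vec, tokens, word_vecs, top_k):
    """Return the top_k words of a sample whose vectors are closest to it."""
    unique_words = sorted(set(tokens))
    ranked = sorted(
        unique_words,
        key=lambda w: cosine(sample_vec, word_vecs[w]),
        reverse=True,
    )
    return ranked[:top_k]


# Toy data; the vectors are hand-made stand-ins for learned embeddings.
samples = {"s1": ["book", "flight", "paris"], "s2": ["book", "hotel"]}
doc_word, word_word = build_text_network(samples)
word_vecs = {"book": [0.1, 1.0], "flight": [1.0, 0.1], "paris": [0.9, 0.5]}
keywords = select_keywords([1.0, 0.0], samples["s1"], word_vecs, top_k=2)
```

With these toy vectors, "flight" and "paris" rank above "book" for sample s1 because their vectors point closer to the sample vector.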

First claim

Opening claim text (preview).

What is claimed is:
1. An intention identification method comprising: generating a heterogeneous text network based on a plurality of language material samples that include a plurality of labeled language materials to which an intention has been labeled, and a plurality of unlabeled language materials to which an intention has not been labeled, wherein the heterogeneous text network includes a first co-occurrence relationship indicating that a word occurs in a language material sample from among the plurality of language material samples, and a second co-occurrence relationship indicating that two or more words appear in the language material sample; using a graph embedding algorithm to perform learning with respect to the heterogeneous text network, obtain a vector representation of the language material sample and the word, and determining keywords of the language material sample based on a similarity in terms of a vector between the language material sample and the word in the language material sample; training an intention identification model until a predetermined training termination condition is satisfied, the intention identification model being one or more intention identification classifiers that include a plurality of different language levels, wherein the training of the intention identification model includes matching the keywords of each language material sample of the plurality of the language material samples to a language level of the one or more intention identification classifiers; receiving a language material query; and identifying an intention of the received language material query using the trained intention identification model.
2. The intention identification method according to claim 1, wherein the training of the intention identification model includes: training an intention identification classifier of the one or more intention identification classifiers by using the keywords of each language material sample of the plurality of labeled language materials; terminating the training upon detecting that the predetermined training termination condition is satisfied, or predicting an intention and a prediction reliability of the plurality of unlabeled language materials by using a plurality of the trained intention identification classifiers upon detecting that the predetermined training termination condition is not satisfied; acquiring a probability distribution of vectors of the plurality of unlabeled language materials, selecting, from the plurality of unlabeled language materials, a target language material for which the prediction reliability is greater than a predetermined first threshold and for which a probability corresponding to a feature vector is less than a predetermined second threshold, and labeling an intention to the target language material based on the intention and the prediction reliability that have been predicted; and deleting the target language material from the plurality of unlabeled language materials, adding the target language material to the plurality of labeled language materials, returning to using, a feature vector of the plurality of labeled language materials, and training the intention identification classifier.
3. The intention identification method according to claim 2, wherein the training of the intention identification classifier includes: converting the keywords of the plurality of labeled language materials into an input sequence of the language levels of the intention identification classifier based on the language levels of the intention identification classifier, inputting the input sequence to the intention identification classifier, and training the intention identification classifier, wherein when the language levels are word levels, the input sequence is a sequence of the keywords in the plurality of labeled language materials, when the language levels are character levels, the input sequence is a sequence of characters obtained by dividing the keywords in the plurality of labeled language materials, and when the language levels are phrase levels, the input sequence is an order of phrases in the plurality of labeled language materials, and the phrases are formed by the keywords whose positional relationships in the plurality of labeled language materials satisfy a predetermined condition.
4. The intention identification method according to claim 1, wherein the generating of the heterogeneous text network based on the language material sample includes: performing a character string preprocess with respect to the language material sample and obtaining the language material sample that has undergone the character string preprocess, the character string preprocess including data cleaning, stop word, an error correction process, and a stemming process; extracting a word in a language material text, which is obtained by processing the language material sample, and establishing the first co-occurrence relationship, and extracting two words present in the same language material text and establishing the second co-occurrence relationship; and generating the heterogeneous text network including the first co-occurrence relationship and the second co-occurrence relationship.
5. The intention identification method according to claim 1, wherein the determining of the keywords of the language material sample based on the similarity in terms of the vector between the language material sample and the word in the language material sample includes: calculating the similarity in terms of the vector between the language material sample and the word in the language material sample; and selecting a predetermined number of words for which the similarity in terms of the vector is maximum, and determining the selected words as the keywords of the language material sample.
6. The intention identification method according to claim 1, wherein the language levels include at least two levels among a character level, a word level, and a phrase level.
7. A non-transitory computer-readable recording medium storing a computer program, wherein the intention identification method according to claim 1 is executed by having a processor execute the computer program.
8. An intention identification apparatus comprising: a processor; and a memory storing program instructions that cause the processor to implement: a text network generator configured to generate a heterogeneous text network based on a plurality of language material samples that include a plurality of labeled language materials to which an intention has already been labeled, and a plurality of unlabeled language materials to which an intention has not been labeled, wherein the heterogeneous text network includes a first co-occurrence relationship indicating that a word occurs in a language material sample from among the plurality of language material samples, and a second co-occurrence relationship indicating that two or more words appear in the language material sample; a vector generator configured to use a graph embedding algorithm to perform learning with respect to the heterogeneous text network, obtain a vector representation of the language material sample and the word, and determine keywords of the language material sample based on a similarity in terms of a vector between the language material sample and the word in the language material sample; a model trainer configured to train an intention identification model until a predetermined training termination condition is satisfied, the intention identification model being one or more intention identification classifiers that include a plurality of different language levels, wherein the training of the intention identification mode
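The selection rule in claim 2 can be illustrated with one round of a self-training loop. This is a rough sketch under stated assumptions: `self_training_round`, the `predict` callable, the dictionary layouts, and the threshold values are hypothetical names and data chosen for the example. The claim does not specify how the classifiers produce a prediction reliability or how the probability distribution of vectors is estimated, so both are passed in as plain inputs.

```python
def self_training_round(labeled, unlabeled, predict, vector_probability,
                        reliability_threshold, probability_threshold):
    """Run one pseudo-labeling round and return the ids that were promoted.

    labeled:            dict id -> (keywords, intent)
    unlabeled:          dict id -> keywords
    predict:            hypothetical callable; takes the unlabeled dict and
                        returns dict id -> (predicted_intent, reliability)
    vector_probability: dict id -> probability of the sample's feature
                        vector under the distribution of unlabeled vectors
    """
    predictions = predict(unlabeled)

    # Keep samples that are confidently predicted (reliability above the
    # first threshold) AND whose feature vector is improbable (probability
    # below the second threshold).
    targets = {
        sample_id: intent
        for sample_id, (intent, reliability) in predictions.items()
        if reliability > reliability_threshold
        and vector_probability[sample_id] < probability_threshold
    }

    # Move each target from the unlabeled pool to the labeled pool.
    for sample_id, intent in targets.items():
        labeled[sample_id] = (unlabeled.pop(sample_id), intent)
    return sorted(targets)


# Toy round with a fixed stand-in for the trained classifiers.
labeled = {"l1": (["refund", "order"], "refund")}
unlabeled = {"u1": ["money", "back"], "u2": ["hello"], "u3": ["cancel"]}
fake_predictions = {
    "u1": ("refund", 0.95),
    "u2": ("greeting", 0.60),
    "u3": ("cancel", 0.90),
}
promoted = self_training_round(
    labeled, unlabeled,
    predict=lambda pool: {k: fake_predictions[k] for k in pool},
    vector_probability={"u1": 0.05, "u2": 0.02, "u3": 0.40},
    reliability_threshold=0.8,
    probability_threshold=0.1,
)
```

In this toy round only u1 is promoted: u2 fails the reliability test and u3 fails the probability test. A full loop would retrain the classifier on the enlarged labeled pool and repeat until the termination condition holds.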

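The conversion of keywords into level-specific input sequences described in claim 3 might look like the following. This is an illustrative sketch only: `to_input_sequence` is a hypothetical name, and the phrase rule used here (keywords that are adjacent in the original token order are joined into one phrase) is an assumed stand-in for the "predetermined condition" on positional relationships, which the claim leaves open.

```python
def to_input_sequence(keywords, level, tokens=None):
    """Convert a sample's keywords into the input sequence for one level.

    keywords: keywords of the sample, in the order they occur
    level:    "word", "character", or "phrase"
    tokens:   full token sequence of the sample (needed for "phrase")
    """
    if level == "word":
        # Word level: the sequence of keywords themselves.
        return list(keywords)
    if level == "character":
        # Character level: each keyword divided into its characters.
        return [ch for keyword in keywords for ch in keyword]
    if level == "phrase":
        if tokens is None:
            raise ValueError("phrase level needs the original tokens")
        # Assumed condition: runs of adjacent keywords form one phrase.
        keyword_set = set(keywords)
        phrases, current = [], []
        for token in tokens:
            if token in keyword_set:
                current.append(token)
            elif current:
                phrases.append(" ".join(current))
                current = []
        if current:
            phrases.append(" ".join(current))
        return phrases
    raise ValueError(f"unknown language level: {level}")


tokens = ["please", "book", "flight", "to", "new", "york"]
keywords = ["book", "flight", "new", "york"]
word_seq = to_input_sequence(keywords, "word")
char_seq = to_input_sequence(keywords, "character")
phrase_seq = to_input_sequence(keywords, "phrase", tokens)
```

Here the phrase-level sequence is "book flight" followed by "new york", because the non-keyword "to" separates the two runs of adjacent keywords.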
Classifications

  • G06F40/216 · Primary

    using statistical methods · CPC title

  • Natural language query formulation · CPC title

  • Natural language generation · CPC title

  • Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars · CPC title

  • Lexical analysis, e.g. tokenisation or collocates · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11468233B2 cover?
An intention identification method includes generating a heterogeneous text network based on a language material sample; using a graph embedding algorithm to perform learning with respect to the heterogeneous text network and obtain a vector representation of the language material sample and a word, and determining keywords of the language material sample based on a similarity in terms of a vec…
Who is the assignee on this patent?
Liang Liang, Ding Lei, Dong Bin, and 3 more
What technology area does this patent fall under?
Primary CPC classification G06F40/216. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 11, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).