Automatic generation of domain models for virtual personal assistants

US9886950B2 · US · B2

Patent metadata
Field               Value
Publication number  US-9886950-B2
Application number  US-201414479949-A
Country             US
Kind code           B2
Filing date         Sep 8, 2014
Priority date       Sep 8, 2013
Publication date    Feb 6, 2018
Grant date          Feb 6, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Technologies for automatic domain model generation include a computing device that accesses an n-gram index of a web corpus. The computing device generates a semantic graph of the web corpus for a relevant domain using the n-gram index. The semantic graph includes one or more related entities that are related to a seed entity. The computing device performs similarity discovery to identify and rank contextual synonyms within the domain. The computing device maintains a domain model including intents representing actions in the domain and slots representing parameters of actions or entities in the domain. The computing device performs intent discovery to discover intents and intent patterns by analyzing the web corpus using the semantic graph. The computing device performs slot discovery to discover slots, slot patterns, and slot values by analyzing the web corpus using the semantic graph. Other embodiments are described and claimed.
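As a rough illustration of the first two steps in the abstract, the Python sketch below builds a toy n-gram index and collects the entities that share an n-gram with a seed entity. The corpus, the function names (`build_ngram_index`, `related_entities`), and the whitespace tokenization are assumptions made for this example only. The part-of-speech tagging and grammatical-relationship step described in the claims is not modeled here.

```python
from collections import Counter, defaultdict


def build_ngram_index(sentences, n):
    """Count every run of n consecutive tokens and index the n-grams by token."""
    freq = Counter()               # n-gram -> number of occurrences
    by_entity = defaultdict(set)   # token -> n-grams that contain it
    for sentence in sentences:
        tokens = sentence.lower().split()  # assumption: whitespace tokenization
        for i in range(len(tokens) - n + 1):
            ngram = tuple(tokens[i:i + n])
            freq[ngram] += 1
            for token in ngram:
                by_entity[token].add(ngram)
    return freq, by_entity


def related_entities(seed, freq, by_entity):
    """Weight each token that shares an n-gram with the seed by n-gram frequency."""
    related = Counter()
    for ngram in by_entity.get(seed, ()):
        for token in set(ngram):
            if token != seed:
                related[token] += freq[ngram]
    return related


# Hypothetical three-sentence "web corpus".
corpus = [
    "watch a movie tonight",
    "watch a movie online",
    "rent a movie online",
]
freq, by_entity = build_ngram_index(corpus, n=3)
related = related_entities("movie", freq, by_entity)
print(related.most_common())
```

On this toy corpus the 3-gram ("watch", "a", "movie") occurs twice, so "watch" receives a weight of 2, while "a" appears in every n-gram containing the seed and receives 6. A real system would filter such function words, for example with the part-of-speech tags the claims mention.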

First claim

Opening claim text (preview).

The invention claimed is:

1. A computing device for domain model creation, the computing device comprising:
    a web corpus module to access an n-gram index of a web corpus, wherein the web corpus includes a plurality of entities, wherein the n-gram index is indicative of a plurality of n-grams, wherein each n-gram comprises a predetermined number n of consecutive entities in the web corpus, and wherein the n-gram index is further indicative of a plurality of entities of each n-gram and a frequency of each n-gram;
    a semantic graph module to generate a semantic graph of the web corpus using the n-gram index of the web corpus, wherein the semantic graph is rooted by a predefined seed entity and includes a first plurality of related entities, wherein each of the first plurality of related entities is grammatically related to the seed entity and each of the first plurality of related entities is included in a corresponding n-gram of the web corpus that also includes the seed entity, and wherein to generate the semantic graph comprises to: retrieve a first plurality of n-grams from the web corpus using the n-gram index, wherein each of the first plurality of n-grams includes the seed entity; tag each entity of the first plurality of n-grams for part-of-speech; and identify a grammatical relationship between the seed entity and each of the first plurality of related entities in response to tagging of each entity, wherein each of the first plurality of related entities is included in the first plurality of n-grams;
    a similarity discovery module to analyze the web corpus using the semantic graph to identify and rank contextual synonyms for entities within a domain, wherein the semantic graph is further expanded using the ranked contextual synonyms;
    an intent discovery module to analyze the web corpus using the semantic graph to identify intents and intent patterns in the domain, wherein each intent is associated with a domain action, and each intent pattern matches query features and a corresponding intent; and
    a slot discovery module to analyze the web corpus using the semantic graph to identify slots, slot patterns, and slot values in the domain, wherein each slot is associated with a parameter of an intent or an entity, each slot pattern matches query features and a corresponding slot, and each slot value is associated with an entity.

2. The computing device of claim 1, wherein to generate the semantic graph comprises to: score each of the first plurality of related entities.

3. The computing device of claim 2, wherein to score each of the first plurality of related entities comprises to: determine a first number of n-grams in the first plurality of n-grams; determine a second number of n-grams in the first plurality of n-grams that each include a related entity of the first plurality of related entities; and determine a web relation frequency as a function of a frequency of the second number of n-grams in the first number of n-grams.

4. The computing device of claim 2, wherein to score each of the first plurality of related entities comprises to calculate an indicative segment frequency in the web corpus and a normalized indicative segment frequency in the web corpus for the corresponding related entity.

5. The computing device of claim 4, wherein to calculate the indicative segment frequency and the normalized indicative segment frequency comprises to: identify a plurality of segments including the corresponding related entity, wherein each segment comprises a shortest part of an n-gram of the first plurality of n-grams that includes the seed entity and the corresponding related entity; and identify a most common segment of the plurality of segments as the indicative segment of the corresponding related entity.

6. The computing device of claim 5, wherein to calculate the normalized indicative segment frequency comprises to: determine a probable frequency of occurrence in the web corpus of the entities of the indicative segment of the corresponding related entity; and divide the indicative segment frequency of the corresponding related entity by the probable frequency of occurrence.

7. The computing device of claim 1, wherein to analyze the web corpus using the semantic graph to identify and rank contextual synonyms for entities within the domain comprises to: select related entities of the first plurality of related entities having a highest indicative segment normalized frequency as anchor entities; retrieve anchor n-grams from the web corpus, wherein each anchor n-gram includes the seed entity and an anchor entity; replace the seed entity of each anchor n-gram with a placeholder; retrieve candidate n-grams from the web corpus, wherein each candidate n-gram matches an anchor n-gram; identify entities of the candidate n-grams matching the placeholder of the corresponding anchor n-gram as similarity candidates; and score each of the similarity candidates based on similarity to the seed entity.

8. The computing device of claim 7, wherein to score each of the similarity candidates comprises to: generate a contextual similarity score for the corresponding similarity candidate based on contextual features; generate a linguistic similarity score for the corresponding similarity candidate based on linguistic features; and determine a similarity score for the corresponding similarity candidate as a function of the corresponding contextual similarity score and the corresponding linguistic similarity score.

9. The computing device of claim 1, further comprising a domain model module to add the intents, intent patterns, slots, slot patterns, and slot values to a domain model, wherein the domain model includes known intents, intent patterns, slots, and slot patterns associated with the domain and an ontology including known slot values associated with the domain.

10. The computing device of claim 9, wherein to analyze the web corpus using the semantic graph to identify the intents and the intent patterns in the domain comprises to: score a first plurality of verbs of the first plurality of related entities of the semantic graph by a number of group unique n-grams and an indicative segment normalized frequency of the corresponding verb; identify one or more unknown verbs of the first plurality of verbs, wherein each of the unknown verbs does not match an intent pattern of the domain model; determine a similarity score for each pair of an unknown verb and a verb of the intent patterns of the domain model; identify one or more similar verbs of the unknown verbs as a function of the corresponding similarity score for the unknown verb and the verb of the intent patterns of the domain model; generate, for each similar verb of the one or more similar verbs, a new intent pattern for the intent of the corresponding intent pattern of the domain model; cluster one or more remaining verbs of the unknown verbs to generate clusters of remaining verbs, wherein each of the remaining verbs is not a similar verb; generate, for each cluster of remaining verbs, an intent; and generate, for each remaining verb of the clusters of remaining verbs, an intent pattern associated with the intent for the corresponding cluster of remaining verbs.

11. The computing device of claim 9, wherein to analyze the web corpus using the semantic graph to identify the slot values in the domain comprises to: score a first plurality of modifiers of the first plurality of related entities of the semantic graph by a number of group unique n-grams and an indicative segment normalized frequency; identify one or more known modifiers of the first plurality of modifiers, wherein each of the known modifiers matc
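The scoring and similarity steps in claims 3, 5, 6, and 7 can be sketched in Python as follows. This is only one possible reading under stated assumptions: the web relation frequency is taken as a simple ratio of occurrence counts, the "probable frequency of occurrence" is taken as the expected count if the tokens of a segment were independent, and all names and toy counts are hypothetical. The claims say only "a function of" and do not fix these formulas. Candidate scoring (claim 8) is not shown.

```python
from collections import Counter
from math import prod


def web_relation_frequency(seed_ngrams, related):
    """Claim 3 (assumed ratio): share of seed n-gram occurrences containing `related`."""
    first = sum(seed_ngrams.values())
    second = sum(f for ng, f in seed_ngrams.items() if related in ng)
    return second / first if first else 0.0


def segment(ngram, seed, related):
    """Claim 5: shortest contiguous part of the n-gram containing both entities."""
    pairs = [(abs(i - j), min(i, j), max(i, j))
             for i, t in enumerate(ngram) if t == seed
             for j, u in enumerate(ngram) if u == related]
    _, lo, hi = min(pairs)
    return ngram[lo:hi + 1]


def indicative_segment(seed_ngrams, seed, related):
    """Claim 5: most common segment for `related`, with its frequency."""
    counts = Counter()
    for ngram, f in seed_ngrams.items():
        if related in ngram:
            counts[segment(ngram, seed, related)] += f
    if not counts:
        raise ValueError("related entity does not occur with the seed")
    return counts.most_common(1)[0]


def normalized_frequency(seg, seg_freq, unigram_freq, total_tokens):
    """Claim 6 (assumed model): observed count divided by the independence estimate."""
    expected = total_tokens * prod(unigram_freq[t] / total_tokens for t in seg)
    return seg_freq / expected


def similarity_candidates(all_ngrams, seed, anchor):
    """Claim 7: replace the seed with a placeholder and collect what else fills it."""
    patterns = {tuple(None if t == seed else t for t in ng)
                for ng in all_ngrams if seed in ng and anchor in ng}
    found = Counter()
    for ng, f in all_ngrams.items():
        for pat in patterns:
            if len(pat) == len(ng) and all(p is None or p == t for p, t in zip(pat, ng)):
                for p, t in zip(pat, ng):
                    if p is None and t != seed:
                        found[t] += f
    return found


# Hypothetical counts: n-grams that contain the seed entity "movie".
seed_ngrams = Counter({
    ("watch", "a", "movie"): 2,
    ("a", "movie", "tonight"): 1,
    ("a", "movie", "online"): 2,
    ("rent", "a", "movie"): 1,
})
unigram_freq = {"watch": 2, "a": 3, "movie": 3, "tonight": 1, "online": 2, "rent": 1}
all_ngrams = seed_ngrams + Counter({("watch", "a", "film"): 1})

seg, seg_freq = indicative_segment(seed_ngrams, "movie", "online")
print(web_relation_frequency(seed_ngrams, "watch"))
print(seg, seg_freq, normalized_frequency(seg, seg_freq, unigram_freq, 12))
print(similarity_candidates(all_ngrams, "movie", "watch"))
```

With these toy counts, "online" has the indicative segment ("movie", "online"), and the placeholder pattern ("watch", "a", _) derived from the anchor "watch" surfaces "film" as a similarity candidate for "movie".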

Assignees

Intel Corp

Inventors

Classifications

  • Lexical analysis, e.g. tokenisation or collocates · CPC title

  • Semantic analysis · CPC title

  • Phrasal analysis, e.g. finite state techniques or chunking · CPC title

  • Creation of semantic tools, e.g. ontology or thesauri · CPC title

  • G10L15/197 (Primary)

    Probabilistic grammars, e.g. word n-grams · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9886950B2 cover?
Technologies for automatic domain model generation include a computing device that accesses an n-gram index of a web corpus. The computing device generates a semantic graph of the web corpus for a relevant domain using the n-gram index. The semantic graph includes one or more related entities that are related to a seed entity. The computing device performs similarity discovery to identify and r…
Who is the assignee on this patent?
Intel Corp
What technology area does this patent fall under?
Primary CPC classification G10L15/197. Mapped technology areas include Physics.
When was this patent published?
Publication date Feb 6, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 7 related publications on this page (citations in our corpus or others sharing the same primary CPC).