Weighting dictionary entities for language understanding models

US9519870B2 · US · B2

Patent metadata
Publication number: US-9519870-B2
Application number: US-201414207986-A
Country: US
Kind code: B2
Filing date: Mar 13, 2014
Priority date: Mar 13, 2014
Publication date: Dec 13, 2016
Grant date: Dec 13, 2016

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A dictionary used by a spoken language understanding (SLU) system is improved by providing weightings for entities in the dictionary that represent the likelihood each entity belongs to an entity class represented by the dictionary. A classifier model may be trained using a seed list containing sample entities that belong in the entity class and a background entity list containing samples that do not belong in the entity class. Clicked URLs from search logs, search result URLs, and attributes from an entity graph may be used as features of the sample entities to train the classifier model. The classifier model may be used to weight entities from a candidate dictionary. The entity weightings are used to generate an improved dictionary for use in the SLU system.
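The abstract (and claim 1) describe a standard supervised pipeline: label seed entities as positive and background entities as negative, turn each entity's clicked URLs, search result URLs, and entity-graph attributes into features, fit a classifier, and use its output probability as the entity's weight. The Python sketch below illustrates that flow under stated assumptions. It encodes features as a binary bag of prefixed strings and fits a hand-rolled, unregularized logistic regression (claim 5 names logistic regression and support vector machines as options). Every entity, URL domain, attribute, and function name here is hypothetical toy data, not taken from the patent.

```python
import math
from typing import Dict, List, Set

# Hypothetical toy signals. In the patent these come from search click logs,
# search result URLs, and an entity graph; here they are hard-coded.
CLICKED_URLS: Dict[str, Set[str]] = {
    "inception": {"imdb.com", "rottentomatoes.com"},
    "the matrix": {"imdb.com"},
    "paris": {"tripadvisor.com"},
    "python": {"python.org"},
    "interstellar": {"imdb.com"},
    "london": {"tripadvisor.com"},
}
RESULT_URLS: Dict[str, Set[str]] = {
    "inception": {"wikipedia.org", "imdb.com"},
    "the matrix": {"wikipedia.org", "rottentomatoes.com"},
    "paris": {"wikipedia.org", "tripadvisor.com"},
    "python": {"wikipedia.org", "python.org"},
    "interstellar": {"wikipedia.org", "rottentomatoes.com"},
    "london": {"wikipedia.org"},
}
GRAPH_ATTRS: Dict[str, Set[str]] = {
    "inception": {"type:film", "director"},
    "the matrix": {"type:film", "director"},
    "paris": {"type:city"},
    "python": {"type:programming_language"},
    "interstellar": {"type:film"},
    "london": {"type:city"},
}


def entity_features(entity: str) -> Set[str]:
    """Binary bag of features drawn from the three sources named in claim 1."""
    feats = {"click:" + u for u in CLICKED_URLS.get(entity, set())}
    feats |= {"serp:" + u for u in RESULT_URLS.get(entity, set())}
    feats |= {"attr:" + a for a in GRAPH_ATTRS.get(entity, set())}
    return feats


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(model: Dict[str, float], feats: Set[str]) -> float:
    # Probability that an entity with these features belongs to the class.
    z = model.get("__bias__", 0.0) + sum(model.get(f, 0.0) for f in feats)
    return sigmoid(z)


def train_logistic_regression(examples: List[Set[str]], labels: List[int],
                              epochs: int = 200, lr: float = 0.5) -> Dict[str, float]:
    """Unregularized logistic regression fitted by stochastic gradient ascent
    on the log-likelihood over sparse binary features."""
    model: Dict[str, float] = {"__bias__": 0.0}
    for _ in range(epochs):
        for feats, y in zip(examples, labels):
            # lr times the log-likelihood gradient with respect to the logit.
            step = lr * (y - predict(model, feats))
            model["__bias__"] += step
            for f in feats:
                model[f] = model.get(f, 0.0) + step
    return model


seed_list = ["inception", "the matrix"]   # positive samples: movie titles
background = ["paris", "python"]          # negative samples: not movie titles
candidates = ["interstellar", "london"]   # candidate dictionary to be weighted

X = [entity_features(e) for e in seed_list + background]
y = [1] * len(seed_list) + [0] * len(background)
model = train_logistic_regression(X, y)
weights = {e: predict(model, entity_features(e)) for e in candidates}
print(weights)
```

With this toy data, "interstellar" (which shares clicked URLs and a film attribute with the seed list) should receive a weight above 0.5, while "london" should fall well below it; the exact values depend on the learning rate and epoch count.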

Claims

Full claim text (claims 1 to 19).

What is claimed is:

1. One or more computer storage media storing computer-executable instructions that, when executed by one or more computing devices, cause the one or more computing devices to perform operations comprising: accessing a seed list containing positive sample entities that belong to an entity class; accessing a background entity list containing negative sample entities that do not belong to the entity class; identifying clicked URLs from search click logs for at least a portion of the positive sample entities and negative sample entities; identifying search result URLs for at least a portion of the positive sample entities and negative sample entities; identifying attributes from an entity graph for at least a portion of the positive sample entities and negative sample entities; training a classifier model using the clicked URLs, search result URLs, and attributes from the entity graph as features of the positive sample entities and negative sample entities; and using the classifier model to weight entities in a candidate dictionary to provide weightings for the entities from the candidate dictionary.

2. The one or more computer storage media of claim 1, wherein the seed list is generated based on information from at least one selected from the following: an existing entity graph and training data from a spoken language understanding system.

3. The one or more computer storage media of claim 1, wherein the background entity list is generated based on information from at least one selected from the following: an existing entity graph and training data from a spoken language understanding system.

4. The one or more computer storage media of claim 1, wherein URL search results are obtained for positive sample entities and negative sample entities for which no clicked URLs are available from the search click logs.

5. The one or more computer storage media of claim 1, wherein the classifier model is trained using at least one selected from the following: logistic regression and support vector machines.

6. The one or more computer storage media of claim 1, wherein the operations further comprise: using the weightings to generate an improved dictionary that does not include a subset of entities that do not satisfy a weighting threshold; and using the improved dictionary in a spoken language understanding system to process a user input.

7. The one or more computer storage media of claim 1, wherein the operations further comprise: creating a weighted dictionary using the weightings; and employing the weighted dictionary in a spoken language understanding system to process a user input.

8. The one or more computer storage media of claim 1, wherein the operations further comprise: clustering entities based on the weightings; and generating a set of clustered dictionaries, each clustered dictionary including a cluster of entities.

9. The one or more computer storage media of claim 8, wherein the operations further comprise: employing the set of clustered dictionaries in a spoken language understanding system to process a user input.

10. A computer-implemented method comprising: accessing positive sample entities that belong to an entity class and negative sample entities that do not belong to the entity class; identifying clicked URLs from search click logs for at least a portion of the positive sample entities and negative sample entities; identifying search result URLs for at least a portion of the positive sample entities and negative sample entities; identifying attributes from an entity graph for at least a portion of the positive sample entities and negative sample entities; training, using a computing device, a classifier model using the clicked URLs, search result URLs, and attributes from the entity graph as features of the positive sample entities and negative sample entities; and employing the classifier model to weight entities in a candidate dictionary to provide weightings for the entities from the candidate dictionary.

11. The method of claim 10, wherein the positive sample entities and the negative sample entities are identified based on information from at least one selected from the following: an existing entity graph and training data from a spoken language understanding system.

12. The method of claim 10, wherein URL search results are obtained for positive sample entities and negative sample entities for which no clicked URLs are available from the search click logs.

13. The method of claim 10, wherein the method further comprises: employing the weightings to generate at least one improved dictionary for use in a spoken language understanding system to process a user input.

14. The method of claim 10, wherein the method further comprises: clustering entities based on weighting; and generating a set of clustered dictionaries, each clustered dictionary including a cluster of entities.

15. The method of claim 14, wherein the method further comprises: employing the set of clustered dictionaries in a spoken language understanding system to process a user input.

16. A computerized system comprising: one or more processors; and a plurality of components that include computer-executable instructions that are executed by the one or more processors, the components including: a model building component that trains a classifier model for an entity class using positive sample entities that belong to the entity class and negative sample entities that do not belong to the entity class, the model building component also using clicked URLs, search result URLs, and attributes from an entity graph as features of the positive sample entities and negative sample entities to train the classifier model; and a weighting component that employs the classifier model to weight entities in a candidate dictionary to provide weightings for the entities from the candidate dictionary.

17. The system of claim 16, wherein the positive sample entities are from a seed list and the negative sample entities are from a background entity list, and wherein the seed list and background entity list are generated based on information from at least one selected from the following: an existing entity graph and training data from a spoken language understanding system.

18. The system of claim 16, wherein the weightings for the entities from the candidate dictionary are used to provide at least one improved dictionary.

19. The computerized system of claim 16, wherein the components further comprise: a clustering component that clusters entities based on the weightings and generates a set of clustered dictionaries, each clustered dictionary including a cluster of entities.
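Claims 6 through 8 (and 13, 14, 18, 19) describe what happens to the weightings once the classifier has produced them: filter out low-weight entities, keep the weights as a weighted dictionary, or cluster entities by weight. The Python sketch below illustrates the filtering and clustering steps under stated assumptions: weights lie in [0, 1], "satisfying" the threshold means `w >= threshold`, and, because the claims do not specify a clustering method, simple fixed weight bands stand in for clustering. The function names and example weights are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List


def improved_dictionary(weights: Dict[str, float], threshold: float) -> List[str]:
    """Keep only entities whose weight meets the threshold (cf. claim 6)."""
    return sorted(e for e, w in weights.items() if w >= threshold)


def clustered_dictionaries(weights: Dict[str, float],
                           edges: List[float]) -> List[List[str]]:
    """Group entities into one dictionary per weight band (a stand-in for claim 8).

    `edges` must be sorted ascending; band i holds entities whose weight w
    satisfies edges[i-1] <= w < edges[i], with the first and last bands
    open-ended.
    """
    bands: List[List[str]] = [[] for _ in range(len(edges) + 1)]
    for entity, w in weights.items():
        # bisect_right counts how many edges are <= w, i.e. the band index.
        bands[bisect_right(edges, w)].append(entity)
    return [sorted(band) for band in bands]


# Hypothetical weights, e.g. the output of a trained classifier model.
weights = {"interstellar": 0.93, "avatar": 0.71, "cars": 0.48, "london": 0.05}
print(improved_dictionary(weights, 0.5))            # ['avatar', 'interstellar']
print(clustered_dictionaries(weights, [0.3, 0.7]))  # [['london'], ['cars'], ['avatar', 'interstellar']]
```

The `weights` mapping itself could serve as the weighted dictionary of claim 7; a production system might instead cluster with a learned method such as k-means over the weights or other features.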

Assignees

Microsoft Corp
Microsoft Technology Licensing Llc

Inventors

Classifications

  • G06N99/005 · Primary

    Physics · mapped topic

  • G06N20/00 · Primary

    Machine learning · CPC title

  • G06F16/313 · Primary

    Selection or weighting of terms for indexing · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9519870B2 cover?
A dictionary used by a spoken language understanding (SLU) system is improved by providing weightings for entities in the dictionary that represent the likelihood each entity belongs to an entity class represented by the dictionary. A classifier model may be trained using a seed list containing sample entities that belong in the entity class and a background entity list containing samples that …
Who is the assignee on this patent?
Microsoft Corp, Microsoft Technology Licensing Llc
What technology area does this patent fall under?
Primary CPC classification G06N99/005. Mapped technology areas include Physics.
When was this patent published?
Published on December 13, 2016 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).