Exploiting structured content for unsupervised natural language semantic parsing

US10235358B2 · US · B2

Patent metadata
Field: Value
Publication number: US-10235358-B2
Application number: US-201313773269-A
Country: US
Kind code: B2
Filing date: Feb 21, 2013
Priority date: Feb 21, 2013
Publication date: Mar 19, 2019
Grant date: Mar 19, 2019

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Structured web pages are accessed and parsed to obtain implicit annotation for natural language understanding tasks. Search queries that hit these structured web pages are automatically mined for information that is used to semantically annotate the queries. The automatically annotated queries may be used for automatically building statistical unsupervised slot filling models without using a semantic annotation guideline. For example, tags that are located on a structured web page that are associated with the search query may be used to annotate the query. The mined search queries may be filtered to create a set of queries that is in a form of a natural language query and/or remove queries that are difficult to parse. A natural language model may be trained using the resulting mined queries. Some queries may be set aside for testing and the model may be adapted using in-domain sentences that are not annotated. The models may be tested using these implicitly annotated natural-language-like queries in an unsupervised fashion.
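The annotation step described in the abstract can be sketched as follows. This is a minimal illustration and not the patented implementation: the `Triple` class, the `annotate_query` function, the tag names, and the example movie data are all hypothetical, and exact token matching stands in for whatever matching the actual system uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """Two tagged entities linked by a relation (hypothetical structure)."""
    subject: str
    subject_tag: str
    relation: str
    obj: str
    obj_tag: str


def annotate_query(query, triples):
    """Return (token, tag) pairs; tokens outside any known entity get 'O'."""
    tokens = query.lower().split()
    tags = ["O"] * len(tokens)
    for triple in triples:
        entities = (
            (triple.subject, triple.subject_tag),
            (triple.obj, triple.obj_tag),
        )
        for entity, tag in entities:
            ent_tokens = entity.lower().split()
            n = len(ent_tokens)
            # Slide a window over the query and tag exact entity matches
            # that have not already been tagged by an earlier entity.
            for i in range(len(tokens) - n + 1):
                window_free = all(t == "O" for t in tags[i:i + n])
                if tokens[i:i + n] == ent_tokens and window_free:
                    tags[i:i + n] = [tag] * n
    return list(zip(tokens, tags))


# Illustrative data only: one triple as it might be parsed from a movie page.
triples = [Triple("Avatar", "movie_name", "directed_by", "James Cameron", "director")]
annotated = annotate_query("who directed avatar", triples)
```

Here the query inherits the `movie_name` tag from the structured page without any manual labeling, which is the sense in which the annotation is implicit.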

First claim

Opening claim text (preview).

What is claimed is:
1. A method for natural language semantic parsing, comprising: accessing structured content including structured web pages; parsing a semantic structure identified in the structured content to identify entities linked by a relationship, wherein each entity has a respective tag; mining a plurality of natural language search queries that can access the structured content to identify, from the plurality of natural language search queries, at least one natural language search query that includes at least one of the entities; and automatically annotating the at least one natural language search query using the respective tag.
2. The method of claim 1, further comprising building an unsupervised slot filling model using the at least one natural language search query annotated in the automatically annotating.
3. The method of claim 2, further comprising adapting the unsupervised slot filling model using in-domain unannotated sentences.
4. The method of claim 3, further comprising testing a performance of the model based on the at least one natural language search query.
5. The method of claim 1, wherein the structured content is defined by a triple that consists of two entities linked by a relation.
6. The method of claim 1, further comprising filtering the natural language search queries by removing at least a portion of the natural language search queries that have un-annotated stopwords.
7. The method of claim 1, wherein the respective tag of each entity is included in at least one of the structured web pages.
8. A computer-readable storage device storing computer-executable instructions that perform a method when executed, the method comprising: accessing structured content including structured web pages; parsing the structured content to identify two entities linked by a relationship, wherein each entity has a respective tag; mining a plurality of natural language search queries that can access the structured content to identify, from the plurality of natural language search queries, at least one natural language search query that includes at least one of the two entities; automatically annotating the at least one natural language search query to form at least one annotated natural language search query; and creating an understanding model including slots using the at least one natural language search query annotated in the automatically annotating.
9. The computer-readable storage device of claim 8, wherein the understanding model is created in an unsupervised manner.
10. The computer-readable storage device of claim 8, wherein the method further comprises testing a performance of the model based on the at least one natural language search query.
11. The computer-readable storage device of claim 8, wherein the structured content is defined by a triple that consists of two entities linked by a relation.
12. The computer-readable storage device of claim 8, further comprising filtering the natural language search queries by removing natural language search queries that have un-annotated non-stopwords.
13. A system for natural language semantic parsing, comprising: a processor and memory; an operating environment executing using the processor; and a knowledge manager that is configured to perform actions comprising: accessing structured content including structured web pages; parsing the structured content to identify two entities linked by a relationship, wherein each of the entities has a respective tag; mining a plurality of natural language search queries that can access the structured content to identify, from the plurality of natural language search queries, at least one natural language search query that includes at least one of the two entities; automatically annotating the at least one natural language search query using the respective tags; and creating an understanding model including slots using the at least one natural language search query annotated in the automatically annotating.
14. The system of claim 13, wherein the understanding model is created in an unsupervised manner.
15. The system of claim 13, further comprising testing a performance of the model based on the at least one natural language query.
16. The system of claim 13, wherein the structured content includes multiple triples that consist of two entities linked by a relation.
17. The system of claim 13, further comprising filtering the natural language search queries by removing some of the natural language search queries that have an un-annotated non-stopword.
18. The system of claim 13, wherein the structured web pages include a semantic web.
19. The system of claim 13, wherein the structured content is defined by a triple that consists of two entities linked by a relation.
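The filtering step in claims 12 and 17 (removing queries that contain an un-annotated non-stopword) might look like the sketch below. It assumes each query is already a list of (token, tag) pairs with `"O"` marking un-annotated tokens. The stopword list, function names, and sample queries are hypothetical choices for illustration and are not taken from the patent.

```python
# Hypothetical, deliberately tiny stopword list; a real system would use a fuller one.
STOPWORDS = frozenset(
    {"the", "a", "an", "of", "in", "by", "who", "what", "is", "was", "did"}
)


def has_unannotated_non_stopword(annotated, stopwords=STOPWORDS):
    """True if any token carries no tag ('O') and is not a stopword."""
    return any(tag == "O" and token not in stopwords for token, tag in annotated)


def filter_queries(annotated_queries, stopwords=STOPWORDS):
    """Keep only queries whose un-annotated tokens are all stopwords."""
    return [
        query
        for query in annotated_queries
        if not has_unannotated_non_stopword(query, stopwords)
    ]


# Illustrative queries only.
clean = [("who", "O"), ("is", "O"), ("in", "O"), ("avatar", "movie_name")]
noisy = [("avatar", "movie_name"), ("torrent", "O")]
kept = filter_queries([clean, noisy])
```

The second query is dropped because "torrent" is neither annotated nor a stopword, so the query contains a content word that the mined tags cannot account for.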

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10235358B2 cover?
Structured web pages are accessed and parsed to obtain implicit annotation for natural language understanding tasks. Search queries that hit these structured web pages are automatically mined for information that is used to semantically annotate the queries. The automatically annotated queries may be used for automatically building statistical unsupervised slot filling models without using a se…
Who is the assignee on this patent?
Microsoft Technology Licensing LLC
What technology area does this patent fall under?
Primary CPC classification G06F40/30. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 19, 2019 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).