US2016335367A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2016335367-A1 |
| Application number | US-201514713152-A |
| Country | US |
| Kind code | A1 |
| Filing date | May 15, 2015 |
| Priority date | May 15, 2015 |
| Publication date | Nov 17, 2016 |
| Grant date | — |
Official abstract text for this publication.
Web pages that are known to be associated with entities, such as authors, are selected. Documents or other publications that are linked to or referenced by each web page are determined. Based on the authors of each determined document, the authors associated with each web page, and other information such as institutions or venues identified in each document, the various authors associated with the web pages are conflated or disambiguated to determine which authors, while having the same or similar names, should be treated as separate entities, and which authors, while having different names, should be treated as the same entities. Once the entity names have been conflated and disambiguated, they can be linked to social networking data or grant data associated with entities.
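The name-variant step described in the abstract can be sketched as follows. This is a minimal illustration, not the method disclosed in the patent: the matching rule in `is_compatible` (same last name, first names that agree fully or as an initial) is an assumption made for the example, as are the function names, the input layout, and the sample URL and names.

```python
def normalize(name):
    """Lowercase a name, drop periods, and split it into tokens."""
    return name.lower().replace(".", " ").split()


def is_compatible(page_author, doc_author):
    """Hypothetical heuristic (not from the patent): the two names share a
    last name, and one first name is a prefix of the other (covers initials)."""
    a, b = normalize(page_author), normalize(doc_author)
    if not a or not b or a[-1] != b[-1]:
        return False
    return a[0].startswith(b[0]) or b[0].startswith(a[0])


def name_variants(pages):
    """Map each page URL to the set of name variants for its author.

    `pages` maps a URL to a dict with the page author and, for each
    referenced document, the list of author names on that document.
    """
    variants = {}
    for url, page in pages.items():
        found = {page["author"]}
        for doc_authors in page["documents"]:
            for candidate in doc_authors:
                if is_compatible(page["author"], candidate):
                    found.add(candidate)
        variants[url] = found
    return variants


# Illustrative input: one author page that references two documents.
pages = {
    "http://example.edu/~jsmith": {
        "author": "John Smith",
        "documents": [
            ["J. Smith", "Alice Jones"],
            ["John A. Smith", "Bob Smith"],
        ],
    }
}
result = name_variants(pages)
```

Here "Bob Smith" is kept separate because the first names disagree, while "J. Smith" and "John A. Smith" are treated as variants of the page author. A real system would also weigh institutions, venues, and co-authors, as the abstract indicates.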
Opening claim text (preview).
What is claimed:
1. A method comprising: identifying a plurality of web pages by a computing device; for each web page, determining a plurality of documents referenced by the web page by the computing device; for each web page, determining an author associated with the web page by the computing device; for each document, determining an author associated with the document by the computing device; for each web page, determining a plurality of name variants for the author associated with the web page using the determined authors associated with the documents referenced by the web page by the computing device; and for each web page, associating the plurality of name variants determined for the author of the web page with the author of the web page by the computing device.
2. The method of claim 1, further comprising for each document, associating the document with the determined author of the web page that referenced the document.
3. The method of claim 1, further comprising, for each document, determining information comprising one or more of a venue, field of study, event, or institution associated with the document, and associating the determined information with the author determined for the web page that referenced the document.
4. The method of claim 3, further comprising: receiving a query comprising one or more terms; and based on the one or more terms of the query and the plurality of name variants associated with each determined author of each web page, presenting indicators of one or more of the authors associated with the web pages in response to the query along with the determined information associated with the indicated one or more authors.
5. The method of claim 3, further comprising generating a graph using the determined information and the plurality of name variants associated with each determined author associated with each web page.
6. The method of claim 1, further comprising: for each web page, determining social networking data associated with the determined author of the web page based on the plurality of name variants associated with the author of the web page; for each web page, determining one or more institutions associated with the determined author of the web page and a date associated with each of the determined one or more institutions based on the social networking data associated with the determined author of the web page; and for each web page, associating the determined one or more institutions and associated dates with the determined author of the web page.
7. The method of claim 1, wherein identifying a plurality of web pages comprises identifying web pages with URLs that begin with a prefix of a plurality of prefixes, or identifying web pages that include one or more keywords of a plurality of keywords.
8. The method of claim 1, wherein the documents are academic publications.
9. The method of claim 1, further comprising: receiving grant data, wherein the grant data is associated with an author; and associating the grant data with a determined author of a web page of the plurality of web pages based on the author associated with the grant data and the plurality of name variants associated with the determined author of the web page.
10. A method comprising: receiving identifiers of a plurality of web pages by a computing device, wherein each web page is associated with an author; for each web page, determining a plurality of documents referenced by the web page by the computing device, wherein each document is associated with an author; for each web page, determining a plurality of name variants for the author associated with the web page based on the authors associated with the documents referenced by the web page; for each document, determining information comprising one or more of a venue, field of study, or institution associated with the document by the computing device, and associating the determined information with the author determined for the web page that referenced the document; and generating a graph by the computing device, the graph comprising the authors associated with the web pages, the plurality of name variants determined for each author associated with the web pages, and the determined information associated with each author associated with the web pages.
11. The method of claim 10, further comprising: receiving a query comprising one or more terms; based on the one or more terms of the query and the graph, determining one or more authors associated with the web pages that are responsive to the one or more terms of the query; and presenting identifiers of the determined one or more authors in response to the query.
12. The method of claim 10, further comprising: for each web page, determining social networking data associated with the determined author of the web page based on the plurality of name variants associated with the author of the web page; for each web page, determining one or more institutions associated with the determined author of the web page and a date associated with each of the determined one or more institutions based on the social networking data associated with the determined author of the web page; and for each web page, associating the determined one or more institutions and associated dates with the determined author of the web page.
13. The method of claim 10, wherein the documents are academic publications.
14. The method of claim 10, further comprising: receiving grant data, wherein the grant data is associated with an author; and associating the grant data with a determined author of a web page of the plurality of web pages based on the author associated with the grant data and the plurality of name variants associated with the determined author of the web page.
15. A system comprising: at least one computing device; and an entity disambiguation engine configured to: identify a plurality of web pages, wherein each web page is associated with an entity of a plurality of entities; for each web page of the plurality of web pages, determine a plurality of documents referenced by the web page, wherein each web page is associated with an entity of the plurality of entities; for each web page of the plurality of web pages, determine one or more entities of the plurality of entities that is the same entity as the entity associated with the web page based on the entities associated with the plurality of documents referenced by the web page; and for each web page of the plurality of web pages, associate the entity associated with the web page with identifiers of the one or more entities that are the same entity.
16. The system of claim 15, wherein the entities comprise one or more of authors, fields of study, institutions, events, or venues.
17. The system of claim 15, wherein the entity disambiguation engine configured to identify a plurality of web pages comprises the entity disambiguation engine configured to identify web pages with URLs that begin with a prefix of a plurality of prefixes, or identify web pages that include one or more keywords of a plurality of keywords.
18. The system of claim 15, wherein the entity disambiguation engine is further configured to: receive grant data, wherein the grant data is associated with an entity of the plurality of entities; and associate the grant data with an entity associated with a web page of the plurality of web pages based on the entity associated with the grant data and the identified one or more entities associated with
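Three of the claimed steps lend themselves to a short sketch: selecting pages by URL prefix or keyword (claims 7 and 17), generating a graph of authors, name variants, and document information (claims 5 and 10), and answering a query from that graph (claim 11). The code below is one possible reading under stated assumptions, not the patented implementation: the adjacency-map representation, the substring matching in `query_authors`, and every function name, URL, and sample value are hypothetical.

```python
def select_pages(pages, prefixes, keywords):
    """Return URLs of pages whose URL starts with a listed prefix or whose
    text contains a listed keyword (case-insensitive keyword match)."""
    selected = []
    for url, text in pages:
        lowered = text.lower()
        has_prefix = url.startswith(tuple(prefixes))
        has_keyword = any(k.lower() in lowered for k in keywords)
        if has_prefix or has_keyword:
            selected.append(url)
    return selected


def build_graph(authors):
    """Build an undirected adjacency map. Nodes are (kind, label) tuples;
    each author node is linked to its name variants and document info."""
    graph = {}

    def link(a, b):
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    for author, record in authors.items():
        node = ("author", author)
        graph.setdefault(node, set())
        for variant in record.get("variants", ()):
            link(node, ("name", variant))
        for kind, value in record.get("info", ()):
            link(node, (kind, value))
    return graph


def query_authors(graph, terms):
    """Return authors for whom every query term appears in the author's
    own name or in the label of a directly linked node (assumed rule)."""
    wanted = [t.lower() for t in terms]
    hits = []
    for node, neighbours in graph.items():
        if node[0] != "author":
            continue
        labels = [node[1].lower()] + [label.lower() for _, label in neighbours]
        if all(any(t in label for label in labels) for t in wanted):
            hits.append(node[1])
    return sorted(hits)


# Illustrative data only.
pages = [
    ("http://research.example.com/people/jsmith", "Publications of John Smith"),
    ("http://shop.example.com/cart", "Buy now"),
    ("http://other.example.org/home", "My homepage and publications"),
]
chosen = select_pages(pages, ["http://research.example.com/people/"], ["publications"])

authors = {
    "John Smith": {
        "variants": ["J. Smith"],
        "info": [("venue", "ExampleConf"), ("institution", "Example University")],
    },
    "Jane Doe": {"variants": ["J. Doe"], "info": [("venue", "ExampleConf")]},
}
graph = build_graph(authors)
```

Calling `query_authors(graph, ["smith", "university"])` on this data returns only John Smith, while a query for the shared venue returns both authors. Keeping variants as separate nodes lets a query match any spelling of a name without duplicating the author node.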
Indexing; Web crawling techniques · CPC title
Physics · mapped topic