What technology area does this patent fall under?

Primary CPC classification G06F17/30696. Mapped technology areas include Physics.

When was this patent published?

Publication date: Jul 18, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Entity normalization via name normalization

US9710549B2 · US · B2

Patent metadata
Field	Value
Publication number	US-9710549-B2
Application number	US-201414229774-A
Country	US
Kind code	B2
Filing date	Mar 28, 2014
Priority date	Feb 17, 2006
Publication date	Jul 18, 2017
Grant date	Jul 18, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods for normalizing entities via name normalization are disclosed. In some implementations, a computer-implemented method of identifying duplicate objects in a plurality of objects is provided. Each object in the plurality of objects is associated with one or more facts, and each of the one or more facts having a value. The method includes: using a computer processor to perform: associating facts extracted from web documents with a plurality of objects; and for each of the plurality of objects, normalizing the value of a name fact, the name fact being among one or more facts associated with the object; processing the plurality of objects in accordance with the normalized value of the name facts of the plurality of objects. In some implementations, normalizing the value of the name fact is optionally carried out by applying a group of normalization rules to the value of the name fact.
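The normalization step can be illustrated with a minimal sketch. It assumes the three rule types named in claims 3 and 4 (removing social titles, removing predefined adjective words, sorting in alphabetic order). The word lists, the lowercasing, the punctuation stripping, and the function name `normalize_name` are all hypothetical choices made for illustration; the patent text shown on this page does not specify them.

```python
import re

# Hypothetical word lists: the claims name the rule types but the text
# on this page does not enumerate the actual words.
SOCIAL_TITLES = {"mr", "mrs", "ms", "dr", "prof"}
ADJECTIVE_WORDS = {"honorable", "late"}


def normalize_name(value: str) -> str:
    """Apply a small group of normalization rules to a name value."""
    # Assumption: lowercase and keep only alphanumeric runs as tokens,
    # so "Dr." and "dr" are treated the same and commas are dropped.
    tokens = re.findall(r"[a-z0-9]+", value.lower())
    # Rule: remove social titles and predefined adjective words.
    kept = [t for t in tokens
            if t not in SOCIAL_TITLES and t not in ADJECTIVE_WORDS]
    # Rule: sort the remaining words in alphabetic order, so that
    # "Smith, John" and "John Smith" produce the same value.
    return " ".join(sorted(kept))
```

Under these assumptions, `normalize_name("Dr. John Smith")` and `normalize_name("Smith, John")` both return `"john smith"`, which is what lets differently written names be compared as equal.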

First claim

Opening claim text (preview).

What is claimed is:
1. A computer-implemented method of identifying duplicate objects in a plurality of objects, wherein each object in the plurality of objects is associated with one or more facts, and each of the one or more facts has an attribute and a value, the method comprising using a computer processor to perform: associating facts extracted from web documents with the plurality of objects; for each of the plurality of objects, normalizing a value of a name fact, the name fact being among one or more facts associated with the object; based on the normalized values of the name facts, grouping the plurality of objects into a plurality of buckets, each object in a bucket having the same normalized value of a name fact; processing the plurality of objects in a bucket to identify at least one pair of duplicate objects in the plurality of objects in the bucket, based on a similarity of values of facts other than the name fact for the objects in the bucket; and merging the duplicate objects together, the merging including removing one of the duplicate objects from a memory repository.
2. The method of claim 1, wherein normalizing the value of the name fact comprises: normalizing the value of the name fact by applying a group of normalization rules to the value of the name fact.
3. The method of claim 2, wherein the group of normalization rules comprises at least one rule selected from one of: removing social titles; and removing predefined adjective words.
4. The method of claim 2, wherein the group of normalization rules comprises sorting the value of the name fact in alphabetic order.
5. The method of claim 1, wherein the grouping comprises: generating a signature for each of the plurality of objects based at least in part on the normalized value of the name fact of each of the plurality of objects; and responsive to an identifier of an existing bucket being the same as the signature of an object, adding the object to the existing bucket, otherwise establishing a new bucket including the object, an identifier of the new bucket being same as the signature of the object.
6. The method of claim 1, wherein processing the plurality of objects includes: applying a matcher to a pair of objects in one of the plurality of buckets.
7. The method of claim 6, wherein applying the matcher to a pair of objects in one of the plurality of buckets comprises: for each common fact of the pair of objects, determining a similarity of the values of the common fact based on a similarity measure; and determining that the pair of objects are duplicates based on the similarity.
8. The method of claim 7, wherein determining that the pair of objects are duplicates comprises: determining that the pair of objects are duplicates based on the number of the common facts with similar values and the number of common facts.
9. The method of claim 6, wherein applying the matcher comprises: applying the matcher to each pair of objects in one of the plurality of buckets to determine if the pair of objects are duplicates.
10. The method of claim 6, further comprising: selecting the matcher from a collection of matchers, wherein applying the matcher includes applying the selected matcher to a pair of objects in one of the plurality of buckets to determine if the pair of objects are duplicates.
11. A system for identifying duplicate objects in a plurality of objects, wherein each object in the plurality of objects is associated with one or more facts, and each of the one or more facts has an attribute and a value, the system comprising: memory; one or more processors; and one or more programs stored in the memory and configured for execution by the one or more processors, the one or more programs including instructions for: associating facts extracted from web documents with the plurality of objects; for each of the plurality of objects, normalizing a value of a name fact, the name fact being among one or more facts associated with the object; based on the normalized values of the name facts, grouping the plurality of objects into a plurality of buckets, each object in a bucket having the same normalized value of a name fact; processing the plurality of objects in a bucket to identify at least one pair of duplicate objects in the plurality of objects in the bucket, based on a similarity of values of facts other than the name fact for the objects in the bucket; and merging the duplicate objects together, the merging including removing one of the duplicate objects from a memory repository.
12. The system of claim 11, wherein normalizing the value of the name fact comprises: normalizing the value of the name fact by applying a group of normalization rules to the value of the name fact.
13. The system of claim 12, wherein the group of normalization rules comprises at least one rule selected from one of: removing social titles; and removing predefined adjective words.
14. The system of claim 12, wherein the group of normalization rules comprises sorting the value of the name fact in alphabetic order.
15. A non-transitory computer readable storage medium storing one or more programs for identifying duplicate objects in a plurality of objects, wherein each object in the plurality of objects is associated with one or more facts, and each of the one or more facts has an attribute and a value, the one or more programs comprising instructions for: associating facts extracted from web documents with the plurality of objects; for each of the plurality of objects, normalizing a value of a name fact, the name fact being among one or more facts associated with the object; based on the normalized values of the name facts, grouping the plurality of objects into a plurality of buckets, each object in a bucket having the same normalized value of a name fact; processing the plurality of objects in a bucket to identify at least one pair of duplicate objects in the plurality of objects in the bucket, based on a similarity of values of facts other than the name fact for the objects in the bucket; and merging the duplicate objects together, the merging including removing one of the duplicate objects from a memory repository.
16. The non-transitory computer readable storage medium of claim 15, wherein normalizing the value of the name fact comprises: normalizing the value of the name fact by applying a group of normalization rules to the value of the name fact.
17. The non-transitory computer readable storage medium of claim 16, wherein the group of normalization rules comprises at least one rule selected from one of: removing social titles; and removing predefined adjective words.
18. The non-transitory computer readable storage medium of claim 16, wherein the group of normalization rules comprises sorting the value of the name fact in alphabetic order.
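The grouping, matching, and merging steps of the independent claims can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: objects are modeled as flat dictionaries of attribute to value, the signature is a stand-in normalization (lowercase, then sort the words), the similarity measure is exact equality, and the 0.5 threshold and all function names are hypothetical.

```python
from collections import defaultdict


def name_signature(name):
    """Stand-in normalization (assumption): lowercase, sort the words."""
    return " ".join(sorted(name.lower().split()))


def bucket_by_name(objects):
    """Group objects into buckets keyed by the signature of their name."""
    buckets = defaultdict(list)
    for obj in objects:
        # An existing bucket is reused when its identifier equals the
        # signature; otherwise defaultdict creates a new bucket.
        buckets[name_signature(obj["name"])].append(obj)
    return buckets


def is_duplicate(a, b, threshold=0.5):
    """Compare two objects on the facts they share, excluding the name.

    Assumptions: exact equality as the similarity measure, and a
    hypothetical threshold on (similar common facts / common facts).
    """
    common = (a.keys() & b.keys()) - {"name"}
    if not common:
        return False
    similar = sum(1 for attr in common if a[attr] == b[attr])
    return similar / len(common) >= threshold


def merge_duplicates(objects):
    """Return a repository in which duplicates in each bucket are merged."""
    repository = []
    for bucket in bucket_by_name(objects).values():
        survivors = []
        for obj in bucket:
            for kept in survivors:
                if is_duplicate(kept, obj):
                    # Merge: fold any new facts into the kept object and
                    # drop the duplicate from the repository.
                    for attr, value in obj.items():
                        kept.setdefault(attr, value)
                    break
            else:
                survivors.append(dict(obj))  # copy so inputs stay intact
        repository.extend(survivors)
    return repository
```

Bucketing first means the pairwise matcher only runs inside each bucket rather than across the whole collection, which is the practical reason for keying on the normalized name before comparing the other facts.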

Assignees

Google Inc

Inventors

Betz Jonathan T

Classifications

G06F17/30533
Physics · mapped topic
G06F17/30156
Physics · mapped topic
G06F17/30696 · Primary
Physics · mapped topic
G06F17/30578
Physics · mapped topic
G06F17/30303
Physics · mapped topic

Patent family

Related publications grouped by family.

View patent family 46325357

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9710549B2 cover?: Systems and methods for normalizing entities via name normalization are disclosed. In some implementations, a computer-implemented method of identifying duplicate objects in a plurality of objects is provided. Each object in the plurality of objects is associated with one or more facts, and each of the one or more facts having a value. The method includes: using a computer processor to perform:…
Who is the assignee on this patent?: Google Inc