What technology area does this patent fall under?

Primary CPC classification G06F16/9014. Mapped technology areas include Physics.

When was this patent published?

Publication date Aug 1, 2019 (A1). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 2 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Trie-based normalization of field values for matching

US2019236178A1 · US · A1

Patent metadata
Field	Value
Publication number	US-2019236178-A1
Application number	US-201815884732-A
Country	US
Kind code	A1
Filing date	Jan 31, 2018
Priority date	Jan 31, 2018
Publication date	Aug 1, 2019
Grant date	—

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A system tokenizes values stored in a field by multiple records. The system creates a trie from the tokenized values, each branch in the trie labeled with one of the tokenized values, each node storing a count indicating the number of the multiple records associated with a tokenized value sequence beginning from a root of the trie. The system tokenizes a value stored in the field by a prospective record. Beginning from the root of the trie, the system identifies each node corresponding to a token value sequence for the prospective record's tokenized value. Beginning from the most recently identified node for the prospective record's token value sequence, the system identifies each extending node which stores a count that satisfies a threshold, each identified extending node corresponding to another token value sequence. The system uses the other token value sequence to identify one of the multiple records that matches the prospective record.
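The steps in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the whitespace tokenizer, the names `Node`, `build_trie`, `walk`, and `extensions`, the sample company names, and the choice of "count >= threshold" as the meaning of "satisfies a threshold" are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    count: int = 0  # records whose token sequence passes through this node
    children: dict = field(default_factory=dict)  # token -> Node (labeled branches)


def tokenize(value):
    # Assumption: lowercase whitespace splitting; the abstract names no tokenizer.
    return value.lower().split()


def build_trie(values):
    """Build a trie of token sequences, counting records at every node."""
    root = Node()
    for value in values:
        node = root
        node.count += 1
        for token in tokenize(value):
            node = node.children.setdefault(token, Node())
            node.count += 1
    return root


def walk(root, tokens):
    """Follow tokens from the root; return the last node reached, or None on a miss."""
    node = root
    for token in tokens:
        node = node.children.get(token)
        if node is None:
            return None
    return node


def extensions(node, prefix, threshold):
    """Collect sequences extending prefix through nodes whose count >= threshold."""
    results = []
    for token, child in node.children.items():
        if child.count >= threshold:
            sequence = prefix + [token]
            results.append(sequence)
            results.extend(extensions(child, sequence, threshold))
    return results


# Hypothetical field values from existing records.
existing = ["Acme Corp", "Acme Corp", "Acme Corp Inc", "Acme Labs", "Globex"]
root = build_trie(existing)

# Prospective record whose field holds only "ACME".
prefix = tokenize("ACME")
node = walk(root, prefix)
candidates = extensions(node, prefix, threshold=2)
print(candidates)  # [['acme', 'corp']]
```

In this sketch the extended sequence `['acme', 'corp']` is what would then be compared against existing records; how that final comparison is done is not specified in the abstract.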

First claim

Opening claim text (preview).

1. A system comprising: one or more processors; and a non-transitory computer readable medium storing a plurality of instructions, which when executed, cause the one or more processors to: tokenize, by a database system, values stored in a field by a plurality of records; create, by the database system, a trie from the tokenized values, each branch in the trie being labeled with one of the tokenized values, each node of the trie storing a count indicating a number of the plurality of records associated with a tokenized value sequence beginning from a root of the trie; tokenize, by the database system, a value stored in the field by a prospective record; identify, by the database system, beginning from the root of the trie, each node corresponding to a token value sequence associated with the tokenized value; identify, by the database system, beginning from a most recently identified node corresponding to the token value sequence, each extending node storing a count that is determined to satisfy a threshold, each identified extending node corresponding to another token value sequence; and identify, by the database system, using the other token value sequence, an existing record of the plurality of records that matches the prospective record.
2. The system of claim 1, wherein identifying each node corresponding to the token value sequence associated with the tokenized value comprises bypassing a token value in the token value sequence associated with the prospective record.
3. The system of claim 1, wherein identifying each node corresponding to the token value sequence associated with the tokenized value comprises bypassing a node that lacks a correspondence to a token value in the token value sequence associated with the prospective record.
4. The system of claim 3, wherein bypassing the node comprises identifying a subsequent node based on a transition probability associated with the subsequent node.
5. The system of claim 1, wherein identifying each node corresponding to the token value sequence associated with the tokenized value comprises replacing a token value in the token value sequence with a substitute token value.
6. The system of claim 1, wherein determining that the count satisfies the threshold comprises generating a ratio by dividing the count by another count that is associated with the most recently identified node, and then determining that the ratio is greater than the threshold.
7. The system of claim 1, wherein identifying the existing record that matches the prospective record comprises submitting the match identification for approval by a user.
8. A computer program product comprising computer-readable program code to be executed by one or more processors when retrieved from a non-transitory computer-readable medium, the program code including instructions to: tokenize, by a database system, values stored in a field by a plurality of records; create, by the database system, a trie from the tokenized values, each branch in the trie being labeled with one of the tokenized values, each node of the trie storing a count indicating a number of the plurality of records associated with a tokenized value sequence beginning from a root of the trie; tokenize, by the database system, a value stored in the field by a prospective record; identify, by the database system, beginning from the root of the trie, each node corresponding to a token value sequence associated with the tokenized value; identify, by the database system, beginning from a most recently identified node corresponding to the token value sequence, each extending node storing a count that is determined to satisfy a threshold, each identified extending node corresponding to another token value sequence; and identify, by the database system, using the other token value sequence, an existing record of the plurality of records that matches the prospective record.
9. The computer program product of claim 8, wherein identifying each node corresponding to the token value sequence associated with the tokenized value comprises bypassing a token value in the token value sequence associated with the prospective record.
10. The computer program product of claim 8, wherein identifying each node corresponding to the token value sequence associated with the tokenized value comprises bypassing a node that lacks a correspondence to a token value in the token value sequence associated with the prospective record.
11. The computer program product of claim 10, wherein bypassing the node comprises identifying a subsequent node based on a transition probability associated with the subsequent node.
12. The computer program product of claim 8, wherein identifying each node corresponding to the token value sequence associated with the tokenized value comprises replacing a token value in the token value sequence with a substitute token value.
13. The computer program product of claim 8, wherein determining that the count satisfies the threshold comprises generating a ratio by dividing the count by another count that is associated with the most recently identified node, and then determining that the ratio is greater than the threshold.
14. The computer program product of claim 8, wherein identifying the existing record that matches the prospective record comprises submitting the match identification for approval by a user.
15. A method comprising: tokenizing, by a database system, values stored in a field by a plurality of records; creating, by the database system, a trie from the tokenized values, each branch in the trie being labeled with one of the tokenized values, each node of the trie storing a count indicating a number of the plurality of records associated with a tokenized value sequence beginning from a root of the trie; tokenizing, by the database system, a value stored in the field by a prospective record; identifying, by the database system, beginning from the root of the trie, each node corresponding to a token value sequence associated with the tokenized value; identifying, by the database system, beginning from a most recently identified node corresponding to the token value sequence, each extending node storing a count that is determined to satisfy a threshold, each identified extending node corresponding to another token value sequence; and identifying, by the database system, using the other token value sequence, an existing record of the plurality of records that matches the prospective record.
16. The method of claim 15, wherein identifying each node corresponding to the token value sequence associated with the tokenized value comprises bypassing a token value in the token value sequence associated with the prospective record.
17. The method of claim 15, wherein identifying each node corresponding to the token value sequence associated with the tokenized value comprises bypassing a node that lacks a correspondence to a token value in the token value sequence associated with the prospective record.
18. The method of claim 17, wherein bypassing the node comprises identifying a subsequent node based on a transition probability associated with the subsequent node.
19. The method of claim 15, wherein identifying each node corresponding to the token value sequence associated with the tokenized value comprises replacing a token value in the token value sequence with a substitute token value.
20. The method of claim 15, wherein determining that the count satisfies the threshold compri
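Two of the dependent claims lend themselves to a short illustration: bypassing a token value that has no matching branch (claims 2, 9, and 16) and the ratio form of the threshold test (claims 6 and 13). The sketch below is only one possible reading. The dict-based node layout, the function names, and the sample data are hypothetical, and the claims do not say which tokens may be bypassed or in what order.

```python
def make_node():
    # Each node keeps a record count and its labeled branches.
    return {"count": 0, "children": {}}


def insert(root, tokens):
    """Add one record's token sequence, incrementing counts along the path."""
    node = root
    node["count"] += 1
    for token in tokens:
        node = node["children"].setdefault(token, make_node())
        node["count"] += 1


def walk_with_bypass(root, tokens):
    """Follow tokens from the root, skipping any token with no matching branch."""
    node, matched = root, []
    for token in tokens:
        child = node["children"].get(token)
        if child is None:
            continue  # assumption: an unmatched token is simply bypassed
        node = child
        matched.append(token)
    return node, matched


def ratio_extensions(node, threshold):
    """Branch tokens whose child count divided by this node's count exceeds threshold."""
    return [
        token
        for token, child in node["children"].items()
        if child["count"] / node["count"] > threshold
    ]


# Hypothetical existing records, already tokenized.
root = make_node()
for tokens in [["acme", "corp"], ["acme", "corp"], ["acme", "corp"], ["acme", "labs"]]:
    insert(root, tokens)

# The prospective value has a leading token the trie has never seen.
node, matched = walk_with_bypass(root, ["the", "acme"])
print(matched)                      # ['acme']
print(ratio_extensions(node, 0.5))  # ['corp']  (3/4 > 0.5, while 1/4 is not)
```

Using a ratio rather than an absolute count keeps the test meaningful at any depth: a branch taken by three of four records under "acme" passes, however many records the trie holds overall.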

Assignees

Salesforce Com Inc

Inventors

Classifications

G06F16/9014 · Primary · hash tables
G06F16/2365 · Primary · Ensuring data consistency and integrity
G06F16/235 · Update request formulation
G06F16/2468 · Fuzzy queries
G06F16/24575 · using context

Patent family

Related publications grouped by family.

View patent family 67391455

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2019236178A1 cover?: A system tokenizes values stored in a field by multiple records. The system creates a trie from the tokenized values, each branch in the trie labeled with one of the tokenized values, each node storing a count indicating the number of the multiple records associated with a tokenized value sequence beginning from a root of the trie. The system tokenizes a value stored in the field by a prospecti…
Who is the assignee on this patent?: Salesforce Com Inc