What technology area does this patent fall under?

Primary CPC classification: G06N20/00 (Machine learning). Mapped technology areas include Physics.

When was this patent published?

Publication date: Dec 28, 2021 (kind code B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).

System and methods of generating structured data from unstructured data

US11210300B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11210300-B2
Application number	US-201615147052-A
Country	US
Kind code	B2
Filing date	May 5, 2016
Priority date	May 14, 2015
Publication date	Dec 28, 2021
Grant date	Dec 28, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Systems and methods to infer or predict the proper placement of unstructured data (such as text, phrases, segments of phrases, alphanumeric characters) into a more structured format (such as a specific data field). In some embodiments, this is based on a user's prior assignment of similar unstructured data into a specific structure. In some embodiments, this may be based on other users' prior assignment of similar unstructured data into the specific structure. In yet other embodiments, this may be based on information obtained from business data used by a data processing platform to assist in operating the business (i.e., either business data or the output of a business application that processes the business data, such as an ERP, CRM, or eCommerce application).
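As a rough illustration of the idea in the abstract, the sketch below suggests a data field for a new piece of free text by comparing it with earlier text-to-field assignments. It is not the patented implementation: the character trigram representation, the cosine similarity, the value of `k`, and the `HISTORY` sample data are all assumptions made for this example. Claim 10 mentions a k-nearest neighbor approach, which is the only part taken from the document.

```python
from collections import Counter
from math import sqrt

def char_ngrams(text, n=3):
    """Count overlapping character n-grams in the lowercased text."""
    s = text.lower()
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def suggest_fields(text, history, k=3):
    """Rank candidate fields by similarity-weighted votes of the k nearest past entries."""
    query = char_ngrams(text)
    scored = sorted(
        ((cosine(query, char_ngrams(past_text)), field) for past_text, field in history),
        reverse=True,
    )
    votes = Counter()
    for similarity, field in scored[:k]:
        if similarity > 0:  # ignore neighbors that share no n-grams at all
            votes[field] += similarity
    return [field for field, _ in votes.most_common()]

# Hypothetical history of prior assignments: (unstructured text, chosen field).
HISTORY = [
    ("jane.doe@example.com", "email"),
    ("bob@example.org", "email"),
    ("+1 415 555 0100", "phone"),
    ("(212) 555-0199", "phone"),
    ("42 Market Street", "address"),
    ("7 Elm Street", "address"),
]

if __name__ == "__main__":
    print(suggest_fields("alice@example.net", HISTORY))
```

The returned list plays the role of a ranked candidate list from which a user would pick one entry. A real system would presumably learn from far more history than six rows.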

First claim

Opening claim text (preview).

What is claimed is:

1. A method of determining an assignment of one or more elements of data to a specific data field or to a set of data fields, comprising: training a machine learning algorithm to optimize identification of one or more structured data fields as destinations for elements of unstructured data based on historical entry of unstructured data into the structured data fields; accessing one or more sources of data to be processed for assignment to the specific data field or to the set of data fields; determining a relationship, association or correlation between samples of unstructured text and data fields that represent general text elements arranged in free-form strings using a natural language processing (NLP) technique that includes determining n-grams to represent each sample of unstructured text characters as a vector, determining, for each n-gram of the n-grams, an associated weight greater than zero and less than one based at least in part on an amount of time since a document containing the n-gram was cited by another document, with the weight reduced as the amount of time increases, and adding the highest weighted n-gram to a list of most likely candidates for placement into the specified data field or the set of data fields; identifying a most likely candidate text or string for placement into the specified data field or the set of data fields by applying the trained machine learning algorithm to the vector; adding the most likely candidate text or string to the list of most likely candidates for placement into the specified data field or the set of data fields; receiving a selection of one candidate from the list for placement into the specified data field or the set of data fields; in response to receiving the selection of the one candidate, using the one candidate as data values for the specified data field or the set of data fields; and storing the data values in a format or record associated with the specific data field or the set of data fields.

2. The method of claim 1, wherein at least one of the one or more sources of data is data associated with a specific task.

3. The method of claim 1, wherein at least one of the one or more sources of data is data associated with a specific data processing application or business area.

4. The method of claim 1, wherein at least one of the one or more sources of data is data associated with a specific time interval covering a lifetime of a product architecture or a time since a product architecting event.

5. The method of claim 4, wherein the data associated with the specific time interval is data that was generated within that time interval.

6. The method of claim 1, wherein at least one of the one or more sources of data is data associated with a specific set of users.

7. The method of claim 1, wherein the weights are at least in part a function of how recently a document containing the accessed data was entered into a system.

8. The method of claim 1, wherein the weights are at least in part a function of the amount of citation or incorporation by other documents of elements of the accessed data.

9. The method of claim 1, wherein the sources of data include data resident on a multi-tenant business data processing platform, the platform including tenant-specific data generated or utilized by one or more of a tenant-specific enterprise resource planning (ERP), customer relationship management (CRM), eCommerce, human resources (HR), or financial application.

10. The method of claim 1, wherein the machine learning technique includes application of a k-nearest neighbor approach to identifying the most likely candidate text or string, wherein the k-nearest neighbor approach is uncombined with a support vector machine approach.

11. The method of claim 1, wherein the amount of time since the document containing the n-gram was cited by another document is the minimum amount of time among several amounts of time since the document containing the n-gram was cited.

12. The method of claim 1, wherein the weights are calculated at least in part by dividing the minimum time among all times since the document containing the n-gram was cited by the total time since the document was entered into the system and subtracting a resulting quotient from one.

13. The method of claim 1, wherein: at least one of the one or more sources of data is data associated with a specific task, a specific data processing application or business area, and a specific set of users, and the data is generated within specific time interval covering a lifetime of a product architecture; the one or more sources of data include data resident on a multi-tenant business data processing platform, the platform including tenant-specific data generated or utilized by one or more of a tenant-specific eCommerce application; the weights are at least in part a function of how recently a document containing the accessed data was entered into a system; the machine learning technique includes application of a k-nearest neighbor approach to identifying the most likely candidate text or string that is uncombined with a support vector machine approach; the amount of time since the document containing the n-gram was cited by another document is the minimum amount of time among several amounts of time since the document containing the n-gram was cited.

14. A system for determining an assignment of one or more elements of data to a specific data field, comprising a database or data store containing a plurality of data records; one or more business related data processing applications installed in the system; a hardware processor programmed with a set of instructions, wherein, when executed by the hardware processor, the instructions cause the system to train a machine learning algorithm to optimize identification of one or more structured data fields as destinations for elements of unstructured data based on historical entry of unstructured data into the structured data fields; access one or more sources of data from the database or data store to be processed for assignment to the specific data field; determine a relationship, association or correlation between samples of unstructured text and data fields that represent general text elements arranged in free-form strings using a natural language processing (NLP) technique that includes determining n-grams to represent each sample of unstructured text characters as a vector, determine, for each n-gram of the n-grams, an associated weight greater than zero and less than one based at least in part on an amount of time since a document containing the n-gram was cited by another document, with the weight reduced as the amount of time increases, and adding the highest weighted n-gram to a list of most likely candidates for placement into the specified data field or the set of data fields; identify a most likely candidate text or string for placement into the specified data field by applying a machine learning technique to the vector; add the most likely candidate text or string to the list of most likely candidates for placement into the specified data field or the set of data fields; receive a selection of one candidate from the list for placement into the specified data field or set of data fields; in response to receiving the selection of the one candidate, use the one candidate as data values for the specified data field; and store the data values in a format or record associated with the specific data field.

15. The system of claim 14, wherein the one or more business related data processing applications include one
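The weighting described in claims 1, 11, and 12 can be sketched as follows. Claim 12 gives the arithmetic: divide the minimum time since the document was cited by the total time since the document was entered, and subtract the quotient from one. Everything else here is an assumption made for illustration: the function names, the use of days as the time unit, word bigrams as the n-grams, skipping documents that have no citations or whose weight falls outside the open interval (0, 1), and taking the maximum document weight when an n-gram appears in several documents. The claim text shown here does not specify those details.

```python
def citation_weight(days_since_entered, days_since_each_citation):
    """Weight = 1 - (age of most recent citation / age of document), per claim 12.

    Returns None when the weight is undefined or not strictly between 0 and 1
    (an assumption; the claim text does not say how such cases are handled).
    """
    if not days_since_each_citation or days_since_entered <= 0:
        return None
    most_recent = min(days_since_each_citation)  # claim 11: minimum among citation ages
    weight = 1 - most_recent / days_since_entered
    return weight if 0 < weight < 1 else None

def word_ngrams(text, n=2):
    """Split text into lowercased overlapping word n-grams."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def top_ngram(docs, n=2):
    """Return (n-gram, weight) with the highest weight across all documents.

    docs: iterable of (text, days_since_entered, [days_since_citation, ...]).
    """
    best = {}
    for text, entered, citations in docs:
        weight = citation_weight(entered, citations)
        if weight is None:
            continue
        for gram in word_ngrams(text, n):
            # Assumption: an n-gram keeps the largest weight of any document containing it.
            best[gram] = max(best.get(gram, 0.0), weight)
    return max(best.items(), key=lambda item: item[1]) if best else None

# Hypothetical documents: (text, days since entry, days since each citation).
DOCS = [
    ("net 30 payment terms", 200, [20, 150]),   # weight 1 - 20/200
    ("payment due on receipt", 100, [80]),      # weight 1 - 80/100
    ("never cited note", 50, []),               # skipped: no citations
]

if __name__ == "__main__":
    print(citation_weight(200, [20, 150]))
    print(top_ngram(DOCS))
```

With a fixed document age, a more recent citation yields a larger weight, which matches the claim 1 requirement that the weight is reduced as the time since citation increases.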

Assignees

Netsuite Inc

Inventors

Classifications

G06F16/258
Data format conversion from or to a database
G06F40/174
Form filling; Merging
G06F40/284
Lexical analysis, e.g. tokenisation or collocates
G06N20/00 (primary)
Machine learning
G06Q10/00
Administration; Management

Patent family

Related publications grouped by family.

Patent family ID: 59561582

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11210300B2 cover?: Systems and methods to infer or predict the proper placement of unstructured data (such as text, phrases, segments of phrases, alphanumeric characters) into a more structured format (such as a specific data field). In some embodiments, this is based on a user's prior assignment of similar unstructured data into a specific structure. In some embodiments, this may be based on other users' prior a…
Who is the assignee on this patent?: Netsuite Inc

Related patents

System and method for a cloud based solution to track notes against business records

System and methods for management of cloud application extensions

System and Method for Automated Detection of Incorrect Data

System and methods for processing information regarding relationships and interactions to assist in making organizational decisions

System and methods for management of cloud application extensions
