System for tokenizing text in languages without inter-word separation

US10002128B2 · US · B2

Patent metadata
Field	Value
Publication number	US-10002128-B2
Application number	US-201514927274-A
Country	US
Kind code	B2
Filing date	Oct 29, 2015
Priority date	Sep 9, 2015
Publication date	Jun 19, 2018
Grant date	Jun 19, 2018

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title: What the patent document calls the invention.
Abstract: A short plain-language summary of the technical disclosure.
Assignees and inventors: Who owns or filed the patent and who is credited as inventor.
Key dates: Filing, priority, publication, and grant dates set the timeline.
First independent claim: The legal scope of protection; read this for what is actually claimed.
CPC / IPC classifications: Technology tags used to group this patent with similar filings.
Citations and related patents: Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A computerized system for transforming an input string includes a dictionary with tokens and associated scores. A chart parser generates a chart parse of the input string by, for each position within the input string, (i) identifying a string of at least one consecutive character in the input string that begins at that position and matches one of the tokens and (ii) unless the identified string is a single character matching the start character for another entry in the chart parse, creating an entry corresponding to the identified string. A partition selection module determines a selected partition of the input string. The selected partition includes an array of tokens selected from the chart parse such that their concatenation matches the input string. The selected partition is a minimum score partition, where the score is based on a sum of the tokens' associated scores from the dictionary.
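
The two stages named in the abstract, building a chart of dictionary matches and then picking the minimum-score partition, can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `chart_parse`, `min_score_partition`, and `max_len`, the toy dictionary, and the Latin-alphabet input are all hypothetical. The single-character exception is implemented under one assumed reading: a one-character match is kept only when no other entry starts at the same character.

```python
import math


def chart_parse(text, scores, max_len=8):
    """Collect (start, end) entries, end exclusive, for dictionary matches."""
    chart = []
    for start in range(len(text)):
        last_end = min(len(text), start + max_len)
        # Multi-character dictionary matches that begin at this position.
        longer = [
            (start, end)
            for end in range(start + 2, last_end + 1)
            if text[start:end] in scores
        ]
        chart.extend(longer)
        # Assumed reading of the exception: keep a one-character match only
        # when no other entry starts at the same character.
        if not longer and text[start] in scores:
            chart.append((start, start + 1))
    return chart


def min_score_partition(text, scores, chart):
    """Return (tokens, score) for the lowest-scoring partition, or None."""
    n = len(text)
    best = [math.inf] * (n + 1)   # best[i]: lowest score covering text[:i]
    back = [None] * (n + 1)       # back[i]: start of the last token in that cover
    best[0] = 0.0
    # Sorting by start means best[start] is final before it is extended.
    for start, end in sorted(chart):
        candidate = best[start] + scores[text[start:end]]
        if candidate < best[end]:
            best[end], back[end] = candidate, start
    if math.isinf(best[n]):
        return None
    tokens, pos = [], n
    while pos > 0:
        tokens.append(text[back[pos]:pos])
        pos = back[pos]
    return tokens[::-1], best[n]


# Toy dictionary (hypothetical): lower score means a more preferred token.
scores = {"the": 1.0, "cat": 1.5, "he": 2.0, "at": 2.0,
          "t": 5.0, "h": 5.0, "e": 5.0, "c": 5.0, "a": 5.0}
chart = chart_parse("thecat", scores)
result = min_score_partition("thecat", scores, chart)
print(result)  # (['the', 'cat'], 2.5)
```

The partition step here is a shortest-path pass over the chart entries, which is one common way to find a minimum-sum segmentation; the abstract does not say which search the patent uses. When the chart cannot cover the whole input, the function returns `None` instead of a partial result.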

First claim

Opening claim text (preview).

The invention claimed is:

1. A computerized system for transforming an input text string, the input text string being an ordered set of characters, the system comprising: a dictionary data store configured to store a plurality of tokens, wherein each token is associated with a score, and wherein each token is a string of one or more characters; a chart parser configured to generate a chart parse of the input text string, wherein: the chart parse includes a plurality of entries, each entry includes (i) an indication of a start character of the entry within the input text string and (ii) an indication of an end character of the entry within the input text string, and the chart parser is configured to, for each position within the input text string, (i) identify a string of at least one consecutive character in the input text string that begins at that position and matches one of the plurality of tokens and (ii) unless the identified string is a single character matching the start character for another entry in the chart parse, create an entry corresponding to the identified string; a partition selection module configured to determine a selected partition of the input text string based on the entries of the chart parse, wherein: the selected partition includes an array of tokens such that a concatenation of the array of tokens matches the ordered set of characters of the input text string, each of the array of tokens is selected from the chart parse, a score of the selected partition is based on a sum of, for each token of the array of tokens, the score specified by the dictionary data store, and the selected partition is a minimum score partition; a data store storing application state records; a set generation module configured to, in response to a set of tokens, select records from the data store to form a consideration set of records; and a results generation module configured to respond to the user device with a subset of the consideration set of records, wherein the subset identifies application states of applications that are relevant to a search query from the user device, wherein the input text string is based on the search query from the user device.

2. The system of claim 1 wherein the chart parser is configured to, for each position within the input text string, (i) identify a string of consecutive characters in the input text string that ends at that position and matches one of the plurality of tokens and (ii) unless the identified string is a single character matching the end character for another entry in the chart parse, create an entry corresponding to the identified string.

3. The system of claim 1 wherein, for each entry, the indication of the start character of the entry within the input text string and the indication of the end character of the entry within the input text string are specified as one of: a numerical start position within the input text string and a numerical end position within the input text string; the numerical start position within the input text string and a numerical length; and the numerical length and the numerical end position within the input text string.

4. The system of claim 1 further comprising a hash map configured to store hash values of the set of tokens from the dictionary data store, wherein: the chart parser is configured to calculate a hash value of a candidate token from the input text string; and presence of the calculated hash value in the hash map indicates that the candidate token matches one of the set of tokens.

5. The system of claim 4 wherein the set of tokens is a proper subset of the plurality of tokens in the dictionary data store, and wherein the set of tokens is selected from the plurality of tokens based on a domain of the input text string.

6. The system of claim 5 wherein the scores associated with the set of tokens are dependent on the domain.

7. The system of claim 1 wherein the chart parser is configured to generate a second chart parse by, for each position within the input text string, (i) identify a string of consecutive characters in the input text string that ends at that position and matches one of the plurality of tokens and (ii) unless the identified string is a single character matching the end character for another entry in the second chart parse, create an entry in the second chart parse corresponding to the identified string.

8. The system of claim 7 wherein the partition selection module is configured to: determine a first partition of the input text string having a first score using the entries of the chart parse; determine a second partition of the input text string having a second score using the entries of the second chart parse; designate the first partition as the selected partition in response to the first score being lower than the second score; and designate the second partition as the selected partition in response to the second score being lower than the first score.

9. The system of claim 1 wherein the score of the selected partition is equal to the sum of, for each token of the array of tokens, the score specified by the dictionary data store.

10. The system of claim 1 wherein, for each token in the dictionary data store, the associated score is based on frequency of occurrence of the token.

11. The system of claim 10 wherein, for each token in the dictionary data store, the associated score is calculated by taking an inverse logarithm of the frequency of occurrence of the token.

12. A search system comprising: the system of claim 1; and a set processing module configured to assign a score to each record of the consideration set of records, wherein the subset is selected based on the assigned scores.

13. A search system comprising: the system of claim 1; an intake module configured to generate the application state records from source data, wherein the source data includes a text string used as the input text string; and a set processing module configured to assign a score to each record of the consideration set of records.

14. A computerized method for transforming an input text string, the input text string being an ordered set of characters, the method comprising: storing a plurality of tokens in a dictionary data store, wherein each token is associated with a score, and wherein each token is a string of one or more characters; generating a chart parse of the input text string, wherein: the chart parse includes a plurality of entries; each entry includes (i) an indication of a start character of the entry within the input text string and (ii) an indication of an end character of the entry within the input text strings, and generating the chart parse includes, for each position within the input text string, (i) identifying a string of at least one consecutive character in the input text string that begins at that position and matches one of the plurality of tokens and (ii) unless the identified string is a single character matching the start character for another entry in the chart parse, creating an entry corresponding to the identified string; determining a selected partition of the input text string based on the entries of the chart parse, wherein: the selected partition includes an array of tokens such that a concatenation of the array of tokens matches the ordered set of characters of the input text string, each of the array of tokens is selected from the chart parse, a score of the selected partition is based on a sum of, for each token of the array of tokens, the score specified by the dictionary data store, and the selected partition is a minimum score partition;
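
Two of the dependent claims in the preview lend themselves to a short illustration: the frequency-based score of claims 10 and 11, and the choice between a forward and a backward partition in claim 8. The Python sketch below is illustrative only. The helper names `score_from_frequency` and `pick_partition` are hypothetical, "inverse logarithm" is read as the reciprocal of the natural logarithm (an assumption, since the claim text does not define the term), and the tie-breaking rule is also an assumption because claim 8 does not cover equal scores.

```python
import math


def score_from_frequency(count):
    """Reciprocal of the natural log of an occurrence count (assumed reading)."""
    if count < 2:
        raise ValueError("count must be at least 2 so that log(count) > 0")
    return 1.0 / math.log(count)


def pick_partition(first, second):
    """Return the lower-scoring of two (tokens, score) pairs.

    Ties keep the first partition; the claim text leaves ties unspecified.
    """
    return second if second[1] < first[1] else first


# Hypothetical occurrence counts: frequent tokens receive lower scores,
# which a minimum-score partition then prefers.
counts = {"the": 1000, "cat": 50}
scores = {token: score_from_frequency(n) for token, n in counts.items()}
print(scores["the"] < scores["cat"])  # True

forward = (["the", "cat"], 2.5)       # e.g. from a start-anchored chart parse
backward = (["t", "he", "cat"], 8.5)  # e.g. from an end-anchored chart parse
print(pick_partition(forward, backward)[0])  # ['the', 'cat']
```

Under this reading a token seen more often gets a smaller score, so summing scores and minimizing favors partitions made of common tokens.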

Assignees

Samsung Electronics Co Ltd

Inventors

Classifications

G06F40/289
Phrasal analysis, e.g. finite state techniques or chunking · CPC title
G06F40/53
Processing of non-Latin text (kana-to-kanji conversion G06F40/129; vowelisation G06F40/232) · CPC title
G06F40/284 · Primary
Lexical analysis, e.g. tokenisation or collocates · CPC title
G06F17/277 · Primary
Physics · mapped topic
G06F17/2863
Physics · mapped topic

Patent family

Related publications grouped by family.

View patent family 58190082

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US10002128B2 cover?: A computerized system for transforming an input string includes a dictionary with tokens and associated scores. A chart parser generates a chart parse of the input string by, for each position within the input string, (i) identifying a string of at least one consecutive character in the input string that begins at that position and matches one of the tokens and (ii) unless the identified string…
Who is the assignee on this patent?: Samsung Electronics Co Ltd
What technology area does this patent fall under?: Primary CPC classification G06F40/284. Mapped technology areas include Physics.
When was this patent published?: Publication date Jun 19, 2018 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Methods and apparatus related to automatically rewriting strings of text

Method and system for robust tagging of named entities in the presence of source or translation errors

Search engine for information retrieval system
