Secure in-line payments
US-12106300-B2 · Oct 1, 2024 · US
US2016321358A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2016321358-A1 |
| Application number | US-201514700683-A |
| Country | US |
| Kind code | A1 |
| Filing date | Apr 30, 2015 |
| Priority date | Apr 30, 2015 |
| Publication date | Nov 3, 2016 |
| Grant date | — |
Official abstract text for this publication.
A system is provided that extracts attribute values. The system receives data including unstructured text from a data store. The system further tokenizes the unstructured text into tokens, where a token is a character of the unstructured text. The system further annotates the tokens with attribute labels, where an attribute label for a token is determined, at least in part, based on a word that the token originates from within the unstructured text. The system further groups the tokens into text segments based on the attribute labels, where a set of tokens that are annotated with an identical attribute label are grouped into a text segment, and where the text segments define attribute values. The system further stores the attribute labels and the attribute values within the data store.
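The tokenize, annotate, and group steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the `"O"` label for characters that belong to no attribute, and the choice to group only consecutive characters with the same label are all assumptions made for illustration. The labels are written by hand here and stand in for the annotation step, which the publication does not reduce to code.

```python
from itertools import groupby


def tokenize(text):
    """Character-level tokenization: every character is one token."""
    return list(text)


def group_tokens(tokens, labels):
    """Group consecutive tokens that share a label into text segments.

    Assumption: "O" marks characters outside any attribute, and only
    adjacent characters with an identical label form one segment.
    """
    segments = []
    for label, run in groupby(zip(tokens, labels), key=lambda pair: pair[1]):
        value = "".join(token for token, _ in run).strip()
        if label != "O" and value:
            segments.append((label, value))
    return segments


# Hypothetical product description with hand-written character labels.
text = "Red Mug 12oz"
labels = ["COLOR"] * 3 + ["O"] + ["TYPE"] * 3 + ["O"] + ["SIZE"] * 4

tokens = tokenize(text)
attributes = group_tokens(tokens, labels)
print(attributes)  # [('COLOR', 'Red'), ('TYPE', 'Mug'), ('SIZE', '12oz')]
```

Each resulting pair is an attribute label with its attribute value, which is the form the abstract describes storing back in the data store.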
Opening claim text (preview).
We claim:

1. A computer-readable medium having instructions stored thereon that, when executed by a processor, cause the processor to extract attribute values, the extracting comprising: receiving data comprising unstructured text from a data store; tokenizing the unstructured text into one or more tokens, wherein a token is a character of the unstructured text; annotating the one or more tokens with one or more attribute labels, wherein an attribute label for a token is determined, at least in part, based on a word that the token originates from within the unstructured text; grouping the one or more tokens into one or more text segments based on the one or more attribute labels, wherein a set of one or more tokens that are annotated with an identical attribute label are grouped into a text segment, and wherein the one or more text segments define one or more attribute values; and storing the one or more attribute labels and the one or more attribute values within the data store.

2. The computer-readable medium of claim 1, the extracting further comprising normalizing at least one attribute value of the one or more attribute values.

3. The computer-readable medium of claim 2, the normalizing the at least one attribute value further comprising: pairing an attribute value of the one or more attribute values with one or more target attribute values; selecting a target attribute value that has a highest probability of matching the attribute value; and replacing the attribute value with the selected target attribute value.

4. The computer-readable medium of claim 3, the extracting further comprising replacing at least one attribute label that is annotated for at least one token with at least one new attribute label in response to a user interaction.

5. The computer-readable medium of claim 1, wherein the unstructured text comprises a product description.

6. The computer-readable medium of claim 1, wherein the data store comprises a database.

7. The computer-readable medium of claim 1, wherein the attribute label for the token is further determined, at least in part, based on a character-based conditional random field.

8. The computer-readable medium of claim 1, wherein the one or more tokens are character-based tokens.

9. The computer-readable medium of claim 1, wherein the attribute label for the token is further determined, at least in part, based on at least one of: whether the token is a lowercase character; a shape of the token; a punctuation of the token; one or more surrounding tokens; a size of the word that the token originates from within the unstructured text; a position of the token relative to the word that the token originates from within the unstructured text; or a position of the token relative to the unstructured text.

10. The computer-readable medium of claim 1, the extracting further comprising: receiving one or more pre-defined attribute values; annotating one or more characters of the unstructured text with one or more attribute labels by matching the one or more pre-defined attribute values with one or more text segments of the unstructured text; and replacing at least one attribute label that is annotated for at least one character with at least one new attribute label in response to a user interaction.

11. A computer-implemented method for extracting attribute values, the computer-implemented method comprising: receiving data comprising unstructured text from a data store; tokenizing the unstructured text into one or more tokens, wherein a token is a character of the unstructured text; annotating the one or more tokens with one or more attribute labels, wherein an attribute label for a token is determined, at least in part, based on a word that the token originates from within the unstructured text; grouping the one or more tokens into one or more text segments based on the one or more attribute labels, wherein a set of one or more tokens that are annotated with an identical attribute label are grouped into a text segment, and wherein the one or more text segments define one or more attribute values; and storing the one or more attribute labels and the one or more attribute values within the data store.

12. The computer-implemented method of claim 11, further comprising normalizing at least one attribute value of the one or more attribute values.

13. The computer-implemented method of claim 12, the normalizing the at least one attribute value further comprising: pairing an attribute value of the one or more attribute values with one or more target attribute values; selecting a target attribute value that has a highest probability of matching the attribute value; and replacing the attribute value with the selected target attribute value.

14. The computer-implemented method of claim 13, further comprising replacing at least one attribute label that is annotated for at least one token with at least one new attribute label in response to a user interaction.

15. The computer-implemented method of claim 11, wherein the attribute label for the token is further determined, at least in part, based on at least one of: whether the token is a lowercase character; a shape of the token; a punctuation of the token; one or more surrounding tokens; a size of the word that the token originates from within the unstructured text; a position of the token relative to the word that the token originates from within the unstructured text; or a position of the token relative to the unstructured text.

16. A system for extracting attribute values, the system comprising: a data reception module configured to receive data comprising unstructured text from a data store; a tokenization module configured to tokenize the unstructured text into one or more tokens, wherein a token is a character of the unstructured text; an annotation module configured to annotate the one or more tokens with one or more attribute labels, wherein an attribute label for a token is determined, at least in part, based on a word that the token originates from within the unstructured text; a token grouping module configured to group the one or more tokens into one or more text segments based on the one or more attribute labels, wherein a set of one or more tokens that are annotated with an identical attribute label are grouped into a text segment, and wherein the one or more text segments define one or more attribute values; and an attribute storage module configured to store the one or more attribute labels and the one or more attribute values within the data store.

17. The system of claim 16, further comprising a normalization module configured to normalize at least one attribute value of the one or more attribute values.

18. The system of claim 17, wherein the normalization module is further configured to pair an attribute value of the one or more attribute values with one or more target attribute values; wherein the normalization module is further configured to select a target attribute value that has a highest probability of matching the attribute value; and wherein the normalization module is further configured to replace the attribute value with the selected target attribute value.

19. The system of claim 18, further comprising a manual annotation module configured to replace at least one attribute label that is annotated for at least one token with at least one new attribute label in response to a user interaction.

20. The system of claim 16,
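The normalization recited in claims 3, 13, and 18 (pair an extracted value with target values, select the most probable match, replace the value) can be sketched as follows. The claims do not say how the matching probability is computed, so this sketch substitutes a string-similarity ratio from Python's standard `difflib` module as a stand-in score. The function names and the example values are hypothetical.

```python
from difflib import SequenceMatcher


def match_score(value, target):
    """Stand-in for a match probability: case-insensitive similarity in [0, 1]."""
    return SequenceMatcher(None, value.lower(), target.lower()).ratio()


def normalize(value, targets):
    """Replace an extracted value with its best-scoring target value.

    Returns the value unchanged when no target values are supplied.
    """
    if not targets:
        return value
    return max(targets, key=lambda target: match_score(value, target))


# Hypothetical extracted value and list of canonical target values.
canonical_colors = ["Navy Blue", "Red", "Green"]
print(normalize("navy blu", canonical_colors))  # Navy Blue
```

A real system would likely also apply a minimum score below which the extracted value is left as is; the claims are silent on that point, so the sketch omits it.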
Commerce · CPC title
Mark-up to mark-up conversion (conversion for visualization in web browsing G06F16/9577) · CPC title
Lexical analysis, e.g. tokenisation or collocates · CPC title
using context analysis, e.g. recognition aided by known co-occurring patterns · CPC title
Physics · mapped topic