Extracting product facets from unstructured data
US-10235449-B1 · Mar 19, 2019 · US
US10747795B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-10747795-B2 |
| Application number | US-201815868558-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 11, 2018 |
| Priority date | Jan 11, 2018 |
| Publication date | Aug 18, 2020 |
| Grant date | Aug 18, 2020 |
Official abstract text for this publication.
A method, computer system, and computer readable program product for generating text for a search corpus. In an embodiment, the method comprises analyzing structured data associated with an entity; and breaking the structured data into multiple unstructured natural language pair attributes such that natural language search terms have an increased level of matching with attributes associated with the entity, including filtering the structured data to identify a filtered group of attributes, and using the filtered group of attributes to form the natural language pair attributes. The method further comprises saving the formed natural language pairs in a search corpus. In an embodiment, the method further comprises generating a cluster of the attributes based on semantic similarities; and the using the filtered group of attributes to form the natural language pair attributes includes creating separate blocks of text that respectively describe the attributes included in the cluster of attributes.
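The steps named in the abstract (filter, cluster, form blocks of text, save) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the patented implementation: the `RECORD` data, the `ALLOWED` filter criteria, the sentence template, and especially `TOPIC_OF`, a hand-written lookup that stands in for whatever semantic-similarity measure an actual system would use.

```python
from collections import defaultdict

# Hypothetical structured record for one entity (a refrigerator).
RECORD = {
    "sku": "RF-1001",
    "width_cm": 70,
    "height_cm": 180,
    "energy_rating": "A+",
    "annual_energy_kwh": 250,
    "warehouse_bin": "Z9",
}

# Assumed predetermined criteria: keep only these attribute names.
ALLOWED = {"width_cm", "height_cm", "energy_rating", "annual_energy_kwh"}

# Stand-in for semantic similarity: a hand-written attribute-to-topic map.
TOPIC_OF = {
    "width_cm": "dimensions",
    "height_cm": "dimensions",
    "energy_rating": "energy",
    "annual_energy_kwh": "energy",
}


def filter_attributes(record, allowed):
    """Keep only the attributes that pass the filter criteria."""
    return {name: value for name, value in record.items() if name in allowed}


def cluster_attributes(attributes, topic_of):
    """Group attributes by topic; unknown attributes fall into 'other'."""
    clusters = defaultdict(dict)
    for name, value in attributes.items():
        clusters[topic_of.get(name, "other")][name] = value
    return dict(clusters)


def block_of_text(entity, attributes):
    """Render each name/value pair as a short natural language sentence."""
    return " ".join(
        f"The {entity} has {name.replace('_', ' ')} {value}."
        for name, value in attributes.items()
    )


def build_corpus(entity, record):
    """Filter, cluster, and emit one text document per cluster."""
    filtered = filter_attributes(record, ALLOWED)
    clusters = cluster_attributes(filtered, TOPIC_OF)
    return {topic: block_of_text(entity, attrs) for topic, attrs in clusters.items()}


corpus = build_corpus("refrigerator", RECORD)
for topic, text in corpus.items():
    print(topic, "->", text)
```

Each cluster becomes one document, so a query about energy use would be matched against text that mentions only the energy-related attributes.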
Opening claim text (preview).
What is claimed is:
1. A method of generating text for a search corpus, comprising: analyzing structured data associated with an entity; breaking the structured data into multiple unstructured natural language pair attributes such that natural language search terms have an increased level of matching with attributes associated with the entity, including filtering the structured data, based on predetermined criteria, to identify a filtered group of attributes associated with the entity, wherein identifying the filtered group of attributes comprises analyzing attributes representing dimensions and energy specifications of an apparatus, and using the filtered group of attributes to form the natural language pair attributes; and saving the formed natural language pairs in a search corpus; generating a cluster of the attributes from the filtered group of attributes based on semantic similarities; and removing noise from semantically related content associated with the semantic similarities.
2. The method according to claim 1, wherein the using the filtered group of attributes to form the natural language pair attributes includes: creating blocks of text that respectively describe the attributes included in the cluster of the attributes.
3. The method according to claim 2, further comprising: processing one or more natural language searches based, at least in part, on the cluster of the attributes.
4. The method according to claim 2, wherein each of said blocks of text includes a name for one of the attributes in the cluster of the attributes, and an associated value for the one of the attributes.
5. The method according to claim 1, wherein the filtering the structured data includes: selecting one or more defined facets of the entity; and using the selected facets to filter the structured data.
6. The method according to claim 5, wherein the selecting one or more defined facets includes: identifying a group of defined facets; tracking specified uses of the group of defined facets over a period of time; and selecting the one or more defined facets, from said group of defined facets, based on the tracked specified uses of the group of defined facets over the period of time.
7. The method according to claim 6, wherein: the tracking specified uses of the group of defined facets over a period of time comprises tracking uses of the group of defined facets over the period of time during computer searches for the entity over the period of time; and the method further comprises removing one or more of the natural language pair attributes from the search corpus when a defined popularity of a specified one of the facets falls below a given value.
8. The method according to claim 1, wherein: the using the filtered group of attributes to form the natural language pair attributes includes forming a natural language block of text including all the attributes associated with the entity and having a defined popularity.
9. The method according to claim 1, further comprising: clustering the attributes of the filtered set of attributes into a plurality of clusters of the attributes based on semantic similarities; and wherein the using the filtered group of attributes to form the natural language pair attributes includes: forming one or more unstructured, natural language blocks of text from each of the clusters; and storing each of the blocks of text as a document in the search corpus to create documents with the semantically related content.
10. A computer system for generating text for a search corpus, comprising: a memory for storing data: one or more processor units operatively connected to the memory for transmitting data to and receiving data from the memory, the one or more processing units being configured for: analyzing structured data associated with an entity; breaking the structured data into multiple unstructured natural language pair attributes such that natural language search terms have an increased level of matching with attributes associated with the entity, including filtering the structured data, based on predetermined criteria, to identify a filtered group of attributes associated with the entity, wherein identifying the filtered group of attributes comprises analyzing attributes representing dimensions and energy specifications of an apparatus, and using the filtered group of attributes to form the natural language pair attributes; and saving the formed natural language pairs in a search corpus; generating a cluster of the attributes from the filtered group of attributes based on semantic similarities; and removing noise from semantically related content associated with the semantic similarities.
11. The computer system according to claim 10, wherein: the one or more processor units are further configured for using the filtered set of attributes to form the natural language pair attributes includes creating blocks of text that respectively describe the attributes included in the cluster of the attributes.
12. The computer system according to claim 10, wherein the filtering the structured data includes: selecting one or more defined facets of the entity, including identifying a group of defined facets, tracking specified uses of the group of defined facets over a period of time, and selecting the one or more defined facets, from said group of defined facets, based on the tracked specified uses of the group of defined facets over the period of time; and using the selected facets to filter the structured data.
13. The computer system according to claim 12, wherein the tracking specified uses of the group of defined facets over a period of time comprises tracking uses of the group of defined facets over the period of time during computer searches for the entity over the period of time.
14. The computer system according to claim 10, wherein: the using the filtered group of attributes to form the natural language pair attributes includes forming a natural language block of text including all the attributes associated with the entity and having a defined popularity.
15. A computer readable program product for generating text for a search corpus, the computer readable program product comprising: a computer readable storage medium having program instructions embodied therein, the program instructions executable by a specified computer to cause the specified computer to perform the method of: analyzing structured data associated with an entity; breaking the structured data into multiple unstructured natural language pair attributes such that natural language search terms have an increased level of matching with attributes associated with the entity, including filtering the structured data, based on predetermined criteria, to identify a filtered group of attributes associated with the entity, wherein identifying the filtered group of attributes comprises analyzing attributes representing dimensions and energy specifications of an apparatus, and using the filtered group of attributes to form the natural language pair attributes; and saving the formed natural language pairs in a search corpus; generating a cluster of the attributes from the filtered group of attributes based on semantic similarities; and removing noise from semantically related content associated with the semantic similarities.
16. The computer readable program product according to claim 15, wherein: the method further comprises using the filtered set of attributes to form the natural language pair attributes includes creating blocks of text that respectively descr
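Claims 6 and 7 describe selecting facets from their tracked use in searches and removing entries from the corpus when a facet's popularity falls below a given value. The following is a minimal sketch of that idea only. The search log, the `min_uses` threshold, and the choice of a plain use count as the "defined popularity" are all assumptions for illustration; the claims do not specify how popularity is measured.

```python
from collections import Counter

# Hypothetical log: one entry per search in which a facet was used.
SEARCH_LOG = ["energy", "dimensions", "energy", "color", "energy", "dimensions"]

# Hypothetical corpus: one natural language block per facet.
CORPUS = {
    "energy": "The refrigerator has energy rating A+.",
    "dimensions": "The refrigerator has width 70 cm.",
    "color": "The refrigerator has color white.",
}


def select_facets(search_log, min_uses):
    """Return the facets used at least min_uses times in the tracked period."""
    counts = Counter(search_log)
    return {facet for facet, uses in counts.items() if uses >= min_uses}


def prune_corpus(corpus, popular_facets):
    """Drop corpus entries whose facet is no longer popular enough."""
    return {facet: text for facet, text in corpus.items() if facet in popular_facets}


popular = select_facets(SEARCH_LOG, min_uses=2)
pruned = prune_corpus(CORPUS, popular)
print(sorted(pruned))
```

Re-running the selection over a later time window would let previously pruned facets return to the corpus if searches start using them again.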
utilising user interfaces specially adapted for shopping · CPC title
using natural language analysis · CPC title
by specifying product or service characteristics, e.g. product dimensions · CPC title
using ranking · CPC title