Device, method and program for generating accurate corpus data for presentation target for searching

US9645979B2 · US · B2

Patent metadata
Publication number: US-9645979-B2
Application number: US-201314420424-A
Country: US
Kind code: B2
Filing date: Sep 30, 2013
Priority date: Sep 30, 2013
Publication date: May 9, 2017
Grant date: May 9, 2017

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A corpus generation device according to an embodiment includes a web page acquisition unit, a reference word acquisition unit, an attachment unit and an output unit. The web page acquisition unit acquires a web page including description sentence data regarding a presentation target. The reference word acquisition unit acquires a reference word that is an attribute value regarding the presentation target from the web page. The attachment unit extracts a broader word belonging to a layer above the reference word acquired by the reference word acquisition unit from a storage unit that stores hierarchical relationship information indicating a hierarchical relationship between attribute values, and attaches an attribute tag corresponding to the reference word to the broader word included in the description sentence data. The output unit outputs, as corpus data, the description sentence data to which the attribute tag is attached by the attachment unit.
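
The tagging step described in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the toy `PARENT` hierarchy, the function names `broader_words` and `attach_tags`, and the XML-like tag syntax. The sketch also tags the reference word itself when it appears in the text, which corresponds to dependent claim 2 rather than to the abstract.

```python
import re

# Hypothetical hierarchy: each attribute value maps to the value on the layer above it.
PARENT = {
    "Pauillac": "Bordeaux",
    "Bordeaux": "France",
    "Burgundy": "France",
}

def broader_words(reference_word, parent=PARENT):
    """Return every value on a layer above reference_word, nearest first."""
    result = []
    seen = {reference_word}
    current = parent.get(reference_word)
    while current is not None and current not in seen:  # 'seen' guards against cycles
        result.append(current)
        seen.add(current)
        current = parent.get(current)
    return result

def attach_tags(description, attribute_name, reference_word, parent=PARENT):
    """Wrap the reference word and its broader words in an attribute tag."""
    targets = [reference_word] + broader_words(reference_word, parent)
    # Longest first, so a longer value is never split by a shorter one.
    targets.sort(key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in targets) + r")\b")
    return pattern.sub(
        lambda m: f"<{attribute_name}>{m.group(1)}</{attribute_name}>",
        description,
    )

# The attribute list on the page says region = Bordeaux; the description only says "France".
corpus_line = attach_tags(
    "A full-bodied red from France, aged in oak.",
    attribute_name="region",
    reference_word="Bordeaux",
)
print(corpus_line)
```

In this toy run the description never mentions "Bordeaux", yet "France" is still tagged as a region because the hierarchy places it above the reference word. That is the case the abstract describes.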

First claim

Opening claim text (preview).

The invention claimed is:

1. A corpus generation device for generating accurate corpus data of presentation targets for use in a search operation for the presentation targets, the corpus generation device comprising: at least one non-transitory memory configured to store computer program code; and at least one processor operable to access said memory and execute said computer program code, said computer program code comprising: web page acquisition code configured to cause at least one of said at least one processor to acquire a web page including description sentence data regarding a presentation target; reference word acquisition code configured to cause at least one of said at least one processor to acquire a reference word that is an attribute value regarding the presentation target from the web page; attachment code configured to cause at least one of said at least one processor to extract a broader word belonging to a layer above the reference word acquired according to the reference word acquisition code from a storage configured to store hierarchical relationship information indicating a hierarchical relationship between attribute values, and to attach an attribute tag, indicating an attribute name, corresponding to the reference word to the broader word included in the description sentence data when the broader word is included in the description sentence data; and output code configured to cause at least one of said at least one processor to output, as corpus data, the description sentence data to which the attribute tag is attached in accordance with the attachment code, wherein the corpus generation device further comprises a presentation target information registration storage configured to store a set of the attribute name and the attribute value regarding the presentation target generated by machine learning using the corpus data output in accordance with the output code in association with the web page, and wherein, when a search for the presentation target is requested, the presentation target is searched by referring to the presentation target information registration storage based on the attribute name and the attribute value regarding the presentation target.

2. The corpus generation device according to claim 1, wherein the attachment code is further configured to cause at least one of said at least one processor to, when the reference word is included in the description sentence data, attach the attribute tag corresponding to the reference word to the reference word included in the description sentence data.

3. The corpus generation device according to claim 1, wherein the web page further includes an attribute list in which the attribute name and the attribute value regarding the presentation target are associated, and wherein the reference word acquisition code is further configured to cause at least one of said at least one processor to acquire the attribute value in the attribute list as the reference word.

4. The corpus generation device according to claim 1, wherein the reference word acquisition code is further configured to cause at least one of said at least one processor to search for a word having a high probability of being an attribute value regarding the presentation target from the description sentence data using a sentence structure analyzer, and acquire the searched word as the reference word.

5. The corpus generation device according to claim 1, wherein the hierarchical relationship information has a tree structure having the attribute value as a node, and wherein the attachment code is further configured to cause at least one of said at least one processor to search for a partial tree having the reference word as a root node and having no branch from the hierarchical relationship information, extract one or more attribute values other than the reference word included in the partial tree, and attach the attribute tag corresponding to the reference word to the one or more attribute values included in the description sentence data when the one or more attribute values are included in the description sentence data.

6. The corpus generation device according to claim 1, wherein the hierarchical relationship information hierarchically indicates the hierarchical relationship between the attribute values, and wherein the attachment code is further configured to cause at least one of said at least one processor to, when there are a plurality of attribute values on a layer directly under the reference word in the hierarchical relationship information and only one of the plurality of attribute values is included in the description sentence data, attach the attribute tag corresponding to the reference word to the one attribute value included in the description sentence data.

7. The corpus generation device according to claim 1, wherein the machine learning is performed on the corpus data and, based on a result of the machine learning, an attribute list of a second presentation target in a second web page, the second web page including description sentence data regarding the second presentation target, is automatically generated.

8. A corpus generation method for generating accurate corpus data of presentation targets for use in a search operation for the presentation targets, the corpus generation method comprising: a web page acquisition step of acquiring a web page including description sentence data regarding a presentation target; a reference word acquisition step of acquiring a reference word that is an attribute value regarding the presentation target from the web page; an attachment step of extracting a broader word belonging to a layer above the reference word acquired in the reference word acquisition step from a storage that stores hierarchical relationship information indicating a hierarchical relationship between attribute values, and attaching an attribute tag, indicating an attribute name, corresponding to the reference word to the broader word included in the description sentence data when the broader word is included in the description sentence data; an output step of outputting, as corpus data, the description sentence data to which the attribute tag is attached in the attachment step; and a storing step of storing, in a presentation target information registration storage, a set of the attribute name and the attribute value regarding the presentation target acquired from the corpus data output, in the output step, in association with the web page, wherein, when a search for the presentation target is requested, the presentation target is searched by referring to the presentation target information registration storage based on the attribute name and the attribute value regarding the presentation target.

9. The corpus generation method of claim 8, wherein the web page acquisition step comprises receiving the web page from a server.

10. The corpus generation method of claim 9, wherein the output step comprises transmitting the corpus data to the server.

11. The corpus generation method of claim 9, further comprising: a registration step of registering the attribute tag and the corresponding reference word and broader word included in the description sentence data with the presentation target, and a second output step of transmitting the registered attribute tag and the corresponding reference word and broader word included in the description sentence data to the server.

12. A corpus generation device for generating accurate corpus data of presentation targets for use in a search operation for the presentation targets, the corpus generation device comprising: at least one non-transitory memory configured to
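
Claims 5 and 6 extend the tagging to values below the reference word in the hierarchy. The Python sketch below shows one possible reading of those two claims, not the patented implementation. The toy `CHILDREN` data and the function names `unbranched_descendants` and `sole_child_in_text` are hypothetical.

```python
import re

# Hypothetical hierarchy: each attribute value maps to the values directly under it.
CHILDREN = {
    "France": ["Bordeaux", "Burgundy"],
    "Burgundy": ["Chablis"],
    "Chablis": ["Petit Chablis"],
}

def unbranched_descendants(reference_word, children=CHILDREN):
    """One reading of claim 5: follow the chain below reference_word
    for as long as every node has exactly one child (no branch)."""
    chain = []
    current = reference_word
    while len(children.get(current, [])) == 1:
        current = children[current][0]
        if current == reference_word or current in chain:  # guard against cycles
            break
        chain.append(current)
    return chain

def sole_child_in_text(reference_word, description, children=CHILDREN):
    """One reading of claim 6: when the reference word has several direct
    children, return the single child the description mentions, else None."""
    direct = children.get(reference_word, [])
    if len(direct) < 2:
        return None
    present = [
        child for child in direct
        if re.search(r"\b" + re.escape(child) + r"\b", description)
    ]
    return present[0] if len(present) == 1 else None

chain = unbranched_descendants("Burgundy")
only_child = sole_child_in_text("France", "A crisp white from Burgundy.")
print(chain, only_child)
```

Under this reading, a value returned by either function would receive the same attribute tag as the reference word when it appears in the description sentence data. A reference word whose subtree branches immediately, such as "France" in the toy data, yields no claim 5 candidates.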

Assignees

Rakuten Inc

Inventors

Classifications

  • Semantic analysis · CPC title

  • Dictionaries · CPC title

  • Text processing (natural language analysis G06F40/20; semantic analysis G06F40/30; processing or translation of natural language G06F40/40) · CPC title

  • G06F40/117 · Primary

    Tagging; Marking up (details of markup languages G06F40/143); Designating a block; Setting of attributes (style sheets, e.g. eXtensible Stylesheet Language Transformation [XSLT], G06F40/154) · CPC title

  • Grammatical analysis; Style critique · CPC title

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US9645979B2 cover?
A corpus generation device according to an embodiment includes a web page acquisition unit, a reference word acquisition unit, an attachment unit and an output unit. The web page acquisition unit acquires a web page including description sentence data regarding a presentation target. The reference word acquisition unit acquires a reference word that is an attribute value regarding the presentat…
Who is the assignee on this patent?
Rakuten Inc
What technology area does this patent fall under?
Primary CPC classification G06F40/117. Mapped technology areas include Physics.
When was this patent published?
Publication date: May 9, 2017 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).