Glossary management device, glossary management system, and recording medium for glossary generation

US9529792B2 · US · B2

Patent metadata
Publication number: US-9529792-B2
Application number: US-201514862981-A
Country: US
Kind code: B2
Filing date: Sep 23, 2015
Priority date: Sep 25, 2014
Publication date: Dec 27, 2016
Grant date: Dec 27, 2016

Abstract

Official abstract text for this publication.

A glossary management device includes a read circuit, a storage circuit, an acquisition circuit, an analysis circuit, a term matching circuit, and a registration circuit. The storage circuit has a storage area for a glossary. The acquisition circuit acquires text data of a document if reading of the document is executed by a user. The analysis circuit performs analysis of the text data acquired by the acquisition circuit to identify a language of the document and parts of speech of text segments in the text data and extracts one or more text segments from the document based on the analysis. The term matching circuit performs matching for each of the extracted text segments against a public dictionary. The registration circuit adds to the glossary, each extracted text segment that does not match any entry term in the public dictionary.
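The pipeline in the abstract (acquire text, identify language and parts of speech, extract segments, match them against a public dictionary, register the non-matches) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patentee's implementation: the names PUBLIC_DICTIONARY, POS_LEXICON, identify_language, tag_parts_of_speech, analyze and update_glossary are hypothetical, language identification is a crude Unicode-range test, and the part-of-speech tagger is a toy lexicon lookup standing in for a real morphological analyzer. Following claim 3, only segments tagged as nouns are extracted.

```python
import re

# Hypothetical public dictionary (lower-case entries); a real one would be far larger.
PUBLIC_DICTIONARY = {"printer", "paper", "tray", "toner", "document"}

# Hypothetical mini-lexicon for the toy tagger; unlisted words default to "noun".
POS_LEXICON = {"the": "determiner", "a": "determiner", "and": "conjunction",
               "of": "preposition", "holds": "verb", "uses": "verb"}


def identify_language(text: str) -> str:
    """Crude language identification: 'ja' if any kana or CJK ideograph is present."""
    return "ja" if re.search(r"[\u3040-\u30ff\u4e00-\u9fff]", text) else "en"


def tag_parts_of_speech(text: str) -> list[tuple[str, str]]:
    """Toy tagger: only Latin-script words are tagged, via a POS_LEXICON lookup."""
    words = re.findall(r"[A-Za-z][A-Za-z0-9-]*", text)
    return [(w, POS_LEXICON.get(w.lower(), "noun")) for w in words]


def analyze(text: str) -> tuple[str, list[str]]:
    """Analysis step: identify the language and extract the noun segments."""
    language = identify_language(text)
    nouns = [word for word, pos in tag_parts_of_speech(text) if pos == "noun"]
    return language, nouns


def update_glossary(text: str, glossary: set[str]) -> set[str]:
    """Term matching and registration: add segments that have no dictionary match."""
    _language, segments = analyze(text)  # language is identified but unused by this toy
    for segment in segments:
        if segment.lower() not in PUBLIC_DICTIONARY:  # term matching
            glossary.add(segment)                     # registration
    return glossary


glossary = update_glossary("The MFP-X200 tray holds paper and the toner cartridge.", set())
print(sorted(glossary))  # ['MFP-X200', 'cartridge']
```

"cartridge" reaches the glossary only because the toy dictionary is tiny; with a realistic public dictionary, mainly document-specific terms such as the model name would remain.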

Claims

What is claimed is:

1. A glossary management device comprising: a read circuit that reads a document; a storage circuit that has a storage area for a glossary to which text segments extracted from the document that is read by the read circuit are to be added as entry terms; an acquisition circuit that acquires text data of the document; an analysis circuit that performs analysis of the text data acquired by the acquisition circuit to identify a language of the document and parts of speech of text segments in the text data and extracts one or more text segments from the document based on the analysis; a term matching circuit that performs matching for each of the extracted text segments against a public dictionary containing entry terms registered therein; and a registration circuit that adds to the glossary, each extracted text segment that does not match any entry term in the public dictionary, wherein the analysis circuit determines whether or not each extracted text segment is a proper noun, if the analysis circuit determines that the extracted text segment is not a proper noun, the term matching circuit performs matching of the extracted text segment against the public dictionary, and the registration circuit adds the extracted text segment to the glossary if the extracted text segment does not match any entry term in the public dictionary, and if the analysis circuit determines that the extracted text segment is a proper noun, the registration circuit adds the extracted text segment to the glossary without the term matching circuit performing matching of the extracted text segment against the public dictionary.

2. The glossary management device according to claim 1, wherein the acquisition circuit receives a user input designating an internal-external classification of the document, and determines that the document is an internal document and acquires text data of the document if the internal-external classification indicates that the document is classified as an internal document.

3. The glossary management device according to claim 2, wherein the analysis circuit extracts each text segment that is analyzed to be a noun.

4. The glossary management device according to claim 1, wherein the read circuit comprises a scanner that reads the document to generate image data.

5. The glossary management device according to claim 1, wherein the registration circuit adds to the glossary, together with each text segment added to the glossary, a piece of location information, and each piece of location information indicates a location of where in the document a corresponding text segment is extracted.

6. A glossary management system comprising: an image forming apparatus; and an information processing device that communicates with the image forming apparatus over a network, wherein the image forming apparatus includes a processing circuit that executes a job of copying or printing a document, a collection circuit that collects from the document, text segments to be added to a glossary as entry terms, and a transmission circuit that transmits the text segments collected by the collection circuit to the information processing device, the information processing device includes a storage circuit that has a storage area for the glossary, a reception circuit that receives the text segments transmitted from the information processing device, and a registration circuit that adds each of the text segments received by the reception circuit to the glossary, the collection circuit includes an acquisition circuit that acquires text data of the document, an analysis circuit that performs analysis of the text data acquired by the acquisition circuit to identify a language of the document and parts of speech of text segments in the text data and extracts one or more text segments from the document based on the analysis, and a term matching circuit that performs matching for each of the extracted text segments against a public dictionary containing entry terms registered therein, the analysis circuit determines whether or not each extracted text segment is a proper noun, if the analysis circuit determines that the extracted text segment is not a proper noun, the term matching circuit performs matching of the extracted text segment against the public dictionary, and the registration circuit adds the extracted text segment to the glossary if the extracted text segment does not match any entry term in the public dictionary, and if the analysis circuit determines that the extracted text segment is a proper noun, the registration circuit adds the extracted text segment to the glossary without the term matching circuit performing matching of the extracted text segment against the public dictionary.

7. The glossary management device according to claim 6, wherein the transmission circuit transmits each extracted text segment that does not match any entry term in the public dictionary to the information processing device.

8. A non-transitory computer-readable recording medium storing a glossary management program executable by a computer, the glossary management program comprising: a first program code that causes the computer to acquire text data of a document; a second program code that causes the computer to perform analysis of the text data to identify a language of the document and parts of speech of text segments in the text data and extract one or more text segments from the document based on the analysis; a third program code that causes the computer to perform matching for each of the extracted text segments against a public dictionary containing entry terms registered therein; and a fourth program code that causes the computer to add to a glossary, each extracted text segment that does not match any entry term in the public dictionary, wherein the second program code causes the computer to determine whether or not each extracted text segment is a proper noun, if the computer determines that the extracted text segment is not a proper noun, the third program code causes the computer to perform matching of the extracted text segment against the public dictionary, and the fourth program code causes the computer to add the extracted text segment to the glossary if the extracted text segment does not match any entry term in the public dictionary, and if the computer determines that the extracted text segment is a proper noun, the fourth program code causes the computer to add the extracted text segment to the glossary without the third program code causing the computer to perform matching of the extracted text segment against the public dictionary.

9. The glossary management device according to claim 2, wherein the analysis circuit determines whether or not the language of the document is Japanese and extracts one or more text segments that are composed only of alphabetic characters from the document if the language of the document is Japanese.

10. The glossary management system according to claim 6, wherein the acquisition circuit receives a user input designating an internal-external classification of the document, and determines that the document is an internal document and acquires text data of the document if the internal-external classification indicates that the document is classified as an internal document.

11. The glossary management system according to claim 10, wherein the analysis circuit determines whether or not the language of the document is Japanese and extracts one or more text segments that are composed only of alphabetic characters from the document if the language of the document is Japanese.

Assignees

Kyocera Document Solutions Inc

Inventors

Classifications

  • Parsing · CPC title

  • Language identification · CPC title

  • Foreign languages (with audible presentation of material to be studied G09B5/04) · CPC title

  • Processing of non-Latin text (kana-to-kanji conversion G06F40/129; vowelisation G06F40/232) · CPC title

  • G06F40/242 (primary)

    Dictionaries · CPC title

Frequently asked questions

What does patent US9529792B2 cover?
A glossary management device includes a read circuit, a storage circuit, an acquisition circuit, an analysis circuit, a term matching circuit, and a registration circuit. The storage circuit has a storage area for a glossary. The acquisition circuit acquires text data of a document if reading of the document is executed by a user. The analysis circuit performs analysis of the text data acquired…
Who is the assignee on this patent?
Kyocera Document Solutions Inc
What technology area does this patent fall under?
Primary CPC classification G06F40/242. Mapped technology areas include Physics.
When was this patent published?
Published on Dec 27, 2016 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).