Information processing device, information processing method, and program

US11934779B2 · US · B2

Patent metadata
Field: Value
Publication number: US-11934779-B2
Application number: US-202017612522-A
Country: US
Kind code: B2
Filing date: Apr 1, 2020
Priority date: May 30, 2019
Publication date: Mar 19, 2024
Grant date: Mar 19, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The occurrence cost of unknown words that are not registered in a morphological analysis dictionary is calculated by applying an occurrence cost regression model, which is a learning model. An information processing device includes a notation feature amount extraction unit that extracts a notation feature amount of a character string, a part-of-speech feature amount extraction unit that extracts a part-of-speech feature amount of the character string, and an occurrence cost estimation unit that receives the notation feature amount and the part-of-speech feature amount and calculates an occurrence cost of the character string by applying an occurrence cost regression model. The occurrence cost regression model is a learning model that estimates the occurrence cost from the notation feature amount and the part-of-speech feature amount of a character string, generated by a learning process using registration data of an existing morphological analysis dictionary as teacher data.
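The flow described in the abstract (extract notation and part-of-speech features, then apply a regression model trained on existing dictionary entries) can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the character-type classes, the three-tag part-of-speech set, the linear model fitted by gradient descent, the function names, and the toy dictionary costs are all assumptions made for this example.

```python
import unicodedata

# Hypothetical feature vocabularies; the patent does not fix these sets.
CHAR_TYPES = ["hiragana", "katakana", "kanji", "latin", "digit", "other"]
POS_TYPES = ["noun", "verb", "adjective"]

# Invented stand-in for "registration data of an existing dictionary":
# (surface, part of speech, occurrence cost). The costs are made up.
TOY_DICTIONARY = [
    ("東京", "noun", 3.0),
    ("たべる", "verb", 5.0),
    ("コーヒー", "noun", 4.0),
    ("はやい", "adjective", 6.0),
]


def char_type(ch):
    """Classify one character by script, using its Unicode name."""
    name = unicodedata.name(ch, "")
    if name.startswith("HIRAGANA"):
        return "hiragana"
    if name.startswith("KATAKANA"):
        return "katakana"
    if name.startswith("CJK UNIFIED IDEOGRAPH"):
        return "kanji"
    if name.startswith("LATIN"):
        return "latin"
    if ch.isdigit():
        return "digit"
    return "other"


def notation_features(surface):
    """Notation feature amount: per-type character counts plus string length."""
    counts = [0.0] * len(CHAR_TYPES)
    for ch in surface:
        counts[CHAR_TYPES.index(char_type(ch))] += 1.0
    return counts + [float(len(surface))]


def pos_features(pos):
    """Part-of-speech feature amount: one-hot encoding of the POS type."""
    return [1.0 if pos == p else 0.0 for p in POS_TYPES]


def features(surface, pos):
    """Concatenate both feature amounts and append a constant bias term."""
    return notation_features(surface) + pos_features(pos) + [1.0]


def train(entries, epochs=2000, lr=0.01):
    """Fit linear weights by batch gradient descent on squared error."""
    xs = [features(s, p) for s, p, _ in entries]
    ys = [float(c) for _, _, c in entries]
    w = [0.0] * len(xs[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(xs, ys):
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj
        w = [wi - lr * g / len(xs) for wi, g in zip(w, grad)]
    return w


def estimate_cost(w, surface, pos):
    """Apply the trained model to a (possibly unregistered) character string."""
    return sum(wi * xi for wi, xi in zip(w, features(surface, pos)))


if __name__ == "__main__":
    weights = train(TOY_DICTIONARY)
    # "スマホ" is treated here as an unknown word absent from the toy dictionary.
    print(estimate_cost(weights, "スマホ", "noun"))
```

A linear model is used here only because it keeps the sketch dependency-free; the abstract requires a learned regression model but does not say which kind.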

First claim

Opening claim text (preview).

The invention claimed is:

1. An information processing device comprising: a notation feature amount extraction unit that extracts a notation feature amount of a character string; a part-of-speech feature amount extraction unit that extracts a part-of-speech feature amount of the character string; and an occurrence cost estimation unit that receives the notation feature amount and the part-of-speech feature amount of the character string and calculates an occurrence cost of the character string by applying an occurrence cost regression model, wherein the occurrence cost is data used in a morphological analysis process, and the occurrence cost regression model is a learning model that estimates the occurrence cost from the notation feature amount and the part-of-speech feature amount of the character string, generated by a learning process using registration data of an existing morphological analysis dictionary as teacher data.

2. The information processing device according to claim 1, wherein the character string of which the occurrence cost is to be calculated by the occurrence cost estimation unit is a character string constituting a new morpheme that is not registered in the existing morphological analysis dictionary.

3. The information processing device according to claim 2, wherein the occurrence cost estimation unit registers the calculated occurrence cost as an occurrence cost corresponding to the new morpheme in a morphological analysis dictionary.

4. The information processing device according to claim 1, wherein the notation feature amount extraction unit extracts types of characters constituting the character string as the notation feature amount.

5. The information processing device according to claim 1, wherein the part-of-speech feature amount extraction unit extracts a part-of-speech type of the character string and a feature amount obtained from a notation thereof as the part-of-speech feature amount.

6. The information processing device according to claim 1, wherein the occurrence cost estimation unit receives a notation feature amount including the types of characters constituting the character string and a part-of-speech feature amount including the part-of-speech type of the character string, and calculates the occurrence cost of the character string by applying the occurrence cost regression model.

7. The information processing device according to claim 1, wherein the notation feature amount of the character string extracted by the notation feature amount extraction unit is a notation feature amount including at least one of a type and a composition of characters constituting the character string, a character string length, and words used in the character string.

8. The information processing device according to claim 1, further comprising a machine learning process execution unit that generates the occurrence cost regression model, wherein the machine learning process execution unit receives a notation feature amount, a part-of-speech feature amount, and an occurrence cost of morphemes registered in an existing morphological analysis dictionary as input data and executes a learning process using the input data as teacher data to generate the occurrence cost regression model.

9. The information processing device according to claim 8, wherein the machine learning process execution unit generates an occurrence cost regression model which is a learning model that receives a notation feature amount including types of characters constituting a character string and a part-of-speech feature amount including a part of speech type of the character string and a notation thereof and outputs an occurrence cost.

10. An information processing device comprising: an analysis text input unit that inputs text; and a morphological analysis process execution unit that executes a morphological analysis process on the text, wherein the morphological analysis process execution unit executes a morphological analysis process by applying a morphological analysis dictionary in which an occurrence cost of a morpheme unit is registered, and the morphological analysis dictionary used by the morphological analysis process execution unit is a dictionary in which additional registration is performed using an occurrence cost estimated by applying an occurrence cost regression model generated by a learning process which uses registration data of an existing morphological analysis dictionary as teacher data.

11. The information processing device according to claim 10, wherein the morphological analysis process execution unit includes: a morpheme lattice generation unit that generates a morpheme lattice; a path corresponding cost calculation unit that calculates a path corresponding cost of the morpheme lattice; and a lowest-cost path selection unit that selects a lowest-cost path from paths of the morpheme lattice.

12. The information processing device according to claim 10, wherein the occurrence cost regression model is a learning model for estimating an occurrence cost from a notation feature amount and a part-of-speech feature amount of a character string.

13. The information processing device according to claim 12, wherein the notation feature amount includes character type information of characters constituting the character string, and the part-of-speech feature amount includes part-of-speech type information of the character string.

14. An information processing method executed in an information processing device, comprising: allowing a notation feature amount extraction unit to extract a notation feature amount of a character string; allowing a part-of-speech feature amount extraction unit to extract a part-of-speech feature amount of the character string; and allowing an occurrence cost estimation unit to receive the notation feature amount and the part-of-speech feature amount of the character string and calculate an occurrence cost of the character string by applying an occurrence cost regression model, wherein the occurrence cost calculated by the occurrence cost estimation unit is data used in a morphological analysis process, and the occurrence cost regression model applied by the occurrence cost estimation unit is a learning model that estimates the occurrence cost from the notation feature amount and the part-of-speech feature amount of the character string, generated by a learning process using registration data of an existing morphological analysis dictionary as teacher data.

15. An information processing method executed in an information processing device, comprising: allowing an analysis text input unit to input analysis target text; and allowing a morphological analysis process execution unit to execute a morphological analysis process on the input text by applying a morphological analysis dictionary in which an occurrence cost of a morpheme unit is registered, wherein the morphological analysis dictionary applied by the morphological analysis process execution unit is a dictionary in which an occurrence cost estimated by applying an occurrence cost regression model which is a learning model generated by a learning process which uses registration data of an existing morphological analysis dictionary as teacher data is registered.

16. A non-transitory computer-readable storage medium storing a program for causing an information processing device to execute information processing, comprising: allowing a notation feature amount extraction unit to extract a notation feature amount of a character string; allowing a part-of-speech feature amo
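
The lattice-based analysis named in claim 11 (generate a morpheme lattice, compute path costs, select the lowest-cost path) and the additional dictionary registration in claim 10 can be illustrated with a small dynamic-programming sketch. It is a simplification under stated assumptions: path cost is taken to be the sum of per-morpheme occurrence costs only, the dictionary is a plain mapping from surface string to cost, and the function names, the Latin-script toy words, and all cost values are invented for illustration.

```python
def build_lattice(text, dictionary):
    """Morpheme lattice: edges[start] lists (end, surface, cost) candidates."""
    edges = {i: [] for i in range(len(text))}
    for start in range(len(text)):
        for end in range(start + 1, len(text) + 1):
            surface = text[start:end]
            if surface in dictionary:
                edges[start].append((end, surface, dictionary[surface]))
    return edges


def lowest_cost_path(text, dictionary):
    """Return (morphemes, total_cost) for the cheapest segmentation, or None."""
    edges = build_lattice(text, dictionary)
    inf = float("inf")
    best = [inf] * (len(text) + 1)   # best[i]: cheapest cost to reach offset i
    back = [None] * (len(text) + 1)  # back[i]: (previous offset, surface)
    best[0] = 0.0
    for start in range(len(text)):
        if best[start] == inf:
            continue  # offset not reachable with dictionary words
        for end, surface, cost in edges[start]:
            total = best[start] + cost
            if total < best[end]:
                best[end] = total
                back[end] = (start, surface)
    if best[-1] == inf:
        return None
    path = []
    pos = len(text)
    while pos > 0:
        pos, surface = back[pos]
        path.append(surface)
    path.reverse()
    return path, best[-1]


if __name__ == "__main__":
    # Toy "existing dictionary" with invented occurrence costs.
    dictionary = {"new": 2.0, "s": 5.0, "news": 3.0, "paper": 3.0}
    print(lowest_cost_path("newspaper", dictionary))

    # Additional registration of a new morpheme with an estimated cost
    # (the value 4.0 is a placeholder for a regression-model output).
    dictionary["newspaper"] = 4.0
    print(lowest_cost_path("newspaper", dictionary))
```

With the toy costs above, the cheapest segmentation before registration is `news` + `paper` at a total cost of 6.0; once `newspaper` is registered at 4.0, the single-morpheme path becomes the cheapest. This shows why the estimated occurrence cost matters: it determines whether a newly registered word is actually selected during analysis.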

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11934779B2 cover?
It covers calculating the occurrence cost of unknown words, that is, words not registered in a morphological analysis dictionary, by applying an occurrence cost regression model. The model is a learning model generated using the registration data of an existing morphological analysis dictionary as teacher data, and it estimates the occurrence cost from the notation feature amount and the part-of-speech feature amount of a character string.
Who is the assignee on this patent?
Sony Group Corp
What technology area does this patent fall under?
Primary CPC classification G06F40/268. Mapped technology areas include Physics.
When was this patent published?
Publication date Mar 19, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).