Information processing device, information processing method, and program

US11934779B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11934779-B2
Application number	US-202017612522-A
Country	US
Kind code	B2
Filing date	Apr 1, 2020
Priority date	May 30, 2019
Publication date	Mar 19, 2024
Grant date	Mar 19, 2024

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

The occurrence cost of unknown words that are not registered in a morphological analysis dictionary is calculated by applying an occurrence cost regression model, which is a learning model. An information processing device includes a notation feature amount extraction unit that extracts a notation feature amount of a character string, a part-of-speech feature amount extraction unit that extracts a part-of-speech feature amount of the character string, and an occurrence cost estimation unit that receives the notation feature amount and the part-of-speech feature amount and calculates an occurrence cost of the character string by applying an occurrence cost regression model. The occurrence cost regression model is a learning model that estimates the occurrence cost from the notation feature amount and the part-of-speech feature amount of a character string, generated by a learning process using registration data of an existing morphological analysis dictionary as teacher data.
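The pipeline in the abstract can be sketched roughly as follows: extract notation features and part-of-speech features from each registered entry, fit a regression model to the registered occurrence costs, then apply the model to an unregistered word. Everything concrete here is an assumption made for illustration. The feature set (character-type fractions plus a scaled length), the tag list, the toy entries and their costs, and the choice of plain linear regression trained by gradient descent are not taken from the patent, which does not specify the model type in the text shown on this page.

```python
import unicodedata

# Hypothetical inventories, chosen only for this illustration.
CHAR_TYPES = ["hiragana", "katakana", "kanji", "latin", "digit", "other"]
POS_TAGS = ["noun", "verb", "particle"]


def char_type(ch):
    """Coarse character type, read off the Unicode character name."""
    name = unicodedata.name(ch, "")
    if "KATAKANA" in name:  # checked first so the long-vowel mark counts as katakana
        return "katakana"
    if "HIRAGANA" in name:
        return "hiragana"
    if "CJK UNIFIED IDEOGRAPH" in name:
        return "kanji"
    if ch.isascii() and ch.isalpha():
        return "latin"
    if ch.isdigit():
        return "digit"
    return "other"


def notation_features(surface):
    """Fraction of each character type, plus the string length scaled by 1/10."""
    counts = {t: 0 for t in CHAR_TYPES}
    for ch in surface:
        counts[char_type(ch)] += 1
    total = max(len(surface), 1)
    return [counts[t] / total for t in CHAR_TYPES] + [len(surface) / 10.0]


def pos_features(pos):
    """One-hot encoding of the part-of-speech tag."""
    return [1.0 if pos == tag else 0.0 for tag in POS_TAGS]


def features(surface, pos):
    """Bias term, notation features, and part-of-speech features in one vector."""
    return [1.0] + notation_features(surface) + pos_features(pos)


def predict(weights, surface, pos):
    return sum(w * x for w, x in zip(weights, features(surface, pos)))


def mean_squared_error(weights, entries):
    return sum((predict(weights, s, p) - c) ** 2 for s, p, c in entries) / len(entries)


def train(entries, epochs=2000, learning_rate=0.1):
    """Fit linear-regression weights by batch gradient descent on squared error."""
    weights = [0.0] * len(features(entries[0][0], entries[0][1]))
    for _ in range(epochs):
        gradient = [0.0] * len(weights)
        for surface, pos, cost in entries:
            x = features(surface, pos)
            error = predict(weights, surface, pos) - cost
            for j, xj in enumerate(x):
                gradient[j] += error * xj
        weights = [w - learning_rate * g / len(entries)
                   for w, g in zip(weights, gradient)]
    return weights


# Toy "existing dictionary" entries (surface, part of speech, occurrence cost).
# The costs are invented for the example.
DICTIONARY = [
    ("東京", "noun", 3000),
    ("テレビ", "noun", 4000),
    ("食べる", "verb", 5500),
    ("走る", "verb", 6000),
    ("は", "particle", 500),
]

weights = train(DICTIONARY)
# Estimate a cost for a word that is not in the toy dictionary.
estimated = predict(weights, "スマホ", "noun")
print(round(estimated))
```

The estimated value could then be registered alongside the new word, which is the step claims 2 and 3 describe.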

First claim

Opening claim text (preview).

The invention claimed is:

1. An information processing device comprising: a notation feature amount extraction unit that extracts a notation feature amount of a character string; a part-of-speech feature amount extraction unit that extracts a part-of-speech feature amount of the character string; and an occurrence cost estimation unit that receives the notation feature amount and the part-of-speech feature amount of the character string and calculates an occurrence cost of the character string by applying an occurrence cost regression model, wherein the occurrence cost is data used in a morphological analysis process, and the occurrence cost regression model is a learning model that estimates the occurrence cost from the notation feature amount and the part-of-speech feature amount of the character string, generated by a learning process using registration data of an existing morphological analysis dictionary as teacher data.

2. The information processing device according to claim 1, wherein the character string of which the occurrence cost is to be calculated by the occurrence cost estimation unit is a character string constituting a new morpheme that is not registered in the existing morphological analysis dictionary.

3. The information processing device according to claim 2, wherein the occurrence cost estimation unit registers the calculated occurrence cost as an occurrence cost corresponding to the new morpheme in a morphological analysis dictionary.

4. The information processing device according to claim 1, wherein the notation feature amount extraction unit extracts types of characters constituting the character string as the notation feature amount.

5. The information processing device according to claim 1, wherein the part-of-speech feature amount extraction unit extracts a part-of-speech type of the character string and a feature amount obtained from a notation thereof as the part-of-speech feature amount.

6. The information processing device according to claim 1, wherein the occurrence cost estimation unit receives a notation feature amount including the types of characters constituting the character string and a part-of-speech feature amount including the part-of-speech type of the character string, and calculates the occurrence cost of the character string by applying the occurrence cost regression model.

7. The information processing device according to claim 1, wherein the notation feature amount of the character string extracted by the notation feature amount extraction unit is a notation feature amount including at least one of a type and a composition of characters constituting the character string, a character string length, and words used in the character string.

8. The information processing device according to claim 1, further comprising a machine learning process execution unit that generates the occurrence cost regression model, wherein the machine learning process execution unit receives a notation feature amount, a part-of-speech feature amount, and an occurrence cost of morphemes registered in an existing morphological analysis dictionary as input data and executes a learning process using the input data as teacher data to generate the occurrence cost regression model.

9. The information processing device according to claim 8, wherein the machine learning process execution unit generates an occurrence cost regression model which is a learning model that receives a notation feature amount including types of characters constituting a character string and a part-of-speech feature amount including a part of speech type of the character string and a notation thereof and outputs an occurrence cost.

10. An information processing device comprising: an analysis text input unit that inputs text; and a morphological analysis process execution unit that executes a morphological analysis process on the text, wherein the morphological analysis process execution unit executes a morphological analysis process by applying a morphological analysis dictionary in which an occurrence cost of a morpheme unit is registered, and the morphological analysis dictionary used by the morphological analysis process execution unit is a dictionary in which additional registration is performed using an occurrence cost estimated by applying an occurrence cost regression model generated by a learning process which uses registration data of an existing morphological analysis dictionary as teacher data.

11. The information processing device according to claim 10, wherein the morphological analysis process execution unit includes: a morpheme lattice generation unit that generates a morpheme lattice; a path corresponding cost calculation unit that calculates a path corresponding cost of the morpheme lattice; and a lowest-cost path selection unit that selects a lowest-cost path from paths of the morpheme lattice.

12. The information processing device according to claim 10, wherein the occurrence cost regression model is a learning model for estimating an occurrence cost from a notation feature amount and a part-of-speech feature amount of a character string.

13. The information processing device according to claim 12, wherein the notation feature amount includes character type information of characters constituting the character string, and the part-of-speech feature amount includes part-of-speech type information of the character string.

14. An information processing method executed in an information processing device, comprising: allowing a notation feature amount extraction unit to extract a notation feature amount of a character string; allowing a part-of-speech feature amount extraction unit to extract a part-of-speech feature amount of the character string; and allowing an occurrence cost estimation unit to receive the notation feature amount and the part-of-speech feature amount of the character string and calculate an occurrence cost of the character string by applying an occurrence cost regression model, wherein the occurrence cost calculated by the occurrence cost estimation unit is data used in a morphological analysis process, and the occurrence cost regression model applied by the occurrence cost estimation unit is a learning model that estimates the occurrence cost from the notation feature amount and the part-of-speech feature amount of the character string, generated by a learning process using registration data of an existing morphological analysis dictionary as teacher data.

15. An information processing method executed in an information processing device, comprising: allowing an analysis text input unit to input analysis target text; and allowing a morphological analysis process execution unit to execute a morphological analysis process on the input text by applying a morphological analysis dictionary in which an occurrence cost of a morpheme unit is registered, wherein the morphological analysis dictionary applied by the morphological analysis process execution unit is a dictionary in which an occurrence cost estimated by applying an occurrence cost regression model which is a learning model generated by a learning process which uses registration data of an existing morphological analysis dictionary as teacher data is registered.

16. A non-transitory computer-readable storage medium storing a program for causing an information processing device to execute information processing, comprising: allowing a notation feature amount extraction unit to extract a notation feature amount of a character string; allowing a part-of-speech feature amo
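Claim 11 names three steps: generating a morpheme lattice, computing path costs, and selecting the lowest-cost path. A minimal sketch of that idea, under simplifying assumptions, is a dynamic program over all dictionary matches in the text. The function name `lowest_cost_path`, the single flat `connection_cost`, and the toy dictionary costs are hypothetical and are not taken from the patent. A real analyzer would typically also score transitions between adjacent morphemes, which this sketch collapses into one constant.

```python
def lowest_cost_path(text, dictionary, connection_cost=0):
    """Return (morphemes, total_cost) for the cheapest segmentation of text.

    dictionary maps a surface string to its occurrence cost. Every substring
    found in the dictionary is an edge of the lattice. Returns (None, inf)
    when the text cannot be covered by dictionary entries.
    """
    n = len(text)
    inf = float("inf")
    best = [inf] * (n + 1)   # best[i]: minimum cost of segmenting text[:i]
    back = [None] * (n + 1)  # back[i]: start index of the last morpheme on that path
    best[0] = 0
    for end in range(1, n + 1):
        for start in range(end):
            word = text[start:end]
            if word in dictionary and best[start] < inf:
                cost = best[start] + dictionary[word] + connection_cost
                if cost < best[end]:
                    best[end] = cost
                    back[end] = start
    if best[n] == inf:
        return None, inf
    # Walk the back-pointers from the end of the text to recover the path.
    path = []
    i = n
    while i > 0:
        path.append(text[back[i]:i])
        i = back[i]
    return path[::-1], best[n]


# Toy dictionary with invented costs: only single characters are registered.
before = {"ス": 6000, "マ": 6000, "ホ": 6000}
print(lowest_cost_path("スマホ", before))  # (['ス', 'マ', 'ホ'], 18000)

# Additional registration of a new morpheme with an estimated cost (claim 10).
after = dict(before, **{"スマホ": 4500})
print(lowest_cost_path("スマホ", after))   # (['スマホ'], 4500)
```

The second call shows why the registered occurrence cost matters: once the new morpheme is in the dictionary with a cost lower than the sum of its pieces, the lowest-cost path keeps it as a single unit.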

Assignees

Sony Group Corp

Inventors

Mitani Ryosuke

Classifications

G06F40/268 · Primary
Morphological analysis · CPC title
G06F40/216
using statistical methods · CPC title
G06F40/242
Dictionaries · CPC title

Patent family

Related publications grouped by family.

View patent family 73553385

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11934779B2 cover?: The occurrence cost of unknown words that are not registered in a morphological analysis dictionary is calculated by applying an occurrence cost regression model, which is a learning model. An information processing device includes a notation feature amount extraction unit that extracts a notation feature amount of a character string, a part-of-speech feature amount extraction unit that extract…
Who is the assignee on this patent?: Sony Group Corp
What technology area does this patent fall under?: Primary CPC classification G06F40/268. Mapped technology areas include Physics.
When was this patent published?: Publication date Mar 19, 2024 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 1 related publication on this page (citations in our corpus or others sharing the same primary CPC).