Pattern matching based character string retrieval
US-2015242537-A1 · Aug 27, 2015 · US
US11934779B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11934779-B2 |
| Application number | US-202017612522-A |
| Country | US |
| Kind code | B2 |
| Filing date | Apr 1, 2020 |
| Priority date | May 30, 2019 |
| Publication date | Mar 19, 2024 |
| Grant date | Mar 19, 2024 |
Official abstract text for this publication.
The occurrence cost of unknown words that are not registered in a morphological analysis dictionary is calculated by applying an occurrence cost regression model, which is a learning model. An information processing device includes a notation feature amount extraction unit that extracts a notation feature amount of a character string, a part-of-speech feature amount extraction unit that extracts a part-of-speech feature amount of the character string, and an occurrence cost estimation unit that receives the notation feature amount and the part-of-speech feature amount and calculates an occurrence cost of the character string by applying an occurrence cost regression model. The occurrence cost regression model is a learning model that estimates the occurrence cost from the notation feature amount and the part-of-speech feature amount of a character string, generated by a learning process using registration data of an existing morphological analysis dictionary as teacher data.
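As a rough illustration of the flow the abstract describes, the sketch below extracts a notation feature vector (character-type fractions and string length) and a part-of-speech feature vector (one-hot), trains a regression model on entries from an existing dictionary, and estimates a cost for an unregistered string. The character-type buckets, the `POS_TYPES` tag set, the toy costs in `TRAIN`, and the choice of a plain linear model trained by stochastic gradient descent are all assumptions made for illustration; the abstract does not specify the model's form.

```python
import unicodedata

CHAR_TYPES = ["hiragana", "katakana", "kanji", "latin", "digit", "other"]
POS_TYPES = ["noun", "verb", "adjective"]  # hypothetical tag set

def char_type(ch):
    """Coarse character type, derived from the Unicode character name."""
    name = unicodedata.name(ch, "")
    if name.startswith("HIRAGANA"):
        return "hiragana"
    if name.startswith("KATAKANA"):
        return "katakana"
    if name.startswith("CJK UNIFIED"):
        return "kanji"
    if ch.isdigit():
        return "digit"
    if ch.isascii() and ch.isalpha():
        return "latin"
    return "other"

def notation_features(s):
    """Fraction of each character type, plus string length scaled by 1/10."""
    counts = {t: 0 for t in CHAR_TYPES}
    for ch in s:
        counts[char_type(ch)] += 1
    n = max(len(s), 1)
    return [counts[t] / n for t in CHAR_TYPES] + [len(s) / 10]

def pos_features(pos):
    """One-hot encoding of the part-of-speech type."""
    return [1.0 if pos == p else 0.0 for p in POS_TYPES]

def features(s, pos):
    """Bias term + notation features + part-of-speech features."""
    return [1.0] + notation_features(s) + pos_features(pos)

def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def fit(samples, epochs=500, lr=0.1):
    """Linear regression by SGD; samples are (surface, pos, occurrence cost)."""
    xs = [features(s, p) for s, p, _ in samples]
    ys = [c for _, _, c in samples]
    w = [0.0] * len(xs[0])
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            err = predict(w, x) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w

def estimate_cost(w, s, pos):
    """Estimated occurrence cost for a string not in the dictionary."""
    return predict(w, features(s, pos))

# Toy stand-in for registration data of an existing dictionary (made-up costs).
TRAIN = [
    ("東京", "noun", 3000.0),
    ("たべる", "verb", 5000.0),
    ("カメラ", "noun", 4000.0),
    ("はやい", "adjective", 4500.0),
]

if __name__ == "__main__":
    w = fit(TRAIN)
    print(estimate_cost(w, "スマホ", "noun"))
```

In practice the training set would be the full set of dictionary entries, and a richer model could replace the linear one without changing the surrounding flow.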
Opening claim text (preview).
The invention claimed is:

1. An information processing device comprising: a notation feature amount extraction unit that extracts a notation feature amount of a character string; a part-of-speech feature amount extraction unit that extracts a part-of-speech feature amount of the character string; and an occurrence cost estimation unit that receives the notation feature amount and the part-of-speech feature amount of the character string and calculates an occurrence cost of the character string by applying an occurrence cost regression model, wherein the occurrence cost is data used in a morphological analysis process, and the occurrence cost regression model is a learning model that estimates the occurrence cost from the notation feature amount and the part-of-speech feature amount of the character string, generated by a learning process using registration data of an existing morphological analysis dictionary as teacher data.
2. The information processing device according to claim 1, wherein the character string of which the occurrence cost is to be calculated by the occurrence cost estimation unit is a character string constituting a new morpheme that is not registered in the existing morphological analysis dictionary.
3. The information processing device according to claim 2, wherein the occurrence cost estimation unit registers the calculated occurrence cost as an occurrence cost corresponding to the new morpheme in a morphological analysis dictionary.
4. The information processing device according to claim 1, wherein the notation feature amount extraction unit extracts types of characters constituting the character string as the notation feature amount.
5. The information processing device according to claim 1, wherein the part-of-speech feature amount extraction unit extracts a part-of-speech type of the character string and a feature amount obtained from a notation thereof as the part-of-speech feature amount.
6. The information processing device according to claim 1, wherein the occurrence cost estimation unit receives a notation feature amount including the types of characters constituting the character string and a part-of-speech feature amount including the part-of-speech type of the character string, and calculates the occurrence cost of the character string by applying the occurrence cost regression model.
7. The information processing device according to claim 1, wherein the notation feature amount of the character string extracted by the notation feature amount extraction unit is a notation feature amount including at least one of a type and a composition of characters constituting the character string, a character string length, and words used in the character string.
8. The information processing device according to claim 1, further comprising a machine learning process execution unit that generates the occurrence cost regression model, wherein the machine learning process execution unit receives a notation feature amount, a part-of-speech feature amount, and an occurrence cost of morphemes registered in an existing morphological analysis dictionary as input data and executes a learning process using the input data as teacher data to generate the occurrence cost regression model.
9. The information processing device according to claim 8, wherein the machine learning process execution unit generates an occurrence cost regression model which is a learning model that receives a notation feature amount including types of characters constituting a character string and a part-of-speech feature amount including a part of speech type of the character string and a notation thereof and outputs an occurrence cost.
10. An information processing device comprising: an analysis text input unit that inputs text; and a morphological analysis process execution unit that executes a morphological analysis process on the text, wherein the morphological analysis process execution unit executes a morphological analysis process by applying a morphological analysis dictionary in which an occurrence cost of a morpheme unit is registered, and the morphological analysis dictionary used by the morphological analysis process execution unit is a dictionary in which additional registration is performed using an occurrence cost estimated by applying an occurrence cost regression model generated by a learning process which uses registration data of an existing morphological analysis dictionary as teacher data.
11. The information processing device according to claim 10, wherein the morphological analysis process execution unit includes: a morpheme lattice generation unit that generates a morpheme lattice; a path corresponding cost calculation unit that calculates a path corresponding cost of the morpheme lattice; and a lowest-cost path selection unit that selects a lowest-cost path from paths of the morpheme lattice.
12. The information processing device according to claim 10, wherein the occurrence cost regression model is a learning model for estimating an occurrence cost from a notation feature amount and a part-of-speech feature amount of a character string.
13. The information processing device according to claim 12, wherein the notation feature amount includes character type information of characters constituting the character string, and the part-of-speech feature amount includes part-of-speech type information of the character string.
14. An information processing method executed in an information processing device, comprising: allowing a notation feature amount extraction unit to extract a notation feature amount of a character string; allowing a part-of-speech feature amount extraction unit to extract a part-of-speech feature amount of the character string; and allowing an occurrence cost estimation unit to receive the notation feature amount and the part-of-speech feature amount of the character string and calculate an occurrence cost of the character string by applying an occurrence cost regression model, wherein the occurrence cost calculated by the occurrence cost estimation unit is data used in a morphological analysis process, and the occurrence cost regression model applied by the occurrence cost estimation unit is a learning model that estimates the occurrence cost from the notation feature amount and the part-of-speech feature amount of the character string, generated by a learning process using registration data of an existing morphological analysis dictionary as teacher data.
15. An information processing method executed in an information processing device, comprising: allowing an analysis text input unit to input analysis target text; and allowing a morphological analysis process execution unit to execute a morphological analysis process on the input text by applying a morphological analysis dictionary in which an occurrence cost of a morpheme unit is registered, wherein the morphological analysis dictionary applied by the morphological analysis process execution unit is a dictionary in which an occurrence cost estimated by applying an occurrence cost regression model which is a learning model generated by a learning process which uses registration data of an existing morphological analysis dictionary as teacher data is registered.
16. A non-transitory computer-readable storage medium storing a program for causing an information processing device to execute information processing, comprising: allowing a notation feature amount extraction unit to extract a notation feature amount of a character string; allowing a part-of-speech feature amo
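
The lattice steps named in claim 11 (lattice generation, path cost calculation, lowest-cost path selection) can be sketched as a small dynamic program. This is a minimal sketch under stated assumptions, not the patented implementation: the dictionary layout (surface mapped to a list of `(pos, occurrence_cost)` pairs), the `connection_cost` hook (defaulting to zero), the tag names, and every cost value are hypothetical. The second call illustrates the additional registration of claim 10, where a new morpheme with an estimated cost changes the selected path.

```python
import math

def build_lattice(text, dictionary):
    """Lattice as: start index -> list of (end index, surface, pos, cost)."""
    lattice = {i: [] for i in range(len(text))}
    for i in range(len(text)):
        for j in range(i + 1, len(text) + 1):
            for pos, cost in dictionary.get(text[i:j], []):
                lattice[i].append((j, text[i:j], pos, cost))
    return lattice

def lowest_cost_path(text, dictionary, connection_cost=lambda prev_pos, pos: 0):
    """Return (total cost, [(surface, pos), ...]) or None if no path covers text."""
    lattice = build_lattice(text, dictionary)
    # best[i][pos] = (cost of cheapest path ending at i with tag pos, back-pointer)
    best = [dict() for _ in range(len(text) + 1)]
    best[0]["BOS"] = (0, None)
    for i in range(len(text)):
        for prev_pos, (prev_cost, _) in best[i].items():
            for j, surface, pos, cost in lattice[i]:
                total = prev_cost + connection_cost(prev_pos, pos) + cost
                if total < best[j].get(pos, (math.inf, None))[0]:
                    best[j][pos] = (total, (i, prev_pos, surface))
    if not best[len(text)]:
        return None
    pos, (total, back) = min(best[len(text)].items(), key=lambda kv: kv[1][0])
    path = []
    while back is not None:  # follow back-pointers to the start of the text
        i, prev_pos, surface = back
        path.append((surface, pos))
        pos, back = prev_pos, best[i][prev_pos][1]
    return total, path[::-1]

# Toy dictionary with made-up occurrence costs.
BASE = {
    "東": [("noun", 4000)],
    "東京": [("noun", 3000)],
    "京都": [("noun", 3200)],
    "都": [("suffix", 3500)],
    "に": [("particle", 1000)],
}
# Same dictionary after additionally registering a new morpheme whose cost
# would come from the regression model (3100 is a placeholder value).
EXTENDED = {**BASE, "東京都": [("noun", 3100)]}

if __name__ == "__main__":
    print(lowest_cost_path("東京都に", BASE))
    print(lowest_cost_path("東京都に", EXTENDED))
```

Keying the best-path table by both position and part of speech lets a non-zero `connection_cost` depend on adjacent tags; with the default zero cost, the path cost reduces to the sum of occurrence costs.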
Morphological analysis · CPC title
using statistical methods · CPC title
Dictionaries · CPC title