Treebank synthesis for training production parsers

US11769007B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11769007-B2
Application number	US-202117303349-A
Country	US
Kind code	B2
Filing date	May 27, 2021
Priority date	May 27, 2021
Publication date	Sep 26, 2023
Grant date	Sep 26, 2023

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

An approach for generating synthetic treebanks to be used in training a parser in a production system is provided. A processor receives a request to generate one or more synthetic treebanks from a production system, wherein the request indicates a language for the one or more synthetic treebanks. A processor retrieves at least one corpus of text in which the requested language is present. A processor provides the at least one corpus to a transformer enhanced parser neural network model. A processor generates at least one synthetic treebank associated with a string of text from the at least one corpus of text in which the requested language is present. A processor sends the at least one synthetic treebank to the production system, wherein the production system trains a parser utilized by the production system with the at least one synthetic treebank.
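The request-to-treebank flow in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: every name here (`TreebankRequest`, `retrieve_corpora`, `toy_parser`, `generate_synthetic_treebank`, `send_to_production`) is hypothetical, and `toy_parser` is a trivial stand-in for the transformer enhanced parser neural network model, which the abstract does not specify in detail.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# One dependency arc: (token, head_index, relation).
# head_index is 1-based; 0 marks the root. This layout is an assumption.
Arc = Tuple[str, int, str]
Tree = List[Arc]
Treebank = List[Tuple[str, Tree]]


@dataclass
class TreebankRequest:
    language: str  # language indicated by the production system's request


def retrieve_corpora(corpora: Dict[str, List[str]], language: str) -> List[str]:
    """Return the stored sentences in which the requested language is present."""
    return corpora.get(language, [])


def toy_parser(sentence: str) -> Tree:
    """Hypothetical stand-in for the parser model: the first token is the
    root and every other token attaches to it."""
    tokens = sentence.split()
    return [(tok, 0 if i == 0 else 1, "root" if i == 0 else "dep")
            for i, tok in enumerate(tokens)]


def generate_synthetic_treebank(request: TreebankRequest,
                                corpora: Dict[str, List[str]],
                                parser: Callable[[str], Tree] = toy_parser) -> Treebank:
    """Pair each retrieved sentence with the tree the parser produces for it."""
    sentences = retrieve_corpora(corpora, request.language)
    return [(s, parser(s)) for s in sentences]


def send_to_production(treebank: Treebank,
                       train: Callable[[Treebank], None]) -> None:
    """Hand the synthetic treebank to the production system's training hook."""
    train(treebank)


# Usage: a production system asks for English and "trains" by collecting trees.
corpora = {"en": ["dogs bark loudly"]}
treebank = generate_synthetic_treebank(TreebankRequest("en"), corpora)
received: Treebank = []
send_to_production(treebank, received.extend)
```

Passing the parser and the training hook as callables keeps the sketch independent of any particular model or production parser; a real system would substitute both.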

First claim

Opening claim text (preview).

What is claimed is:
1. A computer-implemented method for generating synthetic treebanks to be used in training a parser in a production system, the method comprising: receiving, by one or more processors, a request to generate one or more synthetic treebanks from a production system, wherein the request indicates a language for the one or more synthetic treebanks; retrieving, by the one or more processors, at least one corpus of text in which the requested language is present; providing, by the one or more processors, the at least one corpus to a transformer enhanced parser neural network model; generating, by the one or more processors, at least one synthetic treebank associated with a string of text from the at least one corpus of text in which the requested language is present, wherein the at least one synthetic treebank is generated with unsupervised training of the transformer enhanced parser neural network model; and sending, by the one or more processors, the at least one synthetic treebank to the production system, wherein the production system trains a parser utilized by the production system with the at least one synthetic treebank.
2. The computer-implemented method of claim 1, wherein the at least one corpus of text includes a corpus directed towards a limited language or domain.
3. The computer-implemented method of claim 2, wherein the transformer enhanced parser neural network model includes one of the following pretrained transformer models: a bidirectional encoder representations for transformers (BERT) model or a cross-lingual language model (XLM).
4. The computer-implemented method of claim 1, the transformer enhanced parser neural network model includes a neural-network parser.
5. The computer-implemented method of claim 4, wherein the parser utilized by the production system is of lower quality than the neural-network parser.
6. The computer-implemented method of claim 1, wherein the transformer enhanced parser neural network model separates one or more words of the at least one corpus of text into subwords.
7. A computer program product for generating synthetic treebanks to be used in training of a parser in a production system, the computer program product comprising: one or more computer-readable storage media and program instructions stored on the one or more computer-readable storage media, the program instructions comprising: program instructions to receive a request to generate one or more synthetic treebanks from a production system, wherein the request indicates a language for the one or more synthetic treebanks; program instructions to retrieve at least one corpus of text in which the requested language is present; program instructions to provide the at least one corpus to a transformer enhanced parser neural network model; program instructions to generate at least one synthetic treebank associated with a string of text from the at least one corpus of text in which the requested language is present, wherein the at least one synthetic treebank is generated with unsupervised training of the transformer enhanced parser neural network model; and program instructions to send the at least one synthetic treebank to the production system, wherein the production system trains a parser utilized by the production system with the at least one synthetic treebank.
8. The computer program product of claim 7, wherein the at least one corpus of text includes a corpus directed towards a limited language or domain.
9. The computer program product of claim 8, wherein the transformer enhanced parser neural network model includes one of the following pretrained transformer models: a bidirectional encoder representations for transformers (BERT) model or a cross-lingual language model (XLM).
10. The computer program product of claim 7, the transformer enhanced parser neural network model includes a neural-network parser.
11. The computer program product of claim 10, wherein the parser utilized by the production system is of lower quality than the neural-network parser.
12. The computer program product of claim 7, wherein the transformer enhanced parser neural network model separates one or more words of the at least one corpus of text into subwords.
13. A computer system for generating synthetic treebanks to be used in training of a parser in a production system, the computer system comprising: one or more computer processors; one or more computer readable storage media; and program instructions stored on the computer readable storage media for execution by at least one of the one or more processors, the program instructions comprising: program instructions to receive a request to generate one or more synthetic treebanks from a production system, wherein the request indicates a language for the one or more synthetic treebanks; program instructions to retrieve at least one corpus of text in which the requested language is present; program instructions to provide the at least one corpus to a transformer enhanced parser neural network model; program instructions to generate at least one synthetic treebank associated with a string of text from the at least one corpus of text in which the requested language is present, wherein the at least one synthetic treebank is generated with unsupervised training of the transformer enhanced parser neural network model; and program instructions to send the at least one synthetic treebank to the production system, wherein the production system trains a parser utilized by the production system with the at least one synthetic treebank.
14. The computer system of claim 13, wherein the at least one corpus of text includes a corpus directed towards a limited language or domain.
15. The computer system of claim 14, wherein the transformer enhanced parser neural network model includes one of the following pretrained transformer models: a bidirectional encoder representations for transformers (BERT) model or a cross-lingual language model (XLM).
16. The computer system of claim 13, the transformer enhanced parser neural network model includes a neural-network parser.
17. The computer system of claim 16, wherein the parser utilized by the production system is of lower quality than the neural-network parser.
18. The computer system of claim 13, wherein the transformer enhanced parser neural network model separates one or more words of the at least one corpus of text into subwords.
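Claims 6, 12, and 18 refer to separating words into subwords. The claim text shown here does not say which scheme is used, so the following is only an illustrative sketch of one common approach: greedy longest-match-first splitting in the style of WordPiece, the tokenization used by BERT, where `##` marks a piece that continues a word. The function name `split_into_subwords` and the tiny `VOCAB` set are hypothetical.

```python
from typing import List, Set


def split_into_subwords(word: str, vocab: Set[str], unk: str = "[UNK]") -> List[str]:
    """Greedy longest-match-first split of one word into vocabulary pieces.

    Pieces after the first carry a '##' prefix. If some position cannot be
    matched at all, the whole word maps to the unknown token.
    """
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical vocabulary, chosen only for this example.
VOCAB = {"tree", "##bank", "##s", "pars", "##er"}

print(split_into_subwords("treebanks", VOCAB))  # ['tree', '##bank', '##s']
print(split_into_subwords("parser", VOCAB))     # ['pars', '##er']
```

Splitting rare words into known pieces lets a model with a fixed vocabulary still represent words it has never seen whole, which matters for the limited languages or domains mentioned in claims 2, 8, and 14.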

Assignees

IBM

Inventors

Classifications

G06N3/09
Supervised learning · CPC title
G06N3/0895
Weakly supervised learning, e.g. semi-supervised or self-supervised learning · CPC title
G06N3/0455
Auto-encoder networks; Encoder-decoder networks · CPC title
G06F40/205 (Primary)
Parsing · CPC title
G06F40/47
Machine-assisted translation, e.g. using translation memory · CPC title

Patent family

Related publications grouped by family.

Patent family 84194037

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11769007B2 cover?
An approach for generating synthetic treebanks to be used in training a parser in a production system is provided. A processor receives a request to generate one or more synthetic treebanks from a production system, wherein the request indicates a language for the one or more synthetic treebanks. A processor retrieves at least one corpus of text in which the requested language is present. A pro…
Who is the assignee on this patent?
IBM
What technology area does this patent fall under?
Primary CPC classification G06F40/205 (Parsing). Mapped technology areas include Physics (CPC section G).
When was this patent published?
Publication date Sep 26, 2023 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 3 related publications on this page (citations in our corpus or others sharing the same primary CPC).

Related patents

Cross Data Set Knowledge Distillation for Training Machine Learning Models

Applied artificial intelligence technology for using natural language processing and concept expression templates to train a natural language generation system

Deep analysis of natural language questions for question answering system
