Linguistically rich cross-lingual text event embeddings

US11227128B2 · US · B2

Patent metadata
Publication number: US-11227128-B2
Application number: US-201916434710-A
Country: US
Kind code: B2
Filing date: Jun 7, 2019
Priority date: Jun 7, 2019
Publication date: Jan 18, 2022
Grant date: Jan 18, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A machine accesses a preexisting set of natural language text documents in multiple natural languages. Each natural language text document in at least a portion of the preexisting set is associated with an event. The machine trains, using the preexisting set of natural language text documents and the associated events, an event encoder to learn associations between texts and event annotations. The event encoder leverages a parser in each of the two or more natural languages. The machine generates, using the event encoder, new event annotations for texts. The machine trains, using the preexisting set of natural language text documents and the new event annotations for the texts generated by the event encoder, an event extraction engine to extract events from natural language texts in the two or more natural languages. The event extraction engine leverages the parser in each of the two or more natural languages.
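The abstract describes a self-training style pipeline. An encoder learns text-to-event associations from the annotated documents and then labels more texts, and the extraction engine is retrained on the original plus the newly annotated texts. The sketch below is a toy illustration under heavy assumptions, not the patented method. A tiny hand-written lexicon (`LEXICON`, hypothetical) stands in for the per-language parsers by mapping English and Spanish words to shared concepts. Simple feature-vote counting (`train`, `predict`, `self_train`, all hypothetical names) stands in for the neural encoder and extractor.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Optional, Tuple

Doc = Tuple[str, str, Optional[str]]  # (language, text, event type or None if unannotated)
Model = Dict[str, Counter]            # feature -> counts of co-occurring event types
Featurizer = Callable[[str, str], List[str]]

# Hypothetical stand-in for per-language parsers: surface word -> language-neutral concept.
LEXICON = {
    "en": {"attacked": "ATTACK", "bombed": "ATTACK", "met": "MEET"},
    "es": {"atacaron": "ATTACK", "bombardearon": "ATTACK", "visitó": "MEET"},
}


def parse(lang: str, text: str) -> List[str]:
    table = LEXICON.get(lang, {})
    return [table[w] for w in text.lower().split() if w in table]


def features(lang: str, text: str) -> List[str]:
    # Extractor features: parser concepts plus language-tagged surface words.
    return parse(lang, text) + [f"{lang}:{w}" for w in text.lower().split()]


def train(docs: List[Doc], featurize: Featurizer) -> Model:
    # Count how often each feature co-occurs with each annotated event type.
    votes: Model = defaultdict(Counter)
    for lang, text, event in docs:
        if event is not None:
            for f in featurize(lang, text):
                votes[f][event] += 1
    return votes


def predict(model: Model, featurize: Featurizer, lang: str, text: str) -> Optional[str]:
    score: Counter = Counter()
    for f in featurize(lang, text):
        score.update(model.get(f, Counter()))
    return score.most_common(1)[0][0] if score else None


def self_train(docs: List[Doc]) -> Tuple[Model, Model]:
    encoder = train(docs, parse)  # learn text -> event associations from gold labels
    augmented = [  # new event annotations for unannotated texts
        (lang, text, event if event is not None else predict(encoder, parse, lang, text))
        for lang, text, event in docs
    ]
    extractor = train(augmented, features)  # retrain on gold + new annotations
    return encoder, extractor


corpus: List[Doc] = [
    ("en", "rebels attacked the convoy", "Attack"),
    ("en", "militants bombed the bridge", "Attack"),
    ("en", "the ministers met in geneva", "Meet"),
    ("es", "los rebeldes atacaron el convoy", None),
    ("es", "el ministro visitó la capital", None),
]
encoder, extractor = self_train(corpus)
query = "los rebeldes en el convoy"  # contains no LEXICON word
print(predict(encoder, parse, "es", query))       # None
print(predict(extractor, features, "es", query))  # Attack
```

The last two lines show why the second stage matters in this toy setup. The Spanish query contains no lexicon word, so the encoder abstains. The extractor, however, has picked up Spanish surface cues such as "rebeldes" from the automatically annotated Spanish documents.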

First claim

Opening claim text (preview).

What is claimed is:

1. An event extraction training apparatus, the apparatus comprising: processing circuitry and memory; the processing circuitry to: access a preexisting set of natural language text documents in two or more natural languages, wherein each natural language text document in at least a portion of the preexisting set is associated with an event; train, using the preexisting set of natural language text documents and the associated events, an event encoder to learn associations between texts and event annotations, wherein the event encoder leverages a parser in each of the two or more natural languages, wherein the preexisting set of natural language text documents comprises more than a training adequacy threshold number of texts annotated with a first event type in a first natural language and fewer than a training inadequacy threshold number of texts annotated with the first event type in a second natural language, wherein the training inadequacy threshold number is less than the training adequacy threshold number; generate, using the event encoder, new event annotations for texts; train, using the preexisting set of natural language text documents and the new event annotations for the texts generated by the event encoder, an event extraction engine to extract events from natural language texts in the two or more natural languages, wherein the event extraction engine leverages the parser in each of the two or more natural languages; and provide an output representing the trained event extraction engine, wherein the trained event extraction engine is trained to extract events of the first event type from texts in the first natural language and texts in the second natural language.

2. The event extraction training apparatus of claim 1, wherein each event comprises one or more trigger words and one or more arguments.

3. The event extraction training apparatus of claim 2, wherein the one or more arguments comprise one or more of: an agent/subject of the event, a patient/object of the event, a geographic location of the event, a time of the event, and an artifact of the event.

4. The event extraction training apparatus of claim 2, wherein the one or more trigger words comprise one or more verbs representing an action of the event.

5. The event extraction training apparatus of claim 2, wherein each event is represented as a numeric vector representing the one or more trigger words and the one or more arguments.

6. The event extraction training apparatus of claim 1, wherein the processing circuitry is further to: receive a new natural language text in one of the two or more natural languages; identify, using the event extraction engine, a new event in the new natural language text; and provide an output representing the new event.

7. The event extraction training apparatus of claim 1, wherein the parser comprises one or more of: a grammatical parser and a semantic parser.

8. An event extraction inferencing apparatus, the apparatus comprising: processing circuitry and memory; the processing circuitry to: receive a new natural language text; identify, using an event extraction engine, a new event in the new natural language text; and provide an output representing the new event, wherein the event extraction engine is trained by: accessing, at a training apparatus, a preexisting set of natural language text documents in two or more natural languages, wherein each natural language text document in at least a portion of the preexisting set is associated with an event, and wherein the new natural language text is in one of the two or more natural languages; training, using the preexisting set of natural language text documents and the associated events, an event encoder to learn associations between texts and event annotations, wherein the event encoder leverages a parser in each of the two or more natural languages, wherein the preexisting set of natural language text documents comprises more than a training adequacy threshold number of texts annotated with a first event type in a first natural language and fewer than a training inadequacy threshold number of texts annotated with the first event type in a second natural language, wherein the training inadequacy threshold number is less than the training adequacy threshold number; generating, using the event encoder, new event annotations for texts; and training, using the preexisting set of natural language text documents and the new event annotations for the texts generated by the event encoder, the event extraction engine to extract events from natural language texts in the two or more natural languages, wherein the event extraction engine leverages the parser in each of the two or more natural languages, wherein the trained event extraction engine is trained to extract events of the first event type from texts in the first natural language and texts in the second natural language.

9. The event extraction inferencing apparatus of claim 8, wherein each event comprises one or more trigger words and one or more arguments.

10. The event extraction inferencing apparatus of claim 8, wherein the parser comprises one or more of: a grammatical parser and a semantic parser.

11. A non-transitory machine-readable medium storing instructions which, when executed by processing circuitry of one or more machines, cause the processing circuitry to: access a preexisting set of natural language text documents in two or more natural languages, wherein each natural language text document in at least a portion of the preexisting set is associated with an event; train, using the preexisting set of natural language text documents and the associated events, an event encoder to learn associations between texts and event annotations, wherein the event encoder leverages a parser in each of the two or more natural languages, wherein the preexisting set of natural language text documents comprises more than a training adequacy threshold number of texts annotated with a first event type in a first natural language and fewer than a training inadequacy threshold number of texts annotated with the first event type in a second natural language, wherein the training inadequacy threshold number is less than the training adequacy threshold number; generate, using the event encoder, new event annotations for texts; train, using the preexisting set of natural language text documents and the new event annotations for the texts generated by the event encoder, an event extraction engine to extract events from natural language texts in the two or more natural languages, wherein the event extraction engine leverages the parser in each of the two or more natural languages; and provide an output representing the trained event extraction engine, wherein the trained event extraction engine is trained to extract events of the first event type from texts in the first natural language and texts in the second natural language.

12. The machine-readable medium of claim 11, wherein each event comprises one or more trigger words and one or more arguments.

13. The machine-readable medium of claim 12, wherein the one or more arguments comprise one or more of: an agent/subject of the event, a patient/object of the event, a geographic location of the event, a time of the event, and an artifact of the event.

14. The machine-readable medium of claim 12, wherein the one or more trigger words comprise one or more verbs representing an action of the event.

15. The machine-readable medium of claim 12, wherein each event is represented as a numeric vector representing the one or more trigger words an…
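The distinguishing limitation of claim 1 is a data condition. The corpus must have more than an adequacy threshold of texts annotated with an event type in one language, and fewer than a smaller inadequacy threshold in another language. The following is a minimal sketch of checking that condition, assuming events are stored as trigger words plus role-labelled arguments along the lines of claims 2 to 4. The `Event` class, the role names, `transfer_candidates`, and the threshold values 100 and 10 are illustrative assumptions, because the claims give no concrete numbers.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple

# Argument roles paraphrased from claim 3 (the short names are our own labels).
ROLES = {"agent", "patient", "location", "time", "artifact"}


@dataclass
class Event:
    event_type: str
    triggers: List[str]  # claim 4: typically verbs naming the action
    arguments: Dict[str, str] = field(default_factory=dict)  # role -> filler text

    def __post_init__(self) -> None:
        unknown = set(self.arguments) - ROLES
        if unknown:
            raise ValueError(f"unknown argument roles: {sorted(unknown)}")


def transfer_candidates(
    corpus: Iterable[Tuple[str, Event]],
    languages: List[str],
    adequacy: int,
    inadequacy: int,
) -> List[Tuple[str, str, str]]:
    """Return (event_type, rich_language, poor_language) triples meeting claim 1's condition."""
    if not inadequacy < adequacy:
        raise ValueError("the inadequacy threshold must be below the adequacy threshold")
    counts = Counter((lang, ev.event_type) for lang, ev in corpus)
    event_types = sorted({etype for _, etype in counts})
    result = []
    for etype in event_types:
        for rich in languages:
            for poor in languages:
                if (
                    rich != poor
                    and counts[(rich, etype)] > adequacy
                    and counts[(poor, etype)] < inadequacy
                ):
                    result.append((etype, rich, poor))
    return result


corpus = (
    [("en", Event("Attack", ["attacked"], {"agent": "rebels", "patient": "convoy"}))] * 120
    + [("es", Event("Attack", ["atacaron"], {"agent": "rebeldes"}))] * 3
    + [("en", Event("Meet", ["met"], {"agent": "ministers", "location": "Geneva"}))] * 40
)
print(transfer_candidates(corpus, ["en", "es"], adequacy=100, inadequacy=10))
# [('Attack', 'en', 'es')]
```

In this toy corpus only "Attack" qualifies. English has 120 examples (above 100) and Spanish has 3 (below 10). "Meet" has too few English examples to count as adequately covered.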

Assignees

Raytheon BBN Technologies Corp

Classifications

  • Feedforward networks

  • Weakly supervised learning, e.g. semi-supervised or self-supervised learning

  • Supervised learning

  • Recurrent networks, characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]

  • Architecture, e.g. interconnection topology
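Claim 5 represents each event as a numeric vector built from its trigger words and arguments. The classifications list feedforward and LSTM/GRU networks, which take fixed-size vectors as input. The claims do not say how the vector is computed. The sketch below assumes feature hashing over role-tagged features, which is our own illustrative choice rather than the patent's method. The names `event_vector`, `DIM`, and the `role=filler` feature format are hypothetical.

```python
import hashlib
from typing import Dict, List

DIM = 16  # hypothetical vector size; practical systems use larger or learned embeddings


def _bucket(feature: str, dim: int) -> int:
    # Stable hash (unlike Python's salted hash()) so vectors are reproducible across runs.
    digest = hashlib.md5(feature.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "little") % dim


def event_vector(triggers: List[str], arguments: Dict[str, str], dim: int = DIM) -> List[float]:
    """Count role-tagged trigger and argument features into a fixed-size vector."""
    features = [f"trigger={t.lower()}" for t in triggers]
    features += [f"{role}={filler.lower()}" for role, filler in arguments.items()]
    vec = [0.0] * dim
    for feat in features:
        vec[_bucket(feat, dim)] += 1.0
    return vec


v = event_vector(["attacked"], {"agent": "rebels", "patient": "convoy"})
print(len(v), sum(v))  # 16 3.0
```

Tagging each feature with its role keeps "rebels" as an agent distinct from "rebels" as a patient. With so few buckets, hash collisions are likely, which is one reason learned embeddings are usually preferred in practice.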


Frequently asked questions

Who is the assignee on this patent?
Raytheon BBN Technologies Corp
What technology area does this patent fall under?
Primary CPC classification G06F40/216. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jan 18, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).