System and method for language-independent contextual embedding

US11170169B2 · US · B2

Patent metadata
Field               Value
Publication number  US-11170169-B2
Application number  US-201916369437-A
Country             US
Kind code           B2
Filing date         Mar 29, 2019
Priority date       Mar 29, 2019
Publication date    Nov 9, 2021
Grant date          Nov 9, 2021

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Disclosed is a system for language-independent contextual embedding of entities in a document that includes sentences. The system has a database and a processing arrangement. The processing arrangement has a tokenizer module for tokenizing sentences to obtain tokens, an encoder module for determining character coordinate corresponding to the tokens, wherein the character coordinates corresponding to the tokens occur in a multi-dimensional hierarchical space. The system has a transmutation module for processing the character coordinates to generate contextual embeddings thereof in the multi-dimensional hierarchical space and a prediction module for memorizing sequential information pertaining to the contextual embeddings of the character coordinates.
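The first stages named in the abstract (tokenizing, then mapping characters to coordinates in a hierarchical space) can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation. It assumes the Poincaré ball as one example of a space suited to hierarchies; the abstract does not name a specific geometry. The functions `tokenize`, `char_coordinate`, `poincare_distance`, and `contextualize` are hypothetical names, the code-point-based coordinate scheme is invented for illustration, and the neighbour averaging merely stands in for the learned transmutation module.

```python
import math
import re


def tokenize(sentence):
    """Split a sentence into word tokens (hypothetical regex tokenizer).

    In Python 3, \\w matches Unicode word characters for str patterns,
    so the same rule applies to any script.
    """
    return re.findall(r"\w+", sentence)


def char_coordinate(ch, dim=3):
    """Map one character to a point strictly inside the unit ball.

    Assumption: direction and radius are both derived from the Unicode
    code point, so no language-specific vocabulary is needed.
    """
    code = ord(ch)
    raw = [math.sin(code * (k + 1)) for k in range(dim)]
    norm = math.sqrt(sum(v * v for v in raw)) or 1.0
    radius = (code % 97 + 1) / 99.0  # always in (0, 1)
    return [radius * v / norm for v in raw]


def poincare_distance(u, v):
    """Distance between two points in the Poincare ball model."""
    diff = sum((a - b) ** 2 for a, b in zip(u, v))
    nu = sum(a * a for a in u)
    nv = sum(b * b for b in v)
    return math.acosh(1.0 + 2.0 * diff / ((1.0 - nu) * (1.0 - nv)))


def contextualize(coords, window=1):
    """Stand-in for learned transmutation layers: average each character
    coordinate with its neighbours. The ball is convex, so every averaged
    point stays inside it."""
    out = []
    for i in range(len(coords)):
        ctx = coords[max(0, i - window): i + window + 1]
        dim = len(coords[i])
        out.append([sum(p[k] for p in ctx) / len(ctx) for k in range(dim)])
    return out


tokens = tokenize("Aspirin inhibits COX-1.")
coords = {tok: [char_coordinate(c) for c in tok] for tok in tokens}
embeddings = {tok: contextualize(pts) for tok, pts in coords.items()}
```

Working at character level means an unseen word in any language still receives coordinates, which is one plausible reading of "language-independent" here; a real system would learn both the coordinates and the context mixing from data.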

First claim

Opening claim text (preview).

What is claimed is:

1. A system for character based contextual embedding of entities in a document, the document comprising a plurality of sentences, wherein the system comprises: a database; and a processor communicably coupled, via one or more data communication networks, to the database, wherein the processor is configured to: tokenize each of the plurality of sentences of the document to obtain a plurality of tokens; determine at least one character coordinate corresponding to each of the plurality of tokens utilizing a language relating to the document, wherein each of the at least one character coordinate corresponding to each of the plurality of tokens occurs in a multi-dimensional hierarchical space; process the character coordinates corresponding to the plurality of tokens to generate contextual embeddings thereof in the multi-dimensional hierarchical space by implementing a plurality of transmutation layers, wherein the plurality of transmutation layers employ machine learning algorithm; and memorize sequential information pertaining to the contextual embeddings of the character coordinates corresponding to the plurality of tokens by implementing a plurality of prediction layers, wherein the plurality of prediction layers employ machine learning algorithms, wherein the plurality of prediction layers are trained by employing a generator-adversarial network, and wherein the generator-adversarial network is implemented by a generator neural network employing generative algorithms to create new data instances and a discriminator neural network employing discriminative algorithms to evaluate the new data instances.

2. The system of claim 1, wherein the plurality of transmutation layers and the plurality of prediction layers, employing the machine learning algorithms, are trained using unsupervised learning techniques.

3. The system of claim 2, wherein an unlabeled training dataset for the plurality of transmutation layers includes a first set of existing publications and an unlabeled training dataset for the plurality of prediction layers includes a second set of existing publications.

4. The system of claim 1, wherein the system further: determines a loss score relating to the plurality of transmutation layers and the plurality of prediction layers; and re-trains the plurality of transmutation layers and the plurality of prediction layers, for determining optimum character based contextual embedding of entities in the document.

5. The system of claim 1, wherein the database includes at least one ontology therein.

6. The system of claim 5, wherein the processor employs the at least one ontology stored in the database of the system for tokenizing each of the plurality of sentences of the document to obtain the plurality of tokens.

7. A method for character based contextual embedding of entities in a document, wherein the method is implemented via a system comprising a processor communicably coupled, via one or more data communication networks, to a database, the method comprising tokenizing each of the plurality of sentences of the document, to obtain a plurality of tokens; determining at least one character coordinate corresponding to each of the plurality of tokens utilizing a language relating to the document, wherein each of the character coordinate corresponding to each of the plurality of tokens occurs in a multi-dimensional hierarchical space; processing the character coordinates corresponding to the plurality of tokens to generate contextual embeddings thereof in the multi-dimensional hierarchical space by implementing a plurality of transmutation layers, wherein the plurality of transmutation layers employ machine learning algorithms; and memorizing sequential information pertaining to the contextual embeddings of the character coordinates corresponding to the plurality of tokens by implementing a plurality of prediction layers, wherein the plurality of prediction layers employ machine learning algorithms, wherein the plurality of prediction layers are trained by employing a generator-adversarial network, and wherein the generator-adversarial network is implemented by a generator neural network employing generative algorithms to create new data instances and a discriminator neural network employing discriminative algorithms to evaluate the new data instances.

8. The method of claim 7, wherein the method employs training the plurality of transmutation layers and the plurality of prediction layers employing machine learning algorithms using unsupervised learning techniques.

9. The method of claim 8, wherein an unlabeled training dataset for the plurality of transmutation layers includes a first set of existing publications and an unlabeled training dataset for the plurality of prediction layers includes a second set of existing publications.

10. The method of claim 7, wherein the method further includes: determining a loss score relating to the plurality of transmutation layers and the plurality of prediction layers; and re-training the plurality of transmutation layers and the plurality of prediction layers, for determining optimum character based contextual embedding of entities in the document.

11. The method of claim 7, wherein database includes at least one ontology therein.

12. The method of claim 11, wherein the method employs the at least one ontology stored in the database of the system for tokenizing each of the plurality of sentences of the document to obtain the plurality of tokens.

13. A computer program product comprising non-transitory computer-readable storage media having computer-readable instructions stored thereon, the computer-readable instructions being executable by a computerized device comprising processing hardware to execute a method of claim 7.
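The generator and discriminator roles recited in claims 1 and 7, and the loss-score-then-retrain loop of claims 4 and 10, can be illustrated with a deliberately tiny example. Everything below is an assumption made for illustration: the claims do not specify architectures, losses, or optimizers, so the scalar affine generator, the logistic discriminator, the non-saturating generator loss, and the names `ToyGAN`, `train_step`, and `train` are all hypothetical. The sketch does not show how the patent's prediction layers are coupled to the adversarial network.

```python
import math
import random

EPS = 1e-12  # keeps log() away from zero


def sigmoid(x):
    x = max(-30.0, min(30.0, x))  # clamp so the output stays strictly in (0, 1)
    return 1.0 / (1.0 + math.exp(-x))


class ToyGAN:
    """Hypothetical 1-D GAN: generator g(z) = a*z + b creates new data
    instances, discriminator d(x) = sigmoid(w*x + c) evaluates them."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.a, self.b = 1.0, 0.0  # generator parameters
        self.w, self.c = 0.1, 0.0  # discriminator parameters

    def generate(self, z):
        return self.a * z + self.b

    def discriminate(self, x):
        return sigmoid(self.w * x + self.c)

    def train_step(self, real, lr=0.05):
        z = self.rng.gauss(0.0, 1.0)
        fake = self.generate(z)
        d_real, d_fake = self.discriminate(real), self.discriminate(fake)
        # Discriminator loss: -log d(real) - log(1 - d(fake)).
        loss = -math.log(d_real + EPS) - math.log(1.0 - d_fake + EPS)
        # Discriminator update: gradient descent on that loss.
        self.w -= lr * (-(1.0 - d_real) * real + d_fake * fake)
        self.c -= lr * (-(1.0 - d_real) + d_fake)
        # Generator update: descent on -log d(fake), using the updated
        # discriminator (non-saturating loss, an assumption of this sketch).
        d_fake = self.discriminate(fake)
        self.a -= lr * (-(1.0 - d_fake) * self.w * z)
        self.b -= lr * (-(1.0 - d_fake) * self.w)
        return loss


def train(gan, real_samples, rounds=5):
    """Repeat training passes and record a mean loss score per round,
    loosely mirroring the determine-loss-then-retrain idea."""
    scores = []
    for _ in range(rounds):
        losses = [gan.train_step(x) for x in real_samples]
        scores.append(sum(losses) / len(losses))
    return scores


data_rng = random.Random(1)
real_samples = [data_rng.gauss(3.0, 0.5) for _ in range(50)]
gan = ToyGAN()
scores = train(gan, real_samples)
```

In a fuller setting the recorded scores would feed a stopping rule or trigger further re-training; this toy only records them and makes no claim about convergence.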

Assignees

Innoplexus Ag

Inventors

Classifications

  • Combinations of networks

  • Generative networks

  • Adversarial learning

  • Auto-encoder networks; Encoder-decoder networks

  • Convolutional networks [CNN, ConvNet]

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11170169B2 cover?
Disclosed is a system for language-independent contextual embedding of entities in a document that includes sentences. The system has a database and a processing arrangement. The processing arrangement has a tokenizer module for tokenizing sentences to obtain tokens, an encoder module for determining character coordinate corresponding to the tokens, wherein the character coordinates correspondi…
Who is the assignee on this patent?
Innoplexus Ag
What technology area does this patent fall under?
Primary CPC classification G06F40/30. Mapped technology areas include Physics.
When was this patent published?
Publication date Nov 9, 2021 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 5 related publications on this page (citations in our corpus or others sharing the same primary CPC).