Computer-Implemented Generation and Utilization of a Universal Encoder Component
US-2020210523-A1 · Jul 2, 2020 · US
US11170169B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11170169-B2 |
| Application number | US-201916369437-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 29, 2019 |
| Priority date | Mar 29, 2019 |
| Publication date | Nov 9, 2021 |
| Grant date | Nov 9, 2021 |
Official abstract text for this publication.
Disclosed is a system for language-independent contextual embedding of entities in a document that includes sentences. The system has a database and a processing arrangement. The processing arrangement has a tokenizer module for tokenizing sentences to obtain tokens, an encoder module for determining character coordinates corresponding to the tokens, wherein the character coordinates corresponding to the tokens occur in a multi-dimensional hierarchical space. The system has a transmutation module for processing the character coordinates to generate contextual embeddings thereof in the multi-dimensional hierarchical space and a prediction module for memorizing sequential information pertaining to the contextual embeddings of the character coordinates.
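The first three stages named in the abstract (tokenizer, encoder, transmutation) can be pictured with a minimal sketch. Everything below is an illustrative assumption rather than the patented method: the abstract does not say how character coordinates are computed, so `character_coordinates` uses a made-up 2-D mapping (position within the token, scaled Unicode code point), and `contextual_embeddings` stands in for the learned transmutation layers with a plain neighbour average. The prediction module is omitted.

```python
import re
from typing import List, Tuple

# Hypothetical 2-D stand-in for the "multi-dimensional hierarchical space".
Coordinate = Tuple[float, float]


def tokenize(sentence: str) -> List[str]:
    """Placeholder tokenizer module: lowercase and split on word characters."""
    return re.findall(r"\w+", sentence.lower())


def character_coordinates(token: str) -> List[Coordinate]:
    """Placeholder encoder module: one point per character.

    Assumption: x is the relative position in the token and y is the
    Unicode code point scaled into [0, 1). The source defines neither.
    """
    return [(i / len(token), ord(ch) / 0x110000) for i, ch in enumerate(token)]


def contextual_embeddings(coords: List[Coordinate], window: int = 1) -> List[Coordinate]:
    """Placeholder transmutation module: average each point with its neighbours."""
    out: List[Coordinate] = []
    for i in range(len(coords)):
        lo, hi = max(0, i - window), min(len(coords), i + window + 1)
        span = coords[lo:hi]
        out.append((sum(x for x, _ in span) / len(span),
                    sum(y for _, y in span) / len(span)))
    return out


def embed_document(sentences: List[str]) -> List[List[Coordinate]]:
    """Run the three placeholder stages over every token of every sentence."""
    return [contextual_embeddings(character_coordinates(tok))
            for s in sentences for tok in tokenize(s)]
```

Calling `embed_document(["Aspirin inhibits COX."])` yields one list of points per token. Because `\w` matches Unicode word characters in Python 3, the same code accepts non-Latin scripts, which is the only sense in which this toy is "language-independent".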
Opening claim text (preview).
What is claimed is:
1. A system for character based contextual embedding of entities in a document, the document comprising a plurality of sentences, wherein the system comprises: a database; and a processor communicably coupled, via one or more data communication networks, to the database, wherein the processor is configured to: tokenize each of the plurality of sentences of the document to obtain a plurality of tokens; determine at least one character coordinate corresponding to each of the plurality of tokens utilizing a language relating to the document, wherein each of the at least one character coordinate corresponding to each of the plurality of tokens occurs in a multi-dimensional hierarchical space; process the character coordinates corresponding to the plurality of tokens to generate contextual embeddings thereof in the multi-dimensional hierarchical space by implementing a plurality of transmutation layers, wherein the plurality of transmutation layers employ machine learning algorithm; and memorize sequential information pertaining to the contextual embeddings of the character coordinates corresponding to the plurality of tokens by implementing a plurality of prediction layers, wherein the plurality of prediction layers employ machine learning algorithms, wherein the plurality of prediction layers are trained by employing a generator-adversarial network, and wherein the generator-adversarial network is implemented by a generator neural network employing generative algorithms to create new data instances and a discriminator neural network employing discriminative algorithms to evaluate the new data instances.
2. The system of claim 1, wherein the plurality of transmutation layers and the plurality of prediction layers, employing the machine learning algorithms, are trained using unsupervised learning techniques.
3. The system of claim 2, wherein an unlabeled training dataset for the plurality of transmutation layers includes a first set of existing publications and an unlabeled training dataset for the plurality of prediction layers includes a second set of existing publications.
4. The system of claim 1, wherein the system further: determines a loss score relating to the plurality of transmutation layers and the plurality of prediction layers; and re-trains the plurality of transmutation layers and the plurality of prediction layers, for determining optimum character based contextual embedding of entities in the document.
5. The system of claim 1, wherein the database includes at least one ontology therein.
6. The system of claim 5, wherein the processor employs the at least one ontology stored in the database of the system for tokenizing each of the plurality of sentences of the document to obtain the plurality of tokens.
7. A method for character based contextual embedding of entities in a document, wherein the method is implemented via a system comprising a processor communicably coupled, via one or more data communication networks, to a database, the method comprising tokenizing each of the plurality of sentences of the document, to obtain a plurality of tokens; determining at least one character coordinate corresponding to each of the plurality of tokens utilizing a language relating to the document, wherein each of the character coordinate corresponding to each of the plurality of tokens occurs in a multi-dimensional hierarchical space; processing the character coordinates corresponding to the plurality of tokens to generate contextual embeddings thereof in the multi-dimensional hierarchical space by implementing a plurality of transmutation layers, wherein the plurality of transmutation layers employ machine learning algorithms; and memorizing sequential information pertaining to the contextual embeddings of the character coordinates corresponding to the plurality of tokens by implementing a plurality of prediction layers, wherein the plurality of prediction layers employ machine learning algorithms, wherein the plurality of prediction layers are trained by employing a generator-adversarial network, and wherein the generator-adversarial network is implemented by a generator neural network employing generative algorithms to create new data instances and a discriminator neural network employing discriminative algorithms to evaluate the new data instances.
8. The method of claim 7, wherein the method employs training the plurality of transmutation layers and the plurality of prediction layers employing machine learning algorithms using unsupervised learning techniques.
9. The method of claim 8, wherein an unlabeled training dataset for the plurality of transmutation layers includes a first set of existing publications and an unlabeled training dataset for the plurality of prediction layers includes a second set of existing publications.
10. The method of claim 7, wherein the method further includes: determining a loss score relating to the plurality of transmutation layers and the plurality of prediction layers; and re-training the plurality of transmutation layers and the plurality of prediction layers, for determining optimum character based contextual embedding of entities in the document.
11. The method of claim 7, wherein database includes at least one ontology therein.
12. The method of claim 11, wherein the method employs the at least one ontology stored in the database of the system for tokenizing each of the plurality of sentences of the document to obtain the plurality of tokens.
13. A computer program product comprising non-transitory computer-readable storage media having computer-readable instructions stored thereon, the computer-readable instructions being executable by a computerized device comprising processing hardware to execute a method of claim 7.
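Claims 1 and 7 describe a generator network that creates new data instances and a discriminator network that evaluates them. The toy below illustrates only that division of roles, under loud assumptions: it is a one-dimensional example with hand-derived gradients, a linear generator `g(z) = a*z + b`, a logistic discriminator `D(x) = sigmoid(w*x + c)`, and made-up "real" data drawn from a normal distribution with mean 4.0. None of these choices, names, or hyperparameters come from the patent, and the sketch does not model prediction layers or contextual embeddings.

```python
import math
import random


def sigmoid(s: float) -> float:
    # Clamp the input so math.exp cannot overflow.
    s = max(-60.0, min(60.0, s))
    return 1.0 / (1.0 + math.exp(-s))


def train_toy_gan(steps: int = 2000, lr: float = 0.01, seed: int = 0):
    """Alternate discriminator and generator updates on 1-D toy data."""
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator parameters (hypothetical initial values)
    w, c = 0.1, 0.0  # discriminator parameters (hypothetical initial values)
    for _ in range(steps):
        x_real = rng.gauss(4.0, 0.5)  # assumed "real" data distribution
        z = rng.gauss(0.0, 1.0)       # noise fed to the generator
        x_fake = a * z + b            # new data instance from the generator

        # Discriminator step: minimise -log D(x_real) - log(1 - D(x_fake)).
        d_real = sigmoid(w * x_real + c)
        d_fake = sigmoid(w * x_fake + c)
        grad_w = -(1.0 - d_real) * x_real + d_fake * x_fake
        grad_c = -(1.0 - d_real) + d_fake
        w -= lr * grad_w
        c -= lr * grad_c

        # Generator step: minimise -log D(x_fake), the non-saturating loss.
        d_fake = sigmoid(w * x_fake + c)
        grad_a = -(1.0 - d_fake) * w * z
        grad_b = -(1.0 - d_fake) * w
        a -= lr * grad_a
        b -= lr * grad_b
    return (a, b), (w, c)
```

A call such as `(a, b), (w, c) = train_toy_gan()` returns the learned parameters; the fixed seed makes runs repeatable. Whether the generator's output distribution ends up close to the assumed real one depends on the step count and learning rate, so the sketch makes no convergence claim.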
Combinations of networks · CPC title
Generative networks · CPC title
Adversarial learning · CPC title
Auto-encoder networks; Encoder-decoder networks · CPC title
Convolutional networks [CNN, ConvNet] · CPC title