US2018137090A1 · US · A1
| Field | Value |
|---|---|
| Publication number | US-2018137090-A1 |
| Application number | US-201615350355-A |
| Country | US |
| Kind code | A1 |
| Filing date | Nov 14, 2016 |
| Priority date | Nov 14, 2016 |
| Publication date | May 17, 2018 |
| Grant date | — |
Official abstract text for this publication.
Techniques for determining a similarity between text segments within a document comprising textual references are described. According to an example, a system comprises a memory that stores computer executable components; and a processor that executes the computer executable components stored in the memory. The computer executable components can comprise: an identification component that identifies a reference associated with a set of text and an extraction component that extracts the reference from the set of text. The computer executable components can also comprise an embedding component that replaces the reference with a corresponding vector.
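The three components named in the abstract (identification, extraction, embedding) can be pictured with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the publication: the regular expression that treats "Section 4.2"-style cross-references as references, the function names, and especially `toy_vector`, which derives a deterministic pseudo-vector from a hash in place of whatever learned embedding a real system would use.

```python
import hashlib
import re

# Hypothetical pattern: treats "Section 4.2" / "Article 7" style
# cross-references as the references to be identified.
REFERENCE_PATTERN = re.compile(r"\b(?:Section|Article|Clause)\s+\d+(?:\.\d+)*")


def identify_references(text):
    """Identification step: return (start, end, reference) for each match."""
    return [(m.start(), m.end(), m.group(0)) for m in REFERENCE_PATTERN.finditer(text)]


def toy_vector(reference, dims=4):
    """Stand-in embedding (an assumption, not the patent's model):
    a deterministic pseudo-vector built from the SHA-256 digest."""
    digest = hashlib.sha256(reference.encode("utf-8")).digest()
    return [round(b / 255.0, 3) for b in digest[:dims]]


def embed_references(text):
    """Extraction + embedding: cut each reference out of the text and
    put its vector in the same position."""
    segments = []
    cursor = 0
    for start, end, reference in identify_references(text):
        if start > cursor:
            segments.append(text[cursor:start])   # plain text before the reference
        segments.append(toy_vector(reference))    # vector replaces the reference
        cursor = end
    if cursor < len(text):
        segments.append(text[cursor:])            # trailing plain text
    return segments


clause = "Payment is due as set out in Section 4.2 and subject to Article 7."
result = embed_references(clause)
```

In this sketch `result` alternates plain-text segments with vectors, so downstream code can compare the vectors while leaving the surrounding wording untouched.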
Opening claim text (preview).
What is claimed is:
1. A system comprising: a memory that stores computer executable components; a processor that executes the computer executable components stored in the memory, wherein the computer executable components comprise: an identification component that identifies a reference associated with a set of text; an extraction component that extracts the reference from the set of text; and an embedding component that replaces the reference with a corresponding vector.
2. The system of claim 1, further comprising a first determination component that determines a similarity between an identified first language associated with an embedded reference and an identified second language associated with another embedded reference, wherein the similarity is based on a group of operations consisting of a cosine similarity operation and a machine learning operation, and wherein the embedded reference and the another embedded reference comprises a vector and another vector respectively that are capable of being analyzed by the group of operations.
3. The system of claim 1, wherein the identification component comprises components from the group consisting of a hyperlink identification component that identifies whether the reference is linked by a hyperlink to the set of text and a contextualization component that identifies an organizational framework of the reference within the set of text.
4. The system of claim 3, wherein the extraction component comprises a template extraction component that extracts a reference template from the organizational framework, wherein the reference template facilitates access to a set of data corresponding to the reference.
5. The system of claim 4, further comprising a template matching component that matches the reference to a location within the reference template.
6. The system of claim 1, further comprising a rule matching component that organizes one or more clauses within the set of text according to one or more clause rules.
7. The system of claim 1, further comprising an annotation component that annotates a version of one or more clauses of the set of text based on a structural rule representing grammatical requirements for a set of clauses.
8. The system of claim 1, wherein the extraction component employs a defined term extraction component that extracts a defined term from the set of text, wherein an extraction of the defined term is based on performance of a semantic parsing operation on the set of text.
9. The system of claim 8, further comprising an ontological matching component that ontologically matches the defined term to a reference term within the set of text.
10. The system of claim 9, further comprising a second determination component that determines a similarity score based on a comparison of the defined term and the reference term, wherein the similarity score represents a degree of similarity between the defined term and the reference term.
11. The system of claim 10, wherein the embedding component embeds a first version of the set of text based on the similarity score being greater than a threshold score, wherein the first version of the set of text comprises the defined term and a reference vector.
12. The system of claim 11, wherein the embedding component comprises a construction component that embeds the defined term with the reference vector based on a neural sentence embedding model, wherein the defined term is represented by a common language term based on a neural sentence embedding model.
13. A computer-implemented method, comprising: identifying, by a system operatively coupled to a processor, a reference associated with a set of text; extracting, by the system, the reference from the set of text; and embedding, by the system, a vector corresponding to the reference as a replacement for the reference.
14. The computer-implemented method of claim 13, further comprising determining, by the system, a similarity between a first language associated with an embedded reference and a second language associated with another embedded reference, wherein the similarity is based on a group of operations consisting of a cosine similarity operation and a machine learning algorithm, and wherein the embedded reference and the another embedded reference comprises a vector and another vector respectively that are capable of being analyzed by the group of operations.
15. The computer-implemented method of claim 13, further comprising extracting, by the system, a reference template from an organizational framework, wherein the reference template facilitates access to a set of data corresponding to the reference.
16. The computer-implemented method of claim 15, further comprising annotating, by the system, a version of one or more clauses of the set of text based on a structural rule representing grammatical requirements for a set of clauses.
17. The computer-implemented method of claim 13, further comprising extracting, by the system, a defined term from the set of text, wherein an extraction of the defined term is based on performance of a semantic parsing operation on the set of text.
18. A computer program product for efficiently determining textual similarities, the computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to: identify a reference associated with a set of text; extract the reference from the set of text; and embed a vector corresponding to the reference as a replacement for the reference.
19. The computer program product of claim 18, wherein the program instructions are further executable by the processor to cause the processor to: determine a similarity between a first language associated with an embedded reference and a second language associated with another embedded reference, wherein the similarity is based on a group of operations consisting of a cosine similarity operation and a machine learning algorithm, and wherein the embedded reference and the another embedded reference comprises a vector and another vector respectively that are capable of being analyzed by the group of operations.
20. The computer program product of claim 18, wherein the program instructions are further executable by the processor to cause the processor to: extract a reference template from an organizational framework, wherein the reference template facilitates access to a set of data corresponding to the reference.
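Several claims rely on a cosine similarity operation between two embedded references, and the system claims condition embedding on a similarity score exceeding a threshold score. As a rough sketch of those two ideas only, the standard cosine formula (dot product divided by the product of the vector norms) can be written as below. The example vectors, the function names, and the 0.8 threshold are assumptions chosen for illustration; the claims do not specify any of them.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here: a zero vector is similar to nothing
    return dot / (norm_u * norm_v)


def exceeds_threshold(u, v, threshold=0.8):
    """Hypothetical gate: True when the similarity score is above the threshold."""
    return cosine_similarity(u, v) > threshold


# Illustrative vectors standing in for embedded references.
first_reference = [1.0, 2.0, 3.0]
second_reference = [2.0, 4.0, 6.0]      # same direction as the first
unrelated_reference = [3.0, 0.0, -1.0]  # orthogonal to the first

same_direction = exceeds_threshold(first_reference, second_reference)
orthogonal = exceeds_threshold(first_reference, unrelated_reference)
```

Because cosine similarity depends only on direction, the first two vectors score as near-identical despite their different magnitudes, while the orthogonal vector scores zero and fails the gate.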
Semantic analysis · CPC title
Parsing · CPC title
Physics · mapped topic