Golden embeddings
US-11294974-B1 · Apr 5, 2022 · US
US11620319B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11620319-B2 |
| Application number | US-202117319940-A |
| Country | US |
| Kind code | B2 |
| Filing date | May 13, 2021 |
| Priority date | May 13, 2021 |
| Publication date | Apr 4, 2023 |
| Grant date | Apr 4, 2023 |
Abstract
Systems, methods, and computer program products for search platforms for unstructured interaction summaries. An application executing on a processor may receive a query comprising a term. The application may generate, based on an embedding vector and the term, an expanded query comprising a plurality of additional terms. The application may generate, based on a term frequency-inverse document frequency model, a vector for the expanded query and generate an entity vector for the query. The application may generate a combined vector for the query based on the entity vector and the vector for the expanded query. The application may compute, based on the combined vector for the query and a feature matrix of a corpus, a respective cosine similarity score for a plurality of results in the corpus. The application may return one or more of the plurality of results as responsive to the query based on the similarity scores.
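The abstract describes a retrieval pipeline: expand the query term with embedding neighbours, vectorize the expanded query with TF-IDF, add an entity vector, combine the two, and rank corpus entries by cosine similarity. The Python below is a minimal sketch of that flow under stated assumptions, not the patented implementation. The toy 3-dimensional embeddings, the entity list, the sample summaries, whitespace tokenization, the smoothed IDF formula, the 0.9 expansion threshold, building the query's entity vector from the original term only, and "combining" by concatenation are all hypothetical choices; the helper names (`expand_query`, `fit_idf`, `search`, etc.) are illustrative as well.

```python
import math
from collections import Counter

# Hypothetical toy data: a real system would use trained embeddings, an entity
# list mined from the corpus, and many more interaction summaries.
EMBEDDINGS = {
    "refund": [0.9, 0.1, 0.0],
    "chargeback": [0.8, 0.2, 0.1],
    "dispute": [0.7, 0.3, 0.0],
    "password": [0.0, 0.1, 0.9],
    "login": [0.1, 0.0, 0.8],
}
ENTITIES = ["bank", "password"]
DOCS = [
    "customer requested refund for duplicate charge",
    "customer opened chargeback dispute with bank",
    "customer reset password after login failure",
]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def expand_query(term, embeddings, threshold=0.9):
    """Return the term plus every word whose embedding similarity to it exceeds
    the (hypothetical) expansion threshold, highest-scoring first."""
    if term not in embeddings:
        return [term]
    scores = {w: cosine(embeddings[term], v) for w, v in embeddings.items() if w != term}
    extra = sorted((w for w, s in scores.items() if s > threshold), key=lambda w: -scores[w])
    return [term] + extra


def fit_idf(tokenized_docs):
    """Smoothed IDF (an assumed variant): log((1 + n) / (1 + df)) + 1."""
    n = len(tokenized_docs)
    df = Counter(t for toks in tokenized_docs for t in set(toks))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}


def tfidf_vector(tokens, vocab, idf):
    """Raw term counts weighted by IDF; tokens outside the vocabulary are ignored."""
    counts = Counter(tokens)
    return [counts[t] * idf[t] for t in vocab]


def entity_vector(tokens, entities):
    """1.0 where the entity appears among the tokens, else 0.0."""
    present = set(tokens)
    return [1.0 if e in present else 0.0 for e in entities]


def search(term, docs, entities, embeddings, top_k=3):
    tokenized = [d.lower().split() for d in docs]
    idf = fit_idf(tokenized)
    vocab = sorted(idf)
    # Feature matrix: TF-IDF features concatenated with entity-presence features.
    matrix = [tfidf_vector(t, vocab, idf) + entity_vector(t, entities) for t in tokenized]
    expanded = expand_query(term.lower(), embeddings)
    # Combined query vector: TF-IDF of the expanded query + entity vector of the query.
    query = tfidf_vector(expanded, vocab, idf) + entity_vector([term.lower()], entities)
    scored = sorted(((cosine(query, row), i) for i, row in enumerate(matrix)), reverse=True)
    # Keep only the top_k results with a non-zero similarity.
    return [docs[i] for s, i in scored[:top_k] if s > 0]


if __name__ == "__main__":
    print(search("refund", DOCS, ENTITIES, EMBEDDINGS, top_k=2))
```

With these toy vectors, "refund" expands to "chargeback" and "dispute", so the chargeback summary outranks the one that literally contains "refund"; this illustrates why query expansion can surface results that share no surface term with the query.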
Claims (excerpt)
What is claimed is:
1. A computer-implemented method, comprising: generating, by an application executing on a processor, a first vector for each of a plurality of text summaries in a corpus, wherein the first vector represents each term in the respective text summary as a respective feature of a plurality of features; generating, by the application, a second vector for the plurality of text summaries, wherein the second vector indicates whether each of a plurality of entities is present in the respective text summary; combining, by the application, the first vector and the second vector to produce a feature matrix for the corpus; receiving, by the application, a query comprising a term; generating, by the application based on an embedding vector and the term, an expanded query comprising a plurality of additional terms and the term; generating, by the application based on a term frequency-inverse document frequency (TF-IDF) model, a vector for the expanded query; generating, by the application, an entity vector for the query; generating, by the application, a combined vector for the query based on the entity vector and the vector for the expanded query; computing, by the application based on the combined vector for the query and the feature matrix for the corpus, a respective cosine similarity score for a plurality of results in the corpus; and returning, by the application, one or more of the plurality of results as responsive to the query based on the cosine similarity scores.
2. A non-transitory computer-readable storage medium, the computer-readable storage medium storing instructions that when executed by a processor, cause the processor to: generate, by an application executing on the processor, a first vector for each of a plurality of text summaries in a corpus, wherein the first vector represents each term in the respective text summary as a respective feature of a plurality of features; generate, by the application, a second vector for the plurality of text summaries, wherein the second vector indicates whether each of a plurality of entities is present in the respective text summary; combine, by the application, the first vector and the second vector to produce a feature matrix for the corpus; receive, by the application, a query comprising a term; generate, by the application based on an embedding vector and the term, an expanded query comprising a plurality of additional terms and the term; generate, by the application based on a term frequency-inverse document frequency (TF-IDF) model, a vector for the expanded query; generate, by the application, an entity vector for the query; generate, by the application, a combined vector for the query based on the entity vector and the vector for the expanded query; compute, by the application based on the combined vector for the query and the feature matrix for the corpus, a respective cosine similarity score for a plurality of results in the corpus; and return, by the application, one or more of the plurality of results as responsive to the query based on the cosine similarity scores.
3. A computing apparatus comprising: a processor; and a memory storing instructions that, when executed by the processor, cause the processor to: generate, by an application executing on the processor, a first vector for each of a plurality of text summaries in a corpus, wherein the first vector represents each term in the respective text summary as a respective feature of a plurality of features; generate, by the application, a second vector for the plurality of text summaries, wherein the second vector indicates whether each of a plurality of entities is present in the respective text summary; combine, by the application, the first vector and the second vector to produce a feature matrix for the corpus; receive, by the application, a query comprising a term; generate, by the application based on an embedding vector and the term, an expanded query comprising a plurality of additional terms and the term; generate, by the application based on a term frequency-inverse document frequency (TF-IDF) model, a vector for the expanded query; generate, by the application, an entity vector for the query; generate, by the application, a combined vector for the query based on the entity vector and the vector for the expanded query; compute, by the application based on the combined vector for the query and the feature matrix for the corpus, a respective cosine similarity score for a plurality of results in the corpus; and return, by the application, one or more of the plurality of results as responsive to the query based on the cosine similarity scores.
4. The computer-implemented method of claim 1, wherein generating the entity vector comprises: identifying, by the application, a first entity of the plurality of entities in the corpus; and storing, by the application in the entity vector for the query, an indication that the query is associated with the first entity of the plurality of entities in the corpus.
5. The computer-implemented method of claim 1, wherein generating the expanded query comprises: identifying, by the application based on the embedding vector and the term, a respective score for each of the plurality of additional terms; determining, by the application, a subset of the plurality of additional terms for which the score exceeds an expansion threshold; and adding, by the application, the subset of the plurality of additional terms having the score exceeding the expansion threshold to the query.
6. The computer-implemented method of claim 1, wherein the combined vector for the query comprises a plurality of features, the method further comprising: receiving, by the application, input labeling a first feature of the plurality of features as relevant to the query; receiving, by the application, input labeling a second feature of the plurality of features as not relevant to the query; removing, by the application, the second feature from the combined vector for the query; and updating, by the application, the combined vector based on the remaining plurality of features and a respective weight for each remaining feature.
7. The computer-implemented method of claim 1, wherein the cosine similarity scores are computed based on a product of the combined vector for the query and the feature matrix of the corpus.
8. The computer-implemented method of claim 1, wherein the embedding vector comprises a plurality of entries, wherein each entry of the embedding vector is associated with a respective one of the additional terms and comprises a respective score for the additional term, wherein each score is based on a similarity between the respective additional term and the term, wherein the entity vector comprises a plurality of entries, wherein each entry of the entity vector is associated with a respective entity of the plurality of entities, wherein a value of the respective entry of the entity vector indicates that the respective entity is present in the query or that the respective entity is not present in the query.
9. The computer-readable storage medium of claim 2, wherein the instructions to generate the entity vector comprise instructions that when executed by the processor cause the processor to: identify, by the application, a first entity of the plurality of entities in the corpus; and store, by the application in the entity vector for the query, an indication that the query is associated with the first entity of the plurality of entities in the corpus.
10. The computer-readable storage medium of claim 2, wherein the instructions to generate the expanded query comprises instruction
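Dependent claims 6 and 7 add a relevance-feedback step (drop a feature labeled not relevant, then update the query vector using per-feature weights) and state that the cosine scores are computed from a product of the query vector and the corpus feature matrix. The sketch below illustrates both under assumptions not found in the claims: the feature names and numbers are hypothetical, features labeled relevant get a hypothetical `boost` weight of 2.0 while others keep 1.0, and the dropped feature is removed from the matrix columns as well so the dimensions stay aligned.

```python
import math

# Hypothetical combined feature space: TF-IDF terms plus one entity flag.
FEATURES = ["refund", "chargeback", "dispute", "entity:bank"]
QUERY = [1.0, 1.0, 1.0, 0.0]
MATRIX = [
    [1.0, 0.0, 0.0, 0.0],  # summary A: mentions refund
    [0.0, 1.0, 1.0, 1.0],  # summary B: chargeback dispute, mentions bank
    [0.0, 0.0, 1.0, 0.0],  # summary C: dispute only
]


def normalize(v):
    """L2-normalize a vector; an all-zero vector is returned unchanged."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def cosine_scores(query, matrix):
    """Cosine similarity of the query against every row, computed as the
    product of the row-normalized feature matrix and the normalized query."""
    q = normalize(query)
    return [sum(a * b for a, b in zip(normalize(row), q)) for row in matrix]


def apply_feedback(features, query, matrix, relevant, not_relevant, boost=2.0):
    """Hypothetical feedback step: drop features labeled not relevant, then
    weight each remaining query feature (boost if labeled relevant, else 1.0)."""
    keep = [i for i, f in enumerate(features) if f not in not_relevant]
    new_features = [features[i] for i in keep]
    new_query = [query[i] * (boost if features[i] in relevant else 1.0) for i in keep]
    new_matrix = [[row[i] for i in keep] for row in matrix]
    return new_features, new_query, new_matrix


if __name__ == "__main__":
    print(cosine_scores(QUERY, MATRIX))  # summary B ranks first
    feats, q2, m2 = apply_feedback(FEATURES, QUERY, MATRIX, {"refund"}, {"dispute"})
    print(cosine_scores(q2, m2))  # after feedback, summary A ranks first
```

Normalizing the rows once and taking a single matrix-vector product is what lets the scores for the whole corpus be read off as "a product of the combined vector for the query and the feature matrix", as claim 7 phrases it.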
CPC classification titles:
- Machine learning
- Query expansion
- Presentation of query results
- using vector based model
- Architecture, e.g. interconnection topology