Relevant passage retrieval system
US-2019303375-A1 · Oct 3, 2019 · US
US11475067B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11475067-B2 |
| Application number | US-201916698080-A |
| Country | US |
| Kind code | B2 |
| Filing date | Nov 27, 2019 |
| Priority date | Nov 27, 2019 |
| Publication date | Oct 18, 2022 |
| Grant date | Oct 18, 2022 |
Official abstract text for this publication.
Techniques for generation of synthetic queries from customer data for training of document querying machine learning (ML) models as a service are described. A service may receive one or more documents from a user, generate a set of question and answer pairs from the one or more documents from the user using a machine learning model trained to predict a question from an answer, and store the set of question and answer pairs generated from the one or more documents from the user. The question and answer pairs may be used to train another machine learning model, for example, a document ranking model, a passage ranking model, a question/answer model, or a frequently asked question (FAQ) model.
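The flow in the abstract (receive documents, generate question and answer pairs, store them for later training) can be sketched as below. This is only an illustration under stated assumptions: `QAPair`, `split_into_passages`, `generate_qa_pairs`, and `toy_question_generator` are hypothetical names, paragraphs are assumed to serve as candidate answers, and the toy generator is a stand-in for the trained question-from-answer model, which the publication does not specify at this level of detail.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class QAPair:
    question: str
    answer: str


def split_into_passages(document: str) -> List[str]:
    """Treat each non-empty paragraph as a candidate answer (an assumption)."""
    return [p.strip() for p in document.split("\n\n") if p.strip()]


def generate_qa_pairs(
    documents: List[str],
    question_generator: Callable[[str], List[str]],
) -> List[QAPair]:
    """Pair every passage with each question the generator proposes for it.

    The generator may return several questions for one answer.
    """
    pairs: List[QAPair] = []
    for document in documents:
        for passage in split_into_passages(document):
            for question in question_generator(passage):
                pairs.append(QAPair(question=question, answer=passage))
    return pairs


def toy_question_generator(answer: str) -> List[str]:
    """Stand-in for the trained question-from-answer model; not a real model."""
    topic = answer.split(".")[0]
    return [f"What does the document say about: {topic}?"]


docs = ["Refunds are issued within 14 days.\n\nSupport is open on weekdays."]
pairs = generate_qa_pairs(docs, toy_question_generator)
for pair in pairs:
    print(pair.question, "->", pair.answer)
```

The stored `pairs` would then act as synthetic training data for a downstream model such as a passage ranker; a real system would replace the toy generator with a learned model.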
Opening claim text (preview).
What is claimed is:

1. A computer-implemented method comprising: training a language machine learning model on a first set of public documents including known question and answer pairs to predict a question from an answer in the first set of public documents; receiving a second set of private documents of a user; generating a set of question and answer pairs from the second set of private documents of the user using the language machine learning model; training a second machine learning model specifically for the user with the set of question and answer pairs generated from the second set of private documents of the user; receiving a search query from the user; generating a result for an input of the search query on data of the user using the second machine learning model; and providing the result to the user.
2. The computer-implemented method of claim 1, wherein the training comprises training the language machine learning model to predict each successive word of a known question from its known answer for the known question and answer pairs.
3. The computer-implemented method of claim 1, wherein the result is a top ranked document of the data of the user.
4. A computer-implemented method comprising: receiving a set of private documents of a user; generating a set of question and answer pairs from the set of private documents of the user using a first machine learning model trained on public documents to predict a question from an answer; training a second machine learning model specifically for the user with the set of question and answer pairs generated from the set of private documents of the user; receiving a search query from the user; generating a result for an input of the search query on data of the user using the second machine learning model; and providing the result to the user.
5. The computer-implemented method of claim 4, wherein the training the second machine learning model comprises training the second machine learning model to predict each successive word of a known question from its known answer for the set of question and answer pairs from the set of private documents of the user.
6. The computer-implemented method of claim 5, wherein the training the second machine learning model comprises training the second machine learning model to predict an end of question token for the known question from the known answer from the set of private documents.
7. The computer-implemented method of claim 4, wherein the generating the set of question and answer pairs from the set of private documents of the user, using the first machine learning model, comprises generating a plurality of questions for a single answer of at least one of the set of question and answer pairs from the set of private documents of the user.
8. The computer-implemented method of claim 4, wherein the result comprises a set of top ranked answers from the data of the user for the search query from the user.
9. The computer-implemented method of claim 8, further comprising displaying the set of top ranked answers to the user.
10. The computer-implemented method of claim 4, wherein the result comprises a set of top ranked documents from the data of the user for the search query from the user.
11. The computer-implemented method of claim 10, further comprising displaying the set of top ranked documents to the user.
12. The computer-implemented method of claim 4, wherein the result comprises a set of top ranked passages from the data of the user for the search query from the user.
13. The computer-implemented method of claim 12, further comprising displaying the set of top ranked passages to the user.
14. A system comprising: a document storage service implemented by a first one or more electronic devices to store a set of private documents from a user; and a training data generation service implemented by a second one or more electronic devices, the training data generation service including instructions that upon execution cause the training data generation service to: receive the set of private documents of the user, generate a set of question and answer pairs from the set of private documents of the user using a first machine learning model trained on public documents to predict a question from an answer, train a second machine learning model specifically for the user with the set of question and answer pairs generated from the set of private documents of the user, receive a search query from the user, generate a result for an input of the search query on data of the user using the second machine learning model, and provide the result to the user.
15. The system of claim 14, wherein the training data generation service includes instructions that upon execution cause the training data generation service to train the second machine learning model to predict each successive word of a known question from its known answer for the set of question and answer pairs from the set of private documents of the user.
16. The system of claim 15, wherein the training data generation service includes instructions that upon execution cause the training data generation service to train the second machine learning model to predict an end of question token for the known question from the known answer from the set of private documents.
17. The system of claim 14, wherein the training data generation service generates a plurality of questions for a single answer of at least one of the set of question and answer pairs from the set of private documents of the user.
18. The system of claim 14, further comprising a model building service implemented by a third one or more electronic devices, the model building service including instructions that upon execution cause the model building service to train the second machine learning model, with the set of question and answer pairs generated from the set of private documents, to determine a set of top ranked answers from data of the user for a search query from the user.
19. The system of claim 14, wherein the result comprises a set of top ranked answers from the data of the user for the search query from the user.
20. The system of claim 14, wherein the result comprises a set of top ranked documents from the data of the user for the search query from the user.
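Several dependent claims describe training a model to predict each successive word of a known question from its known answer, including an end of question token. A minimal sketch of how such training examples might be laid out follows. The whitespace tokenization, the `<sep>` and `<eoq>` token names, and the function `next_word_examples` are assumptions made for illustration; the claims do not prescribe any of them.

```python
from typing import List, Tuple

SEP = "<sep>"  # hypothetical separator between answer and question
EOQ = "<eoq>"  # hypothetical name for the end of question token


def next_word_examples(answer: str, question: str) -> List[Tuple[List[str], str]]:
    """Build (context, target) pairs for next-word prediction.

    Each context is the answer, a separator, and the question words seen so
    far; each target is the next question word. The final target is EOQ, so
    a model trained on these pairs also learns where a question stops.
    """
    answer_tokens = answer.split()
    question_tokens = question.split() + [EOQ]
    examples: List[Tuple[List[str], str]] = []
    for i, target in enumerate(question_tokens):
        context = answer_tokens + [SEP] + question_tokens[:i]
        examples.append((context, target))
    return examples


examples = next_word_examples("Paris is the capital", "What is the capital?")
for context, target in examples:
    print(" ".join(context), "=>", target)
```

Each known question of n words yields n + 1 examples, the last of which targets the end of question token. A production system would more likely use subword tokens and a sequence model rather than this explicit enumeration.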
Filtering based on additional data, e.g. user or group profiles (filtering in web context G06F16/9535, G06F16/9536) · CPC title
Natural language query formulation or dialogue systems · CPC title
based on feedback of a supervisor · CPC title
characterised by the process organisation or structure, e.g. boosting cascade · CPC title
Document management systems · CPC title