Systems, apparatuses, and methods to generate synthetic queries from customer data for training of document querying machine learning models

US11475067B2 · US · B2

Patent metadata
Publication number: US-11475067-B2
Application number: US-201916698080-A
Country: US
Kind code: B2
Filing date: Nov 27, 2019
Priority date: Nov 27, 2019
Publication date: Oct 18, 2022
Grant date: Oct 18, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques for generation of synthetic queries from customer data for training of document querying machine learning (ML) models as a service are described. A service may receive one or more documents from a user, generate a set of question and answer pairs from the one or more documents from the user using a machine learning model trained to predict a question from an answer, and store the set of question and answer pairs generated from the one or more documents from the user. The question and answer pairs may be used to train another machine learning model, for example, a document ranking model, a passage ranking model, a question/answer model, or a frequently asked question (FAQ) model.
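The flow in the abstract (receive documents, generate question and answer pairs with a model that predicts a question from an answer, store the pairs) can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the patented implementation: `QAPair`, `split_passages`, `generate_qa_pairs`, and `toy_question_model` are hypothetical names, sentence splitting stands in for whatever answer selection the service uses, and the toy model is a placeholder for the trained question-generation model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class QAPair:
    """One synthetic training example (hypothetical record layout)."""
    question: str
    answer: str
    doc_id: str


def split_passages(text: str) -> List[str]:
    """Assumption: treat every non-empty sentence as a candidate answer."""
    return [s.strip() for s in text.split(".") if s.strip()]


def generate_qa_pairs(
    documents: Dict[str, str],
    question_model: Callable[[str], List[str]],
) -> List[QAPair]:
    """Run the question-from-answer model over each candidate answer."""
    pairs: List[QAPair] = []
    for doc_id, text in documents.items():
        for answer in split_passages(text):
            # The model may return several questions for a single answer.
            for question in question_model(answer):
                pairs.append(QAPair(question, answer, doc_id))
    return pairs


def toy_question_model(answer: str) -> List[str]:
    """Placeholder for the trained model; it only templates the first word."""
    topic = answer.split()[0]
    return [f"What about {topic}?", f"Tell me about {topic}"]


docs = {"doc-1": "Refunds are issued within 5 days. Shipping is free over $50."}
pairs = generate_qa_pairs(docs, toy_question_model)

# "Store" the pairs; a real service would persist them for later training.
stored_pairs = list(pairs)
```

In this sketch the stored pairs are what a downstream model (for example, a document or passage ranking model) would be trained on; that training step is left out.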

First claim

Opening claim text (preview).

What is claimed is:

1. A computer-implemented method comprising: training a language machine learning model on a first set of public documents including known question and answer pairs to predict a question from an answer in the first set of public documents; receiving a second set of private documents of a user; generating a set of question and answer pairs from the second set of private documents of the user using the language machine learning model; training a second machine learning model specifically for the user with the set of question and answer pairs generated from the second set of private documents of the user; receiving a search query from the user; generating a result for an input of the search query on data of the user using the second machine learning model; and providing the result to the user.

2. The computer-implemented method of claim 1, wherein the training comprises training the language machine learning model to predict each successive word of a known question from its known answer for the known question and answer pairs.

3. The computer-implemented method of claim 1, wherein the result is a top ranked document of the data of the user.

4. A computer-implemented method comprising: receiving a set of private documents of a user; generating a set of question and answer pairs from the set of private documents of the user using a first machine learning model trained on public documents to predict a question from an answer; training a second machine learning model specifically for the user with the set of question and answer pairs generated from the set of private documents of the user; receiving a search query from the user; generating a result for an input of the search query on data of the user using the second machine learning model; and providing the result to the user.

5. The computer-implemented method of claim 4, wherein the training the second machine learning model comprises training the second machine learning model to predict each successive word of a known question from its known answer for the set of question and answer pairs from the set of private documents of the user.

6. The computer-implemented method of claim 5, wherein the training the second machine learning model comprises training the second machine learning model to predict an end of question token for the known question from the known answer from the set of private documents.

7. The computer-implemented method of claim 4, wherein the generating the set of question and answer pairs from the set of private documents of the user, using the first machine learning model, comprises generating a plurality of questions for a single answer of at least one of the set of question and answer pairs from the set of private documents of the user.

8. The computer-implemented method of claim 4, wherein the result comprises a set of top ranked answers from the data of the user for the search query from the user.

9. The computer-implemented method of claim 8, further comprising displaying the set of top ranked answers to the user.

10. The computer-implemented method of claim 4, wherein the result comprises a set of top ranked documents from the data of the user for the search query from the user.

11. The computer-implemented method of claim 10, further comprising displaying the set of top ranked documents to the user.

12. The computer-implemented method of claim 4, wherein the result comprises a set of top ranked passages from the data of the user for the search query from the user.

13. The computer-implemented method of claim 12, further comprising displaying the set of top ranked passages to the user.

14. A system comprising: a document storage service implemented by a first one or more electronic devices to store a set of private documents from a user; and a training data generation service implemented by a second one or more electronic devices, the training data generation service including instructions that upon execution cause the training data generation service to: receive the set of private documents of the user, generate a set of question and answer pairs from the set of private documents of the user using a first machine learning model trained on public documents to predict a question from an answer, train a second machine learning model specifically for the user with the set of question and answer pairs generated from the set of private documents of the user, receive a search query from the user, generate a result for an input of the search query on data of the user using the second machine learning model, and provide the result to the user.

15. The system of claim 14, wherein the training data generation service includes instructions that upon execution cause the training data generation service to train the second machine learning model to predict each successive word of a known question from its known answer for the set of question and answer pairs from the set of private documents of the user.

16. The system of claim 15, wherein the training data generation service includes instructions that upon execution cause the training data generation service to train the second machine learning model to predict an end of question token for the known question from the known answer from the set of private documents.

17. The system of claim 14, wherein the training data generation service generates a plurality of questions for a single answer of at least one of the set of question and answer pairs from the set of private documents of the user.

18. The system of claim 14, further comprising a model building service implemented by a third one or more electronic devices, the model building service including instructions that upon execution cause the model building service to train the second machine learning model, with the set of question and answer pairs generated from the set of private documents, to determine a set of top ranked answers from data of the user for a search query from the user.

19. The system of claim 14, wherein the result comprises a set of top ranked answers from the data of the user for the search query from the user.

20. The system of claim 14, wherein the result comprises a set of top ranked documents from the data of the user for the search query from the user.
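Several dependent claims describe training a model to predict each successive word of a known question from its known answer, ending with an end of question token. One common way to set up such training data is sketched below in Python. It is a hedged illustration only: the token strings `<sep>` and `<eoq>`, whitespace tokenization, and the function name `next_word_examples` are assumptions, and the claims do not specify any of them.

```python
from typing import List, Tuple

SEP = "<sep>"  # hypothetical marker between the answer and the question
EOQ = "<eoq>"  # hypothetical end-of-question token


def next_word_examples(answer: str, question: str) -> List[Tuple[List[str], str]]:
    """Build (context, target) pairs for next-word prediction.

    The context is the answer, a separator, and the question words seen so
    far; the target is the next question word. The final target is EOQ, so a
    model trained on these pairs can also learn when the question ends.
    """
    context = answer.split() + [SEP]
    targets = question.split() + [EOQ]
    examples: List[Tuple[List[str], str]] = []
    for target in targets:
        examples.append((list(context), target))  # copy the context so far
        context.append(target)                    # extend it for the next step
    return examples


examples = next_word_examples("Refunds take 5 days", "How long do refunds take?")
for ctx, target in examples:
    print(" ".join(ctx), "->", target)
```

Each pair would be one supervised step for a language model. At generation time, sampling words until the end-of-question token appears yields a complete question, and sampling more than once can yield several questions for the same answer.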

Assignees

Amazon Tech Inc

Inventors

Classifications

  • G06F16/335 (Primary)

    Filtering based on additional data, e.g. user or group profiles (filtering in web context G06F16/9535, G06F16/9536) · CPC title

  • Natural language query formulation or dialogue systems · CPC title

  • based on feedback of a supervisor · CPC title

  • characterised by the process organisation or structure, e.g. boosting cascade · CPC title

  • Document management systems · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11475067B2 cover?
Techniques for generation of synthetic queries from customer data for training of document querying machine learning (ML) models as a service are described. A service may receive one or more documents from a user, generate a set of question and answer pairs from the one or more documents from the user using a machine learning model trained to predict a question from an answer, and store the set…
Who is the assignee on this patent?
Amazon Tech Inc
What technology area does this patent fall under?
Primary CPC classification G06F16/335. Mapped technology areas include Physics.
When was this patent published?
Publication date Oct 18, 2022 (B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).