Systems, apparatuses, and methods to generate synthetic queries from customer data for training of document querying machine learning models

US11475067B2 · US · B2

Patent metadata
Field	Value
Publication number	US-11475067-B2
Application number	US-201916698080-A
Country	US
Kind code	B2
Filing date	Nov 27, 2019
Priority date	Nov 27, 2019
Publication date	Oct 18, 2022
Grant date	Oct 18, 2022

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Techniques for generation of synthetic queries from customer data for training of document querying machine learning (ML) models as a service are described. A service may receive one or more documents from a user, generate a set of question and answer pairs from the one or more documents from the user using a machine learning model trained to predict a question from an answer, and store the set of question and answer pairs generated from the one or more documents from the user. The question and answer pairs may be used to train another machine learning model, for example, a document ranking model, a passage ranking model, a question/answer model, or a frequently asked question (FAQ) model.
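
The flow described in the abstract (documents in, stored question and answer pairs out) can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: `QAPair`, `split_into_answers`, `toy_question_model`, and `generate_qa_pairs` are hypothetical names, naive sentence splitting stands in for answer selection, and a string template stands in for the machine learning model trained to predict a question from an answer.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class QAPair:
    """One synthetic training example: a generated question and its answer."""
    question: str
    answer: str


def split_into_answers(document: str) -> List[str]:
    # Assumption: each non-empty sentence is a candidate answer.
    # Splitting on "." is deliberately naive (it breaks on decimals).
    return [part.strip() for part in document.split(".") if part.strip()]


def toy_question_model(answer: str) -> str:
    # Hypothetical stand-in for the trained question-prediction model.
    first_words = " ".join(answer.split()[:3])
    return f"What is stated about '{first_words}'?"


def generate_qa_pairs(
    documents: Iterable[str],
    question_model: Callable[[str], str] = toy_question_model,
) -> List[QAPair]:
    """Generate one question per candidate answer in each user document."""
    pairs: List[QAPair] = []
    for document in documents:
        for answer in split_into_answers(document):
            pairs.append(QAPair(question=question_model(answer), answer=answer))
    return pairs


if __name__ == "__main__":
    user_docs = ["Returns are accepted within 30 days. Refunds take 5 business days."]
    for pair in generate_qa_pairs(user_docs):
        print(pair.question, "->", pair.answer)
```

The returned list plays the role of the stored set of pairs; in the abstract's terms, it would then serve as training data for a second model such as a document ranking, passage ranking, question/answer, or FAQ model.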

First claim

Opening claim text (preview).

What is claimed is:
1. A computer-implemented method comprising: training a language machine learning model on a first set of public documents including known question and answer pairs to predict a question from an answer in the first set of public documents; receiving a second set of private documents of a user; generating a set of question and answer pairs from the second set of private documents of the user using the language machine learning model; training a second machine learning model specifically for the user with the set of question and answer pairs generated from the second set of private documents of the user; receiving a search query from the user; generating a result for an input of the search query on data of the user using the second machine learning model; and providing the result to the user.
2. The computer-implemented method of claim 1, wherein the training comprises training the language machine learning model to predict each successive word of a known question from its known answer for the known question and answer pairs.
3. The computer-implemented method of claim 1, wherein the result is a top ranked document of the data of the user.
4. A computer-implemented method comprising: receiving a set of private documents of a user; generating a set of question and answer pairs from the set of private documents of the user using a first machine learning model trained on public documents to predict a question from an answer; training a second machine learning model specifically for the user with the set of question and answer pairs generated from the set of private documents of the user; receiving a search query from the user; generating a result for an input of the search query on data of the user using the second machine learning model; and providing the result to the user.
5. The computer-implemented method of claim 4, wherein the training the second machine learning model comprises training the second machine learning model to predict each successive word of a known question from its known answer for the set of question and answer pairs from the set of private documents of the user.
6. The computer-implemented method of claim 5, wherein the training the second machine learning model comprises training the second machine learning model to predict an end of question token for the known question from the known answer from the set of private documents.
7. The computer-implemented method of claim 4, wherein the generating the set of question and answer pairs from the set of private documents of the user, using the first machine learning model, comprises generating a plurality of questions for a single answer of at least one of the set of question and answer pairs from the set of private documents of the user.
8. The computer-implemented method of claim 4, wherein the result comprises a set of top ranked answers from the data of the user for the search query from the user.
9. The computer-implemented method of claim 8, further comprising displaying the set of top ranked answers to the user.
10. The computer-implemented method of claim 4, wherein the result comprises a set of top ranked documents from the data of the user for the search query from the user.
11. The computer-implemented method of claim 10, further comprising displaying the set of top ranked documents to the user.
12. The computer-implemented method of claim 4, wherein the result comprises a set of top ranked passages from the data of the user for the search query from the user.
13. The computer-implemented method of claim 12, further comprising displaying the set of top ranked passages to the user.
14. A system comprising: a document storage service implemented by a first one or more electronic devices to store a set of private documents from a user; and a training data generation service implemented by a second one or more electronic devices, the training data generation service including instructions that upon execution cause the training data generation service to: receive the set of private documents of the user, generate a set of question and answer pairs from the set of private documents of the user using a first machine learning model trained on public documents to predict a question from an answer, train a second machine learning model specifically for the user with the set of question and answer pairs generated from the set of private documents of the user, receive a search query from the user, generate a result for an input of the search query on data of the user using the second machine learning model, and provide the result to the user.
15. The system of claim 14, wherein the training data generation service includes instructions that upon execution cause the training data generation service to train the second machine learning model to predict each successive word of a known question from its known answer for the set of question and answer pairs from the set of private documents of the user.
16. The system of claim 15, wherein the training data generation service includes instructions that upon execution cause the training data generation service to train the second machine learning model to predict an end of question token for the known question from the known answer from the set of private documents.
17. The system of claim 14, wherein the training data generation service generates a plurality of questions for a single answer of at least one of the set of question and answer pairs from the set of private documents of the user.
18. The system of claim 14, further comprising a model building service implemented by a third one or more electronic devices, the model building service including instructions that upon execution cause the model building service to train the second machine learning model, with the set of question and answer pairs generated from the set of private documents, to determine a set of top ranked answers from data of the user for a search query from the user.
19. The system of claim 14, wherein the result comprises a set of top ranked answers from the data of the user for the search query from the user.
20. The system of claim 14, wherein the result comprises a set of top ranked documents from the data of the user for the search query from the user.
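
Claims 2, 5, and 15 recite training a model to predict each successive word of a known question from its known answer, and claims 6 and 16 add an end of question token. One way to picture the resulting training examples is sketched below in Python. The token strings `<sep>` and `<eoq>`, whitespace tokenization, and the function name `next_word_examples` are assumptions made for illustration; the claims do not specify them.

```python
from typing import List, Tuple

SEPARATOR = "<sep>"        # hypothetical marker between answer and question
END_OF_QUESTION = "<eoq>"  # hypothetical end of question token


def next_word_examples(answer: str, question: str) -> List[Tuple[List[str], str]]:
    """Build (context, target) pairs for next-word prediction.

    The context is the answer, a separator, and the question words seen
    so far. The target is the next question word; the final target is
    the end of question token.
    """
    answer_tokens = answer.split()  # assumption: whitespace tokenization
    question_tokens = question.split() + [END_OF_QUESTION]

    examples: List[Tuple[List[str], str]] = []
    for position, target in enumerate(question_tokens):
        context = answer_tokens + [SEPARATOR] + question_tokens[:position]
        examples.append((context, target))
    return examples


if __name__ == "__main__":
    for context, target in next_word_examples("30 days", "How long"):
        print(context, "->", target)
```

A question of n words yields n + 1 examples, because the model is also asked to predict where the question stops.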

Assignees

Amazon Tech Inc

Inventors

Classifications

G06F16/335 · Primary
Filtering based on additional data, e.g. user or group profiles (filtering in web context G06F16/9535, G06F16/9536) · CPC title
G06F16/90332 · Primary
Natural language query formulation or dialogue systems · CPC title
G06F18/2178
based on feedback of a supervisor · CPC title
G06F18/2148
characterised by the process organisation or structure, e.g. boosting cascade · CPC title
G06F16/93
Document management systems · CPC title

Patent family

Related publications grouped by family.

Patent family ID: 75975362

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US11475067B2 cover?: Techniques for generation of synthetic queries from customer data for training of document querying machine learning (ML) models as a service are described. A service may receive one or more documents from a user, generate a set of question and answer pairs from the one or more documents from the user using a machine learning model trained to predict a question from an answer, and store the set…
Who is the assignee on this patent?: Amazon Tech Inc
What technology area does this patent fall under?: Primary CPC classification G06F16/335. Mapped technology areas include Physics.
When was this patent published?: Publication date Oct 18, 2022 (kind code B2). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?: We list 12 related publications on this page (citations in our corpus or others sharing the same primary CPC).