Method and system for performing negotiation task using reinforcement learning agents
US-11521281-B2 · Dec 6, 2022 · US
US12086895B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-12086895-B2 |
| Application number | US-202217575908-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jan 14, 2022 |
| Priority date | Dec 21, 2021 |
| Publication date | Sep 10, 2024 |
| Grant date | Sep 10, 2024 |
Official abstract text for this publication.
Automated negotiation agent adaptation is performed by detecting change in a utility function involved in automated negotiation between a supporting agent and an opposing agent while the supporting agent operates according to a first negotiation strategy model, generating a plurality of training samples from automated negotiation between the supporting agent and the opposing agent while the supporting agent operates according to a baseline negotiation strategy model, and training an initialized negotiation strategy model using the plurality of training samples to produce a second negotiation strategy model.
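The detection step can be illustrated with a minimal sketch. The abstract does not say how change in a utility function is measured, so everything here is an assumption made for illustration: offers are encoded as numeric vectors (one value per negotiated issue), the change value is the mean absolute difference between the per-issue averages of the opposing agent's offers in two traces, and the names `mean_offer`, `change_value`, and `should_retrain` are hypothetical.

```python
from statistics import mean
from typing import Sequence

# Assumed encoding: an offer is a vector with one numeric value per issue.
Offer = Sequence[float]


def mean_offer(offers: Sequence[Offer]) -> list[float]:
    """Average each issue's value over all offers in one trace."""
    return [mean(issue_values) for issue_values in zip(*offers)]


def change_value(previous: Sequence[Offer], recent: Sequence[Offer]) -> float:
    """Assumed metric: mean absolute difference between the per-issue
    averages of the opposing agent's offers in two traces."""
    prev_avg = mean_offer(previous)
    recent_avg = mean_offer(recent)
    return mean(abs(p - r) for p, r in zip(prev_avg, recent_avg))


def should_retrain(previous: Sequence[Offer], recent: Sequence[Offer],
                   threshold: float) -> bool:
    """Trigger sample generation and training only when the estimated
    change exceeds the threshold."""
    return change_value(previous, recent) > threshold


if __name__ == "__main__":
    previous_trace = [[1.0, 0.0], [1.0, 0.0]]
    recent_trace = [[0.0, 1.0], [0.0, 1.0]]
    print(should_retrain(previous_trace, recent_trace, threshold=0.5))
```

Comparing only the opposing agent's observed offers reflects that its utility function is not directly visible to the supporting agent; any other distance between offer distributions could be substituted for the assumed metric.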
Opening claim text (preview).
What is claimed is:
1. A computer-readable medium including instructions executable by a computer to cause the computer to perform operations comprising: detecting change in a utility function involved in automated negotiation between a supporting agent and an opposing agent while the supporting agent operates according to a first negotiation strategy model; generating a plurality of training samples from automated negotiation between the supporting agent and the opposing agent while the supporting agent operates according to a baseline negotiation strategy model; and training an initialized negotiation strategy model using the plurality of training samples to produce a second negotiation strategy model.
2. The computer-readable medium of claim 1, wherein the utility function is an opposing utility function of the opposing agent.
3. The computer-readable medium of claim 2, wherein the detecting change includes: obtaining a recent negotiation trace from automated negotiation between the supporting agent and the opposing agent while the supporting agent operates according to the first negotiation strategy model, the negotiation trace including a plurality of time steps, each time step among the plurality of time steps including an opposing agent offer and a supporting agent offer, and comparing the opposing agent offers of the recent negotiation trace with opposing agent offers of a previous negotiation trace from automated negotiation between the supporting agent and the opposing agent while the supporting agent operates according to the first negotiation strategy model.
4. The computer-readable medium of claim 3, wherein the detecting further includes estimating a change value representing an amount of change in the opposing utility function, and the generating and training are performed in response to determining that the change value exceeds a threshold value.
5. The computer-readable medium of claim 2, wherein the detecting is performed periodically.
6. The computer-readable medium of claim 1, wherein the utility function is a first supporting utility function of the supporting agent.
7. The computer-readable medium of claim 6, wherein the operations further comprise: receiving a second supporting utility function; wherein the detecting change includes comparing the first supporting utility function to the second utility function.
8. The computer-readable medium of claim 7, wherein the comparing includes estimating a change value representing an amount of change between the first supporting utility function and the second supporting utility function, and the generating and training are performed in response to determining that the change value exceeds a threshold value.
9. The computer-readable medium of claim 7, wherein the detecting is performed in response to receiving the second supporting utility function.
10. The computer-readable medium of claim 1, wherein the generating a plurality of training samples includes obtaining a negotiation trace from automated negotiation between the supporting agent and the opposing agent while the supporting agent operates according to a baseline negotiation strategy model, the negotiation trace including a plurality of time steps, each time step among the plurality of time steps including an opposing agent offer and a supporting agent offer.
11. The computer-readable medium of claim 10, wherein each training sample among the plurality of samples includes a portion of consecutive time steps among the plurality of time steps from a first time step and an opposing agent offer of a subsequent time step among the plurality of time steps subsequent to the portion as an input, and further includes a supporting agent offer of the subsequent time step as a label.
12. The computer-readable medium of claim 11, wherein the generating the plurality of training samples includes: weighting each training sample based on a utility value of the finally agreed offer of the negotiation trace, the utility value obtained by applying a supporting utility function to the finally agreed offer.
13. The computer-readable medium of claim 1, wherein the training includes: generating an initialized negotiation strategy model by initializing a portion of the first negotiation strategy model with random values.
14. A computer-implemented method comprising: detecting change in a utility function involved in automated negotiation between a supporting agent and an opposing agent while the supporting agent operates according to a first negotiation strategy model; generating a plurality of training samples from automated negotiation between the supporting agent and the opposing agent while the supporting agent operates according to a baseline negotiation strategy model; and training an initialized negotiation strategy model using the plurality of training samples to produce a second negotiation strategy model.
15. The computer-implemented method of claim 14, wherein the utility function is an opposing utility function of the opposing agent.
16. The computer-implemented method of claim 15, wherein the detecting change includes: obtaining a recent negotiation trace from automated negotiation between the supporting agent and the opposing agent while the supporting agent operates according to the first negotiation strategy model, the negotiation trace including a plurality of time steps, each time step among the plurality of time steps including an opposing agent offer and a supporting agent offer, and comparing the opposing agent offers of the recent negotiation trace with opposing agent offers of a previous negotiation trace from automated negotiation between the supporting agent and the opposing agent while the supporting agent operates according to the first negotiation strategy model.
17. The computer-implemented method of claim 16, wherein the detecting further includes estimating a change value representing an amount of change in the opposing utility function, and the generating and training are performed in response to determining that the change value exceeds a threshold value.
18. The computer-implemented method of claim 1, wherein the utility function is a first supporting utility function of the supporting agent.
19. The computer-implemented method of claim 18, further comprising: receiving a second supporting utility function; wherein the detecting change includes comparing the first supporting utility function to the second utility function.
20. An apparatus comprising: a controller including circuitry configured to detect change in a utility function involved in automated negotiation between a supporting agent and an opposing agent while the supporting agent operates according to a first negotiation strategy model; generate a plurality of training samples from automated negotiation between the supporting agent and the opposing agent while the supporting agent operates according to a baseline negotiation strategy model; and train an initialized negotiation strategy model using the plurality of training samples to produce a second negotiation strategy model.
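The sample construction recited in claims 10 to 12 can be sketched as follows. This is one possible reading, not the patented implementation: it assumes each "portion of consecutive time steps" is a prefix of the trace starting at the first time step, that every sample from a trace shares a single weight equal to the supporting utility of the finally agreed offer, and that offers are numeric tuples. The names `TimeStep`, `TrainingSample`, and `build_samples` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Assumed encoding: an offer is a tuple with one numeric value per issue.
Offer = tuple[float, ...]


@dataclass(frozen=True)
class TimeStep:
    opposing_offer: Offer
    supporting_offer: Offer


@dataclass(frozen=True)
class TrainingSample:
    history: tuple[TimeStep, ...]  # consecutive steps from the first step
    next_opposing_offer: Offer     # opposing offer of the subsequent step
    label: Offer                   # supporting offer of the subsequent step
    weight: float                  # utility of the finally agreed offer


def build_samples(trace: Sequence[TimeStep],
                  final_offer: Offer,
                  supporting_utility: Callable[[Offer], float]
                  ) -> list[TrainingSample]:
    """Build one sample per trace prefix, weighted by the final outcome."""
    weight = supporting_utility(final_offer)
    samples = []
    for k in range(1, len(trace)):
        samples.append(TrainingSample(
            history=tuple(trace[:k]),
            next_opposing_offer=trace[k].opposing_offer,
            label=trace[k].supporting_offer,
            weight=weight,
        ))
    return samples


if __name__ == "__main__":
    trace = [
        TimeStep((0.9, 0.1), (0.2, 0.8)),
        TimeStep((0.8, 0.2), (0.3, 0.7)),
        TimeStep((0.6, 0.4), (0.5, 0.5)),
    ]
    # Hypothetical supporting utility: the value of the second issue.
    samples = build_samples(trace, final_offer=(0.5, 0.5),
                            supporting_utility=lambda offer: offer[1])
    print(len(samples), samples[0].label, samples[0].weight)
```

Under this reading, a trace of n time steps yields n - 1 samples, and traces that ended in agreements more favorable to the supporting agent contribute more strongly to training.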
Representative agent · CPC title
Ensemble learning · CPC title
Machine learning · CPC title
Electronic negotiation · CPC title