Who is the assignee on this patent?

Beijing Didi Infinity Technology & Dev Co Ltd

What technology area does this patent fall under?

Primary CPC classification G06Q50/40. Mapped technology areas include Physics.

When was this patent published?

Publication date Jan 14, 2025 (B2). Legal status and post-grant events are not shown on this page.

What related patents are in patentsdb?

We list 2 related publications on this page (either citations found in our corpus or publications sharing the same primary CPC).

Method and system for constructing virtual environment for ride-hailing platforms

US12198216B2 · US · B2

Patent metadata
Field	Value
Publication number	US-12198216-B2
Application number	US-202017058407-A
Country	US
Kind code	B2
Filing date	May 14, 2020
Priority date	May 14, 2020
Publication date	Jan 14, 2025
Grant date	Jan 14, 2025

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

Title
What the patent document calls the invention.
Abstract
A short plain-language summary of the technical disclosure.
Assignees and inventors
Who owns or filed the patent and who is credited as inventor.
Key dates
Filing, priority, publication, and grant dates set the timeline.
First independent claim
The legal scope of protection — read this for what is actually claimed.
CPC / IPC classifications
Technology tags used to group this patent with similar filings.
Citations and related patents
Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for constructing a virtual environment for a ride-hailing platform are disclosed. An exemplary method comprises: obtaining a plurality of historical interaction trajectories each comprising one or more interaction records between a driver and a ride-hailing platform, each interaction record comprising a program recommendation of the ride-hailing platform to the driver and a reaction of the driver in response to the program recommendation; training a simulator based on the plurality of historical interaction trajectories; and integrating a reward function with the simulator to construct the virtual environment, wherein the plurality of first program recommendations and the plurality of reactions form a plurality of simulated interactions, and a data distribution of the plurality of simulated interactions approximates a data distribution of a plurality of interaction records in the plurality of historical interaction trajectories.
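The abstract frames the virtual environment as a trained simulator combined with a reward function. The Python sketch below shows that composition under stated assumptions. The state, recommendation and reaction encodings are hypothetical. The three lambdas stand in for trained platform, confounding and driver policies. The names `VirtualEnvironment`, `step` and `rollout` are illustrative and do not come from the patent. The order of calls inside `step` follows claim 10, and `rollout` mirrors the reward-and-transition loop of claim 6.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical encodings (not fixed by the patent page): a driver state is a
# list of floats, a program recommendation is an int id, a reaction is a float.
State = List[float]


@dataclass
class SimulatedInteraction:
    state: State
    recommendation: int  # first program recommendation from the platform policy
    reaction: float      # reaction of the virtual driver


@dataclass
class VirtualEnvironment:
    """Trained simulator (three policies) combined with a reward function."""

    platform_policy: Callable[[State], int]
    confounding_policy: Callable[[State, int], int]
    driver_policy: Callable[[State, int, int], float]
    reward_fn: Callable[[State, int], float]
    transition: Callable[[State, SimulatedInteraction], State]

    def step(self, state: State) -> Tuple[SimulatedInteraction, float, State]:
        first = self.platform_policy(state)
        second = self.confounding_policy(state, first)  # stands in for hidden confounders
        reaction = self.driver_policy(state, first, second)
        interaction = SimulatedInteraction(state, first, reaction)
        reward = self.reward_fn(state, first)
        return interaction, reward, self.transition(state, interaction)

    def rollout(self, initial_state: State, horizon: int) -> List[float]:
        """Run the environment for `horizon` steps and return per-step rewards."""
        state, rewards = initial_state, []
        for _ in range(horizon):
            _, reward, state = self.step(state)
            rewards.append(reward)
        return rewards


# Toy stand-ins for trained policies; every rule and constant here is invented.
rng = random.Random(0)
env = VirtualEnvironment(
    platform_policy=lambda s: 1 if s[0] < 0.5 else 0,  # offer program 1 to low-activity drivers
    confounding_policy=lambda s, a: a if rng.random() < 0.8 else 0,
    driver_policy=lambda s, a, c: s[0] + 0.2 * a + 0.05 * c,
    reward_fn=lambda s, a: 0.2 * a,  # placeholder for a learned reward such as an uplift model
    transition=lambda s, x: [min(1.0, x.reaction)],
)
rewards = env.rollout([0.2], horizon=5)
print(rewards)  # [0.2, 0.2, 0.0, 0.0, 0.0]
```

Swapping a candidate policy into `platform_policy` and summing the output of `rollout` is one simple way to score that candidate offline. Claim 6 goes further and updates the candidate's parameters from these rewards. This sketch omits that step.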

First claim

Opening claim text (preview).

We claim:

1. A computer-implemented method for constructing a virtual environment for a ride-hailing platform, comprising: obtaining a plurality of historical interaction trajectories each comprising one or more interaction records between a driver and the ride-hailing platform, each interaction record comprising a program recommendation of the ride-hailing platform to the driver and a reaction of the driver in response to the program recommendation; training a simulator based on the plurality of historical interaction trajectories using reinforcement learning (RL); and integrating a reward function with the simulator to construct the virtual environment, wherein the simulator comprises: a platform policy for generating a plurality of first program recommendations by a virtual ride-hailing platform, a confounding policy for generating a plurality of second program recommendations based on the plurality of first program recommendations and a plurality of confounding variables, and a driver policy for generating a plurality of reactions of a plurality of virtual drivers based on the plurality of first program recommendations and the plurality of second program recommendations, and wherein the plurality of first program recommendations and the plurality of reactions form a plurality of simulated interactions between the plurality of virtual drivers and the virtual ride-hailing platform, and a data distribution of the plurality of simulated interactions approximates a data distribution of a plurality of interaction records in the plurality of historical interaction trajectories, and the confounding policy and the driver policy are trained jointly as a confounder-driver policy, and the training of the simulator comprises: constructing a multi-agent generator based on the platform policy and the confounder-driver policy; inputting a driver state of a virtual driver to the multi-agent generator to generate a simulated interaction record according to the platform policy and the confounder-driver policy; determining a first state-action pair and a second state-action pair from the simulated interaction record; determining, based on a discriminator and the first state-action pair, a first reward for the platform policy, wherein the discriminator is trained to determine a probability that a state-action pair is from the data distribution of the plurality of interaction records in the plurality of historical interaction trajectories; determining, based on the discriminator and the second state-action pair, a second reward for the confounder-driver policy; and optimizing the platform policy and the confounder-driver policy according to the first reward and the second reward, respectively.

2. The method of claim 1, wherein the integrating a reward function with the simulator comprises: obtaining a plurality of control-treatment data sets from randomized trial experiments in the ride-hailing platform; training an uplift inference network based on the plurality of control-treatment data sets, wherein the trained uplift inference network infers a plurality of uplifts corresponding to the plurality of first program recommendations in response to a given driver state, each of the plurality of uplifts indicating a reward difference between (1) the virtual ride-hailing platform not making the corresponding first program recommendation in response to the given driver state and (2) the virtual ride-hailing platform making the corresponding first program recommendation in response to the given driver state; and integrating the trained uplift inference network as the reward function with the simulator to construct the virtual environment.

3. The method of claim 2, wherein the control-treatment data set comprises a plurality of treatment data entries and a plurality of control data entries, the plurality of treatment data entries comprising a plurality of rewards for the ride-hailing platform making one or more program recommendations, and the plurality of control data entries comprising a plurality of rewards for the ride-hailing platform not making the one or more program recommendations.

4. The method of claim 2, wherein the uplift inference network comprises a feature extraction subnetwork for extracting a plurality of features from an input driver state, and an uplift inference subnetwork for inferring an uplift for a first program recommendation in response to the input driver state.

5. The method of claim 4, wherein the uplift inference subnetwork comprises a treatment branch and a control branch, and the training an uplift inference network comprises: training the feature extraction subnetwork and the treatment branch based on the control-treatment data set; and training the feature extraction subnetwork and the control branch based on the control-treatment data set.

6. The method of claim 1, further comprising optimizing a candidate platform policy in the virtual environment by: determining an initial driver state; determining, based on the initial driver state, a simulated interaction between a virtual driver and the virtual ride-hailing platform according to the simulator, wherein the simulated interaction comprises a program recommendation from the virtual ride-hailing platform; determining a reward for the program recommendation from the virtual ride-hailing platform according to the reward function in the virtual environment; optimizing one or more parameters of the candidate platform policy based on the reward; and transitioning the initial driver state to a new driver state based on the simulated interaction.

7. The method of claim 6, wherein the initial driver state comprises at least one of following driver features at a time step: gender, age, tenure on the ride-hailing platform, and recent activities on the ride-hailing platform.

8. The method of claim 1, wherein the plurality of confounding variables comprise one or more of: location information, weather information, event information, holidays, and a competitor's policy.

9. The method of claim 1, wherein the training the simulator further comprises: obtaining a simulated interaction trajectory generated by the multi-agent generator; and updating one or more parameters of the discriminator based on the simulated interaction trajectory to minimize a first loss function corresponding to the platform policy and a second loss function corresponding to the confounder-driver policy.

10. The method of claim 1, wherein the inputting a driver state to the multi-agent generator to generate a simulated interaction record comprises: generating, according to the platform policy and the driver state, a third program recommendation; generating, according to the confounding policy, a fourth program recommendation based on (1) the driver state and (2) the third program recommendation; generating, according to the driver policy, a reaction of the virtual driver based on (1) the driver state, (2) the third program recommendation, and (3) the fourth program recommendation; and obtaining a simulated interaction record comprising the driver state, the third program recommendation, and the reaction.

11. The method of claim 10, wherein: the first state-action pair comprises a first state and a first action; the second state-action pair comprises a second state and a second action, and determining the first state-action pair and the second state-action pair from the simulated interaction record comprises: for the first state-action pair, determining the driver state as the first state, and the third program recommendation as the first action; and for the second state-action pair, determining the driver state and the thir…
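Claims 1, 9, 10 and 11 describe adversarial training of the simulator. A multi-agent generator produces simulated interaction records. A discriminator estimates whether a state-action pair comes from the historical records, and it supplies one reward for the platform policy and another for the confounder-driver policy. The Python sketch below covers the reward side under several assumptions. The discriminator is a hypothetical logistic model. The reward -log(1 - D) is a common GAIL-style choice, not a formula stated on this page. The layout of the second state-action pair is also assumed, because claim 11 is cut off before defining it.

```python
import math
from typing import List, Sequence, Tuple

Pair = List[float]  # a state-action pair flattened into one feature vector (assumption)


class LogisticDiscriminator:
    """Hypothetical linear-logistic stand-in for the patent's discriminator.

    prob_real(x) estimates the probability that pair x comes from the
    historical interaction records rather than from the generator.
    """

    def __init__(self, dim: int, lr: float = 0.5):
        self.w = [0.0] * dim
        self.b = 0.0
        self.lr = lr

    def prob_real(self, x: Sequence[float]) -> float:
        z = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, real: Sequence[Pair], fake: Sequence[Pair]) -> None:
        """One gradient step on binary cross-entropy (historical = 1, simulated = 0)."""
        grad_w, grad_b = [0.0] * len(self.w), 0.0
        for batch, label in ((real, 1.0), (fake, 0.0)):
            for x in batch:
                err = self.prob_real(x) - label  # derivative of BCE w.r.t. the logit
                grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
                grad_b += err
        n = len(real) + len(fake)
        self.w = [wi - self.lr * g / n for wi, g in zip(self.w, grad_w)]
        self.b -= self.lr * grad_b / n

    def reward(self, x: Sequence[float]) -> float:
        """GAIL-style surrogate reward -log(1 - D(x)); larger when x looks historical."""
        return -math.log(max(1e-8, 1.0 - self.prob_real(x)))


def split_pairs(state: List[float], platform_rec: int, reaction: float) -> Tuple[Pair, Pair]:
    """Build the two state-action pairs from one interaction record."""
    # First pair (claim 11): state = driver state, action = platform recommendation.
    # The reaction slot is zero-padded so that one discriminator can score both pairs.
    first = state + [float(platform_rec), 0.0]
    # Second pair: ASSUMED to be (driver state + platform recommendation, reaction),
    # because the claim text on this page is truncated before defining it.
    second = state + [float(platform_rec), reaction]
    return first, second


def to_pairs(records) -> List[Pair]:
    pairs: List[Pair] = []
    for state, rec, reaction in records:
        pairs.extend(split_pairs(state, rec, reaction))
    return pairs


# Hypothetical data: (driver state, platform recommendation, driver reaction).
historical = [([1.0, 0.0], 1, 0.8), ([0.9, 0.1], 1, 0.7)]
simulated = [([0.1, 0.9], 0, 0.1), ([0.2, 0.8], 0, 0.2)]

disc = LogisticDiscriminator(dim=4)
real, fake = to_pairs(historical), to_pairs(simulated)
for _ in range(200):
    disc.update(real, fake)

# Rewards for a new simulated record: first -> platform policy, second -> confounder-driver policy.
first, second = split_pairs([0.15, 0.85], 0, 0.15)
platform_reward, confounder_driver_reward = disc.reward(first), disc.reward(second)
print(platform_reward, confounder_driver_reward)
```

Each `update` call sums the loss over both pair types. This simplifies the separate platform and confounder-driver loss terms in claim 9. The two rewards would then drive a reinforcement-learning update of each policy, which is not shown.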

Assignees

Beijing Didi Infinity Technology & Dev Co Ltd

Classifications

G06Q30/0207: Discounts or incentives, e.g. coupons or rebates
G06Q10/067: Enterprise or organisation modelling
G06N20/00: Machine learning
G06Q10/06315: Needs-based resource requirements planning or analysis
G06Q30/0211: Determining the effectiveness of discounts or incentives
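Claims 2 to 5 describe the reward function as an uplift inference network trained on control-treatment data from randomized trials. That task is closely related to classification G06Q30/0211 (determining the effectiveness of discounts or incentives). The Python sketch below keeps the claimed structure: a shared feature extraction subnetwork feeding a treatment branch and a control branch. It assumes a deliberately tiny architecture, with one trainable scale per feature for the extractor and a linear head per branch. As in claim 5, treatment entries train the extractor together with the treatment branch, and control entries train it together with the control branch. The uplift is computed as the treatment prediction minus the control prediction. That sign convention, the class and function names, and the data are all assumptions.

```python
from typing import Dict, List, Sequence, Tuple

# One control-treatment data entry: (driver state, observed reward).
Entry = Tuple[List[float], float]


class UpliftNet:
    """Shared feature extractor feeding a treatment branch and a control branch.

    Hypothetical simplification: the feature extraction subnetwork is one
    trainable scale per input feature, and each branch is a linear head.
    """

    def __init__(self, dim: int):
        self.scale = [1.0] * dim  # feature extraction subnetwork
        self.heads: Dict[str, List[float]] = {"treatment": [0.0] * dim, "control": [0.0] * dim}
        self.bias: Dict[str, float] = {"treatment": 0.0, "control": 0.0}

    def features(self, x: Sequence[float]) -> List[float]:
        return [s * xi for s, xi in zip(self.scale, x)]

    def predict(self, x: Sequence[float], branch: str) -> float:
        h = self.features(x)
        return sum(a * hi for a, hi in zip(self.heads[branch], h)) + self.bias[branch]

    def sgd_step(self, x: Sequence[float], y: float, branch: str, lr: float) -> None:
        """One squared-error step on the shared extractor and the chosen branch."""
        err = self.predict(x, branch) - y
        h = self.features(x)
        head = self.heads[branch]
        grad_scale = [err * a * xi for a, xi in zip(head, x)]  # uses the pre-update head
        self.heads[branch] = [a - lr * err * hi for a, hi in zip(head, h)]
        self.bias[branch] -= lr * err
        self.scale = [s - lr * g for s, g in zip(self.scale, grad_scale)]

    def uplift(self, x: Sequence[float]) -> float:
        # Assumed sign convention: reward gain from making the recommendation.
        return self.predict(x, "treatment") - self.predict(x, "control")


def train(net: UpliftNet, treatment: List[Entry], control: List[Entry],
          epochs: int = 500, lr: float = 0.1) -> None:
    for _ in range(epochs):
        for x, y in treatment:
            net.sgd_step(x, y, "treatment", lr)  # extractor + treatment branch
        for x, y in control:
            net.sgd_step(x, y, "control", lr)    # extractor + control branch


# Hypothetical randomized-trial data: the recommendation adds exactly 1.0 to the reward.
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
treatment = [([x], 2.0 * x + 1.0) for x in xs]
control = [([x], 2.0 * x) for x in xs]

net = UpliftNet(dim=1)
train(net, treatment, control)
print(round(net.uplift([0.5]), 3))
```

Because the synthetic data adds exactly 1.0 to every treated reward, the estimated uplift should come out close to 1.0 for any driver state in the training range.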

Patent family

Related publications grouped by family.

Patent family ID: 78526203
