Method and Apparatus for Training Information Adjustment Model of Charging Station, and Storage Medium

US2023229913A1 · US · A1

Patent metadata
Field: Value
Publication number: US-2023229913-A1
Application number: US-202318125327-A
Country: US
Kind code: A1
Filing date: Mar 23, 2023
Priority date: Aug 10, 2022
Publication date: Jul 20, 2023
Grant date:

How to read this patent

A practical reading order for non-experts. Skip the full description unless you need deep technical detail.

  1. Title

    What the patent document calls the invention.

  2. Abstract

    A short plain-language summary of the technical disclosure.

  3. Assignees and inventors

    Who owns or filed the patent and who is credited as inventor.

  4. Key dates

    Filing, priority, publication, and grant dates set the timeline.

  5. First independent claim

    The legal scope of protection — read this for what is actually claimed.

  6. CPC / IPC classifications

    Technology tags used to group this patent with similar filings.

  7. Citations and related patents

    Prior art links and similar publications in this corpus.

Abstract

Official abstract text for this publication.

A method and apparatus for training an information adjustment model of a charging station, an electronic device, and a storage medium are provided. An implementation comprises: acquiring a battery charging request, and determining environment state information corresponding to each charging station in a charging station set; determining, through an initial policy network, target operational information of each charging station in the charging station set for the battery charging request, according to the environment state information; determining, through an initial value network, a cumulative reward expectation corresponding to the battery charging request according to the environment state information and the target operational information; training the initial policy network and the initial value network by using a deep deterministic policy gradient algorithm; and determining the trained policy network as an information adjustment model corresponding to each charging station.
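
As a minimal sketch of the value-network side of this training, the first claim states that the value network is updated through a temporal difference method, and claim 6 names three inputs: the cumulative reward expectation for the current request, the reward for that request, and the cumulative reward expectation for the next request. The Python below computes a squared temporal-difference error from those three quantities. The discount factor `gamma`, the squared-error form, and all function names are illustrative assumptions and do not come from the patent text.

```python
def td_target(reward, q_next, gamma=0.99):
    """Bootstrapped target: reward plus discounted next-request expectation.

    gamma is an assumed discount factor; the patent text gives no value.
    """
    return reward + gamma * q_next


def td_loss(q_current, reward, q_next, gamma=0.99):
    """Squared TD error for one charging request (assumed loss form)."""
    return (td_target(reward, q_next, gamma) - q_current) ** 2


def mean_td_loss(transitions, gamma=0.99):
    """Average the loss over a batch of (q_current, reward, q_next) tuples."""
    losses = [td_loss(q, r, q_next, gamma) for q, r, q_next in transitions]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Two hypothetical transitions; per claim 6, all stations share the reward.
    batch = [(1.0, 0.5, 2.0), (0.0, 1.0, 0.0)]
    print(mean_td_loss(batch, gamma=0.5))
```

In a full actor-critic setup, a loss of this kind would drive the value network, while the policy network would be adjusted to raise the value network's output, in line with the stated goal of maximizing the cumulative reward expectation.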

First claim

Opening claim text (preview).

What is claimed is:

1. A method for training an information adjustment model of a charging station, comprising: acquiring a battery charging request, and determining an environment state information corresponding to each charging station in a charging station set; determining, through an initial policy network, target operational information of each charging station in the charging station set for the battery charging request, according to the environment state information corresponding to each charging station in the charging station set; determining, through an initial value network, a cumulative reward expectation corresponding to the battery charging request according to the environment state information and the target operational information corresponding to each charging station in the charging station set; training the initial policy network and the initial value network by using a deep deterministic policy gradient algorithm, to obtain a trained policy network and a trained value network, wherein, during the training, the initial value network is updated through a temporal difference method, and the initial policy network is updated with a goal of maximizing the cumulative reward expectation corresponding to the battery charging request; and determining the trained policy network as an information adjustment model corresponding to each charging station in the charging station set.

2. The method according to claim 1, wherein determining, through the initial value network, the cumulative reward expectation corresponding to the battery charging request according to the environment state information and the target operational information corresponding to each charging station in the charging station set comprises: determining, through an agent pooling module, integrated representation information representing all charging stations in the charging station set according to the environment state information and the target operational information corresponding to each charging station in the charging station set; and determining, through the initial value network, the cumulative reward expectation corresponding to the battery charging request according to the integrated representation information.

3. The method according to claim 2, wherein determining, through the agent pooling module, integrated representation information representing all charging stations in the charging station set according to the environment state information and the target operational information corresponding to each charging station in the charging station set comprises: mapping, through a mapping vector, the environment state information and the target operational information corresponding to each charging station in the charging station set to a score feature representing an importance of each charging station; determining a preset number of charging stations from the charging station set according to the score feature, and determining the environment state information, the target operational information and the score feature corresponding to each charging station of the preset number of charging stations; normalizing score features corresponding to the preset number of charging stations to obtain a gate control vector; determining a gate control feature according to the environment state information, the target operational information, and the gate control vector corresponding to the preset number of charging stations; and determining the integrated representation information of all the charging stations in the charging station set according to the gate control feature.

4. The method according to claim 2, wherein training the initial policy network and the initial value network by using the deep deterministic policy gradient algorithm comprises: determining a first loss corresponding to the initial value network through the temporal difference method; determining a second loss corresponding to the agent pooling module through a self-supervised contrastive learning method; updating the initial value network and the agent pooling module according to the first loss and the second loss; and updating the initial policy network with the goal of maximizing the cumulative reward expectation corresponding to the battery charging request.

5. The method according to claim 4, wherein determining the second loss corresponding to the agent pooling module through the self-supervised contrastive learning method comprises: determining, for a first subset in a joint feature, first integrated representation information through the agent pooling module, wherein the joint feature comprises the environment state information and the target operational information corresponding to each charging station in the charging station set; determining, for a second subset in the joint feature, second integrated representation information through the agent pooling module; determining, for a third subset in the joint feature corresponding to another battery charging request different from the battery charging request, third integrated representation information through the agent pooling module; and using a self-supervised contrastive learning loss as the second loss, the self-supervised contrastive learning loss being determined according to the first integrated representation information, the second integrated representation information, and the third integrated representation information.

6. The method according to claim 4, wherein determining the first loss corresponding to the initial value network through the temporal difference method comprises: determining, through a preset reward function, reward information according to a battery charging behavior of a charging object corresponding to the battery charging request, wherein each charging station in the charging station set shares the reward information, and the preset reward function provides a different reward for a different battery charging behavior; and determining, through the temporal difference method, the first loss corresponding to the initial value network according to the cumulative reward expectation corresponding to the battery charging request, a reward corresponding to the battery charging request, and a cumulative reward expectation corresponding to a second battery charging request next to the battery charging request.

7. The method according to claim 1, further comprising: acquiring a new battery charging request; determining the environment state information corresponding to each charging station in the charging station set; for each charging station in the charging station set, determining, through the information adjustment model corresponding to each charging station, target operational information of each charging station for the new battery charging request according to the environment state information of the charging station, wherein each charging station in the charging station set is configured to perceive the environment state information of each other, and the information adjustment model is obtained by performing multi-agent reinforcement learning based on the deep deterministic policy gradient algorithm; displaying the target operational information of each charging station in the charging station set for the new battery charging request; and receiving a selection instruction and determining a target charging station from the charging station set according to the selection instruction.

8. An apparatus for training an information adjustment model of a charging station, comprising: at least one processor; and a storage device, in communication with the at least one processor, wherein the storage device stores instructions which, when executed by the at least one proc
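
The agent pooling steps recited in claim 3 (score each station with a mapping vector, keep a preset number of stations, normalize their scores into a gate control vector, gate the features, and aggregate) can be sketched roughly as follows. This is only an illustration under stated assumptions: the dot-product scoring, the softmax normalization, the sum aggregation, and the names `agent_pool`, `joint_features`, `mapping_vector`, and `k` are all hypothetical choices, since the claim does not specify the scoring, normalization, or aggregation functions.

```python
import math


def agent_pool(joint_features, mapping_vector, k):
    """Pool per-station joint features into one integrated representation.

    joint_features: one list per station, standing in for the concatenated
    environment state and target operational information of that station.
    mapping_vector: weights that map a joint feature to a scalar score.
    k: preset number of stations to keep.
    """
    # Score feature: assumed here to be a dot product with the mapping vector.
    scores = [sum(x * w for x, w in zip(feat, mapping_vector))
              for feat in joint_features]

    # Keep the k highest-scoring stations.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = order[:k]

    # Gate control vector: assumed softmax over the selected scores.
    peak = max(scores[i] for i in top)
    exps = [math.exp(scores[i] - peak) for i in top]
    total = sum(exps)
    gate = [e / total for e in exps]

    # Gate control feature: scale each selected station's features by its gate.
    gated = [[g * x for x in joint_features[i]] for g, i in zip(gate, top)]

    # Integrated representation: assumed element-wise sum over gated features.
    dim = len(mapping_vector)
    pooled = [sum(row[d] for row in gated) for d in range(dim)]
    return pooled, top, gate


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]  # three hypothetical stations
    pooled, top, gate = agent_pool(feats, [1.0, 1.0], k=2)
    print(top, gate, pooled)
```

Under these assumptions the output has a fixed size regardless of how many stations are in the set, which is one plausible reason to pool before feeding the value network.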

Assignees

Beijing Baidu Netcom Sci & Tech Co Ltd

Inventors

Classifications

  • G06N3/08 · Primary

    Learning methods · CPC title

  • Monitoring or controlling charging stations · CPC title

  • by self learning · CPC title

  • by parameter estimation · CPC title

  • drive range estimation, e.g. of estimation of available travel distance · CPC title

Patent family

Related publications grouped by family.

External sources

Frequently asked questions

Answers are generated from the same data shown on this page.

What does patent US2023229913A1 cover?
A method and apparatus for training an information adjustment model of a charging station, an electronic device, and a storage medium are provided. An implementation comprises: acquiring a battery charging request, and determining environment state information corresponding to each charging station in a charging station set; determining, through an initial policy network, target operational inf…
Who is the assignee on this patent?
Beijing Baidu Netcom Sci & Tech Co Ltd
What technology area does this patent fall under?
Primary CPC classification G06N3/08. Mapped technology areas include Physics.
When was this patent published?
Publication date: Jul 20, 2023 (A1). Legal status and post-grant events are not shown on this page.
What related patents are in patentsdb?
We list 8 related publications on this page (citations in our corpus or others sharing the same primary CPC).