US11469975B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11469975-B2 |
| Application number | US-202117207571-A |
| Country | US |
| Kind code | B2 |
| Filing date | Mar 19, 2021 |
| Priority date | Jun 30, 2020 |
| Publication date | Oct 11, 2022 |
| Grant date | Oct 11, 2022 |
Official abstract text for this publication.
A filter debugging method, a device, an electronic apparatus and a readable storage medium are provided. The filter debugging method includes: step S1: inputting a current hole parameter and a current index value of a filter into a policy network which is pre-trained; step S2: determining, by the policy network, a target hole to be polished of the filter, according to the current hole parameter and the current index value of the filter; step S3: controlling a mechanical arm to polish the target hole of the filter; and step S4: determining whether the filter is qualified according to an index value of the polished filter; in a case that the filter is qualified, ending a process including the steps S1 to S4; in a case that the filter is unqualified, performing the steps S1 to S4 circularly until the filter is qualified.
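The loop formed by steps S1 to S4 can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and does not come from the patent: the helper names (`debug_filter`, `toy_measure`, `toy_policy`, `toy_polish`, `toy_is_qualified`), the integer hole depths, the single scalar index value, and the `max_steps` safety cap. In particular, `toy_policy` is a hand-written stand-in for the pre-trained policy network, and `toy_polish` stands in for the mechanical arm.

```python
from typing import Callable, List


def debug_filter(
    holes: List[int],
    measure: Callable[[List[int]], int],
    policy: Callable[[List[int], int], int],
    polish: Callable[[List[int], int], None],
    is_qualified: Callable[[int], bool],
    max_steps: int = 100,  # assumption: safety cap, not part of the abstract
) -> int:
    """Repeat steps S1 to S4 until the filter is qualified.

    Returns the number of polishing steps that were needed.
    """
    index_value = measure(holes)
    for step in range(1, max_steps + 1):
        # S1 + S2: feed the current hole parameter and index value to the
        # policy, which picks the target hole to polish.
        target = policy(holes, index_value)
        # S3: polish the target hole (one step at a time).
        polish(holes, target)
        # S4: re-measure and decide whether the filter is qualified.
        index_value = measure(holes)
        if is_qualified(index_value):
            return step
    raise RuntimeError("filter still unqualified after max_steps")


# --- Toy stand-ins (all hypothetical) ---------------------------------
TARGET_TOTAL_DEPTH = 36  # made-up target for the toy index value


def toy_measure(holes: List[int]) -> int:
    # Toy index value: remaining total depth before the target is reached.
    return TARGET_TOTAL_DEPTH - sum(holes)


def toy_policy(holes: List[int], index_value: int) -> int:
    # Stand-in for the pre-trained policy network: pick the shallowest hole.
    return holes.index(min(holes))


def toy_polish(holes: List[int], target: int) -> None:
    # Stand-in for the mechanical arm: deepen the target hole by one unit.
    holes[target] += 1


def toy_is_qualified(index_value: int) -> bool:
    return index_value == 0


holes = [10, 12, 11]
steps = debug_filter(holes, toy_measure, toy_policy, toy_polish, toy_is_qualified)
print(steps, holes)  # prints: 3 [12, 12, 12]
```

Passing the policy, the polishing action, and the qualification check in as callables keeps the control flow of S1 to S4 separate from any particular network or hardware interface.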
Opening claim text (preview).
What is claimed is:

1. A filter debugging method, comprising: step S1: inputting a current hole parameter and a current index value of a filter into a policy network which is pre-trained; step S2: determining, by the policy network, a target hole to be polished of the filter, according to the current hole parameter and the current index value of the filter; step S3: controlling a mechanical arm to polish the target hole of the filter; and step S4: determining whether the filter is qualified according to an index value of the polished filter; in a case that the filter is qualified, ending a process including the steps S1 to S4; in a case that the filter is unqualified, performing the steps S1 to S4 circularly until the filter is qualified; wherein the inputting the current hole parameter and the current index value of the filter into the policy network which is pre-trained comprises: performing a Multi-Layer Perceptron (MLP) processing on the current hole parameter of the filter to obtain a first parameter; performing the MLP processing on the current index value of the filter to obtain a second parameter; inputting the first parameter and the second parameter to the policy network which is pre-trained; the determining by the policy network the target hole to be polished of the filter according to the current hole parameter and the current index value of the filter comprises: determining, by the policy network, the target hole to be polished of the filter, according to the first parameter and the second parameter.

2. The method according to claim 1, wherein the controlling the mechanical arm to polish the target hole of the filter comprises: controlling the mechanical arm to polish the target hole of the filter, wherein the target hole of the filter is polished by one step at a time.

3. The method according to claim 1, wherein the hole parameter of the filter comprises a hole depth and a hole diameter; the index value of the filter comprises at least one of a center frequency, a pass band width, a return loss, an insertion loss, a left side out-of-band rejection and a right side out-of-band rejection.

4. The method according to claim 1, wherein the policy network is obtained by training simulation data, the simulation data comprise a simulation hole parameter of a simulation filter and a simulation index value of the simulation filter.

5. The method according to claim 4, wherein the policy network is trained by: pre-training the policy network to obtain a pre-trained network, and changing a simulation hole parameter of one hole of the simulation filter each time in a pre-training process; initializing parameters of the policy network to parameters of the pre-trained network; and updating the parameters of the policy network until a convergence.

6. A filter debugging device, comprising: at least one processor; a memory communicatively coupled to the at least one processor; and a policy network which is pre-trained, wherein the memory stores instructions executable by the at least one processor, and the at least one processor executes the instructions to: input a current hole parameter and a current index value of a filter into a policy network which is pre-trained; the policy network is configured to determine a target hole to be polished of the filter, according to the current hole parameter and the current index value of the filter; the at least one processor executes the instructions to: control a mechanical arm to polish the target hole of the filter; and determine whether the filter is qualified according to an index value of the polished filter; in a case that the filter is qualified, end a determination of whether the filter is qualified; in a case that the filter is unqualified, trigger the policy network until the filter is qualified, wherein the at least one processor executes the instructions to: perform a Multi-Layer Perceptron (MLP) processing on the current hole parameter of the filter to obtain a first parameter; perform the MLP processing on the current index value of the filter to obtain a second parameter; input the first parameter and the second parameter to the policy network; the policy network is further configured to: determine the target hole to be polished of the filter, according to the first parameter and the second parameter.

7. The device according to claim 6, wherein the at least one processor executes the instructions to: control the mechanical arm to polish the target hole of the filter, wherein the target hole of the filter is polished by one step at a time.

8. The device according to claim 6, wherein the hole parameter of the filter comprises a hole depth and a hole diameter; the index value of the filter comprises at least one of a center frequency, a pass band width, a return loss, an insertion loss, a left side out-of-band rejection and a right side out-of-band rejection.

9. The device according to claim 6, wherein the policy network is obtained by training simulation data, the simulation data comprise a simulation hole parameter of a simulation filter and a simulation index value of the simulation filter.

10. The device according to claim 9, wherein the policy network is trained by: pre-training the policy network to obtain a pre-trained network, and changing a simulation hole parameter of one hole of the simulation filter each time in a pre-training process; initializing parameters of the policy network to parameters of the pre-trained network; and updating the parameters of the policy network until a convergence.

11. An electronic apparatus, comprising: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the at least one processor executes the instructions to perform the filter debugging method according to claim 1.

12. A non-transitory computer readable storage medium storing a computer instruction, wherein a computer executes the computer instruction to perform the filter debugging method according to claim 1.
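The input encoding recited in claims 1 and 6 (one MLP pass over the hole parameter, another over the index value, with both results fed to the policy network) might look roughly like the sketch below. This is an illustration under stated assumptions, not the patented implementation: the layer sizes, the ReLU activation, the concatenation of the first and second parameters, the argmax selection, and all names (`make_layer`, `mlp`, `select_target_hole`, `N_HOLES`) are hypothetical. The weights are random and untrained, and the input numbers are made up, so the selected hole carries no physical meaning.

```python
import random
from typing import List, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights, biases)


def make_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    # Random, untrained weights; a real system would load trained ones.
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(x: List[float], layer: Layer) -> List[float]:
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def mlp(x: List[float], layers: List[Layer]) -> List[float]:
    # ReLU after every layer except the last (an assumed design choice).
    for i, layer in enumerate(layers):
        x = linear(x, layer)
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x


rng = random.Random(0)
N_HOLES = 4  # hypothetical number of holes

# Hole parameter: (depth, diameter) per hole, flattened to 2 * N_HOLES values.
hole_encoder = [make_layer(2 * N_HOLES, 16, rng), make_layer(16, 8, rng)]
# Index value: the six quantities listed in claim 3.
index_encoder = [make_layer(6, 16, rng), make_layer(16, 8, rng)]
# Policy head: one score per hole from the concatenated encodings.
policy_head = [make_layer(16, 16, rng), make_layer(16, N_HOLES, rng)]


def select_target_hole(hole_params: List[float], index_values: List[float]) -> int:
    first = mlp(hole_params, hole_encoder)     # "first parameter"
    second = mlp(index_values, index_encoder)  # "second parameter"
    scores = mlp(first + second, policy_head)  # concatenation is an assumption
    return max(range(len(scores)), key=scores.__getitem__)


# Made-up inputs in arbitrary units, for illustration only.
hole_params = [1.0, 0.5, 1.2, 0.5, 0.9, 0.6, 1.1, 0.5]
index_values = [3.5, 0.2, 18.0, 1.1, 40.0, 42.0]
target = select_target_hole(hole_params, index_values)
print("target hole index:", target)
```

Encoding the two inputs separately lets each encoder handle its own dimensionality and scale before the policy head combines them, which is one plausible reading of why the claims recite two distinct MLP passes.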
using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model · CPC title
using machine learning or artificial intelligence · CPC title
Mechanical parametric or variational design · CPC title
Design optimisation · CPC title
Manufacturing frequency-selective devices (resonators H01P11/008) · CPC title