Asynchronous deep reinforcement learning
US-2017140270-A1 · May 18, 2017 · US
US11853901B2 · US · B2
| Field | Value |
|---|---|
| Publication number | US-11853901-B2 |
| Application number | US-202016938593-A |
| Country | US |
| Kind code | B2 |
| Filing date | Jul 24, 2020 |
| Priority date | Jul 26, 2019 |
| Publication date | Dec 26, 2023 |
| Grant date | Dec 26, 2023 |
Official abstract text for this publication.
Disclosed are a training method of an artificial intelligence (AI) model configured to provide information identifying a recommendation item and a recommended user, and an electronic apparatus for training an AI model. The training method includes obtaining user data and item data; generating a first semantic vector at a first time interval based on the user data; generating a second semantic vector at the first time interval based on the item data; generating a vector that represents a relevance between the first semantic vector and the second semantic vector at the first time interval; storing data corresponding to the generated vector, the first semantic vector, and the second semantic vector; and obtaining an updated weight for the first AI model by training the first AI model based on the stored data.
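The data-preparation steps in the abstract (semantic vectors for user and item data, a relevance vector between them, and storage of all three) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `embed_keywords` is a hashed bag-of-keywords stand-in for the DSSM, the relevance vector is taken as an elementwise product (whose sum equals cosine similarity for unit vectors), and `stored` is a plain in-memory list. The abstract does not specify any of these choices.

```python
import hashlib
import math

DIM = 8  # hypothetical embedding size, not taken from the patent


def embed_keywords(keywords, dim=DIM):
    """Stand-in for the DSSM: hash each keyword into a bucket, then L2-normalise."""
    vec = [0.0] * dim
    for kw in keywords:
        digest = hashlib.md5(kw.lower().encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    # An empty keyword list yields the zero vector unchanged.
    return [x / norm for x in vec] if norm else vec


def relevance_vector(user_vec, item_vec):
    """Assumed relevance: elementwise product of the two semantic vectors."""
    return [u * i for u, i in zip(user_vec, item_vec)]


def build_record(user_keywords, item_keywords):
    """Bundle the two semantic vectors with their relevance vector."""
    user_vec = embed_keywords(user_keywords)
    item_vec = embed_keywords(item_keywords)
    return {
        "user": user_vec,
        "item": item_vec,
        "relevance": relevance_vector(user_vec, item_vec),
    }


# Hypothetical in-memory store that later feeds training of the first AI model.
stored = []
stored.append(build_record(["running", "shoes"], ["running", "shoes", "sale"]))
```

A real system would replace `embed_keywords` with a trained DSSM and persist records somewhere durable; the sketch only shows how the three stored quantities relate to one another.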
Opening claim text (preview).
What is claimed is:
1. A controlling method of an electronic apparatus for training an artificial intelligence (AI) model configured to provide information identifying a recommendation item and a recommended user, the method comprising:
obtaining first user data and first item data at a first time interval;
obtaining at least one first keyword for the first user data and at least one second keyword for the first item data;
generating a first semantic vector at a second time interval by inputting at least one keyword for the first user data to a deep structured semantic model (DSSM);
generating a second semantic vector at the second time interval by inputting at least one keyword for the first item data to the DSSM;
generating a first vector that represents a relevance between the first semantic vector and the second semantic vector at the second time interval;
storing data corresponding to the generated first vector, the first semantic vector, and the second semantic vector;
obtaining an updated weight for a first AI model by training the first AI model based on the stored data;
applying the updated weight to a second AI model;
providing the information identifying the recommendation item or the recommended user based on the second AI model; and
re-updating the updated weight of the first AI model based on a user interaction of user,
wherein providing the information identifying the recommendation item or the recommended user based on the second AI model comprises:
obtaining at least one among second user data and second item data in real time;
based on the second user data being obtained in real time, obtaining at least one third keyword for the second user data;
generating a third semantic vector by inputting the at least one third keyword for the second user data to the DSSM;
generating a fourth semantic vector by inputting the third semantic vector to the second AI model;
providing the information identifying the recommendation item based on the fourth semantic vector through a display of the electronic apparatus;
receiving the user interaction of user corresponding to the second user data associated with the recommendation item through an input interface of the electronic apparatus;
generating a second vector that represents a relevance between the third semantic vector and the fourth semantic vector;
based on the second item data being obtained in real time, obtaining at least one fourth keyword for the second item data;
generating a fifth semantic vector by inputting the at least one fourth keyword for the second item data to the DSSM;
generating a sixth semantic vector by inputting the fifth semantic vector to the second AI model;
providing the information identifying the recommended user based on the sixth semantic vector through the display of the electronic apparatus;
providing the recommended user with information of the recommendation item corresponding to the fifth semantic vector through the display of the electronic apparatus;
receiving the user interaction of the recommended user associated with the recommendation item through the input interface of the electronic apparatus; and
generating a third vector that represents a relevance between the fifth semantic vector and the sixth semantic vector,
wherein the re-updating the updated weight of the first AI model based on the user interaction comprises:
re-updating the updated weight of the first AI model based on the user interaction of user corresponding to the second user data, the third semantic vector, the fourth semantic vector, and the second vector; and
re-updating the updated weight of the first AI model based on the user interaction of the recommended user, the fifth semantic vector, the sixth semantic vector, and the third vector,
wherein the providing the information identifying the recommendation item comprises displaying an advertising content of the recommendation item through the display of the electronic apparatus,
wherein the providing the information identifying the recommended user comprises displaying a list of the recommended user through the display of the electronic apparatus,
wherein the providing the recommended user with information of the recommendation item comprises displaying an advertising content of the recommendation item corresponding to the fifth semantic vector through the display of the electronic apparatus,
wherein the re-updating the updated weight of the first AI model comprises training the first AI model by performing a reinforcement learning based on state data, action data and reward data, and obtaining the re-updated weight for the first AI model,
wherein the state data comprises the third semantic vector and the fifth semantic vector, the action data comprises the fourth semantic vector and the sixth semantic vector, and the reward data comprises the user interaction of user corresponding to the second user data and the user interaction of the recommended user,
wherein the first time interval is greater than the second time interval, and
wherein the first time interval is predetermined according to a category of the item.
2. The method of claim 1, wherein the first AI model is a Q network AI model, and the second AI model is a target Q network model.
3. An electronic apparatus for training an artificial intelligence (AI) model configured to provide information identifying a recommendation item and a recommended user, the electronic apparatus comprising:
a display;
an input interface;
a memory configured to store at least one instruction; and
a processor configured to execute the at least one instruction to:
obtain user data and item data at a first time interval,
obtain at least one first keyword for the user data and at least one second keyword for the item data,
generate a first semantic vector at a second time interval based on the user data by inputting at least one keyword for the user data to a deep structured semantic model (DSSM),
generate a second semantic vector at the second time interval based on the item data by inputting at least one keyword for the item data to the DSSM,
generate a first vector representing a relevance of the first semantic vector and the second semantic vector at the second time interval,
store data corresponding to a first generated vector, the first semantic vector, and the second semantic vector in the memory,
obtain a updated weight for a first AI model, by training a first artificial intelligence (AI) model based on the data stored in the memory,
apply the updated weight to a second artificial intelligence (AI) model,
provide the information identifying the recommendation item or the recommended user based on a second AI model; and
re-update the updated weight of the first AI model based on a user interaction of user,
wherein the processor is further configured to:
obtain at least one among second user data and second item data in real time,
based on the second user data being obtained in real time, obtain at least one third keyword for the second user data,
generate a third semantic vector by inputting the at least one third keyword for the second user data to the DSSM,
obtain a fourth semantic vector by inputting the third semantic vector to the second AI model,
provide the information identifying the recommendation item based on the fourth semantic vector through the display,
receive the user interaction of user corresponding to the second user data associated with the recommendation item through the input interface,
generate a second vector that represents a relevance between the third semantic vector and the fourth semantic vector,
re-update the updated weight of the first AI model based on the user interaction of user corresponding to the second user data, the third semantic vector, the fourth semantic vector, and the second
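The two-model arrangement in the claims (a first model that is trained, a second model that receives its weights and serves recommendations, and a re-update driven by state, action and reward data) can be sketched roughly as follows. This is an illustrative toy under stated assumptions, not the claimed implementation: `LinearQ`, `sync` and `reupdate` are hypothetical names, the Q function is a simple weighted dot product, the reward is assumed to be 1.0 for a positive user interaction and 0.0 otherwise, and the update regresses directly onto the reward (a one-step, bandit-style target) because the claim text does not describe a next-state term.

```python
import copy


class LinearQ:
    """Toy Q function: Q(s, a) = sum_i w[i] * s[i] * a[i] (an assumed form)."""

    def __init__(self, dim):
        self.w = [0.0] * dim

    def q(self, state, action):
        return sum(w * s * a for w, s, a in zip(self.w, state, action))

    def update(self, state, action, target, lr=0.1):
        # One gradient step on the squared error 0.5 * (Q(s, a) - target)^2.
        error = self.q(state, action) - target
        self.w = [
            w - lr * error * s * a for w, s, a in zip(self.w, state, action)
        ]


def sync(serving_net, trained_net):
    """Apply the trained (first) model's weights to the serving (second) model."""
    serving_net.w = copy.copy(trained_net.w)


def reupdate(trained_net, transitions, lr=0.1):
    """Re-update the first model from (state, action, reward) triples."""
    for state, action, reward in transitions:
        trained_net.update(state, action, reward, lr)


# state: semantic vector of the incoming data; action: vector produced by the
# serving model; reward: assumed 1.0 when the user interacts positively.
first_model = LinearQ(2)
second_model = LinearQ(2)
state, action = [1.0, 0.0], [1.0, 0.0]
reupdate(first_model, [(state, action, 1.0)] * 50)
sync(second_model, first_model)
```

Keeping the serving model's weights as a separate copy means recommendations stay stable while the first model continues to learn from new interactions; the weights only change when `sync` is called.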
Learning methods · CPC title
Reinforcement learning · CPC title
Feedforward networks · CPC title
based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO] · CPC title
Knowledge engineering; Knowledge acquisition · CPC title