Reinforcement Learning (Plugins)


- fundation2000
- - Joined 24 Feb, 2014
  - 7 topics • 33 posts
- 1
- 5 Jan, 2016
- Quote
Reinforcement Learning - Now for sale in the Scirra Store!

https://www.scirra.com/store/construct2 ... rning-1873

This plugin is built upon the Deep Q-Learning algorithm developed by Google's DeepMind. It enables the developer to give agents a 'brain' and train them using rewards/punishments. The implementation uses convnet.js by Andrej Karpathy.

For a more detailed description of the plugin's features and abilities, here is the documentation, as well as an example CAPX.

Please note that the plugin comes as-is: there is no guarantee of when, or whether, the artificial neural network will converge, nor can a certain accuracy be guaranteed.

Use this topic to leave comments, ask questions and talk about Reinforcement Learning.
- Savvy001
- - Joined 22 Jan, 2012
  - 124 topics • 785 posts
- 1
- 6 Jan, 2016
- Quote
It would be really great if you could explain more in the CAPX event sheet what is happening and why it needs to happen.

That would give us a more basic understanding of how to use the plugin in our own setups.

I like what I'm seeing so far!

Hope you can add to the CAPX.
- fundation2000
- - Joined 24 Feb, 2014
  - 7 topics • 33 posts
- 1
- 6 Jan, 2016
- Quote
I admit the CAPX is a bit bloated, since there's quite a lot going on, but it all breaks down into just 4 steps (Train, triggerAction, Reward and manageEnvironment):

TRAIN:

The agent gets 9 sensors, and each sensor generates three inputs for the brain - "apple", "poison" and "wall", using the distance to the touched object as the value.

(In this step, I also draw some lines from the agent to the touched object using the SensedApple and SensedPoison tiles, but these are purely optional; I guess I could take them out altogether.)

REWARD:

Then there's the reward built upon interaction with the apples, poison or the walls.

TRIGGER ACTION:

And finally there's the output of the brain (for example "right") transformed into an action ("set angle of motion to current angle +50 degrees").

MANAGE ENVIRONMENT:

This just adds more apples and poison if there aren't enough lying around.

If you still find this confusing, I'll take a look at it on the weekend and trim it down. Although I usually consider more to be better.
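
As a rough illustration of the TRAIN and TRIGGER ACTION steps (this is not the plugin's actual code), here is a minimal Python sketch of how 9 sensor readings could become 27 brain inputs and how an output could become a turn. `MAX_RANGE`, the use of the maximum distance for "nothing sensed", and the "left" and "forward" actions are assumptions made for the example; only "right" = +50 degrees comes from the description above.

```python
from typing import List, Optional, Tuple

OBJECT_TYPES = ("apple", "poison", "wall")
NUM_SENSORS = 9
MAX_RANGE = 120.0  # hypothetical sensor length in pixels

# Hypothetical action table; only "right" = +50 degrees is from the thread.
TURN_DEGREES = {"left": -50.0, "forward": 0.0, "right": 50.0}


def encode_sensors(readings: List[Optional[Tuple[str, float]]]) -> List[float]:
    """Turn sensor readings into brain inputs, three per sensor.

    Each reading is None (nothing touched) or (object_type, distance).
    """
    inputs: List[float] = []
    for reading in readings:
        # Assumption: a slot that senses nothing reports the maximum distance.
        slots = [MAX_RANGE] * len(OBJECT_TYPES)
        if reading is not None:
            kind, distance = reading
            slots[OBJECT_TYPES.index(kind)] = min(distance, MAX_RANGE)
        inputs.extend(slots)
    return inputs


def apply_action(angle: float, action: str) -> float:
    """Translate a brain output such as "right" into a new angle of motion."""
    return (angle + TURN_DEGREES[action]) % 360.0
```

For example, with an apple 30 pixels away on the first sensor and nothing on the others, the first three inputs are 30, 120 and 120, and the remaining 24 are all 120.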
- Savvy001
- - Joined 22 Jan, 2012
  - 124 topics • 785 posts
- 1
- 6 Jan, 2016
- Quote
That makes sense.

Please keep the events like this, it's all nice and clean.

Also, the lines drawn are a perfect visual guide to what is happening.

What would be great is an explanation of how the agent is actually training itself.

So we now know it has sensors connected to the brain.

We also know it gets rewarded.

(But how? Even more important, what does a reward mean to the agent? As in, "does it compare good & bad, or something else?")

And it has a trigger output.

(But how does this trigger output correlate with the brain's inputs and rewards? I guess this is important to know for understanding its learned behavior?)

If this could be explained, it becomes easier to understand the training process.

And then I can start training something of my own with this plugin.

(Instead of just a copy & paste.)

Hope this makes sense.
- rexrainbow
- - Joined 4 Apr, 2011
  - 222 topics • 4,531 posts
- 1
- 8 Jan, 2016
- Quote
Garbage in, garbage out. It might be better to tell users to feed in well-preprocessed input data first, e.g. normalized to the range 0 to 1.
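
A minimal sketch of that kind of preprocessing, in Python for illustration only. The function name and the choice to clamp out-of-range values are assumptions, not something the plugin provides.

```python
def normalize(value: float, lo: float, hi: float) -> float:
    """Linearly rescale value from [lo, hi] to [0, 1], clamping outliers."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    scaled = (value - lo) / (hi - lo)
    return max(0.0, min(1.0, scaled))
```

A sensor distance of 60 with a maximum range of 120 would be fed to the brain as `normalize(60, 0, 120)`, i.e. 0.5.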
- fundation2000
- - Joined 24 Feb, 2014
  - 7 topics • 33 posts
- 1
- 8 Jan, 2016
- Quote
I'll try to add some tips and hints on the weekend, but Rex is definitely right.

One should definitely invest some time and read up on reinforcement learning - what a neural network is, what inputs and outputs are, what forward and backward propagation are, etc. - since it's quite a broad topic and at first perhaps not too simple to grasp.

There is a plethora of great sources out there at the moment, as Deep Learning becomes a valuable tool for companies' data analysis. Start with some Wikipedia (like https://en.wikipedia.org/wiki/Reinforcement_learning and https://en.wikipedia.org/wiki/Q-learning) and also google around a bit.
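
To give a feel for what a reward means to the agent, here is a minimal sketch of the classic tabular Q-learning update from the Wikipedia article, in Python. It uses a lookup table instead of the neural network the plugin is built on, so treat it as an illustration of the idea only; the names `q_update`, `alpha` (learning rate) and `gamma` (discount factor) are choices made for this example.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Sequence, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def q_update(
    q: QTable,
    state: Hashable,
    action: Hashable,
    reward: float,
    next_state: Hashable,
    actions: Sequence[Hashable],
    alpha: float = 0.1,
    gamma: float = 0.9,
) -> float:
    """Apply one Q-learning step and return the updated Q(state, action)."""
    # Best value the agent currently expects from the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Target: immediate reward plus discounted future value.
    target = reward + gamma * best_next
    # Move the old estimate a fraction alpha toward the target.
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


q: QTable = defaultdict(float)
q_update(q, "s0", "right", 10.0, "s1", ["left", "right"])
```

The agent does not compare "good" and "bad" directly. It keeps a running estimate of how much total reward each action tends to lead to in each situation, and a reward simply nudges the estimate for the action it just took.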
- Savvy001
- - Joined 22 Jan, 2012
  - 124 topics • 785 posts
- 1
- 10 Jan, 2016
- Quote
I'll try to add some tips and hints on the weekend

Thank you for this!

And you are correct, investing time in this subject is needed, but on that same note, I have already invested a lot of time in it.

However, I find that most explanations given are not based in simplicity.

But I'm learning as I go.
- Isaske
- - Joined 17 Mar, 2013
  - 18 topics • 53 posts
- 1
- 12 Mar, 2016
- Quote
Can you upload a preview, please?
- egos
- - Joined 29 Apr, 2012
  - 4 topics • 30 posts
- 1
- 22 Mar, 2016
- Quote
Hello,

Nice idea, and so important for the future.

But I didn't find the CAPX on your web page.

Any idea where to find the documentation?

Thanks
- fundation2000
- - Joined 24 Feb, 2014
  - 7 topics • 33 posts
- 1
- 22 Mar, 2016
- Quote
Hi guys,

so sorry I never got around to writing more detailed documentation. I just can't find the time at the moment, but I'll try to fit it in sometime.

Until then, here is the CAPX file and here is the documentation.

If you have specific questions just ask here and I will gladly help.
- egos
- - Joined 29 Apr, 2012
  - 4 topics • 30 posts
- 1
- 23 Mar, 2016
- Quote
Perfect.

I'll test this and give feedback soon.

This:

http://cs.stanford.edu/people/karpathy/convnetjs/demo/rldemo.html

will be very helpful, I think.
- Isaske
- - Joined 17 Mar, 2013
  - 18 topics • 53 posts
- 1
- 17 May, 2016
- Quote
Would it be possible to share your CAPX file?

I want to learn, please.
- jan2000
- - Joined 11 Apr, 2014
  - 5 topics • 52 posts
- 1
- 29 Jul, 2016
- Quote
I can't get this to work. I have tried the simplest training I could imagine: one input (randomly either "a" or "b") and two outputs (either "a" or "b"). A match between input and output is rewarded with 10, a mismatch is punished with -10. After 20 minutes of "training" the output is still completely random. Am I doing it wrong, fundation2000?
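
For reference, the same matching task can be sketched with a plain lookup table in Python. This is not how the plugin works: a table is far simpler than a neural network, so how quickly this version learns says nothing about how quickly the plugin should converge. The function names, step count, learning rate and exploration rate are all assumptions chosen for the example.

```python
import random
from typing import Dict, Tuple

SYMBOLS = ("a", "b")


def train_matcher(
    steps: int = 2000,
    alpha: float = 0.1,
    epsilon: float = 0.2,
    seed: int = 0,
) -> Dict[Tuple[str, str], float]:
    """Learn to echo the input symbol from +10 / -10 rewards."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in SYMBOLS for a in SYMBOLS}
    for _ in range(steps):
        state = rng.choice(SYMBOLS)  # the random input, "a" or "b"
        if rng.random() < epsilon:
            action = rng.choice(SYMBOLS)  # explore with a random output
        else:
            action = max(SYMBOLS, key=lambda a: q[(state, a)])  # exploit
        reward = 10.0 if action == state else -10.0
        # One-step task: there is no next state, so the target is the reward.
        q[(state, action)] += alpha * (reward - q[(state, action)])
    return q


def greedy(q: Dict[Tuple[str, str], float], state: str) -> str:
    """Return the output with the highest learned value for this input."""
    return max(SYMBOLS, key=lambda a: q[(state, a)])
```

After training, `greedy(q, "a")` returns "a" and `greedy(q, "b")` returns "b".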
- fundation2000
- - Joined 24 Feb, 2014
  - 7 topics • 33 posts
- 1
- 30 Jul, 2016
- Quote
After 20 minutes of "training" the output is still completely random. Am I doing it wrong?

Hi Jan. At first glance, what you're doing looks correct to me. In my experience, it usually takes around 200,000 ticks (for example, 3 hours) for the agent to begin behaving intelligently, i.e. for it to converge. This is a downside of Construct 2: it only allows actions once per tick, so you can't accelerate training beyond that.

Also, try setting Action:Learning to turn off the learning process after the training period - this way the agent won't take any more random actions, and you get a clearer picture of the outputs that are correlated with your inputs.
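
The effect of that switch can be pictured with a generic epsilon-greedy sketch in Python. Whether the plugin does exactly this internally is an assumption; `choose_action`, `epsilon` and the `learning` flag are names made up for this example.

```python
import random
from typing import Dict, Optional


def choose_action(
    q_values: Dict[str, float],
    epsilon: float,
    learning: bool,
    rng: Optional[random.Random] = None,
) -> str:
    """Pick an output: sometimes random while learning, always best otherwise."""
    rng = rng if rng is not None else random.Random()
    if learning and rng.random() < epsilon:
        # Exploration: ignore the estimates and try a random output.
        return rng.choice(list(q_values))
    # Exploitation: take the output with the highest estimated value.
    return max(q_values, key=lambda name: q_values[name])
```

With `learning=False` the random branch is never taken, so the same inputs always produce the output the brain currently rates highest.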
- jan2000
- - Joined 11 Apr, 2014
  - 5 topics • 52 posts
- 1
- 31 Jul, 2016
- Quote
You are right! After a long time it gets a lot better. Thank you!

Have you tried it on images?