RE: REWARD HACKING

The more I learn about the "problems" of AI, the better I understand the "problems" of human motivation (e-motion).

An interesting view. I've always held a bit of fear towards AI, imagining the worst traits of humanity in a system that can function faster than humanity. A reflection of what was created. I watched the first video (although I need to watch it again to understand it better). Loved reading through the comments, which is one of my favorite pastimes. In the video, he shows this saying on the screen:

General AI won't want you to fix its code

It reminded me of this scene from Star Trek I watched as a child.

I remember using the Mario cheat on the stairs, bouncing that turtle for unlimited lives (not really unlimited: if you were too greedy, the game was over the first time you died). I laughed when he talked about using cheats in a Mario game.

Adversarial Reward Functions. This one made me laugh. You see this everywhere, even on this blockchain. When I was self-publishing, I saw it with people exploiting shortcomings on the sales platforms (Kindle, Nook, etc.) and with webmasters back when they were creating all those crappy backlinks. I'm not sure how one could make the reward system capable of self-defense. Those on the losing end would surely feel (perhaps justifiably) that the powerful programmers had predetermined the winners of the reward while feigning that it was beyond their control, that it was simply code.

Much to think about, and I haven't watched the other videos or read the 29-page PDF "Concrete Problems in AI Safety" yet. Much of this might be above my understanding, as I've shied away from AI topics out of a fear of AI.