Hey everyone!
I’ve been training a **Recurrent PPO** agent to play the classic **Teenage Mutant Ninja Turtles (Arcade)** game using only visual input. The goal is to teach the agent to fight through the levels using memory and spatial awareness, just like a human would.
Here are some key details:
* **Environment:** TMNT Arcade via custom Gymnasium + stable-retro integration
* **Observations:** 4 stacked grayscale frames at **160×160** resolution
* **Augmentations:** Random noise, brightness shifts, and cropping to improve generalization
* **Reward Signal:** Based on score increase, boss damage, and stage progression
* **Algorithm:** Recurrent Proximal Policy Optimization (RecPPO) with CNN + LSTM
* **Framework:** PyTorch with custom training loop (inspired by SB3)
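To make the observation pipeline concrete, here is a minimal NumPy-only sketch of what the preprocessing described above (grayscale, 160×160 resize, 4-frame stacking, plus noise and brightness augmentation) might look like. All function and class names here are illustrative, not taken from my actual code, and the augmentation magnitudes are placeholder values:

```python
import numpy as np

def to_grayscale(frame: np.ndarray) -> np.ndarray:
    """Convert an HxWx3 RGB frame to HxW grayscale (luminance weights)."""
    return (frame @ np.array([0.299, 0.587, 0.114])).astype(np.float32)

def resize_nearest(frame: np.ndarray, size: int = 160) -> np.ndarray:
    """Nearest-neighbour resize to size x size (avoids a cv2 dependency)."""
    h, w = frame.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return frame[rows[:, None], cols]

def augment(obs: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Random brightness shift plus additive pixel noise (magnitudes are placeholders)."""
    obs = obs + rng.uniform(-10, 10)           # global brightness shift
    obs = obs + rng.normal(0, 2, obs.shape)    # per-pixel noise
    return np.clip(obs, 0, 255).astype(np.float32)

class FrameStack:
    """Keeps the last k preprocessed frames; returns a (k, size, size) array."""
    def __init__(self, k: int = 4, size: int = 160):
        self.k, self.size = k, size
        self.frames = [np.zeros((size, size), np.float32) for _ in range(k)]

    def push(self, rgb_frame: np.ndarray) -> np.ndarray:
        gray = resize_nearest(to_grayscale(rgb_frame), self.size)
        self.frames = self.frames[1:] + [gray]
        return np.stack(self.frames)
```

In practice this would sit in a Gymnasium observation wrapper around the stable-retro environment, with augmentation applied only during training, not evaluation.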
The recurrent architecture has made a big difference in stability and long-term decision-making. The agent now consistently beats the first few levels and is learning to prioritize enemies and avoid damage.
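For anyone curious what a CNN + LSTM recurrent policy looks like in PyTorch, here's a minimal sketch. This is not my exact network, just the general shape: a Nature-DQN-style conv encoder, an LSTM core, and separate actor/critic heads. The action count, hidden size, and all names are illustrative assumptions:

```python
import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    """Illustrative CNN encoder + LSTM core with actor/critic heads."""
    def __init__(self, n_actions: int = 9, frames: int = 4, hidden: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(  # Nature-DQN-style conv stack
            nn.Conv2d(frames, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        with torch.no_grad():  # infer flattened conv size for 160x160 input
            n_flat = self.encoder(torch.zeros(1, frames, 160, 160)).shape[1]
        self.fc = nn.Sequential(nn.Linear(n_flat, hidden), nn.ReLU())
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.actor = nn.Linear(hidden, n_actions)  # policy logits
        self.critic = nn.Linear(hidden, 1)         # state-value estimate

    def forward(self, obs, state=None):
        # obs: (batch, time, frames, 160, 160); state: optional (h, c) pair
        b, t = obs.shape[:2]
        feat = self.fc(self.encoder(obs.reshape(b * t, *obs.shape[2:])))
        out, state = self.lstm(feat.reshape(b, t, -1), state)
        return self.actor(out), self.critic(out), state
```

The key detail for recurrent PPO is that rollouts are fed as `(batch, time)` sequences and the `(h, c)` state is carried across steps within an episode, so the LSTM can remember off-screen enemies and recent damage rather than reacting to a single stacked observation.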