Hey everyone!
I’ve been training a **Recurrent PPO** agent to play the classic **Teenage Mutant Ninja Turtles (Arcade)** game using only visual input. The goal is to teach the agent to fight through the levels using memory and spatial awareness, just like a human would.
Here are some key details:
* **Environment:** TMNT Arcade via custom Gymnasium + stable-retro integration
* **Observations:** 4 stacked grayscale frames at **160×160** resolution
* **Augmentations:** Random noise, brightness shifts, and cropping to improve generalization
* **Reward Signal:** Based on score increase, boss damage, and stage progression
* **Algorithm:** Recurrent Proximal Policy Optimization (RecPPO) with CNN + LSTM
* **Framework:** PyTorch with custom training loop (inspired by SB3)
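To make the observation pipeline concrete, here is a minimal NumPy-only sketch of what the preprocessing described above (grayscale, 160×160 resize, 4-frame stacking, plus noise and brightness augmentation) might look like. All function and class names here are illustrative, not taken from my actual code, and the augmentation magnitudes are placeholder values:

```python
import numpy as np

def to_grayscale(frame: np.ndarray) -> np.ndarray:
    """Convert an HxWx3 RGB frame to HxW grayscale (luminance weights)."""
    return (frame @ np.array([0.299, 0.587, 0.114])).astype(np.float32)

def resize_nearest(frame: np.ndarray, size: int = 160) -> np.ndarray:
    """Nearest-neighbour resize to size x size (avoids a cv2 dependency)."""
    h, w = frame.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return frame[rows[:, None], cols]

def augment(obs: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Random brightness shift plus additive pixel noise (magnitudes are placeholders)."""
    obs = obs + rng.uniform(-10, 10)           # global brightness shift
    obs = obs + rng.normal(0, 2, obs.shape)    # per-pixel noise
    return np.clip(obs, 0, 255).astype(np.float32)

class FrameStack:
    """Keeps the last k preprocessed frames; returns a (k, size, size) array."""
    def __init__(self, k: int = 4, size: int = 160):
        self.k, self.size = k, size
        self.frames = [np.zeros((size, size), np.float32) for _ in range(k)]

    def push(self, rgb_frame: np.ndarray) -> np.ndarray:
        gray = resize_nearest(to_grayscale(rgb_frame), self.size)
        self.frames = self.frames[1:] + [gray]
        return np.stack(self.frames)
```

In practice this would sit in a Gymnasium observation wrapper around the stable-retro environment, with augmentation applied only during training, not evaluation.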
The recurrent architecture has made a big difference in stability and long-term decision-making. The agent now consistently beats the first few levels and is learning to prioritize enemies and avoid damage.
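For anyone curious what a CNN + LSTM recurrent policy looks like in PyTorch, here's a minimal sketch. This is not my exact network, just the general shape: a Nature-DQN-style conv encoder, an LSTM core, and separate actor/critic heads. The action count, hidden size, and all names are illustrative assumptions:

```python
import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    """Illustrative CNN encoder + LSTM core with actor/critic heads."""
    def __init__(self, n_actions: int = 9, frames: int = 4, hidden: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(  # Nature-DQN-style conv stack
            nn.Conv2d(frames, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        with torch.no_grad():  # infer flattened conv size for 160x160 input
            n_flat = self.encoder(torch.zeros(1, frames, 160, 160)).shape[1]
        self.fc = nn.Sequential(nn.Linear(n_flat, hidden), nn.ReLU())
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.actor = nn.Linear(hidden, n_actions)  # policy logits
        self.critic = nn.Linear(hidden, 1)         # state-value estimate

    def forward(self, obs, state=None):
        # obs: (batch, time, frames, 160, 160); state: optional (h, c) pair
        b, t = obs.shape[:2]
        feat = self.fc(self.encoder(obs.reshape(b * t, *obs.shape[2:])))
        out, state = self.lstm(feat.reshape(b, t, -1), state)
        return self.actor(out), self.critic(out), state
```

The key detail for recurrent PPO is that rollouts are fed as `(batch, time)` sequences and the `(h, c)` state is carried across steps within an episode, so the LSTM can remember off-screen enemies and recent damage rather than reacting to a single stacked observation.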