Eren Jaeger Generative Agent

In my last post, I left off preparing to fine-tune an LLM on Eren Jaeger. After collecting all of my training data and spending the weekend training a Mistral model, the results weren't very promising (albeit entertaining). There are many possible reasons for this, including too little data, too little training time, or poor-quality data. I could have spent more time trying to get it to work, but training already took so long that trying out different methods didn't seem time efficient, considering they might all fail. My system of extracting data from episode summaries and transcripts produced roughly 400 training examples, and if that wasn't enough, getting more wouldn't be easy. I had spent a lot of time on this, so the result was disappointing (though not too surprising), and ultimately I decided to take a different approach.

If I couldn't build Eren Jaeger's persona into the model itself, I would simply have to feed his persona into the models available. In this case, that meant generating an Eren Jaeger persona using the framework provided by the Generative Agents paper. The most important element of this was curating memories for Eren and putting them into the persona. I would still be using the OpenAI API to generate responses and behavior for Eren, but now it would have direct access to his own experiences, enabling it to emulate him. This approach is a much easier place to start for creating a custom persona.
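To make "feeding his persona into the model" a bit more concrete, here is a minimal sketch of the idea: pick the memories most relevant to a question and paste them into the prompt. This is not the Generative Agents code. The paper scores memories by a combination of recency, importance, and relevance, while this sketch only counts shared words. The function names (tokenize, retrieve, build_prompt) and the prompt wording are made up for illustration, and the actual API call is left out.

```python
import re
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def retrieve(memories: List[str], question: str, k: int = 3) -> List[str]:
    """Return the k memories sharing the most words with the question.

    Assumption: plain word overlap stands in for the embedding-based
    relevance score used in the Generative Agents paper.
    """
    q = tokenize(question)
    ranked = sorted(memories, key=lambda m: len(q & tokenize(m)), reverse=True)
    return ranked[:k]


def build_prompt(name: str, memories: List[str], question: str, k: int = 3) -> str:
    """Assemble a prompt that gives the model the persona's relevant memories."""
    lines = [f"You are {name}. Answer in character, using these memories:"]
    lines += [f"- {m}" for m in retrieve(memories, question, k)]
    lines.append(f"Question: {question}")
    return "\n".join(lines)


if __name__ == "__main__":
    memories = [
        "Eren's mother was killed by a Titan.",
        "Eren trained with the cadet corps.",
        "Eren wants to see the ocean.",
    ]
    print(build_prompt("Eren Jaeger", memories, "What happened to your mother?", k=1))
```

The resulting string would then be sent to whatever chat model is in use. Swapping the word-overlap score for embeddings is the obvious next step if the retrieved memories turn out to be off-topic.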

I still had to figure out how I would generate his experiences. There are a million ways you could do this, but I made an effort to prioritize the simpler approaches. I decided to use Eren Jaeger's Wikipedia page as the source, treating each sentence in it as an individual memory and inserting it into the persona's memory stream. On its own, though, this would have been too simple: he would end up with only basic observations and none of the higher-level reflections that are a vital part of what makes this framework special. That's why I had the memory generator reflect on his experiences, just as the Generative Agents code does. I actually used the same code that they did. Doing this took surprisingly long, and I ran into bug after bug. Eventually I got a basic version working.
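The sentence-per-memory idea can be sketched roughly like this. It is a simplified illustration, not the Generative Agents code I actually used: the Memory and MemoryStream classes, the naive regex sentence splitter, and the summarize callable (a stand-in for an LLM call that produces a reflection) are all assumptions made up for the example.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Memory:
    text: str
    kind: str = "observation"  # "observation" or "reflection"


@dataclass
class MemoryStream:
    memories: List[Memory] = field(default_factory=list)

    def add(self, text: str, kind: str = "observation") -> None:
        self.memories.append(Memory(text, kind))


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def load_biography(stream: MemoryStream, text: str) -> MemoryStream:
    """Insert every sentence of the source text as one observation."""
    for sentence in split_sentences(text):
        stream.add(sentence)
    return stream


def add_reflection(stream: MemoryStream,
                   summarize: Callable[[List[str]], str],
                   window: int = 5) -> str:
    """Ask `summarize` (hypothetical LLM wrapper) for an insight about the
    most recent memories and store it as a reflection."""
    recent = [m.text for m in stream.memories[-window:]]
    insight = summarize(recent)
    stream.add(insight, kind="reflection")
    return insight


if __name__ == "__main__":
    bio = "Eren lived in Shiganshina. He joined the Survey Corps."
    stream = load_biography(MemoryStream(), bio)
    # A fake summarizer so the example runs without an API key.
    add_reflection(stream, lambda texts: f"Summary of {len(texts)} memories.")
    for m in stream.memories:
        print(m.kind, "|", m.text)
```

A regex splitter like this will trip over abbreviations and quoted dialogue, so a proper sentence tokenizer is probably worth it for a real Wikipedia page.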

While the persona is functional, there is certainly plenty of room for improvement. It seems to have a solid grasp of Eren Jaeger and his history, but it definitely does not match his tone. For example, when asked "How do you feel about the Titans?", he responds "I feel both fear and anger towards the Titans. They have caused so much destruction and pain to my world, and I am determined to fight back against them. However, I also understand that there may be more to them than meets the eye, and I am willing to explore and learn about their origins and motives. Ultimately, I know that I must do whatever it takes to protect my loved ones and my home." To anyone who has watched Attack on Titan, this response doesn't feel quite right. My main intention with fine-tuning the LLM was to match his tone, but unfortunately that didn't work out.

It's cool to be able to talk to the agent, but the whole purpose of this project was to see how he interacts in the sandbox. However, experimenting with the sandbox can be frustrating because it can take a while for anything to happen and you don't have any control over what the agents do. I put in his background that he was transported to a new world, so he spent his first day wandering around not really doing anything. I'll have to play around more and wait until he actually interacts with another persona. That will be interesting.

With a basic prototype done, the next step is figuring out where to go from here. As usual, I'm overwhelmed with ideas. I could add more characters (such as Armin or Mikasa), I could try to make Eren's behavior more consistent with his character, or I could provide greater user interactivity. The options are endless, and that excites me.