In FLock’s AI Arena, participants simultaneously compete and collaborate to train models for AI agents. We interviewed a top-performing user, Foundry ML engineer Jerry Adams Franklin, about his journey into community-focused decentralised AI training with FLock. Read on for the techniques behind Jerry’s consistent rise to the top of the league tables.

AI Arena is a blockchain-based, incentivised fine-tuning platform for which FLock launched the closed beta on train.flock.io in mid-May. Participants stake tokens and are rewarded (or slashed) based on the quality of their training and validation. Through public participation and distributed compute, we bring training closer to the user.

Having built the world’s largest Bitcoin mining pool, Foundry leverages its institutional expertise, capital, and market intelligence to empower participants within the crypto ecosystem by providing the tools to build tomorrow’s decentralised infrastructure. Foundry supports FLock as a leading node operator and as a participant and source of valuable feedback in the AI Arena beta. Both Foundry and FLock stand at the forefront of the decentralised AI revolution.

The interview was led by FLock’s Engineering Lead Nick Wen.

From the LLM rudiments upwards

Like many engineers and data scientists, Jerry had limited prior experience working with LLMs before onboarding onto AI Arena. To get up to speed with the fine-tuning process, he used FLock’s testnet training node quickstart, a script that fully automates the training process, to grasp the function of each module. To educate himself further, he watched YouTube videos such as Training Node 101.

This preparation helped Jerry understand what the input actually does to the LLM, and how that input should be formatted for the model to perform at its best.
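
To illustrate why formatting matters, here is a minimal sketch assuming a Hugging Face tokenizer with a chat template; the checkpoint name is an example, not necessarily one Jerry used:

```python
from transformers import AutoTokenizer

# Example chat-tuned checkpoint; any model with a chat template behaves similarly.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-7B-Chat")

# One training example expressed as a conversation.
messages = [
    {"role": "user", "content": "What is Farcaster?"},
    {"role": "assistant", "content": "Farcaster is a decentralised social protocol."},
]

# apply_chat_template inserts the special tokens the model was trained with;
# feeding raw concatenated text instead can noticeably degrade results.
print(tokenizer.apply_chat_template(messages, tokenize=False))
```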

After a couple of tasks, he began to modify the script. He continues to reference most modules from the training quickstart, such as the data loader and the collator. For fine-tuning, however, he sets his own parameters.
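
As a sketch of what setting your own parameters can look like, here is an illustrative LoRA and trainer configuration in the Hugging Face ecosystem. The library choice and every value here are assumptions; Jerry’s actual settings are not public.

```python
from peft import LoraConfig
from transformers import TrainingArguments

# Illustrative LoRA adapter configuration: only the adapter weights are trained.
lora_config = LoraConfig(
    r=16,                              # adapter rank; more capacity, more overfitting risk
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

# Illustrative training arguments; treat these as a starting point, not a recipe.
training_args = TrainingArguments(
    output_dir="out",
    learning_rate=2e-4,
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    weight_decay=0.01,                 # regularisation; more on this below
    logging_steps=10,
)
```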

Succeeding at data generation tasks

In the vast majority of tasks so far, datasets have been provided. The challenge in a minority of tasks is that participants must generate the training data themselves. These tasks simulate real-world scenarios where there is insufficient data to train a model.

A good example of this was Farcaster GPT. Participants trained a chatbot using Farcaster documentation. The provided training set only covered a small portion of the Farcaster documentation, but the final validation included the entire documentation. We recommended participants generate more data using all the information available at https://docs.farcaster.xyz/.

Data generation, Jerry explains, is a more important step than hyper-parameter tuning. Most of the time, the data the model is trained on and the data it is later exposed to are very different. Because of this gap, expanding and augmenting the dataset is imperative.

Choosing a model

Jerry was asked how he succeeds in such tasks. First, he experimented with base models at the specified maximum of 7 billion parameters, all of which produced similar results. He settled on Llama and Qwen 1.5 out of familiarity. Other models have their own strengths, such as being trained on vast quantities of Chinese data or offering strong multi-modal capabilities, but for this task those presented no particular advantage.

To Jerry, it seemed crucial to ensure that the model did not overfit on the sample training data. This error can arise, for instance, from scaling up the trainable parameters too aggressively, a mistake he observed in many of the contestants.

The key was to strike a balance between hyper-parameterisation and regularisation. With regularisation, the model generalises far better than it would by simply memorising the data. For that, he used techniques such as weight decay and early stopping. Early stopping halts training as soon as the loss plateaus for a few steps, ensuring that the model does not over-learn the data.
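
In the Hugging Face Trainer, for instance, both techniques come down to a few lines of configuration. The sketch below assumes that stack; the model and datasets are placeholders, and the patience and decay values are illustrative rather than Jerry’s:

```python
from transformers import Trainer, TrainingArguments, EarlyStoppingCallback

args = TrainingArguments(
    output_dir="out",
    weight_decay=0.01,                 # shrinks weights each step to curb overfitting
    evaluation_strategy="steps",
    eval_steps=50,
    save_strategy="steps",
    save_steps=50,
    load_best_model_at_end=True,       # required by EarlyStoppingCallback
    metric_for_best_model="eval_loss",
)

trainer = Trainer(
    model=model,                       # placeholder: the model being fine-tuned
    args=args,
    train_dataset=train_dataset,       # placeholder: tokenised training split
    eval_dataset=eval_dataset,         # placeholder: held-out split for the loss check
    # Stop once the eval loss has failed to improve for three evaluations in a row.
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()
```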

Data cleaning and formatting

Clean, consistent data formatting is essential for model training to run smoothly. If even one data point is in the wrong format, training grinds to a halt.
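
A quick pre-flight check catches the offending record before it stalls a long run. The sketch below assumes a JSONL file named train.jsonl whose records carry a conversations list of role/content turns; both the file name and the schema are assumptions about the task format:

```python
import json

REQUIRED_KEYS = {"role", "content"}

# Validate every record up front so one malformed line cannot halt training later.
with open("train.jsonl", encoding="utf-8") as f:
    for lineno, line in enumerate(f, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            raise SystemExit(f"line {lineno}: invalid JSON ({exc})")
        for turn in record.get("conversations", []):
            if not REQUIRED_KEYS <= turn.keys():
                missing = REQUIRED_KEYS - turn.keys()
                raise SystemExit(f"line {lineno}: turn missing {missing}")
print("all records well-formed")
```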

Not being a data generation expert, Jerry had to get creative in his techniques. He divided the documents into paragraphs based on headings, then used GPT-4 to generate conversations based on each paragraph.
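
A sketch of that pipeline might look like the following. The heading-splitting regex, the file name, and the prompt wording are guesses at the approach rather than Jerry’s actual code, and it assumes the OpenAI Python client with an API key set in the environment:

```python
import re
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def split_by_headings(markdown: str) -> list[str]:
    # Split the document into sections wherever a level 1-3 heading starts a line.
    parts = re.split(r"(?m)^(?=#{1,3} )", markdown)
    return [p.strip() for p in parts if p.strip()]

def section_to_conversation(section: str) -> str:
    # Ask GPT-4 to turn one documentation section into a short Q&A exchange.
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "You write realistic user/assistant Q&A pairs grounded "
                        "strictly in the supplied documentation."},
            {"role": "user",
             "content": f"Generate one question-and-answer exchange about:\n\n{section}"},
        ],
    )
    return response.choices[0].message.content

docs = open("farcaster_docs.md", encoding="utf-8").read()  # hypothetical local copy
conversations = [section_to_conversation(s) for s in split_by_headings(docs)]
```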

Closing thoughts

Feeling inspired by Jerry’s experience with FLock and the techniques behind his rise up the league tables? After 3,574 submitted models and 112,750 validations from 285 AI developers, the AI Arena closed beta has successfully concluded. FLock is now pleased to lift the whitelist requirement and launch the open beta: get involved on train.flock.io.

Find out more about FLock.io in this previous blog and from our docs.

Join our Discord to give feedback and chat with the community. 

For future updates, follow FLock.io on Twitter.