Subnet 47: EvolAI

EvolAI is Bittensor Subnet 47, a language-model evaluation subnet. It runs a continuous competition in which miners submit small language models and validators score those models each round, rewarding the ones that improve the most. The subnet does not serve live inference; its purpose is to push a population of models to improve steadily against a fixed reference.

What the Subnet Produces

The subnet’s output is trained model weights, published openly. Miners host their models on Hugging Face and register them to the subnet under one of the supported architecture tracks. Because every model is public and evaluated on a shared, openly described dataset, the competition is reproducible: anyone can see which samples a model is graded on and how it performed.

The design favors steady improvement over one-off wins. Models are graded against a public reference model, and smaller models receive a parameter-efficiency bonus, so a compact model that closely matches the reference can outscore a larger but less efficient one.

Miner and Validator Roles

Miners train and publish language models and register each new revision with the subnet. Validators load a miner’s weights each round and score them on a few combined signals: the main one is how closely the model’s outputs match a reference model (measured by KL divergence), with additional credit for models that benefit from chain-of-thought reasoning, for consistent improvement across rounds, and for small auxiliary arithmetic tasks.
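
The reference-matching signal can be illustrated with a minimal sketch of KL divergence between two next-token distributions. The logit values, the function names, and the direction of the divergence (reference relative to miner) are assumptions made for illustration; the README is the authority on how the validator actually computes it.

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )

# Hypothetical logits over a three-token vocabulary.
reference_logits = [2.0, 1.0, 0.1]
miner_logits = [1.8, 1.1, 0.2]

reference_probs = softmax(reference_logits)
miner_probs = softmax(miner_logits)

# Lower values mean the miner's distribution is closer to the reference.
kl = kl_divergence(reference_probs, miner_probs)
```

A value of zero means the two distributions are identical, so under this sketch a miner improves its quality signal by driving the divergence down.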

To keep the competition honest, the subnet locks a miner’s exact model revision before releasing the next evaluation seed, so a model cannot be swapped after the challenge is set, and it applies gates that require each new revision to actually beat the miner’s previous one without degrading general ability. The resulting validator scores become the weights that feed into Yuma Consensus, which converts them into the incentive split across miners and validators.
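
The improvement gates described above can be sketched as a simple check. Everything here is hypothetical: the function name, the use of KL divergence as the improvement measure, the general-ability score, and the tolerance value are assumptions, not the thresholds the subnet applies.

```python
def passes_gates(new_kl, prev_kl, new_general, prev_general, tolerance=0.01):
    """Return True if a new revision clears both illustrative gates.

    Gate 1: the new revision must match the reference more closely
            (lower KL divergence) than the miner's previous revision.
    Gate 2: a general-ability score must not drop by more than `tolerance`.
    """
    improved = new_kl < prev_kl
    not_degraded = new_general >= prev_general - tolerance
    return improved and not_degraded

# A revision that improves and holds general ability passes.
accepted = passes_gates(new_kl=0.08, prev_kl=0.10, new_general=0.71, prev_general=0.70)

# A revision that improves on KL but loses general ability is rejected.
rejected = passes_gates(new_kl=0.08, prev_kl=0.10, new_general=0.60, prev_general=0.70)
```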

Evaluation Signal Context

The EvolAI README describes the validator score as a blend of three signals. Quality is the largest component and measures how closely a miner model matches the reference model on the evaluation samples, with an additional check for whether chain-of-thought tokens improve the model’s performance. Flow rewards steady improvement over time, so a miner is not only measured by one isolated model snapshot. Side quests add a small arithmetic accuracy signal that tests whether the model can return short, direct answers on simple tasks.
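
A minimal sketch of how three signals might be blended into one score follows. The weights are hypothetical, chosen only so that quality is the largest component and side quests the smallest, as the README describes; the actual coefficients are defined in the EvolAI repository.

```python
from dataclasses import dataclass

@dataclass
class RoundSignals:
    quality: float      # reference matching plus chain-of-thought check, in [0, 1]
    flow: float         # steady improvement across rounds, in [0, 1]
    side_quests: float  # arithmetic accuracy, in [0, 1]

# Hypothetical weights, not taken from the README.
WEIGHTS = {"quality": 0.7, "flow": 0.2, "side_quests": 0.1}

def blended_score(signals: RoundSignals) -> float:
    """Weighted sum of the three signals."""
    return (
        WEIGHTS["quality"] * signals.quality
        + WEIGHTS["flow"] * signals.flow
        + WEIGHTS["side_quests"] * signals.side_quests
    )

# A strong model that is also improving steadily.
steady = blended_score(RoundSignals(quality=0.9, flow=0.8, side_quests=0.5))

# The same quality with no improvement over time scores lower.
stalled = blended_score(RoundSignals(quality=0.9, flow=0.0, side_quests=0.5))
```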

Those signals make SN47 more specific than a generic model leaderboard. A miner is competing on reference-model matching, usefulness of reasoning tokens, consistency of progress, and small auxiliary tasks at the same time. The scoring design can therefore reward a compact model that keeps improving and remains useful across checks, rather than only rewarding raw size or one favorable round.

The README also separates the competition into supported architecture tracks and describes a parameter-efficiency bonus for smaller models. That matters because the subnet is not simply asking for the largest possible model. It narrows the comparison to allowed model families and gives compact models a way to compete when they can approach reference-model behavior efficiently.
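
To illustrate how a parameter-efficiency bonus could let a compact model outscore a larger one, here is a sketch under invented assumptions. The mapping from KL divergence to a quality value, the bonus formula, the parameter cap, and the bonus size are all hypothetical and are not taken from the README.

```python
import math

def quality_from_kl(kl):
    """Map KL divergence to a (0, 1] quality value; lower KL gives higher quality."""
    return math.exp(-kl)

def efficiency_bonus(num_params, param_cap, max_bonus=0.2):
    """Multiplier that grows as the model uses less of the track's parameter cap."""
    fraction_unused = max(0.0, 1.0 - num_params / param_cap)
    return 1.0 + max_bonus * fraction_unused

def adjusted_quality(kl, num_params, param_cap):
    return quality_from_kl(kl) * efficiency_bonus(num_params, param_cap)

# Hypothetical track with a 4B-parameter cap.
compact = adjusted_quality(kl=0.10, num_params=1e9, param_cap=4e9)
large = adjusted_quality(kl=0.08, num_params=3e9, param_cap=4e9)
```

In this example the larger model matches the reference slightly better, yet the compact model ends with the higher adjusted value because it uses far less of the cap.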

The same README also describes a timing boundary between model submission and challenge release. Validators lock the miner’s model revision before publishing the next seed. After that point, the miner can see the next challenge seed, but the already locked revision is the one that will be evaluated for the current round. That sequence gives miners public challenge information while preventing a model from being changed after the validator has fixed the revision that counts.
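
The lock-then-publish ordering can be sketched as a small state object. The class, method names, miner identifier, and revision strings are hypothetical; the sketch only shows why a revision registered after the seed is published cannot replace the locked one.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class RoundState:
    locked_revisions: Dict[str, str] = field(default_factory=dict)
    seed: Optional[int] = None

    def lock(self, miner: str, revision: str) -> None:
        """Record the revision that will be evaluated this round."""
        if self.seed is not None:
            raise RuntimeError("seed already published; revisions are frozen")
        self.locked_revisions[miner] = revision

    def publish_seed(self, seed: int) -> None:
        """Release the challenge seed; no further locks are accepted."""
        self.seed = seed

    def revision_to_evaluate(self, miner: str) -> str:
        return self.locked_revisions[miner]

state = RoundState()
state.lock("miner-a", "rev-1")   # revision fixed first
state.publish_seed(12345)        # challenge seed becomes visible afterwards

try:
    state.lock("miner-a", "rev-2")  # attempted swap after seeing the seed
    swap_allowed = True
except RuntimeError:
    swap_allowed = False
```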

The README identifies the active evaluation dataset as evolai/universal_qa, whose dataset card lists instruction and response fields across a large training split. The dataset card therefore confirms that EvolAI evaluates models on a public question-answering dataset rather than on private prompts hidden from readers.

For Taopedia readers, this means EvolAI’s public evaluation setup should be read together with its revision-lock rule. Public samples and deterministic challenge selection make the contest inspectable, while revision locking keeps the measured model tied to a specific published state. The subnet is therefore about repeated, transparent model-improvement rounds rather than a private inference service.
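
Deterministic challenge selection can be illustrated with a seeded random generator: anyone who knows the seed and the dataset size can recompute the same sample indices. The function name, dataset size, and sample count are hypothetical, and the subnet's actual selection procedure may differ.

```python
import random

def select_sample_indices(seed, dataset_size, num_samples):
    """Pick evaluation sample indices reproducibly from a public seed."""
    rng = random.Random(seed)  # isolated generator, does not touch global state
    return sorted(rng.sample(range(dataset_size), num_samples))

# Two independent observers using the same seed get the same samples.
validator_view = select_sample_indices(seed=2024, dataset_size=10_000, num_samples=5)
observer_view = select_sample_indices(seed=2024, dataset_size=10_000, num_samples=5)

# A different seed yields a different challenge set.
next_round = select_sample_indices(seed=2025, dataset_size=10_000, num_samples=5)
```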

On-Chain Identity

Live SN47 data, including metagraph state and alpha token pool information, is available on TaoStats. The live Finney identity for netuid 47 links openevolai/evolai.git as the subnet’s open-source codebase, which is the source of truth for the evaluation signals and scoring process described above.

Relationship to Yuma Consensus

Subnet 47 uses Yuma Consensus to convert the model-quality weight vectors that validators submit into the emission shares distributed to miners and validators within the subnet each tempo. The Yuma Consensus documentation describes how validator weight submissions are aggregated into consensus weights for each miner registered on the subnet.

In EvolAI’s context, validators lock each miner’s registered model revision before releasing the evaluation seed. They then score submissions on KL divergence against a reference model, chain-of-thought benefit, steady improvement across rounds, and arithmetic side quests, and translate those blended scores into weight vectors for the subnet. The Emission documentation describes how those consensus weights determine each participant’s share of the subnet’s accumulated emission each tempo.
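
The step from blended scores to a weight vector can be sketched as a normalization. This is a simplified illustration with hypothetical miner UIDs and scores; it shows only one validator's side and does not model Yuma Consensus, which aggregates the vectors submitted by many validators.

```python
def scores_to_weights(scores):
    """Normalize per-miner scores into weights that sum to 1."""
    clipped = {uid: max(0.0, s) for uid, s in scores.items()}  # no negative weights
    total = sum(clipped.values())
    if total == 0:
        return {uid: 0.0 for uid in clipped}
    return {uid: s / total for uid, s in clipped.items()}

# Hypothetical blended scores for three miner UIDs.
round_scores = {11: 0.84, 27: 0.68, 42: 0.48}
weights = scores_to_weights(round_scores)
```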

Development Stage Context

The Introduction to Bittensor describes subnet development as moving from localnet to testnet and then mainnet. For EvolAI (SN47), the stage in that sequence determines how readers should interpret small-language-model training examples and continuous evaluation outcomes.

In localnet, EvolAI-compatible miners and validators can be developed and tested in an isolated environment. Localnet model competition scores and emission outcomes do not represent production subnet performance.

On testnet, EvolAI-compatible model submissions can be exercised in a shared, non-production network. Testnet evaluations and validator scores are separate from mainnet subnet state.

On mainnet, EvolAI (SN47) is the live production subnet where miners compete to train small language models and validators evaluate those submissions each round to determine real Bittensor emissions. The EvolAI repository describes the mechanism that applies on the production network.

The Bittensor Networks reference separates mainnet, testnet, and localnet. A model evaluation result or emission outcome from one environment should not be read as representing production subnet performance in another environment.

Reader Boundary

Subnet 47 EvolAI should not be read as generic Bittensor subnet documentation, a live inference API, or proof that one static model snapshot wins the subnet permanently. It describes one subnet’s continuous small-language-model training competition on netuid 47 (Understanding Subnets, Glossary: Netuid).

Revision Locking Fixes Which Model Revision Is Scored

The EvolAI README describes validators locking a miner’s model revision before releasing the next evaluation seed (EvolAI README).

That sequence ties each round’s score to a specific published revision rather than a post-challenge swap.

Validator Score Blends Quality, Flow, and Side Quests

The same README describes validator scoring as a blend of reference-model matching (KL divergence), steady improvement across rounds, and small arithmetic side quests (EvolAI README).

Those signals narrow the contest beyond a single leaderboard metric.

Validator Weights Still Flow Through Yuma Consensus

Subnet 47 uses Yuma Consensus to convert validator weight submissions into emission shares each tempo (Yuma Consensus, Emission).