Subnet 11: TrajectoryRL

TrajectoryRL is Bittensor Subnet 11. TaoStats describes its live on-chain identity as “Agentic RL as a Service, Optimize agent trajectories to make agents cheaper, safer, and more reliable.” The subnet’s official trajectoryRL/trajectoryRL repository presents TrajectoryRL as a continuous reinforcement learning competition in which miners produce installable AI agent skills and validators evaluate them in sandboxed scenarios.

What TrajectoryRL Provides

According to the official repository, TrajectoryRL is organized around installable SKILL.md files for AI agents. The README says miners compete to produce the best skills, validators test those skills in real sandboxes, and the winning skills surface through the subnet’s associated tooling.

The same source describes the subnet as a continuous competition rather than a one-time benchmark. Its stated aim is to improve the quality and reliability of agent skills over repeated evaluation cycles.

Skill Artifact Context

The TrajectoryRL repository describes the miner artifact as a SKILL.md pack rather than a model endpoint or always-on server. This shapes how the subnet’s output should be read: the durable product is an instruction scaffold that an agent installs and reuses across tasks, not a single benchmark answer. It also separates miner work from validator infrastructure: miners compete on reusable agent guidance, while validators measure whether that guidance improves task completion.

The miner guide supports that reading by describing SKILL.md as the persistent context an agent receives before working through scenarios. Competitive miners therefore improve the way an agent approaches classes of operational problems. A skill that only memorizes a known scenario is weak, because it does not help the agent generalize when the next task changes.

References: TrajectoryRL repository, TrajectoryRL miner guide
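To illustrate why memorization is weak, the following Python sketch compares a skill’s mean passed-test share on known scenarios with its share on held-out ones. The scenario names, pass counts, and the `mean_pass_rate` and `generalization_gap` helpers are hypothetical, and the averaged metric is an illustrative choice rather than TrajectoryRL’s scoring rule.

```python
from typing import Mapping, Tuple

# Hypothetical results: scenario name -> (tests passed, tests total).
Results = Mapping[str, Tuple[int, int]]


def mean_pass_rate(results: Results) -> float:
    """Average passed-test share across scenarios (illustrative metric only)."""
    if not results:
        return 0.0
    return sum(passed / total for passed, total in results.values()) / len(results)


def generalization_gap(seen: Results, unseen: Results) -> float:
    """Drop in mean pass rate from known scenarios to held-out scenarios."""
    return mean_pass_rate(seen) - mean_pass_rate(unseen)


# Hypothetical skill that memorized its known scenarios.
memorizer_seen = {"inbox_triage": (10, 10), "calendar_merge": (8, 8)}
memorizer_unseen = {"invoice_reconcile": (2, 10), "ticket_routing": (1, 8)}

# Hypothetical skill that encodes a general approach.
general_seen = {"inbox_triage": (9, 10), "calendar_merge": (7, 8)}
general_unseen = {"invoice_reconcile": (8, 10), "ticket_routing": (7, 8)}

print(round(generalization_gap(memorizer_seen, memorizer_unseen), 4))
print(round(generalization_gap(general_seen, general_unseen), 4))
```

The memorizing skill looks perfect on its known scenarios but loses most of that advantage once the task set changes, while the general skill scores slightly lower on known scenarios and holds up on new ones.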

Sandbox Scoring Context

The trajrl-bench repository describes the benchmark as scenario containers with programmatic verifiers. Validators run the agent through each scenario, then score the produced output against tests. That makes SN11 closer to a software-task evaluation market than to a subjective prompt-ranking contest.

This is why sandboxed scenarios are central to SN11. The useful signal is whether the submitted skill causes the agent to complete concrete tasks, produce the requested deliverable, and pass verifier checks across the active scenario set. The cited sources describe quality as the passed-test share per scenario, summed across scenarios, so breadth and reliability both matter.

References: trajrl-bench, TrajectoryRL miner guide
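A minimal Python sketch of the quality rule described above, assuming each scenario run reports a count of passed and total verifier tests. `ScenarioResult`, `scenario_share`, and `quality` are hypothetical names, not the trajrl-bench API, and real scoring may add weighting or thresholds not shown here.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ScenarioResult:
    """Hypothetical record of one scenario run under programmatic verifiers."""
    scenario: str
    tests_passed: int
    tests_total: int


def scenario_share(result: ScenarioResult) -> float:
    """Passed-test share for one scenario; a scenario with no tests contributes 0."""
    if result.tests_total <= 0:
        return 0.0
    return result.tests_passed / result.tests_total


def quality(results: Iterable[ScenarioResult]) -> float:
    """Sum of per-scenario passed-test shares across the active scenario set."""
    return sum(scenario_share(r) for r in results)


runs = [
    ScenarioResult("scenario_a", tests_passed=4, tests_total=4),
    ScenarioResult("scenario_b", tests_passed=3, tests_total=6),
    ScenarioResult("scenario_c", tests_passed=0, tests_total=5),
]
print(quality(runs))
```

Under this rule, a skill that passes every test in one scenario contributes 1.0, while a skill that passes half the tests in each of three scenarios earns 1.5, which is why breadth across the scenario set matters as much as depth in any one scenario.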

Miner and Validator Roles

TrajectoryRL follows the standard Bittensor subnet structure. Miners submit work to the subnet, and validators evaluate miner performance and set weights. Those validator assessments feed into Yuma Consensus, which determines how subnet emissions are distributed.

The official repository says TrajectoryRL’s miner role centers on producing agent skills, while the validator role centers on evaluating those skills in sandboxed scenarios.

On-Chain Identity

The on-chain identity for netuid 11 lists the following public metadata:

  • Subnet name: TrajectoryRL
  • Description: Agentic RL as a Service, Optimize agent trajectories to make agents cheaper, safer, and more reliable.
  • GitHub repository: trajectoryRL/trajectoryRL
  • Project URL: trajrl.com
  • Discord: The on-chain Discord field is blank.
  • Owner coldkey: 5D2Jhtbnm7iAdKfjRk6DisXBnr1MEsYat8kXqaPNrVqJP3uE
  • Neurons: 256

Live identity data is available on TaoStats.

Relationship to Multiple Mechanisms

TrajectoryRL has validators evaluate miner agent skills in sandboxed scenarios before setting weights. The Glossary and Multiple Incentive Mechanisms docs note that validators must evaluate miners separately for each mechanism.

This article documents one subnet market. If netuid 11 runs more than one incentive mechanism, validator scores and weights should be read per mechanism rather than as one combined path.
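The per-mechanism rule can be sketched in Python as a validator keeping one score map per mechanism and normalizing each map on its own. The mechanism ids, miner uids, and scores below are hypothetical, and this article does not state how many mechanisms netuid 11 actually runs.

```python
from typing import Dict

Scores = Dict[int, float]  # miner uid -> evaluation score (hypothetical)


def normalize(scores: Scores) -> Scores:
    """Scale non-negative scores so they sum to 1; an all-zero map stays all-zero."""
    total = sum(max(s, 0.0) for s in scores.values())
    if total == 0:
        return {uid: 0.0 for uid in scores}
    return {uid: max(s, 0.0) / total for uid, s in scores.items()}


def per_mechanism_weights(scores_by_mech: Dict[int, Scores]) -> Dict[int, Scores]:
    """Normalize each mechanism's scores independently; never pool across mechanisms."""
    return {mech: normalize(scores) for mech, scores in scores_by_mech.items()}


scores = {
    0: {1: 3.0, 2: 1.0},  # hypothetical mechanism 0
    1: {1: 0.0, 2: 2.0},  # hypothetical mechanism 1
}
weights = per_mechanism_weights(scores)
print(weights)
```

Miner 1 leads under mechanism 0 and receives nothing under mechanism 1; pooling the two maps into one vector would hide that difference.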

Relationship to Yuma Consensus

Subnet 11 uses Yuma Consensus to convert the skill-evaluation weight vectors that validators submit into the emission shares distributed to miners and validators within the subnet each tempo. The Yuma Consensus documentation describes how validator weight submissions are aggregated into consensus weights for each miner registered on the subnet.
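The aggregation step can be illustrated with a heavily simplified Python sketch of the core Yuma idea: for each miner, find the highest weight level backed by at least a fraction `kappa` of validator stake, clip every validator’s weight to that level, and normalize the stake-weighted sums into incentive shares. The stakes, weights, and `kappa = 0.5` are assumptions for illustration; the sketch omits bonds, validator dividends, moving averages, and per-tempo emission amounts, so it is not the on-chain implementation.

```python
from typing import List


def consensus_weight(column: List[float], stakes: List[float], kappa: float = 0.5) -> float:
    """Highest weight level still backed by at least `kappa` of total validator stake."""
    total = sum(stakes)
    for level in sorted(set(column), reverse=True):
        support = sum(s for w, s in zip(column, stakes) if w >= level)
        if support >= kappa * total:
            return level
    return 0.0


def simplified_yuma(weights: List[List[float]], stakes: List[float], kappa: float = 0.5) -> List[float]:
    """Return normalized miner incentive shares from a validator x miner weight matrix."""
    n_miners = len(weights[0])
    columns = [[row[j] for row in weights] for j in range(n_miners)]
    consensus = [consensus_weight(col, stakes, kappa) for col in columns]
    # Clip each validator's weight to the consensus level, limiting outlier influence.
    ranks = [
        sum(s * min(row[j], consensus[j]) for row, s in zip(weights, stakes))
        for j in range(n_miners)
    ]
    total = sum(ranks)
    return [r / total if total > 0 else 0.0 for r in ranks]


stakes = [0.5, 0.3, 0.2]  # hypothetical validator stake fractions
weights = [
    [0.6, 0.4],
    [0.7, 0.3],
    [0.0, 1.0],  # outlier validator pushing miner 1
]
print(simplified_yuma(weights, stakes))
```

In this example the outlier’s 1.0 weight on miner 1 is clipped to the consensus level of 0.4, so a minority of stake cannot unilaterally redirect emission.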

In TrajectoryRL’s context, validators execute miner-submitted agent skills in sandboxed scenarios, score each skill against programmatic verifiers, and translate those evaluation results into on-chain weights. The Emission documentation describes how those consensus weights determine each participant’s share of the subnet’s accumulated emission each tempo.
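One plausible way to translate evaluation results into a weight vector is sketched below in Python, assuming per-uid quality scores are scaled so the top miner receives the maximum 16-bit value. `scores_to_weights` is a hypothetical helper, not a Bittensor SDK function, and TrajectoryRL’s validators may transform scores differently before submitting weights.

```python
from typing import Dict, List, Tuple

U16_MAX = 65535  # assumed 16-bit integer range for submitted weights


def scores_to_weights(quality: Dict[int, float]) -> Tuple[List[int], List[int]]:
    """Map per-uid quality scores to (uids, integer weights) scaled so the top uid gets U16_MAX."""
    uids = sorted(quality)
    clipped = [max(quality[uid], 0.0) for uid in uids]  # negative scores earn nothing
    top = max(clipped, default=0.0)
    if top == 0.0:
        return uids, [0] * len(uids)
    return uids, [round(v / top * U16_MAX) for v in clipped]


uids, weights = scores_to_weights({3: 1.0, 7: 4.0, 9: 0.0})
print(uids, weights)
```

Scaling by the maximum rather than the sum keeps the relative ratios between miners while using the full integer range, which limits rounding loss for low-scoring miners.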

Development Stage Context

The Introduction to Bittensor describes subnet development as moving from localnet to testnet and then mainnet. For TrajectoryRL (SN11), that sequence changes how readers should interpret skill evaluation examples and sandbox scoring outcomes.

In localnet, TrajectoryRL-compatible agents and skills can be developed and evaluated in an isolated environment. Sandbox skill scores and emission outcomes from localnet do not represent production subnet performance.

On testnet, agent skills can be exercised in a shared, non-production network. Testnet skill evaluations and validator scores are separate from mainnet subnet state.

On mainnet, TrajectoryRL (SN11) is the live production subnet where validators score miner skills against sandboxed scenarios to determine real Bittensor emissions. The TrajectoryRL repository describes the mechanism that applies on the production network.

The Bittensor Networks reference separates mainnet, testnet, and localnet. A skill evaluation score or emission example from one environment should not be read as representing production subnet performance in another environment.
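A minimal Python sketch of that boundary, assuming each evaluation record is tagged with the network it came from. `Network`, `SkillScore`, `compare`, and `is_production` are hypothetical names used only to show that scores from different environments should not be compared directly.

```python
from dataclasses import dataclass
from enum import Enum


class Network(Enum):
    LOCALNET = "localnet"
    TESTNET = "testnet"
    MAINNET = "mainnet"


@dataclass(frozen=True)
class SkillScore:
    """Hypothetical record tying a skill evaluation score to its network."""
    skill_id: str
    network: Network
    quality: float


def compare(a: SkillScore, b: SkillScore) -> float:
    """Difference in quality, refusing to compare scores from different networks."""
    if a.network is not b.network:
        raise ValueError(f"cannot compare {a.network.value} score with {b.network.value} score")
    return a.quality - b.quality


def is_production(score: SkillScore) -> bool:
    """Only mainnet scores reflect production subnet performance."""
    return score.network is Network.MAINNET


local = SkillScore("skill-a", Network.LOCALNET, 2.0)
main = SkillScore("skill-a", Network.MAINNET, 1.5)
try:
    compare(local, main)
except ValueError as err:
    print(err)
```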

Reader Boundary

Subnet 11 TrajectoryRL should not be read as generic Bittensor subnet documentation, a guarantee of agent safety outcomes, or proof that every skill format scores the same way. It names one subnet’s agentic reinforcement-learning competition on netuid 11 (Understanding Subnets, Glossary: Netuid).

SKILL.md Artifacts Are the Miner Deliverable

The TrajectoryRL repository describes the miner artifact as an installable SKILL.md pack rather than a model endpoint or always-on server (TrajectoryRL repository).

Readers should evaluate miner output as reusable agent guidance rather than as a single benchmark answer.

Sandbox Verifiers Define the Scored Signal

The trajrl-bench repository describes scenario containers with programmatic verifiers. Validators run agents through scenarios and score passed-test results across the active scenario set (trajrl-bench).

That makes the useful signal task completion under verifiers rather than subjective prompt ranking.

Validator Weights Still Flow Through Yuma Consensus

Subnet 11 uses Yuma Consensus to convert validator weight submissions into emission shares each tempo (Yuma Consensus, Emission).
