Subnet 11: TrajectoryRL
TrajectoryRL is Bittensor Subnet 11. TaoStats describes the live on-chain identity as “Agentic RL as a Service, Optimize agent trajectories to make agents cheaper, safer, and more reliable.” The subnet’s official trajectoryRL/trajectoryRL repository presents TrajectoryRL as a continuous reinforcement learning competition in which miners produce AI agent skills and validators evaluate them in sandboxed scenarios.
What TrajectoryRL Provides
According to the official repository, TrajectoryRL is organized around installable SKILL.md files for AI agents. The README says miners compete to produce the best skills, validators test those skills in real sandboxes, and the winning skills surface through the subnet’s associated tooling.
The same source describes the subnet as a continuous competition rather than a one-time benchmark. Its stated aim is to improve the quality and reliability of agent skills over repeated evaluation cycles.
Skill Artifact Context
The TrajectoryRL repository describes the miner artifact as a SKILL.md pack rather than a model endpoint or always-on server. That changes how to read the subnet’s output: the durable product is an instruction scaffold that an agent can install and reuse across tasks, not a single benchmark answer. It also separates miner work from validator infrastructure: miners compete on reusable agent guidance, while validators measure whether that guidance improves task completion.
The miner guide supports that reading by describing SKILL.md as the persistent context an agent receives before working through scenarios. Competitive miners therefore improve the way an agent approaches classes of operational problems. A skill that only memorizes a known scenario is weak, because it does not help the agent generalize when the next task changes.
References: TrajectoryRL repository, TrajectoryRL miner guide
Sandbox Scoring Context
The trajrl-bench repository describes the benchmark as scenario containers with programmatic verifiers. Validators run the agent through each scenario, then score the produced output against tests. That makes SN11 closer to a software-task evaluation market than to a subjective prompt-ranking contest.
This context explains why the article emphasizes sandboxed scenarios. The useful signal is whether the submitted skill causes the agent to complete concrete tasks, produce the requested deliverable, and pass verifier checks across the active scenario set. The scoring source describes quality as the passed-test share per scenario, summed across scenarios, so breadth and reliability both matter.
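The scoring rule described above (passed-test share per scenario, summed across scenarios) can be sketched in a few lines. This is an illustrative sketch only, not the trajrl-bench implementation: the `ScenarioResult` type, its field names, and the scenario identifiers are hypothetical, and the real scorer may apply weighting or thresholds not covered here.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ScenarioResult:
    """Hypothetical record of one scenario run (not a trajrl-bench type)."""
    scenario_id: str
    tests_passed: int
    tests_total: int


def scenario_quality(result: ScenarioResult) -> float:
    """Share of verifier tests passed in one scenario, in [0, 1]."""
    if result.tests_total <= 0:
        return 0.0  # assumption: a scenario with no tests contributes nothing
    return result.tests_passed / result.tests_total


def skill_score(results: Iterable[ScenarioResult]) -> float:
    """Sum of per-scenario pass shares across the active scenario set."""
    return sum(scenario_quality(r) for r in results)


# Hypothetical scenario names and test counts, for illustration only.
results = [
    ScenarioResult("scenario-a", tests_passed=4, tests_total=5),
    ScenarioResult("scenario-b", tests_passed=3, tests_total=3),
    ScenarioResult("scenario-c", tests_passed=0, tests_total=4),
]
total = skill_score(results)
print(f"skill score: {total:.2f} out of {len(results)} scenarios")
```

Under this reading, a skill that fully passes one scenario but fails the others scores lower than one that passes most tests everywhere, which is why breadth and reliability both matter.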
References: trajrl-bench, TrajectoryRL miner guide
Miner and Validator Roles
TrajectoryRL follows the standard Bittensor subnet structure. Miners submit work to the subnet, and validators evaluate miner performance and set weights. Those validator assessments feed into Yuma Consensus, which determines how subnet emissions are distributed.
The official repository says TrajectoryRL’s miner role centers on producing agent skills, while the validator role centers on evaluating those skills in sandboxed scenarios.
On-Chain Identity
The on-chain identity for netuid 11 lists the following public metadata:
- Subnet name: TrajectoryRL
- Description: Agentic RL as a Service, Optimize agent trajectories to make agents cheaper, safer, and more reliable.
- GitHub repository: trajectoryRL/trajectoryRL
- Project URL: trajrl.com
- Discord: The on-chain Discord field is blank.
- Owner coldkey: 5D2Jhtbnm7iAdKfjRk6DisXBnr1MEsYat8kXqaPNrVqJP3uE
- Neurons: 256
Live identity data is available on TaoStats.
Relationship to Multiple Mechanisms
TrajectoryRL validators evaluate miner agent skills in sandboxed scenarios before setting weights. The Glossary and Multiple Incentive Mechanisms docs note that validators must evaluate miners separately for each mechanism.
This article documents one subnet market. If netuid 11 runs more than one incentive mechanism, validator scores and weights should be read per mechanism rather than as one combined result.
Relationship to Yuma Consensus
Subnet 11 uses Yuma Consensus to convert the weight vectors that validators submit after skill evaluation into the emission shares distributed to miners and validators each tempo. The Yuma Consensus documentation describes how validator weight submissions are aggregated into consensus weights for each miner registered on the subnet.
In TrajectoryRL’s context, validators execute miner-submitted agent skills in sandboxed scenarios, score each skill against programmatic verifiers, and translate those evaluation results into on-chain weights. The Emission documentation describes how those consensus weights determine each participant’s share of the subnet’s accumulated emission each tempo.
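To make the aggregation step concrete, here is a deliberately simplified sketch of the stake-weighted clipping idea behind Yuma Consensus. It is not the on-chain implementation: it omits bonds, validator dividends, and other details, and the function names, the `kappa = 0.5` threshold, and the example stakes and weights are assumptions chosen for illustration.

```python
from typing import List


def consensus_weight(values: List[float], stakes: List[float], kappa: float = 0.5) -> float:
    """Largest weight supported by validators holding at least `kappa` of total stake."""
    total = sum(stakes)
    cumulative = 0.0
    # Walk from the highest assigned weight downward, accumulating backing stake.
    for value, stake in sorted(zip(values, stakes), reverse=True):
        cumulative += stake
        if cumulative >= kappa * total:
            return value
    return 0.0


def miner_incentives(weights: List[List[float]], stakes: List[float], kappa: float = 0.5) -> List[float]:
    """weights[i][j] is the weight validator i sets on miner j (simplified model)."""
    total_stake = sum(stakes)
    ranks = []
    for j in range(len(weights[0])):
        column = [row[j] for row in weights]
        cap = consensus_weight(column, stakes, kappa)
        # Clip each validator's weight to the consensus level, then stake-average.
        ranks.append(sum((s / total_stake) * min(w, cap) for w, s in zip(column, stakes)))
    total_rank = sum(ranks)
    return [r / total_rank if total_rank > 0 else 0.0 for r in ranks]


# Hypothetical example: three validators, two miners; the third validator is an outlier.
stakes = [40.0, 40.0, 20.0]
weights = [
    [0.6, 0.4],
    [0.5, 0.5],
    [0.1, 0.9],
]
incentives = miner_incentives(weights, stakes)
print(incentives)
```

In this toy example the outlier validator’s 0.9 weight on the second miner is clipped to the level that a stake majority supports, so a single low-stake validator cannot push a miner’s share far above what the majority’s sandbox scores justify.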
Development Stage Context
The Introduction to Bittensor describes subnet development as moving from localnet to testnet and then mainnet. For TrajectoryRL (SN11), that sequence changes how readers should interpret skill evaluation examples and sandbox scoring outcomes.
In localnet, TrajectoryRL-compatible agents and skills can be developed and evaluated in an isolated environment. Sandbox skill scores and emission outcomes from localnet do not represent production subnet performance.
On testnet, agent skills can be exercised in a shared, non-production network. Testnet skill evaluations and validator scores are separate from mainnet subnet state.
On mainnet, TrajectoryRL (SN11) is the live production subnet where validators score miner skills against sandboxed scenarios to determine real Bittensor emissions. The TrajectoryRL repository describes the mechanism that applies on the production network.
The Bittensor Networks reference separates mainnet, testnet, and localnet. A skill evaluation score or emission example from one environment should not be read as representing production subnet performance in another environment.
Reader Boundary
Subnet 11 TrajectoryRL should not be read as generic Bittensor subnet documentation, a guarantee of agent safety outcomes, or proof that every skill format scores the same way. It names one subnet’s agentic reinforcement-learning competition on netuid 11 (Understanding Subnets, Glossary: Netuid).
SKILL.md Artifacts Are the Miner Deliverable
The TrajectoryRL repository describes the miner artifact as an installable SKILL.md pack rather than a model endpoint or always-on server (TrajectoryRL repository).
Readers should evaluate miner output as reusable agent guidance rather than as a single benchmark answer.
Sandbox Verifiers Define the Scored Signal
The trajrl-bench repository describes scenario containers with programmatic verifiers. Validators run agents through scenarios and score passed-test results across the active scenario set (trajrl-bench).
That makes the useful signal task completion under verifiers rather than subjective prompt ranking.
Validator Weights Still Flow Through Yuma Consensus
Subnet 11 uses Yuma Consensus to convert validator weight submissions into emission shares each tempo (Yuma Consensus, Emission).
Further Reading
- Subnet 11 on TaoStats
- TrajectoryRL GitHub repository
- trajrl.com (official site)
- Introduction to Bittensor: Subnet development
- Bittensor Networks
- Yuma Consensus (the consensus mechanism that converts validator weights into emission shares on every subnet)
- Emission