Subnet 15: ORO

Subnet	SN15
Operator	ORO-AI
Neurons	256
GitHub	ORO-AI/oro
Site	oroagents.com

ORO is Bittensor Subnet 15 (SN15), operated by ORO-AI. The subnet benchmarks autonomous AI agents on real-world e-commerce tasks — searching for products, comparing options, applying vouchers, and making purchase recommendations. Miners compete by submitting agents that are evaluated against a standardized shopping benchmark built from millions of real product listings. The top-performing agents earn the largest share of subnet emissions.

How the Mechanism Works

According to the ORO repository, ORO uses ShoppingBench as its evaluation substrate — a curated dataset of real e-commerce products against which agents are scored for recommendation accuracy. Agents are given access to product search and information tools and must complete shopping tasks autonomously, with their outputs compared against known correct answers.

The competition uses a challenger model with a decaying score threshold: a new submission must exceed the incumbent leader’s score by a required margin to claim the top position, and that margin shrinks over time. The margin prevents trivial incremental improvements from constantly displacing the leader, while the decay keeps the top rank contestable. The evaluation system separates public qualifying tasks from hidden problems, which prevents agents from hardcoding answers to specific scenarios. Static analysis also flags agents that exploit specific problem structures rather than generalizing, and agents that cannot demonstrate genuine reasoning are disqualified.
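The challenger rule can be illustrated with a minimal sketch. The sources cited here do not state the margin size or the decay schedule, so the exponential half-life decay and the names Leader, required_score, initial_margin, and half_life_hours are assumptions chosen only for illustration, not ORO's actual parameters.

```python
from dataclasses import dataclass


@dataclass
class Leader:
    score: float       # benchmark score of the incumbent agent
    crowned_at: float  # time (in hours) when the incumbent took the top position


def required_score(leader: Leader, now: float,
                   initial_margin: float = 0.05,
                   half_life_hours: float = 24.0) -> float:
    """Score a challenger must beat: the leader's score plus a decaying margin.

    Assumption: the margin halves every `half_life_hours`. The real decay
    schedule is not given in the text.
    """
    elapsed = max(0.0, now - leader.crowned_at)
    margin = initial_margin * 0.5 ** (elapsed / half_life_hours)
    return leader.score + margin


def displaces_leader(challenger_score: float, leader: Leader, now: float) -> bool:
    """True if the challenger clears the current threshold."""
    return challenger_score > required_score(leader, now)


leader = Leader(score=0.70, crowned_at=0.0)
early = displaces_leader(0.74, leader, now=0.0)   # margin is still 0.05
later = displaces_leader(0.74, leader, now=24.0)  # margin has halved to 0.025
```

In this sketch the same challenger score fails right after the leader is crowned but succeeds one half-life later, which is the behavior a decaying threshold is meant to produce.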

Scores from multiple independent validator evaluations feed into Yuma Consensus, which determines how emissions are distributed under Dynamic TAO.

ShoppingBench Context

The ORO repository and ShoppingBench paper frame SN15 around practical shopping-agent work rather than open-ended chat. Agents are evaluated on tasks that require finding products, comparing options, applying constraints, and returning a recommendation that can be checked against known expected fields.

Those fields also make validator scoring reproducible, because each answer can be compared to a target rather than reviewed as a free-form preference.

That matters because shopping tasks mix retrieval, comparison, and decision-making. A miner agent cannot score well by producing persuasive prose alone; it has to use product information in a way that matches the requested shopping goal. The benchmark therefore treats the agent as a commerce workflow participant, not only as a text generator.

References: ORO repository, ShoppingBench paper

Race Evaluation Context

The ORO architecture docs describe a two-phase competitive evaluation model with qualifying and race stages. Qualifying checks agents against the active problem suite, while the race stage evaluates qualifiers against hidden problems before a top agent is selected.

For a reader, that explains the role of the challenger threshold described in the mechanism section. The threshold is not just a leaderboard decoration; it controls which agents are strong enough to enter the race. The hidden race set then tests whether that strength carries beyond the visible qualifying tasks, which helps separate general shopping-agent performance from narrow overfitting.
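A toy sketch of the two-phase flow may help. Everything in it is hypothetical: the Agent type, the qualify_at cutoff, and the stand-in "cheapest price" tasks are illustrative and are not ORO's actual interfaces or problems. It shows only why a hidden race set separates a memorizing agent from a generalizing one.

```python
from typing import Callable, Dict, List, Optional, Tuple

Agent = Callable[[str], str]       # hypothetical: maps a task prompt to an answer
Problem = Tuple[str, str]          # (prompt, expected answer)


def mean_score(agent: Agent, problems: List[Problem]) -> float:
    """Fraction of problems the agent answers exactly right."""
    if not problems:
        return 0.0
    correct = sum(1 for prompt, expected in problems if agent(prompt) == expected)
    return correct / len(problems)


def run_competition(agents: Dict[str, Agent],
                    qualifying: List[Problem],
                    hidden: List[Problem],
                    qualify_at: float) -> Optional[str]:
    """Phase 1 filters on the visible suite; phase 2 ranks qualifiers on hidden problems."""
    qualifiers = [name for name, agent in agents.items()
                  if mean_score(agent, qualifying) >= qualify_at]
    if not qualifiers:
        return None
    return max(qualifiers, key=lambda name: mean_score(agents[name], hidden))


# Toy task: pick the cheapest price from a comma-separated list.
qualifying = [("3,5,2", "2"), ("9,4", "4")]
hidden = [("7,1,8", "1"), ("6,10", "6")]

memorized = {"3,5,2": "2", "9,4": "4"}
agents: Dict[str, Agent] = {
    "memorizer": lambda prompt: memorized.get(prompt, ""),
    "generalist": lambda prompt: str(min(int(x) for x in prompt.split(","))),
}

winner = run_competition(agents, qualifying, hidden, qualify_at=1.0)
```

Both toy agents pass qualifying with a perfect score, but only the generalist answers the hidden problems, so it is the one selected.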

References: ORO architecture docs, ORO repository

Participating as a Miner

Miners on ORO build and submit autonomous shopping agents. The ORO repository describes their economic role as developing agents that can accurately search product catalogs, extract relevant information, and produce correct recommendations across a wide range of shopping scenarios. Agents are scored on three dimensions: how often their recommendations match the ground truth, whether their output meets the required schema, and how precisely the fields of their answers match the expected fields.
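The three scoring dimensions can be sketched for a single task as follows. The answer schema (product_id and fields), the exact-match comparison, and the 0.5 / 0.2 / 0.3 weighting are assumptions made for illustration; the text does not specify how ORO combines the dimensions.

```python
from typing import Any, Dict, Tuple

REQUIRED_KEYS = {"product_id", "fields"}  # hypothetical output schema


def format_ok(answer: Any) -> bool:
    """Format compliance: the answer is a dict with the required keys."""
    return (isinstance(answer, dict)
            and REQUIRED_KEYS.issubset(answer)
            and isinstance(answer["fields"], dict))


def field_match(predicted: Dict[str, Any], expected: Dict[str, Any]) -> float:
    """Fraction of expected fields that the prediction reproduces exactly."""
    if not expected:
        return 0.0
    hits = sum(1 for key, value in expected.items() if predicted.get(key) == value)
    return hits / len(expected)


def score_task(answer: Any, target: Dict[str, Any],
               weights: Tuple[float, float, float] = (0.5, 0.2, 0.3)) -> float:
    """Combine accuracy, format compliance, and field matching (weights are assumed)."""
    if not format_ok(answer):
        return 0.0  # a malformed answer earns nothing in this sketch
    accuracy = 1.0 if answer["product_id"] == target["product_id"] else 0.0
    fields = field_match(answer["fields"], target["fields"])
    w_accuracy, w_format, w_fields = weights
    return w_accuracy * accuracy + w_format * 1.0 + w_fields * fields


target = {"product_id": "sku-1", "fields": {"brand": "Acme", "voucher": "SAVE10"}}
answer = {"product_id": "sku-1", "fields": {"brand": "Acme", "voucher": "NONE"}}
task_score = score_task(answer, target)
```

Here the agent picks the right product and emits a well-formed answer but gets one of two expected fields wrong, so it earns partial credit rather than a full score.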

Because the evaluation uses hidden test cases and penalizes pattern-matching shortcuts, competitive miners must build agents that genuinely understand product information and shopping context rather than agents tuned to specific known problems.

Participating as a Validator

Validators on ORO run each submitted miner agent in an isolated evaluation environment. The ORO repository describes validators measuring performance across the benchmark task set and computing per-miner scores based on prediction accuracy, format compliance, and field-matching precision. Multiple validators evaluate each agent independently to prevent collusion and ensure scores reflect genuine performance. Validators submit weight vectors derived from these scores to Yuma Consensus.
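As a rough illustration of the last step, the sketch below turns per-task scores into a normalized weight vector. The proportional normalization and the function name scores_to_weights are assumptions; ORO's validators may map scores to weights differently, for example by favoring the current leader. Scores are assumed to lie in the range 0 to 1.

```python
from statistics import mean
from typing import Dict, List


def scores_to_weights(task_scores: Dict[int, List[float]]) -> Dict[int, float]:
    """Average each miner's per-task scores, then normalize so the weights sum to 1.

    Keys are miner UIDs. A miner with no completed tasks gets an average of 0.
    """
    averages = {uid: mean(scores) if scores else 0.0
                for uid, scores in task_scores.items()}
    total = sum(averages.values())
    if total == 0:
        return {uid: 0.0 for uid in averages}
    return {uid: avg / total for uid, avg in averages.items()}


# Hypothetical UIDs and scores, for illustration only.
weights = scores_to_weights({7: [1.0, 0.5], 12: [0.25, 0.25], 31: []})
```

The resulting vector is what a validator would submit on-chain in this simplified picture; the submission call itself is omitted.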

On-Chain Identity

ORO is registered at netuid 15 on Bittensor with 256 neurons, verifiable via taostats.io/subnets/15. The subnet owner coldkey is 5GE9r7GMtDyfbsp5RKr2V8M5PaYJ7pgF9KBu6oBcRiYjZPCc. The codebase is at ORO-AI/oro and the project site is oroagents.com.

Relationship to Yuma Consensus

Subnet 15 uses Yuma Consensus to aggregate the weight vectors that validators submit after independently benchmarking each miner agent. The aggregated weights determine how the subnet’s TAO emissions are divided across the miner set. The Yuma Consensus documentation describes how validator weight submissions are processed to produce consensus weights for each miner registered on the subnet.

In ORO’s context, multiple validators evaluate each shopping agent independently to prevent collusion and ensure scores reflect genuine performance. Validators run agents through qualifying and race evaluation stages, score them on prediction accuracy, format compliance, and field-matching precision, and submit the resulting weight vectors. Yuma Consensus aggregates those weight submissions across the validator set. Because scoring is objective — each answer can be verified against ground-truth fields — the consensus mechanism primarily reconciles differences in which hidden test cases each validator ran against the agent during that epoch.
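The aggregation step can be sketched in simplified form. This is not the on-chain implementation: it keeps only the stake-weighted consensus level, the clipping of validator weights to that level, and the normalization of the result, and it omits bonds, validator dividends, and fixed-point arithmetic. The function names and the example stakes are hypothetical, and kappa is set to 0.5 here as an assumed majority threshold.

```python
from typing import List


def consensus_weight(weights: List[float], stakes: List[float], kappa: float = 0.5) -> float:
    """Highest weight level that validators holding at least `kappa` of stake support."""
    total = sum(stakes)
    cumulative = 0.0
    # Walk from the highest submitted weight down, accumulating supporting stake.
    for weight, stake in sorted(zip(weights, stakes), reverse=True):
        cumulative += stake
        if cumulative >= kappa * total:
            return weight
    return 0.0


def miner_incentives(W: List[List[float]], stakes: List[float], kappa: float = 0.5) -> List[float]:
    """W[i][j] is validator i's weight on miner j. Returns normalized miner shares."""
    n_miners = len(W[0])
    total_stake = sum(stakes)
    consensus = [consensus_weight([row[j] for row in W], stakes, kappa)
                 for j in range(n_miners)]
    # Clip each validator's weight to the consensus level, then stake-average.
    rank = [sum(stake * min(row[j], consensus[j]) for row, stake in zip(W, stakes)) / total_stake
            for j in range(n_miners)]
    total_rank = sum(rank)
    return [r / total_rank if total_rank else 0.0 for r in rank]


# Three equally staked validators scoring two miners; the third is an outlier.
W = [[0.6, 0.4],
     [0.5, 0.5],
     [0.0, 1.0]]
shares = miner_incentives(W, stakes=[1.0, 1.0, 1.0])
```

In this example the outlier validator's weight of 1.0 on the second miner is clipped to the consensus level of 0.5, so its disagreement moves the final shares less than a plain stake-weighted average would.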

Development Stage Context

The Introduction to Bittensor describes subnet development as moving from localnet to testnet and then mainnet. For ORO (SN15), that sequence changes how readers should interpret shopping-agent benchmark examples and evaluation-based scoring outcomes.

In localnet, ORO-compatible miners and validators can be developed and tested in an isolated environment. Localnet shopping-agent evaluation results and emission outcomes do not represent production subnet performance.

On testnet, ORO-compatible agent submission and benchmark evaluation workflows can be exercised in a shared, non-production network. Testnet benchmark scores and validator weights are separate from mainnet subnet state.

On mainnet, ORO (SN15) is the live production subnet where miners submit autonomous shopping agents and validators benchmark them against ShoppingBench tasks to determine real Bittensor emissions. The ORO repository is the registered project repository for SN15 on the production network.

The Bittensor Networks reference separates mainnet, testnet, and localnet. A shopping-agent benchmark result or emission outcome from one environment should not be read as representing production subnet performance in another environment.

Miner and Validator Roles

Subnet 15 operates under the standard Bittensor two-role structure. Miners supply the subnet’s capability and validators evaluate those contributions and set weights. Reward distribution follows Yuma Consensus.

Reader Boundary

This description of Subnet 15 (ORO) should not be read as generic Bittensor subnet documentation, a guarantee that agents’ purchase recommendations hold on live commerce sites, or proof that every agent format scores the same way. It describes one subnet’s autonomous shopping-agent benchmark on netuid 15 (Understanding Subnets, Glossary: Netuid).

ShoppingBench Tasks Define the Evaluation Surface

The ORO repository describes ShoppingBench as a curated dataset of real e-commerce products against which agents are scored for recommendation accuracy (ORO repository).

Readers should evaluate miner output against the benchmark task fields rather than judging it as generic chat quality.

Hidden Race Problems Test Generalization

ORO architecture docs describe qualifying tasks and hidden race-stage problems that separate general shopping-agent performance from narrow overfitting (ORO architecture docs).

A strong qualifying score alone does not prove success on the hidden race set.

Validator Weights Still Flow Through Yuma Consensus

Subnet 15 uses Yuma Consensus to aggregate independent validator weight submissions into emission shares each tempo (Yuma Consensus, Emission).