Subnet 13: Data Universe
Data Universe is Bittensor Subnet 13 (SN13), operated by Macrocosmos. The official repository describes a distributed network of miners scraping, storing, and indexing real-time social media data from sources including X (Twitter), Reddit, and YouTube. The result is an open-source dataset distributed across the miner set rather than held by any single entity — enabling scale that would be impractical with a centralized approach. The dataset is publicly accessible and continuously refreshed, with miners adding tens of millions of new records daily.
How the Mechanism Works
According to the official repository, miners store collections of data points organized by source, timestamp, and content label. Each miner publishes an inventory to the network so validators know what data is held where. Validators periodically sample each miner’s stored data to verify accuracy and build a complete map of the distributed dataset.
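Here is a minimal Python sketch of how a miner could group data points by source, timestamp, and label, then summarize them into an inventory it can publish. Hourly time buckets and the names `BucketId`, `DataPoint`, `bucket_of`, and `build_inventory` are hypothetical. They are not the repository's actual types or index format.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class BucketId:
    """Hypothetical inventory key: one source, one UTC hour, one label."""
    source: str       # e.g. "reddit" or "x"
    hour_bucket: int  # whole hours since the Unix epoch (hourly bucketing is assumed)
    label: str        # e.g. a subreddit name or a hashtag


@dataclass(frozen=True)
class DataPoint:
    """Hypothetical scraped record; real records would carry more metadata."""
    source: str
    created_at: datetime  # must be timezone-aware
    label: str
    content: bytes


def bucket_of(point: DataPoint) -> BucketId:
    """Map a data point to the bucket it is indexed under."""
    if point.created_at.tzinfo is None:
        raise ValueError("created_at must be timezone-aware")
    utc_time = point.created_at.astimezone(timezone.utc)
    return BucketId(point.source, int(utc_time.timestamp() // 3600), point.label)


def build_inventory(points: list[DataPoint]) -> dict[BucketId, int]:
    """Summarize stored bytes per bucket: the kind of index a miner might publish."""
    sizes: dict[BucketId, int] = defaultdict(int)
    for point in points:
        sizes[bucket_of(point)] += len(point.content)
    return dict(sizes)
```

The published inventory contains only bucket keys and sizes, not the records themselves. That is why a validator can map the dataset without storing it.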
Scoring rewards data that is fresh, diverse, and in demand. Data older than 30 days receives no score, which makes freshness the primary driver of value. The subnet uses a Dynamic Desirability List that reflects which data types users are actively requesting through Macrocosmos’s Gravity product. Data matching current demand scores higher than data in categories the list does not specify.
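The freshness and demand rules can be sketched as a per-bucket value function. Only the 30-day cutoff comes from the description above. Three details are assumptions made for illustration: the linear decay curve, the default weight of 0.5 for unlisted categories, and keying the desirability list by `(source, label)`.

```python
from datetime import datetime, timedelta

# From the source: data older than 30 days earns no score.
MAX_AGE = timedelta(days=30)
# Hypothetical weight for categories absent from the Dynamic Desirability List.
DEFAULT_WEIGHT = 0.5


def freshness_factor(created_at: datetime, now: datetime) -> float:
    """Hypothetical linear decay: 1.0 for brand-new data, 0.0 at MAX_AGE or older."""
    age = now - created_at
    if age < timedelta(0) or age >= MAX_AGE:
        # Expired data earns nothing; treating future-dated data the same way is an assumption.
        return 0.0
    return 1.0 - age / MAX_AGE


def bucket_value(source: str, label: str, created_at: datetime, size_bytes: int,
                 desirability: dict[tuple[str, str], float], now: datetime) -> float:
    """Value of one stored bucket: size x demand weight x freshness."""
    weight = desirability.get((source, label), DEFAULT_WEIGHT)
    return size_bytes * weight * freshness_factor(created_at, now)
```

Under these assumed parameters, a listed category with weight 2.0 that is 15 days old is worth as much as the same category at weight 1.0 on the day it was scraped. Demand and freshness trade off against each other in this way.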
De-duplication is handled through economic incentives rather than explicit restrictions: data stored across many miners is worth less per copy than data held by fewer. This naturally pushes miners to specialize in different sources and time windows rather than duplicating the same popular content. Miner scores incorporate both total data value and a credibility factor that penalizes miners who misrepresent their inventory to validators.
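Economic de-duplication can be illustrated by splitting each bucket's value evenly among the miners that report holding it. The even split is an assumed rule chosen for clarity, and the repository's actual formula may weight copies differently. `miner_values` is a hypothetical helper.

```python
from collections import Counter


def miner_values(inventories: dict[str, set[str]],
                 bucket_values: dict[str, float]) -> dict[str, float]:
    """Split each bucket's value evenly among the miners that hold it.

    inventories maps a miner id to the bucket ids it reports holding;
    bucket_values maps a bucket id to its freshness- and demand-weighted value.
    """
    # Count how many miners hold each bucket.
    holders = Counter(b for buckets in inventories.values() for b in buckets)
    # Each miner earns its per-copy share of every bucket it holds.
    return {
        miner: sum(bucket_values.get(b, 0.0) / holders[b] for b in buckets)
        for miner, buckets in inventories.items()
    }
```

Suppose three miners share a bucket worth 90.0. Each copy earns 30.0, and a fourth copy drops each share to 22.5. A bucket worth 30.0 held by a single miner keeps its full 30.0. That gap is what pushes miners toward specialization.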
Weight vectors from validator scoring feed into Yuma Consensus, which converts them into each participant’s share of the subnet’s emissions every tempo. Under Dynamic TAO, those emissions are paid in the subnet’s alpha token.
Participating as a Miner
Miners on Data Universe scrape designated social media sources and store the collected data points locally. The official repository describes their economic role as maximizing the value of their stored dataset — which means prioritizing fresh data from sources currently in demand, avoiding duplication of what other miners already hold, and accurately reporting their inventory. Miners whose data is fresh, unique, and matches current desirability targets earn higher emissions; stale or widely duplicated data earns proportionally less.
Miners also upload anonymized copies of their datasets to external storage for public access, making their collected data available beyond the subnet itself.
Participating as a Validator
Validators on Data Universe maintain a real-time map of what data the miner set collectively holds. The official repository describes validators querying each miner’s published inventory and spot-checking samples of the reported data to verify correctness. From this, validators compute per-miner scores across two dimensions: the value of the data each miner holds (weighted by freshness, type, and demand) and the miner’s credibility score (based on how accurately they represent their inventory).
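The credibility dimension can be sketched as an exponential moving average over spot-check outcomes, which then scales the miner's data value. The smoothing factor `alpha = 0.15`, the exponent of 2.0, and the function names are hypothetical. The description above only establishes that misrepresenting inventory lowers a miner's score.

```python
def update_credibility(credibility: float, sample_passed: bool, alpha: float = 0.15) -> float:
    """Exponential moving average of spot-check outcomes (alpha is hypothetical)."""
    observed = 1.0 if sample_passed else 0.0
    return alpha * observed + (1.0 - alpha) * credibility


def miner_score(data_value: float, credibility: float, exponent: float = 2.0) -> float:
    """Scale data value by credibility; an exponent above 1 (hypothetical) makes
    misreporting cost more than proportionally."""
    return data_value * credibility ** exponent
```

Under these assumed parameters, consider a miner whose samples verify about half the time. Its credibility drifts toward roughly 0.5, so it keeps only about a quarter of its data value.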
Because validators track only metadata and spot-sampled records rather than the full dataset, their storage requirements remain modest even as the total distributed dataset scales into the petabyte range. Validators submit weight vectors derived from these scores to Yuma Consensus.
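Validators then turn per-miner scores into a weight vector. The sketch below normalizes scores keyed by miner UID so they sum to 1. It is an illustration only. The exact on-chain encoding is handled by the Bittensor SDK and is not modeled here, and `scores_to_weights` is a hypothetical name.

```python
def scores_to_weights(scores: dict[int, float]) -> tuple[list[int], list[float]]:
    """Turn per-miner scores (keyed by UID) into a weight vector that sums to 1.

    Negative scores are treated as zero; an all-zero input yields all-zero weights.
    """
    uids = sorted(scores)
    clipped = [max(scores[uid], 0.0) for uid in uids]
    total = sum(clipped)
    if total == 0:
        return uids, [0.0] * len(uids)
    return uids, [value / total for value in clipped]
```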
Real-Time Signal Context
Data Universe scores stored data on freshness, with value falling to zero after 30 days. This design reflects what the dataset is for: it is treated as a live signal of ongoing public conversation rather than a static archive. A miner’s score therefore measures how well they keep a current window of social data flowing into the network, not how much history they have accumulated. The Dynamic Desirability List extends this logic — by tying higher scores to data categories that Macrocosmos’s Gravity consumers are actively requesting, the network steers a permissionless miner set toward the data that has present downstream demand. Read this way, a high score signals that a miner is supplying fresh, currently-wanted data, which is the form of value the subnet exists to produce.
Emergent Deduplication Context
Rather than assign each miner a fixed scraping territory, Data Universe lets per-copy value fall as more miners hold the same records. The official repository frames de-duplication as an economic outcome rather than an enforced rule: because duplicating popular content earns progressively less, miners are pushed to specialize in different sources and time windows on their own. The credibility factor completes the trust model — since validators only sample and map inventory rather than store the full dataset, a miner who misrepresents what they hold is penalized, which is what makes lightweight validator verification viable as the dataset scales. Together these explain why a distributed dataset of this size can be coordinated through incentives instead of a central scheduler.
On-Chain Identity
Data Universe is registered at netuid 13 on Bittensor with 256 neurons, verifiable via taostats.io/subnets/13. The subnet owner coldkey is 5HBswBt1A9Ahx6U76abXXGd7VmabmCNBGhSK2vrP71GSxtgZ. The codebase is maintained at macrocosm-os/data-universe, and the associated product is macrocosmos.ai/gravity.
Relationship to Yuma Consensus
Subnet 13 uses Yuma Consensus to convert the data-quality weight vectors that validators submit into the emission shares distributed to miners and validators within the subnet each tempo. Bittensor’s Yuma Consensus documentation describes how validator weight submissions are aggregated into consensus weights for each miner registered on the subnet.
In Data Universe’s context, the official repository describes validators sampling each miner’s stored datasets and scoring the collected data on freshness, variety, demand alignment, and reporting accuracy before translating those scores into on-chain weight vectors. The Emission documentation describes how those consensus weights determine each participant’s share of the subnet’s accumulated emission each tempo.
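The following is a heavily simplified Python sketch of the clipping idea behind Yuma Consensus, meant as a rough illustration of the aggregation step. It works in three steps:

1. For each miner, find the highest weight supported by at least a `kappa` fraction of validator stake.
2. Clip every validator's weight on that miner to that level.
3. Normalize the stake-weighted totals into miner incentive shares.

The sketch omits bonds, moving averages, validator dividends, and Dynamic TAO's alpha/TAO mechanics, and the function names are hypothetical. The Yuma Consensus documentation describes the real computation.

```python
def consensus_weights(weights: list[list[float]], stakes: list[float],
                      kappa: float = 0.5) -> list[float]:
    """For each miner, the highest weight backed by at least `kappa` of total stake
    (a stake-weighted median when kappa = 0.5)."""
    total_stake = sum(stakes)
    result = []
    for j in range(len(weights[0])):
        # Walk validators from highest to lowest weight on miner j, accumulating stake.
        column = sorted(((row[j], stake) for row, stake in zip(weights, stakes)), reverse=True)
        support = 0.0
        consensus = 0.0
        for weight, stake in column:
            support += stake
            if support / total_stake >= kappa:
                consensus = weight
                break
        result.append(consensus)
    return result


def yuma_incentive(weights: list[list[float]], stakes: list[float],
                   kappa: float = 0.5) -> list[float]:
    """Clip each validator's weights to consensus, then return normalized,
    stake-weighted miner incentive (bonds and dividends omitted)."""
    total_stake = sum(stakes)
    consensus = consensus_weights(weights, stakes, kappa)
    ranks = [
        sum((stake / total_stake) * min(row[j], consensus[j])
            for row, stake in zip(weights, stakes))
        for j in range(len(consensus))
    ]
    total_rank = sum(ranks)
    if total_rank == 0:
        return [0.0] * len(ranks)
    return [rank / total_rank for rank in ranks]
```

Consider stakes of 40, 40, and 20. The two larger validators split their weight evenly across the first two miners, and the smaller validator puts all its weight on a third. The third miner's consensus weight is then 0.0, so the outlier weight is clipped away and the incentive comes out as 0.5, 0.5, 0.0.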
Development Stage Context
Bittensor’s subnet-development path moves from localnet to testnet and then mainnet (Introduction to Bittensor). For Data Universe (SN13), that sequence changes how readers should interpret miner-storage examples and validator scoring outcomes.
In localnet, miners and validators can develop and test Data Universe-compatible implementations in an isolated environment. Localnet freshness scores, inventory checks, and emission outcomes do not represent production subnet performance.
On testnet, Data Universe-compatible implementations can be exercised in a shared, non-production network. Testnet storage maps, validator spot checks, and scoring results are separate from mainnet subnet state.
On mainnet, Data Universe (SN13) is the production subnet where miners scrape and store real-time social media data while validators sample inventories and score data freshness, diversity, demand, and credibility (Data Universe repository).
The Bittensor Networks reference treats mainnet, testnet, and localnet as separate networks. A freshness score, inventory check, or emission outcome from one should therefore not be read as evidence of production performance in another.
That distinction matters for Data Universe because freshness and demand are time-sensitive signals. A local or test example can explain the scoring model. A production result, by contrast, depends on the current data window, the requested categories, and the validator sampling context that produced it.
Keeping the environments separate also prevents a sample inventory result from being mistaken for a durable statement about the whole distributed dataset.
Miner and Validator Roles
Subnet 13 follows the standard Bittensor two-role structure. Miners supply the subnet’s capability, which here is scraped and stored social media data. Validators evaluate those contributions and set weights. Reward distribution follows Yuma Consensus.
Reader Boundary
Subnet 13 Data Universe should not be read as generic Bittensor subnet documentation, a centralized data provider, or a permanent archive of social media history. It names one subnet’s distributed, freshness-scored social-data collection network on netuid 13, operated by Macrocosmos (Data Universe repository, Understanding Subnets, Glossary: Netuid).
Data Universe is also distinct from the Macrocosmos Gravity product, whose user requests feed the Dynamic Desirability List. This article documents the on-chain subnet incentive that produces the distributed dataset on netuid 13, not the downstream application or its commercial terms (Macrocosmos Gravity).