# Dataset Card for AO3
### Dataset Summary
This dataset contains approximately 12.6 million publicly available works from Archive of Our Own (AO3). It was built by processing work IDs 1 through 63,200,000 and keeping the works that were publicly accessible. Each entry contains the full text of the work along with its metadata, including title, author, fandom, relationships, characters, tags, warnings, and other classification information.
### Languages
The dataset is multilingual. English is the predominant language, and many other languages are represented.
## Dataset Structure
### Data Files
The dataset is stored as Zstandard-compressed JSON Lines files (`.jsonl.zst`). Each archive covers a block of 100,000 sequential work IDs. For example, `ao3_40500001-40600000.jsonl.zst` contains the works with IDs from 40,500,001 to 40,600,000.
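The naming scheme above can be turned into a small lookup helper. The following is a minimal sketch, not part of the dataset tooling: the function names `archive_name` and `archive_names` are hypothetical, and it assumes that every archive follows the `ao3_<start>-<end>.jsonl.zst` pattern of the example, without zero padding, with ranges starting at ID 1.

```python
ARCHIVE_SIZE = 100_000  # sequential IDs per archive, per the description above


def archive_name(work_id: int) -> str:
    """Return the archive file name expected to hold the given work ID."""
    if work_id < 1:
        raise ValueError("work IDs start at 1")
    # Ranges are 1-based and inclusive: 1-100000, 100001-200000, ...
    start = (work_id - 1) // ARCHIVE_SIZE * ARCHIVE_SIZE + 1
    end = start + ARCHIVE_SIZE - 1
    # Assumption: no zero padding in file names (only one example is given).
    return f"ao3_{start}-{end}.jsonl.zst"


def archive_names(max_id: int = 63_200_000) -> list[str]:
    """List archive names covering IDs 1..max_id (some may not exist)."""
    return [archive_name(start) for start in range(1, max_id + 1, ARCHIVE_SIZE)]


print(archive_name(40_550_123))  # ao3_40500001-40600000.jsonl.zst
print(len(archive_names()))      # 632
```

Because only publicly accessible works are included, an archive may hold far fewer than 100,000 records, and a name produced by this helper is not guaranteed to exist in the download.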
### Data Fields
Each record includes the following fields:
- - `id`: Unique identifier for the work (string)
- - `title`: Title of the work (string)
- `metadata`: Dictionary containing:
  - `Archive Warning`: Content warnings for the work
  - `Category`: Relationship categories (e.g., F/M, M/M, F/F)
  - `Characters`: List of characters appearing in the work
  - `Fandom`: Fandom(s) the work belongs to
  - `Language`: Language of the work
  - `Rating`: Content rating (e.g., General Audiences, Teen And Up, Mature, Explicit)
  - `Relationship`: Specific relationship pairings featured
  - `Series`: Series the work belongs to, if applicable
  - `author`: Username of the creator
  - `chapters`: Chapter structure information (e.g., "1/1" for a completed one-shot)
  - `completed`: Whether the work is completed
  - `published`: Publication date
  - `words`: Word count
- - `text`: Main content of the work (string)
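The fields listed above can be read one record at a time without decompressing a whole archive to disk. The sketch below is illustrative only: `iter_records`, `open_archive`, and `summarize` are hypothetical helper names, the third-party `zstandard` package is an assumed choice of decompressor, and the value types inside `metadata` are not specified in this card, so the code only reads keys defensively.

```python
import io
import json
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Parse JSONL text lines into work records, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def open_archive(path: str) -> io.TextIOWrapper:
    """Open a .jsonl.zst file as a UTF-8 text stream (needs `zstandard`)."""
    import zstandard  # third-party dependency: pip install zstandard

    raw = open(path, "rb")
    reader = zstandard.ZstdDecompressor().stream_reader(
        raw, read_across_frames=True
    )
    return io.TextIOWrapper(reader, encoding="utf-8")


def summarize(record: dict) -> dict:
    """Pick a few documented fields; missing metadata keys become None."""
    meta = record.get("metadata") or {}
    return {
        "id": record.get("id"),
        "title": record.get("title"),
        "language": meta.get("Language"),
        "rating": meta.get("Rating"),
        "chars": len(record.get("text") or ""),
    }


# Usage with a made-up in-memory sample instead of a real archive:
sample = [
    '{"id": "1", "title": "Example", '
    '"metadata": {"Language": "English"}, "text": "Hello"}'
]
for rec in iter_records(sample):
    print(summarize(rec))
```

With a downloaded archive, the same loop would read `with open_archive("ao3_40500001-40600000.jsonl.zst") as fh:` followed by `for rec in iter_records(fh): ...`. Keeping the JSON parsing separate from the decompression step makes it possible to swap in a different Zstandard reader without touching the rest of the code.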
### Data Splits
All examples are in a single split.
### Download
The dataset is distributed over BitTorrent through the following magnet link:

magnet:?xt=urn:btih:51c21fd1ae2896d6d5307347960da059236e6bd9&dn=%5BDataset%5D%20nyuuzyou%2Fao3%20%282025-04-25%29&tr=udp%3A%2F%2Ftracker.ducks.party%3A1984%2Fannounce&tr=udp%3A%2F%2Fexplodie.org%3A6969%2Fannounce&tr=udp%3A%2F%2Fopen.stealth.si%3A80%2Fannounce