# Dataset Card for AO3

### Dataset Summary

This dataset contains approximately 12.6 million publicly available works from AO3. The dataset was created by processing works with IDs from 1 to 63,200,000 that are publicly accessible. Each entry contains the full text of the work along with comprehensive metadata including title, author, fandom, relationships, characters, tags, warnings, and other classification information.

### Languages

The dataset is multilingual, with works in many different languages, though English is predominant.

## Dataset Structure

### Data Files

The dataset is stored in compressed JSONL files (`.jsonl.zst` format), with each archive containing 100,000 sequential IDs. For example, `ao3_40500001-40600000.jsonl.zst` contains works with IDs in that range.

### Data Fields

This dataset includes the following fields:

- `id`: Unique identifier for the work (string)
- `title`: Title of the work (string)
- `metadata`: Dictionary containing:
  - `Archive Warning`: Content warnings for the work
  - `Category`: Relationship categories (e.g., F/M, M/M, F/F)
  - `Characters`: List of characters appearing in the work
  - `Fandom`: Fandom(s) the work belongs to
  - `Language`: Language of the work
  - `Rating`: Content rating (e.g., General Audiences, Teen And Up, Mature, Explicit)
  - `Relationship`: Specific relationship pairings featured
  - `Series`: Series the work belongs to, if applicable
  - `author`: Username of the creator
  - `chapters`: Chapter structure information (e.g., "1/1" for a completed one-shot)
  - `completed`: Whether the work is completed
  - `published`: Publication date
  - `words`: Word count
- `text`: Main content of the work (string)

### Data Splits

All examples are in a single split.

### Download

magnet:?xt=urn:btih:51c21fd1ae2896d6d5307347960da059236e6bd9&dn=%5BDataset%5D%20nyuuzyou%2Fao3%20%282025-04-25%29&tr=udp%3A%2F%2Ftracker.ducks.party%3A1984%2Fannounce&tr=udp%3A%2F%2Fexplodie.org%3A6969%2Fannounce&tr=udp%3A%2F%2Fopen.stealth.si%3A80%2Fannounce
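
### Usage Sketch

The file layout described under Data Files can be navigated with a small helper. The following is a minimal sketch, not an official loader: `archive_name` and `iter_works` are hypothetical names, the file name pattern is inferred only from the single example `ao3_40500001-40600000.jsonl.zst` (ranges are assumed to start at ID 1 with no zero padding), and reading the archives assumes the third-party `zstandard` package is installed.

```python
import io
import json
from typing import Iterator

# Per the card, each archive covers 100,000 sequential IDs.
IDS_PER_ARCHIVE = 100_000


def archive_name(work_id: int) -> str:
    """Return the archive file name expected to hold a given work ID.

    Assumption: ranges start at ID 1 and follow the pattern of the
    example file name `ao3_40500001-40600000.jsonl.zst`.
    """
    if work_id < 1:
        raise ValueError("work IDs start at 1")
    start = (work_id - 1) // IDS_PER_ARCHIVE * IDS_PER_ARCHIVE + 1
    end = start + IDS_PER_ARCHIVE - 1
    return f"ao3_{start}-{end}.jsonl.zst"


def iter_works(path: str) -> Iterator[dict]:
    """Yield one record (a dict) per non-empty line of a .jsonl.zst archive.

    Assumption: the third-party `zstandard` package is available.
    """
    import zstandard  # imported lazily so archive_name() works without it

    with open(path, "rb") as fh:
        reader = zstandard.ZstdDecompressor().stream_reader(fh)
        # Decode the decompressed byte stream as UTF-8 text, line by line.
        for line in io.TextIOWrapper(reader, encoding="utf-8"):
            if line.strip():
                yield json.loads(line)
```

For example, `archive_name(40512345)` returns `ao3_40500001-40600000.jsonl.zst`. Each record yielded by `iter_works` should expose the fields listed under Data Fields, such as `work["title"]` and `work["metadata"]["Language"]`, although the exact nesting of the metadata keys is an assumption based on the field list.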