# Dataset Card for AO3
### Dataset Summary
This dataset contains approximately 12.6 million publicly available works from Archive of Our Own (AO3). It was built by processing work IDs 1 through 63,200,000 and keeping the works that were publicly accessible. Each entry contains the full text of the work along with its metadata: title, author, fandom, relationships, characters, tags, warnings, and other classification information.

### Languages
The dataset is multilingual, with works in many different languages, though English is predominant.

## Dataset Structure
### Data Files
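The dataset is stored as Zstandard-compressed JSON Lines files (`.jsonl.zst`), with each file covering a range of 100,000 sequential work IDs. For example, `ao3_40500001-40600000.jsonl.zst` contains works with IDs in that range. Because only publicly accessible works are included, a file can hold fewer than 100,000 records.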
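
As a rough sketch of how the file layout can be used, the snippet below maps a work ID to the file expected to hold it and streams records out of one file. The naming pattern is inferred from the single example above and is assumed to hold for every file. The helper names `archive_name_for_id` and `iter_records` are hypothetical, and decompression relies on the third-party `zstandard` package rather than anything shipped with the dataset.

```python
import io
import json

IDS_PER_FILE = 100_000  # each file covers 100,000 sequential work IDs


def archive_name_for_id(work_id: int) -> str:
    """Return the file name expected to contain `work_id`.

    Assumes every file follows the `ao3_<first>-<last>.jsonl.zst` pattern
    shown in the example above.
    """
    if work_id < 1:
        raise ValueError("work IDs start at 1")
    first = ((work_id - 1) // IDS_PER_FILE) * IDS_PER_FILE + 1
    last = first + IDS_PER_FILE - 1
    return f"ao3_{first}-{last}.jsonl.zst"


def iter_records(path: str):
    """Yield one dict per JSON line from a .jsonl.zst file, without
    decompressing the whole file into memory."""
    import zstandard  # third-party: pip install zstandard

    with open(path, "rb") as fh:
        reader = zstandard.ZstdDecompressor().stream_reader(fh)
        for line in io.TextIOWrapper(reader, encoding="utf-8"):
            if line.strip():
                yield json.loads(line)
```

For instance, `archive_name_for_id(40512345)` yields the file name from the example above, and `iter_records` can then be pointed at the local copy of that file.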

### Data Fields
This dataset includes the following fields:
- `id`: Unique identifier for the work (string)
- `title`: Title of the work (string)
- `metadata`: Dictionary containing:
  - `Archive Warning`: Content warnings for the work
  - `Category`: Relationship categories (e.g., F/M, M/M, F/F)
  - `Characters`: List of characters appearing in the work
  - `Fandom`: Fandom(s) the work belongs to
  - `Language`: Language of the work
  - `Rating`: Content rating (e.g., General Audiences, Teen And Up, Mature, Explicit)
  - `Relationship`: Specific relationship pairings featured
  - `Series`: Series the work belongs to, if applicable
  - `author`: Username of the creator
  - `chapters`: Chapter structure information (e.g., "1/1" for a completed one-shot)
  - `completed`: Whether the work is completed
  - `published`: Publication date
  - `words`: Word count
- `text`: Main content of the work (string)
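
The field layout above can be illustrated with a small sketch that parses one JSON line and pulls out a few values. The sample record is invented for illustration and is not taken from the dataset. The card does not state whether multi-valued metadata such as `Fandom` is stored as a list or as a single string, so the hypothetical helper `as_list` accepts either.

```python
import json


def as_list(value):
    """Normalize a metadata value to a list (the stored type is an assumption)."""
    if value is None:
        return []
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


def summarize_work(record: dict) -> dict:
    """Collect a few top-level and nested metadata fields from one record."""
    meta = record.get("metadata") or {}
    return {
        "id": record["id"],
        "title": record.get("title"),
        "author": meta.get("author"),
        "fandoms": as_list(meta.get("Fandom")),
        "rating": meta.get("Rating"),
        "language": meta.get("Language"),
        "text_chars": len(record.get("text") or ""),
    }


# Made-up record shaped like the field list above (values are placeholders).
sample_line = json.dumps({
    "id": "40500001",
    "title": "Example Title",
    "metadata": {
        "Fandom": "Example Fandom",
        "Language": "English",
        "Rating": "General Audiences",
        "author": "example_user",
        "chapters": "1/1",
    },
    "text": "Once upon a time.",
})

summary = summarize_work(json.loads(sample_line))
```

Using `.get` throughout keeps the sketch tolerant of records that lack optional fields such as `Series`.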

### Data Splits
All examples are in a single split.

### Download
The dataset is distributed as a torrent, available through the following magnet link:

magnet:?xt=urn:btih:51c21fd1ae2896d6d5307347960da059236e6bd9&dn=%5BDataset%5D%20nyuuzyou%2Fao3%20%282025-04-25%29&tr=udp%3A%2F%2Ftracker.ducks.party%3A1984%2Fannounce&tr=udp%3A%2F%2Fexplodie.org%3A6969%2Fannounce&tr=udp%3A%2F%2Fopen.stealth.si%3A80%2Fannounce