Anime Tosho Data Dump
The following is a torrent containing data and assets produced over the years of Anime Tosho (AT) operation. Feel free to share or use this data in any way you like.
Note that I am only able to seed this torrent for a few days, and I won't be retaining this data.
Download Torrent (total size: 1.01TB)
Download NZB (database dump only)
The rest of this page provides some info on how to interpret this data.
Database dumps
The database-dump.7z file in the torrent holds MariaDB SQL dumps, corresponding to the databases of Anime Tosho's updates. You'll need to import the schemas before the data (if you've set up AT using the guide, the schemas should already be imported).
Note: timebase.txt holds the last time (Unix timestamp) the torrent ingestion was run. If you're running animetosho-updater, place it in the same directory as cron.php.
A description of the tables can be found here, and a description of some fields can be found here.
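If you want to check when the ingestion last ran, the Unix timestamp in timebase.txt can be turned into a readable date. The following is a minimal sketch: timebase_to_date is a hypothetical helper name, it assumes the file contains nothing but the numeric timestamp, and it relies on GNU date (the -d "@..." syntax isn't available in every date implementation).

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the timestamp stored in a timebase.txt-style file
# as a UTC date. Assumes the file holds a single Unix timestamp and GNU date.
timebase_to_date() {
    local ts
    # Strip any whitespace/newlines around the number
    ts=$(tr -d '[:space:]' < "$1")
    date -u -d "@$ts" '+%Y-%m-%d %H:%M:%S UTC'
}

# Example: timebase_to_date timebase.txt
```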
Storage Data
The files have been packaged into a series of 7z files to ease distribution. Unless you're just archiving data, you'll likely want to extract these. To do this in a way that's compatible with AT's code, the 7z archives need to be unpacked into the same folder that they're located in. On Windows with 7-Zip installed, go into each folder, select the .7z files, then right-click and choose Extract Here. Otherwise, the following Bash commands (run in the directory that contains the torrent's contents) can achieve this.
Extract and remove archives:
find . -mindepth 2 -type f -name '*.7z' -execdir sh -c '7z x "$1" && unlink "$1"' _ '{}' \;
Extract archives, retaining .7z files:
find . -mindepth 2 -type f -name '*.7z' -execdir 7z x '{}' \;
Note that all files, excluding those in torrents/, are named using a hex-encoded ID, and you'll need to cross-reference them with the database dumps above to identify them. To find a file in the database table, convert its full hex name (see below) to decimal and look up the corresponding ID.
Assuming all .7z files are unpacked, the files are arranged in the following folders:
- attachments: extracted subtitles, attachments and chapters/tags, keyed by attachment file ID (DB ref: toto_repl.toto_attachment_files.id). All files are XZ compressed
- nzb: NZBs produced when uploading to Usenet, keyed by Anime Tosho torrent ID (DB ref: toto_repl.toto_toto.id). All files are GZip compressed
- sframes: extracted video I-Frames and pre-rendered subtitles, keyed by Anime Tosho file ID (DB ref: toto_repl.toto_files.id). These files are used for displaying screenshots. The I-Frames are stored as MKVs with the timestamp (DB ref: toto_repl.toto_files.vidframes) tacked onto the filename, while the pre-rendered subtitles are in WebP format with the track ID (DB ref: toto_repl.toto_attachments.attachments) + timestamp tacked onto the filename.
  Note: this additional info tacked onto the filename is separated with underscores and isn't hex encoded
- sshots: legacy screenshot thumbnails, keyed by Anime Tosho file ID (DB ref: toto_repl.toto_files.id). Files are stored as ZIP files containing the JPEG thumbnails (DB ref: toto_repl.toto_files.filethumbs).
- torrents: fetched .torrent files, keyed by BTIH (DB ref: toto_repl.toto_toto.btih).
- anidex_archive: .torrent file archive from AniDex, keyed by AniDex ID (DB ref: arcscrape.anidex_torrents.id)
- nekobt_archive: .torrent file archive from nekoBT, keyed by nekoBT ID (DB ref: arcscrape.nekobt_torrents.id)
- nyaasi_archive: .torrent file archive from Nyaa.si, keyed by Nyaa ID (DB ref: arcscrape.nyaasi_torrents.id)
- nyaasis_archive: .torrent file archive from sukebei.nyaa.si, keyed by Sukebei ID (DB ref: arcscrape.nyaasis_torrents.id)
Note that files are split into subfolders (to avoid having too many files in a folder) and the full name should be interpreted as a concatenation of the file's parent folders. For example, a file ideally named "01234567.xz" may be stored as "01234/567.xz" (and refers to ID=19088743 in the database).
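The path-to-ID conversion described above can be sketched in Bash as follows. path_to_id is a hypothetical helper (it isn't part of AT's code). It assumes the path is given relative to the top-level folder (e.g. relative to attachments/), and that anything from the first dot or underscore in the filename onwards is an extension or the non-hex suffix used in sframes. It isn't meant for torrents/, which is keyed by BTIH rather than a numeric ID.

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert a storage path such as "01234/567.xz"
# into the decimal ID used in the database dumps.
path_to_id() {
    local rel=$1 base dir hex
    base=${rel##*/}        # final path component, e.g. "567.xz"
    dir=${rel%"$base"}     # parent folders, e.g. "01234/"
    base=${base%%.*}       # drop the extension(s)
    base=${base%%_*}       # drop any underscore-separated suffix (sframes)
    hex=${dir//\//}$base   # concatenate folders + name, e.g. "01234567"
    printf '%d\n' "0x$hex" # hex to decimal
}

# Example: path_to_id 01234/567.xz   (prints 19088743)
```

The printed number can then be looked up in the relevant id column (for example toto_repl.toto_attachment_files.id for files under attachments).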
Note on Bad Data
There's a small number of known instances of issues that I never got around to cleaning up. Examples off the top of my head:
- Data in compressed database fields used to be packed using a custom LZMA2 scheme and the compression code had a bug which would corrupt the data in rare cases
- Some I-frame extractions (sframes folder) failed, but ffmpeg still outputted an MKV with 0 frames. These will obviously fail to render as screenshots
- There may be other issues that I don't know of or can't remember. A lot has changed over the years, and there'll be imperfections (especially since I test in prod)
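One way to find the zero-frame MKVs mentioned above is to ask ffprobe (shipped with ffmpeg) to count the video packets in each file. The following is a sketch only: has_video_frames and list_empty_sframes are hypothetical helper names, it assumes the I-frame files end in .mkv, and it treats any file for which ffprobe doesn't report a positive packet count as a failed extraction.

```bash
#!/usr/bin/env bash
# Succeeds if ffprobe counts at least one packet in the first video stream.
has_video_frames() {
    local count
    count=$(ffprobe -v error -select_streams v:0 -count_packets \
        -show_entries stream=nb_read_packets -of csv=p=0 "$1" \
        2>/dev/null < /dev/null)
    # Empty or non-numeric output (e.g. no video stream) counts as failure
    [[ $count =~ ^[0-9]+$ ]] && (( 10#$count > 0 ))
}

# Print every .mkv under the given folder that has no video frames.
list_empty_sframes() {
    local f
    find "$1" -type f -name '*.mkv' -print0 |
        while IFS= read -r -d '' f; do
            has_video_frames "$f" || printf '%s\n' "$f"
        done
}

# Example: list_empty_sframes sframes > empty-sframes.txt
```

This only lists the affected files; what to do with them (delete, ignore, re-extract from the source video) is left to you.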
Source Code
You can find a list of the code repositories which run AT here.