Data is in the form of text files, with fields separated by tabs and each line representing a record/row. Data is encoded in UTF-8, with Unix line endings, and tab, newline, null and backslash characters escaped using C-style escapes (\t, \n, \0 and \\ respectively).
The first row always contains the column headers. Use these to look columns up by name rather than by position, so that your import keeps working if the structure changes.
Data is actually exported from a MySQL database, and can be imported into MySQL using a LOAD DATA query. You can also load it into Microsoft Excel, and although I'd like to joke about trying to load sizable data sets into Excel, it actually works...
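If you'd rather read the files directly than go through MySQL, the format described above can be handled with a few lines of Python. This is only a minimal sketch: the names unescape_field and read_dump are made up for illustration, and keeping unrecognised escape sequences verbatim is an assumption, since only the four escapes listed above are documented.

```python
# Escape sequences named in the format description: \t, \n, \0 and \\.
_ESCAPES = {"t": "\t", "n": "\n", "0": "\0", "\\": "\\"}


def unescape_field(raw):
    """Decode the C-style escapes in a single field."""
    out = []
    chars = iter(raw)
    for ch in chars:
        if ch != "\\":
            out.append(ch)
            continue
        nxt = next(chars, "")
        # Assumption: any escape other than the four documented ones
        # is passed through unchanged.
        out.append(_ESCAPES.get(nxt, "\\" + nxt))
    return "".join(out)


def read_dump(stream):
    """Yield each record as a dict keyed by the header row."""
    header = None
    for line in stream:
        # Tabs and newlines inside fields are escaped, so splitting on
        # the raw characters is safe.
        fields = [unescape_field(f) for f in line.rstrip("\n").split("\t")]
        if header is None:
            header = fields  # first row: column names
            continue
        yield dict(zip(header, fields))
```

When reading from disk, open the file with open(path, encoding="utf-8", newline="\n") so that only Unix line endings are treated as record separators. All values come back as strings; convert numeric columns with int() as needed.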
The following describes the tables and fields.
Torrents Table
This table contains all torrent entries. For example, the display on the homepage simply pulls latest items from this table.
Fields:
id: unique identifier
tosho_id: ID from TokyoTosho; 0 if none available
nyaa_id: ID from Nyaa; 0 if none available
anidex_id: ID from AniDex; 0 if none available
nekobt_id: ID from nekoBT; 0 if none available
name: name/title
link: original HTTP torrent download link
magnet: magnet link of torrent, either obtained from source or generated from torrent file
cat: TokyoTosho category
website: URL for website
totalsize: total size of all files in torrent, in bytes
date_posted: Unix timestamp of when torrent was uploaded
comment:
date_added: Unix timestamp of when torrent was grabbed by AT scripts
date_completed: Unix timestamp of when torrent download was completed
torrentname: Name extracted from torrent file
torrentfiles: Number of files found in the torrent
stored_nzb: Whether AT has an NZB stored with this entry. If available, the NZB can be downloaded from https://storage.animetosho.torrentbay.st/nzbs/xxxxxxxx/file.nzb replacing xxxxxxxx with the 8 character hex encoded ID (see id column, convert the number to hexadecimal representation and left pad with zeroes)
stored_torrent: Whether AT has a .torrent stored with this entry. If available, the torrent can be downloaded from https://storage.animetosho.torrentbay.st/torrent/hex-btih/torrent.torrent replacing hex-btih with the 40 character lower-case hex encoded info hash (see btih column)
anidex_labels: labels from AniDex as bit flags: 1=batch, 2=raw, 4=hentai, 8=reencode
nekobt_hide: 1 if hidden/deleted on nekoBT, 0 otherwise
btih: hex encoded torrent info hash
btih_sha256: hex encoded BitTorrent v2 info hash, if file is a BTv2 or hybrid torrent file
isdupe: whether this entry is considered a duplicate, based on BTIH
deleted: whether this entry is marked as deleted
date_updated: Unix timestamp of when this row was last updated
aid: related AniDB anime ID
eid: related AniDB episode ID
fid: related AniDB file ID
gids: related AniDB group IDs, comma separated list
resolveapproved: whether exclamation mark shows up next to the anime title in the view page
main_fileid: if the torrent contains one file of significance, will be the AT file ID, otherwise 0. If this is set, links from this file are displayed on the home page
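The download URLs and bit flags described in the fields above can be derived as in the sketch below. The function names are hypothetical, and using lower-case hex digits for the ID is an assumption (the description only specifies lower case for the info hash). The functions don't check stored_nzb or stored_torrent; that is left to the caller.

```python
STORAGE = "https://storage.animetosho.torrentbay.st"

# Bit values as listed for the anidex_labels column.
ANIDEX_LABELS = {1: "batch", 2: "raw", 4: "hentai", 8: "reencode"}


def nzb_url(torrent_id):
    """NZB URL for a torrent entry: the id as 8 zero-padded hex characters."""
    return f"{STORAGE}/nzbs/{torrent_id:08x}/file.nzb"


def torrent_url(btih):
    """.torrent URL built from the 40 character hex info hash (btih column)."""
    return f"{STORAGE}/torrent/{btih.lower()}/torrent.torrent"


def anidex_labels(flags):
    """Expand the anidex_labels bit field into a list of label names."""
    return [name for bit, name in ANIDEX_LABELS.items() if flags & bit]
```

For example, anidex_labels(5) gives ["batch", "hentai"], since 5 = 1 + 4. Values read from the dump are strings, so pass int(row["id"]) and int(row["anidex_labels"]).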
Files Table
This table contains all file entries. Each torrent contains one or more files.
Fields:
id: unique identifier
torrent_id: ID of associated torrent entry
is_archive: 1 if this file is a 7z archive created by the Anime Tosho script, 0 otherwise
filename: file's name; includes path if supplied
filesize: file's size in bytes
vidframes: non-empty if video I-frames are being stored for screenshot purposes. It is a comma separated list of integers, each being the timestamp at which a frame occurs in the video (milliseconds elapsed since the start of the video). Stored I-frames can be downloaded from https://storage.animetosho.torrentbay.st/sframes/xxxxxxxx_time.mkv where xxxxxxxx is the file's ID, hex encoded and left padded with zeroes to 8 characters, and time is the timestamp of the frame. If soft subtitles have been rendered for the frame, they can be downloaded from https://storage.animetosho.torrentbay.st/sframes/xxxxxxxx_track_time.webp where track is the track number of the subtitle (from the original video; usually track 3)
crc32: CRC32 hash of file, hex encoded, if available
md5: MD5 hash of file, hex encoded, if available
sha1: SHA1 hash of file, hex encoded, if available
sha256: SHA256 hash of file, hex encoded, if available
tth: TTH hash of file, hex encoded, if available
ed2k: ED2K hash of file, hex encoded, if available
crc32k: CRC32 hash of first 1KB of file, hex encoded, if available
torpc_sha1_*: hex encoded SHA1 hash of the concatenated SHA1 hashes (binary encoded) of the file's blocks at the respective block size. For example, the torpc_sha1_16k hash is obtained by breaking the file into 16KB blocks (if the last block is less than 16KB, it is discarded), calculating a 20 byte SHA1 hash for each block, concatenating these hashes, and feeding the result through SHA1 to obtain the final hash. The selected block sizes correspond with the most common piece sizes used for torrents, and hence this hash can be useful in trying to detect duplicate torrents which have different info hash values.
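Two of the fields above involve a bit of computation: the stored frame URLs derived from vidframes, and the torpc_sha1_* hashes. The sketch below shows one way to reproduce both. The function names are hypothetical, 16KB is assumed to mean 16384 bytes, lower-case hex is assumed for the file ID, and what the dump holds for files smaller than one block isn't stated (this sketch simply hashes an empty concatenation in that case).

```python
import hashlib

STORAGE = "https://storage.animetosho.torrentbay.st"


def frame_urls(file_id, vidframes, subtitle_track=None):
    """Build stored I-frame URLs from a file ID and its vidframes value.

    With subtitle_track set, the rendered-subtitle .webp URLs are
    returned instead of the .mkv frame URLs.
    """
    times = [t for t in vidframes.split(",") if t]
    prefix = f"{STORAGE}/sframes/{file_id:08x}"
    if subtitle_track is None:
        return [f"{prefix}_{t}.mkv" for t in times]
    return [f"{prefix}_{subtitle_track}_{t}.webp" for t in times]


def torpc_sha1(stream, block_size=16 * 1024):
    """SHA1 over the concatenated SHA1 digests of each full block."""
    outer = hashlib.sha1()
    while True:
        block = stream.read(block_size)
        if len(block) < block_size:
            break  # end of file, or a short final block that is discarded
        outer.update(hashlib.sha1(block).digest())  # 20 byte binary digest
    return outer.hexdigest()
```

Pass a file opened in binary mode, e.g. torpc_sha1(open(path, "rb"), 16 * 1024), and compare the result against the torpc_sha1_16k column.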
Mediainfo and related data are currently not included, mainly due to their size and the time it takes to dump them. The data is also compressed using a custom LZMA based scheme, for which users would need to implement a decompressor. I may consider including this data if enough people are interested.
Attachments Table
This table contains all attachment (subtitles, fonts etc) entries. File entries are mapped 1:1 to attachment entries (if attachments exist). Note that attachments are de-duplicated, and hence, there's a separate Attachment Files table (below) which describes unique attachment files.
Fields:
file_id: ID of associated file entry
attachments: JSON encoded array describing available attachments
JSON Array Structure
The array houses up to four entries, in the order listed below. The first two are each an array of objects, with one object per file, whilst the last two are integers. A null is used to indicate that a particular entry is missing (e.g. if there are only subtitles, the first entry will be null whilst the second will be an array).
Array of file attachments (e.g. fonts)
_afid: attachment file ID
name: the file name of the attachment
mime: the MIME type of the attachment
Array of subtitles
_afid: attachment file ID
lang: subtitle language
codec: subtitle format (e.g. SRT, ASS)
tracknum: track number the subtitle occupied in the source file
Chapters XML file (attachment file ID)
Tags XML file (attachment file ID)
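Unpacking the attachments column according to the structure above might look like the following sketch. The function name parse_attachments and the keys of the returned dict are made up for illustration; padding short arrays with None follows from the "up to four entries" wording.

```python
import json


def parse_attachments(raw):
    """Split the attachments JSON array into its four positional entries."""
    entries = list(json.loads(raw))
    # The array may hold fewer than four entries; pad the rest with None.
    entries += [None] * (4 - len(entries))
    files, subtitles, chapters, tags = entries[:4]
    return {
        "files": files or [],          # objects with _afid, name, mime
        "subtitles": subtitles or [],  # objects with _afid, lang, codec, tracknum
        "chapters_afid": chapters,     # attachment file ID, or None
        "tags_afid": tags,             # attachment file ID, or None
    }
```

Every _afid (and the chapters/tags integers) refers to the id column of the Attachment Files table.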
Attachment Files Table
This table contains information about attachment files stored on disk. A file on disk can be linked to multiple attachments (due to de-duplication).
Fields:
id: unique identifier; files can be downloaded from https://storage.animetosho.torrentbay.st/attach/xxxxxxxx/file.xz, replacing xxxxxxxx with the 8 character hex representation of this ID, left padded with zeroes
sha1: hex encoded SHA1 hash of the file
filesize: size of file
packedsize: size of file, after XZ compression
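A rough sketch of building the download URL and unpacking a downloaded attachment with Python's standard lzma module follows. The function names are hypothetical, and checking the sha1 column against the unpacked data (rather than the .xz file) is an assumption; the description above doesn't say which one the hash covers.

```python
import hashlib
import lzma

STORAGE = "https://storage.animetosho.torrentbay.st"


def attachment_url(afid):
    """Download URL for an attachment file ID (8 zero-padded hex characters)."""
    return f"{STORAGE}/attach/{afid:08x}/file.xz"


def unpack_attachment(packed, expected_sha1=None):
    """Decompress downloaded .xz bytes and optionally verify the SHA1."""
    data = lzma.decompress(packed)
    if expected_sha1 is not None:
        # Assumption: the sha1 column describes the unpacked file.
        actual = hashlib.sha1(data).hexdigest()
        if actual != expected_sha1.lower():
            raise ValueError(f"SHA1 mismatch: got {actual}")
    return data
```

The length of the returned bytes should match the filesize column, and the length of the downloaded data should match packedsize.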
File Links Exports
These contain the download links generated for files.
The data is an export of links generated/updated since the last export, performed daily. Only a few days' worth of snapshots are retained, and the full table is not available due to size.
Fields:
id: unique identifier
file_id: ID of associated file entry
site: displayed site name for the link. Sub-links are indicated as Parent|Child
part: 1-based part number; if file wasn't split, will be 1
url: the link URL
date_added: Unix timestamp of when this entry was added
date_updated: Unix timestamp of when this entry was last updated
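Since a split file has one row per part, and sub-links are encoded in the site name, you'll usually want to regroup the rows before using them. The sketch below is one way to do that; group_links is a hypothetical name, and the rows are assumed to be dicts of strings keyed by the column names above.

```python
from collections import defaultdict


def group_links(rows):
    """Group link rows by (file_id, parent site, child site), ordered by part."""
    grouped = defaultdict(list)
    for row in rows:
        # "Parent|Child" marks a sub-link; plain names have no child.
        parent, _, child = row["site"].partition("|")
        key = (int(row["file_id"]), parent, child or None)
        grouped[key].append((int(row["part"]), row["url"]))
    # Sort each group by its 1-based part number and keep only the URLs.
    return {key: [url for _, url in sorted(parts)]
            for key, parts in grouped.items()}
```

Because each export only covers links generated or updated since the previous one, a group may be incomplete unless you merge several exports, keeping the row with the latest date_updated for each id.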
Other Tables
The other tables used by Anime Tosho probably won't be supplied for the following reasons:
Tracker Scrape Tables: contains seeder/leecher stats scraped from torrent trackers; this information can usually be scraped easily
AniDB Info Tables: contains all data retrieved from AniDB, such as series information; please see AniDB API for obtaining data
AniDB - TVDB Mapping Table: used to map AniDB references to TVDB/IMDB. Original mapping data 'anime-lists' can be found here
Source Scrape Tables: contains all data scraped from upstream sources (TokyoTosho, Nyaa, AniDex). Please see sources for data dumps if desired