These tar files contain hourly page requests for all public Wikimedia wikis.
All human page requests are included, whether for editing or reading, and whether or not the page exists.
Bots and other non-human traffic are filtered out as far as possible.
These data have been collected since May 2015.

For older data (since 2008), see similar files (but without non-human traffic filtered) at https://dumps.wikimedia.org/other/pagecounts/
(that older folder is still also used to generate long-term trend reports).

webstatscollector 3.0: Source: hadoop / hive
See readme.txt at https://dumps.wikimedia.org/other/pageviews/

FILE CONTENT

Here are a few sample lines from one file:

eml - 134 2336748
en - 8798556 205294255505
en.b - 12011 160444865
en.d - 72411 651799702
en.mw - 693098 61018547525
en.n - 6976 106480601
en.q - 15276 261954372
en.s - 10575 85471409
en.v - 4433 34335326
eo - 14476 240604800

In the above, the first column (for example "en.b") is the project name.
The second column is always a dash, the third column is the total number of page requests, and the fourth column is the size of the content returned.
When lines have been patched (see below), the fourth column always shows a 1.
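The four-column layout described above can be read with a few lines of Python. This is only a minimal sketch: the names `ProjectCount` and `parse_projectcounts_line` are made up for illustration and are not part of any Wikimedia tooling, and the code assumes the columns are separated by whitespace, as in the sample lines.

```python
from typing import NamedTuple


class ProjectCount(NamedTuple):
    """One line of a projectcounts file (hypothetical helper type)."""
    project: str   # first column, e.g. "en.b"
    requests: int  # third column: total number of page requests
    size: int      # fourth column: size of the content returned (1 on patched lines)


def parse_projectcounts_line(line: str) -> ProjectCount:
    """Parse a line such as 'en.b - 12011 160444865'.

    Assumes whitespace-separated columns, with a dash in the second column.
    """
    fields = line.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 columns, got {len(fields)}: {line!r}")
    project, dash, requests, size = fields
    if dash != "-":
        raise ValueError(f"expected '-' in the second column: {line!r}")
    return ProjectCount(project, int(requests), int(size))


record = parse_projectcounts_line("en.b - 12011 160444865")
print(record.project, record.requests, record.size)
```

Raising on malformed lines is a design choice for this sketch; a batch job might prefer to log and skip such lines instead.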

The following abbreviations are used for projectname (see html info above for details):

wikibooks: '.b'
wiktionary: '.d'
wikimedia: '.m' (for special wikis)
wikimedia mobile: '.mw' (for all projects combined)
wikinews: '.n'
wikipedia: (no suffix)
wikiquote: '.q'
wikisource: '.s'
wikiversity: '.v'
wikivoyage: '.voy'
mediawiki: '.w'

webstatscollector 2.0 introduced the suffixes '.m' for mobile traffic and '.zero' for zero traffic.
webstatscollector 3.0 did away with '.mw' for mobile traffic to all wikis combined; that suffix was kept in 2.0 for backward compatibility, but is redundant and deprecated.
For special wikis, mobile and zero traffic are found at suffixes 'm.m' and '.m.zero'.

REFRESH RATE

New projectcounts files are collected and added to the tar file at least once per day.
File 'most-recent-file.txt' contains the name of the most recent projectcounts file in the archives.
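
The project-name abbreviations listed earlier can be resolved with a small lookup table. The following is a sketch under stated assumptions, not an official mapping: `SUFFIX_TO_FAMILY` and `split_project` are hypothetical names, wikivoyage is assumed to follow the same dotted pattern as the other suffixes, and compound suffixes such as '.m.zero' are deliberately reported as "unknown" rather than guessed.

```python
# Suffix (without the leading dot) -> project family, taken from the list above.
SUFFIX_TO_FAMILY = {
    "b": "wikibooks",
    "d": "wiktionary",
    "m": "wikimedia",          # special wikis
    "mw": "wikimedia mobile",  # all projects combined
    "n": "wikinews",
    "q": "wikiquote",
    "s": "wikisource",
    "v": "wikiversity",
    "voy": "wikivoyage",       # assumption: written with a dot, like the others
    "w": "mediawiki",
}


def split_project(project: str) -> tuple[str, str]:
    """Split a project name such as 'en.b' into (prefix, project family).

    A name without a suffix (e.g. 'eo') is treated as wikipedia.
    Anything not in the table, including compound suffixes, maps to 'unknown'.
    """
    prefix, dot, suffix = project.partition(".")
    if not dot:
        return prefix, "wikipedia"
    return prefix, SUFFIX_TO_FAMILY.get(suffix, "unknown")


for name in ("en", "en.b", "en.mw", "eo"):
    print(name, split_project(name))
```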