A quantitative analysis of the Old School RuneScape (OSRS) hiscores.
This repository contributes the following:
The dataset consists of the following files:
- player-stats.csv: Skill levels in all 23 skills for the top 2 million OSRS accounts.
- cluster-centroids.csv: Central values for the clusters that emerge from partitioning the player dataset into groups based on account similarity. Each centroid is a vector of values between 1 and 99 in "OSRS skill" space.
- player-clusters.csv: Cluster IDs per player for three separate clustering runs, which group similar accounts by looking at (i) all skills, (ii) combat skills only, and (iii) non-combat skills only.
- player-stats-raw.csv: Rank, level, XP, clue, minigame, and boss stats for the top 2 million OSRS players. This file is the raw output from the scraping process (1.7 GB).

These files are not checked in to the repo due to file size constraints. They can be downloaded separately from Google Drive: https://bit.ly/osrs-hiscores-dataset
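To make the three clustering runs concrete, here is a minimal sketch of how a single account's skill levels could be cut into the "all", "cb" and "noncb" feature sets. The skill names are the 23 OSRS skills; their ordering, the lowercase keys, the `split_features` helper and the example account are assumptions for illustration and do not necessarily match the column layout of player-stats.csv.

```python
# The 7 combat skills and 16 non-combat skills in OSRS.
# NOTE: ordering and key names are assumptions, not the dataset's column order.
COMBAT_SKILLS = [
    "attack", "defence", "strength", "hitpoints", "ranged", "prayer", "magic",
]
NONCOMBAT_SKILLS = [
    "cooking", "woodcutting", "fletching", "fishing", "firemaking", "crafting",
    "smithing", "mining", "herblore", "agility", "thieving", "slayer",
    "farming", "runecraft", "hunter", "construction",
]
ALL_SKILLS = COMBAT_SKILLS + NONCOMBAT_SKILLS

# One feature set per clustering run.
SPLITS = {"all": ALL_SKILLS, "cb": COMBAT_SKILLS, "noncb": NONCOMBAT_SKILLS}


def split_features(levels, split):
    """Return the level vector for one split from a {skill: level} mapping."""
    return [levels[skill] for skill in SPLITS[split]]


# Made-up account for illustration: level 50 in everything except 99 magic.
example = {skill: 50 for skill in ALL_SKILLS}
example["magic"] = 99

print(len(split_features(example, "all")))
print(split_features(example, "cb"))
```

Each run then clusters accounts using only the vector for its own split, so two accounts with identical combat levels land close together in the "cb" run even if their other skills differ.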
Player stats were collected from the official OSRS hiscores over a 24-hour period on July 21, 2022.
├── LICENSE
├── Makefile <- Top-level Makefile for building and running project.
├── README.md <- The top-level README for developers using this project.
│
├── app <- Application code and assets.
├── bin <- Utility executables.
│
├── data
│ ├── final <- The final, canonical data set.
│ ├── interim <- Intermediate data that has been transformed.
│ └── raw <- The original, immutable data dump.
│
├── ref <- Reference files used in data processing.
├── scripts <- Scripts for the stages of the data processing pipeline.
│
├── src
│ ├── analysis <- Data science and analytics.
│ └── scrape <- Scraping hiscores data.
│
├── test <- Unit tests.
│
├── Procfile <- Entry point for deployment as a Heroku application.
├── requirements.txt <- Dependencies file for reproducing the project environment.
├── runapp.py <- Main script for Dash application.
└── setup.py <- Setup file for installing this project through pip.
At a high level, this repository implements a data science pipeline:
scrape OSRS hiscores data
↓
cluster players by stats
↓
project clusters to 3D
↓
build application data
along with a Dash application for visualizing the results.
The stages of the data pipeline are driven by a Makefile with top-level make targets for each processing stage:
1. make init: set up project environment and install dependencies.
2. make scrape: scrape data from the official OSRS hiscores and transform into a cleaned dataset.
3. make cluster: cluster players into groups of similar accounts according to their stats. Uses k-means as the clustering algorithm, implemented by the faiss library.
4. make postprocess: project the cluster centroids from high-dimensional space to 3D for visualization purposes (UMAP is the algorithm used for dimensionality reduction). Compute quartiles for each cluster based on the player population it contains.
5. make build-app: build application data and database using all previous analytic results. This target will launch a MongoDB instance inside a Docker container at the URL localhost:27017 (by default).

Steps 2 and 3 can (and should) be skipped by simply running make download-dataset, which fetches the scraped data and clustering results from an S3 bucket. This requires an AWS account with credentials located in the ~/.aws directory.
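The per-cluster quartile computation in the postprocess stage can be illustrated with a small standard-library sketch. This is not the repository's implementation: the `cluster_quartiles` function, the "inclusive" quantile method and the sample values are all assumptions chosen to keep the example self-contained.

```python
from collections import defaultdict
from statistics import quantiles


def cluster_quartiles(levels, cluster_ids):
    """Group one skill's levels by cluster ID and return (Q1, median, Q3) per cluster."""
    grouped = defaultdict(list)
    for level, cid in zip(levels, cluster_ids):
        grouped[cid].append(level)

    result = {}
    for cid, values in grouped.items():
        if len(values) < 2:
            # Treat a single-member cluster as its own quartiles.
            q1 = med = q3 = float(values[0])
        else:
            # "inclusive" interpolates between the observed min and max
            # (an assumed convention, not necessarily the project's).
            q1, med, q3 = quantiles(values, n=4, method="inclusive")
        result[cid] = (q1, med, q3)
    return result


# Made-up levels for one skill, with the cluster each player was assigned to.
levels = [1, 50, 99, 60, 70, 80, 90, 99]
cluster_ids = [0, 0, 0, 1, 1, 1, 1, 1]
print(cluster_quartiles(levels, cluster_ids))
```

Repeating this for every skill gives a quartile summary of the player population inside each cluster.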
To launch the application, run make run-app and visit the URL localhost:8050 in a web browser.
The final application can be built and run in one shot via make app, which uses downloaded data rather than scraping and clustering the data from scratch. The target make all is what was used to build the final results for this repo. If scraping data, note that high usage of the hiscores API may result in your IP being blocked. Please be sparing and respectful of Jagex's server resources in your usage of this code.
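One simple way to stay sparing with requests is to enforce a minimum delay between them. The generator below is a minimal sketch of that idea, not the scraper used in this repository; the `throttled` name, the 2-second interval and the usernames are hypothetical, and no network request is made here.

```python
import time


def throttled(items, min_interval, clock=time.monotonic, sleep=time.sleep):
    """Yield items no faster than one per `min_interval` seconds."""
    last = None
    for item in items:
        if last is not None:
            # Sleep for whatever remains of the interval since the last item.
            wait = min_interval - (clock() - last)
            if wait > 0:
                sleep(wait)
        last = clock()
        yield item


if __name__ == "__main__":
    # Hypothetical usernames; a real scraper would issue one request per name.
    for username in throttled(["player one", "player two"], min_interval=2.0):
        print("would fetch stats for", username)
```

The `clock` and `sleep` parameters exist only so the delay logic can be exercised without actually waiting.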
Run make help to see more top-level targets.
A number of environment variables are set in order to configure the application.
- OSRS_APPDATA_URI: path to application data .pkl file (S3 or local)
- OSRS_MONGO_URI: URL at which MongoDB instance is running
- OSRS_MONGO_COLL: store/retrieve player data from collection with this name

There are also environment variables defining filenames at each stage of the data pipeline.
Defaults for all environment variables are defined in .env.default and imported whenever a make target is run. If a file called .env exists, any settings there will override those in .env.default.
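The override behavior can be pictured as a dictionary merge in which .env wins over .env.default. The sketch below only illustrates that precedence in Python; the project itself imports these files through make, and the `parse_env` and `effective_settings` helpers as well as the sample values (apart from the localhost:27017 default mentioned above) are assumptions.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def effective_settings(default_text, override_text=None):
    """Start from the defaults, then let any override file replace matching keys."""
    settings = parse_env(default_text)
    if override_text is not None:
        settings.update(parse_env(override_text))
    return settings


# Placeholder file contents for illustration only.
env_default = """
# defaults
OSRS_MONGO_URI=localhost:27017
OSRS_MONGO_COLL=players
"""
env_local = "OSRS_MONGO_COLL=players-dev\n"

print(effective_settings(env_default, env_local))
```

Keys that appear only in .env.default keep their default values, so a local .env needs to list only the settings it changes.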
~/.aws directory (create account here)

- all: all 23 OSRS skills
- cb: the 7 combat skills
- noncb: the 16 non-combat skills

n_neighbors=10 and min_dist=0.25 were used for splits all and noncb; n_neighbors=20 and min_dist=0.25 were used for split cb.

Here are some ideas for data science projects.