Scraping TETR.IO Tetra League standings

August 13, 2024

Since 2022, I’ve regularly enjoyed playing a competitive Tetris variant named TETR.IO. It’s been a while since I last played, but when I visited the game’s website, I read that it had just moved from its Alpha stage to Beta, and that the second season of TETR.IO’s TETRA LEAGUE (the game’s competitive matchmaking mode) starts on the 16th of August.

Something I missed in the Alpha version of the game was a history of my progression: I would have loved to see a plot of my rank and the game’s equivalent of my Elo score over time. With the second season fast approaching, now is the perfect time to set up a cron job for a script that scrapes my TETR.IO profile and writes the current score to a file. The collected data could then be used to create the plot, and maybe to showcase it on my website.
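The plan above needs a script that records one data point per run. A minimal sketch in Python, assuming the rank and score have already been scraped, could append a timestamped row to a CSV file. The function name append_score, the column names, and the CSV layout are my own assumptions, not a format TETR.IO prescribes:

```python
import csv
from datetime import datetime, timezone
from pathlib import Path


def append_score(path: Path, rating: float, rank: str) -> None:
    """Append one timestamped (rating, rank) row to a CSV file.

    Hypothetical helper: the CSV layout is an assumption, not a TETR.IO format.
    A header row is written first if the file does not exist yet, so the file
    can be loaded directly by a plotting tool later on.
    """
    is_new = not path.exists()
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["timestamp", "rating", "rank"])
        # ISO 8601 timestamps in UTC sort correctly as plain strings.
        writer.writerow([datetime.now(timezone.utc).isoformat(), rating, rank])
```

Each run of the script would call something like append_score(Path("tetra_league.csv"), rating, rank) once, and a crontab entry such as 0 12 * * * python3 /path/to/log_score.py (hypothetical script path) would run it every day at noon. Appending instead of overwriting is what turns single snapshots into a history that can be plotted.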

Inspecting the HTML of a TETR.IO profile page shows that the score is normally displayed in the following div. Because it is currently the off-season, however, there is no score to display:

<div class="card categorical" id="usercard_league" style="--color: #375433">
	<h1><img src="/res/league.svg">TETRA LEAGUE</h1>
	<h6 id="user_league_np">Off-season</h6>
	<div id="user_leagueset">
		<div id="user_leaguestateset"></div>
		<div id="user_leaguestandingset" class="standingset">abc</div>
	</div>
	⋯
</div>
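Instead of matching on the raw markup, the elements in this card can also be looked up by their id. As a sketch using only Python’s standard-library html.parser (my own choice of tool), the hypothetical helper text_by_id below returns the text inside the first element with a given id, such as user_league_np. It assumes HTML that already contains the content, like the markup shown above:

```python
from __future__ import annotations

from html.parser import HTMLParser

# Void elements (e.g. the <img> in the card header) never get an end tag,
# so they must not change the nesting depth we track.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class TextById(HTMLParser):
    """Collect the text inside the first element whose id matches."""

    def __init__(self, target_id: str) -> None:
        super().__init__()
        self.target_id = target_id
        self.found = False  # the target element has been opened
        self.depth = 0      # > 0 while we are inside the target element
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        if self.depth > 0:
            self.depth += 1  # nested element inside the target
        elif not self.found and dict(attrs).get("id") == self.target_id:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS or self.depth == 0:
            return
        self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.parts.append(data)


def text_by_id(html: str, element_id: str) -> str | None:
    """Return the stripped text of the element with this id, or None if absent."""
    parser = TextById(element_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip() if parser.found else None
```

Tracking the nesting depth means that text_by_id(html, "usercard_league") returns the text of the whole card, while an id that does not occur yields None rather than an empty string, which keeps “element missing” distinguishable from “element empty”.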

Initially, I planned to simply scrape the contents of the user_league_np heading using:

curl -s "https://ch.tetr.io/u/geuze" | grep -oP '(?<=<h6 id="user_league_np">).*?(?=</h6>)'
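For reference, the curl and grep one-liner translates to Python roughly as follows. This is a sketch with hypothetical function names: fetch_html downloads the page with urllib.request, like curl -s, without executing any JavaScript, and extract_league_status applies the same pattern as grep -oP (a lookbehind for the opening tag, a lazy match for the contents, and a lookahead for the closing tag), returning only the first match:

```python
import re
import urllib.request
from typing import Optional

# Same pattern as the grep -oP one-liner. As with grep, "." does not match
# newlines, so the heading text must sit on a single line.
LEAGUE_STATUS = re.compile(r'(?<=<h6 id="user_league_np">).*?(?=</h6>)')


def extract_league_status(html: str) -> Optional[str]:
    """Return the text of the first <h6 id="user_league_np"> element, or None."""
    match = LEAGUE_STATUS.search(html)
    return match.group(0) if match else None


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download the raw HTML the server sends; no JavaScript is executed."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Usage would be print(extract_league_status(fetch_html("https://ch.tetr.io/u/geuze"))). Because neither this version nor the shell one runs the page’s scripts, both only see what the server sends in its initial HTML.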

A simple curl and grep of the web page won’t work, however, as the information seems to be loaded with JavaScript, which would mean it is not yet present in the HTML that curl downloads. Before the new season begins, I’ll look into scraping JavaScript-rendered web pages with Python.