Instagram page: the puzzle “Los Pichor Pathir”, for example, matches the Schiphol Airport railway station when unscrambled. As I barely know any Dutch railway station names, I was curious if something similar was feasible for Belgian railway stations using Flemish dialect words.
We first retrieve two wordlists: a list of Belgian railway stations, and a wordlist of Flemish dialect words. Afterwards, we use these wordlists to figure out if there are any single- or multi-word anagrams of one wordlist inside the other.
There’s a complete list of Belgian railway stations available on GitHub, maintained by iRail. We can curl the stations.csv file from that repository.
curl --output railway_stations.csv \
https://raw.githubusercontent.com/iRail/stations/refs/heads/master/stations.csv
This list of railway stations also contains lots of non-Belgian (often Dutch, German, and French) stations, which I assume are included as they are reachable by train from Belgium. We can filter out these non-Belgian stations using the country-code column.
$ xan filter 'col("country-code") ne "be"' railway_stations.csv | \
xan select 'name,country-code' | \
xan view -l 5
┌───┬─────────────────────┬──────────────┐
│ - │ name │ country-code │
├───┼─────────────────────┼──────────────┤
│ 0 │ 's Hertogenbosch │ nl │
│ 1 │ Aachen Hbf │ de │
│ 2 │ Agde │ fr │
│ 3 │ Aime-la-Plagne │ fr │
│ 4 │ Aix-en-Provence TGV │ fr │
│ … │ … │ … │
└───┴─────────────────────┴──────────────┘
Belgian railway stations sometimes have both a Dutch and French name. These names are mostly similar (like “Brussel-Centraal” in Dutch and “Bruxelles-Central” in French), but sometimes completely different (like “Diesdelle” and “Vivier d’Oie”). For railway stations in Brussels, these variants are separated by a slash in the name column of our CSV.
$ xan filter "'/' in col('name')" railway_stations.csv | \
xan select "name" | \
xan view -l 5
┌───┬────────────────────────────────────┐
│ - │ name │
├───┼────────────────────────────────────┤
│ 0 │ Arcaden/Arcades │
│ 1 │ Boondaal/Boondael │
│ 2 │ Bosvoorde/Boitsfort │
│ 3 │ Brussel-Centraal/Bruxelles-Central │
│ 4 │ Brussel-Congres/Bruxelles-Congrès │
│ … │ … │
└───┴────────────────────────────────────┘
We filter out non-Belgian railway stations, select the names of the remaining stations, split these names on forward slashes so that we get both the Dutch and French name on separate lines where appropriate, and write these lines to a separate text file.
xan filter 'col("country-code") eq "be"' railway_stations.csv | \
xan select name | \
tr '/' '\n' > railway_stations.txt
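As a rough sketch, the same filter-and-split step could also be written in Python with the standard csv module. The three sample rows below are made up to mimic the name and country-code columns; the real stations.csv has more columns and many more rows.

```python
import csv
import io

# Hypothetical sample mimicking two columns of stations.csv.
SAMPLE_CSV = """name,country-code
Aachen Hbf,de
Brussel-Centraal/Bruxelles-Central,be
Ternat,be
"""

def belgian_station_names(csv_file):
    """Yield every name variant of every Belgian station in csv_file."""
    for row in csv.DictReader(csv_file):
        if row["country-code"] == "be":
            # Bilingual stations hold both names, separated by a slash.
            yield from row["name"].split("/")

names = list(belgian_station_names(io.StringIO(SAMPLE_CSV)))
print(names)  # ['Brussel-Centraal', 'Bruxelles-Central', 'Ternat']
```

Because csv.DictReader consumes the header row, no stray “name” line ends up among the station names.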
As far as I can tell, the “Flemish dictionary” does not provide a downloadable wordlist. They do, however, have a page per letter of the alphabet containing all words starting with that letter. Luckily, these letter pages follow the same structure and contain all words on a single page, so by looping through only 26 pages (one for each letter of the alphabet), we can extract all listed words using grep and sed.
for letter in {a..z}; do
  curl -s "https://www.vlaamswoordenboek.be/definities/begintmet/$letter" | \
    grep -o '<a href="/definities/term/[^"]*">[^<]*</a> <br />' | \
    sed 's/<a href="[^"]*">\([^<]*\)<\/a> <br \/>/\1/' >> flemish_wordlist.txt
done
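To make the extraction step easier to follow, here is a minimal Python sketch of what the grep and sed pair does: match the term links and keep only the link text. The HTML snippet is a hypothetical stand-in shaped like the markup the grep pattern targets, not a copy of the real page.

```python
import re

# Hypothetical snippet; the real letter pages are much longer.
SAMPLE_HTML = (
    '<a href="/definities/term/ambetant">ambetant</a> <br />\n'
    '<a href="/definities/term/goesting">goesting</a> <br />\n'
    '<a href="/over">Over ons</a>\n'
)

# Same shape as the grep pattern, with a capture group for the link text.
TERM_LINK = re.compile(r'<a href="/definities/term/[^"]*">([^<]*)</a> <br />')

def extract_terms(html):
    """Return the link text of every term link found in html."""
    return TERM_LINK.findall(html)

terms = extract_terms(SAMPLE_HTML)
print(terms)  # ['ambetant', 'goesting']
```

Links that do not point at a term page, like the last one in the sample, are ignored.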
We first look at how to find single-word anagrams—where single Flemish dialect words are anagrams of certain railway stations—using “alphabetic maps”. We pair sets of railway stations with their anagram set, for example:
{'Ternat'} = {'ratten'}
{'Drongen'} = {'gronden'}
{'Aalst'} = {'slaat', 'staal'}
Afterwards, we look at possible ways of finding multi-word anagrams—where multiple words combine to make up an anagram. Some examples, of which "houten afritser" (“wooden playground slide”) is my favourite:
{'Athus-Frontiere'} = {'houten'} + {'afritser'}
{'Mechelen-Nekkerspoel'} = {'schoempelen'} + {'krekelen'}
{'Dave-Saint-Martin'} = {'mandataris'} + {'in vet'}
{'Mortsel-Deurnesteenweg'} = {'lutter'} + {'messe'} + {'onderwegen'}
When we alphabetically order the characters within each word, two distinct words result in the same ordering if and only if they are anagrams of each other. In Python, for example:
>>> sorted("undefinability") == sorted("unidentifiably")
True
Thus, to find anagrams of Belgian railway stations in Flemish dialect words, we build two alphabetic maps: dictionaries that map ordered characters to a set of “intra-wordlist anagrams”.
from collections import defaultdict

def get_key(word):
    return "".join(sorted(filter(str.isalpha, word.lower())))

def group(words):
    groups = defaultdict(set)
    for word in words:
        groups[get_key(word)].add(word)
    return groups

with open('railway_stations.txt', 'r') as f:
    stations = set(line.strip() for line in f)
grouped_stations = group(stations)

with open('flemish_wordlist.txt', 'r') as f:
    wordlist = set(line.strip() for line in f)
grouped_wordlist = group(wordlist)
To obtain the keys of these maps, we turn all characters of the words lowercase and filter out non-alphabetic characters before sorting. We loop through the set of words from each wordlist and build two separate alphabetic maps. Using these alphabetic maps, we can look for anagrams within the wordlists themselves. It turns out, for example, that only two pairs of stations are anagrams of each other; one of them, Mollem and Lommel, also shows up in the results further down.
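A minimal sketch of how such an “intra-wordlist” check could look: build the alphabetic map and keep only the groups with more than one member. The helper name internal_anagrams and the four-station sample are illustrative; get_key and group are repeated so the snippet runs on its own.

```python
from collections import defaultdict

def get_key(word):
    return "".join(sorted(filter(str.isalpha, word.lower())))

def group(words):
    groups = defaultdict(set)
    for word in words:
        groups[get_key(word)].add(word)
    return groups

def internal_anagrams(words):
    """Return the anagram sets that contain more than one word."""
    return [members for members in group(words).values() if len(members) > 1]

sample = ["Mollem", "Lommel", "Ternat", "Aalst"]
pairs = internal_anagrams(sample)
print(pairs)
```

On this sample, only Mollem and Lommel end up in the same group. Hyphens and capitals do not matter, since get_key drops non-alphabetic characters and lowercases everything first.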
To find the “inter-wordlist anagrams”—which is what we were originally interested in—we look at the intersection of the keys between alphabetic maps. Every common key corresponds to pairs of anagrams:
for key in (grouped_stations.keys() & grouped_wordlist.keys()):
    grouped_stations[key], grouped_wordlist[key]
This results in the following set of single-word anagrams. (As there’s a small overlap between our wordlists, we also get a couple of trivial anagrams in return.)
({'Jette'}, {'tetje'})
({'Temse'}, {'smete'})
({'Lierde'}, {'leider'})
({'Leman'}, {'lamen'})
({'Ternat'}, {'ratten'})
({'Kijkuit'}, {'kijkuit'})
({'Boom'}, {'boom'})
({'Godinne'}, {'doening'})
({'Bleret'}, {'bretel'})
({'Muizen'}, {'muizen'})
({'Asse'}, {'asse'})
({'Landen'}, {'landen'})
({'Weerde'}, {'weerde'})
({'Mol'}, {'mol'})
({'Geel'}, {'leeg', 'Geel'})
({'Sint-Niklaas'}, {'Sint-Niklaas'})
({'Drongen'}, {'gronden'})
({'Welle'}, {'welle'})
({'Ruisbroek'}, {'ruisbroek'})
({'Zele'}, {'zeel'})
({'Meiser'}, {'misere', 'remise', 'eremis'})
({'Spa'}, {'SAP', 'pas'})
({'Menen'}, {'menne', 'nemen'})
({'Lens'}, {'snel'})
({'Tielt'}, {'titel'})
({'Eupen'}, {'peune'})
({'Engis'}, {'signe'})
({'Eichem'}, {'chemie'})
({'Ekeren'}, {'ne keer', 'nekeer'})
({'Niel'}, {'lein'})
({'Manage'}, {'gemaan'})
({'Ronet'}, {'tenor'})
({'Lot'}, {'lot', 'tol'})
({'Essen'}, {'se-n-se', 'sense'})
({'Lobbes'}, {'belbos'})
({'Merode'}, {'moedre'})
({'Blankenberge'}, {'Blankenberge'})
({'Luttre'}, {'lutter'})
({'Lede'}, {'Deel', 'elde', 'leed'})
({'Wavre'}, {'verwa'})
({'Eine'}, {'eine'})
({'Tienen'}, {'ineten', 'tienen'})
({'Aalst'}, {'slaat', 'staal'})
({'Coo'}, {'C.O.O.'})
({'Mollem', 'Lommel'}, {'mollem'})
({'Simonis'}, {'simonis'})
({'Puurs'}, {'Pruus'})
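The trivial matches could be filtered out with a small extra step. The sketch below is not part of the original script: it assumes a match is “trivial” when a dialect word has exactly the same letters in the same order as one of the stations, ignoring case and punctuation. The helper names are made up, and the three sample pairs are taken from the results above.

```python
def only_letters(word):
    """Lowercase the word and drop everything that is not a letter."""
    return "".join(filter(str.isalpha, word.lower()))

def drop_trivial(stations, words):
    """Keep only the words that are not just a respelling of a station."""
    station_spellings = {only_letters(station) for station in stations}
    return {word for word in words if only_letters(word) not in station_spellings}

matches = [
    ({"Boom"}, {"boom"}),
    ({"Geel"}, {"leeg", "Geel"}),
    ({"Aalst"}, {"slaat", "staal"}),
]

filtered = [(stations, drop_trivial(stations, words)) for stations, words in matches]
nontrivial = [(stations, words) for stations, words in filtered if words]
print(nontrivial)
```

Here Boom disappears entirely, Geel keeps only “leeg”, and Aalst keeps both of its anagrams.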
Above, we only look at one-to-one mappings where a shuffling of one Flemish dialect word gives us a Belgian railway station name. What if we want to have a combination of two (or more) words that, when shuffled together, gives us a station name?
A brute-force approach to multi-word anagrams is straightforward: to find anagrams in one wordlist that are made up of $n$ words from the other wordlist, we take all unordered combinations (with replacement) of $n$ keys from one alphabetic map, combine them, and check whether the combined key is present in the other alphabetic map:
from itertools import combinations_with_replacement as cwr

def combine_keys(*keys):
    # Merge several sorted-letter keys into a single sorted-letter key.
    return "".join(sorted("".join(keys)))

def n_agrams(n):
    keys = list(grouped_wordlist)
    for key in cwr(keys, n):
        if (combined_key := combine_keys(*key)) in grouped_stations:
            stations = grouped_stations[combined_key]
            # Join the word sets with " + " to get the output format shown below.
            word_sets = " + ".join(str(grouped_wordlist[k]) for k in key)
            print(f"{stations} = {word_sets}")
For $n = 2$, we get 1078 different combinations:
{'Diesdelle'} = {'dees'} + {'dille'}
{'Kapelle-op-den-Bos'} = {'bedelke'} + {'salonpop'}
{'Kapelle-op-den-Bos'} = {'poekele', 'poeleke'} + {'plonsbad'}
{'Kapelle-op-den-Bos'} = {'sloppel'} + {'bakendoe'}
{'Kortemark'} = {'karot'} + {'merk'}
{'Kortemark'} = {'krak'} + {'om ter'}
{'Kortemark'} = {'kram'} + {'kroet', 'krote', 'roket'}
{'Kortemark'} = {'kroam', 'kraom'} + {'trek'}
{'Kortemark'} = {'rakker', 'kraker'} + {'mot'}
{'Kortemark'} = {'ram'} + {'kroket'}
{'Landskouter'} = {'dals'} + {'konteur'}
{'Landskouter'} = {'dokteur'} + {'lans'}
{'Landskouter'} = {'duks'} + {'lantoer'}
{'Landskouter'} = {'dutsen'} + {'akrol'}
{'Landskouter'} = {'kadul'} + {'storen', 'roste(n)', 'rotsen'}
...
For $n \geq 3$, however, things start to slow down drastically. This naive brute-force approach could be optimised further with some tricks to discard impossible candidates, but as $n$ increases this would be “mopping with the faucet running”, as we say in Dutch. For larger $n$, we could use a specialised data structure instead.
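One possible trick of that kind, as a minimal sketch for $n = 2$ and a single station: skip every wordlist key that needs a letter more often than the station provides, and look up the leftover letters directly instead of trying every pair. The function name two_word_anagrams is hypothetical, and the toy wordlist reuses a few words from the Kortemark results above.

```python
from collections import Counter

def get_key(word):
    return "".join(sorted(filter(str.isalpha, word.lower())))

def two_word_anagrams(station, words):
    """Return pairs of word sets that together are an anagram of station."""
    target = Counter(get_key(station))
    by_key = {}
    for word in words:
        by_key.setdefault(get_key(word), set()).add(word)
    found = []
    for key, first in by_key.items():
        letters = Counter(key)
        # Prune: this key needs a letter more often than the station has it.
        if not key or any(letters[c] > target[c] for c in letters):
            continue
        # Whatever the station has left over must be covered by the second word.
        rest = "".join(sorted((target - letters).elements()))
        # The key <= rest test reports each unordered pair only once.
        if key <= rest and rest in by_key:
            found.append((first, by_key[rest]))
    return found

toy_words = ["ram", "kroket", "karot", "merk", "mol"]
result = two_word_anagrams("Kortemark", toy_words)
print(result)  # [({'ram'}, {'kroket'}), ({'karot'}, {'merk'})]
```

This replaces the quadratic loop over key pairs with one pass over the keys per station, but it does not remove the blow-up for larger $n$.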
An anatree is a directed, edge-labelled tree that describes a set of words. Internal nodes of the anatree represent symbols of an alphabet, and the leaves represent subsets of the set of words. Edges are labelled with non-negative integers. A path from the root $n_1$ to a leaf $n_l$ passes through the internal nodes $n_1, \ldots, n_{l-1}$ along edges labelled $e_1, \ldots, e_{l-1}$, and the leaf $n_l$ holds the subset of words that contain the symbol $n_i$ exactly $e_i$ times, for every internal node $n_i$ on the path.
The anatree shown above is just one possible anatree for a certain set of words, as there is not just one anatree for a given set. You could have different strategies to pick which character a certain node represents, each resulting in a different anatree. To generate the figure used above, at every step of construction, the most frequently occurring but not yet considered character of the remaining set of words is used as the symbol for the next node. Creating the anatree for the Flemish wordlist resulted in a tree consisting of 172,210 nodes if we use the least frequent character first. If we use the most frequent character first, we get 127,534 nodes. Other strategies might lead to a further reduction of the size of the anatree.
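To make the construction more concrete, here is a minimal sketch of an anatree builder using the most-frequent-character-first strategy. It is an illustration, not the code used for the numbers above. It assumes that “most frequent” means the highest total number of occurrences among the remaining words, with ties broken alphabetically. Leaves are plain sets and internal nodes are (symbol, children) tuples.

```python
from collections import Counter

def get_key(word):
    return "".join(sorted(filter(str.isalpha, word.lower())))

def build(words, used=frozenset()):
    """Return a leaf (set of words) or an internal node (symbol, {count: subtree})."""
    remaining = Counter(c for w in words for c in get_key(w) if c not in used)
    if not remaining:
        return set(words)  # every letter is accounted for: this is a leaf
    # Assumed strategy: highest total count first, alphabetical on ties.
    symbol = min(remaining, key=lambda c: (-remaining[c], c))
    buckets = {}
    for w in words:
        buckets.setdefault(get_key(w).count(symbol), set()).add(w)
    children = {n: build(ws, used | {symbol}) for n, ws in buckets.items()}
    return (symbol, children)

def lookup(tree, word):
    """Return the stored anagrams of word, or an empty set."""
    key = get_key(word)
    while isinstance(tree, tuple):
        symbol, children = tree
        tree = children.get(key.count(symbol))
        if tree is None:
            return set()
    # The path only checks symbols it passes, so confirm the full key.
    return {w for w in tree if get_key(w) == key}

def count_nodes(tree):
    if isinstance(tree, set):
        return 1
    return 1 + sum(count_nodes(child) for child in tree[1].values())

tree = build(["staal", "slaat", "Aalst", "ratten", "Ternat"])
print(lookup(tree, "natter"))
print(count_nodes(tree))
```

Swapping the min for a max in build would give a least-frequent-first variant, which is one way to compare how the strategy affects the node count.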
One approach to find multi-word anagrams of one specific train station is starting with that station’s bag of letters and using it as a “budget” while traversing the anatree of dialect words. For a train station with two occurrences of the letter e, for example, we could explore paths that contain at most two e’s, subtracting “spent” e’s from our available budget. After reaching a leaf node on a partial budget, we could explore other paths with the remaining budget. If we are able to reach another leaf with that remaining budget, we have found a multi-word anagram.
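The budget idea can be sketched in a few lines of Python. This is an untested-at-scale illustration of the traversal described above, not an implementation whose performance has been measured. The tree layout, the function names and the max_words cap are all assumptions, and the toy wordlist again reuses words from the Kortemark results.

```python
from collections import Counter

def get_key(word):
    return "".join(sorted(filter(str.isalpha, word.lower())))

def build(words, used=frozenset()):
    """Leaf: set of words. Internal node: (symbol, {count: subtree})."""
    remaining = Counter(c for w in words for c in get_key(w) if c not in used)
    if not remaining:
        return set(words)
    symbol = min(remaining, key=lambda c: (-remaining[c], c))
    buckets = {}
    for w in words:
        buckets.setdefault(get_key(w).count(symbol), set()).add(w)
    return (symbol, {n: build(ws, used | {symbol}) for n, ws in buckets.items()})

def leaves_within(tree, budget):
    """Yield (leaf_words, leftover_budget) for each leaf affordable with budget."""
    if isinstance(tree, set):
        yield tree, budget
        return
    symbol, children = tree
    for count, child in children.items():
        if count <= budget[symbol]:  # only follow edges we can pay for
            leftover = budget.copy()
            leftover[symbol] -= count
            yield from leaves_within(child, leftover)

def multi_word_anagrams(tree, station, max_words):
    """Return combinations of at most max_words leaf sets that spend the budget exactly."""
    def search(budget, chosen):
        if not +budget:  # nothing left to spend: complete anagram
            yield chosen
        elif len(chosen) < max_words:
            for words, leftover in leaves_within(tree, budget):
                yield from search(leftover, chosen + [words])
    start = Counter(get_key(station))
    # Canonical tuples remove duplicates that differ only in word order.
    return sorted({tuple(sorted(tuple(sorted(ws)) for ws in combo))
                   for combo in search(start, [])})

tree = build(["ram", "kroket", "karot", "merk", "mot", "rakker", "kraker"])
result = multi_word_anagrams(tree, "Kortemark", max_words=2)
for combo in result:
    print(combo)
```

Every time a leaf is reached with budget left over, the search restarts from the root with that leftover budget. That restart is where the number of traversals can grow quickly as max_words increases.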
I did not implement this multi-word anagram algorithm in the end, as I’m not sure how well it would perform: you’d have to perform a lot of different anatree traversals to find multi-word anagrams, and there are a lot of different ways in which you could “spend” your budget. Exploring this idea already took a bit more time than expected, and I was happy with my $n = 2$ anagram list.
To display these anagrams (and eventually also the title of this post), I wanted to imitate the classic Belgian railway station name signs using CSS. Looking for the typeface, I landed on a 2017 thread on a Belgian train forum where a poster named Nadieeh found out that the Brussels-based graphic design studio Speculoos created Alfphabet, a digitised version of the typeface I was looking for. I ended up downloading the typeface from the osp.kitchen.
The FONTLOG.txt, which is included with a download of Alfphabet, also contains some typical Belgian history:
The Alfphabet family is based on the Belgian road signage called ‘Alphabet’ in French and ‘Alfabet’ in Flemish. It was introduced in 1945 by 3M system working for the Marshall plan after the end of the war. In 1975, it was replaced by the Swiss SNV fonts, but is still in used randomly by the Belgian railroad and Charleroi’s metro. In the early nineties, Pierre Huyghebaert was able to copy the original plates just before the split of the national office of the roads ‘Fond des Routes’ in three regional entities and the burial of the documents deep into regional archives.
For the background colour, I used a colour picker on a picture of an old railway station sign to end up with rgb(0, 33, 84), and with some trial and error managed to get a white rounded border to simulate old railway signage.
Going through the CSV of train stations with a friend, we were curious about the shortest name for a station in Belgium, which turns out to be Sy (part of Ferrières near the Ourthe river):
$ awk '{ print length, $0 }' railway_stations.txt | sort -n | cut -d" " -f2- | head -n 5
Sy
Ans
Ath
Aye
Coo
The Dutch Wikipedia page claims that Sy is the shortest name for a train station in the whole Benelux.
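For comparison, the same length-based sort is a one-liner in Python. This sketch sorts by name length and breaks ties alphabetically; the function name and the sample list are illustrative only.

```python
def shortest_names(names, limit=5):
    """Return the `limit` shortest names, ties broken alphabetically."""
    return sorted(names, key=lambda name: (len(name), name))[:limit]

sample = ["Ternat", "Sy", "Ath", "Brussel-Centraal", "Ans", "Coo", "Aye"]
shortest = shortest_names(sample)
print(shortest)  # ['Sy', 'Ans', 'Ath', 'Aye', 'Coo']
```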
I recently read Adam Aaronson’s "I Drank Every Cocktail", about how he managed to drink all “IBA official cocktails” over the course of a couple of years, and I loved his story.
Based on his journey, I wonder how long it would take me to visit all Belgian railway stations. Not just passing the station by train, but to actually visit and take a picture with one of the station name signs.
$ xan filter 'col("country-code") eq "be"' railway_stations.csv | \
xan count
578
Almost 600 stations spread over a couple of years is certainly not impossible, I guess. Who knows, maybe I’ll be able to post my own “I Visited Every Station” eventually!