Solving (Generalized) Ruzzle Boards With elzzur

RSS  •  Permalink  •  Created 29 May 2016  •  Written by Alberto Pettarin

Ruzzle is a word board game app which went viral a few years ago.

A 4-by-4 board of letters is shown to the player:

Blog Image 20160529-ruzzleboard.png

and she has two minutes to find as many valid words in the board as she can.

A word can be formed by connecting at least two adjacent letters (including above/below, left/right, or diagonally), and it is valid if it is present in the app dictionary for the language chosen by the user. For example, "THE" in the above board:

Blog Image 20160529-word.png

The total score is the sum of the scores of the words found, which in turn is the sum of the points of each letter in the word, potentially multiplied by two or three if the path forming the word includes letters with the DL/TL/DW/TW modifiers. Also, 5-letters words or longer are awarded bonus length points.

Note that the score of each letter differs from language to language, since it basically depends on the frequency of the letter in each language, as in other word board games like Scrabble. For example, a "C" is worth 4 points in English, but only 2 points in Italian, etc.

(For details on the Ruzzle rules, see the Scoring Rules Section.)

Kids Challenging The Wrong Uncle

Two of my nephews are quite addicted to this game app, which allows remote players to compete on the same board, the winner being the player achieving a score higher than the opponent.

While I like playing the game myself from time to time, especially with or against my nephews, I also see some serious problems in it. The worst one is that, given a board, it is extremely easy to achieve an astronomical score by cheating. In fact, I mentioned to my nephews (they are 7 and 10 years old) that it is rather straightforward to write a program that, in few seconds, enumerates all the words contained in the very board displayed by the app. They were skeptical, so I had to show them some magic!

Enters elzzur

I coded elzzur, a Python package that contains an executable script which in fact solves any given Ruzzle board.

(Yes, "elzzur" is "ruzzle" backwards...)

Unlike other solvers out there, elzzur solves generalized boards, that is, N-by-M boards (N, M >= 1), not just 4-by-4 boards.

Moreover, it does not just find the words, but it also computes their exact score, thanks to built-in letter scores for a handful of languages, taken from the actual Ruzzle app. The user can sort the results by score, word length, or begin/end cell. In particular, sorting by (decreasing) score ensures the highest degree of "cheatiness"...

Finally, elzzur has built-in dictionaries derived from the GNU aspell project. Although the built-in dictionaries are not the Ruzzle official ones, they are sufficiently extensive to prove my point. (I guess you can obtain the official Ruzzle dictionaries by decompiling the app, and maybe they are already out there, in the dark corners of the Internet...) Alternatively, the user can supply her own dictionary.

Using elzzur

OK, enough talk, let's play!

First, install it:

$ pip install elzzur

Installing via pip will make the elzzur script available in your shell:

$ elzzur --help

usage: [-h] [-l [LANGUAGE]] [-d [DICTIONARY]] [-b [BOARD]]
                   [-o [OUTPUT]] [-r [ROWS]] [-c [COLS]] [-s [SORT]] [-R] [-q]

elzzur solves a Ruzzle board

positional arguments:
  command               [cat|compile|demo|generate|languages|solve]

optional arguments:
  -h, --help            show this help message and exit
  -l [LANGUAGE], --language [LANGUAGE]
                        Language code (e.g., 'en')
  -d [DICTIONARY], --dictionary [DICTIONARY]
                        Path to the dictionary file
  -b [BOARD], --board [BOARD]
                        Path to the board file
  -o [OUTPUT], --output [OUTPUT]
                        Path to the output file
  -r [ROWS], --rows [ROWS]
                        The number of rows of the board to generate
  -c [COLS], --cols [COLS]
                        The number of columns of the board to generate
  -s [SORT], --sort [SORT]
                        Sort words by [score|length|start|end]
  -R, --reverse         Reverse the list of words
  -q, --quiet           Do not output board and statistics

(If you clone the GitHub repo or you use Windows, replace elzzur with python -m elzzur.)

For the impatient, elzzur comes with a "demo" mode, which shows a built-in 4-by-4 board, generated by the official Ruzzle app, and solves it. For example, for English (-l en), it prints:

$ elzzur demo -l en

Ttl R   S   Ndl 
Odw Htw E   I   
Cdw I   N   V   
Etl A   D   E   

COHESIVE    224    (2,0) (1,0) (1,1) (1,2) (0,2) (1,3) (2,3) (3,3)
HEROIC      154    (1,1) (1,2) (0,1) (1,0) (2,1) (2,0)
CHORES      154    (2,0) (1,1) (1,0) (0,1) (1,2) (0,2)
COHEN       149    (2,0) (1,0) (1,1) (1,2) (0,3)
ECHO        144    (3,0) (2,0) (1,1) (1,0)
ROCHE       137    (0,1) (1,0) (2,0) (1,1) (1,2)
OCHRE       137    (1,0) (2,0) (1,1) (0,1) (1,2)
OCHER       137    (1,0) (2,0) (1,1) (1,2) (0,1)
NACHO       137    (2,2) (3,1) (2,0) (1,1) (1,0)
CHORE       137    (2,0) (1,1) (1,0) (0,1) (1,2)
ACHIEVED    128    (3,1) (2,0) (1,1) (2,1) (1,2) (2,3) (3,3) (3,2)
ACHIEVE     111    (3,1) (2,0) (1,1) (2,1) (1,2) (2,3) (3,3)
OCH         108    (1,0) (2,0) (1,1)
IS          2      (1,3) (0,2)
IA          2      (2,1) (3,1)
ES          2      (1,2) (0,2)
ER          2      (1,2) (0,1)
AN          2      (3,1) (2,2)
AI          2      (3,1) (2,1)

Number of words:            281
Length of the longest word: 8
Maximum total score:        7376

As you can see from the output above, you also get the (0,0)-based coordinates of the cells of the board that generate each word. Clearly, for Italian (-l it) you will get a different board and a different list of words:

$ elzzur demo -l it

Ztl T   Edw Edl 
I   N   R   I   
Itw Adw M   U   
A   Ctl A   Z   

MACINARE     236    (2,2) (3,2) (3,1) (2,0) (1,1) (2,1) (1,2) (0,2)
CINERAMA     236    (3,1) (2,0) (1,1) (0,2) (1,2) (2,1) (2,2) (3,2)
CINZIA       226    (3,1) (2,0) (1,1) (0,0) (1,0) (2,1)
TENACIA      195    (0,1) (0,2) (1,1) (2,1) (3,1) (2,0) (3,0)
IRENICA      195    (1,3) (1,2) (0,2) (1,1) (2,0) (3,1) (2,1)
MACINE       190    (2,2) (2,1) (3,1) (2,0) (1,1) (0,2)
MIR          6      (2,2) (1,3) (1,2)
ITRI         6      (1,0) (0,1) (1,2) (1,3)
TI           3      (0,1) (1,0)

Number of words:            111
Length of the longest word: 9
Maximum total score:        5566

Run with the languages command to list the supported languages, that is, languages for which the letter score and frequency, a demo board, and a dictionary, are built in elzzur:

$ elzzur languages
Supported languages: de, en, es, fr, it, nl, pt

At this point, you might want to solve your own board. To do so, create a plain text file, say /tmp/my.board, containing something like the following four lines (details on the format are here):

B   Rdw T   E
O   U   Ntw G
R   Edw T   Atl
Etl Ddl E   H

and then run:

$ elzzur solve -l en -b /tmp/my.board

B   Rdw T   E   
O   U   Ntw G   
R   Edw T   Atl 
Etl Ddl E   H   

BURNED      166    (0,0) (1,1) (0,1) (1,2) (2,1) (3,1)
TURNED      130    (0,2) (1,1) (0,1) (1,2) (2,1) (3,1)
BURNER      130    (0,0) (1,1) (0,1) (1,2) (2,1) (2,0)
BRUNET      130    (0,0) (0,1) (1,1) (1,2) (2,1) (2,2)
DENATURE    116    (3,1) (2,1) (1,2) (2,3) (2,2) (1,1) (2,0) (3,0)
ORNATE      106    (1,0) (0,1) (1,2) (2,3) (2,2) (2,1)
URETHANE    104    (1,1) (2,0) (2,1) (2,2) (3,3) (2,3) (1,2) (0,3)
BORNE       101    (0,0) (1,0) (0,1) (1,2) (2,1)
TUT         4      (0,2) (1,1) (2,2)
OUT         4      (1,0) (1,1) (0,2)
UT          3      (1,1) (0,2)
TU          3      (0,2) (1,1)

Number of words:            238
Length of the longest word: 8
Maximum total score:        6762

In all the above examples the dictionary used is the built-in one. However, you can pass your own dictionary (details are here) to the solver, using the -d switch:

$ cat /tmp/my.dictionary 

$ elzzur solve -l en -b /tmp/my.board -d /tmp/my.dictionary

B   Rdw T   E   
O   U   Ntw G   
R   Edw T   Atl 
Etl Ddl E   H   

BURNED    166    (0,0) (1,1) (0,1) (1,2) (2,1) (3,1)
OUT       4      (1,0) (1,1) (0,2)

Number of words:            2
Length of the longest word: 6
Maximum total score:        170

If you plan to create a full dictionary with thousands of words, you might want to compile it to MARISA format, so that loading it into memory will be faster than loading it from a plain-text file:

$ elzzur compile -d /tmp/my.dictionary -o /tmp/my.dictionary.marisa
File '/tmp/my.dictionary.marisa' saved

$ elzzur solve -l en -b /tmp/my.board -d /tmp/my.dictionary.marisa

Finally, if you want to generate random boards, elzzur can do it for you, sampling letters according to the letter frequency of the selected language:

# generate a 4x4 English board
$ elzzur generate -l en
Utl Ldw D   L   
Utw Htl Stl Itl 
T   A   Ddw T   
Utl E   Hdw Mdw 

# generate another 4x4 English board
$ elzzur generate -l en
T   B   F   T   
F   T   P   Sdl 
Atw F   A   G   
T   Stw A   T   

# generate a 6x6 English board
$ elzzur generate -l en -r 6 -c 6
R   A   I   A   O   B   
W   A   T   W   A   F   
O   D   O   Wdw M   S   
H   Ttl I   T   T   Tdw 
Itl T   Htw A   F   Ctw 
W   T   Tdl E   J   A   

# generate a 6x10 English board, piping it to file
$ elzzur generate -l en -r 6 -c 10 > /tmp/my_6x10.board
$ cat /tmp/my_6x10.board
Otl B   Fdl H   B   Tdl E   Mdw Tdw Tdw 
A   Mdl D   Fdw Hdl P   P   Stw T   Itl 
D   S   H   W   M   N   Wtl Idw T   O   
Ctl Adl Wtl S   Idw Wdl Hdl Y   Gtl S   
T   M   Pdw Idl W   A   Wdw U   Wdw Edl 
Adl Rdw S   H   Atl T   A   T   T   Kdw 

And, as promised, the solver can take it!

$ elzzur solve -l en -b /tmp/my_6x10.board

Otl B   Fdl H   B   Tdl E   Mdw Tdw Tdw 
A   Mdl D   Fdw Hdl P   P   Stw T   Itl 
D   S   H   W   M   N   Wtl Idw T   O   
Ctl Adl Wtl S   Idw Wdl Hdl Y   Gtl S   
T   M   Pdw Idl W   A   Wdw U   Wdw Edl 
Adl Rdw S   H   Atl T   A   T   T   Kdw 

WHIST       365    (4,6) (3,6) (2,7) (1,7) (0,8)
HIPPEST     255    (3,6) (2,7) (1,6) (1,5) (0,6) (1,7) (0,8)
EGOTISM     231    (4,9) (3,8) (2,9) (1,8) (2,7) (1,7) (0,7)
WHIPS       221    (4,6) (3,6) (2,7) (1,6) (1,7)
TO          2      (1,8) (2,9)
SO          2      (3,9) (2,9)
OT          2      (2,9) (1,8)
OS          2      (2,9) (3,9)

Number of words:            514
Length of the longest word: 8
Maximum total score:        13695

Who Is MARISA? Or, A Few Words On The Solver

I really enjoyed coding elzzur, because it gave me an excuse to use one of my favourite data structures: a trie, also known as prefix tree. In particular, I used a MARISA trie, a very efficient C++ implementation of a recursive, bit-indexed trie, wrapped by the Python binding marisa-trie.

Given the dictionary of valid words, "solving" a Ruzzle board means returning the list of words on the board maximing the total score.

Since the total score is the sum of the scores of the words found, this optmization problem can be solved by:

  1. finding all the possible valid "snakes", that is, paths of adjacent cells forming words present in the dictionary; and
  2. computing the score of each snake: if two or more snakes form the same word, only one yielding the highest score must be kept, since the Ruzzle rules say that the player cannot form the same word twice.

While the second step is trivial, the first one requires some thinking.

Attempt 1: Brute Force Search Of The Board

A naive attempt is the following: enumerating all the possible snakes, for example by performing a depth-first search (DFS) or a breadth-first search (BFS) of the board, checking if the word corresponding to the each snake is contained in the dictionary. To check efficiently, the latter can be stored as an hash table in memory.

This naive approach takes a lot of time, since there are about 12 million possible snakes in a 4-by-4 board. To meet the 2-minutes constraint of the original app, each snake should be generated and checked against the dictionary in about 10 microseconds, which is not quite feasible in Python (or even C) with a 100k word dictionary. Even worse, the number of snakes increases exponentially with the size of the board: solving a (say) 10x10 board will take ages on a regular PC.

Attempt 2: A Smarter Search

A better approach consists in reversing the role of the dictionary and the board, since natural languages have less than a million words (including inflected forms), so looping through the dictionary, checking if the current word can be formed on the given board is faster. A further speedup stems from the observation that one can filter the dictionary, removing words that cannot be formed with the letters present on the board, for example all those containing "Z" if "Z" is not on the board, or all those containing three "E"s if the board contains only two "E"s, etc.

However, even with this optimization, the (Python) code I wrote to benchmark it took a few minutes to solve a 4-by-4 board, and not making the 2-minutes cut of the official Ruzzle app!

Attempt 3: Trie-based Dictionary Search

The two attempts described above highlight a key characteristic of the problem.

Suppose you are examining a "current" snake. First, you check if it is a valid word: if so, you store it in the list of valid snakes. Second, you need to extend it, that is, you need to generate new "candidate" snakes, to be examined later, by appending letters adjacent to the current end letter. But of course you can avoid exploring these extensions if you know that no word in the dictionary has a prefix equal to the current snake. An example will help understanding the process:

Blog Image 20160529-exploration.png

Suppose you have the snake "BURN". Since it is a valid English word, you will store it. Now, you examine the letters adjacent to the "N": except for the "R" and "U" letters, which are already part of the current snake and hence they cannot be used again, you will add "BURNT" (top), "BURNE" (top-right), "BURNG", "BURNA", "BURNT" (bottom), and "BURNE" (bottom-left) to the queue of snakes to be examined. In fact, at a later iteration:

  • both "BURNT" will be added to the the list of valid snakes, since "BURNT" is in the dictionary, and it will not be explored further, as no other word starts with prefix "BURNT";
  • "BURNE" is not a valid word, but the dictionary has words like "BURNED", "BURNER" or "BURNETT" which start with that prefix, hence it will be explored further: "BURNED" is indeed on the board and it will be found in a later iteration;
  • "BURNA" is not a valid word, but the dictionary contains "BURNABLE", so it will not be added but it will be explored: since none of "BURNAG", "BURNAH", "BURNAE", or "BURNAT" is a prefix of a dictionary word, the exploration will be pruned in the subsequent iteration;
  • "BURNG" is neither a valid word nor a prefix of a valid word, so it will not be added and it will not be explored.

This observation immediately suggests to store the dictionary in a data structure that answers the following two questions efficiently:

  • deos the data structure contain word w?
  • does the data structure contain words starting with prefix p?

Of course, such a data structure is a trie, aka prefix-tree aka radix-tree.

A trie is a tree with an empty root, while other nodes corresponds to one or more letters. Nodes that are children of the same node represent words that share the same prefix, which is precisely the concatenation of letters in the path from the root to the common parent node, but that then differ in the next letter. The following illustration makes the concept intuitive:

Blog Image 20160529-trie.png

The words stored in the trie depicted above are: a, an, elegant, element, elzzur, i, is, rumble, ruzzle, simple, solve, solver.

Since a dictionary might contain words that are prefixes of other words, like "solve"/"solver" in the example, a virtual character, $ in the example, is added at the end of a word, to indicate that a prefix ("solve") is also a full word ("solve$"). (Actual implementations use a bit field for each node, instead of adding these virtual leaves, but I find easier to understand the $-convention shown above.)

To determine whether a word is stored in the trie, a visit of the trie starts from the root, until a leaf node is reached (which means the word is present) or a failure occurs. When reaching an internal node, its children are scanned to see if one contains a prefix of the word: if not, the search fail; if yes, the search continues from the child. For this reason, siblings are usually kept sorted, to allow binary search. In the above trie, seaching for "elementary", "solver" and "solving" will visit the following nodes:

elementary: ROOT => el => (el)e => (ele)ment => NOT FOUND
solver: ROOT => s => (s)olve => (solve)r => (solver)$ => FOUND
solving: ROOT => s => NOT FOUND

Answering both the above questions ("has a given word?" and "has words with given prefix?") requires to traverse at most h nodes, where h is the height of the tree, each costing a binary search over the prefixes stored in its children. (For alphabets of limited cardinality, one can use an hash table or an array, trading space to achieve access in constant time.) Due to the statistical properties of natural languages, not all sequences of letters appear with the same frequency, and hence most of the times the search of the trie is terminated early, which in turn means that the exploration of the board is heavily pruned.

In elzzur, the solver is thus implemented as in the following Python-like pseudo-code:

func find_valid_snakes(board, dictionary):
  valid_snakes = []
  for row in board.rows:
    for col in board.cols:
      q = [board.letter(row, col)]
      while len(q) > 0:
        current_snake = q.pop()
        if dictionary.has_word(current_snake):
        if dictionary.had_words_with_prefix(current_snake):
          # generate new candidate snakes by appending letters
          # adjacent to current snake end letter
          for (cand_row, cand_col) in board.adjacent_letters(current_snake.end):
            # the extended snake cannot intersect itself
            if not (cand_row, cand_col) in current_snake:
              q.append(current_snake + board.letter(cand_row, cand_col))
  return valid_snakes

Here q is a queue simulating a BFS of the board grid, but the type of search does not really matter, a DFS would have worked equally well. What really matters is the fact that dictionary supports fast has_word() and has_words_with_prefix() operations.

The actual Python code for the above function is here.

Once all the valid snakes are found, it is just a matter of keeping the highest scoring one (if two or more yield the same word), and sorting them according to the method requested by the user: score, word length, or begin or end cell of the snake.

As promised, the code using a MARISA trie to store the dictionary is really fast, taking only a few hundred milliseconds to solve a 4-by-4 board:

$ time elzzur solve -l en -b /tmp/my_4x4.board
real    0m0.456s
user    0m0.420s
sys     0m0.032s

$ time elzzur solve -l en -b /tmp/my_10x10.board
real    0m2.393s
user    0m1.868s
sys     0m0.040s

$ time elzzur solve -l en -b /tmp/my_20x20.board
real    0m8.384s
user    0m5.416s
sys     0m0.096s

$ time elzzur solve -l en -b /tmp/my_50x50.board
real    0m40.112s
user    0m37.796s
sys     0m0.332s

$ time elzzur solve -l en -b /tmp/my_100x100.board
real    2m44.642s
user    2m35.208s
sys     0m1.328s

The sharp-eyed reader will notice that the run time is roughly linear in the number of cells of the board...

More Fun (aka Open Problems And Future Work)

On the GitHub README I listed a few extensions that are worth exploring.

If you fancy combinatorics, obtaining a closed formula for the number of possible snakes in a N-by-M grid seems to be an open problem, the main difficulty being the "geometric" constraints induced by the "non-self-intersecting rule".

Did you enjoy reading this post? Let me know by email or via Twitter (@acutebit).

(I am considering writing a series of posts similar to this one, about non-trivial algorithms and data structures, and their applications to practical and/or fun problems, so feedback is very appreciated.)

Enjoy playing with elzzur!