Penelope

Abstract

Penelope is a multi-tool for creating, editing, converting, and merging electronic dictionaries, especially for eReaders such as Kobo or Bookeen Cybook Odyssey devices.

I do not assume any legal liability or responsibility for any damage, data loss or inconvenience that you might cause to yourself or to other people by following the procedures below. RTFM, first.

Updates

IMPORTANT UPDATE (2015-02-22) This page is being discontinued and it will be kept online only for historical reasons. Please refer to the GitHub page instead.

IMPORTANT UPDATE (2014-06-30) I moved Penelope to GitHub, and released it under the MIT License, with the version code v2.0.0.

Features

With the current version (v. 2.0.1, 2015-01-25) of Penelope you can:

convert a dictionary FROM/TO the following formats:
- Bookeen Cybook Odyssey (R/W)
- Kobo (R index only, W unencrypted/unobfuscated only)
- StarDict (R/W)
- XML (R/W)
- CSV (R/W)
merge multiple dictionaries (of the same type) into a single dictionary
define your own parser for each word/definition
define your own collation function when outputting to Bookeen Cybook Odyssey format
generate an EPUB file containing the index of a given dictionary (e.g., to cope with the lack of a search function on your eReader)

Download

Please download the files from the GitHub repo.

You can either:

download the handy ZIP archive from the Releases tab (preferred option);
clone the repository using Git (git); or
download all the source files into the same directory, in raw format (not as HTML pages!).

You need Python, either version 2.x or 3.x, installed on your system to run Penelope.

You might need dictzip installed on your system to read from/write to StarDict dictionaries.

If you want to read from/write to Kobo format, you need a compiled version of MARISA. In that case, you must modify the values of the variables MARISA_BUILD_PATH and MARISA_REVERSE_LOOKUP_PATH in penelope.py (Python 2.x) or penelope3.py (Python 3.x), so that they point to the marisa-build and marisa-reverse-lookup executables, respectively (see the corresponding comments in the source code).
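Before attempting a Kobo conversion, it may help to verify that both paths are usable. The following is a minimal sketch and not part of Penelope: the two paths are placeholders for wherever you compiled MARISA, and the helper name check_executable is hypothetical.

```python
import os

# Placeholder paths (assumptions): replace them with the location of your own MARISA build.
MARISA_BUILD_PATH = "/usr/local/bin/marisa-build"
MARISA_REVERSE_LOOKUP_PATH = "/usr/local/bin/marisa-reverse-lookup"


def check_executable(path):
    """Return True if path is an existing regular file that the current user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


if __name__ == "__main__":
    for name, path in [
        ("MARISA_BUILD_PATH", MARISA_BUILD_PATH),
        ("MARISA_REVERSE_LOOKUP_PATH", MARISA_REVERSE_LOOKUP_PATH),
    ]:
        status = "ok" if check_executable(path) else "missing or not executable"
        print("%s = %s: %s" % (name, path, status))
```

If either line reports "missing or not executable", fix the path or the file permissions before editing the variables in penelope.py or penelope3.py.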

Usage

In a terminal, issue:

$ python penelope.py -h

to get the list of available options:

$ python penelope.py -p <prefix list> -f <language_from> -t <language_to> [OPTIONS]

Required arguments:
 -p <prefix list>       : list of the dictionaries to be merged/converted (without extension, comma separated)
 -f <language_from>     : ISO 639-1 code language_from of the dictionary to be converted
 -t <language_to>       : ISO 639-1 code language_to of the dictionary to be converted

Optional arguments:
 -d                     : enable debug mode and do not delete temporary files
 -h                     : print this usage message and exit
 -i                     : ignore word case while building the dictionary index
 -z                     : create the .install zip file containing the dictionary and the index
 --sd                   : input dictionary in StarDict format (default)
 --odyssey              : input dictionary in Bookeen Cybook Odyssey format
 --xml                  : input dictionary in XML format
 --kobo                 : input dictionary in Kobo format (reads the index only!)
 --csv                  : input dictionary in CSV format
 --output-odyssey       : output dictionary in Bookeen Cybook Odyssey format (default)
 --output-sd            : output dictionary in StarDict format
 --output-xml           : output dictionary in XML format
 --output-kobo          : output dictionary in Kobo format
 --output-csv           : output dictionary in CSV format
 --output-epub          : output EPUB file containing the index of the input dictionary
 --title <string>       : set the title string shown on the Odyssey screen to <string>
 --license <string>     : set the license string to <string>
 --copyright <string>   : set the copyright string to <string>
 --description <string> : set the description string to <string>
 --year <string>        : set the year string to <string>
 --parser <parser.py>   : use <parser.py> to parse the input dictionary
 --collation <coll.py>  : use <coll.py> as collation function when outputting in Bookeen Cybook Odyssey format
 --fs <string>          : use <string> as CSV field separator, escaping ASCII sequences (default: \t)
 --ls <string>          : use <string> as CSV line separator, escaping ASCII sequences (default: \n)

Examples:
$ python penelope.py -h
$ python penelope.py           -p foo -f en -t en
$ python penelope.py           -p bar -f en -t it
$ python penelope.py           -p "bar,foo,zam" -f en -t it
$ python penelope.py --xml     -p foo -f en -t en
$ python penelope.py --xml     -p foo -f en -t en --output-sd
$ python penelope.py           -p bar -f en -t it --output-kobo
$ python penelope.py           -p bar -f en -t it --output-xml -i
$ python penelope.py --kobo    -p bar -f it -t it --output-epub
$ python penelope.py --odyssey -p bar -f en -t en --output-epub
$ python penelope.py           -p bar -f en -t it --title "My EN->IT dictionary" --year 2012 --license "CC-BY-NC-SA 3.0"
$ python penelope.py           -p foo -f en -t en --parser foo_parser.py --title "Custom EN dictionary"
$ python penelope.py           -p foo -f en -t en --collation custom_collation.py
$ python penelope.py --xml     -p foo -f en -t en --output-csv --fs "\t\t" --ls "\n" 
$ python penelope.py --csv     -p foo -f en -t en --output-xml --fs "\t\t" --ls "\n"
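
If you want to drive Penelope from another Python script, the command lines above can be assembled programmatically. This is only a sketch, written for Python 3: the helper names build_penelope_command and run_penelope are hypothetical, and only options listed in the usage message are used.

```python
import subprocess
import sys


def build_penelope_command(prefixes, language_from, language_to,
                           extra_options=None, script="penelope.py"):
    """Build the argument list for one Penelope run (hypothetical helper)."""
    command = [
        sys.executable, script,
        "-p", ",".join(prefixes),  # prefixes: no extension, comma separated
        "-f", language_from,
        "-t", language_to,
    ]
    command.extend(extra_options or [])
    return command


def run_penelope(command):
    """Run Penelope and return its exit status; penelope.py must be reachable."""
    return subprocess.call(command)


if __name__ == "__main__":
    command = build_penelope_command(["bar", "foo", "zam"], "en", "it",
                                     ["--output-kobo"])
    print(" ".join(command))
```

Passing the arguments as a list avoids shell quoting problems with titles or separators that contain spaces or backslashes. With Python 3.x, pass script="penelope3.py".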

Notes

If you use Python 3.x, replace penelope.py with penelope3.py.
You must have the Python executable (or a directory containing it) listed in your PATH environment variable, or you need to supply its full path.
If you get an error about MARISA, check that you have compiled it correctly, and that your user has execute permission on the marisa-build and marisa-reverse-lookup executables.
Bear in mind that neither Bookeen nor Kobo publishes official specifications, hence the dictionaries produced by Penelope for Bookeen Cybook Odyssey and Kobo devices work only to the extent that the formats have been reverse-engineered, by others and by me. (See, for example, the following MobileRead forum threads: T1 T2 T3 T4)
I tried to comment every key point of my script and it should be easy to follow. I took this as a practical exercise to learn Python, so please forgive me if you find my code naive, and drop me an email with your advice to improve it, thanks!

Commented Examples

Example 1

$ python penelope.py -h

Print usage message and exit

Example 2

$ python penelope.py -p foo -f en -t en

Create English monolingual dictionary en.foo.dict and en.foo.dict.idx from StarDict files foo.*

Example 3

$ python penelope.py -p bar -f en -t it

Create English-to-Italian dictionary en-it.dict and en-it.dict.idx from StarDict files bar.*

Example 4

$ python penelope.py -p "bar,foo,zam" -f en -t it

Create English-to-Italian dictionary en-it.dict and en-it.dict.idx merging together StarDict dictionaries bar, foo, and zam
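
Conceptually, merging combines several word-to-definition mappings into one. The sketch below illustrates the idea only: it is not Penelope's code, the function name merge_dictionaries is hypothetical, and joining the definitions of a word that appears in more than one input is an assumption about how duplicates could be handled.

```python
def merge_dictionaries(dictionaries, separator="\n"):
    """Merge word -> definition mappings; definitions of shared words are joined."""
    merged = {}
    for dictionary in dictionaries:
        for word, definition in dictionary.items():
            if word in merged:
                # Assumption: keep both definitions, in input order.
                merged[word] = merged[word] + separator + definition
            else:
                merged[word] = definition
    return merged


if __name__ == "__main__":
    bar = {"cat": "gatto", "dog": "cane"}
    foo = {"cat": "micio"}
    zam = {"bird": "uccello"}
    merged = merge_dictionaries([bar, foo, zam])
    for word in sorted(merged):
        print(word, "->", merged[word].replace("\n", " | "))
```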

Example 5

$ python penelope.py --xml -p foo -f en -t en

Create English monolingual dictionary en.foo.dict and en.foo.dict.idx, but the input dictionary foo.xml is in XML format

Example 6

$ python penelope.py --xml -p foo -f en -t en --output-sd

As above, but output in StarDict format instead of Bookeen Cybook Odyssey format

Example 7

$ python penelope.py -p bar -f en -t it --output-kobo

Create English-to-Italian dictionary from StarDict files bar.*, but output in Kobo format, creating dicthtml-en-it.zip

Example 8

$ python penelope.py -p bar -f en -t it --output-xml -i

Reads from StarDict format and outputs in XML format, creating bar.xml, lowercasing all the keywords
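
The effect of ignoring word case can be pictured as follows. This is an illustrative sketch, not Penelope's implementation: the function name build_index is hypothetical, and grouping the definitions of keywords that become equal after lowercasing is an assumption.

```python
def build_index(entries, ignore_case=False):
    """Map each keyword to the list of its definitions, optionally lowercasing keywords."""
    index = {}
    for word, definition in entries:
        key = word.lower() if ignore_case else word
        index.setdefault(key, []).append(definition)
    return index


if __name__ == "__main__":
    entries = [("Polish", "of Poland"), ("polish", "to make shiny")]
    print(build_index(entries))                    # two distinct keywords
    print(build_index(entries, ignore_case=True))  # one lowercased keyword
```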

Example 9

$ python penelope.py --kobo -p bar -f it -t it --output-epub

Reads the index of a dictionary in Kobo format and outputs it as an EPUB file, bar.epub

Example 10

$ python penelope.py --odyssey -p bar -f en -t en --output-epub

As above, but input is in Bookeen Cybook Odyssey format

Example 11

$ python penelope.py -p bar -f en -t it --title "My EN-IT dictionary" --year 2012 --license "CC-BY-NC-SA 3.0"

Create English-to-Italian dictionary but also set title, year and license metadata

Example 12

$ python penelope.py -p foo -f en -t en --parser foo_parser.py --title "Custom EN dictionary"

Create English monolingual dictionary from StarDict files foo.*, setting its title and using foo_parser.py to parse the input dictionary definitions. A detailed description of custom parser/collation can be found in the old page.

Example 13

$ python penelope.py -p foo -f en -t en --collation custom_collation.py

Create English monolingual dictionary from StarDict files foo.*, using custom_collation.py to perform key collation. A detailed description of custom parser/collation can be found in the old page.
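
To give an idea of what a collation function does, here is a generic three-way comparison that ignores case and accents. It is a sketch only: the names fold and compare_keys are hypothetical, and the interface that Penelope actually expects from custom_collation.py is the one documented in the old page, which is not reproduced here.

```python
import functools
import unicodedata


def fold(key):
    """Lowercase a key and strip combining accents, so that an accented E sorts like 'e'."""
    decomposed = unicodedata.normalize("NFKD", key.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def compare_keys(key1, key2):
    """Three-way comparison: negative, zero, or positive, in the style of C's strcmp."""
    a, b = fold(key1), fold(key2)
    return (a > b) - (a < b)


if __name__ == "__main__":
    # "\u00c9t\u00e9" is the French word for "summer", written with escapes.
    words = ["zebra", "\u00c9t\u00e9", "apple", "Eta"]
    print(sorted(words, key=functools.cmp_to_key(compare_keys)))
```

A comparison of this kind decides the order in which keys appear in the output index, which matters when the device looks words up by position.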

Example 14

$ python penelope.py --xml -p foo -f en -t en --output-csv --fs "\t\t" --ls "\n"

Create CSV English dictionary foo.csv from XML dictionary foo.xml, using a double tab as field separator and a newline as line separator

Example 15

$ python penelope.py --csv -p foo -f en -t en --output-xml --fs "\t\t" --ls "\n"

Create XML English dictionary foo.xml from CSV dictionary foo.csv, using a double tab as field separator and a newline as line separator
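
Note that the shell passes "\t\t" and "\n" to Penelope as literal backslash sequences, which are then interpreted as a double tab and a newline. The sketch below shows one way such escaped separators can be decoded and applied. It is not Penelope's code: the function names are hypothetical, and the layout of one word and one definition per line is an assumption made for illustration.

```python
def decode_separator(raw):
    """Turn an escaped ASCII string (backslash-t, backslash-n, ...) into real characters."""
    # encode("ascii") rejects non-ASCII input instead of silently mangling it.
    return raw.encode("ascii").decode("unicode_escape")


def format_csv(entries, field_separator, line_separator):
    """Serialize (word, definition) pairs, ending every line with the line separator."""
    return "".join(word + field_separator + definition + line_separator
                   for word, definition in entries)


def parse_csv(text, field_separator, line_separator):
    """Split text back into (word, definition) pairs."""
    entries = []
    for line in text.split(line_separator):
        if not line:
            continue  # skip the empty piece after the final line separator
        word, definition = line.split(field_separator, 1)
        entries.append((word, definition))
    return entries


if __name__ == "__main__":
    fs = decode_separator("\\t\\t")  # what --fs "\t\t" delivers to the script
    ls = decode_separator("\\n")     # what --ls "\n" delivers to the script
    text = format_csv([("cat", "a small feline"), ("dog", "a canine")], fs, ls)
    print(repr(text))
    print(parse_csv(text, fs, ls))
```

A multi-character field separator such as a double tab is useful when the definitions themselves may contain single tabs.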