Penelope

Abstract

Penelope is a multi-tool for creating, editing, converting, and merging electronic dictionaries, especially for eReader devices such as the Kobo or the Bookeen Cybook Odyssey.

I do not assume any legal liability or responsibility for any damage, data loss or inconvenience that you might cause to yourself or to other people by following the procedures below. RTFM, first.

Updates

IMPORTANT UPDATE (2014-06-30) I moved Penelope to GitHub, and released it under the MIT License, with the version code v2.0.0.

Features

With the current version (v. 2.0.0, 2014-06-30) of Penelope you can:

Future versions will include:

Download

Please download the files from the GitHub repo.

You can either:

You need Python, either version 2.x or 3.x, installed on your system to run Penelope.

You might need dictzip installed on your system to read from/write to StarDict dictionaries.

If you want to read from/write to Kobo format, you need a compiled version of MARISA. In that case, you must modify the variables MARISA_BUILD_PATH and MARISA_REVERSE_LOOKUP_PATH in penelope.py (Python 2.x) or penelope3.py (Python 3.x) so that they point to the marisa-build and marisa-reverse-lookup executables (see the corresponding comments in the source code).
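Before running a conversion, it may help to verify that these external tools are reachable. The sketch below is not part of Penelope: the helper name check_tool and the two example paths are hypothetical placeholders, and shutil.which requires Python 3.3 or later. It only tests whether dictzip is on the PATH and whether the two MARISA paths point to executable files.

```python
import os
import shutil

# Hypothetical locations: replace with the paths of your own MARISA build.
MARISA_BUILD_PATH = "/usr/local/bin/marisa-build"
MARISA_REVERSE_LOOKUP_PATH = "/usr/local/bin/marisa-reverse-lookup"


def check_tool(path_or_name):
    """Return True if the argument is an executable file or a command on PATH."""
    if os.path.isfile(path_or_name) and os.access(path_or_name, os.X_OK):
        return True
    # shutil.which returns None when the command cannot be found.
    return shutil.which(path_or_name) is not None


if __name__ == "__main__":
    for label, tool in [
        ("dictzip (StarDict)", "dictzip"),
        ("marisa-build (Kobo)", MARISA_BUILD_PATH),
        ("marisa-reverse-lookup (Kobo)", MARISA_REVERSE_LOOKUP_PATH),
    ]:
        status = "found" if check_tool(tool) else "MISSING"
        print("%-30s %s" % (label, status))
```

A MISSING line for a MARISA tool only matters if you plan to use the Kobo format; likewise, dictzip only matters for StarDict dictionaries.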

Usage

In a terminal, issue:

$ python penelope.py -h

to get the list of available options:

$ python penelope.py -p <prefix list> -f <language_from> -t <language_to> [OPTIONS]

Required arguments:
 -p <prefix list>       : list of the dictionaries to be merged/converted (without extension, comma separated)
 -f <language_from>     : ISO 639-1 code language_from of the dictionary to be converted
 -t <language_to>       : ISO 639-1 code language_to of the dictionary to be converted

Optional arguments:
 -d                     : enable debug mode and do not delete temporary files
 -h                     : print this usage message and exit
 -i                     : ignore word case while building the dictionary index
 -z                     : create the .install zip file containing the dictionary and the index
 --sd                   : input dictionary in StarDict format (default)
 --odyssey              : input dictionary in Bookeen Cybook Odyssey format
 --xml                  : input dictionary in XML format
 --kobo                 : input dictionary in Kobo format (reads the index only!)
 --csv                  : input dictionary in CSV format
 --output-odyssey       : output dictionary in Bookeen Cybook Odyssey format (default)
 --output-sd            : output dictionary in StarDict format
 --output-xml           : output dictionary in XML format
 --output-kobo          : output dictionary in Kobo format
 --output-csv           : output dictionary in CSV format
 --output-epub          : output EPUB file containing the index of the input dictionary
 --title <string>       : set the title string shown on the Odyssey screen to <string>
 --license <string>     : set the license string to <string>
 --copyright <string>   : set the copyright string to <string>
 --description <string> : set the description string to <string>
 --year <string>        : set the year string to <string>
 --parser <parser.py>   : use <parser.py> to parse the input dictionary
 --collation <coll.py>  : use <coll.py> as collation function when outputting in Bookeen Cybook Odyssey format
 --fs <string>          : use <string> as CSV field separator, escaping ASCII sequences (default: \t)
 --ls <string>          : use <string> as CSV line separator, escaping ASCII sequences (default: \n)

Examples:
$ python penelope.py -h
$ python penelope.py           -p foo -f en -t en
$ python penelope.py           -p bar -f en -t it
$ python penelope.py           -p "bar,foo,zam" -f en -t it
$ python penelope.py --xml     -p foo -f en -t en
$ python penelope.py --xml     -p foo -f en -t en --output-sd
$ python penelope.py           -p bar -f en -t it --output-kobo
$ python penelope.py           -p bar -f en -t it --output-xml -i
$ python penelope.py --kobo    -p bar -f it -t it --output-epub
$ python penelope.py --odyssey -p bar -f en -t en --output-epub
$ python penelope.py           -p bar -f en -t it --title "My EN->IT dictionary" --year 2012 --license "CC-BY-NC-SA 3.0"
$ python penelope.py           -p foo -f en -t en --parser foo_parser.py --title "Custom EN dictionary"
$ python penelope.py           -p foo -f en -t en --collation custom_collation.py
$ python penelope.py --xml     -p foo -f en -t en --output-csv --fs "\t\t" --ls "\n" 
$ python penelope.py --csv     -p foo -f en -t en --output-xml --fs "\t\t" --ls "\n" 
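If you call Penelope from another Python script, the command line can be assembled as an argument list. The following is a minimal sketch, not part of Penelope: the function build_penelope_command and its parameter names are hypothetical, and it only combines flags that appear in the usage message above. It builds the list without executing anything.

```python
def build_penelope_command(prefixes, lang_from, lang_to,
                           input_format=None, output_format=None,
                           script="penelope.py", extra=None):
    """Return the argument list for a Penelope run (nothing is executed)."""
    command = ["python", script]
    if input_format:                      # e.g. "xml" -> --xml
        command.append("--" + input_format)
    # The prefix list is comma separated, without file extensions.
    command += ["-p", ",".join(prefixes), "-f", lang_from, "-t", lang_to]
    if output_format:                     # e.g. "kobo" -> --output-kobo
        command.append("--output-" + output_format)
    command += list(extra or [])          # e.g. ["--title", "My dictionary"]
    return command


if __name__ == "__main__":
    cmd = build_penelope_command(["bar", "foo", "zam"], "en", "it",
                                 output_format="kobo")
    print(" ".join(cmd))
    # To actually run it: import subprocess; subprocess.call(cmd)
```

Passing the list to subprocess.call, rather than a single string, avoids shell quoting problems with values that contain spaces, such as titles.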

Notes

Commented Examples

Example 1

$ python penelope.py -h 

Print usage message and exit

Example 2

$ python penelope.py -p foo -f en -t en 

Create English monolingual dictionary en.foo.dict and en.foo.dict.idx from StarDict files foo.*

Example 3

$ python penelope.py -p bar -f en -t it 

Create English-to-Italian dictionary en-it.dict and en-it.dict.idx from StarDict files bar.*

Example 4

$ python penelope.py -p "bar,foo,zam" -f en -t it 

Create English-to-Italian dictionary en-it.dict and en-it.dict.idx merging together StarDict dictionaries bar, foo, and zam

Example 5

$ python penelope.py --xml -p foo -f en -t en 

Create English monolingual dictionary en.foo.dict and en.foo.dict.idx, but the input dictionary foo.xml is in XML format

Example 6

$ python penelope.py --xml -p foo -f en -t en --output-sd 

As above, but output in StarDict format instead of Bookeen Cybook Odyssey format

Example 7

$ python penelope.py -p bar -f en -t it --output-kobo 

Create English-to-Italian dictionary from StarDict files bar.*, but output in Kobo format, creating dicthtml-en-it.zip

Example 8

$ python penelope.py -p bar -f en -t it --output-xml -i 

Reads from StarDict format and outputs in XML format, creating bar.xml, lowercasing all the keywords

Example 9

$ python penelope.py --kobo -p bar -f it -t it --output-epub 

Reads from Kobo format (index only) and outputs the dictionary index in EPUB format, creating bar.epub

Example 10

$ python penelope.py --odyssey -p bar -f en -t en --output-epub 

As above, but input is in Bookeen Cybook Odyssey format

Example 11

$ python penelope.py -p bar -f en -t it --title "My EN-IT dictionary" --year 2012 --license "CC-BY-NC-SA 3.0" 

Create English-to-Italian dictionary but also set title, year and license metadata

Example 12

$ python penelope.py -p foo -f en -t en --parser foo_parser.py --title "Custom EN dictionary" 

Create English monolingual dictionary from StarDict files foo.*, setting its title and using foo_parser.py to parse the input dictionary definitions. A detailed description of custom parsers and collation functions can be found in the old page.

Example 13

$ python penelope.py -p foo -f en -t en --collation custom_collation.py

As above, but use custom_collation.py to perform key collation. A detailed description of custom parsers and collation functions can be found in the old page.

Example 14

$ python penelope.py --xml -p foo -f en -t en --output-csv --fs "\t\t" --ls "\n"

Create CSV English dictionary foo.csv from XML dictionary foo.xml, using a double tab as field separator and a newline as line separator

Example 15

$ python penelope.py --csv -p foo -f en -t en --output-xml --fs "\t\t" --ls "\n"

Create XML English dictionary foo.xml from CSV dictionary foo.csv, using a double tab as field separator and a newline as line separator
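To illustrate how the --fs and --ls values relate to the contents of a CSV file, here is a minimal Python 3 sketch. It is not Penelope's actual parser: the function names are hypothetical, and it assumes that each line holds exactly two fields (keyword, then definition), which this page does not spell out. The separators are given as typed on the command line (a backslash followed by a letter) and converted to real control characters before splitting.

```python
def unescape(separator):
    """Turn a typed sequence such as '\\t\\t' into real control characters.

    Assumes the separator contains ASCII characters only.
    """
    return separator.encode("ascii").decode("unicode_escape")


def parse_csv(text, fs="\\t", ls="\\n"):
    """Split text into (keyword, definition) pairs using the given separators."""
    field_sep = unescape(fs)
    line_sep = unescape(ls)
    entries = []
    for line in text.split(line_sep):
        if not line:
            continue  # skip empty lines, e.g. after a trailing separator
        # Split on the first separator only, so definitions may contain it.
        keyword, _, definition = line.partition(field_sep)
        entries.append((keyword, definition))
    return entries


if __name__ == "__main__":
    sample = "cat\t\ta small domesticated feline\ndog\t\ta domesticated canine\n"
    for keyword, definition in parse_csv(sample, fs="\\t\\t", ls="\\n"):
        print(keyword, "=>", definition)
```

A multi-character separator such as a double tab is useful when the definitions themselves may contain single tabs.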

Support and Contribution

The current version runs under both Python 2 and Python 3, and it has been tested under Linux (Debian, Fedora) and Windows (XP, 7). Unfortunately, since I do not have any financial support for the project, I cannot offer support for all the possible values of the tuple (OS, Python version, console encoding). Therefore, only problems running Penelope in a Linux environment will receive full priority.
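When reporting a problem, it may help to state the values of that tuple for your machine. The snippet below is a small sketch using only the standard library; the function name environment_tuple is hypothetical and not part of Penelope.

```python
import platform
import sys


def environment_tuple():
    """Return (OS, Python version, console encoding) for the current session."""
    os_name = "%s %s" % (platform.system(), platform.release())
    python_version = platform.python_version()
    # sys.stdout.encoding can be None when output is redirected.
    console_encoding = getattr(sys.stdout, "encoding", None) or "unknown"
    return (os_name, python_version, console_encoding)


if __name__ == "__main__":
    print("OS: %s\nPython: %s\nConsole encoding: %s" % environment_tuple())
```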

If you want to contribute some code or you have suggestions, please let me know by sending an email containing the word "Penelope" in the subject. Thanks!

Acknowledgments

Many thanks to:

If you enjoyed reading this page or using my conversion script, you can send me a "thank-you" email.

If you really enjoyed this work and you feel really grateful to me for writing the conversion script, I would really love to receive a (reasonably recent) 9-inch e-reader or tablet for testing purposes.

If you really really enjoyed my work on this project and you think my brain can help you, I am always glad to hear about job collaborations!

In all three cases, contact me via email, thanks!

Links