News

Last update: 2017-05-27

I am about to move to Torino (Turin), where I will start working for Nuance Communications on 2017-06-01.

For the next two or three months my FLOSS projects will be put on hold. I will only fix urgent bugs and keep the ReadBeyond and aeneas Web App sites up and running.

Below you can find some projects I worked on recently.

FLOSS Work

I recently released aeneas v1.7.3. Except for bug fixes, I am not planning to work on it directly for the next few months. I will think about the next major version (v2.x) and the big changes that it will require.

I found a PDF online that I produced years ago, when I was studying at the University of Padova, and that I had on my academic page when I was a teaching assistant. I will save its interesting story for a blog post, but here I just want to mention that it is the famous piece by Italo Calvino about being honest in a country full of corrupt people: Apologo sull’onestà nel paese dei corrotti.

To celebrate 2017, I put online this guide to help non-tech-savvy Windows users to install Python and run a Python program in the Command Prompt.

Professor Tullio De Mauro recently passed away. He was one of the most influential Italian linguists: he compiled one of the most comprehensive dictionaries of modern Italian (with an accompanying abridged version) and, more importantly, he was one of the few sentinels who dared to speak out about the decline of literacy in Italy, mostly due to lack of funds and ignorance from politicians. Just in November 2016, Prof. De Mauro published a new version of the vocabulary of most common Italian terms (Nuovo vocabolario di base della lingua italiana, or NVdB for short), releasing the list of words as a PDF. As a thank-you for his work, and to remember it, I put on GitHub a Python script to extract the text data from the PDF file and clean it. The same repository also holds the processed/cleaned files, as UTF-8 encoded plain text.

Since I grew tired of having to use the cumbersome, JS/AJAX-ridden local Web interface of my WebCube4 (Huawei E8378) 4G/LTE router, I thought about automating the process with a shell script. So I captured some traffic with Wireshark and analyzed it, deciphering the relevant HTTP headers, API endpoints, and JavaScript code. Then, I emulated the relevant bits with curl and node, in a simple but elegant Bash script, now published as webcube4 on GitHub. (The process is a story worth telling on its own...)

A medium-term project I need to resume working on is yael, my Python library for reading/writing/modifying EPUB files. The reading part is essentially done, but the writing part is missing. I also need to rethink its architecture, since the current one is a bit disorganized and inefficient.

Finally, a few months ago I built a cadence meter for my spin bike with a Hall sensor attached to an Arduino board. The Arduino sends the current data (RPM, run time, etc.) to a PC via the USB cable, and some Python code reads the data, stores it, and shows it as a dynamically-updated dashboard in a browser. I also have a speech-recognition module that can be used to control the system via spoken commands. For example, I can change the parameters displayed on the dashboard or control the media player software (I love listening to podcasts/audiobooks while spinning, but I hate getting sweat on the remote control)... The project consists of a very heterogeneous stack (C code for Arduino, Python + ZMQ + Flask + PocketSphinx on PC), which is both interesting and challenging to set up correctly. I hope to find time to clean the code, write a good setup guide, and release it on GitHub.
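To give an idea of the PC side, here is a minimal sketch of how the Python code might parse the readings coming from the Arduino. It is not the actual project code: the line format ("rpm,run_time_ms", one reading per line) and the names Sample, parse_line, read_samples and average_rpm are assumptions made up for illustration. In a real setup the lines would come from the USB serial port; here they come from any iterable of strings, so the sketch runs with the standard library only.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional


@dataclass
class Sample:
    rpm: float         # cadence, in revolutions per minute
    run_time_s: float  # elapsed run time, in seconds


def parse_line(line: str) -> Optional[Sample]:
    """Parse one 'rpm,run_time_ms' line (hypothetical format).

    Returns None if the line is malformed, so that a corrupted
    reading does not stop the whole acquisition loop.
    """
    parts = line.strip().split(",")
    if len(parts) != 2:
        return None
    try:
        rpm = float(parts[0])
        run_time_ms = float(parts[1])
    except ValueError:
        return None
    return Sample(rpm=rpm, run_time_s=run_time_ms / 1000.0)


def read_samples(lines: Iterable[str]) -> Iterator[Sample]:
    """Yield the valid samples found in a stream of text lines."""
    for line in lines:
        sample = parse_line(line)
        if sample is not None:
            yield sample


def average_rpm(samples: List[Sample]) -> float:
    """Return the mean cadence, or 0.0 if there are no samples."""
    if not samples:
        return 0.0
    return sum(s.rpm for s in samples) / len(samples)


if __name__ == "__main__":
    # Simulated input; a real program would read from the serial port.
    fake_lines = ["80,1000\n", "not a reading\n", "90,2000\n"]
    collected = list(read_samples(fake_lines))
    print(collected)
    print(average_rpm(collected))
```

Skipping malformed lines instead of raising an exception is a deliberate choice in this sketch, since a serial link can easily deliver a truncated first line when the program starts reading mid-transmission.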

Personal Study

Mostly neural network stuff applied to speech recognition, and things called triangular global alignment kernels.

I am playing a bit with Python/Numpy out-of-memory/on-disk computation libraries, as I would like to add that capability to v2.x of aeneas.

In 2017 I should look at Rust in more depth, as it looks like an extremely promising language. I also need to better understand how LLVM works.