I spent a good two days trying to resolve a problem with a python based application I’ve been working on.
The app was running as a daemon process, sending items to database.
The database, table and data are all UTF8 encoded. All was fine until an email that actually utilized the benefits of UTF and contained other characters than the usual English ones – the email that was sent came up as either garbage or a series of question marks. From python – no luck. WTF I said (well, I didn’t actually speak out the acronym, but I did utter the expanded version a few times, making sure I repeat the F word for good measure).
My first route was to tackle the email encoding issues but after a day of sweating it and getting no results, It dawned on me that perhaps I’m bashing my head on the wrong wall. A quick interactive shell session confirmed that – the data coming out from the queries was returning as garbage whenever an extended character was supposed to show. Absolutely nothing would make it display properly. The only half-assed thing that worked is encoding the utf8 data as latin1 (and later in the email mime part to set the encoding to utf8. That just didn’t feel right. Any character that could not be represented in latin-1 would have been either ignore, replaced or thrown a Unicode error.
Many Googles later, a long session at #python on undernet (thanks tommi) and a pinch of perseverance, I had a working solution. The secret ingredient is to add a charset=”utf8″ to your connection parameters, and use_unicode=True. The former will tell mySql that it should work in the utf8 character set. The latter is to tell python its on utf8.
from DBUtils.PooledDB import PooledDB
pool = PooledDB( creator = MySQLdb, mincached = 5, db = DB_NAME, host = DB_HOST, user = DB_USER, passwd= DB_PASS,
charset = "utf8", use_unicode = True)
row = __cur.fetchone()