[Trac-tickets] Re: [The Trac Project] #2905: UnicodeDecodeError
The Trac Project
noreply at edgewall.com
Sat Apr 1 02:58:13 CST 2006
#2905: UnicodeDecodeError
---------------------------------------------+------------------------------
Reporter: anonymous | Owner: cboos
Type: defect | Status: new
Priority: high | Milestone: 0.10
Component: general | Version: devel
Severity: normal | Resolution:
Keywords: UnicodeDecodeError unicode utf8 |
---------------------------------------------+------------------------------
Changes (by cboos):
* status: reopened => new
* owner: cmlenz => cboos
Comment:
Well, Alec, I don't think so.
We should convert `unicode` strings to plain `str` with a specified
encoding only at clearly defined points, such as when sending text to
the browser (convert to utf8) or sending a generated mail (convert to
the configured encoding). We can't switch back and forth between
encoded strings and unicode, because an encoded string does not
remember which charset was used to encode it.
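A minimal sketch of why the charset matters, written for Python 3 as an
assumption (there, `str` and `bytes` play the roles that `unicode` and `str`
play in the Python 2 code discussed in this ticket). The variable names are
illustrative only:

```python
# Python 3 sketch: `str` is text, `bytes` is encoded data.
text = "été"

utf8_bytes = text.encode("utf-8")      # e.g. what would be sent to the browser
latin1_bytes = text.encode("latin-1")  # e.g. a configured mail encoding

# Bytes carry no charset label. Decoding with the wrong codec may fail...
try:
    latin1_bytes.decode("utf-8")
    wrong_decode_failed = False
except UnicodeDecodeError:
    wrong_decode_failed = True

# ...or silently produce garbage instead of the original text.
mojibake = utf8_bytes.decode("latin-1")
```

Either outcome is avoided by keeping text as unicode internally and encoding
once, at the output boundary.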
If you make an object's `__str__` return a UTF-8 encoded string,
the next time you call `unicode` on it, you'll most likely get
an exception:
{{{
>>> class txt(object):
...     def __str__(self):
...         return u'été'.encode('utf-8')
...
>>> str(txt())
'\xc3\xa9t\xc3\xa9'
>>> unicode(txt())
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
}}}
with `'ascii'` replaced by whatever your `sys.getdefaultencoding()` returns.
As for the patch above, I'll complement it with a fallback in
the style of `trac.util.to_utf8`, in case a `UnicodeError` exception
is still raised. This can happen if the wrong charset has been associated
with the file through the `svn:mime-type` property.
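One way such a fallback could look, as a hedged sketch only: this is not the
actual `trac.util.to_utf8` code nor the patch itself. The helper name
`decode_with_fallback`, its parameters, and the order of charsets tried are
assumptions, and it is written for Python 3 (`bytes` in, `str` out):

```python
def decode_with_fallback(data, declared_charset=None):
    """Hypothetical helper: decode bytes even if the declared charset is wrong."""
    candidates = []
    if declared_charset:
        candidates.append(declared_charset)  # e.g. taken from svn:mime-type
    candidates.append("utf-8")
    for charset in candidates:
        try:
            return data.decode(charset)
        except (UnicodeError, LookupError):
            # Wrong charset for this content, or an unknown charset name.
            continue
    # latin-1 maps every byte to a code point, so this last resort cannot fail.
    return data.decode("latin-1")
```

For example, `decode_with_fallback(u'été'.encode('latin-1'), 'utf-8')` would
fall through to the last resort and still return the original text, where a
plain `decode('utf-8')` would raise.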
--
Ticket URL: <http://projects.edgewall.com/trac/ticket/2905>
The Trac Project <http://trac.edgewall.com/>