[Trac-tickets] [The Trac Project] #2868: garbled unicode chars in inline diff view

The Trac Project noreply at edgewall.com
Mon Mar 13 15:29:12 CST 2006


#2868: garbled unicode chars in inline diff view
-----------------------------+----------------------------------------------
 Reporter:  Andrew Stromnov  |       Owner:  jonas
     Type:  defect           |      Status:  new  
 Priority:  normal           |   Milestone:       
Component:  changeset view   |     Version:  devel
 Severity:  minor            |    Keywords:  diff 
-----------------------------+----------------------------------------------
 '''garbled unicode chars in highlighted inline diff view'''

 __Example__: diff between '''str1="АШИПКА"''' and
 '''str2="ОШИБКА"'''

  1. '''str1''' and '''str2''' are passed to
 [source:/trunk/trac/versioncontrol/diff.py@2808#L145
 markup_intraline_changes] as raw byte strings (not unicode):
 {{{'\xd0\x90\xd0\xa8\xd0\x98\xd0\x9f\xd0\x9a\xd0\x90'}}} and
 {{{'\xd0\x9e\xd0\xa8\xd0\x98\xd0\x91\xd0\x9a\xd0\x90'}}} respectively.
  1. '''str1''' and '''str2''' are then passed to
 [source:/trunk/trac/versioncontrol/diff.py@2808#L25 _get_change_extent].
 On these raw strings the extent is calculated starting at {{{'\x9e'}}}
 (the second octet of the first UTF-8 character), because the first
 octet {{{'\xd0'}}} is common to both strings.
  1. Results after tag substitution:
 {{{'\xd0<del>\x90\xd0\xa8\xd0\x98\xd0\x9f</del>\xd0\x9a\xd0\x90'}}} and
 {{{'\xd0<add>\x9e\xd0\xa8\xd0\x98\xd0\x91</add>\xd0\x9a\xd0\x90'}}}. The
 first UTF-8 character of each string is split in two, so it is rendered
 as garbage.
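
 The steps above can be reproduced with a minimal sketch. It assumes
 Python 3, where {{{bytes}}} and {{{str}}} play the roles of the raw and
 unicode strings in this ticket. {{{change_extent}}} and
 {{{split_at_extent}}} are hypothetical helpers modelled on the behaviour
 described for {{{_get_change_extent}}}; they are not Trac's actual code.

```python
def change_extent(a, b):
    """Return (start, end): the offset at which a and b begin to differ and
    the negative offset at which they stop differing (0 = no common suffix)."""
    start = 0
    limit = min(len(a), len(b))
    # Skip the common prefix.
    while start < limit and a[start] == b[start]:
        start += 1
    end = -1
    limit -= start
    # Skip the common suffix, never overlapping the prefix.
    while -end <= limit and a[end] == b[end]:
        end -= 1
    return start, end + 1


def split_at_extent(s, start, end):
    """Split s into (unchanged head, changed middle, unchanged tail)."""
    stop = len(s) + end
    return s[:start], s[start:stop], s[stop:]


# The two example strings, as UTF-8 byte strings (assumed encoding).
old = "АШИПКА".encode("utf-8")
new = "ОШИБКА".encode("utf-8")

start, end = change_extent(old, new)
head, changed, tail = split_at_extent(old, start, end)

# The common prefix is the lone lead byte 0xd0, so the first character
# is cut between its two octets.
try:
    head.decode("utf-8")
    head_is_valid = True
except UnicodeDecodeError:
    head_is_valid = False

print(start, end)      # 1 -4
print(head, changed, tail)
print(head_is_valid)   # False
```

 The unchanged head is a single {{{'\xd0'}}} byte, which is not valid UTF-8
 on its own; this is the fragment that ends up outside the highlight tags.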

 Possible fix: use unicode strings for extent calculation.
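
 A minimal sketch of that idea, assuming Python 3 and UTF-8 input:
 decode, compute the extent on characters, insert the markers, then encode
 again. {{{change_extent}}} and {{{mark_changed_span}}} are hypothetical
 names used only for illustration; the {{{'\0'}}} and {{{'\1'}}} markers
 follow the convention visible in diff.py.

```python
def change_extent(a, b):
    """Return (start, end): the offset at which a and b begin to differ and
    the negative offset at which they stop differing (0 = no common suffix)."""
    start = 0
    limit = min(len(a), len(b))
    while start < limit and a[start] == b[start]:
        start += 1
    end = -1
    limit -= start
    while -end <= limit and a[end] == b[end]:
        end -= 1
    return start, end + 1


def mark_changed_span(line, other, encoding="utf-8"):
    """Wrap the part of `line` that differs from `other` in \\0 ... \\1.

    Both arguments are byte strings; the extent is computed on decoded
    text so that a marker can never land inside a multi-byte character.
    """
    text, other_text = line.decode(encoding), other.decode(encoding)
    start, end = change_extent(text, other_text)
    stop = len(text) + end
    marked = text[:start] + "\0" + text[start:stop] + "\1" + text[stop:]
    return marked.encode(encoding)


old = "АШИПКА".encode("utf-8")
new = "ОШИБКА".encode("utf-8")

marked_old = mark_changed_span(old, new)
marked_new = mark_changed_span(new, old)

# Both results decode cleanly; the markers sit on character boundaries.
print(repr(marked_old.decode("utf-8")))
print(repr(marked_new.decode("utf-8")))
```

 On decoded text the extent for the example is (0, -2), so the whole first
 character falls inside the marked span instead of being cut in half.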

 Quick (and dirty) hack:
 {{{
 --- diff.py.orig        Mon Mar 13 13:43:21 2006
 +++ diff.py     Mon Mar 13 15:26:11 2006
 @@ -148,6 +147,11 @@
              if tag == 'replace' and i2 - i1 == j2 - j1:
                  for i in range(i2 - i1):
                      fromline, toline = fromlines[i1 + i], tolines[j1 + i]
 +
 +                    fromline, toline = fromline.decode('utf8'), toline.decode('utf8')
 +
                      (start, end) = _get_change_extent(fromline, toline)

                      if start == 0 and end < 0:
 @@ -170,6 +174,12 @@
                          tolines[j1 + i] = toline[:start] + '\0' + \
                                            toline[start:end] + '\1' + \
                                            toline[end:]
 +
 +                    fromlines[i1 + i] = fromlines[i1 + i].encode('utf8')
 +                    tolines[j1 + i] = tolines[j1 + i].encode('utf8')
 +
              yield tag, i1, i2, j1, j2

      changes = []
 }}}

-- 
Ticket URL: <http://projects.edgewall.com/trac/ticket/2868>
The Trac Project <http://trac.edgewall.com/>

