5 Commits

Author SHA1 Message Date
Aaron Giles
f57b3da9a3 From: Justin Kerk
Subject: xml_normalize_string() bogusly escapes UTF-8

Various parts of MAME have recently been changed to support UTF-8
strings, so I thought I'd test out using a UTF-8 driver name for the
Sam Coupe driver in MESS, just to see if anything breaks. Most things
do seem to work well - the name is correctly drawn in the UI etc. One
thing that doesn't work properly is the output from -listxml: "Sam
Coupé" becomes "Sam Coupé" - each UTF-8 byte is
individually escaped, resulting in two gibberish characters instead of
the correct character.

The culprit here is xml_normalize_string() in src/lib/util/xmlfile.c -
the code converts any high-bit byte to an XML escape, which is totally
bogus for any encoding but ISO-8859-1 because XML escapes are defined
as Unicode codepoints regardless of the document encoding.
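
To make the failure concrete, here is a minimal sketch of per-byte escaping. It is not the code from xmlfile.c: escape_each_byte() is a hypothetical helper, and the decimal "&#N;" form is an assumption about the escape format. It only shows why escaping each byte of a multi-byte UTF-8 sequence yields one character reference per byte.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, NOT MAME's implementation: copies printable ASCII
   and writes every other byte as a decimal character reference.
   Returns 0 on success, -1 if dst is too small. */
static int escape_each_byte(const char *src, char *dst, size_t dstlen)
{
    size_t used = 0;

    if (dstlen == 0)
        return -1;
    dst[0] = '\0';

    for (; *src != '\0'; src++)
    {
        unsigned char c = (unsigned char)*src;
        int n;

        if (c >= 0x20 && c <= 0x7e)
            n = snprintf(dst + used, dstlen - used, "%c", c);
        else
            n = snprintf(dst + used, dstlen - used, "&#%u;", (unsigned)c);

        /* snprintf reports the length it wanted; bail out on truncation */
        if (n < 0 || (size_t)n >= dstlen - used)
        {
            dst[used] = '\0';
            return -1;
        }
        used += (size_t)n;
    }
    return 0;
}
```

Fed the UTF-8 bytes for "Sam Coupé" (the final character is 0xC3 0xA9), this produces "Sam Coup&#195;&#169;". An XML parser reads that as the two codepoints U+00C3 and U+00A9, whereas the single character U+00E9 would be written as "&#233;".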

Fortunately, this is very simple to fix - in fact, it is sufficient
just to remove the escaping code and pass through the UTF-8 bytes
directly, because UTF-8 is mandated as the default encoding in the XML
standard.[1] The attached patch does this.
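
The shape of that fix can be sketched roughly as follows. This is not the attached patch: normalize_passthrough() is a hypothetical name, and I am assuming the XML markup characters still get escaped while everything else, including high-bit UTF-8 bytes, is copied through unchanged.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch, NOT the actual patch: escape only the characters
   that are XML markup and copy all other bytes as-is.
   Returns 0 on success, -1 if dst is too small. */
static int normalize_passthrough(const char *src, char *dst, size_t dstlen)
{
    size_t used = 0;

    if (dstlen == 0)
        return -1;

    for (; *src != '\0'; src++)
    {
        const char *rep = NULL;
        size_t len = 1;

        switch (*src)
        {
            case '"': rep = "&quot;"; break;
            case '&': rep = "&amp;";  break;
            case '<': rep = "&lt;";   break;
            case '>': rep = "&gt;";   break;
            default:  break;   /* includes UTF-8 lead and continuation bytes */
        }
        if (rep != NULL)
            len = strlen(rep);

        /* keep one byte in reserve for the terminating NUL */
        if (used + len + 1 > dstlen)
            return -1;

        if (rep != NULL)
            memcpy(dst + used, rep, len);
        else
            dst[used] = *src;
        used += len;
    }
    dst[used] = '\0';
    return 0;
}
```

With this approach the bytes 0xC3 0xA9 reach the output untouched, so a parser that assumes UTF-8 decodes them as a single character.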

This should be a pretty safe change since as far as I can tell nothing
in MAME or MESS currently triggers this code (that is, the string "&#"
does not occur in the -listxml output of either). One potentially
negative effect is that the ASCII controls which are illegal in XML
(0x00-0x1F excepting line breaks and tabs) would no longer be escaped.
However, I can't imagine why you would want any in a string destined
for -listxml, so IMO that would be a problem elsewhere in the code and
having XML parsers barf on it would be desirable.
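
If someone did want to catch such strings before they reach -listxml, a check along these lines would do it. This is only an illustration; first_illegal_control() is a hypothetical helper and nothing like it is part of the patch. XML 1.0 permits tab (0x09), line feed (0x0A) and carriage return (0x0D) but no other byte below 0x20.

```c
#include <stddef.h>

/* Hypothetical helper: returns the index of the first control byte that
   XML 1.0 forbids, or -1 if the string contains none. A NUL byte cannot
   be detected here because it terminates the C string. */
static long first_illegal_control(const char *s)
{
    size_t i;

    for (i = 0; s[i] != '\0'; i++)
    {
        unsigned char c = (unsigned char)s[i];

        if (c < 0x20 && c != 0x09 && c != 0x0a && c != 0x0d)
            return (long)i;
    }
    return -1;
}
```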

-Justin Kerk
2008-04-12 05:12:47 +00:00
Aaron Giles
9476c50ee6 Cleanups for 0.124. Marked Mermaid as working per checkin comment. 2008-03-24 04:07:46 +00:00
Aaron Giles
509dc4c064 De-deprecated ui.c.
Changed xmlfile.c to pass in memory handlers to expat so that
our memory overrides are properly managed.
2008-03-21 14:51:26 +00:00
Aaron Giles
ee9f88963c Copyright cleanup:
- removed years from copyright notices
- removed redundant (c) from copyright notices
- updated "the MAME Team" to be "Nicola Salmoria and the MAME Team"
2008-01-06 00:47:40 +00:00
Aaron Giles
7b77f12186 Initial checkin of MAME 0.121. 2007-12-17 15:19:59 +00:00