mirror of
https://github.com/holub/mame
synced 2025-07-01 00:09:18 +03:00
From: Justin Kerk
Subject: xml_normalize_string() bogusly escapes UTF-8

Various parts of MAME have recently been changed to support UTF-8 strings, so I thought I'd test out using a UTF-8 driver name for the Sam Coupé driver in MESS, just to see if anything breaks. Most things do seem to work well - the name is correctly drawn in the UI etc.

One thing that doesn't work properly is the output from -listxml: "Sam Coupé" becomes "Sam Coup&#195;&#169;" - each UTF-8 byte is individually escaped, resulting in two gibberish characters instead of the correct character.

The culprit here is xml_normalize_string() in src/lib/util/xmlfile.c - the code converts any high-bit byte to an XML escape, which is wrong for any encoding but ISO-8859-1, because XML escapes are defined as Unicode codepoints regardless of the document encoding.

Fortunately, this is very simple to fix - in fact, it is sufficient just to remove the escaping code and pass the UTF-8 bytes through directly, because UTF-8 is mandated as the default encoding in the XML standard.[1] The attached patch does this.

This should be a pretty safe change since, as far as I can tell, nothing in MAME or MESS currently triggers this code (that is, the string "&#" does not occur in the -listxml output of either).

One potentially negative effect is that the ASCII control characters which are illegal in XML (0x00-0x1F, excepting line breaks and tabs) would no longer be escaped. However, I can't imagine why you would want any in a string destined for -listxml, so IMO that would be a problem elsewhere in the code, and having XML parsers barf on it would be desirable.

-Justin Kerk
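To make the failure mode concrete, here is a minimal C sketch contrasting the two behaviours on the driver name from the report. The function names escape_per_byte() and copy_passthrough() are hypothetical stand-ins written for this illustration, not MAME code; they only mimic the default branch of xml_normalize_string() before and after the patch, and use bounded snprintf() rather than the original's sprintf().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the pre-patch default branch: every byte
   outside printable ASCII becomes its own numeric character reference. */
static void escape_per_byte(const char *src, char *dst, size_t dst_size)
{
    size_t used = 0;

    assert(src != NULL && dst != NULL && dst_size > 0);
    dst[0] = '\0';
    for (; *src != '\0'; ++src)
    {
        unsigned char c = (unsigned char)*src;
        int n;

        if (c >= ' ' && c <= '~')
            n = snprintf(dst + used, dst_size - used, "%c", c);
        else
            n = snprintf(dst + used, dst_size - used, "&#%u;", (unsigned)c);

        if (n < 0 || (size_t)n >= dst_size - used)
        {
            dst[used] = '\0'; /* out of room: drop the partial write and stop */
            break;
        }
        used += (size_t)n;
    }
}

/* Hypothetical stand-in for the post-patch default branch: bytes are
   copied unchanged, so a UTF-8 sequence stays intact. */
static void copy_passthrough(const char *src, char *dst, size_t dst_size)
{
    assert(src != NULL && dst != NULL && dst_size > 0);
    snprintf(dst, dst_size, "%s", src);
}
```

Given the UTF-8 bytes 0xC3 0xA9 for "é", the first function emits two separate references (195 and 169), which an XML parser reads as the two code points U+00C3 and U+00A9 rather than U+00E9. The second leaves the bytes alone, which is valid as long as the document is declared or defaulted as UTF-8.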
This commit is contained in:
parent
9402f05a08
commit
f57b3da9a3
@@ -515,10 +515,7 @@ const char *xml_normalize_string(const char *string)
 			case '<' : d += sprintf(d, "&lt;"); break;
 			case '>' : d += sprintf(d, "&gt;"); break;
 			default:
-				if (*string >= ' ' && *string <= '~')
-					*d++ = *string;
-				else
-					d += sprintf(d, "&#%d;", (unsigned)(unsigned char)*string);
+				*d++ = *string;
 		}
 		++string;
 	}
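With the default branch now copying every byte, control characters that XML 1.0 forbids would reach the output unescaped, as the commit message notes. If a caller wanted to catch that earlier, a check along the following lines could be used. This is only a sketch: has_illegal_xml_control() is a hypothetical helper, not something present in MAME, and it only looks at the single-byte control range (it does not validate UTF-8 or other forbidden code points).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of MAME): returns 1 if the string contains
   a control byte below 0x20 other than tab, line feed or carriage return,
   which are the only ones XML 1.0 permits in that range. Returns 0 otherwise. */
static int has_illegal_xml_control(const char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; ++s)
    {
        unsigned char c = (unsigned char)*s;

        if (c < 0x20 && c != '\t' && c != '\n' && c != '\r')
            return 1;
    }
    return 0;
}
```

High-bit bytes are deliberately not flagged here, since after the patch they are expected to be part of valid UTF-8 sequences.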