Since all device address maps are now class methods defined in ordinary C++, default RAM maps can be provided more simply with an explicit has_configured_map check in an internal map definition.
A number of default address maps that probably weren't meant to be overridden have also been changed to ordinary internal maps.
Associated core changes (nw)
- Move definition of address_space_config from dimemory.cpp to emumem.cpp (declaration was already in emumem.h)
- Add getters for more members of address_space_config with future privatization in mind (nw)
A standard memory handler has the following prototype (where uX is u8, u16, u32 or u64):
uX device::read(address_space &space, offs_t offset, uX mem_mask);
void device::write(address_space &space, offs_t offset, uX data, uX mem_mask);
We now allow simplified versions which are:
uX device::read(offs_t offset, uX mem_mask);
void device::write(offs_t offset, uX data, uX mem_mask);
uX device::read(offs_t offset);
void device::write(offs_t offset, uX data);
uX device::read();
void device::write(uX data);
Use them at will. Also consider the
(DECLARE_)(READ|WRITE)(8|16|32|64)_MEMBER macros to be on the way out;
use the explicit prototypes instead.
The same goes for lambdas in the memory map: their parameters are now
optional, following the same combinations.
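
As a rough illustration of "optional parameters following the same combinations", the sketch below wraps any of the three simplified read shapes into one full form. It is not MAME's binding code: `adapt_read`, `read8_full` and `latch_device` are hypothetical names, the `address_space &` parameter is left out, and `u8`/`offs_t` are local stand-ins so the block compiles on its own.

```cpp
#include <cstdint>
#include <functional>
#include <type_traits>

// Local stand-ins for MAME's typedefs (assumption: same widths).
using u8 = std::uint8_t;
using offs_t = std::uint32_t;

// Full-form read handler as used by this sketch only.
using read8_full = std::function<u8 (offs_t offset, u8 mem_mask)>;

// Hypothetical adapter: accept (offset, mem_mask), (offset) or () and
// wrap the callable so it can always be invoked with the full argument list.
template <typename F>
read8_full adapt_read(F f)
{
	if constexpr (std::is_invocable_r_v<u8, F, offs_t, u8>)
		return [f] (offs_t offset, u8 mem_mask) -> u8 { return f(offset, mem_mask); };
	else if constexpr (std::is_invocable_r_v<u8, F, offs_t>)
		return [f] (offs_t offset, u8) -> u8 { return f(offset); };
	else
	{
		static_assert(std::is_invocable_r_v<u8, F>, "unsupported handler shape");
		return [f] (offs_t, u8) -> u8 { return f(); };
	}
}

// Hypothetical device that only needs the shortest handler forms.
struct latch_device
{
	u8 read() { return m_value; }
	void write(u8 data) { m_value = data; }
	u8 m_value = 0;
};
```

The caller always passes offset and mask; the adapter drops whatever the handler did not ask for, which is the same idea the shorter prototypes express.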
* Make more #include guards follow standard format - using MAME_ as the prefix makes it easy to see which ones come from our code in a preprocessor dump, and having both src/devices/machine/foo.h and src/mame/machine/foo.h causes issues anyway
* Get #include "emu.h" out of headers - it should only be the first thing in a compilation unit or we get differences in behaviour with PCH on/off
* Add out-of-line destructors to some devices - it forces the compiler to instantiate the vtable in a certain location and avoids some non-deterministic compiler behaviours
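
A minimal sketch of the out-of-line destructor pattern from the last bullet, shown as one block with comments marking where the header and source file would split. `foo_device` and `poll_and_destroy` are hypothetical names. Under the Itanium C++ ABI the vtable is emitted in the translation unit that defines the class's key function (its first virtual function that is neither pure nor inline), so a declared-only virtual destructor pins it to one object file.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// ---- foo.h (hypothetical device header) ----
class foo_device
{
public:
	foo_device() = default;
	virtual ~foo_device();      // declared only: no inline body in the header
	virtual int status() const { return m_status; }

private:
	int m_status = 0;
};

// ---- foo.cpp (the single translation unit that owns the definition) ----
foo_device::~foo_device()
{
}

// Small helper so the sketch has something to exercise.
int poll_and_destroy(std::unique_ptr<foo_device> dev)
{
	assert(dev);
	return dev->status();       // dev is destroyed when this function returns
}
```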
This change is intended to expedite debugging of software written for the TMPZ84C015 or similar Z80-based controllers which use 8-bit I/O addressing for the on-chip peripherals but may use either 8-bit or 16-bit addressing externally.
The new cswidth address map constructor method overrides the masking normally performed on narrow-width accesses. This entailed a lot of reconfiguration to make the shifting and masking of subunits independent operations. The change is unlikely to have any significant performance impact on drivers that don't frequently reconfigure their memory handlers.
This allows for the much more natural "import another map and patch
it" structure, or "cover a whole region then punch holes in it". Our
previous first-entry-wins rule was always a surprise to newcomers, and
oldcomers too.
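
To make the last-entry-wins rule concrete, here is a toy model, not the real address_map code: `toy_map`, `range`, `import_map` and `lookup` are hypothetical names, and targets are plain strings. It shows both patterns named above, importing a map and patching it, and covering a region then punching a hole in it.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

using offs_t = std::uint32_t;   // local stand-in for MAME's typedef

struct map_entry
{
	offs_t start, end;          // inclusive range
	std::string target;
};

class toy_map
{
public:
	// Append a range; entries added later take priority.
	toy_map &range(offs_t start, offs_t end, std::string target)
	{
		assert(start <= end);
		m_entries.push_back({ start, end, std::move(target) });
		return *this;
	}

	// Pull in every entry of another map, keeping their relative order.
	toy_map &import_map(const toy_map &other)
	{
		m_entries.insert(m_entries.end(), other.m_entries.begin(), other.m_entries.end());
		return *this;
	}

	// Last entry wins: scan from the most recently added entry backwards.
	std::string lookup(offs_t address) const
	{
		for (auto it = m_entries.rbegin(); it != m_entries.rend(); ++it)
			if (address >= it->start && address <= it->end)
				return it->target;
		return "unmapped";
	}

private:
	std::vector<map_entry> m_entries;
};
```

With a first-entry-wins rule the hole would have to be declared before the region it sits in, which is the ordering that kept surprising people.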
Please, people, remember to keep source UTF-8, and if you're committing on behalf of others, clean up indents to meet MAME conventions.
Anyone can run srcclean over a submission and see what will get hit.
* direct_read_data is now a template which takes the address bus shift
as a parameter.
* address_space::direct<shift>() is now a template method that takes
the shift as a parameter and returns a pointer instead of a
reference
* the address to give to {read|write}_* on address_space or
direct_read_data is now the address one wants to access
Longer explanation:
Up until now, the {read|write}_* methods required the caller to give
the byte offset instead of the actual address. That's the same on
byte-addressing CPUs, e.g. the ones everyone knows, but it's different
on the word/long/quad addressing ones (tms, sharc, etc...) or the
bit-addressing one (tms340x0). Changing that required templatizing
the direct access interface on the bus addressing granularity,
historically called address bus shift. Also, since everybody was
taking the address of the reference returned by direct(), and
structurally didn't have much choice in the matter, it got changed to
return a pointer directly.
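
A small sketch of what templatizing on the address bus shift buys: the caller passes the real address and the shift is resolved at compile time. This is a toy, not the actual direct_read_data. `address_to_byte` and `toy_direct` are hypothetical names, the sign convention (negative shift for word/long addressing, positive for bit addressing) is an assumption of the sketch, and words are read little-endian for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using offs_t = std::uint32_t;   // local stand-ins for MAME's typedefs
using u8 = std::uint8_t;
using u16 = std::uint16_t;

// Convert a bus address to a byte offset. The direction of the shift is
// chosen at compile time, so no variable shift ends up in the read path.
template <int AddrShift>
constexpr offs_t address_to_byte(offs_t address)
{
	if constexpr (AddrShift < 0)
		return address << -AddrShift;   // word/long/quad addressing
	else
		return address >> AddrShift;    // byte (0) or bit (3) addressing
}

template <int AddrShift>
class toy_direct
{
public:
	explicit toy_direct(std::vector<u8> data) : m_data(std::move(data)) { }

	// Takes the address the CPU sees, not a pre-shifted byte offset.
	u16 read_word(offs_t address) const
	{
		std::size_t const byteoffs = address_to_byte<AddrShift>(address);
		assert(byteoffs + 1 < m_data.size());
		return u16(m_data[byteoffs] | (m_data[byteoffs + 1] << 8));
	}

private:
	std::vector<u8> m_data;
};
```

Because `toy_direct<-1>` and `toy_direct<0>` are different types, mixing up granularities becomes a type mismatch rather than a silent wrong read.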
Longest historical explanation:
In a cpu core, the hottest memory access, by far, is the opcode
fetching. It's also an access with very good locality (doesn't move
much, tends to stay in the same rom/ram zone even when jumping around,
tends not to hit handlers), which makes efficient caching worthwhile
(as in, 30-50% faster core iirc on something like the 6502, but that
was 20 years ago and a number of things changed since then). In fact,
opcode fetching was, in the distant past, just an array lookup indexed
by pc on an offset pointer, which was updated on branches. It didn't
stay that way because more elaborate access is often needed (handlers,
banking with instructions crossing a bank...) but it still ends up with
a frontend of "if the address is still in the current range read from
pointer+address otherwise do the slowpath", i.e. two usually correctly
predicted branches plus the read most of the time.
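
The frontend described above can be sketched roughly as follows. This is a simplified model, not the emulator's code: `toy_fetch_cache`, its `range` struct and the `resolver` callback are hypothetical, and the sketch subtracts the range start on each read where the text describes a pre-offset pointer.

```cpp
#include <cstdint>
#include <functional>
#include <utility>

using offs_t = std::uint32_t;   // local stand-ins for MAME's typedefs
using u8 = std::uint8_t;

class toy_fetch_cache
{
public:
	struct range
	{
		offs_t start, end;      // inclusive bounds of the cached zone
		const u8 *base;         // memory backing 'start'
	};
	using resolver = std::function<range (offs_t address)>;

	explicit toy_fetch_cache(resolver slowpath) : m_slowpath(std::move(slowpath)) { }

	u8 read_byte(offs_t address)
	{
		// Fast path: two comparisons that are usually predicted correctly.
		if (address < m_cur.start || address > m_cur.end)
		{
			// Slow path: ask the memory system which zone holds the address.
			m_cur = m_slowpath(address);
			++m_misses;
		}
		return m_cur.base[address - m_cur.start];
	}

	int misses() const { return m_misses; }

private:
	resolver m_slowpath;
	range m_cur{ 1, 0, nullptr };   // start > end: the first access always misses
	int m_misses = 0;
};
```

Consecutive fetches from the same rom/ram zone only pay for the range check, which is why the locality of opcode fetching makes this worthwhile.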
Then the >8 bits cpus arrived. That was ok, it just required doing
the add on a u8 *, then converting to a u16/u32 * and doing the read.
At the asm level, it was all identical except for the final read, and
with read_byte/word/long being separate there was no test (and
associated overhead) added in the path.
Then the word-addressing CPUs arrived with, iirc, the tms cpus used in
atari games. Reading from the pointer then requires shifting the
address, either explicitly, or implicitly through indexing a u16 *.
There were three possibilities:
1- create a new read_* method for each size and granularity. That
amounts to a lot of copy/paste in the end, and functions with
identical prototypes so the compiler can't detect you're using the
wrong one.
2- put a variable shift in the read path. That was too expensive,
especially since the most critical cpus are byte-addressing (the 68000
was the key one at the time). Having bit-addressing cpus, which means
the shift can be either right or left depending on the variable, makes
things even worse.
3- require the caller to do the shift themselves when needed.
The last solution was chosen, and starting that day the address was a
byte offset and not the real address. Which is, actually, quite
surprising when writing a new cpu core or, worse, when using the
read/write methods from the driver code.
But since then, C++ happened. And, in particular, templates with
non-type parameters. Suddenly, solution 1 can be done without the
copy/paste and with different types, which allows detecting (at
runtime, but systematically and at startup) if you got it wrong, while
still generating optimal code. So it was time to switch to that
solution and make the address parameter sane again. Especially since
it makes mucking around in the rest of the memory subsystem code a lot
more understandable.
* move rarely-used output and pty interfaces out of emu.h
* consolidate and de-duplicate forward declarations, also remove some obsolete ones
* clean up more #include guard macros
* scope down a few more things
(nw) Everyone, please keep forward declarations for src/emu in src/emu/emufwd.h -
this will make it far easier to keep them in sync with declarations than having
them scattered through all the other files.