please people, remember to keep source UTF-8 and if you're committing on behalf of others, clean up indents to meet MAME conventions
anyone can run srcclean over a submission and see what will get hit
- reset scheduler savestate to what it was for years before rewind
-- changing saved variables should only be done after thorough testing; right now, adding some vars breaks some machines, and adding other vars breaks others
- switch to megabyte-wise capacity
-- savestate size differs greatly between machines, so relying on a state count is unstable
- switch to internal indexing
-- no longer depends on inaccurate machine time
- rewind accelerator key in debugger (Ctrl+F11)
- report capacity hit (once), with some useful info
- make error reports saner
- mention rewind and rewind_capacity in the docs (a sample configuration follows this list)
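For reference, this is roughly how the two options might look in mame.ini (a sketch; the option names come from this change, while the values, 1 to enable and the capacity in megabytes, are illustrative):

    rewind            1
    rewind_capacity   100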
This starts the work requested in #2398.
How RAM states work.
Implemented using util::vectorstream. Instead of dumping m_save.m_entry_list to a file, it writes them as binary to a vectorstream. Compression is not used, as it would slow down the process. The header is written as usual, also in binary. When a state is loaded, the savestate data is binary-read back from the vectorstream.
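A minimal self-contained sketch of the idea, with illustrative names (save_entry and ram_state here are not MAME's actual types, and a plain std::vector stands in for util::vectorstream):

    #include <cstddef>
    #include <cstdint>
    #include <cstring>
    #include <vector>

    // stands in for one m_save.m_entry_list entry: a registered
    // pointer plus the number of bytes it covers
    struct save_entry { void *data; std::size_t size; };

    struct ram_state
    {
        std::vector<std::uint8_t> m_data;  // plays the role of util::vectorstream

        void save(const std::vector<save_entry> &entries)
        {
            m_data.clear();
            for (const save_entry &e : entries)
            {
                const auto *p = static_cast<const std::uint8_t *>(e.data);
                m_data.insert(m_data.end(), p, p + e.size);  // raw binary, no compression
            }
        }

        void load(const std::vector<save_entry> &entries) const
        {
            std::size_t pos = 0;
            for (const save_entry &e : entries)
            {
                std::memcpy(e.data, m_data.data() + pos, e.size);  // binary-read back
                pos += e.size;
            }
        }
    };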
How rewind works.
Rewind is optional; it can be turned off through the MAME GUI while the machine is not running, and the rewind capacity is available there too. The rewind step hotkey is available from the standard hotkey menu. In the debugger, the "rewind" command ("rw" shortcut) works the same as the hotkey.
Every time you advance a frame (pause step), the rewinder captures a RAM savestate of the frame you were at. It does the same when you step into/over/out in the debugger.

Every time it captures a new state (and when you unpause), it marks as invalid all of its states that fall after the current machine time, because input might change and they are no longer relevant. It keeps their buffers allocated, though, for future use. When the rewinder runs out of the allowed number of savestates, it invalidates the first state in the list and tosses its unique_ptr to the end of the list, then uses its buffer to capture a new state.

When you hit the rewind step key, or use the "rewind" command in the debugger, it loads the state that is immediately before the current machine time. Invalid states are not allowed to appear between valid ones, as that breaks rewinder integrity and causes problems.

The rewinder keeps its own set of RAM states as a vector of unique_ptr's. All rewinder operations and errors are reported using machine().popmessage().
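A rough sketch of the buffer-reuse policy described above (illustrative names and types, not MAME's actual rewinder code):

    #include <cstddef>
    #include <memory>
    #include <vector>

    // illustrative stand-in for one captured RAM state
    struct ram_state
    {
        bool valid = false;
        double time = 0.0;  // machine time of the capture
        // the capture buffer itself would live here
    };

    struct rewinder
    {
        std::vector<std::unique_ptr<ram_state>> m_state_list;
        std::size_t m_capacity = 10;

        // mark every state past the current machine time as invalid,
        // keeping the allocations around for reuse
        void invalidate_after(double machine_time)
        {
            for (auto &s : m_state_list)
                if (s->time > machine_time)
                    s->valid = false;
        }

        // return a state to capture into, recycling the oldest
        // buffer once the capacity is hit
        ram_state *get_free_state()
        {
            if (m_state_list.size() < m_capacity)
            {
                m_state_list.push_back(std::make_unique<ram_state>());
            }
            else
            {
                auto oldest = std::move(m_state_list.front());
                m_state_list.erase(m_state_list.begin());
                oldest->valid = false;  // tossed to the end for reuse
                m_state_list.push_back(std::move(oldest));
            }
            return m_state_list.back().get();
        }
    };

The point of moving the unique_ptr rather than freeing it is that the oldest state's buffer, already sized for this machine, is captured into again instead of being reallocated.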
* direct_read_data is now a template which takes the address bus shift
as a parameter.
* address_space::direct<shift>() is now a template method that takes
the shift as a parameter and returns a pointer instead of a
reference
* the address to give to {read|write}_* on address_space or
direct_read_data is now the address one wants to access
Longer explanation:
Up until now, the {read|write}_* methods required the caller to give
the byte offset instead of the actual address. That's the same on
byte-addressing CPUs, e.g. the ones everyone knows, but it's different
on the word/long/quad addressing ones (tms, sharc, etc...) or the
bit-addressing one (tms340x0). Changing that required templatizing
the direct access interface on the bus addressing granularity,
historically called address bus shift. Also, since everybody was
taking the address of the reference returned by direct(), and
structurally didn't have much choice in the matter, it got changed to
return a pointer directly.
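At a call site the change looks something like this (a hedged sketch; the shift value of -1 for a 16-bit word-addressing bus follows MAME's bus-shift convention as I read it):

    // before: direct() returned a reference and the caller passed a
    // byte offset, so a word-addressing CPU scaled the pc itself:
    //     u16 op = m_direct->read_word(pc << 1);

    // after: the shift is a template parameter, the scaling happens
    // inside, and the caller passes the real address:
    //     direct_read_data<-1> *m_direct = m_program->direct<-1>();
    //     u16 op = m_direct->read_word(pc);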
Longest historical explanation:
In a cpu core, the hottest memory access, by far, is the opcode
fetching. It's also an access with very good locality (doesn't move
much, tends to stay in the same rom/ram zone even when jumping around,
tends not to hit handlers), which makes efficient caching worthwhile
(as in, 30-50% faster core iirc on something like the 6502, but that
was 20 years ago and a number of things changed since then). In fact,
opcode fetching was, in the distant past, just an array lookup indexed
by pc on an offset pointer, which was updated on branches. It didn't
stay that way because more elaborate access is often needed (handlers,
banking with instructions crossing a bank...) but it still ends up with
a frontend of "if the address is still in the current range read from
pointer+address otherwise do the slowpath", e.g. two usually correctly
predicted branches plus the read most of the time.
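The fast path being described boils down to something like this (a simplified, assumed sketch rather than the actual core code):

    #include <cstdint>

    struct direct_cache
    {
        std::uint32_t m_low = 1, m_high = 0;   // cached range, initially empty
        const std::uint8_t *m_base = nullptr;  // base such that m_base + address is valid

        std::uint8_t read_byte(std::uint32_t address)
        {
            if (address < m_low || address > m_high)  // two usually well-predicted branches
                return slow_path(address);            // re-resolve the range or call a handler
            return m_base[address];                   // hot path: a single read
        }

        std::uint8_t slow_path(std::uint32_t address)
        {
            // handlers, banking, instructions crossing a bank, etc.
            // would be resolved here, updating m_low/m_high/m_base
            return 0;
        }
    };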
Then the >8-bit cpus arrived. That was ok, it just required doing the add on a u8 *, then converting to a u16/u32 * and doing the read. At the asm level, it was all identical except for the final read, and with read_byte/word/long being separate, no test (and associated overhead) was added in the path.
Then the word-addressing CPUs arrived with, iirc, the tms cpus used in atari games. To read from the pointer, they require shifting the address, either explicitly, or implicitly by indexing a u16 *.
There were three possibilities:
1- create a new read_* method for each size and granularity. That
amounts to a lot of copy/paste in the end, and functions with
identical prototypes so the compiler can't detect you're using the
wrong one.
2- put a variable shift in the read path. That was too expensive, especially since the most critical cpus are byte-addressing (the 68000 was the key one at the time). Having bit-addressing cpus, where the shift can be either right or left depending on the variable, makes things even worse.
3- require the caller to do the shift themselves when needed.
The last solution was chosen, and starting that day the address was a
byte offset and not the real address. Which is, actually, quite
surprising when writing a new cpu core or, worse, when using the
read/write methods from the driver code.
But since then, C++ happened. And, in particular, templates with non-type parameters. Suddenly, solution 1 can be done without the copy/paste, and with different types that allow detecting (at runtime, but systematically and at startup) if you got it wrong, while still generating optimal code. So it was time to switch to that solution and make the address parameter sane again. Especially since it makes mucking about in the rest of the memory subsystem code a lot more understandable.
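As an illustration of why non-type template parameters make solution 1 cheap, the address-to-byte-offset conversion collapses to a fixed shift per instantiation (the sign convention here, negative for word addressing and positive for bit addressing, is my reading of MAME's bus shift and worth double-checking):

    #include <cstdint>

    template <int AddrShift>
    constexpr std::uint32_t address_to_byte(std::uint32_t address)
    {
        // each instantiation compiles to one fixed shift (or none at all)
        return (AddrShift < 0) ? (address << -AddrShift) : (address >> AddrShift);
    }

    static_assert(address_to_byte<0>(0x1234) == 0x1234, "byte-addressing: identity");
    static_assert(address_to_byte<-1>(0x1234) == 0x2468, "word-addressing: one left shift");
    static_assert(address_to_byte<3>(0x1234) == 0x246, "bit-addressing: divide by 8");

And because each shift instantiates a distinct type, mixing up granularities becomes detectable instead of silently producing a wrong byte offset.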
Disassemblers are now independent classes. Not only is the code cleaner, but unidasm has access to all the cpu cores again. The interface to the disassembly method has changed from byte buffers to objects that answer read methods. This also adds support for lfsr and/or paged PCs.
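A generic, self-contained illustration of the new shape (not MAME's exact declarations): the disassembler is a standalone class, and opcode reads go through an object whose read methods can hide paged or lfsr PC translation:

    #include <cstdint>
    #include <ostream>

    struct data_buffer  // stands in for the object that answers read methods
    {
        // a paged or lfsr-PC system would remap pc inside its override
        virtual std::uint8_t r8(std::uint32_t pc) const = 0;
        virtual ~data_buffer() = default;
    };

    struct my_disassembler  // independent class: holds no pointer into a live cpu core
    {
        // returns the number of address units consumed
        std::uint32_t disassemble(std::ostream &stream, std::uint32_t pc,
                                  const data_buffer &opcodes) const
        {
            std::uint8_t op = opcodes.r8(pc);  // read through the buffer object
            stream << "db $" << std::hex << int(op);
            return 1;
        }
    };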
- memory_translate now returns an address space number rather than a boolean flag, permitting addresses in part of one space to map to an entirely different space. This is primarily intended to help MCUs which have blocks of internal memory that can be dynamically remapped, but may also allow for more accurate emulation of MMUs that drive multiple external address spaces, since the old limit of four address spaces per MAME device has been lifted.
- memory_translate has also been made a const method, in spite of a couple of badly behaved CPU cores that can't honestly treat it as one.
- The (read|write)_(byte|word|dword|qword|memory|opcode) accessors have been transferred from debugger_cpu to device_memory_interface, with somewhat modified arguments corresponding to the translate function it calls through to if requested.
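A hedged sketch of what an override can now express (self-contained stand-ins below; the real signature and failure value should be checked against device_memory_interface):

    #include <cstdint>

    using offs_t = std::uint32_t;          // as in MAME
    enum { AS_PROGRAM = 0, AS_DATA = 1 };  // illustrative space numbers

    // hypothetical MCU: a window of the program space is dynamically
    // remapped onto internal data RAM, so translation lands in a
    // different address space than the one queried
    int memory_translate_sketch(int spacenum, int /*intention*/, offs_t &address)
    {
        if (spacenum == AS_PROGRAM && address >= 0xf000 && address < 0xf800)
        {
            address -= 0xf000;  // offset within the internal RAM block
            return AS_DATA;     // a different space: the new return value allows this
        }
        return spacenum;        // identity translation elsewhere
    }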
Make the debugger 'dasm' command cope with unmapped holes. Previously, 'dasm' would enter an infinite loop if it hit an unmapped pc, continuing to grow the output file until the program was killed.
* New abbreviated types are in osd and util namespaces, and also in global namespace for things that #include "emu.h"
* Get rid of import of cstdint types to global namespace (C99 does this anyway)
* Remove the cstdint types from everything in emu
* Get rid of U64/S64 macros
* Fix a bug in dsp16 caused by incorrect use of a macro
* Fix debugcon not checking for "do " prefix case-insensitively
* Fix a lot of messed up tabulation
* More constexpr
* Fix up many __names
Use standard uint64_t, uint32_t, uint16_t or uint8_t instead of UINT64, UINT32, UINT16 or UINT8; also use standard int64_t, int32_t, int16_t or int8_t instead of INT64, INT32, INT16 or INT8.
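By way of illustration (the short aliases below are stand-ins; in MAME they come from the osd and util namespaces via emu.h):

    #include <cstdint>

    using u32 = std::uint32_t;  // stand-in for the new abbreviated alias
    using s16 = std::int16_t;

    // old style (removed):
    //     UINT32 crc;  INT16 delta;  UINT64 big = U64(0x0123456789abcdef);
    u32 crc = 0;
    s16 delta = 0;
    std::uint64_t big = 0x0123456789abcdefULL;  // standard suffix replaces the U64() macro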