mame/docs/source/techspecs/device_disasm_interface.rst

The device_disasm_interface and the disassemblers
=================================================

1. Capabilities
---------------

The disassemblers are classes that provide disassembly and opcode
meta-information for the cpu cores and **unidasm**.  The
**device_disasm_interface** connects a cpu core with its disassembler.

2. The disassemblers
--------------------

2.1. Definition
~~~~~~~~~~~~~~~

A disassembler is a class that derives from
**util::disasm_interface**.  It then has two required methods to
implement, **opcode_alignment** and **disassemble**, and 6 optional,
**interface_flags**, **page_address_bits**, **pc_linear_to_real**,
**pc_real_to_linear**, and one with four possible variants,
**decrypt8/16/32/64**.


2.2. opcode_alignment
~~~~~~~~~~~~~~~~~~~~~

| u32 \ **opcode_alignment**\ () const

Returns the required alignment of opcodes by the cpu, in PC-units.  In
other words, the required alignment for the PC register of the cpu.
Tends to be 1 (almost everything), 2 (68000...), 4 (mips, ppc...),
which an exceptional 8 (tms 32082 parallel processor) and 16
(tms32010, instructions are 16-bits aligned and the PC targets bits).
It must be a power-of-two or things will break.

Note that processors like the tms32031 which have 32-bits instructions
but where the PC targets 32-bits values have an alignment of 1.

2.3. disassemble
~~~~~~~~~~~~~~~~

| offs_t \ **disassemble**\ (std::ostream &stream, offs_t pc, const data_buffer &opcodes, const data_buffer &params)

This is the method where the real work is done.  This method must
disassemble the instruction at address *pc* and write the result to
*stream*.  The values to decode are retrieved from the *opcode*
buffer.  A **data_buffer** object offers four accessor methods:

| u8  util::disasm_interface::data_buffer::\ **r8**\  (offs_t pc) const
| u16 util::disasm_interface::data_buffer::\ **r16**\ (offs_t pc) const
| u32 util::disasm_interface::data_buffer::\ **r32**\ (offs_t pc) const
| u64 util::disasm_interface::data_buffer::\ **r64**\ (offs_t pc) const

They read the data at a given address and take endianness and
nonlinear PCs for larger-than-bus-width accesses.  The debugger
variant also caches the read data in one block, so for that reason one
should not read data too far from the base pc (e.g. stay within 16K or
so, careful when trying to follow indirect accesses).

A number of CPUs have an external signal that splits fetches into an
opcode part and a parameter part.  This is for instance the M1 signal
of the z80 or the SYNC signal of the 6502.  Some systems present
different values to the cpu depending on whether that signal is
active, usually for protection purposes.  On these cpus the opcode
part should be read from the *opcode* buffer, and the parameter part
from the *params* buffer.  They will or will not be the same buffer
depending on the system itself.

The method returns the size of the instruction in PC units, with a
maximum of 65535.  In addition, if possible, the disassembler should
give some meta-information about the opcode by OR-ing in into the
result:

* **STEP_OVER** for subroutine calls or auto-decrementing loops.  If there is some delay slots, also OR with **step_over_extra**\ (n) where n is the number of instruction slots.
* **STEP_OUT** for the return-from-subroutine instructions

In addition, to indicated that these flags are supported, OR the
result with **SUPPORTED**\ .  An annoying number of disassemblers lies
about that support (e.g. they do a or with **SUPPORTED** without even
generating the **STEP_OVER** or **STEP_OUT** information).  Don't do
that, it breaks the step over/step out functionality of the debugger.

2.4. interface_flags
~~~~~~~~~~~~~~~~~~~~

| u32 **interface_flags**\ () const

That optional method indicates specifics of the disassembler.  Default
of zero is correct most of the time.  Possible flags, which need to be
OR-ed together, are:

* **NONLINEAR_PC**\ : stepping to the next opcode or the next byte of the opcode is not adding one to pc.  Used for old LFSR-based PCs.
* **PAGED**\ : PC wraps at a page boundary
* **PAGED2LEVEL**\ : not only PC wraps at some kind of page boundary, but there are two levels of paging
* **INTERNAL_DECRYPTION**\ : there is some decryption tucked between reading from AS_PROGRAM and the actual disassembler
* **SPLIT_DECRYPTION**\ : there is some decryption tucked between reading from AS_PROGRAM and the actual disassembler, and that decryption is different for opcodes and parameters

Note that in practice non-linear pc systems are also paged, that
**PAGED2LEVEL** implies **PAGED**, and that **SPLIT_DECRYPTION**
implies **DECRYPTION**.


2.5. pc_linear_to_real and pc_real_to_linear
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

| offs_t **pc_linear_to_real**\ (offs_t pc) const
| offs_t **pc_real_to_linear**\ (offs_t pc) const

These methods should be present only when **NONLINEAR_PC** is set in
the interface flags.  They must convert pc to and from a value to a
linear domain where the instruction parameters and next instruction
are reached by incrementing the value.  **pc_real_to_linear** converts
to that domain, **pc_linear_to_real** converts back from that domain.


2.6. page_address_bits
~~~~~~~~~~~~~~~~~~~~~~

| u32 **page_address_bits**\ () const

Present on when **PAGED** or **PAGED2LEVEL** is set, gives the number
of address bits in the lowest page.


2.7. page2_address_bits
~~~~~~~~~~~~~~~~~~~~~~~

| u32 **page2_address_bits**\ () const

Present on when **PAGED2LEVEL** is set, gives the number
of address bits in the upper page.

2.8. decryptnn
~~~~~~~~~~~~~~

| u8  **decrypt8**\  (u8  value, offs_t pc, bool opcode) const
| u16 **decrypt16**\ (u16 value, offs_t pc, bool opcode) const
| u32 **decrypt32**\ (u32 value, offs_t pc, bool opcode) const
| u64 **decrypt64**\ (u64 value, offs_t pc, bool opcode) const

One of these must be defined when **INTERNAL_DECRYPTION** or
**SPLIT_DECRYPTION** is set.  The chosen one is the one which takes
what **opcode_alignment** represents in bytes.

That method decrypts a given value read from address pc (from
AS_PROGRAM) and gives the result which will be passed to the
disassembler.  In the split decryption case, opcode indicates whether
we're in the opcode (true) or parameter (false) part of the
instruction.


3. Disassembler interface, device_disasm_interface
--------------------------------------------------

3.1. Definition
~~~~~~~~~~~~~~~

A CPU core derives from **device_disasm_interface** through
**cpu_device**\ .  One method has to be implemented,
**create_disassembler**\ .

3.2. create_disassembler
~~~~~~~~~~~~~~~~~~~~~~~~

| util::disasm_interface \*\ **create_disassembler**\ ()

That method must return a pointer to a newly allocated disassembler
object.  The caller takes ownership and handles the lifetime.

This method will be called at most one in the lifetime of the cpu
object.

4. Disassembler configuration and communication
-----------------------------------------------

Some disassemblers need to be configured.  Configuration can be
unchanging (static) for the duration of the run (cpu model type for
instance) or dynamic (state of a flag or a user preference).  Static
configuration can be done through either (a) parameter(s) to the
disassembler constructor, or through deriving a main disassembler
class.  If the information is short and its semantics obvious (like a
model name), feel free to use a parameter.  Otherwise derive the
class.

Dynamic configuration must be done by first defining a nested public
struct called ``config`` in the disassembler, with virtual destructor
and pure virtual methods to pull the required information.  A pointer
to that struct should be passed to the disassembler constructor.  The
cpu core should then add a derivation from that config struct and
implement the methods.  Unidasm will have to derive a small class from
the config class to give the information.

5. Missing stuff
----------------

There currently is no way for the debugger GUI to add per-core
configuration.  In particular, it is needed for the s2650 and Saturn
cores. It should go through the cpu core class itself, since it's
pulled from the config struct.

There is support missing in unidasm for per-cpu configuration.  That's
needed for a lot of things, see the unidasm source code for the
current list ("Configuration missing" comments).