mirror of
https://github.com/holub/mame
synced 2025-04-27 18:53:05 +03:00
1085 lines
44 KiB
ReStructuredText
1085 lines
44 KiB
ReStructuredText
Software 3D Rendering in MAME
|
||
=============================
|
||
|
||
.. contents:: :local:
|
||
|
||
|
||
Background
|
||
----------
|
||
|
||
Beginning in the late 1980s, many arcade games began incorporating hardware-rendered
|
||
3D graphics into their video. These 3D graphics are typically rendered from low-level
|
||
primitives into a frame buffer (usually double- or triple-buffered), then perhaps
|
||
combined with traditional tilemaps or sprites, before being presented to the player.
|
||
|
||
When it comes to emulating 3D games, there are two general approaches. The first
|
||
approach is to leverage modern 3D hardware by mapping the low-level primitives onto
|
||
modern equivalents. For a cross-platform emulator like MAME, this requires having an
|
||
API that is flexible enough to describe the primitives and all their associated
|
||
behaviors with high accuracy. It also requires the emulator to be able to read back
|
||
from the rendered frame buffer (since many games do this) and combine it with other
|
||
elements, in a way that is properly synchronized with background rendering.
|
||
|
||
The alternative approach is to render the low-level primitives directly in software.
|
||
This has the advantage of being able to achieve pretty much any behavior exhibited by
|
||
the original hardware, but at the cost of speed. In MAME, since all emulation happens
|
||
on one thread, this is particularly painful. However, just as with the 3D hardware
|
||
approach, in theory a software-based approach could be spun off to other threads to
|
||
handle the work, as long as mechanisms were present to synchronize when necessary,
|
||
for example, when reading/writing directly to/from the frame buffer.
|
||
|
||
For the time being, MAME has opted for the second approach, leveraging a templated
|
||
helper class called **poly_manager** to handle common situations.
|
||
|
||
|
||
Concepts
|
||
--------
|
||
|
||
At its core, **poly_manager** is a mechanism to support multi-threaded rendering of
|
||
low-level 3D primitives. Callers provide **poly_manager** with a set of *vertices* for a
|
||
primitive plus a *render callback*. **poly_manager** breaks the primitive into
|
||
clipped scanline *extents* and distributes the work among a pool of *worker
|
||
threads*. The render callback is then called on the worker thread for each extent,
|
||
where game-specific logic can do whatever needs to happen to render the data.
|
||
|
||
One key responsibility that **poly_manager** takes care of is ensuring order. Given a
|
||
pool of threads and a number of work items to complete, it is important that—at least
|
||
within a given scanline—all work is performed serially in order. The basic approach is
|
||
to assign each extent to a *bucket* based on the Y coordinate. **poly_manager** then ensures
|
||
that only one worker thread at a time is responsible for processing work in a given bucket.
|
||
|
||
Vertices in **poly_manager** consist of simple 2D X and Y *coordinates*, plus zero or
|
||
more additional *iterated parameters*. These iterated parameters can be anything: intensity
|
||
values for lighting; RGB(A) colors for Gouraud shading; normalized U, V coordinates for
|
||
texture mapping; 1/Z values for Z buffering; etc. Iterated parameters, regardless of what
|
||
they represent, are interpolated linearly across the primitive in screen space and provided
|
||
as part of the extent to the render callback.
|
||
|
||
|
||
ObjectType
|
||
~~~~~~~~~~
|
||
|
||
When creating a **poly_manager** class, you must provide it a special type that you define,
|
||
known as **ObjectType**.
|
||
|
||
Because rendering happens asynchronously on worker threads, the idea is that the
|
||
**ObjectType** class will hold a snapshot of all the relevant data needed for rendering.
|
||
This allows the main thread to proceed—potentially modifying some of the relevant state—while
|
||
rendering happens elsewhere.
|
||
|
||
In theory, we could allocate a new **ObjectType** class for each primitive rendered;
|
||
however, that would be rather inefficient. It is quite common to set up the rendering
|
||
state and then render several primitives using the same state.
|
||
|
||
For this reason, **poly_manager** maintains an internal array of **ObjectType** objects and
|
||
keeps a copy of the last **ObjectType** used. Before submitting a new primitive, callers
|
||
can see if the rendering state has changed. If it has, it can ask **poly_manager** to allocate
|
||
a new **ObjectType** class and fill it in. When the primitive is submitted for rendering, the
|
||
most recently allocated **ObjectType** instance is implicitly captured and provided to the
|
||
render callbacks.
|
||
|
||
For more complex scenarios, where data might change even more infrequently, there is a
|
||
**poly_array** template, which can be used to manage data in a similar way. In fact,
|
||
internally **poly_manager** uses the **poly_array** class to manage its **ObjectType**
|
||
allocations. More information on the **poly_array** class is provided later.
|
||
|
||
|
||
|
||
Primitives
|
||
~~~~~~~~~~
|
||
|
||
**poly_manager** supports several different types of primitives:
|
||
|
||
* The most commonly-used primitive in **poly_manager** is the *triangle*, which has the
|
||
nice property that iterated parameters have constant deltas across the full surface.
|
||
Arbitrary-length *triangle fans* and *triangle strips* are also supported.
|
||
|
||
* In addition to triangles, **poly_manager** also supports *polygons* with an arbitrary
|
||
number of vertices. The list of vertices is expected to be in either clockwise or
|
||
anticlockwise order. **poly_manager** will walk the edges to compute deltas across
|
||
each extent.
|
||
|
||
* As a special case, **poly_manager** supports a *tile* primitive, which is a simple quad
|
||
defined by two vertices, a top-left vertex and a bottom-right vertex. Like triangles,
|
||
tiles have constant iterated parameter deltas across their surface.
|
||
|
||
* Finally, **poly_manager** supports a fully custom mechanism where the caller provides
|
||
a list of extents that are more or less fed directly to the worker threads.
|
||
This is useful if emulating a system that has unusual primitives or requires highly
|
||
specific behaviors for its edges.
|
||
|
||
|
||
Synchronization
|
||
~~~~~~~~~~~~~~~
|
||
|
||
One of the key requirements of providing an asynchronous rendering mechanism is
|
||
synchronization. Synchronization in **poly_manager** is super simple: just
|
||
call the ``wait()`` function.
|
||
|
||
There are several common reasons for issuing a wait:
|
||
|
||
* At display time, the pixel data must be copied to the screen. If any primitives were
|
||
queued which touch the portion of the display that is going to be shown, you need to
|
||
wait for rendering to be complete before copying. Note that this wait may not be
|
||
strictly necessary in some situations (for example, a triple-buffered system).
|
||
|
||
* If the emulated system has a mechanism to read back from the framebuffer after
|
||
rendering, then a wait must be issued prior to the read in order to ensure that
|
||
asynchronous rendering is complete.
|
||
|
||
* If the emulated system modifies any state that is not cached in the **ObjectType**
|
||
or elsewhere (for example, texture memory), then a wait must be issued to ensure
|
||
that pending primitives which might consume that state have finished their work.
|
||
|
||
* If the emulated system can use a previous render target as, say, the texture source
|
||
for a new primitive, then submitting the second primitive must wait until the first
|
||
completes. **poly_manager** provides no internal mechanism to help detect this, so it
|
||
is on the caller to determine when or if this is necessary.
|
||
|
||
Because the wait operation knows after it is done that all rendering is complete,
|
||
**poly_manager** also takes this opportunity to reclaim all memory allocated for its
|
||
internal structures, as well as memory allocated for **ObjectType** structures. Thus it is
|
||
important that you don’t hang onto any **ObjectType** pointers after a wait is called.
|
||
|
||
|
||
The poly_manager class
|
||
----------------------
|
||
|
||
In most applications, **poly_manager** is not used directly, but rather serves as
|
||
the base class for a more complete rendering class. The **poly_manager** class
|
||
itself is a template::
|
||
|
||
template<typename BaseType, class ObjectType, int MaxParams, u8 Flags = 0>
|
||
class poly_manager;
|
||
|
||
and the template parameters are:
|
||
|
||
* **BaseType** is the type used internally for coordinates and iterated parameters, and
|
||
should generally be either ``float`` or ``double``. In theory, a fixed-point integral
|
||
type could also be used, though the math logic has not been designed for that, so you
|
||
may encounter problems.
|
||
|
||
* **ObjectType** is the user-defined per-object data structure described above.
|
||
Internally, **poly_manager** will manage a **poly_array** of these, and a pointer to
|
||
the most-recently allocated one at the time a primitive is submitted will be implicitly
|
||
passed to the render callback for each corresponding extent.
|
||
|
||
* **MaxParams** is the maximum number of iterated parameters that may be specified in a
|
||
vertex. Iterated parameters are generic and treated identically, so the mapping of
|
||
parameter indices is completely up to the contract between the caller and the render
|
||
callback. It is permitted for **MaxParams** to be 0.
|
||
|
||
* **Flags** is zero or more of the following flags:
|
||
|
||
- POLY_FLAG_NO_WORK_QUEUE — specify this flag to disable asynchronous rendering; this
|
||
can be useful for debugging. When this option is enabled, all primitives are queued
|
||
and then processed in order on the calling thread when ``wait()`` is called on the
|
||
**poly_manager** class.
|
||
|
||
- POLY_FLAG_NO_CLIPPING — specify this if you want **poly_manager** to skip its
|
||
internal clipping. Use this if your render callbacks do their own clipping, or if
|
||
the caller always handles clipping prior to submitting primitives.
|
||
|
||
|
||
Types & Constants
|
||
~~~~~~~~~~~~~~~~~
|
||
|
||
vertex_t
|
||
++++++++
|
||
|
||
Within the **poly_manager** class, you’ll find a **vertex_t** type that describes a
|
||
single vertex. All primitive drawing methods accept 2 or more of these **vertex_t**
|
||
objects. The **vertex_t** includes the X and Y coordinates along with an array of
|
||
iterated parameter values at that vertex::
|
||
|
||
struct vertex_t
|
||
{
|
||
vertex_t() { }
|
||
vertex_t(BaseType _x, BaseType _y) { x = _x; y = _y; }
|
||
|
||
BaseType x, y; // X, Y coordinates
|
||
std::array<BaseType, MaxParams> p; // iterated parameters
|
||
};
|
||
|
||
Note that **vertex_t** itself is defined in terms of the **BaseType** and **MaxParams**
|
||
template values of the owning **poly_manager** class.
|
||
|
||
All of **poly_manager**’s primitives operate in screen space, where (0,0) represents the
|
||
top-left corner of the top-left pixel, and (0.5,0.5) represents the center of that pixel.
|
||
Left and top pixel values are inclusive, while right and bottom pixel values are exclusive.
|
||
|
||
Thus, a *tile* rendered from (2,2)-(4,3) will completely cover 2 pixels: (2,2) and (3,2).
|
||
|
||
When calling a primitive drawing method, the iterated parameter array **p** need not be
|
||
completely filled out. The number of valid iterated parameter values is specified as a
|
||
template parameter to the primitive drawing methods, so only that many parameters need to
|
||
actually be populated in the **vertex_t** structures that are passed in.
|
||
|
||
|
||
extent_t
|
||
++++++++
|
||
|
||
**poly_manager** breaks primitives into extents, which are contiguous horizontal spans
|
||
contained within a single scanline. These extents are then distributed to worker threads,
|
||
who will call the render callback with information on how to render each extent. The
|
||
**extent_t** type describes one such extent, providing the bounding X coordinates along with
|
||
an array of iterated parameter start values and deltas across the span::
|
||
|
||
struct extent_t
|
||
{
|
||
struct param_t
|
||
{
|
||
BaseType start; // parameter value at start
|
||
BaseType dpdx; // dp/dx relative to start
|
||
};
|
||
int16_t startx, stopx; // starting (inclusive)/ending (exclusive) endpoints
|
||
std::array<param_t, MaxParams> param; // array of parameter start/deltas
|
||
void *userdata; // custom per-span data
|
||
};
|
||
|
||
For each iterated parameter, the **start** value contains the value at the left side of
|
||
the span. The **dpdx** value contains the change of the parameter’s value per X coordinate.
|
||
|
||
There is also a **userdata** field in the **extent_t** structure, which is not normally used,
|
||
except when performing custom rendering.
|
||
|
||
|
||
render_delegate
|
||
+++++++++++++++
|
||
|
||
When rendering a primitive, in addition to the vertices, you must also provide a
|
||
**render_delegate** callback of the form::
|
||
|
||
void render(int32_t y, extent_t const &extent, ObjectType const &object, int threadid)
|
||
|
||
This callback is responsible for the actual rendering. It will be called at a later time,
|
||
likely on a different thread, for each extent. The parameters passed are:
|
||
|
||
* **y** is the Y coordinate (scanline) of the current extent.
|
||
|
||
* **extent** is a reference to a **extent_t** structure, described above, which specifies for
|
||
this extent the start/stop X values along with the start/delta values for each iterated
|
||
parameter.
|
||
|
||
* **object** is a reference to the most recently allocated **ObjectType** at the time the
|
||
primitive was submitted for rendering; in theory it should contain most of not all of the
|
||
necessary data to perform rendering.
|
||
|
||
* **threadid** is a unique ID indicating the index of the thread you’re running on; this value
|
||
is useful if you are keeping any kind of statistics and don’t want to add contention over
|
||
shared values. In this situation, you can allocate **WORK_MAX_THREADS** instances of your
|
||
data and update the instance for the **threadid** you are passed. When you want to display
|
||
the statistics, the main thread can accumulate and reset the data from all threads when it’s
|
||
safe to do so (e.g., after a wait).
|
||
|
||
|
||
Methods
|
||
~~~~~~~
|
||
|
||
poly_manager
|
||
++++++++++++
|
||
::
|
||
|
||
poly_manager(running_machine &machine);
|
||
|
||
The **poly_manager** constructor takes just one parameter, a reference to the
|
||
**running_machine**. This grants **poly_manager** access to the work queues needed for
|
||
multithreaded running.
|
||
|
||
wait
|
||
++++
|
||
::
|
||
|
||
void wait(char const *debug_reason = "general");
|
||
|
||
Calling ``wait()`` stalls the calling thread until all outstanding rendering is complete:
|
||
|
||
* **debug_reason** is an optional parameter specifying the reason for the wait. It is
|
||
useful if the compile-time constant **TRACK_POLY_WAITS** is enabled, as it will print a
|
||
summary of wait times and reasons at the end of execution.
|
||
|
||
**Return value:** none.
|
||
|
||
object_data
|
||
+++++++++++
|
||
::
|
||
|
||
objectdata_array &object_data();
|
||
|
||
This method just returns a reference to the internally-maintained **poly_array** of the
|
||
**ObjectType** you specified when creating **poly_manager**. For most applications, the
|
||
only interesting thing to do with this object is call the ``next()`` method to allocate
|
||
a new object to fill out.
|
||
|
||
**Return value:** reference to a **poly_array** of **ObjectType**.
|
||
|
||
register_poly_array
|
||
+++++++++++++++++++
|
||
::
|
||
|
||
void register_poly_array(poly_array_base &array);
|
||
|
||
For advanced applications, you may choose to create your own **poly_array** objects to
|
||
manage large chunks of infrequently-changed data, such a palettes. After each ``wait()``,
|
||
**poly_manager** resets all the **poly_array** objects it knows about in order to reclaim all
|
||
outstanding allocated memory. By registering your **poly_array** objects here, you can ensure
|
||
that your arrays will also be reset after an ``wait()`` call.
|
||
|
||
**Return value:** none.
|
||
|
||
render_tile
|
||
+++++++++++
|
||
::
|
||
|
||
template<int ParamCount>
|
||
uint32_t render_tile(rectangle const &cliprect, render_delegate callback,
|
||
vertex_t const &v1, vertex_t const &v2);
|
||
|
||
This method enqueues a single *tile* primitive for rendering:
|
||
|
||
* **ParamCount** is the number of live values in the iterated parameter array within each
|
||
**vertex_t** provided; it must be no greater than the **MaxParams** value specified in the
|
||
**poly_manager** template instantiation.
|
||
|
||
* **cliprect** is a reference to a clipping rectangle. All pixels and parameter values are
|
||
clipped to stay within these bounds before being added to the work queues for rendering,
|
||
unless **POLY_FLAG_NO_CLIPPING** was specified as a flag parameter to **poly_manager**.
|
||
|
||
* **callback** is the render callback delegate that will be called to render each extent.
|
||
|
||
* **v1** contains the coordinates and iterated parameters for the top-left corner of the tile.
|
||
|
||
* **v2** contains the coordinates and iterated parameters for the bottom-right corner of the tile.
|
||
|
||
**Return value:** the total number of clipped pixels represented by the enqueued extents.
|
||
|
||
render_triangle
|
||
+++++++++++++++
|
||
::
|
||
|
||
template<int ParamCount>
|
||
uint32_t render_triangle(rectangle const &cliprect, render_delegate callback,
|
||
vertex_t const &v1, vertex_t const &v2, vertex_t const &v3);
|
||
|
||
This method enqueues a single *triangle* primitive for rendering:
|
||
|
||
* **ParamCount** is the number of live values in the iterated parameter array within each
|
||
**vertex_t** provided; it must be no greater than the **MaxParams** value specified in the
|
||
**poly_manager** template instantiation.
|
||
|
||
* **cliprect** is a reference to a clipping rectangle. All pixels and parameter values are
|
||
clipped to stay within these bounds before being added to the work queues for rendering,
|
||
unless **POLY_FLAG_NO_CLIPPING** was specified as a flag parameter to **poly_manager**.
|
||
|
||
* **callback** is the render callback delegate that will be called to render each extent.
|
||
|
||
* **v1**, **v2**, **v3** contain the coordinates and iterated parameters for each vertex
|
||
of the triangle.
|
||
|
||
**Return value:** the total number of clipped pixels represented by the enqueued extents.
|
||
|
||
render_triangle_fan
|
||
+++++++++++++++++++
|
||
::
|
||
|
||
template<int ParamCount>
|
||
uint32_t render_triangle_fan(rectangle const &cliprect, render_delegate callback,
|
||
int numverts, vertex_t const *v);
|
||
|
||
This method enqueues one or more *triangle* primitives for rendering, specified in fan order:
|
||
|
||
* **ParamCount** is the number of live values in the iterated parameter array within each
|
||
**vertex_t** provided; it must be no greater than the **MaxParams** value specified in the
|
||
**poly_manager** template instantiation.
|
||
|
||
* **cliprect** is a reference to a clipping rectangle. All pixels and parameter values are
|
||
clipped to stay within these bounds before being added to the work queues for rendering,
|
||
unless **POLY_FLAG_NO_CLIPPING** was specified as a flag parameter to **poly_manager**.
|
||
|
||
* **callback** is the render callback delegate that will be called to render each extent.
|
||
|
||
* **numverts** is the total number of vertices provided; it must be at least 3.
|
||
|
||
* **v** is a pointer to an array of **vertex_t** objects containing the coordinates and iterated
|
||
parameters for all the triangles, in fan order. This means that the first vertex is fixed.
|
||
So if 5 vertices are provided, indicating 3 triangles, the vertices used will be:
|
||
(0,1,2) (0,2,3) (0,3,4)
|
||
|
||
**Return value:** the total number of clipped pixels represented by the enqueued extents.
|
||
|
||
render_triangle_strip
|
||
+++++++++++++++++++++
|
||
::
|
||
|
||
template<int ParamCount>
|
||
uint32_t render_triangle_strip(rectangle const &cliprect, render_delegate callback,
|
||
int numverts, vertex_t const *v);
|
||
|
||
This method enqueues one or more *triangle* primitives for rendering, specified in strip order:
|
||
|
||
* **ParamCount** is the number of live values in the iterated parameter array within each
|
||
**vertex_t** provided; it must be no greater than the **MaxParams** value specified in the
|
||
**poly_manager** template instantiation.
|
||
|
||
* **cliprect** is a reference to a clipping rectangle. All pixels and parameter values are
|
||
clipped to stay within these bounds before being added to the work queues for rendering,
|
||
unless **POLY_FLAG_NO_CLIPPING** was specified as a flag parameter to **poly_manager**.
|
||
|
||
* **callback** is the render callback delegate that will be called to render each extent.
|
||
|
||
* **numverts** is the total number of vertices provided; it must be at least 3.
|
||
|
||
* **v** is a pointer to an array of **vertex_t** objects containing the coordinates and iterated
|
||
parameters for all the triangles, in strip order.
|
||
So if 5 vertices are provided, indicating 3 triangles, the vertices used will be:
|
||
(0,1,2) (1,2,3) (2,3,4)
|
||
|
||
**Return value:** the total number of clipped pixels represented by the enqueued extents.
|
||
|
||
render_polygon
|
||
++++++++++++++
|
||
::
|
||
|
||
template<int NumVerts, int ParamCount>
|
||
uint32_t render_polygon(rectangle const &cliprect, render_delegate callback, vertex_t const *v);
|
||
|
||
This method enqueues a single *polygon* primitive for rendering:
|
||
|
||
* **NumVerts** is the number of vertices in the polygon.
|
||
|
||
* **ParamCount** is the number of live values in the iterated parameter array within each
|
||
**vertex_t** provided; it must be no greater than the **MaxParams** value specified in the
|
||
**poly_manager** template instantiation.
|
||
|
||
* **cliprect** is a reference to a clipping rectangle. All pixels and parameter values are
|
||
clipped to stay within these bounds before being added to the work queues for rendering,
|
||
unless **POLY_FLAG_NO_CLIPPING** was specified as a flag parameter to **poly_manager**.
|
||
|
||
* **callback** is the render callback delegate that will be called to render each extent.
|
||
|
||
* **v** is a pointer to an array of **vertex_t** objects containing the coordinates and iterated
|
||
parameters for the polygon. Vertices are assumed to be in either clockwise or anticlockwise
|
||
order.
|
||
|
||
**Return value:** the total number of clipped pixels represented by the enqueued extents.
|
||
|
||
render_extents
|
||
++++++++++++++
|
||
::
|
||
|
||
template<int ParamCount>
|
||
uint32_t render_extents(rectangle const &cliprect, render_delegate callback,
|
||
int startscanline, int numscanlines, extent_t const *extents);
|
||
|
||
This method enqueues custom extents directly:
|
||
|
||
* **ParamCount** is the number of live values in the iterated parameter array within each
|
||
**vertex_t** provided; it must be no greater than the **MaxParams** value specified in the
|
||
**poly_manager** template instantiation.
|
||
|
||
* **cliprect** is a reference to a clipping rectangle. All pixels and parameter values are
|
||
clipped to stay within these bounds before being added to the work queues for rendering,
|
||
unless **POLY_FLAG_NO_CLIPPING** was specified as a flag parameter to **poly_manager**.
|
||
|
||
* **callback** is the render callback delegate that will be called to render each extent.
|
||
|
||
* **startscanline** is the Y coordinate of the first extent provided.
|
||
|
||
* **numscanlines** is the number of extents provided.
|
||
|
||
* **extents** is a pointer to an array of **extent_t** objects containing the start/stop
|
||
X coordinates and iterated parameters. The **userdata** field of the source extents is
|
||
copied to the target as well (this field is otherwise unused for all other types of
|
||
rendering).
|
||
|
||
**Return value:** the total number of clipped pixels represented by the enqueued extents.
|
||
|
||
zclip_if_less
|
||
+++++++++++++
|
||
::
|
||
|
||
template<int ParamCount>
|
||
int zclip_if_less(int numverts, vertex_t const *v, vertex_t *outv, BaseType clipval);
|
||
|
||
This method is a helper method to clip a polygon against a provided Z value. It assumes
|
||
that the first iterated parameter in **vertex_t** represents the Z coordinate. If any edge
|
||
crosses the Z plane represented by **clipval** that edge is clipped.
|
||
|
||
* **ParamCount** is the number of live values in the iterated parameter array within each
|
||
**vertex_t** provided; it must be no greater than the **MaxParams** value specified in the
|
||
**poly_manager** template instantiation.
|
||
|
||
* **numverts** is the number of vertices in the input array.
|
||
|
||
* **v** is a pointer to the input array of **vertex_t** objects.
|
||
|
||
* **outv** is a pointer to the output array of **vertex_t** objects. **v** and **outv**
|
||
cannot overlap or point to the same memory.
|
||
|
||
* **clipval** is the value to compare parameter 0 against for clipping.
|
||
|
||
**Return value:** the number of output vertices written to **outv**.
|
||
Note that by design it is possible for this method to produce more vertices than the
|
||
input array, so callers should ensure there is enough room in the output buffer to
|
||
accommodate this.
|
||
|
||
|
||
Example Renderer
|
||
----------------
|
||
|
||
Here is a complete example of how to create a software 3D renderer using **poly_manager**.
|
||
Our example renderer will only handle flat and Gouraud-shaded triangles with depth (Z)
|
||
buffering.
|
||
|
||
|
||
Types
|
||
~~~~~
|
||
|
||
The first thing we need to define is our *externally-visible* vertex format, which is distinct
|
||
from the internal **vertex_t** that **poly_manager** will define. In theory you could
|
||
use **vertex_t** directly, but the generic nature of **poly_manager**’s iterated parameters
|
||
make it awkward::
|
||
|
||
struct example_vertex
|
||
{
|
||
float x, y, z; // X,Y,Z coordinates
|
||
rgb_t color; // color at this vertex
|
||
};
|
||
|
||
Next we define the **ObjectType** needed by **poly_manager**. For our simple case, we
|
||
define an **example_object_data** struct that consists of pointers to our rendering buffers,
|
||
plus a couple of fixed values that are consumed in some cases. More complex renderers would
|
||
typically have many more object-wide parameters defined here::
|
||
|
||
struct example_object_data
|
||
{
|
||
bitmap_rgb32 *dest; // pointer to the rendering bitmap
|
||
bitmap_ind16 *depth; // pointer to the depth bitmap
|
||
rgb_t color; // overall color (for clearing and flat shaded case)
|
||
uint16_t depthval; // fixed depth v alue (for clearing)
|
||
};
|
||
|
||
Now it’s time to define our renderer class, which we derive from **poly_manager**. As
|
||
template parameters we specify ``float`` as the base type for our data, since that will
|
||
be enough accuracy for this example, and we also provide our **example_object_data** as
|
||
the **ObjectType** class, plus the maximum number of iterated parameters our renderer
|
||
will ever need (4 in this case)::
|
||
|
||
class example_renderer : public poly_manager<float, example_object_data, 4>
|
||
{
|
||
public:
|
||
example_renderer(running_machine &machine, uint32_t width, uint32_t height);
|
||
|
||
bitmap_rgb32 *swap_buffers();
|
||
|
||
void clear_buffers(rgb_t color, uint16_t depthval);
|
||
void draw_triangle(example_vertex const *verts);
|
||
|
||
private:
|
||
static uint16_t ooz_to_depthval(float ooz);
|
||
|
||
void draw_triangle_flat(example_vertex const *verts);
|
||
void draw_triangle_gouraud(example_vertex const *verts);
|
||
|
||
void render_clear(int32_t y, extent_t const &extent, example_object_data const &object, int threadid);
|
||
void render_flat(int32_t y, extent_t const &extent, example_object_data const &object, int threadid);
|
||
void render_gouraud(int32_t y, extent_t const &extent, example_object_data const &object, int threadid);
|
||
|
||
int m_draw_buffer;
|
||
bitmap_rgb32 m_display[2];
|
||
bitmap_ind16 m_depth;
|
||
};
|
||
|
||
|
||
Constructor
|
||
~~~~~~~~~~~
|
||
|
||
The constructor for our example renderer just initializes **poly_manager** and allocates
|
||
the rendering and depth buffers::
|
||
|
||
example_renderer::example_renderer(running_machine &machine, uint32_t width, uint32_t height) :
|
||
poly_manager(machine),
|
||
m_draw_buffer(0)
|
||
{
|
||
// allocate two display buffers and a depth buffer
|
||
m_display[0].allocate(width, height);
|
||
m_display[1].allocate(width, height);
|
||
m_depth.allocate(width, height);
|
||
}
|
||
|
||
|
||
swap_buffers
|
||
~~~~~~~~~~~~
|
||
|
||
The first interesting method in our renderer is ``swap_buffers()``, which returns a pointer to
|
||
the buffer we’ve been drawing to, and sets up the other buffer as the new drawing target. The
|
||
idea is that the display update handler will call this method to get ahold of the bitmap to
|
||
display to the user::
|
||
|
||
bitmap_rgb32 *example_renderer::swap_buffers()
|
||
{
|
||
// wait for any rendering to complete before returning the buffer
|
||
wait("swap_buffers");
|
||
|
||
// return the current draw buffer and then switch to the other
|
||
// for future drawing
|
||
bitmap_rgb32 *result = &m_display[m_draw_buffer];
|
||
m_draw_buffer ^= 1;
|
||
return result;
|
||
}
|
||
|
||
The most important thing here to note here is the call to **poly_manager**’s ``wait()``, which
|
||
will block the current thread until all rendering is complete. This is important because
|
||
otherwise the caller may receive a bitmap that is still being drawn to, leading to torn
|
||
or corrupt visuals.
|
||
|
||
|
||
clear_buffers
|
||
~~~~~~~~~~~~~
|
||
|
||
One of the most common operations to perform when doing 3D rendering is to initialize or
|
||
clear the display and depth buffers to a known value. This method below leverages
|
||
the *tile* primitive to render a rectangle over the screen by passing in (0,0) and (width,height)
|
||
for the two vertices.
|
||
|
||
Because the color and depth values to clear the buffer to are constant, they are stored in
|
||
a freshly-allocated **example_object_data** object, along with a pointer to the buffers in
|
||
question. The ``render_tile()`` call is made with a ``<0>`` suffix indicating that there are
|
||
no iterated parameters to worry about::
|
||
|
||
void example_renderer::clear_buffers(rgb_t color, uint16_t depthval)
|
||
{
|
||
// allocate object data and populate it with information needed
|
||
example_object_data &object = object_data().next();
|
||
object.dest = &m_display[m_draw_buffer];
|
||
object.depth = &m_depth;
|
||
object.color = color;
|
||
object.depthval = depthval;
|
||
|
||
// top,left coordinate is always (0,0)
|
||
vertex_t topleft;
|
||
topleft.x = 0;
|
||
topleft.y = 0;
|
||
|
||
// bottom,right coordinate is (width,height)
|
||
vertex_t botright;
|
||
botright.x = m_display[0].width();
|
||
botright.y = m_display[0].height();
|
||
|
||
// render as a tile with 0 iterated parameters
|
||
render_tile<0>(m_display[0].cliprect(),
|
||
render_delegate(&example_renderer::render_clear, this),
|
||
topleft, botright);
|
||
}
|
||
|
||
The render callback provided to ``render_tile()`` is also defined (privately) in our class,
|
||
and handles a single span. Note how the rendering parameters are extracted from the
|
||
**example_object_data** struct provided::
|
||
|
||
void example_renderer::render_clear(int32_t y, extent_t const &extent, example_object_data const &object, int threadid)
|
||
{
|
||
// get pointers to the start of the depth buffer and destination scanlines
|
||
uint16_t *depth = &object.depth->pix(y);
|
||
uint32_t *dest = &object.dest->pix(y);
|
||
|
||
// loop over the full extent and just store the constant values from the object
|
||
for (int x = extent.startx; x < extent.stopx; x++)
|
||
{
|
||
dest[x] = object.color;
|
||
depth[x] = object.depthval;
|
||
}
|
||
}
|
||
|
||
Another important point to make is that the X coordinates provided by extent struct are
|
||
inclusive of startx but exclusive of stopx. Clipping is performed ahead of time so that
|
||
the render callback can focus on laying down pixels as quickly as possible with minimal
|
||
overhead.
|
||
|
||
|
||
draw_triangle
|
||
~~~~~~~~~~~~~
|
||
|
||
Next up, we have our actual triangle rendering function, which will draw a single triangle
|
||
given an array of three vertices provided in the external **example_vertex** format::
|
||
|
||
void example_renderer::draw_triangle(example_vertex const *verts)
|
||
{
|
||
// flat shaded case
|
||
if (verts[0].color == verts[1].color && verts[0].color == verts[2].color)
|
||
draw_triangle_flat(verts);
|
||
else
|
||
draw_triangle_gouraud(verts);
|
||
}
|
||
|
||
Because it is simpler and faster to render a flat shaded triangle, the code checks to see
|
||
if the colors are the same on all three vertices. If they are, we call through to a special
|
||
flat-shaded case, otherwise we process it as a full Gouraud-shaded triangle.
|
||
|
||
This is a common technique to optimize rendering performance: identify special cases that
|
||
reduce the per-pixel work, and route them to separate render callbacks that are optimized
|
||
for that special case.
|
||
|
||
|
||
draw_triangle_flat
|
||
~~~~~~~~~~~~~~~~~~
|
||
|
||
Here’s the setup code for rendering a flat-shaded triangle::
|
||
|
||
void example_renderer::draw_triangle_flat(example_vertex const *verts)
|
||
{
|
||
// allocate object data and populate it with information needed
|
||
example_object_data &object = object_data().next();
|
||
object.dest = &m_display[m_draw_buffer];
|
||
object.depth = &m_depth;
|
||
|
||
// in this case the color is constant and specified in the object data
|
||
object.color = verts[0].color;
|
||
|
||
// copy X, Y, and 1/Z into poly_manager vertices
|
||
vertex_t v[3];
|
||
for (int vertnum = 0; vertnum < 3; vertnum++)
|
||
{
|
||
v[vertnum].x = verts[vertnum].x;
|
||
v[vertnum].y = verts[vertnum].y;
|
||
v[vertnum].p[0] = 1.0f / verts[vertnum].z;
|
||
}
|
||
|
||
// render the triangle with 1 iterated parameter (1/Z)
|
||
render_triangle<1>(m_display[0].cliprect(),
|
||
render_delegate(&example_renderer::render_flat, this),
|
||
v[0], v[1], v[2]);
|
||
}
|
||
|
||
First, we put the fixed color into the **example_object_data** directly, and then fill
|
||
out three **vertex_t** objects with the X and Y coordinates in the usual spot, and 1/Z
|
||
as our one and only iterated parameter. (We use 1/Z here because iterated parameters are
|
||
interpolated linearly in screen space. Z is not linear in screen space, but 1/Z is due to
|
||
perspective correction.)
|
||
|
||
Our flat-shaded case then calls ``render_trangle`` specifying ``<1>`` iterated parameter to
|
||
interpolate, and pointing to a special-case flat render callback::
|
||
|
||
void example_renderer::render_flat(int32_t y, extent_t const &extent, example_object_data const &object, int threadid)
|
||
{
|
||
// get pointers to the start of the depth buffer and destination scanlines
|
||
uint16_t *depth = &object.depth->pix(y);
|
||
uint32_t *dest = &object.dest->pix(y);
|
||
|
||
// get the starting 1/Z value and the delta per X
|
||
float ooz = extent.param[0].start;
|
||
float doozdx = extent.param[0].dpdx;
|
||
|
||
// iterate over the extent
|
||
for (int x = extent.startx; x < extent.stopx; x++)
|
||
{
|
||
// convert the 1/Z value into an integral depth value
|
||
uint16_t depthval = ooz_to_depthval(ooz);
|
||
|
||
// if closer than the current pixel, copy the color and depth value
|
||
if (depthval < depth[x])
|
||
{
|
||
dest[x] = object.color;
|
||
depth[x] = depthval;
|
||
}
|
||
|
||
// regardless, update the 1/Z value for the next pixel
|
||
ooz += doozdx;
|
||
}
|
||
}
|
||
|
||
This render callback is a bit more involved than the clearing case.
|
||
|
||
First, we have an iterated parameter (1/Z) to deal with, whose starting and X-delta
|
||
values we extract from the extent before the start of the inner loop.
|
||
|
||
Second, we perform depth buffer testing, using ``ooz_to_depthval()`` as a helper
|
||
to transform the floating-point 1/Z value into a 16-bit integer. We compare this value against
|
||
the current depth buffer value, and only store the pixel/depth value if it’s less.
|
||
|
||
At the end of each iteration, we advance the 1/Z value by the X-delta in preparation for the
|
||
next pixel.
|
||
|
||
|
||
draw_triangle_gouraud
|
||
~~~~~~~~~~~~~~~~~~~~~
|
||
|
||
Finally we get to the code for the full-on Gouraud-shaded case::
|
||
|
||
void example_renderer::draw_triangle_gouraud(example_vertex const *verts)
|
||
{
|
||
// allocate object data and populate it with information needed
|
||
example_object_data &object = object_data().next();
|
||
object.dest = &m_display[m_draw_buffer];
|
||
object.depth = &m_depth;
|
||
|
||
// copy X, Y, 1/Z, and R,G,B into poly_manager vertices
|
||
vertex_t v[3];
|
||
for (int vertnum = 0; vertnum < 3; vertnum++)
|
||
{
|
||
v[vertnum].x = verts[vertnum].x;
|
||
v[vertnum].y = verts[vertnum].y;
|
||
v[vertnum].p[0] = 1.0f / verts[vertnum].z;
|
||
v[vertnum].p[1] = verts[vertnum].color.r();
|
||
v[vertnum].p[2] = verts[vertnum].color.g();
|
||
v[vertnum].p[3] = verts[vertnum].color.b();
|
||
}
|
||
|
||
// render the triangle with 4 iterated parameters (1/Z, R, G, B)
|
||
render_triangle<4>(m_display[0].cliprect(),
|
||
render_delegate(&example_renderer::render_gouraud, this),
|
||
v[0], v[1], v[2]);
|
||
}
|
||
|
||
Here we have 4 iterated parameters: the 1/Z depth value, plus red, green, and blue,
|
||
stored as floating point values. We call ``render_triangle()`` with ``<4>`` as the
|
||
number of iterated parameters to process, and point to the full Gouraud render callback::
|
||
|
||
void example_renderer::render_gouraud(int32_t y, extent_t const &extent, example_object_data const &object, int threadid)
|
||
{
|
||
// get pointers to the start of the depth buffer and destination scanlines
|
||
uint16_t *depth = &object.depth->pix(y);
|
||
uint32_t *dest = &object.dest->pix(y);
|
||
|
||
// get the starting 1/Z value and the delta per X
|
||
float ooz = extent.param[0].start;
|
||
float doozdx = extent.param[0].dpdx;
|
||
|
||
// get the starting R,G,B values and the delta per X as 8.24 fixed-point values
|
||
uint32_t r = uint32_t(extent.param[1].start * float(1 << 24));
|
||
uint32_t drdx = uint32_t(extent.param[1].dpdx * float(1 << 24));
|
||
uint32_t g = uint32_t(extent.param[2].start * float(1 << 24));
|
||
uint32_t dgdx = uint32_t(extent.param[2].dpdx * float(1 << 24));
|
||
uint32_t b = uint32_t(extent.param[3].start * float(1 << 24));
|
||
uint32_t dbdx = uint32_t(extent.param[3].dpdx * float(1 << 24));
|
||
|
||
// iterate over the extent
|
||
for (int x = extent.startx; x < extent.stopx; x++)
|
||
{
|
||
// convert the 1/Z value into an integral depth value
|
||
uint16_t depthval = ooz_to_depthval(ooz);
|
||
|
||
// if closer than the current pixel, assemble the color
|
||
if (depthval < depth[x])
|
||
{
|
||
dest[x] = rgb_t(r >> 24, g >> 24, b >> 24);
|
||
depth[x] = depthval;
|
||
}
|
||
|
||
// regardless, update the 1/Z and R,G,B values for the next pixel
|
||
ooz += doozdx;
|
||
r += drdx;
|
||
g += dgdx;
|
||
b += dbdx;
|
||
}
|
||
}
|
||
|
||
This follows the same pattern as the flat-shaded callback, except we have 4 iterated parameters
|
||
to step through.
|
||
|
||
Note that even though the iterated parameters are of ``float`` type, we convert the
|
||
color values to fixed-point integers when iterating over them. This saves us doing 3
|
||
float-to-int conversions each pixel. The original RGB values were 0-255, so interpolation
|
||
can only produce values in the 0-255 range. Thus we can use 24 bits of a 32-bit integer as
|
||
the fraction, which is plenty accurate for this case.
|
||
|
||
|
||
Advanced Topic: the poly_array class
|
||
------------------------------------
|
||
|
||
**poly_array** is a template class that is used to manage a dynamically-sized vector of
|
||
objects whose lifetime starts at allocation and ends when ``reset()`` is called. The
|
||
**poly_manager** class uses several **poly_array** objects internally, including one for
|
||
allocated **ObjectType** data, one for each primitive rendered, and one for holding all
|
||
allocated extents.
|
||
|
||
**poly_array** has an additional property where after a reset it retains a copy of the most
|
||
recently allocated object. This ensures that callers can always call ``last()`` and get
|
||
a valid object, even immediately after a reset.
|
||
|
||
The **poly_array** class requires two template parameters::
|
||
|
||
template<class ArrayType, int TrackingCount>
|
||
class poly_array;
|
||
|
||
These parameters are:
|
||
|
||
* **ArrayType** is the type of object you wish to allocate and manage.
|
||
|
||
* **TrackingCount** is the number of objects you wish to preserve after a reset. Typically
|
||
this value is either 0 (you don’t care to track any objects) or 1 (you only need one
|
||
object); however, if you are using **poly_array** to manage a shared collection of
|
||
objects across several independent consumers, it can be higher. See below for an example
|
||
where this might be handy.
|
||
|
||
Note that objects allocated by **poly_array** are owned by **poly_array** and will be
|
||
automatically freed upon exit.
|
||
|
||
**poly_array** is optimized for use in high frequency multi-threaded systems. Therefore,
|
||
one added feature of the class is that it rounds the allocation size of **ArrayType** to
|
||
the nearest cache line boundary, on the assumption that neighboring entries could be
|
||
accessed by different cores simultaneously. Keeping each **ArrayType** object in its
|
||
own cache line ensures no false sharing performance impacts.
|
||
|
||
Currently, **poly_array** has no mechanism to determine cache line size at runtime, so
|
||
it presumes that 64 bytes is a typical cache line size, which is true for most x64 and ARM
|
||
chips as of 2021. This value can be altered by changing the **CACHE_LINE_SHIFT** constant
|
||
defined at the top of the class.
|
||
|
||
Objects allocated by **poly_array** are created in 64k chunks. At construction time, one
|
||
chunk’s worth of objects is allocated up front. The chunk size is controlled by the
|
||
**CHUNK_GRANULARITY** constant defined at the top of the class.
|
||
|
||
As more objects are allocated, if **poly_array** runs out of space, it will dynamically
|
||
allocate more. This will produce discontiguous chunks of objects until the next ``reset()``
|
||
call, at which point **poly_array** will reallocate all the objects into a contiguous
|
||
vector once again.
|
||
|
||
For the case where **poly_array** is used to manage a shared pool of objects, it can be
|
||
configured to retain multiple most recently allocated items by using a **TrackingCount**
|
||
greater than 1. For example, if **poly_array** is managing objects for two texture units,
|
||
then it can set **TrackingCount** equal to 2, and pass the index of the texture unit in
|
||
calls to ``next()`` and ``last()``. After a reset, **poly_array** will remember the most
|
||
recently allocated object for each of the units independently.
|
||
|
||
|
||
Methods
|
||
~~~~~~~
|
||
|
||
poly_array
|
||
++++++++++
|
||
::
|
||
|
||
poly_array();
|
||
|
||
The **poly_array** constructor requires no parameters and simply pre-allocates one
|
||
chunk of objects in preparation for future allocations.
|
||
|
||
count
|
||
+++++
|
||
::
|
||
|
||
u32 count() const;
|
||
|
||
**Return value:** the number of objects currently allocated.
|
||
|
||
max
|
||
+++
|
||
::
|
||
|
||
u32 max() const;
|
||
|
||
**Return value:** the maximum number of objects ever allocated at one time.
|
||
|
||
itemsize
|
||
++++++++
|
||
::
|
||
|
||
size_t itemsize() const;
|
||
|
||
**Return value:** the size of an object, rounded up to the nearest cache line boundary.
|
||
|
||
allocated
|
||
+++++++++
|
||
::
|
||
|
||
u32 allocated() const;
|
||
|
||
**Return value:** the number of objects that fit within what’s currently been allocated.
|
||
|
||
byindex
|
||
+++++++
|
||
::
|
||
|
||
ArrayType &byindex(u32 index);
|
||
|
||
Returns a reference to an object in the array by index. Equivalent to [**index**] on a
|
||
normal array:
|
||
|
||
* **index** is the index of the item you wish to reference.
|
||
|
||
**Return value:** a reference to the object in question. Since a reference is returned,
|
||
it is your responsibility to ensure that **index** is less than ``count()`` as there
|
||
is no mechanism to return an invalid result.
|
||
|
||
contiguous
|
||
++++++++++
|
||
::
|
||
|
||
ArrayType *contiguous(u32 index, u32 count, u32 &chunk);
|
||
|
||
Returns a pointer to the base of a contiguous section of **count** items starting at
|
||
**index**. Because **poly_array** dynamically resizes, it may not be possible to access
|
||
all **count** objects contiguously, so the number of actually contiguous items is
|
||
returned in **chunk**:
|
||
|
||
* **index** is the index of the first item you wish to access contiguously.
|
||
|
||
* **count** is the number of items you wish to access contiguously.
|
||
|
||
* **chunk** is a reference to a variable that will be set to the actual number of
|
||
contiguous items available starting at **index**. If **chunk** is less than **count**,
|
||
then the caller should process the **chunk** items returned, then call ``countiguous()``
|
||
again at (**index** + **chunk**) to access the rest.
|
||
|
||
**Return value:** a pointer to the first item in the contiguous chunk. No range checking
|
||
is performed, so it is your responsibility to ensure that **index** + **count** is less
|
||
than or equal to ``count()``.
|
||
|
||
indexof
|
||
+++++++
|
||
::
|
||
|
||
int indexof(ArrayType &item) const;
|
||
|
||
Returns the index within the array of the given item:
|
||
|
||
* **item** is a reference to an item in the array.
|
||
|
||
**Return value:** the index of the item. It should always be the case that::
|
||
|
||
array.indexof(array.byindex(index)) == index
|
||
|
||
reset
|
||
+++++
|
||
::
|
||
|
||
void reset();
|
||
|
||
Resets the **poly_array** by semantically deallocating all objects. If previous allocations
|
||
created a discontiguous array, a fresh vector is allocated at this time so that future
|
||
allocations up to the same level will remain contiguous.
|
||
|
||
Note that the **ArrayType** destructor is *not* called on objects as they are deallocated.
|
||
|
||
**Return value:** none.
|
||
|
||
next
|
||
++++
|
||
::
|
||
|
||
ArrayType &next(int tracking_index = 0);
|
||
|
||
Allocates a new object and returns a reference to it. If there is not enough space for
|
||
a new object in the current array, a new discontiguous array is created to hold it:
|
||
|
||
* **tracking_index** is the tracking index you wish to assign the new item to. In the
|
||
common case this is 0, but could be non-zero if using a **TrackingCount** greater than 1.
|
||
|
||
**Return value:** a reference to the object. Note that the placement new operator is
|
||
called on this object, so the default **ArrayType** constructor will be invoked here.
|
||
|
||
last
|
||
++++
|
||
::
|
||
|
||
ArrayType &last(int tracking_index = 0) const;
|
||
|
||
Returns a reference to the last object allocated:
|
||
|
||
* **tracking_index** is the tracking index whose object you want. In the
|
||
common case this is 0, but could be non-zero if using a **TrackingCount** greater than 1.
|
||
**poly_array** remembers the most recently allocated object independently for each
|
||
**tracking_index**.
|
||
|
||
**Return value:** a reference to the last allocated object.
|