Change the sign of go (or in other terms a12 and a21 matrix stencil
elements). This should make further optimization of matrix population
easier.
In addition hopefully improve the readability of the code by sacrifying
overloads for more verbose member names.
Memory management in plib is now alignment-aware. All allocations
respect c++11 alignas. Selected classes like parray and aligned_vector
also provide hints (__builtin_assume_aligned) to g++ and clang.
The alignment optimizations have little impact on the current use cases.
They only become effective on bigger data processing.
What has a measurable impact is memory pooling. This speeds up netlist
games like breakout and pong by about 5%.
Tested with linux, macosx and windows cross builds. All features are
disabled since I can not rule out they may temporarily break more exotic
builds.
Set USE_MEMPOOL to 1 to try this (max 5% performance increase).
For mingw, there is no alignment support. This triggers -Wattribute
errors which due to -Werror crash the build.
At least on macosx memory used by an object seems to be invalidated
before the dtor is executed. This of course is deadly for child objects
with references to the parent-in-deletion which may call back into the
parent.
One of the worst issues I had to fix. Ever. Lesson learnt: No tricks in
dtors. Never.
startup strategies. This determines the order of device triggering.
0: Full - trigger all delegates. Next all devices not touched.
1: Backwards - trigger all devices backwards (only update delegate)
2: Forward - trigger all devices forward (only update delegate)
- convert macros to c++ code.
- order of device creation should not depend on std lib.
- some state saving cleanup.
- added support for clang-tidy to makefile.
- modifications triggered by clang-tidy-9.
Useful for debugging purposes in the end - but not performance.
/*! Store input values in logic_terminal_t.
*
* Set to 1 to store values in logic_terminal_t instead of
* accessing them indirectly by pointer from logic_net_t.
* This approach is stricter and should identify bugs in
* the netlist core faster.
* By default it is disabled since it is not as fast as
* the default approach.
*
*/
#define USE_COPY_INSTEAD_OF_REFERENCE (0)
- Removed dead code.
- nltool now adds a define NLTOOL_VERSION. This can be tested in
netlists. It is used in kidniki to ensure I stop committing
debug parameters.
- Optimized the proposal for no-deactivate hints.
- Documented in breakout that hints were manually optimized.
- Minor optimizations in the order of 2% enhancement.
Still some work ahead to separate interface from execution. This is a
preparation to switch to another sparse matrix format easily which may
be better suited for parallel processing.
On the linear algebra side there are some nice additions:
- Two additional sort modes: One tries to obtain a upper left identity
matrix, the other prefers a diagonal band matrix structure. Both deliver
slightly better performance than just sorting.
- Parallel execution analysis for Gaussian elimination and LU solve.
This determines which operations may be done independently.
All of this is not really useful right now. The matrix sizes are below
100 nets. I estimate that we at least need four times more so that CPU
parallel processing overhead pays off. For GPU, add another order. But
it's nice to have code which may scale.
- src/lib/netlist/solver/mat_cr.h:143:32: error: call to 'abs' is ambiguous
- src/lib/netlist/solver/nld_ms_direct.h:62:19: error: non-type template argument evaluates to 18446744073709551488, which cannot be narrowed to type 'int' [-Wc++11-narrowing]