As a way to demonstrate Hybridizer’s capabilities, we wrote a simple Windows demo application for fractal rendering.
Demo application
The compute code is written in C# and exercises floating-point arithmetic, function calls, conditions, loops, Parallel.For, and bit manipulation:
[Kernel]
public static int IterCount(double cx, double cy, int maxiter)
{
    // Iterate z = z^2 + c from z = 0 and count the steps until |z|^2 exceeds 4.
    int result = 0;
    double x = 0.0;
    double y = 0.0;
    double xx = 0.0, yy = 0.0;
    while (xx + yy <= 4.0 && result < maxiter)
    {
        xx = x * x;
        yy = y * y;
        double xtmp = xx - yy + cx;
        y = 2.0 * x * y + cy;
        x = xtmp;
        result += 1;
    }
    return result;
}

[EntryPoint]
public static unsafe void Render(uint* output,
    double fx, double fy, double sx, double sy,
    int height, int width, int maxiter)
{
    Parallel.For(0, width * height, tid =>
    {
        // Recover the pixel row (i) and column (j) from the linear index.
        int i = tid / width;
        int j = tid - i * width;
        // Size of one pixel in the complex plane.
        double hx = sx / (double)width;
        double hy = sy / (double)height;
        // Point of the complex plane sampled by this pixel.
        double cx = fx + hx * j;
        double cy = fy + hy * i;
        output[tid] = GetColor(IterCount(cx, cy, maxiter), maxiter);
    });
}

[Kernel]
public static uint GetColor(int iterCount, int maxiter)
{
    // Points that never escaped are drawn in black.
    if (iterCount == maxiter)
    {
        return 0;
    }
    // Scale the iteration count to 0..255 and shift it into the second byte.
    return ((uint)(iterCount * (255.0 / (double)(maxiter - 1)))) << 8;
}
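To see what the index arithmetic in Render produces, here is a minimal sketch of the same computation as plain managed C#, roughly what the "C#" option runs. It is not the demo's actual code: the Hybridizer attributes are dropped, the `uint*` buffer is replaced by a managed array, and the names `MandelbrotSketch` and `RenderManaged` as well as the view window passed in `Main` are assumptions chosen for illustration.

```csharp
using System;
using System.Threading.Tasks;

public static class MandelbrotSketch
{
    // Same escape-time loop as IterCount in the listing, without [Kernel].
    public static int IterCount(double cx, double cy, int maxiter)
    {
        int result = 0;
        double x = 0.0, y = 0.0, xx = 0.0, yy = 0.0;
        while (xx + yy <= 4.0 && result < maxiter)
        {
            xx = x * x;
            yy = y * y;
            double xtmp = xx - yy + cx;
            y = 2.0 * x * y + cy;
            x = xtmp;
            result += 1;
        }
        return result;
    }

    // Same color mapping as GetColor in the listing.
    public static uint GetColor(int iterCount, int maxiter)
    {
        if (iterCount == maxiter)
        {
            return 0;
        }
        return ((uint)(iterCount * (255.0 / (double)(maxiter - 1)))) << 8;
    }

    // Hypothetical managed-array variant of Render (no unsafe pointer).
    public static uint[] RenderManaged(double fx, double fy, double sx, double sy,
                                       int height, int width, int maxiter)
    {
        var output = new uint[width * height];
        Parallel.For(0, width * height, tid =>
        {
            int i = tid / width;        // pixel row
            int j = tid - i * width;    // pixel column
            double hx = sx / (double)width;
            double hy = sy / (double)height;
            double cx = fx + hx * j;
            double cy = fy + hy * i;
            output[tid] = GetColor(IterCount(cx, cy, maxiter), maxiter);
        });
        return output;
    }

    public static void Main()
    {
        // Assumed view: the square [-2, 2] x [-2, 2] on a 64 x 64 image.
        uint[] pixels = RenderManaged(-2.0, -2.0, 4.0, 4.0, 64, 64, 256);

        // Row 32, column 32 samples c = 0, which never escapes.
        Console.WriteLine($"center: 0x{pixels[32 * 64 + 32]:X8}"); // 0x00000000
        // Row 0, column 0 samples c = -2 - 2i, which escapes after 2 steps.
        Console.WriteLine($"corner: 0x{pixels[0]:X8}");            // 0x00000200
    }
}
```

Because the loop tests `xx + yy` computed from the previous iterate, a point is counted one step after it actually leaves the radius-2 disk. That is harmless for rendering, but it explains why the corner pixel reports two iterations and not one.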
This code is embedded in a simple Windows Forms application with user controls (zoom in, zoom out, increase the iteration count). Radio buttons on the left let the user choose which version of the code is executed.

There are four options:
- C#: plain C# code
- CUDA: CUDA code generated by Hybridizer, running on the most recent GPU in your machine
- AVX: native C++ generated by Hybridizer, specialized for AVX instructions
- AVX2: native C++ generated by Hybridizer, specialized for AVX2 instructions (fused multiply-add operations are reconstructed)
For each frame, a high-resolution clock measures the computation time, which the application then displays.
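The text does not say which clock the application uses. As a minimal sketch, assuming the standard `System.Diagnostics.Stopwatch` class, per-frame timing could look like this. The `FrameTimer` class, the `MeasureMilliseconds` helper, and the sleeping stand-in workload are hypothetical names introduced for illustration.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class FrameTimer
{
    // Runs one frame's computation and returns the elapsed wall-clock time in ms.
    public static double MeasureMilliseconds(Action renderFrame)
    {
        Stopwatch watch = Stopwatch.StartNew();
        renderFrame();
        watch.Stop();
        return watch.Elapsed.TotalMilliseconds;
    }

    public static void Main()
    {
        // Stand-in for a call to Render; a real frame would go here.
        double ms = MeasureMilliseconds(() => Thread.Sleep(20));

        // IsHighResolution reports whether a hardware performance counter backs the timer.
        Console.WriteLine($"High-resolution timer: {Stopwatch.IsHighResolution}");
        Console.WriteLine($"Rendering time: {ms:F2} ms");
    }
}
```

Timing only the compute call, and not the bitmap update that follows it, keeps the displayed figure comparable across the four options.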
Performance
For more accurate measurements of Hybridizer's performance, please see our blog posts about Mandelbrot and about Hybridizer versus Numerics.Vector.
Measured in double precision with 10K iterations on a GeForce 1080 Ti (CUDA) and a Core i7 4770S (C# and AVX):
| Flavor | Rendering time (ms) | Speed-up |
|---|---|---|
| C# | 2871 | 1 |
| AVX | 945 | 3.04 |
| CUDA | 197 | 14.6 |
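The speed-up column is the C# rendering time divided by the rendering time of each flavor. The short sketch below recomputes it from the timings in the table. The `SpeedUp` class and its `Compute` method are illustrative names, and the printed ratios carry two decimals, so they can differ from the table in the last digit.

```csharp
using System;
using System.Globalization;

public static class SpeedUp
{
    // Speed-up relative to a baseline: baseline time / flavor time.
    public static double Compute(double baselineMs, double flavorMs)
    {
        return baselineMs / flavorMs;
    }

    public static void Main()
    {
        // Rendering times in milliseconds, copied from the table.
        const double csharpMs = 2871.0;
        var flavors = new (string Name, double Ms)[]
        {
            ("C#", 2871.0),
            ("AVX", 945.0),
            ("CUDA", 197.0),
        };

        foreach (var flavor in flavors)
        {
            string ratio = Compute(csharpMs, flavor.Ms)
                .ToString("F2", CultureInfo.InvariantCulture);
            Console.WriteLine($"{flavor.Name}: {ratio}x");
        }
    }
}
```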
Looking at the generated assembly, we can see that most of the code is vectorized with AVX2 instructions, including fused multiply-adds (vfmadd231pd, vfmadd132pd):
C5 9D 59 ED vmulpd ymm5,ymm12,ymm5
C4 E2 C5 B8 AC 24 vfmadd231pd ymm5,ymm7,ymmword ptr [rsp+520h]
C5 D5 58 AC 24 20 vaddpd ymm5,ymm5,ymmword ptr [rsp+0B20h]
C5 7D 28 AC 24 20 vmovapd ymm13,ymmword ptr [rsp+0D20h]
C4 41 15 58 ED vaddpd ymm13,ymm13,ymm13
C4 62 D5 98 AC 24 vfmadd132pd ymm13,ymm5,ymmword ptr [rsp+720h]
C5 E5 5E 84 24 80 vdivpd ymm0,ymm3,ymmword ptr [rsp+180h]
C5 FD 29 84 24 C0 vmovapd ymmword ptr [rsp+0C0h],ymm0
C4 C1 15 5C EC vsubpd ymm5,ymm13,ymm12
C4 41 65 5E E8 vdivpd ymm13,ymm3,ymm8
C5 DD 5C D2 vsubpd ymm2,ymm4,ymm2
C4 41 65 5E F1 vdivpd ymm14,ymm3,ymm9
C5 CD 5C E1 vsubpd ymm4,ymm6,ymm1
C5 7D 28 84 24 A0 vmovapd ymm8,ymmword ptr [rsp+0AA0h]
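The vfmadd instructions in this listing each compute a product and a sum with a single rounding, on four doubles at a time. The listing does not say which source expression each instruction comes from, so the following is only a scalar sketch of what a fused multiply-add computes, applied to the `y` update of IterCount. It assumes a runtime that provides `Math.FusedMultiplyAdd` (.NET Core 3.0 or later); the `FmaSketch` class and its method names are hypothetical.

```csharp
using System;

public static class FmaSketch
{
    // The y update as written in IterCount: multiply, round, add, round.
    public static double NextYPlain(double x, double y, double cy)
    {
        return 2.0 * x * y + cy;
    }

    // The same update with an explicit fused multiply-add:
    // (2x * y) + cy is rounded once at the end.
    public static double NextYFused(double x, double y, double cy)
    {
        return Math.FusedMultiplyAdd(2.0 * x, y, cy);
    }

    public static void Main()
    {
        double x = 0.5, y = 0.25, cy = 0.125;
        // These inputs are exactly representable, so both forms agree.
        Console.WriteLine(NextYPlain(x, y, cy)); // 0.375
        Console.WriteLine(NextYFused(x, y, cy)); // 0.375
    }
}
```

For general inputs the two forms can differ in the last bit, because the fused version skips the intermediate rounding of the product.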
Source Code
Find the source code for this application on GitHub: hybridizer-basic-samples — MandelbrotRenderer
Run it
If you have an AVX-compatible CPU or a CUDA-enabled GPU, you can download and run these pre-built binaries: FractalRenderer