## Friday, March 14, 2008

### DrawCircle Performance Tuning

A while ago a friend of mine asked me an interesting programming question: "How would you implement a DrawCircle() function?". "I'd loop and invoke Sin() and Cos() in order to obtain coordinates" I answered. "OK", he replied, "and how would you make it fast? I have heard of an incremental way to do this, without the need for floating point operations at all." (that was on some embedded hardware back then, where floating point operations were really expensive)

This kept me thinking. So why not give it a try right here and now! Let's start with something like this (C#, .NET 2.0 - using a managed environment is OK, as .NET is guaranteed to JIT this code immediately anyway; floating point operations are much faster on modern hardware, still we should see some effect). For sake of brevity, let's draw our circles centered at (x = 0, y = 0):

```private static void DrawCircle1(int radius)
{
double angle = 0;

do
{
double rad = angle * Math.PI / 180.0;

DrawPixel(x, y);

angle++;
} while (angle < 360);
}```

Sure enough, a circle appears on the screen. It looks weird - the line is dotted, not solid. Right, incrementing the angle by 1 each time might imply advancing too fast. An increment value better suited is the angle between two pixels on the circle. Something like this:

```private static void DrawCircle2(int radius)
{
double angle = 0;

double step = Math.Asin(1.0 / radius) *
180.0 / Math.PI;

do
{
double rad = angle * Math.PI / 180.0;

DrawPixel(x, y);

angle += step;
} while (angle < 360);
}```

This works, so that's something to start with. To get an idea about its speed, let's ignore the call to DrawPixel() for a second (and assume that we have a lightning-fast DrawPixel() implementation at that point - and by lightning-fast I mean inlined native code that just does something like an indexed array lookup, a little pointer arithmetic, and one or two bitwise operations on a bitmap bytearray, not slow stuff like SetPixel(), Bitmap.SetPixel() or BufferedImage.setRGB()). Oh yeah, and no synchronous screen refreshing ;-)

Some profiling tells us that his function can calculate the pixels for 160,000 circles (of radius 100) per second (all numbers refer to release mode compiling).

Even with floating point processing units, floating point operations can be expensive in a certain sense - that is, avoiding them altogether, could still cause a measurable improvement. Sure we can build up a large lookup array with pre-calculates Cos- and Sin-values, and interpolate in between them. I used to do that on other occasions, e.g. in my 3D chat project. But here there is no way of telling in advance how precise the lookup array needs to be, and with increasing precision the table will grow in size too.

Another way of speeding up the algorithm is to take advantage of the fact that a circle is symmetric. Wouldn't it be enough to calculate one quarter of the circle, and then just mirror it on the x- and y-axis. But wait - what about calculating only one eighth of the circle, and mirroring it on the diagonal, too? No sooner said than done:

```private static void DrawCircle3(int radius)
{
double angle = 0;

double step = Math.Asin(1.0 / radius) *
180.0 / Math.PI;

do
{
double rad = angle * Math.PI / 180.0;

DrawPixel(x, y);
DrawPixel(x, -y);
DrawPixel(-x, y);
DrawPixel(-x, -y);
DrawPixel(y, x);
DrawPixel(y, -x);
DrawPixel(-y, x);
DrawPixel(-y, -x);

angle += step;
} while (angle <= 45);
}```

This gives us something close to the expected 8-fold speedup (1,200,000 circle calculations per second to be exact). Still it seems like a dead end. We need a different approach.

Let's search for a way to draw a circle without the need of invoking Sin() and Cos(). What does a circle function look like? It's

`x^2 + y^2 = 1`

so:

`y = sqrt(1 - x^2)`

Let's also take advantage of the fact that we know that x is always incremented while looping. How do we know that? Because we just calculate 1/8th of the circle. We are starting at coordinate (x = 0, y = radius), hence on each step will either move right only, or move right and down.

```private static void DrawCircle4(int radius)
{
double x = 0;

do
{
DrawPixel(x, y);
DrawPixel(x, -y);
DrawPixel(-x, y);
DrawPixel(-x, -y);
DrawPixel(y, x);
DrawPixel(y, -x);
DrawPixel(-y, x);
DrawPixel(-y, -x);

x++;
//cast to get rid of decimal places
y = (int)((Math.Sqrt(1.0 -
} while (x <= y);
}```

100,000 circles per second, what a disappointment. I did some quick check, it seems that Math.Pow() does not have dedicated FPU-support, so this might be a reason.

Still this might open a new perspective. Having a look at two neighboring pixels on the circle, what does make them neighbors? Their x- and y-coordinates differ by 1, either x does, or y, or both. Starting at a well-known pixel, e.g. (x = 0, y = radius) we could move along the circle just be deciding whether we increment x and leave y as it is, or decrement y and leave x as it is, or increment x and decrement y.

Once more the circle function:

`x^2 + y^2 = 1`

which equates to:

`x^2 + y^2 - 1 = 0`

Drawing a perfect circle on a discrete field of pixels, there is always a deviation, an error. An error e that can be expressed like this:

`x^2 + y^2 - 1 = e`

The pixel we draw is either inside the perfect circle or outside. When we have moved outside the perfect circle, we have to pull back to the inside, and vice versa. In each iteration, we must keep e as close to 0 as possible.

How does e change on each iteration? Let's assume we increment x and decrement y, so the next pixel is at (x + 1, y - 1):

```e = x^2 + y^2 - 1
nextE = (x + 1)^2 + (y - 1)^2 - 1
= (x^2 + 2x + 1) + (y^2 - 2y + 1) - 1```

The difference between nextE and e is influenced by the following factors:

```dx = (x^2 + 2x + 1) - x^2 = 2x + 1
dy = (y^2 - 2y + 1) - y^2 = -2y + 1```

This is how nextE looks like when x is incremented and y is decremented:

`nextE = e + dx + dy`

If we decide only to increment x and leave y as it is, nextE would be:

`nextE = e + dx`

And in case we only decrement y, nextE is:

`nextE = e + dy`

At this point we just have to figure out whether e stays smallest by moving along the x-axis only, or the y-axis only, or both.

Wow, no need to invoke Sin(), Cos(), Sqrt() or Pow(), no need for floating point functions or values, we can do it all with integers only. Note that we could even avoid the multiplications by a factor of 2 simply by writing (x + x) and (- y - y), but compilers normally replace that by a left-shift anyway. Also, we do not invoke Math.Abs(), but get rid of minus signs on our own.

```private static void DrawCircle5(int radius)
{

int e = 0;
int x = 0;

do
{
DrawPixel(x, y);
DrawPixel(x, -y);
DrawPixel(-x, y);
DrawPixel(-x, -y);
DrawPixel(y, x);
DrawPixel(y, -x);
DrawPixel(-y, x);
DrawPixel(-y, -x);

int dx = 2*x + 1;
int dy = -(2*y) + 1;

int e1 = e + dx;
int e2 = e + dx + dy;
int e3 = e + dy;

e1 = e1 < 0 ? -e1 : e1;
e2 = e2 < 0 ? -e2 : e2;
e3 = e3 < 0 ? -e3 : e3;

if (e1 <= e2 && e1 <= e3)
{
x++;
e += dx;
}
else if (e2 <= e1 && e2 <= e3)
{
y--;
x++;
e += dx + dy;
}
else
{
y--;
e += dy;
}
} while (x <= y);

}```

2,100,000 circle calculations a second. It cannot possibly get better, can it?

Yes it can. Similar to DrawCircle4(), when we always incremented x because of calculating 1/8th of the circle only, we can assume here that we will never enter the last else-block in DrawCircle5(), as it does not increment x either.

And since we are always moving to the right incrementing x, the delta of e depends entirely on whether we also move down and decrement y. The change in the error value caused by decrementing y is (-2*y + 1), so let's choose to decrement y each time e is greater or equal to y, as this will move e closer to 0 again.

All of this allows us to get rid of several statements, and some local variables too. Also, let's do the left-shifting explicitly this time instead of multiplying by 2 (as mentioned above, we may expect our compiler to take care of that, but I have experienced cases when that did not happen, e.g. when writing -2*y instead of -(2*y)).

```private static void DrawCircle6(int radius)
{
int e = 0;
int x = 0;

do
{
DrawPixel(x, y);
DrawPixel(x, -y);
DrawPixel(-x, y);
DrawPixel(-x, -y);
DrawPixel(y, x);
DrawPixel(y, -x);
DrawPixel(-y, x);
DrawPixel(-y, -x);

if (e < y)
{
e += (x<<1) + 1;
}
else
{
e += (x - y + 1)<<1;
y--;
}
x++;
} while (x <= y);

}```

6,400,000 circles per second. It is now 40 times faster than at the beginning.

I guess this is a good example that there is always room for improvement, and the solution that comes to mind first probably is not the quickest.

By the way, if you know an even faster algorithm, please drop me a line, and I will post a follow-up.

Update 2011:

I did some further research recently, and found out that what I did back then is kind of a variant of the Midpoint Circle Algorithm, dating back to 1967. Plus after all, I had received two initial hints, namely that there should be an algorithm without the need of floating point operations, and that it might work in a kind of incremental way. So I can't really claim it to be a "stand-alone" creation [ ;-) ].