Well, this project looks rather interesting. I have worked with direct video memory access and was able to improve performance significantly. However, different situations require different approaches. If you draw via a device context, for example with TextOut() and similar calls, the chances of improving performance are slim. Lines, dots, and splines are a different story, as are bitmaps and some operations on them. One has to compare the variants: creating a pixel map in memory and copying it quickly to the screen using FastBlt(), or accessing video memory directly. The routines for drawing splines or Bezier curves may need to be rewritten. Bottlenecks may be present in many places. Shadowing and gradient fills definitely require new code. I used the fast Fourier transform to increase performance for an interesting object known as Diffusion Curves (you can find the corresponding material in the archive of the SVGOpen2010 conference, under "Fourier transform in SVG", if I remember the title correctly).
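
To make the "pixel map in memory, then fast copy" variant a bit more concrete, here is a rough sketch in Python. It is only an illustration under my own assumptions, not code from the project: the bytearray stands in for the screen surface, the 32-bit BGRX layout and padded stride are assumed, and make_gradient() and blit() are hypothetical helper names. The idea it shows is that the gradient is computed once off-screen and then moved to the destination one whole row at a time, rather than plotted pixel by pixel.

```python
# Hypothetical sketch: names, sizes, and pixel layout are illustrative only.
BYTES_PER_PIXEL = 4  # assumed 32-bit BGRX layout


def make_gradient(width, height):
    """Render a horizontal grayscale gradient into an off-screen pixel map."""
    row = bytearray(width * BYTES_PER_PIXEL)
    for x in range(width):
        level = 255 * x // max(width - 1, 1)
        offset = x * BYTES_PER_PIXEL
        row[offset:offset + 3] = bytes((level, level, level))  # B, G, R
    # Every row is identical, so compute one row and repeat it.
    return bytes(row) * height


def blit(dest, dest_stride, src, src_width, src_height, dest_x, dest_y):
    """Copy the pixel map into the destination one whole row at a time."""
    row_bytes = src_width * BYTES_PER_PIXEL
    view = memoryview(dest)
    for y in range(src_height):
        start = (dest_y + y) * dest_stride + dest_x * BYTES_PER_PIXEL
        view[start:start + row_bytes] = src[y * row_bytes:(y + 1) * row_bytes]


# A bytearray stands in for the screen surface (assumed 64x48, padded stride).
SCREEN_W, SCREEN_H = 64, 48
SCREEN_STRIDE = SCREEN_W * BYTES_PER_PIXEL + 16
screen = bytearray(SCREEN_STRIDE * SCREEN_H)

pixmap = make_gradient(32, 8)
blit(screen, SCREEN_STRIDE, pixmap, 32, 8, dest_x=10, dest_y=5)
```

Note that the destination stride is kept separate from the visible width, since real surfaces often pad each row. Whether a block copy like this beats direct video memory access depends on the hardware and driver, so both variants still need to be measured.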