Rafi Khan

Senior Software Engineer | Ottawa, Canada

Supporting society through innovation

Making it go faster

This evening I wanted to make one of my drawing algorithms faster.

I’m not sure everyone is interested in performance testing, so for now I’ll keep it brief. If you want to learn more reach out 🙂

I was writing a curve smoothing function using Chaikin’s algorithm.

My reference code for the initial implementation


function chaikin(arr, num) {
    if (num === 0) return arr;
    const l = arr.length;
    const smooth = arr.map((c, i) => {
        return [
            [0.75 * c[0] + 0.25 * arr[(i + 1) % l][0], 0.75 * c[1] + 0.25 * arr[(i + 1) % l][1]],
            [0.25 * c[0] + 0.75 * arr[(i + 1) % l][0], 0.25 * c[1] + 0.75 * arr[(i + 1) % l][1]]
        ];
    }).flat();
    return num === 1 ? smooth : chaikin(smooth, num - 1);
}

Some rough notes on me figuring out how it works – the first time I learned about this algorithm was last night.

; explanation
; let i = 0 .. arr.length
; let p = arr[i]

; let n = arr[i+1] // next point

; if n is the last point
;   return n without smoothing
; else 
;   return s

; // s is 2 new points to use instead of the first point
; s = [ 
;  s1: 
;   x= 0.75 * p.x  + 0.25 * n.x
;       y= 0.75 * p.y +  0.25 * n.y
;  s2:
;   x = 0.25 *p.x + 0.75 * n.x
;   y = 0.25 * p.y + 0.75 * n.y
;  ]

What are these weird semi-colons? I really like programming in lisp 🙂 Here is the initial implementation, using recursion, some async (defun is async in this lisp) and vec objects. A run of 100 points with 3 iterations took 24ms, this is far from reaching my target of 120Hz rendering.

Speaking of numbers, let’s define some. Here are my working notes in source code, trying to figure out what numbers we want.

;; Performance Budget
;; 30Hz - 33.33 ms per frame * 1/8 = 4.17ms
;; 60Hz - 16.67 ms per frame * 1/8 = 2.08ms
;; 90Hz - 11.11 ms per frame * 1/8 = 1.39ms
;; 120Hz - 8.33 ms per frame * 1/8 = 1.04ms

I am targeting a refresh rate of 120 Hz. Is this ambitious? Maybe, but I’m sure to learn a lot along the way. Plus it’s a side project, why settle for less 🙂

So for 120 Hz my entire budget for the frame is 8.33ms. But this includes more than just my smoothing algorithm. Using the magic number 8, I divided by budget and decided I have 1.04ms to run my smoothing algorithm.

(defun smooth (points_arr current index)
   ;; open loop, don't smooth the last point
   (if (eq (length points_arr) (+ index 1))
       (vec current.x current.y)
       
       (let ((next (nth (+ index 1) points_arr)))
          [ (vec (+ (* current.x 0.75) (* next.x 0.25))
            (+ (* current.y 0.75) (* next.y 0.25)))
           (vec (+ (* current.x 0.25) (* next.x 0.75))
            (+ (* current.y 0.25) (* next.y 0.75)))])))

(defun flat (arr)
   (-> arr `flat))
   

(defun chaikin (arr num)
   (if (eq num 0)
       arr
       (progn
          (defvar smoothed_points (flat (map (lambda (current index)
                        (smooth arr current index))
                     arr)))
          
          (if (eq num 1)
              smoothed_points
              (chaikin smoothed_points, (- num 1))))))

Now let’s optimise. I tried a number of things, changing objects to arrays, rewriting all code to be synchronous and changing recursions to iterations. Each of these changes significantly impacted performance. I want to pick the biggest win… but it’s hard. Probably, changing async to sync. The JIT was doing a really good job of handling recursions but for some reason it was losing out on the function calls. Which probably also limited the usefulness of the rest of the optimisations. A good question to ask, why am I even using async? These are all math operations on array like data… how tired was I? haha. Luckily I can say this wasn’t all bad programming. The lisp I am using is async by default, which can be really nice for general programming, but it breaks down once you need performance.

All of these optimisations got us from 24.376 ms down to ~1.0352 ms. A speed up of ~20 times. This brings us within the budget of 1.04 ms to render at 120 Hz.

I’ve pasted my final code below.



defun_sync smooth_arr (points_arr current index)
   (progn 
   ;; open loop, don't smooth the last point.
   (if (eq (length points_arr) (+ index 1))
       [[current.0 current.1]]
       
       (let ((next (prop points_arr (+ index 1) )))
          [ [ (+ (* current.0 0.75) (* next.0 0.25))
            (+ (* current.1 0.75) (* next.1 0.25))]
           [ (+ (* current.0 0.25) (* next.0 0.75))
            (+ (* current.1 0.25) (* next.1 0.75))]]))))

(defun_sync flat (arr)
   (-> arr `flat))
   

(defun_sync chaikin_iterative (arr num)
      (progn 
         (for_each (r (range num))
            (setq arr (flat
                    (for_each (i (range arr.length))
                       (smooth_arr arr (prop arr i) i)))))
      arr))

This was a very satisfying experience. I learned a lot about writing performance sensitive code and found an excuse to share some lisp code.

Let’s be honest, it was the last one that got me to write this post.


Posted

in

by

Tags: