Skip to content


Profiling an interactive simulation in Haskell – part 1

Abstract

This post is a feasibility test for developing bigger computer games using functional reactive programming (a.k.a. functional temporal programming) with the Yampa library. I use GHC performance profiling on a small interactive computer game simulation which implements animation, rendering, collision detection, artificial intelligence and game object management on a very basic level (see Robotroannah video). The focus is however on Yampa specific code, i.e. animation, AI and object management logic. The game will be scaled up to several hundred objects with different behaviours where one frame has to be calculated on a “reasonable speed”. I don’t want to make it just run “smooth enough overall” because physics and rendering will be replaced in future anyway.

Instructions

  1. “You can’t optimize what you can’t measure”
  2. Define goals (60 fps, profile only module X etc.) and simplify to the necessary parts.
  3. Understand time profiling, heap profiling and cost centers.
  4. Create reproducable conditions in interactive programs.
  5. Compile with ghc --make myprog.hs -o myprog -prof -O2 -fforce-recomp
  6. Use -auto, -auto-all or -caf-all to automatically add additional “interesting” cost-centers. I prefer -auto as the other produce lots of spam.
  7. Use -fforce-recomp compiler flag when adding {-# SCC "COST_CENTER_NAME" #-}.
  8. Performance profile with ./myprog +RTS -p. Open myprog.prof.
  9. Heap profile with ./myprog +RTS -h; hp2ps myprog.hp. Open myprog.ps
  10. End all unnecessary processes to produce stable conditions and profile several times to get stable results.
  11. Always document conditions and code changes in profiling reports.
  12. Always profile with -O2 compiler options. Otherwise you may optimize the wrong program parts.

Lessons learned: Even if Haskell is still in heavy development and it’s okay to question the profiler, investigate it’s strange claims first and see if it resolves the bottlenecks. If it does: good :). If not, question the tool.

Open questions

  • How can I define the filename of profiling output? (possibly with > shell operator)
  • How can I automatically numerate files?
  • How can I guarantee that certain code parts are always evaluated? (possibly with strictness)
  • How can I sort by constant and linear heap consumption in hp2ps?

GHC 7
In GHC 7 the -rtsopts compiler flag is used to activate runtime options. I found a bug in profiling however and had to switch back to GHC 6.12.1. Bug report: bug report “<<loop>> when compiling with -O option with ghc-6.10.0.20081019″

Protocol

Status quo

The original version of the computer game “Robotroannah” implements simple animation (animateLoop, animateLoopFrame), AI (knightObj), physics (observeTypeCollisions) and object management (createReq, deleteReqs). The rendering takes 3 render steps (legs, body, head) with a fixed-frame rate of 60Hz using SDL.delay to “sleep away” available frame time (1000/60 = 16.67ms). Spritesheet resources are loaded once but passed to each object every cycle(!) right now. Object management logic uses the dpSwitch from The Yampa Arcade.

Goal

  • First performance profiling
  • First heap profiling
  • Investigate heap consumption
  • Profile 100 mines in isolation (animation only)
  • Profile 100 knights in isolation (animation + AI)
  • Profile object management in isolation

Performance profiling 1

In performance monitoring we know that “you can’t optimize what you can’t measure” and that “we must not make any assumptions about performance bottlenecks”… and I have a good real world example for this: I implemented a resource manager to avoid passing the spritesheet resources to every object as I assumed it would take alot of memory which later turned out not to be a problem at all (which in turn I assume because of laziness).

First I implemented a frame counter which sufficed as a simple indicator for slow performance. The game was indeed slow and took 18ms to render a single frame with only 3 objects(!) (target is 16.67ms).

In order to create reproducable results I had to create some sort of “user-input recording” to replay the same simulation but I eventually just used “no input at all” for now and automatically killed the simulation after exactly 10 seconds. I also removed the “sleep away available frame time”. Thus the problem of “profiling an interactive program” turned into a problem of “profiling of rendering 6000 frames as-fast-as-possible” (60fps * 10sec = 6000).

To simplify the scene I removed the player object and increased the number of objects to 100 “mine objects”. I also replaced the object management dpSwitch with a parB to exclude performance impacts from object creation and deletion. This may produce an artificial situation however as Haskell may do some optimization (like caching) which will not happen when there are different objects.

    Mon Jan  3 19:53 2011 Time and Allocation Profiling Report  (Final)

       robotron +RTS -p -RTS

    total time  =        5.76 secs   (288 ticks @ 20 ms)
    total alloc = 501,932,352 bytes  (excludes profiling overheads)

COST CENTRE                    MODULE               %time %alloc

outputSDL                      Venice.Subsystem.SDL  57.3   26.6
observeTypeCollisions          Venice.Process        12.2   11.0
animateLoop                    Venice.Process         7.6   16.3
animateLoopFrame               Venice.Process         7.3    8.3
mineObj                        Robotron.Objects       4.9   14.1
main                           Main                   3.5    4.0
core                           Main                   1.7    8.4
playerObj                      Robotron.Objects       1.4    1.2
elemsIL                        Venice.IdentityList    1.4    4.3
body                           Robotron.Types         1.0    0.3

Summary

  • 60% spent in rendering which is plausible. I can ignore rendering, but I added more cost-centers to further investigate just out of curiosity.
  • 15% spent in physics which is also plausible and I also can ignore.
  • 15% spent in animation which is strange I guess.
  • 500MB total allocation with 6000 frames! What the heck?

Heap profiling 1

(Nevermind the inconsistencies of the date and objects please! I just forget to keep the original reports but they look similar.)

As you can see there are some space leaks as the heap consumption grows lineary even though no objects are added. I tested animateLoop and animateLoopFrame in isolation which turned out to have constant consumption. These space leaks actually took me quite some time to track down. What I should have done actually is focus on the calling function – mineObj (or cometObj). I originally assumed that it only does some initialization and assignment which couldn’t have any impact on the performance. This however contradicted with the profiling report (5% spent in mineObj). I later indeed found the cause in mineObj: Every frame I do a conversion from spritesheet space to SDL rendering space, which actually could be done once-and-forall in the initialization step. The new heap profile looks far better.

The new performance profile still looks the same however :/

    Sun Jan  9 21:39 2011 Time and Allocation Profiling Report  (Final)

       robotron +RTS -p -RTS

    total time  =        6.92 secs   (346 ticks @ 20 ms)
    total alloc = 496,279,440 bytes  (excludes profiling overheads)

COST CENTRE                    MODULE               %time %alloc

outputSDL                      Venice.Subsystem.SDL  64.7   30.8
animateLoop                    Venice.Process         8.4   18.9
observeTypeCollisions          Venice.Process         6.9    1.2
mineObj                        Robotron.Objects       6.4   15.6
animateLoopFrame               Venice.Process         5.8    9.5
main                           Main                   2.6    4.9
core                           Main                   2.0   10.1
body                           Robotron.Types         1.2    0.4
output                         Main                   0.3    1.1
elemsIL                        Venice.IdentityList    0.0    5.2

A mystery I will resolve another time…

References

Tagged with , , , .


2 Responses

Stay in touch with the conversation, subscribe to the RSS feed for comments on this post.

  1. Simon Michael says

    Looking forward to your next post!

  2. Candice says

    Greetings from Florida! I’m bored at work so I
    decided to browse your blog on my iphone during lunch
    break. I really like the information you provide here and can’t wait to take a look when I get home.
    I’m shocked at how fast your blog loaded on my mobile ..

    I’m not even using WIFI, just 3G .. Anyhow, superb site!



Some HTML is OK

or, reply to this post via trackback.