A visualization of a navigation layer used by the platforming and collision detection modules. I realize I haven’t really shown much footage of the engine as it is, so I uploaded some clips to Twitter here. (Please excuse the poor video quality, I’ll have to find a better way to capture video at some point.)
Work Since Last Devlog
Collision Detection
I rewrote the collision detection / resolution and platforming code to better support sloped tiles. I made two attempts. The first time, I tried to do it without a dedicated navigation layer. While it seemed pretty solid at first, it turned out to have some undesirable positioning bugs in certain use cases.
First Attempt
Besides refactoring, there were two major parts to the issue:
1: Allowing actors to selectively pass through certain solid blocks, when these blocks would otherwise interfere with traversing sloped terrain.
For example: In the following two images, the bounding-box rests on a slope at its bottom-center point, and passes through the nearby solid blocks in its lower-left quadrant. In the images that follow, the solid blocks prevent the box from intersecting.
2: Maintaining continuity of movement when an actor travels across areas which are perceived by the user as a single surface, but which are just tiles in a map with no context for what they constitute.
In most cases (without very high-velocity movement, at least), we really don’t want the player or other actors to disconnect from the ground when walking up and down slopes on different vertical planes. Even if disconnecting might be closer to a realistic physics simulation, it can interfere with jumps and other actions.
This picture shows a surface which spans three rows on the map. We would expect a player character to remain attached to it while walking across, even when moving at a fairly quick rate:
My first approach for problem #1 was to change how I do collision detection against solids. I used to just check whether the edges of the actor’s bounding box intersected solid tiles. I changed it so that the checks start at the middle of the actor’s box and fan out to the left and right. If a slope tile is encountered whose far end reaches the highest possible coordinate, such that it fully covers any solids behind it, then those solids are ignored for this specific collision check.
There can be unwanted and unexpected behavior as a result of this if your actor’s hitbox is much larger than the tiles it’s walking over. But with the old system, this sort of hitbox would just fail to climb slopes because it would bump into the surrounding solids anyway.
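As a rough sketch of the fan-out idea (not the engine's actual code): the function below scans a single row of tiles outwards from the column under the box's center. The tile codes EMPTY, SOLID and SLOPE_FULL and the name scanRow are hypothetical, and the real check would also have to account for vertical position and slope direction.

```lua
-- Hypothetical tile codes for one row of the map (assumptions, not engine values).
local EMPTY, SOLID, SLOPE_FULL = 0, 1, 2

-- Scan a row from the center column outwards in both directions.
-- Returns the column of the first blocking solid on the left and on the
-- right, or nil for a direction with no blocking solid.
-- A full-height slope hides every solid behind it in that direction.
local function scanRow(row, centerCol, leftCol, rightCol)
    local function scan(first, last, step)
        for col = first, last, step do
            local kind = row[col] or EMPTY
            if kind == SLOPE_FULL then
                return nil -- solids further out are covered by the slope
            elseif kind == SOLID then
                return col
            end
        end
        return nil
    end
    local hitLeft = scan(centerCol, leftCol, -1)
    local hitRight = scan(centerCol, rightCol, 1)
    return hitLeft, hitRight
end

-- Usage: the solid at column 1 is ignored because of the slope at column 2.
local row = { SOLID, SLOPE_FULL, EMPTY, EMPTY, SOLID }
local hitLeft, hitRight = scanRow(row, 3, 1, 5)
```

Here hitLeft is nil and hitRight is 5, which mirrors the behavior described above: the box may overlap the solid behind the slope, but not the one on the open side.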
A diagram I threw together while trying to figure this out. I’ve since chucked this entirely in favour of a navigation layer.
For problem #2, I tried to track the kind of terrain at three points:
- BC: Terrain at the actor’s bottom-center coordinate
- ABC: Terrain at tile above BC
- BBC: Terrain at tile beneath BC
I thought that some compound if statements using these three would be enough to figure out the context, but I was still having problems with actors landing on slopes from the air at high velocities.
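To make the three sample points concrete, here is a minimal sketch, assuming a map stored as rows of terrain strings and a 16-pixel tile size. The names sampleTerrain and groundContext, the terrain strings and the returned labels are all hypothetical; the branches only illustrate the kind of compound checks described, not the rules the engine used.

```lua
local TILE_SIZE = 16 -- assumed tile size in pixels

-- Returns the terrain at BC, ABC (tile above) and BBC (tile beneath).
local function sampleTerrain(map, x, yBottom)
    local col = math.floor(x / TILE_SIZE) + 1
    local row = math.floor(yBottom / TILE_SIZE) + 1
    local function at(r)
        local line = map[r]
        return line and line[col] or "empty"
    end
    return at(row), at(row - 1), at(row + 1)
end

-- Illustrative classification built from the three samples.
local function groundContext(bc, abc, bbc)
    if abc == "slope" then
        return "step_up"   -- surface continues on the row above
    elseif bc == "slope" then
        return "on_slope"
    elseif bc == "empty" and bbc == "slope" then
        return "step_down" -- surface continues on the row below
    elseif bc == "solid" or bbc == "solid" then
        return "on_flat"
    end
    return "airborne"
end

local map = {
    { "empty", "empty" },
    { "empty", "slope" },
    { "slope", "solid" },
}
local context = groundContext(sampleTerrain(map, 20, 20))
```

With these inputs the bottom-center point sits in the slope tile at row 2, column 2, so context is "on_slope". A fast fall can skip past all three sample tiles in one tick, which is one way a scheme like this can miss landings.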
Second Attempt
Attempts to patch bugs in the per-tick code with this approach weren’t working well at all. I’d maybe fix one thing, then one or two other problems would show up. No good. So I added a navigation layer for each map, which stores a simplified representation of the map just for the collision system.
The biggest difference is that solid tiles now block movement on each of their four sides independently, and which side is blocking is determined at map initialization by looking at the kind of terrain in each of the tile’s neighbours. The layer is implemented as a set of Lua sparse arrays, one for each property. Currently the following can be checked:
- Block Ingress At Top
- Block Ingress At Bottom
- Block Ingress At Left
- Block Ingress At Right
- Tile contains a slope
Actors can no longer snag on diagonal ground, because those horizontal collision flags are never set when it’s determined that the solid tile is adjacent to a slope or another solid. It also doesn’t require the fragile “check for slopes in the middle, and selectively ignore solid tiles under the right circumstances” collision pass that I was doing earlier.
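A minimal sketch of how such a layer could be built at map initialization, assuming a flat row-major tile array and hypothetical terrain codes. The field names (blockTop, hasSlope, and so on) and buildNavLayer are made up for illustration. Applying the same neighbour rule to the top and bottom flags, and treating out-of-bounds cells as empty, are assumptions rather than something the engine is known to do.

```lua
local EMPTY, SOLID, SLOPE = 0, 1, 2 -- hypothetical terrain codes

local function buildNavLayer(tiles, width, height)
    -- One sparse array per property; absent keys mean "flag not set".
    local nav = {
        blockTop = {}, blockBottom = {},
        blockLeft = {}, blockRight = {},
        hasSlope = {},
    }
    local function kindAt(x, y)
        if x < 1 or x > width or y < 1 or y > height then
            return EMPTY -- assumption: outside the map counts as open space
        end
        return tiles[(y - 1) * width + x]
    end
    for y = 1, height do
        for x = 1, width do
            local i = (y - 1) * width + x
            local kind = tiles[i]
            if kind == SLOPE then
                nav.hasSlope[i] = true
            elseif kind == SOLID then
                -- A side blocks only when its neighbour is open space, so
                -- solids next to slopes or other solids cannot snag actors.
                if kindAt(x, y - 1) == EMPTY then nav.blockTop[i] = true end
                if kindAt(x, y + 1) == EMPTY then nav.blockBottom[i] = true end
                if kindAt(x - 1, y) == EMPTY then nav.blockLeft[i] = true end
                if kindAt(x + 1, y) == EMPTY then nav.blockRight[i] = true end
            end
        end
    end
    return nav
end

-- A 3x2 map: an empty row above "solid, slope, solid".
local nav = buildNavLayer({ EMPTY, EMPTY, EMPTY, SOLID, SLOPE, SOLID }, 3, 2)
```

In this example neither solid gets a horizontal flag on the side facing the slope (nav.blockRight[4] and nav.blockLeft[6] stay nil), so an actor walking off the slope has nothing to catch on.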
Another visualization of the navigation map.
I’m not done, because there are still a few remaining placement issues with certain layouts that I haven’t resolved. I really wanted to get it done this weekend, and I’m not happy that this devlog isn’t closing the book on these problems. I’m honestly pretty burned out and would rather focus on something else, but it’s important to get this working right. At least I’ve resolved those high-velocity collision bugs, and the collision code in general is now a lot clearer and more readable than it used to be.
Scenes and Actors
In the scene structure, I added an array which manages the order in which actors are processed, independent of the actor array list. This lets actorAdd() / actorRemove() recycle stale actor tables without impacting the order in which newly-spawned actors are run, or messing with actor-to-actor targeting handles. When an actor is destroyed or removed by something mid-tick, the actual removal is deferred to the end of the tick so that other references don’t get mangled. Removed actors are then purged from the actor order array.
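The arrangement can be sketched roughly as follows. This is an illustration under assumptions, not the engine's code: newScene and sceneTick are hypothetical names, the signatures of actorAdd and actorRemove are guesses, and actors are reduced to a name, a slot and a removed flag.

```lua
local function newScene()
    -- actors: storage slots; order: slots in processing order;
    -- free: slots whose actor tables can be recycled.
    return { actors = {}, order = {}, free = {} }
end

local function actorAdd(scene, name)
    local slot = table.remove(scene.free)
    local actor
    if slot then
        actor = scene.actors[slot] -- recycle a stale actor table
    else
        slot = #scene.actors + 1
        actor = {}
        scene.actors[slot] = actor
    end
    actor.name, actor.slot, actor.removed = name, slot, false
    scene.order[#scene.order + 1] = slot -- new actors always run last
    return actor
end

local function actorRemove(scene, actor)
    actor.removed = true -- only flag it; cleanup waits for the end of the tick
end

local function sceneTick(scene, fn)
    for i = 1, #scene.order do
        local actor = scene.actors[scene.order[i]]
        if not actor.removed then fn(scene, actor) end
    end
    -- End of tick: purge removed actors from the order array in place.
    local count, kept = #scene.order, 0
    for i = 1, count do
        local slot = scene.order[i]
        if scene.actors[slot].removed then
            scene.free[#scene.free + 1] = slot
        else
            kept = kept + 1
            scene.order[kept] = slot
        end
    end
    for i = count, kept + 1, -1 do scene.order[i] = nil end
end
```

Because the processing order lives in its own array, a recycled slot in the middle of the actor storage still runs after everything that was spawned before it.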
Troubleshooting Wild Goose Chase
For testing purposes, I have a debug key configured to rapidly spawn NPC actors from the player’s location. I noticed that performance tanks badly in the main testing area, immediately after startup, even with only about 100 to 120 actors. After hammering the scene with actors for a bit, the issue clears up, and I can get over 200 without dropping below 100 FPS. The weird thing is that I can switch to two other test areas, do the same thing, and not experience the same kind of performance dip. I have no idea what’s causing this.
I decided to start by watching Lua memory usage with collectgarbage("count") and overall program memory by looking at System Monitor. It was going up pretty fast, about 98 KB per second, but surely that’s not enough to tank the whole game? The memory consumption continued even in the pause menu and debug overlay, so it’s probably not related to the frame drops, but I had to set that issue aside and investigate anyway because it’s not good to have unexplained allocations like that.
Quick aside: So Lua is a garbage-collected language, and eventually, the GC will clean up discarded data that no other part of the program has a reference to. I confirmed this is just discarded tables and not a memory leak by running a GC pass and finding that memory usage went down. This is not the end of the world, but you generally don’t want to increase the rate of GC collection events in an action game when it can be helped, as it can theoretically cause stuttering. My policy so far has been 1) generate garbage freely at initialization, 2) generate garbage freely with debug calls, 3) avoid generating unnecessary garbage the rest of the time. Point 3 can be difficult because a lot of things in Lua can generate garbage, including many string operations.
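The "run a GC pass and see if usage drops" check can be sketched with the standard collectgarbage options. The helper name measureGarbage is hypothetical. It reports how much memory a function allocated (in KB) and how much of that survived a full collection.

```lua
-- Returns two numbers in KB: memory allocated while fn ran, and memory
-- still held after a full collection. A large first value with a second
-- value near zero suggests discarded garbage rather than a leak.
local function measureGarbage(fn)
    collectgarbage("collect")   -- start from a clean baseline
    collectgarbage("stop")      -- keep the collector from running mid-measurement
    local before = collectgarbage("count")
    fn()
    local grown = collectgarbage("count") - before
    collectgarbage("restart")
    collectgarbage("collect")   -- the manual GC pass
    local retained = collectgarbage("count") - before
    return grown, retained
end

-- Usage: build a thousand small tables and then drop them all.
local grown, retained = measureGarbage(function()
    local junk = {}
    for i = 1, 1000 do junk[i] = { i } end
end)
```

The exact numbers depend on the Lua implementation, so the sketch makes no claim about them beyond grown being larger than retained when the allocations are truly discarded.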
I recalled using this nifty Lua profiler with Hibernator to locate performance bottlenecks, and decided to see if it would offer any insight here, even though it’s a performance tester and not a memory trace.
At the very top of the profiler list is the Lua string library:
+-----+----------------------------------+----------+--------------------------+----------------------------------+
| #   | Function                         | Calls    | Time                     | Code                             |
+-----+----------------------------------+----------+--------------------------+----------------------------------+
| 1   | [string "boot.lua"]:493          | 2048     | 1.017661                 | [string "boot.lua"]:493          |
| 2   | draw                             | 2048     | 0.431357                 | bolero_core/bol_main.lua:496     |
| 3   | update                           | 2048     | 0.27298900000001         | bolero_core/bol_main.lua:315     |
| 4   | sceneDraw                        | 2048     | 0.12826                  | bolero_core/bol_gfx.lua:259      |
| 5   | update                           | 2048     | 0.098133999999994        | bolero_core/bol_audio.lua:1512   |
| 6   | updateChannels                   | 2048     | 0.088410000000006        | bolero_core/bol_audio.lua:1063   |
| 7   | actorsDraw                       | 2048     | 0.079627999999989        | bolero_core/bol_gfx.lua:436      |
| 8   | runFrame                         | 2048     | 0.070652000000004        | bolero_core/bol_main.lua:256     |
| 9   | draw                             | 2048     | 0.04282700000002         | bolero_core/bol_overlay.lua:759  |
| 10  | updateOneChannel                 | 18432    | 0.040891000000038        | bolero_core/bol_audio.lua:730    |
| 11  | runTick                          | 800      | 0.039415000000012        | bolero_core/bol_main.lua:292     |
| 12  | setScissor                       | 2048     | 0.028544000000016        | bolero_core/bol_gfx.lua:776      |
| 13  | run                              | 800      | 0.023414                 | bolero_core/bol_scene.lua:638    |
| 14  | draw                             | 2048     | 0.016062999999982        | behavior/menu_pause.lua:163      |
| 15  | cursorHideFrame                  | 2048     | 0.015260999999996        | bolero_core/bol_mouse.lua:138    |
| 16  | setFont                          | 2048     | 0.012840000000001        | bolero_core/bol_font.lua:386     |
| 17  | keyFrameHeldUpdate               | 2048     | 0.011291999999989        | bolero_core/bol_keyboard.lua:181 |
| 18  | clearScissor                     | 2048     | 0.010579999999996        | bolero_core/bol_gfx.lua:793      |
| 19  | clamp                            | 4096     | 0.010081999999989        | bolero_core/bol_coord.lua:154    |
| 20  | tick                             | 800      | 0.0082340000000158       | behavior/menu_pause.lua:36       |
+-----+----------------------------------+----------+--------------------------+----------------------------------+
I’m not sure if that specifically means string operations like concatenation, or just any string lookup period. I’m using strings a lot as hashmap indices. I didn’t think it was related, but I got printouts of every instance of the following anyways to try and narrow it down:
- String concatenation (..)
- Table creation (starting with '{')
- tostring()
It was a long list and my eyes glazed over. Too much stuff, and I can’t just go and comment out random lines without breaking the engine. So I started disabling engine subsystems at the config level, and then checking if the memory usage stopped. I disabled all user input, audio, the debug console, logging, and had the engine load a blank screen with no actors. The stupid number kept climbing.
On to the main love.update() callback. I placed a bunch of print(collectgarbage("count")) calls throughout the function, but the number didn’t seem to rise at all within the update callback. I finally narrowed it down by cutting the love.draw() callback short. Memory usage still increased intermittently, but not constantly the way it was before. By moving an early-exit line (if true then return end) to different parts of the function, I finally reached the offending code.
It was… love.graphics.getStats(), the LÖVE API call that provides drawing statistics. It returns a table with the info, and I was calling it every frame, regardless of whether or not the stats were actually being shown. Every time you return a new table to a variable, the old table, unless there is another reference to it elsewhere, is destined for the garbage collector. So, mystery solved, but as expected, fixing this had no impact on the frame-drop issue in Map 1 at all.
(21 Nov 2021 Correction: love.graphics.getStats() actually does have a variant which fills in an existing table.)
I think what I should take away from this is to keep an eye on collectgarbage("count") a little more closely as I develop. Maybe add some kind of indicator if Lua memory usage increases over 50KB per second, and also watch for GC events.
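One way such an indicator could look, as a sketch only: MemWatch and its fields are hypothetical names, the one-second window is an arbitrary choice, and counting any drop in usage as a "GC event" is a rough proxy, since Lua's incremental collector frees memory gradually.

```lua
local MemWatch = {}
MemWatch.__index = MemWatch

-- limitKBPerSec: growth rate that triggers the warning (defaults to 50).
-- sample: function returning current usage in KB (defaults to collectgarbage).
function MemWatch.new(limitKBPerSec, sample)
    return setmetatable({
        limit = limitKBPerSec or 50,
        sample = sample or function() return collectgarbage("count") end,
        last = nil, grown = 0, elapsed = 0,
        rate = 0, gcEvents = 0, warn = false,
    }, MemWatch)
end

-- Call once per frame with the frame's delta time in seconds.
function MemWatch:update(dt)
    local now = self.sample()
    if self.last then
        local delta = now - self.last
        if delta < 0 then
            self.gcEvents = self.gcEvents + 1 -- usage dropped: the collector ran
        else
            self.grown = self.grown + delta
        end
    end
    self.last = now
    self.elapsed = self.elapsed + dt
    if self.elapsed >= 1 then
        -- Close the window: publish the growth rate and reset the counters.
        self.rate = self.grown / self.elapsed
        self.warn = self.rate > self.limit
        self.grown, self.elapsed = 0, 0
    end
end
```

In a LÖVE project, update(dt) could be called from love.update and the warn flag drawn in a debug overlay. Note that the default sampler itself does not allocate tables, so the watcher should not add to the garbage it is measuring.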
Old:
-- getStats() returns a new table on every call, shown or not
bol.globals.debug.gpu_stats = love.graphics.getStats()
if bol.gfx.show_gpu_stats then
    bol.gfx.drawGraphicsStats()
end
New:
-- Only fetch the stats when they are actually being drawn
if bol.gfx.show_gpu_stats then
    bol.globals.debug.gpu_stats = love.graphics.getStats()
    bol.gfx.drawGraphicsStats()
end
Thoughts
I’m a little angry that I didn’t get this all working this weekend. Realistically, I can probably mitigate the remaining glitches, and it hasn’t been a bad week or anything. I finally got those marble characters rolling up and down hills, which was something I’ve wanted to get working for a while. But, changing anything about the platformer tick will have major consequences that cascade to every moving game object, so before getting too deep into project development, it needs to be made as reliable as possible.
Plans For Next Post
- Fix remaining issues with slopes. Good lord.
- Start posting info about the engine to a wiki, including the task list / roadmap
(e 24/Sept/2019: physics model -> physics simulation)
(e 21/Nov/2021: Note on getStats())