Discovering Sponza by Babylon.js and sharing tips on how to build a cross-platforms WebGL game

To celebrate the 2.3 version of Babylon.js, the team decided to build a new cool demo of what can be done with our WebGL engine but also to show how HTML5 web standards can build great games today. The idea was to build a similar, if not identical, experience on all WebGL supported platforms and to try to reach native apps’ features.

This article will then explain how this experience works and the various challenges we’ve faced to build it.

For that, this demo is using a fairly impressive number of “HTML5 features” like WebGL, Web Audio of course but also Pointer Events (available everywhere thanks to jQuery PEP polyfill), Gamepad API, IndexedDB, HTML5 appcache, CSS3 transition/animation, flexbox and fullscreen API.

Test it on your desktop, mobile or Xbox One:

Discovering the demo

First, you’ll start on an auto-animated sequence giving the credits to who built this scene.

Most of the team’s members come from the “demo scene”. If you don’t know what it is, have a look to this site or go on Wikipedia. You’ll discover that this is an important part of the 3D developers’ culture. On my side, I was on Atari while David Catuhe was on Amiga which is still a regular source of conflicts between us ;-). I was coding a bit but mainly doing the music in my demo group. I was a huge fan of Future Crew and more specifically of Purple Motion, my favorite demo scene composer of all time.

For Sponza, here are the contributors:

  • Michel Rousseau aka “Mitch” has done the awesome visual, animations & rendering optimizations acting as the 3D artist. It took the great Sponza model provided freely by Crytek on their website and used our 3DS Max exporter to generate what you see.
  • David Catuhe aka “deltakosh” and I have done the core part of the Babylon.js engine and also all the code for this demo (custom loader, special effects for the demo mode using fade to black post-processes, etc.) as well as a new type of camera named “UniversalCamera” handling all type of inputs in a generic way.
  • I’ve done the music using Renoise and the awesome EastWest Symphonic Orchestra sound bank. I’ve already shared my way to compose music: Composing the music for the World Monger Windows 8 game using the Renoise tracker & East West VST Plug-ins
  • Julien Moreau-Mathis helped us by building a new tool to help 3D artists finalizing the job between the modeling tools (3DS Max, Blender) and the final results. For instance, Michel has used it to test & tune the various animated cameras and to inject the particles into the scene.

If you wait until the end of the automatic sequence up to the epic finish (my favorite part, you should check it), you’ll be automatically switched to the free interactive mode. If you’d like to by-pass the demo mode, simply click on the camera icon or press A on your gamepad.

In the interactive mode, if you’re on a PC/Mac, you’ll be able to move inside the scene using keyboard/mouse like a FPS game, if you’re on smartphone, you’ll be able to move using a single touch (and 2 to rotate camera) and finally, on an Xbox One using the gamepad (or a desktop if you’re plugging a gamepad into it).

Fun fact: on a Windows touch PC, you can potentially use the 3 types of input in the same time. Smile

The atmosphere is different in the interactive mode. You’ve got 3 storms sounds randomly positioned in the 3D environment, some wind and some small fire sounds on each corner. On supported browsers (Firefox, Chrome, Opera & Safari), you can even switch between the normal speaker mode and the headphone mode clicking on the dedicated icon. This will then use the binaural audio rendering of Web Audio for a more realistic audio simulation if you’re listening via a headphone.

To have a complete app-like experience, we’ve generated icons & tiles for all platforms. This means for instance on Windows 8/10 that you can pin this web app into the start menu. We even have multiple sizes available:

image image

Same on your iPhone, Windows Mobile or Android device:

IMG_0035 wp_ss_20160204_0001 Screenshot_2016-02-05-13-19-59

Offline first!

Once the demo has been completely loaded, you can switch your phone to airplane mode to cut connectivity and click on the Sponza icon. Our web app will still provide the complete experience with WebGL rendering, 3D web audio and touch support. Switch it to full screen and you won’t be able to make the difference with a native app!

We’re using our IndexedDB layer available natively inside Babylon.js for that. The scene (JSON format) and the resources (JPG/PNG textures as well as MP3 for the music & sounds) are stored in IDB. The IDB layer coupled with HTML5 application cache is then providing this complete offline experience. To learn more about this part and how to configure your game to obtain similar results, read this: Using IndexedDB to handle your 3D WebGL assets: sharing feedbacks & tips of Babylon.JS and Caching Resources in IndexedDB in Babylon.js

Xbox One enjoys the show also

Last but not least, the very same demo works flawlessly in MS Edge on your Xbox One:


Press A to switch to interactive mode. The Xbox One says that you can now move using your gamepad inside the 3D scene:


So, let’s briefly recap.

The very same code base works across Windows, Mac, Linux on MS Edge, Chrome, Firefox, Opera & Safari, on iPhone/iPad, on Android devices with Chrome or Firefox, Firefox OS and on Xbox One! Isn’t that cool? Being able to target so many devices with a native like experience directly from your web server?

I’ve already shared my excitement about this huge potential in a previous article: The web: the next game frontier?

Hack the scene with our debug layer to learn how we’ve made it

If you’d like to understand how Michel is mastering the magic of 3D modeling, you can hack the scene using the Babylon.js Debug Layer tool.

To enable it on a machine with a keyboard, press “CTRL + SHIFT + D” and if you’re using a gamepad on PC or Xbox, press on “Y”.

Note: displaying the debug layer costs a bit of performance due to the composition job the rendering engine needs to do. So the FPS displayed are a bit less important than the real FPS you’ve got without the debug layer displayed.

Let’s test it on a PC for instance.

Move near the lion’s head and cut the bump channel from our shader’s pipeline:


You should see that the head is now less realistic. Play with the other channel to check what’s going on.

You can also cut the dynamic lightning engine or disable the collisions engine to fly or move through the walls. For instance, disable the “collisions” checkbox and fly to the first floor. Put the camera in front of the red flags. You can see them slightly moving. Michel has used the bones/skeletons support of Babylon.js to move them. Disable the “skeletons” option and they should stop moving:


At last, you can display the meshes tree on the top right corner. You can enable/disable them to completely break the great job done by Michel:


Removing the geometries, our über shader’s channels or some options of our engine can help you troubleshooting performances on a specific device to see what is currently costing too much. You can also check if you’re CPU limited or GPU limited. Even if, most of the time, we’re CPU limited in WebGL due to the mono-threading nature of JavaScript. Finally, this tool is also very useful to learn how a scene has been built by the 3D artist.

By the way, it works great on Xbox One also:



We faced a number of questions & challenges to build this experience. Let me share some of them.

WebGL performance and cross-platforms compatibility

On the programming side, this one was probably the easiest one as it’s completely handled by the Babylon.js engine itself.

We’re using a unique shader’s architecture that adapts itself to the platform by trying the find the best shader available for the current browser/GPU using various fallbacks. The idea is to lower the quality/complexity of the rendering engine until we manage to display something on screen.

Babylon.js is mainly based on WebGL 1.0 to guarantee that the 3D experiences built on top of it will run everywhere. It has been built with a web philosophy in mind, so we’re using a “graceful degradation” like approach on the shader compilation process. This is completely transparent for the 3D artist who doesn’t want to understand those complexities most of the time.

Still, the 3D artist has a very important role in the performance optimizations. He has to know the platform he’s targeting, its features supported and its limitations. You can’t take assets coming from AAA games made for high end GPUs and DirectX 12 and push them like that on a WebGL engine. Targeting WebGL is today very similar to the job you’ll have to do to optimize for mobile, plus the fact that JavaScript is highly mono-threaded.

Mitch is extremely good doing that: optimizing the textures, pre-calculating the lightning to bake it into the textures, reducing as much as possible the number of draw calls, etc. He’s got years of experience behind him and saw the various generations of 3D hardware & engines (from PowerVR/3DFX to today’s GPUs) which really helps.

He already shared a bit of those basics: Real Time 3D : making a demo for WebGL Purposes –Basics. and already proved several times you can create stunning graphics on the web with high performance on small integrated GPUs with the Mansion, Hill Valley or Espilit demo scenes for instance. I’m also highly recommending you watching his great talk: NGF2014 – Create 3D assets for the mobile world & the web, the point of view of a 3D designer where he shared his experience and how he managed to optimize the Hill Valley scene from less than 1 fps to 60 fps!

The initial goal for Sponza was to build 2 scenes. 1 for the desktop and 1 for the mobile with less complexity, smaller textures, simpler meshes/geometries. But during our tests, we finally discovered that the desktop version was also running pretty fine on mobiles as it can run up to 60fps on an iPhone 6s or an Android OnePlus 2. We then decided not to continue working on the simpler mobile version. But again, following the general web best practices for web games is a good idea. It would have been probably better to have a pure “mobile first” approach on Sponza to reach 30fps+ on most mobiles and then enhance the scene for high-end mobiles and desktop. Still, most of the feedback we have on twitter seems to indicate that the final result works very well on a lot of mobiles. Sponza has been optimized on a HD4000 GPU (Intel Core i5 integrated) which is more or less equivalent to actual high-end mobiles’ GPUs.

We’re then pretty happy of the performance. Sponza is using our über shader with ambient, diffuse, bump, specular & reflection enabled, we have some particles to simulate small fires on each corner, we’ve got animated bones for the red flags, 3D positioned sounds and collisions when you’re moving using the interactive mode. Technically speaking, we’ve got 98 meshes used in the scene generating up to 377781 vertices, 16 actives bones, 60+ particles that could generate up to 36 draw calls. And trust me, having few draw calls is the key to optimal performance, even more on the web.

The loader

For Sponza, we wanted to create a new loader, different than the default one we’re using on the website, to really have a polished web app. I’ve then asked to Michel to suggest something new.

He first provided me that:


Which looks very nice when you first have a look to it!

But then, you’re quickly asking to yourself: “how am I going to make this working on all devices in a responsive way?”. Well, you probably perfectly understand me.

Let’s talk first about the background. The effect created by Michel was nice but wasn’t working well across all window sizes & resolutions generating some moiré. I’ve then changed it with a classical screenshot of the scene.

I then wanted the background to completely fill the screen without black bars and without stretching the image to break the ratio.

The solution mainly comes from CSS “background-size: cover” + centering the image on X & Y.

We then have the experience I was looking to, whatever the screen ratio being used:

image image

The other parts are using classical percentage-based CSS positioning. But my main wonder was on how to handle the size of the fonts based on the size of the viewport.

Well, the good news is that you’ve got a specific unit for that! It’s named “vw” or “vh” where 1vw is 1% of the viewport width for instance. This CSS property is doing all the magic for you and is pretty well supported across browsers. Well, at least, all WebGL-compatible browsers are supporting it and that was more than enough for me. 😉

To know more about this: Viewport Sized Typography

Finally, we’re playing with the opacity property of the background image to move it from 0 to 1 based on the current downloading process moving from 0 to 100%.

Handling all inputs in a transparent way for the users

Our WebGL engine is doing all the job on the rendering side to display the good stuff on all platforms. But how to guarantee that the user will be able to move inside the scene whatever the input type used?

We were currently supporting all types of input and user’s interactions in Babylon.js v2.2: keyboard/mouse, touch, virtual touch joysticks, gamepad, device orientation VR (for Card Board) and WebVR, each via a dedicated camera. You can read our documentation to know more about them:

Touch is being universally addressed via the Pointer Events specification propagated to all platforms via the jQuery PEP polyfill (generating Touch Events for you when needed). To know more about Pointer Events: Unifying touch and mouse: how Pointer Events will make cross-browsers touch support easy

The idea for Sponza was to have a unique camera handling all users’ scenarios possible: desktop, mobile & console.

We ended up creating the UniversalCamera. It was so obvious and simple to create that I still don’t know why we didn’t do it before. The UniversalCamera is more or less a gamepad camera extending the TouchCamera which extends the FreeCamera.

The FreeCamera is providing the keyboard/mouse logic, the TouchCamera is providing the touch logic and the final extend the gamepad logic.

This UniversalCamera is now used by default by Babylon.js. If you’re navigating on the demos, you can move inside the scenes using mouse, touch & gamepad on all of them now. To know more on how we’ve built it, read our code: babylon.universalCamera.ts

Synchronizing the transitions with the music

This is where we’ve asked ourselves most of the questions. You may have noticed that the introduction sequence is synchronized with specific moments of the music. The first texts are displayed with some of the drums I’ve used and the final ending sequence is switching quickly from one camera to another on each note of the horn instrument I’m using.

Synchronizing audio with the WebGL rendering loop is not easy. Again, this is the mono-threaded nature of JavaScript that generates this complexity. To better understand why, please read the first part of this article: Introduction to the HTML5 Web Workers: the JavaScript multithreading approach. It’s really important to understand the problem highlighted by the first diagram to understand the global issue we’re facing.

Usually, in demo scenes (and video games I guess), if you’d like to synchronize the visuals with the sounds/music, you’re going to be driven by the audio stack. 2 approaches are used:

1 – Generate some metadata injected into the audio files that could raise some events when you’re reaching a specific part of it
2 – Real-time analyzing the audio stream via FFT or similar to detect interesting peaks or BPM changes that would again generate events for the visual engine.

Those approaches particularly work well in multi-threaded environments like C++. But in JavaScript with Web Audio, we’ve got 2 problems:

1 – JavaScript, which is mono-threaded and unfortunately, most of the time, web workers won’t help
2 – Web Audio doesn’t have any events that could be sent back to the UI thread even if Web Audio is being handled on a separate thread by the browser.

Web Audio has a much precised timer than JS. It would have been cool to be able to use this separate timer on a separate thread to drive the events back to the UI thread. But today, you can’t do that.

On the other side, we’re rendering the scene using WebGL and the requestAnimationFrame function. This means that, in “best cases”, we have 16ms windows timeframe. If you’re missing one, you’ll have to wait up to 16ms to be able to act on the next frame to reflect the sound synchronization (for instance to launch a fade to black effect).

I was then thinking about injecting the synchronization logic into the requestAnimationFrame loop. Checking the time spent since the beginning of the sequence and checking if I should impact the visual to react on an audio event. The good news is that web audio will render the sound whatever is going on in the main UI thread. For instance, you can be sure that the 12s timestamp of the music will arrive exactly 12s after the music have started even if the GPU is having a hard time rendering the scene.

But we finally chose the simplest solution ever: using setTimeout() calls…

Yes, I know. The above article on the introduction to web workers says that this is unreliable. But in our case, once the scene is ready to be rendered, we know that we’ve downloaded all our resources (textures & sounds) and compiled our shaders. We shouldn’t be too much annoyed by unexpected events saturating the UI thread. GC could be a problem but we’ve also spent a lot of time fighting against it in the engine: Reducing the pressure on the garbage collector by using the F12 developer bar of Internet Explorer 11

Still, I know this is far from being an ideal solution. Switching to another tab or locking your phone and unlocking it some seconds later could generate some issues on the synchronization parts of our demos. We could fix those problems by using the Page Visibility API by pausing the rendering loop, the various sounds and re-computing the next timeframes for the setTimeout() calls.

I’ve got the feeling that I’ve missed something, that there is probably a better approach to handle this problem. So feel free to share your ideas or solutions in the comment section if you think you’ve got something better!

Oh, by the way, the text animations are simply done using CSS3 transitions or animations combined with a flexbox layout to have a simple but efficient way to display at the center or on each corner using alignItems to “center”, “flex-start” or “flex-end” and justifyContent to “center”, “flex-start” and “flex-end” also.

To know more about Flexbox: A Complete Guide to Flexbox

Handling Web Audio on iOS

The last challenge I’d like to share with you is the way Web Audio is being handled by iOS on iPhone & iPad. If you’re looking in search engines for “web audio + iOS”, you’ll find lot of people having hard time to play sounds on iOS.

iOS is perfectly supporting Web Audio, even the binaural audio mode. But Apple has decided that a web page can’t play any sound by default without a specific user’s interaction. It’s probably to avoid having ads or whatever disturbing the user by playing unsolicited sounds.

You then first need to unlock the web audio context of iOS after a user’s touch before trying to play any sound. Otherwise, your web application will remain desperately mute.

Unfortunately, the only way I’ve found to do this check if by doing a user platform sniffing approach as I didn’t found a feature detection way to do it. I really hate doing that but sometimes, this is the only solution you’ve got…

You can check my code: babylon.audioEngine.ts

If you’re not on iPad/iPhone/iPod, the audio context is immediately available for use. Otherwise, we’re unlocking the audio context of iOS by playing a code generated empty sound on the touchend event. You can register to the onAudioUnlocked event if you’d like to wait for that before launching your game.

So if you’re launching Sponza on an iPhone/iPad, you’ll have this final screen at the end of the loading sequence:


Touching anywhere on the screen will unlock the audio stack of iOS and will launch the show.

I hope that you’ve enjoyed our Sponza by Babylon.js experience. I also hope that you’ve learned interesting details in this article. To learn more, read the complete source code of this demo on our github. Main files to review are: index.js and babylon.demo.ts

Finally, I really hope you’ll be now even more convinced that the web is definitely a great platform for gaming!

Stay tuned, we’ve got great new demos coming soon and trust me, they are very very cool. 🙂


6 thoughts on “Discovering Sponza by Babylon.js and sharing tips on how to build a cross-platforms WebGL game

  1. stopped at 96%, got error:
    babylon.js:32 Uncaught (in promise) TypeError: Failed to execute ‘decodeAudioData’ on ‘BaseAudioContext’: parameter 1 is not of type ‘ArrayBuffer’.
    at t._soundLoaded (babylon.js:32)
    at babylon.js:32
    at babylon.js:4
    at IDBTransaction.a.oncomplete (babylon.js:36)

      1. I’m sure that he used FireFox and the browser doesn’t support mp3.
        My problem is when I exit VR mode, it uses the deviceOrientationVRHelper camera instead of CameraPerso!

  2. BJS – [18:24:05]: Error while decoding audio data for: ambientWind / Error: EncodingError: Unable to decode audio data
    platform: win10+chrome
    testified that my wav file worked fine by dragging it to the chorme browser

Leave a Reply