In this tutorial, we’ll create a small extension that works in all the current major browsers supporting the Web Extension model: Edge, Chrome, Firefox, Opera, Brave & Vivaldi. We’ll see how to install this extension in each browser, a few simple tips for keeping a single code base across them, and how to debug it in each browser.
Update 08/12/2016: To answer some comments I got on Twitter, I don’t cover Safari in this article as it doesn’t support the same extension model as the others: https://developer.apple.com/reference/safariextensions. I’ve also updated the article to add a Vivaldi section.
I won’t cover the basics of an extension as there are already plenty of very good resources available from each vendor:
– Google: https://developer.chrome.com/extensions
– Microsoft: https://developer.microsoft.com/en-us/microsoft-edge/platform/documentation/extensions/ and I also recommend this great overview video: Building Extensions for Microsoft Edge
– Mozilla: https://developer.mozilla.org/en-US/Add-ons/WebExtensions and https://wiki.mozilla.org/WebExtensions
– Opera: https://dev.opera.com/extensions/getting-started/
– Brave: https://github.com/brave/browser-laptop/wiki/Developer-Notes-on-Installing-or-Updating-Extensions
So, if you’ve never built an extension before or don’t know how it works, have a quick look at those resources. Don’t worry, building an extension is simple and straightforward.
Let’s build a POC of an extension using AI/Computer Vision to help blind people analyze the images of a web page
We’ll see that, with a few lines of code, you can create some powerful features in the browser. In my case, I’m very concerned about accessibility on the web and I’ve already spent some time thinking about how to make a breakout game accessible using Web Audio & SVG, for instance.
Still, I was looking for something that could help blind people in a more general way. I was recently inspired while listening to a great talk by Chris Heilmann in Lisbon: Pixels and hidden meaning in pixels.
Indeed, by combining today’s Artificial Intelligence (AI) algorithms living in the cloud with text-to-speech technologies, either exposed in the browser through the Web Speech API or provided by a remote cloud service, we can very easily build a solution that analyzes an image in a web page whose ALT text attribute is missing or improperly filled.
My little proof-of-concept (POC) simply extracts the images from a specific web page (the active tab) and displays their thumbnails in a list. When you click on one of them, it queries the Computer Vision API to get some text describing the image and uses either the Web Speech API or the Bing Speech API to share it with the listener.
This video demonstrates it in Edge, Chrome, Firefox, Opera and Brave! (please enable sound on your device to understand how it works)
You’ll notice that, even though the Computer Vision API is analyzing some CGI images, it’s very accurate! I’m really impressed by the progress the industry has made on this during the last months.
In my case, I’m using these services:
– The Computer Vision API from Microsoft Cognitive Services, which is free to use (within a quota). You’ll need to generate a free key there and replace the TODO section in the code with it to make this extension work on your machine. To get an idea of what this API can do, play with it: https://www.captionbot.ai
You can find the code of this small browser extension on my GitHub: http://github.com/davrous/dareangel
You’re then free to modify the code to use other services you may like or want to test.
Tip to make your code compatible with all browsers
Most of the code and tutorials you’ll find use the chrome.xxx namespace for the extension APIs, like chrome.tabs for instance.
But the Extension API model is currently being standardized under browser.xxx, and some browsers, like Edge, define their own msBrowser namespace in the meantime.
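One common way to keep a single code base is to resolve the namespace once and use that object everywhere. The sketch below is an assumption on my part, not code taken from the repo: it prefers msBrowser, then chrome, then browser, because callback-style code (like most samples written for chrome.xxx) works with Edge’s msBrowser and with chrome.* everywhere, including Firefox, whereas Firefox’s browser.* APIs return promises. pickExtensionNamespace and browserApi are hypothetical names.

```javascript
// Returns the extension namespace exposed by the current browser.
// Order is an assumption, chosen for callback-style code:
//   msBrowser -> legacy Microsoft Edge, callback-based
//   chrome    -> Chrome, Opera, Vivaldi, Brave and Firefox, callback-based
//   browser   -> the name being standardized (promise-based in Firefox)
function pickExtensionNamespace(root) {
  return root.msBrowser || root.chrome || root.browser;
}

// In a popup or content script the global object is window; globalThis is
// only used as a fallback outside the browser (for example in unit tests).
const globalRoot = typeof window !== "undefined" ? window : globalThis;
const browserApi = pickExtensionNamespace(globalRoot);
```

The rest of the code can then call browserApi.tabs.query(...) or browserApi.runtime.onMessage.addListener(...) without caring which browser it runs in.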
Of course, you also need to stick to the subset of APIs supported by all browsers. For instance:
– Microsoft Edge documents its currently supported APIs: https://developer.microsoft.com/en-us/microsoft-edge/platform/documentation/extensions/api-support/
– Mozilla Firefox documents its current incompatibilities with Chrome: https://developer.mozilla.org/en-US/Add-ons/WebExtensions
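Beyond checking those lists, a defensive option is to test for an API at runtime before relying on it, so a feature can degrade gracefully in a browser that lacks it. A minimal sketch; supportsApi is a hypothetical helper and the dotted paths are only examples:

```javascript
// Returns true if a dotted API path such as "tabs.query" exists on the
// given extension namespace and resolves to a function.
function supportsApi(api, path) {
  let node = api;
  for (const part of path.split(".")) {
    if (node == null) return false; // namespace or intermediate object missing
    node = node[part];
  }
  return typeof node === "function";
}

// Example: only wire up a feature when every API it needs is available.
const root = typeof window !== "undefined" ? window : globalThis;
const ext = root.msBrowser || root.chrome || root.browser;
const canTalkToTabs = supportsApi(ext, "tabs.query") && supportsApi(ext, "tabs.sendMessage");
```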
Let’s review together the architecture of this extension. If you’re new to browser extensions, it should help you understand the flow.
First, let’s start with the manifest file:
This JSON is the minimum required to load the extension in all browsers. For instance, you must specify an “author” property to load it in Edge; otherwise, it will raise an error. You also need to use the same structure for the icons.
Here are the resources to help you build a manifest file compatible with all browsers:
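As a starting point, here is a minimal manifest sketch for this kind of extension, written as a small Node.js script that checks a few keys and prints the JSON to save as manifest.json. Only the “author” requirement, the icons, dashboard.html and dareangel.client.js come from this article; the icon file names, version, permissions (including the Cognitive Services host) and the REQUIRED_KEYS list are assumptions to adapt to your own project.

```javascript
// Minimal manifest sketch (values are assumptions except where noted).
const manifest = {
  manifest_version: 2,
  name: "DareAngel",
  version: "1.0.0",
  author: "Your Name", // Edge refuses to load the extension without it
  description: "Describes the images of the current page out loud",
  icons: { "16": "icons/icon16.png", "48": "icons/icon48.png", "128": "icons/icon128.png" },
  browser_action: {
    default_icon: { "19": "icons/icon19.png", "38": "icons/icon38.png" },
    default_title: "Dare Angel",
    default_popup: "dashboard.html"
  },
  content_scripts: [
    { matches: ["http://*/*", "https://*/*"], js: ["dareangel.client.js"] }
  ],
  // activeTab for the current page, plus the API host for the XHR call
  permissions: ["activeTab", "https://*.api.cognitive.microsoft.com/*"]
};

// Keys this sketch treats as mandatory (hypothetical list: the usual
// basics plus the "author" key Edge requires).
const REQUIRED_KEYS = ["manifest_version", "name", "version", "author", "icons"];

function missingManifestKeys(m) {
  return REQUIRED_KEYS.filter((key) => !(key in m));
}

// Print the JSON to paste into manifest.json if nothing is missing.
const missing = missingManifestKeys(manifest);
if (missing.length === 0) {
  console.log(JSON.stringify(manifest, null, 2));
} else {
  console.error("manifest.json is missing: " + missing.join(", "));
}
```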
The content script is simple:
It first logs a message to the console so you can check, via F12, that the extension has been properly loaded.
It then waits for a message from the UI page with a “requestImages” command, gets all the images available in the current DOM and returns the list of their URLs if they’re larger than 64×64 (to skip all the tracking-pixel junk as well as low-definition images).
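Here is a sketch of such a content script, assuming the message is an object like { command: "requestImages" } and the response is a plain array of URLs; the real dareangel.client.js may differ. collectImageUrls is a hypothetical helper kept apart from the listener so the size filter can be tested on its own.

```javascript
const MIN_SIZE = 64;

// Keep only images larger than MIN_SIZE x MIN_SIZE, which skips tracking
// pixels and tiny images. Accepts any array-like of objects exposing
// src, width and height (such as document.images).
function collectImageUrls(images, minSize = MIN_SIZE) {
  return Array.from(images)
    .filter((img) => img.width > minSize && img.height > minSize)
    .map((img) => img.src);
}

// Visible in the F12 console of the page once the script is injected.
console.log("Dare Angel content script loaded");

const root = typeof window !== "undefined" ? window : globalThis;
const ext = root.msBrowser || root.chrome || root.browser;

if (ext && ext.runtime && ext.runtime.onMessage) {
  ext.runtime.onMessage.addListener((request, sender, sendResponse) => {
    if (request && request.command === "requestImages") {
      sendResponse(collectImageUrls(document.images));
    }
  });
}
```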
The popup UI page we’re using is very simple: it displays the list of images returned by the content script inside a flexbox container. It loads the start.js script, which immediately creates an instance of dareangel.dashboard.js that sends a message to the content script to get the URLs of the images displayed in the current tab.
Here’s the code living in the UI page that requests the URLs from the content script:
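A possible shape for that request is this sketch, which uses tabs.query to find the active tab and tabs.sendMessage to reach its content script. requestImageUrls is a hypothetical name, the message shape must match whatever the content script expects, and the callback style assumes msBrowser or chrome (Firefox’s promise-based browser namespace would need .then() instead).

```javascript
// Ask the content script of the active tab for its image URLs.
// `api` is the extension namespace (msBrowser or chrome, callback style).
function requestImageUrls(api, onUrls) {
  api.tabs.query({ active: true, currentWindow: true }, (tabs) => {
    if (!tabs || tabs.length === 0) {
      onUrls([]);
      return;
    }
    api.tabs.sendMessage(tabs[0].id, { command: "requestImages" }, (response) => {
      // response is undefined when no content script answered (e.g. internal pages)
      onUrls(Array.isArray(response) ? response : []);
    });
  });
}

// Usage in the popup:
const root = typeof window !== "undefined" ? window : globalThis;
const ext = root.msBrowser || root.chrome || root.browser;
if (ext && ext.tabs) {
  requestImageUrls(ext, (urls) => console.log("Images found:", urls));
}
```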
We then create an image element for each URL; whenever an image gets the focus, it triggers an event that queries the Computer Vision API to describe it.
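The thumbnail list can be sketched as one focusable image per URL, appended to the flexbox container. This is an illustration, not the repo’s code: renderThumbnails and onSelect are hypothetical names, and the document and container are passed in so the function does not depend on a specific element ID.

```javascript
// Create one focusable <img> per URL inside `container`. Focusing an image
// (Tab key or click) calls onSelect, which is where the Computer Vision
// request would start.
function renderThumbnails(doc, container, urls, onSelect) {
  urls.forEach((url, index) => {
    const img = doc.createElement("img");
    img.src = url;
    img.alt = "Image " + (index + 1) + ", not described yet";
    img.tabIndex = 0; // makes the image reachable with the keyboard
    img.addEventListener("focus", () => onSelect(url, img));
    container.appendChild(img);
  });
}
```

In the popup, this would be called with document, the flexbox container element and the URLs returned by the content script.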
The query itself is done with a simple XHR call:
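A sketch of that XHR call, assuming the v1.0 describe endpoint in the West US region (replace the host with the region of your own key), the Ocp-Apim-Subscription-Key header for the key, and a JSON body containing the image URL. buildDescribeRequest and describeImage are hypothetical names; the request description is built separately so it can be checked without network access.

```javascript
// Assumed endpoint: adjust the region to the one of your subscription key.
const VISION_ENDPOINT = "https://westus.api.cognitive.microsoft.com/vision/v1.0/describe";

function buildDescribeRequest(imageUrl, subscriptionKey) {
  return {
    method: "POST",
    url: VISION_ENDPOINT + "?maxCandidates=1",
    headers: {
      "Content-Type": "application/json",
      "Ocp-Apim-Subscription-Key": subscriptionKey,
    },
    body: JSON.stringify({ url: imageUrl }),
  };
}

// Sends the request and calls callback(error, parsedJson).
function describeImage(imageUrl, subscriptionKey, callback) {
  const req = buildDescribeRequest(imageUrl, subscriptionKey);
  const xhr = new XMLHttpRequest();
  xhr.open(req.method, req.url, true);
  Object.keys(req.headers).forEach((name) => xhr.setRequestHeader(name, req.headers[name]));
  xhr.onload = () => {
    if (xhr.status >= 200 && xhr.status < 300) {
      callback(null, JSON.parse(xhr.responseText));
    } else {
      callback(new Error("Computer Vision API returned HTTP " + xhr.status));
    }
  };
  xhr.onerror = () => callback(new Error("Network error while calling the Computer Vision API"));
  xhr.send(req.body);
}
```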
If you’d like to understand how this Computer Vision API works, please read:
– Analyzing an Image Version 1.0, which explains what you can do with this technology
– Computer Vision API – v1.0, which shows you, via an interactive console in a web page, how to call the REST API with the proper JSON properties and which JSON object you’ll get in return. It’s useful to understand how the API works and how you will call it.
In our case, we’re using the describe feature of the API. You can also see in the callback that we try to use either the Web Speech API or the Bing Text-To-Speech service, based on your options.
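The callback side could be sketched as follows: pick the most confident caption from the describe response (its description.captions array of text/confidence pairs), then route it to a speech engine. The useWebSpeech option name and the speakWithBing function are hypothetical placeholders; this sketch does not implement the Bing Speech service call.

```javascript
// Return the most confident caption text, or null if there is none.
function extractCaption(result) {
  const captions = (result && result.description && result.description.captions) || [];
  if (captions.length === 0) return null;
  const best = captions.reduce((a, b) => (b.confidence > a.confidence ? b : a));
  return best.text;
}

// Use the Web Speech API only when the user asked for it and it exists.
function chooseSpeechEngine(options, hasWebSpeech) {
  return options.useWebSpeech && hasWebSpeech ? "webspeech" : "bing";
}

function speakDescription(text, options, speakWithBing) {
  const root = typeof window !== "undefined" ? window : globalThis;
  const hasWebSpeech =
    "speechSynthesis" in root && typeof root.SpeechSynthesisUtterance === "function";
  if (chooseSpeechEngine(options, hasWebSpeech) === "webspeech") {
    root.speechSynthesis.speak(new root.SpeechSynthesisUtterance(text));
  } else {
    speakWithBing(text); // hypothetical wrapper around the Bing Speech service
  }
}
```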
Here is the overall workflow of this little extension:
Loading the extension in each browser
Let’s quickly review how to install your extension in each browser.
Download or clone this small extension somewhere on your hard drive from my GitHub: https://github.com/davrous/dareangel
Please also modify dareangel.dashboard.js to add at least a Computer Vision API key. Otherwise, the extension will only be able to display the images extracted from the web page.
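To make the “only display the images” fallback explicit, the key can be checked before any call is attempted. A minimal sketch, where COMPUTER_VISION_KEY and hasVisionKey are hypothetical names rather than the actual variables of dareangel.dashboard.js:

```javascript
// Replace the placeholder with your own Computer Vision key.
const COMPUTER_VISION_KEY = "TODO";

// A key is usable only if it is a non-empty string other than the placeholder.
function hasVisionKey(key) {
  return typeof key === "string" && key.trim() !== "" && key !== "TODO";
}

if (!hasVisionKey(COMPUTER_VISION_KEY)) {
  console.warn("No Computer Vision API key set: images will be listed but not described.");
}
```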
Microsoft Edge
First, you need at least Windows 10 Anniversary Update (OS Build 14393+) to have support for extensions in Edge. Then:
– Open Edge and type “about:flags” in the address bar. Check “Enable extension developer features”.
– Click on “…” in Edge’s toolbar -> “Extensions” -> “Load extension” and select the folder where you’ve cloned my repo. You’ll obtain this:
– Click on this freshly loaded extension and enable the “Show button next to the address bar” option
Notice the “Reload extension” button, which is useful while you’re developing your extension. You don’t have to remove and reinstall it during the development process; just click Reload to refresh the extension.
– Navigate to http://www.babylonjs.com and click on the Dare Angel (DA) button to do the same demo as shown in the video.
Google Chrome
– Open Chrome, navigate to “chrome://extensions” and enable “Developer mode”
– Click on “Load unpacked extension” and choose the folder where you’ve extracted my extension
– Navigate to http://www.babylonjs.com and open the extension to check that it works fine.
Mozilla Firefox
You’ve got 2 options. The first one is to load your extension temporarily, which is as easy as in Edge and Chrome.
– Open Firefox, navigate to “about:debugging” and click “Load Temporary Add-on”
– Navigate to the folder of the extension and select the manifest.json file.
– That’s it! Navigate to http://www.babylonjs.com to test the extension.
The only problem with this solution is that every time you close the browser, you’ll have to reload your extension this way. The second option is to use the XPI packaging.
Opera
– Open Opera, navigate to “about://extensions” and click on the “Developer mode” button.
– Click on “Load unpacked extension…” and choose the folder where you’ve extracted my extension
– Navigate once again to http://www.babylonjs.com and open the extension to test it.
Vivaldi
– Open Vivaldi, navigate to “vivaldi://extensions” and enable “Developer mode”
– Click on “Load unpacked extension…” and choose the folder where you’ve extracted my extension
– Go to http://www.babylonjs.com and open the extension to test it.
Brave
The public version of Brave you can download from their site doesn’t embed a “developer mode” that would let you load an unsigned extension. You need to build your own version of it, following these steps:
– Clone this repo: https://github.com/brave/browser-laptop in ~/projects on your local hard drive
– Open the “browser-laptop” folder and run “npm install” inside it. It will download a lot of dependencies, so please be patient.
– Navigate inside the “app” folder and open the “extensions.js” file in your text editor. Locate the lines where you’ll insert the registration code for your extension. In my case, I’ve just added the last 2 lines:
– Copy the extension into the “app/extensions” folder.
– Open 2 command prompts inside the “browser-laptop” folder. In the first one, launch “npm run watch” and wait for webpack to finish building the Brave electron app. It should say “webpack: bundle is now VALID”. If not, you’ll run into some issues 😉
– Then, in the second one, run “npm start”, which launches this slightly customized version of Brave
– In Brave, navigate to “about:extensions” and you should see the extension displayed and loaded in the address bar.
Debugging the extension in each browser
Microsoft Edge
To debug the client script part, living in the context of the page, you just need to open F12. Then click on the “Debugger” tab and find your extension folder.
Open the script file you’d like to debug, dareangel.client.js in my case, and debug your code as usual, setting up breakpoints, etc.
If you’d like to debug the popup page, you first need to get the ID of your extension. For that, simply open the extension’s properties and you’ll find an ID property:
Then, type in the address bar something like: ms-browser-extension://ID_of_your_extension/yourpage.html. In our case, it would be, for instance: ms-browser-extension://DareAngel_vdbyzyarbfgh8/dashboard.html. Then, simply use F12 on this page:
Chrome, Opera, Brave & Vivaldi
As Chrome and Opera rely on the same Blink code base, they share the same debug process. Even though Brave and Vivaldi are forks of Chromium, they also share the same debug process most of the time.
To debug the client script part, open F12 on the page you’d like to debug (or CTRL+SHIFT+I in Opera).
To debug a tab your extension creates, it’s exactly as with Edge: simply use F12.
For Chrome & Opera, to debug the popup page, right-click on your extension’s button next to the address bar and choose “Inspect popup”, or open the popup’s HTML pane and right-click inside it to choose “Inspect”. Vivaldi only supports right-click -> Inspect inside the HTML pane once it’s opened.
For Brave, it’s the same process as with Edge. You first need to find the GUID associated with your extension in “about:extensions”:
Then open the page you’d like to debug in a separate tab, in my case “chrome-extension://bodaahkboijjjodkbmmddgjldpifcjap/dashboard.html”, and use F12.
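Since the URL pattern is the same for every page of the extension, it can be built from the scheme and the extension ID. A small sketch covering only the two schemes used above for Edge and Brave/Chromium; extensionPageUrl and the browser labels are hypothetical names:

```javascript
// Schemes used to open an extension page in a regular tab.
const EXTENSION_SCHEMES = {
  edge: "ms-browser-extension", // legacy Microsoft Edge
  chromium: "chrome-extension", // Chrome, Opera, Vivaldi, Brave
};

function extensionPageUrl(browserKind, extensionId, page) {
  const scheme = EXTENSION_SCHEMES[browserKind];
  if (!scheme) throw new Error("Unknown browser kind: " + browserKind);
  // Strip leading slashes so "dashboard.html" and "/dashboard.html" both work.
  return scheme + "://" + extensionId + "/" + page.replace(/^\/+/, "");
}
```

From code running inside the extension itself, runtime.getURL("dashboard.html") (for example chrome.runtime.getURL) returns the full URL directly, which avoids looking up the ID by hand.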
For the layout, “SHIFT+F8” helps a bit: it lets you inspect the complete frame of Brave. And you’ll discover that Brave is an Electron app using React.js! 🙂
Notice for instance the “data-reactroot” attribute.
Note: I had to slightly modify the CSS of the extension for Brave, as it currently displays popups with a transparent background by default, and I also had some issues with the height of my image collection. I’ve limited it to 4 elements for Brave.
Mozilla Firefox
Mozilla has really great documentation on Web Extensions debugging: https://developer.mozilla.org/fr/Add-ons/WebExtensions/Debugging
For the client script part, it’s the same story as in Edge/Chrome/Opera/Brave, simply open F12 on the tab you’d like to debug and you’ll find a “moz-extension://guid” section with your code to debug:
If you need to debug a tab your extension creates (like the Vorlon.js Page Analyzer extension does), simply use F12:
Finally, to debug a popup, it’s a bit more complex but well explained in the documentation in the “Debugging popups” section.
It’s awesome to see that today, using our regular JS/CSS/HTML skills, we can build great extensions with the very same code base working in all browsers!
I’ve tried to share as much as possible what I’ve learned while working on our Vorlon.js Page Analyzer extension and this little proof-of-concept.
Feel free to ping me on Twitter (@davrous) with any feedback.