3. JS Instrument technical documentation
Of all the instruments in the OpenWPM WebExtension, the JavaScript instrument is likely to collect the most information. It lets users specify which WebAPI calls they are interested in and records in detail how websites use the instrumented APIs.
To allow for this rich data collection, it employs a number of tricks and subtleties, which this document aims to capture.
TL;DR: We pass the configuration to a content script in the WebExtension. In the content scope we generate a string that contains the script we want to execute on the page and then insert it into the page. This script is literally a format string in which the configuration gets embedded via JSON.stringify.
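The format-string idea from the TL;DR can be sketched as follows. This is a minimal illustration, not the extension's actual code: the InstrumentationSetting shape and the buildPageScript name are assumptions made for this example. It relies on the fact that JSON.stringify output is also a valid JavaScript expression (guaranteed since ES2019), so it can be pasted into script source as a literal.

```typescript
// Hypothetical settings shape, for illustration only; the real
// jsInstrumentationSettings carry more fields.
interface InstrumentationSetting {
  object: string;
  propertiesToInstrument: string[];
}

// Build the source text of a script that is meant to run in the page.
// The configuration is serialized with JSON.stringify and embedded into
// the template string as a JavaScript literal.
function buildPageScript(settings: InstrumentationSetting[]): string {
  return `(function () {
  const settings = ${JSON.stringify(settings)};
  // A real page script would start instrumenting here; this sketch only
  // reports how many settings were embedded.
  return settings.length;
})();`;
}
```

Because the configuration becomes part of the script text, the page script needs no extra channel to receive it once it is running.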
3.1. Setting up the instrumentation
In the JavascriptInstrument class, which runs in the background script, we register two content scripts to run at document_start. These are:
1. A dynamically generated script that sets window.openWpmContentScriptConfig to the JSON.stringified value of the contentScriptConfig.
2. content.js, which is the combination of javascript-instrument-page-scope and javascript-instrument-content-scope as produced by webpack.
By registering the two in this order, we are able to pass a parameter to the content script. I currently do not know of another way to dynamically pass config from the background to the content scope, but this feels hacky.
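The two registrations could look roughly like the sketch below. It only builds the option objects, so it can run outside a browser; in Firefox, objects of this shape are what browser.contentScripts.register accepts. The ContentScriptConfig fields, the "/content.js" path, the "<all_urls>" match pattern and the buildRegistrations name are assumptions for illustration.

```typescript
// Hypothetical config fields, for illustration only.
interface ContentScriptConfig {
  testing: boolean;
  jsInstrumentationSettings: unknown[];
}

// Subset of the options accepted by browser.contentScripts.register.
interface ContentScriptOptions {
  js: Array<{ code: string } | { file: string }>;
  matches: string[];
  allFrames: boolean;
  runAt: "document_start";
  matchAboutBlank: boolean;
}

function buildRegistrations(config: ContentScriptConfig): ContentScriptOptions[] {
  const shared = {
    matches: ["<all_urls>"], // assumed match pattern
    allFrames: true,
    runAt: "document_start" as const,
    matchAboutBlank: true,
  };
  return [
    // 1. Generated script: stores the config on the content script's window.
    {
      ...shared,
      js: [{ code: `window.openWpmContentScriptConfig = ${JSON.stringify(config)};` }],
    },
    // 2. The webpack bundle, which reads the config set by the first script.
    { ...shared, js: [{ file: "/content.js" }] }, // assumed bundle path
  ];
}
```

In the background script, each returned object would be passed to browser.contentScripts.register in array order, so that the config script is registered before content.js.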
In javascript-instrument-content-scope we then create a massive string that contains all of the following:
- The lib/js-instruments.ts file, where the actual instrumenting happens
- The jsInstrumentationSettings as a JSON object
- The javascript-instrument-page-scope, which contains the setup and sendMessagesToLogger functions
This string is then injected into the page scope, where javascript-instrument-page-scope starts executing, pulling the testing and event parameters out of data attributes on its script node. It then calls into lib/js-instruments.ts, which does the actual instrumentation.
See CurrentFlow for a diagram I made as part of my bachelor’s thesis to demonstrate the flow of information.
3.2. Data collection
TL;DR: We wrap each WebAPI that we should instrument and forward every call made to the wrapper to the underlying object, while logging the accesses. This is done by the injected script mentioned above.
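The wrap-and-forward pattern can be sketched on a single method as shown below. This is a simplified illustration under stated assumptions: the AccessRecord shape and the instrumentMethod name are hypothetical, and the real lib/js-instruments.ts handles many more cases (property reads and writes, prototypes, and so on).

```typescript
// Hypothetical log record; the real instrument records more detail.
interface AccessRecord {
  symbol: string;
  operation: "call";
  args: unknown[];
}

// Replace target[methodName] with a wrapper that logs each call and then
// forwards it, with the same `this` and arguments, to the original method.
function instrumentMethod(
  target: object,
  objectName: string,
  methodName: string,
  log: (record: AccessRecord) => void,
): void {
  const original: unknown = Reflect.get(target, methodName);
  if (typeof original !== "function") {
    throw new TypeError(`${objectName}.${methodName} is not a function`);
  }
  Object.defineProperty(target, methodName, {
    configurable: true,
    writable: true,
    value: function (this: unknown, ...args: unknown[]): unknown {
      log({ symbol: `${objectName}.${methodName}`, operation: "call", args });
      // Forward to the underlying implementation so page behavior is unchanged.
      return Reflect.apply(original, this, args);
    },
  });
}
```

Callers keep receiving the original return value, so the page should not be able to tell from the result alone that the call was logged.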
3.3. Getting the data into the Database
Since the data collection happens in the website scope, but we need the data in the database, we had to figure out a way to get it there.
We do this via the following steps:
1. Dispatch a custom event via document.dispatchEvent in javascript-instrument-page-scope.
2. Register a listener for the custom event in javascript-instrument-content-scope and call runtime.sendMessage to pass it from the content scope into the background scope.
3. There, javascript-instrument (in the background scope) receives the message and forwards it to the loggingdb.
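The three hops above can be simulated without a browser, as in the sketch below. Everything here is a stand-in chosen for illustration: FakePageDocument replaces document.addEventListener and document.dispatchEvent, the sendMessage callback replaces runtime.sendMessage, and an array replaces the loggingdb. The JsInstrumentMessage shape and all function names are assumptions.

```typescript
// Hypothetical message shape, for illustration only.
interface JsInstrumentMessage {
  type: string;
  data: Record<string, unknown>;
}

type DetailListener = (detail: JsInstrumentMessage) => void;

// Stand-in for the DOM event API that page and content scope share.
class FakePageDocument {
  private listeners = new Map<string, DetailListener[]>();

  addEventListener(eventId: string, listener: DetailListener): void {
    const existing = this.listeners.get(eventId) ?? [];
    this.listeners.set(eventId, [...existing, listener]);
  }

  dispatchEvent(eventId: string, detail: JsInstrumentMessage): void {
    for (const listener of this.listeners.get(eventId) ?? []) {
      listener(detail);
    }
  }
}

// Step 1 (page scope): dispatch a custom event carrying the record.
function pageScopeSend(
  doc: FakePageDocument,
  eventId: string,
  message: JsInstrumentMessage,
): void {
  doc.dispatchEvent(eventId, message);
}

// Step 2 (content scope): listen for the event and relay it onward.
// In the extension, sendMessage would be runtime.sendMessage.
function installContentRelay(
  doc: FakePageDocument,
  eventId: string,
  sendMessage: (message: JsInstrumentMessage) => void,
): void {
  doc.addEventListener(eventId, (detail) => sendMessage(detail));
}

// Step 3 (background scope): receive the message and store it.
// The array stands in for the loggingdb.
function backgroundOnMessage(
  store: JsInstrumentMessage[],
  message: JsInstrumentMessage,
): void {
  store.push(message);
}
```

Using a per-run event id, as in this sketch, means only events dispatched under the agreed name reach the content-scope listener.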