data science, dynamic simulation modelling, genomics, interactive visualisation, dashboards, image & video analysis
e: cnr.lwlss@gmail.com t: @cnrlwlss
In a previous post, I used the interactive plot below to visualise and compare population growth curves simulated from three members of a family of models that share the same set of parameters. The user can adjust parameter values and explore, in real time, how the similarities and differences between simulations from the three models change. In this post I will discuss where dynamic visualisation of quantitative analysis is most useful and how the calculations underlying this analysis were carried out in JavaScript. In a later post, I will outline how the plots themselves were constructed using the powerful D3 JavaScript library.
Interactive visualisation is a great tool for quickly getting an intuition for the implications of a piece of quantitative analysis. One of its most attractive features is that it allows users to focus just on results and data without getting overwhelmed by technical implementations, or by difficult mathematical, statistical or programming issues. This clear focus allows analysis to be consumed by a wider audience, increasing impact.
It is possible to produce interactive visualisation tools using scientific programming languages such as R, Python, Julia, Matlab or Mathematica. All of these software tools are very powerful, and greatly increase scientist/programmer productivity. However, much like Stephen Hawking’s observation that you lose half your audience every time you present a mathematical equation, I suspect that with every software installation step you require your end users to go through, you lose a similar proportion. If users are forced to install unfamiliar software, and to learn to deal with package installation, documentation, basic programming, etc. just to generate or view a plot, it can dampen their enthusiasm for quantitative analysis generally.
For example, it is possible to create beautiful, dynamic, interactive plots in the statistical programming language R, which is freely and easily available to download and install. R has a powerful package installation system and a clean, command-line interface. However, developing software, even interactive tools, within R means that a non-trivial commitment of time and effort is required from new users before they can engage with your data or analysis. Powerful scientific computing tools like R, Python or Julia allow experts to develop quantitative workflows rapidly, but they can also be a barrier to engagement for untrained users.
Web-browsers must be among the most ubiquitous pieces of software in the world. They run on computers, phones and tablets. Their output can be displayed on anything from a television to a public billboard. Importantly, almost everyone has spent a considerable amount of time becoming familiar with how to use and interact with web-pages through their browser. This ubiquity makes web-pages the ideal environment for running interactive visualisations of quantitative analysis: it minimises the amount of work that the user has to do to engage with a visualisation tool.
Tools like Plotly and Shiny are very useful and allow the rapid development of web-based interactive visualisations. However, these tools have their own limitations. Both are developed by commercial companies, and certain features and hosting options must be paid for, which can cost a significant amount of money, for commercial applications in particular. They are essentially separate languages, which means that the time developers spend learning to use these tools does not really benefit them in any other area. They also rely on server-side technologies: calculations underlying the visualisation are carried out on another machine. This can be a great advantage, of course. The server on which calculations are carried out could be very powerful and could be loaded with vast quantities of relevant data. However, servers must be maintained and remain available in order for your visualisation to continue working. Perhaps the greatest limitation of tools like Shiny and Plotly is simply the restricted range of visualisations that are achievable with them, compared to more flexible, open, client-side alternatives such as d3.js.
One drawback of the d3.js approach to creating interactive visualisations is that it requires carrying out at least some of the operations needed to build the visualisation in JavaScript. JavaScript has a bad reputation as a poorly designed language. However, it is an incredibly important language, given its complete support by all modern web-browsers. Along with HTML and CSS it is one of the three core languages of the web. Also, despite its admittedly poor design, it does have some redeeming features for scientific computing, including some features of functional programming: functions as first class objects (allowing function closures) and the ability to map functions over arrays. There is an enormous range of JavaScript libraries available for carrying out all kinds of interesting tasks and calculations (e.g. numerical integration of ODEs…).
Overall, for simple analysis or for direct visualisation of data, I think that the ease with which users can engage with interactive visualisation in the browser, and the amazing, beautiful array of visualisations that are possible with d3.js, make it worth becoming familiar with using JavaScript for some types of scientific computing.
Here I will describe the full set of calculations for generating simulated outputs from the family of three population models I presented in a previous post.
In this example, since the three models described in a previous article all have analytical solutions, we don’t require JavaScript to do anything more complicated than carry out basic mathematical functions like exp and pow. Nevertheless, the nested nature of the models allows for using function closures to maximise code reuse. This is good practice in JavaScript, and in any other language where functions are first class objects (i.e. can be stored in variables, passed as arguments and returned as the result of another function).
First of all, we should note that one of the worst features of JavaScript is that variables are global by default: any variable declared at the top level of a script, or assigned anywhere without a var declaration, ends up in the global namespace. This can lead to severe pollution of the global namespace. Storing lots of variables in the global namespace is probably not a big deal if you are writing a short script that you won’t reuse and are not importing any external libraries which might contain variable names that could clash with yours, but it can quickly become a source of bugs as your code becomes more complex. In order to minimise this problem, the best thing to do is to declare one object in the global namespace (pd below), and then store all the variables, functions and objects that you need for your application in that object.
var pd={};
For example, we can define some generalised logistic model parameters and an initial condition as follows, avoiding adding N0, r, v and K to the global namespace. The price you pay for this is having to constantly indicate that your new variable is a member of the global object by prepending the name of your global object followed by a “.” (in this case “pd.”) to the names of all of your variables, objects and functions.
pd.N0 = 0.01; // Initial condition
pd.r = 1; // Model parameters
pd.v = 0.75;
pd.K = 1;
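Note that JavaScript is dynamically and weakly typed: variables have no declared type, and values are silently converted between types (for example between numbers and strings) where an operation requires it. This can be fine for short pieces of code, but it can lead to some nasty bugs if you don’t watch out for it.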
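To illustrate the kind of bug that can arise, here is a minimal sketch in which a parameter value arrives as text, as it would from the value of an HTML input field. The variable names are hypothetical and are not part of the application described in this post.

```javascript
// Hypothetical example: a parameter read from a text box arrives as a string
var rFromInput = "1";

// "+" concatenates when either operand is a string, so no error is raised
var wrongSum = rFromInput + 0.5;          // "10.5" (a string)

// Converting explicitly to a number first gives the intended arithmetic
var rightSum = Number(rFromInput) + 0.5;  // 1.5 (a number)

console.log(typeof wrongSum, wrongSum);
console.log(typeof rightSum, rightSum);
```

Converting inputs explicitly at the point where they enter the program is one simple way to keep this kind of surprise out of the numerical code.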
As we saw in the previous article, the analytical solution of the generalised logistic model can be collapsed down to produce either of the other two models in the family by appropriate choice of parameter values. This gives us the opportunity to generate any of the three models using the same piece of code. For simulation and plotting, typically we want to fix parameter values (r, v & K) and initial conditions (N_0), but vary time (t). To make things as clear and simple as possible, I have written a function closure (a function that returns another function as its output) to generate a one-dimensional function of time t, given a set of parameter values and an initial condition.
// Closure returning a function representing how the solution
// of the generalised logistic ODE depends on time
pd.makeGlogist = function(N_0,r,v,K){
  // var keeps Glogist and N local instead of leaking into the global namespace
  var Glogist = function(t){
    var N = K/Math.pow(1.0+(-1.0+Math.pow(K/N_0,v))*Math.exp(-r*v*t),1/v);
    return(N);
  };
  return(Glogist);
};
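As a quick check of how the closure behaves, the sketch below repeats it as a standalone function (so that the snippet runs on its own, outside the pd object) and evaluates the returned function at two times. The parameter values are the ones defined earlier; the name glog is hypothetical.

```javascript
// Standalone copy of the closure, so this snippet runs on its own
var makeGlogist = function (N_0, r, v, K) {
  return function (t) {
    return K / Math.pow(1.0 + (-1.0 + Math.pow(K / N_0, v)) * Math.exp(-r * v * t), 1 / v);
  };
};

// Fix the parameters once; the result is a function of time only
var glog = makeGlogist(0.01, 1, 0.75, 1);

console.log(glog(0));    // at t = 0: the initial condition N_0, up to rounding error
console.log(glog(100));  // at large t: approaches the carrying capacity K
```

The parameters are captured when makeGlogist is called, so glog can be passed around and evaluated without carrying N_0, r, v and K along with it.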
Although the solution to the generalised logistic model is referred to as Glogist within the scope of the makeGlogist function, the value (function) returned does not actually include a name. As with values returned from any function, if you need to reuse them, you should capture them by storing them in a variable; a name should be provided for each new function as it is generated. For example, we can generate 1D functions for simulating from the exponential, logistic and generalised logistic models with the parameter set specified above as follows:
// Generate functions for carrying out simulation
// (the initial condition was stored above as pd.N0)
pd.Exp = pd.makeGlogist(pd.N0,pd.r,1.0,99999999999999999999);
pd.Logist = pd.makeGlogist(pd.N0,pd.r,1.0,pd.K);
pd.Glogist = pd.makeGlogist(pd.N0,pd.r,pd.v,pd.K);
The exponential model is generated from the generalised logistic model by setting the K parameter to a very large number (approximating infinity), and setting v = 1. The logistic model is generated from the generalised logistic model by setting v = 1. Now that we have three functions of t, in order to simulate and eventually plot results, we need a range of values of t at which to evaluate these functions. Assuming that we will always simulate from t = 0 up to t = tmax, divided into Ntsteps time intervals, we can specify two simulation parameters:
pd.tmax = 10; // Simulation parameters
pd.Ntsteps = 10;
which we can use to generate an array (tarr) containing a regularly spaced set of times at which to evaluate our models:
// Generate times at which to simulate observations
pd.tarr = [];
for (pd.i = 0; pd.i <= pd.Ntsteps; pd.i+=1){
pd.tarr[pd.i] = pd.i*pd.tmax/pd.Ntsteps;
};
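If a browser supporting ES2015 can be assumed (an assumption on my part; the loop above makes no such demand), the same grid of times can be sketched more compactly with Array.from. The local names tmax, Ntsteps and tarr below stand in for the corresponding members of pd.

```javascript
// Local stand-ins for pd.tmax and pd.Ntsteps
var tmax = 10;
var Ntsteps = 10;

// Ntsteps intervals need Ntsteps + 1 time points, from 0 to tmax inclusive
var tarr = Array.from({ length: Ntsteps + 1 }, function (unused, i) {
  return i * tmax / Ntsteps;
});

console.log(tarr.length);             // number of time points: Ntsteps + 1
console.log(tarr[0], tarr[Ntsteps]);  // first and last times: 0 and tmax
```

This version has the side benefit of not needing a loop counter stored on the pd object.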
Next, we can use tarr to simulate from the three models by using JavaScript’s elegant map method for applying functions to arrays:
// Simulate & store results
pd.NExp = pd.tarr.map(pd.Exp);
pd.NLog = pd.tarr.map(pd.Logist);
pd.NGlog = pd.tarr.map(pd.Glogist);
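As a sanity check on the limiting case, values from the large-K version of the model can be compared with the closed-form exponential solution N_0 exp(rt). The sketch below is self-contained (it repeats the closure as a standalone function), writes the large value of K as 1e20, and uses the hypothetical names expApprox, times and maxRelErr.

```javascript
// Standalone copy of the closure, so this snippet runs on its own
var makeGlogist = function (N_0, r, v, K) {
  return function (t) {
    return K / Math.pow(1.0 + (-1.0 + Math.pow(K / N_0, v)) * Math.exp(-r * v * t), 1 / v);
  };
};

var N0 = 0.01;
var r = 1;

// v = 1 and a very large K: the carrying capacity is effectively never reached
var expApprox = makeGlogist(N0, r, 1.0, 1e20);

// Compare with the exact exponential solution at t = 0, 1, ..., 10
var times = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
var maxRelErr = 0;
times.forEach(function (t) {
  var exact = N0 * Math.exp(r * t);
  var relErr = Math.abs(expApprox(t) - exact) / exact;
  maxRelErr = Math.max(maxRelErr, relErr);
});

console.log(maxRelErr);  // should be tiny if the large-K trick is working
```

Note that passing Infinity as K would not work here: the formula would then involve Infinity divided by Infinity, which evaluates to NaN, so a large finite number is needed.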
To get a quick look at the numerical output we can write a report to screen:
// Report results
pd.report = function(t,Nexp,Nlog,Nglog){
document.writeln("t\t","NExp(t)\t","NLog(t)\t","NGlog(t)");
for (pd.i = 0; pd.i < t.length; pd.i+=1){
document.writeln(t[pd.i].toFixed(2)+"\t",Nexp[pd.i].toFixed(3)+"\t",Nlog[pd.i].toFixed(3)+"\t",Nglog[pd.i].toFixed(3));
};
};
pd.report(pd.tarr,pd.NExp,pd.NLog,pd.NGlog);
Some example output below:
Tags: interactive visualisation javascript functional d3 simulation