Tutorial: Distributed R and Bioconductor for the Web


Karim Chine, European Bioinformatics Institute, Hinxton, Cambridge, United Kingdom.

Description

R is becoming the 'lingua franca' of data analysis and statistical computing. It has a very powerful graphics system as well as cross-platform capabilities for packaging any computational code. Hundreds of available R packages implement the most up-to-date computational methods and reflect the state of the art of research in various fields, and R packages are expected to become an enabler of reproducible research. Because R is GPL-licensed software, there is no licensing obstacle to deploying it at large scale on public grids. However, R is not multi-threaded, does not operate as a server, and offers only a low-level, non-object-oriented API. GUI development for R remains non-standardized, and R's potential as a computational back-end engine for applications has yet to be fully exploited. While its user base is growing rapidly, this growth would be significantly faster with a rich, user-friendly workbench.

Biocep is a general, unified solution for integrating and virtualizing access to R engines and servers, and aims to become a federating, user-friendly computational e-platform for research, finance and education. The Biocep virtual workbench makes every element of a computational environment pluggable: the computational resource (a local machine, a cluster, a grid or a cloud server, reached via a simple URL), the computational components (via the import of R packages) and the computational GUIs (via plugins imported from repositories, or new views designed with a drag-and-drop GUI editor).

Several dockable built-in views allow users to work interactively and collaboratively with grid-enabled R engines. The views include a console, interactive graphics devices, a workspace explorer, PDF and SVG viewers, R data inspectors, linked plots and collaboration-enabled spreadsheets fully integrated with R functions and data. Biocep is also a toolbox that can be used to generate stateless and stateful web services mapping R functions, and it enables Python/Groovy scripting with R on both the client and the server sides. Using the Biocep frameworks, pools of R engines can be deployed on heterogeneous nodes, managed, and used for parallel and distributed computing and for generating dynamic content on the fly for web applications.

A Biocep-based R virtualization infrastructure has been successfully deployed on the British National Grid Service, and the results show how useful this service could be for researchers. If the new platform were widely adopted, it would greatly enhance the usability of existing HPC infrastructures and increase their usage. It may also enable a new computing business model that combines the utility computing model (resources) with the pay-per-use software model (components and GUIs).

This tutorial introduces the new platform. It is useful for anyone interested in e-computing/e-statistics, collaborative data analysis, cloud/grid computing and user-friendly HPC. It is also useful for developers of distributed and web-oriented applications for data analysis.

The attendees will learn:

  1. How to run R servers and connect them to the virtual R workbench.
  2. How to use the virtual R workbench views for improved user experience.
  3. How to use the workbench to analyse data collaboratively with several partners through the various collaboration-enabled views.
  4. How to use R as a computational toolkit for Java applications or for scripting in Jython or Groovy (an overview of the RServices API).
  5. How to create new views for the workbench with the NetBeans GUI builder.
  6. How to deploy a managed infrastructure of R engine pools and use it for parallel and distributed computing.
  7. How to generate and deploy stateless and stateful web services mapping R functions.
  8. How to use R as a web toolkit to generate dynamic content (graphics, statistical analysis results) for web applications.

Resources, downloads and source code are available at http://www.biocep.net/.