Kepler G-Pack

June 22nd, 2012

Kepler/G-Pack: A Kepler Package Using the Google Cloud for Interactive Scientific Workflows
(a.k.a. Koogle-Kurator Package)

Scientific workflows often aim to fully automate complex data analysis pipelines. However, depending on the nature of the workflow, interactive steps, i.e., steps that involve human analysis and decision making within the workflow, are also common. So far, Kepler has had limited capabilities for letting users become “actors” in a scientific workflow. The Kepler/G-Pack (Google package) is a set of actors that leverage a simple form of cloud computing via Google Apps and Google Docs, effectively “outsourcing” Kepler tasks to Google resources.

With this package, many of the tasks and steps for which users would use Google's document and computational cloud can now be orchestrated programmatically by Kepler. For example, via certain actors in this package, Google spreadsheets can be used as data sources, sinks, or computational steps (“transformers”) in a Kepler workflow. Additional functionality includes creating a new copy of a spreadsheet from a template, sharing a spreadsheet with another user, emailing that user a notification of the share, signaling to the workflow that a human interactive session has finished (“committed”) so that the workflow instance may proceed, and visualizing analysis outputs via Google charts. We will demonstrate the capabilities of the Kepler/G-Pack using different workflows, e.g., (1) a “Data Curation Workflow” in which biological specimen records are semi-automatically curated: the upstream part of the workflow groups records, identifying “curation work packages” that curators can validate or manually revise (this workflow shows the use of spreadsheets for data fusion and manual data cleaning within the workflow); and (2) an “Evapotranspiration Workflow”, which shows how Google spreadsheets can be used for remote computation and visualization.
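The source / transformer / sink roles and the “committed” signal described above can be illustrated with a small Python sketch. It is not the G-Pack implementation and does not call any Google or Kepler API: `Sheet` is a hypothetical in-memory stand-in for a shared spreadsheet, and `source`, `transform`, `sink`, `wait_for_commit`, and `run_workflow` are names invented here for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Sheet:
    """Hypothetical in-memory stand-in for a shared spreadsheet."""
    rows: list = field(default_factory=list)
    committed: bool = False  # set once the human curator is done


def source(sheet):
    """Spreadsheet as data source: read a copy of all rows."""
    return [dict(row) for row in sheet.rows]


def transform(records):
    """Automated step: normalise whitespace and case of the 'name' field."""
    return [{**r, "name": r["name"].strip().title()} for r in records]


def sink(sheet, records):
    """Spreadsheet as data sink: overwrite the sheet with the records."""
    sheet.rows = [dict(r) for r in records]


def wait_for_commit(sheet, poll, max_polls=10):
    """Poll until the sheet is marked committed.

    `poll` is a callable standing in for one round of waiting on the
    human curator; a real workflow would sleep and re-read remote state.
    """
    for _ in range(max_polls):
        if sheet.committed:
            return True
        poll(sheet)
    return sheet.committed


def run_workflow(raw, work, curator):
    """Automated upstream part, interactive curation, then downstream read."""
    sink(work, transform(source(raw)))
    if not wait_for_commit(work, curator):
        raise TimeoutError("curation session was not committed")
    return source(work)


if __name__ == "__main__":
    raw = Sheet(rows=[{"name": "  quercus alba "}])
    work = Sheet()

    def curator(sheet):
        # Simulated human revision followed by a commit.
        sheet.rows[0]["name"] = "Quercus alba"
        sheet.committed = True

    print(run_workflow(raw, work, curator))
```

In this sketch the workflow only proceeds past the interactive step once the commit flag is set, which mirrors the idea of a human session handing control back to the workflow instance.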

This package includes two sub-packages: Koogle (Google in Kepler) and Kurator (Curation in Kepler). The Koogle package works on the regular Kepler platform, while Kurator works on COMAD, a platform extended from Kepler. Koogle actors can be encapsulated as COMAD actors, in a way similar to composite actors in Kepler, so that actors in Koogle and Kurator can work together seamlessly.
