Ada workflows

Ada is a performant and highly configurable system for secured integration, visualization, and collaborative analysis of heterogeneous data sets, primarily targeting clinical and experimental sources. It provides the key infrastructure for integrating, visualizing, and analyzing the heterogeneous clinical and experimental data generated during the AETIONOMY project, where it serves as the AETIONOMY Reporting System for the clinical data. Ada enables you to:

  • Import and export data sets
    For quick, seamless data integration, Ada provides import adapters for 6 file formats and APIs. For post-processing, data can be exported into 3 file formats or pulled directly through Ada's RESTful API.

  • Access the dictionary
    The dictionary panel lets you change how fields are displayed on the fly. You can also inspect the field types inferred during data set import.

  • Explore the stats of your data
    Create a view with dynamic and responsive widgets showing different stats, such as distributions, scatter plots, correlations, and box plots.

  • Compare different data subsets
    Comparing different subsets has never been easier. Simply define two or more filters and let Ada do the rest.

  • Unleash the power of Spark-based ML
    Any data set residing in Ada can be "mined" using classification (e.g., random forest or multilayer perceptron) or regression (e.g., generalized linear regression or gradient-boosted regression trees).
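Ada performs filtering and subset comparison inside the application, and its internals are not described here. As a minimal sketch of the idea behind comparing subsets, assuming records are plain Python dictionaries and filters are predicates, each named filter selects a subset and the same summary statistics are computed for every subset so the results can be set side by side. The record fields, the `compare_subsets` function, and the example data are hypothetical and are not part of Ada's API.

```python
from statistics import mean, stdev
from typing import Callable, Dict, List

# Hypothetical record and filter types; Ada's internal representations differ.
Record = Dict[str, object]
Filter = Callable[[Record], bool]


def compare_subsets(records: List[Record], filters: Dict[str, Filter],
                    field: str) -> Dict[str, Dict[str, float]]:
    """Apply each named filter and summarize the same numeric field per subset."""
    summary: Dict[str, Dict[str, float]] = {}
    for name, predicate in filters.items():
        # Keep records that pass the filter and carry a numeric (non-boolean) value.
        values = [
            r[field] for r in records
            if predicate(r)
            and isinstance(r.get(field), (int, float))
            and not isinstance(r.get(field), bool)
        ]
        summary[name] = {
            "count": len(values),
            # NaN marks statistics that are undefined for too few values.
            "mean": mean(values) if values else float("nan"),
            "stdev": stdev(values) if len(values) > 1 else float("nan"),
        }
    return summary


# Hypothetical example: compare age between two diagnosis groups.
records = [
    {"diagnosis": "AD", "age": 70},
    {"diagnosis": "AD", "age": 80},
    {"diagnosis": "control", "age": 60},
    {"diagnosis": "control", "age": 64},
    {"diagnosis": "control"},  # missing age is skipped
]
filters = {
    "AD": lambda r: r["diagnosis"] == "AD",
    "control": lambda r: r["diagnosis"] == "control",
}
print(compare_subsets(records, filters, "age"))
```

Adding a third filter only means adding one more entry to `filters`; every subset is summarized with the same statistics, which is what makes the comparison meaningful.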

Figure 1: Ada dynamic and user-defined views for statistics

One of Ada's main features is the ability to create dynamic, personalized views containing filters, statistical widgets, and tables, which can be saved and shared among users. This makes Ada an ideal tool for collaborative reporting.

Figure 2: Ada editable dictionary and tree

Metadata: editable dictionary and tree
To define a data set's metadata, Ada provides an editable dictionary and a categorical tree with drag-and-drop manipulation. Ada supports many field types, including number, date, boolean, enumeration, and JSON. The fields and their types (collectively called the dictionary) can be inferred automatically during import. Further, each field type can be either scalar or array, which makes Ada's type system flexible enough to cover a wide range of data origins and flavors.
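Ada's actual inference rules are not described here. The following Python sketch illustrates one plausible approach under stated assumptions: each raw value is classified by trying boolean, number, ISO 8601 date, and JSON object in turn; a value written as a JSON list marks the field as an array; and a string field with few distinct values is treated as an enumeration. The function names, the rule order, the `max_enum_values` threshold, and the fallback `string` type are all hypothetical.

```python
import json
from datetime import date
from typing import Iterable, Optional, Set, Tuple


def scalar_type(value: str) -> str:
    """Classify one raw value; the rule order is an assumption, not Ada's."""
    v = value.strip()
    if v.lower() in ("true", "false"):
        return "boolean"
    try:
        float(v)
        return "number"
    except ValueError:
        pass
    try:
        date.fromisoformat(v)  # ISO 8601 dates only, e.g. 2020-01-31
        return "date"
    except ValueError:
        pass
    if v.startswith("{"):
        try:
            json.loads(v)
            return "json"
        except ValueError:
            pass
    return "string"


def infer_field_type(raw_values: Iterable[Optional[str]],
                     max_enum_values: int = 10) -> Tuple[str, bool]:
    """Infer (type, is_array) for one column of raw string values."""
    types: Set[str] = set()
    distinct: Set[str] = set()
    is_array = False
    for raw in raw_values:
        if raw is None or not raw.strip():
            continue  # missing values carry no type information
        elements = [raw.strip()]
        if elements[0].startswith("["):
            try:
                parsed = json.loads(elements[0])
            except ValueError:
                parsed = None
            if isinstance(parsed, list):
                # Any JSON-list value marks the whole field as an array.
                is_array = True
                elements = [e if isinstance(e, str) else json.dumps(e) for e in parsed]
        for e in elements:
            types.add(scalar_type(e))
            distinct.add(e)
    if len(types) == 1:
        field_type = next(iter(types))
    else:
        field_type = "string"  # no values or conflicting evidence: most general type
    if types == {"string"} and len(distinct) <= max_enum_values:
        field_type = "enumeration"  # few distinct labels: treat as categorical
    return field_type, is_array


print(infer_field_type(["1.5", "2", None, ""]))  # ('number', False)
print(infer_field_type(["[1, 2]", "[3]"]))  # ('number', True)
print(infer_field_type(["male", "female", "male"]))  # ('enumeration', False)
```

Separating the element type from the scalar/array flag mirrors the split described above: the same set of base types applies whether a field holds one value or a list of values.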

Ada offers advanced machine learning functionality backed by Spark ML, the machine learning library of Apache Spark, a popular cluster-computing framework for efficient large-scale data processing and analysis. From Spark, Ada integrates several classification (e.g., random forest), regression (e.g., gradient-boosted regression trees), and clustering routines (e.g., bisecting k-means) with a wide range of evaluation metrics (accuracy, AUROC, AUPR, R2, RMSE, etc.). On top of that, ML results can be saved, queried, visualized, and even exported to a new data set directly in Ada.
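To make the evaluation metrics concrete, the following plain-Python sketch computes accuracy, AUROC, RMSE, and R2 on small lists. It is illustrative only and is not how Ada or Spark computes them; in Spark ML, comparable values come from evaluators such as `BinaryClassificationEvaluator` (metric names `areaUnderROC`, `areaUnderPR`) and `RegressionEvaluator` (`rmse`, `r2`). AUROC is computed here as the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counted as one half. AUPR is omitted for brevity, and the example labels and scores are made up.

```python
from math import sqrt
from typing import Sequence


def _check(a: Sequence, b: Sequence) -> None:
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")


def accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    """Fraction of predictions that match the true label."""
    _check(y_true, y_pred)
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """P(random positive outscores random negative); ties count 1/2."""
    _check(labels, scores)
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:  # O(n_pos * n_neg): fine for a sketch, not for big data
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error."""
    _check(y_true, y_pred)
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    _check(y_true, y_pred)
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot


print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
print(rmse([1, 2, 3, 4], [1, 2, 3, 6]))  # 1.0
print(round(r2([1, 2, 3, 4], [1, 2, 3, 6]), 6))  # 0.2
```

The pairwise AUROC loop is chosen for readability; a rank-based formulation gives the same value in O(n log n), which matters for the large data sets Spark targets.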

Ada has a modular, lightweight, layered architecture centered around the Scala stack, with a strong focus on performance. Scala is a modern functional and object-oriented JVM language developed by academics at EPFL. The main libraries include Play Framework for the web layer, Spark for distributed computing and machine learning, and Akka for streaming. Ada persists data in two popular NoSQL databases, Elasticsearch and MongoDB, which provide flexible, schema-free JSON storage. In fact, JSON is used throughout Ada, end to end. Ada is compiled and packaged with Play's sbt scripts as a standalone application with an embedded Netty server, which can be deployed anywhere.

Figure 3: Ada architecture and technologies used in development

Figure 4: Data sets available in Ada