This page contains links to the source code for our publications. Note that code without a public repository is generally no longer maintained. We will, however, help with any questions and issues as best we can. Contact:

Open source projects

We have developed the META-pipe pipeline for marine metagenomics data analysis. Version 2.0 consists of several backend systems, servers, and services, and we plan to open source most of their repositories:

  1. Marine Metagenomics Portal. Marine reference databases and more.
  2. Galaxy pipeline provided as part of the NeLS infrastructure. It is intended for Norwegian users, so a FEIDE account is needed to log in.
  3. Spark-based execution manager and Go components. Closed source.
  4. META-pipe job manager.
  5. META-pipe authentication service that is integrated with Elixir AAI.
  6. META-pipe web application.
  7. Object storage server. Closed source.
  8. Tool to set up the META-pipe backend on the OpenStack cPouta cloud.
  9. Tool to set up the META-pipe backend on OCCI-enabled endpoints.
  10. Scripts to set up the META-pipe backend on AWS EMR.
  11. META-pipe deployment scripts. These will not be open sourced.
  12. Marine Metagenomics Portal code.
  13. Marine reference databases web app. Closed source.
  14. Galaxy-Pulsar integration on the Stallo Supercomputer. This is specific to the Stallo machine and will not be open sourced.
  15. Autoscaling framework, simulator, and runtime.

Source code for META-pipe 1.0 is in the following repositories. Note that these are no longer maintained:

  1. META-pipe 1.0. Implemented for execution on HPC clusters.
  2. Patches to metarep (1.4.0) with META-pipe-specific sequence retrieval modifications.

These repositories are from research projects that use data, infrastructure, or problems from the META-pipe project:

  1. GeStore. This is a system for enabling transparent incremental updates for metagenomic pipelines.
  2. nrsoot. Minimalist process isolation tool implemented with Linux namespaces.
  3. COMBUST I/O. Abstractions facilitating parallel execution of programs implementing common I/O patterns in a pipelined fashion as workflows in Spark.
  4. Mario is a system for interactive data analysis built on top of the HBase storage system.
  5. Benchmark used to evaluate the performance of HBase using data and access patterns found in typical biological data processing tools.

We have developed a system for data management and standardized preprocessing of the data in the NOWAC study:

  1. nowaclite: R-based data management for bioinformatics data.
  2. NOWAC R package: has information about the available datasets and analyses you can run on them. (closed source)
  3. Pippeline: standardized and interactive pipeline for NOWAC data preprocessing.
  4. nowaclean. R package implementing the methods of the standard operating procedure for
  5. geneset. R package of data sets and functions that facilitate gene set analysis.
  6. seq. A collection of Docker containers with different bioinformatics tools, such as GATK, bwa, and Picard, installed.

We have also developed systems for data management, analysis, and exploration in the NOWAC project. These can be used for other datasets as well:

  1. NOWAC R package for managing and documenting omics data.
  2. Kvik. A framework for developing interactive data exploration applications in genomics and systems biology.
  3. walrus. A system for running data analysis pipelines using Docker containers.
  4. Freia. Biological Path Visualization using Unity3D to visualize gene expression data integrated with pathway images.
  5. KEGGviewer. Simple Python Flask web viewer for KEGG images.

In addition, we have developed many different data analyses that are specific to the NOWAC data:

  1. MIxT. Matched Interaction Across Tissues (MIxT) is a web application for exploring and comparing transcriptional profiles from two or more matched tissues across individuals. Online.
  2. Smoking variables. Estimate smoking status and other smoking-related variables in the NOWAC cohort.

The air:bit project repositories are:

  1. Luft. Web application for visualizing air quality in Tromsø with data from The Norwegian Institute for Air Research (NILU) and Kongsbakken VGS. Online.
  2. air:bit backend platform. The backend is deployed on Google Cloud Platform.
  3. Air quality sensor and web server. An Arduino-based portable air quality sensor kit and a Ruby on Rails web application deployed on Heroku.

Source code for our research projects (random order):

  1. Histology learning tool for use in a browser, with a Python backend.
  2. validator. An R package for running repeated k-fold cross-validation.
  3. So you want to use R on stallo. A brief guide to launching long-running, embarrassingly parallel R jobs on the UiT supercomputer Stallo.
  4. Supporting data and code for “Empirical Bayes shrinkage estimation of crime rate statistics”.
  5. krongen. Creates Kronecker graphs that simulate networks with power-law edge distributions.
  6. norpd_prescription_analyses. Spark and Jupyter notebooks for analysing data from the Norwegian Prescription Database.
  7. rhd-codes. Automatic transcription of numeric codes from the Norwegian population census.
  8. Handwritten digit recognition and a solution implemented in Keras.
  9. UiT Github course guide and template. The unofficial guide for using GitHub for UiT courses.
  10. Replication code for “Replication study: Development and validation of deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs”.
  11. DICOM anonymizer.
  12. Mr. Clean is a tool for combining different visualization tools, interaction devices, and display middleware for visual comparisons on high-resolution displays.
  13. M.O.R.T.A.L. is a programming language for domain specific high performance computing.
  14. Spell expression data processing pipeline. This is a data cleaning pipeline for microarray data.
  15. Troilkatt is a system for scalable batch processing of biological data built on the Hadoop stack.
  16. BSV. A system for scalable visualizations on multi-core and multi-display platforms.