Software

This page contains links to the source code for our publications. Note that code without a public repository is generally no longer maintained. We will, however, help with any questions and issues as best we can. Contact: larsab@cs.uit.no

Open source projects

We have developed the META-pipe pipeline for marine metagenomics data analysis. We plan to open source most of these repositories. Version 2.0 consists of several backend systems, servers, and services:

Marine Metagenomics Portal. Marine reference databases and more.
Galaxy pipeline provided as part of the NeLS infrastructure. It is intended for Norwegian users, so a FEIDE account is needed to log in.
Spark-based execution manager and Go components. Closed source.
META-pipe job manager.
META-pipe authentication service that is integrated with Elixir AAI.
META-pipe web application.
Object storage server. Closed source.
Tool to set up the META-pipe backend on the OpenStack cPouta cloud.
Tool to set up the META-pipe backend on OCCI-enabled endpoints.
Scripts to set up the META-pipe backend on AWS EMR.
META-pipe deployment scripts. These will not be open sourced.
Marine Metagenomics Portal code.
Marine reference databases web app. Closed source.
Galaxy-Pulsar integration on the Stallo Supercomputer. This is specific to the Stallo machine and will not be open sourced.
Auto scaling framework, simulator and runtime.

Source code for META-pipe 1.0 is in the following repositories. Note that these are no longer maintained:

META-pipe 1.0. Implemented for execution on HPC clusters.
Patches for META-pipe specific metarep (1.4.0) sequence retrieval modifications.

These repositories are from research projects that use data, infrastructure, or problems from the META-pipe project:

GeStore. This is a system for enabling transparent incremental updates for metagenomic pipelines.
nrsoot. Minimalist process isolation tool implemented with Linux namespaces.
COMBUST I/O. Abstractions facilitating parallel execution of programs implementing common I/O patterns in a pipelined fashion as workflows in Spark.
Mario is a system for interactive data analysis built on top of the HBase storage system.
Benchmark used to evaluate the performance of HBase using data and access patterns found in typical biological data processing tools.

We have developed a system for data management and standardized preprocessing of the data in the NOWAC study:

nowaclite: R-based data management for bioinformatics data.
NOWAC R package: has information about the available datasets and analyses you can run on them. (closed source)
Pippeline: standardized and interactive pipeline for NOWAC data preprocessing.
nowaclean. R package implementing the methods of the standard operating procedure for
geneset. R package of data sets and functions that facilitate gene set analysis.
seq. A collection of Docker containers with different bioinformatics tools, such as GATK, bwa, and Picard, installed.
GeneNet VT. Interactive visualization of large-scale biological networks using a standalone VR headset.

We have developed systems for data management, analysis, and exploration in the NOWAC project. These can also be used for other datasets:

NOWAC R package for managing and documenting omics data.
Kvik. A framework for developing interactive data exploration applications in genomics and systems biology.
walrus. A system for running data analysis pipelines using Docker containers.
Freia. Biological Path Visualization using Unity3D to visualize gene expression data integrated with pathway images.
KEGGviewer. Simple Python Flask web viewer for KEGG images.

In addition, we have developed many different data analyses that are specific for the NOWAC data:

MIxT. Matched Interaction Across Tissues (MIxT) is a web application for exploring and comparing transcriptional profiles from two or more matched tissues across individuals. Online
Smoking variables. Estimate smoking status and other smoking-related variables in the NOWAC cohort.

The air:bit project repositories are:

Luft. Web application for visualizing air quality in Tromsø with data from The Norwegian Institute for Air Research (NILU) and Kongsbakken VGS. Online.
air:bit backend platform. The backend is deployed on Google Cloud Platform.
Air quality sensor and web server. An Arduino-based portable air quality sensor kit and a Ruby on Rails web application deployed on Heroku.

Source code for our research projects (random order):

Histology learning tool for use in a browser with a Python backend.
validator. An R package for running repeated k-fold cross-validation.
So you want to use R on stallo. A brief guide to launching long-running embarrassingly parallel R jobs on the UiT supercomputer Stallo.
Supporting data and code to “Empirical bayes shrinkage estimation of crime rate statistics”.
krongen. Creates Kronecker graphs that simulate networks with power-law edge distributions.
HoVer-net pipeline. Modified version of HoVer-Net.
HoVer-buid. Sets up the HoVer-Net environment.
HoVer-serving. Inference-only code for HoVer-Net that we use in the Histology viewer.
HEImmune. Simple rule-based classification of cells in H&E images.
ROI TILs quantification project. Region of interest detection in whole-slide images.
Fit Futures social network. Fit Futures social network analysis code and results.
Heart sound classification. Algorithm for predicting valvular heart disease from heart sounds in an unselected cohort.
rhd-codes. Automatic transcription of numeric codes from the Norwegian population census.
Handwritten digit recognition and a solution implemented in Keras.
UiT Github course guide and template. The unofficial guide for using GitHub for UiT courses.
rhd-linking. Record linkage of Norwegian historical census data using machine learning.
Ship-detection. ShipPointYOLO: Ship Detection and Description based on Point Coordinates in SAR Images.
norpd_prescription_analyses. Spark and Jupyter notebooks to analyse data from the Norwegian Prescription Database.
Replication code for “Replication study: Development and validation of deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs”.
DICOM anonymizer.
Mr. Clean is a tool for combining different visualization tools, interaction devices, and display middleware for visual comparisons on high-resolution displays.
M.O.R.T.A.L. is a programming language for domain specific high performance computing.
Spell expression data processing pipeline. This is a data cleaning pipeline for microarray data.
Troilkatt is a system for scalable batch processing of biological data built on the Hadoop stack.
BSV system for scalable visualizations on multi-core and multi-display platforms.
Qupath-Anno. Conversion of image annotations from the Domore dataset to and from QuPath, a popular tool among pathologists.