Software
This page contains links to the source code for our publications. Note that code without a public repository is generally no longer maintained. We will, however, help with any questions and issues as best we can.
Contact: larsab@cs.uit.no
Open source projects
We have developed the META-pipe pipeline for marine metagenomics data analysis. We plan to open source most of these repositories. Version 2.0 consists of several backend systems, servers, and services:
- Marine Metagenomics Portal. Marine reference databases and more.
- Galaxy pipeline provided as part of the NeLS infrastructure. It is intended for Norwegian users, so a FEIDE account is needed for login.
- Spark-based execution manager and Go components. Closed source.
- META-pipe job manager.
- META-pipe authentication service that is integrated with Elixir AAI.
- META-pipe web application.
- Object storage server. Closed source.
- Tool to set up the META-pipe backend on the OpenStack cPouta cloud.
- Tool to set up the META-pipe backend on OCCI-enabled endpoints.
- Scripts to set up the META-pipe backend on AWS EMR.
- META-pipe deployment scripts. These will not be open sourced.
- Marine Metagenomics Portal code.
- Marine reference databases web app. Closed source.
- Galaxy-Pulsar integration on the Stallo Supercomputer. This is specific to the Stallo machine and will not be open sourced.
- Auto scaling framework, simulator and runtime.
Source code for META-pipe 1.0 is in the following repositories. Note that these are not maintained anymore:
- META-pipe 1.0. Implemented for execution on HPC clusters.
- Patches for META-pipe specific metarep (1.4.0) sequence retrieval modifications.
These repositories are from research projects that use data, infrastructure, or problems from the META-pipe project:
- GeStore. This is a system for enabling transparent incremental updates for metagenomic pipelines.
- nsroot. Minimalist process isolation tool implemented with Linux namespaces.
- COMBUST I/O. Abstractions facilitating parallel execution of programs implementing common I/O patterns in a pipelined fashion as workflows in Spark.
- Mario is a system for interactive data analysis built on top of the HBase storage system.
- Benchmark used to evaluate the performance of HBase using data and access patterns found in typical biological data processing tools.
We have developed a system for data management and standardized preprocessing of the data in the NOWAC study:
- nowaclite: R-based data management for bioinformatics data.
- NOWAC R package: has information about the available datasets and analyses you can run on them. (closed source)
- Pippeline: standardized and interactive pipeline for NOWAC data preprocessing.
- nowaclean. R package implementing the methods of the standard operating procedure for
- geneset. R package of data sets and functions that facilitate gene set analysis.
- seq. A collection of Docker containers with different bioinformatics tools, such as GATK, bwa, and Picard, installed.
- GeneNet VT. Interactive visualization of large-scale biological networks using a standalone VR headset.
We have developed systems for data management, analysis, and exploration in the NOWAC project, but these can also be used for other datasets:
- NOWAC R package for managing and documenting omics data.
- Kvik. A framework for developing interactive data exploration applications in genomics and systems biology.
- walrus. A system for running data analysis pipelines using Docker containers.
- Freia. Biological Path Visualization using Unity3D to visualize gene expression data integrated with pathway images.
- KEGGviewer. Simple Python Flask web viewer for KEGG images.
In addition, we have developed many different data analyses that are specific for the NOWAC data:
- MIxT. Matched Interaction Across Tissues (MIxT) is a web application for exploring and comparing transcriptional profiles from two or more matched tissues across individuals. Online.
- Smoking variables. Estimate smoking status and other smoking-related variables in the NOWAC cohort.
The air:bit project repositories are:
- Luft. Web application for visualizing air quality in Tromsø with data from The Norwegian Institute for Air Research (NILU) and Kongsbakken VGS. Online.
- air:bit backend platform. The backend is deployed on Google Cloud Platform.
- Air quality sensor and web server. An Arduino-based portable air quality sensor kit and a Ruby on Rails web application deployed on Heroku.
Source code for our research projects (random order):
- Histology learning tool for use in a browser with a Python backend.
- validator. An R package for running repeated k-fold cross-validation.
- So you want to use R on stallo. A brief guide to launching long-running embarrassingly parallel R jobs on the UiT supercomputer Stallo.
- Supporting data and code for “Empirical Bayes shrinkage estimation of crime rate statistics”.
- krongen. Creates Kronecker graphs that simulate networks with power-law edge distributions.
- HoVer-net pipeline. Modified version of HoVer-Net.
- HoVer-buid. Sets up the HoVer-Net environment.
- HoVer-serving. Inference-only code for HoVer-Net that we use in the Histology viewer.
- HEImmune. Simple rule-based classification of cells in H&E images.
- ROI TILs quantification project. Region of interest detection in whole-slide images.
- Fit Futures social network. Fit Futures social network analysis code and results.
- Heart sound classification. Algorithm for predicting valvular heart disease from heart sounds in an unselected cohort.
- rhd-codes. Automatic transcription of numeric codes from Norwegian population census.
- Handwritten digit recognition and a solution implemented in Keras.
- UiT GitHub course guide and template. The unofficial guide for using GitHub for UiT courses.
- rhd-linking. Record linkage of Norwegian historical census data using machine learning.
- Ship-detection. ShipPointYOLO: Ship Detection and Description based on Point Coordinates in SAR Images.
- norpd_prescription_analyses. Spark and Jupyter notebooks to analyse data from the Norwegian Prescription Database.
- Replication code for “Replication study: Development and validation of deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs”.
- DICOM anonymizer.
- Mr. Clean is a tool for combining different visualization tools, interaction devices, and display middleware for visual comparisons on high-resolution displays.
- M.O.R.T.A.L. is a programming language for domain specific high performance computing.
- Spell expression data processing pipeline. This is a data cleaning pipeline for microarray data.
- Troilkatt is a system for scalable batch processing of biological data built on the Hadoop stack.
- BSV system for scalable visualizations on multi-core and multi-display platforms.
- Qupath-Anno. Conversion of image annotations from the Domore dataset to and from QuPath, a popular tool among pathologists.