Open source software

MalariaGEN has a long history of developing and contributing to open source software, particularly in the areas of scientific data storage, visualisation and exploration. While our focus is on use cases in malaria genomics, we have benefited greatly from working in partnership with the broader open source community, and have been part of an exciting area of technology development. Below are some of the projects that members of the MalariaGEN team have initiated or contributed to.

Sgkit
A Python package that provides a variety of analytical genetics methods, built on general-purpose frameworks such as Xarray, Pandas, Dask and Zarr.

Scikit Allel
A Python package providing utilities for exploratory analysis of large-scale genetic variation data. It is based on NumPy, SciPy and other general-purpose Python scientific libraries.

Zarr
A Python package providing an implementation of compressed, chunked, N-dimensional arrays, designed for use in parallel computing.
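
To illustrate the idea behind chunked, compressed storage, here is a minimal sketch that uses only the Python standard library. It is not Zarr's API or implementation: the `ChunkedArray` class and its methods are hypothetical names made up for this illustration, and the sketch handles only one-dimensional integer data.

```python
import zlib
from array import array


class ChunkedArray:
    """Toy 1-D integer array stored as independently compressed chunks.

    Hypothetical illustration only; this is not how Zarr is implemented.
    """

    def __init__(self, values, chunk_size):
        if chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        self.length = len(values)
        self.chunk_size = chunk_size
        # Each chunk is packed to bytes and compressed on its own, so a chunk
        # can be read (or processed by a separate worker) without touching
        # any of the others.
        self.chunks = [
            zlib.compress(array("q", values[i:i + chunk_size]).tobytes())
            for i in range(0, self.length, chunk_size)
        ]

    def _load_chunk(self, index):
        # Decompress a single chunk back into 64-bit signed integers.
        data = array("q")
        data.frombytes(zlib.decompress(self.chunks[index]))
        return data

    def __getitem__(self, position):
        if not 0 <= position < self.length:
            raise IndexError(position)
        # Work out which chunk holds the element and where it sits inside it.
        chunk_index, offset = divmod(position, self.chunk_size)
        return self._load_chunk(chunk_index)[offset]

    def compressed_size(self):
        # Total number of bytes held across all compressed chunks.
        return sum(len(chunk) for chunk in self.chunks)


values = [i % 4 for i in range(10_000)]
arr = ChunkedArray(values, chunk_size=1_000)

print(len(arr.chunks))  # 10 chunks of 1,000 elements each
print(arr[2_503])       # 3, read by decompressing only the third chunk
print(arr.compressed_size(), "bytes compressed vs", 8 * len(values), "raw")
```

Because every chunk is compressed independently, a read touches only the chunk it needs, and separate chunks can be handled by separate workers. That independence is what makes this kind of layout a natural fit for parallel computing.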

Earlier projects

  • Petl – a general-purpose Python package for extracting, transforming and loading tables of data.
  • Pysamstats – a Python utility for calculating statistics against genome positions based on sequence alignments from a SAM or BAM file.
  • Panoptes – a web application for exploration and visualisation of genomic and geospatial data.
  • Lookseq – provides a simple graphical representation of paired sequence reads that is more revealing about potential insertions and deletions than conventional methods.
  • ExplorerCat – allows a flexible representation of data catalogs (e.g. genetic variation) and provides a simple language to query them.
  • MapSeq – a tool to integrate genotype data browsing with geographical distributions, statistical and comparative analysis and exploration of associations.