MultiQC is a tool to aggregate bioinformatics results across many samples into a single report. It is written in Python and contains modules for a number of common tools (FastQC, Bowtie, Picard and many others).
My latest package developed at Science for Life Laboratory, this tool was created out of frustration at having to look through hundreds of log files and sample reports to figure out if running an analysis pipeline worked or not (usually Cluster Flow).
I’d previously written a script to summarise the results held in multiple Bismark reports, but I wanted this tool to be more general and work with whatever reports it could find. I started with the data from FastQC and porting code from the previous Bismark summary script. After a little standardisation, this list of supported tools quickly expanded to cover the ten or some of the programs that I use most frequently.
What MultiQC is useful for
MultiQC is particularly helpful when comparing the progress of a sample through a processing pipeline, or between different types of analysis. The ‘General Stats’ table at the top shows how key numbers fluctuate across modules, and the overlay plots allow you to directly compare different samples in a way that isn’t possible when flicking between different reports.
I’m now in the process of working with others to generalise the code as much as possible, bringing shared processes into easy to use functions. Because of this, writing a new module to understand your favourite program’s output is surprisingly easy. Hopefully, as more people use the tool, more and more modules will be added and it will be increasingly complete!
Finally, MultiQC allows extensive templating, so you can customise the output to your heart’s desire. Check out the included ‘geo’ template for an example. This allows sequencing centres or bioinformatics cores to add functionality and branding as required.
If you have any suggestions or comments, please add them as an issue on the MultiQC repository – it would be great to hear them!