

Scientific Publications on Bioinformatic Aspects in Next Generation Sequencing

This page introduces a selection of scientific publications describing software tools, analysis methods, and comparative studies of bioinformatics solutions and workflows.

Bioinformatics in Next Generation Sequencing 

A general method to eliminate laboratory induced recombinants during massive, parallel sequencing of cDNA library

This paper investigates different conditions used during sample and library preparation in order to detect and reduce sources of chimeric and PCR artefacts. The authors suggest that the amount of input RNA greatly influences the output of recombination events, and that PCR cycling conditions also influence the rate of recombination. The optimal conditions for sample preparation, which minimise the risk of introducing recombination artefacts during sequencing, are proposed.
Waugh et al. 2015

Free Download

An extensive evaluation of read trimming effects on Illumina NGS data analysis

In this paper, nine different read trimming tools are compared, including Cutadapt, ConDeTri, ERNE-FILTER, FASTX quality trimmer, PRINSEQ, Trimmomatic, SolexaQA and Sickle. The authors start by explaining the basics of read trimming and discuss the effects of read trimming on gene expression analysis, SNP identification and de novo assembly. The software applications are compared with a focus on Illumina reads and on the differences in output, analysis time, and computational requirements.
In summary, trimming algorithms behave differently, depending on the dataset and downstream application.
Del Fabbro et al. 2013
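
To make the idea of read trimming concrete, the following is a minimal sketch of 3'-end quality trimming. It is a generic illustration and not the algorithm of any tool compared in the paper; the function names, the Phred+33 encoding and the default threshold of 20 are assumptions chosen for the example.

```python
def phred33_to_scores(qual_string):
    """Convert a FASTQ quality string (assumed Phred+33) to integer scores."""
    return [ord(c) - 33 for c in qual_string]


def trim_3prime(seq, quals, min_q=20):
    """Remove bases from the 3' end until a base with quality >= min_q is found.

    seq   -- read sequence as a string
    quals -- list of integer quality scores, one per base
    min_q -- hypothetical quality threshold chosen for illustration
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    end = len(seq)
    # Walk back from the 3' end while the last retained base is low quality.
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]


if __name__ == "__main__":
    read = "ACGTAC"
    scores = [30, 30, 30, 30, 10, 5]
    print(trim_3prime(read, scores))
```

Real trimmers typically use more elaborate strategies, such as sliding windows or running sums, which is one reason their output can differ on the same dataset.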

Free Download

A survey of sequence alignment algorithms for next-generation sequencing

This publication provides an overview of alignment algorithms and software tools. Dating from 2010, the paper does not account for more recent advances in NGS technology and software development. Different aspects requiring consideration during alignment are discussed, including gapped alignment, paired-end and mate-pair mapping, alignment of long sequence reads, bisulphite-treated reads and spliced reads, and realignment.
Li and Homer 2010

Free Download

Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species

The assembly of high throughput sequencing data remains challenging, despite an increasing variety of assembling tools available on the market. In this study, sequence data from three different vertebrate species were tested in 43 assemblies from 21 different research teams. Assemblies were generated using a wide variety of software tools with significant variations in hardware and time requirements.
In this research, a high degree of variability between the results obtained from different assemblies suggests that room for improvement exists. Remarkably, some assemblies performed better with data from one species, but did not necessarily perform well with data from another species.
The authors conclude with practical recommendations on the use of assemblers for de novo genome assembly.
Bradnam et al. 2013

Free Download

Assessing single nucleotide variant detection and genotype calling on whole-genome sequenced individuals

In this study, the sensitivity of SNP detection, the accuracy of genotype calls and the variant accuracy of six different variant calling methods (GATK, SAMtools, Consensus Assessment of Sequence and Variation (CASAVA), VarScan, glfTools and SOAPsnp) were compared. The authors discuss the impact of filtering and alignment on variant calling, and conclude that the variant calling methods perform differently depending on the coverage depth. With increasing coverage, the methods performed better in terms of sensitivity and in the differentiation of true variants from sequencing artefacts.
Youzhi Cheng et al. 2014

Free Download

Benchmarking short sequence mapping tools

This paper introduces and tests different alignment and mapping software tools: Bowtie, Bowtie2, BWA, SOAP2, MAQ, RMAP, GSNAP, Novoalign, and mrsFAST (mrFAST).
The authors describe the limitations of sequence mapping, and subsequently explain the different tools and criteria used for evaluation in this benchmark study. The parameters affecting mapping results are evaluated, such as read length, permitted number of mismatches, seed length, number of mismatches in the seed region, and whether gapped alignment is enabled. The focus of this study was the mapping efficiency and throughput of the software tools tested.
This extensive evaluation might therefore give less experienced users a better understanding of the limitations and challenges of mapping.
In summary, mapping tools were shown to perform differently on different genomes such as human or chimpanzee.
Hatem et al. 2013
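
The mismatch and seed parameters mentioned above can be illustrated with a small sketch. It assumes an ungapped comparison of a read against a reference window of the same length; the function names and default limits are hypothetical and do not correspond to the options of any of the benchmarked tools.

```python
def count_mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def passes_filters(read, ref_window, seed_len=4, max_seed_mm=0, max_total_mm=2):
    """Decide whether an ungapped placement of a read is acceptable.

    seed_len     -- number of leading bases treated as the seed (assumed)
    max_seed_mm  -- mismatches tolerated inside the seed (assumed)
    max_total_mm -- mismatches tolerated over the whole read (assumed)
    """
    if len(read) != len(ref_window):
        return False  # this sketch does not handle gaps
    seed_mm = count_mismatches(read[:seed_len], ref_window[:seed_len])
    total_mm = count_mismatches(read, ref_window)
    return seed_mm <= max_seed_mm and total_mm <= max_total_mm


if __name__ == "__main__":
    print(passes_filters("ACGTACGT", "ACGTACGA"))  # one mismatch outside the seed
    print(passes_filters("TCGTACGT", "ACGTACGT"))  # one mismatch inside the seed
```

Tightening the seed limit rejects the second placement even though its total mismatch count is low, which is the kind of parameter effect the benchmark examines.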

Free Download

CGAT: a model for immersive personalized training in computational genomics

This publication describes a model for training experienced scientists to improve their computational skills and statistical knowledge in order to perform NGS data analysis. The training model is divided into three phases, spanning approximately three years: an initial assessment and basic training, followed by practical project training and, finally, a collaborative research project executed by the trainee. In summary, CGAT is a training model for researchers who wish to extend their expertise into new scientific fields.
Sims et al. 2015

Free Download

Distinguishing low frequency mutations from RT-PCR and sequence errors in viral deep sequencing data

In this study, the influence of RT and PCR on error rates during deep sequencing was investigated in FMDV, a small RNA virus. Error rates were assessed as either dependent on nucleotide position or randomly distributed. In conclusion, the observed mutation rates of the enzymes were within the range indicated by the manufacturer, and mutation rates were shown not to be randomly distributed, with some sites being more susceptible to PCR errors than others.
Orton et al. 2015

Free Download

Ensuring backward compatibility of traditional genotyping efforts in the era of whole genome sequencing

In this publication, two software tools (SeqSphere and CLCbio) are described and compared for spa typing of methicillin-resistant Staphylococcus aureus (MRSA). The typing results were compared with those obtained by traditional Sanger sequencing, and the output of the two software tools was shown to be significantly different. Overall, the choice of analysis parameters was shown to be key to high-quality assembly. If assembly is optimised, the results are almost fully compatible with Sanger sequencing-based spa typing data.
Bletz et al. 2015

Free Download

Evaluation of high throughput sequencing for identification of known and unknown viruses

The Roche 454 and Illumina NGS technologies are compared in this publication, in terms of their ability to detect 11 different viruses in artificially spiked samples. The Roche technology detected all but one virus sample, whereas the Illumina technology detected all samples present. Furthermore, the Illumina technology also managed to detect a contamination with an older plasmid DNA. The performance of NGS technology was then compared with the limit of detection of selected reference qPCR assays. Even without sample extraction and preparation optimisation, NGS technology was shown to be slightly more sensitive in some samples tested, compared to the reference qPCR assays.
Cheval et al. 2011

Free Download

GAGE-B: an evaluation of genome assemblers for bacterial organisms

In this paper, multiple assembly software tools are compared for their ability to assemble bacterial genomes from single, deep-coverage libraries. Combinations of different assemblies were also tested. The study reviewed only freely available software tools, and sought to determine which generates the best bacterial assemblies from a single library, which coverage depth and software parameters are important for assembly, and how assemblies differ between short (100 bp) and longer (250 bp) reads. The results produced by the different assemblers varied significantly.
Magoc et al. 2013

Free Download

Integrative workflows for metagenomic analysis

Different workflows for the analysis of metagenomic data are introduced and compared in this review, including CloVR-metagenomics, the Galaxy platform, IMG/M, MetAMOS, RAMMCAP, and SmashCommunity. The general analysis workflow is presented, including quality control, assembling, gene detection, gene annotation, taxonomy analysis, and data management. The advantages and drawbacks of the workflows examined are also discussed.
Ladoukakis et al. 2014

Free Download

Metagenomics: Tools and Insights for Analyzing Next-Generation Sequencing Data Derived from Biodiversity Studies

This review gives an introduction to NGS technologies and discusses their application in metagenomics. A comprehensive overview of the bioinformatics challenges and limitations is presented, along with an introduction to the different software tools available. Furthermore, an extensive software list is given, together with an analysis workflow chart for metagenomics data analysis.
Oulas et al. 2015

Free Download

mtDNA-Server: next-generation sequencing data analysis of human mitochondrial DNA in the cloud

On 15 April 2016, Nucleic Acids Research published the study by Weissensteiner et al., "mtDNA-Server: next-generation sequencing data analysis of human mitochondrial DNA in the cloud".

This publication introduces a new scalable web server for human mitochondrial DNA (mtDNA) analysis based on Hadoop. The new tool focuses on usability and on the reliable identification and quantification of heteroplasmic variants.

The mtDNA-Server workflow features parallel read alignment, heteroplasmy detection, artefact or contamination identification, variant annotation, and several quality control metrics. The graphical user interface of the mtDNA-Server is based on the open-source platform Cloudgene. Building on the Cloudgene workflow makes Cloudgene features such as user login, data security and real-time feedback available.
For data input, FASTQ and SAM/BAM files are supported.

The mtDNA-Server was validated with artificial sample mix-ups on an Ion Torrent PGM and an Illumina HiSeq sequencing system. The results were compared with an analysis performed with the LoFreq software for ultra-sensitive variant detection.
The results showed that heteroplasmies and artificial recombinations were detected down to a level of 1%.
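
As a rough illustration of what a 1% detection level means, the sketch below computes the fraction of the second most frequent base at a single position and compares it with a threshold. It is a simplified assumption for explanatory purposes, not the mtDNA-Server method; the function names and the per-position count dictionary are hypothetical, and sequencing error models and strand information are ignored.

```python
def minor_allele_fraction(base_counts):
    """Return (base, fraction) for the second most frequent base at a position.

    base_counts -- dict mapping base to read count, e.g. {"A": 980, "G": 20}
    """
    total = sum(base_counts.values())
    if total == 0 or len(base_counts) < 2:
        return None, 0.0
    # Sort bases by descending count; the second entry is the minor allele.
    ranked = sorted(base_counts.items(), key=lambda kv: kv[1], reverse=True)
    base, count = ranked[1]
    return base, count / total


def is_heteroplasmic(base_counts, min_fraction=0.01):
    """Flag a position whose minor allele reaches min_fraction (1% by default)."""
    base, fraction = minor_allele_fraction(base_counts)
    return base is not None and fraction >= min_fraction


if __name__ == "__main__":
    print(is_heteroplasmic({"A": 980, "G": 20}))   # minor allele at 2%
    print(is_heteroplasmic({"A": 9995, "G": 5}))   # minor allele at 0.05%
```

In practice, calling variants this close to the sequencing error rate requires sufficient coverage and additional quality filters, which is why validation against a dedicated low-frequency caller is informative.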

In summary, the mtDNA-Server is an easy-to-use web server for the reproducible detection of heteroplasmic variants.
Large amounts of mtDNA data from NGS studies can be handled through parallel data processing using the MapReduce framework.

The publication is freely available at NCBI.

MutAid: Sanger and NGS Based Integrated Pipeline for Mutation Identification, Validation and Annotation in Human Molecular Genetics

The MutAid pipeline was developed to provide an integrated analysis tool for raw sequencing data produced by Sanger sequencing and by next generation sequencing platforms, including Illumina, Roche 454 and Ion Torrent.
The pipeline supports several read mapping tools (BWA, TMAP, Bowtie, Bowtie2 and GSNAP) and several variant calling tools (GATK HaplotypeCaller, SAMtools, FreeBayes and VarScan2).
This new tool focuses on the analysis of clinical data derived from Sanger sequencing, NGS, or both, and provides a complete solution for diagnostic sequencing. It is applicable to gene panels, exomes and whole genome sequencing.
The pipeline includes the following steps: quality control and filtering, mapping reads to reference genome, variant detection, variant effect prediction, variant annotation, and creation of a variant summary table.
In summary, MutAid is a new robust, user-friendly, and integrated bioinformatics pipeline to analyze NGS and Sanger sequencing data with a single command.

MutAid is available on the SourceForge website.
Pandey et al. 2016

Free Download

Next generation informatics for big data in precision medicine era

This short review summarises the findings discussed during two workshops that aimed to provide a forum for data miners, informaticians and clinical researchers to share novel findings from their latest investigations in applying informatics techniques to biomedical and healthcare data.
The findings are described in more detail in five additional publications. Briefly, the following topics are introduced: a bibliometric study on tobacco regulatory science (TRS) research, the development of an in silico computational pipeline to mine severe drug-drug interaction (DDI) adverse events (ADE) using semantic web technologies, construction of a normalized cancer based PGx network (CPN) by integrating cancer orientated PGx information from multiple well known PGx resources, introduction of the Vaccine Adverse Event Reporting System (VAERS), and the development of a network-based approach to analyzing the differentially expressed genes at different time points by integrating molecular interactions and gene ontology information.
In summary, the introduced topics should help to find new solutions for big data mining and application of bioinformatics in the field of precision medicine.
Zhang et al. 2015

Free Download

QuickNGS elevates Next-Generation Sequencing data analysis to a new level of automation

This publication reviews the new QuickNGS framework, which enables rapid, professional data analysis in major NGS applications. It describes how the workflow (based on a MySQL database) operates and which software tools are integrated into it. The workflow was tested by reanalysing 10 different RNA sequencing samples from a study of a Drosophila transcription factor. The system is compared with other NGS data analysis workflow systems, namely Galaxy, GenePattern and Chipster.
Wagle et al. 2015

Free Download

Reconstructing complex regions of genomes using long read sequencing technology

There are 900 annotated genes with long segmental duplications in the human genome. These regions are problematic to sequence with short-read sequencing technology. In this paper, a 1.3 Mbp region was sequenced, and the output from Sanger, PacBio, and Illumina sequencing technologies was compared. The average read length using the PacBio technology was 1.8 kbp, with a maximum of 12.4 kbp. Potential artefacts of using SMRT technology are discussed, along with a comparison of the costs and throughput of PacBio technology and Sanger sequencing.
Huddleston et al. 2014

Free Download

Reproducibility of Variant Calls in Replicate Next Generation Sequencing Experiments

In this publication, the reproducibility of variant calls in breast cancer samples was evaluated by sequencing samples in duplicate or triplicate. The results showed that the variance between replicates of the same sample was relatively high, suggesting that only a fraction of the calls are true variants and that a high number are caused by technical noise during sequencing. Different factors influencing variant calling reproducibility were investigated, including coverage, variant allele count, frequency, variant allele quality, and SNV call p-value. It was proposed that the choice of sequencing technology, read length and software analysis tools can affect the reproducibility of variant calling results.
Qi et al. 2014
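
One simple way to quantify agreement between replicate call sets is the proportion of variants shared by both replicates out of all variants called in either. The sketch below uses this measure (the Jaccard index) as an illustrative assumption; it is not necessarily the metric used in the paper, and the tuple representation of a variant and the function name are hypothetical.

```python
def variant_concordance(calls_a, calls_b):
    """Return the Jaccard index of two variant call sets.

    Each variant is assumed to be a hashable tuple such as
    (chromosome, position, reference_allele, alternate_allele).
    Two empty call sets are treated as fully concordant (1.0).
    """
    a, b = set(calls_a), set(calls_b)
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


if __name__ == "__main__":
    replicate_1 = [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "A")]
    replicate_2 = [("chr1", 100, "A", "G"), ("chr2", 50, "G", "A"), ("chr3", 10, "T", "C")]
    print(variant_concordance(replicate_1, replicate_2))
```

A low value for replicates of the same sample points to calls driven by technical noise rather than by true variants.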

Free Download

RIEMS: a software pipeline for sensitive and comprehensive taxonomic classification of reads from metagenomics datasets

The RIEMS software pipeline was developed as a tool for sensitive and reliable automated taxonomic classification of all individual reads in metagenomic sequence datasets. The RIEMS workflow combines several software tools. This publication gives an introduction to current analysis tools for metagenomic sequence data. The RIEMS software was tested on different datasets and compared with other analysis tools. The speed, specificity and sensitivity of the software were analysed. In summary, the new RIEMS software tool enables reliable taxonomic classification of reads from metagenomic data, including reads of viral origin.
Scheuch et al. 2015

Free Download

Using linkage maps to correct and scaffold de novo genome assemblies: methods, challenges, and computational tools

In this publication, the author discusses the combination of de novo genome assembly with linkage maps in order to improve the correctness of assemblies. The assembly process and genetic linkage maps are described, along with the challenges faced in combining linkage maps with de novo assembly.
Overall, coupling de novo assembly with linkage mapping is presented as a powerful tool for producing high-quality reference genomes.
J. L. Fierst 2015

Free Download
