In this exercise, you will attempt to identify viral pathogens in a disease outbreak matrix.


In 2016, a 14-year-old boy from Berlin, Germany, came to the hospital with sudden blindness, reduced consciousness and movement disorders. The patient's mother reported developmental disorders that had started one year earlier, with concentration problems, uncontrolled fits of rage, an overall decline in school performance and occasional compulsive head nods. Unfortunately, the patient did not receive any medical investigation or treatment at that time; because the symptoms were assumed to be behavioural problems, he attended psychological treatment instead. MRI of the patient’s brain showed white and gray matter lesions and gliosis. Soon after hospitalisation, the patient entered a persistent vegetative state and died.

We have received metagenomics data obtained by sequencing a sample of the boy’s brain tissue on the Illumina HiSeq platform, which generated single-end reads of 150 bp each.



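Open the file in a text editor. Notice the typical format of a FASTQ file, in which each read is described by four lines: a header line starting with "@", the nucleotide sequence, a separator line starting with "+", and a quality string with one character per base.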
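If you would rather inspect the four-line layout programmatically than by eye, the following minimal Python sketch reads FASTQ records one at a time. It is not part of MGmapper; the names `FastqRecord` and `read_fastq` and the two example reads are made up for illustration, and the sketch assumes a plain, uncompressed FASTQ file in which each sequence occupies a single line.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    read_id: str   # header line without the leading "@"
    sequence: str  # nucleotide sequence
    quality: str   # quality string, one character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield one record per four-line block of a FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of input (or a blank line) ends the stream
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        # Line 1 must start with "@", line 3 with "+".
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        # Lines 2 and 4 must have the same length.
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


# Made-up two-read example; real reads in this exercise are 150 bp long.
example = "@read1\nACGTACGT\n+\nIIIIIIII\n@read2\nTTGCA\n+\nFFFFF\n"
records = list(read_fastq(io.StringIO(example)))
for record in records:
    print(record.read_id, len(record.sequence))
```

To use it on a real file, pass an open file handle instead of the `io.StringIO` object, for example `with open(path) as handle: ...`, where `path` points to your FASTQ file.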


For analyzing the data, we will use MGmapper. Run it with the following settings:

Mapping mode: single-end

Trimming of reads via cutadapt: No

Database id's for Best-mode mapping: 2,3,4,5,8 (where 2-5 are bacterial databases and 8 is a viral database)

Clade level post-processing > Max mismatch ratio: 0.1

Leave all other settings as they are.
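To get a feel for what the Max mismatch ratio setting controls, here is a small Python sketch of a mismatch-ratio filter. It rests on an assumption: that the ratio is the number of mismatched bases divided by the total number of aligned bases for a clade, and that a clade is kept when the ratio is at or below the threshold. MGmapper's exact definition may differ, so check its documentation. The function names and the numbers are hypothetical.

```python
def mismatch_ratio(mismatches: int, aligned_bases: int) -> float:
    """Fraction of aligned bases that differ from the reference (assumed definition)."""
    if aligned_bases <= 0:
        raise ValueError("aligned_bases must be positive")
    return mismatches / aligned_bases


def passes_filter(mismatches: int, aligned_bases: int, max_ratio: float) -> bool:
    """Keep a hit only if its mismatch ratio does not exceed max_ratio (assumed rule)."""
    return mismatch_ratio(mismatches, aligned_bases) <= max_ratio


# Hypothetical clade: 10 reads of 150 bp (1500 aligned bases) with 60 mismatches.
ratio = mismatch_ratio(60, 1500)
strict = passes_filter(60, 1500, max_ratio=0.01)   # threshold of 0.01
relaxed = passes_filter(60, 1500, max_ratio=0.1)   # threshold used in this exercise
print(ratio, strict, relaxed)
```

Under this assumed definition, a threshold of 0.01 tolerates on average about 1.5 mismatches per 150 bp read, whereas 0.1 tolerates about 15, which matters when the organism in the sample has diverged from the closest reference genome in the database.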

When the analysis has finished (or if you can't wait, go to the pre-run results), try to answer the following questions:

1. How many biologically relevant reads does the sample contain?

2. Which bacterial species is most abundant?

3. How are depth and coverage defined?

4. Which viral pathogen is present in the sample?

5. How many reads map to the viral pathogen?

6. Would the viral pathogen have been reported if you had used the default setting for Max mismatch ratio (0.01)?