Exercise 6

Metagenomics

In this exercise, you will try to identify potential viral pathogens in a disease outbreak matrix.

Data

On the USB stick handed out earlier, you will find the following file, which contains raw single-end reads:

Day_3/Metagenomics/outbreak_matrix_pathogen.fastq

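Open the file in a text editor. Notice the typical FASTQ format, in which each read is described by four lines: a header line starting with "@", the nucleotide sequence, a separator line starting with "+", and a line of quality scores with one character per base.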
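The four-line record structure can be illustrated with a small Python parser. This is a minimal sketch, assuming uncompressed FASTQ without line-wrapped sequences and Phred+33 quality encoding (the usual convention for modern Illumina data). The function names and the two toy reads are hypothetical and are not taken from the course file.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    header: str    # line 1: read identifier, starts with '@'
    sequence: str  # line 2: nucleotide sequence
    quality: str   # line 4: one quality character per base


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield one record per four-line FASTQ block (assumes no line wrapping)."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")  # line 3: '+' separator
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"Malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"Sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header, sequence, quality)


def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode quality characters to Phred scores, assuming Phred+33 encoding."""
    return [ord(ch) - offset for ch in quality]


# Two hypothetical reads in FASTQ format.
toy = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\nII#I\n")
records = list(read_fastq(toy))
print(len(records))                      # 2
print(phred_scores(records[1].quality))  # [40, 40, 2, 40]
```

To count the reads in the exercise file, you could open it with `with open(path) as fh:` and evaluate `sum(1 for _ in read_fastq(fh))`, where `path` points to the file listed above.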

Analysis

To analyze the data, we will use MGmapper. Run it with the following settings:

Mapping mode: single-end

Trimming of reads via cutadapt: No

Database IDs for Best-mode mapping: 2,3,4,5,8 (databases 2-5 are bacterial and database 8 is viral)

Clade level post-processing > Max mismatch ratio: 0.1

Leave all other settings as they are.
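To build intuition for the Max mismatch ratio setting, the sketch below filters read alignments by their fraction of mismatching bases. It assumes that the ratio is computed as mismatches divided by aligned read length and that reads at or below the threshold are kept; MGmapper's exact rule may differ. The read IDs, reference names, and mismatch counts are invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    read_id: str
    reference: str
    aligned_length: int  # number of aligned read bases
    mismatches: int      # bases differing from the reference (assumed simple count)

    @property
    def mismatch_ratio(self) -> float:
        return self.mismatches / self.aligned_length


def reads_per_reference(alignments: list[Alignment], max_ratio: float) -> dict[str, int]:
    """Count reads per reference whose mismatch ratio is at most max_ratio (assumed rule)."""
    counts = Counter(a.reference for a in alignments if a.mismatch_ratio <= max_ratio)
    return dict(counts)


# Hypothetical 100-bp alignments: one perfect bacterial hit, and viral hits
# from a strain that differs somewhat from the genome in the database.
alignments = [
    Alignment("r1", "bacterium_A", 100, 0),  # ratio 0.00
    Alignment("r2", "virus_X", 100, 5),      # ratio 0.05
    Alignment("r3", "virus_X", 100, 8),      # ratio 0.08
    Alignment("r4", "virus_X", 100, 12),     # ratio 0.12
]

for threshold in (0.1, 0.01):
    print(threshold, reads_per_reference(alignments, threshold))
# 0.1 {'bacterium_A': 1, 'virus_X': 2}
# 0.01 {'bacterium_A': 1}
```

In this toy case, a strict threshold removes every read from the divergent virus while keeping the perfectly matching bacterial read, which is the kind of effect to look for when comparing settings.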

When the analysis has finished, try to answer the following questions (if you can't wait, you can open the results from the page that links to all results):

  • How many biologically relevant reads does the sample contain?
  • Which bacterial species is most abundant?
  • How are depth and coverage defined?
  • Which viral pathogen is present in the sample?
  • How many reads map to the viral pathogen?
  • Would the viral pathogen have been reported if you had used the default setting for Max mismatch ratio (0.01)?
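For comparison with the definitions MGmapper reports, the sketch below computes two commonly used quantities from a set of read alignments: average depth (mapped bases divided by reference length) and coverage (fraction of reference positions hit by at least one read). These are general definitions assumed for illustration, not necessarily the exact formulas used by MGmapper, and the genome length and read positions are hypothetical.

```python
def depth_and_coverage(ref_length: int, intervals: list[tuple[int, int]]) -> tuple[float, float]:
    """Return (average depth, coverage) for one reference sequence.

    intervals are 0-based, half-open (start, end) spans of aligned read bases.
    """
    if ref_length <= 0:
        raise ValueError("reference length must be positive")
    covered = [False] * ref_length
    mapped_bases = 0
    for start, end in intervals:
        if not 0 <= start < end <= ref_length:
            raise ValueError(f"interval ({start}, {end}) lies outside the reference")
        mapped_bases += end - start  # every aligned base adds to depth
        for pos in range(start, end):
            covered[pos] = True      # each position counts only once for coverage
    depth = mapped_bases / ref_length
    coverage = sum(covered) / ref_length
    return depth, coverage


# Hypothetical 1,000-bp genome with three 100-bp reads, two of which overlap.
depth, coverage = depth_and_coverage(1000, [(0, 100), (50, 150), (800, 900)])
print(depth, coverage)  # 0.3 0.25
```

The overlapping reads raise depth more than coverage: 300 mapped bases give an average depth of 0.3, but only 250 distinct positions are covered.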