The purpose of this exercise is to generate phylogenetic trees of your group's MRSA or E. coli isolates using two different web services: NDtree and CSIPhylogeny.
You will work with your group-specific data as described in Exercise2.
The first step in making your phylogenetic trees is to find a suitable reference genome. This is most easily done using KmerFinder (https://cge.cbs.dtu.dk/services/KmerFinder/), or by going back to your results from Day 1. The genome in KmerFinder's database that shares the most k-mers with most of your five isolates is a good choice as a reference. Note that in the output table from KmerFinder, each database hit is accompanied by a link marked "get sequence". Clicking this link will lead you to an ftp site from which the genome sequence can be downloaded. The file that you are interested in (which contains the nucleotide sequence in FASTA format) is the one with the extension .fna. If there are several .fna files, they represent the chromosome and the individual plasmids of the isolate. For this exercise, we will only use the chromosome as a reference. The first (largest) .fna file contains the chromosome sequence.
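If you are unsure which .fna file holds the chromosome, comparing sequence lengths is one way to check. The following is a minimal sketch that only assumes plain FASTA input; the function name fasta_lengths and the toy records are made up for illustration and are not part of KmerFinder or NDtree.

```python
import io


def fasta_lengths(handle):
    """Return {record_id: sequence_length} for a FASTA text stream."""
    lengths = {}
    current = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the record ID
            parts = line[1:].split()
            current = parts[0] if parts else "unnamed"
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


# Hypothetical toy records standing in for the content of downloaded .fna files
demo = io.StringIO(
    ">chrom_demo example chromosome\nACGTACGTAC\nGTACGT\n"
    ">plasmid_demo example plasmid\nACGT\n"
)
lengths = fasta_lengths(demo)
largest = max(lengths, key=lengths.get)
print(lengths)
print("Largest record:", largest)
```

For real files, replace the StringIO object with open("your_file.fna") and run the function once per file; the file with the longest sequence should be the chromosome.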
Once you have your reference genome in a FASTA file, you are ready to make the trees using the two different tools.
Go to NDtree. Note the links to "Instructions" and "Output", where you can find guidance for the submission step and interpretation of results.
NDtree is very easy to use, as long as you know that it uses the term "template genome" instead of "reference genome". All you have to do is to point to the just-found reference (template) genome via the "Choose File" button. If you do not specify a particular reference genome to be used, the service will automatically find one among a number of available reference (template) genomes (but don't use this feature for this exercise). Next, select the files (NDtree can take multiple files at once) for which you want to make a phylogenetic tree using the button marked "Isolate File". Leave all other settings as they are.
Note: If you expect that the quality of one of your draft genomes is very poor, you may omit it, as it will otherwise mess up your tree.
Note: If you have raw sequence reads available, they will give a more precise result, but to save time, we will only use assembled genomes.
After having run NDtree with your five input files (or four, if one was of poor quality), locate the "Downloads" and download the files "dist.mat.gz", "tree.nj.newick.gz" and "tree.upgma.newick.gz". dist.mat.gz is a gzip-compressed text file containing the distance matrix; once decompressed, it can be opened in a text editor. The .newick files, once decompressed, can be opened in FigTree, a tree-viewing program. If you have not installed FigTree yet, it can be downloaded here: FigTree.
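If you prefer to read the distance matrix in a script rather than a text editor, something like the following may work. It is only a sketch: it assumes the file is a PHYLIP-style matrix (first line gives the number of isolates, each following line gives an isolate name and its distances), which you should verify against your own download. The function name and the three demo isolates are hypothetical.

```python
import gzip
import io


def read_distance_matrix(source):
    """Parse a gzip-compressed, PHYLIP-style distance matrix.

    Assumed layout (check your own file): the first line holds the number
    of isolates; each later line holds a name followed by distances.
    Returns (names, rows).
    """
    with gzip.open(source, "rt") as handle:
        lines = [ln.split() for ln in handle if ln.strip()]
    count = int(lines[0][0])
    names, rows = [], []
    for fields in lines[1:count + 1]:
        names.append(fields[0])
        rows.append([float(x) for x in fields[1:]])
    return names, rows


# Hypothetical three-isolate matrix, compressed in memory for the demo
demo_text = "3\nisoA 0 12 250\nisoB 12 0 245\nisoC 250 245 0\n"
demo_file = io.BytesIO(gzip.compress(demo_text.encode()))

names, rows = read_distance_matrix(demo_file)
for name, row in zip(names, rows):
    print(name, row)
```

For the real file, pass the path "dist.mat.gz" instead of the in-memory demo object.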
Open the tree.nj.newick file in FigTree.
From the phylogenetic tree shown in FigTree, can you identify any isolates that seem to form a cluster and might be part of the same outbreak? Can you identify any clear outliers?
Now open the tree.upgma.newick file in FigTree.
Do the UPGMA and NJ trees look the same? If not, what could be the explanation?
Now take a look at the distance matrix.
What is the range of SNPs (the branch lengths) between the isolates that form the cluster in the phylogenetic tree (smallest no. of SNPs to largest no. of SNPs)?
What is the range of SNPs (branch lengths) between the isolates that form the cluster and the outlier?
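With more than a handful of isolates, reading these ranges off the matrix by eye becomes error-prone. The sketch below shows one way to compute them, assuming you have already put the pairwise distances into a dictionary and decided which isolates belong to the cluster. All names and numbers here are hypothetical placeholders, not results from any real run.

```python
from itertools import combinations


def distance_range(dist, group_a, group_b=None):
    """Return (min, max) distance within group_a, or between group_a and group_b."""
    if group_b is None:
        pairs = combinations(group_a, 2)
    else:
        pairs = ((a, b) for a in group_a for b in group_b)
    values = [dist[frozenset(pair)] for pair in pairs]
    return min(values), max(values)


# Hypothetical pairwise distances; frozenset keys make the lookup order-independent
dist = {
    frozenset(("isoA", "isoB")): 12,
    frozenset(("isoA", "isoC")): 20,
    frozenset(("isoB", "isoC")): 15,
    frozenset(("isoA", "isoD")): 250,
    frozenset(("isoB", "isoD")): 245,
    frozenset(("isoC", "isoD")): 260,
}
cluster = ["isoA", "isoB", "isoC"]  # assumed cluster members
outlier = ["isoD"]                  # assumed outlier

within = distance_range(dist, cluster)
between = distance_range(dist, cluster, outlier)
print("Within cluster:", within)
print("Cluster to outlier:", between)
```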
If it is not possible from the phylogenetic tree to see the relationship of the isolates within the cluster, try to run NDtree again, but without the outlier. This will improve the resolution of the closely related isolates in the cluster.
Have another look at the values in the distance matrix. For the same pair of isolates, are they identical to the values found previously? If not, how can this be explained?
Go to CSIPhylogeny. Note again the links to "Instructions" and "Output".
Create a tree using the default settings: select the same reference (template) genome that you used for NDtree and upload your files (again, omit any files of very poor quality).
Download the tree in .newick format and open it in FigTree.
Does the tree look like the NJ tree created by NDtree? If not, what could be the explanation?
Describe the clusters and outliers. How large are the branch lengths between the cluster and the outlier? How large are they within the cluster? How many SNPs are there between the cluster and the outlier? How many within the cluster?
Hint: Use FigTree's branch labels setting to show the branch lengths on the tree.
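As an alternative to adding up branch labels by hand, the tip-to-tip distance between two isolates can be summed from the Newick text itself. The following is a rough sketch, not a full Newick parser: it assumes unquoted isolate names without parentheses, commas, or colons, and the example tree and function names are hypothetical. The result is in whatever units the tree's branch lengths use.

```python
import re


def leaf_paths(newick):
    """Map each leaf name to {edge_id: branch_length} for its edges up to the root."""
    tokens = re.findall(r"[(),;]|[^(),;]+", newick.strip())
    paths = {}
    stack = []    # one list of leaf names per currently open parenthesis
    closed = []   # leaves under the most recently closed internal node
    edge_id = 0
    prev = None
    for tok in tokens:
        tok = tok.strip()
        if not tok:
            continue
        if tok == "(":
            stack.append([])
        elif tok == ")":
            closed = stack.pop()
            if stack:
                stack[-1].extend(closed)
        elif tok not in ",;":
            name, _, length = tok.partition(":")
            length = float(length) if length else 0.0
            if prev == ")":
                # Label of an internal node: its edge is shared by all leaves below it
                for leaf in closed:
                    paths[leaf][edge_id] = length
            else:
                paths[name] = {edge_id: length}
                if stack:
                    stack[-1].append(name)
            edge_id += 1
        prev = tok
    return paths


def tip_distance(paths, a, b):
    """Sum the branch lengths on the path between leaves a and b."""
    pa, pb = paths[a], paths[b]
    only_a = sum(length for edge, length in pa.items() if edge not in pb)
    only_b = sum(length for edge, length in pb.items() if edge not in pa)
    return only_a + only_b


# Hypothetical tree: isoA and isoB cluster together, isoD sits on a long branch
tree = "((isoA:1,isoB:2):3,isoC:4,isoD:10);"
paths = leaf_paths(tree)
print(tip_distance(paths, "isoA", "isoB"))
print(tip_distance(paths, "isoA", "isoD"))
```

Edges shared by both leaves (those above their common ancestor) cancel out, so only the branches on the path between the two tips are counted.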
If you have more time, you can rerun CSIPhylogeny without the outlier.
Compare your trees with the trees in the figure on page 132 of Harris et al. 2013 (if you are an MRSA1 or MRSA2 group) or with Figure 1 in Grad et al. 2012 (if you are an Ecoli1 or Ecoli2 group). Do you see the same relationships between the isolates?