The Bacterial Analysis Pipeline and batch upload
In this exercise we will use the CGE Bacterial Analysis Pipeline, which allows several analyses to be run at once and supports batch upload of multiple isolates. We will also look at two commercial platforms that offer the CGE tools: GoSeqIt Tools and Illumina BaseSpace Sequence Hub.
Create a user account on CGE.
Log in to your account and hover the mouse over "Welcome" in the top right corner. A list of options appears. Click "Batch Uploader".
Note: The Bacterial Analysis Pipeline (BAP) only allows you to run one session at a time. This means you cannot be logged in with the same user ID from several tabs, browsers, or computers at the same time.
Click "Download Metadata Template". Open the downloaded Excel file and fill out the table in the sheet named "Metadata", one line per isolate. For help on filling out the table correctly, consult the descriptions in the sheet named "Attribute Descriptions". For instance, note that you have to write "yes" with a lowercase "y" in the "pre_assembled" column, or the upload will fail. For "country" write "Denmark", for "isolation_source" write "unknown", and for "collection_date" write 2016. Save the file.
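The metadata sheet needs one row per isolate, with fixed values for this exercise. The minimal Python sketch below illustrates that layout and checks the lowercase "yes" rule. It is illustrative only: the column names pre_assembled, country, isolation_source and collection_date come from the instructions above, while file_names is a hypothetical column name. Check the "Attribute Descriptions" sheet for the actual headers and column order before copying anything into the template.

```python
import csv
import io

FILES = ["unknown_1.fasta", "unknown_2.fasta", "unknown_3.fasta"]


def build_rows(files):
    """Return one metadata row (a dict) per isolate."""
    return [
        {
            "file_names": name,          # HYPOTHETICAL column name, for illustration
            "pre_assembled": "yes",      # must be lowercase "yes"
            "country": "Denmark",
            "isolation_source": "unknown",
            "collection_date": "2016",
        }
        for name in files
    ]


def check_rows(rows):
    """Return a list of problems; here only the lowercase-'yes' rule is checked."""
    problems = []
    for i, row in enumerate(rows, start=1):
        if row["pre_assembled"] != "yes":
            problems.append(f"row {i}: pre_assembled must be 'yes' (lowercase)")
    return problems


rows = build_rows(FILES)
print(check_rows(rows))  # an empty list means no problems were found

# Tab-separated text pastes into separate spreadsheet columns.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(rows[0]), delimiter="\t")
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

A value such as "Yes" would be flagged by `check_rows`, mirroring the upload failure described above.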
Next, upload the filled-out metadata file together with the sequence files (unknown_1.fasta, unknown_2.fasta, and unknown_3.fasta). If the metadata file was not filled out correctly, a red bar with warnings will appear. In that case, click the "+" on the right side of the bar to see which errors occurred.
Note: If the CGE servers are heavily used, your job will not be processed right away but will be put in a queue. If this happens, do not log out of your session (or log in again from a new browser window, which logs you out of the old one) until the job has started processing. Otherwise, your job will fail.
Note: In advance of the workshop, we created user accounts for everyone and ran the three files through the pipeline to make sure you would get results in time. Please find your username on the list of participants. All passwords are 1234; please change yours under settings.
1. When the job finishes, check the batch jobs summary. Does everything look ok?
2. How many contigs does each of the files contain, and what is the N50?
3. Which species was predicted for each of the isolates?
4. What is the Sequence Type of the isolates?
5. Do the isolates contain any acquired resistance genes?
6. Do the isolates contain any plasmids?
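The numbers in question 2 can also be checked locally. The sketch below counts contigs as FASTA records. It computes N50 as the length of the contig at which the cumulative length first reaches at least half of the total assembly length, summing from the longest contig down. The function names are illustrative. The pipeline's own figures may differ if it filters out short contigs before reporting.

```python
def read_fasta_lengths(lines):
    """Return the sequence length of each record in FASTA-formatted lines."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)  # sequences may be wrapped over several lines
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


# Toy assembly: five contigs of 100, 80, 50, 30 and 20 bp (the first is wrapped).
toy = [">c1", "A" * 60, "A" * 40, ">c2", "C" * 80, ">c3", "G" * 50,
       ">c4", "T" * 30, ">c5", "A" * 20]
lengths = read_fasta_lengths(toy)
print(len(lengths), n50(lengths))  # 5 80
```

To run it on one of the exercise files, open the file in a `with open("unknown_1.fasta") as fh:` block and pass `fh` to `read_fasta_lengths`.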
Try to download the results as an Excel spreadsheet.
Try to rerun some of the analyses with new settings:
Unknown_1.fsa should be resubmitted to MLST with the second MLST scheme for E. coli.
Unknown_2.fsa should be resubmitted to KmerFinder using the human virus database.
Unknown_3.fsa should be resubmitted to KmerFinder using the fungi database.
7. What were the results of the re-analyses?
Data uploaded to the CGE Bacterial Analysis Pipeline will at some point be made public via upload to the European Nucleotide Archive (ENA). If you do not wish to share your data, or just need the results faster, the CGE methods are also available via two commercial platforms.
Using GoSeqIt Tools is very easy:
Create a login at GoSeqIt Tools and log in.
Running analyses at GoSeqIt Tools costs GoSeqIt Coins, and new users get 100 complimentary GoSeqIt Coins. By default, the Bacterial Analysis Pipeline is available in a standard and a sensitive mode. In standard mode, the minimum %Identity for identification of resistance genes and virulence factors is 90%, while it is 80% for identification of plasmid replicons. In sensitive mode, the minimum %Identity for identification of resistance genes, virulence factors, and plasmid replicons is 60%. Reports are stored on the platform so you can revisit results and download additional files containing, e.g., sequence alignments and summaries for export to Excel. Reports can also be downloaded in PDF format.
Select one of the files unknown_1.fasta, unknown_2.fasta, or unknown_3.fasta and upload it to GoSeqIt Tools (just drag and drop it from your local folder to the field marked "Drop files" in the upper right corner; you do not need to fill out or upload a metadata file). Analyse the file with "Bacterial Analysis Pipeline standard". When the analysis is finished, compare the results with those obtained from the CGE pipeline.
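The practical difference between the two modes can be illustrated by filtering hits on the minimum %Identity thresholds listed above, which may help when comparing results across platforms. This is a simplified sketch. The hits are made up for illustration. It treats a hit exactly at the threshold as reported, which is an assumption. It also ignores any coverage threshold the tools may apply.

```python
# Minimum %Identity per category, as stated for the two GoSeqIt modes.
THRESHOLDS = {
    "standard": {"resistance": 90.0, "virulence": 90.0, "plasmid": 80.0},
    "sensitive": {"resistance": 60.0, "virulence": 60.0, "plasmid": 60.0},
}


def reported_hits(hits, mode):
    """Keep hits whose %identity meets the mode's threshold for their category."""
    limits = THRESHOLDS[mode]
    return [h for h in hits if h["identity"] >= limits[h["category"]]]


# HYPOTHETICAL hits, for illustration only (not real results).
hits = [
    {"name": "geneA", "category": "resistance", "identity": 99.5},
    {"name": "geneB", "category": "resistance", "identity": 85.0},
    {"name": "repX", "category": "plasmid", "identity": 82.0},
    {"name": "virY", "category": "virulence", "identity": 65.0},
]

for mode in ("standard", "sensitive"):
    print(mode, [h["name"] for h in reported_hits(hits, mode)])
# standard ['geneA', 'repX']
# sensitive ['geneA', 'geneB', 'repX', 'virY']
```

The same input can therefore yield different gene lists depending only on the chosen mode and its thresholds.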
8. Do you get the same results when running the files through the CGE Bacterial Analysis Pipeline versus the GoSeqIt Tools pipeline? If not, what could be the explanation?
It is also possible to make customised pipelines by selecting among 11 individual methods. Let's assume that, for the E. coli isolate in unknown_1.fasta, you would also like to search for point mutations leading to antibiotic resistance and to identify the fimH type and phylotype of the isolate.
- Select "Pipelines" (top left) and "Add new pipeline".
- Now tick the box to the left of each method you want to add to your pipeline. Note that at GoSeqIt Tools we have kept the name of the original method for identifying point mutations leading to antibiotic resistance, PointFinder; it is hence not part of the ResFinder method. Also note that if you click the name of a method, you can change its settings. If a method name is highlighted in red when you tick its box, you need to look at the settings and take some action (typically selecting a sub-database).
- You can change the name of your customised pipeline under "Pipeline" or keep the default, which is just "Pipeline".
- Click "Save pipeline".
- Next go to "Reports" (top left) and reanalyse unknown_1.fasta, this time selecting the pipeline you just created.
9. What was the phylotype and fimH type of unknown_1.fasta?
10. Did you discover any mutations causing antibiotic resistance?
Illumina sequencing machines are automatically set up to upload sequenced data to the Illumina BaseSpace Sequence Hub. Your data is stored on Illumina BaseSpace for free, but you have to pay a fee to use some of the many apps available for analysing the data. During the first month after sign-up, however, you can run a number of analyses for free.
You can sign up for Illumina BaseSpace using this link. We will not test Illumina BaseSpace during this workshop, but if you want to try it later, running the Bacterial Analysis Pipeline is described in this document: PDF