Exercise 5

The Bacterial Analysis Pipeline and batch upload

Data

Choose one of the following three files to work with:

Day_3/Pipeline/unknown_1.fasta

Day_3/Pipeline/unknown_2.fasta

Day_3/Pipeline/unknown_3.fasta

Analysis

Create a user account on CGE.

Log in to your account and hover the mouse over "Welcome" in the top right corner. A list of options appears. Click on "Batch Uploader".

Note: The BAP only allows you to run one session at a time. This means you cannot be logged in from several tabs, browsers or computers at the same time with the same user ID.

Click on "Download Metadata Template". Open the downloaded Excel file and fill out the table in the sheet named "Metadata". For help on how to fill out the table correctly, consult the descriptions in the sheet named "Attribute Descriptions". Importantly, you must write "yes" with a lowercase "y" in the "pre_assembled" column, or the upload will fail. Fill in one line per isolate. As country, you may write "Denmark"; as isolation_source, write "unknown"; and as collection_date, write 2016. Save the file.
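Before uploading, it can help to sanity-check the values you typed. The sketch below is not part of the CGE pipeline; the function name and the idea of flagging empty cells are assumptions made for illustration. Only the lowercase "yes" rule is taken from the instructions above, and the row is typed in by hand rather than read from the Excel file.

```python
# Hypothetical pre-upload check; only the lowercase "yes" rule comes from the exercise text.
def check_metadata_row(row):
    """Return a list of human-readable problems found in one metadata row."""
    problems = []
    # The exercise states that "pre_assembled" must be exactly "yes" (lowercase).
    if row.get("pre_assembled") != "yes":
        problems.append('pre_assembled must be "yes" (lowercase)')
    # Assumption: empty cells are worth flagging for the fields named in the exercise.
    for field in ("country", "isolation_source", "collection_date"):
        value = row.get(field)
        if value is None or not str(value).strip():
            problems.append(f"{field} is empty")
    return problems


example_row = {
    "pre_assembled": "Yes",  # wrong on purpose: capital "Y"
    "country": "Denmark",
    "isolation_source": "unknown",
    "collection_date": "2016",
}
print(check_metadata_row(example_row))
```

An empty list means the row passed these checks; the uploader itself may apply further rules that are described in the "Attribute Descriptions" sheet.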

Next, upload the filled-out metadata file along with the sequence file you chose to work with and described in the metadata file (unknown_1.fasta, unknown_2.fasta, or unknown_3.fasta). Note the red bar with warnings that will appear if the metadata file was not filled out correctly. If you click on the "+" on the right side of the bar, you can see which errors occurred.

Press submit.

Note: If the CGE servers are heavily used, your job will not be processed right away but placed in a queue. If this happens, do not log out of your session (or log in again from a new browser window, which will log you out of the old one) before the job has started processing. Otherwise, your job will fail.

Note: In advance of the workshop, we have created user accounts for everyone and run the three files through the pipeline to make sure you get results in time. Please go to the page with all the results to find your user name in the list. All passwords are 1234 and should be changed under settings.

When the job finishes, check the batch jobs summary. Does everything look ok?

Which species was predicted?

What is the MLST for your sample?

Try to download the results as an Excel spreadsheet.

How many contigs does your file contain and what is the N50?

Do you find any resistance genes?

Do you find any plasmids?
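The contig count and N50 asked about above can be cross-checked locally. The following is a minimal sketch, assuming the usual definition of N50 (the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length). The function names and the toy sequences are made up for illustration.

```python
def contig_lengths(fasta_lines):
    """Return the length of every sequence in an iterable of FASTA lines."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length of the contig at which the running total reaches half the assembly."""
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0  # empty input


# Toy assembly with contigs of length 8, 6, 4 and 2 (total 20).
toy = [">c1", "ACGTACGT", ">c2", "ACGTAC", ">c3", "ACGT", ">c4", "AC"]
lengths = contig_lengths(toy)
print(len(lengths), n50(lengths))  # prints: 4 6
```

To run it on your own file, pass an open file handle instead of the toy list, for example `contig_lengths(open("Day_3/Pipeline/unknown_1.fasta"))`. Small differences from the pipeline's report are possible if the pipeline filters contigs before computing its statistics.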

Try to rerun some of the analysis with new settings:

Unknown_1 groups should resubmit MLST with the second MLST scheme for E. coli.

Unknown_2 groups should resubmit KmerFinder using the human_virus database.

Unknown_3 groups should resubmit KmerFinder using the fungi database.

Do you find anything interesting?

Data uploaded to the CGE Bacterial Analysis Pipeline will eventually be made public via upload to the European Nucleotide Archive (ENA). If you do not wish to share your data, two commercial platforms also run the Bacterial Analysis Pipeline.

GoSeqIt Tools

The GoSeqIt Tools platform is currently in public beta mode, meaning that it is free to use. At a later point in time, a fee will be required when analysing isolates on the platform. Using GoSeqIt Tools is very easy:

  1. Create a login at GoSeqIt Tools and log in.
  2. Upload as many draft assemblies as you want in one go by "drag-and-drop" from a local folder to the field marked "Drop files" in the upper right corner (no need to fill out or upload a metadata file).
  3. Select whether you want the bacterial analysis pipeline to run in standard or sensitive mode. In standard mode, the minimum %Identity for identification of resistance genes and virulence factors is 90%, while it is 80% for identification of plasmid replicons. In sensitive mode, the minimum %Identity for identification of resistance genes, virulence factors, and plasmid replicons is 60%.
  4. Reports are automatically mailed to you when an analysis is complete, but are also stored on the platform so that you can revisit results and download additional files containing, e.g., sequence alignments and summaries for export to Excel.
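To make the difference between the two modes in step 3 concrete, the sketch below encodes the stated minimum %Identity values and applies them to a few hits. It is an illustration only, not GoSeqIt code: the category labels, the function name, and the example hits are invented, and the platform may apply further criteria that are not described here.

```python
# Minimum %Identity per mode and category, as stated in step 3 above.
MIN_IDENTITY = {
    "standard": {"resistance": 90.0, "virulence": 90.0, "plasmid": 80.0},
    "sensitive": {"resistance": 60.0, "virulence": 60.0, "plasmid": 60.0},
}


def is_reported(mode, category, identity):
    """True if a hit meets the minimum %Identity for its category in the given mode."""
    return identity >= MIN_IDENTITY[mode][category]


# Hypothetical hits as (category, %Identity); not real results.
hits = [("resistance", 95.2), ("plasmid", 84.0), ("virulence", 72.5)]

for mode in MIN_IDENTITY:
    kept = [hit for hit in hits if is_reported(mode, *hit)]
    print(mode, kept)
```

With these made-up hits, standard mode keeps the resistance and plasmid hits, while sensitive mode also keeps the virulence hit at 72.5%.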

Illumina BaseSpace

Illumina sequencing machines are automatically set up to upload sequenced data to Illumina BaseSpace. In this case, you might want to store and also analyse the data there. Your data is stored on Illumina BaseSpace for free, but if you want to use some of the many apps available to analyse the data, you have to pay a fee. During the first month after sign-up, however, you are allowed to run a number of analyses for free.