18S metabarcoding analysis

Created: Jul 17, 2023 2:30 PM

Status: In progress

Trimming primers with cutadapt

In Illumina-based sequencing data,

Installation

Installing Docker and QIIME2 for Galaxy


So far I have not been able to get a local instance of Galaxy running in Docker, I think because the Mac M1 (Apple Silicon) chip makes everything very difficult. Until I can get it working (if ever), we will use the free public server at https://cancer.usegalaxy.org; skip to the Running QIIME2 section.

Install Docker Desktop:

Download Docker Desktop | Docker (www.docker.com)

Install q2galaxy (GUI for QIIME 2) using the instructions under the Docker heading:

GitHub - qiime2/q2galaxy: Generate Galaxy tool descriptions automatically from QIIME 2 actions. (github.com)

Start Docker Desktop.
If you are using a Mac with an M1/M2 (Apple Silicon) chip, go to Settings > Features in development and check “Use Rosetta for x86/amd64 emulation on Apple Silicon”.

Pull the image (the download takes 5-10 minutes): docker pull quay.io/qiime2/q2galaxy

Start the container: docker run -d -p 8080:80 -p 8021:21 -p 8022:22 -v $HOME/q2galaxy_data/:/export/ quay.io/qiime2/q2galaxy
The container may take a while to start. Click it in Docker Desktop and check its Logs to confirm that it is running.

In a browser, navigate to http://localhost:8080. If everything is working, the Galaxy interface should load.
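The Docker steps above can be collected into one script. This is a minimal sketch, not part of the q2galaxy instructions: the function names (run, start_q2galaxy), the DRY_RUN switch and the curl polling loop are assumptions of this sketch; the image name, ports and volume come from the commands above.

```shell
#!/usr/bin/env bash
# Sketch of the Docker steps above as reusable functions (hypothetical names).
# Set DRY_RUN=1 to print the commands instead of running them.

IMAGE="quay.io/qiime2/q2galaxy"
DATA_DIR="$HOME/q2galaxy_data"

# run CMD...: execute CMD, or only print it when DRY_RUN=1
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

start_q2galaxy() {
  run mkdir -p "$DATA_DIR"
  run docker pull "$IMAGE"
  run docker run -d -p 8080:80 -p 8021:21 -p 8022:22 -v "$DATA_DIR/:/export/" "$IMAGE"
  [ "${DRY_RUN:-0}" = 1 ] && return 0

  # Galaxy can take several minutes to start: poll the web UI every 10 s,
  # giving up after 60 attempts (roughly 10 minutes)
  local i
  for (( i = 0; i < 60; i++ )); do
    if curl -fs -o /dev/null --max-time 5 http://localhost:8080; then
      echo "Galaxy is up at http://localhost:8080"
      return 0
    fi
    sleep 10
  done
  echo "No response from Galaxy; check the container's logs" >&2
  return 1
}
```

After loading the functions (for example with `source start_q2galaxy.sh`, a hypothetical file name), `DRY_RUN=1 start_q2galaxy` prints the three commands without running them, and `start_q2galaxy` runs them and waits for http://localhost:8080 to answer.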

Running QIIME2

In a browser, navigate to https://cancer.usegalaxy.org.
Upload the data to be analyzed (usually fastq.gz files) with the Get Data tool.

Analyzing paired-end data

Convert the FASTQ data (the raw reads that come off the sequencer) to a QZA artifact (the file format QIIME2 works with) using the qiime2 tools import tool (find it with the tool search box).

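For “Type of data to import:” select SampleData[PairedEndSequencesWithQuality], the QIIME 2 type for demultiplexed paired-end reads with quality scores (and the input type that vsearch join-pairs expects).
The uploaded fastq.gz files should be populated automatically.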

Run the qiime2 vsearch join-pairs tool to merge each pair of forward and reverse reads into a single joined read.
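The same import and join steps can also be run from the QIIME 2 command line instead of Galaxy. A minimal sketch, assuming a local QIIME 2 install from roughly the 2023 releases (where vsearch join-pairs is available) and reads named like <sample>_R1_001.fastq.gz / <sample>_R2_001.fastq.gz; the make_manifest helper, the raw_reads directory and the output file names are hypothetical. It uses a tab-separated manifest file (PairedEndFastqManifestPhred33V2) rather than the Galaxy upload.

```shell
#!/usr/bin/env bash
# Sketch: import paired-end reads and join pairs with the qiime CLI.
# File naming, directory and output names below are assumptions.

# make_manifest DIR OUT: write a PairedEndFastqManifestPhred33V2 file
# (tab-separated, absolute paths) for every R1 file in DIR with an R2 mate.
make_manifest() {
  local dir=$1 out=$2 absdir r1 r2 sample
  absdir=$(cd "$dir" && pwd) || return 1
  printf 'sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n' > "$out"
  for r1 in "$absdir"/*_R1_001.fastq.gz; do
    [ -e "$r1" ] || continue                 # glob matched nothing
    r2="${r1%_R1_001.fastq.gz}_R2_001.fastq.gz"
    if [ ! -e "$r2" ]; then
      echo "skipping $r1: no matching R2 file" >&2
      continue
    fi
    sample=$(basename "$r1" _R1_001.fastq.gz)
    printf '%s\t%s\t%s\n' "$sample" "$r1" "$r2" >> "$out"
  done
}

# Only run the QIIME 2 steps if the qiime CLI is installed
if command -v qiime >/dev/null 2>&1; then
  make_manifest raw_reads manifest.tsv
  qiime tools import \
    --type 'SampleData[PairedEndSequencesWithQuality]' \
    --input-path manifest.tsv \
    --input-format PairedEndFastqManifestPhred33V2 \
    --output-path demux-paired.qza
  qiime vsearch join-pairs \
    --i-demultiplexed-seqs demux-paired.qza \
    --o-joined-sequences demux-joined.qza
fi
```

Building the manifest from file names avoids depending on Illumina's Casava directory layout; samples whose R2 file is missing are reported and left out.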

Using DADA2

Installation


Official installation instructions are on the DADA2 website.

Ensure you have R version 4.3.1 installed (download from CRAN if not).
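Open RStudio and run the following commands:

# Install BiocManager if needed, then dada2 from Bioconductor 3.17 (the release for R 4.3)
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("dada2", version = "3.17")
library(dada2)  # check that the package loads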
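Before running the R commands, it can help to confirm from a terminal that the installed R meets the 4.3.1 requirement. A minimal sketch: the version_ge helper is hypothetical, and the script assumes `Rscript` is on the PATH and that `sort -V` (GNU coreutils) is available.

```shell
#!/usr/bin/env bash
# Sketch: check that R is at least the version these notes assume
# (4.3.1) before installing dada2 from Bioconductor 3.17.

# version_ge A B: succeeds if version A >= version B (uses `sort -V`)
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

required="4.3.1"
if command -v Rscript >/dev/null 2>&1; then
  # R.version$major and R.version$minor are e.g. "4" and "3.1"
  installed=$(Rscript -e 'cat(paste(R.version$major, R.version$minor, sep = "."))')
  if version_ge "$installed" "$required"; then
    echo "R $installed found; OK to install dada2"
  else
    echo "R $installed is older than $required; update R from CRAN first" >&2
  fi
else
  echo "Rscript not found on PATH" >&2
fi
```

Version sorting compares each numeric field separately, so 4.10.0 correctly counts as newer than 4.3.1, which a plain string comparison would get wrong.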

Forward primer: TGGTGCCAGCASCCGCG

Reverse primer: TCCGTCAATTYCTTIAASTTTC

cutadapt -g TGGTGCCAGCASCCGCG -a GAAASTTXAAGRAATTGACGGA -G TCCGTCAATTYCTTXAASTTTC -A CGCGGSTGCTGGCACCA -n 2 -o 13_R1_001_ca.fastq -p 13_R2_001_ca.fastq 13_R1_001.fastq 13_R2_001.fastq

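The reverse primer contains inosine (I), which was replaced with X in the command. -g and -G trim the forward and reverse primers from the 5' ends of R1 and R2; -a and -A trim the reverse complements of the reverse and forward primers from the 3' ends of R1 and R2, where they appear when the amplicon is shorter than the read; -n 2 lets cutadapt remove up to two primers from each read; -o and -p name the trimmed R1 and R2 output files.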
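The -a and -A sequences in the cutadapt command above are the reverse complements of the reverse and forward primers. The sketch below reproduces them; revcomp is a hypothetical helper (not part of cutadapt) that maps each IUPAC code to its complement and treats inosine (I) as its own complement before the I-to-X substitution.

```shell
#!/usr/bin/env bash
# Sketch: derive the four primer arguments of the cutadapt command above.

# revcomp SEQ: reverse-complement an uppercase IUPAC DNA sequence.
# R<->Y, K<->M, B<->V, D<->H; S, W, N and inosine (I) map to themselves.
revcomp() {
  local seq=$1 rev="" i
  for (( i = ${#seq} - 1; i >= 0; i-- )); do
    rev+=${seq:i:1}
  done
  printf '%s\n' "$rev" | tr 'ACGTRYKMBDHVSWNI' 'TGCAYRMKVHDBSWNI'
}

fwd="TGGTGCCAGCASCCGCG"
rev_primer="TCCGTCAATTYCTTIAASTTTC"

fwd_rc=$(revcomp "$fwd")
rev_rc=$(revcomp "$rev_primer")

# Replace inosine (I) with X, as was done for the cutadapt command
printf '%s %s\n' -g "${fwd//I/X}"
printf '%s %s\n' -a "${rev_rc//I/X}"
printf '%s %s\n' -G "${rev_primer//I/X}"
printf '%s %s\n' -A "${fwd_rc//I/X}"
```

Running it prints `-g TGGTGCCAGCASCCGCG`, `-a GAAASTTXAAGRAATTGACGGA`, `-G TCCGTCAATTYCTTXAASTTTC` and `-A CGCGGSTGCTGGCACCA`, matching the command above, so the same script can generate arguments for a different primer pair.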

=== Summary ===

Total read pairs processed:            130,947
  Read 1 with adapter:                  84,130 (64.2%)
  Read 2 with adapter:                  46,162 (35.3%)
Pairs written (passing filters):       130,947 (100.0%)

Total basepairs processed:    54,341,308 bp
  Read 1:    26,855,924 bp
  Read 2:    27,485,384 bp
Total written (filtered):     52,098,642 bp (95.9%)
  Read 1:    25,485,477 bp
  Read 2:    26,613,165 bp
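With many samples, it helps to pull the primer-detection rates out of each cutadapt report in one consistent format. A minimal sketch, assuming each report's text (like the summary above) was saved to a file; adapter_rates is a hypothetical helper built on awk.

```shell
#!/usr/bin/env bash
# Sketch: extract primer-detection rates from a saved cutadapt
# paired-end report.
# Prints: total_pairs <TAB> pct_R1_with_adapter <TAB> pct_R2_with_adapter

adapter_rates() {
  awk '
    # Remove thousands separators and return the first number after the colon
    function count(line,   parts, fields) {
      gsub(",", "", line)
      split(line, parts, ":")
      split(parts[2], fields, " ")
      return fields[1] + 0
    }
    /Total read pairs processed:/ { total = count($0) }
    /Read 1 with adapter:/        { r1 = count($0) }
    /Read 2 with adapter:/        { r2 = count($0) }
    END {
      if (total == 0) { print "no read-pair count found" > "/dev/stderr"; exit 1 }
      printf "%d\t%.1f\t%.1f\n", total, 100 * r1 / total, 100 * r2 / total
    }' "$@"
}
```

For the summary above it prints 130947, 64.2 and 35.3, tab-separated. A loop such as `for f in *_report.txt; do printf '%s\t' "$f"; adapter_rates "$f"; done` (the `*_report.txt` naming is hypothetical) then gives one row per sample.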