Created: Jul 17, 2023 2:30 PM
Status: In progress
Category: Bioinformatics
QIIME2 isn’t working yet. Let’s use DADA2!
- Trimming primers with cutadapt
- Installation
- Installing Docker and QIIME2 for Galaxy
- Running QIIME2
- Analyzing paired-end data
- Using DADA2
- Installation
Trimming primers with cutadapt
In Illumina-based sequencing data,
Installation
Installing Docker and QIIME2 for Galaxy
So far I have not been able to get a local instance of Galaxy running in Docker, I think because running it on an M1 Mac is difficult. Until I can get it working (if ever), we will use the free public server at https://cancer.usegalaxy.org. Skip to the Running QIIME2 section.
- Install Docker Desktop.
- Install q2galaxy (GUI for QIIME 2) using the instructions under the Docker heading:
- Start Docker Desktop
- If using an Apple computer with M1/M2 chip, navigate to Settings > Features in development and check the box for “Use Rosetta for x86/amd64 emulation on Apple Silicon”
- Run the following (the download will take 5-10 minutes):
docker pull quay.io/qiime2/q2galaxy
- Start the container:
docker run -d -p 8080:80 -p 8021:21 -p 8022:22 -v $HOME/q2galaxy_data/:/export/ quay.io/qiime2/q2galaxy
- It might take a while to spin up. Click the container in Docker Desktop and inspect the Logs to make sure it is working.
- In a browser, navigate to http://localhost:8080. You should see the Galaxy start page if everything is working properly.
Running QIIME2
- In a browser, navigate to https://cancer.usegalaxy.org
- Upload the data to be analyzed (probably fastq.gz files) using the Get Data tool.
Analyzing paired-end data
- Convert the FASTQ data (the raw data that comes off the sequencer) to QZA (the artifact format that QIIME2 works with) using the qiime2 tools import tool (use the search box to find it).
- For “Type of data to import:” select SampleData[PairedEndSequencesWithQuality], the type that join-pairs expects for raw paired-end reads.
- The fastq.gz files that were uploaded should be populated automatically.
- Run the qiime2 vsearch join-pairs tool to join the paired-end reads together.
Using DADA2
Installation
Official instructions can be found here.
- Ensure you have R version 4.3.1 installed (download from CRAN if not).
- Open RStudio
- Run the following command:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("dada2", version = "3.17")
Forward primer: TGGTGCCAGCASCCGCG
Reverse primer: TCCGTCAATTYCTTIAASTTTC
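cutadapt \
    -g TGGTGCCAGCASCCGCG -a GAAASTTXAAGRAATTGACGGA \
    -G TCCGTCAATTYCTTXAASTTTC -A CGCGGSTGCTGGCACCA \
    -n 2 \
    -o 13_R1_001_ca.fastq -p 13_R2_001_ca.fastq \
    13_R1_001.fastq 13_R2_001.fastq
(The I, inosine, in the reverse primer is written as X in the command. The -a and -A sequences are the reverse complements of the reverse and forward primers.)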
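The reverse-complemented sequences in the cutadapt command can be derived in the shell rather than by hand. This is a minimal sketch, assuming the standard rev and tr utilities are available; revcomp is a hypothetical helper name, not part of cutadapt, and N and X are assumed to map to themselves.

```shell
# Reverse-complement a primer that may contain IUPAC degenerate codes.
# Assumption: N and X are placeholders and are left unchanged.
revcomp() {
    printf '%s\n' "$1" | rev | tr 'ACGTRYSWKMBDHVNX' 'TGCAYRSWMKVHDBNX'
}

revcomp TGGTGCCAGCASCCGCG        # forward primer -> CGCGGSTGCTGGCACCA
revcomp TCCGTCAATTYCTTXAASTTTC   # reverse primer -> GAAASTTXAAGRAATTGACGGA
```

Both outputs match the sequences used with -A and -a in the command.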
=== Summary ===
Total read pairs processed: 130,947
  Read 1 with adapter: 84,130 (64.2%)
  Read 2 with adapter: 46,162 (35.3%)
Pairs written (passing filters): 130,947 (100.0%)
Total basepairs processed: 54,341,308 bp
  Read 1: 26,855,924 bp
  Read 2: 27,485,384 bp
Total written (filtered): 52,098,642 bp (95.9%)
  Read 1: 25,485,477 bp
  Read 2: 26,613,165 bp
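As a sanity check on the summary, the number of reads in each trimmed file can be counted directly. This is a rough sketch, assuming standard four-line FASTQ records (no wrapped sequence lines); count_reads is a hypothetical helper name. If the output files match the summary, both should report 130947 reads.

```shell
# Count reads in a FASTQ file, assuming exactly 4 lines per record.
# Files ending in .gz are decompressed on the fly.
count_reads() {
    case "$1" in
        *.gz) n=$(gzip -dc "$1" | wc -l) ;;
        *)    n=$(wc -l < "$1") ;;
    esac
    echo $(( n / 4 ))
}

# Report counts for the trimmed files if they are present in this directory.
for f in 13_R1_001_ca.fastq 13_R2_001_ca.fastq; do
    if [ -f "$f" ]; then
        echo "$f: $(count_reads "$f") reads"
    fi
done
```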