- TL;DR
- Preparing your BOSE account
- Perform tracking
- Transfer files to be analyzed
- Prepare and run the Snakemake script
- Monitoring jobs
TL;DR
Before tracking:
- Make sure you have the most recent commit of the invision-tools repository in your BOSE home directory.
- Make sure the experiments to be analyzed have been uploaded from the InVision computer to the Wheeler Lab NAS.
- Make sure that snakemake is installed into your invision-env conda environment.
- To install snakemake into the environment, run the following commands:
conda config --add channels bioconda
conda config --add channels conda-forge
# the above commands won't print any messages after completing
mamba install bioconda::snakemake=8.14.0
mamba install bioconda::snakemake-executor-plugin-cluster-generic
# type "Y" and hit Enter when asked to confirm changes
- While logged into the BOSE command line, transfer the videos from the NAS to BOSE:
- In the command below, replace wheelenj (the username before @data.wheelerlab.bio) with your username, 20240711-a02-CNN_20240711_140720.24568709 with the experiment you want to analyze, and miracidia with the proper BOSE directory.
- Use your data.wheelerlab.bio password, not your UWEC password.
rsync -Ppvrz wheelenj@data.wheelerlab.bio:/volume1/WheelerLab_data/InVision/20240711-a02-CNN_20240711_140720.24568709 /data/groups/wheelenj/miracidia
- Copy the invision-tools Snakemake script to the experiment directory:
- In the command below, replace miracidia/20240711-a02-CNN_20240711_140720.24568709 with the path to the experiment you want to analyze.
cp ~/GitHub/invision-tools/Snakefile /data/groups/wheelenj/miracidia/20240711-a02-CNN_20240711_140720.24568709
- Navigate to the experiment directory:
- In the command below, replace miracidia/20240711-a02-CNN_20240711_140720.24568709 with the path to the experiment you want to analyze.
cd /data/groups/wheelenj/miracidia/20240711-a02-CNN_20240711_140720.24568709
- Activate the invision-env environment and submit the Snakemake script, which will run tracking on all MP4s in the experiment and link the tracks together.
conda activate invision-env
nohup snakemake --profile ~/GitHub/invision-tools/slurm-profile/ &
- There are two sets of logs that you can follow during run-time. nohup.out will show the logs for the snakemake command. logs/track/ and logs/link/ will show the logs for the tracking (one log for each MP4) and the linking (one log total).
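The commands in this TL;DR differ only in the NAS username, the experiment name, and the BOSE subfolder. As a minimal sketch (the function name `tldr_commands` and its argument order are hypothetical and not part of invision-tools), the following prints the commands for one experiment so they can be reviewed before anything is run:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (do not run) the TL;DR commands for one experiment.
# Arguments: NAS username, experiment name, BOSE subfolder (e.g., miracidia).
tldr_commands() {
  local user="$1" experiment="$2" subdir="$3"
  local dest="/data/groups/wheelenj/${subdir}"
  echo "rsync -Ppvrz ${user}@data.wheelerlab.bio:/volume1/WheelerLab_data/InVision/${experiment} ${dest}"
  echo "cp ~/GitHub/invision-tools/Snakefile ${dest}/${experiment}"
  echo "cd ${dest}/${experiment}"
  echo "conda activate invision-env"
  echo "nohup snakemake --profile ~/GitHub/invision-tools/slurm-profile/ &"
}
```

For example, `tldr_commands wheelenj 20240711-a02-CNN_20240711_140720.24568709 miracidia` prints the five commands shown in this section.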
Preparing your BOSE account
/data/groups/wheelenj/
. The following steps only need to be followed the first time a user performs an analysis.
In all code blocks below, commands that should be run in your terminal follow a $ sign. Copy and paste the command (not including the $) into your terminal and press Enter. Lines representing example output following a command do not start with a $.
- Clone the invision-tools GitHub repository into your home folder.
- Log in to BOSE using the terminal: ssh -p 50022 {UWEC username}@bose.hpc.uwec.edu
- Use your UWEC username and password.
- Login will require two-factor authentication via Okta.
- If off campus, first connect to the VPN (installation instructions here).
- The repository should be cloned into your home directory (/data/users/{username}):
$ cd # navigate to your home dir
$ mkdir -p GitHub # make a new GitHub dir (no error if it already exists)
$ cd GitHub # navigate into the new dir
$ git clone https://github.com/wheelerlab-uwec/invision-tools.git # clone the repo
- If you cloned the repo previously but haven’t updated it in a while, pull the newest version:
$ cd ~/GitHub/invision-tools # navigate to the repo
$ git pull # get the latest version
- Create the conda environment required for invision-tools. This environment will contain all the software and Python libraries required for running the tracking scripts.
$ cd # navigate to your home dir
$ module load python-libs # load the required modules from the BOSE shared software library
$ conda init bash # initialize conda
$ conda env create -f ~/GitHub/invision-tools/environment.yml # create the environment
- This environment can be activated at any time with the command: conda activate invision-env
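To confirm that the environment was created, its name can be looked up in the output of `conda env list`. The function below is a minimal sketch; the name `has_conda_env` is hypothetical, and it only assumes that each environment is listed on its own line with the name in the first column:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the named environment appears in the
# `conda env list` output supplied on stdin.
has_conda_env() {
  local name="$1"
  # Environment lines look like "<name> [*] <path>"; header lines start with "#".
  awk -v name="$name" '$1 == name { found = 1 } END { exit !found }'
}
```

Usage: `conda env list | has_conda_env invision-env && echo "invision-env is ready"`.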
Perform tracking
Transfer files to be analyzed
- Log in to BOSE using the terminal: ssh {UWEC username}@bose.hpc.uwec.edu
- Use your UWEC username and password.
- Login will require two-factor authentication via Okta.
- If off campus, first connect to the VPN (installation instructions here).
- Alternatively, navigate to https://ondemand.hpc.uwec.edu/ and click BOSE Cluster Shell Access.
- Transfer videos from the Wheeler Lab’s server (https://data.wheelerlab.bio) to BOSE. Note: When accessing the server via a browser, you may get a warning about the connection not being private. Click Advanced and Proceed.
- On the server, all videos should be stored in the shared folder found at /volume1/WheelerLab_data/InVision/. This folder can be browsed with the File Station application (click the above link to access the sign-in page).
- On BOSE, all videos should be stored in the Wheeler Lab’s group folder found at /data/groups/wheelenj. Mosquito videos are in the mosquitoes/ subfolder, miracidia videos are in the miracidia/ subfolder, and planaria videos are in the planaria/ subfolder. These folders can also be browsed in the OnDemand file system.
- Use rsync to transfer the files (you will be prompted to enter the password for the NAS). The example below shows the command run (on the first line) and the resulting output:
- General rsync syntax: rsync username@server:/path/to/source /path/to/destination
- Explanation of options:
- -P: show progress and keep partially transferred files
- -p: preserve permissions
- -v: be verbose
- -r: sync recursively (i.e., copy everything in the source directory)
- -z: compress while transferring
$ rsync -Ppvrz wheelenj@data.wheelerlab.bio:/volume1/WheelerLab_data/InVision/20240301-a01-MRB_20240301_144112.24568709 /data/groups/wheelenj/mosquitoes
Could not chdir to home directory /var/services/homes/wheelenj: No such file or directory
receiving incremental file list
20240301-a01-MRB_20240301_144112.24568709/
20240301-a01-MRB_20240301_144112.24568709/000000.extra_data.json
274,356 100% 261.65MB/s 0:00:00 (xfr#1, to-chk=4/6)
20240301-a01-MRB_20240301_144112.24568709/000000.hd5
1,118,607,211 100% 85.02MB/s 0:00:12 (xfr#2, to-chk=3/6)
20240301-a01-MRB_20240301_144112.24568709/000000.mp4
1,615,335,187 100% 24.12MB/s 0:01:03 (xfr#3, to-chk=2/6)
20240301-a01-MRB_20240301_144112.24568709/000000.npz
274,972 100% 262.23MB/s 0:00:00 (xfr#4, to-chk=1/6)
20240301-a01-MRB_20240301_144112.24568709/metadata.yaml
875 100% 213.62kB/s 0:00:00 (xfr#5, to-chk=0/6)
sent 123 bytes received 1,647,720,372 bytes 16,728,126.85 bytes/sec
total size is 2,734,492,601 speedup is 1.66
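After the transfer finishes, a quick check of the destination folder can catch an incomplete copy before any jobs are submitted. This is a sketch under the assumption that every experiment folder holds at least one MP4 and a metadata.yaml, as in the output above; the function name `check_experiment_dir` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical check: report how many MP4s an experiment directory holds and
# fail if there are none or if metadata.yaml is missing.
check_experiment_dir() {
  local dir="$1" n_mp4
  # Count only MP4 files directly inside the experiment directory.
  n_mp4=$(find "$dir" -maxdepth 1 -type f -name '*.mp4' | wc -l | tr -d ' ')
  echo "${n_mp4} MP4 file(s) in ${dir}"
  [ "$n_mp4" -gt 0 ] && [ -f "${dir}/metadata.yaml" ]
}
```

Usage: `check_experiment_dir /data/groups/wheelenj/mosquitoes/20240301-a01-MRB_20240301_144112.24568709 || echo "transfer looks incomplete"`.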
Prepare and run the Snakemake script
- Copy the Snakemake script from the cloned invision-tools repository to the folder that was transferred:
$ cp ~/GitHub/invision-tools/Snakefile /data/groups/wheelenj/mosquitoes/20240301-a01-MRB_20240301_144112.24568709/
This script will track all MP4 videos in the experiment and link objects from all the videos together.
- Navigate to the experiment directory. This step is crucial; the job will not run properly if you are not in the experiment directory at the time of submission.
$ cd /data/groups/wheelenj/mosquitoes/20240301-a01-MRB_20240301_144112.24568709/
- Activate the invision-env environment and run the Snakemake workflow to start the job(s). Snakemake will automatically submit Slurm jobs for you, and it will submit them in order (i.e., it won’t submit the linking job until tracking has been completed for all videos).
$ conda activate invision-env
$ nohup snakemake --profile ~/GitHub/invision-tools/slurm-profile/ &
- Verify the job(s) have been started by running the sacct command:
$ sacct
JobID JobName Partition Account AllocCPUS State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
71371_0 tracking highmemory wheelenj_+ 64 RUNNING 0:0
71371_0.bat+ batch wheelenj_+ 64 RUNNING 0:0
71371_0.ext+ extern wheelenj_+ 64 RUNNING 0:0
You should see a JobID for each job that you are running.
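When many videos are being tracked, the full sacct table gets long. A compact alternative is to ask sacct for machine-readable output and count jobs per state. In the sketch below, `summarize_states` is a hypothetical helper name; the sacct options are standard (`-X` prints one line per job allocation, `-n` drops the header, `-P` prints pipe-delimited fields):

```shell
#!/usr/bin/env bash
# Hypothetical helper: count jobs per state from the output of
#   sacct -X -n -P --format=JobID,JobName,State
# supplied on stdin (fields are separated by "|").
summarize_states() {
  awk -F'|' 'NF >= 3 { count[$3]++ } END { for (s in count) print s, count[s] }' | sort
}
```

Usage: `sacct -X -n -P --format=JobID,JobName,State | summarize_states`.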
Monitoring jobs
In addition to the sacct command shown above, there are several ways to monitor running jobs.
- Navigate to the folder being analyzed and view the files being produced:
$ cd /data/groups/wheelenj/mosquitoes/20240301-a01-MRB_20240301_144112.24568709
$ ls -lh
total 1.6G
drwxrwx--- 2 wheelenj SFU_Users 4.0K Mar 1 21:02 000000
-rw-r--r-- 1 wheelenj SFU_Users 268K Mar 1 17:07 000000.extra_data.json
-rw-r--r-- 1 wheelenj SFU_Users 1.6G Mar 1 17:07 000000.mp4
-rw-r--r-- 1 wheelenj SFU_Users 269K Mar 1 17:07 000000.npz
-rw-r--r-- 1 wheelenj SFU_Users 875 Mar 1 17:06 metadata.yaml
A new directory for each MP4 (e.g., 000000/) should have been created. This directory will contain the dynamically updated background (background.png), a background-subtracted frame saved every 450 frames (e.g., 000000_16650.png), and the generated data (000000.hdf5).
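Because a snapshot is written every 450 frames, the highest frame number among the PNGs gives a rough idea of how far tracking has progressed through a video. The following is a sketch that assumes snapshots are named `<video>_<frame>.png` as in the example above; `latest_snapshot_frame` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the highest frame number among snapshot PNGs
# named <video>_<frame>.png in the given directory (background.png is skipped
# because its name contains no underscore).
latest_snapshot_frame() {
  local dir="$1"
  find "$dir" -maxdepth 1 -type f -name '*_*.png' \
    | sed -E 's/.*_([0-9]+)\.png$/\1/' \
    | sort -n | tail -n 1
}
```

Usage, from inside the experiment directory: `latest_snapshot_frame 000000`.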
A file called nohup.out will include log messages from the Snakemake command. Here is a snippet of what it might look like:
Using profile /data/users/wheelenj/GitHub/invision-tools/slurm-profile/ for setting default command line arguments.
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided remote nodes: 500
Job stats:
job count
----- -------
all 1
link 1
total 2
Select jobs to execute...
Execute 1 jobs...
[Tue Nov 19 09:52:45 2024]
rule link:
input: 000000.hdf5, 000001.hdf5, 000002.hdf5, 000003.hdf5, 000004.hdf5, 000005.hdf5, 000006.hdf5
output: /data/groups/wheelenj/miracidia/20241017a01rvh_20241017_125910.24568709/20241017a01rvh_20241017_125910.24568709_tracks.pkl.gz, /data/groups/wheelenj/miracidia/20241017a01rvh_20241017_125910.24568709/20241017a01rvh_20241017_125910.24568709.pdf
jobid: 1
reason: Missing output files: /data/groups/wheelenj/miracidia/20241017a01rvh_20241017_125910.24568709/20241017a01rvh_20241017_125910.24568709.pdf, /data/groups/wheelenj/miracidia/20241017a01rvh_20241017_125910.24568709/20241017a01rvh_20241017_125910.24568709_tracks.pkl.gz
threads: 64
resources: mem_mb=250000, mem_mib=238419, disk_mb=8599, disk_mib=8201, tmpdir=<TBD>, partition=week
python ~/GitHub/invision-tools/utils/link_trajectories.py /data/groups/wheelenj/miracidia/20241017a01rvh_20241017_125910.24568709/ --hdf5
Submitted job 1 with external jobid '107349'.
This log will show the Slurm job ID for each Snakemake job that is submitted. The external jobid (here, 107349) should correspond to a JobID in the output from sacct.
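The job IDs can be pulled out of nohup.out rather than read by eye. This sketch relies only on the "Submitted job ... with external jobid '...'" line format shown in the snippet above; `external_jobids` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every Slurm job ID that Snakemake reports in its
# log (read from stdin), one per line.
external_jobids() {
  sed -n "s/.*external jobid '\([0-9][0-9]*\)'.*/\1/p"
}
```

Usage: `external_jobids < nohup.out` lists the IDs, and `sacct -j "$(external_jobids < nohup.out | paste -sd, -)"` shows the accounting records for just those jobs.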
Logs for each submitted job will be found in logs/. logs/track/ will include a log for each video tracked, and logs/link/ will include a log for the final linking job.
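With one log per video, a quick scan for failures can save opening each file. The sketch below lists log files that mention "error" or "Traceback" (case-insensitive). Those search terms are an assumption about what a failed job writes, not something the tracking scripts are documented to emit, and `logs_with_errors` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list log files under a logs directory (default: logs)
# that mention "error" or "traceback", ignoring case.
logs_with_errors() {
  local logdir="${1:-logs}"
  grep -r -l -i -E 'error|traceback' "$logdir" | sort
}
```

Usage, from inside the experiment directory: `logs_with_errors logs`. To follow the Snakemake log as it grows, `tail -f nohup.out` also works.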