Preparing your BOSE account
All analyses are performed in the Wheeler Lab's group folder on BOSE, `/data/groups/wheelenj`. The following steps only need to be followed the first time a user performs an analysis.

In all code blocks below, commands that should be run in your terminal follow a `$` sign; copy and paste the command (not including the `$`) into your terminal and press enter. Lines representing example output following a command do not start with a `$`.
- Clone the `invision-tools` GitHub repository into your home folder.
  - Log in to BOSE using the terminal: `ssh -p 50022 {UWEC username}@bose.hpc.uwec.edu`
    - Use your UWEC username and password.
    - Login will require two-factor authentication via Okta.
    - If off campus, first connect to the VPN (installation instructions here).
  - The repository should be cloned into your home directory (`/data/users/{username}`):
- Create the conda environment required for `invision-tools`. This environment will contain all the software and Python libraries required for running the tracking scripts.
  - This environment can be activated at any time with the command `conda activate invision-env`. It will be automatically activated during analysis by the Slurm scripts that manage each analysis job.
$ cd # navigate to your home dir
$ mkdir GitHub # make a new GitHub dir (if you haven't already)
$ cd GitHub # navigate into the new dir
$ git clone https://github.com/wheelerlab-uwec/invision-tools.git # clone the repo
$ cd # navigate to your home dir
$ module load python-libs # load the required modules from the BOSE shared software library
$ conda init bash # initialize conda
$ conda env create -f ~/GitHub/invision-tools/environment.yml # create the environment
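To confirm the environment was created, you can list your conda environments and try activating it. This is an optional sanity check; the `trackpy` import assumes `trackpy` is pinned in `environment.yml` (the tracking logs later in this guide suggest it is):

```bash
$ conda env list                 # invision-env should appear in the list
$ conda activate invision-env    # activate the environment manually
$ python -c "import trackpy"     # should exit silently if the install worked
```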
Perform tracking
Transfer files to be analyzed
- Log in to BOSE using the terminal: `ssh -p 50022 {UWEC username}@bose.hpc.uwec.edu`
  - Use your UWEC username and password.
  - Login will require two-factor authentication via Okta.
  - If off campus, first connect to the VPN (installation instructions here).
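If you log in often, you can optionally save the host, port, and username in your local `~/.ssh/config` so that a bare `ssh bose` works. This is a convenience sketch; substitute your own UWEC username:

```bash
$ cat >> ~/.ssh/config <<'EOF'
Host bose
    HostName bose.hpc.uwec.edu
    Port 50022
    User {UWEC username}
EOF
```

After adding this, `ssh bose` is equivalent to the full command above (Okta two-factor authentication still applies).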
- Transfer videos from the Wheeler Lab's server (https://data.wheelerlab.bio) to BOSE. Note: when accessing the server via a browser, you may get a warning about the connection not being private. Click Advanced and Proceed.
- On the server, all videos should be stored in the shared folder found at `/volume1/WheelerLab_data/InVision/`. Here's what it looks like using the File Station application (click the above link to access the sign-in page):
- On BOSE, all videos should be stored in the Wheeler Lab's group folder found at `/data/groups/wheelenj`. Mosquito videos are in the `mosquitoes/` subfolder and miracidia videos are in the `miracidia/` subfolder (here's what it looks like when using the OnDemand file system):
- Use `rsync` to transfer the files, for example (you will be prompted to enter the password for the server). The following shows the command run (on the first line) and the resulting output:
- General `rsync` syntax: `rsync username@server:/path/to/source /path/to/destination`
- Explanation of options:
  - `-P` - show progress
  - `-v` - be verbose
  - `-r` - sync recursively (i.e., copy everything in the source directory)
  - `-z` - compress while transferring
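Before running the real transfer, you can optionally add `-n` (`--dry-run`) to preview which files rsync would copy without transferring anything; drop the `-n` to perform the actual transfer:

```bash
$ rsync -Pvrzn wheelenj@data.wheelerlab.bio:/volume1/WheelerLab_data/InVision/20240301-a01-MRB_20240301_144112.24568709 /data/groups/wheelenj/mosquitoes
```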
$ rsync -Pvrz wheelenj@data.wheelerlab.bio:/volume1/WheelerLab_data/InVision/20240301-a01-MRB_20240301_144112.24568709 /data/groups/wheelenj/mosquitoes
Could not chdir to home directory /var/services/homes/wheelenj: No such file or directory
receiving incremental file list
20240301-a01-MRB_20240301_144112.24568709/
20240301-a01-MRB_20240301_144112.24568709/000000.extra_data.json
274,356 100% 261.65MB/s 0:00:00 (xfr#1, to-chk=4/6)
20240301-a01-MRB_20240301_144112.24568709/000000.hd5
1,118,607,211 100% 85.02MB/s 0:00:12 (xfr#2, to-chk=3/6)
20240301-a01-MRB_20240301_144112.24568709/000000.mp4
1,615,335,187 100% 24.12MB/s 0:01:03 (xfr#3, to-chk=2/6)
20240301-a01-MRB_20240301_144112.24568709/000000.npz
274,972 100% 262.23MB/s 0:00:00 (xfr#4, to-chk=1/6)
20240301-a01-MRB_20240301_144112.24568709/metadata.yaml
875 100% 213.62kB/s 0:00:00 (xfr#5, to-chk=0/6)
sent 123 bytes received 1,647,720,372 bytes 16,728,126.85 bytes/sec
total size is 2,734,492,601 speedup is 1.66
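Once the transfer finishes, a quick listing on BOSE confirms that the folder and its files arrived intact (sizes should match the rsync totals above):

```bash
$ ls -lh /data/groups/wheelenj/mosquitoes/20240301-a01-MRB_20240301_144112.24568709
```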
Prepare and run the Slurm scripts
- Copy the template Slurm scripts from the cloned `invision-tools` repository to the folder that was transferred:
$ cp ~/GitHub/invision-tools/slurm/batch_track.sh /data/groups/wheelenj/mosquitoes/20240301-a01-MRB_20240301_144112.24568709
$ cp ~/GitHub/invision-tools/slurm/batch_link.sh /data/groups/wheelenj/mosquitoes/20240301-a01-MRB_20240301_144112.24568709
These two scripts will run sequentially. The first one (`batch_track.sh`) will analyze all MP4 videos in the folder and generate the tracking data. The second one (`batch_link.sh`) will join the data from each video and link the tracked objects.
Tracking
Here is a simplified copy of the tracking script:
#!/bin/bash
#SBATCH --job-name="tracking" # What is your job called?
#SBATCH --output=%A_%a_output.txt # Output file - Use %j to inject job id, like output-%j.txt
#SBATCH --error=%A_%a_error.txt # Error file - Use %j to inject job id, like error-%j.txt
#SBATCH --array=0
#SBATCH --partition=highmemory # Which group of nodes do you want to use? Use "GPU" for graphics card support
#SBATCH --time=0-12:00:00 # What is the max time you expect the job to finish by? DD-HH:MM:SS
#SBATCH --mem=500G # How much memory do you need?
#SBATCH --ntasks-per-node=64 # How many CPU cores do you want to use per node (max 64)?
#SBATCH --nodes=1 # How many nodes do you need to use at once?
##SBATCH --gpus=1 # Do you require a graphics card? How many (up to 3 per node)? Remove the first "#" to activate.
#SBATCH --mail-type=END # What notifications should be emailed about? (Options: NONE, ALL, BEGIN, END, FAIL, QUEUE)
module load python-libs
conda init bash
conda activate invision-env
base_dir='/data/groups/wheelenj/mosquitoes/20240301-a01-MRB_20240301_144112.24568709'
export PYTHONUNBUFFERED=TRUE
python ~/GitHub/invision-tools/utils/tracking.py $base_dir/00000${SLURM_ARRAY_TASK_ID}.mp4 $base_dir/00000${SLURM_ARRAY_TASK_ID}
mv $base_dir/00000${SLURM_ARRAY_TASK_ID}/00000${SLURM_ARRAY_TASK_ID}.hdf5 $base_dir
Two elements of this script should be changed:
#SBATCH --array=0

- The InVision software will split long videos into multiple MP4 files; the first one will be `000000.mp4`, the second will be `000001.mp4`, and so on. The Slurm script can be modified to start a job for each MP4 file. If there are two MP4 files, change the line to `#SBATCH --array=0-1`; if there are three, change it to `#SBATCH --array=0-2`. Modify according to the number of MP4s in the folder; a quick way to count them is shown in the sketch after this list. If there's only one video (i.e., most mosquito videos), leave it as `#SBATCH --array=0`.
base_dir='/data/groups/wheelenj/mosquitoes/20240301-a01-MRB_20240301_144112.24568709'

- Change the value of the `base_dir` variable to the path of the folder that you created.
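As referenced above, counting the MP4 files tells you which array range to use. This is a convenience sketch, assuming the folder contains only the InVision-generated MP4s; a count of N means the line should read `#SBATCH --array=0-(N-1)` (or `#SBATCH --array=0` for a single file):

```bash
$ ls /data/groups/wheelenj/mosquitoes/20240301-a01-MRB_20240301_144112.24568709/*.mp4 | wc -l
1
```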
- Run the tracking script to start the job(s): `sbatch /data/groups/wheelenj/mosquitoes/20240301-a01-MRB_20240301_144112.24568709/batch_track.sh`
  - The path to the script should be adjusted accordingly.
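`sbatch` prints the ID of the submitted job; the number below matches the `sacct` example that follows, but yours will differ:

```bash
$ sbatch /data/groups/wheelenj/mosquitoes/20240301-a01-MRB_20240301_144112.24568709/batch_track.sh
Submitted batch job 71371
```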
- Verify the job(s) have been started by running the `sacct` command:
$ sacct
JobID JobName Partition Account AllocCPUS State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
71371_0 tracking highmemory wheelenj_+ 64 RUNNING 0:0
71371_0.bat+ batch wheelenj_+ 64 RUNNING 0:0
71371_0.ext+ extern wheelenj_+ 64 RUNNING 0:0
You should see a JobID for each job that you are running.
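You can also use the standard Slurm `squeue` command to see only your pending and running jobs:

```bash
$ squeue -u $USER
```

Each running array task appears as its own row (e.g., `71371_0`), with state `R` while running or `PD` while pending.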
Linking
- After tracking is complete, you will see an HDF5 file for each MP4. The tracked objects from each MP4 need to be linked, and the `batch_link.sh` script will be used to perform this function.
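Before submitting the linking job, it is worth confirming that one HDF5 file per MP4 now sits in the top level of the folder (using the example path from above):

```bash
$ ls /data/groups/wheelenj/mosquitoes/20240301-a01-MRB_20240301_144112.24568709/*.hdf5
```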
Here is a copy of the linking script:
#!/bin/bash
# ---- SLURM SETTINGS ---- #
# -- Job Specific -- #
#SBATCH --job-name="linking" # What is your job called?
#SBATCH --output=%A_%a_output.txt # Output file - Use %j to inject job id, like output-%j.txt
#SBATCH --error=%A_%a_error.txt # Error file - Use %j to inject job id, like error-%j.txt
#SBATCH --partition=week # Which group of nodes do you want to use? Use "GPU" for graphics card support
#SBATCH --time=0-2:00:00 # What is the max time you expect the job to finish by? DD-HH:MM:SS
# -- Resource Requirements -- #
#SBATCH --mem=16G # How much memory do you need?
#SBATCH --ntasks-per-node=64 # How many CPU cores do you want to use per node (max 64)?
#SBATCH --nodes=1 # How many nodes do you need to use at once?
##SBATCH --gpus=1 # Do you require a graphics card? How many (up to 3 per node)? Remove the first "#" to activate.
# -- Email Support -- #
#SBATCH --mail-type=END # What notifications should be emailed about? (Options: NONE, ALL, BEGIN, END, FAIL, QUEUE)
# ---- YOUR SCRIPT ---- #
module load python-libs
conda init bash
conda activate invision-env
base_dir='/data/groups/wheelenj/mosquitoes/20240301-a01-MRB_20240301_144112.24568709'
export PYTHONUNBUFFERED=TRUE
python ~/GitHub/invision-tools/utils/link_trajectories.py $base_dir --hdf5
One element of this script needs to be changed:

base_dir='/data/groups/wheelenj/mosquitoes/20240301-a01-MRB_20240301_144112.24568709'

- Change the value of the `base_dir` variable to the path of the folder that you created.

This script will run `link_trajectories.py`, which reads in all the HDF5 files, joins them, and links tracked objects. It will generate two files:

- A file ending in `_tracks.pkl.gz`, which is a compressed Python object that contains all the tracking data (this can be read into Python/R scripts for filtering, analyzing, and plotting).
- A PDF of the tracks. This PDF will not be used for any of our analyses, but it serves as a sanity check to confirm that the tracking algorithm worked. If you're tracking miracidia, you're likely to see many short, spurious tracks.
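After all tracking jobs have finished, submit the linking job with `sbatch`, just like the tracking script (adjust the path to your folder):

```bash
$ sbatch /data/groups/wheelenj/mosquitoes/20240301-a01-MRB_20240301_144112.24568709/batch_link.sh
```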
Monitoring jobs
You will get an email when each job is finished. In addition to the `sacct` command shown above, there are several ways to monitor running jobs.
- Navigate to the folder being analyzed and view the files being produced:
$ cd /data/groups/wheelenj/mosquitoes/20240301-a01-MRB_20240301_144112.24568709
$ ls -lh
total 1.6G
drwxrwx--- 2 wheelenj SFU_Users 4.0K Mar 1 21:02 000000
-rw-r--r-- 1 wheelenj SFU_Users 268K Mar 1 17:07 000000.extra_data.json
-rw-r--r-- 1 wheelenj SFU_Users 1.6G Mar 1 17:07 000000.mp4
-rw-r--r-- 1 wheelenj SFU_Users 269K Mar 1 17:07 000000.npz
-rw-rw---- 1 wheelenj SFU_Users 0 Mar 1 21:58 70861_0_error.txt
-rw-rw---- 1 wheelenj SFU_Users 893 Mar 1 21:58 70861_0_output.txt
-rw-r----- 1 wheelenj SFU_Users 1.4K Mar 1 20:39 batch_track.sh
-rw-r--r-- 1 wheelenj SFU_Users 875 Mar 1 17:06 metadata.yaml
A new directory for each MP4 (i.e., `000000/`) should have been created. This directory will contain the dynamically updated background (`background.png`), a background-subtracted frame for every 450 frames (e.g., `000000_16650.png`), and the generated data (`000000.hdf5`).
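To spot-check a job's progress, you can list the most recently written files in the per-MP4 directory; the exact file names depend on which frames have been processed so far:

```bash
$ ls -t 000000/ | head -n 5   # newest files first
```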
Output and error files will be created that have the same JobID as each job (e.g., `70861_0_error.txt` and `70861_0_output.txt`). Tail the output file to watch the logs written out in real time:
$ tail -f 70861_0_output.txt
trackpy.feature.batch: Frame 16933: 1 features
trackpy.feature.batch: Frame 16934: 1 features
trackpy.feature.batch: Frame 16935: 1 features
trackpy.feature.batch: Frame 16936: 1 features
trackpy.feature.batch: Frame 16937: 1 features
trackpy.feature.batch: Frame 16938: 1 features
trackpy.feature.batch: Frame 16939: 1 features
trackpy.feature.batch: Frame 16940: 1 features
trackpy.feature.batch: Frame 16941: 1 features
trackpy.feature.batch: Frame 16942: 1 features
trackpy.feature.batch: Frame 16943: 1 features
If the job quits due to an error, you can find the error messages in the error file. View the messages with the `less` or `cat` command: `less 70861_0_error.txt`. If there were no errors, the file will remain empty.
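If a job needs to be stopped (for example, you notice a wrong `base_dir` in the logs), cancel it with `scancel` using the JobID shown by `sacct`:

```bash
$ scancel 71371      # cancel the whole job (all array tasks)
$ scancel 71371_0    # or cancel a single array task
```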