Introduction
Globus Online is a web service for transferring data over the internet. It is particularly suited to large datasets in the GB to TB range; the largest dataset transferred with Globus so far is a 2.9 PB transfer from Argonne National Lab.
I just completed a workflow in which I used Globus to transfer a ~500 GB dataset from the DRACO HPC cluster at the MPCDF to a shared Samba drive at the MPI for Evolutionary Biology. Here’s a short writeup of the steps involved:
Upload from MPCDF to DataHub
Transfer the data from DRACO to DataHub (MPCDF’s Globus server endpoint). I did this with plain sftp, using a command like this:
$> echo 'put -r SOURCEDIR/' | sftp -b - datahub.mpcdf.mpg.de:/data/grotec/TARGETDIR/.
This command recursively copies the source directory into the target directory on DataHub. The -b - flag makes sftp read its batch commands from standard input, here supplied by echo.
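To spot-check that everything arrived, the same batch trick works for listing the target directory (same placeholder paths as above):
$> echo 'ls -l' | sftp -b - datahub.mpcdf.mpg.de:/data/grotec/TARGETDIR/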
Globus Online has a web portal at https://www.globus.org/. MPG members can use their LDAP/Active Directory credentials to log in. After the first login, the dashboard is empty, since no endpoints are connected yet.
After connecting the DataHub endpoint (datahub.mpcdf.mpg.de), I can list the uploaded files in the file browser.
Transfer from DataHub to local storage
To get the data from DataHub into local storage, I set up a personal Globus endpoint on wallace and initialized the transfer from DataHub to my personal endpoint through the Globus Online web interface.
Create and run a personal endpoint
Install the ‘globus-cli’ Python package
I did this through conda:
$> conda create -n globus
$> conda activate globus
$> conda install -c conda-forge globus-cli
In between, I had to install tcllib from source into the conda environment ($CONDA_PREFIX). Depending on your system, this may not be necessary.
$> wget https://core.tcl-lang.org/tcllib/uv/tcllib-1.19.tar.gz
$> tar xzvf tcllib-1.19.tar.gz
$> cd tcllib-1.19
$> ./configure --prefix=$CONDA_PREFIX
$> make
$> make install
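After these steps, two quick sanity checks: that tcllib landed under the environment prefix (the exact directory name depends on the tcllib version), and that the CLI runs:
$> ls $CONDA_PREFIX/lib | grep -i tcllib
$> globus version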
Set up the Globus environment
First, log in to the Globus network:
$> globus login
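globus login opens a browser (or prints a URL to visit) to authorize the CLI. To confirm the login worked:
$> globus whoami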
Then, create a new (personal) endpoint:
$> globus endpoint create --personal <ENDPOINT NAME>
Record the endpoint ID and setup key for later use.
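If the ID gets misplaced, the CLI can list your own endpoints again later:
$> globus endpoint search --filter-scope my-endpoints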
Find more detailed information at https://docs.globus.org/cli/.
Start the endpoint
To actually run the endpoint, another tool, globusconnect, is needed.
Instructions for installing and running it are at https://docs.globus.org/how-to/globus-connect-personal-linux/.
After downloading and unpacking globusconnect, cd into the globusconnect directory and run:
$> globusconnect -setup <ENDPOINT KEY>
Follow any prompts to authorize the connection.
To start the endpoint (on wallace, you may want to run this in a screen session):
$> globusconnect -start &
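To check whether the endpoint came up, the tool also accepts a status flag (at least the globusconnectpersonal versions documented at the link above do; adjust if your copy differs):
$> globusconnect -status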
Transfer files from DataHub to the personal endpoint
In the web interface, under “Collection”, select the source endpoint in the left panel and the target (destination) endpoint in the right panel. Navigate to the directories containing the source files (or directories) and the targets. Select the files/directories to transfer and click the “Start” button.
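The same transfer can also be submitted from the CLI instead of the web interface. A sketch, using the endpoint IDs recorded earlier plus placeholder paths and label:
$> globus transfer <SOURCE ENDPOINT ID>:/SOURCEDIR/ <PERSONAL ENDPOINT ID>:/TARGETDIR/ --recursive --label 'DataHub to wallace'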

Monitor progress of transfer
Click the “Activities” icon in the left toolbar to see a log and some statistics on submitted, active, completed, and failed transfers.
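The same information is available from the CLI, where every transfer shows up as a task:
$> globus task list
$> globus task show <TASK ID>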