The Python library numba provides (among other features)
just-in-time (JIT) compilation for Python. Numerical Python code can gain tremendous
speed-ups with hardly any changes to the code itself.
Adding a decorator to a function, as illustrated in this example:
from numba import njit

@njit
def accelerate_me():
    ...
can lead to run times comparable to C or Fortran code.
Let's work through a concrete example in which we add up random numbers in a for loop (not
advisable in plain Python, but useful here to demonstrate numba).
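A minimal sketch of what such a loop could look like (the function name and loop length are illustrative, not from the original post):

import numba
import numpy as np

@numba.njit
def add_random(n):
    # accumulate n random numbers in an explicit loop
    total = 0.0
    for _ in range(n):
        total += np.random.random()
    return total

add_random(10)          # first call triggers JIT compilation
add_random(10_000_000)  # later calls run the compiled machine code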
[Read More]
SLURM Array experiments
Posted on October 18, 2023
|
Background
The SLURM array directive provides a simple and intuitive syntax to submit a job with multiple similar tasks to the underlying HPC system.
To fully exploit this capability, a few things should be kept in mind.
In the following, starting from a simple serial job, we will explore how the total run time of our job behaves when the `array` option is applied.
job_001: A job with one task running on a single node on a single CPU
This is about the simplest job imaginable: we run the command `hostname` and let the process idle for 5 seconds (a sketch of such a batch script follows below). Before and after the `sleep` statement,
we print a timestamp (`date`). The time difference between the two timestamps should be 5 seconds.
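A minimal sketch of such a batch script (the job name is a placeholder; account and partition directives are omitted):

#!/bin/bash
#SBATCH --job-name=job_001   # placeholder name
#SBATCH --nodes=1            # a single node ...
#SBATCH --ntasks=1           # ... running a single task ...
#SBATCH --cpus-per-task=1    # ... on a single CPU

hostname  # report which node we run on
date      # timestamp before sleeping
sleep 5   # idle for 5 seconds
date      # timestamp after sleeping; the difference should be 5 seconds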
[Read More]
Experimenting with the Tripal v3 JSON-LD API
Posted on December 30, 2022
|
Introduction
In this post I'm experimenting with querying the JSON-LD (LD stands for Linked Data) API of my Tripal v3
genome database for Pseudomonas fluorescens SBW25. A little bit of background: Tripal is a content management
system (CMS) for genomic data, built on the Drupal CMS framework. It facilitates setting up customized genomic database
webservers to publish genome assemblies, annotation tracks, and experimental and modeling data for one or several organisms. The current stable release is Tripal v3. Tripal implements the Chado database schema for biological (genomic) databases.
Popular examples of Tripal instances are the Rice Genome Hub or the Kiwifruit Genome Database; more examples can be found
on the Tripal website. What sets Tripal apart from other webserver frameworks with similar objectives (e.g. machado, the UCSC Genome Browser,
or the Ensembl Genome Browser) is that all data in the Tripal database is accessible through a JSON API. Moreover, the JSON response adheres to the JSON-LD standard issued by the World Wide Web Consortium (W3C).
This feature makes Tripal (in principle) compatible with Semantic Web applications and eases connecting Tripal sites to Linked Data services such as
public SPARQL endpoints, or to one another.
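To give a flavor of what querying the API looks like, here is a minimal sketch using Python's requests library. The base URL is a placeholder; Tripal v3 typically exposes its content service under /web-services/content/v0.1/, but check your own instance:

import requests

# Placeholder base URL of a Tripal v3 content web service
BASE = "https://example.org/web-services/content/v0.1/"

# Ask explicitly for a JSON-LD response
resp = requests.get(BASE, headers={"Accept": "application/ld+json"})
resp.raise_for_status()
doc = resp.json()

# A JSON-LD document carries a @context that maps terms to ontology IRIs
print(doc.get("@context"))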
[Read More]
Conversion of multiple sequence alignment file formats
Posted on December 23, 2022
|
Working on a multiple sequence alignment project, I wanted to calculate a distance matrix from an MSA dataset generated by progressiveMauve. The MSA file is in xmfa format.
I could not find a tool that calculates a distance matrix directly from the xmfa file, so I searched for utilities to convert it to other MSA formats, such as phylip, maf, or msa.
This biostars forum entry always came out on top, but the links provided there are dead. Geneious Prime loads xmfa, but I could not figure out how to
write it back in a different format. The final solution: Biopython, sketched below.
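A minimal sketch of the conversion (filenames are placeholders; Biopython's AlignIO reads progressiveMauve's xmfa under the format name "mauve"):

from Bio import AlignIO

# Parse the locally collinear blocks from the progressiveMauve output
alignments = AlignIO.parse("alignment.xmfa", "mauve")

# Write all blocks back out in phylip format
# (note: phylip truncates sequence names to 10 characters)
AlignIO.write(alignments, "alignment.phy", "phylip")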
[Read More]
Querying metadata from omero in python
Posted on December 23, 2022
| Carsten
A colleague needed a table listing image ID, image filename, and number of ROIs for all images in a project.
Here’s the Python script I came up with:
Imports
import omero
from omero.gateway import BlitzGateway
import pandas as pd
import getpass
Connect to omero
conn = BlitzGateway('XXXXX', getpass.getpass(), host='omero.server.org')
conn.connect()
Retrieve datasets
This sets up an iterator over the needed datasets but does not load any data into memory (yet).
datasets = conn.getObjects(obj_type='Dataset', ids=[1, 2, 3, 4])
records = []
for dataset in datasets:
    ds_id = dataset.id
    # iterate over all images contained in this dataset
    imgs = conn.getObjects("Image", opts={'dataset': ds_id})
    for img in imgs:
        records.append((ds_id, img.id, img.getName(), img.getROICount()))
Write records into a DataFrame
df = pd.DataFrame.from_records(records, columns=['dataset id', 'image id', 'image filename', 'roi count'])
Write the data to a csv file.
df.to_csv('ds_id_fname_Nroi.csv', index=False)
Create a file annotation object and attach it to the omero project that encapsulates the datasets. The project has to be fetched first; the project ID used here is a placeholder.
project = conn.getObject("Project", 1)  # placeholder project ID
file_annotation = conn.createFileAnnfromLocalFile('ds_id_fname_Nroi.csv', mimetype="text/plain", desc="Table of images, their dataset id, image id, filename, and ROI count.")
project.linkAnnotation(file_annotation)
Running this produces a csv file on disk and attaches this file as a file annotation on the omero project.
[Read More]