GCB 2026 - Workshops

Organizers: Jan Grau (MLU Halle); Stefan Kurtz (Universität Hamburg); Kay Nieselt (Eberhard Karls Universität Tübingen); Ralf Zimmer (Ludwig-Maximilians-Universität München)

Participants: max. 30

Target audience / Prerequisites: Persons involved in bioinformatics education on the study program development and/or implementation level, as well as student representatives

Description:

This workshop brings together people involved in bioinformatics education. Currently, bioinformatik.de lists 38 B.Sc. programs with prominent bioinformatics content (14 with bioinformatics as the major topic) and 35 M.Sc. programs (17 with bioinformatics as the major topic) in Germany. These programs put varying emphasis on particular bioinformatics topics and skills, and have different admission requirements. In previous workshops during GCB 2023, GCB 2024 and GCB 2025, we compiled an overview of bioinformatics B.Sc. and M.Sc. programs in Germany and discussed essential, dispensable and desirable topics in bioinformatics B.Sc. programs, which led to a draft curriculum for bioinformatics B.Sc. programs.

This year, we would like to follow up on the results of the previous workshops to obtain a broader picture of bioinformatics M.Sc. programs in Germany, especially with regard to admission regulations, international programs and, as an overarching topic, the effects and consequences of AI on bioinformatics master education: AI in bioinformatics research, necessary AI skills, mandatory AI courses, AI in teaching, impact of AI on examinations, and AI usage among students. Specifically, we would like to discuss:

How does the use of AI among students affect teaching concepts and the skills being developed, especially in relation to seminar theses and software development? Do we need modifications of the bioinformatics master curricula, module structures or examinations?
What are typical admission requirements for an M.Sc. in bioinformatics? May only students with a B.Sc. in bioinformatics (or similar) apply or also students with a B.Sc. in (basic) biology and/or (basic) computer science? Is a minimum B.Sc. grade required?
How international should M.Sc. programs in bioinformatics be designed? Is German as the main teaching language still acceptable or should we switch to English as the main language? What is the consequence of an international program (e.g., with regard to the number of applications)?
How diverse should M.Sc. programs be? Should there be a catalog of mandatory courses (in addition to alignment courses for other B.Sc. disciplines) or should everything be elective?

Provisional schedule:

12:30 – 13:00: Summary of the results of previous workshops, open questions (Jan Grau, Stefan Kurtz, Kay Nieselt, Ralf Zimmer)
13:00 – 14:30: Joint discussion on AI in bioinformatics programs
14:30 – 15:00: Coffee break
15:00 – 16:40: Joint discussion on M.Sc. bioinformatics programs

WS02) Computational Pangenomics

Organizers: Kamil Hepak, Luca Parmigiani, Tizian Schulz, Jens Stoye, Roland Wittler (Bielefeld University)

Participants: max. 30

Target audience / Prerequisites: Participants should have a basic understanding of Linux operating systems to participate in hands-on sessions of the workshop and must bring their own laptop.

Description:

Computational pangenomics deals with the joint analysis of all genomic sequences of a species. Continuing advances in DNA sequencing technologies make more and more genomic sequences available for many species, which makes pangenomic studies increasingly attractive. Pangenomic approaches have already been applied successfully to various tasks in many research areas.

The focus of this workshop is to give participants an overview and understanding of commonly used pangenomics tools. Besides an introduction to the motivation and theory behind questions from the field of pangenomics, we will look at specific tools (such as Pangrowth, Corer, PLAST, SANS, and mice) and let the participants explore their usage in hands-on sessions.

Provisional Schedule:

Introduction to computational pangenomics
Investigating a pangenome’s diversity with Pangrowth and hands-on
Pangenomic core detection with Corer and hands-on
Querying a graphical pangenome with PLAST and hands-on
Phylogenomic reconstruction with SANS and hands-on
Computing synteny blocks in a pangenome with mice and hands-on

WS03) Pangenome Mapping

Organizers: Peter Julius Heringer, Daniel Dörr (HHU Düsseldorf)

Participants: max. 20

Target audience / Prerequisites: Participants should have some experience with bioinformatic analysis and using the command line. They should bring their own laptops.

Description:

Genomes contain many more variants than can be represented by a single reference sequence. This might be due to overlapping variants or multi-allelic loci. Still, many bioinformatics applications start out by sequencing (either DNA or RNA) and aligning the resulting reads to a linear reference sequence, which biases the data towards the reference used. Pangenome references can overcome these issues by containing many genomes and variants. They are therefore more likely to include a sequence that is similar to the reads aligned to them. However, because pangenomes are comparatively new and their data structures are larger and more complex, the tool landscape for working with pangenomes is much less polished than that of comparable tools for a linear reference. This workshop aims to teach participants how to make sense of the current generation of pangenome tools and how to work with pangenomes as an alternative means to map sequences.

In the workshop, participants will learn how pangenomes work, what formats there are and which are useful in the process of mapping sequences. The participants will learn to work with vg giraffe and GraphAligner for mapping reads to graphs. After that, calling variants (vg call, povu, pantree) will be discussed, as well as genotyping of known variants using PanGenie. The participants will then analyze the resulting VCFs. Additionally, it will be demonstrated how to use vg rna/rpvg for aligning RNA sequencing data to a pangenome reference. At the end, graph construction with Minigraph-Cactus/PGGB will be discussed.

Provisional Schedule:

Introduction to pangenome graphs
Mapping reads to a pangenome
Calling SNPs and structural variants
Aligning RNA-seq reads
Constructing a pangenome graph
Hands-on session on mapping

WS04) Structure of Biological Molecules

Organizers: Olga Kalinina, Volkhard Helms, Roman Joeres (Saarbrücken)

Participants: max. 60

Description:

A decade after AlphaGo’s landmark success, AlphaFold has transformed structural bioinformatics from a search for static coordinates of protein atoms to a foundation for generative design. In some way, one could say that the "folding problem" has been solved. Others may say it has evolved. As the field moves beyond predictions of three-dimensional structures for isolated protein chains, the frontier now lies in capturing the dynamic, multi-component reality of the cellular environment. This workshop brings together ML researchers and structural biologists to bridge the gap between static snapshots and functional biological ensembles. Our focus areas include, but are not limited to:

● Conformational landscapes: Modeling protein dynamics and Boltzmann distributions.

● Genotype-to-phenotype relationships: Predicting the effect of mutations on protein structure, dynamics, and function.

● The non-folding frontier: Predicting intrinsically disordered regions (IDRs) and transient states.

● Inter-molecular logic: Protein-RNA interactions, ligand binding, and complex assemblies.

● Protein generation and design: Models for de novo protein generation and targeted design.

Provisional schedule:

12:30 Opening Remarks
12:40 Invited Talk I
13:20 Contributed Talk I
13:40 Contributed Talk II
14:00 Lightning Poster Round
14:00 Poster Session and Coffee Break
15:00 Invited Talk II
15:40 Contributed Talk III
16:00 Contributed Talk IV
16:20 Panel Discussion or more talks
16:40 Closing Remarks

WS05) ProteinsPlus: A Deep Dive into Protein Structures and Protein-Ligand Complexes

Organizers: Christiane Ehrt, Matthias Rarey (Universität Hamburg)

Participants: max. 30

Target audience / Prerequisites: Participants should bring their laptops.

Description:

This workshop introduces tools for analyzing and mining protein structures. It is based on the freely available ProteinsPlus web server (https://proteins.plus) and encompasses an introduction session and comprehensive hands-on exercises. The introduction will cover the basics and major challenges of structure-based design for pharmaceutically relevant proteins. As the web server enables users to work with protein structures from the Protein Data Bank (PDB), the AlphaFold Database, and individual protein structures in PDB format, the participants can then use the hands-on sessions to learn how to apply the introduced methods to a target of interest.

The hands-on sessions focus on analyzing and predicting protein-ligand complexes. They cover crucial steps of structure-based design, starting with a quality analysis of experimentally determined protein-ligand complexes with EDIAscorer. Next, participants will learn to analyze and depict protein-ligand complexes in 3D and 2D with PoseEdit. The final part of the first session is dedicated to predicting and characterizing pockets for structures without known binding sites, such as AlphaFold models, with DoGSite3. In the second part, we will focus on preparing binding sites and pocket ensembles with SIENA and Protoss. Finally, the participants will apply on-the-fly molecular docking of individually designed small molecules in their binding site of interest with JAMDA.

In the hands-on sessions, users can follow the instructions for a pharmaceutically relevant target structure. Alternatively, they could also decide to perform all tasks on a protein kinase, a phosphodiesterase, or a G protein-coupled receptor to cross-check the outcome of their analyses with the provided solutions to the exercises. The participants will learn to perform elaborate structure analyses, predict potential binding sites, explore protein-ligand interactions, and predict protein-ligand complexes. To this end, they need a basic knowledge of protein structures to follow the instructions and complete the workshop exercises.

Provisional schedule:

12:30: Part 1 - Selecting and Preparing Binding Sites for Structure-Based Design

1.1) Binding Site Quality Assessment
1.2) Protein-Ligand Complex Visualization
1.3) Predicting Binding Sites for Protein Structures

14:30: Coffee break

15:30: Part 2 - Structure-Based Design

2.1) Preparing Binding Site Ensembles to Cope with Protein Flexibility
2.2) Assigning Protonation States and Hydrogen Atoms to Protein Structures
2.3) Predicting Protein-Ligand Complexes by Molecular Docking

WS06) AI-assisted Data Visualization in Bioinformatics

Organizers: Mariana Galvão Ferrarini (MPI Jena), Blerina Sinaimeri (Luiss University, Rome)

Participants: max. 30

Target audience / Prerequisites: Bioinformaticians and computational biologists with basic familiarity in data analysis workflows (e.g., Python or R). No advanced programming skills required, but some experience with scripting will facilitate participation in hands-on sessions.

Description:

Bioinformatics increasingly deals with large and complex datasets, including multi-omics measurements, single-cell and spatial transcriptomics data, and large biological networks. While analytical methods have advanced rapidly, interpreting and communicating these data remains a challenge. At the same time, new AI-based tools are beginning to change how researchers explore and visualize biological data, from using large language models (LLMs) to generate and iterate on visualization code, to AI-assisted pipelines for exploratory visualization, and interfaces that help navigate high-dimensional datasets.

The goal of this workshop is to explore how these emerging approaches can support the exploration and interpretation of complex biological data. We aim to bring together researchers working on data visualization, biological networks, and AI-assisted analytical tools to exchange ideas and discuss practical applications in bioinformatics. Particular attention will be given to limitations, including reproducibility, interpretability, and potential biases introduced by AI-assisted approaches.

The workshop combines short demonstrations of tools, guided hands-on sessions and open discussions. Topics include using LLMs to generate and refine visualization code for multi-omics and network data, interactive exploration of biological networks, automated figure generation for scientific communication, and the integration of AI-based tools into existing bioinformatics workflows. Hands-on sessions will be conducted using reproducible environments (e.g., Jupyter notebooks or RStudio) with pre-configured examples. Curated, preprocessed example datasets provided by the organizers will be used during the hands-on sessions to ensure a consistent and guided experience. Participants wishing to contribute a dataset for discussion are invited to submit it in advance by email to the organizers.

By the end of the workshop, participants will be able to (i) use AI-assisted tools to iteratively generate and refine visualization code, (ii) evaluate the strengths and limitations of different AI-based approaches for exploratory analysis, and (iii) integrate these tools into their own bioinformatics workflows in a reproducible and controlled manner.

Provisional schedule:

12:30 - 13:00: Introduction; overview of AI-assisted visualization tools and concepts
13:00 - 14:00: Structured tool demonstrations (with live examples) - LLM-assisted generation and refinement of publication-quality figures using R and Python visualization libraries
14:00 - 14:30: Hands-on session 1 - guided, step-by-step exploration of curated examples
14:30 - 15:00: Coffee break
15:00 - 16:30: Hands-on session 2 - independent iteration on examples, with support
16:30 - 17:00: Wrap-up - evaluation of approaches, limitations, and best practices

WS07) Trustworthiness for machine learning

Organizers: Kerstin Lenhof (Göttingen); Lisa-Marie Rolli, Andrea Volkamer (Saarbrücken)

Participants: max. 30

Target audience / Prerequisites: Anyone interested in the topic is welcome. The workshop is primarily aimed at PhD students using machine learning (ML) in bioinformatics. Basic Python and ML familiarity is helpful for the practical sessions.

Description:

Given the current AI adoption rate across science, finding ways to foster justified trust in AI and ML models becomes increasingly essential, especially since these models permeate sensitive domains such as drug design and medical decision-making.

This workshop explores how technical design choices in ML contribute to trustworthiness and how trust in ML models can be strengthened in practice. The workshop includes hands-on exercises for inductive and transductive conformal prediction to establish reliability.
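As a rough illustration of what inductive (split) conformal prediction involves, the following sketch builds prediction intervals for a regression model from absolute residuals on a held-out calibration set. It is a minimal example under stated assumptions, not the workshop's material: the function names (`conformal_quantile`, `calibrate`, `predict_interval`) and the stand-in model passed as `predict` are hypothetical.

```python
import math


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of the calibration scores."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-based rank into the sorted scores
    if rank > n:
        # Too few calibration points to support this alpha: interval is unbounded.
        return math.inf
    return sorted(scores)[rank - 1]


def calibrate(predict, calib_x, calib_y, alpha=0.1):
    """Compute the interval half-width from absolute residuals on calibration data."""
    scores = [abs(y - predict(x)) for x, y in zip(calib_x, calib_y)]
    return conformal_quantile(scores, alpha)


def predict_interval(predict, x, half_width):
    """Symmetric interval around the point prediction of an already fitted model."""
    center = predict(x)
    return center - half_width, center + half_width
```

The model is treated as a black box that was fitted on separate training data; only the calibration residuals determine the interval width. Under the usual exchangeability assumption, such intervals cover the true value with probability at least 1 - alpha.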

Provisional schedule:

12:30 - 13:00: AI literacy group work: Assessment of trustworthiness of existing ML approaches in computational biology/bioinformatics/medicine. Interactive group work; participants have time to go through high-impact ML publications and discuss the trustworthiness of the presented approach.

13:00 - 14:00: Lecture: An introduction to the key concepts associated with the trustworthiness of ML models, based on: "The trustworthiness landscape in machine learning: a conceptual guide with applications in medicine", Lenhof et al. 2025, doi: 10.5281/zenodo.17591544

14:00 - 14:30: Re-evaluation of the group work to highlight the gained knowledge

14:30 - 15:00: Coffee Break

15:00 - 16:00: Lecture: Reliability of ML models with a focus on conformal prediction. Inductive and transductive CP + applications in bioinformatics

16:00 - 17:00: Two-part hands-on session: (1) Practical exercises on inductive and transductive conformal prediction; (2) Examples and implementation guidelines focusing on medical/biological applications

WS08) Turning Python scripts built with Sugar into web apps

Organizers: Tom Eulenfeld, Maria Schreiber (U Jena)

Participants: max. 30

Target audience / Prerequisites: Students and researchers interested in sequence and annotation handling and/or in developing bioinformatics web services. Participants will learn to prototype bioinformatics scripts and deploy them as interactive web applications. Prerequisites: Basic Python knowledge.

Description:

Many bioinformatics tools remain inaccessible to experimental biologists due to their command-line interfaces. This workshop addresses that gap by demonstrating how Python scripts can be transformed into user-friendly web applications. Participants will use the sugar package, a lightweight Python library for sequence and annotation handling, together with NiceGUI to build interactive frontends. Through hands-on exercises, attendees will develop small analysis scripts and convert them into interactive web applications.

Provisional schedule:

Introduction to sugar library and NiceGUI
Time to finish installation of conda environment
Hands-on: Homology search, multiple sequence alignment, first web application
Coffee break
Introduce concepts for second application
Hands-on: Microsynteny analysis, second web application
Hands-on: Find open reading frames, final web application; alternatively, participants can adapt their own project into a web application

WS09) BioC++: Building Petabase-Scale Search with HIBF and Modern C++

Organizers: Enrico Seiler, Jonas Schulte-Mattler, Svenja Mehringer

Participants: max. 30

Target audience / Prerequisites: This workshop is best suited for computational biologists and bioinformaticians with a research focus on sequence analysis (e.g., genomics, metagenomics, read alignment). Fundamental knowledge of sequencing experiments and the data involved is required. We expect attendees to have intermediate programming experience in a high-level language, e.g., Python, Java, or Rust. While we will use modern C++ (C++20/23) via SeqAn, the core architectural components and boilerplate will be pre-written. Participants will primarily focus on high-level API integration and algorithmic concepts rather than template metaprogramming or memory management. Therefore, basic C++ knowledge is helpful but not mandatory to successfully complete the course. Attendees should bring their laptops. Software can be installed beforehand, but we will provide pre-configured environments on the de.NBI Cloud for this workshop. Participants will connect to a virtual machine equipped with the software, ensuring a standardized setup without local build system friction.

Description:

A continued decrease in sequencing costs has fueled an exponential increase in available sequencing data, with public databases like the European Nucleotide Archive (ENA) and Sequence Read Archive (SRA) reaching the order of petabases. This has been the incentive to develop more scalable tools for common bioinformatics tasks. One such task is the approximate search (Approximate Membership Query, AMQ) for short sequence patterns like genes or reads in reference data sets. Lately, various indexing data structures have been proposed for searching large sequencing databases. The state-of-the-art index, the Hierarchical Interleaved Bloom Filter (HIBF), was the first in its class to index one million samples. In this workshop, we will explore the concept behind approximate membership queries and apply the HIBF to a read mapper application.
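To make the idea of an approximate membership query concrete, here is a minimal sketch of a plain Bloom filter over k-mers. It is a toy illustration only: it is not the HIBF and not the SeqAn API, and all names (`BloomFilter`, `kmers`, `build_filter`, `fraction_found`) as well as the default sizes are assumptions chosen for readability.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item: str):
        # Derive several bit positions by salting one hash function with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: str) -> bool:
        # "Possibly present" only if every derived position is set.
        return all(self.bits[pos] for pos in self._positions(item))


def kmers(seq: str, k: int):
    """Yield all overlapping substrings of length k."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_filter(reference: str, k: int = 5) -> BloomFilter:
    """Insert every k-mer of the reference into a fresh filter."""
    bloom = BloomFilter()
    for kmer in kmers(reference, k):
        bloom.add(kmer)
    return bloom


def fraction_found(bloom: BloomFilter, query: str, k: int = 5) -> float:
    """Fraction of the query's k-mers that the filter reports as present."""
    hits = [kmer in bloom for kmer in kmers(query, k)]
    return sum(hits) / len(hits) if hits else 0.0
```

A query whose k-mers mostly hit the filter is a candidate match for the reference. Because a Bloom filter can report false positives, such candidates would still need verification, for example by an exact alignment step.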

By the end of the course, learners will be able to describe the general concept of AMQ, explain key applications of AMQ, apply AMQ to an example application, and compare the performance of the example application with and without AMQ.

Additionally, learners will gain hands-on experience with modern C++ API integration.

Provisional schedule:

Introduction to AMQ and HIBF (45 min)
Environment Setup & Modern C++ Primer (30 min)
Building the AMQ Prototype (75 min)
Building the Mapper Prototype (45 min)
Benchmarking and Discussion (45 min)

WS10) Metagenomics with Cloud Computing and SimpleVM

Organizers: Peter Belman, Viktor Rudko, David Weinholz, Nils Hoffmann (FZ Jülich; de.NBI)

Participants: max. 30

Target audience / Prerequisites: This workshop is tailored for researchers and educators seeking to optimize their computational tasks. Whether you’re a seasoned professional or just starting out, SimpleVM’s intuitive platform and robust features will empower you to achieve more in less time. The only requirements are a basic knowledge of the Linux command line and a de.NBI Cloud account (https://cloud.denbi.de/wiki/registration/#denbi-cloud-access-registration-guide).

Description:

SimpleVM is a self-service platform within the OpenStack-based de.NBI Cloud, designed to simplify access to computational resources for life sciences research. It offers a variety of computational options, including basic data processing, GPU-accelerated machine learning, and cluster computing, all secured by an intrusion prevention system (IPS). SimpleVM also provides pre-configured Virtual Research Environments (VREs) accessible via web browsers or SSH, encompassing integrated development environments (IDEs) and data notebooks.

In this workshop, participants will delve into a metagenomics use case, where they will learn how to scale their analysis using SimpleVM. The workshop is designed to provide both theoretical knowledge and hands-on experience with cloud computing and the advanced features of SimpleVM. Participants will use VREs, SimpleVM Cluster and S3 to search for a genome of interest in the metagenome SRA mirror of the de.NBI Cloud site Bielefeld.

Provisional schedule:

Introduction to SimpleVM and Cloud Computing
Hands-On Session: Starting Your First VM
Metagenomics Use Case: Practical Application of SimpleVM
Hands-On Session: Installing and Testing Tools
S3 Object Storage: Efficient Data Management in SimpleVM
Hands-On Session: Searching in SRA Mirror and Scaling Analysis
Advanced Features: VRE and Cluster Modes in SimpleVM
Hands-On Session: Visualizing Results with VRE and Scaling Analyses Further

WS11) Efficient Bioinformatics on HPC using Software Containers

Organizers: Natalie Breidenbach (NHR TU Dresden), Andreas Henkel (NHR U Mainz), Alexander Wilhelmi (NHR U Frankfurt)

Participants: max. 20

Target audience / Prerequisites: The tutorial is designed for early professionals, at the beginner to intermediate level across all research areas. Participants should have basic Linux and SSH knowledge, including familiarity with using the command line. A foundational understanding of HPC usage will be highly beneficial. For the hands-on activities, you will need a laptop with internet access.

Description:

This introductory tutorial equips bioinformatics researchers with the foundational skills to access and efficiently use national HPC resources (NHR) through software containers. Participants will learn how to apply for compute time on NHR systems, understand allocation policies, and submit jobs via Slurm. The core of the workshop introduces Apptainer, a container platform designed for secure, rootless execution on HPC, covering how to run, customize, and build containers for reproducible analyses. Hands-on exercises guide attendees through pulling public images and binding data, all within the context of real-world bioinformatics workflows. Emphasis is placed on reproducibility and re-usability. Examples cover container integration with workflow managers (Snakemake/Nextflow) and lightweight LLMs. Additionally, HPC access via the Open OnDemand web portal will be introduced, lowering the entry barrier to using HPC resources. By the end, participants will be able to apply for NHR resources, launch a containerized analysis, and submit a reproducible job, empowering them to scale their research on shared HPC infrastructure. Prior container experience may help but is not required.

Participants will leave with

• a personal Apptainer image ready for their own analyses,

• a complete NHR job script that embeds the container,

• a clear roadmap for scaling workflows, adding AI components, and sharing containers with collaborators.

Provisional schedule: