
Ideas


General setup

Google Doc for discussions

In general, we aim for a one-day, in-person workshop at the MPI Berlin, but preparations should make a transition to a virtual workshop as easy as possible. To ensure inclusivity and accessibility (e.g., for attendees with caring responsibilities, members of risk groups, or anyone with other reasons to stay physically distant), we should provide virtual means of attending.

  • Proposed structure: four 1.5-hour sessions and a 30-minute wrap-up
| Time | Session name | Contents |
|---|---|---|
| 08.30-09.00 | Arrival 👋 | Have a coffee and chat! |
| 09.00-10.30 | First morning session | Concepts, Motivation |
| 10.30-10.45 | Coffee break ☕️ | |
| 10.45-12.15 | Second morning session | Datalad concepts and principles |
| 12.15-13.15 | Lunch break 🍟 | |
| 13.15-14.45 | First afternoon session | Reproducible Science |
| 14.45-15.00 | Coffee break ☕️ | |
| 15.00-16.30 | Second afternoon session | Data publication and collaboration |
| 16.30-17.00 | Wrap-up, remaining Qs | Outlook |
___

Contents

1 Concepts, Motivation
  • Motivation/Benefits for data management, theoretical introduction into version control, FAIRness, provenance.
2 Datalad concepts and principles
  • Basics of local data/code version control + Hands-on: tasks to exercise the basic building blocks (dataset, datalad create, datalad save, datalad clone, datalad push, datalad update)

  • Basics of modular data management for reproducible science (YODA principles)

  • Demo: Reproducible paper (e.g., https://github.com/psychoinformatics-de/paper-remodnav/)
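The building blocks listed for this session could be exercised with a short command sequence like the one below. It is a minimal sketch, not the final hands-on material: the dataset names (`my-dataset`, `my-clone`), the file name, the throwaway working directory, and the fallback Git identity are all placeholders. `datalad update --how merge` assumes a reasonably recent DataLad release. `datalad push` needs a configured sibling, so it fits better into the publication session.

```shell
#!/usr/bin/env bash
# Hands-on sketch: create, save, clone, update. All names are placeholders.
set -eu

# Stop quietly if DataLad is not available on this machine.
if ! command -v datalad >/dev/null 2>&1; then
    echo "DataLad is not installed; nothing to run." >&2
    exit 0
fi

# Fallback Git identity for this throwaway demo (assumption; applies to this script only).
export GIT_AUTHOR_NAME="${GIT_AUTHOR_NAME:-Workshop Attendee}"
export GIT_AUTHOR_EMAIL="${GIT_AUTHOR_EMAIL:-attendee@example.com}"
export GIT_COMMITTER_NAME="${GIT_COMMITTER_NAME:-$GIT_AUTHOR_NAME}"
export GIT_COMMITTER_EMAIL="${GIT_COMMITTER_EMAIL:-$GIT_AUTHOR_EMAIL}"

workdir="$(mktemp -d)"
cd "$workdir"

# Create a dataset; the text2git configuration keeps text files in Git.
datalad create -c text2git my-dataset

# Add a file and record it in the dataset history.
echo "first note" > my-dataset/notes.txt
datalad save -d my-dataset -m "Add first note"

# Clone the dataset, as a collaborator would.
datalad clone my-dataset my-clone

# The original dataset evolves ...
echo "second note" >> my-dataset/notes.txt
datalad save -d my-dataset -m "Add second note"

# ... and the clone pulls in the new state.
datalad update -d my-clone --how merge
cat my-clone/notes.txt
```

Running everything in a temporary directory keeps attendees' existing data untouched and makes it easy to start over.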
3 Reproducible Science
  • Basics of provenance capture and reproducible execution with and without software containers (datalad run, datalad containers-run)
  • Hands-on: sketch of a reproducible paper
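Provenance capture with `datalad run` could be introduced with a toy analysis like the one below. This is a sketch under assumptions: the dataset name `paper-sketch`, the `inputs/` and `results/` directories, and the sorting "analysis" are made up for illustration, and the fallback Git identity is a placeholder. `datalad containers-run` follows the same pattern but additionally needs the datalad-container extension and a registered container image, so it is not shown here.

```shell
#!/usr/bin/env bash
# Sketch: record a command and its inputs/outputs with datalad run.
set -eu

# Stop quietly if DataLad is not available on this machine.
if ! command -v datalad >/dev/null 2>&1; then
    echo "DataLad is not installed; nothing to run." >&2
    exit 0
fi

# Fallback Git identity for this throwaway demo (assumption; applies to this script only).
export GIT_AUTHOR_NAME="${GIT_AUTHOR_NAME:-Workshop Attendee}"
export GIT_AUTHOR_EMAIL="${GIT_AUTHOR_EMAIL:-attendee@example.com}"
export GIT_COMMITTER_NAME="${GIT_COMMITTER_NAME:-$GIT_AUTHOR_NAME}"
export GIT_COMMITTER_EMAIL="${GIT_COMMITTER_EMAIL:-$GIT_AUTHOR_EMAIL}"

workdir="$(mktemp -d)"
cd "$workdir"

# Create a dataset with the YODA layout (code/, README, ...).
datalad create -c yoda paper-sketch
cd paper-sketch

# Hypothetical raw data for the toy analysis.
mkdir -p inputs
printf '3\n1\n2\n' > inputs/numbers.txt
datalad save -m "Add raw input data"

# Run the analysis; DataLad records the command, its inputs, and its outputs.
datalad run -m "Sort the raw numbers" \
    --input inputs/numbers.txt \
    --output results/sorted.txt \
    "mkdir -p results && sort -n inputs/numbers.txt > results/sorted.txt"

# The run record is stored in the commit message of the resulting commit.
git log -1 --format=%B

# Re-execute the recorded command to check that the result can be reproduced.
datalad rerun
```

Declaring inputs and outputs explicitly lets DataLad retrieve the needed files before the command runs and unlock existing outputs so they can be overwritten.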
4 Data publication and collaboration
  • Basic principles of shared datasets and collaborative workflows
  • Hands-on: Syncing between a local computer and a shared compute cluster
  • Hands-on: dataset publication to GitHub, GitLab, GIN, or a comparable hosting service
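Syncing between a local computer and a cluster could be sketched as below. Assumptions are stated loudly: the sibling name `cluster` is arbitrary, the dataset and file names are placeholders, and a local directory stands in for the cluster so that the sketch runs without SSH access. In a real setup the target would be an SSH location such as `user@cluster.example.org:/path/to/dataset` (a hypothetical address). Publication to hosting services works along the same lines via commands such as `datalad create-sibling-github`, `datalad create-sibling-gitlab`, or `datalad create-sibling-gin`, which need account credentials and are therefore not shown.

```shell
#!/usr/bin/env bash
# Sketch: publish a dataset to a second location and sync back.
set -eu

# Stop quietly if DataLad is not available on this machine.
if ! command -v datalad >/dev/null 2>&1; then
    echo "DataLad is not installed; nothing to run." >&2
    exit 0
fi

# Fallback Git identity for this throwaway demo (assumption; applies to this script only).
export GIT_AUTHOR_NAME="${GIT_AUTHOR_NAME:-Workshop Attendee}"
export GIT_AUTHOR_EMAIL="${GIT_AUTHOR_EMAIL:-attendee@example.com}"
export GIT_COMMITTER_NAME="${GIT_COMMITTER_NAME:-$GIT_AUTHOR_NAME}"
export GIT_COMMITTER_EMAIL="${GIT_COMMITTER_EMAIL:-$GIT_AUTHOR_EMAIL}"

workdir="$(mktemp -d)"
cd "$workdir"

# A small local dataset with one saved file.
datalad create -c text2git analysis
echo "draft" > analysis/README.txt
datalad save -d analysis -m "Add draft"

# Register a sibling named "cluster". The local path is a stand-in;
# replace it with an SSH location (user@host:/path) for a real cluster.
datalad create-sibling -d analysis -s cluster "$workdir/cluster-copy"

# Publish the current state of the dataset to the sibling.
datalad push -d analysis --to cluster

# Later: fetch and merge whatever was saved on the cluster side in the meantime.
datalad update -d analysis -s cluster --how merge
```

Using a local directory as the "cluster" lets attendees practice the workflow before cluster accounts are available.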
5 Wrap-up/Outlook
  • Resources
  • Outlook into what is possible: DICOM to BIDS conversion, Metadata, ...?
  • Questions, maybe discuss use cases

To do and to plan

Infrastructure
  • Private computers (a given if the workshop is virtual); (guest or existing?) user accounts on a cluster?
  • If in-person: hygiene standards to comply with
  • Recording setup
  • Recording publication: platforms and time frame (live streaming, publishing post-workshop, ...)
  • Idea: Migrate this repository to the handbook repository, keep mkdocs, and create a GitHub page under the handbook namespace -> easier to find, more lasting resource
  • Virtual attendance procedures: Client (Zoom/Jitsi/Hangouts), procedures for participation (how/when to raise questions, interactivity?)
  • Accessibility: IMO, it can be as open as possible; at the very least, recordings and materials should be openly shared. Unlimited participation may be too difficult to manage if the contents are too interactive
Not course-specific TODOs
  • Code of conduct
  • Post-workshop communication: Means of providing feedback
  • Pre-workshop communication: Means of suggesting topics/wishes upfront