Cyberinfrastructure for FAIR Science (CI4Fair) Workshop
About
Tackling today’s grand challenges, from growing populations to climate change and natural disasters, requires managing vast amounts of data from diverse sources. However, researchers often spend significant time wrangling complex data, especially geospatial data and streaming data from sensors, IoT devices, and scientific instruments. These datasets are often large, heterogeneous, and computationally intensive to process. Funded by the National Science Foundation (NSF), our project aims to develop reusable software solutions that address common data wrangling challenges across science and engineering disciplines, and to train the next-generation workforce to use cyberinfrastructure and advanced computing resources effectively.
The goal of this hands-on workshop is to help participants become familiar with and learn how to use the tools developed under an NSF Cyberinfrastructure for Sustained Scientific Innovation (CSSI) project, in order to make research data and workflows more FAIR (findable, accessible, interoperable and reusable). The workshop activities will be organized around three themes:
- Wrangling data - Resource-intensive data processing pipelines using the GeoEDF workflow library
- Managing continuous data streams - Sensor data management with StreamCI
- Cyber training - Interactive online learning using the CyberFaCES platform
More information about these tools is provided in the attached PDF. Participants will also receive an overview of high-performance computing (HPC) and cloud computing on the NSF Anvil cluster, along with containerization technologies, and will have opportunities to interact with experts in these areas.
Participant support funds will be provided to assist with travel and lodging. Participants who complete workshop requirements will receive a certificate. Participants who create datasets, code and learning modules to share on these platforms will receive an additional honorarium.
Date: August 4-5, 2025
Place: Purdue University, West Lafayette campus
Additional information is available here.
Workshop Schedule (DRAFT)
Day 1
8:30 - 11:30am:
- Welcome and Overview
- Presentation: GeoEDF Workflow Framework, application highlights
- Tutorial: Using and developing custom data connectors and processors
- (Participants may prototype connectors and processors for their own data sources.)
1:00 - 2:30pm:
- Presentation: StreamCI data management and processing platform, application highlights
- Tutorial: Using StreamCI to ingest, query, and analyze sensor data
- (Participants are encouraged to bring their own datasets to work with.)
Break (30 minutes)
3:00 - 5:00pm:
- Presentation: Supporting interactive online teaching and learning using CyberFaCES; examples
- Tutorial: Creating and publishing interactive learning modules and training materials
- (Participants can develop or begin designing their own content.)
Day 2
8:30 - 11:30am:
- Presentation: HPC and cloud computing on Anvil and containerization basics
- Parallel working sessions:
  - Bring your datasets and application ideas
  - Discuss collaboration opportunities
  - Draft post-workshop implementation plan
- Develop use cases and plans for post-workshop implementation, Q&A.
11:30am - 12:00pm:
- Wrap up and Q&A.
Eligibility & Application Process
This workshop is open to researchers, faculty and graduate students from U.S. institutions. To apply, please send:
- Your CV
- A short statement of interest, including but not limited to:
  - Which theme(s) you are most interested in
  - How you plan to engage (e.g., bringing your own data, developing resource-intensive data processing and sensor data management pipelines, creating learning materials for use in your own instruction, or contributing to the CyberFaCES platform)
  - Any plans for future collaboration or educational use of these tools
Deadline: May 31, 2025.
This workshop is partially supported by National Science Foundation awards #1835822 and #2230092.
Objectives
- Help participants become familiar with and learn how to use the tools developed under an NSF Cyberinfrastructure for Sustained Scientific Innovation (CSSI) project, in order to make research data and workflows more FAIR (findable, accessible, interoperable and reusable).
Modules
- CI4Fair Pre-workshop Survey
Module 1 • 5 minutes
- GeoEDF Workflow Framework, application highlights
Module 2 • 2 hours
GeoEDF is an extensible data framework designed to simplify data wrangling in geospatial research workflows. GeoEDF enables researchers to define scientific workflows as a logical sequence of data acquisition and processing steps.
Reusable building blocks, termed data connectors and data processors, respectively implement data acquisition from various repositories over various data access protocols and a range of domain-agnostic and domain-specific geospatial processing operations.
The GeoEDF framework defines the syntax and semantics of connectors and processors, while the engine implements the validation, transformation, job planning, and execution of declarative GeoEDF workflows encoded in YAML syntax.
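To make the connector/processor idea concrete, here is a minimal sketch in Python of a declarative workflow run by a tiny engine. It is not the GeoEDF API: the registries, the `InlineInput`, `Scale`, and `Mean` step names, and the workflow layout are all hypothetical, and the workflow is written as the Python structure a YAML file might load into rather than as real GeoEDF YAML.

```python
from typing import Any, Callable, Dict, List

# Hypothetical registries; these names are illustrative, not GeoEDF's API.
CONNECTORS: Dict[str, Callable[..., Any]] = {}
PROCESSORS: Dict[str, Callable[..., Any]] = {}


def connector(name: str):
    """Register a data-acquisition function under a step name."""
    def register(fn):
        CONNECTORS[name] = fn
        return fn
    return register


def processor(name: str):
    """Register a data-processing function under a step name."""
    def register(fn):
        PROCESSORS[name] = fn
        return fn
    return register


@connector("InlineInput")
def inline_input(values: List[float]) -> List[float]:
    # Stand-in for acquisition from a remote repository.
    return list(values)


@processor("Scale")
def scale(data: List[float], factor: float) -> List[float]:
    return [x * factor for x in data]


@processor("Mean")
def mean(data: List[float]) -> float:
    return sum(data) / len(data)


# A workflow as it might look after a YAML file is loaded (assumed layout).
WORKFLOW = [
    {"connector": "InlineInput", "params": {"values": [1.0, 2.0, 3.0]}},
    {"processor": "Scale", "params": {"factor": 2.0}},
    {"processor": "Mean", "params": {}},
]


def validate(workflow: List[dict]) -> None:
    """Check that the workflow starts with a connector and uses known steps."""
    if not workflow or "connector" not in workflow[0]:
        raise ValueError("workflow must start with a connector step")
    for i, step in enumerate(workflow):
        kind = "connector" if "connector" in step else "processor"
        registry = CONNECTORS if kind == "connector" else PROCESSORS
        if step.get(kind) not in registry:
            raise ValueError(f"step {i}: unknown {kind} {step.get(kind)!r}")


def run(workflow: List[dict]) -> Any:
    """Validate, then execute the steps in order, passing data along."""
    validate(workflow)
    data = None
    for step in workflow:
        params = step.get("params", {})
        if "connector" in step:
            data = CONNECTORS[step["connector"]](**params)
        else:
            data = PROCESSORS[step["processor"]](data, **params)
    return data


print(run(WORKFLOW))  # mean of [2.0, 4.0, 6.0]
```

The point of the sketch is the separation of concerns: the workflow only names steps and parameters, while validation and execution live in the engine, so the same building blocks can be reused across workflows.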
- StreamCI tutorial
Module 3 • 1 hour
StreamCI is a streaming data management and analysis platform that provides comprehensive services for sensor data acquisition and data-driven analytics, models, and tools. It consists of a general-purpose, scalable, and secure streaming data processing pipeline with a data access API and data analytics libraries and tools, and an integrated JupyterHub computation environment for streaming data analytics, visualization, monitoring, and prediction. It is currently deployed on Anvil's composable system. In this learning module, we will introduce the design of StreamCI, showcase how it has been used to support a diverse range of use cases across disciplines, and teach how to ingest and access sensor data using its API.
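As a rough illustration of the ingest-then-query pattern described above, the following Python sketch keeps timestamped sensor readings in memory and answers time-range queries. It does not use or reproduce the StreamCI API: the `SensorStore` class, its method names, and the sensor ID are hypothetical stand-ins chosen for this example.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Reading = Tuple[float, float]  # (timestamp in seconds, measured value)


class SensorStore:
    """In-memory stand-in for a streaming data store (hypothetical, not StreamCI)."""

    def __init__(self) -> None:
        self._series: Dict[str, List[Reading]] = defaultdict(list)

    def ingest(self, sensor_id: str, timestamp: float, value: float) -> None:
        # Keep each series sorted by time, even if readings arrive out of order.
        bisect.insort(self._series[sensor_id], (timestamp, value))

    def query(self, sensor_id: str, start: float, end: float) -> List[Reading]:
        # Return readings with start <= timestamp <= end.
        series = self._series.get(sensor_id, [])
        lo = bisect.bisect_left(series, (start, float("-inf")))
        hi = bisect.bisect_right(series, (end, float("inf")))
        return series[lo:hi]

    def mean(self, sensor_id: str, start: float, end: float) -> Optional[float]:
        # Simple analytic over a time window; None if the window is empty.
        readings = self.query(sensor_id, start, end)
        if not readings:
            return None
        return sum(value for _, value in readings) / len(readings)


store = SensorStore()
store.ingest("soil-01", 10.0, 0.30)
store.ingest("soil-01", 30.0, 0.34)
store.ingest("soil-01", 20.0, 0.32)  # late arrival, still stored in time order

print(store.query("soil-01", 15.0, 30.0))
print(store.mean("soil-01", 0.0, 100.0))
```

A production streaming platform would add durable storage, authentication, and a network API on top of this; the sketch only shows the shape of the ingest and access operations.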
- CyberFaCES - Interactive online learning platform for FAIR Science
Module 4 • 1 hour
Welcome to the CyberFaCES Tutorial Module! In this session, we first present an overview of the platform and discuss course design, then give a live demo of the platform's features. You'll gain hands-on experience navigating the interface and using its key tools. We'll also guide you step by step through building your own course, offering tips and best practices for creating engaging, effective learning experiences.
- HPC and cloud computing on Anvil and containerization basics
Module 5 • 1 hour
- CI4Fair Post-workshop Survey
Module 6 • 5 minutes
Instructors

Lan Zhao

Jaewoo Shin

Xiaohui Carol Song

Jibin Joseph

Rajesh Kalyanam

Jungha Woo

I Luk Kim

Chimdia Primus Kabuo

Christopher S Thompson