Cyberinfrastructure for FAIR Science Workshop
About
Tackling today’s grand challenges, ranging from growing populations to climate change and natural disasters, requires managing vast amounts of data from diverse sources. However, researchers often spend significant time wrangling complex data, especially geospatial data and streaming data from sensors, IoT devices, and scientific instruments. These datasets are often large, heterogeneous, and computationally intensive to process. Funded by the National Science Foundation (NSF), our project aims to develop reusable software solutions that address common data wrangling challenges across science and engineering disciplines, and to train the next-generation workforce to use cyberinfrastructure and advanced computing resources effectively.
The goal of this hands-on workshop is to help participants become familiar with and learn how to use the tools developed under an NSF Cyberinfrastructure for Sustained Scientific Innovation (CSSI) project, in order to make research data and workflows more FAIR (findable, accessible, interoperable and reusable). The workshop activities will be organized around three themes:
- Wrangling data: resource-intensive data processing pipelines using the GeoEDF workflow library
- Managing continuous data streams: sensor data management with StreamCI
- Cyber training: interactive online learning using the CyberFaces platform
More information about these tools is provided in the attached PDF. Participants will also receive an overview of high-performance computing (HPC) and cloud computing on the NSF Anvil cluster, as well as containerization technologies, and will have opportunities to interact with experts in these areas.
Participant support funds will be provided to assist with travel and lodging. Participants who complete workshop requirements will receive a certificate. Participants who create datasets, code and learning modules to share on these platforms will receive an additional honorarium.
Date: August 4-5, 2025
Place: Purdue University, West Lafayette campus
Additional information is available here.
Workshop Schedule (DRAFT)
Day 1
8:30 - 11:30am:
- Welcome and Overview
- Presentation: GeoEDF Workflow Framework, application highlights
- Tutorial: Using and developing custom data connectors and processors
- (Participants may prototype connectors and processors for their own data sources.)
1:00 - 2:30pm:
- Presentation: StreamCI data management and processing platform, application highlights
- Tutorial: Using StreamCI to ingest, query, and analyze sensor data
- (Participants are encouraged to bring their own datasets to work with.)
Break (30 minutes)
3:00 - 5:00pm:
- Presentation: Supporting interactive online teaching and learning using CyberFaces; examples
- Tutorial: Creating and publishing interactive learning modules and training materials
- (Participants can develop or begin designing their own content.)
Day 2
8:30 - 11:30am:
- Presentation: HPC and cloud computing on Anvil and containerization basics
- Parallel working sessions:
  - Bring your datasets and application ideas
  - Discuss collaboration opportunities
  - Draft post-workshop implementation plan
- Develop use cases and plans for post-workshop implementation, Q&A.
11:30am - 12:00pm:
- Wrap up and Q&A.
Eligibility & Application Process
This workshop is open to researchers, faculty and graduate students from U.S. institutions. To apply, please send:
- Your CV
- A short statement of interest, including but not limited to:
  - Which theme(s) you are most interested in
  - How you plan to engage (e.g., bringing your own data, developing resource-intensive data processing and sensor data management pipelines, creating learning materials for use in your own instruction or contributing to the CyberFACES platform)
  - Any plans for future collaboration or educational use of these tools.
Deadline: May 25, 2025.
This workshop is partially supported by National Science Foundation awards #1835822 and #2230092.
Objectives
- Help participants become familiar with and learn how to use the tools developed under an NSF Cyberinfrastructure for Sustained Scientific Innovation (CSSI) project, in order to make research data and workflows more FAIR (findable, accessible, interoperable and reusable).
Instructors
- Lan Zhao
- Jaewoo Shin
- Xiaohui Carol Song
- Jibin Joseph
- Rajesh Kalyanam