Data Preparation for Data Science

Faculty:

Vinod Kumar Ahuja
Department of Computing and Software Engineering
FGCU / U.A. Whitaker College of Engineering

Dinh Thuy Tien
Department of Artificial Intelligence
Thang Long University

Title: Data Preparation for Data Science


Project Description:

This COIL project, “Data Preparation for Data Science,” brought together Data Engineering students from FGCU and Machine Learning students from Thang Long University in Vietnam. Designed to simulate a real-world business analytics workflow, the project gave the two groups complementary roles. The Thang Long students acted as business analysts, defining data requirements based on specific analytical or business questions. The FGCU students then located, cleaned, transformed, and integrated the data those questions called for, exploring the data processing techniques essential for preparing datasets for analysis. This reciprocal model engaged both groups in cross-functional, global collaboration that mirrors how real data science teams work. Through shared workspaces and regular virtual check-ins, students developed both technical proficiency and intercultural communication skills. The project culminated in a final presentation and the delivery of a prepared dataset aligned with the specified business problem. The experience deepened students’ understanding of the interplay between data preparation and analytical modeling in international, multidisciplinary contexts.


Student Learning Outcomes:

By the end of this project, students will be able to:

  • Apply data preparation techniques, including cleaning, transformation, and integration, to support machine learning and analytical tasks.
  • Interpret and respond to business-driven data requirements formulated by international peers, simulating real-world data engineering and analytics collaboration.
  • Communicate technical concepts effectively with non-technical stakeholders across cultural and disciplinary boundaries.
  • Collaborate in cross-cultural, interdisciplinary teams using shared tools and communication platforms.
  • Demonstrate an understanding of how data preparation impacts model selection, performance, and business decision-making.

Participating Countries: USA and Vietnam


Number of FGCU and Partner Institution Student Participants:

24 students at FGCU

16 students at Thang Long University


Discipline: Software Engineering and AI


FGCU Course Code & Name: COP 3710 – Intro. to Data Engineering


Project Duration: 3 months


Technology Tools: Zoom, WhatsApp, Discord, PostgreSQL, Python, JupyterLab