Overview
A distributed big data analytics system built to process and analyze historical weather data from NOAA (National Oceanic and Atmospheric Administration). The project leverages Hadoop MapReduce for large-scale data processing, HBase for storage, and a Python-based UI for interactive visualization of weather trends across global stations.
Key Features
- Distributed Data Processing — Hadoop MapReduce jobs for processing large-scale NOAA weather datasets (TMIN, TMAX, etc.)
- HBase Storage — scalable NoSQL database for storing station metadata and processed weather records
- Interactive UI — Python-based interface with a map view for exploring weather stations across the globe
- Global Trend Analysis — compute and visualize temperature trends over time at global and per-station levels
- Dockerized Infrastructure — fully containerized Hadoop, HBase, and Zookeeper cluster for local development
My Responsibilities
- Led the team of 3, coordinating task assignments and setting project milestones
- Organized sprint meetings and tracked progress to ensure on-time delivery
- Developed data download and ingestion scripts for NOAA station datasets
- Built MapReduce jobs for weather data processing and trend analysis
- Contributed to the Python CLI and UI for running analytic jobs and viewing results
- Helped configure the Docker-based distributed environment (HDFS, HBase, Zookeeper)
- Reviewed team members’ code and managed pull requests