Integrating ML-Predicted Weather Data into EPW Files
Benjamin Shih
A Python-based thesis workflow for integrating ML-predicted localized weather data into EPW files and evaluating impacts on building energy simulation results.
Overview
This repository contains the Python workflow developed for my master’s thesis at the Georgia Institute of Technology (M.S. in High Performance Building). The project investigates how machine learning–predicted local microclimate conditions can be integrated into building energy modeling workflows by replacing selected variables in standard EPW weather files.
The main goal is to evaluate how localized air temperature and relative humidity, predicted by a FusionLSTM model trained on data from 16 campus weather stations, change building energy simulation results relative to conventional city-scale weather inputs.
ML Workflow
The upstream ML pipeline uses geospatial features (surface material, shadow patterns, sun angle) and climate station data to generate high-resolution localized weather maps across the Georgia Tech campus (see docs/figures/ML_workflow.jpg).
EPW Replacement Strategy
The localized air temperature and relative humidity predictions replace the corresponding fields in a baseline TMY EPW file, and dew-point temperature is recalculated from them. All other variables (radiation, wind, sky conditions, pressure) remain unchanged (see docs/figures/EPW_replacement_diagram.jpg).
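The field swap can be sketched as follows. This is a minimal illustration, not the thesis script: it assumes the standard EPW layout, where each hourly data row is comma-separated and carries dry-bulb temperature, dew-point temperature, and relative humidity in the seventh to ninth fields. The function name `replace_fields`, the output rounding, and the truncated sample row are hypothetical.

```python
# Zero-based positions of the fields in a standard EPW hourly data row.
DRY_BULB_IDX = 6   # dry-bulb temperature, deg C
DEW_POINT_IDX = 7  # dew-point temperature, deg C
REL_HUM_IDX = 8    # relative humidity, %


def replace_fields(epw_row, temp_c, rh_pct, dew_point_c=None):
    """Return a copy of one EPW data row with localized values swapped in.

    Every other field (radiation, wind, sky conditions, pressure) is
    passed through untouched.
    """
    fields = epw_row.rstrip("\n").split(",")
    fields[DRY_BULB_IDX] = f"{temp_c:.1f}"
    fields[REL_HUM_IDX] = f"{rh_pct:.0f}"
    if dew_point_c is not None:
        fields[DEW_POINT_IDX] = f"{dew_point_c:.1f}"
    return ",".join(fields)


# Hypothetical, truncated EPW row used only for illustration.
row = "1991,6,1,1,0,?9?9?9,21.1,18.3,84,98600,0,0"
print(replace_fields(row, temp_c=23.4, rh_pct=71.6))
```

Working on the split field list rather than reformatting the whole row keeps every untouched column byte-for-byte identical to the baseline file.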
Repository Structure
.
├── README.md
├── .gitignore
├── docs/
│   ├── export_gridpoint_timeseries.md        # Detailed usage guide for the export script
│   └── figures/                              # Diagrams and result figures
│       ├── ML_workflow.jpg
│       ├── EPW_replacement_diagram.jpg
│       └── Total_comparison.jpg
└── src/
    ├── export_gridpoint_timeseries.py        # Extract grid-point time series from ML model
    ├── replace_epw_temperature_humidity.py   # Replace EPW temperature & humidity
    ├── plot_epw_weather_profiles.py          # Compare EPW weather profiles
    └── plot_simulation_results.py            # Plot building energy simulation results
Scripts
1. export_gridpoint_timeseries.py
Exports hourly time-series weather data for a single grid point from the FusionLSTM + kriging pipeline. Outputs a CSV with columns: Datetime, Node_ID, Temperature_C, Humidity_pct.
Key features:
- Select grid points by OSM Node ID or grid index
- Filter by date range and subsample at custom time steps
- Warm-season coverage (April–September)
# Export by node ID
python src/export_gridpoint_timeseries.py --node_id 553496130
# Limit time range and subsample every 3 hours
python src/export_gridpoint_timeseries.py --node_id 553496130 \
    --start_date 2015-06-01 --end_date 2015-06-30 --step_hours 3
See docs/export_gridpoint_timeseries.md for the full parameter reference.
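To make the date filter and subsampling concrete, here is a rough sketch of the selection logic applied to already-exported rows. It uses only the standard library so it runs without the repository's dependencies. The inclusive end date, the timestamp format, and the "keep every Nth hourly row" interpretation of `--step_hours` are assumptions, not behavior confirmed from the script. The function name `filter_and_subsample` is hypothetical.

```python
from datetime import datetime, timedelta

TS_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed format of the Datetime column


def filter_and_subsample(rows, start_date, end_date, step_hours=1):
    """Keep rows dated within [start_date, end_date], then every Nth row."""
    start = datetime.strptime(start_date, "%Y-%m-%d")
    # Treat the end date as inclusive by extending to the start of the next day.
    end = datetime.strptime(end_date, "%Y-%m-%d") + timedelta(days=1)
    in_range = [
        r for r in rows
        if start <= datetime.strptime(r["Datetime"], TS_FORMAT) < end
    ]
    return in_range[::step_hours]


# Synthetic hourly rows for two days; temperature and humidity are placeholders.
first = datetime(2015, 6, 1)
rows = [
    {"Datetime": (first + timedelta(hours=h)).strftime(TS_FORMAT),
     "Node_ID": "553496130", "Temperature_C": 25.0, "Humidity_pct": 60.0}
    for h in range(48)
]
subset = filter_and_subsample(rows, "2015-06-01", "2015-06-01", step_hours=3)
print([r["Datetime"][11:16] for r in subset])
```

With a 3-hour step, a single day of hourly data reduces to eight rows, starting at midnight.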
2. replace_epw_temperature_humidity.py
Replaces dry-bulb temperature, relative humidity, and dew-point temperature in a baseline EPW file using the exported ML time-series data.
- Reads the original EPW and preserves the 8-line header
- Matches timestamps between the ML CSV and EPW rows (April–September)
- Recalculates dew-point temperature using the Magnus formula
- Writes a new localized EPW file
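The dew-point recalculation can be illustrated with a short sketch of the Magnus approximation. The coefficients below (17.62 and 243.12 °C) are one commonly used parameter set over water. The values used in the thesis script are not stated here, so treat them as an assumption. The function name `dew_point_magnus` and the clamping of humidity are also hypothetical.

```python
import math

# Magnus coefficients over liquid water (one common parameter set).
# Assumption: the thesis script may use different values.
A = 17.62
B = 243.12  # deg C


def dew_point_magnus(temp_c, rh_pct):
    """Approximate dew-point temperature (deg C) from dry-bulb and RH."""
    # Clamp RH so that the logarithm stays defined for bad inputs.
    rh = min(max(rh_pct, 1.0), 100.0)
    gamma = math.log(rh / 100.0) + A * temp_c / (B + temp_c)
    return B * gamma / (A - gamma)


# At saturation the dew point equals the dry-bulb temperature.
print(round(dew_point_magnus(25.0, 100.0), 6))
# Drier air gives a dew point below the dry-bulb temperature.
print(dew_point_magnus(25.0, 50.0) < 25.0)
```

Recomputing dew point from the replaced temperature and humidity keeps the three moisture-related fields of each EPW row physically consistent with one another.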
3. plot_epw_weather_profiles.py
Compares dry-bulb temperature profiles across multiple EPW files over a user-specified date window.
- Overlays baseline TMY, station-observed, and ML-localized EPW profiles
- Useful for visually verifying the EPW replacement
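A comparison like this needs the dry-bulb column pulled from each EPW file for the chosen window. The sketch below shows one possible approach and is not the script's actual code. It assumes the standard EPW layout (8 header lines, then rows with month, day, and hour in fields 2 to 4 and dry-bulb temperature in field 7) and a window that does not wrap across the year end. The function names and the synthetic rows are hypothetical. Matplotlib is imported inside the plotting helper so the extraction step can run without it.

```python
def dry_bulb_in_window(epw_lines, start, end):
    """Return [(month, day, hour, dry_bulb_c), ...] for an inclusive window.

    start and end are (month, day) tuples.
    """
    series = []
    for line in epw_lines[8:]:  # skip the 8-line EPW header
        fields = line.strip().split(",")
        if len(fields) < 7:
            continue  # ignore blank or malformed rows
        month, day, hour = int(fields[1]), int(fields[2]), int(fields[3])
        if start <= (month, day) <= end:
            series.append((month, day, hour, float(fields[6])))
    return series


def plot_profiles(profiles, out_path):
    """Overlay profiles given as {label: series from dry_bulb_in_window}."""
    import matplotlib
    matplotlib.use("Agg")  # render to file without a display
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for label, series in profiles.items():
        ax.plot(range(len(series)), [row[3] for row in series], label=label)
    ax.set_xlabel("Hour in window")
    ax.set_ylabel("Dry-bulb temperature (°C)")
    ax.legend()
    fig.savefig(out_path)


# Synthetic stand-in for an EPW file: 8 header lines plus three data rows.
fake_epw = ["HEADER"] * 8 + [
    "2015,6,1,1,0,flags,20.0,15.0,70",
    "2015,6,2,1,0,flags,22.5,15.0,65",
    "2015,7,1,1,0,flags,25.0,15.0,60",
]
print(dry_bulb_in_window(fake_epw, (6, 1), (6, 2)))
```

Calling `plot_profiles({"Baseline TMY": s1, "ML-localized": s2}, "profiles.png")` with two extracted series would then draw the overlay.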
4. plot_simulation_results.py
Generates grouped bar charts comparing annual building energy simulation results (EUI, CO₂ emissions, cooling load, heating load) across different weather data scenarios.
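In a grouped bar chart, each metric forms a group and each weather scenario contributes one bar per group. The sketch below shows one way to compute the bar positions and draw such a chart. It is not the repository's plotting code, and both function names are hypothetical. Because the listed metrics have different units, a single shared axis as drawn here assumes the values were normalized or are otherwise comparable.

```python
def grouped_bar_positions(n_groups, n_series, group_width=0.8):
    """Return (bar_width, positions).

    positions[s][g] is the x centre of series s within group g, with the
    groups themselves centred on 0, 1, 2, ...
    """
    bar_width = group_width / n_series
    positions = [
        [g + (s - (n_series - 1) / 2) * bar_width for g in range(n_groups)]
        for s in range(n_series)
    ]
    return bar_width, positions


def plot_grouped_bars(metrics, results, out_path):
    """metrics: list of labels; results: {scenario: [value per metric]}."""
    import matplotlib
    matplotlib.use("Agg")  # render to file without a display
    import matplotlib.pyplot as plt

    width, positions = grouped_bar_positions(len(metrics), len(results))
    fig, ax = plt.subplots()
    for xs, (scenario, values) in zip(positions, results.items()):
        ax.bar(xs, values, width=width, label=scenario)
    ax.set_xticks(range(len(metrics)))
    ax.set_xticklabels(metrics)
    ax.legend()
    fig.savefig(out_path)
```

Centring the bars around each integer tick keeps the metric labels aligned with the middle of their group regardless of how many scenarios are compared.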
Sample Results
Typical Workflow
1. Export grid-point time series → CSV (Datetime, Node_ID, Temperature_C, Humidity_pct)
2. Replace EPW temperature/humidity → Localized EPW file
3. Run building energy simulation → (EnergyPlus / OpenStudio, external)
4. Plot & compare results → Figures
Dependencies
- Python 3.9+
- pandas
- numpy
- matplotlib
- PyTorch (for model inference in export_gridpoint_timeseries.py)
Input Data
| Input | Description |
|-------|-------------|
| Baseline EPW | Standard TMY weather file (e.g., Atlanta TMY3) |
| ML time-series CSV | Exported from export_gridpoint_timeseries.py |
| Simulation results | EnergyPlus / OpenStudio output (manual step) |
Note: The trained model weights and raw station data are not included in this repository. Contact the author for access.
Author
Han-Syun (Benjamin) Shih
Master of Science in High Performance Building
Georgia Institute of Technology, 2025
Sustainable Urban Systems Lab · Advisor: Prof. Patrick Kastner
Source
Link to the repository.