Integrating ML-Predicted Weather Data into EPW Files

Benjamin Shih

A Python-based thesis workflow for integrating ML-predicted localized weather data into EPW files and evaluating impacts on building energy simulation results.

Overview

This repository contains the Python workflow developed for my master’s thesis at the Georgia Institute of Technology (M.S. in High Performance Building). The project investigates how machine learning–predicted local microclimate conditions can be integrated into building energy modeling workflows by replacing selected variables in standard EPW weather files.

The main goal is to evaluate how localized air temperature and relative humidity, derived from a FusionLSTM model trained on data from 16 campus weather stations, affect building energy simulation results compared with conventional city-scale weather inputs.

ML Workflow

The upstream ML pipeline uses geospatial features (surface material, shadow patterns, sun angle) and climate station data to generate high-resolution localized weather maps across the Georgia Tech campus:

Figure: ML workflow (docs/figures/ML_workflow.jpg)

EPW Replacement Strategy

The localized air temperature and relative humidity predictions replace the corresponding fields in a baseline TMY EPW file, and dew-point temperature is recalculated from the new values. All other variables (radiation, wind, sky conditions, pressure) remain unchanged:

Figure: EPW replacement diagram (docs/figures/EPW_replacement_diagram.jpg)
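The field swap can be sketched for a single EPW data row as follows. This is a minimal illustration, not the code in replace_epw_temperature_humidity.py: it assumes the standard EPW column order, where dry-bulb temperature, dew-point temperature, and relative humidity sit at zero-based field indices 6, 7, and 8. The function name `replace_fields`, the output formatting, and the sample row are hypothetical.

```python
# Zero-based positions of the affected fields in a standard EPW data row.
DRY_BULB_IDX = 6
DEW_POINT_IDX = 7
REL_HUM_IDX = 8


def replace_fields(epw_row, temp_c, rh_pct, dew_point_c=None):
    """Return a copy of one EPW data row with localized values swapped in.

    Every other field (radiation, wind, sky conditions, pressure) is
    passed through unchanged. The number formatting is an assumption.
    """
    fields = epw_row.rstrip("\n").split(",")
    fields[DRY_BULB_IDX] = f"{temp_c:.1f}"
    fields[REL_HUM_IDX] = f"{rh_pct:.0f}"
    if dew_point_c is not None:
        fields[DEW_POINT_IDX] = f"{dew_point_c:.1f}"
    return ",".join(fields)


# Hypothetical, truncated row for illustration only; a real EPW row has
# many more fields after atmospheric pressure.
row = "2015,6,1,1,0,?9?9?9,22.0,18.0,78,98500,0,0"
new_row = replace_fields(row, temp_c=24.3, rh_pct=71.4)
print(new_row)
```

Working on the split field list rather than on fixed character offsets keeps the untouched columns byte-for-byte identical to the baseline file.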

Repository Structure


.
├── README.md
├── .gitignore
├── docs/
│   ├── export_gridpoint_timeseries.md   # Detailed usage guide for the export script
│   └── figures/                         # Diagrams and result figures
│       ├── ML_workflow.jpg
│       ├── EPW_replacement_diagram.jpg
│       └── Total_comparison.jpg
└── src/
    ├── export_gridpoint_timeseries.py        # Extract grid-point time series from ML model
    ├── replace_epw_temperature_humidity.py   # Replace EPW temperature & humidity
    ├── plot_epw_weather_profiles.py          # Compare EPW weather profiles
    └── plot_simulation_results.py            # Plot building energy simulation results

Scripts

1. export_gridpoint_timeseries.py

Exports hourly time-series weather data for a single grid point from the FusionLSTM + kriging pipeline. Outputs a CSV with columns: Datetime, Node_ID, Temperature_C, Humidity_pct.

Key features:

  • Select grid points by OSM Node ID or grid index

  • Filter by date range and subsample at custom time steps

  • Warm-season coverage (April–September)


# Export by node ID
python src/export_gridpoint_timeseries.py --node_id 553496130

# Limit time range and subsample every 3 hours
python src/export_gridpoint_timeseries.py --node_id 553496130 \
  --start_date 2015-06-01 --end_date 2015-06-30 --step_hours 3

See docs/export_gridpoint_timeseries.md for full parameter reference.
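For readers who want to apply the same date filter and subsampling to an already exported CSV, a minimal standard-library sketch is shown below. It is not the script's implementation. It assumes hourly rows sorted by time, an ISO-style `Datetime` column, and an inclusive end bound; the helper name `filter_and_subsample` and the sample values are hypothetical.

```python
import csv
import io
from datetime import datetime


def filter_and_subsample(rows, start, end, step_hours):
    """Keep rows with start <= Datetime <= end, then every step_hours-th row.

    Assumes the rows are hourly and already sorted by time.
    """
    selected = [
        r for r in rows
        if start <= datetime.fromisoformat(r["Datetime"]) <= end
    ]
    return selected[::step_hours]


# Made-up sample in the documented column layout, for illustration only.
sample = io.StringIO(
    "Datetime,Node_ID,Temperature_C,Humidity_pct\n"
    "2015-06-01 00:00:00,553496130,21.0,80.0\n"
    "2015-06-01 01:00:00,553496130,20.6,82.0\n"
    "2015-06-01 02:00:00,553496130,20.3,83.0\n"
    "2015-06-01 03:00:00,553496130,20.1,84.0\n"
    "2015-06-01 04:00:00,553496130,19.9,85.0\n"
    "2015-06-01 05:00:00,553496130,19.8,86.0\n"
)
rows = list(csv.DictReader(sample))
subset = filter_and_subsample(
    rows,
    start=datetime(2015, 6, 1, 1),
    end=datetime(2015, 6, 1, 5),
    step_hours=3,
)
print([r["Datetime"] for r in subset])
```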

2. replace_epw_temperature_humidity.py

Replaces dry-bulb temperature, relative humidity, and dew-point temperature in a baseline EPW file using the exported ML time-series data.

  • Reads original EPW and preserves the 8-line header

  • Matches timestamps between the ML CSV and EPW rows (April–September)

  • Recalculates dew-point temperature from the new temperature and humidity using the Magnus formula

  • Writes a new localized EPW file
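The dew-point step can be sketched as below. The Magnus approximation computes gamma = ln(RH/100) + a·T/(b + T) and then Td = b·gamma/(a − gamma). The coefficients used here (a = 17.62, b = 243.12 °C) are one commonly cited set and are an assumption; the script may use different values. The function name `dew_point_magnus` and the humidity clamp are also illustrative.

```python
import math

# One commonly used Magnus coefficient set (over water); assumed here,
# not confirmed against the repository script.
A = 17.62
B = 243.12  # degrees Celsius


def dew_point_magnus(temp_c, rh_pct):
    """Approximate dew-point temperature (deg C) from dry-bulb temperature
    (deg C) and relative humidity (%) with the Magnus formula."""
    # Clamp humidity to a valid open interval so the logarithm is defined.
    rh = min(max(rh_pct, 0.1), 100.0)
    gamma = math.log(rh / 100.0) + A * temp_c / (B + temp_c)
    return B * gamma / (A - gamma)


print(round(dew_point_magnus(25.0, 50.0), 2))
```

At 100% relative humidity the formula returns the dry-bulb temperature itself, which is a quick sanity check for any implementation.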

3. plot_epw_weather_profiles.py

Compares dry-bulb temperature profiles across multiple EPW files over a user-specified date window.

  • Overlays baseline TMY, station-observed, and ML-localized EPW profiles

  • Useful for visually verifying the EPW replacement
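The data-extraction half of such a comparison can be sketched with the standard library alone. This is an illustration, not the plotting script: it assumes the 8-line EPW header and the dry-bulb temperature at zero-based field index 6, and it selects rows by month and day only, because TMY files combine months from different years. The helper name `dry_bulb_window` and the fake file contents are hypothetical.

```python
import io

HEADER_LINES = 8   # EPW header block
DRY_BULB_IDX = 6   # zero-based field index of dry-bulb temperature


def dry_bulb_window(epw_file, start_md, end_md):
    """Return [(month, day, hour, dry_bulb_c)] for rows whose (month, day)
    lies inside the inclusive window [start_md, end_md]."""
    profile = []
    for i, line in enumerate(epw_file):
        if i < HEADER_LINES:
            continue  # skip the header block
        fields = line.strip().split(",")
        if len(fields) <= DRY_BULB_IDX:
            continue  # skip blank or malformed lines
        month, day, hour = int(fields[1]), int(fields[2]), int(fields[3])
        if start_md <= (month, day) <= end_md:
            profile.append((month, day, hour, float(fields[DRY_BULB_IDX])))
    return profile


# Fake, truncated EPW content for illustration only.
fake_epw = io.StringIO(
    "".join(f"HEADER {n}\n" for n in range(HEADER_LINES))
    + "2015,5,31,24,0,?,20.1,15.0,73\n"
    + "2015,6,1,1,0,?,19.8,15.1,75\n"
    + "2015,6,2,1,0,?,21.4,16.0,71\n"
)
profile = dry_bulb_window(fake_epw, start_md=(6, 1), end_md=(6, 1))
print(profile)
```

Calling the helper once per EPW file yields series that can be overlaid on shared axes with matplotlib.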

4. plot_simulation_results.py

Generates grouped bar charts comparing annual building energy simulation results (EUI, CO₂ emissions, cooling load, heating load) across different weather data scenarios.

Sample Results

Figure: Simulation results comparison (docs/figures/Total_comparison.jpg)

Typical Workflow


1. Export grid-point time series     →  CSV (Datetime, Node_ID, Temperature_C, Humidity_pct)
2. Replace EPW temperature/humidity  →  Localized EPW file
3. Run building energy simulation    →  (EnergyPlus / OpenStudio, external)
4. Plot & compare results            →  Figures

Dependencies

  • Python 3.9+

  • pandas

  • numpy

  • matplotlib

  • PyTorch (model inference in export_gridpoint_timeseries.py)

Input Data

| Input | Description |
|-------|-------------|
| Baseline EPW | Standard TMY weather file (e.g., Atlanta TMY3) |
| ML time-series CSV | Exported from export_gridpoint_timeseries.py |
| Simulation results | EnergyPlus / OpenStudio output (manual step) |

Note: The trained model weights and raw station data are not included in this repository. Contact the author for access.

Author

Han-Syun (Benjamin) Shih

Master of Science in High Performance Building

Georgia Institute of Technology, 2025

Sustainable Urban Systems Lab · Advisor: Prof. Patrick Kastner
