Flight Delay Analysis and Prediction

Machine Learning

In this project, we explored a large dataset of flight cancellations and delays in the United States from 2019 to 2023, provided by the Department of Transportation, along with NOAA weather data. The original dataset contained 3 million rows; after cleaning and preprocessing, the final dataset contained 1.3 million rows.
We used machine learning models to identify trends and patterns in flight delays and cancellations, and built an application that lets consumers estimate the likelihood that their next flight will be delayed or canceled. After testing several models, we selected XGBoost, which achieved an accuracy of 0.68 and a weighted-average F1 score of 0.70; its macro-average F1 score, listed under Macro Average Results, was 0.62.
Given the class imbalance in our dataset (77% of samples belong to one class), we compared models using macro averages rather than weighted averages. Macro averaging gives every class equal weight regardless of its size, so the majority class cannot dominate the score and performance on the minority class stays visible.
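To make the difference between the two averages concrete, here is a minimal sketch in plain Python. The labels `on_time` and `delayed`, the toy predictions, and the helper names are assumptions for illustration only; they are not the project's data or code.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: (f1, support)} for every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (f1, support[label])
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(f1 for f1, _ in scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 weighted by the number of true samples per class."""
    scores = per_class_f1(y_true, y_pred)
    total = sum(n for _, n in scores.values())
    return sum(f1 * n for f1, n in scores.values()) / total


# Toy, imbalanced example (hypothetical): 8 on-time flights, 2 delayed.
y_true = ["on_time"] * 8 + ["delayed"] * 2
# The model misses one of the two delayed flights.
y_pred = ["on_time"] * 8 + ["on_time", "delayed"]

print(f"macro F1:    {macro_f1(y_true, y_pred):.3f}")     # 0.804
print(f"weighted F1: {weighted_f1(y_true, y_pred):.3f}")  # 0.886
```

In this toy case the weighted score is higher because the majority class, which the model handles well, carries 80% of the weight. The macro score exposes the weaker minority-class result.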
In addition, during feature engineering, we integrated detailed weather data (precipitation, snowfall, maximum and minimum temperature, and average wind speed) to put flight operations in the context of weather conditions and to look for strong correlations between variables.
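One plausible way to attach weather observations to flights is a join on airport and date. The sketch below uses plain Python dictionaries; the field names (`origin`, `prcp`, `snow`, `tmax`, `tmin`, `awnd`) and the sample values are assumptions, since the project's actual schema is not shown here. A dataset with millions of rows would more likely use a dataframe library for this step.

```python
# Hypothetical weather fields: precipitation, snowfall, max/min temperature,
# and average wind speed.
WEATHER_FIELDS = ("prcp", "snow", "tmax", "tmin", "awnd")

# Made-up sample rows for illustration only.
flights = [
    {"date": "2023-01-15", "origin": "ATL", "delayed": 1},
    {"date": "2023-01-15", "origin": "ORD", "delayed": 0},
    {"date": "2023-01-16", "origin": "ATL", "delayed": 0},
]
weather = [
    {"date": "2023-01-15", "airport": "ATL",
     "prcp": 12.4, "snow": 0.0, "tmax": 9.1, "tmin": 2.3, "awnd": 5.6},
    {"date": "2023-01-16", "airport": "ATL",
     "prcp": 0.0, "snow": 0.0, "tmax": 11.0, "tmin": 1.8, "awnd": 3.2},
]


def attach_weather(flights, weather):
    """Left-join weather onto flights by (origin airport, date).

    Flights without a matching observation keep None for each weather field,
    so missing data stays visible instead of being silently dropped.
    """
    index = {(w["airport"], w["date"]): w for w in weather}
    enriched = []
    for flight in flights:
        obs = index.get((flight["origin"], flight["date"]))
        row = dict(flight)
        for field in WEATHER_FIELDS:
            row[field] = obs[field] if obs else None
        enriched.append(row)
    return enriched


enriched = attach_weather(flights, weather)
for row in enriched:
    print(row)
```

Keeping unmatched flights (a left join) rather than discarding them makes it easier to see how many rows lack weather coverage before deciding how to handle them.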
To assign geographic markers to each airport in our flight dataset, we used an airport coordinate dataset that includes IATA and ICAO codes, airport names, latitude, and longitude.
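The coordinate lookup can be sketched as an index keyed by both code types, so a flight record can be matched whether it carries an IATA or an ICAO code. The field names and helper functions are hypothetical, and the coordinates are approximate values included only for illustration.

```python
# Approximate coordinates for illustration; field names are assumptions.
airports = [
    {"iata": "ATL", "icao": "KATL",
     "name": "Hartsfield-Jackson Atlanta International Airport",
     "lat": 33.64, "lon": -84.43},
    {"iata": "ORD", "icao": "KORD",
     "name": "Chicago O'Hare International Airport",
     "lat": 41.98, "lon": -87.90},
]


def build_coord_index(airports):
    """Map every IATA and ICAO code to a (latitude, longitude) pair."""
    index = {}
    for airport in airports:
        coords = (airport["lat"], airport["lon"])
        for key in ("iata", "icao"):
            code = airport.get(key)
            if code:
                index[code.upper()] = coords
    return index


def add_coordinates(flight, index, code_field="origin"):
    """Return a copy of the flight with lat/lon added (None if code unknown)."""
    lat, lon = index.get(flight[code_field].upper(), (None, None))
    return {**flight, f"{code_field}_lat": lat, f"{code_field}_lon": lon}


coord_index = build_coord_index(airports)
located = add_coordinates({"origin": "ATL", "delayed": 1}, coord_index)
print(located)
```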
This project aims to address flight delays and cancellations by providing a data-driven tool that helps identify and manage them.

Link To the project GitHub

Macro Average Results
Model                                   Precision   Recall   F1-score
Logistic Class. (without Weather)       0.57        0.60     0.55
Logistic Class.                         0.66        0.52     0.48
Decision Tree                           0.58        0.58     0.58
Decision Tree with RandomizedSearchCV   0.68        0.55     0.54
Random Forest                           0.71        0.55     0.55
Gradient Boosting (XGBoost)             0.62        0.67     0.62
Neural Network                          0.68        0.57     0.58
K-Nearest Neighbors                     0.61        0.58     0.59
Random Forest (ATL Airport)             0.71        0.56     0.56
Neural Network (ATL Airport)            0.68        0.54     0.52

Final Paper
