Flight Delay Prediction in AWS
To improve the customer experience for delayed flights.
Overview
A travel booking website wants to improve the customer experience for flights affected by weather delays to and from the busiest domestic airports in the US. To do so, it plans a new feature that tells customers during the booking process whether a flight is likely to be delayed by weather. The goal is therefore to develop a machine learning (ML) model that accurately predicts weather-related delays for flights involving the country's major airports. The model will be trained on historical on-time performance data for domestic flights operated by large air carriers.
Problem Statement
This scenario is a binary classification problem: the goal is to assign each data point to one of two possible classes. Here, the ML model predicts whether or not a flight will be delayed due to weather conditions, so the two classes are "delayed flight due to weather" and "non-delayed flight." The model takes input features such as departure time, airline, origin, destination, distance, and weather conditions, and it outputs either a binary label or a probability that the flight will be delayed.
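To make the "binary label or probability" output concrete, here is a minimal sketch of how a probability can be turned into one of the two classes. It assumes a logistic model purely for illustration; the write-up does not name the algorithm actually used, and the feature names, weights, bias, and 0.5 threshold below are all hypothetical.

```python
import math


def delay_probability(features, weights, bias):
    """Logistic model: squash a weighted sum of features into a value in (0, 1)."""
    score = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-score))


def classify(probability, threshold=0.5):
    """Map a probability to a class: 1 = delayed due to weather, 0 = not delayed."""
    return 1 if probability >= threshold else 0


# Hypothetical, hand-picked weights; a real model would learn these from data.
weights = {"distance_1000mi": 0.2, "evening_departure": 0.5, "snow": 1.5}

# Hypothetical flight: 1,200 miles, evening departure, snow in the forecast.
flight = {"distance_1000mi": 1.2, "evening_departure": 1.0, "snow": 1.0}

p = delay_probability(flight, weights, bias=-2.0)
label = classify(p)
print(f"probability of weather delay: {p:.2f}, predicted class: {label}")
```

Keeping the probability and the thresholded label separate lets the booking feature choose how cautious to be, for example by warning customers only when the probability is well above 0.5.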
About this dataset
This dataset contains scheduled and actual departure and arrival times reported by certified US air carriers that account for at least 1 percent of domestic scheduled passenger revenues. The data was collected by the U.S. Office of Airline Information, Bureau of Transportation Statistics (BTS). It covers flights between 2013 and 2018 and records the date, time, origin, destination, airline, distance, and delay status of each flight.
Preprocessing & Feature
The dataset's independent features include scheduled and actual departure and arrival times, origin, destination, airline, and distance for domestic flights operated by large air carriers; delay status is the dependent variable that the model predicts. These features bear directly on the dependent variable and are relevant to the problem of predicting weather-related delays for flights to or from the busiest airports in the US. The dataset covers certified US air carriers that represent at least 1 percent of domestic scheduled passenger revenues. While it does not capture every flight, this coverage is representative enough to learn delay patterns for major airlines operating in the US. The dataset spans flights between 2013 and 2018 and includes multiple variables for each flight. Dataset size matters for training ML models effectively: a sufficiently large dataset supports training more complex models and helps them generalize.
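The split between independent features and the dependent variable can be sketched as follows. This is an illustrative example, not the project's actual pipeline: the record layout, the field names (`scheduled_departure`, `is_delay`, and so on), and the sample values are hypothetical. It also assumes that only information known at booking time is used, so the actual departure and arrival times are left out of the feature set.

```python
from datetime import datetime


def build_label(record):
    """Dependent variable: 1 if the flight is marked as delayed, else 0."""
    return 1 if record["is_delay"] else 0


def build_features(record, airlines, airports):
    """Turn one raw flight record into a flat dict of numeric features."""
    dep = datetime.strptime(record["scheduled_departure"], "%Y-%m-%d %H:%M")
    features = {
        "distance": float(record["distance"]),
        "dep_hour": dep.hour,
        "day_of_week": dep.weekday(),  # Monday = 0, Sunday = 6
        "month": dep.month,
    }
    # One-hot encode the categorical columns so a model sees numbers only.
    for code in airlines:
        features[f"airline_{code}"] = 1 if record["airline"] == code else 0
    for code in airports:
        features[f"origin_{code}"] = 1 if record["origin"] == code else 0
        features[f"dest_{code}"] = 1 if record["destination"] == code else 0
    return features


# Made-up sample record; the values do not come from the BTS data.
record = {
    "scheduled_departure": "2016-01-15 18:30",
    "airline": "DL",
    "origin": "ATL",
    "destination": "ORD",
    "distance": "606",
    "is_delay": True,
}

features = build_features(record, airlines=["DL", "AA"], airports=["ATL", "ORD"])
label = build_label(record)
print(features)
print(label)
```

Deriving hour, weekday, and month from the scheduled departure time gives a model a way to pick up seasonal and time-of-day patterns that a raw timestamp would hide.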