Phishing URL Detector

An AI system for detecting phishing URLs using ensemble learning and advanced feature engineering.

The system combines ML pipelines with engineered URL features to spot malicious URLs with high precision.

Overview

Phishing remains one of the most prevalent cybersecurity threats, tricking users into revealing sensitive information through deceptive links. The Phishing URL Detector is a machine learning-based solution trained to detect such attempts using structural and behavioral URL features.

Problem Statement

Phishing URLs use techniques like:

- Redirection and mimicry of trusted domains
- Homoglyph attacks (e.g., "g00gle.com")
- URL-based fingerprinting and data harvesting

These threats are hard to detect manually or with simple blacklists. We needed an adaptable, intelligent system that evolves with phishing strategies.
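
To illustrate why an exact-match blacklist falls short, here is a minimal sketch in which a lookalike domain passes a blacklist check but is caught once lookalike characters are normalized. The substitution table, the trusted-domain set, and the function names are all hypothetical and chosen for illustration; they are not taken from the project.

```python
from urllib.parse import urlparse

# Hypothetical lookalike substitutions; a realistic table would be much larger.
LOOKALIKES = str.maketrans({"0": "o", "1": "l", "3": "e", "5": "s"})

# Illustrative sets only, not data from the project.
TRUSTED = {"google.com", "paypal.com"}
BLOCKLIST = {"evil-login.example"}


def hostname(url):
    """Return the lowercase hostname of a URL, or an empty string."""
    return (urlparse(url).hostname or "").lower()


def on_blocklist(url):
    """Exact-match lookup, as a simple blacklist would do."""
    return hostname(url) in BLOCKLIST


def mimics_trusted_domain(url):
    """True if the host is not trusted but normalizes to a trusted domain."""
    host = hostname(url)
    return host not in TRUSTED and host.translate(LOOKALIKES) in TRUSTED


url = "http://g00gle.com/login"
print(on_blocklist(url))           # the blacklist has never seen this host
print(mimics_trusted_domain(url))  # normalization maps it onto google.com
```

A rule like this only covers substitutions someone has thought to list, and it ignores subdomains. That limitation is part of the case for a learned model over hand-written rules.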

Dataset Compilation

Combined and cleaned data from four sources:

- UCI Machine Learning Repository (2024)
- Kaggle dataset (2021)
- Kaggle dataset (2023)
- PhishTank (2021)

Final dataset size: ~297,725 records | Features: 56

Preprocessing & Feature Engineering

- Web crawling to extract a uniform set of features across the sources
- Custom features, including URL entropy, dot count, and uppercase letter count
- Presence of keywords: login, secure, verify, etc.
- Feature selection based on statistical significance
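
The custom features named above could be computed along these lines. This is a minimal sketch, not the project's code: it assumes "URL entropy" means Shannon entropy over the URL's characters, and the function and feature names are hypothetical. The keyword tuple holds only the three keywords the text names, since the full list is not given.

```python
import math
from collections import Counter

# Only the keywords named in the text; the project's full list is not given.
KEYWORDS = ("login", "secure", "verify")


def shannon_entropy(text):
    """Shannon entropy in bits per character (assumed definition of URL entropy)."""
    if not text:
        return 0.0
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def extract_features(url):
    """Build a small feature dict from the raw URL string."""
    lowered = url.lower()
    features = {
        "url_entropy": shannon_entropy(url),
        "dot_count": url.count("."),
        "uppercase_count": sum(ch.isupper() for ch in url),
    }
    # One binary flag per keyword, matched case-insensitively.
    for kw in KEYWORDS:
        features[f"has_{kw}"] = int(kw in lowered)
    return features


print(extract_features("http://Secure-Login.example.com/verify"))
```

Each URL maps to a flat dictionary, so a list of these dictionaries can be loaded straight into a pandas DataFrame for feature selection and training.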

Modeling

- Trained multiple ML classifiers (e.g., Random Forest, XGBoost)
- Tuned hyperparameters with GridSearchCV (basic tuning) and Optuna (advanced tuning)
- The final model showed high accuracy with no signs of overfitting
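
As a rough illustration of the grid-search step, the sketch below tunes a Random Forest with scikit-learn's GridSearchCV and compares training accuracy with held-out accuracy. Several things here are assumptions: the data is synthetic rather than the project's URL features, the parameter grid is illustrative, and the train/test gap is just one common overfitting check, since the source does not say how overfitting was assessed. The Optuna stage is not shown.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the URL feature matrix (assumption: not the real data).
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Illustrative grid; the project's actual search space is not given.
param_grid = {"n_estimators": [50, 100], "max_depth": [4, None]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# A large gap between these two scores would suggest overfitting.
train_acc = search.score(X_train, y_train)
test_acc = search.score(X_test, y_test)

print(search.best_params_)
print(f"train={train_acc:.3f} test={test_acc:.3f} gap={train_acc - test_acc:.3f}")
```

The test split is held out of the search entirely, so the reported test accuracy is not influenced by the choice of hyperparameters.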

Results

Trained and tuned multiple models, including logistic regression, neural networks, and boosting algorithms (XGBoost), achieving 99.47% accuracy.

References

- UCI Repository (2024)
- Kaggle URL Dataset (2023)
- Kaggle URL Dataset (2021)
- PhishTank Database

Tech Stack

- Python, scikit-learn, pandas, Optuna
- BeautifulSoup and Requests for crawling
- Jupyter, Matplotlib, and Seaborn for analysis
