Modeling of Diamond Prices: A Data-Driven Analysis

Electronic Business and Big Data Infrastructures

This project models diamond prices using the Kaggle Diamonds dataset, which contains over 53,000 observations and 10 variables. It applies machine learning models and preprocessing techniques to examine the factors that drive diamond prices, and it evaluates how correcting skewness in the data affects model performance. The findings are distilled into recommendations for practitioners building data-driven pricing strategies.
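The text above does not specify how skewness was corrected. A common choice for a right-skewed target such as price is a log transform, and the sketch below assumes that choice. It uses synthetic log-normal values as a hypothetical stand-in for the dataset's price column, measures sample skewness before and after applying `np.log1p`, and checks that `np.expm1` maps log-scale values back to prices. The helper `sample_skewness` is illustrative and not taken from the project.

```python
import numpy as np

def sample_skewness(x: np.ndarray) -> float:
    """Fisher-Pearson sample skewness (biased estimator, g1)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    m2 = np.mean(centered ** 2)  # second central moment
    m3 = np.mean(centered ** 3)  # third central moment
    return float(m3 / m2 ** 1.5)

# Hypothetical stand-in for the price column: log-normal draws give a
# long right tail. These are synthetic numbers, not the Kaggle data.
rng = np.random.default_rng(seed=0)
price = rng.lognormal(mean=8.0, sigma=1.0, size=53_940)

# Assumed skewness correction: log transform (log1p also tolerates zeros).
log_price = np.log1p(price)

print(f"skew before: {sample_skewness(price):.2f}")
print(f"skew after:  {sample_skewness(log_price):.2f}")

# Predictions made on the log scale are mapped back to prices with expm1.
restored = np.expm1(log_price)
assert np.allclose(restored, price)
```

Using `log1p`/`expm1` rather than `log`/`exp` keeps the transform defined at zero and numerically accurate for small values; for prices in the thousands the two pairs behave almost identically.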



Models and Results

Studied Models
  • Linear Regression
  • K-Nearest Neighbors (KNN)
  • Decision Tree
  • Random Forest
  • Gradient Boosting
  • Neural Network

Key Results

The Neural Network performed best once skewness was corrected, achieving the lowest prediction error of all models and showing how strongly complex models depend on preprocessing. Tree-based models such as Random Forest performed robustly even on untransformed data, which makes them a good fit for applications that favor a simple pipeline. Simpler models such as Linear Regression and KNN performed consistently but did not match the accuracy of the ensemble methods (Random Forest and Gradient Boosting). Overall, model selection should be tailored to the use case, trading off accuracy against computational cost and interpretability.
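One way to measure the effect of skewness correction on a given model, sketched here under stated assumptions, is to fit the same model on raw and on log-transformed prices and score both on the original price scale. The example uses ordinary least squares via `np.linalg.lstsq` as a stand-in for the Linear Regression entry, a `log1p` transform as the assumed correction, and synthetic carat/price data with a hypothetical power-law relationship. The helper names `fit_ols`, `predict_ols`, `rmse`, and `evaluate` are illustrative, not the project's code, and the printed numbers are not the project's results.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients with an intercept column prepended."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ols(coef, X):
    """Apply fitted coefficients (intercept first) to new rows."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef

def rmse(y_true, y_pred):
    """Root mean squared error in the units of y_true."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def evaluate(X_train, y_train, X_test, y_test, log_target):
    """Fit OLS on raw or log1p prices; always score RMSE on the raw price scale."""
    target = np.log1p(y_train) if log_target else y_train
    coef = fit_ols(X_train, target)
    pred = predict_ols(coef, X_test)
    if log_target:
        pred = np.expm1(pred)  # back-transform before comparing with true prices
    return rmse(y_test, pred)

# Synthetic, hypothetical data: price grows as a power of carat with
# multiplicative noise. Not derived from the Kaggle dataset.
rng = np.random.default_rng(seed=1)
n = 2_000
carat = rng.uniform(0.2, 3.0, size=n)
price = np.exp(7.8 + 1.7 * np.log(carat) + rng.normal(0.0, 0.25, size=n))

X = np.log(carat)     # single feature: log carat
split = int(0.8 * n)  # simple 80/20 hold-out split
for log_target in (False, True):
    score = evaluate(X[:split], price[:split], X[split:], price[split:], log_target)
    print(f"log_target={log_target}: test RMSE = {score:,.0f}")
```

Scoring after `np.expm1` keeps both variants comparable in price units. Back-transforming a log-scale prediction estimates something closer to the median price than the mean, so log-target models can be biased slightly low on average; a correction such as Duan's smearing estimator is sometimes applied.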


Final Paper
