Housing Price Prediction using Ames Dataset
Project type
Data Science
Date
May 2025
Location
Los Angeles, California
Role
Data Science Engineer
Using the Ames, Iowa housing dataset from Kaggle, I built an end-to-end machine-learning pipeline in Python to predict residential sale prices. I started with an exploratory analysis: visualizing the sale-price distribution, quantifying missingness, and plotting correlation heatmaps. Based on that analysis, I dropped features with more than 40% missing data and imputed the rest (median for numeric features, a constant "Missing" label for categorical ones). I then wrapped imputation, one-hot encoding, and feature scaling into a single scikit-learn Pipeline so that preprocessing is reproducible.
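The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook code: the helper names `drop_sparse_columns` and `build_preprocessor` are hypothetical, the toy frame only borrows a few Ames-style column names, and `StandardScaler` is assumed as the scaling step since the description does not name one.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def drop_sparse_columns(df, threshold=0.40):
    """Keep only columns whose share of missing values is at most `threshold`."""
    missing_share = df.isna().mean()
    return df.loc[:, missing_share <= threshold]


def build_preprocessor(df):
    """Build a ColumnTransformer: median + scaling for numerics,
    constant "Missing" + one-hot encoding for categoricals."""
    numeric_cols = df.select_dtypes(include="number").columns.tolist()
    categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),  # assumed scaler, not confirmed by the write-up
    ])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="constant", fill_value="Missing")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])
    return ColumnTransformer([
        ("num", numeric, numeric_cols),
        ("cat", categorical, categorical_cols),
    ])


# Toy stand-in for the raw CSV (illustrative values only).
toy = pd.DataFrame({
    "GrLivArea": [1500.0, np.nan, 2100.0, 1200.0, 1800.0],
    "Neighborhood": pd.Series(
        ["NAmes", np.nan, "CollgCr", "NAmes", "OldTown"], dtype=object
    ),
    # 4 of 5 values missing (80%), so this column is dropped.
    "PoolQC": pd.Series([np.nan, np.nan, np.nan, "Gd", np.nan], dtype=object),
})

kept = drop_sparse_columns(toy)
features = build_preprocessor(kept).fit_transform(kept)
```

Keeping the imputers inside the pipeline means their statistics are learned only from the data the pipeline is fitted on, which avoids leaking validation-fold information during cross-validation.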
I trained and tuned three models (linear regression, ridge regression, and random forest) with 5-fold cross-validation and selected the random forest, which achieved R² = 0.89 and RMSE ≈ $28,715. All steps, from raw CSV to final model, are documented in annotated Jupyter notebooks; outputs (predictions, charts, pickled model) are version-controlled on GitHub. This project highlights my skills in data cleaning, feature engineering, model selection, and delivering production-ready artifacts.
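A comparison of this kind might look like the sketch below. It runs on synthetic data from `make_regression` rather than the Ames features, and the hyperparameters (`alpha`, `n_estimators`) are placeholders rather than the tuned values, so the scores it produces are unrelated to the results reported above.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_validate

# Synthetic stand-in for the preprocessed feature matrix and sale prices.
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

# Placeholder hyperparameters; the project's tuned values are not shown here.
models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

# Same 5 shuffled folds for every model so the scores are comparable.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for name, model in models.items():
    scores = cross_validate(
        model, X, y, cv=cv,
        scoring=("r2", "neg_root_mean_squared_error"),
    )
    results[name] = {
        "r2": scores["test_r2"].mean(),
        # scikit-learn reports RMSE negated so that higher is better; flip the sign.
        "rmse": -scores["test_neg_root_mean_squared_error"].mean(),
    }

for name, metrics in results.items():
    print(f"{name}: R2={metrics['r2']:.3f}, RMSE={metrics['rmse']:.2f}")
```

On the real data, each model would typically be wrapped in a pipeline together with the preprocessing steps so that imputation and scaling are refit inside every fold.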







