Housing Price Prediction using Ames Dataset

Project type

Data Science

Date

May 2025

Location

Los Angeles, California

Role

Data Science Engineer

Using the Ames, Iowa housing dataset from Kaggle, I built an end-to-end machine-learning pipeline in Python to predict residential sale prices. Initial exploratory analysis covered the sale-price distribution, per-feature missingness, and correlation heatmaps. Based on that analysis, I dropped features with more than 40% missing values and imputed the rest (median for numeric features, a constant “Missing” category for categorical ones). Median imputation, one-hot encoding, and feature scaling are wrapped in a single scikit-learn Pipeline so that preprocessing is reproducible.
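The preprocessing described above can be sketched roughly as follows. This is an illustrative reconstruction, not the project's notebook code: the helper names `drop_sparse_columns` and `build_preprocessor`, the tiny made-up data frame, and the choice to scale only the numeric columns with `StandardScaler` are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def drop_sparse_columns(df: pd.DataFrame, threshold: float = 0.40) -> pd.DataFrame:
    """Drop columns whose fraction of missing values exceeds `threshold`."""
    missing_fraction = df.isna().mean()
    return df.loc[:, missing_fraction <= threshold]


def build_preprocessor(df: pd.DataFrame) -> ColumnTransformer:
    """Median-impute and scale numerics; fill and one-hot encode categoricals."""
    numeric_cols = df.select_dtypes(include="number").columns.tolist()
    categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

    numeric_steps = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),  # assumed scaler; the write-up does not name one
    ])
    categorical_steps = Pipeline([
        ("impute", SimpleImputer(strategy="constant", fill_value="Missing")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])
    return ColumnTransformer([
        ("num", numeric_steps, numeric_cols),
        ("cat", categorical_steps, categorical_cols),
    ])


# Made-up rows standing in for the Ames CSV (values are illustrative only).
toy = pd.DataFrame({
    "LotArea": [8450.0, 9600.0, np.nan, 9550.0, 14260.0],
    "Neighborhood": ["CollgCr", "Veenker", np.nan, "Crawfor", "NoRidge"],
    "PoolQC": [np.nan, np.nan, np.nan, np.nan, "Gd"],  # 80% missing, so dropped
})

kept = drop_sparse_columns(toy)
preprocessor = build_preprocessor(kept)
features = preprocessor.fit_transform(kept)
print(list(kept.columns), features.shape)
```

Keeping the imputers inside the pipeline means the medians and category lists are learned only from the data passed to `fit`, which avoids leaking information from validation folds.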

I trained and tuned three models (linear regression, ridge regression, and random forest) with 5-fold cross-validation and selected the random forest, which achieved R² = 0.89 and RMSE ≈ $28,715. All steps, from raw CSV to final model, are documented in annotated Jupyter notebooks, and the outputs (predictions, charts, and the pickled model) are version-controlled on GitHub. This project demonstrates my skills in data cleaning, feature engineering, model selection, and delivering production-ready artifacts.
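A model comparison of this kind might look like the sketch below. It runs on synthetic data from `make_regression` rather than the Ames dataset, so its scores are unrelated to the figures reported above. The hyperparameters (`alpha=1.0`, `n_estimators=50`) and the shuffled `KFold` split are assumptions, and no tuning step is shown.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_validate

# Synthetic stand-in for the preprocessed feature matrix and sale prices.
X, y = make_regression(n_samples=200, n_features=10, noise=15.0, random_state=0)

models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

# 5-fold cross-validation, using the same folds for every model.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for name, model in models.items():
    scores = cross_validate(
        model, X, y, cv=cv,
        scoring=("r2", "neg_root_mean_squared_error"),
    )
    results[name] = {
        "r2": scores["test_r2"].mean(),
        # scikit-learn reports RMSE negated so that higher is better; flip the sign.
        "rmse": -scores["test_neg_root_mean_squared_error"].mean(),
    }

for name, metrics in results.items():
    print(f"{name}: R2={metrics['r2']:.3f}, RMSE={metrics['rmse']:.2f}")
```

Averaging R² and RMSE over the same five folds keeps the comparison between models like-for-like.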