
pyspark-mllib

Here are 21 public repositories matching this topic...

This project presents a data pipeline that predicts customer churn from historical customer data. Built on Hadoop and PySpark, the pipeline processes large datasets, performs feature engineering, and trains a machine learning model to identify customers at risk of leaving.

  • Updated Aug 17, 2024
  • Python

This project covers end-to-end ETL and ML pipeline development. Data ingested from the TMDB API includes top-rated, current, upcoming, and popular movies along with their genres. Exploratory data analysis (EDA) produced several insights and observations, and a regression model for predicting average movie ratings reached an R² score of 0.97.

  • Updated Jan 5, 2025
  • Python
