pyspark-mllib

Objectives: Using the PySpark MLlib and GraphFrames libraries, perform 1) classification and clustering tasks with Random Forest and K-means and 2) graph analysis tasks. This material is from UIUC MCS coursework.

graphframes pyspark-mllib

Updated Jan 11, 2022
Python

zahidadeel / product-recommendation-engine

Product recommendation engine by using LSH Jaccard distance

pyspark recommendation-engine pyspark-mllib

Updated Nov 30, 2021
Python
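
As a rough illustration of the idea named in the description above, the sketch below computes exact Jaccard distance between two item sets and a MinHash estimate of it, which is the building block of Jaccard-based LSH. It is not taken from the repository: the function names, the MD5-based hash, and the sample baskets are all assumptions made for this example. Spark ML provides a `MinHashLSH` estimator for the distributed version.

```python
import hashlib


def jaccard_distance(a, b):
    """Exact Jaccard distance: 1 - |a & b| / |a | b|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(a & b) / len(union)


def _hash(seed, item):
    """Deterministic 64-bit hash of (seed, item); MD5 is an arbitrary choice here."""
    digest = hashlib.md5(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(items, num_hashes=128):
    """One minimum hash value per seed; `items` must be non-empty."""
    return [min(_hash(seed, item) for item in items) for seed in range(num_hashes)]


def estimated_jaccard_distance(sig_a, sig_b):
    """Fraction of signature positions that differ approximates the distance."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return 1.0 - matches / len(sig_a)


# Hypothetical purchase baskets, invented for illustration only.
basket_a = {"laptop", "mouse", "keyboard"}
basket_b = {"mouse", "keyboard", "monitor"}

exact = jaccard_distance(basket_a, basket_b)
approx = estimated_jaccard_distance(
    minhash_signature(basket_a), minhash_signature(basket_b)
)
print(f"exact={exact:.3f} approx={approx:.3f}")
```

The estimate gets closer to the exact distance as `num_hashes` grows; an LSH index would additionally group signatures into bands so that only likely-similar items are compared.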

OmarNouih / Twitter-Streams

Real-Time Sentiment Analysis on Twitter Streams is a web application that categorizes tweets into sentiments such as Negative, Positive, Neutral, or Irrelevant. Built with Apache Kafka, Spark, and PySpark ML models, it offers real-time analysis capabilities.

machine-learning kafka spark streams pyspark spark-streaming kafka-streams pyspark-mllib

Updated May 20, 2024
Python

SyedIkram / Analysis-and-Recommendations-on-YELP-Dataset

Analysis and Recommendations on YELP Dataset

pyspark yelp-dataset pyspark-mllib

Updated Jan 6, 2019
Python

ksashok / Movie-Recommendation-PySpark

Movie Recommendation using Apache Spark MLlib

machine-learning apache-spark pyspark als movie-recommendation spark-submit spark-ml pyspark-mllib pyspark-machine-learning

Updated Jul 28, 2019
Python

cgDeepLearn / LearnSpark

😄 Learn pyspark

python spark pyspark mllib pyspark-mllib

Updated Mar 30, 2018
Python

divithraju / divith-raju-pipeline-hadoop-pyspark

This project presents a comprehensive data pipeline designed to predict customer churn using historical customer data. By leveraging Hadoop and PySpark, this pipeline efficiently processes large datasets, performs feature engineering, and trains a machine learning model to identify customers at risk of leaving.

linux open-source data database hadoop pipeline ubuntu bigdata apache project python3 pyspark software-engineering dataengineering hadoop-hdfs pyspark-mllib pyspark-python project-repository

Updated Aug 17, 2024
Python

prakashdontaraju / dietary-trends-pyspark

12-year nutrient intake analysis across financial classes with PySpark and KMeans clustering

Updated Mar 9, 2022
Python

aviggithub / PySpark

PySpark is the Python API for Apache Spark, used to perform computations on large datasets or simply to analyze them.

python ai spark linear-regression ml datascience pyspark spark-streaming pyspark-tutorial pyspark-mllib machine-leanring pyspark-python pyspark-machine-learning pyspark-ml model-building-and-evaluation python-spark pysparkml

Updated Jan 22, 2023
Python

limz1986 / PySpark-ML-Model-DataBricks

An introduction to PySpark: creating a simple multiple regression ML model and hosting it on a Databricks cluster

linear-regression pyspark databricks databricks-notebooks pyspark-mllib

Updated Sep 24, 2022
Python

kabbina / Big-Data

spark hadoop pagerank spark-streaming pyspark-mllib spam-ham-python

Updated Jan 2, 2022
Python

SayamAlt / TMDB-Movies-End-to-End-ETL-and-ML-Pipeline

This project encompasses end-to-end ETL and ML pipeline development. Data ingestion from the TMDB API covered top-rated, current, upcoming, and popular movies with genres. EDA was performed to derive insights and observations. A regression model with an R² score of 97% was developed to predict average movie ratings.

spark exploratory-data-analysis data-transformation data-visualization feature-engineering regression-models data-ingestion extract-transform-load azure-key-vault etl-pipeline pyspark-mllib azure-databricks mlflow mlflow-tracking model-training-and-evaluation

Updated Jan 5, 2025
Python
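
For readers unfamiliar with the R² figure quoted in the description above, here is a minimal sketch of how the coefficient of determination is computed from true and predicted values. The function name and the sample ratings are made up for illustration and are not the project's data or code.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot


# Invented example ratings, not taken from the TMDB dataset.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 7.0, 9.0]
score = r2_score(actual, predicted)
print(f"R² = {score:.3f}")
```

A score of 1.0 means the predictions match exactly, and 0.0 means the model does no better than predicting the mean of the true values.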

Chan2k20 / Wine-Prediction-Prediction-Model-On-AWS-EMR

Implements a random forest classifier with PySpark on AWS EMR to classify wines. The model is then deployed in a Docker container.

docker random-forest aws-s3 aws-ec2 ec2-instance aws-emr-clusters pyspark-mllib wine-quality-prediction

Updated Apr 10, 2024
Python

nikisthaa / credit-card-fraud-detection

Credit card fraud detection using pyspark ML

distributed-computing pyspark-mllib credit-card-fraud-detection

Updated Oct 26, 2023
Python

Here are 21 public repositories matching this topic...

autodeployai / pypmml-spark

biagiom / spark-network-traffic-classifier

animenon / pyspark_mllib

gabridego / spark-exercises

asif7adil / scSPARKL

ravichoudharyds / Pyspark_Recommendation_System

steve303 / spark_MLlib_graphf

zahidadeel / product-recommendation-engine

OmarNouih / Twitter-Streams

SyedIkram / Analysis-and-Recommendations-on-YELP-Dataset

ksashok / Movie-Recommendation-PySpark

cgDeepLearn / LearnSpark

divithraju / divith-raju-pipeline-hadoop-pyspark

prakashdontaraju / dietary-trends-pyspark

aviggithub / PySpark

limz1986 / PySpark-ML-Model-DataBricks

kabbina / Big-Data

SayamAlt / TMDB-Movies-End-to-End-ETL-and-ML-Pipeline

Chan2k20 / Wine-Prediction-Prediction-Model-On-AWS-EMR

nikisthaa / credit-card-fraud-detection
