Upgraded with rich content #29

Open
wants to merge 24 commits into
base: master
ab9dac8
Add files via upload
weirdolucifer Nov 24, 2018
ec0044d
updating more modules
weirdolucifer Nov 24, 2018
886066c
Added More Features to the System
weirdolucifer Oct 10, 2020
31187ca
Updated README file
weirdolucifer Oct 10, 2020
271e128
Updated engine and server with more modules
weirdolucifer Oct 10, 2020
b0151a2
Requirements.txt file
weirdolucifer Oct 10, 2020
3d5f26b
Added Content based Recommneder module
weirdolucifer Oct 10, 2020
ef731ff
Updated flask server file with all routers required for frontend
weirdolucifer Oct 10, 2020
f48dba4
Added Static files and flask templates
weirdolucifer Oct 10, 2020
4648057
Added pretrained files for item basedcollaborative recommender system
weirdolucifer Oct 10, 2020
c5b3dcc
Reference book for recommender system
weirdolucifer Oct 10, 2020
0d6fe50
Update README.md
weirdolucifer Oct 10, 2020
ab5f5cf
Updated the item_based_features pretrained data compatible with Python3
weirdolucifer Oct 10, 2020
2bd13a7
Updated the project for python3.6+
weirdolucifer Oct 10, 2020
953639e
Added the item based recommendation Colab file for generating the pre…
weirdolucifer Oct 10, 2020
6f40470
Update README.md
weirdolucifer Oct 10, 2020
5e7ff52
Added Demo files
weirdolucifer Oct 10, 2020
424c313
Merge branch 'master' of https://github.com/Weirdolucifer/spark-movie…
weirdolucifer Oct 10, 2020
e2e4e66
Update README.md
weirdolucifer Oct 10, 2020
1036622
Update README.md
weirdolucifer Oct 10, 2020
cdd8a5b
Update README.md
weirdolucifer Oct 10, 2020
a06bbcb
Update README.md
weirdolucifer Oct 26, 2020
1913012
Update README.md
TanishQ10 Sep 19, 2021
1401952
Update README.md
weirdolucifer Oct 6, 2021
Binary file added 9781785884856-BUILDING_RECOMMENDATION_ENGINES.pdf
Binary file not shown.
18 changes: 3 additions & 15 deletions LICENSE
@@ -1,17 +1,5 @@
This repository contains a variety of content; some developed by Jose A. Dianes, and some from third-parties. The third-party content is distributed under the license provided by those parties.
The parent repository of this project contains the basic content, which was developed by Jose A. Dianes (2016).
This project is extended with rich modules by Avinash Yadav.

The content developed by Jose A. Dianes is distributed under the following license:

Copyright 2016 Jose A Dianes

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
The content developed by Avinash Yadav
151 changes: 116 additions & 35 deletions README.md
@@ -1,56 +1,137 @@
# A scalable on-line movie recommender using Spark and Flask
# MovieRec
This project is a web app for movie websites like Netflix, where a user can create an account and watch movies. The app focuses mainly on the quality of the recommendations it makes to the user, and it combines several recommendation techniques suited to that goal. When users open the app they are prompted to log in, or to register if they do not yet have an account. Once logged in, they can view the movies they have already watched and rated in the dashboard. Every button and field in the GUI is labeled with its role, so the app should be easy to use.

This Apache Spark tutorial will guide you step by step through using the [MovieLens dataset](http://grouplens.org/datasets/movielens/) to build a movie recommender using [collaborative filtering](https://en.wikipedia.org/wiki/Recommender_system#Collaborative_filtering) with [Spark's Alternating Least Squares](https://spark.apache.org/docs/latest/mllib-collaborative-filtering.html) implementation. It is organised in two parts. The first one is about getting and parsing movies and ratings data into Spark RDDs. The second is about building and using the recommender and persisting it for later use in our on-line recommender system.

This tutorial can be used independently to build a movie recommender model based on the MovieLens dataset. Most of the code in the first part, about how to use ALS with the public MovieLens dataset, comes from my solution to one of the exercises proposed in the [CS100.1x Introduction to Big Data with Apache Spark by Anthony D. Joseph on edX](https://www.edx.org/course/introduction-big-data-apache-spark-uc-berkeleyx-cs100-1x), which has also been [**publicly available since 2014 at Spark Summit**](https://databricks-training.s3.amazonaws.com/movie-recommendation-with-mllib.html). Starting from there, I've made minor modifications to use a larger dataset, then added code to store and reload the model for later use, and finally a web service using Flask.
## Welcome to MovieRec
![homepage](images/homepage.png)

In any case, the use of this algorithm with this dataset is not new (you can [Google about it](https://www.google.co.uk/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=movielens%20dataset%20collaborative%20filtering)), which is why we put the emphasis on ending up with a usable model in an on-line environment and on how to use it in different situations. But I was truly inspired by the exercise proposed in that course, and I highly recommend that you take it. There you will learn not just ALS but many other Spark algorithms.
## Login and Register Module
This module is what the user sees first after opening the web app. Here the user is asked to enter a username and password to log in and see the dashboard. Users who are not yet registered can go to the registration page, fill in their details, and register. After registration, a few questions are asked to address the “cold start” problem.

The second part of the tutorial explains how to use Python/Flask to build a web service on top of Spark models. By doing so, you will be able to develop a complete **on-line movie recommendation service**.
![content_based](images/content_based.png)

## Part I: [Building the recommender](notebooks/building-recommender.ipynb)
## Watch List Module
This module serves registered users: it displays the movies the user has rated and lets the user change those ratings.

## Part II: [Building and running the web service](notebooks/online-recommendations.ipynb)
![watchlist](images/watchlist.png)

## Quick start
## Top K Recommended List Module
This module shows the user a list of the top K recommended movies, computed with user-based collaborative filtering.
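To illustrate the idea behind user-based collaborative filtering, here is a minimal plain-Python sketch. It is not the project's Spark code: the function names, the toy `ratings` dictionary, and the choice of cosine similarity over co-rated movies are all assumptions made for this example.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two users over the movies both have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[m] * b[m] for m in common)
    norm_a = sqrt(sum(a[m] ** 2 for m in common))
    norm_b = sqrt(sum(b[m] ** 2 for m in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_for_user(ratings, user, k=3):
    """Rank movies the user has not rated by a similarity-weighted average
    of the ratings given by other users."""
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], other_ratings)
        if sim <= 0:
            continue  # ignore users with no overlap
        for movie, rating in other_ratings.items():
            if movie in ratings[user]:
                continue  # already rated by the target user
            scores[movie] = scores.get(movie, 0.0) + sim * rating
            weights[movie] = weights.get(movie, 0.0) + sim
    predicted = {m: scores[m] / weights[m] for m in scores}
    # Highest predicted rating first; ties broken alphabetically.
    return sorted(predicted, key=lambda m: (-predicted[m], m))[:k]


# Hypothetical toy data: user -> {movie: rating}
ratings = {
    "alice": {"Heat": 5, "Alien": 4},
    "bob": {"Heat": 5, "Alien": 4, "Up": 5, "Jaws": 1},
    "carol": {"Heat": 1, "Alien": 5, "Up": 1, "Jaws": 5},
}
print(top_k_for_user(ratings, "alice", k=2))
```

Because `bob` rates the shared movies exactly as `alice` does, his ratings carry more weight than `carol`'s, so his favourite unseen movie ranks first.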

The file `server/server.py` starts a [CherryPy](http://www.cherrypy.org/) server running a
[Flask](http://flask.pocoo.org/) `app.py` to start a RESTful
web server wrapping a Spark-based `engine.py` context. Through its API we can
perform on-line movie recommendations.
![user_based](images/user_based.png)

Please refer to the [second notebook](notebooks/online-recommendations.ipynb) for detailed instructions on how to run and use the service.
## Movie Details and Similar Movies Module
This module shows the details of a particular movie along with its predicted rating. A separate section lists similar movies, selected by their attributes and tags using item-item collaborative filtering. The module is invoked whenever the user clicks on a movie to see its details.
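As a rough illustration of ranking similar movies by their tags, the sketch below compares movies with cosine similarity over tag counts. It assumes each movie is described by a bag of tags; the `features` data and function names are hypothetical, and the project's pretrained `item_based_features` may be built differently.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two tag-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similar_movies(features, title, k=2):
    """Return up to k other movies, most similar to `title` first."""
    target = features[title]
    scored = [
        (cosine(target, vec), other)
        for other, vec in features.items()
        if other != title
    ]
    # Highest similarity first; ties broken alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for score, other in scored[:k] if score > 0]


# Hypothetical toy data: movie -> tag counts
features = {
    "Alien": Counter(["sci-fi", "horror", "space"]),
    "Aliens": Counter(["sci-fi", "action", "space"]),
    "Gravity": Counter(["sci-fi", "space"]),
    "Up": Counter(["animation", "family"]),
}
print(similar_movies(features, "Alien", k=3))
```

Movies that share no tags with the selected movie get a similarity of zero and are dropped, so the list can be shorter than `k`.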

## Contributing
![item](images/item.png)
![item_based](images/item_based.png)

Contributions are welcome! For bug reports or requests please [submit an issue](https://github.com/jadianes/spark-movie-lens/issues).
# Tutorial Guide
## A scalable on-line movie recommender using Spark and Flask
This Apache Spark tutorial will guide you step by step through using the [MovieLens dataset](http://grouplens.org/datasets/movielens/) to build a movie recommender using [collaborative filtering](https://en.wikipedia.org/wiki/Recommender_system#Collaborative_filtering) with [Spark's Alternating Least Squares](https://spark.apache.org/docs/latest/mllib-collaborative-filtering.html) implementation. It is organised in two parts. The first one is about getting and parsing movies and ratings data into Spark RDDs. The second is about building and using the recommender and persisting it for later use in our on-line recommender system.

## Contact
This tutorial can be used independently to build a movie recommender model based on the MovieLens dataset. Most of the code in the first part, about how to use ALS with the public MovieLens dataset, comes from my solution to one of the exercises proposed in the [CS100.1x Introduction to Big Data with Apache Spark by Anthony D. Joseph on edX](https://www.edx.org/course/introduction-big-data-apache-spark-uc-berkeleyx-cs100-1x). Starting from there, I've added different techniques, with modifications to use a larger dataset, then code to store and reload the model for later use, and finally a web service using Flask.

Feel free to contact me to discuss any issues, questions, or comments.
In any case, the use of this algorithm with this dataset is not new (you can [Google about it](https://www.google.com/search?ei=tJSAX5WYC6Se4-EPlc6z-AU&q=movielens+dataset+recommender+system&oq=movielens+dataset+recommender+system&gs_lcp=CgZwc3ktYWIQAzIECAAQRzIECAAQRzIECAAQRzIECAAQRzIECAAQRzIECAAQRzIECAAQRzIECAAQR1AAWABghTloAHACeACAAQCIAQCSAQCYAQCqAQdnd3Mtd2l6yAEIwAEB&sclient=psy-ab&ved=0ahUKEwiVwqno-6fsAhUkzzgGHRXnDF8Q4dUDCA0&uact=5)), which is why we put the emphasis on ending up with a usable model in an on-line environment and on how to use it in different situations. But I was truly inspired by the exercise proposed in that course, and I highly recommend that you take it. There you will learn not just ALS but many other Spark algorithms.

* Twitter: [@ja_dianes](https://twitter.com/ja_dianes)
* GitHub: [jadianes](https://github.com/jadianes)
* LinkedIn: [jadianes](https://www.linkedin.com/in/jadianes)
* Website: [jadianes.me](http://jadianes.me)
The second part of the tutorial explains how to use Python/Flask to build a web service on top of Spark models. By doing so, you will be able to develop a complete **on-line movie recommendation service**.

## License
### Part I: [Building the recommender](notebooks/building-recommender.ipynb)

### Part II: [Building and running the web service](notebooks/online-recommendations.ipynb)

### Part III: [Pretraining model for online recommendation (item-based collaborative filtering)](item_based_collaborative_filtering_colab/item_based_movie_recommender.ipynb)

# Installation Guide
This project requires Java. On a Linux system, install it with:
```
sudo apt-get install openjdk-8-jdk-headless
```
You must have Python 3.6+ installed on your system, since this is an upgraded version of the project. The older version of this project is available [here](https://github.com/Weirdolucifer/spark-movie-lens/tree/v0.1.0).
###### Download Apache Spark from the official site. I recommend using the same version that I use, for a painless setup.
```
wget -q https://downloads.apache.org/spark/spark-3.0.1/spark-3.0.1-bin-hadoop2.7.tgz
```
###### Extract this archive and move the resulting folder to your home directory.

Clone this repository:
```
git clone https://github.com/Weirdolucifer/spark-movie-lens
```
If you don't have pip installed, install pip3:
```
sudo apt-get install python3-pip
```
Set up a virtual environment and activate it to avoid dependency issues (the `mkvirtualenv` and `workon` commands are provided by virtualenvwrapper):
```
mkvirtualenv venv
workon venv
```
Install default-libmysqlclient-dev for flask-mysqldb:
```
sudo apt install default-libmysqlclient-dev
```
Install the required dependencies using the following command:
```
pip3 install -r requirements.txt
```
MySQL database setup:
Here, I have removed the password from the MySQL root login; you can set your own password instead. The commands below create the database and the table used by the application.
```
mysql -u root -p;

mysql> CREATE DATABASE flaskapp;
mysql> USE mysql;
mysql> UPDATE user SET plugin='mysql_native_password' WHERE User='root';
mysql> FLUSH PRIVILEGES;

mysql> USE flaskapp;
mysql> CREATE TABLE `users` (
`ID` int(20) NOT NULL,
`Password` char(60) DEFAULT NULL,
`Name` varchar(40) DEFAULT NULL,
`Genre1` varchar(40) DEFAULT NULL,
`Genre2` varchar(40) DEFAULT NULL,
PRIMARY KEY (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
mysql> exit;

mysql -u root;
```
###### Make sure your MySQL server keeps running.

# Data Set
Download the dataset by running `download_dataset.sh`.
###### Move the `item_based_features` folder to `/datasets/ml-latest`.
For convenience, I have replaced `/datasets/ml-latest/ratings.csv` with `/datasets/ml-latest-small/ratings.csv` so that the application can run locally.

# Instructions to run Application
- Make sure the folder `spark-3.0.1-bin-hadoop2.7` is in your home directory.
- Open your network settings and find your IPv4 address.
- Go to `home/<username>/spark-3.0.1-bin-hadoop2.7/conf`, make a copy of the `spark-env.sh.template` file, and rename the copy to `spark-env.sh`.
- Add `SPARK_MASTER_PORT=5435` and `SPARK_MASTER_HOST=<Your IPv4 Address>` to the `spark-env.sh` file.
- Go to the project folder, find the `server.py` file, and update `'server.socket_host': '<Your IPv4 Address>'`.
- The file `server/server.py` starts a [CherryPy](http://www.cherrypy.org/) server running a [Flask](http://flask.pocoo.org/) `app.py` to start a RESTful web server wrapping a Spark-based `engine.py` context. Through its API we can perform on-line movie recommendations.
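The `spark-env.sh` additions described above can be generated with a small helper. This is only a sketch: the function name `spark_env_lines` and the example address `192.168.1.20` are made up for illustration, and the port default simply mirrors the value used in these instructions.

```python
import ipaddress


def spark_env_lines(master_host, master_port=5435):
    """Return the two lines to append to conf/spark-env.sh."""
    # Raises ValueError if master_host is not a well-formed IPv4 address.
    ipaddress.IPv4Address(master_host)
    if not 1 <= master_port <= 65535:
        raise ValueError(f"invalid port: {master_port}")
    return [
        f"SPARK_MASTER_PORT={master_port}",
        f"SPARK_MASTER_HOST={master_host}",
    ]


# Hypothetical address; replace it with the IPv4 address of your machine.
print("\n".join(spark_env_lines("192.168.1.20")))
```

Validating the address before writing it helps catch a mistyped value early, before Spark fails to bind to it.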
##### If you are not using Spark's distributed mode:
- Update `start-server.sh` with `~/spark-3.0.1-bin-hadoop2.7/bin/spark-submit server.py`
- Run `./start-server.sh`. You'll get the server link at the end of execution.

##### If you are using Spark's distributed mode:
- Go to `home/<username>/spark-3.0.1-bin-hadoop2.7/sbin` and run the `start-master.sh` script (master node).
- After that, you can start a slave (worker) process on other systems that have the same directory structure by running `start-slave.sh spark://<MASTER'S_IPv4_ADDRESS>:5435`
- Then run `start-server.sh` on the slave systems, after updating `start-server.sh` with `~/spark-3.0.1-bin-hadoop2.7/bin/spark-submit --master spark://<MASTER'S_IPv4_ADDRESS>:5435 server.py`
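The local and distributed launch commands differ only in the `--master` option. The sketch below builds either form as an argument list; the helper name `spark_submit_command` and the example address are hypothetical, and the command is printed rather than executed.

```python
def spark_submit_command(spark_home, script="server.py",
                         master_host=None, master_port=5435):
    """Build the spark-submit argument list for local or distributed mode."""
    cmd = [f"{spark_home.rstrip('/')}/bin/spark-submit"]
    if master_host is not None:
        # Distributed mode: point spark-submit at the standalone master.
        cmd += ["--master", f"spark://{master_host}:{master_port}"]
    cmd.append(script)
    return cmd


SPARK_HOME = "~/spark-3.0.1-bin-hadoop2.7"

# Local mode
print(" ".join(spark_submit_command(SPARK_HOME)))
# Distributed mode, with a hypothetical master address
print(" ".join(spark_submit_command(SPARK_HOME, master_host="192.168.1.20")))
```

The printed line can be pasted into `start-server.sh`. If you pass the list to `subprocess.run` instead, expand `~` first with `os.path.expanduser`, because the shell is not involved.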

This repository contains a variety of content; some developed by Jose A. Dianes, and some from third-parties. The third-party content is distributed under the license provided by those parties.
Please refer to the [second notebook](notebooks/online-recommendations.ipynb) for detailed instructions on how to run and use the service.

The content developed by Jose A. Dianes is distributed under the following license:

Copyright 2016 Jose A Dianes

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
## Contributing
Contributions are welcome! Raise a PR :wink:

http://www.apache.org/licenses/LICENSE-2.0
## License

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
The parent repository of this project contains the basic content, which was developed by [**Jose A. Dianes**](https://github.com/jadianes) (2016).
This project is extended with rich modules by [**Avinash Yadav**](https://github.com/Weirdolucifer) and [**Ankit Kumar**](https://github.com/TanishQ10)

The content developed by Avinash Yadav

![Report Card](https://github-readme-stats.vercel.app/api/pin?username=weirdolucifer&repo=spark-movie-lens&title_color=fff&icon_color=f9f9f9&text_color=9f9f9f&bg_color=151515)