A movie recommendation system built on machine learning can create a personalized viewing experience that keeps viewers satisfied and engaged. A good recommender matters because it directly affects user retention and platform popularity. Building one is a mix of technology and creativity, with techniques ranging from content-based filtering to collaborative filtering. This article builds one in the R programming language.
So, let’s talk about recommendation systems. These systems use algorithms and machine learning to predict what you might like and suggest things you’ll be interested in. How do they do it? They collect, store, analyze, and filter data to produce personalized recommendations.
Now, there are two main types of recommendation systems: content-based filtering and collaborative filtering.
Alright, now let’s dive into how movie and TV show recommendation engines work. These engines are all about analyzing your behavior, preferences, and even your demographic info to suggest content that’s right up your alley. They use collaborative filtering to see what you and other users have in common, and content-based filtering to focus on the specific attributes of the movies and shows.
To begin setting up the environment in R for a movie recommendation system, one needs to install several key libraries.
To build our recommendation system, we use the MovieLens dataset. The movies.csv and ratings.csv files used in this project are available here:
Dataset Link: Movies Data, Rating Data
This data consists of 105339 ratings applied over 10329 movies.
First we will install and load the required libraries.
Output:
  userId movieId rating  timestamp
1      1      16    4.0 1217897793
2      1      24    1.5 1217895807
3      1      32    4.0 1217896246
4      1      47    4.0 1217896556
5      1      50    4.0 1217896523
6      1     110    4.0 1217896150
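The loading step might look roughly like the sketch below. It is an illustration, not the article's original code: it uses base R's `read.csv` on an inline copy of the six rows shown above so that it runs without the CSV files. With the full dataset you would pass the path to ratings.csv instead; where that file lives on disk is an assumption you need to fill in.

```r
# Inline copy of the six rows printed above, so the sketch runs
# without the real file. With the full dataset, replace `text = ...`
# with the path to ratings.csv (its location is an assumption).
ratings_csv <- "userId,movieId,rating,timestamp
1,16,4.0,1217897793
1,24,1.5,1217895807
1,32,4.0,1217896246
1,47,4.0,1217896556
1,50,4.0,1217896523
1,110,4.0,1217896150"

ratings <- read.csv(text = ratings_csv)

# Quick structural checks before any modelling
str(ratings)
print(head(ratings))
print(range(ratings$rating))
```

Checking the column types and the rating range early makes it easier to spot a mis-parsed file before building the rating matrix.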
Now we will preprocess the data.
Output:
  movieId title                              Action Adventure Animation Children Comedy Crime Documentary Drama Fantasy Film-Noir Horror Musical Mystery Romance Sci-Fi Thriller War Western
1       1 Toy Story (1995)                        0         1         1        1      1     0           0     0       1         0      0       0       0       0      0        0   0       0
2       2 Jumanji (1995)                          0         1         0        1      0     0           0     0       1         0      0       0       0       0      0        0   0       0
3       3 Grumpier Old Men (1995)                 0         0         0        0      1     0           0     0       0         0      0       0       0       1      0        0   0       0
4       4 Waiting to Exhale (1995)                0         0         0        0      1     0           0     1       0         0      0       0       0       1      0        0   0       0
5       5 Father of the Bride Part II (1995)      0         0         0        0      1     0           0     0       0         0      0       0       0       0      0        0   0       0
6       6 Heat (1995)                             1         0         0        0      0     1           0     0       0         0      0       0       0       0      0        1   0       0
This step involves preprocessing the genre information from the movie dataset. We split the genre strings into individual genres using tstrsplit. We then create a binary matrix genre_mat where each row represents a movie and each column represents a genre, with 1 indicating the presence of a genre for a movie. This matrix is then combined with the movie data to form the SearchMatrix.
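The genre encoding described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the article's code: it swaps in base R's `strsplit` for `tstrsplit`, uses a three-movie toy data frame whose genre strings follow the pipe-separated MovieLens format, and the names `genre_list`, `all_genres`, and `search_matrix` are chosen here for illustration.

```r
# Toy stand-in for movies.csv (three rows only)
movies <- data.frame(
  movieId = c(1, 2, 3),
  title   = c("Toy Story (1995)", "Jumanji (1995)", "Grumpier Old Men (1995)"),
  genres  = c("Adventure|Animation|Children|Comedy|Fantasy",
              "Adventure|Children|Fantasy",
              "Comedy|Romance"),
  stringsAsFactors = FALSE
)

# Split each genre string into a character vector of genres
genre_list <- strsplit(movies$genres, "|", fixed = TRUE)

# Columns of the binary matrix: every genre seen in the data
all_genres <- sort(unique(unlist(genre_list)))

# One row per movie, one column per genre; 1 marks presence
genre_mat <- t(vapply(
  genre_list,
  function(g) as.integer(all_genres %in% g),
  integer(length(all_genres))
))
colnames(genre_mat) <- all_genres

# Combine ids and titles with the genre indicators
search_matrix <- cbind(movies[, c("movieId", "title")], genre_mat)
print(search_matrix)
```

Because the genre columns are derived from the data, a genre that never occurs gets no column. The article's output has a fixed set of 18 genre columns, so a fixed genre list would be needed to reproduce it exactly.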
Now we will visualize the data.
Output:
[Figure: distribution of ratings in the dataset]
This plot shows the distribution of ratings in the dataset. It helps us understand the overall rating behavior of users.
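A rating distribution like this can be produced along the following lines. The sketch assumes base graphics (the article's plotting library is not shown) and uses a small hypothetical `ratings` data frame in place of the real 105339 rows.

```r
# Hypothetical toy ratings standing in for ratings.csv
ratings <- data.frame(
  userId  = c(1, 1, 1, 2, 2, 3, 3, 3),
  movieId = c(16, 24, 32, 16, 32, 16, 24, 47),
  rating  = c(4.0, 1.5, 4.0, 3.5, 4.0, 5.0, 2.0, 4.0)
)

# Count how often each rating value occurs
rating_counts <- table(ratings$rating)
print(rating_counts)

# Bar chart of the counts, one bar per distinct rating value
barplot(rating_counts,
        main = "Distribution of ratings",
        xlab = "Rating",
        ylab = "Number of ratings")
```

A bar chart over the distinct values is used rather than a histogram because MovieLens ratings take only a few discrete values in half-star steps.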
Now we will visualize the top-rated movies.
Output:
[Figure: top 10 movies by average rating]
This plot shows the top 10 movies based on average ratings, considering only movies with more than 50 ratings. It provides insights into the highest-rated movies in the dataset.
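The ranking behind such a plot can be sketched as below. The data and the threshold are assumptions for illustration: the toy `ratings` data frame is hypothetical, and `min_ratings` is set to 2 instead of the article's 50 so that the filter has a visible effect on so few rows.

```r
# Hypothetical toy ratings: movie 4 has a single perfect rating
ratings <- data.frame(
  movieId = c(1, 1, 1, 2, 2, 3, 3, 3, 4),
  rating  = c(5, 4, 4.5, 3, 2, 4, 5, 5, 5)
)

# Average rating and number of ratings per movie
avg_rating <- tapply(ratings$rating, ratings$movieId, mean)
n_ratings  <- tapply(ratings$rating, ratings$movieId, length)
stats <- data.frame(
  movieId    = as.integer(names(avg_rating)),
  avg_rating = as.vector(avg_rating),
  n_ratings  = as.vector(n_ratings)
)

min_ratings <- 2   # the article uses 50 on the full dataset
top_n       <- 10

# Keep movies with more than min_ratings ratings, then sort by average
popular <- stats[stats$n_ratings > min_ratings, ]
top     <- head(popular[order(-popular$avg_rating), ], top_n)
print(top)

barplot(top$avg_rating, names.arg = top$movieId,
        main = "Top movies by average rating",
        xlab = "movieId", ylab = "Average rating")
```

Movie 4 has the highest average but only one rating, so the threshold removes it. That is the reason for filtering on rating count before ranking by mean.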
Now we will create a rating matrix for the recommendation engine in R.
Output:
668 x 10325 rating matrix of class ‘realRatingMatrix’ with 105339 ratings.
We transform the ratings data into a matrix format suitable for the recommendation engine. Using dcast, we create a user-item matrix where rows represent users and columns represent movies, with the values being the ratings. This matrix is then converted into a realRatingMatrix object, which is the required format for the recommenderlab package.
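The reshaping step can be illustrated without any packages. The sketch below is an assumption-laden stand-in for the `dcast` call described above: it builds the same user-by-movie layout with base R matrix indexing on a hypothetical five-row `ratings` data frame.

```r
# Hypothetical long-format ratings (one row per user-movie pair)
ratings <- data.frame(
  userId  = c(1, 1, 2, 2, 3),
  movieId = c(16, 24, 16, 32, 24),
  rating  = c(4.0, 1.5, 3.5, 4.0, 2.0)
)

users  <- sort(unique(ratings$userId))
movies <- sort(unique(ratings$movieId))

# Start from all-NA: NA means "not rated", which is not the same as 0
rating_mat <- matrix(
  NA_real_,
  nrow = length(users), ncol = length(movies),
  dimnames = list(as.character(users), as.character(movies))
)

# Fill in the observed ratings using (row, column) index pairs
idx <- cbind(match(ratings$userId, users), match(ratings$movieId, movies))
rating_mat[idx] <- ratings$rating
print(rating_mat)

# With recommenderlab loaded, a matrix like this can be converted via
# as(rating_mat, "realRatingMatrix"); that call is not run here.
```

Keeping unrated cells as NA matters because a zero would be read as a very low rating rather than as missing data.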
Now we will build our recommendation engine in R.
Output:
'HYBRID_realRatingMatrix' 'ALS_realRatingMatrix' 'ALS_implicit_realRatingMatrix' 'IBCF_realRatingMatrix'
$HYBRID_realRatingMatrix
'Hybrid recommender that aggegates several recommendation strategies using weighted averages.'
$ALS_realRatingMatrix
'Recommender for explicit ratings based on latent factors, calculated by alternating least squares algorithm.'
$ALS_implicit_realRatingMatrix
'Recommender for implicit data based on latent factors, calculated by alternating least squares algorithm.'
$IBCF_realRatingMatrix
'Recommender based on item-based collaborative filtering.'
$LIBMF_realRatingMatrix
'Matrix factorization with LIBMF via package recosystem (https://cran.r-project.org/web
$POPULAR_realRatingMatrix
'Recommender based on item popularity.'
$RANDOM_realRatingMatrix
'Produce random recommendations (real ratings).'
$RERECOMMEND_realRatingMatrix
'Re-recommends highly rated items (real ratings).'
$SVD_realRatingMatrix
'Recommender based on SVD approximation with column-mean imputation.'
$SVDF_realRatingMatrix
'Recommender based on Funk SVD with gradient descend (https://sifter.org/~simon/journal/20061211.html).'
$UBCF_realRatingMatrix
'Recommender based on user-based collaborative filtering.'
The output lists the recommendation algorithms that recommenderlab offers for a realRatingMatrix, each with a short description. From these we choose the Item-Based Collaborative Filtering (IBCF) method. Its k parameter specifies how many nearest neighbors (the most similar items) are kept for each item.
IBCF’s Inner Workings:
Output:
Recommender of type ‘IBCF’ for ‘realRatingMatrix’
learned using 668 users.
'dgCMatrix'
10325 10325
[Figure: heatmap of the first rows and columns of the similarity matrix]
We build the recommendation model using Item-Based Collaborative Filtering (IBCF) with 30 nearest neighbors. We inspect the similarity matrix and visualize the similarities between the first 20 items using a heatmap.
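To make the item similarity matrix concrete, here is a minimal base-R sketch of the idea behind IBCF. It is not recommenderlab's implementation. Cosine similarity over co-rated users is an assumption made for illustration, the 4 x 4 `rating_mat` is hypothetical, and `k` is 1 rather than 30 because the toy data has only four items.

```r
# Hypothetical 4-user x 4-movie rating matrix; NA means "not rated"
rating_mat <- matrix(
  c( 5,  4, NA,  1,
     4,  5,  1, NA,
     1, NA,  5,  4,
    NA,  1,  4,  5),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("u", 1:4), paste0("m", 1:4))
)

# Cosine similarity of two item columns, using only users who rated both
cosine_sim <- function(a, b) {
  both <- !is.na(a) & !is.na(b)
  if (sum(both) < 2) return(NA_real_)
  sum(a[both] * b[both]) / sqrt(sum(a[both]^2) * sum(b[both]^2))
}

# Full item-item similarity matrix (diagonal left as NA)
items <- colnames(rating_mat)
sim <- matrix(NA_real_, length(items), length(items),
              dimnames = list(items, items))
for (i in items) {
  for (j in items) {
    if (i != j) sim[i, j] <- cosine_sim(rating_mat[, i], rating_mat[, j])
  }
}

# For each item keep only its k most similar neighbours, zero the rest
k <- 1
sim_k <- sim
for (i in items) {
  ranked <- order(sim[i, ], decreasing = TRUE, na.last = NA)
  keep   <- items[head(ranked, k)]
  sim_k[i, setdiff(items, keep)] <- 0
}
print(round(sim_k, 3))
```

Zeroing all but the top k entries in each row is what makes the stored similarity matrix sparse, which fits the 'dgCMatrix' class reported in the output above.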
Now we will predict recommendations.
Output:
"Now and Then (1995)"
72 "Kicking and Screaming (1995)"
84 "Last Summer in the Hamptons (1995)"
90 "Journey of August King, The (1995)"
131 "Frankie Starlight (1995)"
271 "Losing Isaiah (1995)"
279 "My Family (1995)"
309 "Red Firecracker, Green Firecracker (Pao Da Shuang Deng) (1994)"
330 "Tales from the Hood (1995)"
352 "Crooklyn (1994)"
We split the data into training and testing sets (80% training, 20% testing). We then predict the top 10 movie recommendations for the users in the testing set. For a specific user (e.g., user 1), we extract the recommended movie IDs and get their titles.
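The two steps in this paragraph, splitting users and ranking unseen items for one user, can be sketched as follows. Everything here is illustrative: the seed, the similarity values in `sim`, the user's ratings, and the helper `predict_top_n` are hypothetical, and the similarity-weighted average is a common textbook form of item-based scoring rather than a confirmed description of recommenderlab's internals.

```r
# Random 80/20 split of users; the share is approximate, not exact
set.seed(42)  # arbitrary seed, only for reproducibility
user_ids    <- 1:20
in_train    <- sample(c(TRUE, FALSE), length(user_ids),
                      replace = TRUE, prob = c(0.8, 0.2))
train_users <- user_ids[in_train]
test_users  <- user_ids[!in_train]

# Hypothetical item-item similarities, as a trained model would supply
items <- c("m1", "m2", "m3", "m4")
sim <- matrix(c(0,   0.9, 0.2, 0.1,
                0.9, 0,   0.3, 0.2,
                0.2, 0.3, 0,   0.8,
                0.1, 0.2, 0.8, 0),
              nrow = 4, byrow = TRUE, dimnames = list(items, items))

# One test user who has rated m1 and m3 only
user_ratings <- c(m1 = 5, m2 = NA, m3 = 2, m4 = NA)

# Score each unrated item as the similarity-weighted mean of the
# user's own ratings, then return the n best-scoring items
predict_top_n <- function(user_ratings, sim, n = 10) {
  rated   <- names(user_ratings)[!is.na(user_ratings)]
  unrated <- names(user_ratings)[is.na(user_ratings)]
  scores <- vapply(unrated, function(item) {
    w <- sim[item, rated]
    if (sum(w) == 0) return(NA_real_)
    sum(w * user_ratings[rated]) / sum(w)
  }, numeric(1))
  head(sort(scores, decreasing = TRUE), n)
}

top <- predict_top_n(user_ratings, sim, n = 10)
print(top)
```

Here m2 ranks first because it is most similar to m1, which the user rated highly. Mapping the returned item IDs back to titles is then a lookup in the movies table.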
Now we will evaluate our recommendation engine.
Output:
RMSE 1.49711559147115
MSE 2.24135509422602
MAE 1.14430992655367
We evaluate the recommendation engine using a split method (80% train, 20% test) and calculate prediction accuracy using RMSE, MSE, and MAE. These metrics measure how close the predicted ratings are to the actual ratings; lower values are better.
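For reference, the three metrics are simple functions of the prediction errors. The sketch below computes them by hand in base R on hypothetical `actual` and `predicted` vectors; it shows what the numbers mean and is not the evaluation code that produced the output above.

```r
# Hypothetical held-out ratings and the model's predictions for them
actual    <- c(4.0, 3.5, 5.0, 2.0, 4.5)
predicted <- c(3.5, 4.0, 4.0, 3.0, 4.5)

err <- predicted - actual

mse  <- mean(err^2)      # mean squared error
rmse <- sqrt(mse)        # root mean squared error, in rating units
mae  <- mean(abs(err))   # mean absolute error, in rating units

print(c(RMSE = rmse, MSE = mse, MAE = mae))
```

RMSE is the square root of MSE, which matches the reported values: 1.497 squared is about 2.241. Since RMSE penalizes large errors more than MAE does, an RMSE noticeably above the MAE suggests that some predictions are far off.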
In this article, we started by getting the data ready. We created a rating matrix and extracted movie genres. Then, we used the recommenderlab package in R to train a recommendation model called item-based collaborative filtering (IBCF).
Once the model was trained, we generated the top recommendations for users in the testing set and measured prediction accuracy with RMSE, MSE, and MAE. To understand how items are related in the recommendation system, we inspected the similarity matrix with a heatmap. We also visualized the distribution of ratings and the top-rated movies.