Movie and TV Show Recommendation Engine in R

A movie recommendation system powered by machine learning can create a personalized viewing experience that keeps viewers satisfied and engaged. Building a top-notch movie recommendation system is crucial because it directly impacts user retention and platform popularity. It’s a complex mix of technology and creativity, with techniques ranging from content-based filtering to collaborative filtering, and in this article we build one in the R programming language.

Understanding Recommendation Systems

So, let’s talk about recommendation systems. These nifty systems use fancy algorithms and machine learning to predict what you might like and suggest stuff you’ll be interested in. How do they do it? Well, they go through a bunch of steps like collecting, storing, analyzing, and filtering data to give you personalized recommendations.

Now, there are two main types of recommendation systems.

Collaborative Filtering: Collaborative filtering uses data from user-item interactions to create suggestions. This approach splits into two main types: user-based and item-based. User-based filtering spots folks with matching tastes and recommends items these similar users enjoyed. It uses measures like Pearson correlation or cosine similarity to figure out how alike users are. Item-based filtering, by contrast, looks at how similar items are to each other and suggests items close to what you’ve already rated. This method often scales better than user-based filtering.
Content-Based Filtering: Content-based filtering recommends items that match what a user liked before. This approach banks on item traits, such as movie genres or actors. The system looks at these features and suggests new stuff that fits what the user digs. For content-based filtering to work well, you need a ton of details about each item. You also need a full picture of the user: their clicks, ratings, likes, the works.
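To make the user-similarity step concrete, here is a minimal base-R sketch on a hypothetical 3-user, 4-movie matrix. The names toy, cosine_sim, and pearson_sim are illustrative and do not come from recommenderlab. Both measures are computed only over the movies that the two users have both rated. The sketch then makes a user-based recommendation from the most similar neighbour.

```r
# Hypothetical user x movie ratings; NA means "not rated"
toy <- matrix(
  c(5, 4, 1, NA,
    4, 5, 2, 5,
    1, 2, 5, 4),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("alice", "bob", "carol"), c("m1", "m2", "m3", "m4"))
)

# Cosine similarity over the movies both users rated
cosine_sim <- function(a, b) {
  both <- !is.na(a) & !is.na(b)
  if (!any(both)) return(NA_real_)
  sum(a[both] * b[both]) / (sqrt(sum(a[both]^2)) * sqrt(sum(b[both]^2)))
}

# Pearson correlation over co-rated movies (needs at least two of them)
pearson_sim <- function(a, b) {
  both <- !is.na(a) & !is.na(b)
  if (sum(both) < 2) return(NA_real_)
  cor(a[both], b[both])
}

others <- c("bob", "carol")
cos_scores <- sapply(others, function(u) cosine_sim(toy["alice", ], toy[u, ]))
pearson_scores <- sapply(others, function(u) pearson_sim(toy["alice", ], toy[u, ]))
round(cos_scores, 3)      # bob 0.966, carol 0.507
round(pearson_scores, 3)  # bob 0.839, carol -1.000

# User-based pick: movies the closest neighbour rated 4+ that alice has not seen
best <- names(which.max(pearson_scores))
recs <- names(which(is.na(toy["alice", ]) & toy[best, ] >= 4))
recs
```

Pearson correlation centers each user's ratings on that user's own mean. As a result, carol, whose ratings mirror alice's, scores -1, while cosine similarity on the raw ratings still reports a moderately positive value. This difference is one reason Pearson is often used when users rate on different personal scales.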

Alright, now let’s dive into how movie and TV show recommendation engines work. These engines are all about analyzing your behavior, preferences, and even your demographic info to suggest content that’s right up your alley. They use collaborative filtering to see what you and other users have in common, and content-based filtering to focus on the specific attributes of the movies and shows.

Setting Up Your Environment in R

To begin setting up the environment in R for a movie recommendation system, one needs to install several key libraries.

recommenderlab: R package for building recommendation systems.
ggplot2: R package for data visualization with a grammar of graphics approach.
data.table: R package for efficient data manipulation, especially for large datasets.
reshape2: R package for data reshaping and aggregation, facilitating analysis and visualization.
dplyr: R package for data manipulation; it also provides the %>% pipe operator.

Step 1: Data Collection

To build our recommendation system, we used the MovieLens dataset. The movies.csv and ratings.csv files used in this project are available here:

Dataset Link: Movies Data, Rating Data

This data consists of 105339 ratings of 10329 movies.

Step 2: Install and Load the Required Libraries

First we will install and load the required libraries.
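The original code for this step is not shown. The following is a minimal sketch of what it likely does, assuming both CSV files are in the working directory. It checks for the five packages listed above, attaches the ones that are installed, and reads ratings.csv. If the file is missing, it falls back to the six rows shown in the output so that the snippet still runs.

```r
# Install any missing packages once with:
# install.packages(c("recommenderlab", "ggplot2", "data.table", "reshape2", "dplyr"))
pkgs <- c("recommenderlab", "ggplot2", "data.table", "reshape2", "dplyr")

# Check availability without stopping on a missing package
available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
if (any(!available)) {
  message("Missing packages: ", paste(pkgs[!available], collapse = ", "))
}

# Attach the packages that are installed
for (p in pkgs[available]) library(p, character.only = TRUE)

# Read the MovieLens ratings (assumed file name/location);
# the fallback reproduces the six rows printed below
ratings <- if (file.exists("ratings.csv")) {
  read.csv("ratings.csv")
} else {
  data.frame(
    userId    = rep(1L, 6),
    movieId   = c(16L, 24L, 32L, 47L, 50L, 110L),
    rating    = c(4.0, 1.5, 4.0, 4.0, 4.0, 4.0),
    timestamp = c(1217897793, 1217895807, 1217896246,
                  1217896556, 1217896523, 1217896150)
  )
}
head(ratings)
```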


Output:

  userId movieId rating  timestamp
1      1      16    4.0 1217897793
2      1      24    1.5 1217895807
3      1      32    4.0 1217896246
4      1      47    4.0 1217896556
5      1      50    4.0 1217896523
6      1     110    4.0 1217896150

Step 3: Data Preprocessing

Now we will preprocess the data.
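The following is a base-R sketch of this preprocessing. It assumes movies.csv has the MovieLens columns movieId, title, and genres. It uses strsplit in place of data.table's tstrsplit and hard-codes the 18 genre names from the output header, so any genre label outside that list is ignored. If movies.csv is absent, it falls back to the six movies shown in the output.

```r
# Movie metadata (assumed file name); fallback = the six movies printed below
movies <- if (file.exists("movies.csv")) {
  read.csv("movies.csv", stringsAsFactors = FALSE)
} else {
  data.frame(
    movieId = 1:6,
    title = c("Toy Story (1995)", "Jumanji (1995)", "Grumpier Old Men (1995)",
              "Waiting to Exhale (1995)", "Father of the Bride Part II (1995)",
              "Heat (1995)"),
    genres = c("Adventure|Animation|Children|Comedy|Fantasy",
               "Adventure|Children|Fantasy", "Comedy|Romance",
               "Comedy|Drama|Romance", "Comedy", "Action|Crime|Thriller"),
    stringsAsFactors = FALSE
  )
}

# The 18 genre columns that appear in the output
genre_list <- c("Action", "Adventure", "Animation", "Children", "Comedy", "Crime",
                "Documentary", "Drama", "Fantasy", "Film-Noir", "Horror", "Musical",
                "Mystery", "Romance", "Sci-Fi", "Thriller", "War", "Western")

# Split "Adventure|Animation|..." into one character vector per movie
split_genres <- strsplit(movies$genres, "|", fixed = TRUE)

# Binary matrix: one row per movie, one 0/1 column per genre
genre_mat <- t(vapply(split_genres,
                      function(g) as.integer(genre_list %in% g),
                      integer(length(genre_list))))
colnames(genre_mat) <- genre_list

# Attach movieId and title to the genre indicators
SearchMatrix <- cbind(movies[, c("movieId", "title")], genre_mat)
head(SearchMatrix)
```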

Output:

  movieId title Action Adventure Animation Children Comedy Crime Documentary Drama Fantasy Film-Noir Horror Musical Mystery Romance Sci-Fi Thriller War Western
1 1 Toy Story (1995) 0 1 1 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0
2 2 Jumanji (1995) 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0
3 3 Grumpier Old Men (1995) 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0
4 4 Waiting to Exhale (1995) 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0
5 5 Father of the Bride Part II (1995) 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
6 6 Heat (1995) 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0

This step involves preprocessing the genre information from the movie dataset. We split the genre strings into individual genres using tstrsplit. We then create a binary matrix genre_mat where each row represents a movie and each column represents a genre, with 1 indicating the presence of a genre for a movie. This matrix is then combined with the movie data to form the SearchMatrix.
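A genre matrix like genre_mat is exactly the kind of item-feature data that content-based filtering needs. The following illustrative sketch is not part of the original pipeline. It builds a user profile as the average genre vector of the movies the user liked (here, hypothetically, just Toy Story) and ranks the remaining movies by cosine similarity to that profile.

```r
# Six movies and their MovieLens genre strings (the rows shown above)
titles <- c("Toy Story (1995)", "Jumanji (1995)", "Grumpier Old Men (1995)",
            "Waiting to Exhale (1995)", "Father of the Bride Part II (1995)",
            "Heat (1995)")
genres <- c("Adventure|Animation|Children|Comedy|Fantasy",
            "Adventure|Children|Fantasy", "Comedy|Romance",
            "Comedy|Drama|Romance", "Comedy", "Action|Crime|Thriller")

# Binary movie x genre matrix, built the same way as genre_mat
split_genres <- strsplit(genres, "|", fixed = TRUE)
all_genres <- sort(unique(unlist(split_genres)))
genre_mat <- t(vapply(split_genres, function(g) as.integer(all_genres %in% g),
                      integer(length(all_genres))))
dimnames(genre_mat) <- list(titles, all_genres)

# User profile: mean genre vector of the liked movies (hypothetical user)
liked <- c("Toy Story (1995)")
profile <- colMeans(genre_mat[liked, , drop = FALSE])

# Score each unseen movie by cosine similarity to the profile
cosine <- function(x, y) sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
candidates <- setdiff(titles, liked)
scores <- apply(genre_mat[candidates, , drop = FALSE], 1, cosine, y = profile)
round(sort(scores, decreasing = TRUE), 3)
```

Jumanji shares three of Toy Story's five genres and ranks first. Heat shares none and scores 0.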

Step 4: Visualizing the Data

Now we will visualize the data.
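The plotting code for this step is not shown. The following sketch produces an equivalent bar chart, assuming ratings has a rating column as in ratings.csv. It uses ggplot2 when that package is installed and falls back to base R's barplot otherwise. The fallback data are the six ratings shown in Step 2.

```r
# Ratings (assumed file name); fallback = the six rows shown in Step 2
ratings <- if (file.exists("ratings.csv")) {
  read.csv("ratings.csv")
} else {
  data.frame(rating = c(4.0, 1.5, 4.0, 4.0, 4.0, 4.0))
}

# How often each rating value occurs
rating_counts <- table(ratings$rating)
print(rating_counts)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # One bar per distinct rating value
  p <- ggplot(ratings, aes(x = factor(rating))) +
    geom_bar(fill = "steelblue") +
    labs(title = "Distribution of ratings", x = "Rating", y = "Count")
  print(p)
} else {
  barplot(rating_counts, main = "Distribution of ratings",
          xlab = "Rating", ylab = "Count")
}
```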

Output:

[Figure: bar chart of the distribution of ratings]

This plot shows the distribution of ratings in the dataset. It helps us understand the overall rating behavior of users.

Top-Rated Movies

Now we will visualize the top-rated movies.
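The following sketch shows how the table behind the top-rated plot could be computed in base R. The helper top_rated and its parameters are hypothetical. The default min_ratings = 50 mirrors the "more than 50 ratings" filter used for this plot. The demo uses a tiny made-up dataset with the threshold lowered to 1.

```r
# Rank movies by average rating, keeping only movies with more than
# min_ratings ratings, and return the top n with their titles
top_rated <- function(ratings, movies, min_ratings = 50, n = 10) {
  avg <- tapply(ratings$rating, ratings$movieId, mean)
  cnt <- tapply(ratings$rating, ratings$movieId, length)
  stats <- data.frame(movieId = as.integer(names(avg)),
                      avg_rating = as.numeric(avg),
                      n_ratings = as.integer(cnt))
  stats <- stats[stats$n_ratings > min_ratings, ]
  stats <- head(stats[order(-stats$avg_rating), ], n)
  out <- merge(stats, movies[, c("movieId", "title")], by = "movieId")
  out[order(-out$avg_rating), ]  # merge() sorts by key, so re-sort
}

# Tiny hypothetical example (real data would keep min_ratings = 50)
ratings <- data.frame(userId  = c(1, 2, 3, 1, 2, 3, 1),
                      movieId = c(1, 1, 1, 2, 2, 6, 6),
                      rating  = c(5, 4, 5, 3, 2, 4, 5))
movies <- data.frame(movieId = c(1, 2, 6),
                     title = c("Toy Story (1995)", "Jumanji (1995)", "Heat (1995)"))
top <- top_rated(ratings, movies, min_ratings = 1, n = 2)
top
```

Passing the result to ggplot2 or barplot (average rating per title) gives a chart like the one below.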

Output:

[Figure: top 10 movies by average rating (movies with more than 50 ratings)]

This plot shows the top 10 movies based on average ratings, considering only movies with more than 50 ratings. It provides insights into the highest-rated movies in the dataset.

Step 5: Create Rating Matrix

Now we will create the rating matrix for the recommendation engine.
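The following minimal sketch of this conversion assumes long-format ratings with userId, movieId, and rating columns. Instead of dcast, it fills a user x movie matrix by index assignment. This produces the same wide shape, with NA for unrated movies. It then coerces the matrix to a realRatingMatrix, but only if recommenderlab is installed. The tiny ratings table and the u/m row and column prefixes are hypothetical.

```r
# Hypothetical long-format ratings
ratings <- data.frame(userId  = c(1, 1, 2, 2, 3, 3),
                      movieId = c(1, 2, 1, 6, 2, 6),
                      rating  = c(5, 3, 4, 4, 2, 5))

# Wide user x movie matrix; NA = movie not rated by that user
users <- sort(unique(ratings$userId))
items <- sort(unique(ratings$movieId))
rating_mat <- matrix(NA_real_, nrow = length(users), ncol = length(items),
                     dimnames = list(paste0("u", users), paste0("m", items)))
rating_mat[cbind(match(ratings$userId, users),
                 match(ratings$movieId, items))] <- ratings$rating

# Sparse recommenderlab format, if the package is available
if (requireNamespace("recommenderlab", quietly = TRUE)) {
  library(recommenderlab)
  rating_matrix <- as(rating_mat, "realRatingMatrix")
  print(rating_matrix)
}
```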

Output:

668 x 10325 rating matrix of class ‘realRatingMatrix’ with 105339 ratings.

We transform the ratings data into a matrix format suitable for the recommendation engine. Using dcast, we create a user-item matrix where rows represent users and columns represent movies, with the values being the ratings. This matrix is then converted into a realRatingMatrix object, which is the required format for the recommenderlab package.

Step 6: Build Recommendation Engine

Now we will build our recommendation engine.
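The output below is consistent with a query of recommenderlab's method registry. The following sketch is guarded so that it only runs when the package is installed. recommenderRegistry$get_entries lists the methods available for realRatingMatrix data, and each entry carries a description and a list of parameters.

```r
# Runs only if recommenderlab is installed
if (requireNamespace("recommenderlab", quietly = TRUE)) {
  library(recommenderlab)

  # Every recommender method registered for realRatingMatrix data
  recommender_models <- recommenderRegistry$get_entries(dataType = "realRatingMatrix")
  print(names(recommender_models))

  # Short description of each method
  print(lapply(recommender_models, "[[", "description"))

  # Tunable parameters (and defaults) of IBCF, used in the next step
  print(recommender_models$IBCF_realRatingMatrix$parameters)
}
```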

Output:

'HYBRID_realRatingMatrix' 'ALS_realRatingMatrix' 'ALS_implicit_realRatingMatrix' 'IBCF_realRatingMatrix' ...
$HYBRID_realRatingMatrix 'Hybrid recommender that aggegates several recommendation strategies using weighted averages.'
$ALS_realRatingMatrix 'Recommender for explicit ratings based on latent factors, calculated by alternating least squares algorithm.'
$ALS_implicit_realRatingMatrix 'Recommender for implicit data based on latent factors, calculated by alternating least squares algorithm.'
$IBCF_realRatingMatrix 'Recommender based on item-based collaborative filtering.'
$LIBMF_realRatingMatrix 'Matrix factorization with LIBMF via package recosystem (https://cran.r-project.org/web
$POPULAR_realRatingMatrix 'Recommender based on item popularity.'
$RANDOM_realRatingMatrix 'Produce random recommendations (real ratings).'
$RERECOMMEND_realRatingMatrix 'Re-recommends highly rated items (real ratings).'
$SVD_realRatingMatrix 'Recommender based on SVD approximation with column-mean imputation.'
$SVDF_realRatingMatrix 'Recommender based on Funk SVD with gradient descend (https://sifter.org/~simon/journal/20061211.html).'
$UBCF_realRatingMatrix 'Recommender based on user-based collaborative filtering.'

We list every recommendation method that recommenderlab registers for realRatingMatrix data, along with a one-line description of each. Item-based collaborative filtering (IBCF), which we build in the next step, is one of them. Its k parameter sets how many of the most similar items are kept for each item.

Step 7: Build the IBCF Model

IBCF’s Inner Workings:

Similarity Measurement: IBCF figures out how alike items are based on user ratings. It uses measures like cosine similarity or Pearson correlation to crunch these numbers.
Making Picks: For each user, IBCF looks at the items they rated highly, then finds the other items most similar to them. These lookalikes end up as suggestions for the user.
Pros and Things to Ponder: IBCF scales better than user-based CF because the item-item similarities can be computed once, ahead of time, instead of being computed for every user at recommendation time. Keeping only the k most similar items per item also keeps the stored similarity matrix sparse. IBCF does not solve the "cold start" problem, though. Its similarities still come from past user ratings, so a brand-new item with no ratings has no similarity scores and cannot be recommended until someone rates it. To get the best out of IBCF, you need to tune parameters such as k (the number of neighbors), which affects both accuracy and the quality of its suggestions.
Fitting into the Big Picture: IBCF doesn’t work alone. It’s part of a bigger system that cleans data, trains models, checks how good they are, and puts them to use. IBCF is just one tool in the box. Recommendation engines use it along with other methods (like different CF types, content-based filtering, and mix-and-match approaches) to give each user tailored suggestions based on what they do and what items are like.
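The steps above can be sketched in a few lines of base R. This is a simplified illustration under stated assumptions, not recommenderlab's implementation, which differs in details such as rating normalization. The sketch works as follows:

* It treats missing ratings as 0 when computing cosine similarity between item columns.
* It keeps the k = 2 most similar items per item.
* It scores each unrated item as a similarity-weighted average of the user's own ratings.

The matrix rmat and the helper names are hypothetical.

```r
# Hypothetical user x item ratings (NA = unrated)
rmat <- matrix(c(5, 4, NA, 1,
                 4, NA, 1, 2,
                 NA, 5, 2, 1,
                 1, 2, 5, 4),
               nrow = 4, byrow = TRUE,
               dimnames = list(paste0("u", 1:4), paste0("i", 1:4)))

# 1. Similarity measurement: cosine similarity between item columns,
#    with missing ratings treated as 0 (a simplification)
r0 <- rmat
r0[is.na(r0)] <- 0
norms <- sqrt(colSums(r0^2))
S <- crossprod(r0) / outer(norms, norms)
diag(S) <- 0  # an item is not its own neighbour

# 2. Keep only the k most similar items in each row
keep_top_k <- function(S, k) {
  for (i in seq_len(nrow(S))) {
    S[i, rank(-S[i, ], ties.method = "first") > k] <- 0
  }
  S
}
S_k <- keep_top_k(S, k = 2)

# 3. Making picks: score each unrated item as the similarity-weighted
#    average of the user's own ratings on its kept neighbours
score_items <- function(user, rmat, S_k) {
  r <- rmat[user, ]
  rated <- !is.na(r)
  sapply(names(r)[!rated], function(j) {
    w <- S_k[j, rated]
    if (sum(w) == 0) NA_real_ else sum(w * r[rated]) / sum(w)
  })
}
score_items("u1", rmat, S_k)
```

For u1, the only unrated item is i3. Its two kept neighbours are i4 and i2. Because i4 is by far the more similar of the two, the prediction (about 2.1) is pulled toward u1's rating of 1 for i4.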

Output:

Recommender of type ‘IBCF’ for ‘realRatingMatrix’ learned using 668 users.
'dgCMatrix'
10325 10325

[Figure: Heatmap of first rows and columns of the item similarity matrix]

We build the recommendation model using Item-Based Collaborative Filtering (IBCF) with 30 nearest neighbors. We inspect the similarity matrix and visualize the similarities between the first 20 items using a heatmap.
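The following sketch shows this step with recommenderlab and is guarded so that it only runs when the package is installed. Because ratings.csv may not be at hand, it uses the small MovieLens sample that ships with recommenderlab (MovieLense) as a stand-in. The sample is filtered to movies with more than 100 ratings and users with more than 50, and these thresholds are assumptions. Swap in the rating matrix from Step 5 to reproduce the output above.

```r
if (requireNamespace("recommenderlab", quietly = TRUE)) {
  library(recommenderlab)

  # Stand-in data (assumption): recommenderlab's bundled MovieLense ratings
  data("MovieLense", package = "recommenderlab")
  rating_matrix <- MovieLense[, colCounts(MovieLense) > 100]
  rating_matrix <- rating_matrix[rowCounts(rating_matrix) > 50, ]

  # Item-based CF keeping the 30 most similar items per item
  ibcf_model <- Recommender(rating_matrix, method = "IBCF", parameter = list(k = 30))
  print(ibcf_model)

  # The learned item-item similarity matrix (sparse)
  model_details <- getModel(ibcf_model)
  print(class(model_details$sim))
  print(dim(model_details$sim))

  # Heatmap of similarities among the first 20 items
  print(image(model_details$sim[1:20, 1:20],
              main = "Heatmap of the first rows and columns"))
}
```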

Step 8: Predict Recommendations

Now we will predict recommendations.
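The following sketch shows the split-and-predict step with recommenderlab. As an assumption, it uses the same MovieLense stand-in and filters as before. Users are assigned at random to an 80% training set or a 20% test set. IBCF is trained on the training users, and predict returns a top-10 list for each test user.

```r
if (requireNamespace("recommenderlab", quietly = TRUE)) {
  library(recommenderlab)

  # Stand-in for the Step 5 rating matrix (assumption)
  data("MovieLense", package = "recommenderlab")
  rating_matrix <- MovieLense[, colCounts(MovieLense) > 100]
  rating_matrix <- rating_matrix[rowCounts(rating_matrix) > 50, ]

  # Random 80/20 split of users
  set.seed(1)
  in_train <- sample(c(TRUE, FALSE), nrow(rating_matrix),
                     replace = TRUE, prob = c(0.8, 0.2))
  train_data <- rating_matrix[in_train, ]
  test_data <- rating_matrix[!in_train, ]

  # Train IBCF and request 10 recommendations per test user
  ibcf_model <- Recommender(train_data, method = "IBCF", parameter = list(k = 30))
  top10 <- predict(ibcf_model, newdata = test_data, n = 10)

  # Items recommended to the first test user, mapped to their labels
  first_user_items <- top10@items[[1]]
  print(top10@itemLabels[first_user_items])
}
```

MovieLense uses movie titles as item labels, so the recommended labels are already readable. With the matrix from Step 5, the labels are movieIds and need to be looked up in movies.csv.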

Output:

"Now and Then (1995)"
72 "Kicking and Screaming (1995)"
84 "Last Summer in the Hamptons (1995)"
90 "Journey of August King, The (1995)"
131 "Frankie Starlight (1995)"
271 "Losing Isaiah (1995)"
279 "My Family (1995)"
309 "Red Firecracker, Green Firecracker (Pao Da Shuang Deng) (1994)"
330 "Tales from the Hood (1995)"
352 "Crooklyn (1994)"

We split the data into training and testing sets (80% training, 20% testing). We then predict the top 10 movie recommendations for the users in the testing set. For a specific user (e.g., user 1), we extract the recommended movie IDs and get their titles.

Step 9: Evaluate the Recommendation Engine

Now we will evaluate our recommendation engine.
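The following sketch shows the evaluation with recommenderlab, again on the MovieLense stand-in (an assumption). evaluationScheme holds out 20% of users. For each held-out user, given = 10 ratings are revealed to the model and the rest are hidden. calcPredictionAccuracy then compares the predicted ratings with the hidden ones. The values given = 10 and goodRating = 4 are assumed choices. The value of given must not exceed any test user's number of ratings, and the user filter guarantees this here.

```r
if (requireNamespace("recommenderlab", quietly = TRUE)) {
  library(recommenderlab)

  # Stand-in for the Step 5 rating matrix (assumption)
  data("MovieLense", package = "recommenderlab")
  rating_matrix <- MovieLense[, colCounts(MovieLense) > 100]
  rating_matrix <- rating_matrix[rowCounts(rating_matrix) > 50, ]

  # 80% train / 20% test users; 10 known ratings per test user
  set.seed(1)
  scheme <- evaluationScheme(rating_matrix, method = "split", train = 0.8,
                             given = 10, goodRating = 4)

  ibcf_model <- Recommender(getData(scheme, "train"), method = "IBCF",
                            parameter = list(k = 30))

  # Predict the hidden ratings and compare them with the true values
  predicted <- predict(ibcf_model, getData(scheme, "known"), type = "ratings")
  accuracy <- calcPredictionAccuracy(predicted, getData(scheme, "unknown"))
  print(accuracy)
}
```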

Output:

RMSE 1.49711559147115
MSE  2.24135509422602
MAE  1.14430992655367

We evaluate the recommendation engine using a split method (80% train, 20% test) and measure prediction accuracy with RMSE, MSE, and MAE. These metrics show how close the predicted ratings are to the actual ratings.
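The three metrics are easy to compute by hand. With p the predicted and r the actual ratings, MSE = mean((p - r)^2), RMSE = sqrt(MSE), and MAE = mean(|p - r|). The following base-R sketch applies these formulas to five hypothetical rating pairs.

```r
# Hypothetical actual and predicted ratings for five user-movie pairs
actual    <- c(4.0, 3.5, 5.0, 2.0, 4.5)
predicted <- c(3.5, 3.0, 4.0, 3.0, 4.5)

errors <- predicted - actual
mse  <- mean(errors^2)     # mean squared error
rmse <- sqrt(mse)          # root mean squared error, in rating units
mae  <- mean(abs(errors))  # mean absolute error
c(RMSE = rmse, MSE = mse, MAE = mae)  # RMSE about 0.707, MSE 0.5, MAE 0.6
```

RMSE is never smaller than MAE, because squaring weights large errors more heavily. In the output above, an MAE of about 1.14 means predictions miss the true rating by a little over one star on average. The RMSE of about 1.50 is higher because the bigger misses count for more.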

Conclusion

In this article, we started by getting the data ready: we extracted movie genres and created a rating matrix. Then we used the recommenderlab package in R to train an item-based collaborative filtering (IBCF) model.

Once the model was trained, we predicted the top 10 recommendations for each test user and checked how well the model performed on a held-out test set. To understand how items are related in the recommendation system, we examined the item similarity matrix as a heatmap. We also visualized the distribution of ratings and the top-rated movies.