Sample Solution: Movie Recommender

If you haven’t already, please review the DAIR BoosterPack Flight Plan, Machine Learning: Recommender System, before implementing the Sample Solution below.


The Movie Recommender BoosterPack provides a deployable Sample Solution that allows users to observe and study the application of a Collaborative Filtering model to provide recommendations to users based on their past behaviour. The purpose of this document is to describe the Sample Solution and how it demonstrates the use of advanced technologies.

Any acronyms and terms identified with Capitalized Italics in this document are defined in the Glossary.

Problem Statement

In today’s world, users face an overwhelming array of options in practically any online business they interact with. It happens when choosing a book, a TV show, a movie, a new electronic device, or even groceries. When users have thousands, hundreds, or even just tens of options to choose from, it is essential that businesses provide tools to help them quickly discover products that interest them.

Automatic recommendation systems powered by machine learning aim to solve this problem and have become an essential feature for content providers and retail sites. Recommender Systems use information such as users’ demographics, behaviour, product information, or product ratings provided by users to predict what each user will be most interested in at a particular time. Without recommendations, the user experience is greatly degraded: users are forced to spend too much time searching for items that interest them and are more likely to abandon the task.

Sample Solution

This Sample Solution showcases how TensorFlow and TensorRT can be used to build a Collaborative Filtering model for movie recommendations that runs on a GPU.

Solution Overview

The solution is an end-to-end deep learning recommendation system that applies collaborative filtering to user data.

Collaborative Filtering is a widely used approach to implement recommender systems. Collaborative filtering methods are based on users’ behaviours, activities, or preferences and predict what users will like based on their similarity to other users. A key advantage of the collaborative filtering approach is that it can rely only on observed user behaviour, without requiring extra information from the user or the product, making it easily transferable to different business applications where that information may not be available.
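The idea of recommending items based on similarity to other users can be illustrated with a minimal neighbourhood-based sketch. This is not the neural model used by the Sample Solution; it is a simpler, classic form of collaborative filtering, and the user names, movie IDs, and function names below are hypothetical. Note that it uses only observed behaviour (who watched what), with no user or product attributes.

```python
from math import sqrt

# Hypothetical toy data: user -> set of movie IDs the user has watched.
watched = {
    "alice": {"m1", "m2", "m3"},
    "bob": {"m1", "m2", "m4"},
    "carol": {"m5"},
}


def similarity(a, b):
    """Cosine similarity between two users' sets of watched movies."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def recommend(user, watched, top_k=3):
    """Score unseen movies by the similarity of the users who watched them."""
    seen = watched[user]
    scores = {}
    for other, movies in watched.items():
        if other == user:
            continue
        weight = similarity(seen, movies)
        if weight == 0.0:
            continue  # users with nothing in common contribute nothing
        for movie in movies - seen:
            scores[movie] = scores.get(movie, 0.0) + weight
    # Highest score first; ties are broken by movie ID for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [movie for movie, _ in ranked[:top_k]]


print(recommend("alice", watched))
```

Here "alice" shares two movies with "bob" and none with "carol", so only the movie that "bob" watched and "alice" did not is suggested.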

The solution includes the creation of a Deep Learning model developed in Python and TensorFlow, and its deployment to a production-like environment using the NVIDIA TensorRT library and the TensorRT Inference Server. Python is a programming language widely used in machine learning projects by both beginners and experienced practitioners. TensorFlow and TensorRT are available as part of the DAIR Cloud infrastructure and allow the development of medium- to large-scale, high-performance deep learning models.

The Deep Learning model is a Multilayer Perceptron as proposed in Neural Collaborative Filtering (He et al. 2017). The problem is implemented as a classification problem and a Neural Network is trained by using movies watched and rated by users as positive examples, and unwatched movies as negative examples.
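The way positive and negative examples can be assembled for such a classifier is sketched below. This is an illustration under stated assumptions, not the Sample Solution's actual pipeline: the function name, the data layout, and the ratio of four negatives per positive are all hypothetical choices made for the example.

```python
import random


def build_training_examples(watched, all_movies, negatives_per_positive=4, seed=0):
    """Return (user, movie, label) triples for a binary classifier.

    Label 1 marks a movie the user watched and rated; label 0 marks a
    randomly sampled movie the user has not watched.
    """
    rng = random.Random(seed)
    examples = []
    for user, seen in watched.items():
        unseen = sorted(set(all_movies) - seen)
        for movie in sorted(seen):
            examples.append((user, movie, 1))
            # Sample without replacement for this positive, capped by the
            # number of unseen movies that actually exist.
            k = min(negatives_per_positive, len(unseen))
            for negative in rng.sample(unseen, k):
                examples.append((user, negative, 0))
    return examples


# Hypothetical toy data: integer user IDs mapped to watched movie IDs.
toy_watched = {0: {10, 11}, 1: {12}}
toy_examples = build_training_examples(toy_watched, range(10, 20))
print(len(toy_examples))
```

Because unwatched movies are only assumed to be negative, sampling a fresh subset of them rather than using all of them keeps the classes from becoming overwhelmingly imbalanced.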

The model is trained using the MovieLens dataset, a dataset of movie information and user ratings that is publicly available for non-commercial purposes and commonly used in tutorials and research projects.

Solution Overview Diagram

The diagram below illustrates the structure of the Sample Solution.

The solution consists of an orchestrator that runs on the DAIR Cloud platform to coordinate deployments and two Python sub-systems in the DAIR Cloud: an offline pipeline and an online service. The offline component implements the experimentation phase of a machine learning project. It generates a machine learning model from the data. The online sub-system is the real-time service that makes recommendations to users by making predictions from the model. The offline component is not executed or queried by users in real time. It may run, possibly several times, before the online service is launched, and periodically afterwards to create new, improved models when new data or model types are available. The online component includes a separate client that makes queries to the prediction service. This showcases how a larger system could easily integrate the Sample Solution as a component.
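The last step of the online path, turning model predictions into a list of recommendations, can be sketched as follows. This is a hedged illustration: the function name is hypothetical, and excluding movies the user has already watched is an assumption about typical behaviour rather than a documented feature of the Sample Solution.

```python
import heapq


def top_k_recommendations(scores, already_watched, k=5):
    """Return the k highest-scoring movie IDs the user has not watched yet.

    `scores` maps movie ID -> predicted probability that the user is
    interested in the movie (for example, the output of a trained model).
    """
    candidates = (
        (score, movie)
        for movie, score in scores.items()
        if movie not in already_watched
    )
    # nlargest returns the best (score, movie) pairs in descending order.
    return [movie for _, movie in heapq.nlargest(k, candidates)]


# Hypothetical predictions for one user.
example_scores = {1: 0.9, 2: 0.2, 3: 0.7, 4: 0.95}
print(top_k_recommendations(example_scores, already_watched={4}, k=2))
```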

Component Descriptions

The table below is a summary of the significant components of the Sample Solution:

Component: Summary
Orchestrator: Set of scripts to coordinate the deployment of the rest of the components into the DAIR Cloud service.
Data processing pipeline: Collect data from an external data source and process it to make it usable by the Model Trainer (clean up, properly format, etc.).
Model Trainer: TensorFlow scripts to create, train and evaluate the machine learning model given the preprocessed data. The outputs are a model file and reports about the model performance.
Model Exporter: Transform and export the model generated by the Trainer into a TensorRT model.
GPU Inference Server: Online TensorRT server to make predictions given inputs using a trained model leveraging a GPU.
Light Client: A command-line script that acts as a sample client making online queries to the GPU inference server. Used by users of the recommender system.


Technology Demonstration

This section guides you through a demonstration of a Recommendation System. Using a Recommendation System is compelling because it provides value to users by narrowing their search to the items they are most likely to be interested in.

The demonstration will illustrate how to get recommendations of movies for users that have previously watched and rated other movies.

How to Deploy and Configure

If you’re a DAIR participant with access to a GPU, you can deploy this Sample Solution by following the instructions below. To complete these steps, you must already have or request an account on the DAIR OpenStack cloud with access to a GPU.

  • Log into the DAIR OpenStack cloud environment.
  • Navigate to Projects > Orchestration > Stacks and click the + Launch Stack button.

In the Select Template dialog, select URL as the Template Source and paste the following URL:

into the Template URL field and then click Next.

In the Launch Stack dialog, configure the application as shown in the figure below, providing:

  1. a Stack Name,
  2. a Password for the user (can be anything but not blank),
  3. the Flavor/Instance Type of “v2.medium”, and
  4. the Image of “Ubuntu 22.04 – vGPU – 525”.

Then click Launch.

The sample application will now be deployed, which takes roughly 5-10 minutes to complete. Post-provisioning scripts also run automatically to set up the environment needed for the Sample Solution.

Once complete, you will need to copy the private key so that you can open a console (SSH) to the newly created GPU instance.

To retrieve the private key, click the Stack Name, scroll down until you find the private_key field, and copy its contents to use with your SSH client of choice.

Configuration and Application Launch

Once the application deployment is complete, open a console to the GPU instance via SSH using its external IP address, which can be found by navigating to Project > Compute > Instances as shown below.

Log in over SSH using that IP address, the username ‘ubuntu’, and the private key copied earlier. No password is required.


Application Launch

Once the deployment is complete, you are ready to use the application. To do so, you need the IP of the app. You can find it on the instance detail page as described in the last step of the previous section.

Once you are logged in, attach to the Docker container named trtclient by running the following command:

sudo docker attach trtclient

If it seems to hang, just press Enter again. You are now in the Docker container that holds the client command-line application. To start the client, run the following command:

python3 /workspace/movierec/

The command-line application will show instructions on how to use it. Just type a number (for example, 0) and hit Enter to see movies recommended for a user. Instead of showing information about the user, the program shows some example movies that the user has watched and rated before.

If you want to exit the trtclient container without killing it, press CTRL-p followed by CTRL-q. That detaches from the container but leaves it running, so you can attach to it again later.


Once you are finished exploring the Sample Solution, terminate the application to free up GPU resources for the DAIR community.

Navigate to Projects > Orchestration > Stacks, select the Stack and click Delete Stacks at the top (see figure below). This operation will delete the stack and the associated GPU instance. It should take less than 1 minute to complete.

Technology Considerations

This section describes considerations for usage and adaptation of the Sample Solution.

Deployment Options

The Sample Solution is deployed to a TensorRT inference server, but the TensorFlow model could instead be run directly in production, without exporting the model or running a TensorRT inference server. However, TensorRT is generally more efficient on GPUs, and its predictions are generally faster than those of the alternatives.

Moreover, in this example, for simplicity, both the client and the inference server run on the same host. But typically, they would run on different hosts exposing the appropriate ports between them.

Technology Alternatives

An alternative to TensorFlow for building and training the model is PyTorch, another popular Python framework for machine learning that is also compatible with the TensorRT inference server. There are many comparisons online that discuss when to choose one or the other, including Awni Hannun: PyTorch or TensorFlow?, TensorFlow or PyTorch: The Force is Strong with Which One?, PyTorch vs. TensorFlow — Spotting the Difference, and The Battle: TensorFlow vs Pytorch.

Data Architecture Considerations

The Sample Solution relies on a dataset that, in this case, is implemented as a single text file. There are several considerations regarding the data, the pieces of code that rely on it, how to extend it, and best practices.

The two main components that directly use the dataset are the data pipeline to load it into memory for training and the client to map predictions to actual movie names. Those are the pieces that would need updates if the data architecture changed. For example, the data could be stored in a database and therefore, the data pipeline would need to either dump the database to a file before running the current pipeline or directly query the database system from the code. As examples for the client code, the implementation could load the data to a hash map from a file or from a database or could query a database on every request.
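The client-side option of loading the data into a hash map from a file can be sketched as follows. The column names and the comma-separated layout are assumptions made for illustration; the actual dataset file used by the Sample Solution may be laid out differently, and `load_titles` is a hypothetical helper.

```python
import csv
import io

# Hypothetical file contents; the real dataset file may use another layout.
SAMPLE = """movie_id,title
0,Toy Story (1995)
1,Jumanji (1995)
"""


def load_titles(handle):
    """Build an in-memory hash map from integer movie ID to title."""
    return {int(row["movie_id"]): row["title"] for row in csv.DictReader(handle)}


# An in-memory buffer stands in for an open file on disk.
titles = load_titles(io.StringIO(SAMPLE))

# Map predicted movie IDs back to names, tolerating unknown IDs.
predicted_ids = [1, 0, 42]
print([titles.get(movie_id, "<unknown>") for movie_id in predicted_ids])
```

Keeping the lookup behind a single function like this means that switching to a database later only requires replacing that function, not the code that displays recommendations.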

To extend the data and incorporate new users or movies into the solution, it is necessary to retrain the model. A production solution would retrain the model periodically and would provide personalized recommendations only after users have used the system enough for sufficient data to be collected. Typically, similar projects offer generalized recommendations of popular items before providing personalized ones.
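The fallback from personalized to popularity-based recommendations can be sketched as follows. This is a minimal illustration, not part of the Sample Solution: the function names, the `(user, movie)` pair layout, and the threshold of five ratings are hypothetical.

```python
from collections import Counter


def popular_movies(ratings, k):
    """Return the k most frequently rated movie IDs across all users."""
    counts = Counter(movie for _, movie in ratings)
    return [movie for movie, _ in counts.most_common(k)]


def recommend_with_fallback(user, ratings, personalized, k=3, min_ratings=5):
    """Use the personalized model only for users with enough history.

    `ratings` is a list of (user, movie) pairs and `personalized` is a
    callable (user, k) -> list of movie IDs, e.g. a wrapper around a model.
    """
    history = sum(1 for rated_by, _ in ratings if rated_by == user)
    if history < min_ratings:
        return popular_movies(ratings, k)
    return personalized(user, k)


# Hypothetical toy ratings: "a" is rated three times, "b" twice, "c" once.
toy_ratings = [
    ("u1", "a"), ("u2", "a"), ("u3", "a"),
    ("u1", "b"), ("u2", "b"),
    ("u1", "c"),
]
print(recommend_with_fallback("new_user", toy_ratings, lambda u, k: [], k=2))
```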

Finally, since the data may contain users’ information, developers implementing a similar solution should follow standard practices for critical data management. For instance, the training component only needs identifiers, not the details of specific users or items. At prediction time, however, the client needs to identify a user (for example, from a login), translate that identity to an ID, and do the same for movies. Therefore, similar solutions should apply standard industry best practices in the components that collect and store data from users, export data for model training, and query data to translate IDs to plain text.
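One way to keep user details out of the training data is to translate external keys to opaque integer IDs at the boundary, as in the minimal sketch below. The `IdIndex` class and the example logins are hypothetical and are not taken from the Sample Solution.

```python
class IdIndex:
    """Bidirectional mapping between external keys and dense integer IDs."""

    def __init__(self):
        self._to_id = {}
        self._to_key = []

    def id_for(self, key):
        """Return the integer ID for key, assigning the next free ID if new."""
        if key not in self._to_id:
            self._to_id[key] = len(self._to_key)
            self._to_key.append(key)
        return self._to_id[key]

    def key_for(self, index):
        """Return the external key that was assigned the given integer ID."""
        return self._to_key[index]


# Hypothetical logins; only the integer IDs would be exported for training.
users = IdIndex()
events = [("alice@example.com", "movie-7"), ("bob@example.com", "movie-7")]
training_rows = [(users.id_for(login), movie) for login, movie in events]
print(training_rows)
```

Only the component that holds the index can translate an ID back to a login, which limits where sensitive data needs to be protected.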

Security Considerations

Once deployed, there is a minor risk that bad actors could gain access to the Sample Solution environment and modify it to mount a cyber attack (for example, perform a DoS attack). In order to mitigate the risk, the deployment scripts follow DAIR security ‘Best Practices’, such as:

  • firewall rules restricting access to all ports except SSH port 22 on the deployed instance.
  • access authorization control allows only authenticated DAIR participants to deploy and access an instance of the Sample Solution.

In addition, to limit security risks please follow the recommendations below:

  • use security controls ‘as deployed’ without modification
  • when finished using the reference solution, proceed to terminate the app (see section about termination earlier in this document).

Finally, as a stand-alone application, the reference solution does not directly consume network or storage resources while running. As such, those resources do not need any explicit control procedures.

Networking Considerations

The inference server can be queried by clients using HTTP or gRPC (a remote procedure call framework originally developed by Google). There are no specific networking considerations to highlight.

Scaling Considerations

This Sample Solution uses a stateless model. That means that the same model can be deployed to many inference servers, implementing a standard highly scalable architecture where many requests can be sent in parallel and a load balancer spreads them through the inference servers.
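Because every server holds the same stateless model, any request can go to any server. A minimal round-robin sketch of that idea follows; the class name and the server addresses are hypothetical, and a production deployment would normally use a dedicated load balancer rather than client-side code like this.

```python
import itertools


class RoundRobinBalancer:
    """Spread requests evenly over identical, stateless inference servers."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one server is required")
        self._cycle = itertools.cycle(servers)

    def next_server(self):
        """Return the address that should receive the next request."""
        return next(self._cycle)


# Hypothetical addresses, for illustration only.
SERVERS = ["10.0.0.1:8000", "10.0.0.2:8000"]
demo_balancer = RoundRobinBalancer(SERVERS)
print([demo_balancer.next_server() for _ in range(4)])
```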

Availability Considerations

The TensorRT inference server provides a health check API that indicates whether the server is able to respond to inference requests. That allows the inference server to be included like any regular host in a highly available architecture, where a load balancer can use the health check to stop forwarding requests to a host, replace the host, or start a new one.
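How a load balancer might act on the health check can be sketched as follows. To avoid assuming a specific endpoint or response format, the probe is passed in as a function; `healthy_servers`, the addresses, and the fake probe are hypothetical and only illustrate the filtering logic.

```python
def healthy_servers(servers, is_healthy):
    """Keep only the servers whose health check currently passes.

    `is_healthy` is a caller-supplied function that probes one server (for
    example, by calling its health check API) and returns True or False.
    A probe that raises an exception is treated as a failed check.
    """
    alive = []
    for server in servers:
        try:
            if is_healthy(server):
                alive.append(server)
        except Exception:
            # An unreachable server counts as unhealthy.
            pass
    return alive


def fake_probe(server):
    """Stand-in for a real network probe: one host is down, one errors out."""
    if server == "10.0.0.3:8000":
        raise ConnectionError("unreachable")
    return server != "10.0.0.2:8000"


pool = ["10.0.0.1:8000", "10.0.0.2:8000", "10.0.0.3:8000"]
print(healthy_servers(pool, fake_probe))
```

Requests would then be forwarded only to the servers returned by this check, while the failing hosts are replaced or restarted.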

User Interface (UI) Considerations

The Sample Solution only provides a simple command-line interface meant to showcase the backend. The UI would depend on the specific application and is outside the scope of this example.

API Considerations

The code is regular Python code, organized in a modular manner, and includes extensive comments. Thus, developers can easily extend it to create custom solutions.

Cost Considerations

This solution requires a single GPU instance in DAIR, whose equivalent value is approximately $100 / month in a public cloud.

License Considerations

All the libraries used in this Sample Solution are open source, as is the Movie Recommender code itself. The MovieLens dataset is available for non-commercial use under certain conditions. See the detailed licensing information below. If you plan to use, modify, extend, or distribute any components of this Sample Solution or its libraries, you must consider conformance to the terms of the licenses:

Source Code

The source code for the solution is available at: and is available to DAIR participants. Please refer to the file for instructions on how to clone and use the repository.


Glossary

Term: Description
API: Application Programming Interface.
Collaborative Filtering: A technique used by Recommender Systems to make automatic predictions about the interests of a user by collecting preferences or ratings from many users (collaborating).
CUDA: Compute Unified Device Architecture. It is a parallel computing platform and programming model from NVIDIA.
DAIR: Digital Accelerator for Innovation and Research. This document refers to the DAIR Pilot released in Fall 2019.
Deep Learning: Part of a broader family of machine learning methods based on artificial neural networks.
GPU: Graphics processing unit. A hardware component with high performance for parallel processing.
Recommender System: A recommender system, or recommendation system, is a Machine Learning model that seeks to predict the “rating” or “preference” a user would give to an item. It provides suggestions of relevant items to users.
UI: User interface.