Data Analytics Tutorial : Lesson 8 - Choosing the right Movie using Data Analysis
Read Time 6 mins | Written by: Anoop
Explore the fundamentals of simple statistical analysis in data analytics, covering classification of data with the aid of graphs.
In this fun exercise, we'll use movie ratings data from the past 100 years to select the perfect movie to watch over the weekend!
If your OTT recommendation engine is not doing a good job of suggesting a movie for you, it's time to take things into your own hands!
Let's go nerdy and analyse some data to identify a movie to watch over the weekend. As with any quantitative analysis, let's deduce things one by one and arrive at a few options to watch.
In this tutorial, we'll look at the IMDB movie data from 1910 to 2024 and follow the steps below to identify a few movie title options.
- First, let's choose a genre
- Then we'll identify the highest-rated movies from those genres
- We'll then look at the movie directors and the average ratings of their previous movies
- Finally, we'll get all the details about the movies that we shortlisted
With this analysis, we are trying to narrow the dataset down to a handful of well-rated movies to choose from.
Please use the attached sample data set to follow along with this tutorial.
Fig. 1 : Snapshot of the Data in .CSV file
Steps to perform Analysis on this data
Step 1:
Log in to your free Talktodata.AI account and upload the data set. Below is the screenshot for reference.
Fig.2 : Steps to Login and Upload the sample data file
Step 2:
Identify the outliers in the data and remove them from our analysis. It is important to remove these outliers so that the outputs are not skewed by the very best and the very worst ratings.
For this, I'm using the command:
"can you identify the outliers from the dataset and exclude them from further analysis?"
We identified and excluded the outliers from the dataset.
- Number of outliers: 1,002
- Number of entries after excluding outliers: 2,171
Fig. 3: Graph Output
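The tutorial does not say which rule the assistant applies to flag outliers. As a rough illustration only, here is a minimal Python sketch that assumes the common 1.5 x IQR rule. The function name `split_outliers` and the sample ratings are hypothetical and are not taken from the sample data set.

```python
from statistics import quantiles

def split_outliers(ratings, k=1.5):
    """Split ratings into (kept, outliers) using the IQR rule.

    Assumption: a rating is an outlier when it lies more than
    k * IQR below the first quartile or above the third quartile.
    """
    q1, _, q3 = quantiles(ratings, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [r for r in ratings if low <= r <= high]
    outliers = [r for r in ratings if r < low or r > high]
    return kept, outliers

# Made-up ratings for illustration only.
sample_ratings = [6.1, 6.4, 6.8, 7.0, 7.1, 7.3, 7.5, 7.6, 7.9, 1.2, 9.9]
kept, outliers = split_outliers(sample_ratings)
print("Outliers:", outliers)
print("Entries after excluding outliers:", len(kept))
```

A different rule (for example a z-score cut-off) would flag a different set of entries, so it is worth asking the assistant which method it used.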
Step 3:
Now let's analyse the data further and shortlist a few genres from the past two decades.
First, I'm using the prompt: "Can you give me a visualisation of genre wise average ratings for the past two decades using the cleaned dataset?"
Fig. 4: Graph Output
Fig. 5: Genres with More than 10 Movies
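To make the idea behind this step concrete, the genre-wise averages can be sketched in plain Python. This is only an approximation of what the prompt asks for: the field names `genre`, `year` and `rating`, the function `genre_averages` and the sample rows are assumptions for illustration, not the actual columns of the sample file.

```python
from collections import defaultdict
from statistics import mean

def genre_averages(movies, start_year, min_movies=10):
    """Average rating per genre for movies released in or after start_year.

    Genres with min_movies or fewer titles are dropped, so every
    average is backed by a reasonable number of movies.
    """
    by_genre = defaultdict(list)
    for movie in movies:
        if movie["year"] >= start_year:
            by_genre[movie["genre"]].append(movie["rating"])
    averages = {
        genre: round(mean(ratings), 2)
        for genre, ratings in by_genre.items()
        if len(ratings) > min_movies
    }
    # Highest average first.
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)

# Made-up rows for illustration only.
rows = [
    {"title": "A", "genre": "Drama", "year": 2010, "rating": 7.0},
    {"title": "B", "genre": "Drama", "year": 2015, "rating": 8.0},
    {"title": "C", "genre": "Comedy", "year": 2012, "rating": 6.0},
    {"title": "D", "genre": "Comedy", "year": 2018, "rating": 6.5},
    {"title": "E", "genre": "Horror", "year": 2011, "rating": 9.0},
    {"title": "F", "genre": "Drama", "year": 1995, "rating": 2.0},
]

# The tiny sample uses min_movies=1; the tutorial's Fig. 5 uses more than 10.
ranked_genres = genre_averages(rows, start_year=2004, min_movies=1)
print(ranked_genres)
```

In this sample the single Horror title has the highest rating, but it is dropped because one movie is not enough to represent a genre.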
Step 4:
Now, for the top genres, let's see the top movies, their directors and their average ratings.
First, I'm using the prompt: "can you give me highest rated movies from the top 5 genres where the director has more than 3 movies?"
By requiring more than 3 movies per director, we again ensure that each average is based on enough titles to be meaningful.
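The filter described in this prompt can be sketched as follows. This is a hedged illustration, not the assistant's actual implementation: the field names `director`, `genre` and `rating`, the function `shortlist` and the sample rows are all hypothetical.

```python
from collections import Counter

def shortlist(movies, genres, min_director_movies=3, top_n=5):
    """Highest-rated movies in the given genres, keeping only
    directors with more than min_director_movies titles in the data."""
    director_counts = Counter(movie["director"] for movie in movies)
    eligible = [
        movie for movie in movies
        if movie["genre"] in genres
        and director_counts[movie["director"]] > min_director_movies
    ]
    return sorted(eligible, key=lambda movie: movie["rating"], reverse=True)[:top_n]

# Made-up rows for illustration only.
rows = [
    {"title": "M1", "genre": "Fantasy", "director": "Dir X", "rating": 7.5},
    {"title": "M2", "genre": "Fantasy", "director": "Dir X", "rating": 8.1},
    {"title": "M3", "genre": "Fantasy", "director": "Dir X", "rating": 7.7},
    {"title": "M4", "genre": "Adventure", "director": "Dir X", "rating": 7.9},
    {"title": "M5", "genre": "Fantasy", "director": "Dir Y", "rating": 9.5},
]

picks = shortlist(rows, genres={"Fantasy", "Adventure"}, top_n=2)
for movie in picks:
    print(movie["title"], movie["director"], movie["rating"])
```

Here "M5" has the best rating, but its director has only one movie in the data, so it does not make the shortlist.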
Now that we have this list, let's compare the IMDB scores and Metascores to arrive at a conclusion.
For this, I'm using the command: "can you plot the IMDB score and metascore for these movies?"
Looks like we have picked Harry Potter for this weekend!
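When comparing the two scores, keep in mind that IMDB ratings are on a 10-point scale while Metascores run from 0 to 100. Below is a small sketch that puts both on the same scale before comparing them. Averaging the two into a `combined` score is purely an assumption for illustration, and the titles and numbers are made up.

```python
def compare_scores(movies):
    """Put Metascore on a 10-point scale and compare it with the IMDB score."""
    rows = []
    for movie in movies:
        meta_on_10 = movie["metascore"] / 10
        rows.append({
            "title": movie["title"],
            "imdb": movie["imdb"],
            "metascore_10": meta_on_10,
            # Positive gap: audiences rate the movie higher than critics.
            "gap": round(movie["imdb"] - meta_on_10, 2),
            # Assumption: a simple average of the two scores.
            "combined": round((movie["imdb"] + meta_on_10) / 2, 2),
        })
    return sorted(rows, key=lambda row: row["combined"], reverse=True)

# Made-up scores for illustration only.
candidates = [
    {"title": "Title Q", "imdb": 8.4, "metascore": 70},
    {"title": "Title P", "imdb": 8.0, "metascore": 90},
]

ranked = compare_scores(candidates)
for row in ranked:
    print(row["title"], row["imdb"], row["metascore_10"], row["gap"], row["combined"])
```

A large gap between the two scores is a hint that critics and audiences disagree, which may matter more to you than the combined number.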
This is one example of how you can analyse your data while ensuring that the outliers are taken care of and that each sub-class of data has enough entries to qualify as sound data.
But this is not the only way to do this. Here are a few thought starters for you to try:
- Trust the AI and ask it to recommend a movie considering factors like ratings, directors and Metascore.
- Fond of a genre? Do a deep dive into decade-wise rating changes, the best movie of all time, and so on.
To ensure accurate results, it's important for us to ask the right questions.
Here is a simple guide on how to ask the right questions
You can further enhance these visualisations by asking the assistant to add data tags, change graph colours, add an average line, and so on. Now go ahead and start your AI-assisted data analytics journey.
Click here to Download the sample data set.
For any queries or support, please visit https://talktodata.ai/