IMDB-Data-Analysis-in-SQL
This project answers a set of analytical questions in order to advise a movie production house on which actors, directors, and production houses would be the best fit for a super-hit commercial movie.
Table of Content (TOC)
- Database Creation for the Project
- Table Creation
- Data Insertion
- Data Analysis
- EXECUTIVE SUMMARY AND RECOMMENDATIONS
1. Overview
This analysis is carried out to support RSVP Movies with a well-analyzed list of global stars to plan a movie for the global audience in 2022.
With it, we can answer a set of analytical questions and advise RSVP Movies on which actors, directors, and production houses would be the best fit for a super-hit commercial movie.
RSVP Movies is an Indian film production company that has produced many super-hit movies. They have usually released movies for the Indian audience but for their next project, they are planning to release a movie for the global audience in 2022.
Why this Analysis?
The production company wants to plan its every move analytically, based on data, and has approached us for help with this new project.
We have been provided with the data of the movies that have been released in the past three years. Let’s analyze the data set and draw meaningful insights that can help them start their new project.
We will use SQL to analyze the given data and give recommendations to RSVP Movies based on the insights.
We will carry out the entire analytics process in four segments, where each segment leads to significant insights from different combinations of tables.
2. Database Creation for the Project
a. Check the list of databases
- The very first step of any MySQL analysis is to access the database and check if related data is available or not.
- Use show databases; to access the list of databases:
Database |
---|
classicmodels |
company |
information_schema |
market_star_schema |
org |
b. Create Database
- Create a new database for this project.
- Use Create database IMDB;
- Use show databases; to confirm the list of databases:
Database |
---|
classicmodels |
company |
imdb |
information_schema |
market_star_schema |
org |
c. Use Database
- Instruct the system to use *IMDB Database* by running use imdb;
3. Table Creation
Steps to follow before creating the tables:
- Download the IMDb dataset and try to understand every table and its importance.
- Understand the ERD and the table details. Study them carefully and understand the relationships between the tables.
- Inspect each table given in the subsequent tabs and understand the features associated with each of them.
- Draft your tables with the correct data types and constraints on paper or in a note file.
- Open your MySQL Workbench and start writing the DDL and DML commands to create the database.
Create Table
For this project we need a total of 6 tables:
Table Number | Tables_in_imdb |
---|---|
1 | director_mapping |
2 | genre |
3 | movie |
4 | names |
5 | ratings |
6 | role_mapping |
a. Create Table Movie
Table Name: Movie | Column Description |
---|---|
id | Movie Id is a unique ID associated with each movie |
title | Title of the movie |
year | year of Release |
date_published | Date of Movie Release |
duration | Duration of Movie |
country | Country of Release |
worlwide_gross_income | Worldwide gross income of the movie |
languages | Languages released in |
production_company | production company associated with the movie |
b. Create Table Genre
Table Name: Genre | Column Description |
---|---|
movie_id | Movie Id of the movie |
genre | Genre tagged for movie |
c. Create Table director_mapping
Table Name: director_mapping | Column Description |
---|---|
movie_id | Movie Id of the movie directed by a director |
name_id | Name ID of the director |
d. Create Table role_mapping
Table Name: Role_Mapping | Column Description |
---|---|
movie_id | Movie Id of the movies |
name_id | Name ID of the associated person |
category | Associated responsibility like Actor, director on a movie |
e. Create Table names
Table Name: Names | Column Description |
---|---|
id | Name ID of each individual |
name | Name of each individual |
height | Height of individual |
date_of_birth | DOB |
known_for_movies | Famous or well known movie |
f. Create Table ratings
Table Name: Ratings | Column Description |
---|---|
movie_id | Movie Id of the movie |
avg_rating | Average Rating of Movie |
total_votes | Total vote counts |
median_rating | Median Rating of the movie |
Now run show tables; to confirm that all six tables have been created.
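The table definitions above can be sketched as DDL. The snippet below is a minimal illustration that uses Python's built-in sqlite3 module as a stand-in for MySQL Workbench, so it can be run anywhere. The column names come from the tables above; the data types, primary keys, and foreign keys are assumptions, not the project's actual DDL.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL `imdb` schema.
conn = sqlite3.connect(":memory:")

# Column names follow the tables described above; the types and
# key constraints are assumptions made for illustration.
conn.executescript("""
CREATE TABLE movie (
    id TEXT PRIMARY KEY,
    title TEXT,
    year INTEGER,
    date_published TEXT,
    duration INTEGER,
    country TEXT,
    worlwide_gross_income TEXT,
    languages TEXT,
    production_company TEXT
);
CREATE TABLE genre (
    movie_id TEXT REFERENCES movie(id),
    genre TEXT,
    PRIMARY KEY (movie_id, genre)
);
CREATE TABLE names (
    id TEXT PRIMARY KEY,
    name TEXT,
    height INTEGER,
    date_of_birth TEXT,
    known_for_movies TEXT
);
CREATE TABLE director_mapping (
    movie_id TEXT REFERENCES movie(id),
    name_id TEXT REFERENCES names(id),
    PRIMARY KEY (movie_id, name_id)
);
CREATE TABLE role_mapping (
    movie_id TEXT REFERENCES movie(id),
    name_id TEXT REFERENCES names(id),
    category TEXT,
    PRIMARY KEY (movie_id, name_id)
);
CREATE TABLE ratings (
    movie_id TEXT PRIMARY KEY REFERENCES movie(id),
    avg_rating REAL,
    total_votes INTEGER,
    median_rating INTEGER
);
""")

# SQLite's counterpart to MySQL's SHOW TABLES.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```

In MySQL the same statements would typically use VARCHAR, INT, DATE, and DECIMAL types instead of SQLite's TEXT, INTEGER, and REAL.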
4. Data Insertion
In the previous steps, we created six tables. Now we will insert data into them. Here we show the syntax for inserting 5 rows into each table. (The complete data insertion script is available in the repository.)
a. Inserting data into Movie Table
b. Inserting data into Genre Table
c. Inserting data into director_mapping Table
d. Inserting data into role_mapping Table
e. Inserting data into Names Table
f. Inserting data into Ratings Table
Checking tables for inserted values:
Select * from Movie;
Select * from Genre;
Select * from Director_Mapping;
Select * from Role_Mapping;
Select * from Names;
Select * from Ratings;
All the inserted sample data looks good, so we can go ahead with inserting the complete data. For the insertion to work smoothly, let's first remove all rows from the tables using TRUNCATE:
Insert Complete data
Run the command to insert complete data: IMDB File 3 Insert all data
1. Find the total number of rows in each table of the schema?
Alternative 1:
Number of Rows after ignoring the Null Rows
Alternative 2:
Rows count inclusive of Null Rows:
Table name | Row count |
---|---|
director_mapping | 3867 |
genre | 14662 |
movie | 8519 |
names | 23714 |
ratings | 8230 |
role_mapping | 15173 |
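One way to get an exact row count per table is a COUNT(*) query for each table, combined with UNION ALL. The sketch below runs that pattern against two toy tables with made-up rows, using Python's sqlite3 as a stand-in for MySQL; it is not the project's original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two toy tables with made-up rows, standing in for the imdb schema.
conn.executescript("""
CREATE TABLE movie (id TEXT PRIMARY KEY, title TEXT);
CREATE TABLE genre (movie_id TEXT, genre TEXT);
INSERT INTO movie VALUES ('m1', 'Movie A'), ('m2', 'Movie B');
INSERT INTO genre VALUES ('m1', 'Drama'), ('m1', 'Comedy'), ('m2', 'Drama');
""")

# One exact COUNT(*) per table, stacked into a single result set.
row_counts = dict(conn.execute("""
    SELECT 'movie' AS table_name, COUNT(*) AS row_count FROM movie
    UNION ALL
    SELECT 'genre', COUNT(*) FROM genre
"""))
print(row_counts)
```

The same SELECT ... UNION ALL statement extends to all six tables by adding one branch per table.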
2. Which columns in the movie table have null values?
id_null | title_null | year_null | date_null | duration_null | country_null | world_null | language_null | production_null |
---|---|---|---|---|---|---|---|---|
0 | 0 | 0 | 0 | 0 | 20 | 3724 | 194 | 528 |
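A result of this shape can be produced with one SUM(CASE WHEN ... IS NULL ...) expression per column. The sketch below shows the pattern on a cut-down movie table with made-up rows, run through Python's sqlite3 as a stand-in for MySQL; the query is an illustration, not the project's original script.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down movie table with made-up rows; None becomes SQL NULL.
conn.execute(
    "CREATE TABLE movie (id TEXT, title TEXT, country TEXT, languages TEXT)")
conn.executemany("INSERT INTO movie VALUES (?, ?, ?, ?)", [
    ("m1", "Movie A", "India", "Hindi"),
    ("m2", "Movie B", None, "English"),
    ("m3", "Movie C", None, None),
])

# Each SUM adds 1 for every row where that column is NULL.
null_counts = conn.execute("""
    SELECT
        SUM(CASE WHEN id IS NULL THEN 1 ELSE 0 END)        AS id_null,
        SUM(CASE WHEN title IS NULL THEN 1 ELSE 0 END)     AS title_null,
        SUM(CASE WHEN country IS NULL THEN 1 ELSE 0 END)   AS country_null,
        SUM(CASE WHEN languages IS NULL THEN 1 ELSE 0 END) AS language_null
    FROM movie
""").fetchone()
print(null_counts)
```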
3.1. Find the total number of movies released each year?
Movies per year:
3.2. Find the total number of movies released each month?
Movies per month:
4.1 Find the count of Indian movies.
4.2 Find the count of movies from the USA.
4.3 Find the count of movies which are either from India or the USA.
4.4 Find the count of movies that are either from India or the USA and released in 2019.
5. Find the unique list of the genres present in the data set.
6.1 Find the movies count for each genre.
6.2 Find the genre with the maximum number of movies.
6.3 Find the genre with the minimum number of movies.
6.4 Find the top-3 genres with the maximum number of movies.
6.4 Find the movies count for the Action genre.
6.5 Find the genre count for each movie.
6.6 Find the list of Indian movies that belong to 3 genres.
6.7 Longest Indian movie tagged with 3 genres.
‘tt6200656’, ‘Kammara Sambhavam’, ‘182’, ‘3’
6.8 Which genres are tagged with ‘Kammara Sambhavam’ movie.
genre |
---|
Action |
Comedy |
Drama |
7.1. How many movies belong to only one genre?
Create a list of Movies with a genre count
Restrict the list to Genre count = 1
Count the total number of rows
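The three steps above can be sketched with a CTE: build the per-movie genre count, keep the rows where the count is 1, and count what remains. The example uses Python's sqlite3 as a stand-in for MySQL, with made-up movie ids and genre tags.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE genre (movie_id TEXT, genre TEXT)")
# Made-up sample: m1 and m3 have one genre, m2 has two, m4 has three.
conn.executemany("INSERT INTO genre VALUES (?, ?)", [
    ("m1", "Drama"),
    ("m2", "Drama"), ("m2", "Comedy"),
    ("m3", "Action"),
    ("m4", "Action"), ("m4", "Comedy"), ("m4", "Drama"),
])

single_genre_movies = conn.execute("""
    WITH genre_counts AS (            -- step 1: genre count per movie
        SELECT movie_id, COUNT(genre) AS genre_count
        FROM genre
        GROUP BY movie_id
    )
    SELECT COUNT(*)                   -- step 3: count the remaining rows
    FROM genre_counts
    WHERE genre_count = 1             -- step 2: keep single-genre movies
""").fetchone()[0]
print(single_genre_movies)
```

Changing the filter to genre_count = 2 or genre_count = 3 covers the two-genre and three-genre questions.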
7.2. How many movies belong to two genres?
7.3. How many movies belong to three genres?
8.1. What is the average duration of movies in each genre?
8.2. Rank the genres by the average duration of movies in each genre.
9. What is the rank of the ‘Thriller’ genre of movies among all the genres in terms of the number of movies produced?
10. Find the minimum and maximum values in each column of the ratings table except the movie_id column.
11. Which are the top 10 movies based on average rating?
12. Summarize the ratings table based on the movie counts by median ratings.
13. Which production house has produced the most number of hit movies (average rating > 8)?
Create list of production house with count of movies where average rating > 8 and Ranked over “Movies count”
Applied CTE to pull the production house with Rank = 1
NOTE: applied (production_company IS NOT NULL) as there are few movies without production house name
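The approach described in these notes can be sketched as two CTEs: one that counts hit movies per production house, and one that ranks those counts, with the outer query keeping rank 1. The example below runs on made-up data through Python's sqlite3 (window functions need SQLite 3.25 or later); studio names and ratings are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movie (id TEXT PRIMARY KEY, production_company TEXT);
CREATE TABLE ratings (movie_id TEXT PRIMARY KEY, avg_rating REAL);
""")
# Hypothetical studios and ratings; m5 has no production house name.
conn.executemany("INSERT INTO movie VALUES (?, ?)", [
    ("m1", "Studio A"), ("m2", "Studio A"), ("m3", "Studio B"),
    ("m4", "Studio B"), ("m5", None), ("m6", "Studio A"),
])
conn.executemany("INSERT INTO ratings VALUES (?, ?)", [
    ("m1", 8.5), ("m2", 8.2), ("m3", 8.9),
    ("m4", 7.0), ("m5", 9.0), ("m6", 6.0),
])

top_production_houses = conn.execute("""
    WITH hit_counts AS (
        SELECT m.production_company, COUNT(*) AS movie_count
        FROM movie AS m
        JOIN ratings AS r ON r.movie_id = m.id
        WHERE r.avg_rating > 8
          AND m.production_company IS NOT NULL
        GROUP BY m.production_company
    ),
    ranked AS (
        SELECT production_company, movie_count,
               RANK() OVER (ORDER BY movie_count DESC) AS prod_rank
        FROM hit_counts
    )
    SELECT production_company, movie_count
    FROM ranked
    WHERE prod_rank = 1
""").fetchall()
print(top_production_houses)
```

RANK() returns every production house tied for first place, which a plain LIMIT 1 would not.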
14. How many movies released in each genre during March 2017 in the USA had more than 1,000 votes?
15. Find movies of each genre that start with the word ‘The’ and which have an average rating > 8.
16. Of the movies released between 1 April 2018 and 1 April 2019, how many were given a median rating of 8?
17. Do German movies get more votes than Italian movies?
18. Which columns in the names table have null values?
19. Who are the top three directors in the top three genres whose movies have an average rating > 8?
Pull the Top three Genre by Movie count where avg_rating > 8
Pull the Directors with Movie count where avg_rating > 8
Keeping “top_3_genres” as CTE, restrict the 2nd code to avg_rating > 8 and directors of top_3_genre
Trying Row_Number() function:
20. Who are the top two actors whose movies have a median rating >= 8?
21. Which are the top three production houses based on the number of votes received by their movies?
22. Rank actors with movies released in India based on their average ratings. Which actor is at the top of the list?
– Note: The actor should have acted in at least five Indian movies.
Alternative 1 (using RANK window function):
Alternative 2 (using CTE):
23. Find out the top five actresses in Hindi movies released in India based on their average ratings.
– Note: The actresses should have acted in at least three Indian movies.
24. Select thriller movies as per avg rating and classify them in the following category:
Rating > 8: Superhit movies
Rating between 7 and 8: Hit movies
Rating between 5 and 7: One-time-watch movies
Rating < 5: Flop movies
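This classification maps naturally onto a CASE expression. The sketch below applies it to made-up thriller ratings using Python's sqlite3 as a stand-in for MySQL. How the boundary values 7 and 5 are assigned is an assumption here (7 counts as a hit, 5 as a one-time watch), since the categories above do not state which side is inclusive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genre (movie_id TEXT, genre TEXT);
CREATE TABLE ratings (movie_id TEXT PRIMARY KEY, avg_rating REAL);
""")
# Made-up movies: four thrillers and one drama that must be filtered out.
conn.executemany("INSERT INTO genre VALUES (?, ?)", [
    ("t1", "Thriller"), ("t2", "Thriller"), ("t3", "Thriller"),
    ("t4", "Thriller"), ("d1", "Drama"),
])
conn.executemany("INSERT INTO ratings VALUES (?, ?)", [
    ("t1", 8.5), ("t2", 7.5), ("t3", 6.0), ("t4", 3.2), ("d1", 9.0),
])

# CASE branches are evaluated top to bottom, so each branch only
# needs its lower bound. Boundary handling is an assumption.
thriller_categories = conn.execute("""
    SELECT g.movie_id, r.avg_rating,
           CASE
               WHEN r.avg_rating > 8  THEN 'Superhit movies'
               WHEN r.avg_rating >= 7 THEN 'Hit movies'
               WHEN r.avg_rating >= 5 THEN 'One-time-watch movies'
               ELSE 'Flop movies'
           END AS category
    FROM genre AS g
    JOIN ratings AS r ON r.movie_id = g.movie_id
    WHERE g.genre = 'Thriller'
    ORDER BY r.avg_rating DESC
""").fetchall()
print(thriller_categories)
```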
EXECUTIVE SUMMARY AND RECOMMENDATIONS
1. Insights
Based on 7,997 movies released and recorded on IMDb between 2017 and 2019, a summary of audience interest and our recommendations follows:
- Average Duration: 103.89359
- Total number of Actors: 12611 (7445 actor & 5166 Actress)
1. Year and Month wise Movie Release Pattern:
- The year-wise record shows a clear decline in releases, from 3052 movies in 2017 to 2001 movies in 2019, roughly a one-third drop.
- The maximum number of movies were released in March, followed by September, October, and January. The fewest movies were released in the mid-year and end-of-year months, possibly because more people prefer vacation and family time in those periods.
2. Geographical Region Distribution
- The USA and India together produced 1059 movies in 2019 alone, just over half of the 2001 movies released that year.
3. Genre Popularity
- Movies were tagged with genre tags as Drama, Fantasy, Thriller, Comedy, Horror, Family, Romance, Adventure, Action, Sci-Fi, Crime, and Mystery.
- Drama is the most popular genre, with 4285 tags across the three years, followed by Comedy and Thriller.
- 3289 movies carry only one genre tag; the rest are tagged with multiple genres.
4. The average duration of movies is around 103.89 minutes, and the genre-wise averages stay close to the same figure.
5. Top Production Houses
- Marvel Studios leads the production houses by number of votes received, with 551245 votes across the movies it has produced, followed by Syncopy and New Line Cinema.
- Star Cinema and Twentieth Century Fox are the top two multilingual production houses by number of superhit movies.
6. Top Director
- James Mangold has directed the most superhit movies, followed by Soubin Shahir, Joe Russo, and Anthony Russo.
- A.L. Vijay, Andrew Jones, and Chris Stokes are the top directors based on number of movies.
7. Top Actors and Actresses
- Mammootty, with 8 superhit movies, is the most successful actor, followed by Mohanlal with 5.
- Quite a few actors have 4 superhit movies to their name, including Amrinder Gill, Amit Sadh, Johnny Yong Bosch, Tovino Thomas, Dulquer Salmaan, Siddique, Rajkummar Rao, Fahadh Faasil, Pankaj Tripathi, Dileesh Pothan, Joju George, and Ayushmann Khurrana.
- Vijay Sethupathi, Fahadh Faasil, and Yogi Babu are the top three Indian actors who have acted in at least five movies.
- Taapsee Pannu, Divya Dutta, and Kriti Kharbanda are the top three Hindi-speaking actresses who have acted in at least three movies.
- Parvathy Thiruvothu, Susan Brown, and Amanda Lawrence are the best rated actresses in Drama genre.
8. Top-10 movies based on average rating are: Kirket, Love in Kilnerry, Gini Helida Kathe, Runam, Fan, Android Kunjappan Version 5.25, Yeh Suhaagraat Impossible, Safe, The Brighton Miracle, and Shibu
- Based on median rating counts, most movies are rated between 5 and 8 and fall under the hit movie category.
9. Top Grossing Movies
The highest-grossing movies of each year are:
i. Thank You for Your Service, a comedy movie released in 2017
ii. The Villain, a thriller movie released in 2018
iii. Joker, a drama movie released in 2019
2. Recommendation:
Based on these insights, the recommendations for RSVP Movies are as follows:
- Concentrate on multi-genre drama-comedy movies with a pinch of thriller, keeping an average duration of around 104 minutes.
- Plan the release between January and March. Focus on a multilingual movie that can be launched in India and the USA as the preferred audience markets.
- Rope in either Star Cinema or Twentieth Century Fox as the production house, with James Mangold directing, assisted by A.L. Vijay.
- Mammootty and Mohanlal can be the lead actors, supported by other actors. Including Vijay Sethupathi would add star power to the movie's promotion.
- Parvathy Thiruvothu, one of the best-rated drama actresses, should be brought in.
Use SQL on a Movie Database to Decide What to Watch
Table of Contents
- Completing the SQL Movie Database Download
- SQL Exercises on a Movie Database
- Finding All the Movies for a Given Director
- Using SQL on a Large Existing Movie Database
We’ll demonstrate how to use SQL to parse large datasets and gain valuable insights, in this case, to help you choose what movie to watch next using an IMDb dataset.
In this article, we’ll be downloading a dataset directory from IMDb. Not sure what to watch tonight? Are you browsing Netflix endlessly? Decide what to watch using the power of SQL! We’ll be loading an existing movie IMDb dataset into SQL. We’ll analyze the data in different ways like sorting movies by their rating, by what actors star in the movie, or by other similar criteria.
As mentioned in this blog post on how to practice SQL , the best way to practice SQL is by gaining hands-on experience in solving real-world problems, which is exactly what we’ll be doing.
If you have a basic knowledge of SQL, you should be able to follow this article easily. If you have no IT experience whatsoever, consider starting with this SQL A to Z Learning Track designed for people who have no experience in IT and want to start their adventure with SQL.
Let’s get started by learning how to get the movie data into our SQL database.
Let’s walk through the process of downloading our data and loading it into a database management system (DBMS), step by step. Common DBMSs include MySQL, Oracle DB, PostgreSQL, and SQL Server.
Although this article focuses on movie data, you can choose an entirely different dataset. Check out this list of free online datasets you can use and find the one you are interested in. The import of these datasets will be similar regardless of what dataset you use.
Open whatever variety of SQL you are using. For this example, I’ll be using SQL Server Management Studio, but the steps should be similar for all of the other varieties of SQL out there. Let’s get started:
- The dataset files can be accessed and downloaded from https://datasets.imdbws.com/ . The data is refreshed daily.
- basics.tsv.gz
- akas.tsv.gz
- crew.tsv.gz
- episode.tsv.gz
- principals.tsv.gz
- ratings.tsv.gz
- Extract the downloaded gzip (.gz) files. The end result will be a TSV (tab-separated values) file for each table.
- Open each file in a spreadsheet application like Google Sheets or Microsoft Excel.
- Find and replace all occurrences of “\N” with an empty cell.
- Save the file as a CSV file. This will make it easier to import into the DBMS of your choice.
- Open your DBMS.
- Create a new database by right-clicking on the left pane and selecting “New Database.” I’ve named my new database “imdb.”
- Set valid data types for each column you are importing. I recommend using nvarchar(MAX) for string columns, since you do not know how long the strings will be for each field. You can change the column datatype later if required.
- Repeat this process for each of the files you have downloaded.
After completing these steps, your SQL movie database will be in place! You are now ready to start analyzing and querying the data.
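For large files, the find-and-replace and save-as-CSV steps can also be scripted instead of done in a spreadsheet. The following is a minimal sketch in Python using only the standard csv module. The function name tsv_to_csv and the sample row are hypothetical, and the input is assumed to be unquoted tab-separated text, as in the IMDb files.

```python
import csv
import io


def tsv_to_csv(tsv_text):
    """Convert TSV text to CSV, turning the \\N null marker into an empty field."""
    # The input is assumed to be unquoted, so quote characters inside
    # titles are read as ordinary text.
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        writer.writerow(["" if field == r"\N" else field for field in row])
    return out.getvalue()


# Hypothetical two-line sample in the same layout as the IMDb files.
sample = "tconst\tprimaryTitle\tendYear\ntt0000001\tExample Title\t\\N\n"
converted = tsv_to_csv(sample)
print(converted)
```

For a real file, the same loop can read from and write to open file handles instead of in-memory strings.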
Thankfully, this dataset came with some descriptive documentation . To get an even better idea of the data, you can quickly select the top 1000 rows from each table.
Let’s start looking for our first movie. Imagine you want to watch a horror movie. How can we isolate only the horror movies? Fortunately, this task is frighteningly simple.
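A minimal sketch of such a query is shown here, run through Python's built-in sqlite3 rather than SQL Server so that it is self-contained. The title_basics columns follow the IMDb documentation; the tconst values are placeholders, and the three sample rows reuse titles that appear in this article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE title_basics (
        tconst TEXT PRIMARY KEY, titleType TEXT, primaryTitle TEXT,
        startYear INTEGER, genres TEXT)
""")
# Placeholder tconst values; titles and genres as listed in this article.
conn.executemany("INSERT INTO title_basics VALUES (?, ?, ?, ?, ?)", [
    ("tt9000001", "videoGame", "Resident Evil 4", 2005, "Action,Adventure,Horror"),
    ("tt9000002", "movie", "Manichitrathazhu", 1993, "Comedy,Horror,Music"),
    ("tt9000003", "movie", "Jaws", 1975, "Adventure,Thriller"),
])

# genres is a comma-separated string, so LIKE with wildcards
# matches Horror wherever it appears in the list.
horror_titles = [row[0] for row in conn.execute("""
    SELECT primaryTitle
    FROM title_basics
    WHERE genres LIKE '%Horror%'
    ORDER BY primaryTitle
""")]
print(horror_titles)
```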
If this query causes any confusion, open this SQL cheat sheet to refresh your knowledge. Have this cheat sheet open for the rest of the tutorial to help you along!
What if we wanted to refine this horror movie list further? We could restrict the results to horror movies created after 1990, with an average rating above 9.0 and at least 10,000 votes.
This will involve getting data from multiple tables. Opening each table and taking a look at the column headers, we can see the following tables will be involved:
- title_basics: handles the genre of movie and the release year (represented by the column startYear).
- title_ratings: handles the rating (averageRating) and votes (numVotes).
The two tables can be joined on the shared column, tconst. As explained in the IMDb documentation, tconst is an alphanumeric unique identifier of the title. Let’s write our query:
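A sketch of the join follows, using Python's sqlite3 so that it runs without a server. The tconst values are placeholders, the first two sample rows reuse values shown in this article's result tables, and the third row is made up to show the year filter at work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE title_basics (
    tconst TEXT PRIMARY KEY, titleType TEXT, primaryTitle TEXT,
    startYear INTEGER, genres TEXT);
CREATE TABLE title_ratings (
    tconst TEXT PRIMARY KEY, averageRating REAL, numVotes INTEGER);
""")
# Placeholder tconst values; the third row is invented for illustration.
conn.executemany("INSERT INTO title_basics VALUES (?, ?, ?, ?, ?)", [
    ("tt9000001", "videoGame", "Resident Evil 4", 2005, "Action,Adventure,Horror"),
    ("tt9000002", "movie", "Manichitrathazhu", 1993, "Comedy,Horror,Music"),
    ("tt9000003", "movie", "Hypothetical Old Horror", 1980, "Horror"),
])
conn.executemany("INSERT INTO title_ratings VALUES (?, ?, ?)", [
    ("tt9000001", 9.2, 11406),
    ("tt9000002", 8.7, 9468),
    ("tt9000003", 9.5, 20000),
])

# Horror titles after 1990, rated above 9.0, with at least 10,000 votes.
top_horror = conn.execute("""
    SELECT b.titleType, b.primaryTitle, b.startYear, b.genres,
           r.averageRating, r.numVotes
    FROM title_basics AS b
    JOIN title_ratings AS r ON r.tconst = b.tconst
    WHERE b.genres LIKE '%Horror%'
      AND b.startYear > 1990
      AND r.averageRating > 9.0
      AND r.numVotes >= 10000
""").fetchall()
print(top_horror)
```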
titleType | primaryTitle | startYear | genres | averageRating | numVotes |
---|---|---|---|---|---|
videoGame | Resident Evil 4 | 2005 | Action,Adventure,Horror | 9.2 | 11406 |
Executing this query returns a single result, but not the result we want! On closer inspection, we can see that this title is a video game, not a movie. Let’s alter our query to include only movies, and expand the search by reducing the minimum number of votes required to 1,000 and the minimum rating required to 8.0.
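The altered query could look like the sketch below, again run through Python's sqlite3 on a small sample. The tconst values are placeholders and the third row is made up; only the titleType filter and the two relaxed thresholds differ from the earlier conditions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE title_basics (
    tconst TEXT PRIMARY KEY, titleType TEXT, primaryTitle TEXT,
    startYear INTEGER, genres TEXT);
CREATE TABLE title_ratings (
    tconst TEXT PRIMARY KEY, averageRating REAL, numVotes INTEGER);
""")
# Placeholder tconst values; the third row is invented for illustration.
conn.executemany("INSERT INTO title_basics VALUES (?, ?, ?, ?, ?)", [
    ("tt9000001", "videoGame", "Resident Evil 4", 2005, "Action,Adventure,Horror"),
    ("tt9000002", "movie", "Manichitrathazhu", 1993, "Comedy,Horror,Music"),
    ("tt9000003", "movie", "Hypothetical Old Horror", 1980, "Horror"),
])
conn.executemany("INSERT INTO title_ratings VALUES (?, ?, ?)", [
    ("tt9000001", 9.2, 11406),
    ("tt9000002", 8.7, 9468),
    ("tt9000003", 9.5, 20000),
])

# Movies only, with the thresholds relaxed to 8.0 and 1,000 votes.
horror_movies = conn.execute("""
    SELECT b.titleType, b.primaryTitle, b.startYear, b.genres,
           r.averageRating, r.numVotes
    FROM title_basics AS b
    JOIN title_ratings AS r ON r.tconst = b.tconst
    WHERE b.titleType = 'movie'
      AND b.genres LIKE '%Horror%'
      AND b.startYear > 1990
      AND r.averageRating >= 8.0
      AND r.numVotes >= 1000
""").fetchall()
print(horror_movies)
```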
titleType | primaryTitle | startYear | genres | averageRating | numVotes |
---|---|---|---|---|---|
movie | Manichitrathazhu | 1993 | Comedy,Horror,Music | 8.7 | 9468 |
Executing this query also yields a single result! Looks like we won’t have to decide what to watch anymore, since there’s only one option that fits our criteria!
Let’s run through another scenario. What if we want to see all of the movies Steven Spielberg has directed? How would this work?
By looking through the tables, we can determine the following:
- name_basics : It contains the names of all actors, writers, directors, and others involved in the creation of film and TV titles.
- title_crew : It acts as a linking table for titles, directors, and writers. We’ll use this table to connect Steven Spielberg to the titles he’s involved with.
- title_basics: We have already used this table. It contains title information like the name, type, release year, and genres.
Let’s get to work! Let’s write a query for the name_basics table to try and find the famous director Steven Spielberg.
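A sketch of the lookup, using Python's sqlite3 on a two-row sample: the Spielberg row copies the values shown in the result table in this article, while the second person and their nconst are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE name_basics (
        nconst TEXT PRIMARY KEY, primaryName TEXT, birthYear INTEGER,
        deathYear INTEGER, primaryProfession TEXT, knownForTitles TEXT)
""")
conn.executemany("INSERT INTO name_basics VALUES (?, ?, ?, ?, ?, ?)", [
    ("nm0000229", "Steven Spielberg", 1946, None,
     "producer,writer,director", "tt0082971,tt0083866,tt0120815,tt0108052"),
    # Hypothetical second person so the filter has something to exclude.
    ("nm9999999", "Example Person", 1970, None, "actor", "tt9000003"),
])

# An exact match on primaryName; a parameter avoids quoting issues.
spielberg = conn.execute(
    "SELECT nconst, primaryName, birthYear FROM name_basics "
    "WHERE primaryName = ?",
    ("Steven Spielberg",),
).fetchall()
print(spielberg)
```

On the full dataset several people can share a name, so checking birthYear or primaryProfession helps confirm the right row.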
Executing this query yields a single result:
nconst | primaryName | birthYear | deathYear | primaryProfession | knownForTitles |
---|---|---|---|---|---|
nm0000229 | Steven Spielberg | 1946 | NULL | producer,writer,director | tt0082971,tt0083866,tt0120815,tt0108052 |
This gives us the important value of nconst . From the documentation, we know that nconst is the alphanumeric unique identifier of the name/person.
We can feed this value into the title_crew table, which contains the director and writer information for all the titles in IMDb, and match Steven Spielberg to all the titles he’s involved with.
Executing this query results in a list of 45 titles. You can see from the value of the directors column that Steven Spielberg was the director of them all.
We need a way of using this list of titles alongside the title_basics table to get the name of the movies instead of just the tconst. Let’s use a subquery for this!
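The subquery approach can be sketched as follows with Python's sqlite3. The inner query collects the tconst values from title_crew whose directors column contains Spielberg's nconst, and the outer query looks those titles up in title_basics. The tconst values and the third title are placeholders, and the titleType filter is an assumption based on the movie-only result table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE title_basics (
    tconst TEXT PRIMARY KEY, titleType TEXT, primaryTitle TEXT,
    startYear INTEGER, genres TEXT);
CREATE TABLE title_crew (
    tconst TEXT PRIMARY KEY, directors TEXT, writers TEXT);
""")
# Placeholder tconst values; the third title and nm9999999 are invented.
conn.executemany("INSERT INTO title_basics VALUES (?, ?, ?, ?, ?)", [
    ("tt9000011", "movie", "Jaws", 1975, "Adventure,Thriller"),
    ("tt9000012", "movie", "Jurassic Park", 1993, "Action,Adventure,Sci-Fi"),
    ("tt9000013", "movie", "Hypothetical Other Film", 2000, "Drama"),
])
conn.executemany("INSERT INTO title_crew VALUES (?, ?, ?)", [
    ("tt9000011", "nm0000229", None),
    ("tt9000012", "nm0000229", None),
    ("tt9000013", "nm9999999", None),
])

# directors is a comma-separated list of nconsts, hence LIKE.
spielberg_movies = conn.execute("""
    SELECT titleType, primaryTitle, startYear, genres
    FROM title_basics
    WHERE titleType = 'movie'
      AND tconst IN (
          SELECT tconst
          FROM title_crew
          WHERE directors LIKE '%nm0000229%'
      )
    ORDER BY startYear
""").fetchall()
print(spielberg_movies)
```

A JOIN between title_basics and title_crew would return the same rows; the subquery form simply mirrors the two-step reasoning in the text.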
Execute this query to see the result:
titleType | primaryTitle | startYear | genres |
---|---|---|---|
movie | Firelight | 1964 | Sci-Fi,Thriller |
movie | The Sugarland Express | 1974 | Crime,Drama |
movie | Jaws | 1975 | Adventure,Thriller |
movie | Close Encounters of the Third Kind | 1977 | Drama,Sci-Fi |
movie | 1941 | 1979 | Action,Comedy,War |
movie | Indiana Jones and the Raiders of the Lost Ark | 1981 | Action,Adventure |
movie | E.T. the Extra-Terrestrial | 1982 | Family,Sci-Fi |
movie | Indiana Jones and the Temple of Doom | 1984 | Action,Adventure |
movie | The Color Purple | 1985 | Drama |
movie | Empire of the Sun | 1987 | Action,Drama,History |
movie | Always | 1989 | Drama,Fantasy,Romance |
movie | Indiana Jones and the Last Crusade | 1989 | Action,Adventure |
movie | Hook | 1991 | Adventure,Comedy,Family |
movie | Jurassic Park | 1993 | Action,Adventure,Sci-Fi |
movie | Schindler's List | 1993 | Biography,Drama,History |
movie | Amistad | 1997 | Biography,Drama,History |
movie | The Lost World: Jurassic Park | 1997 | Action,Adventure,Sci-Fi |
movie | Saving Private Ryan | 1998 | Drama,War |
movie | Minority Report | 2002 | Action,Crime,Mystery |
movie | A.I. Artificial Intelligence | 2001 | Drama,Sci-Fi |
movie | Catch Me If You Can | 2002 | Biography,Crime,Drama |
movie | The Terminal | 2004 | Comedy,Drama,Romance |
movie | Indiana Jones and the Kingdom of the Crystal Skull | 2008 | Action,Adventure |
movie | War of the Worlds | 2005 | Adventure,Sci-Fi,Thriller |
movie | Munich | 2005 | Action,Drama,History |
movie | Lincoln | 2012 | Biography,Drama,History |
movie | The Adventures of Tintin | 2011 | Action,Adventure,Animation |
There we have it, all of the Steven Spielberg movie titles from our database!
Don’t stop here! Write your own custom queries to extract more insights from this large dataset. There are many ways to practice SQL. If you feel like you’ve had enough of working with this dataset, check out this post on 12 Ways to Learn SQL Online for more excellent learning resources.
You have learned how to import large existing datasets into the DBMS of your choice and how to use SQL to analyze a movie database. This is a powerful tool in your SQL arsenal. Not to mention, you’ll never have to worry about not being able to choose a movie to watch again! Completing SQL exercises on movie databases is a helpful way to learn, but if you would like more structure, check out this SQL Practice Set from LearnSQL.com.
Dataset Card for "imdb"
Dataset Summary
Large Movie Review Dataset. This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well.
Supported Tasks and Leaderboards
More Information Needed
Dataset Structure
Data Instances
- Size of downloaded dataset files: 84.13 MB
- Size of the generated dataset: 133.23 MB
- Total amount of disk used: 217.35 MB
An example of 'train' looks as follows.
Data Fields
The data fields are the same among all splits.
- text : a string feature.
- label : a classification label, with possible values including neg (0), pos (1).
Data Splits
name | train | unsupervised | test |
---|---|---|---|
plain_text | 25000 | 50000 | 25000 |
Dataset Creation
Thanks to @ghazi-f , @patrickvonplaten , @lhoestq , @thomwolf for adding this dataset.
IMDb Non-Commercial Datasets
Subsets of IMDb data are available for access to customers for personal and non-commercial use. You can hold local copies of this data, and it is subject to our terms and conditions. Please refer to the Non-Commercial Licensing and copyright/license and verify compliance.
As of March 18, 2024 the datasets on this page are backed by a new data source. There has been no change in location or schema, but if you encounter issues with the datasets following the March 18th update, please contact [email protected].
Data Location
The dataset files can be accessed and downloaded from https://datasets.imdbws.com/ . The data is refreshed daily.
IMDb Dataset Details
Each dataset is contained in a gzipped, tab-separated-values (TSV) formatted file in the UTF-8 character set. The first line in each file contains headers that describe what is in each column. A ‘\N’ is used to denote that a particular field is missing or null for that title/name. The available datasets are as follows:
title.akas.tsv.gz
- titleId (string) - a tconst, an alphanumeric unique identifier of the title
- ordering (integer) – a number to uniquely identify rows for a given titleId
- title (string) – the localized title
- region (string) - the region for this version of the title
- language (string) - the language of the title
- types (array) - Enumerated set of attributes for this alternative title. One or more of the following: "alternative", "dvd", "festival", "tv", "video", "working", "original", "imdbDisplay". New values may be added in the future without warning
- attributes (array) - Additional terms to describe this alternative title, not enumerated
- isOriginalTitle (boolean) – 0: not original title; 1: original title
title.basics.tsv.gz
- tconst (string) - alphanumeric unique identifier of the title
- titleType (string) – the type/format of the title (e.g. movie, short, tvseries, tvepisode, video, etc)
- primaryTitle (string) – the more popular title / the title used by the filmmakers on promotional materials at the point of release
- originalTitle (string) - original title, in the original language
- isAdult (boolean) - 0: non-adult title; 1: adult title
- startYear (YYYY) – represents the release year of a title. In the case of TV Series, it is the series start year
- endYear (YYYY) – TV Series end year. ‘\N’ for all other title types
- runtimeMinutes – primary runtime of the title, in minutes
- genres (string array) – includes up to three genres associated with the title
title.crew.tsv.gz
- directors (array of nconsts) - director(s) of the given title
- writers (array of nconsts) – writer(s) of the given title
title.episode.tsv.gz
- tconst (string) - alphanumeric identifier of episode
- parentTconst (string) - alphanumeric identifier of the parent TV Series
- seasonNumber (integer) – season number the episode belongs to
- episodeNumber (integer) – episode number of the tconst in the TV series
title.principals.tsv.gz
- nconst (string) - alphanumeric unique identifier of the name/person
- category (string) - the category of job that person was in
- job (string) - the specific job title if applicable, else '\N'
- characters (string) - the name of the character played if applicable, else '\N'
title.ratings.tsv.gz
- averageRating – weighted average of all the individual user ratings
- numVotes - number of votes the title has received
name.basics.tsv.gz
- primaryName (string)– name by which the person is most often credited
- birthYear – in YYYY format
- deathYear – in YYYY format if applicable, else '\N'
- primaryProfession (array of strings)– the top-3 professions of the person
- knownForTitles (array of tconsts) – titles the person is known for
Assignment on IMDB database using sqlite3 and pandas
GopiSumanth/SQL
This repository contains the Db-IMDB database; its schema is in the db_schema file. The required SQL commands are in the mySql Commands file, which serves as my notes on SQL. The assignment questions are in the sql_questions file and the solutions are in solutions.ipynb.
NOTE: If anyone finds a better way to solve the assignment questions, kindly let me know. My email: [email protected]