Process
Steps in Data Extraction, Transformation and Load:
- Extract data from web sources as a CSV file
- Load the raw data and clean it in Python
- Load the cleaned data into the respective AWS, machine learning, and Tableau workflows
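The extract-and-clean steps above can be sketched with Python's standard `csv` module. This is a minimal illustration, not the project's code. The column names (`Name`, `Platform`, `Year`, `Genre`, `NA_Sales`) and the cleaning rules (drop incomplete rows, drop non-numeric years or sales, drop duplicate game/platform/year records) are assumptions.

```python
import csv
import io

# Hypothetical column names; the project's actual CSV headers may differ.
REQUIRED = ("Name", "Platform", "Year", "Genre", "NA_Sales")


def clean_rows(csv_text):
    """Parse raw CSV text and return cleaned rows as dictionaries.

    Rows missing a required field, or whose Year or NA_Sales value is not
    numeric (for example "N/A"), are dropped, as are duplicate
    (Name, Platform, Year) records.
    """
    cleaned = []
    seen = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Strip surrounding whitespace from every string field.
        row = {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        if any(not row.get(col) for col in REQUIRED):
            continue  # skip incomplete records
        try:
            year = int(float(row["Year"]))
            na_sales = float(row["NA_Sales"])
        except ValueError:
            continue  # skip rows with non-numeric year or sales values
        key = (row["Name"], row["Platform"], year)
        if key in seen:
            continue  # skip duplicate game/platform/year entries
        seen.add(key)
        cleaned.append({
            "Name": row["Name"],
            "Platform": row["Platform"],
            "Year": year,
            "Genre": row["Genre"],
            "NA_Sales": na_sales,
        })
    return cleaned
```

For a file on disk, pass its contents, e.g. `clean_rows(Path("games.csv").read_text())`, where `games.csv` is a placeholder name.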
Steps in AWS:
- Load raw data into S3
- Create an RDS database
- Create a server in Postgres to connect to the AWS RDS database
- Use Google Colaboratory and Python to write the SW raw data to RDS
- Use AWS QuickSight for data analytics
- Due to time constraints, EC2 was the only component we were able to install successfully
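Writing the cleaned rows into a relational table follows the same pattern whether the target is RDS PostgreSQL or a local database. To keep this sketch runnable without AWS credentials, it uses Python's built-in `sqlite3` as a stand-in for the RDS instance; the table name `video_games` and its columns are hypothetical.

```python
import sqlite3


def load_rows(conn, rows):
    """Create the hypothetical video_games table and insert cleaned rows."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS video_games (
               name TEXT NOT NULL,
               platform TEXT NOT NULL,
               year INTEGER,
               genre TEXT,
               na_sales REAL
           )"""
    )
    # Parameterized placeholders avoid building SQL strings by hand.
    conn.executemany(
        "INSERT INTO video_games (name, platform, year, genre, na_sales) "
        "VALUES (?, ?, ?, ?, ?)",
        [(r["Name"], r["Platform"], r["Year"], r["Genre"], r["NA_Sales"]) for r in rows],
    )
    conn.commit()
```

Against PostgreSQL through a driver such as psycopg2, the statements stay the same except that the placeholders are written `%s` rather than `?`.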
Interactive Data Visualizations:
- Rendered with HTML and CSS
- Several visualizations produced in Tableau
- Plotly-based graphs from the machine learning models
- Data table built with HTML
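The HTML data table can be generated directly from the cleaned rows. A minimal sketch, assuming rows are dictionaries keyed by column name; every value is passed through `html.escape` so game titles containing characters such as `&` or `<` render correctly.

```python
from html import escape


def rows_to_html_table(rows, columns):
    """Render a list of dicts as an HTML <table> string, escaping every cell."""
    header = "".join(f"<th>{escape(col)}</th>" for col in columns)
    body = "".join(
        "<tr>"
        + "".join(f"<td>{escape(str(row.get(col, '')))}</td>" for col in columns)
        + "</tr>"
        for row in rows
    )
    return f"<table><thead><tr>{header}</tr></thead><tbody>{body}</tbody></table>"
```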
Project Outcomes
- Our Story
- Machine Learning
  - Conclusions from Linear Regression
  - Implementation of Other Models
- Tableau Visualizations
We are exploring the global popularity of video games. We analyze gaming industry trends over the last 40 years, including the popularity of consoles and specific games, and visualize the data's attributes with an interactive dashboard and other visuals built using techniques learned in class. The data is useful for understanding the video game industry. For example, it can help a video game company's marketing department understand the sales generated by each genre of video game on each console. It can also support projections about how new games entering the market are likely to perform, and it can reveal competitors' trends. Our work aims to analyze video game consumption worldwide and examine how the popularity of genres has changed over time.
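Answering the marketing question of which genres sell best on which consoles amounts to summing sales per (genre, platform) pair. This sketch assumes the hypothetical `Genre`, `Platform`, and `NA_Sales` fields; any other regional sales column could be passed as `sales_col`.

```python
from collections import defaultdict


def sales_by_genre_and_platform(rows, sales_col="NA_Sales"):
    """Sum sales for every (genre, platform) pair."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row["Genre"], row["Platform"])] += row[sales_col]
    return dict(totals)


def top_genre_per_platform(totals):
    """Map each platform to the genre with the highest summed sales."""
    best = {}
    for (genre, platform), sales in totals.items():
        if platform not in best or sales > best[platform][1]:
            best[platform] = (genre, sales)
    return {platform: genre for platform, (genre, _) in best.items()}
```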
The video game data includes three rating scores: Play Score, Game Score, and Critic Score. Play Score is derived from player ratings of each game, Game Score is the rating given to each game by its developer, and Critic Score is derived from critics' ratings. To study the relevance of these scores, we ran several machine learning algorithms on the data, focusing on North America sales because North America is the largest market by sales. We drew the following conclusions.
All three scores showed the same spread in their correlation with North America sales. Of the three, Critic Score had the highest positive correlation with video game sales, but the correlation between each score and North America sales was low. In conclusion, these scores are not good predictors of video game sales.
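The correlation comparison above can be reproduced with a Pearson coefficient for each score against North America sales. The field names `Play_Score`, `Game_Score`, `Critic_Score`, and `NA_Sales` are assumptions. In a one-variable linear regression, R² equals the square of this coefficient, so a low r means a score on its own explains little of the variance in sales.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient r between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def rank_score_correlations(rows, score_cols, target="NA_Sales"):
    """Return (score column, r) pairs sorted from highest to lowest r."""
    sales = [row[target] for row in rows]
    corrs = {col: pearson([row[col] for row in rows], sales) for col in score_cols}
    return sorted(corrs.items(), key=lambda item: item[1], reverse=True)
```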
Through data manipulation and machine learning, we were able to find indicators of higher sales. The most promising model was a deep neural network that took the game data as inputs and predicted whether a game would be labeled a top-selling video game. After tuning its parameters, the model predicted outcomes correctly 89% of the time.
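The network's architecture and exact inputs are not given here, so this sketch does not reproduce the model. It only illustrates the surrounding steps the paragraph mentions: turning sales into a binary top-seller label and measuring the share of correct predictions. The 1.0 (million units) cutoff is a hypothetical choice. A majority-class baseline is included because, when top sellers are rare, a model that always predicts "not top-selling" already scores highly, so an accuracy figure such as 89% is easiest to judge against it.

```python
def label_top_sellers(sales, threshold=1.0):
    """Binary label: 1 if sales >= threshold (hypothetical cutoff), else 0."""
    return [1 if s >= threshold else 0 for s in sales]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def majority_baseline(y_true):
    """Accuracy of always predicting the most common label."""
    ones = sum(y_true)
    return max(ones, len(y_true) - ones) / len(y_true)
```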
Tableau let us present the data in new ways and from different perspectives. Because our datasets are large, this is an effective way to summarize and display the data for a general audience. Tableau makes the charts easy to understand, and they are interactive: hovering over a marker shows its related information in a tooltip. We were also able to add a trend line and create a forecast of future sales.
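A linear trend line and a simple forecast can be sketched with ordinary least squares on yearly sales totals. This is an illustration under assumptions, not a reproduction of the Tableau workbook: linear is only one of Tableau's trend line model types, and Tableau's built-in forecasting uses exponential smoothing, so its projections would differ from this straight-line extrapolation.

```python
def fit_linear_trend(years, totals):
    """Least-squares slope and intercept of totals as a function of year."""
    n = len(years)
    if n != len(totals) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_x = sum(years) / n
    mean_y = sum(totals) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("years must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, totals))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(years, totals, future_years):
    """Extrapolate the fitted straight line to each future year."""
    slope, intercept = fit_linear_trend(years, totals)
    return {year: intercept + slope * year for year in future_years}
```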