Play Mongo with Twitter?
In this assignment you will use MongoDB to interact with a database, so you have to install
MongoDB and work with it.
The data you will be testing during this assignment is a set of Twitter posts about one of the FIFA
World Cups, played either in reality or virtually on gaming equipment; we do not really care about the
Use the [url removed, login to view] file. You will also find the data along with the material provided for this
assignment. The [url removed, login to view] file is the one we used in the lecture to lay the groundwork for using and
playing with MongoDB. It has the commands needed to download, install, and load the database with the
data required for this assignment.
After loading the data into the database, you should be able to perform some checks.
Part 1: Only simple stuff: (on the users collection)
1- Write a query to count how many documents the collection has.
2- Write a query to show the first 5 documents nicely formatted.
3- Write a query to find the unique values for user_name field.
4- Write a query to find the number of unique values for user_name field.
5- Write a query to find the tweet text for the user james_the_cat1. Do not show the id field.
6- Write a query to find the count of documents where Followers Count is greater than the Friends Count.
7- Write a query to find the count of documents without location information.
8- Write a query to find the username of the user with the most followers.
9- Write a query to find the username of the user with the least friends.
10- Write a query to count all the documents where the tweet text includes the word goal and does not
include the word FIFA. (Hint: an index may be needed here.)
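The index hint in question 10 most likely points at MongoDB text search, where a $text query only works on a field that has a text index. Below is a minimal sketch of that mechanism, not a reference solution. It assumes the tweet text is stored in a field called text (check a document from question 2 for the real field name). The helper names are hypothetical, and each helper takes the shell's db object as an argument so the snippet can also be loaded outside the shell.

```javascript
// Hypothetical helpers; in the mongo shell, pass the global `db` object.
// ASSUMPTION: the tweet body is stored in a field named "text".

// Create a text index so that $text queries are allowed on the field.
function createTweetTextIndex(db) {
  return db.users.createIndex({ text: "text" });
}

// Build a $text filter. Words in `exclude` get a "-" prefix, which tells
// text search to drop documents containing them. Several included words
// are ORed together by text search.
function textFilter(include, exclude) {
  var negated = exclude.map(function (word) { return "-" + word; });
  return { $text: { $search: include.concat(negated).join(" ") } };
}

// Count the documents in the users collection that match a filter.
function countMatching(db, filter) {
  return db.users.find(filter).count();
}
```

In the shell this could be used as createTweetTextIndex(db) followed by countMatching(db, textFilter(["goal"], ["FIFA"])). Text search matches whole words case-insensitively and applies stemming by default, so the result can differ from a plain substring match.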
Part 2: MapReduce again …… really? … Yes
1- Write a map-reduce that calculates the total number of documents for each tweet text.
(Coding hint: the easiest way to create functions in the mongo shell is to write them in a
text editor, paste them into the shell, and then call them there.)
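The workflow in the coding hint can be sketched as follows: define the functions in an editor, paste them into the shell, then call them. This is one possible shape under stated assumptions, not the expected answer. It assumes the tweet text is stored in a field called text. All function names are hypothetical, and localMapReduce is only a stand-in for trying the map and reduce functions in a JavaScript runtime that provides globalThis (such as Node.js), without a server.

```javascript
// ASSUMPTION: the tweet body is stored in a field named "text".

// Map function: MongoDB calls it once per document, with `this` bound to
// that document. It emits the tweet text as the key and 1 as the value.
function mapTweetText() {
  emit(this.text, 1);
}

// Reduce function: receives one key and the array of values emitted for it.
// It may be called more than once per key, so it returns a plain sum.
function reduceCount(key, values) {
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    total += values[i];
  }
  return total;
}

// Invocation: returns the results inline instead of writing a collection.
function countPerTweetText(db) {
  return db.users.mapReduce(mapTweetText, reduceCount, { out: { inline: 1 } });
}

// Hypothetical stand-in for trying the two functions without a server.
function localMapReduce(docs, map, reduce) {
  var groups = {};
  globalThis.emit = function (key, value) {
    (groups[key] = groups[key] || []).push(value);
  };
  docs.forEach(function (doc) { map.call(doc); });
  return Object.keys(groups).map(function (key) {
    var values = groups[key];
    // Like MongoDB, skip reduce when a key has a single value.
    return { _id: key, value: values.length === 1 ? values[0] : reduce(key, values) };
  });
}
```

In the shell, only the first three functions are needed, and countPerTweetText(db) runs the job. Because the emitted value and the reduce result are both plain numbers, the reduce function stays correct if MongoDB re-reduces partial results.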
1- You should be developing this project under the Cloudera virtual machine. You should have
installed it at the beginning of this semester.
language used in class.
3- In the one file you have to submit, you are required to do the following:
a. Name the file [url removed, login to view]
b. Your code file has to start with a comment block.
1. This comment block has the students' names, IDs, and sections.
c. Then come the part 1 answers. Every answer query goes on one line only. You have to number the
lines to match the questions.
d. Last comes the part 2 answer, where you have to put 3 blocks of code:
i. The map function code,
ii. The reduce function code, and
iii. Code to invoke map and reduce to solve the task in part 2.
4- You have to make sure that your code runs error-free, in particular with no compilation or
interpretation errors. We will not debug or fix any errors; expect a very low score if your code fails to run.
5- How to run the code:
a. Part 1: create the scripts and execute them in the MongoDB shell.
b. Part 2: see the coding hint above.