newsdata.py is a database reporting tool written in Python (2.7) for the Udacity Full Stack Nanodegree program. It connects to the newsdata.sql database, which features information relating to a hypothetical web-based news platform. The program executes PostgreSQL queries and prints the resulting data in a formatted list.
This tool seeks to answer the following questions:
- Who are the top 3 most popular authors on this site, ranked by article views?
- Who are the most popular authors, ranked by article views?
- Are there any days where the percentage of HTTP logs with error codes exceeds 1%?
Python file containing all code to execute the queries and print results
Plain text file containing printed results from newsdata.py
Download can be found here
VirtualBox is software that runs the Virtual Machine (VM) that will be used to run this program. Download here, and install the platform package. Note: you do not need the extension pack, SDK or to launch the program
Vagrant configures the VM and establishes a shared directory between the VM and your local filesystem. Download here and download vagrantfile
from this repo. Navigate to the directory containing vagrantfile
and run
vagrant up
to set up the VM. Note: this process should take a while.
When finished run
vagrant ssh
to log in to the virtual machine. For more information, see the Vagrant documentation here.
Download newsdata.sql
, found here and place in your vagrant directory. To connect to the database run:
psql -d news -f newsdata.sql
newsdata.py
makes use of three custom SQL Views for convenience. Download create_views.sql
and then import the views:
psql -d news -f create_views.sql
Download newsdata.py
file to the same directory in the VM as newsdata.sql, and run:
python newsdata.py
newsdata.py
makes use of three custom SQL Views for convenience. The code used to create these views is listed below:
SELECT path, count(*) AS views
FROM log
GROUP BY log.path;
CREATE VIEW t_logs as
SELECT date_trunc('day', time) as day,
count(*) as total
FROM log
GROUP BY day;
CREATE VIEW e_logs as
SELECT date_trunc('day', time) as day,
count(*) as errors
FROM log
WHERE status like '4%'
GROUP BY day;
Executes an SQL query that returns the top three articles of all time, ranked by view count.
Executes an SQL query that returns all authors in the database, rank ordered by combined popularity of articles written.
Executes an SQL query that returns days where the database logs contained > 1% error status codes.