Skip to content

trekirkman/udacity-newsdata

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

15 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Newsdata Logs Project

Overview

newsdata.py is a database reporting tool written in Python (2.7) for the Udacity Full Stack Nanodegree program. It connects to the newsdata.sql database, which features information relating to a hypothetical web-based news platform. The program executes PostgreSQL queries and prints the resulting data in a formatted list.

This tool seeks to answer the following questions:

  1. Who are the top 3 most popular authors on this site, ranked by article views?
  2. Who are the most popular authors, ranked by article views?
  3. Are there any days where the percentage of HTTP logs with error codes exceeds 1%?

Package Contents

1. newsdata.py

Python file containing all code to execute the queries and print results

2. newsdata_output.txt

Plain text file containing printed results from newsdata.py


Instructions

Install Python 2.7

Download can be found here

Install Virtual Box

VirtualBox is software that runs the Virtual Machine (VM) that will be used to run this program. Download here, and install the platform package. Note: you do not need the extension pack, SDK or to launch the program

Install Vagrant

Vagrant configures the VM and establishes a shared directory between the VM and your local filesystem. Download here and download vagrantfile from this repo. Navigate to the directory containing vagrantfile and run

vagrant up

to set up the VM. Note: this process should take a while.

When finished run

vagrant ssh

to log in to the virtual machine. For more information, see the Vagrant documentation here.

Download Database

Download newsdata.sql, found here and place in your vagrant directory. To connect to the database run:

psql -d news -f newsdata.sql

Create Custom SQL Views

newsdata.py makes use of three custom SQL Views for convenience. Download create_views.sql and then import the views:

psql -d news -f create_views.sql

Download Program & Run

Download newsdata.py file to the same directory in the VM as newsdata.sql, and run:

python newsdata.py


Custom Views

newsdata.py makes use of three custom SQL Views for convenience. The code used to create these views is listed below:

1. log_path: View that displays log paths and associated view counts

SELECT path, count(*) AS views
FROM log
GROUP BY log.path;

2. t_logs: View that displays total logs per day

CREATE VIEW t_logs as
SELECT date_trunc('day', time) as day, 
count(*) as total 
FROM log 
GROUP BY day;

3. e_logs: View that displays total logs per day with an error status code

CREATE VIEW e_logs as
SELECT date_trunc('day', time) as day, 
count(*) as errors 
FROM log 
WHERE status like '4%' 
GROUP BY day;

Functions

1. get_pop_articles()

Executes an SQL query that returns the top three articles of all time, ranked by view count.

1. get_pop_authors()

Executes an SQL query that returns all authors in the database, rank ordered by combined popularity of articles written.

1. get_error_logs()

Executes an SQL query that returns days where the database logs contained > 1% error status codes.


License

MIT

About

Data analysis tool written in python

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages