opendisclosure

Overview

The goal of the project is to produce useful visualizations and statistics for Oakland's campaign finance data, starting with the November 2014 mayoral race.

Meeting notes can be found in this Google Doc.

Running Locally

To start, you'll need Ruby installed. The commands below assume macOS with Homebrew:

brew install rbenv
brew install ruby-build
rbenv install 2.1.2

Then install bundler and foreman:

gem install bundler
gem install foreman

Install postgres:

brew install postgres

# choose one:
# A) to start postgres on startup:
ln -sfv /usr/local/opt/postgresql/*.plist ~/Library/LaunchAgents
launchctl load ~/Library/LaunchAgents/homebrew.mxcl.postgresql.plist

# B) or, to run postgres in a terminal (you will have to leave it running)
postgres -D /usr/local/var/postgres

# then install the pg gem (ARCHFLAGS forces an x86_64 build)
ARCHFLAGS="-arch x86_64" gem install pg

Now you can install the other dependencies with:

bundle install

Create your PostgreSQL user (this may be unnecessary, depending on how postgres is installed):

sudo -upostgres createuser $USER -P
# enter a password you're not terribly worried to share
echo DATABASE_URL="postgres://$USER:[your password]@localhost/postgres" > .env
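
To see how the pieces of that DATABASE_URL fit together, here is a minimal Ruby sketch that splits a URL of the same shape into its parts using only the standard library. The user name `alice` and password `secret` are made-up sample values, not project defaults, and the `db_config` hash is purely illustrative.

```ruby
require "uri"

# Sample value in the same shape as the .env entry above.
# "alice" and "secret" are placeholders for illustration only.
database_url = "postgres://alice:secret@localhost/postgres"

uri = URI.parse(database_url)

db_config = {
  adapter:  uri.scheme,                # "postgres"
  user:     uri.user,                  # the role created with createuser
  password: uri.password,
  host:     uri.host,
  database: uri.path.sub(%r{\A/}, "")  # path without the leading slash
}

puts db_config
```

A password containing characters such as `@` or `/` would need to be percent-encoded before it goes into the URL.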

You should be all set. Run the app like this:

foreman start

Then, to get a local copy of all the data:

bundle exec ruby backend/load_data.rb

Data Source

The raw, original, separated-by-year data can be found on Oakland's "NetFile" site here: http://ssl.netfile.com/pub2/Default.aspx?aid=COAK

A nightly ETL job processes that data, so this dataset is updated with the latest version every day (or so). A data dictionary describing what each column means is available here.

Technology choices to date:

  • JavaScript/HTML/CSS
  • D3.js for visualization
  • Ruby (probably) for backend data acquisition and processing
  • Neo4j (graph database) for storing and querying contributions data

The raw data needs to be cleaned. A few common problems are:

  • Misspellings
  • Mischaracterized/vague occupation
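
As a rough illustration of rule-based cleaning for these problems, the Ruby sketch below normalizes contributor names and maps free-text occupations onto canonical labels. The alias table and the helper names (`squish`, `clean_name`, `clean_occupation`) are hypothetical; the real rules would have to come from inspecting the data. It only handles variants that differ in case or spacing plus a small lookup table, so true misspellings would still need something like fuzzy matching.

```ruby
# Hypothetical cleaning rules; the real list would come from inspecting the data.
OCCUPATION_ALIASES = {
  "atty"     => "Attorney",
  "attorney" => "Attorney",
  "lawyer"   => "Attorney",
  "n/a"      => "Unknown",
  ""         => "Unknown"
}.freeze

# Collapse repeated whitespace and trim the ends (nil becomes "").
def squish(text)
  text.to_s.strip.gsub(/\s+/, " ")
end

# Normalize a contributor name so that variants differing only in
# case or spacing compare equal.
def clean_name(name)
  squish(name).split(" ").map(&:capitalize).join(" ")
end

# Map a free-text occupation onto a canonical label when a rule exists;
# otherwise keep the tidied original so nothing is silently dropped.
def clean_occupation(occupation)
  tidy = squish(occupation)
  OCCUPATION_ALIASES.fetch(tidy.downcase, tidy)
end

puts clean_name("  jane   DOE ")           # prints "Jane Doe"
puts clean_occupation("ATTY")              # prints "Attorney"
puts clean_occupation("Retired teacher")   # prints "Retired teacher"
```

Note that simple capitalization mangles names such as "McDonald", so a production rule set would need exceptions.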

Development Process

We are trying out an "agile" approach, starting with a basic first version:

  • Clean a sample dataset (for one point in time) using a list of rules we come up with
  • Use the cleaned dataset to answer the 5 key questions below
  • Then try to automate the cleaning to the extent possible for automatic updates/dynamic data

5 Key Questions from the Public Ethics Commission

(Click a question to be taken to the GitHub Issue for that question.)

1. Who are the top 5-10 contributors to each campaign? (people or company)

2. Which industries support each candidate? (top 5 industries, the aggregate amount given by each industry to each candidate, and the percentage of the committee’s total fundraising for the reporting period that this amount represents)

3. A bar graph showing how much each campaign committee has raised so far versus how much that committee has spent on the campaign.

4. What percentage of campaign contributions to each mayoral candidate are made from Oakland residents vs. others?

5. Evaluate any overlap between corporations and industries that employ and register a lobbyist with the City of Oakland and campaign contribution and expenditure data.
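
Questions 1 and 4 above reduce to grouping and summing contribution rows. The following is a minimal Ruby sketch under stated assumptions: the sample rows are invented, the field names (`:candidate`, `:contributor`, `:city`, `:amount`) are hypothetical rather than the project's schema, and the Oakland share is computed by dollar amount rather than by number of contributions.

```ruby
# Made-up sample rows; the field names are assumptions, not the real schema.
CONTRIBUTIONS = [
  { candidate: "Candidate A", contributor: "Jane Doe",  city: "Oakland",  amount: 500 },
  { candidate: "Candidate A", contributor: "Acme Corp", city: "San Jose", amount: 700 },
  { candidate: "Candidate A", contributor: "Jane Doe",  city: "Oakland",  amount: 300 },
  { candidate: "Candidate B", contributor: "John Roe",  city: "Berkeley", amount: 400 },
  { candidate: "Candidate B", contributor: "Ann Poe",   city: "Oakland",  amount: 100 }
].freeze

# Question 1: top contributors per candidate, ranked by total amount given.
def top_contributors(rows, limit = 5)
  rows.group_by { |r| r[:candidate] }.each_with_object({}) do |(candidate, candidate_rows), result|
    totals = candidate_rows.each_with_object(Hash.new(0)) do |r, sums|
      sums[r[:contributor]] += r[:amount]
    end
    result[candidate] = totals.sort_by { |_name, total| -total }.first(limit)
  end
end

# Question 4: percentage of each candidate's contributed dollars that
# came from contributors whose city is Oakland.
def oakland_share(rows)
  rows.group_by { |r| r[:candidate] }.each_with_object({}) do |(candidate, candidate_rows), result|
    total   = candidate_rows.inject(0) { |sum, r| sum + r[:amount] }
    oakland = candidate_rows.select { |r| r[:city] == "Oakland" }
                            .inject(0) { |sum, r| sum + r[:amount] }
    # Guard against a candidate with no recorded dollars.
    result[candidate] = total.zero? ? 0.0 : (100.0 * oakland / total).round(1)
  end
end

p top_contributors(CONTRIBUTIONS)
p oakland_share(CONTRIBUTIONS)
```

With the sample rows, Jane Doe's two gifts to Candidate A are combined into a single 800 total that ranks ahead of Acme Corp's 700, and the Oakland shares come out to 53.3 for Candidate A and 20.0 for Candidate B. Running this on names that have not been cleaned first would split one contributor across several spellings.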

TABLETOP CODE

Initialize Tabletop

var gsheet = "https://docs.google.com/spreadsheet/pub?key=0AnZDmytGK63SdDVyeE9ONFctMnRSU2VjanhZTUJsN1E&output=html";
var rows = []; // collects one entry per spreadsheet row; swap in the array of your choice

$(document).ready(function(){
    Tabletop.init( { key: gsheet,
                       callback: setTheScene,
                       // proxy: "PROXY_URL", // uncomment and set only if using a proxy
                       wanted: ["SHEETi"], // replace with your sheet name; can list multiple sheets
                       debug: true } );
});

// Called by Tabletop once the spreadsheet has loaded.
function setTheScene(data, tabletop){
  $.each( tabletop.sheets("SHEETi").all(), function(i, sheeti) {
    var insertRow = [];
    insertRow[0] = sheeti.columnname; // replace columnname with your column name
    rows.push(insertRow);
  });
}