Scoping Projects

We wrote about what a good Data Science for Social Good project looks like. Our post begins:

Data Science for Social Good is a summer program that requires year-round preparation. A successful summer requires a mix of good people and projects, and we spend a lot of time trying to find projects to solve and the people to solve them. In addition to reading over 800 applications from aspiring fellows, mentors, and project managers, we’ve spent numerous hours researching, pursuing, and scoping projects: exploring datasets, speaking with representatives, and wrangling with attorneys. Well over a hundred projects will cross our emails, phones, and eyes before we find the 12 to do next summer.

Read more here.

Visiting the Charlotte-Mecklenburg Police Department

Ayesha Mahmud, Kenny Joseph, and I wrote about the trip and how it affects our work. The post begins:

Police departments around the country have been in the spotlight recently because of several controversial, high-profile incidents. Tragic events in Ferguson, New York City, Baltimore, and elsewhere have highlighted the need for police departments to better address the issue of adverse interactions between the police and the public. Many police departments are working hard to avoid these negative interactions with new technologies and tactics, while others are leading new data collection efforts.

This summer, as part of the White House Police Data Initiative, fellows Sam Carton, Kenny Joseph, Ayesha Mahmud, and Youngsoo Park, technical mentor Joe Walsh, and project manager Lauren Haynes are working with the Charlotte-Mecklenburg Police Department (CMPD) on a novel approach: using data science to improve the department’s Early Intervention System (EIS) for flagging officers who may be at a high risk for being involved in an adverse interaction.

Read more here.

Finding Legislative Plagiarism

Eugenia Giraudy and I wrote a blog post introducing our Data Science for Social Good project:

In 2005, Florida implemented a new “Stand Your Ground” law, which legally protected the use of deadly force in self-defense. The law, which removes the “duty to retreat” when a person is threatened with serious bodily harm, gained national attention after George Zimmerman fatally shot Trayvon Martin in 2012.

Soon after its passage in Florida, Stand Your Ground laws went “viral,” spreading to other parts of the country. Currently, at least two dozen states have implemented a version of Florida’s legislation. These laws didn’t arise in response to broad, spontaneous popular demand. Interest groups, in particular the National Rifle Association and the American Legislative Exchange Council (ALEC), drafted a model bill to ease passage across the country. Ten states have passed nearly identical bills to the ones Florida used and ALEC promoted.

Read more here.

Simple (as Possible) Drake Installation

Factual has created a data-workflow tool called Drake. Drake lets the analyst outline her command-line instructions -- including data collection, pre-processing, analysis, validation, and visualization -- and easily run them together. If the analyst modifies code or data in the workflow, Drake naturally re-runs all instructions that depend on that modified piece. This makes for a cleaner, more efficient, more reproducible workflow.

Installing Drake requires Java JDK, Leiningen, the Drake uberjar, and a shell script. Here I provide a series of steps that can install these things on an Ubuntu system. Note that I chose to put the Drake files in the /usr/local/bin/ directory, which resides in my PATH.

Continue reading.

Adding New Users to Existing EC2s

It's somewhat straightforward to add a user to an AWS security group and then create an AWS instance that the new user can access. It's more difficult to grant a new user access to existing instances. I don't want to waste time trying to find an answer again (Amazon, your AWS documentation could use some work!), so I'm posting my solution here, where future me and other confused individuals can find it. Here are the steps:

  1. Create a new user in IAM.
  2. Go to 'users' in IAM and add the new user to the appropriate security group.
  3. Go to 'users' in OpsWorks and 'import IAM users' (at the bottom of the page).
  4. Choose the user(s) you'd like to add and click on 'import to OpsWorks'.
  5. Click on the user you just imported and copy the public key into the provided box. Also enable SSH access (a checkbox below) so the user can SSH into the instance.

OpsWorks automatically executes a recipe that pushes the new user's permissions to the instances, which takes a minute or two. The new user can then log in.

To remove the user's permissions, go to OpsWorks -> users -> [user's account] -> 'deny permission'.

OpenGov Voices: Bringing Transparency to Earmarks Buried in the Budget

My Data Science for Social Good earmarks team wrote a post for the Sunlight Foundation about difficulties encountered in trying to find congressional earmarks. It begins:

Last week, President Obama kicked off the fiscal year 2016 budget cycle by unveiling his $3.99 trillion budget proposal. Congress has the next eight months to write the final version, leaving plenty of time for individual senators and representatives, state and local governments, corporate lobbyists, bureaucrats, citizens groups, think tanks and other political groups to prod and cajole for changes. The final bill will differ from Obama’s draft in major and minor ways, and it won’t always be clear how those changes came about. Congress will reveal many of its budget decisions after voting on the budget, if at all.

To continue reading, click here.

Who Attends Chicago Public Schools? A Breakdown by Race

While Chicagoans understand that white students are less likely to enroll in CPS schools than students of color, few seem to know how big the difference is.  Using Bayes' formula and publicly available data (located here, here, and here), I calculated the probability that a Chicago child of African American, Hispanic, Asian, white, and multi-racial descent attends the public schools.  Here are the results:

5-19 year olds in Chicago   Pr(attends CPS)   Pr(race)   Pr(race | attends CPS)   Pr(attends CPS | race)
African American            0.7971            0.390      0.397                    0.81

Pr(attends CPS) is the probability that a child (between 5 and 19 years old) in Chicago attends a public school.  Pr(race) is the probability that a randomly chosen child in Chicago belongs to a given race; for example, the probability that a Chicago child is white is 15.5%.  Pr(race | attends CPS) is the probability that a randomly chosen CPS student is a given race; for example, about 39.7% of CPS students are African American.

From these three inputs, I calculate Pr(attends CPS | race), the probability that a child of a given race attends a Chicago public school.  While nine in ten Hispanic children and eight in ten African American children in Chicago are enrolled in CPS, fewer than five in ten white children are.  An African American student is 1.7 times more likely to attend a CPS school than a white student.  A Hispanic student is almost twice as likely.
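For anyone who wants to check the arithmetic, here is a minimal sketch of the Bayes' formula step in Python. The function and variable names are my own, chosen for illustration, and the only inputs used are the three African American figures from the table.

```python
def pr_attends_given_race(pr_attends, pr_race, pr_race_given_attends):
    """Bayes' formula:
    Pr(attends CPS | race) = Pr(race | attends CPS) * Pr(attends CPS) / Pr(race)
    """
    if not 0 < pr_race <= 1:
        raise ValueError("pr_race must be in (0, 1]")
    return pr_race_given_attends * pr_attends / pr_race


# Inputs for African American children, taken from the table row.
african_american = pr_attends_given_race(
    pr_attends=0.7971,            # Pr(attends CPS)
    pr_race=0.390,                # Pr(race)
    pr_race_given_attends=0.397,  # Pr(race | attends CPS)
)
print(round(african_american, 2))  # prints 0.81
```

The same function works for any other group once its Pr(race) and Pr(race | attends CPS) values are plugged in.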

Shining a Light on Earmarks

I am serving as a mentor for the Eric and Wendy Schmidt Data Science for Social Good program this year. Madian Khabsa (one of my fellows) and I wrote about our Congressional-earmarks project for the DSSG blog.  Here's the beginning of the article:

Earmarks have been called “the best known, most notorious, and most misunderstood aspect of the congressional budgetary process.” These government budget items allocated to specific people, places, or projects are alternately described as a subversion of democracy or an important negotiation tool to smooth the passage of controversial legislation. But despite the attention earmarks attract, they remain extremely tedious and time-consuming to identify in federal bills and reports that may be hundreds of pages long.

This summer, Data Science for Social Good fellows Matthew Heston, Madian Khabsa, Vrushank Vora, and Ellery Wulczyn and mentor Joe Walsh, working with Christopher Berry at the Harris School of Public Policy, will help shine a light on earmarks, building computational tools to automatically identify them in Congressional texts.

You can read more here.