Tableau, the U.S. Census Bureau, and data.world recently co-hosted a vizathon (hackathon for data visualization) at Capital Factory in Austin. I participated as a way to practice my skills with Tableau and to meet other people interested in saving the world through data viz projects.
The week before the vizathon, I learned that Farm & City is promoting public transit advocacy in Texas through its 1000 Texans for Transit campaign. Farm & City noted that, of states with large cities, only Texas and Ohio do not provide any dedicated funding for public transit. Transit use is an important health determinant in urban areas, since it is associated with greater physical activity, lower traffic fatality rates, and lower transportation emissions per mile traveled. The goal of this dashboard project is to display transit use and transit spending at the metropolitan level. The original map we presented to the vizathon judges was a work in progress. The map below is the final project; it covers the whole U.S. and adds transit funding data. This dashboard is best displayed on Tableau’s website.
The U.S. Census Bureau, through its annual American Community Survey (ACS), estimates the number and percentage of commuters aged 16 and up who use various forms of transportation to get to work, including public transit. While work commutes make up only a fraction of all trips people take, this data set is available at the neighborhood level and for the entire United States. Because we were working at the Census tract level, we used the most recent 5-year data (2012-2016) to get estimates that are relatively accurate and up to date. Using this one measure, we can compare transit use in metro areas across state lines. The Census API is quite user-friendly, but it requires multiple API pulls to get sub-county data for more than a single state. By happenstance, a statistician friend of mine was on my bus the morning of the vizathon and suggested we use the acs library in R to pull data from the API more easily. That is what I did to expand the data set beyond Texas after the competition.
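To illustrate the "one pull per state" pattern described above, here is a minimal Python sketch that only builds the request URLs (it makes no network calls). The endpoint path, the variable codes from ACS table B08301 (total workers and public transportation commuters), and the helper names are my assumptions for illustration; the actual project used the acs library in R, not this code.

```python
from urllib.parse import urlencode

# Assumed endpoint for the 2012-2016 ACS 5-year estimates.
ACS5_2016_URL = "https://api.census.gov/data/2016/acs/acs5"

# Assumed variables from table B08301 (means of transportation to work):
# total workers aged 16+, and those commuting by public transportation.
VARIABLES = ["NAME", "B08301_001E", "B08301_010E"]


def tract_query_url(state_fips, api_key=None):
    """Build the URL for one tract-level request covering a single state."""
    params = {
        "get": ",".join(VARIABLES),
        "for": "tract:*",
        "in": f"state:{state_fips}",
    }
    if api_key:
        params["key"] = api_key
    # Leave ':', '*' and ',' unescaped so the URL stays readable.
    return f"{ACS5_2016_URL}?{urlencode(params, safe=':*,')}"


def tract_query_urls(state_fips_codes, api_key=None):
    """Return one URL per state, since each tract-level pull covers one state."""
    return {fips: tract_query_url(fips, api_key) for fips in state_fips_codes}


# Example: Texas (FIPS 48) and Ohio (FIPS 39).
urls = tract_query_urls(["48", "39"])
for fips, url in urls.items():
    print(fips, url)
```

Each URL could then be fetched with any HTTP client and the per-state results stacked into one table.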
Population density is a major factor in predicting transit use (necessary, but not sufficient), since denser areas are better served by transit systems: more people are concentrated along the same general commute routes. Cities also vary considerably in residential population density, so to make more useful comparisons between metro areas, it makes sense to consider the proportion of workers who use transit given the density of their neighborhood. While the Census Bureau no longer publishes the geographic area of Census tracts, we were able to extract land area from the shapefile data provided by Census Bureau staff.
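As a rough sketch of the two tract-level measures discussed here, the Python function below computes the share of workers who commute by transit and the residential density per square mile. It assumes land area arrives in square meters (as in the ALAND field of Census TIGER/Line shapefiles, which I am assuming is the field used); the function and parameter names are hypothetical.

```python
# Square meters in one square mile, for converting shapefile land area.
SQ_METERS_PER_SQ_MILE = 2_589_988.11


def tract_metrics(total_workers, transit_workers, population, land_area_sq_m):
    """Return transit share of commuters and residents per square mile.

    Either value is None when its denominator is zero (for example, a tract
    with no workers or one that is entirely water).
    """
    transit_share = transit_workers / total_workers if total_workers else None
    land_sq_mi = land_area_sq_m / SQ_METERS_PER_SQ_MILE
    density = population / land_sq_mi if land_sq_mi else None
    return {"transit_share": transit_share, "density_per_sq_mi": density}


# Made-up tract: 2,000 workers, 150 transit commuters, 5,000 residents,
# and exactly one square mile of land.
example = tract_metrics(2000, 150, 5000, SQ_METERS_PER_SQ_MILE)
print(example)
```

Plotting transit share against density, tract by tract, is one way to compare metro areas on more equal footing.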
The transit funding data came from the Federal Transit Administration and was suggested by Farm & City staff. Because some metro areas have more than one transit service, I summed the funding that flowed to every transit system within a given metro area, grouped by funding source and year.
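The aggregation described above amounts to a group-and-sum over (metro area, funding source, year). Here is a minimal Python sketch of that idea; the record layout, field names, and dollar amounts are invented for illustration and are not real Federal Transit Administration figures.

```python
from collections import defaultdict


def aggregate_funding(records):
    """Sum funding across agencies by (metro area, funding source, year)."""
    totals = defaultdict(float)
    for rec in records:
        key = (rec["metro"], rec["source"], rec["year"])
        totals[key] += rec["amount"]
    return dict(totals)


# Made-up rows for illustration only; not real FTA data.
records = [
    {"metro": "Metro A", "agency": "Agency 1", "source": "Local", "year": 2016, "amount": 100.0},
    {"metro": "Metro A", "agency": "Agency 2", "source": "Local", "year": 2016, "amount": 50.0},
    {"metro": "Metro A", "agency": "Agency 1", "source": "Federal", "year": 2016, "amount": 30.0},
    {"metro": "Metro B", "agency": "Agency 3", "source": "Local", "year": 2016, "amount": 20.0},
]

totals = aggregate_funding(records)
for key, amount in sorted(totals.items()):
    print(key, amount)
```

The two "Metro A" agencies collapse into a single local-funding total, which is the behavior needed when a metro area has more than one transit service.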
The judges, who represented each of the co-hosts, considered the three projects submitted at the vizathon and determined that this transit use dashboard had the most potential. One of the judges suggested adding data about transit funding, so I added that component to the final dashboard after the event. Pictured to the right is the winning team: Constantine Murenin, Heike Jost, and me.
Possible next steps
Given more time for this project, there are several improvements that could make this Tableau dashboard more useful for decision makers. First, it’s a bit clunky in that the transit use and transit funding charts require separate pull-down menus to operate. This is because formatting differences made it difficult to join several of the data sets. There is also an unintentional feature on the map where scrolling over any Census tract with zero transit use highlights all similar tracts, and I’d like to eliminate that.
The second improvement would be to include job and student density, since people spend a great deal of their time at places other than home. Residential population is the standard way people think about population counts, but people rarely commute from one residential area to another. This would help calculate more accurate estimates of transit use based on where people are, not just where their beds are located. A good source for data on the estimated number of jobs by Census tract is OnTheMap.
A third improvement would be to use other sources of data for transit use, to better capture the reality that most bus and train trips are not for commuting purposes. It might also be useful to consider transit spending per user, or per capita, within each metro area. If you know of additional good transit use data, please share it!