Visualizations are a tool for understanding data. They express the relationships between data points, and their relative significance, without requiring the reader to scroll through tables or calculate percentages. A good image provides immediate understanding at a glance. All of the data used for this exercise comes from the 2013 College Scorecard dataset, which is the most complete data I found as of this date.
One of the first images I created is a map of the United States showing tuition at bachelor’s-degree-granting universities across the country. The size of each circle indicates the amount of in-state tuition charged by the institution, while the color indicates whether the institution is public (light blue) or private non-profit (dark blue). As expected, the dark blue circles tend to be noticeably larger on average. The map also shows that more expensive private universities tend to be located in the Northeast, while the South and Midwest feature less expensive public and private options.
The second graph I produced is a bubble graph showing the geographic distribution of student populations. It shows that the largest number of undergraduate students is located in the Southeast region (1,994,386), while the fewest are located in the Rocky Mountains region (410,525). I created a legend separately to spell out each state.
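The regional totals behind a graph like this come from summing enrollment by region. The graph itself was built in a visualization tool, so the following is only a minimal Python sketch of the same aggregation. The field names (`state`, `region`, `undergrads`) and every number in it are made up for illustration; they are not the College Scorecard column names or figures.

```python
from collections import defaultdict

# Hypothetical rows: field names and counts are illustrative only.
rows = [
    {"state": "FL", "region": "Southeast", "undergrads": 1200},
    {"state": "GA", "region": "Southeast", "undergrads": 800},
    {"state": "CO", "region": "Rocky Mountains", "undergrads": 500},
    {"state": "UT", "region": "Rocky Mountains", "undergrads": 300},
    {"state": "IL", "region": "Great Lakes", "undergrads": 900},
]


def total_by_region(records):
    """Sum the undergraduate count for each region."""
    totals = defaultdict(int)
    for record in records:
        totals[record["region"]] += record["undergrads"]
    return dict(totals)


totals = total_by_region(rows)
largest = max(totals, key=totals.get)   # region with the most students
smallest = min(totals, key=totals.get)  # region with the fewest students
print(largest, smallest)
```

Each region's total would then set the size of its bubble.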
My final visualization shows the correlation between family income and SAT scores at universities in Illinois. As seen in the graph, there appears to be a positive correlation: as average family income rises, test scores tend to rise as well. Northwestern University and the University of Chicago, two of the most prestigious universities in the state, have the highest test scores and the highest average family income of any undergraduate-degree-granting institution in Illinois.
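To put a number on a relationship like this, one common choice is the Pearson correlation coefficient, which ranges from -1 (perfectly negative) to 1 (perfectly positive). This is not how the graph above was produced; it is a minimal Python sketch, and the income and SAT values in it are invented for illustration rather than taken from the dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length number lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Made-up values, one pair per hypothetical institution.
family_income = [40000, 55000, 70000, 90000, 120000]
sat_average = [980, 1050, 1100, 1210, 1400]

r = pearson(family_income, sat_average)
print(round(r, 3))
```

A value close to 1 would support the upward trend seen in the scatter plot, although correlation alone says nothing about what causes it.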
While it’s important to note that these visualizations were not developed using strictly scientific methods, they still serve as a useful way to digest what is otherwise a noisy dataset or table. This exercise has taught me the importance of visualizations and the utility that tools such as Tableau can provide when dealing with data.
As discussed in class, data can be noisy, messy, and at times difficult to understand. However, using data analysis tools, we can transform data into information. The key distinction lies in usability: information consists of relevant, easily digested facts and figures. Part of the process of converting data to information involves reducing the volume being presented. In this post, I’ll explore ways to decrease the size and scope of my dataset to make my analysis more manageable.
Because my dataset centers on U.S. higher education, five key categories stand out as relevant for narrowing down the data: State, Private vs. Public, Size, Institution Type, and Financial Aid. For example, I might want to focus on private universities in Illinois with 5,000 to 15,000 students. To narrow my analysis further, I would pull only the key figures relevant to it, such as cost, employment rate, and debt and earnings levels after graduation. Not only will this narrow the scope of my dataset, but it will also allow for more relevant comparisons between similar institutions.
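The example above (private universities in Illinois with 5,000 to 15,000 students, keeping only a few figures) can be sketched in a few lines of Python. This is a minimal illustration under assumptions: the records, the field names (`state`, `control`, `students`, `cost`), and the `narrow` helper are all hypothetical and do not match the actual College Scorecard column names.

```python
# Hypothetical records: names, fields, and values are illustrative only.
institutions = [
    {"name": "College A", "state": "IL", "control": "private", "students": 8000, "cost": 42000},
    {"name": "College B", "state": "IL", "control": "public", "students": 12000, "cost": 15000},
    {"name": "College C", "state": "IL", "control": "private", "students": 2500, "cost": 38000},
    {"name": "College D", "state": "WI", "control": "private", "students": 9000, "cost": 40000},
]


def narrow(rows, state, control, min_students, max_students, keep):
    """Filter rows by state, control, and size, then keep only chosen fields."""
    selected = []
    for row in rows:
        matches = (
            row["state"] == state
            and row["control"] == control
            and min_students <= row["students"] <= max_students
        )
        if matches:
            selected.append({field: row[field] for field in keep})
    return selected


subset = narrow(institutions, "IL", "private", 5000, 15000, keep=["name", "cost"])
print(subset)
```

Filtering rows first and then trimming columns reduces both the number of institutions and the number of figures per institution, which are the two kinds of narrowing described above.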
In my next post, I will investigate data analysis tools I can use to visualize and present the information extracted from the data.
For my project, I plan to look at the College Scorecard dataset.
In September 2015, the U.S. government launched the College Scorecard website. The site is designed to allow parents, students, and other interested consumers to easily compare statistics on higher education institutions. Key statistics such as cost, post-graduation earnings, debt levels, and employment levels are found within the government’s datasets. The full datasets are available on the College Scorecard website.
For my topic, I will be analyzing the government’s dataset in order to better understand college as both an investment and a financial decision. The raw data, which cover 1996 to 2015, are large: each year represents 100 MB of data, and the uncompressed files total 2 GB. In terms of the data table, each file has more than 7,700 rows and more than 1,700 columns.
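With files this wide, one possible way to keep things manageable is to read a file row by row and hold on to only a handful of columns. The sketch below uses Python's built-in `csv` module on a tiny in-memory sample; the column names and the `read_columns` helper are hypothetical stand-ins, not the real College Scorecard headers.

```python
import csv
import io


def read_columns(handle, wanted):
    """Yield one dict per CSV row, containing only the wanted columns."""
    reader = csv.DictReader(handle)
    for row in reader:
        yield {name: row[name] for name in wanted}


# Tiny stand-in for one yearly file; real files have far more columns.
sample = io.StringIO(
    "name,state,cost,unused_1,unused_2\n"
    "College A,IL,42000,x,y\n"
    "College B,WI,15000,x,y\n"
)

rows = list(read_columns(sample, ["name", "cost"]))
print(rows)
```

Because `csv.DictReader` reads one row at a time, the same function could be pointed at an open file on disk without loading the whole file into memory. Note that values come back as strings and would need converting before any arithmetic.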
In order to tackle this dataset, I plan to break the data into more manageable pieces. Some of this has already been done by the government, and those smaller files can be downloaded from the College Scorecard website. As stated in the reading, this is the ultimate goal of information architecture: to make information findable and understandable. In my next post, I plan to consider in more detail the ways in which I can break down and compare the data.