In September of 2015, the U.S. Government launched the College Scorecard website. The site is designed to allow parents, students and other interested consumers to easily compare statistics on higher education institutions. Key stats such as cost, post-graduation earnings, debt levels, employment levels, and other facts are found within the government’s data sets. Full data sets are available on the college scorecard website.
For my topic, I will be analyzing the government’s data set in order to better understand college as both an investment and financial decision. The raw data – containing information ranging from 1996-2015 – are large, with each year representing 100MB of data. The uncompressed files total 2GB in size. In terms of the data table, the files each have roughly 7,700+ rows and 1,700+ columns.
In order to tackle this dataset I plan to break the data into more manageable pieces. Some of this has already been done by the government and can be downloaded from the College Scorecard website. As stated in the reading, this is the ultimate goal of information architecture, which is to make information findable and understandable. In my next post, I plan to consider the ways in which I can break down and compare the data in more detail.