Student Debt Story

The issue of student debt is one of the most pressing for the millennial generation. Millions of young people every year leave college with an ever increasing burden of debt, threatening their purchasing power and financial security. The data bears these facts out, as the tableau story below will demonstrate. The purpose of this story is to demonstrate that student debt continues to be an unaddressed issue and is affecting the viability and ultimate value of the current higher education model.

The story type I used was change over time. Using the college scorecard data set, I was able to pull a decade of tuition, debt and income data for Non-Profit and Public institutions. The story looks at change in tuition, total debt and debt-to-income from 2003 to 2013. I provided three story points in order to tell the story. Each story point should logically build on the previous one – setting the stage with the increase in tuition leads to the consequential increase in debt burden. Because tuition exceeds the increase in income over the period – student debt-to-income naturally increases as well.

Visualizing Colleges Using Tableau

Visualizations are a tool for understanding data. They help to express relationships between data, the significance of data points, and the importance of data without having to scroll through tables or calculate percentages. Visualizations provide immediate understanding based on vision and interpretation of the image. All of the data used for this exercise is from the 2013 College ScoreCard dataset, which provides the most complete data I found as of this date.

One of the first images I created is a map of the United States showing the tuition of bachelor’s degree granting universities across geographies. The size of circles indicate the amount of in-state tuition charged by the institution, while the color of the circle indicates whether the institution is public (light blue) or non-profit private (dark blue). As seen, dark blue circles tend to be significantly larger on average as expected. The visualization also shows that in terms of distribution, more expensive private universities tend to be located in the Northeast while the South and Midwest feature less expensive public and private options.

US Tuition Map_Private Non Profit & Public Bachelors

The second graph I produced shows the geographic distribution of student populations using a bubble graph. The graph demonstrates that most undergraduate students are located in the Southeast region (1,994,386) while the fewest are located in the Rocky Mountains region (410,525). I created a legend separately to spell out each state.

US Undergrad Population by RegionUS Undergrad Population by Region Agenda

My final visualization shows the correlation between Family Income and SAT at universities in Illinois. As seen in the graph, there appears to be a positive correlation between an increase in family income and the related increase in test scores. Northwestern University and the University of Chicago, two of the most prestigious universities in the state, both have the highest test scores and the highest average family income of any undergraduate degree granting institution.

Test Scores & Family Income in Illinois.PNG

While it’s important to note that these visualizations are not developed using entirely scientific methods, they still serve as a useful way to digest what is otherwise a noisy dataset or table. This exercise has taught me the importance of visualizations and the utility that tools such as Tableau can provide when dealing with data.

Counting Illinois Colleges

To dive deeper into my dataset, I used Google BigQuery to run SQL queries on my data table. Because of the size of my dataset (over 10MB) I had to first upload the file to Google Cloud.

In order to sample the use of BigQuery on my dataset, I decided to query the number of accredited undergraduate degree granting publc and non-profit institutions in the state of Illinois. In order to run this query I constructed the following in SQL:

FROM [neat-sunspot-164714:CollegeScoreCard.Dataset_2013]

The structure of the query is to select all the results (not excluding any columns) from my table where the state is Illinois, the predominate degrees granted are for undergraduate, and control of the institution is not for-profit.

A count expression provides: 71 results

I’m actually somewhat surprised by the number of results, however they’re confirmed by viewing the institution names, id’s and cities from the table. The query took about 5 seconds to run, despite needing to run through tens of thousands of rows.

145691 Illinois College Jacksonville IL
146481 Lake Forest College Lake Forest IL
143288 Blackburn College Carlinville IL
146825 MacMurray College Jacksonville IL
148405 Rockford University Rockford IL
144351 Concordia University-Chicago River Forest IL
147341 Monmouth College Monmouth IL
148131 Quincy University Quincy IL
143084 Augustana College Rock Island IL
148496 Dominican University River Forest IL
149505 Trinity Christian College Palos Heights IL
148584 University of St Francis Joliet IL
147660 North Central College Naperville IL
145725 Illinois Institute of Technology Chicago IL
149514 Trinity International University-Illinois Deerfield IL
145372 Greenville College Greenville IL
147679 North Park University Chicago IL
147244 Millikin University Decatur IL
147013 McKendree University Lebanon IL
144962 Elmhurst College Elmhurst IL
143118 Aurora University Aurora IL
145619 Benedictine University Lisle IL
147536 National Louis University Chicago IL
147828 Olivet Nazarene University Bourbonnais IL
143358 Bradley University Peoria IL
148654 University of Illinois at Springfield Springfield IL
146612 Lewis University Romeoville IL
145336 Governors State University University Park IL
148627 Saint Xavier University Chicago IL
144883 East-West University Chicago IL
148487 Roosevelt University Chicago IL
146719 Loyola University Chicago Chicago IL
144892 Eastern Illinois University Charleston IL
144005 Chicago State University Chicago IL
149231 Southern Illinois University-Edwardsville Edwardsville IL
149772 Western Illinois University Macomb IL
147776 Northeastern Illinois University Chicago IL
145813 Illinois State University Normal IL
148335 Robert Morris University Illinois Chicago IL
144281 Columbia College-Chicago Chicago IL
144740 DePaul University Chicago IL
145637 University of Illinois at Urbana-Champaign Champaign IL
145600 University of Illinois at Chicago Chicago IL
147703 Northern Illinois University Dekalb IL
149222 Southern Illinois University-Carbondale Carbondale IL
147369 Moody Bible Institute Chicago IL
149639 VanderCook College of Music Chicago IL
143297 Blessing Rieman College of Nursing Quincy IL
148511 Rush University Chicago IL
147590 National University of Health Sciences Lombard IL
149028 Saint Anthony College of Nursing Rockford IL
149763 Resurrection University Chicago IL
148575 Saint Francis Medical Center College of Nursing Peoria IL
146533 Lakeview College of Nursing Danville IL
147129 Methodist College Peoria IL
144971 Eureka College Eureka IL
146427 Knox College Galesburg IL
145646 Illinois Wesleyan University Bloomington IL
146667 Lincoln Christian University Lincoln IL
149781 Wheaton College Wheaton IL
146339 Judson University Elgin IL
143048 School of the Art Institute of Chicago Chicago IL
144050 University of Chicago Chicago IL
147767 Northwestern University Evanston IL
148593 St. John’s College-Department of Nursing Springfield IL
145558 Rosalind Franklin University of Medicine and Science North Chicago IL
148849 Shimer College Chicago IL
149329 Telshe Yeshiva-Chicago Chicago IL
145497 Hebrew Theological College Skokie IL
146621 Lexington College Chicago IL
260947 Christian Life College Mount Prospect IL

Web Site Structure & Data Schema

For assignment 2, I was tasked with designing a web structure and data schema for the data set I am using. To do this, I reviewed the resources from webstyleguide and lucidchart to gain a better understanding of both information architecture principles.

Web Architecture Assignment 2
Figure 1

As seen in figure 1 – my site structure takes on a mostly hierarchical structure. Sub-pages are accessed by first visiting or hovering over the relevant content or page. This then provides access to the sub-pages which sit underneath the parent category. I attempted to allow for internal routing as well – for example by accessing Resources you can choose between accessing an external link or being routed within the site to the College Scorecard sub-page of the Data category. This was an organization that seemed to make the most sense to me, however once actually implementing this structure I’m sure it would undergo revision. This process had me thinking about how a site should be structured and whether the most logical way is necessarily the most user friendly way.

For part 2 of my assignment, I was asked to design a database schema for my data set. This was somewhat difficult as my dataset is a “flat file” with most columns representing attributes of the University ID. I gave it my best shot, though, and focused on dividing into tables and eventually joining 3 key attributes. 1) The university’s key attributes such as size, type, tuition. 2) Its accreditation agency and the compliance needed to be accredited. 3) Its student post-graduate profile on earnings and debt. This wasn’t entirely a straight forward exercise because of the nature of my data, but I wanted to give it a thoughtful effort and tried to split the data as best as I could.

College Scorecard Data Schema Mock-up - New Page (1)
Figure 2

As seen in Figure 2 – the entity records involved ultimately direct to the Accredited College entity record. The idea is to bring all the information from the College record (basic facts), Accreditation profile/agency, and post-graduate survey and federal student loan reporting into the same record of Accredited College. The idea here is that the information is gathered in separate places, not all at once. In order for it to be joined together without significant mistakes, data integrity must be maintained along the way. For example, while each college is located in one state in the data, states can have many different colleges so we need the State to be separated and not stored with the college table. Next, since a University can be accredited by multiple agencies (regional accreditation, national accreditation, program accreditation), the accreditation profile for compliance needs to be separated as well. Overall, I felt this was a good exercise to think through and separate the dataset out logically.

Finding The Right Tools



When it comes to transforming data into information, the tool used is often as important as the data itself. The College Scorecard data-set will require data cleansing and any tool used to create visualizations will need to be able to handle large data volumes. The first step to analyzing data is to understand the structure of your data-set.

To begin, I will use Microsoft Excel to open the .CSV files in the raw data dump and better understand how the data is structured. This will give me an idea of what is relevant and usable in the data-set. It will also give me the beginning of an understanding of what data needs to be cleansed and what columns or rows will be irrelevant to the analysis. Excel can provide quick and easy formulas, sums and averages for simple data calculations.

Second, I will look to use a tool such as Microsoft PowerBI to attempt to analyze and visualize the data. PowerBI allows powerful visualization and the adding of fields and filters to present information in the most relevant manner. Finally, it allows the establishment of relationships between fields, tables and even data-sets. Understanding a data-set, its components and the relationships between fields and tables is important to utilizing it as effective information.

Bringing Data Down to Size

As discussed in class, data can be noisy, messy and at times difficult to understand. However, using data and data analysis tools we can transform data into information. The key distinction lies in usability. Information contains relevant and easily digested facts and figures. Part of the process of converting data to information involves reducing the volume being presented. In this post, I’ll explore ways to decrease the size and scope of my data-set to make my analysis more manageable.


Because my data-set centers on U.S. higher education, five key categories stand out as relevant for narrowing down the data: State, Private vs Public, Size, Institution Type, and Financial Aid. For example, I might want to focus on private universities in Illinois with 5,000 to 15,000 students. To further narrow down my analysis, I would focus on pulling the key figures relevant to my analysis, such as cost, employment rate, and debt and earnings levels after graduation. Not only will this narrow down the scope of my data-set, it will allow for more relevant comparisons between similar institutions.

In my next post, I will investigate data analysis tools I can use to visualize and present the information extracted from the data.

Analyzing College Scorecard Data

For my project, I plan to look at the college scorecard dataset

In September of 2015, the U.S. Government launched the College Scorecard website. The site is designed to allow parents, students and other interested consumers to easily compare statistics on higher education institutions. Key stats such as cost, post-graduation earnings, debt levels, employment levels, and other facts are found within the government’s data sets. Full data sets are available on the college scorecard website.


For my topic, I will be analyzing the government’s data set in order to better understand college as both an investment and financial decision. The raw data – containing information ranging from 1996-2015 – are large, with each year representing 100MB of data. The uncompressed files total 2GB in size. In terms of the data table, the files each have roughly 7,700+ rows and 1,700+ columns.

Data Pic

In order to tackle this dataset I plan to break the data into more manageable pieces. Some of this has already been done by the government and can be downloaded from the College Scorecard website. As stated in the reading, this is the ultimate goal of information architecture, which is to make information findable and understandable. In my next post, I plan to consider the ways in which I can break down and compare the data in more detail.