Visualizations are a tool for understanding data. They express relationships between variables, highlight significant data points, and convey the overall shape of the data without requiring the reader to scroll through tables or calculate percentages. A well-designed image can be understood at a glance. All of the data used for this exercise comes from the 2013 College Scorecard dataset, which was the most complete data I could find as of this date.
One of the first images I created is a map of the United States showing tuition at bachelor’s degree granting universities across the country. The size of each circle indicates the in-state tuition charged by the institution, while the color indicates whether the institution is public (light blue) or private non-profit (dark blue). As expected, the dark blue circles tend to be noticeably larger on average. The map also shows that more expensive private universities tend to be concentrated in the Northeast, while the South and Midwest feature less expensive public and private options.
The second graph I produced shows the geographic distribution of student populations using a bubble graph. The graph demonstrates that most undergraduate students are located in the Southeast region (1,994,386) while the fewest are located in the Rocky Mountains region (410,525). I created a legend separately to spell out each state.
My final visualization shows the relationship between family income and SAT scores at universities in Illinois. The graph suggests a positive correlation: institutions with higher average family income tend to report higher test scores. Northwestern University and the University of Chicago, two of the most prestigious universities in the state, have the highest test scores and the highest average family incomes of any undergraduate degree granting institution in Illinois.
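To put a number on a relationship like this, one option is the Pearson correlation coefficient. The snippet below is only a minimal sketch: I built the chart in Tableau rather than in code, the `pearson_r` helper is a hypothetical name, and the income and SAT values are made-up placeholders, not figures from the College Scorecard data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


# Placeholder values for illustration only, NOT taken from the dataset.
family_income = [30_000, 45_000, 60_000, 80_000, 110_000]
avg_sat = [950, 1010, 1080, 1150, 1400]

r = pearson_r(family_income, avg_sat)
print(f"Pearson r = {r:.3f}")
```

A value close to 1 would indicate a strong positive linear relationship, but it would not by itself show that income causes higher scores.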
While it’s important to note that these visualizations are not developed using entirely scientific methods, they still serve as a useful way to digest what is otherwise a noisy dataset or table. This exercise has taught me the importance of visualizations and the utility that tools such as Tableau can provide when dealing with data.
For assignment 2, I was tasked with designing a web structure and data schema for the data set I am using. To do this, I reviewed the resources from webstyleguide and lucidchart to gain a better understanding of information architecture principles.
As seen in Figure 1, my site takes on a mostly hierarchical structure. Sub-pages are reached by first visiting or hovering over the relevant parent page, which then exposes the sub-pages that sit underneath that category. I also tried to allow for internal routing. For example, from Resources you can choose between following an external link or being routed within the site to the College Scorecard sub-page of the Data category. This organization made the most sense to me, although I’m sure it would undergo revision once I actually implemented it. The process had me thinking about how a site should be structured and whether the most logical way is necessarily the most user-friendly way.
For part 2 of my assignment, I was asked to design a database schema for my data set. This was somewhat difficult because my dataset is a “flat file” in which most columns are attributes of a single University ID. I gave it my best shot, though, and focused on dividing the data into tables built around three key groups of attributes that can later be joined: (1) the university’s core attributes, such as size, type, and tuition; (2) its accreditation agency and the compliance needed to be accredited; and (3) its students’ post-graduate profile on earnings and debt. This wasn’t an entirely straightforward exercise because of the nature of my data, but I wanted to give it a thoughtful effort and split the data as best I could.
As seen in Figure 2, the entity records involved ultimately feed into the Accredited College entity record. The idea is to bring all the information from the College record (basic facts), the Accreditation profile and agency, and the post-graduate survey and federal student loan reporting into the single Accredited College record. This information is gathered in separate places, not all at once, so data integrity must be maintained along the way for it to be joined together without significant mistakes. For example, each college is located in one state in the data, but a state can have many colleges, so State needs to be its own table rather than being stored in the college table. Likewise, since a university can be accredited by multiple agencies (regional accreditation, national accreditation, program accreditation), the accreditation profile for compliance needs to be separated as well. Overall, I felt this was a good exercise for thinking through the dataset and separating it out logically.
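One way the separation described above could be expressed is sketched here using Python's built-in `sqlite3` module. This is a rough illustration under my own assumptions, not the schema from Figure 2: every table name, column name, and row value is a hypothetical placeholder. The junction table `college_accreditation` is how I would model a college having several accrediting agencies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default

# All table and column names below are hypothetical placeholders.
conn.executescript("""
CREATE TABLE state (
    state_code TEXT PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE college (
    college_id       INTEGER PRIMARY KEY,
    name             TEXT NOT NULL,
    state_code       TEXT NOT NULL REFERENCES state(state_code),
    control          TEXT,      -- e.g. public or private non-profit
    enrollment       INTEGER,
    in_state_tuition INTEGER
);
CREATE TABLE accreditation_agency (
    agency_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    scope     TEXT              -- regional, national, or program
);
-- Junction table: one college can have many agencies and vice versa.
CREATE TABLE college_accreditation (
    college_id INTEGER NOT NULL REFERENCES college(college_id),
    agency_id  INTEGER NOT NULL REFERENCES accreditation_agency(agency_id),
    PRIMARY KEY (college_id, agency_id)
);
CREATE TABLE post_graduate_profile (
    college_id      INTEGER PRIMARY KEY REFERENCES college(college_id),
    median_earnings INTEGER,
    median_debt     INTEGER
);
""")

# Made-up rows for illustration only, not real College Scorecard records.
conn.execute("INSERT INTO state VALUES ('XX', 'Example State')")
conn.execute(
    "INSERT INTO college VALUES (1, 'Example College', 'XX', 'public', 1000, 5000)"
)
conn.executemany(
    "INSERT INTO accreditation_agency VALUES (?, ?, ?)",
    [(1, "Placeholder Regional Agency", "regional"),
     (2, "Placeholder Program Agency", "program")],
)
conn.executemany("INSERT INTO college_accreditation VALUES (?, ?)", [(1, 1), (1, 2)])
conn.execute("INSERT INTO post_graduate_profile VALUES (1, 40000, 20000)")

# Join everything back into one combined "accredited college" result.
rows = conn.execute("""
    SELECT c.name, s.name, a.name, p.median_earnings, p.median_debt
    FROM college c
    JOIN state s ON s.state_code = c.state_code
    JOIN college_accreditation ca ON ca.college_id = c.college_id
    JOIN accreditation_agency a ON a.agency_id = ca.agency_id
    LEFT JOIN post_graduate_profile p ON p.college_id = c.college_id
    ORDER BY a.agency_id
""").fetchall()

for row in rows:
    print(row)
```

The join returns one row per college and agency pair, which is why the accreditation data cannot simply live as a column on the college table.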