Counting Illinois Colleges

To dig deeper into my dataset, I used Google BigQuery to run SQL queries against my data table. Because the dataset is larger than 10MB, I first had to upload the file to Google Cloud.

To try out BigQuery on my dataset, I decided to query the number of accredited, undergraduate-degree-granting public and non-profit institutions in the state of Illinois. To run this query, I constructed the following in SQL:

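SELECT *
FROM [neat-sunspot-164714:CollegeScoreCard.Dataset_2013]
WHERE STABBR = 'IL'  -- state is Illinois
AND PREDDEG = 3      -- predominant degree granted is undergraduate
AND CONTROL != 3     -- exclude for-profit institutions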

The query selects every column (none are excluded) for each row of my table where the state is Illinois, the predominant degree granted is undergraduate, and control of the institution is not for-profit.

A count expression returns 71 results.
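The count expression itself is not shown in the post. The sketch below is one way it might look, assuming the same three filters as the query above with COUNT(*) in place of the column list. To keep it runnable without a BigQuery account, Python's built-in sqlite3 module stands in for BigQuery, and the sample rows and institution names are made up for illustration.

```python
import sqlite3

# In-memory stand-in for the BigQuery table. These rows are made up for
# illustration and are not taken from the College Scorecard data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Dataset_2013 "
    "(INSTNM TEXT, STABBR TEXT, PREDDEG INTEGER, CONTROL INTEGER)"
)
sample_rows = [
    ("Example Public University", "IL", 3, 1),        # passes all three filters
    ("Example Nonprofit College", "IL", 3, 2),        # passes all three filters
    ("Example For-Profit Institute", "IL", 3, 3),     # excluded: CONTROL = 3
    ("Example Two-Year College", "IL", 2, 1),         # excluded: PREDDEG is not 3
    ("Example Out-of-State University", "WI", 3, 1),  # excluded: not in Illinois
]
conn.executemany("INSERT INTO Dataset_2013 VALUES (?, ?, ?, ?)", sample_rows)

# Same WHERE clause as the query in the post, counting rows instead of
# returning them.
count_sql = """
    SELECT COUNT(*)
    FROM Dataset_2013
    WHERE STABBR = 'IL'
      AND PREDDEG = 3
      AND CONTROL != 3
"""
matching = conn.execute(count_sql).fetchone()[0]
print(matching)
```

In BigQuery, the FROM clause would name the uploaded table rather than this in-memory one.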

I’m actually somewhat surprised by the number of results; however, they’re confirmed by viewing the institution names, IDs, and cities from the table. The query took about 5 seconds to run, despite needing to scan thousands of rows.

UNITID | INSTNM | CITY | STABBR
145691 | Illinois College | Jacksonville | IL
146481 | Lake Forest College | Lake Forest | IL
143288 | Blackburn College | Carlinville | IL
146825 | MacMurray College | Jacksonville | IL
148405 | Rockford University | Rockford | IL
144351 | Concordia University-Chicago | River Forest | IL
147341 | Monmouth College | Monmouth | IL
148131 | Quincy University | Quincy | IL
143084 | Augustana College | Rock Island | IL
148496 | Dominican University | River Forest | IL
149505 | Trinity Christian College | Palos Heights | IL
148584 | University of St Francis | Joliet | IL
147660 | North Central College | Naperville | IL
145725 | Illinois Institute of Technology | Chicago | IL
149514 | Trinity International University-Illinois | Deerfield | IL
145372 | Greenville College | Greenville | IL
147679 | North Park University | Chicago | IL
147244 | Millikin University | Decatur | IL
147013 | McKendree University | Lebanon | IL
144962 | Elmhurst College | Elmhurst | IL
143118 | Aurora University | Aurora | IL
145619 | Benedictine University | Lisle | IL
147536 | National Louis University | Chicago | IL
147828 | Olivet Nazarene University | Bourbonnais | IL
143358 | Bradley University | Peoria | IL
148654 | University of Illinois at Springfield | Springfield | IL
146612 | Lewis University | Romeoville | IL
145336 | Governors State University | University Park | IL
148627 | Saint Xavier University | Chicago | IL
144883 | East-West University | Chicago | IL
148487 | Roosevelt University | Chicago | IL
146719 | Loyola University Chicago | Chicago | IL
144892 | Eastern Illinois University | Charleston | IL
144005 | Chicago State University | Chicago | IL
149231 | Southern Illinois University-Edwardsville | Edwardsville | IL
149772 | Western Illinois University | Macomb | IL
147776 | Northeastern Illinois University | Chicago | IL
145813 | Illinois State University | Normal | IL
148335 | Robert Morris University Illinois | Chicago | IL
144281 | Columbia College-Chicago | Chicago | IL
144740 | DePaul University | Chicago | IL
145637 | University of Illinois at Urbana-Champaign | Champaign | IL
145600 | University of Illinois at Chicago | Chicago | IL
147703 | Northern Illinois University | Dekalb | IL
149222 | Southern Illinois University-Carbondale | Carbondale | IL
147369 | Moody Bible Institute | Chicago | IL
149639 | VanderCook College of Music | Chicago | IL
143297 | Blessing Rieman College of Nursing | Quincy | IL
148511 | Rush University | Chicago | IL
147590 | National University of Health Sciences | Lombard | IL
149028 | Saint Anthony College of Nursing | Rockford | IL
149763 | Resurrection University | Chicago | IL
148575 | Saint Francis Medical Center College of Nursing | Peoria | IL
146533 | Lakeview College of Nursing | Danville | IL
147129 | Methodist College | Peoria | IL
144971 | Eureka College | Eureka | IL
146427 | Knox College | Galesburg | IL
145646 | Illinois Wesleyan University | Bloomington | IL
146667 | Lincoln Christian University | Lincoln | IL
149781 | Wheaton College | Wheaton | IL
146339 | Judson University | Elgin | IL
143048 | School of the Art Institute of Chicago | Chicago | IL
144050 | University of Chicago | Chicago | IL
147767 | Northwestern University | Evanston | IL
148593 | St. John’s College-Department of Nursing | Springfield | IL
145558 | Rosalind Franklin University of Medicine and Science | North Chicago | IL
148849 | Shimer College | Chicago | IL
149329 | Telshe Yeshiva-Chicago | Chicago | IL
145497 | Hebrew Theological College | Skokie | IL
146621 | Lexington College | Chicago | IL
260947 | Christian Life College | Mount Prospect | IL

Bringing Data Down to Size

As discussed in class, data can be noisy, messy, and at times difficult to understand. With the right analysis tools, however, we can transform data into information. The key distinction lies in usability: information consists of relevant, easily digested facts and figures. Part of converting data to information is reducing the volume being presented. In this post, I’ll explore ways to decrease the size and scope of my dataset to make my analysis more manageable.


Because my dataset centers on U.S. higher education, five key categories stand out as relevant for narrowing down the data: State, Private vs Public, Size, Institution Type, and Financial Aid. For example, I might want to focus on private universities in Illinois with 5,000 to 15,000 students. To narrow things further, I would pull only the key figures relevant to my analysis, such as cost, employment rate, and debt and earnings levels after graduation. Not only will this reduce the scope of my dataset, it will also allow for more relevant comparisons between similar institutions.
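The narrowing described above (private universities in Illinois with 5,000 to 15,000 students, keeping only a few key figures) could be sketched roughly as follows. This is an illustration under stated assumptions, not the method used in the project: the table name, the column names (state, control, enrollment, cost, median_debt, median_earnings), and every row are hypothetical placeholders, and Python's built-in sqlite3 module stands in for whatever query tool is actually used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table and column names, chosen for readability. They are
# placeholders, not the field names used in the College Scorecard files.
conn.execute("""
    CREATE TABLE scorecard (
        name TEXT, state TEXT, control TEXT, enrollment INTEGER,
        cost INTEGER, median_debt INTEGER, median_earnings INTEGER
    )
""")

# Made-up rows for illustration only.
rows = [
    ("Example University A", "IL", "private", 8000, 40000, 25000, 50000),
    ("Example University B", "IL", "private", 2000, 35000, 22000, 45000),  # too small
    ("Example University C", "IL", "public", 12000, 15000, 20000, 48000),  # public
    ("Example University D", "OH", "private", 9000, 38000, 24000, 47000),  # other state
]
conn.executemany("INSERT INTO scorecard VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# Narrow the rows by state, control, and size, and narrow the columns to
# the handful of figures needed for comparison.
narrowed = conn.execute("""
    SELECT name, cost, median_debt, median_earnings
    FROM scorecard
    WHERE state = 'IL'
      AND control = 'private'
      AND enrollment BETWEEN 5000 AND 15000
    ORDER BY name
""").fetchall()

for row in narrowed:
    print(row)
```

The WHERE clause shrinks the number of rows and the SELECT list shrinks the number of columns, so both dimensions of the table get smaller at once.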

In my next post, I will investigate data analysis tools I can use to visualize and present the information extracted from the data.

Analyzing College Scorecard Data

For my project, I plan to look at the College Scorecard dataset.

In September 2015, the U.S. government launched the College Scorecard website. The site is designed to allow parents, students, and other interested consumers to easily compare statistics on higher education institutions. Key stats such as cost, post-graduation earnings, debt levels, and employment levels are found within the government’s datasets. Full datasets are available on the College Scorecard website.


For my topic, I will be analyzing the government’s dataset in order to better understand college as both an investment and a financial decision. The raw data, which cover 1996 through 2015, are large: each year represents about 100MB of data, and the uncompressed files total 2GB in size. In terms of the data table, each file has more than 7,700 rows and more than 1,700 columns.


To tackle this dataset, I plan to break the data into more manageable pieces. Some of this has already been done by the government and can be downloaded from the College Scorecard website. As stated in the reading, the ultimate goal of information architecture is to make information findable and understandable. In my next post, I plan to consider the ways in which I can break down and compare the data in more detail.
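One way to break a very wide file into a more manageable piece is to stream it row by row and keep only the columns of interest. The sketch below is a minimal illustration using Python's standard csv module. The helper name keep_columns, the EXTRA_1 and EXTRA_2 columns, and the sample rows are all made up for the example, and an in-memory string stands in for one of the yearly files.

```python
import csv
import io


def keep_columns(source, wanted):
    """Yield one small dict per row, keeping only the columns in `wanted`."""
    reader = csv.DictReader(source)
    for row in reader:
        yield {name: row[name] for name in wanted}


# Tiny made-up stand-in for one wide yearly file. EXTRA_1 and EXTRA_2 are
# hypothetical columns representing the many fields not needed here.
sample_csv = io.StringIO(
    "UNITID,INSTNM,CITY,STABBR,EXTRA_1,EXTRA_2\n"
    "1,Example College,Springfield,IL,x,y\n"
    "2,Example University,Madison,WI,x,y\n"
)

wanted = ["UNITID", "INSTNM", "STABBR"]
slim_rows = list(keep_columns(sample_csv, wanted))
print(slim_rows)
```

For a file on disk, the same helper could be given a handle from open(path, newline="") in place of the in-memory string. Because rows are read one at a time, the full file never has to fit in memory.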