The summer just begins and a lot of people start to experience flight delay. I am also interested in seeing which cites in the US are well connected by flights. The OpenFlights project provides free flight information. Then I used R to download the data and loaded them into a SQLite database, since I try to keep them as the persistent data.
RSQLite
facilities R to join and query tables in the database. Finally the flight routes and the airports were visualized by ggplot2
and ggmap
.Rank | IATA code | City | Arriving flight routes |
---|---|---|---|
1 | ORD | Chicago | 494 |
2 | LAX | Los Angeles | 438 |
3 | DEN | Denver | 401 |
4 | JFK | New York | 363 |
5 | ATL | Atlanta | 339 |
6 | DFW | Dallas-Fort Worth | 281 |
7 | SFO | San Francisco | 259 |
8 | IAH | Houston | 244 |
9 | MIA | Miami | 244 |
10 | EWR | Newark | 242 |
Wiki says that Atlanta is the busiest airport in the US according to total passenger boardings. However, from the number of the incoming flight routs, it only ranks 5th, following Chicago, Los Angeles, Denver and New York. Possibly Atlanta is the hub mostly for passengers to do connection. If somebody really loves air traveling, Chicago(with ORD) and New York(with both JFK and EWR) are the two most convenient cities to stay with, because they have the most options.
This post is inspired by one post on the blog Data Science and R
# Import libraries and set up directory
library(ggmap)
library(RSQLite)
setwd("C:/Google Drive/Codes")
# Read data directly from URLs
airport <- read.csv("http://openflights.svn.sourceforge.net/viewvc/openflights/openflights/data/airports.dat", header = F)
route <- read.csv("http://openflights.svn.sourceforge.net/viewvc/openflights/openflights/data/routes.dat", header = F)
# Remove the airports without IATA codes and rename the variables
airport <- airport[airport$V5!='', c('V3', 'V4', 'V5','V7','V8','V9')]
colnames(airport) <- c("City", "Country", "IATA", "lantitude", "longitude", "altitude")
route <- route[c('V3', 'V5')]
colnames(route) <- c("Departure", "Arrival")
# Store data to SQLite database
conn <- dbConnect("SQLite", dbname = "air.db")
dbSendQuery(conn, "drop table if exists airport;")
dbWriteTable(conn, "airport", airport)
dbSendQuery(conn, "drop table if exists route;")
dbWriteTable(conn, "route", route)
dbDisconnect(conn)
# Manipulate data in SQLite database
conn <- dbConnect("SQLite", dbname = "air.db")
sqlcmd01 <- dbSendQuery(conn, "
select a.type, a.city as iata, a.frequency, b.city, b.country, b.lantitude, b.longitude
from (select 'depart' as type, departure as city, count(departure) as frequency
from route
group by departure
order by frequency desc, type) as a
left join airport as b on a.city = b.iata
order by frequency desc
;")
top <- fetch(sqlcmd01, n = -1)
sqlcmd02 <- dbSendQuery(conn, "
select route.rowid as id, route.departure as point, airport.lantitude as lantitude, airport.longitude as longitude
from route left join airport on route.departure = airport.iata
union
select route.rowid as id, route.arrival as point, airport.lantitude as lantitude, airport.longitude as longitude
from route left join airport on route.arrival = airport.iata
order by id
;")
combine <- fetch(sqlcmd02, n = -1)
dbDisconnect(conn)
# Draw the flight routes and the airports on Google map
ggmap(get_googlemap(center = 'us', zoom = 4, maptype = 'roadmap'), extent = 'device') +
geom_line(data = combine, aes(x = longitude, y = lantitude, group = id), size = 0.1,
alpha = 0.05,color = 'red4') +
geom_point(data = top, aes(x = longitude, y = lantitude, size = frequency), colour = "blue", alpha = 0.3) +
scale_size(range=c(0,15))
The map appears to not have flights to Asia or Europe. Was this only western hemisphere flights?
ReplyDeleteCharlie, when you read in data to SQL-Lite with this method, does it automatically recognize data type and create a schema?
ReplyDeleteGreat post. I used to be checking continuously this blog and I am inspired! Very useful information specially the remaining section :) I handle such information a lot. I used to be seeking this certain info for a very long time. Thanks and good luck....
ReplyDelete