Many data come in JSON format, which can be read into and manipulated in R. Here’s how to get started!
Pleiades and Trismegistos are amazing resources for anyone interested in places of the ancient world. Both collect a huge amount of data and information and make it freely available for anyone to use.
Pleiades is a very robust online gazetteer (an index of places) housed at the Ancient World Mapping Center. It is an excellent starting place for research on ancient places. From the website entrance, you type in a search term and it presents you with a list of possible matches. Clicking on a link brings you to a page with a ton of information, all of it cited from reputable sources and all of it clickable. Good stuff!
Trismegistos is another website with great, well-sourced information about ancient places. Again, you enter your place name and search for matches. I selected Trismegistos as a source because it kept popping up in my Google searches during research. It was often able to locate places that were not available in Pleiades. It also has a really extensive set of data on the alternative names each place might have. That said, the great majority of my places had entries in both datasets. In fact, Pleiades and Trismegistos each include the other in their standard set of reference output.
Best of all, through Creative Commons licensing, you are free to download and use these data sets as you like. Both offer a variety of formats, depending on what you want to access. For Pleiades, I snagged the JSON set, both because I wanted to learn about extracting JSON data and because it seems to be the only place where I can get at the very extensive reference lists Pleiades provides. It is also the most consistently up-to-date. On the Trismegistos side, I downloaded the Geographical data dump, available in .csv, which contains all of the current entries for Trismegistos geographical data.
Reading the sets
Using .csv files is straightforward: you read the data in, and if it’s mostly clean, the columns and rows will come in as you would expect them to. I was able to read in Trismegistos’ data without issue. The JSON data from Pleiades is structured differently, and if you are not familiar with these files, they’re not immediately intuitive. I am pretty new to using them myself, and it was a bit of a learning curve. Thankfully, R has a package, jsonlite, that makes pulling that data out of the JSON really easy.
# Load jsonlite, which provides read_json()
library(jsonlite)
# data_sets holds the path to the data folder (defined elsewhere)
# Read in trismegistos data dump .csv
trismegistos_all_export_geo <- read.csv(paste0(data_sets,"trismegistos_all_export_geo.csv"))
# Read in json file
pleiades_full_json <- read_json(paste0(data_sets,"/pleiades-places.json"),
                                simplifyDataFrame = T,
                                flatten = T)
Steps:
- Use read.csv() to read in the Trismegistos .csv file and save it as an object called trismegistos_all_export_geo.
- Use read_json() from the jsonlite package to read in the .json file. This was a huge file and took between 10 and 15 minutes to finish loading. Save this as an object called pleiades_full_json.
  - simplifyDataFrame = T puts lists containing only records (JSON objects) into a single data frame.
  - flatten = T flattens nested data frames into a single data frame if possible.
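To see what the flatten option does without waiting on the full Pleiades file, here is a minimal sketch on a tiny inline sample. The two records and their field names (id, title, loc) are made up for illustration and are not the real Pleiades schema. It uses fromJSON(), the jsonlite function that parses a JSON string and simplifies records into a data frame by default.

```r
library(jsonlite)

# Hypothetical two-record sample; NOT the real Pleiades structure.
sample_json <- '[
  {"id": "1", "title": "Alpha", "loc": {"lat": 30.1, "lon": 31.2}},
  {"id": "2", "title": "Beta",  "loc": {"lat": 25.7, "lon": 32.6}}
]'

# Without flattening, the nested "loc" object becomes a data frame
# stored inside a single column of the outer data frame.
nested <- fromJSON(sample_json, flatten = FALSE)
names(nested)

# With flattening, the nested fields are promoted to ordinary columns
# whose names are prefixed with the parent field name.
flat <- fromJSON(sample_json, flatten = TRUE)
names(flat)
```

Comparing names(nested) with names(flat) shows the difference: the flattened version has separate loc.lat and loc.lon columns, which are much easier to work with than a data frame tucked inside a column.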
The tidying and wrangling of this data was a lot more involved and took some time due to the nested nature of the JSON data. Nevertheless, R helped me get the data into a format I could access easily. Anyone who wants to dive into the minutiae of what Pleiades and Trismegistos have to offer should download and extract the respective datasets and walk through the above scripts to import them. Then, if you feel like it, follow me over to the wrangling post where I go through cleaning and tidying elements of these sets.
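As a small taste of why the nesting takes work, here is a rough sketch of how you might spot the columns that are still nested after import. Again, the sample records and the field names (id, names, romanized) are invented for illustration and are not the actual Pleiades fields. Flattening only handles nested objects; a field that holds an array of records still comes through as a list column, with one small data frame per row.

```r
library(jsonlite)

# Hypothetical sample: each place carries an array of name records.
sample_json <- '[
  {"id": "1", "names": [{"romanized": "Alpha"}, {"romanized": "Alfa"}]},
  {"id": "2", "names": [{"romanized": "Beta"}]}
]'

places <- fromJSON(sample_json, flatten = TRUE)

# Find the columns that are lists rather than plain vectors.
list_cols <- names(places)[vapply(places, is.list, logical(1))]
list_cols

# Each entry of a list column is its own data frame, one per place.
places$names[[1]]
```

Running a check like this on the real import is one way to get a to-do list of the columns that will need unpacking before the data is truly tidy.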
Citations:
- Roger Bagnall, et al. (eds.), Pleiades: A Gazetteer of Past Places, 2016, <http://pleiades.stoa.org/> [Accessed: February 17, 2016].
- H. Verreth, A survey of toponyms in Egypt in the Graeco-Roman period (Trismegistos Online Publications, 2), Leuven: Trismegistos Online Publications, 1253 pp.
- Ooms J (2014). “The jsonlite Package: A Practical and Consistent Mapping Between JSON Data and R Objects.” arXiv:1403.2805 [stat.CO]. https://arxiv.org/abs/1403.2805.