How to use tidygeocoder’s reverse_geocode() function to verify locations and place names. If you want to use your own data, make sure it has longitude and latitude variables available. I am also making my own locations data set available here: https://github.com/timestamped-blog/follow_alongs/blob/main/early_xty_locations.csv. This is a somewhat cleaner version of locations_master that I am using below.
Way back when I was actively collecting my data, I was obsessed with making sure that the places associated with my entries both existed and were locatable. I spent a lot of time tracking down place names, alternative names, modern names, whatever was out there. Eventually, the list became very long, with tons of duplicate entries (like, did you know that a lot of early Christian events took place in Rome? Who knew!). I realized I had to organize this nonsense, and decided to build a MySQL database to store it all. I took all my city/country locations and consolidated them to one record per location, assigned a unique ID to each place, then created a table for them in the database. This is how all of my location data is currently stored, and it is the most up-to-date version.
But I wanted to make sure my locations were right. I felt some of my modern country info was dicey, and I wasn’t sold that all of my coordinates were right. I had done way too much copying and pasting over the years! To check, I decided to do a reverse lookup on my coordinates, then compare the countries that came back against my own. For that, I used a package called tidygeocoder, which has a function, reverse_geocode(), that lets you feed in a set of coordinates and returns a (very) wide set of data about that place. I was mainly interested in the country variable given by OpenStreetMap (OSM).
Here is how my locations looked as I downloaded them from MySQL, so you have a sense of my variables (the set is called locations_master). In addition to tidygeocoder, I had (as always) tidyverse loaded to help with this work.

I wanted to grab my latitudes (lat) and longitudes (longi) and feed them into the reverse_geocode() function, then apply some other useful arguments. I stored it all in an object called “rev_geo”. The whole thing looked like this:
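library(tidyverse)
library(tidygeocoder)

rev_geo <- locations_master %>%
  # Drop rows with a missing coordinate so the OSM lookup does not error out
  filter(!is.na(lat), !is.na(longi)) %>%
  reverse_geocode(
    lat = lat,
    long = longi,
    method = "osm",
    full_results = TRUE,
    # Ask OSM to return place names in US English
    custom_query = list("accept-language" = "en-US")
  )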
Steps:
- First, I filter out any NA values in my coordinates, since they will cause the OSM lookup to error out.
- lat = takes in my latitude (lat) value; long = takes in my longitude (longi) value.
- method = "osm" indicates the OSM geocoding service. A full list of supported services can be found here: https://jessecambon.github.io/tidygeocoder/articles/geocoder_services.html
- full_results = returns a wide data set of various bits of information surrounding your coordinates.
- custom_query = in this instance, instructs the service to return the results in US English. Without this applied, many of my addresses came back in their native languages, which was cool as hell, but difficult to read! Lucky for me, someone else had this issue and asked StackOverflow about it a while ago.
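Since copy-and-paste slips were my main worry, a range check before the lookup can also be worthwhile. The sketch below is not part of my original script: it uses a made-up toy table (toy_locations and coord_problems are names I invented for illustration) and simply flags rows whose coordinates are missing or fall outside the valid latitude and longitude ranges.

```r
library(dplyr)

# Toy rows standing in for locations_master; the values are illustrative only.
toy_locations <- tibble(
  locationID = 1:4,
  lat   = c(41.9, 31.2, 95.0, NA),
  longi = c(12.5, 29.9, 10.0, 35.2)
)

# Flag coordinates that are missing or fall outside the valid ranges
# (latitude -90 to 90, longitude -180 to 180).
coord_problems <- toy_locations %>%
  filter(
    is.na(lat) | is.na(longi) |
      !between(lat, -90, 90) |
      !between(longi, -180, 180)
  )

coord_problems$locationID
```

Any rows that turn up here are worth fixing at the source before spending time on the geocode itself.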
When the reverse geocode finished, I wanted to check the matches. I created a quick script to check my work and see which countries matched up.
rev_geo %>%
  select(locationID,
         modernCountry,
         country,
         address) %>%
  mutate(country_check = if_else(modernCountry == country, TRUE, FALSE)) %>%
  filter(country_check == FALSE)
Steps:
- Select only the columns I want for readability.
- Create a column with mutate() that checks if the modernCountry matches the country I pulled in. The if_else() tests if modernCountry == country. If it does, it returns TRUE; otherwise, it returns FALSE.
- Filter only the FALSE records.
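One thing to keep in mind with a check like this: if either country value is NA, the comparison is NA too, and filter() quietly drops that row instead of showing it as a mismatch. The sketch below is one possible variation, not what I actually ran. It uses made-up toy rows, and country_aliases is a hypothetical lookup I invented here to treat known spelling differences as matches while keeping NA rows visible for review.

```r
library(dplyr)

# Toy rows standing in for rev_geo; the values are illustrative only.
toy_rev_geo <- tibble(
  locationID    = 1:4,
  modernCountry = c("Italy", "Britain", "Egypt", NA),
  country       = c("Italy", "United Kingdom", "Libya", "Turkey")
)

# Hypothetical alias lookup: my spelling on the left, the OSM spelling on the right.
country_aliases <- c("Britain" = "United Kingdom")

country_review <- toy_rev_geo %>%
  mutate(
    # Swap in the OSM spelling where an alias is defined.
    modern_std = coalesce(unname(country_aliases[modernCountry]), modernCountry),
    # An NA on either side counts as "needs review" rather than dropping out.
    country_check = coalesce(modern_std == country, FALSE)
  ) %>%
  filter(!country_check)

country_review$locationID
```

With this approach the alias list grows as you review mismatches, so the next run only surfaces rows you have not already signed off on.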
All of my mismatched entries fell into one of three groups: names that were different but correct (e.g., Britain vs. United Kingdom), regions that used to stretch across many countries (e.g., Roman Mauretania), and cities that sit on borders. I thought it looked good and decided to pause. Since the geocode took a while, I wanted to save it so that I had it on hand for the next steps. I stashed it in my “objects” directory, where I save all my .rds files.
saveRDS(rev_geo, paste0(objects_directory,"locations_reversed_joined.rds"))
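Note that paste0() glues the two strings together as-is, so objects_directory has to end in a slash for that path to work. As a small sketch of the full round trip, here is one way to save and reload an object with file.path(), which supplies the separator for you. The toy data frame, the tempdir() location, and the file name are stand-ins I made up so the example runs anywhere.

```r
# tempdir() stands in for my objects directory so the sketch runs anywhere.
objects_directory <- tempdir()

# A tiny stand-in for rev_geo; the values are illustrative only.
toy_rev_geo <- data.frame(
  locationID = 1:2,
  country    = c("Italy", "Egypt")
)

# file.path() adds the separator, so a missing trailing slash is not a problem.
rds_path <- file.path(objects_directory, "locations_reversed_joined_toy.rds")
saveRDS(toy_rev_geo, rds_path)

# Read it back in a later session instead of re-running the geocode.
toy_restored <- readRDS(rds_path)
identical(toy_rev_geo, toy_restored)
```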
Now that I am confident my locations look good, I want to pair them up with a couple of data sets, Pleiades and Trismegistos, that will provide almost every attestation I need. But before I can do that, I need to mine the data from both sites and make it usable in the R environment. The next several posts walk through loading, wrangling, and tidying data from JSON files and .csv data dumps.
If you would like to walk through a sample of my data set using the scripts above, I have a follow-along script located on time-stamped’s GitHub.
Citations:
Cambon J, Hernangómez D, Belanger C, Possenriede D (2021). tidygeocoder: An R package for geocoding. Journal of Open Source Software, 6(65), 3544, https://doi.org/10.21105/joss.03544 (R package version 1.0.5)