This walkthrough assumes you have some sort of deeply nested data you sourced from a JSON file. Here, I am specifically working with the Pleiades JSON data dump that I loaded in this post. You can download it yourself here: https://atlantides.org/downloads/pleiades/json/ (use the pleiades-places-latest.json.gz file; if you do not have a program to decompress the file, you can use 7-Zip to unpack it).
Once again, I return to Pleiades and my quest for excellent source data. When I first downloaded and explored their JSON data, it was confusing to me. Even with the flattening applied, the result was one massive list, divided into 2 main groups. Those lists had more lists, several layers deep in some sets. But even then, it was easy to see how the data lined up.
I clicked on the list to bring up the RStudio viewer tab. This allowed me to collapse and expand the lists to get a sense of what was in there. If you want to keep it all in-console, use glimpse(), then use $ to select a list from the level below. You can chain $ through as many levels as there are in the list.
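Here is a minimal sketch of that $ navigation on a toy list. The toy_json object and its values are made up for illustration; only the @graph and id names mirror the Pleiades structure.

```r
library(dplyr)  # provides glimpse()

# Hypothetical stand-in for the parsed JSON: a list holding a "@graph" element.
toy_json <- list(
  `@context` = list(title = "toy"),
  `@graph` = list(
    id = c("1001", "1002"),
    title = c("Place A", "Place B")
  )
)

# Names that are not valid R names (like @graph) need backticks after the $.
glimpse(toy_json$`@graph`)

# Keep chaining $ to step down another level.
toy_ids <- toy_json$`@graph`$id
toy_ids
```

The backticks matter: without them, R will not parse a name that starts with @.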

View(pleiades_full_json): view of the file with the @graph list expanded.
glimpse(pleiades_full_json$`@graph`): view of the same list. <list> types are shown first, then the individual structures with their dimensions.
What I really wanted was the "id", the unique record ID that identifies every place in Pleiades. That should allow me to pair those IDs up to any of these data sets and maintain the fidelity of the information. I have always wanted those references, so I went for that first.
# Pull out the lists into their own objects.
plei_ids <- pleiades_full_json$`@graph`$id
plei_refs <- pleiades_full_json$`@graph`$references
# Pull them together into one tibble.
pleiades_references <- tibble(
  id = plei_ids,
  references = plei_refs
) %>%
  unnest(references)
Steps:
- Pull out both the id list and the references list from the full list, then store them as their own objects called plei_ids and plei_refs, respectively.
- Pull the 2 objects together in one tibble. The id will pair up to the respective records in the references table. Because one id may refer to multiple references, a nested value is created in the new tibble for the references column. unnest(references) will expand these into their own rows.
- Note: places with many references will therefore generate many new rows. If there are more lists, they will show up as they have previously, nested inside the columns. It looks like that is not the case here, since my URL field has been converted to <chr>.
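To see the pairing and unnesting on something small, here is a sketch with made-up ids and references. The toy_* names, the example.org URLs, and the uri and shortTitle columns are assumptions for illustration, not the real Pleiades values.

```r
library(dplyr)
library(tidyr)

# Hypothetical ids and a matching list of reference tables, one element per id.
toy_ids <- c("1001", "1002", "1003")
toy_refs <- list(
  tibble(uri = c("https://example.org/a", "https://example.org/b"),
         shortTitle = c("Ref A", "Ref B")),
  tibble(uri = "https://example.org/c", shortTitle = "Ref C"),
  NULL  # a place with no references at all
)

# Pair by position, then expand the nested column into rows.
toy_references <- tibble(id = toy_ids, references = toy_refs) %>%
  unnest(references)

# Same thing, but keeping ids whose reference entry is empty.
toy_references_all <- tibble(id = toy_ids, references = toy_refs) %>%
  unnest(references, keep_empty = TRUE)

toy_references
toy_references_all
```

One thing this toy set makes visible: by default unnest() drops rows whose nested value is empty, so an id with no references disappears unless you pass keep_empty = TRUE. Whether that matters depends on what you plan to join back to later.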

Did a little spot checking.
# Pull out the IDs and pick 3 at random.
spot_check_ids <- pleiades_references %>%
  select(id) %>%
  distinct() %>%
  slice_sample(n = 3)
# Filter to ids and check what references are attached to them.
pleiades_references %>%
  filter(id %in% spot_check_ids$id)
Steps:
- select() only the id column and use distinct() just to make sure I am getting 100% unique ids. I then sample 3 of them at random using slice_sample(n = 3).
- filter() to only those ids chosen and see what URLs are tied to them.
- Check each of those ids on https://pleiades.stoa.org/ and make sure the same references are being identified under the References section of the place's page. I did this a few times to make sure.
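If you want the same 3 ids to come back each time you rerun the check, one option is to set a seed first. This is a small sketch on made-up data (the toy_* names and URLs are hypothetical); it also shows semi_join() as an alternative to filter() that returns the same rows.

```r
library(dplyr)

# Hypothetical reference table: one row per id/reference pair.
toy_references <- tibble(
  id = c("1001", "1001", "1002", "1003", "1004"),
  uri = paste0("https://example.org/ref", 1:5)
)

set.seed(42)  # only here so the toy sample is repeatable

# Pick 3 distinct ids at random.
toy_spot_ids <- toy_references %>%
  distinct(id) %>%
  slice_sample(n = 3)

# Two ways to pull every reference row for the sampled ids.
toy_spot_rows <- toy_references %>%
  filter(id %in% toy_spot_ids$id)
toy_spot_rows_joined <- toy_references %>%
  semi_join(toy_spot_ids, by = "id")

toy_spot_rows
```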
I thought this looked good! At some point, I want to use these references to do some more exploration. There are a ton of other database systems linked in here (like Trismegistos!) that would be really fun to poke at.
Next, I wanted to check out the Locations data. At first glance, the data looked bonkers: tons of nested tables. However, it's really not that bad once you understand what's in the set. Many of these nested tables were available in the top level of the list, which lets you link the id directly to the value rather than trying to unnest the columns in the larger set, exactly as I did with the references set above. I was interested in the representative coordinates (latitude, longitude), which are stored in pleiades_full_json$`@graph`$reprPoint.
# Create an object from the reprPoint list.
plei_rep_locs <- pleiades_full_json$`@graph`$reprPoint
# Pull id and location data together. Make the data clean and tidy.
pleiades_locations_id_match <- tibble(
  id = plei_ids,
  location_data = plei_rep_locs
) %>%
  unnest_wider(location_data,
               names_sep = "_") %>%
  rename("long" = location_data_1,
         "lat" = location_data_2)
Steps:
- Store an object called plei_rep_locs with the data from pleiades_full_json$`@graph`$reprPoint.
- The next steps flow together to make the final location set:
  - Create a tibble called pleiades_locations_id_match that combines plei_ids and plei_rep_locs, the latter as a column named location_data (as we did before).
  - The location_data comes over as a double-valued list. If you unnest(), you will create a new record for each coordinate (longitude will be on one line, latitude under it). That is not helpful. Instead, we can use unnest_wider() to coerce those values into 2 columns. names_sep = "_" tells unnest_wider() to use the column name as the base name for the new columns, and apply an underscore ( _ ) to separate the name and the column position of the set. Since I only have 2 coordinates, this will create location_data_1 and location_data_2.
  - Finally, rename the columns so you know for sure which coordinate is which.
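The difference between unnest() and unnest_wider() is easier to see side by side. This is a sketch on two made-up coordinate pairs; the toy_* names and the numbers are hypothetical, and I am assuming the same longitude-then-latitude order used in the rename above.

```r
library(dplyr)
library(tidyr)

# Hypothetical ids, each with a two-value coordinate vector (long, lat).
toy_points <- tibble(
  id = c("1001", "1002"),
  location_data = list(c(12.5, 41.9), c(23.7, 38.0))
)

# unnest() goes longer: one row per coordinate, so 4 rows for 2 places.
toy_long <- toy_points %>%
  unnest(location_data)

# unnest_wider() goes wider: one row per place, one column per coordinate.
toy_wide <- toy_points %>%
  unnest_wider(location_data, names_sep = "_") %>%
  rename(long = location_data_1, lat = location_data_2)

toy_long
toy_wide
```

Because the coordinate vectors carry no names, names_sep is what lets unnest_wider() build column names from the positions.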
I did this process with a few more lists and was able to get at an incredible amount of data. I saved the ones that caught my interest to a list to keep things accessible and organized.
# Create a blank list.
pleiades_from_json_cleaned_tidied <- list()
# Add to the list.
pleiades_from_json_cleaned_tidied[["pleiades_locations"]] <- pleiades_locations_id_match
pleiades_from_json_cleaned_tidied[["pleiades_references"]] <- pleiades_references
# Save the list.
saveRDS(pleiades_from_json_cleaned_tidied, paste0(objects_directory, "pleiades_from_json_cleaned_tidied.rds"))
Steps:
- Create an empty list(). Here mine is called pleiades_from_json_cleaned_tidied.
- Add your sets to the list. The name inside the brackets [[ ... ]] will be the name inside your list. Name it something meaningful.
- Save as an .rds. Here, I have a predefined value for my objects directory, but this can be any place you want to save the file. Again, name the .rds something meaningful.
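A quick way to convince yourself the saved list comes back intact is a round trip through readRDS(). This sketch uses made-up tables and a temporary directory rather than my objects directory; all toy_* names are hypothetical.

```r
# Build a small named list the same way as above.
toy_cleaned <- list()
toy_cleaned[["toy_locations"]] <- data.frame(id = "1001", long = 12.5, lat = 41.9)
toy_cleaned[["toy_references"]] <- data.frame(id = "1001", uri = "https://example.org/ref1")

# file.path() adds the separator, so the directory needs no trailing slash.
toy_path <- file.path(tempdir(), "toy_cleaned.rds")

# Save, then read back and confirm the names survived.
saveRDS(toy_cleaned, toy_path)
toy_restored <- readRDS(toy_path)
names(toy_restored)
```

With paste0(), the directory string has to end in a slash for the path to come out right; file.path() is one way to sidestep that.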
Follow-along code is up on GitHub.
Next up: different joining methods to match my locations to attestations through location and name.