GeoJSON data is available from ramm.com about our city's bridges. To make it available to our engineers it needs to be loaded into a local database. There are lots of metrics stored there too which can change daily, so we need to do this every night. Let's automate it with Python!
Please note that I have obtained permission from RAMM NZ to use their web services. You can do the same by requesting an API key from their customer support.
The Main Steps
To populate a database with web JSON data nightly we need to do a few things:
- Delete all features in the target dataset – we don’t want any duplicates
- Request the data from the web by building urls
- Read data response
- Write the data
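The steps above can be sketched as a skeleton before filling in the details. Everything here is a placeholder: the function names, the URL, and the stand-in data are mine, not part of the real script further down.

```python
def delete_all_features(dataset):
    """Placeholder: clear the target dataset so no duplicates remain."""
    del dataset[:]

def fetch_geojson(url):
    """Placeholder: request the GeoJSON payload from the web service."""
    return {"features": [{"id": 1}, {"id": 2}]}  # stand-in response

def write_features(dataset, features):
    """Placeholder: write each feature into the target dataset."""
    dataset.extend(features)

target = []  # stand-in for the database table
delete_all_features(target)
features = fetch_geojson("https://example.com/bridges")["features"]
write_features(target, features)
print(len(target))  # 2 stand-in features written
```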
Simple, right? Conceptually yes, but the devil's in the details. First we want to check that the URL returns valid content. I've done this by hitting the URL, checking its content, and then displaying it in QGIS.
To view a geojson file in QGIS follow these steps:
- Copy and paste the URL contents into a text file
- Name it with a .json extension, e.g. data.json
- Drag the .json file into QGIS and make sure it displays
- Identify a feature to check that its attributes are working correctly
To read the GeoJSON from the web into a usable format we can use the Python modules urllib and json. Using json.loads(response.read()) we can grab the data as a dictionary and then request elements within the JSON structure by name, like this: feature['geometry']['type'] and feature['geometry']['coordinates']. The response for each object looks something like this:
Polygon
[[[1821228.2874631872, 5531032.432156478], [1821241.408314114, 5531039.600723921], [1821248.6960629625, 5531026.261731579], [1821235.5752120358, 5531019.093164136], [1821228.2874631872, 5531032.432156478]]]
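In isolation, the parsing step looks like this. This is a minimal sketch using an inline JSON string as a stand-in for the web response (coordinates shortened), so it runs without a network connection:

```python
import json

# A minimal GeoJSON feature, similar in shape to the RAMM response
raw = '''{
  "type": "Feature",
  "properties": {"bridge_id": 42},
  "geometry": {
    "type": "Polygon",
    "coordinates": [[[1821228.287, 5531032.432],
                     [1821241.408, 5531039.600],
                     [1821228.287, 5531032.432]]]
  }
}'''

feature = json.loads(raw)  # parse the JSON text into a dict

print(feature['geometry']['type'])                  # Polygon
print(len(feature['geometry']['coordinates'][0]))   # 3 vertices in the ring
```

Against the live service the only difference is that the string comes from response.read() instead of a literal.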
To get a better understanding of the data structure we're working with, I like to copy and paste the JSON into a JSON-to-human-readable converter.
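You can get the same readable view locally with json.dumps and its indent argument. A small sketch, using a shortened stand-in for the RAMM payload:

```python
import json

# A one-line JSON string, re-indented locally instead of via a web tool
raw = '{"type": "Feature", "properties": {"bridge_id": 42}}'
data = json.loads(raw)
print(json.dumps(data, indent=2, sort_keys=True))
```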
I'm working on an ESRI SDE database built on top of Microsoft SQL Server. In this case I make a connection to the database like this:
connection = "C:\\Users\\username\\AppData\\Roaming\\ESRI\\Desktop10.4\\ArcCatalog\\SQLGIS.sde\\"
Then build the path to the specific dataset by appending the table name to the connection:
BRIDGE_FC = "{0}{1}".format(connection, "SDE_DATA.SDEADMIN.RAMM_BRIDGE")
Build a list of the fields in the database we would like to update
BRIDGE_FIELDS = ["bridge_id", "bridge_type", "age", "length_m", "rail_to_rail", "bridge_name", "bridge_no", "SHAPE@", "sync_date"]
Use arcpy.da.InsertCursor to insert the data into the target dataset, passing the list of fields to be updated and the values from the GeoJSON dictionary object. Because of this we can call components of the bridges dataset by name. Here is my function responsible for inserting GeoJSON into the database:
```python
def insert_ramm(fields, featList):
    """The function to write the data to SDE"""
    cursor = arcpy.da.InsertCursor(BRIDGE_FC, fields)
    for feat in featList:
        #print feat['properties']['bridge_id']
        cursor.insertRow((feat['properties']['bridge_id'],
                          feat['properties']['bridge_type'],
                          feat['properties']['age'],
                          feat['properties']['length_m'],
                          feat['properties']['rail_to_rail'],
                          feat['properties']['bridge_name'],
                          feat['properties']['bridge_no']))
    #Delete cursor object
    del cursor
```
This inserted 124 features, which is the total number in our GeoJSON web scrape; however, there is no geometry. So all we have so far is a list of bridges in the database, but we don't know where they are spatially. The following code returns a Polygon object which can be added to the insert to give geometry. At first this code block didn't work, until I realized you need to add the spatial reference when instantiating the Polygon object.
```python
sr = arcpy.SpatialReference(2193)  #NZGD2000 / NZTM
cursor = arcpy.da.InsertCursor(BRIDGE_FC, fields)
for feat in featList:
    #find the geometry type
    if feat['geometry']['type'] == "Polygon":
        print "Polygon"
        array = arcpy.Array()
        #each ring is a list of [x, y] vertex pairs
        for coordinates in feat['geometry']['coordinates']:
            for vertex in range(len(coordinates)):
                array.add(arcpy.Point(coordinates[vertex][0],
                                      coordinates[vertex][1]))
        #the spatial reference is required here
        geom = arcpy.Polygon(array, sr)
```
The final step is to automate this so that it runs every night. Fortunately there is a helpful blog post here from ESRI about automating tasks with Task Scheduler. If you're working on a Linux system you can use a cron job for this.
I like to run python scripts from a .bat file rather than specifying the actual .py file. That way we have some flexibility and can change the target script in the future without having to change the scheduled job.
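A minimal wrapper could look like this. Both paths are examples only; point them at your own Python installation and script:

```shell
@echo off
REM run_ramm_sync.bat -- scheduled nightly by Task Scheduler
REM Both paths below are hypothetical; adjust for your environment
C:\Python27\ArcGIS10.4\python.exe C:\scripts\ramm_sync.py
```

The scheduled task targets the .bat file, so swapping in a different script later only means editing this one file.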
Here is the complete program if you would like to try the same process, enjoy:
```python
import os, sys
import arcpy
import urllib, json
import datetime

#SQLGIS CONNECTION
connection = "C:\\Users\\YOUR_ID\\AppData\\Roaming\\ESRI\\Desktop10.4\\ArcCatalog\\SQLGIS.sde\\"

#GLOBAL VARS
BRIDGE_URL = r"YOURURL"
BRIDGE_FC = "{0}{1}".format(connection, "SDE_DATA.SDEADMIN.RAMM_BRIDGE")
BRIDGE_FIELDS = ["bridge_id", "bridge_type", "age", "length_m", "rail_to_rail",
                 "bridge_name", "bridge_no", "SHAPE@", "sync_date"]

#OTHER VARS
today = datetime.date.today()


def delete_all(fc):
    """Delete all features so the nightly load starts clean"""
    arcpy.DeleteFeatures_management(fc)


def read_ramm(url):
    """The function to read url data and return it as a dictionary"""
    response = urllib.urlopen(url)
    #parse the response into a dict and pull out the feature list
    data = json.loads(response.read())
    featList = data["features"]
    return featList


def ramm_bridge(fields, featList):
    """The function to write the data to SDE"""
    sr = arcpy.SpatialReference(2193)  #NZGD2000 / NZTM
    cursor = arcpy.da.InsertCursor(BRIDGE_FC, fields)
    for feat in featList:
        #find the geometry type
        if feat['geometry']['type'] == "Polygon":
            array = arcpy.Array()
            for coordinates in feat['geometry']['coordinates']:
                for vertex in range(len(coordinates)):
                    array.add(arcpy.Point(coordinates[vertex][0],
                                          coordinates[vertex][1]))
            #the spatial reference is required when building the Polygon
            geom = arcpy.Polygon(array, sr)
        else:
            print "failed validation"
            break
        cursor.insertRow((feat['properties']['bridge_id'],
                          feat['properties']['bridge_type'],
                          feat['properties']['age'],
                          feat['properties']['length_m'],
                          feat['properties']['rail_to_rail'],
                          feat['properties']['bridge_name'],
                          feat['properties']['bridge_no'],
                          geom,
                          today))
    #Delete cursor object to release the lock on the dataset
    del cursor


#### BRIDGES ####
delete_all(BRIDGE_FC)
featList = read_ramm(BRIDGE_URL)
ramm_bridge(BRIDGE_FIELDS, featList)
print "welldone"
```
Thanks for reading
Use arcpy.da.Editor() to open a transaction so that if the update fails you still have previous data rather than an empty dataset.
Good idea, thank you. I'm going to update my live script using the documentation for arcpy.da.Editor(). Here is the link for others who would like to follow this example: http://desktop.arcgis.com/en/arcmap/10.3/analyze/arcpy-data-access/editor.htm
Lucas,
Good stuff, step by step.
One suggestion on the JSON viewer part: I found the JSON Grid Viewer below very helpful for complex data structures like this. It shows JSON in table format, which is very comfortable and effective.
https://jsongrid.com/json-grid
Maybe it’s worth mentioning in your blog post itself.
Cheers!
That’s great, thanks John, I’ll look into that!