How to Ethically Harvest Online Data with Python for ESRI Integration

GeoJSON data about our city's bridges is available from ramm.com. To make it available to our engineers it needs to be loaded into a local database. The service also stores lots of metrics which can change daily, so we need to repeat the load every night. Let's automate it with Python!

Please note I have obtained permission from RAMM NZ to use their web services. You can do the same by requesting an API key from their customer support.


The Main Steps

To populate a database with web JSON data nightly we need to do a few things (sketched in code after this list):

  • Delete all features in the target dataset – we don’t want any duplicates
  • Request the data from the web by building urls
  • Read data response
  • Write the data
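In code, the whole job reduces to a skeleton like this (using the function and variable names from the full program at the end of the post):

delete_all(BRIDGE_FC)                 #1. clear the target feature class
featList = read_ramm(BRIDGE_URL)      #2. and 3. request the url and read the response
ramm_bridge(BRIDGE_FIELDS, featList)  #4. write the data to SDE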

Simple, right? Conceptually yes, but the devil's in the details. First we want to check that the URL's content is valid. I've done this by hitting the URL to check its content and then displaying it in QGIS.


To view a geojson file in QGIS follow these steps:

  • Copy and paste the URL contents into a text file (or download it with Python, as sketched after this list)
  • Name it with a .json extension, e.g. data.json
  • Drag the .json file into QGIS and make sure it displays
  • Identify a feature to check that its attributes are coming through correctly
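If you'd rather not copy and paste by hand, a few lines of Python will save the response to disk (a minimal sketch; the URL is a placeholder for your own RAMM request):

import urllib

#placeholder - substitute your RAMM web service request, including your API key
url = "https://example.com/your/geojson/endpoint"

#save the raw response with a .json extension so QGIS recognises it
with open("data.json", "w") as f:
    f.write(urllib.urlopen(url).read())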


To read the GeoJSON from the web into a usable format we can use the Python modules urllib and json. Calling json.loads(response.read()) gives us the data as a dictionary, so we can request elements within the JSON structure by name, like feature['geometry']['type'] and feature['geometry']['coordinates']. Each object in the response looks something like this:

Polygon 
[[[1821228.2874631872, 5531032.432156478], [1821241.408314114, 5531039.600723921], [1821248.6960629625, 5531026.261731579], [1821235.5752120358, 5531019.093164136], [1821228.2874631872, 5531032.432156478]]]
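As a minimal sketch of the request-and-parse step (assuming Python 2.7, which ships with ArcGIS Desktop, and the same placeholder URL as above):

import urllib, json

url = "https://example.com/your/geojson/endpoint"  #placeholder url

response = urllib.urlopen(url)      #request the data from the web
data = json.loads(response.read())  #parse the body into a dictionary

#each entry in "features" is one bridge; grab elements by name
for feature in data["features"]:
    print feature['geometry']['type']
    print feature['geometry']['coordinates']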

To get a better understanding of the data structure we're working with, I like to copy and paste the JSON into a JSON-to-human-readable converter.
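If you'd rather stay in Python, json.dumps does the same job as the online converters (a sketch, reusing the data dictionary from above):

#pretty-print the parsed GeoJSON with two-space indenting
print json.dumps(data, indent=2, sort_keys=True)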

I'm working with an ESRI SDE database built on top of Microsoft SQL Server. In this case I make a connection to the database like this:

connection = "C:\\Users\\username\\AppData\\Roaming\\ESRI\\Desktop10.4\\ArcCatalog\\SQLGIS.sde\\"

Then build the path to the specific dataset by appending the dataset name to the connection:

BRIDGE_FC = "{0}{1}".format(connection, "SDE_DATA.SDEADMIN.RAMM_BRIDGE")

Build a list of the fields in the database we would like to update. Note that SHAPE@ is arcpy's token for the feature's geometry object:

BRIDGE_FIELDS = ["bridge_id", "bridge_type", "age", "length_m", "rail_to_rail", "bridge_name", "bridge_no", "SHAPE@", "sync_date"]

Use arcpy.da.InsertCursor to insert the data into the target dataset by passing the list of fields to be updated and the values from the GeoJSON dictionary object. Because the data is a dictionary we can call components of the bridges dataset by name. Here is my function responsible for inserting the GeoJSON into the database:

def insert_ramm(fields, featList):
    """The function to write the attribute data to SDE"""
    #note: the fields passed in must match the values below - no geometry yet
    cursor = arcpy.da.InsertCursor(BRIDGE_FC, fields)

    for feat in featList:
        #print feat['properties']['bridge_id']
        cursor.insertRow((feat['properties']['bridge_id'],
                          feat['properties']['bridge_type'],
                          feat['properties']['age'],
                          feat['properties']['length_m'],
                          feat['properties']['rail_to_rail'],
                          feat['properties']['bridge_name'],
                          feat['properties']['bridge_no']))

    #Delete the cursor object to release its lock on the data
    del cursor

This inserted 124 features, which is the total number in our GeoJSON web scrape, but there is no geometry yet. So all we have so far is a list of bridges in the database; we don't know where they are spatially. The following code builds a Polygon object which can be added to the insert to give each row its geometry. At first this code block didn't work, until I realized you need to add the spatial reference when instantiating the Polygon object.

sr = arcpy.SpatialReference(2193)  #NZTM 2000
cursor = arcpy.da.InsertCursor(BRIDGE_FC, fields)
for feat in featList:

    #find the geometry type
    if feat['geometry']['type'] == "Polygon":
        print "Polygon"
        array = arcpy.Array()
        #each ring is a list of [x, y] vertex pairs
        for coordinates in feat['geometry']['coordinates']:
            for vertex in range(len(coordinates)):
                array.add(arcpy.Point(coordinates[vertex][0], coordinates[vertex][1]))

        #the spatial reference must be passed in here
        geom = arcpy.Polygon(array, sr)

The final step is to automate this so that it runs every night. Fortunately there is a helpful blog post from ESRI about automating tasks with Task Scheduler. If you're working on a Linux system you can use a cron job instead.


I like to run Python scripts from a .bat file rather than pointing the scheduled task at the actual .py file. That way we have some flexibility and can change the target script in the future without having to change the scheduled job.
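A minimal sketch of such a .bat file (both paths are placeholders; point them at your own Python install and script):

@echo off
rem launch the nightly sync with the ArcGIS Desktop Python interpreter
"C:\Python27\ArcGIS10.4\python.exe" "C:\scripts\ramm_bridge_sync.py"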

Here is the complete program if you would like to try the same process. Enjoy:


import os, sys
import arcpy
import urllib, json
import datetime

#SQLGIS CONNECTION
connection = "C:\\Users\\YOUR_ID\\AppData\\Roaming\\ESRI\\Desktop10.4\\ArcCatalog\\SQLGIS.sde\\"

#GLOBAL VARS
BRIDGE_URL = r"YOURURL"
BRIDGE_FC = "{0}{1}".format(connection, "SDE_DATA.SDEADMIN.RAMM_BRIDGE")
BRIDGE_FIELDS = ["bridge_id", "bridge_type", "age", "length_m", "rail_to_rail", "bridge_name", "bridge_no", "SHAPE@", "sync_date"]

#OTHER VARS
today = datetime.date.today()

def delete_all(fc):
    """Delete all existing features so the nightly load starts clean"""
    arcpy.DeleteFeatures_management(fc)

def read_ramm(url):
    """The function to read url data and return the features as a list"""
    response = urllib.urlopen(url)

    #writing the response to a dict
    data = json.loads(response.read())
    featList = data["features"]

    #debug: uncomment to inspect the structure
    #for feat in featList:
    #    print feat['geometry']['type']
    #    print feat['properties']['bridge_id']

    return featList

def ramm_bridge(fields, featList):
    """The function to write the data to SDE"""
    sr = arcpy.SpatialReference(2193)  #NZTM 2000
    cursor = arcpy.da.InsertCursor(BRIDGE_FC, fields)
    for feat in featList:

        #find the geometry type and build the polygon
        if feat['geometry']['type'] == "Polygon":
            print "Polygon"
            array = arcpy.Array()
            for coordinates in feat['geometry']['coordinates']:
                for vertex in range(len(coordinates)):
                    array.add(arcpy.Point(coordinates[vertex][0], coordinates[vertex][1]))

            geom = arcpy.Polygon(array, sr)

        else:
            print "failed validation"
            break

        #print feat['properties']['bridge_id']
        cursor.insertRow((feat['properties']['bridge_id'],
                          feat['properties']['bridge_type'],
                          feat['properties']['age'],
                          feat['properties']['length_m'],
                          feat['properties']['rail_to_rail'],
                          feat['properties']['bridge_name'],
                          feat['properties']['bridge_no'],
                          geom,
                          today))

    #Delete cursor object to release the lock
    del cursor

#### BRIDGES ####
delete_all(BRIDGE_FC)
featList = read_ramm(BRIDGE_URL)
ramm_bridge(BRIDGE_FIELDS, featList)

print "welldone"

Thanks for reading
