The shapefile is an old file format, originally developed by ESRI, which has become a common way of working with geospatial data, much to the chagrin of ESRI, who have ever since been trying to migrate users to a Geodatabase format. A shapefile is identified by its .shp extension but can comprise up to 17 different files, which add further information such as spatial indexes, character encoding and metadata. The four critical file extensions for a shapefile to function correctly are:
- .shp – this contains the feature geometry
- .shx – this contains the index into the geometry
- .prj – this contains the projection information of the shapefile
- .dbf – this contains the data table.
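Because a shapefile is really a bundle of files, a quick sanity check is to confirm that the critical ones sit next to each other. The following is a minimal sketch using only the Python standard library; the function name `missing_components` and the path `test003.shp` are illustrative assumptions, and the check is case-sensitive on filesystems that are.

```python
from pathlib import Path

# The four extensions listed above as critical.
CRITICAL_EXTENSIONS = (".shp", ".shx", ".prj", ".dbf")


def missing_components(shp_path):
    """Return the critical extensions that have no file next to shp_path."""
    base = Path(shp_path)
    return [ext for ext in CRITICAL_EXTENSIONS
            if not base.with_suffix(ext).exists()]


if __name__ == "__main__":
    # "test003.shp" is a placeholder; point it at your own shapefile.
    print(missing_components("test003.shp"))
```

An empty list means all four files were found.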
Geometry types
Here is the ESRI documentation on a shapefile. In essence, a shapefile contains a single geometry type, which is one of:
- Points – literally a single xy coordinate
- Lines – two or more xy coordinates
- Polygons – a start xy, any number of intermediate xy and a closing xy which completes the geometry.
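The single geometry type is recorded in the header of the .shp file, so it can be read without any geospatial library. The sketch below assumes the layout given in the ESRI shapefile specification (a 100-byte header, file code 9994 stored big-endian at byte 0, shape type stored little-endian at byte 32) and only names the basic 2D types; the function names are illustrative.

```python
import struct

# Shape type codes from the ESRI shapefile specification (basic 2D types only).
SHAPE_TYPES = {0: "Null", 1: "Point", 3: "PolyLine", 5: "Polygon", 8: "MultiPoint"}


def shape_type_from_header(header):
    """Return the geometry type name stored in a .shp main file header."""
    if len(header) < 36:
        raise ValueError("need at least the first 36 bytes of the .shp file")
    (file_code,) = struct.unpack(">i", header[0:4])   # big-endian file code
    if file_code != 9994:
        raise ValueError("not a shapefile: unexpected file code")
    (code,) = struct.unpack("<i", header[32:36])      # little-endian shape type
    return SHAPE_TYPES.get(code, f"other ({code})")


def shape_type(shp_path):
    """Read the header of the file at shp_path and return its geometry type."""
    with open(shp_path, "rb") as f:
        return shape_type_from_header(f.read(100))    # the header is 100 bytes
```

Calling `shape_type("test003.shp")` on a polygon layer should return `"Polygon"`; note that the specification calls lines "PolyLine".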
There are some more advanced types, such as multipart polygons and donut polygons, which you can read more about here, as it is a genuinely interesting subject within geospatial data.
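To make the two advanced types concrete, here is a small sketch using Shapely, a Python geometry library. The coordinates are made up for illustration: a donut is a polygon with an interior ring (a hole), and a multipart polygon is several polygons treated as one feature.

```python
from shapely.geometry import MultiPolygon, Polygon

# Outer ring: a 4 x 4 square. Hole: a 2 x 2 square inside it.
outer = [(0, 0), (4, 0), (4, 4), (0, 4), (0, 0)]
hole = [(1, 1), (1, 3), (3, 3), (3, 1), (1, 1)]
donut = Polygon(outer, holes=[hole])

# A multipart polygon: the donut plus a separate 1 x 1 square, as one feature.
island = Polygon([(10, 10), (11, 10), (11, 11), (10, 11), (10, 10)])
multipart = MultiPolygon([donut, island])

print(donut.area)            # outer area minus hole area: 16 - 4 = 12.0
print(len(donut.interiors))  # 1 hole
print(len(multipart.geoms))  # 2 parts
```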
5 Steps to creating a custom shapefile
Step 1 The easiest way to create a shapefile is to download the application QGIS, which runs on Mac, Linux and Windows, here.
Step 2 Open up QGIS and you should see the shapefile creation dialogue.
Step 3 Create a new folder to contain all of your shapefile's component files and choose a file name. Mine here is test003.
Step 4 Click edit, add vertices and save the edits.
Step 5 Go to the file system and you will see the new shapefile.
I should say that there are many reasons why a shapefile is not the ideal data format but it is very useful for quick data edits or shaping a polygon.
Writing a .shp file with Python
To work with a shapefile programmatically, and outside of the ESRI ecosystem, you need to lean on a few libraries.
- Shapely which deals with geometry operations
- Fiona which handles the reading and writing and, most terrifyingly,
- pyproj for all your projections and transformations
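As a rough illustration of how these libraries divide the work, the sketch below builds a point with Shapely and reprojects it with pyproj. The coordinates (roughly central London) and the choice of EPSG:4326 to EPSG:3857 are assumptions made for the example only.

```python
from pyproj import Transformer
from shapely.geometry import Point
from shapely.ops import transform

# always_xy=True keeps coordinates in (longitude, latitude) order.
to_web_mercator = Transformer.from_crs("EPSG:4326", "EPSG:3857", always_xy=True)

# Shapely holds the geometry; pyproj supplies the coordinate transformation.
london = Point(-0.1276, 51.5072)  # lon/lat in WGS 84 (illustrative values)
projected = transform(to_web_mercator.transform, london)

print(projected.x, projected.y)   # metres in Web Mercator
```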
However, we can also just go ahead and use GeoPandas, which combines all of the above libraries into the Pandas ecosystem for data munging.