On the context of the game I'm developing I have to load gigantic .shp/.tif files and parse them to hexagons.
The problem is that this operation is executed in memory (because I need to consolidate/merge data) and the process fails with large files. I had two options:
- Change the process so that it would store the preliminary results on a separate datastore
- Instead of loading full shapefiles or raster images, split them into smaller chunks
I did try the first option and, although working, became so much slower (like 100x) that I had to discard it (or at least shelve it while I searched for a better alternative).
I also tried the smaller chunks approach and started by creating "packages" per Country, manually picking data from sources such as
Geofabrik.
But this posed two problems:
- Very tedious work, particularly as there are hundreds of countries.
- Wouldn't work for larger countries, hitting the memory roadblock as well.
So I opted to split the files in a grid like manner. I decided to use
QGIS as it provides all the required tooling.
Doing this splitting is quite easy to do manually:
2. Generate a grid.
- Choose Geoalgorithms > Vector > Creation > Create graticule
- Set the desired size (in degrees per rectangle) and set Grid type to Rectangle (polygon)
- A grid is created over the map (I've changed the opacity of the grid layer to show the map beneath).
3. Select one of the rectangles
4. Perform an intersection between both layers
- Choose Vector > Geoprocessing Tools > Intersect
- Choose the base layer as input and the rectangle as the intersect layer, using the "Use only selected features" option.
- After executing a new layer is created with the intersecting polygons
Now I would just need to repeat this for all the 648 grid items x number of layers to process. I'm assuming this would take about 1 minute per each one and about 10 layers. So, approximately 108 hours non-stop... Not going to happen :). So, what about automating this inside QGIS?
QGIS does provide a command-line/editor/plugins to programmatically leverage its functionalities. Unfortunately for me, it's in Python, which I had never used before. Regardless, the optimistic in me jumped at the opportunity to learn something new.
So, here it is, a sample Python script for QGIS that basically mimics the manual steps I did above:
- Generates a grid (size hardcoded)
- Iterates the various tiles in the grid
- Iterates all layers currently visible (both raster and vector)
- Outputs the intersection between the layer and the tile
- Creates a subfolder (harcoded) with all the intersected layers for each tile.
Complete code (
update: there's a newer implementation on the bottom)
import processing
import os
#Create a GRID
result = processing.runalg("qgis:creategrid", 1, 360, 180, 10, 10, 0, 0, "epsg:4326", None)
#Add it to the canvas
gridLayer = iface.addVectorLayer(result.get("OUTPUT"),"grid","ogr")
#Iterate ever square on the grid
i = 0
for square in gridLayer.getFeatures():
i = i + 1
#Create a new layer
newSquareLayer = iface.addVectorLayer("Polygon?crs=epsg:4326", "temporary_polygon_" + str(i), "memory")
provider = newSquareLayer.dataProvider()
#feature that simply holds one square
newSquare = QgsFeature()
newSquare.setGeometry( QgsGeometry.fromPolygon(square.geometry().asPolygon()))
provider.addFeatures( [ newSquare ] )
#Make sure the target folder exists
folder = "c:\\temp\\grid\\grid_" + str(i)
if not os.path.exists(folder):
os.makedirs(folder)
#iterate the various layers except the grid
for mapLayer in iface.mapCanvas().layers():
layerType = mapLayer.type()
layerName = mapLayer.name()
intersectionName = "intersection_" + layerName + "_" + str(i)
#vector layers and raster layers are processed differently
if layerType == QgsMapLayer.VectorLayer and layerName != "grid":
#Calculate the intersection between the specific grid rectangle and the layer
intersection = processing.runalg("qgis:intersection", mapLayer, newSquareLayer, None)
iface.addVectorLayer(intersection.get("OUTPUT"),intersectionName,"ogr")
#create a shapefile for this new intersection layer on the filesystem.
#A separate folder will be added for each square
intersectionLayer = QgsMapLayerRegistry.instance().mapLayersByName(intersectionName)[0]
QgsVectorFileWriter.writeAsVectorFormat(
intersectionLayer,
folder + "\\" + layerName + ".shp",
"utf-8",
QgsCoordinateReferenceSystem(4326),
"ESRI Shapefile")
#remove the intersection layer from the canvas
QgsMapLayerRegistry.instance().removeMapLayers( [intersectionLayer.id()] )
elif layerType == QgsMapLayer.RasterLayer:
#Calculate the intersection between the specific grid rectangle and the raster layer
intersection = processing.runalg('saga:clipgridwithpolygon', mapLayer, newSquareLayer, None)
#add the intersection to the map
iface.addRasterLayer(intersection.get("OUTPUT"), intersectionName)
#export to file
intersectionLayer = QgsMapLayerRegistry.instance().mapLayersByName(intersectionName)[0]
pipe = QgsRasterPipe()
provider = intersectionLayer.dataProvider()
pipe.set(provider.clone())
rasterWriter = QgsRasterFileWriter(folder + "\\" + layerName + ".tif")
xSize = provider.xSize()
ySize = provider.ySize()
rasterWriter.writeRaster(pipe, xSize, ySize, provider.extent(), provider.crs())
#remove the intersection layer from the canvas
QgsMapLayerRegistry.instance().removeMapLayers( [intersectionLayer.id()] )
else:
print "layer type not supported"
#Now that all the intersections have been calculated remove the new square
print "Removing temporary grid item " + str(i) + "(" + newSquareLayer.id() + ")"
QgsMapLayerRegistry.instance().removeMapLayers( [newSquareLayer.id()] )
#Remove the grid
QgsMapLayerRegistry.instance().removeMapLayers( [gridLayer.id()] )
To use this script:
- Open the editor and copy&paste the script there
- Save the script
- Execute it
Could take a couple of minutes for large files, particularly the raster ones, but at least it's automatic.
Update (11/12/2014)
Although the above script worked as planned, the import process didn't, particularly on the data between the generated tiles. As there's no overlap on the clipped data, after processing small gaps would appear on the end-result. Thus, I've created a brand new implementation that is a little bit more polished and supports a new "buffer" parameter. This allows the tiles to overlap slightly like:
Also, the grid is now created programmatically without using "creategrid" function, which also allowed me to use a more logical X,Y naming for the tiles.
The new code is:
import processing
import os
####### PARAMS #######
originX = -180
originY = 90
stepX = 10
stepY = 10
width = 360
height = 180
iterationsX = width / stepX
iterationsY = height / stepY
buffer = 1
j = 0
i = 0
targetBaseFolder = "C:\\temp\\grid"
####### MAIN #######
for i in xrange(0,iterationsX):
for j in xrange(0,iterationsY):
tileId = str(i) + "_" + str(j)
folder = targetBaseFolder + "\\" + tileId
if not os.path.exists(folder):
os.makedirs(folder)
print "Processing tile " + tileId
minX = (originX + i * stepX) - buffer
maxY = (originY - j * stepY) + buffer
maxX = (minX + stepX) + buffer
minY = (maxY - stepY) - buffer
wkt = "POLYGON ((" + str(minX) + " " + str(maxY)+ ", " + str(maxX) + " " + str(maxY) + ", " + str(maxX) + " " + str(minY) + ", " + str(minX) + " " + str(minY) + ", " + str(minX) + " " + str(maxY) + "))"
tileLayer = iface.addVectorLayer("Polygon?crs=epsg:4326", "tile", "memory")
provider = tileLayer.dataProvider()
tileFeature = QgsFeature()
tileFeature.setGeometry(QgsGeometry.fromWkt(wkt))
provider.addFeatures( [ tileFeature ] )
for mapLayer in iface.mapCanvas().layers():
layerType = mapLayer.type()
layerName = mapLayer.name()
intersectionName = "intersection_" + layerName + "_" + tileId
#vector layers and raster layers are processed differently
if layerType == QgsMapLayer.VectorLayer and layerName != "tile":
#Calculate the intersection between the specific grid rectangle and the layer
intersection = processing.runalg("qgis:intersection", mapLayer, tileLayer, None)
iface.addVectorLayer(intersection.get("OUTPUT"),intersectionName,"ogr")
#create a shapefile for this new intersection layer on the filesystem.
#A separate folder will be added for each square
intersectionLayer = QgsMapLayerRegistry.instance().mapLayersByName(intersectionName)[0]
QgsVectorFileWriter.writeAsVectorFormat(
intersectionLayer,
folder + "\\" + layerName + ".shp",
"utf-8", QgsCoordinateReferenceSystem(4326),
"ESRI Shapefile")
#remove the intersection layer from the canvas
QgsMapLayerRegistry.instance().removeMapLayers( [intersectionLayer.id()] )
elif layerType == QgsMapLayer.RasterLayer:
#Calculate the intersection between the specific grid rectangle and the raster layer
intersection = processing.runalg('saga:clipgridwithpolygon', mapLayer, tileLayer, None)
#add the intersection to the map
iface.addRasterLayer(intersection.get("OUTPUT"), intersectionName)
#export to file
intersectionLayer = QgsMapLayerRegistry.instance().mapLayersByName(intersectionName)[0]
pipe = QgsRasterPipe()
provider = intersectionLayer.dataProvider()
pipe.set(provider.clone())
rasterWriter = QgsRasterFileWriter(folder + "\\" + layerName + ".tif")
xSize = provider.xSize()
ySize = provider.ySize()
rasterWriter.writeRaster(pipe, xSize, ySize, provider.extent(), provider.crs())
#remove the intersection layer from the canvas
QgsMapLayerRegistry.instance().removeMapLayers( [intersectionLayer.id()] )
else:
print "layer type not supported"
#remove the temporary tile
QgsMapLayerRegistry.instance().removeMapLayers( [tileLayer.id()] )
Eventually I'll create a proper QGIS plugin for this that allows setting input parameters without all that hardcoded logic.