Wednesday, December 10, 2014

Splitting vector and raster files in QGIS programmatically

On the context of the game I'm developing I have to load gigantic .shp/.tif files and parse them to hexagons.

The problem is that this operation is executed in memory (because I need to consolidate/merge data) and the process fails with large files. I had two options:
  • Change the process so that it would store the preliminary results on a separate datastore
  • Instead of loading full shapefiles or raster images, split them into smaller chunks
I did try the first option and, although working, became so much slower (like 100x) that I had to discard it (or at least shelve it while I searched for a better alternative).

I also tried the smaller chunks approach and started by creating "packages" per Country, manually picking data from sources such as Geofabrik.

But this posed two problems:
    • Very tedious work, particularly as there are hundreds of countries.
    • Wouldn't work for larger countries, hitting the memory roadblock as well.
So I opted to split the files in a grid like manner. I decided to use QGIS as it provides all the required tooling.  

Doing this splitting is quite easy to do manually:

1. Open the shapefile (For this example I'm using Countries data from NaturalEarth).


2. Generate a grid.
  • Open the toolbox

  • Choose Geoalgorithms > Vector > Creation > Create graticule
  • Set the desired size (in degrees per rectangle) and set Grid type to Rectangle (polygon)

  • A grid is created over the map (I've changed the opacity of the grid layer to show the map beneath).

3. Select one of the rectangles



4. Perform an intersection between both layers
  • Choose Vector > Geoprocessing Tools > Intersect

  • Choose the base layer as input and the rectangle as the intersect layer, using the "Use only selected features" option.
  • After executing a new layer is created with the intersecting polygons


Now I would just need to repeat this for all the 648 grid items x number of layers to process. I'm assuming this would take about 1 minute per each one and about 10 layers. So, approximately 108 hours non-stop... Not going to happen :). So, what about automating this inside QGIS?

QGIS does provide a command-line/editor/plugins to programmatically leverage its functionalities. Unfortunately for me, it's in Python, which I had never used before. Regardless, the optimistic in me jumped at the opportunity to learn something new.

So, here it is, a sample Python script for QGIS that basically mimics the manual steps I did above:
  • Generates a grid (size hardcoded)
  • Iterates the various tiles in the grid
    • Iterates all layers currently visible (both raster and vector)
      • Outputs the intersection between the layer and the tile
      • Creates a subfolder (harcoded) with all the intersected layers for each tile.
Complete code (update: there's a newer implementation on the bottom)

import processing
import os

#Create a GRID
result = processing.runalg("qgis:creategrid", 1, 360, 180, 10, 10, 0, 0, "epsg:4326", None)

#Add it to the canvas
gridLayer = iface.addVectorLayer(result.get("OUTPUT"),"grid","ogr")

#Iterate ever square on the grid
i = 0
for square in gridLayer.getFeatures():
    i = i + 1
    
    #Create a new layer
    newSquareLayer = iface.addVectorLayer("Polygon?crs=epsg:4326", "temporary_polygon_" + str(i), "memory")        
    provider = newSquareLayer.dataProvider()

    #feature that simply holds one square
    newSquare = QgsFeature()
    newSquare.setGeometry( QgsGeometry.fromPolygon(square.geometry().asPolygon()))
    provider.addFeatures( [ newSquare ] )

    #Make sure the target folder exists
    folder = "c:\\temp\\grid\\grid_" + str(i) 
    if not os.path.exists(folder):
        os.makedirs(folder)

    #iterate the various layers except the grid
    for mapLayer in iface.mapCanvas().layers():
        
        layerType = mapLayer.type()
        layerName = mapLayer.name()
        intersectionName = "intersection_" + layerName + "_" + str(i)

        #vector layers and raster layers are processed differently
        if layerType == QgsMapLayer.VectorLayer and layerName != "grid":
        
            #Calculate the intersection between the specific grid rectangle and the layer
            intersection = processing.runalg("qgis:intersection", mapLayer, newSquareLayer, None)

            iface.addVectorLayer(intersection.get("OUTPUT"),intersectionName,"ogr")
            
            #create a shapefile for this new intersection layer on the filesystem. 
            #A separate folder will be added for each square
            intersectionLayer = QgsMapLayerRegistry.instance().mapLayersByName(intersectionName)[0]
            QgsVectorFileWriter.writeAsVectorFormat(
                        intersectionLayer, 
                        folder + "\\" + layerName + ".shp", 
                        "utf-8", 
                        QgsCoordinateReferenceSystem(4326), 
                        "ESRI Shapefile")

            #remove the intersection layer from the canvas
            QgsMapLayerRegistry.instance().removeMapLayers( [intersectionLayer.id()] )
                
        elif layerType == QgsMapLayer.RasterLayer:
            
            #Calculate the intersection between the specific grid rectangle and the raster layer
            intersection = processing.runalg('saga:clipgridwithpolygon', mapLayer, newSquareLayer, None)
            
            #add the intersection to the map
            iface.addRasterLayer(intersection.get("OUTPUT"), intersectionName)

            #export to file
            intersectionLayer = QgsMapLayerRegistry.instance().mapLayersByName(intersectionName)[0]
            
            pipe = QgsRasterPipe()
            provider = intersectionLayer.dataProvider()
            pipe.set(provider.clone())
            
            rasterWriter = QgsRasterFileWriter(folder + "\\" + layerName + ".tif")
            xSize = provider.xSize()
            ySize = provider.ySize()
            
            rasterWriter.writeRaster(pipe, xSize, ySize, provider.extent(), provider.crs())
            
            #remove the intersection layer from the canvas
            QgsMapLayerRegistry.instance().removeMapLayers( [intersectionLayer.id()] )
            
        else:
            print "layer type not supported"
    
    #Now that all the intersections have been calculated remove the new square
    print "Removing temporary grid item " + str(i) + "(" + newSquareLayer.id() + ")"
    QgsMapLayerRegistry.instance().removeMapLayers( [newSquareLayer.id()] )
    
#Remove the grid
QgsMapLayerRegistry.instance().removeMapLayers( [gridLayer.id()] )
To use this script:
  • Open the Python console:
  1. Open the editor and copy&paste the script there
  2. Save the script
  3. Execute it

Could take a couple of minutes for large files, particularly the raster ones, but at least it's automatic.


Update (11/12/2014)

Although the above script worked as planned, the import process didn't, particularly on the data between the generated tiles. As there's no overlap on the clipped data, after processing small gaps would appear on the end-result. Thus, I've created a brand new implementation that is a little bit more polished and supports a new "buffer" parameter. This allows the tiles to overlap slightly like:
Also, the grid is now created programmatically without using "creategrid" function, which also allowed me to use a more logical X,Y naming for the tiles.

The new code is:
import processing
import os

#######  PARAMS  #######

originX = -180
originY = 90

stepX = 10
stepY = 10

width =  360
height = 180

iterationsX = width / stepX
iterationsY = height / stepY

buffer = 1

j = 0
i = 0

targetBaseFolder = "C:\\temp\\grid"

#######  MAIN   #######

for i in xrange(0,iterationsX):
    
    for j in xrange(0,iterationsY):
        
        tileId = str(i) + "_" + str(j)
        
        folder = targetBaseFolder + "\\" + tileId
        
        if not os.path.exists(folder):
            os.makedirs(folder)
        
        print "Processing tile " + tileId

        minX = (originX + i * stepX) - buffer
        maxY = (originY - j * stepY) + buffer
        maxX = (minX + stepX) + buffer
        minY = (maxY - stepY) -  buffer

        wkt = "POLYGON ((" + str(minX) + " " + str(maxY)+ ", " + str(maxX) + " " + str(maxY) + ", " + str(maxX) + " " + str(minY) + ", " + str(minX) + " " + str(minY) + ", " + str(minX) + " " + str(maxY) + "))"

        tileLayer = iface.addVectorLayer("Polygon?crs=epsg:4326", "tile", "memory")
        provider = tileLayer.dataProvider()
        tileFeature = QgsFeature()

        tileFeature.setGeometry(QgsGeometry.fromWkt(wkt))
        provider.addFeatures( [ tileFeature ] )
        
        for mapLayer in iface.mapCanvas().layers():
            
            layerType = mapLayer.type()
            layerName = mapLayer.name()
            intersectionName = "intersection_" + layerName + "_" + tileId

            #vector layers and raster layers are processed differently
            if layerType == QgsMapLayer.VectorLayer and layerName != "tile":
            
                #Calculate the intersection between the specific grid rectangle and the layer
                intersection = processing.runalg("qgis:intersection", mapLayer, tileLayer, None)

                iface.addVectorLayer(intersection.get("OUTPUT"),intersectionName,"ogr")
                
                #create a shapefile for this new intersection layer on the filesystem. 
                #A separate folder will be added for each square
                intersectionLayer = QgsMapLayerRegistry.instance().mapLayersByName(intersectionName)[0]
                QgsVectorFileWriter.writeAsVectorFormat(
                        intersectionLayer, 
                        folder + "\\" + layerName + ".shp", 
                        "utf-8", QgsCoordinateReferenceSystem(4326), 
                        "ESRI Shapefile")

                #remove the intersection layer from the canvas
                QgsMapLayerRegistry.instance().removeMapLayers( [intersectionLayer.id()] )
                    
            elif layerType == QgsMapLayer.RasterLayer:
                
                #Calculate the intersection between the specific grid rectangle and the raster layer
                intersection = processing.runalg('saga:clipgridwithpolygon', mapLayer, tileLayer, None)
                
                #add the intersection to the map
                iface.addRasterLayer(intersection.get("OUTPUT"), intersectionName)

                #export to file
                intersectionLayer = QgsMapLayerRegistry.instance().mapLayersByName(intersectionName)[0]
                
                pipe = QgsRasterPipe()
                provider = intersectionLayer.dataProvider()
                pipe.set(provider.clone())
                
                rasterWriter = QgsRasterFileWriter(folder + "\\" + layerName + ".tif")
                xSize = provider.xSize()
                ySize = provider.ySize()
                
                rasterWriter.writeRaster(pipe, xSize, ySize, provider.extent(), provider.crs())
                
                #remove the intersection layer from the canvas
                QgsMapLayerRegistry.instance().removeMapLayers( [intersectionLayer.id()] )
                
            else:
                print "layer type not supported"
        
        #remove the temporary tile
        QgsMapLayerRegistry.instance().removeMapLayers( [tileLayer.id()] )

Eventually I'll create a proper QGIS plugin for this that allows setting input parameters without all that hardcoded logic.

4 comments:

  1. Hi Pedro,

    I am currently working on a project which involves DEM data which I need to view and analyse in my Delphi written software. For this project I have all the DEM models for South Africa in 1 Degree tiles. To be able to read the data into my software, I need to cut the tiles into smaller tiles, approximately 5km x 5km and translate these tiles into a grid type text file.

    I tried to use and make sense of your script, but due to my Python knowledge being practically non-existent, I could not get it to work. Have you gotten around to doing the plugin?

    Any help and advise would be greatly appreciated.

    Regards,

    Martin

    ReplyDelete
    Replies
    1. Hi Martin,

      Unfortunately I haven't done the plugin yet (curiously I did think about that this weekend).

      Regardless, if you just need to parse DEM raster files the python script could be much simpler...

      What happened when you executed it? Some error? No result? Please note that it's currently hardcoding the target path where all the tiles will be stored.


      Also, which QGIS version are you using (mine is 2.6.1) ?

      Delete
    2. Hi Pedro,

      I am currently on 2.6.1 as well(Windows). I do have to add that for some reason some of the tools does not work due to missing files, though the errors I get does not seem to relate to this.

      The first error I receive is an Invalid Layer error with the following details:
      GDAL provider: Cannot open GDAL dataset C:\Users\MARTIN~1\AppData\Local\Temp\processing\440a9909954342de85bfa2cd3358b012\OUTPUT.tif: `C:\Users\MARTIN~1\AppData\Local\Temp\processing\440a9909954342de85bfa2cd3358b012\OUTPUT.tif' does not exist in the file system, and is not recognised as a supported dataset name.
      Raster layer: Provider is not valid (provider: gdal, URI: C:\Users\MARTIN~1\AppData\Local\Temp\processing\440a9909954342de85bfa2cd3358b012\OUTPUT.tif

      Here I realised that I'll need to check the script and make any changes that pertains to my project. So firstly I replaced all the CRS definitions with the one I am using. Then I Saved my *.bil dem to a *.tif and loaded that up instead.

      This still produced the same error.

      I the saw your parameter values and realised these will need to be adjusted to my project as well. since I'm trying this out on just one tile, my origin point should be the same as my raster layer's origin? I would also need to adjust your step parameters to something that would end up being approx 5km x 5km. I guessed that the step units where in degrees, so I had to enter my values in degrees, which equates to 0.45454545, which is a floating point value and can not be used in "for i in xrange(0,iterationsX):"

      This is where I gave up...

      Regards,

      Martin

      Delete
  2. Hi,
    your original code above is solving the problem I am trying to solve. I have a very large tiff image (aerial photography) which I want to split into 250m x 250m tiles. I nearly got it to work, but I have two issues. the first seems to be that my image is a multiband and saga apparently doesn't support this. The second is that the grid created has some grid cells with no tiff under (ie image stretches NE/ SW whereas the grid is obviously, well a grid).

    Any suggestions for someone with no python experience and now only limited QGIS experience would be very gratefully received!

    ReplyDelete