Thursday, December 25, 2014

Processing GeoTiff files in .NET (without using GDAL)

I've recently created a process that is able to import geoferenced raster data, namely GeoTiff files... with a catch:
As I didn't find a proper lib to load GeoTiff files I was converting the source files to an ASCII Gridded XYZ format prior to importing. Despite the fancy name it's simply an ASCII file where each line contains:
<LONGITUDE>     <LATITUDE>      <VALUE1>   [ <VALUE 2>]   [<VALUE 3>]
Each line on this file corresponds to a pixel on a raster image, thus it's incredibly easy to parse in C#. The code would be something like this:
foreach (var line in File.ReadLines(filePath))
{
    if (line == null)
    {
        continue;
    }

    var components = line.Split(
        new[] { ' ' }, 
        StringSplitOptions.RemoveEmptyEntries);

    var longitude = double.Parse(components[0]);
    var latitude = double.Parse(components[1]);

    IEnumerable<string> values = components.Skip(2);

    //... process each data item

    yield return dataItem;
}

Nice and easy. But this has two disadvantages:

First, it requires an extra step to convert to XYZ from Geotiff, although easily done with the GDAL:
gdal_translate -of XYZ <input.tif> <output.xyz>
Another disadvantage is that the XYZ becomes much larger than its TIFF counterpart. Depending on the source file the difference could be something like 10 GB vs 500 MB (yes, 50x bigger)

So, I would really like to be able to process a GeoTiff file directly in c#. One obvious candidate is using GDAL, particularly its C# bindings. Although these bindings work relatively well they rely on an unmanaged GDAL installation. For me this is a problem as I want to host this on Azure WebSites, where installing stuff is not an option.
So, I embraced the challenge of being able to process a GeoTIFF file in a fully managed way.


Wednesday, December 10, 2014

Splitting vector and raster files in QGIS programmatically

On the context of the game I'm developing I have to load gigantic .shp/.tif files and parse them to hexagons.

The problem is that this operation is executed in memory (because I need to consolidate/merge data) and the process fails with large files. I had two options:
  • Change the process so that it would store the preliminary results on a separate datastore
  • Instead of loading full shapefiles or raster images, split them into smaller chunks
I did try the first option and, although working, became so much slower (like 100x) that I had to discard it (or at least shelve it while I searched for a better alternative).

I also tried the smaller chunks approach and started by creating "packages" per Country, manually picking data from sources such as Geofabrik.

But this posed two problems:
    • Very tedious work, particularly as there are hundreds of countries.
    • Wouldn't work for larger countries, hitting the memory roadblock as well.
So I opted to split the files in a grid like manner. I decided to use QGIS as it provides all the required tooling.  

Doing this splitting is quite easy to do manually:

1. Open the shapefile (For this example I'm using Countries data from NaturalEarth).


2. Generate a grid.
  • Open the toolbox

  • Choose Geoalgorithms > Vector > Creation > Create graticule
  • Set the desired size (in degrees per rectangle) and set Grid type to Rectangle (polygon)

  • A grid is created over the map (I've changed the opacity of the grid layer to show the map beneath).

3. Select one of the rectangles



4. Perform an intersection between both layers
  • Choose Vector > Geoprocessing Tools > Intersect

  • Choose the base layer as input and the rectangle as the intersect layer, using the "Use only selected features" option.
  • After executing a new layer is created with the intersecting polygons


Now I would just need to repeat this for all the 648 grid items x number of layers to process. I'm assuming this would take about 1 minute per each one and about 10 layers. So, approximately 108 hours non-stop... Not going to happen :). So, what about automating this inside QGIS?

QGIS does provide a command-line/editor/plugins to programmatically leverage its functionalities. Unfortunately for me, it's in Python, which I had never used before. Regardless, the optimistic in me jumped at the opportunity to learn something new.

So, here it is, a sample Python script for QGIS that basically mimics the manual steps I did above:
  • Generates a grid (size hardcoded)
  • Iterates the various tiles in the grid
    • Iterates all layers currently visible (both raster and vector)
      • Outputs the intersection between the layer and the tile
      • Creates a subfolder (harcoded) with all the intersected layers for each tile.
Complete code (update: there's a newer implementation on the bottom)

import processing
import os

#Create a GRID
result = processing.runalg("qgis:creategrid", 1, 360, 180, 10, 10, 0, 0, "epsg:4326", None)

#Add it to the canvas
gridLayer = iface.addVectorLayer(result.get("OUTPUT"),"grid","ogr")

#Iterate ever square on the grid
i = 0
for square in gridLayer.getFeatures():
    i = i + 1
    
    #Create a new layer
    newSquareLayer = iface.addVectorLayer("Polygon?crs=epsg:4326", "temporary_polygon_" + str(i), "memory")        
    provider = newSquareLayer.dataProvider()

    #feature that simply holds one square
    newSquare = QgsFeature()
    newSquare.setGeometry( QgsGeometry.fromPolygon(square.geometry().asPolygon()))
    provider.addFeatures( [ newSquare ] )

    #Make sure the target folder exists
    folder = "c:\\temp\\grid\\grid_" + str(i) 
    if not os.path.exists(folder):
        os.makedirs(folder)

    #iterate the various layers except the grid
    for mapLayer in iface.mapCanvas().layers():
        
        layerType = mapLayer.type()
        layerName = mapLayer.name()
        intersectionName = "intersection_" + layerName + "_" + str(i)

        #vector layers and raster layers are processed differently
        if layerType == QgsMapLayer.VectorLayer and layerName != "grid":
        
            #Calculate the intersection between the specific grid rectangle and the layer
            intersection = processing.runalg("qgis:intersection", mapLayer, newSquareLayer, None)

            iface.addVectorLayer(intersection.get("OUTPUT"),intersectionName,"ogr")
            
            #create a shapefile for this new intersection layer on the filesystem. 
            #A separate folder will be added for each square
            intersectionLayer = QgsMapLayerRegistry.instance().mapLayersByName(intersectionName)[0]
            QgsVectorFileWriter.writeAsVectorFormat(
                        intersectionLayer, 
                        folder + "\\" + layerName + ".shp", 
                        "utf-8", 
                        QgsCoordinateReferenceSystem(4326), 
                        "ESRI Shapefile")

            #remove the intersection layer from the canvas
            QgsMapLayerRegistry.instance().removeMapLayers( [intersectionLayer.id()] )
                
        elif layerType == QgsMapLayer.RasterLayer:
            
            #Calculate the intersection between the specific grid rectangle and the raster layer
            intersection = processing.runalg('saga:clipgridwithpolygon', mapLayer, newSquareLayer, None)
            
            #add the intersection to the map
            iface.addRasterLayer(intersection.get("OUTPUT"), intersectionName)

            #export to file
            intersectionLayer = QgsMapLayerRegistry.instance().mapLayersByName(intersectionName)[0]
            
            pipe = QgsRasterPipe()
            provider = intersectionLayer.dataProvider()
            pipe.set(provider.clone())
            
            rasterWriter = QgsRasterFileWriter(folder + "\\" + layerName + ".tif")
            xSize = provider.xSize()
            ySize = provider.ySize()
            
            rasterWriter.writeRaster(pipe, xSize, ySize, provider.extent(), provider.crs())
            
            #remove the intersection layer from the canvas
            QgsMapLayerRegistry.instance().removeMapLayers( [intersectionLayer.id()] )
            
        else:
            print "layer type not supported"
    
    #Now that all the intersections have been calculated remove the new square
    print "Removing temporary grid item " + str(i) + "(" + newSquareLayer.id() + ")"
    QgsMapLayerRegistry.instance().removeMapLayers( [newSquareLayer.id()] )
    
#Remove the grid
QgsMapLayerRegistry.instance().removeMapLayers( [gridLayer.id()] )
To use this script:
  • Open the Python console:
  1. Open the editor and copy&paste the script there
  2. Save the script
  3. Execute it

Could take a couple of minutes for large files, particularly the raster ones, but at least it's automatic.


Update (11/12/2014)

Although the above script worked as planned, the import process didn't, particularly on the data between the generated tiles. As there's no overlap on the clipped data, after processing small gaps would appear on the end-result. Thus, I've created a brand new implementation that is a little bit more polished and supports a new "buffer" parameter. This allows the tiles to overlap slightly like:
Also, the grid is now created programmatically without using "creategrid" function, which also allowed me to use a more logical X,Y naming for the tiles.

The new code is:
import processing
import os

#######  PARAMS  #######

originX = -180
originY = 90

stepX = 10
stepY = 10

width =  360
height = 180

iterationsX = width / stepX
iterationsY = height / stepY

buffer = 1

j = 0
i = 0

targetBaseFolder = "C:\\temp\\grid"

#######  MAIN   #######

for i in xrange(0,iterationsX):
    
    for j in xrange(0,iterationsY):
        
        tileId = str(i) + "_" + str(j)
        
        folder = targetBaseFolder + "\\" + tileId
        
        if not os.path.exists(folder):
            os.makedirs(folder)
        
        print "Processing tile " + tileId

        minX = (originX + i * stepX) - buffer
        maxY = (originY - j * stepY) + buffer
        maxX = (minX + stepX) + buffer
        minY = (maxY - stepY) -  buffer

        wkt = "POLYGON ((" + str(minX) + " " + str(maxY)+ ", " + str(maxX) + " " + str(maxY) + ", " + str(maxX) + " " + str(minY) + ", " + str(minX) + " " + str(minY) + ", " + str(minX) + " " + str(maxY) + "))"

        tileLayer = iface.addVectorLayer("Polygon?crs=epsg:4326", "tile", "memory")
        provider = tileLayer.dataProvider()
        tileFeature = QgsFeature()

        tileFeature.setGeometry(QgsGeometry.fromWkt(wkt))
        provider.addFeatures( [ tileFeature ] )
        
        for mapLayer in iface.mapCanvas().layers():
            
            layerType = mapLayer.type()
            layerName = mapLayer.name()
            intersectionName = "intersection_" + layerName + "_" + tileId

            #vector layers and raster layers are processed differently
            if layerType == QgsMapLayer.VectorLayer and layerName != "tile":
            
                #Calculate the intersection between the specific grid rectangle and the layer
                intersection = processing.runalg("qgis:intersection", mapLayer, tileLayer, None)

                iface.addVectorLayer(intersection.get("OUTPUT"),intersectionName,"ogr")
                
                #create a shapefile for this new intersection layer on the filesystem. 
                #A separate folder will be added for each square
                intersectionLayer = QgsMapLayerRegistry.instance().mapLayersByName(intersectionName)[0]
                QgsVectorFileWriter.writeAsVectorFormat(
                        intersectionLayer, 
                        folder + "\\" + layerName + ".shp", 
                        "utf-8", QgsCoordinateReferenceSystem(4326), 
                        "ESRI Shapefile")

                #remove the intersection layer from the canvas
                QgsMapLayerRegistry.instance().removeMapLayers( [intersectionLayer.id()] )
                    
            elif layerType == QgsMapLayer.RasterLayer:
                
                #Calculate the intersection between the specific grid rectangle and the raster layer
                intersection = processing.runalg('saga:clipgridwithpolygon', mapLayer, tileLayer, None)
                
                #add the intersection to the map
                iface.addRasterLayer(intersection.get("OUTPUT"), intersectionName)

                #export to file
                intersectionLayer = QgsMapLayerRegistry.instance().mapLayersByName(intersectionName)[0]
                
                pipe = QgsRasterPipe()
                provider = intersectionLayer.dataProvider()
                pipe.set(provider.clone())
                
                rasterWriter = QgsRasterFileWriter(folder + "\\" + layerName + ".tif")
                xSize = provider.xSize()
                ySize = provider.ySize()
                
                rasterWriter.writeRaster(pipe, xSize, ySize, provider.extent(), provider.crs())
                
                #remove the intersection layer from the canvas
                QgsMapLayerRegistry.instance().removeMapLayers( [intersectionLayer.id()] )
                
            else:
                print "layer type not supported"
        
        #remove the temporary tile
        QgsMapLayerRegistry.instance().removeMapLayers( [tileLayer.id()] )

Eventually I'll create a proper QGIS plugin for this that allows setting input parameters without all that hardcoded logic.