This page will offer snapshots of my research work, and will hopefully change on a very regular basis!
A query on r-sig-geo asked about making a density map ("hot-spot" map) with map boundary information in R. This is kind of tricky, as it requires the use of multiple R libraries. There are almost surely better ways of doing it, but here's an example that I put together using randomly generated data. Here's the R code for it.
Smooth image with random points in R (spatstat, using maps)
The department has upgraded the former Sun Unix Lab to a thin-client Linux/Windows lab, which among
other things has led me to completely redo my digital terrain analysis course. The technology platform
for that course is GRASS; I've been spending lots of time this fall developing labs to do basic and
advanced terrain analysis in GRASS 6.4. Viewshed calculation is one of the more advanced applications,
but r.los is pretty easy to use.
It readily produces the viewshed for a single point location on a DEM.
Viewshed for a point above Shortridge Farms, Medora, Indiana
A related problem is more complex: identify a set of observation points that offer good viewsheds across a region. No built-in command will do this; indeed, optimal observer siting is a difficult problem. However, fairly simple approaches using basic GRASS commands give decent results. Here I assume that observation sites that are the highest in their surroundings, and are substantially higher than nearby elevations, are good candidates for siting. Here's a GRASS approach, using an elevation map called dem, to find such points:
r.neighbors -c input="dem" output="dem_median" method="median" size=19
r.neighbors -c input="dem" output="dem_max" method="maximum" size=19
r.neighbors -c input="dem" output="dem_std" method="stddev" size=19
r.mapcalc 'dem_high = if((dem == dem_max && dem > dem_median + 2 * dem_std), 1, null())'
The last step finds all cells that are the highest in their circular neighborhood (size=19) and are also more than two standard deviations above the neighborhood median. I found that a lot of the selected cells were on or near the map edge, where the neighborhood is truncated and thus much smaller. I decided to remove points near the edges. To do this, I created a raster map of the edge cells and then used r.grow to 'grow' that raster inward. This won't work as well if there are internal null() values:
r.mapcalc 'edge = if(isnull(dem[-1,-1]) || isnull(dem[1,1]), 1, null())'
r.grow input=edge output=edge19 radius=19
r.mapcalc 'dem_high = if(dem_high && isnull(edge19), 1, null())'
This raster should have a small number of cells with value 1. Convert it to vector points and export that to ascii:
r.to.vect input=dem_high output=high_points feature=point
v.out.ascii input=high_points output=hipts.txt format=point fs=, dp=1 layer=1
Output should be a very small number of points. Here's what I got for the Medora DEM. As expected,
they are positioned on local high points.
Nine observation points were automatically identified.
Finally, I ran a short shell script to loop through each line in hipts.txt and run r.los on it. The following figure shows the viewable areas; the darker the blue, the more observation points can view that location. Gray cells are not visible from any observation point.
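The script itself was linked from the original page; a minimal sketch of such a loop follows. It prints the r.los command for each point rather than running it, so it can be tried outside a GRASS session; the sample coordinates, the observer elevation, and the los_* map names are all illustrative. Inside GRASS, remove the echo (and the sample-file line) to actually compute the viewsheds.

```shell
#!/bin/sh
# Create a tiny sample hipts.txt (two made-up points) so the sketch runs
# anywhere; with a real v.out.ascii export on hand, delete this line.
printf '601234.5,4291234.5,1\n602000.0,4292000.0,2\n' > hipts.txt

# Read each comma-separated east,north[,cat] record (fs=, in v.out.ascii
# above) and emit one r.los command per observation point, numbering the
# output maps los_1, los_2, ...
i=0
while IFS=, read east north rest; do
    i=`expr $i + 1`
    echo "r.los input=dem output=los_$i coordinate=$east,$north obs_elev=1.75"
done < hipts.txt
```

The individual los_* rasters can then be combined (e.g. with r.mapcalc or r.series) to count how many observers see each cell.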
Combined viewsheds for the nine observation points.
Former MSU undergraduate student Evan Bowling and I developed OpenError, a set of libraries enabling
the simulation and animation of positional uncertainty in multipoint spatial datasets. These libraries
can be readily integrated with web mapping applications, including Google Maps.
Visit our OpenError website to run some demos, find out more, and download code.
From about a quarter of a million samples I extracted on a regular grid across the conterminous USA, just 17 locations had SRTM 3" elevation errors in excess of 100 meters. Visit this custom Google Maps application showing the locations of these points and their error magnitudes.
I am not the first to struggle with the definition of Michigan GeoRef; there seems to be a persistent thread of issues with it dating back many years. I have a spatial table in Postgres with EPA waste water discharge permit info, including lat-lon. I assigned srid 4326 to this, which defines geographic coordinates, WGS84 datum. I have another spatial table with point locations of all Michigan lakes larger than 10 acres in area. This is in Michigan GeoRef, units meters, srid 3078. When I query to find the five closest EPA waste water permits to Lake Lansing, in Ingham County, Michigan, I get this:
     lake      |            company             |             type             |   st_distance
---------------+--------------------------------+------------------------------+------------------
 Lansing, Lake | O-N MINERALS-PORT INLAND PLANT | CRUSHED AND BROKEN LIMESTONE | 4507592.89101402
 Lansing, Lake | ISLE ROYALE NATIONAL PARK      | SEWERAGE SYSTEMS             | 4619209.17945445
 Lansing, Lake | COPPER HARBOR WWTP             | SEWERAGE SYSTEMS             | 4706370.09355534
 Lansing, Lake | KEWEENAW CO-MT H GREELEY WWSL  | SEWERAGE SYSTEMS             | 4707250.80257881
 Lansing, Lake | NORTH HOUGHTON CO W&SA CSO     | SEWERAGE SYSTEMS             | 4713280.32057519
(5 rows)

All of those companies are located in the Upper Peninsula, hundreds of kilometers from Lake Lansing. The problem appears to be with the Michigan GeoRef coordinate system definition in the spatial reference system table: its proj4 string lacks the '+no_uoff' parameter. After correcting the spatial_ref_sys table to add that parameter:
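The correcting command did not survive in this copy of the page. Assuming the standard PostGIS spatial_ref_sys layout, a command of the following form appends the missing parameter; treat the proj4text in your own install as authoritative and verify it after the update:

```sql
-- Append the missing +no_uoff flag to the Michigan GeoRef (SRID 3078)
-- proj4 definition. The NOT LIKE guard keeps the flag from being added twice.
UPDATE spatial_ref_sys
   SET proj4text = proj4text || ' +no_uoff'
 WHERE srid = 3078
   AND proj4text NOT LIKE '%+no_uoff%';
```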
     lake      |            company             |              type              |   st_distance
---------------+--------------------------------+--------------------------------+------------------
 Lansing, Lake | EAST LANSING WWTP              | SEWERAGE SYSTEMS               | 8523.36101708985
 Lansing, Lake | WEBBERVILLE WWSL               | SEWERAGE SYSTEMS               | 10849.5846686267
 Lansing, Lake | MOTOR WHEEL DISPOSAL SITE      | NONCLASSIFIABLE ESTABLISHMENTS | 11112.0216447675
 Lansing, Lake | WILLIAMSTON WWTP               | SEWERAGE SYSTEMS               | 11473.6962998489
 Lansing, Lake | LANSING SCHOOL DIST-JOHNSON FH | ELEMENTARY & SECONDARY SCHOOLS | 12128.5345535646
(5 rows)

All of these are in Ingham County, and the distances make a lot more sense.
It's been a while since I updated this page. I guess it's still very regular on a geologic time scale.
I obtained a csv file of EPA water discharge (pollutant) permits for the state of Michigan from this EPA website. This included updates through December 2009. The data set includes lat-lon coordinates for the discharge sites. The first step was to clean and preprocess the data into something the database could read, which I did in OpenOffice Calc. There were some problems to fix first.
Once all this was fixed, the file was saved as a comma-delimited csv, epa_permits.csv.
Then in Postgres, I made a new table:
CREATE TABLE pca (
    npdes     char(10),
    sic_code  char(4),
    sic_text  varchar,
    city      varchar,
    county    varchar,
    latitude  double precision,
    longitude double precision,
    name      varchar
);
And I copied the data into the new table:
\copy pca FROM 'epa_permits.csv' WITH CSV HEADER
Now I create a new geometry column in this table. The arguments to AddGeometryColumn are (table, new column, SRID, geometry type, dimension):
SELECT AddGeometryColumn('pca', 'the_geom', 4326, 'POINT', 2);
Here I used SRID 4326, which is lat-lon, WGS84 datum. I'm actually not sure the datum is correct (it could also be NAD83 or, less likely, something else), but since I think this is all user-contributed info from non-GIS people, the idea that the datum is actually consistent is... wrong. (In the conterminous US, WGS84 and NAD83 differ by only a meter or so anyway.) Other Michigan-specific datasets may be in Michigan GeoRef, which is SRID 3078.
Finally, I populated the_geom from the latitude and longitude fields so the column actually holds data:
UPDATE pca SET the_geom = ST_SetSRID(ST_MakePoint(longitude, latitude), 4326);
The table is all set! If I want I can export to an SQL file for passing along to others.
At the command line:
pg_dump -t pca -f epa_permits.sql
This file can be imported into another Postgresql database from the command line with:
psql -f epa_permits.sql
I just finished a workshop for CSTAT, MSU's Center for Statistical Training and Consulting. A pdf of the presentation and a zip file with the data and R scripts I used for the demos are available!
I'm currently playing with Quantum GIS,
a cross platform, open source GIS with a friendly interface. I like its substantial
file interoperability, and I've used its interface to GRASS. Mostly I use it for
quick visualizations of spatial data. While prepping a lab on obtaining digital
terrain data, I assembled the following map of the Republic of Georgia in QGIS.
Republic of Georgia: Terrain, Road, and Rail Networks
Elevation data are ETOPO 1-minute data from the NGDC's nifty GEODAS Grid Translator, downloaded as an ASCII raster file. Network and boundary data are Digital Chart of the World (DCW) layers downloaded as Arc E00 files from Geocommunity.com.
This is not a great map by any means - for one thing it is unprojected, and for another there's no annotation, no labels, no legend, no scale bar. However, for the purposes of rapid visualization it took two minutes to construct. I literally dragged and dropped the ascii raster and the e00 files right into QGIS and grabbed a screenshot.
I look forward to QGIS's future; the pace of development is rapid, and its usability and stability have improved dramatically since I began fiddling with it.
I'll be talking about this at a GIScience 2008 workshop.
A challenge with working with categorical data like land cover is that its nominal scale
prevents us from using more sophisticated metrics and models. In collaboration with
Ola Ahlqvist, I've been
mucking with semantic indicators of local spatial association - that is, the degree to
which the semantic content of nearby locations is similar. For example, the classes 'water' and
'deciduous forest' may be quite distant from one another in a semantic space, while
'deciduous forest' and 'coniferous forest' may be very near one another in that space.
National Land Cover Data for a small area in central Massachusetts
In the map shown above, different forest classes are represented in green; these are
pretty similar semantically. Urban classes (pink) and water (blue) are quite different
from the forest classes.
Average indicator distance between each cell class and those of its four neighbors
In this figure, the average class distance between a cell and its four neighbors is mapped
for the Massachusetts land cover data. If cells have the same class, their distance is zero.
If neighboring cells have different classes, their distance is one. This map makes no
distinction between the semantic relationships between the classes.
Average semantic distance between each cell class and those of its four neighbors
In this figure, the average class distance between a cell and its four neighbors is mapped using the semantic distances identified between classes. A more nuanced map results, with much lower distances in many parts of the map. Distances are still high around the lake at right and the urban-forest edges, but are quite low between the many forested patches in the center.
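The core computation is simple; here is a minimal Python/NumPy sketch, with made-up class codes and semantic distances standing in for the actual NLCD classes and the distances we derived:

```python
import numpy as np

# Illustrative class codes and semantic distances (NOT the actual NLCD
# values): 0 = water, 1 = deciduous forest, 2 = coniferous forest.
# Distances are symmetric, lie in [0, 1], and are zero on the diagonal.
sem_dist = np.array([
    [0.0, 0.9, 0.9],   # water is semantically far from both forest classes
    [0.9, 0.0, 0.2],   # the two forest classes are close
    [0.9, 0.2, 0.0],
])

def mean_neighbor_distance(classes, dist):
    """Average class distance between each cell and its 4 neighbors.

    Edge cells average over only the neighbors that exist. Passing a 0/1
    indicator matrix as dist reproduces the first (non-semantic) map.
    """
    rows, cols = classes.shape
    out = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            d = [dist[classes[r, c], classes[nr, nc]]
                 for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                 if 0 <= nr < rows and 0 <= nc < cols]
            out[r, c] = sum(d) / len(d)
    return out

# A toy 3x3 land cover grid: forest patches on the left, water on the right.
grid = np.array([[1, 1, 0],
                 [1, 2, 0],
                 [2, 2, 0]])
result = mean_neighbor_distance(grid, sem_dist)
```

Forest-forest adjacencies contribute small distances (0.2) while forest-water edges contribute large ones (0.9), which is exactly why the semantic map is much smoother inside the forested center than along the lake edge.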
SRTM DEM data are available worldwide, but little wide-scale empirical assessment of this product's
quality exists. I've been extracting tens of thousands of small chunks of DEM from
across the United States, where higher-quality data are available, to conduct such an assessment.
I believe the work will have important implications for the use of SRTM data across the globe.
The following figure shows one output. This is a point map, generated in R, of the SRTM error in small patches of elevations spaced about 10 km apart on a regular grid. RMSE values (essentially the expected magnitude of error) range from 0 meters to over 200 meters across these thousands of patches; most fall between 3 and 10 meters. The spatial pattern of error is striking, revealing lower accuracy in higher elevation areas. Gaps in this map indicate regions where SRTM data could not be collected by the Space Shuttle.
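For reference, the RMSE of a patch is just the square root of the mean squared difference between the SRTM and reference elevations; a quick sketch (the array values below are made up for illustration):

```python
import numpy as np

def patch_rmse(srtm, reference):
    """Root mean square error of an SRTM patch against reference elevations."""
    diff = np.asarray(srtm, dtype=float) - np.asarray(reference, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

# Illustrative 2x2 patch with errors of +3, -4, 0, and +5 meters.
srtm = np.array([[103.0, 96.0], [200.0, 155.0]])
ref = np.array([[100.0, 100.0], [200.0, 150.0]])
rmse = patch_rmse(srtm, ref)  # sqrt((9 + 16 + 0 + 25) / 4) = sqrt(12.5)
```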
I'm working with Alan Arbogast on this project. We have what appear to be the first estimates
of total dune sand volume along the eastern shoreline of Lake Michigan. The quantity and
spatial distribution of this sand have implications for dune development, for identifying sources of sand, and for practical management (e.g. sand mining). The GIS processing was
pretty interesting, and more challenging than I'd thought.
I'm a free software advocate, so an ulterior motive on this has been to employ such software in the work. Much of the analysis is being done in R, a terrific stats package, and some visualization in Quantum GIS, which looks a lot like the old ArcView but has much greater functionality.
Finally, I'm honing my basic cartographic skills and learning Inkscape, another open source option for vector graphics that I just started messing around with. Below is an early result of this work showing the distribution of dune sand within the study region. Michigan is turned 'on its side' to tweak your worldview a bit.