Wednesday, December 9, 2015

Final Project

For our final project assignment we were given two basic choices: either apply what we've learned to a pre-fab project, or do the same to one we've created on our own.  I went with the latter option mainly because I was curious about how the 2013 Rim Fire has affected overall forest health in the area.  Given that the state of California is currently in a severe drought, I suspected that these effects would also be highly visible on Landsat imagery.

Unsupervised classification of my study area.  MMU was 1:100,000 so take this with a grain of salt.

To start, I'd like to mention that I'm not a botanist, so a lot of the plant terminology and spectral analysis discussion mentioned throughout the course was something I've only recently picked up.  Before this class I had no idea what an NDVI was, or even how to tell one tree from another when viewing various multi-spectral data (I still can't, unless I've been given pointers on what to look for first!).  So in the spirit of applying all that I'd learned in this course I focused on a project that required the classification of healthy vs. non-healthy vegetation.
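Since NDVI gets a mention above, here's a minimal sketch of how it can be computed from the red and near-infrared bands with numpy.  This wasn't part of the project deliverable - the band numbers assume Landsat 8 (band 4 = red, band 5 = NIR), and the arrays are assumed to already be loaded as reflectance values.

```python
import numpy as np

def ndvi(red, nir):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red).

    Healthy, fast-growing vegetation reflects strongly in the near-infrared
    and absorbs red light, so values near +1 suggest vigorous vegetation,
    values near 0 suggest bare soil or rock, and negative values suggest water.
    """
    red = red.astype(np.float32)
    nir = nir.astype(np.float32)
    denom = nir + red
    # Avoid division by zero where both bands are zero (e.g., fill pixels).
    return np.where(denom == 0, np.nan, (nir - red) / denom)

# Toy 2x2 arrays standing in for Landsat 8 band 4 (red) and band 5 (NIR):
red_band = np.array([[0.10, 0.30], [0.05, 0.20]])
nir_band = np.array([[0.50, 0.35], [0.60, 0.20]])
print(ndvi(red_band, nir_band))
```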

The first image shows a very rough classification of healthy vs. non-healthy vegetation.  The MMU was 1:100,000, a scale I initially chose to save time (mostly to not go crazy during the analysis portion).  I now feel that I made the right call, since the study area is rather large and I was trying to quantify general trends.  For something more in-depth, perhaps a 1:50,000 or larger scale should be used.

Comparison view of three years shown across two different band combinations.

The comparison of the multi-spectral imagery was perhaps the most exciting part of this project, and also the most enlightening.  For example, on the bottom left of the image above you'll see a map inset with a lot of bright green - that's healthy (fast-growing) vegetation.  The image in the middle was taken during the Rim Fire, and one can see how dark the remaining vegetation had become.  Part of this could be due to the change of Landsat satellites used - the 2010 image came from the waning days of Landsat 5 TM, while Landsat 8 has been generating imagery for the United States since 2013.  The change of satellites also brought a change in the number of bands, which is why I've listed two band combinations on the map.

The 2015 imagery had smoke cover... but the website said no cloud cover!  Which I suppose is technically true.  The smoke made interpretation of the image very difficult, although if nothing else the image shows how pollutants in the air can be visualized.


*Originally published December 9, 2015.  Updated on 2/27/2017 to repair image links.

Monday, December 7, 2015

Week 15 - Dasymetric Mapping

This lab marks our final lab of the semester.  In it we covered dasymetric mapping, a mapping style that has been around since the 1800s.  This technique uses ancillary data, such as land cover information, to map where a given population is more likely to reside based on various attributes of that ancillary data.  Dasymetric mapping is used when one wishes to redistribute population information from an enumeration unit to another set of zones, such as land cover cells for urban areas.  Since this type of mapping is an estimate, quite a bit of error can be introduced into the results - particularly since one is estimating values across two sets of areas that do not share the same boundaries.  Error checking is key to staying on track with dasymetric mapping.

For our lab we compared an areal weighting technique and a dasymetric mapping technique.  For the areal weighting we estimated the school-aged population within each high school zone, limited to the areas not covered by water.  The theory here is that the population could reside anywhere within the census tract as it intersected with the high school zone, as long as that area was outside of a water polygon. 

View of impervious areas (in red) as they relate to census tract and high school zones.

The view above shows the dasymetric mapping result of our lab.  Instead of using land cover information for our ancillary data we used a measure of imperviousness.  The impervious areas are shown above in red; gray areas indicate zero imperviousness.  FYI, imperviousness indicates a built-up (impermeable) area, and is a better measure for determining where people are likely to reside than land cover data alone.

The goal was to determine how many school aged children reside in the impervious areas per high school zone.  Each high school area (shown above bounded in dark gray) contains census tract data, but as the view shows the two are not spatially congruent.  This also holds true with the impervious areas, which are depicted above in raster format.

To make this all work I used the Zonal Statistics as Table tool to determine the amount of impervious area per census tract.  This operation was completed first with my census tract and impervious data, and then again using an intersect of the census tract and high school zone information.  New fields were added to each result to account for a 'before' impervious area and an 'after' impervious area.  This was important because to ultimately determine the population of school-aged children per high school zone I used the following calculation: school-aged children per high school = total school-aged children * ('after' impervious area / 'before' impervious area).
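To make the allocation step concrete, here's a small pandas sketch of that same 'before'/'after' ratio logic.  The table and field names below are hypothetical stand-ins for the Zonal Statistics and Intersect outputs, not the lab's actual schema.

```python
import pandas as pd

# Each row is a piece of a census tract that falls inside a high school zone;
# 'imperv_before' is the impervious area of the whole tract, 'imperv_after' is
# the impervious area of just that piece.  All names and values are illustrative.
pieces = pd.DataFrame({
    "tract_id":       [1, 1, 2],
    "hs_zone":        ["North", "South", "South"],
    "school_age_pop": [500, 500, 800],
    "imperv_before":  [120.0, 120.0, 200.0],
    "imperv_after":   [40.0, 80.0, 200.0],
})

# Allocate each tract's children in proportion to its share of impervious area,
# then total the allocated children by high school zone.
pieces["alloc_children"] = pieces["school_age_pop"] * (
    pieces["imperv_after"] / pieces["imperv_before"])
print(pieces.groupby("hs_zone")["alloc_children"].sum())
```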

The final result, after error checking against a reference population, showed that only 10% of the population was allocated incorrectly.  While not perfect, it is a slight improvement over the areal weighting technique, which resulted in 11% of the population having been allocated incorrectly. 

*Originally published on December 7, 2015.  Updated on 2/27/2017 to repair image links.

Friday, December 4, 2015

GIS Student Portfolio

For our final internship seminar assignment we created a portfolio of our GIS related work.  My portfolio contains project examples that were created throughout the course of the entire GIS Certificate program, and can be viewed here.  An audio discussion about my portfolio content can be heard here.

The portfolio also includes some examples from my GIS internship, which I'm happy about since many of these same map products were also quite time consuming to create!  The assignment helped show me how far I've actually come within this program.  It was kind of fun (and sometimes also cringe-worthy) to review old projects, and see how my overall map style has evolved. 

Overall I found this assignment to be very useful, and I will be incorporating this into my set of job search materials.  This includes adding a revised version of this portfolio to my LinkedIn page.  I realize I could have done that for this assignment - but then I wouldn't have had a handy paper copy to take with me to interviews!


Tuesday, December 1, 2015

Lab 14 - Spatial Data Aggregation

This week's lab focused on the modifiable areal unit problem (MAUP) and ways to identify it.  One very well known MAUP issue involves political districts and the practice known as gerrymandering.  Gerrymandering essentially represents a zonal effect of MAUP, in that the polygons representing voting districts are drawn in such a way as to favor one political party or another... the result rarely has much to do with population pressures or the like.  Usually a gerrymandered district will score poorly on measures of compactness, appearing elongated and irregular in shape.  It's a polygon that looks like it was drawn to cherry-pick the residents who will fall within that particular district.

Example of a congressional district with very low compactness - a possible sign of gerrymandering.

Another gerrymandering test involved viewing how well a congressional district represents its community.  Ideally a district would not break up a county - for example, a county would fall within a single congressional district rather than being split among several.
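The lab doesn't spell out a specific compactness formula, but one widely used measure is the Polsby-Popper ratio.  Here's a minimal sketch with toy numbers, just to show how elongated shapes get flagged.

```python
import math

def polsby_popper(area, perimeter):
    """Polsby-Popper compactness: 4 * pi * A / P^2.

    Ranges from 0 to 1; a circle scores 1, while long, contorted
    districts score close to 0 - a common red flag for gerrymandering.
    """
    return 4 * math.pi * area / perimeter ** 2

# Toy comparison: a 10x10 square vs. a 1x100 ribbon with the same area.
print(polsby_popper(100, 40))   # ~0.785 - reasonably compact
print(polsby_popper(100, 202))  # ~0.031 - elongated and suspect
```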

*Originally published on December 1, 2015.  Updated on 2/27/2017 to repair image links.

Friday, November 27, 2015

GIS Day

GIS Day was on Nov. 18; however, I was working that day (and there weren't any planned activities at my place of employment).  So instead I ended up celebrating GIS Day this past Monday (11/23) at work with the official unveiling of one of my internship deliverables.

As part of my internship I had updated a 'how-to' guide explaining GIS and GPS protocol for the Stanislaus NF (STF) Heritage Staff.  The guide takes one through the following steps: creating a working data storage file structure, collecting and downloading GPS data, appending the GPS data to a working copy of the STF Heritage geodatabase, digitizing surveyed areas, choosing the preferred attribute values for the Heritage data, and knowing how (as well as when) to submit the final product to the STF GIS Coordinator.  The guide will hopefully standardize the spatial data collection methods for the forest, and also help those who may not have very strong GIS skills complete basic data management tasks.

Sample page on how to find & use the Append tool.

To that end, I 'field-tested' my guide on two employees who have had little to no GIS experience.  Their current job duties have given them the GPS collection and download experience, but the GIS side of things was lacking because technically that's not their job.  Yet in order to move ahead in our profession basic GIS skills are required... but opportunities to learn them can be thin on the ground (and much appreciated whenever they come along).  My co-workers used the guide, and said that it was easy to follow and understand (which I was seriously wondering about, because the append part wasn't so easy to write... I have a new appreciation for our professors and TAs who put together our labs with all those screenshots!).  Hopefully future new employees at STF also find it easy to use and understand, and can navigate their way through data collection at STF with confidence.

*Originally published on November 27, 2015.  Updated on 2/27/2017 to repair image links.

Monday, November 23, 2015

Lab 13 - Effects of Scale

This week's lab marked the start of a three-part series dealing with issues of scale and resolution.  For the final portion of the lab we compared SRTM data and LIDAR data, both of which had been re-sampled to a 90 m cell size.

Comparison of SRTM data and LIDAR data, both at 90 m resolution.

The resampled DEMs and their derivatives (slope and aspect rasters) were compared visually, and the overall range of DEM elevation values and average slope were also discussed.  At first glance the LIDAR DEM appears to have slightly more detail than the SRTM DEM.  This difference becomes very pronounced in the derivative products, with the LIDAR based slope and aspect rasters each containing so much detail that they appear almost pixelated.

The SRTM data has less detail than the LIDAR dataset because it was collected from a satellite - it will never be able to capture the amount of detail LIDAR can, simply because the sensor is too far removed from the surface to do so.  LIDAR data is normally collected via airplane, making it much closer to the source that it is remotely sensing. 

*Originally published on November 23, 2015.  Updated on 2/27/2017 to repair image links.

Tuesday, November 17, 2015

Lab 12 - Geographically Weighted Regression

This week's lab wrapped up a 3-week exploration into the use of regression; the focus for this week was specifically using geographically weighted regression. 

Geographically weighted regression (GWR) differs from a global regression method like ordinary least squares (OLS) in that it takes the spatial arrangement of the observations into account, not just the variables themselves.  A separate, local regression is fit around each feature, and each neighboring observation is weighted according to how near or far it is from that feature - the nearer an observation, the higher its weight, because it is more likely to be related to the feature in question.
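As an illustration of that distance decay (ArcGIS lets you choose the kernel and bandwidth), here's a small sketch of a Gaussian weighting function; the 2,000 m bandwidth is just an assumed value.

```python
import numpy as np

def gaussian_weights(distances, bandwidth):
    """Gaussian distance-decay kernel used in many GWR implementations:
    w_i = exp(-0.5 * (d_i / bandwidth)^2).

    Observations close to the regression point get weights near 1, while
    observations well beyond the bandwidth contribute almost nothing.
    """
    d = np.asarray(distances, dtype=float)
    return np.exp(-0.5 * (d / bandwidth) ** 2)

# Distances (meters) from one regression point to four neighbors:
print(gaussian_weights([0, 500, 2000, 5000], bandwidth=2000))
```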

For the final part of our lab we had to compare an OLS model with a GWR model - all using the same variable inputs, of course.  Using the hit-and-run crime rate as my dependent variable, I compared four other neighborhood statistics (such as the percentage of renter occupied units) against it.

Unfortunately in my case I did not observe much of a change between the two regression models, although I have a fairly good idea of why that may have been - I had two variables that were probably too similar to each other, and so one should have been dropped (a variable for the percentage of renter occupied units and a separate variable for median home value).  Neither of these variables set off any collinearity alarms during the OLS stage (the VIF statistic provided with the ArcGIS OLS results would have shown me that), but something was clearly amiss.  When comparing my AIC, Adjusted R-squared, and z-score results between the GWR and the OLS models it was clear that any changes between the two were not very significant.  Considering my overall low Adjusted R-squared values for the two models (both were at 0.189) it's back to the drawing board in terms of choosing variables for my model.

Tuesday, November 10, 2015

Lab 10 - Supervised Image Classification

For this lab we used a supervised image classification method to create a thematic land use/land cover (LULC) map of Germantown, Maryland.

This method uses what are called 'training areas' to guide the computer in assigning a LULC value per pixel.  These training areas are selected prior to running the automated classification method, which does imply that the thematic map creator knows quite a bit about what to expect land cover-wise prior to beginning the process.

The LULC classes as created with supervised classification.

The map above was created with several training classes provided for most categories - the idea here is that more than one example is better for the program when it assigns classes to the various pixels.  Pixels were assigned using a maximum likelihood method, meaning each pixel is assigned to the class it has the highest probability of belonging to, based on the spectral values provided by the training class(es). 
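For anyone curious what 'maximum likelihood' means under the hood, here's a toy sketch that fits a normal distribution to each training class and assigns pixels to the most probable class.  The training values are made up, and this is only an illustration of the idea - not ERDAS Imagine's implementation.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical 3-band training pixels for two classes.
training = {
    "water": np.array([[0.05, 0.02, 0.01], [0.06, 0.03, 0.02], [0.04, 0.02, 0.02]]),
    "grass": np.array([[0.10, 0.30, 0.45], [0.12, 0.28, 0.50], [0.11, 0.32, 0.48]]),
}

# Each class gets a mean vector and covariance matrix estimated from its training
# pixels (a tiny ridge keeps the covariance invertible with so few samples).
models = {name: multivariate_normal(samples.mean(axis=0),
                                    np.cov(samples.T) + 1e-6 * np.eye(3))
          for name, samples in training.items()}

# Assign each pixel to the class with the highest likelihood.
pixels = np.array([[0.05, 0.02, 0.015], [0.11, 0.29, 0.47]])
scores = np.column_stack([models[name].logpdf(pixels) for name in models])
labels = [list(models)[i] for i in scores.argmax(axis=1)]
print(labels)  # expected: ['water', 'grass']
```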

As can be seen in the map above, there are quite a few acres devoted to roads... and that is not technically correct.  Quite a few of those road pixels seem to actually represent urban areas, or possibly even grasses.  Some tweaks are needed for the roads training classes (I had used two).  Unfortunately the spectral signature for roads is very similar to that of the urban areas, so there will always be some error on the map no matter how much those training classes get altered.

A spectral Euclidean distance map is also shown above as an inset.  As I understand it, this map represents the amount of error on my thematic map, displayed as bright pixels.  Since my inset map is quite bright, that means there happens to be quite a bit of error on my map... and most of those errors seem to follow along my roads class.  It seems that this process requires a lot of trial and error before a final product can be presented.

*Originally published on November 10, 2015.  Updated on 2/27/2017 to repair image links.

Monday, November 9, 2015

Lab 11 - Multivariate Regression, Diagnostics, and Regression in ArcGIS

This week we expanded our regression analysis from bivariate (or comparing two variables) to multivariate (or comparing more than two variables).  This type of analysis can be accomplished in ArcGIS by using the Ordinary Least Squares (OLS) script tool.

As suggested by ESRI staff, using the OLS tool is a must - even if your target is to run a geographically weighted regression analysis.  By using the OLS tool you can determine if your model is, in fact, the best fit to explain your data.  How one does this is by determining if the OLS results pass the "6 OLS checks", which are:

1.  Are the independent variables helping your model (are they statistically significant)?
2.  Are the relationships as expected (variables are either negatively or positively correlated)?
3.  Are any of the explanatory variables redundant?
4.  Is the model biased?
5.  Do you have all key explanatory variables?
6.  How well are you explaining your dependent variable?

Each of the above can be answered with the slew of stats generated by the OLS report.  For example, to check for model bias you review the Jarque-Bera test results.  This test assesses whether your residuals are normally distributed or not - if this test comes back as statistically significant then you have a problem with skewed (or biased) data.
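The same normality check can be run outside of ArcGIS.  Here's a small sketch using scipy's Jarque-Bera test on the residuals of a toy regression - the data is randomly generated, not the lab's.

```python
import numpy as np
from scipy import stats

# Toy dependent and explanatory variables.
rng = np.random.default_rng(42)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(size=200)

slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)

jb_stat, p_value = stats.jarque_bera(residuals)
# A small p-value (< 0.05) means the residuals are NOT normally distributed,
# which points to a biased or mis-specified model.
print(f"Jarque-Bera: {jb_stat:.3f}, p-value: {p_value:.3f}")
```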

To determine if you have all the key explanatory variables it is necessary to run the Spatial Autocorrelation (Global Moran's I) tool on the regression residuals; the extremely helpful printout generated at the end not only shows your residual distribution, but also lets you know if any clustering or dispersion is statistically significant.  If you have problems here then your model is likely missing a key explanatory variable.

To compare models one simply needs to know the Akaike's Information Criterion (AIC) score and the Adjusted R-squared value... both helpfully provided within the OLS report.  And if you're unsure what an OLS-generated statistic means, there are plenty of ArcGIS Help files to help you out.  It's actually quite impressive what the ESRI folks have done to make regression analysis easier for the general user. 

Thursday, November 5, 2015

Lab 10 - Introductory Statistics, Correlation, and Bivariate Regression

This week we started our penultimate theme - spatial statistics.  This lab was essentially a review of basic statistics, and how these can be applied to spatial data analysis. 

Scatterplot showing a regression line created from known weather station readings.

The graphic above depicts data from two different weather stations.  This data was used to create a regression line, or the best-fit line between the two sets of known values.  This best-fit line is then used to predict values, such as predicted rainfall totals. 

For the purposes of our lab we used the regression line to obtain possible values for Station A, which was missing data for an 18 year period.  By determining the slope and intercept values (based on the known input from our two weather stations) we were able to predict what the rainfall totals for Station A could have been based on the data from Station B for the same year.  The formula used was: Y' = bX + a.  Or written another way: the predicted value for Station A = (slope * Station B input) + intercept.
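Here's a minimal sketch of that prediction step using scipy; the rainfall values are hypothetical stand-ins for the spreadsheet data.

```python
from scipy import stats

# Paired annual rainfall totals (inches) for years where BOTH stations reported.
station_b = [20.1, 25.3, 18.7, 30.2, 22.5, 27.8]
station_a = [18.9, 24.1, 17.2, 28.5, 21.0, 26.3]

result = stats.linregress(station_b, station_a)  # slope (b) and intercept (a)

# Predict Station A for a year where only Station B reported, say 24.0 inches:
predicted_a = result.slope * 24.0 + result.intercept
print(f"Y' = {result.slope:.3f} * X + {result.intercept:.3f} -> {predicted_a:.1f} in")
```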

This type of analysis is very useful if you wish to compare the differences between two (or more) variables, or to make value predictions based on known information.  However there are some caveats: first, the relationship must be linear and the data roughly normally distributed - wacky outliers can skew these results.  Also, not all data types can be used to run a bivariate regression analysis - for example, if the data to be compared consists of percentages or categorical values (such as names) the data must either be transformed or an alternative analysis method must be used.  Lastly, just because two variables can be compared doesn't mean they should be - there may not be a statistical or logical relationship between the two variables.  Essentially, one needs to know their datasets - and run additional tests (such as a t-test) to determine statistical validity.

*Originally published on November 5, 2015.  Updated on 2/27/2017 to repair image links.

Wednesday, November 4, 2015

Lab 9 - Unsupervised Classification

This week's lab focused on using an automated method to classify aerial imagery: the unsupervised classification.  This method isn't exactly hands free - it's just called unsupervised because the program that completes the process does so without a training data set.  If it had training data to guide it, then the process would be considered supervised.

With unsupervised classification the computer program iterates through the image using whatever algorithms and input parameters were assigned at the start of the process.  When the program creates the classes it does so by grouping similar brightness values together.  Once the process is complete it is necessary to review the results and then manually classify (or re-classify) as needed.

Map depicts an image that had been re-classed into 5 land use/land cover categories using unsupervised classification.

The above image represents an unsupervised classification that was run using the ERDAS Imagine program.  An ISODATA classification was used; specified input parameters included the choice of 50 classes to be created, setting the maximum number of iterations to 25, and setting the convergence threshold to 0.950.  All other options were left at their defaults. 
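ISODATA itself is an ERDAS Imagine routine, but the core idea - grouping pixels by spectral similarity with no training data - can be sketched with scikit-learn's k-means as a rough stand-in.  The 'image' below is random data, and plain k-means lacks ISODATA's class splitting and merging.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy "image": 100x100 pixels with 4 spectral bands, reshaped to (n_pixels, n_bands).
rng = np.random.default_rng(0)
pixels = rng.random((100 * 100, 4))

# Group pixels into 50 spectral classes purely by brightness-value similarity -
# no training data involved, which is what makes the classification unsupervised.
kmeans = KMeans(n_clusters=50, max_iter=25, n_init=5, random_state=0)
class_per_pixel = kmeans.fit_predict(pixels).reshape(100, 100)
print(class_per_pixel.shape, class_per_pixel.max() + 1)
```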

After the image was re-classed with 50 (!) classes, I then manually pared this down to 5 based on very general land use/land cover types (grass, trees, urban areas, mixed, and shadows).  The shadow category was something of a surprise, but given the time of day the image was taken there were quite a few shadows!  The mixed category represents those pixels that actually could be assigned to more than one land use/land cover category.

*Originally published on November 4, 2015.  Updated on 2/27/2017 to reset image links.

Tuesday, October 27, 2015

Lab 8 - Thermal & Multispectral Analysis

This week's lab focused on interpreting thermal imagery using ERDAS Imagine and ArcGIS.  To this end we each had to select a unique feature on a multispectral composite image of the Pensacola, Florida area and analyze how it appears in various wavelengths.

Comparison of how a sandbar is viewed using various views of multispectral imagery.
The wavy appearance of the sandbars along the northern shorelines caught my eye, so I decided to focus on how these appear within various wavelengths.  While the sandbars are visible in just about all of the combined multispectral imagery, when viewed within separate bands they were almost impossible to see.  The contrast, or brightness values, had to be altered in most cases.  The only individual bands in which the sandbar was semi-visible were Band 4 (a near infrared band) and Band 6 (a thermal infrared band).

Monday, October 26, 2015

Lab 9 - Accuracy of DEMs

This week we analyzed the accuracy of DEMs (Digital Elevation Models).  This particular lab has built upon concepts that were covered in Labs 1 - 3, and heralded the return of RMSEs (root mean square error), percentile calculations, and Excel spreadsheets.

Determining the accuracy of elevation data is remarkably similar to determining the accuracy between x, y coordinates.  Essentially, one needs a series of sample points from the original data set (preferably at least 20 per land cover class type) and a set of reference data.  The reference data should be of a higher quality than the source data.  In the case of elevation data this usually would be the elevation data collected from a sub-meter GPS during the initial data capture (such as with LIDAR).  The differences between the source data and the reference data sets are then calculated.  Statistics such as RMSE, 68th percentile, and 95th percentile are then calculated.
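Here's a minimal numpy sketch of those summary statistics, using made-up checkpoint elevations rather than the lab's North Carolina data.

```python
import numpy as np

def vertical_accuracy_stats(dem_z, reference_z):
    """RMSE plus the 68th and 95th percentiles of the absolute elevation errors.

    dem_z and reference_z are paired elevation samples at the same checkpoints.
    """
    errors = np.asarray(dem_z, dtype=float) - np.asarray(reference_z, dtype=float)
    rmse = np.sqrt(np.mean(errors ** 2))
    p68 = np.percentile(np.abs(errors), 68)
    p95 = np.percentile(np.abs(errors), 95)
    return rmse, p68, p95

# Toy checkpoint elevations in meters:
dem = [10.2, 15.1, 20.4, 12.0, 18.3]
ref = [10.0, 15.3, 20.0, 12.2, 18.0]
print(vertical_accuracy_stats(dem, ref))
```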

To illustrate this, the first portion of our lab compared reference points and source points for LIDAR data taken within North Carolina.  The data was sub-divided into 5 general land cover types; each land cover type had test points well in excess of the recommended 20 sample points.  After calculating the average RMSE, 68th percentile, and 95th percentile statistics for each land cover type tested, the data was then viewed on a scatter plot graph to see at a glance where obvious outliers in the data are, as well as areas of potential bias.

What I found was that the differences between the reference and source elevations were relatively small (between -0.2 and 0.4 m).  There was one obvious outlier, which represented a possible error during data collection.  There was also a slight bias in the DEM, with the DEM underestimating elevation values.  I've included a graph showing this data; the potential bias is visible from -0.3 to -0.7 m on the graph.

Graph of differences between the source data and reference data; difference values are in meters.



Wednesday, October 21, 2015

Lab 7 - Multispectral Analysis

This week's lab focused on identification of features using various bands of satellite imagery.  The maps below show the results of a seek-and-find type exercise, using spikes in pixel values between image bands as our guide to find the required features.

Map 1.  The darkness of the pixel values representing open water were offset by using false natural color.

Map 2.  The brightness of the snow pack is offset by the surrounding landscape, shown in false color infra-red.

Map 3.  The variations within the shallow water are visible by setting the color bands to a TM Bathymetry setting.

Monday, October 19, 2015

Lab 8 - Surface Interpolation

This week we covered various interpolation methods, as well as the best uses for each type of interpolator. 

Part of the lab involved comparing two DEMs that we created from elevation point data.  The two interpolation methods used were IDW (Inverse Distance Weighted) and Spline (regularized).  To compare the differences between the two methods the Raster Calculator tool was used to subtract the elevation values between the two DEM grids.  A map of the final result is shown below.

Map showing the differences between two DEM grids.
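For reference, the interpolate-and-subtract workflow can be sketched in arcpy roughly as follows; the geodatabase path, point layer, and elevation field names are hypothetical.

```python
import arcpy
from arcpy.sa import Idw, Spline

arcpy.CheckOutExtension("Spatial")
arcpy.env.workspace = r"C:\data\interpolation.gdb"

# Interpolate the same elevation points two ways: IDW (power 2) and a
# regularized spline, both at a 30 m cell size.
idw_dem = Idw("elev_points", "SPOT", 30, 2)
spline_dem = Spline("elev_points", "SPOT", 30, "REGULARIZED")

# Map algebra: cell-by-cell elevation difference between the two surfaces.
difference = idw_dem - spline_dem
difference.save("idw_minus_spline")
```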

Thursday, October 15, 2015

Lab 6 - Spatial Enhancements

View of the spatially enhanced image.
This week we utilized our newly acquired ERDAS Imagine skills to apply visual enhancements to raster imagery.  The image above originally had these thick black diagonal lines crossing through it - with the spatial enhancement measures applied these lines are not quite as noticeable.

The enhancements were applied using both the ERDAS Imagine software and ArcGIS.  The processing began with ERDAS Imagine by applying a Fourier transformation with a low pass filter.  Then the image was further processed by applying a 3x3 sharpening effect (the 3x3 refers to the number of cells used as the kernel for the enhancements).

The image was then opened in ArcGIS, since this program has a Focal Statistics tool (and ERDAS Imagine doesn't quite have these statistics in its arsenal of enhancements).  After setting the Focal Statistics tool to run a rectangular 3x3 mean pass the image was considered to be ready for the final map layout.
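The 3x3 mean pass described above looks roughly like this in arcpy; the workspace and raster names are hypothetical.

```python
import arcpy
from arcpy.sa import FocalStatistics, NbrRectangle

arcpy.CheckOutExtension("Spatial")
arcpy.env.workspace = r"C:\data\enhancement.gdb"

# 3x3 rectangular kernel (in cells) with the MEAN statistic - a low-pass
# smoothing filter that softens the striping left after the Fourier step.
smoothed = FocalStatistics("fourier_output", NbrRectangle(3, 3, "CELL"), "MEAN")
smoothed.save("fourier_smoothed")
```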

While the diagonal lines haven't disappeared completely at least it is now possible to figure out what the features in the image are.  Apparently it is possible to remove those diagonal lines, but that requires some additional (advanced) techniques... trying to get the Fourier transformation correct for this exercise was difficult as it was.

Monday, October 12, 2015

Lab 7 - TINs and DEMs

This week's lab explored the differences between a TIN surface and a DEM. 

TIN stands for 'Triangulated Irregular Network' and DEM stands for 'Digital Elevation Model'.  Both surfaces represent landforms based on elevation data, and both can be used to show slope, aspect, and contours.  A major difference between the two is the file type - a TIN surface is vector based whereas a DEM is raster based. 

Detail view of a TIN surface, showing contours, nodes, and edges.

A bonus to a TIN surface is that aspect, contours, and slope can all be depicted and stored within the same file.  A DEM raster cannot do this - different files would need to be created to show slope, aspect, and contours... and this could potentially involve multiple steps (for example, a slope raster may need to be converted to polygons if one wanted to show it as vector data).

Another bonus to a TIN surface is that breaklines (essentially, a linear feature) can be used to modify the original topography.  This is not possible with a DEM, thus making the TIN surface ideal for engineering applications... or detailed archaeological site maps.

Why aren't we using more TIN surfaces  if they're so great?  Well, they require a lot of computer memory, and therefore shouldn't be larger than 1,000,000 nodes.  That may sound like a lot, but these things tend to be quite detailed and so can reach that threshold easily.  TINs also are not the best surfaces to use when one wants to model continuous data over large areas - that is where DEMs shine.

Monday, October 5, 2015

Lab 6 - Location-Allocation Modeling

This week's lab saw us using Network Analyst to perform a basic location-allocation analysis.  The goal was to show where change may be needed in a fictional company's distribution center service areas.  Since the distribution centers were built during the company's national growth, the areas being served may not actually be the best fit in terms of time and resources.  The service areas, or market areas, were reassessed using the location-allocation function.

Comparison of the market areas before and after completing the location-allocation analysis.

Technical Notes

The location-allocation analysis was run with the distribution centers as a required facility type.  This meant that the solver had to choose all 22 of the distribution centers when computing the final service areas.

The analysis also utilized advanced settings and set the analysis type to minimize impedance.  This means it solved what ESRI calls the 'warehouse location' problem - it minimized the total impedance across the analysis.  Our impedance was based on a roads network, which allowed U-turns and had routes originate at the facility (as opposed to the demand points - a.k.a. customers). 

If you were wondering, an impedance is a cost built into the analysis - meaning something the algorithm has to factor in.  Since the location-allocation analysis is built into the Network Analyst extension, this means that the algorithm essentially calculates the best routes from a facility to a customer.  No impedance cut-off was set (which usually refers to a set mileage from a facility or a drive time), so theoretically a customer located hundreds of miles away from a distribution center could be included within the market area for that center... provided that no other facility was located closer.

Tuesday, September 29, 2015

Lab 5a - Intro to ERDAS Imagine and Digital Data 1

This week's lab was mostly a tutorial on how to navigate within ERDAS Imagine.

Detail view of land cover data set within Washington State.
The functionality of ERDAS Imagine is great - it makes sense on the user's end, and one doesn't really need to drill down into various properties just to change one little thing.  Being able to mess with each color band was neat, and helped to make certain aspects of remotely sensed imagery processing a little less abstract.  However, one drawback to the program does seem to be its map making capabilities (there are known bug issues).  The above map was finished using ArcGIS.

The view above is an inset of a much larger classed raster image.  It took some tries to get the correct view extent to transfer over to ArcGIS... even though this step was spelled out in the directions I suppose I still needed to do my own trial and error to learn this part! 

Lab 5 - Vehicle Routing Problem

Screen capture of a solved vehicle routing problem.
This week's lab focused on the vehicle routing problem, and how the ArcGIS Network Analyst extension solves this problem.  The screenshot above shows the results of the analysis, which is a series of 22 routes with various delivery stops.

Solving any vehicle routing problem (VRP) can become a bit complex, as the behind-the-scenes work just to build up the truck and employee cost information, customer information, and depot information can be very extensive.  This does not even begin to include the use of a suitable road network on which to model the routes, or the definitions of where each truck/route can go... happily the great majority of this work was completed for us prior to beginning this lab.

The solved routes in the screenshot above utilize all 22 trucks in a distribution company's fleet.  Each truck was assigned to a certain "route zone", meaning that it ideally does not leave its zone to make deliveries in other areas.  However a few trucks were allowed to stray outside of their routine service zones in order to be part of a more profitable solution for this particular company.  Why have a service zone?  In order to provide continuity for the customer by having the same delivery driver, instead of random unknowns doing the drop-offs. 

The one big issue with making service zones overly strict is this: the VRP solver can miss the obvious 'common-sense' solution in favor of an optimal solution that adheres to strict parameters.  Prior to the screenshot above our initial VRP route had been solved with such strictly defined service zones.  As operating costs are also an important factor, the VRP solver had provided only 14 routes (meaning 14 trucks) to shoulder the burden of delivering 128 orders across southern Florida.  This meant a lot of overtime, and in the end there were 6 orders left unfulfilled and several others that would have been delivered outside of regular business hours.

By tweaking a few items (mainly allowing a few trucks to make deliveries in adjacent delivery zones) the VRP solver was able to assign a truck route to deliver all of the orders, with only 1 order (out of 128) being made after normal business hours.  The overall revenue generated with the modified route also went up... along with the cost to operate, but the rise in operating costs was comparable to the rise in revenue.  Assuming that a satisfied customer is one that does repeat business, then choosing the optimal route solely on the lowest operating cost simply does not make sense - one needs to also make sure that the customer's needs are being met.  This was accomplished with the modified VRP route.

Tuesday, September 22, 2015

Module 4 - Ground Truthing & Accuracy Assessment

This week was all about 'ground truthing' classification maps and evaluating the overall accuracy of the classes assigned.  Using the Land Use / Land Cover map created for Module 3 we 'ground truthed' our own classifications.  Overall my map from last week was 67% accurate... ouch!

The above map had been 'ground truthed' for accuracy - overall accuracy was found to be 67%.
Since physically visiting Pascagoula, Mississippi was not an option for me (or most of my classmates), we used Google Maps as our higher grade dataset and visually compared pre-selected sample locations against it. 

The sample locations were derived using a stratified random scheme, with at least 2 points per class type.  Most classes had 3 points, and two classes (which did not have very large representation on the map) had only 1 point.  The largest class contained 4 sample points.  Sample locations were plotted using a program called the Sampling Design Tool.  The program is available as a download through ESRI and was created by NOAA's Biogeography Branch.
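The Sampling Design Tool handles this automatically, but the allocation logic behind a stratified random scheme can be sketched like so - the class names, areas, and sample budget below are all made up.

```python
def allocate_samples(class_areas, total_points=30, minimum=2):
    """Split a sample budget across LULC classes in proportion to their mapped
    area, guaranteeing each class at least `minimum` points.

    class_areas: dict of class name -> mapped area (any consistent unit).
    """
    total_area = sum(class_areas.values())
    counts = {}
    for name, area in class_areas.items():
        proportional = round(total_points * area / total_area)
        counts[name] = max(minimum, proportional)
    return counts

# Hypothetical class areas (hectares) from a Level II LULC map:
print(allocate_samples({"Residential": 120, "Forest": 300, "Water": 40, "Cropland": 15}))
```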

After checking Google Earth against the sample point locations on my original LULC map, I have some basic observations:

  1. My original cropland designation apparently was just grass mowed with a distinctive pattern... I'm still not buying it, and believe instead that the land use has changed over the years.
  2. My beach was apparently someone's home - they live right on the water and seem to have little to no landscaping.  That was not readily apparent at my MMU of 1:4,000.
  3. I should have set 3 sample points per class, and then manually messed with their location.  Many of the sample points were bunched up for some reason.  Also, classifications that did not cover as much map space should have had at least 1 sample point taken away.  But the largest classes didn't really need to absorb those 'excess' sample points.

Monday, September 21, 2015

Lab 4 - Building Networks

Results from running a route on a network with historic traffic trends added.
This week's lab had us building a road network with the ArcGIS Network Analyst extension.  The road network was built twice - once with historic traffic pattern data added, and once without. 

The additional functionality was added during the network building process by electing to model traffic patterns.  Associated with the roads dataset is a set of two tables that contain historic traffic pattern data, with such information as what percentage of normal (ideal) driving speeds a particular road segment has at certain times of day, for every day of the week.  Once this data was linked to the network the overall results became more realistic. 

Not shown above are the various results obtained from making minor changes to the route start and stop times, and to the day of the week the route was created for.  The resulting route travel times changed by only a few minutes... and surprisingly so did the overall distance traveled.  These route changes were so small that I could not detect them on my network, only on the route properties screen (such as in the example shown above).

Thursday, September 17, 2015

Lab 3 - Land Use / Land Cover Classification Mapping

This week we got to try our hand at classifying an aerial photograph.  Using a Land Use Land Cover classification system, which was developed in 1976 by Anderson et al. for the USGS, we classed a single aerial photo to the second level.  My minimum mapping unit (MMU) was at 1:4,000. 

Aerial map showing Level II Land Use and Land Cover Classification.
What does all this mean, exactly?  To start, land use is a bit different from land cover.  Land use shows human-based uses of the landscape (urban areas, agriculture), and land cover shows primarily natural settings (forests, water).  To classify a map one first needs to differentiate between farmland, urban areas, forested areas, etc.  That is a primary classification level.  To map something at the second classification level is to specify if, for example, the areas within an urban area are for residential use versus an industrial use.

The map above shows various classifications within a small section of Pascagoula, Mississippi.  For consistency I had digitized my classification polygons at the 1:4,000 level only.  While I may have zoomed in or out to double check on a classification type or my overall location on the aerial photo, when I digitized an area it was always at 1:4,000 (also known as my MMU).

Completing this map was a bit rough at times - occasionally I felt like I was adding too much detail, and other times like I wasn't adding enough.  Since a Level II classification is meant to be a bit coarse, perhaps my biggest lesson was in learning to let go of the details!  For example, a high-tension wire crosses through the lower left of the aerial photo.  This was not mapped in mainly because doing so would have been very difficult given the level of detail it would require... if my MMU were a bit larger then perhaps it would have been possible, however I would also probably still be working on this map!

Reference:
Anderson, J. R., Hardy, E. E., Roach, J. T., and Witmer, R. E.  (1976)  A Land Use and Land Cover Classification System for Use with Remote Sensor Data.  Geological Survey Professional Paper 964.  United States Government Printing Office, Washington, D.C.

Monday, September 14, 2015

Lab 3 - Determining the Quality of Road Networks

One way to measure the quality of a road network is to evaluate its overall completeness.  The idea behind this is that the more roads mapped within a given network, the greater the likelihood that the network has better coverage of a given area.  This was the focus of our lab this week.

Do note, however, that just because a network has more coverage does not necessarily make it more spatially accurate... those lines still need to be in the right place!  Our lab focused only on comparing the completeness of one road network against another for the same area - testing the spatial accuracy of a road network using points was covered last week (Lab 2).

Technical Notes

The first comparison metric is exactly what one might think: we totaled the collective line segment lengths per road network and compared the results.  At first glance the TIGER Roads data is more complete than the Jackson Co. street centerlines data.

After determining these lengths we then needed to break down just how complete each road network was per grid cell.  We overlaid a grid (a series of square polygons) covering the whole of Jackson County, then split up the road network polylines by grid cell.  This was done using the Intersect (analysis) tool. 

Once the road segments were separated their respective lengths per grid cell were then updated using the Calculate Geometry tool.  The grid cell data was then joined to the road segments - this made it easier to obtain the overall road length totals per grid cell (per road network). 
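The lab used ArcGIS (Intersect, Calculate Geometry, and a join), but the same length-per-cell summary can be sketched with geopandas; the file and field names are hypothetical, and both layers are assumed to share a projected coordinate system so lengths come out in linear units.

```python
import geopandas as gpd

roads = gpd.read_file("tiger_roads.shp")
grid = gpd.read_file("county_grid.shp")   # square polygons with a "cell_id" field

# Split road segments by grid cell, then total segment length per cell.
split = gpd.overlay(roads, grid[["cell_id", "geometry"]], how="intersection")
split["length_m"] = split.geometry.length
length_per_cell = split.groupby("cell_id")["length_m"].sum()
print(length_per_cell.head())
```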

The TIGER Roads were also shown to be more complete in terms of overall length per grid cell... but the Jackson County street centerline data is more complete in more grid cells than the TIGER Roads data.  The results are depicted in the map below.  A choropleth map using (Jenks) Natural Breaks was created, and the results are explained in terms of percentage variance for the Jackson County street centerline network from the TIGER roads data.



The final result, explained in terms of how it relates to the Jackson Co. street centerline data.

Tuesday, September 8, 2015

Lab 2 - Visual Interpretation


This week’s lab focused on the elements of aerial photo interpretation.  To accomplish this we viewed two separate photos, then identified various elements within each. 

Map 1 - Comparing tone and texture.

The first map is a study in texture and tones.  We were to identify various tones (ranging from very dark to very light) and various textures (ranging from very fine to very coarse).  I found the tones to be a bit of a challenge, as to me the gradation of very dark to dark was a bit subjective.  Never fear, I was able to tell the basic difference between light and dark!
 
Map 2 - Picking out features based on specific criteria.

The second map required that we identify features within the photo that correspond to specific criteria: association, shadows, shape and size, and pattern.  It was interesting to see just how many findings based on association I was able to make… I didn't map all of them, as the assignment only called for two examples, but there are quite a few, as can be seen above.

Monday, September 7, 2015

Lab 2 - Determining Quality of Road Networks

This week's lab continued in the theme of spatial accuracy and data quality.  Using the National Standard for Spatial Data Accuracy (NSSDA) statistics we compared the accuracy of two street datasets in Albuquerque, New Mexico.

Street test point locations in Albuquerque, New Mexico.

The first data set consisted of streets data provided by the City of Albuquerque - this was our 'truth' layer.  The second data set was a portion of the USA Streets layer provided by ESRI.  Both data sets were compared at 55 different test point locations.  Test point locations were selected using an ArcGIS Desktop Add-In called the 'Sampling Design Tool'.  These sample locations were then moved to the nearest four- or three-way intersection on the City of Albuquerque data.  Sample locations were discarded if the closest City of Albuquerque streets intersection did not also have a corresponding ESRI USA Streets intersection (the ESRI USA Streets layer was not as complete as that provided by the City of Albuquerque).

The reference layer was then hand digitized using aerial orthophotos of the study area (outlined in blue above). All reference points were taken in what can be considered the 'centerline' of the street, as viewed in the orthophoto image.  The two data sets were then compared to this reference location using an Excel table to compute the Euclidean distance difference between the reference x and y locations and the streets layer under comparison.  After following the NSSDA worksheet (the details of which can be viewed here: https://www.fgdc.gov/standards/projects/FGDC-standards-projects/accuracy/part3/chapter3) the results were as follows:


USA Streets Positional Accuracy:  Using the National Standard for Spatial Data Accuracy, the data set tested 264.7 feet horizontal accuracy at 95% confidence level.

Albuquerque Streets Positional Accuracy: Using the National Standard for Spatial Data Accuracy, the data set tested 51.5 feet horizontal accuracy at 95% confidence level.
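For anyone curious how those numbers fall out of the worksheet, here's a small numpy sketch of the NSSDA horizontal accuracy calculation with made-up coordinates.  The 1.7308 factor comes from the FGDC worksheet's assumption that the x and y errors are roughly equal and normally distributed.

```python
import numpy as np

def nssda_horizontal_accuracy(test_xy, reference_xy):
    """NSSDA horizontal accuracy at the 95% confidence level.

    RMSE_r is computed from the x and y offsets of each test point against its
    reference point; the reported accuracy is 1.7308 * RMSE_r.
    """
    offsets = np.asarray(test_xy, dtype=float) - np.asarray(reference_xy, dtype=float)
    rmse_r = np.sqrt(np.mean(np.sum(offsets ** 2, axis=1)))
    return 1.7308 * rmse_r

# Toy coordinates (feet) for three test points and their digitized reference points:
test_pts = [(100.0, 200.0), (305.0, 410.0), (512.0, 618.0)]
ref_pts  = [(102.0, 198.0), (300.0, 412.0), (515.0, 620.0)]
print(f"{nssda_horizontal_accuracy(test_pts, ref_pts):.1f} ft at 95% confidence")
```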


Sunday, August 30, 2015

Lab 1 - Calculating Metrics for Spatial Data Quality

The first lab covered how to calculate error in x, y data.  We specifically examined how 50 GPS points taken at the same location related to each other as well as to the 'true' point location, otherwise known as the reference point.  Our goal was to quantify the accuracy and precision of our sample of 50 GPS points.

Accuracy represents the absence of error, or how close a point is to a given reference or 'true' point location.  Precision represents the variance around a given point location, or how tightly repeated measurements cluster together.

Map layout of the GPS points in relation to the average point location.
The above image shows the 50 sample GPS points in relation to their average point location.  The rings around the average point location represent the percentiles of the points' distances from that average location - this is a measure of the overall precision of the GPS points.  So, roughly 50% of the points fall within 2.9 meters of the average point location, 68% fall within 4.4 meters, and 95% fall within 14.8 meters. 

As shown above, the sample GPS points were not very accurate, nor were they very precise.  The points are scattered all around the average point location, and while 68% of the points are within 4.4 meters of the average, that level of precision can be considered too coarse if one requires sub-meter accuracy for their project.
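The percentile rings and the accuracy figure can be reproduced with a few lines of numpy; the coordinates below are made up (the real lab used 50 waypoints and a surveyed reference point).

```python
import numpy as np

# Hypothetical GPS fixes (meters, projected coordinates) plus a reference point.
points = np.array([[10.2, 5.1], [11.0, 4.8], [9.5, 6.2], [10.8, 5.9], [12.1, 3.7]])
reference = np.array([10.0, 5.0])

average = points.mean(axis=0)
dist_from_avg = np.linalg.norm(points - average, axis=1)

# Precision: how tightly the fixes cluster around their own average.
p50, p68, p95 = np.percentile(dist_from_avg, [50, 68, 95])
# Accuracy: how far that average sits from the 'true' reference location.
accuracy = np.linalg.norm(average - reference)

print(f"50%: {p50:.2f} m, 68%: {p68:.2f} m, 95%: {p95:.2f} m, accuracy: {accuracy:.2f} m")
```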

Friday, August 7, 2015

Module 11 - Sharing Tools

The final lab of the semester was short and sweet - we created parameter help messages associated with a script tool, then embedded the main Python script within the tool in order to make it easier to distribute.  To keep our scripts 'safe' we password protected the embedded script - which means that only those who know the password can see or modify the Python script.

The script tool that we modified does the following: creates a series of random points within an extent as defined by the input feature class, then creates a buffer around the randomly created points.  The random points are offset from each other by a certain distance, as defined by the tool end user.  A screenshot of the tool dialog box and output results is shown below.

Tool dialog box with custom parameter help messages on the left, and the tool results on the right.
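A rough sketch of what the embedded script does is below.  The parameter order follows the tool description above, the tools used (Create Random Points and Buffer) are the standard arcpy ones, and the output names are hypothetical.

```python
import os
import arcpy

in_features   = arcpy.GetParameterAsText(0)  # feature class defining the extent
out_workspace = arcpy.GetParameterAsText(1)
num_points    = int(arcpy.GetParameterAsText(2))
min_distance  = arcpy.GetParameterAsText(3)  # e.g. "500 Meters" - offset between points
buffer_dist   = arcpy.GetParameterAsText(4)  # e.g. "1000 Meters"

# Random points constrained to the input feature class, kept at least
# min_distance apart from one another ("" skips the constraining extent).
points = arcpy.management.CreateRandomPoints(
    out_workspace, "random_points", in_features, "", num_points, min_distance)

# Buffer the freshly created points.
arcpy.analysis.Buffer(points, os.path.join(out_workspace, "random_point_buffers"),
                      buffer_dist)
```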

Parting Thoughts...

This course has proved to be very useful... I'm still in shock over being able to create a custom script tool that actually works!  Learning the basics of how to code in Python has been very cool and I definitely think the lessons 'took'.  Even if I never get a chance to write another custom tool in my life, just being able to understand what all those code examples mean on the ArcGIS Help pages will come in handy.  In a strange way I think I have a better understanding of how most tools run now that I can get the gist of their associated code.

Sunday, July 26, 2015

Discussion Post 2: Using the Geographical Collocates Tool – a custom tool for text based spatial analysis



The article I selected for this discussion topic is entitled Automatically Analyzing Large Texts in a GIS Environment: The Registrar General's Reports and Cholera in the 19th Century.  The researchers use techniques developed within the fields of Natural Language Processing (a sub-field of computer science that focuses on automating human language analysis via computer) and Corpus Linguistics (a sub-field of linguistics that examines large bodies of text for linguistic relationships) to help shape the way their custom tool, called the Geographical Collocates Tool, analyzes large bodies of digitized texts.  The tool identifies place-names associated with a specific topic - in the paper the focus was on cholera, diarrhea, and dysentery as documented within the Registrar General's reports for the years 1840 - 1880.

The tool works as follows: one first defines what words or phrases the tool should look for, and then defines how 'far' the tool needs to look within the text to find an associated place-name.  This can be as far away as an entire paragraph, within the same sentence, or only up to five words away (just to give a few examples as described within the article).  The result is a database of word associations, locations within the text document (for more in-depth human review), place-names, and lat./long. locations for the associated place-names.  This of course requires a bit of geoparsing before running the tool - as was the case for the examined dataset within the article.  The next step involves running a series of fairly complex statistical analyses on the database results - which requires a more in-depth discussion than what I'm prepared to give here. 
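Just to make the search-window idea concrete, here's a toy sketch that flags gazetteer place-names falling within a set number of words of a keyword.  It's only an illustration of the concept, not the authors' actual tool.

```python
import re

def collocates(text, keyword, gazetteer, window=5):
    """Find every gazetteer place-name that occurs within `window` words of
    the keyword - a much-simplified version of the search-window idea."""
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    places = {p.lower() for p in gazetteer}
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword.lower():
            nearby = tokens[max(0, i - window): i + window + 1]
            hits.extend(p for p in nearby if p in places)
    return hits

sample = "Cholera raged in London that summer, while Leeds reported few deaths."
print(collocates(sample, "cholera", ["London", "Leeds"], window=5))  # ['london']
```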

The main take-away for me was the use of collocation to group the results of their tool, and the idea that while not every place-name/word proximity association is meaningful, if the pattern is repeated often enough it becomes statistically significant (p. 300).  The overall analysis results are also fascinating - outbreaks were numerous in the 1840s but dropped off by the 1870s (reflecting the discovery of the link between sanitation and disease, and the implementation of better public sanitation).  The analysis also showed a bit of a policy bias - London had the greatest public and governmental focus owing to the raw counts of deaths related to the outbreaks, but other cities, particularly Merthyr Tydfil in Wales, had the highest mortality rates in relation to their overall population.  The results also showed a spike in the disease in 1868 - which is because a disease history report covering the years 1831 to 1868 had been published that year (and so correctly showed up as a statistically significant spike within the analysis). 

Essentially, this article highlights an effective and a relatively accurate way to analyze large amounts of text (without spending years doing so), to find and analyze spatial patterns based on specific topics, and a completely new way to approach historic documents and to frame associated research questions. 

Reference:
Murrieta-Flores, P., Baron, A., Gregory, I., Hardie, A., and Rayson, P.  (2015)  Automatically Analyzing Large Texts in a GIS Environment: The Registrar General's Reports and Cholera in the 19th Century.  Transactions in GIS, 19(2): 296-320.