Wednesday, December 9, 2015

Final Project

For our final project assignment we were given two basic choices: either apply what we've learned to a pre-fab project, or do the same to one we've created on our own.  I went with the latter option mainly because I was curious as to how the 2013 Rim Fire has affected the overall forest health in the area.  Given that the state of California is currently in a severe drought I suspected that these effects would also be highly visible on Landsat imagery.

Unsupervised classification of my study area.  The MMU was based on a 1:100,000 scale, so take this with a grain of salt.

To start, I'd like to mention that I'm not a botanist, so a lot of the plant terminology and spectral analysis discussed throughout the course was something I've only recently picked up.  Before this class I had no idea what an NDVI was, or even how to tell one tree from another when viewing various multi-spectral data (I still can't, unless I've been given pointers on what to look for first!).  So in the spirit of applying all that I'd learned over this course I focused on a project that required the classification of healthy vs. non-healthy vegetation.
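For anyone else new to the term: NDVI (Normalized Difference Vegetation Index) is just a normalized ratio of the near-infrared and red bands.  Here's a minimal sketch of the math in Python with numpy - the 'nir' and 'red' arrays are placeholders with made-up reflectance values, not something pulled from my actual project data:

import numpy as np

def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    nir = nir.astype("float64")
    red = red.astype("float64")
    denom = nir + red
    out = np.zeros_like(denom)
    # Only divide where the denominator is non-zero (e.g., skip nodata/fill pixels)
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out

# Example with made-up reflectance values
nir = np.array([[0.5, 0.4], [0.3, 0.1]])
red = np.array([[0.1, 0.1], [0.2, 0.1]])
print(ndvi(nir, red))  # values near +1 suggest dense, healthy vegetation

Values near +1 point to healthy vegetation, values near 0 to bare ground or burned areas, and negative values to water.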

The first image shows a very rough classification of healthy vs. non-healthy vegetation.  I set the MMU at a 1:100,000 scale, which at first was a time-saving decision (mostly to keep from going crazy during the analysis portion).  I now feel it was the right call, since the study area is rather large and I was trying to quantify general trends.  For something more in-depth, a 1:50,000 or larger scale would probably be more appropriate.
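For what it's worth, here's roughly what an unsupervised classification boils down to, sketched in Python with rasterio and scikit-learn.  This is purely an illustration under my own assumptions - the file name is made up, and the actual classification was done in desktop GIS software rather than Python:

import rasterio
from sklearn.cluster import KMeans

# Hypothetical multi-band Landsat stack; the path is just a placeholder
with rasterio.open("rim_fire_stack.tif") as src:
    bands = src.read()                      # shape: (band, row, col)

n_bands, rows, cols = bands.shape
pixels = bands.reshape(n_bands, -1).T       # one row per pixel, one column per band

# Cluster pixels into a handful of spectral classes (healthy veg, burned, bare, etc.)
kmeans = KMeans(n_clusters=5, random_state=0, n_init=10).fit(pixels)
classified = kmeans.labels_.reshape(rows, cols)

The interpretation step - deciding which cluster represents "healthy" vs. "non-healthy" vegetation - still has to be done by eye, which is where the MMU and scale decisions come into play.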

Comparison view of three years shown across two different band combinations.

The comparison of the multi-spectral imagery was perhaps the most exciting part of this project, and also the most enlightening.  For example, on the bottom left of the image above you'll see a map inset with a lot of bright green - that's healthy (fast-growing) vegetation.  The image in the middle was taken during the Rim Fire, and one can see how dark the remaining vegetation had become.  Part of this could be due to the change of Landsat satellites used - Landsat 5's TM sensor was winding down by the early 2010s, and since 2013 Landsat 8 has been generating imagery for the United States.  The change of satellites also brought a change in band numbering, which is why I've listed two band combinations on the map.
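For reference, here are the standard published band assignments for a color-infrared (vegetation) composite on the two sensors.  My maps may have used a slightly different combination, so treat this as a general cheat sheet rather than a record of the exact bands I displayed:

# Equivalent band combinations for a color-infrared (vegetation) composite.
COLOR_IR_COMPOSITE = {
    "Landsat 5 TM":  ("B4", "B3", "B2"),   # NIR, Red, Green
    "Landsat 8 OLI": ("B5", "B4", "B3"),   # NIR, Red, Green
}

for sensor, (r, g, b) in COLOR_IR_COMPOSITE.items():
    print(f"{sensor}: display {r} as red, {g} as green, {b} as blue")

The composite looks the same either way (healthy vegetation shows up bright red or, with other stretches, bright green); only the band numbers feeding it change between sensors.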

The 2015 imagery had smoke cover... but the website said no cloud cover!  Which I suppose is technically true.  The smoke made interpretation of the image very difficult, although if nothing else the image shows how pollutants in the air can be visualized.


*Originally published December 9, 2015.  Updated on 2/27/2017 to repair image links.

Monday, December 7, 2015

Week 15 - Dasymetric Mapping

This lab marks our final lab of the semester.  In it we covered dasymetric mapping, a mapping style that has been around since the 1800s.  The technique uses ancillary data, such as land cover information, to map where a given population is more likely to reside based on the attributes of that ancillary data.  Dasymetric mapping is used when one wishes to redistribute population information from an enumeration unit (such as a census tract) to a different set of zones, such as land cover cells for urban areas.  Since this type of mapping is an estimate, quite a bit of error can be introduced into the results - particularly since one is estimating values across two sets of units that do not share the same boundaries.  Error checking is key to staying on track with dasymetric mapping.

For our lab we compared an areal weighting technique and a dasymetric mapping technique.  For the areal weighting we estimated the school aged population within each high school zone, excluding areas covered by water.  The assumption here is that the population could reside anywhere within the portion of a census tract that intersects the high school zone, as long as that area falls outside of a water polygon.
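The areal weighting assumption comes down to one line of arithmetic: population is spread evenly across the (non-water) portion of a tract, so a high school zone gets a share proportional to how much of the tract it overlaps.  A quick sketch with made-up numbers:

def areal_weight_estimate(tract_population, tract_area, overlap_area):
    """Assume population is spread evenly, so the zone receives a share
    proportional to how much of the tract it overlaps."""
    return tract_population * (overlap_area / tract_area)

# Hypothetical tract: 2,000 school-aged kids, 10 sq km of land (water removed),
# 4 sq km of which falls inside the high school zone
print(areal_weight_estimate(2000, 10.0, 4.0))   # -> 800.0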

View of impervious areas (in red) as they relate to census tract and high school zones.

The view above shows the dasymetric mapping result of our lab.  Instead of using land cover information for our ancillary data we used a measure of imperviousness.  The impervious areas are shown above in red; gray areas indicate zero imperviousness.  FYI, imperviousness indicates a built-up (impermeable) area, and is a better indicator of where people are likely to reside than land cover data alone.

The goal was to determine how many school aged children reside in the impervious areas per high school zone.  Each high school area (shown above bounded in dark gray) contains census tract data, but as the view shows the two are not spatially congruent.  This also holds true with the impervious areas, which are depicted above in raster format.

To make this all work I ended up using the Zonal Statistics as Table tool to determine the amount of impervious area per census tract.  This operation was run first with my census tract and impervious data, and then again using an intersect of the census tract and high school zone information.  New fields were added to each result to hold a 'before' impervious area and an 'after' impervious area.  This was important because, to ultimately determine the population of school aged children per high school zone, I used the following calculation: school aged children per high school = total school aged children * ('after' impervious area / 'before' impervious area).
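Here's a sketch of that calculation using pandas on two hypothetical zonal statistics tables.  The tract, zone, and field names are placeholders I made up, not the ones from the lab data:

import pandas as pd

# Hypothetical output of the two Zonal Statistics as Table runs:
# 'before' = impervious area per whole census tract,
# 'after'  = impervious area per tract piece clipped to a high school zone.
before = pd.DataFrame({"TRACT": ["A", "B"], "IMP_BEFORE": [50.0, 80.0]})
after = pd.DataFrame({
    "TRACT": ["A", "A", "B"],
    "HS_ZONE": ["North", "South", "South"],
    "IMP_AFTER": [30.0, 20.0, 80.0],
})
pop = pd.DataFrame({"TRACT": ["A", "B"], "SCHOOL_AGE": [1000, 400]})

merged = after.merge(before, on="TRACT").merge(pop, on="TRACT")
# school-aged kids assigned to each tract piece = total * (after / before)
merged["EST_KIDS"] = merged["SCHOOL_AGE"] * (merged["IMP_AFTER"] / merged["IMP_BEFORE"])

# Sum the pieces up to the high school zones
print(merged.groupby("HS_ZONE")["EST_KIDS"].sum())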

The final result, after error checking against a reference population, showed that only 10% of the population was allocated incorrectly.  While not perfect, that is a slight improvement over the areal weighting technique, which left 11% of the population allocated incorrectly.

*Originally published on December 7, 2015.  Updated on 2/27/2017 to repair image links.

Friday, December 4, 2015

GIS Student Portfolio

For our final internship seminar assignment we created a portfolio of our GIS related work.  My portfolio contains project examples that were created throughout the course of the entire GIS Certificate program, and can be viewed here.  An audio discussion about my portfolio content can be heard here.

The portfolio also includes some examples from my GIS internship, which I'm happy about since many of these same map products were also quite time consuming to create!  The assignment helped show me how far I've actually come within this program.  It was kind of fun (and sometimes also cringe-worthy) to review old projects, and see how my overall map style has evolved. 

Overall I found this assignment to be very useful, and I will be incorporating this into my set of job search materials.  This includes adding a revised version of this portfolio to my LinkedIn page.  I realize I could have done that for this assignment - but then I wouldn't have had a handy paper copy to take with me to interviews!


Tuesday, December 1, 2015

Lab 14 - Spatial Data Aggregation

This week's lab focused on the modifiable areal unit problem (MAUP) and ways to identify it.  One very well known MAUP issue involves political districts and the practice known as gerrymandering.  Gerrymandering is essentially a zonal effect of MAUP: the polygons that represent voting districts are drawn in such a way as to favor one political party or another, and the result rarely has much to do with population pressures or the like.  Usually a gerrymandered district will score poorly on compactness measures, appearing elongated and irregular in shape.  It's a polygon that looks like it was drawn to cherry-pick the residents who fall within that particular district.
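One common way to put a number on compactness is the Polsby-Popper score, 4 * pi * area / perimeter^2, which equals 1 for a perfect circle and approaches 0 for a long, stringy district.  I'm not certain this is the exact measure our lab used, but it illustrates the idea:

import math

def polsby_popper(area, perimeter):
    """Polsby-Popper compactness: 4*pi*A / P^2 (1 = circle, near 0 = very stringy)."""
    return 4 * math.pi * area / perimeter ** 2

# A circle-like district vs. a long, skinny one of the same area (made-up numbers)
print(round(polsby_popper(100.0, 35.4), 3))    # ~1.0, very compact
print(round(polsby_popper(100.0, 202.0), 3))   # ~0.03, a possible gerrymander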

Example of a congressional district that is not very compact - a possible sign of gerrymandering.

Another gerrymandering test involved looking at how well a congressional district represents its community.  Ideally a district would not break up a county - in other words, a county should fall within a single congressional district rather than being split among several.
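That test is easy to automate if you have a county/district overlay table.  The sketch below uses made-up county and district names just to show the mechanics:

import pandas as pd

# Hypothetical county/district overlay: one row per county-district overlap
overlay = pd.DataFrame({
    "COUNTY":   ["Adams", "Adams", "Baker", "Clark", "Clark", "Clark"],
    "DISTRICT": [1, 2, 2, 3, 4, 5],
})

districts_per_county = overlay.groupby("COUNTY")["DISTRICT"].nunique()
print(districts_per_county)
print("Split counties:", list(districts_per_county[districts_per_county > 1].index))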

*Originally published on December 1, 2015.  Updated on 2/27/2017 to repair image links.

Friday, November 27, 2015

GIS Day

GIS Day was on Nov. 18, however I was working that day (and there weren't any planned activities at my place of employment).  So instead I ended up celebrating GIS Day this past Monday (11/23) at work with the official unveiling of one of my internship deliverables.

As part of my internship I updated a 'how-to' guide explaining GIS and GPS protocol for the Stanislaus NF (STF) Heritage Staff.  The guide walks one through the following steps: creating a working data storage file structure, collecting and downloading GPS data, appending the GPS data to a working copy of the STF Heritage geodatabase, digitizing surveyed areas, choosing the preferred attribute values for the Heritage data, and submitting the final product (and knowing when to submit it) to the STF GIS Coordinator.  The guide will hopefully standardize spatial data collection methods for the forest, and also help those who may not have very strong GIS skills complete basic data management tasks.
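For the curious, the append step in the guide boils down to a single geoprocessing call.  The paths and feature class names below are placeholders of my own invention (the guide itself walks through the tool's dialog with screenshots rather than code), but this is the gist of it:

import arcpy

# Placeholder paths - the real guide points at the working copy of the
# STF Heritage geodatabase, not these names.
gps_points = r"C:\GIS_Working\gps_downloads.gdb\survey_points_20151123"
target_fc  = r"C:\GIS_Working\STF_Heritage_working.gdb\HeritagePoints"

# "TEST" requires the GPS layer's schema to match the target feature class,
# which is the safer option when less-experienced staff run the step.
arcpy.Append_management(inputs=gps_points, target=target_fc, schema_type="TEST")
print(arcpy.GetMessages())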

Sample page on how to find & use the Append tool.

Toward that latter end, I 'field-tested' my guide on two employees who have had little to no GIS experience.  Their current job duties have given them plenty of GPS collection and download experience, but the GIS side of things was lacking because technically that's not their job.  Yet basic GIS skills are required to move ahead in our profession... and opportunities to learn them can be thin on the ground (but much appreciated whenever they come along).  My co-workers used the guide and said that it was easy to follow and understand (which I had seriously wondered about, because the append section wasn't easy to write... I have a new appreciation for our professors and TAs who put together our labs with all those screenshots!).  Hopefully future new employees at STF will also find it easy to use, and can navigate their way through data collection at STF with confidence.

*Originally published on November 27, 2015.  Updated on 2/27/2017 to repair image links.

Monday, November 23, 2015

Lab 13 - Effects of Scale

This week's lab marked the start of a three-part series dealing with issues of scale and resolution.  For the final portion of the lab we compared SRTM data and LIDAR data, both of which had been re-sampled to a 90 m cell size.

Comparison of SRTM data and LIDAR data, both at 90 m resolution.

The resampled DEMs and their derivatives (slope and aspect rasters) were compared visually, and the overall range of DEM elevation values and average slope were also discussed.  At first glance the LIDAR DEM appears to have slightly more detail than the SRTM DEM.  The difference becomes much more pronounced in the derivative products, with the LIDAR based slope and aspect rasters each containing so much detail that they appear almost pixelated.
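For anyone wanting to reproduce the comparison, the processing is only a few geoprocessing calls with Spatial Analyst.  The paths below are placeholders; the 90 m cell size matches the lab, but the bilinear resampling choice is my own assumption rather than the exact lab parameter:

import arcpy
from arcpy.sa import Slope, Aspect

arcpy.CheckOutExtension("Spatial")

# Placeholder paths - not the actual lab data
lidar_dem = r"C:\GIS_Working\scale_lab\lidar_dem.tif"
lidar_90m = r"C:\GIS_Working\scale_lab\lidar_dem_90m.tif"

# Resample the fine-resolution LIDAR DEM to a 90 m cell size to match SRTM
arcpy.Resample_management(lidar_dem, lidar_90m, cell_size="90", resampling_type="BILINEAR")

# Derive the comparison products
slope_90m = Slope(lidar_90m, output_measurement="DEGREE")
aspect_90m = Aspect(lidar_90m)
slope_90m.save(r"C:\GIS_Working\scale_lab\lidar_slope_90m.tif")
aspect_90m.save(r"C:\GIS_Working\scale_lab\lidar_aspect_90m.tif")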

The SRTM data has less detail than the LIDAR dataset because it was collected from a satellite, which is simply too far removed from the surface to capture the level of detail that LIDAR can.  LIDAR data is normally collected from an airplane, putting the sensor much closer to the terrain it is remotely sensing.

*Originally published on November 23, 2015.  Updated on 2/27/2017 to repair image links.

Tuesday, November 17, 2015

Lab 12 - Geographically Weighted Regression

This week's lab wrapped up a 3-week exploration into the use of regression; the focus for this week was specifically using geographically weighted regression. 

Geographically weighted regression (GWR) differs from a global regression method like ordinary least squares (OLS) in that it takes into account the spatial arrangement of the observations as well as the variables themselves.  Each observation is weighted according to its distance from the location being modeled - and, incidentally, the nearer an observation is, the higher its weight, because nearby features are more likely to be related to one another.
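To make that weighting idea concrete, one common kernel choice is a Gaussian curve, where the weight drops off smoothly with distance.  The bandwidth value below is made up (the actual GWR tool estimates or accepts an appropriate bandwidth for you), but it shows why a nearby observation counts for so much more:

import math

def gaussian_weight(distance, bandwidth):
    """One common GWR kernel: weight falls off smoothly with distance."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)

# With a (made-up) bandwidth of 2 km, a neighborhood 1 km away counts far more
# than one 5 km away when fitting the local regression at a given location.
print(round(gaussian_weight(1.0, 2.0), 3))   # ~0.882
print(round(gaussian_weight(5.0, 2.0), 3))   # ~0.044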

For the final part of our lab we had to compare an OLS model with a GWR model - both using the same variable inputs, of course.  Using the hit-and-run crime rate as my dependent variable, I then regressed four other neighborhood statistics (such as the percentage of renter occupied units) against it.

Unfortunately in my case I did not observe much of a change between the two regression models, although I have a fairly good idea of why that may have been - two of my variables were probably too similar to each other, and one should have been dropped (a variable for the percentage of renter occupied units and a separate variable for median home value).  Neither variable set off any collinearity alarms during the OLS stage (the VIF statistic provided with the ArcGIS OLS results would have shown me that), but something was clearly amiss.  When comparing my AIC, Adjusted R-squared, and z-score results between the GWR and the OLS models, it was clear that any changes between the two were not very significant.  Considering my overall low Adjusted R-squared values (both models were at 0.189), it's back to the drawing board in terms of choosing variables for my model.
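If I revisit this, a quick way to test whether those two variables really were redundant would be to compute VIF values directly.  Here's a sketch with statsmodels on deliberately correlated fake data - everything below is made up for illustration, not my lab results:

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical neighborhood data - just to show the mechanics of a VIF check
rng = np.random.default_rng(0)
pct_renter = rng.uniform(0, 100, 200)
median_home_value = 300000 - 2000 * pct_renter + rng.normal(0, 5000, 200)  # deliberately correlated
pct_vacant = rng.uniform(0, 30, 200)

X = sm.add_constant(pd.DataFrame({
    "pct_renter": pct_renter,
    "median_home_value": median_home_value,
    "pct_vacant": pct_vacant,
}))

for i, name in enumerate(X.columns):
    if name == "const":
        continue
    print(name, round(variance_inflation_factor(X.values, i), 1))
# A VIF well above ~7.5 (the rule of thumb ArcGIS uses) flags a redundant variable.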