class: center, middle, inverse, title-slide # Predictive Mapping with R ## Part I: Intro & Background ### Martin Hinz ### 4.2.2019 --- class: inverse, center, middle # Background --- ## Predictive Modelling/Mapping - First considerations for the prediction with the help of environmental data and statistical analysis methods: Human Geography (Haggett 1968) -- - Until the end of the 1980s, research in this field concentrated in the United States of America -- - In archaeology for the first time summarizing theory and methodology of modeling Judge and Sebastian (1988) -- - This development is related to - the beginning of natural space analyses with geoinformation systems - the development of processual archaeology - The integration of heritage management into archaeology --- ## Basic Idea .pull-left[ - You take a bunch of known sites - You take a bunch of environmental data - You add some ~~magick~~ statistics - You get a prediction for unknown sites ] .pull-right[ <img src="images/scheme_sites.png" width="150px" /> <img src="images/scheme_environmental_data.png" width="150px" /> <img src="images/playmobil_potter.jpg" width="150px" /> <img src="images/scheme_predmap.png" width="150px" /> ] .caption[ (Mostly) Mennenga 2016. ] --- ## Specific archaeological problems .pull-left[ - sparse data - noisy data - no negative evidence - we might know, where the sites are, but not, where they not are - 'biased' preservation ] .pull-right[ <img src="images/scheme_arch_preservation.png" width="400px" /> ] .caption[ Mennenga 2016. ] --- ## Two flavours .pull-left[ ### Deductive <!-- --> .caption[ After Kamermans and Wansleeben (1999). ] - build on prior assumptions - data only for testing - Pro/Con - .green[testing straight forward] - .red[weakly fitted] ] .pull-right[ ### Inductive  .caption[ After Kamermans and Wansleeben (1999). ] - build on data (and prior assumptions) - data used for training & testing - Pro/Con - .green[fitted to the data] - .red[testing becomes an issue] ] --- ## Two ends determine strategy .pull-left[ ### Prognosis - **Were will we find archaeology?** - biased preservation is implicit - factors that prevent conservation have a direct negative impact on probability - negative evidence (non-sites) is less a problem - resulting model should be more conservative ] .pull-right[ ### Reconstruction - **Were was prehistoric activity?** - biased preservation has to be corrected - conservation-preventing factors indirectly (rather) positively influence probability (unknown unknowns) - negative evidence becomes important - resulting model should be more accurate ] --- ## Concentrating on inductive reconstruction Basically, inductive is ML! 1. You teach the computer what characterises sites 1. Than you let the computer evaluate the whole landscape ### Several approaches - (expert judgements) - simple Additive Models - Kriging - Cluster Analysis - Generalised Linear Modelling - Naive Bayesian - Support Vector Machines - Neuronal Networks --- class: inverse, center, middle # Getting practical --- ## Data ### Site data - consisting of location and, if relevant, classification .center[ |site | y| x|dating | |:--------------------------|--------:|--------:|:---------| |Seeberg, Burgäschisee-Süd | 47.15505| 7.666646|neolithic | |Sion, Petit Chasseur II | 46.22908| 7.359417|neolithic | |Yverdon | 46.76667| 6.633333|neolithic | |Yverdon, Avenue des Sports | 46.76667| 6.633333|neolithic | |Twann | 47.09420| 7.156992|neolithic | |Zürich, Kleine Hafner | 47.36610| 8.544198|neolithic | ] .caption[ Neolithic Sites in Switzerland from RADON. ] --- ## Data ### Site data - consisting of location and, if relevant, classification .center[
] .caption[ Neolithic Sites in Switzerland from RADON. ] --- ## Data ### Environmental data Not necessarily only purely 'natural environment', but also eg. second order attributes (eg. Settlement density). Guidelines: - Might it have been potentially significant for locational choices of prehistoric people? - Is it accessible at all? - Can it be transformed into spatial explicit and extensive data cover? - Can it be assumed that it (modern data) is indicative for the prehistoric situation? --- ## Data ### Environmental data .pull-left[ Possible options: - DEM and derived data - altitude, slope, aspect, tpi, ... - Soil data - Distance to water - Viewshed Analysis - Accessibilty and Least Cost Path networks - (modern) Land cover - Ressource Availability - ... ] .pull-right[ <img src="images/scheme_environmental_data.png" width="608" height="300px" /> ] .caption[ Mennenga 2016 ] On spot or in a wider catchment around known sites Continuous or discrete, metric or ordinal/nominal --- ## Data ### Negative evidence .pull-left[ Possible options: - Use real negative evidence - eg. linear projects - rarely available and trustworthy - use every location without recorded archaeology - .green[total coverage] - .red[will result in underestimation] - .red[computational expensive] - use a random selection - .red[limited coverage] - .red[will result in overestimation] - .green[computational less expensive] ] .pull-right[
.caption[ Random locations in Switzerland ] ] --- ## The ~~magick~~\* statistics We will use two approaches ### Generalised Linear Modelling `$$g(\mu_m) = \eta_m = \beta_{m,0} + X_1 \beta_{m,1} + \cdots + X_p \beta_{m,p} \\ \text{ where } \mu_m = \mathrm{P}(Y = m \mid Y \in \{1,m\} ). \,$$` ### Naive Bayesian Classifier `$$p(C_k \mid \mathbf{x}) = \frac{p(C_k) \ p(\mathbf{x} \mid C_k)}{p(\mathbf{x})} \,$$` .footnote[\[\*\] That is enough spells, we will not use equations from now on... ;-)] --- class: inverse, center, middle # Getting REALLY practical --- ## Setting the scene We will use several librarys. We will load the most important ones upfront. ```r library(sp) # library for spatial objects library(raster) # library for spatial raster data library(mapview) # nice little library for displaying dynamic maps library(magrittr) # library for using pipe syntax ``` ``` ## ## Attaching package: 'magrittr' ``` ``` ## The following object is masked from 'package:raster': ## ## extract ``` The warning here: - We will have to call some commands with explicitly mentioning the package --- ## Loading site data A dataset of 64 sites from the RADON Database have been prepared ```r sites <- read.csv("data/sites_neolithic.csv") head(sites) ``` ``` ## site y x ## 1 Seeberg, Burgäschisee-Süd 47.15505 7.666646 ## 2 Sion, Petit Chasseur II 46.22908 7.359417 ## 3 Yverdon 46.76667 6.633333 ## 4 Yverdon, Avenue des Sports 46.76667 6.633333 ## 5 Twann 47.09420 7.156992 ## 6 Zürich, Kleine Hafner 47.36610 8.544198 ``` --- ## \[Small advertisement block\] .red[Do not run] The data were extracted using c14bazaar (Clemens Schmid et al. 2018) .small[ ```r library(c14bazAAR) neolith <- get_RADON() %>% determine_country_by_coordinate() %>% dplyr::filter(country_coord == "Switzerland", sitetype == "settlement") head(neolith) ``` ``` ## sourcedb labnr c14age c14std c13val site ## 1 RADON B-116 4930 120 NA Seeberg, Burgäschisee-Süd ## 2 RADON B-126 4500 110 NA Seeberg, Burgäschisee-Süd ## 3 RADON B-2110 5130 100 NA Sion, Petit Chasseur II ## 4 RADON B-2111 5100 70 NA Sion, Petit Chasseur II ## 5 RADON B-2197 4060 100 NA Yverdon ## 6 RADON B-2204 4080 100 NA Yverdon ## sitetype feature ## 1 settlement Unterkante der Kulturschicht (Quadr. 33 A). ## 2 settlement Pfosten Nr. 977 (Haus 1). ## 3 settlement Schicht 14 der Grabung II. ## 4 settlement Schicht 14 der Grabung II. ## 5 settlement <NA> ## 6 settlement <NA> ## period culture material species country ## 1 <NA> Cortaillod wood Eiche (Quercus). Switzerland ## 2 <NA> Cortaillod wood Eiche (Quercus). Switzerland ## 3 Typ Petit-Chasseur Cortaillod <NA> <NA> Switzerland ## 4 Typ Petit-Chasseur Cortaillod <NA> <NA> Switzerland ## 5 <NA> Saône-Rhône <NA> <NA> Switzerland ## 6 <NA> Saône-Rhône <NA> <NA> Switzerland ## country_coord lat lon ## 1 Switzerland 47.15505 7.666646 ## 2 Switzerland 47.15505 7.666646 ## 3 Switzerland 46.22908 7.359417 ## 4 Switzerland 46.22908 7.359417 ## 5 Switzerland 46.76667 6.633333 ## 6 Switzerland 46.76667 6.633333 ## shortref ## 1 Müller-Beck 1961, 434; Müller-Beck/Oeschger 1967 ## 2 Breunig 1987, 199; Oeschger et al. 1959, 141 ## 3 Breunig 1987, 200; Gallay 1986, 50f. ## 4 Breunig 1987, 200; Gallay 1986, 50f. ## 5 Breunig 1987, 207 ## 6 Breunig 1987, 207 ``` ] https://github.com/ISAAKiel/c14bazAAR --- ## Making the data spatial We use sp as spatial library (Swiss Army Knife!) .small[ ```r # Define, which columns hold the coordinates coordinates(sites) <- ~x+y # Define the projection system proj4string(sites)<- CRS("+proj=longlat +ellps=WGS84 +datum=WGS84") sites ``` ``` ## class : SpatialPointsDataFrame ## features : 64 ## extent : 6.141915, 9.433333, 46.205, 47.75 (xmin, xmax, ymin, ymax) ## coord. ref. : +proj=longlat +ellps=WGS84 +datum=WGS84 +towgs84=0,0,0 ## variables : 1 ## names : site ## min values : Affoltern, Zwillikon-Weid ## max values : Zürich, Wollishofen ``` ] --- ## Displaying spatial data (aka a map) ### Interactive We use mapview ```r mapview(sites) ```
--- ### static We can simply plot ```r plot(sites) ``` <img src="presentation_files/presentation1/unnamed-chunk-17-1.png" width="100%" /> --- ### static, nicer We can use ggmap .small[ ```r library(ggmap) basemap <- get_stamenmap(sp::bbox(sites), zoom=7, maptype = "toner-lite", crop = F) ggmap(basemap) + geom_point(data=data.frame(sites), aes(x=x,y=y), col="red") ``` <img src="presentation_files/presentation1/unnamed-chunk-18-1.png" width="100%" /> ] --- ## Loading raster data A DEM dataset for Switzerland have been prepared .small[ ```r dem <- raster("data/dem_switzerland.grd") dem ``` ``` ## class : RasterLayer ## dimensions : 264, 564, 148896 (nrow, ncol, ncell) ## resolution : 0.008333333, 0.008333333 (x, y) ## extent : 5.9, 10.6, 45.7, 47.9 (xmin, xmax, ymin, ymax) ## coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 ## data source : /home/martin/r_projekte/pred_map_tut/data/dem_switzerland.grd ## names : CHE_msk_alt ## values : 190, 4393 (min, max) ``` ] --- ## \[Getting Raster Data\] .red[Do not run] The raster package holds a convenient function for getting spatial data: `getData()` .small[ ```r dem_raster <- raster::getData(name = "alt", country='CHE', path = "data/") dem_raster ``` ``` ## class : RasterLayer ## dimensions : 264, 564, 148896 (nrow, ncol, ncell) ## resolution : 0.008333333, 0.008333333 (x, y) ## extent : 5.9, 10.6, 45.7, 47.9 (xmin, xmax, ymin, ymax) ## coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 ## data source : /home/martin/r_projekte/pred_map_tut/data/CHE_msk_alt.grd ## names : CHE_msk_alt ## values : 190, 4393 (min, max) ``` ] - **alt**: aggregated altitude from SRTM Also available: - **GADM**: administrative areas - **SRTM**: refers to the hole-filled CGIAR-SRTM (90 m resolution) - **countries**: polygons for all countries - **worldclim**: climate and enviromental variables --- ## Displaying raster data ### Interactive We use mapview, again ```r mapview(dem) ```
--- ### static We can simply plot ```r plot(dem) ``` <img src="presentation_files/presentation1/unnamed-chunk-22-1.png" width="100%" /> --- ### static, nicer We can use ggmap .small[ ```r ggmap(basemap) + geom_raster(data=data.frame(raster::rasterToPoints(dem)), aes(x=x,y=y,fill = CHE_msk_alt), alpha=.5) + coord_cartesian() ``` <img src="presentation_files/presentation1/unnamed-chunk-23-1.png" width="400px" /> ] --- ## Reprojecting spatial data .tiny[ Currently, we are working in lat-lng space (WGS84). For spatial analysis it is better to use a projected (m based) system. We might use UTM, zone 32 might be optimal for Switzerland. For vector data: ```r sites_utm <- spTransform(sites, crs("+init=epsg:32632")) sites_utm ``` ``` ## class : SpatialPointsDataFrame ## features : 64 ## extent : 279522.6, 532625.5, 5118059, 5288558 (xmin, xmax, ymin, ymax) ## coord. ref. : +init=epsg:32632 +proj=utm +zone=32 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0 ## variables : 1 ## names : site ## min values : Affoltern, Zwillikon-Weid ## max values : Zürich, Wollishofen ``` For raster data (DEM) ```r dem_utm <- projectRaster(dem, crs=crs("+init=epsg:32632")) dem_utm ``` ``` ## class : RasterLayer ## dimensions : 279, 589, 164331 (nrow, ncol, ncell) ## resolution : 636, 926 (x, y) ## extent : 254216.6, 628820.6, 5056114, 5314468 (xmin, xmax, ymin, ymax) ## coord. ref. : +init=epsg:32632 +proj=utm +zone=32 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0 ## data source : in memory ## names : CHE_msk_alt ## values : 190.4313, 4384.884 (min, max) ``` ] --- ## Revisualising projected data together .pull-left[ ```r plot(dem) plot(sites, pch=19, add=TRUE) ``` <!-- --> ] .pull-right[ ```r plot(dem_utm) plot(sites_utm, pch=19, add=TRUE) ``` <!-- --> ] --- ## Getting some random sample locations As negative evidence, we sample some non-sites. We use the DEM as sampling boundary. We can do that regular or random. We will use the latter! .tiny[ .pull-left[ Regular ```r nonsites <- sampleRegular(dem_utm, 1000, sp=T) plot(dem_utm) plot(nonsites, pch=19, add=TRUE) ``` <!-- --> ] .pull-right[ Random ```r nonsites <- sampleRandom(dem_utm, 1000, sp=T) plot(dem_utm) plot(nonsites, pch=19, add=TRUE) ``` <!-- --> ] ] --- ## Creating DEM derived data .tiny[ We will use slope, aspect and tpi as predictors. We can calculate them from the dem: ```r env_data <- terrain(dem_utm, opt = c('slope','aspect','tpi'), unit="degrees") mapview(env_data) ```
] --- ## Special treatment of aspect (circular data) .pull-left[ Aspect (angle) is a circular variable (359 is very close to 1) Options: * use sin/cos * make it a nominal variable with directions << we do that ] .pull-right[ <img src="images/windrose.png" width="150px" /> ] .tiny[ .pull-left[ ```r summary(env_data$aspect) ``` ``` ## aspect ## Min. 8.192446e-03 ## 1st Qu. 8.845337e+01 ## Median 1.800573e+02 ## 3rd Qu. 2.848638e+02 ## Max. 3.599996e+02 ## NA's 9.601400e+04 ``` ] .pull-right[ ```r env_data$aspect <- ceiling( (env_data$aspect + 360/8/2) / (360/8) ) env_data$aspect[env_data$aspect>8]<-1 summary(env_data$aspect) ``` ``` ## aspect ## Min. 1 ## 1st Qu. 2 ## Median 4 ## 3rd Qu. 7 ## Max. 8 ## NA's 96014 ``` ] ] --- ## Sampling informations from the environmental data Now that we have the environmental data, we might extract them at our sample locations (sites and nonsites). .tiny[ ```r sites <- raster::extract(env_data, sites_utm, sp=T) nonsites <- raster::extract(env_data, nonsites, sp=T) summary(sites) ``` ``` ## Object of class SpatialPointsDataFrame ## Coordinates: ## min max ## x 279522.6 532625.5 ## y 5118058.6 5288557.9 ## Is projected: TRUE ## proj4string : ## [+init=epsg:32632 +proj=utm +zone=32 +datum=WGS84 +units=m +no_defs ## +ellps=WGS84 +towgs84=0,0,0] ## Number of points: 64 ## Data attributes: ## site tpi slope ## Egolzwil 4 : 2 Min. :-115.797 Min. : 0.2622 ## Affoltern, Zwillikon-Weid: 1 1st Qu.: -24.308 1st Qu.: 1.4092 ## Alle, Noir Bois : 1 Median : -10.723 Median : 2.6403 ## Arbon, Bleiche 1 : 1 Mean : -19.282 Mean : 3.2753 ## Arbon, Bleiche 3 : 1 3rd Qu.: -6.752 3rd Qu.: 4.4285 ## Arbon, Bleiche 4 : 1 Max. : 20.795 Max. :16.6228 ## (Other) :57 ## aspect ## Min. :1.000 ## 1st Qu.:4.000 ## Median :5.500 ## Mean :5.297 ## 3rd Qu.:6.250 ## Max. :8.000 ## ``` ] --- ## Save the data for later use ```r save(sites, nonsites, env_data, file="data/pred_data.RData") ```