Use getwd() to check the working directory. It will most likely be the folder your Rmd file is in. If you need to change it for some reason, use setwd().
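One way to confirm that the working directory is the one you expect is to test whether a relative path resolves from it. The sketch below is only an illustration: data_file_found() is a hypothetical helper that is not part of the original workflow, and the demonstration uses a temporary file instead of the real data.

```r
# Hypothetical helper: TRUE if `path` resolves to an existing file from the
# current working directory, FALSE (with a message) otherwise.
data_file_found <- function(path) {
  found <- file.exists(path)
  if (!found) {
    message("Not found from ", getwd(), ": ", path)
  }
  found
}

# Demonstration with a temporary file standing in for a data file
tmp <- tempfile(fileext = ".csv")
writeLines("date,species", tmp)
found_tmp     <- data_file_found(tmp)                      # this file exists
found_missing <- data_file_found("no_such_file_here.csv")  # this one does not
```

If the helper returns FALSE for a data file you know exists, the working directory is probably not where you think it is.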

Data Wrangling

Load in data and merge datasets:

Load each year's .csv file containing the raw cover data. The code below:

Removes the empty rows separating the data, which we labeled “drop” in the date column
Converts the date column from character to Date

library(dplyr)  # %>%, filter(), mutate(), recode(), group_by(), summarise()
library(tidyr)  # pivot_longer()

raw_2018 <- read.csv("../Data/LTER_89MAT_raw_cover_2018.csv", header = TRUE) %>%
  filter(date != "drop" & date != "drop ") %>%  # remove separator rows
  mutate(date = as.Date(date, "%m/%d/%Y"))      # character -> Date

raw_2019 <- read.csv("../Data/LTER_89MAT_raw_cover_2019.csv", header = TRUE) %>%
  filter(date != "drop" & date != "drop ") %>%
  mutate(date = as.Date(date, "%m/%d/%Y"))

raw_2021 <- read.csv("../Data/LTER_89MAT_raw_cover_2021.csv", header = TRUE) %>%
  filter(date != "drop" & date != "drop ") %>%
  mutate(date = as.Date(date, "%m/%d/%Y"))
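Because the same three steps are repeated for every year, they could be wrapped in one function. The following is a minimal base R sketch, not the original method: read_raw_cover() is a hypothetical name, the demo file and its X1 column are made up, and trimws() is a swapped-in way to catch "drop" with any stray spaces instead of listing each variant.

```r
# Hypothetical helper (not in the original script): read one year's raw
# cover file, remove the "drop" separator rows, and parse the date column.
read_raw_cover <- function(path) {
  raw <- read.csv(path, header = TRUE, stringsAsFactors = FALSE)
  # Keep rows whose date is present and is not "drop" (ignoring extra spaces)
  keep <- !is.na(raw$date) & trimws(raw$date) != "drop"
  raw <- raw[keep, , drop = FALSE]
  raw$date <- as.Date(raw$date, "%m/%d/%Y")
  raw
}

# Demonstration on a small made-up file
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "date,species,X1",
  "7/15/2018,Bet nan,10",
  "drop,,",
  "drop ,,",
  "7/16/2018,Eri vag,25"
), tmp)
demo <- read_raw_cover(tmp)
```

Each year would then be loaded with a single call, which keeps the cleaning rules identical across years.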

unique(raw_2019$treatment)

## [1] "EXNPNF" "CT2"    "GHNP"   "GHCT"   "NP"

unique(raw_2021$treatment)

## [1] "CT2"    "EXC-NP" "GHCT"   "GHNP"   "NP"

Combine all years together: MAKE SURE ALL HEADINGS MATCH
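Before merging, the headings can be compared programmatically instead of by eye. This is a sketch under assumptions: missing_columns() is a hypothetical helper and the two data frames are made-up stand-ins for the yearly tables.

```r
# Hypothetical helper: for each data frame, list the columns it lacks
# relative to the union of all column names across the data frames.
missing_columns <- function(dfs) {
  all_cols <- unique(unlist(lapply(dfs, names)))
  lapply(dfs, function(df) setdiff(all_cols, names(df)))
}

# Made-up tables: the second uses "Treatment" instead of "treatment"
a <- data.frame(date = "7/15/2018", treatment = "CT2", X1 = 10)
b <- data.frame(date = "7/15/2019", Treatment = "NP", X1 = 5)

col_report <- missing_columns(list(year_a = a, year_b = b))
```

An empty character vector for every data frame means the headings match. Anything listed points to a heading that is misspelled or missing in that year.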

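# Full outer join of two yearly tables on every column name they share.
# Rows are only stacked cleanly when the headings match exactly.
MyMerge <- function(x, y){
  df            <- merge(x, y, all = TRUE)
  # A Row.names column only exists when merging by row names. It is absent
  # here, so these two lines just leave the default row numbering in place.
  rownames(df)  <- df$Row.names
  df$Row.names  <- NULL
  return(df)
}
raw_orignial <- Reduce(MyMerge, list(raw_2018, raw_2019, raw_2021))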

LOGIC CHECK - Does the number of observations in raw_orignial equal the sum of the observations in the data sets you combined?

YES -> Great, continue on.
NO -> Data is likely being dropped and you need to figure out why. (It could be due to column names being slightly different.)
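This logic check can be written as code so that it fails loudly. The sketch below uses made-up yearly tables (y1, y2) with a hypothetical quadrat column X1. In practice the same comparison would be made between the merged data frame and the yearly data frames that went into it.

```r
# Made-up yearly tables with identical column headings
y1 <- data.frame(year = 2018, species = c("Bet nan", "Eri vag"), X1 = c(10, 25))
y2 <- data.frame(year = 2019, species = c("Bet nan", "Eri vag"), X1 = c(12, 20))

# Same merge strategy as above: full outer join on all shared columns
merged <- Reduce(function(x, y) merge(x, y, all = TRUE), list(y1, y2))

# The merged table should have as many rows as the inputs combined
expected_rows <- sum(sapply(list(y1, y2), nrow))
rows_match <- nrow(merged) == expected_rows
stopifnot(rows_match)  # stops with an error if the counts differ
```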

Transform from wide to long format

raw_long <- raw_orignial %>%
  pivot_longer(
    cols = -c(date, year, region, site, treatment, block, species),
    names_to = "quad_num",
    values_to = "cover"
  )

LOGIC CHECK - Does the number of observations in raw_long equal the number of observations in raw_orignial multiplied by 8, the number of quadrat columns that were pivoted?

YES -> Great, continue on.
NO -> Data is likely being dropped and you need to figure out why.
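The expected row count after pivoting is the number of rows in the wide table times the number of columns that were pivoted. A small sketch with made-up data follows. The quadrat column names X1 to X3 are hypothetical, and the toy table has three quadrat columns instead of eight.

```r
library(tidyr)

# Made-up wide table: one row per species, one column per quadrat
wide <- data.frame(
  species = c("Bet nan", "Eri vag"),
  X1 = c(10, 25),
  X2 = c(0, 30),
  X3 = c(5, 15)
)

# Pivot every column except the identifier column
long <- pivot_longer(wide, cols = -species,
                     names_to = "quad_num", values_to = "cover")

# Rows after pivoting = rows before * number of pivoted columns
n_quad_cols <- ncol(wide) - 1
rows_match <- nrow(long) == nrow(wide) * n_quad_cols
```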

Remove any data that you will not need for the analysis

In this case we can remove unknowns, vole activity, tussock number, etc.

#Returns a list of unique species names. Use this to see what you want to eliminate. Rerun after you remove data to make sure it worked. 
unique(raw_long$species)

##  [1] "And pol"               "bare"                  "Bet nan"              
##  [4] "Car big"               "Cas tet"               "Emp nig"              
##  [7] "Eri vag"               "frost boil"            "Led pal"              
## [10] "lichen"                "litter"                "moss"                 
## [13] "Ped lap"               "Pol bis"               "Rub cha"              
## [16] "Sal pul"               "St. D. Bet."           "Vac uli"              
## [19] "Vac vit"               "Mushrooms"             "grass"                
## [22] "chopped vole litter "  "latrienes (%)"         "Sampling"             
## [25] "Severed vole litter"   "Trampling"             "tussock #"            
## [28] "vole hole (#)"         "vole trail (%)"        "Winter kill"          
## [31] "Other dead"            "Structure"             "Trample"              
## [34] "other S.D"             "St.D. other"           "Unk.N2"               
## [37] "Calcan"                "Car big/other"         "Unk.1"                
## [40] "# of shrooms"          "Grass"                 "Other S.D."           
## [43] "Green House Structure" "sampling"              "tarp"                 
## [46] "trampling"             "GH structure"          "sampled "             
## [49] "trampled"              "trampled "             "Distructive"          
## [52] "Smapled"               "Sampled"               "St. D. other"         
## [55] "Trampled"              "# TUSS."               "AND POL"              
## [58] "ARC ALP"               "BARE GR."              "BET NAN"              
## [61] "CAR BIG"               "CAS TET"               "DEAD BET"             
## [64] "DEAD EV."              "EMP NIG"               "ERI VAG"              
## [67] "EV. LITTER"            "FR BOIL"               "GRASS ex."            
## [70] "LED PAL"               "LICHEN"                "LITTER"               
## [73] "MOSS"                  "PED LAP"               "PET FRI"              
## [76] "POL BIS"               "RUB CHA"               "SAL PUL"              
## [79] "SAL RET"               "VAC ULI"               "VAC VIT"

#Remove data you don't need/want
drop_species <- c(
  "Unk.N2", "vole hole (#)", "chopped vole litter ", "tussock #", "sampled ",
  "latrienes (%)", "vole trail (%)", "Severed vole litter", "trampled ", "tarp",
  "trampled", "sampled", "trampling", "Unk. 1", "Green House Structure",
  "GH structure", "Distructive", "Trampled", "Mushrooms", "Sampling",
  "# of shrooms", "sampling", "Smapled", "# TUSS.", "Structure", "Trample",
  "Sampled", "Trampling", "Unk.1"
)
#Rows with a missing species are dropped too, as they were by the != comparisons
raw_long <- raw_long %>%
  filter(!is.na(species), !species %in% drop_species)

Fix naming convention errors

Check the unique values in each column to make sure that there are no naming errors

unique(raw_long$species)

If there are mistakes, rename them using the code below and recheck the unique values to make sure the recode worked. Make sure you didn’t lose any observations.

Examples of naming convention corrections:

Species names in all caps were changed to first letter capitalized and everything else lower case (EX: “BET NAN” = “Bet nan”)
Standing dead identified to species was renamed to just “Std”
Dead identified to species was renamed to just “Dead”
Litter identified to species was renamed to just “litter”
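The all-caps correction follows a regular pattern, so it could also be done with a small function instead of one rename per species. This is a sketch of an alternative, not the approach used in this document: to_sentence_case() is a hypothetical helper.

```r
# Hypothetical helper: upper-case the first character and lower-case the
# rest of each string, leaving missing values as NA.
to_sentence_case <- function(x) {
  out <- paste0(toupper(substr(x, 1, 1)), tolower(substring(x, 2)))
  out[is.na(x)] <- NA_character_
  out
}

fixed <- to_sentence_case(c("BET NAN", "ERI VAG", "Vac vit", NA))
```

Applied to every value it would also turn “LICHEN” into “Lichen”, so categories that are meant to stay fully lower case still need an explicit rename.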

#fix naming convention errors in treatment names 
raw_long$treatment <- raw_long$treatment %>% 
  recode("EXC-NP" = "EXNPNF")

#fix naming convention errors in species names 
raw_long$species <- raw_long$species %>% 
  recode(
    "Other dead" = "St D", "Other S.D." = "St D", "GRASS ex." = "grass",
    "POL BIS" = "Pol bis", "other S.D" = "St D", "St. D. other" = "St D",
    "DEAD BET" = "St D Bet", "LED PAL" = "Led pal", "RUB CHA" = "Rub cha",
    "St. D. Bet." = "St D Bet", "St.D. other" = "St D", "AND POL" = "And pol",
    "DEAD EV." = "litter", "LICHEN" = "lichen", "SAL PUL" = "Sal pul",
    "Calcan" = "Cal can", "ARC ALP" = "Arc alp", "EMP NIG" = "Emp nig",
    "LITTER" = "litter", "SAL RET" = "Sal ret", "Car big/other" = "grass",
    "BARE GR." = "bare", "ERI VAG" = "Eri vag", "MOSS" = "moss",
    "VAC ULI" = "Vac uli", "BET NAN" = "Bet nan", "EV. LITTER" = "litter",
    "PED LAP" = "Ped lap", "VAC VIT" = "Vac vit", "Grass" = "grass",
    "CAR BIG" = "Car big", "FR BOIL" = "Fr boil", "PET FRI" = "Pet fri",
    "CAS TET" = "Cas tet", "Winter kill" = "Win kill", "frost boil" = "Fr boil"
  )

Sum cover for each species (or cover category) within each quadrat. This likely won’t change the cover for most species unless they were listed more than once in a quadrat, but it’s good practice to ensure accuracy after fixing naming conventions.

The column “n” tells you the number of observations the data was summed across. For most species it will likely just be 1. However, for things originally identified to species that we changed to a broader category (Ex: “EV litter” became “litter”), it might report that it summed across 2 observations because it combined the values originally reported separately.

raw_clean<- (raw_long) %>% 
  group_by(date, year, region, site, treatment, block, species, quad_num) %>% 
  summarise(cover = sum(cover, na.rm = TRUE), n = n())

## `summarise()` has grouped output by 'date', 'year', 'region', 'site', 'treatment', 'block', 'species'. You can override using the `.groups` argument.
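To see which categories were actually combined, filter the summarised table for n greater than 1. The sketch below uses a made-up table (toy_long) with a single hypothetical quadrat X1. It also passes .groups = "drop", which returns an ungrouped table and suppresses the grouping message shown above.

```r
library(dplyr)

# Made-up long table: "litter" appears twice in quadrat X1, as it would
# after two litter categories were renamed to the same label
toy_long <- data.frame(
  quad_num = c("X1", "X1", "X1"),
  species  = c("litter", "litter", "Bet nan"),
  cover    = c(10, 5, 20)
)

toy_clean <- toy_long %>%
  group_by(species, quad_num) %>%
  summarise(cover = sum(cover, na.rm = TRUE), n = n(), .groups = "drop")

# Rows that were summed across more than one original observation
combined <- filter(toy_clean, n > 1)
```

Every row in combined should correspond to a rename you made on purpose. An unexpected species here suggests a duplicate entry in the raw data.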

Relativize cover

First we need to sum the cover values across all species within a quadrat.

quad_sum <- (raw_clean) %>% 
  group_by(date, year, region, site, treatment, block, quad_num) %>%
  summarise(sum_quad = sum(cover)) %>%
  ungroup()

## `summarise()` has grouped output by 'date', 'year', 'region', 'site', 'treatment', 'block'. You can override using the `.groups` argument.

Then we join the new table (which holds the summed cover of all species in each quadrat) with the original table.

cover_join <- left_join(raw_clean, quad_sum, by= c("date", "year", "region", "site", "treatment", "block", "quad_num"))

Then we divide each cover value for a species in a quadrat by the sum of all cover values in that quadrat to relativize the values. We will also drop the columns we don’t need anymore.

rel_cov_clean <- cover_join %>%
  mutate(relcov = cover/sum_quad) %>%
  select(-sum_quad, -cover)

LOGIC CHECK: the number of observations in rel_cov_clean should be the same as in raw_clean.
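A second useful check is that relative cover sums to 1 within every quadrat. The sketch below uses made-up data and, for brevity, relativizes with group_by() and mutate() in one step. That is a compact alternative to the sum-then-join steps above, not a replacement the document prescribes. The quadrat labels X1 and X2 are hypothetical.

```r
library(dplyr)

# Made-up cleaned table: two species in each of two quadrats
toy_clean <- data.frame(
  quad_num = c("X1", "X1", "X2", "X2"),
  species  = c("Bet nan", "moss", "Bet nan", "moss"),
  cover    = c(30, 10, 5, 15)
)

# Relative cover = species cover / total cover in that quadrat
toy_rel <- toy_clean %>%
  group_by(quad_num) %>%
  mutate(relcov = cover / sum(cover)) %>%
  ungroup()

# Check 1: no rows were gained or lost
same_rows <- nrow(toy_rel) == nrow(toy_clean)

# Check 2: relative cover sums to 1 in every quadrat (within rounding error)
quad_totals <- toy_rel %>%
  group_by(quad_num) %>%
  summarise(total = sum(relcov), .groups = "drop")
sums_to_one <- all(abs(quad_totals$total - 1) < 1e-8)
```

A quadrat whose total cover is 0 would produce NaN from the division and fail the second check, so it is worth looking for those rows if the check does not pass.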

unique(rel_cov_clean$treatment)

## [1] "CT2"    "NP"     "GHCT"   "GHNP"   "EXNPNF"

89MAT_relcov

Nicole