-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathPaper.Rmd
2620 lines (2433 loc) · 154 KB
/
Paper.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
---
title: "Copulas and their potential for ecology"
author: "Shyamolina Ghosh, Lawrence W. Sheppard, Mark T. Holder, Terrance D. Loecke, Philip C. Reid, James D. Bever, Daniel C. Reuman"
date: ""
fontsize: 12pt
geometry: "left=2.54cm,right=2.54cm,top=2.54cm,bottom=2.54cm"
output:
pdf_document:
number_sections: yes
keep_tex: yes
fig_caption: yes
includes:
in_header: head_maintext.sty
tables: True
link-citations: True
urlcolor : blue
indent : True
csl: ecology-letters.csl
bibliography: REF_ALL.bib
---
```{r setup_Paper, echo=F}
knitr::opts_chunk$set(echo = TRUE, fig.pos = "H")
options(scipen = 1, digits = 3) #This option round all numbers appeared in the inline r code upto specified digit
#seed<-101
#source("mtime.R") #A function needed for caching
```
\raggedright
\setlength\parindent{2em}
\setlength{\parskip}{6pt}
\noindent \emph{Affiliations:}
\noindent Ghosh, Sheppard, Bever: Department of Ecology and Evolutionary Biology and Kansas Biological Survey, University of Kansas, Lawrence, KS, 66045, USA
\noindent Holder: Department of Ecology and Evolutionary Biology and Biodiversity Institute, University of Kansas, Lawrence, KS, 66045, USA
\noindent Loecke: Environmental Studies Program and Kansas Biological Survey, University of Kansas, Lawrence, KS, 66047, USA
\noindent Reid: School of Biological & Marine Sciences, University of Plymouth, Drake Circus, Plymouth PL4 8AA, UK; Continuous Plankton Recorder Survey, The Marine Biological Association, The Laboratory, Citadel Hill, Plymouth PL1 2PB, UK
\noindent Reuman: Department of Ecology and Evolutionary Biology and Kansas Biological Survey, University of Kansas, Lawrence, KS, 66047, USA; Laboratory of Populations, Rockefeller University, 1230 York Ave, New York, NY, 10065, USA
<!--\noindent \emph{Possible alternative titles:}
Complex and informative dependencies propagate throughout ecology and can be revealed using copulas
Copulas reveal complex and informative dependencies propagating throughout ecology
Copulas and their importance for ecology
Copulas and their importance in ecology
Copulas and their importance throughout ecology
Copulas and their potential throughout ecology
Copulas and their importance
Copulas in ecology
Copulas
The importance of copulas
The importance of copulas in ecology-->
\noindent \emph{Correspondence:} Daniel Reuman, 2101 Constant Ave, Lawrence, KS, 66047, [email protected], 626 560 7084.
\noindent \emph{The number of words in the abstract:} 241
\newpage
\noindent \textbf{Abstract}
\noindent All branches of ecology study relationships among and between environmental and biological variables. However,
standard approaches to studying such relationships, based on correlation and regression,
provide only some of the complex information contained in the relationships. Other
statistical approaches exist that provide a complete description of relationships between variables,
based on the concept of the *copula*; they are applied in finance, neuroscience and elsewhere,
but rarely in ecology. We explore the concepts that underpin copulas and the potential
for those concepts to improve our understanding of ecology. We find that informative copula
structure in dependencies between variables is common across all the environmental,
species-trait, phenological, population, community, and ecosystem functioning datasets we
considered. Many datasets exhibited asymmetric tail associations, whereby two variables were more
strongly related in their left compared to right tails, or *vice versa*. We describe mechanisms by which
observed copula structure and tail associations can arise in ecological data, including a Moran-like
effect whereby dependence structures are inherited from environmental variables; and asymmetric or nonlinear influences of environments on ecological variables, such as
under Liebig's law of the minimum. We also describe consequences of copula structure for
ecological phenomena, including impacts
on extinction risk, Taylor's law, and the temporal stability of ecosystem services. By documenting the
importance of a complete description of dependence between variables, advancing conceptual frameworks, and
demonstrating a powerful approach, we encourage widespread use of copulas in ecology, which we
believe can benefit the discipline.
\noindent \emph{Keywords:} dependence, regression, correlation, copula, population, community, ecosystem functioning
\newpage
# Introduction\label{Intro}
All branches of ecology
study relationships among biological variables and relationships between
environmental and biological
variables. However, commonly used correlation and regression
approaches to studying
such relationships are limited, and
provide only a partial description of the relationship.
For instance, datasets showing different
relationships may have the same correlation coefficient
(Fig. \ref{fig_cop_pedag1}A-B). The variables of Fig. \ref{fig_cop_pedag1}A
(respectively, Fig. \ref{fig_cop_pedag1}B) are principally
related in the left (respectively, right) portions of their distributions,
an asymmetric pattern of association that can
have ecological significance, as discussed below, but that is not captured by
correlation. Although correlation is only one way to
study relationships between variables, other common metrics also
provide only partial information.
\begin{figure}[!h]
\begin{center}
\includegraphics[width=3.5in]{./Results/PedagogFig1.pdf}
\caption{(A-D) Bivariate datasets showing diverse relationships between
variables. Some of these datasets nevertheless have the same Pearson (P)
or Spearman (S) correlation coefficients. (C) and (D) show normalized ranks of (A), (B),
respectively. (E, F) Reflections
of (C), (D), respectively, about a vertical axis. Subscripts h and v
in the axis labels stand for "horizontal" and "vertical".\label{fig_cop_pedag1}}
\end{center}
\end{figure}
Well-developed approaches do exist, however, and are
applied widely in other fields,
that provide a description of the association
between two or more variables that is mathematically provably
complete [@nelsen2006_copula; @joe2014_dependence; @MaiScherer2017].
The approaches are based on
the concept of the *copula*. Copula approaches separate information in a bivariate
random variable into two parts: the information in the
marginal distributions (which says nothing about the association between the
variables), and the remaining information (which is solely about
the association). We here introduce some simple concepts, based on ranks,
that relate to copulas and give a conceptual
flavor of copula ideas to be introduced
formally in the "Background" section below (section \ref{Background}). Given
a sample $(x_i,y_i)$, $i=1,2,\ldots,n$ (e.g.,
Fig. \ref{fig_cop_pedag1}A-B), information about the structure of the association
between $x$ and $y$ can be separated from information contained
in the marginal distributions by
considering the plot of $u_i$ versus $v_i$, where
$u_i$ is the rank of $x_i$ in the set of $x_j$ ($j=1,\ldots,n$), divided by
$n+1$; and $v_i$ is defined in the same way but using the $y_i$. The rank
of the smallest element in a set is $1$. The $u_i$ and
$v_i$ are called *normalized ranks* of the $x_i$ and $y_i$; they relate to
the empirical cumulative distribution function of the $x_i$ and $y_i$,
respectively. We henceforth abbreviate "cumulative distribution function" as "cdf".
See, for instance,
Fig. \ref{fig_cop_pedag1}C-D, which show the normalized ranks of
A-B. Ranking makes the marginal distributions of
the component datasets uniform, isolating the
dependence structure. Dependence structure and marginals can then be studied
separately. Normalized rank plots such as Fig. \ref{fig_cop_pedag1}C-D
relate to copulas in a way to be described
(section \ref{Background}). Note that Pearson correlation, even though it is the most commonly
used measure of association, is modified by monotonic
transformation of the component variables (Appendix
\ref{SM-pedagog1_details}, Fig. \ref{SM-fig_transf_pearson_spearman}),
and therefore reflects not only dependence information, but also
information on the marginals [@Genest2007].
Spearman and Kendall correlations, being rank-based,
are not influenced by monotonic transformations of the variables.
They provide information solely on the dependence between those variables.
A main benefit of a copula approach is
that it can detect associations in the tails of distributions, and asymmetries of such associations.
Tail association (introduced formally in section
\ref{Background}) is association between extreme values of variables.
If smaller values of two positively associated variables are more
strongly associated than are larger values, the variables are said to
have stronger lower- or left-tail association than right- or upper-tail association
(Fig. \ref{fig_cop_pedag1}C);
and *vice versa* if larger values are more strongly associated
(Fig. \ref{fig_cop_pedag1}D). Datasets of the same
Spearman or Kendall correlation can have a range of tail associations
(Fig. \ref{SM-fig_cop_pedag}), which can be quantified (see sections on Background and
methods for Q1 below, sections \ref{Background} and \ref{Methods}).
Although copulas can be used to model the entire
complex dependence structure of data, we have found tail associations to
be a useful component of that structure, so tail association
is a main focus of this paper. We give four examples, below, which
illustrate some of the ecological meaning of tail associations.
The goal of this paper is to explore the potential for
applications of copulas in ecology and to
estimate to what extent ecological understanding may benefit from
using copulas. We provide an introduction
to copula concepts in section \ref{Background}.
Readers can find additional abbreviated [@Anderson2018; @Genest2007]
and comprehensive [@nelsen2006_copula; @joe2014_dependence; @MaiScherer2017]
introductions elsewhere. Copulas have already been used effectively
in a few studies to illuminate ecological phenomena
[@Valpine2014; @Anderson2018; @Popovic2019], but such usage is
still rare. Whereas @Anderson2018 provide a specific method, based
on copulas, for important problems of the
analysis of multivariate population count data,
we instead advocate more broadly for the application of copulas across ecology.
Our results suggest that environmental,
ecological and evolutionary processes
may commonly generate complex dependence structures, including asymmetric tail
associations, and that greater use of copulas can illuminate underlying processes.
We believe copula approaches are among the tools all ecologists
should be considering for analysis of their data in the 21st century.
\begin{figure}[!h]
\begin{center}
\includegraphics[width=7in]{./MasterFigure.pdf}
\caption{Summary and guide for analyses presented in the text. Middle boxes
correspond to ecological datasets for which Q1 was examined (in sections
\ref{Data}-\ref{Results}, see also Table \ref{tab_data_info}). Upper boxes
correspond to potential causes of non-normal (non-N) copula structure that
were examined (in sections \ref{Causes}-\ref{Results_Q2} of the text). Lower boxes
correspond to potential consequences we examined of non-normal copula structure for
ecological understanding and for applications (in sections
\ref{Consequences}-\ref{Results_Q3} of the text). Arrow labels (A-D for analyses pertaining
to causes, Q2; and X-Z for analyses pertaining to consequences, Q3)
link to locations in the text.\label{MasterFigure}}
\end{center}
\end{figure}
One way we may expect, *a priori*, a study of tail associations and
copula dependence structures
to better reflect ecological relationships relates to Liebig's law of the
minimum. Liebig's law is
the idea that growth is controlled not by total resources but by the resource which
is scarcest relative to organism needs.
If, for instance, the growth of a plant depends on soil
nitrogen, N, and other factors, a plot of growth rates versus soil N may
look more like Fig. \ref{fig_cop_pedag1}A or C than like
Fig. \ref{fig_cop_pedag1}B or D, i.e., the two variables may show
left-tail association: N
controls plant growth, producing a clear
relationship, only when it is limiting.
This aspect of the association would be visible in a rank-based
plot, but may not be revealed by correlation.
A second reason why we may expect detailed understanding of dependence structures
to benefit ecological
research is that prior work demonstrated, using copulas, complex
structure in the spatial dependence of environmental
variables [@Serinaldi2008; @She2018; @Goswami2018;
@li2013]. An environmental variable
measured through time in two locations may show strong tail associations between
the locations if
intense meteorological events are also widespread, as seems
frequently to be the case: extreme values are associated with intense events,
so happen in both places at the same time, whereas moderate values of the
environmental variable may be associated with local phenomena, which differ
between the locations.
Spatial dependence in an environmental variable
tends to beget spatial dependence
in fluctuations of populations or other ecosystem variables
(this is called \emph{spatial synchrony}) influenced by the environmental
variable. This is called the \emph{Moran effect}.
If asymmetric tail associations, or other complex dependence structure, is transmitted
from environmental to ecological variables, then we would expect complex dependence
structure and tail associations to be a common feature of the
spatial synchrony of population, community,
biogeochemical and other environmentally influenced ecological variables.
Synchrony attracts major interest in
ecology [@Liebhold2004; @Sheppard2016; @walteretal2017].
A third reason why we may expect an understanding of complex
dependence structures and tail associations to benefit
ecology is that such an approach may help illuminate
causal mechanisms between variables. If two species,
Sp1 and Sp2, are strong competitors, abundances of the two species
across quadrats should be negatively related, as in
Fig. \ref{fig_cop_pedag1}E-F. If Sp1 is plotted on the horizontal axis
and Sp2 on the vertical, then Fig. \ref{fig_cop_pedag1}E
may suggest Sp1 is the dominant
competitor: when Sp1 is abundant, Sp2 is necessarily rare because it is suppressed;
whereas when Sp1 is rare, Sp2
may be abundant, or may also be relatively rare due to
limiting factors other than Sp1.
Alternatively, Fig. \ref{fig_cop_pedag1}F may suggest Sp2
is the dominant competitor. Other causal hypotheses may produce
similar dependence structure, and it is usually impossible to obtain
complete information on causal pathways from analyses which are
fundamentally based on associations. However,
examination of tail associations may
be combined with biological reasoning to rule out
some causal hypotheses which could not previously be eliminated.
@Popovic2019 used copulas to illuminate causal relationships
between species.
A fourth reason why we may expect an understanding of tail associations and complex
dependence structures to be useful
for ecology has to do with spatially aggregated or averaged quantities.
Many ecological variables of applied importance
depend on the spatial average or sum of local
quantities. For instance, regional methane and CO$_2$
fluxes are the sum of local fluxes;
and the total economic value of a fishery is the sum of local catches.
We will explore how tail associations between fluctuations of local
quantities can influence fluctuations of
the spatial mean or sum, and how this may influence
higher organizational levels in ecology and human concerns.
To illustrate this idea we cite @li2013, who demonstrated that the overall
reliability of wind-generated electricity depends sensitively on details
of the dependence between wind speeds at multiple generator sites.
Spatially aggregated ecological variables may be subject to a similar
effect. For instance, if populations of a
pest species in different locations are all positively associated and are also more strongly
related to each other in their right tails, then local outbreaks will tend
to occur together, creating regional epidemics.
Stronger left-tail associations in a pest species, even if overall
correlation were the same, would have more benign effects.
We approached our overall goal of exploring the potential of copula approaches
for ecology by addressing the following specific questions. (Q1) Do datasets
in ecology have dependence structure distinct from that of a multivariate
Gaussian/normal
distribution (here called non-normal copula structure)? Do positively associated
ecological variables show tail
associations distinct from those of a normal distribution, and in particular
do they show asymmetric tail associations? Normal copula structure is assumed
by standard approaches that use multivariate
normal distributions or distributions obtained by transforming
the marginals of a normal distribution. So Q1 asks whether ecological
data contain dependency information distinct from a standard or default case.
(Q2) If the answer to
Q1 is "yes", then what are some possible causes/mechanisms of non-normal
copula structure and asymmetric tail associations in ecology? And (Q3), what are the
consequences of non-normal copula
structure and asymmetric tail associations for ecological understanding and
applications? After section \ref{Background}, 'Background on copulas', we address Q1
(sections \ref{Data} through \ref{Results} of the paper)
by analyzing several datasets, including
environmental, species-trait, phenological, population, community, and
ecosystem functioning data, selected to span multiple
sub-fields and organizational levels of ecology.
We address Q2 (sections \ref{Causes} through \ref{Results_Q2}) using
simple models. We address Q3 (sections \ref{Consequences} through \ref{Results_Q3}) using
both data and models. Section \ref{Discussion} is the Discussion.
Multiple analyses are brought to bear on each question, and these
are summarized diagrammatically in Fig. \ref{MasterFigure}, which can
serve as a *post hoc* guide to the paper.
Copula approaches have been used to great effect in neuroscience [@onken2009],
bioinformatics [@Kim2008], medical research [@Emura2016], direct study of environmental
variables [@Serinaldi2008; @She2018; @Goswami2018; @li2013], and finance [@Li2000],
and have also been used effectively, but rarely so far, in ecology [@Valpine2014;
@Anderson2018; @Popovic2019].
We argue that benefits of wider use of
copula descriptions of dependence in ecology will be substantial.
# Background on copulas\label{Background}
We give a brief introduction to copulas, focusing, for simplicity, on bivariate
copulas and on concepts needed
for the rest of the paper. We assume familiarity with the basic
language of probability distributions and random variables.
See @nelsen2006_copula, @Genest2007, @joe2014_dependence, and
@MaiScherer2017 for general background on copulas,
and see also @Anderson2018, part of
which is a short introduction to
copulas for ecologists. A bivariate *copula* can be defined as a bivariate cdf
with both margins uniform on $(0,1)$ (@joe2014_dependence, p. 7).
This will be the
cdf of a bivariate random variable with uniform marginals, and
the terminology *copula* is sometimes also applied to this random vector.
A foundational theorem of
@sklar1959_theorem (see also @MaiScherer2017, p. 16)
says that if $F$ is the cdf
of a random vector $(X,Y)$, with margins $F_X$ and $F_Y$, then there
exists a copula $C$
such that for all $(x,y)$ in the Euclidean plane, $F(x,y)=C(F_X(x),F_Y(y))$.
The theorem also says $C$ is unique if $F_X$ and $F_Y$ are continuous.
Thus $C$ couples the bivariate cdf, $F$, with the
cdfs of the marginals.
Finally, Sklar's theorem says that if $D$ is any bivariate copula and
$G_A$ and $G_B$ are univariate cdfs, then
$D(G_A(x),G_B(y))$ is a valid cdf
of some bivariate random variable.
The applications of this study fall within the case in which univariate marginal
distributions have continuous, strictly
monotonic cdfs, and this case is simpler. So we
henceforth make such an assumption. Table \ref{SM-tab_summary_notation} provides a summary
of notation.
<!--***DAN: Shyamolina, once I email you an edited version of the
table you produced, please make a table in the sup mat Rmd and cause it to be
cited here in place of the SX above. Please place the table in such
a way that SX is replaced with S1, since I believe this is now going
to be the first supplementary table we are citing.-->
Sklar's theorem implies that
any random vector can be constructed from a unique copula and marginal
cdfs, and, furthermore,
any copula and any univariate cdfs give rise to a random vector;
so Sklar's theorem provides a correspondence between random vectors
and pairs consisting of a copula and two univariate cdfs,
thereby making it possible to study a random vector by studying these two
constituents.
Marginals contain no information about the
dependence structure of a random vector, so the copula contains
all such information - it is a complete description of the dependence
between variables.
<!--***-->
The univariate cdfs associated with a random vector
$(X,Y)$ are its margins, $F_X$ and $F_Y$, and the
associated copula is the copula, $C$, guaranteed by Sklar's theorem
(and guaranteed unique, thanks to the assumptions of continuity
and monotonicity made above). In fact, and making Sklar's correspondence
more concrete, we show in Appendix \ref{SM-ContStrictMonoton}
that $C$ is the bivariate distribution
function of the random variable $(F_X(X),F_Y(Y))$.
<!--***-->
Conversely, given a copula
$D$ and univariate cdfs $G_A$ and $G_B$,
Sklar's theorem guarantees that $D(G_A(x),G_B(y))$ is a
cdf, so the random variable with this
cdf is the random variable corresponding
to $D$, $G_A$ and $G_B$ under Sklar's correspondence.
Again making Sklar's correspondence
concrete, we show in Appendix \ref{SM-ContStrictMonoton}
that this random variable is $(G_A^{-1}(U),G_B^{-1}(V))$,
where $G_A^{-1}$ and $G_B^{-1}$ are the inverses of $G_A$ and $G_B$
and $(U,V)$ is the random variable with cdf $D$
(which has uniform marginals).
If we conflate a copula with the random vector of
which the copula is the cdf, then Sklar's correspondence
is between random variables, $(U,V)$, with uniform marginals
and random variables, $(X,Y)$, with arbitrary continuous, strictly monotonic
marginals. The correspondence is simply via the application of the
univariate cdfs, or their inverses: $U=F_X(X)$,
$V=F_Y(Y)$, $X=F^{-1}_X(U)$, $Y=F^{-1}_Y(V)$.
Thus Sklar's theorem makes it possible to construct
bivariate distributions in two separate steps: by specifying the marginal
distributions and by specifying the dependence structure.
Construction of several bivariate distributions is carried out in this way in
Fig. \ref{PedagogFigCopThry}. The contrast between Fig. \ref{PedagogFigCopThry}C and D
illustrates in particular how a bivariate distribution can be changed, while retaining
the same marginals, by changing the copula. Fig. \ref{PedagogFigCopThry}C is a
bivariate normal distribution, but Fig. \ref{PedagogFigCopThry}D clearly
is not, although its marginals
are the same. Fig. \ref{PedagogFigCopThry}D shows stronger association of the two variables
in the left than in the right tails. Sklar's theorem
also makes it possible to study and model dependence separately
from marginal distributions. For instance, in a pedagogical
example, @Anderson2018 modelled the dependence between abundance
data for two fish species, red moki and black angelfish, by first
modelling the marginal distributions and then modelling
the dependence using Gaussian copulas (their section 2).
See the Discussion for a few words on multivariate copulas, which can
be developed in the same way as above [@nelsen2006_copula; @joe2014_dependence;
@MaiScherer2017; @Anderson2018].
\begin{figure}[!h]
\begin{center}
\includegraphics[width=6.5in]{./Results/PedagogFigCopThry.pdf}
\caption{Normal (A) and Clayton (B) copulas were combined with standard normal
marginals (C, D) and gamma marginals (E, F) via Sklar's theorem to produce
bivariate distributions. Note that a normal copula is distinct from a normal marginal
distribution: a normal copula is the copula of a bivariate normal distribution.
See section \ref{Background} for more information on normal, Clayton,
and other copulas. Each copula was used with both sets of marginals and each
set of marginals was used with both copulas, to demonstrate that: both copula and
marginals contribute fundamentally to the resulting distribution, the copula
contributing the information on the dependence between the variables; and that
one can select the copula and marginals independently. Bivariate
distributions, including the copulas, are depicted via their log-scale probability density functions (pdfs),
and marginals are depicted via their linear-scale pdfs. Grey dots are
50 random samples from each distribution. The parameter $0.7$ was used for the normal
copula, the parameter $2$ for the Clayton
copula, and shape and scale parameters $5$ and $1$ for the gamma marginals.
Variables $u$ and $v$ were used for copulas
and $x$ and $y$ for distributions created by pairing a copula with
(non-uniform) marginals.
\label{PedagogFigCopThry}}
\end{center}
\end{figure}
We introduce some standard copula families, as examples, and also
because we will fit these families to data. The bivariate normal
copula family, which consists of copulas of bivariate normal
distributions, is a one-parameter family. The parameter, $p$,
corresponds to the correlation
of the related bivariate normal distribution, and controls the degree of association
between the variables. A normal copula with $p=0.7$ was already introduced (Fig.
\ref{PedagogFigCopThry}A). The formula for the
copula is $\Phi_{2,p}(\Phi_{1}^{-1}(u),\Phi_{1}^{-1}(v))$, where $\Phi_1^{-1}$ is the inverse of the
cdf of a (univariate) standard
normal and $\Phi_{2,p}$ is the cdf of a
bivariate normal distribution with mean $(0,0)$ and covariance matrix having $1$s on the diagonal
and $p$ in the off diagonal positions. Formulas for all the
copula families we use were provided by @Brechmann2013.
However, formulas for copulas are often not as
directly informative about copula properties as probability density function (pdf)
pictures; Fig. \ref{figped_N} has
example pdfs corresponding to bivariate normal copulas for various values of $p$.
Note that the pdfs are symmetric
around the diagonal line $v=-u+1$ (Fig. \ref{figped_N}), so normal copulas have symmetric
associations between the two variables in the left and right tails.
The Clayton copula family,
of which we already pictured an example (Fig.
\ref{PedagogFigCopThry}B), is another one-parameter family. In contrast to normal copulas, Clayton copulas
have stronger left- than right-tail association. The formula is
$\left[ \reumax(u^{-p}+v^{-p}-1,0)\right]^{-1/p}$, for parameter $p$; though this again provides
less direct intuition than do example pdfs (Fig. \ref{figped_C}). Higher values of the parameter,
$p$ produce copulas with higher Kendall or Spearman correlation. The survival
Clayton family is the symmetric opposite of the Clayton family, showing
stronger right- than left-tail association (Fig. \ref{SM-figped_SC}). The BB1 copula
family is a two-parameter family, thereby providing more flexibility: for some parameters
BB1 copulas have stronger left- than right-tail association, and for others the reverse
(Fig. \ref{figped_BB1}). See @nelsen2006_copula, @joe2014_dependence and
@Brechmann2013 for further
information on these and other families.
Having introduced tail association conceptually and referred to it
in examples, we now need a precise definition of a measure of tail association.
We use the term *tail association* to describe the general idea of the strength of
association between two variables in the tails of their distributions. One
way this is measured is called *tail dependence* (@joe2014_dependence, section 2.13;
@nelsen2006_copula, section 5.4), defined here. Given a random vector $(X,Y)$
with margins $F_X$, $F_Y$ and defining $U=F_X(X)$, $V=F_Y(Y)$, the upper-tail
dependence of $X$ and $Y$ is defined as
$\lambda_U = \lim_{u \rightarrow 1^-} P[Y>F_Y^{-1}(u) | X>F_X^{-1}(u)]$. This equals
$\lim_{u \rightarrow 1^-} P[U>u | V>u]$, which in turn equals
$\lim_{u \rightarrow 1^-} P[U>u, V>u]/P[V>u]=\lim_{u \rightarrow 1^-} P[U>u, V>u]/(1-u)$,
which shows upper-tail dependence is defined symmetrically in the two variables.
All the variables we consider were positively associated when they were significantly
associated, so we think of positively associated variables when conceptualizing the
definitions here. Lower tail dependence is defined analogously, as
$\lambda_L = \lim_{u \rightarrow 0^+} P[Y \leq F_Y^{-1}(u) | X \leq F_X^{-1}(u)]$.
Tail dependence is a property of $(U,V)$, so depends only
on the copula and not on the marginals combined with it. It can be shown
(@nelsen2006_copula, section 5.4) that $\lambda_L=\lambda_U=0$ for the normal copula,
$\lambda_L=2^{-1/p}$ and $\lambda_U=0$ for a Clayton copula with parameter $p>0$,
and $\lambda_L=0$ and $\lambda_U=2^{-1/p}$ for a survival Clayton copula with parameter $p>0$.
In section \ref{Methods} we will introduce
ways tail dependence can be applied to data, by first fitting copulas to the data. We will also
introduce nonparametric measures of tail association.
Plots using normalized ranks were proposed in the Introduction as being conceptually
similar to copulas; we explain the connection. As stated above, the copula associated
with a random variable $(X,Y)$ is the cdf of the random variable
$(F_X(X),F_Y(Y))$. Given a sample $(x_i,y_i)$, $i=1,\ldots,n$ from $(X,Y)$, let $\hat{F}_X$ and
$\hat{F}_Y$ be the empirical cdfs associated with the $x_i$ and $y_i$, respectively.
These are step functions of $x$ and $y$, respectively, that start at $0$ for low values of
$x$ and $y$ and jump by $1/n$
at each of the data points. Therefore $\hat{F}_X(x_j)$ is the rank of $x_j$ in
the set $\{x_i : i=1,\ldots,n\}$, divided by $n$,
and analogously $\hat{F}_Y(y_j)$ is the rank of $y_j$ in the set
$\{y_i : i=1,\dots,n\}$, divided
by $n$. For large $n$, $\hat{F}_X(x_i)$ and $\hat{F}_Y(y_i)$
approximate the cdfs $F_X$ and $F_Y$, respectively.
But $\hat{F}_X(x_i)$ equals the normalized rank of $x_i$ times
$(n+1)/n$, and likewise $\hat{F}_Y(y_i)$ equals the normalized rank of $y_i$ times
$(n+1)/n$. So the normalized rank in turn approximates the empirical cdf and therefore
the cdf. Thus the normalized rank pairs $(u_i,v_i)$ for $i=1,\ldots,n$ can be
regarded as approximate samples from the random variable $(F_X(X),F_Y(Y))$,
which is the random variable associated with the copula of $(X,Y)$. The scatterplot
of $v_i$ versus $u_i$ can be used to infer aspects of copula
structure (see section \ref{Methods} below).
\begin{figure}[!h]
\begin{center}
\includegraphics[width=\textwidth]{./Results/PedagogSuppMat_Normal.pdf}
\caption[Example normal copulas]{Log-transformed pdfs (A-E) and samples (F-J) from example normal copulas.
K is Kendall correlation; $p$ is the value of the parameter for the normal
family (it is a
one-parameter family); and LT and UT are the measures of lower- and upper-tail dependence, respectively (for the definitions of these, see the section on Background on copulas,
section \ref{Background}). The
parameter range for the family is $p \in [-1,1]$, and lower- and upper-tail
dependence are always $0$.\label{figped_N}}
\end{center}
\end{figure}
\begin{figure}[!h]
\begin{center}
\includegraphics[width=\textwidth]{./Results/PedagogSuppMat_Clayton.pdf}
\caption[Example Clayton copulas]{Log-transformed pdfs (A-E) and samples (F-J) from example Clayton
copulas. K is Kendall correlation; $p$ is the value of the parameter for the Clayton family (it
is a one-parameter family); and LT and UT are the measures of lower- and
upper-tail dependence,
respectively (for the definitions of these, see the section on Background on copulas,
section \ref{Background}). The parameter range for the family
is $p \in (0,\infty)$, lower-tail dependence is $2^{-1/p}$ and upper-tail
dependence is $0$.\label{figped_C}}
\end{center}
\end{figure}
\begin{figure}[!h]
\begin{center}
\includegraphics[width=\textwidth]{./Results/PedagogSuppMat_BB1.pdf}
\caption[Example BB1 copulas]{Log-transformed pdfs for example BB1 copulas. K is Kendall correlation; $p_1$ and $p_2$
denote the two parameters of the family (it is a
two-parameter family); and LT and UT are our measures of lower- and upper-tail dependence,
respectively (for the definitions of these, see the section on Background on copulas,
section \ref{Background}). The parameter ranges for
the family are $p_1 \in (0,\infty)$ and $p_2 \in [1,\infty)$,
lower-tail dependence is $2^{-1/(p_1 p_2)}$ and upper-tail
dependence is $2-2^{1/p_2}$.\label{figped_BB1}}
\end{center}
\end{figure}
# Data \label{Data}
```{r read_somedata, echo=F, results='hide'}
d_soilCN<-read.csv("./Data/SoilCN/SoilCNdata.csv")
rownames(d_soilCN)<-d_soilCN$X
d_soilCN<-d_soilCN[,-1]
d_soilCN<-na.omit(d_soilCN)
d_birdsBMR<-readRDS("./Data/BMR/BirdBodyMassesMetabolicRates/my_birdsBMR_data.RDS")
d_mammalsBMR<-readRDS("./Data/BMR/MammalianBodyMassesMetabolicRates/my_mammalsBMR_data.RDS")
d_cc<-readRDS("./Results/Ceder_creek_results/cop_HB_all_yr.RDS")
```
The datasets we used included environmental, species-trait, phenological,
population, community, and ecosystem functioning data (Table \ref{tab_data_info}),
selected to span multiple fields and levels
of organization within ecology. Copula structure
between atmospheric weather variables such as rainfall or wind speed,
as measured in multiple locations through time, has been
examined previously in the meteorological literature [e.g., @Serinaldi2008; @li2013]. We therefore
examined environmental variables from the soil instead, using the Rapid Carbon
Assessment database [RaCA; @Wills2014].
<!--***DAN: Add Wills and Loecke pending reference if published in time-->
The database comprises measurements of soil organic carbon and total soil
nitrogen (megagrams C or N per hectare of soil surface) from `r nrow(d_soilCN)`
locations across the coterminous United States (Fig. \ref{SM-fig_SoilCNmapUSA} for locations).
Species-trait data were average species basal metabolic rate (BMR, KJ per hour) and
body mass (grams) for `r nrow(d_birdsBMR)` species of birds [@Mcnab_2009]
and `r nrow(d_mammalsBMR)` species of mammals [@Mcnab_2008].
These data have been much studied, but to our knowledge the copula structure
of the association has not been examined.
Species-trait data such as these reflect the coevolution of the two traits.
Phenological data were aphid first-flight dates from 10 locations (Fig. \ref{SM-fig_aphid_map})
across the United Kingdom (UK) for 20 aphid species (Table \ref{SM-tab_aphid_info}), for the
35 years 1976 to 2010. These time series were computed from the Rothamsted Insect
Survey (RIS) suction-trap dataset [@Harrington2014; @Bell2015]. The first of our two
population-level datasets was also derived from the RIS suction-trap data, and comprised
total counts of aphids trapped for the same locations, species, and years.
Our second population dataset comprised average annual plankton abundance estimates
for 14 locations (Fig. \ref{SM-fig_plankton_map})
in the North Sea and British seas for 22 taxa (Table \ref{SM-tab_plankton_info}) for the
56 years 1958 to 2013. The locations are $2^\circ$ by $2^\circ$ grid cells.
These data were computed from the Continuous Plankton Recorder survey of the
UK Marine Biological Association.
Community-level data, obtained from the Cedar Creek Ecosystem Science Reserve, were
plant aboveground biomass [@ccdataple120] and Shannon's diversity index (computed from plant
species percent cover data [@ccdatapce120]) for `r nrow(d_cc[[5]])` plots [@ccdataplotse120],
each 9m by 9m, as sampled in the years 1996-2000 and 2007, each year
analyzed separately [@Tilman2001; @Tilman2006].
Finally, ecosystem functioning data were methane (CH$_{4}$) fluxes between the soil or
water surface and the troposphere, measured at 13 locations at daily to weekly intervals
from September 2015 to September 2016 at the Great Miami Wetland mitigation bank in Trotwood,
Ohio [@Holland1999; @Jarecke2016; @smyth2019using]. Each included location was measured on at least 50 dates. See Appendix \ref{SM-Data_details} for additional data details.
Our environmental, species-trait, and community datasets happen to be
"bivariate" datasets in the sense that they comprise two quantities measured at
different locations or for different species (Table \ref{tab_data_info}).
Our phenology, population, and ecosystem functioning datasets are
"multivariate" in that they comprise, for each taxon, measurements through
time at multiple locations;
copula structure was studied for each location pair.
<!--summary table with info of all data used in Paper-->
```{r tab_summary_data_info, echo=F, results="asis",message=F}
library(kableExtra)
library(dplyr)
tab_data_info<-as.data.frame(matrix(NA,nrow=8,ncol=4))
colnames(tab_data_info)<-c("Data","Category","No. of measurements we used", "References")
tab_data_info$Data<-c("Soil C and N","Bird body masses and BMR", "Mammal body masses and BMR",
"Green spruce and other aphid species abundances", "Leaf-curling plum and other aphid first flight dates","\\textit{Ceratium furca} and other plankton taxa abundances", "Plant diversity and aboveground biomass","Methane-flux")
tab_data_info$Category<-c("Environmental","Species-trait","Species-trait",
"Population","Phenological","Population","Community level","Ecosystem functioning")
tab_data_info$`No. of measurements we used`<-c(paste(nrow(d_soilCN)," locations",sep=""),
paste(nrow(d_birdsBMR)," birds",sep=""),
paste(nrow(d_mammalsBMR)," mammals",sep=""),
"10 locations with at least 30-yr. timeseries for each of 20 sp.",
"10 locations with at least 30-yr. timeseries for each of 20 sp.",
"14 locations with at least 45-yr. timeseries for each of 22 taxa",
paste(nrow(d_cc[[5]])," plots",sep=""),
"13 locations with at least 50 dates of data")
tab_data_info$References<-c("US Rapid Carbon Assessment database (RaCA), Wills et al. (2014)", "McNab (2009)","McNab (2008)",
"Rothamsted Insect Survey","Rothamsted Insect Survey","Continuous Plankton Recorder Survey",
"Cedar Creek Ecosystem Science Reserve, biodiversity experiment e120",
"Great Miami Wetland Mitigation Bank, Smyth et al. (2019)")
knitr::kable(tab_data_info,
#format="latex", align="l",
format="latex", align="l",escape=F,
caption = "Summary table for the data we used. Bold entries are the multivariate data, the rest are bivariate datasets (see section \\ref{Data}). Basal metabolic rate=BMR. \\label{tab_data_info}",
booktabs = T,linesep = "\\addlinespace") %>%
column_spec(1:4,width="3.4 cm")%>%
row_spec(c(4,5,6,8), bold=T)
```
```{r tab_summary_copula_info, echo=F, results="asis",message=F}
library(kableExtra)
library(dplyr)
tab_copula_info<-as.data.frame(matrix(NA,nrow=16,ncol=4))
colnames(tab_copula_info)<-c("Copula family","No. of parameters","Tails with stronger association", "Reference")
tab_copula_info$`Copula family`<-c("Normal",
"Clayton","Survival Clayton",
"Gumbel","Survival Gumbel",
"Joe","Survival Joe",
"Frank",
"BB1","Survival BB1",
"BB6","Survival BB6",
"BB7","Survival BB7",
"BB8","Survival BB8")
tab_copula_info$`No. of parameters`<-c("1",
"1","1",
"1","1",
"1","1",
"1",
"2","2",
"2","2",
"2","2",
"2","2")
tab_copula_info$`Tails with stronger association`<-c("Neither (symmetric)",
"Lower","Upper",
"Upper","Lower",
"Upper","Lower",
"Neither (symmetric)",
"Either (depending on params)","Either (depending on params)",
"Upper","Lower",
"Either (depending on params)","Either (depending on params)",
"Upper or symmetric (depending on paras)","Lower or symmetric (depending on params)")
# caution: this fig. numbers are not auto-linked
tab_copula_info$Reference<-c("Fig. 4",
"Fig. 5","Fig. S3",
"Fig. S7","Fig. S8",
"Fig. S9","Fig. S10",
"Fig. S11",
"Fig. 6","Fig. S12",
"Fig. S13","Fig. S14",
"Fig. S15","Fig. S16",
"Fig. S17","Fig. S18")
knitr::kable(tab_copula_info,
format="latex", align="l",escape=F,
caption = "Summary table for the sixteen copula families we used (see section \\ref{Background}). For the parameters we consider, all copula families model positive association between variables. \\label{tab_copula_info}",
booktabs = T,linesep = "\\addlinespace") %>%
column_spec(2,width="2 cm")
```
# Concepts and methods for Q1 \label{Methods}
Recall that Q1 has three parts: Do datasets
in ecology have non-normal copula structure? Do positively associated
ecological variables show tail
associations distinct from those of a normal distribution? And in particular
do they show asymmetric tail associations?
We addressed Q1 first via a model selection procedure in which several
families of copulas were fitted to
our ecological datasets and fits were compared via the Akaike and Bayesian
Information Criteria (AIC and BIC). One of the fitted copulas was the normal
copula, making possible comparisons of the degree to which the normal versus
other copula families were good descriptions of data.
For bivariate datasets
$(x_i,y_i)$ for $i=1,...,n$, model selection involved several steps.
First, we produced normalized ranks
$u_i$ and $v_i$ as in the Introduction.
Second, we tested the independence of the $u_i$ and $v_i$
using the statistic $\sqrt{(9n(n-1))/(4n+10)}|\tau|$, where $\tau$
is Kendall's tau for the data. @Genest2007 argue that this statistic is approximately standard normally
distributed. We used the
implementation of this test in `BiCopIndTest` in the `VineCopula` package
in R. We tested for independence because our model selection
algorithms were ineffective if data could not
be distinguished from independent data,
since many copula families include the independent copula.
If independence could be rejected ($0.05$ level), model
selection proceeded. Third, we fit 16 bivariate copula families (see below) to the
normalized ranks via maximum likelihood. The approach of fitting
copula families to normalized ranks was recommended
by @Genest1995 and @Shih1995. Their estimator of copula
parameters, which we use, is consistent, asymptotically normal,
and fully efficient at indepencence [@Genest1995]. @Genest2007
recommend carrying out inferences of dependence structures (which was our
goal here) using normalized ranks.
We used the fitting implementation given in `BiCopEst` in `VineCopula`.
Fourth, we obtained AIC and BIC values and accompanying model weights
$\text{AIC}_{\text{w}}$ and
$\text{BIC}_{\text{w}}$ [@burnham2003_modelselection] for each fitted copula.
`BiCopEst` also
provided lower- and
upper-tail dependence of the best-fitting member of each family.
$\text{AIC}_{\text{w}}$ values were used to get
model-averaged lower- and upper-tail dependence values using standard model averaging
formulas [@burnham2003_modelselection]; likewise for BIC.
Thus for each bivariate dataset,
the end products of our procedure were threefold, corresponding to the
three parts of Q1 listed in the Introduction: A) the AIC, BIC, $\text{AIC}_{\text{w}}$ and
$\text{BIC}_{\text{w}}$ values for each of our 16 copula families (see below for list),
including the normal family, providing an inference as to whether the normal
copula or an alternative was better supported by
data; B) lower- and upper-tail dependence measures for each fitted copula and
model-averaged tail dependence measures, providing information on whether,
and to what extent, tail dependence differed from the tail dependence of
a normal copula (i.e., 0); and C) the difference of lower- and upper-tail
dependence measures for each fitted copula and model averages of those
quantities, providing information on whether tail
dependence was asymmetric.
Model selection methods give the *relative* support of several models, but do not
indicate whether any of the models are an objectively good fit. To test that,
we tested the goodness of fit of our AIC-best copula family using a
bootstrapping procedure of @Wang2000 and @genest2006goodness,
implemented as `BiCopGofTest` in `VineCopula`. The procedure performed
one test based on a Cramer-von Mises statistic and another based on a
Kolmogorov-Smirnov statistic. To keep computation times reasonable,
a run using $100$ bootstraps was performed. If
the $p$-value from either test was $<0.2$, tests were re-run with
$1000$ bootstraps.
We fit 16 bivariate copula families, exhibiting a variety of
lower- and upper-tail dependence characteristics, with bivariate datasets. The purpose
of using a large collection of families was to include a
variety of alternative dependence structures to have a robust model
selection and multi-model inference procedure. For that purpose, it
is not important for the reader to
understand the details of these copulas,
and, additionally, these copulas have been
described in detail elsewhere [@joe1997multivariate; @Brechmann2013]. So
we only briefly identify each family, say a few words about its tail
dependence, and provide pictorial descriptions (Figs
\ref{figped_N}-\ref{figped_BB1}, \ref{SM-figped_SC} and \ref{SM-figped_G}-\ref{SM-figped_SBB8}).
The pictorial descriptions can also be used to aid quick comparisons of copula
families when alternative families are being considered for future applications.
We used several families that can exhibit positive lower-tail dependence
(of strength depending on parameters) and
zero upper-tail dependence: the Clayton, survival Gumbel, survival Joe
and survival BB6 families. We used several families that can exhibit positive
upper-tail dependence (strength depends on parameters) and zero lower-tail
dependence: the survival Clayton, Gumbel, Joe and BB6 families. We used
families that show zero upper- and lower-tail dependence: the normal
and Frank families. These families have pdfs symmetric about the line
$v=-u+1$. We also used several families that can show both upper- and
lower-tail dependence, in relative amounts depending on parameter
values: the BB1, survival BB1, BB7 and survival BB7 families.
We also used the BB8 family, which shows zero lower-tail dependence,
and zero upper-tail dependence except for a boundary case for the parameters.
And we used the survival BB8 copula, which shows zero upper-tail dependence,
and zero lower-tail dependence except for a boundary case.
"Survival" families are rotations of the copula with the similar name by
180 degrees.
See @joe1997multivariate and @Brechmann2013 for details on all families. We used the implementations
provided in the `VineCopula` package for the R programming
language. See Figs \ref{figped_N}-\ref{figped_BB1}, \ref{SM-figped_SC} and \ref{SM-figped_G}-\ref{SM-figped_SBB8}
for visual depictions of the pdfs of these copulas and how
pdfs and tail dependence are influenced by the parameters. See
Table \ref{tab_copula_info} for a summary.
<!--***DAN: Info on families kept here for convenience, see also the file
CopulaFamilyParamRangesAndTailDependence.jpg, stored in the BIVAN folder:
lower not upper dependence:
Clayton (3), survival Gumbel (14), survival Joe (16),
SBB6 (18)
zero both, and symmetric:
normal (1), Frank (5)
upper not lower dependence:
SC (13), Gumbel (4), Joe (6)
BB6 (8)
both, in relative amounts that depend on parameters:
BB1 (7) - maybe need to look at formulas for this one to understand what the
limits are, since I only chose some random values of parameters in my pdf
pictures
SBB1 (17)
BB7 (9), SBB7 (19)
BB8 (10) - 0 LT dep, UT dep 0 except in a boundary case for the parameters
SBB8 (20) - reverse of BB8-->
For multivariate datasets, we performed the bivariate analysis
for all possible pairwise combinations of distinct locations. We carried out
pairwise bivariate analyses instead of trying to fit a multivariate copula,
for simplicity and because that approach was
sufficient to answer our research questions; but see the Discussion
for a few words on multivariate copulas.
We present
the number of pairs for which a non-normal copula
was the AIC-best copula, and we characterize AIC differences between
the normal and AIC-best copulas across location pairs.
We also computed the model-averaged lower- and
upper-tail statistics, and differences
between these, for each pair of locations, and we characterize
the distributions of these values across location pairs.
In addition to our model selection approach, we also used a
nonparametric approach, to provide greater
confidence in our answers to Q1. We used three
statistics which quantify the extent to which
the normalized ranks $u_i$ and $v_i$ are
related in any part of their distributions. We here describe the
statistics, with additional details in Appendix \ref{SM-nonparam_stats}.
The statistics are defined with positively associated
variables in mind. All our variables were positively associated
when they were significantly associated.
Given two bounds
$0 \leq l_b < u_b \leq 1$, define the lines $u+v=2l_b$ and
$u+v=2u_b$, which intersect the unit square (Fig. \ref{NonparamStatsFig}).
Our statistics quantify the association
between $u_i$ and $v_i$ in the region bounded by these lines.
Using $l_b=0$ and $u_b\leq 0.5$ gives information about
association in the left parts of the distributions of $u$ and $v$, and
using $u_b=1$ and $l_b \geq 0.5$ gives information about association in
the right parts. The first statistic, a partial Spearman correlation,
\begin{equation}
\cor_{l_b,u_b}(u,v)=\frac{\sum
(u_i-\mean(u)) (v_i-\mean(v))}{(n-1)\sqrt{\var(u)\var(v)}},\label{eq:partialspearman}
\end{equation}
\noindent is the portion of the Spearman
correlation of $u_i$ and $v_i$ that is attributable to the points
between the bounds.
Here sample means and variances are computed using all $n$ data
points, but the
sum is over only the indices $i$ for which $u_i+v_i > 2l_b$ and
$u_i+v_i < 2u_b$.
Larger values of the partial Spearman correlation indicate stronger positive association.
The sum of
$\cor_{0,0.5}(u,v)$ and $\cor_{0.5,1}(u,v)$ (or some other choice of
$\cor_{l_{b_k},u_{b_k}}(u,v)$ for bounds $l_{b_k},u_{b_k}$ that partition the
interval $(0,1)$) equals the standard Spearman correlation
as long as no points fall exactly on the bounds.
We also defined a statistic $\Ps_{l_b,u_b}$ (Appendix \ref{SM-nonparam_stats}),
which has a similar interpretation to $\cor_{l_{b},u_{b}}$.
Our third statistic, $\Dsq_{l_b,u_b}$, is the average squared
distance between points satisfying $u_i+v_i>2l_b$ and
$u_i+v_i<2u_b$ and the line $v=u$.
Unlike $\cor_{l_b,u_b}$ and
$\Ps_{l_b,u_b}$, for which large values indicate strong association
between the bounds, small values of
$\Dsq_{l_b,u_b}$ indicate strong association. These statistics are not estimators of the
tail dependence quantities
defined previously, but rather are conceptually similar measures of associations in
the tail portions of the distributions when appropriate values of $l_b$ and $u_b$ are
selected.
\begin{figure}[!h]
\begin{center}
\includegraphics[width=3.5in]{./Results/NonparamStatsFig.pdf}
\caption{The partial Spearman correlation, $\cor_{l_{b},u_{b}}(u,v)$,
within a band can be computed
for any band (section \ref{Methods}) to describe how the strength of association
between $u$ and $v$
varies from one part of the two distributions to another,
as can the statistics $\Ps_{l_b,u_b}(u,v)$ and $\Dsq_{l_b,u_b}(u,v)$. Diagonal lines
show two bands, the data in the left/lower band showing stronger
association than those in the right/upper band.\label{NonparamStatsFig}}
\end{center}
\end{figure}
For large datasets (large $n$),
we used $l_b$ and $u_b$ close together without incurring undue
sampling variation in our statistics, and we considered
multiple bands $(l_b,u_b)$ to understand how association
varies across different parts of the distributions. But for datasets with
smaller $n$ we considered only $l_b=0$, $u_b=0.5$ and $l_b=0.5$, $u_b=1$,
abbreviating $\cor_l=\cor_{0,0.5}$ ($l$ is for "lower")
and $\cor_u=\cor_{0.5,1}$ ($u$ is for "upper"). Likewise $\Ps_l=\Ps_{0,0.5}$,
$\Ps_u=\Ps_{0.5,1}$, $\Dsq_l=\Dsq_{0,0.5}$, $\Dsq_u=\Dsq_{0.5,1}$.
To test for asymmetry of association in upper and lower portions of distributions,
we used differences $\cor_l-\cor_u$, $\Ps_l-\Ps_u$ and $\Dsq_u-\Dsq_l$
for smaller datasets (note the opposite order
in the last of these); and $\cor_{0,u_b}-\cor_{1-u_b,1}$,
$\Ps_{0,u_b}-\Ps_{1-u_b,1}$ and
$\Dsq_{1-u_b,1}-\Dsq_{0,u_b}$ with $u_b$ close
to $0$ for large datasets.
We tested our statistics in Appendix \ref{SM-Test_npa_stat} (see also
Figs \ref{SM-fig_stat_testing35}-\ref{SM-fig_stat_testing1000}).
For bivariate datasets, we compared values
of $\cor_{l_b,u_b}$,
$\Ps_{l_b,u_b}$, $\Dsq_{l_b,u_b}$, $\cor_{0,u_b}-\cor_{1-u_b,1}$,
$\Ps_{0,u_b}-\Ps_{1-u_b,1}$ and
$\Dsq_{1-u_b,1}-\Dsq_{0,u_b}$ to
distributions of the same statistics computed on surrogate datasets that were produced from
the empirical data by randomizing it in a special way to have no
tail dependence (Appendix \ref{SM-surrog_test}).
The surrogate/randomized datasets had exactly
the same marginal distributions as the empirical data and had very similar Kendall or Spearman
correlation (the surrogate algorithm had two versions, one for preserving each correlation coefficient),
but had normal copula structure. Thus our comparisons tested the null hypothesis
that our statistics took values on the empirical data no different from what would have been expected
if the copula structure of the data were normal, but the data were otherwise