Here (below) are the data (obtained from the University of Dayton Average Daily Temperature Archive) with a smattering of credible regression curves superimposed on the data. There are 20 curves, each using different parameter combinations from 20 widely-separated steps in the MCMC chain.

The panels at the bottom of the figure show the marginal posterior distributions of the parameters. In particular, the linear trend parameter is credibly non-zero, and suggests that on average over the last 17 years, the average daily temperature has increased by 0.059 degrees Fahrenheit per year (i.e., about 0.59 degrees per decade).
Details:
Here is the JAGS model statement:
model {
for( i in 1 : Ndata ) {
y[i] ~ dt( mu[i] , tau , nu )
mu[i] <- beta0 + beta1 * x[i] + amp * cos( ( x[i] - thresh ) / wl )
}
beta0 ~ dnorm( 0 , 1.0E-12 )
beta1 ~ dnorm( 0 , 1.0E-12 )
tau ~ dgamma( 0.001 , 0.001 )
amp ~ dunif(0,50)
thresh ~ dunif(-183,183)
nu <- nuMinusOne + 1
nuMinusOne ~ dexp(1/29)
}
The predicted value, mu, is a linear trend, beta0 + beta1 * x[i], plus a sinusoidal deviation, amp * cos( ( x[i] - thresh ) / wl ). Notice that the data are assumed to be distributed as a t distribution, not as a normal distribution, around the predicted value mu. The heavy tails of a t distribution accommodate outliers. The df parameter of the t distribution, nu, is estimated from the data. (I put an exponential prior on nu; you can read more about "robust estimation" in this article.) In this version of the model, I assumed a fixed wavelength of 365.24219 days per cycle. That's a "tropical year". Hence the wl parameter in the model was set at 365.24219/(2*pi) to convert to radians. In other versions I estimated wl and it came out to be very close to the duration of a tropical year.
Quibbles:
Maybe the linear trend is an artifact because of the arbitrary alignment of start and stop dates on the cycle. Two reasons to doubt this explanation. First, the sinusoidal component is supposed to "soak up" variance due to the cycle, thereby reducing influence of end points of the data. Second, as a check, I constructed an artificial data set that had zero linear trend, and examined the estimated linear coefficient. To do this, I took the 2003 data and concatenated that cycle to itself 17 times. Therefore the artificial data had the same sort of alignment issues as real data. The resulting estimate of the linear trend had zero very near its mode.
The sinusoid doesn't seem to capture the actual shape of the temperature cycle very accurately. The extremes in the actual temperatures seem to go beyond the peaks in the sinusoidal curve. The tails of the t distribution accommodate the high extremes in summer and the low extremes in winter, but the temperatures are not symmetrically distributed around mu throughout the year. If anyone can tell me a better model of annual temperature trends (that's also easy to write in JAGS for demo purposes), please let me know.
The data violate the assumption of independence in the model. The model assumes that, on each day, the temperature is a random draw from a t distribution centered on mu, with no influence from the temperature of the previous day. Clearly this is wrong as a description of real temperatures. Does this mean the model is utterly useless and uninterpretable? Maybe not. We can think of the likelihood function, which multiplies together the t-distribution densities of all the data points, merely as a measure of fit rather than as a generative model of data. The estimated parameters tell us about the descriptive fit, not about the process that generated the data. Maybe.
Hiç yorum yok:
Yorum Gönder