Last night I spotted this tweet about the R package TSclust.
Thank you Pablo and Jose for #TSclust - time series clustering package in #rstats ! http://t.co/GBQtQnQ8Lr
— Pasha Roberts (@pasharoberts) March 2, 2015
I should start by saying that I really don’t know what I’m doing, so be warned. I thought it would interesting to apply TSclust to the S&P 500 price time series. I took the 1-day simple rate of change, grouped by year with dplyr, and then indexed by the day of the year all in one pipeR pipeline. Since the TSclust paper
TSclust: An R Package for Time Series Clustering
Journal of Statistical Software, Volume 62, Issue 1
November 2014
demonstrates interoperability with hclust in their OECD interest rate example ( Section 5.2 ), I thought I could visualize the results nicely with treewidget from the epiwidgets package. Just because the htmlwidget was designed for phylogeny doesn’t mean we can’t use it for finance. Here is the result.
For reference and searching, I’ll copy the code below, but all of this can be found in this Github repo.
library(TSclust) library(quantmod) library(dplyr) library(pipeR) library(tidyr) library(epiwidgets) sp5 <- getSymbols("^GSPC",auto.assign=F,from="1900-01-01")[,4] sp5 %>>% # dplyr doesn't like xts, so make a data.frame ( data.frame( date = index(.) ,price = .[,1,drop=T] ) ) %>>% # add a column for Year mutate( year = as.numeric(format(date,"%Y"))) %>>% # group by our new Year column group_by( year ) %>>% # within each year, find what day in the year so we can join mutate( pos = rank(date) ) %>>% mutate( roc = price/lag(price,k=1) - 1 ) %>>% # can remove date select( -c(date,price) ) %>>% as.data.frame %>>% # years as columns as pos as row spread( year, roc ) %>>% # remove last year since assume not complete ( .[,-ncol(.)] ) %>>% # remove pos since index will be same select( -pos ) %>>% # fill nas with previous value na.fill( 0 ) %>>% t %>>% # use TSclust diss; notes lots of METHOD options diss( METHOD="ACF" ) %>>% hclust %>>% ape::as.phylo() %>>% treewidget