Thursday, August 9, 2012

48 Industries (Dendrogram Ordered) Over 50 Years

Thanks to reader AHWest for this comment on the post 48 Industries Since 1963:

“I think it would be interesting to see the industries ordered by some sort of similarity of returns.”

I think this is a great suggestion, and I wanted to see it too. I first tried the dendrogram plot technique from Inspirational Stack Overflow Dendrogram Applied to Currencies, but then I spotted dendrogramGrob in the latticeExtra documentation. This was much easier: in a couple of lines, we can order the 48 industries by similarity of returns and draw the dendrogram that connects them.
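
As a minimal sketch of the ordering idea, here is the same hclust, as.dendrogram, order.dendrogram chain on made-up data using only base R. The four toy series, the seed, and the choice of 1 minus correlation as the distance are all assumptions for illustration; assetsDendrogramPlot may measure distance differently.

```r
# Toy data (hypothetical): A and C share a common factor, B and D are independent
set.seed(42)
n <- 500
common <- rnorm(n)
toy_returns <- cbind(
  A = common + rnorm(n, sd = 0.3),
  B = rnorm(n),
  C = common + rnorm(n, sd = 0.3),
  D = rnorm(n)
)

# Assumption: use 1 - correlation as the distance between series
toy_dist <- as.dist(1 - cor(toy_returns))

# Cluster, convert to a dendrogram, and read off the leaf order
toy_hclust <- hclust(toy_dist)
toy_dend <- as.dendrogram(toy_hclust)
toy_ord <- order.dendrogram(toy_dend)

# Column names in dendrogram order; A and C should sit next to each other
colnames(toy_returns)[toy_ord]
```

The resulting index vector plays the same role as row.ord in the full code: it reorders the columns so that similar series end up in adjacent panels.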

[Figure: horizon plot of the 48 industries in dendrogram order, from TimelyPortfolio]
R code from GIST (use the raw view for copy/paste):
require(fAssets)
require(latticeExtra)
require(quantmod)
require(PerformanceAnalytics)
#my.url will be the location of the zip file with the data
my.url="http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/48_Industry_Portfolios_daily.zip"
#this will be the temp file set up for the zip file
#file.path builds the path portably instead of hard-coding Windows "\\"
my.tempfile<-file.path(tempdir(),"frenchindustry.zip")
#my.usefile is the name of the txt file with the data
my.usefile<-file.path(tempdir(),"48_Industry_Portfolios_daily.txt")
download.file(my.url, my.tempfile, method="auto",
quiet = FALSE, mode = "wb",cacheOK = TRUE)
unzip(my.tempfile,exdir=tempdir(),junkpaths=TRUE)
#read space delimited text file extracted from zip
french_industry <- read.table(file=my.usefile,
header = TRUE, sep = "",
as.is = TRUE,
skip = 9, nrows=12211)
#get dates ready for xts index
datestoformat <- rownames(french_industry)
datestoformat <- paste(substr(datestoformat,1,4),
substr(datestoformat,5,6),substr(datestoformat,7,8),sep="-")
#get xts for analysis
french_industry_xts <- as.xts(french_industry[,1:NCOL(french_industry)],
order.by=as.Date(datestoformat))
#divide by 100 to convert percent to decimal returns
french_industry_xts <- french_industry_xts/100
#zero out missing data, which is denoted by -0.9999 after the division above
#index only the flagged cells so valid returns in the same rows and columns are untouched
french_industry_xts[french_industry_xts < -0.99] <- 0
#get price series or cumulative growth of 1
french_industry_price <- cumprod(french_industry_xts+1)
#get 250 day rate of change or feel free to change to something other than 250
roc <- french_industry_price
#split into groups so do not run out of memory
for (i in seq(12,48,by=12)) {
roc[,((i-11):(i))] <- ROC(french_industry_price[,((i-11):(i))],n=250,type="discrete")
}
roc[1:250,] <- 0
# try to do http://stackoverflow.com/questions/9747426/how-can-i-produce-plots-like-this
# was much easier to use latticeExtra
# get dendrogram data from hclust
# backward and repetitive but it works
# named industry_dend rather than t to avoid masking base::t
industry_dend <- assetsDendrogramPlot(as.timeSeries(french_industry_xts))
# thanks to the latticeExtra example
dd.row <- as.dendrogram(industry_dend$hclust)
row.ord <- order.dendrogram(dd.row)
xyplot(roc[,row.ord],
layout=c(1,48), ylim=c(0,0.25),
scales = list(tck = c(1,0), y = list(draw = FALSE,relation = "same")),
horizonscale=0.25,
origin = 0,
colorkey = TRUE,
#since so many industries, we will comment out grid
panel = function(x,y,...) {
panel.horizonplot(x,y,...) #feel free to change to whatever you would like
# panel.grid(h=3, v=0,col = "white", lwd=1,lty = 3)
},
ylab = list(rev(colnames(roc[,row.ord])), rot = 0, cex = 0.7, pos = 3),
xlab = NULL,
par.settings=theEconomist.theme(box = "gray70"),
#use ylab above for labelling so we can specify FALSE for strip and strip.left
strip = FALSE,
strip.left = FALSE,
main = "French Daily 48 Industry (Dendrogram Ordered) 1963-2011\n source: http://mba.tuck.dartmouth.edu/pages/faculty/ken.french",
legend =
list(right =
list(fun = dendrogramGrob,
args =
list(x = dd.row, ord = row.ord,
side = "right",
size = 10))))
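
The date step is the most fragile part of the code above, because as.Date is called without a format and the row count is hard-coded. A minimal sketch of a more defensive parse, assuming row names arrive as YYYYMMDD strings; the sample values below, including the stray non-date label, are hypothetical and not taken from the data file:

```r
# Hypothetical row names in the YYYYMMDD form, plus one non-date entry
raw_dates <- c("19630701", "19630702", "Average", "19630703")

# With an explicit format, unparseable strings become NA instead of raising an error
parsed <- as.Date(raw_dates, format = "%Y%m%d")

# Flag and inspect anything that did not parse
bad <- is.na(parsed)
raw_dates[bad]

# Keep only the rows with valid dates for the xts index
clean_dates <- parsed[!bad]
```

Parsing with "%Y%m%d" also makes the substr and paste reshaping unnecessary, and checking for NA before building the xts object shows which rows, if any, are not dates.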

2 comments:

  1. This one throws an error

    french_industry_xts <- as.xts(french_industry[,1:NCOL(french_industry)],
    order.by=as.Date(datestoformat))

    Error in as.POSIXlt.character(x, tz, ...) :
    character string is not in a standard unambiguous format

    Any idea why is that?

    Replies
    1. In a prior post, a reader suggested the following:

      I changed it to

      french_industry_xts <- as.xts(french_industry[,1:NCOL(french_industry)],
      as.POSIXct(datestoformat,format="%Y-%m-%d"))

      and it's all OK.
