## R Code For Canonical Correlation Analysis

    ##############################################
    # R Code for canonical correlation analysis  #
    ##############################################

    # We will use the built-in iris data set.
    # We will consider the entire data set (all three species)

    attach(iris)

    # We will standardize the variables first
    # by dividing by each column's standard deviation:
    # (we will remove column 5, the species labels)

    iris.std <- sweep(iris[,-5], 2, sqrt(apply(iris[,-5], 2, var)), FUN="/")

    sepal.meas <- iris.std[,1:2]
    petal.meas <- iris.std[,3:4]

    ### Doing the CCA the long way:

    # Finding blocks of the correlation matrix:

    R11 <- cor(sepal.meas)
    R22 <- cor(petal.meas)
    R12 <- c(cor(sepal.meas[,1], petal.meas[,1]), cor(sepal.meas[,1], petal.meas[,2]),
             cor(sepal.meas[,2], petal.meas[,1]), cor(sepal.meas[,2], petal.meas[,2]))
    R12 <- matrix(R12, ncol=ncol(R22), byrow=TRUE)  # R12 has q2 columns, same as number of petal measurements
    R21 <- t(R12)  # R21 = transpose of R12

    # Finding the E1 and E2 matrices:

    E1 <- solve(R11) %*% R12 %*% solve(R22) %*% R21
    E2 <- solve(R22) %*% R21 %*% solve(R11) %*% R12

    # print(E1)
    # print(E2)

    eigen(E1)
    eigen(E2)

    # The canonical correlations are:

    canon.corr <- sqrt(eigen(E1)$values)
    canon.corr

    # The canonical variates are based on the eigenvectors of E1 and E2:

    # a1 = (0.922, -0.388)
    # b1 = (0.943, -0.333)
    # a2 = (0.457, 0.890)
    # b2 = (-0.679, 0.734)

    # Only the first canonical correlation is really substantial:

    # u1 = 0.92*Sepal.Length - 0.39*Sepal.Width
    # v1 = 0.94*Petal.Length - 0.33*Petal.Width

    # Plotting the first set of canonical variables:

    u1 <- as.matrix(iris.std[,1:2]) %*% as.matrix(eigen(E1)$vectors[,1])
    v1 <- as.matrix(iris.std[,3:4]) %*% as.matrix(eigen(E2)$vectors[,1])
    plot(u1,v1)
    cor(u1,v1)

    # Plotting the second set of canonical variables:

    u2 <- as.matrix(iris.std[,1:2]) %*% as.matrix(eigen(E1)$vectors[,2])
    v2 <- as.matrix(iris.std[,3:4]) %*% as.matrix(eigen(E2)$vectors[,2])
    plot(u2,v2)
    cor(u2,v2)

    ### Doing CCA using the built-in cancor function:

    cancor(sepal.meas, petal.meas)

    # The canonical correlations are the same as the ones we found,
    # The canonical variates are a little different because the cancor
    # function works with the centered data rather than the original data.

    ###...
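
The claim that the eigenvalue route and `cancor` give the same canonical correlations can be checked directly. The sketch below is a compact variant of the script, not a replacement for it: the names `X`, `Y`, `rho_eigen`, and `rho_cancor` are introduced here for illustration, `scale()` centers as well as rescales (which leaves correlations unchanged), and `cor(X, Y)` is used as a shortcut for the hand-built `R12` block.

```r
# Sepal and petal measurements from the built-in iris data (column 5 is Species).
X <- scale(iris[, 1:2])
Y <- scale(iris[, 3:4])

# Blocks of the correlation matrix; cor(X, Y) returns the 2 x 2 cross block.
R11 <- cor(X)
R22 <- cor(Y)
R12 <- cor(X, Y)
R21 <- t(R12)

# Squared canonical correlations are the eigenvalues of E1.
E1 <- solve(R11) %*% R12 %*% solve(R22) %*% R21
rho_eigen <- sqrt(Re(eigen(E1)$values))

# Canonical correlations from the built-in function.
rho_cancor <- cancor(X, Y)$cor

# TRUE if the two routes agree up to numerical tolerance.
isTRUE(all.equal(rho_eigen, rho_cancor, tolerance = 1e-6))
```

`Re()` is only a guard: `E1` is not symmetric, so `eigen()` uses a general solver, although the eigenvalues are real in theory.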

## Multiple Regression Analysis Of Copper Prices

Fundamentals: Copper prices are determined by many fundamentals, such as the dollar index, copper consumption, the housing index, industrial production, world copper stocks, copper ore production, and copper imports by different countries. Over the past few years, China and the USA have been the largest importers of copper in the world, and import quantities also affect copper prices.

This study estimates the impact of different variables on copper prices using multiple regression analysis. In this regression, copper price is the dependent variable, and the dollar index (DX), China's imports of copper, USA imports of copper, total stock of copper, and world consumption of copper are the independent variables. The variables were selected on the basis of a correlation analysis: the least-correlated variables are taken into the analysis as independent variables. The process flow for the analysis is as follows:

Data partition: The entire data set is divided into three partitions: 80% training data, 10% testing data, and 10% validation data. After partitioning, an insight analysis of the data is done using Enterprise Miner. Before the data is put through the regression, it is checked against the assumptions of linear regression: normality is examined, outliers are detected, and variables are transformed where needed.

The above table shows the transformations of the different variables used in the analysis. After transformation, the skewness of the data decreased to a great extent for most of the variables. After the variables are transformed, an outlier filter is run, which removes the outliers in some of the variables, as seen in the distribution analysis.

The regression analysis, when run, gave results in which the Dollar Index came out as having the most...
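
The partition, transform, and fit steps described above can be sketched in R. This is a minimal illustration under stated assumptions, not the study's workflow: the data are simulated (the study's data set is not reproduced here), all column names are hypothetical, a log transform stands in for whatever transformations the study applied, the outlier filter is omitted, and base R's `lm` is used in place of Enterprise Miner.

```r
set.seed(42)  # for a reproducible illustration only

# Simulated stand-in data; NOT the study's data. Column names are hypothetical.
n <- 200
copper <- data.frame(
  dx            = rnorm(n, mean = 90, sd = 5),            # dollar index
  china_imports = rlnorm(n, meanlog = 12, sdlog = 0.5),   # right-skewed
  usa_imports   = rlnorm(n, meanlog = 11, sdlog = 0.4),   # right-skewed
  stock         = rlnorm(n, meanlog = 13, sdlog = 0.6),   # right-skewed
  consumption   = rnorm(n, mean = 1500, sd = 100)
)
copper$price <- 12000 - 60 * copper$dx +
  300 * log(copper$china_imports) + rnorm(n, sd = 200)

# 80% / 10% / 10% partition into training, testing, and validation data.
n_train <- round(0.8 * n)
n_test  <- round(0.1 * n)
n_valid <- n - n_train - n_test
idx <- sample(rep(c("train", "test", "valid"),
                  times = c(n_train, n_test, n_valid)))
parts <- split(copper, idx)

# Sample skewness, to compare a variable before and after transformation.
skew <- function(x) mean((x - mean(x))^3) / sd(x)^3
c(raw = skew(copper$stock), logged = skew(log(copper$stock)))

# Multiple regression on the training partition, with the skewed
# predictors log-transformed (an assumed choice of transformation).
fit <- lm(price ~ dx + log(china_imports) + log(usa_imports) +
            log(stock) + consumption, data = parts$train)
summary(fit)$coefficients

# Out-of-sample error on the validation partition.
pred <- predict(fit, newdata = parts$valid)
rmse <- sqrt(mean((parts$valid$price - pred)^2))
rmse
```

Comparing the size of the coefficients only makes sense once the predictors are on comparable scales, so standardizing them first is a reasonable addition when the goal is to rank variables by impact.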