You could do it in any variety of places: almost anything capable of doing basic statistical calculations will serve, and it could be done in Excel or any number of other packages with similar ease. In the bivariate case it is a simple matter of taking independent random variables with the same standard deviation and creating a third variable from those two that has the required correlation with one of them. If $X_1$ and $X_2$ are independent standard normal variables, then $Y = rX_2 + \sqrt{1-r^2}\,X_1$ will have correlation $r$ between $Y$ and $X_2$.

Here the underlying variables have a population correlation of the desired size, but the sample correlation will differ from it; you can check it with cor(y1, x2). (I just ran the code three times and got sample correlations of 0.938, 0.895 and 0.933.)

If you need this for more than two variables and some prespecified correlation matrix, it can be done using the Cholesky decomposition (of which the above is a special case). If $Z$ is a vector of length $k$ of independent random variables with unit (or at least constant) standard deviation and $S$ is a correlation matrix with Cholesky decomposition $S = LL'$, then $LZ$ will have population correlation matrix $S$.

If you need the exact sample correlation, you need samples with exactly zero sample correlation and identical sample variances before applying the above trick. There are a variety of ways to achieve that, but one simple way is to take residuals from a regression (which will be uncorrelated with the x-variable in the regression) and then scale both variables to have unit variance.

So now it's just a matter of writing out the results in your preferred format; all the formats you mention can be produced easily. For example, to get a CSV file you'd call write.csv: write.csv(data.frame(y=y1, x=x2), file="myfile.csv"). This creates a file named "myfile.csv" in the current working directory whose contents begin with the header line "","y","x".

Just to avoid specifying a set of correlations that is "impossible" as a whole (the matrix of correlations can become non-positive-definite; for instance, you can't define two highly correlated variables and a third one that is close to one of them and far from the other), it might be more useful to begin with a factor-loadings matrix instead, which describes the composition of the random variables as linear (regression) equations. The following might be done similarly, and perhaps better, in R, but I show it here in my own matrix-tool language MatMate because I'm inexperienced in R. First, create a hidden ("unknown") loadings matrix, which describes the composition of our empirical data in terms of the "unknown" factors:

  nv = 6             // set number of empirical variables
  ncf, nef = 3, nv   // set number of common factors and error-factors (6 item-specific error-factors, normally distributed)
  nf = ncf + nef     // number of uncorrelated random factors needed

This could be written more briefly (without naming variables like nf, nv, and so on, you could just insert the values), but for documentation I've used the richer, more explicit form here. It is less "natural" to look at at first, but one can get used to it.
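As an illustrative sketch of the bivariate construction and its Cholesky generalisation described in the first answer above: the original snippets are in R, so this is a Python/NumPy translation, and the correlation matrix S below is an arbitrary example, not from the original.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
r = 0.9

# Bivariate construction: Y = r*X2 + sqrt(1 - r^2)*X1 has
# population correlation r with X2 when X1, X2 are independent
# standard normals.
x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)
y = r * x2 + np.sqrt(1 - r**2) * x1
print(np.corrcoef(y, x2)[0, 1])  # near 0.9, but not exactly 0.9

# General case: if S = L L' (Cholesky) and the rows of Z are
# independent with unit variance, then L @ Z has population
# correlation matrix S. This S is just an example.
S = np.array([[1.0, 0.9, 0.5],
              [0.9, 1.0, 0.4],
              [0.5, 0.4, 1.0]])
L = np.linalg.cholesky(S)
Z = rng.standard_normal((3, 100_000))
X = L @ Z
print(np.round(np.corrcoef(X), 2))  # approximately S
```

As the answer notes, the sample correlation wanders around the population value; rerunning with different seeds gives values scattered around 0.9.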
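The exact-sample-correlation trick mentioned above (take regression residuals, then rescale both variables to unit variance) could be sketched as follows in Python/NumPy; the seed, sample size and target correlation are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
r = 0.9

x = rng.standard_normal(n)
y = rng.standard_normal(n)

# Residuals of y regressed on x (with an intercept) are exactly
# uncorrelated with x in the sample.
D = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(D, y, rcond=None)
resid = y - D @ beta

# Standardise both to unit sample variance (the residuals already
# have mean zero because an intercept was included).
xs = (x - x.mean()) / x.std(ddof=1)
rs = resid / resid.std(ddof=1)

# Now the bivariate construction yields the exact sample correlation.
ynew = r * xs + np.sqrt(1 - r**2) * rs
print(np.corrcoef(ynew, x)[0, 1])  # equals r up to floating point
```

Because the two ingredients have exactly zero sample correlation and unit sample variance, the resulting sample correlation is exactly r rather than merely close to it.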
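A rough sketch of the factor-loadings approach from the MatMate answer above, in Python/NumPy rather than MatMate: the loading values here are made up for illustration, and only the dimensions (6 empirical variables, 3 common factors, 6 item-specific error factors) follow the answer.

```python
import numpy as np

rng = np.random.default_rng(2)
nv, ncf, nef = 6, 3, 6   # empirical variables, common factors, error factors
nf = ncf + nef           # uncorrelated random factors needed
n = 100_000              # observations to simulate

# Hypothetical common-factor loadings (nv x ncf); kept small enough
# that each variable retains positive error variance.
common = rng.uniform(-0.5, 0.5, size=(nv, ncf))

# One item-specific error factor per variable, scaled so every row of
# the full loadings matrix has unit norm (unit-variance variables).
err_var = 1.0 - (common**2).sum(axis=1)
loadings = np.hstack([common, np.diag(np.sqrt(err_var))])  # nv x nf

factors = rng.standard_normal((nf, n))  # independent standard normals
data = loadings @ factors               # nv x n simulated data

# The implied population correlation matrix is loadings @ loadings.T,
# which is positive semi-definite by construction.
print(np.round(loadings @ loadings.T, 2))
print(np.round(np.corrcoef(data), 2))   # sample version, close to it
```

Starting from loadings rather than from pairwise correlations sidesteps the non-positive-definiteness problem entirely, since any matrix of the form $LL'$ with unit-norm rows is a valid correlation matrix.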