From sarah.goslee at gmail.com Mon Aug 1 00:10:38 2011 From: sarah.goslee at gmail.com (Sarah Goslee) Date: Sun, 31 Jul 2011 18:10:38 -0400 Subject: [R] How to count numbers of a vector and use them as index values? In-Reply-To: <1312148489.3609.94.camel@mattotaupa> References: <1312148489.3609.94.camel@mattotaupa> Message-ID: Hi Paul, I would use something like this: > x <- c(2,2,3,3,4,6) > table(x) x 2 3 4 6 2 2 1 1 > x <- factor(x, levels=1:8) > table(x) x 1 2 3 4 5 6 7 8 0 2 2 1 0 1 0 0 Sarah On Sun, Jul 31, 2011 at 5:41 PM, Paul Menzel wrote: > Dear R folks, > > > I am sorry to ask this simple question, but my search for the right > way/command was unsuccessful. > > I have a vector > >> x <- c(2, 2, 3, 3, 4, 6) > > Now the values of x should be considered the index of another vector > with possible greater length, say 8, and the value should be how often > the indexes appeared in the original vector x. > >> length(result) > ?[1] 8 >> result > ?[1] 0 2 2 1 0 1 0 0 > > > Thank you in advance, > > Paul > -- Sarah Goslee http://www.functionaldiversity.org From djmuser at gmail.com Mon Aug 1 00:17:14 2011 From: djmuser at gmail.com (Dennis Murphy) Date: Sun, 31 Jul 2011 15:17:14 -0700 Subject: [R] Error in plotmath In-Reply-To: <1312136706012-3708153.post@n4.nabble.com> References: <1312136706012-3708153.post@n4.nabble.com> Message-ID: Both of these work on my system: set.seed(6) x <- rnorm(30, 100, 20) xs <- seq(50, 150, length=150) cdf <- pnorm(xs, 100, 20) plot(xs, cdf, type='l', ylim=c(0,1), xlab=expression(x), ylab=expression(paste("Prob[", X <= x, "]"))) # FH lines(ecdf(x), cex=.5) plot(xs, cdf, type='l', ylim=c(0,1), xlab=expression(x), ylab=expression("Prob[X" <= "x]")) # PE lines(ecdf(x), cex = 0.5) > sessionInfo() R version 2.13.1 Patched (2011-07-21 r56468) Platform: x86_64-pc-mingw32/x64 (64-bit) locale: [1] LC_COLLATE=English_United States.1252 [2] LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] stats graphics grDevices utils datasets grid methods [8] base other attached packages: [1] directlabels_1.3 sos_1.3-1 brew_1.0-6 lattice_0.19-30 [5] ggplot2_0.8.9 proto_0.3-9.2 reshape_0.8.4 plyr_1.5.2 Dennis On Sun, Jul 31, 2011 at 11:25 AM, Frank Harrell wrote: > Under > > platform ? ? ? x86_64-pc-linux-gnu > arch ? ? ? ? ? x86_64 > os ? ? ? ? ? ? linux-gnu > system ? ? ? ? x86_64, linux-gnu > status > major ? ? ? ? ?2 > minor ? ? ? ? ?13.1 > year ? ? ? ? ? 2011 > month ? ? ? ? ?07 > day ? ? ? ? ? ?08 > svn rev ? ? ? ?56322 > language ? ? ? R > version.string R version 2.13.1 (2011-07-08) > > I get a double quote mark in place of <= in the y-axis label when I run the > following. > > set.seed(6) > x <- rnorm(30, 100, 20) > xs <- seq(50, 150, length=150) > cdf <- pnorm(xs, 100, 20) > plot(xs, cdf, type='l', ylim=c(0,1), > ? ? xlab=expression(x), > ? ? ylab=expression(paste("Prob[", X <= x, "]"))) > lines(ecdf(x), cex=.5) > > The problem also occurs if I use instead ylab=expression(prob(X <= x))) > > All is well if I remove "=" but I need <=. > > Frank > > ----- > Frank Harrell > Department of Biostatistics, Vanderbilt University > -- > View this message in context: http://r.789695.n4.nabble.com/Error-in-plotmath-tp3708153p3708153.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From j3ffdick at gmail.com Mon Aug 1 00:19:20 2011 From: j3ffdick at gmail.com (Jeffrey Dick) Date: Sun, 31 Jul 2011 15:19:20 -0700 Subject: [R] How to count numbers of a vector and use them as index values? In-Reply-To: <1312148489.3609.94.camel@mattotaupa> References: <1312148489.3609.94.camel@mattotaupa> Message-ID: Here's an attempt using sapply: > x <- c(2, 2, 3, 3, 4, 6) > ys <- 1:8 > sapply(ys, function(y) { length(which(x==y)) } ) [1] 0 2 2 1 0 1 0 0 Jeff On Sun, Jul 31, 2011 at 2:41 PM, Paul Menzel wrote: > Dear R folks, > > > I am sorry to ask this simple question, but my search for the right > way/command was unsuccessful. > > I have a vector > >> x <- c(2, 2, 3, 3, 4, 6) > > Now the values of x should be considered the index of another vector > with possible greater length, say 8, and the value should be how often > the indexes appeared in the original vector x. > >> length(result) > ?[1] 8 >> result > ?[1] 0 2 2 1 0 1 0 0 > > > Thank you in advance, > > Paul > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > From tdenes at cogpsyphy.hu Mon Aug 1 00:21:43 2011 From: tdenes at cogpsyphy.hu (=?iso-8859-1?Q?=22D=E9nes_T=D3TH=22?=) Date: Mon, 1 Aug 2011 00:21:43 +0200 Subject: [R] How to count numbers of a vector and use them as index values? In-Reply-To: References: <1312148489.3609.94.camel@mattotaupa> Message-ID: <8f694de9bd03c52880a0bba670165701.squirrel@webmail.cogpsyphy.hu> See also ?tabulate. tabulate(x,8) > Hi Paul, > > I would use something like this: > >> x <- c(2,2,3,3,4,6) >> table(x) > x > 2 3 4 6 > 2 2 1 1 >> x <- factor(x, levels=1:8) >> table(x) > x > 1 2 3 4 5 6 7 8 > 0 2 2 1 0 1 0 0 > > Sarah > > On Sun, Jul 31, 2011 at 5:41 PM, Paul Menzel > wrote: >> Dear R folks, >> >> >> I am sorry to ask this simple question, but my search for the right >> way/command was unsuccessful. >> >> I have a vector >> >>> x <- c(2, 2, 3, 3, 4, 6) >> >> Now the values of x should be considered the index of another vector >> with possible greater length, say 8, and the value should be how often >> the indexes appeared in the original vector x. >> >>> length(result) >> ?[1] 8 >>> result >> ?[1] 0 2 2 1 0 1 0 0 >> >> >> Thank you in advance, >> >> Paul >> > > > > -- > Sarah Goslee > http://www.functionaldiversity.org > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From paulepanter at users.sourceforge.net Mon Aug 1 00:23:37 2011 From: paulepanter at users.sourceforge.net (Paul Menzel) Date: Mon, 01 Aug 2011 00:23:37 +0200 Subject: [R] [solved] How to count numbers of a vector and use them as index values? In-Reply-To: References: <1312148489.3609.94.camel@mattotaupa> Message-ID: <1312151017.3609.101.camel@mattotaupa> Dear Sarah, Am Sonntag, den 31.07.2011, 18:10 -0400 schrieb Sarah Goslee: > I would use something like this: > > > x <- c(2,2,3,3,4,6) > > table(x) > x > 2 3 4 6 > 2 2 1 1 > > x <- factor(x, levels=1:8) > > table(x) > x > 1 2 3 4 5 6 7 8 > 0 2 2 1 0 1 0 0 awesome. Thank you. Looking further I found the article ?Thinking in R: vectors? on R-bloggers [1]. The given example there returned a list though by using `lapply()` and the author is asking at the end: ?Any R experts out there with suggestions for a non-lapply solution??. So Derek-Jones, Sarah just gave you the answer. A further note regarding the example in [1], instead of length(X[X == n]) to count the number of occurrences you can also use `sum(X == n)` relying on the fact that `TRUE` and `FALSE` are converted to the integers `1` and `0`. Thanks, Paul [1] http://www.r-bloggers.com/thinking-in-r-vectors/ -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part URL: From paulepanter at users.sourceforge.net Mon Aug 1 00:37:34 2011 From: paulepanter at users.sourceforge.net (Paul Menzel) Date: Mon, 01 Aug 2011 00:37:34 +0200 Subject: [R] [solved] How to count numbers of a vector and use them as index values? In-Reply-To: References: <1312148489.3609.94.camel@mattotaupa> Message-ID: <1312151854.3609.107.camel@mattotaupa> Am Sonntag, den 31.07.2011, 15:19 -0700 schrieb Jeffrey Dick: > Here's an attempt using sapply: > > > x <- c(2, 2, 3, 3, 4, 6) > > ys <- 1:8 > > sapply(ys, function(y) { length(which(x==y)) } ) > [1] 0 2 2 1 0 1 0 0 The last piece for my trials missing was `sapply()` which I overlooked reading `?lapply()` inspired by [1]. So an alternative is > x <- c(2, 2, 3, 3, 4, 6) > ys <- 1:8 > sapply(ys, function(y) { sum(x==y) } ) [1] 0 2 2 1 0 1 0 0 which is of course overkill reading D?nes? response. > tabulate(x, 8) [1] 0 2 2 1 0 1 0 0 Thank you all, Paul [1] http://www.r-bloggers.com/thinking-in-r-vectors/ -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part URL: From cloos+r-help at jhcloos.com Mon Aug 1 00:21:44 2011 From: cloos+r-help at jhcloos.com (James Cloos) Date: Sun, 31 Jul 2011 18:21:44 -0400 Subject: [R] Error in plotmath In-Reply-To: <4E35A486.9030809@ucalgary.ca> (Peter Ehlers's message of "Sun, 31 Jul 2011 11:52:54 -0700") References: <1312136706012-3708153.post@n4.nabble.com> <4E35A486.9030809@ucalgary.ca> Message-ID: I can confirm the " instead of ?, using the default X11 device. I use a Gentoo amd64 box, R-2.13.0. Using ylab=expression("Prob[X" <= "x]") didn?t change things. (The " might actually be a ? (DOUBLE PRIME); it is hard to tell.) OTOH, using a utf8-encoded ? works: set.seed(6) x<- rnorm(30, 100, 20) xs<- seq(50, 150, length=150) cdf<- pnorm(xs, 100, 20) plot(xs, cdf, type='l', ylim=c(0,1), xlab=expression(x), ylab=expression(paste("Prob[X ? x]"))) lines(ecdf(x), cex=.5) I ran the scripts in ESS, but that shouldn't affect the plots. -JimC -- James Cloos OpenPGP: 1024D/ED7DAEA6 From mackay at northnet.com.au Mon Aug 1 00:38:19 2011 From: mackay at northnet.com.au (Duncan Mackay) Date: Mon, 01 Aug 2011 08:38:19 +1000 Subject: [R] fitting a sinus curve In-Reply-To: References: <1311843912949-3700833.post@n4.nabble.com> Message-ID: <201107312240.p6VMe6pP009607@mail15.tpg.com.au> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From mackay at northnet.com.au Mon Aug 1 00:46:00 2011 From: mackay at northnet.com.au (Duncan Mackay) Date: Mon, 01 Aug 2011 08:46:00 +1000 Subject: [R] Error in plotmath In-Reply-To: References: <1312136706012-3708153.post@n4.nabble.com> Message-ID: <201107312247.p6VMlgYq009816@mail14.tpg.com.au> I get the same on mine for the code below sessionInfo() sessionInfo() R version 2.13.1 (2011-07-08) Platform: i386-pc-mingw32/i386 (32-bit) locale: [1] LC_COLLATE=English_Australia.1252 LC_CTYPE=English_Australia.1252 LC_MONETARY=English_Australia.1252 [4] LC_NUMERIC=C LC_TIME=English_Australia.1252 attached base packages: [1] datasets utils stats graphics grDevices grid methods base other attached packages: [1] pracma_0.7.5 R.oo_1.8.1 R.methodsS3_1.2.1 foreign_0.8-45 chron_2.3-40 MASS_7.3-14 lattice_0.19-30 Regards Duncan Duncan Mackay Department of Agronomy and Soil Science University of New England ARMIDALE NSW 2351 Email: home mackay at northnet.com.au At 08:17 01/08/2011, you wrote: >Both of these work on my system: > >set.seed(6) >x <- rnorm(30, 100, 20) >xs <- seq(50, 150, length=150) >cdf <- pnorm(xs, 100, 20) > >plot(xs, cdf, type='l', ylim=c(0,1), > xlab=expression(x), > ylab=expression(paste("Prob[", X <= x, "]"))) # FH >lines(ecdf(x), cex=.5) > >plot(xs, cdf, type='l', ylim=c(0,1), > xlab=expression(x), > ylab=expression("Prob[X" <= "x]")) # PE >lines(ecdf(x), cex = 0.5) > > > sessionInfo() >R version 2.13.1 Patched (2011-07-21 r56468) >Platform: x86_64-pc-mingw32/x64 (64-bit) > >locale: >[1] LC_COLLATE=English_United States.1252 >[2] LC_CTYPE=English_United States.1252 >[3] LC_MONETARY=English_United States.1252 >[4] LC_NUMERIC=C >[5] LC_TIME=English_United States.1252 > >attached base packages: >[1] stats graphics grDevices utils datasets grid methods >[8] base > >other attached packages: >[1] directlabels_1.3 sos_1.3-1 brew_1.0-6 lattice_0.19-30 >[5] ggplot2_0.8.9 proto_0.3-9.2 reshape_0.8.4 plyr_1.5.2 > >Dennis > >On Sun, Jul 31, 2011 at 11:25 AM, Frank Harrell > wrote: > > Under > > > > platform x86_64-pc-linux-gnu > > arch x86_64 > > os linux-gnu > > system x86_64, linux-gnu > > status > > major 2 > > minor 13.1 > > year 2011 > > month 07 > > day 08 > > svn rev 56322 > > language R > > version.string R version 2.13.1 (2011-07-08) > > > > I get a double quote mark in place of <= in the y-axis label when I run the > > following. > > > > set.seed(6) > > x <- rnorm(30, 100, 20) > > xs <- seq(50, 150, length=150) > > cdf <- pnorm(xs, 100, 20) > > plot(xs, cdf, type='l', ylim=c(0,1), > > xlab=expression(x), > > ylab=expression(paste("Prob[", X <= x, "]"))) > > lines(ecdf(x), cex=.5) > > > > The problem also occurs if I use instead ylab=expression(prob(X <= x))) > > > > All is well if I remove "=" but I need <=. > > > > Frank > > > > ----- > > Frank Harrell > > Department of Biostatistics, Vanderbilt University > > -- > > View this message in context: > http://r.789695.n4.nabble.com/Error-in-plotmath-tp3708153p3708153.html > > Sent from the R help mailing list archive at Nabble.com. > > > > ______________________________________________ > > R-help at r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > >______________________________________________ >R-help at r-project.org mailing list >https://stat.ethz.ch/mailman/listinfo/r-help >PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >and provide commented, minimal, self-contained, reproducible code. From jim.silverton at gmail.com Mon Aug 1 00:59:13 2011 From: jim.silverton at gmail.com (Jim Silverton) Date: Sun, 31 Jul 2011 18:59:13 -0400 Subject: [R] Simulating from the Null Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From studentofr at gmail.com Mon Aug 1 00:38:29 2011 From: studentofr at gmail.com (r student) Date: Sun, 31 Jul 2011 15:38:29 -0700 Subject: [R] help with algorithm In-Reply-To: References: Message-ID: Thanks for everyone's suggestions. I think looping is the way to go. I have 50 files on which I need to apply the same procedures, so I'll try and wrap my final code in some sort of loop. > > highest and lowest groups)? (I can do this in multiple steps* but wonder > > what the best, "R way" is to do this.) > > Here is one way to get the means by groups: > > tmp <- with(mtcars, tapply(mpg, cyl, mean)) > ## and now subset by it > subset(mtcars, mtcars$cyl %in% names(c(which.max(tmp), which.min(tmp)))) Looks exactly like what I'd need but, I tried and got a list of variable names followed by "<0 rows> (or 0-length row.names)". If I can get this to work, would I be able to use it for weighted means with "rm.na=TRUE"? Since they're weighted means I've been trying to use the following: f<-by(oh,oh$BYGRP, function(z) weighted.mean(z$VAR1,z$WEIGHT,na.rm=TRUE)) Which seems to work, but need to use these means to subset the highest and lowest groups (to create density plots of). The above produces an object that I'm not entirely sure how to work with (say to merge back onto "oh" so I can subset. > How to draw cutoff lines at specific points on density plots? > plot(density(rnorm(100))) > abline(v = c(-1, 1)) Thanks! Amazing how much easier to do this in R than in SAS. > > How to create a matrix of plots? (Take 4 separate plots and put them into a > > single graphic.) > > This depends a bit on the potential complexity of layouts you need. > See ?par and ?layout Thanks again! Many good suggestions here and from others. > > * Get group means, add means back to file, sort by mean, take first and last > > groups > > dat <- mtcars > dat$gm <- with(mtcars, ave(mpg, cyl, FUN = mean)) I tried, but I think the NAs are giving me trouble. tmp <- oh tmp$GM <- with(oh, ave(FINCP, PUMA, FUN=mean)) summary(tmp$GM) Min. 1st Qu. Median Mean 3rd Qu. Max. NA's 114222 And I need to create weighted means anyway. > Hope this helps, > > Josh > -- > Joshua Wiley > Ph.D. Student, Health Psychology > Programmer Analyst II, ATS Statistical Consulting Group > University of California, Los Angeles > https://joshuawiley.com/ Very helpful. Thanks. From daisy.duursma at gmail.com Mon Aug 1 01:20:12 2011 From: daisy.duursma at gmail.com (Daisy Englert Duursma) Date: Mon, 1 Aug 2011 09:20:12 +1000 Subject: [R] Legend for 2 plots on same screen In-Reply-To: References: Message-ID: You could make three plots. The first two you plot in and the third one you place the legend in. nf<-layout(matrix(c(1,2,3), 1, 3, byrow = TRUE), c(6,6,3), c(6)) layout.show(nf) plot(sin, -pi, 2*pi, col = "blue2") plot(sin, -pi, 2*pi, col = "darkorange3") plot(1, xlim=c(1,2), ylim=c(1,2), type="n",axes=F, ann=F) legend("topleft", c("A","B"), lty=c(1,3),col= c("blue2","darkorange3")) On Mon, Aug 1, 2011 at 4:52 AM, Cheryl Johnson wrote: > Hello, > > I have two plots on the same screen. I use the command par(mfrow=c(1,2)) in > order to do this. When I try to make a legend for both plots, it only puts > the legend in the plot on the right side. If I would like a legend that is > outside of both of the plots, how would I do this? > > Thanks > > ? ? ? ?[[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Daisy Englert Duursma Department of Biological Sciences Room E8C156 Macquarie University, North Ryde, NSW 2109 Australia Tel +61 2 9850 9256 From dwinsemius at comcast.net Mon Aug 1 01:25:54 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Sun, 31 Jul 2011 19:25:54 -0400 Subject: [R] example package for devel newcomers In-Reply-To: References: <201107311805.43392@spsconsultoria.com> Message-ID: <7F1462F7-6146-4781-9F40-FCA6EF2FC304@comcast.net> On Jul 31, 2011, at 5:11 PM, Joshua Wiley wrote: > On Sun, Jul 31, 2011 at 2:05 PM, Alexandre Aguiar > wrote: >> Hi, >> >> I'd like to know whether there is a package (or more, of course) >> regarded >> as a good example that could be used also as an instructional tool >> for >> newcomers to R extensions development. > > I used/use SoDA, but then I also used Dr. Chambers' book. My memory is that this question gets asked every few months and one of the stock answers is to use the function 'package.skeleton' in the utils package as a starting point. > > Josh > >> >> Thanks. >> >> -- David Winsemius, MD West Hartford, CT From murdoch.duncan at gmail.com Mon Aug 1 01:36:10 2011 From: murdoch.duncan at gmail.com (Duncan Murdoch) Date: Sun, 31 Jul 2011 19:36:10 -0400 Subject: [R] Display/show the evaluation result of R commands automatically In-Reply-To: References: Message-ID: <4E35E6EA.6080703@gmail.com> On 11-07-31 7:15 AM, Anthony Ching Ho Ng wrote: > Hello R-help, > > I wonder if it is possible to configure R, so that it will > display/show the evaluation result of the R commands automatically > (similar to the behavior of Matlab) It's open source so in theory anything is possible, but there's no built-in support for that. > i.e. If I type x<- 8 > > it will print 8 in the command prompt, instead of having type x > explicitly to show the result and perhaps put an ";" at the end to > suppress the output. > > i.e. x<- 8; If you wrap the assignment in parens the result is made visible, e.g. (x <- 8) will print if you enter it in the console. (It won't print if it's a line in a function; there you need an explicit call to print(). Only the value returned at the top level is eligible for auto-printing.) Duncan Murdoch From paulepanter at users.sourceforge.net Mon Aug 1 01:36:31 2011 From: paulepanter at users.sourceforge.net (Paul Menzel) Date: Mon, 01 Aug 2011 01:36:31 +0200 Subject: [R] Is R the right choice for simulating first passage times of random walks? In-Reply-To: References: <1311809771.29519.276.camel@mattotaupa> Message-ID: <1312155391.3609.135.camel@mattotaupa> Am Mittwoch, den 27.07.2011, 19:59 -0400 schrieb R. Michael Weylandt : > Some more skilled folks can help with the curve fitting, but the general > answer is yes -- R will handle this quite ably. Great to read that. > Consider the following code: > > <<-------------------------------------->> > n = 1e5 > length = 1e5 > > R = matrix(sample(c(-1,1),length*n,replace=T),nrow=n) > R = apply(R,1,cumsum) ## this applies cumsum `row-wise' to R and will make > your life INFINITELY better > R = cbind(rep(0,n),R) ## Now each row is a random walk as you desired. > > <<---------------------------------------->> > > There are actually even faster ways to do what you are asking for, but this > introduces you to some useful R architecture, above all the apply function. Thank you very much. I realized the the 0 column is not need when summing this up. Additionally I posted the wrong example code and I actually am only interested how long it stays negative from the beginning. > To see how long the longest stretch of negatives in each row is, the > following is a little sneaky but works pretty well: > > countNegative = apply(R,1,function(x){which.max(table(cumsum(x>=0))}) > > then you can study these random numbers to do whatever with them. > > The gist of this code is that it counts how many positive number have been > seen up to each point: for any stretch this doesn't increase, you must be > negative, so this identifies the longest such stretch on each row and > records the length. (It may be off by one so check it on a smaller R matrix. That is a great example. It took me a while what `table()` does here but thanks to your explanation I finally understood it. [?] > So all together: > > <<-------------------------------------->> > n = 1e3 > length = 1e3 > > R = matrix(sample(c(-1,1),length*n,replace=T),nrow=n) > R = apply(R,1,cumsum) ## this applies cumsum `row-wise' to R and will make > your life INFINITELY better > R = cbind(rep(0,n),R) ## Now each row is a random walk as you desired. > fTemp <- function(x) { > return(max(table(cumsum(x>=0)))) > } > countNegative = apply(R,1,fTemp) > mu = mean(countNegative) > sig = sd(countNegative)/sqrt(length(countNegative)) > > <<---------------------------------------->> > > This runs pretty fast on my laptop, but you'll need to look into the > memory.limit() function if you want to increase the simulation parameters. > There are much faster ways to handle the simulation as well, but this should > get you off to a nice start with R. > > Hope this helps, It did. Thank you again for the kind and elaborate introduction. Trying to run your example right away froze my system using `n = 1000` and `length = 1e5` [1]. So I really need to be careful how big such a matrix can get. One thing is to use integers as suggested in [2]. My current code looks like the following. -------- 8< -------- code -------- >8 -------- f4 <- function(n = 100000, # number of simulations length = 100000) # length of iterated sum { R = matrix(sample(c(-1L,1L),length*n,replace=T),nrow=n) R = apply(R,1,cumsum) ## this applies cumsum `row-wise' to R and will make your life INFINITELY better fTemp <- function(x) { if (x[1] >= 0 ) { return(1) } for (i in 1:length-1) { if (x[i] < 0 && x[i + 1] >= 0) { return(as.integer(i/2 + 2)) # simple random walks only hit 0 on even ?times? } } } countNegative = apply(R,2,fTemp) tabulate(as.vector(countNegative), length) } -------- 8< -------- code -------- >8 -------- 1.I could actually avoid `cumsum()` half the time, when the first entry is already positive. So I am still looking for a way to speed that up in comparison to a simple two loops scenario. 2. The counting of the length how long the walk stayed negative is probably also inefficient and I should find a better way on how to return the values. I am still thinking about both cases, but to come up with vectoriazations of the problem is quite hard. So I welcome any suggestions. ;-) Thanks, Paul [1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=635832 [2] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=635832#10 -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part URL: From ggrothendieck at gmail.com Mon Aug 1 01:39:42 2011 From: ggrothendieck at gmail.com (Gabor Grothendieck) Date: Sun, 31 Jul 2011 19:39:42 -0400 Subject: [R] Display/show the evaluation result of R commands automatically In-Reply-To: References: Message-ID: On Sun, Jul 31, 2011 at 7:15 AM, Anthony Ching Ho Ng wrote: > Hello R-help, > > I wonder if it is possible to configure R, so that it will > display/show the evaluation result of the R commands automatically > (similar to the behavior of Matlab) > > i.e. If I type x <- 8 > > it will print 8 in the command prompt, instead of having type x > explicitly to show the result and perhaps put an ";" at the end to > suppress the output. > > i.e. x <- 8; > Not quite the same but perhaps good enough is that at the console this will give you the last value: .Last.value -- Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com From djmuser at gmail.com Mon Aug 1 02:31:47 2011 From: djmuser at gmail.com (Dennis Murphy) Date: Sun, 31 Jul 2011 17:31:47 -0700 Subject: [R] Is R the right choice for simulating first passage times of random walks? In-Reply-To: <1312155391.3609.135.camel@mattotaupa> References: <1311809771.29519.276.camel@mattotaupa> <1312155391.3609.135.camel@mattotaupa> Message-ID: Hi: See if this works for you: f4 <- function() { x <- sample(c(-1L,1L), 1) if (x >= 0 ) {return(1)} else { csum <- x len <- 1 while(csum < 0) { csum <- csum + sample(c(-1, 1), 1) len <- len + 1 } } len } # In one batch of repetitions of this function, system.time(out <- replicate(1000, f4())) user system elapsed 0.51 0.00 0.52 > range(out) [1] 1 17372 but in another (untimed), this took a significantly longer amount of time to run [for obvious reasons]: > range(out) [1] 1 987752 For 100000 repetitions, I'd guess this could run anywhere from one to several minutes, depending on the lengths of the sojourns encountered. This looks like a reasonable way to visualize the output for 1000 replications: hist(log(out), nclass = 20) Notice that the function takes no arguments, returns the length of the random walk while its cumulative sum is negative [or 1 if it starts out positive], and then uses the replicate() function to iterate the function f4() N times. HTH, Dennis On Sun, Jul 31, 2011 at 4:36 PM, Paul Menzel wrote: > Am Mittwoch, den 27.07.2011, 19:59 -0400 schrieb R. Michael Weylandt : >> Some more skilled folks can help with the curve fitting, but the general >> answer is yes -- R will handle this quite ably. > > Great to read that. > >> Consider the following code: >> >> <<-------------------------------------->> >> n = 1e5 >> length = 1e5 >> >> R = matrix(sample(c(-1,1),length*n,replace=T),nrow=n) >> R = apply(R,1,cumsum) ## this applies cumsum `row-wise' to R and will make >> your life INFINITELY better >> R = cbind(rep(0,n),R) ## Now each row is a random walk as you desired. >> >> <<---------------------------------------->> >> >> There are actually even faster ways to do what you are asking for, but this >> introduces you to some useful R architecture, above all the apply function. > > Thank you very much. I realized the the 0 column is not need when > summing this up. Additionally I posted the wrong example code and I > actually am only interested how long it stays negative from the > beginning. > >> To see how long the longest stretch of negatives in each row is, the >> following is a little sneaky but works pretty well: >> >> countNegative = apply(R,1,function(x){which.max(table(cumsum(x>=0))}) >> >> then you can study these random numbers to do whatever with them. >> >> The gist of this code is that it counts how many positive number have been >> seen up to each point: for any stretch this doesn't increase, you must be >> negative, so this identifies the longest such stretch on each row and >> records the length. (It may be off by one so check it on a smaller R matrix. > > That is a great example. It took me a while what `table()` does here but > thanks to your explanation I finally understood it. > > [?] > >> So all together: >> >> <<-------------------------------------->> >> n = 1e3 >> length = 1e3 >> >> R = matrix(sample(c(-1,1),length*n,replace=T),nrow=n) >> R = apply(R,1,cumsum) ## this applies cumsum `row-wise' to R and will make >> your life INFINITELY better >> R = cbind(rep(0,n),R) ## Now each row is a random walk as you desired. >> fTemp <- function(x) { >> ? ? return(max(table(cumsum(x>=0)))) >> } >> countNegative = apply(R,1,fTemp) >> mu = mean(countNegative) >> sig = sd(countNegative)/sqrt(length(countNegative)) >> >> <<---------------------------------------->> >> >> This runs pretty fast on my laptop, but you'll need to look into the >> memory.limit() function if you want to increase the simulation parameters. >> There are much faster ways to handle the simulation as well, but this should >> get you off to a nice start with R. >> >> Hope this helps, > > It did. Thank you again for the kind and elaborate introduction. > > Trying to run your example right away froze my system using `n = 1000` > and `length = 1e5` [1]. So I really need to be careful how big such a > matrix can get. One thing is to use integers as suggested in [2]. > > My current code looks like the following. > > -------- 8< -------- code -------- >8 -------- > f4 <- function(n = 100000, # number of simulations > ? ? ? ? ? ? ? length = 100000) # length of iterated sum > { > ? ? ? ?R = matrix(sample(c(-1L,1L),length*n,replace=T),nrow=n) > ? ? ? ?R = apply(R,1,cumsum) ## this applies cumsum `row-wise' to R and will make your life INFINITELY better > ? ? ? ?fTemp <- function(x) { > ? ? ? ? ? ? ? ?if (x[1] >= 0 ) { > ? ? ? ? ? ? ? ? ? ? ? ?return(1) > ? ? ? ? ? ? ? ?} > > ? ? ? ? ? ? ? ?for (i in 1:length-1) { > ? ? ? ? ? ? ? ? ? ? ? ?if (x[i] < 0 && x[i + 1] >= 0) { > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?return(as.integer(i/2 + 2)) # simple random walks only hit 0 on even ?times? > ? ? ? ? ? ? ? ? ? ? ? ?} > ? ? ? ? ? ? ? ?} > ? ? ? ?} > ? ? ? ?countNegative = apply(R,2,fTemp) > ? ? ? ?tabulate(as.vector(countNegative), length) > } > -------- 8< -------- code -------- >8 -------- > > 1.I could actually avoid `cumsum()` half the time, when the first entry > is already positive. So I am still looking for a way to speed that up in > comparison to a simple two loops scenario. > 2. The counting of the length how long the walk stayed negative is > probably also inefficient and I should find a better way on how to > return the values. > > I am still thinking about both cases, but to come up with > vectoriazations of the problem is quite hard. > > So I welcome any suggestions. ;-) > > > Thanks, > > Paul > > > [1] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=635832 > [2] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=635832#10 > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > From asaguiar at spsconsultoria.com Mon Aug 1 03:24:26 2011 From: asaguiar at spsconsultoria.com (Alexandre Aguiar) Date: Sun, 31 Jul 2011 22:24:26 -0300 Subject: [R] example package for devel newcomers In-Reply-To: <7F1462F7-6146-4781-9F40-FCA6EF2FC304@comcast.net> References: <201107311805.43392@spsconsultoria.com> <7F1462F7-6146-4781-9F40-FCA6EF2FC304@comcast.net> Message-ID: <201107312224.31828@spsconsultoria.com> Em Domingo 31 Julho 2011, voc? escreveu: > My memory is that this question gets asked every few months and one of > the stock answers is to use the function 'package.skeleton' in the > utils package as a starting point. Got that from docs. And actually I already have most of the code written. My question addresses known tricks and impressions by experienced R interface programmers. This kind of stuff can be really useful. For instance, tricks are much better than docs when embedding php. Thanx. -- Alexandre -- Alexandre Santos Aguiar, MD, SCT -------------- Pr?xima Parte ---------- Um anexo n?o-texto foi limpo... Nome: n?o dispon?vel Tipo: application/pgp-signature Tamanho: 198 bytes Descri??o: This is a digitally signed message part. URL: From bbolker at gmail.com Mon Aug 1 04:12:51 2011 From: bbolker at gmail.com (Ben Bolker) Date: Mon, 1 Aug 2011 02:12:51 +0000 Subject: [R] Simulating from the Null References: Message-ID: Jim Silverton gmail.com> writes: > > Hello all, > I am doing glm with a negative binomial link. > I have two treatments and 3 replicates in each treatment. My question is > this, how can I simulate data for the the columns from the null and [alternative?] > distribution. > Simulate from the null: fit the null model and use the simulate method: nullmodel <- glm.nb(response~1,data=yourdata) simulate(nullmodel) Simulate from the fitted distribution: as above, but fit the full model. From jholtman at gmail.com Mon Aug 1 04:32:58 2011 From: jholtman at gmail.com (jim holtman) Date: Sun, 31 Jul 2011 22:32:58 -0400 Subject: [R] memory problem; Error: cannot allocate vector of size 915.5 Mb In-Reply-To: <1312127619560-3707943.post@n4.nabble.com> References: <1312127619560-3707943.post@n4.nabble.com> Message-ID: My advice to you is to get a 64-bit version of R. Here is what it does on my 64-bit Windows 7 version: > N<-250 > x<-matrix(c(rnorm(N,-1.5,1), rnorm(N,1,1), rbinom(N,1,0.5)), ncol=3) > my.stats(1) 1 (1) - Rgui : 22:30:20 <0.7 78.6> 78.6 : 20.5MB > start<-(-1) > end<-3 > step<-10^(-2) > n.steps<-(end-start)/step > steps2 <-n.steps^2 > grids<-seq(from=start+step, to=end, by=step) > xMax <-matrix(0,N*steps2,3) > my.stats(2) 2 (1) - Rgui : 22:30:23 <4.1 82.1> 82.1 : 935.5MB > xMax[,1]<-rep(x[,1],steps2) > xMax[,2]<-rep(x[,2],steps2) > xMax[,3]<-rep(x[,3],steps2) > my.stats(3) 3 (1) - Rgui : 22:30:35 <16.0 94.3> 94.3 : 1998.9MB > G.search1<-as.matrix(rep(grids, n.steps, each=N)) > G.search2<-as.matrix(rep(grids, N, each=n.steps)) > G.search<-cbind(1,G.search1, G.search2) > my.stats(3) 3 (1) - Rgui : 22:30:45 <25.2 103.7> 103.7 : 2456.6MB > > gc() used (Mb) gc trigger (Mb) max used (Mb) Ncells 143726 7.7 350000 18.7 350000 18.7 Vcells 320137296 2442.5 353288723 2695.4 320138039 2442.5 > my.ls() Size Mode .my.env 56 environment .Random.seed 2,544 numeric end 48 numeric G.search 960,000,200 numeric G.search1 320,000,200 numeric G.search2 320,000,200 numeric grids 3,240 numeric N 48 numeric n.steps 48 numeric start 48 numeric step 48 numeric steps2 48 numeric x 6,200 character xMax 960,000,200 numeric **Total 2,560,013,128 ------- > You have objects totaling 2.5GB which is probably larger than can be handled on a 32-bit version, especially when copies have to be made. On Sun, Jul 31, 2011 at 11:53 AM, Dimitris.Kapetanakis wrote: > Dear all, > > I am trying to make some matrix operations (whose size I think is smaller > than what R allows) but the operations are not feasible when they run in one > session but it is feasible if they run separately while each operation is > totally independent of the other. I run the code in one session the error > that appears is: > > Error: cannot allocate vector of size 915.5 Mb > R(16467,0xa0421540) malloc: *** mmap(size=960004096) failed (error code=12) > *** error: can't allocate region > *** set a breakpoint in malloc_error_break to debug > R(16467,0xa0421540) malloc: *** mmap(size=960004096) failed (error code=12) > *** error: can't allocate region > *** set a breakpoint in malloc_error_break to debug > > In the code that I run (next lines), if I do not include the last three > lines it runs perfectly, if I exclude operations to create the xMax again it > runs perfectly, if I include both G.search and xMax appears the error term. > Does anyone knows the solution of this problem or why this problem happens? > > The code that I run is: > > N<-250 > x<-matrix(c(rnorm(N,-1.5,1), rnorm(N,1,1), rbinom(N,1,0.5)), ncol=3) > start<-(-1) > end<-3 > step<-10^(-2) > n.steps<-(end-start)/step > steps2 ?<-n.steps^2 > grids<-seq(from=start+step, to=end, by=step) > xMax ? ?<-matrix(0,N*steps2,3) > xMax[,1]<-rep(x[,1],steps2) > xMax[,2]<-rep(x[,2],steps2) > xMax[,3]<-rep(x[,3],steps2) > G.search1<-as.matrix(rep(grids, n.steps, each=N)) > G.search2<-as.matrix(rep(grids, N, each=n.steps)) > G.search<-cbind(1,G.search1, G.search2) > > Thank you > > Dimitris > > > -- > View this message in context: http://r.789695.n4.nabble.com/memory-problem-Error-cannot-allocate-vector-of-size-915-5-Mb-tp3707943p3707943.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? From mailinglist.honeypot at gmail.com Mon Aug 1 05:32:02 2011 From: mailinglist.honeypot at gmail.com (Steve Lianoglou) Date: Sun, 31 Jul 2011 23:32:02 -0400 Subject: [R] Is R the right choice for simulating first passage times of random walks? In-Reply-To: References: <1311809771.29519.276.camel@mattotaupa> <1312155391.3609.135.camel@mattotaupa> Message-ID: Hi, I haven't been following this thread very closely, but I'm getting the impression that the "inner loop" that's killing you folks here looks quite simple (assuming it is the one provided below). How about trying to write the of this `f4` function below using the rcpp/inline combo. The C/C++ you will need to write looks to be quite trivial, let's change f4 to accept an x argument as a vector: I've defined f4 in the same way as Dennis did: > f4 <- function() > ?{ > ? ? x <- sample(c(-1L,1L), 1) > > ? ? ?if (x >= 0 ) {return(1)} else { > ? ? ? ? ? csum <- x > ? ? ? ? ? len <- 1 > ? ? ? ? ? ? ? while(csum < 0) { > ? ? ? ? ? ? ? ? ? csum <- csum + sample(c(-1, 1), 1) > ? ? ? ? ? ? ? ? ? len <- len + 1 > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?} ? ? } > ? ? ?len > ?} Now, let's do some inline/c++ mojo: library(inline) inc <- " #include #include #include " fxx <-cxxfunction(includes=inc, plugin="Rcpp", body=" int len = 1; int x = ((rand() % 2 ) == 0) ? 1 : -1; int csum = x; while (csum < 0) { x = ((rand() % 2 ) == 0) ? 1 : -1; len++; csum = csum + x; } return wrap(len); ") Assuming I've faithfully translated this into c++, the timings aren't all that comparable. Doing 500 replicates with the pure R version: set.seed(123) system.time(out <- replicate(500, f4())) user system elapsed 31.525 0.120 32.510 Doing 10,000 replicates using the fxx function doesn't even break a sweat: system.time(outxx <- replicate(10000, fxx())) user system elapsed 0.371 0.001 0.373 range(out) [1] 1 1994308 range(outxx) [1] 1 11909394 Hope I'm not too off of the mark, here. -steve -- Steve Lianoglou Graduate Student: Computational Systems Biology ?| Memorial Sloan-Kettering Cancer Center ?| Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact From matt.curcio.ri at gmail.com Mon Aug 1 05:41:40 2011 From: matt.curcio.ri at gmail.com (Matt Curcio) Date: Sun, 31 Jul 2011 23:41:40 -0400 Subject: [R] Use dump or write? or what? Message-ID: Greetings all, I am calculating two t-test values for each of many files then save it to file calculate another set and append, repeat. But I can't figure out how to write it to file and then append subsequent t-tests. (maybe too tired ;} ) I have tried to use "dump" and "file.append" to no avial. ttest_results = tempfile() two_sample_ttest <- t.test (tempA, tempB, var.equal = TRUE) welch_ttest <- t.test (tempA, tempB, var.equal = FALSE) dump (two_sample_ttest, file = "dumpdata.txt"", append=TRUE) ttest_results <- file.append (ttest_results, two_sample_ttest) Any suggestions, M -- Matt Curcio M: 401-316-5358 E: matt.curcio.ri at gmail.com From djmuser at gmail.com Mon Aug 1 06:10:12 2011 From: djmuser at gmail.com (Dennis Murphy) Date: Sun, 31 Jul 2011 21:10:12 -0700 Subject: [R] Is R the right choice for simulating first passage times of random walks? In-Reply-To: References: <1311809771.29519.276.camel@mattotaupa> <1312155391.3609.135.camel@mattotaupa> Message-ID: Hi Steve: Very, very nice. Thanks for the useful Rcpp script. I'm not surprised that a C++ version blows my humble little R function out of the water :) I noticed that the R function ran a lot more slowly when the sojourns were very long. It suggests that algorithms that entail conditional iteration are quite likely to be better off written in a compiled programming language that can communicate with R. It also shows off the capabilities of the Rcpp package. Best regards, Dennis On Sun, Jul 31, 2011 at 8:32 PM, Steve Lianoglou wrote: > Hi, > > I haven't been following this thread very closely, but I'm getting the > impression that the "inner loop" that's killing you folks here looks > quite simple (assuming it is the one provided below). > > How about trying to write the of this `f4` function below using the > rcpp/inline combo. The C/C++ you will need to write looks to be quite > trivial, let's change f4 to accept an x argument as a vector: > > I've defined f4 in the same way as Dennis did: > >> f4 <- function() >> ?{ >> ? ? x <- sample(c(-1L,1L), 1) >> >> ? ? ?if (x >= 0 ) {return(1)} else { >> ? ? ? ? ? csum <- x >> ? ? ? ? ? len <- 1 >> ? ? ? ? ? ? ? while(csum < 0) { >> ? ? ? ? ? ? ? ? ? csum <- csum + sample(c(-1, 1), 1) >> ? ? ? ? ? ? ? ? ? len <- len + 1 >> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?} ? ? } >> ? ? ?len >> ?} > > Now, let's do some inline/c++ mojo: > > library(inline) > inc <- " > #include > #include > #include > " > > fxx <-cxxfunction(includes=inc, plugin="Rcpp", body=" > ?int len = 1; > ?int x = ((rand() % 2 ) == 0) ? 1 : -1; > ?int csum = x; > > ?while (csum < 0) { > ? ?x = ((rand() % 2 ) == 0) ? 1 : -1; > ? ?len++; > ? ?csum = csum + x; > ?} > > ?return wrap(len); > ") > > Assuming I've faithfully translated this into c++, the timings aren't > all that comparable. > > Doing 500 replicates with the pure R version: > > set.seed(123) > system.time(out <- replicate(500, f4())) > ? user ?system elapsed > ?31.525 ? 0.120 ?32.510 > > Doing 10,000 replicates using the fxx function doesn't even break a sweat: > > system.time(outxx <- replicate(10000, fxx())) > ? user ?system elapsed > ?0.371 ? 0.001 ? 0.373 > > range(out) > [1] ? ? ? 1 1994308 > > range(outxx) > [1] ? ? ? ?1 11909394 > > Hope I'm not too off of the mark, here. > -steve > -- > Steve Lianoglou > Graduate Student: Computational Systems Biology > ?| Memorial Sloan-Kettering Cancer Center > ?| Weill Medical College of Cornell University > Contact Info: http://cbio.mskcc.org/~lianos/contact > From michael.weylandt at gmail.com Mon Aug 1 06:32:02 2011 From: michael.weylandt at gmail.com (R. Michael Weylandt ) Date: Sun, 31 Jul 2011 23:32:02 -0500 Subject: [R] Is R the right choice for simulating first passage times of random walks? In-Reply-To: <1312155391.3609.135.camel@mattotaupa> References: <1311809771.29519.276.camel@mattotaupa> <1312155391.3609.135.camel@mattotaupa> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From M.Rosario.Garcia at slu.se Mon Aug 1 01:54:17 2011 From: M.Rosario.Garcia at slu.se (Rosario Garcia Gil) Date: Mon, 1 Aug 2011 01:54:17 +0200 Subject: [R] export/import matrix Message-ID: Hello I have a problem on keeping the format when I export a matrix file with the write.table() function. When I import the data volcano from rgl package it looks like this in R: > data[1:5,] [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [1,] 100 100 101 101 101 101 101 100 100 100 101 101 102 102 [2,] 101 101 102 102 102 102 102 101 101 101 102 102 103 103 [3,] 102 102 103 103 103 103 103 102 102 102 103 103 104 104 [4,] 103 103 104 104 104 104 104 103 103 103 103 104 104 104 [5,] 104 104 105 105 105 105 105 104 104 103 104 104 105 105 I use this data to represent a 3D map with the follwing script and it works PEFECT! > y<- 2*data > x <- 10* (1:nrow(y)) > z <- 10* (1:ncol(y)) > ylim <- range(y) > ylen <-ylim[2] - ylim[1] + 1 > colorlut <- terrain.colors(ylen) > col <- colorlut[y-ylim[1] + 1] > rgl.open() > rgl.surface(x,z,y, color=col, back="lines") Then I export it as write.table(data, file="datam.txt", row.names=TRUE, col.names=TRUE), when I import it back into R again with read.table("datam.txt") it looks like this in R: V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 1 100 100 101 101 101 101 101 100 100 100 101 101 102 102 102 102 103 104 103 2 101 101 102 102 102 102 102 101 101 101 102 102 103 103 103 103 104 105 104 3 102 102 103 103 103 103 103 102 102 102 103 103 104 104 104 104 105 106 105 4 103 103 104 104 104 104 104 103 103 103 103 104 104 104 105 105 106 107 106 5 104 104 105 105 105 105 105 104 104 103 104 104 105 105 105 106 107 108 108 The script I mention before does not anymore work on it, if I converted to matrix with as.matrix still does not work. I have read the pdf on import/export of R and searched by googleling but I have not found any answer to my problem. I am sorry if the answer is very obvious but I have tried for more than a week. Any help is really wellcome, thanks in advance. Rosario From marcelcurlin at gmail.com Mon Aug 1 06:00:58 2011 From: marcelcurlin at gmail.com (marcel) Date: Sun, 31 Jul 2011 21:00:58 -0700 (PDT) Subject: [R] Plot Frame color and linewidth Message-ID: <1312171258369-3708858.post@n4.nabble.com> I have a figure with a lattice plot and a basic plot. Is there a way to select the color and line width of the surrounding boxes? # Data tC <- textConnection(" Time Type1 Type2 Type3 1.3 .50 .10 .40 4.5 .45 .20 .35 5.2 .40 .30 .30 ") data1 <- read.table(header=TRUE, tC) data2 <- data.frame(Time=rep(data1$Time, 3), stack(data1[,2:4])) close.connection(tC) rm(tC) #My lattice plot require(lattice) plot1<-xyplot(values ~ Time, par.settings = list(layout.widths = list(ylab.axis.padding = 0)), ylab=list(label="Y Label", fontsize=9), scales=list(y=list(relation="free", rot=0, cex=0.7), x = list(draw = FALSE)), group=ind, data=data2, stack=TRUE, horizontal=FALSE, panel=panel.barchart, box.width=0.1, axes=FALSE, ylim=c(0.03,0.98), xlim=c(-0.2, 6.25), main="Lattice Plot", xlab="") plot(0.1,0.1, axes=FALSE, ylab="", xlab="", pch = "") # dummy plot to reset basic plot environment print(plot1, position=c(-0.0068,0.221,0.741,0.466)) #My basic plot par(new=TRUE, mfrow = c(4, 2), fig=c(0,1,0,1), mar=c(42.9, 4, 1.14, 15)) plot(data1$Time, data1$Type1, frame=T, main="Basic Plot", ylab="Y Label", xlab="", col=2, xlim= c(0,6), ylim= c(0, 1), axes=FALSE) axis(2, at=c(0.25, 0.5, 0.74), las=1) -- View this message in context: http://r.789695.n4.nabble.com/Plot-Frame-color-and-linewidth-tp3708858p3708858.html Sent from the R help mailing list archive at Nabble.com. From Achim.Zeileis at uibk.ac.at Mon Aug 1 09:10:36 2011 From: Achim.Zeileis at uibk.ac.at (Achim Zeileis) Date: Mon, 1 Aug 2011 09:10:36 +0200 (CEST) Subject: [R] zero truncated poisson regression In-Reply-To: <1312143730.59151.YahooMailNeo@web120607.mail.ne1.yahoo.com> References: <1312131597.14558.YahooMailNeo@web120616.mail.ne1.yahoo.com> <1312143730.59151.YahooMailNeo@web120607.mail.ne1.yahoo.com> Message-ID: On Sun, 31 Jul 2011, Iasonas Lamprianou wrote: > Thanks > Pscl seems to be a sensible option. > > > I have the counts variable with the name "N". This variable can only > take values bigger than zero! > > I have two explanatory variables with the names "type" and "diam" > > but when I run > > hpm <- hurdle(n ~ type+diam, data = an, dist = "poisson") > > I get the message "invalid dependent variable, minimum count is not > zero". Well, I know that N>0, that is why want to run a zero-truncated > model. But I must be missing something...and the manual does not seem to > help a lot... > > Can anyone help please? As previously pointed out by others on this list: hurdle() is not what you are looking for (although it is related to what you want to do). The hurdle() model is a two-part model consisting of a zero-truncated count part and a binary part for modeling N=0 vs N>0. See also vignette("countreg", package = "pscl") for details. As you don't need the binary hurdle part, you cannot use hurdle() directly. This is why the package "countreg" on R-Forge provides the function zerotrunc() which essentially does the same thing as the count part in hurdle(). install.packages("countreg", repos = "http://R-Forge.R-project.org") library("countreg") m <- zerotrunc(n ~ type + diam, data = an, dist = "poisson") summary(m) > ? > Dr. Iasonas Lamprianou > Department of Social and Political Sciences > University of Cyprus > > >> ________________________________ >> From: Mitchell Maltenfort >> To: Iasonas Lamprianou ; "r-help at r-project.org" >> Sent: Sunday, 31 July 2011, 20:45 >> Subject: Re: [R] zero truncated poisson regression >> >> Pscl package. >> >> On 7/31/11, Iasonas Lamprianou wrote: >>> Dear friends, >>> >>> does anyone know how I can run a zero truncated poisson regression using R >>> (or even SPSS)? >>> >>> Dr. Iasonas Lamprianou >>> Department of Social and Political Sciences >>> University of Cyprus >>> >>> ??? [[alternative HTML version deleted]] >>> >>> >> >> -- >> Sent from my mobile device >> >> Due to the recession, requests for instant gratification will be >> deferred until arrears in scheduled gratification have been satisfied. >> >> >> > [[alternative HTML version deleted]] > > From dieter.menne at menne-biomed.de Mon Aug 1 09:26:09 2011 From: dieter.menne at menne-biomed.de (Dieter Menne) Date: Mon, 1 Aug 2011 00:26:09 -0700 (PDT) Subject: [R] Use dump or write? or what? In-Reply-To: References: Message-ID: <1312183569305-3709031.post@n4.nabble.com> oaxacamatt wrote: > > I am calculating two t-test values for each of many files then save it > to file calculate another set and append, repeat. > You did not tell use what you want to do with the data in the file. If you just want a copy of the output, bracketing with sink(file) and sink() can be useful. If you want to process the results later, try save/load. Dieter -- View this message in context: http://r.789695.n4.nabble.com/Use-dump-or-write-or-what-tp3708904p3709031.html Sent from the R help mailing list archive at Nabble.com. From j3ffdick at gmail.com Mon Aug 1 09:40:39 2011 From: j3ffdick at gmail.com (Jeffrey Dick) Date: Mon, 1 Aug 2011 00:40:39 -0700 Subject: [R] Use dump or write? or what? In-Reply-To: References: Message-ID: Hi Matt, I assume that you want a tabular text file of the results. Since I don't know what your tempA and tempB are I'll steal some examples from ?t.test > t.example.1 <- t.test(1:10,y=c(7:20)) > t.example.2 <- t.test(1:10,y=c(7:20, 200)) Now looking at ?dump, the first argument needs to be *character*, signifying "The names of one or more R objects to be dumped." So put the names of the objects in quotes. I'm ignoring your "ttest_results = tempfile()" line because it appears you want to put the results into "dumpdata.txt". > dump ("t.example.1", file = "dumpdata.txt") > dump ("t.example.2", file = "dumpdata.txt", append=TRUE) Both objects show up in the file (yay!) but the result probably isn't what you're after, with some R-like code along the lines of t.example.1 <- structure(list(statistic [ ... lots of other stuff that isn't in a tabular format ... ] write()ing a list isn't the way to go either: > write(t.example.1,"test.txt") Error in cat(list(...), file, sep, fill, labels, append) : argument 1 (type 'list') cannot be handled by 'cat' write.table() of the whole result gives some kind of problem as well: > write.table(t.example.1,"test.txt") Error in as.data.frame.default(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors) : cannot coerce class '"htest"' into a data.frame What about saving only part of the results? The first line below overwrites the "dumpdata.txt" created above. The second line appends to the file, and also doesn't write the column names because they are already present from the first write.table. > write.table(t.example.1[1:3], "dumpdata.txt") > write.table(t.example.2[1:3], "dumpdata.txt", append=TRUE, col.names=FALSE) There are certainly many variations to try. This writes only the "statistic", "parameter" and "p.value" of the t-tests. Here is the resulting file. > cat(readLines("dumpdata.txt"), sep="\n") "statistic" "parameter" "p.value" "t" -5.43492976389406 21.982212340189 1.85528183251181e-05 "t" -1.63290263320121 14.1645989530125 0.124513498089745 Jeff On Sun, Jul 31, 2011 at 8:41 PM, Matt Curcio wrote: > Greetings all, > I am calculating two t-test values for each of many files then save it > to file calculate another set and append, repeat. > But I can't figure out how to write it to file and then append > subsequent t-tests. > (maybe too tired ;} ) > I have tried to use "dump" and "file.append" to no avial. > > ttest_results = tempfile() > > two_sample_ttest <- t.test (tempA, tempB, var.equal = TRUE) > welch_ttest <- t.test (tempA, tempB, var.equal = FALSE) > > dump (two_sample_ttest, file = "dumpdata.txt"", append=TRUE) > ttest_results <- file.append (ttest_results, two_sample_ttest) > > Any suggestions, > M > -- > > > > Matt Curcio > M: 401-316-5358 > E: matt.curcio.ri at gmail.com > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From pllcc023 at gmail.com Mon Aug 1 09:41:34 2011 From: pllcc023 at gmail.com (Paola Lecca) Date: Mon, 1 Aug 2011 09:41:34 +0200 Subject: [R] Help with modFit of FME package In-Reply-To: References: Message-ID: Dear R users, I'm trying to fit a set an ODE to an experimental time series. In the attachment you find the R code I wrote using modFit and modCost of FME package and the file of the time series. When I run summary(Fit) I obtain this error message, and the values of the parameters are equal to the initial guesses I gave to them. The problem is not due to the fact that I have only one equation (I tried also with more equations, but I still obtain this error). I would appreciate if someone could help me in understanding the reason of the error and in fixing it. Thanks for your attention, Paola Lecca. Here the error: > summary(Fit) Parameters: Estimate Std. Error t value Pr(>|t|) pro1_strength 1 NA NA NA Residual standard error: 2.124 on 10 degrees of freedom Error in cov2cor(x$cov.unscaled) : 'V' is not a square numeric matrix In addition: Warning message: In summary.modFit(Fit) : Cannot estimate covariance; system is singular ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- *Paola Lecca, PhD* *The Microsoft Research - University of Trento* *Centre for Computational and Systems Biology* *Piazza Manci 17 38123 Povo/Trento, Italy* *Phome: +39 0461282843* *Fax: +39 0461282814* -------------- next part -------------- time pp1_mrna 0 0 2 2.754 4 2.958 6 4.058 8 3.41 10 3.459 12 2.453 14 1.234 16 2.385 18 3.691 20 3.252 From dieter.menne at menne-biomed.de Mon Aug 1 09:44:53 2011 From: dieter.menne at menne-biomed.de (Dieter Menne) Date: Mon, 1 Aug 2011 00:44:53 -0700 (PDT) Subject: [R] Plot frame color and linewidth In-Reply-To: <1312171258369-3708858.post@n4.nabble.com> References: <1312171258369-3708858.post@n4.nabble.com> Message-ID: <1312184693189-3709062.post@n4.nabble.com> marcel wrote: > > I have a figure with a lattice plot and a basic plot. Is there a way to > select the color and line width of the surrounding boxes for each of > these? I could not find any documentation on this. > > Thanks for providing a nice self-contained example. There was nothing wrong with it, but you can simplyfy your life with: library(lattice) xyplot(1~1, par.settings = list(axis.line=list(col="green"))) When lost in trellis space, I always do: str(trellis.par.get()) For standard graphics, there is an example with gray color at the bottom of the par-help page. Dieter -- View this message in context: http://r.789695.n4.nabble.com/Plot-frame-color-and-linewidth-tp3708858p3709062.html Sent from the R help mailing list archive at Nabble.com. From dieter.menne at menne-biomed.de Mon Aug 1 09:51:06 2011 From: dieter.menne at menne-biomed.de (Dieter Menne) Date: Mon, 1 Aug 2011 00:51:06 -0700 (PDT) Subject: [R] export/import matrix In-Reply-To: References: Message-ID: <1312185066452-3709072.post@n4.nabble.com> Rosario Garcia Gil-2 wrote: > > I have a problem on keeping the format when I export a matrix file with > the write.table() function. > > When I import the data volcano from rgl package it looks like this in R: > >> data[1:5,] > [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] > [,14] > [1,] 100 100 101 101 101 101 101 100 100 100 101 101 102 > 102 > [2,] 101 101 102 102 102 102 102 101 101 101 102 102 103 > 103 > ... > I use this data to represent a 3D map with the follwing script and it > works PEFECT! > >> y<- 2*data >> x <- 10* (1:nrow(y)) >> z <- 10* (1:ncol(y)) >> ylim <- range(y) >> ylen <-ylim[2] - ylim[1] + 1 >> colorlut <- terrain.colors(ylen) >> col <- colorlut[y-ylim[1] + 1] >> rgl.open() >> rgl.surface(x,z,y, color=col, back="lines") > ... > Then I export it as write.table(data, file="datam.txt", row.names=TRUE, > col.names=TRUE), > ... > when I import it back into R again with read.table("datam.txt") it looks > like this in R: > > ... > The script I mention before does not anymore work on it, if I converted to > matrix with as.matrix still does not work. > > ... > It is always better to report what str(mydata) looks like, instead of showing the data. And I an quite sure that something like as.matrix would work, but you did not tell use what the error message in "still does not work" looked like. Dieter -- View this message in context: http://r.789695.n4.nabble.com/export-import-matrix-tp3708935p3709072.html Sent from the R help mailing list archive at Nabble.com. From ehlers at ucalgary.ca Mon Aug 1 10:25:29 2011 From: ehlers at ucalgary.ca (Peter Ehlers) Date: Mon, 01 Aug 2011 01:25:29 -0700 Subject: [R] Problems with ks.test() In-Reply-To: <1311930451360-3703469.post@n4.nabble.com> References: <1311930451360-3703469.post@n4.nabble.com> Message-ID: <4E3662F9.5060109@ucalgary.ca> (I'm replying to your original post because your follow-up omits the context.) The K-S test is designed for continuous distributions. You have far too many zeros in your data to get anything reasonable out of the test. For your data, the K-S statistic is the difference in the (e)cdfs at zero. Your results just show that this can be sensitive to the degree of rounding used for the theoretical cdf. Peter Ehlers On 2011-07-29 02:07, Jochen1980 wrote: > Hi, > > I got two data point vectors. Now I want to make a ks.test(). I you print > both vectors you will see, that they fit pretty fine. Here is a picture: > http://www.jochen-bauer.net/downloads/kstest-r-help-list-plot.png > > As you can see there is one histogram and moreover there is the gumbel > density > function plotted. Now I took to bin-mids and the bin-height for vector1 and > computed the distribution-values to all bin-mids as vector2. > > I pass these two vectors to ks.test(). Are those the right vectors, if I > want > to decide afterwards, if my experiment-data is gumbel-distributed? > > Surprisingly the p-value changes tremendously if I calculate more digits out > of > my theoretical formula. If I round to 0 digits, p is 1, if I round to 4 > digits, > p drops to 0 - how could this happen, I thought more digits will bring more > accurate results?! > > XXXX Case 0 digits: XXXXXXXXXXXXXXXXXXXXXXXXXXX > [1] 0 0 0 0 0 24 74 98 133 147 134 120 89 69 46 31 16 > 7 > [19] 7 3 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 > 0 > [37] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 > 0 > [55] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 > 0 > [73] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 > 0 > [91] 0 0 0 0 0 0 0 0 0 0 > [1] 0 0 0 0 1 10 49 113 160 168 147 113 81 55 37 24 15 > 10 > [19] 6 4 2 2 1 1 0 0 0 0 0 0 0 0 0 0 0 > 0 > [37] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 > 0 > [55] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 > 0 > [73] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 > 0 > [91] 0 0 0 0 0 0 0 0 0 0 > [1] "Ergebnisse" > [1] "Analyse der Eingangsdaten" > [1] "Mean: 0.104537195" > [1] "SAbw.: 0.0277657985898433" > [1] "Parameter-Berechnung der Daten bei angenommener Gumbelverteilung" > [1] "Mue: 0.0920411082987717" > [1] "Beta: 0.0216489043196013" > [1] "KS-Test -> 1000 Werte, 100 Bins, x: Klassenmitten, y1, y2 = > Histogrammh?hen" > [1] "KST D: 0.04" > [1] "KST P: 1" > > XXX Case 4 digits: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX > [1] 0 0 0 0 0 24 74 98 133 147 134 120 89 69 46 31 16 > 7 > [19] 7 3 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 > 0 > [37] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 > 0 > [55] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 > 0 > [73] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 > 0 > [91] 0 0 0 0 0 0 0 0 0 0 > [1] 0.000 0.000 0.000 0.006 0.622 10.094 49.271 112.776 > 160.174 > [10] 168.419 146.527 113.137 81.026 55.344 36.690 23.870 15.347 > 9.793 > [19] 6.220 3.939 2.490 1.572 0.992 0.625 0.394 0.248 > 0.157 > [28] 0.099 0.062 0.039 0.025 0.016 0.010 0.006 0.004 > 0.002 > [37] 0.002 0.001 0.001 0.000 0.000 0.000 0.000 0.000 > 0.000 > [46] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 > 0.000 > [55] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 > 0.000 > [64] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 > 0.000 > [73] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 > 0.000 > [82] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 > 0.000 > [91] 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 > 0.000 > [100] 0.000 > [1] "Ergebnisse" > [1] "Analyse der Eingangsdaten" > [1] "Mean: 0.104537195" > [1] "SAbw.: 0.0277657985898433" > [1] "Parameter-Berechnung der Daten bei angenommener Gumbelverteilung" > [1] "Mue: 0.0920411082987717" > [1] "Beta: 0.0216489043196013" > [1] "KS-Test -> 1000 Werte, 100 Bins, x: Klassenmitten, y1, y2 = > Histogrammh?hen" > [1] "KST D: 0.2" > [1] "KST P: 0.0366" > > Thanks in advance for some help. > Jochen > > -- > View this message in context: http://r.789695.n4.nabble.com/Problems-with-ks-test-tp3703469p3703469.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From baxy at hi.htnet.hr Mon Aug 1 10:33:09 2011 From: baxy at hi.htnet.hr (baxy77) Date: Mon, 1 Aug 2011 01:33:09 -0700 (PDT) Subject: [R] Beta fit returns NaNs Message-ID: <1312187589883-3709139.post@n4.nabble.com> Hi, sorry for repeating the question but this is kind of important to me and i don't know whom should i ask. So as noted before when I do a parameter fit to the beta distr i get: fitdist(vectNorm,"beta"); Fitting of the distribution ' beta ' by maximum likelihood Parameters: estimate Std. Error shape1 2.148779 0.1458042 shape2 810.067515 61.8608126 Warning messages: 1: In dbeta(x, shape1, shape2, log) : NaNs produced 2: In dbeta(x, shape1, shape2, log) : NaNs produced 3: In dbeta(x, shape1, shape2, log) : NaNs produced 4: In dbeta(x, shape1, shape2, log) : NaNs produced 5: In dbeta(x, shape1, shape2, log) : NaNs produced 6: In dbeta(x, shape1, shape2, log) : NaNs produced Now im my vector has cca 900 points. are those 6 error messages some thing to be really concerned or ???? what does it mean ?? -- View this message in context: http://r.789695.n4.nabble.com/Beta-fit-returns-NaNs-tp3709139p3709139.html Sent from the R help mailing list archive at Nabble.com. From lamprianou at yahoo.com Mon Aug 1 10:37:22 2011 From: lamprianou at yahoo.com (Iasonas Lamprianou) Date: Mon, 1 Aug 2011 01:37:22 -0700 (PDT) Subject: [R] zero truncated poisson regression In-Reply-To: References: <1312131597.14558.YahooMailNeo@web120616.mail.ne1.yahoo.com> <1312143730.59151.YahooMailNeo@web120607.mail.ne1.yahoo.com> Message-ID: <1312187842.60769.YahooMailNeo@web120601.mail.ne1.yahoo.com> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From pdalgd at gmail.com Mon Aug 1 10:46:46 2011 From: pdalgd at gmail.com (peter dalgaard) Date: Mon, 1 Aug 2011 10:46:46 +0200 Subject: [R] Beta fit returns NaNs In-Reply-To: <1312187589883-3709139.post@n4.nabble.com> References: <1312187589883-3709139.post@n4.nabble.com> Message-ID: On Aug 1, 2011, at 10:33 , baxy77 wrote: > Hi, > > sorry for repeating the question but this is kind of important to me and i > don't know whom should i ask. > > So as noted before when I do a parameter fit to the beta distr i get: > > > fitdist(vectNorm,"beta"); > Fitting of the distribution ' beta ' by maximum likelihood > Parameters: > estimate Std. Error > shape1 2.148779 0.1458042 > shape2 810.067515 61.8608126 > Warning messages: > 1: In dbeta(x, shape1, shape2, log) : NaNs produced > 2: In dbeta(x, shape1, shape2, log) : NaNs produced > 3: In dbeta(x, shape1, shape2, log) : NaNs produced > 4: In dbeta(x, shape1, shape2, log) : NaNs produced > 5: In dbeta(x, shape1, shape2, log) : NaNs produced > 6: In dbeta(x, shape1, shape2, log) : NaNs produced > > > Now im my vector has cca 900 points. are those 6 error messages some thing > to be really concerned or ???? what does it mean ?? They are probably harmless. It just means that in the search of the parameter space, the fitting algorithm ventured into forbidden territory (most likely, it tried a negative value for one of the shape parameters). You could try setting start= to something closer to the final estimates and see if the warnings go away. BTW: I assume this is using the fitdistrplus contributed package (and not just misspelling fitdistr from MASS)? You really should specify such things -- to make it easier for people to help, but also out of courtesy to the author. -pd -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com "D?den skal tape!" --- Nordahl Grieg From ehlers at ucalgary.ca Mon Aug 1 11:16:43 2011 From: ehlers at ucalgary.ca (Peter Ehlers) Date: Mon, 01 Aug 2011 02:16:43 -0700 Subject: [R] Beta fit returns NaNs In-Reply-To: <1312187589883-3709139.post@n4.nabble.com> References: <1312187589883-3709139.post@n4.nabble.com> Message-ID: <4E366EFB.1000705@ucalgary.ca> On 2011-08-01 01:33, baxy77 wrote: > Hi, > > sorry for repeating the question but this is kind of important to me and i > don't know whom should i ask. > > So as noted before when I do a parameter fit to the beta distr i get: > > > fitdist(vectNorm,"beta"); > Fitting of the distribution ' beta ' by maximum likelihood > Parameters: > estimate Std. Error > shape1 2.148779 0.1458042 > shape2 810.067515 61.8608126 > Warning messages: > 1: In dbeta(x, shape1, shape2, log) : NaNs produced > 2: In dbeta(x, shape1, shape2, log) : NaNs produced > 3: In dbeta(x, shape1, shape2, log) : NaNs produced > 4: In dbeta(x, shape1, shape2, log) : NaNs produced > 5: In dbeta(x, shape1, shape2, log) : NaNs produced > 6: In dbeta(x, shape1, shape2, log) : NaNs produced > > > Now im my vector has cca 900 points. are those 6 error messages some thing > to be really concerned or ???? what does it mean ?? Those warnings are from optim(). You probably don't have to worry about them. I usually use fitdistr() in the MASS package. But it will require reasonable start values. To avoid the warnings, you could try using the parameter estimates from your fitdist(vectNorm, "beta") call as start values and re-run fitdist() with those values, and you might also set the optim method to BFGS (which, BTW, is the default in fitdistr()). library(fitdistrplus) fitdist(vectNorm, "beta", start = list(shape1 = 2.15, shape2 = 810), optim.method = "BFGS") Peter Ehlers > > -- > View this message in context: http://r.789695.n4.nabble.com/Beta-fit-returns-NaNs-tp3709139p3709139.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From paul.hiemstra at knmi.nl Mon Aug 1 11:33:48 2011 From: paul.hiemstra at knmi.nl (Paul Hiemstra) Date: Mon, 01 Aug 2011 09:33:48 +0000 Subject: [R] Spatial Data Interpolation In-Reply-To: <1312058132.44931.YahooMailNeo@web121702.mail.ne1.yahoo.com> References: <1312058132.44931.YahooMailNeo@web121702.mail.ne1.yahoo.com> Message-ID: <4E3672FC.4070902@knmi.nl> Dear Peter, The spatial taskview lists a number of interpolation methods [1]. Some of those support spatio-temporal interpolation. For example gstat supports spatio-temporal kriging [2,3,4]. regards, Paul [1] http://cran.r-project.org/web/views/Spatial.html [2] http://en.wikipedia.org/wiki/Kriging [3] http://www.google.nl/search?q=space+time+kriging [4] http://cran.r-project.org/web/packages/gstat/index.html On 07/30/2011 08:35 PM, Peter Maclean wrote: > Dear GIS people > What is the best way of implemeting spatial data interpolation (from large to small grids)-especially for dummies. I searched the internet and could not get concrete answer. Here is an example with simulated data. > > #Example of spatial data interpolation > require(utils) > #I need to interpolate the temp and rain data (from its surounding points) > #for the same period and accoubting for elevation > #New coordinates and elevation > lat <-seq(-1, -5, by=-0.1) > lon <-seq(28, 30, by=0.1) > year <- seq(2000, 2005, by=1) > period <- c("Mar", "Apr","May") > ndata <- list(year=year,period=period,lat=lat, lon=lon) > ndata <- expand.grid(ndata) > ndata$elev <-sample(1000: 8000,nrow(ndata),replace=T) > ndata <- ndata[order(ndata$year,ndata$period) , ] > fix(ndata) > > #Original data with elevation-same period > lat <- seq(-1, -5, by=-0.5) > lon <- seq(28, 30, by=0.5) > data <- list(year=year,period=period,lat=lat, lon=lon) > data <- expand.grid(data) > data$temp <- sample(15:100, nrow(data),replace=T) > data$rain <- sample(0: 1000,nrow(data),replace=T) > data <- data[order(data$year,data$period) , ] > data <- na.omit(merge(data,ndata, by=c("year", "period", "lat","lon"))) > fix(data) > ########## > #Spatial-Temporal Interpolation from original data (temp & rain) to new data > > > Peter Maclean > Department of Economics > UDSM > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Paul Hiemstra, Ph.D. Global Climate Division Royal Netherlands Meteorological Institute (KNMI) Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39 P.O. Box 201 | 3730 AE | De Bilt tel: +31 30 2206 494 http://intamap.geo.uu.nl/~paul http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770 From baxy at hi.htnet.hr Mon Aug 1 11:55:11 2011 From: baxy at hi.htnet.hr (baxy77) Date: Mon, 1 Aug 2011 02:55:11 -0700 (PDT) Subject: [R] Beta fit returns NaNs In-Reply-To: References: <1312187589883-3709139.post@n4.nabble.com> Message-ID: <1312192511268-3709277.post@n4.nabble.com> yes it is the fitdistrplus package. Sorry form not mentioning it earlier. Usually i do those things but this time it somehow slipped my mind , sorry and Thank you both! -- View this message in context: http://r.789695.n4.nabble.com/Beta-fit-returns-NaNs-tp3709139p3709277.html Sent from the R help mailing list archive at Nabble.com. From jim at bitwrit.com.au Mon Aug 1 12:14:58 2011 From: jim at bitwrit.com.au (Jim Lemon) Date: Mon, 01 Aug 2011 20:14:58 +1000 Subject: [R] Legend for 2 plots on same screen In-Reply-To: References: Message-ID: <4E367CA2.1010701@bitwrit.com.au> On 08/01/2011 04:52 AM, Cheryl Johnson wrote: > Hello, > > I have two plots on the same screen. I use the command par(mfrow=c(1,2)) in > order to do this. When I try to make a legend for both plots, it only puts > the legend in the plot on the right side. If I would like a legend that is > outside of both of the plots, how would I do this? > Hi Cheryl, You probably want to use par(xpd=TRUE) to allow displaying the legend outside the plot areas. Look at the color.legend (plotrix) function to see how it's done. Jim From fraenzi.korner at oikostat.ch Mon Aug 1 12:18:40 2011 From: fraenzi.korner at oikostat.ch (fraenzi.korner at oikostat.ch) Date: 1 Aug 2011 12:18:40 +0200 Subject: [R] =?utf-8?q?R-help_Digest=2C_Vol_102=2C_Issue_1?= Message-ID: <20110801101840.29147.qmail@srv5.yoursite.ch> Wir sind bis am 20. August in den Ferien und werden keine e-mails beantworten. Bei dringenden F?llen melden Sie sich bei Stefanie von Felten steffi.vonfelten at oikostat.ch We are on vacation until 20. August. In urgent cases, please contact Stefanie von Felten steffi.vonfelten at oikostat.ch From ogbos.okike at gmail.com Mon Aug 1 12:32:29 2011 From: ogbos.okike at gmail.com (ogbos okike) Date: Mon, 1 Aug 2011 12:32:29 +0200 Subject: [R] axes label Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From jeroen.ooms at stat.ucla.edu Mon Aug 1 12:47:05 2011 From: jeroen.ooms at stat.ucla.edu (Jeroen Ooms) Date: Mon, 1 Aug 2011 12:47:05 +0200 Subject: [R] Save generic plot to file (before rendering to device) In-Reply-To: <4E21E189.6060806@statistik.tu-dortmund.de> References: <1310400495706-3659999.post@n4.nabble.com> <067B2EEC-F9BC-4777-8366-6CB1D6D5F69D@comcast.net> <4E21C7D5.4060506@statistik.tu-dortmund.de> <4E21E189.6060806@statistik.tu-dortmund.de> Message-ID: Bumping this one up because the 'before.plot.new' solution turned out to be sub-optimal after all. >> It should be possible to do this with a before.plot.new hook, right? > > Yes, sure, if you treat the first and last plot separately. It turns out that the before.plot.new hook does not is not triggered at the right moments. I'm not sure if this is intended behavior or incorrect implementation. What I was expecting is a hook/event that is triggered every time before a new graphics frame is opened. E.g. if there is an open PDF device and some plots are printed, the number of times the hook is called should be exactly equal to the number of pages in the resulting PDF document. Sometimes this works as expected, sometimes it doesn't. At the end of this message some example code. In the first example, the hook works as expected is called 4 times, as there are 4 plots. In all the other examples the event is either triggered too often or not triggered at all. I guess the hook is called when the plot.new() function is explicitly called, which might not always happen. My question would be if (1) this is the intended behavior for 'before.plot.new', and (2) if yes, would it be possible to define an additional event that always triggers, and only triggers, if a completely new graphics device is opened. I.e. whenever a pdf device would start a new page. Thank you. #set the hook (event listener) setHook("before.plot.new", NULL); setHook("before.plot.new", function(){ message("Yay! A new plot!")}); #works as expected: plot(lm(speed~dist, cars), ask=F); #triggered way too often, once for every partition of the plot plot(mtcars); #not triggered at all by lattice library(lattice); dotplot(speed~dist, cars); #not triggered at all by ggplot2 library(ggplot2); qplot(speed, dist, data=cars); From paul.hiemstra at knmi.nl Mon Aug 1 13:32:05 2011 From: paul.hiemstra at knmi.nl (Paul Hiemstra) Date: Mon, 01 Aug 2011 11:32:05 +0000 Subject: [R] help with algorithm In-Reply-To: References: Message-ID: <4E368EB5.4010002@knmi.nl> On 07/31/2011 05:57 PM, r student wrote: > I'm wondering if anyone can give some basic advice about how to approach a > specific task in R. > > I'm new to R but have used SAS for many years, and while I can muscle > through a lot of the code details, I'm unsure of a few things. > > > Specific questions: > > If I have to perform a set of actions on a group of files, should I use a > loop (I feel like I've heard people say to avoid looping in R)? Hi, Looping over several files is best done using the apply family of functions. Especially the llply, ldply and ddply functions from the plyr package I use a lot for processing. An example of looping over files and recombining the results would look something like: library(plyr) listoffiles = list.files("/where/the/files/are") combinedResult = ldply(listoffiles, function(filename) { bla = read.table(filename) ... now maybe do some stuff with it... return(result) # Note that result is a data.frame # Can contain e.g. summary stats of bla }) ldply will automatically combine the result of the function call in an efficient manner. It can take some time to get the hang of these things, but I love working with them when processing data. > How to get means for "by" groups and subset a files based on those (subset > highest and lowest groups)? (I can do this in multiple steps* but wonder > what the best, "R way" is to do this.) when your data.frame has the form and is called dat: value by 1 A 5 A 3 B etc You can use ddply like this to get the mean value per category in 'by': ddply(dat, .(by), summarise, m = mean(value)) > How to draw cutoff lines at specific points on density plots? > > How to create a matrix of plots? (Take 4 separate plots and put them into a > single graphic.) I really like the ggplot2 package, this provides drawing several plots using a special syntax construct (no need to manually subdivide the canvas nor keep the axis of the plots equal manually). Take a look at the website of ggplot2, specifically look at the examples given for the facet_wrap and facet_grid functions. cheers, Paul > > * Get group means, add means back to file, sort by mean, take first and last > groups > > > > Feel free to excoriate me if I'm asking for too much help. If possible > though, a few words of advice (loops are the best way, just use the "main" > parameter to combine plots) would be lovely if you can provide. > > > > Thanks! > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Paul Hiemstra, Ph.D. Global Climate Division Royal Netherlands Meteorological Institute (KNMI) Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39 P.O. Box 201 | 3730 AE | De Bilt tel: +31 30 2206 494 http://intamap.geo.uu.nl/~paul http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770 From petr.pikal at precheza.cz Mon Aug 1 13:55:03 2011 From: petr.pikal at precheza.cz (Petr PIKAL) Date: Mon, 1 Aug 2011 13:55:03 +0200 Subject: [R] Odp: converting factor to numeric gives "NAs introduced by coercion" In-Reply-To: <1311929122752-3703408.post@n4.nabble.com> References: <1311929122752-3703408.post@n4.nabble.com> Message-ID: Hi > Hi, > > I have a dataframe that I imported from a .txt file by: > > skogTemp <- read.delim2("Skogaryd_shoot_data.txt", header=TRUE, fill=TRUE) > > and the data are factors, how can avoid factors from the beginning? Although > the file contains both characters and numbers. You have got an answer but here are some comments. If you have characters and numbers in one column the character values are converted to NA by as.numeric > > I tried to convert some of the columns from factor to numeric and as I > understood it you can not use only as.numeric but as.character first. I got > this warning message: > > > skogTemp_1 <- as.numeric(as.character(skogTemp_1[,2:4])) > Warning message: > NAs introduced by coercion What is skogTemp_1? I presume skogTemp is data frame and in that case you can not use such construction directly. > > I have lots of NAs in my data. Tries to check what class I had now but > another warning is given me: > > class(skogTemp_1[,2]) skogTemp_1 is probably a vector with only one dimension therefore you get this error. class(skogTemp_1) shall give you the desired result, however I prefer ?str Regards Petr > Error in skogTemp_1[, 2] : incorrect number of dimensions > > class(skogTemp_1[1,2]) > Error in skogTemp_1[1, 2] : incorrect number of dimensions > > frustrating... I don't know what this mean. > > Can anyone help? > > Thank you, > Angelica > > > > -- > View this message in context: http://r.789695.n4.nabble.com/converting- > factor-to-numeric-gives-NAs-introduced-by-coercion-tp3703408p3703408.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From ehlers at ucalgary.ca Mon Aug 1 14:05:33 2011 From: ehlers at ucalgary.ca (Peter Ehlers) Date: Mon, 01 Aug 2011 05:05:33 -0700 Subject: [R] axes label In-Reply-To: References: Message-ID: <4E36968D.6030105@ucalgary.ca> On 2011-08-01 03:32, ogbos okike wrote: > Dear All, > I am trying to put 10^-8 st km^-2day^-1 on x-axis of my plot. I tried using > : ylab = expression(paste("st / ", plain(km)^2, " / day")) to see if I can > at least get the unit before thinking about the power of 10 (10^-8). > > However, ylab = expression(paste("st / ", plain(km)^2, " / day")) didn't > give the result I expected. The power 2 in km was missing. Works for me. But I don't see the need for paste() or plain(). Try this: plot(0, ylab="", xlab="") title(ylab = expression(10^{-8} ~ "st" ~ "km"^{-2} ~ "day"^{-1})) Replace any '~' with '*' if you don't want the space. Peter Ehlers > > > I will be glad for any help on how to label 10^-8 st km^-2day^-1 on the > axis. > Many thanks > Regards > Ogbos > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From cdshikida at gmail.com Mon Aug 1 14:19:49 2011 From: cdshikida at gmail.com (=?ISO-2022-JP?B?Q2xhdWRpbyBTaGlraWRhICgbJEJJX0VEPCNAPyEhJS8laSUmJTglKhsoQik=?=) Date: Mon, 1 Aug 2011 09:19:49 -0300 Subject: [R] ivreg and structural change Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From dwinsemius at comcast.net Mon Aug 1 14:22:51 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Mon, 1 Aug 2011 08:22:51 -0400 Subject: [R] Help with modFit of FME package In-Reply-To: References: Message-ID: <53204538-D54D-445C-AC88-48638277D62E@comcast.net> On Aug 1, 2011, at 3:41 AM, Paola Lecca wrote: > Dear R users, > > I'm trying to fit a set an ODE to an experimental time series. In the > attachment you find the R code I wrote using modFit and modCost of FME > package and the file of the time series. This is getting a bit tiresome. None of the three duplicate such messages have had successful attachment of any code. Why don't you look at what got distributed to the list? The rule I have developed is that I should assume that any file not ending in .pdf or .txt will get scrubbed by the mail server. I realize it is not an exact rule, but it keeps me from submitting files ending in .r or .rdata because I know they will get scrubbed. For some reason a recent rewrite of the Posting Guide appears to have left out this information which my memory tells me used to be there last year. -- David. > > When I run summary(Fit) I obtain this error message, and the values > of the > parameters are equal to the initial guesses I gave to them. > > The problem is not due to the fact that I have only one equation (I > tried > also with more equations, but I still obtain this error). > > I would appreciate if someone could help me in understanding the > reason of > the error and in fixing it. > > Thanks for your attention, > Paola Lecca. > > Here the error: > >> summary(Fit) > > Parameters: > Estimate Std. Error t value Pr(>|t|) > pro1_strength 1 NA NA NA > > Residual standard error: 2.124 on 10 degrees of freedom > Error in cov2cor(x$cov.unscaled) : 'V' is not a square numeric matrix > In addition: Warning message: > In summary.modFit(Fit) : Cannot estimate covariance; system is > singular > > > > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > > > > -- > *Paola Lecca, PhD* > *The Microsoft Research - University of Trento* > *Centre for Computational and Systems Biology* > *Piazza Manci 17 38123 Povo/Trento, Italy* > *Phome: +39 0461282843* > *Fax: +39 0461282814* > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT From pllcc023 at gmail.com Mon Aug 1 14:34:00 2011 From: pllcc023 at gmail.com (Paola Lecca) Date: Mon, 1 Aug 2011 14:34:00 +0200 Subject: [R] Help with modFit of FME package 2 Message-ID: * Apologies for multiple posting * I attached to my previous e-mail a .r file, and it was not permitted by the rules of the mailing lis. Again, please receive my sincere apologies for this. I re-send again the e-mail with .txt attachemnt in the hope someone an help me to solve my problem. I'm trying to fit a set an ODE to an experimental time series. In the attachment you find the R code I wrote using modFit and modCost of FME package and the file of the time series. When I run summary(Fit) I obtain this error message, and the values of the parameters are equal to the initial guesses I gave to them. The problem is not due to the fact that I have only one equation (I tried also with more equations, but I still obtain this error). I would appreciate if someone could help me in understanding the reason of the error and in fixing it. Thanks for your attention, Paola Lecca. Here the error: > summary(Fit) Parameters: Estimate Std. Error t value Pr(>|t|) pro1_strength 1 NA NA NA Residual standard error: 2.124 on 10 degrees of freedom Error in cov2cor(x$cov.unscaled) : 'V' is not a square numeric matrix In addition: Warning message: In summary.modFit(Fit) : Cannot estimate covariance; system is singular ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- *Paola Lecca, PhD* *The Microsoft Research - University of Trento* *Centre for Computational and Systems Biology* *Piazza Manci 17 38123 Povo/Trento, Italy* *Phome: +39 0461282843* *Fax: +39 0461282814* -------------- next part -------------- time pp1_mrna 0 0 2 2.754 4 2.958 6 4.058 8 3.41 10 3.459 12 2.453 14 1.234 16 2.385 18 3.691 20 3.252 -------------- next part -------------- require(deSolve) require(FME) ################################################################################## # PART 1 # ################################################################################## # Differential equations model_1_part_1 <- function(t, S, parameters) { with(as.list(parameters), { # cod1 = pro1_strength # pp1_mrna_degradation_rate <- 1 ############################################################### # v1 = cod1 v2 = pp1_mrna_degradation_rate * S[1] # ################################################################# # dS1 = v1 - v2 # #################################################################### # list(c(dS1)) }) } # Parameters parms_part_1 <- c(pro1_strength = 1.0) # Initial values of the species concentration S <- c(pp1_mrna = 0) times <- seq(0, 20, by = 2) # Solve the system ode_solutions_part_1 <- ode(S, times, model_1_part_1, parms = parms_part_1) ode_solutions_part_1 summary(ode_solutions_part_1) ## Default plot method plot(ode_solutions_part_1) ######################################################################################## ######################################################################################## # Estimate of the parameters experiment <- read.table("./wild_pp1_mrna.txt", header=TRUE) rw <- dim(experiment)[1] names <- array("", rw) for (i in 1:rw) { names[i] <- "pp1_mrna" } names observed_data_part_1 <- data.frame(name = names, time = experiment[,1], val = experiment[,2]) observed_data_part_1 ode_solutions_part_1 Cost_function <- function (pars) { out <- ode_solutions_part_1 cost <- modCost(model = out, obs = observed_data_part_1, y = "val") cost } Cost_function(parms) # Fit the model to the observed data Fit <- modFit(f = Cost_function, p = parms_part_1) Fit # Summary of the fit summary(Fit) # Model coefficients coef(Fit) # Deviance of the fit deviance(Fit) From jholtman at gmail.com Mon Aug 1 14:35:51 2011 From: jholtman at gmail.com (jim holtman) Date: Mon, 1 Aug 2011 08:35:51 -0400 Subject: [R] Odp: converting factor to numeric gives "NAs introduced by coercion" In-Reply-To: References: <1311929122752-3703408.post@n4.nabble.com> Message-ID: If you are not going to be using factors, then you can keep everything a character (if there are non-numerics in a column) by adding 'as.is=TRUE' as a parameter on the 'rad.table' functions. On Mon, Aug 1, 2011 at 7:55 AM, Petr PIKAL wrote: > Hi > >> Hi, >> >> I have a dataframe that I imported from a .txt file by: >> >> skogTemp <- read.delim2("Skogaryd_shoot_data.txt", header=TRUE, > fill=TRUE) >> >> and the data are factors, how can avoid factors from the beginning? > Although >> the file contains both characters and numbers. > > You have got an answer but here are some comments. If you have characters > and numbers in one column the character values are converted to NA by > as.numeric > >> >> I tried to convert some of the columns from factor to numeric and as I >> understood it you can not use only as.numeric but as.character first. I > got >> this warning message: >> >> > skogTemp_1 <- as.numeric(as.character(skogTemp_1[,2:4])) >> Warning message: >> NAs introduced by coercion > > What is skogTemp_1? I presume skogTemp is data frame and in that case you > can not use such construction directly. > >> >> I have lots of NAs in my data. Tries to check what class I had now but >> another warning is given me: >> > class(skogTemp_1[,2]) > > > skogTemp_1 is probably a vector with only one dimension therefore you get > this error. > class(skogTemp_1) > > shall give you the desired result, however I prefer > > ?str > > Regards > Petr > >> Error in skogTemp_1[, 2] : incorrect number of dimensions >> > class(skogTemp_1[1,2]) >> Error in skogTemp_1[1, 2] : incorrect number of dimensions >> >> frustrating... I don't know what this mean. >> >> Can anyone help? >> >> Thank you, >> Angelica >> >> >> >> -- >> View this message in context: http://r.789695.n4.nabble.com/converting- >> > factor-to-numeric-gives-NAs-introduced-by-coercion-tp3703408p3703408.html >> Sent from the R help mailing list archive at Nabble.com. >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? From dwinsemius at comcast.net Mon Aug 1 14:42:42 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Mon, 1 Aug 2011 08:42:42 -0400 Subject: [R] export/import matrix In-Reply-To: References: Message-ID: <15EB704C-18DE-42CB-9EB3-F09D5875CA15@comcast.net> On Jul 31, 2011, at 7:54 PM, Rosario Garcia Gil wrote: > Hello > > I have a problem on keeping the format when I export a matrix file > with the write.table() function. > The quick answer is ... don't do that. Use save() if you want to preserve the attributes of an R object. And that especially applies if you don't understand the differences between R object types. I have discarded a longer answer that complained about your failure to provide complete code. -- David > When I import the data volcano from rgl package it looks like this > in R: > > >> data[1:5,] > [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [, > 13] [,14] > [1,] 100 100 101 101 101 101 101 100 100 100 101 > 101 102 102 > [2,] 101 101 102 102 102 102 102 101 101 101 102 > 102 103 103 > [3,] 102 102 103 103 103 103 103 102 102 102 103 > 103 104 104 > [4,] 103 103 104 104 104 104 104 103 103 103 103 > 104 104 104 > [5,] 104 104 105 105 105 105 105 104 104 103 104 > 104 105 105 > > I use this data to represent a 3D map with the follwing script and > it works PEFECT! > >> y<- 2*data >> x <- 10* (1:nrow(y)) >> z <- 10* (1:ncol(y)) >> ylim <- range(y) >> ylen <-ylim[2] - ylim[1] + 1 >> colorlut <- terrain.colors(ylen) >> col <- colorlut[y-ylim[1] + 1] >> rgl.open() >> rgl.surface(x,z,y, color=col, back="lines") > > > Then I export it as write.table(data, file="datam.txt", > row.names=TRUE, col.names=TRUE), > > when I import it back into R again with read.table("datam.txt") it > looks like this in R: > > > V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 > V18 V19 > 1 100 100 101 101 101 101 101 100 100 100 101 101 102 102 102 102 > 103 104 103 > 2 101 101 102 102 102 102 102 101 101 101 102 102 103 103 103 103 > 104 105 104 > 3 102 102 103 103 103 103 103 102 102 102 103 103 104 104 104 104 > 105 106 105 > 4 103 103 104 104 104 104 104 103 103 103 103 104 104 104 105 105 > 106 107 106 > 5 104 104 105 105 105 105 105 104 104 103 104 104 105 105 105 106 > 107 108 108 > > The script I mention before does not anymore work on it, if I > converted to matrix with as.matrix still does not work. > > I have read the pdf on import/export of R and searched by googleling > but I have not found any answer to my problem. > > I am sorry if the answer is very obvious but I have tried for more > than a week. > > Any help is really wellcome, thanks in advance. > Rosario > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT From jholtman at gmail.com Mon Aug 1 14:44:39 2011 From: jholtman at gmail.com (jim holtman) Date: Mon, 1 Aug 2011 08:44:39 -0400 Subject: [R] export/import matrix In-Reply-To: References: Message-ID: If you are just exporting it so you can read it back into R later, it is better to use save/load since it keep the data in the internal format so it will look the same. Can you describe what you are going to be doing with the data that you 'export'; that might help us come up with a solution to your problem. On Sun, Jul 31, 2011 at 7:54 PM, Rosario Garcia Gil wrote: > Hello > > I have a problem on keeping the format when I export a matrix file with the write.table() function. > > When I import the data volcano from rgl package it looks like this in R: > > >> data[1:5,] > ? ? [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] > [1,] ?100 ?100 ?101 ?101 ?101 ?101 ?101 ?100 ?100 ? 100 ? 101 ? 101 ? 102 ? 102 > [2,] ?101 ?101 ?102 ?102 ?102 ?102 ?102 ?101 ?101 ? 101 ? 102 ? 102 ? 103 ? 103 > [3,] ?102 ?102 ?103 ?103 ?103 ?103 ?103 ?102 ?102 ? 102 ? 103 ? 103 ? 104 ? 104 > [4,] ?103 ?103 ?104 ?104 ?104 ?104 ?104 ?103 ?103 ? 103 ? 103 ? 104 ? 104 ? 104 > [5,] ?104 ?104 ?105 ?105 ?105 ?105 ?105 ?104 ?104 ? 103 ? 104 ? 104 ? 105 ? 105 > > I use this data to represent a 3D map with the follwing script and it works PEFECT! > >> y<- 2*data >> x <- 10* (1:nrow(y)) >> z <- 10* (1:ncol(y)) >> ylim <- range(y) >> ylen <-ylim[2] - ylim[1] + 1 >> colorlut <- terrain.colors(ylen) >> col <- colorlut[y-ylim[1] + 1] >> rgl.open() >> rgl.surface(x,z,y, color=col, back="lines") > > > Then I export it as write.table(data, file="datam.txt", row.names=TRUE, col.names=TRUE), > > when I import it back into R again with read.table("datam.txt") it looks like this in R: > > > ? V1 ?V2 ?V3 ?V4 ?V5 ?V6 ?V7 ?V8 ?V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 > 1 100 100 101 101 101 101 101 100 100 100 101 101 102 102 102 102 103 104 103 > 2 101 101 102 102 102 102 102 101 101 101 102 102 103 103 103 103 104 105 104 > 3 102 102 103 103 103 103 103 102 102 102 103 103 104 104 104 104 105 106 105 > 4 103 103 104 104 104 104 104 103 103 103 103 104 104 104 105 105 106 107 106 > 5 104 104 105 105 105 105 105 104 104 103 104 104 105 105 105 106 107 108 108 > > The script I mention before does not anymore work on it, if I converted to matrix with as.matrix still does not work. > > I have read the pdf on import/export of R and searched by googleling but I have not found any answer to my problem. > > I am sorry if the answer is very obvious but I have tried for more than a week. > > Any help is really wellcome, thanks in advance. > Rosario > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? From jholtman at gmail.com Mon Aug 1 14:47:28 2011 From: jholtman at gmail.com (jim holtman) Date: Mon, 1 Aug 2011 08:47:28 -0400 Subject: [R] Use dump or write? or what? In-Reply-To: References: Message-ID: Can you define better exactly what you what to do with the data. I would suggest that you keep each of the outputs (objects) of the test in a 'list' that way you can access each one and do what you need. You can also 'save' the list and later 'load' it into another session. On Sun, Jul 31, 2011 at 11:41 PM, Matt Curcio wrote: > Greetings all, > I am calculating two t-test values for each of many files then save it > to file calculate another set and append, repeat. > But I can't figure out how to write it to file and then append > subsequent t-tests. > (maybe too tired ;} ) > I have tried to use "dump" and "file.append" to no avial. > > ttest_results = tempfile() > > two_sample_ttest <- t.test (tempA, tempB, var.equal = TRUE) > welch_ttest <- t.test (tempA, tempB, var.equal = FALSE) > > dump (two_sample_ttest, file = "dumpdata.txt"", append=TRUE) > ttest_results <- file.append (ttest_results, two_sample_ttest) > > Any suggestions, > M > -- > > > > Matt Curcio > M: 401-316-5358 > E: matt.curcio.ri at gmail.com > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? From paulepanter at users.sourceforge.net Mon Aug 1 14:49:42 2011 From: paulepanter at users.sourceforge.net (Paul Menzel) Date: Mon, 01 Aug 2011 14:49:42 +0200 Subject: [R] Is R the right choice for simulating first passage times of random walks? In-Reply-To: References: <1311809771.29519.276.camel@mattotaupa> <1312155391.3609.135.camel@mattotaupa> Message-ID: <1312202982.4034.184.camel@mattotaupa> Am Sonntag, den 31.07.2011, 23:32 -0500 schrieb R. Michael Weylandt : > Glad to help -- I haven't taken a look at Dennis' solution (which may be far > better than mine), but if you do want to keep going down the path outlined > below you might consider the following: I will try Dennis? solution right away but looked at your suggestions first. Thank you very much. > Instead of throwing away a simulation if something starts negative, why not > just multiply the entire sample by -1: that lets you still use the sample > and saves you some computations: of course you'll have to remember to adjust > your final results accordingly. That is a nice suggestion. For a symmetric random walk this is indeed possible and equivalent to looking when the walk first hits zero. > This might avoid the loop: > > x = ## Whatever x is. > xLag = c(0,x[-length(x)]) # 'lag' x by 1 step. > which.max((x>=0) & (xLag <0)) + 1 # Depending on how you've decided to count > things, this +1 may be extraneous. > > The inner expression sets a 0 except where there is a switch from negative > to positive and a one there: the which.max function returns the location of > the first maximum, which is the first 1, in the vector. If you are > guaranteed the run starts negative, then the location of the first positive > should give you the length of the negative run. That is the same idea as from Bill [1]. The problem is, when the walk never returns to zero in a sample, `which.max(?everything FALSE)` returns 1 [2]. That is no problem though, when we do not have to worry about a walk starting with a positive value and adding 1 (+1) can be omitted when we count the epochs of first hitting 0 instead of the time of how long the walk stayed negative, which is always one less. Additionally my check `(x>=0) & (xLag <0)` is redundant when we know we start with a negative value. `(x>=0)` should be good enough in this case. > This all gives you, > > f4 <- function(n = 100000, # number of simulations > length = 100000) # length of iterated sum > { > R = matrix(sample(c(-1L,1L), length*n,replace=T),nrow=n) > > > R = apply(R,1,cumsum) > > > R[R[,1]==(1),] = -1 * R[R[,1]==(-1),] # If the first element in the row is positive, flip the entire row The line above seems to look the columns instead of rows. I think the following is correct since after the `apply()` above the random walks are in the columns. R[,R[1,]==(1)] = -1 * R[,R[1,]==(1)] > > fTemp <- function(x) { > > > xLag = c(0,x[-length(x)]) > return(which.max((x>=0) & (xLag <0))+1) > > > countNegative = apply(R,2,fTemp) > > tabulate(as.vector(countNegative), length) > > } > > That just crashed my computer though, so I wouldn't recommend it for large > n,length. Welcome to my world. I would have never thought that simulating random walks with a length of say a million would create that much data and push common desktop systems with let us say 4 GB of RAM to their limits. > Instead, you can help a little by combining the lagging and the & > all in one. > > f4 <- function(n = 100000, llength = 100000) > { > R = matrix(sample(c(-1L,1L), length*n,replace=T),nrow=n) > R = apply(R,1,cumsum) > R[R[,1]==(1),] = -1 * R[R[,1]==(-1),] # If the first element in the row is positive, flip the entire row > R = (cbind(rep(0,NROW(R)),R)<0)&(cbind(R,rep(0,NROW(R)))>=0) > countNegative = apply(R,1,which.max) + 1 > return (tabulate(as.vector(countNegative), length) ) > > > } I left that one out, because as written above the check can be shortened. > Of course, this is all starting to approach a very specific question that > could actually be approached much more efficiently if it's your end goal > (though I think I remember from your first email a different end goal): That is true. But to learn some optimization techniques on a simple example is much appreciated and will hopefully help me later on for the iterated random walk cases. > We can use the symmetry and "restart"ability of RW to do the following: > > x = cumsum(sample(c(-1L,1L),BIGNUMBER,replace=T) > D = diff(which(x == 0)) Nice! > This will give you a vector of how long x stays positive or negative at a > time. Thinking through some simple translations lets you see that this set > has the same distribution as how long a RW that starts negative stays > negative. I have to write those translations down. On first sight though we need again to handle the case where it stays negative the whole time. `D` then has length 0 and we have to count that for a walk longer than `BIGNUMBER`. > Again, this is only good for answering a very specific question > about random walks and may not be useful if you have other more complicated > questions in sight. Just testing for 0 for the iterated cases will not be enough for iterated random walks since an iterated random walk can go from negative to non-negative without being zero at this time/epoch. I implemented all your suggestions and got the following. -------- 8< -------- code -------- >8 -------- f4 <- function(n = 100000, # number of simulations length = 100000) # length of iterated sum { R = matrix(sample(c(-1L,1L),length*n,replace=T),nrow=n) R = apply(R,1,cumsum) ## this applies cumsum `row-wise' to R and will make your life INFINITELY better fTemp <- function(x) { if (x[1] >= 0 ) { return(1) } for (i in 1:length-1) { if (x[i] < 0 && x[i + 1] >= 0) { return(as.integer(i/2 + 2)) } } } countNegative = apply(R,2,fTemp) tabulate(as.vector(countNegative), length) } f5 <- function(n = 100000, # number of simulations length = 100000) # length of iterated sum { R = matrix(sample(c(-1L,1L), length*n,replace=T),nrow=n) R = apply(R,1,cumsum) R[,R[1,]==(1)] = -1 * R[,R[1,]==(1)] # If the first element in the row is positive, flip the entire row R <- R>=0 countNegative = apply(R,2,which.max) tabulate(as.vector(countNegative), length) } f6 <- function(n = 100000, # number of simulations length = 100000) # length of iterated sum { x = cumsum(sample(c(-1L,1L), length*n,replace=T)) D = diff(which(c(0, x) == 0)) tabulate(D, max(D)) } -------- 8< -------- code -------- >8 -------- The timings differ quite much which is expected though. > # f1 is using only for loops but only does half the calculations > # and does not yet flip random walks starting with a positive value. > set.seed(1) ; system.time( z1 <- f1(300, 1e5) ) User System verstrichen 2.700 0.008 2.729 > # f1 adapted with flips > set.seed(1) ; system.time( z1f <- f1withflip(300, 1e5) ) User System verstrichen 4.457 0.004 4.475 > set.seed(1) ; system.time( z4 <- f4(300, 1e5) ) User System verstrichen 8.033 0.380 8.739 > set.seed(1) ; system.time( z5 <- f5(300, 1e5) ) User System verstrichen 9.640 0.812 10.588 > set.seed(1) ; system.time( z6 <- f6(300, 1e5) ) User System verstrichen 4.208 0.328 4.606 So `f6` seems to be the most efficient setting right now and even is slightly faster than `f1` with the for loops. But we have to keep in mind that both operate on different data sets although `set.seed(1)` is used and `f6` treats the problem totally different. One other thought is that when reusing the walks starting with a positiv term and flipping those we can probably also take the backward/reverse walk (dual problem). I will try that too. Thank you very much, Paul [1] https://stat.ethz.ch/pipermail/r-help/2011-July/285015.html [2] https://stat.ethz.ch/pipermail/r-help/2011-July/285396.html -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part URL: From ogbos.okike at gmail.com Mon Aug 1 14:54:21 2011 From: ogbos.okike at gmail.com (ogbos okike) Date: Mon, 1 Aug 2011 14:54:21 +0200 Subject: [R] Problem Fixed: axes label Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From matt.curcio.ri at gmail.com Mon Aug 1 15:11:14 2011 From: matt.curcio.ri at gmail.com (Matt Curcio) Date: Mon, 1 Aug 2011 09:11:14 -0400 Subject: [R] Use dump or write? or what? In-Reply-To: References: Message-ID: Greetings all, Thanks for all your help so far. Let me give a better idea of what I am doing. I have hundreds of files that I need to plow thru with a t-test and correlation test. BTW, 'tempA' and tempB' are simply columns of numbers from a gene-chip experiment that spits out dna 'amounts'. So I have set up a loop to read the files and carry out the tests but need to save it for later inspection (and Jim H-you are probably right, for later inspection). By inspection I mean I don't know what I want to do with it yet, Remember: "That's why they call it Research." So it seems that 'save/load' might be a good alternative for my work. Any suggestions, M On Sun, Jul 31, 2011 at 11:41 PM, Matt Curcio wrote: > Greetings all, > I am calculating two t-test values for each of many files then save it > to file calculate another set and append, repeat. > But I can't figure out how to write it to file and then append > subsequent t-tests. > (maybe too tired ;} ) > I have tried to use "dump" and "file.append" to no avial. > > ttest_results = tempfile() > > two_sample_ttest <- t.test (tempA, tempB, var.equal = TRUE) > welch_ttest <- t.test (tempA, tempB, var.equal = FALSE) > > dump (two_sample_ttest, file = "dumpdata.txt"", append=TRUE) > ttest_results <- file.append (ttest_results, two_sample_ttest) > > Any suggestions, > M > -- > > > > Matt Curcio > M: 401-316-5358 > E: matt.curcio.ri at gmail.com > -- Matt Curcio M: 401-316-5358 E: matt.curcio.ri at gmail.com From Samuel.Le at srlglobal.com Mon Aug 1 15:27:28 2011 From: Samuel.Le at srlglobal.com (Samuel Le) Date: Mon, 1 Aug 2011 13:27:28 +0000 Subject: [R] formula used by R to compute the t-values in a linear regression Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From dwinsemius at comcast.net Mon Aug 1 15:44:32 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Mon, 1 Aug 2011 09:44:32 -0400 Subject: [R] formula used by R to compute the t-values in a linear regression In-Reply-To: References: Message-ID: <79389FCC-6CC7-4890-B14B-D931D669490F@comcast.net> On Aug 1, 2011, at 9:27 AM, Samuel Le wrote: > Hello, > > > I was wondering if someone knows the formula used by the function lm > to compute the t-values. > > I am trying to implement a linear regression myself. Assuming that I > have K variables, and N observations, the formula I am using is: > > For the k-th variable, t-value= b_k/sigma_k > > With b_k is the coefficient for the k-th variable, and sigma_k > =(t(x) x )^(-1) _kk is its standard deviation. > > I find sigma_k = sigma * n/(n*Sum x_{k,i}^2 -(sum x_{k,i}^2)) > > With sigma: the estimated standard deviation of the residuals, > > Sigma = sqrt(1/(N-K-1)*Sum epsilon_i^2) > > With: > > N: number of observations > > K: number of variables > > This formula comes from my old course of econometrics. > > For some reason it doesn't match the t-value produced by R (I am off > by about 1%). I can match the other results produced by R > (coefficients of the regression, r squared, etc.). Usually such a small difference results from using different degrees of freedom. Have you reduced the df's appropriately after considering the number of other estimated parameters? Just quoting code from you econometrics reference is not enough to answer the question. We would need to see code... as the message states at the end of every posting.) > > I would be grateful if someone could provide some clarifications. > > > > Samuel > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT From pdalgd at gmail.com Mon Aug 1 15:45:05 2011 From: pdalgd at gmail.com (peter dalgaard) Date: Mon, 1 Aug 2011 15:45:05 +0200 Subject: [R] formula used by R to compute the t-values in a linear regression In-Reply-To: References: Message-ID: On Aug 1, 2011, at 15:27 , Samuel Le wrote: > Hello, > > > > I was wondering if someone knows the formula used by the function lm to compute the t-values. > > > > I am trying to implement a linear regression myself. Assuming that I have K variables, and N observations, the formula I am using is: > > For the k-th variable, t-value= b_k/sigma_k > > > > With b_k is the coefficient for the k-th variable, and sigma_k =(t(x) x )^(-1) _kk is its standard deviation. > > > > I find sigma_k = sigma * n/(n*Sum x_{k,i}^2 -(sum x_{k,i}^2)) > > > > With sigma: the estimated standard deviation of the residuals, > > Sigma = sqrt(1/(N-K-1)*Sum epsilon_i^2) > > > > With: > > N: number of observations > > K: number of variables > > > > This formula comes from my old course of econometrics. > > For some reason it doesn't match the t-value produced by R (I am off by about 1%). I can match the other results produced by R (coefficients of the regression, r squared, etc.). > > > > I would be grateful if someone could provide some clarifications. AFAICT, your formula only holds for K=1. Otherwise, the formula for sigma_k involves matrix inversion. Also, even for K=1, beware that textbook formulas like SSDx = SSx - (Sx)^2/n involve subtraction of nearly equal quantities and easily loses multiple digits of precision, so software tends to use rather more careful algorithms. -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com "D?den skal tape!" --- Nordahl Grieg From hadley at rice.edu Mon Aug 1 16:00:01 2011 From: hadley at rice.edu (Hadley Wickham) Date: Mon, 1 Aug 2011 09:00:01 -0500 Subject: [R] Reading name-value data In-Reply-To: References: Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From S.Ellison at LGCGroup.com Mon Aug 1 16:15:50 2011 From: S.Ellison at LGCGroup.com (S Ellison) Date: Mon, 1 Aug 2011 15:15:50 +0100 Subject: [R] formula used by R to compute the t-values in a linear regression In-Reply-To: References: Message-ID: > -----Original Message----- > [mailto:r-help-bounces at r-project.org] On Behalf Of Samuel Le > Subject: [R] formula used by R to compute the t-values in a > linear regression > I was wondering if someone knows the formula used by the > function lm to compute the t-values. Typing summary.lm I found the standard error and t calculation (for around line 58-62 of the resulting listing. resvar <- rss/rdf R <- chol2inv(Qr$qr[p1, p1, drop = FALSE]) se <- sqrt(diag(R) * resvar) est <- z$coefficients[Qr$pivot[p1]] tval <- est/se You can also find (rather further up) that the degrees of freedom df used are taken directly from the linear model $df (z$df in the function). Others noted that incorrect df often cause problems, so checking that you're using the correct df is possible by inspecting the lm summary. The standard errors are apparently (as is usual for a least squares problem, I think) taken from the diagonal of the inverse of the hessian, multiplied by the residual variance. Unfortunately I could not get at the hessian calculation quite as easily (it looks like it uses a function that's not exported from stats) so that's left as an exercise in browsing source code ... S Ellison ******************************************************************* This email and any attachments are confidential. Any use...{{dropped:8}} From Samuel.Le at srlglobal.com Mon Aug 1 16:18:54 2011 From: Samuel.Le at srlglobal.com (Samuel Le) Date: Mon, 1 Aug 2011 14:18:54 +0000 Subject: [R] formula used by R to compute the t-values in a linear regression In-Reply-To: References: Message-ID: Exactly. My formula holds only for k=1, this is how I generated it. Do you have any references concerning the " rather more careful algorithms"? Thanks, Samuel -----Original Message----- From: peter dalgaard [mailto:pdalgd at gmail.com] Sent: 01 August 2011 14:45 To: Samuel Le Cc: r-help at stat.math.ethz.ch Subject: Re: [R] formula used by R to compute the t-values in a linear regression On Aug 1, 2011, at 15:27 , Samuel Le wrote: > Hello, > > > > I was wondering if someone knows the formula used by the function lm to compute the t-values. > > > > I am trying to implement a linear regression myself. Assuming that I have K variables, and N observations, the formula I am using is: > > For the k-th variable, t-value= b_k/sigma_k > > > > With b_k is the coefficient for the k-th variable, and sigma_k =(t(x) x )^(-1) _kk is its standard deviation. > > > > I find sigma_k = sigma * n/(n*Sum x_{k,i}^2 -(sum x_{k,i}^2)) > > > > With sigma: the estimated standard deviation of the residuals, > > Sigma = sqrt(1/(N-K-1)*Sum epsilon_i^2) > > > > With: > > N: number of observations > > K: number of variables > > > > This formula comes from my old course of econometrics. > > For some reason it doesn't match the t-value produced by R (I am off by about 1%). I can match the other results produced by R (coefficients of the regression, r squared, etc.). > > > > I would be grateful if someone could provide some clarifications. AFAICT, your formula only holds for K=1. Otherwise, the formula for sigma_k involves matrix inversion. Also, even for K=1, beware that textbook formulas like SSDx = SSx - (Sx)^2/n involve subtraction of nearly equal quantities and easily loses multiple digits of precision, so software tends to use rather more careful algorithms. -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com "D?den skal tape!" --- Nordahl Grieg __________ Information from ESET NOD32 Antivirus, version of virus signature database 6275 (20110707) __________ The message was checked by ESET NOD32 Antivirus. http://www.eset.com __________ Information from ESET NOD32 Antivirus, version of virus signature database 6275 (20110707) __________ The message was checked by ESET NOD32 Antivirus. http://www.eset.com From Samuel.Le at srlglobal.com Mon Aug 1 16:26:53 2011 From: Samuel.Le at srlglobal.com (Samuel Le) Date: Mon, 1 Aug 2011 14:26:53 +0000 Subject: [R] formula used by R to compute the t-values in a linear regression In-Reply-To: References: Message-ID: Yes, that's what I was looking for. Many thanks, Samuel -----Original Message----- From: S Ellison [mailto:S.Ellison at LGCGroup.com] Sent: 01 August 2011 15:16 To: Samuel Le; r-help at stat.math.ethz.ch Subject: RE: formula used by R to compute the t-values in a linear regression > -----Original Message----- > [mailto:r-help-bounces at r-project.org] On Behalf Of Samuel Le > Subject: [R] formula used by R to compute the t-values in a > linear regression > I was wondering if someone knows the formula used by the > function lm to compute the t-values. Typing summary.lm I found the standard error and t calculation (for around line 58-62 of the resulting listing. resvar <- rss/rdf R <- chol2inv(Qr$qr[p1, p1, drop = FALSE]) se <- sqrt(diag(R) * resvar) est <- z$coefficients[Qr$pivot[p1]] tval <- est/se You can also find (rather further up) that the degrees of freedom df used are taken directly from the linear model $df (z$df in the function). Others noted that incorrect df often cause problems, so checking that you're using the correct df is possible by inspecting the lm summary. The standard errors are apparently (as is usual for a least squares problem, I think) taken from the diagonal of the inverse of the hessian, multiplied by the residual variance. Unfortunately I could not get at the hessian calculation quite as easily (it looks like it uses a function that's not exported from stats) so that's left as an exercise in browsing source code ... S Ellison ******************************************************************* This email and any attachments are confidential. Any use...{{dropped:25}} From bt_jannis at yahoo.de Mon Aug 1 11:38:05 2011 From: bt_jannis at yahoo.de (Jannis) Date: Mon, 01 Aug 2011 11:38:05 +0200 Subject: [R] General indexing in multidimensional arrays Message-ID: <4E3673FD.5030800@yahoo.de> Dear R community, I have a general question regarding indexing in multidiemensional arrays. Imagine I have a three dimensional array and I only want to extract on vector along a single dimension from it: data <- array(rnorm(64),dim=c(4,4,4)) result <- data[1,1,] If I want to extract more than one of these vectors, it would now really help me to supply a logical matrix of the size of the first two dimensions: indices <- matrix(FALSE,ncol=4,nrow=4) indices[1,3] <- TRUE indices[4,1] <- TRUE result <- data[indices,] This, however would give me an error. I am used to this kind of indexing from Matlab and was wonderingt whether there exists an easy way to do this in R without supplying complicated index matrices of all three dimensions or logical vectors of the size of the whole matrix? The only way I could imagine would be to: result <- data[rep(as.vector(indices),times=4)] but this seems rather complicated and also depends on the order of the dimensions I want to extract. I do not want R to copy Matlabs behaviour, I am just wondering whether I missed one concept of indexing in R? Thanks a lot Jannis From dimitrios.kapetanakis at gmail.com Mon Aug 1 09:04:31 2011 From: dimitrios.kapetanakis at gmail.com (Dimitris.Kapetanakis) Date: Mon, 1 Aug 2011 00:04:31 -0700 (PDT) Subject: [R] memory problem; Error: cannot allocate vector of size 915.5 Mb In-Reply-To: <1312127619560-3707943.post@n4.nabble.com> References: <1312127619560-3707943.post@n4.nabble.com> Message-ID: <1312182271664-3709002.post@n4.nabble.com> Thanks a lot for the help. Actually, I am using a mac which (R for Mac OS X GUI 1.40-devel Leopard build 32-bit (5751)) but I think I can find access on windows 7 64-bit. What I am trying to do is a maximization through grid search (because I am not sure that any of the optim() methods works sufficiently to my case, at least all of them provide quite different results), the reason that I want the optimizing is because I want to use it for a Monte Carlo analysis for Smoothed Maximum Score estimator, and for that reason I want the optimization to be the most efficient possible, but given that I am kind of amateur on R and on programming in general, I doubt that I can do that sufficiently. Thanks again for your help Dimitris -- View this message in context: http://r.789695.n4.nabble.com/memory-problem-Error-cannot-allocate-vector-of-size-915-5-Mb-tp3707943p3709002.html Sent from the R help mailing list archive at Nabble.com. From amccul39 at yahoo.co.uk Mon Aug 1 11:44:29 2011 From: amccul39 at yahoo.co.uk (Andrew McCulloch) Date: Mon, 1 Aug 2011 10:44:29 +0100 (BST) Subject: [R] Plotting question Message-ID: <1312191869.99830.YahooMailRC@web26507.mail.ukl.yahoo.com> Hi, I use?R to draw my graphs. I have 100 points on a simple xy-plot. The points are distinguished by a third variable which is categorical with 10 levels. I have been plotting x against y and using gray scales to distinguish the level of the categorical variable for each point. It looks ok to me but?a journal reviewer says this is not any use. I cannot afford to pay for colour prints. Any ideas on what is the best way to distinguish 10 groups on an xy scatter plot? If all else fails?I can just remove the graph and give them a table of regression coefficients. Thanks. Yours Sincerely Andrew McCulloch From pjura at me.com Mon Aug 1 10:56:09 2011 From: pjura at me.com (pjura) Date: Mon, 01 Aug 2011 08:56:09 +0000 (GMT) Subject: [R] Problem with gam() Message-ID: <397d620f-533c-e2a6-8ed1-ed9790349af0@me.com> Dear group, I experience some problems with gam() function after R update to version 2.13.1 The function in both gam and mgcv packages stopped to work. Before, with the same code I used, everything was fine. The function from gam package yields following warning: Residual degrees of freedom are negative or zero. This occurs when the sum of the parametric and nonparametric degrees of freedom exceeds the number of observations. The model is probably too complex for the amount of data available. while gam() from mgcv crashes R.? Did I miss something? Thank you in advance. PJ From lifty.gere at gmx.de Mon Aug 1 09:03:11 2011 From: lifty.gere at gmx.de (lifty.gere at gmx.de) Date: Mon, 01 Aug 2011 09:03:11 +0200 Subject: [R] Impact of multiple imputation on correlations Message-ID: <20110801070311.152960@gmx.net> Dear all, I have been attempting to use multiple imputation (MI) to handle missing data in my study. I use the mice package in R for this. The deeper I get into this process, the more I realize I first need to understand some basic concepts which I hope you can help me with. For example, let us consider two arbitrary variables in my study that have the following missingness pattern: Variable 1 available, Variable 2 available: 51 (of 118 observations, 43%) Variable 1 available, Variable 2 missing: 37 (31,3%) Variable 1 missing, Variable 2 available: 10 (8,4%) Variable 1 missing, Variable 2 missing: 20 (16,9%) I am interested in the correlation between Variable 1 and Variable 2. Q1. Does it even make sense for me to use MI (or anything else, really) to replace my missing data when such large fractions are not available? Plot 1 (http://imgur.com/KFV9y&CmV1sl) provides a scatter plot of these example variables in the original data. The correlation coefficient r = -0.34 and p = 0.016. Q2. I notice that correlations between variables in imputed data (pooled estimates over all imputations) are much lower and less significant than the correlations in the original data. For this example, the pooled estimates for the imputed data show r = -0.11 and p = 0.22. Since this seems to happen in all the variable combinations that I have looked at, I would like to know if MI is known to have this behavior, or whether this is specific to my imputation. Q3. When going through the imputations, the distribution of the individual variables (min, max, mean, etc.) matches the original data. However, correlations and least-square line fits vary quite a bit from imputation to imputation (see Plot 2, http://imgur.com/KFV9yl&CmV1s). Is this normal? Q4. Since my results differ (quite significantly) between the original and imputed data, which one should I trust? Thank you for your help in advance. Tina -- From stefan.theussl at wu.ac.at Mon Aug 1 10:44:00 2011 From: stefan.theussl at wu.ac.at (Stefan Theussl) Date: Mon, 01 Aug 2011 10:44:00 +0200 Subject: [R] [R-Forge] R 2.13.1 can't find package binaries on R-Forge In-Reply-To: References: <4E35742E.60501@yorku.ca> Message-ID: <4E366750.3020905@wu.ac.at> Dear all, this must have been a temporary problem. In this case I assume that the build cycle did not finish in time, i.e., binaries were synced to the staging area although not all were built. best, stefan On 07/31/2011 05:52 PM, David Winsemius wrote: > On Jul 31, 2011, at 11:26 AM, Michael Friendly wrote: > > >> [Env: Win XP] >> I've just upgraded from R 2.12.2 to R 2.13.1. As part of my upgrade >> process, I typically install some in-development >> packages from R-Forge that are not on cran. But for the first time, >> it >> doesn't work. >> >> e.g., >> >>> install.packages("p3d", repos="http://R-Forge.R-project.org") >>> >> trying URL >> 'http://R-Forge.R-project.org/bin/windows/contrib/2.13/p3d_0.02-2.zip' >> Error in download.file(url, destfile, method, mode = "wb", ...) : >> cannot open URL >> 'http://R-Forge.R-project.org/bin/windows/contrib/2.13/p3d_0.02-2.zip' >> In addition: Warning message: >> In download.file(url, destfile, method, mode = "wb", ...) : >> cannot open: HTTP status was '404 Not Found' >> Warning in download.packages(pkgs, destdir = tmpd, available = >> available, : >> download of package 'p3d' failed >> >> The list of packages I install this way is: >> >> special<- c("p3d", "patchDVI", "spacemakeR", "spida") >> install.packages(special,repos="http://R-Forge.R-project.org") >> >> >> Is this just an R-Forge problem? >> > I'm not informed about the workings of r-forge, but did you notice > that there were no packages in that bin/windows directory whose > alphabetical collation would be after lowercase "i". That seems to > suggest some sort of system error encountered before the next package > after "ipreds" was completed. > > On the project page the binaries for windows are listed as "offline". > > https://r-forge.r-project.org/R/?group_id=431 > > I don't see any C modules in the source. Have you tried installing > from source? > > From pmj83 at me.com Mon Aug 1 11:01:07 2011 From: pmj83 at me.com (Przemek Jura) Date: Mon, 01 Aug 2011 09:01:07 +0000 (GMT) Subject: [R] Problem with gam() after R update Message-ID: <54f538a8-4456-6025-9f32-e0e470bb8030@me.com> Dear group, I experience s?ome problems with gam() function after R update to version 2.13.1 The function in both gam and mgcv packages stopped to work. Before, with the same code I used, everything was fine. The function from gam package yields following warning: Residual degrees of freedom are negative or zero. This occurs when the sum of the parametric and nonparametric degrees of freedom exceeds the number of observations. The model is probably too complex for the amount of data available while gam() from mgcv crashes R. Did I miss something? Thank you in advance. PJ From r.hyacinth at sheffield.ac.uk Mon Aug 1 13:39:57 2011 From: r.hyacinth at sheffield.ac.uk (Rocky Hyacinth) Date: Mon, 1 Aug 2011 12:39:57 +0100 Subject: [R] error message jpeg62.dll missing Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From teichmann at binf.ku.dk Mon Aug 1 08:35:54 2011 From: teichmann at binf.ku.dk (Jan Teichmann) Date: Mon, 01 Aug 2011 08:35:54 +0200 Subject: [R] How to colour specific edges in a dendrogram Message-ID: <1312180554.9389.35.camel@x60s.lan> Dear Mailing-list I used hclust to make a dendrogram of 2613 leafs. I also have a list with the names of certain labels which are of interest and I would like to visualize their appearance within the dendrogram. I found an example how to use dendrapply to colour the labels but the problem is that with 2613 leafs I cannot plot the labels as it gets super messy. I now tried to write a function using dendrapply() to colour the edges of the leafs of interest red. Unfortunately, I fail writing this function. Could someone help me out with the stub of a function colouring edges? I have the dendrogram list of labels to colour their edges I would like to colour the edges between the final leaf node and their parental node. Thank you very much for your help! Jan From murdoch.duncan at gmail.com Mon Aug 1 16:50:51 2011 From: murdoch.duncan at gmail.com (Duncan Murdoch) Date: Mon, 01 Aug 2011 10:50:51 -0400 Subject: [R] General indexing in multidimensional arrays In-Reply-To: <4E3673FD.5030800@yahoo.de> References: <4E3673FD.5030800@yahoo.de> Message-ID: <4E36BD4B.9010304@gmail.com> On 11-08-01 5:38 AM, Jannis wrote: > Dear R community, > > > I have a general question regarding indexing in multidiemensional arrays. > > Imagine I have a three dimensional array and I only want to extract on > vector along a single dimension from it: > > > data<- array(rnorm(64),dim=c(4,4,4)) > > result<- data[1,1,] > > If I want to extract more than one of these vectors, it would now really > help me to supply a logical matrix of the size of the first two dimensions: > > > indices<- matrix(FALSE,ncol=4,nrow=4) > indices[1,3]<- TRUE > indices[4,1]<- TRUE > > result<- data[indices,] > > This, however would give me an error. I am used to this kind of indexing > from Matlab and was wonderingt whether there exists an easy way to do > this in R without supplying complicated index matrices of all three > dimensions or logical vectors of the size of the whole matrix? > > The only way I could imagine would be to: > > result<- data[rep(as.vector(indices),times=4)] > > but this seems rather complicated and also depends on the order of the > dimensions I want to extract. > > > I do not want R to copy Matlabs behaviour, I am just wondering whether I > missed one concept of indexing in R? > Base R doesn't have anything like that as far as I know. The closest is matrix indexing: you construct a 3 column matrix whose rows are the indices of each element you want to extract. Possibly plyr or some other package has functions to do this. Duncan Murdoch From murdoch.duncan at gmail.com Mon Aug 1 16:51:35 2011 From: murdoch.duncan at gmail.com (Duncan Murdoch) Date: Mon, 01 Aug 2011 10:51:35 -0400 Subject: [R] Plotting question In-Reply-To: <1312191869.99830.YahooMailRC@web26507.mail.ukl.yahoo.com> References: <1312191869.99830.YahooMailRC@web26507.mail.ukl.yahoo.com> Message-ID: <4E36BD77.1050409@gmail.com> On 11-08-01 5:44 AM, Andrew McCulloch wrote: > Hi, > > I use R to draw my graphs. I have 100 points on a simple xy-plot. The points are > distinguished by a third variable which is categorical with 10 levels. I have > been plotting x against y and using gray scales to distinguish the level of the > categorical variable for each point. It looks ok to me but a journal reviewer > says this is not any use. I cannot afford to pay for colour prints. Any ideas on > what is the best way to distinguish 10 groups on an xy scatter plot? Plot digits or letters or other symbols. Duncan Murdoch > > > > If all else fails I can just remove the graph and give them a table of > regression coefficients. > > > Thanks. > > Yours Sincerely > Andrew McCulloch > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From Marianne.ZEYRINGER at ec.europa.eu Mon Aug 1 16:58:31 2011 From: Marianne.ZEYRINGER at ec.europa.eu (Marianne.ZEYRINGER at ec.europa.eu) Date: Mon, 1 Aug 2011 16:58:31 +0200 Subject: [R] fitting a sinus curve In-Reply-To: References: <1311843912949-3700833.post@n4.nabble.com> Message-ID: Dear David and Hans- Werner, Thank you very much for your help. I would like to compare now if a polynomial or the sinus model fits better. How can I see R-squared or the F- Statistic for the sinus regression, so as to be able to compare it with the polynomial model? Thanks a lot and have a nice evening. Best, Mairanne -----Original Message----- From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Hans W Borchers Sent: Friday, July 29, 2011 12:21 PM To: r-help at stat.math.ethz.ch Subject: Re: [R] fitting a sinus curve David Winsemius comcast.net> writes: > > > On Jul 28, 2011, at 1:07 PM, Hans W Borchers wrote: > > > maaariiianne ec.europa.eu> writes: > > > >> Dear R community! > >> I am new to R and would be very grateful for any kind of help. I am > >> a PhD student and need to fit a model to an electricity load > >> profile of a household (curve with two peaks). I was thinking of > >> looking if a polynomial of 4th order, a sinus/cosinus combination > >> or a combination of 3 parabels fits the data best. I have problems > >> with the sinus/cosinus > >> regression: time <- c( 0.00, 0.15, 0.30, 0.45, 1.00, 1.15, 1.30, 1.45, 2.00, 2.15, 2.30, 2.45, 3.00, 3.15, 3.30, 3.45, 4.00, 4.15, 4.30, 4.45, 5.00, 5.15, 5.30, 5.45, 6.00, 6.15, 6.30, 6.45, 7.00, 7.15, 7.30, 7.45, 8.00, 8.15, 8.30, 8.45, 9.00, 9.15, 9.30, 9.45, 10.00, 10.15, 10.30, 10.45, 11.00, 11.15, 11.30, 11.45, 12.00, 12.15, 12.30, 12.45, 13.00, 13.15, 13.30, 13.45, 14.00, 14.15, 14.30, 14.45, 15.00, 15.15, 15.30, 15.45, 16.00, 16.15, 16.30, 16.45, 17.00, 17.15, 17.30, 17.45, 18.00, 18.15, 18.30, 18.45, 19.00, 19.15, 19.30, 19.45, 20.00, 20.15, 20.30, 20.45, 21.00, 21.15, 21.30, 21.45, 22.00, 22.15, 22.30, 22.45, 23.00, 23.15, 23.30, 23.45) watt <- c( 94.1, 70.8, 68.2, 65.9, 63.3, 59.5, 55, 50.5, 46.6, 43.9, 42.3, 41.4, 40.8, 40.3, 39.9, 39.5, 39.1, 38.8, 38.5, 38.3, 38.3, 38.5, 39.1, 40.3, 42.4, 45.6, 49.9, 55.3, 61.6, 68.9, 77.1, 86.1, 95.7, 105.8, 115.8, 124.9, 132.3, 137.6, 141.1, 143.3, 144.8, 146, 147.2, 148.4, 149.8, 151.5, 153.5, 156, 159, 162.4, 165.8, 168.4, 169.8, 169.4, 167.6, 164.8, 161.5, 158.1, 154.9, 151.8, 149, 146.5, 144.4, 142.7, 141.5, 140.9, 141.7, 144.9, 151.5, 161.9, 174.6, 187.4, 198.1, 205.2, 209.1, 211.1, 212.2, 213.2, 213, 210.4, 203.9, 192.9, 179, 164.4, 151.5, 141.9, 135.3, 131, 128.2, 126.1, 124.1, 121.6, 118.2, 113.4, 107.4, 100.8) > >> df<-data.frame(time, watt) > >> lmfit <- lm(time ~ watt + cos(time) + sin(time), data = df) > > > > Your regression formula does not make sense to me. > > You seem to expect a periodic function within 24 hours, and if not > > it would still be possible to subtract the trend and then look at a > > periodic solution. > > Applying a trigonometric regression results in the following > > approximations: library(pracma) plot(2*pi*time/24, watt, col="red") ts <- seq(0, 2*pi, len = 100) xs6 <- trigApprox(ts, watt, 6) xs8 <- trigApprox(ts, watt, 8) lines(ts, xs6, col="blue", lwd=2) lines(ts, xs8, col="green", lwd=2) grid() > > where as examples the trigonometric fits of degree 6 and 8 are used. > > I would not advise to use higher orders, even if the fit is not > > perfect. > > Thank you ! That is a real gem of a worked example. Not only did it > introduce me to a useful package I was not familiar with, but there > was even a worked example in one of the help pages that might have > specifically answered the question about getting a 2nd(?) order trig > regression. If I understood the commentary on that page, this method > might also be appropriate for an irregular time series, whereas > trigApprox and trigPoly would not? That's true. For the moment, the trigPoly() function works correctly only with equidistant data between 0 and 2*pi. > This is adapted from the trigPoly help page in Hans Werner's pracma > package: The error I made myself was to take the 'time' variable literally, though obviously the numbers after the decimal point were meant as minutes. Thus time <- seq(0, 23.75, len = 96) would be a better choice. The rest in your adaptation is absolutely correct. A <- cbind(1, cos(pi*time/24), sin(pi*time/24), cos(2*pi*time/24), sin(2*pi*time/24)) (ab <- qr.solve(A, watt)) # [1] 127.29131 -26.88824 -10.06134 -36.22793 -38.56219 ts <- seq(0, pi, length.out = 100) xs <- ab[1] + ab[2]*cos(ts) + ab[3]*sin(ts) + ab[4]*cos(2*ts) + ab[5]*sin(2*ts) plot(pi*time/24, watt, col = "red", xlim=c(0, pi), ylim=range(watt), main = "Trigonometric Regression") lines(ts, xs, col="blue") > Hans: I corrected the spelling of "Trigonometric", but other than > that I may well have introduced other errors for which I would be > happy to be corrected. For instance, I'm unsure of the terminology > regarding the ordinality of this model. I'm also not sure if my pi/24 > and 2*pi/24 factors were correct in normalizing the time scale, > although the prediction seemed sensible. And yes, this curve is the best trigonometric approximation you can get for this order(?). You will see the same result when you apply and plot xs1 <- trigApprox(ts, watt, 1) But I see your problem with the term 'order' I will have a closer look at this and clarify the terminology on the help page. [All this reminds me of an article in the Mathematical Intelligencer some years ago where it was convincingly argued that the universal constant \pi should have the value 2*pi (in today's notation).] Thanks, Hans Werner > > > > > Hans Werner > > > >> Thanks a lot, > >> Marianne > ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. From dwinsemius at comcast.net Mon Aug 1 17:22:16 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Mon, 1 Aug 2011 11:22:16 -0400 Subject: [R] memory problem; Error: cannot allocate vector of size 915.5 Mb In-Reply-To: <1312182271664-3709002.post@n4.nabble.com> References: <1312127619560-3707943.post@n4.nabble.com> <1312182271664-3709002.post@n4.nabble.com> Message-ID: On Aug 1, 2011, at 3:04 AM, Dimitris.Kapetanakis wrote: > Thanks a lot for the help. > > Actually, I am using a mac which (R for Mac OS X GUI 1.40-devel > Leopard > build 32-bit (5751)) but I think I can find access on windows 7 64- > bit. I don't think that was what Holtman was advising. You just need more available memory, no need to use Win7. The Mac platform has been 64- bit capable longer than the Windoze OS, anyway. The way you get there might be as simple as rebooting, not starting any other applications, and re-running your code. Success depends upon how much addressable memory you have, which you did not state. All of the stuff below is immaterial to these considerations. > What > I am trying to do is a maximization through grid search (because I > am not > sure that any of the optim() methods works sufficiently to my case, > at least > all of them provide quite different results), the reason that I want > the > optimizing is because I want to use it for a Monte Carlo analysis for > Smoothed Maximum Score estimator, and for that reason I want the > optimization to be the most efficient possible, but given that I am > kind of > amateur on R and on programming in general, I doubt that I can do that > sufficiently. Your code ran without problem on my Mac running Leopard using an R64 GUI session with 32 GB RAM (R.app GUI 1.41 (5866)). > str(G.search) num [1:40000000, 1:3] 1 1 1 1 1 1 1 1 1 1 ... I have no idea whether it produced meaningful results, but a 120 million item matrix is not a problem with enough physical memory. It's only around a Gig. Your error indicated a problem with allocating 915.5 Mb. That should be possible (although borderline) in 4GB Mac running 32 bit R. (32 bit R is more memory efficient when working with physical memory of 4 GB or less because the pointer size is smaller.) -- david. > -- > View this message in context: http://r.789695.n4.nabble.com/memory-problem-Error-cannot-allocate-vector-of-size-915-5-Mb-tp3707943p3709002.html > Sent from the R help mailing list archive at Nabble.com. > David Winsemius, MD West Hartford, CT From kitty.a1000 at gmail.com Mon Aug 1 17:38:51 2011 From: kitty.a1000 at gmail.com (kitty) Date: Mon, 1 Aug 2011 16:38:51 +0100 Subject: [R] Plotting problems directional or rose plots In-Reply-To: References: Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From dwinsemius at comcast.net Mon Aug 1 17:41:08 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Mon, 1 Aug 2011 11:41:08 -0400 Subject: [R] General indexing in multidimensional arrays In-Reply-To: <4E36BD4B.9010304@gmail.com> References: <4E3673FD.5030800@yahoo.de> <4E36BD4B.9010304@gmail.com> Message-ID: On Aug 1, 2011, at 10:50 AM, Duncan Murdoch wrote: > On 11-08-01 5:38 AM, Jannis wrote: >> Dear R community, >> >> >> I have a general question regarding indexing in multidiemensional >> arrays. >> >> Imagine I have a three dimensional array and I only want to extract >> on >> vector along a single dimension from it: >> >> >> data<- array(rnorm(64),dim=c(4,4,4)) >> >> result<- data[1,1,] >> >> If I want to extract more than one of these vectors, it would now >> really >> help me to supply a logical matrix of the size of the first two >> dimensions: >> >> >> indices<- matrix(FALSE,ncol=4,nrow=4) >> indices[1,3]<- TRUE >> indices[4,1]<- TRUE >> >> result<- data[indices,] Is this the right answer? > result<- which(indices, arr.ind=TRUE) > result row col [1,] 4 1 [2,] 1 3 > apply(result, 1, function(x) data[x[1], x[2], ]) [,1] [,2] [1,] 1.62880528 0.7781005 [2,] -0.08861725 -2.1791674 [3,] 0.78242531 -1.0352826 [4,] 1.40012118 -1.2541230 ....if so, it should be possible to encapsulate that behavior in a function. -- David Winsemius, MD West Hartford, CT >> >> This, however would give me an error. I am used to this kind of >> indexing >> from Matlab and was wonderingt whether there exists an easy way to do >> this in R without supplying complicated index matrices of all three >> dimensions or logical vectors of the size of the whole matrix? >> >> The only way I could imagine would be to: >> >> result<- data[rep(as.vector(indices),times=4)] >> >> but this seems rather complicated and also depends on the order of >> the >> dimensions I want to extract. >> >> >> I do not want R to copy Matlabs behaviour, I am just wondering >> whether I >> missed one concept of indexing in R? >> > > Base R doesn't have anything like that as far as I know. The closest > is matrix indexing: you construct a 3 column matrix whose rows are > the indices of each element you want to extract. > > Possibly plyr or some other package has functions to do this. From gleynes+r at gmail.com Mon Aug 1 17:41:41 2011 From: gleynes+r at gmail.com (Gene Leynes) Date: Mon, 1 Aug 2011 10:41:41 -0500 Subject: [R] Plotting question In-Reply-To: <1312191869.99830.YahooMailRC@web26507.mail.ukl.yahoo.com> References: <1312191869.99830.YahooMailRC@web26507.mail.ukl.yahoo.com> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From francesca.pancotto at gmail.com Mon Aug 1 17:13:14 2011 From: francesca.pancotto at gmail.com (Francesca) Date: Mon, 1 Aug 2011 17:13:14 +0200 Subject: [R] Reorganize(stack data) a dataframe inducing names In-Reply-To: References: Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From rwp7h at virginia.edu Mon Aug 1 17:32:11 2011 From: rwp7h at virginia.edu (Robert Pfister) Date: Mon, 1 Aug 2011 11:32:11 -0400 Subject: [R] 5 arguments passed to .Internal(matrix) which requires 7 Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From margauxkeller at gmail.com Mon Aug 1 17:18:58 2011 From: margauxkeller at gmail.com (Margaux Keller) Date: Mon, 1 Aug 2011 11:18:58 -0400 Subject: [R] Write.table Question Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From dwinsemius at comcast.net Mon Aug 1 17:48:00 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Mon, 1 Aug 2011 11:48:00 -0400 Subject: [R] Problem with gam() after R update In-Reply-To: <54f538a8-4456-6025-9f32-e0e470bb8030@me.com> References: <54f538a8-4456-6025-9f32-e0e470bb8030@me.com> Message-ID: <2B03913D-7090-43A7-9821-58B032C3B6E9@comcast.net> On Aug 1, 2011, at 5:01 AM, Przemek Jura wrote: > Dear group, > I experience s?ome problems with gam() function after R update to > version 2.13.1 > The function in both gam and mgcv packages stopped to work. Before, > with the same code I used, everything was fine. Reports like this often turn out to be inaccurate because either the (not offered) code was not the "same" or the (also not offered) data was different. Did you reinstall these packages? How? How many versions "up" was the update? sessionInfo()? > The function from gam package yields following warning: > > Residual degrees of freedom are negative or zero. This occurs when > the sum of the parametric and nonparametric degrees of freedom > exceeds the number of observations. The model is probably too > complex for the amount of data available That certainly looks like an informative error message. What do you want us to do about it? > > while gam() from mgcv crashes R. A report of a real "crash" should go to the package maintainer with a lot more detail than you have provided above. > > Did I miss something? Perhaps reading the Posting Guide? > > Thank you in advance. > > PJ > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT From gunter.berton at gene.com Mon Aug 1 17:48:54 2011 From: gunter.berton at gene.com (Bert Gunter) Date: Mon, 1 Aug 2011 08:48:54 -0700 Subject: [R] Plotting question In-Reply-To: <4E36BD77.1050409@gmail.com> References: <1312191869.99830.YahooMailRC@web26507.mail.ukl.yahoo.com> <4E36BD77.1050409@gmail.com> Message-ID: IMHO: On Mon, Aug 1, 2011 at 7:51 AM, Duncan Murdoch wrote: > On 11-08-01 5:44 AM, Andrew McCulloch wrote: >> >> Hi, >> >> I use R to draw my graphs. I have 100 points on a simple xy-plot. The >> points are >> distinguished by a third variable which is categorical with 10 levels. I >> have >> been plotting x against y and using gray scales to distinguish the level >> of the >> categorical variable for each point. It looks ok to me but a journal >> reviewer >> says this is not any use. I cannot afford to pay for colour prints. Any >> ideas on >> what is the best way to distinguish 10 groups on an xy scatter plot? > > Plot digits or letters or other symbols. > > Duncan Murdoch > No, this does not work. See Cleveland's books (e.g. "Visualizing Data"). 10 is too many symbols to constantly refer to a legend to keep straight, and digits or letters do not allow you to readily perceive the pattern. (Caveat: If "most" of the data are only 2 or 3 of the symbols, then these can work). I think the OP's idea of using gray scales was better. I would dispute the reviewer and refer them to appropriate references. Alternatively, thermometer plots (aka "filled rectangle" plots) would be best. Again, Cleveland's books provide scientific justification rather than merely the (possibly uninformed) aesthetic opinion of a reviewer. Presumably, the journal editor would accept hard data and psychological research in preference to opinions. >> >> >> >> If all else fails I can just remove the graph and give them a table of >> regression coefficients. No. I think your attempt to use a graph is a much better way to go. Try to resist poor practices such as just publishing summary statistics. Cheers, Bert >> >> >> Thanks. >> >> Yours Sincerely >> Andrew McCulloch >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- "Men by nature long to get on to the ultimate truths, and will often be impatient with elementary studies or fight shy of them. If it were possible to reach the ultimate truths without the elementary studies usually prefixed to them, these would not be preparatory studies but superfluous diversions." -- Maimonides (1135-1204) Bert Gunter Genentech Nonclinical Biostatistics From gleynes+r at gmail.com Mon Aug 1 17:50:57 2011 From: gleynes+r at gmail.com (Gene Leynes) Date: Mon, 1 Aug 2011 10:50:57 -0500 Subject: [R] General indexing in multidimensional arrays In-Reply-To: <4E3673FD.5030800@yahoo.de> References: <4E3673FD.5030800@yahoo.de> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From ivan.calandra at uni-hamburg.de Mon Aug 1 17:59:54 2011 From: ivan.calandra at uni-hamburg.de (Ivan Calandra) Date: Mon, 01 Aug 2011 17:59:54 +0200 Subject: [R] Write.table Question In-Reply-To: References: Message-ID: <4E36CD7A.2050808@uni-hamburg.de> Hi Margaux, Check the row.names and col.names arguments of write.table. See ?write.table write.table (dat, file = "/path/to/my/data.txt", sep = " ", col.names=FALSE, row.names=FALSE) HTH, Ivan Le 8/1/2011 17:18, Margaux Keller a ?crit : > Hi, > > I'm trying to create an abbreviated data file from a larger version. I can > use the subset command to create a value for this data: > > dat<-subset(raw.data, select=c(SNP, Pvalue)) > >> head (dat) > SNP Pvalue > 1 rs11 0.6516 > 2 rs12 0.3311 > 3 rs13 0.5615 > > but when I try to write.table using: > > write.table (dat, file = "/path/to/my/data.txt", sep = " ", col.names=NA) > > I end up with a file that looks like this: > > "" "SNP" "Pvalue" > "1" "rs11" 0.6516 > "2" "rs12" 0.3311 > "3" "rs13" 0.5615 > > when what I want is something that looks like this: > > rs11 0.6516 > rs12 0.3311 > rs13 0.5615 > > What should I be including? > > Thanks, > Margaux > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Ivan CALANDRA PhD Student University of Hamburg Biozentrum Grindel und Zoologisches Museum Dept. Mammalogy Martin-Luther-King-Platz 3 D-20146 Hamburg, GERMANY +49(0)40 42838 6231 ivan.calandra at uni-hamburg.de ********** http://www.for771.uni-bonn.de http://webapp5.rrz.uni-hamburg.de/mammals/eng/1525_8_1.php From f.harrell at vanderbilt.edu Mon Aug 1 18:10:15 2011 From: f.harrell at vanderbilt.edu (Frank Harrell) Date: Mon, 1 Aug 2011 09:10:15 -0700 (PDT) Subject: [R] How to make a nomogam and Calibration plot In-Reply-To: <1312213591451-3710068.post@n4.nabble.com> References: <1312213591451-3710068.post@n4.nabble.com> Message-ID: <1312215015817-3710126.post@n4.nabble.com> Kindly do not attach questions in a separate document. Install and read the documentation for the R rms package, and see handouts at http://biostat.mc.vanderbilt.edu/rms Frank sytangping wrote: > > Dear R users, > > I am a new R user and something stops me when I try to write a academic > article. I want to make a nomogram to predict the risk of prostate cancer > (PCa) using several factors which have been selected from the Logistic > regression run under the SPSS. Always, a calibration plot is needed to > validate the prediction accuracy of the nomogram. > However, I tried many times and read a lot of posts with respect to this > topic but I still couldn't figure out how to draw the nomogram and the > calibration plot. My dataset and questions in detail are shown in two > attached files. It will be very grateful if someone can save his/her time > to help for my questions. > > Warmest regards! > > Ping Tang http://r.789695.n4.nabble.com/file/n3710068/Dataset.xls > Dataset.xls http://r.789695.n4.nabble.com/file/n3710068/R_help.doc > R_help.doc > ----- Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/How-to-make-a-nomogam-and-Calibration-plot-tp3710068p3710126.html Sent from the R help mailing list archive at Nabble.com. From buysellrentoffer at gmail.com Mon Aug 1 18:17:53 2011 From: buysellrentoffer at gmail.com (world peace) Date: Mon, 1 Aug 2011 12:17:53 -0400 Subject: [R] possible reason for merge not working Message-ID: Hi Guys, working on a "merge" for 2 data frames. Using the command: x <- merge(annotatedData, UCSCgenes, by.x="names", by.y="Ensembl.Gene.ID", all.x=TRUE) names and Ensembl.Gene.ID are columns with similar elements from the x and y data frames. annotatedData has 8909 entries, so has x(as expected). x has columns for UCSCgenes, but there is no data in them, all n/a, as if no match exists. This is not true as I can manually see and find many similarities between the names and UCSCgenes columns. I am wondering if there is any syntax error, or logical. comments appreciated. Thanks Dan From surgeon666666 at yahoo.com.cn Mon Aug 1 17:46:31 2011 From: surgeon666666 at yahoo.com.cn (sytangping) Date: Mon, 1 Aug 2011 08:46:31 -0700 (PDT) Subject: [R] How to make a nomogam and Calibration plot Message-ID: <1312213591451-3710068.post@n4.nabble.com> Dear R users, I am a new R user and something stops me when I try to write a academic article. I want to make a nomogram to predict the risk of prostate cancer (PCa) using several factors which have been selected from the Logistic regression run under the SPSS. Always, a calibration plot is needed to validate the prediction accuracy of the nomogram. However, I tried many times and read a lot of posts with respect to this topic but I still couldn't figure out how to draw the nomogram and the calibration plot. My dataset and questions in detail are shown in two attached files. It will be very grateful if someone can save his/her time to help for my questions. Warmest regards! Ping Tang http://r.789695.n4.nabble.com/file/n3710068/Dataset.xls Dataset.xls http://r.789695.n4.nabble.com/file/n3710068/R_help.doc R_help.doc -- View this message in context: http://r.789695.n4.nabble.com/How-to-make-a-nomogam-and-Calibration-plot-tp3710068p3710068.html Sent from the R help mailing list archive at Nabble.com. From jholtman at gmail.com Mon Aug 1 18:22:38 2011 From: jholtman at gmail.com (jim holtman) Date: Mon, 1 Aug 2011 12:22:38 -0400 Subject: [R] Reorganize(stack data) a dataframe inducing names In-Reply-To: References: Message-ID: Try this: had to add extra names to your data since it was not clear how it was organized. Next time use 'dput' to enclose data. > x <- read.table(textConnection(" index time key date values + 13732 27965 DATA.Q211.SUM.Index 04/08/11 1.42 + 13733 27974 DATA.Q211.SUM.Index 05/10/11 1.45 + 13734 27984 DATA.Q211.SUM.Index 06/01/11 1.22 + 13746 28615 DATA.Q211.TDS.Index 04/07/11 1.35 + 13747 28624 DATA.Q211.TDS.Index 05/20/11 1.40 + 13754 29262 DATA.Q211.UBS.Index 05/02/11 1.30 + 13755 29272 DATA.Q211.UBS.Index 05/03/11 1.48 + 13761 29915 DATA.Q211.UCM.Index 04/28/11 1.43 + 13768 30565 DATA.Q211.VDE.Index 05/02/11 1.48 + 13775 31215 DATA.Q211.WF.Index 04/14/11 1.44 + 13776 31225 DATA.Q211.WF.Index 05/12/11 1.42 + 13789 31865 DATA.Q211.WPC.Index 04/01/11 1.40 + 13790 31875 DATA.Q211.WPC.Index 04/08/11 1.42 + 13791 31883 DATA.Q211.WPC.Index 05/10/11 1.43 + 13804 32515 DATA.Q211.XTB.Index 04/29/11 1.50 + 13805 32525 DATA.Q211.XTB.Index 05/30/11 1.40 + 13806 32532 DATA.Q211.XTB.Index 06/28/11 1.43") + , header = TRUE + , as.is = TRUE + ) > closeAllConnections() > x index time key date values 1 13732 27965 DATA.Q211.SUM.Index 04/08/11 1.42 2 13733 27974 DATA.Q211.SUM.Index 05/10/11 1.45 3 13734 27984 DATA.Q211.SUM.Index 06/01/11 1.22 4 13746 28615 DATA.Q211.TDS.Index 04/07/11 1.35 5 13747 28624 DATA.Q211.TDS.Index 05/20/11 1.40 6 13754 29262 DATA.Q211.UBS.Index 05/02/11 1.30 7 13755 29272 DATA.Q211.UBS.Index 05/03/11 1.48 8 13761 29915 DATA.Q211.UCM.Index 04/28/11 1.43 9 13768 30565 DATA.Q211.VDE.Index 05/02/11 1.48 10 13775 31215 DATA.Q211.WF.Index 04/14/11 1.44 11 13776 31225 DATA.Q211.WF.Index 05/12/11 1.42 12 13789 31865 DATA.Q211.WPC.Index 04/01/11 1.40 13 13790 31875 DATA.Q211.WPC.Index 04/08/11 1.42 14 13791 31883 DATA.Q211.WPC.Index 05/10/11 1.43 15 13804 32515 DATA.Q211.XTB.Index 04/29/11 1.50 16 13805 32525 DATA.Q211.XTB.Index 05/30/11 1.40 17 13806 32532 DATA.Q211.XTB.Index 06/28/11 1.43 > # get index of first occurance of 'key' column > indx <- !duplicated(x$key) > x[indx,] index time key date values 1 13732 27965 DATA.Q211.SUM.Index 04/08/11 1.42 4 13746 28615 DATA.Q211.TDS.Index 04/07/11 1.35 6 13754 29262 DATA.Q211.UBS.Index 05/02/11 1.30 8 13761 29915 DATA.Q211.UCM.Index 04/28/11 1.43 9 13768 30565 DATA.Q211.VDE.Index 05/02/11 1.48 10 13775 31215 DATA.Q211.WF.Index 04/14/11 1.44 12 13789 31865 DATA.Q211.WPC.Index 04/01/11 1.40 15 13804 32515 DATA.Q211.XTB.Index 04/29/11 1.50 > > On Mon, Aug 1, 2011 at 11:13 AM, Francesca wrote: > Dear Contributors > thanks for any help you can provide. I searched the threads > but I could not find any query that satisfied my needs. > This is my database: > ?index time ? ? ? ? values > 13732 ?27965 DATA.Q211.SUM.Index ? ?04/08/11 ? ? ? ? 1.42 > 13733 ?27974 DATA.Q211.SUM.Index ? ?05/10/11 ? ? ? ? 1.45 > 13734 ?27984 DATA.Q211.SUM.Index ? ?06/01/11 ? ? ? ? 1.22 > 13746 ?28615 DATA.Q211.TDS.Index ? ?04/07/11 ? ? ? ? 1.35 > 13747 ?28624 DATA.Q211.TDS.Index ? ?05/20/11 ? ? ? ? 1.40 > 13754 ?29262 DATA.Q211.UBS.Index ? ?05/02/11 ? ? ? ? 1.30 > 13755 ?29272 DATA.Q211.UBS.Index ? ?05/03/11 ? ? ? ? 1.48 > 13761 ?29915 DATA.Q211.UCM.Index ? ?04/28/11 ? ? ? ? 1.43 > 13768 ?30565 DATA.Q211.VDE.Index ? ?05/02/11 ? ? ? ? 1.48 > 13775 ?31215 DATA.Q211.WF.Index ? ? 04/14/11 ? ? ? ? 1.44 > 13776 ?31225 DATA.Q211.WF.Index ? ? 05/12/11 ? ? ? ? 1.42 > 13789 ?31865 DATA.Q211.WPC.Index ? ?04/01/11 ? ? ? ? 1.40 > 13790 ?31875 DATA.Q211.WPC.Index ? ?04/08/11 ? ? ? ? 1.42 > 13791 ?31883 DATA.Q211.WPC.Index ? ?05/10/11 ? ? ? ? 1.43 > 13804 ?32515 DATA.Q211.XTB.Index ? ?04/29/11 ? ? ? ? 1.50 > 13805 ?32525 DATA.Q211.XTB.Index ? ?05/30/11 ? ? ? ? 1.40 > 13806 ?32532 DATA.Q211.XTB.Index ? ?06/28/11 ? ? ? ? 1.43 > > I need to select only the rows of this database that correspond to each > of the first occurrences of the string represented in column > index. In the example shown I would like to obtain a new > data.frame which is > > index time ? ? ? ? values > 13732 ?27965 DATA.Q211.SUM.Index ? ?04/08/11 ? ? ? ? 1.42 > 13746 ?28615 DATA.Q211.TDS.Index ? ?04/07/11 ? ? ? ? 1.35 > 13754 ?29262 DATA.Q211.UBS.Index ? ?05/02/11 ? ? ? ? 1.30 > 13761 ?29915 DATA.Q211.UCM.Index ? ?04/28/11 ? ? ? ? 1.43 > 13768 ?30565 DATA.Q211.VDE.Index ? ?05/02/11 ? ? ? ? 1.48 > 13775 ?31215 DATA.Q211.WF.Index ? ?04/14/11 ? ? ? ? 1.44 > 13789 ?31865 DATA.Q211.WPC.Index ? ?04/01/11 ? ? ? ? 1.40 > 13804 ?32515 DATA.Q211.XTB.Index ? ?04/29/11 ? ? ? ? 1.50 > > As you can see, it is not the whole string to change, > rather a substring that is part of it. I want to select > only the first values related to the row that presents for the first time > the different part of the string(substring). > I know how to select rows according to a substring condition on the > index column, but I cannot use it here because the substring changes > and moreover the number of occurrences per substring is variable. > > Thank you for any help you can provide. > Francesca > > ? ? ? ?[[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? From jholtman at gmail.com Mon Aug 1 18:25:13 2011 From: jholtman at gmail.com (jim holtman) Date: Mon, 1 Aug 2011 12:25:13 -0400 Subject: [R] possible reason for merge not working In-Reply-To: References: Message-ID: What you "see" and what the data really is may be two different things. You should have at least enclosed an 'str' of the two data frames; even better would be a subset of the data using 'dput'. Most likely your problem is that your data is not what you 'expect' it to be. On Mon, Aug 1, 2011 at 12:17 PM, world peace wrote: > Hi Guys, > > working on a "merge" for 2 data frames. > > Using the command: > > x <- merge(annotatedData, UCSCgenes, by.x="names", > by.y="Ensembl.Gene.ID", all.x=TRUE) > > names and Ensembl.Gene.ID are columns with similar elements from the x > and y data frames. > > annotatedData has 8909 entries, so has x(as expected). x has columns > for UCSCgenes, but there is no data in them, all n/a, as if no match > exists. > This is not true as I can manually see and find many similarities > between the names and UCSCgenes columns. > > I am wondering if there is any syntax error, or logical. > > comments appreciated. > > Thanks > Dan > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? From jvadams at usgs.gov Mon Aug 1 18:33:39 2011 From: jvadams at usgs.gov (Jean V Adams) Date: Mon, 1 Aug 2011 11:33:39 -0500 Subject: [R] possible reason for merge not working In-Reply-To: References: Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From dwinsemius at comcast.net Mon Aug 1 18:35:17 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Mon, 1 Aug 2011 12:35:17 -0400 Subject: [R] possible reason for merge not working In-Reply-To: References: Message-ID: <4C5A46B6-5D6B-4F90-96FE-E7A179BF31CB@comcast.net> On Aug 1, 2011, at 12:17 PM, world peace wrote: > Hi Guys, > > working on a "merge" for 2 data frames. > > Using the command: > > x <- merge(annotatedData, UCSCgenes, by.x="names", > by.y="Ensembl.Gene.ID", all.x=TRUE) > > names and Ensembl.Gene.ID are columns with similar elements from the x > and y data frames. > > annotatedData has 8909 entries, so has x(as expected). x has columns > for UCSCgenes, but there is no data in them, all n/a, as if no match > exists. > This is not true as I can manually see and find many similarities The merge function does not work on "similarities". Matches need to be exact. > between the names and UCSCgenes columns. > > I am wondering if there is any syntax error, or logical. Probably logical. -- David Winsemius, MD West Hartford, CT From michael.weylandt at gmail.com Mon Aug 1 18:43:37 2011 From: michael.weylandt at gmail.com (R. Michael Weylandt ) Date: Mon, 1 Aug 2011 12:43:37 -0400 Subject: [R] Is R the right choice for simulating first passage times of random walks? In-Reply-To: <1312202982.4034.184.camel@mattotaupa> References: <1311809771.29519.276.camel@mattotaupa> <1312155391.3609.135.camel@mattotaupa> <1312202982.4034.184.camel@mattotaupa> Message-ID: <1F1BFA51-F26B-4E7E-BC69-5D7EA02E9D5F@gmail.com> I've only got a 20 minute layover, but three quick remarks: 1) Do a sanity check on your data size: if you want a million walks of a thousand steps, that already gets you to a billion integers to store--even at a very low bound of one byte each, thats already 1GB for the data and you still have to process it all and run the OS. If you bump this to walks of length 10k, you are in big trouble. Considered like that, it shouldn't surprise you that you are getting near memory limits. If you really do need such a large simulation and are willing to make the time/space tradeoff, it may be worth doing simulations in smaller batches (say 50-100) and aggregating the needed stats for analysis. Also, consider direct use of the rm() function for memory management. 2) If you know that which.max()==1 can't happen for your data, might this trick be easier than forcing it through some tricky logic inside the which.max() X=which.max(...) if(X[1]==1) X=Inf # or whatever value 3) I dont have any texts at hand to confirm this but isn't the expected value of the first hit time of a RW infinite? I think a handwaving proof can be squeezed out of the optional stopping theorem with T=min(T_a,T_b) for a<0 -Inf. If I remember right, this suggests you are trying to calculate a CI for a distribution with no finite moments, a difficult task to say the least. Hope these help and I'll write a more detailed reply to your notes below later, Michael Weylandt PS - what's an iterated RW? This is all outside my field (hence my spitball on #2 above) PS2 - sorry about the row/column mix-up: I usually think of sample paths as rows... On Aug 1, 2011, at 8:49 AM, Paul Menzel wrote: > Am Sonntag, den 31.07.2011, 23:32 -0500 schrieb R. Michael Weylandt : >> Glad to help -- I haven't taken a look at Dennis' solution (which may be far >> better than mine), but if you do want to keep going down the path outlined >> below you might consider the following: > > I will try Dennis? solution right away but looked at your suggestions > first. Thank you very much. > >> Instead of throwing away a simulation if something starts negative, why not >> just multiply the entire sample by -1: that lets you still use the sample >> and saves you some computations: of course you'll have to remember to adjust >> your final results accordingly. > > That is a nice suggestion. For a symmetric random walk this is indeed > possible and equivalent to looking when the walk first hits zero. > >> This might avoid the loop: >> >> x = ## Whatever x is. >> xLag = c(0,x[-length(x)]) # 'lag' x by 1 step. >> which.max((x>=0) & (xLag <0)) + 1 # Depending on how you've decided to count >> things, this +1 may be extraneous. >> >> The inner expression sets a 0 except where there is a switch from negative >> to positive and a one there: the which.max function returns the location of >> the first maximum, which is the first 1, in the vector. If you are >> guaranteed the run starts negative, then the location of the first positive >> should give you the length of the negative run. > > That is the same idea as from Bill [1]. The problem is, when the walk > never returns to zero in a sample, `which.max(?everything FALSE)` > returns 1 [2]. That is no problem though, when we do not have to worry > about a walk starting with a positive value and adding 1 (+1) can be > omitted when we count the epochs of first hitting 0 instead of the time > of how long the walk stayed negative, which is always one less. > > Additionally my check `(x>=0) & (xLag <0)` is redundant when we know we > start with a negative value. `(x>=0)` should be good enough in this > case. > >> This all gives you, >> >> f4 <- function(n = 100000, # number of simulations >> length = 100000) # length of iterated sum >> { >> R = matrix(sample(c(-1L,1L), length*n,replace=T),nrow=n) >> >>> R = apply(R,1,cumsum) >>> >> R[R[,1]==(1),] = -1 * R[R[,1]==(-1),] # If the first element in the row is positive, flip the entire row > > The line above seems to look the columns instead of rows. I think the > following is correct since after the `apply()` above the random walks > are in the columns. > > R[,R[1,]==(1)] = -1 * R[,R[1,]==(1)] > >>> fTemp <- function(x) { >>> >> xLag = c(0,x[-length(x)]) >> return(which.max((x>=0) & (xLag <0))+1) >> >>> countNegative = apply(R,2,fTemp) >>> tabulate(as.vector(countNegative), length) >>> } >> >> That just crashed my computer though, so I wouldn't recommend it for large >> n,length. > > Welcome to my world. I would have never thought that simulating random > walks with a length of say a million would create that much data and > push common desktop systems with let us say 4 GB of RAM to their limits. > >> Instead, you can help a little by combining the lagging and the & >> all in one. >> >> f4 <- function(n = 100000, llength = 100000) >> { >> R = matrix(sample(c(-1L,1L), length*n,replace=T),nrow=n) >> R = apply(R,1,cumsum) >> R[R[,1]==(1),] = -1 * R[R[,1]==(-1),] # If the first element in the row is positive, flip the entire row >> R = (cbind(rep(0,NROW(R)),R)<0)&(cbind(R,rep(0,NROW(R)))>=0) >> countNegative = apply(R,1,which.max) + 1 >> return (tabulate(as.vector(countNegative), length) ) >> >>> } > > I left that one out, because as written above the check can be > shortened. > >> Of course, this is all starting to approach a very specific question that >> could actually be approached much more efficiently if it's your end goal >> (though I think I remember from your first email a different end goal): > > That is true. But to learn some optimization techniques on a simple > example is much appreciated and will hopefully help me later on for the > iterated random walk cases. > >> We can use the symmetry and "restart"ability of RW to do the following: >> >> x = cumsum(sample(c(-1L,1L),BIGNUMBER,replace=T) >> D = diff(which(x == 0)) > > Nice! > >> This will give you a vector of how long x stays positive or negative at a >> time. Thinking through some simple translations lets you see that this set >> has the same distribution as how long a RW that starts negative stays >> negative. > > I have to write those translations down. On first sight though we need > again to handle the case where it stays negative the whole time. `D` > then has length 0 and we have to count that for a walk longer than > `BIGNUMBER`. > >> Again, this is only good for answering a very specific question >> about random walks and may not be useful if you have other more complicated >> questions in sight. > > Just testing for 0 for the iterated cases will not be enough for > iterated random walks since an iterated random walk can go from negative > to non-negative without being zero at this time/epoch. > > I implemented all your suggestions and got the following. > > -------- 8< -------- code -------- >8 -------- > f4 <- function(n = 100000, # number of simulations > length = 100000) # length of iterated sum > { > R = matrix(sample(c(-1L,1L),length*n,replace=T),nrow=n) > R = apply(R,1,cumsum) ## this applies cumsum `row-wise' to R and will make your life INFINITELY better > fTemp <- function(x) { > if (x[1] >= 0 ) { > return(1) > } > > for (i in 1:length-1) { > if (x[i] < 0 && x[i + 1] >= 0) { > return(as.integer(i/2 + 2)) > } > } > } > countNegative = apply(R,2,fTemp) > tabulate(as.vector(countNegative), length) > } > > f5 <- function(n = 100000, # number of simulations > length = 100000) # length of iterated sum > { > R = matrix(sample(c(-1L,1L), length*n,replace=T),nrow=n) > > R = apply(R,1,cumsum) > R[,R[1,]==(1)] = -1 * R[,R[1,]==(1)] # If the first element in the row is positive, flip the entire row > > R <- R>=0 > countNegative = apply(R,2,which.max) > tabulate(as.vector(countNegative), length) > } > > f6 <- function(n = 100000, # number of simulations > length = 100000) # length of iterated sum > { > x = cumsum(sample(c(-1L,1L), length*n,replace=T)) > D = diff(which(c(0, x) == 0)) > tabulate(D, max(D)) > } > -------- 8< -------- code -------- >8 -------- > > The timings differ quite much which is expected though. > >> # f1 is using only for loops but only does half the calculations >> # and does not yet flip random walks starting with a positive value. >> set.seed(1) ; system.time( z1 <- f1(300, 1e5) ) > User System verstrichen > 2.700 0.008 2.729 >> # f1 adapted with flips >> set.seed(1) ; system.time( z1f <- f1withflip(300, 1e5) ) > User System verstrichen > 4.457 0.004 4.475 >> set.seed(1) ; system.time( z4 <- f4(300, 1e5) ) > User System verstrichen > 8.033 0.380 8.739 >> set.seed(1) ; system.time( z5 <- f5(300, 1e5) ) > User System verstrichen > 9.640 0.812 10.588 >> set.seed(1) ; system.time( z6 <- f6(300, 1e5) ) > User System verstrichen > 4.208 0.328 4.606 > > So `f6` seems to be the most efficient setting right now and even is > slightly faster than `f1` with the for loops. But we have to keep in > mind that both operate on different data sets although `set.seed(1)` is > used and `f6` treats the problem totally different. > > One other thought is that when reusing the walks starting with a positiv > term and flipping those we can probably also take the backward/reverse > walk (dual problem). I will try that too. > > > Thank you very much, > > Paul > > > [1] https://stat.ethz.ch/pipermail/r-help/2011-July/285015.html > [2] https://stat.ethz.ch/pipermail/r-help/2011-July/285396.html From buysellrentoffer at gmail.com Mon Aug 1 18:47:45 2011 From: buysellrentoffer at gmail.com (world peace) Date: Mon, 1 Aug 2011 12:47:45 -0400 Subject: [R] possible reason for merge not working In-Reply-To: References: Message-ID: the answer was indeed in subtle differences, and 'str' did help. Problem is solved. Thanks everybody for comments which was all very useful. Best, On Mon, Aug 1, 2011 at 12:25 PM, jim holtman wrote: > What you "see" and what the data really is may be two different > things. ?You should have at least enclosed an 'str' of the two data > frames; even better would be a subset of the data using 'dput'. ?Most > likely your problem is that your data is not what you 'expect' it to > be. > > On Mon, Aug 1, 2011 at 12:17 PM, world peace wrote: >> Hi Guys, >> >> working on a "merge" for 2 data frames. >> >> Using the command: >> >> x <- merge(annotatedData, UCSCgenes, by.x="names", >> by.y="Ensembl.Gene.ID", all.x=TRUE) >> >> names and Ensembl.Gene.ID are columns with similar elements from the x >> and y data frames. >> >> annotatedData has 8909 entries, so has x(as expected). x has columns >> for UCSCgenes, but there is no data in them, all n/a, as if no match >> exists. >> This is not true as I can manually see and find many similarities >> between the names and UCSCgenes columns. >> >> I am wondering if there is any syntax error, or logical. >> >> comments appreciated. >> >> Thanks >> Dan >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > > > -- > Jim Holtman > Data Munger Guru > > What is the problem that you are trying to solve? > From jwiley.psych at gmail.com Mon Aug 1 18:52:16 2011 From: jwiley.psych at gmail.com (Joshua Wiley) Date: Mon, 1 Aug 2011 09:52:16 -0700 Subject: [R] Impact of multiple imputation on correlations In-Reply-To: <20110801070311.152960@gmx.net> References: <20110801070311.152960@gmx.net> Message-ID: Hi Tina, That is quite a bit of missingness, especially considering the sample size is not large to begin with. This would make me treat *any* result cautiously. That said, if you have a reasonable idea what the mechanism causing the missingness is or if from additional variables in your study, you can model the missing data mechanism sufficiently that you are confident (for some definition of confident) that the missingness is random after accounting for your model (conditional independence, I forget if Rubin calls it MCAR or MAR), you are in a reasonable place to use MI and draw inferences from the results. Even if you are uncertain about this, it is *not* any better to just say, "well there was too much missing data for me to feel safe using MI so here is the correlation based just on the observed data". That _will be biased_ unless the missing data mechanism is completely random (even unconditioned on anything else in your study; for example if participants flipped coins to decide which questions to respond to). When averaging correlations, it is conventional to average the inverse hyperbolic function of the correlations and then use the hyperbolic function to transform the averaged value back to the original units (also known as Fisher's Z transformation). The mice package may do this automatically if there is a functiong to compute pooled correlations. How results between simply deleted cases with any value unobserved and using MI varies. There may be no difference, are larger difference, or a smaller difference. Looking at the scatter plot matrix from the different imputations, I do not know that I would actually classify that as varying quite a bit. I realize the sign of the slope changes some, but that is not too surprising because all of them are somewhat close to flat. You can compare the between imputation variance to the within imputation variance (I think mice gives you this information). I partly addressed your last question at the beginning---I would certainly not trust the correlation obtained simply by deleting missingness, but I also would not trust the result obtained using MI unless it was well setup. Although you have shown us some of the data, you have not mentioned how you modelled the missingness. This can have a substantial impact on your results (and also their trustworthyness). mice provides a number of different models and you have a choice in what variables you use if you collect a lot in your study. Given all of this, I would suggest finding a local statistician or consultant to talk with about this. Your question(s) are more statistical than they are R related. Also, in addition to learning more about MI (there are several good books and articles on it that you can look up or email me offlist and I can provide references if you want), someone who is there can be more helpful because they will have access to your whole dataset and can work with you to find the best variables/model to model the missing data mechanism. I hope this helps and good luck, Josh On Mon, Aug 1, 2011 at 12:03 AM, wrote: > Dear all, > > I have been attempting to use multiple imputation (MI) to handle missing data in my study. I use the mice package in R for this. The deeper I get into this process, the more I realize I first need to understand some basic concepts which I hope you can help me with. > > For example, let us consider two arbitrary variables in my study that have the following missingness pattern: > > Variable 1 available, Variable 2 available: 51 (of 118 observations, 43%) > Variable 1 available, Variable 2 missing: 37 (31,3%) > Variable 1 missing, Variable 2 available: 10 (8,4%) > Variable 1 missing, Variable 2 missing: 20 (16,9%) > > I am interested in the correlation between Variable 1 and Variable 2. > > Q1. Does it even make sense for me to use MI (or anything else, really) to replace my missing data when such large fractions are not available? > > Plot 1 (http://imgur.com/KFV9y&CmV1sl) provides a scatter plot of these example variables in the original data. The correlation coefficient r = -0.34 and p = 0.016. > > Q2. I notice that correlations between variables in imputed data (pooled estimates over all imputations) are much lower and less significant than the correlations in the original data. For this example, the pooled estimates for the imputed data show r = -0.11 and p = 0.22. > > Since this seems to happen in all the variable combinations that I have looked at, I would like to know if MI is known to have this behavior, or whether this is specific to my imputation. > > Q3. When going through the imputations, the distribution of the individual variables (min, max, mean, etc.) matches the original data. However, correlations and least-square line fits vary quite a bit from imputation to imputation (see Plot 2, http://imgur.com/KFV9yl&CmV1s). Is this normal? > > Q4. Since my results differ (quite significantly) between the original and imputed data, which one should I trust? > > Thank you for your help in advance. > Tina > -- > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, ATS Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ From rwp7h at virginia.edu Mon Aug 1 18:54:16 2011 From: rwp7h at virginia.edu (Robert Pfister) Date: Mon, 1 Aug 2011 12:54:16 -0400 Subject: [R] 5 arguments passed to .Internal(matrix) which requires 7 Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From vikas.bansal at kcl.ac.uk Mon Aug 1 18:59:38 2011 From: vikas.bansal at kcl.ac.uk (Bansal, Vikas) Date: Mon, 1 Aug 2011 17:59:38 +0100 Subject: [R] Inserting column in between Message-ID: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEC93@KCL-MAIL01.kclad.ds.kcl.ac.uk> Dear all, I have a very simple question.I have data frame of 50 columns and i want to insert a column in 30th position.But i do not want to delete that column.Is it possible to include a column in between, so that new values are in 30th column and 30 th column is now 31st and 31st is 32nd......so on and 50th column is 51st..?I will be very thankful to you. Thanking you, Warm Regards Vikas Bansal Msc Bioinformatics Kings College London From jvadams at usgs.gov Mon Aug 1 19:02:04 2011 From: jvadams at usgs.gov (Jean V Adams) Date: Mon, 1 Aug 2011 12:02:04 -0500 Subject: [R] 5 arguments passed to .Internal(matrix) which requires 7 In-Reply-To: References: Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From jdnewmil at dcn.davis.ca.us Mon Aug 1 19:02:38 2011 From: jdnewmil at dcn.davis.ca.us (Jeff Newmiller) Date: Mon, 01 Aug 2011 10:02:38 -0700 Subject: [R] 5 arguments passed to .Internal(matrix) which requires 7 In-Reply-To: References: Message-ID: <65d3e963-7b22-49bd-ad78-944f8f29dd18@email.android.com> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From sarah.goslee at gmail.com Mon Aug 1 19:10:05 2011 From: sarah.goslee at gmail.com (Sarah Goslee) Date: Mon, 1 Aug 2011 13:10:05 -0400 Subject: [R] Inserting column in between In-Reply-To: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEC93@KCL-MAIL01.kclad.ds.kcl.ac.uk> References: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEC93@KCL-MAIL01.kclad.ds.kcl.ac.uk> Message-ID: x <- cbind(x[,1:29], newcolumn, x[,30:ncol(x)]) On Mon, Aug 1, 2011 at 12:59 PM, Bansal, Vikas wrote: > Dear all, > > I have a very simple question.I have data frame of 50 columns and i want to insert a column in 30th position.But i do not want to delete that column.Is it possible to include a column in between, so that new values are in 30th column and 30 th column is now 31st and 31st is 32nd......so on and 50th column is 51st..?I will be very thankful to you. > > -- Sarah Goslee http://www.functionaldiversity.org From dcarlson at tamu.edu Mon Aug 1 18:51:50 2011 From: dcarlson at tamu.edu (David L Carlson) Date: Mon, 1 Aug 2011 11:51:50 -0500 Subject: [R] Limited number of principal components in PCA In-Reply-To: References: <1311964387395-3704956.post@n4.nabble.com> Message-ID: <011301cc506b$5060b6a0$f12223e0$@edu> Providing the data will help, but the first thing I noted is that you have more columns (variables) than rows (cases). PCA will return a maximum of (the number of columns) or (the number of rows-1) whichever is less. With 84 columns and 66 rows means you can get no more than 65 components. If the variables are highly correlated, you will get fewer components and that probably explains the reduction to 54. I would guess the variables are highly correlated and the first eigenvalue is very large. ---------------------------------------------- David L Carlson Associate Professor of Anthropology Texas A&M University College Station, TX 77843-4352 -----Original Message----- From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Joshua Wiley Sent: Friday, July 29, 2011 10:20 PM To: William Armstrong Cc: r-help at r-project.org Subject: Re: [R] Limited number of principal components in PCA Hi Billy, Can you provide your data? You could attach it as a text file or provide it by pasting the output of: dput(Q) into an email. It would help if we could reproduce what you are doing. You might also consider a list or forum that is more statistics oriented than Rhelp, as your questions are more related to the statistics than the software itself (but still, if you give us data, you will probably get farther). Cheers, Josh On Fri, Jul 29, 2011 at 11:33 AM, William Armstrong wrote: > Hi all, > > I am attempting to run PCA on a matrix (nrow=66, ncol=84) using 'prcomp' > (stats package). My data (referred to as 'Q' in the code below) are > separate river streamflow gaging stations (columns) and peak instantaneous > discharge (rows). I am attempting to use PCA to identify regions of that > vary together. > > I am entering the following command: > > test_pca_Q<-prcomp(~.,data=Q,scale.=TRUE,retx=FALSE,na.action=na.omit) > > It is outputting 54 'standard deviation' numbers (which are the > sqrt(eigenvalues) in respect to a certain PC, am I correct?), and 54 > 'rotation' numbers, which are the variable loadings with respect to a given > PC. > > I have two questions: > > 1.) Why is it only outputting 54 PCs and standard deviations? If I have 84 > variables isn't the maximum number of PCs I can create 84 as well? > > 2.) Can I now use the 'rotation' values to find clusters of gages that I > acting together, or is there another step I must take? > > Thank you very much for your insight. > > Billy > > > -- > View this message in context: http://r.789695.n4.nabble.com/Limited-number-of-principal-components-in-PCA-tp3704956p3704956.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles https://joshuawiley.com/ ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. From gunter.berton at gene.com Mon Aug 1 19:17:50 2011 From: gunter.berton at gene.com (Bert Gunter) Date: Mon, 1 Aug 2011 10:17:50 -0700 Subject: [R] Inserting column in between In-Reply-To: References: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEC93@KCL-MAIL01.kclad.ds.kcl.ac.uk> Message-ID: Doesn't work -- you lose column names. Try this instead: yourframe[,30:51] <- cbind( newcolumn,yourframe[,30:50]) Adjust column names after via: names(yourframe) [30:51] <- c(newcolname,names(yourframe[30:50]) Cheers, Bert On Mon, Aug 1, 2011 at 10:10 AM, Sarah Goslee wrote: > x <- cbind(x[,1:29], newcolumn, x[,30:ncol(x)]) > > On Mon, Aug 1, 2011 at 12:59 PM, Bansal, Vikas wrote: >> Dear all, >> >> I have a very simple question.I have data frame of 50 columns and i want to insert a column in 30th position.But i do not want to delete that column.Is it possible to include a column in between, so that new values are in 30th column and 30 th column is now 31st and 31st is 32nd......so on and 50th column is 51st..?I will be very thankful to you. >> >> > > > > -- > Sarah Goslee > http://www.functionaldiversity.org > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- "Men by nature long to get on to the ultimate truths, and will often be impatient with elementary studies or fight shy of them. If it were possible to reach the ultimate truths without the elementary studies usually prefixed to them, these would not be preparatory studies but superfluous diversions." -- Maimonides (1135-1204) Bert Gunter Genentech Nonclinical Biostatistics From dcarlson at tamu.edu Mon Aug 1 19:20:36 2011 From: dcarlson at tamu.edu (David L Carlson) Date: Mon, 1 Aug 2011 12:20:36 -0500 Subject: [R] Plotting problems directional or rose plots In-Reply-To: References: Message-ID: <011401cc506f$5454fe70$fcfefb50$@edu> Searching R Graphical Manual (http://www.oga-lab.net/RGM2/, mirror http://www.oga-lab.net/RGM2/) shows possible candidates in packages circular (windrose), IDPmisc (plot.rose), climatol (rosavent), openair (windRose), and oce (as.windrose). ---------------------------------------------- David L Carlson Associate Professor of Anthropology Texas A&M University College Station, TX 77843-4352 -----Original Message----- From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of kitty Sent: Monday, August 01, 2011 10:39 AM To: r-help at r-project.org Subject: Re: [R] Plotting problems directional or rose plots Hi again, I have tried playing around with the code given to me by Alan and Jim, thank you for the code but unfortunately....I can't seem to get either of them to work... Alans does not work with the sample data and Jims is giving the error : Error in radial.grid(labels = labels, label.pos = label.pos, radlab = radlab, : could not find function "boxed.labels" I have also tried Rose plots in the (heR.Misc) library to to avail. Sorry, does anyone know how to get the plots I need? Thank you all for reading this and for your help k. On Tue, Jul 26, 2011 at 10:20 PM, kitty wrote: > Hi, > > I'm trying to get a plot that looks somewhat like the attached image > (sketched in word). > I think I need somthing called a rose diagram? but I can't get it to do > what I want. I'm happy to use any library. > > Essentially, I want a circle with degree slices every 10 degrees with 0 at > the top representing north, and > 'tick marks' around the outside in 10 degree increments to match the slices > (so the slices need to be ofset by 5 degrees so the 0 degree slice actually > faces north) > I then want to be able to colour in the slices depending on the distance > that the factor extends to; so for example the 9000 dist is the largest in > the example so should fill the slice, > a distance in this plot of 4500 would fill halfway up the slice. > I also want to be able to specify the colour of each slice so that I can > relate it back to the spatial correlograms I have. > > I have added some sample data below. > > Thank you for reading my post, > All help is greatly appreciated, > K > > sample data: > > #distance factor extends to > dist<-c(5000,7000,9000,4500,6000,500) > > #direction > angle<-c(0,10,20,30,40,50) > > #list of desired colour example, order corrisponds to associated > angle/direction > color.list<-c('red','blue','green','yellow','pink','black') > > (my real data is from 0 to 350 degrees, and so I have corresponding > distance and colour data for each 10 degree increment). > > > [[alternative HTML version deleted]] ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. From gunter.berton at gene.com Mon Aug 1 19:27:26 2011 From: gunter.berton at gene.com (Bert Gunter) Date: Mon, 1 Aug 2011 10:27:26 -0700 Subject: [R] Inserting column in between -- "better" way? Message-ID: Folks: I consider my reply below rather clumsy: One has to keep track of index numbers other than that which is inserted and must separately change column names. Is there as "essentially better" way to do this, either via base R or via an R package. I leave it to you to define "essentially better." Thanks. Cheers, Bert On Mon, Aug 1, 2011 at 10:17 AM, Bert Gunter wrote: > Doesn't work -- you lose column names. > > Try this instead: > > yourframe[,30:51] <- cbind( newcolumn,yourframe[,30:50]) > > Adjust column names after via: > > names(yourframe) [30:51] <- c(newcolname,names(yourframe[30:50]) > > Cheers, > Bert > > On Mon, Aug 1, 2011 at 10:10 AM, Sarah Goslee wrote: >> x <- cbind(x[,1:29], newcolumn, x[,30:ncol(x)]) >> >> On Mon, Aug 1, 2011 at 12:59 PM, Bansal, Vikas wrote: >>> Dear all, >>> >>> I have a very simple question.I have data frame of 50 columns and i want to insert a column in 30th position.But i do not want to delete that column.Is it possible to include a column in between, so that new values are in 30th column and 30 th column is now 31st and 31st is 32nd......so on and 50th column is 51st..?I will be very thankful to you. >>> >>> >> >> >> >> -- >> Sarah Goslee >> http://www.functionaldiversity.org >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > > > -- > "Men by nature long to get on to the ultimate truths, and will often > be impatient with elementary studies or fight shy of them. If it were > possible to reach the ultimate truths without the elementary studies > usually prefixed to them, these would not be preparatory studies but > superfluous diversions." > > -- Maimonides (1135-1204) > > Bert Gunter > Genentech Nonclinical Biostatistics > -- "Men by nature long to get on to the ultimate truths, and will often be impatient with elementary studies or fight shy of them. If it were possible to reach the ultimate truths without the elementary studies usually prefixed to them, these would not be preparatory studies but superfluous diversions." -- Maimonides (1135-1204) Bert Gunter Genentech Nonclinical Biostatistics From merik.nanish at gmail.com Mon Aug 1 19:34:39 2011 From: merik.nanish at gmail.com (Merik Nanish) Date: Mon, 1 Aug 2011 13:34:39 -0400 Subject: [R] Accessing the index of factor in by() function In-Reply-To: References: <30248_1311677098_4E2E9AAA_30248_13123_1_CAE6BpM5aT0nn8+7PS33KXvVwDg1sktQFL+ps_AbFVBUK8HvWbA@mail.gmail.com> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From dcarlson at tamu.edu Mon Aug 1 19:35:04 2011 From: dcarlson at tamu.edu (David L Carlson) Date: Mon, 1 Aug 2011 12:35:04 -0500 Subject: [R] Inserting column in between In-Reply-To: References: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEC93@KCL-MAIL01.kclad.ds.kcl.ac.uk> Message-ID: <012401cc5071$5920f380$0b62da80$@edu> Not when I do it. > a <- data.frame(A=1:10, B=11:20, D=31:40, E=41:50) > a A B D E 1 1 11 31 41 2 2 12 32 42 3 3 13 33 43 4 4 14 34 44 5 5 15 35 45 6 6 16 36 46 7 7 17 37 47 8 8 18 38 48 9 9 19 39 49 10 10 20 40 50 > b <- cbind(a[,1:2], C=21:30, a[,3:4]) > b A B C D E 1 1 11 21 31 41 2 2 12 22 32 42 3 3 13 23 33 43 4 4 14 24 34 44 5 5 15 25 35 45 6 6 16 26 36 46 7 7 17 27 37 47 8 8 18 28 38 48 9 9 19 29 39 49 10 10 20 30 40 50 ---------------------------------------------- David L Carlson Associate Professor of Anthropology Texas A&M University College Station, TX 77843-4352 -----Original Message----- From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Bert Gunter Sent: Monday, August 01, 2011 12:18 PM To: Sarah Goslee Cc: r-help at r-project.org Subject: Re: [R] Inserting column in between Doesn't work -- you lose column names. Try this instead: yourframe[,30:51] <- cbind( newcolumn,yourframe[,30:50]) Adjust column names after via: names(yourframe) [30:51] <- c(newcolname,names(yourframe[30:50]) Cheers, Bert On Mon, Aug 1, 2011 at 10:10 AM, Sarah Goslee wrote: > x <- cbind(x[,1:29], newcolumn, x[,30:ncol(x)]) > > On Mon, Aug 1, 2011 at 12:59 PM, Bansal, Vikas wrote: >> Dear all, >> >> I have a very simple question.I have data frame of 50 columns and i want to insert a column in 30th position.But i do not want to delete that column.Is it possible to include a column in between, so that new values are in 30th column and 30 th column is now 31st and 31st is 32nd......so on and 50th column is 51st..?I will be very thankful to you. >> >> > > > > -- > Sarah Goslee > http://www.functionaldiversity.org > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- "Men by nature long to get on to the ultimate truths, and will often be impatient with elementary studies or fight shy of them. If it were possible to reach the ultimate truths without the elementary studies usually prefixed to them, these would not be preparatory studies but superfluous diversions." -- Maimonides (1135-1204) Bert Gunter Genentech Nonclinical Biostatistics ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. From sarah.goslee at gmail.com Mon Aug 1 19:37:12 2011 From: sarah.goslee at gmail.com (Sarah Goslee) Date: Mon, 1 Aug 2011 13:37:12 -0400 Subject: [R] Inserting column in between In-Reply-To: References: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEC93@KCL-MAIL01.kclad.ds.kcl.ac.uk> Message-ID: Bert, On Mon, Aug 1, 2011 at 1:17 PM, Bert Gunter wrote: > Doesn't work -- you lose column names. But I don't lose column names: > x <- data.frame(A=1:3, B=1:3, C=1:3, D=1:3, E=1:3) > x A B C D E 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 > newcol <- 4:6 > cbind(x[,1:2], newcol, x[,3:ncol(x)]) A B newcol C D E 1 1 1 4 1 1 1 2 2 2 5 2 2 2 3 3 3 6 3 3 3 It's even possible to change names in the cbind() statement: > cbind(x[,1:2], Y=newcol, x[,3:ncol(x)]) A B Y C D E 1 1 1 4 1 1 1 2 2 2 5 2 2 2 3 3 3 6 3 3 3 If for some reason it isn't working for you, you might try explicitly calling cbind.data.frame() instead of the default cbind(). > Try this instead: > > yourframe[,30:51] <- cbind( newcolumn,yourframe[,30:50]) > > Adjust column names after via: > > names(yourframe) [30:51] <- c(newcolname,names(yourframe[30:50]) This shouldn't be necessary, I think. What happens if you use my above example? Sarah > Cheers, > Bert > > On Mon, Aug 1, 2011 at 10:10 AM, Sarah Goslee wrote: >> x <- cbind(x[,1:29], newcolumn, x[,30:ncol(x)]) >> >> On Mon, Aug 1, 2011 at 12:59 PM, Bansal, Vikas wrote: >>> Dear all, >>> >>> I have a very simple question.I have data frame of 50 columns and i want to insert a column in 30th position.But i do not want to delete that column.Is it possible to include a column in between, so that new values are in 30th column and 30 th column is now 31st and 31st is 32nd......so on and 50th column is 51st..?I will be very thankful to you. >>> >>> >> > -- Sarah Goslee http://www.functionaldiversity.org From sarah.goslee at gmail.com Mon Aug 1 19:43:52 2011 From: sarah.goslee at gmail.com (Sarah Goslee) Date: Mon, 1 Aug 2011 13:43:52 -0400 Subject: [R] Inserting column in between -- "better" way? In-Reply-To: References: Message-ID: Bert, On Mon, Aug 1, 2011 at 1:27 PM, Bert Gunter wrote: > Folks: > > I consider my reply below rather clumsy: One has to keep track of > index numbers other than that which is inserted and must separately > change column names. Is there as "essentially better" way to do this, > either via base R or via an R package. I leave it to you to define > "essentially better." > Having tried your solution with sample data, I'd have to agree. :) Your approach does mess up the column names, and also doesn't work if x is a matrix rather than data frame. Mine, using the full cbind(), works in both cases, preserving the column names and running even if x is a matrix. It could be written as a function, but since it's only one line and really only requires knowing at what position you'd like to add the new column, it hardly seems worth it unless it's something to be done repeatedly. > x <- data.frame(A=1:3, B=1:3, C=1:3, D=1:3, E=1:3) > newcol <- 4:6 > cbind(x[,1:2], newcol, x[,3:ncol(x)]) A B newcol C D E 1 1 1 4 1 1 1 2 2 2 5 2 2 2 3 3 3 6 3 3 3 > > > x[,3:6] <- cbind(newcol, x[,3:5]) > x A B C D E E.1 1 1 1 4 1 1 1 2 2 2 5 2 2 2 3 3 3 6 3 3 3 > > > x <- data.frame(A=1:3, B=1:3, C=1:3, D=1:3, E=1:3) > x <- as.matrix(x) > cbind(x[,1:2], newcol, x[,3:ncol(x)]) A B newcol C D E [1,] 1 1 4 1 1 1 [2,] 2 2 5 2 2 2 [3,] 3 3 6 3 3 3 > x[,3:6] <- cbind(newcol, x[,3:5]) Error in x[, 3:6] <- cbind(newcol, x[, 3:5]) : subscript out of bounds Sarah > Thanks. > > Cheers, > Bert > > On Mon, Aug 1, 2011 at 10:17 AM, Bert Gunter wrote: >> Doesn't work -- you lose column names. >> >> Try this instead: >> >> yourframe[,30:51] <- cbind( newcolumn,yourframe[,30:50]) >> >> Adjust column names after via: >> >> names(yourframe) [30:51] <- c(newcolname,names(yourframe[30:50]) >> >> Cheers, >> Bert >> >> On Mon, Aug 1, 2011 at 10:10 AM, Sarah Goslee wrote: >>> x <- cbind(x[,1:29], newcolumn, x[,30:ncol(x)]) >>> >>> On Mon, Aug 1, 2011 at 12:59 PM, Bansal, Vikas wrote: >>>> Dear all, >>>> >>>> I have a very simple question.I have data frame of 50 columns and i want to insert a column in 30th position.But i do not want to delete that column.Is it possible to include a column in between, so that new values are in 30th column and 30 th column is now 31st and 31st is 32nd......so on and 50th column is 51st..?I will be very thankful to you. >>> -- Sarah Goslee http://www.functionaldiversity.org From izahn at psych.rochester.edu Mon Aug 1 19:50:14 2011 From: izahn at psych.rochester.edu (Ista Zahn) Date: Mon, 1 Aug 2011 13:50:14 -0400 Subject: [R] Inserting column in between -- "better" way? In-Reply-To: References: Message-ID: On Mon, Aug 1, 2011 at 1:43 PM, Sarah Goslee wrote: > Bert, > > On Mon, Aug 1, 2011 at 1:27 PM, Bert Gunter wrote: >> Folks: >> >> I consider my reply below rather clumsy: One has to keep track of >> index numbers other than that which is inserted and must separately >> change column names. Is there as "essentially better" way to do this, >> either via base R or via an R package. I leave it to you to define >> "essentially better." A variation on the theme that I prefer for aesthetic reasons is a <- data.frame(A=1:10, B=11:20, D=31:40, E=41:50) a$F <- 21:30 a <- a[, c(1:2, 5, 3:4)] I doubt that it is "essentially better", as it still requires keeping track of the index, but to me this is easier to follow. Best, Ista >> > Having tried your solution with sample data, I'd have to agree. :) > Your approach does mess up the column names, and also doesn't work > if x is a matrix rather than data frame. Mine, using the full cbind(), works > in both cases, preserving the column names and running even if x is > a matrix. > > It could be written as a function, but since it's only one line and > really only requires knowing at what position you'd like to add > the new column, it hardly seems worth it unless it's something > to be done repeatedly. > >> ?x <- data.frame(A=1:3, B=1:3, C=1:3, D=1:3, E=1:3) >> newcol <- 4:6 >> cbind(x[,1:2], newcol, x[,3:ncol(x)]) > ?A B newcol C D E > 1 1 1 ? ? ?4 1 1 1 > 2 2 2 ? ? ?5 2 2 2 > 3 3 3 ? ? ?6 3 3 3 >> >> >> x[,3:6] <- cbind(newcol, x[,3:5]) >> x > ?A B C D E E.1 > 1 1 1 4 1 1 ? 1 > 2 2 2 5 2 2 ? 2 > 3 3 3 6 3 3 ? 3 >> >> >> x <- data.frame(A=1:3, B=1:3, C=1:3, D=1:3, E=1:3) >> x <- as.matrix(x) >> cbind(x[,1:2], newcol, x[,3:ncol(x)]) > ? ? A B newcol C D E > [1,] 1 1 ? ? ?4 1 1 1 > [2,] 2 2 ? ? ?5 2 2 2 > [3,] 3 3 ? ? ?6 3 3 3 >> x[,3:6] <- cbind(newcol, x[,3:5]) > Error in x[, 3:6] <- cbind(newcol, x[, 3:5]) : subscript out of bounds > > Sarah > >> Thanks. >> >> Cheers, >> Bert >> >> On Mon, Aug 1, 2011 at 10:17 AM, Bert Gunter wrote: >>> Doesn't work -- you lose column names. >>> >>> Try this instead: >>> >>> yourframe[,30:51] <- cbind( newcolumn,yourframe[,30:50]) >>> >>> Adjust column names after via: >>> >>> names(yourframe) [30:51] <- c(newcolname,names(yourframe[30:50]) >>> >>> Cheers, >>> Bert >>> >>> On Mon, Aug 1, 2011 at 10:10 AM, Sarah Goslee wrote: >>>> x <- cbind(x[,1:29], newcolumn, x[,30:ncol(x)]) >>>> >>>> On Mon, Aug 1, 2011 at 12:59 PM, Bansal, Vikas wrote: >>>>> Dear all, >>>>> >>>>> I have a very simple question.I have data frame of 50 columns and i want to insert a column in 30th position.But i do not want to delete that column.Is it possible to include a column in between, so that new values are in 30th column and 30 th column is now 31st and 31st is 32nd......so on and 50th column is 51st..?I will be very thankful to you. >>>> > -- > Sarah Goslee > http://www.functionaldiversity.org > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Ista Zahn Graduate student University of Rochester Department of Clinical and Social Psychology http://yourpsyche.org From sarah.goslee at gmail.com Mon Aug 1 19:52:01 2011 From: sarah.goslee at gmail.com (Sarah Goslee) Date: Mon, 1 Aug 2011 13:52:01 -0400 Subject: [R] Accessing the index of factor in by() function In-Reply-To: References: <30248_1311677098_4E2E9AAA_30248_13123_1_CAE6BpM5aT0nn8+7PS33KXvVwDg1sktQFL+ps_AbFVBUK8HvWbA@mail.gmail.com> Message-ID: Merik, You did get an answer to the question, and it's even included in the material below. What doesn't work for you in Ista's suggestion? id <- c(1,1,1,1,1,2,2,2,3,3,3) month <- c(1, 1, 2, 3, 6, 2, 3, 6, 1, 3, 5) value <- c(10, 12, 11, 14, 16, 12, 10, 8, 14, 11, 15) dat.tmp <- data.frame(id, month, value) my.plot <- function(dat) {print(dat[, c("id", "value")])} by(dat.tmp, id, my.plot) But if for some reason you need to get the separate sections, not just act on them, this might also work: dat.split <- split(dat.tmp, dat.tmp$id) lapply(dat.split, my.plot) Sarah On Mon, Aug 1, 2011 at 1:34 PM, Merik Nanish wrote: > Since I didn't get an answer to this question, I'm rephrasing my question in > simpler terms: > > I have ?a dataframe and I want to split it based on the levels of one of its > columns, and apply a function to each section of the data. Output of the > function may be drawing a plot, returning ?a value, whatever. I want to do > it efficiently though (for loops are very slow). > > How can I do that? > > M > > On Tue, Jul 26, 2011 at 10:12 AM, Ista Zahn wrote: > >> Hi Merik, >> Please keep the mailing list copied. >> >> On Tue, Jul 26, 2011 at 6:44 AM, Merik Nanish >> wrote: >> > You can convert my data into a dataframe simply by dat <- data.frame(id, >> > month, value). That doesn't help though. >> >> Can you be more specific? What is the problem you are having? >> >> And no, that's not what I'm looking >> > for. What I intend to do is for by to loop through the data based on >> levels >> > of "id" factor (1,2, and 3), and for each level, for my function to >> printout >> > the values of "value" and "month" belonging to the section of data with >> that >> > "id". >> >> OK, easy enough: >> >> dat.tmp <- data.frame(id, month, value) >> my.plot <- function(dat) {print(dat[, c("id", "value")])} >> by(dat.tmp, id, my.plot) >> >> > Right now, I achieve this with a for loop but I want to avoid looping in >> the >> > data as much as possible. >> >> Why? What do you have against loops? >> >> Best, >> Ista >> >> > >> > On Tue, Jul 26, 2011 at 12:18 AM, Ista Zahn >> > wrote: >> >> >> >> Hi Merik, >> >> by() works most easily with data.frames. Is this what you are after? >> >> >> >> my.plot <- function(dat) { print(dat$value); >> >> print(dat$month[dat$id==dat$value]) } >> >> by(dat.tmp, id, my.plot) >> >> >> >> Best, >> >> Ista >> >> >> >> On Mon, Jul 25, 2011 at 9:19 PM, Merik Nanish >> >> wrote: >> >> > Hello, >> >> > >> >> > Here are three vectors to give context to my question below: >> >> > >> >> > *id ? ?<- c(1,1,1,1,1,2,2,2,3,3,3)) >> >> > month <- c(1, 1, 2, 3, 6, 2, 3, 6, 1, 3, 5) >> >> > value <- c(10, 12, 11, 14, 16, 12, 10, 8, 14, 11, 15)* >> >> > >> >> > and I want to plot "value" over "month" separately for each "id". >> Before >> >> > I >> >> > can do that, I need to section both month and value, based on ID. I >> >> > create a >> >> > my.plot function like this (at this point, it doesn't draw any plots, >> it >> >> > is >> >> > just an effort to help my understand what I'm doing): >> >> > >> >> > *my.plot <- function(y) { print(y); print(month[id==y]) }* >> >> > >> >> > Now, I tried: >> >> > >> >> > *by(value, id, my.plot)* >> >> > >> >> > But of course, it didn't do what I wanted. I realized that the >> parameter >> >> > passed to my.plot, is a "secion of value" per ID, and not the ID value >> >> > itself. Question is, how can I get the value of factor ID at each >> level >> >> > of >> >> > by()? >> >> > >> >> > Please advise, >> >> > >> >> > Merik >> >> > -- Sarah Goslee http://www.functionaldiversity.org From gunter.berton at gene.com Mon Aug 1 20:13:59 2011 From: gunter.berton at gene.com (Bert Gunter) Date: Mon, 1 Aug 2011 11:13:59 -0700 Subject: [R] Inserting column in between In-Reply-To: References: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEC93@KCL-MAIL01.kclad.ds.kcl.ac.uk> Message-ID: Thanks Sarah and David. Yes, but note this: > z <- data.frame(a=1:2,b=3:4) > z a b 1 1 3 2 2 4 > newdat <- 5:6 > cbind(z[,1],newdat,z[,2]) newdat [1,] 1 5 3 [2,] 2 6 4 > cbind.data.frame(z[,1],newdat,z[,2]) z[, 1] newdat z[, 2] 1 1 5 3 2 2 6 4 Aha moment! -- You need drop=FALSE: cbind(z[,1,drop=FALSE],newdat,z[,2,drop=FALSE]) a newdat b 1 1 5 3 2 2 6 4 So your solution does not work in general (and you may not have intended it to); while mine does, but is blatantly clumsy. I would say the "better" approach is merely to add the drop = FALSE option to yours even though it is unnecessary in your simple example: cbind(x[,1:2,drop = FALSE], newcol, x[,3:ncol(x)], drop= FALSE) ... and I would definitely count this as an R 'gotcha' . (and it has gotcha'ed me before). Cheers, -- Bert On Mon, Aug 1, 2011 at 10:37 AM, Sarah Goslee wrote: > Bert, > > On Mon, Aug 1, 2011 at 1:17 PM, Bert Gunter wrote: >> Doesn't work -- you lose column names. > > But I don't lose column names: > >> x <- data.frame(A=1:3, B=1:3, C=1:3, D=1:3, E=1:3) >> x > ?A B C D E > 1 1 1 1 1 1 > 2 2 2 2 2 2 > 3 3 3 3 3 3 >> newcol <- 4:6 >> cbind(x[,1:2], newcol, x[,3:ncol(x)]) > ?A B newcol C D E > 1 1 1 ? ? ?4 1 1 1 > 2 2 2 ? ? ?5 2 2 2 > 3 3 3 ? ? ?6 3 3 3 > > It's even possible to change names in the cbind() statement: > >> cbind(x[,1:2], Y=newcol, x[,3:ncol(x)]) > ?A B Y C D E > 1 1 1 4 1 1 1 > 2 2 2 5 2 2 2 > 3 3 3 6 3 3 3 > > If for some reason it isn't working for you, you might try explicitly calling > cbind.data.frame() instead of the default cbind(). > > >> Try this instead: >> >> yourframe[,30:51] <- cbind( newcolumn,yourframe[,30:50]) >> >> Adjust column names after via: >> >> names(yourframe) [30:51] <- c(newcolname,names(yourframe[30:50]) > > This shouldn't be necessary, I think. What happens if you use my > above example? > > Sarah > > >> Cheers, >> Bert >> >> On Mon, Aug 1, 2011 at 10:10 AM, Sarah Goslee wrote: >>> x <- cbind(x[,1:29], newcolumn, x[,30:ncol(x)]) >>> >>> On Mon, Aug 1, 2011 at 12:59 PM, Bansal, Vikas wrote: >>>> Dear all, >>>> >>>> I have a very simple question.I have data frame of 50 columns and i want to insert a column in 30th position.But i do not want to delete that column.Is it possible to include a column in between, so that new values are in 30th column and 30 th column is now 31st and 31st is 32nd......so on and 50th column is 51st..?I will be very thankful to you. >>>> >>>> >>> >> > > > -- > Sarah Goslee > http://www.functionaldiversity.org > From math at hush.com Mon Aug 1 20:12:56 2011 From: math at hush.com (monk) Date: Mon, 1 Aug 2011 11:12:56 -0700 (PDT) Subject: [R] fill Matrix quicker Message-ID: <1312222376367-3710428.post@n4.nabble.com> dear all, i have a quite simple question, i want to fill up a Matrix like done in the following function, but the performance is very bad for large dimensions is there a way to do this like with apply or something similar? makeMatrix <- function(a, b,dim) { X=matrix(0,ncol=dim,nrow=dim) for (i in c(1:dim)){ for (j in c(1:dim)) { if (i==j) {X[i,j]<-a} else { X[i,j]<- exp(( -1*abs(i-j))/(3*b)) } } } X } -- View this message in context: http://r.789695.n4.nabble.com/fill-Matrix-quicker-tp3710428p3710428.html Sent from the R help mailing list archive at Nabble.com. From ripley at stats.ox.ac.uk Mon Aug 1 20:21:57 2011 From: ripley at stats.ox.ac.uk (Prof Brian Ripley) Date: Mon, 1 Aug 2011 19:21:57 +0100 (BST) Subject: [R] error message jpeg62.dll missing In-Reply-To: References: Message-ID: See the footer of this and every R-help message. In particular, that DLL is not used by R itself, so this is probably something called from a third-party package. A number of packages used to use that DLL (which is rather out of date), but no longer, so is your R actually current (the posting guide asked you to update *before* posting: it also asked you for 'at a minimum information)? On Mon, 1 Aug 2011, Rocky Hyacinth wrote: > Dear R-help > > We are getting an error message `jpeg62.dll missing'. > > We are running Windows 7 64-bit, from a Mac using Boot Camp. > > Do you know of this error message, and can you give us help trying to > resolve the problem? > > many thanks > Rocky > > Rocky Hyacinth > Technician > Department of Archaeology > University of Sheffield > United Kingdom > > [[alternative HTML version deleted]] And not to send HTML .... > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 From jvadams at usgs.gov Mon Aug 1 20:35:01 2011 From: jvadams at usgs.gov (Jean V Adams) Date: Mon, 1 Aug 2011 13:35:01 -0500 Subject: [R] fill Matrix quicker In-Reply-To: <1312222376367-3710428.post@n4.nabble.com> References: <1312222376367-3710428.post@n4.nabble.com> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From rwp7h at virginia.edu Mon Aug 1 20:35:01 2011 From: rwp7h at virginia.edu (Robert Pfister) Date: Mon, 1 Aug 2011 14:35:01 -0400 Subject: [R] 5 arguments passed to .Internal(matrix) which requires 7 In-Reply-To: References: Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From jvadams at usgs.gov Mon Aug 1 20:39:36 2011 From: jvadams at usgs.gov (Jean V Adams) Date: Mon, 1 Aug 2011 13:39:36 -0500 Subject: [R] 5 arguments passed to .Internal(matrix) which requires 7 In-Reply-To: References: Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From pburns at pburns.seanet.com Mon Aug 1 20:48:56 2011 From: pburns at pburns.seanet.com (Patrick Burns) Date: Mon, 01 Aug 2011 19:48:56 +0100 Subject: [R] fill Matrix quicker In-Reply-To: <1312222376367-3710428.post@n4.nabble.com> References: <1312222376367-3710428.post@n4.nabble.com> Message-ID: <4E36F518.2050109@pburns.seanet.com> Most certainly you can speed it up: X <- exp(-abs(row(X) - col(X)) / (3*b)) diag(X) <- a should do what you want. This is called 'vectorization' and is discussed lots of places -- for instance, in the two documents mentioned below in my signature. On 01/08/2011 19:12, monk wrote: > dear all, > > i have a quite simple question, i want to fill up a Matrix like done in the > following function, > but the performance is very bad for large dimensions > is there a way to do this like with apply or something similar? > > > makeMatrix<- function(a, b,dim) { > X=matrix(0,ncol=dim,nrow=dim) > > > > for (i in c(1:dim)){ > for (j in c(1:dim)) { > if (i==j) {X[i,j]<-a} > else { X[i,j]<- exp(( -1*abs(i-j))/(3*b)) } > } > } > X > } > > -- > View this message in context: http://r.789695.n4.nabble.com/fill-Matrix-quicker-tp3710428p3710428.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Patrick Burns pburns at pburns.seanet.com twitter: @portfolioprobe http://www.portfolioprobe.com/blog http://www.burns-stat.com (home of 'Some hints for the R beginner' and 'The R Inferno') From dcarlson at tamu.edu Mon Aug 1 20:50:01 2011 From: dcarlson at tamu.edu (David L Carlson) Date: Mon, 1 Aug 2011 13:50:01 -0500 Subject: [R] Inserting column in between -- "better" way? In-Reply-To: References: Message-ID: <013701cc507b$d464d840$7d2e88c0$@edu> Actually Sara's method fails if the insertion is after the first or before the last column: >x <- data.frame(A=1:3, B=1:3, C=1:3, D=1:3, E=1:3) >newcol <- 4:6 >cbind(x[,1], newcol, x[,2:ncol(x)]) x[, 1] newcol B C D E 1 1 4 1 1 1 1 2 2 5 2 2 2 2 3 3 6 3 3 3 3 > cbind(x[,1:4], newcol, x[,ncol(x)]) A B C D newcol x[, ncol(x)] 1 1 1 1 1 4 1 2 2 2 2 2 5 2 3 3 3 3 3 6 3 Inserting drop=FALSE fixes them. ---------------------------------------------- David L Carlson Associate Professor of Anthropology Texas A&M University College Station, TX 77843-4352 -----Original Message----- From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Sarah Goslee Sent: Monday, August 01, 2011 12:44 PM To: Bert Gunter Cc: r-help at r-project.org Subject: Re: [R] Inserting column in between -- "better" way? Bert, On Mon, Aug 1, 2011 at 1:27 PM, Bert Gunter wrote: > Folks: > > I consider my reply below rather clumsy: One has to keep track of > index numbers other than that which is inserted and must separately > change column names. Is there as "essentially better" way to do this, > either via base R or via an R package. I leave it to you to define > "essentially better." > Having tried your solution with sample data, I'd have to agree. :) Your approach does mess up the column names, and also doesn't work if x is a matrix rather than data frame. Mine, using the full cbind(), works in both cases, preserving the column names and running even if x is a matrix. It could be written as a function, but since it's only one line and really only requires knowing at what position you'd like to add the new column, it hardly seems worth it unless it's something to be done repeatedly. > x <- data.frame(A=1:3, B=1:3, C=1:3, D=1:3, E=1:3) > newcol <- 4:6 > cbind(x[,1:2], newcol, x[,3:ncol(x)]) A B newcol C D E 1 1 1 4 1 1 1 2 2 2 5 2 2 2 3 3 3 6 3 3 3 > > > x[,3:6] <- cbind(newcol, x[,3:5]) > x A B C D E E.1 1 1 1 4 1 1 1 2 2 2 5 2 2 2 3 3 3 6 3 3 3 > > > x <- data.frame(A=1:3, B=1:3, C=1:3, D=1:3, E=1:3) > x <- as.matrix(x) > cbind(x[,1:2], newcol, x[,3:ncol(x)]) A B newcol C D E [1,] 1 1 4 1 1 1 [2,] 2 2 5 2 2 2 [3,] 3 3 6 3 3 3 > x[,3:6] <- cbind(newcol, x[,3:5]) Error in x[, 3:6] <- cbind(newcol, x[, 3:5]) : subscript out of bounds Sarah > Thanks. > > Cheers, > Bert > > On Mon, Aug 1, 2011 at 10:17 AM, Bert Gunter wrote: >> Doesn't work -- you lose column names. >> >> Try this instead: >> >> yourframe[,30:51] <- cbind( newcolumn,yourframe[,30:50]) >> >> Adjust column names after via: >> >> names(yourframe) [30:51] <- c(newcolname,names(yourframe[30:50]) >> >> Cheers, >> Bert >> >> On Mon, Aug 1, 2011 at 10:10 AM, Sarah Goslee wrote: >>> x <- cbind(x[,1:29], newcolumn, x[,30:ncol(x)]) >>> >>> On Mon, Aug 1, 2011 at 12:59 PM, Bansal, Vikas wrote: >>>> Dear all, >>>> >>>> I have a very simple question.I have data frame of 50 columns and i want to insert a column in 30th position.But i do not want to delete that column.Is it possible to include a column in between, so that new values are in 30th column and 30 th column is now 31st and 31st is 32nd......so on and 50th column is 51st..?I will be very thankful to you. >>> -- Sarah Goslee http://www.functionaldiversity.org ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. From dimitri.liakhovitski at gmail.com Mon Aug 1 21:45:39 2011 From: dimitri.liakhovitski at gmail.com (Dimitri Liakhovitski) Date: Mon, 1 Aug 2011 15:45:39 -0400 Subject: [R] Identifying US holidays Message-ID: Hello! I am trying to identify which ones of a vector of dates are US holidays. And, ideally, which is which. And I do not know (a-priori) which dates those should be. I have, for example: x<-seq(as.Date("2011-01-01"),as.Date("2011-12-31"),by="day") (x) I think chron should help me here - but maybe I am not using it properly: library(chron) is.holiday(chron) # Says that none of those dates are holidays ?is.holiday says: "holidays" is an object that should be listing holidays. But I want to figure out which of my dates are US holidays and don't want to provide a list of Package timeDate does almost what I need: library(timeDate) holidayNYSE(2008:2010) holidayNYSE() However, I don't need all the NYSE holidays (like Good Friday). Just the major US holidays - New Years, MLK, Memorial Day, Independence Day, Labor Day, Halloween, Thanksgiving, Christmas. Is there any way to identify major US holidays? Thanks a lot! - Dimitri Liakhovitski marketfusionanalytics.com From bhh at xs4all.nl Mon Aug 1 21:47:22 2011 From: bhh at xs4all.nl (Berend Hasselman) Date: Mon, 1 Aug 2011 12:47:22 -0700 (PDT) Subject: [R] error in self-made function - cannot deal with objects of length = 1 In-Reply-To: <1312226062228-3710555.post@n4.nabble.com> References: <1312226062228-3710555.post@n4.nabble.com> Message-ID: <1312228042889-3710621.post@n4.nabble.com> bjmjarrett wrote: > > ... > rate <- function(x){ > storage <- matrix(nrow=length(x),ncol=1) > ifelse(length(x)==1,storage[1,] <- NA,{ > storage[1,] <- x[1]/max(x) > for(i in 2:length(x)){ > p <- i-1 > storage[i,] <- ((x[i] - x[p]) / max(x)) > } > }) > return(storage) > } > > but I end up with this error when I try and use the above function in > tapply(): > > Error in ans[!test & !nas] <- rep(no, length.out = length(ans))[!test & : > replacement has length zero > > ifelse is for vector arguments. You should use if(....) {.......} else {.....} But why not just c(x[1], diff(x))/max(x) Berend -- View this message in context: http://r.789695.n4.nabble.com/error-in-self-made-function-cannot-deal-with-objects-of-length-1-tp3710555p3710621.html Sent from the R help mailing list archive at Nabble.com. From diggsb at ohsu.edu Mon Aug 1 21:48:16 2011 From: diggsb at ohsu.edu (Brian Diggs) Date: Mon, 1 Aug 2011 12:48:16 -0700 Subject: [R] example package for devel newcomers In-Reply-To: <201107312224.31828@spsconsultoria.com> References: <201107311805.43392@spsconsultoria.com> <7F1462F7-6146-4781-9F40-FCA6EF2FC304@comcast.net> <201107312224.31828@spsconsultoria.com> Message-ID: <4E370300.2040006@ohsu.edu> On 7/31/2011 6:24 PM, Alexandre Aguiar wrote: > Em Domingo 31 Julho 2011, voc? escreveu: >> My memory is that this question gets asked every few months and one of >> the stock answers is to use the function 'package.skeleton' in the >> utils package as a starting point. > > Got that from docs. And actually I already have most of the code written. > My question addresses known tricks and impressions by experienced R > interface programmers. This kind of stuff can be really useful. For > instance, tricks are much better than docs when embedding php. > > Thanx. Hadley Wickham is working on this sort of thing. I know he has given a master class on package development. Some things related to that are on the wiki associated with his devtools package: https://github.com/hadley/devtools/wiki -- Brian S. Diggs, PhD Senior Research Associate, Department of Surgery Oregon Health & Science University From Sushil.Amirisetty at cchmc.org Mon Aug 1 20:55:19 2011 From: Sushil.Amirisetty at cchmc.org (Sushil Amirisetty) Date: Mon, 01 Aug 2011 14:55:19 -0400 Subject: [R] Error while trying to install a package Message-ID: <4E36BE570200009B000C2578@n6mcgw16.cchmc.org> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From math at hush.com Mon Aug 1 21:05:44 2011 From: math at hush.com (monk) Date: Mon, 1 Aug 2011 12:05:44 -0700 (PDT) Subject: [R] fill Matrix quicker In-Reply-To: <4E36F518.2050109@pburns.seanet.com> References: <1312222376367-3710428.post@n4.nabble.com> <4E36F518.2050109@pburns.seanet.com> Message-ID: <1312225544537-3710533.post@n4.nabble.com> thanks a lot , that will do the trick -- View this message in context: http://r.789695.n4.nabble.com/fill-Matrix-quicker-tp3710428p3710533.html Sent from the R help mailing list archive at Nabble.com. From bjmjarrett at gmail.com Mon Aug 1 21:14:22 2011 From: bjmjarrett at gmail.com (bjmjarrett) Date: Mon, 1 Aug 2011 12:14:22 -0700 (PDT) Subject: [R] error in self-made function - cannot deal with objects of length = 1 Message-ID: <1312226062228-3710555.post@n4.nabble.com> I have a function to calculate the rate of increase (the difference between the value and the previous value divided by the total number of eggs in a year) of egg production over the course of a year: rate <- function(x){ storage <- matrix(nrow=length(x),ncol=1) storage[1,] <- x[1] / max(x) # as there is no previous value for( i in 2:length(x)){ p <- i - 1 storage[i,] <- ((x[i] - x[p] / max(x)) } return(storage) } However, as it requires the subtraction of one term with the previous term it fails when dealing with objects with length = 1 (when only one reading has been taken in a year). I have tried adding an ifelse() function into `rate' with NA added for length 1: rate <- function(x){ storage <- matrix(nrow=length(x),ncol=1) ifelse(length(x)==1,storage[1,] <- NA,{ storage[1,] <- x[1]/max(x) for(i in 2:length(x)){ p <- i-1 storage[i,] <- ((x[i] - x[p]) / max(x)) } }) return(storage) } but I end up with this error when I try and use the above function in tapply(): Error in ans[!test & !nas] <- rep(no, length.out = length(ans))[!test & : replacement has length zero Thanks in advance, Ben -- View this message in context: http://r.789695.n4.nabble.com/error-in-self-made-function-cannot-deal-with-objects-of-length-1-tp3710555p3710555.html Sent from the R help mailing list archive at Nabble.com. From rwp7h at virginia.edu Mon Aug 1 21:56:51 2011 From: rwp7h at virginia.edu (Robert Pfister) Date: Mon, 1 Aug 2011 15:56:51 -0400 Subject: [R] 5 arguments passed to .Internal(matrix) which requires 7 In-Reply-To: References: Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From michael.weylandt at gmail.com Mon Aug 1 22:07:44 2011 From: michael.weylandt at gmail.com (R. Michael Weylandt ) Date: Mon, 1 Aug 2011 16:07:44 -0400 Subject: [R] error in self-made function - cannot deal with objects of length = 1 In-Reply-To: <1312226062228-3710555.post@n4.nabble.com> References: <1312226062228-3710555.post@n4.nabble.com> Message-ID: <52450B56-562F-446F-A4B1-69D6CB667746@gmail.com> Just jumping into this, but does the ROC(x, type="discrete") function of either the TTR or caTools (can't remember which) work if you need a prebuilt function? Also, why are you dividing by the max value? That seems a funny way to calculate ROC... On Aug 1, 2011, at 3:14 PM, bjmjarrett wrote: > I have a function to calculate the rate of increase (the difference between > the value and the previous value divided by the total number of eggs in a > year) of egg production over the course of a year: > > rate <- function(x){ > storage <- matrix(nrow=length(x),ncol=1) > storage[1,] <- x[1] / max(x) # as there is no previous value > for( i in 2:length(x)){ > p <- i - 1 > storage[i,] <- ((x[i] - x[p] / max(x)) > } > return(storage) > } > > However, as it requires the subtraction of one term with the previous term > it fails when dealing with objects with length = 1 (when only one reading > has been taken in a year). I have tried adding an ifelse() function into > `rate' with NA added for length 1: > > rate <- function(x){ > storage <- matrix(nrow=length(x),ncol=1) > ifelse(length(x)==1,storage[1,] <- NA,{ > storage[1,] <- x[1]/max(x) > for(i in 2:length(x)){ > p <- i-1 > storage[i,] <- ((x[i] - x[p]) / max(x)) > } > }) > return(storage) > } > > but I end up with this error when I try and use the above function in > tapply(): > > Error in ans[!test & !nas] <- rep(no, length.out = length(ans))[!test & : > replacement has length zero > > Thanks in advance, > > Ben > > -- > View this message in context: http://r.789695.n4.nabble.com/error-in-self-made-function-cannot-deal-with-objects-of-length-1-tp3710555p3710555.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From bjmjarrett at gmail.com Mon Aug 1 22:00:33 2011 From: bjmjarrett at gmail.com (bjmjarrett) Date: Mon, 1 Aug 2011 13:00:33 -0700 (PDT) Subject: [R] error in self-made function - cannot deal with objects of length = 1 In-Reply-To: <1312228042889-3710621.post@n4.nabble.com> References: <1312226062228-3710555.post@n4.nabble.com> <1312228042889-3710621.post@n4.nabble.com> Message-ID: <1312228833581-3710646.post@n4.nabble.com> > But why not just > c(x[1], diff(x))/max(x) So simple! Thank you ever so much Berend. Best wishes, Ben -- View this message in context: http://r.789695.n4.nabble.com/error-in-self-made-function-cannot-deal-with-objects-of-length-1-tp3710555p3710646.html Sent from the R help mailing list archive at Nabble.com. From michael.weylandt at gmail.com Mon Aug 1 22:12:47 2011 From: michael.weylandt at gmail.com (R. Michael Weylandt ) Date: Mon, 1 Aug 2011 16:12:47 -0400 Subject: [R] error in self-made function - cannot deal with objects of length = 1 In-Reply-To: <52450B56-562F-446F-A4B1-69D6CB667746@gmail.com> References: <1312226062228-3710555.post@n4.nabble.com> <52450B56-562F-446F-A4B1-69D6CB667746@gmail.com> Message-ID: But if you do mean to divide by max(x), I'll also vote for the prior ROI <- function(x) { if (length(x)==1) return(NA) r=c(x[1], diff(x))/max(x) return(r)} As being about as quick and elegant as this can be done in R. M On Aug 1, 2011, at 4:07 PM, "R. Michael Weylandt " wrote: > Just jumping into this, but does the ROC(x, type="discrete") function of either the TTR or caTools (can't remember which) work if you need a prebuilt function? > > Also, why are you dividing by the max value? That seems a funny way to calculate ROC... > > On Aug 1, 2011, at 3:14 PM, bjmjarrett wrote: > >> I have a function to calculate the rate of increase (the difference between >> the value and the previous value divided by the total number of eggs in a >> year) of egg production over the course of a year: >> >> rate <- function(x){ >> storage <- matrix(nrow=length(x),ncol=1) >> storage[1,] <- x[1] / max(x) # as there is no previous value >> for( i in 2:length(x)){ >> p <- i - 1 >> storage[i,] <- ((x[i] - x[p] / max(x)) >> } >> return(storage) >> } >> >> However, as it requires the subtraction of one term with the previous term >> it fails when dealing with objects with length = 1 (when only one reading >> has been taken in a year). I have tried adding an ifelse() function into >> `rate' with NA added for length 1: >> >> rate <- function(x){ >> storage <- matrix(nrow=length(x),ncol=1) >> ifelse(length(x)==1,storage[1,] <- NA,{ >> storage[1,] <- x[1]/max(x) >> for(i in 2:length(x)){ >> p <- i-1 >> storage[i,] <- ((x[i] - x[p]) / max(x)) >> } >> }) >> return(storage) >> } >> >> but I end up with this error when I try and use the above function in >> tapply(): >> >> Error in ans[!test & !nas] <- rep(no, length.out = length(ans))[!test & : >> replacement has length zero >> >> Thanks in advance, >> >> Ben >> >> -- >> View this message in context: http://r.789695.n4.nabble.com/error-in-self-made-function-cannot-deal-with-objects-of-length-1-tp3710555p3710555.html >> Sent from the R help mailing list archive at Nabble.com. >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. From dimitri.liakhovitski at gmail.com Mon Aug 1 22:18:50 2011 From: dimitri.liakhovitski at gmail.com (Dimitri Liakhovitski) Date: Mon, 1 Aug 2011 16:18:50 -0400 Subject: [R] Identifying US holidays In-Reply-To: References: Message-ID: Just to clarify - I realize that "major" is subjective here. Maybe I should say "most common". But maybe there is a way for me to select from a list of all NYSE holidays and flag only some of them? Just not sure how to do it... Thanks! Dimitri On Mon, Aug 1, 2011 at 3:45 PM, Dimitri Liakhovitski wrote: > Hello! > > I am trying to identify which ones of a vector of dates are US > holidays. And, ideally, which is which. And I do not know (a-priori) > which dates those should be. > I have, for example: > ?x<-seq(as.Date("2011-01-01"),as.Date("2011-12-31"),by="day") > (x) > > I think chron should help me here - but maybe I am not using it properly: > > library(chron) > is.holiday(chron) # Says that none of those dates are holidays > > ?is.holiday says: "holidays" is an object that should be listing > holidays. But I want to figure out which of my dates are US holidays > and don't want to provide a list of > > Package timeDate does almost what I need: > library(timeDate) > holidayNYSE(2008:2010) > holidayNYSE() > > However, I don't need all the NYSE holidays (like Good Friday). Just > the major US holidays - New Years, MLK, Memorial Day, Independence > Day, Labor Day, Halloween, Thanksgiving, Christmas. > Is there any way to identify major US holidays? > > Thanks a lot! > > - > Dimitri Liakhovitski > marketfusionanalytics.com > -- Dimitri Liakhovitski marketfusionanalytics.com From michael.weylandt at gmail.com Mon Aug 1 22:24:49 2011 From: michael.weylandt at gmail.com (R. Michael Weylandt ) Date: Mon, 1 Aug 2011 16:24:49 -0400 Subject: [R] Identifying US holidays In-Reply-To: References: Message-ID: <381BAD24-EB2C-4DEA-A2B0-D2EB006512E2@gmail.com> Don't know if this is sufficiently slick for this list (which never fails to impress me with quick and elegant solutions) but I would point out to you that GF is the only NYSE holiday falling in March or April so it shouldn't be hard to discard it if desired. Michael Weylandt On Aug 1, 2011, at 4:18 PM, Dimitri Liakhovitski wrote: > Just to clarify - I realize that "major" is subjective here. Maybe I > should say "most common". > But maybe there is a way for me to select from a list of all NYSE > holidays and flag only some of them? > Just not sure how to do it... > Thanks! > Dimitri > > On Mon, Aug 1, 2011 at 3:45 PM, Dimitri Liakhovitski > wrote: >> Hello! >> >> I am trying to identify which ones of a vector of dates are US >> holidays. And, ideally, which is which. And I do not know (a-priori) >> which dates those should be. >> I have, for example: >> x<-seq(as.Date("2011-01-01"),as.Date("2011-12-31"),by="day") >> (x) >> >> I think chron should help me here - but maybe I am not using it properly: >> >> library(chron) >> is.holiday(chron) # Says that none of those dates are holidays >> >> ?is.holiday says: "holidays" is an object that should be listing >> holidays. But I want to figure out which of my dates are US holidays >> and don't want to provide a list of >> >> Package timeDate does almost what I need: >> library(timeDate) >> holidayNYSE(2008:2010) >> holidayNYSE() >> >> However, I don't need all the NYSE holidays (like Good Friday). Just >> the major US holidays - New Years, MLK, Memorial Day, Independence >> Day, Labor Day, Halloween, Thanksgiving, Christmas. >> Is there any way to identify major US holidays? >> >> Thanks a lot! >> >> - >> Dimitri Liakhovitski >> marketfusionanalytics.com >> > > > > -- > Dimitri Liakhovitski > marketfusionanalytics.com > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From matt.curcio.ri at gmail.com Mon Aug 1 22:47:26 2011 From: matt.curcio.ri at gmail.com (Matt Curcio) Date: Mon, 1 Aug 2011 16:47:26 -0400 Subject: [R] Errors, driving me nuts Message-ID: Greetings all, I am getting this error that is driving me nuts... (not a long trip, haha) I have a set of files and in these files I want to calculate ttests on rows 'compareA' and 'compareB' (these will change over time there I want a variable here). Also these files are in many different directories so I want a way filter out the junk... Anyway I don't believe that this is related to my errors but I mention it none the less. > files_to_test <- list.files (pattern = "kegg.combine") > for (i in 1:length (files_to_test)) { + raw_data <- read.table (files_to_test[i], header=TRUE, sep=" ") + tmpA <- raw_data[,compareA] + tmpB <- raw_data[,compareB] + tt <- t.test (tmpA, tmpB, var.equal=TRUE) + tt_pvalue[i] <- tt$p.value + } Error in tt_pvalue[i] <- tt$p.value : object 'tt_pvalue' not found # I tried setting up a vector... # as.vector(tt_pvalue, mode="any") ### but NO GO > file.name = paste("ttest.results.", compareA, compareB, "") > setwd(save_to) > write.table(tt_pvalue, file=file.name, sep="\t" ) Error in inherits(x, "data.frame") : object 'tt_pvalue' not found # No idea?? What is going wrong?? M Matt Curcio M: 401-316-5358 E: matt.curcio.ri at gmail.com From murdoch.duncan at gmail.com Mon Aug 1 22:48:19 2011 From: murdoch.duncan at gmail.com (Duncan Murdoch) Date: Mon, 01 Aug 2011 16:48:19 -0400 Subject: [R] Plotting question In-Reply-To: References: <1312191869.99830.YahooMailRC@web26507.mail.ukl.yahoo.com> <4E36BD77.1050409@gmail.com> Message-ID: <4E371113.6050501@gmail.com> On 11-08-01 11:48 AM, Bert Gunter wrote: > IMHO: > > On Mon, Aug 1, 2011 at 7:51 AM, Duncan Murdoch wrote: >> On 11-08-01 5:44 AM, Andrew McCulloch wrote: >>> >>> Hi, >>> >>> I use R to draw my graphs. I have 100 points on a simple xy-plot. The >>> points are >>> distinguished by a third variable which is categorical with 10 levels. I >>> have >>> been plotting x against y and using gray scales to distinguish the level >>> of the >>> categorical variable for each point. It looks ok to me but a journal >>> reviewer >>> says this is not any use. I cannot afford to pay for colour prints. Any >>> ideas on >>> what is the best way to distinguish 10 groups on an xy scatter plot? >> >> Plot digits or letters or other symbols. >> >> Duncan Murdoch >> > No, this does not work. You have amazing perception to know that it doesn't work in Andrew's graph. But then you go on to suggest that sometimes it does, and then suggest using symbols. Obviously you need to see the graph to know what works. If the 10 categories are ordered, then something like thermometer plots would work. If they are grouped into a small number of variations on a small number of groups, then digits or letters combined with shading might work, especially if the groups are well separated, or there are clear patterns. I'd agree with the reviewer than 10 levels of shading is probably too many to distinguish, and I'd agree with you that digits 0-9 in equal quantities in an unstructured scatterplot are probably not a good presentation, but I wouldn't want to give specific advice about plotting a dataset without seeing it. Duncan Murdoch See Cleveland's books (e.g. "Visualizing > Data"). 10 is too many symbols to constantly refer to a legend to keep > straight, and digits or letters do not allow you to readily perceive > the pattern. (Caveat: If "most" of the data are only 2 or 3 of the > symbols, then these can work). > > I think the OP's idea of using gray scales was better. I would dispute > the reviewer and refer them to appropriate references. Alternatively, > thermometer plots (aka "filled rectangle" plots) would be best. Again, > Cleveland's books provide scientific justification rather than merely > the (possibly uninformed) aesthetic opinion of a reviewer. Presumably, > the journal editor would accept hard data and psychological research > in preference to opinions. > >>> >>> >>> >>> If all else fails I can just remove the graph and give them a table of >>> regression coefficients. > > No. I think your attempt to use a graph is a much better way to go. > Try to resist poor practices such as just publishing summary > statistics. > > Cheers, > Bert >>> >>> >>> Thanks. >>> >>> Yours Sincerely >>> Andrew McCulloch >>> >>> ______________________________________________ >>> R-help at r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide >>> http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > > From dimitri.liakhovitski at gmail.com Mon Aug 1 22:57:56 2011 From: dimitri.liakhovitski at gmail.com (Dimitri Liakhovitski) Date: Mon, 1 Aug 2011 16:57:56 -0400 Subject: [R] Identifying US holidays In-Reply-To: <381BAD24-EB2C-4DEA-A2B0-D2EB006512E2@gmail.com> References: <381BAD24-EB2C-4DEA-A2B0-D2EB006512E2@gmail.com> Message-ID: To be specific, I only need to get rid of 2 NYSE holidays: Washington's Birthday and Good Friday. Is there a way to reduce the vector of NYSE holidays in timeDate by throwing out those two? Thank you! Dimitri On Mon, Aug 1, 2011 at 4:24 PM, R. Michael Weylandt wrote: > Don't know if this is sufficiently slick for this list (which never fails to impress me with quick and elegant solutions) but I would point out to you that GF is the only NYSE holiday falling in March or April so it shouldn't be hard to discard it if desired. > > Michael Weylandt > > On Aug 1, 2011, at 4:18 PM, Dimitri Liakhovitski wrote: > >> Just to clarify - I realize that "major" is subjective here. Maybe I >> should say "most common". >> But maybe there is a way for me to select from a list of all NYSE >> holidays and flag only some of them? >> Just not sure how to do it... >> Thanks! >> Dimitri >> >> On Mon, Aug 1, 2011 at 3:45 PM, Dimitri Liakhovitski >> wrote: >>> Hello! >>> >>> I am trying to identify which ones of a vector of dates are US >>> holidays. And, ideally, which is which. And I do not know (a-priori) >>> which dates those should be. >>> I have, for example: >>> ?x<-seq(as.Date("2011-01-01"),as.Date("2011-12-31"),by="day") >>> (x) >>> >>> I think chron should help me here - but maybe I am not using it properly: >>> >>> library(chron) >>> is.holiday(chron) # Says that none of those dates are holidays >>> >>> ?is.holiday says: "holidays" is an object that should be listing >>> holidays. But I want to figure out which of my dates are US holidays >>> and don't want to provide a list of >>> >>> Package timeDate does almost what I need: >>> library(timeDate) >>> holidayNYSE(2008:2010) >>> holidayNYSE() >>> >>> However, I don't need all the NYSE holidays (like Good Friday). Just >>> the major US holidays - New Years, MLK, Memorial Day, Independence >>> Day, Labor Day, Halloween, Thanksgiving, Christmas. >>> Is there any way to identify major US holidays? >>> >>> Thanks a lot! >>> >>> - >>> Dimitri Liakhovitski >>> marketfusionanalytics.com >>> >> >> >> >> -- >> Dimitri Liakhovitski >> marketfusionanalytics.com >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > -- Dimitri Liakhovitski marketfusionanalytics.com From izahn at psych.rochester.edu Mon Aug 1 22:57:39 2011 From: izahn at psych.rochester.edu (Ista Zahn) Date: Mon, 1 Aug 2011 16:57:39 -0400 Subject: [R] Errors, driving me nuts In-Reply-To: References: Message-ID: Hi Matt, On Mon, Aug 1, 2011 at 4:47 PM, Matt Curcio wrote: > Greetings all, > I am getting this error that is driving me nuts... (not a long trip, haha) > > I have a set of files and in these files I want to calculate ttests on > rows 'compareA' and 'compareB' (these will change over time there I > want a variable here). Also these files are in many different > directories so I want a way filter out the junk... ?Anyway I don't > believe that this is related to my errors but I mention it none the > less. > >> files_to_test <- list.files (pattern = "kegg.combine") >> for (i in 1:length (files_to_test)) { > + ? ?raw_data <- read.table (files_to_test[i], header=TRUE, sep=" ") > + ? ?tmpA <- raw_data[,compareA] > + ? ?tmpB <- raw_data[,compareB] > + ? ?tt <- t.test (tmpA, tmpB, var.equal=TRUE) > + ? ?tt_pvalue[i] <- tt$p.value > + } > Error in tt_pvalue[i] <- tt$p.value : object 'tt_pvalue' not found > # I tried setting up a vector... > # as.vector(tt_pvalue, mode="any") ### but NO GO >> file.name = paste("ttest.results.", compareA, compareB, "") >> setwd(save_to) >> write.table(tt_pvalue, file=file.name, sep="\t" ) > Error in inherits(x, "data.frame") : object 'tt_pvalue' not found > # No idea?? you need to create tt_pvalue before you can assign values to it (i.e., before you start your for-loop). Consider this simpler example: tst[1] <- 1 Error in tst[1] <- 1 : object 'tst' not found tst <- vector() tst[1] <- 1 Best, Ista > > What is going wrong?? > M > > > Matt Curcio > M: 401-316-5358 > E: matt.curcio.ri at gmail.com > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Ista Zahn Graduate student University of Rochester Department of Clinical and Social Psychology http://yourpsyche.org From Marianne.ZEYRINGER at ec.europa.eu Tue Aug 2 00:08:13 2011 From: Marianne.ZEYRINGER at ec.europa.eu (Marianne.ZEYRINGER at ec.europa.eu) Date: Tue, 2 Aug 2011 00:08:13 +0200 Subject: [R] fitting a sinus curve References: <1311843912949-3700833.post@n4.nabble.com> <201107312240.p6VMe6pP009607@mail15.tpg.com.au> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From gunter.berton at gene.com Tue Aug 2 00:11:52 2011 From: gunter.berton at gene.com (Bert Gunter) Date: Mon, 1 Aug 2011 15:11:52 -0700 Subject: [R] Plotting question In-Reply-To: <4E371113.6050501@gmail.com> References: <1312191869.99830.YahooMailRC@web26507.mail.ukl.yahoo.com> <4E36BD77.1050409@gmail.com> <4E371113.6050501@gmail.com> Message-ID: Well stated, Duncan, and I plead guilty, though I did try to weasel out with caveats. Perhaps I may plead down to a lesser sentence or probation by saying that I was offering what I still believe to be appropriate advice for a general strategy for handling this sort of plotting issue; but that as always, one's mileage may vary depending on the specifics. Extra inline comments below. Cheers, Bert And to return to R, note that any of these options is easy to implement in any of at least 3 different graphics frameworks (base, trellis, and ggplot). So Duncan and I can spar over what we'd like to do without being limited by what software **allows** us to do. On Mon, Aug 1, 2011 at 1:48 PM, Duncan Murdoch wrote: > On 11-08-01 11:48 AM, Bert Gunter wrote: >> >> IMHO: >> >> On Mon, Aug 1, 2011 at 7:51 AM, Duncan Murdoch >> ?wrote: >>> >>> On 11-08-01 5:44 AM, Andrew McCulloch wrote: >>>> >>>> Hi, >>>> >>>> I use R to draw my graphs. I have 100 points on a simple xy-plot. The >>>> points are >>>> distinguished by a third variable which is categorical with 10 levels. I >>>> have >>>> been plotting x against y and using gray scales to distinguish the level >>>> of the >>>> categorical variable for each point. It looks ok to me but a journal >>>> reviewer >>>> says this is not any use. I cannot afford to pay for colour prints. Any >>>> ideas on >>>> what is the best way to distinguish 10 groups on an xy scatter plot? >>> >>> Plot digits or letters or other symbols. >>> >>> Duncan Murdoch >>> >> No, this does not work. > > You have amazing perception to know that it doesn't work in Andrew's graph. > ?But then you go on to suggest that sometimes it does, and then suggest > using symbols. > > Obviously you need to see the graph to know what works. ?If the 10 > categories are ordered, then something like thermometer plots would work. > ?If they are grouped into a small number of variations on a small number of > groups, then digits or letters combined with shading might work, especially > if the groups are well separated, or there are clear patterns. > > I'd agree with the reviewer than 10 levels of shading is probably too many > to distinguish, But for ordered categories you may not wish to distinguish so much as give an overall gestalt, for which a gray scale with 10 levels could work quite well. So it depends on the specifics of what's being plotted, no? and I'd agree with you that digits 0-9 in equal quantities > in an unstructured scatterplot are probably not a good presentation, but I > wouldn't want to give specific advice about plotting a dataset without > seeing it. > > Duncan Murdoch > > See Cleveland's books (e.g. "Visualizing >> >> Data"). 10 is too many symbols to constantly refer to a legend to keep >> straight, and digits or letters do not allow you to readily perceive >> the pattern. (Caveat: If "most" of the data are only 2 or 3 of the >> symbols, then these can work). >> >> I think the OP's idea of using gray scales was better. I would dispute >> the reviewer and refer them to appropriate references. Alternatively, >> thermometer plots (aka "filled rectangle" plots) would be best. Again, >> Cleveland's books provide scientific justification rather than merely >> the (possibly uninformed) aesthetic opinion of a reviewer. Presumably, >> the journal editor would accept hard data and psychological research >> in preference to opinions. >> >>>> >>>> >>>> >>>> If all else fails I can just remove the graph and give them a table of >>>> regression coefficients. >> >> No. I think your attempt to use a graph is a much better way to go. >> Try to resist poor practices such as just publishing summary >> statistics. >> >> Cheers, >> Bert >>>> >>>> >>>> Thanks. >>>> >>>> Yours Sincerely >>>> Andrew McCulloch >>>> >>>> ______________________________________________ >>>> R-help at r-project.org mailing list >>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>> PLEASE do read the posting guide >>>> http://www.R-project.org/posting-guide.html >>>> and provide commented, minimal, self-contained, reproducible code. >>> >>> ______________________________________________ >>> R-help at r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide >>> http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >>> >> >> >> > > -- "Men by nature long to get on to the ultimate truths, and will often be impatient with elementary studies or fight shy of them. If it were possible to reach the ultimate truths without the elementary studies usually prefixed to them, these would not be preparatory studies but superfluous diversions." -- Maimonides (1135-1204) Bert Gunter Genentech Nonclinical Biostatistics From jeffrey.m.allard at gmail.com Tue Aug 2 01:01:51 2011 From: jeffrey.m.allard at gmail.com (Jeff Allard) Date: Mon, 1 Aug 2011 19:01:51 -0400 Subject: [R] GLMNET ERROR Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From 1987.zhangxi at gmail.com Tue Aug 2 00:57:23 2011 From: 1987.zhangxi at gmail.com (zoe_zhang) Date: Mon, 1 Aug 2011 15:57:23 -0700 (PDT) Subject: [R] if function problems Message-ID: <1312239443964-3710995.post@n4.nabble.com> Dear All, Sorry to bother I want to write a function in R using if Say I have a dataset x, if x[i]<0, then x[i]=x[i], if x[i]>0, then x[i]=0 for example, x=-3:3, then using the function, x becomes [-3,-2,-1,0,0,0,0] I write the codes as follows, gjr=function(x) {lena=length(x) for(i in 1:lenx) if (x[i]<0) return (x[i]) if (x[i]>0) return (0) x} but then, doing gjr(x? it only comes out with one number Does anyone have any suggestions? I appreciate a lot! Sincerely, Zoe -- View this message in context: http://r.789695.n4.nabble.com/if-function-problems-tp3710995p3710995.html Sent from the R help mailing list archive at Nabble.com. From pdalgd at gmail.com Tue Aug 2 01:10:59 2011 From: pdalgd at gmail.com (peter dalgaard) Date: Tue, 2 Aug 2011 01:10:59 +0200 Subject: [R] fill Matrix quicker In-Reply-To: <1312222376367-3710428.post@n4.nabble.com> References: <1312222376367-3710428.post@n4.nabble.com> Message-ID: <305C67A1-3B76-4287-A203-C2D0AD313F39@gmail.com> On Aug 1, 2011, at 20:12 , monk wrote: > dear all, > > i have a quite simple question, i want to fill up a Matrix like done in the > following function, > but the performance is very bad for large dimensions > is there a way to do this like with apply or something similar? > > > makeMatrix <- function(a, b,dim) { > X=matrix(0,ncol=dim,nrow=dim) > > > > for (i in c(1:dim)){ > for (j in c(1:dim)) { > if (i==j) {X[i,j]<-a} > else { X[i,j]<- exp(( -1*abs(i-j))/(3*b)) } > } > } > X > } > I'd go for something like X <- outer(1:dim, 1:dim, function(i,j) exp(-abs(i-j)/3/b)) diag(X) <- a > -- > View this message in context: http://r.789695.n4.nabble.com/fill-Matrix-quicker-tp3710428p3710428.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com "D?den skal tape!" --- Nordahl Grieg From dwinsemius at comcast.net Tue Aug 2 01:13:15 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Mon, 1 Aug 2011 19:13:15 -0400 Subject: [R] if function problems In-Reply-To: <1312239443964-3710995.post@n4.nabble.com> References: <1312239443964-3710995.post@n4.nabble.com> Message-ID: <51460335-851A-468E-8D7C-A11889F7543B@comcast.net> On Aug 1, 2011, at 6:57 PM, zoe_zhang wrote: > Dear All, > Sorry to bother > I want to write a function in R using if > Say I have a dataset x, > if x[i]<0, then x[i]=x[i], > if x[i]>0, then x[i]=0 > > for example, x=-3:3, > then using the function, x becomes [-3,-2,-1,0,0,0,0] Just use logical indexing x[ x>0 ] <- 0 > > I write the codes as follows, > > gjr=function(x) > {lena=length(x) > for(i in 1:lenx) > if (x[i]<0) return (x[i]) > if (x[i]>0) return (0) > x} > > but then, doing > gjr(x? > it only comes out with one number 'if' is not the right function. Look at ?"if" ?ifelse (But the logical indexing is easier in this case than using ifelse.) -- David Winsemius, MD West Hartford, CT From Achim.Zeileis at uibk.ac.at Tue Aug 2 01:14:35 2011 From: Achim.Zeileis at uibk.ac.at (Achim Zeileis) Date: Tue, 2 Aug 2011 01:14:35 +0200 (CEST) Subject: [R] zero truncated poisson regression In-Reply-To: <1312187842.60769.YahooMailNeo@web120601.mail.ne1.yahoo.com> References: <1312131597.14558.YahooMailNeo@web120616.mail.ne1.yahoo.com> <1312143730.59151.YahooMailNeo@web120607.mail.ne1.yahoo.com> <1312187842.60769.YahooMailNeo@web120601.mail.ne1.yahoo.com> Message-ID: On Mon, 1 Aug 2011, Iasonas Lamprianou wrote: > Thank you, it works! And the estimates (as well as the standard errors) > seem to be more reasonable now, compared to the "normal" Poisson model. > Thank you. However, I tried to find a manual? (although I did manage to > find the paper published in the Journal of Statistical Software. For > example, how does ?zerotrunc return? The package still needs some proper documentation and some cleaning up which is why it is not yet on CRAN. The code, however, is well tested because it is exactly the code underlying the hurdle() function in "pscl". Hence, the "zerotrunc" objects are rather similar in many respect to "hurdle" objects. Starting from the JSS paper you should hopefully be able to find your way. Z > ? > Dr. Iasonas Lamprianou > Department of Social and Political Sciences > University of Cyprus > > >> ________________________________ >> From: Achim Zeileis >> To: Iasonas Lamprianou >> Cc: Mitchell Maltenfort ; "r-help at r-project.org" >> Sent: Monday, 1 August 2011, 10:10 >> Subject: Re: [R] zero truncated poisson regression >> >> On Sun, 31 Jul 2011, Iasonas Lamprianou wrote: >> >>> Thanks >>> Pscl seems to be a sensible option. >>> >>> >>> I have the counts variable with the name "N". This variable can only >>> take values bigger than zero! >>> >>> I have two explanatory variables with the names "type" and "diam" >>> >>> but when I run >>> >>> hpm <- hurdle(n ~ type+diam, data = an, dist = "poisson") >>> >>> I get the message "invalid dependent variable, minimum count is not >>> zero". Well, I know that N>0, that is why want to run a zero-truncated >>> model. But I must be missing something...and the manual does not seem to >>> help a lot... >>> >>> Can anyone help please? >> >> As previously pointed out by others on this list: hurdle() is not what you >> are looking for (although it is related to what you want to do). The >> hurdle() model is a two-part model consisting of a zero-truncated count >> part and a binary part for modeling N=0 vs N>0. See also >> vignette("countreg", package = "pscl") for details. >> >> As you don't need the binary hurdle part, you cannot use hurdle() >> directly. >> >> This is why the package "countreg" on R-Forge provides the function >> zerotrunc() which essentially does the same thing as the count part in >> hurdle(). >> >> install.packages("countreg", repos = "http://R-Forge.R-project.org") >> library("countreg") >> m <- zerotrunc(n ~ type + diam, data = an, dist = "poisson") >> summary(m) >> >>> ? >>> Dr. Iasonas Lamprianou >>> Department of Social and Political Sciences >>> University of Cyprus >>> >>> >>>> ________________________________ >>>> From: Mitchell Maltenfort >>>> To: Iasonas Lamprianou ; "r-help at r-project.org" >>>> Sent: Sunday, 31 July 2011, 20:45 >>>> Subject: Re: [R] zero truncated poisson regression >>>> >>>> Pscl package. >>>> >>>> On 7/31/11, Iasonas Lamprianou wrote: >>>>> Dear friends, >>>>> >>>>> does anyone know how I can run a zero truncated poisson regression using R >>>>> (or even SPSS)? >>>>> >>>>> Dr. Iasonas Lamprianou >>>>> Department of Social and Political Sciences >>>>> University of Cyprus >>>>> >>>>> ??? [[alternative HTML version deleted]] >>>>> >>>>> >>>> >>>> -- >>>> Sent from my mobile device >>>> >>>> Due to the recession, requests for instant gratification will be >>>> deferred until arrears in scheduled gratification have been satisfied. >>>> >>>> >>>> >>> ??? [[alternative HTML version deleted]] >>> >>> >> >> >> > [[alternative HTML version deleted]] > > From mailinglist.honeypot at gmail.com Tue Aug 2 02:28:45 2011 From: mailinglist.honeypot at gmail.com (Steve Lianoglou) Date: Mon, 1 Aug 2011 20:28:45 -0400 Subject: [R] if function problems In-Reply-To: <1312239443964-3710995.post@n4.nabble.com> References: <1312239443964-3710995.post@n4.nabble.com> Message-ID: In addition to what David said: On Mon, Aug 1, 2011 at 6:57 PM, zoe_zhang <1987.zhangxi at gmail.com> wrote: > Dear All, > Sorry to bother > I want to write a function in R using if > Say I have a dataset x, > if x[i]<0, then x[i]=x[i], > if x[i]>0, then x[i]=0 > > for example, x=-3:3, > then using the function, x becomes [-3,-2,-1,0,0,0,0] > > I write the codes as follows, > > gjr=function(x) > {lena=length(x) > for(i in 1:lenx) > if (x[i]<0) return (x[i]) > if (x[i]>0) return (0) > x} > > but then, doing > gjr(x? > it only comes out with one number > > Does anyone have any suggestions? You define `lena`, but then use `lenx` in `for (i in 1:lenx)` in your function ... I guess this might have something to do with it. You shouldn't use a for loop, though, and just follow david's advice by using logical indexing, or the `ifelse` function, ie: R> ifelse(x < 0, x, 0) HTH, -steve -- Steve Lianoglou Graduate Student: Computational Systems Biology ?| Memorial Sloan-Kettering Cancer Center ?| Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact From paulepanter at users.sourceforge.net Tue Aug 2 02:28:49 2011 From: paulepanter at users.sourceforge.net (Paul Menzel) Date: Tue, 02 Aug 2011 02:28:49 +0200 Subject: [R] Is R the right choice for simulating first passage times of random walks? In-Reply-To: <1F1BFA51-F26B-4E7E-BC69-5D7EA02E9D5F@gmail.com> References: <1311809771.29519.276.camel@mattotaupa> <1312155391.3609.135.camel@mattotaupa> <1312202982.4034.184.camel@mattotaupa> <1F1BFA51-F26B-4E7E-BC69-5D7EA02E9D5F@gmail.com> Message-ID: <1312244929.3640.46.camel@mattotaupa> Am Montag, den 01.08.2011, 12:43 -0400 schrieb R. Michael Weylandt : > I've only got a 20 minute layover, but three quick remarks: > > 1) Do a sanity check on your data size: if you want a million walks of > a thousand steps, that already gets you to a billion integers to > store--even at a very low bound of one byte each, thats already 1GB > for the data and you still have to process it all and run the OS. If > you bump this to walks of length 10k, you are in big trouble. > > Considered like that, it shouldn't surprise you that you are getting > near memory limits. > > If you really do need such a large simulation and are willing to make > the time/space tradeoff, it may be worth doing simulations in smaller > batches (say 50-100) and aggregating the needed stats for analysis. I already did that, saved the result and ran it again. I also found [1] and will look to do these things in parallel, since the simulations do not depend on each other. I hope I can avoid the matrix then and use just `replicate()`. > Also, consider direct use of the rm() function for memory management. I was looking for such a feature the last days and read to set the variables to `NULL` to delete them somewhere. Now I found the correct command. Thank you! > 2) If you know that which.max()==1 can't happen for your data, might > this trick be easier than forcing it through some tricky logic inside > the which.max() > > X=which.max(...) > if(X[1]==1) X=Inf # or whatever value Noted for when I need that again. > 3) I dont have any texts at hand to confirm this but isn't the > expected value of the first hit time of a RW infinite? That is indeed correct. The generating function of the first hitting time of zero T? is (g_T?)(s) ? 1/s (1 - \sqrt(1 - s). Therefore (g_T?)?(s) ? 1/s? (1 - \sqrt(1 - s) + 1/s (2(1 - s))^(-?) ? ? for s ? 1. > I think a handwaving proof can be squeezed out of the optional > stopping theorem with T=min(T_a,T_b) for a<0 -Inf. Apparently there are several ways to prove that. > If I remember right, this suggests you are trying to calculate a CI > for a distribution with no finite moments, a difficult task to say the > least. I do not know. It scares me. ;-) For the symmetric simple random walk S_n ? ?_{i=0}^n X_i I want to verify (1). (1) n^(-?) ~ p_n ? P(max_{1 ? k ? n} S_n < 0) a(x) ~ b(x) means that the quotient converges to 1 for x ? ?. [?] > PS - what's an iterated RW? This is all outside my field (hence my > spitball on #2 above) I am sorry, I meant *integrated* random walk [3][4]. Basically that is the integral (?area? ? can be negative). A_n ? ?_{i=0}^n S_i = ?_{i=0}^n (n - i + 1) X_i Being 0, S? and X? can always be omitted. So I basically just need to do one `cumsum()` more over the walks. > PS2 - sorry about the row/column mix-up: I usually think of sample > paths as rows... No problem at all. I already was confused that it looked differently (transposed) after the first `apply()`. But it made sense. Thanks, Paul [1] http://www.bioconductor.org/help/course-materials/2010/BioC2010/EfficientRProgramming.pdf [2] http://www.steinsaltz.me.uk/probA/ProbALN13.pdf [3] http://www-stat.stanford.edu/~amir/preprints/irw.ps [4] http://arxiv.org/abs/0911.5456 -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part URL: From indra_calisto at yahoo.com Tue Aug 2 03:31:20 2011 From: indra_calisto at yahoo.com (Indrajit Sengupta) Date: Mon, 1 Aug 2011 18:31:20 -0700 (PDT) Subject: [R] Plotting question In-Reply-To: <1312191869.99830.YahooMailRC@web26507.mail.ukl.yahoo.com> References: <1312191869.99830.YahooMailRC@web26507.mail.ukl.yahoo.com> Message-ID: <1312248680.96299.YahooMailNeo@web65406.mail.ac4.yahoo.com> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From zhenjiang.xu at gmail.com Tue Aug 2 05:14:37 2011 From: zhenjiang.xu at gmail.com (zhenjiang xu) Date: Mon, 1 Aug 2011 23:14:37 -0400 Subject: [R] how to control to save plots to which dev Message-ID: Hi, I have a for loop to make 2 types of plots and I'd like to save one type of plots to a pdf file and the other to another pdf file. How can I control which plot will be saved to which pdf? Thanks -- Best, Zhenjiang From jdnewmil at dcn.davis.ca.us Tue Aug 2 05:44:18 2011 From: jdnewmil at dcn.davis.ca.us (Jeff Newmiller) Date: Mon, 01 Aug 2011 20:44:18 -0700 Subject: [R] how to control to save plots to which dev In-Reply-To: References: Message-ID: <7cf3abd9-b30e-468e-8334-1610c402836e@email.android.com> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From 1987.zhangxi at gmail.com Tue Aug 2 01:46:26 2011 From: 1987.zhangxi at gmail.com (zoe_zhang) Date: Mon, 1 Aug 2011 16:46:26 -0700 (PDT) Subject: [R] if function problems In-Reply-To: <51460335-851A-468E-8D7C-A11889F7543B@comcast.net> References: <1312239443964-3710995.post@n4.nabble.com> <51460335-851A-468E-8D7C-A11889F7543B@comcast.net> Message-ID: <1312242386463-3711062.post@n4.nabble.com> David, I'm so appreciate! Sincerely, Zoe -- View this message in context: http://r.789695.n4.nabble.com/if-function-problems-tp3710995p3711062.html Sent from the R help mailing list archive at Nabble.com. From prodriguez at sdsc.edu Tue Aug 2 01:30:14 2011 From: prodriguez at sdsc.edu (Paul Rodriguez) Date: Mon, 1 Aug 2011 16:30:14 -0700 Subject: [R] going past restrictions on number of elements Message-ID: Hello R experts, I'm trying to test R in a shared memory environment in which addressable memory is aggregrated to about 600G. However, I get an error of 'too many elements' specified when I try creating a 45K x 100K matrix. I tried running R with a --max-nsize=50000000000 option, but got the same message. Is there a way to run create such large matrices? thanks, Paul Rodriguez From dwinsemius at comcast.net Tue Aug 2 06:44:26 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Tue, 2 Aug 2011 00:44:26 -0400 Subject: [R] how to control to save plots to which dev In-Reply-To: References: Message-ID: <51F0B218-CEC5-4621-A3FC-497AEB90E47F@comcast.net> On Aug 1, 2011, at 11:14 PM, zhenjiang xu wrote: > Hi, > > I have a for loop to make 2 types of plots and I'd like to save one > type of plots to a pdf file and the other to another pdf file. How can > I control which plot will be saved to which pdf? Thanks Why not give them file names that identify the type? -- David Winsemius, MD West Hartford, CT From ripley at stats.ox.ac.uk Tue Aug 2 07:25:33 2011 From: ripley at stats.ox.ac.uk (Prof Brian Ripley) Date: Tue, 2 Aug 2011 06:25:33 +0100 (BST) Subject: [R] going past restrictions on number of elements In-Reply-To: References: Message-ID: On Mon, 1 Aug 2011, Paul Rodriguez wrote: > Hello R experts, > > I'm trying to test R in a shared memory environment in which addressable memory is aggregrated to about 600G. > > However, I get an error of 'too many elements' specified when I try creating a 45K x 100K matrix. > > I tried running R with a --max-nsize=50000000000 option, but got the same message. > > Is there a way to run create such large matrices? No. See ?"Memory-limits" (a matrix in R is also a vector). NB: setting a limit on Ncells (there normally is not one) isn't going to help allocation of vectors, is it? > > thanks, > Paul Rodriguez -- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 From ripley at stats.ox.ac.uk Tue Aug 2 07:28:22 2011 From: ripley at stats.ox.ac.uk (Prof Brian Ripley) Date: Tue, 2 Aug 2011 06:28:22 +0100 (BST) Subject: [R] how to control to save plots to which dev In-Reply-To: <51F0B218-CEC5-4621-A3FC-497AEB90E47F@comcast.net> References: <51F0B218-CEC5-4621-A3FC-497AEB90E47F@comcast.net> Message-ID: On Tue, 2 Aug 2011, David Winsemius wrote: > > On Aug 1, 2011, at 11:14 PM, zhenjiang xu wrote: > >> Hi, >> >> I have a for loop to make 2 types of plots and I'd like to save one >> type of plots to a pdf file and the other to another pdf file. How can >> I control which plot will be saved to which pdf? Thanks > > Why not give them file names that identify the type? I think he wants pdf("a.pdf") pdf("b.pdf") for(i in 1:n) { plot something on a.pdf plot something on b.pdf } This is done using dev.prev/dev.next/dev.set: see their help for details. > > -- > > David Winsemius, MD > West Hartford, CT > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 From rolf.turner at xtra.co.nz Tue Aug 2 08:02:47 2011 From: rolf.turner at xtra.co.nz (Rolf Turner) Date: Tue, 02 Aug 2011 18:02:47 +1200 Subject: [R] Inverse of FAQ 7.31. Message-ID: <4E379307.2020208@xtra.co.nz> Why does R think these numbers ***are*** equal? In a somewhat bizarre set of circumstances I calculated x0 <- 0.03580067 x1 <- 0.03474075 y0 <- 0.4918823 y1 <- 0.4474461 dx <- x1 - x0 dy <- y1 - y0 xx <- (x0 + x1)/2 yy <- (y0 + y1)/2 chk <- yy*dx - xx*dy + x0*dy - y0*dx If you think about it ***very*** carefully ( :-) ) you'll see that ``chk'' ought to be zero. Blow me down, R gets 0. Exactly. To as many significant digits/decimal places as I can get it to print out. But .... I wrote a wee function in C to do the *same* calculation and dyn.load()-ed it and called it with .C(). And I got -1.248844e-19. This is of course zero, to all floating point arithmetic intents and purposes. But if I name the result returned by my call to .C() ``xxx'' and ask xxx >= 0 I get FALSE whereas ``chk >= 0'' returns TRUE (as does ``chk <= 0'', of course). (And inside my C function, the comparison ``xxx >= 0'' yields ``false'' as well.) I was vaguely thinking that raw R arithmetic would be equivalent to C arithmetic. (Isn't R written in C?) Can someone explain to me how it is that R (magically) gets it exactly right, whereas a call to .C() gives the sort of ``approximately right'' answer that one might usually expect? I know that R Core is ***good*** but even they can't make C do infinite precision arithmetic. :-) This is really just idle curiosity --- I realize that this phenomenon is one that I'll simply have to live with. But if I can get some deeper insight as to why it occurs, well, that would be nice. cheers, Rolf Turner From C.Salgado at cmst.curtin.edu.au Tue Aug 2 09:12:11 2011 From: C.Salgado at cmst.curtin.edu.au (Chandra Salgado Kent) Date: Tue, 2 Aug 2011 15:12:11 +0800 Subject: [R] Loops to assign a unique ID to a column References: <4E379307.2020208@xtra.co.nz> Message-ID: <7F7C4ABC5772344991D04CD652ED180BBE65EE@EXMS2.staff.ad.curtin.edu.au> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From Thierry.ONKELINX at inbo.be Tue Aug 2 09:57:50 2011 From: Thierry.ONKELINX at inbo.be (ONKELINX, Thierry) Date: Tue, 2 Aug 2011 07:57:50 +0000 Subject: [R] Loops to assign a unique ID to a column In-Reply-To: <7F7C4ABC5772344991D04CD652ED180BBE65EE@EXMS2.staff.ad.curtin.edu.au> References: <4E379307.2020208@xtra.co.nz> <7F7C4ABC5772344991D04CD652ED180BBE65EE@EXMS2.staff.ad.curtin.edu.au> Message-ID: Dear Chandra, You're on the wrong track. You don't need for loops as you can do this vectorised. as.numeric(interaction(data$Groups, data$Dates, drop = TRUE)) Best regards, Thierry > -----Oorspronkelijk bericht----- > Van: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] > Namens Chandra Salgado Kent > Verzonden: dinsdag 2 augustus 2011 9:12 > Aan: r-help at r-project.org > Onderwerp: [R] Loops to assign a unique ID to a column > > Dear R help, > > > > I am fairly new in data management and programming in R, and am trying to > write what is probably a simple loop, but am not having any luck. I have a > dataframe with something like the following (but much bigger): > > > > Dates<-c("12/10/2010","12/10/2010","12/10/2010","13/10/2010", > "13/10/2010", "13/10/2010") > > Groups<-c("A","B","B","A","B","C") > > data<-data.frame(Dates, Groups) > > > > I would like to create a new column in the dataframe, and give each distinct > date by group a unique identifying number starting with 1, so that the resulting > column would look something like: > > > > ID<-c(1,2,2,3,4,5) > > > > The loop that I have started to write is something like this (but doesn't work!): > > > > data$ID<-as.number(c()) > > for(i in unique(data$Dates)){ > > for(j in unique(data$Groups)){ data$ID[i,j]<-i > > i<-i+1 > > } > > } > > > > Am I on the right track? > > > > Any help on this is much appreciated! > > > > Chandra > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From pdalgd at gmail.com Tue Aug 2 10:16:40 2011 From: pdalgd at gmail.com (peter dalgaard) Date: Tue, 2 Aug 2011 10:16:40 +0200 Subject: [R] Inverse of FAQ 7.31. In-Reply-To: <4E379307.2020208@xtra.co.nz> References: <4E379307.2020208@xtra.co.nz> Message-ID: <9FD03D32-FA9C-496A-BC25-AFC76389EA55@gmail.com> On Aug 2, 2011, at 08:02 , Rolf Turner wrote: > > > Why does R think these numbers ***are*** equal? > > In a somewhat bizarre set of circumstances I calculated > > x0 <- 0.03580067 > x1 <- 0.03474075 > y0 <- 0.4918823 > y1 <- 0.4474461 > dx <- x1 - x0 > dy <- y1 - y0 > xx <- (x0 + x1)/2 > yy <- (y0 + y1)/2 > chk <- yy*dx - xx*dy + x0*dy - y0*dx > > If you think about it ***very*** carefully ( :-) ) you'll see that ``chk'' ought to be zero. > > Blow me down, R gets 0. Exactly. To as many significant digits/decimal places > as I can get it to print out. > > But .... I wrote a wee function in C to do the *same* calculation and dyn.load()-ed > it and called it with .C(). And I got -1.248844e-19. > > This is of course zero, to all floating point arithmetic intents and purposes. But if > I name the result returned by my call to .C() ``xxx'' and ask > > xxx >= 0 > > I get FALSE whereas ``chk >= 0'' returns TRUE (as does ``chk <= 0'', of course). > (And inside my C function, the comparison ``xxx >= 0'' yields ``false'' as well.) > > I was vaguely thinking that raw R arithmetic would be equivalent to C arithmetic. > (Isn't R written in C?) > > Can someone explain to me how it is that R (magically) gets it exactly right, whereas > a call to .C() gives the sort of ``approximately right'' answer that one might usually > expect? I know that R Core is ***good*** but even they can't make C do infinite > precision arithmetic. :-) > > This is really just idle curiosity --- I realize that this phenomenon is one that I'll simply have > to live with. But if I can get some deeper insight as to why it occurs, well, that would > be nice. I think the long and the short of it is that R lost a couple of bits of precision that C retained. This sort of thing happens if R stores things into 64 bit floating point objects while C keeps them in 80 bit CPU registers. In general, floating point calculations do not obey the laws of math, for example the associative law (i.e., (a+b)-c ?= a+(b-c), especially if b and c are large and nearly equal), so any reordering of expressions by the compiler may give a slightly different result. -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com "D?den skal tape!" --- Nordahl Grieg From jim at bitwrit.com.au Tue Aug 2 10:53:22 2011 From: jim at bitwrit.com.au (Jim Lemon) Date: Tue, 02 Aug 2011 18:53:22 +1000 Subject: [R] Plotting problems directional or rose plots In-Reply-To: References: Message-ID: <4E37BB02.3080905@bitwrit.com.au> On 08/02/2011 01:38 AM, kitty wrote: > Hi again, > > I have tried playing around with the code given to me by Alan and Jim, thank > you for the code but unfortunately....I can't seem to get either of them to > work... Alans does not work with the sample data and Jims is giving the > error : > > Error in radial.grid(labels = labels, label.pos = label.pos, radlab = > radlab, : > could not find function "boxed.labels" > > I have also tried Rose plots in the (heR.Misc) library to to avail. > > Sorry, does anyone know how to get the plots I need? > Hi kitty, Oops, I forgot that the code calls "boxed.labels", a function in the plotrix package. Install that and it should work. Jim From Thorn.Thaler at rdls.nestle.com Tue Aug 2 11:02:03 2011 From: Thorn.Thaler at rdls.nestle.com (Thaler, Thorn, LAUSANNE, Applied Mathematics) Date: Tue, 2 Aug 2011 11:02:03 +0200 Subject: [R] Environment of a LM created in a function In-Reply-To: <4E356153.5060703@ucalgary.ca> References: <4E356153.5060703@ucalgary.ca> Message-ID: Dear Peter, Thanks for your concise answer, it works perfectly. By the way, I fully agree that "data" or "df" are not good names for data.frames and I am/was aware of that and I usually avoid those names (not consequently though I've to admit, it is too tempting ;). However, if one uses those evil names, one cannot expect to receive meaningful error messages. Thus, I was not astonished by the peculiar error message itself (in fact I was well aware that this has to do with the bad naming and the fact that "data" is, above all, a function) and I suspect the error to be due to environment issues. I tried the workaround with passing the very same data argument explicitly to update: > update(models[[1]], . ~ ., data = dat) which worked but which left the stale impression of redundancy and even more dangerous error proneness: what happens if the name of the data frame is changed earlier? Finally, your suggestion with > update(models[[1]], . ~ ., data = model.frame(models[[1]])) solved all the issues (and I was wondering why I did not try it out myself, so obviously I was not seeing the wood for the trees). So, thanks a lot for your help. Have a nice day. KR, -Thorn From petr.pikal at precheza.cz Tue Aug 2 11:14:26 2011 From: petr.pikal at precheza.cz (Petr PIKAL) Date: Tue, 2 Aug 2011 11:14:26 +0200 Subject: [R] if function problems In-Reply-To: References: <1312239443964-3710995.post@n4.nabble.com> Message-ID: Hi another possibility is to use logical values properties > (x < 0)*x [1] -3 -2 -1 0 0 0 0 Regards Petr > > In addition to what David said: > > On Mon, Aug 1, 2011 at 6:57 PM, zoe_zhang <1987.zhangxi at gmail.com> wrote: > > Dear All, > > Sorry to bother > > I want to write a function in R using if > > Say I have a dataset x, > > if x[i]<0, then x[i]=x[i], > > if x[i]>0, then x[i]=0 > > > > for example, x=-3:3, > > then using the function, x becomes [-3,-2,-1,0,0,0,0] > > > > I write the codes as follows, > > > > gjr=function(x) > > {lena=length(x) > > for(i in 1:lenx) > > if (x[i]<0) return (x[i]) > > if (x[i]>0) return (0) > > x} > > > > but then, doing > > gjr(x? > > it only comes out with one number > > > > Does anyone have any suggestions? > > You define `lena`, but then use `lenx` in `for (i in 1:lenx)` in your > function ... I guess this might have something to do with it. > > You shouldn't use a for loop, though, and just follow david's advice > by using logical indexing, or the `ifelse` function, ie: > > R> ifelse(x < 0, x, 0) > > HTH, > -steve > > -- > Steve Lianoglou > Graduate Student: Computational Systems Biology > | Memorial Sloan-Kettering Cancer Center > | Weill Medical College of Cornell University > Contact Info: http://cbio.mskcc.org/~lianos/contact > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From Achim.Zeileis at uibk.ac.at Tue Aug 2 11:31:32 2011 From: Achim.Zeileis at uibk.ac.at (Achim Zeileis) Date: Tue, 2 Aug 2011 11:31:32 +0200 (CEST) Subject: [R] ivreg and structural change In-Reply-To: References: Message-ID: On Mon, 1 Aug 2011, Claudio Shikida (??????????) wrote: > Hello, > > I am looking for some help with this question: how could I test structural > breaks in a instrumental variables?s model? In principle, most of tests used in the standard linear regression model can also be transferred to the IV case. However, many of the functions in "strucchange" do not do this. A notable exception is the function gefp(), see its manual page for references. This allows you to do something like gefp(y ~ x1 + x2 | z1 + z2, fit = ivreg, data = d) etc. > For example, I was trying to do something with my model with three time > series. > > tax_ivreg <- ivreg(l_y ~ l_x2 + l_x1+ dl_y | lag(l_x2, -1)+lag(l_x2, -2)+ > lag(l_x1, -1)+lag(l_x1, -2)+lag(l_y, -1)+lag(l_y, -2), data=tax1) > summary(tax_ivreg) I guess that this does not do what you want it to do. I would guess that this essentially yields a standard linear regression because the lag() is not correctly processed. If you want to use ivreg(), you need to set up the lagged variables by hand in advance. Alternatively, you can use dynlm() from the "dynlm" package which allows you to use lag() or the simpler L() function in the formula together with "zoo" data. For an example, how to set up the lagged variables by hand, you can look at the manual page of breakpoints(), especially the seatbelt data example. hth, Z > ## after estimating it, something weird happened with the several tests in > package "strucchange". For example: > > cusum <- efp(l_y ~ l_x2 + l_x1+ dl_y | lag(l_x2, -1)+lag(l_x2, -2)+ > lag(l_x1, -1)+lag(l_x1, -2)+lag(l_y, -1)+lag(l_y, -2), data=tax1, > type="OLS-CUSUM") > sctest(cusum) > plot(cusum) > coef(cusum, breaks=2) > > ## And: > > cusum <- efp(tax_ivreg, data=tax1, type="OLS-CUSUM") > sctest(cusum) > plot(cusum) > coef(cusum, breaks=2) > > ## 1. The plot of the two above were very different and > ## 2. When I ask for the breaks, instead of the dates, it returned me a line > of the summary of the estimated "tax_ivreg" > > Any help would be very appreciated. > > Thanks > > Claudio > > > > > -- > http://www.shikida.net and http://works.bepress.com/claudio_shikida/ > > Esta mensagem pode conter informa??o confidencial e/ou privilegiada. Se voc? > n?o for o destinat?rio ou a pessoa autorizada a receber esta mensagem, n?o > poder? usar, copiar ou divulgar as informa??es nela contidas ou tomar > qualquer a??o baseada nessas informa??es. Se voc? recebeu esta mensagem por > engano, por favor avise imediatamente o remetente, respondendo o presente > e-mail e apague-o em seguida. > This message may contain confidential and/or privileged ...{{dropped:9}} > > From 1987.zhangxi at gmail.com Tue Aug 2 06:50:19 2011 From: 1987.zhangxi at gmail.com (zoe_zhang) Date: Mon, 1 Aug 2011 21:50:19 -0700 (PDT) Subject: [R] if function problems In-Reply-To: References: <1312239443964-3710995.post@n4.nabble.com> Message-ID: <1312260619366-3711340.post@n4.nabble.com> Thank you for your adding, Steve, i followed Daivd's suggection and finally got the answer. It is my careless that should put lena instead of lenx. I also tried your codes and worked well. I appreciate your help. I learnt a lot from this forum. Cheers, Zoe -- View this message in context: http://r.789695.n4.nabble.com/if-function-problems-tp3710995p3711340.html Sent from the R help mailing list archive at Nabble.com. From remkoduursma at gmail.com Tue Aug 2 09:30:28 2011 From: remkoduursma at gmail.com (Remko Duursma) Date: Tue, 2 Aug 2011 00:30:28 -0700 (PDT) Subject: [R] How to 'mute' a function (like confint()) Message-ID: <1312270228949-3711537.post@n4.nabble.com> Dear R-helpers, I am using confint() within a function, and I want to turn off the message it prints: x <- rnorm(100) y <- x^1.1+rnorm(100) nlsfit <- nls(y ~ g0*x^g1, start=list(g0=1,g1=1)) > confint(nlsfit) Waiting for profiling to be done... 2.5% 97.5% g0 0.4484198 1.143761 g1 1.0380479 2.370057 I cannot find any way to turn off 'Waiting for. .." I tried options(max.print=0) and even sink(tempfile()) confint(nlsfit) sink() This suppresses the printing of the table, but not the cat()-ing of the 'Waiting for...'. But it keeps writing this message; is there any way to mute it, for this function and more generally? thanks, Remko -- View this message in context: http://r.789695.n4.nabble.com/How-to-mute-a-function-like-confint-tp3711537p3711537.html Sent from the R help mailing list archive at Nabble.com. From mandal.stat at gmail.com Tue Aug 2 11:26:48 2011 From: mandal.stat at gmail.com (Baidya Nath Mandal) Date: Tue, 2 Aug 2011 14:56:48 +0530 Subject: [R] R CMD check problem Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From nabble at binarybumps.com Tue Aug 2 10:57:43 2011 From: nabble at binarybumps.com (assaywiz) Date: Tue, 2 Aug 2011 01:57:43 -0700 (PDT) Subject: [R] Fitting ELISA measurements "unknowns" to 4 parameter logistic model In-Reply-To: <201102012118.36396.Hugo.Mildenberger@web.de> References: <201102012118.36396.Hugo.Mildenberger@web.de> Message-ID: <1312275463351-3711676.post@n4.nabble.com> Try http://www.myassays.com/four-parameter-fit.assay It?s free, requires no install and pre-configured for ELISAs. Just paste and go AW -- View this message in context: http://r.789695.n4.nabble.com/Fitting-ELISA-measurements-unknowns-to-4-parameter-logistic-model-tp3252381p3711676.html Sent from the R help mailing list archive at Nabble.com. From NICOADAMS000 at GMAIL.COM Tue Aug 2 07:26:55 2011 From: NICOADAMS000 at GMAIL.COM (DimmestLemming) Date: Mon, 1 Aug 2011 22:26:55 -0700 (PDT) Subject: [R] Clean up a scatterplot with too much data Message-ID: <1312262815893-3711389.post@n4.nabble.com> I'm working with a lot of data right now, but I'm new to R, and not very good with it, hence my request for help. What type of graph could I use to straighten out things like... http://r.789695.n4.nabble.com/file/n3711389/Untitled.png ...this? I want to see general frequencies. Should I use something like a 3D histogram, or is there an easier way like, say, shading? I'm sure these are both possible, but I don't know which is easiest or how to implement either of them. Thanks! -- View this message in context: http://r.789695.n4.nabble.com/Clean-up-a-scatterplot-with-too-much-data-tp3711389p3711389.html Sent from the R help mailing list archive at Nabble.com. From ripley at stats.ox.ac.uk Tue Aug 2 11:46:22 2011 From: ripley at stats.ox.ac.uk (Prof Brian Ripley) Date: Tue, 2 Aug 2011 10:46:22 +0100 (BST) Subject: [R] How to 'mute' a function (like confint()) In-Reply-To: <1312270228949-3711537.post@n4.nabble.com> References: <1312270228949-3711537.post@n4.nabble.com> Message-ID: See ?suppressMessages On Tue, 2 Aug 2011, Remko Duursma wrote: > Dear R-helpers, > > I am using confint() within a function, and I want to turn off the message > it prints: > > x <- rnorm(100) > y <- x^1.1+rnorm(100) > nlsfit <- nls(y ~ g0*x^g1, start=list(g0=1,g1=1)) > >> confint(nlsfit) > Waiting for profiling to be done... > 2.5% 97.5% > g0 0.4484198 1.143761 > g1 1.0380479 2.370057 > > > I cannot find any way to turn off 'Waiting for. .." > > I tried > > options(max.print=0) > > and even > > sink(tempfile()) > confint(nlsfit) > sink() > > This suppresses the printing of the table, but not the cat()-ing of the > 'Waiting for...'. > > But it keeps writing this message; is there any way to mute it, for this > function and more generally? > > > thanks, > Remko > > > -- > View this message in context: http://r.789695.n4.nabble.com/How-to-mute-a-function-like-confint-tp3711537p3711537.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 From paul.hiemstra at knmi.nl Tue Aug 2 11:53:17 2011 From: paul.hiemstra at knmi.nl (Paul Hiemstra) Date: Tue, 02 Aug 2011 09:53:17 +0000 Subject: [R] Clean up a scatterplot with too much data In-Reply-To: <1312262815893-3711389.post@n4.nabble.com> References: <1312262815893-3711389.post@n4.nabble.com> Message-ID: <4E37C90D.9040104@knmi.nl> Hi, One solution could be to subsample the data, or jitter the data (give it some random noise). A more elegant solution, imho, is to use a 2d histogram (3d histogram is not a good alternative, I think it is much better to use color instead of a third dimension). I don't think this is easy to make using the standard plot system in R, but ggplot2 handles it nicely. This would involve you needing to learn ggplot2, but I would highly recommend that anyways :). An example of the plot I have in mind can be seen at: http://had.co.nz/ggplot2/stat_bin2d.html Just scroll down a bit for some examples. cheers, Paul On 08/02/2011 05:26 AM, DimmestLemming wrote: > I'm working with a lot of data right now, but I'm new to R, and not very good > with it, hence my request for help. What type of graph could I use to > straighten out things like... > > http://r.789695.n4.nabble.com/file/n3711389/Untitled.png > > ...this? > > I want to see general frequencies. Should I use something like a 3D > histogram, or is there an easier way like, say, shading? I'm sure these are > both possible, but I don't know which is easiest or how to implement either > of them. > > Thanks! > > -- > View this message in context: http://r.789695.n4.nabble.com/Clean-up-a-scatterplot-with-too-much-data-tp3711389p3711389.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Paul Hiemstra, Ph.D. Global Climate Division Royal Netherlands Meteorological Institute (KNMI) Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39 P.O. Box 201 | 3730 AE | De Bilt tel: +31 30 2206 494 http://intamap.geo.uu.nl/~paul http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770 From fraenzi.korner at oikostat.ch Tue Aug 2 12:13:26 2011 From: fraenzi.korner at oikostat.ch (fraenzi.korner at oikostat.ch) Date: 2 Aug 2011 12:13:26 +0200 Subject: [R] =?utf-8?q?R-help_Digest=2C_Vol_102=2C_Issue_2?= Message-ID: <20110802101326.4278.qmail@srv5.yoursite.ch> Wir sind bis am 20. August in den Ferien und werden keine e-mails beantworten. Bei dringenden F?llen melden Sie sich bei Stefanie von Felten steffi.vonfelten at oikostat.ch We are on vacation until 20. August. In urgent cases, please contact Stefanie von Felten steffi.vonfelten at oikostat.ch From francesca.pancotto at gmail.com Tue Aug 2 12:20:30 2011 From: francesca.pancotto at gmail.com (Francesca) Date: Tue, 2 Aug 2011 12:20:30 +0200 Subject: [R] Reorganize(stack data) a dataframe inducing names In-Reply-To: References: Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From karl at huftis.org Tue Aug 2 12:43:04 2011 From: karl at huftis.org (Karl Ove Hufthammer) Date: Tue, 2 Aug 2011 12:43:04 +0200 Subject: [R] Clean up a scatterplot with too much data References: <1312262815893-3711389.post@n4.nabble.com> Message-ID: DimmestLemming wrote: > I'm working with a lot of data right now, but I'm new to R, and not very > good with it, hence my request for help. What type of graph could I use to > straighten out things like... > > http://r.789695.n4.nabble.com/file/n3711389/Untitled.png Three nice alternatives: example(smoothScatter) example(sunflowerplot) library(hexbin) example(hexbinplot) (And do remove the outliers before plotting.) -- Karl Ove Hufthammer From murdoch.duncan at gmail.com Tue Aug 2 12:47:03 2011 From: murdoch.duncan at gmail.com (Duncan Murdoch) Date: Tue, 02 Aug 2011 06:47:03 -0400 Subject: [R] R CMD check problem In-Reply-To: References: Message-ID: <4E37D5A7.20009@gmail.com> On 11-08-02 5:26 AM, Baidya Nath Mandal wrote: > Dear friends, > > I am building an R package called *mypackage*. I followed every possible > steps (to my understanding) for the same. I got following problem while > doing *R CMD check mypackage*. > > * installing *source* package 'mypackage' ... > ** libs > cygwin warning: > MS-DOS style path detected: C:/PROGRA~1/R/R-213~1.0/etc/i386/Makeconf > Preferred POSIX equivalent is: > /cygdrive/c/PROGRA~1/R/R-213~1.0/etc/i386/Makeconf > CYGWIN environment variable option "nodosfilewarning" turns off this > warning. > Consult the user's guide for more details about POSIX paths: > http://cygwin.com/cygwin-ug-net/using.html#using-pathnames I believe that warning is ignorable, but you can turn it off using set CYGWIN=nodosfilewarning It probably didn't cause the error below. > ERROR: compilation failed for package 'mypackage' I don't know what did cause that error, but it's likely something in your src directory of the package. What do you have there? Duncan Murdoch > * removing 'C:/Rpackages/mypackage.Rcheck/mypackage'. > > What I understood from above is that it is something with PATH variable. I > had set the following PATH variable: > C:\Rtools\bin;C:\Rtools\MinGW\bin;"C:\Program > Files\R\R-2.13.0\bin";"C:\Program Files\MiKTeX > 2.9\miktex\bin";%SystemRoot%\system32;%SystemRoot%;%SystemRoot%\System32\Wbem;%SYSTEMROOT%\System32\WindowsPowerShell\v1.0\;"C:\Program > Files\HTML Help Workshop" > > > Can anybody suggest what possibly could have gone wrong? > > Thanks, > BN Mandal > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From karl at huftis.org Tue Aug 2 12:54:21 2011 From: karl at huftis.org (Karl Ove Hufthammer) Date: Tue, 2 Aug 2011 12:54:21 +0200 Subject: [R] Plotting question References: <1312191869.99830.YahooMailRC@web26507.mail.ukl.yahoo.com> Message-ID: Andrew McCulloch wrote: > I use R to draw my graphs. I have 100 points on a simple xy-plot. The > points are distinguished by a third variable which is categorical with 10 > levels. I have been plotting x against y and using gray scales to > distinguish the level of the categorical variable for each point. It looks > ok to me but a journal reviewer says this is not any use. I cannot afford > to pay for colour prints. Any ideas on what is the best way to distinguish > 10 groups on an xy scatter plot? How about having *10* scatterplots + an identical grid in each plot? Try example(coplot) for an idea about it could look (ignore the marginal plots). Of course, do use the lattice or the ggplot2 package, not the coplot function. Too bad you have 10 groups and not 9 (or 12), BTW ... :-/ -- Karl Ove Hufthammer From jwiley.psych at gmail.com Tue Aug 2 12:54:30 2011 From: jwiley.psych at gmail.com (Joshua Wiley) Date: Tue, 2 Aug 2011 03:54:30 -0700 Subject: [R] R CMD check problem In-Reply-To: References: Message-ID: <773B2E0E-2992-4D1D-B5D1-1E90E78640AD@gmail.com> The cygwin warning should not be fatal. Is that what made you think there's a problem with your path? Can you upload mypackage online? Two options would be Github hosts that sort of thing or you could use a tar ball and any file hosting service. I (and possibly others more skilled) would be happy to try it on my system if I had it. You should also be able to see exactly where in the build process it failed from the log. Cheers, Josh On Aug 2, 2011, at 2:26, Baidya Nath Mandal wrote: > Dear friends, > > I am building an R package called *mypackage*. I followed every possible > steps (to my understanding) for the same. I got following problem while > doing *R CMD check mypackage*. > > * installing *source* package 'mypackage' ... > ** libs > cygwin warning: > MS-DOS style path detected: C:/PROGRA~1/R/R-213~1.0/etc/i386/Makeconf > Preferred POSIX equivalent is: > /cygdrive/c/PROGRA~1/R/R-213~1.0/etc/i386/Makeconf > CYGWIN environment variable option "nodosfilewarning" turns off this > warning. > Consult the user's guide for more details about POSIX paths: > http://cygwin.com/cygwin-ug-net/using.html#using-pathnames > ERROR: compilation failed for package 'mypackage' > * removing 'C:/Rpackages/mypackage.Rcheck/mypackage'. > > What I understood from above is that it is something with PATH variable. I > had set the following PATH variable: > C:\Rtools\bin;C:\Rtools\MinGW\bin;"C:\Program > Files\R\R-2.13.0\bin";"C:\Program Files\MiKTeX > 2.9\miktex\bin";%SystemRoot%\system32;%SystemRoot%;%SystemRoot%\System32\Wbem;%SYSTEMROOT%\System32\WindowsPowerShell\v1.0\;"C:\Program > Files\HTML Help Workshop" > > > Can anybody suggest what possibly could have gone wrong? > > Thanks, > BN Mandal > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From paulepanter at users.sourceforge.net Tue Aug 2 13:41:12 2011 From: paulepanter at users.sourceforge.net (Paul Menzel) Date: Tue, 02 Aug 2011 13:41:12 +0200 Subject: [R] Is R the right choice for simulating first passage times of random walks? In-Reply-To: References: <1311809771.29519.276.camel@mattotaupa> <1312155391.3609.135.camel@mattotaupa> Message-ID: <1312285272.3658.154.camel@mattotaupa> Dear Dennis and Steve, Am Sonntag, den 31.07.2011, 23:32 -0400 schrieb Steve Lianoglou: [?] > How about trying to write the of this `f4` function below using the > rcpp/inline combo. The C/C++ you will need to write looks to be quite > trivial, let's change f4 to accept an x argument as a vector: > > I've defined f4 in the same way as Dennis did: > > > f4 <- function() > > { > > x <- sample(c(-1L,1L), 1) > > > > if (x >= 0 ) {return(1)} else { > > csum <- x > > len <- 1 > > while(csum < 0) { > > csum <- csum + sample(c(-1, 1), 1) > > len <- len + 1 > > } } > > len > > } > > Now, let's do some inline/c++ mojo: > > library(inline) > inc <- " > #include > #include > #include > " > > fxx <-cxxfunction(includes=inc, plugin="Rcpp", body=" > int len = 1; > int x = ((rand() % 2 ) == 0) ? 1 : -1; > int csum = x; > > while (csum < 0) { > x = ((rand() % 2 ) == 0) ? 1 : -1; > len++; > csum = csum + x; > } > > return wrap(len); > ") > > Assuming I've faithfully translated this into c++, the timings aren't > all that comparable. > > Doing 500 replicates with the pure R version: > > set.seed(123) > system.time(out <- replicate(500, f4())) > user system elapsed > 31.525 0.120 32.510 > > Doing 10,000 replicates using the fxx function doesn't even break a sweat: > > system.time(outxx <- replicate(10000, fxx())) > user system elapsed > 0.371 0.001 0.373 > > range(out) > [1] 1 1994308 > > range(outxx) > [1] 1 11909394 thank you very much for your suggestions. This is indeed a nice speed. 1. I first had that implemented in FORTRAN (and Python) too, but turned to R for two reasons. First I wanted to use also other distributions later on and thought that it would be easier with R and that R would have that implemented as fast as possible. Secondly I thought that R would also operate faster having the right vectorization and using `csum()`. But I guess it is difficult to find a good model to use the advantages of R. Especially looking at `top` when running this example CPU is used 100 % but memory only 40 MB from 2 GB. So if one could use another data structure maybe the calculations could be done on more walks at once. 2. It is indeed possible that the walk never returns to zero, so I should make sure, that I abort the while loop after a certain length. 3. Looking at the data types I am wondering if some integer overflow(?) could happen. I could make the length variable unsigned I suppose [1]. But still `csum` could go from `-len` to 0 and for the normal random walk unsigned should not be a problem too besides that the logic/checks have to be adapted. For integrated random walks, `ccsum += csum`, `ccsum` would go from -(ccsum**2)/2 up to 0. So later on I should use probably the 64 bit data type (unsigned) `long` for `ccsum`, `csum` and `length` to avoid those problems. Memory does not seem to be a problem. Also I need to add an additional check for the height and length in the while loop like the following. (csum < 0) && (csum > -ULONG_MAX) && (len =< ULONG_MAX) So I came up with the following and to use unsigned I only consider that the random walk stays positive instead of negative. -------- 8< -------- code -------- >8 -------- library(inline) inc <- " #include #include #include #include " f9 <-cxxfunction(includes=inc, plugin="Rcpp", body=" unsigned long len = 1; if ((rand() % 2 ) == 0) { return wrap(len); } unsigned long x = 1; for (unsigned long csum = x; csum > 0; csum = ((rand() % 2 ) == 0) ? csum + 1: csum - 1) { len++; if ((csum == ULONG_MAX) && (len == ULONG_MAX)) { return wrap(len); } } return wrap(len); ") -------- 8< -------- code -------- >8 -------- I do not know if the compiler would have optimized it that way anyway and if there is any difference (besides the overflow checks). > set.seed(1); system.time( z9_1 <- replicate(1000, f9()) ) User System verstrichen 0.076 0.004 0.084 > range(z9_1) [1] 1 1449034 > length(z9_1) [1] 1000 Thanks, Paul [1] https://secure.wikimedia.org/wikipedia/en/wiki/Integer_(computer_science)#Common_integral_data_types -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part URL: From guillaume_bs at hotmail.com Tue Aug 2 11:45:58 2011 From: guillaume_bs at hotmail.com (Guillaume) Date: Tue, 2 Aug 2011 02:45:58 -0700 (PDT) Subject: [R] Memory limit in Aggregate() Message-ID: <1312278358738-3711819.post@n4.nabble.com> Dear all, I am trying to aggregate a table (divided in two lists here), but get a memory error. Here is the code I'm running : sessionInfo() print(paste("memory.limit() ", memory.limit())) print(paste("memory.size() ", memory.size())) print(paste("memory.size(TRUE) ", memory.size(TRUE))) print(paste("size listX ", object.size(listX))) print(paste("size listBy ", object.size(listBy))) print(paste("length ", object.size(nrow(listX)))) tableAgg <- aggregate(x = listX , by = listBy , FUN = "max") It returns : R version 2.9.0 Patched (2009-05-09 r48513) i386-pc-mingw32 locale: LC_COLLATE=French_France.1252;LC_CTYPE=French_France.1252;LC_MONETARY=French_France.1252;LC_NUMERIC=C;LC_TIME=French_France.1252 attached base packages: [1];stats;graphics;grDevices;utils;datasets;methods;base other attached packages: [1];RODBC_1.3-2;HarpTools_1.4;HarpReport_1.9 loaded via a namespace (and not attached): [1];tools_2.9.0 [1];"memory.limit() 4095" [1];"memory.size() 31.92" [1];"memory.size(TRUE) 166.94" [1];"size listX 218312" [1];"size listBy 408552" [1];"length 9083" Erreur in vector("list", prod(extent)) : cannot allocate vector of length 1224643220 (the last line is translated from the french error message "impossible d'allouer un vecteur de longueur 1224643220" ) Why would R create such a long vector (my original lists , and is there a way to avoid this error ? Thank you for your help, Guillaume -- View this message in context: http://r.789695.n4.nabble.com/Memory-limit-in-Aggregate-tp3711819p3711819.html Sent from the R help mailing list archive at Nabble.com. From kathryn.lord2000 at gmail.com Tue Aug 2 13:05:59 2011 From: kathryn.lord2000 at gmail.com (Kathie) Date: Tue, 2 Aug 2011 04:05:59 -0700 (PDT) Subject: [R] efficient way to reduce running time Message-ID: <1312283159235-3711985.post@n4.nabble.com> Dear R users, Would you plz tell me how to avoid this "for" loop blow?? I think there might be a better way to reduce running time. ---------------------------------------------------------------------------------------------- ## y1 and y2 are n*1 vectors for (k in 1:n){ ymax <- max( y1[k], y2[k] ) i <- 0:ymax sums<- -lgamma(y1[k]-i+1)-lgamma(i+1)-lgamma(y2[k]-i+1) maxsums <- max(sums) sums <- sums - maxsums lsum <- log( sum(exp(sums)) ) + maxsums logbp[k] <- y1[k] + y2[k] + lsum } ------------------------------------------------------------------------------------ Any suggestion will be greatly appreciated. Regards, Kathryn Lord -- View this message in context: http://r.789695.n4.nabble.com/efficient-way-to-reduce-running-time-tp3711985p3711985.html Sent from the R help mailing list archive at Nabble.com. From chakri2sai at yahoo.co.in Tue Aug 2 13:07:49 2011 From: chakri2sai at yahoo.co.in (chakri) Date: Tue, 2 Aug 2011 04:07:49 -0700 (PDT) Subject: [R] Standard Deviation of a matrix Message-ID: <1312283269035-3711991.post@n4.nabble.com> Hello, My R knowledge could not take me any further, so this request ! I have a matrix of dimensions (1185 X 1185). I want to calculate standard deviation of entire matrix. sd function of {stats} calculates standard deviation for each row/column, giving 1 X 1185 matrix as result. I would like to have 1 X 1 matrix as result. Any ideas, how to do this ? TIA Chakri -- View this message in context: http://r.789695.n4.nabble.com/Standard-Deviation-of-a-matrix-tp3711991p3711991.html Sent from the R help mailing list archive at Nabble.com. From silvano at uel.br Tue Aug 2 13:53:06 2011 From: silvano at uel.br (Silvano) Date: Tue, 2 Aug 2011 08:53:06 -0300 Subject: [R] Using Function Message-ID: Hi, I have some simple statistics to calculate for a large number of variables. I created a simple function to apply to variables. I would like the variable name to be placed automatically. I tried the following function but is not working. desc = function(x){ media = mean(x, na.rm=T) desvio = sd(x, na.rm=T) cv = desvio/media*100 saida = cbind(media, desvio, cv) colnames(saida) = c(NULL, 'M?dia', 'Desvio', 'CV') rownames(saida) = c(x) saida } desc(Idade) M?dia Desvio CV Idade 44.04961 16.9388 38.4539 How do you get the variable name is placed as the first element? My objective is get something like: rbind( desc(Altura), desc(Idade), desc(IMC), desc(FC), desc(CIRCABD), desc(GLICOSE), desc(UREIA), desc(CREATINA), desc(CTOTAL), desc(CHDL), desc(CLDL), desc(CVLDL), desc(TRIG), desc(URICO), desc(SAQRS), desc(SOKOLOW_LYON), desc(CORNELL), desc(QRS_dur), desc(Interv_QT) ) Thanks a lot, -------------------------------------- Silvano Cesar da Costa Departamento de Estat?stica Universidade Estadual de Londrina Fone: 3371-4346 From paul.hiemstra at knmi.nl Tue Aug 2 14:06:54 2011 From: paul.hiemstra at knmi.nl (Paul Hiemstra) Date: Tue, 02 Aug 2011 12:06:54 +0000 Subject: [R] Standard Deviation of a matrix In-Reply-To: <1312283269035-3711991.post@n4.nabble.com> References: <1312283269035-3711991.post@n4.nabble.com> Message-ID: <4E37E85E.2050603@knmi.nl> Hi! The sample below should give you what you want: M = matrix(runif(100), 10, 10) sd(as.numeric(M)) So the as.numeric command is the key. It transforms the matrix to a 1D vector. Or alternatively without using as.numeric: M = matrix(runif(100), 10, 10) M dim(M) = 100 M sd(M) Here I use the dim command to set the dimensions to a vector of 100 long. cheers, Paul On 08/02/2011 11:07 AM, chakri wrote: > Hello, > > My R knowledge could not take me any further, so this request ! > > I have a matrix of dimensions (1185 X 1185). I want to calculate standard > deviation of entire matrix. > sd function of {stats} calculates standard deviation for each row/column, > giving 1 X 1185 matrix as result. I would like to have 1 X 1 matrix as > result. > > Any ideas, how to do this ? > > TIA > Chakri > > -- > View this message in context: http://r.789695.n4.nabble.com/Standard-Deviation-of-a-matrix-tp3711991p3711991.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Paul Hiemstra, Ph.D. Global Climate Division Royal Netherlands Meteorological Institute (KNMI) Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39 P.O. Box 201 | 3730 AE | De Bilt tel: +31 30 2206 494 http://intamap.geo.uu.nl/~paul http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770 From paul.hiemstra at knmi.nl Tue Aug 2 14:43:38 2011 From: paul.hiemstra at knmi.nl (Paul Hiemstra) Date: Tue, 02 Aug 2011 12:43:38 +0000 Subject: [R] Errors, driving me nuts In-Reply-To: References: Message-ID: <4E37F0FA.3010402@knmi.nl> On 08/01/2011 08:47 PM, Matt Curcio wrote: > Greetings all, > I am getting this error that is driving me nuts... (not a long trip, haha) > > I have a set of files and in these files I want to calculate ttests on > rows 'compareA' and 'compareB' (these will change over time there I > want a variable here). Also these files are in many different > directories so I want a way filter out the junk... Anyway I don't > believe that this is related to my errors but I mention it none the > less. > >> files_to_test <- list.files (pattern = "kegg.combine") >> for (i in 1:length (files_to_test)) { > + raw_data <- read.table (files_to_test[i], header=TRUE, sep=" ") > + tmpA <- raw_data[,compareA] > + tmpB <- raw_data[,compareB] > + tt <- t.test (tmpA, tmpB, var.equal=TRUE) > + tt_pvalue[i] <- tt$p.value > + } > Error in tt_pvalue[i] <- tt$p.value : object 'tt_pvalue' not found > # I tried setting up a vector... > # as.vector(tt_pvalue, mode="any") ### but NO GO ...an awesome alternative is to use ldply from the plyr package: library(plyr) files_to_test <- list.files (pattern = "kegg.combine") tt_pvalue <- ldply(files_to_test, function(fname) { raw_data <- read.table (files_to_test[i], header=TRUE, sep=" ") tmpA <- raw_data[,compareA] tmpB <- raw_data[,compareB] tt <- t.test (tmpA, tmpB, var.equal=TRUE) return(data.frame(fname = fname, pvalue = tt$p.value)) }, .progress = TRUE) This saves you some bookkeeping (no need to create tt_pvalue in advance and keep track of the iterator (i)) and you get a nice progress bar (good when loops take long). ldply (and other plyr functions) are what I use most when processing large amounts of information. cheers, Paul >> file.name = paste("ttest.results.", compareA, compareB, "") >> setwd(save_to) >> write.table(tt_pvalue, file=file.name, sep="\t" ) > Error in inherits(x, "data.frame") : object 'tt_pvalue' not found > # No idea?? > > What is going wrong?? > M > > > Matt Curcio > M: 401-316-5358 > E: matt.curcio.ri at gmail.com > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Paul Hiemstra, Ph.D. Global Climate Division Royal Netherlands Meteorological Institute (KNMI) Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39 P.O. Box 201 | 3730 AE | De Bilt tel: +31 30 2206 494 http://intamap.geo.uu.nl/~paul http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770 From petr.pikal at precheza.cz Tue Aug 2 14:48:54 2011 From: petr.pikal at precheza.cz (Petr PIKAL) Date: Tue, 2 Aug 2011 14:48:54 +0200 Subject: [R] Standard Deviation of a matrix In-Reply-To: <4E37E85E.2050603@knmi.nl> References: <1312283269035-3711991.post@n4.nabble.com> <4E37E85E.2050603@knmi.nl> Message-ID: Hi > Hi! > > The sample below should give you what you want: > > M = matrix(runif(100), 10, 10) > sd(as.numeric(M)) > > So the as.numeric command is the key. It transforms the matrix to a 1D > vector. Or alternatively without using as.numeric: > > M = matrix(runif(100), 10, 10) > M > dim(M) = 100 or dim(M)<-NULL > M > sd(M) > > Here I use the dim command to set the dimensions to a vector of 100 long. > > cheers, > Paul > > On 08/02/2011 11:07 AM, chakri wrote: > > Hello, > > > > My R knowledge could not take me any further, so this request ! > > > > I have a matrix of dimensions (1185 X 1185). I want to calculate standard > > deviation of entire matrix. > > sd function of {stats} calculates standard deviation for each row/column, > > giving 1 X 1185 matrix as result. I would like to have 1 X 1 matrix as > > result. > > > > Any ideas, how to do this ? > > > > TIA > > Chakri > > > > -- > > View this message in context: http://r.789695.n4.nabble.com/Standard- > Deviation-of-a-matrix-tp3711991p3711991.html > > Sent from the R help mailing list archive at Nabble.com. > > > > ______________________________________________ > > R-help at r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > -- > Paul Hiemstra, Ph.D. > Global Climate Division > Royal Netherlands Meteorological Institute (KNMI) > Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39 > P.O. Box 201 | 3730 AE | De Bilt > tel: +31 30 2206 494 > > http://intamap.geo.uu.nl/~paul > http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770 > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From dwinsemius at comcast.net Tue Aug 2 14:53:42 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Tue, 2 Aug 2011 08:53:42 -0400 Subject: [R] Standard Deviation of a matrix In-Reply-To: References: <1312283269035-3711991.post@n4.nabble.com> <4E37E85E.2050603@knmi.nl> Message-ID: On Aug 2, 2011, at 8:48 AM, Petr PIKAL wrote: > Hi > >> Hi! >> >> The sample below should give you what you want: >> >> M = matrix(runif(100), 10, 10) >> sd(as.numeric(M)) >> >> So the as.numeric command is the key. It transforms the matrix to a >> 1D >> vector. Or alternatively without using as.numeric: >> >> M = matrix(runif(100), 10, 10) >> M >> dim(M) = 100 > > or dim(M)<-NULL shortest would surely be: sd( c(M) ) -- David. > >> M >> sd(M) >> >> Here I use the dim command to set the dimensions to a vector of 100 > long. >> >> cheers, >> Paul >> >> On 08/02/2011 11:07 AM, chakri wrote: >>> Hello, >>> >>> My R knowledge could not take me any further, so this request ! >>> >>> I have a matrix of dimensions (1185 X 1185). I want to calculate > standard >>> deviation of entire matrix. >>> sd function of {stats} calculates standard deviation for each > row/column, >>> giving 1 X 1185 matrix as result. I would like to have 1 X 1 >>> matrix as >>> result. >>> >>> Any ideas, how to do this ? >>> >>> TIA >>> Chakri >>> >>> -- >>> View this message in context: http://r.789695.n4.nabble.com/ >>> Standard- >> Deviation-of-a-matrix-tp3711991p3711991.html >>> Sent from the R help mailing list archive at Nabble.com. >>> >>> ______________________________________________ >>> R-help at r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >> >> >> -- >> Paul Hiemstra, Ph.D. >> Global Climate Division >> Royal Netherlands Meteorological Institute (KNMI) >> Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39 >> P.O. Box 201 | 3730 AE | De Bilt >> tel: +31 30 2206 494 >> >> http://intamap.geo.uu.nl/~paul >> http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770 >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT From dntssa at yahoo.com Tue Aug 2 14:18:24 2011 From: dntssa at yahoo.com (john james) Date: Tue, 2 Aug 2011 05:18:24 -0700 (PDT) Subject: [R] Functions for Sum of determinants of ranges of matrix subsets Message-ID: <1312287504.1746.YahooMailClassic@web112511.mail.gq1.yahoo.com> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From kli at dataminds.dk Tue Aug 2 14:22:18 2011 From: kli at dataminds.dk (=?iso-8859-1?Q?Kim_Lilles=F8e?=) Date: Tue, 2 Aug 2011 14:22:18 +0200 Subject: [R] execute r-code stored in a string variable Message-ID: <60F5867E29CEBC45A921E3D9B2B7511609796203E3@SERVER1.dataminds.local> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From adarwish at gmail.com Tue Aug 2 14:51:00 2011 From: adarwish at gmail.com (adarwish) Date: Tue, 2 Aug 2011 05:51:00 -0700 (PDT) Subject: [R] Problem Installing/Uninstalling Rattle Message-ID: <1312289460450-3712221.post@n4.nabble.com> Rattle won't install properly on my Windows 7 64 bit laptop. Here is what I've tried: I've followed the instructions here: http://rattle.togaware.com/rattle-install-mswindows.html I had R installed already. I downloaded the GTK+ packages, unzipped the 32 bit one into c:\gtkwin32. I put c:\gtkwin32\bin in the system variables PATH. I launched R, installed the rattle package, called the rattle library, called rattle(). It told me RGtk2 could not be found and asked to install it. I let it download it to install, but still nothing. Restarting/resintalling R has not helped. And when I try "remove.packages(rattle)" I get the error: Removing package(s) from ?C:/Users/darwish/Documents/R/win-library/2.13? (as ?lib? is unspecified) Error in match(x, table, nomatch = 0L) : 'match' requires vector arguments I've restarted R before trying anything multiple times. >From what I understand, I need to clean everything off and start anew. How do I remove rattle so I can start fresh? What did I do wrong in my steps? Thanks in advance. -- View this message in context: http://r.789695.n4.nabble.com/Problem-Installing-Uninstalling-Rattle-tp3712221p3712221.html Sent from the R help mailing list archive at Nabble.com. From petr.pikal at precheza.cz Tue Aug 2 15:04:36 2011 From: petr.pikal at precheza.cz (Petr PIKAL) Date: Tue, 2 Aug 2011 15:04:36 +0200 Subject: [R] Odp: Using Function In-Reply-To: References: Message-ID: Hi > > Hi, > > I have some simple statistics to calculate for a large > number of variables. > I created a simple function to apply to variables. > I would like the variable name to be placed automatically. > I tried the following function but is not working. > > desc = function(x){ > media = mean(x, na.rm=T) > desvio = sd(x, na.rm=T) > cv = desvio/media*100 > saida = cbind(media, desvio, cv) > colnames(saida) = c(NULL, 'M?dia', > 'Desvio', 'CV') > rownames(saida) = c(x) > saida > } You are quite close. This seems to do what you want if I presume that your variables are located in data frame desc = function(x){ media = mean(x, na.rm=T) desvio = sd(x, na.rm=T) cv = desvio/media*100 saida = data.frame(Media=media, Desvio=desvio, CV=cv) saida } iris4 <- iris[,1:4] sapply(iris4, desc) Sepal.Length Sepal.Width Petal.Length Petal.Width Media 5.843333 3.057333 3.758 1.199333 Desvio 0.8280661 0.4358663 1.765298 0.7622377 CV 14.17113 14.25642 46.97441 63.55511 If you want switch rows and cols use t(sapply(iris4, desc)) Regards Petr > > desc(Idade) > > M?dia Desvio CV > Idade 44.04961 16.9388 38.4539 > > How do you get the variable name is placed as the first > element? > > My objective is get something like: > > rbind( > desc(Altura), > desc(Idade), > desc(IMC), > desc(FC), > desc(CIRCABD), > desc(GLICOSE), > desc(UREIA), > desc(CREATINA), > desc(CTOTAL), > desc(CHDL), > desc(CLDL), > desc(CVLDL), > desc(TRIG), > desc(URICO), > desc(SAQRS), > desc(SOKOLOW_LYON), > desc(CORNELL), > desc(QRS_dur), > desc(Interv_QT) > ) > > Thanks a lot, > > -------------------------------------- > Silvano Cesar da Costa > Departamento de Estat?stica > Universidade Estadual de Londrina > Fone: 3371-4346 > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From djmuser at gmail.com Tue Aug 2 15:07:21 2011 From: djmuser at gmail.com (Dennis Murphy) Date: Tue, 2 Aug 2011 06:07:21 -0700 Subject: [R] Clean up a scatterplot with too much data In-Reply-To: <1312262815893-3711389.post@n4.nabble.com> References: <1312262815893-3711389.post@n4.nabble.com> Message-ID: In addition to the other responses (all of which I liked), a couple of other alternatives to consider are 2D density plots (see ?kde2d in the MASS package, for example) or geom_tile() in the ggplot2 package, which you can think of as a 3D histogram projected to 2D with color corresponding to (relative) frequency, as suggested by Paul Hiemstra. geom_tile() is a discretized, gridded version of a hexbin plot, but I would start with the hexbin myself. I echo KOH's comment: make sure you remove the outliers first, especially that one in the upper left corner :) After looking at your plot, here's my question: why would you plot kills/minute vs. minutes played? Doesn't the first variable render the second one moot? Wouldn't kills vs. minutes played be a more relevant (scatter)plot? If you have information on the skill level of the players, you could incorporate that information into the plot as well. There are several nice ways to go if this is the case. If kills/minute is the more appropriate measure, a univariate density plot would make sense, or a histogram. HTH, Dennis On Mon, Aug 1, 2011 at 10:26 PM, DimmestLemming wrote: > I'm working with a lot of data right now, but I'm new to R, and not very good > with it, hence my request for help. What type of graph could I use to > straighten out things like... > > http://r.789695.n4.nabble.com/file/n3711389/Untitled.png > > ...this? > > I want to see general frequencies. Should I use something like a 3D > histogram, or is there an easier way like, say, shading? I'm sure these are > both possible, but I don't know which is easiest or how to implement either > of them. > > Thanks! > > -- > View this message in context: http://r.789695.n4.nabble.com/Clean-up-a-scatterplot-with-too-much-data-tp3711389p3711389.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From paul.hiemstra at knmi.nl Tue Aug 2 15:11:01 2011 From: paul.hiemstra at knmi.nl (Paul Hiemstra) Date: Tue, 02 Aug 2011 13:11:01 +0000 Subject: [R] Clean up a scatterplot with too much data In-Reply-To: References: <1312262815893-3711389.post@n4.nabble.com> Message-ID: <4E37F765.8010900@knmi.nl> On 08/02/2011 01:07 PM, Dennis Murphy wrote: > In addition to the other responses (all of which I liked), a couple of > other alternatives to consider are 2D density plots (see ?kde2d in the > MASS package, for example) or geom_tile() in the ggplot2 package, > which you can think of as a 3D histogram projected to 2D with color > corresponding to (relative) frequency, as suggested by Paul Hiemstra. > geom_tile() is a discretized, gridded version of a hexbin plot, but I When using geom_tile you need to bin the data yourself. I much prefer using stat_bin2d which does all the work for you. cheers, Paul > would start with the hexbin myself. I echo KOH's comment: make sure > you remove the outliers first, especially that one in the upper left > corner :) > > After looking at your plot, here's my question: why would you plot > kills/minute vs. minutes played? Doesn't the first variable render the > second one moot? Wouldn't kills vs. minutes played be a more relevant > (scatter)plot? If you have information on the skill level of the > players, you could incorporate that information into the plot as well. > There are several nice ways to go if this is the case. > > If kills/minute is the more appropriate measure, a univariate density > plot would make sense, or a histogram. > > HTH, > Dennis > > On Mon, Aug 1, 2011 at 10:26 PM, DimmestLemming wrote: >> I'm working with a lot of data right now, but I'm new to R, and not very good >> with it, hence my request for help. What type of graph could I use to >> straighten out things like... >> >> http://r.789695.n4.nabble.com/file/n3711389/Untitled.png >> >> ...this? >> >> I want to see general frequencies. Should I use something like a 3D >> histogram, or is there an easier way like, say, shading? I'm sure these are >> both possible, but I don't know which is easiest or how to implement either >> of them. >> >> Thanks! >> >> -- >> View this message in context: http://r.789695.n4.nabble.com/Clean-up-a-scatterplot-with-too-much-data-tp3711389p3711389.html >> Sent from the R help mailing list archive at Nabble.com. >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Paul Hiemstra, Ph.D. Global Climate Division Royal Netherlands Meteorological Institute (KNMI) Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39 P.O. Box 201 | 3730 AE | De Bilt tel: +31 30 2206 494 http://intamap.geo.uu.nl/~paul http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770 From michael.weylandt at gmail.com Tue Aug 2 15:48:32 2011 From: michael.weylandt at gmail.com (R. Michael Weylandt ) Date: Tue, 2 Aug 2011 09:48:32 -0400 Subject: [R] Identifying US holidays In-Reply-To: References: <381BAD24-EB2C-4DEA-A2B0-D2EB006512E2@gmail.com> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From djmuser at gmail.com Tue Aug 2 15:56:58 2011 From: djmuser at gmail.com (Dennis Murphy) Date: Tue, 2 Aug 2011 06:56:58 -0700 Subject: [R] efficient way to reduce running time In-Reply-To: <1312283159235-3711985.post@n4.nabble.com> References: <1312283159235-3711985.post@n4.nabble.com> Message-ID: Hi: Could you please provide a reproducible example? In your code, (i) n is undefined; (ii) logbp is undefined. A description of what you want to do and/or a reproducible example with an expected outcome would be useful. As the bottom of each e-mail to R-help says... PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Dennis On Tue, Aug 2, 2011 at 4:05 AM, Kathie wrote: > Dear R users, > > Would you plz tell me how to avoid this "for" loop blow?? > > I think there might be a better way to reduce running time. > > ---------------------------------------------------------------------------------------------- > ## y1 and y2 are n*1 vectors > > ? ? ? ?for (k in 1:n){ > ? ? ? ? ? ? ? ?ymax <- max( y1[k], y2[k] ) > > ? ? ? ? ? ? ? ?i <- 0:ymax > > ? ? ? ? ? ? ? ?sums<- -lgamma(y1[k]-i+1)-lgamma(i+1)-lgamma(y2[k]-i+1) > > ? ? ? ? ? ? ? ?maxsums <- max(sums) > > ? ? ? ? ? ? ? ?sums <- sums - maxsums > > ? ? ? ? ? ? ? ?lsum <- log( sum(exp(sums)) ) + maxsums > > ? ? ? ? ? ? ? ?logbp[k] <- y1[k] ?+ y2[k] ?+ lsum > ? ? ? ?} > > ------------------------------------------------------------------------------------ > > Any suggestion will be greatly appreciated. > > Regards, > > Kathryn Lord > > -- > View this message in context: http://r.789695.n4.nabble.com/efficient-way-to-reduce-running-time-tp3711985p3711985.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From dimitri.liakhovitski at gmail.com Tue Aug 2 15:59:28 2011 From: dimitri.liakhovitski at gmail.com (Dimitri Liakhovitski) Date: Tue, 2 Aug 2011 09:59:28 -0400 Subject: [R] Identifying US holidays In-Reply-To: References: <381BAD24-EB2C-4DEA-A2B0-D2EB006512E2@gmail.com> Message-ID: Thanks a lot, Michael - that's exactly what I was looking for! Dimitri On Tue, Aug 2, 2011 at 9:48 AM, R. Michael Weylandt wrote: > Now that I'm back at my computer, I'll actually suggest you do something > else entirely. > > If you look at the code of holidayNYSE() or by calling listHolidays() of the > timeDate package you'll see that there are many many functions that get > every conceivable holiday directly. I'll let you pick the holidays you want, > but a simple script might be like this: > > x<-seq(as.Date("2011-01-01"), as.Date("2011-12-31"),by="day") > > GetHolidays <- function(x) { > ??? years = as.POSIXlt(x)$year+1900 > ??? years = unique(years) > ??? holidays <- NULL > ??? for (y in years) { > #If you don't need the if/then statements to include which years something > was a NYSE holiday, you should drop the loop since the holiday functions are > vectorized > ??????? if (y >= 1885) > ??????????? holidays <- c(holidays, as.character(USNewYearsDay(y))) > ??????? if (y >= 1885) > ??????????? holidays <- c(holidays, as.character(USIndependenceDay(y))) > ??????? if (y >= 1885) > ??????????? holidays <- c(holidays, as.character(USThanksgivingDay(y))) > ??????? if (y >= 1885) > ??????????? holidays <- c(holidays, as.character(USChristmasDay(y))) > ???? } > ???? holidays = as.Date(holidays,format="%Y-%m-%d") > ???? ans = x %in% holidays > ??? return(ans) > } > > This should return a boolean vector indicating which dates fall on the > selected holidays: feel free to add/delete holidays as you wish. To get the > actual holiday dates, this should work: x[GetHolidays(x)]. If you want to > identify things by holiday, you'll only have to modify the script slightly. > Let me know if I can help further! > > Michael Weylandt > > On Mon, Aug 1, 2011 at 4:57 PM, Dimitri Liakhovitski > wrote: >> >> To be specific, I only need to get rid of 2 NYSE holidays: >> Washington's Birthday and Good Friday. >> Is there a way to reduce the vector of NYSE holidays in timeDate by >> throwing out those two? >> Thank you! >> Dimitri >> >> On Mon, Aug 1, 2011 at 4:24 PM, R. Michael Weylandt >> wrote: >> > Don't know if this is sufficiently slick for this list (which never >> > fails to impress me with quick and elegant solutions) but I would point out >> > to you that GF is the only NYSE holiday falling in March or April so it >> > shouldn't be hard to discard it if desired. >> > >> > Michael Weylandt >> > >> > On Aug 1, 2011, at 4:18 PM, Dimitri Liakhovitski >> > wrote: >> > >> >> Just to clarify - I realize that "major" is subjective here. Maybe I >> >> should say "most common". >> >> But maybe there is a way for me to select from a list of all NYSE >> >> holidays and flag only some of them? >> >> Just not sure how to do it... >> >> Thanks! >> >> Dimitri >> >> >> >> On Mon, Aug 1, 2011 at 3:45 PM, Dimitri Liakhovitski >> >> wrote: >> >>> Hello! >> >>> >> >>> I am trying to identify which ones of a vector of dates are US >> >>> holidays. And, ideally, which is which. And I do not know (a-priori) >> >>> which dates those should be. >> >>> I have, for example: >> >>> ?x<-seq(as.Date("2011-01-01"),as.Date("2011-12-31"),by="day") >> >>> (x) >> >>> >> >>> I think chron should help me here - but maybe I am not using it >> >>> properly: >> >>> >> >>> library(chron) >> >>> is.holiday(chron) # Says that none of those dates are holidays >> >>> >> >>> ?is.holiday says: "holidays" is an object that should be listing >> >>> holidays. But I want to figure out which of my dates are US holidays >> >>> and don't want to provide a list of >> >>> >> >>> Package timeDate does almost what I need: >> >>> library(timeDate) >> >>> holidayNYSE(2008:2010) >> >>> holidayNYSE() >> >>> >> >>> However, I don't need all the NYSE holidays (like Good Friday). Just >> >>> the major US holidays - New Years, MLK, Memorial Day, Independence >> >>> Day, Labor Day, Halloween, Thanksgiving, Christmas. >> >>> Is there any way to identify major US holidays? >> >>> >> >>> Thanks a lot! >> >>> >> >>> - >> >>> Dimitri Liakhovitski >> >>> marketfusionanalytics.com >> >>> >> >> >> >> >> >> >> >> -- >> >> Dimitri Liakhovitski >> >> marketfusionanalytics.com >> >> >> >> ______________________________________________ >> >> R-help at r-project.org mailing list >> >> https://stat.ethz.ch/mailman/listinfo/r-help >> >> PLEASE do read the posting guide >> >> http://www.R-project.org/posting-guide.html >> >> and provide commented, minimal, self-contained, reproducible code. >> > >> >> >> >> -- >> Dimitri Liakhovitski >> marketfusionanalytics.com > > -- Dimitri Liakhovitski marketfusionanalytics.com From djmuser at gmail.com Tue Aug 2 16:03:54 2011 From: djmuser at gmail.com (Dennis Murphy) Date: Tue, 2 Aug 2011 07:03:54 -0700 Subject: [R] Functions for Sum of determinants of ranges of matrix subsets In-Reply-To: <1312287504.1746.YahooMailClassic@web112511.mail.gq1.yahoo.com> References: <1312287504.1746.YahooMailClassic@web112511.mail.gq1.yahoo.com> Message-ID: Hi: Try this: > z <- matrix(rnorm(100), nrow = 10) > sum(sapply(seq_len(nrow(z)), function(k) det(z[-k, -k]))) [1] 1421.06 where > sapply(seq_len(nrow(z)), function(k) det(z[-k, -k])) [1] 432.11613 81.65449 516.95791 54.72775 804.32097 -643.35436 [7] -411.15932 394.18780 84.13173 107.47665 HTH, Dennis On Tue, Aug 2, 2011 at 5:18 AM, john james wrote: > Dear R-help list, > Pls I have this problem. Suppose I have a matrix of size nxn say, generated as follows > > z<-matrix(rnorm(n*n,0,1),nrow=n) > > I want to write a function such that for i in 1:n, I will remove the rows and columns > corresponding to i (so, will be left with n-1*n-1 submatrix in each cases). Now I need > the sum of the determinant of each of this submatrices. As an example, if n=3, it means I will have det(1strow and 1stcolum removed) + det(2ndrow and 2ndcolum removed) + det(3rdrow and 3rdcolum removed). > > Any help will be appreciated. Thanks > > John > ? ? ? ?[[alternative HTML version deleted]] > > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > From izahn at psych.rochester.edu Tue Aug 2 16:05:31 2011 From: izahn at psych.rochester.edu (Ista Zahn) Date: Tue, 2 Aug 2011 10:05:31 -0400 Subject: [R] execute r-code stored in a string variable In-Reply-To: <60F5867E29CEBC45A921E3D9B2B7511609796203E3@SERVER1.dataminds.local> References: <60F5867E29CEBC45A921E3D9B2B7511609796203E3@SERVER1.dataminds.local> Message-ID: Hi Kim, You can use eval(parse(text = c)) Best, Ista On Tue, Aug 2, 2011 at 8:22 AM, Kim Lilles?e wrote: > Dear all > > I have a simple R question. How do I execute R-code stored in a variable? > > E.g if I have a variable which contains some R-code: > c = "reg <- lm(sales$sales~sales$price)" > > Is it possible to execute c > E.g like Exec(c) > > I hope someone can help. > > Thank you > Kim Lilles?e > > ? ? ? ?[[alternative HTML version deleted]] > > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > -- Ista Zahn Graduate student University of Rochester Department of Clinical and Social Psychology http://yourpsyche.org From pdalgd at gmail.com Tue Aug 2 16:18:40 2011 From: pdalgd at gmail.com (peter dalgaard) Date: Tue, 2 Aug 2011 16:18:40 +0200 Subject: [R] Memory limit in Aggregate() In-Reply-To: <1312278358738-3711819.post@n4.nabble.com> References: <1312278358738-3711819.post@n4.nabble.com> Message-ID: <1D3F4706-DFE7-4E0F-95D9-955331EEAAD0@gmail.com> On Aug 2, 2011, at 11:45 , Guillaume wrote: > Dear all, > I am trying to aggregate a table (divided in two lists here), but get a > memory error. > Here is the code I'm running : > > sessionInfo() > > print(paste("memory.limit() ", memory.limit())) > print(paste("memory.size() ", memory.size())) > print(paste("memory.size(TRUE) ", memory.size(TRUE))) > > print(paste("size listX ", object.size(listX))) > print(paste("size listBy ", object.size(listBy))) > print(paste("length ", object.size(nrow(listX)))) > > tableAgg <- aggregate(x = listX > , by = listBy > , FUN = "max") > > > It returns : > > R version 2.9.0 Patched (2009-05-09 r48513) > i386-pc-mingw32 > locale: > LC_COLLATE=French_France.1252;LC_CTYPE=French_France.1252;LC_MONETARY=French_France.1252;LC_NUMERIC=C;LC_TIME=French_France.1252 > attached base packages: > [1];stats;graphics;grDevices;utils;datasets;methods;base > other attached packages: > [1];RODBC_1.3-2;HarpTools_1.4;HarpReport_1.9 > loaded via a namespace (and not attached): > [1];tools_2.9.0 > [1];"memory.limit() 4095" > [1];"memory.size() 31.92" > [1];"memory.size(TRUE) 166.94" > [1];"size listX 218312" > [1];"size listBy 408552" > [1];"length 9083" > Erreur in vector("list", prod(extent)) : > cannot allocate vector of length 1224643220 > > (the last line is translated from the french error message "impossible > d'allouer un vecteur de longueur 1224643220" ) > > Why would R create such a long vector (my original lists , and is there a > way to avoid this error ? > It would be easier if you described your data rather than just tell us their size, but as far as I can see, listX has about 50K columns and listBy has 100K. So you are trying to form a table of the max of 50000 variables over the cartesian product of 100000 classifiers? That's basically an infinite number of cells. > Thank you for your help, > > Guillaume > > -- > View this message in context: http://r.789695.n4.nabble.com/Memory-limit-in-Aggregate-tp3711819p3711819.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com "D?den skal tape!" --- Nordahl Grieg From MKarol at syntapharma.com Tue Aug 2 16:32:30 2011 From: MKarol at syntapharma.com (Michael Karol) Date: Tue, 2 Aug 2011 10:32:30 -0400 Subject: [R] Help with aggregate syntax for a multi-column function please. Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From dimitri.liakhovitski at gmail.com Tue Aug 2 16:36:47 2011 From: dimitri.liakhovitski at gmail.com (Dimitri Liakhovitski) Date: Tue, 2 Aug 2011 10:36:47 -0400 Subject: [R] identifying weeks (dates) that certain days (dates) fall into Message-ID: Hello! I have dates for the beginning of each week, e.g.: weekly<-data.frame(week=seq(as.Date("2010-04-01"), as.Date("2011-12-26"),by="week")) week # each week starts on a Monday I also have a vector of dates I am interested in, e.g.: july4<-as.Date(c("2010-07-04","2011-07-04")) I would like to flag the weeks in my weekly$week that contain those 2 individual dates. I can only think of a very clumsy way of doing it: myrows<-c(which(weekly$week==weekly$week[weekly$week>july4[1]][1]-7), which(weekly$week==weekly$week[weekly$week>july4[2]][1]-7)) weekly$flag<-0 weekly$flag[myrows]<-1 It's clumsy - because actually, my vector of dates of interest (july4 above) is much longer. Is there maybe a more elegant way of doing it? Thank you! -- Dimitri Liakhovitski marketfusionanalytics.com From drobin at sandia.gov Tue Aug 2 16:50:24 2011 From: drobin at sandia.gov (Robinson, David G) Date: Tue, 2 Aug 2011 14:50:24 +0000 Subject: [R] matrix indexing (igraph ?) Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From langkamp at tomblog.de Tue Aug 2 17:00:57 2011 From: langkamp at tomblog.de (tomtomme) Date: Tue, 2 Aug 2011 08:00:57 -0700 (PDT) Subject: [R] merging lists within lists via time stamp Message-ID: <1312297257318-3712631.post@n4.nabble.com> >From multiple data.frames I created two lists, one with temperature, one with gps data. With your help and lapply I managed to interpolate the timestamps of gps and temperature data. Now I want to merge/join both lists via the time-stamp, taking only times, where both lists have data. For the single data-frames that worked just fine with: both <- merge(gps,temp) For the two lists of data.frames I first tried an lapply over both lists...something like both <- lapply(temp, gps, function(x){x <- merge.... Then I found "both<-merge.list(gps,temp)", but this doesn?t work either. It just transfers the first list "gps" to both Thanks for any hint, Thomas -- View this message in context: http://r.789695.n4.nabble.com/merging-lists-within-lists-via-time-stamp-tp3712631p3712631.html Sent from the R help mailing list archive at Nabble.com. From Samuel.Le at srlglobal.com Tue Aug 2 17:06:29 2011 From: Samuel.Le at srlglobal.com (Samuel Le) Date: Tue, 2 Aug 2011 15:06:29 +0000 Subject: [R] execute r-code stored in a string variable In-Reply-To: <60F5867E29CEBC45A921E3D9B2B7511609796203E3@SERVER1.dataminds.local> References: <60F5867E29CEBC45A921E3D9B2B7511609796203E3@SERVER1.dataminds.local> Message-ID: Yes, you can use: eval(parse(text=c)) On the other hand I would not recommend to use c as a variable name as it is the name of a very important function in the R language to aggregate data. HTH, Samuel -----Original Message----- From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Kim Lilles?e Sent: 02 August 2011 13:22 To: r-help at R-project.org Subject: [R] execute r-code stored in a string variable Dear all I have a simple R question. How do I execute R-code stored in a variable? E.g if I have a variable which contains some R-code: c = "reg <- lm(sales$sales~sales$price)" Is it possible to execute c E.g like Exec(c) I hope someone can help. Thank you Kim Lilles?e [[alternative HTML version deleted]] __________ Information from ESET NOD32 Antivirus, version of virus signature database 6275 (20110707) __________ The message was checked by ESET NOD32 Antivirus. http://www.eset.com __________ Information from ESET NOD32 Antivirus, version of virus signature database 6275 (20110707) __________ The message was checked by ESET NOD32 Antivirus. http://www.eset.com From dwinsemius at comcast.net Tue Aug 2 17:08:07 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Tue, 2 Aug 2011 11:08:07 -0400 Subject: [R] identifying weeks (dates) that certain days (dates) fall into In-Reply-To: References: Message-ID: <1DE170DC-DE3D-45C3-A29D-D171875D3BF1@comcast.net> The findInterval function should surely be tried in some form or another. On Aug 2, 2011, at 10:36 AM, Dimitri Liakhovitski wrote: > Hello! > > I have dates for the beginning of each week, e.g.: > weekly<-data.frame(week=seq(as.Date("2010-04-01"), > as.Date("2011-12-26"),by="week")) > week # each week starts on a Monday > > I also have a vector of dates I am interested in, e.g.: > july4<-as.Date(c("2010-07-04","2011-07-04")) > > I would like to flag the weeks in my weekly$week that contain those 2 > individual dates. > findInterval(july4, weekly$week) [1] 14 66 # works "out of the box" Provides an index you cna use with weekly$week > I can only think of a very clumsy way of doing it: > > myrows<-c(which(weekly$week==weekly$week[weekly$week>july4[1]][1]-7), > which(weekly$week==weekly$week[weekly$week>july4[2]][1]-7)) > weekly$flag<-0 > weekly$flag[myrows]<-1 > > It's clumsy - because actually, my vector of dates of interest (july4 > above) is much longer. > Is there maybe a more elegant way of doing it? -- David Winsemius, MD West Hartford, CT From jvadams at usgs.gov Tue Aug 2 17:12:01 2011 From: jvadams at usgs.gov (Jean V Adams) Date: Tue, 2 Aug 2011 10:12:01 -0500 Subject: [R] Help with aggregate syntax for a multi-column function please. In-Reply-To: References: Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From dcarlson at tamu.edu Tue Aug 2 17:36:10 2011 From: dcarlson at tamu.edu (David L Carlson) Date: Tue, 2 Aug 2011 10:36:10 -0500 Subject: [R] Loops to assign a unique ID to a column In-Reply-To: <7F7C4ABC5772344991D04CD652ED180BBE65EE@EXMS2.staff.ad.curtin.edu.au> References: <4E379307.2020208@xtra.co.nz> <7F7C4ABC5772344991D04CD652ED180BBE65EE@EXMS2.staff.ad.curtin.edu.au> Message-ID: <003f01cc5129$e7f9d280$b7ed7780$@edu> How about this? > indx <- unique(cbind(Dates, Groups)) > indx Dates Groups [1,] "12/10/2010" "A" [2,] "12/10/2010" "B" [3,] "13/10/2010" "A" [4,] "13/10/2010" "B" [5,] "13/10/2010" "C" > indx <- data.frame(indx, id=1:nrow(indx)) > indx Dates Groups id 1 12/10/2010 A 1 2 12/10/2010 B 2 3 13/10/2010 A 3 4 13/10/2010 B 4 5 13/10/2010 C 5 > newdata <- merge(data, indx) > newdata Dates Groups id 1 12/10/2010 A 1 2 12/10/2010 B 2 3 12/10/2010 B 2 4 13/10/2010 A 3 5 13/10/2010 B 4 6 13/10/2010 C 5 ---------------------------------------------- David L Carlson Associate Professor of Anthropology Texas A&M University College Station, TX 77843-4352 -----Original Message----- From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Chandra Salgado Kent Sent: Tuesday, August 02, 2011 2:12 AM To: r-help at r-project.org Subject: [R] Loops to assign a unique ID to a column Dear R help, I am fairly new in data management and programming in R, and am trying to write what is probably a simple loop, but am not having any luck. I have a dataframe with something like the following (but much bigger): Dates<-c("12/10/2010","12/10/2010","12/10/2010","13/10/2010", "13/10/2010", "13/10/2010") Groups<-c("A","B","B","A","B","C") data<-data.frame(Dates, Groups) I would like to create a new column in the dataframe, and give each distinct date by group a unique identifying number starting with 1, so that the resulting column would look something like: ID<-c(1,2,2,3,4,5) The loop that I have started to write is something like this (but doesn't work!): data$ID<-as.number(c()) for(i in unique(data$Dates)){ for(j in unique(data$Groups)){ data$ID[i,j]<-i i<-i+1 } } Am I on the right track? Any help on this is much appreciated! Chandra [[alternative HTML version deleted]] ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. From anna.sramkova at iee.unibe.ch Tue Aug 2 15:40:43 2011 From: anna.sramkova at iee.unibe.ch (Sramkova, Anna (IEE)) Date: Tue, 2 Aug 2011 13:40:43 +0000 Subject: [R] vglm: warnings and errors Message-ID: Hello, I am using multinomial logit regression for the first time, and I am trying to understand the warnings and errors I get. My data consists of 200 to 600 samples with ~25 predictors (these are principal components). The response has three categories. I use the function "vglm" from the package VGAM, called as follows: fit1<-vglm(fmla, data=tr, multinomial,weights=regwt, maxit=500) "regwt" are Epanechnikov weights In general, the regression works, but - often, one of the categories has posterior probability zero, but the remaining two probabilities are non-zero (although very small) - I receive many warnings of the following type: "in checkwz(wz, m = M , trace = trace, wzeps = control$wzepsilo): n elements replaced by 1.819e-12" " in tfun(mu = mu, y = y, w =w, res = FALSE, eta = eta, ...: fitted values close to 0 or 1" ... if I understand it correctly, these have to do with the variance of the predictions being too small? - In some cases, I get an error: "Error in devmu[smallmu] = smy * log(smu): NAs are not allowed in subscripted arguments", sometimes this error goes away when I decrease the size of the training set. I would like to know if this is expected behavior for some types of data sets. The manual to VGAM states that "multinomial is prone to numerical difficulties if the groups are separable and/or fitted probabilities are close to 0 or 1", but does not explain why. The latter could be my case. I have to run the regression on 10,000s of data sets, so I would like to find a setting in which things go smoothly (i.e. without errors) I realize that this is probably more of a methodological than technical question, but maybe you can give some rules of thumb about a suitable number of samples/predictors or point me to some literature that would help me understand my problems. Thanks Anna From anthony.ch.ng at gmail.com Tue Aug 2 16:42:08 2011 From: anthony.ch.ng at gmail.com (Anthony Ching Ho Ng) Date: Tue, 2 Aug 2011 22:42:08 +0800 Subject: [R] Display/show the evaluation result of R commands automatically In-Reply-To: References: Message-ID: R-help and Barry Thank you for your suggestions. It works, and may I ask how I am able to do the opposite (disable the call back, so that I could control when to show and suppress the output). I would like to make a function to enable/disable the callback similar to the one as follow: enableOutput <- function() { h <<- taskCallbackManager() h$add(function(expr, value, ok, visible) {if(!visible){print(value)};TRUE}) } disableOutput <- function() { ........ } This shows output feature (and use '';" to suppress the output) is the default behavior of Matlab which I find it quite useful (without having to type in the variable name again every time to see the result of the expression). So I am just curious to know how to do it in R. Best Regards, Anthony On 31 July 2011 20:16, Barry Rowlingson wrote: >> h <- taskCallbackManager() >> h$add(function(expr, value, ok, visible) {if(!visible){print(value)};TRUE}) > > > On Sun, Jul 31, 2011 at 12:15 PM, Anthony Ching Ho Ng > wrote: >> Hello R-help, >> >> I wonder if it is possible to configure R, so that it will >> display/show the evaluation result of the R commands automatically >> (similar to the behavior of Matlab) >> >> i.e. If I type x <- 8 >> >> it will print 8 in the command prompt, instead of having type x >> explicitly to show the result and perhaps put an ";" at the end to >> suppress the output. >> >> i.e. x <- 8; >> > > The first thing I think you can do by adding a task callback manager > to print the value if the value would otherwise be invisible: > > ?> h <- taskCallbackManager() > ?> h$add(function(expr, value, ok, visible) {if(!visible){print(value)};TRUE}) > > ?The semicolon thing would probably need rewriting bits of R at the C > code level. > > ?I don't think many people would use it though. And my code above > might break things. I don't use it. > > Barry > From ggrothendieck at gmail.com Tue Aug 2 17:42:42 2011 From: ggrothendieck at gmail.com (Gabor Grothendieck) Date: Tue, 2 Aug 2011 11:42:42 -0400 Subject: [R] identifying weeks (dates) that certain days (dates) fall into In-Reply-To: References: Message-ID: On Tue, Aug 2, 2011 at 10:36 AM, Dimitri Liakhovitski wrote: > Hello! > > I have dates for the beginning of each week, e.g.: > weekly<-data.frame(week=seq(as.Date("2010-04-01"), > as.Date("2011-12-26"),by="week")) > week ?# each week starts on a Monday > > I also have a vector of dates I am interested in, e.g.: > july4<-as.Date(c("2010-07-04","2011-07-04")) > > I would like to flag the weeks in my weekly$week that contain those 2 > individual dates. > I can only think of a very clumsy way of doing it: > > myrows<-c(which(weekly$week==weekly$week[weekly$week>july4[1]][1]-7), > ? ? ? ?which(weekly$week==weekly$week[weekly$week>july4[2]][1]-7)) > weekly$flag<-0 > weekly$flag[myrows]<-1 > > It's clumsy - because actually, my vector of dates of interest (july4 > above) is much longer. > Is there maybe a more elegant way of doing it? > Thank you! This gives myrows: as.numeric(july4 - weekly[1,1]) %/% 7 + 1 -- Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com From chakri2sai at yahoo.co.in Tue Aug 2 15:30:33 2011 From: chakri2sai at yahoo.co.in (chakri) Date: Tue, 2 Aug 2011 06:30:33 -0700 (PDT) Subject: [R] Standard Deviation of a matrix In-Reply-To: References: <1312283269035-3711991.post@n4.nabble.com> <4E37E85E.2050603@knmi.nl> Message-ID: <1312291833202-3712328.post@n4.nabble.com> Thank you everyone for your kind input, I forgot to add that I have decimal points in my matrix ! Enclosed input file (reduced to 10 X 10 matrix), scripts and output for your suggesions: Code 1: library(stats) Matrix<-read.table("test_input", head=T, sep=" ", dec=".") SD<-sd(as.numeric(Matrix)) SD Output 1: > library(stats) > Matrix<-read.table("test_input", head=T, sep="\t", dec=".") > SD<-sd(as.numeric(Matrix)) Error in sd(as.numeric(Matrix)) : (list) object cannot be coerced to type 'double' Execution halted Code 2: library(stats) Matrix<-read.table("test_input", head=T, sep="\t", dec=".") dim(Matrix)<-1 SD<-sd(Matrix) SD Output: > library(stats) > Matrix<-read.table("test_input", head=T, sep="\t", dec=".") > dim(Matrix)<-1 Error in dim(Matrix) <- 1 : dims [product 1] do not match the length of object [10] Execution halted Code 3: library(stats) Matrix<-read.table("test_input", head=T, sep="\t", dec=".") SD<-sd(c(Matrix)) SD Output: > library(stats) > Matrix<-read.table("test_input", head=T, sep="\t", dec=".") > SD<-sd(c(Matrix)) Error: is.atomic(x) is not TRUE Execution halted Any ideas, what am I missing here ? TIA chakri Input file: http://r.789695.n4.nabble.com/file/n3712328/test_input test_input -- View this message in context: http://r.789695.n4.nabble.com/Standard-Deviation-of-a-matrix-tp3711991p3712328.html Sent from the R help mailing list archive at Nabble.com. From gantkant at walla.com Tue Aug 2 16:14:24 2011 From: gantkant at walla.com (=?UTF-8?Q?=D7=A8=D7=90=D7=95=D7=91=D7=9F=20=D7=90=D7=91=D7=A8=D7=9E=D7=95=D7=91=D7=99=D7=A5?=) Date: Tue, 2 Aug 2011 17:14:24 +0300 Subject: [R] =?utf-8?q?how_to_get_the_percentile_of_a_number_in__a_vector?= Message-ID: <1312294464.309000-90527180-29456@walla.com> I'm familiar with the quantile() command, but what if I have a specific number that I want to know its location in a vector? I know that in known distributions, (for example the normal distribution), there is pnorm and qnorm, but how can I do it with unknown vector? thanks in advance _________________________________________________________________ Walla! Mail - [1]Get your free unlimited mail today References 1. http://www.walla.co.il/ From guillaume_bs at hotmail.com Tue Aug 2 17:10:14 2011 From: guillaume_bs at hotmail.com (Guillaume) Date: Tue, 2 Aug 2011 08:10:14 -0700 (PDT) Subject: [R] Memory limit in Aggregate() In-Reply-To: <1D3F4706-DFE7-4E0F-95D9-955331EEAAD0@gmail.com> References: <1312278358738-3711819.post@n4.nabble.com> <1D3F4706-DFE7-4E0F-95D9-955331EEAAD0@gmail.com> Message-ID: <1312297814593-3712671.post@n4.nabble.com> Hi Peter, Thanks for your answer. I made a mistake in the script I copied.... sorry ! The description of the object : listX has 3 column, listBy has 4 column, and they have 9000 rows : print(paste("ncol x ", length((listX)))) print(paste("ncol By ", length((listBy)))) print(paste("nrow ", length((listX[[1]])))) [1];"ncol x 3" [1];"ncol By 4" [1];"nrow 9083" It seems the "large" (=4) number of columns in listBy creates the troubles... Thanks, Guillaume -- View this message in context: http://r.789695.n4.nabble.com/Memory-limit-in-Aggregate-tp3711819p3712671.html Sent from the R help mailing list archive at Nabble.com. From kathryn.lord2000 at gmail.com Tue Aug 2 17:37:01 2011 From: kathryn.lord2000 at gmail.com (Kathie) Date: Tue, 2 Aug 2011 08:37:01 -0700 (PDT) Subject: [R] My R code is not efficient Message-ID: <1312299421934-3712762.post@n4.nabble.com> Dear R users, I have two n*1 integer vectors, y1 and y2, where n is very very large. I'd like to compute elbp = 4^(y1) * 5^(y2) * sum_{i=0}^{max(y1, y2)} [{ (y1-i)! * (i)! * (y2-i)! }^(-1)]; that is, I need to compute "elbp" for each (y1, y2) pair. So I made R code like below, but I don't think it's efficient Would you plz tell me how to avoid this "for" loop blow?? ---------------------------------------------------------------------------------------------- for (k in 1:n){ ymax <- max( y1[k], y2[k] ) i <- 0:ymax sums<- -lgamma(y1[k]-i+1)-lgamma(i+1)-lgamma(y2[k]-i+1) maxsums <- max(sums) sums <- sums - maxsums lsum <- log( sum(exp(sums)) ) + maxsums lbp[k] <- y1[k]*log(4) + y2[k]*log(5) + lsum } elbp <- exp(lbp) ------------------------------------------------------------------------------------ Any suggestion will be greatly appreciated. Regards, Kathryn Lord -- View this message in context: http://r.789695.n4.nabble.com/My-R-code-is-not-efficient-tp3712762p3712762.html Sent from the R help mailing list archive at Nabble.com. From miranda at orn.mpg.de Tue Aug 2 17:26:21 2011 From: miranda at orn.mpg.de (Catarina Miranda) Date: Tue, 2 Aug 2011 17:26:21 +0200 Subject: [R] Extract p value from coxme object Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From jdnewmil at dcn.davis.ca.us Tue Aug 2 17:53:24 2011 From: jdnewmil at dcn.davis.ca.us (Jeff Newmiller) Date: Tue, 02 Aug 2011 08:53:24 -0700 Subject: [R] My R code is not efficient In-Reply-To: <1312299421934-3712762.post@n4.nabble.com> References: <1312299421934-3712762.post@n4.nabble.com> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From dwinsemius at comcast.net Tue Aug 2 18:00:28 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Tue, 2 Aug 2011 12:00:28 -0400 Subject: [R] how to get the percentile of a number in a vector In-Reply-To: <1312294464.309000-90527180-29456@walla.com> References: <1312294464.309000-90527180-29456@walla.com> Message-ID: On Aug 2, 2011, at 10:14 AM, ????? ???????? wrote: > > I'm familiar with the quantile() command, but what if I have a > specific > number that I want to know its location in a vector? I know that > in known > distributions, (for example the normal distribution), there is > pnorm and > qnorm, but how can I do it with unknown vector? ?ecdf -- David Winsemius, MD West Hartford, CT From NordlDJ at dshs.wa.gov Tue Aug 2 18:07:36 2011 From: NordlDJ at dshs.wa.gov (Nordlund, Dan (DSHS/RDA)) Date: Tue, 2 Aug 2011 09:07:36 -0700 Subject: [R] Standard Deviation of a matrix In-Reply-To: <1312291833202-3712328.post@n4.nabble.com> References: <1312283269035-3711991.post@n4.nabble.com> <4E37E85E.2050603@knmi.nl> <1312291833202-3712328.post@n4.nabble.com> Message-ID: <941871A13165C2418EC144ACB212BDB002019F0C@dshsmxoly1504g.dshs.wa.lcl> > -----Original Message----- > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r- > project.org] On Behalf Of chakri > Sent: Tuesday, August 02, 2011 6:31 AM > To: r-help at r-project.org > Subject: Re: [R] Standard Deviation of a matrix > > Thank you everyone for your kind input, > > I forgot to add that I have decimal points in my matrix ! > > Enclosed input file (reduced to 10 X 10 matrix), scripts and output for > your > suggesions: > > Code 1: > library(stats) > Matrix<-read.table("test_input", head=T, sep=" ", dec=".") > SD<-sd(as.numeric(Matrix)) > SD First, your data attachment did not come through the list. Second, decimals are not a problem. Third, you don't have a matrix, you have a data frame (read.table produces data frames). As long as all columns are numeric you could do something like sd(c(as.matrix(m))) You could also convert to a matrix on input if you really don't need a dataframe for different column types. Hope this is helpful, Dan Daniel J. Nordlund Washington State Department of Social and Health Services Planning, Performance, and Accountability Research and Data Analysis Division Olympia, WA 98504-5204 From michael.weylandt at gmail.com Tue Aug 2 18:07:38 2011 From: michael.weylandt at gmail.com (R. Michael Weylandt ) Date: Tue, 2 Aug 2011 12:07:38 -0400 Subject: [R] how to get the percentile of a number in a vector In-Reply-To: <1312294464.309000-90527180-29456@walla.com> References: <1312294464.309000-90527180-29456@walla.com> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From hpaul.benton08 at imperial.ac.uk Tue Aug 2 18:10:25 2011 From: hpaul.benton08 at imperial.ac.uk (Benton, Paul) Date: Tue, 2 Aug 2011 16:10:25 +0000 Subject: [R] SSOAP & chemspider In-Reply-To: References: Message-ID: <7595D702-E0F2-4F16-9793-488357A61A9E@imperial.ac.uk> Has anyone got SSOAP working on anything besides KEGG? I just tried another 3 SOAP servers. Both the WSDL and constructing the .SOAP call. Again the perl and ruby interface worked without any hitches. Paul > library(SSOAP) > massBank<-processWSDL("http://www.massbank.jp/api/services/MassBankAPI?wsdl") Error in parse(text = paste(txt, collapse = "\n")) : :1:29: unexpected input 1: function(x, ..., obj = new( ? ^ In addition: Warning message: In processWSDL("http://www.massbank.jp/api/services/MassBankAPI?wsdl") : Ignoring additional ... elements > > metlin<-processWSDL("http://metlin.scripps.edu/soap/metlin.wsdl") Error in parse(text = paste(txt, collapse = "\n")) : :1:29: unexpected input 1: function(x, ..., obj = new( ? ^ > pubchem<-processWSDL("http://pubchem.ncbi.nlm.nih.gov/pug_soap/pug_soap.cgi?wsdl") Error in parse(text = paste(txt, collapse = "\n")) : :1:29: unexpected input 1: function(x, ..., obj = new( ? ^ On 20 Jul 2011, at 01:54, Benton, Paul wrote: > Dear all, > > I've been trying on and off for the past few months to get SSOAP to work with chemspider. First I tried the WSDL file: > > cs<-processWSDL("http://www.chemspider.com/MassSpecAPI.asmx?WSDL") > Error in parse(text = paste(txt, collapse = "\n")) : > :1:29: unexpected input > 1: function(x, ..., obj = new( ? > ^ > In addition: Warning message: > In processWSDL("http://www.chemspider.com/MassSpecAPI.asmx?WSDL") : > Ignoring additional ... elements > > Next I've tried using just the pure .SOAP to call the database. > > s <- SOAPServer("http://www.chemspider.com/MassSpecAPI.asmx") > csid<- .SOAP(s, "SearchByMass2", mass=89.04767, range=0.01, > action = I("http://www.chemspider.com/SearchByMass2"), > xmlns = c("http://www.chemspider.com"), .opts = list(verbose = TRUE)) > > This seems to work and gives back a result. However, this result isn't the right result. It's seems to have converted the mass into 0. When I run the similar program in perl I get the correct id's. So this isn't a server side problem but SSOAP. Any thoughts or suggestions on other packages to use? > Further infomation about the SeachByMass2 method and it's xml that it's expecting. > http://www.chemspider.com/MassSpecAPI.asmx?op=SearchByMass2 > > Cheers, > > > Paul > > > PS Placing a fake error in the .SOAP code I can look at the xml it's sending to the server: > Browse[1]> doc > > > > > 89.04767 > 0.01 > > > From jvadams at usgs.gov Tue Aug 2 18:10:28 2011 From: jvadams at usgs.gov (Jean V Adams) Date: Tue, 2 Aug 2011 11:10:28 -0500 Subject: [R] how to get the percentile of a number in a vector In-Reply-To: <1312294464.309000-90527180-29456@walla.com> References: <1312294464.309000-90527180-29456@walla.com> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From pdalgd at gmail.com Tue Aug 2 18:26:01 2011 From: pdalgd at gmail.com (peter dalgaard) Date: Tue, 2 Aug 2011 18:26:01 +0200 Subject: [R] Memory limit in Aggregate() In-Reply-To: <1312297814593-3712671.post@n4.nabble.com> References: <1312278358738-3711819.post@n4.nabble.com> <1D3F4706-DFE7-4E0F-95D9-955331EEAAD0@gmail.com> <1312297814593-3712671.post@n4.nabble.com> Message-ID: <7BE2D45D-B379-49E4-8A51-25914C7569EE@gmail.com> On Aug 2, 2011, at 17:10 , Guillaume wrote: > Hi Peter, > Thanks for your answer. > I made a mistake in the script I copied.... sorry ! > > The description of the object : listX has 3 column, listBy has 4 column, and So what is the contents of listBy? If they are all factors with 100 levels, then you're looking at a table with 10^8 entries... > they have 9000 rows : > > print(paste("ncol x ", length((listX)))) > print(paste("ncol By ", length((listBy)))) > print(paste("nrow ", length((listX[[1]])))) > > [1];"ncol x 3" > [1];"ncol By 4" > [1];"nrow 9083" > > It seems the "large" (=4) number of columns in listBy creates the > troubles... > > Thanks, > Guillaume > > -- > View this message in context: http://r.789695.n4.nabble.com/Memory-limit-in-Aggregate-tp3711819p3712671.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com "D?den skal tape!" --- Nordahl Grieg From djmuser at gmail.com Tue Aug 2 18:23:13 2011 From: djmuser at gmail.com (Dennis Murphy) Date: Tue, 2 Aug 2011 09:23:13 -0700 Subject: [R] Help with aggregate syntax for a multi-column function please. In-Reply-To: References: Message-ID: Hi: Another way to do this is to use one of the summarization packages. The following uses the plyr package. The first step is to create a function that takes a data frame as input and outputs either a data frame or a scalar. In this case, the function returns a scalar, but if you want to carry along additional variables in the output, you can replace it with a data frame that returns the set of variables you want. You don't need to return the grouping variables, but no harm is done if you do. # This assumes the existence of a function AUC with the arguments # you stated in your post. I presume it returns a scalar value; if not, # you should modify it to return a data frame instead. It would probably # be better to modify AUC and call it in ddply() directly, but without the # function code there's not much one can do... myAUC <- function(df) AUC(df, 'TimeBestEstimate', 'Pt','ConcentrationBQLzero') library('plyr') ddply(PKdata, .(Cycle, DoseDayNominal, Drug), myAUC) This is obviously untested, so caveat emptor. Both plyr and data.table can accept functions with multiple arguments and do the right thing. The trick in plyr is to write a function that takes a generic input object (e.g., a (sub)data frame) and then uses (the variables within) it to do the necessary calculations. Generally, you want the output of the function to be compatible with the type of output you want from the **ply() function. In this case, ddply() means data frame input, data frame output; alply() would mean array input and list output, etc. If this doesn't work, please provide a reproducible example. HTH, Dennis On Tue, Aug 2, 2011 at 7:32 AM, Michael Karol wrote: > Dear R-experts: > > > > I am using a function called AUC whose arguments are data, time, id, and > dv. > > data is the name of the dataframe, > time is the independent variable column name, > id is the subject id and > dv is the dependent variable. > > The function computes area under the curve by trapezoidal rule, for each > subject id. > > I would like to embed this in aggregate to further subset by each Cycle, > DoseDayNominal and Drug, but I can't seem to get the aggregate syntax > correct. ?All the examples I can find use single column function such as > mean, whereas this AUC function requires four arguments. > > Could someone kindly show me the syntax? > > This is what I've tried so far: > > AUC.DF<- aggregate(PKdata, list(PKdata$Cycle, PKdata$DoseDayNominal, > PKdata$Drug), > ? ? ? ? ? ? ? ? ? function(x,tm,pt,conc) {AUC(x)}, > tm="TimeBestEstimate", pt="Pt", conc="ConcentrationBQLzero" ) > > AUC.DF<- aggregate(PKdata, list(PKdata$Cycle, PKdata$DoseDayNominal, > PKdata$Drug), > ? ? ? ? ? ? ? ? ? function(x) {AUC(x,"TimeBestEstimate", "Pt", > "ConcentrationBQLzero" )} ) > > AUC syntax is: > args(AUC) > function (data, time = "TIME", id = "ID", dv = "DV") > > > thanks > > > > Regards, > > Michael > > > > > ? ? ? ?[[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From Thorn.Thaler at rdls.nestle.com Tue Aug 2 18:07:35 2011 From: Thorn.Thaler at rdls.nestle.com (Thaler, Thorn, LAUSANNE, Applied Mathematics) Date: Tue, 2 Aug 2011 18:07:35 +0200 Subject: [R] lattice: index plot Message-ID: Dear all, How can I make an index plot with lattice, that is plotting a vector simply against its particular index in the vector, i.e. something similar to y <- rnorm(10) plot(y) I don't want to specify the x's manually, as this could become cumbersome when having multiple panels. I tried something like library(lattice) mp <- function(x, y, ...) { x <- 1:length(y) panel.xyplot(x, y, ...) } pp <- function(x, y, ...) { list(xlim = extendrange(1:length(y)), ylim = extendrange(y)) } set.seed(123) y <- rnorm(10) xyplot(y ~ 1, panel = mp, prepanel = pp, xlab="Index") but I was wondering whether there is a more straightforward way? By the way, if I do not specify the ylim in the prepanel function the plot is clipped, but reading Deepayan's book, p.140 : "[...], so a user-specified prepanel function is not required to return all of these components [i.e. xlim, ylim, xat, yat, dx and dy]; any missing component will be replaced by the corresponding default." I'd understand that if I do not specify ylim it is calculated automatically? Not a big thing though, but it seems to me to be inconsistent. Any help appreciated. KR, -Thorn From gunter.berton at gene.com Tue Aug 2 18:31:41 2011 From: gunter.berton at gene.com (Bert Gunter) Date: Tue, 2 Aug 2011 09:31:41 -0700 Subject: [R] Loops to assign a unique ID to a column In-Reply-To: <003f01cc5129$e7f9d280$b7ed7780$@edu> References: <4E379307.2020208@xtra.co.nz> <7F7C4ABC5772344991D04CD652ED180BBE65EE@EXMS2.staff.ad.curtin.edu.au> <003f01cc5129$e7f9d280$b7ed7780$@edu> Message-ID: Whoa! 1. First and most important, there is very likely no reason you need to do this. R can handle multiple groupings automatically in fitting and plotting without creating artificial labels of the sort you appear to want to create. Please read an "Intro to R" and/or get help to see how. 2. The "solution" offered below is unnecessarily convoluted. Here is a simpler and faster one: z <- within(z, indx <- as.numeric(interaction(Dates,Groups, drop=TRUE, lex.order=TRUE))) Explanation: interaction() produces all possible combinations the individual groupings; drop=FALSE throws away any unused combinations, lex.order-TRUE lexicographically orders the levels as you indicated. ?interaction for details. By default, the result of the above is a factor, which as.numeric() converts to the numeric codes used in factor representations. ?factor . Finally, within() interprets and makes changes within z. The changed result is then assigned back to z so that it is not lost. ?within Cheers, Bert On Tue, Aug 2, 2011 at 8:36 AM, David L Carlson wrote: > How about this? > >> indx <- unique(cbind(Dates, Groups)) >> indx > ? ? Dates ? ? ? ?Groups > [1,] "12/10/2010" "A" > [2,] "12/10/2010" "B" > [3,] "13/10/2010" "A" > [4,] "13/10/2010" "B" > [5,] "13/10/2010" "C" > >> indx <- data.frame(indx, id=1:nrow(indx)) >> indx > ? ? ? Dates Groups id > 1 12/10/2010 ? ? ?A ?1 > 2 12/10/2010 ? ? ?B ?2 > 3 13/10/2010 ? ? ?A ?3 > 4 13/10/2010 ? ? ?B ?4 > 5 13/10/2010 ? ? ?C ?5 > >> newdata <- merge(data, indx) >> newdata > ? ? ? Dates Groups id > 1 12/10/2010 ? ? ?A ?1 > 2 12/10/2010 ? ? ?B ?2 > 3 12/10/2010 ? ? ?B ?2 > 4 13/10/2010 ? ? ?A ?3 > 5 13/10/2010 ? ? ?B ?4 > 6 13/10/2010 ? ? ?C ?5 > > ---------------------------------------------- > David L Carlson > Associate Professor of Anthropology > Texas A&M University > College Station, TX 77843-4352 > > > -----Original Message----- > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On > Behalf Of Chandra Salgado Kent > Sent: Tuesday, August 02, 2011 2:12 AM > To: r-help at r-project.org > Subject: [R] Loops to assign a unique ID to a column > > Dear R help, > > > > I am fairly new in data management and programming in R, and am trying to > write what is probably a simple loop, but am not having any luck. I have a > dataframe with something like the following (but much bigger): > > > > Dates<-c("12/10/2010","12/10/2010","12/10/2010","13/10/2010", "13/10/2010", > "13/10/2010") > > Groups<-c("A","B","B","A","B","C") > > data<-data.frame(Dates, Groups) > > > > I would like to create a new column in the dataframe, and give each distinct > date by group a unique identifying number starting with 1, so that the > resulting column would look something like: > > > > ID<-c(1,2,2,3,4,5) > > > > The loop that I have started to write is something like this (but doesn't > work!): > > > > data$ID<-as.number(c()) > > for(i in unique(data$Dates)){ > > ?for(j in unique(data$Groups)){ data$ID[i,j]<-i > > ?i<-i+1 > > ?} > > } > > > > Am I on the right track? > > > > Any help on this is much appreciated! > > > > Chandra > > > ? ? ? ?[[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- "Men by nature long to get on to the ultimate truths, and will often be impatient with elementary studies or fight shy of them. If it were possible to reach the ultimate truths without the elementary studies usually prefixed to them, these would not be preparatory studies but superfluous diversions." -- Maimonides (1135-1204) Bert Gunter Genentech Nonclinical Biostatistics From djmuser at gmail.com Tue Aug 2 18:34:55 2011 From: djmuser at gmail.com (Dennis Murphy) Date: Tue, 2 Aug 2011 09:34:55 -0700 Subject: [R] identifying weeks (dates) that certain days (dates) fall into In-Reply-To: References: Message-ID: Hi: You could try the lubridate package: library(lubridate) week(weekly$week) week(july4) [1] 27 27 > week function (x) yday(x)%/%7 + 1 which is essentially Gabor's code :) HTH, Dennis On Tue, Aug 2, 2011 at 7:36 AM, Dimitri Liakhovitski wrote: > Hello! > > I have dates for the beginning of each week, e.g.: > weekly<-data.frame(week=seq(as.Date("2010-04-01"), > as.Date("2011-12-26"),by="week")) > week ?# each week starts on a Monday > > I also have a vector of dates I am interested in, e.g.: > july4<-as.Date(c("2010-07-04","2011-07-04")) > > I would like to flag the weeks in my weekly$week that contain those 2 > individual dates. > I can only think of a very clumsy way of doing it: > > myrows<-c(which(weekly$week==weekly$week[weekly$week>july4[1]][1]-7), > ? ? ? ?which(weekly$week==weekly$week[weekly$week>july4[2]][1]-7)) > weekly$flag<-0 > weekly$flag[myrows]<-1 > > It's clumsy - because actually, my vector of dates of interest (july4 > above) is much longer. > Is there maybe a more elegant way of doing it? > Thank you! > -- > Dimitri Liakhovitski > marketfusionanalytics.com > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From pdalgd at gmail.com Tue Aug 2 18:52:18 2011 From: pdalgd at gmail.com (peter dalgaard) Date: Tue, 2 Aug 2011 18:52:18 +0200 Subject: [R] Inserting column in between -- "better" way? In-Reply-To: <013701cc507b$d464d840$7d2e88c0$@edu> References: <013701cc507b$d464d840$7d2e88c0$@edu> Message-ID: <5FB1DE96-0F6B-4AEC-907C-92CE5CD956CA@gmail.com> On Aug 1, 2011, at 20:50 , David L Carlson wrote: > Actually Sara's method fails if the insertion is after the first or before > the last column: > >> x <- data.frame(A=1:3, B=1:3, C=1:3, D=1:3, E=1:3) >> newcol <- 4:6 >> cbind(x[,1], newcol, x[,2:ncol(x)]) > Sarah (sic) is on the right track, just lose the commas so that you don't drop to a vector: > x <- data.frame(A=1:3, B=1:3, C=1:3, D=1:3, E=1:3) > newcol <- 4:6 > cbind(x[1], newcol, x[2:ncol(x)]) A newcol B C D E 1 1 4 1 1 1 1 2 2 5 2 2 2 2 3 3 6 3 3 3 3 Also notice that there is a named form of cbind > cbind(x[1], foo=4:6, x[2:ncol(x)]) A foo B C D E 1 1 4 1 1 1 1 2 2 5 2 2 2 2 3 3 6 3 3 3 3 and that things will work (mostly) with matrices and data frames too: > newcol <- data.frame(x=4:6,y=6:4) > cbind(x[1], newcol, x[2:ncol(x)]) A x y B C D E 1 1 4 6 1 1 1 1 2 2 5 5 2 2 2 2 3 3 6 4 3 3 3 3 > cbind(x[1], as.matrix(newcol), x[2:ncol(x)]) A x y B C D E 1 1 4 6 1 1 1 1 2 2 5 5 2 2 2 2 3 3 6 4 3 3 3 3 (The "mostly" bit refers to some slight oddness occurring if you cbind a matrix with no column names: > cbind(x[1], cbind(4:6,7:9), x[2:ncol(x)]) A 1 2 B C D E 1 1 4 7 1 1 1 1 2 2 5 8 2 2 2 2 3 3 6 9 3 3 3 3 ) -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com "D?den skal tape!" --- Nordahl Grieg From studentofr at gmail.com Tue Aug 2 18:51:02 2011 From: studentofr at gmail.com (r student) Date: Tue, 2 Aug 2011 09:51:02 -0700 Subject: [R] density plot for weighted data Message-ID: I'm trying to create a density plot using census data, where the weights don't sum to 1. >plot(density(oh$FINCP,weights=oh$PWGTP)) Warning message: In density.default(oh$FINCP, weights = oh$PWGTP) : sum(weights) != 1 -- will not get true density How would I go about doing this? Thanks! From dimitri.liakhovitski at gmail.com Tue Aug 2 18:57:03 2011 From: dimitri.liakhovitski at gmail.com (Dimitri Liakhovitski) Date: Tue, 2 Aug 2011 12:57:03 -0400 Subject: [R] identifying weeks (dates) that certain days (dates) fall into In-Reply-To: References: Message-ID: Thanks a lot, everyone! Dimitri On Tue, Aug 2, 2011 at 12:34 PM, Dennis Murphy wrote: > Hi: > > You could try the lubridate package: > > library(lubridate) > week(weekly$week) > week(july4) > [1] 27 27 > >> week > function (x) > yday(x)%/%7 + 1 > > > which is essentially Gabor's code :) > > HTH, > Dennis > > On Tue, Aug 2, 2011 at 7:36 AM, Dimitri Liakhovitski > wrote: >> Hello! >> >> I have dates for the beginning of each week, e.g.: >> weekly<-data.frame(week=seq(as.Date("2010-04-01"), >> as.Date("2011-12-26"),by="week")) >> week ?# each week starts on a Monday >> >> I also have a vector of dates I am interested in, e.g.: >> july4<-as.Date(c("2010-07-04","2011-07-04")) >> >> I would like to flag the weeks in my weekly$week that contain those 2 >> individual dates. >> I can only think of a very clumsy way of doing it: >> >> myrows<-c(which(weekly$week==weekly$week[weekly$week>july4[1]][1]-7), >> ? ? ? ?which(weekly$week==weekly$week[weekly$week>july4[2]][1]-7)) >> weekly$flag<-0 >> weekly$flag[myrows]<-1 >> >> It's clumsy - because actually, my vector of dates of interest (july4 >> above) is much longer. >> Is there maybe a more elegant way of doing it? >> Thank you! >> -- >> Dimitri Liakhovitski >> marketfusionanalytics.com >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > -- Dimitri Liakhovitski marketfusionanalytics.com From ehlers at ucalgary.ca Tue Aug 2 19:01:43 2011 From: ehlers at ucalgary.ca (Peter Ehlers) Date: Tue, 02 Aug 2011 10:01:43 -0700 Subject: [R] lattice: index plot In-Reply-To: References: Message-ID: <4E382D77.1000700@ucalgary.ca> Does xyplot(y ~ seq_along(y), xlab = "Index") do what you want? Peter Ehlers On 2011-08-02 09:07, Thaler, Thorn, LAUSANNE, Applied Mathematics wrote: > Dear all, > > How can I make an index plot with lattice, that is plotting a vector > simply against its particular index in the vector, i.e. something > similar to > > y<- rnorm(10) > plot(y) > > I don't want to specify the x's manually, as this could become > cumbersome when having multiple panels. > > I tried something like > > library(lattice) > mp<- function(x, y, ...) { > x<- 1:length(y) > panel.xyplot(x, y, ...) > } > > pp<- function(x, y, ...) { > list(xlim = extendrange(1:length(y)), ylim = extendrange(y)) > } > > set.seed(123) > y<- rnorm(10) > xyplot(y ~ 1, panel = mp, prepanel = pp, xlab="Index") > > but I was wondering whether there is a more straightforward way? > > By the way, if I do not specify the ylim in the prepanel function the > plot is clipped, but reading Deepayan's book, p.140 : > > "[...], so a user-specified prepanel function is not required to return > all of these components [i.e. xlim, ylim, xat, yat, dx and dy]; any > missing component will be replaced by the corresponding default." > > I'd understand that if I do not specify ylim it is calculated > automatically? Not a big thing though, but it seems to me to be > inconsistent. > > Any help appreciated. > > KR, > > -Thorn > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From dwinsemius at comcast.net Tue Aug 2 19:06:43 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Tue, 2 Aug 2011 13:06:43 -0400 Subject: [R] density plot for weighted data In-Reply-To: References: Message-ID: On Aug 2, 2011, at 12:51 PM, r student wrote: > I'm trying to create a density plot using census data, where the > weights don't sum to 1. > > >> plot(density(oh$FINCP,weights=oh$PWGTP)) > > > Warning message: > In density.default(oh$FINCP, weights = oh$PWGTP) : > sum(weights) != 1 -- will not get true density > > > How would I go about doing this? Wouldn't you just divide by the sum? -- David Winsemius, MD West Hartford, CT From guillaume_bs at hotmail.com Tue Aug 2 19:09:20 2011 From: guillaume_bs at hotmail.com (Guillaume) Date: Tue, 2 Aug 2011 10:09:20 -0700 (PDT) Subject: [R] Memory limit in Aggregate() In-Reply-To: <7BE2D45D-B379-49E4-8A51-25914C7569EE@gmail.com> References: <1312278358738-3711819.post@n4.nabble.com> <1D3F4706-DFE7-4E0F-95D9-955331EEAAD0@gmail.com> <1312297814593-3712671.post@n4.nabble.com> <7BE2D45D-B379-49E4-8A51-25914C7569EE@gmail.com> Message-ID: <1312304960604-3713042.post@n4.nabble.com> Hi Peter, Yes I have a large number of factors in the listBy table. Do you mean that aggregate() creates a complete cartesian product of the "by" columns ? (and creates combinations of values that do not exist in the orignial "by" table, before removing them when returning the aggregated table?) Thanks a lot, Guillaume -- View this message in context: http://r.789695.n4.nabble.com/Memory-limit-in-Aggregate-tp3711819p3713042.html Sent from the R help mailing list archive at Nabble.com. From jagzbell at yahoo.com Tue Aug 2 19:00:56 2011 From: jagzbell at yahoo.com (Jagz Bell) Date: Tue, 2 Aug 2011 10:00:56 -0700 (PDT) Subject: [R] Data frame to matrix - revisited Message-ID: <1312304456.51386.YahooMailNeo@web121802.mail.ne1.yahoo.com> Hi, I've tried to look through all the previous related Threads/posts but can't find a solution to what's probably a simple question. ? I have a data frame comprised of three columns e.g.: ? ID1?ID2?Value a?b?1 b?d?1 c?a?2 c?e?1 d?a?1 e?d?2 ? I'd like to convert the data to a matrix i.e.: ? ?a b c d e a n/a 1 2 1 n/a b 1 n/a n/a 1 n/a? c 2 n/a n/a n/a 1 d 1 1 n/a n/a 2 e n/a n/a 1 2 n/a ? Any help is much appreciated, ? Jagz From gunter.berton at gene.com Tue Aug 2 19:17:56 2011 From: gunter.berton at gene.com (Bert Gunter) Date: Tue, 2 Aug 2011 10:17:56 -0700 Subject: [R] Inserting column in between -- "better" way? In-Reply-To: <5FB1DE96-0F6B-4AEC-907C-92CE5CD956CA@gmail.com> References: <013701cc507b$d464d840$7d2e88c0$@edu> <5FB1DE96-0F6B-4AEC-907C-92CE5CD956CA@gmail.com> Message-ID: Thanks for this Peter: > > Sarah (sic) is on the right track, just lose the commas so that you don't drop to a vector: > >> x <- data.frame(A=1:3, B=1:3, C=1:3, D=1:3, E=1:3) >> newcol <- 4:6 >> cbind(x[1], newcol, x[2:ncol(x)]) > ?A newcol B C D E > 1 1 ? ? ?4 1 1 1 1 > 2 2 ? ? ?5 2 2 2 2 > 3 3 ? ? ?6 3 3 3 3 > Am I correct in saying that this is a bit subtle: x[1] and x[2:ncol(x)] are actually lists with vector components; so you're cbinding lists, which retain the labels, no? If so, it's a nice subtlety to remember, anyway. -- Bert -- "Men by nature long to get on to the ultimate truths, and will often be impatient with elementary studies or fight shy of them. If it were possible to reach the ultimate truths without the elementary studies usually prefixed to them, these would not be preparatory studies but superfluous diversions." -- Maimonides (1135-1204) Bert Gunter Genentech Nonclinical Biostatistics From dwinsemius at comcast.net Tue Aug 2 19:29:06 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Tue, 2 Aug 2011 13:29:06 -0400 Subject: [R] density plot for weighted data In-Reply-To: References: Message-ID: <841DBA7E-1E01-4000-89DD-A2D0DFEE07FC@comcast.net> On Aug 2, 2011, at 1:11 PM, r student wrote: > Like below? > > plot(density(oh$FINCP,weights=oh$PWGTP/sum(oh$PWGTP))) I don't understand why you are asking for approval. You are the one with the data and know where they came from. We have none of that background. -- David. > On Tue, Aug 2, 2011 at 10:06 AM, David Winsemius > wrote: >> >> On Aug 2, 2011, at 12:51 PM, r student wrote: >> >>> I'm trying to create a density plot using census data, where the >>> weights don't sum to 1. >>> >>> >>>> plot(density(oh$FINCP,weights=oh$PWGTP)) >>> >>> >>> Warning message: >>> In density.default(oh$FINCP, weights = oh$PWGTP) : >>> sum(weights) != 1 -- will not get true density >>> >>> >>> How would I go about doing this? >> >> Wouldn't you just divide by the sum? >> >> -- >> >> David Winsemius, MD >> West Hartford, CT >> >> David Winsemius, MD West Hartford, CT From jvadams at usgs.gov Tue Aug 2 19:36:04 2011 From: jvadams at usgs.gov (Jean V Adams) Date: Tue, 2 Aug 2011 12:36:04 -0500 Subject: [R] Data frame to matrix - revisited In-Reply-To: <1312304456.51386.YahooMailNeo@web121802.mail.ne1.yahoo.com> References: <1312304456.51386.YahooMailNeo@web121802.mail.ne1.yahoo.com> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From pdalgd at gmail.com Tue Aug 2 19:47:00 2011 From: pdalgd at gmail.com (peter dalgaard) Date: Tue, 2 Aug 2011 19:47:00 +0200 Subject: [R] Inserting column in between -- "better" way? In-Reply-To: References: <013701cc507b$d464d840$7d2e88c0$@edu> <5FB1DE96-0F6B-4AEC-907C-92CE5CD956CA@gmail.com> Message-ID: On Aug 2, 2011, at 19:17 , Bert Gunter wrote: > Thanks for this Peter: > >> >> Sarah (sic) is on the right track, just lose the commas so that you don't drop to a vector: >> >>> x <- data.frame(A=1:3, B=1:3, C=1:3, D=1:3, E=1:3) >>> newcol <- 4:6 >>> cbind(x[1], newcol, x[2:ncol(x)]) >> A newcol B C D E >> 1 1 4 1 1 1 1 >> 2 2 5 2 2 2 2 >> 3 3 6 3 3 3 3 >> > > Am I correct in saying that this is a bit subtle: x[1] and > x[2:ncol(x)] are actually lists with vector components; so you're > cbinding lists, which retain the labels, no? Well, to be precise they are obtained by indexing a data frame _as_ a list. The result of that is a data frame (always, which was the point). So you're cbind()-ing data frames, which is what you wanted to do all along. > > If so, it's a nice subtlety to remember, anyway. > > -- Bert > > > -- > "Men by nature long to get on to the ultimate truths, and will often > be impatient with elementary studies or fight shy of them. If it were > possible to reach the ultimate truths without the elementary studies > usually prefixed to them, these would not be preparatory studies but > superfluous diversions." > > -- Maimonides (1135-1204) > > Bert Gunter > Genentech Nonclinical Biostatistics -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com "D?den skal tape!" --- Nordahl Grieg From pdalgd at gmail.com Tue Aug 2 20:09:57 2011 From: pdalgd at gmail.com (peter dalgaard) Date: Tue, 2 Aug 2011 20:09:57 +0200 Subject: [R] Memory limit in Aggregate() In-Reply-To: <1312304960604-3713042.post@n4.nabble.com> References: <1312278358738-3711819.post@n4.nabble.com> <1D3F4706-DFE7-4E0F-95D9-955331EEAAD0@gmail.com> <1312297814593-3712671.post@n4.nabble.com> <7BE2D45D-B379-49E4-8A51-25914C7569EE@gmail.com> <1312304960604-3713042.post@n4.nabble.com> Message-ID: On Aug 2, 2011, at 19:09 , Guillaume wrote: > Hi Peter, > > Yes I have a large number of factors in the listBy table. > > Do you mean that aggregate() creates a complete cartesian product of the > "by" columns ? (and creates combinations of values that do not exist in the > orignial "by" table, before removing them when returning the aggregated > table?) Hm, at least in recent versions that shouldn't happen. The "meat" of aggregate.data.frame is ans <- lapply(split(e, grp), FUN, ...) where grp is a numerical coding of the factor combination for each cell. That could conceivably contain some large values, but since it is numeric (and not a factor with levels, say, 0:(n1*n2*n3*n4-1)), split should not generate more groups than are present in data. Some of this stuff was rewritten in Jan 2010. You might want to try a version which is later than yours from May 2009... > > > Thanks a lot, > Guillaume > > -- > View this message in context: http://r.789695.n4.nabble.com/Memory-limit-in-Aggregate-tp3711819p3713042.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com "D?den skal tape!" --- Nordahl Grieg From studentofr at gmail.com Tue Aug 2 19:11:42 2011 From: studentofr at gmail.com (r student) Date: Tue, 2 Aug 2011 10:11:42 -0700 Subject: [R] density plot for weighted data In-Reply-To: References: Message-ID: Like below? plot(density(oh$FINCP,weights=oh$PWGTP/sum(oh$PWGTP))) On Tue, Aug 2, 2011 at 10:06 AM, David Winsemius wrote: > > On Aug 2, 2011, at 12:51 PM, r student wrote: > >> I'm trying to create a density plot using census data, where the >> weights don't sum to 1. >> >> >>> plot(density(oh$FINCP,weights=oh$PWGTP)) >> >> >> Warning message: >> In density.default(oh$FINCP, weights = oh$PWGTP) : >> ?sum(weights) != 1 ?-- will not get true density >> >> >> How would I go about doing this? > > Wouldn't you just divide by the sum? > > -- > > David Winsemius, MD > West Hartford, CT > > From johnsen at fas.harvard.edu Tue Aug 2 20:21:20 2011 From: johnsen at fas.harvard.edu (Sverre Stausland) Date: Tue, 2 Aug 2011 14:21:20 -0400 Subject: [R] Extract names from vector according to their values Message-ID: Dear helpers, I can create a vector with the priority of the packages that came with R, like this: > installed.packages()[,"Priority"]->my.vector > my.vector base boot class cluster codetools "base" "recommended" "recommended" "recommended" "recommended" compiler datasets foreign graphics grDevices "base" "base" "recommended" "base" "base" grid KernSmooth lattice MASS Matrix "base" "recommended" "recommended" "recommended" "recommended" methods mgcv nlme nnet rpart "base" "recommended" "recommended" "recommended" "recommended" spatial splines stats stats4 survival "recommended" "base" "base" "base" "recommended" tcltk tools utils "base" "base" "base" How can I extract the names from this vector according to their priority? I.e. I want to create a vector from this with the names of the "base" packages, and another vector with the names of the "recommended" packages. Thank you Sverre From djmuser at gmail.com Tue Aug 2 20:30:49 2011 From: djmuser at gmail.com (Dennis Murphy) Date: Tue, 2 Aug 2011 11:30:49 -0700 Subject: [R] Data frame to matrix - revisited In-Reply-To: <1312304456.51386.YahooMailNeo@web121802.mail.ne1.yahoo.com> References: <1312304456.51386.YahooMailNeo@web121802.mail.ne1.yahoo.com> Message-ID: Hi: Here are a couple of ways. Since your data frame does not contain a 'c' in ID2, we redefine the factor to give it all five levels rather than the observed four: > df <- read.table(textConnection(" + ID1 ID2 Value + a b 1 + b d 1 + c a 2 + c e 1 + d a 1 + e d 2"), header = TRUE) str(df) > str(df) 'data.frame': 6 obs. of 3 variables: $ ID1 : Factor w/ 5 levels "a","b","c","d",..: 1 2 3 3 4 5 $ ID2 : Factor w/ 4 levels "a","b","d","e": 2 3 1 4 1 3 $ Value: int 1 1 2 1 1 2 df$ID2 <- factor(df$ID2, levels = letters[1:5]) > str(df) 'data.frame': 6 obs. of 3 variables: $ ID1 : Factor w/ 5 levels "a","b","c","d",..: 1 2 3 3 4 5 $ ID2 : Factor w/ 5 levels "a","b","c","d",..: 2 4 1 5 1 4 $ Value: int 1 1 2 1 1 2 Now we're good... # (1) xtabs: with(df, xtabs(Value ~ ID1 + ID2) + xtabs(Value ~ ID2 + ID1)) ID2 ID1 a b c d e a 0 1 2 1 0 b 1 0 0 1 0 c 2 0 0 0 1 d 1 1 0 0 2 e 0 0 1 2 0 # (2) acast() in the reshape2 package: library('reshape2') v1 <- acast(df, ID1 ~ ID2, value_var = 'Value', drop = FALSE, fill = 0) v2 <- acast(df, ID2 ~ ID1, value_var = 'Value', drop = FALSE, fill = 0) v <- v1 + v2 v[v == 0L] <- NA v a b c d e a NA 1 2 1 NA b 1 NA NA 1 NA c 2 NA NA NA 1 d 1 1 NA NA 2 e NA NA 1 2 NA HTH, Dennis On Tue, Aug 2, 2011 at 10:00 AM, Jagz Bell wrote: > Hi, > I've tried to look through all the previous related Threads/posts but can't find a solution to what's probably a simple question. > > I have a data frame comprised of three columns e.g.: > > ID1?ID2?Value > a?b?1 > b?d?1 > c?a?2 > c?e?1 > d?a?1 > e?d?2 > > I'd like to convert the data to a matrix i.e.: > > ?a b c d e > a n/a 1 2 1 n/a > b 1 n/a n/a 1 n/a > c 2 n/a n/a n/a 1 > d 1 1 n/a n/a 2 > e n/a n/a 1 2 n/a > > Any help is much appreciated, > > Jagz > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From dwinsemius at comcast.net Tue Aug 2 20:34:19 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Tue, 2 Aug 2011 14:34:19 -0400 Subject: [R] Extract names from vector according to their values In-Reply-To: References: Message-ID: <797129A3-9633-4943-AB53-98DC62A808FA@comcast.net> On Aug 2, 2011, at 2:21 PM, Sverre Stausland wrote: > Dear helpers, > > I can create a vector with the priority of the packages that came with > R, like this: > >> installed.packages()[,"Priority"]->my.vector >> my.vector > base boot class cluster codetools > "base" "recommended" "recommended" "recommended" "recommended" > compiler datasets foreign graphics grDevices > "base" "base" "recommended" "base" "base" > grid KernSmooth lattice MASS Matrix > "base" "recommended" "recommended" "recommended" "recommended" > methods mgcv nlme nnet rpart > "base" "recommended" "recommended" "recommended" "recommended" > spatial splines stats stats4 survival > "recommended" "base" "base" "base" "recommended" > tcltk tools utils > "base" "base" "base" > > How can I extract the names from this vector according to their > priority? I.e. I want to create a vector from this with the names of > the "base" packages, and another vector with the names of the > "recommended" packages. > names( my.vector[which(my.vector=="recommended")]) [1] "boot" "class" "cluster" [4] "codetools" "foreign" "KernSmooth" [7] "lattice" "MASS" "Matrix" [10] "mgcv" "nlme" "nnet" [13] "rpart" "spatial" "survival" Note that some people may tell you that this form below should be preferred because the 'which' is superfluous. It is not. The "[" function returns all the NA's fr reasons that are unclear to me. It is wiser to use `which` so that you get numerical indexing. > names(my.vector[my.vector=="recommended"]) On my system it produces 493 items most of them NA's. -- David Winsemius, MD West Hartford, CT From jvadams at usgs.gov Tue Aug 2 20:34:42 2011 From: jvadams at usgs.gov (Jean V Adams) Date: Tue, 2 Aug 2011 13:34:42 -0500 Subject: [R] Extract names from vector according to their values In-Reply-To: References: Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From djmuser at gmail.com Tue Aug 2 20:41:04 2011 From: djmuser at gmail.com (Dennis Murphy) Date: Tue, 2 Aug 2011 11:41:04 -0700 Subject: [R] Extract names from vector according to their values In-Reply-To: References: Message-ID: Hi: One more possibility: > names(my.vector[grep('recommended', my.vector)]) [1] "Matrix" "boot" "class" "cluster" "codetools" [6] "foreign" "KernSmooth" "lattice" "MASS" "Matrix" [11] "mgcv" "nlme" "nnet" "rpart" "spatial" [16] "survival" > names(my.vector[grep('base', my.vector)]) [1] "base" "compiler" "datasets" "graphics" "grDevices" "grid" [7] "methods" "splines" "stats" "stats4" "tcltk" "tools" [13] "utils" HTH, Dennis On Tue, Aug 2, 2011 at 11:21 AM, Sverre Stausland wrote: > Dear helpers, > > I can create a vector with the priority of the packages that came with > R, like this: > >> installed.packages()[,"Priority"]->my.vector >> my.vector > ? ? ? ? base ? ? ? ? ?boot ? ? ? ? class ? ? ? cluster ? ? codetools > ? ? ? "base" "recommended" "recommended" "recommended" "recommended" > ? ? compiler ? ? ?datasets ? ? ? foreign ? ? ?graphics ? ? grDevices > ? ? ? "base" ? ? ? ?"base" "recommended" ? ? ? ?"base" ? ? ? ?"base" > ? ? ? ? grid ? ?KernSmooth ? ? ? lattice ? ? ? ? ?MASS ? ? ? ?Matrix > ? ? ? "base" "recommended" "recommended" "recommended" "recommended" > ? ? ?methods ? ? ? ? ?mgcv ? ? ? ? ?nlme ? ? ? ? ?nnet ? ? ? ? rpart > ? ? ? "base" "recommended" "recommended" "recommended" "recommended" > ? ? ?spatial ? ? ? splines ? ? ? ? stats ? ? ? ?stats4 ? ? ?survival > "recommended" ? ? ? ?"base" ? ? ? ?"base" ? ? ? ?"base" "recommended" > ? ? ? ?tcltk ? ? ? ? tools ? ? ? ? utils > ? ? ? "base" ? ? ? ?"base" ? ? ? ?"base" > > How can I extract the names from this vector according to their > priority? I.e. I want to create a vector from this with the names of > the "base" packages, and another vector with the names of the > "recommended" packages. > > Thank you > Sverre > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From pomchip at free.fr Tue Aug 2 20:51:42 2011 From: pomchip at free.fr (=?ISO-8859-1?Q?S=E9bastien_Bihorel?=) Date: Tue, 2 Aug 2011 14:51:42 -0400 Subject: [R] Need to compute density as done by panel.histogram Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From reith_william at bah.com Tue Aug 2 20:39:29 2011 From: reith_william at bah.com (wwreith) Date: Tue, 2 Aug 2011 11:39:29 -0700 (PDT) Subject: [R] 3D Bar Graphs in ggplot2? Message-ID: <1312310369354-3713305.post@n4.nabble.com> Does anyone know how to create a 3D Bargraph using ggplot2/qplot. I don't mean 3D as in x,y,z coordinates. Just a 2D bar graph with a 3D shaped bard. See attached excel file for an example. Before anyone asks I know that 3D looking bars don't add anything except "prettiness". http://r.789695.n4.nabble.com/file/n3713305/Example.xlsx Example.xlsx -- View this message in context: http://r.789695.n4.nabble.com/3D-Bar-Graphs-in-ggplot2-tp3713305p3713305.html Sent from the R help mailing list archive at Nabble.com. From tlumley at uw.edu Tue Aug 2 22:18:44 2011 From: tlumley at uw.edu (Thomas Lumley) Date: Wed, 3 Aug 2011 08:18:44 +1200 Subject: [R] density plot for weighted data In-Reply-To: References: Message-ID: On Wed, Aug 3, 2011 at 5:11 AM, r student wrote: > Like below? > > plot(density(oh$FINCP,weights=oh$PWGTP/sum(oh$PWGTP))) > Yes If you are doing lots of analyses with weighted data you might want to look at the survey package. It also has a density estimator, in svysmooth(), which works very much the same way as density() for weighted data, but doesn't complain about rescaling the weights. -thomas -- Thomas Lumley Professor of Biostatistics University of Auckland From diggsb at ohsu.edu Tue Aug 2 22:25:21 2011 From: diggsb at ohsu.edu (Brian Diggs) Date: Tue, 2 Aug 2011 13:25:21 -0700 Subject: [R] 3D Bar Graphs in ggplot2? In-Reply-To: <1312310369354-3713305.post@n4.nabble.com> References: <1312310369354-3713305.post@n4.nabble.com> Message-ID: <4E385D31.7090504@ohsu.edu> On 8/2/2011 11:39 AM, wwreith wrote: > Does anyone know how to create a 3D Bargraph using ggplot2/qplot. I don't > mean 3D as in x,y,z coordinates. Just a 2D bar graph with a 3D shaped bard. > See attached excel file for an example. It is not possible. > Before anyone asks I know that 3D looking bars don't add anything except > "prettiness". That is being far too generous. Setting aside that "prettiness" may be subjective, if changing to a 3D effect only affected the "prettiness," there would not be the negative reaction to it that there is. In fact, a 3D effect reduces the ability for a graph to be correctly understood. It distorts the data. When I see a 3D bar plot, I think "This person wants me to think I've been presented with information, but they have deliberately chosen a format that distorts the data. I wonder what they are hiding?" At least you didn't ask about a 3D pie chart. > http://r.789695.n4.nabble.com/file/n3713305/Example.xlsx Example.xlsx > > -- > View this message in context: http://r.789695.n4.nabble.com/3D-Bar-Graphs-in-ggplot2-tp3713305p3713305.html > Sent from the R help mailing list archive at Nabble.com. -- Brian S. Diggs, PhD Senior Research Associate, Department of Surgery Oregon Health & Science University From murdoch.duncan at gmail.com Tue Aug 2 22:51:38 2011 From: murdoch.duncan at gmail.com (Duncan Murdoch) Date: Tue, 02 Aug 2011 16:51:38 -0400 Subject: [R] 3D Bar Graphs in ggplot2? In-Reply-To: <1312310369354-3713305.post@n4.nabble.com> References: <1312310369354-3713305.post@n4.nabble.com> Message-ID: <4E38635A.1000009@gmail.com> On 11-08-02 2:39 PM, wwreith wrote: > Does anyone know how to create a 3D Bargraph using ggplot2/qplot. I don't > mean 3D as in x,y,z coordinates. Just a 2D bar graph with a 3D shaped bard. > See attached excel file for an example. > > Before anyone asks I know that 3D looking bars don't add anything except > "prettiness". If you want graphs like that, you should be using Excel, not R. Duncan Murdoch > > http://r.789695.n4.nabble.com/file/n3713305/Example.xlsx Example.xlsx > > -- > View this message in context: http://r.789695.n4.nabble.com/3D-Bar-Graphs-in-ggplot2-tp3713305p3713305.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From paulepanter at users.sourceforge.net Tue Aug 2 23:27:58 2011 From: paulepanter at users.sourceforge.net (Paul Menzel) Date: Tue, 2 Aug 2011 23:27:58 +0200 Subject: [R] How to find the parameter of a power function to fit simulation data to it for the tail? Message-ID: <1312320478.3658.258.camel@mattotaupa> Dear R folks, having simulation data in a vector n2off, I know that they should be similar to a power function f [1], f(n) = n^(-1/r), r ? ?\{0}, and I want to find the value for r best fitting the simulation data. Furthermore I know that this is only true for big n, that means n2off(n) ~ f(n) ? n2off(n)/f(n) ? 1 for n ? ?. (The vector n2off is considered a function n2off(n).) I came up with the following example where I artificially munch(?) the values of a known function, n^(-?), and the fit should hopefully return r = 2, that means n^(-?). > n <- 1:10 # Should be more data points, but not useful for including into an email. > n [1] 1 2 3 4 5 6 7 8 9 10 > n2 <- n**(-0.5) > n2 [1] 1.0000000 0.7071068 0.5773503 0.5000000 0.4472136 0.4082483 0.3779645 [8] 0.3535534 0.3333333 0.3162278 > set.seed(1); n2off <- n2 + runif(1)/100 # for greater n the divisor should also be increased I guess. > n2off [1] 1.0026551 0.7097619 0.5800054 0.5026551 0.4498687 0.4109034 0.3806196 [8] 0.3562085 0.3359884 0.3188829 Weighting fits(?) larger n higher or only from certain n on, for example n ? 100, is not considered in this example. And probably the data points are too small in this function. I have to admit that I am new to this topic and I am just overwhelmed what I have found when searching for ?gafit? in the r-help archive [2] and ?curve parameter fitting? in rseek.org [3]. Reading ?nlm, ?nlminb, ?opitimze and ?optim there are just too many options there. Reading about gafit [4] it says that it is not maintained. Additionally I am not sure if this could be turned into a linear model using log(n^(-1/r)) = -1/r log(n). Somewhere it said that linear regression models have to fulfill certain assumptions. So if somebody of you experienced users could point me to the ?best? function or package to use here and some literature regarding this issue (fit only for big n) that would be much appreciated. Thank you in advance, Paul PS: Is that question too long for sending to the list and should I be less elaborate for further problems? [1] https://secure.wikimedia.org/wikipedia/en/wiki/Power_function [2] http://tolstoy.newcastle.edu.au/~rking/R/ [3] http://www.rseek.org/ [4] http://cran.r-project.org/web/packages/gafit/gafit.pdf -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part URL: From rolf.turner at xtra.co.nz Tue Aug 2 23:45:37 2011 From: rolf.turner at xtra.co.nz (Rolf Turner) Date: Wed, 03 Aug 2011 09:45:37 +1200 Subject: [R] Inverse of FAQ 7.31. In-Reply-To: <9FD03D32-FA9C-496A-BC25-AFC76389EA55@gmail.com> References: <4E379307.2020208@xtra.co.nz> <9FD03D32-FA9C-496A-BC25-AFC76389EA55@gmail.com> Message-ID: <4E387001.9090605@xtra.co.nz> Thanks to Peter Dalgaard and to Baptiste Auguie (off-list) for the insights they provided. cheers, Rolf turner From johjeffrey at hotmail.com Tue Aug 2 23:47:51 2011 From: johjeffrey at hotmail.com (Jeffrey Joh) Date: Tue, 2 Aug 2011 14:47:51 -0700 Subject: [R] Calculate mean ignore null Message-ID: I have the following: Tout = c(". ", ". ", + "-51.0", " -9.6", " -9.6", " -9.6", " -9.6", " -9.6", " -9.6", + " -9.6", " -9.5", " -9.5", " -9.6", " -9.5", " -9.6", " -9.6", + " -9.5", " -9.4", " -9.3", " -9.3", " -9.3", " -9.2", " -9.0", + " -9.0", " -8.9", " -8.9", " -8.9") How can I take the mean while ignoring the null values? I don't want to delete the ". ", just ignore. na.rm=TRUE does not work for this. Jeffrey From michael.weylandt at gmail.com Tue Aug 2 23:56:17 2011 From: michael.weylandt at gmail.com (R. Michael Weylandt ) Date: Tue, 2 Aug 2011 17:56:17 -0400 Subject: [R] Calculate mean ignore null In-Reply-To: References: Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From gavin.simpson at ucl.ac.uk Wed Aug 3 01:21:23 2011 From: gavin.simpson at ucl.ac.uk (Gavin Simpson) Date: Wed, 03 Aug 2011 00:21:23 +0100 Subject: [R] Help with R In-Reply-To: References: <20110728074401.00e739d15504d69ae1e2e1dc974ae386.da77c4ccb6.wbe@email01.secureserver.net> Message-ID: <1312327283.4938.1.camel@chrysothemis.geog.ucl.ac.uk> On Thu, 2011-07-28 at 11:58 -0400, Sarah Goslee wrote: > Hi Mark, > > On Thu, Jul 28, 2011 at 10:44 AM, wrote: > > > > 1. How can I plot the entire tree produced by rpart? > > What does plot() not do that you are expecting? Not do any labelling... ;-) text(tree) where `tree` is your fitted tree will add the labels after using `plot()` as per Sarah's reply. > > > 2. How can I submit a vector of values to a tree produced by rpart and have > > it make an assignment? > > What does predict() not do that you are expecting? Indeed. G -- %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% Dr. Gavin Simpson [t] +44 (0)20 7679 0522 ECRC, UCL Geography, [f] +44 (0)20 7679 0565 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/ UK. WC1E 6BT. [w] http://www.freshwaters.org.uk %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% From cb1 at ualberta.ca Wed Aug 3 01:25:33 2011 From: cb1 at ualberta.ca (Colin Bergeron) Date: Tue, 2 Aug 2011 17:25:33 -0600 Subject: [R] Wrong values when projecting LatLong in UTM Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From andywinterman at gmail.com Wed Aug 3 01:45:15 2011 From: andywinterman at gmail.com (Andrew Winterman) Date: Tue, 2 Aug 2011 16:45:15 -0700 Subject: [R] xlsx error Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From asaguiar at spsconsultoria.com Wed Aug 3 01:53:53 2011 From: asaguiar at spsconsultoria.com (Alexandre Aguiar) Date: Tue, 2 Aug 2011 20:53:53 -0300 Subject: [R] [Rd] example package for devel newcomers Message-ID: <201108022053.54027@spsconsultoria.com> Em Segunda 01 Agosto 2011, voc? escreveu: > Is there a preferred language you would like to use in your package > development? I randomly downloaded packages until I found some that > helped me along my way, and might be able to help you pick one. If you > are just looking at building a package of R functions and data you > have developed, possibly the following example will get you started > till you feel comfortable with the "Writing R Extensions" > documentation (http://cran.r-project.org/doc/manuals/R-exts.pdf): Dan, your message is cool. Well, here is what my project is about: it is a package to embed php into R. Named Rphp for now. It is mostly done from scratch. I have loved R-exts.pdf. Great stuff. Why embed php into R? My primary purpose is to use web content management systems (WCMS) ready and extensively tested code from R cgi scripts. Someone more experienced with php might think of other uses. My approach is RAD(ical) and innovative (IMextremelyHO :-D) because: a) *any* php based WCMS can be used from R code with no php or html coding; b) output fully compliant with the website appearance; c) WCMS automatic upgrades and interfaces changes (skins or themes) will be so unlikey to cause need for maintenance in R cgi scripts; d) R cgi scripts will not demand changes in php code; e) the builtin php session support obviates the need for any special session coding by R (likely non-web) programmers; f) potential for improved analysis of web databases and even of systems surveillance tasks. During my explorations of the R interface for extensions and the time spent in this tiny project, some questions emerged. 1. my code uses no recursion but I do not really know what is inside php code. Stack size could be a concern. Has any of you there ever needed to allocate a new stack for a package? Is it better to wait for complaints (if anyone ever would like to try this package...)? 2. can R_registerRoutines be called more than once within the same library (the same DllInfo data) so that it can reconfigure itself on the fly? 3. Is it safe (I guess it is) to "re-export" a function pointer retrieved with R_GetCCallable? 4. when loading a second library (in this case libphp5.so) is it better to put it in the package library directory and load it using the 'char *path' member of DllInfo? Using a second library has implications: a) a given R setup can be limited to the user space without root access; b) in the case of desktops where someone might use Rphp, most systems do not have libphp5.so installed by default and installing it frequently means to install apache and all (many) related packages; c) many sysadmins do not have root access but can compile their own php version; d) building the libphp5.so may not be an easy task for many. 5. Similar to 3, is it safe to "export" functions of the second library? libphp5.so will not be registered to R and has some interesting functions that can be "exported" directly or as pointers within Rphp library. A stub function can be used. 6. related to 4, with the many machine architectures and operating systems around I think it is neither desirable nor feasible to distribute precompiled libphp5.so versions; the package itself can download (wget and curl are everywhere) and compile php. Compiling php is not a lengthy task (6m12.9s in my quadcore desktop) but is a lot tricky and demands several development packages not installed by default in desktop systems. Their installations would require root access. What is the suggested approach to deploy libphp5.so? 7. I do not know how to produce a version for windows if requested. I have only an old MSC++ 97 and lcc (current) and have xp in a virtual box. This concern includes php. Can I get help regarding windows in this list? It might mean actual work: adapting code, compiling, packaging, etc. Not sure what is needed. 8. system safety does not seem a concern regarding this use of php, but... Any suggestions? I guess some manual steps will be necessary because of potential security breaches related to the use of a second library. Patching php to produce a special build to be used as the package library would not be a trivial task and would demand updates at every new php version. Something I can't assure I can do. And would have to distribute the whole php source code: still have to study php licensing scheme. BTW, I copied Rdynpriv.h by hand to my include path to get access to 'struct _DllInfo' definition. The R install process did not copy this file. Am I doing something wrong here? Sorry for the lengthy message. Thanx for your help. -- Alexandre -- Alexandre Santos Aguiar, MD, SCT -------------- Pr?xima Parte ---------- Um anexo n?o-texto foi limpo... Nome: n?o dispon?vel Tipo: application/pgp-signature Tamanho: 198 bytes Descri??o: This is a digitally signed message part. URL: From JSorkin at grecc.umaryland.edu Wed Aug 3 02:07:53 2011 From: JSorkin at grecc.umaryland.edu (John Sorkin) Date: Tue, 02 Aug 2011 20:07:53 -0400 Subject: [R] Is a string all blanks? In-Reply-To: References: Message-ID: <4E385919020000CB000919C3@medicine.umaryland.edu> windows 7 R 2.12.1 Is there any easy way to determine if a sting contains nothing but blanks? I need to check a series of strings of various length. OneBlank <- " " TwoBlanks <- " " ThreeBlanks <- " " NoBlanks <- "NoBlanks" What I want would be a function such as ALLBLANKS that would return the following ALLBLANKS(OneBlank) would be true (or 1) ALLBLANKS(TwoBlanks) would be true (or 1) ALLBLANKS(ThreeBlanks) would be true (or 1) ALLBLANKS(NoBlanks) would be false (or 0) Thanks, John John David Sorkin M.D., Ph.D. Chief, Biostatistics and Informatics University of Maryland School of Medicine Division of Gerontology Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524 (Phone) 410-605-7119 (Fax) 410-605-7913 (Please call phone number above prior to faxing) Confidentiality Statement: This email message, including any attachments, is for th...{{dropped:6}} From dwinsemius at comcast.net Wed Aug 3 02:12:49 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Tue, 2 Aug 2011 20:12:49 -0400 Subject: [R] Is a string all blanks? In-Reply-To: <4E385919020000CB000919C3@medicine.umaryland.edu> References: <4E385919020000CB000919C3@medicine.umaryland.edu> Message-ID: <13103664-D690-4A31-A94F-11E029477C91@comcast.net> On Aug 2, 2011, at 8:07 PM, John Sorkin wrote: > windows 7 > R 2.12.1 > > Is there any easy way to determine if a sting contains nothing but > blanks? I need to check a series of strings of various length. > > OneBlank <- " " > TwoBlanks <- " " > ThreeBlanks <- " " > NoBlanks <- "NoBlanks" > > What I want would be a function such as ALLBLANKS that would return > the following > ALLBLANKS(OneBlank) would be true (or 1) > ALLBLANKS(TwoBlanks) would be true (or 1) > ALLBLANKS(ThreeBlanks) would be true (or 1) > ALLBLANKS(NoBlanks) would be false (or 0) > bvec <- c(OneBlank, TwoBlanks ,ThreeBlanks ,NoBlanks) > grepl(" +", bvec) [1] TRUE TRUE TRUE FALSE -- David Winsemius, MD West Hartford, CT From peter.langfelder at gmail.com Wed Aug 3 02:13:19 2011 From: peter.langfelder at gmail.com (Peter Langfelder) Date: Tue, 2 Aug 2011 17:13:19 -0700 Subject: [R] Is a string all blanks? In-Reply-To: <4E385919020000CB000919C3@medicine.umaryland.edu> References: <4E385919020000CB000919C3@medicine.umaryland.edu> Message-ID: On Tue, Aug 2, 2011 at 5:07 PM, John Sorkin wrote: > windows 7 > R 2.12.1 > > Is there any easy way to determine if a sting contains nothing but blanks? I need to check a series of strings of various length. > > OneBlank <- " " > TwoBlanks <- " ?" > ThreeBlanks <- " ? " > NoBlanks <- "NoBlanks" > > What I want would be a function such as ALLBLANKS that would return the following > ALLBLANKS(OneBlank) would be true (or 1) > ALLBLANKS(TwoBlanks) would be true (or 1) > ALLBLANKS(ThreeBlanks) would be true (or 1) > ALLBLANKS(NoBlanks) would be false (or 0) ALLBLANKS = function(x) {x==paste(rep(" ", nchar(x)), collapse = "")} > ALLBLANKS(" ") [1] TRUE > ALLBLANKS(" ") [1] TRUE > ALLBLANKS(" A ") [1] FALSE HTH, Peter From dwinsemius at comcast.net Wed Aug 3 02:14:23 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Tue, 2 Aug 2011 20:14:23 -0400 Subject: [R] How to find the parameter of a power function to fit simulation data to it for the tail? In-Reply-To: <1312320478.3658.258.camel@mattotaupa> References: <1312320478.3658.258.camel@mattotaupa> Message-ID: <292F9013-378E-4B5F-9D0E-831E6AD693EB@comcast.net> From a search at Barons site with "fitting distribution truncated power" http://finzi.psych.upenn.edu/R/library/brainwaver/html/fitting.html http://finzi.psych.upenn.edu/R/library/gamlss/doc/gamlss-manual.pdf http://finzi.psych.upenn.edu/R/library/bipartite/html/degreedistr.html http://finzi.psych.upenn.edu/R/Rhelp02a/archive/90168.html -- David. On Aug 2, 2011, at 5:27 PM, Paul Menzel wrote: > Dear R folks, > > > having simulation data in a vector n2off, I know that they should be > similar to a power function f [1], f(n) = n^(-1/r), r ? ?\{0}, > and I > want to find the value for r best fitting the simulation data. > Furthermore I know that this is only true for big n, that means > n2off(n) > ~ f(n) ? n2off(n)/f(n) ? 1 for n ? ?. (The vector n2off is > considered a > function n2off(n).) > > I came up with the following example where I artificially munch(?) the > values of a known function, n^(-?), and the fit should hopefully > return > r = 2, that means n^(-?). > >> n <- 1:10 # Should be more data points, but not useful for >> including into an email. >> n > [1] 1 2 3 4 5 6 7 8 9 10 >> n2 <- n**(-0.5) >> n2 > [1] 1.0000000 0.7071068 0.5773503 0.5000000 0.4472136 > 0.4082483 0.3779645 > [8] 0.3535534 0.3333333 0.3162278 >> set.seed(1); n2off <- n2 + runif(1)/100 # for greater n the divisor >> should also be increased I guess. >> n2off > [1] 1.0026551 0.7097619 0.5800054 0.5026551 0.4498687 > 0.4109034 0.3806196 > [8] 0.3562085 0.3359884 0.3188829 > > Weighting fits(?) larger n higher or only from certain n on, for > example > n ? 100, is not considered in this example. And probably the data > points > are too small in this function. > > I have to admit that I am new to this topic and I am just overwhelmed > what I have found when searching for ?gafit? in the r-help archive > [2] > and ?curve parameter fitting? in rseek.org [3]. > > Reading ?nlm, ?nlminb, ?opitimze and ?optim there are just too many > options there. Reading about gafit [4] it says that it is not > maintained. > > Additionally I am not sure if this could be turned into a linear model > using log(n^(-1/r)) = -1/r log(n). Somewhere it said that linear > regression models have to fulfill certain assumptions. > > So if somebody of you experienced users could point me to the > ?best? > function or package to use here and some literature regarding this > issue > (fit only for big n) that would be much appreciated. > > > Thank you in advance, > > Paul > > > PS: Is that question too long for sending to the list and should I be > less elaborate for further problems? > > > [1] https://secure.wikimedia.org/wikipedia/en/wiki/Power_function > [2] http://tolstoy.newcastle.edu.au/~rking/R/ > [3] http://www.rseek.org/ > [4] http://cran.r-project.org/web/packages/gafit/gafit.pdf > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT From jholtman at gmail.com Wed Aug 3 02:30:37 2011 From: jholtman at gmail.com (jim holtman) Date: Tue, 2 Aug 2011 20:30:37 -0400 Subject: [R] Is a string all blanks? In-Reply-To: <4E385919020000CB000919C3@medicine.umaryland.edu> References: <4E385919020000CB000919C3@medicine.umaryland.edu> Message-ID: If the question is that the string contains all blanks, then a regular expression will probably be best: > OneBlank <- " " > TwoBlanks <- " " > ThreeBlanks <- " " > NoBlanks <- "NoBlanks" > bvec <- c(OneBlank, TwoBlanks ,ThreeBlanks ,NoBlanks) > bvec <- c(OneBlank, TwoBlanks ,ThreeBlanks ,NoBlanks, " A ") > grepl("^ *$", bvec) [1] TRUE TRUE TRUE FALSE FALSE > On Tue, Aug 2, 2011 at 8:07 PM, John Sorkin wrote: > windows 7 > R 2.12.1 > > Is there any easy way to determine if a sting contains nothing but blanks? I need to check a series of strings of various length. > > OneBlank <- " " > TwoBlanks <- " ?" > ThreeBlanks <- " ? " > NoBlanks <- "NoBlanks" > > What I want would be a function such as ALLBLANKS that would return the following > ALLBLANKS(OneBlank) would be true (or 1) > ALLBLANKS(TwoBlanks) would be true (or 1) > ALLBLANKS(ThreeBlanks) would be true (or 1) > ALLBLANKS(NoBlanks) would be false (or 0) > > Thanks, > John > > John David Sorkin M.D., Ph.D. > Chief, Biostatistics and Informatics > University of Maryland School of Medicine Division of Gerontology > Baltimore VA Medical Center > 10 North Greene Street > GRECC (BT/18/GR) > Baltimore, MD 21201-1524 > (Phone) 410-605-7119 > (Fax) 410-605-7913 (Please call phone number above prior to faxing) > > Confidentiality Statement: > This email message, including any attachments, is for ...{{dropped:18}} From lee.kitty at yahoo.com Wed Aug 3 02:52:31 2011 From: lee.kitty at yahoo.com (Kitty Lee) Date: Tue, 2 Aug 2011 17:52:31 -0700 (PDT) Subject: [R] Matplot & Polygon Message-ID: <1312332751.4600.YahooMailClassic@web36304.mail.mud.yahoo.com> I used matplot to plot multiple lines (over 300 lines). I'd like to draw a polygon and shade the area between upper and lower boundary. I know the plot function works pretty well with polygon(). How about matplot? I have too many lines and my upper/lower boundaries are formed by many different lines. Matplot seems to be more efficient in plotting our all the lines...or perhaps I'm wrong on this. I'd like to hear other suggestions. Thanks. K. From xenon99 at hotmail.com Wed Aug 3 01:32:22 2011 From: xenon99 at hotmail.com (Darius H) Date: Tue, 2 Aug 2011 23:32:22 +0000 Subject: [R] Writing multiple regression in one function Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From ZNi at scanhealthplan.com Wed Aug 3 02:10:41 2011 From: ZNi at scanhealthplan.com (Zhiming Ni) Date: Tue, 2 Aug 2011 17:10:41 -0700 Subject: [R] convert a splus randomforest object to R Message-ID: <0F53A3F9BF49DF4DAB78D6E1839CA8400110C6E8@exmail01.scanhealthplan.com> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From johnsen at fas.harvard.edu Wed Aug 3 04:22:03 2011 From: johnsen at fas.harvard.edu (Sverre Stausland) Date: Tue, 2 Aug 2011 22:22:03 -0400 Subject: [R] Extract rows from a matrix according to value in column Message-ID: Dear helpers, I'm trying to extract certain rows from a matrix according to the values the rows have in a certain column. I've been googling for a while without result. Here's a reproducible example of a matrix (and the one I was playing with initially): > myrepo<-getOption("repos") > myrepo["CRAN"]<-"http://software.rc.fas.harvard.edu/mirrors/R/" > class(available.packages(contriburl=contrib.url(myrepo))) [1] "matrix" I can extract according to rows and column positions: > available.packages(contriburl=contrib.url(myrepo))[2,2] [1] "0.1" But how can I extract the rows according to their values in a column, when $ is not usable for vector matrices? Say I would like all the rows where the value in the column "Version" is "0.1" (as above). For a data frame, I would have done it like this: > available.packages(contriburl=contrib.url(myrepo))->avail.pack > avail.pack[avail.pack$Version=="0.1",] Error in avail.pack$Version : $ operator is invalid for atomic vectors Also, thanks to those who responded to my recent question in https://stat.ethz.ch/pipermail/r-help/2011-August/285650.html (I assume it's best not to respond at the end of threads with a "thank you" only, to avoid cluttering people's inboxes?) best Sverre From izahn at psych.rochester.edu Wed Aug 3 04:34:30 2011 From: izahn at psych.rochester.edu (Ista Zahn) Date: Tue, 2 Aug 2011 22:34:30 -0400 Subject: [R] Extract rows from a matrix according to value in column In-Reply-To: References: Message-ID: Hi Sverre, On Tue, Aug 2, 2011 at 10:22 PM, Sverre Stausland wrote: > Dear helpers, > > I'm trying to extract certain rows from a matrix according to the > values the rows have in a certain column. I've been googling for a > while without result. > > Here's a reproducible example of a matrix (and the one I was playing > with initially): > >> myrepo<-getOption("repos") >> myrepo["CRAN"]<-"http://software.rc.fas.harvard.edu/mirrors/R/" >> class(available.packages(contriburl=contrib.url(myrepo))) > [1] "matrix" > > I can extract according to rows and column positions: > >> available.packages(contriburl=contrib.url(myrepo))[2,2] > [1] "0.1" > > But how can I extract the rows according to their values in a column, > when $ is not usable for vector matrices? Say I would like all the > rows where the value in the column "Version" is "0.1" (as above). For > a data frame, I would have done it like this: > >> available.packages(contriburl=contrib.url(myrepo))->avail.pack >> avail.pack[avail.pack$Version=="0.1",] > Error in avail.pack$Version : $ operator is invalid for atomic vectors Use [ instead of $, like this: avail.pack[avail.pack[, "Version"] == "0.1", ] best, Ista > > Also, thanks to those who responded to my recent question in > https://stat.ethz.ch/pipermail/r-help/2011-August/285650.html (I > assume it's best not to respond at the end of threads with a "thank > you" only, to avoid cluttering people's inboxes?) > > best > Sverre > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Ista Zahn Graduate student University of Rochester Department of Clinical and Social Psychology http://yourpsyche.org From kathryn.lord2000 at gmail.com Wed Aug 3 03:40:46 2011 From: kathryn.lord2000 at gmail.com (Kathie) Date: Tue, 2 Aug 2011 18:40:46 -0700 (PDT) Subject: [R] create a list under constraints Message-ID: <1312335646560-3714191.post@n4.nabble.com> Hi, R users, Here is an example. k <- c(1,2,3,4,5) i <- c(0,1,3,2,1) if k=1, then i=0 if k=2, then i=0, 1 if k=3, then i=0, 1, 2, 3 if k=4, then i=0, 1, 2 if k=5, then i=0, 1 so i'd like to create a list like below. > list k i 1 1 0 2 2 0 3 2 1 4 3 0 5 3 1 6 3 2 7 3 3 8 4 0 9 4 1 10 4 2 11 5 0 12 5 1 I tried expand.grid, but I can't. Any suggestion will be greatly appreciated. Regards, Kathryn Lord -- View this message in context: http://r.789695.n4.nabble.com/create-a-list-under-constraints-tp3714191p3714191.html Sent from the R help mailing list archive at Nabble.com. From kathryn.lord2000 at gmail.com Wed Aug 3 04:56:00 2011 From: kathryn.lord2000 at gmail.com (Kathie) Date: Tue, 2 Aug 2011 19:56:00 -0700 (PDT) Subject: [R] expand.gird with constraints? Message-ID: <1312340160055-3714281.post@n4.nabble.com> Hi, R users, Here is an example. k <- c(1,2,3,4,5) i <- c(0,1,3,2,1) if k=1, then j=0 from i if k=2, then j=0, 1 from i if k=3, then j=0, 1, 2, 3 from i if k=4, then j=0, 1, 2 from i if k=5, then j=0, 1 from i so i'd like to create a list like below. > list k j 1 1 0 2 2 0 3 2 1 4 3 0 5 3 1 6 3 2 7 3 3 8 4 0 9 4 1 10 4 2 11 5 0 12 5 1 I tried expand.grid, but I can't. Any suggestion will be greatly appreciated. Regards, Kathryn Lord -- View this message in context: http://r.789695.n4.nabble.com/expand-gird-with-constraints-tp3714281p3714281.html Sent from the R help mailing list archive at Nabble.com. From Berwin.Turlach at gmail.com Wed Aug 3 06:10:13 2011 From: Berwin.Turlach at gmail.com (Berwin A Turlach) Date: Wed, 3 Aug 2011 12:10:13 +0800 Subject: [R] expand.gird with constraints? In-Reply-To: <1312340160055-3714281.post@n4.nabble.com> References: <1312340160055-3714281.post@n4.nabble.com> Message-ID: <20110803121013.2301c61b@bossiaea> G'day Kathie, On Tue, 2 Aug 2011 19:56:00 -0700 (PDT) Kathie wrote: > Hi, R users, > > Here is an example. > > k <- c(1,2,3,4,5) > i <- c(0,1,3,2,1) > > if k=1, then j=0 from i > if k=2, then j=0, 1 from i > if k=3, then j=0, 1, 2, 3 from i > if k=4, then j=0, 1, 2 from i > if k=5, then j=0, 1 from i > > so i'd like to create a list like below. > > > list > k j > 1 1 0 > 2 2 0 > 3 2 1 > 4 3 0 > 5 3 1 > 6 3 2 > 7 3 3 > 8 4 0 > 9 4 1 > 10 4 2 > 11 5 0 > 12 5 1 > > I tried expand.grid, but I can't. > > Any suggestion will be greatly appreciated. One possibility is: R> k <- c(1,2,3,4,5) R> i <- c(0,1,3,2,1) R> tt <- c(1, 2, 4, 3, 2) R> data.frame(k=rep(k, tt), j=unlist(sapply(tt, function(ii) i[1:ii]))) k j 1 1 0 2 2 0 3 2 1 4 3 0 5 3 1 6 3 3 7 3 2 8 4 0 9 4 1 10 4 3 11 5 0 12 5 1 Not sure whether this is generalisable to your real problem... HTH. Cheers, Berwin ========================== Full address ============================ A/Prof Berwin A Turlach Tel.: +61 (8) 6488 3338 (secr) School of Maths and Stats (M019) +61 (8) 6488 3383 (self) The University of Western Australia FAX : +61 (8) 6488 1028 35 Stirling Highway Crawley WA 6009 e-mail: Berwin.Turlach at gmail.com Australia http://www.maths.uwa.edu.au/~berwin http://www.researcherid.com/rid/A-4995-2008 From wildernessness at gmail.com Wed Aug 3 06:52:28 2011 From: wildernessness at gmail.com (wildernessness) Date: Tue, 2 Aug 2011 21:52:28 -0700 (PDT) Subject: [R] cdplot error Message-ID: <1312347148954-3714454.post@n4.nabble.com> Fairly new at this. Trying to create a conditional density plot. >cdplot(status~harvd.l,data=phy) Error in cdplot.formula(status~harvd.l,data=phy): dependent variable should be a factor What does this error mean? Status is a binary response of infestation (0/1) and harvd.l is the log of timber harvest density per catchment. Thanks. -- View this message in context: http://r.789695.n4.nabble.com/cdplot-error-tp3714454p3714454.html Sent from the R help mailing list archive at Nabble.com. From surgeon666666 at yahoo.com.cn Wed Aug 3 07:15:12 2011 From: surgeon666666 at yahoo.com.cn (sytangping) Date: Tue, 2 Aug 2011 22:15:12 -0700 (PDT) Subject: [R] How to make a nomogam and Calibration plot In-Reply-To: <1312215015817-3710126.post@n4.nabble.com> References: <1312213591451-3710068.post@n4.nabble.com> <1312215015817-3710126.post@n4.nabble.com> Message-ID: <1312348512395-3714477.post@n4.nabble.com> Dear Harrell, Many thanks for your quick response! However, after try and try, I still have difficulty to solve my questions. I post my questions again. I hope someone can help me run the data and draw the nomogram and calibration plot for me. I know that is not good but indeed I have no way to go. The problems almost drove me mad! Best regards! Ping Tang Dear R users, I am a new R user and something stops me when I try to write a academic article. I want to make a nomogram to predict the risk of prostate cancer (PCa) using several factors which have been selected from the Logistic regression run under the SPSS. Always, a calibration plot is needed to validate the prediction accuracy of the nomogram. However, I tried many times and read a lot of posts with respect to this topic but I still couldn't figure out how to draw the nomogram and the calibration plot. Attached file is the dataset for the research. It will be very grateful if someone can save his/her time to help for my questions. Warmest regards! Logistic Regression Classification Tablea,b Observed Predicted Pca-YN Percentage Correct 0 1 Step 0 Pca-YN 0 295 0 100.0 1 218 0 .0 Overall Percentage 57.5 Variables in the Equation B S.E. Wald df Sig. Exp(B) 95.0% C.I.for EXP(B) Lower Upper Step 1a Age .031 .015 4.491 1 .034 1.032 1.002 1.062 DRE 1.173 .266 19.492 1 .000 3.233 1.920 5.443 LogPV -2.857 .509 31.532 1 .000 .057 .021 .156 LogPSA 2.316 .246 88.416 1 .000 10.132 6.253 16.419 Constant -1.024 1.273 .648 1 .421 .359 The equation: Probability = e-1.024+0.31age+1.173DRE+-2.857LogPV+2.316LogPSA 1+e-1.024+0.31age+1.173DRE+-2.857LogPV+2.316LogPSA My questions are, 1.How to draw a nomogram (similar to the below figure 1) to predict the probability of cancer using R? 2. How to make the Calibration plot (similar to the below figure 2) which used to validate the prediction accuracy of the nomogram using R? And how to calculate the concordance index (C-index) ? http://r.789695.n4.nabble.com/file/n3714477/untitled.jpg http://r.789695.n4.nabble.com/file/n3714477/%E9%99%84%E4%BB%B62.jpg http://r.789695.n4.nabble.com/file/n3714477/Dataset.xls Dataset.xls -- View this message in context: http://r.789695.n4.nabble.com/How-to-make-a-nomogam-and-Calibration-plot-tp3710068p3714477.html Sent from the R help mailing list archive at Nabble.com. From groentemanr at landcareresearch.co.nz Wed Aug 3 07:48:04 2011 From: groentemanr at landcareresearch.co.nz (RonnyG) Date: Tue, 2 Aug 2011 22:48:04 -0700 (PDT) Subject: [R] AICcmodavg functions and 'mer' class models Message-ID: <1312350484197-3714534.post@n4.nabble.com> What is teh reason some functions in the AICcmodavg package do not work with 'mer' class models? One such example would be the 'importance' function. Thanks Ronny -- View this message in context: http://r.789695.n4.nabble.com/AICcmodavg-functions-and-mer-class-models-tp3714534p3714534.html Sent from the R help mailing list archive at Nabble.com. From djmuser at gmail.com Wed Aug 3 08:10:22 2011 From: djmuser at gmail.com (Dennis Murphy) Date: Tue, 2 Aug 2011 23:10:22 -0700 Subject: [R] create a list under constraints In-Reply-To: <1312335646560-3714191.post@n4.nabble.com> References: <1312335646560-3714191.post@n4.nabble.com> Message-ID: Hi: Here's one way; using the mdply() function in the plyr package: k <- c(1,2,3,4,5) i <- c(0,1,3,2,1) # Takes two scalars k and i as input, outputs a data frame ff <- function(k, i) data.frame(k = rep(k, i+1), i = seq(0, i, by = 1)) library('plyr') mdply(data.frame(k, i), ff) # returns a data frame # ------------- Another way to do this is to use the mapply() function This one returns a matrix. # ------------- gg <- function(k, i) cbind(k = rep(k, i+1), i = seq(0, i, by = 1)) do.call(rbind, mapply(gg, k, i)) HTH, Dennis On Tue, Aug 2, 2011 at 6:40 PM, Kathie wrote: > Hi, R users, > > Here is an example. > > k <- c(1,2,3,4,5) > i <- c(0,1,3,2,1) > > if k=1, then i=0 > if k=2, then i=0, 1 > if k=3, then i=0, 1, 2, 3 > if k=4, then i=0, 1, 2 > if k=5, then i=0, 1 > > so i'd like to create a list like below. > >> list > ? k i > 1 ?1 0 > 2 ?2 0 > 3 ?2 1 > 4 ?3 0 > 5 ?3 1 > 6 ?3 2 > 7 ?3 3 > 8 ?4 0 > 9 ?4 1 > 10 4 2 > 11 5 0 > 12 5 1 > > I tried expand.grid, but I can't. > > Any suggestion will be greatly appreciated. > > Regards, > > Kathryn Lord > > -- > View this message in context: http://r.789695.n4.nabble.com/create-a-list-under-constraints-tp3714191p3714191.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From p_connolly at slingshot.co.nz Wed Aug 3 08:12:38 2011 From: p_connolly at slingshot.co.nz (Patrick Connolly) Date: Wed, 3 Aug 2011 18:12:38 +1200 Subject: [R] Error while trying to install a package In-Reply-To: <4E36BE570200009B000C2578@n6mcgw16.cchmc.org> References: <4E36BE570200009B000C2578@n6mcgw16.cchmc.org> Message-ID: <20110803061238.GA5650@slingshot.co.nz> On Mon, 01-Aug-2011 at 02:55PM -0400, Sushil Amirisetty wrote: |> |> Hi Everyone, |> |> When i try to install a package using |> |> > install.packages("agricolae") |> --- Please select a CRAN mirror for use in this session --- |> | |> |> |> The cursor keeps blinking i dont get a popup menu to choose a CRAN |> mirror? Is it due to my proxy server settings? I tried to echo |> $http_proxy , it doesnt carry any proxy , its blank. Please help |> me. Look back in the archives. There was a discussion in the last month or so on how you make those settings. I don't remember the details because they were to do with Windows (which I'm guessing is what you're using). |> Thanks, |> Sushil. |> [[alternative HTML version deleted]] |> |> ______________________________________________ |> R-help at r-project.org mailing list |> https://stat.ethz.ch/mailman/listinfo/r-help |> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html |> and provide commented, minimal, self-contained, reproducible code. -- ~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~. ___ Patrick Connolly {~._.~} Great minds discuss ideas _( Y )_ Average minds discuss events (:_~*~_:) Small minds discuss people (_)-(_) ..... Eleanor Roosevelt ~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~. From mackay at northnet.com.au Wed Aug 3 08:22:20 2011 From: mackay at northnet.com.au (Duncan Mackay) Date: Wed, 03 Aug 2011 16:22:20 +1000 Subject: [R] convert a splus randomforest object to R In-Reply-To: <0F53A3F9BF49DF4DAB78D6E1839CA8400110C6E8@exmail01.scanheal thplan.com> References: <0F53A3F9BF49DF4DAB78D6E1839CA8400110C6E8@exmail01.scanhealthplan.com> Message-ID: <201108030624.p736O4XZ018576@mail15.tpg.com.au> Hi Jimmy Years ago I think that Splus introduced an argument when dumping of old.style = T or something similar to dump it into a form that could be read into R. This may only be for data.frames etc not things like random forest objects Regards Duncan Duncan Mackay Department of Agronomy and Soil Science University of New England ARMIDALE NSW 2351 Email: home mackay at northnet.com.au At 10:10 03/08/2011, you wrote: >Hi, > >I have a randomforest object "cost.rf" that was created in splus 8.0, >now I need to use this trained RF model in R. So in Splus, I dump the RF >file as below > >data.dump("cost.rf", file="cost.rf.txt", oldStyle=T) > >then in R, restore the dumped file, > >library(foreign) > >data.restore("cost.rf.txt") > >it works fine and able to restore the "cost.rf" object. But when I try >to pass a new data through this randomforest object using predict() >function, it gives me error message. > >in R: > >library(randomForest) >set.seed(2211) > >pred <- predict(cost.rf, InputData[ , ]) > >Error in object$forest$cutoff : $ operator is invalid for atomic vectors > > >Looks like after restoring the dump file, the object is not compatible >in R. Have anyone successfully converted a splus randomforest object to >R? what will be the appropriate method to do this? > >Thanks in advance. > >Jimmy > >========================================== >This communication contains information that is confidential, and >solely for the use of the intended recipient. It may contain >information that is privileged and exempt from disclosure under >applicable law. If you are not the intended recipient of this >communication, please be advised that any disclosure, copying, >distribution or use of this communication is strictly prohibited. >Please also immediately notify SCAN Health Plan at 1-800-247-5091, >x5263 and return the communication to the originating address. >Thank You. >========================================== > > [[alternative HTML version deleted]] > >______________________________________________ >R-help at r-project.org mailing list >https://stat.ethz.ch/mailman/listinfo/r-help >PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >and provide commented, minimal, self-contained, reproducible code. From allane at cybaea.com Wed Aug 3 08:49:22 2011 From: allane at cybaea.com (Allan Engelhardt (CYBAEA)) Date: Wed, 03 Aug 2011 07:49:22 +0100 Subject: [R] cdplot error In-Reply-To: <1312347148954-3714454.post@n4.nabble.com> References: <1312347148954-3714454.post@n4.nabble.com> Message-ID: <4E38EF72.7020004@cybaea.com> On 03/08/11 05:52, wildernessness wrote: > Fairly new at this. > Trying to create a conditional density plot. > >> cdplot(status~harvd.l,data=phy) > Error in cdplot.formula(status~harvd.l,data=phy): > dependent variable should be a factor > > What does this error mean? Status is a binary response of infestation (0/1) Probably status is a numerical variable rather than a factor**. Try print(is.factor(phy$status)) and if that is FALSE then phy$status <- factor(phy$status, labels=c("N", "Y")) cdplot(status~harvd.l,data=phy) Hope this helps a little. Allan > and harvd.l is the log of timber harvest density per catchment. > > Thanks. > > -- > View this message in context: http://r.789695.n4.nabble.com/cdplot-error-tp3714454p3714454.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From r.m.krug at gmail.com Wed Aug 3 09:03:01 2011 From: r.m.krug at gmail.com (Rainer M Krug) Date: Wed, 3 Aug 2011 09:03:01 +0200 Subject: [R] Finding dependancies? Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From Thorn.Thaler at rdls.nestle.com Wed Aug 3 09:24:13 2011 From: Thorn.Thaler at rdls.nestle.com (Thaler, Thorn, LAUSANNE, Applied Mathematics) Date: Wed, 3 Aug 2011 09:24:13 +0200 Subject: [R] lattice: index plot In-Reply-To: <4E382D77.1000700@ucalgary.ca> References: <4E382D77.1000700@ucalgary.ca> Message-ID: > Does > > xyplot(y ~ seq_along(y), xlab = "Index") > > do what you want? Not exactly, because it does not work once multipanel conditioning comes into play: xyplot(y~seq_along(y)|factor(rep(1:2, each=5)), xlab = "Index") The points in the right panel are plotted from 6:10 while the points in the left panel are plotted from 1:5. Of course I could do something like xyplot(y~rep(1:5, 2) |factor(rep(1:2, each=5)), xlab = "Index") in this toy example, but as pointed out this becomes very cumbersome if the grouping variable does not follow a pattern. BTW: my toy example did not work with multipanel conditioning either, but one can work around that too using the subscripts argument in the panel function (I skipped that exercise for the sake of brevity, but I must admit that it obscured somehow my real intention, sorry for that). However, the more I think of it the more I believe that I have to provide the x's explicitly nevertheless and my solution would be: set.seed(123) y <- rnorm(20) grp <- index <- sample(3, 20, TRUE) index[unlist(lapply(levels(as.factor(grp)), function(n) which(as.factor(grp)==n)))] <- unlist(tapply(grp, grp, seq_along)) xyplot(y ~ index | factor(grp), xlab = "Index") This should work, but it seems to be a rather elaborate solution, especially since an index plot is nothing too fancy. So maybe I'm not seeing the wood for trees, but does anybody know an easier way? Thanks. KR, -Thorn From ehlers at ucalgary.ca Wed Aug 3 09:36:42 2011 From: ehlers at ucalgary.ca (Peter Ehlers) Date: Wed, 03 Aug 2011 00:36:42 -0700 Subject: [R] cdplot error In-Reply-To: <1312347148954-3714454.post@n4.nabble.com> References: <1312347148954-3714454.post@n4.nabble.com> Message-ID: <4E38FA8A.6090003@ucalgary.ca> On 2011-08-02 21:52, wildernessness wrote: > Fairly new at this. > Trying to create a conditional density plot. > >> cdplot(status~harvd.l,data=phy) > Error in cdplot.formula(status~harvd.l,data=phy): > dependent variable should be a factor > > What does this error mean? Status is a binary response of infestation (0/1) > and harvd.l is the log of timber harvest density per catchment. Your question suggests that have not looked at help(cdplot) which clearly says just what the error message says and/or you aren't aware that 'factor' has a specific meaning in R in which case a look at chapter 4 of 'An Introduction to R' likely would be profitable. Peter Ehlers > > Thanks. > > -- > View this message in context: http://r.789695.n4.nabble.com/cdplot-error-tp3714454p3714454.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From antonio.rrz at gmail.com Wed Aug 3 10:04:50 2011 From: antonio.rrz at gmail.com (Antonio Rodriges) Date: Wed, 3 Aug 2011 11:04:50 +0300 Subject: [R] Running R in a sandbox Message-ID: Hello, The idea is to grant access of remote users to R running on Linux. Users must have ability to run their R scripts but avoid corrupting the operating system. How one can restrict/limit access of remote users to certain R functions? For example, dealing with IO (file system), graphical tools, etc. Thank you. -- Kind regards, Antonio Rodriges From ms3437 at gmx.de Wed Aug 3 08:43:57 2011 From: ms3437 at gmx.de (ms3437 at gmx.de) Date: Wed, 03 Aug 2011 08:43:57 +0200 Subject: [R] fixInNamespace Message-ID: <20110803064357.119250@gmx.net> Dear all, I would like to ask how one can access certain methods via fixInNamespace. Is there some option / way for selecting a certain methods for a defined signature. Thank you for your answer and efforts in advance! Best, Michael -- From zcatav at gmail.com Wed Aug 3 10:05:54 2011 From: zcatav at gmail.com (zcatav) Date: Wed, 3 Aug 2011 01:05:54 -0700 (PDT) Subject: [R] conditional data replace (recode, change or whatsoever) Message-ID: <1312358754120-3714715.post@n4.nabble.com> Hello, I have a big data.frame, a piece of it as follows. a b c d 1 58009 2010-11-02 0 NA 2 114761 NA 1 2008-11-05 3 184440 NA 1 2009-12-08 4 189372 NA 0 NA 5 105286 NA 0 NA 6 186717 NA 0 NA 7 189106 NA 0 NA 8 127306 NA 0 NA 9 157342 2011-04-25 0 NA I want to replace b[NA] values with "20011-07-28" where c==0. I use rstudio and i'm a novice. -- View this message in context: http://r.789695.n4.nabble.com/conditional-data-replace-recode-change-or-whatsoever-tp3714715p3714715.html Sent from the R help mailing list archive at Nabble.com. From petr.pikal at precheza.cz Wed Aug 3 11:18:39 2011 From: petr.pikal at precheza.cz (Petr PIKAL) Date: Wed, 3 Aug 2011 11:18:39 +0200 Subject: [R] Odp: conditional data replace (recode, change or whatsoever) In-Reply-To: <1312358754120-3714715.post@n4.nabble.com> References: <1312358754120-3714715.post@n4.nabble.com> Message-ID: Hi > > Hello, > I have a big data.frame, a piece of it as follows. > > a b c d > 1 58009 2010-11-02 0 NA > 2 114761 NA 1 2008-11-05 > 3 184440 NA 1 2009-12-08 > 4 189372 NA 0 NA > 5 105286 NA 0 NA > 6 186717 NA 0 NA > 7 189106 NA 0 NA > 8 127306 NA 0 NA > 9 157342 2011-04-25 0 NA > > I want to replace b[NA] values with "20011-07-28" where c==0. I use rstudio > and i'm a novice. I believe there are better solutions but I would use two steps select rows where c==0 (see also FAQ 7.31) sel<-which(big.data.frame$c==0) change NA values in b column based on sel big.data.frame$b[sel][is.na(big.data.frame$b[sel])]<-"20011-07-28" Beware of data types AFAIK R can not accept "20011-07-28" as a date. Regards Petr > > > -- > View this message in context: http://r.789695.n4.nabble.com/conditional- > data-replace-recode-change-or-whatsoever-tp3714715p3714715.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From rd6137 at gmail.com Wed Aug 3 10:28:18 2011 From: rd6137 at gmail.com (Romain DOUMENC) Date: Wed, 03 Aug 2011 10:28:18 +0200 Subject: [R] conditional data replace (recode, change or whatsoever) In-Reply-To: <1312358754120-3714715.post@n4.nabble.com> References: <1312358754120-3714715.post@n4.nabble.com> Message-ID: <4E3906A2.40709@gmail.com> Please do your homework before asking the list: An introduction to R, chapter 7 Am 03.08.2011 10:05, schrieb zcatav: > Hello, > I have a big data.frame, a piece of it as follows. > > a b c d > 1 58009 2010-11-02 0 NA > 2 114761 NA 1 2008-11-05 > 3 184440 NA 1 2009-12-08 > 4 189372 NA 0 NA > 5 105286 NA 0 NA > 6 186717 NA 0 NA > 7 189106 NA 0 NA > 8 127306 NA 0 NA > 9 157342 2011-04-25 0 NA > > I want to replace b[NA] values with "20011-07-28" where c==0. I use rstudio > and i'm a novice. > > > -- > View this message in context: http://r.789695.n4.nabble.com/conditional-data-replace-recode-change-or-whatsoever-tp3714715p3714715.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From fraenzi.korner at oikostat.ch Wed Aug 3 12:11:32 2011 From: fraenzi.korner at oikostat.ch (fraenzi.korner at oikostat.ch) Date: 3 Aug 2011 12:11:32 +0200 Subject: [R] =?utf-8?q?R-help_Digest=2C_Vol_102=2C_Issue_3?= Message-ID: <20110803101132.1303.qmail@srv5.yoursite.ch> Wir sind bis am 20. August in den Ferien und werden keine e-mails beantworten. Bei dringenden F?llen melden Sie sich bei Stefanie von Felten steffi.vonfelten at oikostat.ch We are on vacation until 20. August. In urgent cases, please contact Stefanie von Felten steffi.vonfelten at oikostat.ch From bt_jannis at yahoo.de Wed Aug 3 12:30:03 2011 From: bt_jannis at yahoo.de (Jannis) Date: Wed, 3 Aug 2011 11:30:03 +0100 (BST) Subject: [R] syntax with do.call and `[` Message-ID: <1312367403.49149.YahooMailClassic@web28214.mail.ukl.yahoo.com> Dear List, i would like to mimic the behaviour or the following indexing with a do.call construct to be able to supply the arguments to `[` as a list: test = matrix[1:4,2] result = test[2,] My try, however, did not work: result = do.call(`[`,list(test,2,NULL)) result = do.call(`[`,list(test,2,)) result = do.call(`[`,list(test,2,'')) How can I use the do.call in that way with leaving the second indexing vector blanc? Cheers Jannis From ehlers at ucalgary.ca Wed Aug 3 12:57:26 2011 From: ehlers at ucalgary.ca (Peter Ehlers) Date: Wed, 3 Aug 2011 03:57:26 -0700 Subject: [R] Need to compute density as done by panel.histogram In-Reply-To: References: Message-ID: <4E392996.1000106@ucalgary.ca> On 2011-08-02 11:51, S?bastien Bihorel wrote: > Hi, > > This might be a simple problem but I don't know how to calculate a random > variable density the way panel.histogram does it before it creates the > actual density rectangles. The documentation says that it uses the density > function but the actual code suggests that the hist.constructor function > (which does not seem to be easily accessible). The documentation in ?histogram is misleading. I think that the intent is that density() is used in panel.densityplot but panel.histogram uses hist(), as is clear in ?panel.histogram. So you'll find the code for the density rectangles in hist.default where 'counts' is computed and followed with dens <- counts/(n * diff(breaks)) You might find the code for truehist() in the MASS package easy to follow. To see how hist.constructor calls hist(): lattice:::hist.constructor Peter Ehlers > > Any suggestion for computing the density values of foo$x in the following > example will be welcome. > > > require(lattice) > set.seed(12345) > > foo1<- > data.frame(x=rnorm(100,0,0.1),grp=1,by=rep(1:2,each=50),by2=rep(1:2,times=50)) > foo2<- > data.frame(x=rnorm(100,2,1),grp=2,by=rep(1:2,each=50),by2=rep(1:2,times=50)) > foo<- rbind(foo1,foo2) > > xplot<- histogram(~x,data=foo, type='density') > > > PS: the present question relates to a workaround for another problem > previously submitted to the list ( > https://stat.ethz.ch/pipermail/r-help/attachments/20110727/5f0a8853/attachment.pl). > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From michael.weylandt at gmail.com Wed Aug 3 13:10:44 2011 From: michael.weylandt at gmail.com (R. Michael Weylandt ) Date: Wed, 3 Aug 2011 07:10:44 -0400 Subject: [R] conditional data replace (recode, change or whatsoever) In-Reply-To: <1312358754120-3714715.post@n4.nabble.com> References: <1312358754120-3714715.post@n4.nabble.com> Message-ID: <09724EC7-2696-4150-ADF0-CCE5FA904AF7@gmail.com> As others have noted, this is discussed in many free R tutorials, but if you want to do it in one line I think this should do it: X[is.NA(X[,"b"])&(X[,"c"]==0),"b"]<-"2011-07-28" #where X is the name of the data frame. It's a somewhat convoluted line of code but if you read it inside out the logic is clear: Find those rows where column b is NA and c is 1 by searching all rows of the relevant column (the X[,something] syntax): select those rows and the b column. Put the desired date in those slots. let me know of I can further clarify this. I changed the date assuming a typo on your end. Welcome and good luck getting started with R, Michael Weylandt On Aug 3, 2011, at 4:05 AM, zcatav wrote: > Hello, > I have a big data.frame, a piece of it as follows. > > a b c d > 1 58009 2010-11-02 0 NA > 2 114761 NA 1 2008-11-05 > 3 184440 NA 1 2009-12-08 > 4 189372 NA 0 NA > 5 105286 NA 0 NA > 6 186717 NA 0 NA > 7 189106 NA 0 NA > 8 127306 NA 0 NA > 9 157342 2011-04-25 0 NA > > I want to replace b[NA] values with "20011-07-28" where c==0. I use rstudio > and i'm a novice. > > > -- > View this message in context: http://r.789695.n4.nabble.com/conditional-data-replace-recode-change-or-whatsoever-tp3714715p3714715.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From zcatav at gmail.com Wed Aug 3 13:02:42 2011 From: zcatav at gmail.com (zcatav) Date: Wed, 3 Aug 2011 04:02:42 -0700 (PDT) Subject: [R] Odp: conditional data replace (recode, change or whatsoever) In-Reply-To: References: <1312358754120-3714715.post@n4.nabble.com> Message-ID: <1312369362818-3715080.post@n4.nabble.com> Petr Pikal wrote: > > Hi > I believe there are better solutions but I would use two steps > > select rows where c==0 (see also FAQ 7.31) > sel<-which(big.data.frame$c==0) > > change NA values in b column based on sel > big.data.frame$b[sel][is.na(big.data.frame$b[sel])]<-"20011-07-28" > > Beware of data types AFAIK R can not accept "20011-07-28" as a date. > > Regards > Petr > > Thanks, it runs like a charm. Replaced date format just a typo. -- View this message in context: http://r.789695.n4.nabble.com/conditional-data-replace-recode-change-or-whatsoever-tp3714715p3715080.html Sent from the R help mailing list archive at Nabble.com. From zcatav at gmail.com Wed Aug 3 14:09:30 2011 From: zcatav at gmail.com (zcatav) Date: Wed, 3 Aug 2011 05:09:30 -0700 (PDT) Subject: [R] conditional data replace (recode, change or whatsoever) In-Reply-To: <09724EC7-2696-4150-ADF0-CCE5FA904AF7@gmail.com> References: <1312358754120-3714715.post@n4.nabble.com> <09724EC7-2696-4150-ADF0-CCE5FA904AF7@gmail.com> Message-ID: <1312373370210-3715218.post@n4.nabble.com> Your suggestion works perfect as i pointed previous message. Now have another question about data editing. I try this code: X[X[,"c"]==1,"b"]<-X[,"d"] and results with error: `[<-.data.frame`(`*tmp*`, X[, "c"] == 1, "b", value = c(NA, : replacement has 9 rows, data has 2 Logically i selected 2 rows with X[,"c"]==1. Than i want to replace in that rows its own data from "d" to "b" with X[,"b"]<-X[,"d"]. What is wrong? -- View this message in context: http://r.789695.n4.nabble.com/conditional-data-replace-recode-change-or-whatsoever-tp3714715p3715218.html Sent from the R help mailing list archive at Nabble.com. From MorganPH at cardiff.ac.uk Wed Aug 3 14:12:27 2011 From: MorganPH at cardiff.ac.uk (Peter Morgan) Date: Wed, 3 Aug 2011 13:12:27 +0100 Subject: [R] Coefficient names when using lm() with contrasts Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From michael.weylandt at gmail.com Wed Aug 3 14:31:35 2011 From: michael.weylandt at gmail.com (R. Michael Weylandt ) Date: Wed, 3 Aug 2011 08:31:35 -0400 Subject: [R] conditional data replace (recode, change or whatsoever) In-Reply-To: <1312373370210-3715218.post@n4.nabble.com> References: <1312358754120-3714715.post@n4.nabble.com> <09724EC7-2696-4150-ADF0-CCE5FA904AF7@gmail.com> <1312373370210-3715218.post@n4.nabble.com> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From jvadams at usgs.gov Wed Aug 3 14:42:10 2011 From: jvadams at usgs.gov (Jean V Adams) Date: Wed, 3 Aug 2011 07:42:10 -0500 Subject: [R] syntax with do.call and `[` In-Reply-To: <1312367403.49149.YahooMailClassic@web28214.mail.ukl.yahoo.com> References: <1312367403.49149.YahooMailClassic@web28214.mail.ukl.yahoo.com> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From ggrothendieck at gmail.com Wed Aug 3 14:42:13 2011 From: ggrothendieck at gmail.com (Gabor Grothendieck) Date: Wed, 3 Aug 2011 08:42:13 -0400 Subject: [R] conditional data replace (recode, change or whatsoever) In-Reply-To: <1312373370210-3715218.post@n4.nabble.com> References: <1312358754120-3714715.post@n4.nabble.com> <09724EC7-2696-4150-ADF0-CCE5FA904AF7@gmail.com> <1312373370210-3715218.post@n4.nabble.com> Message-ID: On Wed, Aug 3, 2011 at 8:09 AM, zcatav wrote: > Your suggestion works perfect as i pointed previous message. Now have another > question about data editing. I try this code: > X[X[,"c"]==1,"b"]<-X[,"d"] > and results with error: `[<-.data.frame`(`*tmp*`, X[, "c"] == 1, "b", value > = c(NA, ?: > ?replacement has 9 rows, data has 2 > > Logically i selected 2 rows with X[,"c"]==1. Than i want to replace in that > rows its own data from "d" to "b" with X[,"b"]<-X[,"d"]. What is wrong? > Also check out transform and ifelse, e.g. transform(X, b = ifelse(is.na(b) & c == 0, "2011-07-28", b)) transform(X, b = ifelse(c == 1, d, c)) -- Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com From f.harrell at vanderbilt.edu Wed Aug 3 14:53:57 2011 From: f.harrell at vanderbilt.edu (Frank Harrell) Date: Wed, 3 Aug 2011 05:53:57 -0700 (PDT) Subject: [R] How to make a nomogam and Calibration plot In-Reply-To: <1312348512395-3714477.post@n4.nabble.com> References: <1312213591451-3710068.post@n4.nabble.com> <1312215015817-3710126.post@n4.nabble.com> <1312348512395-3714477.post@n4.nabble.com> Message-ID: <1312376037359-3715336.post@n4.nabble.com> The nomogram you included was produced by the Design package, the precursor to the rms package. You will have to take the time to intensively read the rms package documentation. Note that how you developed the model (e.g., allowing for non-linearity in log PSA, not using stepwise regression which invalidates the results, making sure all clinically relevant predictors are in the model, ...) is the most important step. The process you are going through generally requires an M.S. in biostatistics. Frank sytangping wrote: > > Dear Harrell, > > Many thanks for your quick response! > However, after try and try, I still have difficulty to solve my questions. > I post my questions again. I hope someone can help me run the data and > draw the nomogram and calibration plot for me. I know that is not good but > indeed I have no way to go. The problems almost drove me mad! > > Best regards! > > Ping Tang > > Dear R users, > > I am a new R user and something stops me when I try to write a academic > article. I want to make a nomogram to predict the risk of prostate cancer > (PCa) using several factors which have been selected from the Logistic > regression run under the SPSS. Always, a calibration plot is needed to > validate the prediction accuracy of the nomogram. > However, I tried many times and read a lot of posts with respect to this > topic but I still couldn't figure out how to draw the nomogram and the > calibration plot. Attached file is the dataset for the research. It will > be very grateful if someone can save his/her time to help for my > questions. > > Warmest regards! > > Logistic Regression > > Classification Tablea,b > Observed Predicted > Pca-YN Percentage Correct > 0 1 > Step 0 Pca-YN 0 295 0 100.0 > 1 218 0 .0 > Overall Percentage 57.5 > > > Variables in the Equation > B S.E. Wald df Sig. Exp(B) 95.0% C.I.for EXP(B) > Lower Upper > Step 1a Age .031 .015 4.491 1 .034 1.032 1.002 1.062 > DRE 1.173 .266 19.492 1 .000 3.233 1.920 5.443 > LogPV -2.857 .509 31.532 1 .000 .057 .021 .156 > LogPSA 2.316 .246 88.416 1 .000 10.132 6.253 16.419 > Constant -1.024 1.273 .648 1 .421 .359 > > > The equation: > > Probability = > > e-1.024+0.31age+1.173DRE+-2.857LogPV+2.316LogPSA > 1+e-1.024+0.31age+1.173DRE+-2.857LogPV+2.316LogPSA > > My questions are, > > 1.How to draw a nomogram (similar to the below figure 1) to predict the > probability of cancer using R? > > 2. How to make the Calibration plot (similar to the below figure 2) which > used to validate the prediction accuracy of the nomogram using R? And how > to calculate the concordance index (C-index) ? > > > http://r.789695.n4.nabble.com/file/n3714477/untitled.jpg > http://r.789695.n4.nabble.com/file/n3714477/%E9%99%84%E4%BB%B62.jpg > http://r.789695.n4.nabble.com/file/n3714477/Dataset.xls Dataset.xls > ----- Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/How-to-make-a-nomogam-and-Calibration-plot-tp3710068p3715336.html Sent from the R help mailing list archive at Nabble.com. From dieter.menne at menne-biomed.de Wed Aug 3 14:59:09 2011 From: dieter.menne at menne-biomed.de (Dieter Menne) Date: Wed, 3 Aug 2011 05:59:09 -0700 (PDT) Subject: [R] Running R in a sandbox In-Reply-To: References: Message-ID: <1312376349163-3715351.post@n4.nabble.com> Antonio Rodriges wrote: > > > The idea is to grant access of remote users to R running on Linux. Users > must have ability to run their > R scripts but avoid corrupting the operating system. > > Check RStudio.org Dieter -- View this message in context: http://r.789695.n4.nabble.com/Running-R-in-a-sandbox-tp3714716p3715351.html Sent from the R help mailing list archive at Nabble.com. From dieter.menne at menne-biomed.de Wed Aug 3 15:03:39 2011 From: dieter.menne at menne-biomed.de (Dieter Menne) Date: Wed, 3 Aug 2011 06:03:39 -0700 (PDT) Subject: [R] xlsx error In-Reply-To: References: Message-ID: <1312376619973-3715367.post@n4.nabble.com> Andrew Winterman wrote: > > > I'm trying to use the xlsx package to read a series of excel spreadsheets > into R, but my code is failing at the first step. > > I setwd into my the directory with the spreadsheets, and, as a test ask > for > the first one: > > read.xlsx(file = "Argentina Final.xls", sheetIndex = 1) > I promptly get an error message: > Error in .jcall(row[[ir]], "Lorg/apache/poi/xssf/usermodel/XSSFCell;", : > > Check if your Java installation is ok. Try to access an rJava function directly for a test Dieter -- View this message in context: http://r.789695.n4.nabble.com/xlsx-error-tp3714057p3715367.html Sent from the R help mailing list archive at Nabble.com. From reith_william at bah.com Wed Aug 3 15:07:46 2011 From: reith_william at bah.com (wwreith) Date: Wed, 3 Aug 2011 06:07:46 -0700 (PDT) Subject: [R] 3D Bar Graphs in ggplot2? In-Reply-To: <4E385D31.7090504@ohsu.edu> References: <1312310369354-3713305.post@n4.nabble.com> <4E385D31.7090504@ohsu.edu> Message-ID: <1312376866384-3715382.post@n4.nabble.com> So I take it 3D pie charts are out? P.S. It is not about hiding anything. It is about consulting and being told by your client to make 3D pie charts and change this font or that color to make the graphs more apealing. Given that I am the one trying to open the door to using R where I work it would be much easier if I could simply use a 2D graph. -- View this message in context: http://r.789695.n4.nabble.com/3D-Bar-Graphs-in-ggplot2-tp3713305p3715382.html Sent from the R help mailing list archive at Nabble.com. From dwinsemius at comcast.net Wed Aug 3 15:08:52 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Wed, 3 Aug 2011 09:08:52 -0400 Subject: [R] conditional data replace (recode, change or whatsoever) In-Reply-To: <1312373370210-3715218.post@n4.nabble.com> References: <1312358754120-3714715.post@n4.nabble.com> <09724EC7-2696-4150-ADF0-CCE5FA904AF7@gmail.com> <1312373370210-3715218.post@n4.nabble.com> Message-ID: <058ADE52-636D-4DD0-9B44-67071D139C66@comcast.net> On Aug 3, 2011, at 8:09 AM, zcatav wrote: > Your suggestion works perfect as i pointed previous message. Now > have another > question about data editing. I try this code: > X[X[,"c"]==1,"b"]<-X[,"d"] > and results with error: `[<-.data.frame`(`*tmp*`, X[, "c"] == 1, > "b", value > = c(NA, : > replacement has 9 rows, data has 2 > > Logically i selected 2 rows with X[,"c"]==1. Than i want to replace > in that > rows its own data from "d" to "b" with X[,"b"]<-X[,"d"]. What is > wrong? You need to apply the same logical test/selection on the rows of the RHS as you are doing on the LHS. Possibly: X[ X[,"c"]==1, "b"] <- X[ X[,"c"]==1, "d"] (No data, not tested code.) -- David Winsemius, MD West Hartford, CT From b.rowlingson at lancaster.ac.uk Wed Aug 3 15:16:00 2011 From: b.rowlingson at lancaster.ac.uk (Barry Rowlingson) Date: Wed, 3 Aug 2011 14:16:00 +0100 Subject: [R] Running R in a sandbox In-Reply-To: References: Message-ID: On Wed, Aug 3, 2011 at 9:04 AM, Antonio Rodriges wrote: > Hello, > > The idea is to grant access of remote users to R running on Linux. > Users must have ability to run their > R scripts but avoid corrupting the operating system. Ordinary users can't corrupt the operating system on Linux[1]. The worst they can do is run CPU- and memory-intensive tasks that can slow things down for everyone and conceivably bring the system to a halt, but there are ways of limiting CPU and memory usage per user session. What don't you want them to do? Barry [1] Security holes excepted. But those will be present in any sandbox solution. From mohammad.gh at gmail.com Wed Aug 3 14:53:04 2011 From: mohammad.gh at gmail.com (mohammad.gh at gmail.com) Date: Wed, 3 Aug 2011 05:53:04 -0700 (PDT) Subject: [R] Error Installing or Updating Packages (Maybe because of a proxy) In-Reply-To: <20110609122207.0da6d381.olivier.crouzet@univ-nantes.fr> References: <4DAE995D.9050504@statistik.tu-dortmund.de> <20110608173236.ba844ccb.olivier.crouzet@univ-nantes.fr> <20110609121455.916aab99.olivier.crouzet@univ-nantes.fr> <20110609122207.0da6d381.olivier.crouzet@univ-nantes.fr> Message-ID: <1312375984623-3715332.post@n4.nabble.com> Hello David, I encountered the same problem of yours. What did you do to resolve it? Thanks for your reply Mohammad -- View this message in context: http://r.789695.n4.nabble.com/Error-Installing-or-Updating-Packages-Maybe-because-of-a-proxy-tp3462312p3715332.html Sent from the R help mailing list archive at Nabble.com. From caroline.faisst at gmail.com Wed Aug 3 15:25:45 2011 From: caroline.faisst at gmail.com (Caroline Faisst) Date: Wed, 3 Aug 2011 15:25:45 +0200 Subject: [R] slow computation of functions over large datasets Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From timothy.c.bates at gmail.com Wed Aug 3 15:16:56 2011 From: timothy.c.bates at gmail.com (Timothy Bates) Date: Wed, 3 Aug 2011 14:16:56 +0100 Subject: [R] equivalent of var.test(x,y) for skew and kurtosis Message-ID: <652BA3AA-0B09-4E19-A224-735DD92BCFBB@gmail.com> Dear R-users, I am comparing differences in variance, skew, and kurtosis between two groups. For variance the comparison is easy: just var.test(group1, group2) I am using agostino.test() for skew, and anscombe.test() for kurtosis. However, I can't find an equivalent of the F.test or Mood.test for comparing kurtosis or skewness between two samples. Would the test just be a 1 df test on the difference in Z or F scores returned by the agostino or anscombe? How are the differences distributed: chi2? Any guidance greatly appreciated. google and wikipedia return hits for measuring the third and fourth standardized moments, but none I can see for comparing differences on these parameters. best, tim From Thierry.ONKELINX at inbo.be Wed Aug 3 15:59:08 2011 From: Thierry.ONKELINX at inbo.be (ONKELINX, Thierry) Date: Wed, 3 Aug 2011 13:59:08 +0000 Subject: [R] slow computation of functions over large datasets In-Reply-To: References: Message-ID: Dear Caroline, Here is a faster and more elegant solution. > n <- 10000 > exampledata <- data.frame(orderID = sample(floor(n / 5), n, replace = TRUE), itemPrice = rpois(n, 10)) > library(plyr) > system.time({ + ddply(exampledata, .(orderID), function(x){ + data.frame(itemPrice = x$itemPrice, orderAmount = cumsum(x$itemPrice)) + }) + }) user system elapsed 1.67 0.00 1.69 > exampledata[1,"orderAmount"]<-exampledata[1,"itemPrice"] > system.time(for (i in 2:length(exampledata[,1])) + {exampledata[i,"orderAmount"]<-ifelse(exampledata[i,"orderID"]==exampledata[i-1,"orderID"],exampledata[i-1,"orderAmount"]+exampledata[i,"itemPrice"],exampledata[i,"itemPrice"])}) user system elapsed 11.94 0.02 11.97 Best regards, Thierry > -----Oorspronkelijk bericht----- > Van: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] > Namens Caroline Faisst > Verzonden: woensdag 3 augustus 2011 15:26 > Aan: r-help at r-project.org > Onderwerp: [R] slow computation of functions over large datasets > > Hello there, > > > I'm computing the total value of an order from the price of the order items using > a "for" loop and the "ifelse" function. I do this on a large dataframe (close to > 1m lines). The computation of this function is painfully slow: in 1min only about > 90 rows are calculated. > > > The computation time taken for a given number of rows increases with the size > of the dataset, see the example with my function below: > > > # small dataset: function performs well > > exampledata<- > data.frame(orderID=c(1,1,1,2,2,3,3,3,4),itemPrice=c(10,17,9,12,25,10,1,9,7)) > > exampledata[1,"orderAmount"]<-exampledata[1,"itemPrice"] > > system.time(for (i in 2:length(exampledata[,1])) > {exampledata[i,"orderAmount"]<- > ifelse(exampledata[i,"orderID"]==exampledata[i-1,"orderID"],exampledata[i- > 1,"orderAmount"]+exampledata[i,"itemPrice"],exampledata[i,"itemPrice"])}) > > > # large dataset: the very same computational task takes much longer > > exampledata2<- > data.frame(orderID=c(1,1,1,2,2,3,3,3,4,5:2000000),itemPrice=c(10,17,9,12,25,1 > 0,1,9,7,25:2000020)) > > exampledata2[1,"orderAmount"]<-exampledata2[1,"itemPrice"] > > system.time(for (i in 2:9) > {exampledata2[i,"orderAmount"]<- > ifelse(exampledata2[i,"orderID"]==exampledata2[i- > 1,"orderID"],exampledata2[i- > 1,"orderAmount"]+exampledata2[i,"itemPrice"],exampledata2[i,"itemPrice"])}) > > > > Does someone know a way to increase the speed? > > > Thank you very much! > > Caroline > > [[alternative HTML version deleted]] From vishalthapar at gmail.com Wed Aug 3 16:06:23 2011 From: vishalthapar at gmail.com (Vishal Thapar) Date: Wed, 3 Aug 2011 10:06:23 -0400 Subject: [R] Combining multiple dependent variables for machine learning Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From pomchip at free.fr Wed Aug 3 16:07:02 2011 From: pomchip at free.fr (=?ISO-8859-1?Q?S=E9bastien_Bihorel?=) Date: Wed, 3 Aug 2011 10:07:02 -0400 Subject: [R] Need to compute density as done by panel.histogram In-Reply-To: <4E392996.1000106@ucalgary.ca> References: <4E392996.1000106@ucalgary.ca> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From sarah.goslee at gmail.com Wed Aug 3 16:16:04 2011 From: sarah.goslee at gmail.com (Sarah Goslee) Date: Wed, 3 Aug 2011 10:16:04 -0400 Subject: [R] Combining multiple dependent variables for machine learning In-Reply-To: References: Message-ID: Hi, On Wed, Aug 3, 2011 at 10:06 AM, Vishal Thapar wrote: > Hi, > > I apologize for posting this here, I am also trying to post this on machine > learning emailing lists. > I have a set (18K) of sequences (22 nt long) and I have their counts at 4 > different stages. The difference in counts from one stage to the next > represents how well the sequence performed in the transition. The total > counts remain about the same in each stage. So if a 1 sequence loses some > counts in 1 stage, another sequence gains those counts in that stage. I am > trying to build a predictor that combines these 4 stages. I have already > tried to build an SVM using just the counts in the final stage but its not > that great (0.3 correlation with test set). The problem I am facing now is > how to combine these 4 stages into 1 dependent variable or something like > that. The 4 stages are the dependent variables and the sequence is my > independent variable. The aim is to use the count information in each stage > to select how well the sequence performs across all 4 stages. > > I appreciate any suggestions for this problem. Suggestions? Yes. Read the posting guide and follow it. It isn't clear that this is even an R question, since you don't tell us anything about the packages or functions you are using, or about your data. There aren't any actual questions in your message, and your problem statement is exceedingly vague. You might find more help on the Bioconductor list, if in fact you are using R for your problem. Sarah -- Sarah Goslee http://www.functionaldiversity.org From ehlers at ucalgary.ca Wed Aug 3 16:16:52 2011 From: ehlers at ucalgary.ca (Peter Ehlers) Date: Wed, 03 Aug 2011 07:16:52 -0700 Subject: [R] lattice: index plot In-Reply-To: References: <4E382D77.1000700@ucalgary.ca> Message-ID: <4E395854.2020306@ucalgary.ca> On 2011-08-03 00:24, Thaler,Thorn,LAUSANNE,Applied Mathematics wrote: >> Does >> >> xyplot(y ~ seq_along(y), xlab = "Index") >> >> do what you want? > > > Not exactly, because it does not work once multipanel conditioning comes > into play: > > xyplot(y~seq_along(y)|factor(rep(1:2, each=5)), xlab = "Index") > > The points in the right panel are plotted from 6:10 while the points in > the left panel are plotted from 1:5. Of course I could do something like > > > xyplot(y~rep(1:5, 2) |factor(rep(1:2, each=5)), xlab = "Index") > > in this toy example, but as pointed out this becomes very cumbersome if > the grouping variable does not follow a pattern. > > BTW: my toy example did not work with multipanel conditioning either, > but one can work around that too using the subscripts argument in the > panel function (I skipped that exercise for the sake of brevity, but I > must admit that it obscured somehow my real intention, sorry for that). > > However, the more I think of it the more I believe that I have to > provide the x's explicitly nevertheless and my solution would be: > > set.seed(123) > y<- rnorm(20) > grp<- index<- sample(3, 20, TRUE) > index[unlist(lapply(levels(as.factor(grp)), function(n) > which(as.factor(grp)==n)))]<- unlist(tapply(grp, grp, seq_along)) > xyplot(y ~ index | factor(grp), xlab = "Index") > > This should work, but it seems to be a rather elaborate solution, > especially since an index plot is nothing too fancy. > > So maybe I'm not seeing the wood for trees, but does anybody know an > easier way? Here's a way to use 'subscripts' in the xyplot. The main problem is to determine the xlims to use. dat <- data.frame(y, grp) ## xlims xL <- function(groups){ tbl <- table(groups) xlim <- c(0, max(tbl) + 1) xlim } xyplot(y ~ seq_along(y) | factor(grp), data = dat, xlim = xL(dat$grp), panel = function(y, subscripts){ x <- seq_along(subscripts) panel.xyplot(x, y) } ) Peter Ehlers > > Thanks. > > KR, > > -Thorn > > > From jszhao at yeah.net Wed Aug 3 16:18:41 2011 From: jszhao at yeah.net (Jinsong Zhao) Date: Wed, 03 Aug 2011 22:18:41 +0800 Subject: [R] confint() in stats4 package Message-ID: <4E3958C1.5000903@yeah.net> Hi there, I had a problem when I hoped to get confidence intervals for the parameters I got using mle() of stats4 package. This problem would not appear if ``fixed'' option was not used. The following mini-example will demo the problem: x <- c(100, 56, 32, 18, 10, 1) r <- c(18, 17, 10, 6, 4, 3) n <- c(18, 22, 17, 21, 23, 20) loglik.1 <- function(alpha, beta, c) { x <- log10(x) P <- c + (1-c) * pnorm(alpha + beta * x) control <- which(x == -Inf) if (length(control) != 0) P[control] <- c P <- pmax(pmin(P,1),0) -(sum(r * log(P)) + sum((n - r)* log(1-P))) } loglik.2 <- function(alpha, beta) { x <- log10(x) P <- pnorm(alpha + beta * x) P <- pmax(pmin(P,1),0) -(sum(r * log(P)) + sum((n - r)* log(1-P))) } library(stats4) fit.1 <- mle(loglik.1, start = list(alpha = 0, beta = 0, c = 0), method = "BFGS", fixed = list(c=0)) fit.2 <- mle(loglik.2, start = list(alpha = 0, beta = 0), method = "BFGS", fixed = list()) > confint(fit.1) Profiling... Error in approx(sp$y, sp$x, xout = cutoff) : need at least two non-NA values to interpolate In addition: Warning message: In approx(sp$y, sp$x, xout = cutoff) : collapsing to unique 'x' values > confint(fit.2) Profiling... 2.5 % 97.5 % alpha -2.5187909 -1.144600 beta 0.9052395 1.876322 The version I test the above code is 2.11.1 and 2.13.1. I hope to know what's the matter? and how to avoid the error, and get the correct confidence intervals for the parameters? Any suggestions will be really appreciated. P.S.: I noticed that there was a file named mle.R.rej in the source directory of stats4. A broken patch? Thanks! Regards, Jinsong From pdalgd at gmail.com Wed Aug 3 16:24:26 2011 From: pdalgd at gmail.com (peter dalgaard) Date: Wed, 3 Aug 2011 16:24:26 +0200 Subject: [R] syntax with do.call and `[` In-Reply-To: <1312367403.49149.YahooMailClassic@web28214.mail.ukl.yahoo.com> References: <1312367403.49149.YahooMailClassic@web28214.mail.ukl.yahoo.com> Message-ID: <56C31DEB-6365-48F3-A033-F49039A18B25@gmail.com> On Aug 3, 2011, at 12:30 , Jannis wrote: > Dear List, > > > > i would like to mimic the behaviour or the following indexing with a do.call construct to be able to supply the arguments to `[` as a list: > > > test = matrix[1:4,2] > > result = test[2,] > > > My try, however, did not work: > > result = do.call(`[`,list(test,2,NULL)) > result = do.call(`[`,list(test,2,)) > result = do.call(`[`,list(test,2,'')) > > > How can I use the do.call in that way with leaving the second indexing vector blanc? > alist() actually allows this, although probably more by coincidence than by design. Watch: > do.call(`[`, alist(test, 2, )) [1] 2 4 If you want to turn this into a programming idiom, be aware that there are subtle differences because alist() does not evaluate its arguments. E.g., the two plots below are not quite the same. > x <- 1:10 > y <- rnorm(10) > do.call(plot, list(x, y)) > do.call(plot, alist(x, y)) -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com "D?den skal tape!" --- Nordahl Grieg From dwinsemius at comcast.net Wed Aug 3 16:26:36 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Wed, 3 Aug 2011 10:26:36 -0400 Subject: [R] slow computation of functions over large datasets In-Reply-To: References: Message-ID: <4498AEDA-CE00-4656-A0C6-123EE5DBDB3C@comcast.net> On Aug 3, 2011, at 9:25 AM, Caroline Faisst wrote: > Hello there, > > > I?m computing the total value of an order from the price of the > order items > using a ?for? loop and the ?ifelse? function. Ouch. Schools really should stop teaching SAS and BASIC as a first language. > I do this on a large dataframe > (close to 1m lines). The computation of this function is painfully > slow: in > 1min only about 90 rows are calculated. > > > The computation time taken for a given number of rows increases with > the > size of the dataset, see the example with my function below: > > > # small dataset: function performs well > > exampledata<- > data > .frame > (orderID=c(1,1,1,2,2,3,3,3,4),itemPrice=c(10,17,9,12,25,10,1,9,7)) > > exampledata[1,"orderAmount"]<-exampledata[1,"itemPrice"] > > system.time(for (i in 2:length(exampledata[,1])) > {exampledata[i,"orderAmount"]<- > ifelse > (exampledata > [i > ,"orderID > "]==exampledata[i-1,"orderID"],exampledata[i-1,"orderAmount"] > +exampledata[i,"itemPrice"],exampledata[i,"itemPrice"])}) Try instead using 'ave' to calculate a cumulative 'sum' within "orderID": exampledata$orderAmt <- with(exampledata, ave(itemPrice, orderID, FUN=cumsum) ) I assure you this will be more reproducible, faster, and understandable. > # large dataset: "medium" dataset really. Barely nudges the RAM dial on my machine. > the very same computational task takes much longer > > exampledata2<- > data > .frame > (orderID > = > c > (1,1,1,2,2,3,3,3,4,5 > :2000000),itemPrice=c(10,17,9,12,25,10,1,9,7,25:2000020)) > > exampledata2[1,"orderAmount"]<-exampledata2[1,"itemPrice"] > > system.time(for (i in 2:9) > {exampledata2[i,"orderAmount"]<- > ifelse > (exampledata2 > [i > ,"orderID > "]==exampledata2[i-1,"orderID"],exampledata2[i-1,"orderAmount"] > +exampledata2[i,"itemPrice"],exampledata2[i,"itemPrice"])}) > > > system.time( exampledata2$orderAmt <- with(exampledata2, ave(itemPrice, orderID, FUN=cumsum) ) ) user system elapsed 35.106 0.811 35.822 On a three year-old machine. Not as fast as I expected, but not long enough to require refilling the coffee cup either. -- David. > > Does someone know a way to increase the speed? > -- David Winsemius, MD West Hartford, CT From d.schwegler at email.de Wed Aug 3 16:33:41 2011 From: d.schwegler at email.de (Diana Schwegler) Date: Wed, 3 Aug 2011 07:33:41 -0700 (PDT) Subject: [R] step Message-ID: <1312382021343-3715681.post@n4.nabble.com> Hello I am using the "step" function in order to do backward selection for a linear model of more than 200 variables but it doesn't work correctly. I think, there is a problem, if the matrix has same or more columns than rows. And if the matrix has too much columns the step-function doesn't work because the function will work with all columns together and I think, this is the problem. Is there a solution or a bug fixing of this problem? Thanks a lot -- View this message in context: http://r.789695.n4.nabble.com/step-tp3715681p3715681.html Sent from the R help mailing list archive at Nabble.com. From ripley at stats.ox.ac.uk Wed Aug 3 16:40:32 2011 From: ripley at stats.ox.ac.uk (Prof Brian Ripley) Date: Wed, 3 Aug 2011 15:40:32 +0100 (BST) Subject: [R] syntax with do.call and `[` In-Reply-To: <56C31DEB-6365-48F3-A033-F49039A18B25@gmail.com> References: <1312367403.49149.YahooMailClassic@web28214.mail.ukl.yahoo.com> <56C31DEB-6365-48F3-A033-F49039A18B25@gmail.com> Message-ID: On Wed, 3 Aug 2011, peter dalgaard wrote: > > On Aug 3, 2011, at 12:30 , Jannis wrote: > >> Dear List, >> >> >> >> i would like to mimic the behaviour or the following indexing with a do.call construct to be able to supply the arguments to `[` as a list: >> >> >> test = matrix[1:4,2] >> >> result = test[2,] >> >> >> My try, however, did not work: >> >> result = do.call(`[`,list(test,2,NULL)) >> result = do.call(`[`,list(test,2,)) >> result = do.call(`[`,list(test,2,'')) >> >> >> How can I use the do.call in that way with leaving the second indexing vector blanc? >> > > alist() actually allows this, although probably more by coincidence than by design. > > Watch: > >> do.call(`[`, alist(test, 2, )) > [1] 2 4 > > If you want to turn this into a programming idiom, be aware that there are subtle differences because alist() does not evaluate its arguments. E.g., the two plots below are not quite the same. > >> x <- 1:10 >> y <- rnorm(10) >> do.call(plot, list(x, y)) >> do.call(plot, alist(x, y)) I decided to forbear suggesting that, not least as someone who writes >> test = matrix[1:4,2] appears to know very little R and test even less. But for indexing the answer could be something like do.call(`[`, list(test, 2, TRUE)) as in almost all cases (including here) an empty index is equivalent to TRUE (which is recycled to the required length). > -- > Peter Dalgaard, Professor, > Center for Statistics, Copenhagen Business School > Solbjerg Plads 3, 2000 Frederiksberg, Denmark > Phone: (+45)38153501 > Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com -- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 From nightwolfzor at gmail.com Wed Aug 3 16:23:09 2011 From: nightwolfzor at gmail.com (NightWolf) Date: Wed, 3 Aug 2011 07:23:09 -0700 (PDT) Subject: [R] Rattle loading String to Vector file from WEKA Message-ID: <1312381389502-3715641.post@n4.nabble.com> Hi all, I have been using WEKA to do some text classification work and I want to try out R. The problem is I cannot load the String to Vector ARFF files created by WEKA's string parser into Rattle . Looking at the logs I get something like: /Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : scan() expected 'a real', got '2281}'/ *My ARFF data file looks a bit like this:* /@relation 'reviewData' @attribute polarity {0,2} ..... @attribute $$ numeric @attribute we numeric @attribute wer numeric @attribute win numeric @attribute work numeric @data {0 2,63 1,71 1,100 1,112 1,140 1,186 1,228 1} {14 1,40 1,48 1,52 1,61 1,146 1} {2 1,41 1,43 1,57 1,71 1,79 1,106 1,108 1,133 1,146 1,149 1,158 1,201 1} {0 2,6 1,25 1,29 1,42 1,49 1,69 1,82 1,108 1,116 1,138 1,140 1,155 1} ..../ Any ideas how I can convert this into an R readable format? Cheers! -- View this message in context: http://r.789695.n4.nabble.com/R-Rattle-loading-String-to-Vector-file-from-WEKA-tp3715641p3715641.html Sent from the R help mailing list archive at Nabble.com. From zcatav at gmail.com Wed Aug 3 15:54:12 2011 From: zcatav at gmail.com (zcatav) Date: Wed, 3 Aug 2011 06:54:12 -0700 (PDT) Subject: [R] conditional data replace (recode, change or whatsoever) In-Reply-To: References: <1312358754120-3714715.post@n4.nabble.com> <09724EC7-2696-4150-ADF0-CCE5FA904AF7@gmail.com> <1312373370210-3715218.post@n4.nabble.com> Message-ID: <1312379652405-3715525.post@n4.nabble.com> Gabor Grothendieck wrote: > > On Wed, Aug 3, 2011 at 8:09 AM, zcatav <zcatav at gmail.com> wrote: >> Your suggestion works perfect as i pointed previous message. Now have >> another >> question about data editing. I try this code: >> X[X[,"c"]==1,"b"]<-X[,"d"] >> and results with error: `[<-.data.frame`(`*tmp*`, X[, "c"] == 1, "b", >> value >> = c(NA, ?: >> ?replacement has 9 rows, data has 2 >> >> Logically i selected 2 rows with X[,"c"]==1. Than i want to replace in >> that >> rows its own data from "d" to "b" with X[,"b"]<-X[,"d"]. What is wrong? >> > > Also check out transform and ifelse, e.g. > > transform(X, b = ifelse(is.na(b) & c == 0, "2011-07-28", b)) > > transform(X, b = ifelse(c == 1, d, c)) > > > transform(X, b = ifelse(is.na(b) & c == 0, "2011-07-28", b)) This code results as follows. Data at [1,b] and [9,b] not managed as Date. a b c d 1 58009 14915 0 2 114761 1 2008-11-05 3 184440 1 2009-12-08 4 189372 2011-07-28 0 5 105286 2011-07-28 0 6 186717 2011-07-28 0 7 189106 2011-07-28 0 8 127306 2011-07-28 0 9 157342 15089 0 And the second code > transform(X, b = ifelse(c == 1, d, c)) results as follows. Data at [,b] are completly lost. a b c d 1 58009 1 0 2 114761 14188 1 2008-11-05 3 184440 14586 1 2009-12-08 4 189372 1 0 5 105286 1 0 6 186717 1 0 7 189106 1 0 8 127306 1 0 9 157342 1 0 I think this solution not proper for me. -- View this message in context: http://r.789695.n4.nabble.com/conditional-data-replace-recode-change-or-whatsoever-tp3714715p3715525.html Sent from the R help mailing list archive at Nabble.com. From zcatav at gmail.com Wed Aug 3 15:57:49 2011 From: zcatav at gmail.com (zcatav) Date: Wed, 3 Aug 2011 06:57:49 -0700 (PDT) Subject: [R] conditional data replace (recode, change or whatsoever) In-Reply-To: <058ADE52-636D-4DD0-9B44-67071D139C66@comcast.net> References: <1312358754120-3714715.post@n4.nabble.com> <09724EC7-2696-4150-ADF0-CCE5FA904AF7@gmail.com> <1312373370210-3715218.post@n4.nabble.com> <058ADE52-636D-4DD0-9B44-67071D139C66@comcast.net> Message-ID: <1312379869684-3715544.post@n4.nabble.com> David Winsemius wrote: > > On Aug 3, 2011, at 8:09 AM, zcatav wrote: > ........................ > You need to apply the same logical test/selection on the rows of the > RHS as you are doing on the LHS. > Possibly: > > X[ X[,"c"]==1, "b"] <- X[ X[,"c"]==1, "d"] > > This solution was suggested by R. Michael Weylandt and it works great. -- View this message in context: http://r.789695.n4.nabble.com/conditional-data-replace-recode-change-or-whatsoever-tp3714715p3715544.html Sent from the R help mailing list archive at Nabble.com. From wtemptation at hotmail.co.uk Wed Aug 3 16:19:33 2011 From: wtemptation at hotmail.co.uk (xy) Date: Wed, 3 Aug 2011 07:19:33 -0700 (PDT) Subject: [R] lme4 help pls! Message-ID: <1312381173062-3715628.post@n4.nabble.com> Hi, I have some difficulties to work with the function lmer from lme4. My responses are binary form and i want to use forward selection to my 12 covariates but i dont know how can I choose them based on deviance. Can someone pls give me a example so i can apply. For example my covariates are gestation,smoking ...and my response baby b1=lmer(baby~ (1|id), data, binomial) Thanks. -- View this message in context: http://r.789695.n4.nabble.com/lme4-help-pls-tp3715628p3715628.html Sent from the R help mailing list archive at Nabble.com. From guillaume_bs at hotmail.com Wed Aug 3 16:20:28 2011 From: guillaume_bs at hotmail.com (Guillaume) Date: Wed, 3 Aug 2011 07:20:28 -0700 (PDT) Subject: [R] Memory limit in Aggregate() In-Reply-To: References: <1312278358738-3711819.post@n4.nabble.com> <1D3F4706-DFE7-4E0F-95D9-955331EEAAD0@gmail.com> <1312297814593-3712671.post@n4.nabble.com> <7BE2D45D-B379-49E4-8A51-25914C7569EE@gmail.com> <1312304960604-3713042.post@n4.nabble.com> Message-ID: <1312381228029-3715633.post@n4.nabble.com> Hi Peter, Thanks for these information. I used a column concatenating the listBy data to do this aggregation : (I don't know if it's the best solution, but it seems to work). aggregateMultiBy <- function(x, by, FUN){ tableBy = data.frame(by) tableBy$byKey = "" for(colBy in names(by)) tableBy$byKey = paste(tableBy$byKey, as.character(tableBy[,colBy]),"") tableOut <- aggregate( x = x , by = list(byKey = tableBy$byKey) , FUN = FUN) tableOut <- merge( x = tableOut , y = tableBy , by = "byKey") tableOut$byKey <- NULL return(tableOut) } Thanks again, Guillaume -- View this message in context: http://r.789695.n4.nabble.com/Memory-limit-in-Aggregate-tp3711819p3715633.html Sent from the R help mailing list archive at Nabble.com. From niu at isis.georgetown.edu Wed Aug 3 16:41:38 2011 From: niu at isis.georgetown.edu (tn85) Date: Wed, 3 Aug 2011 07:41:38 -0700 (PDT) Subject: [R] How to map current Europe? Message-ID: <1312382498584-3715709.post@n4.nabble.com> Hello All, I was trying to generate a map of Europe with the following codes: europe<-map(database="world", fill=FALSE, plot=TRUE,xlim=c(-25,70),ylim=c(35,71)) However, the "world" database is too old to have right European country names. Could anyone help? Thanks, Tianchan -- View this message in context: http://r.789695.n4.nabble.com/How-to-map-current-Europe-tp3715709p3715709.html Sent from the R help mailing list archive at Nabble.com. From gunter.berton at gene.com Wed Aug 3 16:49:25 2011 From: gunter.berton at gene.com (Bert Gunter) Date: Wed, 3 Aug 2011 07:49:25 -0700 Subject: [R] Combining multiple dependent variables for machine learning -- fortunes candidate? Message-ID: I thought Sarah's reply was great and, alas, should probably be templated for this list. Not sure it fits as a fortunes package entry, but I thought it at least worthy of consideration. Cheers, Bert >> ... >> I appreciate any suggestions for this problem. Sarah Goslee replied: > Suggestions? Yes. Read the posting guide and follow it. It isn't clear that > this is even an R question, since you don't tell us anything about the > packages or functions you are using, or about your data. There aren't > any actual questions in your message, and your problem statement > is exceedingly vague. > .... > > Sarah > > -- > Sarah Goslee > http://www.functionaldiversity.org > -- "Men by nature long to get on to the ultimate truths, and will often be impatient with elementary studies or fight shy of them. If it were possible to reach the ultimate truths without the elementary studies usually prefixed to them, these would not be preparatory studies but superfluous diversions." -- Maimonides (1135-1204) Bert Gunter Genentech Nonclinical Biostatistics From gavin.simpson at ucl.ac.uk Wed Aug 3 16:50:47 2011 From: gavin.simpson at ucl.ac.uk (Gavin Simpson) Date: Wed, 03 Aug 2011 15:50:47 +0100 Subject: [R] Running R in a sandbox In-Reply-To: References: Message-ID: <1312383047.5178.11.camel@prometheus.geog.ucl.ac.uk> On Wed, 2011-08-03 at 11:04 +0300, Antonio Rodriges wrote: > Hello, > > The idea is to grant access of remote users to R running on Linux. > Users must have ability to run their > R scripts but avoid corrupting the operating system. > > How one can restrict/limit access of remote users to certain R > functions? For example, dealing with IO (file system), graphical > tools, etc. We've been here before, IIRC. But I'm too lazy to check the archives - that's your job ;-) Try a search on http://finzi.psych.upenn.edu/search.html for relevant terms and make sure you turn on the email lists and off the functions/vignettes. > Thank you. G -- %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% Dr. Gavin Simpson [t] +44 (0)20 7679 0522 ECRC, UCL Geography, [f] +44 (0)20 7679 0565 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/ UK. WC1E 6BT. [w] http://www.freshwaters.org.uk %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% From pomchip at free.fr Wed Aug 3 16:51:12 2011 From: pomchip at free.fr (=?ISO-8859-1?Q?S=E9bastien_Bihorel?=) Date: Wed, 3 Aug 2011 10:51:12 -0400 Subject: [R] How to calculate the number of time a given string can be displayed in the width of a grid viewport Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From dwinsemius at comcast.net Wed Aug 3 17:10:25 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Wed, 3 Aug 2011 11:10:25 -0400 Subject: [R] slow computation of functions over large datasets In-Reply-To: References: Message-ID: On Aug 3, 2011, at 9:59 AM, ONKELINX, Thierry wrote: > Dear Caroline, > > Here is a faster and more elegant solution. > >> n <- 10000 >> exampledata <- data.frame(orderID = sample(floor(n / 5), n, replace >> = TRUE), itemPrice = rpois(n, 10)) >> library(plyr) >> system.time({ > + ddply(exampledata, .(orderID), function(x){ > + data.frame(itemPrice = x$itemPrice, orderAmount = cumsum(x > $itemPrice)) > + }) > + }) > user system elapsed > 1.67 0.00 1.69 >> exampledata[1,"orderAmount"]<-exampledata[1,"itemPrice"] >> system.time(for (i in 2:length(exampledata[,1])) > + {exampledata[i,"orderAmount"]<- > ifelse > (exampledata > [i > ,"orderID > "]==exampledata[i-1,"orderID"],exampledata[i-1,"orderAmount"] > +exampledata[i,"itemPrice"],exampledata[i,"itemPrice"])}) > user system elapsed > 11.94 0.02 11.97 I tried running this method on the "large dataset" (2MM row) the OP offered, and needed to eventually interrupt it so I could get my console back: > system.time({ + ddply(exampledata2, .(orderID), function(x){ + data.frame(itemPrice = x$itemPrice, orderAmount = cumsum(x $itemPrice)) + }) + }) Timing stopped at: 808.473 1013.749 1816.125 The same task with ave() took 35 seconds. -- david. > > Best regards, > > Thierry >> -----Oorspronkelijk bericht----- >> Van: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org >> ] >> Namens Caroline Faisst >> Verzonden: woensdag 3 augustus 2011 15:26 >> Aan: r-help at r-project.org >> Onderwerp: [R] slow computation of functions over large datasets >> >> Hello there, >> >> >> I'm computing the total value of an order from the price of the >> order items using >> a "for" loop and the "ifelse" function. I do this on a large >> dataframe (close to >> 1m lines). The computation of this function is painfully slow: in >> 1min only about >> 90 rows are calculated. >> >> >> The computation time taken for a given number of rows increases >> with the size >> of the dataset, see the example with my function below: >> >> >> # small dataset: function performs well >> >> exampledata<- >> data >> .frame >> (orderID=c(1,1,1,2,2,3,3,3,4),itemPrice=c(10,17,9,12,25,10,1,9,7)) >> >> exampledata[1,"orderAmount"]<-exampledata[1,"itemPrice"] >> >> system.time(for (i in 2:length(exampledata[,1])) >> {exampledata[i,"orderAmount"]<- >> ifelse >> (exampledata[i,"orderID"]==exampledata[i-1,"orderID"],exampledata[i- >> 1,"orderAmount"] >> +exampledata[i,"itemPrice"],exampledata[i,"itemPrice"])}) >> >> # large dataset: the very same computational task takes much longer >> >> exampledata2<- >> data >> .frame >> (orderID=c(1,1,1,2,2,3,3,3,4,5:2000000),itemPrice=c(10,17,9,12,25,1 >> 0,1,9,7,25:2000020)) >> >> exampledata2[1,"orderAmount"]<-exampledata2[1,"itemPrice"] >> >> system.time(for (i in 2:9) >> {exampledata2[i,"orderAmount"]<- >> ifelse(exampledata2[i,"orderID"]==exampledata2[i- >> 1,"orderID"],exampledata2[i- >> 1,"orderAmount"] >> +exampledata2[i,"itemPrice"],exampledata2[i,"itemPrice"])}) >> >> Does someone know a way to increase the speed? >> >> >> Thank you very much! >> >> Caroline David Winsemius, MD West Hartford, CT From jtor14 at gmail.com Wed Aug 3 17:43:25 2011 From: jtor14 at gmail.com (Justin) Date: Wed, 3 Aug 2011 15:43:25 +0000 Subject: [R] conditional data replace (recode, change or whatsoever) References: <1312358754120-3714715.post@n4.nabble.com> <09724EC7-2696-4150-ADF0-CCE5FA904AF7@gmail.com> <1312373370210-3715218.post@n4.nabble.com> Message-ID: zcatav gmail.com> writes: > > Your suggestion works perfect as i pointed previous message. Now have another > question about data editing. I try this code: > X[X[,"c"]==1,"b"]<-X[,"d"] > and results with error: `[<-.data.frame`(`*tmp*`, X[, "c"] == 1, "b", value > = c(NA, : > replacement has 9 rows, data has 2 > is this equivalent and/or preferred to: X$b[X$c==1]<-X$d[X$c==1] ?? I assume this goes back to the various indexing methods for a dataframe, an object vector that is a column of a data frame vs. an object data frame that happens to be one column of a larger data frame. on a very large data set is one preferable for speed? one for memory use? I tend to index using $ operators often and if I should quit let me know!! Thanks, Justin > Logically i selected 2 rows with X[,"c"]==1. Than i want to replace in that > rows its own data from "d" to "b" with X[,"b"]<-X[,"d"]. What is wrong? > > -- > View this message in context: http://r.789695.n4.nabble.com/conditional-data-replace-recode-change-or-whatsoever-tp3714715p3715218.html > Sent from the R help mailing list archive at Nabble.com. > > From diggsb at ohsu.edu Wed Aug 3 17:49:53 2011 From: diggsb at ohsu.edu (Brian Diggs) Date: Wed, 3 Aug 2011 08:49:53 -0700 Subject: [R] 3D Bar Graphs in ggplot2? In-Reply-To: <1312376866384-3715382.post@n4.nabble.com> References: <1312310369354-3713305.post@n4.nabble.com> <4E385D31.7090504@ohsu.edu> <1312376866384-3715382.post@n4.nabble.com> Message-ID: <4E396E21.5030904@ohsu.edu> On 8/3/2011 6:07 AM, wwreith wrote: > So I take it 3D pie charts are out? At least with ggplot, yes. 2D pie charts are somewhat tricky with ggplot, even. They can be gone with stacked, normalized bar charts projected into polar coordinates, if I recall properly. Not limited to ggplot, there is pie() in the graphics package, and pie3D() in the plotrix package. I couldn't find anything that would do bar plots with a 3D effect; the closest was the scatterplot3d package, but that is more a way to do a two dimensional array of bars, rather than a 3D effect. > P.S. It is not about hiding anything. It is about consulting and being told > by your client to make 3D pie charts and change this font or that color to > make the graphs more apealing. Given that I am the one trying to open the > door to using R where I work it would be much easier if I could simply use a > 2D graph. External requirements can make us make choices we otherwise might not have. If the client is amenable to education, you could slowly try to persuade (say, using side-by-side examples), but some are not. Good luck. > -- > View this message in context: http://r.789695.n4.nabble.com/3D-Bar-Graphs-in-ggplot2-tp3713305p3715382.html > Sent from the R help mailing list archive at Nabble.com. > -- Brian S. Diggs, PhD Senior Research Associate, Department of Surgery Oregon Health & Science University From pomchip at free.fr Wed Aug 3 17:51:03 2011 From: pomchip at free.fr (=?ISO-8859-1?Q?S=E9bastien_Bihorel?=) Date: Wed, 3 Aug 2011 11:51:03 -0400 Subject: [R] How to calculate the number of times a given string can be displayed in the width of a grid viewport Message-ID: There was too many spelling mistakes in my original post so I have decided to re-submit it. So here is it Dear R users, I am trying to determine how many characters can be displayed within the width of an open grid viewport. Unfortunately, the arithmetic operation that seems obvious in this case is not permitted with unit objects (see example below). Although there is a brute force way to get this number (using a while loop where the string would be modified by appending the original string to itseft until its width is larger than the width of the viewport), this solution seems a bit overworked. Any suggestion would be welcome. Sebastien require(grid) dev.off() dev.new() nstr <- '' str <- 'OXXXX' nInWidth <- floor(unit(1,'npc')/unit(1,'strwidth',str)) # Does not work nInWidth <- 0 convertWidth(unit(1,'strwidth',nstr),'npc') while (unclass(convertWidth(unit(1,'strwidth',paste(nstr,str,sep='')),'npc'))[1] - 1 <=.Machine$double.eps){ nInWidth <- nInWidth +1 nstr <- paste(nstr,str,sep='') } nInWidth grid.text(paste(rep(str,nInWidth),collapse=''), x = unit(0.5, "npc"), y = unit(0.5, "npc"), draw = TRUE) From ligges at statistik.tu-dortmund.de Wed Aug 3 17:57:55 2011 From: ligges at statistik.tu-dortmund.de (Uwe Ligges) Date: Wed, 03 Aug 2011 17:57:55 +0200 Subject: [R] Error Installing or Updating Packages (Maybe because of a proxy) In-Reply-To: <1312375984623-3715332.post@n4.nabble.com> References: <4DAE995D.9050504@statistik.tu-dortmund.de> <20110608173236.ba844ccb.olivier.crouzet@univ-nantes.fr> <20110609121455.916aab99.olivier.crouzet@univ-nantes.fr> <20110609122207.0da6d381.olivier.crouzet@univ-nantes.fr> <1312375984623-3715332.post@n4.nabble.com> Message-ID: <4E397003.3090101@statistik.tu-dortmund.de> 1. you wrote to the mailing list rather than to the original poster. 2. you forgot to cite the original post, hence we do not know what you are referring to. PLease do read the posting guide to this list! Uwe Ligges On 03.08.2011 14:53, mohammad.gh at gmail.com wrote: > Hello David, > I encountered the same problem of yours. > What did you do to resolve it? > Thanks for your reply > Mohammad > > -- > View this message in context: http://r.789695.n4.nabble.com/Error-Installing-or-Updating-Packages-Maybe-because-of-a-proxy-tp3462312p3715332.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From therneau at mayo.edu Wed Aug 3 18:15:06 2011 From: therneau at mayo.edu (Terry Therneau) Date: Wed, 03 Aug 2011 11:15:06 -0500 Subject: [R] Extract p value from coxme object Message-ID: <1312388106.21397.7.camel@nemo> You can look at the code coxme:::print.coxme There you will see that the global test is a chisquare chi1 <- 2*diff(x$loglik[1:2]) with x$df[1] degrees of freedom. The fixed effects coefficients are found in x$coefficients$fixed, and the variances are diag(x$var)[-(1:nfrail)]. (The variances for the random coefficients are first, and then those for the fixed effects). If there are 5 fixed coefficients, their variance/covariance matrix is the lower right 5x5 corner of x$var. Terry Therneau From jholtman at gmail.com Wed Aug 3 18:20:06 2011 From: jholtman at gmail.com (jim holtman) Date: Wed, 3 Aug 2011 12:20:06 -0400 Subject: [R] slow computation of functions over large datasets In-Reply-To: References: Message-ID: This takes about 2 secs for 1M rows: > n <- 1000000 > exampledata <- data.frame(orderID = sample(floor(n / 5), n, replace = TRUE), itemPrice = rpois(n, 10)) > require(data.table) > # convert to data.table > ed.dt <- data.table(exampledata) > system.time(result <- ed.dt[ + , list(total = sum(itemPrice)) + , by = orderID + ] + ) user system elapsed 1.30 0.05 1.34 > > str(result) Classes ?data.table? and 'data.frame': 198708 obs. of 2 variables: $ orderID: int 1 2 3 4 5 6 8 9 10 11 ... $ total : num 49 37 72 92 50 76 34 22 65 39 ... > head(result) orderID total [1,] 1 49 [2,] 2 37 [3,] 3 72 [4,] 4 92 [5,] 5 50 [6,] 6 76 > On Wed, Aug 3, 2011 at 9:25 AM, Caroline Faisst wrote: > Hello there, > > > I?m computing the total value of an order from the price of the order items > using a ?for? loop and the ?ifelse? function. I do this on a large dataframe > (close to 1m lines). The computation of this function is painfully slow: in > 1min only about 90 rows are calculated. > > > The computation time taken for a given number of rows increases with the > size of the dataset, see the example with my function below: > > > # small dataset: function performs well > > exampledata<-data.frame(orderID=c(1,1,1,2,2,3,3,3,4),itemPrice=c(10,17,9,12,25,10,1,9,7)) > > exampledata[1,"orderAmount"]<-exampledata[1,"itemPrice"] > > system.time(for (i in 2:length(exampledata[,1])) > {exampledata[i,"orderAmount"]<-ifelse(exampledata[i,"orderID"]==exampledata[i-1,"orderID"],exampledata[i-1,"orderAmount"]+exampledata[i,"itemPrice"],exampledata[i,"itemPrice"])}) > > > # large dataset: the very same computational task takes much longer > > exampledata2<-data.frame(orderID=c(1,1,1,2,2,3,3,3,4,5:2000000),itemPrice=c(10,17,9,12,25,10,1,9,7,25:2000020)) > > exampledata2[1,"orderAmount"]<-exampledata2[1,"itemPrice"] > > system.time(for (i in 2:9) > {exampledata2[i,"orderAmount"]<-ifelse(exampledata2[i,"orderID"]==exampledata2[i-1,"orderID"],exampledata2[i-1,"orderAmount"]+exampledata2[i,"itemPrice"],exampledata2[i,"itemPrice"])}) > > > > Does someone know a way to increase the speed? > > > Thank you very much! > > Caroline > > ? ? ? ?[[alternative HTML version deleted]] > > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? From f.calboli at imperial.ac.uk Wed Aug 3 18:37:24 2011 From: f.calboli at imperial.ac.uk (Federico Calboli) Date: Wed, 3 Aug 2011 17:37:24 +0100 Subject: [R] strsplit and forward slash '/' Message-ID: <576B97F0-B443-4F7D-9926-C03BC782C317@imperial.ac.uk> Hi All, is there a way of using strsplit with a forward slash '/' as the splitting point? For data such as: 1 T/T C/C 16/33 2 T/T C/C 33/36 3 T/T C/C 16/34 4 T/T C/C 16/31 5 C/C C/C 28/29 6 T/T C/C 16/34 strsplit(my.data[1,1], "/") # and any variation thereof Error in strsplit(apoe[1, 1], "/") : non-character argument Any advice will be gratefully received. Best wishes, Federico -- Federico C. F. Calboli Department of Epidemiology and Biostatistics Imperial College, St. Mary's Campus Norfolk Place, London W2 1PG Tel +44 (0)20 75941602 Fax +44 (0)20 75943193 f.calboli [.a.t] imperial.ac.uk f.calboli [.a.t] gmail.com From gbrenes at ssc.wisc.edu Wed Aug 3 18:40:16 2011 From: gbrenes at ssc.wisc.edu (gbrenes at ssc.wisc.edu) Date: Wed, 3 Aug 2011 11:40:16 -0500 Subject: [R] gstat error Message-ID: <1acbe04d8356bc51c23c7fb0f5a106b0.squirrel@webmail.ssc.wisc.edu> Hello. I am running the examples provided in the gstat help menus. When I try to run the following in predict.gstat: data(meuse) coordinates(meuse)= ~x+y v<-variogram(log(zinc)~1, meuse) I get the following error message: Error in vector("double", length) : invalid 'length' argument What's the problem? Gilbert From murdoch.duncan at gmail.com Wed Aug 3 18:41:42 2011 From: murdoch.duncan at gmail.com (Duncan Murdoch) Date: Wed, 3 Aug 2011 12:41:42 -0400 Subject: [R] strsplit and forward slash '/' In-Reply-To: <576B97F0-B443-4F7D-9926-C03BC782C317@imperial.ac.uk> References: <576B97F0-B443-4F7D-9926-C03BC782C317@imperial.ac.uk> Message-ID: <4E397A46.5000201@gmail.com> On 03/08/2011 12:37 PM, Federico Calboli wrote: > Hi All, > > is there a way of using strsplit with a forward slash '/' as the splitting point? > > For data such as: > > 1 T/T C/C 16/33 > 2 T/T C/C 33/36 > 3 T/T C/C 16/34 > 4 T/T C/C 16/31 > 5 C/C C/C 28/29 > 6 T/T C/C 16/34 > > strsplit(my.data[1,1], "/") # and any variation thereof > Error in strsplit(apoe[1, 1], "/") : non-character argument It looks as though your my.data[1,1] value is a factor, not a character value. strsplit(as.character(my.data[1,1]), "/") would work, or you could avoid getting factors in the first place, using the stringsAsFactors argument when you create the dataframe. Duncan Murdoch > Any advice will be gratefully received. > > Best wishes, > > Federico > > > -- > Federico C. F. Calboli > Department of Epidemiology and Biostatistics > Imperial College, St. Mary's Campus > Norfolk Place, London W2 1PG > > Tel +44 (0)20 75941602 Fax +44 (0)20 75943193 > > f.calboli [.a.t] imperial.ac.uk > f.calboli [.a.t] gmail.com > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From sarah.goslee at gmail.com Wed Aug 3 18:46:35 2011 From: sarah.goslee at gmail.com (Sarah Goslee) Date: Wed, 3 Aug 2011 12:46:35 -0400 Subject: [R] strsplit and forward slash '/' In-Reply-To: <576B97F0-B443-4F7D-9926-C03BC782C317@imperial.ac.uk> References: <576B97F0-B443-4F7D-9926-C03BC782C317@imperial.ac.uk> Message-ID: Hi Federico, A forward slash isn't a special character: > strsplit("T/T", "/") [[1]] [1] "T" "T" so there's some other problem. Are you sure that your first column contains strings and not factors? What does str(my.data) tell you? Does strsplit(as.character(my.data[1,1]), "/") work? If you used read.table() to get your data in, you might want the as.is=TRUE or the stringsAsFactors=FALSE argument. Sarah On Wed, Aug 3, 2011 at 12:37 PM, Federico Calboli wrote: > Hi All, > > is there a way of using strsplit with a forward slash '/' as the splitting point? > > For data such as: > > 1 ? ? ?T/T ? ?C/C ?16/33 > 2 ? ? ?T/T ? ?C/C ?33/36 > 3 ? ? ?T/T ? ?C/C ?16/34 > 4 ? ? ?T/T ? ?C/C ?16/31 > 5 ? ? ?C/C ? ?C/C ?28/29 > 6 ? ? ?T/T ? ?C/C ?16/34 > > strsplit(my.data[1,1], "/") # and any variation thereof > Error in strsplit(apoe[1, 1], "/") : non-character argument > > Any advice will be gratefully received. > > Best wishes, > > Federico > > -- Sarah Goslee http://www.sarahgoslee.com From f.calboli at imperial.ac.uk Wed Aug 3 18:55:29 2011 From: f.calboli at imperial.ac.uk (Federico Calboli) Date: Wed, 3 Aug 2011 17:55:29 +0100 Subject: [R] strsplit and forward slash '/' In-Reply-To: <4E397A46.5000201@gmail.com> References: <576B97F0-B443-4F7D-9926-C03BC782C317@imperial.ac.uk> <4E397A46.5000201@gmail.com> Message-ID: <4FBC4880-4AC8-4CFA-854D-2FD722081F91@imperial.ac.uk> On 3 Aug 2011, at 17:41, Duncan Murdoch wrote: > > It looks as though your my.data[1,1] value is a factor, not a character value. > > strsplit(as.character(my.data[1,1]), "/") Thanks Duncan, this solved it. Best Federico > > would work, or you could avoid getting factors in the first place, using the stringsAsFactors argument when you create the dataframe. > > Duncan Murdoch > > >> Any advice will be gratefully received. >> >> Best wishes, >> >> Federico >> >> >> -- >> Federico C. F. Calboli >> Department of Epidemiology and Biostatistics >> Imperial College, St. Mary's Campus >> Norfolk Place, London W2 1PG >> >> Tel +44 (0)20 75941602 Fax +44 (0)20 75943193 >> >> f.calboli [.a.t] imperial.ac.uk >> f.calboli [.a.t] gmail.com >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > -- Federico C. F. Calboli Department of Epidemiology and Biostatistics Imperial College, St. Mary's Campus Norfolk Place, London W2 1PG Tel +44 (0)20 75941602 Fax +44 (0)20 75943193 f.calboli [.a.t] imperial.ac.uk f.calboli [.a.t] gmail.com From f.calboli at imperial.ac.uk Wed Aug 3 18:56:01 2011 From: f.calboli at imperial.ac.uk (Federico Calboli) Date: Wed, 3 Aug 2011 17:56:01 +0100 Subject: [R] strsplit and forward slash '/' In-Reply-To: References: <576B97F0-B443-4F7D-9926-C03BC782C317@imperial.ac.uk> Message-ID: <3D0070FE-69EC-47A0-AB65-43812B840672@imperial.ac.uk> On 3 Aug 2011, at 17:46, Sarah Goslee wrote: > Hi Federico, > > A forward slash isn't a special character: > >> strsplit("T/T", "/") > [[1]] > [1] "T" "T" > > so there's some other problem. > > Are you sure that your first column contains strings and not factors? > What does str(my.data) tell you? > > Does > strsplit(as.character(my.data[1,1]), "/") > work? yes! Thanks Federico > > If you used read.table() to get your data in, you might want the > as.is=TRUE or the stringsAsFactors=FALSE argument. > > Sarah > > On Wed, Aug 3, 2011 at 12:37 PM, Federico Calboli > wrote: >> Hi All, >> >> is there a way of using strsplit with a forward slash '/' as the splitting point? >> >> For data such as: >> >> 1 T/T C/C 16/33 >> 2 T/T C/C 33/36 >> 3 T/T C/C 16/34 >> 4 T/T C/C 16/31 >> 5 C/C C/C 28/29 >> 6 T/T C/C 16/34 >> >> strsplit(my.data[1,1], "/") # and any variation thereof >> Error in strsplit(apoe[1, 1], "/") : non-character argument >> >> Any advice will be gratefully received. >> >> Best wishes, >> >> Federico >> >> > > > -- > Sarah Goslee > http://www.sarahgoslee.com -- Federico C. F. Calboli Department of Epidemiology and Biostatistics Imperial College, St. Mary's Campus Norfolk Place, London W2 1PG Tel +44 (0)20 75941602 Fax +44 (0)20 75943193 f.calboli [.a.t] imperial.ac.uk f.calboli [.a.t] gmail.com From murdoch.duncan at gmail.com Wed Aug 3 19:04:59 2011 From: murdoch.duncan at gmail.com (Duncan Murdoch) Date: Wed, 03 Aug 2011 13:04:59 -0400 Subject: [R] R CMD check problem In-Reply-To: References: <4E37D5A7.20009@gmail.com> Message-ID: <4E397FBB.8010901@gmail.com> On 03/08/2011 12:47 PM, Baidya Nath Mandal wrote: > Dear Murdoch, > > After setting CYGWIN=nodosfilewarning, i re-ran the R CMD check and got > following message: > > * installing *source* package 'mypackage' ... > ** libs > ERROR: compilation failed for package 'mypackage' > * removing 'C:/Rpackages/mypackage.Rcheck/mypackage' > > The log file contained following. > * using log directory 'C:/Rpackages/mypackage.Rcheck' > * using R version 2.13.0 (2011-04-13) > * using platform: i386-pc-mingw32 (32-bit) > * using session charset: ISO8859-1 > * checking for file 'mypackage/DESCRIPTION' ... OK > * this is package 'mypackage' version '1.1' > * checking package name space information ... OK > * checking package dependencies ... OK > * checking if this is a source package ... OK > * checking for executable files ... OK > * checking whether package 'mypackage' can be installed ... ERROR > Installation failed. > See 'C:/Rpackages/mypackage.Rcheck/00install.out' for details. > > The src directory contains nothing since all my codes are in R and are in > the R directory. I have checked that the code works fine in R console. My > DESCRIPTION file is like this: > Package: mypackage > Version: 1.1 > Date: 2011-07-14 > Title: abcd > Author: B N Mandal > Maintainer: B N Mandal > Depends: R(>= 2.13.0) > Description: xyz > License: GPL (>=2) > > and NAMESPACE file contains > export(fun1) > > I have checked Rd files are fine. > > Can you suggest what may be wrong now? You should delete your src directory if you don't need it. Duncan Murdoch > regards, > BN Mandal > > On Tue, Aug 2, 2011 at 4:17 PM, Duncan Murdochwrote: > > > On 11-08-02 5:26 AM, Baidya Nath Mandal wrote: > > > >> Dear friends, > >> > >> I am building an R package called *mypackage*. I followed every possible > >> steps (to my understanding) for the same. I got following problem while > >> doing *R CMD check mypackage*. > >> > >> * installing *source* package 'mypackage' ... > >> ** libs > >> cygwin warning: > >> MS-DOS style path detected: C:/PROGRA~1/R/R-213~1.0/etc/**i386/Makeconf > >> Preferred POSIX equivalent is: > >> /cygdrive/c/PROGRA~1/R/R-213~**1.0/etc/i386/Makeconf > >> CYGWIN environment variable option "nodosfilewarning" turns off this > >> warning. > >> Consult the user's guide for more details about POSIX paths: > >> http://cygwin.com/cygwin-ug-**net/using.html#using-pathnames > >> > > > > I believe that warning is ignorable, but you can turn it off using > > > > set CYGWIN=nodosfilewarning > > > > It probably didn't cause the error below. > > > > > > ERROR: compilation failed for package 'mypackage' > >> > > > > I don't know what did cause that error, but it's likely something in your > > src directory of the package. What do you have there? > > > > Duncan Murdoch > > > > * removing 'C:/Rpackages/mypackage.**Rcheck/mypackage'. > >> > >> What I understood from above is that it is something with PATH variable. I > >> had set the following PATH variable: > >> C:\Rtools\bin;C:\Rtools\MinGW\**bin;"C:\Program > >> Files\R\R-2.13.0\bin";"C:\**Program Files\MiKTeX > >> 2.9\miktex\bin";%SystemRoot%\**system32;%SystemRoot%;%** > >> SystemRoot%\System32\Wbem;%**SYSTEMROOT%\System32\** > >> WindowsPowerShell\v1.0\;"C:\**Program > >> Files\HTML Help Workshop" > >> > >> > >> Can anybody suggest what possibly could have gone wrong? > >> > >> Thanks, > >> BN Mandal > >> > >> [[alternative HTML version deleted]] > >> > >> ______________________________**________________ > >> R-help at r-project.org mailing list > >> https://stat.ethz.ch/mailman/**listinfo/r-help > >> PLEASE do read the posting guide http://www.R-project.org/** > >> posting-guide.html > >> and provide commented, minimal, self-contained, reproducible code. > >> > > > > > From dwinsemius at comcast.net Wed Aug 3 19:12:30 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Wed, 3 Aug 2011 13:12:30 -0400 Subject: [R] slow computation of functions over large datasets In-Reply-To: References: Message-ID: <427F9332-302E-4EF3-91C2-80D75255B1DD@comcast.net> On Aug 3, 2011, at 12:20 PM, jim holtman wrote: > This takes about 2 secs for 1M rows: > >> n <- 1000000 >> exampledata <- data.frame(orderID = sample(floor(n / 5), n, replace >> = TRUE), itemPrice = rpois(n, 10)) >> require(data.table) >> # convert to data.table >> ed.dt <- data.table(exampledata) >> system.time(result <- ed.dt[ > + , list(total = sum(itemPrice)) > + , by = orderID > + ] > + ) > user system elapsed > 1.30 0.05 1.34 Interesting. Impressive. And I noted that the OP wanted what cumsum would provide and for some reason creating that longer result is even faster on my machine than the shorter result using sum. -- David. >> >> str(result) > Classes ?data.table? and 'data.frame': 198708 obs. of 2 variables: > $ orderID: int 1 2 3 4 5 6 8 9 10 11 ... > $ total : num 49 37 72 92 50 76 34 22 65 39 ... >> head(result) > orderID total > [1,] 1 49 > [2,] 2 37 > [3,] 3 72 > [4,] 4 92 > [5,] 5 50 > [6,] 6 76 >> > > > On Wed, Aug 3, 2011 at 9:25 AM, Caroline Faisst > wrote: >> Hello there, >> >> >> I?m computing the total value of an order from the price of the >> order items >> using a ?for? loop and the ?ifelse? function. I do this on a large >> dataframe >> (close to 1m lines). The computation of this function is painfully >> slow: in >> 1min only about 90 rows are calculated. >> >> >> The computation time taken for a given number of rows increases >> with the >> size of the dataset, see the example with my function below: >> >> >> # small dataset: function performs well >> >> exampledata<- >> data >> .frame >> (orderID=c(1,1,1,2,2,3,3,3,4),itemPrice=c(10,17,9,12,25,10,1,9,7)) >> >> exampledata[1,"orderAmount"]<-exampledata[1,"itemPrice"] >> >> system.time(for (i in 2:length(exampledata[,1])) >> {exampledata[i,"orderAmount"]<- >> ifelse >> (exampledata >> [i >> ,"orderID >> "]==exampledata[i-1,"orderID"],exampledata[i-1,"orderAmount"] >> +exampledata[i,"itemPrice"],exampledata[i,"itemPrice"])}) >> >> >> # large dataset: the very same computational task takes much longer >> >> exampledata2<- >> data >> .frame >> (orderID >> = >> c >> (1,1,1,2,2,3,3,3,4,5 >> :2000000),itemPrice=c(10,17,9,12,25,10,1,9,7,25:2000020)) >> >> exampledata2[1,"orderAmount"]<-exampledata2[1,"itemPrice"] >> >> system.time(for (i in 2:9) >> {exampledata2[i,"orderAmount"]<- >> ifelse >> (exampledata2 >> [i >> ,"orderID >> "]==exampledata2[i-1,"orderID"],exampledata2[i-1,"orderAmount"] >> +exampledata2[i,"itemPrice"],exampledata2[i,"itemPrice"])}) >> >> >> >> Does someone know a way to increase the speed? >> >> >> Thank you very much! >> >> Caroline >> >> [[alternative HTML version deleted]] >> >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> >> > > > > -- > Jim Holtman > Data Munger Guru > > What is the problem that you are trying to solve? > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT From ehlers at ucalgary.ca Wed Aug 3 19:22:47 2011 From: ehlers at ucalgary.ca (Peter Ehlers) Date: Wed, 03 Aug 2011 10:22:47 -0700 Subject: [R] gstat error In-Reply-To: <1acbe04d8356bc51c23c7fb0f5a106b0.squirrel@webmail.ssc.wisc.edu> References: <1acbe04d8356bc51c23c7fb0f5a106b0.squirrel@webmail.ssc.wisc.edu> Message-ID: <4E3983E7.4000301@ucalgary.ca> On 2011-08-03 09:40, gbrenes at ssc.wisc.edu wrote: > Hello. > > I am running the examples provided in the gstat help menus. When I try to > run the following in predict.gstat: > > data(meuse) > coordinates(meuse)= ~x+y > v<-variogram(log(zinc)~1, meuse) > > I get the following error message: > > Error in vector("double", length) : invalid 'length' argument > > > What's the problem? You should at the very least provide your sessionInfo(). Peter Ehlers > > > Gilbert > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From Greg.Snow at imail.org Wed Aug 3 20:02:14 2011 From: Greg.Snow at imail.org (Greg Snow) Date: Wed, 3 Aug 2011 12:02:14 -0600 Subject: [R] Coefficient names when using lm() with contrasts In-Reply-To: References: Message-ID: If you add column names to your contrast matrix (treat3) then those names will be used in the coefficient names. -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.snow at imail.org 801.408.8111 > -----Original Message----- > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r- > project.org] On Behalf Of Peter Morgan > Sent: Wednesday, August 03, 2011 6:12 AM > To: r-help at r-project.org > Subject: [R] Coefficient names when using lm() with contrasts > > Dear R Users, > > Am using lm() with contrasts as below. If I skip the contrasts() > statement, I get the coefficient names to be > > names(results$coef) > [1] "(Intercept)" "VarAcat" "VarArat" "VarB" > > which are much more meaningful than ones based on integers. > > Can anyone tell me how to get R to keep the coefficient names based on > the > factor levels whilst using contrasts rather than labelling them with > integers? > > Many thanks in advance, > > Pete > > Cardiff, UK > > > dt=read.table("testreg.txt",sep=",",header=T) > > dt > ID VarA VarB VarC > 1 1 cat 2 23 > 2 2 dog 3 56 > 3 3 rat 5 35 > 4 4 cat 2 43 > 5 5 cat 7 51 > 6 6 dog 3 31 > 7 7 dog 4 65 > 8 8 rat 1 18 > 9 9 rat 6 49 > 10 10 dog 3 28 > > dt$VarA=relevel(dt$VarA,ref="dog") > > treat3=matrix(-1/3,ncol=2,nrow=3); for (i in 1:2) {treat3[i+1,i]=2/3} > > contrasts(dt$VarA)=treat3 > > levels(dt$VarA) > [1] "dog" "cat" "rat" > > results=lm(formula=VarC~VarA+VarB, data=dt) > > names(results$coef) > [1] "(Intercept)" "VarA1" "VarA2" "VarB" > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting- > guide.html > and provide commented, minimal, self-contained, reproducible code. From dwinsemius at comcast.net Wed Aug 3 20:09:42 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Wed, 3 Aug 2011 14:09:42 -0400 Subject: [R] slow computation of functions over large datasets In-Reply-To: <44DE9313-3653-4201-954A-E770FDF2FAEF@gmail.com> References: <427F9332-302E-4EF3-91C2-80D75255B1DD@comcast.net> <44DE9313-3653-4201-954A-E770FDF2FAEF@gmail.com> Message-ID: <3E01647D-7ED8-4B01-B8B4-6D258AB79CC3@comcast.net> On Aug 3, 2011, at 2:01 PM, Ken wrote: > Hello, > Perhaps transpose the table attach(as.data.frame(t(data))) and use > ColSums() function with order id as header. > -Ken Hutchison Got any code? The OP offered a reproducible example, after all. -- David. > > On Aug 3, 2554 BE, at 1:12 PM, David Winsemius > wrote: > >> >> On Aug 3, 2011, at 12:20 PM, jim holtman wrote: >> >>> This takes about 2 secs for 1M rows: >>> >>>> n <- 1000000 >>>> exampledata <- data.frame(orderID = sample(floor(n / 5), n, >>>> replace = TRUE), itemPrice = rpois(n, 10)) >>>> require(data.table) >>>> # convert to data.table >>>> ed.dt <- data.table(exampledata) >>>> system.time(result <- ed.dt[ >>> + , list(total = sum(itemPrice)) >>> + , by = orderID >>> + ] >>> + ) >>> user system elapsed >>> 1.30 0.05 1.34 >> >> Interesting. Impressive. And I noted that the OP wanted what cumsum >> would provide and for some reason creating that longer result is >> even faster on my machine than the shorter result using sum. >> >> -- >> David. >>>> >>>> str(result) >>> Classes ?data.table? and 'data.frame': 198708 obs. of 2 variables: >>> $ orderID: int 1 2 3 4 5 6 8 9 10 11 ... >>> $ total : num 49 37 72 92 50 76 34 22 65 39 ... >>>> head(result) >>> orderID total >>> [1,] 1 49 >>> [2,] 2 37 >>> [3,] 3 72 >>> [4,] 4 92 >>> [5,] 5 50 >>> [6,] 6 76 >>>> >>> >>> >>> On Wed, Aug 3, 2011 at 9:25 AM, Caroline Faisst >>> wrote: >>>> Hello there, >>>> >>>> >>>> I?m computing the total value of an order from the price of the >>>> order items >>>> using a ?for? loop and the ?ifelse? function. I do this on a >>>> large dataframe >>>> (close to 1m lines). The computation of this function is >>>> painfully slow: in >>>> 1min only about 90 rows are calculated. >>>> >>>> >>>> The computation time taken for a given number of rows increases >>>> with the >>>> size of the dataset, see the example with my function below: >>>> >>>> >>>> # small dataset: function performs well >>>> >>>> exampledata<- >>>> data >>>> .frame >>>> (orderID=c(1,1,1,2,2,3,3,3,4),itemPrice=c(10,17,9,12,25,10,1,9,7)) >>>> >>>> exampledata[1,"orderAmount"]<-exampledata[1,"itemPrice"] >>>> >>>> system.time(for (i in 2:length(exampledata[,1])) >>>> {exampledata[i,"orderAmount"]<- >>>> ifelse >>>> (exampledata >>>> [i >>>> ,"orderID >>>> "]==exampledata[i-1,"orderID"],exampledata[i-1,"orderAmount"] >>>> +exampledata[i,"itemPrice"],exampledata[i,"itemPrice"])}) >>>> >>>> >>>> # large dataset: the very same computational task takes much longer >>>> >>>> exampledata2<- >>>> data >>>> .frame >>>> (orderID >>>> = >>>> c >>>> (1,1,1,2,2,3,3,3,4,5 >>>> :2000000),itemPrice=c(10,17,9,12,25,10,1,9,7,25:2000020)) >>>> >>>> exampledata2[1,"orderAmount"]<-exampledata2[1,"itemPrice"] >>>> >>>> system.time(for (i in 2:9) >>>> {exampledata2[i,"orderAmount"]<- >>>> ifelse >>>> (exampledata2 >>>> [i >>>> ,"orderID >>>> "]==exampledata2[i-1,"orderID"],exampledata2[i-1,"orderAmount"] >>>> +exampledata2[i,"itemPrice"],exampledata2[i,"itemPrice"])}) >>>> >>>> >>>> >>>> Does someone know a way to increase the speed? >>>> >>>> >>>> Thank you very much! >>>> >>>> Caroline >>>> >>>> [[alternative HTML version deleted]] >>>> >>>> >>>> ______________________________________________ >>>> R-help at r-project.org mailing list >>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>>> and provide commented, minimal, self-contained, reproducible code. >>>> >>>> >>> >>> >>> >>> -- >>> Jim Holtman >>> Data Munger Guru >>> >>> What is the problem that you are trying to solve? >>> >>> ______________________________________________ >>> R-help at r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >> >> David Winsemius, MD >> West Hartford, CT >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT From gbrenes at ssc.wisc.edu Wed Aug 3 20:45:09 2011 From: gbrenes at ssc.wisc.edu (gbrenes at ssc.wisc.edu) Date: Wed, 3 Aug 2011 13:45:09 -0500 Subject: [R] gstat error In-Reply-To: <4E3983E7.4000301@ucalgary.ca> References: <1acbe04d8356bc51c23c7fb0f5a106b0.squirrel@webmail.ssc.wisc.edu> <4E3983E7.4000301@ucalgary.ca> Message-ID: <17dd0543208bb9f9fd6085b273fdc539.squirrel@webmail.ssc.wisc.edu> Here is my sessionInfo() > sessionInfo() R version 2.12.2 (2011-02-25) Platform: i386-pc-mingw32/i386 (32-bit) locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] splines grid stats graphics grDevices utils datasets methods [9] base other attached packages: [1] spsurvey_2.1-2 lmtest_0.9-27 zoo_1.6-5 [4] car_2.0-9 survival_2.36-5 nnet_7.3-1 [7] spgwr_0.6-10 spatialCovariance_0.6-4 spatial_7.3-2 [10] spatgraphs_2.44 sgeostat_1.0-23 rworldmap_0.1211 [13] fields_6.3 spam_0.23-0 RPyGeo_0.9-2 [16] RSAGA_0.91-1 shapefiles_0.6 RgoogleMaps_1.1.9.7 [19] raster_1.8-22 RArcInfo_0.4-10 RColorBrewer_1.0-2 [22] PBSmodelling_2.61.210 PBSmapping_2.61.9 mapproj_1.1-8.3 [25] mapdata_2.1-4 intamap_1.3-8 evd_2.2-4 [28] mvtnorm_0.9-96 automap_1.0-9 rgdal_0.6-33 [31] gmaps_0.2 maps_2.1-6 glmmBUGS_1.9 [34] spdep_0.5-32 coda_0.14-2 deldir_0.0-13 [37] maptools_0.8-7 foreign_0.8-42 Matrix_0.999375-46 [40] lattice_0.19-17 boot_1.2-43 abind_1.3-0 [43] MASS_7.3-11 geosphere_1.2-19 geonames_0.8 [46] rjson_0.2.3 ctv_0.7-2 GEOmap_1.5-13 [49] akima_0.5-4 RPMG_2.0-5 splancs_2.01-27 [52] geomapdata_1.0-4 geoRglm_0.8-33 geoR_1.6-34 [55] gstat_0.9-81 sp_0.9-81 nlme_3.1-98 loaded via a namespace (and not attached): [1] tcltk_2.12.2 tools_2.12.2 > > On 2011-08-03 09:40, gbrenes at ssc.wisc.edu wrote: >> Hello. >> >> I am running the examples provided in the gstat help menus. When I try >> to >> run the following in predict.gstat: >> >> data(meuse) >> coordinates(meuse)= ~x+y >> v<-variogram(log(zinc)~1, meuse) >> >> I get the following error message: >> >> Error in vector("double", length) : invalid 'length' argument >> >> >> What's the problem? > > You should at the very least provide your sessionInfo(). > > Peter Ehlers > >> >> >> Gilbert >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > From johjeffrey at hotmail.com Wed Aug 3 21:04:16 2011 From: johjeffrey at hotmail.com (Jeffrey Joh) Date: Wed, 3 Aug 2011 12:04:16 -0700 Subject: [R] Convert matrix to numeric Message-ID: I have a matrix that looks like this: structure(c("0.0376673981759913", "0.111066500741386", "1", "1103", "18", "OPEN", "DEPR", "0.0404073656092023", "0.115186044704599", "1", "719", "18", "OPEN", "DEPR", "0.0665342096693433", "0.197570061769498", "1", "1103", "18", "OPEN", "DEPR", "0.119287147905722", "0.356427096010845", "1", "1103", "18", "OPEN", "DEPR"), .Dim = c(7L, 4L), .Dimnames = list( c("Sn", "SlnC", "housenum", "date", "hour", "flue", "pressurization" ), c("10019.BLO", "1002.BLO", "10020.BLO", "10021.BLO"))) How do I convert rows 1-5 to numeric? I tried mode() <- "numeric" but that doesn't change anything. I also tried converting this to a table then converting to numeric, but I got: (list) object cannot be coerced to type 'double' Jeff From knifeboot at 163.com Wed Aug 3 17:21:56 2011 From: knifeboot at 163.com (KnifeBoot) Date: Wed, 3 Aug 2011 23:21:56 +0800 (CST) Subject: [R] r-help Message-ID: <5e37969a.8715.131903c9ab2.Coremail.knifeboot@163.com> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From dpessoa at igc.gulbenkian.pt Wed Aug 3 17:07:05 2011 From: dpessoa at igc.gulbenkian.pt (Delphine Pessoa) Date: Wed, 3 Aug 2011 16:07:05 +0100 Subject: [R] Case-by-case tolerance needed for successful integrate() In-Reply-To: References: Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From ligges at statistik.tu-dortmund.de Wed Aug 3 18:00:08 2011 From: ligges at statistik.tu-dortmund.de (Uwe Ligges) Date: Wed, 03 Aug 2011 18:00:08 +0200 Subject: [R] R-help Digest, Vol 102, Issue 3 In-Reply-To: <20110803101132.1303.qmail@srv5.yoursite.ch> References: <20110803101132.1303.qmail@srv5.yoursite.ch> Message-ID: <4E397088.6020409@statistik.tu-dortmund.de> Since we got this the x-th time now: Dear Fr?nzi Korner, please please please never ever add auto-replies to your account that also reply to mailing list messages! Thousands of readers of R-help get your auto reply everey day now! Best, Uwe Ligges On 03.08.2011 12:11, fraenzi.korner at oikostat.ch wrote: > Wir sind bis am 20. August in den Ferien und werden keine e-mails beantworten. Bei dringenden F?llen melden Sie sich bei Stefanie von Felten steffi.vonfelten at oikostat.ch > > We are on vacation until 20. August. In urgent cases, please contact Stefanie von Felten steffi.vonfelten at oikostat.ch > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From gantkant at walla.com Wed Aug 3 18:00:04 2011 From: gantkant at walla.com (=?UTF-8?Q?=D7=A8=D7=90=D7=95=D7=91=D7=9F=20=D7=90=D7=91=D7=A8=D7=9E=D7=95=D7=91=D7=99=D7=A5?=) Date: Wed, 3 Aug 2011 19:00:04 +0300 Subject: [R] =?utf-8?q?limits_on_liniar_model?= Message-ID: <1312387204.492000-87448997-25688@walla.com> Can I put limits on the lm() command? I only know that you can choose a liniar model with or without an intercept, but can I put other limits on the coefficients (for example- the intercept must be bigger than 1) ? _________________________________________________________________ Walla! Mail - [1]Get your free unlimited mail today References 1. http://www.walla.co.il/ From wtemptation at hotmail.co.uk Wed Aug 3 18:29:40 2011 From: wtemptation at hotmail.co.uk (xy) Date: Wed, 3 Aug 2011 09:29:40 -0700 (PDT) Subject: [R] Model selection Message-ID: <1312388980686-3716109.post@n4.nabble.com> Dear List, I have some difficulties to work with the function lmer from lme4. My responses are binary form and i want to use forward selection to my 12 covariates but i dont know how can I choose them based on deviance. Can someone pls give me a example so i can apply. For example my covariates are gestation,smoking ...and my response baby b1=lmer(baby~ (1|id), data, binomial) i will appreciate any help.thankss -- View this message in context: http://r.789695.n4.nabble.com/Model-selection-tp3716109p3716109.html Sent from the R help mailing list archive at Nabble.com. From wludwick at mac.com Wed Aug 3 18:35:59 2011 From: wludwick at mac.com (Walter Ludwick) Date: Wed, 03 Aug 2011 17:35:59 +0100 Subject: [R] R.app installer probs on Snow Leopard Message-ID: <0A01FBF4-5A06-4EDC-AD24-5DE687EB149C@mac.com> Have tried to install R.app several times (6, in fact: versions 2.12, 13 & 14, both 32 and 64 bit versions), using packages freshly downloaded from the official project page, and failed every time, given exception reports such as the following (appended below, the 2 reports arising out of my 1st & 6th attempts). Machine & software version specifics are all contained therein. What am i missing, i wonder? Any clues would be most appreciated -thanx! /w 8<--------(snip)------->8 Process: R [15997] Path: /Applications/R.app/Contents/MacOS/R Identifier: org.R-project.R Version: ??? (???) Code Type: X86-64 (Native) Parent Process: launchd [179] Date/Time: 2011-08-03 16:13:36.857 +0100 OS Version: Mac OS X 10.6.8 (10K540) Report Version: 6 Interval Since Last Report: 23665 sec Crashes Since Last Report: 5 Per-App Crashes Since Last Report: 3 Anonymous UUID: A3B4FAD8-70A5-420F-A0E1-E02624B493A5 Exception Type: EXC_BREAKPOINT (SIGTRAP) Exception Codes: 0x0000000000000002, 0x0000000000000000 Crashed Thread: 0 Dyld Error Message: Library not loaded: /Library/Frameworks/R.framework/Versions/2.14/Resources/lib/libR.dylib Referenced from: /Applications/R.app/Contents/MacOS/R Reason: image not found Binary Images: 0x7fff5fc00000 - 0x7fff5fc3bdef dyld 132.1 (???) <69130DA3-7CB3-54C8-ABC5-423DECDD2AF7> /usr/lib/dyld Model: MacBookPro5,5, BootROM MBP55.00AC.B03, 2 processors, Intel Core 2 Duo, 2.53 GHz, 4 GB, SMC 1.47f2 Graphics: NVIDIA GeForce 9400M, NVIDIA GeForce 9400M, PCI, 256 MB Memory Module: global_name AirPort: spairport_wireless_card_type_airport_extreme (0x14E4, 0x8D), Broadcom BCM43xx 1.0 (5.10.131.42.4) Bluetooth: Version 2.4.5f3, 2 service, 19 devices, 1 incoming serial ports Network Service: AirPort, AirPort, en1 Serial ATA Device: ST9250315ASG, 232.89 GB Serial ATA Device: HL-DT-ST DVDRW GS23N USB Device: Internal Memory Card Reader, 0x05ac (Apple Inc.), 0x8403, 0x26500000 / 2 USB Device: Built-in iSight, 0x05ac (Apple Inc.), 0x8507, 0x24400000 / 2 USB Device: BRCM2046 Hub, 0x0a5c (Broadcom Corp.), 0x4500, 0x06100000 / 2 USB Device: Bluetooth USB Host Controller, 0x05ac (Apple Inc.), 0x8213, 0x06110000 / 4 USB Device: Apple Internal Keyboard / Trackpad, 0x05ac (Apple Inc.), 0x0237, 0x04600000 / 3 USB Device: IR Receiver, 0x05ac (Apple Inc.), 0x8242, 0x04500000 / 2 8<--------(snip)------->8 Process: R [16330] Path: /Applications/R.app/Contents/MacOS/R Identifier: org.R-project.R Version: ??? (???) Code Type: X86 (Native) Parent Process: launchd [179] Date/Time: 2011-08-03 17:18:06.587 +0100 OS Version: Mac OS X 10.6.8 (10K540) Report Version: 6 Interval Since Last Report: 27534 sec Crashes Since Last Report: 9 Per-App Crashes Since Last Report: 7 Anonymous UUID: A3B4FAD8-70A5-420F-A0E1-E02624B493A5 Exception Type: EXC_BREAKPOINT (SIGTRAP) Exception Codes: 0x0000000000000002, 0x0000000000000000 Crashed Thread: 0 Dyld Error Message: Library not loaded: /Library/Frameworks/R.framework/Versions/2.12/Resources/lib/libR.dylib Referenced from: /Applications/R.app/Contents/MacOS/R Reason: image not found Binary Images: 0x8fe00000 - 0x8fe4162b dyld 132.1 (???) <1C06ECD9-A2D7-BB10-AF50-0F2B598A7DEC> /usr/lib/dyld Model: MacBookPro5,5, BootROM MBP55.00AC.B03, 2 processors, Intel Core 2 Duo, 2.53 GHz, 4 GB, SMC 1.47f2 Graphics: NVIDIA GeForce 9400M, NVIDIA GeForce 9400M, PCI, 256 MB Memory Module: global_name AirPort: spairport_wireless_card_type_airport_extreme (0x14E4, 0x8D), Broadcom BCM43xx 1.0 (5.10.131.42.4) Bluetooth: Version 2.4.5f3, 2 service, 19 devices, 1 incoming serial ports Network Service: AirPort, AirPort, en1 Serial ATA Device: ST9250315ASG, 232.89 GB Serial ATA Device: HL-DT-ST DVDRW GS23N USB Device: Internal Memory Card Reader, 0x05ac (Apple Inc.), 0x8403, 0x26500000 / 2 USB Device: Built-in iSight, 0x05ac (Apple Inc.), 0x8507, 0x24400000 / 2 USB Device: BRCM2046 Hub, 0x0a5c (Broadcom Corp.), 0x4500, 0x06100000 / 2 USB Device: Bluetooth USB Host Controller, 0x05ac (Apple Inc.), 0x8213, 0x06110000 / 4 USB Device: Apple Internal Keyboard / Trackpad, 0x05ac (Apple Inc.), 0x0237, 0x04600000 / 3 USB Device: IR Receiver, 0x05ac (Apple Inc.), 0x8242, 0x04500000 / 2 8<--------(snip)------->8 From vicvoncastle at gmail.com Wed Aug 3 20:01:38 2011 From: vicvoncastle at gmail.com (Ken) Date: Wed, 3 Aug 2011 14:01:38 -0400 Subject: [R] slow computation of functions over large datasets In-Reply-To: <427F9332-302E-4EF3-91C2-80D75255B1DD@comcast.net> References: <427F9332-302E-4EF3-91C2-80D75255B1DD@comcast.net> Message-ID: <44DE9313-3653-4201-954A-E770FDF2FAEF@gmail.com> Hello, Perhaps transpose the table attach(as.data.frame(t(data))) and use ColSums() function with order id as header. -Ken Hutchison On Aug 3, 2554 BE, at 1:12 PM, David Winsemius wrote: > > On Aug 3, 2011, at 12:20 PM, jim holtman wrote: > >> This takes about 2 secs for 1M rows: >> >>> n <- 1000000 >>> exampledata <- data.frame(orderID = sample(floor(n / 5), n, replace = TRUE), itemPrice = rpois(n, 10)) >>> require(data.table) >>> # convert to data.table >>> ed.dt <- data.table(exampledata) >>> system.time(result <- ed.dt[ >> + , list(total = sum(itemPrice)) >> + , by = orderID >> + ] >> + ) >> user system elapsed >> 1.30 0.05 1.34 > > Interesting. Impressive. And I noted that the OP wanted what cumsum would provide and for some reason creating that longer result is even faster on my machine than the shorter result using sum. > > -- > David. >>> >>> str(result) >> Classes ?data.table? and 'data.frame': 198708 obs. of 2 variables: >> $ orderID: int 1 2 3 4 5 6 8 9 10 11 ... >> $ total : num 49 37 72 92 50 76 34 22 65 39 ... >>> head(result) >> orderID total >> [1,] 1 49 >> [2,] 2 37 >> [3,] 3 72 >> [4,] 4 92 >> [5,] 5 50 >> [6,] 6 76 >>> >> >> >> On Wed, Aug 3, 2011 at 9:25 AM, Caroline Faisst >> wrote: >>> Hello there, >>> >>> >>> I?m computing the total value of an order from the price of the order items >>> using a ?for? loop and the ?ifelse? function. I do this on a large dataframe >>> (close to 1m lines). The computation of this function is painfully slow: in >>> 1min only about 90 rows are calculated. >>> >>> >>> The computation time taken for a given number of rows increases with the >>> size of the dataset, see the example with my function below: >>> >>> >>> # small dataset: function performs well >>> >>> exampledata<-data.frame(orderID=c(1,1,1,2,2,3,3,3,4),itemPrice=c(10,17,9,12,25,10,1,9,7)) >>> >>> exampledata[1,"orderAmount"]<-exampledata[1,"itemPrice"] >>> >>> system.time(for (i in 2:length(exampledata[,1])) >>> {exampledata[i,"orderAmount"]<-ifelse(exampledata[i,"orderID"]==exampledata[i-1,"orderID"],exampledata[i-1,"orderAmount"]+exampledata[i,"itemPrice"],exampledata[i,"itemPrice"])}) >>> >>> >>> # large dataset: the very same computational task takes much longer >>> >>> exampledata2<-data.frame(orderID=c(1,1,1,2,2,3,3,3,4,5:2000000),itemPrice=c(10,17,9,12,25,10,1,9,7,25:2000020)) >>> >>> exampledata2[1,"orderAmount"]<-exampledata2[1,"itemPrice"] >>> >>> system.time(for (i in 2:9) >>> {exampledata2[i,"orderAmount"]<-ifelse(exampledata2[i,"orderID"]==exampledata2[i-1,"orderID"],exampledata2[i-1,"orderAmount"]+exampledata2[i,"itemPrice"],exampledata2[i,"itemPrice"])}) >>> >>> >>> >>> Does someone know a way to increase the speed? >>> >>> >>> Thank you very much! >>> >>> Caroline >>> >>> [[alternative HTML version deleted]] >>> >>> >>> ______________________________________________ >>> R-help at r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >>> >>> >> >> >> >> -- >> Jim Holtman >> Data Munger Guru >> >> What is the problem that you are trying to solve? >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > David Winsemius, MD > West Hartford, CT > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From dwinsemius at comcast.net Wed Aug 3 21:04:59 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Wed, 3 Aug 2011 15:04:59 -0400 Subject: [R] gstat error In-Reply-To: <17dd0543208bb9f9fd6085b273fdc539.squirrel@webmail.ssc.wisc.edu> References: <1acbe04d8356bc51c23c7fb0f5a106b0.squirrel@webmail.ssc.wisc.edu> <4E3983E7.4000301@ucalgary.ca> <17dd0543208bb9f9fd6085b273fdc539.squirrel@webmail.ssc.wisc.edu> Message-ID: <4704BF59-9113-499B-B613-DC2CDF3B79FF@comcast.net> I see a 'variogram' function in both spatial and gstat when I use ?? variogram on my machine that probably does not have even all of those packages installed. Are you sure they are the same (I looked .... they are not) or failing that that the one you expect is being chosen? And are you even sure that there is not a third or a fourth 'variogram' in one of those other packages? -- David. On Aug 3, 2011, at 2:45 PM, gbrenes at ssc.wisc.edu wrote: > Here is my sessionInfo() > >> sessionInfo() > R version 2.12.2 (2011-02-25) > Platform: i386-pc-mingw32/i386 (32-bit) > > locale: > [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United > States.1252 > [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C > [5] LC_TIME=English_United States.1252 > > attached base packages: > [1] splines grid stats graphics grDevices utils > datasets > methods > [9] base > > other attached packages: > [1] spsurvey_2.1-2 lmtest_0.9-27 zoo_1.6-5 > [4] car_2.0-9 survival_2.36-5 nnet_7.3-1 > [7] spgwr_0.6-10 spatialCovariance_0.6-4 spatial_7.3-2 > [10] spatgraphs_2.44 sgeostat_1.0-23 rworldmap_0.1211 > [13] fields_6.3 spam_0.23-0 RPyGeo_0.9-2 > [16] RSAGA_0.91-1 shapefiles_0.6 > RgoogleMaps_1.1.9.7 > [19] raster_1.8-22 RArcInfo_0.4-10 > RColorBrewer_1.0-2 > [22] PBSmodelling_2.61.210 PBSmapping_2.61.9 mapproj_1.1-8.3 > [25] mapdata_2.1-4 intamap_1.3-8 evd_2.2-4 > [28] mvtnorm_0.9-96 automap_1.0-9 rgdal_0.6-33 > [31] gmaps_0.2 maps_2.1-6 glmmBUGS_1.9 > [34] spdep_0.5-32 coda_0.14-2 deldir_0.0-13 > [37] maptools_0.8-7 foreign_0.8-42 > Matrix_0.999375-46 > [40] lattice_0.19-17 boot_1.2-43 abind_1.3-0 > [43] MASS_7.3-11 geosphere_1.2-19 geonames_0.8 > [46] rjson_0.2.3 ctv_0.7-2 GEOmap_1.5-13 > [49] akima_0.5-4 RPMG_2.0-5 splancs_2.01-27 > [52] geomapdata_1.0-4 geoRglm_0.8-33 geoR_1.6-34 > [55] gstat_0.9-81 sp_0.9-81 nlme_3.1-98 > > loaded via a namespace (and not attached): > [1] tcltk_2.12.2 tools_2.12.2 >> > > >> On 2011-08-03 09:40, gbrenes at ssc.wisc.edu wrote: >>> Hello. >>> >>> I am running the examples provided in the gstat help menus. When >>> I try >>> to >>> run the following in predict.gstat: >>> >>> data(meuse) >>> coordinates(meuse)= ~x+y >>> v<-variogram(log(zinc)~1, meuse) >>> >>> I get the following error message: >>> >>> Error in vector("double", length) : invalid 'length' argument >>> >>> >>> What's the problem? >> >> You should at the very least provide your sessionInfo(). >> >> Peter Ehlers >> >>> >>> >>> Gilbert >>> >>> ______________________________________________ >>> R-help at r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide >>> http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >> > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT From jason.roberts at duke.edu Wed Aug 3 20:48:11 2011 From: jason.roberts at duke.edu (Jason Roberts) Date: Wed, 3 Aug 2011 14:48:11 -0400 Subject: [R] How to fit model in function using passed-in formula, then predict from another function Message-ID: <002401cc520d$e556b860$b0042920$@roberts@duke.edu> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From vicvoncastle at gmail.com Wed Aug 3 21:05:59 2011 From: vicvoncastle at gmail.com (Ken) Date: Wed, 3 Aug 2011 15:05:59 -0400 Subject: [R] slow computation of functions over large datasets In-Reply-To: <3E01647D-7ED8-4B01-B8B4-6D258AB79CC3@comcast.net> References: <427F9332-302E-4EF3-91C2-80D75255B1DD@comcast.net> <44DE9313-3653-4201-954A-E770FDF2FAEF@gmail.com> <3E01647D-7ED8-4B01-B8B4-6D258AB79CC3@comcast.net> Message-ID: Sorry about the lack of code, but using Davids example, would: tapply(itemPrice, INDEX=orderID, FUN=sum) work? -Ken Hutchison On Aug 3, 2554 BE, at 2:09 PM, David Winsemius wrote: > > On Aug 3, 2011, at 2:01 PM, Ken wrote: > >> Hello, >> Perhaps transpose the table attach(as.data.frame(t(data))) and use ColSums() function with order id as header. >> -Ken Hutchison > > Got any code? The OP offered a reproducible example, after all. > > -- > David. >> >> On Aug 3, 2554 BE, at 1:12 PM, David Winsemius wrote: >> >>> >>> On Aug 3, 2011, at 12:20 PM, jim holtman wrote: >>> >>>> This takes about 2 secs for 1M rows: >>>> >>>>> n <- 1000000 >>>>> exampledata <- data.frame(orderID = sample(floor(n / 5), n, replace = TRUE), itemPrice = rpois(n, 10)) >>>>> require(data.table) >>>>> # convert to data.table >>>>> ed.dt <- data.table(exampledata) >>>>> system.time(result <- ed.dt[ >>>> + , list(total = sum(itemPrice)) >>>> + , by = orderID >>>> + ] >>>> + ) >>>> user system elapsed >>>> 1.30 0.05 1.34 >>> >>> Interesting. Impressive. And I noted that the OP wanted what cumsum would provide and for some reason creating that longer result is even faster on my machine than the shorter result using sum. >>> >>> -- >>> David. >>>>> >>>>> str(result) >>>> Classes ?data.table? and 'data.frame': 198708 obs. of 2 variables: >>>> $ orderID: int 1 2 3 4 5 6 8 9 10 11 ... >>>> $ total : num 49 37 72 92 50 76 34 22 65 39 ... >>>>> head(result) >>>> orderID total >>>> [1,] 1 49 >>>> [2,] 2 37 >>>> [3,] 3 72 >>>> [4,] 4 92 >>>> [5,] 5 50 >>>> [6,] 6 76 >>>>> >>>> >>>> >>>> On Wed, Aug 3, 2011 at 9:25 AM, Caroline Faisst >>>> wrote: >>>>> Hello there, >>>>> >>>>> >>>>> I?m computing the total value of an order from the price of the order items >>>>> using a ?for? loop and the ?ifelse? function. I do this on a large dataframe >>>>> (close to 1m lines). The computation of this function is painfully slow: in >>>>> 1min only about 90 rows are calculated. >>>>> >>>>> >>>>> The computation time taken for a given number of rows increases with the >>>>> size of the dataset, see the example with my function below: >>>>> >>>>> >>>>> # small dataset: function performs well >>>>> >>>>> exampledata<-data.frame(orderID=c(1,1,1,2,2,3,3,3,4),itemPrice=c(10,17,9,12,25,10,1,9,7)) >>>>> >>>>> exampledata[1,"orderAmount"]<-exampledata[1,"itemPrice"] >>>>> >>>>> system.time(for (i in 2:length(exampledata[,1])) >>>>> {exampledata[i,"orderAmount"]<-ifelse(exampledata[i,"orderID"]==exampledata[i-1,"orderID"],exampledata[i-1,"orderAmount"]+exampledata[i,"itemPrice"],exampledata[i,"itemPrice"])}) >>>>> >>>>> >>>>> # large dataset: the very same computational task takes much longer >>>>> >>>>> exampledata2<-data.frame(orderID=c(1,1,1,2,2,3,3,3,4,5:2000000),itemPrice=c(10,17,9,12,25,10,1,9,7,25:2000020)) >>>>> >>>>> exampledata2[1,"orderAmount"]<-exampledata2[1,"itemPrice"] >>>>> >>>>> system.time(for (i in 2:9) >>>>> {exampledata2[i,"orderAmount"]<-ifelse(exampledata2[i,"orderID"]==exampledata2[i-1,"orderID"],exampledata2[i-1,"orderAmount"]+exampledata2[i,"itemPrice"],exampledata2[i,"itemPrice"])}) >>>>> >>>>> >>>>> >>>>> Does someone know a way to increase the speed? >>>>> >>>>> >>>>> Thank you very much! >>>>> >>>>> Caroline >>>>> >>>>> [[alternative HTML version deleted]] >>>>> >>>>> >>>>> ______________________________________________ >>>>> R-help at r-project.org mailing list >>>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>>>> and provide commented, minimal, self-contained, reproducible code. >>>>> >>>>> >>>> >>>> >>>> >>>> -- >>>> Jim Holtman >>>> Data Munger Guru >>>> >>>> What is the problem that you are trying to solve? >>>> >>>> ______________________________________________ >>>> R-help at r-project.org mailing list >>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>>> and provide commented, minimal, self-contained, reproducible code. >>> >>> David Winsemius, MD >>> West Hartford, CT >>> >>> ______________________________________________ >>> R-help at r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. > > David Winsemius, MD > West Hartford, CT > From mandal.stat at gmail.com Wed Aug 3 18:47:21 2011 From: mandal.stat at gmail.com (Baidya Nath Mandal) Date: Wed, 3 Aug 2011 22:17:21 +0530 Subject: [R] R CMD check problem In-Reply-To: <4E37D5A7.20009@gmail.com> References: <4E37D5A7.20009@gmail.com> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From mandal.stat at gmail.com Wed Aug 3 19:35:58 2011 From: mandal.stat at gmail.com (Baidya Nath Mandal) Date: Wed, 3 Aug 2011 23:05:58 +0530 Subject: [R] R CMD check problem In-Reply-To: <4E397FBB.8010901@gmail.com> References: <4E37D5A7.20009@gmail.com> <4E397FBB.8010901@gmail.com> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From tinsley at sdac.harvard.edu Wed Aug 3 18:17:44 2011 From: tinsley at sdac.harvard.edu (Nynese Tinsley) Date: Wed, 3 Aug 2011 16:17:44 +0000 Subject: [R] Help Needed in attempting to install 64-bit R! Message-ID: Hello R Help, I am attempting to install/build a 64-bit version of R to hopefully resolve some memory.limit problems for a user who is running a simulation. The 'configure' runs fine and the compilation (make) runs fine until the very last part (see below). I have libiconv in /usr/local/lib (no sure why I am getting the referencing error). The config.site file is attached. ANY HELP YOU CAN GIVE WOULD BE GREATLY APPRECIATED! :) ./configure --without-readline DYLIB_LDFLAGS=-xarch=v9 MAIN_LDFLAGS=-xarch=v9 Then I run /usr/ccs/bin/make cc -xc99=all -xarch=v9 -xopenmp -L/usr/local/bin -o R.bin Rmain.o libR.a -L../../lib -lRblas -R/usr/local/opt/SUNWspro/lib/v9:/opt/SUNWspro/lib/v9 -L/usr/local/opt/SUNWspro/lib/v9 -L/usr/local/opt/SUNWspro/prod/lib/v9 -L/usr/ccs/lib/sparcv9 -L/lib/sparcv9 -L/usr/lib/sparcv9 -lfui -lfai -lfai2 -lfsumai -lfprodai -lfminlai -lfmaxlai -lfminvai -lfmaxvai -lfsu -lsunmath -lmtsk -lm -lnsl -lsocket -ldl -lm -licuuc -licui18n cc: Warning: Specify a supported level of optimization when using -xopenmp, -xopenmp will not set an optimization level in a future release. Optimization level changed to 3 to support -xopenmp Undefined first referenced symbol in file libiconv_close libR.a(sysutils.o) libiconv_open libR.a(sysutils.o) libiconv libR.a(sysutils.o) ld: fatal: Symbol referencing errors. No output written to R.bin *** Error code 1 make: Fatal error: Command failed for target `R.bin' Current working directory /usr/local/pkg/R-2.13.1/src/main *** Error code 1 The following command caused the error: /usr/ccs/bin/make install-bin-local make: Fatal error: Command failed for target `R' Current working directory /usr/local/pkg/R-2.13.1/src/main *** Error code 1 The following command caused the error: for d in scripts include extra appl nmath unix main modules library; do \ (cd ${d} && /usr/ccs/bin/make R) || exit 1; \ done make: Fatal error: Command failed for target `R' Current working directory /usr/local/pkg/R-2.13.1/src *** Error code 1 The following command caused the error: for d in m4 tools doc etc share src tests po; do \ (cd ${d} && /usr/ccs/bin/make R) || exit 1; \ done make: Fatal error: Command failed for target `R' Thanks, Nynese Nynese Tinsley, BSEE, MSCIS UNIX Systems Analyst Harvard School of Public Health Center for Biostatistics in AIDS Research 651 Huntington Ave, FXB 614 Boston, MA 02115 617-432-3244 office# 617-432-2843 fax# From murdoch.duncan at gmail.com Wed Aug 3 21:11:02 2011 From: murdoch.duncan at gmail.com (Duncan Murdoch) Date: Wed, 03 Aug 2011 15:11:02 -0400 Subject: [R] Convert matrix to numeric In-Reply-To: References: Message-ID: <4E399D46.6030803@gmail.com> On 03/08/2011 3:04 PM, Jeffrey Joh wrote: > I have a matrix that looks like this: > > > structure(c("0.0376673981759913", "0.111066500741386", "1", "1103", > "18", "OPEN", "DEPR", "0.0404073656092023", "0.115186044704599", > "1", "719", "18", "OPEN", "DEPR", "0.0665342096693433", "0.197570061769498", > "1", "1103", "18", "OPEN", "DEPR", "0.119287147905722", "0.356427096010845", > "1", "1103", "18", "OPEN", "DEPR"), .Dim = c(7L, 4L), .Dimnames = list( > c("Sn", "SlnC", "housenum", "date", "hour", "flue", "pressurization" > ), c("10019.BLO", "1002.BLO", "10020.BLO", "10021.BLO"))) > > > > How do I convert rows 1-5 to numeric? I tried mode()<- "numeric" but that doesn't change anything. Every entry in a matrix has the same type, so you can't change just those rows other than by extracting them into a separate matrix and changing that. Duncan Murdoch > > > I also tried converting this to a table then converting to numeric, but I got: (list) object cannot be coerced to type 'double' > > > > Jeff > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From vicvoncastle at gmail.com Wed Aug 3 21:12:44 2011 From: vicvoncastle at gmail.com (Ken) Date: Wed, 3 Aug 2011 15:12:44 -0400 Subject: [R] Convert matrix to numeric In-Reply-To: References: Message-ID: <44343C5A-7EB9-4417-96B9-60519236CAD7@gmail.com> How about Matrix[1:5,]=as.numeric(Matrix[1:5,]) -Ken Hutchison On Aug 3, 2554 BE, at 3:04 PM, Jeffrey Joh wrote: > > I have a matrix that looks like this: > > > structure(c("0.0376673981759913", "0.111066500741386", "1", "1103", > "18", "OPEN", "DEPR", "0.0404073656092023", "0.115186044704599", > "1", "719", "18", "OPEN", "DEPR", "0.0665342096693433", "0.197570061769498", > "1", "1103", "18", "OPEN", "DEPR", "0.119287147905722", "0.356427096010845", > "1", "1103", "18", "OPEN", "DEPR"), .Dim = c(7L, 4L), .Dimnames = list( > c("Sn", "SlnC", "housenum", "date", "hour", "flue", "pressurization" > ), c("10019.BLO", "1002.BLO", "10020.BLO", "10021.BLO"))) > > > > How do I convert rows 1-5 to numeric? I tried mode() <- "numeric" but that doesn't change anything. > > > > I also tried converting this to a table then converting to numeric, but I got: (list) object cannot be coerced to type 'double' > > > > Jeff > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From murdoch.duncan at gmail.com Wed Aug 3 21:12:33 2011 From: murdoch.duncan at gmail.com (Duncan Murdoch) Date: Wed, 03 Aug 2011 15:12:33 -0400 Subject: [R] implicit functions (was r-help) In-Reply-To: <5e37969a.8715.131903c9ab2.Coremail.knifeboot@163.com> References: <5e37969a.8715.131903c9ab2.Coremail.knifeboot@163.com> Message-ID: <4E399DA1.2020901@gmail.com> On 03/08/2011 11:21 AM, KnifeBoot wrote: > Hey, > Is there any function plotting several "implicit functions" (F(x,y)=0) on the same fig. Is there anyone who has an example code of how to do this? > The contour3d function in the misc3d package only work with the functions with three dimensions. > Thanks a lot. contour() will do it. Use add=TRUE to add extra functions. Duncan Murdoch P.S. Please use a more informative subject line! From sarah.goslee at gmail.com Wed Aug 3 21:19:10 2011 From: sarah.goslee at gmail.com (Sarah Goslee) Date: Wed, 3 Aug 2011 15:19:10 -0400 Subject: [R] Convert matrix to numeric In-Reply-To: References: Message-ID: Hi Jeffrey, On Wed, Aug 3, 2011 at 3:04 PM, Jeffrey Joh wrote: > > I have a matrix that looks like this: > > > structure(c("0.0376673981759913", "0.111066500741386", "1", "1103", > "18", "OPEN", "DEPR", "0.0404073656092023", "0.115186044704599", > "1", "719", "18", "OPEN", "DEPR", "0.0665342096693433", "0.197570061769498", > "1", "1103", "18", "OPEN", "DEPR", "0.119287147905722", "0.356427096010845", > "1", "1103", "18", "OPEN", "DEPR"), .Dim = c(7L, 4L), .Dimnames = list( > ? ?c("Sn", "SlnC", "housenum", "date", "hour", "flue", "pressurization" > ? ?), c("10019.BLO", "1002.BLO", "10020.BLO", "10021.BLO"))) Thank you for providing a small working example. > How do I convert rows 1-5 to numeric? ?I tried mode() <- "numeric" but that doesn't change anything. Two things are going on here. First, a matrix can only contain one kind of data. For this example, since there are strings the whole thing has to be character. A data frame is intended to hold different kinds of data, but each column has to be a single type. So if you want those values to be numeric instead of character, you'll need to transpose your matrix and convert it to a data frame. tempdata <- structure(c("0.0376673981759913", "0.111066500741386", "1", "1103", "18", "OPEN", "DEPR", "0.0404073656092023", "0.115186044704599", "1", "719", "18", "OPEN", "DEPR", "0.0665342096693433", "0.197570061769498", "1", "1103", "18", "OPEN", "DEPR", "0.119287147905722", "0.356427096010845", "1", "1103", "18", "OPEN", "DEPR"), .Dim = c(7L, 4L), .Dimnames = list( c("Sn", "SlnC", "housenum", "date", "hour", "flue", "pressurization" ), c("10019.BLO", "1002.BLO", "10020.BLO", "10021.BLO"))) tempdata <- data.frame(t(tempdata), stringsAsFactors=FALSE) Once you have the right kind of object, you can convert the five columns of interest to numeric. This needs to be done a column at a time, I think: tempdata[, 1:5] <- apply(tempdata[,1:5], 2, as.numeric) Sarah -- Sarah Goslee http://www.functionaldiversity.org From gunter.berton at gene.com Wed Aug 3 21:20:59 2011 From: gunter.berton at gene.com (Bert Gunter) Date: Wed, 3 Aug 2011 12:20:59 -0700 Subject: [R] limits on liniar model In-Reply-To: <1312387204.492000-87448997-25688@walla.com> References: <1312387204.492000-87448997-25688@walla.com> Message-ID: Please use R's search capabilities before posting. RSiteSearch("Linear Model with Constraints") appears to give you what you're looking for. Incidentally, with constraints, the model is no longer linear, I believe. -- Bert 2011/8/3 ????? ???????? : > > ? Can I put limits on the lm() command? I only know that you can choose a > ? liniar model with or without an intercept, but can I put other limits on > ? the coefficients (for example- the intercept must be bigger than 1) ? > > ? ? _________________________________________________________________ > > ? Walla! Mail - [1]Get your free unlimited mail today > > References > > ? 1. http://www.walla.co.il/ > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > -- "Men by nature long to get on to the ultimate truths, and will often be impatient with elementary studies or fight shy of them. If it were possible to reach the ultimate truths without the elementary studies usually prefixed to them, these would not be preparatory studies but superfluous diversions." -- Maimonides (1135-1204) Bert Gunter Genentech Nonclinical Biostatistics From NordlDJ at dshs.wa.gov Wed Aug 3 21:28:19 2011 From: NordlDJ at dshs.wa.gov (Nordlund, Dan (DSHS/RDA)) Date: Wed, 3 Aug 2011 12:28:19 -0700 Subject: [R] Convert matrix to numeric In-Reply-To: <44343C5A-7EB9-4417-96B9-60519236CAD7@gmail.com> References: <44343C5A-7EB9-4417-96B9-60519236CAD7@gmail.com> Message-ID: <941871A13165C2418EC144ACB212BDB00201A0F9@dshsmxoly1504g.dshs.wa.lcl> > -----Original Message----- > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r- > project.org] On Behalf Of Ken > Sent: Wednesday, August 03, 2011 12:13 PM > To: Jeffrey Joh > Cc: > Subject: Re: [R] Convert matrix to numeric > > How about > Matrix[1:5,]=as.numeric(Matrix[1:5,]) > -Ken Hutchison > > On Aug 3, 2554 BE, at 3:04 PM, Jeffrey Joh > wrote: > > > > > I have a matrix that looks like this: > > > > > > structure(c("0.0376673981759913", "0.111066500741386", "1", "1103", > > "18", "OPEN", "DEPR", "0.0404073656092023", "0.115186044704599", > > "1", "719", "18", "OPEN", "DEPR", "0.0665342096693433", > "0.197570061769498", > > "1", "1103", "18", "OPEN", "DEPR", "0.119287147905722", > "0.356427096010845", > > "1", "1103", "18", "OPEN", "DEPR"), .Dim = c(7L, 4L), .Dimnames = > list( > > c("Sn", "SlnC", "housenum", "date", "hour", "flue", > "pressurization" > > ), c("10019.BLO", "1002.BLO", "10020.BLO", "10021.BLO"))) > > > > > > > > How do I convert rows 1-5 to numeric? I tried mode() <- "numeric" > but that doesn't change anything. > > Ken, You can't store the numeric values back in the matrix, because rows 6 and 7 contain character values. Everything will just be converted back to character. You need to create a new matrix for the numeric values. Hope this is helpful, Dan Daniel J. Nordlund Washington State Department of Social and Health Services Planning, Performance, and Accountability Research and Data Analysis Division Olympia, WA 98504-5204 From dwinsemius at comcast.net Wed Aug 3 21:46:09 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Wed, 3 Aug 2011 15:46:09 -0400 Subject: [R] slow computation of functions over large datasets In-Reply-To: References: <427F9332-302E-4EF3-91C2-80D75255B1DD@comcast.net> <44DE9313-3653-4201-954A-E770FDF2FAEF@gmail.com> <3E01647D-7ED8-4B01-B8B4-6D258AB79CC3@comcast.net> Message-ID: <81DB9356-7DB7-46CF-9A82-204077159521@comcast.net> On Aug 3, 2011, at 3:05 PM, Ken wrote: > Sorry about the lack of code, but using Davids example, would: > tapply(itemPrice, INDEX=orderID, FUN=sum) > work? Doesn't do the cumulative sums or the assignment into column of the same data.frame. That's why I used ave, because it keeps the sequence correct. -- David. > -Ken Hutchison > > On Aug 3, 2554 BE, at 2:09 PM, David Winsemius > wrote: > >> >> On Aug 3, 2011, at 2:01 PM, Ken wrote: >> >>> Hello, >>> Perhaps transpose the table attach(as.data.frame(t(data))) and use >>> ColSums() function with order id as header. >>> -Ken Hutchison >> >> Got any code? The OP offered a reproducible example, after all. >> >> -- >> David. >>> >>> On Aug 3, 2554 BE, at 1:12 PM, David Winsemius >> > wrote: >>> >>>> >>>> On Aug 3, 2011, at 12:20 PM, jim holtman wrote: >>>> >>>>> This takes about 2 secs for 1M rows: >>>>> >>>>>> n <- 1000000 >>>>>> exampledata <- data.frame(orderID = sample(floor(n / 5), n, >>>>>> replace = TRUE), itemPrice = rpois(n, 10)) >>>>>> require(data.table) >>>>>> # convert to data.table >>>>>> ed.dt <- data.table(exampledata) >>>>>> system.time(result <- ed.dt[ >>>>> + , list(total = sum(itemPrice)) >>>>> + , by = orderID >>>>> + ] >>>>> + ) >>>>> user system elapsed >>>>> 1.30 0.05 1.34 >>>> >>>> Interesting. Impressive. And I noted that the OP wanted what >>>> cumsum would provide and for some reason creating that longer >>>> result is even faster on my machine than the shorter result using >>>> sum. >>>> >>>> -- >>>> David. >>>>>> >>>>>> str(result) >>>>> Classes ?data.table? and 'data.frame': 198708 obs. of 2 >>>>> variables: >>>>> $ orderID: int 1 2 3 4 5 6 8 9 10 11 ... >>>>> $ total : num 49 37 72 92 50 76 34 22 65 39 ... >>>>>> head(result) >>>>> orderID total >>>>> [1,] 1 49 >>>>> [2,] 2 37 >>>>> [3,] 3 72 >>>>> [4,] 4 92 >>>>> [5,] 5 50 >>>>> [6,] 6 76 >>>>>> >>>>> >>>>> >>>>> On Wed, Aug 3, 2011 at 9:25 AM, Caroline Faisst >>>>> wrote: >>>>>> Hello there, >>>>>> >>>>>> >>>>>> I?m computing the total value of an order from the price of the >>>>>> order items >>>>>> using a ?for? loop and the ?ifelse? function. I do this on a >>>>>> large dataframe >>>>>> (close to 1m lines). The computation of this function is >>>>>> painfully slow: in >>>>>> 1min only about 90 rows are calculated. >>>>>> >>>>>> >>>>>> The computation time taken for a given number of rows increases >>>>>> with the >>>>>> size of the dataset, see the example with my function below: >>>>>> >>>>>> >>>>>> # small dataset: function performs well >>>>>> >>>>>> exampledata<- >>>>>> data >>>>>> .frame >>>>>> (orderID >>>>>> =c(1,1,1,2,2,3,3,3,4),itemPrice=c(10,17,9,12,25,10,1,9,7)) >>>>>> >>>>>> exampledata[1,"orderAmount"]<-exampledata[1,"itemPrice"] >>>>>> >>>>>> system.time(for (i in 2:length(exampledata[,1])) >>>>>> {exampledata[i,"orderAmount"]<- >>>>>> ifelse >>>>>> (exampledata >>>>>> [i >>>>>> ,"orderID >>>>>> "]==exampledata[i-1,"orderID"],exampledata[i-1,"orderAmount"] >>>>>> +exampledata[i,"itemPrice"],exampledata[i,"itemPrice"])}) >>>>>> >>>>>> >>>>>> # large dataset: the very same computational task takes much >>>>>> longer >>>>>> >>>>>> exampledata2<- >>>>>> data >>>>>> .frame >>>>>> (orderID >>>>>> = >>>>>> c >>>>>> (1,1,1,2,2,3,3,3,4,5 >>>>>> :2000000),itemPrice=c(10,17,9,12,25,10,1,9,7,25:2000020)) >>>>>> >>>>>> exampledata2[1,"orderAmount"]<-exampledata2[1,"itemPrice"] >>>>>> >>>>>> system.time(for (i in 2:9) >>>>>> {exampledata2[i,"orderAmount"]<- >>>>>> ifelse >>>>>> (exampledata2 >>>>>> [i >>>>>> ,"orderID >>>>>> "]==exampledata2[i-1,"orderID"],exampledata2[i-1,"orderAmount"] >>>>>> +exampledata2[i,"itemPrice"],exampledata2[i,"itemPrice"])}) >>>>>> >>>>>> >>>>>> >>>>>> Does someone know a way to increase the speed? >>>>>> >>>>>> >>>>>> Thank you very much! >>>>>> >>>>>> Caroline >>>>>> >>>>>> [[alternative HTML version deleted]] >>>>>> >>>>>> >>>>>> ______________________________________________ >>>>>> R-help at r-project.org mailing list >>>>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>>>>> and provide commented, minimal, self-contained, reproducible >>>>>> code. >>>>>> >>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> Jim Holtman >>>>> Data Munger Guru >>>>> >>>>> What is the problem that you are trying to solve? >>>>> >>>>> ______________________________________________ >>>>> R-help at r-project.org mailing list >>>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>>>> and provide commented, minimal, self-contained, reproducible code. >>>> >>>> David Winsemius, MD >>>> West Hartford, CT >>>> >>>> ______________________________________________ >>>> R-help at r-project.org mailing list >>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>>> and provide commented, minimal, self-contained, reproducible code. >> >> David Winsemius, MD >> West Hartford, CT >> David Winsemius, MD West Hartford, CT From 1987.zhangxi at gmail.com Wed Aug 3 21:21:49 2011 From: 1987.zhangxi at gmail.com (zoe_zhang) Date: Wed, 3 Aug 2011 12:21:49 -0700 (PDT) Subject: [R] the significance of BEKK estimation Message-ID: <1312399309745-3716586.post@n4.nabble.com> Dear ALL, I use BEKK package to estimate Bivariate GARCH model. But when the results come out, there's no t-stat or p-value of the estimated coeffients. Does anyone know how to get the significance? Followings are the codes I input, >P1=data.frame(x,y) >y1=mvBEKK.est(P1) >mvBEKK.diag(y1) Anyhelp would be appreciated! Sincere, Zoe -- View this message in context: http://r.789695.n4.nabble.com/the-significance-of-BEKK-estimation-tp3716586p3716586.html Sent from the R help mailing list archive at Nabble.com. From 1987.zhangxi at gmail.com Wed Aug 3 21:25:51 2011 From: 1987.zhangxi at gmail.com (zoe_zhang) Date: Wed, 3 Aug 2011 12:25:51 -0700 (PDT) Subject: [R] the significance of BEKK estimation In-Reply-To: <1312399309745-3716586.post@n4.nabble.com> References: <1312399309745-3716586.post@n4.nabble.com> Message-ID: <1312399551526-3716597.post@n4.nabble.com> Here is one more question, How could I input an asymmetry in volatility speci cation in the BEKK function? As far as I know, the BEKK estimation function is mvBEKK.est(eps, order = c(1,1), params = NULL, fixed = NULL, method = "BFGS", verbose = F) I totally have no idea to exert an asymmetry into. Many thanks! Sincere, Zoe -- View this message in context: http://r.789695.n4.nabble.com/the-significance-of-BEKK-estimation-tp3716586p3716597.html Sent from the R help mailing list archive at Nabble.com. From djmuser at gmail.com Wed Aug 3 21:57:15 2011 From: djmuser at gmail.com (Dennis Murphy) Date: Wed, 3 Aug 2011 12:57:15 -0700 Subject: [R] gstat error In-Reply-To: <4704BF59-9113-499B-B613-DC2CDF3B79FF@comcast.net> References: <1acbe04d8356bc51c23c7fb0f5a106b0.squirrel@webmail.ssc.wisc.edu> <4E3983E7.4000301@ucalgary.ca> <17dd0543208bb9f9fd6085b273fdc539.squirrel@webmail.ssc.wisc.edu> <4704BF59-9113-499B-B613-DC2CDF3B79FF@comcast.net> Message-ID: To add to David's comments (nice catch, BTW), I found three variogram() functions as a result of ??variogram. The one that gets used is from the package that is highest in the search path (notice that gstat is 55th (!!)) - that would be the one from the spatial package. [The other is in the SpatialExtremes package, which is not loaded, so the one in spatial is masking the one in gstat.] To use the variogram() function in gstat, call gstat::variogram (...). Dennis On Wed, Aug 3, 2011 at 12:04 PM, David Winsemius wrote: > I see a 'variogram' function in both spatial and gstat when I use > ??variogram on my machine that probably does not have even all of those > packages installed. Are you sure they are the same (I looked .... they are > not) ?or failing that that the one you expect is being chosen? And are you > even sure that there is not a third or a fourth 'variogram' in one of those > other packages? > > -- > David. > > > On Aug 3, 2011, at 2:45 PM, gbrenes at ssc.wisc.edu wrote: > >> Here is my sessionInfo() >> >>> sessionInfo() >> >> R version 2.12.2 (2011-02-25) >> Platform: i386-pc-mingw32/i386 (32-bit) >> >> locale: >> [1] LC_COLLATE=English_United States.1252 ?LC_CTYPE=English_United >> States.1252 >> [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C >> [5] LC_TIME=English_United States.1252 >> >> attached base packages: >> [1] splines ? grid ? ? ?stats ? ? graphics ?grDevices utils ? ? datasets >> methods >> [9] base >> >> other attached packages: >> [1] spsurvey_2.1-2 ? ? ? ? ?lmtest_0.9-27 ? ? ? ? ? zoo_1.6-5 >> [4] car_2.0-9 ? ? ? ? ? ? ? survival_2.36-5 ? ? ? ? nnet_7.3-1 >> [7] spgwr_0.6-10 ? ? ? ? ? ?spatialCovariance_0.6-4 spatial_7.3-2 >> [10] spatgraphs_2.44 ? ? ? ? sgeostat_1.0-23 ? ? ? ? rworldmap_0.1211 >> [13] fields_6.3 ? ? ? ? ? ? ?spam_0.23-0 ? ? ? ? ? ? RPyGeo_0.9-2 >> [16] RSAGA_0.91-1 ? ? ? ? ? ?shapefiles_0.6 ? ? ? ? ?RgoogleMaps_1.1.9.7 >> [19] raster_1.8-22 ? ? ? ? ? RArcInfo_0.4-10 ? ? ? ? RColorBrewer_1.0-2 >> [22] PBSmodelling_2.61.210 ? PBSmapping_2.61.9 ? ? ? mapproj_1.1-8.3 >> [25] mapdata_2.1-4 ? ? ? ? ? intamap_1.3-8 ? ? ? ? ? evd_2.2-4 >> [28] mvtnorm_0.9-96 ? ? ? ? ?automap_1.0-9 ? ? ? ? ? rgdal_0.6-33 >> [31] gmaps_0.2 ? ? ? ? ? ? ? maps_2.1-6 ? ? ? ? ? ? ?glmmBUGS_1.9 >> [34] spdep_0.5-32 ? ? ? ? ? ?coda_0.14-2 ? ? ? ? ? ? deldir_0.0-13 >> [37] maptools_0.8-7 ? ? ? ? ?foreign_0.8-42 ? ? ? ? ?Matrix_0.999375-46 >> [40] lattice_0.19-17 ? ? ? ? boot_1.2-43 ? ? ? ? ? ? abind_1.3-0 >> [43] MASS_7.3-11 ? ? ? ? ? ? geosphere_1.2-19 ? ? ? ?geonames_0.8 >> [46] rjson_0.2.3 ? ? ? ? ? ? ctv_0.7-2 ? ? ? ? ? ? ? GEOmap_1.5-13 >> [49] akima_0.5-4 ? ? ? ? ? ? RPMG_2.0-5 ? ? ? ? ? ? ?splancs_2.01-27 >> [52] geomapdata_1.0-4 ? ? ? ?geoRglm_0.8-33 ? ? ? ? ?geoR_1.6-34 >> [55] gstat_0.9-81 ? ? ? ? ? ?sp_0.9-81 ? ? ? ? ? ? ? nlme_3.1-98 >> >> loaded via a namespace (and not attached): >> [1] tcltk_2.12.2 tools_2.12.2 >>> >> >> >>> On 2011-08-03 09:40, gbrenes at ssc.wisc.edu wrote: >>>> >>>> Hello. >>>> >>>> I am running the examples provided in the gstat help menus. ?When I try >>>> to >>>> run the following in predict.gstat: >>>> >>>> data(meuse) >>>> coordinates(meuse)= ~x+y >>>> v<-variogram(log(zinc)~1, meuse) >>>> >>>> I get the following error message: >>>> >>>> Error in vector("double", length) : invalid 'length' argument >>>> >>>> >>>> What's the problem? >>> >>> You should at the very least provide your sessionInfo(). >>> >>> Peter Ehlers >>> >>>> >>>> >>>> Gilbert >>>> >>>> ______________________________________________ >>>> R-help at r-project.org mailing list >>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>> PLEASE do read the posting guide >>>> http://www.R-project.org/posting-guide.html >>>> and provide commented, minimal, self-contained, reproducible code. >>> >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > David Winsemius, MD > West Hartford, CT > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From dwinsemius at comcast.net Wed Aug 3 22:28:09 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Wed, 3 Aug 2011 16:28:09 -0400 Subject: [R] Convert matrix to numeric In-Reply-To: References: Message-ID: <08CD8702-71E3-4784-AAD6-FF7732485DEE@comcast.net> Here's what you _should_ do 1) transpose 2a) as.data.frame 3a) fix the stupid default stringsAsFactor behavior 4a) convert the first 5 columns to numeric dfrm <- as.data.frame( t( structure(.) ) ) dfrm[, 1:5] <-lapply(dfrm[, 1:5], as.character) dfrm[, 1:5] <-lapply(dfrm[, 1:5], as.numeric) Or: 1) transpose 2b) as.data.frame with stringsAsFactors= FALSE 3b) convert to numeric On Aug 3, 2011, at 3:04 PM, Jeffrey Joh wrote: > > I have a matrix that looks like this: > > > structure(c("0.0376673981759913", "0.111066500741386", "1", "1103", > "18", "OPEN", "DEPR", "0.0404073656092023", "0.115186044704599", > "1", "719", "18", "OPEN", "DEPR", "0.0665342096693433", > "0.197570061769498", > "1", "1103", "18", "OPEN", "DEPR", "0.119287147905722", > "0.356427096010845", > "1", "1103", "18", "OPEN", "DEPR"), .Dim = c(7L, 4L), .Dimnames = > list( > c("Sn", "SlnC", "housenum", "date", "hour", "flue", > "pressurization" > ), c("10019.BLO", "1002.BLO", "10020.BLO", "10021.BLO"))) > > > > How do I convert rows 1-5 to numeric? I tried mode() <- "numeric" > but that doesn't change anything. > > > > I also tried converting this to a table then converting to numeric, > but I got: (list) object cannot be coerced to type 'double' > > > > Jeff > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT From jsorkin at grecc.umaryland.edu Wed Aug 3 22:35:31 2011 From: jsorkin at grecc.umaryland.edu (John Sorkin) Date: Wed, 03 Aug 2011 16:35:31 -0400 Subject: [R] limits on liniar model Message-ID: <4E3978D3020000CB00091BD4@medicine.umaryland.edu> It is hard to prove a negative, but to the best of my knowledge lm will not do what you want. This does not mean there is not a function that will perform your analyses; the sort of thing you want to do is often accomplished using non-linear methods. John >>> ????? ???????? 8/3/2011 12:00:04 PM >>> Can I put limits on the lm() command? I only know that you can choose a liniar model with or without an intercept, but can I put other limits on the coefficients (for example- the intercept must be bigger than 1) ? _________________________________________________________________ Walla! Mail - [1]Get your free unlimited mail today References 1. http://www.walla.co.il/ Confidentiality Statement: This email message, including any attachments, is for th...{{dropped:6}} From pdalgd at gmail.com Wed Aug 3 22:39:04 2011 From: pdalgd at gmail.com (peter dalgaard) Date: Wed, 3 Aug 2011 22:39:04 +0200 Subject: [R] R.app installer probs on Snow Leopard In-Reply-To: <0A01FBF4-5A06-4EDC-AD24-5DE687EB149C@mac.com> References: <0A01FBF4-5A06-4EDC-AD24-5DE687EB149C@mac.com> Message-ID: <7031F36A-589C-40B1-9813-5BA2EC2CBC50@gmail.com> On Aug 3, 2011, at 18:35 , Walter Ludwick wrote: > Have tried to install R.app several times (6, in fact: versions 2.12, 13 & 14, both 32 and 64 bit versions), using packages freshly downloaded from the official project page, and failed every time, given exception reports such as the following (appended below, the 2 reports arising out of my 1st & 6th attempts). > > Machine & software version specifics are all contained therein. > > What am i missing, i wonder? Any clues would be most appreciated -thanx! /w What did you do to install? For a plain install, just get http://cran.r-project.org/bin/macosx/R-2.13.1.pkg open it and follow the instructions. If you tried to install the http://cran.r-project.org/bin/macosx/Mac-GUI-1.41.tar.gz then I suspect that you missed the point, that R.app is something you install _on_ _top_ _of_ an installation of R itself. > > > > 8<--------(snip)------->8 > > Process: R [15997] > Path: /Applications/R.app/Contents/MacOS/R > Identifier: org.R-project.R > Version: ??? (???) > Code Type: X86-64 (Native) > Parent Process: launchd [179] > > Date/Time: 2011-08-03 16:13:36.857 +0100 > OS Version: Mac OS X 10.6.8 (10K540) > Report Version: 6 > > Interval Since Last Report: 23665 sec > Crashes Since Last Report: 5 > Per-App Crashes Since Last Report: 3 > Anonymous UUID: A3B4FAD8-70A5-420F-A0E1-E02624B493A5 > > Exception Type: EXC_BREAKPOINT (SIGTRAP) > Exception Codes: 0x0000000000000002, 0x0000000000000000 > Crashed Thread: 0 > > Dyld Error Message: > Library not loaded: /Library/Frameworks/R.framework/Versions/2.14/Resources/lib/libR.dylib > Referenced from: /Applications/R.app/Contents/MacOS/R > Reason: image not found > > Binary Images: > 0x7fff5fc00000 - 0x7fff5fc3bdef dyld 132.1 (???) <69130DA3-7CB3-54C8-ABC5-423DECDD2AF7> /usr/lib/dyld > > Model: MacBookPro5,5, BootROM MBP55.00AC.B03, 2 processors, Intel Core 2 Duo, 2.53 GHz, 4 GB, SMC 1.47f2 > Graphics: NVIDIA GeForce 9400M, NVIDIA GeForce 9400M, PCI, 256 MB > Memory Module: global_name > AirPort: spairport_wireless_card_type_airport_extreme (0x14E4, 0x8D), Broadcom BCM43xx 1.0 (5.10.131.42.4) > Bluetooth: Version 2.4.5f3, 2 service, 19 devices, 1 incoming serial ports > Network Service: AirPort, AirPort, en1 > Serial ATA Device: ST9250315ASG, 232.89 GB > Serial ATA Device: HL-DT-ST DVDRW GS23N > USB Device: Internal Memory Card Reader, 0x05ac (Apple Inc.), 0x8403, 0x26500000 / 2 > USB Device: Built-in iSight, 0x05ac (Apple Inc.), 0x8507, 0x24400000 / 2 > USB Device: BRCM2046 Hub, 0x0a5c (Broadcom Corp.), 0x4500, 0x06100000 / 2 > USB Device: Bluetooth USB Host Controller, 0x05ac (Apple Inc.), 0x8213, 0x06110000 / 4 > USB Device: Apple Internal Keyboard / Trackpad, 0x05ac (Apple Inc.), 0x0237, 0x04600000 / 3 > USB Device: IR Receiver, 0x05ac (Apple Inc.), 0x8242, 0x04500000 / 2 > > 8<--------(snip)------->8 > > Process: R [16330] > Path: /Applications/R.app/Contents/MacOS/R > Identifier: org.R-project.R > Version: ??? (???) > Code Type: X86 (Native) > Parent Process: launchd [179] > > Date/Time: 2011-08-03 17:18:06.587 +0100 > OS Version: Mac OS X 10.6.8 (10K540) > Report Version: 6 > > Interval Since Last Report: 27534 sec > Crashes Since Last Report: 9 > Per-App Crashes Since Last Report: 7 > Anonymous UUID: A3B4FAD8-70A5-420F-A0E1-E02624B493A5 > > Exception Type: EXC_BREAKPOINT (SIGTRAP) > Exception Codes: 0x0000000000000002, 0x0000000000000000 > Crashed Thread: 0 > > Dyld Error Message: > Library not loaded: /Library/Frameworks/R.framework/Versions/2.12/Resources/lib/libR.dylib > Referenced from: /Applications/R.app/Contents/MacOS/R > Reason: image not found > > Binary Images: > 0x8fe00000 - 0x8fe4162b dyld 132.1 (???) <1C06ECD9-A2D7-BB10-AF50-0F2B598A7DEC> /usr/lib/dyld > > Model: MacBookPro5,5, BootROM MBP55.00AC.B03, 2 processors, Intel Core 2 Duo, 2.53 GHz, 4 GB, SMC 1.47f2 > Graphics: NVIDIA GeForce 9400M, NVIDIA GeForce 9400M, PCI, 256 MB > Memory Module: global_name > AirPort: spairport_wireless_card_type_airport_extreme (0x14E4, 0x8D), Broadcom BCM43xx 1.0 (5.10.131.42.4) > Bluetooth: Version 2.4.5f3, 2 service, 19 devices, 1 incoming serial ports > Network Service: AirPort, AirPort, en1 > Serial ATA Device: ST9250315ASG, 232.89 GB > Serial ATA Device: HL-DT-ST DVDRW GS23N > USB Device: Internal Memory Card Reader, 0x05ac (Apple Inc.), 0x8403, 0x26500000 / 2 > USB Device: Built-in iSight, 0x05ac (Apple Inc.), 0x8507, 0x24400000 / 2 > USB Device: BRCM2046 Hub, 0x0a5c (Broadcom Corp.), 0x4500, 0x06100000 / 2 > USB Device: Bluetooth USB Host Controller, 0x05ac (Apple Inc.), 0x8213, 0x06110000 / 4 > USB Device: Apple Internal Keyboard / Trackpad, 0x05ac (Apple Inc.), 0x0237, 0x04600000 / 3 > USB Device: IR Receiver, 0x05ac (Apple Inc.), 0x8242, 0x04500000 / 2 > > 8<--------(snip)------->8 > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com "D?den skal tape!" --- Nordahl Grieg From dwinsemius at comcast.net Wed Aug 3 22:39:44 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Wed, 3 Aug 2011 16:39:44 -0400 Subject: [R] R.app installer probs on Snow Leopard In-Reply-To: <0A01FBF4-5A06-4EDC-AD24-5DE687EB149C@mac.com> References: <0A01FBF4-5A06-4EDC-AD24-5DE687EB149C@mac.com> Message-ID: Did you install R first? R.app is just a GUI around the actual R code that could run without any assistance in a terminal session. Generally one installs both R and R.app from the "super-bundle". Since you provided no details of which .pkg files were chosen we are left guessing. (And this is really supposed to be posted to the MAC-SIG list, anyway.) -- David. On Aug 3, 2011, at 12:35 PM, Walter Ludwick wrote: > Have tried to install R.app several times (6, in fact: versions > 2.12, 13 & 14, both 32 and 64 bit versions), using packages Names? Links? > freshly downloaded from the official project page, And that means what? The ATT page? or the CRAN page? > and failed every time, given exception reports such as the following > (appended below, the 2 reports arising out of my 1st & 6th attempts). > > Machine & software version specifics are all contained therein. > > What am i missing, i wonder? We don't know because so many details are still, .... missing. > Any clues would be most appreciated -thanx! /w > (Running R 2.13.1 on OSX 10.5.8) -- David Winsemius, MD West Hartford, CT From jholtman at gmail.com Wed Aug 3 22:41:06 2011 From: jholtman at gmail.com (jim holtman) Date: Wed, 3 Aug 2011 16:41:06 -0400 Subject: [R] r-help In-Reply-To: <5e37969a.8715.131903c9ab2.Coremail.knifeboot@163.com> References: <5e37969a.8715.131903c9ab2.Coremail.knifeboot@163.com> Message-ID: Some sample data would be useful. If you want to add more lines to a plot, the use 'lines' plot(fun1,....) lines(fun2, .... lines(fun3, .... On Wed, Aug 3, 2011 at 11:21 AM, KnifeBoot wrote: > ?Hey, > ? ?Is there any function plotting several "implicit functions" (F(x,y)=0) on the same fig. ? ? Is there anyone who has an example code of how to do this? > ? ?The contour3d function in the misc3d package only work with the ?functions with three dimensions. > ? ?Thanks a lot. > ? ?Many thanks for your help. > ? ?KnifeBoot > ? ? ? ?[[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? From peter.langfelder at gmail.com Wed Aug 3 22:41:34 2011 From: peter.langfelder at gmail.com (Peter Langfelder) Date: Wed, 3 Aug 2011 13:41:34 -0700 Subject: [R] R CMD check thinks my function is an S3 method Message-ID: Hi all, in my package I have a function with name plot.cor (this function is inherited from another legacy package). According to CRAN package checks reports, the check apparently thinks plot.cor is a method for the plot generic (I hope I'm using the correct terminology). checking Rd \usage sections ... NOTE S3 methods shown with full name in documentation object 'plot.cor': ?plot.cor? Although technically it doesn't seem to be an error (and CRAN maintainers haven't warned me about this), I was asked to clean up the package to the point where the package check goes through without any problems. My question is: do I need to rename the function (e.g. to plotCor, which won't be mistaken for an S3 method) or is there a way to tell R that this is not an S3 method? Thanks, Peter From ajn21 at case.edu Wed Aug 3 22:57:58 2011 From: ajn21 at case.edu (a217) Date: Wed, 3 Aug 2011 13:57:58 -0700 (PDT) Subject: [R] Writing a summary file in R In-Reply-To: <2AF93DE3-7255-469F-8952-16CACFFED6FA@comcast.net> References: <1311807756302-3700031.post@n4.nabble.com> <2AF93DE3-7255-469F-8952-16CACFFED6FA@comcast.net> Message-ID: <1312405078060-3716837.post@n4.nabble.com> Just a very simple follow-up. In the summary table (listed as "summ" below), the "TR" column I would like to display the total number of rows (i.e. counts) which I have done via "NROW()" function. However, in the "RG1" I would only like to count the number of rows with a 'totalread' count >= 1 (i.e. rows that don't contain zero). This may be confusing given the data I've provided, but values in the 'totalreads' column don't have to be 1 or 0, they can be any value. Therefore using "sum()" won't work in every case. As you can see I've tried using NROW() below for "RG1" but it didn't work out like I had planned. For example, given the input data, "chr4 100 300" should have RG1=1 and percent=0.5. Instead, it just counts every row regardless of value. The solution is probably something very simple I'm overlooking, but if you could help I'd appreciate it. Below is the code I've slightly modified from David's reply: ###############################Code############################## > colnames(data) <- > c("chr","start","end","base1","base2","totalreads","methylation","strand") > data #this is the input file #################################### chr start end base1 base2 totalreads methylation strand 1 chr1 100 159 104 104 1 0.05 + 2 chr1 100 159 145 145 1 0.04 + 3 chr1 200 260 205 205 1 0.12 + 4 chr1 500 750 600 600 1 0.09 + 5 chr3 450 700 500 500 1 0.03 + 6 chr4 100 300 150 150 1 0.05 + 7 chr4 100 300 175 175 0 0.00 + 8 chr7 350 600 400 400 1 0.06 + 9 chr7 350 600 550 550 0 0.00 + 10 chr9 100 125 100 100 1 0.10 + 11 chr11 679 687 680 680 1 0.07 + 12 chr11 679 687 681 681 0 0.00 + 13 chr22 100 200 105 105 1 0.03 + 14 chr22 100 200 110 110 1 0.08 + 15 chr22 300 400 350 350 0 0.00 + #################################### > splinp <- split(data, paste(data$chr, data$start)) > df <- as.data.frame(t(sapply(splinp, function(x) list(end=x$end[1], > TR=NROW(x[['totalreads']]), RG1=NROW(x[['totalreads']]>=1), > percent=(NROW(x[['totalreads']]>=1)/NROW(x[['totalreads']])))))) > df ####################### end TR RG1 percent chr1 100 159 2 2 1 chr1 200 260 1 1 1 chr1 500 750 1 1 1 chr11 679 687 2 2 1 chr22 100 200 2 2 1 chr22 300 400 1 1 1 chr3 450 700 1 1 1 chr4 100 300 2 2 1 chr7 350 600 2 2 1 chr9 100 125 1 1 1 ####################### > df.summ <- as.data.frame(t(sapply(splinp, function(x) > summary(x$methylation)))) > summ<-cbind(df,df.summ) > summ #the finished output ########################################### end TR RG1 percent Min. 1st Qu. Median Mean 3rd Qu. Max. chr1 100 159 2 2 1 0.04 0.0425 0.045 0.045 0.0475 0.05 chr1 200 260 1 1 1 0.12 0.1200 0.120 0.120 0.1200 0.12 chr1 500 750 1 1 1 0.09 0.0900 0.090 0.090 0.0900 0.09 chr11 679 687 2 2 1 0.00 0.0175 0.035 0.035 0.0525 0.07 chr22 100 200 2 2 1 0.03 0.0425 0.055 0.055 0.0675 0.08 chr22 300 400 1 1 1 0.00 0.0000 0.000 0.000 0.0000 0.00 chr3 450 700 1 1 1 0.03 0.0300 0.030 0.030 0.0300 0.03 chr4 100 300 2 2 1 0.00 0.0125 0.025 0.025 0.0375 0.05 chr7 350 600 2 2 1 0.00 0.0150 0.030 0.030 0.0450 0.06 chr9 100 125 1 1 1 0.10 0.1000 0.100 0.100 0.1000 0.10 ############################################ ############################################################## David Winsemius wrote: > > On Jul 27, 2011, at 9:42 PM, Dennis Murphy wrote: > >> Hi: >> >> Is this more or less what you're after? >> >> ## Note: This is the preferred way to send your data by e-mail. >> ## I used dput(data-frame-name) to produce this, >> ## where data-frame-name = 'df' on my end. >> df <- structure(list(V1 = c("chr1", "chr1", "chr1", "chr1", "chr3", >> "chr4", "chr4", "chr7", "chr7", "chr9", "chr11", "chr11", "chr22", >> "chr22", "chr22"), V2 = c(100L, 100L, 200L, 500L, 450L, 100L, >> 100L, 350L, 350L, 100L, 679L, 679L, 100L, 100L, 300L), V3 = c(159L, >> 159L, 260L, 750L, 700L, 300L, 300L, 600L, 600L, 125L, 687L, 687L, >> 200L, 200L, 400L), V4 = c(104L, 145L, 205L, 600L, 500L, 150L, >> 175L, 400L, 550L, 100L, 680L, 681L, 105L, 110L, 350L), V5 = c(104L, >> 145L, 205L, 600L, 500L, 150L, 175L, 400L, 550L, 100L, 680L, 681L, >> 105L, 110L, 350L), V6 = c(1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 0L, >> 1L, 1L, 0L, 1L, 1L, 0L), V7 = c(0.05, 0.04, 0.12, 0.09, 0.03, >> 0.05, 0, 0.06, 0, 0.1, 0.07, 0, 0.03, 0.08, 0), V8 = c("+", "+", >> "+", "+", "+", "+", "+", "+", "+", "+", "+", "+", "+", "+", "+" >> )), .Names = c("V1", "V2", "V3", "V4", "V5", "V6", "V7", "V8" >> ), class = "data.frame", row.names = c(NA, -15L)) >> >> ############ >> # This is the structure you should see: >>> str(df) >> 'data.frame': 15 obs. of 8 variables: >> $ V1: chr "chr1" "chr1" "chr1" "chr1" ... >> $ V2: int 100 100 200 500 450 100 100 350 350 100 ... >> $ V3: int 159 159 260 750 700 300 300 600 600 125 ... >> $ V4: int 104 145 205 600 500 150 175 400 550 100 ... >> $ V5: int 104 145 205 600 500 150 175 400 550 100 ... >> $ V6: int 1 1 1 1 1 1 0 1 0 1 ... >> $ V7: num 0.05 0.04 0.12 0.09 0.03 0.05 0 0.06 0 0.1 ... >> $ V8: chr "+" "+" "+" "+" ... >> ############ >> >> # Method 1: Write a function and call ddply() >> summfun <- function(d) { >> dsum <- as.data.frame(as.list(summary(d[['V7']]))) >> names(dsum) <- c('Min', 'Q1', 'Median', 'Mean', 'Q3', 'Max') >> data.frame(V3 = d[1, 'V3'], dsum) >> } >> library('plyr') >> ddply(df, .(V1, V2), summfun) >> >> The idea behind summfun is this: ddply() prefers functions that take a >> data frame as input and a data frame (or scalar) as output. dsum >> converts summary(V7) to a data frame by first coercing it into a list >> and then to a data frame. The names are changed for convenience. dsum >> has one line, so we add V3 to the data frame before outputting it. >> ddply() will attach the grouping variables to the output >> automatically; however, you can put them into the output data frame >> and ddply() will not duplicate the grouping variables in the output. >> >> The alternative in ddply(), which is simpler code, outputs the results >> from summary() in different rows for each grouping. In this event, it >> is useful to carry along the names of the summaries so that one can >> recast the data with the cast() function from the reshape package: >> >> # Method 2: Summarize and reshape >> # V3 is unnecessary but it is useful to carry it along for the output >> u <- ddply(df, .(V1, V2, V3), summarise, summ = summary(V7), >> summtype = names(summary(V7))) >> library('reshape') >> cast(u, V1 + V2 + V3 ~ summtype, value = 'summ') >> >> HTH, >> Dennis >> >> PS: I may be one of those folks to whom David was referring in >> relation to plyr :) > > I've been really impressed at Dennis' facility with plyr, reshape, and > reshape2. Note that the 'reshape' function has nothing to do with the > 'reshape' package. Here's what I came up with using base functions: > > > str(inpdat) > 'data.frame': 15 obs. of 8 variables: > $ chromosome : chr "chr1" "chr1" "chr1" "chr1" ... > $ startreg : int 100 100 200 500 450 100 100 350 350 100 ... > $ endreg : int 159 159 260 750 700 300 300 600 600 125 ... > $ base1 : int 104 145 205 600 500 150 175 400 550 100 ... > $ base2 : int 104 145 205 600 500 150 175 400 550 100 ... > $ totalreads : int 1 1 1 1 1 1 0 1 0 1 ... > $ methylation: num 0.05 0.04 0.12 0.09 0.03 0.05 0 0.06 0 0.1 ... > $ strand : chr "+" "+" "+" "+" ... > # The split into distinct 'chromosome' and 'startreg' categories: > splinp <- split(inpdat, paste(inpdat$chromosome, inpdat$startreg) ) > > # Process within separate categories: the tapply, aggragate and by > functions are all related > > > df <- as.data.frame( t(sapply(splinp, function(x) list(chr=x > $chromosome[1], strt=x$startreg[1], end=x$endreg[1], > frac=sum(x[['totalreads']]>=1)/nrow(x) )) ) ) > # You often need the t() function when working with apply functions > > df > chr strt end frac > chr1 100 chr1 100 159 1 > chr1 200 chr1 200 260 1 > chr1 500 chr1 500 750 1 > chr11 679 chr11 679 687 0.5 > chr22 100 chr22 100 200 1 > chr22 300 chr22 300 400 0 > chr3 450 chr3 450 700 1 > chr4 100 chr4 100 300 0.5 > chr7 350 chr7 350 600 0.5 > chr9 100 chr9 100 125 1 > > > as.data.frame(t(sapply(splinp, function(x) summary(x > $methylation )) ) ) > Min. 1st Qu. Median Mean 3rd Qu. Max. > chr1 100 0.04 0.0425 0.045 0.045 0.0475 0.05 > chr1 200 0.12 0.1200 0.120 0.120 0.1200 0.12 > chr1 500 0.09 0.0900 0.090 0.090 0.0900 0.09 > chr11 679 0.00 0.0175 0.035 0.035 0.0525 0.07 > chr22 100 0.03 0.0425 0.055 0.055 0.0675 0.08 > chr22 300 0.00 0.0000 0.000 0.000 0.0000 0.00 > chr3 450 0.03 0.0300 0.030 0.030 0.0300 0.03 > chr4 100 0.00 0.0125 0.025 0.025 0.0375 0.05 > chr7 350 0.00 0.0150 0.030 0.030 0.0450 0.06 > chr9 100 0.10 0.1000 0.100 0.100 0.1000 0.10 > > # The coup de grace: bind the columns together > > > df.summ <- as.data.frame(t(sapply(splinp, function(x) summary(x > $methylation )) ) ) > > cbind(df, df.summ) > chr strt end frac Min. 1st Qu. Median Mean 3rd Qu. Max. > chr1 100 chr1 100 159 1 0.04 0.0425 0.045 0.045 0.0475 0.05 > chr1 200 chr1 200 260 1 0.12 0.1200 0.120 0.120 0.1200 0.12 > chr1 500 chr1 500 750 1 0.09 0.0900 0.090 0.090 0.0900 0.09 > chr11 679 chr11 679 687 0.5 0.00 0.0175 0.035 0.035 0.0525 0.07 > chr22 100 chr22 100 200 1 0.03 0.0425 0.055 0.055 0.0675 0.08 > chr22 300 chr22 300 400 0 0.00 0.0000 0.000 0.000 0.0000 0.00 > chr3 450 chr3 450 700 1 0.03 0.0300 0.030 0.030 0.0300 0.03 > chr4 100 chr4 100 300 0.5 0.00 0.0125 0.025 0.025 0.0375 0.05 > chr7 350 chr7 350 600 0.5 0.00 0.0150 0.030 0.030 0.0450 0.06 > chr9 100 chr9 100 125 1 0.10 0.1000 0.100 0.100 0.1000 0.10 > > -- > Best; > > David. > >> >> On Wed, Jul 27, 2011 at 4:02 PM, a217 <ajn21 at case.edu> wrote: >>> Hello, >>> >>> I have an input file: >>> http://r.789695.n4.nabble.com/file/n3700031/testOut.txt testOut.txt >>> >>> where col 1 is chromosome, column2 is start of region, column 3 is >>> end of >>> region, column 4 and 5 is base position, column 6 is total reads, >>> column 7 >>> is methylation data, and column 8 is the strand. >>> >>> >>> I would like a summary output file such as: >>> http://r.789695.n4.nabble.com/file/n3700031/out.summary.txt >>> out.summary.txt >>> >>> where column 1 is chromosome, column 2 is start of region, column 3 >>> is end >>> of region, column 4 is total reads in general, column 5 is total >>> reads >=1, >>> column 6 is (col4/col5) or the percentage, and at the end I'd like >>> to list 6 >>> more columns based on summary results from summary() function in R. >>> >>> The summary() function will be used to analyze all of the >>> methylation data >>> (col7 from input) for each region (bounded by col2 and col3). >>> >>> For example for chr1 100 159 summary() gives: >>> Min. 1st Qu. Median Mean 3rd Qu. Max. >>> 0.0400 0.0425 0.0450 0.0450 0.0475 0.0500 >>> >>> which is simply the methylation data input into summary() only in >>> the region >>> of chr1 100 159. >>> >>> I know how to perform all of the required functions line-by-line, >>> but the >>> hard part for me is essentially taking the input data with multiple >>> positions in each region and assigning all of the summary results >>> to one >>> line identified by the region. >>> >>> If any of you have any suggestions I would appreciate it. >>> >>> -- >>> View this message in context: >>> http://r.789695.n4.nabble.com/Writing-a-summary-file-in-R-tp3700031p3700031.html >>> Sent from the R help mailing list archive at Nabble.com. >>> >>> ______________________________________________ >>> R-help at r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide >>> http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >>> >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > David Winsemius, MD > West Hartford, CT > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- View this message in context: http://r.789695.n4.nabble.com/Writing-a-summary-file-in-R-tp3700031p3716837.html Sent from the R help mailing list archive at Nabble.com. From aziz4 at illinois.edu Wed Aug 3 23:06:33 2011 From: aziz4 at illinois.edu (aziz4 at illinois.edu) Date: Wed, 3 Aug 2011 16:06:33 -0500 (CDT) Subject: [R] Need help with xyplot Message-ID: <20110803160633.DGM91589@expms2.cites.uiuc.edu> Consider I have the following data: AgeRange AgeOfPerson PersonNo FriendsAtYear0 FriendsAtYear1 FriendsAtYear2 FriendsAtYear3 FriendsAtYear4 FriendsAtYear5 10 - 12 11 1 0 1 2 2 3 3 10 - 12 12 2 0 1 2 2 3 3 15 - 18 13 3 1 2 3 4 6 7 15 - 18 14 4 1 3 4 5 7 7 30 - 40 33 5 3 5 5 6 8 9 30 - 40 36 6 4 4 4 4 4 4 I want to plot the number of friends against number of years, as to show how friendships grew over time. Also, I want to group the graphs by AgeRange of persons and also color them with respect to the Age of Persons. So far, I have this code: FilePath = "C:\\Documents and Settings\\All Users\\Documents\\Desktop\\Fayez\\ResearchWork\\Pajek_Work\\Program\\2011-03-07-Wed-FriMeetingApr15\\RPractice\\" NumberOfyears <- c(0, 1, 2, 3, 4 ,5); File2Open = paste(FilePath, "FriendshipNetExample.txt", sep = "") # print(File2Open) DataTable = read.table(File2Open, header = TRUE, sep = "\t") # print(DataTable) print(xyplot(FriendsAtYear0 + FriendsAtYear1 + FriendsAtYear2 + FriendsAtYear3 + FriendsAtYear4 + FriendsAtYear5 ~ AgeOfPerson | AgeRange, data = DataTable, xlab = "Number of Years (0 to 5)", ylab = "Number of Friends", main = "Number of Friends vs. Years", #aspect = "xy", # calculate an optimal aspect ratio panel = function(x,y) { panel.grid(); if (10 <= x && x < 12) panel.xyplot(x,y,col="red"); if (15 <= x && x < 18) panel.xyplot(x,y,col="salmon"); if (30 <= x && x < 40) panel.xyplot(x,y,col="maroon"); } ) ) But it obviously does not serve the purpose. Urgent help would be most appreciated. Best, Fayez Grad student - UIUC From bt_jannis at yahoo.de Wed Aug 3 23:00:13 2011 From: bt_jannis at yahoo.de (Jannis) Date: Wed, 03 Aug 2011 23:00:13 +0200 Subject: [R] General indexing in multidimensional arrays In-Reply-To: References: <4E3673FD.5030800@yahoo.de> Message-ID: <4E39B6DD.1070606@yahoo.de> Thanks for all the replies!Unfortunately the solutions only work for extracting subsets of the data (which was exactly what I was asking for) and not to replace subsets with other values. I used them, however, to program a rather akward function to do that. Seems I found one of the few aspects where Matlab actually is slightly easier to use than R. Thanks for your help! Jannis On 08/01/2011 05:50 PM, Gene Leynes wrote: > What do you think about this? > > apply(data, 3, '[', indices) > > > On Mon, Aug 1, 2011 at 4:38 AM, Jannis wrote: > >> Dear R community, >> >> >> I have a general question regarding indexing in multidiemensional arrays. >> >> Imagine I have a three dimensional array and I only want to extract on >> vector along a single dimension from it: >> >> >> data<- array(rnorm(64),dim=c(4,4,4)) >> >> result<- data[1,1,] >> >> If I want to extract more than one of these vectors, it would now really >> help me to supply a logical matrix of the size of the first two dimensions: >> >> >> indices<- matrix(FALSE,ncol=4,nrow=4) >> indices[1,3]<- TRUE >> indices[4,1]<- TRUE >> >> result<- data[indices,] >> >> This, however would give me an error. I am used to this kind of indexing >> from Matlab and was wonderingt whether there exists an easy way to do this >> in R without supplying complicated index matrices of all three dimensions or >> logical vectors of the size of the whole matrix? >> >> The only way I could imagine would be to: >> >> result<- data[rep(as.vector(indices),**times=4)] >> >> but this seems rather complicated and also depends on the order of the >> dimensions I want to extract. >> >> >> I do not want R to copy Matlabs behaviour, I am just wondering whether I >> missed one concept of indexing in R? >> >> >> >> Thanks a lot >> Jannis >> >> ______________________________**________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/**listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/** >> posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> From gouri.vadali at gmail.com Wed Aug 3 23:04:30 2011 From: gouri.vadali at gmail.com (Gouri) Date: Wed, 3 Aug 2011 14:04:30 -0700 (PDT) Subject: [R] Trouble reading Illumina Bead Studio Ouput file Message-ID: <1312405470752-3716854.post@n4.nabble.com> Hello All, I am trying to run normalization on the Illumina bead studio output file version 1.5.1.3 using the lumi package. Sine the original file is a huge file I am splitting it and reading it as a batch in the lumi package. This is where the problem begins.. it just seems to be not reading it.. but instead gives me an error : Error in m[rownames(x), colnames(x)] <- x : subscript out of bounds In addition: Warning message: In combine(x.lumi, x.lumi.i) : Two data sets have some duplicated sample names! I am using the lumiR.batch command on my file. I have checked it all but it seems to be not running at all.. Please help! -G -- View this message in context: http://r.789695.n4.nabble.com/Trouble-reading-Illumina-Bead-Studio-Ouput-file-tp3716854p3716854.html Sent from the R help mailing list archive at Nabble.com. From baptiste.auguie at googlemail.com Wed Aug 3 23:53:11 2011 From: baptiste.auguie at googlemail.com (baptiste auguie) Date: Thu, 4 Aug 2011 09:53:11 +1200 Subject: [R] 3D Bar Graphs in ggplot2? In-Reply-To: <4E396E21.5030904@ohsu.edu> References: <1312310369354-3713305.post@n4.nabble.com> <4E385D31.7090504@ohsu.edu> <1312376866384-3715382.post@n4.nabble.com> <4E396E21.5030904@ohsu.edu> Message-ID: A barplot rendered with povray, http://zoonek2.free.fr/UNIX/48_R/03.html#10 At the other end of the spectrum, library(txtplot) x <- factor(c("orange", "orange", "red", "green", "green", "red", "yellow", "purple", "purple", "orange")) o <- capture.output(txtbarchart(x)) library(gplots) textplot(o) Best, baptiste On 4 August 2011 03:49, Brian Diggs wrote: > On 8/3/2011 6:07 AM, wwreith wrote: >> >> So I take it 3D pie charts are out? > > At least with ggplot, yes. ?2D pie charts are somewhat tricky with ggplot, > even. ?They can be gone with stacked, normalized bar charts projected into > polar coordinates, if I recall properly. > > Not limited to ggplot, there is pie() in the graphics package, and pie3D() > in the plotrix package. > > I couldn't find anything that would do bar plots with a 3D effect; the > closest was the scatterplot3d package, but that is more a way to do a two > dimensional array of bars, rather than a 3D effect. > >> P.S. It is not about hiding anything. It is about consulting and being >> told >> by your client to make 3D pie charts and change this font or that color to >> make the graphs more apealing. Given that I am the one trying to open the >> door to using R where I work it would be much easier if I could simply use >> a >> 2D graph. > > External requirements can make us make choices we otherwise might not have. > ?If the client is amenable to education, you could slowly try to persuade > (say, using side-by-side examples), but some are not. ?Good luck. > >> -- >> View this message in context: >> http://r.789695.n4.nabble.com/3D-Bar-Graphs-in-ggplot2-tp3713305p3715382.html >> Sent from the R help mailing list archive at Nabble.com. >> > > > -- > Brian S. Diggs, PhD > Senior Research Associate, Department of Surgery > Oregon Health & Science University > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From michael.weylandt at gmail.com Thu Aug 4 00:12:00 2011 From: michael.weylandt at gmail.com (R. Michael Weylandt ) Date: Wed, 3 Aug 2011 18:12:00 -0400 Subject: [R] General indexing in multidimensional arrays In-Reply-To: References: <4E3673FD.5030800@yahoo.de> <4E39B6DD.1070606@yahoo.de> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From matt.curcio.ri at gmail.com Thu Aug 4 00:14:40 2011 From: matt.curcio.ri at gmail.com (Matt Curcio) Date: Wed, 3 Aug 2011 18:14:40 -0400 Subject: [R] Error message for MCC Message-ID: Greetings all, I am getting an error message that is stifling me. Any ideas? > ## Define Directories ## > load_from <- "/home/mcc/Dropbox/abrodsky/kegg_combine_data/" > save_to <- "/home/mcc/Dropbox/abrodsky/ttest_results/" > > ############################### > ## Define Columns To Compare ## > compareA <- "log_b_rich" > compareB <- "Fc_cdt_rich_tot" > > ################################ > ## Collect Files To Compare ## > setwd(load_from) > files_to_test <- list.files(pattern = "combine.kegg") > > ########################## > ## Initialize Variables ## > vl <- length(files_to_test) > temp <- vector(mode="numeric", length = vl) > colA <- vector(mode="numeric", length = vl) > colB <- vector(mode="numeric", length = vl) > tt <- vector(mode="numeric", length = vl) > > > ######################## > ## Calculate P-values ## > for (i in 1:3){ + temp1 <- read.table(files_to_test[i], header=TRUE, sep=" ") + numrows <- nrow(temp1) + tt_pvalue <- matrix(data=temp, nrow=numrows, ncol=vl) + colA <- temp[,compareA] + colB <- temp[,compareB] + tt <- t.test(colA, colB, var.equal=TRUE) + tt_pvalue <- tt$p.value + } Error in temp[, compareA] : incorrect number of dimensions -- Matt Curcio M: 401-316-5358 E: matt.curcio.ri at gmail.com From mailzhuyao at gmail.com Thu Aug 4 01:28:27 2011 From: mailzhuyao at gmail.com (zhu yao) Date: Thu, 4 Aug 2011 07:28:27 +0800 Subject: [R] How to make a nomogam and Calibration plot In-Reply-To: <1312213591451-3710068.post@n4.nabble.com> References: <1312213591451-3710068.post@n4.nabble.com> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From c6h5no2 at gmail.com Thu Aug 4 01:26:09 2011 From: c6h5no2 at gmail.com (C6H5NO2) Date: Thu, 4 Aug 2011 07:26:09 +0800 Subject: [R] Possible bug of QR decomposition in package Matrix Message-ID: Hello R users, I am trying to give the QR decomposition for a large sparse matrix in the format of dgCMatrix. When I run qr function for this matrix, the R session simply stops and exits to the shell. The matrix is of size 108595x108595, and it has 4866885 non-zeros. I did the experiment on windows 7 and linux mint 11 (both 64 bit), and the results are the same. I have uploaded my data file to http://ifile.it/elf2p6z/A.RData . The file is 10.681 MB and I hope someone could kindly download it. The code to see my problem is: library(Matrix) load("A.RData") B <- qr(A) Best wishes, C6 From rvalliant at survey.umd.edu Thu Aug 4 03:38:57 2011 From: rvalliant at survey.umd.edu (Richard Valliant) Date: Wed, 03 Aug 2011 21:38:57 -0400 Subject: [R] Tinn-R problem: unable to send code to R Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From f.harrell at vanderbilt.edu Thu Aug 4 04:18:53 2011 From: f.harrell at vanderbilt.edu (Frank Harrell) Date: Wed, 3 Aug 2011 19:18:53 -0700 (PDT) Subject: [R] How to make a nomogam and Calibration plot In-Reply-To: References: <1312213591451-3710068.post@n4.nabble.com> Message-ID: <1312424333899-3717519.post@n4.nabble.com> I want to add a comment related to the calibration plot that was presented in a previous post (which probably cannot be done optimally in SPSS). The plot lacks sufficient resolution in the x-axis values that are calibrated. It is far better to use loess to estimate a smooth nonparametric calibration curve, with no (arbitrary) binning of x. And it is not adequate to validate only the 3-4 points that were plotted. Frank yz wrote: > > Nomogram is user-friendly, but the equation is also acceptable. It should > be > kept in mind that the process of model development really counts. > > BTW: You can calculate C-index (AUC) in SPSS. Calibration plot can also be > plotted (may be manually) from the result of SPSS. > > *Yao Zhu* > *Department of Urology > Fudan University Shanghai Cancer Center > Shanghai, China* > > > 2011/8/1 sytangping <surgeon666666 at yahoo.com.cn> > >> Dear R users, >> >> I am a new R user and something stops me when I try to write a academic >> article. I want to make a nomogram to predict the risk of prostate cancer >> (PCa) using several factors which have been selected from the Logistic >> regression run under the SPSS. Always, a calibration plot is needed to >> validate the prediction accuracy of the nomogram. >> However, I tried many times and read a lot of posts with respect to this >> topic but I still couldn't figure out how to draw the nomogram and the >> calibration plot. My dataset and questions in detail are shown in two >> attached files. It will be very grateful if someone can save his/her time >> to >> help for my questions. >> >> Warmest regards! >> >> Ping Tang http://r.789695.n4.nabble.com/file/n3710068/Dataset.xls >> Dataset.xls http://r.789695.n4.nabble.com/file/n3710068/R_help.doc >> R_help.doc >> >> -- >> View this message in context: >> http://r.789695.n4.nabble.com/How-to-make-a-nomogam-and-Calibration-plot-tp3710068p3710068.html >> Sent from the R help mailing list archive at Nabble.com. >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > ----- Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/How-to-make-a-nomogam-and-Calibration-plot-tp3710068p3717519.html Sent from the R help mailing list archive at Nabble.com. From jholtman at gmail.com Thu Aug 4 06:03:49 2011 From: jholtman at gmail.com (Jim Holtman) Date: Thu, 4 Aug 2011 00:03:49 -0400 Subject: [R] Error message for MCC In-Reply-To: References: Message-ID: 'temp' is only a vector and you are trying to reference it as a matrix, therefore the error message Sent from my iPad On Aug 3, 2011, at 18:14, Matt Curcio wrote: > Greetings all, > I am getting an error message that is stifling me. > Any ideas? > >> ## Define Directories ## >> load_from <- "/home/mcc/Dropbox/abrodsky/kegg_combine_data/" >> save_to <- "/home/mcc/Dropbox/abrodsky/ttest_results/" >> >> ############################### >> ## Define Columns To Compare ## >> compareA <- "log_b_rich" >> compareB <- "Fc_cdt_rich_tot" >> >> ################################ >> ## Collect Files To Compare ## >> setwd(load_from) >> files_to_test <- list.files(pattern = "combine.kegg") >> >> ########################## >> ## Initialize Variables ## >> vl <- length(files_to_test) >> temp <- vector(mode="numeric", length = vl) >> colA <- vector(mode="numeric", length = vl) >> colB <- vector(mode="numeric", length = vl) >> tt <- vector(mode="numeric", length = vl) >> >> >> ######################## >> ## Calculate P-values ## >> for (i in 1:3){ > + temp1 <- read.table(files_to_test[i], header=TRUE, sep=" ") > + numrows <- nrow(temp1) > + tt_pvalue <- matrix(data=temp, nrow=numrows, ncol=vl) > + colA <- temp[,compareA] > + colB <- temp[,compareB] > + tt <- t.test(colA, colB, var.equal=TRUE) > + tt_pvalue <- tt$p.value > + } > Error in temp[, compareA] : incorrect number of dimensions > > -- > > > Matt Curcio > M: 401-316-5358 > E: matt.curcio.ri at gmail.com > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From ripley at stats.ox.ac.uk Thu Aug 4 07:14:52 2011 From: ripley at stats.ox.ac.uk (Prof Brian Ripley) Date: Thu, 4 Aug 2011 06:14:52 +0100 (BST) Subject: [R] R CMD check thinks my function is an S3 method In-Reply-To: References: Message-ID: On Wed, 3 Aug 2011, Peter Langfelder wrote: > Hi all, > > in my package I have a function with name plot.cor (this function is > inherited from another legacy package). According to CRAN package > checks reports, the check apparently thinks plot.cor is a method for > the plot generic (I hope I'm using the correct terminology). > > checking Rd \usage sections ... NOTE > S3 methods shown with full name in documentation object 'plot.cor': > ?plot.cor? > > > Although technically it doesn't seem to be an error (and CRAN Technically this is an error if you did not intend it to be an S3 method: the note says you incorrectly documented your S3 method. > maintainers haven't warned me about this), I was asked to clean up the > package to the point where the package check goes through without any > problems. > > My question is: do I need to rename the function (e.g. to plotCor, > which won't be mistaken for an S3 method) or is there a way to tell R > that this is not an S3 method? The former. (There is a list of historical exceptions used by R CMD check, but we will not be adding to it for new packages.) > > Thanks, > > Peter > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 From mkzodet at comcast.net Thu Aug 4 03:01:37 2011 From: mkzodet at comcast.net (M/K Zodet) Date: Wed, 3 Aug 2011 21:01:37 -0400 Subject: [R] labelling a stacked barchart (lattice) Message-ID: <2038A337-718D-4F44-B468-93BF366FA0E8@comcast.net> All: Below is my code for creating a basic horizontal, stacked barchart. I'd like to label the plot in two ways: 1) place the x values in each piece and 2) place the y values above each piece (angled). I'm currently using lattice, but I'm open to suggestions using ggplot2. Questions: 1. Can this be done?...I assume yes. So, what are the best options/functions for doing this. 2. Is there a way to alter the transparency of the bar fill with the brewer palette? I know I can alter this w/ heat.~, topo.~, cm.colors, etc. Thanks in advance. Marc Using R for Mac OS X GUI 1.40-devel Leopard build 64-bit dta <- data.frame(x=c(46.0, 14.7, 16.4, 15.8, 7.0), y=c("Back", "Neck", "Extrem", "MuscSkel", "Oth")) dta barchart(data=dta, ~x, group=y, stack=T, col=sort(brewer.pal(7,"Purples")), xlab="Percent", box.width=.5, scales=list(tick.number=10)) From M.Rosario.Garcia at slu.se Thu Aug 4 02:58:28 2011 From: M.Rosario.Garcia at slu.se (Rosario Garcia Gil) Date: Thu, 4 Aug 2011 02:58:28 +0200 Subject: [R] persp() Message-ID: Hello I am trying to draw a basic black and white map of two European countries. After searching some key words in google and reading many pages I arrived to the conclusion that persp() could be used to draw that map. I have prepared three small example files, which are supposed to be the files required for running that function. xvector is a vector with the longitudes yvector is a vector with the latitudes zmatrix is supposed to the height, but since I only need a flat map I just gave the value 1 to each of the entries of the matrix (I am not sure this is correct though). The first question for me when using persp() is that x and y values should be in increasing values (following the instructions), but I understand that the coordinates x and y are actually pairs of values (longitude/latitude pairs of values) and if I order them in ascending order both then the pairing is gone. I guess I am totally lost! Still even if I try to run persp() by ordering in ascending value x and y values (even if it does not make sense for me) I still get this message: <- persp(xvector,yvector,zmatrix,theta=-40,phi=30) Error in persp.default(xvector, yvector, zmatrix, theta = -40, phi = 30) : increasing 'x' and 'y' values expected Any help is wellcome. Is there any other better function to draw a flat map (2D), also example of the imput files is wellcome. Thanks in advance. Rosario From dhinds at sonic.net Thu Aug 4 04:01:22 2011 From: dhinds at sonic.net (dhinds at sonic.net) Date: Thu, 4 Aug 2011 02:01:22 +0000 Subject: [R] Question about contrasts and interpreting glm output for factors Message-ID: I'm fitting a logistic regression model of the form: outcome ~ covariates + A*B where A and B are factors -- A has 4 levels, B has 2 levels. The A and B term each have significant main effects and the interaction term is significant. I'd like to ask, how does a particular set of A and B values affect the predicted outcome, compared to the mean prediction across all levels. The design is unbalanced but is essentially a random sample of the underlying population, at least with respect to A and B. So I think what I'm asking for are contrasts for each combination of A and B, against a weighted sum of regression coefficients for all values of A and B. I'm currently doing this with the 'rms' package using things like: contrast(model, list(A=a0,B=b0),list(A=levels(A),B=levels(B)), type='average', weights=as.data.frame(table(A,B))$Freq) where a0 is a particular level of A, and b0 is a level of B. Is this a reasonable thing to do? The results are fairly consistent with what I get if I fit models where I replace the A*B term with a indicator for a particular combination of levels of A and B, like I(A==a0 & B==b0), and use the Wald test on that term. Any suggestions for good information sources for using complex contrasts would also be appreciated; I haven't found a great one so far. -- Dave From flodel at gmail.com Thu Aug 4 04:06:18 2011 From: flodel at gmail.com (Florent D.) Date: Wed, 3 Aug 2011 22:06:18 -0400 Subject: [R] Error message for MCC In-Reply-To: References: Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From paul at stat.auckland.ac.nz Thu Aug 4 05:31:01 2011 From: paul at stat.auckland.ac.nz (Paul Murrell) Date: Thu, 04 Aug 2011 15:31:01 +1200 Subject: [R] grImport symbols In-Reply-To: References: Message-ID: <4E3A1275.9010303@stat.auckland.ac.nz> Hi baptiste auguie wrote: > Dear list, > > I have two questions regarding grid.symbols() in the grImport package. > This package allows you to import a vector graphic in R, and > grid.symbols() can be used to plot the resulting glyph at arbitrary > locations in a grid viewport. > > I have tried the code in the grImport vignette, which is most > interesting, however I am stuck with two problems: > > 1- Is there a possibility to create a Picture object directly from R > without importing an external image, but rather specified as a regular > grob/gTree, say. I would like to imitate my.symbols from > TeachingDemos, but with grid. You can directly create a Picture object. An example is on slide 18 of http://www.stat.auckland.ac.nz/~paul/Talks/import.pdf > 2- I could not figure out what the size argument in grid.symbols() > does. The following example fails rather curiously. > > petal.ps <- "%!PS > newpath % start a new shape > 0 0 moveto % move to a start location > -5 10 lineto % line to a new location > -10 20 10 20 5 10 curveto % curve to a third location > 5 10 lineto % line to a fourth location > closepath % connect back to the start location > 0 setgray % set the drawing colour to black > fill % fill the current shape" > > cat(petal.ps, file="petal.ps") > library(grImport) > PostScriptTrace("petal.ps") > petal <- readPicture("petal.ps.xml") > > grid.symbols(petal, 1:10/10, 1:10/10, size=0.1) # OK > > grid.symbols(petal, 1:10/10, 1:10/10, size = seq(0.01, 0.1, > length=10)) # Warning and weird shapes That's a bug in 'grImport' (could not cope with 'size' of length > 1). A fix is winging its way to CRAN now. Paul > sessionInfo() > R version 2.13.1 (2011-07-08) > Platform: i386-apple-darwin9.8.0/i386 (32-bit) > > locale: > [1] en_GB.UTF-8/en_GB.UTF-8/C/C/en_GB.UTF-8/en_GB.UTF-8 > > attached base packages: > [1] stats graphics grDevices utils datasets grid > methods base > > other attached packages: > [1] grImport_0.7-3 XML_3.4-0 ggplot2_0.8.9 proto_0.3-9.2 reshape_0.8.4 > [6] plyr_1.5.2 > > loaded via a namespace (and not attached): > [1] tools_2.13.1 > > Best regards, > > baptiste > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Dr Paul Murrell Department of Statistics The University of Auckland Private Bag 92019 Auckland New Zealand 64 9 3737599 x85392 paul at stat.auckland.ac.nz http://www.stat.auckland.ac.nz/~paul/ From themjwok at gmail.com Thu Aug 4 07:42:03 2011 From: themjwok at gmail.com (Michael O'Keeffe) Date: Thu, 4 Aug 2011 15:42:03 +1000 Subject: [R] Rdconv LaTeX files Message-ID: Hi, I've written a package and converted my Rd files into LaTeX using Rdconv. When I copy and paste these files in to my Sweave document I get the?error message when compiling the Sweave file: ! Undefined control sequence. l.32 \inputencoding {utf8} I also get the error message for the "\HeaderA", "\keyword". Additionally I get an environment undefined error for some of the "\begin" control sequences. Do I need to specifically "include" certain LaTeX libraries in my Sweave document? Regards, M From xuanlong.ma at uts.edu.au Thu Aug 4 03:38:30 2011 From: xuanlong.ma at uts.edu.au (richard.ma) Date: Wed, 3 Aug 2011 18:38:30 -0700 (PDT) Subject: [R] How to extract sublist from a list? Message-ID: <1312421910777-3717451.post@n4.nabble.com> Hi everyone, Suppose I have a list named "lst", see below: > lst $sub1 ... $sub1$x ... $sub1$y .... $sub2 ... $sub2$x ... $sub2$y ? $sub3 ... ... ... Now, I want to extract the sub-sublist $y from every sublist(sub1, sub2...) and then storage them to a new list. I know how to extract them by subscript or list name one by one, but I wonder if there exist some tricks to finish this job more automatically. Best regards, Richard @Sydney -- View this message in context: http://r.789695.n4.nabble.com/How-to-extract-sublist-from-a-list-tp3717451p3717451.html Sent from the R help mailing list archive at Nabble.com. From xuanlong.ma at uts.edu.au Thu Aug 4 07:29:05 2011 From: xuanlong.ma at uts.edu.au (Richard Ma) Date: Wed, 3 Aug 2011 22:29:05 -0700 (PDT) Subject: [R] How to extract sublist from a list? In-Reply-To: <1312425956068-3717556.post@n4.nabble.com> References: <1312421910777-3717451.post@n4.nabble.com> <1312425956068-3717556.post@n4.nabble.com> Message-ID: <1312435745257-3717713.post@n4.nabble.com> Thank you so much GlenB! I got it done using your method. I'm just curious how did you get this idea? Cause for me, this looks so tricky.... Cheers, Richard ----- I'm a PhD student interested in Remote Sensing and R Programming. -- View this message in context: http://r.789695.n4.nabble.com/How-to-extract-sublist-from-a-list-tp3717451p3717713.html Sent from the R help mailing list archive at Nabble.com. From jwiley.psych at gmail.com Thu Aug 4 08:14:13 2011 From: jwiley.psych at gmail.com (Joshua Wiley) Date: Wed, 3 Aug 2011 23:14:13 -0700 Subject: [R] Possible bug of QR decomposition in package Matrix In-Reply-To: References: Message-ID: Hi C6 (were C1 - 5 already taken in your family?), I downloaded your data and can replicate your problem. R ceases responding and terminates. This does not occur with all uses of qr on a dgCMatrix object. I know nothing about sparse matrices, but if you believe this should not be occurring, you should contact the package maintainers. Here is my sessionInfo() (FYI, it would probably be helpful to report yours also in case the issue is version dependent): R Under development (unstable) (2011-07-30 r56564) Platform: x86_64-pc-mingw32/x64 (64-bit) locale: [1] LC_COLLATE=English_United States.1252 [2] LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] Matrix_0.999375-50 lattice_0.19-30 loaded via a namespace (and not attached): [1] grid_2.14.0 tools_2.14.0 Cheers, Josh On Wed, Aug 3, 2011 at 4:26 PM, C6H5NO2 wrote: > Hello R users, > > I am trying to give the QR decomposition for a large sparse matrix in > the format of dgCMatrix. When I run qr function for this matrix, the R > session simply stops and exits to the shell. > The matrix is of size 108595x108595, and it has 4866885 non-zeros. I > did the experiment on windows 7 and linux mint 11 (both 64 bit), and > the results are the same. > > I have uploaded my data file to http://ifile.it/elf2p6z/A.RData . The > file is 10.681 MB and I hope someone could kindly download it. > The code to see my problem is: > library(Matrix) > load("A.RData") > B <- qr(A) > > Best wishes, > C6 > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, ATS Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ From ashimkapoor at gmail.com Thu Aug 4 08:25:59 2011 From: ashimkapoor at gmail.com (Ashim Kapoor) Date: Thu, 4 Aug 2011 11:55:59 +0530 Subject: [R] How to extract sublist from a list? In-Reply-To: <1312435745257-3717713.post@n4.nabble.com> References: <1312421910777-3717451.post@n4.nabble.com> <1312425956068-3717556.post@n4.nabble.com> <1312435745257-3717713.post@n4.nabble.com> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From ashimkapoor at gmail.com Thu Aug 4 09:12:03 2011 From: ashimkapoor at gmail.com (Ashim Kapoor) Date: Thu, 4 Aug 2011 12:42:03 +0530 Subject: [R] How to extract sublist from a list? In-Reply-To: References: <1312421910777-3717451.post@n4.nabble.com> <1312425956068-3717556.post@n4.nabble.com> <1312435745257-3717713.post@n4.nabble.com> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From jwiley.psych at gmail.com Thu Aug 4 09:32:05 2011 From: jwiley.psych at gmail.com (Joshua Wiley) Date: Thu, 4 Aug 2011 00:32:05 -0700 Subject: [R] How to extract sublist from a list? In-Reply-To: References: <1312421910777-3717451.post@n4.nabble.com> <1312425956068-3717556.post@n4.nabble.com> <1312435745257-3717713.post@n4.nabble.com> Message-ID: On Thu, Aug 4, 2011 at 12:12 AM, Ashim Kapoor wrote: >> How would we do this problem looping over seq(1:2) ? Because this goes to an email list serv, it is good practice to quote the original problem. I have no idea what "this" is. >> >> > To extend the example in the corresponding nabble post : - > ?sub1<-list(x="a",y="ab") > ?sub2<-list(x="c",y="ad") > ?lst<-list(sub1=sub1,sub2=sub2) > ?for ( t in seq(1:2) ) ?print(lst[[t]]$y) > > So I can print out the sub1$y/sub2$y but it's not clear how to extract them. Well, to extract them, just drop the call to print. You could use them directly in the loop or could store them in new variables. ## note seq(1:2) is redundant with simply 1:2 or (t in 1:2) print(nchar(lst[[t]]$y)) I am guess, though, that what you might be hoping to do is extract specific elements from a list and store the extract elements in a new list. lapply(1:2, function(i) lst[[i]]["y"]) ## or compare lapply(1:2, function(i) lst[[i]][["y"]]) > > My original was different though. > > How would ?say:- > > for ( t in seq(1:2) ) sub"t"$y > > Where sub"t" evaluates to sub1 or sub 2? if you actually want "sub1", or "sub2": ## note that I am wrapping in print() not so that it works ## but so that you can see it at the console for (t in 1:2) print(paste("sub", t, sep = '')) from which we can surmise that the following should work: for (t in 1:2) print(lst[[paste("sub", t, sep = '')]]) which trivially extends to: for (t in 1:2) print(lst[[paste("sub", t, sep = '')]]$y) or perhaps more appropriately for (t in 1:2) print(lst[[paste("sub", t, sep = '')]][["y"]]) If you just need to go one level down for *all* elements of your list lapply(lst, `[[`, "y") ## or if you are only retrieving a single value sapply(lst, `[[`, "y") Hope this helps, Josh > > Many thanks. > Ashim > > >> On Thu, Aug 4, 2011 at 10:59 AM, Richard Ma wrote: >> >>> Thank you so much GlenB! >>> >>> I got it done using your method. >>> >>> I'm just curious how did you get this idea? Cause for me, this looks so >>> tricky.... >>> >>> Cheers, >>> Richard >>> >>> ----- >>> I'm a PhD student interested in Remote Sensing and R Programming. >>> -- >>> View this message in context: >>> http://r.789695.n4.nabble.com/How-to-extract-sublist-from-a-list-tp3717451p3717713.html >>> Sent from the R help mailing list archive at Nabble.com. >>> >>> ______________________________________________ >>> R-help at r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide >>> http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >>> >> >> > > ? ? ? ?[[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, ATS Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ From c6h5no2 at gmail.com Thu Aug 4 10:03:54 2011 From: c6h5no2 at gmail.com (C6H5NO2) Date: Thu, 4 Aug 2011 16:03:54 +0800 Subject: [R] Possible bug of QR decomposition in package Matrix In-Reply-To: References: Message-ID: Thank you very much, Josh! As you suggested, I will contact the developers of "Matrix". PS, C6 are just initial characters of my email account :-) Best wishes, C6 2011/8/4 Joshua Wiley : > Hi C6 (were C1 - 5 already taken in your family?), > > I downloaded your data and can replicate your problem. ?R ceases > responding and terminates. ?This does not occur with all uses of qr on > a dgCMatrix object. ?I know nothing about sparse matrices, but if you > believe this should not be occurring, you should contact the package > maintainers. ?Here is my sessionInfo() (FYI, it would probably be > helpful to report yours also in case the issue is version dependent): > > R Under development (unstable) (2011-07-30 r56564) > Platform: x86_64-pc-mingw32/x64 (64-bit) > > locale: > [1] LC_COLLATE=English_United States.1252 > [2] LC_CTYPE=English_United States.1252 > [3] LC_MONETARY=English_United States.1252 > [4] LC_NUMERIC=C > [5] LC_TIME=English_United States.1252 > > attached base packages: > [1] stats ? ? graphics ?grDevices utils ? ? datasets ?methods ? base > > other attached packages: > [1] Matrix_0.999375-50 lattice_0.19-30 > > loaded via a namespace (and not attached): > [1] grid_2.14.0 ?tools_2.14.0 > > Cheers, > > Josh > > On Wed, Aug 3, 2011 at 4:26 PM, C6H5NO2 wrote: >> Hello R users, >> >> I am trying to give the QR decomposition for a large sparse matrix in >> the format of dgCMatrix. When I run qr function for this matrix, the R >> session simply stops and exits to the shell. >> The matrix is of size 108595x108595, and it has 4866885 non-zeros. I >> did the experiment on windows 7 and linux mint 11 (both 64 bit), and >> the results are the same. >> >> I have uploaded my data file to http://ifile.it/elf2p6z/A.RData . The >> file is 10.681 MB and I hope someone could kindly download it. >> The code to see my problem is: >> library(Matrix) >> load("A.RData") >> B <- qr(A) >> >> Best wishes, >> C6 >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > > > -- > Joshua Wiley > Ph.D. Student, Health Psychology > Programmer Analyst II, ATS Statistical Consulting Group > University of California, Los Angeles > https://joshuawiley.com/ > From ashimkapoor at gmail.com Thu Aug 4 10:40:16 2011 From: ashimkapoor at gmail.com (Ashim Kapoor) Date: Thu, 4 Aug 2011 14:10:16 +0530 Subject: [R] How to extract sublist from a list? In-Reply-To: References: <1312421910777-3717451.post@n4.nabble.com> <1312425956068-3717556.post@n4.nabble.com> <1312435745257-3717713.post@n4.nabble.com> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From jwiley.psych at gmail.com Thu Aug 4 10:42:24 2011 From: jwiley.psych at gmail.com (Joshua Wiley) Date: Thu, 4 Aug 2011 01:42:24 -0700 Subject: [R] How to extract sublist from a list? In-Reply-To: References: <1312421910777-3717451.post@n4.nabble.com> <1312425956068-3717556.post@n4.nabble.com> <1312435745257-3717713.post@n4.nabble.com> Message-ID: On Thu, Aug 4, 2011 at 1:40 AM, Ashim Kapoor wrote: > > > On Thu, Aug 4, 2011 at 1:02 PM, Joshua Wiley wrote: >> >> On Thu, Aug 4, 2011 at 12:12 AM, Ashim Kapoor >> wrote: >> >> How would we do this problem looping over seq(1:2) ? >> >> Because this goes to an email list serv, it is good practice to quote >> the original problem. ?I have no idea what "this" is. >> >> >> >> >> >> > To extend the example in the corresponding nabble post : - >> > ?sub1<-list(x="a",y="ab") >> > ?sub2<-list(x="c",y="ad") >> > ?lst<-list(sub1=sub1,sub2=sub2) >> > ?for ( t in seq(1:2) ) ?print(lst[[t]]$y) >> > >> > So I can print out the sub1$y/sub2$y but it's not clear how to extract >> > them. >> >> Well, to extract them, just drop the call to print. You could use them >> directly in the loop or could store them in new variables. >> > >> j<- for ( t in seq(1:2) )? lst[[t]]$y >> j > NULL > > Why is j NULL? ? You are confusing how for loops work, please read the documentation for ?for > >> >> ## note seq(1:2) is redundant with simply 1:2 >> or (t in 1:2) print(nchar(lst[[t]]$y)) >> >> I am guess, though, that what you might be hoping to do is extract >> specific elements from a list and store the extract elements in a new >> list. >> >> lapply(1:2, function(i) lst[[i]]["y"]) >> ## or compare >> lapply(1:2, function(i) lst[[i]][["y"]]) >> >> > >> > My original was different though. >> > >> > How would ?say:- >> > >> > for ( t in seq(1:2) ) sub"t"$y >> > >> > Where sub"t" evaluates to sub1 or sub 2? >> >> if you actually want "sub1", or "sub2": >> >> ## note that I am wrapping in print() not so that it works >> ## but so that you can see it at the console >> for (t in 1:2) print(paste("sub", t, sep = '')) >> >> from which we can surmise that the following should work: >> >> for (t in 1:2) print(lst[[paste("sub", t, sep = '')]]) >> >> which trivially extends to: >> >> for (t in 1:2) print(lst[[paste("sub", t, sep = '')]]$y) >> >> or perhaps more appropriately >> >> for (t in 1:2) print(lst[[paste("sub", t, sep = '')]][["y"]]) >> >> If you just need to go one level down for *all* elements of your list >> >> lapply(lst, `[[`, "y") >> ## or if you are only retrieving a single value >> sapply(lst, `[[`, "y") >> >> Hope this helps, >> >> >> Josh >> >> > >> > Many thanks. >> > Ashim >> > >> > >> >> On Thu, Aug 4, 2011 at 10:59 AM, Richard Ma >> >> wrote: >> >> >> >>> Thank you so much GlenB! >> >>> >> >>> I got it done using your method. >> >>> >> >>> I'm just curious how did you get this idea? Cause for me, this looks >> >>> so >> >>> tricky.... >> >>> >> >>> Cheers, >> >>> Richard >> >>> >> >>> ----- >> >>> I'm a PhD student interested in Remote Sensing and R Programming. >> >>> -- >> >>> View this message in context: >> >>> >> >>> http://r.789695.n4.nabble.com/How-to-extract-sublist-from-a-list-tp3717451p3717713.html >> >>> Sent from the R help mailing list archive at Nabble.com. >> >>> >> >>> ______________________________________________ >> >>> R-help at r-project.org mailing list >> >>> https://stat.ethz.ch/mailman/listinfo/r-help >> >>> PLEASE do read the posting guide >> >>> http://www.R-project.org/posting-guide.html >> >>> and provide commented, minimal, self-contained, reproducible code. >> >>> >> >> >> >> >> > >> > ? ? ? ?[[alternative HTML version deleted]] >> > >> > ______________________________________________ >> > R-help at r-project.org mailing list >> > https://stat.ethz.ch/mailman/listinfo/r-help >> > PLEASE do read the posting guide >> > http://www.R-project.org/posting-guide.html >> > and provide commented, minimal, self-contained, reproducible code. >> > >> >> >> >> -- >> Joshua Wiley >> Ph.D. Student, Health Psychology >> Programmer Analyst II, ATS Statistical Consulting Group >> University of California, Los Angeles >> https://joshuawiley.com/ From striess at iiasa.ac.at Thu Aug 4 10:48:52 2011 From: striess at iiasa.ac.at (Erich Striessnig) Date: Thu, 4 Aug 2011 10:48:52 +0200 Subject: [R] How to get the test statistic corresponding to the p-value in mtable? Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From jwiley.psych at gmail.com Thu Aug 4 10:55:01 2011 From: jwiley.psych at gmail.com (Joshua Wiley) Date: Thu, 4 Aug 2011 01:55:01 -0700 Subject: [R] How to get the test statistic corresponding to the p-value in mtable? In-Reply-To: References: Message-ID: Hi Erich, Thanks for the example. You can access the coefficient table output by a call to summary: coef(summary(res1)) so if you want the test statistic (the t value here), just extract the third column of the matrix: coef(summary(res1))[, 3] or to keep in matrixy form coef(summary(res1))[, 3, drop = FALSE] Hope this helps, Josh On Thu, Aug 4, 2011 at 1:48 AM, Erich Striessnig wrote: > Dear R-Users, > > I want to use mtable from package "memisc" to produce Latex-style estimation > output. However, mtable() only gives me a p-value and not the corresponding > test-statistic. Does anyone know how to extract it, either from a glm/anova > object or mtable? Here is a short example: > > # Run this #################### > install.packages("memisc") > library(memisc) > > set.seed(1) > data1 <- rnorm(400) > dim(data1) <- c(100,4) > data1 <- as.data.frame(data1) > names(data1) <- c("y",paste("x",1:3,sep="")) > > res1 <- glm(y~x1+x2,data=data1) > res2 <- glm(y~x2+x3,data=data1) > > mtable("Model 1"=res1,"Model 2"=res2) > ############################# > > Cheers, > Erich > > ? ? ? ?[[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, ATS Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ From ashimkapoor at gmail.com Thu Aug 4 11:08:51 2011 From: ashimkapoor at gmail.com (Ashim Kapoor) Date: Thu, 4 Aug 2011 14:38:51 +0530 Subject: [R] How to extract sublist from a list? In-Reply-To: References: <1312421910777-3717451.post@n4.nabble.com> <1312425956068-3717556.post@n4.nabble.com> <1312435745257-3717713.post@n4.nabble.com> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From jwiley.psych at gmail.com Thu Aug 4 11:23:56 2011 From: jwiley.psych at gmail.com (Joshua Wiley) Date: Thu, 4 Aug 2011 02:23:56 -0700 Subject: [R] How to extract sublist from a list? In-Reply-To: References: <1312421910777-3717451.post@n4.nabble.com> <1312425956068-3717556.post@n4.nabble.com> <1312435745257-3717713.post@n4.nabble.com> Message-ID: On Thu, Aug 4, 2011 at 2:08 AM, Ashim Kapoor wrote: > > On Thu, Aug 4, 2011 at 2:12 PM, Joshua Wiley wrote: >> >> On Thu, Aug 4, 2011 at 1:40 AM, Ashim Kapoor >> wrote: >> > >> > >> > On Thu, Aug 4, 2011 at 1:02 PM, Joshua Wiley >> > wrote: >> >> >> >> On Thu, Aug 4, 2011 at 12:12 AM, Ashim Kapoor >> >> wrote: >> >> >> How would we do this problem looping over seq(1:2) ? >> >> >> >> Because this goes to an email list serv, it is good practice to quote >> >> the original problem. ?I have no idea what "this" is. >> >> >> >> >> >> >> >> >> >> > To extend the example in the corresponding nabble post : - >> >> > ?sub1<-list(x="a",y="ab") >> >> > ?sub2<-list(x="c",y="ad") >> >> > ?lst<-list(sub1=sub1,sub2=sub2) >> >> > ?for ( t in seq(1:2) ) ?print(lst[[t]]$y) >> >> > >> >> > So I can print out the sub1$y/sub2$y but it's not clear how to >> >> > extract >> >> > them. >> >> >> >> Well, to extract them, just drop the call to print. You could use them >> >> directly in the loop or could store them in new variables. >> >> >> > >> >> j<- for ( t in seq(1:2) )? lst[[t]]$y >> >> j >> > NULL >> > >> > Why is j NULL? ? >> >> You are confusing how for loops work, please read the documentation for >> ?for >> > The help says : - > ?? ?for?, ?while? and ?repeat? return ?NULL? invisibly.? ?for? sets > ???? ?var? to the last used element of ?seq?, or to ?NULL? if it was of > ???? length zero. > > but it does not tell me how to fix my problem which is to return the values. sure it does, look at the Examples! for returns null, so you need to do the assignment on a function that actually returns what you want, that would be: lst[[t]]$y (yep, [[]] and $ are really functions that return values although because they are operators you may not typically think of them like regular functions). Of course you are using a loop so you do not want to just keep overwriting the same variable, so you will need to instatiate a variable outside the loop (preferablly sized appropriately for the number of iterations in your loop) and then do something like: for (i in 1:2) j[[i]] <- lst[[i]]]$y loops get to be a bit of a pain in this regard (in my opinion), which is why I showed you several solutions that use lapply instead. If you have not already (hopefully you did), try them out, you'll like them...you can basically do what you tried simply assigning the output of lapply to a variable j without having to worry about instatiating it and assigning to a new position each iteration, etc. j2 <- lapply(1:2, function(i) lst[[i]]$y) if you set up j as a list, identical(j, j2) ought to be TRUE. Of course (as I showed using sapply() ), because you are returning a single value each time, it would also be reasonable for j to simply be a vector. Cheers > >> >> > >> >> >> >> ## note seq(1:2) is redundant with simply 1:2 >> >> or (t in 1:2) print(nchar(lst[[t]]$y)) >> >> >> >> I am guess, though, that what you might be hoping to do is extract >> >> specific elements from a list and store the extract elements in a new >> >> list. >> >> >> >> lapply(1:2, function(i) lst[[i]]["y"]) >> >> ## or compare >> >> lapply(1:2, function(i) lst[[i]][["y"]]) >> >> >> >> > >> >> > My original was different though. >> >> > >> >> > How would ?say:- >> >> > >> >> > for ( t in seq(1:2) ) sub"t"$y >> >> > >> >> > Where sub"t" evaluates to sub1 or sub 2? >> >> >> >> if you actually want "sub1", or "sub2": >> >> >> >> ## note that I am wrapping in print() not so that it works >> >> ## but so that you can see it at the console >> >> for (t in 1:2) print(paste("sub", t, sep = '')) >> >> >> >> from which we can surmise that the following should work: >> >> >> >> for (t in 1:2) print(lst[[paste("sub", t, sep = '')]]) >> >> >> >> which trivially extends to: >> >> >> >> for (t in 1:2) print(lst[[paste("sub", t, sep = '')]]$y) >> >> >> >> or perhaps more appropriately >> >> >> >> for (t in 1:2) print(lst[[paste("sub", t, sep = '')]][["y"]]) >> >> >> >> If you just need to go one level down for *all* elements of your list >> >> >> >> lapply(lst, `[[`, "y") >> >> ## or if you are only retrieving a single value >> >> sapply(lst, `[[`, "y") >> >> >> >> Hope this helps, >> >> >> >> >> >> Josh >> >> >> >> > >> >> > Many thanks. >> >> > Ashim >> >> > >> >> > >> >> >> On Thu, Aug 4, 2011 at 10:59 AM, Richard Ma >> >> >> wrote: >> >> >> >> >> >>> Thank you so much GlenB! >> >> >>> >> >> >>> I got it done using your method. >> >> >>> >> >> >>> I'm just curious how did you get this idea? Cause for me, this >> >> >>> looks >> >> >>> so >> >> >>> tricky.... >> >> >>> >> >> >>> Cheers, >> >> >>> Richard >> >> >>> >> >> >>> ----- >> >> >>> I'm a PhD student interested in Remote Sensing and R Programming. >> >> >>> -- >> >> >>> View this message in context: >> >> >>> >> >> >>> >> >> >>> http://r.789695.n4.nabble.com/How-to-extract-sublist-from-a-list-tp3717451p3717713.html >> >> >>> Sent from the R help mailing list archive at Nabble.com. >> >> >>> >> >> >>> ______________________________________________ >> >> >>> R-help at r-project.org mailing list >> >> >>> https://stat.ethz.ch/mailman/listinfo/r-help >> >> >>> PLEASE do read the posting guide >> >> >>> http://www.R-project.org/posting-guide.html >> >> >>> and provide commented, minimal, self-contained, reproducible code. >> >> >>> >> >> >> >> >> >> >> >> > >> >> > ? ? ? ?[[alternative HTML version deleted]] >> >> > >> >> > ______________________________________________ >> >> > R-help at r-project.org mailing list >> >> > https://stat.ethz.ch/mailman/listinfo/r-help >> >> > PLEASE do read the posting guide >> >> > http://www.R-project.org/posting-guide.html >> >> > and provide commented, minimal, self-contained, reproducible code. >> >> > >> >> >> >> >> >> >> >> -- >> >> Joshua Wiley >> >> Ph.D. Student, Health Psychology >> >> Programmer Analyst II, ATS Statistical Consulting Group >> >> University of California, Los Angeles >> >> https://joshuawiley.com/ > > Many Thanks, > Ashim > -- Joshua Wiley Ph.D. Student, Health Psychology Programmer Analyst II, ATS Statistical Consulting Group University of California, Los Angeles https://joshuawiley.com/ From s.wood at bath.ac.uk Thu Aug 4 11:30:26 2011 From: s.wood at bath.ac.uk (Simon Wood) Date: Thu, 04 Aug 2011 10:30:26 +0100 Subject: [R] Incorrect degrees of freedom for splines using GAMM4? In-Reply-To: References: Message-ID: <4E3A66B2.8060207@bath.ac.uk> Thanks for reporting this. It was a bug, now fixed in gamm4 0.1-3. Simon On 19/07/11 22:16, Melinda Power wrote: > Hello, > > I'm running mixed models in GAMM4 with 2 (non-nested) random intercepts and > I want to include a spline term for one of my exposure variables. However, > when I include a spline term, I always get reported degrees of freedom of > less than 1, even when I know that my spline is using more than 1 degree of > freedom. For example, here is the code for my model: > >> global.gamm4<-gamm4(zcog~s(adjpatx, fx=TRUE, k=5)+int234+cogagec+cogagesq > + > + + oldfran +newus +alc2 +alc3 +alc4 +alcmiss +smk2 +smk3 > + +mdinc10c +mdinc10sq+ pwhtc +pwhtsq +edu2+ edu3 +husbgs > +husbcol+ husbmiss > + +currpmh +pastpmh +neverpmh, random= ~(1|id) > +(1|cogtest), data=global) > > Using> summary(global.gamm4$mer), I get the following output for my spline > term, indicating that I use the expected 4 degrees of freedom. > > Xs(adjpatx)Fx1 0.1018943 0.1073225 0.949 > Xs(adjpatx)Fx2 -0.0708114 0.1123845 -0.630 > Xs(adjpatx)Fx3 0.7459511 0.6836413 1.091 > Xs(adjpatx)Fx4 -0.2062321 0.0923569 -2.233 > > However, when I use> summary(global.gamm4$gam). I get an estimate of > degrees of freedom that is not 4: > > Approximate significance of smooth terms: > edf Ref.df F p-value > s(adjpatx) 0.7588 0.7588 1.346 0.234 > > This degree of freedom = 0.76 also shows up on my plot. > > Ultimately, I would like to use a cubic regression penalized spline, > allowing R to choose the degrees of freedom for me using GCV. However, when > I use the correct code for this or variants of it using mgcv, I also get > degrees of freedom less than 1. For example, in the following code provides > a degree of freedom of less than 1 as well: > > >> global.gamm4<-gamm4(zcog~s(adjpatx, fx=FALSE)+int234+cogagec+cogagesq + > + + oldfran +newus +alc2 +alc3 +alc4 +alcmiss +smk2 +smk3 > + +mdinc10c +mdinc10sq+ pwhtc +pwhtsq +edu2+ edu3 +husbgs > +husbcol+ husbmiss > + +currpmh +pastpmh +neverpmh, random= ~(1|id) > +(1|cogtest), data=global) > > Output indicating that this spline should probably look linear: > >> summary(global.gamm4$mer) > Random effects: > Groups Name Variance Std.Dev. > id (Intercept) 0.1823454 0.427019 > cogtest (Intercept) 0.0025498 0.050496 > Xr.1 s(adjtibx) 0.0000000 0.000000 > Residual 0.7782969 0.882211 > > Xs(adjtibx)Fx1 -0.0387360 0.0215596 -1.797 > > > Output getting a df for this spline of 0.20. > >> summary(global.gamm4$gam) > > Approximate significance of smooth terms: > edf Ref.df F p-value > s(adjtibx) 0.2009 0.2009 16.07 NA > > The plot looks linear, but reports a df =0.20. > > > So...to summarize my questions: > > 1. Are the splines produced by s(exp, fx=FALSE) or s(exp, fx=TRUE, k=k) > correct even though the reported degrees of freedom appears to be wrong? > > 2. Can I believe my plot? > > 3. How can I get the true df used when I use s(exp, fx=FALSE)? > > > > > > Thanks for any and all help you can provide! > > Melinda > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Simon Wood, Mathematical Science, University of Bath BA2 7AY UK +44 (0)1225 386603 http://people.bath.ac.uk/sw283 From ashimkapoor at gmail.com Thu Aug 4 11:38:43 2011 From: ashimkapoor at gmail.com (Ashim Kapoor) Date: Thu, 4 Aug 2011 15:08:43 +0530 Subject: [R] How to extract sublist from a list? In-Reply-To: References: <1312421910777-3717451.post@n4.nabble.com> <1312425956068-3717556.post@n4.nabble.com> <1312435745257-3717713.post@n4.nabble.com> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From fraenzi.korner at oikostat.ch Thu Aug 4 12:17:35 2011 From: fraenzi.korner at oikostat.ch (fraenzi.korner at oikostat.ch) Date: 4 Aug 2011 12:17:35 +0200 Subject: [R] =?utf-8?q?R-help_Digest=2C_Vol_102=2C_Issue_4?= Message-ID: <20110804101735.29947.qmail@srv5.yoursite.ch> Wir sind bis am 20. August in den Ferien und werden keine e-mails beantworten. Bei dringenden F?llen melden Sie sich bei Stefanie von Felten steffi.vonfelten at oikostat.ch We are on vacation until 20. August. In urgent cases, please contact Stefanie von Felten steffi.vonfelten at oikostat.ch From jim at bitwrit.com.au Thu Aug 4 13:20:55 2011 From: jim at bitwrit.com.au (Jim Lemon) Date: Thu, 04 Aug 2011 21:20:55 +1000 Subject: [R] 3D Bar Graphs in ggplot2? In-Reply-To: <1312376866384-3715382.post@n4.nabble.com> References: <1312310369354-3713305.post@n4.nabble.com> <4E385D31.7090504@ohsu.edu> <1312376866384-3715382.post@n4.nabble.com> Message-ID: <4E3A8097.3080602@bitwrit.com.au> On 08/03/2011 11:07 PM, wwreith wrote: > So I take it 3D pie charts are out? > > P.S. It is not about hiding anything. It is about consulting and being told > by your client to make 3D pie charts and change this font or that color to > make the graphs more apealing. Given that I am the one trying to open the > door to using R where I work it would be much easier if I could simply use a > 2D graph. > > -- Hi wwreith, Take a look at pie3D in the plotrix package. It's not as fancy as the latest 3D pie charts, but it may get you over the line. Jim From E.Vettorazzi at uke.uni-hamburg.de Thu Aug 4 13:44:41 2011 From: E.Vettorazzi at uke.uni-hamburg.de (Eik Vettorazzi) Date: Thu, 04 Aug 2011 13:44:41 +0200 Subject: [R] Tinn-R problem: unable to send code to R In-Reply-To: References: Message-ID: <4E3A8629.80501@uke.uni-hamburg.de> Hi Richard, Have you configured Tinn-R to work with R? Check your Rprofile.site #quick configuration check in R grep(".trPaths",readLines(file.path(R.home(component="etc"),"Rprofile.site")))>0 If that fails, Tinn-R helps you to do all the necessary config steps, just use the menu R -> Configure -> Permanent More help is near at http://sourceforge.net/projects/tinn-r/forums/forum/481900 which is the obvious place to look at, since it is a problem with your editor and not so much related to R itself. cheers Am 04.08.2011 03:38, schrieb Richard Valliant: > The problem mentioned in the 06 Dec 2010 email below still occurs with > Tinn-R (v.2.3.7.1) when highlighting a string of code, copying to the > clipboard, and trying to send to Rgui via Shift+Ctrl+Q. > > I can copy 1 line of code to the clipboard and send it to Rgui with > Shift+CTRL+Q. This fails with 2 or more lines of code, generating the > error: > Error in source(.trPaths[5], echo = TRUE, max.deparse.length = 150) : > object '.trPaths' not found > > > Is there a fix for this? > > R version 2.13.1 (2011-07-08) > Platform: x86_64-pc-mingw32/x64 (64-bit) > Windows 7 (v.6.1, Buid 7601: Service Pack 1) > > thnx > R. Valliant > U. of Michigan, US > > > From: > Date: Mon, 06 Dec 2010 08:36:33 -0500 > > > I am also finding the link between TINN - R (2.3.7.0) and R (2.12.0 > 2010 - 10 - 15) to be problematic. > source(.trPaths[5], echo=TRUE, max.deparse.length=150) Error in > source(.trPaths[5], echo = TRUE, max.deparse.length = 150) : object > '.trPaths' not found > Steve Friedman Ph. D. > Ecologist / Spatial Statistical Analyst Everglades and Dry Tortugas > National Park 950 N Krome Ave (3rd Floor) > Homestead, Florida 33034 > Steve_Friedman_at_nps.gov > Office (305) 224 - 4282 > Fax (305) 224 - 4147 > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Eik Vettorazzi Department of Medical Biometry and Epidemiology University Medical Center Hamburg-Eppendorf Martinistr. 52 20246 Hamburg T ++49/40/7410-58243 F ++49/40/7410-57790 From wludwick at mac.com Thu Aug 4 13:45:09 2011 From: wludwick at mac.com (Walter Ludwick) Date: Thu, 04 Aug 2011 12:45:09 +0100 Subject: [R] R.app installer probs on Snow Leopard In-Reply-To: <7031F36A-589C-40B1-9813-5BA2EC2CBC50@gmail.com> References: <0A01FBF4-5A06-4EDC-AD24-5DE687EB149C@mac.com> <7031F36A-589C-40B1-9813-5BA2EC2CBC50@gmail.com> Message-ID: <8D5BF448-87F1-4C2E-B29D-A3F798AF2A76@mac.com> Thanks, Peter and David, for the pointers -and my apologies to the list for posting to the wrong place; i will in future use the SIG-Mac list for such things. To the problem: Yes, i did indeed make that fundamental error of taking the R.app GUI package to be all i needed to run the program. I gathered this, and downloaded all versions of that package from http://r.research.att.com/ (which is obviously no "official" download page, tho i took it as such, linked as it was from www.r-project.org at a high level). So now: i did a plain install of the R-2.13.1 package using link that Peter provided below, and that seems to be working just fine. Thanks again, Peter! /w On Aug 3, 2011, at 9:39 PM, peter dalgaard wrote: > > On Aug 3, 2011, at 18:35 , Walter Ludwick wrote: > >> Have tried to install R.app several times (6, in fact: versions 2.12, 13 & 14, both 32 and 64 bit versions), using packages freshly downloaded from the official project page, and failed every time, given exception reports such as the following (appended below, the 2 reports arising out of my 1st & 6th attempts). >> >> Machine & software version specifics are all contained therein. >> >> What am i missing, i wonder? Any clues would be most appreciated -thanx! /w > > What did you do to install? For a plain install, just get > > http://cran.r-project.org/bin/macosx/R-2.13.1.pkg > > open it and follow the instructions. > > > If you tried to install the > > http://cran.r-project.org/bin/macosx/Mac-GUI-1.41.tar.gz > > then I suspect that you missed the point, that R.app is something you install _on_ _top_ _of_ an installation of R itself. > ... From ea304 at cam.ac.uk Thu Aug 4 09:34:31 2011 From: ea304 at cam.ac.uk (ea819) Date: Thu, 4 Aug 2011 00:34:31 -0700 (PDT) Subject: [R] Problems with Z in rhierMnlRwMixture using bayesm Message-ID: <1312443271516-3717887.post@n4.nabble.com> Dear All, I am using rhierMnlRwMixture in the bayesm package for the analysis of data from a choice experiment. I am trying to follow the margarine example set out in the bayesm manual (p.28). However, after several attempts I keep getting an error message with regards to my Z matrix as below. > Error in Z %*% t(matrix(olddelta, ncol = nz)) : >requires numeric/complex matrix/vector arguments I think the problem is arising from the following lines in the code that I am running. >Z=NULL >nlgt=length(lgtdata) >for(i in 1:nlgt) { >Z=rbind(Z,Demog[Demog[,1]==lgtdata[[i]]$id,3:9]) I am creating a Z matrix from the variables included in the ?Demog? spreadsheet columns 3 to 9 which includes the demographic variables of the respondents. Then the rbind matches these columns with the ids in the lgtdata (which contains the choice attributes). I am not sure if I am interpreting this code correctly or if I need to amend it. Can anyone help me with the interpretation of the below and whether I would need to specify it differently? Also below is the full code that I am running. select= c(1,2,3) id=levels(as.factor(ChoiceAtt[,1])) lgtdata=NULL nlgt=length(id) p=length(select) ind=1 for (i in 1:nlgt) { nobs=sum(ChoiceAtt[,1]==id[i]) data=ChoiceAtt y=data[,2] names(y)=NULL X=createX(p=p,na=1,Xa=data[,3:5],nd=NULL,Xd=NULL,INT=TRUE,base=1) lgtdata[[ind]]=list(y=y,X=X,id=id[i]); ind=ind+1 } nlgt=length(lgtdata) Z=NULL nlgt=length(lgtdata) for(i in 1:nlgt) { Z=rbind(Z,Demog[Demog[,1]==lgtdata[[i]]$id,3:9]) } Z=log(Z) Z[,1]=Z[,1]-mean(Z[,1]) Z[,2]=Z[,2]-mean(Z[,2]) keep=5 R=5000 mcmc1=list(keep=keep,R=R) out=rhierMnlRwMixture(Data=list(p=p,lgtdata=lgtdata,Z=Z),Prior=list(ncomp=1),Mcmc=mcmc1) Many thanks for all your help and time. Regards, Elcin -- View this message in context: http://r.789695.n4.nabble.com/Problems-with-Z-in-rhierMnlRwMixture-using-bayesm-tp3717887p3717887.html Sent from the R help mailing list archive at Nabble.com. From bt_jannis at yahoo.de Thu Aug 4 12:33:05 2011 From: bt_jannis at yahoo.de (Jannis) Date: Thu, 04 Aug 2011 12:33:05 +0200 Subject: [R] General indexing in multidimensional arrays In-Reply-To: References: <4E3673FD.5030800@yahoo.de> <4E39B6DD.1070606@yahoo.de> Message-ID: <4E3A7561.2030102@yahoo.de> Thanks, Michael. I was, however, after a function I coul use for both extracting and replacing subarrays. In case anybody else stumbles over this problem, here is my solution. Its programming is most probably horribly clumsy: ind.datacube = function( ##title<< create logical index matrices for multidimensional datacubes datacube ##<< array: datacube from which to extract the subparts , logical.ind ##<< logical array: TRUE/FALSE index matrix for a subset of the dimensions ## of datacube. The size of logical.ind`s dimesnions has to match the ## sizes of the corresponding dimesnions in datacube. , dims='auto' ##<< integer vector or 'auto' : indices of the dimensions in datacube corresponding ## to the dimensions of logical.ind. If set to 'auto' this matching is tried to ## be accomplished by comparing the sizes of the dimensions of the two objects. ) { if (sum(logical.ind) == 0) { stop('No TRUE value in index matrix!') } else { if (dims == 'auto') { if (is.null(dim(logical.ind)[1])) { size.ind = length(logical.ind) logical.ind = matrix(logical.ind,ncol=1) } else { size.ind = dim(logical.ind) } dims = match(size.ind, dim(datacube)) if (sum(duplicated(size.ind)) > 0 || sum(duplicated(dims)) > 0 ) stop('dimensions do not match unambigously. Supply dims manually!') } dims.nonapply <- setdiff(1:length(dim(datacube)),dims) ind.matrix <- which(logical.ind, arr.ind = TRUE) args.expand.grid <- list() counter = 1 for (i in 1: length(dim(datacube))) { if (is.element(i,dims.nonapply)) { args.expand.grid[[i]] = 1:dim(datacube)[dims.nonapply[i]] } else { args.expand.grid[[i]] = ind.matrix[,counter] counter = counter + 1 } } ind.all <- as.matrix(do.call(expand.grid, args.expand.grid)) ind.matrix <- ind.all[,order(c(dims.nonapply,dims))] } ##value<< integer index matrix which can be used to index datacube ind.matrix } On 08/04/2011 12:12 AM, R. Michael Weylandt wrote: >> This might be a little late: but how about this (slightly clumsy) function: >> >> putValues<- function(Insert, Destination, Location) { >> Location = as.matrix(Location) >> Location = array(Location,dim(Destination)) >> Destination[Location]<- Insert >> return(Destination) >> } >> >> It currently assumes that the location array lines up in dimension order, >> but other than that seems to work pretty well. If you want, it shouldn't be >> hard to change it to take in a set of dimensions to arrange Location along. >> If you like any of the other suggested behaviors, you could put in a >> is.null(Insert) option that returns the desired subset of values. I haven't >> tested it completely, but for a few sample inputs, it seems be do as >> desired. >> >> Michael >> >> >> On Wed, Aug 3, 2011 at 5:00 PM, Jannis wrote: >> >>> Thanks for all the replies!Unfortunately the solutions only work for >>> extracting subsets of the data (which was exactly what I was asking for) and >>> not to replace subsets with other values. I used them, however, to program a >>> rather akward function to do that. Seems I found one of the few aspects >>> where Matlab actually is slightly easier to use than R. >>> >>> >>> Thanks for your help! >>> Jannis >>> >>> On 08/01/2011 05:50 PM, Gene Leynes wrote: >>> >>>> What do you think about this? >>>> >>>> apply(data, 3, '[', indices) >>>> >>>> >>>> On Mon, Aug 1, 2011 at 4:38 AM, Jannis wrote: >>>> >>>> Dear R community, >>>>> >>>>> I have a general question regarding indexing in multidiemensional >>>>> arrays. >>>>> >>>>> Imagine I have a three dimensional array and I only want to extract on >>>>> vector along a single dimension from it: >>>>> >>>>> >>>>> data<- array(rnorm(64),dim=c(4,4,4)) >>>>> >>>>> result<- data[1,1,] >>>>> >>>>> If I want to extract more than one of these vectors, it would now really >>>>> help me to supply a logical matrix of the size of the first two >>>>> dimensions: >>>>> >>>>> >>>>> indices<- matrix(FALSE,ncol=4,nrow=4) >>>>> indices[1,3]<- TRUE >>>>> indices[4,1]<- TRUE >>>>> >>>>> result<- data[indices,] >>>>> >>>>> This, however would give me an error. I am used to this kind of indexing >>>>> from Matlab and was wonderingt whether there exists an easy way to do >>>>> this >>>>> in R without supplying complicated index matrices of all three >>>>> dimensions or >>>>> logical vectors of the size of the whole matrix? >>>>> >>>>> The only way I could imagine would be to: >>>>> >>>>> result<- data[rep(as.vector(indices),****times=4)] >>>>> >>>>> but this seems rather complicated and also depends on the order of the >>>>> dimensions I want to extract. >>>>> >>>>> >>>>> I do not want R to copy Matlabs behaviour, I am just wondering whether I >>>>> missed one concept of indexing in R? >>>>> >>>>> >>>>> >>>>> Thanks a lot >>>>> Jannis >>>>> >>>>> ______________________________****________________ >>>>> R-help at r-project.org mailing list >>>>> https://stat.ethz.ch/mailman/****listinfo/r-help >>>>> >>>>> PLEASE do read the posting guide http://www.R-project.org/** >>>>> posting-guide.html >>>>> and provide commented, minimal, self-contained, reproducible code. >>>>> >>>>> >>> ______________________________**________________ >>> R-help at r-project.org mailing list >>> https://stat.ethz.ch/mailman/**listinfo/r-help >>> PLEASE do read the posting guide http://www.R-project.org/** >>> posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >>> >> From erik.sengewald at uni-jena.de Thu Aug 4 13:44:13 2011 From: erik.sengewald at uni-jena.de (eriksengewald) Date: Thu, 4 Aug 2011 04:44:13 -0700 (PDT) Subject: [R] Sweave: pdf-graphics got three times lager with R 2.13.1 Message-ID: <1312458253420-3718378.post@n4.nabble.com> Dear R-Users, I am using R for years now but recently I encounter a problem with the pdf-size of graphics generated with sweave. However, I am new in this forum. Hence, please don't hesitate if I am wrong here. I use a script which runs perfectly in R 2.11.1 and the pdf-size of the graphs is about 3 KB. Running the same script with R 2.13.1 the file size increases to 12 KB. I am using about 300 pdf-pictures in my tex-file, generated by sweave. So this change in file size is dramatic for the finally pdf-file. Has somebody an explanation for this phenomenon? Thanks a lot for your help. Erik Here some more facts: - I am running a windows system (same problem with linux system) - This is an example of my code chunk: <<fig=TRUE,echo=false,width=4,height=0.6>>= PlotFunctionRef2(daten,daten_ref1,daten_ref2,g=1) //plotting function @ -- View this message in context: http://r.789695.n4.nabble.com/Sweave-pdf-graphics-got-three-times-lager-with-R-2-13-1-tp3718378p3718378.html Sent from the R help mailing list archive at Nabble.com. From j.vanwoerden at aim.uzh.ch Thu Aug 4 13:32:50 2011 From: j.vanwoerden at aim.uzh.ch (j.vanwoerden at aim.uzh.ch) Date: Thu, 4 Aug 2011 13:32:50 +0200 Subject: [R] phyres function in caper package Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From hiheister at gmail.com Thu Aug 4 11:25:35 2011 From: hiheister at gmail.com (Roby) Date: Thu, 4 Aug 2011 02:25:35 -0700 (PDT) Subject: [R] extractAIC Message-ID: <1312449935045-3718096.post@n4.nabble.com> Use extractAIC in frailty cox model (estimated with coxph function, gaussian random effect) i obtaided > extractAIC(fit.cox.f) [1] 11.84563 8649.11736 but I don't know why I can't use the classic formulation of the AIC where the degree of freedom are the number of the parameter (in my case 3). -- View this message in context: http://r.789695.n4.nabble.com/extractAIC-tp3718096p3718096.html Sent from the R help mailing list archive at Nabble.com. From Hilary.Browne at bristol.ac.uk Thu Aug 4 13:46:13 2011 From: Hilary.Browne at bristol.ac.uk (Hilary Browne) Date: Thu, 04 Aug 2011 12:46:13 +0100 Subject: [R] New multilevel modelling course practicals for the lmer and glmer functions - now online Message-ID: <89F08E7939335E8EFB0C5F83@educ-pc223b.edn.bris.ac.uk> The Centre for Multilevel Modelling is very pleased to announce the addition of R practicals to our free on-line multilevel modelling course. These give detailed instructions of how to carry out a range of analyses in R, starting from multiple regression and progressing through to multilevel modelling of continuous and binary data using the lmer and glmer functions. MLwiN and Stata versions of these practicals are already available. You will need to log on or register onto the course to view these practicals. Read More... http://www.cmm.bris.ac.uk/lemma/course/view.php?id=13 With best regards, Hilary Browne Technical and Business Manager (Part-time: working days Tuesdays, Thursdays and Fridays) Centre for Multilevel Modelling University of Bristol 2 Priory Road Bristol BS8 1TX Tel: +44 (0)117 331 0846 Web: From vhaguiar at gmail.com Thu Aug 4 08:46:12 2011 From: vhaguiar at gmail.com (pegasus_sudaka) Date: Wed, 3 Aug 2011 23:46:12 -0700 (PDT) Subject: [R] Semiparametric double-index Klein Vella 2009 estimator question. Message-ID: <1312440372354-3717812.post@n4.nabble.com> Dear List's Members, I'm trying to implement "1. Roger Klein and Francis Vella, ?A semiparametric model for binary response and continuous outcomes under index heteroscedasticity,? Journal of Applied Econometrics 24, no. 5 (2009): 735-762. " estimator. I have a technical doubt about the choice of the optimizer for the likelihood function maximization. That of pg. 743, the function is Q.star<-sum(tx*(y2*log(P.star)+(1-y2)*log(1-P.star))) with y2 ={0,1} and P.star a probability estimated using a semiparametric double-index estimation. I've tried DEoptim that is a global optimizer, but I would like to know what are the best options for this problems. Both to achieve the global maximum and also speed. Nloptr? Alabama? optimize? Please help me. Victor Ecuador South America -- View this message in context: http://r.789695.n4.nabble.com/Semiparametric-double-index-Klein-Vella-2009-estimator-question-tp3717812p3717812.html Sent from the R help mailing list archive at Nabble.com. From vhaguiar at gmail.com Thu Aug 4 08:54:28 2011 From: vhaguiar at gmail.com (pegasus_sudaka) Date: Wed, 3 Aug 2011 23:54:28 -0700 (PDT) Subject: [R] How to extract sublist from a list? In-Reply-To: <1312421910777-3717451.post@n4.nabble.com> References: <1312421910777-3717451.post@n4.nabble.com> Message-ID: <1312440868039-3717823.post@n4.nabble.com> Hi, have you considered using array, for example example<-array(1,dim=c(2,2,3)) dimnames(example)<-list(c("x1","x2"),c("y1","y2")) then if you want to extract the component "x1","y1" from every member of the list of matrices you have to example["x1","y1",] leaving blank in the last position. hope it helps. Victor -- View this message in context: http://r.789695.n4.nabble.com/How-to-extract-sublist-from-a-list-tp3717451p3717823.html Sent from the R help mailing list archive at Nabble.com. From MorganPH at cardiff.ac.uk Thu Aug 4 12:54:14 2011 From: MorganPH at cardiff.ac.uk (Peter Morgan) Date: Thu, 4 Aug 2011 11:54:14 +0100 Subject: [R] Coefficient names when using lm() with contrasts In-Reply-To: References: Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From xuanlong.ma at uts.edu.au Thu Aug 4 12:57:58 2011 From: xuanlong.ma at uts.edu.au (Richard Ma) Date: Thu, 4 Aug 2011 03:57:58 -0700 (PDT) Subject: [R] How to extract sublist from a list? In-Reply-To: References: <1312421910777-3717451.post@n4.nabble.com> <1312425956068-3717556.post@n4.nabble.com> <1312435745257-3717713.post@n4.nabble.com> Message-ID: <1312455478593-3718282.post@n4.nabble.com> Hi Joshua, Really helpful that you posted so many useful solutions for my problem. I can understand all your codes except these: *[code] lapply(lst, `[[`, "y") ## or if you are only retrieving a single value sapply(lst, `[[`, "y") [/code]* Can you explain these a little bit? What does '[[' means? Thanks a lot, Richard Joshua Wiley-2 wrote: > > On Thu, Aug 4, 2011 at 12:12 AM, Ashim Kapoor > <ashimkapoor at gmail.com> wrote: >>> How would we do this problem looping over seq(1:2) ? > > Because this goes to an email list serv, it is good practice to quote > the original problem. I have no idea what "this" is. > >>> >>> >> To extend the example in the corresponding nabble post : - >> ?sub1<-list(x="a",y="ab") >> ?sub2<-list(x="c",y="ad") >> ?lst<-list(sub1=sub1,sub2=sub2) >> ?for ( t in seq(1:2) ) ?print(lst[[t]]$y) >> >> So I can print out the sub1$y/sub2$y but it's not clear how to extract >> them. > > Well, to extract them, just drop the call to print. You could use them > directly in the loop or could store them in new variables. > > ## note seq(1:2) is redundant with simply 1:2 > or (t in 1:2) print(nchar(lst[[t]]$y)) > > I am guess, though, that what you might be hoping to do is extract > specific elements from a list and store the extract elements in a new > list. > > lapply(1:2, function(i) lst[[i]]["y"]) > ## or compare > lapply(1:2, function(i) lst[[i]][["y"]]) > >> >> My original was different though. >> >> How would ?say:- >> >> for ( t in seq(1:2) ) sub"t"$y >> >> Where sub"t" evaluates to sub1 or sub 2? > > if you actually want "sub1", or "sub2": > > ## note that I am wrapping in print() not so that it works > ## but so that you can see it at the console > for (t in 1:2) print(paste("sub", t, sep = '')) > > from which we can surmise that the following should work: > > for (t in 1:2) print(lst[[paste("sub", t, sep = '')]]) > > which trivially extends to: > > for (t in 1:2) print(lst[[paste("sub", t, sep = '')]]$y) > > or perhaps more appropriately > > for (t in 1:2) print(lst[[paste("sub", t, sep = '')]][["y"]]) > > If you just need to go one level down for *all* elements of your list > > lapply(lst, `[[`, "y") > ## or if you are only retrieving a single value > sapply(lst, `[[`, "y") > > Hope this helps, > > Josh > >> >> Many thanks. >> Ashim >> >> >>> On Thu, Aug 4, 2011 at 10:59 AM, Richard Ma >>> <xuanlong.ma at uts.edu.au>wrote: >>> >>>> Thank you so much GlenB! >>>> >>>> I got it done using your method. >>>> >>>> I'm just curious how did you get this idea? Cause for me, this looks so >>>> tricky.... >>>> >>>> Cheers, >>>> Richard >>>> >>>> ----- >>>> I'm a PhD student interested in Remote Sensing and R Programming. >>>> -- >>>> View this message in context: >>>> http://r.789695.n4.nabble.com/How-to-extract-sublist-from-a-list-tp3717451p3717713.html >>>> Sent from the R help mailing list archive at Nabble.com. >>>> >>>> ______________________________________________ >>>> R-help at r-project.org mailing list >>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>> PLEASE do read the posting guide >>>> http://www.R-project.org/posting-guide.html >>>> and provide commented, minimal, self-contained, reproducible code. >>>> >>> >>> >> >> ? ? ? ?[[alternative HTML version deleted]] >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > > > -- > Joshua Wiley > Ph.D. Student, Health Psychology > Programmer Analyst II, ATS Statistical Consulting Group > University of California, Los Angeles > https://joshuawiley.com/ > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > ----- I'm a PhD student interested in Remote Sensing and R Programming. -- View this message in context: http://r.789695.n4.nabble.com/How-to-extract-sublist-from-a-list-tp3717451p3718282.html Sent from the R help mailing list archive at Nabble.com. From mayouf.k at gmail.com Thu Aug 4 08:37:48 2011 From: mayouf.k at gmail.com (mayouf.k) Date: Wed, 3 Aug 2011 23:37:48 -0700 (PDT) Subject: [R] Cannot clean infinite values In-Reply-To: <1312439868748-885312.post@n4.nabble.com> References: <1312439868748-885312.post@n4.nabble.com> Message-ID: <1312439868746-3717799.post@n4.nabble.com> hi everyone, i had the same problem, i simply wrote: *ifelse(is.infinite(AALB),NA,AALB)* this code replace infinite values by NA's, then you can use "na.omit", "replace", or even a zero, which is my case. goog luck Nigel -- View this message in context: http://r.789695.n4.nabble.com/Cannot-clean-infinite-values-tp885312p3717799.html Sent from the R help mailing list archive at Nabble.com. From testrider at gmail.com Thu Aug 4 11:27:55 2011 From: testrider at gmail.com (testrider) Date: Thu, 4 Aug 2011 02:27:55 -0700 (PDT) Subject: [R] R loop problem Message-ID: <1312450075348-3718103.post@n4.nabble.com> I have run into a speed issue, and given the size of the problem it feels like there should be an easy solution. Here is the problem statement with some arbitrary numbers added. #p,q: vector with length(q)==length(p)==10000 and length(levels(p))==3000 #y,z: vectors with length(levels(y))=length(y)==length(z)==5000 for (i in levels(p)){ q[i==p]<-z[i==y]} At first i used two for loops which was horrible, now i got rid of one but i don't know how to lose the second one. PS. I expect the solution to be available through google etc and I have searched for a solution but i did not find any usefull websites, probably because i cannot pinpoint the best search words. -- View this message in context: http://r.789695.n4.nabble.com/R-loop-problem-tp3718103p3718103.html Sent from the R help mailing list archive at Nabble.com. From maechler at stat.math.ethz.ch Thu Aug 4 14:20:00 2011 From: maechler at stat.math.ethz.ch (Martin Maechler) Date: Thu, 4 Aug 2011 14:20:00 +0200 Subject: [R] 3D Bar Graphs in ggplot2? In-Reply-To: <4E38635A.1000009@gmail.com> References: <1312310369354-3713305.post@n4.nabble.com> <4E38635A.1000009@gmail.com> Message-ID: <20026.36464.561281.782365@stat.math.ethz.ch> >>>>> Duncan Murdoch >>>>> on Tue, 2 Aug 2011 16:51:38 -0400 writes: > On 11-08-02 2:39 PM, wwreith wrote: >> Does anyone know how to create a 3D Bargraph using >> ggplot2/qplot. I don't mean 3D as in x,y,z coordinates. Just a >> 2D bar graph with a 3D shaped bard. See attached excel file >> for an example. >> >> Before anyone asks I know that 3D looking bars don't add >> anything except "prettiness". > If you want graphs like that, you should be using Excel, not R. Yes!! 10th commandment of R: You shall not misuse R! Martin Maechler > Duncan Murdoch From E.Vettorazzi at uke.uni-hamburg.de Thu Aug 4 14:21:39 2011 From: E.Vettorazzi at uke.uni-hamburg.de (Eik Vettorazzi) Date: Thu, 04 Aug 2011 14:21:39 +0200 Subject: [R] persp() In-Reply-To: References: Message-ID: <4E3A8ED3.1090209@uke.uni-hamburg.de> Hi Rosario, you might have a look at the "maps" and "maptools" (for reading shape-files) packages. #e.g. library(maps) map("world",c("sweden","germany")) Cheers Am 04.08.2011 02:58, schrieb Rosario Garcia Gil: > Hello > > I am trying to draw a basic black and white map of two European countries. > > After searching some key words in google and reading many pages I arrived to the conclusion that persp() could be used to draw that map. > > I have prepared three small example files, which are supposed to be the files required for running that function. > > xvector is a vector with the longitudes > yvector is a vector with the latitudes > zmatrix is supposed to the height, but since I only need a flat map I just gave the value 1 to each of the entries of the matrix (I am not sure this is correct though). > > The first question for me when using persp() is that x and y values should be in increasing values (following the instructions), but I understand that the coordinates x and y are actually pairs of values (longitude/latitude pairs of values) and if I order them in ascending order both then the pairing is gone. I guess I am totally lost! > > Still even if I try to run persp() by ordering in ascending value x and y values (even if it does not make sense for me) I still get this message: > > <- persp(xvector,yvector,zmatrix,theta=-40,phi=30) > Error in persp.default(xvector, yvector, zmatrix, theta = -40, phi = 30) : > increasing 'x' and 'y' values expected > > Any help is wellcome. Is there any other better function to draw a flat map (2D), also example of the imput files is wellcome. Thanks in advance. > Rosario > > > > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Eik Vettorazzi Department of Medical Biometry and Epidemiology University Medical Center Hamburg-Eppendorf Martinistr. 52 20246 Hamburg T ++49/40/7410-58243 F ++49/40/7410-57790 From jholtman at gmail.com Thu Aug 4 14:27:06 2011 From: jholtman at gmail.com (jim holtman) Date: Thu, 4 Aug 2011 08:27:06 -0400 Subject: [R] R loop problem In-Reply-To: <1312450075348-3718103.post@n4.nabble.com> References: <1312450075348-3718103.post@n4.nabble.com> Message-ID: A subset of actual data and what you would expect as a result would be very helpful. All you say is that p.q are vectors, but it would appear that they are character vectors, but the content is unknown. Also will the expression "q[i==p]<-z[i==y]" have the same length on each side; the vectors appear to be of different lengths -- what happens if recycling kicks in? Or it is always a match of length 1" So a little more definition, and data and example, would help. On Thu, Aug 4, 2011 at 5:27 AM, testrider wrote: > I have run into a speed issue, and given the size of the problem it feels > like there should be an easy solution. Here is the problem statement with > some arbitrary numbers added. > > #p,q: vector with length(q)==length(p)==10000 and length(levels(p))==3000 > #y,z: vectors with length(levels(y))=length(y)==length(z)==5000 > > for (i in levels(p)){ > q[i==p]<-z[i==y]} > > At first i used two for loops which was horrible, now i got rid of one but i > don't know how to lose the second one. > > > PS. I expect the solution to be available through google etc and I have > searched for a solution but i did not find any usefull websites, probably > because i cannot pinpoint the best search words. > > -- > View this message in context: http://r.789695.n4.nabble.com/R-loop-problem-tp3718103p3718103.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? From jvadams at usgs.gov Thu Aug 4 14:31:37 2011 From: jvadams at usgs.gov (Jean V Adams) Date: Thu, 4 Aug 2011 07:31:37 -0500 Subject: [R] persp() In-Reply-To: References: Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From murdoch.duncan at gmail.com Thu Aug 4 14:41:29 2011 From: murdoch.duncan at gmail.com (Duncan Murdoch) Date: Thu, 04 Aug 2011 08:41:29 -0400 Subject: [R] Rdconv LaTeX files In-Reply-To: References: Message-ID: <4E3A9379.6020709@gmail.com> On 11-08-04 1:42 AM, Michael O'Keeffe wrote: > Hi, > I've written a package and converted my Rd files into LaTeX using > Rdconv. When I copy and paste these files in to my Sweave document I > get the error message when compiling the Sweave file: > > ! Undefined control sequence. > l.32 \inputencoding > {utf8} > > I also get the error message for the "\HeaderA", "\keyword". > Additionally I get an environment undefined error for some of the > "\begin" control sequences. Do I need to specifically "include" > certain LaTeX libraries in my Sweave document? Yes, Rd files use the Rd.sty style. So \usepackage{Rd} would be necessary. It's not designed for this kind of use, so you might find incompatibilities; I've never tried it. Duncan Murdoch From jvadams at usgs.gov Thu Aug 4 14:41:47 2011 From: jvadams at usgs.gov (Jean V Adams) Date: Thu, 4 Aug 2011 07:41:47 -0500 Subject: [R] R loop problem In-Reply-To: <1312450075348-3718103.post@n4.nabble.com> References: <1312450075348-3718103.post@n4.nabble.com> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From murdoch.duncan at gmail.com Thu Aug 4 14:48:17 2011 From: murdoch.duncan at gmail.com (Duncan Murdoch) Date: Thu, 04 Aug 2011 08:48:17 -0400 Subject: [R] 3D Bar Graphs in ggplot2? In-Reply-To: <20026.36464.561281.782365@stat.math.ethz.ch> References: <1312310369354-3713305.post@n4.nabble.com> <4E38635A.1000009@gmail.com> <20026.36464.561281.782365@stat.math.ethz.ch> Message-ID: <4E3A9511.4060704@gmail.com> On 11-08-04 8:20 AM, Martin Maechler wrote: >>>>>> Duncan Murdoch >>>>>> on Tue, 2 Aug 2011 16:51:38 -0400 writes: > > > On 11-08-02 2:39 PM, wwreith wrote: > >> Does anyone know how to create a 3D Bargraph using > >> ggplot2/qplot. I don't mean 3D as in x,y,z coordinates. Just a > >> 2D bar graph with a 3D shaped bard. See attached excel file > >> for an example. > >> > >> Before anyone asks I know that 3D looking bars don't add > >> anything except "prettiness". > > > If you want graphs like that, you should be using Excel, not R. > > Yes!! > > 10th commandment of R: You shall not misuse R! I admit that the commandment partly motivated my message, but this was also advice to William: Excel is designed to produce those things, R is not, so it is an inefficient use of his time to try to produce them in R. Duncan Murdoch From jbelmont at bcm.edu Thu Aug 4 14:53:17 2011 From: jbelmont at bcm.edu (Belmont, John W) Date: Thu, 4 Aug 2011 07:53:17 -0500 Subject: [R] limma contrast matrix In-Reply-To: References: <1312450075348-3718103.post@n4.nabble.com> Message-ID: <3A5A7B0CDDA794449406911C6DC92FC84C701736FD@EXCMSMBX05.ad.bcm.edu> I am trying to correct for the effect of 2 covariates in a gene expression data set. design<-model.matrix(~0 + Factor + cov1 + cov2) QUESTION: How to set up the contrast matrix? The usual commands fit <- lmFit(selDataMatrix, design) cont.matrix <- makeContrasts(FacCont=Group1-Group2, levels=design) fit2 <- contrasts.fit(fit, cont.matrix) does not work because the original design matrix includes the covariates. I think I don't really understand how the contrast matrix works. John W. Belmont, M.D.,Ph.D. Professor Department of Molecular and Human Genetics Baylor College of Medicine 1100 Bates, Room 8070 Houston, TX 77030 713-798-4634 fax: 713-798-7187 From ajn21 at case.edu Thu Aug 4 14:56:09 2011 From: ajn21 at case.edu (a217) Date: Thu, 4 Aug 2011 05:56:09 -0700 (PDT) Subject: [R] Counting rows given conditional Message-ID: <1312462569252-3718541.post@n4.nabble.com> Hello, I have an input file that contains multiple columns, but the column I'm concerned about looks like: "TR" 5 0 4 1 0 2 0 To count all of the rows in the column I know how to do NROW(x$TR) which gives 7. However, I would also like to count only the number of rows with values >=1 (i.e. not 0). I've tried NROW(x$TR>=1) which did not give the intended output. Do any of you have any suggestions as to where I'm going wrong? -- View this message in context: http://r.789695.n4.nabble.com/Counting-rows-given-conditional-tp3718541p3718541.html Sent from the R help mailing list archive at Nabble.com. From maechler at stat.math.ethz.ch Thu Aug 4 15:05:24 2011 From: maechler at stat.math.ethz.ch (Martin Maechler) Date: Thu, 4 Aug 2011 15:05:24 +0200 Subject: [R] Possible bug of QR decomposition in package Matrix In-Reply-To: References: Message-ID: <20026.39188.418729.906712@stat.math.ethz.ch> >>>>> C6H5NO2 >>>>> on Thu, 4 Aug 2011 16:03:54 +0800 writes: > Thank you very much, Josh! > As you suggested, I will contact the developers of "Matrix". > PS, C6 are just initial characters of my email account :-) > Best wishes, > C6 well, as the posting guide (http://www.R-project.org/posting-guide.html) says, this is regarded as impolite by many, and if I wasn't one of the Matrix package authors, I would not spend time helping 'C6' either. > 2011/8/4 Joshua Wiley : >> Hi C6 (were C1 - 5 already taken in your family?), >> >> I downloaded your data and can replicate your problem. ?R >> ceases responding and terminates. ?This does not occur with all >> uses of qr on a dgCMatrix object. ?I know nothing about sparse >> matrices, but if you believe this should not be occurring, you >> should contact the package maintainers. ?Here is my >> sessionInfo() (FYI, it would probably be helpful to report >> yours also in case the issue is version dependent): >> >> R Under development (unstable) (2011-07-30 r56564) Platform: >> x86_64-pc-mingw32/x64 (64-bit) >> >> locale: [1] LC_COLLATE=English_United States.1252 [2] >> LC_CTYPE=English_United States.1252 [3] >> LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C [5] >> LC_TIME=English_United States.1252 >> >> attached base packages: [1] stats ? ? graphics ?grDevices utils >> ? ? datasets ?methods ? base >> >> other attached packages: [1] Matrix_0.999375-50 lattice_0.19-30 >> >> loaded via a namespace (and not attached): [1] grid_2.14.0 >> ?tools_2.14.0 >> >> Cheers, >> >> Josh >> >> On Wed, Aug 3, 2011 at 4:26 PM, C6H5NO2 >> wrote: >>> Hello R users, >>> >>> I am trying to give the QR decomposition for a large sparse >>> matrix in the format of dgCMatrix. When I run qr function for >>> this matrix, the R session simply stops and exits to the >>> shell. The matrix is of size 108595x108595, and it has >>> 4866885 non-zeros. I did the experiment on windows 7 and linux >>> mint 11 (both 64 bit), and the results are the same. >>> >>> I have uploaded my data file to >>> http://ifile.it/elf2p6z/A.RData . The file is 10.681 MB and I >>> hope someone could kindly download it. The code to see my >>> problem is: >>> library(Matrix) >>> load("A.RData") >>> B <- qr(A) >>> Best wishes, C6 And what's the size of RAM your two computers have ?? The answer is of quite some importance. Short answer: If you have a large very sparse matrix, you don't know if the QR decomposition of that matrix is also very sparse... and if it ain't it will blow up memory, and that's what I'm pretty sure happened with you. What I don't see is why R "simply stops" for you and does not through a an error message about insufficient memory. As I show below, I do get a seg.fault --- which may be considered a bug --- *BUT* I do get the message about memory problems. Did you really *not* get any such message? Is it because you've used a GUI that hides such valuable information from the user? Here's the more detailed reason / analysis about why the above "kills R". This is commented R code, you can cut paste after you've got 'A' : str(A) ## Formal class 'dgCMatrix' [package "Matrix"] with 6 slots ## ..@ i : int [1:4866885] 0 1 2 16 32 33 2392 2417 0 1 ... ## ..@ p : int [1:108596] 0 8 21 35 44 51 59 63 69 78 ... ## ..@ Dim : int [1:2] 108595 108595 ## ..@ Dimnames:List of 2 ## .. ..$ : NULL ## .. ..$ : NULL ## ..@ x : num [1:4866885] 140.03 14.79 14.79 1.78 1.78 ... ## ..@ factors : list() system.time(# the following is still not as fast as it could be): isSymmetric(A) # yes ! )# 1.13 {the *2nd* time; 1.9 the 1st time !} ## First work with a submatrix: n <- 10000 A1 <- A[1:n, 1:n] system.time( qr1 <- qr(A1))# on cmath-8, machine with 48 GB RAM memory ## user system elapsed ## 59.884 0.316 60.240 !! str(qr1) ## Formal class 'sparseQR' [package "Matrix"] with 6 slots ## ..@ V :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots ## .. .. ..@ i : int [1:7692948] 0 1 2 3 4 3 4 5 4 5 ... ## .. .. ..@ p : int [1:10001] 0 1 2 5 8 10 11 12 18 26 ... ## .. .. ..@ Dim : int [1:2] 10000 10000 ## .. .. ..@ Dimnames:List of 2 ## .. .. .. ..$ : NULL ## .. .. .. ..$ : NULL ## .. .. ..@ x : num [1:7692948] 1 1 -3.71 8.68 -8.68 ... ## .. .. ..@ factors : list() ## ..@ beta: num [1:10000] 0 0 0.01216 0.00743 0.01758 ... ## ..@ p : int [1:10000] 61 62 63 64 94 80 161 162 163 164 ... ## ..@ R :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots ## .. .. ..@ i : int [1:13581659] 0 1 2 2 3 3 4 2 3 4 ... ## .. .. ..@ p : int [1:10001] 0 1 2 3 5 7 11 12 13 15 ... ## .. .. ..@ Dim : int [1:2] 10000 10000 ## .. .. ..@ Dimnames:List of 2 ## .. .. .. ..$ : NULL ## .. .. .. ..$ : NULL ## .. .. ..@ x : num [1:13581659] 370.37 3.47 22.14 12.48 17.12 ... ## .. .. ..@ factors : list() ## ..@ q : int [1:10000] 61 62 63 64 80 94 161 162 163 164 ... ## ..@ Dim : int [1:2] 10000 10000 object.size(A1) ## 4352184 bytes object.size(qr1) ## 255539456 bytes, i.e. 255 MB c(object.size(qr1) / object.size(A1)) ## 58.715 c(object.size(A) / object.size(A1)) ## 13.52 ##--> "predicted size" of qr(A): c(object.size(A) / object.size(A1))*object.size(qr1) ## ~ 3 G Bytes n <- 20000 A2 <- A[1:n, 1:n] system.time( qr2 <- qr(A2))# on cmath-8, machine with 48 GB RAM memory ## user system elapsed ## 1024.068 2.850 1027.488 -- 17 minutes object.size(A2) ## 8504464 bytes object.size(qr2) ## 1432'809992 bytes, i.e. 1432.81 MBytes c(object.size(qr2) / object.size(A2)) ## 168.4774 ##--> "predicted size" of qr(A): c(object.size(A) / object.size(A2) *object.size(qr2)) ## 9912'944757 == 9912.944757 MBytes ~= 10 GBytes --- this will not fit! ## Ok: one step further n <- 30000 A3 <- A[1:n, 1:n] system.time( qr3 <- qr(A3))# on cmath-8, machine with 48 GB RAM memory ## user system elapsed ## 3384.183 32.234 3418.392 -- almost one hour ! object.size(A3) ## 11'335112 bytes object.size(qr3) ## 3059'252216 bytes, i.e. 3059 MBytes c(object.size(qr3) / object.size(A3)) ## 269.9 ##--> "predicted size" of qr(A): c(object.size(A) / object.size(A3) *object.size(qr3)) ## 1.588e+10 --- ~ 15 GB -- this is *MORE* than an R object can contain: .Machine$integer.max ## 2147483647 = 2.147'483'647 e9 system.time(ch1 <- chol(A1)) ## CHOLMOD warning: ## Error in .local(x, ...) : CHOLMOD factorization was unsuccessful ## In addition: Warning message: ## In .local(x, ...) : ## Cholmod warning 'not positive definite' at file:../Cholesky/t_cholmod_rowfac.c, system.time(lu1 <- lu(A1)) ## Error ... : cs_lu(A) failed: near-singular A (or out of memory) ##--- Ok, now try the full thing, and see if "R dies without a word" ## or if it at least says something before death : system.time( qrA <- qr(A) ) ## ## *** caught segfault *** ## address 0x7f3f5f48cf70, cause 'memory not mapped' ## /u/maechler/bin/R_arg: line 137: 15063 Segmentation fault $exe $@ From petr.pikal at precheza.cz Thu Aug 4 15:08:08 2011 From: petr.pikal at precheza.cz (Petr PIKAL) Date: Thu, 4 Aug 2011 15:08:08 +0200 Subject: [R] Odp: Counting rows given conditional In-Reply-To: <1312462569252-3718541.post@n4.nabble.com> References: <1312462569252-3718541.post@n4.nabble.com> Message-ID: Hi r-help-bounces at r-project.org napsal dne 04.08.2011 14:56:09: > a217 > Odeslal: r-help-bounces at r-project.org > > 04.08.2011 14:56 > > Komu > > r-help at r-project.org > > Kopie > > P?edm?t > > [R] Counting rows given conditional > > Hello, > > I have an input file that contains multiple columns, but the column I'm > concerned about looks like: > > "TR" > 5 > 0 > 4 > 1 > 0 > 2 > 0 > > To count all of the rows in the column I know how to do NROW(x$TR) which > gives 7. > > However, I would also like to count only the number of rows with values >=1 > (i.e. not 0). I've tried NROW(x$TR>=1) which did not give the intended > output. You are quite close. x$TR>=1 gives you logical vector TRUE/FALSE. You can compute count of TRUE values by sum(logical.vector) e.g. sum(x$TR>=1) Regards Petr > > Do any of you have any suggestions as to where I'm going wrong? > > > > -- > View this message in context: http://r.789695.n4.nabble.com/Counting-rows- > given-conditional-tp3718541p3718541.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From dimitri.liakhovitski at gmail.com Thu Aug 4 15:26:37 2011 From: dimitri.liakhovitski at gmail.com (Dimitri Liakhovitski) Date: Thu, 4 Aug 2011 09:26:37 -0400 Subject: [R] identifying weeks (dates) that certain days (dates) fall into In-Reply-To: References: Message-ID: Sorry for renewing the topoic. I thought it worked but now I've run into a little problem: # My data frame with dates for week starts (Mondays) y<-data.frame(week=seq(as.Date("2009-12-28"), as.Date("2011-12-26"),by="week") ) # I have a vector of super bowl dates (including the future one for 2012): sbwl.dates<-as.Date(c("2005-02-06","2006-02-05","2007-02-04","2008-02-03","2009-02-01","2010-02-07","2011-02-06","2012-02-05")) I want to find the weeks in y that contain super bowl dates for applicable years. I am trying: sbwl.weeks<-findInterval(sbwl.dates, y$week) sbwl.weeks<-sbwl.weeks[sbwl.weeks>0] (sbwl.weeks) > 6 58 105 y$flag<-0 y$flag[sbwl.weeks]<-1 6 and 58 are correct. But why am I getting 105 (the last row)? Any way to fix it? Thanks a lot! Dimitri On Tue, Aug 2, 2011 at 12:57 PM, Dimitri Liakhovitski wrote: > Thanks a lot, everyone! > Dimitri > > On Tue, Aug 2, 2011 at 12:34 PM, Dennis Murphy wrote: >> Hi: >> >> You could try the lubridate package: >> >> library(lubridate) >> week(weekly$week) >> week(july4) >> [1] 27 27 >> >>> week >> function (x) >> yday(x)%/%7 + 1 >> >> >> which is essentially Gabor's code :) >> >> HTH, >> Dennis >> >> On Tue, Aug 2, 2011 at 7:36 AM, Dimitri Liakhovitski >> wrote: >>> Hello! >>> >>> I have dates for the beginning of each week, e.g.: >>> weekly<-data.frame(week=seq(as.Date("2010-04-01"), >>> as.Date("2011-12-26"),by="week")) >>> week ?# each week starts on a Monday >>> >>> I also have a vector of dates I am interested in, e.g.: >>> july4<-as.Date(c("2010-07-04","2011-07-04")) >>> >>> I would like to flag the weeks in my weekly$week that contain those 2 >>> individual dates. >>> I can only think of a very clumsy way of doing it: >>> >>> myrows<-c(which(weekly$week==weekly$week[weekly$week>july4[1]][1]-7), >>> ? ? ? ?which(weekly$week==weekly$week[weekly$week>july4[2]][1]-7)) >>> weekly$flag<-0 >>> weekly$flag[myrows]<-1 >>> >>> It's clumsy - because actually, my vector of dates of interest (july4 >>> above) is much longer. >>> Is there maybe a more elegant way of doing it? >>> Thank you! >>> -- >>> Dimitri Liakhovitski >>> marketfusionanalytics.com >>> >>> ______________________________________________ >>> R-help at r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >>> >> > > > > -- > Dimitri Liakhovitski > marketfusionanalytics.com > -- Dimitri Liakhovitski marketfusionanalytics.com From jmacdon at med.umich.edu Thu Aug 4 15:36:54 2011 From: jmacdon at med.umich.edu (James W. MacDonald) Date: Thu, 4 Aug 2011 09:36:54 -0400 Subject: [R] limma contrast matrix In-Reply-To: <3A5A7B0CDDA794449406911C6DC92FC84C701736FD@EXCMSMBX05.ad.bcm.edu> References: <1312450075348-3718103.post@n4.nabble.com> <3A5A7B0CDDA794449406911C6DC92FC84C701736FD@EXCMSMBX05.ad.bcm.edu> Message-ID: <4E3AA076.7060702@med.umich.edu> Hi John, The limma package is part of Bioconductor, so you should be asking this question on the BioC listserv rather than R-help. In addition, please see the posting guide. The example you give is not self-contained, and the term 'does not work' is so unspecific as to be useless. On 8/4/2011 8:53 AM, Belmont, John W wrote: > I am trying to correct for the effect of 2 covariates in a gene expression data set. > > > design<-model.matrix(~0 + Factor + cov1 + cov2) > > > QUESTION: > How to set up the contrast matrix? > > The usual commands > > fit<- lmFit(selDataMatrix, design) > cont.matrix<- makeContrasts(FacCont=Group1-Group2, levels=design) Why would your design matrix have columns labeled 'Group1' and 'Group2'? Your code doesn't indicate that you did any such naming. > fit2<- contrasts.fit(fit, cont.matrix) > > does not work because the original design matrix includes the covariates. > > I think I don't really understand how the contrast matrix works. I agree. So let's make some wild assumptions, and maybe I can help. Factor <- factor(rep(1:2, each=6)) cov1 <- factor(rep(1:2, times=2, each=3)) ## let's say you had two batches. cov2 <- rnorm(12) ## and the other covariate is continuous mat <- model.matrix(~ 0 + Factor + cov1 + cov2) > mat Factor1 Factor2 cov12 cov2 1 1 0 0 0.75675940 2 1 0 0 -0.80761696 3 1 0 0 0.61228480 4 1 0 1 -1.13920820 5 1 0 1 -0.24367358 6 1 0 1 0.32244694 7 0 1 0 0.51438468 8 0 1 0 -2.23587057 9 0 1 0 -0.06560733 10 0 1 1 -0.11273432 11 0 1 1 0.33398074 12 0 1 1 0.70900581 Now we have a model that controls for the fact that you did things in two batches (naughty boy) and that you have some random continuous covariate you want to control for. So Factor1 is the mean of the first group after controlling for the other two covariates, and Factor2 is the same, for the other group. Can you see that a contrast comparing these two factors is what you are after? So something like makeContrasts(whatIwant=Factor1-Factor2, groups=design) should get you where you want to go. Best, Jim > > > John W. Belmont, M.D.,Ph.D. > Professor > Department of Molecular and Human Genetics > Baylor College of Medicine > 1100 Bates, Room 8070 > Houston, TX 77030 > 713-798-4634 > fax: 713-798-7187 > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- James W. MacDonald, M.S. Biostatistician Douglas Lab University of Michigan Department of Human Genetics 5912 Buhl 1241 E. Catherine St. Ann Arbor MI 48109-5618 734-615-7826 ********************************************************** Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues From michael.weylandt at gmail.com Thu Aug 4 15:30:28 2011 From: michael.weylandt at gmail.com (R. Michael Weylandt ) Date: Thu, 4 Aug 2011 09:30:28 -0400 Subject: [R] How to extract sublist from a list? In-Reply-To: <1312455478593-3718282.post@n4.nabble.com> References: <1312421910777-3717451.post@n4.nabble.com> <1312425956068-3717556.post@n4.nabble.com> <1312435745257-3717713.post@n4.nabble.com> <1312455478593-3718282.post@n4.nabble.com> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From testrider at gmail.com Thu Aug 4 15:36:07 2011 From: testrider at gmail.com (testrider) Date: Thu, 4 Aug 2011 06:36:07 -0700 (PDT) Subject: [R] R loop problem In-Reply-To: References: <1312450075348-3718103.post@n4.nabble.com> Message-ID: <1312464967603-3718670.post@n4.nabble.com> Worked like a charm! Thanks a lot Jean V Adams -- View this message in context: http://r.789695.n4.nabble.com/R-loop-problem-tp3718449p3718670.html Sent from the R help mailing list archive at Nabble.com. From cddesjardins at gmail.com Thu Aug 4 15:38:07 2011 From: cddesjardins at gmail.com (Christopher Desjardins) Date: Thu, 4 Aug 2011 08:38:07 -0500 Subject: [R] Plotting just a portion of a smoother graph in ggplot2 Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From juliane_struve at yahoo.co.uk Thu Aug 4 15:47:36 2011 From: juliane_struve at yahoo.co.uk (Juliane Struve) Date: Thu, 4 Aug 2011 14:47:36 +0100 (BST) Subject: [R] Plotting multiple age distributions with histogram() Message-ID: <1312465656.53666.YahooMailNeo@web29509.mail.ird.yahoo.com> Dear list, ? I would like to plot several age distributions that are influenced by a factor (called group in the data example below). I would like to use historam() in the lattice package, but I can't figure out how to set up the data. Could someone give me a hint how to do it? for the example below? ? Thanks a lot, ? Juliane? ? data <- structure(list(Age = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L), No = c(1084609L, 1076523L, 1066171L, 887874L, 881239L, 872763L, 726800L, 720632L, 712984L), Group = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L)), .Names = c("Age", "No", "Group"), class = "data.frame", row.names = c(NA, -9L)) From michael.weylandt at gmail.com Thu Aug 4 16:03:01 2011 From: michael.weylandt at gmail.com (R. Michael Weylandt ) Date: Thu, 4 Aug 2011 10:03:01 -0400 Subject: [R] conditional data replace (recode, change or whatsoever) In-Reply-To: References: <1312358754120-3714715.post@n4.nabble.com> <09724EC7-2696-4150-ADF0-CCE5FA904AF7@gmail.com> <1312373370210-3715218.post@n4.nabble.com> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From paul.hiemstra at knmi.nl Thu Aug 4 16:10:37 2011 From: paul.hiemstra at knmi.nl (Paul Hiemstra) Date: Thu, 04 Aug 2011 14:10:37 +0000 Subject: [R] slow computation of functions over large datasets In-Reply-To: References: Message-ID: <4E3AA85D.5070606@knmi.nl> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From themjwok at gmail.com Thu Aug 4 16:24:29 2011 From: themjwok at gmail.com (Michael O'Keeffe) Date: Fri, 5 Aug 2011 00:24:29 +1000 Subject: [R] Rdconv LaTeX files In-Reply-To: <4E3A9379.6020709@gmail.com> References: <4E3A9379.6020709@gmail.com> Message-ID: Thanks! I found (by properly reading docs) to use the inputenc package. The Rd package solve most problems except for where I has a multiple line code element in my Rd file. I tried to convert it to a LaTeX Rd 'code' element and use an 'alltt' environment to hold the multi-line text. I worked around this by using the verbatim element. Thank you, your help was invaluable! Regards, Michael On Thu, Aug 4, 2011 at 10:41 PM, Duncan Murdoch wrote: >> >> Hi, >> I've written a package and converted my Rd files into LaTeX using >> Rdconv. When I copy and paste these files in to my Sweave document I >> get the error message when compiling the Sweave file: >> >> ! Undefined control sequence. >> l.32 \inputencoding >> ? ? ? ? ? ? ? ? ? ?{utf8} >> >> I also get the error message for the "\HeaderA", "\keyword". >> Additionally I get an environment undefined error for some of the >> "\begin" control sequences. Do I need to specifically "include" >> certain LaTeX libraries in my Sweave document? > > Yes, Rd files use the Rd.sty style. ?So \usepackage{Rd} would be necessary. > ?It's not designed for this kind of use, so you might find > incompatibilities; I've never tried it. > > Duncan Murdoch > From patrick.breheny at uky.edu Thu Aug 4 16:33:47 2011 From: patrick.breheny at uky.edu (Patrick Breheny) Date: Thu, 4 Aug 2011 10:33:47 -0400 Subject: [R] limma contrast matrix In-Reply-To: <3A5A7B0CDDA794449406911C6DC92FC84C701736FD@EXCMSMBX05.ad.bcm.edu> References: <1312450075348-3718103.post@n4.nabble.com> <3A5A7B0CDDA794449406911C6DC92FC84C701736FD@EXCMSMBX05.ad.bcm.edu> Message-ID: <4E3AADCB.7010603@uky.edu> On 08/04/2011 08:53 AM, Belmont, John W wrote: > I am trying to correct for the effect of 2 covariates in a gene > expression data set. > > design<-model.matrix(~0 + Factor + cov1 + cov2) > > > QUESTION: How to set up the contrast matrix? > > The usual commands > > fit <- lmFit(selDataMatrix, design) > cont.matrix <- makeContrasts(FacCont=Group1-Group2, levels=design) > fit2 <- contrasts.fit(fit, cont.matrix) > > does not work because the original design matrix includes the > covariates. > > I think I don't really understand how the contrast matrix works. 1) Unless you are sure you know what you're doing and have given this a lot of thought, I doubt you want to remove the intercept in this model. 2) A contrast specifies the coefficients of the linear combination of parameters that you wish to estimate/test. If you don't know what this means, then I would advise learning more about regression or consulting a statistician before proceeding. 3) If the first two comments haven't scared you off, then you can specify contrasts via: cont.matrix <- matrix(c(-1,1,0,0),ncol=1) under the original parameterization, or cont.matrix <- matrix(c(0,1,0,0),ncol=1) if you take the advice of point 1) and put an intercept in your model. -- Patrick Breheny Assistant Professor Department of Biostatistics Department of Statistics University of Kentucky From timothy.c.bates at gmail.com Thu Aug 4 16:51:37 2011 From: timothy.c.bates at gmail.com (Timothy Bates) Date: Thu, 4 Aug 2011 15:51:37 +0100 Subject: [R] Plotting just a portion of a smoother graph in ggplot2 In-Reply-To: References: Message-ID: It's not immediately obvious You need to look at coord_cartesian() and its ylim argument. Best, t Sent from my iPhone On 4 Aug 2011, at 02:38 PM, Christopher Desjardins wrote: > Hi, > I am using ggplot2 to with the following code: > > gmathk2 <- > qplot(time,math,colour=Kids,data=kids.ach.lm.k5,geom="smooth",method="lm",formula=y~ns(x,1)) > + opts(title="Smoother Plot: Math K-5") + xlab("Time") + ylab("Math") + > scale_colour_brewer(pal="Set1"); gmathk2 > > This plots all the smoother for all the x values. What I'd like to do is > plot the smoother for the x values that are only greater than or equal to 0. > I don't want this: > > gmathk2 <- > qplot(time,math,colour=Kids,data=kids.ach.lm.k5,geom="smooth",method="lm",formula=y~ns(x,1)) > + opts(title="Smoother Plot: Math K-5") + xlab("Time") + ylab("Math") + > scale_colour_brewer(pal="Set1") + xlim(0,50); gmathk2 > > Because adding xlim seems to throw away the data below 0 when calculating > the smoother. What I want it to do is have ggplot2 give me the same graph as > the first command but just not plot the part of the smoother that is below > 0. > > Thanks, > Chris > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From c6h5no2 at gmail.com Thu Aug 4 17:11:50 2011 From: c6h5no2 at gmail.com (C6H5NO2) Date: Thu, 4 Aug 2011 23:11:50 +0800 Subject: [R] Possible bug of QR decomposition in package Matrix In-Reply-To: <20026.39188.418729.906712@stat.math.ethz.ch> References: <20026.39188.418729.906712@stat.math.ethz.ch> Message-ID: Thank you, Martin. Yes, on my computer (48Gb memory) it also gives a message "caught segfault" as you got. So it is because the matrix size is too large. I apologize for the offence I've caused. 2011/8/4 Martin Maechler : >>>>>> C6H5NO2 ? >>>>>> ? ? on Thu, 4 Aug 2011 16:03:54 +0800 writes: > > ? ?> Thank you very much, Josh! > ? ?> As you suggested, I will contact the developers of "Matrix". > > ? ?> PS, C6 are just initial characters of my email account :-) > > ? ?> Best wishes, > ? ?> C6 > > well, as the posting guide (http://www.R-project.org/posting-guide.html) > says, this is regarded as impolite by many, > and if I wasn't one of the Matrix package authors, > I would not spend time helping ?'C6' ?either. > > ? ?> 2011/8/4 Joshua Wiley : > ? ?>> Hi C6 (were C1 - 5 already taken in your family?), > ? ?>> > ? ?>> I downloaded your data and can replicate your problem. ?R > ? ?>> ceases responding and terminates. ?This does not occur with all > ? ?>> uses of qr on a dgCMatrix object. ?I know nothing about sparse > ? ?>> matrices, but if you believe this should not be occurring, you > ? ?>> should contact the package maintainers. ?Here is my > ? ?>> sessionInfo() (FYI, it would probably be helpful to report > ? ?>> yours also in case the issue is version dependent): > ? ?>> > ? ?>> R Under development (unstable) (2011-07-30 r56564) Platform: > ? ?>> x86_64-pc-mingw32/x64 (64-bit) > ? ?>> > ? ?>> locale: [1] LC_COLLATE=English_United States.1252 [2] > ? ?>> LC_CTYPE=English_United States.1252 [3] > ? ?>> LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C [5] > ? ?>> LC_TIME=English_United States.1252 > ? ?>> > ? ?>> attached base packages: [1] stats ? ? graphics ?grDevices utils > ? ?>> ? ? datasets ?methods ? base > ? ?>> > ? ?>> other attached packages: [1] Matrix_0.999375-50 lattice_0.19-30 > ? ?>> > ? ?>> loaded via a namespace (and not attached): [1] grid_2.14.0 > ? ?>> ?tools_2.14.0 > ? ?>> > ? ?>> Cheers, > ? ?>> > ? ?>> Josh > ? ?>> > ? ?>> On Wed, Aug 3, 2011 at 4:26 PM, C6H5NO2 > ? ?>> wrote: > ? ?>>> Hello R users, > ? ?>>> > ? ?>>> I am trying to give the QR decomposition for a large sparse > ? ?>>> matrix in the format of dgCMatrix. When I run qr function for > ? ?>>> this matrix, the R session simply stops and exits to the > ? ?>>> shell. ?The matrix is of size 108595x108595, and it has > ? ?>>> 4866885 non-zeros. I did the experiment on windows 7 and linux > ? ?>>> mint 11 (both 64 bit), and the results are the same. > ? ?>>> > ? ?>>> I have uploaded my data file to > ? ?>>> http://ifile.it/elf2p6z/A.RData . The file is 10.681 MB and I > ? ?>>> hope someone could kindly download it. ?The code to see my > ? ?>>> problem is: > > ? ?>>> ?library(Matrix) > ? ?>>> ?load("A.RData") > ? ?>>> ?B <- qr(A) > > > ? ?>>> Best wishes, C6 > > And what's the size of RAM your two computers have ?? > The answer is of quite some importance. > > Short answer: If you have a large very sparse matrix, > you don't know if the QR decomposition of that matrix is also > very sparse... and if it ain't it will blow up memory, > and that's what I'm pretty sure happened with you. > > What I don't see is why R "simply stops" for you and does not > through a an error message about insufficient memory. > As I show below, I do get a seg.fault > --- which may be considered a bug --- > *BUT* I do get the message about memory problems. > Did you really *not* get any such message? > Is it because you've used a GUI that hides such valuable > information from the user? > > Here's the more detailed reason / analysis about why the above > "kills R". This is commented R code, > you can cut paste after you've got 'A' : > > str(A) > ## Formal class 'dgCMatrix' [package "Matrix"] with 6 slots > ## ? ..@ i ? ? ? : int [1:4866885] 0 1 2 16 32 33 2392 2417 0 1 ... > ## ? ..@ p ? ? ? : int [1:108596] 0 8 21 35 44 51 59 63 69 78 ... > ## ? ..@ Dim ? ? : int [1:2] 108595 108595 > ## ? ..@ Dimnames:List of 2 > ## ? .. ..$ : NULL > ## ? .. ..$ : NULL > ## ? ..@ x ? ? ? : num [1:4866885] 140.03 14.79 14.79 1.78 1.78 ... > ## ? ..@ factors : list() > > system.time(# the following is still not as fast as it could be): > isSymmetric(A) # yes ! > )# 1.13 {the *2nd* time; 1.9 the 1st time !} > > ## First work with a ?submatrix: > n <- 10000 > A1 <- A[1:n, 1:n] > system.time( > ? ? ? ? ? ?qr1 <- qr(A1))# on cmath-8, machine with 48 GB ?RAM memory > ## ? user ?system elapsed > ## 59.884 ? 0.316 ?60.240 ?!! > > str(qr1) > ## Formal class 'sparseQR' [package "Matrix"] with 6 slots > ## ? ..@ V ? :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots > ## ? .. .. ..@ i ? ? ? : int [1:7692948] 0 1 2 3 4 3 4 5 4 5 ... > ## ? .. .. ..@ p ? ? ? : int [1:10001] 0 1 2 5 8 10 11 12 18 26 ... > ## ? .. .. ..@ Dim ? ? : int [1:2] 10000 10000 > ## ? .. .. ..@ Dimnames:List of 2 > ## ? .. .. .. ..$ : NULL > ## ? .. .. .. ..$ : NULL > ## ? .. .. ..@ x ? ? ? : num [1:7692948] 1 1 -3.71 8.68 -8.68 ... > ## ? .. .. ..@ factors : list() > ## ? ..@ beta: num [1:10000] 0 0 0.01216 0.00743 0.01758 ... > ## ? ..@ p ? : int [1:10000] 61 62 63 64 94 80 161 162 163 164 ... > ## ? ..@ R ? :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots > ## ? .. .. ..@ i ? ? ? : int [1:13581659] 0 1 2 2 3 3 4 2 3 4 ... > ## ? .. .. ..@ p ? ? ? : int [1:10001] 0 1 2 3 5 7 11 12 13 15 ... > ## ? .. .. ..@ Dim ? ? : int [1:2] 10000 10000 > ## ? .. .. ..@ Dimnames:List of 2 > ## ? .. .. .. ..$ : NULL > ## ? .. .. .. ..$ : NULL > ## ? .. .. ..@ x ? ? ? : num [1:13581659] 370.37 3.47 22.14 12.48 17.12 ... > ## ? .. .. ..@ factors : list() > ## ? ..@ q ? : int [1:10000] 61 62 63 64 80 94 161 162 163 164 ... > ## ? ..@ Dim : int [1:2] 10000 10000 > > object.size(A1) > ## ? 4352184 bytes > object.size(qr1) > ## 255539456 bytes, i.e. 255 MB > c(object.size(qr1) / object.size(A1)) ## 58.715 > c(object.size(A) ? / object.size(A1)) ## 13.52 > ##--> "predicted size" of ?qr(A): > c(object.size(A) / object.size(A1))*object.size(qr1) > ## ~ 3 G Bytes > > n <- 20000 > A2 <- A[1:n, 1:n] > system.time( > ? ? ? ? ? ?qr2 <- qr(A2))# on cmath-8, machine with 48 GB ?RAM memory > ## ? ? user ? system ?elapsed > ## 1024.068 ? ?2.850 1027.488 ?-- 17 minutes > > object.size(A2) > ## ? 8504464 bytes > object.size(qr2) > ## 1432'809992 bytes, i.e. 1432.81 MBytes > c(object.size(qr2) / object.size(A2)) ## 168.4774 > ##--> "predicted size" of ?qr(A): > c(object.size(A) / object.size(A2) *object.size(qr2)) > ## 9912'944757 == 9912.944757 MBytes ~= 10 GBytes --- this will not fit! > > ## Ok: one step further > n <- 30000 > A3 <- A[1:n, 1:n] > system.time( > ? ? ? ? ? ?qr3 <- qr(A3))# on cmath-8, machine with 48 GB ?RAM memory > ## ? ? user ? system ?elapsed > ## 3384.183 ? 32.234 3418.392 -- almost one hour ! > > object.size(A3) > ## ? 11'335112 bytes > object.size(qr3) > ## 3059'252216 bytes, i.e. 3059 MBytes > c(object.size(qr3) / object.size(A3)) ## 269.9 > ##--> "predicted size" of ?qr(A): > c(object.size(A) / object.size(A3) *object.size(qr3)) > ## ?1.588e+10 --- ~ 15 GB -- this is *MORE* than an R object can contain: > .Machine$integer.max > ## 2147483647 = 2.147'483'647 e9 > > system.time(ch1 <- chol(A1)) > ## CHOLMOD warning: > ## Error in .local(x, ...) : CHOLMOD factorization was unsuccessful > ## In addition: Warning message: > ## In .local(x, ...) : > ## ? Cholmod warning 'not positive definite' at file:../Cholesky/t_cholmod_rowfac.c, > > system.time(lu1 <- lu(A1)) > ## Error ... : cs_lu(A) failed: near-singular A (or out of memory) > > ##--- Ok, now try the full thing, and see if ?"R dies without a word" > ## or if it at least says something before death : > system.time( > ? ? ? ? ? ?qrA <- qr(A) > ) > ## > ## ?*** caught segfault *** > ## address 0x7f3f5f48cf70, cause 'memory not mapped' > ## /u/maechler/bin/R_arg: line 137: 15063 Segmentation fault ? ? ?$exe $@ > From pllcc023 at gmail.com Thu Aug 4 17:22:03 2011 From: pllcc023 at gmail.com (Paola Lecca) Date: Thu, 4 Aug 2011 17:22:03 +0200 Subject: [R] use of modMCMC Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From dimitri.liakhovitski at gmail.com Thu Aug 4 17:24:24 2011 From: dimitri.liakhovitski at gmail.com (Dimitri Liakhovitski) Date: Thu, 4 Aug 2011 11:24:24 -0400 Subject: [R] Efficient way of creating a shifted (lagged) variable? Message-ID: Hello! I have a data set: set.seed(123) y<-data.frame(week=seq(as.Date("2010-01-03"), as.Date("2011-01-31"),by="week")) y$var1<-c(1,2,3,round(rnorm(54),1)) y$var2<-c(10,20,30,round(rnorm(54),1)) # All I need is to create lagged variables for var1 and var2. I looked around a bit and found several ways of doing it. They all seem quite complicated - while in SPSS it's just a few letters (like LAG()). Here is what I've written but I wonder. It works - but maybe there is a very simple way of doing it in R that I could not find? I need the same for "lead" (opposite of lag). Any hint is greatly appreciated! ### The function I created: mylag <- function(x,max.lag=1){ # x has to be a 1-column data frame temp<-as.data.frame(embed(c(rep(NA,max.lag),x[[1]]),max.lag+1))[2:(max.lag+1)] for(i in 1:length(temp)){ names(temp)[i]<-paste(names(x),".lag",i,sep="") } return(temp) } ### Running mylag to get my result: myvars<-c("var1","var2") for(i in myvars) { y<-cbind(y,mylag(y[i]),max.lag=2) } (y) -- Dimitri Liakhovitski marketfusionanalytics.com From Xiao.Hu at biogenidec.com Thu Aug 4 16:03:20 2011 From: Xiao.Hu at biogenidec.com (Xiao Hu) Date: Thu, 4 Aug 2011 14:03:20 +0000 Subject: [R] error bar plot with log scale in lattice Message-ID: <2C96AE6E40E4274B9E2AD48076C02C8BE7C7CC@biommmbx01.resource.corp.biogen.com> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From michael.weylandt at gmail.com Thu Aug 4 17:45:14 2011 From: michael.weylandt at gmail.com (R. Michael Weylandt) Date: Thu, 4 Aug 2011 11:45:14 -0400 Subject: [R] Efficient way of creating a shifted (lagged) variable? In-Reply-To: References: Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From pllcc023 at gmail.com Thu Aug 4 17:49:12 2011 From: pllcc023 at gmail.com (Paola Lecca) Date: Thu, 4 Aug 2011 17:49:12 +0200 Subject: [R] use of modMCMC Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From michael.weylandt at gmail.com Thu Aug 4 19:36:26 2011 From: michael.weylandt at gmail.com (R. Michael Weylandt) Date: Thu, 4 Aug 2011 13:36:26 -0400 Subject: [R] special recursive filter In-Reply-To: <20110801120614.79380@gmx.net> References: <20110729151621.223570@gmx.net> <20110801120614.79380@gmx.net> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From michael.weylandt at gmail.com Thu Aug 4 20:20:17 2011 From: michael.weylandt at gmail.com (R. Michael Weylandt) Date: Thu, 4 Aug 2011 14:20:17 -0400 Subject: [R] identifying weeks (dates) that certain days (dates) fall into In-Reply-To: References: Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From dimitri.liakhovitski at gmail.com Thu Aug 4 20:44:54 2011 From: dimitri.liakhovitski at gmail.com (Dimitri Liakhovitski) Date: Thu, 4 Aug 2011 14:44:54 -0400 Subject: [R] identifying weeks (dates) that certain days (dates) fall into In-Reply-To: References: Message-ID: Michael, thanks a lot! Really appreciate it - what wasn't hard for you would be for me! Dimitri On Thu, Aug 4, 2011 at 2:20 PM, R. Michael Weylandt wrote: > You are getting 105 because the default behavior of findInterval? is such > that v[N+1] := + Inf (as noted in ? findInterval); that is, the last > interval is actually taken to stretch from the final element of sbwl.dates > unto eternity. It shouldn't be hard to write a catch line to set that to > whatever you want for it to go to. > > E.g., > > myFindInterval <- function(x, vec, rightmost.closed = FALSE, all.inside = > FALSE) { > ??? FI = findInterval(x,vec,rightmost.closed, all.inside) > ??? FI[FI==length(vec)] <- 0 # or Inf or whatever > ??? return(FI) > } > > Michael Weylandt > > > On Thu, Aug 4, 2011 at 9:26 AM, Dimitri Liakhovitski > wrote: >> >> Sorry for renewing the topoic. I thought it worked but now I've run >> into a little problem: >> >> ?# My data frame with dates for week starts (Mondays) >> y<-data.frame(week=seq(as.Date("2009-12-28"), >> as.Date("2011-12-26"),by="week") ) >> >> # I have a vector of super bowl dates (including the future one for 2012): >> >> sbwl.dates<-as.Date(c("2005-02-06","2006-02-05","2007-02-04","2008-02-03","2009-02-01","2010-02-07","2011-02-06","2012-02-05")) >> I want to find the weeks in y that contain super bowl dates for >> applicable years. I am trying: >> sbwl.weeks<-findInterval(sbwl.dates, y$week) >> sbwl.weeks<-sbwl.weeks[sbwl.weeks>0] >> (sbwl.weeks) >> > 6 58 105 >> y$flag<-0 >> y$flag[sbwl.weeks]<-1 >> >> 6 and 58 are correct. But why am I getting 105 (the last row)? >> Any way to fix it? >> Thanks a lot! >> Dimitri >> >> >> >> On Tue, Aug 2, 2011 at 12:57 PM, Dimitri Liakhovitski >> wrote: >> > Thanks a lot, everyone! >> > Dimitri >> > >> > On Tue, Aug 2, 2011 at 12:34 PM, Dennis Murphy >> > wrote: >> >> Hi: >> >> >> >> You could try the lubridate package: >> >> >> >> library(lubridate) >> >> week(weekly$week) >> >> week(july4) >> >> [1] 27 27 >> >> >> >>> week >> >> function (x) >> >> yday(x)%/%7 + 1 >> >> >> >> >> >> which is essentially Gabor's code :) >> >> >> >> HTH, >> >> Dennis >> >> >> >> On Tue, Aug 2, 2011 at 7:36 AM, Dimitri Liakhovitski >> >> wrote: >> >>> Hello! >> >>> >> >>> I have dates for the beginning of each week, e.g.: >> >>> weekly<-data.frame(week=seq(as.Date("2010-04-01"), >> >>> as.Date("2011-12-26"),by="week")) >> >>> week ?# each week starts on a Monday >> >>> >> >>> I also have a vector of dates I am interested in, e.g.: >> >>> july4<-as.Date(c("2010-07-04","2011-07-04")) >> >>> >> >>> I would like to flag the weeks in my weekly$week that contain those 2 >> >>> individual dates. >> >>> I can only think of a very clumsy way of doing it: >> >>> >> >>> myrows<-c(which(weekly$week==weekly$week[weekly$week>july4[1]][1]-7), >> >>> ? ? ? ?which(weekly$week==weekly$week[weekly$week>july4[2]][1]-7)) >> >>> weekly$flag<-0 >> >>> weekly$flag[myrows]<-1 >> >>> >> >>> It's clumsy - because actually, my vector of dates of interest (july4 >> >>> above) is much longer. >> >>> Is there maybe a more elegant way of doing it? >> >>> Thank you! >> >>> -- >> >>> Dimitri Liakhovitski >> >>> marketfusionanalytics.com >> >>> >> >>> ______________________________________________ >> >>> R-help at r-project.org mailing list >> >>> https://stat.ethz.ch/mailman/listinfo/r-help >> >>> PLEASE do read the posting guide >> >>> http://www.R-project.org/posting-guide.html >> >>> and provide commented, minimal, self-contained, reproducible code. >> >>> >> >> >> > >> > >> > >> > -- >> > Dimitri Liakhovitski >> > marketfusionanalytics.com >> > >> >> >> >> -- >> Dimitri Liakhovitski >> marketfusionanalytics.com >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > -- Dimitri Liakhovitski marketfusionanalytics.com From arrayprofile at yahoo.com Thu Aug 4 20:44:58 2011 From: arrayprofile at yahoo.com (array chip) Date: Thu, 4 Aug 2011 11:44:58 -0700 (PDT) Subject: [R] survival probability estimate method Message-ID: <1312483498.80250.YahooMailNeo@web125814.mail.ne1.yahoo.com> Hi, I was reading a paper published in JCO "Prediction of risk of distant recurrence using 21-gene recurrence score in node-negative and node-positive postmenopausal patients with breast cancer treated with anastrozole or tamoxifen: a TransATAC study" (ICO 2010 28: 1829). The author uses a method to estimate the 9-year risk of distant recurrence as a function of continuous recurrence score (RS). The method is special as author states: ? "To define the continuous relation between RS, as a linear covariate, and 9-year risk of distant recurrence, the logarithm of the baseline cumulative hazard function was fitted by constrained cubic splines with 3 df. These models tend to be more robust for prediction of survival probabilities and corresponding confidence limits at late follow-up time as a result of the modeling of the baseline cumulative hazard function by natural cubic splines (in contrast to using the crude hazard function itself)." ? Does R provide a package/function to do this particular method for estimating survival probability as a function of a continuous variable? Is the survest.cph() in rms package doing estimation with just the crude hazard function? ? Thanks very much! ? John From dimitri.liakhovitski at gmail.com Thu Aug 4 20:46:58 2011 From: dimitri.liakhovitski at gmail.com (Dimitri Liakhovitski) Date: Thu, 4 Aug 2011 14:46:58 -0400 Subject: [R] Efficient way of creating a shifted (lagged) variable? In-Reply-To: References: Message-ID: Thanks a lot, guys! It's really helpful. But - to be objective- it's still quite a few lines longer than in SPSS. Dimitri On Thu, Aug 4, 2011 at 2:36 PM, Daniel Nordlund wrote: > > >> -----Original Message----- >> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] >> On Behalf Of Dimitri Liakhovitski >> Sent: Thursday, August 04, 2011 8:24 AM >> To: r-help >> Subject: [R] Efficient way of creating a shifted (lagged) variable? >> >> Hello! >> >> I have a data set: >> set.seed(123) >> y<-data.frame(week=seq(as.Date("2010-01-03"), as.Date("2011-01- >> 31"),by="week")) >> y$var1<-c(1,2,3,round(rnorm(54),1)) >> y$var2<-c(10,20,30,round(rnorm(54),1)) >> >> # All I need is to create lagged variables for var1 and var2. I looked >> around a bit and found several ways of doing it. They all seem quite >> complicated - while in SPSS it's just a few letters (like LAG()). Here >> is what I've written but I wonder. It works - but maybe there is a >> very simple way of doing it in R that I could not find? >> I need the same for "lead" (opposite of lag). >> Any hint is greatly appreciated! >> >> ### The function I created: >> mylag <- function(x,max.lag=1){ ? # x has to be a 1-column data frame >> ? ?temp<- >> as.data.frame(embed(c(rep(NA,max.lag),x[[1]]),max.lag+1))[2:(max.lag+1)] >> ? ?for(i in 1:length(temp)){ >> ? ? ?names(temp)[i]<-paste(names(x),".lag",i,sep="") >> ? ? } >> ? return(temp) >> } >> >> ### Running mylag to get my result: >> myvars<-c("var1","var2") >> for(i in myvars) { >> ? y<-cbind(y,mylag(y[i]),max.lag=2) >> } >> (y) >> >> -- >> Dimitri Liakhovitski >> marketfusionanalytics.com >> > > Dimitri, > > I would first look into the zoo package as has already been suggested. ?However, if you haven't already got your solution then here are a couple of functions that might help you get started. ?I won't vouch for efficiency. > > > lag.fun <- function(df, x, max.lag=1) { > ?for(i in x) { > ? ?for(j in 1:max.lag){ > ? ? ?lagx <- paste(i,'.lag',j,sep='') > ? ? ?df[,lagx] <- c(rep(NA,j),df[1:(nrow(df)-j),i]) > ? ?} > ?} > ?df > } > > lead.fun <- function(df, x, max.lead=1) { > ?for(i in x) { > ? ?for(j in 1:max.lead){ > ? ? ?leadx <- paste(i,'.lead',j,sep='') > ? ? ?df[,leadx] <- c(df[(j+1):(nrow(df)),i],rep(NA,j)) > ? ?} > ?} > ?df > } > > y <- lag.fun(y,myvars,2) > y <- lead.fun(y,myvars,2) > > > Hope this is helpful, > > Dan > > Daniel Nordlund > Bothell, WA USA > > > -- Dimitri Liakhovitski marketfusionanalytics.com From michael.weylandt at gmail.com Thu Aug 4 21:01:38 2011 From: michael.weylandt at gmail.com (R. Michael Weylandt) Date: Thu, 4 Aug 2011 15:01:38 -0400 Subject: [R] Efficient way of creating a shifted (lagged) variable? In-Reply-To: References: Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From pjmiller_57 at yahoo.com Thu Aug 4 21:03:50 2011 From: pjmiller_57 at yahoo.com (Paul Miller) Date: Thu, 4 Aug 2011 12:03:50 -0700 (PDT) Subject: [R] Multiple endpoint (possibly group sequential) sample size calculation In-Reply-To: Message-ID: <1312484630.13217.YahooMailClassic@web161616.mail.bf1.yahoo.com> Hello everyone, I need to do a sample size calculation. The study two arms and two endpoints. The two arms are two different cancer drugs and the two endpoints reflect efficacy (based on progression free survival) and toxicity. Until now, I have been trying to understand this in terms of a one-arm design, where the acceptable rate of efficacy might be 0.40, the unacceptable rate of efficacy might be 0.20, the acceptable rate of non-toxicity might be 0.85, and the unacceptable rate of non-toxicity might be 0.65. Then one would pick an alpha for the probability of accepting a poor response, an alpha for the probability of accepting a toxic drug, and a beta for the probability of rejecting a good drug. I'm not really sure how that sort of thinking translates into a two-arm design though. Ideally, I'd like the calculation to be based on a group sequential design with two stages, but that's certainly not necessary, and I'd be very happy to learn how to do things both with and without this extra element. Any help with this would be greatly appreciated. Thanks, Paul From gleynes+r at gmail.com Thu Aug 4 21:20:38 2011 From: gleynes+r at gmail.com (Gene Leynes) Date: Thu, 4 Aug 2011 14:20:38 -0500 Subject: [R] General indexing in multidimensional arrays In-Reply-To: <4E3A7561.2030102@yahoo.de> References: <4E3673FD.5030800@yahoo.de> <4E39B6DD.1070606@yahoo.de> <4E3A7561.2030102@yahoo.de> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From djnordlund at frontier.com Thu Aug 4 21:22:41 2011 From: djnordlund at frontier.com (Daniel Nordlund) Date: Thu, 4 Aug 2011 12:22:41 -0700 Subject: [R] Efficient way of creating a shifted (lagged) variable? In-Reply-To: References: Message-ID: > -----Original Message----- > From: Dimitri Liakhovitski [mailto:dimitri.liakhovitski at gmail.com] > Sent: Thursday, August 04, 2011 11:47 AM > To: Daniel Nordlund; r-help > Subject: Re: [R] Efficient way of creating a shifted (lagged) variable? > > Thanks a lot, guys! > It's really helpful. But - to be objective- it's still quite a few > lines longer than in SPSS. > Dimitri > > On Thu, Aug 4, 2011 at 2:36 PM, Daniel Nordlund > wrote: > > > > > >> -----Original Message----- > >> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r- > project.org] > >> On Behalf Of Dimitri Liakhovitski > >> Sent: Thursday, August 04, 2011 8:24 AM > >> To: r-help > >> Subject: [R] Efficient way of creating a shifted (lagged) variable? > >> > >> Hello! > >> > >> I have a data set: > >> set.seed(123) > >> y<-data.frame(week=seq(as.Date("2010-01-03"), as.Date("2011-01- > >> 31"),by="week")) > >> y$var1<-c(1,2,3,round(rnorm(54),1)) > >> y$var2<-c(10,20,30,round(rnorm(54),1)) > >> > >> # All I need is to create lagged variables for var1 and var2. I looked > >> around a bit and found several ways of doing it. They all seem quite > >> complicated - while in SPSS it's just a few letters (like LAG()). Here > >> is what I've written but I wonder. It works - but maybe there is a > >> very simple way of doing it in R that I could not find? > >> I need the same for "lead" (opposite of lag). > >> Any hint is greatly appreciated! > >> > >> ### The function I created: > >> mylag <- function(x,max.lag=1){ # x has to be a 1-column data frame > >> temp<- > >> > as.data.frame(embed(c(rep(NA,max.lag),x[[1]]),max.lag+1))[2:(max.lag+1)] > >> for(i in 1:length(temp)){ > >> names(temp)[i]<-paste(names(x),".lag",i,sep="") > >> } > >> return(temp) > >> } > >> > >> ### Running mylag to get my result: > >> myvars<-c("var1","var2") > >> for(i in myvars) { > >> y<-cbind(y,mylag(y[i]),max.lag=2) > >> } > >> (y) > >> > >> -- > >> Dimitri Liakhovitski > >> marketfusionanalytics.com > >> > > > > Dimitri, > > > > I would first look into the zoo package as has already been suggested. > However, if you haven't already got your solution then here are a couple > of functions that might help you get started. I won't vouch for > efficiency. > > > > > > lag.fun <- function(df, x, max.lag=1) { > > for(i in x) { > > for(j in 1:max.lag){ > > lagx <- paste(i,'.lag',j,sep='') > > df[,lagx] <- c(rep(NA,j),df[1:(nrow(df)-j),i]) > > } > > } > > df > > } > > > > lead.fun <- function(df, x, max.lead=1) { > > for(i in x) { > > for(j in 1:max.lead){ > > leadx <- paste(i,'.lead',j,sep='') > > df[,leadx] <- c(df[(j+1):(nrow(df)),i],rep(NA,j)) > > } > > } > > df > > } > > > > y <- lag.fun(y,myvars,2) > > y <- lead.fun(y,myvars,2) > > > > > > > > > > > > Dimitri, I (and probably a lot of others on the list) don't know SPSS anymore. I haven't used it in 30 years. So, I don't know how you would use LAG() in SPSS to achieve what you want, and you didn't give us any example of how you would like to be able to use a lag function in your code. Without at least some pseudo code demonstrating the simple usage you are looking for, it is hard to give you code that works the way you want. That being said, you can always use SPSS. Dan Daniel Nordlund Bothell, WA USA From phhs80 at gmail.com Thu Aug 4 21:23:05 2011 From: phhs80 at gmail.com (Paul Smith) Date: Thu, 4 Aug 2011 20:23:05 +0100 Subject: [R] Automatic creation of binary logistic models Message-ID: Dear All, Suppose that you are trying to create a binary logistic model by trying different combinations of predictors. Has R got an automatic way of doing this, i.e., is there some way of automatically generating different tentative models and checking their corresponding AIC value? If so, could you please direct me to an example? Thanks in advance, Paul From anopheles123 at gmail.com Thu Aug 4 21:28:05 2011 From: anopheles123 at gmail.com (Weidong Gu) Date: Thu, 4 Aug 2011 15:28:05 -0400 Subject: [R] labelling a stacked barchart (lattice) In-Reply-To: <2038A337-718D-4F44-B468-93BF366FA0E8@comcast.net> References: <2038A337-718D-4F44-B468-93BF366FA0E8@comcast.net> Message-ID: Marc, Try this one barchart(data=dta, ~x, group=y, stack=T,col=sort(brewer.pal(7,"Purples")), xlab="Percent", box.width=.5, scales=list(tick.number=10), panel=function(x,y,...){ panel.barchart(x,y,...) panel.text(cumsum(x)-dta$x/2,y,labels=dta$x) panel.text(cumsum(x)-dta$x/2,1.3,labels=as.character(dta$y)) }) Weidong Gu On Wed, Aug 3, 2011 at 9:01 PM, M/K Zodet wrote: > All: > > Below is my code for creating a basic horizontal, stacked barchart. ?I'd like to label the plot in two ways: ?1) place the x values in each piece and 2) place the y values above each piece (angled). ?I'm currently using lattice, but I'm open to suggestions using ggplot2. > > Questions: > > 1. ?Can this be done?...I assume yes. ?So, what are the best options/functions for doing this. > 2. ?Is there a way to alter the transparency of the bar fill with the brewer palette? ?I know I can alter this w/ heat.~, topo.~, cm.colors, etc. > > Thanks in advance. > > Marc > Using R for Mac OS X GUI 1.40-devel Leopard build 64-bit > > > dta <- data.frame(x=c(46.0, 14.7, 16.4, 15.8, 7.0), y=c("Back", "Neck", "Extrem", "MuscSkel", "Oth")) > dta > barchart(data=dta, ~x, group=y, stack=T, col=sort(brewer.pal(7,"Purples")), > ? ? ? ? xlab="Percent", box.width=.5, scales=list(tick.number=10)) > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From jeremy.miles at gmail.com Thu Aug 4 21:28:23 2011 From: jeremy.miles at gmail.com (Jeremy Miles) Date: Thu, 4 Aug 2011 12:28:23 -0700 Subject: [R] Automatic creation of binary logistic models In-Reply-To: References: Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From emammendes at gmail.com Thu Aug 4 21:40:58 2011 From: emammendes at gmail.com (Eduardo M. A. M. Mendes) Date: Thu, 4 Aug 2011 16:40:58 -0300 Subject: [R] Sweave - landscape figure Message-ID: <922E1595-6562-4726-B5EB-9C3DA4BC6058@gmail.com> Dear R-users I am trying to understand how Sweave works by running some simple examples. In the example I am working with there is a chunk where the R-commands related to plotting a figure are placed. When running R CMD Sweave ? , pdflatex the output is a portrait figure. I wonder whether it would be possible to change the orientation to landscape (not in the latex file but in Rnw file). Many thanks Ed From michael.weylandt at gmail.com Thu Aug 4 21:58:44 2011 From: michael.weylandt at gmail.com (R. Michael Weylandt) Date: Thu, 4 Aug 2011 15:58:44 -0400 Subject: [R] General indexing in multidimensional arrays In-Reply-To: <4E3A7561.2030102@yahoo.de> References: <4E3673FD.5030800@yahoo.de> <4E39B6DD.1070606@yahoo.de> <4E3A7561.2030102@yahoo.de> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From murdoch.duncan at gmail.com Thu Aug 4 21:58:12 2011 From: murdoch.duncan at gmail.com (Duncan Murdoch) Date: Thu, 04 Aug 2011 15:58:12 -0400 Subject: [R] Sweave - landscape figure In-Reply-To: <922E1595-6562-4726-B5EB-9C3DA4BC6058@gmail.com> References: <922E1595-6562-4726-B5EB-9C3DA4BC6058@gmail.com> Message-ID: <4E3AF9D4.9010605@gmail.com> On 04/08/2011 3:40 PM, Eduardo M. A. M. Mendes wrote: > Dear R-users > > I am trying to understand how Sweave works by running some simple examples. In the example I am working with there is a chunk where the R-commands related to plotting a figure are placed. When running R CMD Sweave ? , pdflatex the output is a portrait figure. I wonder whether it would be possible to change the orientation to landscape (not in the latex file but in Rnw file). > Sweave can change the height and width of the figure so it is more landscape-shaped (width > height) using options at the start of the chunk. Rotating a figure is something LaTeX needs to do: you would tell Sweave to produce the figure but not include it, then use \includegraphics{} with the right option to rotate it. For example: <>= plot(rnorm(100)) @ \includegraphics[angle=90,width=0.8\textheight]{Myfig} This is untested, and you'll need to consult a LaTeX reference for rotating the figure caption, etc. Duncan Murdoch From jwiley.psych at gmail.com Thu Aug 4 22:02:02 2011 From: jwiley.psych at gmail.com (Joshua Wiley) Date: Thu, 4 Aug 2011 13:02:02 -0700 Subject: [R] Efficient way of creating a shifted (lagged) variable? In-Reply-To: References: Message-ID: On Aug 4, 2011, at 11:46, Dimitri Liakhovitski wrote: > Thanks a lot, guys! > It's really helpful. But - to be objective- it's still quite a few > lines longer than in SPSS. Not once you've sources the function! For the simple case of a vector, try: X <- 1:10 mylag2 <- function(X, lag) { c(rep(NA, length(seq(lag))), X[-seq(lag)]) } Though this does not work for lead, it is fairly short. Then you could use the *apply family if you needed it on multiple columns or vectors. Cheers, Josh > Dimitri > > On Thu, Aug 4, 2011 at 2:36 PM, Daniel Nordlund wrote: >> >> >>> -----Original Message----- >>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] >>> On Behalf Of Dimitri Liakhovitski >>> Sent: Thursday, August 04, 2011 8:24 AM >>> To: r-help >>> Subject: [R] Efficient way of creating a shifted (lagged) variable? >>> >>> Hello! >>> >>> I have a data set: >>> set.seed(123) >>> y<-data.frame(week=seq(as.Date("2010-01-03"), as.Date("2011-01- >>> 31"),by="week")) >>> y$var1<-c(1,2,3,round(rnorm(54),1)) >>> y$var2<-c(10,20,30,round(rnorm(54),1)) >>> >>> # All I need is to create lagged variables for var1 and var2. I looked >>> around a bit and found several ways of doing it. They all seem quite >>> complicated - while in SPSS it's just a few letters (like LAG()). Here >>> is what I've written but I wonder. It works - but maybe there is a >>> very simple way of doing it in R that I could not find? >>> I need the same for "lead" (opposite of lag). >>> Any hint is greatly appreciated! >>> >>> ### The function I created: >>> mylag <- function(x,max.lag=1){ # x has to be a 1-column data frame >>> temp<- >>> as.data.frame(embed(c(rep(NA,max.lag),x[[1]]),max.lag+1))[2:(max.lag+1)] >>> for(i in 1:length(temp)){ >>> names(temp)[i]<-paste(names(x),".lag",i,sep="") >>> } >>> return(temp) >>> } >>> >>> ### Running mylag to get my result: >>> myvars<-c("var1","var2") >>> for(i in myvars) { >>> y<-cbind(y,mylag(y[i]),max.lag=2) >>> } >>> (y) >>> >>> -- >>> Dimitri Liakhovitski >>> marketfusionanalytics.com >>> >> >> Dimitri, >> >> I would first look into the zoo package as has already been suggested. However, if you haven't already got your solution then here are a couple of functions that might help you get started. I won't vouch for efficiency. >> >> >> lag.fun <- function(df, x, max.lag=1) { >> for(i in x) { >> for(j in 1:max.lag){ >> lagx <- paste(i,'.lag',j,sep='') >> df[,lagx] <- c(rep(NA,j),df[1:(nrow(df)-j),i]) >> } >> } >> df >> } >> >> lead.fun <- function(df, x, max.lead=1) { >> for(i in x) { >> for(j in 1:max.lead){ >> leadx <- paste(i,'.lead',j,sep='') >> df[,leadx] <- c(df[(j+1):(nrow(df)),i],rep(NA,j)) >> } >> } >> df >> } >> >> y <- lag.fun(y,myvars,2) >> y <- lead.fun(y,myvars,2) >> >> >> Hope this is helpful, >> >> Dan >> >> Daniel Nordlund >> Bothell, WA USA >> >> >> > > > > -- > Dimitri Liakhovitski > marketfusionanalytics.com > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From antonio at piccolboni.info Thu Aug 4 20:04:15 2011 From: antonio at piccolboni.info (Antonio Piccolboni) Date: Thu, 4 Aug 2011 11:04:15 -0700 Subject: [R] input equivalent of capture.output Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From armstrwa at bc.edu Thu Aug 4 20:07:09 2011 From: armstrwa at bc.edu (William Armstrong) Date: Thu, 4 Aug 2011 11:07:09 -0700 (PDT) Subject: [R] Limited number of principal components in PCA In-Reply-To: <011301cc506b$5060b6a0$f12223e0$@edu> References: <1311964387395-3704956.post@n4.nabble.com> <011301cc506b$5060b6a0$f12223e0$@edu> Message-ID: <1312481229763-3719440.post@n4.nabble.com> David and Josh, Thank you for the suggestions. I have attached a file ('q_values.txt') that contains the values of the 'Q' variable. David -- I am attempting an 'S' mode PCA, where the columns are actually the cases (different stream gaging stations) and the rows are the variables (the maximum flow at each station for a given year). I think the format you are referring to is 'R' mode, but I was under the impression that R (the program, not the PCA mode) could handle the analyses in either format. Am I mistaken? My first eigenvalue is: > unrotated_pca_q$sdev[1]^2 [1] 17.77812 Does that value seem large enough to explain the reduction in principal components from 65 to 54? Also, the loadings on the first PC are not particularly high: > max(abs(unrotated_pca_q$rotation[1:84])) [1] 0.1794776 Does that suggest that maybe the data are not very highly correlated? Thank you both very much for your help. Billy http://r.789695.n4.nabble.com/file/n3719440/q_values.txt q_values.txt -- View this message in context: http://r.789695.n4.nabble.com/Limited-number-of-principal-components-in-PCA-tp3704956p3719440.html Sent from the R help mailing list archive at Nabble.com. From PJ at DCE.AU.DK Thu Aug 4 22:09:40 2011 From: PJ at DCE.AU.DK (Peter Jepsen) Date: Thu, 4 Aug 2011 22:09:40 +0200 Subject: [R] survival probability estimate method In-Reply-To: <1312483498.80250.YahooMailNeo@web125814.mail.ne1.yahoo.com> References: <1312483498.80250.YahooMailNeo@web125814.mail.ne1.yahoo.com> Message-ID: <3C94838831897E459067E57AA51295AB0929F36800@SIF.svf.au.dk> Dear John I am not aware of an R package that does this, but I believe that Patrick Royston's -stpm- function for Stata does. Here's two references found in http://www.stata-journal.com/sjpdf.html?articlenum=st0001_2: Royston, P. 2001. Flexible parametric alternatives to the Cox model. Stata Journal 1(1): 1-28. Royston, P. and M. K. B. Parmar. 2002. Flexible parametric-hazards and proportional odds models for censored survival data, with application to prognostic modelling and estimation of treatment effects. Statistics in Medicine 21: 2175-2197. Best regards, Peter. -----Oprindelig meddelelse----- Fra: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] P? vegne af array chip Sendt: 4. august 2011 20:45 Til: r-help Emne: [R] survival probability estimate method Hi, I was reading a paper published in JCO "Prediction of risk of distant recurrence using 21-gene recurrence score in node-negative and node-positive postmenopausal patients with breast cancer treated with anastrozole or tamoxifen: a TransATAC study" (ICO 2010 28: 1829). The author uses a method to estimate the 9-year risk of distant recurrence as a function of continuous recurrence score (RS). The method is special as author states: ? "To define the continuous relation between RS, as a linear covariate, and 9-year risk of distant recurrence, the logarithm of the baseline cumulative hazard function was fitted by constrained cubic splines with 3 df. These models tend to be more robust for prediction of survival probabilities and corresponding confidence limits at late follow-up time as a result of the modeling of the baseline cumulative hazard function by natural cubic splines (in contrast to using the crude hazard function itself)." ? Does R provide a package/function to do this particular method for estimating survival probability as a function of a continuous variable? Is the survest.cph() in rms package doing estimation with just the crude hazard function? ? Thanks very much! ? John ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. From marc_schwartz at me.com Thu Aug 4 22:09:24 2011 From: marc_schwartz at me.com (Marc Schwartz) Date: Thu, 04 Aug 2011 15:09:24 -0500 Subject: [R] Sweave - landscape figure In-Reply-To: <922E1595-6562-4726-B5EB-9C3DA4BC6058@gmail.com> References: <922E1595-6562-4726-B5EB-9C3DA4BC6058@gmail.com> Message-ID: On Aug 4, 2011, at 2:40 PM, Eduardo M. A. M. Mendes wrote: > Dear R-users > > I am trying to understand how Sweave works by running some simple examples. In the example I am working with there is a chunk where the R-commands related to plotting a figure are placed. When running R CMD Sweave ? , pdflatex the output is a portrait figure. I wonder whether it would be possible to change the orientation to landscape (not in the latex file but in Rnw file). > > Many thanks > > Ed You can use the lscape package by placing: \usepackage{lscape} in your LaTeX preamble in the .Rnw file. Then use: \begin{landscape} other code here \end{landscape} That way you can create a landscape oriented page within a document that might otherwise contain portrait orientation pages. HTH, Marc Schwartz From bodinsoul at gmail.com Thu Aug 4 17:55:05 2011 From: bodinsoul at gmail.com (Bogdan Lataianu) Date: Thu, 4 Aug 2011 08:55:05 -0700 (PDT) Subject: [R] How to see the previous commands after save workspace/load workspace ? Message-ID: <4071d228-64dd-4bd7-b3cf-36c2e5c1b7d6@j15g2000yqf.googlegroups.com> I did save workspace and when I load it, I can see the variables, using ls(). But I cannot see the commands from the program I saved. How to do that? From chihlin.chi at gmail.com Thu Aug 4 19:52:32 2011 From: chihlin.chi at gmail.com (Chih-Lin Chi) Date: Thu, 4 Aug 2011 13:52:32 -0400 Subject: [R] Apply sensitivity library to dummy variables Message-ID: I am using SRC function in?the sensitivity library. The independent variables (X) are a few dummy variables and the dependent variable (y) is numeric, ranged from 0,18 to 0.84. Example of independent variables are red coded as [0,0,0,1], green [0,0,1,0], blue [0,1,0,0], and yellow [1,0,0,0]. Error message occurs "Error in if (const(t, min(1e-08, mean(t, na.rm = TRUE)/1e+06))) { : missing value where TRUE/FALSE needed". Any suggestion? If I choose not to code red [0,0,0], and code only three groups, green [0,0,1], blue [0,1,0], and yellow [1,0,0]. SRC works. Do I interpret the result like "compared to red, the SRC score for green is 0.8"? Thanks for your help! From phhs80 at gmail.com Thu Aug 4 22:14:54 2011 From: phhs80 at gmail.com (Paul Smith) Date: Thu, 4 Aug 2011 21:14:54 +0100 Subject: [R] Automatic creation of binary logistic models In-Reply-To: References: Message-ID: Wonderful! Thanks, Jeremy. Is bestglm() also able of trying nonlinear transformations of the variables, say log(X1) for instance? Paul On Thu, Aug 4, 2011 at 8:28 PM, Jeremy Miles wrote: > > Sounds like you want a best subsets regression, the bestglm() function, > found in the bestglm() package will do the trick. > Jeremy > > On 4 August 2011 12:23, Paul Smith wrote: >> >> Dear All, >> >> Suppose that you are trying to create a binary logistic model by >> trying different combinations of predictors. Has R got an automatic >> way of doing this, i.e., is there some way of automatically generating >> different tentative models and checking their corresponding AIC value? >> If so, could you please direct me to an example? >> >> Thanks in advance, >> >> Paul >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > From padmanabhan.vijayan at gmail.com Thu Aug 4 18:03:43 2011 From: padmanabhan.vijayan at gmail.com (Vijayan Padmanabhan) Date: Thu, 4 Aug 2011 21:33:43 +0530 Subject: [R] random value generation with added constraint. Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From dimitri.liakhovitski at gmail.com Thu Aug 4 22:19:06 2011 From: dimitri.liakhovitski at gmail.com (Dimitri Liakhovitski) Date: Thu, 4 Aug 2011 16:19:06 -0400 Subject: [R] Efficient way of creating a shifted (lagged) variable? In-Reply-To: References: Message-ID: Thanks a lot for the recommendations - some of them I am implementing already. Just a clarification: the only reason I try to compare things to SPSS is that I am the only person in my office using R. Whenever I work on an R code my goal is not just to make it work, but also to "boast" to the SPSS users that it's much easier/faster/niftier in R. So, you are preaching to the choir here. Dimitri On Thu, Aug 4, 2011 at 4:02 PM, Joshua Wiley wrote: > > > On Aug 4, 2011, at 11:46, Dimitri Liakhovitski wrote: > >> Thanks a lot, guys! >> It's really helpful. But - to be objective- it's still quite a few >> lines longer than in SPSS. > > Not once you've sources the function! ?For the simple case of a vector, try: > > X <- 1:10 > mylag2 <- function(X, lag) { > ?c(rep(NA, length(seq(lag))), X[-seq(lag)]) > } > > Though this does not work for lead, it is fairly short. Then you could use the *apply family if you needed it on multiple columns or vectors. > > Cheers, > > Josh > >> Dimitri >> >> On Thu, Aug 4, 2011 at 2:36 PM, Daniel Nordlund wrote: >>> >>> >>>> -----Original Message----- >>>> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] >>>> On Behalf Of Dimitri Liakhovitski >>>> Sent: Thursday, August 04, 2011 8:24 AM >>>> To: r-help >>>> Subject: [R] Efficient way of creating a shifted (lagged) variable? >>>> >>>> Hello! >>>> >>>> I have a data set: >>>> set.seed(123) >>>> y<-data.frame(week=seq(as.Date("2010-01-03"), as.Date("2011-01- >>>> 31"),by="week")) >>>> y$var1<-c(1,2,3,round(rnorm(54),1)) >>>> y$var2<-c(10,20,30,round(rnorm(54),1)) >>>> >>>> # All I need is to create lagged variables for var1 and var2. I looked >>>> around a bit and found several ways of doing it. They all seem quite >>>> complicated - while in SPSS it's just a few letters (like LAG()). Here >>>> is what I've written but I wonder. It works - but maybe there is a >>>> very simple way of doing it in R that I could not find? >>>> I need the same for "lead" (opposite of lag). >>>> Any hint is greatly appreciated! >>>> >>>> ### The function I created: >>>> mylag <- function(x,max.lag=1){ ? # x has to be a 1-column data frame >>>> ? ?temp<- >>>> as.data.frame(embed(c(rep(NA,max.lag),x[[1]]),max.lag+1))[2:(max.lag+1)] >>>> ? ?for(i in 1:length(temp)){ >>>> ? ? ?names(temp)[i]<-paste(names(x),".lag",i,sep="") >>>> ? ? } >>>> ? return(temp) >>>> } >>>> >>>> ### Running mylag to get my result: >>>> myvars<-c("var1","var2") >>>> for(i in myvars) { >>>> ? y<-cbind(y,mylag(y[i]),max.lag=2) >>>> } >>>> (y) >>>> >>>> -- >>>> Dimitri Liakhovitski >>>> marketfusionanalytics.com >>>> >>> >>> Dimitri, >>> >>> I would first look into the zoo package as has already been suggested. ?However, if you haven't already got your solution then here are a couple of functions that might help you get started. ?I won't vouch for efficiency. >>> >>> >>> lag.fun <- function(df, x, max.lag=1) { >>> ?for(i in x) { >>> ? ?for(j in 1:max.lag){ >>> ? ? ?lagx <- paste(i,'.lag',j,sep='') >>> ? ? ?df[,lagx] <- c(rep(NA,j),df[1:(nrow(df)-j),i]) >>> ? ?} >>> ?} >>> ?df >>> } >>> >>> lead.fun <- function(df, x, max.lead=1) { >>> ?for(i in x) { >>> ? ?for(j in 1:max.lead){ >>> ? ? ?leadx <- paste(i,'.lead',j,sep='') >>> ? ? ?df[,leadx] <- c(df[(j+1):(nrow(df)),i],rep(NA,j)) >>> ? ?} >>> ?} >>> ?df >>> } >>> >>> y <- lag.fun(y,myvars,2) >>> y <- lead.fun(y,myvars,2) >>> >>> >>> Hope this is helpful, >>> >>> Dan >>> >>> Daniel Nordlund >>> Bothell, WA USA >>> >>> >>> >> >> >> >> -- >> Dimitri Liakhovitski >> marketfusionanalytics.com >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > -- Dimitri Liakhovitski marketfusionanalytics.com From murdoch.duncan at gmail.com Thu Aug 4 22:21:14 2011 From: murdoch.duncan at gmail.com (Duncan Murdoch) Date: Thu, 04 Aug 2011 16:21:14 -0400 Subject: [R] random value generation with added constraint. In-Reply-To: References: Message-ID: <4E3AFF3A.2010204@gmail.com> On 04/08/2011 12:03 PM, Vijayan Padmanabhan wrote: > Hi > I am looking at generating a random dataset of say 100 values fitting in a > normal distribution of a given mean and SD, I am aware of rnorm > function. However i am trying to build into this function one added > constraint that all the random value generated should also obey the > constraint that they only take values between say X to X+25 > How do i do this in R? The easiest way is to use the inverse-CDF method to generate values. For example: mu <- 50 sd <- 10 X <- 30 lower <- pnorm(X, mean=mu, sd=sd) upper <- pnorm(X+25, mean=mu, sd=sd) U <- runif(1000, lower, upper) Y <- qnorm(U, mean=mu, sd=sd) This will fail if you go too far out in the tail (e.g. trying mu=0, sd=1, X=30); for that you need to be more careful, and work with log probabilities, etc. Duncan Murdoch > Any help would be highly appreciated,. > Thanks > Vijayan Padmanabhan > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From marc_schwartz at me.com Thu Aug 4 22:35:33 2011 From: marc_schwartz at me.com (Marc Schwartz) Date: Thu, 4 Aug 2011 15:35:33 -0500 Subject: [R] Automatic creation of binary logistic models In-Reply-To: References: Message-ID: On Aug 4, 2011, at 2:23 PM, Paul Smith wrote: > Dear All, > > Suppose that you are trying to create a binary logistic model by > trying different combinations of predictors. Has R got an automatic > way of doing this, i.e., is there some way of automatically generating > different tentative models and checking their corresponding AIC value? > If so, could you please direct me to an example? > > Thanks in advance, > > Paul Hi Paul, If it were not for JSS going on at the moment, you would likely get a reply from Frank Harrell telling you why using this approach is not a good idea. This is tantamount to using a stepwise approach with variables going in and out of the model, based upon either AIC or perhaps Wald p values. If you search the R list archives using rseek.org with keywords such as "stepwise regression Harrell", you will see a plethora of discussions on this over the years. You might want to obtain a copy of Frank's book Regression Modeling Strategies along with Ewout Steyerberg's book Clinical Prediction Models, which cover this topic and offer alternative solutions to model development. These generally include the pre-specification of full models, considering how many covariate degrees of freedom you can reasonably include in the model and applying shrinkage/penalization. If you need to engage in data reduction, you might want to consider using the LASSO, as implemented in the glmnet package on CRAN. More information on this method is available at: http://www-stat.stanford.edu/~tibs/lasso.html. An alternative might be backward elimination, which Frank does touch on and covers in: http://biostat.mc.vanderbilt.edu/wiki/pub/Main/RmS/rms.pdf which is a supplement to his course. Automated creation of models ignores the expertise of both the statistician and subject matter experts, to the detriment of inference. Regards, Marc Schwartz From bt_jannis at yahoo.de Thu Aug 4 22:38:48 2011 From: bt_jannis at yahoo.de (Jannis) Date: Thu, 04 Aug 2011 22:38:48 +0200 Subject: [R] General indexing in multidimensional arrays In-Reply-To: References: <4E3673FD.5030800@yahoo.de> <4E39B6DD.1070606@yahoo.de> <4E3A7561.2030102@yahoo.de> Message-ID: <4E3B0358.8030906@yahoo.de> Thanks, Gene, for your hint! I indeed did not check any possible situation and my function was not returning what I intened it to return. This updated version, however, should. I am sure there are much easier ways (or ready made functions) to do the same. ind.datacube = function( ##title<< create logical index matrices for multidimensional datacubes datacube ##<< array: datacube from which to extract the subparts , logical.ind ##<< logical array: TRUE/FALSE index matrix for a subset of the dimensions ## of datacube. The size of logical.ind`s dimesnions has to match the ## sizes of the corresponding dimensions in datacube. , dims='auto' ##<< integer vector or 'auto' : indices of the dimensions in datacube corresponding ## to the dimensions of logical.ind. If set to 'auto' this matching is tried to ## be accomplished by comparing the sizes of the dimensions of the two objects. ) { if (sum(logical.ind) == 0) { stop('No TRUE value in index matrix!') } else { if (dims[1] == 'auto') { if (is.null(dim(logical.ind)[1])) { size.ind = length(logical.ind) logical.ind = matrix(logical.ind,ncol=1) } else { size.ind = dim(logical.ind) } dims = match(size.ind, dim(datacube)) if (sum(duplicated(size.ind)) > 0 || sum(duplicated(dims)) > 0 ) stop('dimensions do not match unambigously. Supply dims manually!') } dims.all <- setdiff(1:length(dim(datacube)),dims) ind.matrix.choice <- which(logical.ind, arr.ind = TRUE) dims.all.expand <- list() for (i in 1:length(dims.all)) dims.all.expand[[i]] <- 1:dim(datacube)[dims.all[i]] dims.all.grid <- as.matrix(do.call(expand.grid, dims.all.expand)) expgrid.dims.all <- as.matrix(do.call(expand.grid, dims.all.expand)) dims.all.mat <- matrix(rep(dims.all.grid,times=2),ncol=length(dims.all)) ind.matrix.all <- cbind(ind.matrix.choice[rep(1:dim(ind.matrix.choice)[1],each=dim(dims.all.grid)[1]),] , dims.all.mat) ind.matrix.ord <- ind.matrix.all[,order(c(dims,dims.all))] } colnames(ind.matrix.ord) <- paste('dim',1:length(dim(datacube)),sep='') ##value<< integer index matrix which can be used to index datacube ind.matrix.ord } data <- array(rnorm(64),dim=c(4,4,4)) indices <- matrix(FALSE,ncol=4,nrow=4) indices[1,3] <- TRUE indices[4,1] <- TRUE #result <- data[indices,] ind.datacube(data, indices, dims=c(1,2)) On 08/04/2011 09:20 PM, Gene Leynes wrote: > data<- array(rnorm(64),dim=c(4,4,4)) From lisajca at gmail.com Thu Aug 4 22:24:45 2011 From: lisajca at gmail.com (Lisa) Date: Thu, 4 Aug 2011 13:24:45 -0700 (PDT) Subject: [R] Text annotation of a graph Message-ID: <1312489485191-3719775.post@n4.nabble.com> Dear All, I am trying to add some text annotation to a graph using matplot() as follows: vars <- c("v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8", "v8", "v10") id <- seq(5.000, 0.001, by = -0.001) sid <- c(4.997, 3.901, 2.339, 0.176, 0.151, 0.101, 0.076, 0.051, 0.026, 0.001) rn <- sample(seq(0, 0.6, by = 0.001), 5000, replace = T) matplot(rbind(rn, rep(0, length(rn))), rbind(id, id), xlim = c(0, 1), type = "l", lty = 1, lwd = 1, col = 1, xlab = "", ylab = "", axes = F) abline(v = 0, lty = 2) axis(1) mtext(side = 2, text = c(vars), at = sid, las = 2, line = 0.8) axis(3) But the text annotation can not be displayed correctly, i.e., some of them stick together. Can anybody help me with this particular problem? Thanks in advance. Lisa -- View this message in context: http://r.789695.n4.nabble.com/Text-annotation-of-a-graph-tp3719775p3719775.html Sent from the R help mailing list archive at Nabble.com. From kmill117 at alaska.edu Thu Aug 4 22:38:27 2011 From: kmill117 at alaska.edu (Katharine Miller) Date: Thu, 4 Aug 2011 12:38:27 -0800 Subject: [R] randomForest partial dependence plot variable names Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From gleynes+r at gmail.com Thu Aug 4 22:40:56 2011 From: gleynes+r at gmail.com (Gene Leynes) Date: Thu, 4 Aug 2011 15:40:56 -0500 Subject: [R] General indexing in multidimensional arrays In-Reply-To: <4E3A7561.2030102@yahoo.de> References: <4E3673FD.5030800@yahoo.de> <4E39B6DD.1070606@yahoo.de> <4E3A7561.2030102@yahoo.de> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From michael.weylandt at gmail.com Thu Aug 4 22:54:54 2011 From: michael.weylandt at gmail.com (R. Michael Weylandt) Date: Thu, 4 Aug 2011 16:54:54 -0400 Subject: [R] How to see the previous commands after save workspace/load workspace ? In-Reply-To: <4071d228-64dd-4bd7-b3cf-36c2e5c1b7d6@j15g2000yqf.googlegroups.com> References: <4071d228-64dd-4bd7-b3cf-36c2e5c1b7d6@j15g2000yqf.googlegroups.com> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From marc_schwartz at me.com Thu Aug 4 22:57:35 2011 From: marc_schwartz at me.com (Marc Schwartz) Date: Thu, 04 Aug 2011 15:57:35 -0500 Subject: [R] Multiple endpoint (possibly group sequential) sample size calculation In-Reply-To: <1312484630.13217.YahooMailClassic@web161616.mail.bf1.yahoo.com> References: <1312484630.13217.YahooMailClassic@web161616.mail.bf1.yahoo.com> Message-ID: On Aug 4, 2011, at 2:03 PM, Paul Miller wrote: > > Hello everyone, > > I need to do a sample size calculation. The study two arms and two endpoints. The two arms are two different cancer drugs and the two endpoints reflect efficacy (based on progression free survival) and toxicity. > > Until now, I have been trying to understand this in terms of a one-arm design, where the acceptable rate of efficacy might be 0.40, the unacceptable rate of efficacy might be 0.20, the acceptable rate of non-toxicity might be 0.85, and the unacceptable rate of non-toxicity might be 0.65. Then one would pick an alpha for the probability of accepting a poor response, an alpha for the probability of accepting a toxic drug, and a beta for the probability of rejecting a good drug. > > I'm not really sure how that sort of thinking translates into a two-arm design though. > > Ideally, I'd like the calculation to be based on a group sequential design with two stages, but that's certainly not necessary, and I'd be very happy to learn how to do things both with and without this extra element. > > Any help with this would be greatly appreciated. > > Thanks, > > Paul Hi Paul, I am not clear that there is a current R package that will handle both of these considerations, though will stand to be corrected if wrong. A Google search using "sample size calculation multiple endpoints" yields some possible theoretical papers that might be helpful, such as: http://www.ncbi.nlm.nih.gov/pubmed/20687162 http://www.ncbi.nlm.nih.gov/pubmed/17674404 I have used the gsDesign package on CRAN to assist with group sequential designs with a single primary endpoint. There was just a review in this month's The American Statistician which covered several software implementations, including gsDesign: http://pubs.amstat.org/doi/abs/10.1198/tast.2011.10213 There is also a CRAN Task View that might be helpful: http://cran.r-project.org/web/views/ClinicalTrials.html My various books on group sequential designs are at home, so am unable to check them at the moment, but pretty sure that at least Jennison and Turnbull (1999) have a chapter on multiple endpoints. HTH, Marc Schwartz From vishalthapar at gmail.com Thu Aug 4 23:06:08 2011 From: vishalthapar at gmail.com (Vishal Thapar) Date: Thu, 4 Aug 2011 17:06:08 -0400 Subject: [R] Multi-task SVM in R Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From gleynes+r at gmail.com Thu Aug 4 23:15:12 2011 From: gleynes+r at gmail.com (Gene Leynes) Date: Thu, 4 Aug 2011 16:15:12 -0500 Subject: [R] General indexing in multidimensional arrays In-Reply-To: References: <4E3673FD.5030800@yahoo.de> <4E39B6DD.1070606@yahoo.de> <4E3A7561.2030102@yahoo.de> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From bouldinjr at gmail.com Thu Aug 4 23:17:01 2011 From: bouldinjr at gmail.com (Jim Bouldin) Date: Thu, 4 Aug 2011 17:17:01 -0400 Subject: [R] functions on rows or columns of two (or more) arrays Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From arrayprofile at yahoo.com Thu Aug 4 23:22:43 2011 From: arrayprofile at yahoo.com (array chip) Date: Thu, 4 Aug 2011 14:22:43 -0700 (PDT) Subject: [R] survival probability estimate method In-Reply-To: <1312483498.80250.YahooMailNeo@web125814.mail.ne1.yahoo.com> References: <1312483498.80250.YahooMailNeo@web125814.mail.ne1.yahoo.com> Message-ID: <1312492963.78325.YahooMailNeo@web125817.mail.ne1.yahoo.com> Hi all, the reference for this method was: ?Flexible parametric proportional-hazards and proportional-odds models for censored survival data, with application to prognostic modeling and estimation of treatment effects? published in Stat Med (2002) 21: 2175 ? The abstract is: ? Modelling of censored survival data is almost always done by Cox proportional-hazards regression. However, use of parametric models for such data may have some advantages. For example, non-proportional hazards, a potential difficulty with Cox models, may sometimes be handled in a simple way, and visualization of the hazard function is much easier. Extensions of the Weibull and log-logistic models are proposed in which natural cubic splines are used to smooth the baseline log cumulative hazard and log cumulative odds of failure functions. Further extensions to allow non-proportional effects of some or all of the covariates are introduced. A hypothesis test of the appropriateness of the scale chosen for covariate effects (such as of treatment) is proposed. The new models are applied to two data sets in cancer. The results throw interesting light on the behaviour of both the hazard function and the hazard ratio over time. The tools described here may be a step towards providing greater insight into the natural history of the disease and into possible underlying causes of clinical events. We illustrate these aspects by using the two examples in cancer. ? Hope this helps someone give me some hints how to do this in R. ? Thanks ? John ? ----- Original Message ----- From: array chip To: r-help Cc: Sent: Thursday, August 4, 2011 11:44 AM Subject: [R] survival probability estimate method Hi, I was reading a paper published in JCO "Prediction of risk of distant recurrence using 21-gene recurrence score in node-negative and node-positive postmenopausal patients with breast cancer treated with anastrozole or tamoxifen: a TransATAC study" (ICO 2010 28: 1829). The author uses a method to estimate the 9-year risk of distant recurrence as a function of continuous recurrence score (RS). The method is special as author states: ? "To define the continuous relation between RS, as a linear covariate, and 9-year risk of distant recurrence, the logarithm of the baseline cumulative hazard function was fitted by constrained cubic splines with 3 df. These models tend to be more robust for prediction of survival probabilities and corresponding confidence limits at late follow-up time as a result of the modeling of the baseline cumulative hazard function by natural cubic splines (in contrast to using the crude hazard function itself)." ? Does R provide a package/function to do this particular method for estimating survival probability as a function of a continuous variable? Is the survest.cph() in rms package doing estimation with just the crude hazard function? ? Thanks very much! ? John ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. From michael.weylandt at gmail.com Thu Aug 4 23:29:57 2011 From: michael.weylandt at gmail.com (R. Michael Weylandt) Date: Thu, 4 Aug 2011 17:29:57 -0400 Subject: [R] functions on rows or columns of two (or more) arrays In-Reply-To: References: Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From liulei at virginia.edu Thu Aug 4 23:36:39 2011 From: liulei at virginia.edu (Lei Liu) Date: Thu, 04 Aug 2011 17:36:39 -0400 Subject: [R] multi-dimensional Gaussian quadrature Message-ID: Hi there, Does anyone know if there is a package in R for multi-demensional Gaussian quadrature? I checked out the package "gaussquad" but it can only do 1D. Thanks! Lei Liu Associate Professor Division of Biostatistics Department of Public Health Sciences University of Virginia School of Medicine http://people.virginia.edu/~ll9f/ From WeganM at michigan.gov Thu Aug 4 23:45:44 2011 From: WeganM at michigan.gov (Wegan, Michael (DNRE)) Date: Thu, 4 Aug 2011 17:45:44 -0400 Subject: [R] matrix rows to single numeric element Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From arrayprofile at yahoo.com Thu Aug 4 23:49:08 2011 From: arrayprofile at yahoo.com (array chip) Date: Thu, 4 Aug 2011 14:49:08 -0700 (PDT) Subject: [R] survival probability estimate method In-Reply-To: <3C94838831897E459067E57AA51295AB0929F36800@SIF.svf.au.dk> References: <1312483498.80250.YahooMailNeo@web125814.mail.ne1.yahoo.com> <3C94838831897E459067E57AA51295AB0929F36800@SIF.svf.au.dk> Message-ID: <1312494548.35163.YahooMailNeo@web125811.mail.ne1.yahoo.com> Dear Peter, Thanks very much for the references. It seems the method is based on parametric proportional hazard models by incorporating cubic spline of the baseline hazards, not sure if tweaking survreg() would do? Best, John ----- Original Message ----- From: Peter Jepsen To: 'array chip' ; r-help Cc: Sent: Thursday, August 4, 2011 1:09 PM Subject: SV: [R] survival probability estimate method Dear John I am not aware of an R package that does this, but I believe that Patrick Royston's -stpm- function for Stata does. Here's two references found in http://www.stata-journal.com/sjpdf.html?articlenum=st0001_2: Royston, P. 2001. Flexible parametric alternatives to the Cox model. Stata Journal 1(1): 1-28. Royston, P. and M. K. B. Parmar. 2002. Flexible parametric-hazards and proportional odds models for censored survival data, with application to prognostic modelling and estimation of treatment effects. Statistics in Medicine 21: 2175-2197. Best regards, Peter. -----Oprindelig meddelelse----- Fra: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] P? vegne af array chip Sendt: 4. august 2011 20:45 Til: r-help Emne: [R] survival probability estimate method Hi, I was reading a paper published in JCO "Prediction of risk of distant recurrence using 21-gene recurrence score in node-negative and node-positive postmenopausal patients with breast cancer treated with anastrozole or tamoxifen: a TransATAC study" (ICO 2010 28: 1829). The author uses a method to estimate the 9-year risk of distant recurrence as a function of continuous recurrence score (RS). The method is special as author states: ? "To define the continuous relation between RS, as a linear covariate, and 9-year risk of distant recurrence, the logarithm of the baseline cumulative hazard function was fitted by constrained cubic splines with 3 df. These models tend to be more robust for prediction of survival probabilities and corresponding confidence limits at late follow-up time as a result of the modeling of the baseline cumulative hazard function by natural cubic splines (in contrast to using the crude hazard function itself)." ? Does R provide a package/function to do this particular method for estimating survival probability as a function of a continuous variable? Is the survest.cph() in rms package doing estimation with just the crude hazard function? ? Thanks very much! ? John ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. From michael.weylandt at gmail.com Thu Aug 4 23:53:29 2011 From: michael.weylandt at gmail.com (R. Michael Weylandt) Date: Thu, 4 Aug 2011 17:53:29 -0400 Subject: [R] matrix rows to single numeric element In-Reply-To: References: Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From michael.weylandt at gmail.com Fri Aug 5 00:06:07 2011 From: michael.weylandt at gmail.com (R. Michael Weylandt) Date: Thu, 4 Aug 2011 18:06:07 -0400 Subject: [R] General indexing in multidimensional arrays In-Reply-To: <4E3B150D.1040808@yahoo.de> References: <4E3673FD.5030800@yahoo.de> <4E39B6DD.1070606@yahoo.de> <4E3A7561.2030102@yahoo.de> <4E3B150D.1040808@yahoo.de> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From phhs80 at gmail.com Fri Aug 5 00:33:02 2011 From: phhs80 at gmail.com (Paul Smith) Date: Thu, 4 Aug 2011 23:33:02 +0100 Subject: [R] Automatic creation of binary logistic models In-Reply-To: References: Message-ID: On Thu, Aug 4, 2011 at 9:35 PM, Marc Schwartz wrote: >> Suppose that you are trying to create a binary logistic model by >> trying different combinations of predictors. Has R got an automatic >> way of doing this, i.e., is there some way of automatically generating >> different tentative models and checking their corresponding AIC value? >> If so, could you please direct me to an example? > > Hi Paul, > > If it were not for JSS going on at the moment, you would likely get a reply from Frank Harrell telling you why using this approach is not a good idea. This is tantamount to using a stepwise approach with variables going in and out of the model, based upon either AIC or perhaps Wald p values. > > If you search the R list archives using rseek.org with keywords such as "stepwise regression Harrell", you will see a plethora of discussions on this over the years. > > You might want to obtain a copy of Frank's book Regression Modeling Strategies along with Ewout Steyerberg's book Clinical Prediction Models, which cover this topic and offer alternative solutions to model development. These generally include the pre-specification of full models, considering how many covariate degrees of freedom you can reasonably include in the model and applying shrinkage/penalization. > > If you need to engage in data reduction, you might want to consider using the LASSO, as implemented in the glmnet package on CRAN. More information on this method is available at: http://www-stat.stanford.edu/~tibs/lasso.html. An alternative might be backward elimination, which Frank does touch on and covers in: > > ?http://biostat.mc.vanderbilt.edu/wiki/pub/Main/RmS/rms.pdf > > which is a supplement to his course. > > Automated creation of models ignores the expertise of both the statistician and subject matter experts, to the detriment of inference. Thanks, Marc, for your very useful reply. Paul From phhs80 at gmail.com Fri Aug 5 00:41:59 2011 From: phhs80 at gmail.com (Paul Smith) Date: Thu, 4 Aug 2011 23:41:59 +0100 Subject: [R] Can glmnet handle models with numeric and categorical data? Message-ID: Dear All, Can the x matrix in the glmnet() function of glmnet package be a data.frame with numeric columns and factor columns? I am asking this because I have a model with both numeric and categorical predictors, which I would like to study with glmnet. I have already tried to use a data.frame, but with no success -- as far as I know, the matrix object can only have data of a single type. Is there some way of circumventing this problem? Thanks in advance, Paul From marc_schwartz at me.com Fri Aug 5 01:02:55 2011 From: marc_schwartz at me.com (Marc Schwartz) Date: Thu, 4 Aug 2011 18:02:55 -0500 Subject: [R] Can glmnet handle models with numeric and categorical data? In-Reply-To: References: Message-ID: <7A2BCEAF-D51B-4D24-A353-5AA998A519EF@me.com> On Aug 4, 2011, at 5:41 PM, Paul Smith wrote: > Dear All, > > Can the x matrix in the glmnet() function of glmnet package be a > data.frame with numeric columns and factor columns? I am asking this > because I have a model with both numeric and categorical predictors, > which I would like to study with glmnet. I have already tried to use a > data.frame, but with no success -- as far as I know, the matrix object > can only have data of a single type. Is there some way of > circumventing this problem? > > Thanks in advance, > > Paul Hi Paul, My recollection is that you would use ?model.matrix on the data frame to create the requisite matrix input for glmnet(). The caution however, is that glmnet() standardizes the input covariates, which is not appropriate for factors. Thus, you would want to set 'standardize = FALSE' and use appropriate methods in pre-processing continuous variables. HTH, Marc Schwartz From phhs80 at gmail.com Fri Aug 5 01:30:59 2011 From: phhs80 at gmail.com (Paul Smith) Date: Fri, 5 Aug 2011 00:30:59 +0100 Subject: [R] Can glmnet handle models with numeric and categorical data? In-Reply-To: <7A2BCEAF-D51B-4D24-A353-5AA998A519EF@me.com> References: <7A2BCEAF-D51B-4D24-A353-5AA998A519EF@me.com> Message-ID: On Fri, Aug 5, 2011 at 12:02 AM, Marc Schwartz wrote: >> Can the x matrix in the glmnet() function of glmnet package be a >> data.frame with numeric columns and factor columns? I am asking this >> because I have a model with both numeric and categorical predictors, >> which I would like to study with glmnet. I have already tried to use a >> data.frame, but with no success -- as far as I know, the matrix object >> can only have data of a single type. Is there some way of >> circumventing this problem? > > My recollection is that you would use ?model.matrix on the data frame to create the requisite matrix input for glmnet(). > > The caution however, is that glmnet() standardizes the input covariates, which is not appropriate for factors. Thus, you would want to set 'standardize = FALSE' and use appropriate methods in pre-processing continuous variables. Again, Mark, thanks a lot for your so helpful answer -- I completely ignored model.matrix(). Paul From flodel at gmail.com Fri Aug 5 01:56:19 2011 From: flodel at gmail.com (Florent D.) Date: Thu, 4 Aug 2011 19:56:19 -0400 Subject: [R] functions on rows or columns of two (or more) arrays In-Reply-To: References: Message-ID: The apply function also works with multi-dimensional arrays, I think this is what you want to achieve using a 3d array: aaa <- array(NA, dim = c(2, dim(a))) aaa[1,,] <- a aaa[2,,] <- a2 apply(aaa, 3, function(x)lm(x[1,]~x[2,])) From vicvoncastle at gmail.com Fri Aug 5 01:57:58 2011 From: vicvoncastle at gmail.com (Ken H) Date: Thu, 4 Aug 2011 19:57:58 -0400 Subject: [R] Efficient way of creating a shifted (lagged) variable? In-Reply-To: References: Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From Mark.Ebbert at hci.utah.edu Thu Aug 4 23:24:13 2011 From: Mark.Ebbert at hci.utah.edu (Mark Ebbert) Date: Thu, 4 Aug 2011 15:24:13 -0600 Subject: [R] source() or OS X Lion? Message-ID: <5E2B2AAB-CD30-4A8D-BB6F-6825E4FDF285@hci.utah.edu> Dear R Gurus, I'm seeing some strange behavior that I can't explain. I'm generating a figure for a paper and I like to save the script (no matter how simple) for future reference. My practice is to write the script and run it using the 'source()' function. What's weird is that the resultant figure is not readable by OS X 10.7.0 (Lion). While trying to figure out what I did wrong, I discovered that typing the exact same code into the R prompt (running in Terminal) will produce the figure as I would expect it. The only idea I have is that something has changed in Lion that doesn't allow 'source()' to interpret it properly. Any ideas? Here is the exact code I'm using: 1 x<-read.delim("path/data.txt") 2 3 pdf("path/PaperFig-hist_of_perc_change-individ_samps-by_subtype.pdf") 4 histogram(~Total.Change,data=x,xlab="Percent Change") 5 dev.off() I appreciate any help. I'm especially curious if there's a Lion user who could give this a try. OS X 10.7.0 (Lion) R version 2.13.0 (2011-04-13) Copyright (C) 2011 The R Foundation for Statistical Computing ISBN 3-900051-07-0 Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) From bt_jannis at yahoo.de Thu Aug 4 23:54:21 2011 From: bt_jannis at yahoo.de (Jannis) Date: Thu, 04 Aug 2011 23:54:21 +0200 Subject: [R] General indexing in multidimensional arrays In-Reply-To: References: <4E3673FD.5030800@yahoo.de> <4E39B6DD.1070606@yahoo.de> <4E3A7561.2030102@yahoo.de> Message-ID: <4E3B150D.1040808@yahoo.de> Your function only works for the first dimensions (e.g. indices indicating the positions in the first two dimensions in datacube), correct? Otherwise it looks very handy! And certainly more elegent than my function monster! Jannis On 08/04/2011 09:58 PM, R. Michael Weylandt wrote: > Hi Jannis, > > Like Gene, I'm not entirely sure what your code is intended to do, but it > wasn't hard to adapt mine to at least print out the desired slice through > the higher-dimensional array: > > cubeValues<- function(Insert, Destination, Location) { > Location = as.matrix(Location) > Location = array(Location,dim(Destination)) > if (is.null(Insert)) { > Destination = round(Destination,3) > Destination[!Location] = NA > print(Destination) > return(invisible()) > } > Destination[Location]<- Insert > return(Destination) > } > > If Insert = NULL, it adopts a printing rather than value assigning behavior. > > > If you could indicate how you want the values when they come out, it's > pretty easy to adapt this to do whatever, but I can't just pull out > subarrays of arbitrary shape while keeping shape: e.g., > > x = matrix(1:4,2,2,byrow=T) > y = rbind(c(T,F),c(F,T)) > >> is(x[y]) > "vector" > > If you just want the values in a vector, take this version of my code: > > cubeValues<- function(Insert, Destination, Location) { > Location = as.matrix(Location) > Location = array(Location,dim(Destination)) > if (is.null(Insert)) { > return(Destination[Location]) > } > Destination[Location]<- Insert > return(Destination) > } > > It sounds like you've got what you want, but hopefully this will be of some > use to anyone who stumbles across this and, like Gene& myself, doesn't > really get your code. > > NB: I have not tested the provided code very much -- it relies on the > array() function to repeat Location as appropriate. If you know how R > repeats smaller arrays to make them fit big arrays, this should be fine for > you -- caveat code-or. > > Michael Weylandt > > PS -- The combinations() function of the gtools package might be of help to > you as well. We could get the entire example Gene got by > > ans = combinations(1:4,2,repeats.allowed=T) > rbind(cbind(ans,4),cbind(ans,1)) > > and it's probably not hard to simplify the entire code as desired. > > On Thu, Aug 4, 2011 at 6:33 AM, Jannis wrote: > >> Thanks, Michael. I was, however, after a function I coul use for both >> extracting and replacing subarrays. In case anybody else stumbles over this >> problem, here is my solution. Its programming is most probably horribly >> clumsy: >> >> >> ind.datacube = function( >> ##title<< create logical index matrices for multidimensional datacubes >> datacube ##<< array: datacube from which to extract the subparts >> , logical.ind ##<< logical array: TRUE/FALSE index matrix for a subset of >> the dimensions >> ## of datacube. The size of logical.ind`s dimesnions has >> to match the >> ## sizes of the corresponding dimesnions in datacube. >> , dims='auto' ##<< integer vector or 'auto' : indices of the dimensions >> in datacube corresponding >> ## to the dimensions of logical.ind. If set to 'auto' >> this matching is tried to >> ## be accomplished by comparing the sizes of the >> dimensions of the two objects. >> ) >> { >> if (sum(logical.ind) == 0) { >> stop('No TRUE value in index matrix!') >> } else { >> if (dims == 'auto') >> { >> if (is.null(dim(logical.ind)[1])) { >> size.ind = length(logical.ind) >> logical.ind = matrix(logical.ind,ncol=1) >> } else { >> size.ind = dim(logical.ind) >> } >> dims = match(size.ind, dim(datacube)) >> if (sum(duplicated(size.ind))> 0 || sum(duplicated(dims))> 0 ) >> stop('dimensions do not match unambigously. Supply dims >> manually!') >> } >> dims.nonapply<- setdiff(1:length(dim(datacube)**),dims) >> ind.matrix<- which(logical.ind, arr.ind = TRUE) >> >> args.expand.grid<- list() >> counter = 1 >> for (i in 1: length(dim(datacube))) >> { >> if (is.element(i,dims.nonapply)) { >> args.expand.grid[[i]] = 1:dim(datacube)[dims.nonapply[**i]] >> } else { >> args.expand.grid[[i]] = ind.matrix[,counter] >> counter = counter + 1 >> } >> } >> >> ind.all<- as.matrix(do.call(expand.grid, args.expand.grid)) >> ind.matrix<- ind.all[,order(c(dims.**nonapply,dims))] >> >> } >> ##value<< integer index matrix which can be used to index datacube >> ind.matrix >> >> } >> >> >> On 08/04/2011 12:12 AM, R. Michael Weylandt >> wrote: >> >>> This might be a little late: but how about this (slightly clumsy) >>>> function: >>>> >>>> putValues<- function(Insert, Destination, Location) { >>>> Location = as.matrix(Location) >>>> Location = array(Location,dim(**Destination)) >>>> Destination[Location]<- Insert >>>> return(Destination) >>>> } >>>> >>>> It currently assumes that the location array lines up in dimension order, >>>> but other than that seems to work pretty well. If you want, it shouldn't >>>> be >>>> hard to change it to take in a set of dimensions to arrange Location >>>> along. >>>> If you like any of the other suggested behaviors, you could put in a >>>> is.null(Insert) option that returns the desired subset of values. I >>>> haven't >>>> tested it completely, but for a few sample inputs, it seems be do as >>>> desired. >>>> >>>> Michael >>>> >>>> >>>> On Wed, Aug 3, 2011 at 5:00 PM, Jannis wrote: >>>> >>>> Thanks for all the replies!Unfortunately the solutions only work for >>>>> extracting subsets of the data (which was exactly what I was asking for) >>>>> and >>>>> not to replace subsets with other values. I used them, however, to >>>>> program a >>>>> rather akward function to do that. Seems I found one of the few aspects >>>>> where Matlab actually is slightly easier to use than R. >>>>> >>>>> >>>>> Thanks for your help! >>>>> Jannis >>>>> >>>>> On 08/01/2011 05:50 PM, Gene Leynes wrote: >>>>> >>>>> What do you think about this? >>>>>> apply(data, 3, '[', indices) >>>>>> >>>>>> >>>>>> On Mon, Aug 1, 2011 at 4:38 AM, Jannis wrote: >>>>>> >>>>>> Dear R community, >>>>>> >>>>>>> I have a general question regarding indexing in multidiemensional >>>>>>> arrays. >>>>>>> >>>>>>> Imagine I have a three dimensional array and I only want to extract on >>>>>>> vector along a single dimension from it: >>>>>>> >>>>>>> >>>>>>> data<- array(rnorm(64),dim=c(4,4,4)) >>>>>>> >>>>>>> result<- data[1,1,] >>>>>>> >>>>>>> If I want to extract more than one of these vectors, it would now >>>>>>> really >>>>>>> help me to supply a logical matrix of the size of the first two >>>>>>> dimensions: >>>>>>> >>>>>>> >>>>>>> indices<- matrix(FALSE,ncol=4,nrow=4) >>>>>>> indices[1,3]<- TRUE >>>>>>> indices[4,1]<- TRUE >>>>>>> >>>>>>> result<- data[indices,] >>>>>>> >>>>>>> This, however would give me an error. I am used to this kind of >>>>>>> indexing >>>>>>> from Matlab and was wonderingt whether there exists an easy way to do >>>>>>> this >>>>>>> in R without supplying complicated index matrices of all three >>>>>>> dimensions or >>>>>>> logical vectors of the size of the whole matrix? >>>>>>> >>>>>>> The only way I could imagine would be to: >>>>>>> >>>>>>> result<- data[rep(as.vector(indices),******times=4)] >>>>>>> >>>>>>> but this seems rather complicated and also depends on the order of the >>>>>>> dimensions I want to extract. >>>>>>> >>>>>>> >>>>>>> I do not want R to copy Matlabs behaviour, I am just wondering whether >>>>>>> I >>>>>>> missed one concept of indexing in R? >>>>>>> >>>>>>> >>>>>>> >>>>>>> Thanks a lot >>>>>>> Jannis >>>>>>> >>>>>>> ______________________________******________________ >>>>>>> R-help at r-project.org mailing list >>>>>>> https://stat.ethz.ch/mailman/******listinfo/r-help >>>>>>> >>>>>>> >>>>>>> >>>>>>> PLEASE do read the posting guide http://www.R-project.org/** >>>>>>> posting-guide.html>>>>>> **html >>>>>>> and provide commented, minimal, self-contained, reproducible code. >>>>>>> >>>>>>> >>>>>>> ______________________________****________________ >>>>> R-help at r-project.org mailing list >>>>> https://stat.ethz.ch/mailman/****listinfo/r-help >>>>> >>>>> PLEASE do read the posting guide http://www.R-project.org/** >>>>> posting-guide.html >>>>> and provide commented, minimal, self-contained, reproducible code. >>>>> >>>>> From sumukh.sathnur at gmail.com Fri Aug 5 00:54:29 2011 From: sumukh.sathnur at gmail.com (Sumukh Sathnur) Date: Thu, 04 Aug 2011 15:54:29 -0700 Subject: [R] assigning colors with color2D.matplot In-Reply-To: <4E36722B.1080204@bitwrit.com.au> References: <4E33364A.8010309@gmail.com> <4E36722B.1080204@bitwrit.com.au> Message-ID: <4E3B2325.2010303@gmail.com> Jim, Thanks for the response! It turns out that using color2D.matplot actually eliminates my need to rotate the matrix at all. However, I'm not quite sure I understand the arguments for assigning colors to the cells. Using image and image.plot, I used col= and breaks= to define a certain color to a specific range of values (in my case, shades of gray). However, I just can't figure out how to do this in color2D.matplot. I mostly just don't really understand cs1 cs2 and cs3, and I'm more or less lost on how to reproduce the same sort of plot. I've only been using R for about a month, so I hope that explains my lack of knowledge...if you could help me out it would be much appreciated. Thanks, Sumukh On 8/1/2011 2:30 AM, Jim Lemon wrote: > On 07/30/2011 08:38 AM, Sumukh Sathnur wrote: >> Hi all, >> >> I used image.plot() to create a heat map of a matrix: >> >> as.matrix(read.table("Matrix.txt", sep="\t"))->x >> HeatBrk<-seq(5,25,2.5) >> MyCol= gray((7:0)/7) >> library(fields) >> image.plot(x, col=MyCol, breaks=HeatBrk, legend.shrink=0.3) >> >> dev.copy(device=pdf, file="HEAT4!.pdf", height=8, width=8) >> dev.off() >> >> >> >> There are a few things that I would like to do that I can't seem to find >> help with online: >> >> 1) Add axes to the bottom and left that go from 1:ncol(x) ; in this >> case, 104. Every time I try to add axes in some form or another they >> either go through the center of the image or do not show up at all. >> >> 2) Limit the scale of the legend from the minimum value to the maximum >> value that I assign a color to; as of now it goes from the minimum value >> (0) to the maximum value (300) but I would like it to stop at 25. >> >> 3) Rotate the resulting image 90 degrees to the right. Is there a >> generic way to do this? everything I have found online is case-specific >> and extremely complicated... >> > Hi Sumukh, > 1) Have a look at color2D.matplot in the plotrix package. > 2) Have a look at color.legend in the plotrix package > 3) Use the t (transpose) command > > Jim > -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: Matrix.txt URL: From sseely at thetus.com Thu Aug 4 22:54:49 2011 From: sseely at thetus.com (Scott Seely) Date: Thu, 4 Aug 2011 13:54:49 -0700 Subject: [R] Running a column loop through the Moran.I function. Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From michael.weylandt at gmail.com Fri Aug 5 02:36:44 2011 From: michael.weylandt at gmail.com (R. Michael Weylandt ) Date: Thu, 4 Aug 2011 20:36:44 -0400 Subject: [R] Running a column loop through the Moran.I function. In-Reply-To: References: Message-ID: <45592C3E-D823-4E69-8B12-9EC828793139@gmail.com> I'm on my phone so I can't verify this but have you tried the apply function? apply(attri,2,Moran.I,Locate.dists.inv,NA.rm=T) Michael Weylandt On Aug 4, 2011, at 4:54 PM, Scott Seely wrote: > Dear R users, > > I have two data frames that consist of statistical information for most > countries around the world. One dataframe consists of the latitude and > longitude ("coord.csv") of each country, while the other consists of 100's > of different attributes ("countryattri.csv") for each country (like, GDP, > Population, etc.). The data is organized with a header and then countries > down the first column. I'm trying to run a spatial autocorrelation for each > column or attribute in the "attri" dataframe. The process for running this > on a single column is as follows: > > example of "coord.csv" dataframe: > > country, lat, long > Albania, 41.00, 20.00 > Algeria, 28.00, 3.00 > Angola, -12.30, 18.30 > > example of "countryattri.csv" dataframe: > > country, GDP, Population > Albania, 100000, 20000 > Algeria, 1200000, 300000 > Angola, 1300000, 300000 > >> Locate<-read.csv("coord.csv", header = TRUE, sep = ",") >> Locate.dists<-as.matrix(dist(cbind(Locate$long, Locate$lat))) >> Locate.dists.inv<-1/((Locate.dists)^2) >> diag(Locate.dists.inv)<-0 >> attri<-read.csv("countryattri.csv", header = TRUE, sep = ",") >> Moran.I(attri$GDP, Locate.dists.inv, na.rm = TRUE) > > This gives you Moran's I (correlation coefficient) for the GDP column in the > "attri" dataframe. I am trying to run the Moran.I function as a loop for all > columns in the "attri" dataframe so that I can obtain Moran coefficients for > every attribute. > > I've tried a number of different approaches but cannot get a column loop to > run through this function. I would be very grateful if someone could help me > with this problem. > > Thank you, > > *Scott Seely* > p.s. the Moran.I function is a part of the "ape" package. > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From ted.harding at wlandres.net Fri Aug 5 02:44:13 2011 From: ted.harding at wlandres.net (Ted Harding) Date: Fri, 5 Aug 2011 01:44:13 +0100 Subject: [R] Not really off-topic ... Message-ID: Greetings all! Just arrived via my daily digest of the ALLSTAT list is the following: http://www.youtube.com/StatisticalSongs#p/u/4/JEYLfIVvR9I If you watch/listen, you will see/hear that it is not at all off-topic. And now to bed, before my correspondence is analysed. Best wishes, Ted. -------------------------------------------------------------------- E-Mail: (Ted Harding) Fax-to-email: +44 (0)870 094 0861 Date: 05-Aug-11 Time: 01:44:07 ------------------------------ XFMail ------------------------------ From murdoch.duncan at gmail.com Fri Aug 5 03:22:06 2011 From: murdoch.duncan at gmail.com (Duncan Murdoch) Date: Thu, 04 Aug 2011 21:22:06 -0400 Subject: [R] source() or OS X Lion? In-Reply-To: <5E2B2AAB-CD30-4A8D-BB6F-6825E4FDF285@hci.utah.edu> References: <5E2B2AAB-CD30-4A8D-BB6F-6825E4FDF285@hci.utah.edu> Message-ID: <4E3B45BE.1040606@gmail.com> On 11-08-04 5:24 PM, Mark Ebbert wrote: > Dear R Gurus, > > I'm seeing some strange behavior that I can't explain. I'm generating a figure for a paper and I like to save the script (no matter how simple) for future reference. My practice is to write the script and run it using the 'source()' function. What's weird is that the resultant figure is not readable by OS X 10.7.0 (Lion). While trying to figure out what I did wrong, I discovered that typing the exact same code into the R prompt (running in Terminal) will produce the figure as I would expect it. The only idea I have is that something has changed in Lion that doesn't allow 'source()' to interpret it properly. > > Any ideas? Here is the exact code I'm using: > 1 x<-read.delim("path/data.txt") > 2 > 3 pdf("path/PaperFig-hist_of_perc_change-individ_samps-by_subtype.pdf") > 4 histogram(~Total.Change,data=x,xlab="Percent Change") > 5 dev.off() > > I appreciate any help. I'm especially curious if there's a Lion user who could give this a try. I don't have Lion, but it would be helpful to know what "not readable" means. If you try to open the file in Preview, what happens? Duncan Murdoch > > OS X 10.7.0 (Lion) > R version 2.13.0 (2011-04-13) > Copyright (C) 2011 The R Foundation for Statistical Computing > ISBN 3-900051-07-0 > Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From dylan.beaudette at gmail.com Fri Aug 5 05:05:21 2011 From: dylan.beaudette at gmail.com (Dylan Beaudette) Date: Thu, 4 Aug 2011 20:05:21 -0700 Subject: [R] source() or OS X Lion? In-Reply-To: <5E2B2AAB-CD30-4A8D-BB6F-6825E4FDF285@hci.utah.edu> References: <5E2B2AAB-CD30-4A8D-BB6F-6825E4FDF285@hci.utah.edu> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From matt.curcio.ri at gmail.com Fri Aug 5 05:19:10 2011 From: matt.curcio.ri at gmail.com (Matt Curcio) Date: Thu, 4 Aug 2011 23:19:10 -0400 Subject: [R] Which is more efficient? Message-ID: Greetings all, I am curious to know if either of these two sets of code is more efficient? Example1: ## t-test ## colA <- temp [ , j ] colB <- temp [ , k ] ttr <- t.test ( colA, colB, var.equal=TRUE) tt_pvalue [ i ] <- ttr$p.value or Example2: tt_pvalue [ i ] <- t.test ( temp[ , j ], temp[ , k ], var.equal=TRUE) ------------- I have three loops, i, j, k. One to test the all of files in a directory. One to tease out column and compare it by means of t-test to column in each of the files. --------------- for ( i in 1:num_files ) { temp <- read.table ( files_to_test [ i ], header=TRUE, sep="\t") num_cols <- ncol ( temp ) ## Define Columns To Compare ## for ( j in 2 : num_cols ) { for ( k in 3 : num_cols ) { ## t-test ## colA <- temp [ , j ] colB <- temp [ , k ] ttr <- t.test ( colA, colB, var.equal=TRUE) tt_pvalue [ i ] <- ttr$p.value } } } -------------------------------- I am a novice writer of code and am interested to hear if there are any (dis)advantages to one way or the other. M Matt Curcio M: 401-316-5358 E: matt.curcio.ri at gmail.com From rolf.turner at xtra.co.nz Fri Aug 5 05:35:31 2011 From: rolf.turner at xtra.co.nz (Rolf Turner) Date: Fri, 05 Aug 2011 15:35:31 +1200 Subject: [R] source() or OS X Lion? In-Reply-To: <5E2B2AAB-CD30-4A8D-BB6F-6825E4FDF285@hci.utah.edu> References: <5E2B2AAB-CD30-4A8D-BB6F-6825E4FDF285@hci.utah.edu> Message-ID: <4E3B6503.7060605@xtra.co.nz> Note that ***histogram()*** (as opposed to "hist()") is a function from the "lattice" package. So at some stage you must have issued the command "require(lattice)" or equivalently "library(lattice)". (So your ``exact code'' is a little misleading.) You are thus getting bitten by the fact that the output of lattice plot functions must be explicitly *printed* when called from within a function such as source(). See fortune("line 800"). They needn't be explicitly printed when called from the command line, which is why you are getting the figure you expect by typing the code at the R prompt. So put print(histogram(~Total.Change,data=x,xlab="Percent Change")) in the file that you are source()-ing and all will be in harmony in the universe. Nothing really to do either with source() or with OS X Lion. cheers, Rolf Turner P. S. Apropos of nothing (but speaking of lions) everyone should read ``The Lion of Boaz-Jachin and Jachin-Boaz'' by Russell Hoban. :-) R. T. On 05/08/11 09:24, Mark Ebbert wrote: > Dear R Gurus, > > I'm seeing some strange behavior that I can't explain. I'm generating a figure for a paper and I like to save the script (no matter how simple) for future reference. My practice is to write the script and run it using the 'source()' function. What's weird is that the resultant figure is not readable by OS X 10.7.0 (Lion). While trying to figure out what I did wrong, I discovered that typing the exact same code into the R prompt (running in Terminal) will produce the figure as I would expect it. The only idea I have is that something has changed in Lion that doesn't allow 'source()' to interpret it properly. > > Any ideas? Here is the exact code I'm using: > 1 x<-read.delim("path/data.txt") > 2 > 3 pdf("path/PaperFig-hist_of_perc_change-individ_samps-by_subtype.pdf") > 4 histogram(~Total.Change,data=x,xlab="Percent Change") > 5 dev.off() > > I appreciate any help. I'm especially curious if there's a Lion user who could give this a try. > > OS X 10.7.0 (Lion) > R version 2.13.0 (2011-04-13) > Copyright (C) 2011 The R Foundation for Statistical Computing > ISBN 3-900051-07-0 > Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) From michael.weylandt at gmail.com Fri Aug 5 05:37:28 2011 From: michael.weylandt at gmail.com (R. Michael Weylandt) Date: Thu, 4 Aug 2011 23:37:28 -0400 Subject: [R] Efficient way of creating a shifted (lagged) variable? In-Reply-To: References: Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From michael.weylandt at gmail.com Fri Aug 5 05:56:11 2011 From: michael.weylandt at gmail.com (R. Michael Weylandt) Date: Thu, 4 Aug 2011 23:56:11 -0400 Subject: [R] Which is more efficient? In-Reply-To: References: Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From asaguiar at spsconsultoria.com Fri Aug 5 06:09:05 2011 From: asaguiar at spsconsultoria.com (Alexandre Aguiar) Date: Fri, 5 Aug 2011 01:09:05 -0300 Subject: [R] boolean SEXP interpretation upon function return Message-ID: <201108050109.06283@spsconsultoria.com> Hi, When a function returns a SEXP of type LGLSXP (logical) to signal whether it succeeded or failed, how is it intrepreted? Is it like C where SUCCESS = 0 or other value? Thanks. -- Alexandre -- Alexandre Santos Aguiar, MD, SCT -------------- Pr?xima Parte ---------- Um anexo n?o-texto foi limpo... Nome: n?o dispon?vel Tipo: application/pgp-signature Tamanho: 198 bytes Descri??o: This is a digitally signed message part. URL: From jrkrideau at yahoo.ca Fri Aug 5 06:19:27 2011 From: jrkrideau at yahoo.ca (John Kane) Date: Thu, 4 Aug 2011 21:19:27 -0700 Subject: [R] Not really off-topic ... In-Reply-To: Message-ID: <1312517967.97197.YahooMailClassic@web38401.mail.mud.yahoo.com> You clearly have some strange ... err, whatever. --- On Thu, 8/4/11, Ted Harding wrote: > From: Ted Harding > Subject: [R] Not really off-topic ... > To: r-help at stat.math.ethz.ch > Received: Thursday, August 4, 2011, 8:44 PM > Greetings all! > Just arrived via my daily digest of the ALLSTAT list > is the following: > > http://www.youtube.com/StatisticalSongs#p/u/4/JEYLfIVvR9I > > If you watch/listen, you will see/hear that it is not > at all off-topic. > > And now to bed, before my correspondence is analysed. > > Best wishes, > Ted. > > -------------------------------------------------------------------- > E-Mail: (Ted Harding) > Fax-to-email: +44 (0)870 094 0861 > Date: 05-Aug-11? ? ? ? ? ? > ? ? ? ? ? ? ? ? > ? ? ? ? ???Time: > 01:44:07 > > From kebennett at alaska.edu Fri Aug 5 03:14:16 2011 From: kebennett at alaska.edu (Katrina Bennett) Date: Thu, 4 Aug 2011 17:14:16 -0800 Subject: [R] Translate Sine Function in R? Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From Mark.Ebbert at hci.utah.edu Fri Aug 5 05:48:02 2011 From: Mark.Ebbert at hci.utah.edu (Mark Ebbert) Date: Thu, 4 Aug 2011 21:48:02 -0600 Subject: [R] source() or OS X Lion? In-Reply-To: <4E3B6503.7060605@xtra.co.nz> References: <5E2B2AAB-CD30-4A8D-BB6F-6825E4FDF285@hci.utah.edu> <4E3B6503.7060605@xtra.co.nz> Message-ID: For pete's sake, I knew that. I apologize for wasting everyone's time. I tell ya, this has been an off week for me. Thank you for your kind responses. On Aug 4, 2011, at 9:35 PM, Rolf Turner wrote: > > Note that ***histogram()*** (as opposed to "hist()") is a function from the > "lattice" package. So at some stage you must have issued the command > "require(lattice)" or equivalently "library(lattice)". (So your ``exact > code'' > is a little misleading.) > > You are thus getting bitten by the fact that the output of lattice > plot functions must be explicitly *printed* when called from within > a function such as source(). See fortune("line 800"). They needn't > be explicitly printed when called from the command line, which is > why you are getting the figure you expect by typing the code at the > R prompt. > > So put > > print(histogram(~Total.Change,data=x,xlab="Percent Change")) > > in the file that you are source()-ing and all will be in harmony in the > universe. > > Nothing really to do either with source() or with OS X Lion. > > cheers, > > Rolf Turner > > P. S. Apropos of nothing (but speaking of lions) everyone should read > ``The Lion of Boaz-Jachin and Jachin-Boaz'' by Russell Hoban. :-) > > R. T. > > On 05/08/11 09:24, Mark Ebbert wrote: >> Dear R Gurus, >> >> I'm seeing some strange behavior that I can't explain. I'm generating a figure for a paper and I like to save the script (no matter how simple) for future reference. My practice is to write the script and run it using the 'source()' function. What's weird is that the resultant figure is not readable by OS X 10.7.0 (Lion). While trying to figure out what I did wrong, I discovered that typing the exact same code into the R prompt (running in Terminal) will produce the figure as I would expect it. The only idea I have is that something has changed in Lion that doesn't allow 'source()' to interpret it properly. >> >> Any ideas? Here is the exact code I'm using: >> 1 x<-read.delim("path/data.txt") >> 2 >> 3 pdf("path/PaperFig-hist_of_perc_change-individ_samps-by_subtype.pdf") >> 4 histogram(~Total.Change,data=x,xlab="Percent Change") >> 5 dev.off() >> >> I appreciate any help. I'm especially curious if there's a Lion user who could give this a try. >> >> OS X 10.7.0 (Lion) >> R version 2.13.0 (2011-04-13) >> Copyright (C) 2011 The R Foundation for Statistical Computing >> ISBN 3-900051-07-0 >> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit) > From djmuser at gmail.com Fri Aug 5 07:19:08 2011 From: djmuser at gmail.com (Dennis Murphy) Date: Thu, 4 Aug 2011 22:19:08 -0700 Subject: [R] functions on rows or columns of two (or more) arrays In-Reply-To: References: Message-ID: Hi: Here's one approach: a=matrix(1:50,nrow=10) a2=floor(jitter(a,amount=50)) # Write a function to combine the columns of interest # into a data frame and fit a linear model regfn <- function(k) { rdf <- data.frame(x = a[k, ], y = a2[k, ]) lm(y ~ x, data = rdf) } # Use lapply() to run regfn() recursively along # the rows of a and a2: modlist <- lapply(seq_len(nrow(a)), regfn) # I prefer plyr for extraction of output from a list of models. # Here are a few examples: library('plyr') # Extract the R^2 values ldply(modlist, function(m) summary(m)$r.squared) # Extract the residuals laply(modlist, function(m) resid(m)) # Extract the estimated model coefficients ldply(modlist, function(m) coef(m)) # Extract the coefficient summary tables as a list llply(modlist, function(m) summary(m)$coefficients) In the anonymous functions, the argument m refers to an arbitrary lm object, so you can do to it what you would with any given lm object; all you're doing is abstracting the process. HTH, Dennis On Thu, Aug 4, 2011 at 2:17 PM, Jim Bouldin wrote: > I realize this should be simple, but even after reading over the several > help pages several times, I still cannot decide between the myriad "apply" > functions to address it. ?I simply want to apply a function to all the rows > (or columns) of the same index from two (or more) identically sized arrays > (or data frames). > > For example: > >> a=matrix(1:50,nrow=10) >> a2=floor(jitter(a,amount=50)) >> a > ? ? ?[,1] [,2] [,3] [,4] [,5] > ?[1,] ? ?1 ? 11 ? 21 ? 31 ? 41 > ?[2,] ? ?2 ? 12 ? 22 ? 32 ? 42 > ?[3,] ? ?3 ? 13 ? 23 ? 33 ? 43 > ?[4,] ? ?4 ? 14 ? 24 ? 34 ? 44 > ?[5,] ? ?5 ? 15 ? 25 ? 35 ? 45 > ?[6,] ? ?6 ? 16 ? 26 ? 36 ? 46 > ?[7,] ? ?7 ? 17 ? 27 ? 37 ? 47 > ?[8,] ? ?8 ? 18 ? 28 ? 38 ? 48 > ?[9,] ? ?9 ? 19 ? 29 ? 39 ? 49 > [10,] ? 10 ? 20 ? 30 ? 40 ? 50 >> a2 > ? ? ?[,1] [,2] [,3] [,4] [,5] > ?[1,] ? 31 56 -29 -13 10 > ?[2,] ? 38 ? 61 ? 71 ? 55 ? ?9 > ?[3,] ?-29 ? 38 ? 47 ? 12 ? 38 > ?[4,] ? 12 ? ?2 ? 43 ? 39 ? 93 > ?[5,] ?-43 ? 23 ?-23 ? 62 ? ?1 > ?[6,] ?-13 ? 61 ? 55 ? 11 ? ?2 > ?[7,] ?-42 ? ?1 ? 38 ? 12 ? ?8 > ?[8,] ?-13 ? -6 ?-18 ? 16 ? 95 > ?[9,] ?-19 ? -2 ? 78 ? 33 ? ?1 > [10,] ? 20 -16 -11 19 17 > > if I try the following for example: > apply(a,1,function(x) lm(a~a2)) > > I get 10 identical repeats (except for the list indexer) of the following: > > [[1]] > > Call: > lm(formula = a ~ a2) > > Coefficients: > ? ? ? ? ? ? [,1] ? ? ? [,2] ? ? ? [,3] ? ? ? [,4] ? ? ? [,5] > (Intercept) ? 8.372135 ?18.372135 ?28.372135 ?38.372135 ?48.372135 > a21 ? ? ? ? ?-0.006163 ?-0.006163 ?-0.006163 ?-0.006163 ?-0.006163 > a22 ? ? ? ? ?-0.093390 ?-0.093390 ?-0.093390 ?-0.093390 ?-0.093390 > a23 ? ? ? ? ? 0.009315 ? 0.009315 ? 0.009315 ? 0.009315 ? 0.009315 > a24 ? ? ? ? ?-0.015143 ?-0.015143 ?-0.015143 ?-0.015143 ?-0.015143 > a25 ? ? ? ? ?-0.026761 ?-0.026761 ?-0.026761 ?-0.026761 ?-0.026761 > > ...Which is clearly very wrong, in a number of ways. ?If I try by columns: > apply(a,2,function(x) lm(a~a2)) > ...I get exactly the same result. > > So, which is the appropriate apply-type function when two arrays (or > d.f.'s?) are involved like this? Or none of them and some other approach > (other than looping which I can do but which I assume is not optimal)? > Thanks for any help. > -- > Jim Bouldin, PhD > Research Ecologist > > ? ? ? ?[[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From lamprianou at yahoo.com Fri Aug 5 07:58:52 2011 From: lamprianou at yahoo.com (Iasonas Lamprianou) Date: Thu, 4 Aug 2011 22:58:52 -0700 (PDT) Subject: [R] Latent Class with covariates In-Reply-To: References: Message-ID: <1312523932.87169.YahooMailNeo@web120604.mail.ne1.yahoo.com> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From mackay at northnet.com.au Fri Aug 5 08:07:16 2011 From: mackay at northnet.com.au (Duncan Mackay) Date: Fri, 05 Aug 2011 16:07:16 +1000 Subject: [R] Sweave - landscape figure Message-ID: <201108050609.p75690hb030148@mail16.tpgi.com.au> Hi Eduardo in the preamble put \usepackage[figureright]{rotating} see manual for figureright if you do not like it and then some graphics with options where needed \begin{sidewaysfigure} \centering \includegraphics[width=,% clip=true,% trim=0in 0in 0in 0in,% LBRT keepaspectratio=true]% {filename} \end{sidewaysfigure} otherwise \usepackage landscape (check spelling) for a full page HTH Duncan Duncan Mackay Department of Agronomy and Soil Science University of New England ARMIDALE NSW 2351 Email: home mackay at northnet.com.au At 05:58 05/08/2011, you wrote: >On 04/08/2011 3:40 PM, Eduardo M. A. M. Mendes wrote: >>Dear R-users >> >>I am trying to understand how Sweave works by >>running some simple examples. In the example I >>am working with there is a chunk where the >>R-commands related to plotting a figure are >>placed. When running R CMD Sweave , pdflatex >>the output is a portrait figure. I wonder >>whether it would be possible to change the >>orientation to landscape (not in the latex file but in Rnw file). > >Sweave can change the height and width of the >figure so it is more landscape-shaped (width > >height) using options at the start of the chunk. > >Rotating a figure is something LaTeX needs to >do: you would tell Sweave to produce the figure >but not include it, then use \includegraphics{} >with the right option to rotate it. > >For example: > ><>= >plot(rnorm(100)) >@ > >\includegraphics[angle=90,width=0.8\textheight]{Myfig} > >This is untested, and you'll need to consult a >LaTeX reference for rotating the figure caption, etc. > >Duncan Murdoch > >______________________________________________ >R-help at r-project.org mailing list >https://stat.ethz.ch/mailman/listinfo/r-help >PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >and provide commented, minimal, self-contained, reproducible code. From petr.pikal at precheza.cz Fri Aug 5 08:40:36 2011 From: petr.pikal at precheza.cz (Petr PIKAL) Date: Fri, 5 Aug 2011 08:40:36 +0200 Subject: [R] Odp: How to see the previous commands after save workspace/load workspace ? In-Reply-To: <4071d228-64dd-4bd7-b3cf-36c2e5c1b7d6@j15g2000yqf.googlegroups.com> References: <4071d228-64dd-4bd7-b3cf-36c2e5c1b7d6@j15g2000yqf.googlegroups.com> Message-ID: Hi > > I did save workspace and when I load it, I can see the variables, > using ls(). > But I cannot see the commands from the program I saved. How to do > that? Perhaps you can check .Rhistory file. Regards Petr > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From djmuser at gmail.com Fri Aug 5 09:07:38 2011 From: djmuser at gmail.com (Dennis Murphy) Date: Fri, 5 Aug 2011 00:07:38 -0700 Subject: [R] Which is more efficient? In-Reply-To: References: Message-ID: Hi: Your question about efficiency does not seem well-posed to me. Efficient relative to what criterion? Rather than to address your question directly, I'll show how different possible situations that could arise in the general context of your problem can be addressed. One of the first rules in R programming is to learn the concepts of vectorization and indexing. This saves a lot of code down the line. R is not C(++) or Java, and it shouldn't be programmed as though it were. As a result, iterative approaches to problem solving in R are usually, but not always, inefficient. R has many vectorized functions which should be used whenever possible. Usually, the apply family of functions or one of the summarization packages (notably data.table, doBy and plyr, although there are others) can be exploited to recursively apply a function to different subsets of data. Consider three different situations below in which one might want to apply a t-test. Only one uses iteration. I'm using the plyr package because it is most flexible in terms of the types of input and output objects it can process. Let's start by manufacturing some matrix data: ## function to generate a matrix mgen <- function() matrix(rnorm(50), nrow = 10) ## use replicate() to generate an array marr <- replicate(4, mgen()) # a 10 x 5 x 4 array marr # A matrix of column indices to use in t.test() tcols <- matrix(c(1, 2, 1, 3, 1, 4, 1, 5), ncol = 2, byrow = TRUE) colnames(tcols) <- c('i', 'j') tcols # ------------------------ # Situation 1: multiple matrices, test the same pair # of columns in each, in this case 2 and 4. # The input argument m is a matrix. A data frame is # returned because that's what the adply() function in # the plyr package expects as output (a = array input, # d = data frame output) tfun1 <- function(m) { v <- t.test(m[, 2], m[, 4], var.equal = TRUE) data.frame(tstat = v$statistic, pval = v$p.value) } # adply takes the input array marr, iterates over the third index # and applies tfun1 to each marginal matrix res1 <- adply(marr, 3, tfun1) res1 # ------------------------ # Situation 2: one matrix, test multiple pairs of columns mat <- mgen() # generate a single matrix tfun2 <- function(i, j) { v <- t.test(mat[, i], mat[, j], var.equal = TRUE) data.frame(tstat = v$statistic, pval = v$p.value) } # mdply() takes the matrix of column indices as its first # argument. Notice that tfun2 was written so that its # arguments are i and j, the column names of tcols. # This is required, and the order matters. For each # row of tcols, the function tfun2 is applied to the # matrix mat. res2 <- mdply(tcols, tfun2) res2 # ------------------- # Situation 3: n matrices, different pairs of columns # tested in each # The idea is to perform a t-test on different pairs of # columns in each submatrix of marr. # The simplest thing to do in this situation is to # iterate, although there is probably some clever way to # do this using nested apply family calls. The reason for # iteration is that we want to operate on the same # relevant index of *both* marr and tcols. It's possible to # use mapply() for this task, but that would take more # explanation and this is long-winded enough. outmat <- matrix(NA, nrow = nrow(tcols), ncol = 4) for(k in seq_len(nrow(tcols))) { mat <- marr[, , k] # take k-th submatrix of marr cols <- tcols[k, ] # take k-th row of tcols v <- t.test(mat[, cols[1]], mat[, cols[2]], var.equal = TRUE) outmat[k, ] <- c(cols[1], cols[2], v$statistic, v$p.value) } colnames(outmat) <- c('col1', 'col2', 'tstat', 'pval') outmat Notice that the type of input matters, so the way in which the data are arranged has much to do with the way you program in R, especially with the apply family of functions and their offshoots in different packages. The basic programming strategy is to write a utility function that works for a generic subset of the input data, and then use one of the **ply() functions or functions in the apply family to map the function to different data subsets. HTH, Dennis On Thu, Aug 4, 2011 at 8:19 PM, Matt Curcio wrote: > Greetings all, > I am curious to know if either of these two sets of code is more efficient? > > Example1: > ?## t-test ## > colA <- temp [ , j ] > colB <- temp [ , k ] > ttr <- t.test ( colA, colB, var.equal=TRUE) > tt_pvalue [ i ] <- ttr$p.value > > or > Example2: > tt_pvalue [ i ] <- t.test ( temp[ , j ], temp[ , k ], var.equal=TRUE) > ------------- > I have three loops, i, j, k. > One to test the all of files in a directory. ?One to tease out > column and compare it by means of t-test to column in each of > the files. > --------------- > for ( i in 1:num_files ) { > ? temp <- read.table ( files_to_test [ i ], header=TRUE, sep="\t") > ? num_cols <- ncol ( temp ) > ? ## Define Columns To Compare ## > ? for ( j in 2 : num_cols ) { > ? ? ?for ( k in 3 : num_cols ) { > ? ? ? ? ?## t-test ## > ? ? ? ? ?colA <- temp [ , j ] > ? ? ? ? ?colB <- temp [ , k ] > ? ? ? ? ?ttr <- t.test ( colA, colB, var.equal=TRUE) > ? ? ? ? ?tt_pvalue [ i ] <- ttr$p.value > ? ? ?} > ? } > } > -------------------------------- > I am a novice writer of code and am interested to hear if there are > any (dis)advantages to one way or the other. > M > > > Matt Curcio > M: 401-316-5358 > E: matt.curcio.ri at gmail.com > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From petr.pikal at precheza.cz Fri Aug 5 09:14:31 2011 From: petr.pikal at precheza.cz (Petr PIKAL) Date: Fri, 5 Aug 2011 09:14:31 +0200 Subject: [R] Odp: Translate Sine Function in R? In-Reply-To: References: Message-ID: Hi Are you sure about sine fit? Seems to me that logistics would be better fit<-nls(dat ~ SSlogis(x, Asym, xmid,scal), data = dat.df, start = list(Asym=90, xmid = 75, scal = -6)) plot(dat.df) lines(dat.df$x[complete.cases(dat.df)], predict(fit)) Regards Petr > > Hello, I'm trying to generate a sine wave in R to fit my observations using > the general formula: > > y=a*sin(b[x+h*pi)]+k > > where a = amplitude, b=period, h=phase shift, and k=vertical shift > > I want to use following translation to bring the sine function up onto the > y-axis to range from 0-1, and this will place the wave on the x-axis from > 0-pi/2. > > y=1/2sin(2[x+ 1/4*pi]) + 1/2 > > Additionally, I need to spread this along a x-axis that spans 1-153 (days). > > Can anyone help with this? I seem to be able to use the curve function fine, > but entering the translations doesn't seem to provide an answer. > > Here is an example of the data set I am trying to 'match' using this > function. > > dat <- > c(75.44855206,NA,NA,NA,82.70745342,82.5335019,88.56617647,80.00128866,94. > 15418227,86.63987539,93.91052952,74.10612245,86.62289562,90. > 47961047,NA,NA,82.45320197,72.14371257,NA,71.44104803,72.59742896,68. > 36363636,NA,NA,61,NA,NA,71.26502909,NA,85.93333333,84.34248284,79. > 00522193,79.64223058,97.2074017,88.43700548,96.40413877,95.13511869,92. > 57379057,93.97498475,NA,97.55995131,89.53321146,97.21728545,93.21980198, > 77.54054054,95.85392575,86.25684723,97.55325624,80.03950617,NA,91. > 34023128,92.42906574,88.59433962,65.77272727,89.63772455,NA,NA,NA,NA,74. > 86344239,83.57594937,70.22516556,65.30543319,NA,NA,67.84852294,60. > 90909091,54.79303797,NA,52.18735363,33.47003155,NA,41.34693878,24. > 5047043,NA,NA,NA,NA,9.944444444,13.6875,NA,11.90267176,84.14285714,3. > 781456954,NA,1.432926829,4.26557377,1.823529412,0.444620253,4. > 711155378,NA,6.320284698,0.581632653,0.144578313,3.666666667,0,0,0,0,0,NA, > 0.032947462,0,0,10.54545455,0,NA,0.561007958,0.75,NA,0.048780488,0. > 74137931,NA,2.023339318,0,0,0,NA,NA,0.156950673,NA,0.283769634,32. > 81818182,NA,NA,0,NA,0,0,0,NA,0.212454212,3.120181406,NA,0.011811024,NA,0, > 0.120430108,5.928571429,1.75,0.679292929,0.97,NA,0,NA,NA,1,0.38547486,NA, > 1.460732984,0.007795889,0.05465288,0.004341534) > plot(dat/100) > par(new=F) > x.seq <- seq(100, 0, , 153) > y <- ??? #*y = 2 sin 2?? (x - 1/4)* or y ~ a + c*sin(x+b) > > However, I can't find a reference for the no place for k. Also, I've tried a > lot of different iterations, but can't seem to figure out how to do this in > R. > > Any thoughts or ideas on this? > > Thank you, > > Katrina > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From maechler at stat.math.ethz.ch Fri Aug 5 09:45:27 2011 From: maechler at stat.math.ethz.ch (Martin Maechler) Date: Fri, 5 Aug 2011 09:45:27 +0200 Subject: [R] Can glmnet handle models with numeric and categorical data? In-Reply-To: References: <7A2BCEAF-D51B-4D24-A353-5AA998A519EF@me.com> Message-ID: <20027.40855.91997.454268@stat.math.ethz.ch> >>>>> "PS" == Paul Smith >>>>> on Fri, 5 Aug 2011 00:30:59 +0100 writes: PS> On Fri, Aug 5, 2011 at 12:02 AM, Marc Schwartz PS> wrote: >>> Can the x matrix in the glmnet() function of glmnet >>> package be a data.frame with numeric columns and factor >>> columns? I am asking this because I have a model with >>> both numeric and categorical predictors, which I would >>> like to study with glmnet. I have already tried to use a >>> data.frame, but with no success -- as far as I know, the >>> matrix object can only have data of a single type. Is >>> there some way of circumventing this problem? >> >> My recollection is that you would use ?model.matrix on >> the data frame to create the requisite matrix input for >> glmnet(). >> >> The caution however, is that glmnet() standardizes the >> input covariates, which is not appropriate for >> factors. Thus, you would want to set 'standardize = >> FALSE' and use appropriate methods in pre-processing >> continuous variables. PS> Again, Mark, thanks a lot for your so helpful answer -- PS> I completely ignored model.matrix(). Note the following: As soon as you use "categorical predictors", i.e., factors, and particularly when these have many levels (instead of just being binary), the resulting model matrix is often sparse, i.e. contains many zeros. When the matrix is ``really sparse',say, #{zeros} / #{non-zeros} >= 10 it can pay much to use the sparse matrices that the 'Matrix' package provides (you have 'Matrix' as part of your R installation). For exactly this reason, 'glmnet' has supported the use of sparse matrices for a long time, and we have provided the convenience function sparse.model.matrix() {package 'Matrix'} for easy construction of such matrices. There's also a very small extension package 'MatrixModels' which goes one step further, with its function model.Matrix(..... sparse = TRUE/FALSE) but you would not need that for using the sparseMatrix in 'glmnet'. -- Martin Maechler, ETH Zurich From jszhao at yeah.net Fri Aug 5 10:05:31 2011 From: jszhao at yeah.net (Jinsong Zhao) Date: Fri, 05 Aug 2011 16:05:31 +0800 Subject: [R] [Bug 14647] profile.mle can not get correct result In-Reply-To: <20110805065628.00FF99703DE@ix.urbanek.info> References: <20110805065628.00FF99703DE@ix.urbanek.info> Message-ID: <4E3BA44B.8090609@yeah.net> Thank you very much. now, i call mle(minuslogl=loglik, start=start, method <<- method, fixed=list()) in the mle.wrap() function, and the profile.mle() worked. however, it created a variable named "method" in user workspace. if there had been a variable with same name, then the value of that variable would be destroyed. Is there a way to avoid that happen? Thanks again. Regards, Jinsong On 2011-8-5 14:56, r-bugs at r-project.org wrote: > https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=14647 > > Brian Ripley changed: > > What |Removed |Added > ---------------------------------------------------------------------------- > Status|NEW |CLOSED > Resolution| |INVALID > > --- Comment #1 from Brian Ripley 2011-08-04 11:21:41 EDT --- > This is not a bug in R, just in your understanding of scoping. > > Please review the R FAQ and only use R-bugs for things you 'know for > certain' should work. > > On Thu, 4 Aug 2011, r-bugs at r-project.org wrote: > >> https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=14647 >> >> Summary: profile.mle can not get correct result >> Product: R >> Version: R 2.13.1 >> Platform: ix86 (32-bit) >> OS/Version: Windows 32-bit >> Status: NEW >> Severity: enhancement >> Priority: P5 >> Component: S4methods >> AssignedTo: R-core at R-project.org >> ReportedBy: jszhao at yeah.net >> Estimated Hours: 0.0 >> >> >> Hi there, >> >> I hope to wrap mle() in a function, just like the following: >> >> mle.wrap<- function(x, n, r, method = "BFGS") { >> loglik<- function(alpha, beta) { >> P<- pnorm(alpha + beta * x) >> -(sum(r * log(P)) + sum((n - r) * log(1-P))) >> } >> start<- list(alpha = 0, beta = 0) >> mle(minuslogl = loglik, start = start, method = method, fixed = list()) >> } >> >> Then I call this function: >> >> x<- c(100, 56, 32, 18, 10, 1) >> r<- c(18, 17, 10, 6, 4, 3) >> n<- c(18, 22, 17, 21, 23, 20) >> >> z<- mle.wrap(x, n, r) >> >>> profile(z,1) >> An object of class "profile.mle" >> Slot "profile": >> $alpha >> z par.vals.alpha par.vals.beta >> 1 0 -1.17522678 0.03674818 >> >> >> Slot "summary": >> Maximum likelihood estimation >> >> Call: >> mle(minuslogl = loglik, start = start, method = method, fixed = list()) >> >> Coefficients: >> Estimate Std. Error >> alpha -1.17522678 0.21572863 >> beta 0.03674818 0.00656062 >> >> -2 log L: 111.1682 >> >> The result of profile(z,1) was not correct. I tried to track the bug (or >> feature?), and found that: >> >> pfit<- tryCatch(eval.parent(call, 2L), error = identity) >> >> give the following error message: >> >> >> >> Therefore, I thought that >> call$method<- fitted at method >> should be add into the profile.mle at, e.g., after >> call$minuslogl<- fitted at minuslogl >> >> >> Regards, >> Jinsong >> >> -- >> Configure bugmail: https://bugs.r-project.org/bugzilla3/userprefs.cgi?tab=email >> ------- You are receiving this mail because: ------- >> You are the assignee for the bug. >> >> _______________________________________________ >> R-core list: https://stat.ethz.ch/mailman/listinfo/r-core >> > From tz at looper.hu Fri Aug 5 09:51:27 2011 From: tz at looper.hu (=?UTF-8?B?VMOzdGggWm9sdMOhbg==?=) Date: Fri, 5 Aug 2011 00:51:27 -0700 Subject: [R] random value generation with added constraint. In-Reply-To: References: Message-ID: Hi, If I understand correctly, you can just simply keep generating norm values till you don't get the 100 appropriate one. I am pretty new to R, but here you go with a function that does this: rnorm25 <- function(mean,sd,X){ i = 0; ret = c() while ( i < 100 ){ r = rnorm(1,mean,sd) if ( r >= X & r <= X + 25 ){ ret = c(ret,r) i = i + 1 } } print(ret) } # test rnorm25(10,20,17) Cheers, zoltanctoth On Thu, Aug 4, 2011 at 9:03 AM, Vijayan Padmanabhan wrote: > Hi > I am looking at generating a random dataset of say 100 values fitting in a > normal distribution of a given mean and SD, I am aware of rnorm > function. However i am trying to build into this function one added > constraint that all the random value generated should also obey the > constraint that they only take values between say X to X+25 > How do i do this in R? > Any help would be highly appreciated,. > Thanks > Vijayan Padmanabhan > > ? ? ? ?[[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From jokel.meyer at googlemail.com Fri Aug 5 10:32:56 2011 From: jokel.meyer at googlemail.com (Jokel Meyer) Date: Fri, 5 Aug 2011 10:32:56 +0200 Subject: [R] Using NCBI E-Utilities in R to extract data from PubMed Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From joe.stata at gmail.com Fri Aug 5 10:40:25 2011 From: joe.stata at gmail.com (joe j) Date: Fri, 5 Aug 2011 10:40:25 +0200 Subject: [R] matrix into vector with vertex names Message-ID: Using Igraph, I create shortest paths, then convert the matrix into three column vectors - "vertex1", "vertex2", "shortestpath" - as the code below shows. #code for generating shortest path matrix and creating a 3 columns from an igraph graph object "y" y_s<-shortest.paths(y, weights = NULL) y_s <- melt(y_s)[melt(upper.tri(y_s))$value,] #Step 2: this is where the trouble with memory occurs y_s[,1] <- V(y)$name[y_s[,1]] y_s[,2] <- V(y)$name[y_s[,2]] names(y_s)<-c("vertex1", "vertex2", "shortestpath") However I am looking for an alternative way of doing this becase at the second step I run into a fight with my machine's memory. I know I can create vectors using as.vector(), c(), etc, but I am not able to create the two other columns with vertex names. Best regards, Joe. From jokel.meyer at googlemail.com Fri Aug 5 10:45:27 2011 From: jokel.meyer at googlemail.com (Jokel Meyer) Date: Fri, 5 Aug 2011 10:45:27 +0200 Subject: [R] Main-effect of categorical variables in meta-analysis (metafor) Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From JRadinger at gmx.at Fri Aug 5 11:02:20 2011 From: JRadinger at gmx.at (Johannes Radinger) Date: Fri, 05 Aug 2011 11:02:20 +0200 Subject: [R] Regression with ranges and displaying them in an XY-Plot Message-ID: <20110805090220.127520@gmx.net> Hello UseRs, just some additional questions to my post below: I have now an idea how I want to solve my question with the intervals as input for regression. a probable approach Y <- dependent X1 <- first independent X2 <- runif(X2a, X2b) and then the standard approach lm(Y~X1+X2) now just some questions: * how can I use runif(X2a, X2b) so that a whole vector is calculated? Do I have to use it in a for-loop. If I have defined the variables X2a and X2b and try runif(X2a, X2b) only the first value is calculated. *how can I repeat the lm function several (lets say 10000) times, so that every time a new random number is generated and the regression is calculated. And how can that be then summarized, like the mean regression coeeficients etc? I don't know what is the appropriate tool for that. thank you /Johannes Message: 11 Date: Thu, 28 Jul 2011 14:49:00 +0200 From: "Johannes Radinger" To: r-help at r-project.org Subject: [R] Regression with ranges and displaying them in an XY-Plot Message-ID: <20110728124900.198380 at gmx.net> Content-Type: text/plain; charset="utf-8" Hello UseRs, I've got 3 variables, the dependent variable Y as well as a max and a min value of the independent variable (Xa and Xb) where in some cases Xa=Xb (so actually a single value for X). First I'd like to perform a regression, but my problem is that my X is a range (acutally a censored independent variable Xa-Xb) rather then one single value. I know already some possible approaches like Bayesian Regression or EM algorithms but I've to dig deeper into that...But any suggestions so far? Another question arises: How can I display the ranges in a XY-Plot to just have a look at the data and possibly overlay it with the resulting regression line? How can I do that in R? I'd like to just display a line for every Xa-Xb pair. An idea was also to use the color for the lines depending on the range-length (the longer the more grey, the shorter the more black). Maybe you can give me some tips or some simple sample codes. Thank you very much /Johannes -- -- From lianpeng82 at gmail.com Fri Aug 5 11:34:49 2011 From: lianpeng82 at gmail.com (janus) Date: Fri, 5 Aug 2011 17:34:49 +0800 Subject: [R] A problem of is.list function Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From petr.pikal at precheza.cz Fri Aug 5 11:45:07 2011 From: petr.pikal at precheza.cz (Petr PIKAL) Date: Fri, 5 Aug 2011 11:45:07 +0200 Subject: [R] Odp: A problem of is.list function In-Reply-To: References: Message-ID: Hi > > Hi all > > I want to use function of is.list and is.data.frame in if-else statement, > but I get a trouble. To replicate this trouble, I run codes like that: > > > set.seed(123) > > x <- rnorm(100) > > x <- data.frame(matrix(x, 10, 10)) > > class(x) > [1] "data.frame" > > is.list(x) > [1] TRUE > > is.data.frame(x) > [1] TRUE > > > version > _ > platform i386-pc-mingw32 > arch i386 > os mingw32 > system i386, mingw32 > status > major 2 > minor 13.1 > year 2011 > month 07 > day 08 > svn rev 56322 > language R > version.string R version 2.13.1 (2011-07-08) > > > The class of x is data frame, but when I use is.list, I also get a TRUE > result. Why does this happen? R-intro document which shall be included in any installation tells you about data frame 6.3 Data frames A data frame is a list with class "data.frame". There are restrictions on lists that may be made into data frames, namely The components must be vectors (numeric, character, or logical), factors, numeric matrices, lists, or other data frames. Matrices, lists, and data frames provide as many variables to the new data frame as they have columns, elements, or variables, respectively. Numeric vectors, logicals and factors are included as is, and character vectors are coerced to be factors, whose levels are the unique values appearing in the vector. Vector structures appearing as variables of the data frame must all have the same length, and matrix structures must all have the same row size. Regards Petr > > Lian Peng > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From jim at bitwrit.com.au Fri Aug 5 12:24:09 2011 From: jim at bitwrit.com.au (Jim Lemon) Date: Fri, 05 Aug 2011 20:24:09 +1000 Subject: [R] 3D Bar Graphs in ggplot2? In-Reply-To: <20026.36464.561281.782365@stat.math.ethz.ch> References: <1312310369354-3713305.post@n4.nabble.com> <4E38635A.1000009@gmail.com> <20026.36464.561281.782365@stat.math.ethz.ch> Message-ID: <4E3BC4C9.4080803@bitwrit.com.au> On 08/04/2011 10:20 PM, Martin Maechler wrote: > ... > Yes!! > > 10th commandment of R: You shall not misuse R! > Moses, meet eminem. The human race will persistently misuse anything. Sometimes we even make a great discovery by doing so. Jim From p_connolly at slingshot.co.nz Fri Aug 5 12:24:45 2011 From: p_connolly at slingshot.co.nz (Patrick Connolly) Date: Fri, 5 Aug 2011 22:24:45 +1200 Subject: [R] RPMs needed to compile R using the tar.gz file Message-ID: <20110805102445.GD5650@slingshot.co.nz> I don't wish to install R by rpm. I need to know what Fedora rpms I need to install to give me the capability to install R using the tar.gz source file as I've done for years. On previous occasions when I've installed Fedora, I've used the DVD which has thousands of RPMs. Lately I've installed Fedora 15 from the Live CD which has a lot fewer and so a lot of necessary stuff is not installed yet. I've done the same not long ago with Kubuntu which required me to install about 20 debs before I could compile R. If I had access to that installation, I could probably work out what the corresponding rpms are. But I figured some clever person will have a list of the necessary rpms somewhere already. Or even a smarter search string than I can think of would be appreciated. There's a lot of information about installing R from an rpm but that's not what I wish to do. TIA -- ~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~. ___ Patrick Connolly {~._.~} Great minds discuss ideas _( Y )_ Average minds discuss events (:_~*~_:) Small minds discuss people (_)-(_) ..... Eleanor Roosevelt ~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~. From wolfgang.viechtbauer at maastrichtuniversity.nl Fri Aug 5 12:30:13 2011 From: wolfgang.viechtbauer at maastrichtuniversity.nl (Viechtbauer Wolfgang (STAT)) Date: Fri, 5 Aug 2011 12:30:13 +0200 Subject: [R] Main-effect of categorical variables in meta-analysis (metafor) In-Reply-To: References: Message-ID: <077E31A57DA26E46AB0D493C9966AC730C3258E5F8@UM-MAIL4112.unimaas.nl> Dear Jokel, If a moderator has 4 levels, then you need 3 dummy variables (one of the levels will become your "reference" level, so you do not need a dummy for that one). You can do the dummy-coding yourself or let R handle it. For example, if the moderator is called "catmod", then: rma(yi, vi, mods = ~ factor(catmod), data=some.data.frame) should do the trick (you do not need factor() if catmod is already a factor variable in the data frame). R will create the dummies for you. If you do not like which level is chosen as the reference level, then take a look at the relevel() function. In particular, rma(yi, vi, mods = ~ relevel(factor(catmod), ref = "reflevel"), data=some.data.frame) will set the reference level to "reflevel". If you have two categorical moderators, let's call them catmod1 and catmod2, then: res0 <- rma(yi, vi, mods = ~ factor(catmod1) + factor(catmod2), data=some.data.frame) will give you a model without and res1 <- rma(yi, vi, mods = ~ factor(catmod1) * factor(catmod2), data=some.data.frame) will give you a model with the interaction between the two factors. In the latter model, you will get lots of coefficients for the interaction, each testing whether the difference between a particular catmod2 level and the reference catmod2 level differs across the levels of catmod1 (and vice-versa). To test whether the interaction is significant in general, you can either do a Wald-type test with: rma(yi, vi, mods = ~ factor(catmod1)*factor(catmod2), data=some.data.frame, btt=X:Y) where X is the number of the first "interaction coefficient" and Y is the number of the last "interaction coefficient" (so, these are indices to indicate which coefficients should be tested simultaneously). In the output, you will the results of this test under "Test of Moderators". Alternatively, you can do a likelihood-ratio test with: res0 <- rma(yi, vi, mods = ~ factor(catmod1) + factor(catmod2), data=some.data.frame, method="ML") res1 <- rma(yi, vi, mods = ~ factor(catmod1) * factor(catmod2), data=some.data.frame, method="ML") anova(res1, res0) Note that you need to use ML-estimation when doing the LR-test. Similar omnibus tests of several coefficients (full versus reduced model comparisons) can be done for the main effects. An example with the BCG dataset: data(dat.bcg) dat <- escalc(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat.bcg, append=TRUE) dat$ablat.cat <- ifelse(dat$ablat > 30, "far", "close") rma(yi, vi, mods = ~ factor(alloc) * factor(ablat.cat), data=dat, btt=5:6) res0 <- rma(yi, vi, mods = ~ factor(alloc) + factor(ablat.cat), data=dat, method="ML") res1 <- rma(yi, vi, mods = ~ factor(alloc) * factor(ablat.cat), data=dat, method="ML") anova(res0, res1) I hope this helps! Best, -- Wolfgang Viechtbauer Department of Psychiatry and Neuropsychology School for Mental Health and Neuroscience Maastricht University, P.O. Box 616 6200 MD Maastricht, The Netherlands Tel: +31 (43) 368-5248 Fax: +31 (43) 368-8689 Web: http://www.wvbauer.com > -----Original Message----- > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] > On Behalf Of Jokel Meyer > Sent: Friday, August 05, 2011 10:45 > To: r-help at r-project.org > Subject: [R] Main-effect of categorical variables in meta-analysis > (metafor) > > Dear R-experts! > > In a meta-analysis (metafor) I would like to assess the effect of two > categorical covariates (A & B) whereas they both have 4 levels. > Is my understanding correct that this would require to dummy-code (0,1) > each > level of each covariate (A & B)? > However I am interested in the main-effects and the interaction of these > two > covariates and the dummy-coding would only allow to detect the effect of > one > level of one factor. Would there be a way to assess main-effects and > interactions (something like an meta-analysis-ANOVA)? > > Many thanks, > Jokel > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting- > guide.html > and provide commented, minimal, self-contained, reproducible code. From phhs80 at gmail.com Fri Aug 5 13:00:06 2011 From: phhs80 at gmail.com (Paul Smith) Date: Fri, 5 Aug 2011 12:00:06 +0100 Subject: [R] Can glmnet handle models with numeric and categorical data? In-Reply-To: <20027.40855.91997.454268@stat.math.ethz.ch> References: <7A2BCEAF-D51B-4D24-A353-5AA998A519EF@me.com> <20027.40855.91997.454268@stat.math.ethz.ch> Message-ID: On Fri, Aug 5, 2011 at 8:45 AM, Martin Maechler wrote: > Note the following: As soon as you use "categorical predictors", > i.e., factors, and particularly when these have many levels (instead of just > being binary), the resulting model matrix is often sparse, > i.e. contains many zeros. > When the matrix is ``really sparse',say, > ? ? #{zeros} / #{non-zeros} >= 10 > it can pay much to use the sparse matrices that the 'Matrix' > package provides (you have 'Matrix' as part of your R > installation). > > For exactly this reason, ?'glmnet' > has supported the use of sparse matrices for a long time, > and we have provided the convenience function > ? ?sparse.model.matrix() ?{package 'Matrix'} > for easy construction of such matrices. > > There's also a very small extension package ?'MatrixModels' > which goes one step further, with its function > ? ? ?model.Matrix(..... sparse = TRUE/FALSE) > but you would not need that for using the sparseMatrix in > 'glmnet'. Thanks, Martin. In my case, the number of potential predictors is high and many of them are factors with 5 categories. With sparse.model.matrix(), I am getting the following error : ?Error: C stack usage is too close to the limit.? I realize that my sparse matrix is huge -- and the error given by sparse.model.matrix() perfectly justified --, but I wonder whether this problem can be overcome by having sparse.model.matrix() using dynamic memory instead of static one. Paul From murdoch.duncan at gmail.com Fri Aug 5 13:08:23 2011 From: murdoch.duncan at gmail.com (Duncan Murdoch) Date: Fri, 5 Aug 2011 07:08:23 -0400 Subject: [R] boolean SEXP interpretation upon function return In-Reply-To: <201108050109.06283@spsconsultoria.com> References: <201108050109.06283@spsconsultoria.com> Message-ID: <4E3BCF27.8060206@gmail.com> On 11-08-05 12:09 AM, Alexandre Aguiar wrote: > Hi, > > When a function returns a SEXP of type LGLSXP (logical) to signal whether > it succeeded or failed, how is it intrepreted? Is it like C where SUCCESS > = 0 or other value? Usually TRUE is used to signal success. TRUE is non-zero. Duncan Murdoch From ripley at stats.ox.ac.uk Fri Aug 5 13:19:56 2011 From: ripley at stats.ox.ac.uk (Prof Brian Ripley) Date: Fri, 5 Aug 2011 12:19:56 +0100 Subject: [R] boolean SEXP interpretation upon function return In-Reply-To: <4E3BCF27.8060206@gmail.com> References: <201108050109.06283@spsconsultoria.com> <4E3BCF27.8060206@gmail.com> Message-ID: On Fri, 5 Aug 2011, Duncan Murdoch wrote: > On 11-08-05 12:09 AM, Alexandre Aguiar wrote: >> Hi, >> >> When a function returns a SEXP of type LGLSXP (logical) to signal whether >> it succeeded or failed, how is it intrepreted? Is it like C where SUCCESS >> = 0 or other value? > > Usually TRUE is used to signal success. TRUE is non-zero. Strictly, TRUE is not numeric: it is coerced to 1 when coerced to a numeric value. If you are looking at C level at the SEXP: don't as the internal representation is just that: 'internal and subject to change'. There is no C convention to use 0 for success: that is a Unix convention for status values as returned by exit(), and even there the man page will advise you to use the symbol EXIT_SUCCESS. Other OSes do differ. > > Duncan Murdoch > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 From jim at bitwrit.com.au Fri Aug 5 13:48:51 2011 From: jim at bitwrit.com.au (Jim Lemon) Date: Fri, 5 Aug 2011 21:48:51 +1000 Subject: [R] boolean SEXP interpretation upon function return In-Reply-To: References: <201108050109.06283@spsconsultoria.com> <4E3BCF27.8060206@gmail.com> Message-ID: <4E3BD8A3.6040407@bitwrit.com.au> On 08/05/2011 09:19 PM, Prof Brian Ripley wrote: > On Fri, 5 Aug 2011, Duncan Murdoch wrote: > >> On 11-08-05 12:09 AM, Alexandre Aguiar wrote: >>> Hi, >>> >>> When a function returns a SEXP of type LGLSXP (logical) to signal >>> whether >>> it succeeded or failed, how is it intrepreted? Is it like C where >>> SUCCESS >>> = 0 or other value? >> >> Usually TRUE is used to signal success. TRUE is non-zero. > > Strictly, TRUE is not numeric: it is coerced to 1 when coerced to a > numeric value. > > If you are looking at C level at the SEXP: don't as the internal > representation is just that: 'internal and subject to change'. > > There is no C convention to use 0 for success: that is a Unix convention > for status values as returned by exit(), and even there the man page > will advise you to use the symbol EXIT_SUCCESS. Other OSes do differ. > see p164 Kernighan & Ritchie. Jim From alex at chaotic-neutral.de Fri Aug 5 14:10:57 2011 From: alex at chaotic-neutral.de (Alexander Engelhardt) Date: Fri, 5 Aug 2011 14:10:57 +0200 Subject: [R] RPMs needed to compile R using the tar.gz file In-Reply-To: <20110805102445.GD5650@slingshot.co.nz> References: <20110805102445.GD5650@slingshot.co.nz> Message-ID: <4E3BDDD1.4060008@chaotic-neutral.de> Am 05.08.2011 12:24, schrieb Patrick Connolly: > I don't wish to install R by rpm. I need to know what Fedora rpms I > need to install to give me the capability to install R using the > tar.gz source file as I've done for years. Try this command: rpm -qpR your_R.rpm | xargs rpm -ivh The part before the pipe symbol lists all dependent packages, and the xargs command uses those packages and appends them to 'rpm -ivh'. I guess you'll need the R rpm to check for its dependencies, but you don't have to install it. I didn't test the command, but I guess it will work, maybe with some minor tweaking. Cheers, Alex From abder.rahman.ali at gmail.com Fri Aug 5 12:45:24 2011 From: abder.rahman.ali at gmail.com (Abder-Rahman Ali) Date: Fri, 5 Aug 2011 12:45:24 +0200 Subject: [R] Scatter plot in R Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From emma.raven at jbaconsulting.co.uk Fri Aug 5 12:27:28 2011 From: emma.raven at jbaconsulting.co.uk (bevare) Date: Fri, 5 Aug 2011 03:27:28 -0700 (PDT) Subject: [R] excel dates and times in R Message-ID: <1312540048193-3720887.post@n4.nabble.com> Hello, I am having some fun dealing with dates and times. My input is a excel csv file with two columns with data in the following format: date time 25-Jun-1961 04:00:00 i.e. day - month - year hour:min:sec I would like to have a single object in R that combines these and converts them into a sensible R format (e.g. ISOdatetime(1961,06,25,04,00,00,tz="GMT"). I have played with the function chron and also strptime but can't seem to get them to work. Can anybody help me out please? Thanks Bevare -- View this message in context: http://r.789695.n4.nabble.com/excel-dates-and-times-in-R-tp3720887p3720887.html Sent from the R help mailing list archive at Nabble.com. From fraenzi.korner at oikostat.ch Fri Aug 5 12:14:40 2011 From: fraenzi.korner at oikostat.ch (fraenzi.korner at oikostat.ch) Date: 5 Aug 2011 12:14:40 +0200 Subject: [R] =?utf-8?q?R-help_Digest=2C_Vol_102=2C_Issue_5?= Message-ID: <20110805101440.32728.qmail@srv5.yoursite.ch> Wir sind bis am 20. August in den Ferien und werden keine e-mails beantworten. Bei dringenden F?llen melden Sie sich bei Stefanie von Felten steffi.vonfelten at oikostat.ch We are on vacation until 20. August. In urgent cases, please contact Stefanie von Felten steffi.vonfelten at oikostat.ch From hajjja at yahoo.com Fri Aug 5 13:54:21 2011 From: hajjja at yahoo.com (khadeeja ismail) Date: Fri, 5 Aug 2011 04:54:21 -0700 (PDT) Subject: [R] Very silent R Message-ID: <1312545261.53724.YahooMailClassic@web114703.mail.gq1.yahoo.com> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From kokavolchkov at gmail.com Fri Aug 5 12:11:56 2011 From: kokavolchkov at gmail.com (kokavolchkov) Date: Fri, 5 Aug 2011 03:11:56 -0700 (PDT) Subject: [R] R compare cells in one matrix Message-ID: <1312539116659-3720854.post@n4.nabble.com> Good morning! Please, could you help me with the problem? I have a matrix *144x73.* An example of a matrix: [,1] [,2] [,3] [,4] [,5] [1,] 277.4 276.24 275.62 276.55 278.05 [2,] 277.4 276.24 275.55 276.42 277.72 [3,] 277.4 276.24 275.50 276.22 277.39 [4,] 277.4 276.24 275.42 276.02 277.02 [5,] 277.4 276.22 275.37 275.82 276.64 And I want to *compare*its cells like this: a11 and a12; a11 and a21; a11 and a22. then a12 and a22; a12 and a13; a12 and a23. and so on in a cycle. *Is there a function or a package that can be used for comparation?* Thank you! -- View this message in context: http://r.789695.n4.nabble.com/R-compare-cells-in-one-matrix-tp3720854p3720854.html Sent from the R help mailing list archive at Nabble.com. From johannes at huesing.name Fri Aug 5 12:06:56 2011 From: johannes at huesing.name (Johannes =?ISO-8859-1?Q?H=FCsing?=) Date: Fri, 05 Aug 2011 06:06:56 -0400 Subject: [R] persp() In-Reply-To: References: Message-ID: <1312538816.1699.18.camel@kamil> Am Donnerstag, den 04.08.2011, 02:58 +0200 schrieb Rosario Garcia Gil: > I am trying to draw a basic black and white map of two European countries. > Are you planning just to draw the boundaries? Or what do you mean by "basic black and white". > After searching some key words in google and reading many pages I arrived to the conclusion that persp() could be used to draw that map. If you want to create a 3D map where z are, say, the altitudes, yes. Have you read http://cran.r-project.org/view=spatial ? There is no mentioning of persp(). > I have prepared three small example files, which are supposed to be the files required for running that function. > I haven't seen these files as attachments of your mail. I am certain they would help us a lot to see what your problem is. > xvector is a vector with the longitudes > yvector is a vector with the latitudes > zmatrix is supposed to the height, but since I only need a flat map I just gave the value 1 to each of the entries of the matrix (I am not sure this is correct though). > Then persp() will draw you a mesh of a flat plane, which is not informative, which gives me the impression that persp() is not hat you need. Have you run example(persp) at all? The result does not even look "basic black and white". > The first question for me when using persp() is that x and y values should be in increasing values (following the instructions), but I understand that the coordinates x and y are actually pairs of values (longitude/latitude pairs of values) and if I order them in ascending order both then the pairing is gone. I guess I am totally lost! > > Still even if I try to run persp() by ordering in ascending value x and y values (even if it does not make sense for me) I still get this message: > > <- persp(xvector,yvector,zmatrix,theta=-40,phi=30) > Error in persp.default(xvector, yvector, zmatrix, theta = -40, phi = 30) : > increasing 'x' and 'y' values expected > > Any help is wellcome. Is there any other better function to draw a flat map (2D), also example of the imput files is wellcome. Thanks in advance. > Rosario > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From pdalgd at gmail.com Fri Aug 5 14:32:29 2011 From: pdalgd at gmail.com (peter dalgaard) Date: Fri, 5 Aug 2011 14:32:29 +0200 Subject: [R] RPMs needed to compile R using the tar.gz file In-Reply-To: <4E3BDDD1.4060008@chaotic-neutral.de> References: <20110805102445.GD5650@slingshot.co.nz> <4E3BDDD1.4060008@chaotic-neutral.de> Message-ID: <62E792A8-8E0E-44F2-959A-F329B63CDF9E@gmail.com> On Aug 5, 2011, at 14:10 , Alexander Engelhardt wrote: > Am 05.08.2011 12:24, schrieb Patrick Connolly: >> I don't wish to install R by rpm. I need to know what Fedora rpms I >> need to install to give me the capability to install R using the >> tar.gz source file as I've done for years. > > Try this command: > > rpm -qpR your_R.rpm | xargs rpm -ivh > > The part before the pipe symbol lists all dependent packages, and the xargs command uses those packages and appends them to 'rpm -ivh'. > I guess you'll need the R rpm to check for its dependencies, but you don't have to install it. > > I didn't test the command, but I guess it will work, maybe with some minor tweaking. The command will likely work, but it might not give the right answer. In some cases, you'll need -devel version of the RPMs, and they are not necessarily required by the binary RPM of R. So take Brian's advice over on R-devel. (Or do it the sloppy way that I have always used on Linuxen: Get a basic install going, and whenever a capability turns out to be missing, figure out which -devel RPM is needed and install it; lather, rinse, repeat...) -pd PS: The wisdom of saving, say, 20K of header files on installing a multi-megabyte library, just because "ordinary users don't need them" has always eluded me. I think it is against the spirit of free software to artificially separate users and programmers like that, but that's the way things are. -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com "D?den skal tape!" --- Nordahl Grieg From murdoch.duncan at gmail.com Fri Aug 5 14:35:27 2011 From: murdoch.duncan at gmail.com (Duncan Murdoch) Date: Fri, 5 Aug 2011 08:35:27 -0400 Subject: [R] boolean SEXP interpretation upon function return In-Reply-To: References: <201108050109.06283@spsconsultoria.com> <4E3BCF27.8060206@gmail.com> Message-ID: <4E3BE38F.6010707@gmail.com> On 11-08-05 7:19 AM, Prof Brian Ripley wrote: > On Fri, 5 Aug 2011, Duncan Murdoch wrote: > >> On 11-08-05 12:09 AM, Alexandre Aguiar wrote: >>> Hi, >>> >>> When a function returns a SEXP of type LGLSXP (logical) to signal whether >>> it succeeded or failed, how is it intrepreted? Is it like C where SUCCESS >>> = 0 or other value? >> >> Usually TRUE is used to signal success. TRUE is non-zero. > > Strictly, TRUE is not numeric: it is coerced to 1 when coerced to a > numeric value. Another point I should have made: using logical values to signal success is fairly rare among R functions. More common is to trigger an error or return NULL or a zero length vector on failure. Most R functions return something other than TRUE on success, even if they are called for their side effects (e.g. setwd() returns the old directory, plot.new() returns NULL on success). Duncan Murdoch > > If you are looking at C level at the SEXP: don't as the internal > representation is just that: 'internal and subject to change'. > > There is no C convention to use 0 for success: that is a Unix > convention for status values as returned by exit(), and even there the > man page will advise you to use the symbol EXIT_SUCCESS. Other OSes > do differ. > >> >> Duncan Murdoch >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > From pllcc023 at gmail.com Fri Aug 5 14:40:43 2011 From: pllcc023 at gmail.com (Paola Lecca) Date: Fri, 5 Aug 2011 14:40:43 +0200 Subject: [R] fit a 2-variables function to data Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From jholtman at gmail.com Fri Aug 5 14:44:11 2011 From: jholtman at gmail.com (jim holtman) Date: Fri, 5 Aug 2011 08:44:11 -0400 Subject: [R] Very silent R In-Reply-To: <1312545261.53724.YahooMailClassic@web114703.mail.gq1.yahoo.com> References: <1312545261.53724.YahooMailClassic@web114703.mail.gq1.yahoo.com> Message-ID: My version of R is not silent: > setwd("/notHere") Error in base::setwd(dir) : cannot change working directory > library(notHere) Error in library(notHere) : there is no package called 'notHere' Can you give the specific context in which you are using it. On Fri, Aug 5, 2011 at 7:54 AM, khadeeja ismail wrote: > Dear List, > > How can I get R to display error messages, for example, if I try to change to a non-existent directory or try to load a library that is not installed? Currently R is very silent. I did fix the problem once using 'options' (show.error.messages, I think), but id doesn't seem to be working any more, and R doesn't tell me if I have an error in my command. > Please let me know how I can fix this. > > Regards, > Hajja > > > > ? ? ? ?[[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? From petr.pikal at precheza.cz Fri Aug 5 14:43:33 2011 From: petr.pikal at precheza.cz (Petr PIKAL) Date: Fri, 5 Aug 2011 14:43:33 +0200 Subject: [R] Odp: excel dates and times in R In-Reply-To: <1312540048193-3720887.post@n4.nabble.com> References: <1312540048193-3720887.post@n4.nabble.com> Message-ID: Hi > > Hello, > > I am having some fun dealing with dates and times. My input is a excel csv > file with two columns with data in the following format: > > date time > 25-Jun-1961 04:00:00 > > i.e. day - month - year hour:min:sec > > I would like to have a single object in R that combines these and converts > them into a sensible R format (e.g. > ISOdatetime(1961,06,25,04,00,00,tz="GMT"). > > I have played with the function chron and also strptime but can't seem to > get them to work. Maybe you did not set appropriate locale. See Sys.getlocale() > Sys.setlocale("LC_TIME","us") [1] "English_United States.1252" > strptime(paste(date, time, sep=" "), format="%d-%b-%Y %H:%M:%S") [1] "1961-06-25 04:00:00" > But AFAIK Jun is used in Great Britain similarly like in US so you shall not have problems. Regards Petr > > Can anybody help me out please? > > Thanks > > Bevare > > > -- > View this message in context: http://r.789695.n4.nabble.com/excel-dates- > and-times-in-R-tp3720887p3720887.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From bt_jannis at yahoo.de Fri Aug 5 14:48:08 2011 From: bt_jannis at yahoo.de (Jannis) Date: Fri, 05 Aug 2011 14:48:08 +0200 Subject: [R] Scatter plot in R In-Reply-To: References: Message-ID: <4E3BE688.9050603@yahoo.de> This is a really basic question that is answered in many R tutorials. Why dont you just google: "R import csv" And the first hit will tell you straight away what to do? Jannis P.S. I just guessed from your not very specific post that you may want to import from csv ... On 08/05/2011 12:45 PM, Abder-Rahman Ali wrote: > Hi, > > I have 334 records, with two columns: > > Column (1): Resolution Column (2): Number of images with a specific > resolution > > How can I make a scatter plot in R with this data? Is there a way to *import > * the records, since it will be time consuming to enter 334 records? > > Thanks. > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From petr.pikal at precheza.cz Fri Aug 5 14:48:25 2011 From: petr.pikal at precheza.cz (Petr PIKAL) Date: Fri, 5 Aug 2011 14:48:25 +0200 Subject: [R] Odp: Scatter plot in R In-Reply-To: References: Message-ID: Hi > > Hi, > > I have 334 records, with two columns: > > Column (1): Resolution Column (2): Number of images with a specific > resolution > > How can I make a scatter plot in R with this data? Is there a way to *import If you have your data in two column data frame you can just plot(your.data.frame) > * the records, since it will be time consuming to enter 334 records? Where do you have those 334 records. If on paper I can not see any other way then manually enter the records into computer. If they are aready in some kind of file you can use read.table or its relatives see ?read.table But without being more specific you can get only vague answers. Regards Petr > > Thanks. > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From petr.pikal at precheza.cz Fri Aug 5 14:50:44 2011 From: petr.pikal at precheza.cz (Petr PIKAL) Date: Fri, 5 Aug 2011 14:50:44 +0200 Subject: [R] Very silent R In-Reply-To: References: <1312545261.53724.YahooMailClassic@web114703.mail.gq1.yahoo.com> Message-ID: > > My version of R is not silent: Neither mine > library(akima) Error in library(akima) : there is no package called ?akima? > Regards Petr > > > setwd("/notHere") > Error in base::setwd(dir) : cannot change working directory > > library(notHere) > Error in library(notHere) : there is no package called 'notHere' > > > > Can you give the specific context in which you are using it. > > On Fri, Aug 5, 2011 at 7:54 AM, khadeeja ismail wrote: > > Dear List, > > > > How can I get R to display error messages, for example, if I try to > change to a non-existent directory or try to load a library that is not > installed? Currently R is very silent. I did fix the problem once using > 'options' (show.error.messages, I think), but id doesn't seem to be > working any more, and R doesn't tell me if I have an error in my command. > > Please let me know how I can fix this. > > > > Regards, > > Hajja > > > > > > > > [[alternative HTML version deleted]] > > > > ______________________________________________ > > R-help at r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > > > > -- > Jim Holtman > Data Munger Guru > > What is the problem that you are trying to solve? > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From petr.pikal at precheza.cz Fri Aug 5 14:55:41 2011 From: petr.pikal at precheza.cz (Petr PIKAL) Date: Fri, 5 Aug 2011 14:55:41 +0200 Subject: [R] Odp: fit a 2-variables function to data In-Reply-To: References: Message-ID: Hi > > Dearl all, > > I have to fit a function > > y = f(x1, x2) > to data experiemntal data describing the measured behavior of y. > > x1 and x2 are the independent variables. > > Could you suggest me wich R package can I use for this purpose? ?nls ?lm ?loess ?glm And there are plenty more if you do not stick to base packages. Hard to say without knowing what function do you want to use. Regards Petr > > Thanks, > Paola. > > > -- > *Paola Lecca, PhD* > *The Microsoft Research - University of Trento* > *Centre for Computational and Systems Biology* > *Piazza Manci 17 38123 Povo/Trento, Italy* > *Phome: +39 0461282843* > *Fax: +39 0461282814* > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From ted.harding at wlandres.net Fri Aug 5 14:59:16 2011 From: ted.harding at wlandres.net ( (Ted Harding)) Date: Fri, 05 Aug 2011 13:59:16 +0100 (BST) Subject: [R] excel dates and times in R In-Reply-To: <1312540048193-3720887.post@n4.nabble.com> Message-ID: On 05-Aug-11 10:27:28, bevare wrote: > Hello, > I am having some fun dealing with dates and times. My input > is a excel csv file with two columns with data in the following > format: > > date time > 25-Jun-1961 04:00:00 > > i.e. day - month - year hour:min:sec > > I would like to have a single object in R that combines these > and converts them into a sensible R format (e.g. > ISOdatetime(1961,06,25,04,00,00,tz="GMT"). > > I have played with the function chron and also strptime but > can't seem to get them to work. > > Can anybody help me out please? > > Thanks > Bevare I know almost nothing about using the "time" functions in R, but since I see that I get: ISOdatetime(1961,06,25,04,00,00,tz="GMT") # [1] "1961-06-25 04:00:00 GMT" it would seem that you already have it almost made in the original Excel fields, namely: paste("1961-06-25","04:00:00","GMT",sep=" ") # [1] "1961-06-25 04:00:00 GMT" So you could write this as a function getISOdt <- function(date,time,zone){ paste(date,time,zone,sep=" ") } where the parameters date, time and zone are supplied as character strings: getISOdt("1961-06-25","04:00:00","GMT") [1] "1961-06-25 04:00:00 GMT" Then you can apply() this to the two comlumns in the dataframe you get when reading in the Excel CSV file, with zone being supplied independently. Hoping this helps, Ted. -------------------------------------------------------------------- E-Mail: (Ted Harding) Fax-to-email: +44 (0)870 094 0861 Date: 05-Aug-11 Time: 13:59:14 ------------------------------ XFMail ------------------------------ From petr.pikal at precheza.cz Fri Aug 5 15:03:41 2011 From: petr.pikal at precheza.cz (Petr PIKAL) Date: Fri, 5 Aug 2011 15:03:41 +0200 Subject: [R] fit a 2-variables function to data In-Reply-To: References: Message-ID: > > Thanks Petr for your response. > > The function I wnat to fit is > > y = (k1 * v2_Kd *p1^v2_h) / ( (v2_Kd^v2_h) + p1^v2_h ) ) * ( 1/(1 + (p6/ > v5_Kd)^v5_h) ) > where p1 and p2 are the independent variables. > > v2_K2, v2_h, v5_Kd and v5_h are the parameters I have to estimate. In that case you should look to ?nls or maybe ?optim Regards Petr > > Paola. > > > > On Fri, Aug 5, 2011 at 2:55 PM, Petr PIKAL wrote: > Hi > > > > Dearl all, > > > > I have to fit a function > > > > y = f(x1, x2) > > to data experiemntal data describing the measured behavior of y. > > > > x1 and x2 are the independent variables. > > > > Could you suggest me wich R package can I use for this purpose? > ?nls > ?lm > ?loess > ?glm > > And there are plenty more if you do not stick to base packages. Hard to > say without knowing what function do you want to use. > > Regards > Petr > > > > > > Thanks, > > Paola. > > > > > > -- > > *Paola Lecca, PhD* > > *The Microsoft Research - University of Trento* > > *Centre for Computational and Systems Biology* > > *Piazza Manci 17 38123 Povo/Trento, Italy* > > *Phome: +39 0461282843* > > *Fax: +39 0461282814* > > > > [[alternative HTML version deleted]] > > > > ______________________________________________ > > R-help at r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > > -- > Paola Lecca, PhD > The Microsoft Research - University of Trento > Centre for Computational and Systems Biology > Piazza Manci 17 38123 Povo/Trento, Italy > Phome: +39 0461282843 > Fax: +39 0461282814 From bt_jannis at yahoo.de Fri Aug 5 15:04:56 2011 From: bt_jannis at yahoo.de (Jannis) Date: Fri, 05 Aug 2011 15:04:56 +0200 Subject: [R] fit a 2-variables function to data In-Reply-To: References: Message-ID: <4E3BEA78.5070002@yahoo.de> The answer pretty much depends on the kind of model you want to fit. For standard model would imagine that you do not need any R packages for this. Have a look at: ?lm #for linear models ?nls #for non linear stuff If you would have googled: "R fit model tutorial" prior to posting you would have found many helpful tutorials. HTH Jannis On 08/05/2011 02:40 PM, Paola Lecca wrote: > Dearl all, > > I have to fit a function > > y = f(x1, x2) > to data experiemntal data describing the measured behavior of y. > > x1 and x2 are the independent variables. > > Could you suggest me wich R package can I use for this purpose? > > Thanks, > Paola. > > From dimitri.liakhovitski at gmail.com Fri Aug 5 15:11:22 2011 From: dimitri.liakhovitski at gmail.com (Dimitri Liakhovitski) Date: Fri, 5 Aug 2011 09:11:22 -0400 Subject: [R] Efficient way of creating a shifted (lagged) variable? In-Reply-To: References: Message-ID: Michael is totally correct. I work with a data frame that happens to have weeks associated with it. So, it looks like I would not really benefit from ts functionality... Dimitri On Thu, Aug 4, 2011 at 11:37 PM, R. Michael Weylandt wrote: > Yes, the stats package has a lag function, but it's not really appropriate > for the sample data Dimitri gave us: to wit, it's not of "ts" (time series) > class so lag doesn't know what to do with it and gives an error message. > Perhaps it's just that I never really took the time to get used to it, but > I'm not a fan of R's "ts" class > > And while I do endorse the use of time series objects over data frames when > appropriate, I'd suggest going for the xts class which has greater > functionality and it's default lag seems to have a more logical default: > lagging the series in xts moves the first data point back, while it moves it > forward in the ts class. > > I guess my point is: if Dimitri is working with the dates directly instead > of time series (as some of his other posts have suggested), he should stay > in the data frame type and use the lag I, Daniel, or Josh wrote: if he has > proper time series, he might as well jump right into xts. > > Michael Weylandt > > On Thu, Aug 4, 2011 at 7:57 PM, Ken H wrote: >> >> Hey all, >> ? Correct me if I'm wrong but, the 'stats' package has a lag() function >> like so >> ? ?lagged.series=lag(series,number of lags wanted) >> ? ?Furthermore, I am pretty sure that lag( ) accepts negative lags:=> >> leads. >> ? ? ? ? ? ? ? ? ? lag(x,1)=> object of one lag, lag(x,-1) object with one >> lead. >> ? ? ? ? ? ? ? Hope this answers your question, >> ? ? ? ? ? ? ? ? ? ? ? ?Ken >> >> On Thu, Aug 4, 2011 at 4:19 PM, Dimitri Liakhovitski < >> dimitri.liakhovitski at gmail.com> wrote: >> >> > Thanks a lot for the recommendations - some of them I am implementing >> > already. >> > >> > Just a clarification: >> > the only reason I try to compare things to SPSS is that I am the only >> > person in my office using R. Whenever I work on an R code my goal is >> > not just to make it work, but also to "boast" to the SPSS users that >> > it's much easier/faster/niftier in R. So, you are preaching to the >> > choir here. >> > >> > Dimitri >> > >> > >> > On Thu, Aug 4, 2011 at 4:02 PM, Joshua Wiley >> > wrote: >> > > >> > > >> > > On Aug 4, 2011, at 11:46, Dimitri Liakhovitski < >> > dimitri.liakhovitski at gmail.com> wrote: >> > > >> > >> Thanks a lot, guys! >> > >> It's really helpful. But - to be objective- it's still quite a few >> > >> lines longer than in SPSS. >> > > >> > > Not once you've sources the function! ?For the simple case of a >> > > vector, >> > try: >> > > >> > > X <- 1:10 >> > > mylag2 <- function(X, lag) { >> > > ?c(rep(NA, length(seq(lag))), X[-seq(lag)]) >> > > } >> > > >> > > Though this does not work for lead, it is fairly short. Then you could >> > use the *apply family if you needed it on multiple columns or vectors. >> > > >> > > Cheers, >> > > >> > > Josh >> > > >> > >> Dimitri >> > >> >> > >> On Thu, Aug 4, 2011 at 2:36 PM, Daniel Nordlund < >> > djnordlund at frontier.com> wrote: >> > >>> >> > >>> >> > >>>> -----Original Message----- >> > >>>> From: r-help-bounces at r-project.org [mailto: >> > r-help-bounces at r-project.org] >> > >>>> On Behalf Of Dimitri Liakhovitski >> > >>>> Sent: Thursday, August 04, 2011 8:24 AM >> > >>>> To: r-help >> > >>>> Subject: [R] Efficient way of creating a shifted (lagged) variable? >> > >>>> >> > >>>> Hello! >> > >>>> >> > >>>> I have a data set: >> > >>>> set.seed(123) >> > >>>> y<-data.frame(week=seq(as.Date("2010-01-03"), as.Date("2011-01- >> > >>>> 31"),by="week")) >> > >>>> y$var1<-c(1,2,3,round(rnorm(54),1)) >> > >>>> y$var2<-c(10,20,30,round(rnorm(54),1)) >> > >>>> >> > >>>> # All I need is to create lagged variables for var1 and var2. I >> > >>>> looked >> > >>>> around a bit and found several ways of doing it. They all seem >> > >>>> quite >> > >>>> complicated - while in SPSS it's just a few letters (like LAG()). >> > >>>> Here >> > >>>> is what I've written but I wonder. It works - but maybe there is a >> > >>>> very simple way of doing it in R that I could not find? >> > >>>> I need the same for "lead" (opposite of lag). >> > >>>> Any hint is greatly appreciated! >> > >>>> >> > >>>> ### The function I created: >> > >>>> mylag <- function(x,max.lag=1){ ? # x has to be a 1-column data >> > >>>> frame >> > >>>> ? ?temp<- >> > >>>> >> > as.data.frame(embed(c(rep(NA,max.lag),x[[1]]),max.lag+1))[2:(max.lag+1)] >> > >>>> ? ?for(i in 1:length(temp)){ >> > >>>> ? ? ?names(temp)[i]<-paste(names(x),".lag",i,sep="") >> > >>>> ? ? } >> > >>>> ? return(temp) >> > >>>> } >> > >>>> >> > >>>> ### Running mylag to get my result: >> > >>>> myvars<-c("var1","var2") >> > >>>> for(i in myvars) { >> > >>>> ? y<-cbind(y,mylag(y[i]),max.lag=2) >> > >>>> } >> > >>>> (y) >> > >>>> >> > >>>> -- >> > >>>> Dimitri Liakhovitski >> > >>>> marketfusionanalytics.com >> > >>>> >> > >>> >> > >>> Dimitri, >> > >>> >> > >>> I would first look into the zoo package as has already been >> > >>> suggested. >> > ?However, if you haven't already got your solution then here are a >> > couple of >> > functions that might help you get started. ?I won't vouch for >> > efficiency. >> > >>> >> > >>> >> > >>> lag.fun <- function(df, x, max.lag=1) { >> > >>> ?for(i in x) { >> > >>> ? ?for(j in 1:max.lag){ >> > >>> ? ? ?lagx <- paste(i,'.lag',j,sep='') >> > >>> ? ? ?df[,lagx] <- c(rep(NA,j),df[1:(nrow(df)-j),i]) >> > >>> ? ?} >> > >>> ?} >> > >>> ?df >> > >>> } >> > >>> >> > >>> lead.fun <- function(df, x, max.lead=1) { >> > >>> ?for(i in x) { >> > >>> ? ?for(j in 1:max.lead){ >> > >>> ? ? ?leadx <- paste(i,'.lead',j,sep='') >> > >>> ? ? ?df[,leadx] <- c(df[(j+1):(nrow(df)),i],rep(NA,j)) >> > >>> ? ?} >> > >>> ?} >> > >>> ?df >> > >>> } >> > >>> >> > >>> y <- lag.fun(y,myvars,2) >> > >>> y <- lead.fun(y,myvars,2) >> > >>> >> > >>> >> > >>> Hope this is helpful, >> > >>> >> > >>> Dan >> > >>> >> > >>> Daniel Nordlund >> > >>> Bothell, WA USA >> > >>> >> > >>> >> > >>> >> > >> >> > >> >> > >> >> > >> -- >> > >> Dimitri Liakhovitski >> > >> marketfusionanalytics.com >> > >> >> > >> ______________________________________________ >> > >> R-help at r-project.org mailing list >> > >> https://stat.ethz.ch/mailman/listinfo/r-help >> > >> PLEASE do read the posting guide >> > http://www.R-project.org/posting-guide.html >> > >> and provide commented, minimal, self-contained, reproducible code. >> > > >> > >> > >> > >> > -- >> > Dimitri Liakhovitski >> > marketfusionanalytics.com >> > >> > ______________________________________________ >> > R-help at r-project.org mailing list >> > https://stat.ethz.ch/mailman/listinfo/r-help >> > PLEASE do read the posting guide >> > http://www.R-project.org/posting-guide.html >> > and provide commented, minimal, self-contained, reproducible code. >> > >> >> ? ? ? ?[[alternative HTML version deleted]] >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > -- Dimitri Liakhovitski marketfusionanalytics.com From michael.weylandt at gmail.com Fri Aug 5 15:12:34 2011 From: michael.weylandt at gmail.com (R. Michael Weylandt) Date: Fri, 5 Aug 2011 09:12:34 -0400 Subject: [R] R compare cells in one matrix In-Reply-To: <1312539116659-3720854.post@n4.nabble.com> References: <1312539116659-3720854.post@n4.nabble.com> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From bt_jannis at yahoo.de Fri Aug 5 14:53:27 2011 From: bt_jannis at yahoo.de (Jannis) Date: Fri, 05 Aug 2011 14:53:27 +0200 Subject: [R] excel dates and times in R In-Reply-To: <1312540048193-3720887.post@n4.nabble.com> References: <1312540048193-3720887.post@n4.nabble.com> Message-ID: <4E3BE7C7.4020108@yahoo.de> Well, strptime is certainly the way to go. You did not provide any reproducible example so i can just roughly point out the way to go: 1: combine the two colums to one with paste(,collapse='_') 2: strptime() with the corresponding formats (like "%d-%b-%Y_%H:%M:%S" or similar) HTH Jannis On 08/05/2011 12:27 PM, bevare wrote: > Hello, > > I am having some fun dealing with dates and times. My input is a excel csv > file with two columns with data in the following format: > > date time > 25-Jun-1961 04:00:00 > > i.e. day - month - year hour:min:sec > > I would like to have a single object in R that combines these and converts > them into a sensible R format (e.g. > ISOdatetime(1961,06,25,04,00,00,tz="GMT"). > > I have played with the function chron and also strptime but can't seem to > get them to work. > > Can anybody help me out please? > > Thanks > > Bevare > > > -- > View this message in context: http://r.789695.n4.nabble.com/excel-dates-and-times-in-R-tp3720887p3720887.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From jholtman at gmail.com Fri Aug 5 15:20:50 2011 From: jholtman at gmail.com (jim holtman) Date: Fri, 5 Aug 2011 09:20:50 -0400 Subject: [R] R compare cells in one matrix In-Reply-To: References: <1312539116659-3720854.post@n4.nabble.com> Message-ID: But there is: Com`pa`ra?tion n. 1. A making ready; provision. Webster's Revised Unabridged Dictionary, published 1913 by C. & G. Merriam Co. On Fri, Aug 5, 2011 at 9:12 AM, R. Michael Weylandt wrote: > *Is there a function or a package that can be used for comparation?* > > Given that there is no such thing as comparation, I can only guess that > there's not... > > Michael > > PS -- Write back and actually explain what you are ?trying to do and we'll > talk. > > On Fri, Aug 5, 2011 at 6:11 AM, kokavolchkov wrote: > >> Good morning! >> >> Please, could you help me with the problem? >> >> I have a matrix *144x73.* >> >> An example of a matrix: >> >> ? ? ? ?[,1] ? ? [,2] ? ? ?[,3] ? ? ?[,4] ? ? [,5] >> ?[1,] 277.4 276.24 275.62 276.55 278.05 >> ?[2,] 277.4 276.24 275.55 276.42 277.72 >> ?[3,] 277.4 276.24 275.50 276.22 277.39 >> ?[4,] 277.4 276.24 275.42 276.02 277.02 >> ?[5,] 277.4 276.22 275.37 275.82 276.64 >> >> And I want to *compare*its cells like this: >> >> a11 and a12; >> a11 and a21; >> a11 and a22. >> >> then >> >> a12 and a22; >> a12 and a13; >> a12 and a23. >> >> and so on in a cycle. >> *Is there a function or a package that can be used for comparation?* >> >> Thank you! >> >> -- >> View this message in context: >> http://r.789695.n4.nabble.com/R-compare-cells-in-one-matrix-tp3720854p3720854.html >> Sent from the R help mailing list archive at Nabble.com. >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > ? ? ? ?[[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? From jvadams at usgs.gov Fri Aug 5 15:22:21 2011 From: jvadams at usgs.gov (Jean V Adams) Date: Fri, 5 Aug 2011 08:22:21 -0500 Subject: [R] R compare cells in one matrix In-Reply-To: <1312539116659-3720854.post@n4.nabble.com> References: <1312539116659-3720854.post@n4.nabble.com> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From michael.weylandt at gmail.com Fri Aug 5 15:26:56 2011 From: michael.weylandt at gmail.com (R. Michael Weylandt) Date: Fri, 5 Aug 2011 09:26:56 -0400 Subject: [R] R compare cells in one matrix In-Reply-To: References: <1312539116659-3720854.post@n4.nabble.com> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From hajjja at yahoo.com Fri Aug 5 15:21:34 2011 From: hajjja at yahoo.com (khadeeja ismail) Date: Fri, 5 Aug 2011 06:21:34 -0700 (PDT) Subject: [R] Very silent R In-Reply-To: Message-ID: <1312550494.86779.YahooMailClassic@web114713.mail.gq1.yahoo.com> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From jholtman at gmail.com Fri Aug 5 15:32:46 2011 From: jholtman at gmail.com (jim holtman) Date: Fri, 5 Aug 2011 09:32:46 -0400 Subject: [R] R compare cells in one matrix In-Reply-To: References: <1312539116659-3720854.post@n4.nabble.com> Message-ID: My standard variation of my tag line is "tell me what you want to do, not how you want to do it". This is what you were originally saying. On Fri, Aug 5, 2011 at 9:26 AM, R. Michael Weylandt wrote: > Oh snap! > > Well then, kokavolchkov, I suggest you look into the riveRforge package I've > just put on R forge: I'll add it as a subroutine to the dieofdysentry(..., > scurvy=F) function I've been tweaking. :-P > > Michael > > PS -- Jean Adams is obviously much nicer than I am... > > On Fri, Aug 5, 2011 at 9:20 AM, jim holtman wrote: >> >> But there is: >> >> Com`pa`ra?tion >> n. 1. A making ready; provision. >> >> Webster's Revised Unabridged Dictionary, published 1913 by C. & G. Merriam >> Co. >> >> >> >> On Fri, Aug 5, 2011 at 9:12 AM, R. Michael Weylandt >> wrote: >> > *Is there a function or a package that can be used for comparation?* >> > >> > Given that there is no such thing as comparation, I can only guess that >> > there's not... >> > >> > Michael >> > >> > PS -- Write back and actually explain what you are ?trying to do and >> > we'll >> > talk. >> > >> > On Fri, Aug 5, 2011 at 6:11 AM, kokavolchkov >> > wrote: >> > >> >> Good morning! >> >> >> >> Please, could you help me with the problem? >> >> >> >> I have a matrix *144x73.* >> >> >> >> An example of a matrix: >> >> >> >> ? ? ? ?[,1] ? ? [,2] ? ? ?[,3] ? ? ?[,4] ? ? [,5] >> >> ?[1,] 277.4 276.24 275.62 276.55 278.05 >> >> ?[2,] 277.4 276.24 275.55 276.42 277.72 >> >> ?[3,] 277.4 276.24 275.50 276.22 277.39 >> >> ?[4,] 277.4 276.24 275.42 276.02 277.02 >> >> ?[5,] 277.4 276.22 275.37 275.82 276.64 >> >> >> >> And I want to *compare*its cells like this: >> >> >> >> a11 and a12; >> >> a11 and a21; >> >> a11 and a22. >> >> >> >> then >> >> >> >> a12 and a22; >> >> a12 and a13; >> >> a12 and a23. >> >> >> >> and so on in a cycle. >> >> *Is there a function or a package that can be used for comparation?* >> >> >> >> Thank you! >> >> >> >> -- >> >> View this message in context: >> >> >> >> http://r.789695.n4.nabble.com/R-compare-cells-in-one-matrix-tp3720854p3720854.html >> >> Sent from the R help mailing list archive at Nabble.com. >> >> >> >> ______________________________________________ >> >> R-help at r-project.org mailing list >> >> https://stat.ethz.ch/mailman/listinfo/r-help >> >> PLEASE do read the posting guide >> >> http://www.R-project.org/posting-guide.html >> >> and provide commented, minimal, self-contained, reproducible code. >> >> >> > >> > ? ? ? ?[[alternative HTML version deleted]] >> > >> > ______________________________________________ >> > R-help at r-project.org mailing list >> > https://stat.ethz.ch/mailman/listinfo/r-help >> > PLEASE do read the posting guide >> > http://www.R-project.org/posting-guide.html >> > and provide commented, minimal, self-contained, reproducible code. >> > >> >> >> >> -- >> Jim Holtman >> Data Munger Guru >> >> What is the problem that you are trying to solve? > > -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? From sarah.goslee at gmail.com Fri Aug 5 15:35:27 2011 From: sarah.goslee at gmail.com (Sarah Goslee) Date: Fri, 5 Aug 2011 09:35:27 -0400 Subject: [R] Very silent R In-Reply-To: <1312550494.86779.YahooMailClassic@web114713.mail.gq1.yahoo.com> References: <1312550494.86779.YahooMailClassic@web114713.mail.gq1.yahoo.com> Message-ID: Well, that's enough to establish that it's a problem with your system, not with R. What happens if you do: options(show.error.messages=TRUE) What is your sessionInfo() ? We need more information to be able to have any chance of helping. On Fri, Aug 5, 2011 at 9:21 AM, khadeeja ismail wrote: > > My version of R is not silent: > >> setwd("/notHere") > Error in base::setwd(dir) : cannot change working directory >> library(is notHere) > Error in library(notHere) : there is no package called 'notHere' > > > That is the exact problem. > If I type the above commands, R says nothing and just displays the prompt. >> setwd("/notHere") >> >> library(is notHere) >> > > ...and it is so annoying :\ > > Hajja > > > > > > > > > > Can you give the specific context in which you are using it. > > On Fri, Aug 5, 2011 at 7:54 AM, khadeeja ismail wrote: >> Dear List, >> >> How can I get R to display error messages, for example, if I try to change to a non-existent directory or try to load a library that is not installed? Currently R is very silent. I did fix the problem once using 'options' (show.error.messages, I think), but id doesn't seem to be working any more, and R doesn't tell me if I have an error in my command. >> Please let me know how I can fix this. >> >> Regards, >> Hajja >> >> >> > -- Sarah Goslee http://www.functionaldiversity.org From jholtman at gmail.com Fri Aug 5 15:36:30 2011 From: jholtman at gmail.com (jim holtman) Date: Fri, 5 Aug 2011 09:36:30 -0400 Subject: [R] Very silent R In-Reply-To: <1312550494.86779.YahooMailClassic@web114713.mail.gq1.yahoo.com> References: <1312550494.86779.YahooMailClassic@web114713.mail.gq1.yahoo.com> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From petr.pikal at precheza.cz Fri Aug 5 15:38:24 2011 From: petr.pikal at precheza.cz (Petr PIKAL) Date: Fri, 5 Aug 2011 15:38:24 +0200 Subject: [R] Very silent R In-Reply-To: <1312550494.86779.YahooMailClassic@web114713.mail.gq1.yahoo.com> References: <1312550494.86779.YahooMailClassic@web114713.mail.gq1.yahoo.com> Message-ID: Hi r-help-bounces at r-project.org napsal dne 05.08.2011 15:21:34: > khadeeja ismail > Odeslal: r-help-bounces at r-project.org > > > > My version of R is not silent: > > > setwd("/notHere") > Error in base::setwd(dir) : cannot change working directory > > library(is notHere) > Error in library(notHere) : there is no package called 'notHere' > > > That is the exact problem. > If I type the above commands, R says nothing and just displays the prompt. > > setwd("/notHere") > > > > library(is notHere) > > R version, OS? You can try it with -vanilla switch when starting R session? This shall start plain R with only necessary packages and without any data. Regards Petr > > ...and it is so annoying :\ > > Hajja > > > > > > > > > > Can you give the specific context in which you are using it. > > On Fri, Aug 5, 2011 at 7:54 AM, khadeeja ismail wrote: > > Dear List, > > > > How can I get R to display error messages, for example, if I try to > change to a non-existent directory or try to load a library that is not > installed? Currently R is very silent. I did fix the problem once using > 'options' (show.error.messages, I think), but id doesn't seem to be > working any more, and R doesn't tell me if I have an error in my command. > > Please let me know how I can fix this. > > > > Regards, > > Hajja > > > > > > > > [[alternative HTML version deleted]] > > > > ______________________________________________ > > R-help at r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > > > > -- > Jim Holtman > Data Munger Guru > > What is the problem that you are trying to solve? > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From phhs80 at gmail.com Fri Aug 5 15:47:48 2011 From: phhs80 at gmail.com (Paul Smith) Date: Fri, 5 Aug 2011 14:47:48 +0100 Subject: [R] Goodness of fit of binary logistic model Message-ID: Dear All, I have just estimated this model: ----------------------------------------------------------- Logistic Regression Model lrm(formula = Y ~ X16, x = T, y = T) Model Likelihood Discrimination Rank Discrim. Ratio Test Indexes Indexes Obs 82 LR chi2 5.58 R2 0.088 C 0.607 0 46 d.f. 1 g 0.488 Dxy 0.215 1 36 Pr(> chi2) 0.0182 gr 1.629 gamma 0.589 max |deriv| 9e-11 gp 0.107 tau-a 0.107 Brier 0.231 Coef S.E. Wald Z Pr(>|Z|) Intercept -1.3218 0.5627 -2.35 0.0188 X16=1 1.3535 0.6166 2.20 0.0282 ----------------------------------------------------------- Analyzing the goodness of fit: ----------------------------------------------------------- > resid(model.lrm,'gof') Sum of squared errors Expected value|H0 SD 1.890393e+01 1.890393e+01 6.073415e-16 Z P -8.638125e+04 0.000000e+00 > ----------------------------------------------------------- >From the above calculated p-value (0.000000e+00), one should discard this model. However, there is something that is puzzling me: If the 'Expected value|H0' is so coincidental with the 'Sum of squared errors', why should one discard the model? I am certainly missing something. Thanks in advance, Paul From pjmiller_57 at yahoo.com Fri Aug 5 15:52:31 2011 From: pjmiller_57 at yahoo.com (Paul Miller) Date: Fri, 5 Aug 2011 06:52:31 -0700 (PDT) Subject: [R] Multiple endpoint (possibly group sequential) sample size calculation In-Reply-To: Message-ID: <1312552351.95835.YahooMailClassic@web161619.mail.bf1.yahoo.com> Thanks for your reply Marc. I'll look into the information you sent. It ocurred to me that I could have done a better job of describing what I'm trying to do. It can be difficult to do that sometimes when your unfamiliar with a topic and are still trying to figure it out yourself. In investigating this sample size calculation, I started by learning a little about Simon Minimax designs. I started out with code that looks like the following: > library("clinfun") > > #### Single Stage Design #### > > ph2single(0.2, 0.4, 0.05, 0.10, 1) n r Type I error Type II error 1 47 14 0.03663689 0.09877433 > > #pu unacceptable response rate > #pa response rate that is desirable > #ep1 threshold for the probability of declaring drug desirable under p0 > #ep2 threshold for the probability of rejecting the drug under p1 > #nsoln number of designs with given alpha and beta > > #### Simon 2-Stage Optimal and Minimax Designs #### > > (simon <- ph2simon(0.2, 0.4, 0.05, 0.10)) Simon 2-stage Phase II design Un Desirable response rate: 0.4 Error rates: alpha = 0.05 ; beta = 0.1 r1 n1 r n EN(p0) PET(p0) Optimal 4 19 15 54 30.43 0.6733 Minimax 5 24 13 45 31.23 0.6559 > > #pu unacceptable response rate > #pa response rate that is desirable > #ep1 threshold for the probability of declaring drug desirable under p0 > #ep2 threshold for the probability of rejecting the drug under p1 > #nmax maximum total sample size (default 100; can be at most 500) Then I needed to simultaneously take into account an efficacy and a toxicity endpoint. The idea is that the new therapy might be more effective than an older one but also more toxic. The hope is that it will have substantially higher response rates without an unacceptable increase in toxicity. I found a online calculator that does this at: http://www.upci.upmc.edu/bf/resources.cfm The calculator produces results like this: INPUT PARAMATERS Probability of Accepting Poor Respose (alphar) --> 0.1 Probability of Accepting Toxic Drug (alphat) ----> 0.15 Probability of Rejecting Good Drug (beta) -------> 0.15 Unacceptable Response Probability (Pr0) ---------> 0.2 Acceptable Response Probability (Pr1) -----------> 0.4 Unacceptable Non-toxicity Probability (Pt0) ----> 0.6 Acceptable Non-toxicity Probability (Pt1) ----> 0.8 EARLY TERMINATION PROBABILITY Poor Response and Excessive Toxicity ------------> 0.79 Poor Response and Acceptable Toxicity -----------> 0.55 Good Response and Excessive Toxicity ------------> 0.56 Good Response and Acceptable Toxicity -----------> 0.05 THE OPTIMAL SOLUTION First Stage Sample Size ---------------------------> 22 Upper Limit For 1st Stage Rejecting Drug Due To Inadequate Response -> 4 Upper Limit For 1st Stage Rejecting Drug Due To Excessive Toxcity -> 13 Maximum Sample Size -------------------------------> 33 Upper Limit for 2nd Stage Rejecting Drug Due To Inadequate Response -> 9 Upper Limit for 2nd Stage Rejecting Drug Due To Excessive Toxcity -> 22 ---> Expected Sample Size 26.93 So this is an extension on the design. What I need though is an extension on an extension. Specifically, I need a sample size estimate for a two-arm instead of a one-arm design. Of course, some other approach to sample size estimation that achieves the same goal would be most welcome. I'm not sure I'm likely to find anyone with a solution to this problem. I just thought I should take the time to restate the problem as clearly as I can just in case I'm wrong. Thanks, Paul From andrew.steen at biology.au.dk Fri Aug 5 15:58:53 2011 From: andrew.steen at biology.au.dk (Andrew Steen) Date: Fri, 5 Aug 2011 15:58:53 +0200 Subject: [R] Games-Howell post-hoc testing Message-ID: Has anyone written a function for Games-Howell post-hoc testing* in R? Google tells me that there was none as of 2005, but perhaps things have changed since then. Thanks, Drew ?*Or similar: I am looking for a post-hoc testing algorithm that will work with (slightly) unequal sample sizes and possibly unequal variance among samples. -- Andrew D. Steen, Ph.D. Center for Geomicrobiology, Aarhus University Ny Munkegade 114 8000 ?rhus C Denmark andrew.steen at biology.au.dk From emammendes at gmail.com Fri Aug 5 16:23:20 2011 From: emammendes at gmail.com (Eduardo Mendes) Date: Fri, 5 Aug 2011 11:23:20 -0300 Subject: [R] Sweave - landscape figure In-Reply-To: <201108050609.p75690hb030148@mail16.tpgi.com.au> References: <201108050609.p75690hb030148@mail16.tpgi.com.au> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From pllcc023 at gmail.com Fri Aug 5 16:40:33 2011 From: pllcc023 at gmail.com (Paola Lecca) Date: Fri, 5 Aug 2011 16:40:33 +0200 Subject: [R] problemsn in using nls Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From a.salucci at yahoo.com Fri Aug 5 16:49:24 2011 From: a.salucci at yahoo.com (Anera Salucci) Date: Fri, 5 Aug 2011 07:49:24 -0700 (PDT) Subject: [R] modeling repeated measurement ordinal responses Message-ID: <1312555764.53450.YahooMailNeo@web120305.mail.ne1.yahoo.com> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From murdoch.duncan at gmail.com Fri Aug 5 16:54:27 2011 From: murdoch.duncan at gmail.com (Duncan Murdoch) Date: Fri, 05 Aug 2011 10:54:27 -0400 Subject: [R] Sweave - landscape figure In-Reply-To: References: <201108050609.p75690hb030148@mail16.tpgi.com.au> Message-ID: <4E3C0423.1000209@gmail.com> On 05/08/2011 10:23 AM, Eduardo Mendes wrote: > Hello > > Many thanks for the replies. > > Solution 1 (landscape package) works but the output figure is kind of small. > > Solution2: includegraphics outside - Unfortunately it does not work. > Includegraphics cannot find Myfig. You did something you aren't telling us. (Since you aren't showing us what you did, that's pretty much certain.) > Solution 3- \usepackage[figureright]{rotating} works and the output figure > is in a reasonable size. > > I am using print(xyplot) from lattice to plot the figures and have noticed > that adding width and height options breaks Sweave in all cases. > It doesn't if you do it right. (I did it wrong: I used Latex style 7in instead of Sweave style 7.) Duncan Murdoch > Cheers > > Ed > > > > On Fri, Aug 5, 2011 at 3:07 AM, Duncan Mackaywrote: > > > Hi Eduardo > > > > in the preamble put > > > > \usepackage[figureright]{**rotating} > > > > see manual for figureright if you do not like it > > > > and then some graphics with options where needed > > > > \begin{sidewaysfigure} > > \centering > > \includegraphics[width=,% > > clip=true,% > > trim=0in 0in 0in 0in,% LBRT > > keepaspectratio=true]% > > {filename} > > \end{sidewaysfigure} > > > > otherwise \usepackage landscape (check spelling) for a full page > > > > HTH > > > > Duncan > > > > Duncan Mackay > > Department of Agronomy and Soil Science > > University of New England > > ARMIDALE NSW 2351 > > Email: home mackay at northnet.com.au > > > > > > > > At 05:58 05/08/2011, you wrote: > > > >> On 04/08/2011 3:40 PM, Eduardo M. A. M. Mendes wrote: > >> > >>> Dear R-users > >>> > >>> I am trying to understand how Sweave works by running some simple > >>> examples. In the example I am working with there is a chunk where the > >>> R-commands related to plotting a figure are placed. When running R CMD > >>> Sweave ? , pdflatex the output is a portrait figure. I wonder whether it > >>> would be possible to change the orientation to landscape (not in the latex > >>> file but in Rnw file). > >>> > >> > >> Sweave can change the height and width of the figure so it is more > >> landscape-shaped (width> height) using options at the start of the chunk. > >> > >> Rotating a figure is something LaTeX needs to do: you would tell Sweave > >> to produce the figure but not include it, then use \includegraphics{} with > >> the right option to rotate it. > >> > >> For example: > >> > >> <>= > >> plot(rnorm(100)) > >> @ > >> > >> \includegraphics[angle=90,**width=0.8\textheight]{Myfig} > >> > >> This is untested, and you'll need to consult a LaTeX reference for > >> rotating the figure caption, etc. > >> > >> Duncan Murdoch > >> > >> ______________________________**________________ > >> R-help at r-project.org mailing list > >> https://stat.ethz.ch/mailman/**listinfo/r-help > >> PLEASE do read the posting guide http://www.R-project.org/** > >> posting-guide.html > >> and provide commented, minimal, self-contained, reproducible code. > >> > > > > ______________________________**________________ > > R-help at r-project.org mailing list > > https://stat.ethz.ch/mailman/**listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/** > > posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > > [[alternative HTML version deleted]] > > > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From knifeboot at 163.com Fri Aug 5 17:23:06 2011 From: knifeboot at 163.com (KnifeBoot) Date: Fri, 5 Aug 2011 23:23:06 +0800 (CST) Subject: [R] plot the implicit function Message-ID: <2d3fb47d.cb7e.1319a8a661a.Coremail.knifeboot@163.com> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From jdnewmil at dcn.davis.ca.us Fri Aug 5 17:27:00 2011 From: jdnewmil at dcn.davis.ca.us (Jeff Newmiller) Date: Fri, 05 Aug 2011 08:27:00 -0700 Subject: [R] problemsn in using nls In-Reply-To: References: Message-ID: <7ef7f6a2-f308-4a5d-8091-c198fbcd874c@email.android.com> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From jvadams at usgs.gov Fri Aug 5 17:28:12 2011 From: jvadams at usgs.gov (Jean V Adams) Date: Fri, 5 Aug 2011 10:28:12 -0500 Subject: [R] problemsn in using nls In-Reply-To: References: Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From murdoch.duncan at gmail.com Fri Aug 5 17:28:58 2011 From: murdoch.duncan at gmail.com (Duncan Murdoch) Date: Fri, 05 Aug 2011 11:28:58 -0400 Subject: [R] plot the implicit function In-Reply-To: <2d3fb47d.cb7e.1319a8a661a.Coremail.knifeboot@163.com> References: <2d3fb47d.cb7e.1319a8a661a.Coremail.knifeboot@163.com> Message-ID: <4E3C0C3A.8000206@gmail.com> On 05/08/2011 11:23 AM, KnifeBoot wrote: > Is there anybody willing to help me with the method of plot. > please show me the command lines to plot the implicit function "x^4+y^3=x^2+y" You were already told how to do this: use contour. The help page explains how. > and > "x=(sin[t])^3 y=(cos[t])^3 t :[0,2*pi]" Compute t using seq(), then plot the x and y values from it. Duncan Murdoch From info at aghmed.fsnet.co.uk Fri Aug 5 17:33:56 2011 From: info at aghmed.fsnet.co.uk (Michael Dewey) Date: Fri, 05 Aug 2011 16:33:56 +0100 Subject: [R] Limited number of principal components in PCA In-Reply-To: <1312481229763-3719440.post@n4.nabble.com> References: <1311964387395-3704956.post@n4.nabble.com> <011301cc506b$5060b6a0$f12223e0$@edu> <1312481229763-3719440.post@n4.nabble.com> Message-ID: At 19:07 04/08/2011, William Armstrong wrote: >David and Josh, > >Thank you for the suggestions. I have attached a file ('q_values.txt') that >contains the values of the 'Q' variable. > >David -- I am attempting an 'S' mode PCA, where the columns are actually the >cases (different stream gaging stations) and the rows are the variables (the >maximum flow at each station for a given year). I think the format you are >referring to is 'R' mode, but I was under the impression that R (the >program, not the PCA mode) could handle the analyses in either format. Am I >mistaken? > >My first eigenvalue is: > > > unrotated_pca_q$sdev[1]^2 >[1] 17.77812 > >Does that value seem large enough to explain the reduction in principal >components from 65 to 54? try doing table(complete.cases(q_values)) or whatever you are calling q_valu