From marc_schwartz at me.com Fri Jul 1 00:07:29 2011
From: marc_schwartz at me.com (Marc Schwartz)
Date: Thu, 30 Jun 2011 17:07:29 -0500
Subject: [R] storing the return of system() function
In-Reply-To:
References:
Message-ID: <99AC2BB7-8827-4A53-9759-7E1C0C085B4A@me.com>

On Jun 30, 2011, at 4:55 PM, John Dennison wrote:

> To get the tech specs out of the way I am running Rstudio(R 2.13.0) on a
> ubuntu box. I am trying to use information returned from the system command
> within an Rscript. System() passes commands to my ubuntu prompt and returns
> the normal messages.
> For example.
>
>> system("date")
> Thu Jun 30 21:48:20 UTC 2011
>
> However when i try to store this return(something i need to do for my
> purposes) nothing is stored(or at least not what shown on the R prompt)
>
>> store<-system("date")
> Thu Jun 30 21:49:27 UTC 2011
>> store
> [1] 0
>
> How do i capture the return of system()?
>
> Thanks R-world,
>
> John Dennison

See the 'intern' argument:

> system("date", intern = TRUE)
[1] "Thu Jun 30 17:06:12 CDT 2011"

This returns a character vector of the results of the command executed.

HTH,

Marc Schwartz

From murdoch.duncan at gmail.com Fri Jul 1 00:30:55 2011
From: murdoch.duncan at gmail.com (Duncan Murdoch)
Date: Thu, 30 Jun 2011 18:30:55 -0400
Subject: [R] source, echo...and clicking the mouse
In-Reply-To:
References: <001001cc3762$978b6740$c6a235c0$@msu.edu> <1014D1BD-97EB-4BBC-A0BE-258CACBA32AF@comcast.net> <002101cc376a$d988e430$8c9aac90$@msu.edu>
Message-ID: <4E0CF91F.2080904@gmail.com>

On 30/06/2011 5:33 PM, Greg Snow wrote:
> On some operating systems (which we don't know yours, see the posting guide) the output is buffered and including a call to flush.console() will flush all the output from the buffer to the console. Put the function call throughout the script and when it is run it will stop buffering for a bit.

The OP said he's running in Windows 7.
In that case, there's the "Misc|Buffered output" option, which makes things appear as soon as they are printed. It appears to work for source(..., echo=TRUE) as well.

Duncan Murdoch

>
> The other possibility is that your script does some plotting and that is what is pausing until you click. In that case you need to use a different graphics device or methodology to avoid this (see ?interactive).
>

From ashimkapoor at gmail.com Fri Jul 1 00:37:16 2011
From: ashimkapoor at gmail.com (Ashim Kapoor)
Date: Fri, 1 Jul 2011 04:07:16 +0530
Subject: [R] Points but no lines in qplot.
In-Reply-To:
References:
Message-ID:

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From walmeszeviani at gmail.com Fri Jul 1 01:14:03 2011
From: walmeszeviani at gmail.com (Walmes Zeviani)
Date: Thu, 30 Jun 2011 20:14:03 -0300
Subject: [R] Error "singular gradient matrix at initial parameter estimates" in nls
In-Reply-To: <4E0CD1BB.1090603@ucalgary.ca>
References: <1309439667.12889.57.camel@niklaus-desktop> <4E0CD1BB.1090603@ucalgary.ca>
Message-ID:

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From cody.shawn at yahoo.com Fri Jul 1 01:17:47 2011
From: cody.shawn at yahoo.com (Cody Hamilton)
Date: Thu, 30 Jun 2011 16:17:47 -0700 (PDT)
Subject: [R] testInstalledBasic
Message-ID: <534401.21270.qm@web120505.mail.ne1.yahoo.com>

Hello,

I installed R 2.13.0 on a Windows 2003 server. I downloaded the Rtools213.exe from http://www.murdoch-sutherland.com/Rtools/ and placed it in the path (C:\Program Files\R\R-2.13.0\bin).

I submitted the following code:

library(tools)
Sys.setenv(LC_COLLATE=C)
testInstalledBasic('basic')

I get the following message in the R Console, which I believe corresponds to a failure of the test:

> library(tools)
> Sys.setenv(LC_COLLATE=C)
> testInstalledBasic('basic')
running strict specific tests
  running code in ‘eval-etc.R’
  comparing ‘eval-etc.Rout’ to ‘eval-etc.Rout.save’
...[1] 1

Is there something wrong with my install?

Regards,
-Cody

From lwaldron.research at gmail.com Fri Jul 1 02:02:10 2011
From: lwaldron.research at gmail.com (Levi Waldron)
Date: Thu, 30 Jun 2011 20:02:10 -0400
Subject: [R] highlighting clusters in a heatmap
Message-ID:

I would like to draw horizontal or vertical lines on a heatmap to
highlight the clusters at some specified cut depth of the dendrogram.
As a hacked example, the following code would work if I could set the
coordinates of the top and bottom of the false color image correctly
(ymin and ymax), but the correct values seem to depend on the output
device and its size. I realize that heatmaps use a 2x2 layout which
makes the coordinate system non-obvious, but the result seems very
difficult to customize. I would appreciate any suggestions for manual
or pre-made solutions.

Example:

set.seed(2)
x <- matrix(rnorm(1000),ncol=10)  #obviously no real clusters here...

row.hclust <- hclust(dist(x))
row.dendro <- as.dendrogram(row.hclust)

heatmap(x,
        Rowv=row.dendro)
row.cut <- cutree(row.hclust,3)[row.hclust$order]
cutpoints <- which(row.cut[-1]!=row.cut[-length(row.cut)])
ymin <- par("usr")[3]  #in general incorrect
ymax <- par("usr")[4]  #in general incorrect
for (i in cutpoints){
  thisy <- ymin + (ymax-ymin)*(i-1)/nrow(x)
  abline(h=thisy,lw=3)
}

From wolfste4 at msu.edu Fri Jul 1 03:12:57 2011
From: wolfste4 at msu.edu (Steven Wolf)
Date: Thu, 30 Jun 2011 21:12:57 -0400
Subject: [R] source, echo...and clicking the mouse
In-Reply-To: <4E0CF91F.2080904@gmail.com>
References: <001001cc3762$978b6740$c6a235c0$@msu.edu> <1014D1BD-97EB-4BBC-A0BE-258CACBA32AF@comcast.net> <002101cc376a$d988e430$8c9aac90$@msu.edu> <4E0CF91F.2080904@gmail.com>
Message-ID: <000601cc378c$04692310$0d3b6930$@msu.edu>

Ok, I think I see how flush.console() works. Just put it in and whatever has been done so far (and stored in the buffer) pops out. Unfortunately, I'm doing an optimization, and the process is slow.
So that is where the "Misc|Buffered output" option is helpful.

Thanks!
-Steve

-----Original Message-----
From: Duncan Murdoch [mailto:murdoch.duncan at gmail.com]
Sent: Thursday, June 30, 2011 6:31 PM
To: Greg Snow
Cc: Steven Wolf; 'David Winsemius'; r-help at r-project.org
Subject: Re: [R] source, echo...and clicking the mouse

On 30/06/2011 5:33 PM, Greg Snow wrote:
> On some operating systems (which we don't know yours, see the posting guide) the output is buffered and including a call to flush.console() will flush all the output from the buffer to the console. Put the function call throughout the script and when it is run it will stop buffering for a bit.

The OP said he's running in Windows 7. In that case, there's the "Misc|Buffered output" option, which makes things appear as soon as they are printed. It appears to work for source(..., echo=TRUE) as well.

Duncan Murdoch

>
> The other possibility is that your script does some plotting and that is what is pausing until you click. In that case you need to use a different graphics device or methodology to avoid this (see ?interactive).
>

From dwinsemius at comcast.net Fri Jul 1 03:26:38 2011
From: dwinsemius at comcast.net (David Winsemius)
Date: Thu, 30 Jun 2011 21:26:38 -0400
Subject: [R] highlighting clusters in a heatmap
In-Reply-To:
References:
Message-ID:

On Jun 30, 2011, at 8:02 PM, Levi Waldron wrote:

> I would like to draw horizontal or vertical lines on a heatmap to
> highlight the clusters at some specified cut depth of the dendrogram.
> As a hacked example, the following code would work if I could set the
> coordinates of the top and bottom of the false color image correctly
> (ymin and ymax), but the correct values seem to depend on the output
> device and its size. I realize that heatmaps use a 2x2 layout which
> makes the coordinate system non-obvious, but the result seems very
> difficult to customize. I would appreciate any suggestions for manual
> or pre-made solutions.
The code discloses that the color image is made with the 'image' function and that its arguments include:

..., xlim = 0.5 + c(0, nc), ylim = 0.5 + c(0, nr), ...

?image # for details of coordinate choices.

Given the way the coordinate systems of that call might get messed up by the subsequent 'plot' call, my guess is that the fastest way to get what you want is to hack 'heatmap' by adding a couple of arguments and sticking a couple of "highlighting" functions just after the image call. (You haven't said how you are picking these levels.)

--
David.

>
> Example:
>
> set.seed(2)
> x <- matrix(rnorm(1000),ncol=10)  #obviously no real clusters here...
>
> row.hclust <- hclust(dist(x))
> row.dendro <- as.dendrogram(row.hclust)
>
> heatmap(x,
>         Rowv=row.dendro)
> row.cut <- cutree(row.hclust,3)[row.hclust$order]
> cutpoints <- which(row.cut[-1]!=row.cut[-length(row.cut)])
> ymin <- par("usr")[3]  #in general incorrect
> ymax <- par("usr")[4]  #in general incorrect
> for (i in cutpoints){
>   thisy <- ymin + (ymax-ymin)*(i-1)/nrow(x)
>   abline(h=thisy,lw=3)
> }
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
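
A minimal sketch of one way to draw the cluster boundaries without editing heatmap() itself, assuming the add.expr argument of stats::heatmap() is acceptable here. That expression is evaluated right after the internal image() call, so the image's own coordinates apply (row i is centred at y = i, with ylim = 0.5 + c(0, nr)). The data and the cut into 3 clusters are taken from the example quoted above; treat the rest as illustrative rather than as the poster's final solution:

```r
set.seed(2)
x <- matrix(rnorm(1000), ncol = 10)  # no real clusters, as in the example

row.hclust <- hclust(dist(x))
row.dendro <- as.dendrogram(row.hclust)

# Cluster labels in plotting order (bottom to top of the image)
row.cut <- cutree(row.hclust, 3)[row.hclust$order]

# Positions i where the label changes between plotted row i and row i + 1
cutpoints <- which(row.cut[-1] != row.cut[-length(row.cut)])

# add.expr runs in image coordinates, so the boundary between
# plotted rows i and i + 1 lies at y = i + 0.5
heatmap(x, Rowv = row.dendro,
        add.expr = abline(h = cutpoints + 0.5, lwd = 3))
```

Because the lines are drawn inside the image panel, their position should not depend on the output device or its size.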
David Winsemius, MD
West Hartford, CT

From Thomas.Adams at noaa.gov Fri Jul 1 03:41:10 2011
From: Thomas.Adams at noaa.gov (Thomas.Adams at noaa.gov)
Date: Thu, 30 Jun 2011 21:41:10 -0400
Subject: [R] Looking for Filliben (correlation test)
In-Reply-To: <1309452697532-3636248.post@n4.nabble.com>
References: <1309448830334-3636069.post@n4.nabble.com> <6187B43F-177C-4BF7-88DE-6F2478AD8FD8@comcast.net> <1309452697532-3636248.post@n4.nabble.com>
Message-ID:

----- Original Message -----
From: gaiarrido
Date: Thursday, June 30, 2011 1:11 pm
Subject: Re: [R] Looking for Filliben (correlation test)
To: r-help at r-project.org

Mario,

I did a google search and found this:

http://genepi.qimr.edu.au/staff/davidD/R/filliben.R

Cheers!
Tom

> Thanks very much, but...the OP, what's that?
> Sorry
>
> -----
> Mario Garrido Escudero
> PhD student
> Dpto. de Biología Animal, Ecología, Parasitología, Edafología y Qca. Agrícola
> Universidad de Salamanca
> --
> View this message in context:
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
>
> PLEASE do read the posting guide
> and provide commented, minimal, self-contained, reproducible code.

From PDowney at urban.org Fri Jul 1 05:08:33 2011
From: PDowney at urban.org (Downey, Patrick)
Date: Thu, 30 Jun 2011 23:08:33 -0400
Subject: [R] merge function
Message-ID: <0F96478603980B46AAAFBA77069582ED110A2546@UIEXCH.urban.org>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From PDowney at urban.org Fri Jul 1 05:22:36 2011
From: PDowney at urban.org (Downey, Patrick)
Date: Thu, 30 Jun 2011 23:22:36 -0400
Subject: [R] merge function
References: <0F96478603980B46AAAFBA77069582ED110A2546@UIEXCH.urban.org>
Message-ID: <0F96478603980B46AAAFBA77069582ED110A2547@UIEXCH.urban.org>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From jdnewmil at dcn.davis.ca.us Fri Jul 1 06:48:01 2011
From: jdnewmil at dcn.davis.ca.us (Jeff Newmiller)
Date: Thu, 30 Jun 2011 21:48:01 -0700
Subject: [R] merge function
In-Reply-To: <0F96478603980B46AAAFBA77069582ED110A2546@UIEXCH.urban.org>
References: <0F96478603980B46AAAFBA77069582ED110A2546@UIEXCH.urban.org>
Message-ID: <074be3d7-d1c3-40d7-bdc1-34513a1ef43a@email.android.com>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From akhussanov at gmail.com Fri Jul 1 09:01:53 2011
From: akhussanov at gmail.com (UnitRoot)
Date: Fri, 1 Jul 2011 00:01:53 -0700 (PDT)
Subject: [R] How to fit ARMA model
Message-ID: <1309503713258-3637632.post@n4.nabble.com>

Hello,
I am having some problems with fitting an ARMA model to my time series data (randomly generated numbers). The thing is I have tried many packages [tseries, fseries, FitARMA etc.] and all of them giving very different results. I would appreciate if someone could post here what the best package is for my purpose.
Also, after having done the fitting, I would like to check for the model's adequacy. How can I do this?
Thanks.

--
View this message in context: http://r.789695.n4.nabble.com/How-to-fit-ARMA-model-tp3637632p3637632.html
Sent from the R help mailing list archive at Nabble.com.

From anna.botto at gmail.com Fri Jul 1 01:28:12 2011
From: anna.botto at gmail.com (nany23)
Date: Thu, 30 Jun 2011 16:28:12 -0700 (PDT)
Subject: [R] Numerical integration
In-Reply-To:
References: <1309391539780-3634365.post@n4.nabble.com>
Message-ID: <1309476492602-3637092.post@n4.nabble.com>

thank you very much for your suggestion!

I tried to do that with the psf I need to use: the 3 parameters Lognormal.
I did that with a single xstar and a single triplet of parameters to check that it works. [I put in some numbers to make it work, but actually they come from statistical analysis]

# these are the 3 parameters
a<- 414.566
b<- 345.5445
g<- -0.9695679

xstar<- 1397.923

#I create a vector
pars <-expand.grid(xstar = xstar, a= a, b= b , g= g)

fun <- function(xstar, a,b,g,k) {
    f <- function(x, xstar, a, b,g,k) f.lognorm(x) * k * x * (x >= xstar)
    integrate(f, -Inf, Inf, xstar = xstar, a = a, b =b, g=g, k=k)$value
}

# Method 1: (outputs a data frame)
library(plyr)
out <- mdply(pars, fun)

at this stage a warning message comes out:

Errore in k > -1e-07 : 'k' is missing

Any ideas of why this error comes out? I really don't know.....

Moreover, can I use the suggested algorithm iteratively, with a grid of xstar values and different triplets of parameters (instead of different values of k)?

Thank you so much for any help!!

--
View this message in context: http://r.789695.n4.nabble.com/Numerical-integration-tp3634365p3637092.html
Sent from the R help mailing list archive at Nabble.com.

From n.bowora at gmail.com Fri Jul 1 02:46:26 2011
From: n.bowora at gmail.com (Edward Bowora)
Date: Fri, 1 Jul 2011 02:46:26 +0200
Subject: [R] Help fix last line of my optimization code
Message-ID:

Hi

I need help figure out how to fix my code.

When I call into R
>optimize(llik,init.params=F)
I get this error message
####Error in optimize(llik, init.params = F) : element 1 is empty;
   the part of the args list of 'min' being evaluated was:
   (interval)####

My data and my code looks like below.

R_j      R_m
0.002    0.026567296
0.01     0.003194435
.        .
.        .
.        .
.        .
0.0006   0.010281122

a=read.table("D:/ff.txt",header=T)
attach(a)
llik=function(R_j,R_m)
#The parameters al_j, au_j, b_j , and sigma_j need to be estimated and there are no initial estimates to them.
if(R_j< 0)
{
LF=sum[log(1/(2*pi*(sigma_j^2)))-(1/(2*(sigma_j^2))*(R_j+al_j-b_j*R_m))^2]
}else if(R_j>0)
{
LF=sum[log(1/(2*pi*(sigma_j^2)))-(1/(2*(sigma_j^2))*(R_j+au_j-b_j*R_m))^2]
}else
{
LF=sum[(log(pnorm((au_j-b_j*R_m)/sigma_j)-pnorm((al_j-b_j*R_m)/sigma_j)))]
}
optimize(llik,init.params=F)
Error in optimize(llik, init.params = F) : element 1 is empty;
   the part of the args list of 'min' being evaluated was:
   (interval)

Thank you

Edward

From gaiarrido at usal.es Fri Jul 1 08:34:56 2011
From: gaiarrido at usal.es (gaiarrido)
Date: Thu, 30 Jun 2011 23:34:56 -0700 (PDT)
Subject: [R] Looking for Filliben (correlation test)
In-Reply-To:
References: <1309448830334-3636069.post@n4.nabble.com> <6187B43F-177C-4BF7-88DE-6F2478AD8FD8@comcast.net> <1309452697532-3636248.post@n4.nabble.com>
Message-ID: <1309502096365-3637591.post@n4.nabble.com>

Thanks very much, I got, it runs perfect.

-----
Mario Garrido Escudero
PhD student
Dpto. de Biología Animal, Ecología, Parasitología, Edafología y Qca. Agrícola
Universidad de Salamanca
--
View this message in context: http://r.789695.n4.nabble.com/Looking-for-Filliben-correlation-test-tp3636069p3637591.html
Sent from the R help mailing list archive at Nabble.com.

From pdalgd at gmail.com Fri Jul 1 10:15:17 2011
From: pdalgd at gmail.com (peter dalgaard)
Date: Fri, 1 Jul 2011 10:15:17 +0200
Subject: [R] merge function
In-Reply-To: <074be3d7-d1c3-40d7-bdc1-34513a1ef43a@email.android.com>
References: <0F96478603980B46AAAFBA77069582ED110A2546@UIEXCH.urban.org> <074be3d7-d1c3-40d7-bdc1-34513a1ef43a@email.android.com>
Message-ID: <4CA5BD4D-7445-48D6-844B-7DD53CF39FE5@gmail.com>

On Jul 1, 2011, at 06:48 , Jeff Newmiller wrote:

> You haven't provided a reproducible example.
>
> I do notice you are using T and F which are variables that can be redefined (which is why TRUE and FALSE are preferred.

Also, if x and y really are "vectors" (I bet they're not, though), you'll get the cartesian product whatever all.x and all.y are, unless you specify by.x="x" and by.y="y". I.e.,

> merge(1:3,2:4,all.y=F,all.x=T)
  x y
1 1 2
2 2 2
3 3 2
4 1 3
5 2 3
6 3 3
7 1 4
8 2 4
9 3 4
> merge(1:3,2:4,by.x="x",by.y="y")
  x
1 2
2 3
> merge(1:3,2:4,by.x="x",by.y="y", all.x=T)
  x
1 1
2 2
3 3

All just to point out the importance of actual examples. Mind reading is sort of fun and some correspondents on mailing lists get rather good at it, but it is more expedient to have a well-defined problem from the outset.

-pd

>
> "Downey, Patrick" wrote:
>
> Hello,
>
> I'm clearly confused about the merge function. In the following
>
> r <- merge(x,y,all.x=T,all.y=F)
>
> my y vector has only unique values (no duplicates). So I don't understand
> how this can ever generate an r which is of greater length than x.

--
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com

From Graham.Williams at togaware.com Fri Jul 1 11:19:06 2011
From: Graham.Williams at togaware.com (Graham Williams)
Date: Fri, 1 Jul 2011 19:19:06 +1000
Subject: [R] Launcher for Rattle?
In-Reply-To:
References:
Message-ID:

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From vchouraki at yahoo.fr Fri Jul 1 12:52:43 2011
From: vchouraki at yahoo.fr (vincent chouraki)
Date: Fri, 1 Jul 2011 11:52:43 +0100 (BST)
Subject: [R] methods package not loaded by default when using Rscript in R2.13
Message-ID: <1309517563.77876.YahooMailRC@web25401.mail.ukl.yahoo.com>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From bhh at xs4all.nl Fri Jul 1 13:38:29 2011
From: bhh at xs4all.nl (Berend Hasselman)
Date: Fri, 1 Jul 2011 04:38:29 -0700 (PDT)
Subject: [R] methods package not loaded by default when using Rscript in R2.13
In-Reply-To: <1309517563.77876.YahooMailRC@web25401.mail.ukl.yahoo.com>
References: <1309517563.77876.YahooMailRC@web25401.mail.ukl.yahoo.com>
Message-ID: <1309520309121-3638002.post@n4.nabble.com>

Vincent Chouraki wrote:
>
> Dear all,
>
> As the object of this mail suggests, the methods package is not loaded by
> default in R2.13 when using Rscript whereas it is loaded when using an
> interactive session.
>

?Rscript will tell you why and what you could do to change the default.

Berend

--
View this message in context: http://r.789695.n4.nabble.com/methods-package-not-loaded-by-default-when-using-Rscript-in-R2-13-tp3637937p3638002.html
Sent from the R help mailing list archive at Nabble.com.

From pdalgd at gmail.com Fri Jul 1 13:41:52 2011
From: pdalgd at gmail.com (peter dalgaard)
Date: Fri, 1 Jul 2011 13:41:52 +0200
Subject: [R] methods package not loaded by default when using Rscript in R2.13
In-Reply-To: <1309517563.77876.YahooMailRC@web25401.mail.ukl.yahoo.com>
References: <1309517563.77876.YahooMailRC@web25401.mail.ukl.yahoo.com>
Message-ID:

On Jul 1, 2011, at 12:52 , vincent chouraki wrote:

> Dear all,
>
> As the object of this mail suggests, the methods package is not loaded by
> default in R2.13 when using Rscript whereas it is loaded when using an
> interactive session.
>
> An example with the metafor package :
>
> library(metafor)
> example(addpoly.rma)
>
> this works in an interaction R shell, but put it in a .R file and use it with
> Rscript, it won't work until you added
>
> library(methods)
>
> One of my colleague had the same issue with other custom R functions (work in
> 2.12, not anymore in 2.13 because of the same behavior)
>
> Is this something that was planned or a bug?

It's intentional.
See help(Rscript) for the reason and the remedies.

-pd

--
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com

From n.bowora at gmail.com Fri Jul 1 10:16:25 2011
From: n.bowora at gmail.com (Edward Bowora)
Date: Fri, 1 Jul 2011 10:16:25 +0200
Subject: [R] help optimize
Message-ID:

Hi

I need help figure out how to fix my code.

When I call into R
>optimize(llik,init.params=F)
I get this error message
####Error in optimize(llik, init.params = F) : element 1 is empty;
   the part of the args list of 'min' being evaluated was:
   (interval)####

My data and my code looks like below.

R_j      R_m
0.002    0.026567296
0.01     0.003194435
.        .
.        .
.        .
.        .
0.0006   0.010281122

a=read.table("D:/ff.txt",header=T)
attach(a)
llik=function(R_j,R_m)
#The parameters al_j, au_j, b_j , and sigma_j need to be estimated and there are no initial estimates to them.
if(R_j< 0)
{
LF=sum[log(1/(2*pi*(sigma_j^2)))-(1/(2*(sigma_j^2))*(R_j+al_j-b_j*R_m))^2]
}else if(R_j>0)
{
LF=sum[log(1/(2*pi*(sigma_j^2)))-(1/(2*(sigma_j^2))*(R_j+au_j-b_j*R_m))^2]
}else
{
LF=sum[(log(pnorm((au_j-b_j*R_m)/sigma_j)-pnorm((al_j-b_j*R_m)/sigma_j)))]
}
optimize(llik,init.params=F)
Error in optimize(llik, init.params = F) : element 1 is empty;
   the part of the args list of 'min' being evaluated was:
   (interval)

Thank you

Edward

From stam_kiral at hotmail.com Fri Jul 1 11:24:19 2011
From: stam_kiral at hotmail.com (stamkiral)
Date: Fri, 1 Jul 2011 02:24:19 -0700 (PDT)
Subject: [R] multiple moderated regression steps
Message-ID: <1309512259495-3637807.post@n4.nabble.com>

Hi,

I'm studying the moderated effects of perceived social support and justice world belief on the relationship between stress coping strategies and depression level. I have never run this analysis before, so I want to check whether my steps are correct.

First I
run the regression:

in step 1, the centered independent variables and centered moderators;
in step 2, the two-way interactions;
in step 3, the three-way interactions.

As a result I found significant two-way and three-way interactions. What comes after this is the important part.

For example, for the two-way interactions I ran 2 slopes, following Winnifred's suggestion (http://www.docstoc.com/docs/21151269/Moderated-Multiple-Regression-v5 p.6). My criterion for significance was the centered independent variable (the one with a significant interaction) in Block 2. Is this the correct way?

Second, to plot this slope I used Dawson's 2-way unstandardised Excel spreadsheet (http://www.jeremydawson.co.uk/slopes.htm). The spreadsheet requires the unstandardised regression coefficients of the IV, the moderator and the interaction. From which block should the IV and moderator coefficients be taken: block 1 (which includes the main effects) or block 2 (which includes the interactions)?

One more question: the three-way spreadsheet requires the variance of the IV*Moderator coefficient. In the covariance matrix, which covariance block is the correct one for this value? Should I look at the second covariance block or the third? Likewise, for the covariance of the IV*Mod1 and IV*Mod2 coefficients, which covariance block should be taken as correct?

Thanks so much.

Note: excuse me for my English

--
View this message in context: http://r.789695.n4.nabble.com/multiple-moderated-regression-steps-tp3637807p3637807.html
Sent from the R help mailing list archive at Nabble.com.

From ligges at statistik.tu-dortmund.de Fri Jul 1 14:13:02 2011
From: ligges at statistik.tu-dortmund.de (Uwe Ligges)
Date: Fri, 1 Jul 2011 14:13:02 +0200
Subject: [R] Help fix last line of my optimization code
In-Reply-To:
References:
Message-ID: <4E0DB9CE.5010801@statistik.tu-dortmund.de>

Please

a) read the posting guide
b) really submit reproducible code. Yours is not.
c) read ?optimize and learn to specify the interval
d) you specified an argument init.params, but that is not used in optimize() nor can it be usefully passed to llik.

Uwe Ligges

On 01.07.2011 02:46, Edward Bowora wrote:
> Hi
>
> I need help figure out how to fix my code.
>
> When I call into R
>> optimize(llik,init.params=F)
> I get this error message
> ####Error in optimize(llik, init.params = F) : element 1 is empty;
>    the part of the args list of 'min' being evaluated was:
>    (interval)####
>
>
> My data and my code looks like below.
>
>
> R_j      R_m
> 0.002    0.026567296
> 0.01     0.003194435
> .        .
> .        .
> .        .
> .        .
> 0.0006   0.010281122
>
> a=read.table("D:/ff.txt",header=T)
> attach(a)
> llik=function(R_j,R_m)
> #The parameters al_j, au_j, b_j ,
> and sigma_j need to be estimated and there are no initial estimates to
> them.
> if(R_j< 0)
> {
> LF=sum[log(1/(2*pi*(sigma_j^2)))-(1/(2*(sigma_j^2))*(R_j+al_j-b_j*R_m))^2]
> }else if(R_j>0)
> {
> LF=sum[log(1/(2*pi*(sigma_j^2)))-(1/(2*(sigma_j^2))*(R_j+au_j-b_j*R_m))^2]
> }else
> {
> LF=sum[(log(pnorm((au_j-b_j*R_m)/sigma_j)-pnorm((al_j-b_j*R_m)/sigma_j)))]
> }
> optimize(llik,init.params=F)
> Error in optimize(llik, init.params = F) : element 1 is empty;
>    the part of the args list of 'min' being evaluated was:
>    (interval)
>
> Thank you
>
> Edward
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

From Stephen.Ellison at lgcgroup.com Fri Jul 1 14:20:33 2011
From: Stephen.Ellison at lgcgroup.com (S Ellison)
Date: Fri, 1 Jul 2011 13:20:33 +0100
Subject: [R] Help fix last line of my optimization code
In-Reply-To:
References:
Message-ID: <98B156BB22D11342A931E823798D434853E9727CEA@GOLD.corp.lgc-group.com>

You need to read the help page for optimize.
Again, if you already have ;-)

The error message

>Error in optimize(llik, init.params = F) : element 1 is empty;
>   the part of the args list of 'min' being evaluated was:
>   (interval)####

is telling you that the parameter (interval) required by optimize in the absence of a specified min and max is empty. And if you look at your code you will see that that is because you have not provided it....

There are other problems:
Optimize requires a function taking _one_ parameter to optimise and an interval to optimise over, or alternatively a maximum and a minimum. You have provided no interval, max or min (which is why it went looking for interval) and you have fed it a function taking two parameters, both of which you want optimised.

You have also given it initial parameters (?) as a named parameter which it can't recognise - it's not one of the named arguments to optimize - so optimize will try to pass that to the function being optimised. That will fail if it ever gets there because _that_ doesn't use a parameter called init.params either.

I'm not clear what you wanted init.params to do but at a rough guess you had in mind a function for which init.params is the initial parameter vector and, if FALSE, is guessed at. optimize() is not that function.

Perhaps you meant to use optim - which does two-parameter optimisation - but you _will_ need to specify a starting vector as that doesn't take an init.params argument either.

As an alternative, you could use something like nls which takes a self-starting function argument which generates its own initial values, but you'll have to create the self-starting function yourself (see ?selfStart)

> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Edward Bowora
> Sent: 01 July 2011 01:46
> To: r-help
> Subject: [R] Help fix last line of my optimization code
>
> Hi
>
> I need help figure out how to fix my code.
>
> When I call into R
> >optimize(llik,init.params=F)
> I get this error message
> ####Error in optimize(llik, init.params = F) : element 1 is empty;
>    the part of the args list of 'min' being evaluated was:
>    (interval)####
>
>
> My data and my code looks like below.
>
>
> R_j      R_m
> 0.002    0.026567296
> 0.01     0.003194435
> .        .
> .        .
> .        .
> .        .
> 0.0006   0.010281122
>
> a=read.table("D:/ff.txt",header=T)
> attach(a)
> llik=function(R_j,R_m)
> #The parameters al_j, au_j, b_j ,
> and sigma_j need to be estimated and there are no initial
> estimates to them.
> if(R_j< 0)
> {
>
> LF=sum[log(1/(2*pi*(sigma_j^2)))-(1/(2*(sigma_j^2))*(R_j+al_j-
> b_j*R_m))^2]
> }else if(R_j>0)
> {
>
> LF=sum[log(1/(2*pi*(sigma_j^2)))-(1/(2*(sigma_j^2))*(R_j+au_j-
> b_j*R_m))^2]
> }else
> {
>
> LF=sum[(log(pnorm((au_j-b_j*R_m)/sigma_j)-pnorm((al_j-b_j*R_m)
> /sigma_j)))]
> }
> optimize(llik,init.params=F)
> Error in optimize(llik, init.params = F) : element 1 is empty;
>    the part of the args list of 'min' being evaluated was:
>    (interval)
>
> Thank you
>
> Edward
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

*******************************************************************
This email and any attachments are confidential. Any use...{{dropped:8}}

From anna.botto at gmail.com Fri Jul 1 14:47:40 2011
From: anna.botto at gmail.com (nany23)
Date: Fri, 1 Jul 2011 05:47:40 -0700 (PDT)
Subject: [R] Numerical integration
In-Reply-To: <1309476492602-3637092.post@n4.nabble.com>
References: <1309391539780-3634365.post@n4.nabble.com> <1309476492602-3637092.post@n4.nabble.com>
Message-ID: <1309524460489-3638104.post@n4.nabble.com>

thanks for the Italian!
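
I apologize for my previous explanation, which was not clear. Actually there are two "k" parameters, so I changed one of them; let's put it this way:

# these are the 3 parameters
a<- 414.566
b<- 345.5445
g<- -0.9695679

xstar<- 1397.923
m<-100

#I create a vector
pars <-expand.grid(xstar = xstar, a= a, b= b , g= g)

fun <- function(xstar, a,b,g,m) {
    f <- function(x, xstar, a, b,g,m) f.lognorm(x) * m * x * (x >= xstar)
    integrate(f, -Inf, Inf, xstar = xstar, a = a, b =b, g=g, m=m)$value
}

# Method 1: (outputs a data frame)
library(plyr)
out <- mdply(pars, fun)

at this stage a warning message comes out:

Errore in k > -1e-07 : 'k' is missing

The "k" the error refers to is one of the three parameters of the pdf distribution, whose formula is the following:

function (x, xi, alfa, k)
{
    if ((k > -1e-07) & (k < 1e-07)) {
        y <- (x - xi)/alfa
    }
    else {
        y <- -k^(-1) * log(1 - k * (x - xi)/alfa)
    }
    f <- exp(k * y - (y^2)/2)/(alfa * sqrt(2 * pi))
    return(f)
}

So the xi, alfa, k of the function are those which I call a, b, g [the parameters].

I'm afraid I'm making some very silly mistakes in the syntax but I don't know where they are or how to correct them... I tried different ways but they don't work...

--
View this message in context: http://r.789695.n4.nabble.com/Numerical-integration-tp3634365p3638104.html
Sent from the R help mailing list archive at Nabble.com.

From gm.spam2011 at gmail.com Fri Jul 1 14:32:31 2011
From: gm.spam2011 at gmail.com (B Marktplaats)
Date: Fri, 1 Jul 2011 14:32:31 +0200
Subject: [R] defining new variable
Message-ID:

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From vchouraki at yahoo.fr Fri Jul 1 13:57:53 2011
From: vchouraki at yahoo.fr (vincent chouraki)
Date: Fri, 1 Jul 2011 12:57:53 +0100 (BST)
Subject: [R] Re : methods package not loaded by default when using Rscript in R2.13
In-Reply-To:
References: <1309517563.77876.YahooMailRC@web25401.mail.ukl.yahoo.com>
Message-ID: <1309521473.64391.YahooMailRC@web25402.mail.ukl.yahoo.com>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From Eugeniusz.Kaluza at polsl.pl Fri Jul 1 14:53:45 2011
From: Eugeniusz.Kaluza at polsl.pl (Eugeniusz Kałuża)
Date: Fri, 1 Jul 2011 14:53:45 +0200
Subject: [R] How to filter XY pairs of inacurate gps position along track, taking into account the time index to not mix track from different days in one average track
Message-ID: <4D81F24AB1569B4984BCB79755E34DDF0154CFD0@styks.polsl.pl>

An embedded and charset-unspecified text was scrubbed...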
Name: not available
URL:

From ligges at statistik.tu-dortmund.de Fri Jul 1 15:04:18 2011
From: ligges at statistik.tu-dortmund.de (Uwe Ligges)
Date: Fri, 01 Jul 2011 15:04:18 +0200
Subject: [R] testInstalledBasic
In-Reply-To: <534401.21270.qm@web120505.mail.ne1.yahoo.com>
References: <534401.21270.qm@web120505.mail.ne1.yahoo.com>
Message-ID: <4E0DC5D2.5090002@statistik.tu-dortmund.de>

On 01.07.2011 01:17, Cody Hamilton wrote:
> Hello,
>
> I installed R 2.13.0 on a Windows 2003 server. I downloaded the Rtools213.exe from http://www.murdoch-sutherland.com/Rtools/ and placed it in the path (C:\Program Files\R\R-2.13.0\bin).
>
> I submitted the following code:
>
> library(tools)
> Sys.setenv(LC_COLLATE=C)
> testInstalledBasic('basic')
>
> I get the following message in the R Console, which I believe corresponds to a failure of the test:
>
>> library(tools)
>> Sys.setenv(LC_COLLATE=C)
>> testInstalledBasic('basic')
> running strict specific tests
>   running code in ‘eval-etc.R’
>   comparing ‘eval-etc.Rout’ to ‘eval-etc.Rout.save’
...[1] 1
>
> Is there something wrong with my install?

I took a closer look and your problem is that you want

    Sys.setenv(LC_COLLATE="C")

rather than

    Sys.setenv(LC_COLLATE=C)

since C is a function, whereas "C" is the character string you actually want to set.

Anyway, there is a bug in ./src/library/tools/R/testing.R (e.g. for today's R-devel): The line

    tests3 <- c("reg-tests-1", "reg-tests-2", "reg-IO", "reg-IO2", "reg-S4")

needs to be replaced by

    tests3 <- c("reg-tests-1a", "reg-tests-1b", "reg-tests-2", "reg-IO", "reg-IO2", "reg-S4")

Any R core member around to fix this?

Best,
Uwe Ligges

> Regards,
> -Cody
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

From hadley at rice.edu Fri Jul 1 15:01:25 2011
From: hadley at rice.edu (Hadley Wickham)
Date: Fri, 1 Jul 2011 08:01:25 -0500
Subject: [R] [R-pkgs] stringr 0.5
Message-ID:

# stringr

Strings are not glamorous, high-profile components of R, but they do play a big role in many data cleaning and preparation tasks. R provides a solid set of string operations, but because they have grown organically over time, they can be inconsistent and a little hard to learn. Additionally, they lag behind the string operations in other programming languages, so that some things that are easy to do in languages like Ruby or Python are rather hard to do in R. The `stringr` package aims to remedy these problems by providing a clean, modern interface to common string operations.

More concretely, `stringr`:

* Processes factors and characters in the same way.
* Gives functions consistent names and arguments.
* Simplifies string operations by eliminating options that you don't need 95% of the time.
* Produces outputs that can easily be used as inputs.
  This includes ensuring that missing inputs result in missing outputs, and zero length inputs result in zero length outputs.
* Completes R's string handling functions with useful functions from other programming languages.

stringr 0.5
===========

* new `str_wrap` function which gives `strwrap` output in a more convenient format
* new `word` function extracts words from a string given a user-defined separator (thanks to suggestion by David Cooper)
* `str_locate` now returns consistent type when matching empty string (thanks to Stavros Macrakis)
* new `str_count` counts number of matches in a string.
* `str_pad` and `str_trim` receive performance tweaks - for large vectors this should give at least a two-order-of-magnitude speedup
* `str_length` returns NA for invalid multibyte strings
* fix small bug in internal `recyclable` function

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

_______________________________________________
R-packages mailing list
R-packages at r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages

From bbolker at gmail.com Fri Jul 1 15:16:45 2011
From: bbolker at gmail.com (Ben Bolker)
Date: Fri, 1 Jul 2011 13:16:45 +0000
Subject: [R] How to fit ARMA model
References: <1309503713258-3637632.post@n4.nabble.com>
Message-ID:

UnitRoot gmail.com> writes:

>
> Hello,
> I am having some problems with fitting an ARMA model to my time series data
> (randomly generated numbers). The thing is I have tried many packages
> [tseries, fseries, FitARMA etc.] and all of them giving very different
> results. I would appreciate if someone could post here what the best package
> is for my purpose.
> Also, after having done the fitting, I would like to check for the model's
> adequacy. How can I do this?
> Thanks.

It's hard to say without more detail -- we don't know what your purpose is (beyond the general one of fitting an ARMA model to data).
I would say that when in doubt, if there is functionality in 'core' R -- the base and recommended packages -- it is likely to be the most stable and well tested. (Not always true -- there are some very good contributed packages, and sometimes the functions in R are missing some advanced features -- but a good rule of thumb.) So try ?arima.

Another rule of thumb is that Venables and Ripley 2002 is a good starting point (although again not necessarily as extensive as specialized topics) for "how do I do xxx in R"? See Chapter 14.

As suggested in the help for ?arima, ?tsdiag is "a generic function to plot time series diagnostics".

From f.harrell at vanderbilt.edu Fri Jul 1 15:22:31 2011
From: f.harrell at vanderbilt.edu (Frank Harrell)
Date: Fri, 1 Jul 2011 06:22:31 -0700 (PDT)
Subject: [R] multiple moderated regression steps
In-Reply-To: <1309512259495-3637807.post@n4.nabble.com>
References: <1309512259495-3637807.post@n4.nabble.com>
Message-ID: <1309526551121-3638186.post@n4.nabble.com>

How does it help to center the variables? What statistical principles are you using to fit your model and how are you ensuring statistical validity of inferential estimates? Why do it in multiple steps? Why use a mixture of software tools? What is the utility of standardized regression coefficients?

Frank

stamkiral wrote:
>
> hi,
>
> I'm studying moderated effects of perceived social support and justice
> world belief on relationship between stress coping strategies and
> depression level. I have never run this analysis before, so I want to
> check whether my steps are correct or not.
>
> first I run regression
> in step 1
> centered independent variables and centered moderators
>
> in step 2
> two way interactions
>
> in step 3
> three way interactions
>
> as results I found significant two way and three way interactions. "It
> is important after this"
>
> for example two way interactions;
> I
run 2 slopes that fit in with Winnifred's suggestion
> (http://www.docstoc.com/docs/21151269/Moderated-Multiple-Regression-v5
> p.6)
> my criterion for significance was the centered independent variable (that
> has significant interaction level) in Block 2--------- is this way
> correct?
>
> second, to plot this slope I used Dawson's 2 way unstandardised excel
> spreadsheet (http://www.jeremydawson.co.uk/slopes.htm). In the spreadsheet it
> is required to enter unstandardised regression coefficients of IV,
> moderator and interaction. In which block are the coefficients the right ones for
> entering IV and moderator coefficients?
> In block 1 (that includes main effects) or in block 2 (that includes
> interactions)?
>
> one more question: In the three way spreadsheet it is necessary to enter
> variance of IV*Moderator coefficient. In the matrix, which covariance block is
> correct to enter this value? should I look at second covariance block or
> third covariance block.
> like this, for value of Covariance of IV*Mod1, IV*Mod2 coefficients which
> covariance block must be taken as correct?
> thanks so much
>
> Note: excuse me for my English
>

-----
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: http://r.789695.n4.nabble.com/multiple-moderated-regression-steps-tp3637807p3638186.html
Sent from the R help mailing list archive at Nabble.com.
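
For readers following the thread, the three steps described in the quoted question can be sketched in R roughly as below. This is only an illustration on simulated data, not an endorsement of the stepwise approach that the reply questions; the variable names (`x`, `m1`, `m2`, `y`) and the effect sizes are hypothetical. The last lines also bear on the centering question: in a model containing all lower-order terms, the highest-order interaction coefficient is the same whether or not the predictors are centered.

```r
# Simulated data; all variable names and effect sizes are hypothetical.
set.seed(42)
n <- 200
d <- data.frame(x = rnorm(n, 4), m1 = rnorm(n, 4), m2 = rnorm(n, 4))
d$y <- 1 + 0.5 * d$x + 0.3 * d$m1 + 0.2 * d$m2 +
  0.4 * d$x * d$m1 + rnorm(n)

# Mean-center the predictor and both moderators (not the outcome).
d$xc  <- d$x  - mean(d$x)
d$m1c <- d$m1 - mean(d$m1)
d$m2c <- d$m2 - mean(d$m2)

# Step 1: main effects; step 2: two-way terms; step 3: three-way term.
step1 <- lm(y ~ xc + m1c + m2c, data = d)
step2 <- update(step1, . ~ . + xc:m1c + xc:m2c + m1c:m2c)
step3 <- update(step2, . ~ . + xc:m1c:m2c)
print(anova(step1, step2, step3))

# Same full model on the uncentered variables, for comparison.
raw <- lm(y ~ x * m1 * m2, data = d)
b_centered <- unname(coef(step3)["xc:m1c:m2c"])
b_raw      <- unname(coef(raw)["x:m1:m2"])
print(c(centered = b_centered, raw = b_raw))
```

The coefficient variances and covariances that the spreadsheets ask for can be read from `vcov()` applied to whichever fitted model the coefficients are taken from, e.g. `vcov(step3)`.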
From petr.pikal at precheza.cz Fri Jul 1 15:22:24 2011
From: petr.pikal at precheza.cz (Petr PIKAL)
Date: Fri, 1 Jul 2011 15:22:24 +0200
Subject: [R] Odp: defining new variable
In-Reply-To:
References:
Message-ID:

Hi

r-help-bounces at r-project.org wrote on 01.07.2011 14:32:31:

> B Marktplaats
> Sent by: r-help-bounces at r-project.org
>
> 01.07.2011 14:32
>
> To
>
> r-help at r-project.org
>
> Cc
>
> Subject
>
> [R] defining new variable
>
> Hello,
>
> I'm new to R and I'm trying to define new quite simple variable but I'm
> struggling with R syntax (when coming to dates) for a while and still
> getting on it.
>
> I would be very grateful if someone could help me with this, to be able to
> move on.
>
> I have the following variables:
>
> - Transplant.date
> - Faildate
> - Death.date
>
> The new variable Time should do the following thing:
>
> Time <-
>
> If Not IsNull() Then DaysBetween( ,)
> Else If IsNull() And Not IsNull() Then
> DaysBetween( ,) Else If IsNull() And
> IsNull() Then DaysBetween( ,CurrentDate())

I bet there is a more elegant solution, but with a data frame like this

> df
td fd dd
1 2011-06-11 2011-06-16
2 2011-06-12 2011-06-22
3 2011-06-13 2011-06-23
4 2011-06-14 2011-06-24
5 2011-06-15

you can

df$days<-rowSums(sapply(df[,2:3], "-", df$td), na.rm=T)

Then you can check for NA values in fd and dd and change the respective values by

df$days[rowSums(is.na(df[,2:3]))==2] <- Sys.Date()-df$td[rowSums(is.na(df[,2:3]))==2]

> df
td fd dd days
1 2011-06-11 2011-06-16 5
2 2011-06-12 2011-06-22 10
3 2011-06-13 2011-06-23 10
4 2011-06-14 2011-06-24 10
5 2011-06-15 16
>

Regards
Petr

>
>
> Thank you very much!!
>
> Laura
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
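
The same idea can also be written as a small self-contained sketch. It is an illustration only: the three example rows are made up, the column names `td`, `fd`, `dd` follow the reply above, "today" is fixed to the date of the thread so that the result is reproducible, and giving the failure date precedence over the death date is an assumption, since the argument names in the original pseudo-code were lost.

```r
# Made-up example rows; td = transplant, fd = failure, dd = death.
df <- data.frame(
  td = as.Date(c("2011-06-11", "2011-06-12", "2011-06-15")),
  fd = as.Date(c("2011-06-16", NA, NA)),
  dd = as.Date(c(NA, "2011-06-22", NA))
)

# Fixed reference date so the output does not change from day to day;
# in real use this would be Sys.Date().
today <- as.Date("2011-07-01")

# End of follow-up: failure date if present (assumed precedence),
# otherwise death date, otherwise today.
end <- df$fd
end[is.na(end)] <- df$dd[is.na(end)]
end[is.na(end)] <- today

# Number of days between transplant and end of follow-up.
df$days <- as.numeric(end - df$td)
print(df)
```

Subtracting two `Date` vectors gives a `difftime` in days, so `as.numeric()` is only there to store a plain number in the data frame.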
From ross.dunne at tcd.ie Fri Jul 1 16:10:38 2011
From: ross.dunne at tcd.ie (dunner)
Date: Fri, 1 Jul 2011 07:10:38 -0700 (PDT)
Subject: [R] Multilevel Survival Analysis - Cox PH Model
Message-ID: <1309529438658-3638278.post@n4.nabble.com>

Hello all, thanks for your time and patience.

I'm looking for a method in R to analyse the following data:

Time to waking after anaesthetic for medical procedures repeated on the same individual.

> str(mysurv)
 labelled [1:740, 1:2] 20 20 15 20 30+ 40+ 50 30 15 10 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:2] "time" "status"
 - attr(*, "type")= chr "right"
 - attr(*, "units")= chr "Day"
 - attr(*, "time.label")= chr "ORIENTATION"
 - attr(*, "event.label")= chr "FullyOrientated"

mysurv is constructed from the following data:

head(data.frame(MRN, ORIENTATION, FullyOrientated))

      MRN ORIENTATION FullyOrientated
1 0008291          20               2
2 0008469          20               2
3 0008469          15               2
4 0010188          20               2
5 0013664          30               1
6 0014217          40               1

I had planned to use a Cox PH model to analyse time to waking (ORIENTATION = 10, 15, 20 mins ....... 50 mins) and whether or not people (MRN) are fully awake within an hour (FullyOrientated). I've put GENDER, etc. into the model but I have the following bias:

The procedure is repeated weekly on each individual (MRN), so each individual has 5-9 cases associated with them. Currently I am including these in the model as if they were independent.

Is there a way to account for the non-independence of these waking times?

I'm thinking of something similar to the NLMER package and Multilevel / Mixed Effects analysis as described in Pinheiro and Bates.

I'd be appreciative of any help at all?

Thanks again,

R

--
View this message in context: http://r.789695.n4.nabble.com/Multilevel-Survival-Analysis-Cox-PH-Model-tp3638278p3638278.html
Sent from the R help mailing list archive at Nabble.com.
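
To make the setup in the question concrete, here is a minimal sketch of the model "as if the cases were independent", that is, the starting point the poster describes, not a solution to the clustering problem. The data are simulated; the column names follow the post, and reading `FullyOrientated == 2` as the event and `1` as censored is an assumption based on the `30+` and `40+` entries in the `str()` output.

```r
library(survival)  # recommended package that ships with R

# Simulated stand-in for the real data: 20 patients, 5 procedures each.
set.seed(1)
n_subj <- 20
n_rep  <- 5
d <- data.frame(
  MRN         = rep(sprintf("%07d", 1:n_subj), each = n_rep),
  GENDER      = rep(sample(c("F", "M"), n_subj, replace = TRUE), each = n_rep),
  ORIENTATION = sample(c(10, 15, 20, 30, 40, 50), n_subj * n_rep, replace = TRUE)
)
# Assumed coding: 2 = fully orientated (event), 1 = censored.
d$FullyOrientated <- ifelse(d$ORIENTATION >= 30 & runif(nrow(d)) < 0.5, 1, 2)

# Right-censored survival object, as in the str() output of the post.
mysurv <- Surv(d$ORIENTATION, d$FullyOrientated == 2)

# Naive Cox model that ignores the repeated measures within MRN.
fit_naive <- coxph(Surv(ORIENTATION, FullyOrientated == 2) ~ GENDER, data = d)
print(summary(fit_naive))
```

Each `MRN` contributes several rows here, which is exactly the non-independence the question is about; the naive fit treats all 100 rows as separate individuals.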
From lui.r.project at googlemail.com Fri Jul 1 16:11:05 2011
From: lui.r.project at googlemail.com (Lui ##)
Date: Fri, 1 Jul 2011 22:11:05 +0800
Subject: [R] SNOW libraries/functions, rGenoud
Message-ID:

Dear group,

does anybody know how to export libraries/functions to all nodes when launching snow? I want to use a function from fBasics (dstable) for a rGenoud optimization routine, but I fail at "making the function accessible" to the nodes created. I know how it works for variables, I know how it works in snowfall (which can't be used in that case), but I don't know how it could work in snow.

Help appreciated!

Lui

From ligges at statistik.tu-dortmund.de Fri Jul 1 16:17:09 2011
From: ligges at statistik.tu-dortmund.de (Uwe Ligges)
Date: Fri, 01 Jul 2011 16:17:09 +0200
Subject: [R] SNOW libraries/functions, rGenoud
In-Reply-To:
References:
Message-ID: <4E0DD6E5.8080604@statistik.tu-dortmund.de>

See ?clusterEvalQ

Uwe Ligges

On 01.07.2011 16:11, Lui ## wrote:
> Dear group,
>
> does anybody know how to export libraries/functions to all nodes when
> launching snow? I want to use a function from fBasics (dstable) for a
> rGenoud optimization routine, but I fail "making the function
> accessible" to the nodes created. I know how it works for variables, I
> know how it works in snowfall(which cant be used in that case), but I
> dont know how it culd work in snow.
>
> Help appreciated!
>
> Lui
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
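
A minimal sketch of what the pointer to `?clusterEvalQ` amounts to. To stay self-contained it uses the `parallel` package bundled with later versions of R, whose `makeCluster`, `clusterEvalQ`, `clusterExport` and `parSapply` mirror the snow functions of the same names; it loads `stats4` on the workers instead of fBasics so that nothing has to be installed. With snow and fBasics the corresponding call would presumably be `clusterEvalQ(cl, library(fBasics))`. The helper `myfun` is hypothetical.

```r
library(parallel)  # assumption: used here in place of snow (same function names)

cl <- makeCluster(2)  # two local worker processes

# Evaluate an expression on every node, e.g. to attach a package there.
# stats4 is only a stand-in for the package actually needed on the workers.
invisible(clusterEvalQ(cl, library(stats4)))

# A user-defined (hypothetical) function has to be copied to the nodes too.
myfun <- function(x) x^2
clusterExport(cl, "myfun", envir = environment())

# The workers can now call myfun and anything from the attached package.
res <- parSapply(cl, 1:4, function(i) myfun(i))
print(res)

stopCluster(cl)
```

`clusterEvalQ` runs code on the nodes (useful for `library()` calls), while `clusterExport` copies objects from the master session to the nodes; the question needs the former for the package and possibly the latter for any helper functions.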
From dwinsemius at comcast.net Fri Jul 1 16:22:21 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Fri, 1 Jul 2011 10:22:21 -0400 Subject: [R] Multilevel Survival Analysis - Cox PH Model In-Reply-To: <1309529438658-3638278.post@n4.nabble.com> References: <1309529438658-3638278.post@n4.nabble.com> Message-ID: On Jul 1, 2011, at 10:10 AM, dunner wrote: > Hello all, thanks for your time and patience. > > I'm looking for a method in R to analyse the following data: > > Time to waking after anaesthetic for medical procedures repeated on > the same > individual. > >> str(mysurv) > labelled [1:740, 1:2] 20 20 15 20 30+ 40+ 50 30 15 10 ... > - attr(*, "dimnames")=List of 2 > ..$ : NULL > ..$ : chr [1:2] "time" "status" > - attr(*, "type")= chr "right" > - attr(*, "units")= chr "Day" > - attr(*, "time.label")= chr "ORIENTATION" > - attr(*, "event.label")= chr "FullyOrientated" > > mysurv is constructed from the following data: > > head(data.frame(MRN, ORIENTATION, FullyOrientated)) > > MRN ORIENTATION FullyOrientated > 1 0008291 20 2 > 2 0008469 20 2 > 3 0008469 15 2 > 4 0010188 20 2 > 5 0013664 30 1 > 6 0014217 40 1 > > > I had planned to use a Cox PH model to analyse time to waking > (ORIENTATION = > 10, 15, 20 mins ....... 50 mins) and whether or not people (MRN) are > fully > awake within an hour (FullyOrientated). I've put GENDER, etc. into > the > model but I have the following bias: > > The procedure is repeated weekly on each individual (MRN), so each > individual has 5-9 cases associated with them. Currently I am > including > these in the model as if they were independent. > > Is there a way to account for the non-independence of these waking > times? > > I'm thinking of something similar to the NLMER package and > Multilevel / > Mixed Effects analysis as described in Pinheiro and Bates. Have you looked at the coxme package? 
--
David Winsemius, MD
West Hartford, CT

From pdalgd at gmail.com Fri Jul 1 16:26:42 2011
From: pdalgd at gmail.com (peter dalgaard)
Date: Fri, 1 Jul 2011 16:26:42 +0200
Subject: [R] Odp: defining new variable
In-Reply-To:
References:
Message-ID:

On Jul 1, 2011, at 15:22 , Petr PIKAL wrote:

> Hi
>
> r-help-bounces at r-project.org wrote on 01.07.2011 14:32:31:
>
>> B Marktplaats
>>
>>
>> I'm new to R and I'm trying to define new quite simple variable but I'm
>> struggling with R syntax (when coming to dates) for a while and still
>> getting on it.
>>
>> I would be very grateful if someone could help me with this, to be able
> to
>> move on.
>>
>> I have the following variables:
>>
>> - Transplant.date
>> - Faildate
>> - Death.date
>>
>> The new variable Time should do the following thing:
>>
>> Time <-
>>
>> If Not IsNull() Then DaysBetween(
> ,)
>> Else If IsNull() And Not IsNull() Then
>> DaysBetween( ,) Else If IsNull()
> And
>> IsNull() Then DaysBetween( ,CurrentDate())
>
> I bet there is more elegant solution but
>
> with such data frame you can
>
>> df
> td fd dd
> 1 2011-06-11 2011-06-16
> 2 2011-06-12 2011-06-22
> 3 2011-06-13 2011-06-23
> 4 2011-06-14 2011-06-24
> 5 2011-06-15
>
> df$days<-rowSums(sapply(df[,2:3], "-", df$td), na.rm=T)
>
> Then you can check for NA values in fd and dd and change respective values
> by
>
> df$days[rowSums(is.na(df[,2:3]))==2] <-
> Sys.Date()-df$td[rowSums(is.na(df[,2:3]))==2]
> df
> td fd dd days
> 1 2011-06-11 2011-06-16 5
> 2 2011-06-12 2011-06-22 10
> 3 2011-06-13 2011-06-23 10
> 4 2011-06-14 2011-06-24 10
> 5 2011-06-15 16
>>

I'd go for something like

df <- within(df,{
  days <- Sys.Date() - td
  days[!is.na(dd)] <- (dd - td)[!is.na(dd)]
  days[!is.na(fd)] <- (fd - td)[!is.na(fd)]
})

--
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com

From dwinsemius at comcast.net Fri Jul 1 16:32:52 2011
From: dwinsemius at comcast.net (David Winsemius) Date: Fri, 1 Jul 2011 10:32:52 -0400 Subject: [R] Multilevel Survival Analysis - Cox PH Model In-Reply-To: References: <1309529438658-3638278.post@n4.nabble.com> Message-ID: <7316E9DE-F5F0-4F43-8571-842697E26AB0@comcast.net> On Jul 1, 2011, at 10:22 AM, David Winsemius wrote: > > On Jul 1, 2011, at 10:10 AM, dunner wrote: > >> Hello all, thanks for your time and patience. >> >> I'm looking for a method in R to analyse the following data: >> >> Time to waking after anaesthetic for medical procedures repeated on >> the same >> individual. >> >>> str(mysurv) >> labelled [1:740, 1:2] 20 20 15 20 30+ 40+ 50 30 15 10 ... >> - attr(*, "dimnames")=List of 2 >> ..$ : NULL >> ..$ : chr [1:2] "time" "status" >> - attr(*, "type")= chr "right" >> - attr(*, "units")= chr "Day" >> - attr(*, "time.label")= chr "ORIENTATION" >> - attr(*, "event.label")= chr "FullyOrientated" >> >> mysurv is constructed from the following data: >> >> head(data.frame(MRN, ORIENTATION, FullyOrientated)) >> >> MRN ORIENTATION FullyOrientated >> 1 0008291 20 2 >> 2 0008469 20 2 >> 3 0008469 15 2 >> 4 0010188 20 2 >> 5 0013664 30 1 >> 6 0014217 40 1 >> >> >> I had planned to use a Cox PH model to analyse time to waking >> (ORIENTATION = >> 10, 15, 20 mins ....... 50 mins) and whether or not people (MRN) >> are fully >> awake within an hour (FullyOrientated). I've put GENDER, etc. into >> the >> model but I have the following bias: >> >> The procedure is repeated weekly on each individual (MRN), so each >> individual has 5-9 cases associated with them. Currently I am >> including >> these in the model as if they were independent. >> >> Is there a way to account for the non-independence of these waking >> times? >> >> I'm thinking of something similar to the NLMER package and >> Multilevel / >> Mixed Effects analysis as described in Pinheiro and Bates. > > Have you looked at the coxme package? 
As an initial strata()-gem, as it were, perhaps just adding strata(MRN) may parcel out the intra-individual variability and degrees of freedom, so that they are not inappropriately included in the IV's. My initial suggestion of coxme may be overkill.

> --
>
> David Winsemius, MD
> West Hartford, CT
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT

From armstrong.wou at gmail.com Fri Jul 1 15:47:47 2011
From: armstrong.wou at gmail.com (William Armstrong)
Date: Fri, 1 Jul 2011 06:47:47 -0700
Subject: [R] Writing Complex Formulas
Message-ID:

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From stam_kiral at hotmail.com Fri Jul 1 15:48:37 2011
From: stam_kiral at hotmail.com (stamkiral)
Date: Fri, 1 Jul 2011 06:48:37 -0700 (PDT)
Subject: [R] multiple moderated regression steps
In-Reply-To: <1309526551121-3638186.post@n4.nabble.com>
References: <1309512259495-3637807.post@n4.nabble.com> <1309526551121-3638186.post@n4.nabble.com>
Message-ID: <1309528117288-3638235.post@n4.nabble.com>

Variables were centered (except the DV), because in the social sciences zero is rarely a meaningful point on a scale (Cohen, Cohen, West, Aiken, 2006). For example, in the perceived social support questionnaire there is no zero value. It is a Likert-type questionnaire, on which 1 = strongly yes ..... 7 = strongly no. So Aiken and West (1991) suggested centering predictors and moderators to make zero a meaningful value in the regression equation. Also, multiple steps were run to avoid multicollinearity. According to the results, .05 was accepted as significant (in the coefficient table) and slope tests were done for interactions.
For plotting slopes, unstandardised regression coefficients were used.

--
View this message in context: http://r.789695.n4.nabble.com/multiple-moderated-regression-steps-tp3637807p3638235.html
Sent from the R help mailing list archive at Nabble.com.

From r.m.krug at gmail.com Fri Jul 1 17:02:07 2011
From: r.m.krug at gmail.com (Rainer M Krug)
Date: Fri, 1 Jul 2011 17:02:07 +0200
Subject: [R] regexp problem
Message-ID:

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From ggrothendieck at gmail.com Fri Jul 1 17:08:12 2011
From: ggrothendieck at gmail.com (Gabor Grothendieck)
Date: Fri, 1 Jul 2011 11:08:12 -0400
Subject: [R] regexp problem
In-Reply-To:
References:
Message-ID:

On Fri, Jul 1, 2011 at 11:02 AM, Rainer M Krug wrote:
> Hi
>
> I have a question concerning regexp - I want to select with grep all
> character strings which contain the numbers 11:20 (code below).
>
> At the moment I am using [], but that obviously does not work, as it matches
> each element in the []. Is there a way to specify that the regexp should
> match 11, but not 1?
>
> Here is the code:
>
> x <- paste("suff", 1:40, "pref", sep="_")
> x
> ##  [1] "suff_1_pref"  "suff_2_pref"  "suff_3_pref"  "suff_4_pref"
>  "suff_5_pref"
> ##  [6] "suff_6_pref"  "suff_7_pref"  "suff_8_pref"  "suff_9_pref"
>  "suff_10_pref"
> ## [11] "suff_11_pref" "suff_12_pref" "suff_13_pref" "suff_14_pref"
> "suff_15_pref"
> ## [16] "suff_16_pref" "suff_17_pref" "suff_18_pref" "suff_19_pref"
> "suff_20_pref"
> ## [21] "suff_21_pref" "suff_22_pref" "suff_23_pref" "suff_24_pref"
> "suff_25_pref"
> ## [26] "suff_26_pref" "suff_27_pref" "suff_28_pref" "suff_29_pref"
> "suff_30_pref"
> ## [31] "suff_31_pref" "suff_32_pref" "suff_33_pref" "suff_34_pref"
> "suff_35_pref"
> ## [36] "suff_36_pref" "suff_37_pref" "suff_38_pref" "suff_39_pref"
> "suff_40_pref"
>
> i <- paste(11:20, collapse=",")
> i
> ## [1] "11,12,13,14,15,16,17,18,19,20"
>
> grep(paste("suff_[", i, "]", sep=""), x, value=TRUE)
> ##  [1] "suff_1_pref"  "suff_2_pref"  "suff_3_pref"  "suff_4_pref"
>  "suff_5_pref"
> ##  [6] "suff_6_pref"  "suff_7_pref"  "suff_8_pref"  "suff_9_pref"
>  "suff_10_pref"
> ## [11] "suff_11_pref" "suff_12_pref" "suff_13_pref" "suff_14_pref"
> "suff_15_pref"
> ## [16] "suff_16_pref" "suff_17_pref" "suff_18_pref" "suff_19_pref"
> "suff_20_pref"
> ## [21] "suff_21_pref" "suff_22_pref" "suff_23_pref" "suff_24_pref"
> "suff_25_pref"
> ## [26] "suff_26_pref" "suff_27_pref" "suff_28_pref" "suff_29_pref"
> "suff_30_pref"
> ## [31] "suff_31_pref" "suff_32_pref" "suff_33_pref" "suff_34_pref"
> "suff_35_pref"
> ## [36] "suff_36_pref" "suff_37_pref" "suff_38_pref" "suff_39_pref"
> "suff_40_pref"
>
> ## But I would like to have
> ## [1] "suff_11_pref" "suff_12_pref" "suff_13_pref" "suff_14_pref"
> "suff_15_pref"
> ## [6] "suff_16_pref" "suff_17_pref" "suff_18_pref" "suff_19_pref"
> "suff_20_pref"

Here are two approaches:

grep("1\\d|20", x, value = TRUE)

grep(paste(11:20, collapse = "|"), x, value = TRUE)

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com From r.m.krug at gmail.com Fri Jul 1 17:19:57 2011 From: r.m.krug at gmail.com (Rainer M Krug) Date: Fri, 1 Jul 2011 17:19:57 +0200 Subject: [R] regexp problem In-Reply-To: References: Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From dwinsemius at comcast.net Fri Jul 1 17:21:19 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Fri, 1 Jul 2011 11:21:19 -0400 Subject: [R] regexp problem In-Reply-To: References: Message-ID: On Jul 1, 2011, at 11:02 AM, Rainer M Krug wrote: > Hi > > I have a question concerning regexp - I want to select with grep all > character strings which contain the numbers 11:20 (code below). > > At the moment I am using [], but that obviously does not work, as it > matches > each element in the []. Is there a way to specify that the regexp > should > match 11, but not 1? > > Here is the code code: > > x <- paste("suff", 1:40, "pref", sep="_") > x > ## [1] "suff_1_pref" "suff_2_pref" "suff_3_pref" "suff_4_pref" > "suff_5_pref" > ## [6] "suff_6_pref" "suff_7_pref" "suff_8_pref" "suff_9_pref" > "suff_10_pref" > ## [11] "suff_11_pref" "suff_12_pref" "suff_13_pref" "suff_14_pref" > "suff_15_pref" > ## [16] "suff_16_pref" "suff_17_pref" "suff_18_pref" "suff_19_pref" > "suff_20_pref" > ## [21] "suff_21_pref" "suff_22_pref" "suff_23_pref" "suff_24_pref" > "suff_25_pref" > ## [26] "suff_26_pref" "suff_27_pref" "suff_28_pref" "suff_29_pref" > "suff_30_pref" > ## [31] "suff_31_pref" "suff_32_pref" "suff_33_pref" "suff_34_pref" > "suff_35_pref" > ## [36] "suff_36_pref" "suff_37_pref" "suff_38_pref" "suff_39_pref" > "suff_40_pref" > > grep("suff_1[1-9]|suff_20", x, value=TRUE) [1] "suff_11_pref" "suff_12_pref" "suff_13_pref" "suff_14_pref" "suff_15_pref" "suff_16_pref" [7] "suff_17_pref" "suff_18_pref" "suff_19_pref" "suff_20_pref" > i <- paste(11:20, collapse=",") > i > ## [1] "11,12,13,14,15,16,17,18,19,20" That does not look right. 
You now have a single element with lots of commas. > > grep(paste("suff_[", i, "]", sep=""), x, value=TRUE) > ## [1] "suff_1_pref" "suff_2_pref" "suff_3_pref" "suff_4_pref" > "suff_5_pref" > ## [6] "suff_6_pref" "suff_7_pref" "suff_8_pref" "suff_9_pref" > "suff_10_pref" > ## [11] "suff_11_pref" "suff_12_pref" "suff_13_pref" "suff_14_pref" > "suff_15_pref" > ## [16] "suff_16_pref" "suff_17_pref" "suff_18_pref" "suff_19_pref" > "suff_20_pref" > ## [21] "suff_21_pref" "suff_22_pref" "suff_23_pref" "suff_24_pref" > "suff_25_pref" > ## [26] "suff_26_pref" "suff_27_pref" "suff_28_pref" "suff_29_pref" > "suff_30_pref" > ## [31] "suff_31_pref" "suff_32_pref" "suff_33_pref" "suff_34_pref" > "suff_35_pref" > ## [36] "suff_36_pref" "suff_37_pref" "suff_38_pref" "suff_39_pref" > "suff_40_pref" > The list of values in an [ ] expression is not delimited by commas. You are matching on the first letter following the underscore whenever any character in the "i" string is present (including commas). > x[40] <- 'suff_,zz_pref' > grep(paste("suff_[", i, "]", sep=""), x, value=TRUE) # x[40] matches > ## But I would like to have > ## [1] "suff_11_pref" "suff_12_pref" "suff_13_pref" "suff_14_pref" > "suff_15_pref" > ## [6] "suff_16_pref" "suff_17_pref" "suff_18_pref" "suff_19_pref" > "suff_20_pref" > > Version and platform info: > >> version > _ > platform i686-pc-linux-gnu > arch i686 > os linux-gnu > system i686, linux-gnu > status > major 2 > minor 13.0 > year 2011 > month 04 > day 13 > svn rev 55427 > language R > version.string R version 2.13.0 (2011-04-13) > >> sessionInfo() > R version 2.13.0 (2011-04-13) > Platform: i686-pc-linux-gnu (32-bit) > > locale: > [1] LC_CTYPE=en_GB.utf8 LC_NUMERIC=C > [3] LC_TIME=en_GB.utf8 LC_COLLATE=en_GB.utf8 > [5] LC_MONETARY=C LC_MESSAGES=en_GB.utf8 > [7] LC_PAPER=en_GB.utf8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets 
methods base > > other attached packages: > [1] reshape_0.8.4 plyr_1.5.2 tgp_2.4-2 lhs_0.5 > [5] RSQLite_0.9-4 DBI_0.2-5 date_1.2-29 simecol_0.7-2 > [9] lattice_0.19-26 deSolve_1.10-2 > > loaded via a namespace (and not attached): > [1] grid_2.13.0 tools_2.13.0 >> > > Thanks in advance, > > Rainer > > -- > Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc (Conservation > Biology, > UCT), Dipl. Phys. (Germany) > > Centre of Excellence for Invasion Biology > Stellenbosch University > South Africa > > Tel : +33 - (0)9 53 10 27 44 > Cell: +33 - (0)6 85 62 59 98 > Fax (F): +33 - (0)9 58 10 27 44 > > Fax (D): +49 - (0)3 21 21 25 22 44 > > email: Rainer at krugs.de > > Skype: RMkrug > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT From tibs at stanford.edu Fri Jul 1 17:15:30 2011 From: tibs at stanford.edu (robert tibshirani) Date: Fri, 1 Jul 2011 08:15:30 -0700 Subject: [R] new version of samr package Message-ID: We have posted a major new version 2.0 of the samr package for large scale significance testing, especially for gene expression data. This new version handles RNA-seq data, using the new method described in http://www-stat.stanford.edu/~tibs/ftp/Li_Tibs.pdf We have also added simple interface functions SAM and SAMseq. Comments (and bug reports) welcome! -- I get so much email that I might not reply to an incoming email, just because it got lost. So don't hesitate to email me again. The probability of a reply should increase. Prof. 
Robert Tibshirani
Depts of Health Research and Policy, and Statistics
Stanford Univ
Stanford CA 94305
tibs at stanford.edu
http://www-stat.stanford.edu/~tibs

From gunter.berton at gene.com Fri Jul 1 17:31:20 2011
From: gunter.berton at gene.com (Bert Gunter)
Date: Fri, 1 Jul 2011 08:31:20 -0700
Subject: [R] Multilevel Survival Analysis - Cox PH Model
In-Reply-To: <7316E9DE-F5F0-4F43-8571-842697E26AB0@comcast.net>
References: <1309529438658-3638278.post@n4.nabble.com> <7316E9DE-F5F0-4F43-8571-842697E26AB0@comcast.net>
Message-ID:

Is there any right censoring? If not, then plain old lme, lmer, gam (in mgcv), ... etc. would seem to me to do just fine for time to waking = ORIENTATION as a response -- or are you thinking of this as interval-censored data, which it would appear to be since you've binned the response? I strongly suspect that the simpler approach would work pretty well, anyway, but ...

For FULLY ORIENTED, which is binary, glmm, glmer, etc. would seem to do if there's no censoring. I'd be happy doing this even for a "small" amount of right censoring (uh-oh) and the usual hack cheats -- replacing the censoring time with something slightly bigger than the censoring time, for example. But here I'd be getting my knuckles slapped -- whack whack! -- for statistical hackery, probably justifiably, so you probably need to retreat to the original approach if that's the case.

I would welcome criticism/correction on this suggestion if I'm all wet -- or even just damp.

Cheers,
Bert

On Fri, Jul 1, 2011 at 7:32 AM, David Winsemius wrote:
>
> On Jul 1, 2011, at 10:22 AM, David Winsemius wrote:
>
>>
>> On Jul 1, 2011, at 10:10 AM, dunner wrote:
>>
>>> Hello all, thanks for your time and patience.
>>>
>>> I'm looking for a method in R to analyse the following data:
>>>
>>> Time to waking after anaesthetic for medical procedures repeated on the
>>> same
>>> individual.
>>>
>>>> str(mysurv)
>>>
>>> labelled [1:740, 1:2] 20 20 15 20 30+ 40+ 50 30 15 10 ...
>>> - attr(*, "dimnames")=List of 2
>>> ..$ : NULL
>>> ..$ : chr [1:2] "time" "status"
>>> - attr(*, "type")= chr "right"
>>> - attr(*, "units")= chr "Day"
>>> - attr(*, "time.label")= chr "ORIENTATION"
>>> - attr(*, "event.label")= chr "FullyOrientated"
>>>
>>> mysurv is constructed from the following data:
>>>
>>> head(data.frame(MRN, ORIENTATION, FullyOrientated))
>>>
>>>       MRN ORIENTATION FullyOrientated
>>> 1 0008291          20               2
>>> 2 0008469          20               2
>>> 3 0008469          15               2
>>> 4 0010188          20               2
>>> 5 0013664          30               1
>>> 6 0014217          40               1
>>>
>>>
>>> I had planned to use a Cox PH model to analyse time to waking
>>> (ORIENTATION =
>>> 10, 15, 20 mins ....... 50 mins) and whether or not people (MRN) are
>>> fully
>>> awake within an hour (FullyOrientated). I've put GENDER, etc. into the
>>> model but I have the following bias:
>>>
>>> The procedure is repeated weekly on each individual (MRN), so each
>>> individual has 5-9 cases associated with them. Currently I am including
>>> these in the model as if they were independent.
>>>
>>> Is there a way to account for the non-independence of these waking times?
>>>
>>> I'm thinking of something similar to the NLMER package and Multilevel /
>>> Mixed Effects analysis as described in Pinheiro and Bates.
>>
>> Have you looked at the coxme package?
>
> As an initial strata()-gem, as it were, perhaps just adding strata(MRN) may
> parcel out the intra-individual variability and degrees of freedom, so that
> they are not inappropriately included in the IV's. My initial suggestion of
> coxme may be overkill.
> >> -- >> >> David Winsemius, MD >> West Hartford, CT >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > David Winsemius, MD > West Hartford, CT > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- "Men by nature long to get on to the ultimate truths, and will often be impatient with elementary studies or fight shy of them. If it were possible to reach the ultimate truths without the elementary studies usually prefixed to them, these would not be preparatory studies but superfluous diversions." -- Maimonides (1135-1204) Bert Gunter Genentech Nonclinical Biostatistics 467-7374 http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm From lists at revelle.net Fri Jul 1 17:36:24 2011 From: lists at revelle.net (William Revelle) Date: Fri, 1 Jul 2011 10:36:24 -0500 Subject: [R] very large pair() plot In-Reply-To: References: <1309382885395-3634075.post@n4.nabble.com> Message-ID: In addition to Uwe's suggestion of creating a pdf and then plotting to it, it is useful to set the gap size to 0 (gap=0) You might also look at pairs.panels in the psych package which implements one of the examples in pairs (i.e., it gives histograms on the diagonal and reports the correlation above the diagonal). Bill At 11:30 AM -0600 6/30/11, Greg Snow wrote: >In addition to Uwe's answer, you might also want to consider the >pairs2 function in the TeachingDemos package. 
It lets you plot >sections of the overall scatterplot matrix rather than the whole >thing, so you could spread the entire scatterplot matrix over >multiple pages. > >-- >Gregory (Greg) L. Snow Ph.D. >Statistical Data Center >Intermountain Healthcare >greg.snow at imail.org >801.408.8111 > > >> -----Original Message----- >> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r- >> project.org] On Behalf Of ahrager >> Sent: Wednesday, June 29, 2011 3:28 PM >> To: r-help at r-project.org >> Subject: [R] very large pair() plot >> >> Hi everyone, >> >> I'm a newbie and this is my first post. >> >> My boss wants me to make a series of scatter plots where 76 variables >> are >> plotted against each other. I know how to do this using pair()...my >> problem >> is that there are just too many plots to fit in the window. >> >> Is there any way I can get all the plots to fit and make the font size >> and >> marker size scale so it is readable? My goal is to create a *.pdf file >> that >> I can send to our large plotter. >> >> Thank you, >> Audrey Rager >> >> >> >> -- >> View this message in context: http://r.789695.n4.nabble.com/very-large- >> pair-plot-tp3634075p3634075.html >> Sent from the R help mailing list archive at Nabble.com. >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting- >> guide.html >> and provide commented, minimal, self-contained, reproducible code. > >______________________________________________ >R-help at r-project.org mailing list >https://stat.ethz.ch/mailman/listinfo/r-help >PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >and provide commented, minimal, self-contained, reproducible code. 
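
The suggestions in this thread (plot to a pdf() device and set gap = 0) can be sketched roughly as follows. This is only an illustration: the helper name pairs_to_pdf, the one-inch-per-panel page size, and the simulated data are assumptions, not something from the original posts.

```r
# Hypothetical helper (not from the thread): write a scatterplot matrix to a
# PDF whose page size grows with the number of variables.
pairs_to_pdf <- function(x, file, inches_per_panel = 1) {
  side <- ncol(x) * inches_per_panel      # page width and height in inches
  pdf(file, width = side, height = side)  # open the PDF device
  on.exit(dev.off())                      # always close the device again
  # gap = 0 removes the space between panels; pch = "." keeps points small
  pairs(x, gap = 0, pch = ".", cex.labels = 0.8)
  invisible(file)
}

# Simulated stand-in for the real data (10 variables here, 76 in the question)
set.seed(1)
demo_data <- as.data.frame(matrix(rnorm(200 * 10), ncol = 10))
out_file <- pairs_to_pdf(demo_data, file.path(tempdir(), "pairs-demo.pdf"))
```

With 76 variables the same call would produce a 76 x 76 inch page under these assumptions, so inches_per_panel would probably need adjusting to whatever the plotter accepts.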
From marchywka at hotmail.com Fri Jul 1 17:33:01 2011
From: marchywka at hotmail.com (Mike Marchywka)
Date: Fri, 1 Jul 2011 11:33:01 -0400
Subject: [R] How to fit ARMA model
In-Reply-To:
References: <1309503713258-3637632.post@n4.nabble.com>,
Message-ID:

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From cody.shawn at yahoo.com Fri Jul 1 18:07:12 2011
From: cody.shawn at yahoo.com (Cody Hamilton)
Date: Fri, 1 Jul 2011 09:07:12 -0700 (PDT)
Subject: [R] testInstalledBasic
In-Reply-To: <4E0DC5D2.5090002@statistik.tu-dortmund.de>
Message-ID: <180935.90317.qm@web120526.mail.ne1.yahoo.com>

Hi Uwe,

Thank you for taking the time to look into this!

I created the function my.test by modifying testInstalledBasic with the line change you list below and then ran:

Sys.setenv(LC_COLLATE="C")
my.test('basic')

I get the same error message as before:

> my.test('basic')
running strict specific tests
  running code in ‘eval-etc.R’
  comparing ‘eval-etc.Rout’ to ‘eval-etc.Rout.save’
...[1] 1

Here is the function my.test:

my.test<-function (scope = c("basic", "devel", "both"))
{
    scope <- match.arg(scope)
    Sys.setlocale("LC_COLLATE", "C")
    tests1 <- c("eval-etc", "simple-true", "arith-true", "lm-tests",
        "ok-errors", "method-dispatch", "d-p-q-r-tests")
    tests2 <- c("complex", "print-tests", "lapack", "datasets")
    tests3 <- c("reg-tests-1a", "reg-tests-1b", "reg-tests-2", "reg-IO", "reg-IO2", "reg-S4")

    runone <- function(f, diffOK = FALSE, inC = TRUE) {
        f <- paste(f, "R", sep = ".")
        if (!file.exists(f)) {
            if (!file.exists(fin <- paste(f, "in", sep = "")))
                stop("file ", sQuote(f), " not found", domain = NA)
            message("creating ", sQuote(f))
            cmd <- paste(shQuote(file.path(R.home("bin"), "R")),
                "CMD BATCH --no-timing --vanilla --slave", fin)
            if (system(cmd))
                stop("creation of ", sQuote(f), " failed")
            on.exit(unlink(f))
        }
        message(" running code in ", sQuote(f))
        outfile <- paste(f, "out", sep = "")
        cmd <- paste(shQuote(file.path(R.home("bin"), "R")),
            "CMD BATCH --vanilla --no-timing", shQuote(f), shQuote(outfile))
        extra <- paste("LANGUAGE=C", "R_DEFAULT_PACKAGES=", "SRCDIR=.")
        if (inC)
            extra <- paste(extra, "LC_ALL=C")
        if (.Platform$OS.type == "windows") {
            Sys.setenv(LANGUAGE = "C")
            Sys.setenv(R_DEFAULT_PACKAGES = "")
            Sys.setenv(SRCDIR = ".")
        }
        else cmd <- paste(extra, cmd)
        res <- system(cmd)
        if (res) {
            file.rename(outfile, paste(outfile, "fail", sep = "."))
            message("FAILED")
            return(1L)
        }
        savefile <- paste(outfile, "save", sep = ".")
        if (file.exists(savefile)) {
            message(" comparing ", sQuote(outfile), " to ",
                sQuote(savefile), " ...", appendLF = FALSE)
            res <- Rdiff(outfile, savefile, TRUE)
            if (!res)
                message(" OK")
            else if (!diffOK)
                return(1L)
        }
        0L
    }
    owd <- setwd(file.path(R.home(), "tests"))
    on.exit(setwd(owd))
    if (scope %in% c("basic", "both")) {
        message("running strict specific tests")
        for (f in tests1) if (runone(f))
            return(1L)
        message("running sloppy specific tests")
        for (f in tests2) runone(f, TRUE)
        message("running regression tests")
        for (f in tests3)
        {
            if (runone(f))
                return(invisible(1L))
            if (f == "reg-plot") {
                message(" comparing 'reg-plot.ps' to 'reg-plot.ps.save' ...",
                    appendLF = FALSE)
                system("diff reg-plot.ps reg-plot.ps.save")
                message("OK")
            }
        }
        runone("reg-tests-3", TRUE)
        message("running tests of plotting Latin-1")
        message(" expect failure or some differences if not in a Latin or UTF-8 locale")
        runone("reg-plot-latin1", TRUE, FALSE)
        message(" comparing 'reg-plot-latin1.ps' to 'reg-plot-latin1.ps.save' ...",
            appendLF = FALSE)
        system("diff reg-plot-latin1.ps reg-plot-latin1.ps.save")
        message("OK")
    }
    if (scope %in% c("devel", "both")) {
        message("running tests of consistency of as/is.*")
        runone("isas-tests")
        message("running tests of random deviate generation -- fails occasionally")
        runone("p-r-random-tests", TRUE)
        message("running tests of primitives")
        if (runone("primitives"))
            return(invisible(1L))
        message("running regexp regression tests")
        if (runone("utf8-regex", inC = FALSE))
            return(invisible(1L))
        message("running tests to possibly trigger segfaults")
        if (runone("no-segfault"))
            return(invisible(1L))
    }
    invisible(0L)
}

--- On Fri, 7/1/11, Uwe Ligges wrote:

> From: Uwe Ligges
> Subject: Re: [R] testInstalledBasic
> To: "Cody Hamilton"
> Cc: r-help at r-project.org
> Date: Friday, July 1, 2011, 6:04 AM
>
> On 01.07.2011 01:17, Cody Hamilton wrote:
> > Hello,
> >
> > I installed R 2.13.0 on a Windows 2003 server. I downloaded the
> > Rtools213.exe from http://www.murdoch-sutherland.com/Rtools/ and placed
> > it in the path (C:\Program Files\R\R-2.13.0\bin).
> >
> > I submitted the following code:
> >
> > library(tools)
> > Sys.setenv(LC_COLLATE=C)
> > testInstalledBasic('basic')
> >
> > I get the following message in the R Console, which I believe
> > corresponds to a failure of the test:
> >
> >> library(tools)
> >> Sys.setenv(LC_COLLATE=C)
> >> testInstalledBasic('basic')
> > running strict specific tests
> >   running code in ‘eval-etc.R’
> >   comparing ‘eval-etc.Rout’ to ‘eval-etc.Rout.save’
...[1] 1
> >
> > Is there something wrong with my install?
>
> I took a closer look and your problem is that you want
>
> Sys.setenv(LC_COLLATE="C")
>
> rather than
>
> Sys.setenv(LC_COLLATE=C)
>
> since C is a function but "C" the character you actually want to set.
>
> Anyway, there is a bug in ./src/library/tools/R/testing.R (e.g. for
> today's R-devel):
>
> The line
>
>     tests3 <- c("reg-tests-1", "reg-tests-2", "reg-IO", "reg-IO2", "reg-S4")
>
> needs to be replaced by
>
>     tests3 <- c("reg-tests-1a", "reg-tests-1b", "reg-tests-2",
>         "reg-IO", "reg-IO2", "reg-S4")
>
> Any R core member around to fix this?
>
> Best,
> Uwe Ligges
>
> > Regards,
> >    -Cody
> >
> > ______________________________________________
> > R-help at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>

From ligges at statistik.tu-dortmund.de Fri Jul 1 18:13:09 2011
From: ligges at statistik.tu-dortmund.de (Uwe Ligges)
Date: Fri, 01 Jul 2011 18:13:09 +0200
Subject: [R] testInstalledBasic
In-Reply-To: <180935.90317.qm@web120526.mail.ne1.yahoo.com>
References: <180935.90317.qm@web120526.mail.ne1.yahoo.com>
Message-ID: <4E0DF215.4070801@statistik.tu-dortmund.de>

On 01.07.2011 18:07, Cody Hamilton wrote:
> Hi Uwe,
>
> Thank you for taking the time to look into this!
>
> I created the function my.test by modifying testInstalledBasic with the line change you list below and then ran:
>
> Sys.setenv(LC_COLLATE="C")
> my.test('basic')
>
> I get the same error message as before:
>
>> my.test('basic')
> running strict specific tests
>   running code in ‘eval-etc.R’
>   comparing ‘eval-etc.Rout’ to ‘eval-etc.Rout.save’ ...[1] 1

But we really need your diffs!
Uwe Ligges > Here is the function my.test: > > my.test<-function (scope = c("basic", "devel", "both")) > { > scope<- match.arg(scope) > Sys.setlocale("LC_COLLATE", "C") > tests1<- c("eval-etc", "simple-true", "arith-true", "lm-tests", > "ok-errors", "method-dispatch", "d-p-q-r-tests") > tests2<- c("complex", "print-tests", "lapack", "datasets") > tests3<- c("reg-tests-1a", "reg-tests-1b", "reg-tests-2", "reg-IO", "reg-IO2", "reg-S4") > > runone<- function(f, diffOK = FALSE, inC = TRUE) { > f<- paste(f, "R", sep = ".") > if (!file.exists(f)) { > if (!file.exists(fin<- paste(f, "in", sep = ""))) > stop("file ", sQuote(f), " not found", domain = NA) > message("creating ", sQuote(f)) > cmd<- paste(shQuote(file.path(R.home("bin"), "R")), > "CMD BATCH --no-timing --vanilla --slave", fin) > if (system(cmd)) > stop("creation of ", sQuote(f), " failed") > on.exit(unlink(f)) > } > message(" running code in ", sQuote(f)) > outfile<- paste(f, "out", sep = "") > cmd<- paste(shQuote(file.path(R.home("bin"), "R")), > "CMD BATCH --vanilla --no-timing", shQuote(f), shQuote(outfile)) > extra<- paste("LANGUAGE=C", "R_DEFAULT_PACKAGES=", "SRCDIR=.") > if (inC) > extra<- paste(extra, "LC_ALL=C") > if (.Platform$OS.type == "windows") { > Sys.setenv(LANGUAGE = "C") > Sys.setenv(R_DEFAULT_PACKAGES = "") > Sys.setenv(SRCDIR = ".") > } > else cmd<- paste(extra, cmd) > res<- system(cmd) > if (res) { > file.rename(outfile, paste(outfile, "fail", sep = ".")) > message("FAILED") > return(1L) > } > savefile<- paste(outfile, "save", sep = ".") > if (file.exists(savefile)) { > message(" comparing ", sQuote(outfile), " to ", > sQuote(savefile), " ...", appendLF = FALSE) > res<- Rdiff(outfile, savefile, TRUE) > if (!res) > message(" OK") > else if (!diffOK) > return(1L) > } > 0L > } > owd<- setwd(file.path(R.home(), "tests")) > on.exit(setwd(owd)) > if (scope %in% c("basic", "both")) { > message("running strict specific tests") > for (f in tests1) if (runone(f)) > return(1L) > message("running 
sloppy specific tests") > for (f in tests2) runone(f, TRUE) > message("running regression tests") > for (f in tests3) { > if (runone(f)) > return(invisible(1L)) > if (f == "reg-plot") { > message(" comparing 'reg-plot.ps' to 'reg-plot.ps.save' ...", > appendLF = FALSE) > system("diff reg-plot.ps reg-plot.ps.save") > message("OK") > } > } > runone("reg-tests-3", TRUE) > message("running tests of plotting Latin-1") > message(" expect failure or some differences if not in a Latin or UTF-8 locale") > runone("reg-plot-latin1", TRUE, FALSE) > message(" comparing 'reg-plot-latin1.ps' to 'reg-plot-latin1.ps.save' ...", > appendLF = FALSE) > system("diff reg-plot-latin1.ps reg-plot-latin1.ps.save") > message("OK") > } > if (scope %in% c("devel", "both")) { > message("running tests of consistency of as/is.*") > runone("isas-tests") > message("running tests of random deviate generation -- fails occasionally") > runone("p-r-random-tests", TRUE) > message("running tests of primitives") > if (runone("primitives")) > return(invisible(1L)) > message("running regexp regression tests") > if (runone("utf8-regex", inC = FALSE)) > return(invisible(1L)) > message("running tests to possibly trigger segfaults") > if (runone("no-segfault")) > return(invisible(1L)) > } > invisible(0L) > } > > > --- On Fri, 7/1/11, Uwe Ligges wrote: > >> From: Uwe Ligges >> Subject: Re: [R] testInstalledBasic >> To: "Cody Hamilton" >> Cc: r-help at r-project.org >> Date: Friday, July 1, 2011, 6:04 AM >> >> >> On 01.07.2011 01:17, Cody Hamilton wrote: >>> Hello, >>> >>> I installed R 2.13.0 on a Windows 2003 server. I >> downloaded the Rtools213.exe from http://www.murdoch-sutherland.com/Rtools/ >> and placed it in the path (C:\Program >> Files\R\R-2.13.0\bin). 
>>> >>> I submitted the following code: >>> >>> library(tools) >>> Sys.setenv(LC_COLLATE=C) >>> testInstalledBasic('basic') >>> >>> I get the following message in the R Console, which I >> believe corresponds to a failure of the test: >>> >>>> library(tools) >>>> Sys.setenv(LC_COLLATE=C) >>>> testInstalledBasic('basic') >>> running strict specific tests >>> running code in ?eval-etc.R? >>> comparing ?eval-etc.Rout? to >> ?eval-etc.Rout.save? ...[1] 1 >>> >>> Is there something wrong with my install? >> >> >> >> I took a closer look and your problem is that you want >> >> Sys.setenv(LC_COLLATE="C") >> >> rather than >> >> Sys.setenv(LC_COLLATE=C) >> >> since C is a function but "C" the character you actually >> want to set. >> >> >> >> >> Anyway, there is a bug in ./src/library/tools/R/testing.R >> (e.g. for >> today's R-devel): >> >> The line >> >> tests3<- c("reg-tests-1", >> "reg-tests-2", "reg-IO", "reg-IO2", >> "reg-S4") >> >> needs to be replaced by >> >> tests3<- c("reg-tests-1a", >> "reg-tests-1b", "reg-tests-2", >> "reg-IO", "reg-IO2", "reg-S4") >> >> Any R core member around to fix this? >> >> >> Best, >> Uwe Ligges >> >> >> >> >>> Regards, >>> -Cody >>> >>> ______________________________________________ >>> R-help at r-project.org >> mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, >> reproducible code. 
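
The quoting point raised in this thread (C versus "C") can be illustrated with a short sketch. It also shows, as background that is not taken from the posts, that setting the LC_COLLATE environment variable and setting the collation locale of the running session are separate steps. The variable names are invented for the example.

```r
# Unquoted, C is an object: the function stats::C() used for contrasts.
c_is_function <- is.function(C)

# Quoted, "C" is the name of the plain C/POSIX locale.
old_env <- Sys.getenv("LC_COLLATE", unset = NA)
Sys.setenv(LC_COLLATE = "C")          # environment variable, seen by child processes
env_value <- Sys.getenv("LC_COLLATE")

# The collation order of the current session is changed with Sys.setlocale().
old_locale <- Sys.getlocale("LC_COLLATE")
Sys.setlocale("LC_COLLATE", "C")
session_value <- Sys.getlocale("LC_COLLATE")
sorted <- sort(c("b", "A", "a", "B")) # C locale: upper case sorts before lower case

# Put things back the way they were.
Sys.setlocale("LC_COLLATE", old_locale)
if (is.na(old_env)) Sys.unsetenv("LC_COLLATE") else Sys.setenv(LC_COLLATE = old_env)
```

Note that the my.test function posted in this thread already calls Sys.setlocale("LC_COLLATE", "C") itself, so the failure reported here may well have a different cause.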
From f.harrell at vanderbilt.edu Fri Jul 1 18:25:01 2011
From: f.harrell at vanderbilt.edu (Frank Harrell)
Date: Fri, 1 Jul 2011 09:25:01 -0700 (PDT)
Subject: [R] multiple moderated regression steps
In-Reply-To: <1309528117288-3638235.post@n4.nabble.com>
References: <1309512259495-3637807.post@n4.nabble.com> <1309526551121-3638186.post@n4.nabble.com> <1309528117288-3638235.post@n4.nabble.com>
Message-ID: <1309537501504-3638669.post@n4.nabble.com>

That concern has nothing to do with centering variables before including them in a model. Your multiple significance testing strategy is not based on statistical principles and will distort all inferences you obtain from the "final" model.

Frank

stamkiral wrote:
>
> Variables were centered (except the DV), because in the social sciences zero is
> rarely a meaningful point on a scale (Cohen, Cohen, West, Aiken, 2006).
> For example, in the perceived social support questionnaire there is no value of
> zero. It is a Likert-type questionnaire, on which 1 = strongly
> yes ... 7 = strongly no. So Aiken and West (1991) suggested centering
> predictors and moderators to make zero a meaningful value in the
> regression equation. Also, multilevel steps were run to avoid
> multicollinearity. According to the results, .05 was accepted as significant
> (in the coefficient table) and slope tests were done for interactions.
> For plotting slopes, unstandardised regression coefficients were used.
>

-----
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: http://r.789695.n4.nabble.com/multiple-moderated-regression-steps-tp3637807p3638669.html
Sent from the R help mailing list archive at Nabble.com.
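
As background to the centering discussion quoted above, a small simulated sketch: mean-centering the predictors changes the lower-order coefficients, but the estimate, standard error and t statistic of the interaction term stay the same. The data and variable names are invented for illustration and have nothing to do with the poster's questionnaire data.

```r
set.seed(42)
n <- 200
x <- rnorm(n, mean = 4, sd = 1.5)   # e.g. a 1-7 style scale score (simulated)
z <- rnorm(n, mean = 4, sd = 1.5)   # simulated moderator
y <- 1 + 0.5 * x + 0.3 * z + 0.2 * x * z + rnorm(n)

# Model on the raw scale
raw_fit <- lm(y ~ x * z)

# Same model after mean-centering both predictors
xc <- x - mean(x)
zc <- z - mean(z)
centered_fit <- lm(y ~ xc * zc)

# Interaction rows of the two coefficient tables
raw_int <- coef(summary(raw_fit))["x:z", ]
centered_int <- coef(summary(centered_fit))["xc:zc", ]

# Lower-order ("main effect") coefficients do differ between the two fits
raw_main <- coef(raw_fit)[c("x", "z")]
centered_main <- coef(centered_fit)[c("xc", "zc")]
```

Both fits describe the same regression surface, so centering is a matter of interpretation of the lower-order terms rather than a remedy for the testing strategy criticised above.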
From cody.shawn at yahoo.com Fri Jul 1 18:51:56 2011
From: cody.shawn at yahoo.com (Cody Hamilton)
Date: Fri, 1 Jul 2011 09:51:56 -0700 (PDT)
Subject: [R] testInstalledBasic
In-Reply-To: <4E0DF215.4070801@statistik.tu-dortmund.de>
Message-ID: <865628.19532.qm@web120530.mail.ne1.yahoo.com>

Hello Uwe,

Please forgive my ignorance - how can I get my diffs?

Regards,
   -Cody

--- On Fri, 7/1/11, Uwe Ligges wrote:

> From: Uwe Ligges
> Subject: Re: [R] testInstalledBasic
> To: "Cody Hamilton"
> Cc: r-help at r-project.org
> Date: Friday, July 1, 2011, 9:13 AM
>
> On 01.07.2011 18:07, Cody Hamilton wrote:
> > Hi Uwe,
> >
> > Thank you for taking the time to look into this!
> >
> > I created the function my.test by modifying testInstalledBasic with
> > the line change you list below and then ran:
> >
> > Sys.setenv(LC_COLLATE="C")
> > my.test('basic')
> >
> > I get the same error message as before:
> >
> >> my.test('basic')
> > running strict specific tests
> >   running code in ‘eval-etc.R’
> >   comparing ‘eval-etc.Rout’ to ‘eval-etc.Rout.save’ ...[1] 1
>
> But we really need your diffs!
>
> Uwe Ligges

From Greg.Snow at imail.org Fri Jul 1 19:04:12 2011
From: Greg.Snow at imail.org (Greg Snow)
Date: Fri, 1 Jul 2011 11:04:12 -0600
Subject: [R] source, echo...and clicking the mouse
In-Reply-To: <4E0CF91F.2080904@gmail.com>
References: <001001cc3762$978b6740$c6a235c0$@msu.edu> <1014D1BD-97EB-4BBC-A0BE-258CACBA32AF@comcast.net> <002101cc376a$d988e430$8c9aac90$@msu.edu> <4E0CF91F.2080904@gmail.com>
Message-ID:

OK, I missed that he did state the OS, my apologies to Steven.

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111

> -----Original Message-----
> From: Duncan Murdoch [mailto:murdoch.duncan at gmail.com]
> Sent: Thursday, June 30, 2011 4:31 PM
> To: Greg Snow
> Cc: Steven Wolf; 'David Winsemius'; r-help at r-project.org
> Subject: Re: [R] source, echo...and clicking the mouse
>
> On 30/06/2011 5:33 PM, Greg Snow wrote:
> > On some operating systems (which we don't know yours, see the posting
> > guide) the output is buffered and including a call to flush.console()
> > will flush all the output from the buffer to the console. Put the
> > function call throughout the script and when it is run it will stop
> > buffering for a bit.
>
> The OP said he's running in Windows 7. In that case, there's the
> "Misc|Buffered output" option, which makes things appear as soon as they
> are printed. It appears to work for source(..., echo=TRUE) as well.
>
> Duncan Murdoch
>
> >
> > The other possibility is that your script does some plotting and that is what is pausing until you click.
In that case you need to use a > different graphics device or methodology to avoid this (see > ?interactive). > > From marc_schwartz at me.com Fri Jul 1 19:16:59 2011 From: marc_schwartz at me.com (Marc Schwartz) Date: Fri, 01 Jul 2011 12:16:59 -0500 Subject: [R] testInstalledBasic In-Reply-To: <865628.19532.qm@web120530.mail.ne1.yahoo.com> References: <865628.19532.qm@web120530.mail.ne1.yahoo.com> Message-ID: <80D5FC1B-FBDE-422D-BB4D-BEEFC45E02C1@me.com> Cody, The 'diff' program should be installed with Duncan's RTools package on Windows and if in your path, should be usable via the CLI. From a Windows console command line, in the folder where the output files in question are located, use: diff eval-etc.Rout eval-etc.Rout.save > diff.txt That will generate the file diff.txt, which will contain the line by line differences in the two files, which you can post back for review. There is also this post from Prof. Ripley from last year: https://stat.ethz.ch/pipermail/r-help/2010-May/237922.html which may perhaps be helpful for Uwe in considering other possible solutions if locale is relevant here. HTH, Marc Schwartz On Jul 1, 2011, at 11:51 AM, Cody Hamilton wrote: > Hello Uwe, > > Please forgive my ignorance - how can I get my diffs? > > Regards, > -Cody > > --- On Fri, 7/1/11, Uwe Ligges wrote: > >> From: Uwe Ligges >> Subject: Re: [R] testInstalledBasic >> To: "Cody Hamilton" >> Cc: r-help at r-project.org >> Date: Friday, July 1, 2011, 9:13 AM >> >> >> On 01.07.2011 18:07, Cody Hamilton wrote: >>> Hi Uwe, >>> >>> Thank you for taking the time to look into this! >>> >>> I created the function my.test by modifying >> testInstalledBasic with the line change you list below and >> then ran: >>> >>> Sys.setenv(LC_COLLATE="C") >>> my.test('basic') >>> >>> I get the same error message as before: >>> >>>> my.test('basic') >>> running strict specific tests >>> running code in ?eval-etc.R? >>> comparing ?eval-etc.Rout? to >> ?eval-etc.Rout.save? 
...[1] 1 >> >> >> But we really need your diffs! >> >> Uwe Ligges From cody.shawn at yahoo.com Fri Jul 1 19:47:13 2011 From: cody.shawn at yahoo.com (Cody Hamilton) Date: Fri, 1 Jul 2011 10:47:13 -0700 (PDT) Subject: [R] testInstalledBasic In-Reply-To: <80D5FC1B-FBDE-422D-BB4D-BEEFC45E02C1@me.com> Message-ID: <800592.60526.qm@web120529.mail.ne1.yahoo.com> Hello Marc, I think I am quite a dunce! I have diff.exe in the folder C:\Program Files\R\R-2.13.0\bin, which is where the R.exe file is located. I ran the following from the command line: > diff C:/Program Files/R/R-2.13.0/tests/eval-etc.Rout C:/Program Files/R/R-2.13.0/tests/eval-etc.Rout.save > diff.txt Error: unexpected symbol in "diff C" I tried again with quotes around the file names: > diff 'C:/Program Files/R/R-2.13.0/tests/eval-etc.Rout' 'C:/Program Files/R/R-2.13.0/tests/eval-etc.Rout.save' > diff.txt Error: unexpected string constant in "diff 'C:/Program Files/R/R-2.13.0/tests/eval-etc.Rout'" Regards, Cody --- On Fri, 7/1/11, Marc Schwartz wrote: > From: Marc Schwartz > Subject: Re: [R] testInstalledBasic > To: "Cody Hamilton" > Cc: "Uwe Ligges" , r-help at r-project.org > Date: Friday, July 1, 2011, 10:16 AM > Cody, > > The 'diff' program should be installed with Duncan's RTools > package on Windows and if in your path, should be usable via > the CLI. > > From a Windows console command line, in the folder where > the output files in question are located, use: > > ? diff eval-etc.Rout eval-etc.Rout.save > diff.txt > > That will generate the file diff.txt, which will contain > the line by line differences in the two files, which you can > post back for review. > > There is also this post from Prof. Ripley from last year: > > ? https://stat.ethz.ch/pipermail/r-help/2010-May/237922.html > > which may perhaps be helpful for Uwe in considering other > possible solutions if locale is relevant here. 
> > HTH, > > Marc Schwartz > > > On Jul 1, 2011, at 11:51 AM, Cody Hamilton wrote: > > > Hello Uwe, > > > > Please forgive my ignorance - how can I get my diffs? > > > > Regards, > >???-Cody > > > > --- On Fri, 7/1/11, Uwe Ligges > wrote: > > > >> From: Uwe Ligges > >> Subject: Re: [R] testInstalledBasic > >> To: "Cody Hamilton" > >> Cc: r-help at r-project.org > >> Date: Friday, July 1, 2011, 9:13 AM > >> > >> > >> On 01.07.2011 18:07, Cody Hamilton wrote: > >>> Hi Uwe, > >>> > >>> Thank you for taking the time to look into > this! > >>> > >>> I created the function my.test by modifying > >> testInstalledBasic with the line change you list > below and > >> then ran: > >>> > >>> Sys.setenv(LC_COLLATE="C") > >>> my.test('basic') > >>> > >>> I get the same error message as before: > >>> > >>>> my.test('basic') > >>> running strict specific tests > >>>? ???running code in > ?eval-etc.R? > >>>? ???comparing > ?eval-etc.Rout? to > >> ?eval-etc.Rout.save? ...[1] 1 > >> > >> > >> But we really need your diffs! > >> > >> Uwe Ligges > > From ligges at statistik.tu-dortmund.de Fri Jul 1 20:04:12 2011 From: ligges at statistik.tu-dortmund.de (Uwe Ligges) Date: Fri, 01 Jul 2011 20:04:12 +0200 Subject: [R] testInstalledBasic In-Reply-To: <800592.60526.qm@web120529.mail.ne1.yahoo.com> References: <800592.60526.qm@web120529.mail.ne1.yahoo.com> Message-ID: <4E0E0C1C.8010905@statistik.tu-dortmund.de> On 01.07.2011 19:47, Cody Hamilton wrote: > Hello Marc, > > I think I am quite a dunce! > > I have diff.exe in the folder C:\Program Files\R\R-2.13.0\bin, which is where the R.exe file is located. 
> > I ran the following from the command line: > >> diff C:/Program Files/R/R-2.13.0/tests/eval-etc.Rout C:/Program Files/R/R-2.13.0/tests/eval-etc.Rout.save> diff.txt > Error: unexpected symbol in "diff C" > > > I tried again with quotes around the file names: > >> diff 'C:/Program Files/R/R-2.13.0/tests/eval-etc.Rout' 'C:/Program Files/R/R-2.13.0/tests/eval-etc.Rout.save'> diff.txt > Error: unexpected string constant in "diff 'C:/Program Files/R/R-2.13.0/tests/eval-etc.Rout'" No, you need to run it in the Windows command line rather than in R. There you can also cd to the tests directory and ask R to R CMD Rdiff eval-etc.Rout eval-etc.Rout.save Uwe Ligges > Regards, > Cody > > --- On Fri, 7/1/11, Marc Schwartz wrote: > >> From: Marc Schwartz >> Subject: Re: [R] testInstalledBasic >> To: "Cody Hamilton" >> Cc: "Uwe Ligges", r-help at r-project.org >> Date: Friday, July 1, 2011, 10:16 AM >> Cody, >> >> The 'diff' program should be installed with Duncan's RTools >> package on Windows and if in your path, should be usable via >> the CLI. >> >> From a Windows console command line, in the folder where >> the output files in question are located, use: >> >> diff eval-etc.Rout eval-etc.Rout.save> diff.txt >> >> That will generate the file diff.txt, which will contain >> the line by line differences in the two files, which you can >> post back for review. >> >> There is also this post from Prof. Ripley from last year: >> >> https://stat.ethz.ch/pipermail/r-help/2010-May/237922.html >> >> which may perhaps be helpful for Uwe in considering other >> possible solutions if locale is relevant here. >> >> HTH, >> >> Marc Schwartz >> >> >> On Jul 1, 2011, at 11:51 AM, Cody Hamilton wrote: >> >>> Hello Uwe, >>> >>> Please forgive my ignorance - how can I get my diffs? 
>>> >>> Regards, >>> -Cody >>> >>> --- On Fri, 7/1/11, Uwe Ligges >> wrote: >>> >>>> From: Uwe Ligges >>>> Subject: Re: [R] testInstalledBasic >>>> To: "Cody Hamilton" >>>> Cc: r-help at r-project.org >>>> Date: Friday, July 1, 2011, 9:13 AM >>>> >>>> >>>> On 01.07.2011 18:07, Cody Hamilton wrote: >>>>> Hi Uwe, >>>>> >>>>> Thank you for taking the time to look into >> this! >>>>> >>>>> I created the function my.test by modifying >>>> testInstalledBasic with the line change you list >> below and >>>> then ran: >>>>> >>>>> Sys.setenv(LC_COLLATE="C") >>>>> my.test('basic') >>>>> >>>>> I get the same error message as before: >>>>> >>>>>> my.test('basic') >>>>> running strict specific tests >>>>> running code in >> ?eval-etc.R? >>>>> comparing >> ?eval-etc.Rout? to >>>> ?eval-etc.Rout.save? ...[1] 1 >>>> >>>> >>>> But we really need your diffs! >>>> >>>> Uwe Ligges >> >> From mtb954 at gmail.com Fri Jul 1 23:09:39 2011 From: mtb954 at gmail.com (Mark Na) Date: Fri, 1 Jul 2011 15:09:39 -0600 Subject: [R] Poisson GLM with a logged dependent variable...just asking for trouble? Message-ID: Dear R-helpers, I'm using a GLM with poisson errors to model integer count data as a function of one non-integer covariate. The model formula is: log(DV) ~ glm(log(IV,10),family=poisson). I'm getting a warning because the logged DV is no longer an integer. I have three questions: 1) Can I ignore the warning, or is logging the DV (resulting in non-integers) a serious violation of the Poisson error structure? 2) If the answer to #1 is "no, don't ignore it, it's serious" then can I use a quasipoisson error structure instead (does not give the same warning) and if so are there any pitfalls to using the quasipoisson model? Are there any better alternatives for count data where the counts must be logged? Or, should I just abandon logging the DV? In that case, how could I compare the fit of a Poisson model (without logging the DV) to that of a GLM with normal errors (with a logged DV). 
AIC would not be valid because the DVs are different, right? 3) The quasipoisson model doesn't return an AIC value. Why, and is there anything I can do to calculate AIC manually, that would allow me to compare this model to other models? Many thanks in advance for your help! Cheers, Mark From dwinsemius at comcast.net Fri Jul 1 23:50:04 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Fri, 1 Jul 2011 17:50:04 -0400 Subject: [R] Access only part of last dimension of table/matrix Message-ID: <7344F63F-D9A3-4BE0-AFA2-51925DC4D414@comcast.net> I would like to do some operations inside a function using only one value for the last dimension of a table/matrix: tabfn <- function (dfrm, facvec, YN ="event"){ return( Etbl <- do.call(table, dfrm[ , c(facvec, "event") ]) ) # just want Etbl[,,,"TRUE"] or Etbl[,, "TRUE"] or Etbl[,"TRUE"] } tbl <- tabfn(testdf, c("x", "y") ) tbl # all value of event returned At the console it is easy for me to count the number of factors and use the right number of commas tbl[ , , "TRUE"] if I only want the slice with that value. How can I do this programmatically? Thnks. -- David. David Winsemius, MD West Hartford, CT From jekel at coll.mpg.de Fri Jul 1 19:31:19 2011 From: jekel at coll.mpg.de (Marc Jekel) Date: Fri, 01 Jul 2011 19:31:19 +0200 Subject: [R] how to apply several math/logic operations on columns/rows of a matrix Message-ID: <4E0E0467.80509@coll.mpg.de> Dear R-Fans, The more I work with matrices (e.g., data.frames) the more I think it would be helpful to have functions to apply (several!) mathematical and/or logical operators column- or row-wise in a matrix. I know the function apply() and all its derivates (e.g., lapply) but I think this does not help for solving (e.g.) 
the following task: assume there is a 3x3 matrix: 1 2 4 4 5 3 1 3 4 How do I find - for each column separately - the position of the column's minimum without using loop commands, i.e.: I could extract each column in a loop and use something like: for (loopColumn in 1 : 3){ extractedColumnVector = myMatrix[, loopColumn] position = which(extractedColumnVector == min (extractedColumnVector ) ) print(position) } I think that there should be something simpler out there to handle these kinds of tasks (maybe there is and I just don't know but I checked several R books and could not find a command to do this). It would be great to have a function in which it is possible to define a sequence of commands that can be applied column/row-wise. Thanks for a hint, Marc -- Dipl.-Psych. Marc Jekel MPI for Research on Collective Goods Kurt-Schumacher-Str. 10 D-53113 Bonn Germany email: jekel at coll.mpg.de phone: ++49 (0) 228 91416-852 http://www.coll.mpg.de/team/page/marc_jekel-0 From mma at mariomartinezaraya.com Fri Jul 1 22:10:15 2011 From: mma at mariomartinezaraya.com (=?ISO-8859-1?Q?Mario_Mart=EDnez?=) Date: Fri, 1 Jul 2011 16:10:15 -0400 Subject: [R] RMySQL, RODBC, dbReadTable and ISO-8859-1 (Spanish data) Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From shbmira at gmail.com Fri Jul 1 23:10:04 2011 From: shbmira at gmail.com (Sergio Mira) Date: Fri, 01 Jul 2011 18:10:04 -0300 Subject: [R] Initiating in BNArray Message-ID: <4E0E37AC.4090401@gmail.com> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From shbmira at gmail.com Fri Jul 1 23:13:14 2011 From: shbmira at gmail.com (Sergio Mira) Date: Fri, 01 Jul 2011 18:13:14 -0300 Subject: [R] Initiating in BNArray In-Reply-To: <4E0E37AC.4090401@gmail.com> References: <4E0E37AC.4090401@gmail.com> Message-ID: <4E0E386A.1080902@gmail.com> An embedded and charset-unspecified text was scrubbed... 
Name: not available URL: From tryingtolearnagain at gmail.com Fri Jul 1 18:14:53 2011 From: tryingtolearnagain at gmail.com (Trying To learn again) Date: Fri, 1 Jul 2011 18:14:53 +0200 Subject: [R] Eliminating a row if something happens Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From tryingtolearnagain at gmail.com Fri Jul 1 23:05:26 2011 From: tryingtolearnagain at gmail.com (Trying To learn again) Date: Fri, 1 Jul 2011 23:05:26 +0200 Subject: [R] Eliminating a row if something happens In-Reply-To: References: Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From vikas.bansal at kcl.ac.uk Fri Jul 1 18:47:47 2011 From: vikas.bansal at kcl.ac.uk (Bansal, Vikas) Date: Fri, 1 Jul 2011 17:47:47 +0100 Subject: [R] For help in R coding Message-ID: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE0@KCL-MAIL01.kclad.ds.kcl.ac.uk> Dear all, I am doing a project on variant calling using R.I am working on pileup file.There are 10 columns in my data frame and I want to count the number of A,C,G and T in each row for column 9.example of column 9 is given below- .a,g,, .t,t,, .,c,c, .,a,,, .,t,t,t .c,,g,^!. .g,ggg.^!, .$,,,,,., a,g,,t, ,,,,,.,^!. ,$,,,,.,. This is a bit confusing for me as these characters are in one column and how can we scan them for each row to print number of A,C,G and T for each row. Most of the rows have . and , and other symbols but we will ignore them.I just want to run a loop with a counter which will count the number of A,C,G and T for each row and will give output something like this- A C G T 1 0 1 0 0 0 0 2 0 2 0 0 1 0 0 0 0 0 0 3 This output is for first 5 rows from the example given above. I am new to R can you please help me.I will be very thankful to you. 
Thanking you,
Warm Regards
Vikas Bansal
Msc Bioinformatics
Kings College London

From dwinsemius at comcast.net Sat Jul 2 00:03:58 2011
From: dwinsemius at comcast.net (David Winsemius)
Date: Fri, 1 Jul 2011 18:03:58 -0400
Subject: [R] how to apply several math/logic operations on columns/rows of a matrix
In-Reply-To: <4E0E0467.80509@coll.mpg.de>
References: <4E0E0467.80509@coll.mpg.de>
Message-ID:

On Jul 1, 2011, at 1:31 PM, Marc Jekel wrote:

> Dear R-Fans,
>
> The more I work with matrices (e.g., data.frames) the more I think
> it would be helpful to have functions to apply (several!)
> mathematical and/or logical operators column- or row-wise in a matrix.
>
> I know the function apply() and all its derivates (e.g., lapply)
> but I think this does not help for solving (e.g.) the following task:
>
> assume there is a 3x3 matrix:
>
> 1 2 4
> 4 5 3
> 1 3 4
>
> How do I find - for each column separately - the position of the
> column's minimum without using loop commands, i.e.:

?which.min   # should have been linked from the help(which) page. ... Yep, it is.

> mtx <- matrix(scan(textConnection("1 2 4
+ 4 5 3
+ 1 3 4")), 3, byrow=TRUE)
Read 9 items
> mtx
     [,1] [,2] [,3]
[1,]    1    2    4
[2,]    4    5    3
[3,]    1    3    4
> apply(mtx, 2, which.min)
[1] 1 1 2

>
> I could extract each column in a loop and use something like:
>
> for (loopColumn in 1 : 3){
>
> extractedColumnVector = myMatrix[, loopColumn]
>
> position = which(extractedColumnVector == min
> (extractedColumnVector ) )
>
> print(position)
> }
>
> I think that there should be something simpler out there to handle
> these kinds of tasks (maybe there is and I just don't know but I
> checked several R books and could not find a command to do this).
>
> It would be great to have a function in which it is possible to
> define a sequence of commands that can be applied column/row-wise.
>
> Thanks for a hint,
>
> Marc
>
> --
> Dipl.-Psych. Marc Jekel
>
> MPI for Research on Collective Goods
> Kurt-Schumacher-Str.
10
> D-53113 Bonn
> Germany
>
> email: jekel at coll.mpg.de
> phone: ++49 (0) 228 91416-852
>
> http://www.coll.mpg.de/team/page/marc_jekel-0
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT

From camelbbs at gmail.com Sat Jul 2 00:05:18 2011
From: camelbbs at gmail.com (chun-jiang he)
Date: Fri, 1 Jul 2011 17:05:18 -0500
Subject: [R] a question about girafe
Message-ID:

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From dwinsemius at comcast.net Sat Jul 2 00:10:12 2011
From: dwinsemius at comcast.net (David Winsemius)
Date: Fri, 1 Jul 2011 18:10:12 -0400
Subject: [R] Eliminating a row if something happens
In-Reply-To:
References:
Message-ID:

On Jul 1, 2011, at 12:14 PM, Trying To learn again wrote:

> Hi all,
>
> I want to create a new matrix based on a previous matrix.
>
> You see, If my "data" matrix accomplises this:
>
> if(rowSums(data[i,])>16|rowSums(data[i,])<28)
> data[i,]= data[i,]

'if' is not the right command.

> # if sum of rows is more than 16 and less 28 I conservate the row
> #but How I say if else "remove" the line (I tried this) but doesn't
> works...I'm really a asn

Use logical indexing. (Using 'dat' because 'data' is bad practice as an object name.) Since the condition described is "more than 16 and less than 28", the two tests are combined with '&'; with '|' every row would pass.

dat2 <- dat[ rowSums(dat) > 16 & rowSums(dat) < 28 , ]

>
> else if(data[i,]=data[-i,])

The if-else construct only works on scalars. If you give it vectors, it uses only the first value and then warns. You can use ifelse on vectors.

>
> You see imagine my data matrix is 10x4 and only 3 rows passes the
> condition...the resulting matrix will be 3x4
>
> I tried also to save the new matrix as csv like this
>
> write.csv(data, file = "input1.csv")
>
> How should I proceed to save as a txt?
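A minimal sketch of the row filter and the text export asked about above. It assumes the intended condition is "row sum greater than 16 and less than 28", as the quoted comment states; the matrix `dat`, the object `dat2`, and the file name `input1.txt` are made up for illustration. write.table() is the general-purpose relative of write.csv() and can produce a tab-delimited text file.

```r
# Hypothetical 10 x 4 matrix standing in for the poster's data
set.seed(1)
dat <- matrix(sample(1:10, 40, replace = TRUE), nrow = 10)

# Logical index: TRUE for rows whose sum lies strictly between 16 and 28
keep <- rowSums(dat) > 16 & rowSums(dat) < 28

# drop = FALSE keeps the result a matrix even if only one row survives
dat2 <- dat[keep, , drop = FALSE]

# Write the filtered matrix as tab-delimited text (file name is illustrative)
out_file <- file.path(tempdir(), "input1.txt")
write.table(dat2, file = out_file, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

Rows that fail the test are simply not selected, so no explicit "else remove" branch is needed.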
>
> Many thanks I really like this problems but I feel my mind is
> restricted to
> more easy things...jajaj
>
> [[alternative HTML version deleted]]

If you are "trying to learn" then try to learn to post in plain text.

--
David Winsemius, MD
West Hartford, CT

From dwinsemius at comcast.net Sat Jul 2 00:25:20 2011
From: dwinsemius at comcast.net (David Winsemius)
Date: Fri, 1 Jul 2011 18:25:20 -0400
Subject: [R] For help in R coding
In-Reply-To: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE0@KCL-MAIL01.kclad.ds.kcl.ac.uk>
References: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE0@KCL-MAIL01.kclad.ds.kcl.ac.uk>
Message-ID: <813856BE-5DD3-4708-B1FA-2611EBAD8C61@comcast.net>

On Jul 1, 2011, at 12:47 PM, Bansal, Vikas wrote:

> Dear all,
>
> I am doing a project on variant calling using R. I am working on
> pileup file. There are 10 columns in my data frame and I want to
> count the number of A,C,G and T in each row for column 9. example of
> column 9 is given below-
>
> .a,g,,
> .t,t,,
> .,c,c,
> .,a,,,
> .,t,t,t
> .c,,g,^!.
> .g,ggg.^!,
> .$,,,,,.,
> a,g,,t,
> ,,,,,.,^!.
> ,$,,,,.,.
>
> This is a bit confusing for me as these characters are in one column
> and how can we scan them for each row to print number of A,C,G and T
> for each row.

Seems a bit clunky but this does the job (first the data):

> txt <- " .a,g,,
+ .t,t,,
+ .,c,c,
+ .,a,,,
+ .,t,t,t
+ .c,,g,^!.
+ .g,ggg.^!,
+ .$,,,,,.,
+ a,g,,t,
+ ,,,,,.,^!.
+ ,$,,,,.,."
> txtvec <- readLines(textConnection(txt))

Now the clunky solution. Basically, it subtracts 1 from the count of "fragments" that result from splitting on each letter in turn. Could be made prettier with a function that did the job.
> data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit, split="a"), length) , "-", 1)),
+ C = unlist(lapply( lapply( sapply(txtvec, strsplit, split="c"), length) , "-", 1)),
+ G = unlist(lapply( lapply( sapply(txtvec, strsplit, split="g"), length) , "-", 1)),
+ T = unlist(lapply( lapply( sapply(txtvec, strsplit, split="t"), length) , "-", 1)) )
           A C G T
.a,g,,     1 0 1 0
.t,t,,     0 0 0 2
.,c,c,     0 2 0 0
.,a,,,     1 0 0 0
.,t,t,t    0 0 0 2
.c,,g,^!.  0 1 1 0
.g,ggg.^!, 0 0 4 0
.$,,,,,.,  0 0 0 0
a,g,,t,    1 0 1 1
,,,,,.,^!. 0 0 0 0
,$,,,,.,.  0 0 0 0

Has the advantage that the input data ends up as rownames, which was a surprise. If you wanted to count "A" and "a" as equivalent, then the split argument should be "a|A"

> Most of the rows have . and , and other symbols
> but we will ignore them. I just want to run a loop with a counter
> which will count the number of A,C,G and T for each row and will
> give output something like this-
>
>
> A C G T
> 1 0 1 0
> 0 0 0 2
> 0 2 0 0
> 1 0 0 0
> 0 0 0 3
>
> This output is for first 5 rows from the example given above.
>
> I am new to R can you please help me. I will be very thankful to you.
>
>
>
> Thanking you,
> Warm Regards
> Vikas Bansal
> Msc Bioinformatics
> Kings College London
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
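One caveat on the fragment-counting approach above: strsplit() does not return a trailing empty string, so a letter in the last position of a row is not counted. That is why the row `.,t,t,t` shows T = 2 rather than the 3 in the requested output. A minimal sketch that counts the matches directly is below; the function name `count_bases` and its arguments are illustrative and not from the thread.

```r
# Count occurrences of each base (either case) in every element of x.
# gregexpr() finds all matches, regmatches() extracts them, and lengths()
# gives the number of matches per element (0 when there is none).
count_bases <- function(x, bases = c("A", "C", "G", "T")) {
  counts <- vapply(bases, function(b) {
    hits <- gregexpr(b, x, ignore.case = TRUE)
    lengths(regmatches(x, hits))
  }, integer(length(x)))
  # vapply() returns a plain vector when x has length 1; force a matrix
  counts <- matrix(counts, nrow = length(x), ncol = length(bases),
                   dimnames = list(NULL, bases))
  as.data.frame(counts)
}

# First five rows of the example column
reads <- c(".a,g,,", ".t,t,,", ".,c,c,", ".,a,,,", ".,t,t,t")
res <- count_bases(reads)
res   # the last row gives T = 3
```

Because matching is case-insensitive, `A` and `a` are counted together; pass `ignore.case = FALSE` inside the helper if that is not wanted.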
David Winsemius, MD
West Hartford, CT

From ggrothendieck at gmail.com Sat Jul 2 01:11:32 2011
From: ggrothendieck at gmail.com (Gabor Grothendieck)
Date: Fri, 1 Jul 2011 19:11:32 -0400
Subject: [R] For help in R coding
In-Reply-To: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE0@KCL-MAIL01.kclad.ds.kcl.ac.uk>
References: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE0@KCL-MAIL01.kclad.ds.kcl.ac.uk>
Message-ID:

On Fri, Jul 1, 2011 at 12:47 PM, Bansal, Vikas wrote:
> Dear all,
>
> I am doing a project on variant calling using R. I am working on pileup file. There are 10 columns in my data frame and I want to count the number of A,C,G and T in each row for column 9. example of column 9 is given below-
>
>    .a,g,,
>    .t,t,,
>    .,c,c,
>    .,a,,,
>    .,t,t,t
>    .c,,g,^!.
>    .g,ggg.^!,
>    .$,,,,,.,
>    a,g,,t,
>    ,,,,,.,^!.
>    ,$,,,,.,.
>
> This is a bit confusing for me as these characters are in one column and how can we scan them for each row to print number of A,C,G and T for each row.
> Most of the rows have . and , and other symbols but we will ignore them. I just want to run a loop with a counter which will count the number of A,C,G and T for each row and will give output something like this-
>
>
> A   C   G  T
> 1   0   1  0
> 0   0   0  2
> 0   2   0  0
> 1   0   0  0
> 0   0   0  3
>
> This output is for first 5 rows from the example given above.
>

Read the lines into L and then, for each of a, c, g and t in turn, remove every other character and compute the number of characters remaining in each string:

Lines <- ".a,g,,
.t,t,,
.,c,c,
.,a,,,
.,t,t,t
.c,,g,^!.
.g,ggg.^!,
.$,,,,,.,
a,g,,t,
,,,,,.,^!.
,$,,,,.,."
L <- readLines(textConnection(Lines))
data.frame(a = nchar(gsub("[^a]", "", L)),
           c = nchar(gsub("[^c]", "", L)),
           g = nchar(gsub("[^g]", "", L)),
           t = nchar(gsub("[^t]", "", L)) )

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

From dwinsemius at comcast.net Sat Jul 2 01:21:45 2011
From: dwinsemius at comcast.net (David Winsemius)
Date: Fri, 1 Jul 2011 19:21:45 -0400
Subject: [R] Access only part of last dimension of table/matrix
In-Reply-To: <7344F63F-D9A3-4BE0-AFA2-51925DC4D414@comcast.net>
References: <7344F63F-D9A3-4BE0-AFA2-51925DC4D414@comcast.net>
Message-ID: <661D8F9C-B760-4DAB-A601-8E98DEF16F14@comcast.net>

On Jul 1, 2011, at 5:50 PM, David Winsemius wrote:

>
> I would like to do some operations inside a function using only one
> value for the last dimension of a table/matrix:

Sorry, I had meant to post a test dataset:

testdf <- data.frame(x=sample(letters[1:5], 25, replace=TRUE),
                     y=sample(letters[1:5], 25, replace=TRUE),
                     z=sample(letters[1:5], 25, replace=TRUE),
                     event=sample(c(TRUE, FALSE), 25, replace=TRUE) )

>
> tabfn <- function (dfrm, facvec, YN ="event"){
> return( Etbl <- do.call(table, dfrm[ , c(facvec,
> "event") ]) )
> # just want Etbl[,,,"TRUE"] or Etbl[,, "TRUE"] or
> Etbl[,"TRUE"]
> }
> tbl <- tabfn(testdf, c("x", "y") )
> tbl # all value of event returned
>
> At the console it is easy for me to count the number of factors and
> use the right number of commas
>
> tbl[ , , "TRUE"] if I only want the slice with that value. How can I
> do this programmatically?
>

I did come up with a solution:

apply(Etbl2, 1:(length(dim(Etbl2))-1), "[", 2)

Doesn't seem as elegant as I might have liked but it "works". And I had puzzled and searched in the archives, SO, and multiple books without finding a worked example.

> Thnks.
>
> --
> David.
>
>
> David Winsemius, MD
> West Hartford, CT
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
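An alternative sketch for the question above: build the index list programmatically and pass it to `[` via do.call(), so the number of commas never has to be typed. The helper name `slice_last` is hypothetical and not from the thread; it assumes the last dimension has dimnames when `value` is a character label, and it uses the built-in UCBAdmissions table only as sample data.

```r
# Hypothetical helper: take one slice of the last dimension of an array or
# table without hard-coding the number of commas.
slice_last <- function(arr, value) {
  nd <- length(dim(arr))
  # TRUE selects everything along each of the leading dimensions
  idx <- c(rep(list(TRUE), nd - 1), list(value))
  do.call(`[`, c(list(arr), idx))
}

# Built-in 2 x 2 x 6 table; same slice as UCBAdmissions[, , "C"]
dept_c <- slice_last(UCBAdmissions, "C")
dept_c
```

The same call works unchanged for two, three, or more dimensions, since the number of leading TRUE indices is derived from dim().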
David Winsemius, MD West Hartford, CT From marc_schwartz at me.com Sat Jul 2 01:44:48 2011 From: marc_schwartz at me.com (Marc Schwartz) Date: Fri, 1 Jul 2011 18:44:48 -0500 Subject: [R] Access only part of last dimension of table/matrix In-Reply-To: <7344F63F-D9A3-4BE0-AFA2-51925DC4D414@comcast.net> References: <7344F63F-D9A3-4BE0-AFA2-51925DC4D414@comcast.net> Message-ID: <1CE414A1-DE7A-4034-83EB-486E3B3F90B5@me.com> On Jul 1, 2011, at 4:50 PM, David Winsemius wrote: > > I would like to do some operations inside a function using only one value for the last dimension of a table/matrix: > > tabfn <- function (dfrm, facvec, YN ="event"){ > return( Etbl <- do.call(table, dfrm[ , c(facvec, "event") ]) ) > # just want Etbl[,,,"TRUE"] or Etbl[,, "TRUE"] or Etbl[,"TRUE"] > } > tbl <- tabfn(testdf, c("x", "y") ) > tbl # all value of event returned > > At the console it is easy for me to count the number of factors and use the right number of commas > > tbl[ , , "TRUE"] if I only want the slice with that value. How can I do this programmatically? > > Thnks. David, I had a vague recollection of something like this coming up at some point in the past and it took me a bit to get the right keywords to find it. I did not realize how far back it was (2001), but here are two possible solutions by Peter Dalgaard and Thomas Lumley from the same thread: https://stat.ethz.ch/pipermail/r-help/2001-October/016110.html https://stat.ethz.ch/pipermail/r-help/2001-October/016122.html It looks like Peter's solution is along the lines of the one that you just posted. Hope that this helps. Regards, Marc Schwartz From porteus at zoology.ubc.ca Sat Jul 2 01:46:31 2011 From: porteus at zoology.ubc.ca (Tom Porteus) Date: Fri, 1 Jul 2011 16:46:31 -0700 Subject: [R] Reverse legend label order in barplot Message-ID: <000001cc3849$1abe9f10$503bdd30$@ubc.ca> An embedded and charset-unspecified text was scrubbed... 
Name: not available URL: From marc_schwartz at me.com Sat Jul 2 02:14:00 2011 From: marc_schwartz at me.com (Marc Schwartz) Date: Fri, 01 Jul 2011 19:14:00 -0500 Subject: [R] Reverse legend label order in barplot In-Reply-To: <000001cc3849$1abe9f10$503bdd30$@ubc.ca> References: <000001cc3849$1abe9f10$503bdd30$@ubc.ca> Message-ID: <0622CAA1-5D8F-42AF-AFCE-66B6AE4FE702@me.com> On Jul 1, 2011, at 6:46 PM, Tom Porteus wrote: > Hi list, > > > > I've thus far not found a solution to my problem and hope someone can help. > I have a data matrix and wish to plot a stacked bar plot using barplot(). > This is simple enough, but I have a problem with the legend labels being in > the reverse order from what I want. The default appears to have labels > ascending bottom-to-top reflecting bottom-to-top sub-bars, but I would like > the labels to be the reverse, i.e. ascend top-to-bottom. Please see my > example code below for an illustration of the problem. > > > > For a reason unknown to me, if I plot a juxtaposed barplot using > beside=TRUE, the legend labels are actually in the order I want > (top-to-bottom labels). Is there any way to make the legend appear the way > I want for a stacked barplot? If one exists, I can't find an appropriate > argument to pass to args.legend. > > Many thanks, > > Tom > > > > ### START ### > > A <- c(50,30,10,10) > > B <- c(20,10,30,10) > > mat <- cbind(A,B) > > rownames(mat) <- c(1,2,3,4) > > barplot(mat,legend.text=rownames(mat)) #legend labels are in wrong order > here > > barplot(mat,legend.text=rownames(mat),beside=TRUE) #correct order > > ### END ### Note that the legend in this case, by default IS in the same top to bottom order as the barplot sections in the stacked barplot. So there is a visual logic to this behavior. Here is one workaround by reversing the rownames for 'mat' that are used in the legend text and then reversing the color sequencing passed internally to legend(). 
By default, for a matrix, a gamma corrected grey scale is used if you don't define the 'col' argument in barplot(). We can replicate this using the same approach with grey.colors(): barplot(mat, legend.text = rev(rownames(mat)), args.legend = list(fill = grey.colors(nrow(mat)))) Alternatively, You can use legend() separately to add the legend to the barplot in the fashion that you desire. Thus: barplot(mat) legend("topright", legend = rownames(mat), fill = grey.colors(nrow(mat))) HTH, Marc Schwartz P.S. Happy Canada Day From shbmira at gmail.com Sat Jul 2 02:27:49 2011 From: shbmira at gmail.com (Sergio Mira) Date: Fri, 01 Jul 2011 21:27:49 -0300 Subject: [R] Initiating in BNArray In-Reply-To: References: <4E0E37AC.4090401@gmail.com> Message-ID: <4E0E6605.6070208@gmail.com> Thanks, Dennis! Do you or anyone know how do I create a dataset like total.data of the example in [1]?? My dataset is a matrix, not a list structure. Is there a way to convert it? Thanks! [1] - http://www.cls.zju.edu.cn/binfo/BNArray/ Em 01-07-2011 20:15, Dennis Murphy escreveu: > Hi: > > See inline. > > On Fri, Jul 1, 2011 at 2:10 PM, Sergio Mira wrote: >> Hi, >> >> I'm trying to understand some details about an example maintened in [1]. >> According that link, I have total.data as a data set (am I right?). >> >> But I don't understand how is built that table. 
>> I saved the dataset in a file, with dput(), and had something like this: >> >> structure(list(df.all = structure(list(V1 = structure(c(1L, 2L, >> 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, >> 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, >> 30L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, >> 43L, 44L, 45L, 46L, 47L, 48L, 49L, 50L, 51L, 52L, 53L, 54L, 55L, >> 56L, 57L, 58L, 59L, 60L, 61L, 62L, 63L, 64L, 65L, 66L, 67L, 68L, >> 69L, 70L, 71L, 72L, 73L, 74L, 75L, 76L, 77L, 78L, 79L, 80L, 81L, >> 82L, 83L, 84L, 85L, 86L, 87L, 88L, 89L, 90L, 91L, 92L, 93L, 94L, >> 95L, 96L, 97L, 98L, 99L, 100L, 101L, 102L, 103L, 104L, 105L, >> 106L, 107L, 108L, 109L, 110L, 111L, 112L, 113L, 114L, 115L, 116L, >> ...... >> ...... >> ..... >> V2 = structure(c(278L, 23L, 11L, 169L, 14L, 86L, 94L, 51L, >> 37L, 43L, 22L, 169L, 49L, 120L, 18L, 60L, 42L, 41L, 38L, >> 64L, 38L, 32L, 140L, 146L, 106L, 26L, 46L, 65L, 17L, 106L, >> 20L, 33L, 68L, 62L, 111L, 10L, 149L, 17L, 49L, 164L, 271L, >> 8L, 60L, 2L, 48L, 127L, 80L, 70L, 13L, 31L, 32L, 3L, 50L, >> 144L, 25L, 12L, 84L, 80L, 116L, 6L, 49L, 127L, 5L, 56L, 13L, >> 49L, 39L, 13L, 22L, 24L, 55L, 44L, 92L, 59L, 111L, 10L, 58L, >> 104L, 3L, 177L, 36L, 38L, 50L, 28L, 190L, 17L, 21L, 2L, 38L, >> ...... >> ...... >> ...... >> "767", "768", "769", "770", "771", "772", "773", "774", "775", >> "776", "777", "778", "779", "780", "781", "782", "783", "784", >> "785", "786", "787", "788", "789", "790", "791", "792", "793", >> "794", "795", "796", "797", "798", "799")), n.changed = 799L, >> n.all = 6179L), .Names = c("df.all", "df.ori", "n.changed", >> "n.all")) >> >> >> What is that part 1L, 2L, 3L, ... ? > They represent integer values. > >> What is V1, V2, V3, ... ? > The names of the individual list components. > >> How is the relation between V1, V2, V3, ...? > You have a list structure. V1 is (apparently) the name of the first > component, followed by its values. Ditto for V2, V3, etc. 
Don't know > what's going on at the bottom, though. > > HTH, > Dennis >> Is there any help about that structure? >> >> I want to build a similar structure, but need to know what are the >> meaning of these things.. >> >> Sorry for noobing here... Very thanks! >> >> -- >> Regards, >> || ------ >> || Sergio Henrique Bento de Mira >> || Computer Science | Class of 2008/2 >> || Federal University Of Lavras | UFLA >> || Lavras, MG, Brasil >> || --- >> || sergiohbmira at computacao.ufla.br >> || Cell: (+55) (35) 9128-4240 >> || ------ >> "Be the change you want to see in the world" >> >> >> [[alternative HTML version deleted]] >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> -- Regards, || ------ || Sergio Henrique Bento de Mira || Computer Science | Class of 2008/2 || Federal University Of Lavras | UFLA || Lavras, MG, Brasil || --- || sergiohbmira at computacao.ufla.br || Cell: (+55) (35) 9128-4240 || ------ "Be the change you want to see in the world" From smeldrick at gmail.com Sat Jul 2 00:20:49 2011 From: smeldrick at gmail.com (Peter) Date: Fri, 1 Jul 2011 17:20:49 -0500 Subject: [R] beginner question - effective way to chart sleep habits Message-ID: An embedded and charset-unspecified text was scrubbed... 
Name: not available URL: From dwinsemius at comcast.net Sat Jul 2 03:15:04 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Fri, 1 Jul 2011 21:15:04 -0400 Subject: [R] Access only part of last dimension of table/matrix In-Reply-To: <1CE414A1-DE7A-4034-83EB-486E3B3F90B5@me.com> References: <7344F63F-D9A3-4BE0-AFA2-51925DC4D414@comcast.net> <1CE414A1-DE7A-4034-83EB-486E3B3F90B5@me.com> Message-ID: On Jul 1, 2011, at 7:44 PM, Marc Schwartz wrote: > On Jul 1, 2011, at 4:50 PM, David Winsemius wrote: > >> >> I would like to do some operations inside a function using only one >> value for the last dimension of a table/matrix: >> >> tabfn <- function (dfrm, facvec, YN ="event"){ >> return( Etbl <- do.call(table, dfrm[ , c(facvec, >> "event") ]) ) >> # just want Etbl[,,,"TRUE"] or Etbl[,, "TRUE"] or >> Etbl[,"TRUE"] >> } >> tbl <- tabfn(testdf, c("x", "y") ) >> tbl # all value of event returned >> >> At the console it is easy for me to count the number of factors and >> use the right number of commas >> >> tbl[ , , "TRUE"] if I only want the slice with that value. How can >> I do this programmatically? >> >> Thnks. > > > David, > > I had a vague recollection of something like this coming up at some > point in the past and it took me a bit to get the right keywords to > find it. > > I did not realize how far back it was (2001), but here are two > possible solutions by Peter Dalgaard and Thomas Lumley from the same > thread: > > https://stat.ethz.ch/pipermail/r-help/2001-October/016110.html > https://stat.ethz.ch/pipermail/r-help/2001-October/016122.html > > It looks like Peter's solution is along the lines of the one that > you just posted. Yeah. Thanks, Mark. Pretty much the same. Guess I'm in good company. (Surprised this isn't asked more frequently.) 
testdf <- data.frame(x=sample(letters[1:5], 25, replace=TRUE), y=sample(letters[1:5], 25, replace=TRUE), z=sample(letters[1:5], 25, replace=TRUE), event=sample(c(TRUE, FALSE), 25, replace=TRUE) ) etbl <- table(testdf[ , c(c("x", "y"), "event")]) apply(etbl, seq(length=length(dim(etbl))-1),"[", 2) # Dalgaard 2001 apply(etbl, 1:(length(dim(etbl))-1), "[", 2) # Winsemius 2011 Any idea how to make it a "selection"? By that I mean how to select values of the last dimension whose "event" values are "TRUE". I get "Error: object 'event' not found" with any expression involving event, since it is only an attribute in dimnames. > > Hope that this helps. Yes. At least there is nothing (yet) that I would call "truly elegant". Maybe it's just that I stumbled on my solution, rather than seeing it as a glaringly obvious application of apply(..., "[" , index) > > Regards, Sorry for the duplicate private message. Meant to hit reply to all. > > Marc Schwartz > David Winsemius, MD West Hartford, CT From marc_schwartz at me.com Sat Jul 2 03:29:24 2011 From: marc_schwartz at me.com (Marc Schwartz) Date: Fri, 1 Jul 2011 20:29:24 -0500 Subject: [R] Access only part of last dimension of table/matrix In-Reply-To: References: <7344F63F-D9A3-4BE0-AFA2-51925DC4D414@comcast.net> <1CE414A1-DE7A-4034-83EB-486E3B3F90B5@me.com> Message-ID: On Jul 1, 2011, at 8:15 PM, David Winsemius wrote: > > On Jul 1, 2011, at 7:44 PM, Marc Schwartz wrote: > >> On Jul 1, 2011, at 4:50 PM, David Winsemius wrote: >> >>> >>> I would like to do some operations inside a function using only one value for the last dimension of a table/matrix: >>> >>> tabfn <- function (dfrm, facvec, YN ="event"){ >>> return( Etbl <- do.call(table, dfrm[ , c(facvec, "event") ]) ) >>> # just want Etbl[,,,"TRUE"] or Etbl[,, "TRUE"] or Etbl[,"TRUE"] >>> } >>> tbl <- tabfn(testdf, c("x", "y") ) >>> tbl # all value of event returned >>> >>> At the console it is easy for me to count the number of factors and use the right number of commas >>> 
>>> tbl[ , , "TRUE"] if I only want the slice with that value. How can I do this programmatically? >>> >>> Thnks. >> >> >> David, >> >> I had a vague recollection of something like this coming up at some point in the past and it took me a bit to get the right keywords to find it. >> >> I did not realize how far back it was (2001), but here are two possible solutions by Peter Dalgaard and Thomas Lumley from the same thread: >> >> https://stat.ethz.ch/pipermail/r-help/2001-October/016110.html >> https://stat.ethz.ch/pipermail/r-help/2001-October/016122.html >> >> It looks like Peter's solution is along the lines of the one that you just posted. > > > Yeah. Thanks, Mark. Pretty much the same. Guess I'm in good company. (Surprised this isn't asked more frequently.) > > testdf <- data.frame(x=sample(letters[1:5], 25, replace=TRUE), > y=sample(letters[1:5], 25, replace=TRUE), > z=sample(letters[1:5], 25, replace=TRUE), > event=sample(c(TRUE, FALSE), 25, replace=TRUE) ) > > etbl <- table(testdf[ , c(c("x", "y"), "event")]) > apply(etbl, seq(length=length(dim(etbl))-1),"[", 2) # Dalgaard 2001 > apply(etbl, 1:(length(dim(etbl))-1), "[", 2) # Winsemius 2011 > > Any idea how to make it a "selection"? By that I mean how to select values of the last dimension whose "event" values are "TRUE". I get "Error: object 'event' not found" with any expression involving event, since it is only an attribute in dimnames. > > >> >> Hope that this helps. > > Yes. At least there is nothing (yet) that I would call "truly elegant". Maybe it's just that I stumbled on my solution, rather than seeing it as a glaringly obvious application of apply(..., "[" , index) > >> >> Regards, > > Sorry for the duplicate private message. Meant to hit reply to all. Not a problem David, I was just in the process of replying to it when I saw this reply. 
Try this:

> apply(etbl, seq(length=length(dim(etbl))-1), function(x) x["TRUE"])
   y
x   a b c d e
  a 0 1 1 0 0
  b 1 0 0 1 0
  c 0 0 0 0 0
  d 1 0 0 0 2
  e 0 1 0 1 0

Seems to work here, at least on your example data, but not fully tested on higher dimensional arrays.

I also tried it on the UCBAdmissions data set:

> str(UCBAdmissions)
 table [1:2, 1:2, 1:6] 512 313 89 19 353 207 17 8 120 205 ...
 - attr(*, "dimnames")=List of 3
  ..$ Admit : chr [1:2] "Admitted" "Rejected"
  ..$ Gender: chr [1:2] "Male" "Female"
  ..$ Dept  : chr [1:6] "A" "B" "C" "D" ...

Get Dept == "C":

> apply(UCBAdmissions, seq(length=length(dim(UCBAdmissions))-1), function(x) x["C"])
          Gender
Admit      Male Female
  Admitted  120    202
  Rejected  205    391

It actually scared me that I had any recollection of an isolated post from 10 years ago. Not sure what to make of that...

Regards,

Marc

From dwinsemius at comcast.net Sat Jul 2 04:39:07 2011
From: dwinsemius at comcast.net (David Winsemius)
Date: Fri, 1 Jul 2011 22:39:07 -0400
Subject: [R] For help in R coding
In-Reply-To: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE3@KCL-MAIL01.kclad.ds.kcl.ac.uk>
References: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE0@KCL-MAIL01.kclad.ds.kcl.ac.uk>, <813856BE-5DD3-4708-B1FA-2611EBAD8C61@comcast.net> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE1@KCL-MAIL01.kclad.ds.kcl.ac.uk>, <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE3@KCL-MAIL01.kclad.ds.kcl.ac.uk>
Message-ID:

On Jul 1, 2011, at 9:18 PM, Bansal, Vikas wrote:

> Dear David,
>
> it is showing this error-

Looks like a syntax error rather than a semantic error.

>
> data.frame(A = unlist(lapply( lapply( sapply(mydf[,5], strsplit,
> + split="a|A"), length) , "-", 1)),C =
> unlist(lapply( lapply( sapply((mydf[,5], strsplit, split="c|C"),
> Error: unexpected ',' in:
> "data.frame(A = unlist(lapply( lapply( sapply(, strsplit,

There seems to be a missing object to the first argument of sapply...?
You should supply str(mydf[,5]) or at least see if the error occurs on mydf[1:20, 5] and supply str on that if the error persists.

--
David.

> split="a|A"), length) , "-", 1)),C =
> unlist(lapply( lapply( sapply((mydf[,5],"
>> length) , "-", 1)),G = unlist(lapply( lapply( sapply((mydf[,5],
>> strsplit, split="g|G"),
> Error: unexpected ')' in "length)"
>> length) , "-", 1)),T = unlist(lapply( lapply( sapply(mydf[,5],
>> strsplit, split="t|T"),
> Error: unexpected ')' in "length)"
>
> What should I do?
>
> Thanking you,
> Warm Regards
> Vikas Bansal
> Msc Bioinformatics
> Kings College London
> ________________________________________
> From: David Winsemius [dwinsemius at comcast.net]
> Sent: Saturday, July 02, 2011 2:07 AM
> To: Bansal, Vikas
> Subject: Re: [R] For help in R coding
>
> On Jul 1, 2011, at 8:01 PM, Bansal, Vikas wrote:
>
>> Dear David,
>>
>> Thanks for your reply.I tried your code it is running but as I
>> mentioned in my mail,I am working on pileup file.So I used a command-
>> mydf=read.table(
>> to read pileup file to have data frame i:e mydf.Now the problem is
>> it has 10 columns and have to count the number of A C G T which is
>> in 9th column.
>> In your mail we input data like this
>>> txt <- " .a,g,,
>> + .t,t,,
>> + .,c,c,
>> + .,a,,,
>> + .,t,t,t
>> + .c,,g,^!.
>> + .g,ggg.^!,
>> + .$,,,,,.,
>> + a,g,,t,
>> + ,,,,,.,^!.
>> + ,$,,,,.,."
>>
>> but how I should input my data from dataframe mydf using txt command
>> because there are thousands of rows?
>
> Just sent mydf[ , 9] as the argument in place of testvec.
> >> >> Thanking you, >> Warm Regards >> Vikas Bansal >> Msc Bioinformatics >> Kings College London >> ________________________________________ >> From: David Winsemius [dwinsemius at comcast.net] >> Sent: Friday, July 01, 2011 11:25 PM >> To: Bansal, Vikas >> Cc: r-help at r-project.org >> Subject: Re: [R] For help in R coding >> >> On Jul 1, 2011, at 12:47 PM, Bansal, Vikas wrote: >> >>> Dear all, >>> >>> I am doing a project on variant calling using R.I am working on >>> pileup file.There are 10 columns in my data frame and I want to >>> count the number of A,C,G and T in each row for column 9.example of >>> column 9 is given below- >>> >>> .a,g,, >>> .t,t,, >>> .,c,c, >>> .,a,,, >>> .,t,t,t >>> .c,,g,^!. >>> .g,ggg.^!, >>> .$,,,,,., >>> a,g,,t, >>> ,,,,,.,^!. >>> ,$,,,,.,. >>> >>> This is a bit confusing for me as these characters are in one column >>> and how can we scan them for each row to print number of A,C,G and T >>> for each row. >> >> Seems a bit clunky but this does the job (first the data): >>> txt <- " .a,g,, >> + .t,t,, >> + .,c,c, >> + .,a,,, >> + .,t,t,t >> + .c,,g,^!. >> + .g,ggg.^!, >> + .$,,,,,., >> + a,g,,t, >> + ,,,,,.,^!. >> + ,$,,,,.,." >> >>> txtvec <- readLines(textConnection(txt)) >> >> Now the clunky solution, Basically subtracts 1 from the counts of >> "fragments" that result from splitting on each letter in turn. Could >> be made prettier with a function that did the job. >> >>> data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit, >> split="a"), length) , "-", 1)), >> + C = unlist(lapply( lapply( sapply(txtvec, strsplit, split="c"), >> length) , "-", 1)), >> + G = unlist(lapply( lapply( sapply(txtvec, strsplit, split="g"), >> length) , "-", 1)), >> + T = unlist(lapply( lapply( sapply(txtvec, strsplit, split="t"), >> length) , "-", 1)) ) >> A C G T >> .a,g,, 1 0 1 0 >> .t,t,, 0 0 0 2 >> .,c,c, 0 2 0 0 >> .,a,,, 1 0 0 0 >> .,t,t,t 0 0 0 2 >> .c,,g,^!. 
0 1 1 0 >> .g,ggg.^!, 0 0 4 0 >> .$,,,,,., 0 0 0 0 >> a,g,,t, 1 0 1 1 >> ,,,,,.,^!. 0 0 0 0 >> ,$,,,,.,. 0 0 0 0 >> >> Has the advantage that the input data ends up as rownames, which >> was a >> surprise. >> >> If you wanted to count "A" and "a" as equivalent, then the split >> argument should be "a|A" >> >> >>> Most of the rows have . and , and other symbols >>> but we will ignore them.I just want to run a loop with a counter >>> which will count the number of A,C,G and T for each row and will >>> give output something like this- >>> >>> >>> A C G T >>> 1 0 1 0 >>> 0 0 0 2 >>> 0 2 0 0 >>> 1 0 0 0 >>> 0 0 0 3 >>> >>> This output is for first 5 rows from the example given above. >>> >>> I am new to R can you please help me.I will be very thankful to you. >>> >>> >>> >>> Thanking you, >>> Warm Regards >>> Vikas Bansal >>> Msc Bioinformatics >>> Kings College London >>> ______________________________________________ >>> R-help at r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. 
>> >> David Winsemius, MD >> West Hartford, CT >> >> >> >> >> >> > > David Winsemius, MD > West Hartford, CT > David Winsemius, MD West Hartford, CT From vikas.bansal at kcl.ac.uk Sat Jul 2 03:06:55 2011 From: vikas.bansal at kcl.ac.uk (Bansal, Vikas) Date: Sat, 2 Jul 2011 02:06:55 +0100 Subject: [R] For help in R coding In-Reply-To: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE1@KCL-MAIL01.kclad.ds.kcl.ac.uk> References: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE0@KCL-MAIL01.kclad.ds.kcl.ac.uk>, <813856BE-5DD3-4708-B1FA-2611EBAD8C61@comcast.net>, <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE1@KCL-MAIL01.kclad.ds.kcl.ac.uk> Message-ID: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE2@KCL-MAIL01.kclad.ds.kcl.ac.uk> Dear David, Thanks for your reply.I tried your code it is running but as I mentioned in my mail,I am working on pileup file.So I used a command- mydf=read.table("Case2.pileup",fill=T,sep="\t") to read pileup file to have data frame i:e mydf.Now the problem is it has 10 columns and have to count the number of A C G T which is in 9th column. In your mail we input data like this > txt <- " .a,g,, + .t,t,, + .,c,c, + .,a,,, + .,t,t,t + .c,,g,^!. + .g,ggg.^!, + .$,,,,,., + a,g,,t, + ,,,,,.,^!. + ,$,,,,.,." but how I should input my data(in column 9) from dataframe mydf using txt command because there are thousands of rows? Thanking you, Warm Regards Vikas Bansal Msc Bioinformatics Kings College London ________________________________________ From: David Winsemius [dwinsemius at comcast.net] Sent: Friday, July 01, 2011 11:25 PM To: Bansal, Vikas Cc: r-help at r-project.org Subject: Re: [R] For help in R coding On Jul 1, 2011, at 12:47 PM, Bansal, Vikas wrote: > Dear all, > > I am doing a project on variant calling using R.I am working on > pileup file.There are 10 columns in my data frame and I want to > count the number of A,C,G and T in each row for column 9.example of > column 9 is given below- > > .a,g,, > .t,t,, > .,c,c, > .,a,,, > .,t,t,t > .c,,g,^!. 
> .g,ggg.^!, > .$,,,,,., > a,g,,t, > ,,,,,.,^!. > ,$,,,,.,. > > This is a bit confusing for me as these characters are in one column > and how can we scan them for each row to print number of A,C,G and T > for each row. Seems a bit clunky but this does the job (first the data): > txt <- " .a,g,, + .t,t,, + .,c,c, + .,a,,, + .,t,t,t + .c,,g,^!. + .g,ggg.^!, + .$,,,,,., + a,g,,t, + ,,,,,.,^!. + ,$,,,,.,." > txtvec <- readLines(textConnection(txt)) Now the clunky solution, Basically subtracts 1 from the counts of "fragments" that result from splitting on each letter in turn. Could be made prettier with a function that did the job. > data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit, split="a"), length) , "-", 1)), + C = unlist(lapply( lapply( sapply(txtvec, strsplit, split="c"), length) , "-", 1)), + G = unlist(lapply( lapply( sapply(txtvec, strsplit, split="g"), length) , "-", 1)), + T = unlist(lapply( lapply( sapply(txtvec, strsplit, split="t"), length) , "-", 1)) ) A C G T .a,g,, 1 0 1 0 .t,t,, 0 0 0 2 .,c,c, 0 2 0 0 .,a,,, 1 0 0 0 .,t,t,t 0 0 0 2 .c,,g,^!. 0 1 1 0 .g,ggg.^!, 0 0 4 0 .$,,,,,., 0 0 0 0 a,g,,t, 1 0 1 1 ,,,,,.,^!. 0 0 0 0 ,$,,,,.,. 0 0 0 0 Has the advantage that the input data ends up as rownames, which was a surprise. If you wanted to count "A" and "a" as equivalent, then the split argument should be "a|A" > Most of the rows have . and , and other symbols > but we will ignore them.I just want to run a loop with a counter > which will count the number of A,C,G and T for each row and will > give output something like this- > > > A C G T > 1 0 1 0 > 0 0 0 2 > 0 2 0 0 > 1 0 0 0 > 0 0 0 3 > > This output is for first 5 rows from the example given above. > > I am new to R can you please help me.I will be very thankful to you. 
> > > > Thanking you, > Warm Regards > Vikas Bansal > Msc Bioinformatics > Kings College London > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT From vikas.bansal at kcl.ac.uk Sat Jul 2 03:18:23 2011 From: vikas.bansal at kcl.ac.uk (Bansal, Vikas) Date: Sat, 2 Jul 2011 02:18:23 +0100 Subject: [R] For help in R coding In-Reply-To: References: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE0@KCL-MAIL01.kclad.ds.kcl.ac.uk>, <813856BE-5DD3-4708-B1FA-2611EBAD8C61@comcast.net> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE1@KCL-MAIL01.kclad.ds.kcl.ac.uk>, Message-ID: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE3@KCL-MAIL01.kclad.ds.kcl.ac.uk> Dear David, it is showing this error- data.frame(A = unlist(lapply( lapply( sapply(mydf[,5], strsplit, + split="a|A"), length) , "-", 1)),C = unlist(lapply( lapply( sapply((mydf[,5], strsplit, split="c|C"), Error: unexpected ',' in: "data.frame(A = unlist(lapply( lapply( sapply(mydf[,5], strsplit, split="a|A"), length) , "-", 1)),C = unlist(lapply( lapply( sapply((mydf[,5]," > length) , "-", 1)),G = unlist(lapply( lapply( sapply((mydf[,5], strsplit, split="g|G"), Error: unexpected ')' in "length)" > length) , "-", 1)),T = unlist(lapply( lapply( sapply(mydf[,5], strsplit, split="t|T"), Error: unexpected ')' in "length)" What should I do? 
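
A possible way forward, sketched under a few assumptions. The `Error: unexpected ','` above appears to come from the doubled parenthesis in `sapply((mydf[,5], ...` rather than from the data. The helper below, `count_bases` (a hypothetical name, not from the thread), avoids the nested `lapply`/`strsplit` calls by counting matches with `gregexpr`. It assumes that every a/c/g/t character in the column is a base call, that upper and lower case are equivalent, and that all other symbols can be ignored. It also counts a letter at the very end of a string, which the fragment-counting approach misses because `strsplit` drops a trailing empty fragment (this is why `.,t,t,t` shows 2 rather than 3 for T in the output quoted in this thread).

```r
# Hypothetical helper (not from the thread): count A, C, G and T per string.
count_bases <- function(x, bases = c("A", "C", "G", "T")) {
  x <- toupper(as.character(x))            # treat "a" and "A" as the same base
  count_one <- function(b) {
    hits <- gregexpr(b, x, fixed = TRUE)   # match positions, or -1 if none
    vapply(hits, function(h) sum(h > 0L), integer(1))
  }
  # One integer column per base, one row per input string
  as.data.frame(setNames(lapply(bases, count_one), bases))
}

# First five example rows from the original question
txtvec <- c(".a,g,,", ".t,t,,", ".,c,c,", ".,a,,,", ".,t,t,t")
counts <- count_bases(txtvec)
print(counts)

# With the data frame from the thread this would be, e.g.:
# count_bases(mydf[, 9])
```

For these five rows the result matches the table requested in the original question, including 3 for T in the fifth row.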
Thanking you, Warm Regards Vikas Bansal Msc Bioinformatics Kings College London ________________________________________ From: David Winsemius [dwinsemius at comcast.net] Sent: Saturday, July 02, 2011 2:07 AM To: Bansal, Vikas Subject: Re: [R] For help in R coding On Jul 1, 2011, at 8:01 PM, Bansal, Vikas wrote: > Dear David, > > Thanks for your reply.I tried your code it is running but as I > mentioned in my mail,I am working on pileup file.So I used a command- > mydf=read.table( > to read pileup file to have data frame i:e mydf.Now the problem is > it has 10 columns and have to count the number of A C G T which is > in 9th column. > In your mail we input data like this >> txt <- " .a,g,, > + .t,t,, > + .,c,c, > + .,a,,, > + .,t,t,t > + .c,,g,^!. > + .g,ggg.^!, > + .$,,,,,., > + a,g,,t, > + ,,,,,.,^!. > + ,$,,,,.,." > > but how I should input my data from dataframe mydf using txt command > because there are thousands of rows? Just sent mydf[ , 9] as the argument in place of testvec. > > Thanking you, > Warm Regards > Vikas Bansal > Msc Bioinformatics > Kings College London > ________________________________________ > From: David Winsemius [dwinsemius at comcast.net] > Sent: Friday, July 01, 2011 11:25 PM > To: Bansal, Vikas > Cc: r-help at r-project.org > Subject: Re: [R] For help in R coding > > On Jul 1, 2011, at 12:47 PM, Bansal, Vikas wrote: > >> Dear all, >> >> I am doing a project on variant calling using R.I am working on >> pileup file.There are 10 columns in my data frame and I want to >> count the number of A,C,G and T in each row for column 9.example of >> column 9 is given below- >> >> .a,g,, >> .t,t,, >> .,c,c, >> .,a,,, >> .,t,t,t >> .c,,g,^!. >> .g,ggg.^!, >> .$,,,,,., >> a,g,,t, >> ,,,,,.,^!. >> ,$,,,,.,. >> >> This is a bit confusing for me as these characters are in one column >> and how can we scan them for each row to print number of A,C,G and T >> for each row. 
> > Seems a bit clunky but this does the job (first the data): >> txt <- " .a,g,, > + .t,t,, > + .,c,c, > + .,a,,, > + .,t,t,t > + .c,,g,^!. > + .g,ggg.^!, > + .$,,,,,., > + a,g,,t, > + ,,,,,.,^!. > + ,$,,,,.,." > >> txtvec <- readLines(textConnection(txt)) > > Now the clunky solution, Basically subtracts 1 from the counts of > "fragments" that result from splitting on each letter in turn. Could > be made prettier with a function that did the job. > >> data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit, > split="a"), length) , "-", 1)), > + C = unlist(lapply( lapply( sapply(txtvec, strsplit, split="c"), > length) , "-", 1)), > + G = unlist(lapply( lapply( sapply(txtvec, strsplit, split="g"), > length) , "-", 1)), > + T = unlist(lapply( lapply( sapply(txtvec, strsplit, split="t"), > length) , "-", 1)) ) > A C G T > .a,g,, 1 0 1 0 > .t,t,, 0 0 0 2 > .,c,c, 0 2 0 0 > .,a,,, 1 0 0 0 > .,t,t,t 0 0 0 2 > .c,,g,^!. 0 1 1 0 > .g,ggg.^!, 0 0 4 0 > .$,,,,,., 0 0 0 0 > a,g,,t, 1 0 1 1 > ,,,,,.,^!. 0 0 0 0 > ,$,,,,.,. 0 0 0 0 > > Has the advantage that the input data ends up as rownames, which was a > surprise. > > If you wanted to count "A" and "a" as equivalent, then the split > argument should be "a|A" > > >> Most of the rows have . and , and other symbols >> but we will ignore them.I just want to run a loop with a counter >> which will count the number of A,C,G and T for each row and will >> give output something like this- >> >> >> A C G T >> 1 0 1 0 >> 0 0 0 2 >> 0 2 0 0 >> 1 0 0 0 >> 0 0 0 3 >> >> This output is for first 5 rows from the example given above. >> >> I am new to R can you please help me.I will be very thankful to you. 
>> >> >> >> Thanking you, >> Warm Regards >> Vikas Bansal >> Msc Bioinformatics >> Kings College London >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > David Winsemius, MD > West Hartford, CT > > > > > > David Winsemius, MD West Hartford, CT From jwiley.psych at gmail.com Sat Jul 2 04:45:00 2011 From: jwiley.psych at gmail.com (Joshua Wiley) Date: Fri, 1 Jul 2011 19:45:00 -0700 Subject: [R] beginner question - effective way to chart sleep habits In-Reply-To: References: Message-ID: Hi Peter, There are lots and lots of ways. Here are examples of a few that came to mind. If you have never used the ggplot2 package, you will first need to install it, which you can do by typing: install.packages("ggplot2") The code below should run "as is". You may see various trends over time or based on bedtime because the data is (pseudo)randomly generated, but you should see a "weekend" effect, which pops out in the circular plot. 
Hope this helps,

Josh

## Make up some data with potentially interesting weekend pattern
Date <- as.POSIXlt(as.POSIXct(1309639554, origin = "1970-01-01") +
  cumsum(rep(86400, 90)) + rnorm(90, sd = 2700))
d <- c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday",
  "Friday", "Saturday")
dat <- data.frame(Date, Day = factor(weekdays(Date), levels = d),
  Bedtime = Date$hour + Date$min/60,
  Hours = rnorm(length(Date), mean = 6.4))
weekends <- grep("Saturday|Sunday", dat$Day)
dat[weekends, "Hours"] <- dat[weekends, "Hours"] +
  rnorm(length(weekends), 1)
tmp <- with(dat, tapply(Hours, Day, mean))
dat2 <- data.frame(Days = factor(names(tmp), levels = d),
  Hours = as.numeric(tmp))

## Load required packages
require(ggplot2)

## Create three separate plots
## you can look at them individually by calling: print(plotobject)
p.hours <- ggplot(dat, aes(x = Date, y = Hours)) +
  geom_line() +
  opts(title = "Hours Slept per Night")

p.ave <- ggplot(dat2, aes(x = Days, y = Hours)) +
  geom_bar() +
  coord_polar() +
  opts(title = "Average Hours of Sleep per Day")

p.th <- ggplot(dat, aes(x = Bedtime, y = Hours)) +
  geom_point() +
  opts(title = "Bedtime and Hours Slept")

## Put all plots in one big one
pushViewport(vpList(
  viewport(x = 0, y = 0.5, width = .5, height = .5,
    just = c("left", "bottom"), name = "a"),
  viewport(x = .5, y = .5, width = .5, height = .5,
    just = c("left", "bottom"), name = "b"),
  viewport(x = 0, y = 0, width = 1, height = .5,
    just = c("left", "bottom"), name = "c")))
upViewport()
downViewport("a")
print(p.ave, newpage = FALSE)
upViewport()
downViewport("b")
print(p.th, newpage = FALSE)
upViewport()
downViewport("c")
print(p.hours, newpage = FALSE)

On Fri, Jul 1, 2011 at 3:20 PM, Peter wrote:
> Hi - beginning R user question here - each day, over the course of several
> months, I've tracked the time I go to bed, the time I wake up, and my hours
> spent sleeping. What would be a good way to display this information?
?I > think it would be ideal to show something resembling a bar and whisker graph > for each day that would show the interval of hours spent asleep (or perhaps > just a bar "floating" against a backdrop showing the hours of a given > day/night), and that would also have a simple line graph of the total number > of hours per day. > > Can I get a hint perhaps? ?Thanks very much. > > ? ? ? ?[[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://www.joshuawiley.com/ From smeldrick at gmail.com Sat Jul 2 05:29:08 2011 From: smeldrick at gmail.com (Peter) Date: Fri, 1 Jul 2011 22:29:08 -0500 Subject: [R] beginner question - effective way to chart sleep habits In-Reply-To: References: Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From tonightsthenight at gmail.com Sat Jul 2 06:42:54 2011 From: tonightsthenight at gmail.com (Sam Albers) Date: Fri, 1 Jul 2011 21:42:54 -0700 Subject: [R] Italicized greek symbols in PDF plots In-Reply-To: References: Message-ID: An embedded and charset-unspecified text was scrubbed... 
Name: not available URL: From jim at bitwrit.com.au Sat Jul 2 09:58:07 2011 From: jim at bitwrit.com.au (Jim Lemon) Date: Sat, 2 Jul 2011 17:58:07 +1000 Subject: [R] Access only part of last dimension of table/matrix In-Reply-To: <7344F63F-D9A3-4BE0-AFA2-51925DC4D414@comcast.net> References: <7344F63F-D9A3-4BE0-AFA2-51925DC4D414@comcast.net> Message-ID: <4E0ECF8F.8080203@bitwrit.com.au> On 07/02/2011 07:50 AM, David Winsemius wrote: > > I would like to do some operations inside a function using only one > value for the last dimension of a table/matrix: > > tabfn <- function (dfrm, facvec, YN ="event"){ > return( Etbl <- do.call(table, dfrm[ , c(facvec, "event") ]) ) > # just want Etbl[,,,"TRUE"] or Etbl[,, "TRUE"] or Etbl[,"TRUE"] > } > tbl <- tabfn(testdf, c("x", "y") ) > tbl # all value of event returned > > At the console it is easy for me to count the number of factors and use > the right number of commas > > tbl[ , , "TRUE"] if I only want the slice with that value. How can I do > this programmatically? > Hi David, I used to do something like this before completely reprogramming the barNest function. # set the first dimension (can be the last also) to a number "stat" sliceargs[[1]] <- x[[stat]] # for the rest of the dimensions of this array, tack on a comma for (arg in 2:ndim) sliceargs[[arg]] <- TRUE sliceargs[[ndim + 1]] <- slice # "slice" the array xslice[[stat]] <- do.call("[", sliceargs) To get the last dimension, just tack commas together for 1:(ndim-1) and then add the number at the end. I think Bill Dunlap gave me this solution when I was struggling with it. Jim From skkrause at gmail.com Sat Jul 2 08:05:29 2011 From: skkrause at gmail.com (Sara Krause) Date: Fri, 1 Jul 2011 23:05:29 -0700 Subject: [R] Plot error in package lme4 Message-ID: An embedded and charset-unspecified text was scrubbed... 
Name: not available URL: From pdalgd at gmail.com Sat Jul 2 12:39:16 2011 From: pdalgd at gmail.com (peter dalgaard) Date: Sat, 2 Jul 2011 12:39:16 +0200 Subject: [R] Access only part of last dimension of table/matrix In-Reply-To: References: <7344F63F-D9A3-4BE0-AFA2-51925DC4D414@comcast.net> <1CE414A1-DE7A-4034-83EB-486E3B3F90B5@me.com> Message-ID: <0E134856-97D6-4015-9E97-3608CC469DDD@gmail.com> On Jul 2, 2011, at 03:29 , Marc Schwartz wrote: > It actually scared me that I had any recollection of an isolated post from 10 years ago. Not sure what to make of that... If it is any consolation, the author had forgotten it entirely.... -- Peter Dalgaard Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com From ross.dunne at tcd.ie Sat Jul 2 12:52:44 2011 From: ross.dunne at tcd.ie (dunner) Date: Sat, 2 Jul 2011 03:52:44 -0700 (PDT) Subject: [R] Multilevel Survival Analysis - Cox PH Model In-Reply-To: References: <1309529438658-3638278.post@n4.nabble.com> <7316E9DE-F5F0-4F43-8571-842697E26AB0@comcast.net> Message-ID: <1309603964667-3640352.post@n4.nabble.com> There is indeed right censoring, but I obviously didn't explain it very well. Patients are either fully oriented or not (1 or 2) after an hour. If they're not, then the data is right censored. However, I don't feel that "coxme" is overkill at all, as I may also have to account for repeated COURSES of treatments on the same individiual, so the data would be structured as follows: TREATMENTS repeated within COURSES repeated within PATIENTS. However, I may need a little help later specifying the model technically in R. What would the random effects bit of the formula look like? At the moment I have mycoxme<-coxme(mysurv~SEX + (1|MRN)) for example. 
would a correct specification for the nested data be: mycoxme<-coxme(mysurv~SEX + (1|MRN|COURSE)) Then of course we have the added issue that treatments are ordered.... so is there a "Frailty" model that can account for that I wonder? Thanks for all your help. Ross -- View this message in context: http://r.789695.n4.nabble.com/Multilevel-Survival-Analysis-Cox-PH-Model-tp3638278p3640352.html Sent from the R help mailing list archive at Nabble.com. From bbolker at gmail.com Sat Jul 2 14:06:48 2011 From: bbolker at gmail.com (Ben Bolker) Date: Sat, 2 Jul 2011 12:06:48 +0000 Subject: [R] Plot error in package lme4 References: Message-ID: Sara Krause gmail.com> writes: > > Hi, > > I am new to R and not fantastic at statistics so it may well be that I am > doing something silly but I can't figure out what it is and hoping that > somebody can help. > > I am running package lme4, and trying to get a Residuals vs. Fitted graph. > When I try to plot, I receive an error. > > Error in as.double(y) : > cannot coerce type 'S4' to vector of type 'double' > > Here is the code I am using > > library("lme4") > > m1<-lmer(y~trt*time*loc_block*Season+(1|ID), data=d) > > plot(m1,add.smooth=F,which=1) > > Error in as.double(y) : > cannot coerce type 'S4' to vector of type 'double' > > I searched the forums for some answers but haven't found anything > useful. Any insight into what I am doing wrong and how to fix it. > > Sara The standard diagnostics which you have seen in 'lm' are not implemented for 'lme4'. You have to do it yourself, but it's not *too* hard: library(lme4) ## from ?lmer fm1 <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy) f <- fitted(fm1) r <- residuals(fm1) plot(f,r) sm <- loess(r~f) v <- seq(min(f),max(f),length=101) lines(v,predict(sm,data.frame(f=v)),col=2) Further questions about lme4 should probably go to the r-sig-mixed-models mailing list. 
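
The steps above can be wrapped into a small helper, sketched here under stated assumptions: `plot_resid_fitted` is a hypothetical name, and the only requirement is that the model object has `fitted()` and `residuals()` methods, as used in the code above for the `lmer` fit. The example uses `lm` on the built-in `cars` data as a stand-in so that it runs without lme4.

```r
# Hypothetical helper: residuals-vs-fitted plot with a loess smooth.
plot_resid_fitted <- function(model, n = 101, ...) {
  f <- as.numeric(fitted(model))
  r <- as.numeric(residuals(model))
  plot(f, r, xlab = "Fitted values", ylab = "Residuals", ...)
  abline(h = 0, lty = 2)                      # reference line at zero
  sm <- loess(r ~ f)                          # smooth trend in the residuals
  v <- seq(min(f), max(f), length.out = n)    # grid over the fitted range
  lines(v, predict(sm, data.frame(f = v)), col = 2)
  invisible(data.frame(fitted = f, resid = r))
}

# Stand-in model; an lmer fit such as fm1 could be passed instead
fit <- lm(dist ~ speed, data = cars)
res <- plot_resid_fitted(fit)
```

Returning the fitted values and residuals invisibly keeps the call usable for plotting alone while still allowing further checks on the same numbers.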
From lm_mengxin at 163.com Sat Jul 2 11:56:46 2011 From: lm_mengxin at 163.com (=?GBK?B?w8/QwA==?=) Date: Sat, 2 Jul 2011 17:56:46 +0800 (CST) Subject: [R] The test of randomized slopes(intercepts) Message-ID: <12ae090.af01.130ea4769c0.Coremail.lm_mengxin@163.com> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From johannes.mohr at gmx.net Sat Jul 2 13:59:07 2011 From: johannes.mohr at gmx.net (akaebi) Date: Sat, 2 Jul 2011 04:59:07 -0700 (PDT) Subject: [R] Need help with my R- Project Message-ID: <1309607947215-3640419.post@n4.nabble.com> What happens to the correlation when i dichotomize the variables? ( i need a function for R) What happens if the correlation is always larger or smaller? What happens to the median? (also a function is needed) What happens if the distribution is not normal? I need point clouds! and last but not least... what happens if you change the scattering and the normal distribution I am looking forward to answers! many thanks! I'm just really desperate -- View this message in context: http://r.789695.n4.nabble.com/Need-help-with-my-R-Project-tp3640419p3640419.html Sent from the R help mailing list archive at Nabble.com. From patrick.grossmann at uni-bielefeld.de Sat Jul 2 14:37:28 2011 From: patrick.grossmann at uni-bielefeld.de (=?iso-8859-1?Q?=22Patrick_Gro=DFmann=22?=) Date: Sat, 02 Jul 2011 14:37:28 +0200 Subject: [R] Error when using plot in diag.panel argument of pairs Message-ID: <6714_1309610249_ZZh0u2W1fG2dm.00_fc2082cd1da76.4e0f2d28@uni-bielefeld.de> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From lui.r.project at googlemail.com Sat Jul 2 15:28:39 2011 From: lui.r.project at googlemail.com (Lui ##) Date: Sat, 2 Jul 2011 21:28:39 +0800 Subject: [R] SNOW libraries/functions, rGenoud In-Reply-To: <4E0DD6E5.8080604@statistik.tu-dortmund.de> References: <4E0DD6E5.8080604@statistik.tu-dortmund.de> Message-ID: Hello Uwe, thanks a lot! 
That solved the "problem"

Have a nice weekend!

Best regards

Lui

2011/7/1 Uwe Ligges :
> See ?clusterEvalQ
>
> Uwe Ligges
>
> On 01.07.2011 16:11, Lui ## wrote:
>>
>> Dear group,
>>
>> does anybody know how to export libraries/functions to all nodes when
>> launching snow? I want to use a function from fBasics (dstable) for a
>> rGenoud optimization routine, but I fail "making the function
>> accessible" to the nodes created. I know how it works for variables, I
>> know how it works in snowfall(which cant be used in that case), but I
>> dont know how it culd work in snow.
>>
>> Help appreciated!
>>
>> Lui
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>

From shantalnb at yahoo.com Sat Jul 2 15:03:33 2011
From: shantalnb at yahoo.com (Shantal Bonnick)
Date: Sat, 2 Jul 2011 06:03:33 -0700 (PDT)
Subject: [R] (no subject)
Message-ID: <1309611813.51373.YahooMailRC@web111008.mail.gq1.yahoo.com>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From izahn at psych.rochester.edu Sat Jul 2 16:15:17 2011
From: izahn at psych.rochester.edu (Ista Zahn)
Date: Sat, 2 Jul 2011 10:15:17 -0400
Subject: [R] (no subject)
In-Reply-To: <1309611813.51373.YahooMailRC@web111008.mail.gq1.yahoo.com>
References: <1309611813.51373.YahooMailRC@web111008.mail.gq1.yahoo.com>
Message-ID:

Please read the posting guide. "there's an error somewhere" is simply not enough to go on.

Best,
Ista

On Sat, Jul 2, 2011 at 9:03 AM, Shantal Bonnick wrote:
> Hi,
>
> If i want to repeat a function, say 100 times, how do i do that? i used the for
> loop but there's an error somewhere.
>
> Thanks
> shan
>
?[[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Ista Zahn Graduate student University of Rochester Department of Clinical and Social Psychology http://yourpsyche.org From jdnewmil at dcn.davis.ca.us Sat Jul 2 16:15:38 2011 From: jdnewmil at dcn.davis.ca.us (Jeff Newmiller) Date: Sat, 02 Jul 2011 07:15:38 -0700 Subject: [R] (no subject) In-Reply-To: <1309611813.51373.YahooMailRC@web111008.mail.gq1.yahoo.com> References: <1309611813.51373.YahooMailRC@web111008.mail.gq1.yahoo.com> Message-ID: <874e21ff-38cc-4c75-858e-455cc9d101ce@email.android.com> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From ligges at statistik.tu-dortmund.de Sat Jul 2 16:29:13 2011 From: ligges at statistik.tu-dortmund.de (Uwe Ligges) Date: Sat, 02 Jul 2011 16:29:13 +0200 Subject: [R] Need help with my R- Project In-Reply-To: <1309607947215-3640419.post@n4.nabble.com> References: <1309607947215-3640419.post@n4.nabble.com> Message-ID: <4E0F2B39.2070903@statistik.tu-dortmund.de> On 02.07.2011 13:59, akaebi wrote: > What happens to the correlation when i dichotomize the variables? ( i need a > function for R) > > What happens if the correlation is always larger or smaller? What happens to > the median? (also a function is needed) > > What happens if the distribution is not normal? > > I need point clouds! > > and last but not least... > > what happens if you change the scattering and the normal distribution > > > I am looking forward to answers! many thanks! I'm just really desperate In principle, we do not answer homework questions on this list. If you need help and cannot solve the problems yourself, the first to ask is your supervisor. 
Uwe Ligges > > -- > View this message in context: http://r.789695.n4.nabble.com/Need-help-with-my-R-Project-tp3640419p3640419.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From ligges at statistik.tu-dortmund.de Sat Jul 2 16:36:17 2011 From: ligges at statistik.tu-dortmund.de (Uwe Ligges) Date: Sat, 02 Jul 2011 16:36:17 +0200 Subject: [R] Error when using plot in diag.panel argument of pairs In-Reply-To: <6714_1309610249_ZZh0u2W1fG2dm.00_fc2082cd1da76.4e0f2d28@uni-bielefeld.de> References: <6714_1309610249_ZZh0u2W1fG2dm.00_fc2082cd1da76.4e0f2d28@uni-bielefeld.de> Message-ID: <4E0F2CE1.8020505@statistik.tu-dortmund.de> If you send a better readable message with really reproducible code, we may provide a solution, this way it is really hard to read your code. And yes, please do not send html mail. My guess is that par(new=TRUE) will help, but I do not want to construct the examples at first. On 02.07.2011 14:37, "Patrick Gro?mann" wrote: > Dear Madame or Sir,I am having a problem in combining density-smoothed scatterplot matrices with a plot of kernel destiny estimations of each dimension plotted on the respective field of the diagonal.I have tried following approach using the package "sm" for the kernel density estimation, as well as "MASS" respectively:pairs(myTable[, 1:4],panel=function(x,y, ...){ xy=kde2d(x,y) image(xy, add=TRUE)},diag.panel = function(x,...){plot(density(x), add = TRUE)#boxplot(x, add = TRUE)})where myTable is an ordinary table I am using the first four columns of. Unfortunately this produces an error stating that plot has created a new plot. 
I suppose this is the same situation when forgetting "add = TRUE" with plotting functions such as boxplot (as in my code used with "boxplot(x, add = TRUE)" ) or any other (e.g. smoothScatter). Analyzing the partial plot, which can be seen before R rejects my code, I come to the conclusion that add might not be a valid argument for "plot ". My questions therefore is, whether there is an option to plot the result of the 1d kernel density estimation to the diagonal of the scatterplot produced by pairs or which plot function is valid in this case. So basically I would like to generate the same output I am able to produce with boxplots on the diagonal with a density estimation.Additional question: I bet there is an opportunity to include both the boxplot and the kernel density estimation on each field of the diagonal (as I saw before in a published paper), but I just could not figure out how to "overdraw". Maybe you could give me a slight idea (reference would be fine, too) on how to achieve this. I would be highly thankful!Kind regards and many thanks in advance,-Patrick Grossman -- > Patrick Gro?mann - HiWi zur Netzwerkdokumentation > Network Operation Center NOC > Hochschulrechenzentrum HRZ Universit?t Bielefeld > > B?ro: > UHG V0-251 > Telefon: > +49 521 106-12618 > Fax: > +49 521 106-2969 > Telefon Sekretariat: > +49 521 106-4951 > > http://www.uni-bielefeld.de/hrz/ > > > > [[alternative HTML version deleted]] > > > > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
From ligges at statistik.tu-dortmund.de Sat Jul 2 17:42:09 2011 From: ligges at statistik.tu-dortmund.de (Uwe Ligges) Date: Sat, 02 Jul 2011 17:42:09 +0200 Subject: [R] Error when using plot in diag.panel argument of pairs In-Reply-To: <4342_1309619351_ZZh0u7W3O~zo0.00_fc60cc8a1bc3b.4e0f50b7@uni-bielefeld.de> References: <6714_1309610249_ZZh0u2W1fG2dm.00_fc2082cd1da76.4e0f2d28@uni-bielefeld.de> <4E0F2CE1.8020505@statistik.tu-dortmund.de> <4342_1309619351_ZZh0u7W3O~zo0.00_fc60cc8a1bc3b.4e0f50b7@uni-bielefeld.de> Message-ID: <4E0F3C51.4070909@statistik.tu-dortmund.de> OK, so let me make it reproducible and start with myTable <- iris require("MASS") On 02.07.2011 17:09, "Patrick Gro?mann" wrote: > Dear Mr. Ligges, > > my apologies! As I can see in your E-Mail the quote of mine is really just unreadable! Anyhow, this was not produced by me writing messy messages, this is due to the Java application behind the client not being able to cope with external tab-characters on my browser when parsing the message while sending(I tipped the message on an external editor because of the code lines and copy pasted it). Of course I also never would add any html files! > > Anyhow, I send you my original message again (new = True did not help me. Assuming I used it the way you suggested it produced the same error). I send it to myself first, so I am sure, this mess won't happen again! > > My previous Message: > > ######### > Dear Madame or Sir, > > I am having a problem in combining density-smoothed scatterplot matrices with a plot of kernel destiny estimations of each dimension plotted on the respective diagonals. > > I have tried following approach using the package "sm" for the kernel density estimation, as well as "MASS" respectively: > > pairs > (myTable[, 1:4], > panel=function(x,y, ...) > { > xy=kde2d(x,y) > image(xy, add=TRUE) > } > ,diag.panel = function(x,...) 
> { > plot(density(x,...), add = TRUE) > #boxplot(x, add = TRUE) > } > )

Now we could go with:

pairs(myTable[, 1:4],
    panel = function(x, y, ...) {
        xy <- kde2d(x, y)
        image(xy, add = TRUE)
    },
    diag.panel = function(x, ...) {
        pu <- par("usr")
        d <- density(x, ...)
        par("usr" = c(pu[1:2], 0, max(d$y) * 1.5))
        lines(d)
        par("usr" = c(pu[1:2], 0, 1))
        boxplot(x, at = 0.5, boxwex = 0.3, horizontal = TRUE, add = TRUE)
    }
)

Uwe Ligges

> where myTable is an ordinary table I am using the first four columns of. Unfortunately this produces an error stating that plot has created a new plot. I suppose this is the same situation when forgetting "add = TRUE" with plotting functions such as boxplot (as in my code used with "boxplot(x, add = TRUE)" ) or any other (e.g. smoothScatter). Analyzing the partial plot, which can be seen before R rejects my code, I come to the conclusion that add might not be a valid argument for "plot".
> My question therefore is whether there is an option to plot the result of the 1d kernel density estimation to the diagonal of the scatterplot produced by pairs, or which plot function is valid in this case. So basically I would like to generate the same output I am able to produce with boxplots on the diagonal with a density estimation.
>
> Additional question: I bet there is an opportunity to include both the boxplot and the kernel density estimation on each field of the diagonal (as I saw before in a published paper), but I just could not figure out how to "overdraw". Maybe you could give me a slight idea (reference would be fine, too) on how to achieve this. I would be highly thankful!
>
>
> Kind regards and many thanks in advance,
> -Patrick Grossman
>
> ##############
>
> Please excuse the inconvenience and note my thanks.
>
> Warmest regards,
> -Patrick Grossman
>
> ----- Ursprüngliche Nachricht ----- > Von: Uwe Ligges > Datum: Samstag, 2.
Juli 2011, 16:36 > Betreff: Re: [R] Error when using plot in diag.panel argument of pairs > An: Patrick Gro?mann > Cc: r-help at r-project.org > >> If you send a better readable message with really reproducible >> code, we >> may provide a solution, this way it is really hard to read your >> code. >> And yes, please do not send html mail. My guess is that >> par(new=TRUE) >> will help, but I do not want to construct the examples at first. >> >> >> >> >> On 02.07.2011 14:37, "Patrick Gro?mann" wrote: >>> Dear Madame or Sir,I am having a problem in >> combining density-smoothed scatterplot matrices with a plot of >> kernel destiny estimations of each dimension plotted on the >> respective field of the diagonal.I have tried following approach >> using the package "sm" for the kernel density estimation, as >> well as "MASS" respectively:pairs(myTable[, >> 1:4],panel=function(x,y, ...){ >> xy=kde2d(x,y) image(xy, >> add=TRUE)},diag.panel = function(x,...){plot(density(x), add = >> TRUE)#boxplot(x, add = TRUE)})where myTable is an ordinary table >> I am using the first four columns of. Unfortunately this >> produces an error stating that plot has created a new plot. I >> suppose this is the same situation when forgetting "add = >> TRUE" with plotting functions such as boxplot (as in my code >> used with "boxplot(x, add = TRUE)" ) or any other (e.g. >> smoothScatter). Analyzing the partial plot, which can be >> seen before R rejects my code, I come to the conclusion that add >> might not be a valid argument for "plot >> ". My questions therefore is, whether there is an option to plot >> the result of the 1d kernel density estimation to the diagonal >> of the scatterplot produced by pairs or which plot function is >> valid in this case. 
So basically I would like to generate the >> same output I am able to produce with boxplots on the diagonal >> with a density estimation.Additional question: I bet there is an >> opportunity to include both the boxplot and the kernel density >> estimation on each field of the diagonal (as I saw before in a >> published paper), but I just could not figure out how to >> "overdraw". Maybe you could give me a slight idea (reference >> would be fine, too) on how to achieve this. I would be highly >> thankful!Kind regards and many thanks in advance,-Patrick >> Grossman -- >>> Patrick Gro?mann - HiWi zur Netzwerkdokumentation >>> Network Operation Center NOC >>> Hochschulrechenzentrum HRZ >> Universit?t Bielefeld >>> >>> B?ro: >>> UHG V0-251 >>> Telefon: >>> +49 521 106-12618 >>> Fax: >>> +49 521 106-2969 >>> Telefon Sekretariat: >>> +49 521 106-4951 >>> >>> http://www.uni-bielefeld.de/hrz/ >>> >>> >>> >>> [[alternative HTML version deleted]] >>> >>> >>> >>> >>> ______________________________________________ >>> R-help at r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide http://www.R- >> project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. 
> > -- > Patrick Großmann - student assistant (HiWi) for network documentation > Network Operation Center NOC > Hochschulrechenzentrum HRZ, Universität Bielefeld > > Office: > UHG V0-251 > Phone: > +49 521 106-12618 > Fax: > +49 521 106-2969 > Phone (secretariat): > +49 521 106-4951 > > http://www.uni-bielefeld.de/hrz/ > > > From marc_schwartz at me.com Sat Jul 2 18:04:04 2011 From: marc_schwartz at me.com (Marc Schwartz) Date: Sat, 2 Jul 2011 11:04:04 -0500 Subject: [R] Access only part of last dimension of table/matrix In-Reply-To: <0E134856-97D6-4015-9E97-3608CC469DDD@gmail.com> References: <7344F63F-D9A3-4BE0-AFA2-51925DC4D414@comcast.net> <1CE414A1-DE7A-4034-83EB-486E3B3F90B5@me.com> <0E134856-97D6-4015-9E97-3608CC469DDD@gmail.com> Message-ID: On Jul 2, 2011, at 5:39 AM, peter dalgaard wrote: > > On Jul 2, 2011, at 03:29 , Marc Schwartz wrote: > >> It actually scared me that I had any recollection of an isolated post from 10 years ago. Not sure what to make of that... > > If it is any consolation, the author had forgotten it entirely.... LOL Peter! Thanks!
Regards, Marc From dwinsemius at comcast.net Sat Jul 2 19:19:33 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Sat, 2 Jul 2011 13:19:33 -0400 Subject: [R] For help in R coding In-Reply-To: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE5@KCL-MAIL01.kclad.ds.kcl.ac.uk> References: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE0@KCL-MAIL01.kclad.ds.kcl.ac.uk>, <813856BE-5DD3-4708-B1FA-2611EBAD8C61@comcast.net> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE1@KCL-MAIL01.kclad.ds.kcl.ac.uk>, , <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE3@KCL-MAIL01.kclad.ds.kcl.ac.uk> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE5@KCL-MAIL01.kclad.ds.kcl.ac.uk> Message-ID: <383EF813-24EA-42F3-B514-AB9CF8060BAC@comcast.net> On Jul 2, 2011, at 12:34 PM, Bansal, Vikas wrote: > > >>> Dear all, >>> >>> I am doing a project on variant calling using R.I am working on >>> pileup file.There are 10 columns in my data frame and I want to >>> count the number of A,C,G and T in each row for column 9.example of >>> column 9 is given below- >>> >>> .a,g,, >>> .t,t,, >>> .,c,c, >>> .,a,,, >>> .,t,t,t >>> .c,,g,^!. >>> .g,ggg.^!, >>> .$,,,,,., >>> a,g,,t, >>> ,,,,,.,^!. >>> ,$,,,,.,. >>> >>> This is a bit confusing for me as these characters are in one column >>> and how can we scan them for each row to print number of A,C,G and T >>> for each row. >> >> Seems a bit clunky but this does the job (first the data): >>> txt <- " .a,g,, >> + .t,t,, >> + .,c,c, >> + .,a,,, >> + .,t,t,t >> + .c,,g,^!. >> + .g,ggg.^!, >> + .$,,,,,., >> + a,g,,t, >> + ,,,,,.,^!. >> + ,$,,,,.,." >> >>> txtvec <- readLines(textConnection(txt)) >> >> Now the clunky solution, Basically subtracts 1 from the counts of >> "fragments" that result from splitting on each letter in turn. Could >> be made prettier with a function that did the job. 
>>
>>> data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit,
>> split="a"), length) , "-", 1)),
>> + C = unlist(lapply( lapply( sapply(txtvec, strsplit, split="c"),
>> length) , "-", 1)),
>> + G = unlist(lapply( lapply( sapply(txtvec, strsplit, split="g"),
>> length) , "-", 1)),
>> + T = unlist(lapply( lapply( sapply(txtvec, strsplit, split="t"),
>> length) , "-", 1)) )
>>            A C G T
>> .a,g,,     1 0 1 0
>> .t,t,,     0 0 0 2
>> .,c,c,     0 2 0 0
>> .,a,,,     1 0 0 0
>> .,t,t,t    0 0 0 2
>> .c,,g,^!.  0 1 1 0
>> .g,ggg.^!, 0 0 4 0
>> .$,,,,,.,  0 0 0 0
>> a,g,,t,    1 0 1 1
>> ,,,,,.,^!. 0 0 0 0
>> ,$,,,,.,.  0 0 0 0
>>
>> Has the advantage that the input data ends up as rownames, which
>> was a
>> surprise.
>>
>> If you wanted to count "A" and "a" as equivalent, then the split
>> argument should be "a|A"
>>
>>
>
>>> AS YOU MENTIONED THAT IF I WANT TO COUNT A AND a I SHOULD SPLIT
>>> LIKE THIS.
> BUT CAN I COUNT . AND , ALSO USING-
> data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit,
> split=".|,"), length) , "-", 1)),
>
> I TRIED IT BUT ITS NOT WORKING.IT IS GIVING THE OUTPUT BUT AT SOME
> PLACES IT IS SHOWING MORE NUMBER OF . AND , AND SOMEWHERE IT IS NOT
> EVEN CALCULATING AND JUST SHOWING 0.

You need to use valid regex expressions for 'split'. Since "." and "," are special characters they need to be escaped when you want the literals to be recognized as such. I haven't figured out why, but you need to drop the final operation of subtracting 1 from the values when counting commas:

data.frame(periods = unlist(lapply( lapply( sapply(txtvec, strsplit,
split="\\."), length) , "-", 1))
,commas = unlist( lapply( sapply(txtvec, strsplit, split="\\,"),
length) ) )

           periods commas
.a,g,,           1      3
.t,t,,           1      3
.,c,c,           1      3
.,a,,,           1      4
.,t,t,t          1      4
.c,,g,^!.        1      4
.g,ggg.^!,       2      2
.$,,,,,.,        2      6
a,g,,t,          0      4
,,,,,.,^!.       1      7
,$,,,,.,.        1      7

--
David Winsemius, MD
West Hartford, CT

From dwinsemius at comcast.net Sat Jul 2 19:23:35 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Sat, 2 Jul 2011 13:23:35 -0400 Subject: [R] (no subject) In-Reply-To: References: <1309611813.51373.YahooMailRC@web111008.mail.gq1.yahoo.com> Message-ID: <40781842-31EC-4A50-A71B-3596FC57CF2C@comcast.net>

On Jul 2, 2011, at 10:15 AM, Ista Zahn wrote:

> Please read the posting guide. "there's an error somewhere" is simply
> not enough to go on.
>

True enough, but under the hypothesis that this might be some sort of simulation task, I would also suggest looking at:

?replicate

> Best,
> Ista
>
> On Sat, Jul 2, 2011 at 9:03 AM, Shantal Bonnick
> wrote:
>> Hi,
>>
>> If i want to repeat a function, say 100 times, how do i do that? i
>> used the for
>> loop but there's an error somewhere.
>>
>> Thanks
>> shan
>> [[alternative HTML version deleted]]

Learning to post plain text with the yahoo interface should be added to shan's TO-DO list.

--
David Winsemius, MD
West Hartford, CT

From ivowel at gmail.com Sat Jul 2 19:32:35 2011 From: ivowel at gmail.com (ivo welch) Date: Sat, 2 Jul 2011 10:32:35 -0700 Subject: [R] %dopar% parallel processing experiment Message-ID:

dear R experts---

I am experimenting with multicore processing, so far with pretty disappointing results.
Here is my simple example:

A <- 100000
randvalues <- abs(rnorm(A))
minfn <- function( x, i ) { log(abs(x))+x^3+i/A+randvalues[i] }  ## an arbitrary function

ARGV <- commandArgs(trailingOnly=TRUE)

if (ARGV[1] == "do-onecore") {
  library(foreach)
  discard <- foreach(i = 1:A) %do% uniroot( minfn, c(1e-20,9e20), i ) } else
if (ARGV[1] == "do-multicore") {
  library(doMC)
  registerDoMC()
  cat("You have", getDoParWorkers(), "cores\n")
  discard <- foreach(i = 1:A) %dopar% uniroot( minfn, c(1e-20,9e20), i ) } else
if (ARGV[1] == "plain")
  for (i in 1:A) discard <- uniroot( minfn, c(1e-20,9e20), i ) else
cat("sorry, but argument", ARGV[1], "is not plain|do-onecore|do-multicore\n")

on my Mac Pro 3,1 (2 quad-cores), R 2.12.0, which reports 8 cores,

  "plain" takes about 68 seconds (real and user, using the unix timing function).
  "do-onecore" takes about 300 seconds.
  "do-multicore" takes about 210 seconds real, (300 seconds user).

this seems pretty disappointing. the cores are not used for the most part, either. feedback appreciated.
/iaw ---- Ivo Welch (ivo.welch at gmail.com) From vikas.bansal at kcl.ac.uk Sat Jul 2 18:34:08 2011 From: vikas.bansal at kcl.ac.uk (Bansal, Vikas) Date: Sat, 2 Jul 2011 17:34:08 +0100 Subject: [R] For help in R coding In-Reply-To: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE3@KCL-MAIL01.kclad.ds.kcl.ac.uk> References: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE0@KCL-MAIL01.kclad.ds.kcl.ac.uk>, <813856BE-5DD3-4708-B1FA-2611EBAD8C61@comcast.net> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE1@KCL-MAIL01.kclad.ds.kcl.ac.uk>, , <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE3@KCL-MAIL01.kclad.ds.kcl.ac.uk> Message-ID: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE5@KCL-MAIL01.kclad.ds.kcl.ac.uk> >> Dear all, >> >> I am doing a project on variant calling using R.I am working on >> pileup file.There are 10 columns in my data frame and I want to >> count the number of A,C,G and T in each row for column 9.example of >> column 9 is given below- >> >> .a,g,, >> .t,t,, >> .,c,c, >> .,a,,, >> .,t,t,t >> .c,,g,^!. >> .g,ggg.^!, >> .$,,,,,., >> a,g,,t, >> ,,,,,.,^!. >> ,$,,,,.,. >> >> This is a bit confusing for me as these characters are in one column >> and how can we scan them for each row to print number of A,C,G and T >> for each row. > > Seems a bit clunky but this does the job (first the data): >> txt <- " .a,g,, > + .t,t,, > + .,c,c, > + .,a,,, > + .,t,t,t > + .c,,g,^!. > + .g,ggg.^!, > + .$,,,,,., > + a,g,,t, > + ,,,,,.,^!. > + ,$,,,,.,." > >> txtvec <- readLines(textConnection(txt)) > > Now the clunky solution, Basically subtracts 1 from the counts of > "fragments" that result from splitting on each letter in turn. Could > be made prettier with a function that did the job. 
> >> data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit, > split="a"), length) , "-", 1)), > + C = unlist(lapply( lapply( sapply(txtvec, strsplit, split="c"), > length) , "-", 1)), > + G = unlist(lapply( lapply( sapply(txtvec, strsplit, split="g"), > length) , "-", 1)), > + T = unlist(lapply( lapply( sapply(txtvec, strsplit, split="t"), > length) , "-", 1)) ) > A C G T > .a,g,, 1 0 1 0 > .t,t,, 0 0 0 2 > .,c,c, 0 2 0 0 > .,a,,, 1 0 0 0 > .,t,t,t 0 0 0 2 > .c,,g,^!. 0 1 1 0 > .g,ggg.^!, 0 0 4 0 > .$,,,,,., 0 0 0 0 > a,g,,t, 1 0 1 1 > ,,,,,.,^!. 0 0 0 0 > ,$,,,,.,. 0 0 0 0 > > Has the advantage that the input data ends up as rownames, which was a > surprise. > > If you wanted to count "A" and "a" as equivalent, then the split > argument should be "a|A" > > >>AS YOU MENTIONED THAT IF I WANT TO COUNT A AND a I SHOULD SPLIT LIKE THIS. BUT CAN I COUNT . AND , ALSO USING- data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit, split=".|,"), length) , "-", 1)), I TRIED IT BUT ITS NOT WORKING.IT IS GIVING THE OUTPUT BUT AT SOME PLACES IT IS SHOWING MORE NUMBER OF . AND , AND SOMEWHERE IT IS NOT EVEN CALCULATING AND JUST SHOWING 0. >> >> >> Thanking you, >> Warm Regards >> Vikas Bansal >> Msc Bioinformatics >> Kings College London >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > David Winsemius, MD > West Hartford, CT > > > > > > David Winsemius, MD West Hartford, CT ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
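The per-row counts asked about in this thread can also be obtained without strsplit(). What follows is only a minimal sketch, assuming the column-9 strings are held in a character vector; the helper name count_char and the column name bases are made up for illustration and do not come from the thread. gregexpr() with fixed = TRUE treats "." and "," as literal characters, so nothing needs escaping, and counting match positions does not depend on whether a string begins or ends with the character being counted (strsplit() drops a trailing empty piece, so counts derived from it can come out differently for such strings).

```r
# Example strings copied from the thread (column 9 of the pileup data)
txtvec <- c(".a,g,,", ".t,t,,", ".,c,c,", ".,a,,,", ".,t,t,t",
            ".c,,g,^!.", ".g,ggg.^!,", ".$,,,,,.,", "a,g,,t,",
            ",,,,,.,^!.", ",$,,,,.,.")

# Hypothetical helper: number of occurrences of the literal string `ch`
# in each element of `x`
count_char <- function(x, ch, ignore_case = FALSE) {
  if (ignore_case) {
    x  <- tolower(x)
    ch <- tolower(ch)
  }
  # fixed = TRUE: `ch` is matched literally, not as a regular expression
  hits <- gregexpr(ch, x, fixed = TRUE)
  # gregexpr() returns -1 when there is no match, so count positive positions
  vapply(hits, function(pos) sum(pos > 0L), integer(1))
}

counts <- data.frame(
  bases   = txtvec,
  A       = count_char(txtvec, "a", ignore_case = TRUE),
  C       = count_char(txtvec, "c", ignore_case = TRUE),
  G       = count_char(txtvec, "g", ignore_case = TRUE),
  T       = count_char(txtvec, "t", ignore_case = TRUE),
  periods = count_char(txtvec, "."),
  commas  = count_char(txtvec, ","),
  stringsAsFactors = FALSE
)
print(counts)
```

With ignore_case = TRUE, "A" and "a" are counted together; leave it at FALSE to count only the lower-case letters.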
From patrick.grossmann at uni-bielefeld.de Sat Jul 2 17:09:11 2011 From: patrick.grossmann at uni-bielefeld.de (=?iso-8859-1?Q?=22Patrick_Gro=DFmann=22?=) Date: Sat, 02 Jul 2011 17:09:11 +0200 Subject: [R] Error when using plot in diag.panel argument of pairs In-Reply-To: <4E0F2CE1.8020505@statistik.tu-dortmund.de> References: <6714_1309610249_ZZh0u2W1fG2dm.00_fc2082cd1da76.4e0f2d28@uni-bielefeld.de> <4E0F2CE1.8020505@statistik.tu-dortmund.de> Message-ID: <4342_1309619351_ZZh0u7W3O~zo0.00_fc60cc8a1bc3b.4e0f50b7@uni-bielefeld.de> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From btsai00 at gmail.com Sat Jul 2 17:45:21 2011 From: btsai00 at gmail.com (Brian Tsai) Date: Sat, 2 Jul 2011 08:45:21 -0700 Subject: [R] comparing hazard ratios Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From thomas.stibor at in.tum.de Sat Jul 2 18:36:42 2011 From: thomas.stibor at in.tum.de (Thomas Stibor) Date: Sat, 2 Jul 2011 18:36:42 +0200 Subject: [R] Vector of functions Message-ID: Hi there, I have a vector of some functions e.g. #-----------------------------# f.1 <- function(a) { return( a ); } f.2 <- function(a) { return( 1 - (tanh(a))^2 ); } f.3 <- function(a) { return( 1 / (1+exp(-a)) * (1 - (1 / (1+exp(-a)))) ); } func.l <- c(f.1, f.2, f.3); #-----------------------------# and would like to calculate the output value of a function in the vector, that is, pick e.g. function at position 2 in func.l and calculate the output value for a=42. func.l[2](42); # gives error f <- func.l[2]; f(42); # gives error Is there an easy way to solve this problem? 
Cheers, Thomas From j.tripp at warwick.ac.uk Sat Jul 2 16:42:55 2011 From: j.tripp at warwick.ac.uk (James Tripp) Date: Sat, 2 Jul 2011 15:42:55 +0100 Subject: [R] (no subject) In-Reply-To: <1309611813.51373.YahooMailRC@web111008.mail.gq1.yahoo.com> References: <1309611813.51373.YahooMailRC@web111008.mail.gq1.yahoo.com> Message-ID: On 02/07/2011 14:03, Shantal Bonnick wrote: > Hi, > > If i want to repeat a function, say 100 times, how do i do that? i used the for > loop but there's an error somewhere. > > Thanks > shan > [[alternative HTML version deleted]] > Hi Shan, Could you post the code you're using? Then we can figure out where the error is coming from. Best, James From ligges at statistik.tu-dortmund.de Sat Jul 2 19:50:33 2011 From: ligges at statistik.tu-dortmund.de (Uwe Ligges) Date: Sat, 2 Jul 2011 19:50:33 +0200 Subject: [R] %dopar% parallel processing experiment In-Reply-To: References: Message-ID: <4E0F5A69.2010603@statistik.tu-dortmund.de> On 02.07.2011 19:32, ivo welch wrote: > dear R experts--- > > I am experimenting with multicore processing, so far with pretty > disappointing results. Here is my simple example: > > A<- 100000 > randvalues<- abs(rnorm(A)) > minfn<- function( x, i ) { log(abs(x))+x^3+i/A+randvalues[i] } ## an > arbitrary function > > ARGV<- commandArgs(trailingOnly=TRUE) > > if (ARGV[1] == "do-onecore") { > library(foreach) > discard<- foreach(i = 1:A) %do% uniroot( minfn, c(1e-20,9e20), i ) } else > if (ARGV[1] == "do-multicore") { > library(doMC) > registerDoMC() > cat("You have", getDoParWorkers(), "cores\n") > discard<- foreach(i = 1:A) %dopar% uniroot( minfn, c(1e-20,9e20), i ) } else > if (ARGV[1] == "plain") > for (i in 1:A) discard<- uniroot( minfn, c(1e-20,9e20), i ) else > cat("sorry, but argument", ARGV[1], "is not plain|do-onecore|do-multicore\n") > > > on my Mac Pro 3,1 (2 quad-cores), R 2.12.0, which reports 8 cores, > > "plain" takes about 68 seconds (real and user, using the unix timing > function). 
> "do-onecore" takes about 300 seconds. > "do-multicore" takes about 210 seconds real, (300 seconds user). > > this seems pretty disappointing. the cores are not used for the most > part, either. feedback appreciated. Feedback is that a single computation within your foreach loop is so quick that the overhead of communicating data and results between processes costs more time than the actual evaluation, hence you are faster with a single process. What you should do is: write code that does, e.g., 10000 iterations within 10 other iterations and just do a foreach loop around the outer 10. Then you will probably be much faster (without testing). But this is essentially the example I am using for teaching to show when not to do parallel processing..... Best, Uwe Ligges > /iaw > > > ---- > Ivo Welch (ivo.welch at gmail.com) > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From jwiley.psych at gmail.com Sat Jul 2 19:59:39 2011 From: jwiley.psych at gmail.com (Joshua Wiley) Date: Sat, 2 Jul 2011 10:59:39 -0700 Subject: [R] Vector of functions In-Reply-To: References: Message-ID: Hi Thomas, On Sat, Jul 2, 2011 at 9:36 AM, Thomas Stibor wrote: > Hi there, > > I have a vector of some functions e.g. > #-----------------------------# > f.1 <- function(a) { > ?return( a ); > } > f.2 <- function(a) { > ?return( 1 - (tanh(a))^2 ); > } > f.3 <- function(a) { > ?return( 1 / (1+exp(-a)) * (1 - (1 / (1+exp(-a)))) ); > } > > func.l <- c(f.1, f.2, f.3); > #-----------------------------# > > and would like to calculate the output value of a function in the > vector, that is, pick e.g. function at position 2 in func.l and calculate the > output value for a=42. 
> > func.l[2](42); # gives error Almost there, you just need to use a different extraction operator... func.l[[2]](42) compare the output of func.l[2] func.l[[2]] the first one retains the list structure, but just returns a list with the second element of func.l, the second one actually returns a function. You can check this by looking at the class class(func.l[2]) class(func.l[[2]]) Cheers, Josh > > f <- func.l[2]; > f(42); # gives error > > Is there an easy way to solve this problem? > > Cheers, > ?Thomas > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://www.joshuawiley.com/ From ivo.welch at gmail.com Sat Jul 2 20:04:48 2011 From: ivo.welch at gmail.com (ivo welch) Date: Sat, 2 Jul 2011 11:04:48 -0700 Subject: [R] %dopar% parallel processing experiment In-Reply-To: <4E0F5A69.2010603@statistik.tu-dortmund.de> References: <4E0F5A69.2010603@statistik.tu-dortmund.de> Message-ID: thank you, uwe. this is a little disappointing. parallel processing for embarrassingly simple parallel operations--those needing no communication---should be feasible if the thread is not always created and released, but held. is there light-weight parallel processing that could facilitate this? regards, /iaw 2011/7/2 Uwe Ligges : > > > On 02.07.2011 19:32, ivo welch wrote: >> >> dear R experts--- >> >> I am experimenting with multicore processing, so far with pretty >> disappointing results. ?Here is my simple example: >> >> A<- 100000 >> randvalues<- abs(rnorm(A)) >> minfn<- function( x, i ) { log(abs(x))+x^3+i/A+randvalues[i] } ?## an >> arbitrary function >> >> ARGV<- commandArgs(trailingOnly=TRUE) >> >> if (ARGV[1] == "do-onecore") { >> ? ?library(foreach) >> ? 
?discard<- foreach(i = 1:A) %do% uniroot( minfn, c(1e-20,9e20), i ) } >> else >> if (ARGV[1] == "do-multicore") { >> ? ?library(doMC) >> ? ?registerDoMC() >> ? ?cat("You have", getDoParWorkers(), "cores\n") >> ? ?discard<- foreach(i = 1:A) %dopar% uniroot( minfn, c(1e-20,9e20), i ) } >> else >> if (ARGV[1] == "plain") >> ? ?for (i in 1:A) discard<- uniroot( minfn, c(1e-20,9e20), i ) else >> cat("sorry, but argument", ARGV[1], "is not >> plain|do-onecore|do-multicore\n") >> >> >> on my Mac Pro 3,1 (2 quad-cores), R 2.12.0, which reports 8 cores, >> >> ? "plain" takes about 68 seconds (real and user, using the unix timing >> function). >> ? "do-onecore" takes about 300 seconds. >> ? "do-multicore" takes about 210 seconds real, (300 seconds user). >> >> this seems pretty disappointing. ?the cores are not used for the most >> part, either. ?feedback appreciated. > > > Feedback is that a single computation within your foreach loop is so quick > that the overhead of communicating data and results between processes costs > more time than the actual evaluation, hence you are faster with a single > process. > > What you should do is: > > write code that does, e.g., 10000 iterations within 10 other iterations and > just do a foreach loop around the outer 10. Then you will probably be much > faster (without testing). But this is essentially the example I am using for > teaching to show when not to do parallel processing..... > > Best, > Uwe Ligges > > > > > > >> /iaw >> >> >> ---- >> Ivo Welch (ivo.welch at gmail.com) >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. 
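The advice quoted above (run many iterations inside each parallel task instead of one) can be sketched roughly as follows. This is an illustration under assumptions rather than a tested benchmark: A is reduced so the example runs quickly, the names n_chunks, chunks and solve_chunk are invented here, and lapply() merely stands in for the parallel map. The same list of chunks could be handed to foreach(...) %dopar% or to snow's parLapply() instead; no speed-up is claimed here.

```r
A <- 2000                      # smaller than in the thread, to keep the run short
set.seed(1)
randvalues <- abs(rnorm(A))
minfn <- function(x, i) log(abs(x)) + x^3 + i / A + randvalues[i]

# Split the indices 1..A into a few contiguous chunks
n_chunks <- 8
chunks <- split(seq_len(A), cut(seq_len(A), n_chunks, labels = FALSE))

# One task = one whole chunk, so per-task overhead is paid only n_chunks times
solve_chunk <- function(idx) {
  vapply(idx,
         function(i) uniroot(function(x) minfn(x, i), c(1e-20, 9e20))$root,
         numeric(1))
}

# Serial stand-in for the parallel map over chunks
roots_by_chunk <- lapply(chunks, solve_chunk)

# Chunks are contiguous and in order, so this restores the order 1..A
roots <- unlist(roots_by_chunk, use.names = FALSE)
length(roots)
```

The number of chunks is a tuning choice; a small multiple of the number of workers is a common starting point.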
>
From kw.stat at gmail.com Sat Jul 2 20:18:30 2011 From: kw.stat at gmail.com (Kevin Wright) Date: Sat, 2 Jul 2011 13:18:30 -0500 Subject: [R] problem with corrgram function In-Reply-To: <4E09B6C8.4030802@ull.es> References: <4E09B6C8.4030802@ull.es> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From ligges at statistik.tu-dortmund.de Sat Jul 2 20:24:08 2011 From: ligges at statistik.tu-dortmund.de (Uwe Ligges) Date: Sat, 2 Jul 2011 20:24:08 +0200 Subject: [R] %dopar% parallel processing experiment In-Reply-To: References: <4E0F5A69.2010603@statistik.tu-dortmund.de> Message-ID: <4E0F6248.3030109@statistik.tu-dortmund.de>

On 02.07.2011 20:04, ivo welch wrote:
> thank you, uwe. this is a little disappointing. parallel processing
> for embarrassingly simple parallel operations--those needing no
> communication---should be feasible if the thread is not always created
> and released, but held. is there light-weight parallel processing
> that could facilitate this?

Hmmm, now that you asked I checked it myself using snow:

On a 2-core AMD64 machine that is some years old, with R-2.13.0 and snow (using SOCK clusters, i.e. slow communication), I get:

> system.time(parSapply(cl, 1:A, function(i) uniroot(minfn, c(1e-20,9e20), i)))
   user  system elapsed
   3.10    0.19   51.43

while on a single core without parallelization framework:

> system.time(sapply(1:A, function(i) uniroot(minfn, c(1e-20,9e20), i)))
   user  system elapsed
  93.74    0.09   94.24

Hence (although my prior assumption was that the overhead would be big also for other frameworks than foreach) it scales perfectly well with snow, perhaps you have to use foreach in a different way?

Best,
Uwe Ligges

>
> regards,
>
> /iaw
>
>
> 2011/7/2 Uwe Ligges:
>>
>>
>> On 02.07.2011 19:32, ivo welch wrote:
>>>
>>> dear R experts---
>>>
>>> I am experimenting with multicore processing, so far with pretty
>>> disappointing results.
Here is my simple example: >>> >>> A<- 100000 >>> randvalues<- abs(rnorm(A)) >>> minfn<- function( x, i ) { log(abs(x))+x^3+i/A+randvalues[i] } ## an >>> arbitrary function >>> >>> ARGV<- commandArgs(trailingOnly=TRUE) >>> >>> if (ARGV[1] == "do-onecore") { >>> library(foreach) >>> discard<- foreach(i = 1:A) %do% uniroot( minfn, c(1e-20,9e20), i ) } >>> else >>> if (ARGV[1] == "do-multicore") { >>> library(doMC) >>> registerDoMC() >>> cat("You have", getDoParWorkers(), "cores\n") >>> discard<- foreach(i = 1:A) %dopar% uniroot( minfn, c(1e-20,9e20), i ) } >>> else >>> if (ARGV[1] == "plain") >>> for (i in 1:A) discard<- uniroot( minfn, c(1e-20,9e20), i ) else >>> cat("sorry, but argument", ARGV[1], "is not >>> plain|do-onecore|do-multicore\n") >>> >>> >>> on my Mac Pro 3,1 (2 quad-cores), R 2.12.0, which reports 8 cores, >>> >>> "plain" takes about 68 seconds (real and user, using the unix timing >>> function). >>> "do-onecore" takes about 300 seconds. >>> "do-multicore" takes about 210 seconds real, (300 seconds user). >>> >>> this seems pretty disappointing. the cores are not used for the most >>> part, either. feedback appreciated. >> >> >> Feedback is that a single computation within your foreach loop is so quick >> that the overhead of communicating data and results between processes costs >> more time than the actual evaluation, hence you are faster with a single >> process. >> >> What you should do is: >> >> write code that does, e.g., 10000 iterations within 10 other iterations and >> just do a foreach loop around the outer 10. Then you will probably be much >> faster (without testing). But this is essentially the example I am using for >> teaching to show when not to do parallel processing..... 
>> >> Best, >> Uwe Ligges >> >> >> >> >> >> >>> /iaw >>> >>> >>> ---- >>> Ivo Welch (ivo.welch at gmail.com) >>> >>> ______________________________________________ >>> R-help at r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide >>> http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >> From ivo.welch at gmail.com Sat Jul 2 20:35:08 2011 From: ivo.welch at gmail.com (ivo welch) Date: Sat, 2 Jul 2011 11:35:08 -0700 Subject: [R] Speed Advice for R --- avoid data frames Message-ID: This email is intended for R users that are not that familiar with R internals and are searching google about how to speed up R. Despite common misperception, R is not slow when it comes to iterative access. R is fast when it comes to matrices. R is very slow when it comes to iterative access into data frames. Such access occurs when a user uses "data$varname[index]", which is a very common operation. To illustrate, run the following program: R <- 1000; C <- 1000 example <- function(m) { cat("rows: "); cat(system.time( for (r in 1:R) m[r,20] <- sqrt(abs(m[r,20])) + rnorm(1) ), "\n") cat("columns: "); cat(system.time(for (c in 1:C) m[20,c] <- sqrt(abs(m[20,c])) + rnorm(1)), "\n") if (is.data.frame(m)) { cat("df: columns as names: "); cat(system.time(for (c in 1:C) m[[c]][20] <- sqrt(abs(m[[c]][20])) + rnorm(1)), "\n") } } cat("\n**** Now as matrix\n") example( matrix( rnorm(C*R), nrow=R ) ) cat("\n**** Now as data frame\n") example( as.data.frame( matrix( rnorm(C*R), nrow=R ) ) ) The following are the reported timing under R 2.12.0 on a Mac Pro 3,1 with ample RAM: matrix, columns: 0.01s matrix, rows: 0.175s data frame, columns: 53s data frame, rows: 56s data frame, names: 58s Data frame access is about 5,000 times slower than matrix column access, and 300 times slower than matrix row access. R's data frame operational speed is an amazing 40 data accesses per seconds. 
I have not seen access numbers this low for decades. How to avoid it? Not easy. One way is to create multiple matrices, and group them as an object. Of course, this loses a lot of features of R. Another way is to copy all data used in calculations out of the data frame into a matrix, do the operations, and then copy them back. Not ideal, either. In my opinion, this is an R design flaw. Data frames are the fundamental unit of much statistical analysis, and should be fast. I think R lacks any indexing into data frames. Turning on indexing of data frames should at least be an optional feature. I hope this message helps others. /iaw ---- Ivo Welch (ivo.welch at gmail.com) http://www.ivo-welch.info/ From ivo.welch at gmail.com Sat Jul 2 20:42:07 2011 From: ivo.welch at gmail.com (ivo welch) Date: Sat, 2 Jul 2011 11:42:07 -0700 Subject: [R] %dopar% parallel processing experiment In-Reply-To: <4E0F6248.3030109@statistik.tu-dortmund.de> References: <4E0F5A69.2010603@statistik.tu-dortmund.de> <4E0F6248.3030109@statistik.tu-dortmund.de> Message-ID: hi uwe--I did not know what snow was. from my 1 minute reading, it seems like a much more involved setup that is much more flexible after the setup cost has been incurred (specifically, allowing use of many machines). the attractiveness of the doMC/foreach framework is its simplicity of installation and use. but if I understand what you are telling me, you are using a different parallelization framework, and it shows that my example is completed a lot faster using this different parallelization framework. correct? if so, the problem is my use of the doMC framework, not the inherent cost of dealing with multiple processes. is this interpretation correct? regards, /iaw ---- Ivo Welch (ivo.welch at gmail.com) http://www.ivo-welch.info/ 2011/7/2 Uwe Ligges : > > > On 02.07.2011 20:04, ivo welch wrote: >> >> thank you, uwe. this is a little disappointing.
?parallel processing >> for embarrassingly simple parallel operations--those needing no >> communication---should be feasible if the thread is not always created >> and released, but held. ?is there light-weight parallel processing >> that could facilitate this? > > Hmmm, now that you asked I checked it myself using snow: > > On a some years old 2-core AMD64 machine with R-2.13.0 and snow (using SOCK > clsuters, i.e. slow communication) I get: > > > >> system.time(parSapply(cl, 1:A, function(i) uniroot(minfn, c(1e-20,9e20), >> i))) > ? user ?system elapsed > ? 3.10 ? ?0.19 ? 51.43 > > while on a single core without parallelization framework: > >> system.time(sapply(1:A, function(i) uniroot(minfn, c(1e-20,9e20), i))) > ? user ?system elapsed > ?93.74 ? ?0.09 ? 94.24 > > Hence (although my prior assumption was that the overhead would be big also > for other frameworks than foreach) it scales perfectly well with snow, > perhaps you have to use foreach in a different way? > > Best, > Uwe Ligges > > > > > >> >> regards, >> >> /iaw >> >> >> 2011/7/2 Uwe Ligges: >>> >>> >>> On 02.07.2011 19:32, ivo welch wrote: >>>> >>>> dear R experts--- >>>> >>>> I am experimenting with multicore processing, so far with pretty >>>> disappointing results. ?Here is my simple example: >>>> >>>> A<- 100000 >>>> randvalues<- abs(rnorm(A)) >>>> minfn<- function( x, i ) { log(abs(x))+x^3+i/A+randvalues[i] } ?## an >>>> arbitrary function >>>> >>>> ARGV<- commandArgs(trailingOnly=TRUE) >>>> >>>> if (ARGV[1] == "do-onecore") { >>>> ? ?library(foreach) >>>> ? ?discard<- foreach(i = 1:A) %do% uniroot( minfn, c(1e-20,9e20), i ) } >>>> else >>>> if (ARGV[1] == "do-multicore") { >>>> ? ?library(doMC) >>>> ? ?registerDoMC() >>>> ? ?cat("You have", getDoParWorkers(), "cores\n") >>>> ? ?discard<- foreach(i = 1:A) %dopar% uniroot( minfn, c(1e-20,9e20), i ) >>>> } >>>> else >>>> if (ARGV[1] == "plain") >>>> ? 
?for (i in 1:A) discard<- uniroot( minfn, c(1e-20,9e20), i ) else >>>> cat("sorry, but argument", ARGV[1], "is not >>>> plain|do-onecore|do-multicore\n") >>>> >>>> >>>> on my Mac Pro 3,1 (2 quad-cores), R 2.12.0, which reports 8 cores, >>>> >>>> ? "plain" takes about 68 seconds (real and user, using the unix timing >>>> function). >>>> ? "do-onecore" takes about 300 seconds. >>>> ? "do-multicore" takes about 210 seconds real, (300 seconds user). >>>> >>>> this seems pretty disappointing. ?the cores are not used for the most >>>> part, either. ?feedback appreciated. >>> >>> >>> Feedback is that a single computation within your foreach loop is so >>> quick >>> that the overhead of communicating data and results between processes >>> costs >>> more time than the actual evaluation, hence you are faster with a single >>> process. >>> >>> What you should do is: >>> >>> write code that does, e.g., 10000 iterations within 10 other iterations >>> and >>> just do a foreach loop around the outer 10. Then you will probably be >>> much >>> faster (without testing). But this is essentially the example I am using >>> for >>> teaching to show when not to do parallel processing..... >>> >>> Best, >>> Uwe Ligges >>> >>> >>> >>> >>> >>> >>>> /iaw >>>> >>>> >>>> ---- >>>> Ivo Welch (ivo.welch at gmail.com) >>>> >>>> ______________________________________________ >>>> R-help at r-project.org mailing list >>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>> PLEASE do read the posting guide >>>> http://www.R-project.org/posting-guide.html >>>> and provide commented, minimal, self-contained, reproducible code. 
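The snow timing quoted in this thread does not show how the cluster object `cl` was created. A hedged sketch of what such a setup might look like, assuming the base `parallel` package (which absorbed most of snow's socket-cluster interface in R 2.14.0) and a smaller A than in the original post; the actual setup used for the quoted timing is not given in the thread:

```r
library(parallel)   # assumption: used here in place of the snow package

A <- 2000           # smaller than in the post, to keep the sketch quick
randvalues <- abs(rnorm(A))
minfn <- function(x, i) log(abs(x)) + x^3 + i / A + randvalues[i]

cl <- makeCluster(2)    # two local socket workers

# Socket workers start as empty R sessions, so everything minfn needs must be sent over.
clusterExport(cl, c("minfn", "A", "randvalues"), envir = environment())

roots <- parSapply(cl, seq_len(A),
                   function(i) uniroot(minfn, c(1e-20, 9e20), i)$root)

stopCluster(cl)
```

As far as I know, `parSapply()` splits the index vector into one block per worker before dispatching, which would be consistent with the low overhead reported for snow in the quoted timing.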
>>> > From ligges at statistik.tu-dortmund.de Sat Jul 2 20:46:50 2011 From: ligges at statistik.tu-dortmund.de (Uwe Ligges) Date: Sat, 2 Jul 2011 20:46:50 +0200 Subject: [R] %dopar% parallel processing experiment In-Reply-To: References: <4E0F5A69.2010603@statistik.tu-dortmund.de> <4E0F6248.3030109@statistik.tu-dortmund.de> Message-ID: <4E0F679A.8020206@statistik.tu-dortmund.de> On 02.07.2011 20:42, ivo welch wrote: > hi uwe--I did not know what snow was. from my 1 minute reading, it > seems like a much more involved setup that is much more flexible after > the setup cost has been incurred (specifically, allowing use of many > machines). > > the attractiveness of the doMC/foreach framework is its simplicity of > installation and use. > > but if I understand what you are telling me, you are using a different > parallelization framework, and it shows that my example is completed a > lot faster using this different parallelization framework. correct? > if so, the problem is my use of the doMC framework, not the inherent > cost of dealing with multiple processes. is this interpretation > correct? Indeed. Uwe > regards, > > /iaw > > ---- > Ivo Welch (ivo.welch at gmail.com) > http://www.ivo-welch.info/ > > > 2011/7/2 Uwe Ligges: >> >> >> On 02.07.2011 20:04, ivo welch wrote: >>> >>> thank you, uwe. this is a little disappointing. parallel processing >>> for embarrassingly simple parallel operations--those needing no >>> communication---should be feasible if the thread is not always created >>> and released, but held. is there light-weight parallel processing >>> that could facilitate this? >> >> Hmmm, now that you asked I checked it myself using snow: >> >> On a some years old 2-core AMD64 machine with R-2.13.0 and snow (using SOCK >> clsuters, i.e. 
slow communication) I get: >> >> >> >>> system.time(parSapply(cl, 1:A, function(i) uniroot(minfn, c(1e-20,9e20), >>> i))) >> user system elapsed >> 3.10 0.19 51.43 >> >> while on a single core without parallelization framework: >> >>> system.time(sapply(1:A, function(i) uniroot(minfn, c(1e-20,9e20), i))) >> user system elapsed >> 93.74 0.09 94.24 >> >> Hence (although my prior assumption was that the overhead would be big also >> for other frameworks than foreach) it scales perfectly well with snow, >> perhaps you have to use foreach in a different way? >> >> Best, >> Uwe Ligges >> >> >> >> >> >>> >>> regards, >>> >>> /iaw >>> >>> >>> 2011/7/2 Uwe Ligges: >>>> >>>> >>>> On 02.07.2011 19:32, ivo welch wrote: >>>>> >>>>> dear R experts--- >>>>> >>>>> I am experimenting with multicore processing, so far with pretty >>>>> disappointing results. Here is my simple example: >>>>> >>>>> A<- 100000 >>>>> randvalues<- abs(rnorm(A)) >>>>> minfn<- function( x, i ) { log(abs(x))+x^3+i/A+randvalues[i] } ## an >>>>> arbitrary function >>>>> >>>>> ARGV<- commandArgs(trailingOnly=TRUE) >>>>> >>>>> if (ARGV[1] == "do-onecore") { >>>>> library(foreach) >>>>> discard<- foreach(i = 1:A) %do% uniroot( minfn, c(1e-20,9e20), i ) } >>>>> else >>>>> if (ARGV[1] == "do-multicore") { >>>>> library(doMC) >>>>> registerDoMC() >>>>> cat("You have", getDoParWorkers(), "cores\n") >>>>> discard<- foreach(i = 1:A) %dopar% uniroot( minfn, c(1e-20,9e20), i ) >>>>> } >>>>> else >>>>> if (ARGV[1] == "plain") >>>>> for (i in 1:A) discard<- uniroot( minfn, c(1e-20,9e20), i ) else >>>>> cat("sorry, but argument", ARGV[1], "is not >>>>> plain|do-onecore|do-multicore\n") >>>>> >>>>> >>>>> on my Mac Pro 3,1 (2 quad-cores), R 2.12.0, which reports 8 cores, >>>>> >>>>> "plain" takes about 68 seconds (real and user, using the unix timing >>>>> function). >>>>> "do-onecore" takes about 300 seconds. >>>>> "do-multicore" takes about 210 seconds real, (300 seconds user). >>>>> >>>>> this seems pretty disappointing. 
the cores are not used for the most >>>>> part, either. feedback appreciated. >>>> >>>> >>>> Feedback is that a single computation within your foreach loop is so >>>> quick >>>> that the overhead of communicating data and results between processes >>>> costs >>>> more time than the actual evaluation, hence you are faster with a single >>>> process. >>>> >>>> What you should do is: >>>> >>>> write code that does, e.g., 10000 iterations within 10 other iterations >>>> and >>>> just do a foreach loop around the outer 10. Then you will probably be >>>> much >>>> faster (without testing). But this is essentially the example I am using >>>> for >>>> teaching to show when not to do parallel processing..... >>>> >>>> Best, >>>> Uwe Ligges >>>> >>>> >>>> >>>> >>>> >>>> >>>>> /iaw >>>>> >>>>> >>>>> ---- >>>>> Ivo Welch (ivo.welch at gmail.com) >>>>> >>>>> ______________________________________________ >>>>> R-help at r-project.org mailing list >>>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>>> PLEASE do read the posting guide >>>>> http://www.R-project.org/posting-guide.html >>>>> and provide commented, minimal, self-contained, reproducible code. >>>> >> > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From daniel at umd.edu Sat Jul 2 20:51:06 2011 From: daniel at umd.edu (Daniel Malter) Date: Sat, 2 Jul 2011 11:51:06 -0700 (PDT) Subject: [R] Repeating a function in R In-Reply-To: <1309613295639-3640508.post@n4.nabble.com> References: <1309613295639-3640508.post@n4.nabble.com> Message-ID: <1309632666024-3640966.post@n4.nabble.com> You can just tell the function to create 1000 random numbers. See ?runif for the specifics. The arguments are n, min, and max. 'n' is the one you are looking for. Da. 
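For example (a small sketch; the seed and the bounds 0 and 1 are arbitrary choices, not from the original question):

```r
set.seed(42)                                  # only to make the draws reproducible
draws <- runif(n = 1000, min = 0, max = 1)    # 1000 uniform draws in one call
length(draws)
range(draws)                                  # stays within [min, max]
```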
-- View this message in context: http://r.789695.n4.nabble.com/Repeating-a-function-in-R-tp3640508p3640966.html Sent from the R help mailing list archive at Nabble.com. From daniel at umd.edu Sat Jul 2 20:57:23 2011 From: daniel at umd.edu (Daniel Malter) Date: Sat, 2 Jul 2011 11:57:23 -0700 (PDT) Subject: [R] Repeating a function in R In-Reply-To: <1309613295639-3640508.post@n4.nabble.com> References: <1309613295639-3640508.post@n4.nabble.com> Message-ID: <1309633043070-3640982.post@n4.nabble.com> If you want to repeat an entire function, use replicate as in replicate(15,sapply(1,function(x) runif(x))) Here, sapply(1,function(x) runif(x)) draws one uniformly distributed variable. replicate(15,...) is the wrapper function that tells R to do this 15 times. The benefit here is that you can replicate a more complex function. However, if you just need 1000 random draws, I would use the easier approach I suggested above. Best, Daniel -- View this message in context: http://r.789695.n4.nabble.com/Repeating-a-function-in-R-tp3640508p3640982.html Sent from the R help mailing list archive at Nabble.com. From ligges at statistik.tu-dortmund.de Sat Jul 2 20:58:45 2011 From: ligges at statistik.tu-dortmund.de (Uwe Ligges) Date: Sat, 2 Jul 2011 20:58:45 +0200 Subject: [R] Speed Advice for R --- avoid data frames In-Reply-To: References: Message-ID: <4E0F6A65.8080101@statistik.tu-dortmund.de> Some comments: the comparison of matrix rows vs. matrix columns is incorrect: Note that R has lazy evaluation, hence you construct your matrix in the timing for the rows and it is already constructed in the timing for the columns, hence you want to use: M <- matrix( rnorm(C*R), nrow=R ) D <- as.data.frame(matrix( rnorm(C*R), nrow=R ) ) example(M) example(D) Further on, you are correct with your statement that data.frame indexing is much slower, but if you can store your data in matrix form, just go on as it is. I doubt anybody is really going to make the index operation you cited within a loop.
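The lazy-evaluation point can be checked without any timing. In this sketch, `make_matrix`, `probe` and the flag `built` are made-up names for illustration; the flag records when the argument expression is actually run.

```r
built <- FALSE

# Building the matrix flips the flag, so we can see when the argument is evaluated.
make_matrix <- function() {
  built <<- TRUE
  matrix(rnorm(4), nrow = 2)
}

probe <- function(m) {
  before <- built    # still FALSE: the argument is an unevaluated promise
  m[1, 1]            # first use of m forces the promise, i.e. builds the matrix
  after <- built     # now TRUE
  c(before = before, after = after)
}

res <- probe(make_matrix())
res
```

So whichever timed statement touches `m` first inside the function also pays for constructing it, which is why building the object before the call gives a fairer comparison.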
Then, with a data.frame, I can live with many vectorized replacements again: > system.time(D[,20] <- sqrt(abs(D[,20])) + rnorm(1000)) user system elapsed 0.01 0.00 0.01 > system.time(D[20,] <- sqrt(abs(D[20,])) + rnorm(1000)) user system elapsed 0.51 0.00 0.52 OK, it would be nice to do that faster, but this is not easy. I think R Core is happy to see contributions to make it faster without breaking existing features. Best wishes, Uwe On 02.07.2011 20:35, ivo welch wrote: > This email is intended for R users that are not that familiar with R > internals and are searching google about how to speed up R. > > Despite common misperception, R is not slow when it comes to iterative > access. R is fast when it comes to matrices. R is very slow when it > comes to iterative access into data frames. Such access occurs when a > user uses "data$varname[index]", which is a very common operation. To > illustrate, run the following program: > > R<- 1000; C<- 1000 > > example<- function(m) { > cat("rows: "); cat(system.time( for (r in 1:R) m[r,20]<- > sqrt(abs(m[r,20])) + rnorm(1) ), "\n") > cat("columns: "); cat(system.time(for (c in 1:C) m[20,c]<- > sqrt(abs(m[20,c])) + rnorm(1)), "\n") > if (is.data.frame(m)) { cat("df: columns as names: "); > cat(system.time(for (c in 1:C) m[[c]][20]<- sqrt(abs(m[[c]][20])) + > rnorm(1)), "\n") } > } > > cat("\n**** Now as matrix\n") > example( matrix( rnorm(C*R), nrow=R ) ) > > cat("\n**** Now as data frame\n") > example( as.data.frame( matrix( rnorm(C*R), nrow=R ) ) ) > > > The following are the reported timing under R 2.12.0 on a Mac Pro 3,1 > with ample RAM: > > matrix, columns: 0.01s > matrix, rows: 0.175s > data frame, columns: 53s > data frame, rows: 56s > data frame, names: 58s > > Data frame access is about 5,000 times slower than matrix column > access, and 300 times slower than matrix row access. R's data frame > operational speed is an amazing 40 data accesses per seconds. I have > not seen access numbers this low for decades. 
> > > How to avoid it? Not easy. One way is to create multiple matrices, > and group them as an object. of course, this loses a lot of features > of R. Another way is to copy all data used in calculations out of the > data frame into a matrix, do the operations, and then copy them back. > not ideal, either. > > In my opinion, this is an R design flow. Data frames are the > fundamental unit of much statistical analysis, and should be fast. I > think R lacks any indexing into data frames. Turning on indexing of > data frames should at least be an optional feature. > > > I hope this message post helps others. > > /iaw > > ---- > Ivo Welch (ivo.welch at gmail.com) > http://www.ivo-welch.info/ > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From ligges at statistik.tu-dortmund.de Sat Jul 2 21:00:38 2011 From: ligges at statistik.tu-dortmund.de (Uwe Ligges) Date: Sat, 02 Jul 2011 21:00:38 +0200 Subject: [R] Repeating a function in R In-Reply-To: <1309632666024-3640966.post@n4.nabble.com> References: <1309613295639-3640508.post@n4.nabble.com> <1309632666024-3640966.post@n4.nabble.com> Message-ID: <4E0F6AD6.50303@statistik.tu-dortmund.de> On 02.07.2011 20:51, Daniel Malter wrote: > You can just tell the function to create 1000 random numbers. See ?runif for > the specifics. The arguments are n, min, and max. 'n' is the one you are > looking for. Thanks for providing help on R-help, but for the future please - respond to the OP rather than to the list only - cite the OP's message so that anybody else on this mailing list understand the context of your message. Best, Uwe Ligges > Da. 
> > -- > View this message in context: http://r.789695.n4.nabble.com/Repeating-a-function-in-R-tp3640508p3640966.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From djmuser at gmail.com Sat Jul 2 21:22:30 2011 From: djmuser at gmail.com (Dennis Murphy) Date: Sat, 2 Jul 2011 12:22:30 -0700 Subject: [R] For help in R coding In-Reply-To: <383EF813-24EA-42F3-B514-AB9CF8060BAC@comcast.net> References: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE0@KCL-MAIL01.kclad.ds.kcl.ac.uk> <813856BE-5DD3-4708-B1FA-2611EBAD8C61@comcast.net> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE1@KCL-MAIL01.kclad.ds.kcl.ac.uk> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE3@KCL-MAIL01.kclad.ds.kcl.ac.uk> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE5@KCL-MAIL01.kclad.ds.kcl.ac.uk> <383EF813-24EA-42F3-B514-AB9CF8060BAC@comcast.net> Message-ID: Hi: There seems to be a problem if the string ends in , or . , which makes it difficult for strsplit() to pick up if it is splitting on those characters. 
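A small illustration of that trailing-delimiter behaviour (a sketch with made-up strings, not the pileup data from the thread): `strsplit()` produces an empty first piece for a leading delimiter but no empty last piece for a trailing one, so "number of pieces minus one" undercounts in the trailing case.

```r
trailing <- strsplit("a,b,", ",", fixed = TRUE)[[1]]   # "a" "b"
leading  <- strsplit(",a,b", ",", fixed = TRUE)[[1]]   # ""  "a" "b"

length(trailing) - 1   # 1, although the string holds two commas
length(leading) - 1    # 2, which is the correct count
```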
Here is an alternative, splitting on individual characters and using charmatch() instead: charsum <- function(s, char) { u <- strsplit(s, "") sum(sapply(u, function(x) charmatch(x, char)), na.rm = TRUE) } unname(sapply(txtvec, function(x) charsum(x, ','))) unname(sapply(txtvec, function(x) charsum(x, '.'))) Putting this into a data frame, dfout <- data.frame(periods = unname(sapply(txtvec, function(x) charsum(x, '.'))), commas = unname(sapply(txtvec, function(x) charsum(x, ','))) ) txtvec HTH, Dennis On Sat, Jul 2, 2011 at 10:19 AM, David Winsemius wrote: > > On Jul 2, 2011, at 12:34 PM, Bansal, Vikas wrote: > >> >> >>>> Dear all, >>>> >>>> I am doing a project on variant calling using R.I am working on >>>> pileup file.There are 10 columns in my data frame and I want to >>>> count the number of A,C,G and T in each row for column 9.example of >>>> column 9 is given below- >>>> >>>> .a,g,, >>>> .t,t,, >>>> .,c,c, >>>> .,a,,, >>>> .,t,t,t >>>> .c,,g,^!. >>>> .g,ggg.^!, >>>> .$,,,,,., >>>> a,g,,t, >>>> ,,,,,.,^!. >>>> ,$,,,,.,. >>>> >>>> This is a bit confusing for me as these characters are in one column >>>> and how can we scan them for each row to print number of A,C,G and T >>>> for each row. >>> >>> Seems a bit clunky but this does the job (first the data): >>>> >>>> txt <- " .a,g,, >>> >>> + .t,t,, >>> + .,c,c, >>> + .,a,,, >>> + .,t,t,t >>> + .c,,g,^!. >>> + .g,ggg.^!, >>> + .$,,,,,., >>> + a,g,,t, >>> + ,,,,,.,^!. >>> + ,$,,,,.,." >>> >>>> txtvec <- readLines(textConnection(txt)) >>> >>> Now the clunky solution, Basically subtracts 1 from the counts of >>> "fragments" that result from splitting on each letter in turn. Could >>> be made prettier with a function that did the job.
>>> >>>> data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit, >>> >>> split="a"), length) , "-", 1)), >>> + C = unlist(lapply( lapply( sapply(txtvec, strsplit, split="c"), >>> length) , "-", 1)), >>> + G = unlist(lapply( lapply( sapply(txtvec, strsplit, split="g"), >>> length) , "-", 1)), >>> + T = unlist(lapply( lapply( sapply(txtvec, strsplit, split="t"), >>> length) , "-", 1)) ) >>> ? ? ? ? ? ? ? ? ? ? A C G T >>> .a,g,, ? ? ? ? ? ? ? 1 0 1 0 >>> ? ? ? ? ?.t,t,, ? ? 0 0 0 2 >>> ? ? ? ? ?.,c,c, ? ? 0 2 0 0 >>> ? ? ? ? ?.,a,,, ? ? 1 0 0 0 >>> ? ? ? ? ?.,t,t,t ? ?0 0 0 2 >>> ? ? ? ? ?.c,,g,^!. ?0 1 1 0 >>> ? ? ? ? ?.g,ggg.^!, 0 0 4 0 >>> ? ? ? ? ?.$,,,,,., ?0 0 0 0 >>> ? ? ? ? ?a,g,,t, ? ?1 0 1 1 >>> ? ? ? ? ?,,,,,.,^!. 0 0 0 0 >>> ? ? ? ? ?,$,,,,.,. ?0 0 0 0 >>> >>> Has the advantage that the input data ends up as rownames, which was a >>> surprise. >>> >>> If you wanted to count "A" and "a" as equivalent, then the split >>> argument should be "a|A" >>> >>> >> >>>> AS YOU MENTIONED THAT IF I WANT TO COUNT A AND a I SHOULD SPLIT LIKE >>>> THIS. >> >> BUT CAN I COUNT . AND , ALSO USING- >> data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit, >> split=".|,"), length) , "-", 1)), >> >> I TRIED IT BUT ITS NOT WORKING.IT IS GIVING THE OUTPUT BUT AT SOME PLACES >> IT IS SHOWING MORE NUMBER OF . AND , AND SOMEWHERE IT IS NOT EVEN >> CALCULATING AND JUST SHOWING 0. > > You need to use valid regex expressions for 'split'. Since "." and "," are > special characters they need to be escaped when you wnat the literals to be > recognized as such. > > I haven't figured out why but you need to drop the final operation of > subtracting 1 from the values when counting commas: > > data.frame(periods = unlist(lapply( lapply( sapply(txtvec, strsplit, > ? ? ? ? ? ? ? ? ? ? ? ? ? ? split="\\."), length) , "-", 1)) > ?,commas = unlist( lapply( sapply(txtvec, strsplit, > ? ? ? ? ? ? ? ? ? ? ? ? ? ? split="\\,"), length) ) ) > ? ? ? ? ? ? ? ? ? ? ? 
periods commas > ?.a,g,, ? ? ? ? ? ? ? ? ? ? ?1 ? ? ?3 > ? ? ? ? ? ?.t,t,, ? ? ? ? ? 1 ? ? ?3 > ? ? ? ? ? ?.,c,c, ? ? ? ? ? 1 ? ? ?3 > ? ? ? ? ? ?.,a,,, ? ? ? ? ? 1 ? ? ?4 > ? ? ? ? ? ?.,t,t,t ? ? ? ? ?1 ? ? ?4 > ? ? ? ? ? ?.c,,g,^!. ? ? ? ?1 ? ? ?4 > ? ? ? ? ? ?.g,ggg.^!, ? ? ? 2 ? ? ?2 > ? ? ? ? ? ?.$,,,,,., ? ? ? ?2 ? ? ?6 > ? ? ? ? ? ?a,g,,t, ? ? ? ? ?0 ? ? ?4 > ? ? ? ? ? ?,,,,,.,^!. ? ? ? 1 ? ? ?7 > ? ? ? ? ? ?,$,,,,.,. ? ? ? ?1 ? ? ?7 > > -- > > David Winsemius, MD > West Hartford, CT > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From ivo.welch at gmail.com Sat Jul 2 21:35:29 2011 From: ivo.welch at gmail.com (ivo welch) Date: Sat, 2 Jul 2011 12:35:29 -0700 Subject: [R] Speed Advice for R --- avoid data frames In-Reply-To: <4E0F6A65.8080101@statistik.tu-dortmund.de> References: <4E0F6A65.8080101@statistik.tu-dortmund.de> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From dwinsemius at comcast.net Sat Jul 2 22:04:18 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Sat, 2 Jul 2011 16:04:18 -0400 Subject: [R] For help in R coding In-Reply-To: References: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE0@KCL-MAIL01.kclad.ds.kcl.ac.uk> <813856BE-5DD3-4708-B1FA-2611EBAD8C61@comcast.net> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE1@KCL-MAIL01.kclad.ds.kcl.ac.uk> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE3@KCL-MAIL01.kclad.ds.kcl.ac.uk> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE5@KCL-MAIL01.kclad.ds.kcl.ac.uk> <383EF813-24EA-42F3-B514-AB9CF8060BAC@comcast.net> Message-ID: <676B7003-AFFE-4CF1-9CC8-FB1D6953B761@comcast.net> On reflection and a bit of testing I think the best approach would be to use gregexpr. 
For counting the number of commas, this appears quite straightforward. > sapply(gregexpr("\\,", txtvec), function(x) if ( x[[1]] != -1) length(x) else 0 ) [1] 3 3 3 4 3 3 2 6 4 6 6 It easily generalizes to period and the `|` (or) operation on letters. (I did need to add the check, since each element of the gregexpr() result always has length at least one but has the value -1 when there is no match.) > sapply(gregexpr("t|T", txtvec), function(x) if ( x[[1]] != -1) length(x) else 0 ) [1] 0 2 0 0 3 0 0 0 1 0 0 On Jul 2, 2011, at 3:22 PM, Dennis Murphy wrote: > Hi: > > There seems to be a problem if the string ends in , or . , which makes > it difficult for strsplit() to pick up if it is splitting on those > characters. Here is an alternative, splitting on individual characters > and using charmatch() instead: > > charsum <- function(s, char) { > u <- strsplit(s, "") > sum(sapply(u, function(x) charmatch(x, char)), na.rm = TRUE) > } > > unname(sapply(txtvec, function(x) charsum(x, ','))) > unname(sapply(txtvec, function(x) charsum(x, '.'))) > > Putting this into a data frame, > > dfout <- data.frame(periods = unname(sapply(txtvec, function(x) > charsum(x, '.'))), > commas = unname(sapply(txtvec, > function(x) charsum(x, ','))) ) > txtvec > > HTH, > Dennis > > On Sat, Jul 2, 2011 at 10:19 AM, David Winsemius > wrote: >> >> On Jul 2, 2011, at 12:34 PM, Bansal, Vikas wrote: >> >>> >>> >>>>> Dear all, >>>>> >>>>> I am doing a project on variant calling using R.I am working on >>>>> pileup file.There are 10 columns in my data frame and I want to >>>>> count the number of A,C,G and T in each row for column 9.example >>>>> of >>>>> column 9 is given below- >>>>> >>>>> .a,g,, >>>>> .t,t,, >>>>> .,c,c, >>>>> .,a,,, >>>>> .,t,t,t >>>>> .c,,g,^!. >>>>> .g,ggg.^!, >>>>> .$,,,,,., >>>>> a,g,,t, >>>>> ,,,,,.,^!. >>>>> ,$,,,,.,. >>>>> >>>>> This is a bit confusing for me as these characters are in one >>>>> column >>>>> and how can we scan them for each row to print number of A,C,G >>>>> and T >>>>> for each row.
>>>> >>>> Seems a bit clunky but this does the job (first the data): >>>>> >>>>> txt <- " .a,g,, >>>> >>>> + .t,t,, >>>> + .,c,c, >>>> + .,a,,, >>>> + .,t,t,t >>>> + .c,,g,^!. >>>> + .g,ggg.^!, >>>> + .$,,,,,., >>>> + a,g,,t, >>>> + ,,,,,.,^!. >>>> + ,$,,,,.,." >>>> >>>>> txtvec <- readLines(textConnection(txt)) >>>> >>>> Now the clunky solution, Basically subtracts 1 from the counts of >>>> "fragments" that result from splitting on each letter in turn. >>>> Could >>>> be made prettier with a function that did the job. >>>> >>>>> data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit, >>>> >>>> split="a"), length) , "-", 1)), >>>> + C = unlist(lapply( lapply( sapply(txtvec, strsplit, split="c"), >>>> length) , "-", 1)), >>>> + G = unlist(lapply( lapply( sapply(txtvec, strsplit, split="g"), >>>> length) , "-", 1)), >>>> + T = unlist(lapply( lapply( sapply(txtvec, strsplit, split="t"), >>>> length) , "-", 1)) ) >>>> A C G T >>>> .a,g,, 1 0 1 0 >>>> .t,t,, 0 0 0 2 >>>> .,c,c, 0 2 0 0 >>>> .,a,,, 1 0 0 0 >>>> .,t,t,t 0 0 0 2 >>>> .c,,g,^!. 0 1 1 0 >>>> .g,ggg.^!, 0 0 4 0 >>>> .$,,,,,., 0 0 0 0 >>>> a,g,,t, 1 0 1 1 >>>> ,,,,,.,^!. 0 0 0 0 >>>> ,$,,,,.,. 0 0 0 0 >>>> >>>> Has the advantage that the input data ends up as rownames, which >>>> was a >>>> surprise. >>>> >>>> If you wanted to count "A" and "a" as equivalent, then the split >>>> argument should be "a|A" >>>> >>>> >>> >>>>> AS YOU MENTIONED THAT IF I WANT TO COUNT A AND a I SHOULD SPLIT >>>>> LIKE >>>>> THIS. >>> >>> BUT CAN I COUNT . AND , ALSO USING- >>> data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit, >>> split=".|,"), length) , "-", 1)), >>> >>> I TRIED IT BUT ITS NOT WORKING.IT IS GIVING THE OUTPUT BUT AT SOME >>> PLACES >>> IT IS SHOWING MORE NUMBER OF . AND , AND SOMEWHERE IT IS NOT EVEN >>> CALCULATING AND JUST SHOWING 0. >> >> You need to use valid regex expressions for 'split'. Since "." 
and >> "," are >> special characters they need to be escaped when you wnat the >> literals to be >> recognized as such. >> >> I haven't figured out why but you need to drop the final operation of >> subtracting 1 from the values when counting commas: >> >> data.frame(periods = unlist(lapply( lapply( sapply(txtvec, strsplit, >> split="\\."), length) , "-", 1)) >> ,commas = unlist( lapply( sapply(txtvec, strsplit, >> split="\\,"), length) ) ) >> periods commas >> .a,g,, 1 3 >> .t,t,, 1 3 >> .,c,c, 1 3 >> .,a,,, 1 4 >> .,t,t,t 1 4 >> .c,,g,^!. 1 4 >> .g,ggg.^!, 2 2 >> .$,,,,,., 2 6 >> a,g,,t, 0 4 >> ,,,,,.,^!. 1 7 >> ,$,,,,.,. 1 7 >> >> -- >> >> David Winsemius, MD >> West Hartford, CT >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> David Winsemius, MD West Hartford, CT From djmuser at gmail.com Sat Jul 2 22:04:52 2011 From: djmuser at gmail.com (Dennis Murphy) Date: Sat, 2 Jul 2011 13:04:52 -0700 Subject: [R] Vector of functions In-Reply-To: References: Message-ID: Hi, To amplify on Josh's cogent remarks, the reason you can't create a vector of functions is because vectors are composed of atomic objects and functions are not atomic objects: > is.atomic(f.3) [1] FALSE > is.atomic(1) [1] TRUE > is.atomic(TRUE) [1] TRUE > is.atomic('a') [1] TRUE 'Vectors' of functions are actually (and appropriately) assigned to lists. You can check the structure of any R object with str(): > fns <- c(f.1, f.2, f.3) > str(fns) ## Oops, R coerced your vector into a list List of 3 $ :function (a) ..- attr(*, "source")= chr [1:3] "function(a) {" ... $ :function (a) ..- attr(*, "source")= chr [1:3] "function(a) {" ... $ :function (a) ..- attr(*, "source")= chr [1:3] "function(a) {" ... 
> funs <- list(f.1, f.2, f.3) # clearer code > str(funs) List of 3 $ :function (a) ..- attr(*, "source")= chr [1:3] "function(a) {" ... $ :function (a) ..- attr(*, "source")= chr [1:3] "function(a) {" ... $ :function (a) ..- attr(*, "source")= chr [1:3] "function(a) {" ... > identical(fns, funs) # Are the two objects identical? [1] TRUE You can have all kinds of fun with (lists of) functions. I wanted to plot your second function: > curve(funs[[2]], -10, 10) Error in curve(funs[[2]], -10, 10) : 'expr' must be a function, call or an expression containing 'x' ## Oh yeah, the argument of the function that curve() needs is x. ## We don't need to rewrite the function, though: > curve(funs[[2]](x), -10, 10) Now one can verify that the function calls make sense: > funs[[2]](0) [1] 1 > funs[[2]](pi/2) [1] 0.1588316 > funs[[2]](2 * pi) [1] 1.394927e-05 > funs[[2]](Inf) [1] 0 If you're going to be doing a lot of work with functions, I'd suggest picking up a book on R programming. Fortunately, there's a good one auf Deutsch by Uwe Ligges: Programmieren mit R. http://www.statistik.tu-dortmund.de/~ligges/PmitR/ and several good ones in English. See the book list page at CRAN: http://www.r-project.org/doc/bib/R-books.html HTH, Dennis On Sat, Jul 2, 2011 at 9:36 AM, Thomas Stibor wrote: > Hi there, > > I have a vector of some functions e.g. > #-----------------------------# > f.1 <- function(a) { > ?return( a ); > } > f.2 <- function(a) { > ?return( 1 - (tanh(a))^2 ); > } > f.3 <- function(a) { > ?return( 1 / (1+exp(-a)) * (1 - (1 / (1+exp(-a)))) ); > } > > func.l <- c(f.1, f.2, f.3); > #-----------------------------# > > and would like to calculate the output value of a function in the > vector, that is, pick e.g. function at position 2 in func.l and calculate the > output value for a=42. > > func.l[2](42); # gives error > > f <- func.l[2]; > f(42); # gives error > > Is there an easy way to solve this problem? 
> > Cheers, > ?Thomas > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From mailinglist.honeypot at gmail.com Sat Jul 2 22:23:49 2011 From: mailinglist.honeypot at gmail.com (Steve Lianoglou) Date: Sat, 2 Jul 2011 16:23:49 -0400 Subject: [R] %dopar% parallel processing experiment In-Reply-To: References: <4E0F5A69.2010603@statistik.tu-dortmund.de> <4E0F6248.3030109@statistik.tu-dortmund.de> Message-ID: Here's another datapoint using the multicore package -- which is what the foreach/doMC combo uses internally: I halved your A value to 50,000 because I was getting impatient :-) A=50000 randvalues <- abs(rnorm(A)) minfn <- function( x, i ) { log(abs(x))+x^3+i/A+randvalues[i] } system.time(a1 <- lapply(1:A, function(i) uniroot(minfn, c(1e-20, 9e20), i))) user system elapsed 40.826 0.108 41.273 library(multicore) system.time(a2 <- mclapply(1:A, function(i) uniroot(minfn, c(1e-20, 9e20), i))) user system elapsed 21.218 0.980 23.473 This was on my 2-core laptop, so -- almost perfect (eg. 2x) speedup. The advantage over snow is that there's less "ceremony" to get easy ||-ization on a single machine, the cons is that it only ||-izes over 1 multicore machine (and I don't think it 100% works on windows (if it all)), whereas snow and foreach are more flexible in that regard. -steve On Sat, Jul 2, 2011 at 2:42 PM, ivo welch wrote: > hi uwe--I did not know what snow was. ?from my 1 minute reading, it > seems like a much more involved setup that is much more flexible after > the setup cost has been incurred (specifically, allowing use of many > machines). > > the attractiveness of the doMC/foreach framework is its simplicity of > installation and use. 
> > but if I understand what you are telling me, you are using a different > parallelization framework, and it shows that my example is completed a > lot faster using this different parallelization framework. ?correct? > if so, the problem is my use of the doMC framework, not the inherent > cost of dealing with multiple processes. ?is this interpretation > correct? > > regards, > > /iaw > > ---- > Ivo Welch (ivo.welch at gmail.com) > http://www.ivo-welch.info/ > > > 2011/7/2 Uwe Ligges : >> >> >> On 02.07.2011 20:04, ivo welch wrote: >>> >>> thank you, uwe. ?this is a little disappointing. ?parallel processing >>> for embarrassingly simple parallel operations--those needing no >>> communication---should be feasible if the thread is not always created >>> and released, but held. ?is there light-weight parallel processing >>> that could facilitate this? >> >> Hmmm, now that you asked I checked it myself using snow: >> >> On a some years old 2-core AMD64 machine with R-2.13.0 and snow (using SOCK >> clsuters, i.e. slow communication) I get: >> >> >> >>> system.time(parSapply(cl, 1:A, function(i) uniroot(minfn, c(1e-20,9e20), >>> i))) >> ? user ?system elapsed >> ? 3.10 ? ?0.19 ? 51.43 >> >> while on a single core without parallelization framework: >> >>> system.time(sapply(1:A, function(i) uniroot(minfn, c(1e-20,9e20), i))) >> ? user ?system elapsed >> ?93.74 ? ?0.09 ? 94.24 >> >> Hence (although my prior assumption was that the overhead would be big also >> for other frameworks than foreach) it scales perfectly well with snow, >> perhaps you have to use foreach in a different way? >> >> Best, >> Uwe Ligges >> >> >> >> >> >>> >>> regards, >>> >>> /iaw >>> >>> >>> 2011/7/2 Uwe Ligges: >>>> >>>> >>>> On 02.07.2011 19:32, ivo welch wrote: >>>>> >>>>> dear R experts--- >>>>> >>>>> I am experimenting with multicore processing, so far with pretty >>>>> disappointing results. 
?Here is my simple example: >>>>> >>>>> A<- 100000 >>>>> randvalues<- abs(rnorm(A)) >>>>> minfn<- function( x, i ) { log(abs(x))+x^3+i/A+randvalues[i] } ?## an >>>>> arbitrary function >>>>> >>>>> ARGV<- commandArgs(trailingOnly=TRUE) >>>>> >>>>> if (ARGV[1] == "do-onecore") { >>>>> ? ?library(foreach) >>>>> ? ?discard<- foreach(i = 1:A) %do% uniroot( minfn, c(1e-20,9e20), i ) } >>>>> else >>>>> if (ARGV[1] == "do-multicore") { >>>>> ? ?library(doMC) >>>>> ? ?registerDoMC() >>>>> ? ?cat("You have", getDoParWorkers(), "cores\n") >>>>> ? ?discard<- foreach(i = 1:A) %dopar% uniroot( minfn, c(1e-20,9e20), i ) >>>>> } >>>>> else >>>>> if (ARGV[1] == "plain") >>>>> ? ?for (i in 1:A) discard<- uniroot( minfn, c(1e-20,9e20), i ) else >>>>> cat("sorry, but argument", ARGV[1], "is not >>>>> plain|do-onecore|do-multicore\n") >>>>> >>>>> >>>>> on my Mac Pro 3,1 (2 quad-cores), R 2.12.0, which reports 8 cores, >>>>> >>>>> ? "plain" takes about 68 seconds (real and user, using the unix timing >>>>> function). >>>>> ? "do-onecore" takes about 300 seconds. >>>>> ? "do-multicore" takes about 210 seconds real, (300 seconds user). >>>>> >>>>> this seems pretty disappointing. ?the cores are not used for the most >>>>> part, either. ?feedback appreciated. >>>> >>>> >>>> Feedback is that a single computation within your foreach loop is so >>>> quick >>>> that the overhead of communicating data and results between processes >>>> costs >>>> more time than the actual evaluation, hence you are faster with a single >>>> process. >>>> >>>> What you should do is: >>>> >>>> write code that does, e.g., 10000 iterations within 10 other iterations >>>> and >>>> just do a foreach loop around the outer 10. Then you will probably be >>>> much >>>> faster (without testing). But this is essentially the example I am using >>>> for >>>> teaching to show when not to do parallel processing..... 
>>>> >>>> Best, >>>> Uwe Ligges >>>> >>>> >>>> >>>> >>>> >>>> >>>>> /iaw >>>>> >>>>> >>>>> ---- >>>>> Ivo Welch (ivo.welch at gmail.com) >>>>> >>>>> ______________________________________________ >>>>> R-help at r-project.org mailing list >>>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>>> PLEASE do read the posting guide >>>>> http://www.R-project.org/posting-guide.html >>>>> and provide commented, minimal, self-contained, reproducible code. >>>> >> > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Steve Lianoglou Graduate Student: Computational Systems Biology ?| Memorial Sloan-Kettering Cancer Center ?| Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact From jholtman at gmail.com Sun Jul 3 02:40:21 2011 From: jholtman at gmail.com (jim holtman) Date: Sat, 2 Jul 2011 20:40:21 -0400 Subject: [R] 2D Random walk In-Reply-To: <1309361423196-3633205.post@n4.nabble.com> References: <1291305509081-3069557.post@n4.nabble.com> <1309361423196-3633205.post@n4.nabble.com> Message-ID: This should work. You have to return an object from a function; you can not try to reference an object within a function. So the value is returned and saved in an object called 'rw' since that is how you are referening it. 
walk.2d<-function(n) {
    rw <- matrix(0, ncol = 2, nrow = n)
    # generate the indices to set the deltas
    indx <- cbind(seq(n), sample(c(1, 2), n, TRUE))
    # now set the values
    rw[indx] <- sample(c(-1, 1), n, TRUE)
    # cumsum the columns
    rw[,1] <- cumsum(rw[, 1])
    rw[,2] <- cumsum(rw[, 2])
    rw  # return value
}

n<-1000
rw <- walk.2d(n)
plot(0, type="n", xlab="x", ylab="y",
     main="Random Walk Simulation In Two Dimensions",
     xlim=range(rw[,1]), ylim=range(rw[,2]))
# use 'segments' to color each path
segments(head(rw[, 1], -1), head(rw[, 2], -1),
         tail(rw[, 1], -1), tail(rw[, 2], -1), col ="blue")

On Wed, Jun 29, 2011 at 11:30 AM, Komal wrote:
> HI Jholtman,
>
> walk.2d<-function(n)
> {
> rw <- matrix(0, ncol = 2, nrow = n)
>
> # generate the indices to set the deltas
> indx <- cbind(seq(n), sample(c(1, 2), n, TRUE))
>
> # now set the values
> rw[indx] <- sample(c(-1, 1), n, TRUE)
>
> # cumsum the columns
> rw[,1] <- cumsum(rw[, 1])
> rw[,2] <- cumsum(rw[, 2])
>
> return(rw[,1],rw[,2])
> }
> n<-1000
>
> plot(walk.2d(n), type="n",xlab="x",ylab="y",main="Random Walk Simulation In
> Two Dimensions",xlim=range(rw[,1]),ylim=range(rw[,2]))
>
> # use 'segments' to color each path
> segments(head(rw[, 1], -1), head(rw[, 2], -1), tail(rw[, 1], -1), tail(rw[,
> 2], -1), col ="blue")
>
> I tried to make it in a function.. its not working I dont know why... please
> help me correcting this code.
>
> --
> View this message in context: http://r.789695.n4.nabble.com/2D-Random-walk-tp3069557p3633205.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
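To illustrate the point about returning a value, here is a minimal sketch (not Jim's code) in which the function, given the made-up name walk2, returns the whole path as a two-column matrix and the caller then checks it. In the quoted version, return(rw[,1], rw[,2]) is probably the immediate failure, since return() accepts a single value, and rw exists only inside the function, so range(rw[,1]) at top level has nothing to refer to.

```r
set.seed(1)  # only so that the sketch is repeatable

# 'walk2' and 'path' are hypothetical names chosen for this illustration.
walk2 <- function(n) {
    steps <- matrix(0, ncol = 2, nrow = n)       # local to the function
    axis <- sample(c(1, 2), n, TRUE)             # which coordinate moves at each step
    steps[cbind(seq_len(n), axis)] <- sample(c(-1, 1), n, TRUE)
    # The last evaluated expression is what the caller receives.
    cbind(x = cumsum(steps[, 1]), y = cumsum(steps[, 2]))
}

path <- walk2(1000)               # capture the returned matrix at top level

# Sanity check: consecutive positions differ by one unit along exactly one axis.
moves <- abs(diff(path))
ok <- all(rowSums(moves) == 1)
```

Anything that needs both coordinates (range(), segments(), and so on) can then work from path[, "x"] and path[, "y"].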
From dwinsemius at comcast.net Sun Jul 3 04:57:33 2011
From: dwinsemius at comcast.net (David Winsemius)
Date: Sat, 2 Jul 2011 22:57:33 -0400
Subject: [R] For help in R coding
In-Reply-To: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE9@KCL-MAIL01.kclad.ds.kcl.ac.uk>
References: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE0@KCL-MAIL01.kclad.ds.kcl.ac.uk>
 <813856BE-5DD3-4708-B1FA-2611EBAD8C61@comcast.net>
 <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE1@KCL-MAIL01.kclad.ds.kcl.ac.uk>
 <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE3@KCL-MAIL01.kclad.ds.kcl.ac.uk>
 <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE5@KCL-MAIL01.kclad.ds.kcl.ac.uk>
 <383EF813-24EA-42F3-B514-AB9CF8060BAC@comcast.net> ,
 <676B7003-AFFE-4CF1-9CC8-FB1D6953B761@comcast.net>
 <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE9@KCL-MAIL01.kclad.ds.kcl.ac.uk>
Message-ID: <10E80E91-DE87-431A-8E41-50CDAFB73A4D@comcast.net>

On Jul 2, 2011, at 4:46 PM, Bansal, Vikas wrote:

> DEAR ALL,
> I TRIED THIS CODE AND THIS IS RUNNING PERFECTLY...
>
> df=read.table("Case2.pileup",fill=T,sep="\t",colClasses = "character")
> txt=df[,9]
> txtvec <- readLines(textConnection(txt))
> dad=data.frame(A = unlist(sapply(gregexpr("A|a", txtvec),
>     function(x) if ( x[[1]] != -1) length(x) else 0 )),
>   C = unlist(sapply(gregexpr("C|c", txtvec),
>     function(x) if ( x[[1]] != -1) length(x) else 0 )),
>   G = unlist(sapply(gregexpr("G|g", txtvec),
>     function(x) if ( x[[1]] != -1) length(x) else 0 )),
>   T = unlist(sapply(gregexpr("T|t", txtvec),
>     function(x) if ( x[[1]] != -1) length(x) else 0 )),
>   N = unlist(sapply(gregexpr("\\,|\\.", txtvec),
>     function(x) if ( x[[1]] != -1) length(x) else 0 )))
>

The unlist operation is unnecessary since the sapply operation returns a vector. (It doesn't hurt, but it is unnecessary.)
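A small sketch of that remark, using the sample strings posted earlier in the thread rather than the Case2.pileup file. The helper name count.matches is made up for illustration; it wraps the gregexpr idiom so that each column needs only one call and no unlist().

```r
# Sample strings from the thread (column 9 of the pileup example).
txtvec <- c(".a,g,,", ".t,t,,", ".,c,c,", ".,a,,,", ".,t,t,t", ".c,,g,^!.",
            ".g,ggg.^!,", ".$,,,,,.,", "a,g,,t,", ",,,,,.,^!.", ",$,,,,.,.")

# Hypothetical helper: number of matches of 'pattern' in each element of 'x'.
# gregexpr() reports "no match" as -1, so that case is mapped to 0.
count.matches <- function(pattern, x) {
    sapply(gregexpr(pattern, x),
           function(m) if (m[[1]] != -1) length(m) else 0L)
}

commas <- count.matches(",", txtvec)
same <- identical(unlist(commas), commas)   # unlist() changes nothing here

counts <- data.frame(A = count.matches("a|A", txtvec),
                     C = count.matches("c|C", txtvec),
                     G = count.matches("g|G", txtvec),
                     T = count.matches("t|T", txtvec),
                     N = count.matches("[,.]", txtvec))
```

Inside a character class such as "[,.]" the period is taken literally, so no escaping is needed there.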
> > > > > Thanking you, > Warm Regards > Vikas Bansal > Msc Bioinformatics > Kings College London > ________________________________________ > From: David Winsemius [dwinsemius at comcast.net] > Sent: Saturday, July 02, 2011 9:04 PM > To: Dennis Murphy > Cc: r-help at r-project.org; Bansal, Vikas > Subject: Re: [R] For help in R coding > > On reflection and a bit of testing I think the best approach would be > to use gregexpr. For counting the number of commas, this appears quite > straightforward. > >> sapply(gregexpr("\\,", txtvec), function(x) if ( x[[1]] != -1) > length(x) else 0 ) > [1] 3 3 3 4 3 3 2 6 4 6 6 > > It easily generalizes to period and the `|` (or) operation on letters. > ( did need to add the check since the length of gregexpr is always at > least one but ihas value -1 when there is no match > >> sapply(gregexpr("t|T", txtvec), function(x) if ( x[[1]] != -1) > length(x) else 0 ) > [1] 0 2 0 0 3 0 0 0 1 0 0 > > > On Jul 2, 2011, at 3:22 PM, Dennis Murphy wrote: > >> Hi: >> >> There seems to be a problem if the string ends in , or . , which >> makes >> it difficult for strsplit() to pick up if it is splitting on those >> characters. 
Here is an alternative, splitting on individual >> characters >> and using charmatch() instead: >> >> charsum <- function(s, char) { >> u <- strsplit(s, "") >> sum(sapply(u, function(x) charmatch(x, char)), na.rm = TRUE) >> } >> >> unname(sapply(txtvec, function(x) charsum(x, ','))) >> unname(sapply(txtvec, function(x) charsum(x, '.'))) >> >> Putting this into a data frame, >> >> dfout <- data.frame(periods = unname(sapply(txtvec, function(x) >> charsum(x, '.'))), >> commas = unname(sapply(txtvec, >> function(x) charsum(x, '.'))) ) >> txtvec >> >> HTH, >> Dennis >> >> On Sat, Jul 2, 2011 at 10:19 AM, David Winsemius >> wrote: >>> >>> On Jul 2, 2011, at 12:34 PM, Bansal, Vikas wrote: >>> >>>> >>>> >>>>>> Dear all, >>>>>> >>>>>> I am doing a project on variant calling using R.I am working on >>>>>> pileup file.There are 10 columns in my data frame and I want to >>>>>> count the number of A,C,G and T in each row for column 9.example >>>>>> of >>>>>> column 9 is given below- >>>>>> >>>>>> .a,g,, >>>>>> .t,t,, >>>>>> .,c,c, >>>>>> .,a,,, >>>>>> .,t,t,t >>>>>> .c,,g,^!. >>>>>> .g,ggg.^!, >>>>>> .$,,,,,., >>>>>> a,g,,t, >>>>>> ,,,,,.,^!. >>>>>> ,$,,,,.,. >>>>>> >>>>>> This is a bit confusing for me as these characters are in one >>>>>> column >>>>>> and how can we scan them for each row to print number of A,C,G >>>>>> and T >>>>>> for each row. >>>>> >>>>> Seems a bit clunky but this does the job (first the data): >>>>>> >>>>>> txt <- " .a,g,, >>>>> >>>>> + .t,t,, >>>>> + .,c,c, >>>>> + .,a,,, >>>>> + .,t,t,t >>>>> + .c,,g,^!. >>>>> + .g,ggg.^!, >>>>> + .$,,,,,., >>>>> + a,g,,t, >>>>> + ,,,,,.,^!. >>>>> + ,$,,,,.,." >>>>> >>>>>> txtvec <- readLines(textConnection(txt)) >>>>> >>>>> Now the clunky solution, Basically subtracts 1 from the counts of >>>>> "fragments" that result from splitting on each letter in turn. >>>>> Could >>>>> be made prettier with a function that did the job. 
>>>>> >>>>>> data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit, >>>>> >>>>> split="a"), length) , "-", 1)), >>>>> + C = unlist(lapply( lapply( sapply(txtvec, strsplit, split="c"), >>>>> length) , "-", 1)), >>>>> + G = unlist(lapply( lapply( sapply(txtvec, strsplit, split="g"), >>>>> length) , "-", 1)), >>>>> + T = unlist(lapply( lapply( sapply(txtvec, strsplit, split="t"), >>>>> length) , "-", 1)) ) >>>>> A C G T >>>>> .a,g,, 1 0 1 0 >>>>> .t,t,, 0 0 0 2 >>>>> .,c,c, 0 2 0 0 >>>>> .,a,,, 1 0 0 0 >>>>> .,t,t,t 0 0 0 2 >>>>> .c,,g,^!. 0 1 1 0 >>>>> .g,ggg.^!, 0 0 4 0 >>>>> .$,,,,,., 0 0 0 0 >>>>> a,g,,t, 1 0 1 1 >>>>> ,,,,,.,^!. 0 0 0 0 >>>>> ,$,,,,.,. 0 0 0 0 >>>>> >>>>> Has the advantage that the input data ends up as rownames, which >>>>> was a >>>>> surprise. >>>>> >>>>> If you wanted to count "A" and "a" as equivalent, then the split >>>>> argument should be "a|A" >>>>> >>>>> >>>> >>>>>> AS YOU MENTIONED THAT IF I WANT TO COUNT A AND a I SHOULD SPLIT >>>>>> LIKE >>>>>> THIS. >>>> >>>> BUT CAN I COUNT . AND , ALSO USING- >>>> data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit, >>>> split=".|,"), length) , "-", 1)), >>>> >>>> I TRIED IT BUT ITS NOT WORKING.IT IS GIVING THE OUTPUT BUT AT SOME >>>> PLACES >>>> IT IS SHOWING MORE NUMBER OF . AND , AND SOMEWHERE IT IS NOT EVEN >>>> CALCULATING AND JUST SHOWING 0. >>> >>> You need to use valid regex expressions for 'split'. Since "." and >>> "," are >>> special characters they need to be escaped when you wnat the >>> literals to be >>> recognized as such. >>> >>> I haven't figured out why but you need to drop the final operation >>> of >>> subtracting 1 from the values when counting commas: >>> >>> data.frame(periods = unlist(lapply( lapply( sapply(txtvec, strsplit, >>> split="\\."), length) , "-", 1)) >>> ,commas = unlist( lapply( sapply(txtvec, strsplit, >>> split="\\,"), length) ) ) >>> periods commas >>> .a,g,, 1 3 >>> .t,t,, 1 3 >>> .,c,c, 1 3 >>> .,a,,, 1 4 >>> .,t,t,t 1 4 >>> .c,,g,^!. 
1 4 >>> .g,ggg.^!, 2 2 >>> .$,,,,,., 2 6 >>> a,g,,t, 0 4 >>> ,,,,,.,^!. 1 7 >>> ,$,,,,.,. 1 7 >>> >>> -- >>> >>> David Winsemius, MD >>> West Hartford, CT >>> >>> ______________________________________________ >>> R-help at r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >>> > > David Winsemius, MD > West Hartford, CT > David Winsemius, MD West Hartford, CT From tryingtolearnagain at gmail.com Sat Jul 2 21:40:24 2011 From: tryingtolearnagain at gmail.com (Trying To learn again) Date: Sat, 2 Jul 2011 21:40:24 +0200 Subject: [R] How many times occurs Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From vikas.bansal at kcl.ac.uk Sat Jul 2 20:57:54 2011 From: vikas.bansal at kcl.ac.uk (Bansal, Vikas) Date: Sat, 2 Jul 2011 19:57:54 +0100 Subject: [R] For help in R coding In-Reply-To: <383EF813-24EA-42F3-B514-AB9CF8060BAC@comcast.net> References: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE0@KCL-MAIL01.kclad.ds.kcl.ac.uk>, <813856BE-5DD3-4708-B1FA-2611EBAD8C61@comcast.net> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE1@KCL-MAIL01.kclad.ds.kcl.ac.uk>, , <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE3@KCL-MAIL01.kclad.ds.kcl.ac.uk> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE5@KCL-MAIL01.kclad.ds.kcl.ac.uk>, <383EF813-24EA-42F3-B514-AB9CF8060BAC@comcast.net> Message-ID: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE6@KCL-MAIL01.kclad.ds.kcl.ac.uk> Thanking you, Warm Regards Vikas Bansal Msc Bioinformatics Kings College London ________________________________________ From: David Winsemius [dwinsemius at comcast.net] Sent: Saturday, July 02, 2011 6:19 PM To: Bansal, Vikas Cc: r-help at r-project.org Subject: Re: [R] For help in R coding On Jul 2, 2011, at 12:34 PM, Bansal, Vikas wrote: > > >>> Dear all, >>> >>> I am doing a project on variant calling using 
R.I am working on >>> pileup file.There are 10 columns in my data frame and I want to >>> count the number of A,C,G and T in each row for column 9.example of >>> column 9 is given below- >>> >>> .a,g,, >>> .t,t,, >>> .,c,c, >>> .,a,,, >>> .,t,t,t >>> .c,,g,^!. >>> .g,ggg.^!, >>> .$,,,,,., >>> a,g,,t, >>> ,,,,,.,^!. >>> ,$,,,,.,. >>> >>> This is a bit confusing for me as these characters are in one column >>> and how can we scan them for each row to print number of A,C,G and T >>> for each row. >> >> Seems a bit clunky but this does the job (first the data): >>> txt <- " .a,g,, >> + .t,t,, >> + .,c,c, >> + .,a,,, >> + .,t,t,t >> + .c,,g,^!. >> + .g,ggg.^!, >> + .$,,,,,., >> + a,g,,t, >> + ,,,,,.,^!. >> + ,$,,,,.,." >> >>> txtvec <- readLines(textConnection(txt)) >> >> Now the clunky solution, Basically subtracts 1 from the counts of >> "fragments" that result from splitting on each letter in turn. Could >> be made prettier with a function that did the job. >> >>> data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit, >> split="a"), length) , "-", 1)), >> + C = unlist(lapply( lapply( sapply(txtvec, strsplit, split="c"), >> length) , "-", 1)), >> + G = unlist(lapply( lapply( sapply(txtvec, strsplit, split="g"), >> length) , "-", 1)), >> + T = unlist(lapply( lapply( sapply(txtvec, strsplit, split="t"), >> length) , "-", 1)) ) >> A C G T >> .a,g,, 1 0 1 0 >> .t,t,, 0 0 0 2 >> .,c,c, 0 2 0 0 >> .,a,,, 1 0 0 0 >> .,t,t,t 0 0 0 2 >> .c,,g,^!. 0 1 1 0 >> .g,ggg.^!, 0 0 4 0 >> .$,,,,,., 0 0 0 0 >> a,g,,t, 1 0 1 1 >> ,,,,,.,^!. 0 0 0 0 >> ,$,,,,.,. 0 0 0 0 >> >> Has the advantage that the input data ends up as rownames, which >> was a >> surprise. >> >> If you wanted to count "A" and "a" as equivalent, then the split >> argument should be "a|A" >> >> > >>> AS YOU MENTIONED THAT IF I WANT TO COUNT A AND a I SHOULD SPLIT >>> LIKE THIS. > BUT CAN I COUNT . 
AND , ALSO USING-
> data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit,
> split=".|,"), length) , "-", 1)),
>
> I TRIED IT BUT ITS NOT WORKING.IT IS GIVING THE OUTPUT BUT AT SOME
> PLACES IT IS SHOWING MORE NUMBER OF . AND , AND SOMEWHERE IT IS NOT
> EVEN CALCULATING AND JUST SHOWING 0.

You need to use valid regex expressions for 'split'. Since "." and "," are special characters they need to be escaped when you want the literals to be recognized as such.

I haven't figured out why but you need to drop the final operation of subtracting 1 from the values when counting commas:

data.frame(periods = unlist(lapply( lapply( sapply(txtvec, strsplit,
                                       split="\\."), length) , "-", 1))
          ,commas = unlist( lapply( sapply(txtvec, strsplit,
                                       split="\\,"), length) ) )
           periods commas
.a,g,,           1      3
.t,t,,           1      3
.,c,c,           1      3
.,a,,,           1      4
.,t,t,t          1      4
.c,,g,^!.        1      4
.g,ggg.^!,       2      2
.$,,,,,.,        2      6
a,g,,t,          0      4
,,,,,.,^!.       1      7
,$,,,,.,.        1      7

--
David Winsemius, MD
West Hartford, CT

SOME OF THE VALUES ARE COMING OUT INCORRECT. I DO NOT KNOW WHY, BUT IF YOU LOOK AT YOUR OUTPUT, SOME OF THE COMMA COUNTS ARE 7 WHEN THERE ARE ACTUALLY 6. THE SAME PROBLEM OCCURS WITH THE LETTERS WHEN I USE THIS-

data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit, split="a|A"), length) , "-", 1)),
           C = unlist(lapply( lapply( sapply(txtvec, strsplit, split="c|C"), length) , "-", 1)),
           G = unlist(lapply( lapply( sapply(txtvec, strsplit, split="g|G"), length) , "-", 1)),
           T = unlist(lapply( lapply( sapply(txtvec, strsplit, split="t|T"), length) , "-", 1)) )

I DON'T KNOW WHY THIS CODE IS NOT CALCULATING THE EXACT NUMBER. CAN YOU PLEASE CHECK IT?
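One likely explanation for the miscounts, offered as a sketch rather than a definitive diagnosis: strsplit() does not return a trailing empty string when the input ends with a match, so "number of pieces minus one" undercounts for strings that end in the character being counted, while dropping the subtraction overcounts for all the others. Separately, an unescaped "." in a regular expression matches any character, which would account for the inflated numbers from split=".|,". The comparison below uses three of the posted strings; the helper name count.char is made up, and its nchar()/gsub() trick assumes every match is exactly one character long.

```r
x <- c(".,t,t,t", ",,,,,.,^!.", "a,g,,t,")

# strsplit-based count: off by one for the last string, which ends in a comma.
by.split <- sapply(strsplit(x, ","), length) - 1

# Hypothetical helper: delete every match and see how much shorter the string gets.
# Only valid for patterns that match a single character at a time.
count.char <- function(pattern, x) nchar(x) - nchar(gsub(pattern, "", x))

by.nchar <- count.char(",", x)      # commas
t.count  <- count.char("[tT]", x)   # t or T
dots     <- count.char("\\.", x)    # literal periods

cbind(by.split, by.nchar)           # the two columns disagree in the third row
```

The same undercount affects the letter columns: ".,t,t,t" ends in t, so the strsplit approach reports 2 where there are 3.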
From vikas.bansal at kcl.ac.uk Sat Jul 2 22:21:31 2011
From: vikas.bansal at kcl.ac.uk (Bansal, Vikas)
Date: Sat, 2 Jul 2011 21:21:31 +0100
Subject: [R] For help in R coding
In-Reply-To:
References: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE0@KCL-MAIL01.kclad.ds.kcl.ac.uk>
 <813856BE-5DD3-4708-B1FA-2611EBAD8C61@comcast.net>
 <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE1@KCL-MAIL01.kclad.ds.kcl.ac.uk>
 <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE3@KCL-MAIL01.kclad.ds.kcl.ac.uk>
 <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE5@KCL-MAIL01.kclad.ds.kcl.ac.uk>
 <383EF813-24EA-42F3-B514-AB9CF8060BAC@comcast.net>,
Message-ID: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE7@KCL-MAIL01.kclad.ds.kcl.ac.uk>

HI, THIS SEEMS A LITTLE BIT CONFUSING, BUT I AM USING THIS CODE AS SUGGESTED BY YOU-

df=read.table("Case2.pileup",fill=T,sep="\t",colClasses = "character")
txt=df[,9]
txtvec <- readLines(textConnection(txt))
vik=data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit, split="a|A"), length) , "-", 1)),
               C = unlist(lapply( lapply( sapply(txtvec, strsplit, split="c|C"), length) , "-", 1)),
               G = unlist(lapply( lapply( sapply(txtvec, strsplit, split="g|G"), length) , "-", 1)),
               T = unlist(lapply( lapply( sapply(txtvec, strsplit, split="t|T"), length) , "-", 1)) )

THE THING IS, AT SOME PLACES IT IS CALCULATING PERFECTLY BUT AT SOME POSITIONS IT IS NOT. I AM TRYING TO FIND THE SOLUTION IN BOOKS AND ON THE NET, BUT I DON'T KNOW WHY THERE IS NOTHING RELATED TO THIS. I THINK THIS CODE SEEMS TO BE GOOD BUT I AM MISSING SOMETHING.

FOR YOUR CONVENIENCE I HAVE ATTACHED MY Case2.pileup file.

I AM VERY THANKFUL TO YOU AND APPRECIATE THAT YOU ARE HELPING AND TAKING YOUR PRECIOUS TIME.
Thanking you, Warm Regards Vikas Bansal Msc Bioinformatics Kings College London ________________________________________ From: Dennis Murphy [djmuser at gmail.com] Sent: Saturday, July 02, 2011 8:22 PM To: r-help at r-project.org Cc: Bansal, Vikas; David Winsemius Subject: Re: [R] For help in R coding Hi: There seems to be a problem if the string ends in , or . , which makes it difficult for strsplit() to pick up if it is splitting on those characters. Here is an alternative, splitting on individual characters and using charmatch() instead: charsum <- function(s, char) { u <- strsplit(s, "") sum(sapply(u, function(x) charmatch(x, char)), na.rm = TRUE) } unname(sapply(txtvec, function(x) charsum(x, ','))) unname(sapply(txtvec, function(x) charsum(x, '.'))) Putting this into a data frame, dfout <- data.frame(periods = unname(sapply(txtvec, function(x) charsum(x, '.'))), commas = unname(sapply(txtvec, function(x) charsum(x, '.'))) ) txtvec HTH, Dennis On Sat, Jul 2, 2011 at 10:19 AM, David Winsemius wrote: > > On Jul 2, 2011, at 12:34 PM, Bansal, Vikas wrote: > >> >> >>>> Dear all, >>>> >>>> I am doing a project on variant calling using R.I am working on >>>> pileup file.There are 10 columns in my data frame and I want to >>>> count the number of A,C,G and T in each row for column 9.example of >>>> column 9 is given below- >>>> >>>> .a,g,, >>>> .t,t,, >>>> .,c,c, >>>> .,a,,, >>>> .,t,t,t >>>> .c,,g,^!. >>>> .g,ggg.^!, >>>> .$,,,,,., >>>> a,g,,t, >>>> ,,,,,.,^!. >>>> ,$,,,,.,. >>>> >>>> This is a bit confusing for me as these characters are in one column >>>> and how can we scan them for each row to print number of A,C,G and T >>>> for each row. >>> >>> Seems a bit clunky but this does the job (first the data): >>>> >>>> txt <- " .a,g,, >>> >>> + .t,t,, >>> + .,c,c, >>> + .,a,,, >>> + .,t,t,t >>> + .c,,g,^!. >>> + .g,ggg.^!, >>> + .$,,,,,., >>> + a,g,,t, >>> + ,,,,,.,^!. >>> + ,$,,,,.,." 
>>> >>>> txtvec <- readLines(textConnection(txt)) >>> >>> Now the clunky solution, Basically subtracts 1 from the counts of >>> "fragments" that result from splitting on each letter in turn. Could >>> be made prettier with a function that did the job. >>> >>>> data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit, >>> >>> split="a"), length) , "-", 1)), >>> + C = unlist(lapply( lapply( sapply(txtvec, strsplit, split="c"), >>> length) , "-", 1)), >>> + G = unlist(lapply( lapply( sapply(txtvec, strsplit, split="g"), >>> length) , "-", 1)), >>> + T = unlist(lapply( lapply( sapply(txtvec, strsplit, split="t"), >>> length) , "-", 1)) ) >>> A C G T >>> .a,g,, 1 0 1 0 >>> .t,t,, 0 0 0 2 >>> .,c,c, 0 2 0 0 >>> .,a,,, 1 0 0 0 >>> .,t,t,t 0 0 0 2 >>> .c,,g,^!. 0 1 1 0 >>> .g,ggg.^!, 0 0 4 0 >>> .$,,,,,., 0 0 0 0 >>> a,g,,t, 1 0 1 1 >>> ,,,,,.,^!. 0 0 0 0 >>> ,$,,,,.,. 0 0 0 0 >>> >>> Has the advantage that the input data ends up as rownames, which was a >>> surprise. >>> >>> If you wanted to count "A" and "a" as equivalent, then the split >>> argument should be "a|A" >>> >>> >> >>>> AS YOU MENTIONED THAT IF I WANT TO COUNT A AND a I SHOULD SPLIT LIKE >>>> THIS. >> >> BUT CAN I COUNT . AND , ALSO USING- >> data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit, >> split=".|,"), length) , "-", 1)), >> >> I TRIED IT BUT ITS NOT WORKING.IT IS GIVING THE OUTPUT BUT AT SOME PLACES >> IT IS SHOWING MORE NUMBER OF . AND , AND SOMEWHERE IT IS NOT EVEN >> CALCULATING AND JUST SHOWING 0. > > You need to use valid regex expressions for 'split'. Since "." and "," are > special characters they need to be escaped when you wnat the literals to be > recognized as such. 
> > I haven't figured out why but you need to drop the final operation of > subtracting 1 from the values when counting commas: > > data.frame(periods = unlist(lapply( lapply( sapply(txtvec, strsplit, > split="\\."), length) , "-", 1)) > ,commas = unlist( lapply( sapply(txtvec, strsplit, > split="\\,"), length) ) ) > periods commas > .a,g,, 1 3 > .t,t,, 1 3 > .,c,c, 1 3 > .,a,,, 1 4 > .,t,t,t 1 4 > .c,,g,^!. 1 4 > .g,ggg.^!, 2 2 > .$,,,,,., 2 6 > a,g,,t, 0 4 > ,,,,,.,^!. 1 7 > ,$,,,,.,. 1 7 > > -- > > David Winsemius, MD > West Hartford, CT > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From vikas.bansal at kcl.ac.uk Sat Jul 2 22:46:32 2011 From: vikas.bansal at kcl.ac.uk (Bansal, Vikas) Date: Sat, 2 Jul 2011 21:46:32 +0100 Subject: [R] For help in R coding In-Reply-To: <676B7003-AFFE-4CF1-9CC8-FB1D6953B761@comcast.net> References: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE0@KCL-MAIL01.kclad.ds.kcl.ac.uk> <813856BE-5DD3-4708-B1FA-2611EBAD8C61@comcast.net> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE1@KCL-MAIL01.kclad.ds.kcl.ac.uk> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE3@KCL-MAIL01.kclad.ds.kcl.ac.uk> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE5@KCL-MAIL01.kclad.ds.kcl.ac.uk> <383EF813-24EA-42F3-B514-AB9CF8060BAC@comcast.net> , <676B7003-AFFE-4CF1-9CC8-FB1D6953B761@comcast.net> Message-ID: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE9@KCL-MAIL01.kclad.ds.kcl.ac.uk> DEAR ALL, I TRIED THIS CODE AND THIS IS RUNNING PERFECTLY... 
df=read.table("Case2.pileup",fill=T,sep="\t",colClasses = "character")
txt=df[,9]
txtvec <- readLines(textConnection(txt))
dad=data.frame(A = unlist(sapply(gregexpr("A|a", txtvec), function(x) if ( x[[1]] != -1) length(x) else 0 )),
               C = unlist(sapply(gregexpr("C|c", txtvec), function(x) if ( x[[1]] != -1) length(x) else 0 )),
               G = unlist(sapply(gregexpr("G|g", txtvec), function(x) if ( x[[1]] != -1) length(x) else 0 )),
               T = unlist(sapply(gregexpr("T|t", txtvec), function(x) if ( x[[1]] != -1) length(x) else 0 )),
               N = unlist(sapply(gregexpr("\\,|\\.", txtvec), function(x) if ( x[[1]] != -1) length(x) else 0 )))

Thanking you,
Warm Regards
Vikas Bansal
Msc Bioinformatics
Kings College London
________________________________________
From: David Winsemius [dwinsemius at comcast.net]
Sent: Saturday, July 02, 2011 9:04 PM
To: Dennis Murphy
Cc: r-help at r-project.org; Bansal, Vikas
Subject: Re: [R] For help in R coding

On reflection and a bit of testing I think the best approach would be to use gregexpr. For counting the number of commas, this appears quite straightforward.

> sapply(gregexpr("\\,", txtvec), function(x) if ( x[[1]] != -1) length(x) else 0 )
 [1] 3 3 3 4 3 3 2 6 4 6 6

It easily generalizes to period and the `|` (or) operation on letters. (I did need to add the check, since the result of gregexpr always has length at least one but has value -1 when there is no match.)

> sapply(gregexpr("t|T", txtvec), function(x) if ( x[[1]] != -1) length(x) else 0 )
 [1] 0 2 0 0 3 0 0 0 1 0 0

On Jul 2, 2011, at 3:22 PM, Dennis Murphy wrote:

> Hi:
>
> There seems to be a problem if the string ends in , or . , which makes
> it difficult for strsplit() to pick up if it is splitting on those
> characters.
Here is an alternative, splitting on individual characters > and using charmatch() instead: > > charsum <- function(s, char) { > u <- strsplit(s, "") > sum(sapply(u, function(x) charmatch(x, char)), na.rm = TRUE) > } > > unname(sapply(txtvec, function(x) charsum(x, ','))) > unname(sapply(txtvec, function(x) charsum(x, '.'))) > > Putting this into a data frame, > > dfout <- data.frame(periods = unname(sapply(txtvec, function(x) > charsum(x, '.'))), > commas = unname(sapply(txtvec, > function(x) charsum(x, '.'))) ) > txtvec > > HTH, > Dennis > > On Sat, Jul 2, 2011 at 10:19 AM, David Winsemius > wrote: >> >> On Jul 2, 2011, at 12:34 PM, Bansal, Vikas wrote: >> >>> >>> >>>>> Dear all, >>>>> >>>>> I am doing a project on variant calling using R.I am working on >>>>> pileup file.There are 10 columns in my data frame and I want to >>>>> count the number of A,C,G and T in each row for column 9.example >>>>> of >>>>> column 9 is given below- >>>>> >>>>> .a,g,, >>>>> .t,t,, >>>>> .,c,c, >>>>> .,a,,, >>>>> .,t,t,t >>>>> .c,,g,^!. >>>>> .g,ggg.^!, >>>>> .$,,,,,., >>>>> a,g,,t, >>>>> ,,,,,.,^!. >>>>> ,$,,,,.,. >>>>> >>>>> This is a bit confusing for me as these characters are in one >>>>> column >>>>> and how can we scan them for each row to print number of A,C,G >>>>> and T >>>>> for each row. >>>> >>>> Seems a bit clunky but this does the job (first the data): >>>>> >>>>> txt <- " .a,g,, >>>> >>>> + .t,t,, >>>> + .,c,c, >>>> + .,a,,, >>>> + .,t,t,t >>>> + .c,,g,^!. >>>> + .g,ggg.^!, >>>> + .$,,,,,., >>>> + a,g,,t, >>>> + ,,,,,.,^!. >>>> + ,$,,,,.,." >>>> >>>>> txtvec <- readLines(textConnection(txt)) >>>> >>>> Now the clunky solution, Basically subtracts 1 from the counts of >>>> "fragments" that result from splitting on each letter in turn. >>>> Could >>>> be made prettier with a function that did the job. 
>>>> >>>>> data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit, >>>> >>>> split="a"), length) , "-", 1)), >>>> + C = unlist(lapply( lapply( sapply(txtvec, strsplit, split="c"), >>>> length) , "-", 1)), >>>> + G = unlist(lapply( lapply( sapply(txtvec, strsplit, split="g"), >>>> length) , "-", 1)), >>>> + T = unlist(lapply( lapply( sapply(txtvec, strsplit, split="t"), >>>> length) , "-", 1)) ) >>>> A C G T >>>> .a,g,, 1 0 1 0 >>>> .t,t,, 0 0 0 2 >>>> .,c,c, 0 2 0 0 >>>> .,a,,, 1 0 0 0 >>>> .,t,t,t 0 0 0 2 >>>> .c,,g,^!. 0 1 1 0 >>>> .g,ggg.^!, 0 0 4 0 >>>> .$,,,,,., 0 0 0 0 >>>> a,g,,t, 1 0 1 1 >>>> ,,,,,.,^!. 0 0 0 0 >>>> ,$,,,,.,. 0 0 0 0 >>>> >>>> Has the advantage that the input data ends up as rownames, which >>>> was a >>>> surprise. >>>> >>>> If you wanted to count "A" and "a" as equivalent, then the split >>>> argument should be "a|A" >>>> >>>> >>> >>>>> AS YOU MENTIONED THAT IF I WANT TO COUNT A AND a I SHOULD SPLIT >>>>> LIKE >>>>> THIS. >>> >>> BUT CAN I COUNT . AND , ALSO USING- >>> data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit, >>> split=".|,"), length) , "-", 1)), >>> >>> I TRIED IT BUT ITS NOT WORKING.IT IS GIVING THE OUTPUT BUT AT SOME >>> PLACES >>> IT IS SHOWING MORE NUMBER OF . AND , AND SOMEWHERE IT IS NOT EVEN >>> CALCULATING AND JUST SHOWING 0. >> >> You need to use valid regex expressions for 'split'. Since "." and >> "," are >> special characters they need to be escaped when you wnat the >> literals to be >> recognized as such. >> >> I haven't figured out why but you need to drop the final operation of >> subtracting 1 from the values when counting commas: >> >> data.frame(periods = unlist(lapply( lapply( sapply(txtvec, strsplit, >> split="\\."), length) , "-", 1)) >> ,commas = unlist( lapply( sapply(txtvec, strsplit, >> split="\\,"), length) ) ) >> periods commas >> .a,g,, 1 3 >> .t,t,, 1 3 >> .,c,c, 1 3 >> .,a,,, 1 4 >> .,t,t,t 1 4 >> .c,,g,^!. 1 4 >> .g,ggg.^!, 2 2 >> .$,,,,,., 2 6 >> a,g,,t, 0 4 >> ,,,,,.,^!. 
1 7 >> ,$,,,,.,. 1 7 >> >> -- >> >> David Winsemius, MD >> West Hartford, CT >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> David Winsemius, MD West Hartford, CT From awards at windows7download.com Sat Jul 2 22:51:16 2011 From: awards at windows7download.com (Windows 7 Download) Date: Sat, 2 Jul 2011 22:51:16 +0200 (CEST) Subject: [R] R for Windows - 5 stars award on Windows 7 Download Message-ID: <20110702205116.529F58D5F3C@ibm1.websys.sk> Dear R Development Core Team We are more than happy that Windows 7 was launched after long restless period of waiting. Due to this expected moment, we prepared and launched new Windows 7 download website that will be used by new Windows 7 customers to look for software compatible with Windows 7. R for Windows has been reviewed by Windows 7 Download and got 5 stars award: http://www.windows7download.com/win7-r-for-windows/snvrckjh.html Draw attention to your product by making it visible on webpage that will be used by people who are devoted to Windows 7. The number of Windows 7 customers will rise to huge quantity in a short period of time. Please publish Windows 7 Download award on your website by adding the following HTML code: 160 x 80: Windows 7 Download 120 x 60: Windows 7 Download Windows 7 compatible logo: R for Windows - Windows 7 compatible Text link: 5 Stars Awarded on Windows 7 Download Here are more images and options how link to us: http://www.windows7download.com/linktous.html Operation System Windows 7 is already looking forward to its rising popularity and this is the reason why you should not forgot this unique opportunity of rising interest of clients on software compatible with Windows 7. 
Highlight the facilities of your product by placing it on Windows 7 Download.com, which can help you to bring new valuable users. We're looking forward for further co-operation. Best regards, Windows 7 Download http://www.windows7download.com/ From tristan.linke at gmail.com Sun Jul 3 01:07:33 2011 From: tristan.linke at gmail.com (Tristan Linke) Date: Sun, 3 Jul 2011 00:07:33 +0100 Subject: [R] Simulating inhomogeneous Poisson process without loop Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From n.bowora at gmail.com Sun Jul 3 02:52:25 2011 From: n.bowora at gmail.com (EdBo) Date: Sat, 2 Jul 2011 17:52:25 -0700 (PDT) Subject: [R] Hint improve my code Message-ID: <1309654345875-3641354.post@n4.nabble.com> Hi I have developed the code below. I am worried that the parameters I want to be estimated are "not being found" when I ran my code. Is there a way I can code them so that R recognize that they should be estimated. This is the error I am getting. > out1=optim(llik,par=start.par) Error in pnorm(au_j, mean = b_j * R_m, sd = sigma_j) : object 'au_j' not found #Yet al_j,au_j,sigma_j and b_j are just estimates that balance the likelihood function? llik=function(R_j,R_m) if(R_j< 0) { sum[log(1/(2*pi*(sigma_j^2)))-(1/(2*(sigma_j^2))*(R_j+al_j-b_j*R_m))^2] }else if(R_j>0) { sum[log(1/(2*pi*(sigma_j^2)))-(1/(2*(sigma_j^2))*(R_j+au_j-b_j*R_m))^2] }else if(R_j==0) { sum(log(pnorm(au_j,mean=b_j*R_m,sd=sigma_j)-pnorm(al_j,mean=b_j*R_m,sd=sigma_j))) } start.par=c(al_j=0,au_j=0,sigma_j=0.01,b_j=1) out1=optim(par=start.par,llik) My Data R_j R_m 2e-03 0.026567295 3e-03 0.009798475 5e-02 0.008497274 -1e-02 0.012464578 -9e-04 0.002896023 9e-02 0.000879473 1e-02 0.003194435 6e-04 0.010281122 Thank you in advance. Edward UCT -- View this message in context: http://r.789695.n4.nabble.com/Hint-improve-my-code-tp3641354p3641354.html Sent from the R help mailing list archive at Nabble.com. 
From shbmira at gmail.com Sun Jul 3 06:51:37 2011 From: shbmira at gmail.com (Sergio Mira) Date: Sun, 03 Jul 2011 01:51:37 -0300 Subject: [R] Error with package xlsx Message-ID: <4E0FF559.6070408@gmail.com> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From bhh at xs4all.nl Sun Jul 3 06:59:43 2011 From: bhh at xs4all.nl (Berend Hasselman) Date: Sat, 2 Jul 2011 21:59:43 -0700 (PDT) Subject: [R] Hint improve my code In-Reply-To: <1309654345875-3641354.post@n4.nabble.com> References: <1309654345875-3641354.post@n4.nabble.com> Message-ID: <1309669183190-3641520.post@n4.nabble.com> EdBo wrote: > > Hi > > I have developed the code below. I am worried that the parameters I want > to be estimated are "not being found" when I ran my code. Is there a way I > can code them so that R recognize that they should be estimated. > > This is the error I am getting. > >> out1=optim(llik,par=start.par) > Error in pnorm(au_j, mean = b_j * R_m, sd = sigma_j) : > object 'au_j' not found > > #Yet al_j,au_j,sigma_j and b_j are just estimates that balance the > likelihood function? > > llik=function(R_j,R_m) > if(R_j< 0) > { > sum[log(1/(2*pi*(sigma_j^2)))-(1/(2*(sigma_j^2))*(R_j+al_j-b_j*R_m))^2] > }else if(R_j>0) > { > sum[log(1/(2*pi*(sigma_j^2)))-(1/(2*(sigma_j^2))*(R_j+au_j-b_j*R_m))^2] > }else if(R_j==0) > { > sum(log(pnorm(au_j,mean=b_j*R_m,sd=sigma_j)-pnorm(al_j,mean=b_j*R_m,sd=sigma_j))) > } > start.par=c(al_j=0,au_j=0,sigma_j=0.01,b_j=1) > out1=optim(par=start.par,llik) > > My Data > > R_j R_m > 2e-03 0.026567295 > 3e-03 0.009798475 > 5e-02 0.008497274 > -1e-02 0.012464578 > -9e-04 0.002896023 > 9e-02 0.000879473 > 1e-02 0.003194435 > 6e-04 0.010281122 > You are only passing the R_j and R_m data as argument to your function. The parameters that are to be used for the maximization must also be passed as arguments. This is clearly explained in the help page for optim. 
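A minimal sketch of that calling convention, using a made-up least-squares objective rather than the likelihood from the original post (the names sse, x and y and the simulated data are assumptions for illustration only): optim() hands the current parameter vector to the first argument of the function, and any extra named arguments are passed through unchanged.

```r
# Toy data: y is roughly 2 + 3 * x (simulated purely for illustration).
set.seed(1)
x <- seq(0, 1, length.out = 50)
y <- 2 + 3 * x + rnorm(50, sd = 0.1)

# The first argument receives the parameter vector from optim();
# x and y are forwarded through optim()'s '...' argument.
sse <- function(par, x, y) {
  a <- par[1]
  b <- par[2]
  sum((y - (a + b * x))^2)
}

fit <- optim(par = c(a = 0, b = 0), fn = sse, x = x, y = y)
fit$par          # estimates of a and b
fit$convergence  # 0 indicates successful completion
```

The same pattern applies to any objective: unpack the parameters from the first argument inside the function, and supply the data as named arguments in the optim() call.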
So your function should read

llik <- function(par, R_j, R_m) {
    al_j <- par[1]
    au_j <- par[2]
    sigma_j <- par[3]
    b_j <- par[4]
    if(R_j< 0) {
        sum(log(1/(2*pi*(sigma_j^2)))-(1/(2*(sigma_j^2))*(R_j+al_j-b_j*R_m))^2)
    } else if(R_j>0) {
        sum(log(1/(2*pi*(sigma_j^2)))-(1/(2*(sigma_j^2))*(R_j+au_j-b_j*R_m))^2)
    } else if(R_j==0) {
        sum(log(pnorm(au_j,mean=b_j*R_m,sd=sigma_j)-pnorm(al_j,mean=b_j*R_m,sd=sigma_j)))
    }
}

And then use optim as follows:

out1 <- optim(par=start.par,llik, R_j=R_j, R_m=R_m)

But this is not going to work as you want it to.
You'll get warnings:

1: In if (R_j < 0) { ... :
  the condition has length > 1 and only the first element will be used

R_j is a vector and if() takes a scalar (or vector of length 1).
So it is not at all clear what the purpose of the if statement is.

Berend

--
View this message in context: http://r.789695.n4.nabble.com/Hint-improve-my-code-tp3641354p3641520.html
Sent from the R help mailing list archive at Nabble.com.

From jwiley.psych at gmail.com Sun Jul 3 07:24:01 2011
From: jwiley.psych at gmail.com (Joshua Wiley)
Date: Sat, 2 Jul 2011 22:24:01 -0700
Subject: [R] Hint improve my code
In-Reply-To: <1309654345875-3641354.post@n4.nabble.com>
References: <1309654345875-3641354.post@n4.nabble.com>
Message-ID:

Hi Edward,

One hint to improve your code is simply stylistic. Everyone tends to
have their own favorite way, but adding spaces and line breaks can
help make code much easier to read. Also, you seemed to have used
brackets "[" instead of parentheses "(" for the first two calls to
sum().

As Berend suggested, some more clarification on what is supposed to be
happening will help you get more feedback. Lastly, although your data
is readable as is, a very convenient way to provide the list with data
is using the function, dput().
Here is a little example demonstrating its use: > x <- data.frame(x1 = c("c", "b", "a"), + x2 = factor(c("c", "b", "a")), stringsAsFactors = FALSE) > x ## print to console, two columns appear identical x1 x2 1 c c 2 b b 3 a a > ## but looking at the str()ucture, they are different > str(x) 'data.frame': 3 obs. of 2 variables: $ x1: chr "c" "b" "a" $ x2: Factor w/ 3 levels "a","b","c": 3 2 1 > ## use dput() to provide copy-and-pastable output > ## that others can use to access the same data as you easy peasy > dput(x) structure(list(x1 = c("c", "b", "a"), x2 = structure(c(3L, 2L, 1L), .Label = c("a", "b", "c"), class = "factor")), .Names = c("x1", "x2"), row.names = c(NA, -3L), class = "data.frame") > > ## now someone else is ready to go with your data > newx <- structure(list(x1 = c("c", "b", "a"), x2 = structure(c(3L, 2L, + 1L), .Label = c("a", "b", "c"), class = "factor")), .Names = c("x1", + "x2"), row.names = c(NA, -3L), class = "data.frame") > > str(newx) 'data.frame': 3 obs. of 2 variables: $ x1: chr "c" "b" "a" $ x2: Factor w/ 3 levels "a","b","c": 3 2 1 > And here is an example of how I might have written your function (stylistically speaking). I am not advocating this as the "best", "correct", or "ideal", way. Just giving another example that (I at least) find easier to read and make sense of. llik <- function(R_j, R_m) { if (R_j < 0) { sum(log(1 / (2 * pi * (sigma_j^2))) - (1 / (2 * (sigma_j^2))*(R_j + al_j - b_j * R_m))^2) } else if (R_j > 0) { sum(log(1 / (2 * pi * (sigma_j^2))) - (1 / (2 * (sigma_j^2)) * (R_j + au_j - b_j * R_m))^2) } else if (R_j == 0) { sum(log(pnorm(au_j, mean = b_j * R_m, sd = sigma_j) - pnorm(al_j, mean = b_j * R_m, sd = sigma_j))) } } Cheers, Josh On Sat, Jul 2, 2011 at 5:52 PM, EdBo wrote: > Hi > > I have developed the code below. I am worried that the parameters I want to > be estimated are "not being found" when I ran my code. Is there a way I can > code them so that R recognize that they should be estimated. 
> > This is the error I am getting. > >> out1=optim(llik,par=start.par) > Error in pnorm(au_j, mean = b_j * R_m, sd = sigma_j) : > ?object 'au_j' not found > > #Yet al_j,au_j,sigma_j and b_j are just estimates that balance the > likelihood function? > > llik=function(R_j,R_m) > if(R_j< 0) > { > sum[log(1/(2*pi*(sigma_j^2)))-(1/(2*(sigma_j^2))*(R_j+al_j-b_j*R_m))^2] > }else if(R_j>0) > { > sum[log(1/(2*pi*(sigma_j^2)))-(1/(2*(sigma_j^2))*(R_j+au_j-b_j*R_m))^2] > }else if(R_j==0) > { > sum(log(pnorm(au_j,mean=b_j*R_m,sd=sigma_j)-pnorm(al_j,mean=b_j*R_m,sd=sigma_j))) > } > start.par=c(al_j=0,au_j=0,sigma_j=0.01,b_j=1) > out1=optim(par=start.par,llik) > > My Data > > ? ? R_j ? ? ? ? R_m > ?2e-03 ? 0.026567295 > ?3e-03 ? 0.009798475 > ?5e-02 ? 0.008497274 > ?-1e-02 ? 0.012464578 > ?-9e-04 ? 0.002896023 > ?9e-02 ? 0.000879473 > ?1e-02 ? 0.003194435 > ?6e-04 ? 0.010281122 > > Thank you in advance. > > Edward > UCT > > -- > View this message in context: http://r.789695.n4.nabble.com/Hint-improve-my-code-tp3641354p3641354.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joshua Wiley Ph.D. 
Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

From Rainer.Schuermann at gmx.net Sun Jul 3 08:53:37 2011
From: Rainer.Schuermann at gmx.net (Rainer Schuermann)
Date: Sun, 03 Jul 2011 08:53:37 +0200
Subject: [R] How many times occurs
In-Reply-To:
References:
Message-ID: <2483451.pVaqqyxf0r@augeatur>

On Saturday 02 July 2011 21:40:24 Trying To learn again wrote:

Clumsy but it works (replace the bingo stuff with what you want to do next):

x <- as.matrix( read.table( "input.txt") )
xdim <- dim( x )
ix <- xdim[ 1 ]
jx <- xdim[ 2 ] - 2
bingo <- 0
for( i in 1:ix )
{
    for( j in 1:jx )
    {
        if( x[i,j] == 8 && x[i,j+1] == 9 && x[i,j+2] == 2 )
        {
            bingo <- bingo + 1
        }
    }
}
print( bingo )

I'm sure there are more elegant and efficient solutions!

Rgds,
Rainer

Here the matrix as dput( x ):

structure(c(8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 9L, 9L, 2L,
2L, 2L, 2L, 2L, 2L, 5L, 8L, 5L, 5L, 8L, 5L, 4L, 9L, 4L, 4L, 9L,
4L, 5L, 2L, 5L, 5L, 2L, 5L, 8L, 8L, 8L, 8L, 8L, 8L, 5L, 9L, 5L,
5L, 9L, 9L, 6L, 2L, 6L, 6L, 2L, 2L, 6L, 1L, 4L, 6L, 1L, 2L), .Dim = c(6L,
10L), .Dimnames = list(NULL, c("V1", "V2", "V3", "V4", "V5",
"V6", "V7", "V8", "V9", "V10")))

From Rainer.Schuermann at gmx.net Sun Jul 3 09:32:53 2011
From: Rainer.Schuermann at gmx.net (Rainer Schuermann)
Date: Sun, 03 Jul 2011 09:32:53 +0200
Subject: [R] How many times occurs
In-Reply-To:
References:
Message-ID: <3576298.QYAHqL2rXg@augeatur>

Sorry, I forgot to attach the original post, so here once more with a cosmetic change:

x <- as.matrix( read.table( "matr.txt") )
bingo <- 0
for( i in 1:dim( x )[1] )
{
    # parentheses are required: 1:dim( x )[2] - 2 would be parsed as (1:dim( x )[2]) - 2
    for( j in 1:( dim( x )[2] - 2 ) )
    {
        if( x[i,j] == 8 && x[i,j+1] == 9 && x[i,j+2] == 2 )
        {
            bingo <- bingo + 1
        }
    }
}
print( bingo )

Rgds,
Rainer

On Saturday 02 July 2011 21:40:24 Trying To learn again wrote:
> Hi all,
>
> I have a data matrix like in "input.txt"
>
> 8 9 2 5 4 5 8 5 6 6
> 8 9 2 8 9 2 8 9 2 1
> 8 9 2 5 4 5 8 5 6 4
> 8 9 2 5 4 5 8 5 6 6
> 8 9 2 8 9 2 8 9 2 1
> 8 9 2 5 4 5 8 9 2 2
>
>
> In this example will be an 6x10 matrix (or data frame) > > I want to detect how many times in a row appears this combination 8 follewd > by 9 followed by 2, and create a new matrix with only this number of occurs > then if this number is less than "n" I keep the row. For example in the > last row the number n will be 2 because "series" 8 9 2 appears 2 times in > the same row. > > I tried this, but doesn?t works....also tried other thinks but also the same > results: > > *dat<-read.table('input1.txt')* > ** > ** > *dat1 <- dat[ dat[,1]=8 & dat[,2]=9 & dat[,3]=2 ,]=1* > *dat2<-dat[(dat[,2]= 8 & dat[,3]=9 & dat[,4]=2),]=1* > *dat3<-dat[(dat[,5]=8 & dat[,4]=9 & dat[,5]=2),]=1* > *dat4<-dat[(dat[,4]=8 & dat[,5]=9 & dat[,6]=2),]=1* > *dat5<-dat[(dat[,5]=8 & dat[,6]=9 & dat[,7]=2),]=1* > *dat6<-dat[(dat[,6]=8 & dat[,7]=9 & dat[,8]=2),6]=1* > *dat7<-dat[(dat[,7]=8 & dat[,8]=9 & dat[,9]=2),7]=1* > *dat8<-dat[(dat[,8]=8 & dat[,9]=9 & dat[,10]=2),8]=1* > ** > datfinal<-dat1+da2+dat3+dat4+dat5+dat6+dat7+dat8 > > final2 <- dat[ rowSums(datfinal) < 2 , ] > > So my last matrix "final2" will be "dat" without the rows that doesn?t pass > the conditions. > > [[alternative HTML version deleted]] > dput( x ): structure(c(8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L, 9L, 9L, 2L, 2L, 2L, 2L, 2L, 2L, 5L, 8L, 5L, 5L, 8L, 5L, 4L, 9L, 4L, 4L, 9L, 4L, 5L, 2L, 5L, 5L, 2L, 5L, 8L, 8L, 8L, 8L, 8L, 8L, 5L, 9L, 5L, 5L, 9L, 9L, 6L, 2L, 6L, 6L, 2L, 2L, 6L, 1L, 4L, 6L, 1L, 2L), .Dim = c(6L, 10L), .Dimnames = list(NULL, c("V1", "V2", "V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10"))) From osama52 at gmail.com Sun Jul 3 10:48:34 2011 From: osama52 at gmail.com (osama hussien) Date: Sun, 3 Jul 2011 10:48:34 +0200 Subject: [R] covariance matrix TL-moments Message-ID: The backage lmoments computes the L-moments covariances. 
Does anyone know a package to compute the TL-moments covariances?

Thanks

--
Osama Abdelaziz Hussien
Department of Statistics
Faculty of Commerce
Alexandria University
Egypt

From stam_kiral at hotmail.com Sun Jul 3 09:48:08 2011
From: stam_kiral at hotmail.com (stamkiral)
Date: Sun, 3 Jul 2011 00:48:08 -0700 (PDT)
Subject: [R] help for hierarchial regression output
Message-ID: <1309679288724-3641615.post@n4.nabble.com>

Hi everybody,

I'm struggling with hierarchical regression analysis and need some
help interpreting the output. Consider this my hierarchical regression
analysis output:

http://www.hizliupload.com/img/88056685462955425818.jpg

I want to ask how to interpret the two-way interactions. Two-way
interaction terms appear in both block 2 and block 3 (block 3 also
contains the three-way interactions). In block 2 the self blame X
justice world belief interaction is significant (p=.014, marked in
green), but in block 3 this interaction is not significant. Now I'm
confused by this situation: which value is correct? Should I report
that there is a significant self blame X jwb two-way effect or not?

If my IVs are significant in the first block, should I say there is a
significant main effect or not? Or do I have to look at the last block
for all interactions and main effects?

thanks,
kahraman

--
View this message in context: http://r.789695.n4.nabble.com/help-for-hierarchial-regression-output-tp3641615p3641615.html
Sent from the R help mailing list archive at Nabble.com.

From tryingtolearnagain at gmail.com Sun Jul 3 09:27:35 2011
From: tryingtolearnagain at gmail.com (Trying To learn again)
Date: Sun, 3 Jul 2011 09:27:35 +0200
Subject: [R] How many times occurs
In-Reply-To:
References:
Message-ID:

An embedded and charset-unspecified text was scrubbed...
Name: not available URL: From dcgomuiri at gmail.com Sun Jul 3 11:13:37 2011 From: dcgomuiri at gmail.com (Daithi Murray) Date: Sun, 3 Jul 2011 17:13:37 +0800 Subject: [R] using Arial font Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From romanponce at hotmail.com Sun Jul 3 13:21:38 2011 From: romanponce at hotmail.com (Sergio Ivan Roman Ponce) Date: Sun, 3 Jul 2011 13:21:38 +0200 Subject: [R] PROBLEM IN R version 2.13.0 (2011-04-13) Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From romanpsergio at gmail.com Sun Jul 3 13:16:06 2011 From: romanpsergio at gmail.com (Sergio Ivan Roman ponce) Date: Sun, 3 Jul 2011 13:16:06 +0200 Subject: [R] PROBLEM IN R version 2.13.0 (2011-04-13) Message-ID: <000601cc3972$9b2797e0$d176c7a0$@gmail.com> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From murdoch.duncan at gmail.com Sun Jul 3 13:40:02 2011 From: murdoch.duncan at gmail.com (Duncan Murdoch) Date: Sun, 03 Jul 2011 07:40:02 -0400 Subject: [R] PROBLEM IN R version 2.13.0 (2011-04-13) In-Reply-To: References: Message-ID: <4E105512.8020402@gmail.com> On 11-07-03 7:21 AM, Sergio Ivan Roman Ponce wrote: > I am using R (version 2.13.0; 2011-04-13). Platform: x86_64-pc-mingw32/x64 > (64-bit). > > > > The messages that I see is > > > > ?This application has requested the Runtime to terminate it in an unusual > way. > > Please contact the application's support team for more information? > > > > I am using the following libraries: > >> library(package="gmodels") > >> library(package="gtools") > >> library(package="bitops") > >> library(package="caTools") > >> library(package="grid") > >> library(package="gdata") > > gdata: read.xls support for 'XLS' (Excel 97-2004) files ENABLED. > > > > gdata: read.xls support for 'XLSX' (Excel 2007+) files ENABLED. 
> > > > Attaching package: 'gdata' > > > > The following object(s) are masked from 'package:stats': > > > > nobs > > > > The following object(s) are masked from 'package:utils': > > > > object.size > > > >> library(package="gplots") > > > > Attaching package: 'gplots' > > > > The following object(s) are masked from 'package:stats': > > > > lowess > > > >> library(package="lattice") > >> library(package="MASS") > >> library(package="gregmisc") > > Mensajes de aviso perdidos > > > > The `gregmisc' *package* has converted into a *bundle* > > containing four sub-packages: gdata, gtools, gmodels, and gplots. > > Please load these packages directly. > >> library(package="asreml") > >> library(package="GraphAlignment") > > > > > > > > I am searching how to solve this problem. Isolate it, by cutting packages out of your list until the problem goes away. Find the minimal set of packages that trigger the crash. If you can get it down to just one package, then contact its maintainer. If it requires several, then post your script to load them, and someone else will see if the problem is reproducible. Duncan Murdoch From ggrothendieck at gmail.com Sun Jul 3 14:00:36 2011 From: ggrothendieck at gmail.com (Gabor Grothendieck) Date: Sun, 3 Jul 2011 08:00:36 -0400 Subject: [R] How many times occurs In-Reply-To: References: Message-ID: On Sat, Jul 2, 2011 at 3:40 PM, Trying To learn again wrote: > Hi all, > > I have a data matrix likein "input.txt" > > 8 9 2 5 4 5 8 5 6 6 > 8 9 2 8 9 2 8 9 2 1 > 8 9 2 5 4 5 8 5 6 4 > ?8 9 2 5 4 5 8 5 6 6 > 8 9 2 8 9 2 8 9 2 1 > 8 9 2 5 4 5 8 9 2 ? 2 > > > In this example will be an ?6x10 matrix (or data frame) > > I want to detect how many times in a row appears this combination ?8 follewd > by 9 followed by 2, and create a new matrix with only this number of occurs > then if this number is less than "n" I keep the row. For example in the > last row the number n will be 2 because "series" 8 9 2 appears 2 times in > the same row. 
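The count asked for above can also be sketched in base R, without extra packages. The helper name count_triple and its pattern argument are hypothetical; the matrix is the 6 x 10 example from the question, typed in row by row. The idea is to compare shifted column blocks of the whole matrix at once and sum the hits per row.

```r
# The 6 x 10 matrix from the question, entered row by row.
dat <- matrix(c(8, 9, 2, 5, 4, 5, 8, 5, 6, 6,
                8, 9, 2, 8, 9, 2, 8, 9, 2, 1,
                8, 9, 2, 5, 4, 5, 8, 5, 6, 4,
                8, 9, 2, 5, 4, 5, 8, 5, 6, 6,
                8, 9, 2, 8, 9, 2, 8, 9, 2, 1,
                8, 9, 2, 5, 4, 5, 8, 9, 2, 2),
              nrow = 6, byrow = TRUE)

# Hypothetical helper: count, per row, how often 'pattern' occurs
# in consecutive columns (assumes ncol(m) >= length(pattern)).
count_triple <- function(m, pattern = c(8, 9, 2)) {
  k <- length(pattern)
  starts <- seq_len(ncol(m) - k + 1)   # possible start columns
  hit <- matrix(TRUE, nrow(m), length(starts))
  for (p in seq_len(k)) {
    # keep only start positions whose p-th element matches pattern[p]
    hit <- hit & (m[, starts + p - 1, drop = FALSE] == pattern[p])
  }
  rowSums(hit)
}

counts <- count_triple(dat)
counts                                  # occurrences of 8 9 2 in each row
kept <- dat[counts < 2, , drop = FALSE] # rows with fewer than n = 2 occurrences
```

The threshold 2 in the last line stands in for the "n" of the question and can be changed freely.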
For each column of t(dat) apply the indicated function to successive
triples and then sum the resulting columns:

> library(zoo)
> colSums(rollapply(zoo(t(dat)), 3, function(x) all(x == c(8, 9, 2))))
[1] 1 3 1 1 3 2

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

From dwinsemius at comcast.net Sun Jul 3 14:24:02 2011
From: dwinsemius at comcast.net (David Winsemius)
Date: Sun, 3 Jul 2011 08:24:02 -0400
Subject: [R] Error with package xlsx
In-Reply-To: <4E0FF559.6070408@gmail.com>
References: <4E0FF559.6070408@gmail.com>
Message-ID: <9E9F228F-40E9-4DD5-9AE5-DDE4B73BF945@comcast.net>

On Jul 3, 2011, at 12:51 AM, Sergio Mira wrote:

> Hi,
>
> Could anyone help me with this error? I have no idea what that is
> about...
>
>> file <- system.file("DADOSCASUMCCORMICK.xlsx", package = "xlsx")
> > data <- read.xlsx(file, 1)
> Error in .jnew("java/io/FileInputStream", file) :
> java.io.FileNotFoundException: (No such file or directory)

Is there really a file by that name in your package directory? When
I do that on my machine, file is "" after the first command, which
is the documented behavior when the file cannot be found.

My guess is that you need to provide a path and append it to the
file name before sending a request to system.file. I'm also
suggesting that you don't really want to be saving your work files
in system directories, and that you should be using getwd() and
paste()-ing the result to your file name.

>
> Thanks!
> > -- > Regards, > || ------ > || Sergio Henrique Bento de Mira > || Computer Science | Class of 2008/2 David Winsemius, MD West Hartford, CT From dwinsemius at comcast.net Sun Jul 3 15:15:51 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Sun, 3 Jul 2011 09:15:51 -0400 Subject: [R] Error with package xlsx In-Reply-To: <9E9F228F-40E9-4DD5-9AE5-DDE4B73BF945@comcast.net> References: <4E0FF559.6070408@gmail.com> <9E9F228F-40E9-4DD5-9AE5-DDE4B73BF945@comcast.net> Message-ID: On Jul 3, 2011, at 8:24 AM, David Winsemius wrote: > > On Jul 3, 2011, at 12:51 AM, Sergio Mira wrote: > >> Hi, >> >> Could anyone help me with this error? I have no idea what that is >> about... >> >>> file <- system.file("DADOSCASUMCCORMICK.xlsx", package = "xlsx") >> > data <- read.xlsx(file, 1) >> Error in .jnew("java/io/FileInputStream", file) : >> java.io.FileNotFoundException: (No such file or directory) > > Is there really a file by that name in your package directory? When > I do that on my machine, file is "" after the first command, which > is the documented behavior when the file cannot be found. > > My guess is that you need to provide a path and append it to the > file name before sending a request to system.file. I'm also > suggestinging that you don't really want to be saving you work files > in system directories and you should be using getwd() and paste()- > ing to your file name. On re-reading this I see that it was unclear that I was suggesting _not_ using system.file for this task. Just use getwd() and paste to construct a path/file.ext. >> >> Thanks! 
>> >> -- >> Regards, >> || ------ >> || Sergio Henrique Bento de Mira >> || Computer Science | Class of 2008/2 > > David Winsemius, MD > West Hartford, CT > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT From jrkrideau at yahoo.ca Sun Jul 3 15:54:01 2011 From: jrkrideau at yahoo.ca (John Kane) Date: Sun, 3 Jul 2011 06:54:01 -0700 (PDT) Subject: [R] R for Windows - 5 stars award on Windows 7 Download In-Reply-To: <20110702205116.529F58D5F3C@ibm1.websys.sk> Message-ID: <1309701241.25116.YahooMailClassic@web38407.mail.mud.yahoo.com> SPAM? --- On Sat, 7/2/11, Windows 7 Download wrote: > From: Windows 7 Download > Subject: [R] R for Windows - 5 stars award on Windows 7 Download > To: "R Development Core Team" > Received: Saturday, July 2, 2011, 4:51 PM > Dear R Development Core Team > > We are more than happy that Windows 7 was launched after > long restless period of waiting. Due to this expected > moment, we prepared and launched new Windows 7 download > website that will be used by new Windows 7 customers to look > for software compatible with Windows 7. > > R for Windows has been reviewed by Windows 7 Download and > got 5 stars award: http://www.windows7download.com/win7-r-for-windows/snvrckjh.html > > Draw attention to your product by making it visible on > webpage that will be used by people who are devoted to > Windows 7. The number of Windows 7 customers will rise to > huge quantity in a short period of time. 
> > Please publish Windows 7 Download award on your website by > adding the following HTML code: > > 160 x 80: > target="_blank"> alt="Windows 7 Download" border="0"/> > > 120 x 60: > target="_blank"> alt="Windows 7 Download" border="0"/> > > Windows 7 compatible logo: > target="_blank"> alt="R for Windows - Windows 7 compatible" > border="0"/> > > Text link: > target="_blank">5 Stars Awarded on Windows 7 > Download > > Here are more images and options how link to us: http://www.windows7download.com/linktous.html > > Operation System Windows 7 is already looking forward to > its rising popularity and this is the reason why you should > not forgot this unique opportunity of rising interest of > clients on software compatible with Windows 7.? > Highlight the facilities of your product by placing it on > Windows 7 Download.com, which can help you to bring new > valuable users. > > We're looking forward for further co-operation. > > Best regards, > Windows 7 Download > > From ted.harding at wlandres.net Sun Jul 3 16:19:15 2011 From: ted.harding at wlandres.net ( (Ted Harding)) Date: Sun, 03 Jul 2011 15:19:15 +0100 (BST) Subject: [R] R for Windows - 5 stars award on Windows 7 Download In-Reply-To: <1309701241.25116.YahooMailClassic@web38407.mail.mud.yahoo.com> Message-ID: Yes, Spam. Though apparently not seriously malicious (but with some malicious potential). See: http://www.google.com/safebrowsing/diagnostic?site=windows7download.com With thanks to David Winsemius for tracking this down. Certainly not an official Windows 7 site (hosted in Slovakia by writers of poor English). Ted. On 03-Jul-11 13:54:01, John Kane wrote: > SPAM? 
-------------------------------------------------------------------- E-Mail: (Ted Harding) Fax-to-email: +44 (0)870 094 0861 Date: 03-Jul-11 Time: 15:19:12 ------------------------------ XFMail ------------------------------ From dwinsemius at comcast.net Sun Jul 3 17:30:07 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Sun, 3 Jul 2011 11:30:07 -0400 Subject: [R] using Arial font In-Reply-To: References: Message-ID: On Jul 3, 2011, at 5:13 AM, Daithi Murray wrote: > To whom it may concern, > > I am e-mailing you concerning the use of Arial Font in the program R > and I > am using a mac. > > I am trying to create graphs in R and the publisher I wish to > publish an > article with needs the font to be Arial. I have tried looking around > to find > out how to do this with no luck and most help on the topic is geared > towards > Linux users - the only help available is from the PLoS ONE site: > First thing I did was check to see if Arial was already in my list of postscriptFonts: names(postscriptFonts()) # Nope not there Then I used the Mac Font Book.app to see if it was on my system. It is. (And it's just an ordinary sans serif font, nothing special.) Then I right-clicked and asked to show it in the Finder. It is in / Macintosh HD/Library/Fonts/Microsoft/ and is named Arial.ttf rather than arial.ttf and the names of the italic, bold, and bold-italic faces have spaces and are also capitalized.
(Now, I cannot be sure that you will find such a font on your system. I have installed several MS products and it seems possible that this font is there only because of that. And I do not know what the licensing provisions are.) I then checked using Terminal.app to see if the utility program you were advised to use was on my system: david-winsemiuss-mac-pro:~ davidwinsemius$ locate ttf2afm # It seems to be present as a result of having installed the Tex package since all of the directories where it is found are descendants of /usr/local/texlive > Enable the use of Arial in R > > First, convert the Arial .ttf files to afm: This is implicitly telling you to do these actions from a system window such as you get with Terminal.app I wish I could report that my efforts at following these instructions worked but I get the creation of zero MB files in my ~/ directory > > ttf2afm /usr/share/fonts/msttcorefonts/arial.ttf > ~/arial.afm > ttf2afm /usr/share/fonts/msttcorefonts/ariali.ttf > ~/ariali.afm > ttf2afm /usr/share/fonts/msttcorefonts/arialbd.ttf > ~/arialbd.afm > ttf2afm /usr/share/fonts/msttcorefonts/arialbi.ttf > ~/arialbi.afm > > and then do the following in R: > > postscript(file="try.ps", horizontal=F, > onefile=F, > width=4, height=4, > family=c("/home/stephen/arial.afm", > "/home/stephen/arialbd.afm", > "/home/stephen/ariali.afm", > "/home/stephen/arialbi.afm"), > pointsize=12) > hist(rnorm(100)) > dev.off() > > However, I do not have the above files and do not know where to get > them. My advice. Use an available sans-serif font. The default sans serif font on the R Mac combo is Helvetica but you can ensure its use with.... > postscript(file="try.ps", horizontal=F, + onefile=F, + width=4, height=4, + family="Helvetica", + pointsize=12) > hist(rnorm(100)) > dev.off() null device 1 If the people at the other end want to use font substitution then that should be easily accomplished. 
There should be enough information in the ps file for their software to figure out that you used a 12-point sans-serif font. (This is really a question better posed on the R-Mac-SIG mailing list to which I am cross-posting and to which follow-ups should be directed .... without following up to rhelp.) > > Many thanks, > > D. M > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT From rose at rz.uni-potsdam.de Sun Jul 3 18:15:23 2011 From: rose at rz.uni-potsdam.de (rose at rz.uni-potsdam.de) Date: Sun, 03 Jul 2011 18:15:23 +0200 Subject: [R] Rmpi fails to install In-Reply-To: <1309456088.19092.6.camel@moose.ibmt.intern> References: <20110307215305.hppddvbncww4sw8o@webmail.uni-potsdam.de> <1952984.A666ioBvt6@localhost> <1309456088.19092.6.camel@moose.ibmt.intern> Message-ID: <20110703181523.c56gzrmqw4kgg4so@webmail.uni-potsdam.de> Quoting Juergen Rose : > Hi, > > I was just able to install the patched Rmpi on the second system with > openmpi-1.5.3. What can we that Rmpi_0.5-9a.tar.gz becomes a standard > CRAN package? > > Regards Juergen > Now there is a new problem. Testing openmpi-1.5.3-r1 and Rmpi_0.5-9a.tar.gz with the attached R script I got a Segmentation fault or the program hangs in the vicinity of closeCluster() and mpi.quit(), which does not happen with openmpi-1.4.3 and Rmpi_0.5-9.tar.gz: rose at orca:/home/rose/Txt/src/Test/R/Parallel(5)$ qlist -Iv openmpi sys-cluster/openmpi-1.5.3-r1 rose at orca:/home/rose/Txt/src/Test/R/Parallel(6)$ mpirun -np 4 -wdir . 
R --slave --vanilla -f mpi_test_parallel_mini.R [1] "Res=1.000 usrtime= 0.0 elatime= 0.0" [1] "Res=4.000 usrtime= 0.0 elatime= 0.0" [1] "Res=9.000 usrtime= 0.0 elatime= 0.0" [1] "Res=16.000 usrtime= 0.0 elatime= 0.0" [1] " Res=7.500 userTime= 0.0 elapTime= 0.0" [1] "before closeCluster()" [1] "after closeCluster()" [orca:29255] *** Process received signal *** [orca:29255] Signal: Segmentation fault (11) [orca:29255] Signal code: Address not mapped (1) [orca:29255] Failing at address: 0x7f8869c0e460 Speicherzugriffsfehler rose at caiman:/home/rose/Txt/src/Test/R/Parallel(47)$ qlist -Iv openmpi sys-cluster/openmpi-1.4.3 rose at caiman:/home/rose/Txt/src/Test/R/Parallel(48)$ mpirun -np 4 -wdir . R --slave --vanilla -f mpi_test_parallel_mini.R [1] "Res=1.000 usrtime= 0.0 elatime= 0.0" [1] "Res=4.000 usrtime= 0.0 elatime= 0.0" [1] "Res=9.000 usrtime= 0.0 elatime= 0.0" [1] "Res=16.000 usrtime= 0.0 elatime= 0.0" [1] " Res=7.500 userTime= 0.0 elapTime= 0.0" [1] "before closeCluster()" [1] "after closeCluster()" It seems, that the issue is not a special Rmpi problem, compare https://bugs.gentoo.org/show_bug.cgi?id=373923. From ligges at statistik.tu-dortmund.de Sun Jul 3 18:19:22 2011 From: ligges at statistik.tu-dortmund.de (Uwe Ligges) Date: Sun, 3 Jul 2011 18:19:22 +0200 Subject: [R] Speed Advice for R --- avoid data frames In-Reply-To: References: <4E0F6A65.8080101@statistik.tu-dortmund.de> Message-ID: <4E10968A.8060804@statistik.tu-dortmund.de> On 02.07.2011 21:35, ivo welch wrote: > hi uwe---thanks for the clarification. of course, my example should always > be done in vectorized form. I only used it to show how iterative access > compares in the simplest possible fashion.<100 accesses per seconds is > REALLY slow, though. > > I don't know R internals and the learning curve would be steep. moreover, > there is no guarantee that changes I would make would be accepted. so, I > cannot do this. > > however, for an R expert, this should not be too difficult. 
conceptually, > if data frame element access primitives are create/write/read/destroy in the > code, then it's truly trivial. just add a matrix (dim the same as the data > frame) of byte pointers to point at the storage upon creation/change time. > this would be quick-and-dirty. for curiosity, do you know which source > file has the data frame internals? maybe I will get tempted anyway if it is > simple enough. I think you should start by looking at the mechanisms used to construct data.frames (such as data.frame) and learn that data.frames are special lists. Then you may want to look at the differences between the .Primitive("[") and .Primitive("[<-") used for vectors (including vectors with dim attributes such as matrices) and the corresponding methods for data.frames: "[<-.data.frame" and "[.data.frame". After that, I doubt you will want to pursue this any further. Note also that data.frames can be pretty large and you really do not want to store a matrix of pointers as large as the data.frame. People working with large data.frames won't be happy with such a suggestion. If you want to follow up, I'd suggest moving the thread to R-devel, where it seems to be more appropriate. Best, Uwe > > (a more efficient but more involved way to do this would be to store a data > frame internally always as a matrix of data pointers, but this would > probably require more surgery.) > > It is also not as important for me, as it is for others...to give a good > impression to those that are not aware of the tradeoffs---which is most > people considering to adopt R. > > /iaw > > > ---- > Ivo Welch (ivo.welch at gmail.com) > > > > > 2011/7/2 Uwe Ligges > >> Some comments: >> >> the comparison matrix rows vs.
matrix columns is incorrect: Note that R has >> lazy evaluation, hence you construct your matrix in the timing for the rows >> and it is already constructed in the timing for the columns, hence you want >> to use: >> >> M<- matrix( rnorm(C*R), nrow=R ) >> D<- as.data.frame(matrix( rnorm(C*R), nrow=R ) ) >> example(M) >> example(D) >> >> Furthermore, you are correct in your statement that data.frame indexing is >> much slower, but if you can store your data in matrix form, just go on as it >> is. >> >> I doubt anybody is really going to make the index operation you cited >> within a loop. Then, with a data.frame, I can live with many vectorized >> replacements again: >> >>> system.time(D[,20]<- sqrt(abs(D[,20])) + rnorm(1000)) >> user system elapsed >> 0.01 0.00 0.01 >> >>> system.time(D[20,]<- sqrt(abs(D[20,])) + rnorm(1000)) >> user system elapsed >> 0.51 0.00 0.52 >> >> OK, it would be nice to do that faster, but this is not easy. I think R >> Core is happy to see contributions to make it faster without breaking >> existing features. >> >> >> >> Best wishes, >> Uwe >> >> >> >> >> On 02.07.2011 20:35, ivo welch wrote: >> >>> This email is intended for R users that are not that familiar with R >>> internals and are searching google about how to speed up R. >>> >>> Despite a common misperception, R is not slow when it comes to iterative >>> access. R is fast when it comes to matrices. R is very slow when it >>> comes to iterative access into data frames. Such access occurs when a >>> user uses "data$varname[index]", which is a very common operation.
To >>> illustrate, run the following program: >>> >>> R<- 1000; C<- 1000 >>> >>> example<- function(m) { >>> cat("rows: "); cat(system.time( for (r in 1:R) m[r,20]<- >>> sqrt(abs(m[r,20])) + rnorm(1) ), "\n") >>> cat("columns: "); cat(system.time(for (c in 1:C) m[20,c]<- >>> sqrt(abs(m[20,c])) + rnorm(1)), "\n") >>> if (is.data.frame(m)) { cat("df: columns as names: "); >>> cat(system.time(for (c in 1:C) m[[c]][20]<- sqrt(abs(m[[c]][20])) + >>> rnorm(1)), "\n") } >>> } >>> >>> cat("\n**** Now as matrix\n") >>> example( matrix( rnorm(C*R), nrow=R ) ) >>> >>> cat("\n**** Now as data frame\n") >>> example( as.data.frame( matrix( rnorm(C*R), nrow=R ) ) ) >>> >>> >>> The following are the reported timings under R 2.12.0 on a Mac Pro 3,1 >>> with ample RAM: >>> >>> matrix, columns: 0.01s >>> matrix, rows: 0.175s >>> data frame, columns: 53s >>> data frame, rows: 56s >>> data frame, names: 58s >>> >>> Data frame access is about 5,000 times slower than matrix column >>> access, and 300 times slower than matrix row access. R's data frame >>> operational speed is an amazing 40 data accesses per second. I have >>> not seen access numbers this low for decades. >>> >>> >>> How to avoid it? Not easy. One way is to create multiple matrices, >>> and group them as an object. Of course, this loses a lot of features >>> of R. Another way is to copy all data used in calculations out of the >>> data frame into a matrix, do the operations, and then copy them back. >>> Not ideal, either. >>> >>> In my opinion, this is an R design flaw. Data frames are the >>> fundamental unit of much statistical analysis, and should be fast. I >>> think R lacks any indexing into data frames. Turning on indexing of >>> data frames should at least be an optional feature. >>> >>> >>> I hope this message post helps others.
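A minimal sketch of the workaround described above (copy the data out of the data frame into a matrix, loop there, copy the result back), set next to a vectorized replacement of the kind suggested earlier in the thread. The data frame d, its size and the three helper names are made up for illustration only, and no timings are quoted because they depend on the machine and the R version.

```r
# Illustration only: 'd', its size and the helper names are hypothetical.
set.seed(42)
d <- as.data.frame(matrix(rnorm(500 * 5), nrow = 500))

# (1) Element-by-element replacement directly in the data frame.
#     Every assignment goes through the "[<-.data.frame" method.
loop_df <- function(d, col = 3) {
  for (r in seq_len(nrow(d))) d[r, col] <- sqrt(abs(d[r, col]))
  d
}

# (2) Copy into a matrix, loop there, copy the column back.
#     as.matrix() is only a faithful copy if all columns share one type.
loop_matrix <- function(d, col = 3) {
  m <- as.matrix(d)
  for (r in seq_len(nrow(m))) m[r, col] <- sqrt(abs(m[r, col]))
  d[[col]] <- as.vector(m[, col])
  d
}

# (3) Vectorized replacement: no explicit loop at all.
vectorized <- function(d, col = 3) {
  d[[col]] <- sqrt(abs(d[[col]]))
  d
}

t_df  <- system.time(res_df  <- loop_df(d))
t_mat <- system.time(res_mat <- loop_matrix(d))
t_vec <- system.time(res_vec <- vectorized(d))

# All three approaches should produce the same column.
stopifnot(isTRUE(all.equal(res_df[[3]], res_mat[[3]])),
          isTRUE(all.equal(res_df[[3]], res_vec[[3]])))

# Elapsed seconds for each approach on this machine.
print(c(data.frame = t_df[["elapsed"]],
        matrix     = t_mat[["elapsed"]],
        vectorized = t_vec[["elapsed"]]))
```

The matrix round trip only makes sense when the columns involved are all of one type; with mixed numeric and character columns as.matrix() would coerce everything to character.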
>>> >>> /iaw >>> >>> ---- >>> Ivo Welch (ivo.welch at gmail.com) >>> http://www.ivo-welch.info/ >>> >>> ______________________________**________________ >>> R-help at r-project.org mailing list >>> https://stat.ethz.ch/mailman/**listinfo/r-help >>> PLEASE do read the posting guide http://www.R-project.org/** >>> posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >>> >> > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From rose at rz.uni-potsdam.de Sun Jul 3 18:19:24 2011 From: rose at rz.uni-potsdam.de (rose at rz.uni-potsdam.de) Date: Sun, 03 Jul 2011 18:19:24 +0200 Subject: [R] Rmpi fails to install In-Reply-To: <20110703181523.c56gzrmqw4kgg4so@webmail.uni-potsdam.de> References: <20110307215305.hppddvbncww4sw8o@webmail.uni-potsdam.de> <1952984.A666ioBvt6@localhost> <1309456088.19092.6.camel@moose.ibmt.intern> <20110703181523.c56gzrmqw4kgg4so@webmail.uni-potsdam.de> Message-ID: <20110703181924.di1wv8ro5c0s4kcw@webmail.uni-potsdam.de> Quoting rose at rz.uni-potsdam.de: > Quoting Juergen Rose : > >> Hi, >> >> I was just able to install the patched Rmpi on the second system with >> openmpi-1.5.3. What can we that Rmpi_0.5-9a.tar.gz becomes a standard >> CRAN package? >> >> Regards Juergen >> > > Now there is a new problem. Testing openmpi-1.5.3-r1 and > Rmpi_0.5-9a.tar.gz with the attached R script I got a Segmentation > fault or the program hangs in the vicinity of closeCluster() and > mpi.quit(), which does not happen with openmpi-1.4.3 and > Rmpi_0.5-9.tar.gz: I forget to attach the R script. Now it should be attached. -------------- next part -------------- ####################################################### ## j.r. 
02-Jul-11 ## ####################################################### suppressPackageStartupMessages(library('doMPI', quiet=TRUE)) suppressPackageStartupMessages(library('foreach', quiet=TRUE)) # create and register a doMPI cluster if necessary if (!identical(getDoParName(), 'doMPI')) { cl <- startMPIcluster() registerDoMPI(cl) } #logfile="mpi_test_parallel_mini.log" #sink(file=logfile,split=TRUE) Date <- system("date",intern=TRUE) Ncyc <- 4 resVec <- numeric(Ncyc) pto.T <- proc.time() pRes <- foreach(i = 1:Ncyc) %dopar% { pto <- proc.time() ptn <- proc.time(); dt <- ptn - pto; ut <- dt[1]; st <- dt[2]; et <- dt[3] res <- i*i list(Res=res, Ut=ut, St=st, Et=et) } for (i in 1:Ncyc) { resVec[i] <- pRes[[i]]$Res print(sprintf("Res=%5.3f usrtime=%6.1f elatime=%6.1f",pRes[[i]]$Res,pRes[[i]]$Ut,pRes[[i]]$Et)) } ptn.T <- proc.time(); dt <- ptn.T - pto.T; ut <- dt[1]; st <- dt[2]; et <- dt[3] meanRes<-mean(resVec) print(sprintf(" Res=%5.3f userTime=%5.1f elapTime=%5.1f",meanRes,ut,et)) #sink() print("before closeCluster()") closeCluster(cl) print("after closeCluster()") mpi.quit() From assiehrashidi at yahoo.com Sun Jul 3 18:33:50 2011 From: assiehrashidi at yahoo.com (Assieh Rashidi) Date: Sun, 3 Jul 2011 09:33:50 -0700 (PDT) Subject: [R] semi correlation Message-ID: <1309710830.80827.YahooMailNeo@web46404.mail.sp1.yahoo.com> An embedded and charset-unspecified text was scrubbed... 
Name: not available URL: From bhh at xs4all.nl Sun Jul 3 19:22:13 2011 From: bhh at xs4all.nl (Berend Hasselman) Date: Sun, 3 Jul 2011 10:22:13 -0700 (PDT) Subject: [R] optim() In-Reply-To: <1309711512722-3642204.post@n4.nabble.com> References: <1309711512722-3642204.post@n4.nabble.com> Message-ID: <1309713733868-3642254.post@n4.nabble.com> Majdi wrote: > > The code that I am using is given by > > # compute the MLE for Normal(mu, var) > ####################################### > > U<-rnorm(10,mean=1, sd=2) > Normal.lik<-function(theta, U){ > mu=theta[1] > var=theta[2] > n<-nrow(U) > logl<- -0.5*n*log(var)-(sum((U-mu)**2))/(2*var) > return(-logl) > } > optim(c(0,1),Normal.lik,U=U,method="BFGS") > > U is not a matrix and therefore does not have rows. Use n <- length(U). Berend -- View this message in context: http://r.789695.n4.nabble.com/optim-tp3642204p3642254.html Sent from the R help mailing list archive at Nabble.com. From anopheles123 at gmail.com Sun Jul 3 19:28:45 2011 From: anopheles123 at gmail.com (Weidong Gu) Date: Sun, 3 Jul 2011 13:28:45 -0400 Subject: [R] How many times occurs In-Reply-To: References: Message-ID: Another way of thinking is to turn the sequences into strings. If your data is db: >apply(db,1,function(x) length(gregexpr('892',paste(c(x),collapse=''))[[1]])) [1] 1 3 1 1 3 2 Weidong Gu On Sat, Jul 2, 2011 at 3:40 PM, Trying To learn again wrote: > Hi all, > > I have a data matrix like in "input.txt" > > 8 9 2 5 4 5 8 5 6 6 > 8 9 2 8 9 2 8 9 2 1 > 8 9 2 5 4 5 8 5 6 4 > 8 9 2 5 4 5 8 5 6 6 > 8 9 2 8 9 2 8 9 2 1 > 8 9 2 5 4 5 8 9 2 2 > > > In this example will be a 6x10 matrix (or data frame) > > I want to detect how many times in a row appears this combination 8 followed > by 9 followed by 2, and create a new matrix with only this number of occurs > then if this number is less than "n" I keep the row. For example in the > last row the number n will be 2 because "series" 8 9 2 appears 2 times in > the same row.
> > I tried this, but it doesn't work....also tried other things but also the same > results: > > dat<-read.table('input1.txt') > > > dat1 <- dat[ dat[,1]=8 & dat[,2]=9 & dat[,3]=2 ,]=1 > dat2<-dat[(dat[,2]= 8 & dat[,3]=9 & dat[,4]=2),]=1 > dat3<-dat[(dat[,5]=8 & dat[,4]=9 & dat[,5]=2),]=1 > dat4<-dat[(dat[,4]=8 & dat[,5]=9 & dat[,6]=2),]=1 > dat5<-dat[(dat[,5]=8 & dat[,6]=9 & dat[,7]=2),]=1 > dat6<-dat[(dat[,6]=8 & dat[,7]=9 & dat[,8]=2),6]=1 > dat7<-dat[(dat[,7]=8 & dat[,8]=9 & dat[,9]=2),7]=1 > dat8<-dat[(dat[,8]=8 & dat[,9]=9 & dat[,10]=2),8]=1 > > datfinal<-dat1+da2+dat3+dat4+dat5+dat6+dat7+dat8 > > final2 <- dat[ rowSums(datfinal) < 2 , ] > > So my last matrix "final2" will be "dat" without the rows that don't pass > the conditions. > > [[alternative HTML version deleted]] > > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
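A small sketch of the string-based counting idea from this thread, written out on the six example rows. One point worth guarding against: gregexpr() reports -1 rather than an empty result when a row contains no match, so taking length() of its result alone would count such a row as 1. The object names db, count_pattern and n_hits are illustrative, and the approach assumes every cell holds a single digit (multi-digit values would blur together once pasted).

```r
# The six example rows from the question, typed in by hand (6 x 10).
db <- matrix(c(8,9,2,5,4,5,8,5,6,6,
               8,9,2,8,9,2,8,9,2,1,
               8,9,2,5,4,5,8,5,6,4,
               8,9,2,5,4,5,8,5,6,6,
               8,9,2,8,9,2,8,9,2,1,
               8,9,2,5,4,5,8,9,2,2),
             nrow = 6, byrow = TRUE)

# Count occurrences of a digit pattern in one row.
# Assumes single-digit cells, so that pasting them gives one character each.
count_pattern <- function(x, pattern = "892") {
  hits <- gregexpr(pattern, paste(x, collapse = ""), fixed = TRUE)[[1]]
  if (hits[1] == -1) 0L else length(hits)   # -1 means "no match at all"
}

n_hits <- apply(db, 1, count_pattern)
n_hits
# 1 3 1 1 3 2

# Keep only the rows where the pattern occurs fewer than n times.
n <- 2
final2 <- db[n_hits < n, , drop = FALSE]
```

With n = 2 this keeps rows 1, 3 and 4 of the example; a row without any 8 9 2 run is counted as 0 and is therefore kept as well.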
> > From dwinsemius at comcast.net Sun Jul 3 20:08:58 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Sun, 3 Jul 2011 14:08:58 -0400 Subject: [R] For help in R coding In-Reply-To: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBED@KCL-MAIL01.kclad.ds.kcl.ac.uk> References: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE0@KCL-MAIL01.kclad.ds.kcl.ac.uk> <813856BE-5DD3-4708-B1FA-2611EBAD8C61@comcast.net> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE1@KCL-MAIL01.kclad.ds.kcl.ac.uk> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE3@KCL-MAIL01.kclad.ds.kcl.ac.uk> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE5@KCL-MAIL01.kclad.ds.kcl.ac.uk> <383EF813-24EA-42F3-B514-AB9CF8060BAC@comcast.net> , <676B7003-AFFE-4CF1-9CC8-FB1D6953B761@comcast.net> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE9@KCL-MAIL01.kclad.ds.kcl.ac.uk>, <10E80E91-DE87-431A-8E41-50CDAFB73A4D@comcast.net> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBED@KCL-MAIL01.kclad.ds.kcl.ac.uk> Message-ID: On Jul 3, 2011, at 1:07 PM, Bansal, Vikas wrote: > Yes you are right. unlist operation is unnecessary and I have tried > it yesterday and it is working without that operation also.But I > have one more problem on which I have worked whole day but did not > get any solution.As I told you I am new to R,I want to ask that how > I can use the (if condition) in the following code > > df=read.table("Case2.pileup",fill=T,sep="\t",colClasses = "character") > txtvec <- readLines(textConnection(df[,9])) > dad=data.frame(A = (sapply(gregexpr("A|a", (df[,9])), function(x) if > ( x[[1]] != -1) > length(x) else 0 )), > C = (sapply(gregexpr("C|c", (df[,9])), function(x) if ( x[[1]] != -1) > length(x) else 0 )), > G = (sapply(gregexpr("G|g", (df[,9])), function(x) if ( x[[1]] != -1) > length(x) else 0 )), > T = (sapply(gregexpr("T|t", (df[,9])), function(x) if ( x[[1]] != -1) > length(x) else 0 )), > N = (sapply(gregexpr("\\,|\\.", (df[,9])), function(x) if ( x[[1]] ! 
> = -1) > length(x) else 0 ))) > > > Now my problem is in my data frame I have alphabets A,C,G and T in > 3rd column also.Now these commas (,)and dots(.) in column 9 are for > these alphabets which are in column 3.I want to use if condition > like this > > if in my dataframe column 3 have A then A = (sapply(gregexpr("\\,|\ > \.", (df[,9])), function(x) if ( x[[1]] != -1) > length(x) else 0 ))) else (A = (sapply(gregexpr("A|a", (df[,9])), > function(x) if ( x[[1]] != -1) > length(x) else 0 )),if in my dataframe column 3 haveCA then C = > (sapply(gregexpr("\\,|\\.", (df[,9])), function(x) if ( x[[1]] != -1) > length(x) else 0 ))) else C = (sapply(gregexpr("C|c", (df[,9])), > function(x) if ( x[[1]] != -1) > length(x) else 0 )), if in my dataframe column 3 have G then G = > (sapply(gregexpr("\\,|\\.", (df[,9])), function(x) if ( x[[1]] != -1) > length(x) else 0 ))) else G = (sapply(gregexpr("G|g", (df[,9])), > function(x) if ( x[[1]] != -1) > length(x) else 0 )) if in my dataframe column 3 have T then T = > (sapply(gregexpr("\\,|\\.", (df[,9])), function(x) if ( x[[1]] != -1) > length(x) else 0 ))) else T = (sapply(gregexpr("T|t", (df[,9])), > function(x) if ( x[[1]] != -1) > length(x) else 0 )), > > > So I want to code so that it will give the output like this- > > DATA FRAME (Input) > > col3 col 9 > T .a,g,, > A .t,t,, > A .,c,c, > C .,a,,, > G .,t,t,t > A .c,,g,^!. > A .g,ggg.^!, > A .$,,,,,., > C a,g,,t, > T ,,,,,.,^!. > T ,$,,,,.,." > > > output > > A C G T > 1 0 1 4 > 4 0 0 2 > 4 2 0 0 > 1 5 0 0 > 0 0 4 3 > > > > This is the output for first five rows. I was unable to follow the logic, and because the complete output was not offered, I am unable to check my guesses against your full specifications. -- David. > > > > Can you please help me how to use this if condition in your coding > or we can also do it by using some other condition rather than if > condition?
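One possible reading of the request, offered only as a sketch: for each row, the dots and commas in column 9 are credited to the base named in column 3, and every other base is counted from its own letters (upper or lower case). This is a guess at the intended rule, checked only against the five output rows given in the question. The vectors ref and read, the helper count_hits and the result counts are illustrative names; a real pileup column carries more markup (for example the character that follows "^") which this sketch does not treat specially.

```r
# First five rows of the example in the question (column 3 and column 9).
ref  <- c("T", "A", "A", "C", "G")
read <- c(".a,g,,", ".t,t,,", ".,c,c,", ".,a,,,", ".,t,t,t")

# Number of matches of 'pattern' in each element of 'x'.
# gregexpr() returns -1 when there is no match, hence the explicit check.
count_hits <- function(pattern, x) {
  vapply(gregexpr(pattern, x),
         function(m) if (m[1] == -1) 0L else length(m),
         integer(1))
}

# Dots and commas stand for the reference base of that row.
n_ref <- count_hits("[.,]", read)

bases  <- c("A", "C", "G", "T")
counts <- sapply(bases, function(b) {
  n_letter <- count_hits(b, toupper(read))   # explicit letters, case-insensitive
  ifelse(ref == b, n_ref, n_letter)          # reference base: use dots and commas
})
counts
# rows: 1 0 1 4 / 4 0 0 2 / 4 2 0 0 / 1 5 0 0 / 0 0 4 3
```

The vectorized ifelse() plays the role of the row-by-row "if condition" asked about, so no explicit loop over rows is needed.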
> > > > > > > > > > > > > ________________________________________ > From: David Winsemius [dwinsemius at comcast.net] > Sent: Sunday, July 03, 2011 3:57 AM > To: Bansal, Vikas > Cc: Dennis Murphy; r-help at r-project.org > Subject: Re: [R] For help in R coding > > On Jul 2, 2011, at 4:46 PM, Bansal, Vikas wrote: > >> DEAR ALL, >> I TRIED THIS CODE AND THIS IS RUNNING PERFECTLY... >> >> df=read.table("Case2.pileup",fill=T,sep="\t",colClasses = >> "character") >> txt=df[,9] >> txtvec <- readLines(textConnection(txt)) >> dad=data.frame(A = unlist(sapply(gregexpr("A|a", txtvec), >> function(x) if ( x[[1]] != -1) >> length(x) else 0 )), >> C = unlist(sapply(gregexpr("C|c", txtvec), function(x) if ( x[[1]] ! >> = -1) >> length(x) else 0 )), >> G = unlist(sapply(gregexpr("G|g", txtvec), function(x) if ( x[[1]] ! >> = -1) >> length(x) else 0 )), >> T = unlist(sapply(gregexpr("T|t", txtvec), function(x) if ( x[[1]] ! >> = -1) >> length(x) else 0 )), >> N = unlist(sapply(gregexpr("\\,|\\.", txtvec), function(x) if >> ( x[[1]] != -1) >> length(x) else 0 ))) >> > > The unlist operation is unnecessary since the sapply operation returns > a vector. (It doesn't hurt, but it is unnecessary.) >> >> >> >> >> Thanking you, >> Warm Regards >> Vikas Bansal >> Msc Bioinformatics >> Kings College London >> ________________________________________ >> From: David Winsemius [dwinsemius at comcast.net] >> Sent: Saturday, July 02, 2011 9:04 PM >> To: Dennis Murphy >> Cc: r-help at r-project.org; Bansal, Vikas >> Subject: Re: [R] For help in R coding >> >> On reflection and a bit of testing I think the best approach would be >> to use gregexpr. For counting the number of commas, this appears >> quite >> straightforward. >> >>> sapply(gregexpr("\\,", txtvec), function(x) if ( x[[1]] != -1) >> length(x) else 0 ) >> [1] 3 3 3 4 3 3 2 6 4 6 6 >> >> It easily generalizes to period and the `|` (or) operation on >> letters. 
>> (I did need to add the check since the length of the gregexpr result is always at >> least one, but it has value -1 when there is no match.) >> >>> sapply(gregexpr("t|T", txtvec), function(x) if ( x[[1]] != -1) >> length(x) else 0 ) >> [1] 0 2 0 0 3 0 0 0 1 0 0 >> >> >> On Jul 2, 2011, at 3:22 PM, Dennis Murphy wrote: >> >>> Hi: >>> >>> There seems to be a problem if the string ends in , or . , which >>> makes >>> it difficult for strsplit() to pick up if it is splitting on those >>> characters. Here is an alternative, splitting on individual >>> characters >>> and using charmatch() instead: >>> >>> charsum <- function(s, char) { >>> u <- strsplit(s, "") >>> sum(sapply(u, function(x) charmatch(x, char)), na.rm = TRUE) >>> } >>> >>> unname(sapply(txtvec, function(x) charsum(x, ','))) >>> unname(sapply(txtvec, function(x) charsum(x, '.'))) >>> >>> Putting this into a data frame, >>> >>> dfout <- data.frame(periods = unname(sapply(txtvec, function(x) >>> charsum(x, '.'))), >>> commas = unname(sapply(txtvec, >>> function(x) charsum(x, ','))) ) >>> txtvec >>> >>> HTH, >>> Dennis >>> >>> On Sat, Jul 2, 2011 at 10:19 AM, David Winsemius >>> wrote: >>>> >>>> On Jul 2, 2011, at 12:34 PM, Bansal, Vikas wrote: >>>> >>>>> >>>>> >>>>>>> Dear all, >>>>>>> >>>>>>> I am doing a project on variant calling using R.I am working on >>>>>>> pileup file.There are 10 columns in my data frame and I want to >>>>>>> count the number of A,C,G and T in each row for column 9.example >>>>>>> of >>>>>>> column 9 is given below- >>>>>>> >>>>>>> .a,g,, >>>>>>> .t,t,, >>>>>>> .,c,c, >>>>>>> .,a,,, >>>>>>> .,t,t,t >>>>>>> .c,,g,^!. >>>>>>> .g,ggg.^!, >>>>>>> .$,,,,,., >>>>>>> a,g,,t, >>>>>>> ,,,,,.,^!. >>>>>>> ,$,,,,.,. >>>>>>> >>>>>>> This is a bit confusing for me as these characters are in one >>>>>>> column >>>>>>> and how can we scan them for each row to print number of A,C,G >>>>>>> and T >>>>>>> for each row.
>>>>>> >>>>>> Seems a bit clunky but this does the job (first the data): >>>>>>> >>>>>>> txt <- " .a,g,, >>>>>> >>>>>> + .t,t,, >>>>>> + .,c,c, >>>>>> + .,a,,, >>>>>> + .,t,t,t >>>>>> + .c,,g,^!. >>>>>> + .g,ggg.^!, >>>>>> + .$,,,,,., >>>>>> + a,g,,t, >>>>>> + ,,,,,.,^!. >>>>>> + ,$,,,,.,." >>>>>> >>>>>>> txtvec <- readLines(textConnection(txt)) >>>>>> >>>>>> Now the clunky solution, Basically subtracts 1 from the counts of >>>>>> "fragments" that result from splitting on each letter in turn. >>>>>> Could >>>>>> be made prettier with a function that did the job. >>>>>> >>>>>>> data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit, >>>>>> >>>>>> split="a"), length) , "-", 1)), >>>>>> + C = unlist(lapply( lapply( sapply(txtvec, strsplit, split="c"), >>>>>> length) , "-", 1)), >>>>>> + G = unlist(lapply( lapply( sapply(txtvec, strsplit, split="g"), >>>>>> length) , "-", 1)), >>>>>> + T = unlist(lapply( lapply( sapply(txtvec, strsplit, split="t"), >>>>>> length) , "-", 1)) ) >>>>>> A C G T >>>>>> .a,g,, 1 0 1 0 >>>>>> .t,t,, 0 0 0 2 >>>>>> .,c,c, 0 2 0 0 >>>>>> .,a,,, 1 0 0 0 >>>>>> .,t,t,t 0 0 0 2 >>>>>> .c,,g,^!. 0 1 1 0 >>>>>> .g,ggg.^!, 0 0 4 0 >>>>>> .$,,,,,., 0 0 0 0 >>>>>> a,g,,t, 1 0 1 1 >>>>>> ,,,,,.,^!. 0 0 0 0 >>>>>> ,$,,,,.,. 0 0 0 0 >>>>>> >>>>>> Has the advantage that the input data ends up as rownames, which >>>>>> was a >>>>>> surprise. >>>>>> >>>>>> If you wanted to count "A" and "a" as equivalent, then the split >>>>>> argument should be "a|A" >>>>>> >>>>>> >>>>> >>>>>>> AS YOU MENTIONED THAT IF I WANT TO COUNT A AND a I SHOULD SPLIT >>>>>>> LIKE >>>>>>> THIS. >>>>> >>>>> BUT CAN I COUNT . AND , ALSO USING- >>>>> data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit, >>>>> split=".|,"), length) , "-", 1)), >>>>> >>>>> I TRIED IT BUT ITS NOT WORKING.IT IS GIVING THE OUTPUT BUT AT SOME >>>>> PLACES >>>>> IT IS SHOWING MORE NUMBER OF . AND , AND SOMEWHERE IT IS NOT EVEN >>>>> CALCULATING AND JUST SHOWING 0. 
>>>> >>>> You need to use valid regex expressions for 'split'. Since "." and >>>> "," are >>>> special characters they need to be escaped when you wnat the >>>> literals to be >>>> recognized as such. >>>> >>>> I haven't figured out why but you need to drop the final operation >>>> of >>>> subtracting 1 from the values when counting commas: >>>> >>>> data.frame(periods = unlist(lapply( lapply( sapply(txtvec, >>>> strsplit, >>>> split="\\."), length) , "-", 1)) >>>> ,commas = unlist( lapply( sapply(txtvec, strsplit, >>>> split="\\,"), length) ) ) >>>> periods commas >>>> .a,g,, 1 3 >>>> .t,t,, 1 3 >>>> .,c,c, 1 3 >>>> .,a,,, 1 4 >>>> .,t,t,t 1 4 >>>> .c,,g,^!. 1 4 >>>> .g,ggg.^!, 2 2 >>>> .$,,,,,., 2 6 >>>> a,g,,t, 0 4 >>>> ,,,,,.,^!. 1 7 >>>> ,$,,,,.,. 1 7 >>>> >>>> -- >>>> >>>> David Winsemius, MD >>>> West Hartford, CT >>>> >>>> ______________________________________________ >>>> R-help at r-project.org mailing list >>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>>> and provide commented, minimal, self-contained, reproducible code. >>>> >> >> David Winsemius, MD >> West Hartford, CT >> > > David Winsemius, MD > West Hartford, CT > David Winsemius, MD West Hartford, CT From pdalgd at gmail.com Sun Jul 3 20:52:38 2011 From: pdalgd at gmail.com (peter dalgaard) Date: Sun, 3 Jul 2011 20:52:38 +0200 Subject: [R] Reverse legend label order in barplot In-Reply-To: <0622CAA1-5D8F-42AF-AFCE-66B6AE4FE702@me.com> References: <000001cc3849$1abe9f10$503bdd30$@ubc.ca> <0622CAA1-5D8F-42AF-AFCE-66B6AE4FE702@me.com> Message-ID: <0C085543-4943-41B2-9CB2-D953F348C8AE@gmail.com> On Jul 2, 2011, at 02:14 , Marc Schwartz wrote: > > > Alternatively, You can use legend() separately to add the legend to the barplot in the fashion that you desire. > > Thus: > > barplot(mat) > > legend("topright", legend = rownames(mat), > fill = grey.colors(nrow(mat))) > Yup. 
One thing: I'd future-proof it, making double-damn-sure that it is actually the same colors in the plot and the legend by going clr <- grey.colors(nrow(mat)) barplot(mat, col = clr) legend("topright", legend = rownames(mat), fill = clr) -- Peter Dalgaard Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com From vikas.bansal at kcl.ac.uk Sun Jul 3 19:07:21 2011 From: vikas.bansal at kcl.ac.uk (Bansal, Vikas) Date: Sun, 3 Jul 2011 18:07:21 +0100 Subject: [R] For help in R coding In-Reply-To: <10E80E91-DE87-431A-8E41-50CDAFB73A4D@comcast.net> References: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE0@KCL-MAIL01.kclad.ds.kcl.ac.uk> <813856BE-5DD3-4708-B1FA-2611EBAD8C61@comcast.net> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE1@KCL-MAIL01.kclad.ds.kcl.ac.uk> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE3@KCL-MAIL01.kclad.ds.kcl.ac.uk> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE5@KCL-MAIL01.kclad.ds.kcl.ac.uk> <383EF813-24EA-42F3-B514-AB9CF8060BAC@comcast.net> , <676B7003-AFFE-4CF1-9CC8-FB1D6953B761@comcast.net> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE9@KCL-MAIL01.kclad.ds.kcl.ac.uk>, <10E80E91-DE87-431A-8E41-50CDAFB73A4D@comcast.net> Message-ID: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBED@KCL-MAIL01.kclad.ds.kcl.ac.uk> Yes you are right. 
The unlist operation is unnecessary; I tried it yesterday and the code works without it. But I have one more problem that I have worked on all day without finding a solution. As I told you, I am new to R. I want to ask how I can use an if condition in the following code:

df <- read.table("Case2.pileup", fill = TRUE, sep = "\t", colClasses = "character")
txtvec <- readLines(textConnection(df[,9]))
dad <- data.frame(
  A = sapply(gregexpr("A|a", df[,9]), function(x) if (x[[1]] != -1) length(x) else 0),
  C = sapply(gregexpr("C|c", df[,9]), function(x) if (x[[1]] != -1) length(x) else 0),
  G = sapply(gregexpr("G|g", df[,9]), function(x) if (x[[1]] != -1) length(x) else 0),
  T = sapply(gregexpr("T|t", df[,9]), function(x) if (x[[1]] != -1) length(x) else 0),
  N = sapply(gregexpr("\\,|\\.", df[,9]), function(x) if (x[[1]] != -1) length(x) else 0))

Now my problem is that my data frame also has the letters A, C, G and T in column 3. The commas (,) and dots (.) in column 9 stand for the letter that is in column 3. I want to use an if condition like this:

if column 3 of my data frame has A then
    A = sapply(gregexpr("\\,|\\.", df[,9]), function(x) if (x[[1]] != -1) length(x) else 0)
else
    A = sapply(gregexpr("A|a", df[,9]), function(x) if (x[[1]] != -1) length(x) else 0)

if column 3 of my data frame has C then
    C = sapply(gregexpr("\\,|\\.", df[,9]), function(x) if (x[[1]] != -1) length(x) else 0)
else
    C = sapply(gregexpr("C|c", df[,9]), function(x) if (x[[1]] != -1) length(x) else 0)

if column 3 of my data frame has G then
    G = sapply(gregexpr("\\,|\\.", df[,9]), function(x) if (x[[1]] != -1) length(x) else 0)
else
    G = sapply(gregexpr("G|g", df[,9]), function(x) if (x[[1]] != -1) length(x) else 0)

if column 3 of my data frame has T then
    T = sapply(gregexpr("\\,|\\.", df[,9]), function(x) if (x[[1]] != -1) length(x) else 0)
else
    T = sapply(gregexpr("T|t", df[,9]), function(x) if (x[[1]] != -1) length(x) else 0)

So I want code that gives output like this:

DATA FRAME (input)

col3  col9
T     .a,g,,
A     .t,t,,
A     .,c,c,
C     .,a,,,
G     .,t,t,t
A     .c,,g,^!.
A     .g,ggg.^!,
A     .$,,,,,.,
C     a,g,,t,
T     ,,,,,.,^!.
T     ,$,,,,.,.

Output

A C G T
1 0 1 4
4 0 0 2
4 2 0 0
1 5 0 0
0 0 4 3

This is the output for the first five rows.

Can you please help me use this if condition in your code, or can it be done with some other approach instead of an if condition?

________________________________________
From: David Winsemius [dwinsemius at comcast.net]
Sent: Sunday, July 03, 2011 3:57 AM
To: Bansal, Vikas
Cc: Dennis Murphy; r-help at r-project.org
Subject: Re: [R] For help in R coding

On Jul 2, 2011, at 4:46 PM, Bansal, Vikas wrote:

> DEAR ALL,
> I TRIED THIS CODE AND THIS IS RUNNING PERFECTLY...
> > df=read.table("Case2.pileup",fill=T,sep="\t",colClasses = "character") > txt=df[,9] > txtvec <- readLines(textConnection(txt)) > dad=data.frame(A = unlist(sapply(gregexpr("A|a", txtvec), > function(x) if ( x[[1]] != -1) > length(x) else 0 )), > C = unlist(sapply(gregexpr("C|c", txtvec), function(x) if ( x[[1]] ! > = -1) > length(x) else 0 )), > G = unlist(sapply(gregexpr("G|g", txtvec), function(x) if ( x[[1]] ! > = -1) > length(x) else 0 )), > T = unlist(sapply(gregexpr("T|t", txtvec), function(x) if ( x[[1]] ! > = -1) > length(x) else 0 )), > N = unlist(sapply(gregexpr("\\,|\\.", txtvec), function(x) if > ( x[[1]] != -1) > length(x) else 0 ))) > The unlist operation is unnecessary since the sapply operation returns a vector. (It doesn't hurt, but it is unnecessary.) > > > > > Thanking you, > Warm Regards > Vikas Bansal > Msc Bioinformatics > Kings College London > ________________________________________ > From: David Winsemius [dwinsemius at comcast.net] > Sent: Saturday, July 02, 2011 9:04 PM > To: Dennis Murphy > Cc: r-help at r-project.org; Bansal, Vikas > Subject: Re: [R] For help in R coding > > On reflection and a bit of testing I think the best approach would be > to use gregexpr. For counting the number of commas, this appears quite > straightforward. > >> sapply(gregexpr("\\,", txtvec), function(x) if ( x[[1]] != -1) > length(x) else 0 ) > [1] 3 3 3 4 3 3 2 6 4 6 6 > > It easily generalizes to period and the `|` (or) operation on letters. > ( did need to add the check since the length of gregexpr is always at > least one but ihas value -1 when there is no match > >> sapply(gregexpr("t|T", txtvec), function(x) if ( x[[1]] != -1) > length(x) else 0 ) > [1] 0 2 0 0 3 0 0 0 1 0 0 > > > On Jul 2, 2011, at 3:22 PM, Dennis Murphy wrote: > >> Hi: >> >> There seems to be a problem if the string ends in , or . , which >> makes >> it difficult for strsplit() to pick up if it is splitting on those >> characters. 
Here is an alternative, splitting on individual >> characters >> and using charmatch() instead: >> >> charsum <- function(s, char) { >> u <- strsplit(s, "") >> sum(sapply(u, function(x) charmatch(x, char)), na.rm = TRUE) >> } >> >> unname(sapply(txtvec, function(x) charsum(x, ','))) >> unname(sapply(txtvec, function(x) charsum(x, '.'))) >> >> Putting this into a data frame, >> >> dfout <- data.frame(periods = unname(sapply(txtvec, function(x) >> charsum(x, '.'))), >> commas = unname(sapply(txtvec, >> function(x) charsum(x, '.'))) ) >> txtvec >> >> HTH, >> Dennis >> >> On Sat, Jul 2, 2011 at 10:19 AM, David Winsemius >> wrote: >>> >>> On Jul 2, 2011, at 12:34 PM, Bansal, Vikas wrote: >>> >>>> >>>> >>>>>> Dear all, >>>>>> >>>>>> I am doing a project on variant calling using R.I am working on >>>>>> pileup file.There are 10 columns in my data frame and I want to >>>>>> count the number of A,C,G and T in each row for column 9.example >>>>>> of >>>>>> column 9 is given below- >>>>>> >>>>>> .a,g,, >>>>>> .t,t,, >>>>>> .,c,c, >>>>>> .,a,,, >>>>>> .,t,t,t >>>>>> .c,,g,^!. >>>>>> .g,ggg.^!, >>>>>> .$,,,,,., >>>>>> a,g,,t, >>>>>> ,,,,,.,^!. >>>>>> ,$,,,,.,. >>>>>> >>>>>> This is a bit confusing for me as these characters are in one >>>>>> column >>>>>> and how can we scan them for each row to print number of A,C,G >>>>>> and T >>>>>> for each row. >>>>> >>>>> Seems a bit clunky but this does the job (first the data): >>>>>> >>>>>> txt <- " .a,g,, >>>>> >>>>> + .t,t,, >>>>> + .,c,c, >>>>> + .,a,,, >>>>> + .,t,t,t >>>>> + .c,,g,^!. >>>>> + .g,ggg.^!, >>>>> + .$,,,,,., >>>>> + a,g,,t, >>>>> + ,,,,,.,^!. >>>>> + ,$,,,,.,." >>>>> >>>>>> txtvec <- readLines(textConnection(txt)) >>>>> >>>>> Now the clunky solution, Basically subtracts 1 from the counts of >>>>> "fragments" that result from splitting on each letter in turn. >>>>> Could >>>>> be made prettier with a function that did the job. 
>>>>> >>>>>> data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit, >>>>> >>>>> split="a"), length) , "-", 1)), >>>>> + C = unlist(lapply( lapply( sapply(txtvec, strsplit, split="c"), >>>>> length) , "-", 1)), >>>>> + G = unlist(lapply( lapply( sapply(txtvec, strsplit, split="g"), >>>>> length) , "-", 1)), >>>>> + T = unlist(lapply( lapply( sapply(txtvec, strsplit, split="t"), >>>>> length) , "-", 1)) ) >>>>> A C G T >>>>> .a,g,, 1 0 1 0 >>>>> .t,t,, 0 0 0 2 >>>>> .,c,c, 0 2 0 0 >>>>> .,a,,, 1 0 0 0 >>>>> .,t,t,t 0 0 0 2 >>>>> .c,,g,^!. 0 1 1 0 >>>>> .g,ggg.^!, 0 0 4 0 >>>>> .$,,,,,., 0 0 0 0 >>>>> a,g,,t, 1 0 1 1 >>>>> ,,,,,.,^!. 0 0 0 0 >>>>> ,$,,,,.,. 0 0 0 0 >>>>> >>>>> Has the advantage that the input data ends up as rownames, which >>>>> was a >>>>> surprise. >>>>> >>>>> If you wanted to count "A" and "a" as equivalent, then the split >>>>> argument should be "a|A" >>>>> >>>>> >>>> >>>>>> AS YOU MENTIONED THAT IF I WANT TO COUNT A AND a I SHOULD SPLIT >>>>>> LIKE >>>>>> THIS. >>>> >>>> BUT CAN I COUNT . AND , ALSO USING- >>>> data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit, >>>> split=".|,"), length) , "-", 1)), >>>> >>>> I TRIED IT BUT ITS NOT WORKING.IT IS GIVING THE OUTPUT BUT AT SOME >>>> PLACES >>>> IT IS SHOWING MORE NUMBER OF . AND , AND SOMEWHERE IT IS NOT EVEN >>>> CALCULATING AND JUST SHOWING 0. >>> >>> You need to use valid regex expressions for 'split'. Since "." and >>> "," are >>> special characters they need to be escaped when you wnat the >>> literals to be >>> recognized as such. >>> >>> I haven't figured out why but you need to drop the final operation >>> of >>> subtracting 1 from the values when counting commas: >>> >>> data.frame(periods = unlist(lapply( lapply( sapply(txtvec, strsplit, >>> split="\\."), length) , "-", 1)) >>> ,commas = unlist( lapply( sapply(txtvec, strsplit, >>> split="\\,"), length) ) ) >>> periods commas >>> .a,g,, 1 3 >>> .t,t,, 1 3 >>> .,c,c, 1 3 >>> .,a,,, 1 4 >>> .,t,t,t 1 4 >>> .c,,g,^!. 
1 4 >>> .g,ggg.^!, 2 2 >>> .$,,,,,., 2 6 >>> a,g,,t, 0 4 >>> ,,,,,.,^!. 1 7 >>> ,$,,,,.,. 1 7 >>> >>> -- >>> >>> David Winsemius, MD >>> West Hartford, CT >>> >>> ______________________________________________ >>> R-help at r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >>> > > David Winsemius, MD > West Hartford, CT > David Winsemius, MD West Hartford, CT From antonio.rrz at gmail.com Sun Jul 3 19:22:19 2011 From: antonio.rrz at gmail.com (Antonio Rodriges) Date: Sun, 3 Jul 2011 10:22:19 -0700 (PDT) Subject: [R] Isolines in vector format Message-ID: Dear R users, I am working with netcdf data of NCEP/NCAR Climate Reanalysis. R does have capability to draw isolines using "contour". However, I need not to draw but to export contours in any vector format. Is it possible? Thank you. From djmuser at gmail.com Sun Jul 3 21:24:35 2011 From: djmuser at gmail.com (Dennis Murphy) Date: Sun, 3 Jul 2011 12:24:35 -0700 Subject: [R] semi correlation In-Reply-To: <1309710830.80827.YahooMailNeo@web46404.mail.sp1.yahoo.com> References: <1309710830.80827.YahooMailNeo@web46404.mail.sp1.yahoo.com> Message-ID: An sos search indicated that the ppcor package might be of use: its description file says that it is for partial and semi-partial correlation. Is that what you had in mind? # install.packages('sos') library(sos) findFn('semi correlation') HTH, Dennis On Sun, Jul 3, 2011 at 9:33 AM, Assieh Rashidi wrote: > Hi, I want to know how i could calculate semi correlation with R. Is there any package for it? > ? ? ? 
[[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

From jwiley.psych at gmail.com Sun Jul 3 21:26:07 2011
From: jwiley.psych at gmail.com (Joshua Wiley)
Date: Sun, 3 Jul 2011 12:26:07 -0700
Subject: [R] PROBLEM IN R version 2.13.0 (2011-04-13)
In-Reply-To:
References:
Message-ID:

I could not find the "asreml" package on CRAN or bioconductor, so I did not install it, but I can load all of the other packages without a problem. If you post where to get the "asreml" package, I can try that too.

Cheers,

Josh

R version 2.13.0 (2011-04-13)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] GraphAlignment_1.14.0 gregmisc_2.1.1        MASS_7.3-13
 [4] lattice_0.19-26       gplots_2.8.0          gdata_2.8.2
 [7] caTools_1.12          bitops_1.0-4.1        gtools_2.6.2
[10] gmodels_2.15.1

loaded via a namespace (and not attached):
[1] tools_2.13.0

On Sun, Jul 3, 2011 at 4:21 AM, Sergio Ivan Roman Ponce wrote:
> I am using R (version 2.13.0; 2011-04-13). Platform: x86_64-pc-mingw32/x64 (64-bit).
>
> The message that I see is
>
> "This application has requested the Runtime to terminate it in an unusual way.
> Please contact the application's support team for more information"
>
> I am using the following libraries:
>
> > library(package="gmodels")
> > library(package="gtools")
> > library(package="bitops")
> > library(package="caTools")
> > library(package="grid")
> > library(package="gdata")
> gdata: read.xls support for 'XLS' (Excel 97-2004) files ENABLED.
>
> gdata: read.xls support for 'XLSX' (Excel 2007+) files ENABLED.
>
> Attaching package: 'gdata'
>
> The following object(s) are masked from 'package:stats':
>
>     nobs
>
> The following object(s) are masked from 'package:utils':
>
>     object.size
>
> > library(package="gplots")
>
> Attaching package: 'gplots'
>
> The following object(s) are masked from 'package:stats':
>
>     lowess
>
> > library(package="lattice")
> > library(package="MASS")
> > library(package="gregmisc")
> Lost warning messages
>
>     The `gregmisc' *package* has converted into a *bundle*
>     containing four sub-packages: gdata, gtools, gmodels, and gplots.
>     Please load these packages directly.
> > library(package="asreml")
> > library(package="GraphAlignment")
>
> I am searching how to solve this problem.
>
> Regards,
>
> Sergio Ivan Roman Ponce
> Kajavein 18, 1430 Ås, Norway
> Mobile +47 - 94 43 61 86
> e-mail: romanponce at hotmail.com
> skype: romanpsergio
>
>     [[alternative HTML version deleted]]

Please do not post in HTML; when converted to plain text (as it is for this list), it comes through with lots of extra space at best and is sometimes completely garbled. At least on my hotmail, a few buttons to the right of "Send" is a drop-down menu that lets you select Rich text, HTML, or plain text. When sending to Rhelp, select "Plain text".
> > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://www.joshuawiley.com/ From dwinsemius at comcast.net Sun Jul 3 21:58:15 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Sun, 3 Jul 2011 15:58:15 -0400 Subject: [R] For help in R coding In-Reply-To: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBED@KCL-MAIL01.kclad.ds.kcl.ac.uk> References: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE0@KCL-MAIL01.kclad.ds.kcl.ac.uk> <813856BE-5DD3-4708-B1FA-2611EBAD8C61@comcast.net> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE1@KCL-MAIL01.kclad.ds.kcl.ac.uk> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE3@KCL-MAIL01.kclad.ds.kcl.ac.uk> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE5@KCL-MAIL01.kclad.ds.kcl.ac.uk> <383EF813-24EA-42F3-B514-AB9CF8060BAC@comcast.net> , <676B7003-AFFE-4CF1-9CC8-FB1D6953B761@comcast.net> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE9@KCL-MAIL01.kclad.ds.kcl.ac.uk>, <10E80E91-DE87-431A-8E41-50CDAFB73A4D@comcast.net> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBED@KCL-MAIL01.kclad.ds.kcl.ac.uk> Message-ID: <0086963E-3AAE-43FC-A4AE-32EE5A89D4CC@comcast.net> On Jul 3, 2011, at 1:07 PM, Bansal, Vikas wrote: > Yes you are right. 
unlist operation is unnecessary and I have tried > it yesterday and it is working without that operation also.But I > have one more problem on which I have worked whole day but did not > get any solution.As I told you I am new to R,I want to ask that how > I can use the (if condition) in the following code > > df=read.table("Case2.pileup",fill=T,sep="\t",colClasses = "character") > txtvec <- readLines(textConnection(df[,9])) > dad=data.frame(A = (sapply(gregexpr("A|a", (df[,9])), function(x) if > ( x[[1]] != -1) > length(x) else 0 )), > C = (sapply(gregexpr("C|c", (df[,9])), function(x) if ( x[[1]] != -1) > length(x) else 0 )), > G = (sapply(gregexpr("G|g", (df[,9])), function(x) if ( x[[1]] != -1) > length(x) else 0 )), > T = (sapply(gregexpr("T|t", (df[,9])), function(x) if ( x[[1]] != -1) > length(x) else 0 )), > N = (sapply(gregexpr("\\,|\\.", (df[,9])), function(x) if ( x[[1]] ! > = -1) > length(x) else 0 ))) > > > Now my problem is in my data frame I have alphabets A,C,G and T in > 3rd column also.Now these commas (,)and dots(.) 
in column 9 are for
> these alphabets which are in column 3.I want to use if condition
> like this
>
> if in my dataframe column 3 have A then A = (sapply(gregexpr("\\,|\\.", (df[,9])), function(x) if ( x[[1]] != -1)
> length(x) else 0 ))) else (A = (sapply(gregexpr("A|a", (df[,9])),
> function(x) if ( x[[1]] != -1)
> length(x) else 0 )),if in my dataframe column 3 haveCA then C =
> (sapply(gregexpr("\\,|\\.", (df[,9])), function(x) if ( x[[1]] != -1)
> length(x) else 0 ))) else C = (sapply(gregexpr("C|c", (df[,9])),
> function(x) if ( x[[1]] != -1)
> length(x) else 0 )), if in my dataframe column 3 have G then G =
> (sapply(gregexpr("\\,|\\.", (df[,9])), function(x) if ( x[[1]] != -1)
> length(x) else 0 ))) else G = (sapply(gregexpr("G|g", (df[,9])),
> function(x) if ( x[[1]] != -1)
> length(x) else 0 )) if in my dataframe column 3 have T then T =
> (sapply(gregexpr("\\,|\\.", (df[,9])), function(x) if ( x[[1]] != -1)
> length(x) else 0 ))) else T = (sapply(gregexpr("T|t", (df[,9])),
> function(x) if ( x[[1]] != -1)
> length(x) else 0 )),
>

I finally figured out that you wanted this:

> dat$newcol <- apply(dat, 1, function(x) gsub("\\,|\\.", x[3], x[9]) )

# So that replaces any instance of "," or "." in col9 with the letter in col3
# Then the same old routine as yesterday

> dat$A <- sapply(gregexpr("A|a", (dat[,"newcol"])), function(x) if ( x[[1]] != -1) length(x) else 0 )
> dat$C <- sapply(gregexpr("C|c", (dat[,"newcol"])), function(x) if ( x[[1]] != -1) length(x) else 0 )
> dat$G <- sapply(gregexpr("G|g", (dat[,"newcol"])), function(x) if ( x[[1]] != -1) length(x) else 0 )
> dat$T <- sapply(gregexpr("T|t", (dat[,"newcol"])), function(x) if ( x[[1]] != -1) length(x) else 0 )
> dat[, c("A","C", "G", "T")]
   A C G T
1  1 0 1 4
2  4 0 0 2
3  4 2 0 0
4  1 5 0 0
5  0 0 4 3
6  5 1 1 0
7  4 0 4 0
8  8 0 0 0
9  1 4 1 1
10 0 0 0 8
11 0 0 0 8

>
> So I want to code so that it will give the output like this-
>
> DATA FRAME (Input)
>
> col3 col 9
> T .a,g,,
> A .t,t,,
> A .,c,c,
> C .,a,,,
> G .,t,t,t
> A .c,,g,^!.
> A .g,ggg.^!,
> A .$,,,,,.,
> C a,g,,t,
> T ,,,,,.,^!.
> T ,$,,,,.,."
>
> output
>
> A C G T
> 1 0 1 4
> 4 0 0 2
> 4 2 0 0
> 1 5 0 0
> 0 0 4 3
>
> This is the output for first five rows.
>
> Can you please help me how to use this if condition in your coding
> or we can also do it by using some other condition rather than if
> condition?
>
> ________________________________________
> From: David Winsemius [dwinsemius at comcast.net]
> Sent: Sunday, July 03, 2011 3:57 AM
> To: Bansal, Vikas
> Cc: Dennis Murphy; r-help at r-project.org
> Subject: Re: [R] For help in R coding
>
> On Jul 2, 2011, at 4:46 PM, Bansal, Vikas wrote:
>
>> DEAR ALL,
>> I TRIED THIS CODE AND THIS IS RUNNING PERFECTLY...
>>
>> df=read.table("Case2.pileup",fill=T,sep="\t",colClasses =
>> "character")
>> txt=df[,9]
>> txtvec <- readLines(textConnection(txt))
>> dad=data.frame(A = unlist(sapply(gregexpr("A|a", txtvec),
>> function(x) if ( x[[1]] != -1)
>> length(x) else 0 )),
>> C = unlist(sapply(gregexpr("C|c", txtvec), function(x) if ( x[[1]] !
>> = -1)
>> length(x) else 0 )),
>> G = unlist(sapply(gregexpr("G|g", txtvec), function(x) if ( x[[1]] !
>> = -1) >> length(x) else 0 )), >> T = unlist(sapply(gregexpr("T|t", txtvec), function(x) if ( x[[1]] ! >> = -1) >> length(x) else 0 )), >> N = unlist(sapply(gregexpr("\\,|\\.", txtvec), function(x) if >> ( x[[1]] != -1) >> length(x) else 0 ))) >> > > The unlist operation is unnecessary since the sapply operation returns > a vector. (It doesn't hurt, but it is unnecessary.) >> >> >> >> >> Thanking you, >> Warm Regards >> Vikas Bansal >> Msc Bioinformatics >> Kings College London >> ________________________________________ >> From: David Winsemius [dwinsemius at comcast.net] >> Sent: Saturday, July 02, 2011 9:04 PM >> To: Dennis Murphy >> Cc: r-help at r-project.org; Bansal, Vikas >> Subject: Re: [R] For help in R coding >> >> On reflection and a bit of testing I think the best approach would be >> to use gregexpr. For counting the number of commas, this appears >> quite >> straightforward. >> >>> sapply(gregexpr("\\,", txtvec), function(x) if ( x[[1]] != -1) >> length(x) else 0 ) >> [1] 3 3 3 4 3 3 2 6 4 6 6 >> >> It easily generalizes to period and the `|` (or) operation on >> letters. >> ( did need to add the check since the length of gregexpr is always at >> least one but ihas value -1 when there is no match >> >>> sapply(gregexpr("t|T", txtvec), function(x) if ( x[[1]] != -1) >> length(x) else 0 ) >> [1] 0 2 0 0 3 0 0 0 1 0 0 >> >> >> On Jul 2, 2011, at 3:22 PM, Dennis Murphy wrote: >> >>> Hi: >>> >>> There seems to be a problem if the string ends in , or . , which >>> makes >>> it difficult for strsplit() to pick up if it is splitting on those >>> characters. 
Here is an alternative, splitting on individual >>> characters >>> and using charmatch() instead: >>> >>> charsum <- function(s, char) { >>> u <- strsplit(s, "") >>> sum(sapply(u, function(x) charmatch(x, char)), na.rm = TRUE) >>> } >>> >>> unname(sapply(txtvec, function(x) charsum(x, ','))) >>> unname(sapply(txtvec, function(x) charsum(x, '.'))) >>> >>> Putting this into a data frame, >>> >>> dfout <- data.frame(periods = unname(sapply(txtvec, function(x) >>> charsum(x, '.'))), >>> commas = unname(sapply(txtvec, >>> function(x) charsum(x, '.'))) ) >>> txtvec >>> >>> HTH, >>> Dennis >>> >>> On Sat, Jul 2, 2011 at 10:19 AM, David Winsemius >>> wrote: >>>> >>>> On Jul 2, 2011, at 12:34 PM, Bansal, Vikas wrote: >>>> >>>>> >>>>> >>>>>>> Dear all, >>>>>>> >>>>>>> I am doing a project on variant calling using R.I am working on >>>>>>> pileup file.There are 10 columns in my data frame and I want to >>>>>>> count the number of A,C,G and T in each row for column 9.example >>>>>>> of >>>>>>> column 9 is given below- >>>>>>> >>>>>>> .a,g,, >>>>>>> .t,t,, >>>>>>> .,c,c, >>>>>>> .,a,,, >>>>>>> .,t,t,t >>>>>>> .c,,g,^!. >>>>>>> .g,ggg.^!, >>>>>>> .$,,,,,., >>>>>>> a,g,,t, >>>>>>> ,,,,,.,^!. >>>>>>> ,$,,,,.,. >>>>>>> >>>>>>> This is a bit confusing for me as these characters are in one >>>>>>> column >>>>>>> and how can we scan them for each row to print number of A,C,G >>>>>>> and T >>>>>>> for each row. >>>>>> >>>>>> Seems a bit clunky but this does the job (first the data): >>>>>>> >>>>>>> txt <- " .a,g,, >>>>>> >>>>>> + .t,t,, >>>>>> + .,c,c, >>>>>> + .,a,,, >>>>>> + .,t,t,t >>>>>> + .c,,g,^!. >>>>>> + .g,ggg.^!, >>>>>> + .$,,,,,., >>>>>> + a,g,,t, >>>>>> + ,,,,,.,^!. >>>>>> + ,$,,,,.,." >>>>>> >>>>>>> txtvec <- readLines(textConnection(txt)) >>>>>> >>>>>> Now the clunky solution, Basically subtracts 1 from the counts of >>>>>> "fragments" that result from splitting on each letter in turn. >>>>>> Could >>>>>> be made prettier with a function that did the job. 
>>>>>> >>>>>>> data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit, >>>>>> >>>>>> split="a"), length) , "-", 1)), >>>>>> + C = unlist(lapply( lapply( sapply(txtvec, strsplit, split="c"), >>>>>> length) , "-", 1)), >>>>>> + G = unlist(lapply( lapply( sapply(txtvec, strsplit, split="g"), >>>>>> length) , "-", 1)), >>>>>> + T = unlist(lapply( lapply( sapply(txtvec, strsplit, split="t"), >>>>>> length) , "-", 1)) ) >>>>>> A C G T >>>>>> .a,g,, 1 0 1 0 >>>>>> .t,t,, 0 0 0 2 >>>>>> .,c,c, 0 2 0 0 >>>>>> .,a,,, 1 0 0 0 >>>>>> .,t,t,t 0 0 0 2 >>>>>> .c,,g,^!. 0 1 1 0 >>>>>> .g,ggg.^!, 0 0 4 0 >>>>>> .$,,,,,., 0 0 0 0 >>>>>> a,g,,t, 1 0 1 1 >>>>>> ,,,,,.,^!. 0 0 0 0 >>>>>> ,$,,,,.,. 0 0 0 0 >>>>>> >>>>>> Has the advantage that the input data ends up as rownames, which >>>>>> was a >>>>>> surprise. >>>>>> >>>>>> If you wanted to count "A" and "a" as equivalent, then the split >>>>>> argument should be "a|A" >>>>>> >>>>>> >>>>> >>>>>>> AS YOU MENTIONED THAT IF I WANT TO COUNT A AND a I SHOULD SPLIT >>>>>>> LIKE >>>>>>> THIS. >>>>> >>>>> BUT CAN I COUNT . AND , ALSO USING- >>>>> data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit, >>>>> split=".|,"), length) , "-", 1)), >>>>> >>>>> I TRIED IT BUT ITS NOT WORKING.IT IS GIVING THE OUTPUT BUT AT SOME >>>>> PLACES >>>>> IT IS SHOWING MORE NUMBER OF . AND , AND SOMEWHERE IT IS NOT EVEN >>>>> CALCULATING AND JUST SHOWING 0. >>>> >>>> You need to use valid regex expressions for 'split'. Since "." and >>>> "," are >>>> special characters they need to be escaped when you wnat the >>>> literals to be >>>> recognized as such. 
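
A minimal sketch of the escaping point, using toy strings rather than the pileup file. It assumes nothing beyond base R. It shows that an unescaped "." in `split` is a regex wildcard, and that strsplit() drops a trailing empty piece (see ?strsplit), which is one plausible reason the fragments-minus-one count used earlier in this thread goes wrong for strings that end in the delimiter. `count_char` is a hypothetical helper name, not something from the thread.

```r
# Toy strings (assumed for illustration); the first ends in a comma.
x <- c(".a,g,,", "a,g,,t", ".,t,t,t")

# Unescaped "." matches any character, so every character is a delimiter.
strsplit("a.b", ".")[[1]]                 # three empty strings
strsplit("a.b", "\\.")[[1]]               # "a" "b"
strsplit("a.b", ".", fixed = TRUE)[[1]]   # "a" "b"

# strsplit() drops a trailing empty string, so (fragments - 1)
# undercounts delimiters when the string ends with one.
length(strsplit("a,b,", ",")[[1]]) - 1    # 1, although there are 2 commas

# Hypothetical helper: count literal occurrences of one character per string.
count_char <- function(s, ch) {
  m <- gregexpr(ch, s, fixed = TRUE)
  # gregexpr() returns -1 for "no match", so map that case to 0.
  vapply(m, function(pos) if (pos[1] == -1L) 0L else length(pos), integer(1))
}

count_char(x, ",")
count_char(x, ".")
```

Using `fixed = TRUE` sidesteps the question of which characters need escaping, at the cost of not being able to combine alternatives such as "A|a" in one pattern.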
>>>> >>>> I haven't figured out why but you need to drop the final operation >>>> of >>>> subtracting 1 from the values when counting commas: >>>> >>>> data.frame(periods = unlist(lapply( lapply( sapply(txtvec, >>>> strsplit, >>>> split="\\."), length) , "-", 1)) >>>> ,commas = unlist( lapply( sapply(txtvec, strsplit, >>>> split="\\,"), length) ) ) >>>> periods commas >>>> .a,g,, 1 3 >>>> .t,t,, 1 3 >>>> .,c,c, 1 3 >>>> .,a,,, 1 4 >>>> .,t,t,t 1 4 >>>> .c,,g,^!. 1 4 >>>> .g,ggg.^!, 2 2 >>>> .$,,,,,., 2 6 >>>> a,g,,t, 0 4 >>>> ,,,,,.,^!. 1 7 >>>> ,$,,,,.,. 1 7 >>>> >>>> -- >>>> >>>> David Winsemius, MD >>>> West Hartford, CT >>>> >>>> ______________________________________________ >>>> R-help at r-project.org mailing list >>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>>> and provide commented, minimal, self-contained, reproducible code. >>>> >> >> David Winsemius, MD >> West Hartford, CT >> > > David Winsemius, MD > West Hartford, CT > David Winsemius, MD West Hartford, CT From ewjwallace at gmail.com Sun Jul 3 22:39:46 2011 From: ewjwallace at gmail.com (Edward Wallace) Date: Sun, 3 Jul 2011 16:39:46 -0400 Subject: [R] VGAM constraints-related puzzle Message-ID: Edward Wallace gmail.com> writes: > > Hello R users, > I have a puzzle with the VGAM package, on my first excursion into > generalized additive models, in that this very nice package seems to > want to do either more or less than what I want. > > Precisely, I have a 4-component outcome, y, and am fitting multinomial > logistic regression with one predictor x. What I would like to find > out is, is there a single nonlinear function f(x) which acts in place > of the linear predictor x. There is a mechanistic reason to believe > this is sensible. So I'd like to fit a model > \eta_j = \beta_{ (j) 0 } + \beta_{ (j) x } f(x) > where both the function f(x) and its scaling coefficients \beta_{ (j) > x } are fit simultaneously. 
Here \eta_j is the linear predictor, the > logodds of outcome j vs the reference outcome. I cannot see how to fit > exactly this. Thomas Yee wrote : > Hello, > > try > > rrvglm(y ~ 1 + bs(x), fam = multinomial, trace = TRUE) > > It seems what you want is a stereotype model with > a smooth function. > Unfortunately rrvglm() is restricted to regression splines. Thank you very much. This seems to work, but occasionally quits producing the cryptic error "Error in devmu[smallmu] = smy * log(smu) : NAs are not allowed in subscripted assignments" Any ideas? > You could extract out the scaling coeffs and feed them > into vgam() using the constraints argument, but that > would not be optimal in any strict sense. Really? I thought I had tried adding constraint matrices such as [1 ; 2] and vglm raised an error saying it needed entries to be 1 or 0. I can check that if you'd like. Edward -- Edward Wallace, PhD Harvard FAS center for Systems Biology +1-773-517-4009 From pdalgd at gmail.com Sun Jul 3 22:57:37 2011 From: pdalgd at gmail.com (peter dalgaard) Date: Sun, 3 Jul 2011 22:57:37 +0200 Subject: [R] PROBLEM IN R version 2.13.0 (2011-04-13) In-Reply-To: References: Message-ID: On Jul 3, 2011, at 21:26 , Joshua Wiley wrote: > I could not find the "asreml" package on CRAN or bioconductor, so I > did not install it, but I can load all of the other packages without a > problem. If you post where to get the "asreml" package, I can try > that too. The asreml package is interface code to software from VSN international. It is not clear from their website which version of R it expects to work with, so one could well expect that an incompatibility has crept in. As a matter of principle, we only try to remain compatible within versions with the same minor number (things may work anyway, we don't guarantee breakage either). I.e., _if_ this is the cause of the problem, it is something that needs to be taken up with the supplier. 
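
A rough sketch of how one might check for the kind of version mismatch described above: compare the R version an installed package was built under with the running R. It assumes the package's DESCRIPTION carries the usual "Built" field; `built_under` and `same_minor` are hypothetical helper names, and "stats" stands in for asreml, which is not assumed to be installed.

```r
# Hypothetical helper: R version recorded in an installed package's Built field.
built_under <- function(pkg) {
  built <- utils::packageDescription(pkg, fields = "Built")
  # The field starts with something like "R 2.13.0; ..."; keep the version part.
  package_version(sub("^R ([0-9.]+);.*$", "\\1", built))
}

# Hypothetical helper: TRUE if the package was built under the same
# major.minor series as the running R (e.g. both 2.13.x).
same_minor <- function(pkg) {
  b <- unlist(built_under(pkg))[1:2]
  r <- unlist(getRversion())[1:2]
  all(b == r)
}

same_minor("stats")   # a base package is built with the running R
```

A FALSE here would not prove an incompatibility; it only flags a package worth rebuilding or raising with its supplier.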

-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com

From vikas.bansal at kcl.ac.uk Mon Jul 4 00:10:40 2011
From: vikas.bansal at kcl.ac.uk (Bansal, Vikas)
Date: Sun, 3 Jul 2011 23:10:40 +0100
Subject: [R] For help in R coding
In-Reply-To:
References: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE0@KCL-MAIL01.kclad.ds.kcl.ac.uk> <813856BE-5DD3-4708-B1FA-2611EBAD8C61@comcast.net> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE1@KCL-MAIL01.kclad.ds.kcl.ac.uk> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE3@KCL-MAIL01.kclad.ds.kcl.ac.uk> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE5@KCL-MAIL01.kclad.ds.kcl.ac.uk> <383EF813-24EA-42F3-B514-AB9CF8060BAC@comcast.net> , <676B7003-AFFE-4CF1-9CC8-FB1D6953B761@comcast.net> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE9@KCL-MAIL01.kclad.ds.kcl.ac.uk>, <10E80E91-DE87-431A-8E41-50CDAFB73A4D@comcast.net> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBED@KCL-MAIL01.kclad.ds.kcl.ac.uk>,
Message-ID: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBEF@KCL-MAIL01.kclad.ds.kcl.ac.uk>

________________________________________
From: David Winsemius [dwinsemius at comcast.net]
Sent: Sunday, July 03, 2011 7:08 PM
To: Bansal, Vikas
Cc: Dennis Murphy; r-help at r-project.org
Subject: Re: [R] For help in R coding

On Jul 3, 2011, at 1:07 PM, Bansal, Vikas wrote:

> Yes you are right.
unlist operation is unnecessary and I have tried > it yesterday and it is working without that operation also.But I > have one more problem on which I have worked whole day but did not > get any solution.As I told you I am new to R,I want to ask that how > I can use the (if condition) in the following code > > df=read.table("Case2.pileup",fill=T,sep="\t",colClasses = "character") > txtvec <- readLines(textConnection(df[,9])) > dad=data.frame(A = (sapply(gregexpr("A|a", (df[,9])), function(x) if > ( x[[1]] != -1) > length(x) else 0 )), > C = (sapply(gregexpr("C|c", (df[,9])), function(x) if ( x[[1]] != -1) > length(x) else 0 )), > G = (sapply(gregexpr("G|g", (df[,9])), function(x) if ( x[[1]] != -1) > length(x) else 0 )), > T = (sapply(gregexpr("T|t", (df[,9])), function(x) if ( x[[1]] != -1) > length(x) else 0 )), > N = (sapply(gregexpr("\\,|\\.", (df[,9])), function(x) if ( x[[1]] ! > = -1) > length(x) else 0 ))) > > > Now my problem is in my data frame I have alphabets A,C,G and T in > 3rd column also.Now these commas (,)and dots(.) 
in column 9 are for > these alphabets which are in column 3.I want to use if condition > like this > > if in my dataframe column 3 have A then A = (sapply(gregexpr("\\,|\ > \.", (df[,9])), function(x) if ( x[[1]] != -1) > length(x) else 0 ))) else (A = (sapply(gregexpr("A|a", (df[,9])), > function(x) if ( x[[1]] != -1) > length(x) else 0 )),if in my dataframe column 3 haveCA then C = > (sapply(gregexpr("\\,|\\.", (df[,9])), function(x) if ( x[[1]] != -1) > length(x) else 0 ))) else C = (sapply(gregexpr("C|c", (df[,9])), > function(x) if ( x[[1]] != -1) > length(x) else 0 )), if in my dataframe column 3 have G then G = > (sapply(gregexpr("\\,|\\.", (df[,9])), function(x) if ( x[[1]] != -1) > length(x) else 0 ))) else G = (sapply(gregexpr("G|g", (df[,9])), > function(x) if ( x[[1]] != -1) > length(x) else 0 )) if in my dataframe column 3 have T then T = > (sapply(gregexpr("\\,|\\.", (df[,9])), function(x) if ( x[[1]] != -1) > length(x) else 0 ))) else T = (sapply(gregexpr("T|t", (df[,9])), > function(x) if ( x[[1]] != -1) > length(x) else 0 )), > > > So I want to code so that it will give the output like this- > > DATA FRAME (Input) > > col3 col 9 > T .a,g,, > A .t,t,, > A .,c,c, > C .,a,,, > G .,t,t,t > A .c,,g,^!. > A .g,ggg.^!, > A .$,,,,,., > C a,g,,t, > T ,,,,,.,^!. > T ,$,,,,.,." > > > output > > A C G T > 1 0 1 4 > 4 0 0 2 > 4 2 0 0 > 1 5 0 0 > 0 0 4 3 > > > > This is the output for first five rows.v I was unable to follow the logic and because complete output was not offered, I am unable to check my guesses against you full specifications. Oh sorry.I will explain it again.As I told you my dataframe has ten columns.but i am working on 3rd and 9th column. to calculate the number of A C G T . 
and , we used the following code- > df=read.table("Case2.pileup",fill=T,sep="\t",colClasses = "character") > txtvec <- readLines(textConnection(df[,9])) > dad=data.frame(A = (sapply(gregexpr("A|a", (df[,9])), function(x) if > ( x[[1]] != -1) > length(x) else 0 )), > C = (sapply(gregexpr("C|c", (df[,9])), function(x) if ( x[[1]] != -1) > length(x) else 0 )), > G = (sapply(gregexpr("G|g", (df[,9])), function(x) if ( x[[1]] != -1) > length(x) else 0 )), > T = (sapply(gregexpr("T|t", (df[,9])), function(x) if ( x[[1]] != -1) > length(x) else 0 )), > N = (sapply(gregexpr("\\,|\\.", (df[,9])), function(x) if ( x[[1]] ! > = -1) > length(x) else 0 ))) now in 3rd column of my dataframe I have chAracters A or C or G or T.so my 3rd column and 9th column is like this- col3 col 9 > T .a,g,, > A .t,t,, > A .,c,c, > C .,a,,, > G .,t,t,t > A .c,,g,^!. > A .g,ggg.^!, > A .$,,,,,., > C a,g,,t, > T ,,,,,.,^!. > T ,$,,,,.,." Initially we were working on 9th column only to calculate number of A,C,G and T and (.) and (,) separately using code provided by you shown above. but now i want that if in column 3 I have T so it should make it equal to the number of .|, as I showed you my output output > > A C G T > 1 0 1 4 > 4 0 0 2 > 4 2 0 0 > 1 5 0 0 > 0 0 4 3 In the first row of my input I have T in 3rd column.so T=number of total . and , that is 4.and a and g are 1 in second row of my input i have A in 3rd column so A should be equal to total number of (.) and (,) that is 4 and remaining are the 2 T. 
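
A minimal sketch of the rule just described, using only the first five sample rows from this thread. It assumes that the column for the base named in column 3 should be set equal to the number of "." and "," characters, and that every other column keeps its case-insensitive letter count. The names `dat`, `count_matches`, `bases`, `counts` and `ref` are illustrative and are not taken from the original posts.

```r
# First five sample rows from the thread (col3 = reference base, col9 = pileup string).
dat <- data.frame(
  col3 = c("T", "A", "A", "C", "G"),
  col9 = c(".a,g,,", ".t,t,,", ".,c,c,", ".,a,,,", ".,t,t,t"),
  stringsAsFactors = FALSE
)

# Count the matches of a regular expression in each string.
# gregexpr() returns -1 when there is no match, hence the check.
count_matches <- function(pattern, x) {
  sapply(gregexpr(pattern, x), function(m) if (m[1] != -1) length(m) else 0L)
}

bases <- c("A", "C", "G", "T")

# Explicit letter counts, upper or lower case, one column per base.
counts <- sapply(bases, function(b) {
  count_matches(paste0(b, "|", tolower(b)), dat$col9)
})

# Number of '.' and ',' per row, i.e. reads matching the reference base.
ref <- count_matches("[.,]", dat$col9)

# For each row, overwrite the column named in col3 with that number.
idx <- cbind(seq_len(nrow(dat)), match(dat$col3, bases))
counts[idx] <- ref

counts
```

For these five rows the result agrees with the desired output listed in the thread (for example, the first row gives A = 1, C = 0, G = 1, T = 4). If letters equal to the reference base can also occur in column 9, `counts[idx] + ref` may be the more appropriate update; that choice is an assumption either way.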
that is why i wrote this thing using if condotion > if in my dataframe column 3 have A then A = (sapply(gregexpr("\\,|\ > \.", (df[,9])), function(x) if ( x[[1]] != -1) > length(x) else 0 ))) else (A = (sapply(gregexpr("A|a", (df[,9])), > function(x) if ( x[[1]] != -1) > length(x) else 0 )), if in my dataframe column 3 haveCA then C = > (sapply(gregexpr("\\,|\\.", (df[,9])), function(x) if ( x[[1]] != -1) > length(x) else 0 ))) else C = (sapply(gregexpr("C|c", (df[,9])), > function(x) if ( x[[1]] != -1) > length(x) else 0 )), if in my dataframe column 3 have G then G = > (sapply(gregexpr("\\,|\\.", (df[,9])), function(x) if ( x[[1]] != -1) > length(x) else 0 ))) else G = (sapply(gregexpr("G|g", (df[,9])), > function(x) if ( x[[1]] != -1) > length(x) else 0 )) if in my dataframe column 3 have T then T = > (sapply(gregexpr("\\,|\\.", (df[,9])), function(x) if ( x[[1]] != -1) > length(x) else 0 ))) else T = (sapply(gregexpr("T|t", (df[,9])), > function(x) if ( x[[1]] != -1) > length(x) else 0 )), the code is same i just want to add a condition so that it should check that if in column 3, the character is A then make number of A equal to total number of . and , Should I explain better or can you please tell me which thing is not clear? > -- David. > > > > Can you please help me how to use this if condition in your coding > or we can also do it by using some other condition rather than if > condition? > > > > > > > > > > > > > ________________________________________ > From: David Winsemius [dwinsemius at comcast.net] > Sent: Sunday, July 03, 2011 3:57 AM > To: Bansal, Vikas > Cc: Dennis Murphy; r-help at r-project.org > Subject: Re: [R] For help in R coding > > On Jul 2, 2011, at 4:46 PM, Bansal, Vikas wrote: > >> DEAR ALL, >> I TRIED THIS CODE AND THIS IS RUNNING PERFECTLY... 
>> >> df=read.table("Case2.pileup",fill=T,sep="\t",colClasses = >> "character") >> txt=df[,9] >> txtvec <- readLines(textConnection(txt)) >> dad=data.frame(A = unlist(sapply(gregexpr("A|a", txtvec), >> function(x) if ( x[[1]] != -1) >> length(x) else 0 )), >> C = unlist(sapply(gregexpr("C|c", txtvec), function(x) if ( x[[1]] ! >> = -1) >> length(x) else 0 )), >> G = unlist(sapply(gregexpr("G|g", txtvec), function(x) if ( x[[1]] ! >> = -1) >> length(x) else 0 )), >> T = unlist(sapply(gregexpr("T|t", txtvec), function(x) if ( x[[1]] ! >> = -1) >> length(x) else 0 )), >> N = unlist(sapply(gregexpr("\\,|\\.", txtvec), function(x) if >> ( x[[1]] != -1) >> length(x) else 0 ))) >> > > The unlist operation is unnecessary since the sapply operation returns > a vector. (It doesn't hurt, but it is unnecessary.) >> >> >> >> >> Thanking you, >> Warm Regards >> Vikas Bansal >> Msc Bioinformatics >> Kings College London >> ________________________________________ >> From: David Winsemius [dwinsemius at comcast.net] >> Sent: Saturday, July 02, 2011 9:04 PM >> To: Dennis Murphy >> Cc: r-help at r-project.org; Bansal, Vikas >> Subject: Re: [R] For help in R coding >> >> On reflection and a bit of testing I think the best approach would be >> to use gregexpr. For counting the number of commas, this appears >> quite >> straightforward. >> >>> sapply(gregexpr("\\,", txtvec), function(x) if ( x[[1]] != -1) >> length(x) else 0 ) >> [1] 3 3 3 4 3 3 2 6 4 6 6 >> >> It easily generalizes to period and the `|` (or) operation on >> letters. >> ( did need to add the check since the length of gregexpr is always at >> least one but ihas value -1 when there is no match >> >>> sapply(gregexpr("t|T", txtvec), function(x) if ( x[[1]] != -1) >> length(x) else 0 ) >> [1] 0 2 0 0 3 0 0 0 1 0 0 >> >> >> On Jul 2, 2011, at 3:22 PM, Dennis Murphy wrote: >> >>> Hi: >>> >>> There seems to be a problem if the string ends in , or . 
, which >>> makes >>> it difficult for strsplit() to pick up if it is splitting on those >>> characters. Here is an alternative, splitting on individual >>> characters >>> and using charmatch() instead: >>> >>> charsum <- function(s, char) { >>> u <- strsplit(s, "") >>> sum(sapply(u, function(x) charmatch(x, char)), na.rm = TRUE) >>> } >>> >>> unname(sapply(txtvec, function(x) charsum(x, ','))) >>> unname(sapply(txtvec, function(x) charsum(x, '.'))) >>> >>> Putting this into a data frame, >>> >>> dfout <- data.frame(periods = unname(sapply(txtvec, function(x) >>> charsum(x, '.'))), >>> commas = unname(sapply(txtvec, >>> function(x) charsum(x, '.'))) ) >>> txtvec >>> >>> HTH, >>> Dennis >>> >>> On Sat, Jul 2, 2011 at 10:19 AM, David Winsemius >>> wrote: >>>> >>>> On Jul 2, 2011, at 12:34 PM, Bansal, Vikas wrote: >>>> >>>>> >>>>> >>>>>>> Dear all, >>>>>>> >>>>>>> I am doing a project on variant calling using R.I am working on >>>>>>> pileup file.There are 10 columns in my data frame and I want to >>>>>>> count the number of A,C,G and T in each row for column 9.example >>>>>>> of >>>>>>> column 9 is given below- >>>>>>> >>>>>>> .a,g,, >>>>>>> .t,t,, >>>>>>> .,c,c, >>>>>>> .,a,,, >>>>>>> .,t,t,t >>>>>>> .c,,g,^!. >>>>>>> .g,ggg.^!, >>>>>>> .$,,,,,., >>>>>>> a,g,,t, >>>>>>> ,,,,,.,^!. >>>>>>> ,$,,,,.,. >>>>>>> >>>>>>> This is a bit confusing for me as these characters are in one >>>>>>> column >>>>>>> and how can we scan them for each row to print number of A,C,G >>>>>>> and T >>>>>>> for each row. >>>>>> >>>>>> Seems a bit clunky but this does the job (first the data): >>>>>>> >>>>>>> txt <- " .a,g,, >>>>>> >>>>>> + .t,t,, >>>>>> + .,c,c, >>>>>> + .,a,,, >>>>>> + .,t,t,t >>>>>> + .c,,g,^!. >>>>>> + .g,ggg.^!, >>>>>> + .$,,,,,., >>>>>> + a,g,,t, >>>>>> + ,,,,,.,^!. >>>>>> + ,$,,,,.,." 
>>>>>> >>>>>>> txtvec <- readLines(textConnection(txt)) >>>>>> >>>>>> Now the clunky solution, Basically subtracts 1 from the counts of >>>>>> "fragments" that result from splitting on each letter in turn. >>>>>> Could >>>>>> be made prettier with a function that did the job. >>>>>> >>>>>>> data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit, >>>>>> >>>>>> split="a"), length) , "-", 1)), >>>>>> + C = unlist(lapply( lapply( sapply(txtvec, strsplit, split="c"), >>>>>> length) , "-", 1)), >>>>>> + G = unlist(lapply( lapply( sapply(txtvec, strsplit, split="g"), >>>>>> length) , "-", 1)), >>>>>> + T = unlist(lapply( lapply( sapply(txtvec, strsplit, split="t"), >>>>>> length) , "-", 1)) ) >>>>>> A C G T >>>>>> .a,g,, 1 0 1 0 >>>>>> .t,t,, 0 0 0 2 >>>>>> .,c,c, 0 2 0 0 >>>>>> .,a,,, 1 0 0 0 >>>>>> .,t,t,t 0 0 0 2 >>>>>> .c,,g,^!. 0 1 1 0 >>>>>> .g,ggg.^!, 0 0 4 0 >>>>>> .$,,,,,., 0 0 0 0 >>>>>> a,g,,t, 1 0 1 1 >>>>>> ,,,,,.,^!. 0 0 0 0 >>>>>> ,$,,,,.,. 0 0 0 0 >>>>>> >>>>>> Has the advantage that the input data ends up as rownames, which >>>>>> was a >>>>>> surprise. >>>>>> >>>>>> If you wanted to count "A" and "a" as equivalent, then the split >>>>>> argument should be "a|A" >>>>>> >>>>>> >>>>> >>>>>>> AS YOU MENTIONED THAT IF I WANT TO COUNT A AND a I SHOULD SPLIT >>>>>>> LIKE >>>>>>> THIS. >>>>> >>>>> BUT CAN I COUNT . AND , ALSO USING- >>>>> data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit, >>>>> split=".|,"), length) , "-", 1)), >>>>> >>>>> I TRIED IT BUT ITS NOT WORKING.IT IS GIVING THE OUTPUT BUT AT SOME >>>>> PLACES >>>>> IT IS SHOWING MORE NUMBER OF . AND , AND SOMEWHERE IT IS NOT EVEN >>>>> CALCULATING AND JUST SHOWING 0. >>>> >>>> You need to use valid regex expressions for 'split'. Since "." and >>>> "," are >>>> special characters they need to be escaped when you wnat the >>>> literals to be >>>> recognized as such. 
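
To illustrate the escaping point, here is a minimal sketch using one string from this thread. In a regular expression an unescaped "." matches any character, whereas "," is not a metacharacter (escaping it is harmless but not required). The helper name `n_sep` is hypothetical.

```r
x <- ".a,g,,"

# Unescaped: "." is a wildcard, so every character is treated as a separator.
strsplit(x, ".")[[1]]

# Escaped, or with fixed = TRUE, "." is taken literally.
strsplit(x, "\\.")[[1]]
strsplit(x, ".", fixed = TRUE)[[1]]

# strsplit() drops a trailing empty piece, so counts derived from the number
# of fragments can be off by one when the string ends in the separator.
# Counting the matches directly avoids that edge case.
n_sep <- function(x) {
  sapply(gregexpr("[.,]", x), function(m) sum(m > 0))
}

n_sep(c(".a,g,,", "acgt"))   # 4 separators in the first string, 0 in the second
```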
>>>> >>>> I haven't figured out why but you need to drop the final operation >>>> of >>>> subtracting 1 from the values when counting commas: >>>> >>>> data.frame(periods = unlist(lapply( lapply( sapply(txtvec, >>>> strsplit, >>>> split="\\."), length) , "-", 1)) >>>> ,commas = unlist( lapply( sapply(txtvec, strsplit, >>>> split="\\,"), length) ) ) >>>> periods commas >>>> .a,g,, 1 3 >>>> .t,t,, 1 3 >>>> .,c,c, 1 3 >>>> .,a,,, 1 4 >>>> .,t,t,t 1 4 >>>> .c,,g,^!. 1 4 >>>> .g,ggg.^!, 2 2 >>>> .$,,,,,., 2 6 >>>> a,g,,t, 0 4 >>>> ,,,,,.,^!. 1 7 >>>> ,$,,,,.,. 1 7 >>>> >>>> -- >>>> >>>> David Winsemius, MD >>>> West Hartford, CT >>>> >>>> ______________________________________________ >>>> R-help at r-project.org mailing list >>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>>> and provide commented, minimal, self-contained, reproducible code. >>>> >> >> David Winsemius, MD >> West Hartford, CT >> > > David Winsemius, MD > West Hartford, CT > David Winsemius, MD West Hartford, CT From daniel at umd.edu Mon Jul 4 00:56:23 2011 From: daniel at umd.edu (Daniel Malter) Date: Sun, 3 Jul 2011 15:56:23 -0700 (PDT) Subject: [R] For help in R coding In-Reply-To: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE0@KCL-MAIL01.kclad.ds.kcl.ac.uk> References: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE0@KCL-MAIL01.kclad.ds.kcl.ac.uk> Message-ID: <1309733783115-3642655.post@n4.nabble.com> Say you have the string x in a matrix x<-c('a,..gGGtTaac References: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE0@KCL-MAIL01.kclad.ds.kcl.ac.uk> <813856BE-5DD3-4708-B1FA-2611EBAD8C61@comcast.net> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE1@KCL-MAIL01.kclad.ds.kcl.ac.uk> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE3@KCL-MAIL01.kclad.ds.kcl.ac.uk> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE5@KCL-MAIL01.kclad.ds.kcl.ac.uk> <383EF813-24EA-42F3-B514-AB9CF8060BAC@comcast.net> , 
<676B7003-AFFE-4CF1-9CC8-FB1D6953B761@comcast.net> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE9@KCL-MAIL01.kclad.ds.kcl.ac.uk>, <10E80E91-DE87-431A-8E41-50CDAFB73A4D@comcast.net> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBED@KCL-MAIL01.kclad.ds.kcl.ac.uk>, <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBEF@KCL-MAIL01.kclad.ds.kcl.ac.uk> Message-ID: On Jul 3, 2011, at 6:10 PM, Bansal, Vikas wrote: > > ________________________________________ > From: David Winsemius [dwinsemius at comcast.net] > Sent: Sunday, July 03, 2011 7:08 PM >> > > the code is same i just want to add a condition so that it should > check that if in column 3, the character is A then make number of A > equal to total number of . and , > > Should I explain better or can you please tell me which thing is not > clear? My second posting today had a solution. > >> > -- > David. >> >> >> >> Can you please help me how to use this if condition in your coding >> or we can also do it by using some other condition rather than if >> condition? 
>>
>

David Winsemius, MD
West Hartford, CT

From dwinsemius at comcast.net Mon Jul 4 03:02:49 2011
From: dwinsemius at comcast.net (David Winsemius)
Date: Sun, 3 Jul 2011 21:02:49 -0400
Subject: [R] For help in R coding
In-Reply-To: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBEF@KCL-MAIL01.kclad.ds.kcl.ac.uk>
References: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE0@KCL-MAIL01.kclad.ds.kcl.ac.uk> <813856BE-5DD3-4708-B1FA-2611EBAD8C61@comcast.net> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE1@KCL-MAIL01.kclad.ds.kcl.ac.uk> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE3@KCL-MAIL01.kclad.ds.kcl.ac.uk> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE5@KCL-MAIL01.kclad.ds.kcl.ac.uk> <383EF813-24EA-42F3-B514-AB9CF8060BAC@comcast.net> , <676B7003-AFFE-4CF1-9CC8-FB1D6953B761@comcast.net> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE9@KCL-MAIL01.kclad.ds.kcl.ac.uk>, <10E80E91-DE87-431A-8E41-50CDAFB73A4D@comcast.net> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBED@KCL-MAIL01.kclad.ds.kcl.ac.uk>, <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBEF@KCL-MAIL01.kclad.ds.kcl.ac.uk>
Message-ID:

On Jul 3, 2011, at 6:10 PM, Bansal, Vikas wrote:

>> So I want to code so that it will give the output like this-
>>
>> DATA FRAME (Input)

Editing the task so it is reproducible:

dat <- read.table(textConnection(' col3 col9
T .a,g,,
A .t,t,,
A .,c,c,
C .,a,,,
G .,t,t,t
A .c,,g,^!.
A .g,ggg.^!,
A .$,,,,,.,
C a,g,,t,
T ,,,,,.,^!.
T ,$,,,,.,."'), header=TRUE, stringsAsFactors=FALSE)

>> output
>>
>> A C G T
>> 1 0 1 4
>> 4 0 0 2
>> 4 2 0 0
>> 1 5 0 0
>> 0 0 4 3

It's also possible to apply the logic that Gabor Grothendieck offered at the beginning of this thread:

dat[, "newcol"] <- apply(dat, 1, function(x) gsub("\\,|\\." ,x[1], x[2]) )
# ... and the obvious repetition for C.G.T

> dat[,"A"] <- nchar( gsub("[^aA]", "", dat[ , "newcol"] ))
> dat
   col3       col9     newcol A
1     T     .a,g,,     TaTgTT 1
2     A     .t,t,,     AtAtAA 4
3     A     .,c,c,     AAcAcA 4
4     C     .,a,,,     CCaCCC 1
5     G    .,t,t,t    GGtGtGt 0
6     A  .c,,g,^!.  AcAAgA^!A 5
7     A .g,ggg.^!, AgAgggA^!A 4
8     A  .$,,,,,.,  A$AAAAAAA 8
9     C    a,g,,t,    aCgCCtC 1
10    T ,,,,,.,^!. TTTTTTT^!T 0
11    T ,$,,,,.,." T$TTTTTTT" 0

I am deeply in debt to Gabor Grothendieck. He taught me all I know regarding regex. The man is a master at patterns.

--
David Winsemius, MD
West Hartford, CT

From bbolker at gmail.com Mon Jul 4 03:29:39 2011
From: bbolker at gmail.com (Ben Bolker)
Date: Mon, 4 Jul 2011 01:29:39 +0000
Subject: [R] Isolines in vector format
References:
Message-ID:

Antonio Rodriges gmail.com> writes:

> I am working with netcdf data of NCEP/NCAR Climate Reanalysis. R does
> have capability to draw isolines using "contour". However, I need not
> to draw but to export contours in any vector format. Is it possible?

See ?contourLines (you will have to write a little bit of code to export the values, which are stored as a list of contours, to a file: see e.g. ?write)

From n.bowora at gmail.com Sun Jul 3 23:49:31 2011
From: n.bowora at gmail.com (EdBo)
Date: Sun, 3 Jul 2011 14:49:31 -0700 (PDT)
Subject: [R] Hint improve my code
In-Reply-To: <1309669183190-3641520.post@n4.nabble.com>
References: <1309654345875-3641354.post@n4.nabble.com> <1309669183190-3641520.post@n4.nabble.com>
Message-ID:

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From JRMH001 at gmail.com Sun Jul 3 22:35:08 2011
From: JRMH001 at gmail.com (J. R. M. Hosking)
Date: Sun, 3 Jul 2011 16:35:08 -0400
Subject: [R] covariance matrix TL-moments
In-Reply-To:
References:
Message-ID:

On 2011-07-03 04:48, osama hussien wrote:
> The backage lmoments computes the L-moments covariances. does anyone
> know a backage to compute
> the TL-moments covariances
> thank

As far as I know the answer is no.
But if you study the code of function varLmoments in package nsRFA and see how closely the computations correspond to the expression for the covariances of "untrimmed" L-moments that you get by setting s=0 and t=0 in Hosking (2007, last two displayed equations on p.3029), it should not be difficult to extend those computations to the "trimmed" case with s>0 or t>0. Reference: J.R.M. Hosking (2007). Some theory and practical uses of trimmed L-moments. Journal of Statistical Planning and Inference, 137, 3024-3039. J. R. M. Hosking From romanponce at hotmail.com Sun Jul 3 22:35:40 2011 From: romanponce at hotmail.com (Sergio Ivan Roman Ponce) Date: Sun, 3 Jul 2011 22:35:40 +0200 Subject: [R] PROBLEM IN R version 2.13.0 (2011-04-13) In-Reply-To: References: Message-ID: Dear Josh This is the link of the reference manual: http://www.vsni.co.uk/downloads/asreml/release3/asreml-R.pdf The link to download the asreml-R is: http://www.vsni.co.uk/software/asreml/ I am still working on the code... Regards, > Sergio -----Original Message----- From: Joshua Wiley [mailto:jwiley.psych at gmail.com] Sent: domingo, 03 de julio de 2011 09:26 p.m. To: Sergio Ivan Roman Ponce Cc: r-help at r-project.org Subject: Re: [R] PROBLEM IN R version 2.13.0 (2011-04-13) I could not find the "asreml" package on CRAN or bioconductor, so I did not install it, but I can load all of the other packages without a problem. If you post where to get the "asreml" package, I can try that too. 
Cheers, Josh R version 2.13.0 (2011-04-13) Platform: x86_64-pc-mingw32/x64 (64-bit) locale: [1] LC_COLLATE=English_United States.1252 [2] LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] grid stats graphics grDevices utils datasets methods [8] base other attached packages: [1] GraphAlignment_1.14.0 gregmisc_2.1.1 MASS_7.3-13 [4] lattice_0.19-26 gplots_2.8.0 gdata_2.8.2 [7] caTools_1.12 bitops_1.0-4.1 gtools_2.6.2 [10] gmodels_2.15.1 loaded via a namespace (and not attached): [1] tools_2.13.0 On Sun, Jul 3, 2011 at 4:21 AM, Sergio Ivan Roman Ponce wrote: > I am using R (version 2.13.0; 2011-04-13). Platform: > x86_64-pc-mingw32/x64 (64-bit). > > > > The messages that I see is > > > > ?This application has requested the Runtime to terminate it in an > unusual way. > > Please contact the application's support team for more information? > > > > I am using the following libraries: > >> library(package="gmodels") > >> library(package="gtools") > >> library(package="bitops") > >> library(package="caTools") > >> library(package="grid") > >> library(package="gdata") > > gdata: read.xls support for 'XLS' (Excel 97-2004) files ENABLED. > > > > gdata: read.xls support for 'XLSX' (Excel 2007+) files ENABLED. > > > > Attaching package: 'gdata' > > > > The following object(s) are masked from 'package:stats': > > > > nobs > > > > The following object(s) are masked from 'package:utils': > > > > object.size > > > >> library(package="gplots") > > > > Attaching package: 'gplots' > > > > The following object(s) are masked from 'package:stats': > > > > lowess > > > >> library(package="lattice") > >> library(package="MASS") > >> library(package="gregmisc") > > Mensajes de aviso perdidos > > > > The `gregmisc' *package* has converted into a *bundle* > > containing four sub-packages: gdata, gtools, gmodels, and gplots. > > Please load these packages directly. 
> >> library(package="asreml") > >> library(package="GraphAlignment") > > > > > > > > I am searching how to solve this problem. > > > > Regards, > > > > > > Sergio Ivan Roman Ponce > > Kajavein 18, 1430 > 5s,+No > ruega&ei=8dJNTauHItGcOpCWnBE&sa=X&oi=geocode_result&ct=title&resnum=1& > ved=0C > BUQ8gEwAA> ?s ,Norway > > Mobile +47 - 94 43 61 86 > > e-mail: romanponce at hotmail.com > > skype: romanpsergio > > > > > [[alternative HTML version deleted]] Please do not post in HTML, when converted to plain text (as it is for this list), it comes through with lots of extra space at best and sometimes completely garbled. At least on my hotmail, a few buttons to the right of "Send" is a drop down menu that lets you select Rich text, HTML, or plain text. When sending to Rhelp, select "Plain text". > > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://www.joshuawiley.com/ From n.bowora at gmail.com Mon Jul 4 01:32:33 2011 From: n.bowora at gmail.com (EdBo) Date: Sun, 3 Jul 2011 16:32:33 -0700 (PDT) Subject: [R] Hint improve my code In-Reply-To: <1309654345875-3641354.post@n4.nabble.com> References: <1309654345875-3641354.post@n4.nabble.com> Message-ID: <1309735953198-3642707.post@n4.nabble.com> Please also consider the code below. My attempt to loop through the rows of R_j so that I do not force if() to work on a vector. 
> llik <- function(par, R_j, R_m) { + al_j <- par[1] + au_j <- par[2] + sigma_j <- par[3] + b_j <- par[4] + + n=2 + runs=5 + est1=matrix(0,nrow=runs) + start.par=c(al_j=0,au_j=0,sigma_j=0.01,b_j=1) + out1=optim(par=start.par,llik, R_j=R_j, R_m=R_m) + for (i in 1: runs) + { + index_start=2*(i-1)+1 + index_end= 2*i + if(R_j[i]< 0) { + sum(log(1/(2*pi*(sigma_j^2)))-(1/(2*(sigma_j^2))*(R_j+al_j-b_j*R_m))^2) + } else if(R_j[i]>0) { + sum(log(1/(2*pi*(sigma_j^2)))-(1/(2*(sigma_j^2))*(R_j+au_j-b_j*R_m))^2) + } else if(R_j[i]==0) { + sum(log(pnorm(au_j,mean=b_j*R_m,sd=sigma_j)-pnorm(al_j,mean=b_j*R_m,sd=sigma_j))) + } + est1[i] <- out1[index_start:index_end] + } + } > est1 Error: object 'est1' not found Thank you in advance -- View this message in context: http://r.789695.n4.nabble.com/Hint-improve-my-code-tp3641354p3642707.html Sent from the R help mailing list archive at Nabble.com. From jcpayne at uw.edu Mon Jul 4 03:52:02 2011 From: jcpayne at uw.edu (genghis) Date: Sun, 3 Jul 2011 18:52:02 -0700 (PDT) Subject: [R] superimposing different plot types in lattice panel.superpose Message-ID: <1309744322916-3642808.post@n4.nabble.com> I would like to plot 3 best-fit models in a single panel of a lattice plot, superimposed on 3 corresponding datasets in the same panel. My goal is to show the models as lines of 3 different colors, and the data as points whose colors correspond to the model colors. In essence, I have two levels of grouping: 1) model vs. data, and 2) model number. Since there is only one ?groups? variable, I have tried to deal with the additional grouping level by subsetting the data inside a custom panel function (basically a hack), but something about the way parameters are passed in panel.superpose (I think) is making it hard to show both points and lines. 
My question is very similar to a previous post:
http://www.ask.com/web?q=r%20panel.superpose%20bwplot%20sim%20actual&o=15527&l=dis&prt=NIS&chn=retail&geo=US&ver=18
, but the questioner in that case was using bwplot, which automatically makes a separate plot for every level of the categorical variable, so they didn't face the two-level grouping problem, and I have been unable to figure out how to adapt their answer. Another approach I tried was to put my model function inside of my custom panel function so that the analysis occurred there, but I couldn't get it to subset the x and y data appropriately.

In the toy problem below I want to plot each inverted V ("model") as LINES, with a single POINT ("data") in the center of each inverted V, the same color as the inverted V. The code runs, but I can't seem to mix lines and points. In my real problem (stable isotopes with ellipses superimposed on data) there will be additional panels but I am creating only one panel here, for simplicity.

#generate test "model results"
x<-1:9
y<-rep(c(1,3,1),3)
model<-c(rep("a",3),rep("b",3),rep("c",3))
modelresults<-data.frame(x,y,model)

#generate test "data"
x<-c(2,5,8)
y<-rep(1.5,3)
model<-c("a","b","c")
data<-data.frame(x,y,model)

#combine them into one data set
combined<-make.groups(modelresults,data)

#custom panel function
panel.dualplot <- function(...) {
panel.xyplot(x[which="modelresults"],y[which="modelresults"],...)
panel.points(x[which="data"],y[which="data"],...)
}

#main call to xyplot
xyplot(y ~ x, data=combined,
type = "l",
panel = panel.superpose,
groups = model,
panel.groups = panel.dualplot,
)

I'd be very grateful for any suggestions. Thanks!

John

--
View this message in context: http://r.789695.n4.nabble.com/superimposing-different-plot-types-in-lattice-panel-superpose-tp3642808p3642808.html
Sent from the R help mailing list archive at Nabble.com.
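
One possible way to get lines and colour-matched points in a single panel is sketched below. It is only a sketch under stated assumptions: the colours are chosen explicitly rather than taken from the lattice theme, the point data frame is renamed `obs` (the post calls it `data`), and the points are drawn from that data frame inside the panel function, so the approach as written only suits a single-panel plot. The names `cols`, `obs` and `p` are illustrative.

```r
library(lattice)

# Toy data as described in the post above.
modelresults <- data.frame(
  x = 1:9,
  y = rep(c(1, 3, 1), 3),
  model = factor(rep(c("a", "b", "c"), each = 3))
)
obs <- data.frame(
  x = c(2, 5, 8),
  y = rep(1.5, 3),
  model = factor(c("a", "b", "c"))
)

# One explicit colour per model level, shared by lines and points.
cols <- c("blue", "red", "darkgreen")

p <- xyplot(
  y ~ x, data = modelresults, groups = model,
  type = "l", col = cols, ylim = c(0.5, 3.5),
  panel = function(x, y, ...) {
    # Grouped lines: panel.xyplot() dispatches on 'groups' passed via '...'.
    panel.xyplot(x, y, ...)
    # Points from the second data set, coloured by the same model index.
    panel.points(obs$x, obs$y, pch = 16,
                 col = cols[as.integer(obs$model)])
  }
)
print(p)
```

With several panels the points would need to be subset per panel (for example by the conditioning variable), which this sketch does not attempt.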
From sleepingwell at gmail.com Mon Jul 4 06:48:11 2011 From: sleepingwell at gmail.com (Simon Knapp) Date: Mon, 4 Jul 2011 14:48:11 +1000 Subject: [R] RWinEdt problem Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From sleepingwell at gmail.com Mon Jul 4 06:56:43 2011 From: sleepingwell at gmail.com (Simon Knapp) Date: Mon, 4 Jul 2011 14:56:43 +1000 Subject: [R] RWinEdt problem In-Reply-To: References: Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From eliot.isaac at gmail.com Mon Jul 4 07:00:43 2011 From: eliot.isaac at gmail.com (Eliot Miller) Date: Mon, 4 Jul 2011 01:00:43 -0400 Subject: [R] R hangs. Uninstall/reinstall seems insufficient. Message-ID: Hello, I'm running a Mac PowerBook G4 PowerPC with OSX 10.5.8 I was using R 2.12.xx with no problems. I tried to install R 2.13. Have never had any trouble with simply downloading the pkg and installing over previous versions. I did that, selected the packages I use regularly, set them to download with "Install Dependencies" checked and went off to sleep. When I came back in the morning, I had a message saying something to the effect of a large number of packages (dependencies) were not able to be installed. Other than that, all seemed fine. I can run codes and generally use R normally. However, if I misspell anything in the console, e.g. type "lirbary(xx)"...anything that would tell me such and such doesn't exist, it hangs indefinitely. I have to force quit. I tried uninstalling it and installing it again. I tried uninstalling it and installing a more recent build from http://r.research.att.com/ I tried uninstalling it and re-installing R 2.12 from the same site. 
I followed many of the instructions here, ** OS X 10.5 and higher you have to use ** For 32/64 Bit Binary pkg pkgutil --forget org.r-project.R.Leopard.fw.pkg ** For the 32 bit Tiger binary dmg: pkgutil --forget org.r-project.R.framework pkgutil --forget org.r-project.R ** Etc: ** Check for pkgs: pkgutil --pkgs | grep r-project ** UNinstall R rm -rf /Library/Frameworks/R.framework rm -rf /Applications/R.app until I couldn't find anymore R files on the computer, then re-installed, trying both 2.12 and 2.13 after first doing the above steps, and still no luck. I have also used pkgutil --unlink for the two files -- org.r-project.R.Leopard.fw.pkg, org.r-project.R.Leopard.GUI.pkg -- then restarted, installed again, all to no avail. In other words, I repeatedly deleted both the GUI and the Framework and--I believe--all associated files (e.g. Library/Receipts/bom), installed, and it did not solve the problem. No other programs on the computer seem to be running improperly. I have also created a new user on the computer and tried to install R with that user. Installation worked, but left with the same problem. There were some errors in the installation log from the installation on this new user: Jul 3 20:58:12 Macintosh-102 Installer[474]: JS: Package Authoring Error: Exception thrown while running volume check. TypeError: Result of expression 'my.target.systemVersion' [undefined] is not an object. and also a series of: Jul 3 20:59:10 Macintosh-102 Software Update[440]: Package Authoring Error: installation-check results requires a message Throughout all of this (except when I created a new user), whenever I re-installed R and started it up, it would start with the same background color I had chosen years ago, and was saying it was re-creating the history from .Rhistory (this latter part only until I went ahead and set in preferences to stop loading the history automatically). Obviously, there were still some files/settings somewhere. 
Finally, I downloaded an old version of appZapper capable of running on OSX 10.5 and deleted Rhistory and did the full uninstall with pkgutil...I was able to make R not load with any previous colors, etc, but the problem still exists. I also updated Java today to the newest version of Java 5, but I went into preferences and switched back to the old version, which had no effect on the problem with R. Apologies if you think this isn't an R question, and it's actually a problem of my computer, but I can't figure how that would be. Any suggestions? Any help would be greatly appreciated. Thanks! Eliot From tyagi149 at gmail.com Mon Jul 4 06:40:42 2011 From: tyagi149 at gmail.com (user123) Date: Sun, 3 Jul 2011 21:40:42 -0700 (PDT) Subject: [R] wavelets Message-ID: <1309754442486-3642973.post@n4.nabble.com> I'm new to the topic of wavelets. When I tried to use the mra function in the wavelets package, the data is not getting compressed. eg. if the original data has 500 values , the output data also has the same. However in MATLAB, depending on the level of decompositon, the data gets compressed. How do I implement this in R? -- View this message in context: http://r.789695.n4.nabble.com/wavelets-tp3642973p3642973.html Sent from the R help mailing list archive at Nabble.com. From djmuser at gmail.com Mon Jul 4 09:00:59 2011 From: djmuser at gmail.com (Dennis Murphy) Date: Mon, 4 Jul 2011 00:00:59 -0700 Subject: [R] superimposing different plot types in lattice panel.superpose In-Reply-To: <1309744322916-3642808.post@n4.nabble.com> References: <1309744322916-3642808.post@n4.nabble.com> Message-ID: Hi: Here's one way out, if I read your intention properly... 
plot1 <- xyplot(y ~ x, data = modelresults, groups = model, type = 'l',
                xlim = c(0, 10), ylim = c(0.5, 3.5))
plot2 <- xyplot(y ~ x, data = data, groups = model, type = 'p', pch = 16,
                xlim = c(0, 10), ylim = c(0.5, 3.5))
plot1 + plot2

You need latticeExtra to combine the plots with '+'; I had it loaded
with lattice when I did the plot.

HTH,
Dennis

On Sun, Jul 3, 2011 at 6:52 PM, genghis wrote:
> I would like to plot 3 best-fit models in a single panel of a lattice plot,
> superimposed on 3 corresponding datasets in the same panel. My goal is to
> show the models as lines of 3 different colors, and the data as points whose
> colors correspond to the model colors. In essence, I have two levels of
> grouping: 1) model vs. data, and 2) model number. Since there is only one
> "groups" variable, I have tried to deal with the additional grouping level
> by subsetting the data inside a custom panel function (basically a hack),
> but something about the way parameters are passed in panel.superpose (I
> think) is making it hard to show both points and lines.
>
> My question is very similar to a previous post:
> http://www.ask.com/web?q=r%20panel.superpose%20bwplot%20sim%20actual&o=15527&l=dis&prt=NIS&chn=retail&geo=US&ver=18
> , but the questioner in that case was using bwplot, which automatically
> makes a separate plot for every level of the categorical variable, so they
> didn't face the two-level grouping problem, and I have been unable to figure
> out how to adapt their answer. Another approach I tried was to put my model
> function inside of my custom panel function so that the analysis occurred
> there, but I couldn't get it to subset the x and y data appropriately.
>
> In the toy problem below I want to plot each inverted V ("model") as LINES,
> with a single POINT ("data") in the center of each inverted V, the same
> color as the inverted V. The code runs, but I can't seem to mix lines and
> points.
> In my real problem (stable isotopes with ellipses superimposed on
> data) there will be additional panels but I am creating only one panel here,
> for simplicity.
>
> #generate test "model results"
> x<-1:9
> y<-rep(c(1,3,1),3)
> model<-c(rep("a",3),rep("b",3),rep("c",3))
> modelresults<-data.frame(x,y,model)
>
> #generate test "data"
> x<-c(2,5,8)
> y<-rep(1.5,3)
> model<-c("a","b","c")
> data<-data.frame(x,y,model)
>
> #combine them into one data set
> combined<-make.groups(modelresults,data)
>
> #custom panel function
> panel.dualplot <- function(...) {
>    panel.xyplot(x[which="modelresults"],y[which="modelresults"],...)
>    panel.points(x[which="data"],y[which="data"],...)
> }
>
> #main call to xyplot
> xyplot(y ~ x, data=combined,
> type = "l",
> panel = panel.superpose,
> groups = model,
> panel.groups = panel.dualplot,
> )
>
>
> I'd be very grateful for any suggestions.
>
> Thanks!
>
> John
>
>
>
> --
> View this message in context: http://r.789695.n4.nabble.com/superimposing-different-plot-types-in-lattice-panel-superpose-tp3642808p3642808.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

From Thierry.ONKELINX at inbo.be Mon Jul 4 09:39:50 2011
From: Thierry.ONKELINX at inbo.be (ONKELINX, Thierry)
Date: Mon, 4 Jul 2011 07:39:50 +0000
Subject: [R] Poisson GLM with a logged dependent variable...just asking for trouble?
In-Reply-To:
References:
Message-ID:

Dear Mark,

I think you want

glm(DV ~ log10(IV), family=poisson)

Note that the poisson family uses the log-link by default. Hence you don't need to log-transform DV yourself.

Best regards,

Thierry

----------------------------------------------------------------------------
ir.
Thierry Onkelinx Instituut voor natuur- en bosonderzoek team Biometrie & Kwaliteitszorg Gaverstraat 4 9500 Geraardsbergen Belgium Research Institute for Nature and Forest team Biometrics & Quality Assurance Gaverstraat 4 9500 Geraardsbergen Belgium tel. + 32 54/436 185 Thierry.Onkelinx at inbo.be www.inbo.be To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey > -----Oorspronkelijk bericht----- > Van: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] > Namens Mark Na > Verzonden: vrijdag 1 juli 2011 23:10 > Aan: r-help at r-project.org > Onderwerp: [R] Poisson GLM with a logged dependent variable...just asking for > trouble? > > Dear R-helpers, > > I'm using a GLM with poisson errors to model integer count data as a function of > one non-integer covariate. > > The model formula is: log(DV) ~ glm(log(IV,10),family=poisson). > > I'm getting a warning because the logged DV is no longer an integer. > > I have three questions: > > 1) Can I ignore the warning, or is logging the DV (resulting in > non-integers) a serious violation of the Poisson error structure? > > 2) If the answer to #1 is "no, don't ignore it, it's serious" then can I use a > quasipoisson error structure instead (does not give the same > warning) and if so are there any pitfalls to using the quasipoisson model? Are > there any better alternatives for count data where the counts must be logged? > Or, should I just abandon logging the DV? In that case, how could I compare the > fit of a Poisson model (without logging the DV) to that of a GLM with normal > errors (with a logged DV). 
AIC would not be valid because the DVs are different, > right? > > 3) The quasipoisson model doesn't return an AIC value. Why, and is there > anything I can do to calculate AIC manually, that would allow me to compare > this model to other models? > > Many thanks in advance for your help! > > Cheers, Mark > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From jdnewmil at dcn.davis.ca.us Mon Jul 4 09:45:41 2011 From: jdnewmil at dcn.davis.ca.us (Jeff Newmiller) Date: Mon, 04 Jul 2011 00:45:41 -0700 Subject: [R] wavelets In-Reply-To: <1309754442486-3642973.post@n4.nabble.com> References: <1309754442486-3642973.post@n4.nabble.com> Message-ID: <1c665c21-baa5-41ad-8b37-dc7b0013b838@email.android.com> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From ligges at statistik.tu-dortmund.de Mon Jul 4 09:49:04 2011 From: ligges at statistik.tu-dortmund.de (Uwe Ligges) Date: Mon, 04 Jul 2011 09:49:04 +0200 Subject: [R] RWinEdt problem In-Reply-To: References: Message-ID: <4E117070.8060206@statistik.tu-dortmund.de> On 04.07.2011 06:56, Simon Knapp wrote: > ... I just tried fiddling with the appearance settings, and when I uncheck > "Custom Colors" under "Document Tabs", the file names reappear, though I > don't get the coloring I am used to (red for modified, green for > unmodified). Right. That is fixed in RWinEdt 1.8-3 (i.e. customm coloring disabled) which has been uploaded to CRAN during the weekend. A Windows binary will be created shortly. Uwe > Thanks again, > Simon Knapp > > > On Mon, Jul 4, 2011 at 2:48 PM, Simon Knapp wrote: > >> Hi R Helpers, >> >> I am a long time RWinEdt user and have just acquired a new laptop. 
I have >> installed RWinEdt and things are going smoothly except for one small glitch >> - file names are not appearing on the document tabs. When I use WinEdt (as >> opposed to RWinEdt), they are appearing. Can anyone offer any advice on >> this? >> >> Thanks in advance, >> Simon Knapp >> >> OS: windows7 >> Arch: 64 bit >> R version: 2.13.0 (2011-04-13) >> WinEdt version: 5.14 (build 20050701) >> > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From jwiley.psych at gmail.com Mon Jul 4 10:11:57 2011 From: jwiley.psych at gmail.com (Joshua Wiley) Date: Mon, 4 Jul 2011 01:11:57 -0700 Subject: [R] Wrong environment when evaluating and expression? Message-ID: Hi All, I have constructed two expressions (e1 & e2). I can see that they are not identical, but I cannot figure out how they differ. ############### dat <- mtcars e1 <- expression(with(data = dat, lm(mpg ~ hp))) e2 <- as.expression(substitute(with(data = dat, lm(f)), list(f = mpg ~ hp))) str(e1) str(e2) all.equal(e1, e2) identical(e1, e2) # false eval(e1) eval(e2) ################ The context is trying to use a list of formulae to generate several models from a multiply imputed dataset. The package I am using (mice) has methods for with() and that is how I can (easily) get the pooled results. Passing the formula directly does not work, so I was trying to generate the entire call and evaluate it as if I had typed it at the console, but I am missing something (probably rather silly). Thanks, Josh -- Joshua Wiley Ph.D. 
Student, Health Psychology University of California, Los Angeles http://www.joshuawiley.com/ From Roger.Bivand at nhh.no Mon Jul 4 11:33:17 2011 From: Roger.Bivand at nhh.no (Roger Bivand) Date: Mon, 4 Jul 2011 11:33:17 +0200 Subject: [R] [R-pkgs] rgdal 0.7-1 release Message-ID: A new release of rgdal, a package providing bindings for the Geospatial Data Abstraction Library for reading and writing spatial data, has reached CRAN. This release changes the error handling mechanisms, and is more fully described in a posting on R-sig-geo: https://stat.ethz.ch/pipermail/r-sig-geo/2011-July/012126.html If any users observe unexpected behaviour following update, please revert to the 0.6-* series, and report with full details to the package maintainer. Extensive checking has been carried out, and no unexpected behaviour observed, but it is not feasible to check all possible use cases, especially erroneous use cases, hence this message. -- Roger Bivand Department of Economics, NHH Norwegian School of Economics, Helleveien 30, N-5045 Bergen, Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43 e-mail: Roger.Bivand at nhh.no _______________________________________________ R-packages mailing list R-packages at r-project.org https://stat.ethz.ch/mailman/listinfo/r-packages From jrkrideau at yahoo.ca Mon Jul 4 13:21:51 2011 From: jrkrideau at yahoo.ca (John Kane) Date: Mon, 4 Jul 2011 04:21:51 -0700 (PDT) Subject: [R] Unusual graph- modified wind rose perhaps? Message-ID: <1309778511.5571.YahooMailClassic@web38408.mail.mud.yahoo.com> In a OpenOffice.org forum someone was asking if the spreadsheet could graph this http://www.elmundo.es/elmundosalud/documentos/2011/06/leche.html I didn't think it could. :) I don't think I've ever seen exactly this layout. Does anyone know if there is anything in R that does a graph like this or that can be adapted to do it. 
Unfortunately my Spanish is non-existent so I am not sure how effective the graph is in achieving whatever it's supposed to do. A dot chart might be as effective but it is a flashy graphic.

Thanks

From ggrothendieck at gmail.com Mon Jul 4 13:26:22 2011
From: ggrothendieck at gmail.com (Gabor Grothendieck)
Date: Mon, 4 Jul 2011 07:26:22 -0400
Subject: [R] Wrong environment when evaluating and expression?
In-Reply-To:
References:
Message-ID:

On Mon, Jul 4, 2011 at 4:11 AM, Joshua Wiley wrote:
> Hi All,
>
> I have constructed two expressions (e1 & e2). I can see that they are
> not identical, but I cannot figure out how they differ.
>
> ###############
> dat <- mtcars
> e1 <- expression(with(data = dat, lm(mpg ~ hp)))
> e2 <- as.expression(substitute(with(data = dat, lm(f)), list(f = mpg ~ hp)))
>
> str(e1)
> str(e2)
> all.equal(e1, e2)
> identical(e1, e2) # false
>
> eval(e1)
> eval(e2)
> ################
>
> The context is trying to use a list of formulae to generate several
> models from a multiply imputed dataset. The package I am using (mice)
> has methods for with() and that is how I can (easily) get the pooled
> results. Passing the formula directly does not work, so I was trying
> to generate the entire call and evaluate it as if I had typed it at
> the console, but I am missing something (probably rather silly).
>

In e1, mpg ~ hp is a call object, but in e2 it is a formula with an environment:

> e1[[1]][[3]][[2]]
mpg ~ hp
> e2[[1]][[3]][[2]]
mpg ~ hp
>
> class(e1[[1]][[3]][[2]])
[1] "call"
> class(e2[[1]][[3]][[2]])
[1] "formula"
>
> environment(e2[[1]][[3]][[2]])

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

From Thierry.ONKELINX at inbo.be Mon Jul 4 14:14:30 2011
From: Thierry.ONKELINX at inbo.be (ONKELINX, Thierry)
Date: Mon, 4 Jul 2011 12:14:30 +0000
Subject: [R] Unusual graph- modified wind rose perhaps?
In-Reply-To: <1309778511.5571.YahooMailClassic@web38408.mail.mud.yahoo.com>
References: <1309778511.5571.YahooMailClassic@web38408.mail.mud.yahoo.com>
Message-ID:

Dear John,

You can get pretty close with ggplot2.

Best regards,

Thierry

library(ggplot2)
dataset <- data.frame(Name = LETTERS[1:26])
dataset$Score <- runif(nrow(dataset))
dataset$Category <- cut(dataset$Score, breaks = c(-Inf, 0.33, 0.66, Inf), labels = c("Bad", "Neutral", "Good"))
dataset$Name <- factor(dataset$Name, levels = dataset$Name[order(dataset$Score)])
dataset$Location <- as.numeric(dataset$Name)

ggplot(dataset, aes(x = Name, y = Score, fill = Category)) + geom_bar(stat = "identity") + coord_polar()

#with some extra tweaking
dataset <- rbind(dataset,
    data.frame(
        Location = c(max(dataset$Location) + seq_len(max(dataset$Location) / 2), min(dataset$Location) - seq_len(max(dataset$Location) / 2)),
        Name = "",
        Score = 0,
        Category = "Good"
    )
)
ggplot(dataset, aes(x = Location, y = Score, fill = Category)) + geom_bar(stat = "identity") + coord_polar(start = pi, direction = -1) + scale_fill_manual(values = c(Good = "green", Neutral = "grey", Bad = "red")) + theme_bw() + scale_x_continuous("", breaks = dataset$Location, labels = dataset$Name)

----------------------------------------------------------------------------
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek
team Biometrie & Kwaliteitszorg
Gaverstraat 4
9500 Geraardsbergen
Belgium

Research Institute for Nature and Forest
team Biometrics & Quality Assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium

tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey > -----Oorspronkelijk bericht----- > Van: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] > Namens John Kane > Verzonden: maandag 4 juli 2011 13:22 > Aan: r-help at r-project.org > Onderwerp: [R] Unusual graph- modified wind rose perhaps? > > > In a OpenOffice.org forum someone was asking if the spreadsheet could graph > this http://www.elmundo.es/elmundosalud/documentos/2011/06/leche.html > > I didn't think it could. :) > > I don't think I've ever seen exactly this layout. Does anyone know if there is > anything in R that does a graph like this or that can be adapted to do it. > > Unfortunately my Spanish is non-existent so I am not sure how effective the > graph is in achieving whatever it's suppposed to do. A dot chart might be as > effective but it is a flashy graphic. > > Thanks > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From jim at bitwrit.com.au Mon Jul 4 14:20:23 2011 From: jim at bitwrit.com.au (Jim Lemon) Date: Mon, 04 Jul 2011 22:20:23 +1000 Subject: [R] Unusual graph- modified wind rose perhaps? In-Reply-To: <1309778511.5571.YahooMailClassic@web38408.mail.mud.yahoo.com> References: <1309778511.5571.YahooMailClassic@web38408.mail.mud.yahoo.com> Message-ID: <4E11B007.2030303@bitwrit.com.au> On 07/04/2011 09:21 PM, John Kane wrote: > > In a OpenOffice.org forum someone was asking if the spreadsheet could graph this http://www.elmundo.es/elmundosalud/documentos/2011/06/leche.html > > I didn't think it could. :) > > I don't think I've ever seen exactly this layout. Does anyone know if there is anything in R that does a graph like this or that can be adapted to do it. 
> > Unfortunately my Spanish is non-existent so I am not sure how effective the graph is in achieving whatever it's suppposed to do. A dot chart might be as effective but it is a flashy graphic. > Hi John, It's a bit like the function I am working on after the request by Patrick Jemison (radial.pie), and I thought the interactive fill when you sweep the pointer over it is pretty neat. I don't think I'll go that far. I'll let you know when I've got a working function. Jim From johannes_graumann at web.de Mon Jul 4 14:35:01 2011 From: johannes_graumann at web.de (Johannes Graumann) Date: Mon, 4 Jul 2011 15:35:01 +0300 Subject: [R] Prevent 'R CMD check' from reporting "NA"/"NA_character_" missmatch? Message-ID: Hello, I'm writing a package am running 'R CMD check' on it. Is there any way to make 'R CMD check' not warn about a missmatch between 'NA_character_' (in the function definition) and 'NA' (in the documentation)? Thanks for any help. Sincerely, Joh From vaishali.sadaphal at tcs.com Mon Jul 4 09:47:21 2011 From: vaishali.sadaphal at tcs.com (Vaishali Sadaphal) Date: Mon, 4 Jul 2011 13:17:21 +0530 Subject: [R] Protecting R code Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From sanketh.vijay at intermediasoftech.com Mon Jul 4 11:28:51 2011 From: sanketh.vijay at intermediasoftech.com (Sanketh) Date: Mon, 4 Jul 2011 02:28:51 -0700 (PDT) Subject: [R] Copying to R a rectangular array from a Java class In-Reply-To: <1304368924113-3490919.post@n4.nabble.com> References: <1304180854651-3486167.post@n4.nabble.com> <1304368924113-3490919.post@n4.nabble.com> Message-ID: <1309771731792-3643223.post@n4.nabble.com> Hi, can you please tel me how to retrieve String two dimensional array as like sapply? -- View this message in context: http://r.789695.n4.nabble.com/Copying-to-R-a-rectangular-array-from-a-Java-class-tp3486167p3643223.html Sent from the R help mailing list archive at Nabble.com. 
From n.bowora at gmail.com Mon Jul 4 11:34:31 2011
From: n.bowora at gmail.com (EdBo)
Date: Mon, 4 Jul 2011 02:34:31 -0700 (PDT)
Subject: [R] loop in optim
Message-ID: <1309772071774-3643230.post@n4.nabble.com>

Hi,

Could you help me correct my loop? I want optim to estimate al_j, au_j, sigma_j and b_j by looking at data points 0 to 20, 21 to 40, 41 to 60. The final result should have 4 columns, one for each of the estimates, AND 4 rows for the blocks 0 to 20, 21 to 40, 41 to 60.

###MY code is

n=20
runs=4
out=matrix(0,nrow=runs)
llik = function(x)
{
al_j=x[1]; au_j=x[2]; sigma_j=x[3]; b_j=x[4]
sum(na.rm=T,
ifelse(a$R_j< 0, -log(1/(2*pi*(sigma_j^2)))-
(1/(2*(sigma_j^2))*(a$R_j+al_j-b_j*a$R_m))^2,
ifelse(a$R_j>0 , -log(1/(2*pi*(sigma_j^2)))-
(1/(2*(sigma_j^2))*(a$R_j+au_j-b_j*a$R_m))^2,
-log(pnorm(au_j,mean=b_j*a$R_m,sd=sqrt(sigma_j^2))-
pnorm(au_j,mean=b_j*a$R_m,sd=sqrt(sigma_j^2)))))
)
}
start.par = c(0, 0, 0.01, 1)
out1 = optim(llik, par=start.par, method="Nelder-Mead")
for (i in 1: runs)
{
index_start=20*(i-1)+1
index_end= 20*i
out[i]=out1[index_start:index_end]
}
out

Thank you in advance

Edward
UCT

####My data

R_j           R_m
-0.0625       0.002320654
0             -0.004642807
0.033333333   0.005936332
0.032258065   0.001060848
0             0.007114057
0.015625      0.005581558
0             0.002974794
0.015384615   0.004215271
0.060606061   0.005073116
0.028571429   -0.006001279
0             -0.002789594
0.013888889   0.00770633
0             0.000371663
0.02739726    -0.004224228
-0.04         0.008362539
0             -0.010951605
0             0.004682924
0.013888889   0.011839993
-0.01369863   0.004210383
-0.027777778  -0.04658949
0             0.00987272
-0.057142857  -0.062203157
-0.03030303   -0.119177639
0.09375       0.077054642
0             -0.022763619
-0.057142857  0.050408775
0             0.024706076
-0.03030303   0.004043701
0.0625        0.004951088
0             -0.005968731
0             -0.038292548
0             0.013381097
0.014705882   0.006424728
-0.014492754  -0.020115626
0             -0.004837891
-0.029411765  -0.022054654
0.03030303    0.008936428
0.044117647   8.16925E-05
0             -0.004827246
-0.042253521  0.004653096
-0.014705882  -0.004222151
0.029850746   0.000107267
-0.028985507  -0.001783206
0.029850746   -0.006372981
0.014492754   0.005492374
-0.028571429  -0.009005846
0             0.001031683
0.044117647   0.002800551

--
View this message in context: http://r.789695.n4.nabble.com/loop-in-optim-tp3643230p3643230.html
Sent from the R help mailing list archive at Nabble.com.

From uriblass at gmail.com Mon Jul 4 11:48:33 2011
From: uriblass at gmail.com (UriB)
Date: Mon, 4 Jul 2011 02:48:33 -0700 (PDT)
Subject: [R] How to build a matrix of number of appearance?
Message-ID: <1309772913259-3643248.post@n4.nabble.com>

I have a matrix of claims at year1 that I get simply by

claims<-read.csv(file="Claims.csv")
qq1<-claims[claims$Year=="Y1",]

I have MemberID and ProviderID for every claim in qq1; both are integers.

An example of the type of question that I want to answer is how many times ProviderID number 345 appears together with MemberID 23 in the table qq1.

In order to answer these questions for every possible ProviderID and every possible MemberID, I would like to have a matrix whose first column is MemberID, with every MemberID in qq1 appearing only once, and whose other columns hold the number of appearances of ProviderID==i for every i that has sum(qq1$ProviderID==i)>0.

My question is whether there is a simple way to do this in R.

Thanks in advance,
Uri

--
View this message in context: http://r.789695.n4.nabble.com/How-to-build-a-matrix-of-number-of-appearance-tp3643248p3643248.html
Sent from the R help mailing list archive at Nabble.com.

From ramzi.temanni at gmail.com Mon Jul 4 11:58:23 2011
From: ramzi.temanni at gmail.com (Ramzi TEMANNI)
Date: Mon, 4 Jul 2011 11:58:23 +0200
Subject: [R] writeLines + foreach/doMC
In-Reply-To:
References:
Message-ID:

An embedded and charset-unspecified text was scrubbed...
Name: not available URL: From marchywka at hotmail.com Mon Jul 4 12:34:10 2011 From: marchywka at hotmail.com (Mike Marchywka) Date: Mon, 4 Jul 2011 06:34:10 -0400 Subject: [R] wavelets In-Reply-To: <1c665c21-baa5-41ad-8b37-dc7b0013b838@email.android.com> References: <1309754442486-3642973.post@n4.nabble.com>, <1c665c21-baa5-41ad-8b37-dc7b0013b838@email.android.com> Message-ID: > From: jdnewmil at dcn.davis.ca.us > Date: Mon, 4 Jul 2011 00:45:41 -0700 > To: tyagi149 at gmail.com; r-help at r-project.org > Subject: Re: [R] wavelets > > Study the topic more carefully, I suppose. My understanding is that wavelets do not in themselves compress anything, but because they sort out the interesting data from the uninteresting data, it can be easy to toss the uninteresting data (lossy data compression). Perhaps you should understand better what your Matlab library is doing. > --------------------------------------------------------------------------- > Jeff Newmiller The ..... ..... Go Live... > DCN: Basics: ##.#. ##.#. Live Go... > Live: OO#.. Dead: OO#.. Playing > Research Engineer (Solar/Batteries O.O#. #.O#. with > /Software/Embedded Controllers) .OO#. .OO#. rocks...1k > --------------------------------------------------------------------------- > Sent from my phone. Please excuse my brevity. > > user123 wrote: > > I'm new to the topic of wavelets. When I tried to use the mra function in the > wavelets package, the data is not getting compressed. eg. if the original > data has 500 values , the output data also has the same. > However in MATLAB, depending on the level of decompositon, the data gets > compressed. > How do I implement this in R? can you post some code? You can always compress into one value of course by turning bytes into a single char string, what you want is entropy. I posted some example code before and I remember it took effort to not get the subsampling. mra is probably multi-resolution analysis and I'd suppose you want all the samples. 
You probably need paper and pencil however at this point. ? > > -- > View this message in context: http://r.789695.n4.nabble.com/wavelets-tp3642973p3642973.html > Sent from the R help mailing list archive at Nabble.com. > > _____________________________________________ > > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From ripley at stats.ox.ac.uk Mon Jul 4 14:52:54 2011 From: ripley at stats.ox.ac.uk (Prof Brian Ripley) Date: Mon, 4 Jul 2011 13:52:54 +0100 Subject: [R] Prevent 'R CMD check' from reporting "NA"/"NA_character_" missmatch? In-Reply-To: References: Message-ID: On Mon, 4 Jul 2011, Johannes Graumann wrote: > Hello, > > I'm writing a package am running 'R CMD check' on it. > > Is there any way to make 'R CMD check' not warn about a missmatch between > 'NA_character_' (in the function definition) and 'NA' (in the > documentation)? Be consistent .... Why do you want incorrect documentation of your package? (It is not clear of the circumstances here: normally 1 vs 1L and similar are not reported if they are the only errors.) And please do note the posting guide - this is not really the correct list - you were asked to give an actual example with output. -- Brian D. 
Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 From mvalle at cscs.ch Mon Jul 4 14:55:20 2011 From: mvalle at cscs.ch (Mario Valle) Date: Mon, 4 Jul 2011 14:55:20 +0200 Subject: [R] writeLines + foreach/doMC In-Reply-To: References: Message-ID: <4E11B838.9080009@cscs.ch> Read something about parallel processing and how I/O should be done by a single process. Suggestion: write a different file from each thread then combine the results with cat or similar. Hope it helps mario On 04-Jul-11 11:58, Ramzi TEMANNI wrote: > Hi > I'm processing sequencing data trying to collapsing the locations of each > unique sequence and write the results to a file (as storing that in a table > will require 10GB mem at least) > so I wrote a function that, given a sequence id, provide the needed line to > be stored > library(doMC) # load library > registerDoMC(12) # assign the Number of CPU > > > fileConn<-file(paste(fq_file,"_SeqID.txt",sep=""),open = "at") # open > connection > writeLines(paste("ReadID","Freq","Seq","LOC_UG","Nb_UG_Seq",sep="\t"), > fileConn) # write header > foreach(i=1:length(uniq.Seq)) %dopar% # for eqch unique sequence > { > writeLines(paste(gettable1(uniq.Seq[i]),collapse=" "), fileConn) #write > the the results line > } > close(fileConn) > > the code excute well, but the problem is that some lines are wired: > The header and lot of lines are ok : > ReadID Freq Seq LOC_UG Nb_UG_Seq > HWI-EA332_0036:5:16:9530:21025#ATGC/1 XXXXXXXXXXXXXXXXXXXX 2 > XXXXX_10130:489:+,XXXXX_10130:489:+ 2 > HWI-EA332_0036:5:117:6674:4940#ATGC/1 XXXXXXXXXXXXXXXXXXXX 1 > XXXXX:432:-,XXXXX:432:- 2 > HWI-EA332_0036:5:62:15592:7375#ATGC/1 XXXXXXXXXXXXXXXXXXXX 2 > XXXXX_22660:253:+,XXXXX_22660:253:+ 2 > HWI-EA332_0036:5:110:14349:8422#ATGC/1 XXXXXXXXXXXXXXXXXXXX 4 > 
XXXXX_13806:399:+,XXXXX_13806:399:+,XXXXX_27263:481:+,XXXXX_27263:481:+ 4 > other looks wired > HWI-EA332_0036:5:17:1400ReadID Freq Seq LOC_UG Nb_UG_Seq > HWI-EA332_0036:5:61:7734:4201ReadID Freq Seq LOC_UG Nb_UG_Seq > HWI-EA332_0036:5:117:5361:10666#ATGReadID Freq Seq LOC_UG > Nb_UG_Seq > HWI-EA332_0036:5:115:7421:20664#ATGC/1 GATCReadID Freq Seq > LOC_UG Nb_UG_Seq > HWI-EA332_0036:5:175:95:- 2 > HWI-EA332_0036:5JCVI_35536:444:+ 2 > XXXXXXXXX 1 XXXXX_22484:571:-,XXXXX_22484:571:- 2 > > Is this due to the fact that one process start to write prior the other has > finished ? > Is there a way to solve this problem ? > Any suggestions would be greatly appreciated. > Thanks and have a nice day. > > > Best, > Ramzi TEMANNI > http://www.linkedin.com/in/ramzitemanni > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Ing. Mario Valle Data Analysis and Visualization Group | http://www.cscs.ch/~mvalle Swiss National Supercomputing Centre (CSCS) | Tel: +41 (91) 610.82.60 v. Cantonale Galleria 2, 6928 Manno, Switzerland | Fax: +41 (91) 610.82.82 From annakolar at yahoo.com Mon Jul 4 14:57:45 2011 From: annakolar at yahoo.com (Ana Kolar) Date: Mon, 4 Jul 2011 05:57:45 -0700 (PDT) Subject: [R] extracting data In-Reply-To: <4E0A113C.8040609@ucalgary.ca> References: <1309278525.73019.YahooMailNeo@web114720.mail.gq1.yahoo.com> <1309280081.85976.YahooMailNeo@web114713.mail.gq1.yahoo.com> <4E0A113C.8040609@ucalgary.ca> Message-ID: <1309784265.53222.YahooMailNeo@web114701.mail.gq1.yahoo.com> An embedded and charset-unspecified text was scrubbed... 
Name: not available
URL:

From jrkrideau at yahoo.ca Mon Jul 4 15:19:21 2011
From: jrkrideau at yahoo.ca (John Kane)
Date: Mon, 4 Jul 2011 06:19:21 -0700 (PDT)
Subject: [R] Unusual graph- modified wind rose perhaps?
In-Reply-To:
Message-ID: <1309785561.88509.YahooMailClassic@web38401.mail.mud.yahoo.com>

Very pretty Thierry,

I was wondering if ggplot2 could do something like it, but my knowledge of
ggplot2 is far too little to attempt it myself.

I'm going to really have to spend some time on that code.

Thanks

--- On Mon, 7/4/11, ONKELINX, Thierry wrote:

> From: ONKELINX, Thierry
> Subject: RE: [R] Unusual graph- modified wind rose perhaps?
> To: "John Kane" , "r-help at r-project.org"
> Received: Monday, July 4, 2011, 8:14 AM
> Dear John,
>
> You can get pretty close with ggplot2.
>
> Best regards,
>
> Thierry
>
> library(ggplot2)
> dataset <- data.frame(Name = LETTERS[1:26])
> dataset$Score <- runif(nrow(dataset))
> dataset$Category <- cut(dataset$Score, breaks = c(-Inf, 0.33, 0.66, Inf),
>     labels = c("Bad", "Neutral", "Good"))
> dataset$Name <- factor(dataset$Name,
>     levels = dataset$Name[order(dataset$Score)])
> dataset$Location <- as.numeric(dataset$Name)
>
> ggplot(dataset, aes(x = Name, y = Score, fill = Category)) +
>     geom_bar(stat = "identity") + coord_polar()
>
>
> # with some extra tweaking
> dataset <- rbind(dataset,
>     data.frame(
>         Location = c(max(dataset$Location) + seq_len(max(dataset$Location) / 2),
>                      min(dataset$Location) - seq_len(max(dataset$Location) / 2)),
>         Name = "",
>         Score = 0,
>         Category = "Good"
>     )
> )
> ggplot(dataset, aes(x = Location, y = Score, fill = Category)) +
>     geom_bar(stat = "identity") +
>     coord_polar(start = pi, direction = -1) +
>     scale_fill_manual(values = c(Good = "green", Neutral = "grey", Bad = "red")) +
>     theme_bw() +
>     scale_x_continuous("", breaks = dataset$Location, labels = dataset$Name)
>
> ----------------------------------------------------------------------------
> ir.
Thierry Onkelinx
> Instituut voor natuur- en bosonderzoek
> team Biometrie & Kwaliteitszorg
> Gaverstraat 4
> 9500 Geraardsbergen
> Belgium
>
> Research Institute for Nature and Forest
> team Biometrics & Quality Assurance
> Gaverstraat 4
> 9500 Geraardsbergen
> Belgium
>
> tel. + 32 54/436 185
> Thierry.Onkelinx at inbo.be
> www.inbo.be
>
> To call in the statistician after the experiment is done
> may be no more than asking him to perform a post-mortem
> examination: he may be able to say what the experiment died
> of.
> ~ Sir Ronald Aylmer Fisher
>
> The plural of anecdote is not data.
> ~ Roger Brinner
>
> The combination of some data and an aching desire for an
> answer does not ensure that a reasonable answer can be
> extracted from a given body of data.
> ~ John Tukey
>
>
> > -----Original Message-----
> > From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org]
> > On Behalf Of John Kane
> > Sent: Monday, 4 July 2011 13:22
> > To: r-help at r-project.org
> > Subject: [R] Unusual graph- modified wind rose
> perhaps?
> >
> >
> > In an OpenOffice.org forum someone was asking if the
> spreadsheet could graph
> > this http://www.elmundo.es/elmundosalud/documentos/2011/06/leche.html
> >
> > I didn't think it could. :)
> >
> > I don't think I've ever seen exactly this layout. Does
> anyone know if there is
> > anything in R that does a graph like this or that can
> be adapted to do it.
> >
> > Unfortunately my Spanish is non-existent so I am not
> sure how effective the
> > graph is in achieving whatever it's supposed to
> do. A dot chart might be as
> > effective but it is a flashy graphic.
> >
> > Thanks
> >
> > ______________________________________________
> > R-help at r-project.org
> mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained,
> reproducible code.
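For readers trying the polar bar chart quoted above on a current ggplot2 installation, the following is a minimal sketch of the first plot only. It assumes a ggplot2 version that provides geom_col() and spells the manual-scale argument values; the set.seed() call and the object name p are additions for illustration and are not part of the original post.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed so the random scores are repeatable

# Same toy data as in the quoted post: 26 names with random scores
dataset <- data.frame(Name = LETTERS[1:26], Score = runif(26))
dataset$Category <- cut(dataset$Score,
                        breaks = c(-Inf, 0.33, 0.66, Inf),
                        labels = c("Bad", "Neutral", "Good"))
# Order the bars by score
dataset$Name <- factor(dataset$Name,
                       levels = dataset$Name[order(dataset$Score)])

# geom_col() draws bars whose heights are the y values themselves
p <- ggplot(dataset, aes(x = Name, y = Score, fill = Category)) +
  geom_col() +
  coord_polar() +
  scale_fill_manual(values = c(Good = "green", Neutral = "grey", Bad = "red"))

print(p)
```

The padding rows and the rotated start angle from the second quoted plot can be layered on top of this in the same way.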
> From statistics at inter.nl.net Mon Jul 4 15:42:20 2011 From: statistics at inter.nl.net (rgeskus) Date: Mon, 4 Jul 2011 06:42:20 -0700 (PDT) Subject: [R] cumulative incidence plot vs survival plot In-Reply-To: <1309441693.12823.3.camel@nemo> References: <1309206689.17839.YahooMailRC@web125803.mail.ne1.yahoo.com> <1309441693.12823.3.camel@nemo> Message-ID: <1309786940436-3643659.post@n4.nabble.com> Note that most of the nonparametric and semi-parametric competing risks analyses can be performed within the survival package. This includes nonparametric estimation of cause-specific cumulative incidence curves and the log-rank type test. It suffices to create a weighted data set as explained in Geskus, Biometrics 67, p. 39-49, 2011. Ronald Geskus Academic Medical Center Amsterdam, the Netherlands >> Hi, I am wondering if anyone can explain to me if cumulative incidence >> (CI) is .... > The cumulative incidence curve and the KM are not the same, when there > are multiple outcomes. See the "etype" argument to survfit, which is > used to create CI curves (?survfit.formula). For testing differences > between CI curves use the cmprsk library from Gray; it can also draw > curves by the survfit routine has a lot more flexibility. -- View this message in context: http://r.789695.n4.nabble.com/cumulative-incidence-plot-vs-survival-plot-tp3628772p3643659.html Sent from the R help mailing list archive at Nabble.com. From ligges at statistik.tu-dortmund.de Mon Jul 4 16:28:51 2011 From: ligges at statistik.tu-dortmund.de (Uwe Ligges) Date: Mon, 04 Jul 2011 16:28:51 +0200 Subject: [R] Protecting R code In-Reply-To: References: Message-ID: <4E11CE23.6050204@statistik.tu-dortmund.de> On 04.07.2011 09:47, Vaishali Sadaphal wrote: > Hi All, > > I need to give my R code to my client to use. I would like to protect the > logic/algorithms that have been coded in R. This means that I would not > like anyone to be able to read the code. > > I am searching for ways to protect R code. 
I would like to create a .exe > kind of file which could be executed without using R or requiring to > install R. I would not like the R code to be loaded in R. This is so > because, after R loads a function, if you type the function name on the > command prompt, you can see the complete code. I would not like to give > this type of access to the R code. > > I explored the option of creating .bat file (using command: R CMD BAT) and > byte code (using command: compile). These are not useful since they open > R, load these functions and then the R code is visible. > > Is there any other way to protect the R code which would help me package > all my files/source files and give me an executable file which would be > run without opening R? Another problem is that R is freely downloadable. > Is it somehow possible to protect the code from being loaded in R and > being seen. Hmmmm, R is open source software under the GPL (which is infective) and designed as such. Good luck it is almost impossible to hide the source code in R. And people who tried to generate C based binary packages found those can only be used under a small subset of platforms with few versions of R. Since R is distributed under the GPL: When you write code and make it available to others, you should be aware of this fact that you may have to distribute the sources under GPL as well - under some circumstances your lawyer can explain much better than I. > Thanks > -- > Vaishali > =====-----=====-----===== > Notice: The information contained in this e-mail > message and/or attachments to it may contain > confidential or privileged information. If it is "confidential or privileged information", you should not send it to a mailing list where the archives are published. Best, Uwe Ligges > If you are > not the intended recipient, any dissemination, use, > review, distribution, printing or copying of the > information contained in this e-mail message > and/or attachments to it are strictly prohibited. 
If > you have received this communication in error, > please notify us by reply e-mail or telephone and > immediately and permanently delete the message > and any attachments. Thank you > > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From b.rowlingson at lancaster.ac.uk Mon Jul 4 16:41:50 2011 From: b.rowlingson at lancaster.ac.uk (Barry Rowlingson) Date: Mon, 4 Jul 2011 15:41:50 +0100 Subject: [R] Protecting R code In-Reply-To: References: Message-ID: On Mon, Jul 4, 2011 at 8:47 AM, Vaishali Sadaphal wrote: > Hi All, > > I need to give my R code to my client to use. I would like to protect the > logic/algorithms that have been coded in R. This means that I would not > like anyone to be able to read the code. At some point the R code has to be run. Which means it has to be read by an interpreter that can handle R code. Which means, unless you rewrite the interpreter, the R code must exist as such. Even if you could compile R into C code into machine code and distribute a .exe file, its still possible in theory to reverse-engineer it and get something like the original back - the original logic if not the original names of the variables and functions. You could rewrite the interpreter to only run encrypted, signed code that requires a decryption key, but you still have to give the user the decryption key at some point in order to get the plaintext code. Again, its an obfuscation problem of hiding the key somewhere, and hence is going to fail. It all depends on how much expense you want to go to in order to make the expense of circumventing your solution more than its worth. Tell me how much that is, and I will tell you the solution. 
For total security[1], you need to run the code on servers YOU control, and only give access via a network API. You can do this with RServe or any of the HTTP-based systems like Rapache. Barry [1] Except of course servers can be hacked or socially-engineered into. For total security, disconnect your machine from the network and from any power supply. From dwinsemius at comcast.net Mon Jul 4 16:50:40 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Mon, 4 Jul 2011 10:50:40 -0400 Subject: [R] How to build a matrix of number of appearance? In-Reply-To: <1309772913259-3643248.post@n4.nabble.com> References: <1309772913259-3643248.post@n4.nabble.com> Message-ID: <17867D87-8C02-40DC-B647-B8C5DE41FD56@comcast.net> On Jul 4, 2011, at 5:48 AM, UriB wrote: > I have a matrix of claims at year1 that I get simply by > > claims<-read.csv(file="Claims.csv") > qq1<-claims[claims$Year=="Y1",] > > I have MemberID and ProviderID for every claim in qq1 both are > integers > > An example for the type of questions that I want to answer is > how many times ProviderID number 345 appears together with MemberID > 23 in > the table qq1 > > In order to answer these questions for every possible ProviderId and > every > possible MemberID > I would like to have a matrix that has first column as memberID when > every > memberID in qq1 appears only once and columns that have number of > appearance > of ProviderID==i for every i that has > sum(qq1$ProviderID==i)>0 > > My question is if there is a simple way to do it in R A really quick way of finding this would be: as.data.frame ( xtabs( ~ ProviderID +MemberID, data= qq1) ) -- David Winsemius, MD West Hartford, CT From spencer.graves at structuremonitoring.com Mon Jul 4 17:05:59 2011 From: spencer.graves at structuremonitoring.com (Spencer Graves) Date: Mon, 04 Jul 2011 08:05:59 -0700 Subject: [R] Protecting R code In-Reply-To: <4E11CE23.6050204@statistik.tu-dortmund.de> References: <4E11CE23.6050204@statistik.tu-dortmund.de> 
Message-ID: <4E11D6D7.4030702@structuremonitoring.com> On 7/4/2011 7:28 AM, Uwe Ligges wrote: > > > On 04.07.2011 09:47, Vaishali Sadaphal wrote: >> Hi All, >> >> I need to give my R code to my client to use. I would like to protect >> the >> logic/algorithms that have been coded in R. This means that I would not >> like anyone to be able to read the code. >> >> I am searching for ways to protect R code. I would like to create a .exe >> kind of file which could be executed without using R or requiring to >> install R. I would not like the R code to be loaded in R. This is so >> because, after R loads a function, if you type the function name on the >> command prompt, you can see the complete code. I would not like to give >> this type of access to the R code. >> >> I explored the option of creating .bat file (using command: R CMD >> BAT) and >> byte code (using command: compile). These are not useful since they open >> R, load these functions and then the R code is visible. >> >> Is there any other way to protect the R code which would help me package >> all my files/source files and give me an executable file which would be >> run without opening R? Another problem is that R is freely downloadable. >> Is it somehow possible to protect the code from being loaded in R and >> being seen. > > > Hmmmm, R is open source software under the GPL (which is infective) > and designed as such. Good luck it is almost impossible to hide the > source code in R. And people who tried to generate C based binary > packages found those can only be used under a small subset of > platforms with few versions of R. > > Since R is distributed under the GPL: When you write code and make it > available to others, you should be aware of this fact that you may > have to distribute the sources under GPL as well - under some > circumstances your lawyer can explain much better than I. 
Linux is distributed under the GPL, and people distribute software implemented in Linux without having to release their source code. There are different versions of the GPL. You should read them carefully and consult with an attorney. However, if you honestly read the GPL verbiage, you may find that you know more than your attorney -- but you still need the attorney. I'm not an attorney and I haven't read GPL verbiage in a while, but as I recall a key issue is whether your code is your creation or a modification of some other GPL code. If the latter, you could lose in court if challenged. I see two options: 1. Write the proprietary portion of your code in a compiled language like C, C++, or Fortran, and link from R to your compiled subroutines. If you do not already write R packages, I strongly urge you to first learn how to produce and use R packages. Documentation on "Creating R Packages" is available from any standard CRAN mirror. I suggest you create separate R packages (with different names) complete with documentation for your internal only version in R only and for your public version that uses compiled code. This allows you to prototype your new ideas quickly in R before you spend the money to convert them to compiled code. It also encourages you to build test cases in a way that increases software quality. Then you can distribute the public R package in its standard compiled format, which your users can install using the standard procedure to "Install package(s) from local zip file" (available on the "Packages" menu in Rgui). This is arguably the cleanest legally, because then it's clear that your proprietary code has an existence independent of R. You can distribute your package with an appropriate end user license agreement and instructions for how to install R and any CRAN packages you use plus your own code. 2. You can write something to encrypt your R code. I know someone who has done this. 
However, the legal status is not as clean as if you wrote you proprietary algorithm in a compiled language, because if someone with a larger budget for attorneys wants to take you to court demanding your source code, you might lose. I doubt if that would happen, but I'm not an attorney, so I don't know. I do know that people often lose legal battles just because their opponents have much better attorneys. The advantage of this is that you could then distribute your latest changes immediately after you get them working. Another disadvantage is that your code will have to decrypt the R code prior to running it, which means that your code might still be available to anyone clever enough to interrupt your code while it's running. Thus, it's not as secure as writing compiled code, in addition to not having as strong a claim to having an existence independent of R. You could also combine this with the first, where your latest release would encrypt your latest enhancements while you are working to translate those into compiled code. Few people with university appointments have to worry about these issues, because they get paid for generating new knowledge and sharing it with the world. The rest of us must find different answers for how to provide for ourselves and our families without a university salary. Hope this helps. Spencer Graves > >> Thanks >> -- >> Vaishali >> =====-----=====-----===== >> Notice: The information contained in this e-mail >> message and/or attachments to it may contain >> confidential or privileged information. > > > If it is "confidential or privileged information", you should not send > it to a mailing list where the archives are published. > > > Best, > Uwe Ligges > > > > > >> If you are >> not the intended recipient, any dissemination, use, >> review, distribution, printing or copying of the >> information contained in this e-mail message >> and/or attachments to it are strictly prohibited. 
If >> you have received this communication in error, >> please notify us by reply e-mail or telephone and >> immediately and permanently delete the message >> and any attachments. Thank you >> >> >> >> [[alternative HTML version deleted]] >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. From spencer.graves at prodsyse.com Mon Jul 4 17:12:24 2011 From: spencer.graves at prodsyse.com (Spencer Graves) Date: Mon, 04 Jul 2011 08:12:24 -0700 Subject: [R] Protecting R code In-Reply-To: References: Message-ID: <4E11D858.5010406@prodsyse.com> Hello: On 7/4/2011 7:41 AM, Barry Rowlingson wrote: > On Mon, Jul 4, 2011 at 8:47 AM, Vaishali Sadaphal > wrote: >> Hi All, >> >> I need to give my R code to my client to use. I would like to protect the >> logic/algorithms that have been coded in R. This means that I would not >> like anyone to be able to read the code. > At some point the R code has to be run. Which means it has to be > read by an interpreter that can handle R code. Which means, unless you > rewrite the interpreter, the R code must exist as such. > > Even if you could compile R into C code into machine code and > distribute a .exe file, its still possible in theory to > reverse-engineer it and get something like the original back - the > original logic if not the original names of the variables and > functions. > > You could rewrite the interpreter to only run encrypted, signed code > that requires a decryption key, but you still have to give the user > the decryption key at some point in order to get the plaintext code. > Again, its an obfuscation problem of hiding the key somewhere, and > hence is going to fail. 
>
> It all depends on how much expense you want to go to in order to make
> the expense of circumventing your solution more than its worth. Tell
> me how much that is, and I will tell you the solution.
>
> For total security[1], you need to run the code on servers YOU
> control, and only give access via a network API. You can do this with
> RServe or any of the HTTP-based systems like Rapache.

An organization I know that encrypted R code started with making it
available only on their servers. This was maybe four years ago. I'm not
sure what they do now, but I think they have since lost their major
proponents of R internally and have probably translated all the code they
wanted to sell into a compiled language in a way that didn't require R at
all.

Spencer
>
> Barry
>
> [1] Except of course servers can be hacked or socially-engineered
> into. For total security, disconnect your machine from the network and
> from any power supply.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

From katerine.goyer at uqtr.ca Mon Jul 4 15:22:23 2011
From: katerine.goyer at uqtr.ca (Katerine Goyer)
Date: Mon, 4 Jul 2011 09:22:23 -0400
Subject: [R] modification of cross-validations in rpart
Message-ID:

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From kb at flicon.de Mon Jul 4 15:41:33 2011
From: kb at flicon.de (kbr)
Date: Mon, 4 Jul 2011 06:41:33 -0700 (PDT)
Subject: [R] I need help for creating a "timevar"
Message-ID: <1309786893784-3643658.post@n4.nabble.com>

Hi all!

I have data in "Long" format which I would like to reshape to "Wide". I know
that one possibility is the "reshape" command, which needs a "timevar".

Data look as follows: There are approx. 3000 persons ("IDENTITY")
and, for each person, there are between 2 and 20 events ("EVENT"). For now,
there's one row for each event (9506 rows).

http://r.789695.n4.nabble.com/file/n3643658/Screenshot-2.png

What is missing is the "timevar" (SPSS calls it "INDEX"), which numbers the
events WITHIN each person (right column).

I managed to number the events from 1 to 9506 with the seq command, first
writing the number of rows in nEVENT:

> number <-seq(file=event, 1, nEVENT, b=1)

Yet, I didn't manage to do so for each individual separately. I guess it
would be possible with the "split" command, but I can't figure out how to
apply it.

Can anyone give me a hint?

Thank you!
Karen

--
View this message in context: http://r.789695.n4.nabble.com/I-need-help-for-creating-a-timevar-tp3643658p3643658.html
Sent from the R help mailing list archive at Nabble.com.

From vaishali.sadaphal at tcs.com Mon Jul 4 18:48:13 2011
From: vaishali.sadaphal at tcs.com (Vaishali Sadaphal)
Date: Mon, 4 Jul 2011 22:18:13 +0530
Subject: [R] Protecting R code
In-Reply-To: <4E11D858.5010406@prodsyse.com>
References: <4E11D858.5010406@prodsyse.com>
Message-ID:

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From marchywka at hotmail.com Mon Jul 4 19:11:17 2011
From: marchywka at hotmail.com (Mike Marchywka )
Date: Mon, 4 Jul 2011 17:11:17 +0000
Subject: [R] Protecting R code
Message-ID:

Put it on Rapache or some other server, but this seems like a waste
depending on what you are doing.
Server side is the only good way, but translating to C++ may be an
interesting test.
Sent from my Verizon Wireless BlackBerry

-----Original Message-----
From: Vaishali Sadaphal
Date: Mon, 4 Jul 2011 16:48:13
To:
Cc: ;
Subject: Re: [R] Protecting R code

Hey All,

Thank you so much for quick replies.
Looks like translation to C/C++ is the only robust option. Do you think
there exists any ready-made R to C translator?
Thanks
--
Vaishali

Vaishali Paithankar Sadaphal
Tata Consultancy Services
Mailto: vaishali.sadaphal at tcs.com
Website: http://www.tcs.com
____________________________________________
Experience certainty. IT Services
Business Solutions
Outsourcing
____________________________________________

From: Spencer Graves
To: Barry Rowlingson
Cc: Vaishali Sadaphal , r-help at r-project.org
Date: 07/04/2011 08:42 PM
Subject: Re: [R] Protecting R code

Hello:

On 7/4/2011 7:41 AM, Barry Rowlingson wrote:
> On Mon, Jul 4, 2011 at 8:47 AM, Vaishali Sadaphal
> wrote:
>> Hi All,
>>
>> I need to give my R code to my client to use. I would like to protect the
>> logic/algorithms that have been coded in R. This means that I would not
>> like anyone to be able to read the code.
> At some point the R code has to be run. Which means it has to be
> read by an interpreter that can handle R code. Which means, unless you
> rewrite the interpreter, the R code must exist as such.
>
> Even if you could compile R into C code into machine code and
> distribute a .exe file, its still possible in theory to
> reverse-engineer it and get something like the original back - the
> original logic if not the original names of the variables and
> functions.
>
> You could rewrite the interpreter to only run encrypted, signed code
> that requires a decryption key, but you still have to give the user
> the decryption key at some point in order to get the plaintext code.
> Again, its an obfuscation problem of hiding the key somewhere, and
> hence is going to fail.
>
> It all depends on how much expense you want to go to in order to make
> the expense of circumventing your solution more than its worth. Tell
> me how much that is, and I will tell you the solution.
>
> For total security[1], you need to run the code on servers YOU
> control, and only give access via a network API.
You can do this with
> RServe or any of the HTTP-based systems like Rapache.

An organization I know that encrypted R code started with making it
available only on their servers. This was maybe four years ago. I'm not
sure what they do now, but I think they have since lost their major
proponents of R internally and have probably translated all the code they
wanted to sell into a compiled language in a way that didn't require R at
all.

Spencer
>
> Barry
>
> [1] Except of course servers can be hacked or socially-engineered
> into. For total security, disconnect your machine from the network and
> from any power supply.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

=====-----=====-----=====
Notice: The information contained in this e-mail
message and/or attachments to it may contain
confidential or privileged information. If you are
not the intended recipient, any dissemination, use,
review, distribution, printing or copying of the
information contained in this e-mail message
and/or attachments to it are strictly prohibited. If
you have received this communication in error,
please notify us by reply e-mail or telephone and
immediately and permanently delete the message
and any attachments. Thank you

[[alternative HTML version deleted]]

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
From nicolas.chapados at gmail.com Mon Jul 4 18:35:36 2011 From: nicolas.chapados at gmail.com (Nicolas Chapados) Date: Mon, 4 Jul 2011 12:35:36 -0400 Subject: [R] forecast: bias in sampling from seasonal Arima model? Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From ata.sonu at gmail.com Mon Jul 4 18:45:08 2011 From: ata.sonu at gmail.com (ATANU) Date: Mon, 4 Jul 2011 09:45:08 -0700 (PDT) Subject: [R] Rpad library Message-ID: <1309797908267-3644041.post@n4.nabble.com> can anyone help me with a well documented tutorial on Rpad package? I need to do HTML programming in R.Can anyone help me with a tutorial? -- View this message in context: http://r.789695.n4.nabble.com/Rpad-library-tp3644041p3644041.html Sent from the R help mailing list archive at Nabble.com. From vikas.bansal at kcl.ac.uk Mon Jul 4 19:29:12 2011 From: vikas.bansal at kcl.ac.uk (Bansal, Vikas) Date: Mon, 4 Jul 2011 18:29:12 +0100 Subject: [R] For help in R coding In-Reply-To: References: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE0@KCL-MAIL01.kclad.ds.kcl.ac.uk> <813856BE-5DD3-4708-B1FA-2611EBAD8C61@comcast.net> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE1@KCL-MAIL01.kclad.ds.kcl.ac.uk> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE3@KCL-MAIL01.kclad.ds.kcl.ac.uk> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE5@KCL-MAIL01.kclad.ds.kcl.ac.uk> <383EF813-24EA-42F3-B514-AB9CF8060BAC@comcast.net> , <676B7003-AFFE-4CF1-9CC8-FB1D6953B761@comcast.net> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE9@KCL-MAIL01.kclad.ds.kcl.ac.uk>, <10E80E91-DE87-431A-8E41-50CDAFB73A4D@comcast.net> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBED@KCL-MAIL01.kclad.ds.kcl.ac.uk>, <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBEF@KCL-MAIL01.kclad.ds.kcl.ac.uk>, Message-ID: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBF2@KCL-MAIL01.kclad.ds.kcl.ac.uk> Dear sir, I have one more problem.Sorry to disturb you again. 
I have a data frame like this-

Col1 Col2 Col3 Col4
1    0    1    4
0    0    0    2
4    2    0    0
1    5    0    0
0    0    4    3
0    0    0    2
0    0    0    0
1    1    0    5

I want to delete all those rows which have more than two 0s
like in above input row2 has 3 zeros,row6 has 3 zeros and row 7 has 4 zeros.so i want to exclude them so that my output should be-

Col1 Col2 Col3 Col4
1    0    1    4
4    2    0    0
1    5    0    0
0    0    4    3
1    1    0    5

Can you please tell me how to code for this problem?


Thanking you,
Warm Regards
Vikas Bansal
Msc Bioinformatics
Kings College London
________________________________________
From: David Winsemius [dwinsemius at comcast.net]
Sent: Monday, July 04, 2011 2:02 AM
To: Bansal, Vikas
Cc: Dennis Murphy; r-help at r-project.org
Subject: Re: [R] For help in R coding

On Jul 3, 2011, at 6:10 PM, Bansal, Vikas wrote:

>> So I want to code so that it will give the output like this-
>>
>> DATA FRAME (Input)

Editing the task so it is reproducible:

dat <- read.table(textConnection(' col3 col9
T .a,g,,
A .t,t,,
A .,c,c,
C .,a,,,
G .,t,t,t
A .c,,g,^!.
A .g,ggg.^!,
A .$,,,,,.,
C a,g,,t,
T ,,,,,.,^!.
T ,$,,,,.,."'), header=TRUE, stringsAsFactors=FALSE)

>> output
>>
>> A C G T
>> 1 0 1 4
>> 4 0 0 2
>> 4 2 0 0
>> 1 5 0 0
>> 0 0 4 3

It's also possible to apply the logic that Gabor Grothendieck offered
at the beginning of this thread:

dat[, "newcol"] <- apply(dat, 1, function(x) gsub("\\,|\\." ,x[1], x[2]) )
# ... and the obvious repetition for C.G.T

> dat[,"A"] <- nchar( gsub("[^aA]", "", dat[ , "newcol"] ))
> dat
   col3       col9     newcol A
1     T     .a,g,,     TaTgTT 1
2     A     .t,t,,     AtAtAA 4
3     A     .,c,c,     AAcAcA 4
4     C     .,a,,,     CCaCCC 1
5     G    .,t,t,t    GGtGtGt 0
6     A  .c,,g,^!.  AcAAgA^!A 5
7     A .g,ggg.^!, AgAgggA^!A 4
8     A  .$,,,,,.,  A$AAAAAAA 8
9     C    a,g,,t,    aCgCCtC 1
10    T ,,,,,.,^!. TTTTTTT^!T 0
11    T ,$,,,,.,." T$TTTTTTT" 0

I am deeply in debt to Gabor Grothendieck. He taught me all I know
regarding regex. The man is a master at patterns.
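As a rough sketch of "the obvious repetition for C.G.T" mentioned above, the counting step can be looped over the four bases. This uses only the first five rows of the example data, builds the data frame directly instead of through read.table(), and swaps in mapply() and a for loop; those choices are assumptions for illustration, not what was posted in the thread.

```r
# First five rows of the example, typed in directly (assumption: a subset
# is enough to illustrate the idea)
dat <- data.frame(
  col3 = c("T", "A", "A", "C", "G"),
  col9 = c(".a,g,,", ".t,t,,", ".,c,c,", ".,a,,,", ".,t,t,t"),
  stringsAsFactors = FALSE
)

# Replace every "." or "," (a match to the reference) with the reference base
dat$newcol <- mapply(function(ref, reads) gsub("[.,]", ref, reads),
                     dat$col3, dat$col9, USE.NAMES = FALSE)

# Count each base, ignoring case, by deleting everything else and
# measuring what is left
for (b in c("A", "C", "G", "T")) {
  keep_only_b <- paste0("[^", tolower(b), b, "]")
  dat[[b]] <- nchar(gsub(keep_only_b, "", dat$newcol))
}

dat[, c("A", "C", "G", "T")]
```

For these five rows the four count columns reproduce the "output" table quoted in the message.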
--
David Winsemius, MD
West Hartford, CT

From jwiley.psych at gmail.com Mon Jul 4 19:37:26 2011
From: jwiley.psych at gmail.com (Joshua Wiley)
Date: Mon, 4 Jul 2011 10:37:26 -0700
Subject: [R] Wrong environment when evaluating and expression?
In-Reply-To:
References:
Message-ID:

Thanks Gabor, that makes sense now. In case anyone else runs into
something similar, I ended up just passing a character string of the
formula so it could be coerced to a formula in the correct
environment.

Thanks again,

Josh

On Mon, Jul 4, 2011 at 4:26 AM, Gabor Grothendieck wrote:
> On Mon, Jul 4, 2011 at 4:11 AM, Joshua Wiley wrote:
>> Hi All,
>>
>> I have constructed two expressions (e1 & e2). I can see that they are
>> not identical, but I cannot figure out how they differ.
>>
>> ###############
>> dat <- mtcars
>> e1 <- expression(with(data = dat, lm(mpg ~ hp)))
>> e2 <- as.expression(substitute(with(data = dat, lm(f)), list(f = mpg ~ hp)))
>>
>> str(e1)
>> str(e2)
>> all.equal(e1, e2)
>> identical(e1, e2) # false
>>
>> eval(e1)
>> eval(e2)
>> ################
>>
>> The context is trying to use a list of formulae to generate several
>> models from a multiply imputed dataset. The package I am using (mice)
>> has methods for with() and that is how I can (easily) get the pooled
>> results. Passing the formula directly does not work, so I was trying
>> to generate the entire call and evaluate it as if I had typed it at
>> the console, but I am missing something (probably rather silly).
>>
>
> In e1, mpg ~ hp is a call object but in e2 its a formula with an environment:
>
>> e1[[1]][[3]][[2]]
> mpg ~ hp
>> e2[[1]][[3]][[2]]
> mpg ~ hp
>>
>> class(e1[[1]][[3]][[2]])
> [1] "call"
>> class(e2[[1]][[3]][[2]])
> [1] "formula"
>>
>> environment(e2[[1]][[3]][[2]])
>
>
> --
> Statistics & Software Consulting
> GKX Group, GKX Associates Inc.
> tel: 1-877-GKX-GROUP
> email: ggrothendieck at gmail.com
>

--
Joshua Wiley
Ph.D.
Student, Health Psychology University of California, Los Angeles http://www.joshuawiley.com/ From dwinsemius at comcast.net Mon Jul 4 19:43:07 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Mon, 4 Jul 2011 13:43:07 -0400 Subject: [R] For help in R coding In-Reply-To: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBF2@KCL-MAIL01.kclad.ds.kcl.ac.uk> References: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE0@KCL-MAIL01.kclad.ds.kcl.ac.uk> <813856BE-5DD3-4708-B1FA-2611EBAD8C61@comcast.net> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE1@KCL-MAIL01.kclad.ds.kcl.ac.uk> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE3@KCL-MAIL01.kclad.ds.kcl.ac.uk> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE5@KCL-MAIL01.kclad.ds.kcl.ac.uk> <383EF813-24EA-42F3-B514-AB9CF8060BAC@comcast.net> , <676B7003-AFFE-4CF1-9CC8-FB1D6953B761@comcast.net> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE9@KCL-MAIL01.kclad.ds.kcl.ac.uk>, <10E80E91-DE87-431A-8E41-50CDAFB73A4D@comcast.net> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBED@KCL-MAIL01.kclad.ds.kcl.ac.uk>, <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBEF@KCL-MAIL01.kclad.ds.kcl.ac.uk>, <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBF2@KCL-MAIL01.kclad.ds.kcl.ac.uk> Message-ID: <96812D2B-24B8-42A0-818C-922F81A6C08E@comcast.net> On Jul 4, 2011, at 1:29 PM, Bansal, Vikas wrote: > Dear sir, > > I have one more problem.Sorry to disturb you again. > > I have a data frame like this- > > Col1 Col2 Col3 Col4 > 1 0 1 4 > 0 0 0 2 > 4 2 0 0 > 1 5 0 0 > 0 0 4 3 > 0 0 0 2 > 0 0 0 0 > 1 1 0 5 > > I want to delete all those rows which have more than two 0s > like in above input row2 has 3 zeros,row6 has 3 zeros and row 7 has > 4 zeros.so i want to exclude them so that my output should be- > > Col1 Col2 Col3 Col4 > 1 0 1 4 > 4 2 0 0 > 1 5 0 0 > 0 0 4 3 > 1 1 0 5 > > Can you please tell me how to code for this problem? I am having a difficult time figuring out why this is not an obvious application for `apply` and "[" using logical indexing. 
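The `apply` plus logical-indexing idea mentioned here might look like the sketch below. It is an illustration under stated assumptions, not code from the thread: the data frame re-enters the eight example rows from the question, and the names `df`, `n_zero` and `kept` are made up for the example.

```r
# Example data re-entered from the question (object names are illustrative)
df <- data.frame(Col1 = c(1, 0, 4, 1, 0, 0, 0, 1),
                 Col2 = c(0, 0, 2, 5, 0, 0, 0, 1),
                 Col3 = c(1, 0, 0, 0, 4, 0, 0, 0),
                 Col4 = c(4, 2, 0, 0, 3, 2, 0, 5))

# Count the zeros in each row
n_zero <- apply(df, 1, function(r) sum(r == 0))

# Keep only the rows with at most two zeros (logical indexing with "[")
kept <- df[n_zero <= 2, ]
kept
```

For an all-numeric data frame, `rowSums(df == 0)` should give the same counts without the explicit `apply` call.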
I suggest you do some more self-study with the introductory material that you will find here: http://cran.r-project.org/other-docs.html (It also is a frequently asked question, so searching the archives for worked examples should also be considered.) Using search terms "delete all rows with ==" http://search.r-project.org/cgi-bin/namazu.cgi?query=delete+rows+with+all+%3D%3D&max=100&result=normal&sort=score&idxname=functions&idxname=Rhelp08&idxname=Rhelp10&idxname=Rhelp02 -- David. > > > > > > > Thanking you, > Warm Regards > Vikas Bansal > Msc Bioinformatics > Kings College London > ________________________________________ > From: David Winsemius [dwinsemius at comcast.net] > Sent: Monday, July 04, 2011 2:02 AM > To: Bansal, Vikas > Cc: Dennis Murphy; r-help at r-project.org > Subject: Re: [R] For help in R coding > > On Jul 3, 2011, at 6:10 PM, Bansal, Vikas wrote: > >>> So I want to code so that it will give the output like this- >>> >>> DATA FRAME (Input) > > Editing the task so it is reproducible: > > dat <- read.table(textConnection(' col3 col9 > T .a,g,, > A .t,t,, > A .,c,c, > C .,a,,, > G .,t,t,t > A .c,,g,^!. > A .g,ggg.^!, > A .$,,,,,., > C a,g,,t, > T ,,,,,.,^!. > T ,$,,,,.,."'), header=TRUE, > stringsAsFactors=FALSE) > >>> output >>> >>> A C G T >>> 1 0 1 4 >>> 4 0 0 2 >>> 4 2 0 0 >>> 1 5 0 0 >>> 0 0 4 3 > > It's also possible to apply the logic that Gabor Grothendieck offered > at the beginning of this thread: > > dat[, "newcol"] <- apply(dat, 1, function(x) gsub("\\,|\\." ,x[1], > x[2]) ) > # ... and the obvious repetition for C.G.T > >> dat[,"A"] <- nchar( gsub("[^aA]", "", dat[ , "newcol"] )) >> dat > col3 col9 newcol A > 1 T .a,g,, TaTgTT 1 > 2 A .t,t,, AtAtAA 4 > 3 A .,c,c, AAcAcA 4 > 4 C .,a,,, CCaCCC 1 > 5 G .,t,t,t GGtGtGt 0 > 6 A .c,,g,^!. AcAAgA^!A 5 > 7 A .g,ggg.^!, AgAgggA^!A 4 > 8 A .$,,,,,., A$AAAAAAA 8 > 9 C a,g,,t, aCgCCtC 1 > 10 T ,,,,,.,^!. TTTTTTT^!T 0 > 11 T ,$,,,,.,." T$TTTTTTT" 0 > > I am deeply in debt to Gabor Grothendieck. 
He taught me all I know
> regarding regex. The man is a master at patterns.
>
> --
>
> David Winsemius, MD
> West Hartford, CT
>

David Winsemius, MD
West Hartford, CT

From jholtman at gmail.com Mon Jul 4 19:47:04 2011
From: jholtman at gmail.com (jim holtman)
Date: Mon, 4 Jul 2011 13:47:04 -0400
Subject: [R] How to build a matrix of number of appearance?
In-Reply-To: <1309772913259-3643248.post@n4.nabble.com>
References: <1309772913259-3643248.post@n4.nabble.com>
Message-ID:

Here is another way:

> xx <- data.frame(P = sample(5, 100, TRUE), M = sample(5, 100, TRUE), id = 1:100)
> require(data.table)
> xx <- data.table(xx) # convert to data.table
> count <- xx[
+ , list(count = length(id))
+ , by = list(M, P)
+ ]
> str(count)
Classes 'data.table' and 'data.frame': 24 obs. of 3 variables:
 $ M    : int 1 1 1 1 1 2 2 2 2 2 ...
 $ P    : int 1 2 3 4 5 1 2 3 4 5 ...
 $ count: int 5 4 3 2 9 3 3 6 3 7 ...
> count
 M P count
 1 1     5
 1 2     4
 1 3     3
 1 4     2
 1 5     9
 2 1     3
 2 2     3
 2 3     6
 2 4     3

On Mon, Jul 4, 2011 at 5:48 AM, UriB wrote:
> I have a matrix of claims at year1 that I get simply by
>
> claims<-read.csv(file="Claims.csv")
> qq1<-claims[claims$Year=="Y1",]
>
> I have MemberID and ProviderID for every claim in qq1 both are integers
>
> An example for the type of questions that I want to answer is
> how many times ProviderID number 345 appears together with MemberID 23 in
> the table qq1
>
> In order to answer these questions for every possible ProviderId and every
> possible MemberID
> I would like to have a matrix that has first column as memberID when every
> memberID in qq1 appears only once and columns that have number of appearance
> of ProviderID==i for every i that has
> sum(qq1$ProviderID==i)>0
>
> My question is if there is a simple way to do it in R
> Thanks in Advance
>
> Uri
>
> --
> View this message in context: http://r.789695.n4.nabble.com/How-to-build-a-matrix-of-number-of-appearance-tp3643248p3643248.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?

From jwiley.psych at gmail.com Mon Jul 4 19:55:51 2011
From: jwiley.psych at gmail.com (Joshua Wiley)
Date: Mon, 4 Jul 2011 10:55:51 -0700
Subject: [R] I need help for creating a "timevar"
In-Reply-To: <1309786893784-3643658.post@n4.nabble.com>
References: <1309786893784-3643658.post@n4.nabble.com>
Message-ID:

Hi Karen,

As long as your IDENTITY column goes up in order, this should work:

## example data
dat <- data.frame(IDENTITY = rep(101:103, 3:1), EVENT = "Event")
dat$TIMEVAR <- unlist(with(dat, tapply(EVENT, IDENTITY, seq_along)))

## Result
dat

See ?tapply and ?seq_along for some documentation

Hope this helps,

Josh

On Mon, Jul 4, 2011 at 6:41 AM, kbr wrote:
> Hi all!
>
> I have data in "Long" format which I would like to reshape to "Wide". I know
> that one possibility is the "reshape" command, which needs a "timevar".
>
> Data look as follows: There are approx. 3000 persons ("IDENTITY") and, for
> each person, there are between 2 and 20 events ("EVENT"). For now, there's
> one row for each event (9506 rows)
>
> http://r.789695.n4.nabble.com/file/n3643658/Screenshot-2.png
>
> What is missing is the "timevar" (SPSS calls it "INDEX"), which numbers the
> events WITHIN each person (right column).
>
> I managed to number the events from 1 to 9506 with the seq-command, first
> writing the number of rows in nEVENT:
>> number <-seq(file=event, 1, nEVENT, b=1)
> Yet, I didn't manage to do so for each individual separately. I guess it
> would be possible with the "split" command, but I can't figure out how to
> apply it.
>
> Can anyone give me a hint?
> Thank you!
> > Karen > > -- > View this message in context: http://r.789695.n4.nabble.com/I-need-help-for-creating-a-timevar-tp3643658p3643658.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://www.joshuawiley.com/ From b.rowlingson at lancaster.ac.uk Mon Jul 4 20:04:47 2011 From: b.rowlingson at lancaster.ac.uk (Barry Rowlingson) Date: Mon, 4 Jul 2011 19:04:47 +0100 Subject: [R] Protecting R code In-Reply-To: References: <4E11D858.5010406@prodsyse.com> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From jcpayne at uw.edu Mon Jul 4 19:33:30 2011 From: jcpayne at uw.edu (genghis) Date: Mon, 4 Jul 2011 10:33:30 -0700 (PDT) Subject: [R] SOLVED: superimposing different plot types in lattice panel.superpose In-Reply-To: References: <1309744322916-3642808.post@n4.nabble.com> Message-ID: <1309800810586-3644145.post@n4.nabble.com> Thank you very much Dennis, that's wonderful. I tried it first without LatticeExtra and it didn't work, so yes that package is the key. Best, John -- View this message in context: http://r.789695.n4.nabble.com/superimposing-different-plot-types-in-lattice-panel-superpose-tp3642808p3644145.html Sent from the R help mailing list archive at Nabble.com. 
From dwinsemius at comcast.net Mon Jul 4 20:16:09 2011
From: dwinsemius at comcast.net (David Winsemius)
Date: Mon, 4 Jul 2011 14:16:09 -0400
Subject: [R] Rpad library
In-Reply-To: <1309797908267-3644041.post@n4.nabble.com>
References: <1309797908267-3644041.post@n4.nabble.com>
Message-ID:

On Jul 4, 2011, at 12:45 PM, ATANU wrote:

> can anyone help me with a well documented tutorial on Rpad package?
> I need to
> do HTML programming in R. Can anyone help me with a tutorial?

Trivial Google searching produces this link:

http://rpad.googlecode.com/svn-history/r76/Rpad_homepage/index.html

>
> --
> View this message in context: http://r.789695.n4.nabble.com/Rpad-library-tp3644041p3644041.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT

From mailinglist.honeypot at gmail.com Mon Jul 4 20:18:39 2011
From: mailinglist.honeypot at gmail.com (Steve Lianoglou)
Date: Mon, 4 Jul 2011 14:18:39 -0400
Subject: [R] Protecting R code
In-Reply-To:
References: <4E11D858.5010406@prodsyse.com>
Message-ID:

On Mon, Jul 4, 2011 at 2:04 PM, Barry Rowlingson wrote:
> On Mon, Jul 4, 2011 at 5:48 PM, Vaishali Sadaphal
> wrote:
>
>>
>> Hey All,
>>
>> Thank you so much for quick replies.
>> Looks like translation to C/C++ is the only robust option. Do you think
>> there exists any ready-made R to C translator?
>>
>>
> No, I think they are normally all born without the R to C translation
> skills and acquire them through a long process of going to school and
> college and spending long long hours studying R and C...
>
> I suggest that if your code is so commercially sensitive that you want it
> written in C, then hire a C programmer to do it. Money well spent.
Also -- check out Rcpp:

http://cran.r-project.org/web/packages/Rcpp/index.html

It will ease some of the R <--> C(++) bridging pain, but also provides things like "sugar":

http://cran.r-project.org/web/packages/Rcpp/vignettes/Rcpp-sugar.pdf

Which may make writing your code in C++ a bit easier.

YMMV, of course. The library is also GPL though, so I'm not sure what that will make your end code. Although I guess you'll just be linking to it at the end of the day, I'm not sure what the prevailing wisdom is these days about whether or not that restricts the license of your code.

HTH,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

From albertcoster2010 at gmail.com Mon Jul 4 20:24:45 2011
From: albertcoster2010 at gmail.com (albert coster)
Date: Mon, 4 Jul 2011 20:24:45 +0200
Subject: [R] How to merge two files
Message-ID:

Dear all,

I have two files :

seq.txt: NNNNNNNNNNATTAAAGGGC

scores.txt :

0.8
0.7
0.3
0.5
0.6
0.5
0.01
0.9
0.3
0.8

I want output as following

A 0.8
T 0.7
T 0.3
A 0.5
A 0.6
A 0.5
G 0.01
G 0.9
G 0.3
C 0.8

Where N are deleted and only A/T/G/C are appearing in a column.

Thanks

Albert
-------------- next part --------------
0.8
0.7
0.3
0.5
0.6
0.5
0.01
0.9
0.3
0.8
-------------- next part --------------
NNNNNNNNNNATTAAAGGGC

From annemarie.verkerk at mpi.nl Mon Jul 4 20:32:37 2011
From: annemarie.verkerk at mpi.nl (Annemarie Verkerk)
Date: Mon, 04 Jul 2011 20:32:37 +0200
Subject: [R] placing multiple rows in a single row
Message-ID: <4E120745.5010903@mpi.nl>

Dear people from the R help list,

I have a question that I can't get my head around to start answering, that is why I am writing to the list.
I have data in a format like this (tabs might look weird): John A1 1 0 1 John A2 1 1 1 John A3 1 0 0 Mary A1 1 0 1 Mary A2 0 0 1 Mary A3 1 1 0 Peter A1 1 0 0 Peter A2 0 0 1 Peter A3 1 1 1 Josh A1 1 0 0 Josh A2 Josh A3 0 0 0 I want to convert it into a format where variable rows from a single subject are placed behind each other, but with the different scores still matching up (i.e., it needs to be able to cope with missing data, as for Josh's A2 score). John A1 1 0 1 A2 1 1 1 A3 1 0 0 Mary A1 1 0 1 A2 0 0 1 A3 1 1 0 Peter A1 1 0 0 A2 0 0 1 A3 1 1 1 Josh A1 1 0 0 A2 A3 0 0 0 Preferably, the row identification would become the header of the new table, something like this: A11 A12 A13 A21 A22 A23 A31 A32 A33 John 1 0 1 1 1 1 1 0 0 Mary 1 0 1 0 0 1 1 1 0 Peter 1 0 0 0 0 1 1 1 1 Josh 1 0 0 0 0 0 Probably, this has been addressed before - I just don't know how to search for the answer with the right search terms. Any help is appreciated, even just a link to a page where this is addressed! Thank you! Annemarie -- Annemarie Verkerk, MA Evolutionary Processes in Language and Culture (PhD student) Max Planck Institute for Psycholinguistics P.O. Box 310, 6500AH Nijmegen, The Netherlands +31 (0)24 3521 185 http://www.mpi.nl/research/research-projects/evolutionary-processes From paul.guilhamon at gmail.com Mon Jul 4 20:22:10 2011 From: paul.guilhamon at gmail.com (pguilha) Date: Mon, 4 Jul 2011 11:22:10 -0700 (PDT) Subject: [R] clustering based on most significant pvalues does not separate the groups! Message-ID: <1309803730461-3644249.post@n4.nabble.com> Hi all, I have some microarray data on 40 samples that fall into two groups. I have a value for 480k probes for each of those samples. I performed a t test (rowttests) on each row(giving the indices of the columns for each group) then used p.adjust() to adjust the pvalues for the number of tests performed. I then selected only the probes with adj-p.value<=0.05. 
I end up with roughly 2000 probes to do the clustering on but using pvclust, and hclust, the samples do no split up into the two groups. I would have imagined that using only those values that are significantly different between the two groups, the clustering should surely reflect that? Please, what am I missing!!!!??? Thanks! Paul PS: I am hoping I have just thought this through in the wrong way and there is a simple explanation, but can provide the code I am using for clustering if necessary! -- View this message in context: http://r.789695.n4.nabble.com/clustering-based-on-most-significant-pvalues-does-not-separate-the-groups-tp3644249p3644249.html Sent from the R help mailing list archive at Nabble.com. From jwiley.psych at gmail.com Mon Jul 4 20:45:23 2011 From: jwiley.psych at gmail.com (Joshua Wiley) Date: Mon, 4 Jul 2011 11:45:23 -0700 Subject: [R] How to merge two files In-Reply-To: References: Message-ID: Dear Albert, Here is one way: tmp.scores <- readLines("~/scores.txt") tmp.seq <- readLines("~/seq.txt") tmp.seq <- strsplit(gsub("N", "", tmp.seq), "")[[1]] genedat <- data.frame(Sequence = tmp.seq, Scores = as.numeric(tmp.scores)) ## Yields > genedat Sequence Scores 1 A 0.80 2 T 0.70 3 T 0.30 4 A 0.50 5 A 0.60 6 A 0.50 7 G 0.01 8 G 0.90 9 G 0.30 10 C 0.80 Hope this helps, Josh 2011/7/4 albert coster : > Dear all, > > I have two files : > > seq.txt: NNNNNNNNNNATTAAAGGGC > > scores.txt : > > 0.8 > 0.7 > 0.3 > 0.5 > 0.6 > 0.5 > 0.01 > 0.9 > 0.3 > 0.8 > > I want output as following > > A 0.8 > T 0.7 > T 0.3 > A 0.5 > A 0.6 > A 0.5 > G 0.01 > G 0.9 > G 0.3 > C 0.8 > > Where N are deleted and only A/T/G/C are appearing in a column. > > Thanks > > Albert > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > -- Joshua Wiley Ph.D. 
Student, Health Psychology University of California, Los Angeles http://www.joshuawiley.com/ From dwinsemius at comcast.net Mon Jul 4 21:00:47 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Mon, 4 Jul 2011 15:00:47 -0400 Subject: [R] placing multiple rows in a single row In-Reply-To: <4E120745.5010903@mpi.nl> References: <4E120745.5010903@mpi.nl> Message-ID: <138FA819-1BBB-4903-9675-516AC1CE9883@comcast.net> On Jul 4, 2011, at 2:32 PM, Annemarie Verkerk wrote: > Dear people from the R help list, > > I have a question that I can't get my head around to start > answering, that is why I am writing to the list. > > I have data in a format like this (tabs might look weird): > > John A1 1 0 1 > John A2 1 1 1 > John A3 1 0 0 > Mary A1 1 0 1 > Mary A2 0 0 1 > Mary A3 1 1 0 > Peter A1 1 0 0 > Peter A2 0 0 1 > Peter A3 1 1 1 > Josh A1 1 0 0 > Josh A2 > Josh A3 0 0 0 > > I want to convert it into a format where variable rows from a single > subject are placed behind each other, but with the different scores > still matching up (i.e., it needs to be able to cope with missing > data, as for Josh's A2 score). > > John A1 1 0 1 A2 1 1 1 A3 1 > 0 0 > Mary A1 1 0 1 A2 0 0 1 A3 1 > 1 0 > Peter A1 1 0 0 A2 0 0 1 A3 1 > 1 1 > Josh A1 1 0 0 A2 A3 0 0 0 > > Preferably, the row identification would become the header of the > new table, something like this: > > A11 A12 A13 A21 A22 A23 A31 A32 A33 > John 1 0 1 1 1 1 1 0 0 > Mary 1 0 1 0 0 1 1 1 0 > Peter 1 0 0 0 0 1 1 1 1 > Josh 1 0 0 0 0 0 > > Probably, this has been addressed before - I just don't know how to > search for the answer with the right search terms. > > Any help is appreciated, even just a link to a page where this is > addressed! There is a reshape function in the stats package that nobody except Phil Spector seems to understand and then there is the reshape and reshape2 packages that everybody seems to get. (I don't understand why the classification variables are on the left-hand-side, though. 
Positionally it makes some sense, but logically it does not connect with how I understand the process.) require(reshape2) # entered your data with default names V1 V2 V3 V4 V5 > nam123 V1 V2 V3 V4 V5 1 John A1 1 0 1 2 John A2 1 1 1 3 John A3 1 0 0 4 Mary A1 1 0 1 5 Mary A2 0 0 1 6 Mary A3 1 1 0 7 Peter A1 1 0 0 8 Peter A2 0 0 1 9 Peter A3 1 1 1 10 Josh A1 1 0 0 11 Josh A2 NA NA NA 12 Josh A3 0 0 0 > nams.mlt <- melt(nam123, idvars=c("V1", "V2")) > str(nams.mlt) 'data.frame': 36 obs. of 4 variables: $ V1 : Factor w/ 4 levels "John","Josh",..: 1 1 1 3 3 3 4 4 4 2 ... $ V2 : Factor w/ 3 levels "A1","A2","A3": 1 2 3 1 2 3 1 2 3 1 ... $ variable: Factor w/ 3 levels "V3","V4","V5": 1 1 1 1 1 1 1 1 1 1 ... $ value : int 1 1 1 1 0 1 1 0 1 1 ... > dcast(nams.mlt, V1+V2 ~ variable) V1 V2 V3 V4 V5 1 John A1 1 0 1 2 John A2 1 1 1 3 John A3 1 0 0 4 Josh A1 1 0 0 5 Josh A2 NA NA NA 6 Josh A3 0 0 0 7 Mary A1 1 0 1 8 Mary A2 0 0 1 9 Mary A3 1 1 0 10 Peter A1 1 0 0 11 Peter A2 0 0 1 12 Peter A3 1 1 1 > dcast(nams.mlt, V1 ~ V2+variable) V1 A1_V3 A1_V4 A1_V5 A2_V3 A2_V4 A2_V5 A3_V3 A3_V4 A3_V5 1 John 1 0 1 1 1 1 1 0 0 2 Josh 1 0 0 NA NA NA 0 0 0 3 Mary 1 0 1 0 0 1 1 1 0 4 Peter 1 0 0 0 0 1 1 1 1 You can always change the names of the dataframe if you want, and in this case it would be a simple sub() operation. Personally I would substitute "." rather than "". -- David Winsemius, MD West Hartford, CT From jwiley.psych at gmail.com Mon Jul 4 21:25:39 2011 From: jwiley.psych at gmail.com (Joshua Wiley) Date: Mon, 4 Jul 2011 12:25:39 -0700 Subject: [R] loop in optim In-Reply-To: <1309772071774-3643230.post@n4.nabble.com> References: <1309772071774-3643230.post@n4.nabble.com> Message-ID: Hi Edward, At least for me, your llik() function returns Inf for the starting values specified, so optim() never gets to estimate anything. You need to alter llik() or find starting parameters that work before worrying about getting the for loop working. 
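The advice about starting values can be checked directly before any loop is written. What follows is a minimal sketch under stated assumptions, not the poster's model: `negll` is a hypothetical stand-in objective (a plain normal regression of `R_j` on `R_m`), the data are simulated, and only the block size of 20 is taken from the question. It first confirms that the objective is finite at the starting values, then calls `optim()` once per block of rows and stores each block's estimates in one row of a matrix.

```r
set.seed(1)
# Simulated stand-in data: 60 rows, i.e. three blocks of 20 (assumption)
a <- data.frame(R_m = rnorm(60, sd = 0.02))
a$R_j <- 0.5 * a$R_m + rnorm(60, sd = 0.03)

# Hypothetical objective: negative log-likelihood of
# R_j ~ N(alpha + beta * R_m, sigma^2), evaluated on the rows in d
negll <- function(x, d) {
  alpha <- x[1]; beta <- x[2]; sigma <- x[3]
  if (sigma <= 0) return(1e10)  # large finite penalty for an invalid sigma
  -sum(dnorm(d$R_j, mean = alpha + beta * d$R_m, sd = sigma, log = TRUE))
}

start.par <- c(0, 1, 0.05)

# Check first that the objective is finite at the starting values
stopifnot(is.finite(negll(start.par, a[1:20, ])))

n <- 20
runs <- nrow(a) %/% n
out <- matrix(NA_real_, nrow = runs, ncol = length(start.par),
              dimnames = list(NULL, c("alpha", "beta", "sigma")))

for (i in seq_len(runs)) {
  idx <- (n * (i - 1) + 1):(n * i)   # rows 1-20, 21-40, 41-60
  fit <- optim(start.par, negll, d = a[idx, ], method = "Nelder-Mead")
  out[i, ] <- fit$par                # one row of estimates per block
}
out
```

The design point is that `optim()` is called inside the loop on each subset of rows. Indexing the result of a single `optim()` call, as in the quoted code, cannot produce per-block estimates because that result is a list describing one fit.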
Cheers,

Josh

On Mon, Jul 4, 2011 at 2:34 AM, EdBo wrote:
> Hi
>
> May you help me correct my loop function.
>
> I want optim to estimates al_j; au_j; sigma_j; b_j by looking at 0 to 20,
> 21 to 40, 41 to 60 data points.
>
> The final result should have 4 columns of each of the estimates AND 4 rows
> of each of 0 to 20, 21 to 40, 41 to 60.
>
> ###MY code is
>
> n=20
> runs=4
> out=matrix(0,nrow=runs)
>
> llik = function(x)
>   {
>    al_j=x[1]; au_j=x[2]; sigma_j=x[3]; b_j=x[4]
>    sum(na.rm=T,
>        ifelse(a$R_j< 0, -log(1/(2*pi*(sigma_j^2)))-
>                           (1/(2*(sigma_j^2))*(a$R_j+al_j-b_j*a$R_m))^2,
>         ifelse(a$R_j>0 , -log(1/(2*pi*(sigma_j^2)))-
>                           (1/(2*(sigma_j^2))*(a$R_j+au_j-b_j*a$R_m))^2,
>
> -log(pnorm(au_j,mean=b_j*a$R_m,sd=sqrt(sigma_j^2))-
>                           pnorm(au_j,mean=b_j*a$R_m,sd=sqrt(sigma_j^2)))))
>
>       )
>
>   }
>
> start.par = c(0, 0, 0.01, 1)
> out1 = optim(llik, par=start.par, method="Nelder-Mead")
>
>
> for (i in 1: runs)
> {
>  index_start=20*(i-1)+1
>  index_end= 20*i
>  out[i]=out1[index_start:index_end]
> }
> out
>
>
> Thank you in advance
>
> Edward
> UCT
> ####My data
>
> R_j             R_m
> -0.0625         0.002320654
> 0               -0.004642807
> 0.033333333     0.005936332
> 0.032258065     0.001060848
> 0               0.007114057
> 0.015625        0.005581558
> 0               0.002974794
> 0.015384615     0.004215271
> 0.060606061     0.005073116
> 0.028571429     -0.006001279
> 0               -0.002789594
> 0.013888889     0.00770633
> 0               0.000371663
> 0.02739726      -0.004224228
> -0.04           0.008362539
> 0               -0.010951605
> 0               0.004682924
> 0.013888889     0.011839993
> -0.01369863     0.004210383
> -0.027777778    -0.04658949
> 0               0.00987272
> -0.057142857    -0.062203157
> -0.03030303     -0.119177639
> 0.09375         0.077054642
> 0               -0.022763619
> -0.057142857    0.050408775
> 0               0.024706076
> -0.03030303     0.004043701
> 0.0625          0.004951088
> 0               -0.005968731
> 0               -0.038292548
> 0               0.013381097
> 0.014705882     0.006424728
> -0.014492754    -0.020115626
> 0               -0.004837891
> -0.029411765    -0.022054654
> 0.03030303      0.008936428
> 0.044117647     8.16925E-05
> 0               -0.004827246
> -0.042253521    0.004653096
> -0.014705882    -0.004222151
> 0.029850746     0.000107267
> -0.028985507    -0.001783206
> 0.029850746     -0.006372981
> 0.014492754     0.005492374
> -0.028571429    -0.009005846
> 0               0.001031683
> 0.044117647     0.002800551
>
>
> --
> View this message in context: http://r.789695.n4.nabble.com/loop-in-optim-tp3643230p3643230.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

From rosenbat at gmail.com Mon Jul 4 22:24:47 2011
From: rosenbat at gmail.com (Ted Rosenbaum)
Date: Mon, 4 Jul 2011 16:24:47 -0400
Subject: [R] R CMD SHLIB with ifort
Message-ID:

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From jdnewmil at dcn.davis.ca.us Mon Jul 4 22:52:50 2011
From: jdnewmil at dcn.davis.ca.us (Jeff Newmiller)
Date: Mon, 04 Jul 2011 13:52:50 -0700
Subject: [R] R CMD SHLIB with ifort
In-Reply-To:
References:
Message-ID: <89c17bbd-3846-43d4-8d97-5ba8ec45c03e@email.android.com>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From anopheles123 at gmail.com Mon Jul 4 23:05:22 2011
From: anopheles123 at gmail.com (Weidong Gu)
Date: Mon, 4 Jul 2011 17:05:22 -0400
Subject: [R] modification of cross-validations in rpart
In-Reply-To:
References:
Message-ID:

One way around hacking rpart is to write code to do K fold samples based on unit outside rpart, then build trees using training sets and summarize scores on testing sets.

Weidong Gu

On Mon, Jul 4, 2011 at 9:22 AM, Katerine Goyer wrote:
>
> Hello,
>
> I am using
> the rpart function (from the rpart package) to do a regression tree that would describe
> the behaviour of a fish species according to several environmental variables.
> For each fish (sampling unit), I have repeated observations of the response
> variable, which means that the data are not independent. Normally, in this
> case, V-fold cross-validation needs to be modified to prevent over-optimistic
> predictions of error rates by cross-validation and overestimation of the tree
> size. A way to overcome this problem is by selecting only whole sampling units
> in our subsets of cross-validation. My problem is that I don't know how to
> perform this modification of the cross-validation process in the rpart
> function.
>
> Is there a
> way to do this modification in rpart or is there any other function I could use
> that would consider interdependence in the response variable?
>
> Here is an
> example of the code I am using ("Y" being the response variable and "data.env"
> being a data frame of the environmental
> variables):
>
> Tree = rpart(Y
> ~ X1 + X2 + X3,xval=100,data=data.env)
>
> Thanks
>
> Katerine
>
>        [[alternative HTML version deleted]]
>
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

From krcabrer at une.net.co Tue Jul 5 00:47:09 2011
From: krcabrer at une.net.co (Kenneth Roy Cabrera Torres)
Date: Mon, 04 Jul 2011 17:47:09 -0500
Subject: [R] FinCenter in timeSeries with "merge", "cbind" and "rbind"
Message-ID: <1309819629.918.5.camel@kenneth-desktop>

Hi R users:

When I try to merge or bind (cbind or rbind) two series, both with a "FinCenter" different than GMT, the result is "GMT" not the original financial center?

What am I doing wrong?
###################################################### require(timeSeries) getRmetricsOptions("myFinCenter") setRmetricsOptions(myFinCenter = "America/Bogota") getRmetricsOptions("myFinCenter") fechas <- format(timeCalendar(2010, sample(12, 6))) datos <- matrix(round(rnorm(6), 3)) t1 <- sort(timeSeries(datos, fechas, units = "A")) t1 fechas <- format(timeCalendar(2010, sample(12, 6))) datos <- matrix(round(rnorm(6), 3)) t2 <- sort(timeSeries(datos, fechas, units = "B")) t2 merge(t1,t2) cbind(t1,t2) rbind(t1,t2) ###################################################### Thank you for your help. Kenneth _______________________________________________ R-SIG-Finance at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-sig-finance -- Subscriber-posting only. If you want to post, subscribe first. -- Also note that this is not the r-help list where general R questions should go. _______________________________________________ e at 1 Churchill Place, London, E14 5HP. This email may relate to or be sent from other members of the Barclays Group. _______________________________________________ ______________________________________________________________________ This email has been scanned by the MessageLabs Email Security System. For more information please visit http://www.messagelabs.com/email From landronimirc at gmail.com Tue Jul 5 02:34:07 2011 From: landronimirc at gmail.com (Liviu Andronic) Date: Tue, 5 Jul 2011 02:34:07 +0200 Subject: [R] "Low Pain" Unicode Characters in pdf graph? In-Reply-To: References: Message-ID: On Sun, May 15, 2011 at 3:06 PM, ivo welch wrote: > Dear R-experts---is there a relatively low-pain way to get unicode > characters into a plot to a pdf device? > Have you tried Cairo package or cairo_pdf()? Both are making use of Cairo, which uses UTF-8 and automatically embeds fonts. 
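As a rough illustration of the `cairo_pdf()` route, and assuming an R build with cairo support (checked below with `capabilities("cairo")`), a plot title containing non-ASCII characters might be written to a PDF as follows. The file name and the title text are made up for the example.

```r
# Write a plot with Unicode text to a PDF via the cairo-based device.
# Assumes R was built with cairo support; the file name is illustrative.
outfile <- file.path(tempdir(), "unicode-demo.pdf")

if (capabilities("cairo")) {
  cairo_pdf(outfile, width = 5, height = 4)
  plot(1:10, main = "\u03b1 and \u03b2: na\u00efve caf\u00e9 \u20ac")
  dev.off()
} else {
  message("cairo is not available in this R build")
}
```

Whether every glyph renders still depends on the fonts installed on the system, so the output is worth inspecting by eye.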
Regards Liviu From ericstrom at aol.com Tue Jul 5 02:42:53 2011 From: ericstrom at aol.com (eric) Date: Mon, 4 Jul 2011 17:42:53 -0700 (PDT) Subject: [R] Stuck ...can't get sapply and xmlTreeParse working Message-ID: <1309826573807-3644894.post@n4.nabble.com> Can't seem to get the code below working. It gets stuck on line 24 inside the function hm; comments show the line in question. The function hm is called by sapply and is at the bottom of the code. Other stuff above line 24 works correctly including the first couple of lines of the function hm. Should I be using a different apply function or am I doing something wrong with xmlTreeParse ? library(XML) url.montco <- "http://webapp.montcopa.org/sherreal/salelist.asp?saledate=07/27/2011" tbl <-data.frame(readHTMLTable(url.montco))[, c(3,5,6,8,9)] tbl <-tbl[2: length(tbl[,1]),] names(tbl) <- c("Address", "Township", "Parcel", "SaleDate", "Costs"); rownames(tbl) <- NULL v <- gregexpr("( aka )|( AKA )",tbl$Address) s <-sapply(v, function(x) max(unlist(x))) tbl$Address <- substring(tbl$Address, ifelse(s== -1, 0, s+4), 10000) tbl$Cost <- gsub(',', '', tbl$Costs) temp <- strsplit(tbl$Cost, "\\$") temp <- do.call(rbind, temp) # create a matrix mode(temp) <- 'numeric' tbl$Debt <- round(temp[, 2]/1000,2) tbl$Court <- round(temp[, 3]/1000,2) z <- data.frame(substr(tbl$SaleDate,regexpr("[A-Za-z]", tbl$SaleDate), regexpr("[0-9]", tbl$SaleDate,)-1)) ; names(z) <- "Action" y <- data.frame(substr(tbl$SaleDate,regexpr("[0-9]", tbl$SaleDate),2011)) ; names(y) <- "ActionDate" tbl <-cbind(tbl[, c(1,2,3,7,8)],z,y) new.add <- paste(tbl$Address,"&citystatezip=",tbl$Township,"%2C+PA", sep='') new.add <- sub("^( )+","", new.add) new.add <-data.frame(gsub("( )+",'+', new.add)); names(new.add) <- "ParseAddress" hm <- function(x) { url.zill <-paste("http://www.zillow.com/webservice/GetDeepSearchResults.htm?zws-id=X1-ZWz1bup03e49vv_5kvb6&address=",x, sep="") ############## problem line is next ################################# zdoc 
<-xmlTreeParse(url.zill, useInternalNode=TRUE, isURL=TRUE) ############# problem line above ################################## f$zpid <- sapply(getNodeSet(zdoc, "//result/zpid"), xmlValue) f$zest.low <-sapply(getNodeSet(zdoc, "//valuationRange/low"), xmlValue) f$zest <- sapply(getNodeSet(zdoc, "//zestimate/amount"), xmlValue) rm(zdoc) return(f) } j <-sapply(new.add, FUN=hm) print(zest) -- View this message in context: http://r.789695.n4.nabble.com/Stuck-can-t-get-sapply-and-xmlTreeParse-working-tp3644894p3644894.html Sent from the R help mailing list archive at Nabble.com. From dwinsemius at comcast.net Tue Jul 5 04:41:35 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Mon, 4 Jul 2011 22:41:35 -0400 Subject: [R] [R-SIG-Finance] FinCenter in timeSeries with "merge", "cbind" and "rbind" In-Reply-To: <1309819629.918.5.camel@kenneth-desktop> References: <1309819629.918.5.camel@kenneth-desktop> Message-ID: On Jul 4, 2011, at 6:47 PM, Kenneth Roy Cabrera Torres wrote: > Hi R users: > > When I try to merge or bind (cbind or rbind) two series, > both with a "FinCenter" different that GMT, the > result is "GMT" not the original financial center? It's not in the help(cbind.timeSeries) page but looking at the function (and at the documentation for the timeSeries class) you see that there is a "zone" argument and that it is "GMT" by default. So why don't you add something meaningful to your code (... that is not presented in a reproducible manner for testing.) It is listed in the documentation as zone="", but in the cbind function call, the code is zone="GMT". It appears that the "zone" argument might be for input and the FinCenter might be for output in the help page for timeDate, but I think that aspect of the various parts of the documentation is rather vague and might do with a bit of clarification. -- David. > > What am I doing wrong? 
> > ###################################################### > require(timeSeries) > > getRmetricsOptions("myFinCenter") > setRmetricsOptions(myFinCenter = "America/Bogota") > getRmetricsOptions("myFinCenter") > > fechas <- format(timeCalendar(2010, sample(12, 6))) > datos <- matrix(round(rnorm(6), 3)) > t1 <- sort(timeSeries(datos, fechas, units = "A")) > t1 > > fechas <- format(timeCalendar(2010, sample(12, 6))) > datos <- matrix(round(rnorm(6), 3)) > t2 <- sort(timeSeries(datos, fechas, units = "B")) > t2 > > merge(t1,t2) > cbind(t1,t2) > rbind(t1,t2) > ###################################################### > > Thank you for your help. > > Kenneth > > _______________________________________________ > R-SIG-Finance at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-sig-finance > -- Subscriber-posting only. If you want to post, subscribe first. > -- Also note that this is not the r-help list where general R > questions should go. > > _______________________________________________ > > > e at 1 Churchill Place, London, E14 5HP. This email may relate to > or be sent from other members of the Barclays Group. > _______________________________________________ > > ______________________________________________________________________ > This email has been scanned by the MessageLabs Email Security System. > For more information please visit http://www.messagelabs.com/email > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT From martin.brandt at univie.ac.at Tue Jul 5 04:42:28 2011 From: martin.brandt at univie.ac.at (Martin B.) 
Date: Mon, 4 Jul 2011 19:42:28 -0700 (PDT) Subject: [R] Seasonality of time series Message-ID: <1309833748350-3644985.post@n4.nabble.com> Dear all, I have a time series of 10-day tropical rainfall data with a typical rainy and dry season. Is there a way to extract seasonal information with R, like the day of the start and end of each rainy season for each year? Martin Brandt University of Vienna -- View this message in context: http://r.789695.n4.nabble.com/Seasonality-of-time-series-tp3644985p3644985.html Sent from the R help mailing list archive at Nabble.com. From matildaelizabethv at gmail.com Tue Jul 5 01:50:50 2011 From: matildaelizabethv at gmail.com (Matilda E. Gogos) Date: Mon, 4 Jul 2011 19:50:50 -0400 Subject: [R] Bad Confirmation String Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From jholtman at gmail.com Tue Jul 5 05:00:26 2011 From: jholtman at gmail.com (jim holtman) Date: Mon, 4 Jul 2011 23:00:26 -0400 Subject: [R] Stuck ...can't get sapply and xmlTreeParse working In-Reply-To: <1309826573807-3644894.post@n4.nabble.com> References: <1309826573807-3644894.post@n4.nabble.com> Message-ID: The value of 'url.zill' is a vector of 407 character strings: Browse[1]> str(url.zill) chr [1:407] "http://www.zillow.com/webservice/GetDeepSearchResults.htm?zws-id=X1-ZWz1bup03e49vv_5kvb6&address=10+PACER+LN&citystatezip=East+"| __truncated__ ... Isn't it supposed to be just a single file name? On Mon, Jul 4, 2011 at 8:42 PM, eric wrote: > Can't seem to get the code below working. It gets stuck on line 24 inside the > function hm; comments show the line in question. The function hm is called > by sapply and is at the bottom of the code. Other stuff above line 24 works > correctly including the first couple of lines of the function hm. Should I > be using a different apply function or am I doing something wrong with > xmlTreeParse ? 
> > > library(XML) > url.montco <- > "http://webapp.montcopa.org/sherreal/salelist.asp?saledate=07/27/2011" > tbl <-data.frame(readHTMLTable(url.montco))[, c(3,5,6,8,9)] > tbl <-tbl[2: length(tbl[,1]),] > names(tbl) <- c("Address", "Township", "Parcel", "SaleDate", "Costs"); > rownames(tbl) <- NULL > v <- gregexpr("( aka )|( AKA )",tbl$Address) > s <-sapply(v, function(x) max(unlist(x))) > tbl$Address <- substring(tbl$Address, ifelse(s== -1, 0, s+4), 10000) > tbl$Cost <- gsub(',', '', tbl$Costs) > temp <- strsplit(tbl$Cost, "\\$") > temp <- do.call(rbind, temp) ?# create a matrix > mode(temp) <- 'numeric' > tbl$Debt <- round(temp[, 2]/1000,2) > tbl$Court <- round(temp[, 3]/1000,2) > z <- data.frame(substr(tbl$SaleDate,regexpr("[A-Za-z]", tbl$SaleDate), > regexpr("[0-9]", tbl$SaleDate,)-1)) ; names(z) <- "Action" > y <- data.frame(substr(tbl$SaleDate,regexpr("[0-9]", tbl$SaleDate),2011)) ; > names(y) <- "ActionDate" > tbl <-cbind(tbl[, c(1,2,3,7,8)],z,y) > new.add <- paste(tbl$Address,"&citystatezip=",tbl$Township,"%2C+PA", sep='') > new.add <- sub("^( )+","", new.add) > new.add <-data.frame(gsub("( )+",'+', new.add)); names(new.add) <- > "ParseAddress" > hm <- function(x) { > ?url.zill > <-paste("http://www.zillow.com/webservice/GetDeepSearchResults.htm?zws-id=X1-ZWz1bup03e49vv_5kvb6&address=",x, > sep="") > ?############## problem line is next ################################# > ?zdoc <-xmlTreeParse(url.zill, useInternalNode=TRUE, isURL=TRUE) > ?############# problem line above ?################################## > ?f$zpid <- sapply(getNodeSet(zdoc, "//result/zpid"), xmlValue) > ?f$zest.low <-sapply(getNodeSet(zdoc, "//valuationRange/low"), xmlValue) > ?f$zest <- sapply(getNodeSet(zdoc, "//zestimate/amount"), xmlValue) > ?rm(zdoc) > ?return(f) > } > j <-sapply(new.add, FUN=hm) > print(zest) > > -- > View this message in context: http://r.789695.n4.nabble.com/Stuck-can-t-get-sapply-and-xmlTreeParse-working-tp3644894p3644894.html > Sent from the R help mailing list 
archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? From kaslah90 at yahoo.com Tue Jul 5 05:00:23 2011 From: kaslah90 at yahoo.com (Ungku Akashah) Date: Mon, 4 Jul 2011 20:00:23 -0700 (PDT) Subject: [R] Fw: volcano plot.r In-Reply-To: <1309396445.53106.YahooMailNeo@web46310.mail.sp1.yahoo.com> References: <1309396445.53106.YahooMailNeo@web46310.mail.sp1.yahoo.com> Message-ID: <1309834823.73584.YahooMailNeo@web46303.mail.sp1.yahoo.com> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From jholtman at gmail.com Tue Jul 5 05:06:05 2011 From: jholtman at gmail.com (jim holtman) Date: Mon, 4 Jul 2011 23:06:05 -0400 Subject: [R] Stuck ...can't get sapply and xmlTreeParse working In-Reply-To: <1309826573807-3644894.post@n4.nabble.com> References: <1309826573807-3644894.post@n4.nabble.com> Message-ID: Probably this is what you want; convert the first column of 'new.add' to character and then use in the sapply. Now it seems to work in that data is read in, but the new error is that "f" is not defined. What is it supposed to be? > x <- as.character(new.add[[1]]) > z <- sapply(x, hm) Error in f$zpid <- sapply(getNodeSet(zdoc, "//result/zpid"), xmlValue) : object 'f' not found Enter a frame number, or 0 to exit 1: sapply(x, hm) 2: lapply(X, FUN, ...) 3: FUN(c("10+PACER+LN&citystatezip=East+Norriton%2C+PA", "141+ROSEMONT+AVE&citystatezip=Norristown%2C+PA", "6 On Mon, Jul 4, 2011 at 8:42 PM, eric wrote: > Can't seem to get the code below working. It gets stuck on line 24 inside the > function hm; comments show the line in question. The function hm is called > by sapply and is at the bottom of the code. 
Other stuff above line 24 works > correctly including the first couple of lines of the function hm. Should I > be using a different apply function or am I doing something wrong with > xmlTreeParse ? > > > library(XML) > url.montco <- > "http://webapp.montcopa.org/sherreal/salelist.asp?saledate=07/27/2011" > tbl <-data.frame(readHTMLTable(url.montco))[, c(3,5,6,8,9)] > tbl <-tbl[2: length(tbl[,1]),] > names(tbl) <- c("Address", "Township", "Parcel", "SaleDate", "Costs"); > rownames(tbl) <- NULL > v <- gregexpr("( aka )|( AKA )",tbl$Address) > s <-sapply(v, function(x) max(unlist(x))) > tbl$Address <- substring(tbl$Address, ifelse(s== -1, 0, s+4), 10000) > tbl$Cost <- gsub(',', '', tbl$Costs) > temp <- strsplit(tbl$Cost, "\\$") > temp <- do.call(rbind, temp) ?# create a matrix > mode(temp) <- 'numeric' > tbl$Debt <- round(temp[, 2]/1000,2) > tbl$Court <- round(temp[, 3]/1000,2) > z <- data.frame(substr(tbl$SaleDate,regexpr("[A-Za-z]", tbl$SaleDate), > regexpr("[0-9]", tbl$SaleDate,)-1)) ; names(z) <- "Action" > y <- data.frame(substr(tbl$SaleDate,regexpr("[0-9]", tbl$SaleDate),2011)) ; > names(y) <- "ActionDate" > tbl <-cbind(tbl[, c(1,2,3,7,8)],z,y) > new.add <- paste(tbl$Address,"&citystatezip=",tbl$Township,"%2C+PA", sep='') > new.add <- sub("^( )+","", new.add) > new.add <-data.frame(gsub("( )+",'+', new.add)); names(new.add) <- > "ParseAddress" > hm <- function(x) { > ?url.zill > <-paste("http://www.zillow.com/webservice/GetDeepSearchResults.htm?zws-id=X1-ZWz1bup03e49vv_5kvb6&address=",x, > sep="") > ?############## problem line is next ################################# > ?zdoc <-xmlTreeParse(url.zill, useInternalNode=TRUE, isURL=TRUE) > ?############# problem line above ?################################## > ?f$zpid <- sapply(getNodeSet(zdoc, "//result/zpid"), xmlValue) > ?f$zest.low <-sapply(getNodeSet(zdoc, "//valuationRange/low"), xmlValue) > ?f$zest <- sapply(getNodeSet(zdoc, "//zestimate/amount"), xmlValue) > ?rm(zdoc) > ?return(f) > } > j <-sapply(new.add, 
FUN=hm) > print(zest) > > -- > View this message in context: http://r.789695.n4.nabble.com/Stuck-can-t-get-sapply-and-xmlTreeParse-working-tp3644894p3644894.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? From murdoch.duncan at gmail.com Tue Jul 5 05:15:48 2011 From: murdoch.duncan at gmail.com (Duncan Murdoch) Date: Mon, 04 Jul 2011 23:15:48 -0400 Subject: [R] Protecting R code In-Reply-To: References: Message-ID: <4E1281E4.4020703@gmail.com> On 11-07-04 3:47 AM, Vaishali Sadaphal wrote: > Hi All, > > I need to give my R code to my client to use. I would like to protect the > logic/algorithms that have been coded in R. This means that I would not > like anyone to be able to read the code. R is an open source project, so providing ways for you to do this is not one of our goals. If I were your client I would have asked for the source code for whatever you're doing; if your client isn't savvy enough to do that, you should provide it and explain why it is useful, and what your client is and isn't allowed to do with it. If you think your client will steal from you, then you should find another client. Duncan Murdoch > > I am searching for ways to protect R code. I would like to create a .exe > kind of file which could be executed without using R or requiring to > install R. I would not like the R code to be loaded in R. This is so > because, after R loads a function, if you type the function name on the > command prompt, you can see the complete code. I would not like to give > this type of access to the R code. 
> > I explored the option of creating .bat file (using command: R CMD BAT) and > byte code (using command: compile). These are not useful since they open > R, load these functions and then the R code is visible. > > Is there any other way to protect the R code which would help me package > all my files/source files and give me an executable file which would be > run without opening R? Another problem is that R is freely downloadable. > Is it somehow possible to protect the code from being loaded in R and > being seen. > > Thanks > -- > Vaishali > =====-----=====-----===== > Notice: The information contained in this e-mail > message and/or attachments to it may contain > confidential or privileged information. If you are > not the intended recipient, any dissemination, use, > review, distribution, printing or copying of the > information contained in this e-mail message > and/or attachments to it are strictly prohibited. If > you have received this communication in error, > please notify us by reply e-mail or telephone and > immediately and permanently delete the message > and any attachments. Thank you > > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From n.bowora at gmail.com Tue Jul 5 05:21:14 2011 From: n.bowora at gmail.com (EdBo) Date: Mon, 4 Jul 2011 20:21:14 -0700 (PDT) Subject: [R] loop in optim In-Reply-To: References: <1309772071774-3643230.post@n4.nabble.com> Message-ID: <1309836074873-3645031.post@n4.nabble.com> Hi I have re-worked on my likelihood function and it is now working(#the code is below#). May you help me correct my loop function. I want optim to estimates al_j; au_j; sigma_j; b_j by looking at 0 to 20, 21 to 40, 41 to 60 data points. 
The final result should have 4 columns of each of the estimates AND 4 rows of each of 0 to 20, 21 to 40, 41 to 60. #likelihood function a=read.table("D:/hope.txt",header=T) attach(a) a llik = function(x) { al_j=x[1]; au_j=x[2]; sigma_j=x[3]; b_j=x[4] sum(na.rm=T, ifelse(a$R_j< 0, log(1/(2*pi*(sigma_j^2)))- (1/(2*(sigma_j^2))*(a$R_j+al_j-b_j*a$R_m))^2, ifelse(a$R_j>0 , log(1/(2*pi*(sigma_j^2)))- (1/(2*(sigma_j^2))*(a$R_j+au_j-b_j*a$R_m))^2, log(ifelse (( pnorm (au_j, mean=b_j * a$R_m, sd= sqrt(sigma_j^2))- pnorm(al_j, mean=b_j * a$R_m, sd=sqrt (sigma_j^2) )) > 0, (pnorm (au_j,mean=b_j * a$R_m, sd= sqrt(sigma_j^2))- pnorm(al_j, mean=b_j * a$R_m, sd= sqrt(sigma_j^2) )), 1)) )) ) } start.par = c(-0.01,0.01,0.1,1) out1 = optim(llik, par=start.par, method="Nelder-Mead") out1 -- View this message in context: http://r.789695.n4.nabble.com/loop-in-optim-tp3643230p3645031.html Sent from the R help mailing list archive at Nabble.com. From mateus_rabello at hotmail.com Tue Jul 5 06:57:57 2011 From: mateus_rabello at hotmail.com (Mateus Rabello) Date: Tue, 5 Jul 2011 01:57:57 -0300 Subject: [R] Create factor variable by groups Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From johannes_graumann at web.de Tue Jul 5 07:31:01 2011 From: johannes_graumann at web.de (Johannes Graumann) Date: Tue, 5 Jul 2011 08:31:01 +0300 Subject: [R] Prevent 'R CMD check' from reporting "NA"/"NA_character_" missmatch? References: Message-ID: Prof Brian Ripley wrote: > On Mon, 4 Jul 2011, Johannes Graumann wrote: > >> Hello, >> >> I'm writing a package am running 'R CMD check' on it. >> >> Is there any way to make 'R CMD check' not warn about a missmatch between >> 'NA_character_' (in the function definition) and 'NA' (in the >> documentation)? > > Be consistent .... Why do you want incorrect documentation of your > package? (It is not clear of the circumstances here: normally 1 vs 1L > and similar are not reported if they are the only errors.) 
> > And please do note the posting guide > > - this is not really the correct list > - you were asked to give an actual example with output. > Taken to R-devel. Thanks. Joh From djmuser at gmail.com Tue Jul 5 10:05:30 2011 From: djmuser at gmail.com (Dennis Murphy) Date: Tue, 5 Jul 2011 01:05:30 -0700 Subject: [R] Create factor variable by groups In-Reply-To: References: Message-ID: Hi: There are several ways to do this; I'll offer one from the plyr package. See inline. On Mon, Jul 4, 2011 at 9:57 PM, Mateus Rabello wrote: > Hi, suppose that I have the following data.frame: > > ? ? ?cnae4 cnpj 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 Y > ? ? ?24996 10020470 1 1 2 12 16 21 17 51 43 19 183 > ? ? ?24996 10020470 69 91 79 92 91 77 90 96 98 108 891 > ? ? ?36145 10020470 0 0 0 0 2 83 112 97 91 144 529 > ? ? ?44444 10023333 5 20 60 0 0 0 0 5 20 1000 1110 > > > I would like to create a new variable X that indicates which line, within the cnpj variable, has the highest value Y. For instance, within the cnpj = 10020470, the second line has the largest value Y (891). For cnpj = 10023333 is trivial (1110). Then, my new data.frame would become: > > ? ? ?cnae4 cnpj 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 Y X > ? ? ?24996 10020470 1 1 2 12 16 21 17 51 43 19 183 FALSE > ? ? ?24996 10020470 69 91 79 92 91 77 90 96 98 108 891 TRUE > ? ? ?36145 10020470 0 0 0 0 2 83 112 97 91 144 529 FALSE > ? ? ?44444 10023333 5 20 60 0 0 0 0 5 20 1000 1110 TRUE > > > Notice that for every value of the variable cnpj, only one line will have X = TRUE. > > Then, I would like to create a variable Z that is the sum of variable Y, also by variable cnpj. Thus, if cnpj = 10020470, Z = 183 + 891 +529 and for cnpj = 10023333, Z = 120. These sums can easily be done with tapply or aggregate but those would eliminate line with equal cnpj and I don?t want that. I would like to achieve a data.frame like the following: > > ? ? 
cnae4 cnpj 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 Y X Z
>      24996 10020470 1 1 2 12 16 21 17 51 43 19 183 FALSE 1603
>      24996 10020470 69 91 79 92 91 77 90 96 98 108 891 TRUE 1603
>      36145 10020470 0 0 0 0 2 83 112 97 91 144 529 FALSE 1603
>      44444 10023333 5 20 60 0 0 0 0 5 20 1000 1110 TRUE 1110

To get the above structure, assuming this data frame is named df, one way is to use the ddply() function in the plyr package with a helper function that does the work for a generic subset with constant cnpj:

library(plyr)
# Add X (TRUE on the row with the largest Y) and Z (group total of Y)
myfun <- function(d) {
    d$X <- d$Y == max(d$Y)
    d$Z <- sum(d$Y)
    d
}
ddply(df, .(cnpj), myfun)

>
>
> In the end I will eliminate all lines with X = FALSE.

To do this, all you need is to rewrite the function slightly:

# Add Z, then keep only the row with the largest Y in each group
myfun2 <- function(d) {
    d$Z <- sum(d$Y)
    d[which.max(d$Y), ]
}
ddply(df, .(cnpj), myfun2)

HTH,
Dennis

>
>
> Thank you and sorry for the long question.
>
> Mateus Rabello

From savicky at praha1.ff.cuni.cz Tue Jul 5 11:20:57 2011
From: savicky at praha1.ff.cuni.cz (Petr Savicky)
Date: Tue, 5 Jul 2011 11:20:57 +0200
Subject: [R] modification of cross-validations in rpart
In-Reply-To:
References:
Message-ID: <20110705092057.GA8459@praha1.ff.cuni.cz>

On Mon, Jul 04, 2011 at 09:22:23AM -0400, Katerine Goyer wrote:
>
> Hello,
>
> I am using
> the rpart function (from the rpart package) to do a regression tree that would describe
> the behaviour of a fish species according to several environmental variables.
> For each fish (sampling unit), I have repeated observations of the response
> variable, which means that the data are not independent.
Normally, in this
> case, V-fold cross-validation needs to be modified to prevent over-optimistic
> predictions of error rates by cross-validation and overestimation of the tree
> size. A way to overcome this problem is by selecting only whole sampling units
> in our subsets of cross-validation. My problem is that I don't know how to
> perform this modification of the cross-validation process in the rpart
> function.
>
>
> Is there a
> way to do this modification in rpart or is there any other function I could use
> that would consider interdependence in the response variable?
>
>
> Here is an
> example of the code I am using ("Y" being the response variable and "data.env"
> being a data frame of the environmental
> variables):
>
>
> Tree = rpart(Y
> ~ X1 + X2 + X3,xval=100,data=data.env)
>

Hello.

It may be necessary to program the cross-validation loop at the R level, here using package tree. An example is as follows:

library(tree)

# Simulated data standing in for the real observations
X1 <- rnorm(200)
X2 <- rnorm(200)
X3 <- rnorm(200)
Y <- ifelse(X1 > 0, X2, X3)
data.env <- data.frame(X1, X2, X3, Y)

# Fold label for every row; all rows of one sampling unit share a label
ind <- rep(1:7, times=c(20, 30, 35, 30, 30, 25, 30)) # length(ind) == nrow(data.env)

# Out-of-fold predictions: fit on all other folds, predict the held-out fold
pred <- rep(NA, times=nrow(data.env))
for (i in unique(ind)) {
    Tree <- tree(Y ~ X1 + X2 + X3, data=data.env[ind != i, ])
    PrunedTree <- prune.tree(Tree, best = 10)
    pred[ind == i] <- predict(PrunedTree, newdata=data.env[ind == i, ])
}
plot(data.env$Y, pred, asp=1)

The vector ind should be prepared so that all occurrences of the same fish have the same value. See ?tree and ?prune.tree for further parameters.

Consider also the randomForest package, which may be more accurate, although it does not provide a comprehensible model.

Hope this helps.

Petr Savicky.
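The example above labels one fold per group by hand. A minimal sketch of how the vector ind could instead be built from a per-observation identifier, using base R only, follows. The identifier vector fish_id, the helper name make_group_folds and the choice k = 3 are assumptions made for illustration, not part of the original post:

```r
# Hypothetical fish identifiers, one per observation (not from the original post)
fish_id <- rep(c("f01", "f02", "f03", "f04", "f05", "f06"),
               times = c(3, 2, 4, 1, 3, 2))

# Assign every distinct unit to one of k folds, then map the fold number
# back to the observations, so a unit is never split across folds.
make_group_folds <- function(ids, k) {
  units <- unique(as.character(ids))
  fold_of_unit <- sample(rep_len(seq_len(k), length(units)))
  names(fold_of_unit) <- units
  unname(fold_of_unit[as.character(ids)])
}

set.seed(1)  # only to make the random assignment repeatable
ind <- make_group_folds(fish_id, k = 3)

# Sanity check: each fish appears in exactly one fold
stopifnot(all(tapply(ind, fish_id, function(f) length(unique(f))) == 1))
```

The resulting ind can be used in the loop above in place of the hand-written rep() call, provided it has one entry per row of the data frame.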
From kb at flicon.de Tue Jul 5 12:05:54 2011 From: kb at flicon.de (kbr) Date: Tue, 5 Jul 2011 03:05:54 -0700 (PDT) Subject: [R] I need help for creating a "timevar" In-Reply-To: References: <1309786893784-3643658.post@n4.nabble.com> Message-ID: <1309860354333-3645575.post@n4.nabble.com> Hi Josh, It works perfectly! Thanks a lot for your quick and helpful answer! Karen -- View this message in context: http://r.789695.n4.nabble.com/I-need-help-for-creating-a-timevar-tp3643658p3645575.html Sent from the R help mailing list archive at Nabble.com. From annemarie.verkerk at mpi.nl Tue Jul 5 09:00:18 2011 From: annemarie.verkerk at mpi.nl (Annemarie Verkerk) Date: Tue, 05 Jul 2011 09:00:18 +0200 Subject: [R] placing multiple rows in a single row In-Reply-To: <138FA819-1BBB-4903-9675-516AC1CE9883@comcast.net> References: <4E120745.5010903@mpi.nl> <138FA819-1BBB-4903-9675-516AC1CE9883@comcast.net> Message-ID: <4E12B682.60102@mpi.nl> Dear David, thanks so much, I was able to get it to work for my data! I don't really understand yet how the function works, but it seems extremely useful. Thanks again! Annemarie David Winsemius wrote: > > On Jul 4, 2011, at 2:32 PM, Annemarie Verkerk wrote: > >> Dear people from the R help list, >> >> I have a question that I can't get my head around to start answering, >> that is why I am writing to the list. >> >> I have data in a format like this (tabs might look weird): >> >> John A1 1 0 1 >> John A2 1 1 1 >> John A3 1 0 0 >> Mary A1 1 0 1 >> Mary A2 0 0 1 >> Mary A3 1 1 0 >> Peter A1 1 0 0 >> Peter A2 0 0 1 >> Peter A3 1 1 1 >> Josh A1 1 0 0 >> Josh A2 >> Josh A3 0 0 0 >> >> I want to convert it into a format where variable rows from a single >> subject are placed behind each other, but with the different scores >> still matching up (i.e., it needs to be able to cope with missing >> data, as for Josh's A2 score). 
>>
>> John   A1 1 0 1   A2 1 1 1   A3 1 0 0
>> Mary   A1 1 0 1   A2 0 0 1   A3 1 1 0
>> Peter  A1 1 0 0   A2 0 0 1   A3 1 1 1
>> Josh   A1 1 0 0   A2         A3 0 0 0
>>
>> Preferably, the row identification would become the header of the new
>> table, something like this:
>>
>>        A11 A12 A13 A21 A22 A23 A31 A32 A33
>> John     1   0   1   1   1   1   1   0   0
>> Mary     1   0   1   0   0   1   1   1   0
>> Peter    1   0   0   0   0   1   1   1   1
>> Josh     1   0   0               0   0   0
>>
>> Probably, this has been addressed before - I just don't know how to
>> search for the answer with the right search terms.
>>
>> Any help is appreciated, even just a link to a page where this is
>> addressed!
>
> There is a reshape function in the stats package that nobody except
> Phil Spector seems to understand, and then there are the reshape and
> reshape2 packages that everybody seems to get. (I don't understand why
> the classification variables are on the left-hand side, though.
> Positionally it makes some sense, but logically it does not connect
> with how I understand the process.)
>
> require(reshape2)
> # entered your data with default names V1 V2 V3 V4 V5
> > nam123
>       V1 V2 V3 V4 V5
> 1   John A1  1  0  1
> 2   John A2  1  1  1
> 3   John A3  1  0  0
> 4   Mary A1  1  0  1
> 5   Mary A2  0  0  1
> 6   Mary A3  1  1  0
> 7  Peter A1  1  0  0
> 8  Peter A2  0  0  1
> 9  Peter A3  1  1  1
> 10  Josh A1  1  0  0
> 11  Josh A2 NA NA NA
> 12  Josh A3  0  0  0
>
> > nams.mlt <- melt(nam123, id.vars=c("V1", "V2"))
>
> > str(nams.mlt)
> 'data.frame': 36 obs. of 4 variables:
>  $ V1      : Factor w/ 4 levels "John","Josh",..: 1 1 1 3 3 3 4 4 4 2 ...
>  $ V2      : Factor w/ 3 levels "A1","A2","A3": 1 2 3 1 2 3 1 2 3 1 ...
>  $ variable: Factor w/ 3 levels "V3","V4","V5": 1 1 1 1 1 1 1 1 1 1 ...
>  $ value   : int 1 1 1 1 0 1 1 0 1 1 ...
> > > dcast(nams.mlt, V1+V2 ~ variable) > V1 V2 V3 V4 V5 > 1 John A1 1 0 1 > 2 John A2 1 1 1 > 3 John A3 1 0 0 > 4 Josh A1 1 0 0 > 5 Josh A2 NA NA NA > 6 Josh A3 0 0 0 > 7 Mary A1 1 0 1 > 8 Mary A2 0 0 1 > 9 Mary A3 1 1 0 > 10 Peter A1 1 0 0 > 11 Peter A2 0 0 1 > 12 Peter A3 1 1 1 > > dcast(nams.mlt, V1 ~ V2+variable) > V1 A1_V3 A1_V4 A1_V5 A2_V3 A2_V4 A2_V5 A3_V3 A3_V4 A3_V5 > 1 John 1 0 1 1 1 1 1 0 0 > 2 Josh 1 0 0 NA NA NA 0 0 0 > 3 Mary 1 0 1 0 0 1 1 1 0 > 4 Peter 1 0 0 0 0 1 1 1 1 > > You can always change the names of the dataframe if you want, and in > this case it would be a simple sub() operation. Personally I would > substitute "." rather than "". -- Annemarie Verkerk, MA Evolutionary Processes in Language and Culture (PhD student) Max Planck Institute for Psycholinguistics P.O. Box 310, 6500AH Nijmegen, The Netherlands +31 (0)24 3521 185 http://www.mpi.nl/research/research-projects/evolutionary-processes From gabrielmartos at gmail.com Tue Jul 5 12:30:21 2011 From: gabrielmartos at gmail.com (gabrielmartos) Date: Tue, 5 Jul 2011 03:30:21 -0700 (PDT) Subject: [R] Problems in converting data points to functional data Message-ID: <1309861821818-3645596.post@n4.nabble.com> Hello! Im using my own bases representation for functional data. In order to make F-PCA now I need to declare to my matrix containing the curves (in rows) as a *functional object*. data2fd function doesn't work! I do this: - f.proyections = matrix containing in rows the curves representation (Im using my own basis) - t = vector containing all sample points evaluation (same length that number of columns in matrix f.proyections) fdf.proyections=data2fd(f.proyections,t) I receive this error: Error in .Internal(inherits(x, what, which)) : 'x' is missing I cant use another basis as in the examples on the help of R-library, because the functions are already represented using a base developed by myself... Any suggestion? Many thanks in advance! 
-- View this message in context: http://r.789695.n4.nabble.com/Problems-in-converting-data-points-to-functional-data-tp3645596p3645596.html Sent from the R help mailing list archive at Nabble.com. From statconsult90 at gmail.com Tue Jul 5 12:20:11 2011 From: statconsult90 at gmail.com (Stat Consult) Date: Tue, 5 Jul 2011 14:50:11 +0430 Subject: [R] condlogic.ff Message-ID: Dear All How can I Recompile "condlogic.ff " in "LogicReg" package for fitting a conditional logistic model? Best Regards, Leila From tryingtolearnagain at gmail.com Tue Jul 5 13:00:51 2011 From: tryingtolearnagain at gmail.com (Trying To learn again) Date: Tue, 5 Jul 2011 13:00:51 +0200 Subject: [R] Executing a function several time, how to save the output Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From uriblass at gmail.com Tue Jul 5 11:45:05 2011 From: uriblass at gmail.com (UriB) Date: Tue, 5 Jul 2011 02:45:05 -0700 (PDT) Subject: [R] How to build a matrix of number of appearance? In-Reply-To: <17867D87-8C02-40DC-B647-B8C5DE41FD56@comcast.net> References: <1309772913259-3643248.post@n4.nabble.com> <17867D87-8C02-40DC-B647-B8C5DE41FD56@comcast.net> Message-ID: <1309859105151-3645550.post@n4.nabble.com> Thanks for your reply Note that I guess that there are many providerID and I get the error cannot allocate vector of size 2.1 Gb (I can use the same trick for most of the other fields) Is there a way to do the same only for providerID with relatively high frequency? -- View this message in context: http://r.789695.n4.nabble.com/How-to-build-a-matrix-of-number-of-appearance-tp3643248p3645550.html Sent from the R help mailing list archive at Nabble.com. 
From uriblass at gmail.com Tue Jul 5 12:29:41 2011 From: uriblass at gmail.com (UriB) Date: Tue, 5 Jul 2011 03:29:41 -0700 (PDT) Subject: [R] How to translate string to variable inside a command in an easy way in R Message-ID: <1309861781957-3645594.post@n4.nabble.com> I want to write a function that get 2 strings y and z and does the following R command. temp<-qq1[qq1$z==y,] for example if it get y="AMI" and z="PrimaryConditionGroup" It should do the following temp<-qq1[qq1$PrimaryConditionGroup=="AMI",] I could do it by the following function that is ugly and I wonder if there is an easier way to do it espacielly when temp is not the final result that I want (so I practically do not have temp<<-temp because I do not need the function to remember temp but only to remember something else that is calculated based on temp). ugly<-function(y,z) { text1<-paste("temp<-qq1[qq1$",z,sep="") text1<-paste(text1,"==y",sep="") text1<-paste(text1,",]",sep="") eval(parse(text=text1)) temp<<-temp } -- View this message in context: http://r.789695.n4.nabble.com/How-to-translate-string-to-variable-inside-a-command-in-an-easy-way-in-R-tp3645594p3645594.html Sent from the R help mailing list archive at Nabble.com. From silvano at uel.br Wed Jul 6 13:36:27 2011 From: silvano at uel.br (Silvano) Date: Wed, 6 Jul 2011 08:36:27 -0300 Subject: [R] Tables and merge Message-ID: <8ACAAC0A4F0B43568802DAE967A4B6EE@ccePC> ----- Original Message ----- From: "Silvano" To: Sent: Thursday, June 30, 2011 9:07 AM Subject: Tables and merge > Hi, > > I have 21 files which is common variable CODE. > Each file refers to a question. > > I would like to join the 21 files into one, to construct > tables for each question by CODE. 
>
> I tried the command (8 files only):
>
> require(foreign)
> q1 = read.epiinfo('Dados/Q1.rec')
> q2 = read.epiinfo('Dados/Q2.rec')
> q3 = read.epiinfo('Dados/Q3.rec')
> q4 = read.epiinfo('Dados/Q4.rec')
> q5 = read.epiinfo('Dados/Q5.rec')
> q6 = read.epiinfo('Dados/Q6.rec')
> q7 = read.epiinfo('Dados/Q7.rec')
> q8 = read.epiinfo('Dados/Q8.rec')
>
> juntos = merge(q1,q2,q3,q4,q5,q6,q7,q8)
>
> But it didn't work. Any suggestions?
>
> Thank you.
>
> --------------------------------------
> Silvano Cesar da Costa
> Departamento de Estatística
> Universidade Estadual de Londrina
> Fone: 3371-4346
> --------------------------------------
>

From albertcoster2010 at gmail.com Tue Jul 5 13:41:15 2011
From: albertcoster2010 at gmail.com (albert coster)
Date: Tue, 5 Jul 2011 13:41:15 +0200
Subject: [R] problem in reading a sequence file
Message-ID:

Dear all,

I have a file with some sequence (seq.txt). I am writing the following code and getting an error! Can you please help me?

seqfile<-read.table(file="seq.txt")
Warning message:
In read.table(file = "seq.txt") :
  incomplete final line found by readTableHeader on 'seq.txt'

Thanks in advance

Albert
-------------- next part --------------
NNNNNNNNNNATTAAAGGGC

From bbolker at gmail.com Tue Jul 5 13:53:45 2011
From: bbolker at gmail.com (Ben Bolker)
Date: Tue, 5 Jul 2011 11:53:45 +0000
Subject: [R] Simulating inhomogeneous Poisson process without loop
References:
Message-ID:

Tristan Linke gmail.com> writes:

>
> Dear all
>
> I want to simulate a stochastic jump variance process where N is Bernoulli
> with intensity lambda0 + lambda1*Vt. lambda0 is constant and lambda1 can be
> interpreted as a regression coefficient on the current variance level Vt. J
> is a scaling factor
>
> How can I rewrite this avoiding the loop structure which is very
> time-consuming for long simulations?
>
> for (i in 1:N){
> ...
> N <- rbinom(n=1, size=1, prob=(lambda0+lambda1*Vt))
> Vt <- ... + J*N
> ..
> } Is it a typo that you are using N both as your loop variable and as part of the state of your simulation? > P.S. This is going towards the Duffie, Pan, Singleton 2000 Transform Pricing > paper, here stochastic volatility with state-dependent correlated jumps > (Eraker 2004). I don't think there's any way to rearrange this completely without loops, because each call of rbinom() depends on the previous updating. One thing that might speed things up a lot would be to pick N random uniform variates as a single vector in advance: rvec <- runif(N) for (i in 1:N) { Vt <- ... + J*(rvec[i] Message-ID: albert coster gmail.com> writes: > > Dear all, > > I have a file with some sequence (seq.txt). I am writting following code and > getting error! Can please help me? > > seqfile<-read.table(file="seq.txt") > Warning message: > In read.table(file = "seq.txt") : > incomplete final line found by readTableHeader on 'seq.txt' > > Thanks in advance > > Albert Very hard to say without more details. Please provide a reproducible example, or at least more information. That is not an error, it's a warning: it means there *might* be something wrong with your data file, but not necessarily. Have you inspected the results? Are they what you expected? If not, do they give you some more information about what might be wrong? Usual suspects: check for unterminated/single quotation marks in your file. From albertcoster2010 at gmail.com Tue Jul 5 14:06:02 2011 From: albertcoster2010 at gmail.com (albert coster) Date: Tue, 5 Jul 2011 14:06:02 +0200 Subject: [R] problem in reading a sequence file In-Reply-To: References: Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From therneau at mayo.edu Tue Jul 5 14:33:20 2011 From: therneau at mayo.edu (Terry Therneau) Date: Tue, 05 Jul 2011 07:33:20 -0500 Subject: [R] Multilevel Survival Analysis - Cox PH Model Message-ID: <1309869200.7054.4.camel@nemo> Three comments: 1. 
If there is no right censoring (and it appears not), I would use lmer on the awakening times, glmer on the FullyOriented variable. That is, I agree with Burt. Another option is GEE models.

2. If you want to use a Cox model, then you can
  a. Add "+ cluster(id)" to the model statement. This adds a robust variance, and is closely related to GEE.
  b. Use coxme to fit a mixed effects model.

Terry Therneau

From therneau at mayo.edu Tue Jul 5 14:43:58 2011
From: therneau at mayo.edu (Terry Therneau)
Date: Tue, 05 Jul 2011 07:43:58 -0500
Subject: [R] Multilevel Survival Analysis - Cox PH Model
Message-ID: <1309869838.7054.9.camel@nemo>

> Patients are either fully oriented or not (1 or 2) after an hour. If they're
> not, then the data is right censored.

It doesn't look like right censored data to me, unless the time variable were "time to full orientation"; you labeled it "time to awake", which appears to be something different.

However, to answer your coxme question, the random effect would be
   (1 | MRN/COURSE)
which stands for a random intercept term for each MRN, and one for each course within MRN. This is the same notation as lmer.

Terry Therneau

From p.pagel at wzw.tum.de Tue Jul 5 15:05:46 2011
From: p.pagel at wzw.tum.de (Philipp Pagel)
Date: Tue, 5 Jul 2011 15:05:46 +0200
Subject: [R] problem in reading a sequence file
In-Reply-To:
References:
Message-ID: <20110705130546.GA12629@arronax.matrix.invalid>

On Tue, Jul 05, 2011 at 02:06:02PM +0200, albert coster wrote:
> seqfile
> V1
> 1 NNNNNNNNNNATTAAAGGGC
>
> I want only NNNNNNNNNNATTAAAGGGC .

If I understand correctly, your file simply contains one string (sequence) per line. In that case you may want to use scan() instead of read.table, but without more information it's hard to know.

Can you provide a very short example file (maybe 5 lines) and also the output of str(foo), where foo is the variable you read the file into?

Also: do you want a data frame with a single column? Or rather a vector of strings? Something else?
Does your file ONLY contain sequences - or are there also identifiers, annotations etc.?

cu
Philipp

--
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
85350 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/

From ottorino-luca.pantani at unifi.it Tue Jul 5 15:01:48 2011
From: ottorino-luca.pantani at unifi.it (ottorino)
Date: Tue, 5 Jul 2011 15:01:48 +0200
Subject: [R] Prettier axis labels when using log (or exp!!) scales in Lattice (follow up)
Message-ID: <1309870908.21166.18.camel@ottorino-amd>

Hi all,

my mail is a follow up of this thread http://tolstoy.newcastle.edu.au/R/e12/help/10/11/4172.html.

I'm trying to alter the labels of an xyplot where the y variable is in the order of millions (cell counts).

I've found plenty of examples in the R mailing list archives as well as in chapter 8 of the book Lattice: Multivariate Data Visualization with R.

Unfortunately all the examples refer to a log transformation of the data, but what I'm looking for is exactly the reverse (exp).

An example would better show what I'm looking for.

library(latticeExtra)
xyplot(Sepal.Length*10e3 ~ Sepal.Width, iris)

If the figures are higher, the format of the labels also changes:

xyplot(Sepal.Length*10e12 ~ Sepal.Width, iris)

The y axis of the above plot is what I currently get with my data.

I would like the y axis in the form 50^3. Better said, I want the y axis labels in the form expression(50^3).

I know from ?xyplot that the argument scales accepts "log", and "that this is in reality a transformation of the data, not the axes."

So the plot I would like to get is similar to

xyplot(Sepal.Length*10e3 ~ Sepal.Width, iris,
       scales=list(y=list(log=10)),
       yscale.components = yscale.components.logpower)

but without transforming the variables.

In summary, I would like to be able to use some sort of exp(10) instead of log(10).

Any suggestion to accomplish this?
Thanks in advance 8rino From therneau at mayo.edu Tue Jul 5 15:18:44 2011 From: therneau at mayo.edu (Terry Therneau) Date: Tue, 05 Jul 2011 08:18:44 -0500 Subject: [R] modification of cross-validations in rpart Message-ID: <1309871924.7054.15.camel@nemo> > Is there a way to do this modification in rpart or is there any other > function I could use that would consider interdependence in the > response variable? This feature already exists: the "xval" option can be a vector of integers that defines the "left out" groups. First all the 1's are left out, then the 2s, then the 3s, etc. Unfortunately this was overlooked in the documentation for rpart.control; it's in the documentation for xpred.rpart though. I'll fix this soon as part of some other updates. Terry Therneau From dwinsemius at comcast.net Tue Jul 5 15:57:24 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Tue, 5 Jul 2011 09:57:24 -0400 Subject: [R] How to build a matrix of number of appearance? In-Reply-To: <1309859105151-3645550.post@n4.nabble.com> References: <1309772913259-3643248.post@n4.nabble.com> <17867D87-8C02-40DC-B647-B8C5DE41FD56@comcast.net> <1309859105151-3645550.post@n4.nabble.com> Message-ID: On Jul 5, 2011, at 5:45 AM, UriB wrote: > Thanks for your reply > Note that I guess that there are many providerID and I get the error > cannot > allocate vector of size 2.1 Gb What code? > (I can use the same trick for most of the other fields) > > Is there a way to do the same only for providerID with relatively high > frequency? You are posting to a mailing list from a non-official web mirror/ interface. Those of us using this list with mail clients cannot tell who you are responding to and what code is throwing an error without opening up a browser and following the link. below (and speaking from prior failed efforts at figuring out context on Nabble, maybe not even then.) Get with the program. Read the Posting Guide. 
As the sign says: > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html If you persist in psoting to R-help then ...Learn to include context. > > -- > View this message in context: http://r.789695.n4.nabble.com/How-to-build-a-matrix-of-number-of-appearance-tp3643248p3645550.html > Sent from the R help mailing list archive at Nabble.com. -- David Winsemius, MD West Hartford, CT From dwinsemius at comcast.net Tue Jul 5 16:12:51 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Tue, 5 Jul 2011 10:12:51 -0400 Subject: [R] placing multiple rows in a single row In-Reply-To: <4E12B682.60102@mpi.nl> References: <4E120745.5010903@mpi.nl> <138FA819-1BBB-4903-9675-516AC1CE9883@comcast.net> <4E12B682.60102@mpi.nl> Message-ID: On Jul 5, 2011, at 3:00 AM, Annemarie Verkerk wrote: > Dear David, > > thanks so much, I was able to get it to work for my data! I don't > really understand yet how the function works, but it seems extremely > useful. The melt operation creates a "long" data.frame, (which is what many plotting programs expect.) The dcast function creates a "wide" dataframe in the form of variable on the LHS of the formal being ID variables... ones that appear as values in the first columns, while variables on the RHS of the formula become column names. Any variables not in the formula (in this case the "value" variable of the melted df) become the interior entries of the new wide df. -- David > > Thanks again! > Annemarie > >> snipped original question >> >> There is a reshape function in the stats package that nobody except >> Phil Spector seems to understand and then there is the reshape and >> reshape2 packages that everybody seems to get. (I don't understand >> why the classification variables are on the left-hand-side, though. >> Positionally it makes some sense, but logically it does not connect >> with how I understand the process.) 
>> >> require(reshape2) >> # entered your data with default names V1 V2 V3 V4 V5 >> > nam123 >> V1 V2 V3 V4 V5 >> 1 John A1 1 0 1 >> > snipped >> >> > nams.mlt <- melt(nam123, idvars=c("V1", "V2")) >> >> > str(nams.mlt) >> 'data.frame': 36 obs. of 4 variables: >> $ V1 : Factor w/ 4 levels "John","Josh",..: 1 1 1 3 3 3 4 4 4 >> 2 ... >> $ V2 : Factor w/ 3 levels "A1","A2","A3": 1 2 3 1 2 3 1 2 3 >> 1 ... >> $ variable: Factor w/ 3 levels "V3","V4","V5": 1 1 1 1 1 1 1 1 1 >> 1 ... >> $ value : int 1 1 1 1 0 1 1 0 1 1 ... >> >> > dcast(nams.mlt, V1 ~ V2+variable) >> V1 A1_V3 A1_V4 A1_V5 A2_V3 A2_V4 A2_V5 A3_V3 A3_V4 A3_V5 >> 1 John 1 0 1 1 1 1 1 0 0 >> 2 Josh 1 0 0 NA NA NA 0 0 0 >> 3 Mary 1 0 1 0 0 1 1 1 0 >> 4 Peter 1 0 0 0 0 1 1 1 1 >> >> You can always change the names of the dataframe if you want, and >> in this case it would be a simple sub() operation. Personally I >> would substitute "." rather than "". > David Winsemius, MD West Hartford, CT From engstrom.gary at gmail.com Tue Jul 5 16:28:38 2011 From: engstrom.gary at gmail.com (gary engstrom) Date: Tue, 5 Jul 2011 10:28:38 -0400 Subject: [R] if else loop Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From daniel at umd.edu Tue Jul 5 16:47:32 2011 From: daniel at umd.edu (Daniel Malter) Date: Tue, 5 Jul 2011 07:47:32 -0700 (PDT) Subject: [R] Repeating a function in R In-Reply-To: <4E0F6AD6.50303@statistik.tu-dortmund.de> References: <1309613295639-3640508.post@n4.nabble.com> <1309632666024-3640966.post@n4.nabble.com> <4E0F6AD6.50303@statistik.tu-dortmund.de> Message-ID: <1309877252072-3646160.post@n4.nabble.com> Thanks, Uwe, for sending me this the second time. I send my responses through nabble. So #1 does not seem to be an option; #2 I sometimes forget. Regards, Daniel Uwe Ligges-3 wrote: > > On 02.07.2011 20:51, Daniel Malter wrote: >> You can just tell the function to create 1000 random numbers. See ?runif >> for >> the specifics. 
The arguments are n, min, and max. 'n' is the one you are >> looking for. > > Thanks for providing help on R-help, but for the future please > > - respond to the OP rather than to the list only > > - cite the OP's message so that anybody else on this mailing list > understand the context of your message. > > Best, > Uwe Ligges > > > > >> Da. >> >> -- >> View this message in context: >> http://r.789695.n4.nabble.com/Repeating-a-function-in-R-tp3640508p3640966.html >> Sent from the R help mailing list archive at Nabble.com. >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- View this message in context: http://r.789695.n4.nabble.com/Repeating-a-function-in-R-tp3640508p3646160.html Sent from the R help mailing list archive at Nabble.com. 
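For the runif() advice quoted in the message above, a minimal sketch of what "tell the function to create 1000 random numbers" looks like. The bounds 0 and 1 are placeholders, since the original poster's function and range are not shown in this part of the thread.

```r
set.seed(42)                             # only so the sketch is reproducible
x <- runif(n = 1000, min = 0, max = 1)   # 1000 draws in one call, no loop needed
length(x)
range(x)
```

The n argument vectorises the draw, so repeating a single-value call 1000 times in a loop is not necessary.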
From vikas.bansal at kcl.ac.uk Tue Jul 5 16:51:09 2011 From: vikas.bansal at kcl.ac.uk (Bansal, Vikas) Date: Tue, 5 Jul 2011 15:51:09 +0100 Subject: [R] For help in R coding In-Reply-To: References: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE0@KCL-MAIL01.kclad.ds.kcl.ac.uk> <813856BE-5DD3-4708-B1FA-2611EBAD8C61@comcast.net> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE1@KCL-MAIL01.kclad.ds.kcl.ac.uk> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE3@KCL-MAIL01.kclad.ds.kcl.ac.uk> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE5@KCL-MAIL01.kclad.ds.kcl.ac.uk> <383EF813-24EA-42F3-B514-AB9CF8060BAC@comcast.net> , <676B7003-AFFE-4CF1-9CC8-FB1D6953B761@comcast.net> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE9@KCL-MAIL01.kclad.ds.kcl.ac.uk>, <10E80E91-DE87-431A-8E41-50CDAFB73A4D@comcast.net> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBED@KCL-MAIL01.kclad.ds.kcl.ac.uk>, <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBEF@KCL-MAIL01.kclad.ds.kcl.ac.uk>, Message-ID: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBFF@KCL-MAIL01.kclad.ds.kcl.ac.uk> Dear all, I have one problem and did not find any solution.Please I want your help. 
I have two data frames and I want to concatenate them. The two data frames look like this.

First data frame:

V1 V2        A C G T
10 135344109 0 0 1 0
10 135344110 0 1 0 0
10 135344111 0 0 1 0
10 135344112 0 0 1 0
10 135344113 0 0 1 0
10 135344114 1 0 0 0
10 135344115 1 0 0 0
10 135344116 0 0 0 1
10 135344117 0 1 0 0
10 135344118 0 0 0 1

Second data frame:

V1 V2        A C G T
10 135344111 1 0 1 0
10 135344113 0 0 1 0
10 135344109 0 3 1 0
10 135344114 1 0 0 0
10 145344115 1 0 0 0
10 135344116 1 0 0 1
10 132344117 0 1 0 0
10 135344118 0 0 0 1
10 135344110 0 1 0 0

Now I have to create a new data frame that inserts columns 3, 4, 5 and 6 of the second data frame into the first data frame wherever the value in the second column (V2) is the same in both data frames. So the output (new data frame) should be:

V1 V2        A C G T A C G T
10 135344109 0 0 1 0 0 3 1 0
10 135344110 0 1 0 0 0 1 0 0
10 135344111 0 0 1 0 1 0 1 0
10 135344113 0 0 1 0 0 0 1 0
10 135344114 1 0 0 0 1 0 0 0
10 135344116 0 0 0 1 1 0 0 1
10 135344118 0 0 0 1 0 0 0 1

If you look at the output, the second column values are:

V2
135344109
135344110
135344111
135344113
135344114
135344116
135344118

These values are common to both input data frames.

Thanking you,
Warm Regards
Vikas Bansal
Msc Bioinformatics
Kings College London
________________________________________
From: David Winsemius [dwinsemius at comcast.net]
Sent: Monday, July 04, 2011 12:11 AM
To: Bansal, Vikas
Cc: Dennis Murphy; r-help at r-project.org
Subject: Re: [R] For help in R coding

On Jul 3, 2011, at 6:10 PM, Bansal, Vikas wrote:

>
> ________________________________________
> From: David Winsemius [dwinsemius at comcast.net]
> Sent: Sunday, July 03, 2011 7:08 PM
>>
>
> the code is same i just want to add a condition so that it should
> check that if in column 3, the character is A then make number of A
> equal to total number of . and ,
>
> Should I explain better or can you please tell me which thing is not
> clear?

My second posting today had a solution.

>
>>
> --
> David.
>> >> >> >> Can you please help me how to use this if condition in your coding >> or we can also do it by using some other condition rather than if >> condition? >> > David Winsemius, MD West Hartford, CT From Thierry.ONKELINX at inbo.be Tue Jul 5 16:53:41 2011 From: Thierry.ONKELINX at inbo.be (ONKELINX, Thierry) Date: Tue, 5 Jul 2011 14:53:41 +0000 Subject: [R] For help in R coding In-Reply-To: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBFF@KCL-MAIL01.kclad.ds.kcl.ac.uk> References: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE0@KCL-MAIL01.kclad.ds.kcl.ac.uk> <813856BE-5DD3-4708-B1FA-2611EBAD8C61@comcast.net> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE1@KCL-MAIL01.kclad.ds.kcl.ac.uk> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE3@KCL-MAIL01.kclad.ds.kcl.ac.uk> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE5@KCL-MAIL01.kclad.ds.kcl.ac.uk> <383EF813-24EA-42F3-B514-AB9CF8060BAC@comcast.net> , <676B7003-AFFE-4CF1-9CC8-FB1D6953B761@comcast.net> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE9@KCL-MAIL01.kclad.ds.kcl.ac.uk>, <10E80E91-DE87-431A-8E41-50CDAFB73A4D@comcast.net> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBED@KCL-MAIL01.kclad.ds.kcl.ac.uk>, <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBEF@KCL-MAIL01.kclad.ds.kcl.ac.uk>, <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBFF@KCL-MAIL01.kclad.ds.kcl.ac.uk> Message-ID: Dear Vikas, Have at look at ?merge() Best regards, Thierry > -----Oorspronkelijk bericht----- > Van: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] > Namens Bansal, Vikas > Verzonden: dinsdag 5 juli 2011 16:51 > Aan: David Winsemius > CC: r-help at r-project.org > Onderwerp: Re: [R] For help in R coding > > Dear all, > > I have one problem and did not find any solution.Please I want your help. 
> > I have two data frames and I want to concatenate them.But the thing is- > > two data frames are like this- > > V1 V2 A C G T > 10 135344109 0 0 1 0 > 10 135344110 0 1 0 0 > 10 135344111 0 0 1 0 > 10 135344112 0 0 1 0 > 10 135344113 0 0 1 0 > 10 135344114 1 0 0 0 > 10 135344115 1 0 0 0 > 10 135344116 0 0 0 1 > 10 135344117 0 1 0 0 > 10 135344118 0 0 0 1 > > second data frame- > > V1 V2 A C G T > 10 135344111 1 0 1 0 > 10 135344113 0 0 1 0 > 10 135344109 0 3 1 0 > 10 135344114 1 0 0 0 > 10 145344115 1 0 0 0 > 10 135344116 1 0 0 1 > 10 132344117 0 1 0 0 > 10 135344118 0 0 0 1 > 10 135344110 0 1 0 0 > > now i have to create a new data frame which has insert column 3,4,5 and 6 of > second data frame in first data frame if the value in second column is same in > both the data frames (values in V2 column).So the output(new data frame) > should be- > > V1 V2 A C G T A C G T > 10 135344109 0 0 1 0 0 3 1 0 > 10 135344110 0 1 0 0 0 1 0 0 > 10 135344111 0 0 1 0 1 0 1 0 > 10 135344113 0 0 1 0 0 0 1 0 > 10 135344114 1 0 0 0 1 0 0 0 > 10 135344116 0 0 0 1 1 0 0 1 > 10 135344118 0 0 0 1 0 0 0 1 > > I f you see the output, second column values- > > V2 > 135344109 > 135344110 > 135344111 > 135344113 > 135344114 > 135344116 > 135344118 > > these values are common in both input dataframes. 
> > > > > > > Thanking you, > Warm Regards > Vikas Bansal > Msc Bioinformatics > Kings College London > ________________________________________ > From: David Winsemius [dwinsemius at comcast.net] > Sent: Monday, July 04, 2011 12:11 AM > To: Bansal, Vikas > Cc: Dennis Murphy; r-help at r-project.org > Subject: Re: [R] For help in R coding > > On Jul 3, 2011, at 6:10 PM, Bansal, Vikas wrote: > > > > > ________________________________________ > > From: David Winsemius [dwinsemius at comcast.net] > > Sent: Sunday, July 03, 2011 7:08 PM > > >> > > > > the code is same i just want to add a condition so that it should > > check that if in column 3, the character is A then make number of A > > equal to total number of . and , > > > > Should I explain better or can you please tell me which thing is not > > clear? > > My second posting today had a solution. > > > > > >> > > -- > > David. > >> > >> > >> > >> Can you please help me how to use this if condition in your coding or > >> we can also do it by using some other condition rather than if > >> condition? > >> > > > > > David Winsemius, MD > West Hartford, CT > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From ligges at statistik.tu-dortmund.de Tue Jul 5 16:56:55 2011 From: ligges at statistik.tu-dortmund.de (Uwe Ligges) Date: Tue, 05 Jul 2011 16:56:55 +0200 Subject: [R] condlogic.ff In-Reply-To: References: Message-ID: <4E132637.3090001@statistik.tu-dortmund.de> On 05.07.2011 12:20, Stat Consult wrote: > Dear All > How can I Recompile "condlogic.ff " in "LogicReg" package for > fitting a conditional logistic model? 
Although your mail address suggests you should know yourself, please read the manual "R Installation and Administration" that includes a description in how to install packages from (changed) sources. Best, Uwe Ligges > Best Regards, > Leila > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From ligges at statistik.tu-dortmund.de Tue Jul 5 17:01:42 2011 From: ligges at statistik.tu-dortmund.de (Uwe Ligges) Date: Tue, 05 Jul 2011 17:01:42 +0200 Subject: [R] Repeating a function in R In-Reply-To: <1309877252072-3646160.post@n4.nabble.com> References: <1309613295639-3640508.post@n4.nabble.com> <1309632666024-3640966.post@n4.nabble.com> <4E0F6AD6.50303@statistik.tu-dortmund.de> <1309877252072-3646160.post@n4.nabble.com> Message-ID: <4E132756.5070702@statistik.tu-dortmund.de> On 05.07.2011 16:47, Daniel Malter wrote: > Thanks, Uwe, for sending me this the second time. Daniel, I do not track names, sorry for posting twice. > I send my responses through nabble. Great, so you found the main problem already. > So #1 does not seem to be an option; It is an option: if Nabble can't do it (and I really want to know if it can), just use a more appropriate tool. > #2 I sometimes forget. Time to choose a tool that offers this by default so you can't forget. Best wishes, Uwe > Regards, > Daniel > > > Uwe Ligges-3 wrote: >> >> On 02.07.2011 20:51, Daniel Malter wrote: >>> You can just tell the function to create 1000 random numbers. See ?runif >>> for >>> the specifics. The arguments are n, min, and max. 'n' is the one you are >>> looking for. 
>> >> Thanks for providing help on R-help, but for the future please >> >> - respond to the OP rather than to the list only >> >> - cite the OP's message so that anybody else on this mailing list >> understand the context of your message. >> >> Best, >> Uwe Ligges >> >> >> >> >>> Da. >>> >>> -- >>> View this message in context: >>> http://r.789695.n4.nabble.com/Repeating-a-function-in-R-tp3640508p3640966.html >>> Sent from the R help mailing list archive at Nabble.com. >>> >>> ______________________________________________ >>> R-help at r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide >>> http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > -- > View this message in context: http://r.789695.n4.nabble.com/Repeating-a-function-in-R-tp3640508p3646160.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From bhh at xs4all.nl Tue Jul 5 17:04:02 2011 From: bhh at xs4all.nl (Berend Hasselman) Date: Tue, 5 Jul 2011 08:04:02 -0700 (PDT) Subject: [R] problem in reading a sequence file In-Reply-To: References: Message-ID: <1309878242274-3646214.post@n4.nabble.com> albert coster wrote: > > Dear all, > > I have a file with some sequence (seq.txt). I am writting following code > and > getting error! Can please help me? 
> > > seqfile<-read.table(file="seq.txt") > Warning message: > In read.table(file = "seq.txt") : > incomplete final line found by readTableHeader on 'seq.txt' > The message is a warning not an error. It means that the last line of your file does not end with a line-ending sequence. Such as carriage return/linefeed (\r\n) or a just a linefeed (\n). In your editor you either need to press at the end of the last line or tell the editor to terminate the last line with a line-endng character. Berend -- View this message in context: http://r.789695.n4.nabble.com/problem-in-reading-a-sequence-file-tp3645717p3646214.html Sent from the R help mailing list archive at Nabble.com. From vikas.bansal at kcl.ac.uk Tue Jul 5 17:02:01 2011 From: vikas.bansal at kcl.ac.uk (Bansal, Vikas) Date: Tue, 5 Jul 2011 16:02:01 +0100 Subject: [R] For help in R coding In-Reply-To: References: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE0@KCL-MAIL01.kclad.ds.kcl.ac.uk> <813856BE-5DD3-4708-B1FA-2611EBAD8C61@comcast.net> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE1@KCL-MAIL01.kclad.ds.kcl.ac.uk> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE3@KCL-MAIL01.kclad.ds.kcl.ac.uk> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE5@KCL-MAIL01.kclad.ds.kcl.ac.uk> <383EF813-24EA-42F3-B514-AB9CF8060BAC@comcast.net> , <676B7003-AFFE-4CF1-9CC8-FB1D6953B761@comcast.net> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE9@KCL-MAIL01.kclad.ds.kcl.ac.uk>, <10E80E91-DE87-431A-8E41-50CDAFB73A4D@comcast.net> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBED@KCL-MAIL01.kclad.ds.kcl.ac.uk>, <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBEF@KCL-MAIL01.kclad.ds.kcl.ac.uk>, <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBFF@KCL-MAIL01.kclad.ds.kcl.ac.uk>, Message-ID: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEC01@KCL-MAIL01.kclad.ds.kcl.ac.uk> Yes sir.I have already looked at merge() but as I am new to R,I was not able to understand the argument that how should i create a code for the logic i gave in previous mail . 
Thanking you, Warm Regards Vikas Bansal Msc Bioinformatics Kings College London ________________________________________ From: ONKELINX, Thierry [Thierry.ONKELINX at inbo.be] Sent: Tuesday, July 05, 2011 3:53 PM To: Bansal, Vikas; David Winsemius Cc: r-help at r-project.org Subject: RE: [R] For help in R coding Dear Vikas, Have at look at ?merge() Best regards, Thierry > -----Oorspronkelijk bericht----- > Van: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] > Namens Bansal, Vikas > Verzonden: dinsdag 5 juli 2011 16:51 > Aan: David Winsemius > CC: r-help at r-project.org > Onderwerp: Re: [R] For help in R coding > > Dear all, > > I have one problem and did not find any solution.Please I want your help. > > I have two data frames and I want to concatenate them.But the thing is- > > two data frames are like this- > > V1 V2 A C G T > 10 135344109 0 0 1 0 > 10 135344110 0 1 0 0 > 10 135344111 0 0 1 0 > 10 135344112 0 0 1 0 > 10 135344113 0 0 1 0 > 10 135344114 1 0 0 0 > 10 135344115 1 0 0 0 > 10 135344116 0 0 0 1 > 10 135344117 0 1 0 0 > 10 135344118 0 0 0 1 > > second data frame- > > V1 V2 A C G T > 10 135344111 1 0 1 0 > 10 135344113 0 0 1 0 > 10 135344109 0 3 1 0 > 10 135344114 1 0 0 0 > 10 145344115 1 0 0 0 > 10 135344116 1 0 0 1 > 10 132344117 0 1 0 0 > 10 135344118 0 0 0 1 > 10 135344110 0 1 0 0 > > now i have to create a new data frame which has insert column 3,4,5 and 6 of > second data frame in first data frame if the value in second column is same in > both the data frames (values in V2 column).So the output(new data frame) > should be- > > V1 V2 A C G T A C G T > 10 135344109 0 0 1 0 0 3 1 0 > 10 135344110 0 1 0 0 0 1 0 0 > 10 135344111 0 0 1 0 1 0 1 0 > 10 135344113 0 0 1 0 0 0 1 0 > 10 135344114 1 0 0 0 1 0 0 0 > 10 135344116 0 0 0 1 1 0 0 1 > 10 135344118 0 0 0 1 0 0 0 1 > > I f you see the output, second column values- > > V2 > 135344109 > 135344110 > 135344111 > 135344113 > 135344114 > 135344116 > 135344118 > > these values are 
common in both input dataframes. > > > > > > > Thanking you, > Warm Regards > Vikas Bansal > Msc Bioinformatics > Kings College London > ________________________________________ > From: David Winsemius [dwinsemius at comcast.net] > Sent: Monday, July 04, 2011 12:11 AM > To: Bansal, Vikas > Cc: Dennis Murphy; r-help at r-project.org > Subject: Re: [R] For help in R coding > > On Jul 3, 2011, at 6:10 PM, Bansal, Vikas wrote: > > > > > ________________________________________ > > From: David Winsemius [dwinsemius at comcast.net] > > Sent: Sunday, July 03, 2011 7:08 PM > > >> > > > > the code is same i just want to add a condition so that it should > > check that if in column 3, the character is A then make number of A > > equal to total number of . and , > > > > Should I explain better or can you please tell me which thing is not > > clear? > > My second posting today had a solution. > > > > > >> > > -- > > David. > >> > >> > >> > >> Can you please help me how to use this if condition in your coding or > >> we can also do it by using some other condition rather than if > >> condition? > >> > > > > > David Winsemius, MD > West Hartford, CT > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From sarah.goslee at gmail.com Tue Jul 5 17:13:37 2011 From: sarah.goslee at gmail.com (Sarah Goslee) Date: Tue, 5 Jul 2011 11:13:37 -0400 Subject: [R] if else loop In-Reply-To: References: Message-ID: Hi Gary, A solution in two pieces. First, you need to be able to match the rows of your data frame. 
There might be a more elegant way to do it, but I couldn't think of one
that gave the option of ordering or not, so I wrote a function:

isin <- function(dd, tomatch, ordered=TRUE) {
   # find tomatch in rows of dd
   # order can be important or unimportant
   if(length(tomatch) != ncol(dd))
      stop("tomatch must have the same number of elements as dd has columns.\n")
   if(ordered) {
      rowmatch <- apply(dd, 1, function(x){sum(x == tomatch) == length(tomatch)})
   } else {
      rowmatch <- apply(dd, 1, function(x)all(tomatch %in% x))
   }
   rowmatch
}

# test isin()
> set.seed(1234)
> dd <- data.frame(a = sample(1:20, 100, replace=TRUE), b = sample(5:24, 100, replace=TRUE))
> # isin() returns a row index so you can do something more than just return
> # something that looks just like the input, such as match the first two columns
> # but return entire rows
> dd[isin(dd, c(1, 19), ordered=FALSE),]
   a  b
73 1 19
98 1 19
> dd[isin(dd, c(10, 13), ordered=TRUE),]
[1] a b
<0 rows> (or 0-length row.names)
> dd[isin(dd, c(10, 13), ordered=FALSE),]
   a  b
3 13 10

On Tue, Jul 5, 2011 at 10:28 AM, gary engstrom wrote:
> Dear R help
>
> I was hoping you might be able to show me how to write a loop function that
> would accomplish this task.
>
> # code piece I am looking for
> if(subset(dd,c(1,23,ordered=F))is found))( print subset)
> else( continue evaluating subsets)
> subset(dd,isin(dd,c(1,23), ordered = FALSE))
> subset(dd,isin(dd,c(3,23),ordered=F))
> subset(dd,isin(dd,c(4,11),ordered=F))
> subset(dd,isin(dd,c(7,15),ordered=F))

Part II: I'm not entirely sure what you're trying to do. If c(1,23) is
not matched, do you want ALL of them, or should this be sequential? And
why not just check for all of them, rather than making it conditional?
Anyway, this should be enough to get you going:

if(nrow(dd[isin(dd, c(1, 23), ordered=FALSE),]) > 0) {
   dd[isin(dd, c(1, 23), ordered=FALSE),]
} else {
   dd[isin(dd, c(3, 23), ordered=FALSE), ]
}

# or, more elegantly:
> all.matches <- list(c(1, 23), c(3, 23), c(4, 11), c(7, 15))
> lapply(all.matches, function(x)dd[isin(dd, x, ordered=FALSE), ])
[[1]]
   a  b
24 1 23

[[2]]
[1] a b
<0 rows> (or 0-length row.names)

[[3]]
   a  b
89 4 11

[[4]]
[1] a b
<0 rows> (or 0-length row.names)

--
Sarah Goslee
http://www.functionaldiversity.org

From p.pagel at wzw.tum.de Tue Jul 5 17:21:51 2011
From: p.pagel at wzw.tum.de (Philipp Pagel)
Date: Tue, 5 Jul 2011 17:21:51 +0200
Subject: [R] problem in reading a sequence file
In-Reply-To:
References: <20110705130546.GA12629@arronax.matrix.invalid>
Message-ID: <20110705152151.GA14677@arronax.matrix.invalid>

On Tue, Jul 05, 2011 at 04:53:32PM +0200, albert coster wrote:

I'm taking this back to the list so others can follow up.

> Yes, the file consists of one string (sequence) per line.
>
> The file's format is the following:
>
> Sequence
> NNNNNNNNNNATTAAAGGGC

OK - in that case (and as you want a vector anyway) you can use

scan('seq.txt', what=character())

> > seqfile<-read.table("seq.txt")
> Warning message:
> In read.table("seq.txt") :
>   incomplete final line found by readTableHeader on 'seq.txt'

OK - that means you don't have a newline ('\n') at the end of your
sequence file and read.table is warning you about that.

> > str(seqfile)
> 'data.frame':   2 obs. of  1 variable:
>  $ V1: Factor w/ 2 levels "NNNNNNNNNNATTAAAGGGC",..: 2 1

This indicates that there are at least two lines in the file (so you
got two levels in the factor). So I would guess there is an empty line
before your sequence or you really have the word 'Sequence' on line 1.

For sequence data it probably does not make much sense to let R
convert to factor, and a character column would be preferred.
This can be accomplished by using one of the options 'as.is',
'stringsAsFactors' or 'colClasses'.

If you use scan you'll need to get rid of the extra line first. If you
stick with read.table you can specify the first line as your header
line using the header=TRUE option. Now you can address column
'Sequence' as such.

Example:

> dat <- read.table('seq.txt', as.is=TRUE, header=TRUE)
> dat$Sequence
[1] "NNNNNNNNNNATTAAAGGGC"
> dat[, 'Sequence']
[1] "NNNNNNNNNNATTAAAGGGC"
> str(dat)
'data.frame':   1 obs. of  1 variable:
 $ Sequence: chr "NNNNNNNNNNATTAAAGGGC"

cu
        Philipp

--
Dr. Philipp Pagel
Lehrstuhl für Genomorientierte Bioinformatik
Technische Universität München
Wissenschaftszentrum Weihenstephan
85350 Freising, Germany
http://webclu.bio.wzw.tum.de/~pagel/

From amwootte at ncsu.edu Tue Jul 5 17:49:28 2011
From: amwootte at ncsu.edu (Adrienne Wootten)
Date: Tue, 5 Jul 2011 11:49:28 -0400
Subject: [R] For help in R coding
In-Reply-To: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEC01@KCL-MAIL01.kclad.ds.kcl.ac.uk>
References: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE0@KCL-MAIL01.kclad.ds.kcl.ac.uk> <813856BE-5DD3-4708-B1FA-2611EBAD8C61@comcast.net> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE1@KCL-MAIL01.kclad.ds.kcl.ac.uk> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE3@KCL-MAIL01.kclad.ds.kcl.ac.uk> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE5@KCL-MAIL01.kclad.ds.kcl.ac.uk> <383EF813-24EA-42F3-B514-AB9CF8060BAC@comcast.net> <676B7003-AFFE-4CF1-9CC8-FB1D6953B761@comcast.net> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE9@KCL-MAIL01.kclad.ds.kcl.ac.uk> <10E80E91-DE87-431A-8E41-50CDAFB73A4D@comcast.net> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBED@KCL-MAIL01.kclad.ds.kcl.ac.uk> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBEF@KCL-MAIL01.kclad.ds.kcl.ac.uk> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBFF@KCL-MAIL01.kclad.ds.kcl.ac.uk> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEC01@KCL-MAIL01.kclad.ds.kcl.ac.uk>
Message-ID:

If I understand correctly, you want to keep the rows from
each table which have common values in the second column. In that case,
merge will work for this, as in this example.

Say you have these data frames:

> frame1
  x  A  C  G
  1  0 -1  2
  2 -1  0 -1
  3  0  0 -1
  4  1  1 -1
  5  0  1  0
  6 -1  1 -1
  7  0 -1  1
  8  0  0 -1
  9  1 -3  0
 10  0  0  0

> frame2
  x  A  C  G
  2  0  1  0
  4 -1  0 -1
  6  1  0  0
  8 -1 -1 -1
 10 -1  0 -1
 12  1 -1  0
 14  0 -2  0
 16  0 -2  0
 18  1  0 -1
 20  0 -1  2

and you want to combine these tables, keeping only the rows whose value
of column x appears in both, to get something like this:

  x  A  C  G  A  C  G
  2 -1  0 -1  0  1  0
  4  1  1 -1 -1  0 -1
  6 -1  1 -1  1  0  0
  8  0  0 -1 -1 -1 -1
 10  0  0  0 -1  0 -1

which shows that the common values of x between the two tables are
2, 4, 6, 8, and 10. The merge command to do this is:

> merge(frame1,frame2,by="x")
  x A.x C.x G.x A.y C.y G.y
  2  -1   0  -1   0   1   0
  4   1   1  -1  -1   0  -1
  6  -1   1  -1   1   0   0
  8   0   0  -1  -1  -1  -1
 10   0   0   0  -1   0  -1

which is the same as the desired output from before except for the
column names. When there are common names between the two data frames
being merged, R will add an extension to the column name to make it
easy to determine which columns came from which data frame. In this
case, the columns with the .x extension come from the first data frame
(frame1 in this example) and the columns with the .y extension come
from the second data frame (frame2 in this example).

I hope this helps!

Adrienne

On Tue, Jul 5, 2011 at 11:02 AM, Bansal, Vikas wrote:
>
> Yes sir. I have already looked at merge(),
> but as I am new to R, I was not able to understand the arguments or how I should write code for the logic I gave in my previous mail.
--
Adrienne Wootten
Graduate Research Assistant
State Climate Office of North Carolina
Department of Marine, Earth and Atmospheric Sciences
North Carolina State University

From vikas.bansal at kcl.ac.uk Tue Jul 5 17:48:48 2011
From: vikas.bansal at kcl.ac.uk (Bansal, Vikas)
Date: Tue, 5 Jul 2011 16:48:48 +0100
Subject: [R] For help in R coding
In-Reply-To:
References: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE0@KCL-MAIL01.kclad.ds.kcl.ac.uk> <813856BE-5DD3-4708-B1FA-2611EBAD8C61@comcast.net> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE1@KCL-MAIL01.kclad.ds.kcl.ac.uk> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE3@KCL-MAIL01.kclad.ds.kcl.ac.uk> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE5@KCL-MAIL01.kclad.ds.kcl.ac.uk> <383EF813-24EA-42F3-B514-AB9CF8060BAC@comcast.net> , <676B7003-AFFE-4CF1-9CC8-FB1D6953B761@comcast.net> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE9@KCL-MAIL01.kclad.ds.kcl.ac.uk>, <10E80E91-DE87-431A-8E41-50CDAFB73A4D@comcast.net>
<5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBED@KCL-MAIL01.kclad.ds.kcl.ac.uk>, <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBEF@KCL-MAIL01.kclad.ds.kcl.ac.uk>, <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBFF@KCL-MAIL01.kclad.ds.kcl.ac.uk>, Message-ID: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEC02@KCL-MAIL01.kclad.ds.kcl.ac.uk> Hi I sorted out a little bit- I am using this code- vi=(m1 <- merge(blaa, daf, by.x = "V2", by.y = "V2")) (m2 <- merge(daf, blaa, by.x = "V2", by.y = "V2")) results are also coming fine. but i dont know i got another code- stopifnot(as.character(m1[,1]) == as.character(m2[,1]), all.equal(m1[, -1], m2[, -1][ names(m1)[-1] ]), dim(merge(m1, m2, by = integer(0))) == c(36, 10)) Can you tell me what this code will do???? Thanking you, Warm Regards Vikas Bansal Msc Bioinformatics Kings College London ________________________________________ From: ONKELINX, Thierry [Thierry.ONKELINX at inbo.be] Sent: Tuesday, July 05, 2011 3:53 PM To: Bansal, Vikas; David Winsemius Cc: r-help at r-project.org Subject: RE: [R] For help in R coding Dear Vikas, Have at look at ?merge() Best regards, Thierry > -----Oorspronkelijk bericht----- > Van: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] > Namens Bansal, Vikas > Verzonden: dinsdag 5 juli 2011 16:51 > Aan: David Winsemius > CC: r-help at r-project.org > Onderwerp: Re: [R] For help in R coding > > Dear all, > > I have one problem and did not find any solution.Please I want your help. 
> > > > > > > Thanking you, > Warm Regards > Vikas Bansal > Msc Bioinformatics > Kings College London > ________________________________________ > From: David Winsemius [dwinsemius at comcast.net] > Sent: Monday, July 04, 2011 12:11 AM > To: Bansal, Vikas > Cc: Dennis Murphy; r-help at r-project.org > Subject: Re: [R] For help in R coding > > On Jul 3, 2011, at 6:10 PM, Bansal, Vikas wrote: > > > > > ________________________________________ > > From: David Winsemius [dwinsemius at comcast.net] > > Sent: Sunday, July 03, 2011 7:08 PM > > >> > > > > the code is same i just want to add a condition so that it should > > check that if in column 3, the character is A then make number of A > > equal to total number of . and , > > > > Should I explain better or can you please tell me which thing is not > > clear? > > My second posting today had a solution. > > > > > >> > > -- > > David. > >> > >> > >> > >> Can you please help me how to use this if condition in your coding or > >> we can also do it by using some other condition rather than if > >> condition? > >> > > > > > David Winsemius, MD > West Hartford, CT > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From dwinsemius at comcast.net Tue Jul 5 18:05:36 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Tue, 5 Jul 2011 12:05:36 -0400 Subject: [R] Tables and merge In-Reply-To: <8ACAAC0A4F0B43568802DAE967A4B6EE@ccePC> References: <8ACAAC0A4F0B43568802DAE967A4B6EE@ccePC> Message-ID: <984DA41F-56A5-49EE-BB6D-1069082AC4C5@comcast.net> On Jul 6, 2011, at 7:36 AM, Silvano wrote: > ----- Original Message ----- From: "Silvano" > To: > Sent: Thursday, June 30, 2011 9:07 AM > Subject: Tables and merge > >> I have 21 files which is common variable CODE. >> Each file refers to a question. 
>>
>> I would like to join the 21 files into one, to construct
>> tables for each question by CODE.
>>
>> I tried the command (8 files only):
>>
>> require(foreign)
>> q1 = read.epiinfo('Dados/Q1.rec')
>> q2 = read.epiinfo('Dados/Q2.rec')
>> q3 = read.epiinfo('Dados/Q3.rec')
>> q4 = read.epiinfo('Dados/Q4.rec')
>> q5 = read.epiinfo('Dados/Q5.rec')
>> q6 = read.epiinfo('Dados/Q6.rec')
>> q7 = read.epiinfo('Dados/Q7.rec')
>> q8 = read.epiinfo('Dados/Q8.rec')
>>
>> juntos = merge(q1,q2,q3,q4,q5,q6,q7,q8)
>>
>> But it didn't work. Any suggestions?

Suggestion #1: Read the Posting Guide. In there you are advised to
report the verbatim text from error messages. Reading error messages
is often informative.

Suggestion #2: Report the results of `str` on all of those "q"
objects. We need to see whether there are the necessary common column
names that would support a merge operation.

Suggestion #3: Read the ?merge page and pay particular attention to
the number ("two") in the title. Consider this possibility after
further reading of ?merge and a bit of testing.

merge(x=q1, y=list(q2,q3,q4,q5,q6,q7,q8) )

# Your error occurred because of positional matching. The q3 object is
being assigned to the third argument, "by=", and that is what your
unreported error message was telling you.

That construction seemed to work without error on a test I did with a
slight modification of the first example on ?merge. After making the
author column have the same name = `name`, I also got success with:

do.call("merge", list(x=authors, y=list(books, books)))

The non-do.call simplification above was not entirely predictably
correct (to me anyway), since the ?merge page does not say that a list
object holding dataframes would be an acceptable "y" argument. But I
see that as.data.frame(list(books, books)) does produce a data.frame,
and coercion with as.data.frame on that list object is probably what
happened in the merge() call.
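Since merge() joins two data frames per call, another way to combine many tables on a shared key is to fold merge() over a list with Reduce(). What follows is only a minimal sketch: it assumes every table has a column named CODE (taken from the question), and it uses three small made-up data frames in place of the read.epiinfo() results. The answer columns Q1, Q2 and Q3 are hypothetical names.

```r
# Stand-ins for q1, q2, q3 (the real ones would come from read.epiinfo()).
q1 <- data.frame(CODE = c(1, 2, 3), Q1 = c("a", "b", "c"))
q2 <- data.frame(CODE = c(3, 1, 2), Q2 = c("z", "x", "y"))
q3 <- data.frame(CODE = c(2, 3, 1), Q3 = c(20, 30, 10))

# merge() is applied pairwise: first q1 with q2, then that result with q3.
juntos <- Reduce(function(x, y) merge(x, y, by = "CODE"), list(q1, q2, q3))
juntos
```

By default only the CODE values present in every table are kept; passing all = TRUE inside the merge() call keeps the others and fills the gaps with NA.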
-- David Winsemius, MD West Hartford, CT From rvalliant at survey.umd.edu Tue Jul 5 18:14:16 2011 From: rvalliant at survey.umd.edu (Richard Valliant) Date: Tue, 05 Jul 2011 12:14:16 -0400 Subject: [R] RWinEdt problem Message-ID: I received this reply offline from William Dietrich and am posting it in case it might help someone else. His fix did get RWinEdt going, but, in my case, the tabs that should have the open file names are blank. This fix worked with WinEdt 5.5 but not v.6. In the end, I decided to switch to Tinn-R. rv >>> "William Dieterich" < wdieterich at npipm.com > 06/30 11:11 PM >>> I had the same problem on the same system with RWinEdt. I finally resolved it by accident by launching R from the start menu (instead of the desktop shortcut). If I launched R with the desktop shortcut icon (as administrator) I would get the error you get when I attempted to load RWinEdt. But everything works fine if I launch R from the startup menu (as administrator). I think it has something to do with code attached to the desktop shortcut. When you install R it asks about creating a startup menu item - I think the problem is related to that. Sorry for the offline response but I am not currently subscribed to R Help. I couldn't resist giving a tip because this problem was a real headache for me too. William Dieterich, Ph.D. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Director of Research Northpointe Inc. Golden Office: 303.216.9458 Cell: 303.257.7128 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -----Original Message----- From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Richard Valliant Sent: Thursday, June 30, 2011 1:05 AM To: r-help at r-project.org Subject: [R] RWinEdt I have a problem using RWinEdt 1.8.2 in Windows 7 Professional (64 bit). System/software info: R version 2.13.0 (2011-04-13) Copyright (C) 2011 The R Foundation for Statistical Computing ISBN 3-900051-07-0 Platform: x86_64-pc-mingw32/x64 (64-bit) WinEdt Build: 20071003 (v. 
5.5) After installing the R package and attempting to load I get: > library(RWinEdt) Warning message: In shell(paste("\"\"", .gW$InstallRoot, " \\WinEdt.exe\ " -C=\"R-WinEdt\" -E=", : '""C:\Program Files (x86)\WinEdt Team\WinEdt\WinEdt.exe" -C="R-WinEdt" -E="C:\Users\rvalliant\AppData\Roaming\WinEdt\R.ini""' execution failed with error code 1 > The WinEdt window does not open. I can open it manually (since the package installation created a desktop shortcut ("RWinEdt"). If a line of R code is highlighted in RWinEdt and sent to the R Console with Alt+p, the focus shifts to R console but nothing is copied. This has come up before in a message from John Seers on 2 Mar 2011. Uwe suggested this: "One installing RWinEdt the first time, please run R with Administrator privileges (right click to do so). Then installation should work smoothly with WinEdt < 6.0." I'm running WinEdt 5.5. I followed Uwe's suggestion but get the message above. Any suggestions? From vikas.bansal at kcl.ac.uk Tue Jul 5 18:16:59 2011 From: vikas.bansal at kcl.ac.uk (Bansal, Vikas) Date: Tue, 5 Jul 2011 17:16:59 +0100 Subject: [R] For help in R coding In-Reply-To: References: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE0@KCL-MAIL01.kclad.ds.kcl.ac.uk> <813856BE-5DD3-4708-B1FA-2611EBAD8C61@comcast.net> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE1@KCL-MAIL01.kclad.ds.kcl.ac.uk> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE3@KCL-MAIL01.kclad.ds.kcl.ac.uk> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE5@KCL-MAIL01.kclad.ds.kcl.ac.uk> <383EF813-24EA-42F3-B514-AB9CF8060BAC@comcast.net> <676B7003-AFFE-4CF1-9CC8-FB1D6953B761@comcast.net> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBE9@KCL-MAIL01.kclad.ds.kcl.ac.uk> <10E80E91-DE87-431A-8E41-50CDAFB73A4D@comcast.net> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBED@KCL-MAIL01.kclad.ds.kcl.ac.uk> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBEF@KCL-MAIL01.kclad.ds.kcl.ac.uk> <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEBFF@KCL-MAIL01.kclad.ds.kcl.ac.uk> 
<5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEC01@KCL-MAIL01.kclad.ds.kcl.ac.uk>, Message-ID: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEC03@KCL-MAIL01.kclad.ds.kcl.ac.uk> Yes.this is perfect.but can i use the and (&) operator to check if two column have same value.Like- merge(frame1,frame2,by="x&G") according to your data given in previous mail. so the output should be- x G A.x C.x A.y C.y 4 -1 1 1 -1 0 8 -1 0 0 -1 -1 Thanking you, Warm Regards Vikas Bansal Msc Bioinformatics Kings College London ________________________________________ From: wootten.adrienne at gmail.com [wootten.adrienne at gmail.com] On Behalf Of Adrienne Wootten [amwootte at ncsu.edu] Sent: Tuesday, July 05, 2011 4:49 PM To: Bansal, Vikas Cc: r-help at r-project.org Subject: Re: [R] For help in R coding If I understand correctly, you want to keep the rows from each table which have common values in the second column. In which case, merge will work for this, such as in this example. Say you have these data frames: > frame1 x A C G 1 0 -1 2 2 -1 0 -1 3 0 0 -1 4 1 1 -1 5 0 1 0 6 -1 1 -1 7 0 -1 1 8 0 0 -1 9 1 -3 0 10 0 0 0 > frame2 x A C G 2 0 1 0 4 -1 0 -1 6 1 0 0 8 -1 -1 -1 10 -1 0 -1 12 1 -1 0 14 0 -2 0 16 0 -2 0 18 1 0 -1 20 0 -1 2 and you want to combine these tables and keep the rows which have values of column x in common or getting something like this. x A C G A C G 2 -1 0 -1 0 1 0 4 1 1 -1 -1 0 -1 6 -1 1 -1 1 0 0 8 0 0 -1 -1 -1 -1 10 0 0 0 -1 0 -1 which shows that the common values of x between the two tables is 2,4,6,8, and 10. The merge command to do this is: > merge(frame1,frame2,by="x") x A.x C.x G.x A.y C.y G.y 2 -1 0 -1 0 1 0 4 1 1 -1 -1 0 -1 6 -1 1 -1 1 0 0 8 0 0 -1 -1 -1 -1 10 0 0 0 -1 0 -1 which is the same as the desired output from before except for the column names. When there are common names between the two data frames being merged, R will add an extension to the column name to make it easy to determine which columns came from which data frame. 
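On the question at the top of this message: as far as I know, merge() does not interpret an expression such as "x&G" in its by argument. To require a match on more than one column, by takes a character vector of column names. A minimal sketch, with frame1 and frame2 rebuilt by hand from the values in the quoted example:

```r
frame1 <- data.frame(x = 1:10,
                     A = c(0, -1, 0, 1, 0, -1, 0, 0, 1, 0),
                     C = c(-1, 0, 0, 1, 1, 1, -1, 0, -3, 0),
                     G = c(2, -1, -1, -1, 0, -1, 1, -1, 0, 0))
frame2 <- data.frame(x = seq(2L, 20L, by = 2L),
                     A = c(0, -1, 1, -1, -1, 1, 0, 0, 1, 0),
                     C = c(1, 0, 0, -1, 0, -1, -2, -2, 0, -1),
                     G = c(0, -1, 0, -1, -1, 0, 0, 0, -1, 2))

# Keep only the rows where BOTH x and G agree between the two frames.
both <- merge(frame1, frame2, by = c("x", "G"))
both
```

The key columns come first in the result, followed by the remaining columns of each frame with the .x and .y suffixes; with these values only the rows for x = 4 and x = 8 survive.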
> -- Adrienne Wootten Graduate Research Assistant State Climate Office of North Carolina Department of Marine, Earth and Atmospheric Sciences North Carolina State University From tryingtolearnagain at gmail.com Tue Jul 5 17:51:59 2011 From: tryingtolearnagain at gmail.com (Trying To learn again) Date: Tue, 5 Jul 2011 17:51:59 +0200 Subject: [R] Executing a function several time, how to save the output In-Reply-To: References: Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From sds at gnu.org Tue Jul 5 18:53:56 2011 From: sds at gnu.org (Sam Steingold) Date: Tue, 05 Jul 2011 12:53:56 -0400 Subject: [R] hash table access, vector access &c Message-ID: Hi, I am confused by the way the indexing works. I read a table from a csv file like this: ysmd <- read.csv("ysmd.csv",header=TRUE); ysmd.table <- hash(); for (i in 1:length(ysmd$X.stock)) ysmd.table[ysmd$X.stock[i]] <- ysmd[i,]; the first column ("X.stock") is a string (factor): > ysmd$X.stock[[100]] [1] FLO 7757 Levels: A AA AA- AAAAA AAC AACC AACOU AACOW AADR AAI AAME AAN AAON ... ZZZZT when I print ysmd.table, I see the data I expect: ... ZIOP : ZIOP 402600000 3.03 7.85 707694 6.3717 ZIP : ZIP 794900000 23.53 31.5 677046 23.2508 ZIPR : ZIPR 47100000 2.28 3.5 21865 2.4058 ZIV : ZIV -1 12.2987 17.3862 37455 16.6068 ZIXI : ZIXI 254900000 2.1 4.88 905849 3.5146 ... moreover, > ysmd.table[['FLO']] X.stock market.cap X52.week.low X52.week.high X3.month.average.daily.volume 100 FLO 2.984e+09 15.3133 22.37 1021580 X50.day.moving.average.price 100 21.3769 quite correctly. however, > ysmd.table[ysmd$X.stock[[100]]] containing 0 key-value pair(s). NA : NULL so, how do I access the hash table element using non-literal strings? or, how do I convert ysmd$X.stock[[100]] to a string from whatever it is now? thanks! 
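A likely cause of this kind of mismatch is that read.csv() returned the column as a factor, so ysmd$X.stock[[100]] is an integer code with a label attached rather than a character string. The sketch below shows the conversion with as.character(). It makes two assumptions so that it runs on its own: a plain environment stands in for the hash object (the hash package itself is not used here), and a three-row toy table stands in for ysmd.csv.

```r
# Toy stand-in for the table read from ysmd.csv; the AA value is made up.
# X.stock is built as a factor, which is what read.csv() produced by default
# at the time.
ysmd <- data.frame(X.stock = factor(c("AA", "FLO", "ZIP")),
                   market.cap = c(1.7e10, 2.984e9, 7.949e8))

# An environment used as a string-keyed lookup table (stand-in for hash()).
tab <- new.env()
for (i in seq_len(nrow(ysmd))) {
  key <- as.character(ysmd$X.stock[i])   # the level label, not the integer code
  assign(key, ysmd[i, ], envir = tab)
}

k <- ysmd$X.stock[[2]]
is.factor(k)                                   # TRUE: not yet a string
as.character(k)                                # "FLO"
get(as.character(k), envir = tab)$market.cap   # look up by the converted key
```

Alternatively, reading the file with stringsAsFactors = FALSE (or as.is = TRUE) keeps X.stock as character from the start, so no conversion is needed at lookup time.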
-- Sam Steingold (http://sds.podval.org/) on CentOS release 5.6 (Final) X 11.0.60900031 http://www.PetitionOnline.com/tap12009/ http://ffii.org http://iris.org.il http://truepeace.org http://camera.org The dark past once was the bright future. From diggsb at ohsu.edu Tue Jul 5 18:58:12 2011 From: diggsb at ohsu.edu (Brian Diggs) Date: Tue, 5 Jul 2011 09:58:12 -0700 Subject: [R] Fw: volcano plot.r In-Reply-To: <1309834823.73584.YahooMailNeo@web46303.mail.sp1.yahoo.com> References: <1309396445.53106.YahooMailNeo@web46310.mail.sp1.yahoo.com> <1309834823.73584.YahooMailNeo@web46303.mail.sp1.yahoo.com> Message-ID: <4E1342A4.6030908@ohsu.edu> See inline below, On 7/4/2011 8:00 PM, Ungku Akashah wrote: > Hello. > > My name is Akashah. i work at metabolic laboratory. From my study, i > found that volcano plot can help a lot in my section. i already > studied about the volcano plot and get the coding to run in R > software, unfortunately, there is may be something wrong with the > coding. This is because no graph appear, but no error (blue color > text) was shown on the R console. Below is the coding for volcano > plot, i hope anybody can help me to solve the problem. 
> > > > # volcano_plot.r > # > # Author: Amsha Nahid, Jairus Bowne, Gerard Murray > # Purpose: Produces a volcano plot > # > # Input: Data matrix as specified in Data-matrix-format.pdf > # Output: Plots log2(fold change) vs log10(t-test P-value) > # > # Notes: Group value for control must be alphanumerically first > # Script will return an error if there are more than 2 groups > > # > # Load the data matrix > # > # Read in the .csv file > data<-read.csv("file:///Users/nadya/Desktop/praktikal UTM/TASKS1/RT BE EMS 300-399.csv", sep=",", row.names=1, header=TRUE) We don't have your example file, so it is hard to say what is wrong > # Get groups information > groups<-data[,1] > # Get levels for groups > grp_levs<-levels(groups) > if (length(levels(groups))> 2) > print("Number of groups is greater than 2!") else { > > # > # Split the matrix by group > # > new_mats<-c() > for (ii in 1:length(grp_levs)) > new_mats[ii]<-list(data[which(groups==levels(groups)[ii]),]) > > # > # Calculate the means > # > # For each matrix, calculate the averages per column > submeans<-c() > # Preallocate a matrix for the means > means<-matrix( > nrow = 2, > ncol = length(colnames(data[,-1])), > dimnames = list(grp_levs,colnames(data[,-1])) > ) > # Calculate the means for each variable per sample > for (ii in 1:length(new_mats)) > {submeans[ii]<-list(apply(new_mats[[ii]][,-1],2,mean,na.rm=TRUE)) > means[ii,]<-submeans[[ii]]} > > # > # Calculate the fold change > # > folds<-matrix( > nrow=length(means[,1]), > ncol=length(means[1,]), > dimnames=list(rownames(means),colnames(means)) > ) > for (ii in 1:length(means[,1])) > for (jj in 1:length(means[1,])) > folds[ii,jj]<-means[ii,jj]/means[1,jj] > > # > # t-test P value data > # > pvals<-matrix(nrow=ncol(data[,-1]),ncol=1,dimnames=list(colnames(data[-1]),"P-Value")) > > # > # Perform the t-Test > # > for(ii in 1:nrow(pvals)) { > pvals[ii,1]<-t.test(new_mats[[1]][,ii+1],new_mats[[2]][,ii+1])$p.value > } Everything up to here is just to process the 
data into the format you want to plot it in. If you provided the data at this stage (folds and pvals), then it would be easier to determine what is going on. I created some dummy data from which to start at this point. folds <- rbind(0,exp(rnorm(500))) pvals <- runif(500) > m<-length(pvals) > x_range<-range(c( > min( > range(log2(folds[2,])), > range(c(-1.5,1.5)) > ), > max( > range(log2(folds[2,])), > range(c(-1.5,1.5)) > ) > )) > y_range<-range(c( > min(range(-log10(pvals)), > range(c(0,2)) > ), > max(range(-log10(pvals)), > range(c(0,2)) > ) > )) > > # > # Plot data > # > # Define a function, since it's rather involved > volcano_plot<-function(fold, pval) > {plot(x_range, # x-dim > y_range, # y-dim > type="n", # empty plot > xlab="log2 Fold Change", # x-axis title > ylab="-log10 t-Test P-value", # y-axis title > main="Volcano Plot", # plot title > ) > abline(h=-log10(0.05),col="green",lty="44")# horizontal line at P=0.05 > abline(v=c(-1,1),col="violet",lty="1343") # vertical lines at 2-fold > # Plot points based on their values: > for (ii in 1:m) > # If it's below 0.05, we're not overly interested: purple. > if (-log10(pvals[ii])>(-log10(0.05))) { > # Otherwise, more checks; > # if it's greater than 2-fold decrease: blue > if (log2(folds[2,][ii])>(-1)) { > # If it's significant but didn't change much: orange > if (log2(folds[2,][ii])<1) { > points( > log2(folds[2,][ii]), > -log10(pvals[ii]), > col="orange", > pch=20 > ) > # Otherwise, greater than 2-fold increase: red > } else { > points( > log2(folds[2,][ii]), > -log10(pvals[ii]), > col="red", > pch=20 > ) > } > } else { > points( > log2(folds[2,][ii]), > -log10(pvals[ii]), > col="blue", > pch=20 > ) > } > } else { > points( > log2(folds[2,][ii]), > -log10(pvals[ii]), > col="purple", > pch=20 > ) > } > } > # Plot onscreen via function > x11() > volcano_plot(folds,pvals) Running the above line gives me a plot. So maybe the problem is specific to your data. Without it, no one can help you further. 
Look at folds and pvals right before calling volcano_plot; are there values you didn't expect? Do the same with log2(folds) and -log10(pvals). And if you do give the data, make it minimal. That is, strip it down to just the parts needed to make the volcano plot.

Also, you can rearrange your code to compute x_range, y_range, and m inside volcano_plot so that the function does not depend on any variables other than the ones passed (folds and pvals). Another suggestion to simplify your code is to compute the colors using vector operations, making a single vector of colors corresponding to each point, and thus needing only a single points call. (You can start by replacing all your points calls with color[ii] <- "color name", and then have a single points call outside all those ifs):

volcano_plot<-function(folds, pvals)
{
  m<-length(pvals)
  x_range <- range(c(log2(folds[2,]), -1.5, 1.5))
  y_range <- range(c(-log10(pvals), 0, 2))
  plot(x_range,                       # x-dim
       y_range,                       # y-dim
       type="n",                      # empty plot
       xlab="log2 Fold Change",       # x-axis title
       ylab="-log10 t-Test P-value",  # y-axis title
       main="Volcano Plot"            # plot title
  )
  abline(h=-log10(0.05),col="green",lty="44") # horizontal line at P=0.05
  abline(v=c(-1,1),col="violet",lty="1343")   # vertical lines at 2-fold
  ## Define colors based on their values:
  color <- c()
  for (ii in 1:m)
    ## If it's below 0.05, we're not overly interested: purple.
    if (-log10(pvals[ii])>(-log10(0.05))) {
      ## Otherwise, more checks;
      ## if it's greater than 2-fold decrease: blue
      if (log2(folds[2,][ii])>(-1)) {
        ## If it's significant but didn't change much: orange
        if (log2(folds[2,][ii])<1) {
          color[ii] <- "orange"
        ## Otherwise, greater than 2-fold increase: red
        } else {
          color[ii] <- "red"
        }
      } else {
        color[ii] <- "blue"
      }
    } else {
      color[ii] <- "purple"
    }
  points(
    log2(folds[2,]),
    -log10(pvals),
    col=color,
    pch=20
  )
}

# example dummy data
folds <- rbind(0,exp(rnorm(500)))
pvals <- runif(500)
volcano_plot(folds, pvals)

If you vectorize the computation of the colors, it simplifies even more:

volcano_plot<-function(folds, pvals)
{
  x_range <- range(c(log2(folds[2,]), -1.5, 1.5))
  y_range <- range(c(-log10(pvals), 0, 2))
  plot(x_range,                       # x-dim
       y_range,                       # y-dim
       type="n",                      # empty plot
       xlab="log2 Fold Change",       # x-axis title
       ylab="-log10 t-Test P-value",  # y-axis title
       main="Volcano Plot"            # plot title
  )
  abline(h=-log10(0.05),col="green",lty="44") # horizontal line at P=0.05
  abline(v=c(-1,1),col="violet",lty="1343")   # vertical lines at 2-fold
  ## Define colors based on their values:
  ## Not significant: purple
  ## Significant and smaller than half fold change: blue
  ## Significant and larger than two fold change: red
  ## Significant but between half and two fold change: orange
  color <- ifelse(-log10(pvals)>(-log10(0.05)),
                  ifelse(log2(folds[2,])>(-1),
                         ifelse(log2(folds[2,])<1, "orange", "red"),
                         "blue"),
                  "purple")
  points(
    log2(folds[2,]),
    -log10(pvals),
    col=color,
    pch=20
  )
}

volcano_plot(folds, pvals)

> # Return table to analyse results > > # > # Generate figures as image files > # > # (Uncomment blocks as necessary) > > ##### jpeg ##### > # pic_jpg<-function(filename, fold, pval) > # {# Start jpeg device with basic settings > # jpeg(filename, > # quality=100, # image quality (percent) > # bg="white", # background colour > # res=300, # image resolution (dpi) > # units="in", width=8.3, height=5.8) # image dimensions (inches) > # par(mgp=c(5,2,0), # axis
margins > # # (title, labels, line) > # mar=c(6,6,4,2), # plot margins (b,l,t,r) > # las=1 # horizontal labels > # ) > # # Draw the plot > # volcano_plot(folds, pvals) > # dev.off() > # } > # pic_jpg("volcano_plot.jpg") > ##### end jpeg ##### > > > #### png ##### > # pic_png<-function(filename, fold, pval) > # {# Start png device with basic settings > # png(filename, > # bg="white", # background colour > # res=300, # image resolution (dpi) > # units="in", width=8.3, height=5.8) # image dimensions (inches) > # par(mgp=c(5,2,0), # axis margins > # # (title, labels, line) > # mar=c(6,6,4,2), # plot margins (b,l,t,r) > # las=1 # horizontal labels > # ) > # # Draw the plot > # volcano_plot(folds, pvals) > # dev.off() > # } > # pic_png("volcano_plot.png") > #### end png ##### > > > # #### tiff ##### > # pic_tiff<-function(filename, fold, pval) > # {# Start tiff device with basic settings > # tiff(filename, > # bg="white", # background colour > # res=300, # image resolution (dpi) > # units="in", width=8.3, height=5.8) # image dimensions (inches) > # compression="none" # image compression > # # (one of none, lzw, zip) > # par(mgp=c(5,2,0), # axis margins > # # (title, labels, line) > # mar=c(6,6,4,2), # plot margins (b,l,t,r) > # las=1 # horizontal labels > # ) > # # Draw the plot > # volcano_plot(folds, pvals) > # dev.off() > # } > # pic_tiff("volcano_plot.tif") > # #### end tiff ##### > > > > > # > # Legacy code which allows for blue/red to be independent of 0.05 > # (purple is limited to the middle lower region) > # > ##### > # for (ii in 1:m) > # if (log2(folds[2,][ii])<1) { > # if (log2(folds[2,][ii])>-1) { > # if (-log10(pvals[ii])<(-log10(0.05))) { > # points( > # log2(folds[2,][ii]), > # -log10(pvals[ii]), > # col="purple", > # pch=20 > # ) > # } else { > # points( > # log2(folds[2,][ii]), > # -log10(pvals[ii]), > # col="orange", > # pch=20 > # ) > # } > # } else { > # points( > # log2(folds[2,][ii]), > # -log10(pvals[ii]), > # col="blue", > # pch=20 > # ) > # } > 
# } else { > # points( > # log2(folds[2,][ii]), > # -log10(pvals[ii]), > # col="red", > # pch=20 > # ) > # } > > # If function from above needs to be closed > } > [[alternative HTML version deleted]] > > > > -- Brian S. Diggs, PhD Senior Research Associate, Department of Surgery Oregon Health & Science University From roger.bos at rothschild.com Tue Jul 5 19:12:53 2011 From: roger.bos at rothschild.com (Bos, Roger) Date: Tue, 5 Jul 2011 13:12:53 -0400 Subject: [R] Rpad library In-Reply-To: References: <1309797908267-3644041.post@n4.nabble.com> Message-ID: Please note that Rpad is not being updated and does not work (unmodified) with versions of R greater than 2.9. So if you are trying to use it and it is not working, that may explain your difficulty. I still use it because better alternatives, like RApache, don't work on Windows. -----Original Message----- From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of David Winsemius Sent: Monday, July 04, 2011 2:16 PM To: ATANU Cc: r-help at r-project.org Subject: Re: [R] Rpad library On Jul 4, 2011, at 12:45 PM, ATANU wrote: > can anyone help me with a well documented tutorial on Rpad package? > I need to > do HTML programming in R.Can anyone help me with a tutorial? Trivial Google searching produces this link: http://rpad.googlecode.com/svn-history/r76/Rpad_homepage/index.html > > -- > View this message in context: > http://r.789695.n4.nabble.com/Rpad-library-tp3644041p3644041.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
David Winsemius, MD West Hartford, CT ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. *************************************************************** This message is for the named person's use only. It may\...{{dropped:20}} From jwiley.psych at gmail.com Tue Jul 5 19:17:49 2011 From: jwiley.psych at gmail.com (Joshua Wiley) Date: Tue, 5 Jul 2011 10:17:49 -0700 Subject: [R] loop in optim In-Reply-To: <1309836074873-3645031.post@n4.nabble.com> References: <1309772071774-3643230.post@n4.nabble.com> <1309836074873-3645031.post@n4.nabble.com> Message-ID: Do you want something like this? ## Your data afull <- read.table(textConnection(" R_j R_m -0.0625 0.002320654 0 -0.004642807 0.033333333 0.005936332 0.032258065 0.001060848 0 0.007114057 0.015625 0.005581558 0 0.002974794 0.015384615 0.004215271 0.060606061 0.005073116 0.028571429 -0.006001279 0 -0.002789594 0.013888889 0.00770633 0 0.000371663 0.02739726 -0.004224228 -0.04 0.008362539 0 -0.010951605 0 0.004682924 0.013888889 0.011839993 -0.01369863 0.004210383 -0.027777778 -0.04658949 0 0.00987272 -0.057142857 -0.062203157 -0.03030303 -0.119177639 0.09375 0.077054642 0 -0.022763619 -0.057142857 0.050408775 0 0.024706076 -0.03030303 0.004043701 0.0625 0.004951088 0 -0.005968731 0 -0.038292548 0 0.013381097 0.014705882 0.006424728 -0.014492754 -0.020115626 0 -0.004837891 -0.029411765 -0.022054654 0.03030303 0.008936428 0.044117647 8.16925E-05 0 -0.004827246 -0.042253521 0.004653096 -0.014705882 -0.004222151 0.029850746 0.000107267 -0.028985507 -0.001783206 0.029850746 -0.006372981 0.014492754 0.005492374 -0.028571429 -0.009005846 0 0.001031683 0.044117647 0.002800551"), header = TRUE) closeAllConnections() ## likelihood function llik = function(x) { al_j=x[1]; au_j=x[2]; sigma_j=x[3]; b_j=x[4] 
sum(na.rm=T, ifelse(a$R_j< 0, log(1/(2*pi*(sigma_j^2)))- (1/(2*(sigma_j^2))*(a$R_j+al_j-b_j*a$R_m))^2, ifelse(a$R_j>0 , log(1/(2*pi*(sigma_j^2)))- (1/(2*(sigma_j^2))*(a$R_j+au_j-b_j*a$R_m))^2, log(ifelse (( pnorm (au_j, mean=b_j * a$R_m, sd= sqrt(sigma_j^2))- pnorm(al_j, mean=b_j * a$R_m, sd=sqrt (sigma_j^2) )) > 0, (pnorm (au_j,mean=b_j * a$R_m, sd= sqrt(sigma_j^2))- pnorm(al_j, mean=b_j * a$R_m, sd= sqrt(sigma_j^2) )), 1)) )) ) } start.par = c(-0.01,0.01,0.1,1) out <- matrix(NA, nrow = 4, ncol = 4, dimnames = list( paste("Run:", 1:4, sep = ''), c("al_j", "au_j", "sigma_j", "b_j"))) ## Estimate parameters based on rows 0-20, 21-40, 41-60 of 'afull' for (i in 1:4) { a <- afull[seq(20 * (i - 1) +1, 20 * i), ] out[i, ] <- optim(llik, par = start.par, method = "Nelder-Mead")[[1]] } ## Yields > out al_j au_j sigma_j b_j Run:1 0.04001776 0.06010743 1.092618e-24 1.049971 Run:2 0.04002135 0.06008513 -7.156966e-25 1.049976 Run:3 0.04714390 0.27258724 3.303320e-24 0.948988 Run:4 -0.01000000 0.01000000 1.000000e-01 1.000000 On Mon, Jul 4, 2011 at 8:21 PM, EdBo wrote: > Hi > > I have re-worked on my likelihood function and it is now working(#the code > is below#). > > ?May you help me correct my loop function. > > ?I want optim to estimates al_j; au_j; sigma_j; ?b_j by looking at 0 to 20, > ?21 to 40, 41 to 60 data points. > > ?The final result should have 4 columns of each of the estimates AND 4 rows > ?of each of 0 to 20, 21 to 40, 41 to 60. > > #likelihood function > a=read.table("D:/hope.txt",header=T) > attach(a) > a > llik = function(x) > ? { > ? ?al_j=x[1]; au_j=x[2]; sigma_j=x[3]; ?b_j=x[4] > ? ?sum(na.rm=T, > ? ? ? ?ifelse(a$R_j< 0, log(1/(2*pi*(sigma_j^2)))- > ? ? ? ? ? ? ? ? ? ? ? ? ? (1/(2*(sigma_j^2))*(a$R_j+al_j-b_j*a$R_m))^2, > ? ? ? ? ifelse(a$R_j>0 , log(1/(2*pi*(sigma_j^2)))- > ? ? ? ? ? ? ? ? ? ? ? ? ? (1/(2*(sigma_j^2))*(a$R_j+au_j-b_j*a$R_m))^2, > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?log(ifelse (( pnorm (au_j, mean=b_j * a$R_m, sd= sqrt(sigma_j^2))- > ? ? ? ? ? 
? ? ? ? ? ? ? ? pnorm(al_j, mean=b_j * a$R_m, sd=sqrt (sigma_j^2) > )) > 0, > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?(pnorm (au_j,mean=b_j * a$R_m, sd= sqrt(sigma_j^2))- > ? ? ? ? ? ? ? ? ? ? ? ? ? pnorm(al_j, mean=b_j * a$R_m, sd= sqrt(sigma_j^2) > )), > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?1)) )) > ? ? ?) > ? } > start.par = c(-0.01,0.01,0.1,1) > out1 = optim(llik, par=start.par, method="Nelder-Mead") > out1 > > -- > View this message in context: http://r.789695.n4.nabble.com/loop-in-optim-tp3643230p3645031.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles https://joshuawiley.com/ From dwinsemius at comcast.net Tue Jul 5 19:21:57 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Tue, 5 Jul 2011 13:21:57 -0400 Subject: [R] hash table access, vector access &c In-Reply-To: References: Message-ID: On Jul 5, 2011, at 12:53 PM, Sam Steingold wrote: > Hi, > I am confused by the way the indexing works. Actually I suspect you may be confused by how factors work. See below. > I read a table from a csv file like this: > > ysmd <- read.csv("ysmd.csv",header=TRUE); # And note that by default all character columns will become factors. > ysmd.table <- hash(); > for (i in 1:length(ysmd$X.stock)) ysmd.table[ysmd$X.stock[i]] <- > ysmd[i,]; > > the first column ("X.stock") is a string (factor): > >> ysmd$X.stock[[100]] > [1] FLO > 7757 Levels: A AA AA- AAAAA AAC AACC AACOU AACOW AADR AAI AAME AAN > AAON ... ZZZZT > > when I print ysmd.table, I see the data I expect: > ... 
> ZIOP : ZIOP 402600000 3.03 7.85 707694 6.3717 > ZIP : ZIP 794900000 23.53 31.5 677046 23.2508 > ZIPR : ZIPR 47100000 2.28 3.5 21865 2.4058 > ZIV : ZIV -1 12.2987 17.3862 37455 16.6068 > ZIXI : ZIXI 254900000 2.1 4.88 905849 3.5146 > ... > > moreover, > >> ysmd.table[['FLO']] > X.stock market.cap X52.week.low X52.week.high > X3.month.average.daily.volume > 100 FLO 2.984e+09 15.3133 > 22.37 1021580 > X50.day.moving.average.price > 100 21.3769 > > quite correctly. > however, > >> ysmd.table[ysmd$X.stock[[100]]] > containing 0 key-value pair(s). > NA : NULL > > so, how do I access the hash table element using non-literal strings? > or, how do I convert ysmd$X.stock[[100]] to a string from whatever > it is > now? Have you considered: ysmd.table[ as.character( ysmd$X.stock[[100]]) ] It appears that ysmd$X.stock[[100]] is a factor, and if so, you probably want the character value that its numeric representation points to. This is, of course, guesswork because you have not disclosed what package `hash` comes from, so I do not have the benefit of looking at its help page. > > thanks! > > -- > Sam Steingold (http://sds.podval.org/) on CentOS release 5.6 (Final) > X 11.0.60900031 -- David Winsemius, MD West Hartford, CT From vikas.bansal at kcl.ac.uk Tue Jul 5 19:17:27 2011 From: vikas.bansal at kcl.ac.uk (Bansal, Vikas) Date: Tue, 5 Jul 2011 18:17:27 +0100 Subject: [R] Output data frame using write.table Message-ID: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEC06@KCL-MAIL01.kclad.ds.kcl.ac.uk> Dear all, I have a data frame whose name is m1. I want to write this data frame in text file as output.I am using this code- write.table(m1, file = "kas.txt", append = FALSE,row.names=F,quote=F,sep="\t") When I am opening my kas.txt file,the column names are not coming exactly above the column. What should I do.Please help me. 
Thanking you, Warm Regards Vikas Bansal Msc Bioinformatics Kings College London From bistanz at gmail.com Tue Jul 5 19:36:57 2011 From: bistanz at gmail.com (=?KOI8-R?B?0MXU0s/Xyd4=?=) Date: Tue, 5 Jul 2011 12:36:57 -0500 Subject: [R] Matrix 3d plot Message-ID: I have a problem with a 3d plot, suppose we have a matrix like this:

         v1   v2   v3   v4
jan-2010 0.5  0.25 0.25 0.3
feb-2010 0.35 0.12 0.12 0.4
mar-2010 0.15 0.25 0.25 0.1

and i want to plot this matrix in 3d plot where x-axis is the first column of the matrix above, y - axis is the first row of the matrix above and the z-axis is the numbers corresponding, so z(jan-2010,v3)=0.25 I was trying with persp() but i see that in this function z is a vector but in my case is a matrix, so i receive an error message I think this sound no so hard, but actually i couldn't find a function doing this, is there actually such a function? Christian From dwinsemius at comcast.net Tue Jul 5 19:37:29 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Tue, 5 Jul 2011 13:37:29 -0400 Subject: [R] Output data frame using write.table In-Reply-To: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEC06@KCL-MAIL01.kclad.ds.kcl.ac.uk> References: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEC06@KCL-MAIL01.kclad.ds.kcl.ac.uk> Message-ID: On Jul 5, 2011, at 1:17 PM, Bansal, Vikas wrote: > Dear all, > > I have a data frame whose name is m1. > I want to write this data frame in text file as output.I am using > this code- > > write.table(m1, file = "kas.txt", append = > FALSE,row.names=F,quote=F,sep="\t") > > When I am opening my kas.txt file,the column names are not coming > exactly above the column. The 'write' functions do not add spacing for alignment.
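One way to get visually aligned columns is to print the data frame and capture the printed lines. This is a minimal sketch under assumptions: the data frame is a made-up stand-in for m1, and a temporary file stands in for kas.txt.

```r
# Hypothetical stand-in for the poster's data frame m1
m1 <- data.frame(gene = c("BRCA1", "TP53", "EGFR"),
                 count = c(12, 3045, 7),
                 stringsAsFactors = FALSE)
out_path <- tempfile(fileext = ".txt")   # stand-in for "kas.txt"

# print() pads every column to a common width; capture.output() collects
# the printed lines as a character vector that can be written to a file
aligned <- capture.output(print(m1, row.names = FALSE))
writeLines(aligned, out_path)
```

The tab-separated file produced by write.table() is still the better choice when another program has to read the data back; the padded version is only for human eyes.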
Use print() and capture.output() ?capture.output -- David Winsemius, MD West Hartford, CT From murdoch.duncan at gmail.com Tue Jul 5 19:46:21 2011 From: murdoch.duncan at gmail.com (Duncan Murdoch) Date: Tue, 05 Jul 2011 13:46:21 -0400 Subject: [R] Matrix 3d plot In-Reply-To: References: Message-ID: <4E134DED.4060605@gmail.com> On 05/07/2011 1:36 PM, петрович wrote: > I have a problem with a 3d plot, suppose we have a matrix like this: > > v1 v2 v3 v4 > jan-2010 0.5 0.25 0.25 0.3 > feb-2010 0.35 0.12 0.12 0.4 > mar-2010 0.15 0.25 0.25 0.1 > > and i want to plot this matrix in 3d plot where x-axis is the first > column of the matrix above, y - axis is the first row of the matrix > above and the z-axis is the numbers corresponding, so > z(jan-2010,v3)=0.25 > > I was trying with persp() but i see that in this function z is a > vector but in my case is a matrix, so i receive an error message > I think this sound no so hard, but actually i couldn't find a function > doing this, is there actually such a function? persp() handles exactly the case you describe, but it wants numeric x and y. You didn't show what you tried, but this works: x <- 1:3 y <- 1:4 z <- matrix(rnorm(12), nrow=3) persp(x,y,z, col="red") It's not a very useful plot with the random data; it might be better with yours. Duncan Murdoch From dwinsemius at comcast.net Tue Jul 5 19:55:23 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Tue, 5 Jul 2011 13:55:23 -0400 Subject: [R] Matrix 3d plot In-Reply-To: References: Message-ID: On Jul 5, 2011, at 1:36 PM, петрович
wrote: > I have a problem with a 3d plot, suppose we have a matrix like this: > > v1 v2 v3 v4 > jan-2010 0.5 0.25 0.25 0.3 > feb-2010 0.35 0.12 0.12 0.4 > mar-2010 0.15 0.25 0.25 0.1 > > and i want to plot this matrix in 3d plot where x-axis is the first > column of the matrix above, y - axis is the first row of the matrix > above and the z-axis is the numbers corresponding, so > z(jan-2010,v3)=0.25 > > I was trying with persp() But you didn't show your code. > but i see that in this function z is a > vector That is not how I read help(persp). > but in my case is a matrix, Assuming the matrix is named "z", what happens with : persp(z=z) > so i receive an error message And after reading the Posting Guide you will see that you are asked to report any error message verbatim. And perhaps even more importantly it asks that you provide a reproducible version of your object with dump. It's even easier to use dput, but do include one or the other. > I think this sound no so hard, but actually i couldn't find a > function > doing this, is there actually such a function? There are many. But why not see if you can get persp to work? > Christian > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT From sds at gnu.org Tue Jul 5 20:10:53 2011 From: sds at gnu.org (Sam Steingold) Date: Tue, 05 Jul 2011 14:10:53 -0400 Subject: [R] hash table access, vector access &c In-Reply-To: (David Winsemius's message of "Tue, 5 Jul 2011 13:21:57 -0400") References: Message-ID: > * David Winsemius [2011-07-05 13:21:57 -0400]: > On Jul 5, 2011, at 12:53 PM, Sam Steingold wrote: >> I am confused by the way the indexing works. > Actually I suspect you may be confused by how factors work. See below.
probably both :-( being a lisper, I thought about factors as lisp symbols (and thus thought that they would be accepted everywhere strings are). > Have you considered: > > ysmd.table[ as.character( ysmd$X.stock[[100]]) ] > > It appears that ysmd$X.stock[[100]] is a factor, and if so, you probably > want the character value that its numeric representation points to. indeed: > as.character(ysmd$X.stock[[100]]) [1] "FLO" however, > ysmd.table[as.character(ysmd$X.stock[[100]])] containing 0 key-value pair(s). NA : NULL so, as.character is not the answer. > ysmd.table[["FLO"]] X.stock market.cap X52.week.low X52.week.high X3.month.average.daily.volume 100 FLO 2.984e+09 15.3133 22.37 1021580 X50.day.moving.average.price 100 21.3769 > This is, of course, guesswork because you have not disclosed what > package hash` comes from, so I do not have the benefit of looking at > its help page. I just did this: library(hash); hash-2.0.1 provided by Open Data. thanks a lot for your help! -- Sam Steingold (http://sds.podval.org/) on CentOS release 5.6 (Final) X 11.0.60900031 http://jihadwatch.org http://iris.org.il http://honestreporting.com http://openvotingconsortium.org http://thereligionofpeace.com Lisp: Serious empowerment. From annakolar at yahoo.com Tue Jul 5 20:21:14 2011 From: annakolar at yahoo.com (Ana Kolar) Date: Tue, 5 Jul 2011 11:21:14 -0700 (PDT) Subject: [R] sample function with different proportions Message-ID: <1309890074.75708.YahooMailNeo@web114704.mail.gq1.yahoo.com> An embedded and charset-unspecified text was scrubbed... 
Name: not available URL: From jwiley.psych at gmail.com Tue Jul 5 20:25:48 2011 From: jwiley.psych at gmail.com (Joshua Wiley) Date: Tue, 5 Jul 2011 11:25:48 -0700 Subject: [R] sample function with different proportions In-Reply-To: <1309890074.75708.YahooMailNeo@web114704.mail.gq1.yahoo.com> References: <1309890074.75708.YahooMailNeo@web114704.mail.gq1.yahoo.com> Message-ID: Hi Ana, Look at the documentation for ?sample, specifically, the "prob" argument. In your case this should work: sample(c(0,1), 100, replace = TRUE, prob = c(.3, .7)) note that you may not have *exactly* 70% 1 and 30%, in any given sample. HTH, Josh On Tue, Jul 5, 2011 at 11:21 AM, Ana Kolar wrote: > Hi there, > > I guess this is an easy one, but still: > > I would like to randomly sample 0s and 1s but in a way that I end up having for example 70% of 1s and the rest of 0s and not 50:50 as this function does:?sample(c(0,1), 100, replace = TRUE) > > Any recommendations? > > > Many thanks! > > Ana > ? ? ? ?[[alternative HTML version deleted]] > > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles https://joshuawiley.com/ From ustaudinger at gmail.com Tue Jul 5 20:13:50 2011 From: ustaudinger at gmail.com (Ulrich Staudinger) Date: Tue, 5 Jul 2011 20:13:50 +0200 Subject: [R] problem in reading a sequence file In-Reply-To: References: Message-ID: <-6908510375615071716@unknownmsgid> Albert, the output you show contains a column header, v1, and a row index, 1. In order to access this information, you can for example use x[1,1]. read.table reads a table and thus expects rows and columns. 
kind regards, Ulrich -- comnect on xing or linkedin On 05.07.2011, at 14:07, albert coster wrote: > seqfile > V1 > 1 NNNNNNNNNNATTAAAGGGC > > I want only NNNNNNNNNNATTAAAGGGC . > > > Thanks > > Albert > > On Tue, Jul 5, 2011 at 1:58 PM, Ben Bolker wrote: > >> albert coster gmail.com> writes: >> >>> >>> Dear all, >>> >>> I have a file with some sequence (seq.txt). I am writting following code >> and >>> getting error! Can please help me? >>> >>> seqfile<-read.table(file="seq.txt") >>> Warning message: >>> In read.table(file = "seq.txt") : >>> incomplete final line found by readTableHeader on 'seq.txt' >>> >>> Thanks in advance >>> >>> Albert >> >> Very hard to say without more details. Please provide a >> reproducible example, or at least more information. >> That is not an error, it's a warning: it means there *might* be >> something wrong with your data file, but not necessarily. Have you >> inspected the results? Are they what you expected? If not, do they >> give you some more information about what might be wrong? >> Usual suspects: check for unterminated/single quotation marks in >> your file. >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
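For the sequence-file question in the thread above, here is a minimal sketch of two ways to end up with just the string. It assumes a temporary file holding a single sequence with no trailing newline, as a stand-in for the poster's seq.txt, which is not available.

```r
# Hypothetical stand-in for seq.txt: one sequence, no trailing newline
seq_path <- tempfile(fileext = ".txt")
cat("NNNNNNNNNNATTAAAGGGC", file = seq_path)

# read.table() returns a data frame, so the sequence sits in row 1, column 1;
# the "incomplete final line" warning is silenced here because it is expected
seqfile <- suppressWarnings(read.table(seq_path, stringsAsFactors = FALSE))
seq_from_table <- seqfile[1, 1]

# readLines() returns a plain character vector, with no header or row index
seq_from_lines <- readLines(seq_path, warn = FALSE)
```

For a file that is not really tabular, readLines() is usually the more direct choice, since nothing has to be pulled out of a data frame afterwards.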
From murdoch.duncan at gmail.com Tue Jul 5 20:30:11 2011 From: murdoch.duncan at gmail.com (Duncan Murdoch) Date: Tue, 05 Jul 2011 14:30:11 -0400 Subject: [R] sample function with different proportions In-Reply-To: References: <1309890074.75708.YahooMailNeo@web114704.mail.gq1.yahoo.com> Message-ID: <4E135833.3020701@gmail.com> On 05/07/2011 2:25 PM, Joshua Wiley wrote: > Hi Ana, > > Look at the documentation for ?sample, specifically, the "prob" > argument. In your case this should work: > > sample(c(0,1), 100, replace = TRUE, prob = c(.3, .7)) > > note that you may not have *exactly* 70% 1 and 30%, in any given sample. And if you want exact counts, you can use sample to permute a vector. For example: sample(rep(0:1, c(30, 70))) Duncan Murdoch > HTH, > > Josh > > On Tue, Jul 5, 2011 at 11:21 AM, Ana Kolar wrote: > > Hi there, > > > > I guess this is an easy one, but still: > > > > I would like to randomly sample 0s and 1s but in a way that I end up having for example 70% of 1s and the rest of 0s and not 50:50 as this function does: sample(c(0,1), 100, replace = TRUE) > > > > Any recommendations? > > > > > > Many thanks! > > > > Ana > > [[alternative HTML version deleted]] > > > > > > ______________________________________________ > > R-help at r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > > > > > From ted.harding at wlandres.net Tue Jul 5 20:37:09 2011 From: ted.harding at wlandres.net ( (Ted Harding)) Date: Tue, 05 Jul 2011 19:37:09 +0100 (BST) Subject: [R] sample function with different proportions In-Reply-To: Message-ID: Well, you can have exactly 70:30%, i.e. 70% 1s and 30% 0s, but in random order. For example: Popn <- c(rep(1,70),rep(0,30)) Samp <- sample(Popn) (see '?sample' for this usage -- the result of sample(x) is a random permutation of the elements of x).
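The two approaches in this thread can be put side by side. This is a minimal sketch; the seed and the lower-case variable names are arbitrary choices made only for this illustration.

```r
set.seed(1)  # arbitrary seed, only so the sketch is repeatable

# Independent draws: each element is 1 with probability 0.7, so the
# realised share of 1s varies from one sample to the next
draws <- sample(c(0, 1), 100, replace = TRUE, prob = c(0.3, 0.7))

# Permutation: shuffle a vector that already holds 70 ones and 30 zeros,
# so the counts are exact and only the order is random
popn <- c(rep(1, 70), rep(0, 30))
samp <- sample(popn)

table(draws)  # counts near 30/70, not guaranteed to be exact
table(samp)   # always 30 zeros and 70 ones
```

Which one is appropriate depends on whether the 70:30 split is a property of the generating process (use prob) or a constraint on each realised sample (use the permutation).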
In probabilistic terms, this is a "conditional" sample, i.e. what you would get by using sample(...,replace=TRUE) but rejecting samples which do not have 70% 1s and 30% 0s until you get a sample which does. Ted. On 05-Jul-11 18:25:48, Joshua Wiley wrote: > Hi Ana, > > Look at the documentation for ?sample, specifically, the "prob" > argument. In your case this should work: > > sample(c(0,1), 100, replace = TRUE, prob = c(.3, .7)) > > note that you may not have *exactly* 70% 1 and 30%, in any given > sample. > > HTH, > > Josh > > On Tue, Jul 5, 2011 at 11:21 AM, Ana Kolar wrote: >> Hi there, >> >> I guess this is an easy one, but still: >> >> I would like to randomly sample 0s and 1s but in a way that >> I end up having for example 70% of 1s and the rest of 0s and >> not 50:50 as this function does: sample(c(0,1), 100, replace = TRUE) >> >> Any recommendations? >> >> Many thanks! >> Ana -------------------------------------------------------------------- E-Mail: (Ted Harding) Fax-to-email: +44 (0)870 094 0861 Date: 05-Jul-11 Time: 19:37:06 ------------------------------ XFMail ------------------------------ From dwinsemius at comcast.net Tue Jul 5 20:39:09 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Tue, 5 Jul 2011 14:39:09 -0400 Subject: [R] hash table access, vector access &c In-Reply-To: References: Message-ID: <51511E1A-F2E3-40F8-88B0-4AC3B0066E4A@comcast.net> On Jul 5, 2011, at 2:10 PM, Sam Steingold wrote: >> * David Winsemius [2011-07-05 13:21:57 >> -0400]: >> On Jul 5, 2011, at 12:53 PM, Sam Steingold wrote: >>> I am confused by the way the indexing works. >> Actually I suspect you may be confused by how factors work. See >> below. > > probably both :-( > > being a lisper, I thought about factors as lisp symbols (and thus > thought that they would be accepted everywhere strings are). 
> >> Have you considered: >> >> ysmd.table[ as.character( ysmd$X.stock[[100]]) ] >> >> It appears that ysmd$X.stock[[100]] is a factor, and if so, you >> probably >> want the character value that its numeric representation points to. > > indeed: > >> as.character(ysmd$X.stock[[100]]) > [1] "FLO" > > however, > >> ysmd.table[as.character(ysmd$X.stock[[100]])] > containing 0 key-value pair(s). > NA : NULL > > so, as.character is not the answer. My error. Note the difference in indexing functions. "[" is not "[[" > >> ysmd.table[["FLO"]] > X.stock market.cap X52.week.low X52.week.high > X3.month.average.daily.volume > 100 FLO 2.984e+09 15.3133 > 22.37 1021580 > X50.day.moving.average.price > 100 21.3769 So you are here demonstrating that you should be using "[[" > >> This is, of course, guesswork because you have not disclosed what >> package hash` comes from, so I do not have the benefit of looking at >> its help page. > > I just did this: > > library(hash); > hash-2.0.1 provided by Open Data. -- David Winsemius, MD West Hartford, CT From wdunlap at tibco.com Tue Jul 5 20:48:52 2011 From: wdunlap at tibco.com (William Dunlap) Date: Tue, 5 Jul 2011 11:48:52 -0700 Subject: [R] Wrong environment when evaluating and expression? In-Reply-To: References: Message-ID: <77EB52C6DD32BA4D87471DCD70C8D70004653091@NA-PA-VBE03.na.tibco.com> > -----Original Message----- > From: r-help-bounces at r-project.org > [mailto:r-help-bounces at r-project.org] On Behalf Of Joshua Wiley > Sent: Monday, July 04, 2011 1:12 AM > To: r-help at r-project.org > Subject: [R] Wrong environment when evaluating and expression? > > Hi All, > > I have constructed two expressions (e1 & e2). I can see that they are > not identical, but I cannot figure out how they differ. 
> > ############### > dat <- mtcars > e1 <- expression(with(data = dat, lm(mpg ~ hp))) > e2 <- as.expression(substitute(with(data = dat, lm(f)), > list(f = mpg ~ hp))) > > str(e1) > str(e2) > all.equal(e1, e2) > identical(e1, e2) # false With the appended str.language function you can see the difference between e1 and e2. It displays `name` class(length) of each component of a recursive object, along with a short text summary of it after a colon. > str.language(e1) `e1` expression(1): expression(with(data = da... `` call(3): with(data = dat, lm(mpg ~... `` name(1): with `data` name(1): dat `` call(2): lm(mpg ~ hp) `` name(1): lm `` call(3): mpg ~ hp `` name(1): ~ `` name(1): mpg `` name(1): hp > str.language(e2) `e2` expression(1): expression(with(data = da... `` call(3): with(data = dat, lm(mpg ~... `` name(1): with `data` name(1): dat `` call(2): lm(mpg ~ hp) `` name(1): lm `` formula(3): mpg ~ hp `` name(1): ~ `` name(1): mpg `` name(1): hp `Attributes of ` list(2): structure(list(class = "f... `class` character(1): "formula" `.Environment` environment(5): dat e1 e2 s... It is a bug in all.equal() that it ignores attributes of formulae. 
E.g., > all.equal(y~x, terms(y~x)) [1] TRUE > identical(y~x, terms(y~x)) [1] FALSE Here is str.language str.language <- function (object, ..., level = 0, name = deparse(substitute(object)), attributes = TRUE) { abbr <- function(string, maxlen = 25) { if (length(string) > 1 || nchar(string) > maxlen) paste(substring(string[1], 1, maxlen), "...", sep = "") else string } myDeparse <- function(object) { if (!is.environment(object)) { deparse(object) } else { ename <- environmentName(object) if (ename == "") ename <- "" paste(sep = "", "<", ename, "> ", paste(collapse = " ", objects(object))) } } cat(rep(" ", level), sep = "") if (is.null(name)) name <- "" cat(sprintf("`%s` %s(%d): %s\n", abbr(name), class(object), length(object), abbr(myDeparse(object)))) a <- attributes(object) if (is.recursive(object) && !is.environment(object)) { object <- as.list(object) names <- names(object) for (i in seq_along(object)) { str.language(object[[i]], ..., level = level + 1, name = names[i], attributes = attributes) } } if (attributes) { a$names <- NULL if (length(a) > 0) { str.language(a, level = level + 1, name = paste("Attributes of", abbr(name)), attributes = attributes) } } } Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com > > eval(e1) > eval(e2) > ################ > > The context is trying to use a list of formulae to generate several > models from a multiply imputed dataset. The package I am using (mice) > has methods for with() and that is how I can (easily) get the pooled > results. Passing the formula directly does not work, so I was trying > to generate the entire call and evaluate it as if I had typed it at > the console, but I am missing something (probably rather silly). > > Thanks, > > Josh > > > -- > Joshua Wiley > Ph.D. 
Student, Health Psychology > University of California, Los Angeles > http://www.joshuawiley.com/ > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From Greg.Snow at imail.org Tue Jul 5 20:50:46 2011 From: Greg.Snow at imail.org (Greg Snow) Date: Tue, 5 Jul 2011 12:50:46 -0600 Subject: [R] How to translate string to variable inside a command in an easy way in R In-Reply-To: <1309861781957-3645594.post@n4.nabble.com> References: <1309861781957-3645594.post@n4.nabble.com> Message-ID: You are suffering from the fact that the longest distance between 2 points is a shortcut. The df$column notation is a shortcut for df[[column]] that has some nice properties, but the shortcut gets in the way when you want to do something more structured. Try qq1[[z]]==y and avoid all that pasting, parsing, and evaluating. -- Gregory (Greg) L. Snow Ph.D. Statistical Data Center Intermountain Healthcare greg.snow at imail.org 801.408.8111 > -----Original Message----- > From: r-help-bounces at r-project.org [mailto:r-help-bounces at r- > project.org] On Behalf Of UriB > Sent: Tuesday, July 05, 2011 4:30 AM > To: r-help at r-project.org > Subject: [R] How to translate string to variable inside a command in an > easy way in R > > I want to write a function that get 2 strings y and z and does the > following > R command. 
> > temp<-qq1[qq1$z==y,] > for example if it get y="AMI" and z="PrimaryConditionGroup" > It should do the following > temp<-qq1[qq1$PrimaryConditionGroup=="AMI",] > > I could do it by the following function that is ugly and I wonder if > there > is an easier way to do it espacielly when temp is not the final result > that > I want (so I practically do not have temp<<-temp because I do not need > the > function to remember temp but only to remember something else that is > calculated based on temp). > > ugly<-function(y,z) > { > text1<-paste("temp<-qq1[qq1$",z,sep="") > text1<-paste(text1,"==y",sep="") > text1<-paste(text1,",]",sep="") > eval(parse(text=text1)) > temp<<-temp > } > > > > > -- > View this message in context: http://r.789695.n4.nabble.com/How-to- > translate-string-to-variable-inside-a-command-in-an-easy-way-in-R- > tp3645594p3645594.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting- > guide.html > and provide commented, minimal, self-contained, reproducible code. From sds at gnu.org Tue Jul 5 21:30:02 2011 From: sds at gnu.org (Sam Steingold) Date: Tue, 05 Jul 2011 15:30:02 -0400 Subject: [R] hash table access, vector access &c In-Reply-To: <51511E1A-F2E3-40F8-88B0-4AC3B0066E4A@comcast.net> (David Winsemius's message of "Tue, 5 Jul 2011 14:39:09 -0400") References: <51511E1A-F2E3-40F8-88B0-4AC3B0066E4A@comcast.net> Message-ID: > * David Winsemius [2011-07-05 14:39:09 -0400]: > > So you are here demonstrating that you should be using "[[" yes, thanks! now, how do I extend a frame with new columns based on a hash table? specifically, I have a frame: > str(etr.rt) 'data.frame': 75986 obs. of 15 variables: $ aaaaaa : POSIXlt, format: ... $ bbbbb : POSIXlt, format: ... $ cccccccc : num ... 
$ symbol : Factor w/ 4521 levels "A","AA","AACC",..: 985 985 2322 3677 4486 4486 1607 3677 4486 1279 ... $ dddd : int 500 500 ... $ eeeeeeee : num 16.61 5.74 ... and a hash table: > str(ysmd.table) Formal class 'hash' [package "hash"] with 1 slots ..@ .xData: > summary(ysmd.table) Length Class Mode 7757 hash S4 > ysmd.table[[as.character(etr.rt$symbol[[100]])]] X.stock market.cap X52.week.low X52.week.high 3122 DFS 1.4606e+10 12.11 26.95 X3.month.average.daily.volume X50.day.moving.average.price 3122 6153430 24.0242 I want to modify etr.rt (or create a new frame etr.rt.md) which would have all the columns of etr.rt plus 5 additional columns market.cap X52.week.low X52.week.high X3.month.average.daily.volume X50.day.moving.average.price which for the row number i in etr.rt come from ysmd.table[[as.character(etr.rt$symbol[[i]])]] (obviously, etr.rt$symbol[[i]] == ysmd.table[[as.character(etr.rt$symbol[[i]])]]$X.stock ) thanks! -- Sam Steingold (http://sds.podval.org/) on CentOS release 5.6 (Final) X 11.0.60900031 http://thereligionofpeace.com http://dhimmi.com http://ffii.org http://pmw.org.il http://mideasttruth.com http://openvotingconsortium.org The early bird may get the worm, but the second mouse gets the cheese. From AVSHALO2 at POST.TAU.AC.IL Tue Jul 5 20:55:27 2011 From: AVSHALO2 at POST.TAU.AC.IL (avsha38) Date: Tue, 5 Jul 2011 11:55:27 -0700 (PDT) Subject: [R] Survival Analysis Message-ID: <1309892127923-3646804.post@n4.nabble.com> Hello, I have few questions about recurring events. I would greatly appreciate it if anyone can assist me. I have data that consist of approx 1,100 Consecutive patients released from hospital after first Myocardial infarction (MI). They were followed for 13 years. Recurrent MI and unstable angina pectoris (UAP) leading to hospitalization were recorded (within-subject range: 0-4 for recurrent MI; 0-19 for UAP). Socio demographic and clinical data were obtained at study entry. I want to fit Semiparametric regression models. 1. 
What will be the best method regarding the time scale (calendar times or gap times) when I want the fit a model for recurrent MI ? For the UAP recurrent event? 2. In 25% of the subjects the last MI event was fatal; will it be correct to consider fatal and non-fatal events as if they are events of the same type? Which R Package is best to deal with informative censoring? Can the "Survival" Package deal with this? Thanks in advance, Avi -- View this message in context: http://r.789695.n4.nabble.com/Survival-Analysis-tp3646804p3646804.html Sent from the R help mailing list archive at Nabble.com. From mailinglist.honeypot at gmail.com Tue Jul 5 22:12:01 2011 From: mailinglist.honeypot at gmail.com (Steve Lianoglou) Date: Tue, 5 Jul 2011 16:12:01 -0400 Subject: [R] Output data frame using write.table In-Reply-To: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEC06@KCL-MAIL01.kclad.ds.kcl.ac.uk> References: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEC06@KCL-MAIL01.kclad.ds.kcl.ac.uk> Message-ID: Hi, On Tue, Jul 5, 2011 at 1:17 PM, Bansal, Vikas wrote: > Dear all, > > I have a data frame whose name is m1. > I want to write this data frame in text file as output.I am using this code- > > write.table(m1, file = "kas.txt", append = FALSE,row.names=F,quote=F,sep="\t") > > When I am opening my kas.txt file,the column names are not coming exactly above the column. > What should I do.Please help me. You are writing a tab-delimited text file -- unless the width (number of characters) of all the elements of all of your columns is less than your tab width, then what you want will never happen. You can either set your "tab width" in your text editor that you're using to view the file to some insanely large number so you can make the above statement true, or you can do what David suggested ... Is the reason you are writing the file in text so that you can display it in some text editor at some later point, or do you actually want to use the data in other ways later? 
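To illustrate the distinction between a file meant for other programs and a file meant for reading, here is a rough sketch. It assumes a small stand-in data frame for m1 and temporary file names, none of which come from the original post; capture.output() and print() are used only as one possible way to get padded columns.

```r
# Sketch: tab-delimited output versus a padded, human-readable listing.
# m1 here is a hypothetical stand-in for the poster's data frame.
m1 <- data.frame(name = c("a", "longer_name"), value = c(1.5, 22),
                 stringsAsFactors = FALSE)

tab_file <- tempfile(fileext = ".txt")
txt_file <- tempfile(fileext = ".txt")

# Machine-readable: one tab between fields; headers need not line up
# visually when a field is wider than the viewer's tab stop.
write.table(m1, file = tab_file, row.names = FALSE, quote = FALSE, sep = "\t")

# Human-readable: print() pads each column, so headers sit over the values.
writeLines(capture.output(print(m1, row.names = FALSE)), txt_file)

# The tab-delimited file still reads back cleanly for later use.
back <- read.delim(tab_file, stringsAsFactors = FALSE)
```

The padded listing is convenient to look at but awkward to parse, so keeping the tab-delimited file for later processing is usually the safer choice.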
I also reckon if you open your tab-delimited file in something like excel, then all things should come out as "nice" as you want .. -steve -- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact From engstrom.gary at gmail.com Tue Jul 5 22:27:41 2011 From: engstrom.gary at gmail.com (gary engstrom) Date: Tue, 5 Jul 2011 16:27:41 -0400 Subject: [R] if else lop Message-ID: I am trying to use if...else loop and have included a code snippet which I might like to expand. Maybe you could steer me in the right direction. library(stats) library(prob) { a <- sample ( 1:4,100, replace=T,prob=c(0.1,0.2,0.5,0.3)) b<-sample(3:6,100,replace=T,prob=c(0.2,0.2,0.2,0.4)) } dd <- data.frame(a,b) if (subset finds a vector) ( print that vector) (else continue looking at subsets) subset(dd,isin(dd,c(1,4), ordered = FALSE)) subset(dd,isin(dd,c(3,3),ordered=F)) Thank you G From sarah.goslee at gmail.com Tue Jul 5 22:36:50 2011 From: sarah.goslee at gmail.com (Sarah Goslee) Date: Tue, 5 Jul 2011 16:36:50 -0400 Subject: [R] if else lop In-Reply-To: References: Message-ID: Gary, Was the second half of my message this morning not clear enough? It wasn't clear from your original message that you were using isin() from the prob package, rather than using isin() as pseudocode, so I'd written a function to do that part. But the second half of my message went through the if() part of your question. If it doesn't do what you want, you need to be clearer about what you need that it doesn't do. Sarah On Tue, Jul 5, 2011 at 4:27 PM, gary engstrom wrote: > I am trying to use if...else loop and have included a code snippet which I > > might like to expand. > Maybe you could steer me in the right direction. > library(stats) > library(prob) > { >
a <- sample ( 1:4,100, replace=T,prob=c(0.1,0.2,0.5,0.3)) > b<-sample(3:6,100,replace=T,prob=c(0.2,0.2,0.2,0.4)) > } > dd <- data.frame(a,b) > if (subset finds a vector) ( print that vector) > (else continue looking at subsets) > subset(dd,isin(dd,c(1,4), ordered = FALSE)) > subset(dd,isin(dd,c(3,3),ordered=F)) > Thank you > G > -- Sarah Goslee http://www.functionaldiversity.org From pdalgd at gmail.com Tue Jul 5 23:53:51 2011 From: pdalgd at gmail.com (peter dalgaard) Date: Tue, 5 Jul 2011 23:53:51 +0200 Subject: [R] if else lop In-Reply-To: References: Message-ID: <2DF6E1B9-9A38-4B16-84D2-2447372E9696@gmail.com> On Jul 5, 2011, at 22:27 , gary engstrom wrote: > I am trying to use if...else loop Argh! A loop goes _around_ and around. "for", "repeat", "while". "if ... else" is a _branching_ construct. -- Peter Dalgaard Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com From jwiley.psych at gmail.com Wed Jul 6 00:02:22 2011 From: jwiley.psych at gmail.com (Joshua Wiley) Date: Tue, 5 Jul 2011 15:02:22 -0700 Subject: [R] if else lop In-Reply-To: <2DF6E1B9-9A38-4B16-84D2-2447372E9696@gmail.com> References: <2DF6E1B9-9A38-4B16-84D2-2447372E9696@gmail.com> Message-ID: On Tue, Jul 5, 2011 at 2:53 PM, peter dalgaard wrote: > > On Jul 5, 2011, at 22:27 , gary engstrom wrote: > >> I am trying to use if...else loop > > Argh! > > A loop goes _around_ and around. "for", "repeat", "while". > > "if ... else" is a _branching_ construct. Nominated for a fortune.
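The distinction between looping and branching can be shown with both constructs in one short sketch. It is only an illustration under stated assumptions: the seed, the targets list and the plain column comparison are hypothetical, and the comparison stands in for prob::isin(), which is not used here.

```r
# Sketch: the for loop repeats; the if/else picks a branch on each pass.
set.seed(42)  # arbitrary seed, used only to make the run repeatable
dd <- data.frame(a = sample(1:4, 100, replace = TRUE),
                 b = sample(3:6, 100, replace = TRUE))

# Hypothetical (a, b) pairs to look for, tried in this order.
targets <- list(c(1, 4), c(3, 3))

found <- NULL
for (tg in targets) {                          # the loop: goes around
  hit <- dd[dd$a == tg[1] & dd$b == tg[2], ]   # rows matching this pair
  if (nrow(hit) > 0) {                         # the branch: taken or not
    found <- hit
    print(found)
    break                                      # stop at the first match
  } else {
    next                                       # nothing found, keep looking
  }
}
```

Here subsetting always returns a data frame, possibly with zero rows, so the condition tests nrow() rather than the subset itself.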
> > -- > Peter Dalgaard > Center for Statistics, Copenhagen Business School > Solbjerg Plads 3, 2000 Frederiksberg, Denmark > Phone: (+45)38153501 > Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles https://joshuawiley.com/ From gunter.berton at gene.com Wed Jul 6 00:07:38 2011 From: gunter.berton at gene.com (Bert Gunter) Date: Tue, 5 Jul 2011 15:07:38 -0700 Subject: [R] if else lop In-Reply-To: <2DF6E1B9-9A38-4B16-84D2-2447372E9696@gmail.com> References: <2DF6E1B9-9A38-4B16-84D2-2447372E9696@gmail.com> Message-ID: Hi Peter: Beware that "gnashing of teeth" business. My Dad, a dentist, said that a couple of his patients who were musicians did this when they played and ground their teeth down so much he had to make them dentures! ;-) Cheers, Bert On Tue, Jul 5, 2011 at 2:53 PM, peter dalgaard wrote: > > On Jul 5, 2011, at 22:27 , gary engstrom wrote: > >> I am trying to use if...else loop > > Argh! > > A loop goes _around_ and around. "for", "repeat", "while". > > "if ... else" is a _branching_ construct. > > -- > Peter Dalgaard > Center for Statistics, Copenhagen Business School > Solbjerg Plads 3, 2000 Frederiksberg, Denmark > Phone: (+45)38153501 > Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.
> -- "Men by nature long to get on to the ultimate truths, and will often be impatient with elementary studies or fight shy of them. If it were possible to reach the ultimate truths without the elementary studies usually prefixed to them, these would not be preparatory studies but superfluous diversions." -- Maimonides (1135-1204) Bert Gunter Genentech Nonclinical Biostatistics 467-7374 http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm From ekt.batey at gmail.com Wed Jul 6 00:08:48 2011 From: ekt.batey at gmail.com (Trey Batey) Date: Tue, 5 Jul 2011 15:08:48 -0700 Subject: [R] plotting survival curves (multiple curves on single graph) Message-ID: Hello. This is a follow-up to a question I posted last week. With some previous suggestions from the R-help community, I have been able to plot survival (, hazard, and density) curves using published data for Siler hazard parameters from a number of ethnographic populations. Can the function below be modified, perhaps with a "for" statement, so that multiple curves (different line types---one for each population) are plotted on a single graph for comparison? Thanks so much. 
--Trey The function and calls below use the data in this Excel file (feel free to access): https://docs.google.com/leaf?id=0B5zZGW2utJN0ZDk1NjA0ZjUtMWU0ZS00ZGQ3LWIxZTUtOWE0NGVmYWMxODJl&hl=en_US ## - plot Siler survival curve ############################## silsurv<-function(a1,b1,a2,a3,b3) { sil=function(t) { h.t<-a1*exp(-b1*t)+a2+a3*exp(b3*t) S.t<-exp(-a1/b1*(1-exp(-b1*t))-a2*t+a3/b3*(1-exp(b3*t))) d.t<-S.t*h.t #return(d.t) return(S.t) #return(h.t) } t<-seq(0,90,1) plot(t,sil(t),ylim=c(0,1),type='l',cex.lab=0.8,cex.axis=0.75,ylab='S(t)',xlab='Age (years)') } with(hazanth[1,3:7],silsurv(a1=a1,b1=b1,a2=a2,a3=a3,b3=b3));title(main=hazanth[1,1],cex.main=0.9) # plot for Hadza with(hazanth[2,3:7],silsurv(a1=a1,b1=b1,a2=a2,a3=a3,b3=b3));title(main=hazanth[2,1],cex.main=0.9) # plot for Ache with(hazanth[3,3:7],silsurv(a1=a1,b1=b1,a2=a2,a3=a3,b3=b3));title(main=hazanth[3,1],cex.main=0.9) # plot for Hiwi with(hazanth[4,3:7],silsurv(a1=a1,b1=b1,a2=a2,a3=a3,b3=b3));title(main=hazanth[4,1],cex.main=0.9) # plot for !Kung with(hazanth[5,3:7],silsurv(a1=a1,b1=b1,a2=a2,a3=a3,b3=b3));title(main=hazanth[5,1],cex.main=0.9) # plot for Yanomamo with(hazanth[6,3:7],silsurv(a1=a1,b1=b1,a2=a2,a3=a3,b3=b3));title(main=hazanth[6,1],cex.main=0.9) # plot for Tsimane ############################### From gunter.berton at gene.com Wed Jul 6 00:16:02 2011 From: gunter.berton at gene.com (Bert Gunter) Date: Tue, 5 Jul 2011 15:16:02 -0700 Subject: [R] plotting survival curves (multiple curves on single graph) In-Reply-To: References: Message-ID: Yes, it can be done using basic plot commands. But if you really want to get fancy and plot "grouped" graphs, I strongly recommend you look into R's packages -- ggplot or trellis. Both have excellent documentation and companion books and were built for this sort of thing. The (considerable) learning curve will be worth the effort. Cheers, Bert On Tue, Jul 5, 2011 at 3:08 PM, Trey Batey wrote: > Hello. > > This is a follow-up to a question I posted last week. 
With some > previous suggestions from the R-help community, I have been able to > plot survival (, hazard, and density) curves using published data for > Siler hazard parameters from a number of ethnographic populations. > Can the function below be modified, perhaps with a "for" statement, so > that multiple curves (different line types---one for each population) > are plotted on a single graph for comparison? Thanks so much. > > --Trey > > The function and calls below use the data in this Excel file (feel > free to access): > https://docs.google.com/leaf?id=0B5zZGW2utJN0ZDk1NjA0ZjUtMWU0ZS00ZGQ3LWIxZTUtOWE0NGVmYWMxODJl&hl=en_US > > ## - plot Siler survival curve > ############################## > silsurv<-function(a1,b1,a2,a3,b3) > { > sil=function(t) > { > h.t<-a1*exp(-b1*t)+a2+a3*exp(b3*t) > S.t<-exp(-a1/b1*(1-exp(-b1*t))-a2*t+a3/b3*(1-exp(b3*t))) > d.t<-S.t*h.t > > #return(d.t) > return(S.t) > #return(h.t) > } > t<-seq(0,90,1) > plot(t,sil(t),ylim=c(0,1),type='l',cex.lab=0.8,cex.axis=0.75,ylab='S(t)',xlab='Age > (years)') > } > > with(hazanth[1,3:7],silsurv(a1=a1,b1=b1,a2=a2,a3=a3,b3=b3));title(main=hazanth[1,1],cex.main=0.9) > # plot for Hadza > with(hazanth[2,3:7],silsurv(a1=a1,b1=b1,a2=a2,a3=a3,b3=b3));title(main=hazanth[2,1],cex.main=0.9) > # plot for Ache > with(hazanth[3,3:7],silsurv(a1=a1,b1=b1,a2=a2,a3=a3,b3=b3));title(main=hazanth[3,1],cex.main=0.9) > # plot for Hiwi > with(hazanth[4,3:7],silsurv(a1=a1,b1=b1,a2=a2,a3=a3,b3=b3));title(main=hazanth[4,1],cex.main=0.9) > # plot for !Kung > with(hazanth[5,3:7],silsurv(a1=a1,b1=b1,a2=a2,a3=a3,b3=b3));title(main=hazanth[5,1],cex.main=0.9) > # plot for Yanomamo > with(hazanth[6,3:7],silsurv(a1=a1,b1=b1,a2=a2,a3=a3,b3=b3));title(main=hazanth[6,1],cex.main=0.9) > # plot for Tsimane > > ############################### > > ______________________________________________ > R-help at r-project.org mailing list >
https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- "Men by nature long to get on to the ultimate truths, and will often be impatient with elementary studies or fight shy of them. If it were possible to reach the ultimate truths without the elementary studies usually prefixed to them, these would not be preparatory studies but superfluous diversions." -- Maimonides (1135-1204) Bert Gunter Genentech Nonclinical Biostatistics 467-7374 http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm From mailinglist.honeypot at gmail.com Wed Jul 6 00:22:38 2011 From: mailinglist.honeypot at gmail.com (Steve Lianoglou) Date: Tue, 5 Jul 2011 18:22:38 -0400 Subject: [R] plotting survival curves (multiple curves on single graph) In-Reply-To: References: Message-ID: Quick note: On Tue, Jul 5, 2011 at 6:16 PM, Bert Gunter wrote: > Yes, it can be done using basic plot commands. > > But if you really want to get fancy and plot "grouped" graphs, I > strongly recommend you look into R's packages -- ggplot or trellis. Attempting to clear out any confusion before it sets in: I'm pretty sure Bert meant "lattice" instead of "trellis". -steve > Both have excellent documentation and companion books and were built > for this sort of thing. The (considerable) learning curve will be > worth the effort. > > Cheers, > Bert > > On Tue, Jul 5, 2011 at 3:08 PM, Trey Batey wrote: >> Hello. >> >> This is a follow-up to a question I posted last week. With some >> previous suggestions from the R-help community, I have been able to >> plot survival (, hazard, and density) curves using published data for >> Siler hazard parameters from a number of ethnographic populations.
>> Can the function below be modified, perhaps with a "for" statement, so >> that multiple curves (different line types---one for each population) >> are plotted on a single graph for comparison? Thanks so much. >> >> --Trey >> >> The function and calls below use the data in this Excel file (feel >> free to access): >> https://docs.google.com/leaf?id=0B5zZGW2utJN0ZDk1NjA0ZjUtMWU0ZS00ZGQ3LWIxZTUtOWE0NGVmYWMxODJl&hl=en_US >> >> ## - plot Siler survival curve >> ############################## >> silsurv<-function(a1,b1,a2,a3,b3) >> { >> sil=function(t) >> { >> h.t<-a1*exp(-b1*t)+a2+a3*exp(b3*t) >> S.t<-exp(-a1/b1*(1-exp(-b1*t))-a2*t+a3/b3*(1-exp(b3*t))) >> d.t<-S.t*h.t >> >> #return(d.t) >> return(S.t) >> #return(h.t) >> } >> t<-seq(0,90,1) >> plot(t,sil(t),ylim=c(0,1),type='l',cex.lab=0.8,cex.axis=0.75,ylab='S(t)',xlab='Age >> (years)') >> } >> >> with(hazanth[1,3:7],silsurv(a1=a1,b1=b1,a2=a2,a3=a3,b3=b3));title(main=hazanth[1,1],cex.main=0.9) >> # plot for Hadza >> with(hazanth[2,3:7],silsurv(a1=a1,b1=b1,a2=a2,a3=a3,b3=b3));title(main=hazanth[2,1],cex.main=0.9) >> # plot for Ache >> with(hazanth[3,3:7],silsurv(a1=a1,b1=b1,a2=a2,a3=a3,b3=b3));title(main=hazanth[3,1],cex.main=0.9) >> # plot for Hiwi >> with(hazanth[4,3:7],silsurv(a1=a1,b1=b1,a2=a2,a3=a3,b3=b3));title(main=hazanth[4,1],cex.main=0.9) >> # plot for !Kung >> with(hazanth[5,3:7],silsurv(a1=a1,b1=b1,a2=a2,a3=a3,b3=b3));title(main=hazanth[5,1],cex.main=0.9) >> # plot for Yanomamo >> with(hazanth[6,3:7],silsurv(a1=a1,b1=b1,a2=a2,a3=a3,b3=b3));title(main=hazanth[6,1],cex.main=0.9) >> # plot for Tsimane >> >> ############################### >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code.
>> > > > > -- > "Men by nature long to get on to the ultimate truths, and will often > be impatient with elementary studies or fight shy of them. If it were > possible to reach the ultimate truths without the elementary studies > usually prefixed to them, these would not be preparatory studies but > superfluous diversions." > > -- Maimonides (1135-1204) > > Bert Gunter > Genentech Nonclinical Biostatistics > 467-7374 > http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Steve Lianoglou Graduate Student: Computational Systems Biology | Memorial Sloan-Kettering Cancer Center | Weill Medical College of Cornell University Contact Info: http://cbio.mskcc.org/~lianos/contact From dwinsemius at comcast.net Wed Jul 6 00:24:13 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Tue, 5 Jul 2011 18:24:13 -0400 Subject: [R] plotting survival curves (multiple curves on single graph) In-Reply-To: References: Message-ID: <1661B728-2020-472E-9830-ABF03E59E003@comcast.net> On Jul 5, 2011, at 6:08 PM, Trey Batey wrote: > Hello. > > This is a follow-up to a question I posted last week. With some > previous suggestions from the R-help community, I have been able to > plot survival (, hazard, and density) curves using published data for > Siler hazard parameters from a number of ethnographic populations. > Can the function below be modified, perhaps with a "for" statement, so > that multiple curves (different line types---one for each population) > are plotted on a single graph for comparison? Thanks so much.
> There are (at least) three methods to plot multiple curves in base plotting: -- plot() then lines() ?lines --plot(); par(add=TRUE); plot() ?par # There is also matplot() ?matplot After extracting the sil function to exist on its own, you could try: matplot(x=t, y=apply(hazanth[ ,3:7], 1, sil)) My first choice would be to make a modified version of your silsurv that uses the lines function rather than plot and then you can just use the lines of code you already have. -- David > --Trey > > The function and calls below use the data in this Excel file (feel > free to access): > https://docs.google.com/leaf?id=0B5zZGW2utJN0ZDk1NjA0ZjUtMWU0ZS00ZGQ3LWIxZTUtOWE0NGVmYWMxODJl&hl=en_US > > ## - plot Siler survival curve > ############################## > silsurv<-function(a1,b1,a2,a3,b3) > { > sil=function(t) > { > h.t<-a1*exp(-b1*t)+a2+a3*exp(b3*t) > S.t<-exp(-a1/b1*(1-exp(-b1*t))-a2*t+a3/b3*(1-exp(b3*t))) > d.t<-S.t*h.t > > #return(d.t) > return(S.t) > #return(h.t) > } > t<-seq(0,90,1) > plot(t,sil(t),ylim=c(0,1),type='l',cex.lab=0.8,cex.axis=0.75,ylab='S(t)',xlab='Age > (years)') > } > > with(hazanth[1,3:7],silsurv(a1=a1,b1=b1,a2=a2,a3=a3,b3=b3));title(main=hazanth[1,1],cex.main=0.9) > # plot for Hadza > with(hazanth[2,3:7],silsurv(a1=a1,b1=b1,a2=a2,a3=a3,b3=b3));title(main=hazanth[2,1],cex.main=0.9) > # plot for Ache > with(hazanth[3,3:7],silsurv(a1=a1,b1=b1,a2=a2,a3=a3,b3=b3));title(main=hazanth[3,1],cex.main=0.9) > # plot for Hiwi > with(hazanth[4,3:7],silsurv(a1=a1,b1=b1,a2=a2,a3=a3,b3=b3));title(main=hazanth[4,1],cex.main=0.9) > # plot for !Kung > with(hazanth[5,3:7],silsurv(a1=a1,b1=b1,a2=a2,a3=a3,b3=b3));title(main=hazanth[5,1],cex.main=0.9) > # plot for Yanomamo > with(hazanth[6,3:7],silsurv(a1=a1,b1=b1,a2=a2,a3=a3,b3=b3));title(main=hazanth[6,1],cex.main=0.9) > # plot for Tsimane > > ############################### > >
______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT From dwinsemius at comcast.net Wed Jul 6 00:26:48 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Tue, 5 Jul 2011 18:26:48 -0400 Subject: [R] plotting survival curves (multiple curves on single graph) In-Reply-To: <1661B728-2020-472E-9830-ABF03E59E003@comcast.net> References: <1661B728-2020-472E-9830-ABF03E59E003@comcast.net> Message-ID: <05B528BF-FA3A-4DB2-8031-14E6C6ACDCF7@comcast.net> On Jul 5, 2011, at 6:24 PM, David Winsemius wrote: > > On Jul 5, 2011, at 6:08 PM, Trey Batey wrote: > >> Hello. >> >> This is a follow-up to a question I posted last week. With some >> previous suggestions from the R-help community, I have been able to >> plot survival (, hazard, and density) curves using published data for >> Siler hazard parameters from a number of ethnographic populations. >> Can the function below be modified, perhaps with a "for" statement, >> so >> that multiple curves (different line types---one for each population) >> are plotted on a single graph for comparison? Thanks so much. >> > > There are (at least) three methods to plot multiple curves in base > plotting: > -- plot() then lines() > > ?lines > > --plot(); par(add=TRUE); plot() > Er, ... make that par(new=TRUE) > ?par > > # There is also matplot() > ?matplot > > After extracting the sil function to exist on its own, you could try: > > matplot(x=t, y=apply(hazanth[ ,3:7], 1, sil)) > > My first choice would be to make a modified version of your silsurv > that uses the lines function rather than plot and then you can just > use the lines of code you already have.
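As a rough sketch of the plot()-then-lines() route described above, the survival function can be pulled out on its own and called once per population. The parameter values and the names siler_surv, pars and age below are hypothetical and purely illustrative; they are not the published Siler estimates and not the hazanth data from the thread.

```r
# Sketch: overlay several Siler survival curves using plot() then lines().

# Siler survival function, vectorised over age (same formula as S.t above).
siler_surv <- function(age, a1, b1, a2, a3, b3) {
  exp(-a1 / b1 * (1 - exp(-b1 * age)) - a2 * age + a3 / b3 * (1 - exp(b3 * age)))
}

# Hypothetical parameters, for illustration only.
pars <- data.frame(
  pop = c("Population A", "Population B", "Population C"),
  a1  = c(0.35, 0.25, 0.20),
  b1  = c(1.0, 0.9, 1.1),
  a2  = c(0.010, 0.012, 0.008),
  a3  = c(0.00007, 0.00010, 0.00005),
  b3  = c(0.10, 0.09, 0.10),
  stringsAsFactors = FALSE
)

age <- seq(0, 90, 1)

# Set up empty axes once, then add one line per population.
plot(range(age), c(0, 1), type = "n", xlab = "Age (years)", ylab = "S(t)")
for (i in seq_len(nrow(pars))) {
  p <- pars[i, ]
  lines(age, siler_surv(age, p$a1, p$b1, p$a2, p$a3, p$b3), lty = i)
}
legend("topright", legend = pars$pop, lty = seq_len(nrow(pars)), bty = "n")
```

Using the row index as the line type gives each population its own style, and the legend reuses the same indices so the two stay in step.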
> > -- > David >> --Trey >> >> The function and calls below use the data in this Excel file (feel >> free to access): >> https://docs.google.com/leaf?id=0B5zZGW2utJN0ZDk1NjA0ZjUtMWU0ZS00ZGQ3LWIxZTUtOWE0NGVmYWMxODJl&hl=en_US >> >> ## - plot Siler survival curve >> ############################## >> silsurv<-function(a1,b1,a2,a3,b3) >> { >> sil=function(t) >> { >> h.t<-a1*exp(-b1*t)+a2+a3*exp(b3*t) >> S.t<-exp(-a1/b1*(1-exp(-b1*t))-a2*t+a3/b3*(1-exp(b3*t))) >> d.t<-S.t*h.t >> >> #return(d.t) >> return(S.t) >> #return(h.t) >> } >> t<-seq(0,90,1) >> plot >> (t >> ,sil >> (t >> ),ylim >> =c(0,1),type='l',cex.lab=0.8,cex.axis=0.75,ylab='S(t)',xlab='Age >> (years)') >> } >> >> with >> (hazanth >> [1,3 >> : >> 7 >> ],silsurv >> (a1 >> =a1,b1=b1,a2=a2,a3=a3,b3=b3));title(main=hazanth[1,1],cex.main=0.9) >> # plot for Hadza >> with >> (hazanth >> [2,3 >> : >> 7 >> ],silsurv >> (a1 >> =a1,b1=b1,a2=a2,a3=a3,b3=b3));title(main=hazanth[2,1],cex.main=0.9) >> # plot for Ache >> with >> (hazanth >> [3,3 >> : >> 7 >> ],silsurv >> (a1 >> =a1,b1=b1,a2=a2,a3=a3,b3=b3));title(main=hazanth[3,1],cex.main=0.9) >> # plot for Hiwi >> with >> (hazanth >> [4,3 >> : >> 7 >> ],silsurv >> (a1 >> =a1,b1=b1,a2=a2,a3=a3,b3=b3));title(main=hazanth[4,1],cex.main=0.9) >> # plot for !Kung >> with >> (hazanth >> [5,3 >> : >> 7 >> ],silsurv >> (a1 >> =a1,b1=b1,a2=a2,a3=a3,b3=b3));title(main=hazanth[5,1],cex.main=0.9) >> # plot for Yanomamo >> with >> (hazanth >> [6,3 >> : >> 7 >> ],silsurv >> (a1 >> =a1,b1=b1,a2=a2,a3=a3,b3=b3));title(main=hazanth[6,1],cex.main=0.9) >> # plot for Tsimane >> >> ############################### >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. 
> > David Winsemius, MD > West Hartford, CT > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD West Hartford, CT From gunter.berton at gene.com Wed Jul 6 00:26:56 2011 From: gunter.berton at gene.com (Bert Gunter) Date: Tue, 5 Jul 2011 15:26:56 -0700 Subject: [R] plotting survival curves (multiple curves on single graph) In-Reply-To: References: Message-ID: Yes. Trellis plots are in the "lattice" package. My bad. -- Bert On Tue, Jul 5, 2011 at 3:22 PM, Steve Lianoglou wrote: > Quick note: > > On Tue, Jul 5, 2011 at 6:16 PM, Bert Gunter wrote: >> Yes, it can be done using basic plot commands. >> >> But if you really want to get fancy and plot "grouped" graphs, I >> strongly recommend you look into R's packages -- ggplot or trellis. > > Attempting to clear out any confusion before it sets in: I'm pretty > sure Bert meant "lattice" instead of "trellis". > > -steve > >> Both have excellent documentation and companion books and ?were built >> for this sort of thing. The (considerable) learning curve will be >> worth the effort. >> >> Cheers, >> Bert >> >> On Tue, Jul 5, 2011 at 3:08 PM, Trey Batey wrote: >>> Hello. >>> >>> This is a follow-up to a question I posted last week. ?With some >>> previous suggestions from the R-help community, I have been able to >>> plot survival (, hazard, and density) curves using published data for >>> Siler hazard parameters from a number of ethnographic populations. >>> Can the function below be modified, perhaps with a "for" statement, so >>> that multiple curves (different line types---one for each population) >>> are plotted on a single graph for comparison? ?Thanks so much. 
>>> >>> --Trey >>> >>> The function and calls below use the data in this Excel file (feel >>> free to access): >>> https://docs.google.com/leaf?id=0B5zZGW2utJN0ZDk1NjA0ZjUtMWU0ZS00ZGQ3LWIxZTUtOWE0NGVmYWMxODJl&hl=en_US >>> >>> ## - plot Siler survival curve >>> ############################## >>> silsurv<-function(a1,b1,a2,a3,b3) >>> ?{ >>> ? ?sil=function(t) >>> ? ? ?{ >>> ? ? ? ?h.t<-a1*exp(-b1*t)+a2+a3*exp(b3*t) >>> ? ? ? ?S.t<-exp(-a1/b1*(1-exp(-b1*t))-a2*t+a3/b3*(1-exp(b3*t))) >>> ? ? ? ?d.t<-S.t*h.t >>> >>> ? ? ? ?#return(d.t) >>> ? ? ? ?return(S.t) >>> ? ? ? ?#return(h.t) >>> ? ? ?} >>> ? ?t<-seq(0,90,1) >>> plot(t,sil(t),ylim=c(0,1),type='l',cex.lab=0.8,cex.axis=0.75,ylab='S(t)',xlab='Age >>> (years)') >>> ?} >>> >>> with(hazanth[1,3:7],silsurv(a1=a1,b1=b1,a2=a2,a3=a3,b3=b3));title(main=hazanth[1,1],cex.main=0.9) >>> ?# plot for Hadza >>> with(hazanth[2,3:7],silsurv(a1=a1,b1=b1,a2=a2,a3=a3,b3=b3));title(main=hazanth[2,1],cex.main=0.9) >>> ?# plot for Ache >>> with(hazanth[3,3:7],silsurv(a1=a1,b1=b1,a2=a2,a3=a3,b3=b3));title(main=hazanth[3,1],cex.main=0.9) >>> ?# plot for Hiwi >>> with(hazanth[4,3:7],silsurv(a1=a1,b1=b1,a2=a2,a3=a3,b3=b3));title(main=hazanth[4,1],cex.main=0.9) >>> ?# plot for !Kung >>> with(hazanth[5,3:7],silsurv(a1=a1,b1=b1,a2=a2,a3=a3,b3=b3));title(main=hazanth[5,1],cex.main=0.9) >>> ?# plot for Yanomamo >>> with(hazanth[6,3:7],silsurv(a1=a1,b1=b1,a2=a2,a3=a3,b3=b3));title(main=hazanth[6,1],cex.main=0.9) >>> ?# plot for Tsimane >>> >>> ############################### >>> >>> ______________________________________________ >>> R-help at r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >>> >> >> >> >> -- >> "Men by nature long to get on to the ultimate truths, and will often >> be impatient with elementary studies or fight shy of them. 
If it were >> possible to reach the ultimate truths without the elementary studies >> usually prefixed to them, these would not be preparatory studies but >> superfluous diversions." >> >> -- Maimonides (1135-1204) >> >> Bert Gunter >> Genentech Nonclinical Biostatistics >> 467-7374 >> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > > > -- > Steve Lianoglou > Graduate Student: Computational Systems Biology > ?| Memorial Sloan-Kettering Cancer Center > ?| Weill Medical College of Cornell University > Contact Info: http://cbio.mskcc.org/~lianos/contact > -- "Men by nature long to get on to the ultimate truths, and will often be impatient with elementary studies or fight shy of them. If it were possible to reach the ultimate truths without the elementary studies usually prefixed to them, these would not be preparatory studies but superfluous diversions." -- Maimonides (1135-1204) Bert Gunter Genentech Nonclinical Biostatistics 467-7374 http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm From jholtman at gmail.com Wed Jul 6 00:43:20 2011 From: jholtman at gmail.com (jim holtman) Date: Tue, 5 Jul 2011 18:43:20 -0400 Subject: [R] How to build a matrix of number of appearance? In-Reply-To: <1309859105151-3645550.post@n4.nabble.com> References: <1309772913259-3643248.post@n4.nabble.com> <17867D87-8C02-40DC-B647-B8C5DE41FD56@comcast.net> <1309859105151-3645550.post@n4.nabble.com> Message-ID: Provide some more information about the size of the data and the number of different ID combinations. 
I have found that in some cases like this using the 'sqldf' package helps since it can deal with large number of combinations. On Tue, Jul 5, 2011 at 5:45 AM, UriB wrote: > Thanks for your reply > Note that I guess that there are many providerID and I get the error cannot > allocate vector of size 2.1 Gb > (I can use the same trick for most of the other fields) > > Is there a way to do the same only for providerID with relatively high > frequency? > > -- > View this message in context: http://r.789695.n4.nabble.com/How-to-build-a-matrix-of-number-of-appearance-tp3643248p3645550.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? From jholtman at gmail.com Wed Jul 6 01:13:10 2011 From: jholtman at gmail.com (jim holtman) Date: Tue, 5 Jul 2011 19:13:10 -0400 Subject: [R] Output data frame using write.table In-Reply-To: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEC06@KCL-MAIL01.kclad.ds.kcl.ac.uk> References: <5BDAC30CDCA4184E9E5E57167D4E2C96018E5C8FEC06@KCL-MAIL01.kclad.ds.kcl.ac.uk> Message-ID: Use 'write.csv' and then use EXCEL as the way of formatting the output in the way that you like it. Otherwise you want use 'sprintf' to specify how you want the formatting done. On Tue, Jul 5, 2011 at 1:17 PM, Bansal, Vikas wrote: > Dear all, > > I have a data frame whose name is m1. > I want to write this data frame in text file as output.I am using this code- > > write.table(m1, file = "kas.txt", append = FALSE,row.names=F,quote=F,sep="\t") > > When I am opening my kas.txt file,the column names are not coming exactly above the column. > What should I do.Please help me. 
> > > Thanking you, > Warm Regards > Vikas Bansal > Msc Bioinformatics > Kings College London > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? From irene_vrbik at hotmail.com Tue Jul 5 22:29:35 2011 From: irene_vrbik at hotmail.com (statfan) Date: Tue, 5 Jul 2011 13:29:35 -0700 (PDT) Subject: [R] sampling from the multivariate truncated normal In-Reply-To: <4E082AD4.6000806@statistik.tu-dortmund.de> References: <1309116392642-3626438.post@n4.nabble.com> <4E082AD4.6000806@statistik.tu-dortmund.de> Message-ID: <1309897775594-3647039.post@n4.nabble.com> Well, for 0.828324 < x[2] < Inf the probablility is roughly 0 hence not easy to draw random numbers out there .... Uwe Ligges How is this probability roughly 0? -- View this message in context: http://r.789695.n4.nabble.com/sampling-from-the-multivariate-truncated-normal-tp3626438p3647039.html Sent from the R help mailing list archive at Nabble.com. From quagaars at gmail.com Wed Jul 6 01:16:38 2011 From: quagaars at gmail.com (Q) Date: Tue, 5 Jul 2011 16:16:38 -0700 (PDT) Subject: [R] Create a data frame of all possible unique combinations of factors Message-ID: <1309907798308-3647338.post@n4.nabble.com> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From btsai00 at gmail.com Wed Jul 6 00:03:17 2011 From: btsai00 at gmail.com (Brian Tsai) Date: Tue, 5 Jul 2011 15:03:17 -0700 Subject: [R] function to compute pvalue for comparing two ROC curves Message-ID: An embedded and charset-unspecified text was scrubbed... 
Name: not available URL: From jholtman at gmail.com Wed Jul 6 02:03:09 2011 From: jholtman at gmail.com (jim holtman) Date: Tue, 5 Jul 2011 20:03:09 -0400 Subject: [R] Create a data frame of all possible unique combinations of factors In-Reply-To: <1309907798308-3647338.post@n4.nabble.com> References: <1309907798308-3647338.post@n4.nabble.com> Message-ID: Is this what you want: > test <- c("A","B","C","D") > expand.grid(test,test) Var1 Var2 1 A A 2 B A 3 C A 4 D A 5 A B 6 B B 7 C B 8 D B 9 A C 10 B C 11 C C 12 D C 13 A D 14 B D 15 C D 16 D D On Tue, Jul 5, 2011 at 7:16 PM, Q wrote: > Hello, > > I'm trying to create a data frame where each row has a unique combination of > factors. > > I start with a vector of species like so: > > > >> 1> test <- c("A","B","C","D") >> > >> 1> test >> > >> [1] "A" "B" "C" "D" >> > > To get all species combinations I have used expand.grid like this: > > > >> 1> pairs <- expand.grid(test,test) > >> 1> pairs > >> ? ?Var1 Var2 > >> 1 ? ? A ? ?A > >> 2 ? ? B ? ?A > >> 3 ? ? C ? ?A > >> 4 ? ? D ? ?A > >> 5 ? ? A ? ?B > >> 6 ? ? B ? ?B > >> 7 ? ? C ? ?B > >> 8 ? ? D ? ?B > >> 9 ? ? A ? ?C > >> 10 ? ?B ? ?C > >> 11 ? ?C ? ?C > >> 12 ? ?D ? ?C > >> 13 ? ?A ? ?D > >> 14 ? ?B ? ?D > >> 15 ? ?C ? ?D > >> 16 ? ?D ? ?D >> > > Now I want to select only the unique pairs, which I have tried to do with > the function "unique": > > > >> 1> unique(pairs) >> > > , but that doesn't do anything... I guess because it considers A,B to be > different from B,A. ?The data frame I would like to end up with should look > like this. > > > >> ? ?Var1 Var2 > >> 1 ? ? A ? ?A > >> 2 ? ? B ? ?A > >> 3 ? ? C ? ?A > >> 4 ? ? D ? ?A > >> 6 ? ? B ? ?B > >> 7 ? ? C ? ?B > >> 8 ? ? D ? ?B > >> 11 ? ?C ? ?C > >> 12 ? ?D ? ?C > >> 16 ? ?D ? ?D > >> > > Thanks for your help! 
>
> Q
>
> --
> View this message in context: http://r.789695.n4.nabble.com/Create-a-data-frame-of-all-possible-unique-combinations-of-factors-tp3647338p3647338.html
> Sent from the R help mailing list archive at Nabble.com.
>        [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?

From jholtman at gmail.com Wed Jul 6 02:08:24 2011
From: jholtman at gmail.com (jim holtman)
Date: Tue, 5 Jul 2011 20:08:24 -0400
Subject: [R] Create a data frame of all possible unique combinations of factors
In-Reply-To: <1309907798308-3647338.post@n4.nabble.com>
References: <1309907798308-3647338.post@n4.nabble.com>
Message-ID:

Missed that you wanted to elim duplicated:

> z <- expand.grid(test,test)
> # add 'unique' key
> z$key <- apply(z, 1, function(x)paste(sort(x), collapse=''))
> str(z)
'data.frame':   16 obs. of  3 variables:
 $ Var1: Factor w/ 4 levels "A","B","C","D": 1 2 3 4 1 2 3 4 1 2 ...
 $ Var2: Factor w/ 4 levels "A","B","C","D": 1 1 1 1 2 2 2 2 3 3 ...
 $ key : chr  "AA" "AB" "AC" "AD" ...
 - attr(*, "out.attrs")=List of 2
  ..$ dim     : int  4 4
  ..$ dimnames:List of 2
  .. ..$ Var1: chr  "Var1=A" "Var1=B" "Var1=C" "Var1=D"
  .. ..$ Var2: chr  "Var2=A" "Var2=B" "Var2=C" "Var2=D"
> subset(z, !duplicated(z$key))
   Var1 Var2 key
1     A    A  AA
2     B    A  AB
3     C    A  AC
4     D    A  AD
6     B    B  BB
7     C    B  BC
8     D    B  BD
11    C    C  CC
12    D    C  CD
16    D    D  DD
>

On Tue, Jul 5, 2011 at 7:16 PM, Q wrote:
> Hello,
>
> I'm trying to create a data frame where each row has a unique combination of
> factors.
>
> I start with a vector of species like so:
>
>> 1> test <- c("A","B","C","D")
>> 1> test
>> [1] "A" "B" "C" "D"
>
> To get all species combinations I have used expand.grid like this:
>
>> 1> pairs <- expand.grid(test,test)
>> 1> pairs
>>    Var1 Var2
>> 1     A    A
>> 2     B    A
>> 3     C    A
>> 4     D    A
>> 5     A    B
>> 6     B    B
>> 7     C    B
>> 8     D    B
>> 9     A    C
>> 10    B    C
>> 11    C    C
>> 12    D    C
>> 13    A    D
>> 14    B    D
>> 15    C    D
>> 16    D    D
>
> Now I want to select only the unique pairs, which I have tried to do with
> the function "unique":
>
>> 1> unique(pairs)
>
> , but that doesn't do anything... I guess because it considers A,B to be
> different from B,A.  The data frame I would like to end up with should look
> like this.
>
>>    Var1 Var2
>> 1     A    A
>> 2     B    A
>> 3     C    A
>> 4     D    A
>> 6     B    B
>> 7     C    B
>> 8     D    B
>> 11    C    C
>> 12    D    C
>> 16    D    D
>
> Thanks for your help!
>
> Q
>
> --
> View this message in context: http://r.789695.n4.nabble.com/Create-a-data-frame-of-all-possible-unique-combinations-of-factors-tp3647338p3647338.html
> Sent from the R help mailing list archive at Nabble.com.
>        [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
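
A minimal sketch of the key-based de-duplication discussed in this thread, written so that it can be pasted into a fresh session. It assumes the four-letter `test` vector from the question. The explicit column names and `stringsAsFactors = FALSE` are choices made here so that each row sorts as plain character strings; they are not part of the original reply.

```r
# All ordered pairs of the four labels (16 rows).
test <- c("A", "B", "C", "D")
z <- expand.grid(Var1 = test, Var2 = test, stringsAsFactors = FALSE)

# Build an order-independent key: sort the two labels in each row,
# then paste them together, so (B, A) and (A, B) both become "AB".
z$key <- apply(z[, c("Var1", "Var2")], 1,
               function(x) paste(sort(x), collapse = ""))

# Keep only the first row seen for each key.
uniq <- z[!duplicated(z$key), ]
nrow(uniq)   # 4 self-pairs plus choose(4, 2) = 6 mixed pairs
```

The key column is only a convenience. Sorting within rows and calling duplicated() on the sorted rows, as suggested elsewhere in the thread, should select the same rows without adding a column.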

From dwinsemius at comcast.net Wed Jul 6 02:08:20 2011
From: dwinsemius at comcast.net (David Winsemius)
Date: Tue, 5 Jul 2011 20:08:20 -0400
Subject: [R] Create a data frame of all possible unique combinations of factors
In-Reply-To: <1309907798308-3647338.post@n4.nabble.com>
References: <1309907798308-3647338.post@n4.nabble.com>
Message-ID:

On Jul 5, 2011, at 7:16 PM, Q wrote:

> Hello,
>
> I'm trying to create a data frame where each row has a unique
> combination of factors.
> I start with a vector of species like so:
>
>> 1> test <- c("A","B","C","D")
>> 1> test
>> [1] "A" "B" "C" "D"
>
> To get all species combinations I have used expand.grid like this:
>
>> 1> pairs <- expand.grid(test,test)
>> 1> pairs
>>    Var1 Var2
>> 1     A    A
>> 2     B    A
>> 3     C    A
>> 4     D    A
>> 5     A    B
> snipped
> Now I want to select only the unique pairs, which I have tried to do
> with the function "unique":
>
>> 1> unique(pairs)

You want the duplicated function or, more precisely, its negation, ...
and you need to sort within rows if you want (b,a) to look like (a,b).

> , but that doesn't do anything... I guess because it considers A,B
> to be different from B,A. The data frame I would like to end up with
> should look like this.

> pairs[!duplicated(t(apply(pairs, 1, sort))), ]
   Var1 Var2
1     A    A
2     B    A
3     C    A
4     D    A
6     B    B
7     C    B
8     D    B
11    C    C
12    D    C
16    D    D

The t( ) is needed to get the row-wise arrangement restored after apply
transposed it.

> [[alternative HTML version deleted]]

Nabble allows posting in plain text. You should.
>
Please --->
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html

--
David Winsemius, MD
West Hartford, CT

From djmuser at gmail.com Wed Jul 6 02:29:18 2011
From: djmuser at gmail.com (Dennis Murphy)
Date: Tue, 5 Jul 2011 17:29:18 -0700
Subject: [R] plotting survival curves (multiple curves on single graph)
In-Reply-To:
References:
Message-ID:

Hi:

Here's one way to put all the plots in one graph using ggplot2 and a
couple of tricks using the plyr package. You could take the data frame
I generate below and use it as input to lattice graphics if you
prefer. For groupwise plots, either as an ensemble or as separate
panels, these packages are usually less work than an equivalent base
graphic. OTOH, if you want a graphic to be 'just so', it may be better
to use base graphics since both ggplot2 and lattice make certain
design decisions that are not always trivial to override. I concur
with Bert's comments re lattice and ggplot2, though.

The first step is to produce a suitable data frame. To generate the
Siler curves for each group, one can use the mapply() function in base
R or the mdply() function in the plyr package. I chose the latter, but
some post-processing is necessary. Once the curves are obtained, the
next step is to generate a data frame that contains the curves which
will subsequently be used as input to render the curves by group.

library(ggplot2)    # also loads package plyr

time <- 0L:90L

# Function to generate a Siler survival curve
siler <- function(a1, b1, a2, a3, b3) {
    exp(-a1/b1 * (1 - exp(-b1 * time)) - a2 * time +
          a3/b3 * (1 - exp(b3 * time)))
}

# See explanation below
silerCurves <- as.vector(t(mdply(hazanth[, 3:7], siler)[, 6:96]))

# Generate a data frame with the populations, times and curve values
silerdat <- data.frame(pop = rep(hazanth[['pop']], each = length(time)),
                       time = rep(time, nrow(hazanth)),
                       surv = silerCurves)

# Basic form of graph:
ggplot(silerdat, aes(x = time, y = surv, colour = pop)) +
    geom_line(size = 1) +
    xlab('Years') + ylab('S(t)')

To get the Siler curves, I used that part of the hazanth data frame
containing the parameters that determine a curve. The mdply() function
operates rowwise, and substitutes the values of the parameters into
the calling function (siler) to generate each curve. (The order of the
parameters in the input data frame has to be exactly the same as in
the calling function.) The result is a 6 x 96 data frame, but the
first five columns are the input parameters, which are no longer
needed; moreover, we want to transpose the result (t) and then stack
the columns one below the other (i.e., the first set of curve values
is on top, the last on the bottom), which is what as.vector() does.
The new data frame silerdat then associates a population with each
curve and matches the times corresponding to the curve values. This
data frame is then input into ggplot().

If ggplot2 is new to you, the basic idea is that a plot is assembled
incrementally in 'layers'. The '+' sign indicates that a new layer is
being added to the existing plot. 'Geoms' represent different
geometric elements of a plot; in this case, geom_line() is used to
draw a line through the (time, surv) pairs of each population. Color
is an 'aesthetic' that is used here to distinguish among the various
populations.
A default legend is produced with the plot, which can be optionally repositioned or modified. The on-line help pages for ggplot2 are found at http://had.co.nz/ggplot2/ Scroll down to 'Reference manual'. A freely available chapter of the ggplot2 book is a useful way to get started with the package; see the 'ggplot book' heading on the same page and click on the book website to gain access to it. Similar graphs using the lattice package are obtained as follows, with a couple of choices of legend: # Legend inside the plot region mykey <- list( corner = c(1, 1), title = 'Population', cex.title = 1.1, text = list(levels(silerdat$pop), cex = 0.8), lines = list(col = 1:6, lwd = 2)) xyplot(surv ~ time, data = silerdat, groups = pop, type = 'l', lwd = 2, col.line = 1:6, key = mykey, xlab = 'Years', ylab = 'S(t)') # Legend outside the plot region (similar to the ggplot() above) mykey2 <- list( space = 'right', title = 'Population', cex.title = 1.1, text = list(levels(silerdat$pop), cex = 0.8), lines = list(col = 1:6, lwd = 2)) xyplot(surv ~ time, data = silerdat, groups = pop, type = 'l', lwd = 2, col.line = 1:6, key = mykey2, xlab = 'Years', ylab = 'S(t)') HTH, Dennis On Tue, Jul 5, 2011 at 3:08 PM, Trey Batey wrote: > Hello. > > This is a follow-up to a question I posted last week. ?With some > previous suggestions from the R-help community, I have been able to > plot survival (, hazard, and density) curves using published data for > Siler hazard parameters from a number of ethnographic populations. > Can the function below be modified, perhaps with a "for" statement, so > that multiple curves (different line types---one for each population) > are plotted on a single graph for comparison? ?Thanks so much. 
> > --Trey > > The function and calls below use the data in this Excel file (feel > free to access): > https://docs.google.com/leaf?id=0B5zZGW2utJN0ZDk1NjA0ZjUtMWU0ZS00ZGQ3LWIxZTUtOWE0NGVmYWMxODJl&hl=en_US > > ## - plot Siler survival curve > ############################## > silsurv<-function(a1,b1,a2,a3,b3) > ?{ > ? ?sil=function(t) > ? ? ?{ > ? ? ? ?h.t<-a1*exp(-b1*t)+a2+a3*exp(b3*t) > ? ? ? ?S.t<-exp(-a1/b1*(1-exp(-b1*t))-a2*t+a3/b3*(1-exp(b3*t))) > ? ? ? ?d.t<-S.t*h.t > > ? ? ? ?#return(d.t) > ? ? ? ?return(S.t) > ? ? ? ?#return(h.t) > ? ? ?} > ? ?t<-seq(0,90,1) > plot(t,sil(t),ylim=c(0,1),type='l',cex.lab=0.8,cex.axis=0.75,ylab='S(t)',xlab='Age > (years)') > ?} > > with(hazanth[1,3:7],silsurv(a1=a1,b1=b1,a2=a2,a3=a3,b3=b3));title(main=hazanth[1,1],cex.main=0.9) > ?# plot for Hadza > with(hazanth[2,3:7],silsurv(a1=a1,b1=b1,a2=a2,a3=a3,b3=b3));title(main=hazanth[2,1],cex.main=0.9) > ?# plot for Ache > with(hazanth[3,3:7],silsurv(a1=a1,b1=b1,a2=a2,a3=a3,b3=b3));title(main=hazanth[3,1],cex.main=0.9) > ?# plot for Hiwi > with(hazanth[4,3:7],silsurv(a1=a1,b1=b1,a2=a2,a3=a3,b3=b3));title(main=hazanth[4,1],cex.main=0.9) > ?# plot for !Kung > with(hazanth[5,3:7],silsurv(a1=a1,b1=b1,a2=a2,a3=a3,b3=b3));title(main=hazanth[5,1],cex.main=0.9) > ?# plot for Yanomamo > with(hazanth[6,3:7],silsurv(a1=a1,b1=b1,a2=a2,a3=a3,b3=b3));title(main=hazanth[6,1],cex.main=0.9) > ?# plot for Tsimane > > ############################### > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
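
For comparison with the ggplot2 and lattice versions in this thread, a rough base-graphics sketch of the same idea (one curve per population on a single plot, one line type each) might look like the following. The parameter values and the labels P1 and P2 are invented for illustration and are not taken from the hazanth spreadsheet; only the survival formula comes from the original post. The names `pars`, `age` and `S` are likewise placeholders.

```r
# Hypothetical Siler parameters for two made-up populations.
# These are NOT the values from the hazanth spreadsheet.
pars <- data.frame(pop = c("P1", "P2"),
                   a1 = c(0.35, 0.20), b1 = c(0.9, 1.1),
                   a2 = c(0.010, 0.008),
                   a3 = c(0.00007, 0.00005), b3 = c(0.08, 0.09),
                   stringsAsFactors = FALSE)

age <- 0:90

# Siler survival function, same formula as in the original post.
siler <- function(a1, b1, a2, a3, b3, t = age) {
  exp(-a1 / b1 * (1 - exp(-b1 * t)) - a2 * t + a3 / b3 * (1 - exp(b3 * t)))
}

# One column of survival values per population (a 91 x 2 matrix here).
S <- mapply(siler, pars$a1, pars$b1, pars$a2, pars$a3, pars$b3)
colnames(S) <- pars$pop

# All curves on one plot, distinguished by line type.
matplot(age, S, type = "l", lty = seq_len(ncol(S)), col = 1,
        ylim = c(0, 1), xlab = "Age (years)", ylab = "S(t)")
legend("topright", legend = pars$pop, lty = seq_len(ncol(S)), bty = "n")
```

With the real data, the five parameter columns of hazanth would presumably take the place of `pars`, assuming they are named a1, b1, a2, a3 and b3 as the with() calls in the original post suggest.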
> From jdnewmil at dcn.davis.ca.us Wed Jul 6 02:41:02 2011 From: jdnewmil at dcn.davis.ca.us (Jeff Newmiller) Date: Tue, 05 Jul 2011 17:41:02 -0700 Subject: [R] =?utf-8?q?Create_a_data_frame_of_all_possible_unique_combinat?= =?utf-8?q?ions_of=09factors?= In-Reply-To: <1309907798308-3647338.post@n4.nabble.com> References: <1309907798308-3647338.post@n4.nabble.com> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From jdnewmil at dcn.davis.ca.us Wed Jul 6 03:12:31 2011 From: jdnewmil at dcn.davis.ca.us (Jeff Newmiller) Date: Tue, 05 Jul 2011 18:12:31 -0700 Subject: [R] Tables and merge In-Reply-To: <8ACAAC0A4F0B43568802DAE967A4B6EE@ccePC> References: <8ACAAC0A4F0B43568802DAE967A4B6EE@ccePC> Message-ID: <94699d18-251d-46a5-80c4-6c32028e5ffd@email.android.com> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From xy11 at caa.columbia.edu Wed Jul 6 03:28:03 2011 From: xy11 at caa.columbia.edu (Xiao Yang) Date: Tue, 5 Jul 2011 21:28:03 -0400 Subject: [R] arma estimated return Message-ID: Hi I am new to time series analysis using R. does anyone know what the estimated long term average of return means. I am doing an arma model fitting of exchange rates, and the question I have been asked is to estimate the long term average for the returns. Is this same as the intercept term? I model the log of the exchange rate as a ma(1) and got 0 for the intercept. any help will be great thanks P From rolf.turner at xtra.co.nz Wed Jul 6 03:40:14 2011 From: rolf.turner at xtra.co.nz (Rolf Turner) Date: Wed, 06 Jul 2011 13:40:14 +1200 Subject: [R] arma estimated return In-Reply-To: References: Message-ID: <4E13BCFE.7040007@xtra.co.nz> On 06/07/11 13:28, Xiao Yang wrote: > Hi > > I am new to time series analysis using R. does anyone know what the > estimated long term average of return means. 
I am doing an arma model > fitting of exchange rates, and the question I have been asked is to > estimate the long term average for the returns. Is this same as the > intercept term? I model the log of the exchange rate as a ma(1) and > got 0 for the intercept. > > any help will be great If this is a homework problem, ask your instructor. cheers, Rolf Turner From walmeszeviani at gmail.com Wed Jul 6 03:56:58 2011 From: walmeszeviani at gmail.com (Walmes Zeviani) Date: Tue, 5 Jul 2011 22:56:58 -0300 Subject: [R] Tables and merge In-Reply-To: References: <8ACAAC0A4F0B43568802DAE967A4B6EE@ccePC> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From jtlutz at bsu.edu Wed Jul 6 02:49:37 2011 From: jtlutz at bsu.edu (Lutz, Jacob T.) Date: Tue, 5 Jul 2011 20:49:37 -0400 Subject: [R] Retaining ID # with factor.scores procedure Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From n.bowora at gmail.com Wed Jul 6 04:07:06 2011 From: n.bowora at gmail.com (EdBo) Date: Tue, 5 Jul 2011 19:07:06 -0700 (PDT) Subject: [R] loop in optim In-Reply-To: References: <1309772071774-3643230.post@n4.nabble.com> <1309836074873-3645031.post@n4.nabble.com> Message-ID: <1309918026000-3647549.post@n4.nabble.com> Hi Josh I have run the code and the structure of the output is what I wanted. However, the code is giving an identical result for all runs. I have attached the code I ran below as well as the output. I have just changed number of runs to match with the size of the data. 

a=read.table("D:/hope.txt",header=T)
attach(a)
a
 #likilihood function
llik = function(x)
   {
    al_j=x[1]; au_j=x[2]; sigma_j=x[3];  b_j=x[4]
    sum(na.rm=T,
        ifelse(a$R_j< 0, -log(1/(2*pi*(sigma_j^2)))-
                           (1/(2*(sigma_j^2))*(a$R_j+al_j-b_j*a$R_m))^2,

         ifelse(a$R_j>0 , -log(1/(2*pi*(sigma_j^2)))-
                           (1/(2*(sigma_j^2))*(a$R_j+au_j-b_j*a$R_m))^2,

                -log(ifelse (( pnorm (au_j, mean=b_j * a$R_m,
                                      sd= sqrt(sigma_j^2))-
                         pnorm(al_j, mean=b_j * a$R_m, sd=sqrt (sigma_j^2) )) > 0,
                     (pnorm (au_j,mean=b_j * a$R_m, sd= sqrt(sigma_j^2))-
                           pnorm(al_j, mean=b_j * a$R_m, sd= sqrt(sigma_j^2) )),
                     1)) ))
      )
   }
start.par = c(-0.01,0.01,0.1,1)
#looping now
runs=133/20+1 #total data points divided by number od days in each quater+1
out <- matrix(NA, nrow = runs, ncol = 4,
 dimnames = list(paste("Quater:", 1:runs, sep = ''),
 c("al_j", "au_j", "sigma_j", "b_j")))

 for (i in 1:runs) {
   a[seq(20 * (i - 1) +1, 20 * i), ]
   out[i, ] <- optim(llik, par = start.par, method = "Nelder-Mead")[[1]]
 }
out
#results I am getting
> out
               al_j       au_j       sigma_j      b_j
Quater:1 0.04001525 0.06006251 -7.171336e-25 1.049982
Quater:2 0.04001525 0.06006251 -7.171336e-25 1.049982
Quater:3 0.04001525 0.06006251 -7.171336e-25 1.049982
Quater:4 0.04001525 0.06006251 -7.171336e-25 1.049982
Quater:5 0.04001525 0.06006251 -7.171336e-25 1.049982
Quater:6 0.04001525 0.06006251 -7.171336e-25 1.049982
Quater:7 0.04001525 0.06006251 -7.171336e-25 1.049982
>

--
View this message in context: http://r.789695.n4.nabble.com/loop-in-optim-tp3643230p3647549.html
Sent from the R help mailing list archive at Nabble.com.

From Yang.Lu at williams.edu Wed Jul 6 02:46:52 2011
From: Yang.Lu at williams.edu (Yang Lu)
Date: Tue, 5 Jul 2011 20:46:52 -0400 (EDT)
Subject: [R] how to best present concentrated data points/ ggplot2
Message-ID: <201107060046.001791@miram7700b.williams.edu>

Hi all,

I am trying to plot a weighted density plot for two different types
and want to show the data points on the x axis. The code is as
follows. The data points are very concentrated.
Is there a better way to present it (should I set the alpha value or
something else)?

Thanks!

YL

library(ggplot2)
x <- rnorm(10000)
a <- rnorm(5000)
b <- rnorm(5000)
weights.x <- abs(a/sum(a))
weights.y <- abs(b/sum(b))
weight <- c(weights.x, weights.y)
ze <- rep(0,10000)
type <- c(rep("a",5000), rep("b",5000))
d <- data.frame(expo = x, weight = weight, type = type, ze = ze)
m <- ggplot(d, aes(x = expo, group = type, col = type, weight = weight))
m+geom_density()+geom_point(aes(x = expo, y = ze, shape = type))

From quagaars at gmail.com Wed Jul 6 02:13:10 2011
From: quagaars at gmail.com (Q)
Date: Tue, 5 Jul 2011 17:13:10 -0700 (PDT)
Subject: [R] Create a data frame of all possible unique combinations of factors
In-Reply-To:
References: <1309907798308-3647338.post@n4.nabble.com>
Message-ID: <1309911190946-3647415.post@n4.nabble.com>

Ah! I like the idea. Thanks!

jholtman wrote:
>
> Missed that you wanted to elim duplicated:
>
>> z <- expand.grid(test,test)
>> # add 'unique' key
>> z$key <- apply(z, 1, function(x)paste(sort(x), collapse=''))
>> str(z)
> 'data.frame': 16 obs. of 3 variables:
>  $ Var1: Factor w/ 4 levels "A","B","C","D": 1 2 3 4 1 2 3 4 1 2 ...
>  $ Var2: Factor w/ 4 levels "A","B","C","D": 1 1 1 1 2 2 2 2 3 3 ...
>  $ key : chr "AA" "AB" "AC" "AD" ...
>  - attr(*, "out.attrs")=List of 2
>   ..$ dim : int 4 4
>   ..$ dimnames:List of 2
>   .. ..$ Var1: chr "Var1=A" "Var1=B" "Var1=C" "Var1=D"
>   .. ..$ Var2: chr "Var2=A" "Var2=B" "Var2=C" "Var2=D"
>

--
View this message in context: http://r.789695.n4.nabble.com/Create-a-data-frame-of-all-possible-unique-combinations-of-factors-tp3647338p3647415.html
Sent from the R help mailing list archive at Nabble.com.
From jwiley.psych at gmail.com Wed Jul 6 05:26:54 2011 From: jwiley.psych at gmail.com (Joshua Wiley) Date: Tue, 5 Jul 2011 20:26:54 -0700 Subject: [R] loop in optim In-Reply-To: <1309918026000-3647549.post@n4.nabble.com> References: <1309772071774-3643230.post@n4.nabble.com> <1309836074873-3645031.post@n4.nabble.com> <1309918026000-3647549.post@n4.nabble.com> Message-ID: On Tue, Jul 5, 2011 at 7:07 PM, EdBo wrote: > Hi Josh > > I have run the code and the structure of the output is what I wanted. > However, the code is giving an identical result for all runs. Right because the object "a" stays the same for all runs. > > I have attached the code I ran below as well as the output. I have just > changed number of runs to match with the size of the data. > > a=read.table("D:/hope.txt",header=T) > attach(a) > a > ?#likilihood function > llik = function(x) > ? ?{ > ? ? al_j=x[1]; au_j=x[2]; sigma_j=x[3]; ?b_j=x[4] > ? ? sum(na.rm=T, > ? ? ? ? ifelse(a$R_j< 0, -log(1/(2*pi*(sigma_j^2)))- > ? ? ? ? ? ? ? ? ? ? ? ? ? ?(1/(2*(sigma_j^2))*(a$R_j+al_j-b_j*a$R_m))^2, > > ? ? ? ? ?ifelse(a$R_j>0 , -log(1/(2*pi*(sigma_j^2)))- > ? ? ? ? ? ? ? ? ? ? ? ? ? ?(1/(2*(sigma_j^2))*(a$R_j+au_j-b_j*a$R_m))^2, > > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?-log(ifelse (( pnorm (au_j, mean=b_j * a$R_m, > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?sd= sqrt(sigma_j^2))- > ? ? ? ? ? ? ? ? ? ? ? ? ?pnorm(al_j, mean=b_j * a$R_m, sd=sqrt (sigma_j^2) > )) > 0, > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?(pnorm (au_j,mean=b_j * a$R_m, sd= sqrt(sigma_j^2))- > ? ? ? ? ? ? ? ? ? ? ? ? ? ?pnorm(al_j, mean=b_j * a$R_m, sd= > sqrt(sigma_j^2) )), > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?1)) )) > ? ? ? ) > ? ?} > start.par = c(-0.01,0.01,0.1,1) > #looping now > runs=133/20+1 #total data points divided by number od days in each quater+1 > out <- matrix(NA, nrow = runs, ncol = 4, > ?dimnames = list(paste("Quater:", 1:runs, sep = ''), > ?c("al_j", "au_j", "sigma_j", "b_j"))) > > ?for (i in 1:runs) { > ? 
a[seq(20 * (i - 1) +1, 20 * i), ]

Note that this is not what my original code does. In my code, I stored the full dataset in an object called "afull"; the object "a" is then assigned as a subset of the rows from afull. Since the likelihood function is coded to reference "a", as "a" changes, the estimates change. Subsetting without assigning the output anywhere does not actually change "a", so the likelihood function references the full dataset.

Also, the way the function is written, there is no need to attach "a", and attaching can be rather dangerous when you are making changes: when you attach an object, a copy is created, but that copy is not updated by later assignments. For example:

dat <- data.frame(x = 1:10)
attach(dat)
# look at "x" from attached data frame
x
# now overwrite x in the data frame
dat$x <- rnorm(10)
# compare
dat$x
x

These are no longer the same, even though you may be expecting them to be. To make them match again you would need to:

## remove copies
detach(dat)
## re-attach() so it now includes the updated version
attach(dat)
x

>   out[i, ] <- optim(llik, par = start.par, method = "Nelder-Mead")[[1]]
>  }
> out
> #results I am getting
>> out
>                al_j       au_j       sigma_j      b_j
> Quater:1 0.04001525 0.06006251 -7.171336e-25 1.049982
> Quater:2 0.04001525 0.06006251 -7.171336e-25 1.049982
> Quater:3 0.04001525 0.06006251 -7.171336e-25 1.049982
> Quater:4 0.04001525 0.06006251 -7.171336e-25 1.049982
> Quater:5 0.04001525 0.06006251 -7.171336e-25 1.049982
> Quater:6 0.04001525 0.06006251 -7.171336e-25 1.049982
> Quater:7 0.04001525 0.06006251 -7.171336e-25 1.049982
>>
>
> --
> View this message in context: http://r.789695.n4.nabble.com/loop-in-optim-tp3643230p3647549.html
> Sent from the R help mailing list archive at Nabble.com.
> > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles https://joshuawiley.com/ From izahn at psych.rochester.edu Wed Jul 6 05:45:37 2011 From: izahn at psych.rochester.edu (Ista Zahn) Date: Tue, 5 Jul 2011 23:45:37 -0400 Subject: [R] how to best present concentrated data points/ ggplot2 In-Reply-To: <201107060046.001791@miram7700b.williams.edu> References: <201107060046.001791@miram7700b.williams.edu> Message-ID: Hi Yang, Strategies for dealing with overplotting include transparency, size, and jittering. In your example you'll probably need all three. m + geom_point(aes(x = expo, y = ze, shape = type), size = 1, alpha = .2, position = position_jitter(width = 0, height = 5)) + geom_density() seems to work OK. Best, Ista On Tue, Jul 5, 2011 at 8:46 PM, Yang Lu wrote: > > Hi all, > > I am trying to plot a weighted density plot for two different types and want to show the data points on the x axis. > > The code is as follows. The data points are very concentrated. Is there a better way to present it( should I set the alpha value or something else)? > > Thanks! 
> > YL > > library(ggplot2) > > x <- rnorm(10000) > > a <- rnorm(5000) > > b <- rnorm(5000) > > weights.x <- abs(a/sum(a)) > > weights.y <- abs(b/sum(b)) > > weight <- c(weights.x, weights.y) > > ze <- rep(0,10000) > > type <- c(rep("a",5000), rep("b",5000)) > > d <- data.frame(expo = x, weight = weight, type = type, ze = ze) > > m <- ggplot(d, aes(x = expo, group = type, col = type, weight = weight)) > > m+geom_density()+geom_point(aes(x = expo, y = ze, shape = type)) > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Ista Zahn Graduate student University of Rochester Department of Clinical and Social Psychology http://yourpsyche.org From jwiley.psych at gmail.com Wed Jul 6 06:10:56 2011 From: jwiley.psych at gmail.com (Joshua Wiley) Date: Tue, 5 Jul 2011 21:10:56 -0700 Subject: [R] how to best present concentrated data points/ ggplot2 In-Reply-To: <201107060046.001791@miram7700b.williams.edu> References: <201107060046.001791@miram7700b.williams.edu> Message-ID: Hi Yang, I would take a slightly different approach and use what Wilkinson calls stripe density plots. The idea is that if you are trying to show a univariate density on dimension 1 with many overlapping or extremely close observations, space on dimension 1 is precious, in two dimensions, space on dimension 2 is abundant. Rather than use things like circles or squares which take up equal space on dims 1 & 2, use something that takes up little space on dim 1, of course for human perception, you want your plot to be visible, so extend the space used on dimension two. What I just described (in probably the most obfuscated possible way) are lines. Also, colour is sufficient to distinguish different types so I did not bother with different line types. 
Here is an example: library(ggplot2) set.seed(10) x <- rnorm(10000) a <- rnorm(5000) b <- rnorm(5000) weights.x <- abs(a/sum(a)) weights.y <- abs(b/sum(b)) weight <- c(weights.x, weights.y) type <- c(rep("a", 5000), rep("b", 5000)) ## make it so different types of points do not overlap ze <- c(rep(0, 5000), rep(-.5, 5000)) d <- data.frame(expo = x, weight = weight, type = type, ze = ze) m <- ggplot(d, aes(x = expo, group = type, col = type, weight = weight)) ## note, with this many observations and alpha, plot may be sloow m + geom_density() + geom_linerange(aes(x = expo, ymin = ze - .1, ymax = ze + .1), alpha = .25) HTH, Josh On Tue, Jul 5, 2011 at 5:46 PM, Yang Lu wrote: > Hi all, > > I am trying to plot a weighted density plot for two different types and want to show the data points on the x axis. > > The code is as follows. The data points are very concentrated. Is there a better way to present it( should I set the alpha value or something else)? > > Thanks! > > YL > > library(ggplot2) > > x <- rnorm(10000) > > a <- rnorm(5000) > > b <- rnorm(5000) > > weights.x <- abs(a/sum(a)) > > weights.y <- abs(b/sum(b)) > > weight <- c(weights.x, weights.y) > > ze <- rep(0,10000) > > type <- c(rep("a",5000), rep("b",5000)) > > d <- data.frame(expo = x, weight = weight, type = type, ze = ze) > > m <- ggplot(d, aes(x = expo, group = type, col = type, weight = weight)) > > m+geom_density()+geom_point(aes(x = expo, y = ze, shape = type)) > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joshua Wiley Ph.D. 
Student, Health Psychology University of California, Los Angeles https://joshuawiley.com/ From laomeng.3 at gmail.com Wed Jul 6 06:26:38 2011 From: laomeng.3 at gmail.com (Lao Meng) Date: Wed, 6 Jul 2011 12:26:38 +0800 Subject: [R] How to compare ratio from multiple groups? Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From n.bowora at gmail.com Wed Jul 6 05:49:01 2011 From: n.bowora at gmail.com (EdBo) Date: Tue, 5 Jul 2011 20:49:01 -0700 (PDT) Subject: [R] loop in optim In-Reply-To: References: <1309772071774-3643230.post@n4.nabble.com> <1309836074873-3645031.post@n4.nabble.com> <1309918026000-3647549.post@n4.nabble.com> Message-ID: <1973242757-1309924128-cardhu_decombobulator_blackberry.rim.net-958499837-@b5.c2.bise7.blackberry> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From knkanna87 at gmail.com Wed Jul 6 05:49:24 2011 From: knkanna87 at gmail.com (Nirmal Kanna) Date: Wed, 6 Jul 2011 09:19:24 +0530 Subject: [R] Leverage values in VGLM Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From rolf.turner at xtra.co.nz Wed Jul 6 08:42:03 2011 From: rolf.turner at xtra.co.nz (Rolf Turner) Date: Wed, 06 Jul 2011 18:42:03 +1200 Subject: [R] How to compare ratio from multiple groups? In-Reply-To: References: Message-ID: <4E1403BB.4020506@xtra.co.nz> On 06/07/11 16:26, Lao Meng wrote: > If I have 3 groups,and for each group,I get the ratio(e.g. incidence rate). > Now I wanna For crying out loud!!! Repeat after me: It's ***not*** ``wanna'', it's ***want to*** .... ***want to*** .... ***want to***!!!!!!! The word ``wanna'' is a colloquial contraction of ``want to''. It is acceptable in *very* casual oral communication. It is ***NEVER*** acceptable in written communication, except in the context of accurately rendering (verbatim) an oral communication (or perhaps for the purpose of humour). 
cheers,

Rolf Turner

> compare 3 ratio pairwise,and get the corresponding p values,i.e:
> group1 vs group2 ,p value=?
> group1 vs group3 ,p value=?
> group2 vs group3 ,p value=?
>
> Which statistical test should be used?
>
> Thanks a lot for your help.

From jim.silverton at gmail.com Wed Jul 6 09:11:34 2011
From: jim.silverton at gmail.com (Jim Silverton)
Date: Wed, 6 Jul 2011 03:11:34 -0400
Subject: [R] Probability calculation....
Message-ID:

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From pmaclean2011 at yahoo.com Wed Jul 6 09:25:19 2011
From: pmaclean2011 at yahoo.com (Peter Maclean)
Date: Wed, 6 Jul 2011 00:25:19 -0700 (PDT)
Subject: [R] BY GROUP in evir R package
In-Reply-To: <4DE939D8.8040305@pfaffikus.de>
References: <31464.80001.qm@web121719.mail.ne1.yahoo.com> <4DE939D8.8040305@pfaffikus.de>
Message-ID: <1309937119.92702.YahooMailRC@web121711.mail.ne1.yahoo.com>

Dr. Pfaff:

How do we pass the "by" results to the "rlevel.gev" function to get the return level, and also save the results (both rg2 ($par.ests and $par.ses) and rl) as a data frame?

#Grouped vector
Gdata <- data.frame(n = rep(c(1,2,3), each = 100), y = rnorm(300))
library(evir)
require(plyr)
#Model for Grouped
rg2<- by(Gdata,Gdata[,"n"], function(x) gev(x$y, 5, method = "BFGS", control =list(maxit = 500)))
# rl <- rlevel.gev(rg2, k.blocks = 5, add = TRUE)

----- Original Message ----
From: Dr. Bernhard Pfaff
To: Peter Maclean
Sent: Fri, June 3, 2011 2:45:28 PM
Subject: Re: BY GROUP in evir R package

Hello Peter,

many thanks for your email. Well, as you might have guessed, there is also a function by() in R that does the same job. See help("by") for more information.

Best,
Bernhard

Peter Maclean wrote:
> Hi,
> I am new in R and I want to use your package for data analysis. I usually use SAS. I have rainfall data for different points. Each point has 120 observations.
>The rainfall data is in the first column (RAIN) and the categorical variable >that group the data is in the second column (GROUP). The data frame is >rain.data. How can I use the gev function to estimate all three parameters by >GROUP variable group? In SAS there is a by() function that estimate the model by >group. However, I would like to move to R. >? With thanks, >? Peter Maclean > Department of Economics > University of Dar -es- Salaam, Tanzania > >? From mathew.brown at forst.uni-goettingen.de Wed Jul 6 10:09:35 2011 From: mathew.brown at forst.uni-goettingen.de (mathew brown) Date: Wed, 6 Jul 2011 10:09:35 +0200 Subject: [R] permil symbol linux Message-ID: <20110706100935.35b6bfb3.mathew.brown@forst.uni-goettingen.de> Hi, I'm trying to figure out how to make a plot with ylab showing the permil symbol. Anyone know how to do this? Thanks -- From pmaclean2011 at yahoo.com Wed Jul 6 10:29:01 2011 From: pmaclean2011 at yahoo.com (Peter Maclean) Date: Wed, 6 Jul 2011 01:29:01 -0700 (PDT) Subject: [R] Saving fExtremes estimates and k-block return level with confidence intervals. In-Reply-To: References: <1309407375.93675.YahooMailRC@web121703.mail.ne1.yahoo.com> Message-ID: <1309940941.99895.YahooMailRC@web121710.mail.ne1.yahoo.com> Hi: I am trying to compare the results of "evir" and "fExtreme" packages. I?could not figure out how to save the "evir" package results.?Also, how to pass the results to?"fExtreme" function "gevrlevelPlot" and?"evir" function "rlevel.gev" to get the return levels. I just need the?values and not graphs. ? #Example library(fExtremes) library(evir) require(plyr) y<- data.frame(rgev(300, xi = 0.1, mu = .5, sigma = 1.6)) colnames(y) <- c("y") n <- data.frame(n = rep(c(1,2,3), each = 100)) z <- cbind(n,y) y <- z$y # Model for grouped data ##fExtremes package z1 <- split(z,z$n) rgf <- lapply(z1, function(x){ ????????????? m <- as.numeric(x$y) ????????????? 
gevFit(m, block = 2, type = "mle") ?}) #Save results resf<- ldply(rgf, function(x) x at fit$par.ests) #Qs: How to transfer rge object to "gevrlevelPlot" function to get the values? ? ##evir package rge<- by(z,z[,"n"], function(x) gev(x$y, 2, method = "BFGS", control =list(maxit = 10000)))? ? #Qs:How to save par.ests and?par.ses? #Qs:How to transfer rge object to "rlevel.gev" function to get the values??????????? ? #Model for single vector rlf <- gevFit(y, block = 2, type = "mle") rlfp <- gevrlevelPlot(rlf, kBlocks = 2) rlfp rle <- gev(y, 2, method = "BFGS", control =list(maxit = 10000)) rlep<- rlevel.gev(rle, k.blocks = 2, add = FALSE) rlep ----- Original Message ---- From: Dennis Murphy To: Peter Maclean Sent: Thu, June 30, 2011 3:03:05 PM Subject: Re: [R] Saving fExtremes estimates and k-block return level with confidence intervals. Hi: The plyr package can help you out as well. As far as the estimates go, library(plyr) > ldply(res2, function(x) x at fit$par.ests) ? .id? ? ? ? xi? ? ? ? mu? ? ? beta 1? 1 0.1033614 2.5389580 0.9092611 2? 2 0.3401922 0.5192882 1.5290615 3? 3 0.5130798 0.5668308 1.2105666 You could also look into the l_ply() function, which takes a list as input and outputs nothing; however, it is usually called if one want to make a series of similar plots. You need to write a function that takes a generic list component and outputs a plot. Here's a fairly trivial example: testdf <- data.frame(gp = rep(LETTERS[1:3], each = 10), x = 1:10, ? ? ? ? ? ? ? ? ? ? ? y = rnorm(30)) l_ply(split(testdf, testdf$gp), function(df) plot(y ~ x, data = df)) Alternatively, # Observe that a data frame is used as input pfun <- function(df) plot(y ~ x, data = df) l_ply(split(testdf, testdf$gp), pfun) In this case, l_ply() plots y vs. x in each of the three subgroups of testdf separately. It appears you want to do a similar thing, but with the list of model outputs instead; in your case, the function you write should take a list as its sole argument. 
Any slots or components that are accessed are then relative to the input list object - see the anonymous function I wrote for the output data frame above as an illustration. Since it appears that gevrlevelPlot() also outputs a vector, you may want to write a function that returns the output vector as a data frame and call ldply() instead. I don't know enough about the fExtremes package to write this myself, but it should be possible to write a function for a generic model that generates both a plot and an output data frame. The plyr package is pretty good at this sort of thing. HTH, Dennis On Wed, Jun 29, 2011 at 9:16 PM, Peter Maclean wrote: > I am estimating a large model by groups. How do you save the results >and?returns > the associated quantiles? > For this example I need a data frame > n?? ?xi??????? mu????????beta > 1?? 0.1033614? 2.5389580 0.9092611 > 2? ?0.3401922? 0.5192882 1.5290615 > 3?? 0.5130798? 0.5668308 1.2105666 > I also want to apply gevrlevelPlot() for each "n" or group. > > #Example > n <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3,3) > y <- c(2,3,2,3,4,5,6,1,0,0,0,6, 2, 1, 0, 0,9,3) > z <- as.data.frame(cbind(n,y)) > colnames(z) <- c("n","y") > library(fExtremes) > z <- split(z, z$n) > res2 <-lapply(z, function(x){ > ?????????????? m <- as.numeric(x$y) > ?????????????? gevFit(m, block = 1, type = c("pwm")) > ??????????????? }) >> res2 > $`1` > Title: > ?GEV Parameter Estimation > Call: > ?gevFit(x = m, block = 1, type = c("pwm")) > Estimation Type: > ? gev pwm > Estimated Parameters: > ?????? xi??????? mu????? beta > 0.1033614 2.5389580 0.9092611 > Description > ? Wed Jun 29 23:07:48 2011 > > $`2` > Title: > ?GEV Parameter Estimation > Call: > ?gevFit(x = m, block = 1, type = c("pwm")) > Estimation Type: > ? gev pwm > Estimated Parameters: > ?????? xi??????? mu????? beta > 0.3401922 0.5192882 1.5290615 > Description > ? 
Wed Jun 29 23:07:48 2011 > > $`3` > Title: > ?GEV Parameter Estimation > Call: > ?gevFit(x = m, block = 1, type = c("pwm")) > Estimation Type: > ? gev pwm > Estimated Parameters: > ?????? xi??????? mu????? beta > 0.5130798 0.5668308 1.2105666 > Description > ? Wed Jun 29 23:07:48 2011 > > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From ripley at stats.ox.ac.uk Wed Jul 6 10:45:49 2011 From: ripley at stats.ox.ac.uk (Prof Brian Ripley) Date: Wed, 6 Jul 2011 09:45:49 +0100 (BST) Subject: [R] permil symbol linux In-Reply-To: <20110706100935.35b6bfb3.mathew.brown@forst.uni-goettingen.de> References: <20110706100935.35b6bfb3.mathew.brown@forst.uni-goettingen.de> Message-ID: On Wed, 6 Jul 2011, mathew brown wrote: > Hi, > I'm trying to figure out how to make a plot with ylab showing the > permil symbol. Anyone know how to do this? Yes. Now, it you would follow the posting guide and give the 'at a minimum' information you were asked for, and the graphics device you want to use, we might be able to tell you more precisely how. Hint: that symbol is "\u0089", and also in latin-1. > Thanks > > -- > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Brian D. 
Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 From inpost at gmail.com Wed Jul 6 11:11:36 2011 From: inpost at gmail.com (e-letter) Date: Wed, 6 Jul 2011 10:11:36 +0100 Subject: [R] knapsack problem limitation Message-ID: Readers, Attempting to solve the knapsack problem (e.g. see: http://rosettacode.org/wiki/Knapsack_problem/Unbounded), the following error occurred: ... result would be too long a vector Is this indicative that R is not suitable to solve this problem when combinations are large? Is there a known limit? Is there an alternative, or better to use a different program? Thanks. From mathew.brown at forst.uni-goettingen.de Wed Jul 6 11:37:10 2011 From: mathew.brown at forst.uni-goettingen.de (mathew brown) Date: Wed, 6 Jul 2011 11:37:10 +0200 Subject: [R] permil symbol linux In-Reply-To: References: <20110706100935.35b6bfb3.mathew.brown@forst.uni-goettingen.de> Message-ID: <20110706113710.92fcd93f.mathew.brown@forst.uni-goettingen.de> Good point. Here is the code plot(dat$timestamp,dat$delta_18_16, ylab="", xlab="(min)", tck=0.05, col="blue") mtext(side=2, line=1.5, expression(""*delta*""^18*"O [\u0089]"), cex=1, adj=0.5) I'm still not sure how to get your code to work. thanks On Wed, 6 Jul 2011 09:45:49 +0100 (BST) Prof Brian Ripley wrote: > On Wed, 6 Jul 2011, mathew brown wrote: > > > Hi, > > > I'm trying to figure out how to make a plot with ylab showing the > > permil symbol. Anyone know how to do this? > > Yes. > > Now, it you would follow the posting guide and give the 'at a minimum' > information you were asked for, and the graphics device you want to > use, we might be able to tell you more precisely how. > > Hint: that symbol is "\u0089", and also in latin-1. 
> > > Thanks
> > >
> > > --
> > >
> > > ______________________________________________
> > > R-help at r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> >
> > --
> > Brian D. Ripley, ripley at stats.ox.ac.uk
> > Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
> > University of Oxford, Tel: +44 1865 272861 (self)
> > 1 South Parks Road, +44 1865 272866 (PA)
> > Oxford OX1 3TG, UK Fax: +44 1865 272595

--
Mathew Brown
Institute of Bioclimatology
University of Göttingen
Büsgenweg 2
37077 Göttingen, Germany
t: +49 551 39 9359
mathew.brown at forst.uni-goettingen.de

From bhh at xs4all.nl Wed Jul 6 11:46:17 2011
From: bhh at xs4all.nl (Berend Hasselman)
Date: Wed, 6 Jul 2011 02:46:17 -0700 (PDT)
Subject: [R] loop in optim
In-Reply-To: <1973242757-1309924128-cardhu_decombobulator_blackberry.rim.net-958499837-@b5.c2.bise7.blackberry>
References: <1309772071774-3643230.post@n4.nabble.com> <1309836074873-3645031.post@n4.nabble.com> <1309918026000-3647549.post@n4.nabble.com> <1973242757-1309924128-cardhu_decombobulator_blackberry.rim.net-958499837-@b5.c2.bise7.blackberry>
Message-ID: <1309945577050-3648171.post@n4.nabble.com>

EdBo wrote:
>
> You are right Joshua.
>
> I changed the code because I failed to understand how you attached the
> full data set. How you made the data part of your code.
>
> I am new to R so I am used to one way of attaching data(the way I redone
> it).
>

You don't need to "attach" the data by using attach(). You read the data into an object afull and then select the part you need and store that in object a.

BTW: shouldn't the for (i in 1:4) be for (i in 1:3) if I understand the original question correctly?

Berend

--
View this message in context: http://r.789695.n4.nabble.com/loop-in-optim-tp3643230p3648171.html
Sent from the R help mailing list archive at Nabble.com.
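
The pattern described here (read everything into afull, then assign each block of rows to a before optimising) can be sketched as below. This is only an illustration under stated assumptions: the poster's file is not available, so afull is simulated with the thread's column names R_j and R_m, and `negll` is a simple placeholder objective, not the poster's likelihood. The data are passed to the objective as an argument, which avoids relying on a global object or on attach().

```r
set.seed(1)

# Simulated stand-in for the poster's data; the values are made up.
n <- 133
afull <- data.frame(R_m = rnorm(n, sd = 0.02))
afull$R_j <- 1.1 * afull$R_m + rnorm(n, sd = 0.01)

# Placeholder objective (an assumption, not the thread's likelihood):
# negative log-likelihood of R_j ~ Normal(b * R_m, sigma^2).
# sigma is parameterised on the log scale so that it stays positive.
negll <- function(par, dat) {
  b <- par[1]
  sigma <- exp(par[2])
  -sum(dnorm(dat$R_j, mean = b * dat$R_m, sd = sigma, log = TRUE))
}

block <- 20
runs <- ceiling(n / block)   # whole number of blocks; the last one is shorter
out <- matrix(NA_real_, nrow = runs, ncol = 2,
              dimnames = list(paste0("Quarter:", seq_len(runs)),
                              c("b", "sigma")))

for (i in seq_len(runs)) {
  rows <- seq(block * (i - 1) + 1, min(block * i, n))
  a <- afull[rows, ]         # the assignment is what changes the data per run
  fit <- optim(par = c(1, log(0.1)), fn = negll, dat = a,
               method = "Nelder-Mead")
  out[i, ] <- c(fit$par[1], exp(fit$par[2]))
}
out
```

Because each run sees a different subset, the rows of out differ from one another, unlike the identical rows reported earlier in the thread.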
From statmailinglists at googlemail.com Wed Jul 6 12:06:43 2011 From: statmailinglists at googlemail.com (Paolo Rossi) Date: Wed, 6 Jul 2011 11:06:43 +0100 Subject: [R] Group Data indexed by n Variables Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From ripley at stats.ox.ac.uk Wed Jul 6 12:07:55 2011 From: ripley at stats.ox.ac.uk (Prof Brian Ripley) Date: Wed, 6 Jul 2011 11:07:55 +0100 (BST) Subject: [R] permil symbol linux In-Reply-To: <20110706113710.92fcd93f.mathew.brown@forst.uni-goettingen.de> References: <20110706100935.35b6bfb3.mathew.brown@forst.uni-goettingen.de> <20110706113710.92fcd93f.mathew.brown@forst.uni-goettingen.de> Message-ID: You still have not sent the information requested in the posting guide, and I do not know your target device nor locale, nor is that a reproducible example. Please learn some respect for the time of the helpers here (and for all the work that went into making this possible in R). On Wed, 6 Jul 2011, mathew brown wrote: > Good point. > > Here is the code > plot(dat$timestamp,dat$delta_18_16, ylab="", xlab="(min)", tck=0.05, col="blue") > mtext(side=2, line=1.5, expression(""*delta*""^18*"O [\u0089]"), cex=1, adj=0.5) > > I'm still not sure how to get your code to work. Unsurprising, as you have not seen *my* code. > thanks > > On Wed, 6 Jul 2011 09:45:49 +0100 (BST) > Prof Brian Ripley wrote: > >> On Wed, 6 Jul 2011, mathew brown wrote: >> >>> Hi, >> >>> I'm trying to figure out how to make a plot with ylab showing the >>> permil symbol. Anyone know how to do this? >> >> Yes. >> >> Now, it you would follow the posting guide and give the 'at a minimum' >> information you were asked for, and the graphics device you want to >> use, we might be able to tell you more precisely how. >> >> Hint: that symbol is "\u0089", and also in latin-1. 
>> >>> Thanks >>> >>> -- >>> >>> ______________________________________________ >>> R-help at r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >>> >> >> -- >> Brian D. Ripley, ripley at stats.ox.ac.uk >> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ >> University of Oxford, Tel: +44 1865 272861 (self) >> 1 South Parks Road, +44 1865 272866 (PA) >> Oxford OX1 3TG, UK Fax: +44 1865 272595 > > > -- > Mathew Brown > Institute of Bioclimatology > University of G?ttingen > B?sgenweg 2 > 37077 G?ttingen, Germany > t: +49 551 39 9359 > mathew.brown at forst.uni-goettingen.de > -- Brian D. Ripley, ripley at stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595 From annemarie.verkerk at mpi.nl Wed Jul 6 08:19:56 2011 From: annemarie.verkerk at mpi.nl (Annemarie Verkerk) Date: Wed, 06 Jul 2011 08:19:56 +0200 Subject: [R] gradient generation Message-ID: <4E13FE8C.2090602@mpi.nl> Dear R-help subscribers; I have a question regarding making gradients in R. I've searched on the web, but was only able to find functions that make a gradient between color X and Y, which is not what I want. I want to 'pick out' individual, smaller pieces of a gradient based on a range of numbers. Say that I have a range of numbers, leading from 0 to 1. Then I want 0 to refer to the brightest red, and 1 to the brightest blue, and all values in between refer to shades of purple. (So, 0.5 would be regular purple, 0.7 would be a quite bluish purple, etc.) Then, I want to be able pick out a gradient from the larger 0-1 gradient, say for instance the gradient between 0.25 and 0.35 and refer to this with an object name for further use. 
I'm not sure whether this is possible - maybe it would only be possible to define a range of individual R colors first that already form a gradient, and then 'pick out' the colors that you want to make the gradient. In any case, I'm sure one could do this somehow in R! Thanks for your help, Annemarie -- Annemarie Verkerk, MA Evolutionary Processes in Language and Culture (PhD student) Max Planck Institute for Psycholinguistics P.O. Box 310, 6500AH Nijmegen, The Netherlands +31 (0)24 3521 185 http://www.mpi.nl/research/research-projects/evolutionary-processes From Lee.Averell at newcastle.edu.au Wed Jul 6 07:41:57 2011 From: Lee.Averell at newcastle.edu.au (Lee Averell) Date: Wed, 06 Jul 2011 15:41:57 +1000 Subject: [R] matching items in a data frame Message-ID: <4E148381020000530002C0E5@WINDOMPRD00.newcastle.edu.au> I have a data frame with 2 columns, the first is an index of participants and the second is a list of words presented to the participant (see below). > head(dat) s word 1a pianist 1a sweat 1a carnage 1a nymph 1a hank 1a waist > tail(dat) s word 4a package 4a blink 4a orange 4a bedroom 4a curb 4a bowl Some of the words are presented to multiple participants others are not. I am trying to get an index of 1) which words are repeated and 2) how many times they are repeated. Any suggestions? Lee From shabegIIT at live.in Wed Jul 6 08:24:22 2011 From: shabegIIT at live.in (massa1234) Date: Tue, 5 Jul 2011 23:24:22 -0700 (PDT) Subject: [R] How to use "Update" for an object of clss GOGARCH Message-ID: <1309933462645-3647817.post@n4.nabble.com> Hi, I am using gogarch from the package "gogarch". i have 13 series. the univariate model for the one of the series has a statistically insignificant constant term (garch(1,1)). so I want to re fit the model assuming the constant term top be zero....can i use "update" method to do so....if not please help me out. 
regards -- View this message in context: http://r.789695.n4.nabble.com/How-to-use-Update-for-an-object-of-clss-GOGARCH-tp3647817p3647817.html Sent from the R help mailing list archive at Nabble.com. From S.Ellison at LGCGroup.com Wed Jul 6 11:13:53 2011 From: S.Ellison at LGCGroup.com (S Ellison) Date: Wed, 6 Jul 2011 10:13:53 +0100 Subject: [R] clustering based on most significant pvalues does not separate the groups! In-Reply-To: <1309803730461-3644249.post@n4.nabble.com> References: <1309803730461-3644249.post@n4.nabble.com> Message-ID: <98B156BB22D11342A931E823798D434853E9728042@GOLD.corp.lgc-group.com> t-tests and the like test for a difference in mean value, not for non-overlapping populations or data sets. The fact that the mean of one data set differs significantly from the mean of the other does not mean that the ranges of the individual points in each data set are disjoint. set.seed(1023) x<-rnorm(60, 10) y<-x+0.75 boxplot(x,y) #Lots of overlap for individual points t.test(x,y) #Strongly significant difference Does that correspond to your situation well enough to account for your puzzlement? S Ellison > -----Original Message----- > From: r-help-bounces at r-project.org > [mailto:r-help-bounces at r-project.org] On Behalf Of pguilha > Sent: 04 July 2011 19:22 > To: r-help at r-project.org > Subject: [R] clustering based on most significant pvalues > does not separate the groups! > > Hi all, > > I have some microarray data on 40 samples that fall into two > groups. I have a value for 480k probes for each of those > samples. I performed a t test > (rowttests) on each row(giving the indices of the columns for > each group) then used p.adjust() to adjust the pvalues for > the number of tests performed. I then selected only the > probes with adj-p.value<=0.05. I end up with roughly 2000 > probes to do the clustering on but using pvclust, and hclust, > the samples do no split up into the two groups. 
I would have > imagined that using only those values that are significantly > different between the two groups, the clustering should > surely reflect that? > > Please, what am I missing!!!!??? > > Thanks! > > Paul > > PS: I am hoping I have just thought this through in the wrong > way and there is a simple explanation, but can provide the > code I am using for clustering if necessary! > > > > -- > View this message in context: > http://r.789695.n4.nabble.com/clustering-based-on-most-signifi > cant-pvalues-does-not-separate-the-groups-tp3644249p3644249.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > ******************************************************************* This email and any attachments are confidential. Any use...{{dropped:8}} From statmailinglists at googlemail.com Wed Jul 6 12:16:07 2011 From: statmailinglists at googlemail.com (Paolo Rossi) Date: Wed, 6 Jul 2011 11:16:07 +0100 Subject: [R] Group Data indexed by n Variables In-Reply-To: References: Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From mathew.brown at forst.uni-goettingen.de Wed Jul 6 12:24:28 2011 From: mathew.brown at forst.uni-goettingen.de (mathew brown) Date: Wed, 6 Jul 2011 12:24:28 +0200 Subject: [R] permil symbol linux In-Reply-To: References: <20110706100935.35b6bfb3.mathew.brown@forst.uni-goettingen.de> <20110706113710.92fcd93f.mathew.brown@forst.uni-goettingen.de> Message-ID: <20110706122428.3506b560.mathew.brown@forst.uni-goettingen.de> To all in R land, Here is what I would like to do. x = c(1:10) y = c(1:10) plot(x,y) mtext(side=2, line=1.5, expression(""*delta*""^18*"O [permill]"), cex=1, adj=0.5) That's it. 
Except I would like to replace "permill" with the symbol. Thanks for the help On Wed, 6 Jul 2011 11:07:55 +0100 (BST) Prof Brian Ripley wrote: > You still have not sent the information requested in the posting > guide, and I do not know your target device nor locale, nor is that a > reproducible example. > > Please learn some respect for the time of the helpers here (and for > all the work that went into making this possible in R). > > On Wed, 6 Jul 2011, mathew brown wrote: > > > Good point. > > > > Here is the code > > plot(dat$timestamp,dat$delta_18_16, ylab="", xlab="(min)", tck=0.05, col="blue") > > mtext(side=2, line=1.5, expression(""*delta*""^18*"O [\u0089]"), cex=1, adj=0.5) > > > > I'm still not sure how to get your code to work. > > Unsurprising, as you have not seen *my* code. > > > thanks > > > > On Wed, 6 Jul 2011 09:45:49 +0100 (BST) > > Prof Brian Ripley wrote: > > > >> On Wed, 6 Jul 2011, mathew brown wrote: > >> > >>> Hi, > >> > >>> I'm trying to figure out how to make a plot with ylab showing the > >>> permil symbol. Anyone know how to do this? > >> > >> Yes. > >> > >> Now, it you would follow the posting guide and give the 'at a minimum' > >> information you were asked for, and the graphics device you want to > >> use, we might be able to tell you more precisely how. > >> > >> Hint: that symbol is "\u0089", and also in latin-1. > >> > >>> Thanks > >>> > >>> -- > >>> > >>> ______________________________________________ > >>> R-help at r-project.org mailing list > >>> https://stat.ethz.ch/mailman/listinfo/r-help > >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > >>> and provide commented, minimal, self-contained, reproducible code. > >>> > >> > >> -- > >> Brian D. 
Ripley, ripley at stats.ox.ac.uk > >> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ > >> University of Oxford, Tel: +44 1865 272861 (self) > >> 1 South Parks Road, +44 1865 272866 (PA) > >> Oxford OX1 3TG, UK Fax: +44 1865 272595 > > > > > > -- > > Mathew Brown > > Institute of Bioclimatology > > University of Göttingen > > Büsgenweg 2 > > 37077 Göttingen, Germany > > t: +49 551 39 9359 > > mathew.brown at forst.uni-goettingen.de > > > > -- > Brian D. Ripley, ripley at stats.ox.ac.uk > Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ > University of Oxford, Tel: +44 1865 272861 (self) > 1 South Parks Road, +44 1865 272866 (PA) > Oxford OX1 3TG, UK Fax: +44 1865 272595 -- Mathew Brown Institute of Bioclimatology University of Göttingen Büsgenweg 2 37077 Göttingen, Germany t: +49 551 39 9359 mathew.brown at forst.uni-goettingen.de From jholtman at gmail.com Wed Jul 6 12:25:27 2011 From: jholtman at gmail.com (Jim Holtman) Date: Wed, 6 Jul 2011 06:25:27 -0400 Subject: [R] matching items in a data frame In-Reply-To: <4E148381020000530002C0E5@WINDOMPRD00.newcastle.edu.au> References: <4E148381020000530002C0E5@WINDOMPRD00.newcastle.edu.au> Message-ID: <7D0EEFF8-CD57-45E2-BADC-853E1001615B@gmail.com> table(dat$word) Sent from my iPad On Jul 6, 2011, at 1:41, Lee Averell wrote: > I have a data frame with 2 columns, the first is an index of participants and the second is a list of words presented to the participant (see below). > >> head(dat) > s word > 1a pianist > 1a sweat > 1a carnage > 1a nymph > 1a hank > 1a waist > >> tail(dat) > s word > 4a package > 4a blink > 4a orange > 4a bedroom > 4a curb > 4a bowl > > Some of the words are presented to multiple participants others are not. I am trying to get an index of 1) which words are repeated and 2) how many times they are repeated. Any suggestions?
> Lee > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From jim at bitwrit.com.au Wed Jul 6 14:10:50 2011 From: jim at bitwrit.com.au (Jim Lemon) Date: Wed, 06 Jul 2011 22:10:50 +1000 Subject: [R] gradient generation In-Reply-To: <4E13FE8C.2090602@mpi.nl> References: <4E13FE8C.2090602@mpi.nl> Message-ID: <4E1450CA.4090104@bitwrit.com.au> On 07/06/2011 04:19 PM, Annemarie Verkerk wrote: > Dear R-help subscribers; > > I have a question regarding making gradients in R. I've searched on the > web, but was only able to find functions that make a gradient between > color X and Y, which is not what I want. > > I want to 'pick out' individual, smaller pieces of a gradient based on a > range of numbers. Say that I have a range of numbers, leading from 0 to > 1. Then I want 0 to refer to the brightest red, and 1 to the brightest > blue, and all values in between refer to shades of purple. (So, 0.5 > would be regular purple, 0.7 would be a quite bluish purple, etc.) > > Then, I want to be able pick out a gradient from the larger 0-1 > gradient, say for instance the gradient between 0.25 and 0.35 and refer > to this with an object name for further use. > > I'm not sure whether this is possible - maybe it would only be possible > to define a range of individual R colors first that already form a > gradient, and then 'pick out' the colors that you want to make the > gradient. > Hi Annemarie, You can do this with a number of functions. As I am most familiar with color.scale (plotrix), I'll explain with that. 
# get the range of colors from 0 to 1 by 0.05
colorgrad01<-color.scale(seq(0,1,by=0.05),cs1=c(1,0),cs2=0,cs3=c(0,1))
# now get your partial gradient
colorgrad.25.35<-colorgrad01[6:8]
In other words, create the complete color scale (as you might with the rainbow() function) and then pull out part of it to use. You could just index the entire color scale, of course. Jim From gavin.simpson at ucl.ac.uk Wed Jul 6 14:31:21 2011 From: gavin.simpson at ucl.ac.uk (Gavin Simpson) Date: Wed, 06 Jul 2011 13:31:21 +0100 Subject: [R] permil symbol linux In-Reply-To: <20110706122428.3506b560.mathew.brown@forst.uni-goettingen.de> References: <20110706100935.35b6bfb3.mathew.brown@forst.uni-goettingen.de> <20110706113710.92fcd93f.mathew.brown@forst.uni-goettingen.de> <20110706122428.3506b560.mathew.brown@forst.uni-goettingen.de> Message-ID: <1309955481.6763.13.camel@prometheus.geog.ucl.ac.uk> On Wed, 2011-07-06 at 12:24 +0200, mathew brown wrote: > To all in R land, > > Here is what I would like to do. > x = c(1:10) > y = c(1:10) > plot(x,y) > mtext(side=2, line=1.5, expression(""*delta*""^18*"O [permill]"), cex=1, adj=0.5) > > That's it. Except I would like to replace "permill" with the symbol. > Thanks for the help Are you trying to be difficult? Have you read the posting guide so you can provide the "at minimum" information Prof. Ripley asked you for? Simply repeating the question ad nauseam isn't what was requested. On my linux box, this will produce a d^{18}O permille **on screen** - not sure why you want this in square brackets? plot(1:10, ylab = expression(delta^{18}*O ~ ("\u2030"))) Now that glyph doesn't appear to be in the pdf device fonts for my encoding so when plotting to pdf() I have to add `encoding="WinAnsi.enc"` to my `pdf()` call - IIRC this advice at the time was supplied by Prof. Ripley. The advice may well be different on your **unstated** OS - hence the requests for more info.
This is with: > sessionInfo() R version 2.13.0 Patched (2011-04-19 r55527) Platform: x86_64-unknown-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8 [5] LC_MONETARY=C LC_MESSAGES=en_GB.UTF-8 [7] LC_PAPER=en_GB.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] cocorresp_0.1-9 vegan_1.17-9 which would be part of the at minimum info Prof. Ripley asked for. And also which device (windows(), x11(), quartz(), pdf(), postscript() etc, etc.) do you *want* to print on? Please do read the posting guide - you might not be aware what information is required to answer your question, so what seems superfluous may well be, and in this case probably is, *essential*. G > > On Wed, 6 Jul 2011 11:07:55 +0100 (BST) > Prof Brian Ripley wrote: > > > You still have not sent the information requested in the posting > > guide, and I do not know your target device nor locale, nor is that a > > reproducible example. > > > > Please learn some respect for the time of the helpers here (and for > > all the work that went into making this possible in R). > > > > On Wed, 6 Jul 2011, mathew brown wrote: > > > > > Good point. > > > > > > Here is the code > > > plot(dat$timestamp,dat$delta_18_16, ylab="", xlab="(min)", tck=0.05, col="blue") > > > mtext(side=2, line=1.5, expression(""*delta*""^18*"O [\u0089]"), cex=1, adj=0.5) > > > > > > I'm still not sure how to get your code to work. > > > > Unsurprising, as you have not seen *my* code. > > > > > thanks > > > > > > On Wed, 6 Jul 2011 09:45:49 +0100 (BST) > > > Prof Brian Ripley wrote: > > > > > >> On Wed, 6 Jul 2011, mathew brown wrote: > > >> > > >>> Hi, > > >> > > >>> I'm trying to figure out how to make a plot with ylab showing the > > >>> permil symbol. Anyone know how to do this? > > >> > > >> Yes. 
> > >> > > >> Now, it you would follow the posting guide and give the 'at a minimum' > > >> information you were asked for, and the graphics device you want to > > >> use, we might be able to tell you more precisely how. > > >> > > >> Hint: that symbol is "\u0089", and also in latin-1. > > >> > > >>> Thanks > > >>> > > >>> -- > > >>> > > >>> ______________________________________________ > > >>> R-help at r-project.org mailing list > > >>> https://stat.ethz.ch/mailman/listinfo/r-help > > >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > >>> and provide commented, minimal, self-contained, reproducible code. > > >>> > > >> > > >> -- > > >> Brian D. Ripley, ripley at stats.ox.ac.uk > > >> Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ > > >> University of Oxford, Tel: +44 1865 272861 (self) > > >> 1 South Parks Road, +44 1865 272866 (PA) > > >> Oxford OX1 3TG, UK Fax: +44 1865 272595 > > > > > > > > > -- > > > Mathew Brown > > > Institute of Bioclimatology > > > University of Göttingen > > > Büsgenweg 2 > > > 37077 Göttingen, Germany > > > t: +49 551 39 9359 > > > mathew.brown at forst.uni-goettingen.de > > > > > > > -- > > Brian D. Ripley, ripley at stats.ox.ac.uk > > Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ > > University of Oxford, Tel: +44 1865 272861 (self) > > 1 South Parks Road, +44 1865 272866 (PA) > > Oxford OX1 3TG, UK Fax: +44 1865 272595 > > -- %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% Dr. Gavin Simpson [t] +44 (0)20 7679 0522 ECRC, UCL Geography, [f] +44 (0)20 7679 0565 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/ UK. WC1E 6BT.
[w] http://www.freshwaters.org.uk %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% From bt_jannis at yahoo.de Wed Jul 6 14:47:34 2011 From: bt_jannis at yahoo.de (Jannis) Date: Wed, 6 Jul 2011 13:47:34 +0100 (BST) Subject: [R] including figures in html documentation/help Message-ID: <1309956454.93557.YahooMailClassic@web28204.mail.ukl.yahoo.com> Dear list members, is it somehow possible to include figures to the html help pages of individueal functions (containing for example a plot produced by that function?) I thought about adding these figures into a 'graphs' subfolder of the package folder and then to somehow insert some sort of html link into the documentation code. I use inlinedocs for creating the documentation. Any ideas? Jannis From f.harrell at vanderbilt.edu Wed Jul 6 14:57:44 2011 From: f.harrell at vanderbilt.edu (Frank Harrell) Date: Wed, 6 Jul 2011 05:57:44 -0700 (PDT) Subject: [R] Speed Advice for R --- avoid data frames In-Reply-To: <4E10968A.8060804@statistik.tu-dortmund.de> References: <4E0F6A65.8080101@statistik.tu-dortmund.de> <4E10968A.8060804@statistik.tu-dortmund.de> Message-ID: <1309957064694-3648681.post@n4.nabble.com> On occasion, as pointed out in an earlier posting, it is efficient to convert to a matrix and when finished convert back to a data frame. The Hmisc package's asNumericMatrix and matrix2dataFrame functions assist by converting character variables to factors if needed, and by holding on to original attributes of variables in the data frame such as "levels", then restoring the attributes. Frank Uwe Ligges-3 wrote: > > On 02.07.2011 21:35, ivo welch wrote: >> hi uwe---thanks for the clarification. of course, my example should >> always >> be done in vectorized form. I only used it to show how iterative access >> compares in the simplest possible fashion.<100 accesses per seconds is >> REALLY slow, though. >> >> I don't know R internals and the learning curve would be steep. 
>> moreover, >> there is no guarantee that changes I would make would be accepted. so, I >> cannot do this. >> >> however, for an R expert, this should not be too difficult. >> conceptually, >> if data frame element access primitives are create/write/read/destroy in >> the >> code, then it's truly trivial. just add a matrix (dim the same as the >> data >> frame) of byte pointers to point at the storage upon creation/change >> time. >> this would be quick-and-dirty. for curiosity, do you know which source >> file has the data frame internals? maybe I will get tempted anyway if it >> is >> simple enough. > > > I think you should start to look at the mechanisms to construct > data.frames (such as data.frame) and learn that data.frames are special > lists. Then you may want to look at the differences between the > .Primitive("[") and .Primitive("[<-") used for vectors (including > vectors with dim attributes such as matrixes) and the correspoding > methods for data.frames: "[<-.data.frame" and "[.data.frame". > > After that, I doubt you want to improve further on. Note also that > data.frames can be pretty large and you really do not want to store a > matrix of pointers as large as the data.frame. People working witrh > large data.frames won't be happy with such a suggestion. > > If you want to follow up, I'd suggest to move the thread to R-devel > where it seems to be more appropriate. > > Best, > Uwe > > > > > > >> >> (a more efficient but more involved way to do this would be to store a >> data >> frame internally always as a matrix of data pointers, but this would >> probably require more surgery.) >> >> It is also not as important for me, as it is for others...to give a good >> impression to those that are not aware of the tradeoffs---which is most >> people considering to adopt R. 
>> >> /iaw >> >> >> ---- >> Ivo Welch (ivo.welch at gmail.com) >> >> >> >> >> 2011/7/2 Uwe Ligges<ligges at statistik.tu-dortmund.de> >> >>> Some comments: >>> >>> the comparison matrix rows vs. matrix columns is incorrect: Note that R >>> has >>> lazy evaluation, hence you construct your matrix in the timing for the >>> rows >>> and it is already constructed in the timing for the columns, hence you >>> want >>> to use: >>> >>> M<- matrix( rnorm(C*R), nrow=R ) >>> D<- as.data.frame(matrix( rnorm(C*R), nrow=R ) ) >>> example(M) >>> example(D) >>> >>> Further on, you are correct with you statement that data.frame indexing >>> is >>> much slower, but if you can store your data in matrix form, just go on >>> as it >>> is. >>> >>> I doubt anybody is really going to make the index operation you cited >>> within a loop. Then, with a data.frame, I can live with many vectorized >>> replacements again: >>> >>>> system.time(D[,20]<- sqrt(abs(D[,20])) + rnorm(1000)) >>> user system elapsed >>> 0.01 0.00 0.01 >>> >>>> system.time(D[20,]<- sqrt(abs(D[20,])) + rnorm(1000)) >>> user system elapsed >>> 0.51 0.00 0.52 >>> >>> OK, it would be nice to do that faster, but this is not easy. I think R >>> Core is happy to see contributions to make it faster without breaking >>> existing features. >>> >>> >>> >>> Best wishes, >>> Uwe >>> >>> >>> >>> >>> On 02.07.2011 20:35, ivo welch wrote: >>> >>>> This email is intended for R users that are not that familiar with R >>>> internals and are searching google about how to speed up R. >>>> >>>> Despite common misperception, R is not slow when it comes to iterative >>>> access. R is fast when it comes to matrices. R is very slow when it >>>> comes to iterative access into data frames. Such access occurs when a >>>> user uses "data$varname[index]", which is a very common operation. 
To >>>> illustrate, run the following program: >>>> >>>> R<- 1000; C<- 1000 >>>> >>>> example<- function(m) { >>>> cat("rows: "); cat(system.time( for (r in 1:R) m[r,20]<- >>>> sqrt(abs(m[r,20])) + rnorm(1) ), "\n") >>>> cat("columns: "); cat(system.time(for (c in 1:C) m[20,c]<- >>>> sqrt(abs(m[20,c])) + rnorm(1)), "\n") >>>> if (is.data.frame(m)) { cat("df: columns as names: "); >>>> cat(system.time(for (c in 1:C) m[[c]][20]<- sqrt(abs(m[[c]][20])) + >>>> rnorm(1)), "\n") } >>>> } >>>> >>>> cat("\n**** Now as matrix\n") >>>> example( matrix( rnorm(C*R), nrow=R ) ) >>>> >>>> cat("\n**** Now as data frame\n") >>>> example( as.data.frame( matrix( rnorm(C*R), nrow=R ) ) ) >>>> >>>> >>>> The following are the reported timing under R 2.12.0 on a Mac Pro 3,1 >>>> with ample RAM: >>>> >>>> matrix, columns: 0.01s >>>> matrix, rows: 0.175s >>>> data frame, columns: 53s >>>> data frame, rows: 56s >>>> data frame, names: 58s >>>> >>>> Data frame access is about 5,000 times slower than matrix column >>>> access, and 300 times slower than matrix row access. R's data frame >>>> operational speed is an amazing 40 data accesses per seconds. I have >>>> not seen access numbers this low for decades. >>>> >>>> >>>> How to avoid it? Not easy. One way is to create multiple matrices, >>>> and group them as an object. of course, this loses a lot of features >>>> of R. Another way is to copy all data used in calculations out of the >>>> data frame into a matrix, do the operations, and then copy them back. >>>> not ideal, either. >>>> >>>> In my opinion, this is an R design flow. Data frames are the >>>> fundamental unit of much statistical analysis, and should be fast. I >>>> think R lacks any indexing into data frames. Turning on indexing of >>>> data frames should at least be an optional feature. >>>> >>>> >>>> I hope this message post helps others. 
>>>> >>>> /iaw >>>> >>>> ---- >>>> Ivo Welch (ivo.welch at gmail.com) >>>> http://www.ivo-welch.info/ >>>> >>>> ______________________________**________________ >>>> R-help at r-project.org mailing list >>>> https://stat.ethz.ch/mailman/**listinfo/r-help<https://stat.ethz.ch/mailman/listinfo/r-help> >>>> PLEASE do read the posting guide http://www.R-project.org/** >>>> posting-guide.html<http://www.R-project.org/posting-guide.html> >>>> and provide commented, minimal, self-contained, reproducible code. >>>> >>> >> >> [[alternative HTML version deleted]] >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > ----- Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/Speed-Advice-for-R-avoid-data-frames-tp3640932p3648681.html Sent from the R help mailing list archive at Nabble.com. 
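[Editorial illustration] The round trip suggested in the message above (convert to a matrix, do the element-wise work, convert back) might look roughly like the sketch below. It is a minimal base-R example, not the Hmisc approach: it assumes every column is numeric, uses only as.matrix() and as.data.frame(), and the data and the choice of column 2 are made up for illustration. Data frames with factor or character columns would need the attribute handling that asNumericMatrix and matrix2dataFrame are described as providing.

```r
# Base-R sketch (assumption: all columns are numeric).
set.seed(1)
df <- as.data.frame(matrix(rnorm(200 * 5), nrow = 200))  # hypothetical data, columns V1..V5

m <- as.matrix(df)                # numeric matrix copy of the data frame
for (r in seq_len(nrow(m))) {     # element-wise access on the matrix, not the data frame
  m[r, 2] <- sqrt(abs(m[r, 2]))
}
df2 <- as.data.frame(m)           # back to a data frame; column names are carried over
```

Where the update can be written in vectorized form, as noted earlier in the thread, a single assignment such as df[[2]] <- sqrt(abs(df[[2]])) avoids the loop altogether.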
From murdoch.duncan at gmail.com Wed Jul 6 15:04:47 2011 From: murdoch.duncan at gmail.com (Duncan Murdoch) Date: Wed, 06 Jul 2011 09:04:47 -0400 Subject: [R] including figures in html documentation/help In-Reply-To: <1309956454.93557.YahooMailClassic@web28204.mail.ukl.yahoo.com> References: <1309956454.93557.YahooMailClassic@web28204.mail.ukl.yahoo.com> Message-ID: <4E145D6F.5080506@gmail.com> On 11-07-06 8:47 AM, Jannis wrote: > Dear list members, > > is it somehow possible to include figures to the html help pages of individueal functions (containing for example a plot produced by that function?) > > I thought about adding these figures into a 'graphs' subfolder of the package folder and then to somehow insert some sort of html link into the documentation code. > > I use inlinedocs for creating the documentation. Not in the current release, but this feature has been added to R-devel (which will be released at the end of October). The simplest form is to put \figure{filename.png} into your help page. The "filename.png" file should be stored in the man/figures directory of your package. You can also generate figures using R code, but it's a little tricky to make sure the generated files are stored in the right place. Here's an ugly example, which will probably be simpler by release time: \Sexpr[stage=render,results=rd]{ library(testpkg) # This is the package with the example library(grDevices) filename <- tempfile(fileext=".png") png(file=filename) plot(rnorm(100)) dev.off() paste("\\\\ifelse{html}{\\\\figure{", file.path("../../../session", basename(filename)), "}}{\\\\figure{", normalizePath(filename, "/"), "}}", sep="") } Documentation on this is currently sparse, but it's there. Duncan Murdoch From Bernhard_Pfaff at fra.invesco.com Wed Jul 6 15:17:12 2011 From: Bernhard_Pfaff at fra.invesco.com (Pfaff, Bernhard Dr.) 
Date: Wed, 6 Jul 2011 13:17:12 +0000 Subject: [R] BY GROUP in evir R package In-Reply-To: <1309937119.92702.YahooMailRC@web121711.mail.ne1.yahoo.com> References: <31464.80001.qm@web121719.mail.ne1.yahoo.com> <4DE939D8.8040305@pfaffikus.de> <1309937119.92702.YahooMailRC@web121711.mail.ne1.yahoo.com> Message-ID: Hello Peter, str(rg2) is quite revealing for this; by() returns a list and hence lapply() can be employed, e.g.: lapply(rg2, rlevel.gev, k.blocks = 5) By the same token, you can extract the relevant bits and pieces and put them together in a data.frame. Best, Bernhard > -----Original Message----- > From: r-help-bounces at r-project.org > [mailto:r-help-bounces at r-project.org] On Behalf Of Peter Maclean > Sent: Wednesday, 6 July 2011 09:25 > To: Dr. Bernhard Pfaff > Cc: r-help at r-project.org > Subject: Re: [R] BY GROUP in evir R package > > Dr. Pfaff: > How do we pass the "by" results to "rlevel.gev" function to > get the return level and also save the results (both > rg2(par.ests and $par.ses) and rl) as.data.frame? > > #Grouped vector > Gdata <- data.frame(n = rep(c(1,2,3), each = 100), y = rnorm(300)) > library(evir) > require(plyr) > > #Model for Grouped > rg2<- by(Gdata,Gdata[,"n"], function(x) gev(x$y, 5, method = > "BFGS", control =list(maxit = 500))) # rl <- rlevel.gev(rg2, > k.blocks = 5, add = TRUE) > > > > > ----- Original Message ---- > From: Dr. Bernhard Pfaff > To: Peter Maclean > Sent: Fri, June 3, 2011 2:45:28 PM > Subject: Re: BY GROUP in evir R package > > Hello Peter, > > many thanks for your email. Well, as you might have guessed, > there is also a > function by() in R that does the same job. See help("by") for > more information. > > Best, > Bernhard > > Peter Maclean wrote: > > Hi, > > I am new in R and I want to use your package for data > analysis. I usually use > >SAS. I have rainfall data for different points. Each point > has 120 observations.
> >The rainfall data is in the first column (RAIN) and the > categorical variable > >that group the data is in the second column (GROUP). The > data frame is > >rain.data. How can I use the gev function to estimate all > three parameters by > >GROUP variable group? In SAS there is a by() function that > estimate the model by > >group. However, I would like to move to R. > >? With thanks, > >? Peter Maclean > > Department of Economics > > University of Dar -es- Salaam, Tanzania > > > >? > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > ***************************************************************** Confidentiality Note: The information contained in this ...{{dropped:10}} From clark.thiago at gmail.com Wed Jul 6 14:41:32 2011 From: clark.thiago at gmail.com (Thiago Clark) Date: Wed, 6 Jul 2011 09:41:32 -0300 Subject: [R] Subset creates row_names column when exported to MYSQL Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From el at lisse.NA Wed Jul 6 12:41:11 2011 From: el at lisse.NA (Dr Eberhard Lisse) Date: Wed, 06 Jul 2011 11:41:11 +0100 Subject: [R] aggregation question In-Reply-To: References: Message-ID: <4E143BC7.2040408@lisse.NA> Hi, I am reading payment data like so 2010-01-01,100.00 2010-01-04,100.00 ... 2011-01-01,200.00 2011-01-07,100.00 and plot it aggregated per month like so library(zoo) df <- read.csv("daily.csv", colClasses=c(d="Date",s="numeric")) z <- zoo(df$s, df$d) z.mo <- aggregate(z, as.yearmon, sum) barplot(z.mo, col="darkblue") How do I get the monthly aggregated payments in different colors next to each other (ie for each year in a different color with the x axis showing the months)? 
Solution preferred, but pointers to documentation welcome :-)-O greetings, el -- Dr. Eberhard W. Lisse \ / Obstetrician & Gynaecologist (Saar) el at lisse.NA el108-ARIN / * | Telephone: +264 81 124 6733 (cell) PO Box 8421 \ / Please do NOT email to this address Bachbrecht, Namibia ;____/ if it is DNS related in ANY way From Wim.Delva at ugent.be Wed Jul 6 13:10:14 2011 From: Wim.Delva at ugent.be (Wim Delva) Date: Wed, 6 Jul 2011 13:10:14 +0200 Subject: [R] superimposing network graphs Message-ID: Dear all, I have an undirected network (g), representing all the sexual relationships that ever existed in a model community. I also have a directed edgelist (e) which is a subset of the edgelist of g. e represents the transmission pathway of HIV. Now I would like to superimpose the picture of the sexual relationships with arrows in a different colour, to indicate where in the network HIV was transmitted. Any ideas on how to do this? Many thanks, Wim Wim Delva MD, PhD International Centre for Reproductive Health Ghent University, Belgium www.icrh.org South African Centre for Epidemiological Modelling and Analysis Stellenbosch University, South Africa www.sacema.com epi update: www.sacemaquarterly.com Tel: +27 21 808 27 79 (work) Cell: +27 72 842 82 33 From bt_jannis at yahoo.de Wed Jul 6 16:03:43 2011 From: bt_jannis at yahoo.de (Jannis) Date: Wed, 6 Jul 2011 15:03:43 +0100 (BST) Subject: [R] including figures in html documentation/help In-Reply-To: <4E145D6F.5080506@gmail.com> Message-ID: <1309961023.18291.YahooMailClassic@web28207.mail.ukl.yahoo.com> Thanks for your advice Duncan. In which file should I put the \figure{} command? I tried the *.Rd file, but the html files created are without the figure. Are you sure I only need to include the filename and no path? Jannis --- Duncan Murdoch wrote on Wed, 6.7.2011: > From: Duncan Murdoch > Subject: Re: [R] including figures in html documentation/help > To: "Jannis" > CC: r-help at r-project.org > Date: Wednesday, 6 July 2011 13:04 > On 11-07-06 8:47 AM, Jannis wrote: > > Dear list members, > > > > is it somehow possible to include figures to the html > help pages of individueal functions (containing for example > a plot produced by that function?) > > > > I thought about adding these figures into a 'graphs' > subfolder of the package folder and then to somehow insert > some sort of html link into the documentation code. > > > > I use inlinedocs for creating the documentation. > > Not in the current release, but this feature has been added > to R-devel (which will be released at the end of October). > > The simplest form is to put > > \figure{filename.png} > > into your help page. The "filename.png" file should > be stored in the man/figures directory of your package. > > You can also generate figures using R code, but it's a > little tricky to make sure the generated files are stored in > the right place. Here's an ugly example, which will > probably be simpler by release time: > > \Sexpr[stage=render,results=rd]{ > library(testpkg) # This is > the package with the example > library(grDevices) > filename <- > tempfile(fileext=".png") > png(file=filename) > plot(rnorm(100)) > dev.off() > > paste("\\\\ifelse{html}{\\\\figure{", > file.path("../../../session", basename(filename)), > > "}}{\\\\figure{", normalizePath(filename, > "/"), "}}", sep="") > } > > > Documentation on this is currently sparse, but it's there. > > Duncan Murdoch > From murdoch.duncan at gmail.com Wed Jul 6 16:11:14 2011 From: murdoch.duncan at gmail.com (Duncan Murdoch) Date: Wed, 06 Jul 2011 10:11:14 -0400 Subject: [R] including figures in html documentation/help In-Reply-To: <1309961023.18291.YahooMailClassic@web28207.mail.ukl.yahoo.com> References: <1309961023.18291.YahooMailClassic@web28207.mail.ukl.yahoo.com> Message-ID: <4E146D02.1050903@gmail.com> On 06/07/2011 10:03 AM, Jannis wrote: > Thanks for your advice Duncan.
In which file should I put the > > \figure{} > > command? I tried the *.Rd file, but the html files created are without the figure. Are you sure I only need to include the filename and no path? Yes, in the .Rd file. Can't really diagnose what went wrong for you, but my first guess would be that you're not using a sufficiently recent R-devel. Duncan Murdoch > > Jannis > > --- Duncan Murdoch schrieb am Mi, 6.7.2011: > > > Von: Duncan Murdoch > > Betreff: Re: [R] including figures in html documentation/help > > An: "Jannis" > > CC: r-help at r-project.org > > Datum: Mittwoch, 6. Juli, 2011 13:04 Uhr > > On 11-07-06 8:47 AM, Jannis wrote: > > > Dear list members, > > > > > > is it somehow possible to include figures to the html > > help pages of individueal functions (containing for example > > a plot produced by that function?) > > > > > > I thought about adding these figures into a 'graphs' > > subfolder of the package folder and then to somehow insert > > some sort of html link into the documentation code. > > > > > > I use inlinedocs for creating the documentation. > > > > Not in the current release, but this feature has been added > > to R-devel (which will be released at the end of October). > > > > The simplest form is to put > > > > \figure{filename.png} > > > > into your help page. The "filename.png" file should > > be stored in the man/figures directory of your package. > > > > You can also generate figures using R code, but it's a > > little tricky to make sure the generated files are stored in > > the right place. 
Here's an ugly example, which will > > probably be simpler by release time: > > > > \Sexpr[stage=render,results=rd]{ > > library(testpkg) # This is > > the package with the example > > library(grDevices) > > filename<- > > tempfile(fileext=".png") > > png(file=filename) > > plot(rnorm(100)) > > dev.off() > > > > paste("\\\\ifelse{html}{\\\\figure{", > > file.path("../../../session", basename(filename)), > > > > "}}{\\\\figure{", normalizePath(filename, > > "/"), "}}", sep="") > > } > > > > > > Documentation on this is currently sparse, but it's there. > > > > Duncan Murdoch > > From bt_jannis at yahoo.de Wed Jul 6 16:37:21 2011 From: bt_jannis at yahoo.de (Jannis) Date: Wed, 6 Jul 2011 15:37:21 +0100 (BST) Subject: [R] including figures in html documentation/help In-Reply-To: <4E146D02.1050903@gmail.com> Message-ID: <1309963041.69973.YahooMailClassic@web28206.mail.ukl.yahoo.com> Dear Duncan, OK, my fault. I did not realize that you only refer to r-devel. I found however a way for standard R by putting this into the Rd file: \details{ \if{html}{\out{image .. should be
here}}\ifelse{latex}{}{}
}

And saving the figure in inst/doc. Now I just need to find a way to get inlinedocs to pass this code from the source code of the function directly into the Rd files, but I am sure some googling will help me :-).

Thanks again for the help!
Jannis

--- Duncan Murdoch wrote on Wed, 6 Jul 2011:

> From: Duncan Murdoch
> Subject: Re: [R] including figures in html documentation/help
> To: "Jannis"
> CC: r-help at r-project.org
> Date: Wednesday, 6 July 2011, 14:11
> On 06/07/2011 10:03 AM, Jannis wrote:
> > Thanks for your advice Duncan. In which file should I put the
> >
> > \figure{}
> >
> > command? I tried the *.Rd file, but the html files created are without the figure. Are you sure I only need to include the filename and no path?
>
> Yes, in the .Rd file.
>
> Can't really diagnose what went wrong for you, but my first guess would be that you're not using a sufficiently recent R-devel.
>
> Duncan Murdoch

From gm.spam2011 at gmail.com Wed Jul 6 16:37:28 2011
From: gm.spam2011 at gmail.com (B Laura)
Date: Wed, 6 Jul 2011 16:37:28 +0200
Subject: [R] time zone issue - beginners question
Message-ID:

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From nicolas.chapados at gmail.com Wed Jul 6 16:49:36 2011
From: nicolas.chapados at gmail.com (Nicolas Chapados)
Date: Wed, 6 Jul 2011 10:49:36 -0400
Subject: [R] forecast: bias in sampling from seasonal ARIMA model?
Message-ID:

[Reposting, since there were problems with encodings the first time around.]

Dear all,

I stumbled upon what appears to be a troublesome issue when sampling from an ARIMA model (from Rob Hyndman's excellent 'forecast' package) that contains a seasonal AR component. Here's how to reproduce the issue. (I'm using R 2.9.2 with forecast 2.19; see sessionInfo() below.)

First some data: > x <- c( 0.132475, 0.143119, 0.108104, 0.247291, 0.029510, -0.119591, -0.133313, -0.098128, 0.192698, 0.110328, 0.163671, -0.004925, -0.239209, -0.055122, -0.051121, 0.154108, 0.008665, -0.074702, 0.066534, -0.098728, -0.068668, 0.150935, -0.022547, 0.028625, 0.107092, -0.065396, -0.253247, -0.115240, -0.113535, -0.064191, -0.006032, 0.039233, 0.129013, -0.068462, 0.022398, -0.052427, -0.005586, 0.011447, -0.022667, -0.120536, -0.234398, -0.164087, -0.177160, -0.120624, -0.025104, 0.001144, -0.193424, -0.260674, -0.036976, -0.009590, -0.004920, 0.130545, 0.120527, 0.041121, -0.123321, 0.023836, -0.188418, 0.015807, -0.056012, 0.000496, 0.051806, -0.067574, 0.012775, 0.244083, 0.148857, 0.013874, 0.235252, 0.151935, 0.036986, 0.134482, -0.003359, -0.019422, 0.086195, 0.206569, 0.123565, 0.070835, -0.183189, -0.046513, 0.071920, -0.038360, 0.135293, 0.054746, -0.280340, 0.110638, 0.009729, 0.115541, 0.021397, 0.097835, -0.028434, -0.218416, 0.044552, 0.442563, 0.084317, 0.044149, 0.201100, 0.076112, -0.134955, 0.023870, 0.077111, 0.085490, 0.023154, 0.099757, -0.026509, -0.189839, 0.026614, 0.184916, -0.007266, 0.081276, 0.312526, 0.051199, -0.104707, -0.004206, 0.062440, 0.126385, -0.018100, 0.092513, 0.186459, -0.170184, -0.126168, 0.122739, 0.097495, 0.008633, -0.034519, 0.187264, -0.153409, 0.009440, 0.150561, 0.067744, 0.045129, 0.230831, -0.079700, -0.162694, -0.044251, -0.007663, 0.048986, 0.065724, 0.159706, 0.040067, -0.059949, 0.024810, -0.154852, 0.018080, 0.165935, 0.203050, 0.011035, -0.232585, -0.162248, -0.104872, -0.062516, -0.089766, 0.100304, 0.142170, -0.144969, -0.032500, -0.002131, 0.165890, 0.107629, 0.075752, 0.119003, 0.095955, 0.039842, 0.081208, 0.348529, 0.145694, -0.210700, 0.384966, -0.054503, 0.293329, 0.184295, 0.368986, 0.135270, 0.124917, 0.185286, -0.252088, -0.169708, -0.010204, 0.021934, 0.003572, 0.180148, 0.075836, -0.232065, -0.127255, -0.147122, 0.056163, 0.067004, 0.217810, 0.074513, -0.167389, 0.172578, 
-0.148127, 0.057025, 0.042623, 0.094214, 0.047004, -0.345453, -0.265104, -0.082897, 0.052705, -0.067002, 0.191941, 0.010989, -0.298567, -0.162841, 0.043773, 0.185459, 0.126305, 0.383101, 0.092747, -0.368453, -0.325097, 0.029564, -0.015390, 0.013807, 0.152062, -0.047015, -0.429245, -0.097742, 0.104502, -0.007547, -0.000245, 0.062830, 0.030093, -0.381043, -0.267704, -0.125930, -0.032264, -0.041657, 0.040073, 0.084431, -0.276316, -0.305253, -0.019942, 0.045390, 0.046090, 0.145700, 0.069920, -0.210079, 0.050967, 0.042283, 0.248840, 0.007883, 0.203171, 0.050722, -0.109773, -0.110301, -0.095433, 0.071133, 0.023793, 0.192476, 0.057746)

First, a CORRECT model, containing a seasonal MA component but no seasonal AR component. After estimation, I forecast for 1 time-step, and I take the mean of sampling 10000 times from the same model:

> my.arima1 <- Arima(x, order=c(3,0,0), seasonal=list(order=c(0,0,2), period=7), include.mean=FALSE)
> forecast(my.arima1, 1)
    Point Forecast      Lo 80     Hi 80     Lo 95     Hi 95
251    -0.03143283 -0.1882245 0.1253589 -0.271225 0.2083594
> set.seed(1827) ; mean(sapply(seq_len(10000), function(i) as.numeric(simulate(my.arima1, 1)) ))
[1] -0.03258454

The results ("Point Forecast" versus the output of mean()) are identical up to some sampling error.

Now the INCORRECT model arises from adding one seasonal AR component:

> my.arima2 <- Arima(x, order=c(3,0,0), seasonal=list(order=c(1,0,2), period=7), include.mean=FALSE)
> forecast(my.arima2, 1)
    Point Forecast     Lo 80       Hi 80      Lo 95      Hi 95
251     -0.1848579 -0.322421 -0.04729492 -0.3952424 0.02552655
> set.seed(1827) ; mean(sapply(seq_len(10000), function(i) as.numeric(simulate(my.arima2, 1)) ))
[1] -0.05416299

Here the results are substantially different (-0.18 versus -0.05), and the latter does not change much if I take a much bigger sample.

Did anybody encounter this in the past? Is this a bug?

For reference, here are the results of sessionInfo():

> sessionInfo()
R version 2.9.2 (2009-08-24)
x86_64-pc-linux-gnu

locale:
C

attached base packages:
[1] splines   stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
 [1] KernSmooth_2.23-3  digest_0.4.2       forecast_2.19      fracdiff_1.3-2
 [5] tseries_0.10-12    zoo_1.4-0          quadprog_1.4-11    glmnet_1.7
 [9] Matrix_0.999375-30 biglm_0.7          DBI_0.2-4          inline_0.3.7
[13] XML_2.6-0          timeSeries_2110.86 timeDate_2110.87   RODBC_1.3-2
[17] reshape_0.8.3      plyr_0.1.9         MASS_7.2-48        nnet_7.2-48
[21] latticeExtra_0.5-1 RColorBrewer_1.0-2 lattice_0.17-25    gsubfn_0.5-7
[25] proto_0.3-8        multicore_0.1-3    data.table_1.4.1

loaded via a namespace (and not attached):
[1] grid_2.9.2  tools_2.9.2

Many thanks for any help or pointers!
+ Nicolas Chapados

From murdoch.duncan at gmail.com Wed Jul 6 17:19:05 2011
From: murdoch.duncan at gmail.com (Duncan Murdoch)
Date: Wed, 06 Jul 2011 11:19:05 -0400
Subject: [R] including figures in html documentation/help
In-Reply-To: <1309963041.69973.YahooMailClassic@web28206.mail.ukl.yahoo.com>
References: <1309963041.69973.YahooMailClassic@web28206.mail.ukl.yahoo.com>
Message-ID: <4E147CE9.5050704@gmail.com>

On 06/07/2011 10:37 AM, Jannis wrote:
> Dear Duncan,
>
> OK, my fault. I did not realize that you only refer to r-devel. I found however a way for standard R by putting this into the Rd file:
>
>
> \details{
> \if{html}{\out{image .. should be
> here}}\ifelse{latex}{}{}
> }
>
> And saving the figure in inst/doc.

I think that will work. I'd recommend putting the alt text in place to be displayed in text and LaTeX versions of the page. (R-devel will include the figure in LaTeX, alt text in text.)

Duncan Murdoch

> Now I just need to find a way to get Inlinedocs to pass this code from the sourcecode of the function directly into the Rd files but I am sure some googeling will help me :-).
>
> Thanks again for the help!
> Jannis

From justin_bem at yahoo.fr Wed Jul 6 17:19:46 2011
From: justin_bem at yahoo.fr (justin bem)
Date: Wed, 6 Jul 2011 16:19:46 +0100
Subject: [R] Install.package error
Message-ID: <1309965586.35130.YahooMailNeo@web29512.mail.ird.yahoo.com>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From annakolar at yahoo.com Wed Jul 6 17:35:32 2011
From: annakolar at yahoo.com (Ana Kolar)
Date: Wed, 6 Jul 2011 08:35:32 -0700 (PDT)
Subject: [R] matching, treatment effect-ATT and Zelig package
Message-ID: <1309966532.97789.YahooMailNeo@web114712.mail.gq1.yahoo.com>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From j.maas at uea.ac.uk Wed Jul 6 17:35:02 2011
From: j.maas at uea.ac.uk (Jim Maas)
Date: Wed, 06 Jul 2011 16:35:02 +0100
Subject: [R] elegant way of removing NA's and selecting specific values from a data.frame
Message-ID: <4E1480A6.10205@uea.ac.uk>

I have a data.frame "e" and would like to extract the 23rd column, remove any NA's and then remove any values >= 30. I can do it in steps such as this but have failed to figure out how to do it in a single line .... any suggestions?

first <- e[,23]
second <- first[!is.na(first)]
third <- second[second<=30]

thanks a bunch

J

--
Dr. Jim Maas
University of East Anglia

From gunter.berton at gene.com Wed Jul 6 17:49:58 2011
From: gunter.berton at gene.com (Bert Gunter)
Date: Wed, 6 Jul 2011 08:49:58 -0700
Subject: [R] elegant way of removing NA's and selecting specific values from a data.frame
In-Reply-To: <4E1480A6.10205@uea.ac.uk>
References: <4E1480A6.10205@uea.ac.uk>
Message-ID:

?"&"

This is basic. Please read "An Intro to R" before posting any more such questions if you have not already done so.

-- Bert

On Wed, Jul 6, 2011 at 8:35 AM, Jim Maas wrote:
> I have a data.frame "e" and would like to extract the 23rd column, remove any NA's and then remove any values >= 30. I can do it in steps such as this but have failed to figure out how to do it in a single line .... any suggestions?
>
> first <- e[,23]
> second <- first[!is.na(first)]
> third <- second[second<=30]
>
> thanks a bunch
>
> J
>
> --
> Dr. Jim Maas
> University of East Anglia
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--
"Men by nature long to get on to the ultimate truths, and will often be impatient with elementary studies or fight shy of them.
If it were possible to reach the ultimate truths without the elementary studies usually prefixed to them, these would not be preparatory studies but superfluous diversions."

-- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics

From ggrothendieck at gmail.com Wed Jul 6 18:02:46 2011
From: ggrothendieck at gmail.com (Gabor Grothendieck)
Date: Wed, 6 Jul 2011 12:02:46 -0400
Subject: [R] time zone issue - beginners question
In-Reply-To:
References:
Message-ID:

On Wed, Jul 6, 2011 at 10:37 AM, B Laura wrote:
> Hello all!
>
> As beginner I'm struggling for a while with time zones issue and can't find a suitable solution.
> I would be grateful for any help.
>
> Dataset imported from excel has a variable transplant.date which has been recorded with CET time zone.
>
>> subDataset$transplant.date
>  [1] "2000-01-01 CET" "2000-01-01 CET" "2000-01-02 CET" "2000-01-02 CET" "2000-01-02 CET" "2000-01-02 CET" "2000-01-04 CET" "2000-01-04 CET" "2000-01-04 CET" "2000-01-04 CET" "2000-01-04 CET" "2000-01-05 CET" "2000-01-05 CET"
> [14] "2000-01-05 CET" "2000-01-05 CET"
>
>
> However
>> Sys.time()
> [1] "2011-07-06 15:22:44 CEST"
>
> I need to calculate time difference in days but I'm still getting wrong calculations. Most likely is this time zone issue.
>
>
>> as.numeric(as.Date("2000-1-1")-as.Date(subDataset$transplant.date))
> [1] 1 1 0 0 0 0 -2 -2 -2 -2 -2 -3 -3 -3 -3
>
>
> Truncation doesn't help either
>
>> trunc(as.Date("2000-1-1"),"days")-trunc(as.Date(subDataset$transplant.date),"days")
> Time differences in days
>  [1]  1  1  0  0  0  0 -2 -2 -2 -2 -2 -3 -3 -3 -3
>
>

If you don't have times, which seems to be the situation here, use "Date" class rather than "POSIXct". Then you can't have time zone issues in the first place. See R News 4/1.

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

From jdnewmil at dcn.davis.ca.us Wed Jul 6 18:12:29 2011
From: jdnewmil at dcn.davis.ca.us (Jeff Newmiller)
Date: Wed, 06 Jul 2011 09:12:29 -0700
Subject: [R] time zone issue - beginners question
In-Reply-To:
References:
Message-ID:

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From eggers at econ.uni-hamburg.de Wed Jul 6 16:31:29 2011
From: eggers at econ.uni-hamburg.de (Felix Eggers - Uni Hamburg)
Date: Wed, 6 Jul 2011 16:31:29 +0200 (CEST)
Subject: [R] mlogit: nested model with constant option in degenerate partition
Message-ID: <41361.84.72.112.253.1309962689.squirrel@hermes1.econ.uni-hamburg.de>

Dear R users,

I am trying to estimate a nested logit model that has a constant option in a degenerate partition. The data set is a conjoint survey where respondents were asked to answer multiple choice sets. The choice sets consist of three alternatives (described by four varying attributes) and a no-choice option, i.e., none of the three alternatives.

In the nested model one branch should represent the three varying choice options and the other branch the no-choice option. However, the R code I am using results in an error message because the no-choice option is not varying in the four attributes that I included in the mlogit formula. Is there another way of coding the no-choice option in this way?

This is the R code I am using:

> library(mlogit)
> IA<-read.csv2("test.csv")
# attributes are R, H, J, V, each having three levels.
# r1, r2, h1, h2, etc are effect codes of these attribute levels.
> head(IA, 20)
   Resp_id Resp_count chid Alternative_id Alt_text r1 r2 h1 h2 j1 j2 v1 v2 none_dummy selection_dummy
1   100007          1    1              1     Alt1  0  1  0  1 -1 -1  0  1          0               0
2   100007          1    1              2     Alt2  1  0  1  0  0  1  1  0          0               0
3   100007          1    1              3     Alt3 -1 -1 -1 -1  1  0 -1 -1          0               1
4   100007          1    1              4 NoChoice  0  0  0  0  0  0  0  0          1               0
5   100007          1    2              1     Alt1  0  1 -1 -1  0  1 -1 -1          0               0
6   100007          1    2              2     Alt2 -1 -1  1  0 -1 -1  1  0          0               1
7   100007          1    2              3     Alt3  1  0  0  1  1  0  0  1          0               0
8   100007          1    2              4 NoChoice  0  0  0  0  0  0  0  0          1               0
9   100007          1    3              1     Alt1  0  1  1  0  1  0  1  0          0               0
10  100007          1    3              2     Alt2 -1 -1  0  1  0  1  0  1          0               0
11  100007          1    3              3     Alt3  1  0 -1 -1 -1 -1 -1 -1          0               0
12  100007          1    3              4 NoChoice  0  0  0  0  0  0  0  0          1               1
13  100007          1    4              1     Alt1  0  1  1  0  1  0 -1 -1          0               0
14  100007          1    4              2     Alt2 -1 -1  0  1  0  1  1  0          0               1
15  100007          1    4              3     Alt3  1  0 -1 -1 -1 -1  0  1          0               0
16  100007          1    4              4 NoChoice  0  0  0  0  0  0  0  0          1               0
17  100007          1    5              1     Alt1 -1 -1 -1 -1  1  0  0  1          0               0
18  100007          1    5              2     Alt2  0  1  0  1 -1 -1  1  0          0               0
19  100007          1    5              3     Alt3  1  0  1  0  0  1 -1 -1          0               0
20  100007          1    5              4 NoChoice  0  0  0  0  0  0  0  0          1               1

> IADATA<-mlogit.data(IA, choice="selection_dummy", shape="long", alt.var="Alt_text", id.var="Resp_count", chid="chid")
> nl<-mlogit(selection_dummy~r1+r2+h1+h2+j1+j2+v1+v2 | 0, data=IADATA, nests = list(opt1 = c("Alt1", "Alt2", "Alt3"), opt2 = "NoChoice"), unscaled=TRUE )
Error in solve.default(crossprod(attr(x, "gradi")[, !fixed])) :
  Lapack routine dgesv: system is exactly singular

Thank you for your help!

Best,
Felix

---
Felix Eggers

From aparna.sampath26 at gmail.com Wed Jul 6 17:31:10 2011
From: aparna.sampath26 at gmail.com (Aparna Sampath)
Date: Wed, 6 Jul 2011 08:31:10 -0700 (PDT)
Subject: [R] Create simulated data's using mvrnorm
Message-ID: <1309966270600-3649118.post@n4.nabble.com>

Hi All

This might be something very trivial but I seem to miss something in the syntax or logic which makes me keep wandering around the problem without arriving at a solution.

What I want to do is to simulate a sample data for performing cluster analysis.
I tried to use

x1= mvrnorm(10,rep(0.8,3),diag(3))
x2= mvrnorm(10,rep(0,3),diag(3))
x3= mvrnorm(10,rep(-0.5,3),diag(3))
x=rbind(x1,x2,x3)

I would like to use table() to see if the separation is wide enough that the first 10 rows of x are clustered together. For example, when I use table(), I would like to get an output like

     1  2  3
  1 10  0  0
  2  0 10  0
  3  0  0 10

Is there any way to get this kind of output?

Thanks a lot! :)

--
View this message in context: http://r.789695.n4.nabble.com/Create-simulated-data-s-using-mvrnorm-tp3649118p3649118.html
Sent from the R help mailing list archive at Nabble.com.

From noxyport at gmail.com Wed Jul 6 15:40:11 2011
From: noxyport at gmail.com (Pete Pete)
Date: Wed, 6 Jul 2011 06:40:11 -0700 (PDT)
Subject: [R] Reshape from long to wide format with date variable
Message-ID: <1309959611143-3648833.post@n4.nabble.com>

Hi,

I need to reshape my dataframe from a long format to a wide format. Unfortunately, I have a continuous date variable which gives me headaches.

Consider the following example:
> id=c("034","034","016","016","016","340","340")
> date=as.Date(c("1997-09-28", "1997-10-06", "1997-11-04", "2000-09-27", "2003-07-20", "1997-11-08", "1997-11-08"))
> ref=c("2","2","1","1","2","1","1")
> data1=data.frame(id,date,ref)
> data1
   id       date ref
1 034 1997-09-28   2
2 034 1997-10-06   2
3 016 1997-11-04   1
4 016 2000-09-27   1
5 016 2003-07-20   2
6 340 1997-11-08   1
7 340 1997-11-08   1

I would like to have it like this:
> data2
   id      date1      date2      date3 ref1 ref2 ref3
1 034 1997-09-28 1997-10-06         NA    2    2   NA
2 016 1997-11-04 2000-09-27 2003-07-20    1    1    2
3 340 1997-11-08 1997-11-08         NA    1    1   NA

I tried the reshape package but ended up with multiple variables for each of the dates, and that is not what I would like to have.

Thanks for your help.

--
View this message in context: http://r.789695.n4.nabble.com/Reshape-from-long-to-wide-format-with-date-variable-tp3648833p3648833.html
Sent from the R help mailing list archive at Nabble.com.

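[Editorial sketch] One possible base-R way to get the layout requested in the message above is to number the records within each id and then call stats::reshape() with sep = "". This is only a sketch, not the poster's or the list's solution: the helper column name `visit` and the final column reordering are assumptions introduced here for illustration, and stringsAsFactors = FALSE is set so the result does not depend on the R version.

```r
# Rebuild the example data from the post above.
data1 <- data.frame(
  id   = c("034", "034", "016", "016", "016", "340", "340"),
  date = as.Date(c("1997-09-28", "1997-10-06", "1997-11-04", "2000-09-27",
                   "2003-07-20", "1997-11-08", "1997-11-08")),
  ref  = c("2", "2", "1", "1", "2", "1", "1"),
  stringsAsFactors = FALSE
)

# 'visit' is a hypothetical helper column: it numbers the records
# within each id (1, 2, 3, ...) in the order in which they appear.
data1$visit <- ave(seq_len(nrow(data1)), data1$id, FUN = seq_along)

# One row per id; sep = "" gives the names date1, ref1, date2, ref2, ...
data2 <- reshape(data1, idvar = "id", timevar = "visit",
                 direction = "wide", sep = "")

# Put the date columns first, then the ref columns, and reset the row names.
n_visits <- max(data1$visit)
data2 <- data2[, c("id",
                   paste0("date", seq_len(n_visits)),
                   paste0("ref",  seq_len(n_visits)))]
rownames(data2) <- NULL
data2
```

Ids with fewer records than the maximum simply get NA in the unused date and ref columns, which matches the NA cells in the desired table.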
From marchywka at hotmail.com Wed Jul 6 15:49:56 2011
From: marchywka at hotmail.com (Mike Marchywka)
Date: Wed, 6 Jul 2011 09:49:56 -0400
Subject: [R] superimposing network graphs
In-Reply-To:
References:
Message-ID:

> Date: Wed, 6 Jul 2011 13:10:14 +0200
> To: r-help at r-project.org
> CC: buttsc at uci.edu
> Subject: [R] superimposing network graphs
>
> Dear all,
>
> I have a undirected network (g), representing all the sexual relationships that ever existed in a model community.
> I also have a directed edgelist (e) which is a subset of the edgelist of g. e represents the transmission pathway of HIV.
> Now I would like to superimpose the picture of the sexual relationships with arrows in a different colour, to indicate where in the network HIV was transmitted.

If you can't find an R answer, I've found that "dot" from graphviz is really helpful. It should not be too hard to generate the source code from your raw data file or R but others may have a more R answer. Package sna may be helpful for an R-only approach.

>
> Any ideas on how to do this?
> Many thanks,
>
> Wim
>
>
>
> Wim Delva MD, PhD
> International Centre for Reproductive Health
> Ghent University, Belgium
> www.icrh.org
>
> South African Centre for Epidemiological Modelling and Analysis
> Stellenbosch University, South Africa
> www.sacema.com
> epi update: www. sacemaquarterly.com
>
> Tel: +27 21 808 27 79 (work)
> Cell: +27 72 842 82 33
>

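[Editorial sketch] The overlay asked about in this thread can also be drawn with plain base graphics, which is a different technique from the graphviz and sna routes suggested above. The sketch below assumes both edge lists are two-column character matrices and uses made-up toy data; the node names, the circular layout and the 0.85 shortening factor are all hypothetical choices for illustration.

```r
# Hypothetical toy data (not from the post): five people, made-up relationships.
g_edges <- matrix(c("A", "B",
                    "B", "C",
                    "C", "D",
                    "A", "D",
                    "B", "E"), ncol = 2, byrow = TRUE)  # undirected: all relationships
e_edges <- matrix(c("A", "B",
                    "B", "E"), ncol = 2, byrow = TRUE)  # directed: from -> to

# Place the nodes on a circle so that both layers share the same coordinates.
nodes <- sort(unique(as.vector(g_edges)))
theta <- 2 * pi * (seq_along(nodes) - 1) / length(nodes)
xy <- cbind(x = cos(theta), y = sin(theta))
rownames(xy) <- nodes

plot(xy, type = "n", axes = FALSE, xlab = "", ylab = "", asp = 1,
     xlim = c(-1.2, 1.2), ylim = c(-1.2, 1.2))

# Layer 1: every relationship as a grey line.
segments(xy[g_edges[, 1], "x"], xy[g_edges[, 1], "y"],
         xy[g_edges[, 2], "x"], xy[g_edges[, 2], "y"],
         col = "grey60", lwd = 2)

# Layer 2: transmission events as red arrows on top. Each arrow stops
# short of its target so the arrowhead is not covered by the node symbol.
from <- xy[e_edges[, 1], , drop = FALSE]
to   <- xy[e_edges[, 2], , drop = FALSE]
tip  <- from + 0.85 * (to - from)
arrows(from[, "x"], from[, "y"], tip[, "x"], tip[, "y"],
       col = "red", lwd = 2, length = 0.12)

# Nodes and labels last, so they sit above both edge layers.
points(xy, pch = 21, bg = "white", cex = 3)
text(xy, labels = nodes)
```

Because both layers index the same coordinate matrix, the directed subset always lines up with the undirected picture; a layout computed by a network package could be substituted for the circle without changing the drawing calls.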
From nsoranzo at tiscali.it Wed Jul 6 16:15:25 2011
From: nsoranzo at tiscali.it (Nicola Soranzo)
Date: Wed, 6 Jul 2011 16:15:25 +0200
Subject: [R] hdf5 library install issue
In-Reply-To: <4D5EA2CB.6040409@ibt.unam.mx>
Message-ID: <201107061615.25776.nsoranzo@tiscali.it>

On Fri Feb 18 17:48:11 CET 2011 Jérôme wrote:
> I'm trying to install the hdf5 library into R. The HDF5 package is installed in a special directory, distributed accross my cluster:
> /share/apps/HDF5
>
> So i put the args option to the configure command as i read in previous post in the list:
>
> jerome]# R CMD INSTALL --configure-vars='LDFLAGS=-L/share/apps/HDF5/lib' --configure-args='--with-hdf5=/share/apps/HDF5' hdf5_1.6.9.tar.gz
>
> * installing to library '/share/apps/R-core-2.12.1_SHLIB/lib64/R/library'
> * installing *source* package 'hdf5' ...
> checking for gcc... gcc
> checking for C compiler default output file name... a.out
> checking whether the C compiler works... yes
> checking whether we are cross compiling... no
> checking for suffix of executables...
> checking for suffix of object files... o
> checking whether we are using the GNU C compiler... yes
> checking whether gcc accepts -g... yes
> checking for gcc option to accept ANSI C... none needed
> checking for library containing inflate... -lz
> checking for library containing H5open... -lhdf5
> checking for sufficiently new HDF5... yes
> configure: creating ./config.status
> config.status: creating src/Makevars
> ** libs
> gcc -std=gnu99 -I/share/apps/R-core-2.12.1_SHLIB/lib64/R/include -I/share/apps/HDF5/include -I/usr/local/include -fpic -g -O2 -c hdf5.c -o hdf5.o
> gcc -std=gnu99 -shared -L/usr/local/lib64 -o hdf5.so hdf5.o -Wl,-rpath,/share/apps/HDF5/lib -lhdf5 -lz -lm -L/share/apps/R-core-2.12.1_SHLIB/lib64/R/lib -lR
> /usr/bin/ld: cannot find -lhdf5
> collect2: ld returned 1 exit status
> make: *** [hdf5.so] Error 1
> ERROR: compilation failed for package 'hdf5'
> * removing '/share/apps/R-core-2.12.1_SHLIB/lib64/R/library/hdf5'
>
>
> So, the problem is that the configure don't use in a correct way the option "--with-hdf5", and neither the other configure-vars option.
>
> How can i do to have this library usable in R?

Dear Jérôme,
I had a similar problem, you can try a patch I developed (and sent to hdf5 package maintainer with no luck...)

http://biowiki.crs4.it/biowiki/NicolaSoranzo#Patches_for_R_packages

Regards,
Nicola

From paul.guilhamon at gmail.com Wed Jul 6 18:04:57 2011
From: paul.guilhamon at gmail.com (pguilha)
Date: Wed, 6 Jul 2011 09:04:57 -0700 (PDT)
Subject: [R] clustering based on most significant pvalues does not separate the groups!
In-Reply-To: <98B156BB22D11342A931E823798D434853E9728042@GOLD.corp.lgc-group.com>
References: <1309803730461-3644249.post@n4.nabble.com> <98B156BB22D11342A931E823798D434853E9728042@GOLD.corp.lgc-group.com>
Message-ID: <1309968297082-3649233.post@n4.nabble.com>

Yes absolutely, your explanation makes sense. Thanks very much.

rgds

Paul

--
View this message in context: http://r.789695.n4.nabble.com/clustering-based-on-most-significant-pvalues-does-not-separate-the-groups-tp3644249p3649233.html
Sent from the R help mailing list archive at Nabble.com.

From jms2cor4 at gmail.com Wed Jul 6 18:42:43 2011
From: jms2cor4 at gmail.com (Jamie Smith)
Date: Wed, 6 Jul 2011 11:42:43 -0500
Subject: [R] working with values from ranef()
Message-ID:

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From sarahlecl at hotmail.com Wed Jul 6 15:50:30 2011
From: sarahlecl at hotmail.com (Sarah Leclaire)
Date: Wed, 6 Jul 2011 14:50:30 +0100
Subject: [R] relative euclidean distance
Message-ID:

Hi,

I would like to calculate the RELATIVE euclidean distance. Is there a function in R which does it?

(I calculated the abundance of 94 chemical compounds in the secretions of several individuals, and I would like to have the chemical distance between 2 individuals as expressed by the relative euclidean distance. Some compounds are in very low abundance whereas others are in high abundance, which is why I would like to correct for the abundance.)

Thanks,
Sarah

From rmartinezg at cnio.es Wed Jul 6 17:27:11 2011
From: rmartinezg at cnio.es (Raquel Martinez Garcia)
Date: Wed, 06 Jul 2011 17:27:11 +0200
Subject: [R] wgcna
Message-ID: <4E147ECF.5000309@cnio.es>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From jwiley.psych at gmail.com Wed Jul 6 19:05:49 2011
From: jwiley.psych at gmail.com (Joshua Wiley)
Date: Wed, 6 Jul 2011 10:05:49 -0700
Subject: [R] Reshape from long to wide format with date variable
In-Reply-To: <1309959611143-3648833.post@n4.nabble.com>
References: <1309959611143-3648833.post@n4.nabble.com>
Message-ID:

Hi Pete,

Try the reshape function (see ?reshape for documentation). It can be a bit confusing, but it's worth learning if you deal with multiple observations per unit much. Code inline does what you want (though you might need a bit of tweaking to get pretty names, etc.).

HTH,

Josh

On Wed, Jul 6, 2011 at 6:40 AM, Pete Pete wrote:
> Hi,
>
> I need to reshape my dataframe from a long format to a wide format.
> Unfortunately, I have a continuous date variable which gives me headaches.
>
> Consider the following example:
>> id=c("034","034","016","016","016","340","340")
>> date=as.Date(c("1997-09-28", "1997-10-06", "1997-11-04", "2000-09-27",
>> "2003-07-20", "1997-11-08", "1997-11-08"))
>> ref=c("2","2","1","1","2","1","1")
>> data1=data.frame(id,date,ref)

## create time variable
data1$time <- with(data1, ave(1:nrow(data1), id, FUN = seq_along))
wdata1 <- reshape(data1, idvar = "id", timevar = "time", direction = "wide")

> wdata1
   id     date.1 ref.1     date.2 ref.2     date.3 ref.3
1 034 1997-09-28     2 1997-10-06     2       <NA>  <NA>
3 016 1997-11-04     1 2000-09-27     1 2003-07-20     2
6 340 1997-11-08     1 1997-11-08     1       <NA>  <NA>

--
Joshua Wiley
Ph.D.
Student, Health Psychology
University of California, Los Angeles
https://joshuawiley.com/

From peter.langfelder at gmail.com Wed Jul 6 19:06:15 2011
From: peter.langfelder at gmail.com (Peter Langfelder)
Date: Wed, 6 Jul 2011 10:06:15 -0700
Subject: [R] wgcna
In-Reply-To: <4E147ECF.5000309@cnio.es>
References: <4E147ECF.5000309@cnio.es>
Message-ID:

On Wed, Jul 6, 2011 at 8:27 AM, Raquel Martinez Garcia wrote:
> Hi,
>
> I'm running a tutorial ("Meta-analyses of data from two (or more) microarray data sets"), which use wgcna package. I have an error in the function modulePreservation (it is below).
> I'm using R2.13
> Can you help me? Do you know, what is happens?

Hi Raquel,

I'm the author of the function. I see you have already modified the tutorial with your own input. The error you see may be a bug in the function, but it may also be due to the fact that you use only 2 permutations. I suggest you try to increase the number of permutations to at least 10, but for meaningful results you should use at least 50.

HTH,

Peter

From pmaclean2011 at yahoo.com Wed Jul 6 19:18:10 2011
From: pmaclean2011 at yahoo.com (Peter Maclean)
Date: Wed, 6 Jul 2011 10:18:10 -0700 (PDT)
Subject: [R] BY GROUP in evir R package
In-Reply-To:
References: <31464.80001.qm@web121719.mail.ne1.yahoo.com> <4DE939D8.8040305@pfaffikus.de> <1309937119.92702.YahooMailRC@web121711.mail.ne1.yahoo.com>
Message-ID: <1309972690.71427.YahooMailRC@web121713.mail.ne1.yahoo.com>

Dr. Pfaff:
After using str, can you give an example of data extraction (e.g. for $par.ests and @residuals)?

----- Original Message ----
From: "Pfaff, Bernhard Dr."
To: Peter Maclean ; Dr.
Bernhard Pfaff
Cc: "r-help at r-project.org"
Sent: Wed, July 6, 2011 8:17:12 AM
Subject: AW: [R] BY GROUP in evir R package

Hello Peter,

str(rg2) is quite revealing for this; by() returns a list and hence lapply() can be employed, e.g.:

lapply(rg2, rlevel.gev, k.blocks = 5)

By the same token, you can extract the relevant bits and pieces and put them together in a data.frame.

Best,
Bernhard

> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Peter Maclean
> Sent: Wednesday, 6 July 2011 09:25
> To: Dr. Bernhard Pfaff
> Cc: r-help at r-project.org
> Subject: Re: [R] BY GROUP in evir R package
>
> Dr. Pfaff:
> How do we pass the "by" results to "rlevel.gev" function to
> get the return level and also save the results (both
> rg2(par.ests and $par.ses) and rl) as.data.frame?
>
> #Grouped vector
> Gdata <- data.frame(n = rep(c(1,2,3), each = 100), y = rnorm(300))
> library(evir)
> require(plyr)
>
> #Model for Grouped
> rg2<- by(Gdata,Gdata[,"n"], function(x) gev(x$y, 5, method =
> "BFGS", control =list(maxit = 500)))
> # rl <- rlevel.gev(rg2, k.blocks = 5, add = TRUE)
>
> ----- Original Message ----
> From: Dr. Bernhard Pfaff
> To: Peter Maclean
> Sent: Fri, June 3, 2011 2:45:28 PM
> Subject: Re: BY GROUP in evir R package
>
> Hello Peter,
>
> many thanks for your email. Well, as you might have guessed, there is also a
> function by() in R that does the same job. See help("by") for more information.
>
> Best,
> Bernhard
>
> Peter Maclean wrote:
> > Hi,
> > I am new in R and I want to use your package for data analysis. I usually use
> >SAS. I have rainfall data for different points. Each point has 120 observations.
> >The rainfall data is in the first column (RAIN) and the categorical variable
> >that group the data is in the second column (GROUP). The data frame is
> >rain.data.
> >How can I use the gev function to estimate all three parameters by
> >GROUP variable group? In SAS there is a by() function that estimate the model by
> >group. However, I would like to move to R.
> >  With thanks,
> >  Peter Maclean
> > Department of Economics
> > University of Dar -es- Salaam, Tanzania

From pmaclean2011 at yahoo.com Wed Jul 6 20:06:49 2011
From: pmaclean2011 at yahoo.com (Peter Maclean)
Date: Wed, 6 Jul 2011 11:06:49 -0700 (PDT)
Subject: [R] Split a row vector into columns
Message-ID: <1309975609.86208.YahooMailRC@web121711.mail.ne1.yahoo.com>

I want to create columns from this row vector. From:

x1 x2 x3 x1 x2 x3 x1 x2 x3
 1  2  3  1  2  3  1  2  3

to:

x1 x2 x3
 1  2  3
 1  2  3
 1  2  3

Peter Maclean
Department of Economics
UDSM

From ahrager at gmail.com Wed Jul 6 18:51:22 2011
From: ahrager at gmail.com (ahrager)
Date: Wed, 6 Jul 2011 09:51:22 -0700 (PDT)
Subject: [R] very large pair() plot
In-Reply-To: <4E0C5D98.2050300@statistik.tu-dortmund.de>
References: <1309382885395-3634075.post@n4.nabble.com> <4E0C5D98.2050300@statistik.tu-dortmund.de>
Message-ID: <1309971082709-3649361.post@n4.nabble.com>

Uwe,

This worked. Thank you so much,

Audrey

--
View this message in context: http://r.789695.n4.nabble.com/very-large-pair-plot-tp3634075p3649361.html
Sent from the R help mailing list archive at Nabble.com.

From cliffclive at gmail.com Wed Jul 6 19:18:38 2011
From: cliffclive at gmail.com (Cliff Clive)
Date: Wed, 6 Jul 2011 10:18:38 -0700 (PDT)
Subject: [R] Writing dataframes side by side in a file
Message-ID: <1309972718080-3649420.post@n4.nabble.com>

Is there a quick and easy way to write data frames side-by-side in a csv file with one column separating them?

I could just fill them with empty rows so they all have the same height, then cbind them with empty columns in between, but I'm looking for a more elegant solution, if one exists.

Thanks in advance,
Cliff

--
View this message in context: http://r.789695.n4.nabble.com/Writing-dataframes-side-by-side-in-a-file-tp3649420p3649420.html
Sent from the R help mailing list archive at Nabble.com.

From sarah.goslee at gmail.com Wed Jul 6 20:15:31 2011
From: sarah.goslee at gmail.com (Sarah Goslee)
Date: Wed, 6 Jul 2011 14:15:31 -0400
Subject: [R] Split a row vector into columns
In-Reply-To: <1309975609.86208.YahooMailRC@web121711.mail.ne1.yahoo.com>
References: <1309975609.86208.YahooMailRC@web121711.mail.ne1.yahoo.com>
Message-ID:

You mean like:

> myvec <- c(1,2,3,1,2,3,1,2,3)
> myvec
[1] 1 2 3 1 2 3 1 2 3
> matrix(myvec, ncol=3, byrow=TRUE)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    1    2    3
[3,]    1    2    3
>

Or do you actually have more complex requirements?
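If the row vector also carries the repeated names x1, x2, x3 (an assumption; the original post does not say how the data are stored), the same matrix() call can keep them as column names. A minimal sketch, with row_vec, col_names and wide being illustrative names only:

```r
# Hypothetical named row vector, mirroring the layout in the question
row_vec <- c(x1 = 1, x2 = 2, x3 = 3,
             x1 = 1, x2 = 2, x3 = 3,
             x1 = 1, x2 = 2, x3 = 3)

# The distinct names, in order of first appearance, become the column names
col_names <- unique(names(row_vec))

# Fill row by row so that each run of x1, x2, x3 becomes one row
wide <- matrix(row_vec, ncol = length(col_names), byrow = TRUE,
               dimnames = list(NULL, col_names))
wide
```

This only works if the names repeat in the same order throughout the vector; anything less regular would need a different approach.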
Sarah

On Wed, Jul 6, 2011 at 2:06 PM, Peter Maclean wrote:
> I want to create columns from this row vector. From:
> x1 x2 x3 x1 x2 x3 x1 x2 x3
>  1  2  3  1  2  3  1  2  3
>
> to:
> x1 x2 x3
>  1  2  3
>  1  2  3
>  1  2  3
>
> Peter Maclean
> Department of Economics
> UDSM
>

--
Sarah Goslee
http://www.functionaldiversity.org

From wdunlap at tibco.com Wed Jul 6 20:43:59 2011
From: wdunlap at tibco.com (William Dunlap)
Date: Wed, 6 Jul 2011 11:43:59 -0700
Subject: [R] elegant way of removing NA's and selecting specific values from a data.frame
In-Reply-To:
References: <4E1480A6.10205@uea.ac.uk>
Message-ID: <77EB52C6DD32BA4D87471DCD70C8D700046532A9@NA-PA-VBE03.na.tibco.com>

The question was whether you could do this 'in a single line' and the word 'elegant' was in the subject line. Those two things don't always go together.

You can put semicolons between the statements so they all can go on one line, but that isn't very elegant.

You could collapse the three assignments into one, as in
   third <- e[,23][!is.na(e[,23])][e[,23][!is.na(e[,23])] < 30]
but that wastes time repeatedly calculating e[,23] and is.na(...) and is hard to read, both of which count against its elegance score.

One could use & to go from 3 assignments to 2, as in
   first <- e[,23]
   third <- first[!is.na(first) & first < 30]

You could write a function that returns TRUE where its logical input vector is TRUE and FALSE where it is NA or FALSE:
   is.true <- function(x) !is.na(x) & x
and use it as
   first <- e[,23]
   third <- first[is.true(first < 30)]

To my eyes the last is the most elegant way to do things, as the flow of data is very clear.
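The variants above can be compared on a small made-up data frame. This is only a sketch: the one-column data frame e below is a hypothetical stand-in for the poster's e (whose 23rd column is not shown anywhere in the thread), and the third_* names are illustrative.

```r
# Hypothetical stand-in: column 'v' plays the role of e[, 23]
e <- data.frame(v = c(12, NA, 45, 30, 7, NA, 29))

# Step-by-step version, using < 30 as in the reply above
first       <- e[, "v"]
second      <- first[!is.na(first)]
third_steps <- second[second < 30]

# Two assignments, combining both conditions with &
third_and <- first[!is.na(first) & first < 30]

# Helper that treats NA as FALSE, as defined in the reply
is.true <- function(x) !is.na(x) & x
third_helper <- first[is.true(first < 30)]

# All three approaches keep the same values
third_steps
third_and
third_helper
```

The helper works because FALSE & NA evaluates to FALSE in R, so positions where the comparison is NA are dropped instead of producing NA entries in the result.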
Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com > -----Original Message----- > From: r-help-bounces at r-project.org > [mailto:r-help-bounces at r-project.org] On Behalf Of Bert Gunter > Sent: Wednesday, July 06, 2011 8:50 AM > To: Jim Maas > Cc: r-help at r-project.org > Subject: Re: [R] elegant way of removing NA's and selecting > specific values from a data.frame > > ?"&" > > This is basic. Please read "An Intro to R" before posting any more > such questions if you have not already done so. > > -- Bert > > On Wed, Jul 6, 2011 at 8:35 AM, Jim Maas wrote: > > I have a data.frame "e" and would like to extract the 23rd > column, remove > > any NA's and then remove any values >= 30. ?I can do it in > steps such as > > this but have failed to figure out how to do it in a single > line .... any > > suggestions? > > > > first <- e[,23] > > second <- first[!is.na(first)] > > third <- second[second<=30] > > > > thanks a bunch > > > > J > > > > -- > > Dr. Jim Maas > > University of East Anglia > > > > ______________________________________________ > > R-help at r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > > > > -- > "Men by nature long to get on to the ultimate truths, and will often > be impatient with elementary studies or fight shy of them. If it were > possible to reach the ultimate truths without the elementary studies > usually prefixed to them, these would not be preparatory studies but > superfluous diversions." 
>
> -- Maimonides (1135-1204)
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>

From debs_stata at yahoo.com Wed Jul 6 21:22:12 2011
From: debs_stata at yahoo.com (Debs Majumdar)
Date: Wed, 6 Jul 2011 12:22:12 -0700 (PDT)
Subject: [R] Dealing with missing values in a linear mixed model
Message-ID: <1309980132.34254.YahooMailNeo@web114702.mail.gq1.yahoo.com>

Hello,

I am trying to run a linear mixed model using

model.a <- lme(Psstotals ~ Visit, data=caf, random= ~ Visit|Id)

My dataset looks like the following:

   Id Visit Agecorrected Psstotals
1 106     0           19         8
2 106     1           19         9
3 106     2           19        NA
4 106     3           19        NA
5 106     4           19        NA
6 106     5           19        11
........................
.............

I have 14 visits for each ID and I do have some missing values for each Id. I get the following error when I run the above model:

Error in na.fail.default(list(Visit = c(0L, 1L, 2L, 3L, 4L, 5L, 6L, 7L,  :
  missing values in object

I do not want to do a listwise deletion as I would like to run model with all available data. So, I tried the following

model.a <- lme(Psstotals ~ Visit, data=caf, random= ~Visit |Id, na.action=na.exclude)

Is this the right syntax?

Thanks,

Debs

From langkamp at tomblog.de Wed Jul 6 21:24:24 2011
From: langkamp at tomblog.de (tomtomme)
Date: Wed, 6 Jul 2011 12:24:24 -0700 (PDT)
Subject: [R] accessing names of lists in a list
Message-ID: <1309980264422-3649750.post@n4.nabble.com>

After importing multiple files to data.frames in R, I want to rename all their columns and do other operations with them.
The data.frame names are not continuous like 1, 3, 4, 6.
I could not find a way of creating a list of the data.frames and loop this and ended up putting them into a list first:

# get all objects
all.obj = sapply(ls(), get)
# get data frames
dfrs = all.obj[sapply(all.obj, is.data.frame)]

but then I get lists within lists:

structure(list(`1` = structure(list(Datum = structure(c(...

my problem now is how to access the inner list, for example to rename the "Datum" to "date". The following changes only the outer list:

names(dfrs) <- c("date", "time", "temp","")

with the result:
structure(list(date = structure(list(Datum = structure(c(...

Or isn't there a way to avoid the list and just loop through the data.frames of your workspace regardless of number and naming of the data.frames and thus apply different operations on them like the renaming of the columns?
Many thanks!

--
View this message in context: http://r.789695.n4.nabble.com/accessing-names-of-lists-in-a-list-tp3649750p3649750.html
Sent from the R help mailing list archive at Nabble.com.

From mailinglist.honeypot at gmail.com Wed Jul 6 21:25:13 2011
From: mailinglist.honeypot at gmail.com (Steve Lianoglou)
Date: Wed, 6 Jul 2011 15:25:13 -0400
Subject: [R] Writing dataframes side by side in a file
In-Reply-To: <1309972718080-3649420.post@n4.nabble.com>
References: <1309972718080-3649420.post@n4.nabble.com>
Message-ID:

Hi,

On Wed, Jul 6, 2011 at 1:18 PM, Cliff Clive wrote:
> Is there a quick and easy way to write data frames side-by-side in a csv file
> with one column separating them?
>
> I could just fill them with empty rows so they all have the same height,
> then cbind them with empty columns in between, but I'm looking for a more
> elegant solution, if one exists.

Perhaps using `merge` and playing with the `all`, `all.x` and `all.y` parameters will help you in making the large data.frame you could serialize using write.table ... it's similar in principle to cbind-ing, but may prove less laborious.
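For comparison, the padding-and-cbind route described in the question can be kept fairly short. The following is only a sketch under stated assumptions: df_a and df_b are made-up data frames, pad_rows() is an illustrative helper (not part of any package), and the separator column is simply named gap.

```r
# Two made-up data frames of different heights
df_a <- data.frame(id = 1:3, score = c(2.5, 3.1, 4.0))
df_b <- data.frame(name = c("x", "y"), stringsAsFactors = FALSE)

# Illustrative helper: indexing past the last row yields NA rows,
# which pads the data frame up to n rows
pad_rows <- function(df, n) {
  out <- df[seq_len(n), , drop = FALSE]
  rownames(out) <- NULL
  out
}

n_max <- max(nrow(df_a), nrow(df_b))

# 'gap' is an all-NA column that separates the two blocks
side_by_side <- cbind(pad_rows(df_a, n_max),
                      gap = NA,
                      pad_rows(df_b, n_max))

# na = "" writes the padding and the separator as empty cells
out_file <- tempfile(fileext = ".csv")
write.csv(side_by_side, file = out_file, row.names = FALSE, na = "")
```

With more than two data frames the same idea could be applied over a list of them, padding each to the largest number of rows before combining.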
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact

From info at aghmed.fsnet.co.uk Wed Jul 6 21:25:55 2011
From: info at aghmed.fsnet.co.uk (Michael Dewey)
Date: Wed, 06 Jul 2011 20:25:55 +0100
Subject: [R] Unusual graph- modified wind rose perhaps?
In-Reply-To: <1309778511.5571.YahooMailClassic@web38408.mail.mud.yahoo.com>
References: <1309778511.5571.YahooMailClassic@web38408.mail.mud.yahoo.com>
Message-ID:

At 12:21 04/07/2011, John Kane wrote:
>In a OpenOffice.org forum someone was asking if the spreadsheet
>could graph this
>http://www.elmundo.es/elmundosalud/documentos/2011/06/leche.html
>
>I didn't think it could. :)
>
>I don't think I've ever seen exactly this layout. Does anyone know
>if there is anything in R that does a graph like this or that can be
>adapted to do it.
>
>Unfortunately my Spanish is non-existent so I am not sure how
>effective the graph is in achieving whatever it's supposed to
>do. A dot chart might be as effective but it is a flashy graphic.

Well nobody seems to have taken up the challenge John, so here goes.

In summary the graphics show the properties of brands of full-fat milk sold in Spain. Some of them are supermarket own brands and others general branded products. By clicking on the rectangular buttons across the top you can select which criterion to look at and when you do there is usually an explanatory text shown upper-left in the plot. By default you get overall quality rating.

Good features of the graphic seem to me to be that the brands are always in the same order so if you buy your milk from (say) the Eroski supermarket chain you can track it easily through the different criteria.
Another good feature for me is the fact that the underlying values do not clutter up the visual presentation but can be retrieved by hovering over the sector.

Several of the criteria are inherently continuous but have been categorised which seems to me a shame.

Overall for a newspaper graphic it seemed of good quality to me but YMMV.

>Thanks

Michael Dewey
info at aghmed.fsnet.co.uk
http://www.aghmed.fsnet.co.uk/home.html

From djmuser at gmail.com Wed Jul 6 22:08:52 2011
From: djmuser at gmail.com (Dennis Murphy)
Date: Wed, 6 Jul 2011 13:08:52 -0700
Subject: [R] Reshape from long to wide format with date variable
In-Reply-To: <1309959611143-3648833.post@n4.nabble.com>
References: <1309959611143-3648833.post@n4.nabble.com>
Message-ID:

Hi:

Here's one way with the reshape package. I converted ref to numeric and date to character string first. Sometimes these little things matter...

library(plyr)
library(reshape)

# Modified original data; note the option in the data.frame() statement
id=c("034","034","016","016","016","340","340")
date=c("1997-09-28", "1997-10-06", "1997-11-04", "2000-09-27",
       "2003-07-20", "1997-11-08", "1997-11-08")
ref=c(2, 2, 1, 1, 2, 1, 1)
data1=data.frame(id, date, ref, stringsAsFactors = FALSE)

# Add a new variable named occasion within id
data2 <- ddply(data1, .(id), transform, occasion = seq_along(id))

# Use the cast() function in reshape twice, adjust names in each
c1 <- cast(data2, id ~ occasion, value = 'date')
c2 <- cast(data2, id ~ occasion, value = 'ref')
names(c1)[-1] <- paste('date', 1:3, sep = '')
names(c2)[-1] <- paste('ref', 1:3, sep = '')

# merge c1 and c2 by id:
merge(c1, c2, by = 'id')

The cast() function sets the rows to be ids, the columns to be occasion and value to be the name of the variable whose values should fill the cells attributable to id * occasion combinations.

HTH,
Dennis

On Wed, Jul 6, 2011 at 6:40 AM, Pete Pete wrote:
> Hi,
>
> I need to reshape my dataframe from a long format to a wide format.
> Unfortunately, I have a continuous date variable which gives me headaches.
>
> Consider the following example:
>> id=c("034","034","016","016","016","340","340")
>> date=as.Date(c("1997-09-28", "1997-10-06", "1997-11-04", "2000-09-27",
>> "2003-07-20", "1997-11-08", "1997-11-08"))
>> ref=c("2","2","1","1","2","1","1")
>> data1=data.frame(id,date,ref)
>> data1
>    id       date ref
> 1 034 1997-09-28   2
> 2 034 1997-10-06   2
> 3 016 1997-11-04   1
> 4 016 2000-09-27   1
> 5 016 2003-07-20   2
> 6 340 1997-11-08   1
> 7 340 1997-11-08   1
>
>
> I would like to have it like this:
>> data2
>    id      date1      date2      date3 ref1 ref2 ref3
> 1 034 1997-09-28 1997-10-06         NA    2    2   NA
> 2 016 1997-11-04 2000-09-27 2003-07-20    1    1    2
> 3 340 1997-11-08 1997-11-08         NA    1    1   NA
>
> All I tried the reshape package but ended up in multiple variables for each
> of the dates and that is not what I would like to have.
>
> Thanks for you help.
>
> --
> View this message in context: http://r.789695.n4.nabble.com/Reshape-from-long-to-wide-format-with-date-variable-tp3648833p3648833.html
> Sent from the R help mailing list archive at Nabble.com.
>

From srcorsi at usgs.gov Wed Jul 6 22:13:05 2011
From: srcorsi at usgs.gov (Steven R Corsi)
Date: Wed, 06 Jul 2011 15:13:05 -0500
Subject: [R] reading data from password protected url
In-Reply-To: <4E066C43.8080208@wald.ucdavis.edu>
References: <4E04FDAD.50402@usgs.gov> <4E066C43.8080208@wald.ucdavis.edu>
Message-ID: <4E14C1D1.50007@usgs.gov>

Hi Duncan

Your method works well for my situation when I make only one call to the database/URL with the login info.
Our database is configured like the first situation (cookies) that you described below. Now, I will need to make multiple successive calls to get data for different sites in the database (one call per site). It doesn't seem to work at times when I do this. Is there something that needs to be done to re-initialize (Do I need to log out before making the second call)?

Thanks
Steve

===============================================
Steven R. Corsi          Phone: (608) 821-3835
Research Hydrologist     email: srcorsi at usgs.gov
U.S. Geological Survey
Wisconsin Water Science Center
8505 Research Way
Middleton, WI 53562
===============================================

On 6/25/2011 6:16 PM, Duncan Temple Lang wrote:
> Hi Steve
>
> RCurl can help you when you need to have more control over Web requests.
> The details vary from Web site to Web site and the different ways to specify
> passwords, etc.
>
> If the JSESSIONID and NCES_JSESSIONID are regular cookies and returned in the first
> request as cookies, then you can just have RCurl handle the cookies
> But the basics for your case are
>
>    library(RCurl)
>    h = getCurlHandle( cookiefile = "")
>
> Then make your Web request using getURLContent(), getForm() or postForm()
> but making certain to pass the curl handle stored in h in each call, e.g.
>
>    ans = getForm(yourURL, login = "bob", password = "jane", curl = h)
>
>    txt = getURLContent(dataURL, curl = h)
>
>
> If JSESSIONID and NCES_JSESSIONID are not returned as cookies but HTTP header fields, then you
> need to process the header.
> Something like
>
>    rdr = dynCurlReader(h)
>
>    ans = getForm(yourURL, login = "bob", password = "jane", curl = h, header = rdr$update)
>
> Then the header from the HTTP response is available as
>    rdr$header()
>
> and you can use parseHTTPHeader(rdr$header()) to convert it into a named vector.
>
>
> HTH,
>    D.
>
> On 6/24/11 2:12 PM, Steven R Corsi wrote:
>> I am trying to retrieve data from a password protected database.
>> I have login information and the proper url. When I
>> make a request to the url, I get back some info, but need to read the "hidden header" information that has JSESSIONID
>> and NCES_JSESSIONID. They need to be used to set cookies before sending off the actual url request that will result in
>> the data transfer. Any help would be much appreciated.
>> Thanks
>> Steve

From Greg.Snow at imail.org Wed Jul 6 22:20:21 2011
From: Greg.Snow at imail.org (Greg Snow)
Date: Wed, 6 Jul 2011 14:20:21 -0600
Subject: [R] accessing names of lists in a list
In-Reply-To: <1309980264422-3649750.post@n4.nabble.com>
References: <1309980264422-3649750.post@n4.nabble.com>
Message-ID:

In general, if the data frames are all related then it is best to keep them together in a list like you have. But if you want to change the column names of the component data frames then you can use a loop, or sometimes better use the lapply function. Here is a basic example:

tmp <- list( df1=data.frame(x=1:10, y=rnorm(10)),
             df2=data.frame(x=1:100, y=rnorm(100)),
             df3=data.frame(x=1:1000, y=rnorm(1000)))

str(tmp)

tmp <- lapply( tmp, function(x){ names(x)[1] <- 'xx';x})

str(tmp)

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of tomtomme
> Sent: Wednesday, July 06, 2011 1:24 PM
> To: r-help at r-project.org
> Subject: [R] accessing names of lists in a list
>
> After importing multiple files to data.frames in R, I want to rename all
> their columns and do other operations with them. The data.frame names are
> not continuous like 1, 3, 4, 6.
> I could not find a way of creating a list of the data.frames and loop this
> and ended up putting them into a list first:
>
> # get all objects
> all.obj = sapply(ls(), get)
> # get data frames
> dfrs = all.obj[sapply(all.obj, is.data.frame)]
>
> but then I get lists within lists:
>
> structure(list(`1` = structure(list(Datum = structure(c(...
>
> my problem now is how to access the inner list, for example to rename the
> "Datum" to "date". The following changes only the outer list:
>
> names(dfrs) <- c("date", "time", "temp","")
>
> with the result:
> structure(list(date = structure(list(Datum = structure(c(...
>
> Or isn't there a way to avoid the list and just loop through the data.frames
> of your workspace regardless of number and naming of the data.frames and
> thus apply different operations on them like the renaming of the columns?
> Many thanks!
>
> --
> View this message in context: http://r.789695.n4.nabble.com/accessing-names-of-lists-in-a-list-tp3649750p3649750.html
> Sent from the R help mailing list archive at Nabble.com.
>
From jwiley.psych at gmail.com Wed Jul 6 22:22:31 2011 From: jwiley.psych at gmail.com (Joshua Wiley) Date: Wed, 6 Jul 2011 13:22:31 -0700 Subject: [R] Wrong environment when evaluating and expression? In-Reply-To: <77EB52C6DD32BA4D87471DCD70C8D70004653091@NA-PA-VBE03.na.tibco.com> References: <77EB52C6DD32BA4D87471DCD70C8D70004653091@NA-PA-VBE03.na.tibco.com> Message-ID: Thanks Bill! That is very useful. Is the str.language function in any package (findFn("str.language") came up empty)? It certainly helped me, not only to understand this particular problem, but in trying to wrap my head around language objects (which I only very poorly grasp) in general. Josh On Tue, Jul 5, 2011 at 11:48 AM, William Dunlap wrote: >> -----Original Message----- >> From: r-help-bounces at r-project.org >> [mailto:r-help-bounces at r-project.org] On Behalf Of Joshua Wiley >> Sent: Monday, July 04, 2011 1:12 AM >> To: r-help at r-project.org >> Subject: [R] Wrong environment when evaluating and expression? >> >> Hi All, >> >> I have constructed two expressions (e1 & e2). ?I can see that they are >> not identical, but I cannot figure out how they differ. >> >> ############### >> dat <- mtcars >> e1 <- expression(with(data = dat, lm(mpg ~ hp))) >> e2 <- as.expression(substitute(with(data = dat, lm(f)), >> list(f = mpg ~ hp))) >> >> str(e1) >> str(e2) >> all.equal(e1, e2) >> identical(e1, e2) # false > > With the appended str.language function you can see the difference > between e1 and e2. ?It displays > ?`name` class(length) > of each component of a recursive object, along with a short text summary > of > it after a colon. > >> str.language(e1) > `e1` expression(1): expression(with(data = da... > ?`` call(3): with(data = dat, lm(mpg ~... > ? ?`` name(1): with > ? ?`data` name(1): dat > ? ?`` call(2): lm(mpg ~ hp) > ? ? ?`` name(1): lm > ? ? ?`` call(3): mpg ~ hp > ? ? ? ?`` name(1): ~ > ? ? ? ?`` name(1): mpg > ? ? ? 
?`` name(1): hp >> str.language(e2) > `e2` expression(1): expression(with(data = da... > ?`` call(3): with(data = dat, lm(mpg ~... > ? ?`` name(1): with > ? ?`data` name(1): dat > ? ?`` call(2): lm(mpg ~ hp) > ? ? ?`` name(1): lm > ? ? ?`` formula(3): mpg ~ hp > ? ? ? ?`` name(1): ~ > ? ? ? ?`` name(1): mpg > ? ? ? ?`` name(1): hp > ? ? ? ?`Attributes of ` list(2): structure(list(class = "f... > ? ? ? ? ?`class` character(1): "formula" > ? ? ? ? ?`.Environment` environment(5): dat e1 e2 s... > > It is a bug in all.equal() that it ignores attributes of formulae. > E.g., > > ?> all.equal(y~x, terms(y~x)) > ?[1] TRUE > ?> identical(y~x, terms(y~x)) > ?[1] FALSE > > Here is str.language > > str.language <- > function (object, ..., level = 0, name = deparse(substitute(object)), > ? ?attributes = TRUE) > { > ? ?abbr <- function(string, maxlen = 25) { > ? ? ? ?if (length(string) > 1 || nchar(string) > maxlen) > ? ? ? ? ? ?paste(substring(string[1], 1, maxlen), "...", sep = "") > ? ? ? ?else string > ? ?} > ? ?myDeparse <- function(object) { > ? ? ? ?if (!is.environment(object)) { > ? ? ? ? ? ?deparse(object) > ? ? ? ?} > ? ? ? ?else { > ? ? ? ? ? ?ename <- environmentName(object) > ? ? ? ? ? ?if (ename == "") > ? ? ? ? ? ? ? ?ename <- "" > ? ? ? ? ? ?paste(sep = "", "<", ename, "> ", paste(collapse = " ", > ? ? ? ? ? ? ? ?objects(object))) > ? ? ? ?} > ? ?} > ? ?cat(rep(" ?", level), sep = "") > ? ?if (is.null(name)) > ? ? ? ?name <- "" > ? ?cat(sprintf("`%s` %s(%d): %s\n", abbr(name), class(object), > ? ? ? ?length(object), abbr(myDeparse(object)))) > ? ?a <- attributes(object) > ? ?if (is.recursive(object) && !is.environment(object)) { > ? ? ? ?object <- as.list(object) > ? ? ? ?names <- names(object) > ? ? ? ?for (i in seq_along(object)) { > ? ? ? ? ? ?str.language(object[[i]], ..., level = level + 1, > ? ? ? ? ? ? ? ?name = names[i], attributes = attributes) > ? ? ? ?} > ? ?} > ? ?if (attributes) { > ? ? ? ?a$names <- NULL > ? ? ? ?if (length(a) > 0) { > ? ? ? ? ? 
?str.language(a, level = level + 1, name = paste("Attributes > of", > ? ? ? ? ? ? ? ?abbr(name)), attributes = attributes) > ? ? ? ?} > ? ?} > } > > Bill Dunlap > Spotfire, TIBCO Software > wdunlap tibco.com > >> >> eval(e1) >> eval(e2) >> ################ >> >> The context is trying to use a list of formulae to generate several >> models from a multiply imputed dataset. ?The package I am using (mice) >> has methods for with() and that is how I can (easily) get the pooled >> results. ?Passing the formula directly does not work, so I was trying >> to generate the entire call and evaluate it as if I had typed it at >> the console, but I am missing something (probably rather silly). >> >> Thanks, >> >> Josh >> >> >> -- >> Joshua Wiley >> Ph.D. Student, Health Psychology >> University of California, Los Angeles >> http://www.joshuawiley.com/ >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles https://joshuawiley.com/ From jrkrideau at yahoo.ca Wed Jul 6 22:26:27 2011 From: jrkrideau at yahoo.ca (John Kane) Date: Wed, 6 Jul 2011 13:26:27 -0700 (PDT) Subject: [R] Unusual graph- modified wind rose perhaps? In-Reply-To: Message-ID: <1309983987.82343.YahooMailClassic@web38407.mail.mud.yahoo.com> Thanks Michael. It looked like a decent display but it's nice to know what is going on. I has figured out something to do with milk but that was my total knowledge.. --- On Wed, 7/6/11, Michael Dewey wrote: > From: Michael Dewey > Subject: Re: [R] Unusual graph- modified wind rose perhaps? 
> To: "John Kane" , r-help at r-project.org > Received: Wednesday, July 6, 2011, 3:25 PM > At 12:21 04/07/2011, John Kane > wrote: > > > In a OpenOffice.org forum someone was asking if the > spreadsheet could graph this http://www.elmundo.es/elmundosalud/documentos/2011/06/leche.html > > > > I didn't think it could. :) > > > > I don't think I've ever seen exactly this layout. Does > anyone know if there is anything in R that does a graph like > this or that can be adapted to do it. > > > > Unfortunately my Spanish is non-existent so I am not > sure how effective the graph is in achieving whatever it's > suppposed to do.? A dot chart might be as effective but > it is a flashy graphic. > > Well nobody seems to have taken up the challenge John, so > here goes. > > In summary the graphics show the properties of brands of > full-fat milk sold in Spain. Some of them are supermarket > own brands and others general branded products. By clicking > on the rectangular buttons across the top you can select > which criterion to look at and when you do there is usually > an explanatory text shown upper-left in the plot. By default > you get overall quality rating. > > Good features of the graphic seem to me to be that the > brands are always in the same order so if you buy your milk > from (say) the Eroski supermarket chain you can track it > easily through the different criteria. Another good feature > for me is the fact that the underlying values do not clutter > up the visual presentation but can be retrieved by hovering > over the sector > > Several of the criteria are inherently continuous but have > been categorised which seems to me a shame. > > Overall for a newspaper graphic it seemed of good quality > to me but YMMV. 
> >
> > Thanks
>
> Michael Dewey
> info at aghmed.fsnet.co.uk
> http://www.aghmed.fsnet.co.uk/home.html
>
>

From jrkrideau at yahoo.ca Wed Jul 6 22:32:12 2011
From: jrkrideau at yahoo.ca (John Kane)
Date: Wed, 6 Jul 2011 13:32:12 -0700 (PDT)
Subject: [R] Split a row vector into columns
In-Reply-To: <1309975609.86208.YahooMailRC@web121711.mail.ne1.yahoo.com>
Message-ID: <1309984332.47029.YahooMailClassic@web38405.mail.mud.yahoo.com>

t(matrix(rep(1:3, 3), nrow=3))

--- On Wed, 7/6/11, Peter Maclean wrote:

> From: Peter Maclean
> Subject: Re: [R] Split a row vector into columns
> To: r-help at r-project.org
> Received: Wednesday, July 6, 2011, 2:06 PM
> I want to create columns from this row vector. From:
> x1 x2 x3 x1 x2 x3 x1 x2 x3
>  1  2  3  1  2  3  1  2  3
>
> to:
> x1 x2 x3
>  1  2  3
>  1  2  3
>  1  2  3
>
> Peter Maclean
> Department of Economics
> UDSM
>

From annemarie.verkerk at mpi.nl Wed Jul 6 21:31:51 2011
From: annemarie.verkerk at mpi.nl (Annemarie Verkerk)
Date: Wed, 06 Jul 2011 21:31:51 +0200
Subject: [R] question about getting things out of an lapply
Message-ID: <4E14B827.9070800@mpi.nl>

Dear R-help subscribers,

I have a quite stupid question about using lapply. I have the following function:

create.gradient <- function(i){
colorgrad01<-color.scale(seq(0,1,by=0.01), extremes=c("red","blue"))
tree1$edge[i,1] -> x
tree1$edge[i,2] -> y
print(x)
print(y)
all2[x] -> z
all2[y] -> z2
round(z, digits = 2) -> z
round(z2, digits = 2) -> z2
z*100 -> z
z2*100 -> z2
print(z)
print(z2)
colorgrad<-colorgrad01[z:z2]
colorgrad
}

Basically, I want to pick a partial gradient out of a bigger gradient (colorgrad01) for values that are on row i, from a matrix called tree1.
when I use lapply: lapply(tree1$edge, create.gradient) I get the following error message: Error in FUN(X[[27L]], ...) : subscript out of bounds I'm not sure what's wrong: it could be either fact that 'colorgrad' is a character string; i.e. consisting of multiple characters and not just one, or because 'i' doesn't come back in the object 'colorgrad' that it has to return. Or it could be something else entirely... In any case, what I prefer as output is a vector with all the different 'colorgrad's it generates with each run. Thanks a lot for any help you might be able to offer! Annemarie -- Annemarie Verkerk, MA Evolutionary Processes in Language and Culture (PhD student) Max Planck Institute for Psycholinguistics P.O. Box 310, 6500AH Nijmegen, The Netherlands +31 (0)24 3521 185 http://www.mpi.nl/research/research-projects/evolutionary-processes From dpitkin at pobox.com Wed Jul 6 21:17:19 2011 From: dpitkin at pobox.com (David Pitkin) Date: Wed, 6 Jul 2011 15:17:19 -0400 Subject: [R] trouble parsing a date using strptime() Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From nevo84 at gmail.com Wed Jul 6 22:37:05 2011 From: nevo84 at gmail.com (omernevo) Date: Wed, 6 Jul 2011 13:37:05 -0700 (PDT) Subject: [R] Showing which bars in a bar chart are significantly different Message-ID: <1309984625369-3649897.post@n4.nabble.com> Hello, a probably rather stupid question to which I can't find an answer: I have a bar chart, and I want to present which bars are significantly different by placing a line with an asterisk above then (similarly to fig. 3 in: http://jnm.snmjournals.org/content/46/4/574.figures-only). Does anyone have a reference where can I find some instructions how to learn this? Thanks a lot! Omer -- View this message in context: http://r.789695.n4.nabble.com/Showing-which-bars-in-a-bar-chart-are-significantly-different-tp3649897p3649897.html Sent from the R help mailing list archive at Nabble.com. 
From slomascolo at gmail.com Wed Jul 6 20:13:50 2011 From: slomascolo at gmail.com (Silvia Lomascolo) Date: Wed, 6 Jul 2011 11:13:50 -0700 (PDT) Subject: [R] significant results with KW but not in post-hoc test Message-ID: <1309976030932-3649545.post@n4.nabble.com> Dear all, I did a Kruskall-Wallis test for a comparison of a variable of interest between 10 sites and I get a significant result (p=0.0019). however, when I perform a post-hoc test using kruskalmc from the pgirmess package, I get no difference between any of the paired comparisons. I cannot find anything in the internet, not in my stats books about how to explain this contradictory result. I have also seen that somebody else asked this question in this forum but she got no answers. Can anybody help me understand what is going on? Thanks, Silvia. -- View this message in context: http://r.789695.n4.nabble.com/significant-results-with-KW-but-not-in-post-hoc-test-tp3649545p3649545.html Sent from the R help mailing list archive at Nabble.com. From jwiley.psych at gmail.com Wed Jul 6 22:44:10 2011 From: jwiley.psych at gmail.com (Joshua Wiley) Date: Wed, 6 Jul 2011 13:44:10 -0700 Subject: [R] question about getting things out of an lapply In-Reply-To: <4E14B827.9070800@mpi.nl> References: <4E14B827.9070800@mpi.nl> Message-ID: Dear Annemarie, Can you replicate the problem using a madeup dataset or one of the ones built into R? It strikes me as odd to pass tree1$edge directly to lapply, when it is also hardcoded into the function, but I do not have a sense exactly for what you are doing and without data it is hard to play around. Cheers, Josh On Wed, Jul 6, 2011 at 12:31 PM, Annemarie Verkerk wrote: > Dear R-help subscribers, > > I have a quite stupid question about using lapply. 
I have the following > function: > > create.gradient <- function(i){ > colorgrad01<-color.scale(seq(0,1,by=0.01), extremes=c("red","blue")) > tree1$edge[i,1] -> x this works, but it would typically be written: x <- tree1$edge[i, 1] flipping back and forth can be a smidge (about 5 pinches under an iota) confusing. > tree1$edge[i,2] -> y > print(x) > print(y) > all2[x] -> z > all2[y] -> z2 > round(z, digits = 2) -> z > round(z2, digits = 2) -> z2 > z*100 -> z > z2*100 -> z2 > print(z) > print(z2) > colorgrad<-colorgrad01[z:z2] > colorgrad > } > > Basically, I want to pick a partial gradient out of a bigger gradient > (colorgrad01) for values that are on row i, from a matrix called tree1. > > when I use lapply: > > lapply(tree1$edge, create.gradient) > > I get the following error message: > > Error in FUN(X[[27L]], ...) : subscript out of bounds > > I'm not sure what's wrong: it could be either fact that 'colorgrad' is a > character string; i.e. consisting of multiple characters and not just one, > or because 'i' doesn't come back in the object 'colorgrad' that it has to > return. Or it could be something else entirely... > > In any case, what I prefer as output is a vector with all the different > 'colorgrad's it generates with each run. > > Thanks a lot for any help you might be able to offer! > Annemarie > > -- > Annemarie Verkerk, MA > Evolutionary Processes in Language and Culture (PhD student) > Max Planck Institute for Psycholinguistics > P.O. Box 310, 6500AH Nijmegen, The Netherlands > +31 (0)24 3521 185 > http://www.mpi.nl/research/research-projects/evolutionary-processes > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joshua Wiley Ph.D. 
Student, Health Psychology University of California, Los Angeles https://joshuawiley.com/ From sds at gnu.org Wed Jul 6 22:44:41 2011 From: sds at gnu.org (Sam Steingold) Date: Wed, 06 Jul 2011 16:44:41 -0400 Subject: [R] hash table access, vector access &c In-Reply-To: (Sam Steingold's message of "Tue, 05 Jul 2011 15:30:02 -0400") References: <51511E1A-F2E3-40F8-88B0-4AC3B0066E4A@comcast.net> Message-ID: > * Sam Steingold [2011-07-05 15:30:02 -0400]: > > I want to modify etr.rt (or create a new frame etr.rt.md) which would > have all the columns of etr.rt plus 5 additional columns > > market.cap > X52.week.low > X52.week.high > X3.month.average.daily.volume > X50.day.moving.average.price > > which for the row number i in etr.rt come from > ysmd.table[[as.character(etr.rt$symbol[[i]])]] > (obviously, > > etr.rt$symbol[[i]] == ysmd.table[[as.character(etr.rt$symbol[[i]])]]$X.stock > > ) this function does the job but it unthinkably slow. ysmd.extend <- function (fr) { len <- dim(fr)[1]; fr$mcap <- vector(mode="numeric", len); fr$lo52 <- vector(mode="numeric", len); fr$hi52 <- vector(mode="numeric", len); fr$dvol <- vector(mode="numeric", len); fr$ma50 <- vector(mode="numeric", len); for (i in 1:len) { cat(i," ",fr$symbol[[i]],"(",as.character(fr$symbol[[i]]),")\n") tmp <- ysmd.table[[as.character(fr$symbol[[i]])]]; fr$mcap[i] <- tmp$market.cap; fr$lo52[i] <- tmp$X52.week.low; fr$hi52[i] <- tmp$X52.week.high; fr$dvol[i] <- tmp$X3.month.average.daily.volume; fr$ma50[i] <- tmp$X50.day.moving.average.price; } fr } system.time(etr.rt <- ysmd.extend(etr.rt)); is there a way to do this without the inner for loop? thanks! -- Sam Steingold (http://sds.podval.org/) on CentOS release 5.6 (Final) X 11.0.60900031 http://mideasttruth.com http://www.memritv.org http://ffii.org http://iris.org.il http://dhimmi.com http://memri.org http://palestinefacts.org Politically Correct Chess: Translucent VS. Transparent. 
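
One possible way to drop the explicit loop, offered only as a sketch: it assumes ysmd.table is a plain named list with one record per symbol, and the toy ysmd.table and etr.rt below, as well as the name ysmd.extend2, are invented for illustration. If ysmd.table is really an environment, mget() would have to replace the `[` lookup. The guess (not measured here) is that much of the time goes into assigning single elements of data frame columns and into the cat() call on every iteration, so each column is built once as a whole vector instead.

```r
# Hypothetical stand-ins for the poster's objects: a named list used as the
# "hash table" (one record per ticker) and a small data frame of rows.
ysmd.table <- list(
  AAA = list(X.stock = "AAA", market.cap = 10, X52.week.low = 1,
             X52.week.high = 5, X3.month.average.daily.volume = 100,
             X50.day.moving.average.price = 3),
  BBB = list(X.stock = "BBB", market.cap = 20, X52.week.low = 2,
             X52.week.high = 9, X3.month.average.daily.volume = 200,
             X50.day.moving.average.price = 6)
)
etr.rt <- data.frame(symbol = factor(c("BBB", "AAA", "BBB")),
                     price  = c(6.1, 3.2, 5.9))

ysmd.extend2 <- function(fr, tab) {
  # One list lookup for all rows instead of one lookup per iteration.
  recs <- tab[as.character(fr$symbol)]
  # Pull a single field out of every record as a plain numeric vector.
  # This fails loudly if a symbol is missing from the table.
  pull <- function(field) {
    vapply(recs, function(r) as.numeric(r[[field]]), numeric(1),
           USE.NAMES = FALSE)
  }
  # Each new column is assigned once, as a whole vector.
  fr$mcap <- pull("market.cap")
  fr$lo52 <- pull("X52.week.low")
  fr$hi52 <- pull("X52.week.high")
  fr$dvol <- pull("X3.month.average.daily.volume")
  fr$ma50 <- pull("X50.day.moving.average.price")
  fr
}

etr.rt.md <- ysmd.extend2(etr.rt, ysmd.table)
```

vapply() still iterates internally, so this is not loop-free in the strict sense; the point is only that the data frame is touched five times rather than five times per row.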
From B.Ouattara at swansea.ac.uk Wed Jul 6 22:45:47 2011 From: B.Ouattara at swansea.ac.uk (Ouattara) Date: Wed, 6 Jul 2011 21:45:47 +0100 Subject: [R] help with nprmpi Message-ID: <003901cc3c1d$afae5de0$0f0b19a0$@ouattara@swansea.ac.uk> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From csun at cfr.msstate.edu Wed Jul 6 22:51:30 2011 From: csun at cfr.msstate.edu (Edwin Sun) Date: Wed, 6 Jul 2011 13:51:30 -0700 (PDT) Subject: [R] Piecewise distribution function estimation with Generalized Pareto for tail Message-ID: <1309985490542-3649961.post@n4.nabble.com> Hello all, I am trying to estimate the cumulative distribution function for a single stock return time series. A piecewise estimation is composed of three parts: parametric generalized Pareto (GP) for the lower tail (10% of the observation), non-parametric kernel-smoothed interior (80% of the observations), and GP for the upper tail (10%). I wonder if anyone has clue about this in R. The software of Matlab has a function called 'paretotails' in the Econometrics Toolbox to do the estimation. On this site, a couple of old messages were related but no clear answers were given. Thank you, Edwin Sun -- View this message in context: http://r.789695.n4.nabble.com/Piecewise-distribution-function-estimation-with-Generalized-Pareto-for-tail-tp3649961p3649961.html Sent from the R help mailing list archive at Nabble.com. From sarah.goslee at gmail.com Wed Jul 6 22:52:18 2011 From: sarah.goslee at gmail.com (Sarah Goslee) Date: Wed, 6 Jul 2011 16:52:18 -0400 Subject: [R] Showing which bars in a bar chart are significantly different In-Reply-To: <1309984625369-3649897.post@n4.nabble.com> References: <1309984625369-3649897.post@n4.nabble.com> Message-ID: I'd do it by hand with either segments() or arrows() and text(), but without a reproducible example I can't give you specific instructions. 
Sarah On Wed, Jul 6, 2011 at 4:37 PM, omernevo wrote: > Hello, > > a probably rather stupid question to which I can't find an answer: > > I have a bar chart, and I want to present which bars are significantly > different by placing a line with an asterisk above then (similarly to fig. 3 > in: http://jnm.snmjournals.org/content/46/4/574.figures-only). > > Does anyone have a reference where can I find some instructions how to learn > this? > > Thanks a lot! > Omer > -- Sarah Goslee http://www.functionaldiversity.org From jwiley.psych at gmail.com Wed Jul 6 22:55:03 2011 From: jwiley.psych at gmail.com (Joshua Wiley) Date: Wed, 6 Jul 2011 13:55:03 -0700 Subject: [R] Showing which bars in a bar chart are significantly different In-Reply-To: <1309984625369-3649897.post@n4.nabble.com> References: <1309984625369-3649897.post@n4.nabble.com> Message-ID: On Wed, Jul 6, 2011 at 1:37 PM, omernevo wrote: > Hello, > > a probably rather stupid question to which I can't find an answer: > > I have a bar chart, and I want to present which bars are significantly > different by placing a line with an asterisk above then (similarly to fig. 3 I would highly recommend some reading about data visualizations techniques (and while you're at it, something on significance testing). Here are two: http://www.b-eye-network.com/view/2468 http://biostat.mc.vanderbilt.edu/wiki/Main/DynamitePlots I would argue for a paradigm switch. > in: http://jnm.snmjournals.org/content/46/4/574.figures-only). > > Does anyone have a reference where can I find some instructions how to learn > this? I don't know of an automated way off hand. But, if you assign the results of barplot() (I am assuming you are using traditional graphics), tmp <- barplot(rnorm(20)) tmp # has the locations on the x axis for each bar now you could go to town with lines() (see ?lines for documentation) to get those bracket thingies and text() to add the asterisks and labels. 
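
A minimal sketch of that idea, with made-up bar heights and an arbitrary choice of which pair of bars to connect (heights, mids and top are illustrative names, not anything from the original question):

```r
heights <- c(23, 45, 67)                     # made-up bar heights
mids <- barplot(heights, ylim = c(0, 90))    # x positions of the bar midpoints

# Bracket from bar 1 to bar 3: short tick down, across, short tick down.
top <- max(heights) + 8
lines(x = mids[c(1, 1, 3, 3)], y = c(top - 3, top, top, top - 3))

# Asterisk centred above the bracket.
text(x = mean(mids[c(1, 3)]), y = top + 3, labels = "*")
```

The ylim is widened by hand so the bracket and the asterisk are not clipped at the top of the plot region.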
Alternately, you could add asterisks and such just using points():

points(x, y, pch = "*")

where x and y are the coordinates where you want the * placed.

Good luck,

Josh

>
> Thanks a lot!
> Omer
>

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
https://joshuawiley.com/

From jrkrideau at yahoo.ca Wed Jul 6 23:01:10 2011
From: jrkrideau at yahoo.ca (John Kane)
Date: Wed, 6 Jul 2011 14:01:10 -0700 (PDT)
Subject: [R] Showing which bars in a bar chart are significantly different
In-Reply-To: <1309984625369-3649897.post@n4.nabble.com>
Message-ID: <1309986070.80660.YahooMailClassic@web38407.mail.mud.yahoo.com>

# Incremental approach
bb <- c(23, 45, 67)
# get midpoints of the bars and plot
bsp <- barplot(bb, beside=TRUE)
# draw lines
segments(.7, 50, 1.9, 50)
segments(.7, 50, .7, 48)
segments(1.9, 50, 1.9, 48)
# Or all in one go
segments(c(.7, .7, 1.9), c(50, 50, 50), c(1.9, .7, 1.9), c(50, 48, 48))

--- On Wed, 7/6/11, omernevo wrote:

> From: omernevo
> Subject: [R] Showing which bars in a bar chart are significantly different
> To: r-help at r-project.org
> Received: Wednesday, July 6, 2011, 4:37 PM
> Hello,
>
> a probably rather stupid question to which I can't find an answer:
>
> I have a bar chart, and I want to present which bars are significantly
> different by placing a line with an asterisk above them (similarly to fig. 3
> in: http://jnm.snmjournals.org/content/46/4/574.figures-only).
>
> Does anyone have a reference where I can find some instructions on how to
> do this?
>
> Thanks a lot!
> Omer
>

From jholtman at gmail.com Wed Jul 6 23:06:29 2011
From: jholtman at gmail.com (jim holtman)
Date: Wed, 6 Jul 2011 17:06:29 -0400
Subject: [R] trouble parsing a date using strptime()
In-Reply-To:
References:
Message-ID:

Paste a 'day' onto the string, since year plus week number alone is
ambiguous and cannot be converted to a single date:

> strptime("2011010","%Y%W%w")
[1] "2011-01-02"
> strptime("2011520","%Y%W%w")
[1] "2011-12-25"
> strptime("2011120","%Y%W%w")
[1] "2011-03-20"
> strptime("2011200","%Y%W%w")
[1] "2011-05-15"

On Wed, Jul 6, 2011 at 3:17 PM, David Pitkin wrote:
> Hi,
>
> I am having trouble parsing dates using strptime() that I get in the
> format of year and week number. The data looks like this "201127" which
> means year 2011 and week 27. I would like to graph this using ggplot but
> then I get a gap between 201054 and 201101 so I thought I would just easily
> convert it.
>
> I tried to use strptime and as.Date and the format string of %Y%W but it
> seems to only parse the year out of the string and then return today's month
> and date. Can anyone point me where I am going wrong or another avenue to
> try?
> > David > >> strptime("201101","%Y%W")[1] "2011-07-06" > >> strptime("201001","%Y%W")[1] "2010-07-06" > >> strptime("201114","%Y%W")[1] "2011-07-06" > >> strptime("201130","%Y%W")[1] "2011-07-06" > > > >> sessionInfo() > R version 2.12.2 (2011-02-25) > Platform: x86_64-pc-mingw32/x64 (64-bit) > > locale: > [1] LC_COLLATE=English_United States.1252 > [2] LC_CTYPE=English_United States.1252 > [3] LC_MONETARY=English_United States.1252 > [4] LC_NUMERIC=C > [5] LC_TIME=English_United States.1252 > > attached base packages: > [1] grid ? ? ?stats ? ? graphics ?grDevices utils ? ? datasets ?methods > [8] base > > other attached packages: > [1] RSiteSearch_1.0-7 sos_1.3-0 ? ? ? ? brew_1.0-6 ? ? ? ?ggplot2_0.8.9 > [5] proto_0.3-9.1 ? ? reshape_0.8.4 ? ? plyr_1.4 ? ? ? ? ?RODBC_1.3-2 > > loaded via a namespace (and not attached): > [1] digest_0.4.2 tools_2.12.2 > > ? ? ? ?[[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? From stevenkennedy2263 at gmail.com Wed Jul 6 23:17:54 2011 From: stevenkennedy2263 at gmail.com (Steven Kennedy) Date: Thu, 7 Jul 2011 07:17:54 +1000 Subject: [R] Subset creates row_names column when exported to MYSQL In-Reply-To: References: Message-ID: What function are you using to export your data.frame to MySQL? On Wed, Jul 6, 2011 at 10:41 PM, Thiago Clark wrote: > Dear R-helpers, > > I have a huge dataset and I create a filter selecting only the cases I want > using: >>data <- subset(data, data$var=='x' | data$var=='y' | data$var=='z' | ... 
)
>
> The problem is, when I check my new data it doesn't show a row_names column
> but when the data is exported to MYSQL (using RMYSQL) it creates a column
> row_names.
>
> I've already tried
>>row.names(data)<- NULL
> tried
>>write.table
> then
>>read.table
>
> tried
>>as.matrix(data)
>
> but the row_names column does not appear in R, but appears every time I
> export to MYSQL
>
> I've done similar work with another dataset but the difference was that in
> the filter I used
>>data <- subset(data, substr(as.numeric(data$var),1,2)==0 |
> substr(as.numeric(data$var),1,2)==1 | ... )
>
> I don't know how to get rid of that column.
>
> thanks
>

From adrion at ibe.med.uni-muenchen.de Wed Jul 6 22:43:43 2011
From: adrion at ibe.med.uni-muenchen.de (Christine)
Date: Wed, 6 Jul 2011 13:43:43 -0700 (PDT)
Subject: [R] Dealing with missing values in a linear mixed model
In-Reply-To: <1309980132.34254.YahooMailNeo@web114702.mail.gq1.yahoo.com>
References: <1309980132.34254.YahooMailNeo@web114702.mail.gq1.yahoo.com>
Message-ID: <1309985023613-3649941.post@n4.nabble.com>

Hi,

within lme(), I think it is only possible to do na.action = na.omit.
The default action (= na.fail) causes lme() to print an error message and
terminate if there are any incomplete observations.

Best,
Christine

-----
--
Christine Adrion, Dipl.-Stat.,MPH
Ludwig-Maximilians-Universität München
IBE - Institut für Medizinische Informationsverarbeitung,
Biometrie und Epidemiologie
Marchioninistr.
15
D- 81377 München
Germany
Tel: +49 89 7095 7486
Fax: +49 89 7095 7491
eMail: adrion at ibe.med.uni-muenchen.de
web: http://www.ibe.med.uni-muenchen.de
--
View this message in context: http://r.789695.n4.nabble.com/Dealing-with-missing-values-in-a-linear-mixed-model-tp3649747p3649941.html
Sent from the R help mailing list archive at Nabble.com.

From alex.zhang at ymail.com Wed Jul 6 23:02:56 2011
From: alex.zhang at ymail.com (Alex Zhang)
Date: Wed, 6 Jul 2011 14:02:56 -0700 (PDT)
Subject: [R] How to Execute A Query Stored In Access 2007
Message-ID: <1309986176.15038.YahooMailNeo@web114008.mail.gq1.yahoo.com>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL:

From walt at dataanalyticscorp.com Wed Jul 6 22:43:56 2011
From: walt at dataanalyticscorp.com (Data Analytics Corp.)
Date: Wed, 06 Jul 2011 16:43:56 -0400
Subject: [R] finding the intersection of two vectors
Message-ID: <4E14C90C.2080006@dataanalyticscorp.com>

Hi,

Suppose I have two vectors, not necessarily the same length (in fact,
they usually are different lengths): y.1 that has increasing values
between 0 and 1; y.2 that has decreasing values between 1.0 and 0. You
can picture these as being supply (= y.1) and demand (= y.2) curves from
economics. I typically plot these vectors on the same graph against a
common x variable, which happens to be price for what I do. The price
variable runs from, say, $0 to $25. When I plot y.1 and y.2, I've been
eye-balling a vertical line at a price point where y.1 intersects y.2.
I'm now tired of eye-balling a line through the intersection -- takes too
much time to get it right or just close enough. I can't figure out how
to find the price value at which the two curves intersect. Going back to
the economics interpretation, I want the price where supply equals
demand. Any suggestions as to how I can find that price point in R? Any
functions that help?

Thanks,

Walt

________________________

Walter R. Paczkowski, Ph.D.
Data Analytics Corp.
44 Hamilton Lane
Plainsboro, NJ 08536
________________________
(V) 609-936-8999
(F) 609-936-3733
walt at dataanalyticscorp.com
www.dataanalyticscorp.com

From dwinsemius at comcast.net Wed Jul 6 23:50:38 2011
From: dwinsemius at comcast.net (David Winsemius)
Date: Wed, 6 Jul 2011 17:50:38 -0400
Subject: [R] finding the intersection of two vectors
In-Reply-To: <4E14C90C.2080006@dataanalyticscorp.com>
References: <4E14C90C.2080006@dataanalyticscorp.com>
Message-ID:

On Jul 6, 2011, at 4:43 PM, Data Analytics Corp. wrote:

> Hi,
>
> Suppose I have two vectors, not necessarily the same length (in
> fact, they usually are different lengths): y.1 that has increasing
> values between 0 and 1; y.2 that has decreasing values between 1.0
> and 0. You can picture these as being supply (= y.1) and demand (=
> y.2) curves from economics. I typically plot these vectors on the
> same graph against a common x variable, which happens to be price
> for what I do. The price variable runs from, say, $0 to $25. When
> I plot y.1 and y.2, I've been eye-balling a vertical line at a price
> point where y.1 intersects y.2. I'm now tired of eye-balling a line
> through the intersection -- takes too much time to get it right or
> just close enough. I can't figure out how to find the price value
> at which the two curves intersect. Going back to the economics
> interpretation, I want the price where supply equals demand. Any
> suggestions as to how I can find that price point in R? Any
> functions that help?

?approxfun
# or..
?splinefun

# should allow you to make two functions and then solve for the X that
minimizes the difference.
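
A minimal sketch of that suggestion, assuming made-up supply and demand values on two different price grids (price.1, price.2 and the shapes of y.1 and y.2 are invented here; only the approxfun() idea comes from the reply above). Instead of minimizing, it looks for the price where the difference of the two interpolated curves is zero, using uniroot():

```r
# Hypothetical data: supply rises and demand falls with price,
# and the two vectors have different lengths.
price.1 <- seq(0, 25, by = 1)
y.1 <- price.1 / 25                  # increasing from 0 to 1
price.2 <- seq(0, 25, by = 2.5)
y.2 <- 1 - (price.2 / 25)^2          # decreasing from 1 to 0

# Turn each set of points into a function of price (linear interpolation).
supply <- approxfun(price.1, y.1)
demand <- approxfun(price.2, y.2)

# The curves cross where supply(p) - demand(p) changes sign.
eq <- uniroot(function(p) supply(p) - demand(p), interval = c(0, 25))
eq.price <- eq$root

# Mark the crossing on a plot.
plot(price.1, y.1, type = "l", xlab = "price", ylab = "y")
lines(price.2, y.2, lty = 2)
abline(v = eq.price, col = "grey")
```

uniroot() needs the difference to have opposite signs at the two ends of the interval, which holds when one curve runs from 0 to 1 and the other from 1 to 0 over the same price range. splinefun() could be swapped in for approxfun() if a smooth interpolant is preferred.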
--
David Winsemius, MD
West Hartford, CT

From pmaclean2011 at yahoo.com Wed Jul 6 23:51:50 2011
From: pmaclean2011 at yahoo.com (Peter Maclean)
Date: Wed, 6 Jul 2011 14:51:50 -0700 (PDT)
Subject: [R] Split a row vector into columns
In-Reply-To:
References: <1309975609.86208.YahooMailRC@web121711.mail.ne1.yahoo.com>
Message-ID: <1309989110.50515.YahooMailRC@web121706.mail.ne1.yahoo.com>

That is what I wanted.

Peter Maclean
Department of Economics
UDSM

----- Original Message ----
From: Sarah Goslee
To: Peter Maclean
Cc: r-help at r-project.org
Sent: Wed, July 6, 2011 1:15:31 PM
Subject: Re: [R] Split a row vector into columns

You mean like:

> myvec <- c(1,2,3,1,2,3,1,2,3)
> myvec
[1] 1 2 3 1 2 3 1 2 3
> matrix(myvec, ncol=3, byrow=TRUE)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    1    2    3
[3,]    1    2    3
>

Or do you actually have more complex requirements?

Sarah

On Wed, Jul 6, 2011 at 2:06 PM, Peter Maclean wrote:
> I want to create columns from this row vector. From:
>    x1 x2 x3 x1 x2 x3 x1 x2 x3
>     1  2  3  1  2  3  1  2  3
>
> to:
> x1 x2 x3
> 1  2  3
> 1  2  3
> 1  2  3
>
> Peter Maclean
> Department of Economics
> UDSM
>

--
Sarah Goslee
http://www.functionaldiversity.org

From marchywka at hotmail.com Thu Jul 7 00:01:07 2011
From: marchywka at hotmail.com (Mike Marchywka)
Date: Wed, 6 Jul 2011 18:01:07 -0400
Subject: [R] problem loading rgdal with Rapache, problem solved due to libexpat with apache.
Message-ID:

Has anyone had problems with Rapache that don't show up on command line
execution of R? I just ran into this while loading rgdal in an Rapache page
and had a problem loading a shared object. The final complaint was that
Stop_XMLParser was undefined. This was surprising since grep -l showed it
in the expat lib and it worked fine from the command line. Finally, I noted
the search path had apache/lib ahead of the other places where expat exists.
The libexpat there, although apparently having the same version (so.0.5.0),
did not grep for this symbol.
Anyway, copying one into the other fixed the problem and the page works fine but curious is anyone has thoughts on what could have caused this. Sorry I don't have specific output but i thought you may remember if you ran into it and it is not worth trying to replicate. Thanks. From wdunlap at tibco.com Thu Jul 7 00:02:06 2011 From: wdunlap at tibco.com (William Dunlap) Date: Wed, 6 Jul 2011 15:02:06 -0700 Subject: [R] Wrong environment when evaluating and expression? In-Reply-To: References: <77EB52C6DD32BA4D87471DCD70C8D70004653091@NA-PA-VBE03.na.tibco.com> Message-ID: <77EB52C6DD32BA4D87471DCD70C8D70004653341@NA-PA-VBE03.na.tibco.com> No, it is not in any package. Feel free to use it as you wish - it has no licensing restrictions. Bill Dunlap Spotfire, TIBCO Software wdunlap tibco.com > -----Original Message----- > From: Joshua Wiley [mailto:jwiley.psych at gmail.com] > Sent: Wednesday, July 06, 2011 1:23 PM > To: William Dunlap > Cc: r-help at r-project.org > Subject: Re: [R] Wrong environment when evaluating and expression? > > Thanks Bill! That is very useful. Is the str.language function in > any package (findFn("str.language") came up empty)? It certainly > helped me, not only to understand this particular problem, but in > trying to wrap my head around language objects (which I only very > poorly grasp) in general. > > Josh > > On Tue, Jul 5, 2011 at 11:48 AM, William Dunlap > wrote: > >> -----Original Message----- > >> From: r-help-bounces at r-project.org > >> [mailto:r-help-bounces at r-project.org] On Behalf Of Joshua Wiley > >> Sent: Monday, July 04, 2011 1:12 AM > >> To: r-help at r-project.org > >> Subject: [R] Wrong environment when evaluating and expression? > >> > >> Hi All, > >> > >> I have constructed two expressions (e1 & e2). ?I can see > that they are > >> not identical, but I cannot figure out how they differ. 
> >> > >> ############### > >> dat <- mtcars > >> e1 <- expression(with(data = dat, lm(mpg ~ hp))) > >> e2 <- as.expression(substitute(with(data = dat, lm(f)), > >> list(f = mpg ~ hp))) > >> > >> str(e1) > >> str(e2) > >> all.equal(e1, e2) > >> identical(e1, e2) # false > > > > With the appended str.language function you can see the difference > > between e1 and e2. ?It displays > > ?`name` class(length) > > of each component of a recursive object, along with a short > text summary > > of > > it after a colon. > > > >> str.language(e1) > > `e1` expression(1): expression(with(data = da... > > ?`` call(3): with(data = dat, lm(mpg ~... > > ? ?`` name(1): with > > ? ?`data` name(1): dat > > ? ?`` call(2): lm(mpg ~ hp) > > ? ? ?`` name(1): lm > > ? ? ?`` call(3): mpg ~ hp > > ? ? ? ?`` name(1): ~ > > ? ? ? ?`` name(1): mpg > > ? ? ? ?`` name(1): hp > >> str.language(e2) > > `e2` expression(1): expression(with(data = da... > > ?`` call(3): with(data = dat, lm(mpg ~... > > ? ?`` name(1): with > > ? ?`data` name(1): dat > > ? ?`` call(2): lm(mpg ~ hp) > > ? ? ?`` name(1): lm > > ? ? ?`` formula(3): mpg ~ hp > > ? ? ? ?`` name(1): ~ > > ? ? ? ?`` name(1): mpg > > ? ? ? ?`` name(1): hp > > ? ? ? ?`Attributes of ` list(2): structure(list(class = "f... > > ? ? ? ? ?`class` character(1): "formula" > > ? ? ? ? ?`.Environment` environment(5): dat e1 e2 s... > > > > It is a bug in all.equal() that it ignores attributes of formulae. > > E.g., > > > > ?> all.equal(y~x, terms(y~x)) > > ?[1] TRUE > > ?> identical(y~x, terms(y~x)) > > ?[1] FALSE > > > > Here is str.language > > > > str.language <- > > function (object, ..., level = 0, name = > deparse(substitute(object)), > > ? ?attributes = TRUE) > > { > > ? ?abbr <- function(string, maxlen = 25) { > > ? ? ? ?if (length(string) > 1 || nchar(string) > maxlen) > > ? ? ? ? ? ?paste(substring(string[1], 1, maxlen), "...", sep = "") > > ? ? ? ?else string > > ? ?} > > ? ?myDeparse <- function(object) { > > ? ? ? 
?if (!is.environment(object)) { > > ? ? ? ? ? ?deparse(object) > > ? ? ? ?} > > ? ? ? ?else { > > ? ? ? ? ? ?ename <- environmentName(object) > > ? ? ? ? ? ?if (ename == "") > > ? ? ? ? ? ? ? ?ename <- "" > > ? ? ? ? ? ?paste(sep = "", "<", ename, "> ", paste(collapse = " ", > > ? ? ? ? ? ? ? ?objects(object))) > > ? ? ? ?} > > ? ?} > > ? ?cat(rep(" ?", level), sep = "") > > ? ?if (is.null(name)) > > ? ? ? ?name <- "" > > ? ?cat(sprintf("`%s` %s(%d): %s\n", abbr(name), class(object), > > ? ? ? ?length(object), abbr(myDeparse(object)))) > > ? ?a <- attributes(object) > > ? ?if (is.recursive(object) && !is.environment(object)) { > > ? ? ? ?object <- as.list(object) > > ? ? ? ?names <- names(object) > > ? ? ? ?for (i in seq_along(object)) { > > ? ? ? ? ? ?str.language(object[[i]], ..., level = level + 1, > > ? ? ? ? ? ? ? ?name = names[i], attributes = attributes) > > ? ? ? ?} > > ? ?} > > ? ?if (attributes) { > > ? ? ? ?a$names <- NULL > > ? ? ? ?if (length(a) > 0) { > > ? ? ? ? ? ?str.language(a, level = level + 1, name = > paste("Attributes > > of", > > ? ? ? ? ? ? ? ?abbr(name)), attributes = attributes) > > ? ? ? ?} > > ? ?} > > } > > > > Bill Dunlap > > Spotfire, TIBCO Software > > wdunlap tibco.com > > > >> > >> eval(e1) > >> eval(e2) > >> ################ > >> > >> The context is trying to use a list of formulae to generate several > >> models from a multiply imputed dataset. ?The package I am > using (mice) > >> has methods for with() and that is how I can (easily) get > the pooled > >> results. ?Passing the formula directly does not work, so I > was trying > >> to generate the entire call and evaluate it as if I had typed it at > >> the console, but I am missing something (probably rather silly). > >> > >> Thanks, > >> > >> Josh > >> > >> > >> -- > >> Joshua Wiley > >> Ph.D. 
Student, Health Psychology > >> University of California, Los Angeles > >> http://www.joshuawiley.com/ > >> > >> ______________________________________________ > >> R-help at r-project.org mailing list > >> https://stat.ethz.ch/mailman/listinfo/r-help > >> PLEASE do read the posting guide > >> http://www.R-project.org/posting-guide.html > >> and provide commented, minimal, self-contained, reproducible code. > > -- > Joshua Wiley > Ph.D. Student, Health Psychology > University of California, Los Angeles > https://joshuawiley.com/ > From gs00811 at gmail.com Thu Jul 7 00:37:59 2011 From: gs00811 at gmail.com (Aishuijiao) Date: Wed, 6 Jul 2011 15:37:59 -0700 (PDT) Subject: [R] The BiodiversityR can't work Message-ID: <1309991879107-3650199.post@n4.nabble.com> Sourced: BiodiversityGUI.R Error : .onAttach failed in attachNamespace() for 'Rcmdr', details: call: get(Menus[m, 5]) error: object 'chisquareDistributionPlot' not found Error in BiodiversityRGUI() : needs Rcmdr Hi Guys, I am from China, I wanna use Biodiversity R, but it can't work. Can somebody help me? Thanks -- View this message in context: http://r.789695.n4.nabble.com/The-BiodiversityR-can-t-work-tp3650199p3650199.html Sent from the R help mailing list archive at Nabble.com. From d.cross at tcu.edu Thu Jul 7 00:48:57 2011 From: d.cross at tcu.edu (David Cross) Date: Wed, 6 Jul 2011 17:48:57 -0500 Subject: [R] The BiodiversityR can't work In-Reply-To: <1309991879107-3650199.post@n4.nabble.com> References: <1309991879107-3650199.post@n4.nabble.com> Message-ID: I have not used BiodiversityGUI.R, but it looks like it needs Rcmdr ... are you using Rcmdr? 
In case you are not, here is the url: http://socserv.mcmaster.ca/jfox/Misc/Rcmdr/ Cheers David Cross d.cross at tcu.edu www.davidcross.us On Jul 6, 2011, at 5:37 PM, Aishuijiao wrote: > Sourced: BiodiversityGUI.R > Error : .onAttach failed in attachNamespace() for 'Rcmdr', details: > call: get(Menus[m, 5]) > error: object 'chisquareDistributionPlot' not found > Error in BiodiversityRGUI() : needs Rcmdr > > Hi Guys, > I am from China, I wanna use Biodiversity R, but it can't work. > Can somebody help me? > > Thanks > > > > -- > View this message in context: http://r.789695.n4.nabble.com/The-BiodiversityR-can-t-work-tp3650199p3650199.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From stevenkennedy2263 at gmail.com Thu Jul 7 01:18:01 2011 From: stevenkennedy2263 at gmail.com (Steven Kennedy) Date: Thu, 7 Jul 2011 09:18:01 +1000 Subject: [R] How to Execute A Query Stored In Access 2007 In-Reply-To: <1309986176.15038.YahooMailNeo@web114008.mail.gq1.yahoo.com> References: <1309986176.15038.YahooMailNeo@web114008.mail.gq1.yahoo.com> Message-ID: What package are you using to connect to the Access database? On Thu, Jul 7, 2011 at 7:02 AM, Alex Zhang wrote: > Hey guys, > > Could you please teach me how to run or execute a query stored in an Access 2007 database? > > I can connect and run queries from my Access database without any problem. But I have an append query stored there. Say called "Append2Tbl". How do I execute it without copying and pasting the query into my R code? Thanks a lot! > > - Alex > ? ? ? 
[[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From bps0002 at auburn.edu Thu Jul 7 01:32:54 2011 From: bps0002 at auburn.edu (B77S) Date: Wed, 6 Jul 2011 16:32:54 -0700 (PDT) Subject: [R] identifying a 'run' in a vector Message-ID: <1309995174025-3650295.post@n4.nabble.com> Hi, How can I discern which elements in x (see below) are in 'order', but more specifically.. only the 1st 'ordered run'? I would like for it to return elements 1:8... there may be ordered values after 1:8, but those are not of interest. x <- c(1, 2, 3, 4, 5, 6, 7, 8, 20, 21, 22, 45) Thanks for any suggestions. -- View this message in context: http://r.789695.n4.nabble.com/identifying-a-run-in-a-vector-tp3650295p3650295.html Sent from the R help mailing list archive at Nabble.com. From jwiley.psych at gmail.com Thu Jul 7 01:46:53 2011 From: jwiley.psych at gmail.com (Joshua Wiley) Date: Wed, 6 Jul 2011 16:46:53 -0700 Subject: [R] identifying a 'run' in a vector In-Reply-To: <1309995174025-3650295.post@n4.nabble.com> References: <1309995174025-3650295.post@n4.nabble.com> Message-ID: Hi, If an "ordered run" means the difference between the ith and (i + 1)th positions is 1, then: out <- rle(diff(x)) diff() gives you the differences x[i + 1] - x[i], and run length encoding then records how long each run of the same value is. In this case, the first run has length 7. rle() outputs a list. If you only want to consider runs that change by 1, then use the "values" element of the list to select only those (out$values == 1). If the run is not at the beginning of the vector, cumsum(out$lengths) can help to find the right indices to extract your run.
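The rle() idea above can be sketched roughly as follows. This is only a minimal illustration, assuming an "ordered run" means consecutive values that each increase by exactly 1; the names ends, starts, k and first_run are made up for this example and are not from the original post.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 20, 21, 22, 45)

# Run-length encode the successive differences of x
out <- rle(diff(x))

# End and start position of each run, as indices into diff(x)
ends   <- cumsum(out$lengths)
starts <- ends - out$lengths + 1

# Index of the first run whose step is exactly 1 (NA if there is none)
k <- match(1, out$values)

# A run of n unit differences covers n + 1 elements of x
first_run <- if (is.na(k)) x[0] else x[starts[k]:(ends[k] + 1)]
first_run
# [1] 1 2 3 4 5 6 7 8
```

Because starts and ends are computed for every run, the same indexing should also work when the first unit-step run is not at the start of the vector.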
Hope this helps, Josh On Wed, Jul 6, 2011 at 4:32 PM, B77S wrote: > Hi, > > How can I discern which elements in x (see below) are in 'order', but more > specifically.. only the 1st 'ordered run'? > I would like for it to return elements 1:8... there may be ordered values > after 1:8, but those are not of interest. > > x <- c(1, 2, 3, 4, 5, 6, 7, 8, 20, 21, 22, 45) > > > Thanks for any suggestions. > > > > > -- > View this message in context: http://r.789695.n4.nabble.com/identifying-a-run-in-a-vector-tp3650295p3650295.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles https://joshuawiley.com/ From bps0002 at auburn.edu Thu Jul 7 01:49:45 2011 From: bps0002 at auburn.edu (B77S) Date: Wed, 6 Jul 2011 16:49:45 -0700 (PDT) Subject: [R] identifying a 'run' in a vector In-Reply-To: <1309995174025-3650295.post@n4.nabble.com> References: <1309995174025-3650295.post@n4.nabble.com> Message-ID: <1309996185485-3650318.post@n4.nabble.com> well.. the following works, but if you have another idea I am still interested. 1:(which(diff(x)!=1)[1]) B77S wrote: > > Hi, > > How can I discern which elements in x (see below) are in 'order', but more > specifically.. only the 1st 'ordered run'? > I would like for it to return elements 1:8... there may be ordered values > after 1:8, but those are not of interest. > > x <- c(1, 2, 3, 4, 5, 6, 7, 8, 20, 21, 22, 45) > > > Thanks for any suggestions. > -- View this message in context: http://r.789695.n4.nabble.com/identifying-a-run-in-a-vector-tp3650295p3650318.html Sent from the R help mailing list archive at Nabble.com. 
From irasharenow100 at yahoo.com Thu Jul 7 02:03:56 2011 From: irasharenow100 at yahoo.com (Ira Sharenow) Date: Wed, 6 Jul 2011 17:03:56 -0700 Subject: [R] Seasonal correlations Message-ID: <009801cc3c39$5e11d090$4401a8c0@IraHP> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From p_connolly at slingshot.co.nz Thu Jul 7 02:01:38 2011 From: p_connolly at slingshot.co.nz (p_connolly at slingshot.co.nz) Date: Thu, 07 Jul 2011 12:01:38 +1200 Subject: [R] help with nprmpi In-Reply-To: <003901cc3c1d$afae5de0$0f0b19a0$@ouattara@swansea.ac.uk> References: <003901cc3c1d$afae5de0$0f0b19a0$@ouattara@swansea.ac.uk> Message-ID: <20110707120138.cgk8gws4s0kwww8s@webmail.slingshot.co.nz> Quoting Ouattara : > Dear > > I have been trying to a program which requires "nprmpi". However, I have > tried to install the downloaded zip file but get the error: "cannot open > compressed file 'npRmpi_0.40-7.tar.gz/DESCRIPTION'" Since you didn't tell us any basic information, we have to guess what your OS and other details are. My guess is that you're trying to use a file which Windows can't deal with. > > > > I tried the command: install.packages("nprmpi") but get the message "package > 'nprmpi' is not available" That would be consistent with using Windows. On the CRAN site, the information relating to the npRmpi package has this to say: Windows binary: not available, see ReadMe. HTH > > > > I then tried version 2.13 of R (as I am still using version 2.12) but still > get error message. > > > > I was wondering if somebody has managed to install it and can kindly assist, > please. > > Many thanks and sorry for taking your time.
> > Best wishes, > > os > > > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From jwiley.psych at gmail.com Thu Jul 7 02:25:48 2011 From: jwiley.psych at gmail.com (Joshua Wiley) Date: Wed, 6 Jul 2011 17:25:48 -0700 Subject: [R] identifying a 'run' in a vector In-Reply-To: <1309996185485-3650318.post@n4.nabble.com> References: <1309995174025-3650295.post@n4.nabble.com> <1309996185485-3650318.post@n4.nabble.com> Message-ID: I seem to recall seeing this done in one or two elegant lines, but.... run <- function(x, type = 1) { index <- rle(diff(c(NA, x))) i <- cumsum(index$lengths) j <- match(type, index$values) x[seq.int(i[j - 1], i[j])] } run(c(1, 2, 3, 4, 5, 6, 7, 8, 20, 21, 22, 45)) run(c(20, 22, 24, 26, 1, 2, 3, 4, 5, 6, 7, 8, 20, 21, 22, 45)) run(c(NA, 20, 24, 28, 23:30)) run(c(20, 22, 24, 26, 1, 2, 3, 4, 5, 6, 7, 8, 20, 21, 22, 45), type = 2) Cheers, Josh On Wed, Jul 6, 2011 at 4:49 PM, B77S wrote: > > well.. the following works, but if you have another idea I am still > interested. > > 1:(which(diff(x)!=1)[1]) > > > > > > > > B77S wrote: >> >> Hi, >> >> How can I discern which elements in x (see below) are in 'order', but more >> specifically.. only the 1st 'ordered run'? >> I would like for it to return elements 1:8... there may be ordered values >> after 1:8, but those are not of interest. >> >> x <- c(1, 2, 3, 4, 5, 6, 7, 8, 20, 21, 22, 45) >> >> >> Thanks for any suggestions. >> > > -- > View this message in context: http://r.789695.n4.nabble.com/identifying-a-run-in-a-vector-tp3650295p3650318.html > Sent from the R help mailing list archive at Nabble.com. 
> > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles https://joshuawiley.com/ From jim.silverton at gmail.com Thu Jul 7 02:58:51 2011 From: jim.silverton at gmail.com (Jim Silverton) Date: Wed, 6 Jul 2011 20:58:51 -0400 Subject: [R] Simulating from the null distribution of a 2 x 3 table Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From dwinsemius at comcast.net Thu Jul 7 03:18:05 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Wed, 6 Jul 2011 21:18:05 -0400 Subject: [R] Simulating from the null distribution of a 2 x 3 table In-Reply-To: References: Message-ID: <76CC7546-FEA7-44EC-AE3C-53B4E23BEB2A@comcast.net> On Jul 6, 2011, at 8:58 PM, Jim Silverton wrote: > Dear all, > > I want to simulate from the null distribution of the following 2 x 3 > table, > > 2 5 10 > 4 8 5 > > I am using a chi-squared test. Yeah. Right. A "chi-squared test". That certainly narrows it down ... to maybe one quarter of all statistical tests ever invented. > Anyone has any idea how to do this? Depending on what you actually mean by "the null distribution of a 2 X 3 table". possibly: ?r2dtable -- David. > > -- > Thanks, > Jim. > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
David Winsemius, MD West Hartford, CT From ggrothendieck at gmail.com Thu Jul 7 03:19:11 2011 From: ggrothendieck at gmail.com (Gabor Grothendieck) Date: Wed, 6 Jul 2011 21:19:11 -0400 Subject: [R] identifying a 'run' in a vector In-Reply-To: <1309995174025-3650295.post@n4.nabble.com> References: <1309995174025-3650295.post@n4.nabble.com> Message-ID: On Wed, Jul 6, 2011 at 7:32 PM, B77S wrote: > Hi, > > How can I discern which elements in x (see below) are in 'order', but more > specifically.. only the 1st 'ordered run'? > I would like for it to return elements 1:8... there may be ordered values > after 1:8, but those are not of interest. > > x <- c(1, 2, 3, 4, 5, 6, 7, 8, 20, 21, 22, 45) > > Since the definition of an ordered run is not given we assume that it is a sequence of numbers which each increase by 1 over the prior number. If that is not it then you will need to clarify the problem definition. First calculate a logical vector which is TRUE at each position which starts a new run. Note that the first position in x always starts a new run even if that run is a singleton so it can be set to TRUE. The remaining elements can be computed using diff as shown. The resulting logical vector is the argument to cumsum below. Taking the cumulative sum of this logical vector gives a vector the same length as x but with each element of the 1st run replaced with 1, each element of the 2nd run replaced with 2 and so on. Finally, since we only want the 1st run we pick out those positions of x where the cumsum equals 1. > x[cumsum(c(TRUE, diff(x) != 1)) == 1] [1] 1 2 3 4 5 6 7 8 -- Statistics & Software Consulting GKX Group, GKX Associates Inc. 
tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com From clark.thiago at gmail.com Thu Jul 7 03:27:19 2011 From: clark.thiago at gmail.com (Thiago Clark) Date: Wed, 6 Jul 2011 22:27:19 -0300 Subject: [R] Subset creates row_names column when exported to MYSQL In-Reply-To: References: Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From alex.zhang at ymail.com Thu Jul 7 03:40:41 2011 From: alex.zhang at ymail.com (Alex Zhang) Date: Wed, 6 Jul 2011 18:40:41 -0700 (PDT) Subject: [R] How to Execute A Query Stored In Access 2007 In-Reply-To: References: <1309986176.15038.YahooMailNeo@web114008.mail.gq1.yahoo.com> Message-ID: <1310002841.88883.YahooMailNeo@web114006.mail.gq1.yahoo.com> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From gunter.berton at gene.com Thu Jul 7 04:34:47 2011 From: gunter.berton at gene.com (Bert Gunter) Date: Wed, 6 Jul 2011 19:34:47 -0700 Subject: [R] Simulating from the null distribution of a 2 x 3 table In-Reply-To: References: Message-ID: Homework? If not, context? -- Bert On Wed, Jul 6, 2011 at 5:58 PM, Jim Silverton wrote: > Dear all, > > I want to simulate from the null distribution of the following 2 x 3 table, > > 2 ? 5 ?10 > 4 ? 8 ? 5 > > I am using ?a chi-squared test. > Anyone has any idea how to do this? > > -- > Thanks, > Jim. > > ? ? ? ?[[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- "Men by nature long to get on to the ultimate truths, and will often be impatient with elementary studies or fight shy of them. 
If it were possible to reach the ultimate truths without the elementary studies usually prefixed to them, these would not be preparatory studies but superfluous diversions." -- Maimonides (1135-1204) Bert Gunter Genentech Nonclinical Biostatistics From bps0002 at auburn.edu Thu Jul 7 04:38:19 2011 From: bps0002 at auburn.edu (B77S) Date: Wed, 6 Jul 2011 19:38:19 -0700 (PDT) Subject: [R] identifying a 'run' in a vector In-Reply-To: References: <1309995174025-3650295.post@n4.nabble.com> Message-ID: <1310006299768-3650509.post@n4.nabble.com> Yes Gabor, you definition (a sequence of numbers which each increase by 1 over the prior number) is what I meant. Sorry it that was not clear and I thank you and Joshua for your time and your explanation. This should work fine. Gabor Grothendieck wrote: > > On Wed, Jul 6, 2011 at 7:32 PM, B77S <bps0002 at auburn.edu> wrote: >> Hi, >> >> How can I discern which elements in x (see below) are in 'order', but >> more >> specifically.. only the 1st 'ordered run'? >> I would like for it to return elements 1:8... there may be ordered values >> after 1:8, but those are not of interest. >> >> x <- c(1, 2, 3, 4, 5, 6, 7, 8, 20, 21, 22, 45) >> >> > > Since the definition of an ordered run is not given we assume that it > is a sequence of numbers which each increase by 1 over the prior > number. If that is not it then you will need to clarify the problem > definition. > > First calculate a logical vector which is TRUE at each position which > starts a new run. Note that the first position in x always starts a > new run even if that run is a singleton so it can be set to TRUE. The > remaining elements can be computed using diff as shown. The > resulting logical vector is the argument to cumsum below. > > Taking the cumulative sum of this logical vector gives a vector the > same length as x but with each element of the 1st run replaced with 1, > each element of the 2nd run replaced with 2 and so on. 
> > Finally, since we only want the 1st run we pick out those positions of > x where the cumsum equals 1. > >> x[cumsum(c(TRUE, diff(x) != 1)) == 1] > [1] 1 2 3 4 5 6 7 8 > > > -- > Statistics & Software Consulting > GKX Group, GKX Associates Inc. > tel: 1-877-GKX-GROUP > email: ggrothendieck at gmail.com > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- View this message in context: http://r.789695.n4.nabble.com/identifying-a-run-in-a-vector-tp3650295p3650509.html Sent from the R help mailing list archive at Nabble.com. From tlumley at uw.edu Thu Jul 7 05:01:28 2011 From: tlumley at uw.edu (Thomas Lumley) Date: Thu, 7 Jul 2011 15:01:28 +1200 Subject: [R] Simulating from the null distribution of a 2 x 3 table In-Reply-To: References: Message-ID: On Thu, Jul 7, 2011 at 12:58 PM, Jim Silverton wrote: > Dear all, > > I want to simulate from the null distribution of the following 2 x 3 table, > > 2 ? 5 ?10 > 4 ? 8 ? 5 > > I am using ?a chi-squared test. > Anyone has any idea how to do this? The r2dtable() function will simulate tables with a given set of row and column totals. -thomas -- Thomas Lumley Professor of Biostatistics University of Auckland From stevenkennedy2263 at gmail.com Thu Jul 7 05:16:35 2011 From: stevenkennedy2263 at gmail.com (Steven Kennedy) Date: Thu, 7 Jul 2011 13:16:35 +1000 Subject: [R] How to Execute A Query Stored In Access 2007 In-Reply-To: <1310002841.88883.YahooMailNeo@web114006.mail.gq1.yahoo.com> References: <1309986176.15038.YahooMailNeo@web114008.mail.gq1.yahoo.com> <1310002841.88883.YahooMailNeo@web114006.mail.gq1.yahoo.com> Message-ID: You should just be able to use the sqlQuery function to execute the query something like this: con<-odbcConnect(...) 
sqlQuery(con,"EXEC ") On Thu, Jul 7, 2011 at 11:40 AM, Alex Zhang wrote: > Steven - I use RODBC. Thx, > - Alex > ________________________________ > From: Steven Kennedy > To: Alex Zhang > Cc: "r-help at R-project.org" > Sent: Wednesday, July 6, 2011 7:18 PM > Subject: Re: [R] How to Execute A Query Stored In Access 2007 > > What package are you using to connect to the Access database? > > > On Thu, Jul 7, 2011 at 7:02 AM, Alex Zhang wrote: >> Hey guys, >> >> Could you please teach me how to run or execute a query stored in an >> Access 2007 database? >> >> I can connect and run queries from my Access database without any problem. >> But I have an append query stored there. Say called "Append2Tbl". How do I >> execute it without copying and pasting the query into my R code? Thanks a >> lot! >> >> - Alex >> [[alternative HTML version deleted]] >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > > From pmaclean2011 at yahoo.com Thu Jul 7 05:41:02 2011 From: pmaclean2011 at yahoo.com (Peter Maclean) Date: Wed, 6 Jul 2011 20:41:02 -0700 (PDT) Subject: [R] Substring a column vector Message-ID: <1310010062.44852.YahooMailRC@web121709.mail.ne1.yahoo.com> How to substring the following vector (in data frame b) >b a 11 12 1234 1245 124567 126786 145769 such that: a1 a2 1 1 1 2 12 23 12 34 124 567 126 787 145 769 I tried logical commands with substr() did not work out.
From scottanthonyparsons at gmail.com Thu Jul 7 05:27:17 2011 From: scottanthonyparsons at gmail.com (Sparsons) Date: Wed, 6 Jul 2011 20:27:17 -0700 (PDT) Subject: [R] CAPdiscrim error in BiodiversityR Message-ID: <1310009237545-3650568.post@n4.nabble.com> Hello, I having trouble running the CAPdiscrim function located in biodiversityR. My data tables are as follows: community data frame (called "spdata") Species1... Speciesn site1.. site2.. siten with abundance data as values. Site names are row names. and environmental data (called "envdata") year elevation site1... site2... siten my command lines are as follows: dists = vegdist(spdata, method ="bray") capmodel = CAPdiscrim(dists ~ elevation, data =envdata, axes =4, m=0, permutations = 999) This returns the following error: Error in eval(predvars, data, env) : numeric 'envir' arg not of length one In addition: Warning messages: 1: In cmdscale(distmatrix, k = nrow(x) - 1, eig = T, add = F) : some of the first 88 eigenvalues are < 0 2: In sqrt(ev) : NaNs produced same error occurs when i run: capmodel = CAPdiscrim(spdata ~ elevation, data =envdata, dist = "bray", axes =4, m=0, permutations = 999) I'm quite stumped as to what is going on here. What is envir? Any ideas anyone? Thanks in advance, Best Regards, Scott Parsons PhD James Cook University, Townsville, QLD, Australia -- View this message in context: http://r.789695.n4.nabble.com/CAPdiscrim-error-in-BiodiversityR-tp3650568p3650568.html Sent from the R help mailing list archive at Nabble.com. 
From n.bowora at gmail.com Thu Jul 7 01:34:08 2011 From: n.bowora at gmail.com (EdBo) Date: Wed, 6 Jul 2011 16:34:08 -0700 (PDT) Subject: [R] loop in optim In-Reply-To: <1309945577050-3648171.post@n4.nabble.com> References: <1309772071774-3643230.post@n4.nabble.com> <1309836074873-3645031.post@n4.nabble.com> <1309918026000-3647549.post@n4.nabble.com> <1973242757-1309924128-cardhu_decombobulator_blackberry.rim.net-958499837-@b5.c2.bise7.blackberry> <1309945577050-3648171.post@n4.nabble.com> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From n.bowora at gmail.com Thu Jul 7 05:39:19 2011 From: n.bowora at gmail.com (EdBo) Date: Wed, 6 Jul 2011 20:39:19 -0700 (PDT) Subject: [R] loop in optim In-Reply-To: <1973242757-1309924128-cardhu_decombobulator_blackberry.rim.net-958499837-@b5.c2.bise7.blackberry> References: <1309772071774-3643230.post@n4.nabble.com> <1309836074873-3645031.post@n4.nabble.com> <1309918026000-3647549.post@n4.nabble.com> <1973242757-1309924128-cardhu_decombobulator_blackberry.rim.net-958499837-@b5.c2.bise7.blackberry> Message-ID: <1310009959045-3650592.post@n4.nabble.com> I have one last theoretical question, I did not adjust my code prior so that it maximise the likehood function. I googled that to make optim maximise you multiply fn by -1. In my code, would that be the same as saying "-sum" on the "sum" part of my code (see below)? llik = function(x) { al_j=x[1]; au_j=x[2]; sigma_j=x[3]; b_j=x[4] sum(na.rm=T, ifelse(a$R_j< 0, log(1 / ( sqrt(2*pi) * sigma_j) )- -- View this message in context: http://r.789695.n4.nabble.com/loop-in-optim-tp3643230p3650592.html Sent from the R help mailing list archive at Nabble.com. 
From djmuser at gmail.com Thu Jul 7 06:36:37 2011 From: djmuser at gmail.com (Dennis Murphy) Date: Wed, 6 Jul 2011 21:36:37 -0700 Subject: [R] Substring a column vector In-Reply-To: <1310010062.44852.YahooMailRC@web121709.mail.ne1.yahoo.com> References: <1310010062.44852.YahooMailRC@web121709.mail.ne1.yahoo.com> Message-ID: Hi: Here's one take: shalve <- function(x) { nc <- sapply(x, nchar) ns <- nc %/% 2 cbind(a1 = substring(x, 1, ns), a2 = substring(x, ns + 1, nc)) } > shalve(a) a1 a2 [1,] "1" "1" [2,] "1" "2" [3,] "12" "34" [4,] "12" "45" [5,] "124" "567" [6,] "126" "786" [7,] "145" "769" HTH, Dennis On Wed, Jul 6, 2011 at 8:41 PM, Peter Maclean wrote: > How to substring the following vector (in data frame b) >>b > a > 11 > 12 > 1234 > 1245 > 124567 > 126786 > 145769 > > such that: > > a1??? a2 > 1???? 1 > 1???? 2 > 12??? 23 > 12??? 34 > 124?? 567 > 126???787 > 145???769 > I tried logical commands with substr() did not work out. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From jwiley.psych at gmail.com Thu Jul 7 07:00:58 2011 From: jwiley.psych at gmail.com (Joshua Wiley) Date: Wed, 6 Jul 2011 22:00:58 -0700 Subject: [R] loop in optim In-Reply-To: References: <1309772071774-3643230.post@n4.nabble.com> <1309836074873-3645031.post@n4.nabble.com> <1309918026000-3647549.post@n4.nabble.com> <1973242757-1309924128-cardhu_decombobulator_blackberry.rim.net-958499837-@b5.c2.bise7.blackberry> <1309945577050-3648171.post@n4.nabble.com> Message-ID: I don't know what else to say. Your code looks right to me, and it all runs. 
I would check the value of a at each loop: for (i in 1:4) { a <- afull[seq(20 * (i - 1) +1, 20 * i), ] print(a) # so you can see what it is out[i, ] <- optim(llik, par = start.par, method = "Nelder-Mead")[[1]] } Also, I would try running the code in a clean version of R. That is, a version of R without any non-standard packages loaded, or clutter in the workspace. You can do this by shutting down R, deleting the old workspace, and then starting a new session (there are many other ways to do the same thing, that's just one). Josh On Wed, Jul 6, 2011 at 4:34 PM, EdBo wrote: > I am sorry if I sound stupid but I am not able to correct the error > even after running this code. > >> afull=read.table("D:/hope.txt",header=T) >> llik = function(x) > + ? { > + ? ?al_j=x[1]; au_j=x[2]; sigma_j=x[3]; ?b_j=x[4] > + ? ?sum(na.rm=T, > + ? ? ? ?ifelse(a$R_j< 0, log(1/(2*pi*(sigma_j^2)))- > + ? ? ? ? ? ? ? ? ? ? ? ? ? (1/(2*(sigma_j^2))*(a$R_j+al_j-b_j*a$R_m))^2, > + ? ? ? ? ifelse(a$R_j>0 , log(1/(2*pi*(sigma_j^2)))- > + ? ? ? ? ? ? ? ? ? ? ? ? ? (1/(2*(sigma_j^2))*(a$R_j+au_j-b_j*a$R_m))^2, > + > + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?log(ifelse (( pnorm (au_j, mean=b_j * > + a$R_m, sd= sqrt(sigma_j^2))- > + ? ? ? ? ? ? ? ? ? ? ? ? ? pnorm(al_j, mean=b_j * a$R_m, sd=sqrt (sigma_j^2) > + )) > 0, > + > + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?(pnorm (au_j,mean=b_j * a$R_m, > + sd= sqrt(sigma_j^2))- > + > + ? ? ? ? ? ? ? ? ? ? ? ? ? pnorm(al_j, mean=b_j * a$R_m, sd= sqrt(sigma_j^2) > + )), > + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?1)) )) > + ? ? ?) > + ? } >> >> start.par = c(-0.01,0.01,0.1,1) >> >> >> out <- matrix(NA, nrow = 4, ncol = 4, dimnames = list( > + ?paste("Run:", 1:4, sep = ''), > + ?c("al_j", "au_j", "sigma_j", "b_j"))) >> >> ## Estimate parameters based on rows 0-20, 21-40, 41-60 of 'afull' >> for (i in 1:4) { > + ?a <- afull[seq(20 * (i - 1) +1, 20 * i), ] > + ?out[i, ] <- optim(llik, par = start.par, method = "Nelder-Mead")[[1]] > + } >> out > ? ? ? ? ? al_j ? ? ?au_j ? ? 
? sigma_j ? ? ?b_j > Run:1 0.1088116 0.1621605 -1.554167e-24 0.969153 > Run:2 0.1088116 0.1621605 -1.554167e-24 0.969153 > Run:3 0.1088116 0.1621605 -1.554167e-24 0.969153 > Run:4 0.1088116 0.1621605 -1.554167e-24 0.959875 > > On 6 July 2011 11:46, Berend Hasselman [via R] > wrote: >> EdBo wrote: >> You are right Joshua. >> >> I changed the code because I failed to understand how you attached the full >> data set. How you made the data part of your code. >> >> I am new to R so I am used to one way of attaching data(the way I redone >> it). >> >> You don't need to "attach" the data by using attach(). >> You read the data into an object afull and then select the part you need and >> store that in object a. >> >> BTW: shouldn't the for (i in 1:4) be for (i in 1:3) if I understand the >> original question correctly? >> >> Berend >> >> ________________________________ >> If you reply to this email, your message will be added to the discussion >> below: >> http://r.789695.n4.nabble.com/loop-in-optim-tp3643230p3648171.html >> To unsubscribe from loop in optim, click here. > > > -- > View this message in context: http://r.789695.n4.nabble.com/loop-in-optim-tp3643230p3650297.html > Sent from the R help mailing list archive at Nabble.com. > ? ? ? ?[[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles https://joshuawiley.com/ From lina.hellstrom at lnu.se Thu Jul 7 07:20:40 2011 From: lina.hellstrom at lnu.se (=?iso-8859-1?Q?Lina_Hellstr=F6m?=) Date: Thu, 7 Jul 2011 07:20:40 +0200 Subject: [R] Rms package - problems with fit.mult.impute Message-ID: An embedded and charset-unspecified text was scrubbed... 
Name: not available URL: From kuehnik_0505 at gmx-topmail.de Thu Jul 7 09:54:04 2011 From: kuehnik_0505 at gmx-topmail.de (Niklaus Kuehnis) Date: Thu, 07 Jul 2011 09:54:04 +0200 Subject: [R] Mediation with censored outcome Message-ID: <4E15661C.10800@gmx-topmail.de> Hi all, Is it possible to test mediation effects with a censored outcome variable using the 'mediation' package? While on p. 1f. of Imai, Keele, Tingley and Yamamoto (2011) the tobit model is mentioned for use with 'mediate', Table 1 (p. 8) shows that tobit via vglm cannot be used to estimate causal mediation effects. Also help(mediate) does not mention 'vglm' class objects as possible arguments of 'mediate'. So, is calculating mediation effects using 'mediate' with vglm-tobit models possible at all? Thanks in advance, Niklaus --- Imai, K., Keele, L., Tingley, D., & Yamamoto, T. (2011). Causal mediation analysis using R. http://www.polisci.ohio-state.edu/faculty/lkeele/mediationIII.pdf From B.Ouattara at swansea.ac.uk Thu Jul 7 09:53:56 2011 From: B.Ouattara at swansea.ac.uk (Ouattara) Date: Thu, 7 Jul 2011 08:53:56 +0100 Subject: [R] help with nprmpi In-Reply-To: <20110707120138.cgk8gws4s0kwww8s@webmail.slingshot.co.nz> References: <003901cc3c1d$afae5de0$0f0b19a0$@ouattara@swansea.ac.uk> <20110707120138.cgk8gws4s0kwww8s@webmail.slingshot.co.nz> Message-ID: <000f01cc3c7b$0a441620$1ecc4260$@ouattara@swansea.ac.uk> Many thanks. Best wishes, os -----Original Message----- From: p_connolly at slingshot.co.nz [mailto:p_connolly at slingshot.co.nz] Sent: 07 July 2011 01:02 To: Ouattara Cc: r-help at r-project.org Subject: Re: [R] help with nprmpi Quoting Ouattara : > Dear > > I have been trying to a program which requires "nprmpi". However, I have > tried to install the downloaded zip file but get the error: "cannot open > compressed file 'npRmpi_0.40-7.tar.gz/DESCRIPTION'" Since you didn't tell us any basic information, we have to guess what your OS and other details are. 
My guess that you're trying to use a file which Windows can't deal with. > > > > I tried the command: install.packages("nprmpi") but get the message "package > 'nprmpi' is not available" That would be consistent with using Windows. On the CRAN site, the information relating to the mprmpi package has this to say: Windows binary: not available, see ReadMe. HTH > > > > I then tried version 2.13 of R (as I am still using version 2.12) but still > get error message. > > > > I was wondering if somebody has managed to install it and can kindly assist, > please. > > Many thanks and sorry for taking your time. > > Best wishes, > > os > > > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From gavin.simpson at ucl.ac.uk Thu Jul 7 10:12:59 2011 From: gavin.simpson at ucl.ac.uk (Gavin Simpson) Date: Thu, 07 Jul 2011 09:12:59 +0100 Subject: [R] relative euclidean distance In-Reply-To: References: Message-ID: <1310026379.2731.3.camel@chrysothemis.geog.ucl.ac.uk> On Wed, 2011-07-06 at 14:50 +0100, Sarah Leclaire wrote: > Hi, > > I would like to calculate the RELATIVE euclidean distance. Is there a > function in R which does it ? > > (I calculated the abundance of 94 chemical compounds in secretion of > several individuals, and I would like to have the chemical distance > between 2 individuals as expressed by the relative euclidean distance. > Some compounds are in very low abundance whereas others are in high > abundance, that's why I would like to correct for the abundance) > > Thanks, > Sarah A simple solution to this is to transform the data and then compute the Euclidean distance using dist(). 
decostand(foo, method = "normalize") in package vegan and disttransform(foo, method = "chord") in package BiodiversityR can do this for you without you having to write a function yourself. Pass the returned object to dist() to get the distances you want. HTH G -- %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% Dr. Gavin Simpson [t] +44 (0)20 7679 0522 ECRC, UCL Geography, [f] +44 (0)20 7679 0565 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/ UK. WC1E 6BT. [w] http://www.freshwaters.org.uk %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% From pdalgd at gmail.com Thu Jul 7 10:16:55 2011 From: pdalgd at gmail.com (peter dalgaard) Date: Thu, 7 Jul 2011 10:16:55 +0200 Subject: [R] finding the intersection of two vectors In-Reply-To: References: <4E14C90C.2080006@dataanalyticscorp.com> Message-ID: <4930FD05-EDFD-4161-9ADB-DA174B3C1CCA@gmail.com> On Jul 6, 2011, at 23:50 , David Winsemius wrote: > > On Jul 6, 2011, at 4:43 PM, Data Analytics Corp. wrote: > >> Hi, >> >> Suppose I have two vectors, not necessarily the same length (in fact, they usually are different lengths): y.1 that has increasing values between 0 and 1; y.2 that has decreasing values between 1.0 and 0. You can picture these as being supply (= y.1) and demand (= y.2) curves from economics. I typically plot these vectors on the same graph against a common x variable, which happens to be price for what I do. The price variable runs from, say, $0 to $25. When I plot y.1 and y.2, I've been eye-balling a vertical line at a price point where y.1 intersects y.2. I'm now tired of eye-balling a line through the intersection -- takes too much time to get it right or just close enough. I can't figure out how to find the price value at which the two curves intersect. Going back to the economics interpretation, I want the price where supply equals demand. Any suggestions as to how I can find that price point in R?
Any functions that help? > > ?approxfun # or.. > ?splinefun # should allow you to make two functions and then solve for the X taht minimizes the difference. > With linear interpolation, uniroot() on the difference between the two approxfun()s should get you there rather quickly. (Unless the curves are very smooth, I'd avoid splinefun because of the risk of introducing oscillations.) > -- > > David Winsemius, MD > West Hartford, CT > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Peter Dalgaard Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com From lui.r.project at googlemail.com Thu Jul 7 10:27:15 2011 From: lui.r.project at googlemail.com (Lui ##) Date: Thu, 7 Jul 2011 16:27:15 +0800 Subject: [R] parallel computing with 'foreach' In-Reply-To: <4E0CB152.3070802@statistik.tu-dortmund.de> References: <4E0CB152.3070802@statistik.tu-dortmund.de> Message-ID: Hello Stacey, I do not know whether my answer comes late or not, just came across your post now. I had a similar problem... First: You might want to think about whether to try to parallelize the thing or not. Unless coxph takes several minutes, it is probably of no great help to parallelize it, because there are many jobs associated with it. All workers need to be "taught" about the environment (the functions and variables they need to know) and some coordination work is necessary as well. So if every for-loop takes a longer time: you may want to use foreach, otherwise there's no great benefit (probably). What you could do is save only the functions you need in a separate R file and just have the workers initialize the functions you need for that. 
So you split up your source code in two parts - one containing the functions you need in the loop later and one that controls how the functions work together... You can try : ##declare a function that loads only the libraries and functions necessary inside the loop mysource <- function(envir, filename) source("source.R") ##tell the programm to have every worker execute that function smpopts <- list(initEnvir = mysource) ##have it executed with the foreach loop foreach (.....,.options.smp=smpopts){ Hope that helps... Best Lui 2011/7/1 Uwe Ligges : > Type > ??foreach > and read the whole help page - as the positng guide asked you to do before > posting, you will find the line describing the argument ".packages". > > Uwe Ligges > > > > On 28.06.2011 21:17, Stacey Wood wrote: >> >> Hi all, >> I would like to parallelize some R code and would like to use the >> 'foreach' >> package with a foreach loop. ?However, whenever I call a function from an >> enabled package outside of MASS, I get an error message that a number of >> the >> functions aren't recognized (even though the functions should be defined). >> For example: >> >> library(foreach) >> library(doSMP) >> library(survival) >> # Create the simplest test data set >> test1<- list(time=c(4,3,1,1,2,2,3), >> ? ? ? ? ? ? ? status=c(1,1,1,0,1,1,0), >> ? ? ? ? ? ? ? x=c(0,2,1,1,1,0,0), >> ? ? ? ? ? ? ? sex=c(0,0,0,0,1,1,1)) >> # Fit a stratified model >> coxph(Surv(time, status) ~ x + strata(sex), test1) >> >> w<- startWorkers() >> registerDoSMP(w) >> foreach(i=1:3) %dopar% { >> # Fit a stratified model >> fit<-coxph(Surv(time, status) ~ x + strata(sex), test1) >> summary(fit)$coef[i] >> } >> stopWorkers(w) >> ####Error message: >> Error in { : task 1 failed - "could not find function "coxph"" >> >> >> If I call library(survival) inside the foreach loop, everything runs >> properly. ?I don't think that I should have to call the package >> iteratively >> inside the loop. 
?I would like to use a foreach loop inside code for my >> own >> package, but this is a problem since I can't call my own package in the >> source code for the package itself! ?Any advice would be appreciated. >> >> Thanks, >> Stacey >> >> ? ? ? ?[[alternative HTML version deleted]] >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > From pdalgd at gmail.com Thu Jul 7 10:29:57 2011 From: pdalgd at gmail.com (peter dalgaard) Date: Thu, 7 Jul 2011 10:29:57 +0200 Subject: [R] Simulating from the null distribution of a 2 x 3 table In-Reply-To: References: Message-ID: <1AA1F7EF-14BD-4B75-8A2B-A7B2B2082CF3@gmail.com> On Jul 7, 2011, at 05:01 , Thomas Lumley wrote: > On Thu, Jul 7, 2011 at 12:58 PM, Jim Silverton wrote: >> Dear all, >> >> I want to simulate from the null distribution of the following 2 x 3 table, >> >> 2 5 10 >> 4 8 5 >> >> I am using a chi-squared test. >> Anyone has any idea how to do this? > > The r2dtable() function will simulate tables with a given set of row > and column totals. Or, as a shortcut, maybe look into chisq.test(..., simulate.p.value=TRUE). Notice that both use hypergeometric-type sampling. One could also consider sampling with rmultinom (either of order 6, twice of order three, or thrice of order 2), or even rpois, assuming a Poisson distribution of the total count. 
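A minimal sketch of the options mentioned above, using only base R (package stats). The seed, the number of replicates B and the helper name chisq_stat are illustrative choices and not part of the original replies; the three sampling schemes condition on different things, so their results need not agree exactly.

```r
# Observed 2 x 3 table from the original question
tab <- matrix(c(2, 5, 10,
                4, 8, 5), nrow = 2, byrow = TRUE)

set.seed(1)   # arbitrary seed, only for reproducibility
B <- 2000     # number of simulated tables (an arbitrary choice)

# Pearson chi-squared statistic of a table (hypothetical helper);
# warnings about small expected counts are suppressed
chisq_stat <- function(m) {
  unname(suppressWarnings(chisq.test(m, correct = FALSE))$statistic)
}
obs <- chisq_stat(tab)

# (a) Both margins fixed: r2dtable() returns a list of B tables
sims_fixed <- r2dtable(B, rowSums(tab), colSums(tab))
stat_fixed <- vapply(sims_fixed, chisq_stat, numeric(1))
p_fixed <- (1 + sum(stat_fixed >= obs)) / (B + 1)

# (b) Shortcut: let chisq.test() do the same kind of simulation itself
p_builtin <- chisq.test(tab, simulate.p.value = TRUE, B = B)$p.value

# (c) Only the grand total fixed: one multinomial of order 6, with
#     cell probabilities estimated under independence
p_indep <- outer(rowSums(tab), colSums(tab)) / sum(tab)^2
sims_multi <- rmultinom(B, size = sum(tab), prob = as.vector(p_indep))
# each column of sims_multi is one simulated table in column-major order
one_table <- matrix(sims_multi[, 1], nrow = 2)
```

Variant (c) lets the row and column totals vary from table to table, so a simulated table can occasionally have an empty column; that case would need handling before computing a test statistic on it.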
> > -thomas > > -- > Thomas Lumley > Professor of Biostatistics > University of Auckland > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Peter Dalgaard Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com From jeroen.ooms at stat.ucla.edu Thu Jul 7 10:18:25 2011 From: jeroen.ooms at stat.ucla.edu (jeroen00ms) Date: Thu, 7 Jul 2011 01:18:25 -0700 (PDT) Subject: [R] datastructure for multi-choice factors Message-ID: <1310026705710-3650940.post@n4.nabble.com> I am working on a system to visualize survey responses. Survey responses typically include factors, numeric, timestamps, textfields and therefore fit perfectly nice in dataframes, making it easy to visualize using standard R functions. However I am currently working on a survey that also include questions in which the respondent can check more than one answer on a single multichoice item. I.e. this represents a factor for which every row has multiple responses. I am looking for a way to put this into a dataframe together with the other questions of the survey. I considered three workarounds, but both are problematic: - Column-wise expanding: convert a single multi-choice item into N binary column factors for every possible response (level) with 1/0 values representing if the answer was checked or not. Problem with this is that you lose the information that these N columns are in fact one question and it becomes very hard to vizualise this single question. - Row wise expanding: convert a single response into N rows, one for every response. 
Problem with this is that if the factor is part of the dataframe, also all of the other items have to be duplicated, leading to artificial results. I was wondering if there is a more natural datastructure to put a multi-choice item into a dataframe? Some code for illustration: people <- list( name=c("John", "Mary", "Jennifer", "Neil"), gender=factor(c("M","F","F","M")), age=c(34,23,40,30), residence=sapply(list("US", c("US", "CA"), "MX", c("MX", "US", "CA")), factor, levels=c("US", "CA", "MX")) ); -- View this message in context: http://r.789695.n4.nabble.com/datastructure-for-multi-choice-factors-tp3650940p3650940.html Sent from the R help mailing list archive at Nabble.com. From momadou at yahoo.fr Thu Jul 7 08:56:46 2011 From: momadou at yahoo.fr (Komine) Date: Wed, 6 Jul 2011 23:56:46 -0700 (PDT) Subject: [R] Problem with varpart (vegan library) Message-ID: <1310021806552-3650816.post@n4.nabble.com> Hi, I did a linear regression with 5 explanatory variables. Now, to see the contribution of each variable, I use varpart from the vegan library. But varpart doesn't accept my 5 explanatory variables; it accepts only 4. 1- What must I do to use my 5 explanatory variables? 2- Must the variance fractions of the individual variables sum to 1? Thanks for your help. Komine -- View this message in context: http://r.789695.n4.nabble.com/Problem-with-varpart-vegan-library-tp3650816p3650816.html Sent from the R help mailing list archive at Nabble.com. From nevo84 at gmail.com Thu Jul 7 08:25:07 2011 From: nevo84 at gmail.com (omernevo) Date: Wed, 6 Jul 2011 23:25:07 -0700 (PDT) Subject: [R] Showing which bars in a bar chart are significantly different In-Reply-To: <1309986070.80660.YahooMailClassic@web38407.mail.mud.yahoo.com> References: <1309984625369-3649897.post@n4.nabble.com> <1309986070.80660.YahooMailClassic@web38407.mail.mud.yahoo.com> Message-ID: <1310019907480-3650765.post@n4.nabble.com> Thanks all! I will try this later on today...
Omer -- View this message in context: http://r.789695.n4.nabble.com/Showing-which-bars-in-a-bar-chart-are-significantly-different-tp3649897p3650765.html Sent from the R help mailing list archive at Nabble.com. From rolf.turner at xtra.co.nz Thu Jul 7 11:13:17 2011 From: rolf.turner at xtra.co.nz (Rolf Turner) Date: Thu, 07 Jul 2011 21:13:17 +1200 Subject: [R] finding the intersection of two vectors In-Reply-To: <4E14C90C.2080006@dataanalyticscorp.com> References: <4E14C90C.2080006@dataanalyticscorp.com> Message-ID: <4E1578AD.7010107@xtra.co.nz> On 07/07/11 08:43, Data Analytics Corp. wrote: > Hi, > > Suppose I have two vectors, not necessarily the same length (in fact, > they usually are different lengths): y.1 that has increasing values > between 0 and 1; y.2 that has decreasing values between 1.0 and 0. > You can picture these as being supply (= y.1) and demand (= y.2) > curves from economics. I typically plot these vectors on the same > graph against a common x variable, which happens to be price for what > I do. The price variable runs from, say, $0 to $25. When I plot y.1 > and y.2, I've been eye-balling a vertical line at a price point where > y.1 intersects y.2. I'm now tired of eye-balling a line through the > intersection -- takes too much time to get it right or just close > enough. I can't figure out how to find the price value at which the > two curves intersect. Going back to the economics interpretation, I > want the price where supply equals demand. Any suggestions as to how > I can find that price point in R? Any functions that help? There will actually be two pairs of points, one pair on the price curve and one pair on the demand curve, so that the intersection of the two curves lies between the respective pair on each curve. Without an algebraic expression for the curves, that's all you can say. 
The following function joins the respective pairs of points > findInt <- function (x1,y1,x2,y2,plot=FALSE) { > # > # x1 and y1 are the coordinates of the points on the INCREASING curve. > # x2 and y2 are the coordinates of the points on the DECREASING curve. > # > y1star <- approx(x2,y2,xout=x1,yleft=Inf,yright=-Inf)$y > k <- sum(y1 <= y1star) > y2star <- approx(x1,y1,xout=x2,yleft=-Inf,yright=Inf)$y > ell <- sum(y2 >= y2star) > b1 <- y1[k] > b2 <- y2[ell] > m1 <- (y1[k+1] - y1[k])/(x1[k+1] - x1[k]) > m2 <- (y2[ell+1] - y2[ell])/(x2[ell+1] - x2[ell]) > x <- (b1-b2-m1*x1[k]+m2*x2[ell])/(m2-m1) > y <- b1 + m1*(x-x1[k]) > if(plot) { > plot(x1,y1,xlim=range(x1,x2),ylim=range(y1,y2)) > points(x2,y2,col="red") > segments(x1[k],y1[k],x1[k+1],y1[k+1]) > segments(x2[ell],y2[ell],x2[ell+1],y2[ell+1]) > points(x,y,pch=20) > } > c(x=x,y=y) > } by straight lines and finds the point of intersection of the two lines. This is probably pretty similar to the point you'd get by eye-balling the data. cheers, Rolf Turner From el at lisse.NA Thu Jul 7 11:41:17 2011 From: el at lisse.NA (Dr Eberhard Lisse) Date: Thu, 07 Jul 2011 10:41:17 +0100 Subject: [R] aggregation question Message-ID: <4E157F3D.3090305@lisse.NA> Hi, I am reading payment data like so 2010-01-01,100.00 2010-01-04,100.00 ... 2011-01-01,200.00 2011-01-07,100.00 and plot it aggregated per month like so library(zoo) df <- read.csv("daily.csv", colClasses=c(d="Date",s="numeric")) z <- zoo(df$s, df$d) z.mo <- aggregate(z, as.yearmon, sum) barplot(z.mo, col="darkblue") How do I get the monthly aggregated payments in different colors next to each other (ie for each year in a different color with the x axis showing the months)? Solution preferred, but pointers to documentation welcome :-)-O greetings, el -- Dr. Eberhard W. 
Lisse \ / Obstetrician & Gynaecologist (Saar) el at lisse.NA el108-ARIN / * | Telephone: +264 81 124 6733 (cell) PO Box 8421 \ / Please do NOT email to this address Bachbrecht, Namibia ;____/ if it is DNS related in ANY way From b.rowlingson at lancaster.ac.uk Thu Jul 7 11:42:32 2011 From: b.rowlingson at lancaster.ac.uk (Barry Rowlingson) Date: Thu, 7 Jul 2011 10:42:32 +0100 Subject: [R] finding the intersection of two vectors In-Reply-To: <4E14C90C.2080006@dataanalyticscorp.com> References: <4E14C90C.2080006@dataanalyticscorp.com> Message-ID: On Wed, Jul 6, 2011 at 9:43 PM, Data Analytics Corp. wrote: > close enough. ?I can't figure out how to find the price value at which the > two curves intersect. ?Going back to the economics interpretation, I want > the price where supply equals demand. ?Any suggestions as to how I can find > that price point in R? ?Any functions that help? You could roll out the 100,000-pound gorilla that is rgeos, treat the two lines as spatial lines and then use gIntersection: > x1 [1] 3 4 6 8 10 > x2 [1] 1 2 6 7 10 > y1 [1] 0.23898824 0.48215370 0.45215557 0.08049115 0.18068038 > y2 [1] 0.2749391 0.3638511 0.1650239 0.3064780 0.8515887 > s1=SpatialLines(list(Lines(list(Line(cbind(x1,y1))),ID=1))) > s2=SpatialLines(list(Lines(list(Line(cbind(x2,y2))),ID=1))) > gIntersection(s1,s2) SpatialPoints: x y 1 3.256617 0.3013887 1 6.877310 0.2891230 Coordinate Reference System (CRS) arguments: NA Here my example crosses twice at those x-y coordinates. Note that if the two lines are exactly equal along a line, you'll get back a SpatialLines object as part of your result. In your case this would be if supply dropped to 24 as demand rose to 24 and they both stayed like that for some time before crossing. If you want the first time that the values are equal you'd just take the minimum X-coordinate in the returned object... But yeah, it might be overkill, but possibly handy if you want to compute multiple times when two curves cross. 
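For the simple case of one increasing and one decreasing curve, the lighter-weight route suggested earlier in this thread (uniroot() on the difference between two approxfun()s) can be sketched in base R as follows. The price grids and curve values are made up for illustration, and linear interpolation between the observed points is an assumption.

```r
# Hypothetical supply and demand points on two different price grids
price_s <- c(0, 5, 10, 15, 20, 25)
supply  <- c(0.00, 0.10, 0.30, 0.55, 0.80, 1.00)   # increasing
price_d <- c(0, 4, 8, 12, 16, 20, 25)
demand  <- c(1.00, 0.90, 0.70, 0.45, 0.25, 0.10, 0.00)   # decreasing

# Linear interpolants of the two curves
f_s <- approxfun(price_s, supply)
f_d <- approxfun(price_d, demand)

# Search only where both interpolants are defined
lo <- max(min(price_s), min(price_d))
hi <- min(max(price_s), max(price_d))

# Root of supply minus demand; this needs opposite signs at lo and hi,
# which holds when one curve rises and the other falls across the range
sol <- uniroot(function(p) f_s(p) - f_d(p), interval = c(lo, hi), tol = 1e-8)

eq_price <- sol$root        # for these made-up data the curves cross at 12.5
eq_share <- f_s(eq_price)   # common value of the two curves at that price
```

If the curves can cross more than once, as in the rgeos example above, a single uniroot() call over the whole range is not enough; each sign change of the difference would need its own bracketing interval.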
Barry From janko.thyson.rstuff at googlemail.com Thu Jul 7 12:01:22 2011 From: janko.thyson.rstuff at googlemail.com (Janko Thyson) Date: Thu, 07 Jul 2011 12:01:22 +0200 Subject: [R] Simple inheritance check fails (integer from numeric) Message-ID: <4E1583F2.5020404@googlemail.com> Dear list, In a function, I don't care if my input has class 'integer' or 'numeric', so I wanted to use 'inherits()' to control for that. However, this function tells me that an actual object of class 'integer' does not inherit from class 'numeric'. The class def of 'integer' does state 'numeric' as one of the superclasses. Isn't that somewhat inconsistent? > getClass("integer") Class "integer" [package "methods"] No Slots, prototype of class "integer" Extends: "numeric", "vector", "data.frameRowLabels" > a <- 1:3 > class(a) [1] "integer" > inherits(a, "numeric") [1] FALSE Regards, Janko From petr.pikal at precheza.cz Thu Jul 7 12:08:25 2011 From: petr.pikal at precheza.cz (Petr PIKAL) Date: Thu, 7 Jul 2011 12:08:25 +0200 Subject: [R] Odp: aggregation question In-Reply-To: <4E157F3D.3090305@lisse.NA> References: <4E157F3D.3090305@lisse.NA> Message-ID: Hi > > Hi, > > I am reading payment data like so > > 2010-01-01,100.00 > 2010-01-04,100.00 > ... > 2011-01-01,200.00 > 2011-01-07,100.00 > > and plot it aggregated per month like so > > library(zoo) > df <- read.csv("daily.csv", colClasses=c(d="Date",s="numeric")) > z <- zoo(df$s, df$d) > z.mo <- aggregate(z, as.yearmon, sum) > barplot(z.mo, col="darkblue") > > How do I get the monthly aggregated payments in different colors > next to each other (ie for each year in a different color with the x > axis showing the months)? What about putting suitable set of colours to col argument? Regards Petr > > Solution preferred, but pointers to documentation welcome :-)-O > > greetings, el > -- > Dr. Eberhard W. 
Lisse \ / Obstetrician & Gynaecologist (Saar) > el at lisse.NA el108-ARIN / * | Telephone: +264 81 124 6733 (cell) > PO Box 8421 \ / Please do NOT email to this address > Bachbrecht, Namibia ;____/ if it is DNS related in ANY way > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. From Bernhard_Pfaff at fra.invesco.com Thu Jul 7 12:13:21 2011 From: Bernhard_Pfaff at fra.invesco.com (Pfaff, Bernhard Dr.) Date: Thu, 7 Jul 2011 10:13:21 +0000 Subject: [R] BY GROUP in evir R package In-Reply-To: <1309972690.71427.YahooMailRC@web121713.mail.ne1.yahoo.com> References: <31464.80001.qm@web121719.mail.ne1.yahoo.com> <4DE939D8.8040305@pfaffikus.de> <1309937119.92702.YahooMailRC@web121711.mail.ne1.yahoo.com> <1309972690.71427.YahooMailRC@web121713.mail.ne1.yahoo.com> Message-ID: lapply(rg2, function(x) x$par.ests) there is no slot residuals! The function gev() does return a S3-object with class attribute 'gev', see ?gev. > > Dr. Pfaff: > > After using str; can you give an example on data extration > (e.g. for $par.ests?and @residuals) > > > > ----- Original Message ---- > From: "Pfaff, Bernhard Dr." > To: Peter Maclean ; Dr. Bernhard Pfaff > > Cc: "r-help at r-project.org" > Sent: Wed, July 6, 2011 8:17:12 AM > Subject: AW: [R] BY GROUP in evir R package > > Hello Peter, > > str(rg2) > > us quite revealing for this; by() returns a list and hence > lapply() can be > employed, e.g.: > lapply(rg2, rlevel.gev, k.blocks = 5) > > By the same token, you can extract the relevant bits and > pieces and put them > together in a data.frame. > > Best, > Bernhard > > > -----Urspr?ngliche Nachricht----- > > Von: r-help-bounces at r-project.org > > [mailto:r-help-bounces at r-project.org] Im Auftrag von Peter Maclean > > Gesendet: Mittwoch, 6. 
Juli 2011 09:25 > > An: Dr. Bernhard Pfaff > > Cc: r-help at r-project.org > > Betreff: Re: [R] BY GROUP in evir R package > > > > Dr.?Pfaff: > > How do we pass the "by" results to "rlevel.gev" function to > > get the?return level and also save the results (both > > rg2(par.ests and $par.ses) and rl) as.data.frame? > > > > #Grouped vector > > Gdata <- data.frame(n = rep(c(1,2,3), each = 100), y = rnorm(300)) > > library(evir) > > require(plyr) > > > > #Model for Grouped > > rg2<- by(Gdata,Gdata[,"n"], function(x) gev(x$y, 5, method = > > "BFGS", control =list(maxit = 500))) # rl <- rlevel.gev(rg2, > > k.blocks = 5, add = TRUE) > > ? > > > > > > > > ----- Original Message ---- > > From: Dr. Bernhard Pfaff > > To: Peter Maclean > > Sent: Fri, June 3, 2011 2:45:28 PM > > Subject: Re: BY GROUP in evir R package > > > > Hello Peter, > > > > many thanks for your email. Well, as you might have guessed, > > there is also a > > function by() in R that does the same job. See help("by") for > > more information. > > > > Best, > > Bernhard > > > > Peter Maclean schrieb: > > > Hi, > > > I am new in R and I want to use your package for data > > analysis. I usually use > > >SAS. I have rainfall data for different points. Each point > > has 120 observations. > > >The rainfall data is in the first column (RAIN) and the > > categorical variable > > >that group the data is in the second column (GROUP). The > > data frame is > > >rain.data. How can I use the gev function to estimate all > > three parameters by > > >GROUP variable group? In SAS there is a by() function that > > estimate the model by > > >group. However, I would like to move to R. > > >? With thanks, > > >? Peter Maclean > > > Department of Economics > > > University of Dar -es- Salaam, Tanzania > > > > > >? 
> > > > ______________________________________________ > > R-help at r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > ***************************************************************** > Confidentiality Note: The information contained in this message, > and any attachments, may contain confidential and/or privileged > material. It is intended solely for the person(s) or entity to > which it is addressed. Any review, retransmission, dissemination, > or taking of any action in reliance upon this information by > persons or entities other than the intended recipient(s) is > prohibited. If you received this in error, please contact the > sender and delete the material from any computer. > ***************************************************************** > From el at lisse.NA Thu Jul 7 12:23:12 2011 From: el at lisse.NA (Dr Eberhard Lisse) Date: Thu, 07 Jul 2011 11:23:12 +0100 Subject: [R] Odp: aggregation question In-Reply-To: References: <4E157F3D.3090305@lisse.NA> Message-ID: <4E158910.30602@lisse.NA> Petr, Maybe I did not make it clear, I apologize for that: I want January to December on the X Axis (as 12 discrete (months)) and then for each month the values for each year as bars in different colors next to each other, ie Jan-2009, Jan-2011, Jan-2011...Dec-2009, Dec-2011, Dec-2011 whereas at the moment I get Jan-2009, Feb-2009, Mar-2009...Oct-2011, Nov-2011, Dec-2011 In SQL something like GROUP BY MONTH, YEAR as opposed to GROUP BY YEAR, MONTH. greetings, el on 2011-07-07 11:08 Petr PIKAL said the following: [...] >> How do I get the monthly aggregated payments in different colors >> next to each other (ie for each year in a different color with the x >> axis showing the months)? > > What about putting suitable set of colours to col argument? > > Regards > Petr [...] -- Dr. 
Eberhard W. Lisse \ / Obstetrician & Gynaecologist (Saar) el at lisse.NA el108-ARIN / * | Telephone: +264 81 124 6733 (cell) PO Box 8421 \ / Please do NOT email to this address Bachbrecht, Namibia ;____/ if it is DNS related in ANY way From S.Ellison at LGCGroup.com Thu Jul 7 12:22:44 2011 From: S.Ellison at LGCGroup.com (S Ellison) Date: Thu, 7 Jul 2011 11:22:44 +0100 Subject: [R] relative euclidean distance In-Reply-To: <1310026379.2731.3.camel@chrysothemis.geog.ucl.ac.uk> References: <1310026379.2731.3.camel@chrysothemis.geog.ucl.ac.uk> Message-ID: <98B156BB22D11342A931E823798D434853E97281B8@GOLD.corp.lgc-group.com> > -----Original Message----- > > I would like to calculate the RELATIVE euclidean distance. > Is there a > > function in R which does it ? > > > > A simple solution to this is to transform the data and then > compute the Euclidean distance using dist(). > > decostand(foo, method = "normalize") and > > disttransform(foo, method = "chord") in package BiodiversityR > See also ?scale in the base package, which will centre and scale by sd by default. But 'relative euclidean distance' is not that straightforward to explain. 'Relative' usually means 'divided by the true (or mean) value', or at least it does for most chemists. You almost certainly don't mean 'euclidean distance divided by mean euclidean distance'. I suspect - because I'm a chemist and it's what I'd be considering in your shoes - that what you're asking for is the euclidean distance between points defined by concentrations of your 94 analytes scaled by mean value. scale() will (by default) centre on the means and divide by the sd, and that is usually the most sensible thing to do for multivariate data sets where the units or scales for each variable are very different. Scaling by sd and scaling by mean value could give appreciably different answers.
Although relative sd for chemical measurement is often near-constant over modest ranges, there is no particular reason to expect that the sd is strictly proportional to the mean over orders of magnitude, and in fact it generally isn't (relative SD tends to be larger for low-level analytes than for higher levels). The difference between the two would be essentially that if you divide by mean value, things with a large relative SD will tend to dominate the variations in 'distance', whereas if you centre and divide by SD, that won't happen to the same extent. But which option is more useful is hard to predict. Me, I think I'd try both and see which made most sense. Incidentally, scaling by mean value without centring using scale() would probably look something like x.scaled <- scale(x, center=FALSE, scale=apply(x,2,mean)) assuming x has columns corresponding to your measurements and dist(x.scaled) then gives you your distance matrix. Steve E lab of the Government Chemist UK ******************************************************************* This email and any attachments are confidential. Any use...{{dropped:8}} From noxyport at gmail.com Thu Jul 7 10:48:57 2011 From: noxyport at gmail.com (Pete Pete) Date: Thu, 7 Jul 2011 01:48:57 -0700 (PDT) Subject: [R] Reshape from long to wide format with date variable In-Reply-To: References: <1309959611143-3648833.post@n4.nabble.com> Message-ID: <1310028537337-3650995.post@n4.nabble.com> Thanks, Josh! The index variable (time) was my problem. My R skills are too low! :) Problem solved! -- View this message in context: http://r.789695.n4.nabble.com/Reshape-from-long-to-wide-format-with-date-variable-tp3648833p3650995.html Sent from the R help mailing list archive at Nabble.com. 
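The reply being thanked is not quoted in this message, so the following is only a guess at the kind of index variable meant. A minimal sketch, assuming long data with one row per subject and date; the data frame long, its columns and all values are made up for illustration.

```r
# Hypothetical long-format data: one row per subject (id) and visit date,
# already sorted by date within each id
long <- data.frame(
  id   = c(1, 1, 2, 2, 2),
  date = as.Date(c("2011-01-03", "2011-02-10",
                   "2011-01-15", "2011-03-01", "2011-04-20"))
)

# Index variable: 1, 2, 3, ... within each id
long$time <- ave(seq_along(long$id), long$id, FUN = seq_along)

# One row per id, one date column per value of the index
wide <- reshape(long, idvar = "id", timevar = "time", direction = "wide")
```

With the default separator the date columns come out as date.1, date.2 and date.3, and subjects with fewer rows get NA in the unused columns.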
From newzealandspaul at gmail.com Thu Jul 7 11:48:36 2011 From: newzealandspaul at gmail.com (Paul) Date: Thu, 7 Jul 2011 21:48:36 +1200 Subject: [R] Parsing Apache Combined Log Format in R with regex Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From rvalliant at survey.umd.edu Thu Jul 7 12:13:06 2011 From: rvalliant at survey.umd.edu (Richard Valliant) Date: Thu, 07 Jul 2011 06:13:06 -0400 Subject: [R] R-help Digest, Vol 101, Issue 7 Message-ID: I will be out of the office July 7-18, 2011, with limited email access. For immediate help, please call the JPSM main number, 301-314-7911. From mihalicza.peter at emki.hu Thu Jul 7 12:18:31 2011 From: mihalicza.peter at emki.hu (mihalicza.peter at emki.hu) Date: 7 Jul 2011 12:18:31 +0200 Subject: [R] R-help Digest, Vol 101, Issue 7 Message-ID: <20110707101831.18941.qmail@mlnet.hu> I will be out of the office from 7 July till 14 July with no access to my emails. In urgent cases please contact Ms. Edit Kárpáti (karpati.edit at gyemszi.hu). With regards, Peter Mihalicza From vincy_pyne at yahoo.ca Thu Jul 7 13:27:48 2011 From: vincy_pyne at yahoo.ca (Vincy Pyne) Date: Thu, 7 Jul 2011 04:27:48 -0700 (PDT) Subject: [R] Generalized Logistic and Richards Curve Message-ID: <1310038068.8504.YahooMailClassic@web120314.mail.ne1.yahoo.com> An embedded and charset-unspecified text was scrubbed...
Name: not available URL: From petr.pikal at precheza.cz Thu Jul 7 13:27:18 2011 From: petr.pikal at precheza.cz (Petr PIKAL) Date: Thu, 7 Jul 2011 13:27:18 +0200 Subject: [R] Odp: aggregation question In-Reply-To: <4E158910.30602@lisse.NA> References: <4E157F3D.3090305@lisse.NA> <4E158910.30602@lisse.NA> Message-ID: OK > Petr, > > Maybe I did not make it clear, I apologize for that: > > I want January to December on the X Axis (as 12 discrete (months)) > and then for each month the values for each year as bars in > different colors next to each other, ie Jan-2009, Jan-2011, > Jan-2011...Dec-2009, Dec-2011, Dec-2011 whereas at the moment I get > Jan-2009, Feb-2009, Mar-2009...Oct-2011, Nov-2011, Dec-2011 Well you can look at examples of barplot, especially this one barplot(VADeaths, beside = TRUE, col = c("lightblue", "mistyrose", "lightcyan", "lavender", "cornsilk"), legend = rownames(VADeaths), ylim = c(0, 100)) title(main = "Death Rates in Virginia", font.main = 4) If you look at VADeaths structure you see you need some structured data x<-seq(as.Date("2000/1/1"), by="month", length.out=24) x.m<-aggregate(1:24, list(format(x, "%m"), format(x, "%Y")), sum) x.m Group.1 Group.2 x 1 01 2000 1 2 02 2000 2 3 03 2000 3 4 04 2000 4 5 05 2000 5 and you can use e.g. xtabs or maybe cast from reshape package x.xt<-xtabs(x~Group.1+Group.2,x.m) barplot(x.xt, beside=TRUE, col=rainbow(12)) x.xt<-xtabs(x~Group.2+Group.1,x.m) barplot(x.xt, beside=T, col=1:2) Or you could look at ggplot2 package. Regards Petr > > In SQL something like GROUP BY MONTH, YEAR as opposed to GROUP BY > YEAR, MONTH. > > greetings, el > > on 2011-07-07 11:08 Petr PIKAL said the following: > [...] > >> How do I get the monthly aggregated payments in different colors > >> next to each other (ie for each year in a different color with the x > >> axis showing the months)? > > > > What about putting suitable set of colours to col argument? > > > > Regards > > Petr > [...] > > -- > Dr. Eberhard W. 
Lisse \ / Obstetrician & Gynaecologist (Saar) > el at lisse.NA el108-ARIN / * | Telephone: +264 81 124 6733 (cell) > PO Box 8421 \ / Please do NOT email to this address > Bachbrecht, Namibia ;____/ if it is DNS related in ANY way From Ryszard.Czerminski at astrazeneca.com Thu Jul 7 13:29:26 2011 From: Ryszard.Czerminski at astrazeneca.com (Czerminski, Ryszard) Date: Thu, 7 Jul 2011 07:29:26 -0400 Subject: [R] sparse kernel regression In-Reply-To: <4E158910.30602@lisse.NA> References: <4E157F3D.3090305@lisse.NA> <4E158910.30602@lisse.NA> Message-ID: <7CC8995FE5E0BF49BF5FC4C0B5C0F37407D48487@usbordembx01.rd.astrazeneca.net> I am looking for the implementation of sparse kernel regression approach e.g. as in this paper: The Generalized LASSO. Volker Roth IEEE Transactions on Neural Networks, Vol. 15, NO. 1, January 2004. I would appreciate any pointers. Best regards, Ryszard -------------------------------------------------------------------------- Confidentiality Notice: This message is private and may ...{{dropped:8}} From nashjc at uottawa.ca Thu Jul 7 13:40:05 2011 From: nashjc at uottawa.ca (John C Nash) Date: Thu, 07 Jul 2011 07:40:05 -0400 Subject: [R] loop in optim In-Reply-To: References: Message-ID: <4E159B15.3070508@uottawa.ca> 2 comments below. On 07/07/2011 06:00 AM, r-help-request at r-project.org wrote: > Date: Wed, 6 Jul 2011 20:39:19 -0700 (PDT) > From: EdBo > To: r-help at r-project.org > Subject: Re: [R] loop in optim > Message-ID: <1310009959045-3650592.post at n4.nabble.com> > Content-Type: text/plain; charset=us-ascii > > I have one last theoretical question, I did not adjust my code prior so that > it maximise the likehood function. I googled that to make optim maximise you > multiply fn by -1. > > In my code, would that be the same as saying "-sum" on the "sum" part of my > code (see below)? 
> > llik = function(x) > { > al_j=x[1]; au_j=x[2]; sigma_j=x[3]; b_j=x[4] > sum(na.rm=T, > > ifelse(a$R_j< 0, log(1 / ( sqrt(2*pi) * sigma_j) )- > The optimx package has a "maximize" control because we felt that the fnscale approach, while perfectly correct, is not comfortable for users and is not standard across other optimization tools. Note that this package is undergoing a fairly extensive overhaul at the moment (the development version is on R-forge in the project 'optimizer') to include some safeguards on functions that return NaN etc. as well as a number of other changes -- hopefully improvements. A second comment on this looping: Why do you not use the parameters from the last estimation as the starting parameters for the next? Unless you are expecting very extreme changes over the moving window of data, this should appreciably speed up the optimization. John Nash From ggrothendieck at gmail.com Thu Jul 7 14:09:12 2011 From: ggrothendieck at gmail.com (Gabor Grothendieck) Date: Thu, 7 Jul 2011 08:09:12 -0400 Subject: [R] aggregation question In-Reply-To: <4E157F3D.3090305@lisse.NA> References: <4E157F3D.3090305@lisse.NA> Message-ID: On Thu, Jul 7, 2011 at 5:41 AM, Dr Eberhard Lisse wrote: > Hi, > > I am reading payment data like so > > 2010-01-01,100.00 > 2010-01-04,100.00 > ... > 2011-01-01,200.00 > 2011-01-07,100.00 > > and plot it aggregated per month like so > > library(zoo) > df <- read.csv("daily.csv", colClasses=c(d="Date",s="numeric")) > z <- zoo(df$s, df$d) > z.mo <- aggregate(z, as.yearmon, sum) > barplot(z.mo, col="darkblue") > > How do I get the monthly aggregated payments in different colors > next to each other (ie for each year in a different color with the x > axis showing the months)? 
> Read it in with read.zoo aggregating at the same time to yearmon class and then issue the appropriate lattice barchart command:

Lines <- "Date,Value
2010-01-01,100.00
2010-01-04,100.00
2010-02-04,100.00
2011-01-01,200.00
2011-01-07,100.00
2011-02-07,100.00"

library(zoo)
z <- read.zoo(textConnection(Lines), header = TRUE, sep = ",",
    FUN = as.yearmon, aggregate = sum)

library(lattice)
year <- factor(as.numeric(floor(time(z))))
value <- coredata(z)
month <- coredata(cycle(z))
barchart(value ~ month | year, horiz = FALSE, col = 1:12, origin = 0)

-- Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com From nfultz at ucla.edu Thu Jul 7 14:55:07 2011 From: nfultz at ucla.edu (Neal Fultz) Date: Thu, 7 Jul 2011 08:55:07 -0400 Subject: [R] datastructure for multi-choice factors In-Reply-To: <1310026705710-3650940.post@n4.nabble.com> References: <1310026705710-3650940.post@n4.nabble.com> Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From Barbara.Rogo at uniroma1.it Thu Jul 7 15:35:54 2011 From: Barbara.Rogo at uniroma1.it (Barbara.Rogo at uniroma1.it) Date: Thu, 7 Jul 2011 15:35:54 +0200 Subject: [R] vector of dates Message-ID: I have to construct a vector of date with a cycle "for". I use the function "seq", but when I allocate in a vector, this becomes a number!!! How do I have?
thank you Example: dataval=as.Date("2011/07/01") date_val=seq(dataval,length=260,by="-7 day") date_inizio=c() date_condizione=c() for (k in 1:length(date_val)){ date_inizio[k]=seq(date_val[k],length=2,by="-5 years")[2] date_condizione[k]=seq(date_val[k],length=2,by="-2 years")[2] } From bbolker at gmail.com Thu Jul 7 15:49:04 2011 From: bbolker at gmail.com (Ben Bolker) Date: Thu, 7 Jul 2011 13:49:04 +0000 Subject: [R] Generalized Logistic and Richards Curve References: <1310038068.8504.YahooMailClassic@web120314.mail.ne1.yahoo.com> Message-ID: Vincy Pyne yahoo.ca> writes: > Dear R helpers, I am not a statistician and right now struggling > with Richards curve. Wikipedia says > (http://en.wikipedia.org/wiki/Generalised_logistic_function) The > "generalized logistic curve or function", also known as Richard's > curve is a widely-used and flexible sigmoid function for growth > modelling, extending the well-known logistic curve. Now I am > confused and will like to know if the Generalized Logistic > distribution as described in lmomco package is same as what > wikipedia is describing. In other words, is Generalized Logistic > Function same as Generalized logistic distribution? I do understand > there is separate R package "richards' for dealing with Richards > curve. Kindly guide Vincy [[alternative HTML version deleted]] I think not quite. In general it's unlikely that something described as a "function" will necessarily be the same as something described as a "distribution", since the latter (or at least its density function) has to integrate to 1 and the former doesn't ... Looking at 'cdfglo' in the lmomco manual gives F(x)=1/(1+exp(-y)) y = -k^{-1} log(1-k(x-xi)/alpha) (for k not equal 0) whereas wikipedia gives Y(t) = A + { K-A \over (1 + Q e^{-B(t - M)}) ^ {1 / \nu} } In order to carefully check whether these are the same (e.g. 
whether the density function (rather than the CDF which is given in the lmomco manual) is the same as the Richards curve) you would have to match up terms. One tip-off that they can't be identical is that 'cdfglo' has 3 parameters (location parameter xi, scale parameter alpha, shape parameter k) while the Richards has 5 (location M, scale B, scale ? Q, shape nu, lower value A, upper value K). I think they would be *nearly* equivalent for A=0, K=1 in the Richards (then M=xi,B=k/alpha, nu=alpha) but not quite. A little more algebra is required. If you tell us more about what you're trying to do you might get more useful advice ... From jholtman at gmail.com Thu Jul 7 15:52:41 2011 From: jholtman at gmail.com (jim holtman) Date: Thu, 7 Jul 2011 09:52:41 -0400 Subject: [R] vector of dates In-Reply-To: References: Message-ID: You are storing the results in to a vector that is converted to numeric; that is why you see the numbers. Try this: > dataval=as.Date("2011/07/01") > date_val=seq(dataval,length=260,by="-7 day") > date_inizio=c() > date_condizione=c() > for (k in 1:length(date_val)){ + date_inizio[k]=seq(date_val[k],length=2,by="-5 years")[2] + date_condizione[k]=seq(date_val[k],length=2,by="-2 years")[2] + } > str(date_inizio) num [1:260] 13330 13323 13316 13309 13302 ... > class(date_inizio) <- "Date" > str(date_inizio) Date[1:260], format: "2006-07-01" "2006-06-24" "2006-06-17" "2006-06-10" "2006-06-03" "2006-05-27" ... > On Thu, Jul 7, 2011 at 9:35 AM, wrote: > > I have to construct a vector of date with a cycle "for". I use the function > "seq", but when I allocate in a vector, this becomes a number!!! > How do I have? thank you > Example: > dataval=as.Date("2011/07/01") > date_val=seq(dataval,length=260,by="-7 day") > date_inizio=c() > date_condizione=c() > for (k in 1:length(date_val)){ > date_inizio[k]=seq(date_val[k],length=2,by="-5 years")[2] > date_condizione[k]=seq(date_val[k],length=2,by="-2 years")[2] >
} > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? From annemarie.verkerk at mpi.nl Thu Jul 7 13:25:16 2011 From: annemarie.verkerk at mpi.nl (Annemarie Verkerk) Date: Thu, 07 Jul 2011 13:25:16 +0200 Subject: [R] question about getting things out of an lapply In-Reply-To: References: <4E14B827.9070800@mpi.nl> Message-ID: <4E15979C.4040300@mpi.nl> Dear Josh, thanks for pointing this out - the idea behind writing this function is plotting gradients on branches of phylogenetic trees - 'tree' refers to a phylogenetic tree. It's easy to create a random phylogenetic tree in R: library(ape) library(plotrix) rtree(15) -> tree This gives you a tree with 15 taxa. You can plot it with plot() if you want to take a look. then the data - you can create a fake data set: rnorm(15, mean = 0.5, sd = 0.15) -> data for the data which the function needs, you also need: ace(data, tree) -> results data <- append(data,results$ace) names(data) <- NULL I also tried with the following updated code I still got the same error message: create.gradient <- function(i){ colorgrad01<-color.scale(seq(0,1,by=0.01), extremes=c("red","blue")) tree$edge[i,1] -> x tree$edge[i,2] -> y print(x) print(y) data[x] -> z data[y] -> z2 round(z, digits = 2) -> z round(z2, digits = 2) -> z2 z*100 -> z z2*100 -> z2 print(z) print(z2) colorgrad<-colorgrad01[z:z2] colorgrad } lapply(tree$edge, create.gradient) - Error in FUN(X[[26L]], ...) : subscript out of bounds I hope this help and you can replicate the problem too. Thanks! Annemarie Joshua Wiley wrote: > Dear Annemarie, > > Can you replicate the problem using a madeup dataset or one of the > ones built into R? 
It strikes me as odd to pass tree1$edge directly > to lapply, when it is also hardcoded into the function, but I do not > have a sense exactly for what you are doing and without data it is > hard to play around. > > Cheers, > > Josh > > On Wed, Jul 6, 2011 at 12:31 PM, Annemarie Verkerk > wrote: > >> Dear R-help subscribers, >> >> I have a quite stupid question about using lapply. I have the following >> function: >> >> create.gradient <- function(i){ >> colorgrad01<-color.scale(seq(0,1,by=0.01), extremes=c("red","blue")) >> tree1$edge[i,1] -> x >> > > this works, but it would typically be written: > > x <- tree1$edge[i, 1] > > flipping back and forth can be a smidge (about 5 pinches under an > iota) confusing. > > >> tree1$edge[i,2] -> y >> print(x) >> print(y) >> all2[x] -> z >> all2[y] -> z2 >> round(z, digits = 2) -> z >> round(z2, digits = 2) -> z2 >> z*100 -> z >> z2*100 -> z2 >> print(z) >> print(z2) >> colorgrad<-colorgrad01[z:z2] >> colorgrad >> } >> >> Basically, I want to pick a partial gradient out of a bigger gradient >> (colorgrad01) for values that are on row i, from a matrix called tree1. >> >> when I use lapply: >> >> lapply(tree1$edge, create.gradient) >> >> I get the following error message: >> >> Error in FUN(X[[27L]], ...) : subscript out of bounds >> >> I'm not sure what's wrong: it could be either fact that 'colorgrad' is a >> character string; i.e. consisting of multiple characters and not just one, >> or because 'i' doesn't come back in the object 'colorgrad' that it has to >> return. Or it could be something else entirely... >> >> In any case, what I prefer as output is a vector with all the different >> 'colorgrad's it generates with each run. >> >> Thanks a lot for any help you might be able to offer! >> Annemarie >> >> -- >> Annemarie Verkerk, MA >> Evolutionary Processes in Language and Culture (PhD student) >> Max Planck Institute for Psycholinguistics >> P.O. 
Box 310, 6500AH Nijmegen, The Netherlands >> +31 (0)24 3521 185 >> http://www.mpi.nl/research/research-projects/evolutionary-processes >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> >> > > > > -- Annemarie Verkerk, MA Evolutionary Processes in Language and Culture (PhD student) Max Planck Institute for Psycholinguistics P.O. Box 310, 6500AH Nijmegen, The Netherlands +31 (0)24 3521 185 http://www.mpi.nl/research/research-projects/evolutionary-processes From ankur.verma at genpact.com Thu Jul 7 15:38:21 2011 From: ankur.verma at genpact.com (Verma, Ankur) Date: Thu, 7 Jul 2011 19:08:21 +0530 Subject: [R] Taking inputs from the user Message-ID: An embedded and charset-unspecified text was scrubbed... Name: not available URL: From alfreale74 at gmail.com Thu Jul 7 16:09:51 2011 From: alfreale74 at gmail.com (Alfredo Alessandrini) Date: Thu, 7 Jul 2011 16:09:51 +0200 Subject: [R] coefficients lm of data.frame Message-ID: Hi, I've a data frame like this: > as.data.frame(cbind(rnorm(1:12),rnorm(1:12))) V1 V2 1 -1.30849402 -0.52094136 2 0.96157302 0.76217871 3 -0.44223351 -1.72630871 4 -0.10432438 -1.04732942 5 -1.38748914 0.95877311 6 -0.63965975 0.65494811 7 -0.24058318 0.19496830 8 -0.11172988 1.01680655 9 0.08065333 0.22168589 10 0.25196536 0.84619914 11 -0.59536986 -0.08243074 12 1.09115054 0.49822977 I need to add two columns as result of the fitting of linear model based on a preset numbers of row. For example if I need to compute a lm each 4 rows, I get the data.frame below, where intercept1 and coeff1 is obtained from V1 and V2 of first 4 rows lm(V2 ~ V1), and so on... 
           V1          V2 "intercept" "coeff"
1   0.6931694  0.05797771  intercept1  coeff1
2  -1.4069786  0.23983307  intercept1  coeff1
3  -1.4901708  0.45079601  intercept1  coeff1
4   0.2215696  1.87888983  intercept1  coeff1
5  -0.5828106  0.90376622  intercept2  coeff2
6  -0.7607985  0.71419938  intercept2  coeff2
7   0.1273495  0.06199312  intercept2  coeff2
8  -0.5612245  1.02223971  intercept2  coeff2
9  -0.1439178  0.92135354  intercept3  coeff3
10 -1.1011662  0.02894731  intercept3  coeff3
11 -0.4098710 -0.01231322  intercept3  coeff3
12  1.1511811 -0.63923140  intercept3  coeff3

Thanks in advance, Alfredo From dwinsemius at comcast.net Thu Jul 7 16:09:53 2011 From: dwinsemius at comcast.net (David Winsemius) Date: Thu, 7 Jul 2011 10:09:53 -0400 Subject: [R] Simple inheritance check fails (integer from numeric) In-Reply-To: <4E1583F2.5020404@googlemail.com> References: <4E1583F2.5020404@googlemail.com> Message-ID: On Jul 7, 2011, at 6:01 AM, Janko Thyson wrote: > Dear list, > > In a function, I don't care if my input has class 'integer' or > 'numeric', so I wanted to use 'inherits()' to control for that. > > However, this function tells me that an actual object of class > 'integer' does not inherit from class 'numeric'. The class def of > 'integer' does state 'numeric' as one of the superclasses. Isn't > that somewhat inconsistent? > > > getClass("integer") > Class "integer" [package "methods"] > > No Slots, prototype of class "integer" > > Extends: "numeric", "vector", "data.frameRowLabels" > > > a <- 1:3 > > class(a) > [1] "integer" > > > inherits(a, "numeric") > [1] FALSE > > a <- 1:3 > is.numeric(a) [1] TRUE -- David Winsemius, MD West Hartford, CT From jholtman at gmail.com Thu Jul 7 16:13:47 2011 From: jholtman at gmail.com (jim holtman) Date: Thu, 7 Jul 2011 10:13:47 -0400 Subject: [R] Taking inputs from the user In-Reply-To: References: Message-ID: Give them an Excel spreadsheet that they can fill in the values.
They can then send the spreadsheet to you, and you can have your R script read the information from it and send it back. You did not mention how they are supposed to "get the output". Do you want to set up a central server that can receive the email, run the script and then send back the results? On Thu, Jul 7, 2011 at 9:38 AM, Verma, Ankur wrote: > Hi, > > I am currently a new user in R and was working on the randomForest package. I am trying to predict price points using this statistical package. The issue is that I need to set up a tool so that I can give it to Sales Executives who can plug in the necessary variables and get the output. Is there a way to do that? They don't have R on their systems and I doubt they are going to install it. > > Need urgent help on this. > > Thanks, > Ankur > -- Jim Holtman Data Munger Guru What is the problem that you are trying to solve? From janko.thyson.rstuff at googlemail.com Thu Jul 7 16:25:32 2011 From: janko.thyson.rstuff at googlemail.com (Janko Thyson) Date: Thu, 07 Jul 2011 16:25:32 +0200 Subject: [R] Simple inheritance check fails (integer from numeric) In-Reply-To: R